Predicting grasp stability before lifting an object, i.e., whether a grasped object will move with respect to the gripper, leaves more time to correct unstable grasps than after-lift slip detection does. Recently, deep learning methods relying on visual and tactile information have become increasingly popular. However, how to combine visual and tactile data effectively remains an open research question. In this paper, we propose to fuse visual and tactile data by introducing self-attention (SA) mechanisms for predicting grasp stability. In our experiments, we use two uSkin tactile sensors and one Spresense camera sensor. A past image of the object, not collected immediately before or during grasping, is used, as such an image may be more readily available. The dataset is collected by grasping and lifting 35 daily objects 1050 times in total, with varying forces and grasping positions. As a result, the prediction accuracy improves by over 2.89% compared to previous visual-tactile fusion research.
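To make the fusion idea concrete, below is a minimal PyTorch sketch of how self-attention could combine visual and tactile features into a grasp-stability prediction. The feature dimensions, projection layers, and `SelfAttentionFusion` class are illustrative assumptions for this sketch, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class SelfAttentionFusion(nn.Module):
    """Sketch: fuse visual and tactile feature vectors with self-attention,
    then classify grasp stability (stable / unstable).
    All layer sizes are assumptions, not the paper's model."""

    def __init__(self, embed_dim=128, num_heads=4):
        super().__init__()
        # Project each modality's feature vector into a shared embedding space.
        self.visual_proj = nn.Linear(512, embed_dim)          # e.g. CNN image features (assumed size)
        self.tactile_proj = nn.Linear(2 * 16 * 3, embed_dim)  # two tactile arrays, 16 taxels x 3 axes (assumed)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(embed_dim, 2)

    def forward(self, visual_feat, tactile_feat):
        # Stack modality embeddings as a 2-token sequence: (batch, 2, embed_dim).
        tokens = torch.stack(
            [self.visual_proj(visual_feat), self.tactile_proj(tactile_feat)], dim=1
        )
        # Self-attention lets each modality's token attend to the other's.
        fused, _ = self.attn(tokens, tokens, tokens)
        # Pool over the two tokens and predict stable vs. unstable grasp.
        return self.classifier(fused.mean(dim=1))

# Usage with random features standing in for real sensor encodings.
model = SelfAttentionFusion()
visual = torch.randn(8, 512)           # batch of image feature vectors
tactile = torch.randn(8, 2 * 16 * 3)   # batch of flattened tactile readings
logits = model(visual, tactile)        # (8, 2) stability logits
```

The design point the sketch illustrates is that, unlike simple concatenation, self-attention lets the network weight each modality's contribution per sample, so tactile cues can dominate when the image is uninformative and vice versa.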