Abstract
This paper presents a novel approach for accurate weight estimation in robotic manipulation of noodle-like objects. The proposed approach combines vision transformer and autoencoder techniques with action data and RGB-D encoding to enhance the capability of robots to manipulate objects of varying weight. A deep neural network is introduced to estimate the grasping action of a robot picking up noodle-like objects, using RGB-D camera input, a six-finger gripper, and Cartesian movement. The hardware setup and the characteristics of the noodle-like objects are described. The study builds on previous work in RGB-D perception, weight estimation, and deep learning, addressing the limitations of existing methods by incorporating robot action data. The effectiveness of vision transformers, autoencoders, self-supervised deep reinforcement learning, and deep residual learning in robotic manipulation is discussed. The proposed approach leverages a Transformer network to encode sequential and spatial information for weight estimation. Experimental evaluation on a dataset of 20,000 samples collected in real environments demonstrates the effectiveness and accuracy of the proposed approach for grasping noodle-like objects. This research contributes to advances in robotic manipulation, enabling robots to handle objects of varying weight in real-world scenarios.
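To make the described fusion of RGB-D and action inputs concrete, the following is a minimal sketch, not the authors' exact network: it assumes (hypothetically) flattened RGB-D patches and a 7-dimensional grasp/motion vector fused as tokens in a PyTorch Transformer encoder that regresses a scalar weight. All class names, token dimensions, and the action parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WeightEstimator(nn.Module):
    """Hypothetical sketch: Transformer encoder over RGB-D patch tokens plus an action token."""
    def __init__(self, patch_dim=4 * 16 * 16, action_dim=7, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, d_model)    # RGB-D patches (4 channels)
        self.action_embed = nn.Linear(action_dim, d_model)  # grasp / Cartesian action vector
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)                   # scalar weight estimate

    def forward(self, patches, action):
        # patches: (B, N, patch_dim) flattened RGB-D patches; action: (B, action_dim)
        tokens = torch.cat(
            [self.cls_token.expand(patches.size(0), -1, -1),
             self.patch_embed(patches),
             self.action_embed(action).unsqueeze(1)],
            dim=1,
        )
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])  # regress weight from the [CLS] token

# Usage with dummy data: a 14x14 grid of 16x16 RGB-D patches and one action vector per sample
model = WeightEstimator()
rgbd_patches = torch.randn(2, 196, 4 * 16 * 16)
grasp_action = torch.randn(2, 7)
print(model(rgbd_patches, grasp_action).shape)  # torch.Size([2, 1])
```

Appending the action embedding as an extra token lets self-attention relate the grasp parameters to the spatial RGB-D features, which is one plausible way to realize the sequential-plus-spatial encoding described above.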