Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Achieving robots that can understand human language and autonomously determine actions based on it is a significant research challenge in robotics and machine learning. If robots can accurately grasp the intent behind abstract human instructions and execute appropriate control, assistance to humans and task-execution efficiency are expected to improve greatly. In this paper, we propose an imitation learning method for robot control that autonomously determines actions from human language instructions and goal images, named Vision-Language-conditioned Diffusion Policy (VLDP). Conventional language-based robot control methods have not adequately modeled the ambiguity and polysemy inherent in human language. VLDP addresses this issue by extracting semantics from the language instruction and goal image with a vision-language model and conditioning a Diffusion Policy on them. This enables the robot to generate multiple valid actions in response to linguistically ambiguous instructions. Our experiments evaluate the success rate of action generation from language instructions, generalization to unseen language instructions, and the multimodality of the actions generated by the proposed method.
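The conditioning scheme described above can be illustrated with a toy sketch. Everything here is an assumption for illustration: the linear "encoders" stand in for a pretrained vision-language model, the linear denoiser stands in for the learned noise-prediction network, and all dimensions and step counts are made up; the paper's actual architecture and training procedure are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 32   # assumed joint vision-language embedding size
ACT_DIM = 7    # e.g. a 7-DoF arm action (illustrative)
HORIZON = 8    # length of the predicted action sequence
N_STEPS = 10   # number of reverse diffusion steps

# Stand-ins for a pretrained vision-language model's projections:
W_text = rng.normal(size=(EMB_DIM, 64))
W_img = rng.normal(size=(EMB_DIM, 128))

def encode_condition(instruction_feat, goal_image_feat):
    """Fuse language and goal-image features into one conditioning vector."""
    return np.tanh(W_text @ instruction_feat + W_img @ goal_image_feat)

# Stand-in for the learned denoiser eps_theta(a, t, cond):
W_a = rng.normal(size=(ACT_DIM, ACT_DIM)) * 0.1
W_c = rng.normal(size=(ACT_DIM, EMB_DIM)) * 0.1

def denoise_step(actions, cond):
    """One reverse step: predict noise given the condition and remove it."""
    eps_hat = actions @ W_a.T + cond @ W_c.T  # cond broadcasts over horizon
    return actions - (1.0 / N_STEPS) * eps_hat

def sample_actions(instruction_feat, goal_image_feat):
    """Sample an action sequence from pure noise, guided by the condition."""
    cond = encode_condition(instruction_feat, goal_image_feat)
    actions = rng.normal(size=(HORIZON, ACT_DIM))  # start from Gaussian noise
    for _ in range(N_STEPS):
        actions = denoise_step(actions, cond)
    return actions

acts = sample_actions(rng.normal(size=64), rng.normal(size=128))
print(acts.shape)  # → (8, 7)
```

Because sampling starts from random noise, repeated calls with the same instruction produce different action sequences, which is the mechanism behind the multimodal behavior the abstract attributes to diffusion-based policies.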