Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4O3-OS-16e-02

Vision-Language-Conditioned Diffusion Policies for Robotic Control
*Akira KINOSE, Koki OGURI, Tomoyuki KAGAYA, Ryo OKUMURA, Tadahiro TANIGUCHI
Abstract

Building robots that can understand human language and autonomously determine their actions from it is a major research challenge in robotics and machine learning. If robots can accurately grasp the intentions behind humans' abstract instructions and execute appropriate control, assistance to humans and the efficiency of task execution are expected to improve greatly. In this paper, we propose an imitation learning method for robot control, named Vision-Language-conditioned Diffusion Policy (VLDP), that autonomously determines actions from human language instructions and goal images. Conventional language-based robot control methods have not adequately modeled the ambiguity and polysemy inherent in human language. VLDP addresses this issue by extracting semantics from the language instruction and goal image with a vision-language model and conditioning a Diffusion Policy on them. This enables the robot to generate multiple valid actions in response to linguistically ambiguous instructions. Our experiments evaluate the success rate of action generation from language instructions, the ability to adapt to unseen language instructions, and the multimodality of the actions generated by the proposed method.
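As a rough illustration of the conditioning scheme described in the abstract, the sketch below (not the authors' implementation; the module structure, dimensions, and noise schedule are assumptions) shows a diffusion-policy denoiser conditioned on a fused vision-language embedding, together with a simplified reverse-diffusion loop that turns Gaussian noise into an action sequence.

```python
# Minimal sketch (illustrative assumptions throughout): a denoiser for a
# diffusion policy conditioned on a vision-language embedding, e.g. the
# pooled output of a frozen VLM applied to (language instruction, goal image).
import torch
import torch.nn as nn

class VLConditionedDenoiser(nn.Module):
    """Predicts the noise added to an action sequence, given the diffusion
    step and a fused vision-language conditioning vector."""
    def __init__(self, action_dim=7, horizon=16, cond_dim=512, hidden=256):
        super().__init__()
        self.cond_proj = nn.Linear(cond_dim, hidden)
        self.time_emb = nn.Sequential(nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
        self.net = nn.Sequential(
            nn.Linear(action_dim * horizon + 2 * hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, action_dim * horizon),
        )
        self.action_dim, self.horizon = action_dim, horizon

    def forward(self, noisy_actions, t, cond):
        # noisy_actions: (B, horizon, action_dim); t: (B, 1) diffusion step; cond: (B, cond_dim)
        x = noisy_actions.flatten(1)
        h = torch.cat([x, self.cond_proj(cond), self.time_emb(t.float())], dim=-1)
        return self.net(h).view(-1, self.horizon, self.action_dim)

# Simplified reverse diffusion: start from pure noise and iteratively denoise,
# always conditioning on the same vision-language embedding.
denoiser = VLConditionedDenoiser()
cond = torch.randn(1, 512)            # stand-in for VLM(instruction, goal image)
actions = torch.randn(1, 16, 7)       # noisy action sequence at the last step
for step in reversed(range(10)):      # illustrative 10-step schedule
    t = torch.full((1, 1), step)
    eps = denoiser(actions, t, cond)
    actions = actions - 0.1 * eps     # simplified update; a real sampler uses the DDPM/DDIM posterior
```

Because the sampler starts from random noise, repeated runs with the same conditioning can yield different but equally valid action sequences, which is the multimodality the abstract refers to.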

© 2024 The Japanese Society for Artificial Intelligence