Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4P3-OS-17c-02

Robot Task planning with Vision-Language Model via Hand-written Instruction for Remote Control.
*Kosei TANADA, Yuka IWANAGA, Masayoshi TSUCHINAGA, Takemitsu MORI, Takashi YAMAMOTO

Abstract

The social implementation of assistive robots is a crucial way to address labor shortages and improve the Quality of Life (QoL) in an aging society. To utilize robots in everyday life, a remote control system that allows users to operate robots easily, anytime and anywhere, is indispensable. One intuitive way for users to control robots is hand-written instruction, in which they freely sketch instructions on a screen. To control the robot using hand-written lines, it is necessary to understand the semantic information of these lines and translate them into robot commands. In this paper, we propose a method of interpreting hand-written instructions using Vision-Language Models (VLMs). In this method, the VLM takes a pre-prompt including APIs, constraints, and examples, as well as an observation image with hand-written lines, and outputs low-level task code sequences. Additionally, the generated code takes the hand-written lines as arguments, enabling remote control that includes specifying ambiguous positions and paths that are difficult to express through language. We demonstrate a high success rate on various tasks using our method. Furthermore, we show the high usability of our method in a user experiment with 10 participants, comparing it with a voice-based method.

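The following is a minimal sketch, in Python, of the prompting scheme the abstract describes: a pre-prompt listing robot APIs, constraints, and an example is sent to a VLM together with an annotated observation image, and the returned low-level task code is executed with the hand-written strokes passed in as arguments. All function names, the prompt wording, the stroke naming convention, and the canned VLM response are assumptions made for illustration, not the authors' implementation.

# Illustrative sketch only; names and prompt text are hypothetical placeholders.

PRE_PROMPT = """You control a robot arm through these APIs:
  pick(point)      # grasp the object at an (x, y) image point
  place(point)     # release the held object at an (x, y) image point
  move_along(path) # follow a list of (x, y) image points
Constraints: respond with Python calls to these APIs only, one per line.
The hand-written strokes in the image are available as variables named
stroke_0, stroke_1, ... (each a list of (x, y) points).
Example: a circle around a cup and an arrow to a tray ->
  pick(centroid(stroke_0)); move_along(stroke_1); place(stroke_1[-1])
"""

def query_vlm(pre_prompt: str, image_png: bytes) -> str:
    # Hypothetical VLM call: a real system would send the pre-prompt plus
    # the annotated observation image and return generated task code.
    return "pick(centroid(stroke_0))\nmove_along(stroke_1)\nplace(stroke_1[-1])"

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Stub robot APIs for the sketch; a real system would issue motion commands.
def pick(point): print(f"pick at {point}")
def place(point): print(f"place at {point}")
def move_along(path): print(f"move along {len(path)} waypoints")

def execute(image_png: bytes, strokes: dict) -> None:
    # Run the generated code, exposing the hand-written strokes as arguments.
    task_code = query_vlm(PRE_PROMPT, image_png)
    namespace = {"pick": pick, "place": place, "move_along": move_along,
                 "centroid": centroid, **strokes}
    exec(task_code, namespace)

if __name__ == "__main__":
    demo_strokes = {"stroke_0": [(100, 120), (110, 130), (105, 125)],
                    "stroke_1": [(105, 125), (200, 150), (300, 180)]}
    execute(image_png=b"", strokes=demo_strokes)

In this sketch the strokes are injected into the execution namespace so that the code generated by the VLM can refer to them directly, which mirrors the abstract's point that the generated code takes hand-written lines as arguments.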
© 2024 The Japanese Society for Artificial Intelligence