Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4P3-OS-17c-02

Robot Task planning with Vision-Language Model via Hand-written Instruction for Remote Control.
*Kosei TANADA, Yuka IWANAGA, Masayoshi TSUCHINAGA, Takemitsu MORI, Takashi YAMAMOTO

Abstract

The social implementation of assistive robots is a crucial way to address labor shortages and improve the Quality of Life (QoL) in an aging society. To utilize robots in everyday life, a remote control system that allows users to operate robots easily, anytime and anywhere, is indispensable. One intuitive way for users to control robots is hand-written instruction, in which they freely sketch instructions on a screen. To control the robot using hand-written lines, it is necessary to understand the semantic information of these lines and translate them into robot commands. In this paper, we propose a method of interpreting hand-written instructions using Vision-Language Models (VLMs). In this method, the VLM takes a pre-prompt including APIs, constraints, and examples, as well as an observation image with hand-written lines, and outputs low-level task code sequences. Additionally, the generated code takes the hand-written lines as arguments, enabling remote control that includes specifying ambiguous positions and paths that are difficult to express through language. We demonstrate a high success rate on various tasks using our method. Furthermore, we show the high usability of our method in a user experiment with 10 participants, comparing it with a voice-based method.

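The following is a minimal sketch, in Python, of the prompting scheme the abstract describes: a pre-prompt listing robot APIs, constraints, and an example is sent to a VLM together with an annotated observation image, and the returned low-level task code is executed with the hand-written strokes passed in as arguments. All function names, the prompt wording, the stroke naming convention, and the canned VLM response are assumptions made for illustration, not the authors' implementation.

# Illustrative sketch only; names and prompt text are hypothetical placeholders.

PRE_PROMPT = """You control a robot arm through these APIs:
  pick(point)      # grasp the object at an (x, y) image point
  place(point)     # release the held object at an (x, y) image point
  move_along(path) # follow a list of (x, y) image points
Constraints: respond with Python calls to these APIs only, one per line.
The hand-written strokes in the image are available as variables named
stroke_0, stroke_1, ... (each a list of (x, y) points).
Example: a circle around a cup and an arrow to a tray ->
  pick(centroid(stroke_0)); move_along(stroke_1); place(stroke_1[-1])
"""

def query_vlm(pre_prompt: str, image_png: bytes) -> str:
    # Hypothetical VLM call: a real system would send the pre-prompt plus
    # the annotated observation image and return generated task code.
    return "pick(centroid(stroke_0))\nmove_along(stroke_1)\nplace(stroke_1[-1])"

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Stub robot APIs for the sketch; a real system would issue motion commands.
def pick(point): print(f"pick at {point}")
def place(point): print(f"place at {point}")
def move_along(path): print(f"move along {len(path)} waypoints")

def execute(image_png: bytes, strokes: dict) -> None:
    # Run the generated code, exposing the hand-written strokes as arguments.
    task_code = query_vlm(PRE_PROMPT, image_png)
    namespace = {"pick": pick, "place": place, "move_along": move_along,
                 "centroid": centroid, **strokes}
    exec(task_code, namespace)

if __name__ == "__main__":
    demo_strokes = {"stroke_0": [(100, 120), (110, 130), (105, 125)],
                    "stroke_1": [(105, 125), (200, 150), (300, 180)]}
    execute(image_png=b"", strokes=demo_strokes)

In this sketch the strokes are injected into the execution namespace so that the code generated by the VLM can refer to them directly, which mirrors the abstract's point that the generated code takes hand-written lines as arguments.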
© 2024 The Japanese Society for Artificial Intelligence