大規模言語モデルを用いたSwitching機構付きマルチモーダル検索モデルに基づく生活支援ロボットによる物体操作

是方 諒介; 兼田 寛大; 長嶋 隼矢; 今井 悠人; 杉浦 孔明

doi:10.11517/pjsai.JSAI2024.0_3T5OS6b04

38th (2024)

Session ID : 3T5-OS-6b-04

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence

Number : 38

Date : May 28, 2024 - May 31, 2024

Fetch-and-Carry Tasks by Domestic Service Robots Based on Multimodal Retrieval Models with Switching Mechanism Using Large Language Models

*Ryosuke KOREKATA, Kanta KANEDA, Shunya NAGASHIMA, Yuto IMAI, Komei SUGIURA

Keywords: Domestic Service Robot, Object Retrieval, Large Language Model, Multimodal Foundation Model, Fetch-and-Carry

CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

In this study, we aim to develop a domestic service robot (DSR) that, given an open-vocabulary instruction, carries an everyday object to a piece of furniture by retrieving images of the target object and the receptacle from images collected in the environment. We propose a multimodal retrieval model that uses a switching mechanism based on large language models to retrieve target objects and receptacles separately with a single model. In experiments, our method outperformed baseline methods on newly built datasets in terms of standard metrics. Furthermore, it achieved task success rates of more than 80% in physical experiments.
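The idea of retrieving target objects and receptacles with a single model and a switch can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not describe the architecture, so everything here is an assumption made for illustration. That includes the names `split_instruction`, `embed_text`, `retrieve`, and `IMAGES`, the bag-of-words embedding, and the string split that stands in for the large language model.

```python
import math
from typing import Dict, List, Sequence

# Toy vocabulary shared by text and image embeddings (assumption for illustration).
VOCAB = ("mug", "towel", "table", "shelf")


def embed_text(phrase: str) -> List[float]:
    """Toy text encoder: counts vocabulary words in the phrase."""
    words = phrase.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_instruction(instruction: str) -> Dict[str, str]:
    """Hypothetical stand-in for the LLM step: splits the instruction
    into a target-object phrase and a receptacle phrase at the first ' to '."""
    head, _, tail = instruction.partition(" to ")
    return {"target": head, "receptacle": tail}


def retrieve(instruction: str, mode: str,
             image_embeddings: Dict[str, Sequence[float]],
             top_k: int = 1) -> List[str]:
    """Ranks images with one scoring function; `mode` switches which
    phrase of the instruction is used as the query."""
    if mode not in ("target", "receptacle"):
        raise ValueError("mode must be 'target' or 'receptacle'")
    query = embed_text(split_instruction(instruction)[mode])
    ranked = sorted(image_embeddings,
                    key=lambda name: cosine(query, image_embeddings[name]),
                    reverse=True)
    return ranked[:top_k]


# Hand-made image embeddings in the same toy space (assumption).
IMAGES = {
    "img_mug.jpg": [1.0, 0.0, 0.0, 0.0],
    "img_towel.jpg": [0.0, 1.0, 0.0, 0.0],
    "img_table.jpg": [0.0, 0.0, 1.0, 0.0],
    "img_shelf.jpg": [0.0, 0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    instruction = "Carry the mug to the table"
    print(retrieve(instruction, "target", IMAGES))
    print(retrieve(instruction, "receptacle", IMAGES))
```

In this sketch the same `retrieve` function serves both queries, and only the `mode` argument changes which part of the instruction is matched against the image collection.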
