Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 3O1-OS-16b-02

Task Success Prediction on Large-Scale Object Manipulation Datasets Based on Multimodal LLMs and Vision-Language Foundation Models
*Daichi SAITO, Motonari KAMBARA, Katsuyuki KUYO, Komei SUGIURA
Abstract

To improve model performance in object manipulation tasks, a high-performance task success prediction mechanism is crucial. However, the performance of existing methods remains insufficient. Moreover, existing prediction mechanisms are designed for specific tasks only, making it challenging to accommodate a diverse range of tasks. Our study therefore aims to develop a task success prediction mechanism that can handle multiple object manipulation tasks. A key novelty of the proposed method is the introduction of the λ-Representation, which preserves all three types of visual features: visual characteristics such as colors and shapes, features aligned with natural language, and features structured through natural language. For the experiments, we built new datasets for task success prediction in object manipulation tasks based on the RT-1 dataset and VLMbench. The results show that the proposed method outperforms all baseline methods in accuracy.
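The abstract does not give implementation details, but the fusion of the three feature types described above can be illustrated with a minimal sketch. All module names, feature dimensions, and the concatenation-then-MLP design below are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class SuccessPredictionHead(nn.Module):
    """Hypothetical sketch of a lambda-Representation-style success predictor.

    The three inputs stand in for the abstract's three feature types:
      f_vis    : raw visual features (colors, shapes), e.g. from a DINO-like encoder
      f_align  : features aligned with natural language, e.g. from a CLIP-like encoder
      f_struct : features structured through natural language, e.g. encoded captions
    Dimensions and layer sizes are illustrative assumptions.
    """

    def __init__(self, d_vis=768, d_align=512, d_struct=512, d_hidden=256):
        super().__init__()
        self.proj = nn.Linear(d_vis + d_align + d_struct, d_hidden)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(d_hidden, 1),  # one logit: task success vs. failure
        )

    def forward(self, f_vis, f_align, f_struct):
        # Preserve all three feature types by concatenating before fusion,
        # so no single representation discards the others' information.
        fused = torch.cat([f_vis, f_align, f_struct], dim=-1)
        return self.classifier(self.proj(fused))

head = SuccessPredictionHead()
logits = head(torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)  # one success logit per sample in the batch
```

A binary cross-entropy loss on these logits against ground-truth success labels would complete the training setup under this sketch.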

© 2024 The Japanese Society for Artificial Intelligence