Host : The Japanese Society for Artificial Intelligence
Name : The 39th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 39
Location : [in Japanese]
Date : May 27, 2025 - May 30, 2025
Task success prediction is important for open-vocabulary object manipulation by robot manipulators because it improves the reliability and efficiency of manipulation. In particular, predicting success on the fly, while the object is still being manipulated, makes task execution more efficient. We propose a framework that extends Contrastive λ-Repformer, which determines task success from images taken before and after open-vocabulary object manipulation and an instruction, to real-time task success prediction. Our framework detects the moment at which the manipulation succeeds by focusing on minute changes between images; it does so by contrasting multi-level aligned representations of images in the initial state with those of images at any given time. Experimental results confirm that real-time success prediction can be performed using this framework.
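The idea of detecting the timing of success from a stream of images can be illustrated with a minimal sketch. It is not the authors' implementation: the function `detect_success_time`, the callable `score_fn` (a stand-in for a model that compares the initial image with the current image under an instruction), and the `threshold` and `patience` parameters are all hypothetical names and choices introduced here for illustration only.

```python
from typing import Any, Callable, Optional, Sequence


def detect_success_time(
    initial_frame: Any,
    frames: Sequence[Any],
    instruction: str,
    score_fn: Callable[[Any, Any, str], float],
    threshold: float = 0.5,
    patience: int = 1,
) -> Optional[int]:
    """Return the index of the first frame that starts a run of `patience`
    consecutive frames whose score is at least `threshold`, or None.

    `score_fn` is a hypothetical placeholder for a learned success
    predictor; it is assumed to return a higher value when the current
    frame looks more like a successful outcome of the instruction.
    """
    run = 0  # number of consecutive frames currently above the threshold
    for t, frame in enumerate(frames):
        score = score_fn(initial_frame, frame, instruction)
        run = run + 1 if score >= threshold else 0
        if run >= patience:
            return t - patience + 1  # first frame of the qualifying run
    return None


# Toy usage: frames are plain numbers and the score is simply the
# absolute change from the initial state (an assumption for the demo).
def toy_score(initial: float, current: float, instruction: str) -> float:
    return abs(current - initial)


stream = [0.0, 0.1, 0.9, 0.2, 0.8, 0.9, 1.0]
success_index = detect_success_time(
    0.0, stream, "put the cup on the shelf", toy_score,
    threshold=0.5, patience=2,
)
```

Requiring several consecutive frames above the threshold is one simple way to avoid reacting to a single noisy frame; the actual decision rule used in the framework is not stated in the abstract.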