Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: [in Japanese]
Date: June 06, 2023 - June 09, 2023
For robots to operate effectively in the real world, the ability to autonomously generate action plans is indispensable. This requires reasoning about future states and predicting the effects of actions at fine granularity. For human support scenarios, it is also important that planning goals can be given in natural language. In previous research, we proposed a robot action planning system that generates action plans for goals specified at run time [Arnold 2023]. However, goals had to be specified as image patches depicting the goal condition, which is impractical for real-world use. Here, we extend the action planning system so that goals can be specified as text, by integrating a modified CLIP [Radford 2021] model into the planning system. During plan generation, the system evaluates plans by predicting their outcomes in image form and comparing these predictions to the goal text using CLIP. This paper focuses on evaluating CLIP’s potential for relating images and text in our task domain. We also conduct a verification experiment with the integrated planning system and show that it can generate plans for goals specified as text.
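The plan evaluation step described above, comparing predicted outcome images to a goal text, can be illustrated with a minimal sketch. It assumes that both the goal text and each predicted outcome image have already been mapped to embedding vectors in a shared space, and that plans are ranked by cosine similarity, the comparison CLIP commonly uses. The function names `cosine_similarity` and `rank_plans`, the plan labels, and the toy vectors are hypothetical and are not taken from the paper. The modified CLIP model and the outcome predictor are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_plans(
    goal_text_embedding: Sequence[float],
    predicted_outcome_embeddings: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score each plan by how well its predicted outcome matches the goal text.

    Assumption: the embeddings stand in for the output of a text encoder
    (goal) and an image encoder applied to predicted outcome images (plans).
    """
    scored = [
        (plan_name, cosine_similarity(goal_text_embedding, embedding))
        for plan_name, embedding in predicted_outcome_embeddings.items()
    ]
    # Highest similarity first: the best-matching plan is at index 0.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (hypothetical values, not real CLIP outputs).
goal = [1.0, 0.0, 0.5]
candidates = {
    "plan_a": [0.9, 0.1, 0.4],    # predicted outcome close to the goal
    "plan_b": [0.0, 1.0, 0.0],    # predicted outcome unrelated to the goal
    "plan_c": [-1.0, 0.0, -0.5],  # predicted outcome opposite to the goal
}

ranking = rank_plans(goal, candidates)
best_plan = ranking[0][0]
print(best_plan)
```

In a full system, the toy vectors would be replaced by encoder outputs, and the ranking would feed back into plan search rather than being computed once over a fixed set of plans.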