Host : The Japanese Society for Artificial Intelligence
Name : The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 37
Location : [in Japanese]
Date : June 06, 2023 - June 09, 2023
For a robot to perform tasks expressed in human language, it needs a semantic map that associates locations with semantic information. Learning such a map often requires human intervention. In this study, we propose an active semantic mapping system that requires no human intervention, thereby reducing the burden on the user in the semantic mapping process. In the proposed method, the robot actively learns spatial concepts and generates a map at the same time. Spatial concepts are learned through multimodal categorization based on unsupervised online learning. Captions generated using CLIP, a foundation model used here for image captioning, connect the real world to language. To evaluate which spatial search method leads to efficient semantic mapping, we conducted experiments in a simulation environment, comparing methods that differ in how they determine the robot's destination. We also evaluated the usefulness of the learning results for human-language-related tasks in a real-world environment.
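The abstract does not state which destination-selection rules were compared, so the following is only a minimal sketch of what such a comparison could look like. It assumes, hypothetically, that each candidate destination carries a predicted distribution over spatial-concept categories, and it contrasts an uncertainty-driven rule (go where the prediction has the highest entropy) with a random baseline. The function names, the place names, and the probabilities are all illustrative and are not taken from the paper.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_most_uncertain(candidates):
    """Hypothetical active rule: choose the candidate whose predicted
    spatial-concept distribution has the highest entropy."""
    return max(candidates, key=lambda name: entropy(candidates[name]))


def pick_random(candidates, rng):
    """Baseline rule: choose a candidate uniformly at random."""
    return rng.choice(sorted(candidates))


# Illustrative values only: candidate destinations mapped to an assumed
# distribution over three spatial-concept categories.
candidates = {
    "kitchen_corner": [0.90, 0.05, 0.05],
    "hallway_end": [0.40, 0.35, 0.25],
    "desk_area": [0.70, 0.20, 0.10],
}

if __name__ == "__main__":
    print("active:", pick_most_uncertain(candidates))
    print("random:", pick_random(candidates, random.Random(0)))
```

With these made-up numbers, the uncertainty-driven rule selects `hallway_end`, the location whose category prediction is closest to uniform. A real system would recompute the distributions after each observation, since the online learner's estimates change as the robot explores.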