Host : The Japanese Society for Artificial Intelligence
Name : The 35th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 35
Location : [in Japanese]
Date : June 08, 2021 - June 11, 2021
Humans learn the names of objects by associating words with objects. Joint attention, the ability to identify the target object, has been reported to facilitate the acquisition of word meanings. We believe that this ability is also important for robots that must flexibly acquire new words in everyday environments through interaction with humans. In this paper, we propose an algorithm that enables a robot to learn word meanings in a cluttered scene. The robot first detects multiple candidate objects using a region proposal network, then selects the target object among them based on joint attention and the co-occurrence of words and objects. Finally, the robot acquires the word meaning by associating the word with the selected object using multimodal latent Dirichlet allocation.
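The target-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two cues are combined, so the multiplicative scoring, the Gaussian attention model, the add-one smoothing, and every name here (`attention_score`, `CooccurrenceTable`, `select_target`, `sigma`) are assumptions made for illustration. Candidate objects are assumed to arrive from a detector as (center, category id) pairs, and the multimodal latent Dirichlet allocation step is not modeled.

```python
import math
from collections import defaultdict


def attention_score(object_center, attention_point, sigma=50.0):
    """Hypothetical joint-attention cue: Gaussian falloff with the pixel
    distance between an object's center and the estimated attention point."""
    dx = object_center[0] - attention_point[0]
    dy = object_center[1] - attention_point[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


class CooccurrenceTable:
    """Hypothetical word/object-category co-occurrence counts with smoothing."""

    def __init__(self, smoothing=1.0):
        self.counts = defaultdict(lambda: defaultdict(float))
        self.smoothing = smoothing

    def update(self, word, category):
        # Record that `word` was heard while an object of `category` was selected.
        self.counts[word][category] += 1.0

    def probability(self, category, word, num_categories):
        # Smoothed estimate of P(category | word); uniform when the word is new.
        row = self.counts[word]
        total = sum(row.values()) + self.smoothing * num_categories
        return (row.get(category, 0.0) + self.smoothing) / total


def select_target(candidates, word, attention_point, table, num_categories):
    """Return the index of the candidate with the highest combined score.

    `candidates` is a list of (center, category) pairs. Multiplying the two
    cues is an assumption of this sketch, not something stated in the source.
    """
    best_index, best_score = None, -1.0
    for index, (center, category) in enumerate(candidates):
        score = attention_score(center, attention_point) * table.probability(
            category, word, num_categories
        )
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Two detected candidates: category 0 on the left, category 1 on the right.
candidates = [((100, 100), 0), ((300, 100), 1)]
table = CooccurrenceTable()

# With no word knowledge yet, the attention cue alone decides.
first_choice = select_target(candidates, "cup", (290, 110), table, num_categories=2)

# After the word has co-occurred with category 0 several times,
# co-occurrence breaks the tie when the attention point is ambiguous.
for _ in range(5):
    table.update("cup", 0)
second_choice = select_target(candidates, "cup", (200, 100), table, num_categories=2)
```

In this sketch the attention cue dominates while the word is unfamiliar, and the co-occurrence cue takes over once evidence accumulates, which is one plausible way the two sources of information could complement each other in a cluttered scene.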