2021 Volume 39 Issue 6 Pages 549-552
Humans can learn word meanings by associating objects with words, even in environments containing multiple objects, by using joint attention: the ability to detect the target object that another person is attending to. In this paper, we propose a method by which robots learn word meanings using joint attention together with the co-occurrence of objects and words, with the co-occurrence modeled by multimodal latent Dirichlet allocation (MLDA). The target object is detected using MLDA and joint attention, and MLDA is then updated with the detected object. The updated MLDA can in turn improve the accuracy of target object detection.
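The detect-then-update loop described above can be illustrated with a minimal sketch. It is not the MLDA model of the paper: a smoothed object-word co-occurrence table stands in for MLDA, each object is reduced to a hypothetical category label, and joint attention is assumed to arrive as a prior weight per candidate object. All names (`CooccurrenceModel`, `detect_target`, the example scene) are illustrative assumptions.

```python
from collections import defaultdict


class CooccurrenceModel:
    """Stand-in for MLDA (assumption): smoothed counts of (category, word) pairs."""

    def __init__(self, vocab, smoothing=1.0):
        self.vocab = list(vocab)
        self.smoothing = smoothing
        self.counts = defaultdict(lambda: defaultdict(float))

    def word_likelihood(self, word, category):
        # P(word | category) with additive smoothing over the vocabulary.
        row = self.counts[category]
        total = sum(row.values()) + self.smoothing * len(self.vocab)
        return (row[word] + self.smoothing) / total

    def update(self, category, word):
        # Record one co-occurrence of the detected object and the heard word.
        self.counts[category][word] += 1.0


def detect_target(model, candidates, attention, word):
    """Weight each candidate's word likelihood by the joint-attention prior."""
    scores = [a * model.word_likelihood(word, c)
              for c, a in zip(candidates, attention)]
    z = sum(scores)
    posterior = [s / z for s in scores]
    best = max(range(len(candidates)), key=posterior.__getitem__)
    return best, posterior


# Hypothetical scene: two objects, attention estimate slightly favours the cup.
model = CooccurrenceModel(vocab=["cup", "ball"])
scene = ["cup", "ball"]
attention = [0.6, 0.4]

best_before, post_before = detect_target(model, scene, attention, "cup")
model.update(scene[best_before], "cup")  # update the model with the detected object
best_after, post_after = detect_target(model, scene, attention, "cup")
```

In this toy setting the posterior for the attended object rises after the update, which mirrors the abstract's point that the updated model can improve later detections. A real implementation would replace the count table with MLDA inference over multimodal features.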