Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: [in Japanese]
Date: June 06, 2023 - June 09, 2023
Joint attention is said to play an important role in human language learning. Recent work has explored the use of joint attention for language understanding in artificial intelligence. However, previous studies have shown its effectiveness only for mapping words to objects in still images; its use for mapping sentences to the actions of objects in image sequences (videos) has not been investigated. In this study, we design a task that takes as input an image sequence depicting agents moving on a 2-D board and generates natural language sentences describing the subject and its actions. We propose a deep learning method that uses the trainer's joint attention for this task. Experiments with synthetic joint attention show that accuracy improved significantly when joint attention was used during both training and testing, but did not improve when it was used only during training.
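
To make the task setup concrete, the following is a minimal toy sketch, not the authors' data or model. It assumes a small square board, single-letter agent names, a binary per-frame mask standing in for synthetic joint attention, and a rule-based caption in place of the learned sentence generator. All function names (`make_frames`, `synthetic_attention`, `describe`) and the sentence template are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]   # (row, col) on the board
Frame = List[List[str]]      # one board state; "." marks an empty cell


def make_frames(size: int, paths: Dict[str, List[Position]]) -> List[Frame]:
    """Render one frame per time step from each agent's list of positions."""
    steps = len(next(iter(paths.values())))
    frames = []
    for t in range(steps):
        board = [["." for _ in range(size)] for _ in range(size)]
        for name, path in paths.items():
            r, c = path[t]
            board[r][c] = name
        frames.append(board)
    return frames


def synthetic_attention(size: int, path: List[Position]) -> List[List[List[int]]]:
    """Assumed form of synthetic joint attention: one binary mask per frame,
    set to 1 only at the cell of the agent the trainer is attending to."""
    masks = []
    for r, c in path:
        mask = [[0] * size for _ in range(size)]
        mask[r][c] = 1
        masks.append(mask)
    return masks


def describe(name: str, path: List[Position]) -> str:
    """Rule-based stand-in for the target sentence (subject plus action)."""
    (r0, c0), (r1, c1) = path[0], path[-1]
    if c1 > c0:
        return f"{name} moves right"
    if c1 < c0:
        return f"{name} moves left"
    if r1 > r0:
        return f"{name} moves down"
    if r1 < r0:
        return f"{name} moves up"
    return f"{name} stays still"


# Two agents on a 3x3 board: A walks right along the top row, B does not move.
paths = {"A": [(0, 0), (0, 1), (0, 2)], "B": [(2, 2), (2, 2), (2, 2)]}
frames = make_frames(3, paths)
masks = synthetic_attention(3, paths["A"])   # attention follows agent A
caption = describe("A", paths["A"])
print(caption)
```

In this sketch the attention masks disambiguate which agent the sentence is about. A model given only the frames would have to guess the subject, which is one plausible reading of why attention at test time mattered in the reported results.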