Effectiveness of Joint Attention in Deep Learning for Generating Language That Describes Actions

小田倉 史麿; 若林 啓

doi:10.11517/pjsai.JSAI2023.0_4H2OS6a05

Abstract

Joint attention is said to play an important role in human language learning. Recently, research has examined the use of joint attention for language understanding in artificial intelligence. However, previous studies have shown the effectiveness of joint attention only for mapping words to objects in static images; its use in mapping sentences to the actions of objects in image sequences (videos) has not been investigated. In this study, we design a task that takes as input an image sequence depicting agents moving on a 2-D board and generates natural language sentences describing the subject and its actions. We propose a deep learning method that uses the trainer's joint attention for this task. Experimental results with synthetic joint attention show that accuracy improved significantly when joint attention was used during both training and testing, but did not improve when it was used only during training.
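
To make the task setup concrete, the following is a minimal sketch of what one data sample might look like: a short image sequence of agents on a 2-D board, a target sentence naming the subject and its action, and a synthetic joint attention mask that marks where the subject is in each frame. The board size, the agent names, the four-move vocabulary, the sentence template, and the function `make_example` are all assumptions made for illustration; the abstract does not specify the authors' data format or how their synthetic joint attention is constructed.

```python
# Illustrative toy generator; every constant and name here is hypothetical.
BOARD = 5  # assumed board width and height
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def blank():
    """Return an empty BOARD x BOARD grid of zeros."""
    return [[0] * BOARD for _ in range(BOARD)]


def make_example(agents, subject, direction, steps=3):
    """Build one toy sample: frames, a target sentence, and attention masks.

    agents:    dict mapping an agent name to its starting (row, col)
    subject:   name of the agent that moves (the subject of the sentence)
    direction: key of MOVES describing the subject's action
    """
    names = sorted(agents)  # fixed channel order, one channel per agent
    pos = dict(agents)
    dr, dc = MOVES[direction]
    frames, attention = [], []
    for _ in range(steps):
        # Each frame holds one grid per agent, with a 1 at the agent's cell.
        frame = []
        for name in names:
            grid = blank()
            r, c = pos[name]
            grid[r][c] = 1
            frame.append(grid)
        # Synthetic joint attention (assumed form): mark only the subject's cell.
        mask = blank()
        r, c = pos[subject]
        mask[r][c] = 1
        frames.append(frame)
        attention.append(mask)
        # Move the subject one cell, clamped so it stays on the board.
        pos[subject] = (
            min(max(r + dr, 0), BOARD - 1),
            min(max(c + dc, 0), BOARD - 1),
        )
    sentence = f"{subject} moves {direction}"
    return frames, sentence, attention


frames, sentence, attention = make_example(
    {"red": (2, 2), "blue": (0, 4)}, subject="red", direction="right"
)
```

In a setup like this, a model could receive `frames` alone, or `frames` together with `attention`, which mirrors the abstract's comparison between using joint attention during both training and testing and using it only during training.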
