A neuro-symbolic approach for multimodal reference expression comprehension

Aman JAIN; Anirudh Reddy KONDAPALLY; Kentaro YAMADA; Hitomi YANAKA

doi:10.11517/pjsai.JSAI2023.0_3U5IS401

37th (2023)

Session ID: 3U5-IS-4-01

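Conference information

Organizer: The Japanese Society for Artificial Intelligence

Conference: The 37th Annual Conference of the Japanese Society for Artificial Intelligence, 2023

Edition: 37

Venue: Kumamoto-Jo Hall + Online

Dates: 2023/06/06 - 2023/06/09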

*Aman JAIN, Anirudh Reddy KONDAPALLY, Kentaro YAMADA, Hitomi YANAKA

Keywords: Human-Machine Interaction, Reference Expression Comprehension, Neuro-symbolic Models

Conference proceedings and abstracts, free access

Abstract

Human-Machine Interaction (HMI) systems have attracted considerable interest in recent years, with reference expression comprehension being one of the main challenges. Traditionally, human-machine interaction has been mostly limited to the speech and visual modalities. However, to allow more freedom in interaction, recent works have proposed integrating additional modalities, such as gestures, into HMI systems. We consider such an HMI system with pointing gestures and construct a table-top object-picking scenario inside a simulated virtual reality (VR) environment to collect data. Previous works on this task have used deep neural networks to classify the referred object, an approach that lacks transparency. In this work, we propose an interpretable and compositional model based on a neuro-symbolic approach to tackle this task; interpretability and compositionality are crucial to building robust HMI systems for real-world applications. Finally, we show that our model generalizes to unseen environments and report the results.

Corresponding author
