Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Disambiguating Ambiguous Questions using Eye-Gaze in Visual Question Answering
Shun Inadumi, Seiya Kawano, Akishige Yuguchi, Yasutomo Kawanishi, Koichiro Yoshino

2025 Volume 32 Issue 1 Pages 3-35

Abstract

Situated conversations that refer to visual information, as in visual question answering (VQA), often contain ambiguities caused by reliance on directive (deictic) expressions. Ellipsis exacerbates the problem in languages such as Japanese, where central elements of a question, such as its subject or object, are often omitted. Contextual cues in the conversational situation, such as joint attention between a user and a system or the user's gaze, can resolve such ambiguities. In this study, we propose the Gaze-grounded VQA dataset (LookVQA), which pairs directive or elliptical questions with gaze information, focusing on a clarification process in which gaze complements the question. We also propose a question answering model that exploits gaze target estimation results to improve accuracy on LookVQA. Experimental results show that our model improves performance on several question types of LookVQA and outperforms a VQA baseline.
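To illustrate the idea described in the abstract, here is a minimal sketch, not the authors' implementation, of how an estimated gaze target could ground an ambiguous question: the most likely gaze location is taken from a gaze heatmap, and the image patch around it stands in for the elided referent. All function names (estimate_gaze_target, crop_region) are hypothetical placeholders.

```python
# Illustrative sketch only: grounding an ambiguous VQA question with a
# gaze target estimate. Not the LookVQA model itself.
import numpy as np

def estimate_gaze_target(heatmap: np.ndarray) -> tuple[int, int]:
    """Return the (row, col) of the most likely gaze target in a heatmap."""
    r, c = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(r), int(c)

def crop_region(image: np.ndarray, center: tuple[int, int], size: int = 64) -> np.ndarray:
    """Crop a square patch around the estimated gaze target."""
    r, c = center
    h, w = image.shape[:2]
    top, left = max(0, r - size // 2), max(0, c - size // 2)
    return image[top:min(h, top + size), left:min(w, left + size)]

# An ambiguous question like "What color is this?" omits its referent; the
# patch the user is looking at could be passed to a downstream VQA model
# alongside the question to fill that gap.
image = np.zeros((480, 640, 3), dtype=np.uint8)       # stand-in image
gaze_heatmap = np.random.rand(480, 640)               # stand-in gaze estimator output
target = estimate_gaze_target(gaze_heatmap)
patch = crop_region(image, target)
print(f"gaze target at {target}, grounded patch shape {patch.shape}")
```

In this hypothetical setup, the gaze-derived patch plays the role the paper attributes to gaze target estimation results: it supplies the missing subject or object that the directive or elliptical question leaves unstated.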

© 2025 The Association for Natural Language Processing