Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 1E4-GS-6-01

Multimodal Inference for Numerals with Model Checking and Knowledge Injection
*Nobuyuki IOKAWA, Hitomi YANAKA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Inference between different modalities has been actively studied in recent years. We focus on Visual-textual Entailment (VTE), one of the most critical tasks for multimodal inference. A variety of deep learning-based approaches have been proposed for the VTE task, but they have difficulty handling numerals accurately. In contrast, approaches based on logical inference can deal with numerals successfully. However, since previous logic-based approaches use automated theorem provers, their computational cost increases significantly for problems involving many entities. In this paper, we propose a logic-based VTE system with model checking and knowledge injection. We create a dataset for the VTE task containing numerals and negation to evaluate the extent to which VTE systems correctly understand those phenomena. Using this dataset, we show that our system solves the VTE task with numerals and negation more robustly than the previous approaches.
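
To make the idea of model checking for numerals concrete, the following is a minimal illustrative sketch, not the authors' system. It assumes an image has already been reduced to a finite model, here a list of entities with unary labels, and that a hypothesis containing a numeral or negation can be evaluated by counting entities in that model rather than by searching for a proof. The names `Entity`, `at_least`, `exactly`, `no`, and `inject`, the example labels, and the hypernym table are all hypothetical; knowledge injection is approximated here as adding broader labels to each entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    ident: str
    labels: frozenset  # unary predicates that hold for this entity


def count(model, label):
    """Number of entities in the finite model that carry the given label."""
    return sum(1 for e in model if label in e.labels)


def at_least(model, label, n):
    """Evaluate 'there are at least n <label>s' directly on the model."""
    return count(model, label) >= n


def exactly(model, label, n):
    """Evaluate 'there are exactly n <label>s' directly on the model."""
    return count(model, label) == n


def no(model, label):
    """Evaluate the negated existential 'there is no <label>'."""
    return count(model, label) == 0


def inject(model, broader):
    """Knowledge injection sketch (assumption): add broader labels to entities.

    `broader` maps a label to a set of more general labels and is assumed
    to be transitively closed already, so a single pass is enough.
    """
    injected = []
    for e in model:
        extra = set()
        for label in e.labels:
            extra |= set(broader.get(label, ()))
        injected.append(Entity(e.ident, e.labels | frozenset(extra)))
    return injected


# Hypothetical model, standing in for the output of an object detector.
model = [
    Entity("e1", frozenset({"dog"})),
    Entity("e2", frozenset({"dog"})),
    Entity("e3", frozenset({"cat"})),
]

# Hypothetical background knowledge: dogs and cats are animals.
enriched = inject(model, {"dog": {"animal"}, "cat": {"animal"}})

two_dogs = exactly(model, "dog", 2)
no_birds = no(model, "bird")
three_animals = exactly(enriched, "animal", 3)
```

Each check in this sketch is a single pass over the entities, which gives an informal sense of why evaluating a hypothesis against a fixed finite model can scale better with the number of entities than proof search.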

© 2023 The Japanese Society for Artificial Intelligence