Multimodal Inference on Numerical Expressions Using Model Checking and Knowledge Injection

五百川 展行; 谷中 瞳

doi:10.11517/pjsai.JSAI2023.0_1E4GS601

Abstract

Inference across modalities has been actively studied in recent years. We focus on Visual-textual Entailment (VTE), one of the most critical tasks in multimodal inference. A variety of deep learning-based approaches have been proposed for the VTE task, but they have difficulty handling numerals accurately. In contrast, approaches based on logical inference can deal with numerals successfully. However, because previous logic-based approaches rely on automated theorem provers, their computational cost increases significantly for problems involving many entities. In this paper, we propose a logic-based VTE system that uses model checking and knowledge injection. To evaluate the extent to which VTE systems correctly understand numerals and negation, we create a VTE dataset containing both phenomena. Using this dataset, we show that our system solves VTE problems involving numerals and negation more robustly than previous approaches.
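
To illustrate what model checking means for numerals and negation, here is a minimal sketch. It is not the authors' system. It assumes an image has already been converted into a small finite model (a set of entities plus the predicates that hold of them), and it evaluates counting statements directly against that model. The model layout and the helper names `count`, `at_least`, and `exactly` are hypothetical and chosen only for this example.

```python
# Hypothetical finite model, standing in for what might be extracted from an image.
# Each predicate maps to the set of entities it holds for.
model = {
    "entities": ["e1", "e2", "e3"],
    "cat": {"e1", "e2"},
    "dog": {"e3"},
    "sitting": {"e1"},
}


def count(model, noun, verb=None, negated=False):
    """Count entities that satisfy `noun` and, optionally, `verb` or its negation."""
    n = 0
    for e in model["entities"]:
        if e not in model[noun]:
            continue
        # Skip the entity when the verb's truth value does not match what is asked:
        # negated=False requires the verb to hold, negated=True requires it not to.
        if verb is not None and (e in model[verb]) == negated:
            continue
        n += 1
    return n


def at_least(model, k, noun, verb=None, negated=False):
    """Check 'at least k <noun> are (not) <verb>' against the model."""
    return count(model, noun, verb, negated) >= k


def exactly(model, k, noun, verb=None, negated=False):
    """Check 'exactly k <noun> are (not) <verb>' against the model."""
    return count(model, noun, verb, negated) == k


# "There are at least two cats."
print(at_least(model, 2, "cat"))
# "Exactly one cat is sitting."
print(exactly(model, 1, "cat", "sitting"))
# "At least one cat is not sitting."
print(at_least(model, 1, "cat", "sitting", negated=True))
```

Each check in this sketch is a single pass over the entities of the model, with no proof search involved.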