Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 1E4-GS-6-05

Logical Inference with Phrasal Knowledge Injection using Vision-and-Language Model
*Akiyoshi TOMIHARI, Hitomi YANAKA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Recognizing Textual Entailment (RTE) is an important task with applications in question answering and machine translation. One of the main challenges for logic-based approaches to this task is the lack of background knowledge. This study proposes a logical inference system that injects phrasal knowledge by comparing the visual representations of phrases, based on the intuition that visual representations help humans judge entailment relations. First, we obtain candidate phrase pairs for phrasal knowledge from the process of logical inference. Second, using a Vision-and-Language model, we acquire visual representations of these phrases in the form of images or embedding vectors. Finally, we compare the obtained visual representations to decide whether to inject the knowledge corresponding to each candidate. In addition to simple similarity between phrases, the comparison takes asymmetric relations into account. Our system achieved higher accuracy on the SICK dataset than SPSA, a previous logical inference system.
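
The final comparison step can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the asymmetric measure, or any thresholds, so everything below is an assumption made for illustration: the embedding vectors are toy values standing in for the output of a Vision-and-Language model, `directional_inclusion` is a hypothetical inclusion-style score (not the authors' measure), and the thresholds are arbitrary.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Symmetric similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directional_inclusion(premise: Sequence[float],
                          hypothesis: Sequence[float]) -> float:
    """Hypothetical asymmetric score (an assumption, not the paper's measure).

    Returns the fraction of the hypothesis phrase's non-negative feature
    mass that is also present in the premise phrase. Swapping the
    arguments generally changes the result.
    """
    covered = sum(min(max(p, 0.0), max(h, 0.0))
                  for p, h in zip(premise, hypothesis))
    total = sum(max(h, 0.0) for h in hypothesis)
    return covered / total if total > 0.0 else 0.0


def should_inject(premise: Sequence[float],
                  hypothesis: Sequence[float],
                  sim_threshold: float = 0.8,
                  incl_threshold: float = 0.8) -> bool:
    """Inject 'premise phrase entails hypothesis phrase' only if the pair is
    both similar and directionally included. Thresholds are arbitrary."""
    return (cosine_similarity(premise, hypothesis) >= sim_threshold
            and directional_inclusion(premise, hypothesis) >= incl_threshold)


# Toy vectors standing in for Vision-and-Language embeddings (made up).
puppy = [0.9, 0.8, 0.7]
dog = [0.9, 0.8, 0.1]

inject_puppy_to_dog = should_inject(puppy, dog)
inject_dog_to_puppy = should_inject(dog, puppy)
```

With these toy values the symmetric similarity is the same in both directions, while the directional score differs, so the two candidate directions can receive different decisions. This is the kind of distinction a purely symmetric similarity cannot make.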

© 2023 The Japanese Society for Artificial Intelligence