Completing Phrasal Knowledge for Logical Inference Using Vision-and-Language Models

富張 聡祥; 谷中 瞳

doi:10.11517/pjsai.JSAI2023.0_1E4GS605

Abstract

Recognizing Textual Entailment (RTE) is an important task with applications in question answering and machine translation. A major challenge for logic-based approaches to RTE is the lack of background knowledge. This study proposes a logical inference system that supplements phrasal knowledge by comparing the visual representations of phrases, based on the intuition that visual representations help humans judge entailment relations. First, we extract candidate phrase pairs for phrasal knowledge from the logical inference process. Second, using a Vision-and-Language model, we obtain visual representations of these phrases in the form of images or embedding vectors. Finally, we compare the obtained visual representations to decide whether to inject the knowledge corresponding to each candidate. When comparing visual representations, we consider asymmetric relations in addition to simple similarity between phrases. Our system improved accuracy on the SICK dataset over SPSA, a previous logical inference system.
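The abstract does not say exactly how the visual representations are compared, so the following is only a minimal sketch of one way a comparison step could separate symmetric similarity from an asymmetric relation. It assumes that each phrase already has a text embedding and a few image embeddings (for example, of images depicting the phrase) in a shared space, as a CLIP-style Vision-and-Language model would provide. The directional score, the 0.85 threshold, the returned labels, and the 3-dimensional toy vectors are all hypothetical illustrations, not the authors' actual rule or data.

```python
import math
from typing import Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directional_score(images_of_a: Sequence[Vector], text_of_b: Vector) -> float:
    """Hypothetical evidence for 'A entails B': mean cosine similarity
    between the image embeddings of phrase A and the text embedding of B."""
    if not images_of_a:
        raise ValueError("need at least one image embedding for phrase A")
    return sum(cosine(img, text_of_b) for img in images_of_a) / len(images_of_a)


def decide_relation(
    images_a: Sequence[Vector],
    text_a: Vector,
    images_b: Sequence[Vector],
    text_b: Vector,
    threshold: float = 0.85,  # hypothetical cut-off, not taken from the paper
) -> Optional[str]:
    """Return which axiom to inject for the candidate pair (A, B), or None."""
    forward = directional_score(images_a, text_b)   # do pictures of A fit B?
    backward = directional_score(images_b, text_a)  # do pictures of B fit A?
    if forward >= threshold and backward >= threshold:
        return "A<->B"  # both directions hold: treat the phrases as equivalent
    if forward >= threshold:
        return "A->B"
    if backward >= threshold:
        return "B->A"
    return None  # no visual evidence either way: inject nothing


# Toy 3-d embeddings with hypothetical axes [animal, dog-like, cat-like].
DOG_TEXT = [1.0, 1.0, 0.0]
CAT_TEXT = [1.0, 0.0, 1.0]
ANIMAL_TEXT = [1.0, 0.0, 0.0]
DOG_IMAGES = [[2.0, 1.0, 0.0]]
CAT_IMAGES = [[2.0, 0.0, 1.0]]
ANIMAL_IMAGES = DOG_IMAGES + CAT_IMAGES  # "an animal" may be pictured as either

if __name__ == "__main__":
    print(decide_relation(DOG_IMAGES, DOG_TEXT, ANIMAL_IMAGES, ANIMAL_TEXT))  # A->B
    print(decide_relation(ANIMAL_IMAGES, ANIMAL_TEXT, DOG_IMAGES, DOG_TEXT))  # B->A
    print(decide_relation(DOG_IMAGES, DOG_TEXT, CAT_IMAGES, CAT_TEXT))        # None
```

A single symmetric similarity between the two text embeddings cannot distinguish "a dog → an animal" from "an animal → a dog". Scoring each direction separately can, because the images for a general phrase such as "an animal" are more varied and so fit the specific phrase "a dog" less well than images of a dog fit "an animal".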
