An Investigation of LLM Responses to Double Binds between Text and Images

橋本 智己; 田端 智成

doi:10.3156/jsoft.37.2_618

Abstract

In this paper, we focus on the double bind between textual and visual information and investigate three questions: (1) whether Large Language Models (LLMs) can detect a sense of incongruity, (2) whether images that evoke positive or negative impressions influence the detection of incongruity, and (3) whether the incongruity judgments made by LLMs align with those of human subjects. We examined three LLMs: GPT-4o, Gemini 1.5 Flash, and Claude 3 Haiku. Our results indicate that LLMs tend to detect the sense of incongruity arising from the double bind between text and images. Moreover, although impression evaluations varied across images, the LLMs consistently tended to detect incongruity. Finally, among the LLMs studied, GPT-4o’s incongruity judgments were most similar to those of human subjects.
