Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Drawing on various kinds of prior knowledge, humans can imagine how an object looks from different perspectives. In this paper, we propose a task for evaluating whether recent large multimodal models have this capability, and we use it to analyze current models. Specifically, the model is given an image of an object in an isometric view together with an image of the same object from a different perspective, and is asked about the viewpoints of the two images. The evaluation dataset was constructed from sketch images in a design patent database and their captions, which describe the viewpoint information. In the experiment, we apply the constructed evaluation dataset to GPT-4V to analyze its spatial reasoning ability. Based on the experimental results, we discuss the potential and challenges of GPT-4V's spatial reasoning ability.