Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4Xin2-111

Do Deep Generative Models Capture Spatial Concepts?: Spatial Understanding Task using Design Patent Dataset
*Makoto TAKENAKA, Hitomi YANAKA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Drawing on various kinds of prior knowledge, humans can imagine how an object looks from different perspectives. In this paper, we propose a task for evaluating whether recent large multimodal models have this capability, and we use it to analyze current models. Specifically, the model receives an image of an object in an isometric view and an image of the same object from a different perspective, and it is asked about the viewpoints of the two images. We constructed the evaluation dataset from sketch images in a design patent database and their captions, which describe the viewpoint information. In the experiment, we apply the constructed evaluation dataset to GPT-4V to analyze its spatial reasoning ability. Based on the experimental results, we discuss the potential and challenges of GPT-4V's spatial reasoning ability.
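As a rough illustration of the dataset construction described above, the following sketch pairs an isometric sketch with other views of the same design, using viewpoint keywords found in the figure captions. It is not the authors' pipeline: the label set `VIEWPOINTS`, the keyword matching rule, the `EvalItem` structure, and the treatment of "perspective" captions as isometric views are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical label set; the abstract does not specify the actual viewpoint labels.
VIEWPOINTS = ("front", "rear", "left", "right", "top", "bottom")


@dataclass
class EvalItem:
    isometric_image: str  # path to the isometric-view sketch
    target_image: str     # path to the sketch drawn from another viewpoint
    answer: str           # viewpoint label taken from the caption
    choices: tuple        # candidate labels offered to the model


def viewpoint_from_caption(caption):
    """Return a viewpoint label found in the caption, or None if none matches."""
    text = caption.lower()
    # Assumption: captions mentioning "perspective" or "isometric" mark the reference view.
    if "perspective" in text or "isometric" in text:
        return "isometric"
    for label in VIEWPOINTS:
        if re.search(rf"\b{label}\b", text):
            return label
    return None


def build_items(figures):
    """Build evaluation items from (image_path, caption) pairs of one design."""
    labeled = [(path, viewpoint_from_caption(cap)) for path, cap in figures]
    isometric = [path for path, label in labeled if label == "isometric"]
    if not isometric:
        return []  # no reference view available, so the design is skipped
    return [
        EvalItem(isometric[0], path, label, VIEWPOINTS)
        for path, label in labeled
        if label in VIEWPOINTS
    ]
```

Each resulting item could then be turned into a prompt that shows both images and asks which viewpoint the second image was drawn from. The prompt wording and the scoring used in the paper are not given in the abstract, so they are left out here.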

© 2024 The Japanese Society for Artificial Intelligence