Host : The Japanese Society for Artificial Intelligence
Name : The 39th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 39
Location : [in Japanese]
Date : May 27, 2025 - May 30, 2025
We propose GeoProg3D, a visual programming framework that enables natural language interaction with city-scale 3D scenes. GeoProg3D builds on two key components that we introduce: the Geography-aware City-scale 3D Language Field (GCLF) and Geographical Vision APIs (GV-APIs). GCLF extends language fields to city-scale 3D data, allowing precise queries based on geographic information. GV-APIs provide specialized geographical vision processing tools such as segmentation and object detection. GeoProg3D constructs executable programs by dynamically composing GCLF and GV-API components, which enables accurate geographic inference. To evaluate this approach, we introduce the GeoEval3D dataset, which contains 952 query-answer pairs covering five challenging geographical vision tasks: grounding, spatial reasoning, comparison, counting, and measurement. Experimental results show that GeoProg3D outperforms existing models across these geographical vision tasks. We expect the framework to be applicable to urban planning, disaster response, environmental monitoring, and other fields.
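
The idea of composing scene queries and vision tools into an executable program can be illustrated with a minimal Python sketch. This is a toy model and not the GeoProg3D implementation: `SceneObject`, `gclf_query`, `count`, `tallest`, `TOOLS`, and `run_program` are hypothetical names introduced here, the "scene" is a hand-written list instead of a 3D language field, and label matching is plain string equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical scene entity; a real system would hold 3D geometry."""
    label: str
    lat: float
    lon: float
    height_m: float


def gclf_query(scene, label, lat_range=None):
    """Toy stand-in for a language-field query with an optional latitude filter."""
    hits = [o for o in scene if o.label == label]
    if lat_range is not None:
        lo, hi = lat_range
        hits = [o for o in hits if lo <= o.lat <= hi]
    return hits


def count(objects):
    """Toy counting tool."""
    return len(objects)


def tallest(objects):
    """Toy measurement tool: return the object with the greatest height."""
    return max(objects, key=lambda o: o.height_m)


# Hypothetical tool registry that a program generator could draw from.
TOOLS = {"gclf_query": gclf_query, "count": count, "tallest": tallest}


def run_program(program, scene):
    """Run steps in order. Each step is (output_name, tool_name, args, kwargs).

    String arguments starting with "$" refer to earlier results.
    Returns the result of the last step, or None for an empty program.
    """
    env = {"scene": scene}
    result = None
    for out, tool, args, kwargs in program:
        resolved = [
            env[a[1:]] if isinstance(a, str) and a.startswith("$") else a
            for a in args
        ]
        result = TOOLS[tool](*resolved, **kwargs)
        env[out] = result
    return result


# Made-up example data, for illustration only.
scene = [
    SceneObject("building", 35.68, 139.76, 120.0),
    SceneObject("building", 35.70, 139.70, 45.0),
    SceneObject("park", 35.69, 139.75, 0.0),
]

# "How many buildings lie between latitude 35.675 and 35.69?"
count_program = [
    ("b", "gclf_query", ["$scene", "building"], {"lat_range": (35.675, 35.69)}),
    ("n", "count", ["$b"], {}),
]

# "Which building is the tallest?"
tallest_program = [
    ("b", "gclf_query", ["$scene", "building"], {}),
    ("t", "tallest", ["$b"], {}),
]

n_buildings = run_program(count_program, scene)
tallest_building = run_program(tallest_program, scene)
```

In this sketch, each query type (counting, measurement) becomes a different short sequence of tool calls over the same scene, which is the general pattern that visual programming approaches rely on. How GeoProg3D actually generates and executes its programs is not specified in the abstract.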