Host : The Japanese Society for Artificial Intelligence
Name : The 39th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 39
Location : [in Japanese]
Date : May 27, 2025 - May 30, 2025
Combinatorial optimization specialists find it difficult to fully capture complex domain knowledge and to formulate constraints accurately, which restricts widespread practical adoption. We evaluated how effectively a reasoning vision-language model (o1) solves combinatorial optimization problems when practical constraints are expressed in natural language and as visual information. Using the Traveling Salesman Problem (TSP), we compared the solutions produced by o1 with exact solutions obtained from mixed-integer linear programming and with solutions generated by the vision-language model GPT-4o. For standard TSP instances with N = 10–30, o1 achieved a smaller optimality gap than both GPT-4o and the nearest-neighbor heuristic. Moreover, for time-window and precedence constraints expressed in natural language, GPT-4o failed to meet the constraints, whereas o1 achieved a constraint satisfaction rate exceeding 90%. In addition, o1 complied with more than 80% of the visually defined area-order constraints. These results suggest that non-experts can introduce practical constraints without relying on mathematical models.
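The evaluation quantities named in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of the optimality gap, (L - L*) / L*, where L is the length of a candidate tour and L* the optimal length; the abstract does not state the exact formula. All function names are hypothetical, and brute-force enumeration stands in for the mixed-integer linear programming solver, so the sketch is only practical for very small instances.

```python
import itertools
import math


def tour_length(points, tour):
    """Length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n)
    )


def nearest_neighbor_tour(points, start=0):
    """Greedy baseline: always move to the closest unvisited city."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Ties are broken by the smaller city index for determinism.
        nxt = min(unvisited, key=lambda j: (math.dist(last, points[j]), j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def brute_force_optimum(points):
    """Exact tour by enumeration (assumption: stand-in for a MILP solver)."""
    rest = range(1, len(points))
    best = min(
        itertools.permutations(rest),
        key=lambda p: tour_length(points, (0,) + p),
    )
    return [0, *best]


def optimality_gap(length, optimal_length):
    """Relative excess over the optimum (assumed definition)."""
    return (length - optimal_length) / optimal_length


def satisfies_precedence(tour, pairs):
    """True if every (a, b) pair has city a visited before city b."""
    position = {city: i for i, city in enumerate(tour)}
    return all(position[a] < position[b] for a, b in pairs)


if __name__ == "__main__":
    cities = [(0, 0), (2, 5), (6, 1), (5, 4), (1, 3), (7, 6), (3, 0)]
    heuristic = nearest_neighbor_tour(cities)
    optimum = brute_force_optimum(cities)
    gap = optimality_gap(tour_length(cities, heuristic), tour_length(cities, optimum))
    print(f"nearest-neighbor gap: {gap:.1%}")
    print("city 4 before city 2:", satisfies_precedence(heuristic, [(4, 2)]))
```

A tour returned by a language model could be scored the same way: pass its city order to `tour_length` and `optimality_gap`, and check natural-language precedence rules with `satisfies_precedence` after translating them into city pairs. A constraint satisfaction rate would then be the fraction of instances for which such checks pass.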