Artificial Intelligence and Data Science
Online ISSN : 2435-9262
Development of an event detection tool using LVLM for road and river management CCTV images
Shinya SAKAIDA, Hiroshi TSUSHIMA, Katsuya AKIMOTO, Fumiya SUSAKI
JOURNAL OPEN ACCESS

2025 Volume 6 Issue 3 Pages 77-90

Abstract

This study explores the development of a tool that detects various events of interest to river and road administrators by comparing current and past CCTV images. The approach leverages Visual Question Answering (VQA) tasks using Large Vision Language Models (LVLMs). The research began by identifying the tool’s operational requirements and constraints, including commercial usability and exclusion from restricted entity lists. Based on these criteria, three LVLMs were selected for testing: ChatGPT-4o, Gemini 1.5 Flash, and llava-llava-calm2-siglip. To evaluate practical applicability, real-world footage depicting events such as flooding and landslides was paired with prompts from the perspective of infrastructure managers. The outputs from each LVLM were assessed using precision, recall, and F1-score metrics. Among the models tested, ChatGPT-4o demonstrated the highest utility for practical deployment. The study also identified detectable event types in road and river contexts, clarified visual and system input conditions, and examined common causes of false positives and missed detections. Based on these insights, strategies for improving detection accuracy were proposed.
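
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how precision, recall, and F1-score are conventionally computed from detection counts. It is not taken from the paper: the `DetectionCounts` type, the function name, and the sample counts are hypothetical and only illustrate the standard definitions.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Hypothetical tally of one model's answers for one event type."""
    true_positives: int   # event present and reported by the model
    false_positives: int  # event reported but not actually present
    false_negatives: int  # event present but missed by the model


def precision_recall_f1(counts: DetectionCounts) -> tuple[float, float, float]:
    """Return (precision, recall, F1) using the standard definitions.

    A ratio with a zero denominator is reported as 0.0, which is one
    common convention (an assumption here, not the paper's stated choice).
    """
    tp = counts.true_positives
    fp = counts.false_positives
    fn = counts.false_negatives

    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) > 0
        else 0.0
    )
    return precision, recall, f1


# Illustrative counts only; these are not results from the study.
example = DetectionCounts(true_positives=8, false_positives=2, false_negatives=4)
precision, recall, f1 = precision_recall_f1(example)
```

In this framing, a false positive lowers precision and a missed detection lowers recall, which is why the two error types discussed in the abstract are tracked separately before being combined into F1.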

© 2025 Japan Society of Civil Engineers