Journal of the Japan Society for Precision Engineering
Online ISSN : 1882-675X
Print ISSN : 0912-0289
ISSN-L : 0912-0289
Selected Papers for Special Issue on Industrial Application of Image Processing
Surveillance Video Anomaly Detection via Automatic Caption Generation Using Multimodal Large Language Models
Satoshi HASHIMOTO, Hitoshi NISHIMURA, Mori KUROKAWA
JOURNAL FREE ACCESS

2025 Volume 91 Issue 12 Pages 1136-1143

Abstract

Recently, fine-tuned multimodal large language models (MLLMs) have achieved state-of-the-art performance in video anomaly detection. However, fine-tuning requires extensive caption annotations in the training data, which imposes significant practical constraints. To overcome this limitation, we propose a novel MLLM-based video anomaly detection method that does not require manual caption annotation. The proposed method consists of an anomaly detection model that identifies and selects key video samples, and an MLLM that autonomously generates and enhances captions explaining anomalous events. Extensive experiments demonstrate that our method achieves high detection accuracy and effectively generates task-specific explanatory descriptions.
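
The two-stage structure described in the abstract (select key samples with an anomaly detection model, then caption them with an MLLM) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the `Clip` type, the top-k selection rule, the function names, and the stub captioner are all assumptions introduced here, and the abstract does not specify how samples are scored or selected.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Clip:
    """Hypothetical container for one video sample."""
    clip_id: str
    anomaly_score: float  # assumed output of an anomaly detection model; higher = more anomalous


def select_key_clips(clips: Sequence[Clip], top_k: int) -> List[Clip]:
    """Keep the top_k highest-scoring clips (an assumed selection rule)."""
    return sorted(clips, key=lambda c: c.anomaly_score, reverse=True)[:top_k]


def caption_key_clips(
    clips: Sequence[Clip],
    top_k: int,
    generate_caption: Callable[[Clip], str],
) -> Dict[str, str]:
    """Caption only the selected clips; generate_caption stands in for an MLLM call."""
    return {c.clip_id: generate_caption(c) for c in select_key_clips(clips, top_k)}


def stub_captioner(clip: Clip) -> str:
    """Placeholder for an MLLM; returns a fixed-format string instead of a real caption."""
    return f"placeholder caption for {clip.clip_id}"


if __name__ == "__main__":
    demo = [Clip("a", 0.1), Clip("b", 0.9), Clip("c", 0.5)]
    print(caption_key_clips(demo, top_k=2, generate_caption=stub_captioner))
```

Passing the captioner as a callable keeps the selection step independent of any particular MLLM, so a real model call could replace `stub_captioner` without changing the rest of the sketch.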

© 2025 The Japan Society for Precision Engineering