2025 Volume 6 Issue 3 Pages 976-989
This study proposes a multimodal large language model that incorporates domain-specific knowledge of bridge inspection, with the aim of making inspection tasks more efficient. Whereas previous studies required a separate model for each task, the proposed model uses instruction tuning, a training method that improves task execution by providing task instructions paired with example responses, to handle multiple tasks such as damage classification and findings generation consistently within a single model. The model is trained on periodic bridge inspection records provided by the Ministry of Land, Infrastructure, Transport and Tourism, and two data augmentation strategies, one based on question diversity and one on expression variability, are introduced to improve generalization and robustness. Experimental evaluations assessed task-wise accuracy and verified the effectiveness and practical applicability of the proposed model. The results show that, despite its very small parameter count, the proposed model achieves performance comparable to image classification models and GPT-based generative models across multiple tasks, indicating its potential as a lightweight, high-accuracy solution for real-world inspection support.
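As a rough illustration of the instruction-tuning data format and the question-diversity augmentation described above, the following minimal sketch turns one inspection record into one instruction-response pair per task. It is not the authors' implementation: the record fields (`image`, `damage_type`, `findings`), the question templates, and the function name `build_samples` are all hypothetical, chosen only to show the idea. Expression-variability augmentation (rephrasing the responses) is not covered here.

```python
import random

# Hypothetical question templates per task. Sampling among several phrasings
# of the same question is one simple way to realize "question diversity".
QUESTION_TEMPLATES = {
    "damage_classification": [
        "What type of damage is shown in this image?",
        "Classify the damage visible in this photo.",
        "Which damage category does this image belong to?",
    ],
    "findings_generation": [
        "Write the inspection findings for this image.",
        "Describe the condition of the member shown here.",
    ],
}


def build_samples(record, rng):
    """Turn one inspection record into one instruction-response pair per task."""
    # Map each task to the field of the record that serves as its answer.
    answers = {
        "damage_classification": record["damage_type"],
        "findings_generation": record["findings"],
    }
    samples = []
    for task, answer in answers.items():
        # Pick one phrasing of the task question at random.
        question = rng.choice(QUESTION_TEMPLATES[task])
        samples.append(
            {
                "image": record["image"],
                "task": task,
                "instruction": question,
                "response": answer,
            }
        )
    return samples


# Hypothetical record; real inspection records will have a different schema.
record = {
    "image": "bridge_0001.jpg",
    "damage_type": "corrosion",
    "findings": "Corrosion is observed on the lower flange of the main girder.",
}
samples = build_samples(record, random.Random(0))
```

Because every task is expressed as an instruction and a response over the same image, a single model can be trained on the combined set of samples instead of one model per task.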