Artificial Intelligence and Data Science
Online ISSN : 2435-9262
InfLLaVA: Construction of an instruction-tuned multimodal large language model for bridge inspection -Acquiring domain knowledge and multi-task execution capability-
Masaya SATO, Keisuke MAEDA, Takahiro OGAWA, Miki HASEYAMA
JOURNAL OPEN ACCESS

2025 Volume 6 Issue 3 Pages 976-989

Abstract

This study proposes a multimodal large language model that incorporates domain-specific knowledge of bridge inspection, with the aim of improving the efficiency of inspection tasks. Whereas previous studies have required a separate model for each task, the proposed model uses instruction tuning (a training method that enhances task execution capability by providing task instructions together with example responses) to handle multiple tasks, such as damage classification and findings generation, in a consistent manner. The model is trained on periodic bridge inspection records provided by the Ministry of Land, Infrastructure, Transport and Tourism, and two data augmentation strategies, one based on question diversity and the other on expression variability, are introduced to improve generalization and robustness. Experiments were conducted to assess per-task accuracy and to verify the effectiveness and practical applicability of the proposed model. The results show that, despite its very small parameter count, the proposed model achieves performance comparable to that of image classification models and GPT-based generative models across multiple tasks, confirming its potential as a lightweight, high-accuracy solution for real-world inspection support.
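
As a rough illustration of how instruction-tuning samples with question-diversity augmentation might be assembled, the following minimal sketch pairs one inspection record with a randomly chosen phrasing of each task instruction. It is not the authors' implementation: the `InspectionRecord` fields, the `QUESTION_TEMPLATES` wording, and the `build_samples` helper are all hypothetical names introduced here, and the second augmentation strategy (expression variability) is not covered.

```python
import random
from dataclasses import dataclass


@dataclass
class InspectionRecord:
    # Hypothetical fields; the real inspection-record schema is not given in the abstract.
    image_path: str
    damage_type: str
    findings: str


# Hypothetical question templates illustrating "question diversity":
# several phrasings of the same task instruction.
QUESTION_TEMPLATES = {
    "damage_classification": [
        "What type of damage is shown in this image?",
        "Classify the damage visible in the photo.",
        "Which damage category does this image belong to?",
    ],
    "findings_generation": [
        "Write the inspection findings for this image.",
        "Describe the condition of the member shown in the photo.",
    ],
}


def build_samples(record, rng):
    """Return one instruction/response pair per task for a single record."""
    answers = {
        "damage_classification": record.damage_type,
        "findings_generation": record.findings,
    }
    samples = []
    for task, templates in QUESTION_TEMPLATES.items():
        samples.append({
            "image": record.image_path,
            "task": task,
            # Pick one phrasing at random so the same task is asked in varied ways.
            "instruction": rng.choice(templates),
            "response": answers[task],
        })
    return samples


if __name__ == "__main__":
    rng = random.Random(0)  # Seeded for reproducible template selection.
    record = InspectionRecord(
        "bridge_001.jpg",
        "corrosion",
        "Corrosion is observed on the lower flange of the main girder.",
    )
    for sample in build_samples(record, rng):
        print(sample)
```

Calling `build_samples` repeatedly over the same records with different random draws would yield differently worded instructions for identical target responses, which is one simple way a single model can be exposed to several tasks and phrasings in one training set.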

© 2025 Japan Society of Civil Engineers