InfLLaVA: Building a Multimodal Large Language Model Specialized for Bridge Inspection via Instruction Tuning (Acquiring Expert Knowledge of Bridge Inspection and Multi-Task Capability)

佐藤 雅也; 前田 圭介; 小川 貴弘; 長谷山 美紀

doi:10.11532/jsceiii.6.3_976

Abstract

This study proposes a multimodal large language model that incorporates domain-specific knowledge of bridge inspection, with the aim of making inspection work more efficient. Previous studies required a separate model for each task. The proposed model instead uses instruction tuning, a training method that improves task execution by pairing task instructions with example responses, so that a single model handles multiple tasks such as damage classification and findings generation. The model is trained on periodic bridge inspection records provided by the Ministry of Land, Infrastructure, Transport and Tourism. Two data augmentation strategies, one based on question diversity and one on expression variability, are introduced to improve generalization and robustness. Experiments assess per-task accuracy and verify the effectiveness and practical applicability of the proposed model. The results show that, despite its very small parameter count, the proposed model achieves performance comparable to image classification models and GPT-based generative models across multiple tasks, confirming its potential as a lightweight, high-accuracy solution for real-world inspection support.
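
As a rough illustration of how instruction-tuning data with the two augmentation strategies might be assembled, the Python sketch below expands one inspection record into several instruction/response pairs. The abstract does not specify the question templates, the record format, or how paraphrased answers are produced, so `QUESTION_TEMPLATES`, `build_samples`, and the field names are all hypothetical and should not be read as the authors' implementation.

```python
import random

# Hypothetical question templates per task (assumption: the paper's actual
# prompts are not given in the abstract). Several phrasings of the same
# question stand in for "question diversity".
QUESTION_TEMPLATES = {
    "damage_classification": [
        "What type of damage is shown in this image?",
        "Classify the damage visible in the photo.",
        "Which damage category does this image belong to?",
    ],
    "findings_generation": [
        "Write the inspector's findings for this image.",
        "Describe the condition of the member shown here.",
    ],
}


def build_samples(record, answer_variants, seed=0):
    """Expand one inspection record into instruction-tuning samples.

    record: dict with hypothetical keys 'image' and 'task'.
    answer_variants: differently worded answers with the same meaning,
        standing in for "expression variability".
    """
    rng = random.Random(seed)  # seeded so the expansion is reproducible
    samples = []
    for question in QUESTION_TEMPLATES[record["task"]]:
        samples.append(
            {
                "image": record["image"],
                "instruction": question,
                # Pick one wording of the answer for each question.
                "response": rng.choice(answer_variants),
            }
        )
    return samples


if __name__ == "__main__":
    record = {"image": "img_0001.jpg", "task": "damage_classification"}
    variants = ["Corrosion.", "The damage is corrosion."]
    for sample in build_samples(record, variants):
        print(sample)
```

Because every task shares the same image/instruction/response format, a single model can be trained on the mixed samples, which is the property the abstract attributes to instruction tuning.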
