Road attachment facilities, such as road signs and lighting, are ubiquitous across vast road networks, making efficient inspection crucial. AI models have previously been proposed to classify the damage type of road attachment facilities. However, practical implementation requires an interpretable framework and the ability to estimate the damage level. This paper proposes a comprehensive framework that combines damage type classification with a Vision Transformer (ViT) and damage level estimation through the in-context learning of large vision-language models (VLMs). The ViT-based damage type classification provides an interpretable framework, while the VLM's in-context learning enables damage level estimation, a task that is challenging for ViTs alone. Finally, we evaluate our method on real-world images of road attachment facilities.
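To make the two-stage design concrete, the sketch below outlines one possible realization: a fine-tuned ViT predicts the damage type, and the prediction plus a few labeled exemplar images are then passed to a VLM as in-context examples for damage level estimation. This is only a minimal sketch under our own assumptions; the checkpoint name, label set, few-shot exemplars, and the `query_vlm` helper are illustrative placeholders and do not describe the authors' implementation.

```python
"""Two-stage pipeline sketch: ViT damage type classification followed by
VLM in-context damage level estimation. All names below are placeholders."""
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# --- Stage 1: damage type classification with a fine-tuned ViT ---
CKPT = "your-org/vit-road-facility-damage"      # hypothetical checkpoint
processor = ViTImageProcessor.from_pretrained(CKPT)
classifier = ViTForImageClassification.from_pretrained(CKPT)

def classify_damage_type(image: Image.Image) -> str:
    """Return the predicted damage type label for one facility image."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = classifier(**inputs).logits
    return classifier.config.id2label[int(logits.argmax(-1))]

# --- Stage 2: damage level estimation via in-context learning with a VLM ---
# A few (image, level) exemplars are placed in the prompt so the VLM can grade
# a new image by analogy. query_vlm() stands in for any multimodal chat API;
# its message schema here is generic, not that of a specific vendor.
FEW_SHOT = [
    ("examples/rust_minor.jpg", "level 1 (minor)"),    # placeholder exemplars
    ("examples/rust_severe.jpg", "level 3 (severe)"),
]

def estimate_damage_level(image_path: str, damage_type: str, query_vlm) -> str:
    """Build an in-context prompt with few-shot exemplars and query the VLM."""
    messages = [{
        "role": "system",
        "content": "Grade the damage level (1-3) of road attachment facilities.",
    }]
    for path, level in FEW_SHOT:  # in-context examples: image + known level
        messages.append({"role": "user",
                         "content": [{"type": "text", "text": f"Damage type: {damage_type}"},
                                     {"type": "image", "path": path}]})
        messages.append({"role": "assistant", "content": level})
    # Query image: the VLM infers the level by analogy with the exemplars.
    messages.append({"role": "user",
                     "content": [{"type": "text",
                                  "text": f"Damage type: {damage_type}. Estimate the damage level."},
                                 {"type": "image", "path": image_path}]})
    return query_vlm(messages)

# Example usage (paths and the VLM backend are assumptions):
# damage_type = classify_damage_type(Image.open("sign_001.jpg"))
# level = estimate_damage_level("sign_001.jpg", damage_type, query_vlm=my_vlm_client)
```

In this sketch, interpretability comes from the discrete damage type predicted in stage 1, while stage 2 delegates the harder, more graded judgment of severity to the VLM conditioned on a handful of labeled examples.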