Automatic Generation of Findings for Distress Images Using a Visual Language Model: Introducing Few-Shot Learning via Similar Image Retrieval

渡邉 優宇人; 小川 直輝; 前田 圭介; 小川 貴弘; 長谷山 美紀

doi:10.11532/jsceiii.4.3_223

Abstract

In this study, we propose a novel method for automatically generating findings with a visual language model, to support the efficient preparation of inspection records for infrastructure facilities. Writing findings is an essential part of preparing an inspection record: findings are sentences that convey the engineer's judgments and opinions in addition to what can be recognized from the distress image. However, the direct automatic generation of findings has received little discussion, and generation methods that support their efficient creation are needed. Against this background, we introduce few-shot learning based on distress-image similarity into a visual language model. Visual language models are an application of the large language models that have attracted much attention in recent years, and they enable text output grounded in a highly accurate understanding of both vision and language. By using past inspection records that contain images similar to the target distress image, the method can efficiently take into account the relationship between distress images and findings from a small number of image-finding pairs. Finally, we confirm the effectiveness of the proposed method through experiments that generate findings from distress images included in bridge inspection records.
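
The retrieval-based few-shot idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the image features, the similarity measure, the number of examples, or the prompt format. The sketch assumes that each past record already has an image embedding (a plain list of floats), that similarity is cosine similarity, and that the prompt uses placeholder image slots. The names `retrieve_similar` and `build_few_shot_prompt` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A past inspection record: (image embedding, finding text).
# Assumption: embeddings come from some image encoder not shown here.
Record = Tuple[Sequence[float], str]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(query: Sequence[float], records: List[Record], k: int = 2) -> List[Record]:
    """Return the k past records whose image embeddings are most similar to the query."""
    ranked = sorted(records, key=lambda r: cosine_similarity(query, r[0]), reverse=True)
    return ranked[:k]


def build_few_shot_prompt(examples: List[Record]) -> str:
    """Interleave image placeholders and findings, ending with an open slot for the query image.

    The placeholder strings are illustrative; a real visual language model
    would receive the actual images in whatever format its interface requires.
    """
    lines = []
    for i, (_, finding) in enumerate(examples, start=1):
        lines.append(f"[image {i}]")
        lines.append(f"Finding: {finding}")
    lines.append("[query image]")
    lines.append("Finding:")
    return "\n".join(lines)
```

In such a setup, the retrieved image-finding pairs would be passed to the visual language model together with the query image, so that the model completes the final open "Finding:" slot in the style of the retrieved examples.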
