2023 Volume 4 Issue 3 Pages 223-232
In this study, we propose a novel method that uses a visual language model to automatically generate findings, with the aim of supporting the efficient creation of findings in inspection records for infrastructure facilities. Writing findings is an essential part of creating inspection records: findings are sentences that include the judgments and opinions of engineers in addition to what can be recognized from the distress image. However, the direct automatic generation of findings has received little discussion, and generation methods that support their efficient creation are needed. Against this background, we introduce few-shot learning based on the similarity of distress images into a visual language model. Visual language models are an application of large language models, which have attracted much attention in recent years, and enable text output with a highly accurate understanding of both vision and language. By using past inspection records that include images similar to the target distress image, the method can efficiently take into account the relationship between distress images and findings from a small number of image-finding pairs. Finally, we confirm the effectiveness of the proposed method through experiments that generate findings from the distress images included in the inspection records of bridges.
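
The similarity-based selection of few-shot examples described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the feature vectors, the use of cosine similarity, the record fields (`image_id`, `feature`, `finding`), the prompt layout, and all function names are assumptions made for illustration, and the findings are placeholders rather than real inspection data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_few_shot_examples(query_feature, past_records, k=2):
    """Return the k past records whose image features are most similar to the query."""
    ranked = sorted(
        past_records,
        key=lambda record: cosine_similarity(query_feature, record["feature"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(examples):
    """Interleave (image, finding) pairs, then leave the query finding open.

    The "<image:...>" tokens are hypothetical stand-ins for however a given
    visual language model accepts image inputs.
    """
    parts = [
        f"<image:{ex['image_id']}>\nFinding: {ex['finding']}" for ex in examples
    ]
    parts.append("<image:query>\nFinding:")
    return "\n\n".join(parts)


# Hypothetical past inspection records with toy 3-dimensional image features.
past_records = [
    {"image_id": "rec-001", "feature": [0.9, 0.1, 0.0], "finding": "placeholder finding A"},
    {"image_id": "rec-002", "feature": [0.0, 1.0, 0.0], "finding": "placeholder finding B"},
    {"image_id": "rec-003", "feature": [0.7, 0.2, 0.1], "finding": "placeholder finding C"},
]

query_feature = [1.0, 0.0, 0.0]  # toy feature of the new distress image
examples = select_few_shot_examples(query_feature, past_records, k=2)
prompt = build_prompt(examples)
```

In this sketch the prompt would be passed, together with the corresponding images, to a visual language model that completes the final open "Finding:" line. The image feature extractor and the model itself are left out because the abstract does not specify them.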