2024 Volume 5 Issue 3 Pages 706-718
In this study, we propose a method for automatically generating findings for distress images with high precision. Multi-modal models have recently attracted attention as generative AI because they can understand both images and text with high accuracy and can adapt to various tasks from only a few input examples. We therefore propose a method that efficiently learns the relationship between distress images and findings based on a multi-modal model, using pairs of distress images and findings as the input examples. We obtain these pairs by similar image retrieval, which enables highly accurate finding generation. We also use the structural components and damage types that engineers refer to when writing findings. By narrowing the data pool for retrieval with this information, we can acquire more useful pairs of distress images and findings. Finally, we confirm the effectiveness of the proposed method through experiments that generate findings for distress images contained in actual bridge inspection reports.
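The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Record` type, the function names, the use of cosine similarity over image feature vectors, and the prompt layout are all assumptions made for illustration, and the finding texts are placeholders. The sketch first narrows the pool to records with the same structural component and damage type, then ranks the remaining records by image similarity to pick the few example pairs given to the multi-modal model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    component: str          # structural component, e.g. "girder" (assumed label)
    damage: str             # damage type, e.g. "crack" (assumed label)
    embedding: List[float]  # image feature vector from some image encoder (assumed)
    finding: str            # finding text written by an engineer


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_examples(query: List[float], component: str, damage: str,
                    pool: List[Record], k: int = 2) -> List[Record]:
    """Narrow the pool by component and damage type, then keep the k most similar images."""
    narrowed = [r for r in pool if r.component == component and r.damage == damage]
    ranked = sorted(narrowed, key=lambda r: cosine(query, r.embedding), reverse=True)
    return ranked[:k]


def build_prompt(examples: List[Record]) -> str:
    """Lay out the retrieved pairs as few-shot examples (hypothetical prompt format)."""
    lines = [f"Example {i} ({r.component}, {r.damage}): {r.finding}"
             for i, r in enumerate(examples, 1)]
    lines.append("Write the finding for the new distress image.")
    return "\n".join(lines)


# Toy pool with placeholder findings; real records would come from inspection reports.
pool = [
    Record("girder", "crack", [1.0, 0.0], "Finding A"),
    Record("girder", "crack", [0.6, 0.8], "Finding B"),
    Record("deck", "crack", [1.0, 0.0], "Finding C"),
    Record("girder", "corrosion", [1.0, 0.1], "Finding D"),
]
examples = select_examples([0.9, 0.1], "girder", "crack", pool, k=2)
prompt = build_prompt(examples)
```

Filtering before ranking keeps visually similar images of a different component or damage type, such as the deck crack in the toy pool, out of the examples, which is the role the abstract assigns to the structural component and damage type information.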