2022 Volume 3 Issue J2 Pages 704-713
To support the maintenance of infrastructure, deep learning methods that classify the progression of distress from images have been widely studied. When classifying distress images, deep learning models tend to focus on regions unrelated to the classification target because real data is complex: the subjects are diverse, and the distance between the camera and the subject varies. In this paper, we therefore construct a multi-modal deep learning model that can focus on distress regions by adding text data, which indicates the parts and materials where distress occurs, to a conventional deep learning model that uses only images. Furthermore, by calculating a confidence that indicates how certain the model is about the distress regions it focuses on, the effect of focused regions with high confidence on the distress classification can be controlled, which improves classification performance.
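The idea of text-guided focus combined with a confidence-controlled contribution can be illustrated with a minimal sketch. This is not the architecture of the paper, which the abstract does not specify. It assumes that image regions and the part/material text are already embedded as vectors of equal length, that focus is computed by scaled dot-product attention, and that confidence is one minus the normalized entropy of the attention weights. All function names (`text_guided_attention`, `attention_confidence`, `fuse`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def text_guided_attention(region_feats, text_emb):
    """Weight each image region by its similarity to the text embedding.

    Assumption: scaled dot-product attention with the text as the query.
    """
    scale = math.sqrt(len(text_emb))
    scores = [dot(r, text_emb) / scale for r in region_feats]
    return softmax(scores)


def attention_confidence(weights):
    """Assumed confidence measure: 1 - normalized entropy of the weights.

    Close to 1 when attention concentrates on one region,
    close to 0 when attention is spread uniformly.
    """
    n = len(weights)
    if n < 2:
        return 1.0
    entropy = -sum(w * math.log(w) for w in weights if w > 0)
    return 1.0 - entropy / math.log(n)


def fuse(region_feats, text_emb):
    """Blend the attended feature with a plain average, gated by confidence.

    Assumption: high confidence lets the focused regions dominate;
    low confidence falls back to the unweighted image feature.
    """
    weights = text_guided_attention(region_feats, text_emb)
    conf = attention_confidence(weights)
    n, dim = len(region_feats), len(region_feats[0])
    attended = [sum(w * r[d] for w, r in zip(weights, region_feats))
                for d in range(dim)]
    pooled = [sum(r[d] for r in region_feats) / n for d in range(dim)]
    fused = [conf * a + (1.0 - conf) * p for a, p in zip(attended, pooled)]
    return fused, weights, conf
```

In this sketch the fused vector would be passed to a classifier head. The gate is a simple convex combination, chosen only because it makes the role of the confidence easy to see; a learned gate would be an equally plausible reading of the abstract.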