Journal of Robotics and Mechatronics
Online ISSN : 1883-8049
Print ISSN : 0915-3942
ISSN-L : 0915-3942
Special Issue on Advanced Robotic Technology and System for DX in Construction Industry
Automatic Findings Generation for Distress Images Using In-Context Few-Shot Learning of Visual Language Model Based on Image Similarity and Text Diversity
Yuto Watanabe, Naoki Ogawa, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama
Author information
Journal, Open Access

2024, Vol. 36, No. 2, pp. 353-364

Abstract

This study proposes an automatic findings generation method that performs in-context few-shot learning of a visual language model. The automatic generation of findings can reduce the burden of creating inspection records for infrastructure facilities. However, the findings must include the opinions and judgments of engineers in addition to what is recognized from the image; therefore, the direct generation of findings remains challenging. Against this background, we introduce in-context few-shot learning that focuses on image similarity and text diversity in the visual language model, which enables text output with a highly accurate understanding of both vision and language. Based on this novel in-context few-shot learning strategy, the proposed method comprehensively considers the characteristics of the distress image and diverse findings, and achieves high accuracy in generating findings. In the experiments, the proposed method outperformed the comparative methods in generating findings for distress images captured during bridge inspections.
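The abstract does not give implementation details, but the core idea of choosing in-context examples that are visually similar to the query distress image while keeping their findings texts mutually diverse can be illustrated with a minimal sketch. Everything below is an assumption for illustration only, not the authors' actual method: the stand-in embeddings (in practice these would come from an image/text encoder), the greedy maximal-marginal-relevance-style scoring with a hypothetical trade-off weight alpha, and the helper names select_examples and build_prompt.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def select_examples(query_img_emb, pool_img_embs, pool_txt_embs, k=4, alpha=0.7):
    """Greedily pick k in-context examples (illustrative, not the paper's algorithm).

    Each candidate is scored by (alpha * image similarity to the query) minus
    ((1 - alpha) * its maximum text similarity to findings already selected),
    so the chosen examples match the query image while their findings stay diverse.
    """
    selected = []
    remaining = list(range(len(pool_img_embs)))
    img_sims = [cosine(query_img_emb, e) for e in pool_img_embs]
    while remaining and len(selected) < k:
        best, best_score = None, -np.inf
        for i in remaining:
            redundancy = max(
                (cosine(pool_txt_embs[i], pool_txt_embs[j]) for j in selected),
                default=0.0,
            )
            score = alpha * img_sims[i] - (1.0 - alpha) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected


def build_prompt(examples, query_tag="<query image>"):
    """Assemble a few-shot prompt of (image placeholder, findings) pairs."""
    parts = [f"Image: {tag}\nFindings: {text}\n" for tag, text in examples]
    parts.append(f"Image: {query_tag}\nFindings:")
    return "\n".join(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool_img = rng.normal(size=(10, 512))   # stand-in image embeddings
    pool_txt = rng.normal(size=(10, 512))   # stand-in findings-text embeddings
    findings = [f"Example finding {i}" for i in range(10)]
    idx = select_examples(rng.normal(size=512), pool_img, pool_txt, k=3)
    print(build_prompt([(f"<image {i}>", findings[i]) for i in idx]))
```

In such a setup, the assembled few-shot pairs and the query image would be passed to a visual language model to generate the findings text; the alpha weight shown here is a hypothetical knob balancing image similarity against text redundancy.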


© 2024 Fuji Technology Press Ltd.

This article is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International license (https://creativecommons.org/licenses/by-nd/4.0/).
The journal is fully Open Access under Creative Commons licenses and all articles are free to access at JRM official website.
https://www.fujipress.jp/jrobomech/rb-about/