Visual Inspection with a Few Exemplar Images Using Large Vision-Language Models with In-Context Learning

尾下 拓未; 上野 詩翔; 山田 悠正; 中塚 俊介; 加藤 邦人; 相澤 宏旭; 林 良和

doi:10.2493/jjspe.91.418

Abstract

In this study, we propose a method for judging whether a product is defective from a few images of non-defective and defective products, together with text descriptions of the judgment criteria. Existing Large Vision-Language Models (LVLMs) perform well across a variety of tasks, yet they lack the specialized knowledge required for visual inspection. To address this, we strengthen the LVLM's domain-specific knowledge through additional training on a diverse collection of non-defective and defective product images gathered from the web. Moreover, by using In-Context Learning (ICL), our approach performs inference on inspection images from a few exemplar images of non-defective and defective products and their judgment-criteria descriptions. This removes the traditional need to collect extensive training samples and train a separate model for each product type. By combining an LVLM with ICL, our method offers a novel approach to general visual inspection, and we demonstrate its utility.
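As an illustration of the ICL setup described in the abstract, the following minimal sketch assembles an interleaved prompt from a few labeled exemplar images, their judgment-criteria descriptions, and one inspection image. The `Exemplar` class, the `build_icl_prompt` function, the segment dictionary layout, the prompt wording, and the file names are all hypothetical; the abstract does not specify the prompt format the authors actually use, and no particular LVLM API is assumed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Exemplar:
    image_path: str  # hypothetical path to an exemplar image
    label: str       # "non-defective" or "defective"
    criterion: str   # text describing the judgment criterion


def build_icl_prompt(exemplars: List[Exemplar], query_image: str) -> List[Dict[str, str]]:
    """Interleave exemplar images with their criteria and labels,
    then append the inspection image and an open question."""
    segments: List[Dict[str, str]] = []
    for ex in exemplars:
        segments.append({"type": "image", "value": ex.image_path})
        segments.append(
            {"type": "text", "value": f"Criterion: {ex.criterion} Answer: {ex.label}."}
        )
    # The inspection image comes last, followed by the question to be completed.
    segments.append({"type": "image", "value": query_image})
    segments.append(
        {"type": "text", "value": "Is this product non-defective or defective? Answer:"}
    )
    return segments


# Assumed example data: one non-defective and one defective exemplar.
prompt = build_icl_prompt(
    [
        Exemplar("ok_01.png", "non-defective", "The surface has no scratches or dents."),
        Exemplar("ng_01.png", "defective", "A scratch is visible on the surface."),
    ],
    query_image="test_01.png",
)
```

The resulting list of segments would then be converted into whatever multimodal input format the chosen LVLM expects. Because the exemplars sit in the prompt rather than in the training data, switching to a new product type only requires swapping the exemplar list.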
