2025 Volume 91 Issue 3 Pages 418-424
In this study, we propose a method for judging product quality from a few images of non-defective and defective products, together with textual descriptions of the judgment criteria. Existing Large Vision-Language Models (LVLMs) perform well across a variety of tasks, yet they lack the specialized knowledge required for visual inspection. To address this, we strengthen the LVLM's domain-specific expertise through additional training on a diverse collection of images of non-defective and defective products gathered from the web. Moreover, by using In-Context Learning (ICL), our approach performs inference on inspection images from a few exemplar images of non-defective and defective products and their judgment-criteria descriptions. This eliminates the traditional need to collect extensive training samples and train a separate model for each product type. By combining an LVLM with ICL, our method offers a novel approach to general visual inspection, and we demonstrate its utility.
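As a rough illustration of the ICL setup described above, the following minimal sketch shows one way a few exemplar images and their judgment-criteria descriptions could be interleaved with an inspection image to form a single prompt. It is not the authors' implementation: the `Exemplar` class, the `build_icl_prompt` function, the segment dictionary layout, the prompt wording, and all file paths are hypothetical, and the step that would pass the prompt to an actual LVLM is deliberately omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Exemplar:
    """One in-context example (hypothetical structure, not from the paper)."""
    image_path: str     # path to an exemplar image (illustrative only)
    is_defective: bool  # True for a defective product, False otherwise
    criterion: str      # textual description of the judgment criterion


def build_icl_prompt(exemplars: List[Exemplar],
                     query_image_path: str) -> List[Dict[str, str]]:
    """Interleave exemplar images and labels, then append the inspection image.

    Returns a list of segments, each either {"type": "image", "path": ...}
    or {"type": "text", "text": ...}. The layout is an assumption; a real
    LVLM would need its own input format.
    """
    segments: List[Dict[str, str]] = [
        {"type": "text",
         "text": "Judge each product as defective or non-defective "
                 "using the examples and criteria below."}
    ]
    for i, ex in enumerate(exemplars, start=1):
        label = "defective" if ex.is_defective else "non-defective"
        segments.append({"type": "image", "path": ex.image_path})
        segments.append({"type": "text",
                         "text": f"Example {i}: {label}. "
                                 f"Criterion: {ex.criterion}"})
    # The inspection image comes last, followed by the question to answer.
    segments.append({"type": "image", "path": query_image_path})
    segments.append({"type": "text",
                     "text": "Inspection image: is this product "
                             "defective or non-defective?"})
    return segments


# Usage with made-up exemplars for a hypothetical product.
exemplars = [
    Exemplar("good_01.png", False, "Surface is uniform with no scratches."),
    Exemplar("bad_01.png", True, "A visible scratch crosses the surface."),
]
prompt = build_icl_prompt(exemplars, "inspect_01.png")
```

Because the exemplars and criteria are supplied at inference time, switching to a new product type in this sketch only means passing a different `exemplars` list; no per-product training step appears anywhere in the code.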