Journal of the Japan Society for Precision Engineering
Online ISSN : 1882-675X
Print ISSN : 0912-0289
ISSN-L : 0912-0289
Paper
Large Vision-Language Model Can Few-Shot Anomaly Detection by In-Context Learning
Takumi OSHITA, Shiryu UENO, Yusei YAMADA, Shunsuke NAKATSUKA, Kunihito KATO, Hiroaki AIZAWA, Yoshikazu HAYASHI

2025 Volume 91 Issue 3 Pages 418-424

Abstract

In this study, we propose a methodology for judging product quality from a few images of non-defective and defective products, together with textual descriptions that serve as judgment criteria. Existing Large Vision-Language Models (LVLMs) demonstrate high performance across a variety of tasks, yet they lack the specialized knowledge required for visual inspection. To address this, we strengthen the LVLM's domain expertise through additional training on a diverse collection of non-defective and defective product images gathered from the web. Moreover, by utilizing In-Context Learning (ICL), our approach infers the quality of an inspection image from a few exemplar images of non-defective and defective products along with their judgment-criteria descriptions, eliminating the need to collect extensive training samples and train a separate model for each product type, as conventional methods require. By integrating an LVLM with ICL, our method offers a novel approach to general-purpose visual inspection and demonstrates its utility.
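To illustrate the ICL setup described above, the sketch below assembles a multimodal few-shot prompt from exemplar images and judgment-criteria text. The message schema, field names, and the `build_icl_prompt` helper are illustrative assumptions for a generic multimodal chat interface, not the authors' actual implementation.

```python
# Hedged sketch: few-shot ICL prompt assembly for LVLM-based visual
# inspection. The dict-based message format is an assumption.

def build_icl_prompt(good_images, defect_images, criteria, query_image):
    """Assemble a multimodal few-shot prompt for an LVLM inspector.

    good_images / defect_images: paths to exemplar images (few-shot context).
    criteria: textual judgment criteria distinguishing good from defective.
    query_image: path to the inspection image to be classified.
    """
    messages = [{
        "role": "system",
        "content": ("You are a visual inspection assistant. Using the "
                    "exemplar images and the judgment criteria, answer "
                    "'non-defective' or 'defective' for the query image."),
    }]
    # Few-shot exemplars: each image is paired with its ground-truth label.
    for path in good_images:
        messages.append({"role": "user",
                         "content": [{"type": "image", "path": path},
                                     {"type": "text",
                                      "text": "Label: non-defective."}]})
    for path in defect_images:
        messages.append({"role": "user",
                         "content": [{"type": "image", "path": path},
                                     {"type": "text",
                                      "text": "Label: defective."}]})
    # Judgment criteria and the query image come last.
    messages.append({"role": "user",
                     "content": [{"type": "text",
                                  "text": f"Judgment criteria: {criteria}"},
                                 {"type": "image", "path": query_image},
                                 {"type": "text",
                                  "text": "Is this product non-defective "
                                          "or defective?"}]})
    return messages

prompt = build_icl_prompt(["good_1.png", "good_2.png"],
                          ["defect_1.png"],
                          "A scratch longer than 1 mm counts as a defect.",
                          "query.png")
print(len(prompt))  # prints 5: system + 3 exemplars + 1 query turn
```

Because the exemplars and criteria live entirely in the prompt, switching to a new product type only requires swapping these few inputs, with no per-product retraining.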

© 2025 The Japan Society for Precision Engineering