Journal of the Japan Society for Precision Engineering
Online ISSN : 1882-675X
Print ISSN : 0912-0289
ISSN-L : 0912-0289
Selected Papers for Special Issue on Industrial Application of Image Processing
Training and Evaluation Methods for Vision-Language Model Towards Improved General Visual Inspection
Shiryu UENO, Yoshikazu HAYASHI, Shunsuke NAKATSUKA, Kunihito KATO, Takumi OSHITA, Hiroaki AIZAWA

2025 Volume 91 Issue 12 Pages 1150-1155

Abstract

In this study, we improve general visual inspection performance by changing the foundation Vision-Language Model (VLM), reconstructing the fine-tuning dataset, and proposing an example selection algorithm for In-Context Learning (ICL). The existing approach, which combines a VLM with ICL, supplies non-defective or defective images together with an explanatory description as a prompt, so that unseen products can be inspected without any additional parameter updates. However, the foundation VLM used in that approach was chosen for its ICL capability, without regard to its local recognition capability. We therefore replace it with a foundation VLM that emphasizes local recognition. We also reconstruct the fine-tuning dataset so that the model learns to output the coordinates of defects. In addition, at inference time we select the ICL example with an algorithm based on Euclidean distance and present that example with a visual prompt. Experimental results show that our approach achieves an F1-score of 0.950 on MVTec AD in a one-shot setting.
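The Euclidean-distance example selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which features the distance is computed on or whether the nearest candidate is the one chosen, so the code below assumes precomputed feature vectors and nearest-neighbor selection; the function name `select_icl_example` and the toy vectors are hypothetical and are not taken from the paper.

```python
import numpy as np


def select_icl_example(query_feat, candidate_feats):
    """Pick the candidate whose feature vector is closest to the query.

    Assumption: features are precomputed 1-D vectors of equal length,
    and the nearest candidate (smallest Euclidean distance) is selected.
    Returns the index of that candidate and all distances.
    """
    query = np.asarray(query_feat, dtype=float)
    cands = np.asarray(candidate_feats, dtype=float)
    # Euclidean (L2) distance from the query to every candidate row.
    dists = np.linalg.norm(cands - query, axis=1)
    return int(np.argmin(dists)), dists


# Toy 2-D features standing in for image embeddings (hypothetical values).
candidates = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]])
query = np.array([0.9, 1.2])

best_index, distances = select_icl_example(query, candidates)
print(best_index, distances)
```

In a one-shot setting, the image at `best_index` would be the single ICL example placed in the prompt; how its visual prompt is drawn is outside the scope of this sketch.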

© 2025 The Japan Society for Precision Engineering