In this framework, we improve general visual inspection performance by changing the foundation Vision-Language Model (VLM), reconstructing the fine-tuning dataset, and proposing an example selection algorithm for In-Context Learning (ICL). The existing approach using a VLM with ICL provides non-defective or defective images together with an explanatory description as a prompt, allowing unknown products to be inspected without additional parameter updates. However, the foundation VLM used in the existing approach was chosen for its ICL capability, without considering local recognition capability. In this study, we therefore replace the foundation VLM with one focused on local recognition capability. We also reconstruct the fine-tuning dataset so that the model can detect the coordinates of defects. In addition, at inference time we propose an example selection algorithm based on Euclidean distance and supply the selected ICL example with a visual prompt. Experimental results show that our approach achieves an F1-score of 0.950 on MVTec AD in a one-shot setting.
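The abstract does not detail the selection procedure; as a minimal sketch, assuming the query and candidate images are embedded by the VLM's vision encoder, Euclidean-distance-based ICL example selection could look like the following (the function name, embedding source, and dimensions are all hypothetical):

```python
import numpy as np

def select_icl_example(query_emb: np.ndarray,
                       candidate_embs: np.ndarray) -> int:
    """Return the index of the candidate embedding closest to the
    query embedding under Euclidean (L2) distance."""
    # L2 distance between the query and every candidate embedding.
    dists = np.linalg.norm(candidate_embs - query_emb, axis=1)
    return int(np.argmin(dists))

# Hypothetical usage: embeddings would come from the VLM's image
# encoder, which the abstract does not specify.
rng = np.random.default_rng(0)
candidates = rng.standard_normal((8, 512))   # 8 candidate ICL examples
query = rng.standard_normal(512)             # query product image
print(f"nearest ICL example index: {select_icl_example(query, candidates)}")
```

The selected example image (and its description) would then be placed in the prompt alongside the visual prompt for the query image.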