In this paper, a highly accurate anomaly detection method using handcrafted feature extraction is presented for particular categories of MVTec AD, a benchmark dataset for anomaly detection. In this method, a subset of local features based on grey-level gradients is sampled by a greedy method, and anomalies are detected by Euclidean distance. Evaluations on the category "screw" showed that the proposed method achieved a higher AUROC than PatchCore, one of the state-of-the-art deep-learning models.
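The pipeline of greedy subsampling plus Euclidean nearest-neighbour scoring can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the greedy method is farthest-point (k-center) selection and that the anomaly score is the distance to the nearest retained feature; the function names are hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_coreset(features, k):
    """Farthest-point (greedy k-center) subsampling of feature vectors
    extracted from normal images (assumed variant of the greedy method)."""
    coreset = [features[0]]
    d = [dist(f, coreset[0]) for f in features]
    while len(coreset) < k:
        i = max(range(len(features)), key=lambda j: d[j])
        coreset.append(features[i])
        d = [min(d[j], dist(features[j], features[i])) for j in range(len(features))]
    return coreset

def anomaly_score(feature, coreset):
    """Anomaly score: Euclidean distance to the nearest coreset feature."""
    return min(dist(feature, c) for c in coreset)
```

A test feature close to the normal distribution then scores low, while an outlying feature scores high.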
In this study, we propose a new edge detection method. First, we redefine the concept of the "Edge Region" and show that an edge-narrowness principle for edge detection can be introduced based on it. Next, contour lines of the object are discussed, and the principle by which the "Edge Region" is expressed is explained based on them. Building on these considerations, we propose a new method called the Narrowness Edge Detection (NED) method. Experiments illustrate the behavior of the proposed algorithm. The position of the proposed method was clarified by comparing it with methods based on the local-contrast principle, such as the Sobel operator.
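For reference, the local-contrast baseline that NED is compared against can be sketched as a standard Sobel gradient-magnitude response. This illustrates only the comparison method, not NED itself, whose narrowness principle the abstract does not specify in enough detail to implement.

```python
import math

# Standard 3x3 Sobel kernels (horizontal and vertical gradient).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img, y, x):
    """Local-contrast edge strength at pixel (y, x): the gradient
    magnitude obtained by convolving with the Sobel kernels."""
    gx = sum(SOBEL_X[j][i] * img[y - 1 + j][x - 1 + i]
             for j in range(3) for i in range(3))
    gy = sum(SOBEL_Y[j][i] * img[y - 1 + j][x - 1 + i]
             for j in range(3) for i in range(3))
    return math.hypot(gx, gy)
```

On a synthetic vertical step edge, the response is large at the step and zero in flat regions.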
Understanding the three-dimensional structure of individual wires within a cable is vital for validating the cable's characteristics. This involves tracking thousands of wires longitudinally across numerous cross-sectional CT images obtained through X-ray CT non-destructive observation. While previous studies have achieved high accuracy using LSTM-based tracking methods, a significant challenge arises from the substantial annotation cost: annotated data for the first 10% of the wires was required for LSTM training. In this study, a new tracking method is proposed that uses an LSTM to reconnect wires that could not be connected by rule-based methods. This approach eliminates the need for annotation by using trajectories accurately tracked by rule-based methods as LSTM training data. Experimental results show that the proposed method outperforms previous methods in tracking accuracy under all conditions, with a maximum improvement of 30.1%. In one cable, tracking accuracy reached 100%, facilitating the reproduction of the wires' three-dimensional shape within the cable.
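The rule-based stage can be sketched as nearest-neighbour matching of wire centroids between consecutive cross-sections, with a distance gate. This is an assumed, simplified form of such a rule; in the paper, pairs the rule leaves unmatched would be handed to the LSTM for reconnection, and the confidently matched trajectories would serve as its training data.

```python
import math

def link_slices(prev_pts, next_pts, max_jump=2.0):
    """Greedy nearest-neighbour matching of wire centroids between two
    consecutive CT cross-sections. Candidates farther than max_jump are
    left unmatched (to be reconnected by a learned model in the paper)."""
    links, used = {}, set()
    for i, p in enumerate(prev_pts):
        best, best_d = None, max_jump
        for j, q in enumerate(next_pts):
            if j in used:
                continue
            d = math.dist(p, q)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            links[i] = best
            used.add(best)
    return links
```

Chaining such links across all slices yields each wire's longitudinal trajectory.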
Recent advancements in multimodal models integrating modalities such as images and language have been significant, with large-scale general-purpose models like OFA, Kosmos-2, and Unified-IO gaining particular attention in the vision field. These models have shown remarkable achievements in diverse vision tasks by integrating image and language features, surpassing image-only models. Despite their enhanced performance, driven by increased training data and scale, these models also face challenges with growing size and training costs. Moreover, the non-disclosure of training methodologies and datasets complicates model fine-tuning and reproducibility, posing issues for legitimate evaluations. Addressing these concerns, this study proposes a lightweight large-scale Vision & Language multimodal model using frozen pretrained encoder weights. We introduce a multitask training approach that is efficient in resource-limited settings and employs publicly available datasets for credible evaluations. Applying our model to the Human-Object Interaction task through fine-tuning demonstrated performance comparable to existing large models, while significantly reducing training time due to the model's lightweight design. This paper contributes a lightweight large-scale Vision & Language model feasible for fine-tuning on standard GPUs, an effective multitask training method for constrained environments, and a model that ensures valid evaluations using only public datasets.
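The core efficiency idea of freezing pretrained encoder weights can be illustrated with a toy update rule: gradients for frozen parameters are simply skipped, so only the lightweight task head is trained. This is a framework-agnostic sketch of the general technique, not the authors' training code; the parameter names are hypothetical.

```python
def sgd_step(params, grads, frozen, lr=0.1):
    """One SGD step that skips frozen (e.g. pretrained encoder)
    parameters, so only the lightweight head parameters are updated."""
    return {k: (v if k in frozen else v - lr * grads[k])
            for k, v in params.items()}
```

In practice, the same effect is obtained in deep-learning frameworks by disabling gradient computation for the encoder and passing only head parameters to the optimizer.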
Although deep learning models have shown excellent performance in image classification, they are known to be vulnerable to adversarial examples and to easily misclassify them. An adversarial example is generated from a natural example by adding small perturbations to it. As a representative countermeasure, adversarial training, which trains a model on adversarial examples, has been proposed. However, it is difficult to completely prevent misclassification caused by adversarial examples, so it is important to take into account the degree of risk involved. Against this background, we propose a novel adversarial training method that can reduce the possibility of an adversarial example being misclassified into a high-risk class, even when the misclassification itself cannot be prevented. To this end, we introduce a loss function that considers the degree of risk of each misclassification. In the experiments, we compared the conventional and proposed methods on a general image classification dataset and a road traffic sign dataset, and confirmed that the proposed method can reduce the risk of misclassification.
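A risk-aware loss of the kind described can be sketched as cross-entropy plus an expected-risk penalty. This is an assumed form for illustration, not necessarily the paper's exact loss: `risk[y][j]` is a user-defined cost of predicting class `j` when the true class is `y`, and `lam` trades off accuracy against risk.

```python
import math

def risk_weighted_loss(probs, y, risk, lam=1.0):
    """Cross-entropy plus the expected misclassification risk
    (assumed sketch of a risk-aware training loss).
    probs: predicted class probabilities; y: true class index;
    risk[y][j]: cost of predicting j for true class y (risk[y][y] = 0)."""
    ce = -math.log(probs[y])
    expected_risk = sum(p * risk[y][j] for j, p in enumerate(probs))
    return ce + lam * expected_risk
```

With such a loss, shifting probability mass toward a high-risk class (e.g. mistaking a stop sign for a speed-limit sign) is penalized more than an equally wrong but low-risk confusion.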
In fish aquaculture, it is important to monitor the growth of fish in the fishpond for aquaculture control. In this paper, we develop a system for 3D measurement of aquaculture fish using a 3D reconstruction technique based on a spherical stereo camera. In the experiments, 3D measurements of fish bodies were conducted both in a simulated environment using CG images generated by a game engine and in a real environment using images captured in a fishpond. The experimental results showed that the proposed method enables accurate measurement of the size distribution and average size of fish in the fishpond: the mean values of fork length, body height, and body weight could be measured within ±5% error in both the simulation and real environments. These results suggest that the proposed method is effective as a growth-management system for aquaculture fish.
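Once the stereo system has reconstructed 3D key points on a fish body, the reported size statistics reduce to simple geometry. The sketch below assumes fork length is taken as the Euclidean distance between two reconstructed 3D points (snout tip and tail fork); the function names are illustrative, not the paper's.

```python
import math

def fork_length(snout, tail_fork):
    """Fork length as the Euclidean distance between two 3D points
    reconstructed by the stereo system (assumed definition)."""
    return math.dist(snout, tail_fork)

def mean_and_spread(lengths):
    """Mean and standard deviation of the measured size distribution."""
    m = sum(lengths) / len(lengths)
    var = sum((x - m) ** 2 for x in lengths) / len(lengths)
    return m, math.sqrt(var)
```

Aggregating such per-fish measurements over many detections yields the size distribution and average size reported in the experiments.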
This study proposes a novel approach that performs one-shot calibration of the extrinsic parameters of an RGB-D camera network using a newly designed reference marker with unique shape features. Our calibration scheme can be used even when the viewing angles of multiple cameras do not overlap. We define a novel metric that measures the absolute geometric error and repeatability of calibration in three dimensions (3D) by comparing predefined key-point pairs from the CAD model and the measured point cloud data of the reference marker, instead of the 2D re-projection error, which is not an absolute measure. Additionally, the camera parameters are further optimized through a fine-tuning technique using an evaluation function based on the proposed metric. A demonstration in a real environment shows the effectiveness of the proposed metric for measuring absolute error and of the fine-tuning technique.
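The core of such a metric can be sketched as the mean Euclidean distance between corresponding 3D key points. This is a simplified illustration under the assumption that correspondences are already established and the measured points are expressed in the CAD frame via the estimated extrinsics; it is not the paper's full evaluation function.

```python
import math

def mean_absolute_3d_error(cad_pts, measured_pts):
    """Mean Euclidean distance between predefined CAD key points and
    the corresponding measured key points: an absolute 3D error,
    unlike a 2D re-projection error (assumed sketch of the metric)."""
    assert len(cad_pts) == len(measured_pts)
    return sum(math.dist(p, q)
               for p, q in zip(cad_pts, measured_pts)) / len(cad_pts)
```

In the described fine-tuning, the extrinsic parameters would be adjusted to minimize an evaluation function built from this kind of error.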
In this study, we propose a method based on the basic algorithm of 3D Gaussian Splatting that enables three-dimensional data interpolation and rendering up to 18 times faster, while maintaining quality equal to or better than the original method, even in situations where only a few viewpoints are available. This is achieved through several improvements: (1) the introduction of a loss function that suppresses three-dimensional structural inconsistencies caused by the insufficiency of training viewpoints, (2) the multiplexing of the input resolution, and (3) the suppression of overfitting by batching the sum of the loss functions from multiple viewpoints during backpropagation.
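Improvement (3) can be illustrated with a toy update rule: instead of stepping after every single view, the per-viewpoint losses are summed and a single update is applied, so no one training viewpoint dominates. This is a framework-agnostic sketch of the general idea with a hypothetical scalar parameter and gradient function, not the authors' implementation.

```python
def batched_view_update(param, views, grad_fn, lr=0.1):
    """One optimization step using the summed gradient of the losses
    from all training viewpoints (sketch of improvement (3)), instead
    of updating after each individual view."""
    g = sum(grad_fn(param, v) for v in views)
    return param - lr * g
```

With a squared-error loss per view, the summed gradient pulls the parameter toward a compromise across viewpoints rather than overfitting to the last view seen.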