2024 Volume 5 Issue 1 Pages 25-33
In recent years, laser ultrasonic visualization testing (LUVT) has attracted much attention because it enables efficient, non-contact ultrasonic non-destructive testing. Although deep learning based image analysis has been reported to succeed in a wide range of areas, applying it to defect detection in LUVT images is hindered by the difficulty of preparing a large dataset of LUVT images, which is prohibitively expensive and time-consuming to collect at scale. To compensate for the scarcity of such training data, we propose a data augmentation method that generates artificial LUVT images by numerical simulation and then applies a style transfer to the simulated images. The experimental results showed that data augmentation with the style-transformed simulated images improved defect prediction performance more than data augmentation that directly used the raw simulated images.
In recent years, the demand for non-destructive testing in structural maintenance and management has increased due to the aging of structures and other societal factors. In particular, ultrasonic non-destructive testing is superior in terms of inspection efficiency and safety and has been widely used. Among ultrasonic non-destructive testing methods, this paper focuses on the laser ultrasonic visualization testing (LUVT) technique 1),2). LUVT can visualize the propagation of ultrasonic waves, and the presence or absence of defects can be determined from the image obtained by LUVT.
However, in many ultrasonic non-destructive testing facilities, human inspectors still conduct visual inspections for defects. With the ever-increasing demand for non-destructive testing, the field is plagued by a shortage of well-trained inspectors and an increased workload. As a solution to this issue, machine learning technology has recently been introduced to ultrasonic non-destructive testing to reduce the labor required for inspections 3),4).
The goal of this study is to develop a machine learning technique for automatic inspection of images obtained by LUVT. Automating ultrasonic non-destructive testing is a long-standing challenge in academia, and various learning machines such as shallow neural networks 5),6),7) and SVMs 8),9) have been examined. In the early 2010s, rich deep models emerged and replaced standard methods in many applications such as image recognition, speech recognition, and natural language processing. Through the enrichment of machine learning models, performance on these tasks has reached human levels. These successes eventually motivated researchers in ultrasonic non-destructive testing to explore deep learning for automated inspections 10),11).
Compared to generic image recognition tasks, however, collecting training images for non-destructive inspection is expensive and time-consuming, which limits the amount of data available for machine learning. Moreover, the amount of data needed to train rich machine learning models such as deep neural networks is much larger than the amount needed to train human inspectors 12). Hence, the scarcity of training data severely hampers the generalization performance of machine learning models that rely on large amounts of training data.
This paper proposes an effective simulation-based data augmentation method to address the scarcity of training data for LUVT image inspection. Numerical simulations can generate images of wavefields scattered from various defects that correspond to experimental results, without requiring costly LUVT measurements. However, the differences between simulated and real images have a negative impact on machine learning, and the resulting models fail to perform well in detecting defects from unseen real images. The proposed method alleviates these differences by applying a style transfer technique 13) that makes simulated images more similar to real images. The experimental results reported in this paper demonstrate the effects of the simulation-based data augmentation combined with the style transfer.
Nowadays, few developers skip data augmentation when developing a generic image recognition algorithm. This is because intuitive and natural augmentation approaches are easier to develop and implement for generic images than for other data types. In general, data augmentation increases the diversity of the training images, thus reducing over-fitting to the training data and increasing generalization power.
Simple Transformations: Data augmentations commonly used for generic image recognition are simple image transformations, including geometric transformations such as flipping, rotating, cropping, and cutout 14), and intensity transformations such as hue and brightness changes. Examples of these typical data augmentations are shown in Figure 1. Karen et al. 15) demonstrated improved performance on the ImageNet dataset using these simple data augmentations.
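The simple transformations named above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the square cutout patch, and the assumption that intensities lie in [0, 1] are choices made here, not details taken from the cited works.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror the image left to right (geometric transformation)."""
    return image[:, ::-1].copy()

def cutout(image, top, left, size, fill=0.0):
    """Mask a square patch with a constant value (assumed patch shape)."""
    out = image.copy()
    out[top:top + size, left:left + size] = fill
    return out

def change_brightness(image, factor):
    """Scale intensities and clip to the assumed valid range [0, 1]."""
    return np.clip(image * factor, 0.0, 1.0)

# Toy grayscale image with 3 rows and 4 columns.
image = np.linspace(0.0, 1.0, 12).reshape(3, 4)
flipped = horizontal_flip(image)
masked = cutout(image, top=0, left=1, size=2)
brighter = change_brightness(image, 1.5)
```

Each function returns a new array, so the original image can be reused to produce several augmented variants.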
Another approach is to generate a new image by combining multiple images. Mixup 16) takes a linear combination of two images and their category labels. CutMix 17) improves upon Mixup and cutout by combining the two methods. Wang et al. 18) verified the effect of using CutMix as a data augmentation in YOLOv4 on object detection performance.
In addition, data augmentation methods using a generative adversarial network (GAN) 19) have also been proposed. A GAN is a generative model that attempts to produce high-quality images via adversarial learning. Data augmentation using GANs has mainly been studied in the medical field using IAGAN 20) and PGGAN 21). However, data augmentation based on these image transformations can produce images that could not possibly exist in reality, which limits the positive effect of classical data augmentation 22).
Virtual Flaws: To overcome the limitations of the simple transformation-based data augmentations, another approach called virtual flaws or virtual cracks was introduced for non-destructive inspection 23).
The virtual flaw approach moves the flaw signal identified in real data to an arbitrary position to expand the training dataset. This approach has been used with success to train not only machine learning models but also human inspectors 12). However, this approach is infeasible for LUVT images, because the change in an image due to a defect does not appear locally. Instead, the scattered wave spreads globally in the image, which makes it impossible to generate another defect image just by moving a local area to another position.
Simulation: This study employs numerical simulations for data augmentation, which offers several advantages. The first advantage is the ability to generate images that meet an arbitrary intended purpose. A simulator can generate a variety of images to meet specific conditions defined by the user. This contrasts with data augmentation based on simple image transformations, which is weak at synthesizing particular images specified by users. The second advantage is the ability to use the generated images on a large scale. A simulator can produce an unlimited number of images without incurring the cost of image acquisition. Data augmentation by simulation has proven effective in robot vision 24), autonomous driving 25), medical imaging 26), and defect detection on steel surfaces 27). These successes motivated us to use simulation for data augmentation, although we found that the LUVT images generated by the numerical simulation differ considerably from the actual measured images. This is because the actual LUVT images contain a lot of noise. For this reason, direct use of the LUVT images obtained by simulation was not expected to result in good prediction performance.
Style Transfer: In this study, a style transfer technique is introduced to reduce the difference between simulator-generated images and real images. Several style transfer algorithms have been proposed 28),29),30),31),13). This study employs an unsupervised style transfer algorithm. The other type of style transfer algorithm is supervised: it is trained on pairs of an input image and its style-transformed output image and learns the correspondence from these pairs. Isola et al. 31) developed Pix2Pix as a supervised style transfer and demonstrated its ability to transform grayscale images into color images. However, in LUVT applications, it is difficult to prepare pairs of input images and their style-transformed output images in advance. For this reason, a supervised style transfer was not employed in this study.
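The two image-combining augmentations can be sketched as follows. This is a minimal sketch under stated assumptions: the mixing ratio `lam` and the rectangle position are passed in explicitly (in practice they are drawn at random), the rectangle is assumed to lie inside the image, and all function and variable names are illustrative.

```python
import numpy as np

def mixup(img_a, img_b, label_a, label_b, lam):
    """Linear combination of two images and of their one-hot labels."""
    image = lam * img_a + (1.0 - lam) * img_b
    label = lam * label_a + (1.0 - lam) * label_b
    return image, label

def cutmix(img_a, img_b, label_a, label_b, top, left, height, width):
    """Paste a rectangle of img_b into img_a and mix labels by area ratio."""
    image = img_a.copy()
    image[top:top + height, left:left + width] = \
        img_b[top:top + height, left:left + width]
    rows, cols = img_a.shape[:2]
    # Fraction of the output area that still comes from img_a.
    lam = 1.0 - (height * width) / (rows * cols)
    label = lam * label_a + (1.0 - lam) * label_b
    return image, label

# Toy example: an all-zero image of class 0 and an all-one image of class 1.
img_a = np.zeros((4, 4))
img_b = np.ones((4, 4))
label_a = np.array([1.0, 0.0])
label_b = np.array([0.0, 1.0])

mixed_img, mixed_label = mixup(img_a, img_b, label_a, label_b, lam=0.75)
cut_img, cut_label = cutmix(img_a, img_b, label_a, label_b,
                            top=0, left=0, height=2, width=2)
```

In both cases the label becomes a soft label whose weights reflect how much of each source image is present in the result.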
Brief description of LUVT: In non-destructive inspection using LUVT, a laser is first irradiated onto the test specimen to generate laser ultrasonic waves at the laser irradiation point, as shown in Figure 2. The excited laser ultrasonic waves are received by a pre-installed ultrasonic transducer. This operation is performed for various laser irradiation points on the surface of the specimen. After that, the reciprocity theorem 32) is applied to the transducer and all irradiation points, which yields an image as if ultrasonic waves were being transmitted from the pre-installed ultrasonic probe 2). This operation was applied to the aluminum specimen in Figure 3, and the resulting image is shown in Figure 4. For defect-free images, the incident ultrasonic waves from the probe placed above the specimen propagate undisturbed. In an image with a defect, on the other hand, scattered waves generated by the defect propagate isotropically in addition to the incident ultrasonic waves, as shown in Figure 4. Internal defects can therefore be detected by identifying these scattered waves.
Challenges in Dataset Scaling: Scattered waves from defects in images obtained by LUVT may not be visible even to humans due to measurement noise. A large training dataset is required to discriminate such images with high accuracy. A typical way of collecting a training dataset is to use the images obtained through the actual periodical inspection routine. A common obstacle of this approach is the scarcity of defect examples. In this study, an alternative approach is adopted, in which defective samples are fabricated in the following manner.
1. Prepare a specimen.
2. Artificially create defects inside the specimen using a drill.
3. Obtain LUVT images using the imaging methods described above.
This procedure is repeated to obtain positive examples. To obtain negative examples, only steps 1 and 3 are performed.
In the experiments described in Section 5, aluminum specimens were used. A cylindrical cavity with a diameter of d = 2 mm was artificially created in each specimen. The vertical, horizontal, and depth lengths of the aluminum specimen were 100 mm, 30 mm, and 50 mm, respectively, as shown in Figure 3. A longitudinal wave transducer with a center frequency of 2 MHz was used and placed on the front side of the top surface, as shown in the top area of Figure 3. The red rectangular area (20 mm × 50 mm) in this figure is the laser irradiation area, which corresponds to the LUVT visualization area for ultrasonic wave propagation.
Modern rich machine learning models are not powerful unless the training dataset is scaled to a large size, yet specimens tend to be expensive. In addition, repeated measurements and imaging using the equipment require time and labor. These costs are a barrier to scaling LUVT datasets.
(1) Image Generation From Simulation
In order to use a physical simulator to synthesize an ultrasonic wave propagation image that is equivalent to an LUVT image, it is necessary to solve the elastic wave equation that is satisfied by the ultrasonic waves under the boundary conditions and initial conditions for the specimen. Well-known methods for solving the elastic wave equation include the finite difference method 33), the finite element method 34), and the boundary element method 35). The boundary element method is known as a high-precision wave analysis method, but it requires a relatively large amount of computation time. On the other hand, the finite difference method and the finite element method can obtain numerical solutions in relatively little computation time.
In this study, in order to prepare a large number of artificial images, we obtained simulation images with the time-domain finite difference method, which can obtain rough numerical solutions at a small computational cost. The obtained simulated images are shown in Figure 5.
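To give a feel for how a time-domain finite difference scheme produces wavefield images, the following sketch solves the 2D scalar wave equation with a leapfrog update. It is a deliberate simplification and not the simulator used in this study: the study solves the elastic wave equation, whereas this sketch uses a scalar field, models the cavity crudely by forcing the field to zero inside a disc, and assumes illustrative values for the grid spacing (0.25 mm), the wave speed (6300 m/s, a typical longitudinal speed for aluminum), and the two-cycle 2 MHz source burst.

```python
import numpy as np

def simulate_wavefield(steps, defect=None, nx=80, nz=200, dx=0.25e-3,
                       speed=6300.0, freq=2.0e6):
    """Leapfrog finite-difference solution of the 2D scalar wave equation.

    defect: optional (row, col, radius) in grid cells; the field is forced
    to zero inside this disc as a crude stand-in for a cavity (assumption).
    """
    dt = 0.5 * dx / speed              # Courant number 0.5, below 1/sqrt(2)
    coef = (speed * dt / dx) ** 2
    mask = np.ones((nz, nx))
    if defect is not None:
        row, col, radius = defect
        zz, xx = np.ogrid[:nz, :nx]
        mask[(zz - row) ** 2 + (xx - col) ** 2 <= radius ** 2] = 0.0
    prev = np.zeros((nz, nx))
    curr = np.zeros((nz, nx))
    src = slice(nx // 2 - 4, nx // 2 + 4)   # short line source near the top
    for n in range(steps):
        # Five-point Laplacian on interior cells; edge cells stay at zero.
        lap = np.zeros_like(curr)
        lap[1:-1, 1:-1] = (curr[2:, 1:-1] + curr[:-2, 1:-1]
                           + curr[1:-1, 2:] + curr[1:-1, :-2]
                           - 4.0 * curr[1:-1, 1:-1])
        nxt = 2.0 * curr - prev + coef * lap
        t = n * dt
        if t < 2.0 / freq:                  # two-cycle sinusoidal burst
            nxt[1, src] += np.sin(2.0 * np.pi * freq * t)
        nxt *= mask                         # zero the field inside the cavity
        prev, curr = curr, nxt
    return curr

# 80 x 200 cells at 0.25 mm cover a 20 mm x 50 mm area.
clean = simulate_wavefield(steps=300)
flawed = simulate_wavefield(steps=300, defect=(60, 40, 4))
```

Saving `curr` at regular intervals would yield a sequence of frames, and varying the `defect` argument would yield images for different defect positions, which is the property that makes simulation attractive for data augmentation.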
(2) Style Transfer
There is a difference between the LUVT image actually acquired and the image generated by the simulator. Various factors introduce noise in the process of acquiring real LUVT images. However, since the LUVT simulator is designed to understand physical phenomena, it cannot fully reproduce the process from ultrasonic scattering to image composition. This difference causes domain shifts, and the image synthesized from the simulator does not improve generalization performance. Therefore, rather than directly adding the simulator-generated images as training data, a transformation is applied to bring them closer to the real images.
In this study, a technique called style transfer 13), which has been developed in the field of image recognition, is applied in order to make simulator-generated images more similar to real LUVT images. A style transfer model learns a style and uses it to transform the style of input images (Figure 6). In this paper, we propose the introduction of a style transfer algorithm for LUVT image data augmentation. The style transfer model itself needs to be trained, in addition to the predictor of the existence of a defect. The style transfer model learns the content domain from a set of content images and simultaneously learns the style domain from another image set. Zhu et al. 13) developed a style transfer method called CycleGAN and demonstrated that learning these two domains makes it possible to transform image styles, for example by translating between landscape photographs and paintings. In the experiments reported in Section 5, we used CycleGAN to verify the effect of the proposed data augmentation method on predicting defects from LUVT images.
First, we illustrate the changes in simulated images due to style transfer, and then demonstrate its effect on defect detection performance.
(1) Style Transfer
In order to make the simulated images closer to the real images, we performed style transfer using CycleGAN. Experiments on style transfer using CycleGAN were conducted based on the official implementation*. The CycleGAN architecture comprises a generative network and a patch-based discriminative network. The generative part consists of initial layers, intermediate residual blocks, and an output layer, with the number of residual blocks adjusted according to the resolution of the images. The discriminative part, on the other hand, employs a 70×70 PatchGAN that judges the authenticity of small patches, allowing for efficient and flexible decisions. For training, a strategy that uses a least squares loss and a history of generated images for discriminator updates was adopted to improve stability and output quality. The initial learning rate was set to 0.0002, and the Adam optimizer was used for training over 20,000 iterations. The simulated LUVT dataset consists of 431 images from each of the 55 defect locations, while the real one consists of 134 images from each of the 203 defect locations.
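The two stabilization ideas mentioned above, the least squares loss and the history of generated images, can be sketched independently of any deep learning framework. The sketch below is illustrative only: the class and function names are hypothetical, the buffer capacity of 50 and the 50% replacement probability are assumptions rather than settings reported in this study, and the discriminator outputs are represented by plain NumPy arrays.

```python
import random
import numpy as np

class ImageHistory:
    """Buffer of previously generated images shown to the discriminator."""

    def __init__(self, capacity=50, seed=0):
        self.capacity = capacity          # assumed buffer size
        self.images = []
        self.rng = random.Random(seed)

    def query(self, image):
        """Return either the new image or an older one from the buffer."""
        if len(self.images) < self.capacity:
            self.images.append(image)     # buffer not yet full: store and pass
            return image
        if self.rng.random() < 0.5:       # assumed replacement probability
            idx = self.rng.randrange(self.capacity)
            old = self.images[idx]
            self.images[idx] = image      # swap the new image into the buffer
            return old
        return image

def lsgan_discriminator_loss(pred_real, pred_fake):
    """Least squares loss: real patches should score 1, fake patches 0."""
    return 0.5 * (np.mean((pred_real - 1.0) ** 2) + np.mean(pred_fake ** 2))

def lsgan_generator_loss(pred_fake):
    """The generator is rewarded when fake patches score close to 1."""
    return np.mean((pred_fake - 1.0) ** 2)

history = ImageHistory(capacity=2)
outputs = [history.query(np.full((1, 1), float(i))) for i in range(6)]
d_loss = lsgan_discriminator_loss(np.ones((3, 3)), np.zeros((3, 3)))
g_loss = lsgan_generator_loss(np.zeros((3, 3)))
```

Feeding the discriminator a mix of current and older generated images keeps it from adapting only to the generator's most recent outputs, which is the motivation for the history buffer.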
The images obtained from the style transfer of the simulated images using CycleGAN are shown in Table 1. The leftmost image is before applying the style transfer. The second, third, and fourth columns from the left show the transformed images after 100, 10,000, and 20,000 iterations of the learning process of CycleGAN, respectively.
Although each image shows differences in terms of wave intensities and defect visibility, it can be confirmed that the style of the transformed images becomes increasingly similar to that of real LUVT images as CycleGAN training progresses. In particular, unlike the raw simulated images, the style-transformed images not only mimic the noise observed during measurements but also come closer to the actual measured LUVT images (see Figure 4) in terms of image coloration. In what follows, we demonstrate how data augmentation with style-transformed images affects the defect detection performance of the machine learning models.
(2) Data Augmentation
In order to demonstrate the effectiveness of data augmentation by style transfer, experiments were conducted comparing three cases: real LUVT images alone, data augmentation directly with simulated LUVT images, and data augmentation with style-transformed images, denoted by REAL ALONE, REAL+SIMULATED, and PROPOSED, respectively. The performance of these three methods was evaluated using three prediction models: EfficientNet 36), ResNeXt 37), and Vision Transformer (ViT) 38). The hyper-parameters used in the training of each prediction model are shown in Table 2. In addition to simulation-based data augmentation, we added only HorizontalFlip as a simple transformation-based augmentation during training, and the weights pretrained on ImageNet were used as initial values. The dataset used is summarized in Table 3. We posed a binary classification problem, where images up to the arrival of the wave at a defect are labeled as defect-free and images thereafter are labeled as defective.
The performance evaluation was conducted as follows. First, real images from 203 specimens were divided into training data, validation data, and testing data. For fair comparisons, this division was conducted in five different patterns to ensure that all specimens were tested across the dataset splits. For the REAL ALONE method, the training data containing only the real images was used to train the prediction model. The REAL+SIMULATED method added the simulated images to the training dataset, and the PROPOSED method added the style-transformed images to the training dataset. For each epoch, the loss on the validation data was monitored, and the weights that minimized the loss were selected as the training result. The weights obtained from the training were used to assess Accuracy, Precision, Recall, and F-score on the testing data. The above procedure was repeated with five different random data partitioning patterns, and the performance measures were averaged over the five patterns.
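The labeling rule can be illustrated with a small sketch. How the arrival time was actually determined is not described here, so the sketch makes strong assumptions: the wave travels in a straight line from the transducer to the defect at a constant speed, frame times are counted from the moment of emission, and all names and numerical values are hypothetical.

```python
def label_frames(num_frames, defect_distance_m, wave_speed_m_s,
                 frame_interval_s):
    """Label each frame 0 (defect-free) or 1 (defective).

    A frame is labeled defective only if it was taken after the assumed
    arrival time of the wave at the defect.
    """
    arrival_time = defect_distance_m / wave_speed_m_s
    return [1 if k * frame_interval_s > arrival_time else 0
            for k in range(num_frames)]

# Hypothetical values: defect 20 mm from the transducer, wave speed
# 6300 m/s, one frame per microsecond. Arrival is at about 3.17 us.
labels = label_frames(num_frames=6, defect_distance_m=0.02,
                      wave_speed_m_s=6300.0, frame_interval_s=1.0e-6)
```

Under this rule, every image sequence of a defective specimen contributes both defect-free and defective examples, depending on how far the wave has propagated.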
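The evaluation protocol can be sketched as a specimen-level split followed by the four performance measures. This is a sketch under assumptions: the split below is an independent random permutation per seed, which does not by itself guarantee that every specimen appears in a test set across the five patterns, and the function names and the toy predictions are illustrative.

```python
import numpy as np

def split_specimens(num_specimens, n_train, n_val, seed):
    """Randomly assign specimen indices to train, validation, and test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_specimens)
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F-score for labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}

# Five partitioning patterns of 203 specimens into 61 / 20 / 122.
splits = [split_specimens(203, 61, 20, seed) for seed in range(5)]

# Toy predictions standing in for one model's output on one test set.
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Splitting by specimen rather than by image keeps all frames of one specimen in the same subset, so test images never come from a specimen seen during training.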
Table 4 reports the classification performance for the case where 203 image subsets were divided into 61 training subsets, 20 validation subsets, and 122 testing subsets. Let us first look at the data augmentation without style transfer. When using ViT for the prediction model, the REAL+SIMULATED method achieved better performance than the REAL ALONE method, although no improvement was observed when using EfficientNet and ResNeXt. This was because the simulated images were not similar enough to the real images to enhance the training of the prediction models. In contrast, the proposed method improved defect detection performance for all prediction models.
Table 5 shows the prediction performance for the case where the amount of training data is reduced by dividing the real LUVT image subsets into 41 training subsets, 20 validation subsets, and 142 testing subsets. The objective of this numerical experiment is not only to address the difficulty of obtaining real data but also to investigate the impact on the learning process when the ratio of style-transformed images to the real data used for training is increased. Although the overall performance tends to decrease relative to the setting with more real training data (Table 4), the proposed method is more effective in reducing false detections and misses than using only real data as training data or augmenting the data with raw simulated images.
Visualization of decision making: All three prediction models we used have a deep structure, and the internal workings of deep neural networks are not easily interpretable. A technique called Grad-CAM 39) addresses this issue by providing a perspective on how decisions are made in deep networks. Grad-CAM generates a heat map, called a class activation map, that is computed from the gradients obtained by backpropagation. Areas with high values in the map are those on which the deep prediction model relies most in its predictions.
The class activation maps for the classification of LUVT images using the trained EfficientNet are shown in Table 6. In the example of the prediction for defect-free images, the proposed method correctly determines the absence of defects. In contrast, with the other data augmentation methods, the prediction model attends to some areas in the image even though these areas are free of defects. This tendency was especially observed when the ripples were disrupted independently of the defects. For the defective input image depicted in Table 6, the methods other than the proposed method did not identify the approximate location of the defect and therefore missed it. Meanwhile, the prediction model trained with the proposed method attended to the area around the defect and successfully predicted the defect. The proposed method thus provided better training data for the prediction models to more accurately determine the presence or absence of defects in LUVT images.
In this paper, we have proposed a data augmentation method that applies style transfer to simulated LUVT images. The collection of LUVT images is expensive and time-consuming, and therefore large-scale training data is unavailable. One solution to this issue might be adding a variety of simulated images to the training data. The experiments conducted in this study showed that direct use of the simulated images was not effective because the simulated images greatly differ from the real LUVT images. To cope with this issue, we applied a style transfer to make the style of the simulated images closer to that of the LUVT images and then added them to the training dataset. The experiments demonstrated that the data augmentation based on the style-transformed simulated images improved the prediction performance of defects.
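The computation of a class activation map can be sketched with NumPy, following the usual formulation of Grad-CAM: each feature map of a chosen convolutional layer is weighted by the spatial average of the gradient of the class score with respect to that map, the weighted maps are summed, and negative values are discarded. The sketch assumes the activations and gradients have already been extracted from a network; the function name and the toy arrays are illustrative.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Class activation map from feature maps and their gradients.

    activations: array of shape (K, H, W), feature maps of one layer.
    gradients:   array of shape (K, H, W), d(class score)/d(activations).
    """
    # One weight per feature map: the spatially averaged gradient.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over the K feature maps gives an (H, W) map.
    cam = np.tensordot(weights, activations, axes=1)
    # Keep only regions that contribute positively to the class score.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example with two 2x2 feature maps.
activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                        [[0.0, 0.0], [0.0, 2.0]]])
gradients = np.array([np.ones((2, 2)), -np.ones((2, 2))])
cam = grad_cam(activations, gradients)
```

In the toy example, the first feature map supports the class and the second opposes it, so only the top-left location remains highlighted. In practice the map is upsampled to the input resolution and overlaid on the image.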
Limitations: One limitation of the proposed method is that it currently focuses primarily on samples of aluminum. Consequently, the method’s applicability to other materials, especially those with different physical properties, has not been sufficiently verified. The behavior of ultrasonic wave propagation can vary significantly between different materials, potentially affecting the generalization capability of the proposed method. Additionally, the lack of runtime data augmentation implementation is another limitation of this study. Data augmentation is a critical means to enhance the model’s generalization, particularly important for real-time applications. In this study, we adopted an approach of pre-processing data augmentation using style transfer, but the consideration of runtime data augmentation remains a topic for future research.
This work was supported by SECOM Science and Technology Foundation and JSPS KAKENHI (C)(21K0423100). This work used computational resources provided by Kyoto University and Hokkaido University through Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures and High Performance Computing Infrastructure in Japan (Project ID: jh220033).