ISIJ International
Online ISSN : 1347-5460
Print ISSN : 0915-1559
ISSN-L : 0915-1559
Regular Article
Anomaly Detection in Rails Using Dimensionality Reduction
Shingo Mitsui, Toshihiko Sasaki, Masayoshi Shinya, Yasuo Arai, Ryutaro Nishimura

2023 Volume 63 Issue 1 Pages 170-178

Abstract

As railroad rails are an important part of the social infrastructure, monetary and human resources are spent on their maintenance and inspection to ensure people’s safety and security. In particular, rails are periodically ground to a uniform profile to reduce noise and vibration during railroad operations and to avoid damage to the rail tops. However, rising costs and labor shortages have motivated efforts to elucidate the damage mechanism of rails so that grinding can be performed more efficiently. Factors such as passing tonnage, transit speed, and weather conditions affect rail damage; however, the details of the damage mechanism are not clear. In this study, we collected a large number of measurements of residual stress and the full width at half maximum (FWHM) of the diffraction rings using a high-speed X-ray residual stress measurement system and explored the possibility of detecting abnormal areas of rails and diagnosing signs of damage using statistical analysis methods and machine learning.

For verification, 2D mapping measurements of the x-axis component of the residual stress σx, the shear stresses τxy, τxz, and τyz, and the FWHM of the diffraction ring at the head of a cracked rail were used. The data were subjected to dimensionality reduction by principal component analysis, kernel principal component analysis, and an autoencoder. Normal models were built using the data of the normal areas without cracks, and the anomaly score was defined as the deviation from these models. The detection accuracies of the models were compared using the area under the receiver operating characteristic curve; the autoencoder showed the best performance.

1. Introduction

A railroad network, such as the Shinkansen, is a social infrastructure that forms the basis of people’s lives and economic activities and thus requires safe operation. Regular maintenance and inspection are therefore indispensable and demand large amounts of human and monetary resources, but rising costs and a shortage of personnel have made this difficult. As part of maintenance, rails are ground down several hundred micrometers from the surface every year using a rail grinder to reduce vehicle vibration and running noise and to remove the damage caused by friction cracks. To optimize the amount of grinding, research has been conducted on the rolling contact fatigue mechanism of rails based on scientific evidence. Sasaki et al. reported the mechanism of rolling contact fatigue of rails by conducting triaxial stress analysis of rails using residual stresses and the full width at half maximum (FWHM) of X-ray diffraction rings.1) Many systematic studies have been conducted on rail damage, covering residual stress,2,3,4,5) crack propagation,6,7) and the white etching layer (WEL).8,9,10,11,12) However, the details of the damage mechanism of rails have not been clarified because of the complex combination of multiple conditions, such as passing tonnage, transit speed, and weather. In addition, actual conditions are difficult to reproduce, even in a two-cylinder rolling test, in which test specimens simulating rails and wheels are rotated at high speed while maintaining contact. Therefore, we are conducting research to quantitatively evaluate the signs of crack initiation and the remaining life of rails using statistical analysis and machine learning, based on data on possible causes of damage obtained in previous studies.
In this study, we collected a large number of residual stress and diffraction-ring FWHM measurements on previously used rails and investigated the possibility of detecting abnormal points through statistical analysis and machine learning. Detailed measurements were retaken on the cracked rail used by Mitsui et al.13) to compare and verify anomaly detection methods based on dimensionality reduction: principal component analysis (PCA), kernel principal component analysis (kernel PCA), and an autoencoder.

2. X-ray Residual Stress Measurement

High-speed measurement systems that use a two-dimensional X-ray detector have made residual stress measurement using X-rays widely applicable. In our previous study, we demonstrated that residual stress can be measured in less than a second using a device based on the cosα method and silicon-on-insulator (SOI) pixel detectors.14) We used this device to obtain mapping measurements of the residual stresses and FWHM of the diffraction rings on a cracked rail and showed that stress-release areas around cracks could be detected. In this study, the measured quantities are subjected to statistical analysis and machine learning to quantify the degree of damage to the rail, and a method to utilize this information for rail maintenance and inspection is proposed.

The residual stress measurement method using X-rays was proposed by Christenson et al. in 1953 as the sin²ψ method.15) This method requires a large stationary apparatus, such as a precise goniometric mechanism, to measure the diffracted intensities of incident X-rays from several angles. The sin²ψ method is used as a global standard; however, it is mainly limited to research purposes because a measurement takes more than 10 min and the method is not suitable for large objects. In 1978, Taira et al. proposed a residual stress measurement method called the cosα method, which evaluates plane stress by observing the entire diffraction ring of diffracted X-rays with a two-dimensional X-ray detector under single X-ray incidence.16) Furthermore, Sasaki et al. proposed the generalized cosα method in 1995, which was extended to triaxial stresses.17) In the 2000s, a portable X-ray residual stress measurement system using the cosα method became commercially available in Japan and was used by various companies and laboratories because it enabled on-site measurements. However, a smaller and faster device is required for further on-site use and industrial applications. Therefore, in a previous study, we created an X-ray residual stress measurement system using a charge-integration-type SOI pixel detector, INTPIX4,18) developed at the High Energy Accelerator Research Organization (KEK), and demonstrated that it can measure a single point in less than a second with the same accuracy as the conventional system.13) This reduced measurement time facilitates research using a large amount of measurement data, such as mapping measurements. Further applied studies have been conducted using the developed equipment: Sasaki et al. reported triaxial stress analysis of rails,1) and Nishimura et al. reported precise measurements using synchrotron radiation.19)

In the cosα method, an X-ray beam is incident on a polycrystalline metal in one direction, and the diffraction ring formed by the diffracted X-rays according to Bragg’s law is measured using a two-dimensional X-ray detector. In the stress-free state, the diffraction ring is a perfect circle; however, when the lattice spacing of the metal atoms changes due to stress, the diffraction angle of the incident X-ray changes, and the diffraction ring is deformed. Stress can be evaluated by measuring the changes in the diffraction ring. The detailed measurement and analysis methods were the same as those used by Mitsui et al.14)

3. Anomaly Detection Using Dimensionality Reduction

3.1. Multivariate Gaussian Distribution

In conventional metal material evaluation, the damage is evaluated by combining measurements of the residual stresses and FWHM of diffraction rings. However, there are many influential factors in actual metal materials, and estimating the damage mechanism is sometimes impossible. In this study, we investigated the possibility of detecting abnormal positions by statistically analyzing the residual stresses and FWHM of the diffraction rings at many points using multivariate analysis. To detect abnormal positions based on many factors, such as triaxial stresses and FWHM of the diffraction rings, dimensionality reduction of the measured data was performed, and the degree of the anomaly was defined according to Hotelling’s T2 to quantify the degree of damage.20,21)

Hotelling’s T2 considers data 𝒟 = {x^(n) | n = 1, 2, …, N}, which consists of N observations in M dimensions. If 𝒟 contains zero or very few abnormal samples and each sample is assumed to follow a Gaussian distribution, the probability density function can be written as

\mathcal{N}\left(x^{(n)} \mid \mu, \Sigma\right) = \frac{1}{\sqrt{(2\pi)^{M}\,|\Sigma|}} \exp\left\{ -\frac{1}{2}\left(x^{(n)}-\mu\right)^{\mathrm{T}} \Sigma^{-1} \left(x^{(n)}-\mu\right) \right\}, \quad (1)
where μ is the mean vector, and Σ is the covariance matrix. The log likelihood L(μ, Σ|𝒟) of this function based on 𝒟 is   
L(\mu,\Sigma \mid \mathcal{D}) = \ln \prod_{n=1}^{N} \mathcal{N}\left(x^{(n)} \mid \mu,\Sigma\right) = \sum_{n=1}^{N} \ln \mathcal{N}\left(x^{(n)} \mid \mu,\Sigma\right) = -\frac{NM}{2}\ln 2\pi - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}\left(x^{(n)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(n)}-\mu\right). \quad (2)
In the maximum likelihood method, we determine the values of μ and Σ that maximize Eq. (2). Differentiating with respect to each parameter and setting the derivatives to zero yields the maximum likelihood solutions μ̂ and Σ̂:
\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} x^{(n)} \quad (3)

\hat{\Sigma} = \frac{1}{N}\sum_{n=1}^{N}\left(x^{(n)}-\hat{\mu}\right)\left(x^{(n)}-\hat{\mu}\right)^{\mathrm{T}} \quad (4)
The anomaly score a(x′) of a new observation x′ can be defined as −2 ln 𝒩(x′ | μ̂, Σ̂); ignoring the constant terms, it is expressed as:

a(x') = \left(x'-\hat{\mu}\right)^{\mathrm{T}} \hat{\Sigma}^{-1} \left(x'-\hat{\mu}\right) \quad (5)

This is equivalent to the square of the Mahalanobis distance. The relationship between the distribution of the anomaly score a(x′) and the position of the known damaged area was investigated to study the applicability of anomaly detection.
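To make the procedure concrete, the following is a minimal NumPy sketch of the Hotelling T2 anomaly score of Eq. (5); the data here are synthetic placeholders (a shifted cluster standing in for damaged points), not the rail measurements.

```python
import numpy as np

def hotelling_anomaly_scores(X_train, X_test):
    """Squared Mahalanobis distance (Eq. (5)), with the mean vector (Eq. (3))
    and covariance matrix (Eq. (4)) estimated from normal-only training data."""
    mu = X_train.mean(axis=0)                          # Eq. (3)
    Xc = X_train - mu
    Sigma = (Xc.T @ Xc) / len(X_train)                 # Eq. (4)
    diff = X_test - mu
    # one quadratic form per test sample
    return np.einsum("ni,ij,nj->n", diff, np.linalg.inv(Sigma), diff)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))                    # "normal" samples only
X_test = np.vstack([rng.normal(size=(5, 3)),           # normal test samples
                    rng.normal(loc=6.0, size=(5, 3))]) # shifted "anomalies"
scores = hotelling_anomaly_scores(X_train, X_test)     # anomalies score higher
```

Samples far from the normal distribution receive large scores, so a threshold on `scores` separates the two groups.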

3.2. Principal Component Analysis

PCA is a multivariate analysis method in which the direction of maximum variance of the multivariate data is the first principal component, and subsequent principal components are orthogonal to the previous principal components and maximize the variance to obtain a new orthonormal basis. Because principal components can be used to represent the original data with fewer variables, they are used for dimensionality reduction to reduce the number of variables in the data. When dealing with data where variables are not independent of each other, PCA can be used to reduce unnecessary dimensions and avoid multicollinearity; thus, the covariance matrix can be calculated correctly, and anomalies can be detected.22)

Consider data 𝒟 ={x(n)|n = 1, 2, 3,…N}, which consists of N observations of M-dimensional data. Assuming that 𝒟 contains zero or very few abnormal samples, the sample mean vector and covariance matrix can be written as Eqs. (3) and (4), respectively. Additionally, S is defined as follows:   

S \equiv \sum_{n=1}^{N}\left(x^{(n)}-\hat{\mu}\right)\left(x^{(n)}-\hat{\mu}\right)^{\mathrm{T}} \quad (6)
The data matrix X̃, with the origin shifted so that the sample mean is zero, can be expressed using X ≡ [x^(1), …, x^(N)] as follows:

\tilde{X} \equiv X H_N = \left[x^{(1)}-\hat{\mu}, \ldots, x^{(N)}-\hat{\mu}\right], \quad (7)

where the centering matrix H_N is defined as H_N \equiv I_N - \frac{1}{N}\mathbf{1}_N\mathbf{1}_N^{\mathrm{T}}. Here, I_N is the N × N identity matrix, and \mathbf{1}_N is an N-dimensional column vector whose elements are all 1. Then,

\hat{\Sigma} = \frac{1}{N}\tilde{X}\tilde{X}^{\mathrm{T}} = \frac{1}{N}X H_N H_N^{\mathrm{T}} X^{\mathrm{T}} = \frac{1}{N}X H_N X^{\mathrm{T}}. \quad (8)

Here, we used the fact that H_N is symmetric and idempotent (H_N^2 = H_N).

Assume that the normal subspace, constructed from the normal data only, is spanned by the orthonormal basis u_1, …, u_m, and for simplicity let u be one of these vectors. The component of x^(n) along u is u^T x^(n), and its sample mean is

\frac{1}{N}\sum_{n=1}^{N} u^{\mathrm{T}} x^{(n)} = \frac{1}{N} u^{\mathrm{T}} \sum_{n=1}^{N} x^{(n)} = u^{\mathrm{T}}\hat{\mu}. \quad (9)
The sample variance is as follows:   
\frac{1}{N}\sum_{n=1}^{N}\left[u^{\mathrm{T}}\left(x^{(n)}-\hat{\mu}\right)\right]^{2} = u^{\mathrm{T}}\left\{\frac{1}{N}\sum_{n=1}^{N}\left(x^{(n)}-\hat{\mu}\right)\left(x^{(n)}-\hat{\mu}\right)^{\mathrm{T}}\right\} u = u^{\mathrm{T}}\hat{\Sigma}u \quad (10)

In other words, in PCA, u is the direction in which the variance u^T Σ̂ u is maximized. We consider maximizing u^T Σ̂ u under the constraint u^T u = 1. Introducing the Lagrange multiplier λ and using S = NΣ̂, the optimal u satisfies:

0 = \frac{\partial}{\partial u}\left[u^{\mathrm{T}} S u + \lambda\left(1-u^{\mathrm{T}}u\right)\right] = \left(S+S^{\mathrm{T}}\right)u - 2\lambda u \quad (11)
Therefore, because S is a real symmetric matrix, the following eigenvalue equation is derived:

S u = \lambda u \quad (12)

Multiplying this equation by u^T from the left, we get

u^{\mathrm{T}} S u = \lambda u^{\mathrm{T}} u = \lambda \quad (13)

Therefore, the variance along u is proportional to the eigenvalue. In other words, to maximize the variance captured by the basis of the normal subspace in PCA, we should take the m eigenvectors of S with the largest eigenvalues.

Next, suppose an orthonormal basis U_m ≡ [u_1, …, u_m], spanning the m-dimensional normal subspace, is obtained by PCA of the normal data only, i.e., by solving the eigenvalue equation. Resolving an arbitrary input x′ into its component x_(1) in the normal subspace and the orthogonal component x_(2), we can write:

x_{(1)} = \sum_{i=1}^{m} u_i u_i^{\mathrm{T}}\left(x'-\hat{\mu}\right) = U_m U_m^{\mathrm{T}}\left(x'-\hat{\mu}\right) \quad (14)

x_{(2)} = \left(x'-\hat{\mu}\right) - x_{(1)} = \left[I_M - U_m U_m^{\mathrm{T}}\right]\left(x'-\hat{\mu}\right), \quad (15)
where I_M is the M × M identity matrix. The anomaly score is defined as the squared distance from the normal subspace and can be expressed as follows:

a(x') = \left\| x_{(2)} \right\|^{2} = \left\| \left[I_M - U_m U_m^{\mathrm{T}}\right]\left(x'-\hat{\mu}\right) \right\|^{2} = \left(x'-\hat{\mu}\right)^{\mathrm{T}}\left[I_M - U_m U_m^{\mathrm{T}}\right]^{\mathrm{T}}\left[I_M - U_m U_m^{\mathrm{T}}\right]\left(x'-\hat{\mu}\right) = \left(x'-\hat{\mu}\right)^{\mathrm{T}}\left[I_M - U_m U_m^{\mathrm{T}}\right]\left(x'-\hat{\mu}\right) \quad (16)

Comparing Eq. (16) with the anomaly score of Hotelling’s T2 method in Eq. (5), the matrix [I_M − U_m U_m^T] plays the role of Σ̂^(−1); Eq. (16) approximates Eq. (5) well when the rank of Σ̂ is approximately equal to the number of principal components m.
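The PCA anomaly score of Eq. (16) can be sketched as follows; the subspace dimension m and the synthetic data (points near a 2-D subspace, with a few points pushed away from it) are illustrative assumptions.

```python
import numpy as np

def pca_anomaly_scores(X_train, X_test, m):
    """Squared distance to the m-dimensional normal subspace (Eq. (16))."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    S = Xc.T @ Xc                        # scatter matrix S (Eq. (6))
    _, vecs = np.linalg.eigh(S)          # eigenvectors, ascending eigenvalues
    U_m = vecs[:, ::-1][:, :m]           # top-m eigenvectors span the subspace
    diff = X_test - mu
    resid = diff - diff @ U_m @ U_m.T    # orthogonal component (Eq. (15))
    return (resid ** 2).sum(axis=1)

rng = np.random.default_rng(1)
A = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 5))  # data on a 2-D subspace
X_train = A + 0.01 * rng.normal(size=A.shape)            # plus small noise
x_on = X_train[:3]                       # points close to the normal subspace
x_off = x_on + 5.0                       # points pushed away from it
scores = pca_anomaly_scores(X_train, np.vstack([x_on, x_off]), m=2)
```

Points lying in the normal subspace receive near-zero scores, while points with a large orthogonal component score high.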

3.3. Kernel Principal Component Analysis

Kernel PCA23) is a method in which the data are nonlinearly mapped into a high-dimensional space and then subjected to PCA. A dataset with a nonlinear distribution is mapped into a high-dimensional space so that anomalous points become linearly separable. Here, we consider a transformation ϕ(x) that maps onto an Mϕ-dimensional feature space. Because explicit computation in the high-dimensional space is expensive, the problem is solved by working only with the kernel function k(x^(i), x^(j)) = ϕ(x^(i))^Tϕ(x^(j)), i.e., with inner products in the feature space.24)

Assuming that the mean of the dataset mapped into the Mϕ-dimensional feature space, (1/N)Σ_{n=1}^{N} ϕ(x^(n)), is zero, the Mϕ × Mϕ covariance matrix C can be calculated as follows:

C = \frac{1}{N}\sum_{n=1}^{N} \phi\left(x^{(n)}\right) \phi\left(x^{(n)}\right)^{\mathrm{T}} \quad (17)
Subsequently, for i = 1, ..., Mϕ, the eigenvalues and eigenvectors can be written as follows:   
C v_i = \lambda_i v_i \quad (18)

From the definition of C, it can be transformed as follows:

\frac{1}{N}\sum_{n=1}^{N} \phi\left(x^{(n)}\right) \left\{ \phi\left(x^{(n)}\right)^{\mathrm{T}} v_i \right\} = \lambda_i v_i \quad (19)

Assuming λi > 0, vi is given by a linear combination of ϕ(x^(n)) and can be written as

v_i = \sum_{n=1}^{N} a_{in}\, \phi\left(x^{(n)}\right). \quad (20)
Therefore, Eq. (18) is transformed as follows by substituting vi:   
\frac{1}{N}\sum_{n=1}^{N} \phi\left(x^{(n)}\right) \phi\left(x^{(n)}\right)^{\mathrm{T}} \sum_{h=1}^{N} a_{ih}\, \phi\left(x^{(h)}\right) = \lambda_i \sum_{n=1}^{N} a_{in}\, \phi\left(x^{(n)}\right) \quad (21)

If we multiply both sides by ϕ(x^(l))^T from the left and use the kernel function k(x^(i), x^(j)) = ϕ(x^(i))^Tϕ(x^(j)), we obtain:

\frac{1}{N}\sum_{n=1}^{N} k\left(x^{(l)}, x^{(n)}\right) \sum_{h=1}^{N} a_{ih}\, k\left(x^{(n)}, x^{(h)}\right) = \lambda_i \sum_{n=1}^{N} a_{in}\, k\left(x^{(l)}, x^{(n)}\right) \quad (22)
The matrix notation is as follows:   
K^{2} a_i = \lambda_i N K a_i, \quad (23)

where a_i = [a_{i1}, a_{i2}, …, a_{iN}]^T is an N-dimensional column vector, and K is the Gram matrix whose (i, j) element is K_ij = k(x^(i), x^(j)). The coefficients a_i can be determined by solving the following equation:

K a_i = \lambda_i N a_i \quad (24)
The normalization condition in the feature space for coefficient ai from Eqs. (20) and (24) is   
1 = v_i^{\mathrm{T}} v_i = \sum_{n=1}^{N}\sum_{h=1}^{N} a_{in} a_{ih}\, \phi\left(x^{(n)}\right)^{\mathrm{T}} \phi\left(x^{(h)}\right) = a_i^{\mathrm{T}} K a_i = \lambda_i N\, a_i^{\mathrm{T}} a_i. \quad (25)
Assuming that we have solved the eigenvalue problem, we can express the mapping to the principal components using a kernel function. The projection of a point x onto eigenvector v_i is expressed as follows:

y_i(x) = \phi(x)^{\mathrm{T}} v_i = \sum_{n=1}^{N} a_{in}\, \phi(x)^{\mathrm{T}} \phi\left(x^{(n)}\right) = \sum_{n=1}^{N} a_{in}\, k\left(x, x^{(n)}\right) \quad (26)

Because the mean of the mapped data is generally not zero, we consider the centered data ϕ̃(x^(n)):

\tilde{\phi}\left(x^{(n)}\right) = \phi\left(x^{(n)}\right) - \frac{1}{N}\sum_{l=1}^{N} \phi\left(x^{(l)}\right) \quad (27)
The corresponding elements of the Gram matrix are given by   
\tilde{K}_{nh} = \tilde{\phi}\left(x^{(n)}\right)^{\mathrm{T}} \tilde{\phi}\left(x^{(h)}\right) = k\left(x^{(n)}, x^{(h)}\right) - \frac{1}{N}\sum_{l=1}^{N} k\left(x^{(l)}, x^{(h)}\right) - \frac{1}{N}\sum_{l=1}^{N} k\left(x^{(n)}, x^{(l)}\right) + \frac{1}{N^{2}}\sum_{j=1}^{N}\sum_{l=1}^{N} k\left(x^{(j)}, x^{(l)}\right). \quad (28)
Putting this in matrix notation, the Gram matrix K ˜ can be written as   
\tilde{K} = H_N K H_N, \quad (29)

where H_N is the centering matrix. Therefore, K̃ can be expressed using only the kernel function, and the eigenvalue equation can be solved by replacing the data inner products in PCA with the kernel function, as in the following formula:

H_N K H_N v = \lambda v \quad (30)
In this study, we used the following Gaussian kernel (radial basis function kernel):   
k(x, y) = \exp\left( -\frac{\|x-y\|^{2}}{2\sigma^{2}} \right) \quad (31)
Suppose we solve Eq. (30) to determine the eigenvectors corresponding to the top m eigenvalues. Then, the anomaly score can be calculated in the same way as in the PCA using Eq. (16).
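A feature-space version of this reconstruction distance can be sketched as follows; the kernel width σ and the number of components m are illustrative choices, and the centering of the test kernel follows Eqs. (27)–(29). The data are synthetic: one "normal" cluster plus an outlier.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel of Eq. (31)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_pca_anomaly_scores(X_train, X_test, m, sigma):
    N, T = len(X_train), len(X_test)
    K = rbf_kernel(X_train, X_train, sigma)
    H = np.eye(N) - np.ones((N, N)) / N          # centering matrix H_N
    ell, V = np.linalg.eigh(H @ K @ H)           # eigenvalue equation, Eq. (30)
    ell, V = ell[::-1][:m], V[:, ::-1][:, :m]    # top-m eigenvalues/vectors
    A = V / np.sqrt(ell)                         # normalization of Eq. (25)
    Kt = rbf_kernel(X_test, X_train, sigma)
    oN, oT = np.ones((N, N)) / N, np.ones((T, N)) / N
    Kt_c = Kt - oT @ K - Kt @ oN + oT @ K @ oN   # centered test kernel
    Y = Kt_c @ A                                 # component scores, Eq. (26)
    # squared feature-space distance to the normal subspace (Eq. (16) analogue)
    k_self = 1.0 - 2.0 * Kt.mean(axis=1) + K.mean()
    return k_self - (Y ** 2).sum(axis=1)

rng = np.random.default_rng(2)
X_train = 0.3 * rng.normal(size=(200, 2))        # a single "normal" cluster
X_test = np.array([[0.0, 0.1], [5.0, 5.0]])      # a normal point and an outlier
scores = kernel_pca_anomaly_scores(X_train, X_test, m=5, sigma=1.0)
```

The outlier has almost no overlap with the training kernel and therefore a large residual in feature space, while the in-cluster point is well reconstructed.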

3.4. Autoencoder

An autoencoder,25) an algorithm for dimensionality reduction using neural networks, was used to evaluate the degree of rail damage by constructing a model from the normal region of the cracked rail only and calculating the anomaly score at each measurement point.

An autoencoder is a neural network in which the numbers of artificial neurons in the input and output layers are the same, trained so that the output layer reproduces the input data. Usually, there is one middle layer consisting of fewer neurons than the input and output layers, which reduces the dimensionality of the input data.

The activation a_i^(l) of the ith neuron in layer l is expressed using the outputs z_j^(l−1) of layer l−1 as follows:

a_i^{(l)} = h\left( \sum_{j=1}^{n} W_{ij}^{(l-1)} z_j^{(l-1)} + b_i^{(l-1)} \right), \quad (32)

where W_ij^(l−1) is the weight and b_i^(l−1) is the bias of layer l−1, and h(·) is the activation function (Fig. 1). In this study, the activation function of the middle layer was the sigmoid function

h(x) = \frac{1}{1+\exp(-x)}. \quad (33)

The activation function of the output layer was the identity function h(x) = x.
Fig. 1.

Architecture of an autoencoder.

When training the autoencoder, the parameters W_ij and b_i are learned so as to minimize the following binary cross-entropy loss function:

L = -\frac{1}{N}\sum_{i=1}^{N} \left( y_i \ln p_i + \left(1-y_i\right) \ln\left(1-p_i\right) \right) \quad (34)

Here, N is the number of data points, y_i is the ground-truth value of the ith data point, and p_i is the corresponding predicted value. The test data were then input to the trained model, and the squared Mahalanobis distance of Eq. (5), applied to the difference x′ between the input and the output, was evaluated as the anomaly score:

a(x') = \left(x'-\hat{\mu}\right)^{\mathrm{T}} \hat{\Sigma}^{-1} \left(x'-\hat{\mu}\right), \quad (35)

where μ̂ is the sample mean of the test data, and Σ̂^(−1) is the inverse of the covariance matrix of the test data.
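A minimal NumPy sketch of such an autoencoder follows, with a sigmoid middle layer and an identity output layer as in Fig. 1. The data are a synthetic stand-in for the standardized rail measurements, and the paper's binary cross-entropy loss (Eq. (34)) is replaced here by squared error, since these toy inputs are not in (0, 1); both choices train the same reconstruction behavior.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy "normal" data near a low-dimensional structure in 3-D (assumed data,
# standing in for the standardized 90-dimensional measurements per line)
t = rng.uniform(-1.0, 1.0, size=(400, 1))
X = np.hstack([t, t ** 2, -t]) + 0.05 * rng.normal(size=(400, 3))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))              # Eq. (33)

n_in, n_hid = 3, 2                               # middle layer is narrower
W1 = rng.normal(scale=0.5, size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.5, size=(n_hid, n_in)); b2 = np.zeros(n_in)

lr = 0.1
for _ in range(3000):                            # plain batch gradient descent
    Z = sigmoid(X @ W1 + b1)                     # middle layer, Eq. (32)
    Y = Z @ W2 + b2                              # identity output layer
    E = (Y - X) / len(X)                         # gradient of mean squared error
    gW2, gb2 = Z.T @ E, E.sum(axis=0)
    dZ = (E @ W2.T) * Z * (1.0 - Z)              # back-propagate through sigmoid
    gW1, gb1 = X.T @ dZ, dZ.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

def reconstruction_residuals(x):
    return sigmoid(x @ W1 + b1) @ W2 + b2 - x    # output minus input

# Eq. (35): Mahalanobis-type score on the reconstruction residuals
R = reconstruction_residuals(X)
mu_r = R.mean(axis=0)
Sinv = np.linalg.inv(np.cov(R.T) + 1e-6 * np.eye(n_in))  # small ridge added

def anomaly_score(x):
    d = reconstruction_residuals(x) - mu_r
    return np.einsum("ni,ij,nj->n", d, Sinv, d)
```

A point far from the training distribution cannot be reproduced through the narrow middle layer, so its residual, and hence its score, is large.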

4. Anomaly Detection of the Rail

4.1. X-ray Residual Stress Measurement of the Rail

Detailed X-ray residual stress measurements were retaken on a cracked rail used by Mitsui et al.13) to validate the damage evaluation of metal by anomaly detection using PCA, kernel PCA, and autoencoder; the methods are described below.

A total of 1719 data points were measured on the cracked rail, which was cut to a length of 200 mm: nine points at 5-mm intervals over 40 mm in the shorter direction and 191 lines at 1-mm intervals over 190 mm in the longer direction. The exposure time of the X-ray detector was 100 ms/frame, and the diffraction ring image was acquired by integrating 10 frames. As it took approximately 26 ms to transfer one frame, the measurement time for one point was approximately 1.26 s. The detector was in contact with a heat sink and kept at 15°C by cooling water. The measurement was carried out in an X-ray-shielded box, which also served as a light shield, and the temperature inside the box was approximately 30°C. The detector was fully depleted by a bias voltage of 70 V; therefore, more than 99% of the 5.4 keV Cr-Kα characteristic X-rays used in the measurement were detectable. The X-ray tube was operated at 20 kV and 4 mA. The measurement system was calibrated as in Reference 13 using commercial zero-stress and high-stress standard test pieces.

The measurements were performed in two directions, ψ0 = 0° and 35° from the vertical of the rail head, and the x-axis component of the residual stress σx, the shear stresses τxy, τxz, and τyz, and the FWHM of the diffraction ring were measured. There were nine measurement points per line in the shorter direction, and each measurement point had 10 measured quantities (the x-axis component of the residual stress σx and its error, the shear stresses τxy, τxz, and τyz and their errors, and the FWHM of the diffraction ring from the two directions), for a total of 90 dimensions of data per line. Here, the longer direction was defined as the X-direction, the shorter direction as the Y-direction, and the height direction of the rail as the Z-direction. Of the 191 lines of measurement data, thirty lines with visible cracks (70–85 mm and 103–116 mm) were identified as abnormal areas. The measured data were standardized to a mean of 0 and a standard deviation of 1 for each parameter. Ninety-five lines from 0 to 49 mm and from 146 to 190 mm, which did not contain any damaged areas, were used as training data, and the accuracy of each model was verified using the anomaly scores of the lines from 50 to 145 mm as test data.
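The standardization and train/test split described above can be sketched as follows; the array of measurements is a random placeholder with only the stated shape (191 lines × 90 dimensions) taken from the text.

```python
import numpy as np

# Assumed layout: one 90-dimensional feature vector per 1-mm line along the
# rail (191 lines); random placeholders stand in for the real measurements.
rng = np.random.default_rng(4)
data = rng.normal(loc=5.0, scale=2.0, size=(191, 90))

# Standardize each of the 90 parameters to mean 0 and standard deviation 1
data_std = (data - data.mean(axis=0)) / data.std(axis=0)

# Lines 0-49 mm and 146-190 mm (no damage) form the training data;
# lines 50-145 mm form the test data containing the cracked regions
train = np.vstack([data_std[0:50], data_std[146:191]])
test = data_std[50:146]
```

This yields the 95 training lines and 96 test lines described in the text.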

Figure 2 shows the cracked rail sample and the FWHM distribution of the diffraction rings at ψ0 = 35°. The distributions of σx at ψ0 = 35°, τxy at ψ0 = 35°, FWHM of the diffraction rings at ψ0 = 0°, τxz at ψ0 = 0°, and τyz at ψ0 = 0° are shown in Figs. 3, 4, 5, 6, 7, respectively. The FWHM of the diffraction rings and τxz are larger at the upper edge of the rail. This is because of WEL formation due to martensitic transformation caused by frictional heat from rail–wheel contact. WEL is observed as a widening FWHM of the diffraction rings due to high dislocation density, which was confirmed in a previous study.1) Plastic flow is known to occur on the rail surface,26,27,28,29) and it is reported to be consistent with the distribution of τxz.30)

Fig. 2.

Cracked rail sample (upper) and distribution of FWHM of diffraction rings at ψ0 = 35° (lower). (Online version in color.)

Fig. 3.

Distribution of σx at ψ0 = 35°. (Online version in color.)

Fig. 4.

Distribution of τxy at ψ0 = 35°. (Online version in color.)

Fig. 5.

Distribution of FWHM of diffraction rings at ψ0 = 0°. (Online version in color.)

Fig. 6.

Distribution of τxz at ψ0 = 0°. (Online version in color.)

Fig. 7.

Distribution of τyz at ψ0 = 0°. (Online version in color.)

The FWHM of the diffraction rings are smaller around the cracks, which is caused by a type of softening phenomenon due to the repeated passage of trains after crack initiation. The σx is lower at the cracks due to the release of residual stresses. The purpose of this method is to detect these changes around the crack in the magnitude of the anomaly score.

4.2. Comparison of Model Accuracy

The anomaly score distribution of each model by PCA, kernel PCA, and autoencoder when the data is compressed to two dimensions is shown in Fig. 8.

Fig. 8.

Anomaly score distribution when data is compressed to two dimensions.

A higher anomaly score in this figure is assumed to indicate more damage in that location. In addition, by setting a threshold value, areas where the anomaly score exceeds an arbitrary predefined value can be evaluated as damaged areas. The accuracy of each model can be evaluated by comparing the actual crack locations and the locations evaluated as abnormal, adjusting the threshold of the anomaly score as needed. Then, a confusion matrix, as shown in Fig. 9, is created with each value defined by Eqs. (36), (37), (38), (39).   

\mathrm{Sensitivity} = \frac{TP}{TP + FN} \quad (36)

\mathrm{Specificity} = \frac{TN}{FP + TN} \quad (37)

\mathrm{Positive\ predictive\ value} = \frac{TP}{TP + FP} \quad (38)

\mathrm{Negative\ predictive\ value} = \frac{TN}{FN + TN} \quad (39)
The change in the true positive rate against the false positive rate is called the receiver operating characteristic (ROC) curve, as shown in Fig. 10, and we can compare the performance of the models by comparing the area under the ROC curve (AUROC), i.e., the area of the lower region of the ROC curve. The higher the true positive rate and the lower the false positive rate, the larger the AUROC and the better the performance of the model. The ROC curves for each model in the two dimensions are shown in Fig. 11. The AUROC for the autoencoder was 0.803, which was larger than the 0.650 for PCA and 0.666 for Kernel PCA. The results of the AUROC comparison are shown in Fig. 12 when the number of dimensions after compression for each model is changed from 2 to 89. The autoencoder, using L1 regularization to remove the unnecessary variables and L2 regularization to prevent overfitting,31) shows the highest AUROC, regardless of the dimension. In particular, it can detect anomalies with high accuracy even in low dimensions (such as two dimensions). In addition, the autoencoder and Kernel PCA, which are nonlinear transformations, exhibit relatively higher AUROCs than PCA, which is a linear transformation. In other words, nonlinear transformation is more efficient than linear transformation for dimensionality reduction without losing the original information. However, if the crack on the rail surface spreads horizontally inside the rail, the location of the crack and the measured anomaly may differ. For a more detailed comparison, a larger amount of data should be analyzed.
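The AUROC used for this comparison can be computed directly from anomaly scores and crack labels; the sketch below uses the rank-statistic (Mann-Whitney U) formulation, which equals the area under the ROC curve, with toy scores and labels.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the rank statistic: the probability that a randomly chosen
    abnormal sample scores higher than a randomly chosen normal sample."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()   # abnormal outscores normal
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count half
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: two normal lines (label 0) and two abnormal lines (label 1)
scores = np.array([0.10, 0.40, 0.35, 0.80])
labels = np.array([0, 0, 1, 1])
```

Here one of the four abnormal/normal pairs is ranked incorrectly, so the AUROC is 0.75; a model that ranks every abnormal line above every normal line reaches 1.0.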
Fig. 9.

Confusion matrix.

Fig. 10.

ROC curve.

Fig. 11.

ROC curve in two dimensions for each model.

Fig. 12.

Dimension dependence of AUROC for each model.

5. Conclusion

Using the developed X-ray residual stress measurement system, the residual stresses and FWHM of the diffraction rings on the surface of the rail with cracks were measured, and anomaly detection was performed using PCA, kernel PCA, and autoencoder. When the accuracies of the models were compared, anomaly detection by machine learning with autoencoder showed the best performance. This research will enable quantitative damage evaluation of rails, which has been an ongoing challenge, and efficient maintenance. However, because rail usage varies by region, practical application will require on-site evaluation.

In the future, we plan to improve the accuracy of anomaly detection by analyzing a larger amount of measurement data and using more advanced machine learning, such as convolutional autoencoders. Furthermore, this analysis method can also be used for anomaly detection by combining measurement data from ultrasonic inspection and visible light images. In addition, because this method can be applied to anomaly detection not only in rails but also in various mechanical metal parts, such as bearings, we can verify its applicability to a variety of objects.

Acknowledgements

This work was supported by an ISIJ Research Promotion Grant, the Adaptable and Seamless Technology Transfer Program through Target-driven R&D from the Japan Science and Technology Agency, and JSPS KAKENHI Grant Numbers 21K03830, 16H02309, and 16K14145. The design of INTPIX4 was supported by the Systems Design Lab (d.lab), the University of Tokyo, in collaboration with Cadence Design Systems, Inc., Synopsys, Inc., and Mentor Graphics, Inc.

References
 
© 2023 The Iron and Steel Institute of Japan.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs license.
https://creativecommons.org/licenses/by-nc-nd/4.0/