Estimating the S-N Curve by Machine Learning Random Forest Method

Nobuo Nagashima; Masao Hayakawa; Hiroyuki Masuda; Kotobu Nagai

doi:10.2320/matertrans.MT-Z2023006

Abstract

The fatigue limit is well predicted by tensile strength or hardness, and the relationship is often analyzed by linear regression using the least-squares approximation. However, the prediction of the number of cycles to failure at a given stress amplitude, that is, the estimation of the S–N curve, has not been realized. Therefore, we investigated whether the S–N curve can be estimated using the random forest method based on the data described in the NIMS fatigue data sheets. The random forest method is an ensemble machine learning algorithm that integrates multiple decision tree models as weak learners to improve generalization ability. Machine learning with multiple decision tree models was found to predict the fatigue limit very well. The S–N curve can be accurately estimated by combining the prediction of the fatigue limit with the prediction of the number of cycles to failure at a given stress amplitude.

This Paper was Originally Published in Japanese in J. Soc. Mater. Sci., Japan 70 (2021) 876–880.


1. Introduction

NIMS has accumulated fatigue test data of various structural materials for approximately 40 years and publishes them as the NIMS fatigue data sheets (FDS).¹⁾ From these FDS, it is empirically known that the fatigue limit (i.e., the fatigue strength at 10⁷ cycles) correlates with other mechanical properties (Fig. 1²⁾). In addition to the fatigue limit, estimation of the fatigue strength (S–N curve) has been attempted by normalizing the stress amplitude by the tensile strength.³⁾ Table 1 lists the index properties of fatigue. In Table 1, fatigue is first classified into high- and low-cycle fatigue according to the life range. The high-cycle fatigue strength property is generally expressed by the σ_a–N_f curve, which is the relationship between stress amplitude and life. In this case, the index is a strength property, with the tensile strength σ_B as the static index and the cyclic yield stress σ_yc as the dynamic index. The reasons for this are described later. Conversely, the low-cycle fatigue strength property is represented by the relationship between strain and life, ε_a–N_f. Therefore, the deformation characteristic is considered an index. In this case, the static index is the rupture ductility ε_f and the dynamic index is the exponent n′ of the cyclic stress–strain curve.³⁾ It is empirically known that an excellent correlation exists between the tensile strength σ_B and the fatigue limit σ_w. The correlation between the yield stress σ_y (or 0.2% proof stress σ_0.2) and σ_w has also been investigated, but it is not as strong as the σ_B–σ_w relationship because σ_y is affected by an instability phenomenon called yielding. However, a linear relationship is established between the cyclic yield stress σ_yc and σ_w because σ_yc corresponds to the internal microstructure after it has reached a steady state through repeated plastic deformation.
Thus, it is reasonable to adopt the tensile strength σ_B as a static index of high-cycle fatigue strength and the cyclic yield stress σ_yc as a dynamic index. The dynamic index should essentially be adopted because fatigue is caused by repeated plastic strain, but there are barriers to adopting σ_yc: it must be measured by a strain-controlled test using the companion specimen method or the incremental step method,⁴⁾ and measurement data are not plentiful. As shown in Fig. 2, the two index properties σ_B and σ_yc are proportional, so we believe it is acceptable to use the static index for practical purposes. Figure 3 shows the S–N data with the stress amplitude normalized by σ_B (σ_a/σ_B–N_f). However, the normalized data as a whole form a wide band, which does not allow an accurate estimation. Therefore, we attempted to estimate the S–N curve (relationship between stress amplitude and fatigue life) through machine learning.
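The normalization by tensile strength described above can be sketched in a few lines of Python. This is only an illustration: the function name `normalize_sn` and the numerical values are hypothetical and are not taken from the FDS.

```python
# Hypothetical S-N records for one material:
# (stress amplitude sigma_a in MPa, cycles to failure N_f).
# The numbers are illustrative only and are not FDS data.
records = [(400.0, 2.0e4), (300.0, 3.0e5), (250.0, 4.0e6)]


def normalize_sn(records, tensile_strength):
    """Return (sigma_a / sigma_B, N_f) pairs for one material."""
    if tensile_strength <= 0:
        raise ValueError("tensile strength must be positive")
    return [(sigma_a / tensile_strength, n_f) for sigma_a, n_f in records]


# Assumed tensile strength sigma_B = 500 MPa (hypothetical value).
normalized = normalize_sn(records, tensile_strength=500.0)
for ratio, n_f in normalized:
    print(f"sigma_a/sigma_B = {ratio:.2f}, N_f = {n_f:.1e}")
```

Plotting such normalized pairs for many materials on one graph gives the kind of band shown in Fig. 3.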

Fig. 1

Relationship between mechanical properties and fatigue limit. (a) versus Vickers Hardness (b) versus tensile strength.

Table 1 Index property of fatigue.

Fig. 2

Relationship between tensile strength and cyclic yield stress.

Fig. 3

S–N curves normalized by tensile strength.

The random forest method is an algorithm in machine learning. It is an ensemble learning algorithm that improves generalization ability by integrating weak learners of multiple decision tree models and is mainly used for classification (discrimination) and regression (estimation) applications. The key issues are (1) whether more accurate data can be sampled for the target data population and (2) whether decision tree models can be created for each training component. In conventional mathematical model regression, the regression is based on the least-squares approximation to find the correlation between two data sets of interest. However, machine learning can create a regression model that relates multiple decision tree models of the learning elements, which is expected to provide a more accurate estimation.
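The ensemble idea can be illustrated with a minimal, standard-library-only sketch: several regression trees are grown on bootstrap samples and their predictions are averaged. This is not the implementation used in this study. All function names are hypothetical, the training data are synthetic (a made-up linear trend between hardness and fatigue limit with noise), and the random selection of feature subsets at each split, which a full random forest usually includes, is omitted for brevity.

```python
import random
from statistics import mean


def build_tree(rows, depth=0, max_depth=4, min_leaf=2):
    """Grow a regression tree from (features, target) rows."""
    targets = [target for _, target in rows]
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return mean(targets)  # leaf: predict the average target
    best = None  # (squared error, feature index, threshold)
    for j in range(len(rows[0][0])):
        for threshold in sorted({features[j] for features, _ in rows}):
            left = [t for f, t in rows if f[j] <= threshold]
            right = [t for f, t in rows if f[j] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            m_left, m_right = mean(left), mean(right)
            error = (sum((t - m_left) ** 2 for t in left)
                     + sum((t - m_right) ** 2 for t in right))
            if best is None or error < best[0]:
                best = (error, j, threshold)
    if best is None:
        return mean(targets)  # no admissible split was found
    _, j, threshold = best
    left_rows = [row for row in rows if row[0][j] <= threshold]
    right_rows = [row for row in rows if row[0][j] > threshold]
    return (j, threshold,
            build_tree(left_rows, depth + 1, max_depth, min_leaf),
            build_tree(right_rows, depth + 1, max_depth, min_leaf))


def predict_tree(node, features):
    """Follow the splits until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        j, threshold, left, right = node
        node = left if features[j] <= threshold else right
    return node


def fit_forest(rows, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(build_tree(sample))
    return forest


def predict_forest(forest, features):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, features) for tree in forest)


# Synthetic data (assumption): fatigue limit = 1.6 * HV plus noise, in MPa.
noise = random.Random(42)
rows = [((float(hv),), 1.6 * hv + noise.gauss(0.0, 5.0))
        for hv in range(120, 400, 10)]
forest = fit_forest(rows)
estimate = predict_forest(forest, (250.0,))
print(f"estimated fatigue limit at HV 250: {estimate:.0f} MPa")
```

Additional learning elements, such as tensile strength or elongation, would simply become further entries in each feature tuple.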

In this study, we applied the random forest method to the experimental data available from the NIMS FDS and examined whether the estimation accuracy of the fatigue limit could be improved. Next, the possibility of estimating the S–N curve was examined by predicting the fatigue strength below 10⁶ cycles using the same method.

2. Analysis Method

The data population for estimating fatigue limits was based on the experimental data of S25C (FDS No. 1) and S55C (FDS No. 4) obtained by rotating bending fatigue tests. The random forest method was used to examine the effect of each learning element. Next, fatigue limit data from torsional fatigue tests were added to the data population to study the effect of different fatigue test methods. Furthermore, the effect of stress ratio was examined by adding the axial loading fatigue test data with stress ratios R = 0 and R = −1. On the basis of the preceding results, the estimation accuracy was then examined using fatigue data for different types of steels: S35C (FDS No. 2), SNCM439 (FDS No. 25), SmN438 (FDS No. 16), SmN443 (FDS No. 17), SUS403 (FDS No. 30), and SUS304 (FDS No. 33), in addition to S25C and S55C. Next, fatigue life estimation was attempted using fatigue data of 10⁶ cycles or less for various steels. Finally, the estimation of the S–N curve was attempted for S45C (FDS No. 3) tempered at 550°C, Heat A, by predicting the fatigue strength under 10⁶ cycles for each stress amplitude. Until now, “elongation” and “reduction of area” have received little attention in estimating fatigue limits because they correlate well with tensile strength and hardness. However, in the finite life range of the S–N curve, especially in the low-cycle range of short life, rupture ductility is an indicator of low-cycle fatigue, so a decision tree model was adopted to relate tensile strength, hardness, elongation, and reduction of area.

A commercially available personal computer was used for machine learning, together with Python 3.6.1,⁵⁾ which is freely available for download, and the Anaconda distribution.⁶⁾

The target data were the fatigue test results described in the FDS. To keep the analysis unbiased, the data were randomly split each time into 80% training data and 20% test data. Consequently, it cannot be determined which data served as the test data. The mean absolute percentage error (MAPE) was obtained from the test data as one of the evaluation results of the analysis.
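A random 80/20 split of this kind can be sketched as follows. The helper `split_train_test` is a hypothetical illustration, not the authors' code, and the integer records merely stand in for FDS test results.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=None):
    """Shuffle the records and return (training data, test data).

    With seed=None a different split is drawn on every call, so the
    test data cannot be identified afterwards.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for 218 fatigue limit records (placeholder values).
records = list(range(218))
train, test = split_train_test(records, seed=1)
print(len(train), "training records,", len(test), "test records")
```

Passing a fixed `seed` makes the split reproducible, which is convenient when comparing models on the same test data.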

\begin{equation} \text{MAPE (\%)} = \frac{100}{N} \sum_{i = 1}^{N}\left| \frac{\hat{y}_{i} - y_{i}}{y_{i}} \right| \tag{1} \end{equation}

where $\hat{y}_{i}$ is the value estimated by the model, $y_{i}$ is the corresponding measured value in the test data, and $N$ is the number of test data.

The root mean square error (RMSE), mean squared error (MSE), and coefficient of determination (R²) are commonly used as indicators to evaluate the fit accuracy of a model obtained in regression analysis. However, with the RMSE and MSE error functions, positive and negative deviations are summed, which was considered to cancel out in the mean error. Conversely, MAPE uses absolute values and can therefore localize discrepancies in the prediction data. Problems with MAPE arise when the measured value is zero or the prediction is too small. Additionally, without cross-validation and grid search, biased conclusions may be obtained. However, for all predictions, a diagram relating experimental and predicted values, as shown in Fig. 4, was drawn and inspected visually, which is considered a substitute for cross-validation and grid search. For these reasons, we considered it appropriate to use MAPE rather than RMSE and MSE as the error function in this study.
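As a concrete illustration of eq. (1), MAPE can be computed as in the following sketch. The function name `mape` and the numbers are hypothetical. The guard against zero measured values reflects the limitation noted above.

```python
def mape(measured, predicted):
    """Mean absolute percentage error in %, relative to the measured values."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in measured):
        raise ValueError("MAPE is undefined when a measured value is zero")
    total = sum(abs((p - y) / y) for y, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Hypothetical fatigue limits in MPa (not FDS data).
measured = [200.0, 400.0]
predicted = [210.0, 380.0]
error = mape(measured, predicted)
print(f"MAPE = {error:.2f} %")
```

In this example each prediction deviates by 5% from its measured value, so the mean of the two absolute percentage errors is also 5%.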

Fig. 4

Relationship between fatigue limit by AI prediction and fatigue limit by experiment using 10⁷ times unbroken data of rotational bending fatigue test and torsional fatigue test of S25C and S55C. (a) Prediction using a decision tree model for HV only. (b) Prediction using a decision tree model for HV and test method.

3. Analysis Results and Discussion

3.1 Fatigue limit estimation by machine learning

3.1.1 Fatigue limit analysis of S25C and S55C under rotating bending tests

Using the data from the S25C and S55C rotating bending fatigue tests (total = 218), four decision tree models were created, one for each learning factor: Vickers hardness, tensile strength, elongation, and reduction of area. Table 2 shows the results. The MAPE for Vickers hardness and for tensile strength is below 2%, signifying a high estimation accuracy. These results confirm by machine learning the excellent correlation of hardness and tensile strength with the fatigue limit shown in FDS No. 5 (Fig. 1), and the estimation accuracy is much improved.

Table 2 Analysis results by machine learning.

3.1.2 Influence of test method

Torsion test data were added to the rotating bending test data for S25C and S55C used in Section 3.1.1 (total = 279). The test method was added as a learning element. The analysis results are shown in Table 3 and Fig. 4. The fatigue limit estimated using only the Vickers hardness in Fig. 4(a) had a MAPE of approximately 12%. In contrast, the MAPE of the fatigue limit estimated from the regression model that links the decision tree models of Vickers hardness and test method in Fig. 4(b) is 2.23%, a dramatic improvement in estimation accuracy. This result indicates that the regression model by machine learning, which can relate multiple learning factors, is effective for fatigue limit estimation.
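Why the test method helps can be seen in a toy calculation. In the sketch below, each record is predicted by the mean of the records that share the same learning elements, which mimics the leaf of a fully grown decision tree. The records are hypothetical (not FDS data), the helper `group_mean_mape` is an illustrative name, and the error is evaluated on the same data used to form the means.

```python
from collections import defaultdict

# Hypothetical records: (Vickers hardness, test method, fatigue limit in MPa).
records = [
    (150, "rotating bending", 240.0),
    (150, "torsion", 150.0),
    (250, "rotating bending", 400.0),
    (250, "torsion", 250.0),
]


def group_mean_mape(records, key):
    """MAPE (%) when each record is predicted by the mean of its group."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[2])
    total = 0.0
    for record in records:
        values = groups[key(record)]
        prediction = sum(values) / len(values)
        total += abs((prediction - record[2]) / record[2])
    return 100.0 * total / len(records)


# Grouping by hardness only mixes both test methods in one group.
mape_hv_only = group_mean_mape(records, key=lambda r: r[0])
# Grouping by hardness and test method separates them.
mape_hv_and_method = group_mean_mape(records, key=lambda r: (r[0], r[1]))
print(f"HV only: {mape_hv_only:.1f} %, HV and test method: {mape_hv_and_method:.1f} %")
```

With hardness alone, specimens of equal hardness tested by different methods receive the same prediction, so a large error is unavoidable. Once the test method is available as a second element, the two populations are separated.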

Table 3 Analysis results by machine learning.

3.1.3 Effect of stress ratio

Axial loading test data (stress ratios R = 0 and −1) were added to the rotating bending and torsion test results for S25C and S55C used in Section 3.1.2 (total = 306), and a decision tree model of the stress ratio was added. The analysis results are shown in Table 3 and Fig. 5. The MAPE of the fatigue limit estimated only from the tensile strength and test method in Fig. 5(a) was 3.02%. The MAPE of the fatigue limit estimated from the regression model with three learning factors, based on the decision tree models of stress ratio, tensile strength, and test method in Fig. 5(b), is 2.35%, which is an improvement in estimation accuracy.

Fig. 5

Relationship between fatigue limit by AI prediction and fatigue limit by experiment using 10⁷ times unbroken data (total 306) of axial load test (R = 0, −1) for rotating bending fatigue test and torsion fatigue test of S25C and S55C. (a) Prediction by tensile strength and test method. (b) Prediction by tensile strength, test method and stress ratio.

3.1.4 Influence of various steel data

Fatigue limit data (total = 892) from rotating bending fatigue tests of S35C, SNCM439, SmN438, SmN443, SUS403, and SUS304 were added to the fatigue test results of S25C and S55C. The analysis results are shown in Table 3 and Fig. 6. The MAPE of the fatigue limit estimated from the regression model linking the hardness and decision tree model of the test method was 2.94%, which is a high estimation accuracy.

Fig. 6

Prediction of 10⁷ times fatigue limit using data of S25C, S35C, S55C, SNCM439, SmN438, SmN443, SUS403, SUS304.

3.2 Estimation of fatigue strength below 10⁶ times by machine learning

Decision tree models of Vickers hardness, tensile strength, reduction of area, and elongation were developed by restricting the analysis to the S25C and S55C fatigue data (total = 515) of 10⁶ cycles or less, and the fatigue strength of 10⁶ cycles or less was estimated by relating all of these learning elements. The results of the analysis are shown in Table 4 and Fig. 7. The regression model linking the decision tree models of Vickers hardness, tensile strength, elongation, and reduction of area showed a high estimation accuracy of 92.0% for the training data, but only 65.8% for the randomly selected test data, with a MAPE of 38.7%. This is thought to be because the training data distinguish between the S25C and S55C fatigue data, resulting in fatigue strength estimates closer to the original data. Conversely, since the test data are extracted randomly, S25C and S55C, which have different fatigue strengths, are not distinguished, and the estimated data vary. Because the data are extracted at random, it is unknown which data correspond to S25C and which to S55C, but they probably correspond to the band indicated by the circle in the figure.

Table 4 Analysis results by machine learning.

Fig. 7

Prediction result of fracture life using data of S25C and S55C (only data with fracture life of 10⁶ times or less is used).

Next, the fatigue strength was estimated by linking the decision tree models of Vickers hardness, tensile strength, elongation, and reduction of area using a total of 2478 fatigue data (10⁶ cycles or less) for different types of steels (S25C, S55C, S35C, SNCM439, SmN438, SmN443, SUS403, and SUS304). The results are shown in Table 4 and Fig. 8. The MAPE was 29.8%, which is more accurate than the result for the two steel grades, S25C and S55C, shown in Fig. 7. This improvement is attributed to the increase in the total number of data by a factor of approximately five compared with Fig. 7, and a further improvement in estimation accuracy can be expected with more experimental data in the future.

Fig. 8

Prediction result of fracture life using data of S25C, S35C, S55C, SNCM439, SmN438, SmN443, SUS403, SUS304 (only data with fracture life of 10⁶ times or less is used).

3.3 Estimation of the S–N curve of S45C steel

The relationship between stress amplitude and fatigue life in the range under 5 × 10⁶ cycles was obtained by machine learning using the decision tree models of Vickers hardness, tensile strength, elongation, and reduction of area, based on the fatigue data (total = 2834) under 5 × 10⁶ cycles for different types of steels (see Table 4). The fracture life at each stress amplitude was then estimated from the mechanical properties of S45C. Additionally, the fatigue limit at 2.12 × 10⁷ cycles was obtained from the Vickers hardness of S45C based on the relationship between Vickers hardness and the fatigue limit at 2.12 × 10⁷ cycles obtained by machine learning from various steel materials. The analysis results are shown in Fig. 9. Experimental and estimated data are indicated by △ and ●, respectively. First, the fatigue limit estimated from the Vickers hardness agreed very well with the experimental value, consistent with the estimation accuracy of 99% and MAPE of 1.76% shown in Table 2. The estimated fatigue strength below 5 × 10⁶ cycles is also in good agreement, even though the MAPE is 29.8%. Unlike the cases shown in Figs. 7 and 8, the test data did not include steel grades with different strengths; thus, the predictions did not scatter. Moreover, the estimation of the S–N curve by machine learning was highly accurate when the fatigue strength and the fatigue limit were estimated separately. This result reveals that the S–N curve can be approximated by utilizing the accumulated experimental data in the FDS.
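One way to read this two-part procedure is sketched below: a finite-life predictor supplies the number of cycles for stress amplitudes above the predicted fatigue limit, and amplitudes at or below the limit are treated as run-outs. Everything here is an assumption for illustration. The function `assemble_sn_curve`, the run-out value of 10⁷ cycles, the power-law stand-in for the trained finite-life model, and all numbers are hypothetical and do not come from the FDS or from the models trained in this study.

```python
def assemble_sn_curve(stress_amplitudes, predict_life, fatigue_limit,
                      runout_cycles=1.0e7):
    """Combine a finite-life predictor and a fatigue limit into S-N points.

    Returns (stress amplitude, cycles) pairs sorted from high to low stress.
    """
    curve = []
    for sigma_a in sorted(stress_amplitudes, reverse=True):
        if sigma_a > fatigue_limit:
            # Finite-life range: use the life predictor, capped at run-out.
            cycles = min(predict_life(sigma_a), runout_cycles)
        else:
            # At or below the predicted fatigue limit: treated as run-out.
            cycles = runout_cycles
        curve.append((sigma_a, cycles))
    return curve


def toy_life_model(sigma_a):
    """Placeholder power law standing in for a trained finite-life model."""
    return 1.0e4 * (500.0 / sigma_a) ** 10


# Hypothetical fatigue limit of 280 MPa and stress amplitudes in MPa.
curve = assemble_sn_curve([500.0, 400.0, 300.0, 250.0],
                          toy_life_model, fatigue_limit=280.0)
for sigma_a, cycles in curve:
    print(f"sigma_a = {sigma_a:.0f} MPa, N = {cycles:.2e}")
```

In practice, `toy_life_model` would be replaced by the model trained on the finite-life data, and `fatigue_limit` by the value predicted from hardness.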

Fig. 9

Prediction of S–N curve of fracture life using data of S25C, S35C, S55C, SNCM439, SmN438, SmN443, SUS403, SUS304 (data of fracture life of 5 × 10⁶ times or less, fatigue limit considers only hardness).

4. Conclusions

Using the experimental data provided in the NIMS FDS, we attempted to estimate the fatigue limit and the fatigue strength below 10⁶ cycles by the random forest method and examined the possibility of estimating the S–N curve. The results obtained are as follows:

(1) The regression model by machine learning, which can associate multiple learning factors, was superior in estimating the fatigue limit.
(2) The S–N curve could be estimated with high accuracy by estimating the fatigue strength and fatigue limit separately by machine learning.

REFERENCES

1) National Institute for Materials Science (NIMS): Fatigue data sheet (FDS), https://smds.nims.go.jp/fatigue.
2) S. Nishijima, A. Ishii, K. Kanazawa, S. Matsuoka and T. Masuda: “Standard fatigue characteristics of JIS machine structural steel”, NIMS Materials Strength Data Sheet Technical Document, No. 5 (1989).
3) S. Matsuoka, N. Nagashima and S. Nishijima: “Index property for the fatigue of engineering alloys”, NIMS Materials Strength Data Sheet Technical Document, No. 17 (1997).
4) JSMS Committee on Fatigue of Materials: Syoshinnsya no tameno hirousekkeihou, (The Society of Materials Science, Japan, 2004) p. 28.
5) Python 3.6.1 (https://www.python.org/).
6) Anaconda (https://www.anaconda.com/).
