Prediction of Ac3 and Martensite Start Temperatures by a Data-driven Model Selection Approach

Hoheok Kim; Junya Inoue; Masato Okada; Kenji Nagata

doi:10.2355/isijinternational.ISIJINT-2017-212

Abstract

Four different information criteria, which are widely used for model selection problems, are applied to reveal the explanatory variables for phase transformation temperatures of steels, austenitise temperature (Ac₃) and martensite-start temperature (Ms). Using existing datasets for CCT diagram for various steels, the predictive equations for these critical temperatures are derived. A number of empirical equations have been proposed to enable efficient prediction of the the Ac₃ and Ms temperatures of steels. However, the key parameters in those equations are usually chosen based on researchers’ trials and errors. In this study, the performance of the information criteria is evaluated first using a simulated dataset mimicking the characteristics of those for the Ac₃ and the Ms temperatures. Then the criteria are applied to the experimental data obtained from two different sources. The key parameters are chosen for the Ac₃ and Ms temperatures and the derived equations are found to be in better agreement with experimental data than the previous empirical equations. Thus, it was clarified that the methods can be applied to automatically discover the hidden mechanism from complex multi-dimensional datasets of steels’ chemical composition.

1. Introduction

Due to the importance of phase transformations and heat treatments on the mechanical properties of steels, a large number of studies have been conducted to clarify the effect of various alloying elements on phase transformation temperatures, such as the martensite-start temperature (Ms) and austenite transformation temperature (Ac₃). Therefore, various empirical models have been proposed and the most commonly used equations among those are listed in Table 1.^{1,2,3,4,5,6,7)} All these equations were derived by the multivariate regression analysis using large experimental datasets composed of several tens or hundreds of steels with various chemical compositions.

Table 1. A summary of the past equations for Ac₃ and Ms temperature based on chemical composition of steels.

Ac₃ temperature
Proposed by	Equation
Andrews (1965)	Ac₃(°C)=910−203C^1/2+44.7Si−15.2Ni+31.5Mo+104.4V+13.1W
Hougardy (1984)	Ac₃(°C)=902−255C+19Si−11Mn−5Cr+13Mo−20Ni+55V
Kasatkin et al. (1984)	Ac₃(°C)=912−370C−27.4Mn+27.3Si−6.35Cr−32.7Ni+95.2V+190Ti+72Al+65.6Nb+5.57W+332S+276P+485N−900B+16.2CMn+32.3CSi+15.4CCr+48CNi+4.32SiCr−17.3SiMo+18.6SiNi+4.8MnNi+40.5MoV+174C²+2.46Mn²−6.86Si²+0.322Cr²+9.9Mo²+1.24Ni²−60.2V²
Trzaska and Dobrza ski (2007)	Ac₃(°C)=973−224.5C^1/2−17Mn+34Si−14Ni+21.6Mo+41.8V−20Cu
Ms temperature
Proposed by	Equation
Payson and Savage (1944)	M_s(°C)=489.9−316.7C−33.3Mn−27.8C−16.7Ni−11.1(Si+Mo+W)
Grange and Stewart (1946)	M_s(°C)=537.8−361.1C−38.9(Mn+Cr)−19.4Ni−27.8Mo
Andrews (Linear, 1965)	M_s(°C)=539−423C−30.4Mn−17.7Ni−12.1Cr−7.5Mo
Andrews (Non linear, 1965)	M_s(°C)=512−453C−16.9Ni+15Cr−9.5Mo+217C²−71.5CMn−67.6CCr
Wang et al. (2000)	M_s(°C)=545−470.4C−3.96Si−37.7Mn−21.5Cr+38.9Mo

However, the key parameters in these equations, which are considered to have notable effects on the phase transformation temperatures, were selected mostly from the insight of the experienced researchers. As a result, there are several disagreements on how alloying elements affect phase transformation temperatures. For instance, the terms corresponding to the effect of carbon on Ac₃ are totally different between equations by Andrews and Hougardy.^1,2)

For reliable prediction, therefore, determination of the key parameters is the most important process, which unfortunately is not automatically done using the conventional multivariate regression analysis. Accordingly, several data-driven approaches have been introduced. For instance, a Bayesian neural network has been introduced to establish predictive models for Ac₃ by Vermeulen et al.⁸⁾ and Ms by Sourmail and Garcia-Mateo,⁹⁾ and it was demonstrated that this new data-driven approach enables both automatic model selection and provides superior estimation of those transformation temperatures.

Although the neural network approach does provide a good estimation, it does not allow the explicit separation of different roles of alloying elements. Of course, it is possible to incorporate the effect of each alloying elements from the output of the neural networks to a series of controlled inputs. However, from an engineering point of view, the explicit correlation is usually more important, so the previous empirical regression models are still widely used.^10,11)

Recently, another data-driven approach called as data driven model selection has been increasingly introduced to solve the physical problems. The data-driven model selection is a method where the model most suitable for a given dataset is selected based on the information criteria. There exist several information criteria, such as Akaike information criterion (AIC) by Akaike,¹²⁾ Bayesian information criterion (BIC) by Schwarz,¹³⁾ Akaike’s Bayesian information criterion (ABIC) by Akaike,¹⁴⁾ and cross-validation (CV).¹⁵⁾ Some of application of the methods in materials engineering area are as follows: Al-Rubaie et al.¹⁶⁾ used AIC to select the model for fatigue crack growth rate, and Cockayne and van de Walle¹⁷⁾ obtained a cluster expansion model for the CaZr_1−xTi_xO₃ solid solution by applying CV.

All of these criteria can be used to select the best model suitable for a given dataset, but difference in their assumption may lead to different performance under different circumstances, such as the amount of data and the kinds of model to be selected. Therefore, a comprehensive understanding of the behavior of each model selection criterion is needed when applying it to a specific problem.

In this paper, AIC, BIC, ABIC, and CV are first briefly explained, and then the performance of each model selection criterion evaluated using a simulated dataset considering the characteristics of the databases of Ac₃ and Ms temperatures. Finally, the criteria are applied to determine the key parameters for the Ac₃ and Ms temperatures using two kinds of data sources: one is exactly same dataset used by Andrews¹⁾ and the other from the CCT database for steels provided by the NIMS Materials Database (MatNavi).¹⁸⁾

2. Likelihood Function and Model Selection Criteria

This section presents a brief review of the basis of typical model selection criteria.

2.1. Likelihood Function

In many scientific studies, the objective is to find the underlying relationship that yields the data. This problem usually leads to the evaluation of a set of candidate models. In such a problem, a likelihood is used to estimate the probability of a model for given observations. Assuming that each observation y_n includes Gaussian noise in a model and parameters, the likelihood of y_n is given by Eq. (1), where x, θ, and s indicate the input data, the parameters, and the standard deviation of noise, respectively.

p( y n |x,θ ) = 1 2π s 2 exp( - 1 2 s 2 ( y n -θx ) 2 )

(1)

Suppose each observation is generated independently, then the likelihood function of the entire data, p(y|x, θ), can be represented by

p( y|x,θ ) = ∏ n=1 N p( y n |x,θ ) = 1 ( 2π s 2 ) N 2 exp[ - 1 2 s 2 ∑ n=1 N ( y n -θx ) 2 ].

(2)

where N represents the number of data. The model parameters that maximize the likelihood are chosen for the given data. This is to maximize the agreement of the candidate model with the observations, and this method is known as maximum likelihood estimation (MLE). With MLE, the maximum likelihood L ˆ is written as

L ˆ =p( y|x, θ ˆ ) = 1 ( 2π s 2 ) N 2 exp[ - 1 2 s 2 ∑ n=1 N ( y n - θ ˆ x ) 2 ]

(3)

where θ ˆ is a parameter determined by the least-squares method. In other words, the probability of a model with the optimized parameters for given data can be calculated by MLE.

2.2. Model Selection Criteria

Simply comparing candidate models with the maximum likelihood may lead to an overfitting problem. This occurs when a model is too complex such as having too many parameters relative to the observation data. Then the model describes error or noise instead of the underlying relationship. This problem can be explained from Fig. 1(a). A simple model cannot describe given data precisely and also gives a poor prediction. When the complexity is moderate, the model fits both the given data and new observations successfully. A more complex model, on the other hand, describes the given data perfectly but fails to predict new observations. As shown in Fig. 1(b), the disagreement of a model with given data, which is usually represented as training error, decreases as a model becomes more complex. However, the prediction error of the model with unseen data is high when the model is too simple or complex and low when the complexity is moderate. Therefore, MLE, which estimates how well a model fits the observations, should not be used as the only indicator when choosing the best model. For this reason, a variety of methods have been suggested which include a penalty term to MLE to consider the model complexity.

Fig. 1.

Schematic plot of the input, the observation and the models with different complexity (a) and the training and prediction errors as the model complexity increases are represented (b).

Akaike¹⁾ proposed the AIC based on information theory. It estimates a relative quality of a given model measuring the Kullback Leibler distance. It also provides an estimate of the information loss when a model is used to express a data-generating process. In this way, it evaluates both the goodness of fit and the complexity of a model. The AIC is defined as Eq. (4), which evaluates the model fitting with the term -2ln L ˆ and penalizes the model complexity with the term 2 k where k is the number of parameters.

AIC=-2ln L ˆ +2k

(4)

The BIC is a criterion for model selection among a finite set of models and was developed by Schwarz.¹³⁾ The equation for the probability of a model for given data was derived by the Laplace approximation. It balances the increase in the likelihood with the term calculated from the number of parameters (k) and the number of data (N).

BIC=-2ln L ˆ +kln( N )

(5)

Next, Akaike’s Bayesian information criterion (ABIC) proposed by Akaike¹⁴⁾ is considered. With the assumption that the noise in data follows a Gaussian distribution and the prior follows a uniform distribution, the probability of a model for given data can be expressed as Eq. (6)

ABIC =- 1 2 μ T Λ -1 μ+ 1 2 s 2 y T y + n-k 2 log( 2π s 2 ) - 1 2 log( | Λ | )

(6)

where

Λ= ( x x T ) -1

(7)

μ= ( x x T ) -1 xy.

(8)

The first three terms estimate how well a model is fitted to the data and the last term is a penalty for the model complexity.

Finally, we consider cross-validation (CV) which is a technique for assessing how well the analysis will generalize an independent dataset.¹⁵⁾ The calculation procedure is illustrated in Fig. 2. In the calculation, a part of data is used for analysis (training dataset) and the remainder for validation (validation dataset). The goal of cross-validation is to avoid overfitting problems by verifying a model with the validation dataset. In particular, in leave-one-out cross validation (LOOCV) only one datum is used as a validation set and the remaining data are used as a training set. Regarding these criteria, a model with a lower AIC, BIC, ABIC, or CV is preferred.

Fig. 2.

Schematic representation of the cross-validation method.

3. Validation of Each Criterion Using Toy Model

3.1. Setting of Toy Models

First, to demonstrate the effectiveness of each criterion in the feature and model selection problem, the following simple linear combination model is considered:

Model # 1: y= ∑ i=1 10 f( x ) +w, f( x ) = x i θ i , w~N( 0, σ 2 )

(9)

where y, x_i, f(x), w, and θ_i are the observations, input data, underlying relationship, noise, and parameters, respectively. The input data are distributed uniformly within a range between 0 and 1, and the noise is given by a zero mean normal distribution with standard deviation σ. An overview of the toy model is listed in Table 2. Only five out of 10 inputs are designed to actually have an effect on the output. By varying the standard deviation σ from 1 to 10, 50 to 800 combinations of input data and the output observations are randomly generated to observe the trend of model selection result with increasing number of data. The parameters for two cross terms of input data (a₉ for x 1 * x 7 , a₁₀ for x 2 * x 8 ) are also considered. The whole procedure is repeated 100 times to evaluate the performance of each criterion.

Table 2. An overview of the the values of the parameters of the Model #1 and their portion of including zero data.

Parameter	a₁	a₂	a₃	a₄	a₅	a₆	a₇	a₈	a₉	a₁₀
Value	3	0	7	0	11	0	15	0	19	0
% of zero data	10	10	30	30	50	50	70	70	70–80	70–80

Additionally, zero data are inserted into the input data with different ratios to investigate the effect of blank data on the model selection procedure. The zero data are randomly inserted in the input data and their fraction of each parameter is increased from 33% to 99%. The following simple linear combination model is considered:

Model #2 :y= ∑ i=1 20 f( x ) +w, f( x ) = x i θ i , w~N( 0, σ 2 )

(10)

The overview of the parameter and the fraction of zero data is listed in Table 3.

Table 3. An overview of the values of the parameters of the Model #2 and their portion of including zero data.

Parameter	a₁	a₂	a₃	a₄	a₅	a₆	a₇	a₈	a₉	a₁₀	a₁₁	a₁₂	a₁₃	a₁₄	a₁₅	a₁₆	a₁₇	a₁₈	a₁₉	a₂₀
Value	0	0	0	0	0	5	5	5	5	5	10	10	10	10	10	20	20	20	20	20
% of zero data	33	66	77	88	99	33	66	77	88	99	33	66	77	88	99	33	66	77	88	99

3.2. Simulation Result for Toy Model

Figure 3 illustrates the frequency of selecting the true relationship for each criterion with Model #1 under different numbers of data and noise levels. In this analysis, if θ₁, θ₃, θ₅, θ₇, and θ₉ are all selected and θ₂, θ₄, θ₆, θ₈, and θ₁₀ and are all excluded, the result is counted as successful. For example, Fig. 3(a) plots the model selection result with different number of data (N) for the AIC. When the noise level is 1, AIC succeeds 39 times out of 100 trials with 50 data, and 54 times with 800 data. All four criteria have a common trend that the frequency of selecting true model decreases as the noise level increases. Also, when there are more data, the true model is more likely to be found. It is natural that if there is much more noise and less data, it becomes more difficult to find the true underlying relationship.

Fig. 3.

The frequency of choosing the true model by each criterion. The figure (a), (b), (c) and (d) respectively show the result by AIC, BIC, ABIC and CV.

Among the four criteria, the BIC shows the best performance in terms of true model selection problems with a probability of 70% of finding the true model with the largest dataset in the entire range of noise levels. With small datasets, it still chooses the true model in 70% of cases when there is little noise.

The frequency of finding the true model was lowest with the AIC and CV and was less than 60% even with a large dataset. With small dataset, the frequency of finding the true model was less than 50%. In addition to their poor performances, the trend of their performances appears to be similar. It was proved by Stone¹⁹⁾ that the calculation processes of the AIC and CV are equivalent in terms of the case of linear model selection. Therefore, both criteria can be treated as the same method in this linear case.

Furthermore, the frequency of features chosen as a working parameter was also evaluated to investigate the model selection performance from the viewpoint of feature selection with Model #1, and the result is plotted in Fig. 4. This figure shows how frequently each parameter is estimated as a necessary feature by the four methods for various numbers of data and noise levels. The result shows that parameters with lower values and including larger percentages of zero data, which are shown in Table 2 tend to be excluded as the noise level increases. This suggests that such parameters are considered unnecessary because they are buried in noise. Among the four criteria, the BIC recorded the highest ratio of finding the true key parameters (a₁, a₃, a₅, a₇ and a₉) and excluding the unnecessary ones. The ABIC is not an effective method with small datasets but it showed improved performance with a large number of datasets. The AIC and CV produced almost the same results. From the viewpoints of both true model selection and feature selection, the BIC method showed the best performance.

Fig. 4.

The result of parameter selection by each criterion. The y axis indicates the times that each parameter is chosen as a key parameter by each model selection criterion with different noise level (the x axis) and number of data (n) when the calculation is repeated 100 times.

In addition, Fig. 5 shows the effect of zero data on model selection result using the BIC with Model #2. Each chart indicates the fraction of success for the various fractions of zero data, noise in data, and value of parameters. When there is no zero data and the number of data is 100, the parameter of 20 is chosen 100% in all noise range. As the value of the parameter is decreased, the fraction of success is decreased with increasing noise. The trend of the result changes slightly until the fraction of zero data becomes 88%. However, with little information, the key parameters were barely chosen when the standard deviation of noise is higher than 5. This result suggests that zero data has no considerable effect on the model selection as far as the quality of input data is good and the noise level is sufficiently low.

Fig. 5.

The model selection result with different zero data ratio. It shows that there is no big difference of the results until when the% of zero data is 88.

4. Derivation of Working Parameters of Ac₃ and Ms Temperatures of Steels Using Existing Datasets

4.1. Datasets Used to Derive Equations for Ac₃ and Ms Temperatures

Two sets of data are used to find working parameters for the Ac₃ and Ms temperatures. The first dataset is from various sources in Andrews’ paper¹⁾, which include high carbon contents and low contents of alloying elements. The second dataset is provided by NIMS Materials Database (MatNavi),¹⁸⁾ which covers lower carbon contents and higher contents of alloying elements than those of the sources used by Andrews. An overview of the data and the distribution of important elements in both databases are listed in Table 4 and plotted in Fig. 6, respectively.

Table 4. The chemical composition and the minimum to maximum range of the data used to derive predictive equations for Ac₃ and Ms temperatures. Upper and lower values in the table indicate maximum and minimum content of each alloying element.

	Data source	No. of data	C	Si	Mn	Ni	Cr	Cu	Mo	V	Ti	Nb	B	P	S	Al	N	W	As
Ac₃	Andrews	155	0.11/ 0.95	0.06/ 1.78	0.04/ 1.98	0/ 5.00	0/ 4.48	0/ 0.91	0/ 1.02	0/ 0.7	0/ 0.05	–	–	0/ 0.06	0/ 0.06	0/ 0.10	–	0/ 4.1	0/ 0.07
Ac₃	NIMS	198	0.03/ 0.4	0.01/ 1.76	0/ 1.98	0/ 9.33	0/ 9.04	0/ 0.92	0/ 1.32	0/ 0.56	0/ 0.03	0/ 0.18	0/ 0.02	0/ 0.19	0/ 0.14	0/ 0.1	0/ 0.01	–	–
Ms	Andrews	243	0.11/ 0.58	0.11/ 1.89	0.04/ 4.87	0/ 5.04	0/ 4.61	0/ 0.91	0/ 5.4	0/ 0.7	–	–	–	0/ 0.05	0/ 0.04	–	–	0/ 8.88	0/ 0.07
Ms	NIMS	258	0.02/ 0.94	0/ 1.76	0/ 2.05	0/ 9.11	0/ 9.04	0/ 1.1	0/ 1.66	0/ 0.56	0/ 0.1	0/ 0.18	0/ 0.02	0/ 0.19	0/ 0.04	0/ 0.1	0/ 0.01	–	–

Fig. 6.

Chemical composition ranges of major alloying elements in Andrews and NIMS data sets.

In this study, the data used by Andrews are analyzed and the results are compared with the equation obtained by Andrews. Then, both sets of data are employed to derive new equations.

4.2. Model Selection Using Andrews’ Dataset

First, the four criteria are applied to the datasets used by Andrews. In this way, the determination of the key parameter by the researcher and the data-driven approach can be compared. The following linear combination models including lower and higher-order terms of carbon are considered:

A c 3 ( °C ) =910+ a 1 C 1 3 + a 2 C 1 2 + a 3 C+ a 4 C 2 + a 5 Si+ a 6 Mn+ a 7 Ni+ a 8 Cr+ a 9 Cu + a 10 Mo+ a 11 V+ a 12 Ti+ a 13 P+ a 14 S + a 15 Al+ a 16 W+ a 17 As

(11)

Ms( °C ) =539+ a 1 C 1 3 + a 2 C 1 2 + a 3 C+ a 4 C 2 + a 5 Si+ a 6 Mn++ a 7 Ni+ a 8 Cr+ a 9 Cu + a 10 Mo+ a 11 V+ a 12 Ti+ a 13 P+ a 14 S + a 15 Al+ a 16 W+ a 17 As.

(12)

The symbol of each element in Eqs. (11), (12) represents its chemical composition in steels in wt%. In many equations for Ac₃, the carbon term is often expressed in the form of a linear term or square-root term. For example, Hougardy²⁾ derived his equation using a linear form. On the other hand, Andrews¹⁾ assumed that the Ac₃ temperature is proportional to the square root of the carbon content. For this reason, the basic model for the Ac₃ and Ms temperatures in the present paper includes all the possible series of carbon found in the literature.

4.3. Model Selection Using Both Andrews’ and NIMS Datasets

Next, we applied each criterion to the combined dataset of Andrews and NIMS. In this analysis, the linear combination models also include cross terms between carbon and strong carbide-forming elements such as chromium, vanadium, titanium, manganese, and molybdenum:

A c 3 ( °C ) =910+ a 1 C 1 3 + a 2 C 1 2 + a 3 C+ a 4 C 2 + a 5 Si+ a 6 Mn+ a 7 Ni+ a 8 Cr+ a 9 Cu + a 10 Mo+ a 11 V+ a 12 Ti+ a 13 P+ a 14 S + a 15 Al+ a 16 W+ a 17 As+ a 18 CCr + a 19 CMo+ a 20 CTi+ a 21 CMn+ a 22 CV

(13)

Ms( °C ) =539+ a 1 C 1 3 + a 2 C 1 2 + a 3 C+ a 4 C 2 + a 5 Si+ a 6 Mn+ a 7 Ni+ a 8 Cr+ a 9 Cu + a 10 Mo+ a 11 V+ a 12 Ti+ a 13 P+ a 14 S + a 15 Al+ a 16 W + a 17 As+ a 18 CCr + a 19 CMo+ a 20 CTi+ a 21 CMn+ a 22 CV.

(14)

4.4. Model Selection Result with the Andrews’ Dataset

4.4.1. Ac₃ Temperature

The same procedure as that used for the toy model was conducted. The model selection results are compared with the equations derived by Andrews and are listed in Table 5. The results for each criterion are similar to the equation defined by Andrews. However, Andrews failed to include terms for titanium, sulfur, aluminum, and arsenic. This is because the amount of these elements were less than 0.1 wt% in the steels used in his study, which made it difficult to discover their effects on the Ac₃ and Ms temperatures. The result for the ABIC appears to be different from the others in that it includes many parameters that are determined to be unnecessary in the other methods. Considering that the number of samples is only 155, the method using the ABIC is likely to overfit the data, as indicated above using the toy model.

Table 5. The comparison of coefficients of the Andrews’ equations for Ac₃ and Ms with the results derived by each criterion. The same data which Andrews collected were used for the model selection calculation.

	Estimated by	Constant	C^1/3	C^1/2	C	C²	Si	Mn	Ni	Cr	Cu	Mo	V	Ti	P	S	Al	W	As
Ac₃	Andrews (1965)	910	0	−203	0	0	44.7	0	−15.2	0	0	31.5	104.4	0	0	0	0	13.1	0
	AIC	910	0	−191	0	0	35	0	−15	0	0	29	98	673	0	−281	231	14	0
	BIC	910	0	−198	0	0	33	0	−15	0	0	29	102	772	0	0	0	0	14
	ABIC	910	123	−434	203	−112	35	0	−15	0	−12	29	97	700	64	−297	234	14	28
	CV	910	0	−191	0	0	35	0	−15	0	0	29	98	673	0	−281	231	14	0
Ms	Andrews (1965)	539	0	0	−423	0	0	−30.4	−17.7	−12.1	0	38.9	0	0	0	0	0	0	0
	AIC	539	−4758	8491	−6014	2252	0	−25	−14	−11	34	0	0	0	336	−438	0	5	−343
	BIC	539	−4925	8832	−6301	2401	0	−26	−15	−10	0	0	0	0	0	0	0	0	−248
	ABIC	539	−4914	8771	−6195	2316	0	−25	−14	−11	33	0	30	0	319	−438	0	0	−342
	CV	539	0	0	−394	0	−9	−34	−18	−16	0	−9	19	0	0	0	0	4	−262

4.4.2. Ms Temperature

It is clear from Table 5 that the terms for manganese, nickel, and chromium, which are considered important in the Andrews’ equation, are also in every equation obtained from the criteria. The results for Ms temperature, however, are considerably different from those of Andrews in the selection of terms concerning the carbon content. The Andrews’ equation and the equation obtained by CV include only the linear term listed in Table 5. However, the other criteria have all carbon terms indicating that the carbon content does not monotonously affect the Ms temperature. In addition, molybdenum, which is included in the Andrews’ equation does not appear in the equations obtained by the AIC, BIC, and ABIC. It is because molybdenum’s actual effect on Ms temperature is not greater than the noise level; therefore, its effect is buried by noise in data.

These results clearly indicate that the data-driven approach can efficiently find the working parameters without the experience and knowledge of experts, such as Andrews, simply by collecting a large number of existing datasets. Note that Andrews selected working parameters in accordance with the knowledge obtained from his experiments, in which the chemical composition of each alloying element was systematically controlled.¹⁾

4.5. Model Selection Result Using Both Andrews and NIMS Datasets

4.5.1. Ac₃ Temperature

The equations for the Ac₃ temperature based on the chemical composition are estimated using the same procedure as that for the toy model. Each criterion selects different features as shown in Table 6. Carbon, silicon, and nickel are included in all the Ac₃ equations. From the data-driven methods, molybdenum, which is a well-known carbide-forming element, turned out to be a working parameter appearing in a cross term instead of a linear term, as in Andrews’ equation. This can be explained from the austenite area of the steel. Figure 7(a) illustrates the area where only austenite phase exists. Its lower left line and lower right line indicate the Ac₃ line and Acm line, respectively. The Ac₃ line of the Fe–C–Mo system based on the equation derived by BIC, and the Ac₃ and Acm lines of the same system derived using the thermodynamic calculation software, Thermo-Calc, are plotted in Fig. 7(b). The Ac₃ line for 0 wt% molybdenum estimated using the equation derived by BIC corresponds well with that derived from the thermodynamic calculation. It is clear from the thermodynamic calculation that, as the content of molybdenum increases, cementite becomes stable and the Acm line goes up. The regression equation actually tracing the behavior of Acm line instead of Ac₃ for the increased molybdenum content. In order to verify this, the Ac₃ and the Acm temperatures were calculated from the chemical composition of 353 experimental Ac₃ data with Thermocalc software. The estimated Acm of 14 out of 353 data were found to be higher than the estimated Ac₃. In addition, the differences between the experimental Ac₃ and the estimated Acm of above 14 data were smaller than those between the experimental and the estimated Ac₃. This result suggests that some of the Ac₃ temperatures in the data might be confused with the Acm temperatures, which is difficult to distinguish simply from the dilatometric analysis commonly used to determine Ac₃ temperatures.

Table 6. The equations for Ac₃ and Ms temperatures derived from the combined data. The total 353 Ac₃ temperature data (155 from Andrews and 198 from NIMS) and 501 Ms temperature (243 from Andrews and 258 from NIMS) are used for the model selection calculation.

	Model Selection	Constant	C^1/3	C^1/2	C	C²	Si	Mn	Ni	Cr	Cu	Mo	V	Ti	P	S	Al	W	As	CCr	CMo	CTi	CMn	CV
Ac₃	AIC	910	1282	−2664	1985	−1003	29	−22	−16	0	0	20	0	732	182	0	−149	19	139	0	0	0	0	165
	BIC	910	0	−161	0	0	27	−27	−16	0	0	0	0	1078	0	0	0	25	0	0	108	0	0	0
	ABIC	910	1301	−2629	1850	−912	30	−28	−17	1	0	0	0	1069	208	−127	−158	21	131	0	85	−982	27	110
	CV	910	1393	−2826	2020	−964	29	−24	−17	0	0	0	0	724	258	0	0	26	129	0	111	0	0	0
Ms	AIC	539	−985	1722	−1525	512	0	−32	−14	−5	17	0	−123	0	202	−446	−206	6	−363	−30	−18	0	0	352
	BIC	539	−954	1663	−1513	535	0	−32	−14	−5	0	0	0	0	0	0	−211	6	−304	−26	0	0	0	0
	ABIC	539	−993	1738	−1537	517	0	−33	−14	−5	17	0	−121	271	203	−445	−205	6	−363	−30	−18	−3865	0	348
	CV	539	−985	1722	−1525	512	0	−32	−14	−5	17	0	−123	0	202	−446	−206	6	−363	−30	−18	0	0	352

Fig. 7.

The austenite area of the Fe–C steel by Thermocalc (a) and The Ac₃ and Acm lines of the Fe–C–Mo steel drawn with Thermocalc and based on the BIC result for Andrews and NIMS data (b).

The derived equations are found to be in good agreement with experimental data as shown in Fig. 8. The root mean square error (RMSE) of the previously reported equations (Figs. 8(a), 8(b)) are as high as 39, while those of newly derived equations (Figs. 8(c)–8(f)) are about 30.

Fig. 8.

Comparison of the experimental and estimated Ac₃ temperatures by past researchers (a and b) and model selection criterions (c, d, e and f).

4.5.2. Ms Temperature

The equations for the Ms temperature were derived in the same way as for the Ac₃ temperature estimation. The newly derived equations and the past results by Andrews¹⁾ and Payson and Savage⁵⁾ both indicate that carbon, manganese, nickel, and chromium are key elements for the Ms temperature. In addition, the cross term between carbon and chromium, which is included in one of Andrews’ equations, is also found in all four data-driven equations. To clarify the reason why these cross terms are chosen, further research is required. It is shown in Fig. 9 that the Ms temperatures estimated by these equations show better agreement with the observation data than the previously reported equations.

Fig. 9.

Comparison of the experimental and estimated Ms temperatures derived by past researchers (a and b) and model selection criterions (c, d, e and f).

5. Conclusion

To demonstrate the effectiveness of the AIC, BIC, ABIC, and CV, these criteria were evaluated using a toy model. The results suggested that they could find meaningful parameters, which are not buried in noise in the given data. Among the four criteria, the BIC method showed the best performance for a linear combination model.

We applied the technique to the actual experimental dataset used by Andrews¹⁾ and that provided by NIMS to derive predictive equations for the Ac₃ and Ms temperatures. By applying the model selection criteria to the datasets collected from these sources, it was demonstrated that the key parameters, traditionally chosen by skilled researchers on the basis of systematically controlled experiments, could be found efficiently. In addition, simply by integrating the data obtained using various randomly selected conditions, the derived equations not only show improved agreement with the experimental data, but also clarify the hidden mechanism which is usually difficult to derive because of the high dimensionality of the material’s dataset. These results suggest that the model selection method can successfully be applied to other problems involving feature and working-parameter selection, such as the problem estimating fatigue and creep lifetime.

References

1) K. W. Andrews: J. Iron Steel Inst., 203 (1965), 721.
2) H. P. Hougardy: Werkstoffkunde Stahl, Band 1: Grundlagen, Springer/Verlag Stahleisen, Düsseldorf, (1984), 229.
3) O. G. Kasatkin, B. B. Vinokur and V. L. Pilyushenko: Met. Sci. Heat Treat., 26 (1984), 27.
4) J. Trzaska and L. A. Dobrazan ́ski: J. Mater. Process. Technol., 192 (2007), 504.
5) P. Payson and C. H. Savage: Trans. ASM, 33 (1944), 261.
6) R. A. Grange and H. M. Stewart: Trans. AIME, 167 (1946), 467.
7) J. Wang, P. J. van der Wolk and S. van der Zwaag: Mater. Trans. JIM, 41 (2000), 761.
8) W. G. Vermeulen, P. F. Morris, A. P. De Weijer and S. Van der Zwaag: Ironmaking Steelmaking, 23 (1996), 433.
9) T. Sourmail and C. Garcia-Mateo: Comput. Mater. Sci., 34 (2005), 323.
10) E. J. Seo, L. Cho and B. C. De Cooman: Metall. Mater. Trans. A, 46 (2015), 27.
11) C.-N. Li, F.-Q. Ji, G. Yuan, J. Kang, R. D. K. Misra and G.-D. Wang: Mater. Sci. Eng. A, 662 (2016), 100.
12) H. Akaike: IEEE Trans. Autom. Control., 19 (1974), 716.
13) G. Schwarz: Ann. Stat., 6 (1978), 461.
14) H. Akaike: Trab. Estad. Stica Y Investig. N Oper., 31 (1980), 143.
15) C. E. Rasmussen and C. Williams: Gaussian Processes for Machine Learning, The MIT Press, Cambridge, (2006), 105.
16) K. S. Al-Rubaie, E. K. L. Barroso and L. B. Godefroid: Int. J. Fatigue, 28 (2006), 934.
17) E. Cockayne and A. van de Walle: Phys. Rev. B, 81 (2010), 12104.
18) NIMS Materials Database (MatNavi), NIMS, http://mits.nims.go.jp/index_en.html, (accessed 2015-04-10).
19) M. Stone: J. R. Stat. Soc. Ser. B, 39 (1974), 44.

Corresponding author

Register with J-STAGE for free!