Prediction System for Solid Solubility Limits of Ag-, Cu-, Al-, and Mg-Based Alloys Using Artificial Neural Networks and First-Principles Calculations

Takafumi Mochizuki; Tokuteru Uesugi; Yorinobu Takigawa

doi:10.2320/matertrans.MT-MBW2019010

Abstract

Solid solubility limit prediction systems based on artificial neural networks were devised using either Hume-Rothery parameters or first-principles calculation results as explanatory variables, and the resulting coefficients of determination, R², were compared. With the Hume-Rothery parameters, R² was 0.715; with the first-principles calculation results, R² was 0.900, indicating improved prediction accuracy. The number of explanatory variables was optimized using the stepwise method, and R² was maximized when eight explanatory variables were used. We performed 10-fold cross validation to evaluate the generalization performance of the network. After 10-fold cross validation, R² of the solid solubility prediction system with eight explanatory variables was 0.6993, and the mean absolute error was 0.45. Because the common logarithm of the solid solubility limit was used as the explained variable, the solid solubility limit predicted from this network was on average 0.35 to 2.85 times the true value.

Fig. 9 Comparison between experimental data and the predicted data using 8 explanatory variables after 10-fold cross validation.

1. Introduction

The solid solubility limit plays an important role in the design of various alloys. Even if the amount of solid solution is very small, the element of the solid solution often has a profound influence on the mechanical properties of the alloy.¹⁾ Thus, the solid solubility limit has long been studied experimentally and computationally.²⁾

The Hume-Rothery rules are well-known semi-empirical rules for determining whether one metal will dissolve in another in the solid state.³⁾ By analyzing the solid solubility limit data of Ag-based and Cu-based binary alloys, two Hume-Rothery rules were proposed. First, when the difference in atomic radii between the solute element and the solvent element exceeds 15%, it is difficult to form a solid solution. Second, when the difference in atomic radii between the solute element and the solvent element is 15% or less, a solid solution can be formed, and its range is determined by the average number of valence electrons per atom, e/a, where e is the total number of valence electrons and a is the total number of atoms. Furthermore, three rules of thumb, including the effect of electronegativity on the range of solid solution formation, were published in 1940.⁴⁾ They are known as the Hume-Rothery rules.⁵⁾
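The two quantities behind these rules can be written out as a minimal sketch. The function names and the input values below are assumptions chosen for illustration (approximate metallic radii in nm, simple valences); they are not taken from this study.

```python
def size_factor(r_solute: float, r_solvent: float) -> float:
    """Relative atomic-radius difference, measured against the solvent radius."""
    return abs(r_solute - r_solvent) / r_solvent


def electrons_per_atom(x_solute: float, z_solute: int, z_solvent: int) -> float:
    """Average number of valence electrons per atom, e/a, in a binary alloy.

    x_solute is the atomic fraction of solute (between 0 and 1).
    """
    return (1.0 - x_solute) * z_solvent + x_solute * z_solute


# Illustrative inputs only: approximate radii (nm) for a Cu solvent and Zn solute.
r_cu, r_zn = 0.128, 0.134
within_size_rule = size_factor(r_zn, r_cu) <= 0.15

# e/a for 30 at% of a divalent solute in a monovalent solvent.
e_over_a = electrons_per_atom(0.30, z_solute=2, z_solvent=1)
print(within_size_rule, round(e_over_a, 2))
```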

Artificial neural networks (ANNs) are inspired by the human brain system.⁶⁾ ANNs are supervised machine learning algorithms that can capture the relationship between explanatory and explained variables while training a network. Recently, the interest in applying machine learning to solve problems in materials sciences has increased.⁷⁻¹³⁾

Zhang et al. were the first to attempt to predict the solid solubility limit using the parameters of the Hume-Rothery rules and artificial neural networks.⁷⁾ They developed an artificial neural network that uses atomic size, valence, and electronegativity as explanatory variables and the solid solubility limit as the explained variable to predict the solid solubility limit of Ag-based and Cu-based binary alloys.⁷⁾ However, the coefficient of determination, R², between the experimental and predicted data obtained by their network was as low as 0.590.⁷⁾ Moreover, their network targeted only Ag-based and Cu-based alloys.

In this study, we aimed to improve the prediction accuracy of the solid solubility limit by using first-principles calculation results as explanatory variables. We targeted not only Ag-based and Cu-based alloys, to allow comparison with the previous result by Zhang et al.,⁷⁾ but also Al-based and Mg-based alloys, which are widely used in various applications because of their good mechanical properties, high strength-to-weight ratio, and market acceptability. Unless otherwise specified, "solid solubility limit" in this study refers to the maximum solid solubility limit.

2. Calculation Procedures

The first-principles calculations were performed using the Cambridge Serial Total-Energy Package (CASTEP).¹⁴⁾ CASTEP is an ab initio pseudopotential code for solving the electronic ground state of periodic systems, in which the wave functions are expanded in a plane-wave basis set using a technique based on density functional theory (DFT).¹⁵,¹⁶⁾ The electronic exchange-correlation energy was given by the generalized gradient approximation (GGA) proposed by Perdew et al. Ultrasoft pseudopotentials¹⁷⁾ were used for all elements.

For the calculations of Ag-, Cu-, and Al-based solid-solution alloys, we used fcc supercells in which 1 of the 32 solvent atoms was replaced by a solute atom. For the calculations of Mg-based solid-solution alloys, we used hcp supercells in which 1 of the 36 solvent atoms was replaced. Figure 1 shows these supercell models. The plane-wave cut-off energy was set at 350 eV for calculations of Ag-, Al-, and Mg-based alloys and at 400 eV for calculations of Cu-based alloys. The energy integration over the Brillouin zone was performed using k-points on Monkhorst–Pack grids¹⁸⁾ of 6 × 6 × 6 for Ag-based and Cu-based alloys, 7 × 7 × 7 for Al-based alloys, and 6 × 6 × 5 for Mg-based alloys. Spin-polarized calculations were performed for alloys containing Cr, Mn, Fe, Co, and Ni solute atoms, which can exhibit magnetism. Full structural relaxation of the atomic configurations and lattice constants was performed because introducing a substitutional solute atom leads to local lattice distortion and changes in the cell volume. Stable atomic configurations were obtained through relaxation based on the Hellmann–Feynman forces. The lattice constants at zero pressure were also optimized using a Broyden–Fletcher–Goldfarb–Shanno (BFGS)¹⁹⁾ minimization algorithm in conjunction with the stress calculations. The convergence parameters were as follows: the total energy tolerance was 1.6 × 10⁻²⁴ J/atom, the maximum force tolerance was 4.8 × 10⁻¹² N, the maximum stress component was 0.05 GPa, and the maximum displacement was 1 × 10⁻⁴ nm.
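Because plane-wave codes usually express such tolerances in eV and Å, it can help to convert the SI values quoted above. The short sketch below does only that unit conversion; the variable names are illustrative and the script is not part of the calculation workflow of this study.

```python
# Unit conversion of the convergence tolerances quoted in the text (SI -> eV, angstrom).
EV_IN_J = 1.602176634e-19   # one electronvolt in joules (exact SI definition)
ANGSTROM_IN_M = 1.0e-10     # one angstrom in metres
ANGSTROM_PER_NM = 10.0

energy_tol_ev = 1.6e-24 / EV_IN_J                         # J/atom -> eV/atom
force_tol_ev_per_ang = 4.8e-12 / EV_IN_J * ANGSTROM_IN_M  # N -> eV/angstrom
disp_tol_ang = 1.0e-4 * ANGSTROM_PER_NM                   # nm -> angstrom

print(f"energy tolerance:       {energy_tol_ev:.2e} eV/atom")
print(f"force tolerance:        {force_tol_ev_per_ang:.2e} eV/angstrom")
print(f"displacement tolerance: {disp_tol_ang:.2e} angstrom")
```

With these inputs the tolerances come out at roughly 1 × 10⁻⁵ eV/atom, 3 × 10⁻³ eV/Å, and 1 × 10⁻³ Å.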

Fig. 1

Schematic representation of supercell modeling (a) Ag₃₁X, Cu₃₁X, and Al₃₁X, and (b) Mg₃₅X.

We calculated 10 parameters: the average number of electrons in the s, p, and d orbitals (N_s, N_p, and N_d); bond overlap population (BOP); heat of solution (ΔH_relax); heat of solution of ideal lattice (ΔH_unrelax); misfit strain (ε); Young’s modulus (E); shear modulus (G); and bulk modulus (K). N_s, N_p, N_d, and BOP were calculated using Mulliken population analysis.²⁰⁾ In Ag-, Cu-, and Al-based alloys, the value of BOP between the solute atom and the solvent atoms at the first nearest neighbor was used. In Mg-based alloys, the average value of BOP between the solute atom and the solvent atoms at the first and second nearest neighbors was used. ΔH_relax and ΔH_unrelax were calculated using the following equations:

\begin{equation} \Delta H = E_{\text{MX}} - \left(\frac{31}{32}E_{\text{M}} + E_{\text{X}}\right) \tag{1} \end{equation}

\begin{equation} \Delta H = E_{\text{MgX}} - \left(\frac{35}{36}E_{\text{Mg}} + E_{\text{X}}\right) \tag{2} \end{equation}

where E_M is the energy of a solvent atom such as Ag, Cu, and Al. E_Mg is the energy of solvent atom Mg. E_X is the energy of a solute atom. When ΔH_relax was calculated, the energy after full structural relaxation was used. For ΔH_unrelax, the energy before the structural relaxation was used. The misfit strain ε was calculated using the following equation:²¹⁾

\begin{equation} \varepsilon = \frac{d_{\text{MX}} - d_{\text{M}}}{d_{\text{M}}} \tag{3} \end{equation}

where d_M is the interatomic distance in pure M (Ag, Cu, Al, or Mg), and d_MX is the first nearest neighbor distance between solvent atom M and solute atom X in M-X solid solutions. In the case of Mg-based alloys, the average values of the first and second nearest neighbor distances between solvent and solute atoms were used. Young’s modulus (E), shear modulus (G), and bulk modulus (K) were obtained by Hill’s approximation.²²⁾
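The descriptors defined by Eqs. (1) to (3), together with a Hill average, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, all numerical inputs are made up (they are not values from this study), the solvent reference energy is weighted by (n − 1)/n exactly as the equations are written, and the elastic part uses the standard Voigt and Reuss expressions for a cubic crystal only, so it does not cover the hcp Mg-based case.

```python
def heat_of_solution(e_alloy, e_solvent_ref, e_solute, n_sites):
    """Eqs. (1)/(2): dH = E_MX - ((n - 1)/n * E_M + E_X), with n = 32 (fcc) or 36 (hcp)."""
    return e_alloy - ((n_sites - 1) / n_sites * e_solvent_ref + e_solute)


def misfit_strain(d_mx, d_m):
    """Eq. (3): relative change of the solute-solvent nearest-neighbour distance."""
    return (d_mx - d_m) / d_m


def hill_moduli_cubic(c11, c12, c44):
    """Voigt-Reuss-Hill moduli (E, G, K) from cubic elastic constants."""
    k = (c11 + 2.0 * c12) / 3.0                      # Voigt and Reuss bounds coincide
    g_voigt = (c11 - c12 + 3.0 * c44) / 5.0          # upper bound on G
    g_reuss = 5.0 * (c11 - c12) * c44 / (4.0 * c44 + 3.0 * (c11 - c12))  # lower bound
    g = 0.5 * (g_voigt + g_reuss)                    # Hill average
    e = 9.0 * k * g / (3.0 * k + g)                  # isotropic relation for Young's modulus
    return e, g, k


# Hypothetical inputs, for illustration only (energies in eV, distances in nm, moduli in GPa).
dh = heat_of_solution(e_alloy=-100.0, e_solvent_ref=-96.0, e_solute=-6.5, n_sites=32)
eps = misfit_strain(d_mx=0.290, d_m=0.286)
young, shear, bulk = hill_moduli_cubic(c11=108.0, c12=62.0, c44=28.0)
print(dh, round(eps, 4), round(young, 1), round(shear, 2), round(bulk, 2))
```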

We used Neural Network Console, a deep learning tool for designing, training, and evaluating artificial neural networks. The adaptive moment estimation (Adam)²³⁾ optimization algorithm was used to update the weights associated with each connection between neurons from the gradients of the loss function.
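For readers unfamiliar with Adam, a single update step can be sketched in plain Python as below. This follows the published update rule with its commonly quoted default hyperparameters; it is not the implementation inside Neural Network Console, and the toy objective and the function name `adam_step` are assumptions made for illustration.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; t is the 1-based step counter."""
    m = [beta1 * mi + (1.0 - beta1) * g for mi, g in zip(m, grad)]       # first moment
    v = [beta2 * vi + (1.0 - beta2) * g * g for vi, g in zip(v, grad)]   # second moment
    m_hat = [mi / (1.0 - beta1 ** t) for mi in m]                        # bias correction
    v_hat = [vi / (1.0 - beta2 ** t) for vi in v]
    theta = [p - lr * mh / (math.sqrt(vh) + eps)
             for p, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v


# Toy objective (hypothetical): minimise f(w) = (w - 3)^2, whose gradient is 2 (w - 3).
theta, m, v = [0.0], [0.0], [0.0]
for t in range(1, 5001):
    grad = [2.0 * (theta[0] - 3.0)]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
print(round(theta[0], 2))
```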

The network structure was explored and optimized with respect to the number of layers and the choice of activation functions using Bayesian optimization based on Gaussian processes.²⁴⁾ The network was composed of four or five layers, including one input layer, two or three hidden layers, and one output layer. The Huber loss function²⁵⁾ was used as the loss function. Because it grows only linearly for large errors, the Huber loss function can stabilize training better than the squared error function. We used the following activation functions. The ReLU function²⁶⁾ is widely used because it is quick to compute. The ELU function²⁷⁾ is an improved form of the ReLU function. Abs is the absolute value function, which transfers the absolute value of its input to the next layer. The Tanh function²⁸⁾ is symmetrical around 0. The Swish function²⁹⁾ multiplies its input by the sigmoid of that input. The SELU function³⁰⁾ is self-normalizing. Batch normalization³¹⁾ normalizes the activations within a mini-batch and can speed up the learning of the network. Dropout³²⁾ can prevent overfitting.
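The loss and activation functions named above have simple closed forms, sketched here in plain Python using their standard textbook definitions. The Huber threshold `delta` and the ELU parameter `alpha` actually used in the tool are not stated in the text, so the defaults below are assumptions; Abs and Tanh correspond directly to `abs` and `math.tanh`.

```python
import math


def huber(error, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def relu(x):
    return max(0.0, x)


def elu(x, alpha=1.0):
    """Like ReLU for x > 0, but saturates smoothly to -alpha for negative x."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def swish(x):
    """x multiplied by sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


# Constants from the self-normalizing networks paper.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), round(elu(x), 3), abs(x), round(math.tanh(x), 3),
          round(swish(x), 3), round(selu(x), 3))
print(huber(0.5), huber(3.0))
```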

3. Results and Discussion

3.1 Data collection

In this study, we constructed two solid solubility limit prediction systems. One used Hume-Rothery parameters, similar to the study of Zhang et al.,⁷⁾ while the other used results from the first-principles calculations. In the artificial neural network with the Hume-Rothery parameters, four explanatory variables were used: the difference in atomic radius between the solute atom and the solvent atom, the difference in Pauling’s electronegativity³³⁾ between the solute atom and the solvent atom, the valence of the solute element, and the valence of the solvent element. The number of valence electrons was obtained from the group number in the periodic table used by the Chemical Abstracts Service (CAS).

In the artificial neural network with the first-principles calculation results, 11 explanatory variables were used: one was experimental data and the other ten were results of the first-principles calculations. The experimental data were the melting points of the solute elements, T_x.³⁴⁾ The results of the first-principles calculations were N_s, N_p, N_d, BOP, ΔH_relax, ΔH_unrelax, ε, E, G, and K in M-X solid solution alloys. Based on thermodynamic knowledge, the solid solubility limit depends on the excess Gibbs free energy of the solid solution, which can be subdivided into an enthalpy of mixing and an entropy contribution. We expected that the results of the first-principles calculations in M-X solid solution alloys would be related to the excess Gibbs free energy of the solid solution, and that the experimental melting point of the solute element in particular would be related to the entropy contribution.

In both artificial neural networks, the explained variable was log c, the common logarithm of the solid solubility limit c collected from equilibrium phase diagrams.³⁵,³⁶⁾ A total of 104 data points were collected: 27 for Ag-based, 26 for Cu-based, 30 for Al-based, and 21 for Mg-based alloys. Datasets of the results of the first-principles calculations and the explained variables are shown in Tables 1, 2, 3, and 4. All explanatory variables and explained variables were normalized before training so that they fell within the range of −1 to 1.
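The preprocessing described above can be sketched as follows. The text does not state which normalization was used, so linear min-max scaling onto [−1, 1] is an assumption here, and the solubility values are hypothetical placeholders rather than entries from Tables 1 to 4.

```python
import math


def fit_minmax(values):
    """Return the (min, max) pair that defines the scaling."""
    return min(values), max(values)


def scale(x, lo, hi):
    """Map x linearly from [lo, hi] onto [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def unscale(y, lo, hi):
    """Inverse of scale: map a value in [-1, 1] back onto [lo, hi]."""
    return (y + 1.0) * (hi - lo) / 2.0 + lo


# Hypothetical solubility limits c (not data from this study).
c_values = [0.01, 0.5, 10.0, 40.0]
log_c = [math.log10(c) for c in c_values]      # explained variable, log c

lo, hi = fit_minmax(log_c)
scaled = [scale(v, lo, hi) for v in log_c]
restored = [10.0 ** unscale(s, lo, hi) for s in scaled]
print([round(s, 3) for s in scaled])
```

Keeping `lo` and `hi` from the training data is what allows a network output to be mapped back to log c and then to c.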

Table 1 Dataset of Ag-based alloys.

Table 2 Dataset of Cu-based alloys.

Table 3 Dataset of Al-based alloys.

Table 4 Dataset of Mg-based alloys.

3.2 Machine learning

Figure 2 shows the optimized network that used the Hume-Rothery parameters. Figure 3 shows the prediction of the solubility using the Hume-Rothery parameters in comparison with the experimental data. We used the hold-out method for the network: 80% of all data were training data and 20% were validation data. The coefficient of determination, R², for the correlation between predicted and experimental values was 0.715, which is better than the value of 0.590 obtained in the study of Zhang et al.⁷⁾ We attribute this improvement to the larger number of data, which here also include Al-based and Mg-based alloys.
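A hold-out split and an R² evaluation of this kind can be sketched as below. The text does not give the exact formula behind R² or how the split was drawn, so the definition 1 − SS_res/SS_tot, the random shuffle, and the function names are assumptions for illustration.

```python
import random


def holdout_split(n_samples, train_fraction=0.8, seed=0):
    """Shuffle indices and split them into training and validation index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train_idx, valid_idx = holdout_split(104)   # 104 data points, as in this study
print(len(train_idx), len(valid_idx))

# Toy check of the metric on made-up numbers.
print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```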

Fig. 2

Architecture of the optimized artificial neural network using Hume-Rothery parameters.

Fig. 3

Prediction of solubility using Hume-Rothery parameters in comparison with experimental data.

Figure 4 shows the optimized network that used the results of the first-principles calculations. Figure 5 shows the prediction of solubility using the results of the first-principles calculations in comparison with experimental data. Among all data, 80% were training data and 20% were validation data, as shown in Fig. 5. The coefficient of determination, R², for the correlation between predicted and experimental values was 0.900. Thus, the prediction system of solid solubility limit was more accurate than that constructed by Zhang et al.⁷⁾

Fig. 4

Architecture of the optimized artificial neural network using results of first-principles calculations.

Fig. 5

Prediction of solubility using first-principles data in comparison with experimental data.

Stepwise regression is a method of fitting regression models in which the explanatory variables are chosen by an automatic procedure. We used the stepwise method to optimize the number of explanatory variables of the network that used the results of the first-principles calculations. Figure 6 shows the principle diagram of the stepwise method used in this work. First, one explanatory variable is removed from the 11 explanatory variables, and the combination of 10 variables that maximizes R² is searched. Next, one more explanatory variable is removed, and the optimal combination is searched again; this is repeated as the number of variables decreases. The hold-out method was used while selecting the explanatory variables by the stepwise method: 80% of all data were training data and 20% were validation data. Figure 7 shows the change in the coefficient of determination for the validation data only with the decreasing number of explanatory variables. Table 5 shows the optimized combination for each number of explanatory variables. The best combination consists of eight explanatory variables: the average number of electrons in the s and d orbitals (N_s and N_d), heat of solution (ΔH_relax), heat of solution of ideal lattice (ΔH_unrelax), Young’s modulus (E), shear modulus (G), bulk modulus (K), and melting point of solute atoms (T_x).
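One way to read this procedure is as greedy backward elimination, sketched below. This interpretation is an assumption, as are the function names; in practice `score` would train the network on the given variables and return the validation R², whereas the toy `score` here is a made-up stand-in so that the example runs on its own.

```python
def backward_elimination(variables, score, min_size=1):
    """Greedy backward elimination.

    At each step, try removing each remaining variable in turn and keep the
    subset with the highest score. Returns the list of (subset, score) pairs.
    """
    current = tuple(variables)
    history = [(current, score(current))]
    while len(current) > min_size:
        candidates = [tuple(v for v in current if v != dropped) for dropped in current]
        current = max(candidates, key=score)
        history.append((current, score(current)))
    return history


# Toy stand-in for "validation R^2 of a network trained on these variables".
USEFUL = {"a", "b"}


def toy_score(subset):
    chosen = set(subset)
    return len(chosen & USEFUL) - 0.1 * len(chosen - USEFUL)


history = backward_elimination(["a", "b", "c", "d"], toy_score)
best_subset, best_score = max(history, key=lambda item: item[1])
for subset, value in history:
    print(len(subset), subset, round(value, 2))
```

The best entry in `history` plays the role of the maximum in Fig. 7: the subset size at which the validation score peaks.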

Fig. 6

Principle diagram of stepwise method.

Fig. 7

Change in coefficient of determination for validation data with decreasing number of explanatory variables.

Table 5 Combination of explanatory variables with the largest coefficient of determination for validation data.

Generally, machine learning faces a problem called overfitting.³²⁾ Therefore, we performed 10-fold cross validation to evaluate the generalization performance of the constructed network with eight explanatory variables. Figure 8 shows the principle diagram of 10-fold cross validation. The procedure of 10-fold cross validation is as follows. All data were divided into 10 groups: one was the validation data group and the others were training data groups. The network was trained using the training data, and the values for the validation data were predicted. This process was repeated 10 times so that each group served once as the validation data. Finally, R² was evaluated between the experimental values and all predicted values. Figure 9 shows the result of 10-fold cross validation. R² of the network that used eight explanatory variables was 0.6993 after 10-fold cross validation.
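The 10-fold procedure with pooled predictions can be sketched as follows. To keep the example self-contained, an ordinary least-squares line stands in for the neural network, and the data are synthetic; the function names, the random fold assignment, and the R² formula 1 − SS_res/SS_tot are all assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Stand-in model: least-squares line, returned as (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_val_predict(xs, ys, k=10, seed=0):
    """Predict every sample from a model trained on the other k - 1 folds."""
    pooled = [0.0] * len(ys)
    for fold in k_fold_indices(len(ys), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(ys)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            pooled[i] = slope * xs[i] + intercept
    return pooled


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data: a noisy line with 104 points (same count as the dataset, made-up values).
xs = [i / 10.0 for i in range(104)]
ys = [2.0 * x + 1.0 + (0.3 if i % 2 else -0.3) for i, x in enumerate(xs)]
pooled = cross_val_predict(xs, ys)
print(round(r_squared(ys, pooled), 3))
```

R² is computed once over the pooled predictions, matching the description that it was evaluated between the experimental values and all predicted values.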

Fig. 8

Principle diagram of 10-fold cross validation.

Fig. 9

Comparison between experimental data and the predicted data using 8 explanatory variables after 10-fold cross validation.

After 10-fold cross validation, we calculated the mean absolute error (MAE) of the network that used eight explanatory variables of first-principles calculation results. Table 6 shows the MAE of each alloy obtained by the prediction network. For all alloys, the MAE of the network that used eight explanatory variables by the first-principles calculations was 0.45. Because the common logarithmic value was used as the explained variable, the solid solubility limit predicted from the network on average was 10^−0.45 to 10^0.45 times the true value. In other words, the solid solubility limit predicted from this network was on average 0.35 to 2.85 times the true value.
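The conversion from an MAE in log c to a multiplicative factor in c can be checked with a few lines. The function and variable names are illustrative. With the rounded MAE of 0.45 the bounds evaluate to about 0.35 and 2.82; the quoted upper bound of 2.85 is close to 1/0.35, and the small difference presumably reflects rounding, which is an assumption on our part.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# MAE in log10(c) reported in the text (rounded).
mae_log10 = 0.45

# An error of +/- MAE in log10(c) corresponds to a multiplicative factor in c.
lower_factor = 10.0 ** (-mae_log10)
upper_factor = 10.0 ** mae_log10
print(f"{lower_factor:.3f} to {upper_factor:.3f} times the true value")

# Toy check of the MAE helper on made-up log c values.
print(mean_absolute_error([0.0, 1.0, -1.0], [0.5, 0.5, -1.5]))
```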

Table 6 Prediction accuracy of log c after 10-fold cross validation.

The accuracy of the solid solubility limit prediction system constructed in this study depended on the type of solvent element, as shown in Table 6. The MAEs of Ag-, Cu-, and Mg-based alloys were better than the value of 0.45 for all alloys. However, the MAE of Al-based alloys was 0.63, clearly inferior to the MAE for all alloys. Only the Mg-based alloys had the hcp structure, but the difference in crystal structure did not affect the accuracy of the prediction. Ag, Cu, and Mg have one or two valence electrons and are simple metals, that is, metals with a free-electron-like valence band structure. In contrast, Al has three valence electrons and is a neighbor of Si in the periodic table. Several first-principles studies have shown that directional bonds with covalent character can be locally formed for Al atoms, in contrast to Cu and Mg atoms.³⁷,³⁸⁾ It can be said that Al has a character intermediate between simple metals and covalent materials, and that Al can form bonds with both metallic and covalent character depending on its environment. This bonding tendency of Al atoms may be related to the difficulty of predicting the solid solubility limit in Al-based alloys.

The prediction by the network constructed in this work was more accurate than that by the network of Zhang et al.⁷⁾ Thus, the constructed prediction system is useful for the design and development of new alloys with unknown solid solubility limits.

4. Conclusions

Using an artificial neural network and first-principles calculation results, we constructed a solid solubility limit prediction system with high prediction accuracy. The obtained results are summarized below:

(1) In Ag-, Cu-, Al-, and Mg-based alloys, the coefficient of determination R² was improved from 0.715 to 0.900 by using the first-principles calculation results as explanatory variables instead of the Hume-Rothery parameters.
(2) The best combination of explanatory variables was searched, and eight explanatory variables were obtained: average number of electrons in the s and d orbitals (N_s and N_d), heat of solution (ΔH_relax), heat of solution of ideal lattice (ΔH_unrelax), Young’s modulus (E), shear modulus (G), bulk modulus (K), and melting point of solute atoms (T_x).
(3) Ten-fold cross validation was performed to evaluate the generalization performance of the network. The solid solubility limit predicted from this network was on average 0.35 to 2.85 times the true value.

Acknowledgments

This study was supported by the Light Metal Educational Foundation, Inc.

REFERENCES

1) K. Kimura, H. Kushima, K. Yagi and C. Tanaka: Tetsu-to-Hagané 81 (1995) 757–762.
2) E. Clouet, J.M. Sanchez and C. Sigli: Phys. Rev. B 65 (2002) 094105.
3) W. Hume-Rothery, G.W. Mabbott and K.M.C. Evans: Philos. Trans. R. Soc. London, Ser. A 233 (1934) 1–97.
4) W. Hume-Rothery, P.W. Reynolds and G.V. Raynor: J. Inst. Met. 66 (1940) 191–207.
5) U. Mizutani: Materia Japan 48 (2009) 119–125.
6) S. Haykin: Neural Networks and Learning Machines, 3rd ed., (Pearson, New York, 2008).
7) Y.M. Zhang, S. Yang and J.R.G. Evans: Acta Mater. 56 (2008) 1094–1105.
8) D. Minami, T. Uesugi, Y. Takigawa and K. Higashi: Int. J. Mod. Phys. B 33 (2019) 1950055.
9) T. Shiraiwa, Y. Miyazawa and M. Enoki: Mater. Trans. 60 (2019) 189–198.
10) S. Miyazaki, M. Kusano, D.S. Bulgarevich, S. Kishimoto, A. Yumoto and M. Watanabe: Mater. Trans. 60 (2019) 561–568.
11) S. Wu, J. Yang and Z. Liu: Mater. Trans. 61 (2020) 691–699.
12) M. Shirai and H. Yamada: Mater. Trans. 61 (2020) 176–180.
13) N. Hasegawa, S. Hasegawa, T. Kitaoka and H. Ohtsu: Mater. Trans. 60 (2019) 758–764.
14) M.D. Segall, P.J.D. Lindan, M.J. Probert, C.J. Pickard, P.J. Hasnip, S.J. Clark and M.C. Payne: J. Phys. Condens. Matter 14 (2002) 2717–2744.
15) P. Hohenberg and W. Kohn: Phys. Rev. 136 (1964) B864–B871.
16) W. Kohn and L.J. Sham: Phys. Rev. 140 (1965) A1133–A1138.
17) D. Vanderbilt: Phys. Rev. B 41 (1990) 7892–7895.
18) H.J. Monkhorst and J.D. Pack: Phys. Rev. B 13 (1976) 5188–5192.
19) T.H. Fischer and J. Almlof: J. Phys. Chem. 96 (1992) 9768–9774.
20) R.S. Mulliken: J. Phys. Chem. 56 (1952) 295–311.
21) T. Uesugi and K. Higashi: Comput. Mater. Sci. 67 (2013) 1–10.
22) R. Hill: J. Mech. Phys. Solids 11 (1963) 357–372.
23) D.P. Kingma and J. Ba: arXiv:1412.6980 (2014).
24) C. Rasmussen and C. Williams: Gaussian Processes for Machine Learning, (The MIT Press, Cambridge, 2006).
25) V. Katkovnik: IEEE Trans. Signal Process. 46 (1998) 3104–3109.
26) Y. LeCun, Y. Bengio and G. Hinton: Nature 521 (2015) 436–444.
27) D.A. Clevert, T. Unterthiner and S. Hochreiter: arXiv:1511.07289 (2015).
28) X. Glorot and Y. Bengio: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, ed. by T. Yee Whye and T. Mike, (PMLR, Proceedings of Machine Learning Research, 2010) Vol. 9, pp. 249–256.
29) P. Ramachandran, B. Zoph and Q.V. Le: arXiv:1710.05941 (2017).
30) G. Klambauer, T. Unterthiner, A. Mayr and S. Hochreiter: Proceedings of the 31st International Conference on Neural Information Processing Systems, (Curran Associates Inc., Long Beach, California, USA, 2017) pp. 972–981.
31) S. Ioffe and C. Szegedy: arXiv:1502.03167 (2015).
32) N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov: J. Mach. Learn. Res. 15 (2014) 1929–1958.
33) L. Pauling: J. Am. Chem. Soc. 69 (1947) 542–553.
34) C. Kittel and P. McEuen: Introduction to Solid State Physics, (J. Wiley, Hoboken, N.J., 2005).
35) B. Predel: Phase Equilibria of Binary Alloys, (Springer-Verlag, Berlin Heidelberg, 2003).
36) H. Baker: ASM Handbook Volume 3, Alloy Phase Diagrams, (ASM International, Ohio, 1992).
37) T. Uesugi, M. Kohyama and K. Higashi: Phys. Rev. B 68 (2003) 184103.
38) S. Ogata, J. Li and S. Yip: Science 298 (2002) 807–811.
