2020 Volume 61 Issue 11 Pages 2083-2090
A solid solubility prediction system using Hume-Rothery parameters and first-principles calculation results as explanatory variables was devised, and the resulting coefficients of determination, R2, were compared. With the Hume-Rothery parameters, R2 was 0.715; with the first-principles calculation results, R2 was 0.900, indicating improved prediction accuracy. We performed 10-fold cross validation to evaluate the generalization performance of the network. The number of explanatory variables was optimized using the stepwise method, and R2 was maximized when eight explanatory variables were used. After 10-fold cross validation, R2 of the constructed solid solubility prediction system with eight explanatory variables was 0.6993. The mean absolute error for this network was 0.45. Because the common logarithm of the solid solubility limit was used as the explained variable, the solid solubility limit predicted from this network was on average 0.35 to 2.85 times the true value.

Fig. 9 Comparison between experimental data and the predicted data using 8 explanatory variables after 10-fold cross validation.
The solid solubility limit plays an important role in the design of various alloys. Even when the amount in solid solution is very small, the solute element often has a profound influence on the mechanical properties of the alloy.1) Thus, the solid solubility limit has long been studied experimentally and computationally.2)
The Hume-Rothery rules are well-known semi-empirical rules for determining whether one metal will dissolve in another in the solid state.3) By analyzing the solid solubility limit data of Ag-based and Cu-based binary alloys, two Hume-Rothery rules were proposed. First, when the atomic radii of the solute element and the solvent element differ by more than 15%, it is difficult to form a solid solution. Second, when the atomic radii differ by 15% or less, a solid solution can be formed, and its range is determined by the average number of valence electrons per atom, e/a, where e is the total number of valence electrons and a is the total number of atoms. Furthermore, three rules of thumb, including the effect of electronegativity on the range of solid solution formation, were published in 1940.4) They are known as the Hume-Rothery rules.5)
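The two quantities behind these rules can be written out in a few lines. The sketch below is only an illustration: the function names are hypothetical, the size mismatch is taken relative to the solvent radius (an assumed convention), and the radii and composition in the usage lines are round, illustrative numbers rather than data from this study.

```python
def size_mismatch(r_solute, r_solvent):
    """Relative atomic-radius difference, measured against the solvent radius."""
    return abs(r_solute - r_solvent) / r_solvent


def passes_size_rule(r_solute, r_solvent, limit=0.15):
    """First rule: a mismatch above 15% makes a solid solution unlikely."""
    return size_mismatch(r_solute, r_solvent) <= limit


def electron_concentration(components):
    """Average number of valence electrons per atom, e/a.

    components: list of (atomic_fraction, valence) pairs.
    """
    total_atoms = sum(fraction for fraction, _ in components)
    total_electrons = sum(fraction * valence for fraction, valence in components)
    return total_electrons / total_atoms


# Illustrative values only (radii in angstrom, assumed for this example).
print(passes_size_rule(1.34, 1.28))
# 70 at% of a monovalent solvent with 30 at% of a divalent solute.
print(electron_concentration([(0.7, 1), (0.3, 2)]))
```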
Artificial neural networks (ANN) are inspired by the human brain system.6) ANNs are supervised machine learning algorithms that can capture the relationship between explanatory and explained variables while training a network. Recently, the interest in applying machine learning to solve problems in materials sciences has increased.7–13)
Zhang et al. made the first attempt to predict the solid solubility limit using parameters of the Hume-Rothery rules and artificial neural networks.7) They developed an artificial neural network that uses atomic size, valence, and electronegativity as explanatory variables and the solid solubility limit as the explained variable to predict the solid solubility limit of Ag-based and Cu-based binary alloys.7) However, the coefficient of determination, R2, between the experimental data and the data predicted by the network of Zhang et al. was as low as 0.590.7) Moreover, their network targeted only Ag-based and Cu-based alloys.
In this study, we aimed to improve the prediction accuracy of the solid solubility limit by using first-principles calculation results as explanatory variables. We targeted Ag-based and Cu-based alloys, to compare the accuracy with the previous result by Zhang et al.,7) as well as Al-based and Mg-based alloys, which are widely used in various applications because of their good mechanical properties, high strength-to-weight ratio, and market acceptability. In this study, unless otherwise specified, the maximum solid solubility limit is treated as the solid solubility limit.
The first-principles calculations were performed using the Cambridge Serial Total-Energy Package (CASTEP).14) CASTEP is an ab initio pseudopotential code for solving the electronic ground state of periodic systems, in which the wave functions are expanded in a plane-wave basis set using a technique based on density functional theory (DFT).15,16) The electronic exchange-correlation energy used in the DFT was given by the generalized gradient approximation (GGA) proposed by Perdew et al., and pseudopotentials17) were used for all elements.
For the calculations of Ag-, Cu-, and Al-based solid-solution alloys, we used fcc supercells in which 1 of 32 solvent atoms was replaced by a solute atom. For the calculations of Mg-based solid-solution alloys, we used hcp supercells in which 1 of 36 solvent atoms was replaced by a solute atom. Figure 1 shows these supercell models. The plane-wave cut-off energy was set at 350 eV for calculations of Ag-, Al-, and Mg-based alloys and at 400 eV for calculations of Cu-based alloys. The energy integration over the Brillouin zone was performed using k-points on a Monkhorst–Pack grid18) of 6 × 6 × 6 for Ag-based and Cu-based alloys, 7 × 7 × 7 for Al-based alloys, and 6 × 6 × 5 for Mg-based alloys. Spin polarization was included for alloys containing Cr, Mn, Fe, Co, and Ni solute atoms, which may be magnetic. Full structural relaxation of the atomic configurations and lattice constants was performed because introducing a substitutional solute atom leads to local lattice distortion and changes in the cell volume. Stable atomic configurations were obtained through relaxation based on the Hellmann–Feynman forces. The lattice constants at zero pressure were also optimized using a Broyden–Fletcher–Goldfarb–Shanno (BFGS)19) minimization algorithm in conjunction with the stress calculations. The convergence parameters were as follows: the total energy tolerance was 1.6 × 10^-24 J/atom, the maximum force tolerance was 4.8 × 10^-12 N, the maximum stress component was 0.05 GPa, and the maximum displacement was 1 × 10^-4 nm.

Fig. 1 Schematic representation of supercell modeling: (a) Ag31X, Cu31X, and Al31X, and (b) Mg35X.
We calculated 10 parameters: the average number of electrons in the s, p, and d orbitals (Ns, Np, and Nd); bond overlap population (BOP); heat of solution (ΔHrelax); heat of solution of ideal lattice (ΔHunrelax); misfit strain (ε); Young’s modulus (E); shear modulus (G); and bulk modulus (K). Ns, Np, Nd, and BOP were calculated using Mulliken population analysis.20) In Ag-, Cu-, and Al-based alloys, the value of BOP between the solute atom and a solvent atom at the first nearest neighbor was used. In Mg-based alloys, the average value between the solute atom and the solvent atoms at the first and second nearest neighbors was used. ΔHrelax, ΔHunrelax, and ε were calculated using the following equations:
\begin{equation} \Delta H = E_{\text{MX}} - \left(\frac{31}{32}E_{\text{M}} + E_{\text{X}}\right) \tag{1} \end{equation}
\begin{equation} \Delta H = E_{\text{MgX}} - \left(\frac{35}{36}E_{\text{Mg}} + E_{\text{X}}\right) \tag{2} \end{equation}
\begin{equation} \varepsilon = \frac{d_{\text{MX}}-d_{\text{M}}}{d_{\text{M}}} \tag{3} \end{equation}
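As a rough illustration of Eqs. (1) to (3), the sketch below evaluates the heat of solution and the misfit strain for a generic supercell. It rests on assumed readings of the symbols, since they are not defined in the text here: E_MX is taken as the total energy of the supercell containing one solute atom, E_M as the total energy of the pure solvent supercell with the same number of sites, E_X as the energy per atom of the pure solute, and d as a characteristic length such as a lattice constant. The function names and the numbers are hypothetical.

```python
def heat_of_solution(e_alloy_cell, e_solvent_cell, e_solute_atom, n_sites):
    """Eqs. (1)/(2): energy change when one of n_sites solvent atoms is replaced.

    n_sites = 32 reproduces Eq. (1) and n_sites = 36 reproduces Eq. (2).
    Energies must share one unit (assumed here: eV).
    """
    solvent_part = (n_sites - 1) / n_sites * e_solvent_cell
    return e_alloy_cell - (solvent_part + e_solute_atom)


def misfit_strain(d_alloy, d_solvent):
    """Eq. (3): relative change of the characteristic length d on alloying."""
    return (d_alloy - d_solvent) / d_solvent


# Hypothetical energies (eV) and lengths, chosen only to exercise the formulas.
print(heat_of_solution(-100.0, -96.0, -6.5, n_sites=32))
print(misfit_strain(4.10, 4.00))
```

Under the same assumptions, ΔHrelax and ΔHunrelax would differ only in whether the energy of the solute-containing supercell is taken after or before structural relaxation.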
We used Neural Network Console, a deep learning tool for designing, training, and evaluating artificial neural networks. An optimization algorithm called adaptive moment estimation (ADAM)23) was used to update the weights associated with each connection between neurons.
The network structure was explored and optimized with respect to the number of layers and the choice of activation functions using Bayesian optimization based on Gaussian processes.24) The network was composed of four or five layers: one input layer, two or three hidden layers, and one output layer. The Huber loss function25) was used as the loss function. The Huber loss function could stabilize training better than the squared error function. We used the following activation functions. The ReLU function26) is widely used because it is quick to compute. The ELU function27) is an improved form of the ReLU function. Abs is the absolute value function, which passes the absolute value of its input to the next layer. The Tanh function28) is symmetrical around 0. The Swish function29) multiplies its input by the sigmoid of that input. The SELU function30) is self-normalizing. Batch normalization31) normalizes activations within a mini-batch and can speed up learning of the network. Dropout32) can prevent overfitting.
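For reference, the loss and activation functions named above have simple scalar definitions. The sketch below writes them in plain Python under stated assumptions: the Huber threshold `delta=1.0` and the ELU `alpha=1.0` are common defaults, not values reported in this study, and the code is not taken from Neural Network Console.

```python
import math


def huber(residual, delta=1.0):
    """Quadratic for small residuals, linear beyond delta (delta is assumed)."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def relu(x):
    return max(0.0, x)


def elu(x, alpha=1.0):
    """Like ReLU for x > 0, but saturates smoothly to -alpha for x << 0."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x):
    """Input multiplied by its own sigmoid."""
    return x * sigmoid(x)


def selu(x):
    """Scaled ELU with the fixed constants that give self-normalization."""
    scale, alpha = 1.0507009873554805, 1.6732632423543772
    return scale * elu(x, alpha)


# Abs and Tanh are available directly as abs(x) and math.tanh(x).
for f in (relu, elu, swish, selu, abs, math.tanh):
    print(f.__name__, f(-1.0), f(1.0))
```

The linear tail of the Huber loss limits the influence of a few badly predicted alloys on the gradient, which is one plausible reason it trains more stably than the squared error.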
In this study, we constructed two solid solubility limit prediction systems. One used Hume-Rothery parameters, similar to the study of Zhang et al.,7) while the other used results from the first-principles calculations. In the artificial neural network with the Hume-Rothery parameters, four explanatory variables were used: the atomic radius difference between the solute atom and the solvent atom, the difference in Pauling’s electronegativity33) between the solute atom and the solvent atom, the valence of the solute element, and the valence of the solvent element. The number of valence electrons was obtained from the group number in the periodic table used by the Chemical Abstracts Service (CAS).
In the network based on the first-principles calculations, we used 11 explanatory variables: one was an experimental value and the other ten were results of the first-principles calculations. The experimental value was the melting point of the solute element, Tx.34) The results of the first-principles calculations were Ns, Np, Nd, BOP, ΔHrelax, ΔHunrelax, ε, E, G, and K in M-X solid solution alloys. From thermodynamics, the solid solubility limit depends on the excess Gibbs free energy of the solid solution, which can be subdivided into an enthalpy of mixing and an entropy contribution. We expected the results of the first-principles calculations in M-X solid solution alloys to be related to the excess Gibbs free energy of the solid solution, and the experimental melting point of the solute element to be related in particular to the entropy contribution.
In both artificial neural networks, we used the common logarithm of the solid solubility limit, c, collected from the equilibrium diagrams,35,36) as the explained variable, log c. A total of 104 data points were collected: 27 for Ag-based, 26 for Cu-based, 30 for Al-based, and 21 for Mg-based alloys. The datasets of the first-principles calculation results and the explained variables are shown in Tables 1, 2, 3, and 4. All explanatory and explained variables were normalized before training so that they fall within the range of −1 to 1.
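The preprocessing described here amounts to a logarithm followed by a rescaling. The sketch below shows one way to do it, assuming min-max scaling of each variable onto [−1, 1]; the text does not state which normalization was used, so that choice, the function names, and the sample values are all assumptions.

```python
import math


def log_targets(solubility_limits):
    """Common logarithm of each solid solubility limit c (c must be > 0)."""
    return [math.log10(c) for c in solubility_limits]


def scale_to_unit_range(values):
    """Min-max scale one variable so its values fall within [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Hypothetical solubility limits, used only to show the transformation.
c = [0.01, 0.1, 1.0, 10.0]
log_c = log_targets(c)
print(log_c)
print(scale_to_unit_range(log_c))
```

To compare a prediction with experiment, the scaling has to be inverted with the same minimum and maximum that were used before training.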




Figure 2 shows the optimized network that used the Hume-Rothery parameters. Figure 3 shows the prediction of the solubility using Hume-Rothery parameters in comparison with the experimental data. We used the hold-out method for the network: 80% of all data were training data and 20% were validation data. The coefficient of determination, R2, for the correlation between predicted and experimental values was 0.715, which is higher than the value of 0.590 reported by Zhang et al.7) We attribute the improved prediction accuracy to the larger dataset, which added Al-based and Mg-based alloys.
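A hold-out evaluation of this kind can be sketched as follows. The split is a random 80/20 partition of the sample indices, and R2 is computed here as 1 − SS_res/SS_tot; whether the study used this definition or the squared correlation coefficient is not stated, so that is an assumption, as are the function names and the seed.

```python
import random


def holdout_split(n_samples, train_fraction=0.8, seed=0):
    """Return (train_indices, validation_indices) for a random hold-out split."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    n_train = int(train_fraction * n_samples)
    return order[:n_train], order[n_train:]


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train, valid = holdout_split(104)
print(len(train), len(valid))
```

With 104 data points, only about 20 fall in the validation set, so the resulting R2 can vary noticeably from one random split to another.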

Fig. 2 Architecture of the optimized artificial neural network using Hume-Rothery parameters.

Fig. 3 Prediction of solubility using Hume-Rothery parameters in comparison with experimental data.
Figure 4 shows the optimized network that used the results of the first-principles calculations. Figure 5 shows the prediction of solubility using the results of the first-principles calculations in comparison with experimental data. Among all data, 80% were training data and 20% were validation data, as shown in Fig. 5. The coefficient of determination, R2, for the correlation between predicted and experimental values was 0.900. Thus, the prediction system of solid solubility limit was more accurate than that constructed by Zhang et al.7)

Fig. 4 Architecture of the optimized artificial neural network using results of first-principles calculations.

Fig. 5 Prediction of solubility using first-principles data in comparison with experimental data.
Stepwise regression is a method of fitting regression models in which the explanatory variables are chosen by an automatic procedure. We used the stepwise method to optimize the number of explanatory variables of the network that used the results of the first-principles calculations. Figure 6 shows the principle diagram of the stepwise method used in this work. First, one explanatory variable is removed from the full set of 11, and the combination that maximizes R2 is searched. Then one further explanatory variable is removed at each step, and the optimal combination is searched again. The hold-out method was used while selecting the explanatory variables by the stepwise method: 80% of all data were training data and 20% were validation data. Figure 7 shows the change in the coefficient of determination for the validation data only as the number of explanatory variables decreases. Table 5 shows the optimized combination for each number of explanatory variables. The best combination consists of eight explanatory variables: the average number of electrons in the s and d orbitals (Ns and Nd), heat of solution (ΔHrelax), heat of solution of ideal lattice (ΔHunrelax), Young’s modulus (E), shear modulus (G), bulk modulus (K), and melting point of the solute element (Tx).
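The elimination procedure described in this paragraph can be sketched as a greedy backward search. In the sketch, `score` stands in for "train the network on this subset and return the validation R2"; here it is replaced by a toy function with made-up contributions so that the example runs on its own. The function names and the toy numbers are hypothetical.

```python
def backward_elimination(features, score):
    """Greedy backward stepwise search.

    At each step, try removing each remaining feature, keep the subset with
    the highest score, and record it. Returns [(subset, score), ...] from the
    full set down to a single feature.
    """
    current = list(features)
    history = [(tuple(current), score(current))]
    while len(current) > 1:
        candidates = [[f for f in current if f != drop] for drop in current]
        current = max(candidates, key=score)
        history.append((tuple(current), score(current)))
    return history


# Toy stand-in for the validation R2: each feature adds or subtracts a fixed
# amount (made-up numbers, not results from this study).
contribution = {"a": 0.3, "b": 0.2, "c": -0.1, "d": -0.05}


def toy_score(subset):
    return sum(contribution[f] for f in subset)


history = backward_elimination(["a", "b", "c", "d"], toy_score)
best_subset, best_score = max(history, key=lambda item: item[1])
print(best_subset, best_score)
```

Starting from 11 variables, this greedy search evaluates 11 + 10 + ... + 2 = 65 candidate subsets instead of all 2047 non-empty combinations, at the cost of possibly missing the globally best one.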

Fig. 6 Principle diagram of stepwise method.

Fig. 7 Change in coefficient of determination for validation data with decreasing number of explanatory variables.

Generally, machine learning faces a problem called overfitting.32) Therefore, we performed 10-fold cross validation to evaluate the generalization performance of the constructed network with eight explanatory variables. Figure 8 shows the principle diagram of 10-fold cross validation. The procedure is as follows. All data were divided into 10 groups: one was the validation data group and the others were training data groups. The network was trained using the training data, and the values for the validation data were predicted. This process was repeated 10 times so that each group served once as the validation data. Finally, R2 was evaluated between the experimental values and all predicted values. Figure 9 shows the result of 10-fold cross validation. R2 of the network that used eight explanatory variables was 0.6993 after 10-fold cross validation.
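The cross-validation procedure above, with R2 computed once over the pooled out-of-fold predictions, can be sketched as follows. A one-variable least-squares line stands in for the neural network so that the example is self-contained; the function names, the seed, and the synthetic data are assumptions for illustration only.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Stand-in model: ordinary least-squares line, returned as a predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def cross_val_predict(xs, ys, fit, k=10, seed=0):
    """Predict every sample from a model trained without that sample's fold."""
    predictions = [None] * len(xs)
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            predictions[i] = model(xs[i])
    return predictions


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic, exactly linear data with the same sample count as the study.
xs = [float(i) for i in range(104)]
ys = [2.0 * x + 1.0 for x in xs]
pooled = cross_val_predict(xs, ys, fit_line)
print(r_squared(ys, pooled))
```

Because every prediction comes from a model that never saw that sample, the pooled R2 is a stricter measure than a single hold-out split.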

Fig. 8 Principle diagram of 10-fold cross validation.

Fig. 9 Comparison between experimental data and the predicted data using 8 explanatory variables after 10-fold cross validation.
After 10-fold cross validation, we calculated the mean absolute error (MAE) of the network that used eight explanatory variables from the first-principles calculation results. Table 6 shows the MAE for each alloy system obtained by the prediction network. For all alloys, the MAE of the network that used eight explanatory variables was 0.45. Because the common logarithm was used as the explained variable, the solid solubility limit predicted from the network was on average between 10^-0.45 and 10^0.45 times the true value. In other words, the solid solubility limit predicted from this network was on average 0.35 to 2.85 times the true value.
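The step from an MAE in log10 units to a multiplicative factor can be checked directly. The sketch below uses hypothetical function names; direct evaluation gives roughly 0.35 and 2.82 for an MAE of 0.45, and the upper figure of 2.85 quoted in the text matches the reciprocal of the rounded lower bound, 1/0.35.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference; here both inputs are log10 values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def factor_range(mae_log10):
    """Multiplicative band implied by an average error of mae_log10 decades."""
    return 10.0 ** (-mae_log10), 10.0 ** mae_log10


low, high = factor_range(0.45)
print(round(low, 3), round(high, 3))
```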

The accuracy of the solid solubility limit prediction system constructed in this study depended on the solvent element, as shown in Table 6. The MAEs of Ag-, Cu-, and Mg-based alloys were lower than the value of 0.45 for all alloys. However, the MAE of Al-based alloys was 0.63, clearly higher than the MAE for all alloys. Only the Mg-based alloys had the hcp structure, but the difference in crystal structure did not affect the accuracy of the prediction. Ag, Cu, and Mg have one or two valence electrons and are simple metals, that is, metals with a free-electron-like valence band structure. In contrast, Al has three valence electrons and is a neighbor of Si in the periodic table. Several first-principles studies have shown that directional bonds with covalent character can form locally around Al atoms more readily than around Cu and Mg atoms.37,38) Al can thus be regarded as intermediate between simple metals and covalent materials, forming bonds with either metallic or covalent character depending on its environment. This bonding tendency of Al atoms may be related to the difficulty of predicting the solid solubility limit in Al-based alloys.
The prediction by the network constructed in this work was more accurate than that by the network of Zhang et al.7) Thus, the constructed prediction system is useful for the design and development of new alloys with unknown solid solubility limits.
Using artificial neural network and first-principles calculation results, we constructed a solid solubility limit prediction system with high prediction accuracy. The obtained results are shown below:
This study was supported by the Light Metal Educational Foundation, Inc.