2023 Volume 40 Pages 236-249
In this study, in order to optimize a fabrication process for SiO2/TiO2 composite particles and control their coating ratio (CTi), regression models for the coating process were constructed using various machine learning techniques. The composite particles with a core (SiO2)/shell (TiO2) structure were synthesized by mechanical stress under various fabrication conditions with respect to the supply volume of raw materials (V), addition ratio of TiO2 (rTi), operation time (t), rotor rotation speed (S), and temperature (T). Regression models were constructed by the least squares method (LSM), principal component regression (PCR), support vector regression (SVR), and the deep neural network (DNN) method. The accuracy of the constructed regression models was evaluated using the determination coefficients (R2) and the predictive performance was evaluated by comparing the prediction coefficients (Q2). From the perspective of the R2 and Q2 values, the DNN regression model was found to be the most suitable model for the present coating process. Moreover, the effects of the fabrication parameters on CTi were analyzed using the constructed DNN model. The results suggested that the t value was the dominant factor determining CTi of the composite particles, with the plot of CTi versus t displaying a clear maximum.
Composite particles synthesized by compounding techniques have been extensively utilized in various fields (Al-Salihi H.A. et al., 2019; Karger-Kocsis J. et al., 2014; Pierpaoli M. et al., 2019). Since powder properties play a key role in the suitability for a particular application (Kimura T. et al., 2020), selecting the optimal combination of particles is crucial for the performance of the composite particles (Deki Y. et al., 2018). For example, composite particles consisting of oxides, such as SiO2 and TiO2, are utilized in cosmetics and ceramics because of their superior physical properties including optical and electrical properties (Adebisi A.A. et al., 2016; Himoto I. et al., 2016). Consequently, it is necessary to control the powder properties during the fabrication of composite particles.
In the synthesis of composite particles, compounding by mixing and dispersion, which involves convection, shear, and diffusion processes, is extensively applied. These compounding techniques require a combination of impact, compression, and friction to overcome the aggregation forces (Kim K. et al., 2016). In many conventional processes using mixers and mechanical compounding machinery, premixing and compounding in other equipment are necessary because the mixing mechanism of a single device is biased toward convection, shear, or diffusion (Thongnopkoon T. et al., 2018). Several fabrication methods for composite particles have been reported, although achieving control over the powder properties of the particles during the fabrication processes remains challenging owing to the complex relationships between the powder properties and fabrication conditions (Matsuoka Y. et al., 2021). Hence, it is essential to develop methods of predicting these complex relationships to control the powder properties of the composite particles.
Data with complex correlations are often analyzed by statistical and machine learning methods (Kaneko H. and Funatsu K., 2015; Wada S. et al., 2021; Zhao Z. et al., 2018), such as the least-squares method (LSM), principal component regression (PCR), and support vector regression (SVR) (De Backer A. et al., 2021; Tran H. et al., 2018; Zhang Z. et al., 2021). In addition, a deep neural network (DNN) analysis has proved useful in a variety of fields, including agriculture (Cai Y. et al., 2019; Qui Z. et al., 2018), environmental studies (Ghatak M.D. and Ghatak A., 2018; Tanzifi M. et al., 2018), and medicine (Basheer I.A. and Hajmeer M., 2000; Horie Y. et al., 2019; Stokes J.M. et al., 2020). DNN analysis is one of the typical machine learning methods and aims to replicate the neural circuits of the human brain by a mathematical model using multiple artificial neurons. A DNN learns by adjusting parameters in the model and can predict complex correlations (Basheer I.A. and Hajmeer M., 2000; Zhang G. et al., 1998).
In previous studies (De Backer A. et al., 2021; Tran H. et al., 2018; Zhang Z. et al., 2021), regression models were constructed to examine the relationships between several explanatory variables and the objective variables using various methods, including LSM, PCR, and SVR, and their predictive performance was improved by considering multiple explanatory variables. For example, Matsuoka et al. (Matsuoka Y. et al., 2021) investigated the relationships between the operating conditions and the physical properties of oral solid dosage tablets during a continuous manufacturing process using a DNN model, which successfully predicted the physical properties of the tablets from the operating conditions with high accuracy. In another study, regression models were constructed to estimate the volume loss of AA7075/Al2O3 composites during wear test at various operating conditions using the LSM, SVR, and DNN methods, and the performance of the machine learning was compared with statistical analysis for this challenging situation involving complex correlations between the explanatory and objective variables (Aydin F., 2021).
In this study, the optimization of prediction techniques for the complex correlation between powder properties and fabrication conditions by constructing regression models using several methods was investigated. Furthermore, parameters affecting the powder properties of SiO2/TiO2 composite particles were analyzed with a focus on the coating ratio (CTi) of the composite particles. The SiO2/TiO2 composite particles were synthesized using a powder processing system (NOB-MINI, HOSOKAWA Micron Co., Japan). Because of the balanced effects of compression, shear, and impact on individual particles, the fabrication apparatus used in this study enables particle design and particle processing, such as compounding, surface modification, and spheronization. To analyze the correlations between the fabrication conditions and CTi of the SiO2/TiO2 composite particles, the supply volume (V), addition ratio of TiO2 (rTi), operation time (t), rotor rotation speed (S), and temperature (T) were varied. On the basis of the experimentally determined CTi values of the composite particles prepared under various fabrication conditions, regression models for CTi of the composite particles were obtained from the machine learning methods. Furthermore, the optimized models were used to predict CTi values of the composite particles under various fabrication conditions.
The composite particles were prepared from SiO2 (MT-150W, Tayca Co., Japan) as the core particle and TiO2 (Silsic T-1 (S-1), Yamamori Tsuchimoto Inc., Japan) as the shell particle. The SiO2 and TiO2 particles were supplied to the experimental apparatus (NOB-MINI, HOSOKAWA Micron Co., Japan). The supply volume of the raw materials (V) and the addition ratio of TiO2 (rTi) were varied in the ranges of 25–125 mL and 5.0–15.0 wt%, respectively. The operation time (t) was set to 1–15 min. The rotation speed (S) and temperature (T) were set to 2,200–5,700 min−1 and 295–319 K, respectively. Under all experimental conditions, the electric current was maintained at a constant value of 3.5 A. In total, the SiO2/TiO2 composite particles were synthesized under 29 sets of experimental conditions, as summarized in Table 1. The surface morphology of the SiO2/TiO2 composite particles was examined by scanning electron microscopy (SEM; SU3500 II; Hitachi High-Tech Science Co., Japan), and the Si and Ti distributions were measured using an energy-dispersive X-ray spectrometer (EDS; Ultim Max, Oxford Instruments Co., Japan) connected to the SEM. The mapping time was set to approximately 120 s. The coating ratio (CTi) of the composite particles was calculated according to Eqn. (1) from the integrated areas of Si (SSi) and Ti (STi) in the elemental mapping images by using imaging software (WinROOF; Mitani Corp., Japan):
(1) |
Table 1. Summary of fabrication conditions for the SiO2/TiO2 composite particles.
Experimental condition | Range |
---|---|
Supply volume (V) [mL] | 25–125 |
Addition ratio of TiO2 (rTi) [wt%] | 5.0–15.0 |
Operation time (t) [min] | 1–15 |
Rotation speed (S) [min−1] | 2,200–5,700 |
Temperature (T) [K] | 295–319 |
The regression models were constructed as described in Sections 2.2.1–2.2.4. The fabrication parameters (V, rTi, t, S, and T) were employed as the explanatory variables to serve as the input values. CTi was chosen as the objective variable to serve as the output value. The 29 samples were split into training data (21 samples), used to construct the regression models, and test data (8 samples), used to evaluate the predictive performance of the constructed models. The input and output values (z) were auto-scaled according to Eqn. (2):
$$z' = \frac{z - \mu}{\sigma} \tag{2}$$
where z′ denotes the auto-scaled value, and μ and σ represent the mean and standard deviation of each variable, respectively.
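As a minimal sketch of the auto-scaling step, the snippet below applies Eqn. (2) to the supply volume (V) column of Table 3. The function name `auto_scale` is illustrative. The use of the population standard deviation, computed over all 29 runs, is an assumption: the text does not state which estimator was used, but this choice appears to reproduce the tabulated auto-scaled values in Table 4.

```python
from statistics import fmean, pstdev

# Supply volume V [mL] for runs 1-29, copied from Table 3.
V = [25, 25, 25, 25, 25, 50, 50, 100, 100, 50, 50, 100, 100, 75, 75,
     125, 125, 100, 100, 100, 100, 100, 100, 100, 100, 100, 50, 50, 50]


def auto_scale(values):
    """Return (z - mean) / std for every z in values (Eqn. (2))."""
    mu = fmean(values)
    # Assumption: population standard deviation (divide by n, not n - 1).
    sigma = pstdev(values)
    return [(z - mu) / sigma for z in values]


V_scaled = auto_scale(V)

# Runs 1, 8, and 14 (V = 25, 100, and 75 mL).
print(round(V_scaled[0], 2), round(V_scaled[7], 2), round(V_scaled[13], 2))
# -1.55 0.78 0.0
```

The printed values agree with the V column of Table 4 for the corresponding runs.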
The determination coefficient (R2) was used to evaluate the accuracy of the regression models constructed from the training data. The predictive performance of the constructed models was assessed by calculating the predictive coefficient (Q2) using the test data. The root mean square error (RMSE) for each model was also calculated. The formulas used to calculate the R2, Q2, and RMSE are given in Eqns. (3)–(5):
(3) |
(4) |
(5) |
where yi,
The R2 value indicates the accuracy of model construction because it expresses the degree of agreement between the experimental values and the calculated values when training data are substituted into the models constructed from training data. Meanwhile, the Q2 value reflects the degree of agreement between the experimental values and the predicted values when test data are substituted into the models constructed from training data. The maximum value for both R2 and Q2 is 1.0, and values closer to 1.0 indicate a higher quality model. The RMSE has a positive value and is used to evaluate errors in the numerical prediction, where a smaller value indicates superior predictive performance (Barrasso D. et al., 2015).
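The following sketch illustrates how such metrics are commonly computed. It assumes the usual definitions (one minus the ratio of the residual sum of squares to the total sum of squares for R2 and Q2, and the square root of the mean squared residual for RMSE); the exact forms of Eqns. (3)–(5) may differ in detail, for example in which mean is used for the test data. The function names and toy numbers are illustrative only.

```python
from math import sqrt


def r_squared(y_true, y_est):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of y_true.

    Applied to training data this plays the role of R2; applied to
    test data it plays the role of Q2 (assumed definition).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - ye) ** 2 for yt, ye in zip(y_true, y_est))
    ss_tot = sum((yt - mean) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_est):
    """Root mean square error between observed and estimated values."""
    n = len(y_true)
    return sqrt(sum((yt - ye) ** 2 for yt, ye in zip(y_true, y_est)) / n)


# Toy numbers, not data from this study.
y_true = [1.0, 2.0, 3.0, 4.0]
close_fit = [1.1, 1.9, 3.2, 3.8]
poor_fit = [4.0, 3.0, 2.0, 1.0]

print(round(r_squared(y_true, close_fit), 3), round(rmse(y_true, close_fit), 3))
print(round(r_squared(y_true, poor_fit), 3), round(rmse(y_true, poor_fit), 3))
```

Under this definition the coefficient has no lower bound: an estimate that is worse than simply predicting the mean yields a negative value, as the second toy case shows.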
2.2.1 Least-squares method (LSM)
LSM modeling is a construction method that involves determining the coefficients (βLSM) that minimize the sum-of-squares error (SLSM) between calculated values (ycalc.) and experimental values (y). When the numbers of samples and explanatory variables are m and n, respectively, the vector of error (ɛLSM), βLSM, y, and matrix of explanatory variables (X) can be expressed as shown in Eqns. (6)–(9):
(6) |
(7) |
(8) |
(9) |
Furthermore, the ycalc. and y vectors are given by Eqns. (10) and (11), respectively:
(10) |
(11) |
Because smaller errors between the ycalc. and y vectors were desired, the βLSM vector minimizing the SLSM matrix, which is determined by the square sum of the ɛLSM vector, was sought by partial differentiation of Eqn. (11), to afford the relationship shown in Eqn. (12):
$$X^{\mathrm{T}} X \beta_{\mathrm{LSM}} = X^{\mathrm{T}} y \tag{12}$$
Furthermore, when the inverse matrix of the XTX matrix was multiplied from the left of both sides of Eqn. (12), the βLSM vector was optimized as shown in Eqn. (13), thus constructing the LSM model:
$$\beta_{\mathrm{LSM}} = \left(X^{\mathrm{T}} X\right)^{-1} X^{\mathrm{T}} y \tag{13}$$
For simple relationships, such as linear and quadratic functions consisting of a small number of parameters serving as explanatory variables, the LSM is a suitable method. Because the order of the explanatory variables was set to 1, the LSM model constructed in this study is a linear regression.
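The least-squares solution described above can be sketched with NumPy as follows. The data are synthetic and the coefficient vector `beta_true` is hypothetical; neither comes from this study. The sketch solves the normal equations directly rather than forming the inverse of XTX explicitly, which is numerically preferable but gives the same coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 21 samples, 5 auto-scaled explanatory variables.
X = rng.normal(size=(21, 5))
beta_true = np.array([0.1, 0.6, 0.0, 0.1, 0.2])  # hypothetical coefficients
y = X @ beta_true                                # noise-free linear response

# Normal equations (X^T X) beta = X^T y, solved for beta.
beta_lsm = np.linalg.solve(X.T @ X, X.T @ y)
y_calc = X @ beta_lsm

print(np.allclose(beta_lsm, beta_true))  # True
```

Because the synthetic response is exactly linear, the fitted coefficients recover the hypothetical ones; with real, noisy data they would only approximate the underlying trend.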
2.2.2 Principal component regression (PCR)
PCR modeling is a construction method in which explanatory variables are transformed into principal components that are uncorrelated with each other. The algorithm for PCR model construction consists of the following two steps (Hotelling H., 1957).
In the first step, principal component analysis (PCA) is conducted. When the score vector (tPCR) is defined as a linear combination of the X matrix, the tPCR vector is obtained as described by Eqn. (14):
(14) |
where pPCR is the weight of the linear combination, which is referred to as loading.
Because PCA is performed by maximization of the score, the sum-of-squares score (SPCA) is maximized by using the Lagrange multiplier (GPCR) expressed in Eqn. (15):
(15) |
where j represents the number of explanatory variables. n is the number of solutions of the equation represented by the λ value. The tPCR vector with the n-th largest variance of λ is defined as the n-th principal component, and the principal component matrix (T) is given by Eqn. (16):
(16) |
The PCR model was constructed by adopting the T matrix as the explanatory variables in a similar manner as described for LSM in Section 2.2.1, and the relationship between the output variables and the feature components was optimized by varying the number of principal components (NP.C.) within the range of 1–5.
For processes with a correlation between the explanatory variables, the PCR method is considered the optimal selection because the impact of explanatory variables with dependent relationships on the accuracy of the regression model is reduced.
2.2.3 Support vector regression (SVR)
An SVR model is constructed using a support vector machine (SVM) for regression analysis. In SVR modeling, a kernel trick along with SVM is applied to construct nonlinear models. The original form of the SVR minimizes the matrix (SSVR) shown in Eqn. (17), which is related to the vectors of error and coefficient in SVR:
(17) |
where f and w denote the SVR model and a weight vector, respectively, ɛ is a threshold, C is a penalty term that controls the trade-off between the model complexity and training errors, and N is the number of training data. The second term of Eqn. (17) is the ɛ-insensitive loss function, as defined in Eqn. (18):
(18) |
Minimization of Eqn. (17) affords a regression model with a satisfactory balance between generalization capability and ability to adapt to training data. When an x vector is inputted, a y value is predicted by Eqn. (19):
(19) |
where K is a kernel function, and uSVR is a constant. As the kernel function for this study, the radial basis function kernel given by Eqn. (20) was adopted:
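$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\right) \tag{20}$$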
where γ represents a tuning parameter for controlling the width of the kernel function, and αi and αi* in Eqn. (19) are obtained from Eqns. (17) and (18) by minimizing the Lagrange multiplier (GSVR), as expressed in Eqn. (21):
(21) |
and the αi and αi* values are subject to
(22) |
(23) |
and Kij in Eqn. (21) is
(24) |
In SVR modeling, the hyperparameters (C, ɛ, and γ values) have to be set beforehand. The hyperparameters were selected by a comprehensive grid search (Kaneko H. and Funatsu K. et al., 2013).
The SVR method is suitable when it is necessary to construct a regression model to predict processes involving nonlinearities and negligible error ranges.
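Two building blocks of the SVR formulation, the ɛ-insensitive loss and the radial basis function kernel, can be sketched in a few lines. The standard textbook forms are assumed here and are taken to correspond to Eqns. (18) and (20); the function names are illustrative.

```python
from math import exp


def eps_insensitive_loss(residual, eps):
    """Zero inside the tube |residual| <= eps, linear outside it."""
    return max(0.0, abs(residual) - eps)


def rbf_kernel(x_i, x_j, gamma):
    """exp(-gamma * squared Euclidean distance between x_i and x_j)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return exp(-gamma * sq_dist)


# Residuals smaller than eps are not penalized at all.
print(eps_insensitive_loss(0.05, eps=0.1))   # 0.0
print(eps_insensitive_loss(-0.5, eps=0.1))   # 0.4

# The kernel equals 1 for identical points and decays with distance;
# a larger gamma gives a narrower kernel.
print(rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.25))  # 1.0
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.25) >
      rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=4.0))   # True
```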
2.2.4 Deep neural network (DNN)
A neural network (NN) model is constructed by optimizing the hyperparameters, such as the numbers of hidden layers (NH.L.) and neurons (NNeur.), the activation function, and the number of learning times (NL.T.). An NN with an NH.L. greater than two is referred to as a DNN. In this study, the NH.L. and NNeur. values were each varied in the range of 1–10. As the activation functions, the sigmoid function, tanh function, and rectified linear unit (ReLU) function were compared. The sigmoid function has a long history as an activation function for NN models. The tanh function has been reported to learn faster than the sigmoid function (Ismail H.Y. et al., 2019; Shirazian S. et al., 2017). In recent years, the ReLU function has seen increasing use because of several advantages such as reduced gradient vanishing and faster calculation (Roggo Y. et al., 2020). The sigmoid function, tanh function, and ReLU function are expressed in Eqns. (25), (26), and (27), respectively:
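$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}} \tag{25}$$
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{26}$$
$$\mathrm{ReLU}(x) = \max(0, x) \tag{27}$$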
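For reference, the three activation functions can be written in plain Python as below, using their standard definitions. The helper names `sigmoid` and `relu` are illustrative; `tanh` is taken from the standard library.

```python
from math import exp, tanh


def sigmoid(x):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 3), round(tanh(x), 3), relu(x))
```

The sigmoid output is centered at 0.5, whereas tanh is centered at zero and ReLU is unbounded above, which is one reason the three can behave differently during training.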
In all cases, the stochastic gradient descent method was adopted as the optimization method. NL.T. was varied in the range of 30–3,500. The hyperparameters are summarized in Table 2.
Table 2. Summary of the hyperparameter ranges used to optimize the DNN method.
Parameter | Range or setting |
---|---|
Number of hidden layers (NH.L.) [−] | 1–10 |
Number of neurons (NNeur.) [−] | 1–10 |
Activation function | Sigmoid, hyperbolic tangent, rectified linear unit |
Loss function | Mean square error |
Optimization method | Stochastic gradient descent |
Learning times (NL.T.) [−] | 30–3,500 |
For systems in which the explanatory variables and objective variables are intricately related, the DNN method is an appropriate selection.
To investigate the effects of the fabrication conditions on CTi of the SiO2/TiO2 composite particles, the particles were synthesized under 29 sets of conditions by varying the values of V (25–125 mL), rTi (5.0–15.0 wt%), t (1–15 min), S (2,200–5,700 min−1), and T (295–319 K). V, rTi, t, S, and T were set to include the maximum and minimum values within the operable range of experimental manipulations. The SEM and EDS images of the SiO2/TiO2 composite particles for t values of 1 and 10 min are shown in Fig. 1 to demonstrate the surface morphology and the state of the TiO2 coating on the SiO2 particles. The V and rTi values were 25 mL and 5.0 wt%, respectively. The S and T values were varied with an increase in the t value. From the EDS observations, the CTi values of the composite particles were calculated using Eqn. (1). The data obtained from the fabrication experiments are listed in Table 3. In addition, the data subjected to auto-scaling using Eqn. (2) are listed in Table 4. During the construction of the regression models using the machine learning methods, data from 21 of the fabrication experiments were used as training data. The data from the remaining eight fabrication experiments were used as test data to evaluate the predictive performance of the constructed models.
Fig. 1. SEM and EDS images of the SiO2/TiO2 composite particles obtained at t values of 1 min and 10 min.
Table 3. Experimental data for the relationship between the fabrication conditions and CTi of the composite particles.
Run | V [mL] | rTi [wt%] | t [min] | S [min−1] | T [K] | CTi [%] |
---|---|---|---|---|---|---|
1 | 25 | 5.0 | 1 | 5,500 | 295 | 34.6 |
2 | 25 | 5.0 | 3 | 5,600 | 316 | 57.1 |
3 | 25 | 5.0 | 5 | 5,500 | 308 | 35.2 |
4 | 25 | 5.0 | 10 | 5,600 | 317 | 40.5 |
5 | 25 | 5.0 | 15 | 5,700 | 317 | 41.0 |
6 | 50 | 5.0 | 10 | 3,500 | 305 | 44.5 |
7 | 50 | 5.0 | 15 | 3,500 | 306 | 35.5 |
8 | 100 | 5.0 | 10 | 2,500 | 302 | 37.1 |
9 | 100 | 5.0 | 15 | 2,500 | 303 | 45.7 |
10 | 50 | 10.0 | 10 | 3,700 | 307 | 39.5 |
11 | 50 | 10.0 | 15 | 3,700 | 308 | 42.5 |
12 | 100 | 10.0 | 10 | 2,600 | 303 | 46.7 |
13 | 100 | 10.0 | 15 | 2,600 | 304 | 45.1 |
14 | 75 | 5.0 | 10 | 2,800 | 304 | 52.7 |
15 | 75 | 5.0 | 15 | 2,800 | 304 | 36.2 |
16 | 125 | 5.0 | 10 | 2,200 | 301 | 61.5 |
17 | 125 | 5.0 | 15 | 2,200 | 301 | 40.0 |
18 | 100 | 5.0 | 3 | 2,500 | 310 | 34.7 |
19 | 100 | 5.0 | 5 | 2,500 | 313 | 37.6 |
20 | 100 | 5.0 | 10 | 2,500 | 315 | 39.6 |
21 | 100 | 10.0 | 3 | 2,600 | 312 | 43.6 |
22 | 100 | 10.0 | 5 | 2,600 | 315 | 53.1 |
23 | 100 | 10.0 | 10 | 2,600 | 316 | 53.0 |
24 | 100 | 15.0 | 3 | 2,800 | 315 | 49.3 |
25 | 100 | 15.0 | 5 | 2,800 | 317 | 57.2 |
26 | 100 | 15.0 | 10 | 2,700 | 319 | 63.9 |
27 | 50 | 15.0 | 3 | 3,800 | 319 | 59.8 |
28 | 50 | 15.0 | 5 | 3,800 | 315 | 52.1 |
29 | 50 | 15.0 | 10 | 3,800 | 316 | 63.8 |
Table 4. Auto-scaled data for the relationship between the fabrication conditions and CTi of the composite particles.
Run | V [mL] | rTi [wt%] | t [min] | S [min−1] | T [K] | CTi [%] |
---|---|---|---|---|---|---|
1 | −1.55 | −0.82 | −1.74 | 1.91 | −2.24 | −1.28 |
2 | −1.55 | −0.82 | −1.30 | 2.00 | 0.96 | 1.18 |
3 | −1.55 | −0.82 | −0.85 | 1.91 | −0.34 | −1.21 |
4 | −1.55 | −0.82 | 0.26 | 2.00 | 1.05 | −0.63 |
5 | −1.55 | −0.82 | 1.37 | 2.08 | 1.07 | −0.58 |
6 | −0.78 | −0.82 | 0.26 | 0.12 | −0.74 | −0.20 |
7 | −0.78 | −0.82 | 1.37 | 0.12 | −0.52 | −1.18 |
8 | 0.78 | −0.82 | 0.26 | −0.77 | −1.20 | −1.01 |
9 | 0.78 | −0.82 | 1.37 | −0.77 | −1.10 | −0.07 |
10 | −0.78 | 0.43 | 0.26 | 0.30 | −0.46 | −0.75 |
11 | −0.78 | 0.43 | 1.37 | 0.30 | −0.20 | −0.42 |
12 | 0.78 | 0.43 | 0.26 | −0.68 | −1.06 | 0.05 |
13 | 0.78 | 0.43 | 1.37 | −0.68 | −0.86 | −0.13 |
14 | 0.00 | −0.82 | 0.26 | −0.50 | −0.89 | 0.70 |
15 | 0.00 | −0.82 | 1.37 | −0.50 | −0.86 | −1.11 |
16 | 1.55 | −0.82 | 0.26 | −1.04 | −1.32 | 1.66 |
17 | 1.55 | −0.82 | 1.37 | −1.04 | −1.30 | −0.69 |
18 | 0.78 | −0.82 | −1.30 | −0.77 | −0.02 | −1.27 |
19 | 0.78 | −0.82 | −0.85 | −0.77 | 0.43 | −0.95 |
20 | 0.78 | −0.82 | 0.26 | −0.77 | 0.84 | −0.74 |
21 | 0.78 | 0.43 | −1.30 | −0.68 | 0.41 | −0.30 |
22 | 0.78 | 0.43 | −0.85 | −0.68 | 0.79 | 0.74 |
23 | 0.78 | 0.43 | 0.26 | −0.68 | 0.98 | 0.73 |
24 | 0.78 | 1.68 | −1.30 | −0.50 | 0.76 | 0.33 |
25 | 0.78 | 1.68 | −0.85 | −0.50 | 1.16 | 1.19 |
26 | 0.78 | 1.68 | 0.26 | −0.59 | 1.37 | 1.92 |
27 | −0.78 | 1.68 | −1.30 | 0.39 | 1.41 | 1.47 |
28 | −0.78 | 1.68 | −0.85 | 0.39 | 0.84 | 0.64 |
29 | −0.78 | 1.68 | 0.26 | 0.39 | 1.02 | 1.91 |
The regression models were constructed using four machine learning methods, namely, LSM, PCR, SVR, and DNN. Regression analysis was performed to predict CTi of the SiO2/TiO2 composite particles depending on the values of V (25–125 mL), rTi (5.0–15.0 wt%), t (1–15 min), S (2,200–5,700 min−1), and T (295–319 K), which were input as explanatory variables. CTi of the composite particles was set as the objective variable to serve as the output value. The accuracy of the constructed regression models with respect to the training data was evaluated by calculating the R2 values according to Eqn. (3), and the predictive performances of the constructed models with respect to the test data were compared using the Q2 values calculated from Eqn. (4). The errors of the regression models were evaluated by calculating the RMSE values according to Eqn. (5).
3.2.1 Construction of the LSM model
The LSM model represents the relationship between the explanatory variables and objective variable (Stojanovic B. et al., 2016). When the order of the explanatory variables is 1, as in this study, the relationship derived by the LSM is linear. Thus, the regression model constructed by the LSM according to Eqns. (6)–(13) is expressed in Eqn. (28):
(28) |
where V′, rTi′, t′, S′, and T′ denote the auto-scaled values of each fabrication parameter and CTi′ is the auto-scaled value of CTi of the SiO2/TiO2 composite particles. Hence the coefficients in Eqn. (28) reflect the influence of the corresponding parameter on CTi of the composite particles. All of the coefficients were positive, indicating that increasing the value of each fabrication parameter increased CTi of the composite particles. Furthermore, the contribution of each parameter to CTi of the composite particles was calculated by comparing the absolute values of the coefficients. The contributions of the parameters decreased in the following order: rTi (61.6 %) > T (24.2 %) > V (6.69 %) > S (5.32 %) > t (2.15 %).
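The contribution calculation described above amounts to normalizing the absolute values of the coefficients so that they sum to 100 %. The sketch below shows this arithmetic. The coefficient values are hypothetical placeholders chosen only for illustration; they are not the coefficients of Eqn. (28), and the function name `contributions` is likewise illustrative.

```python
def contributions(coefficients):
    """Return each |coefficient| as a percentage of the sum of |coefficients|."""
    total = sum(abs(c) for c in coefficients.values())
    return {name: 100.0 * abs(c) / total for name, c in coefficients.items()}


# Hypothetical auto-scaled coefficients (NOT the values from Eqn. (28)).
example_coefficients = {"V": 0.05, "rTi": 0.50, "t": 0.02, "S": 0.04, "T": 0.20}

shares = contributions(example_coefficients)
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.1f} %")
```

Because the explanatory variables are auto-scaled, the coefficients are on a common scale, which is what makes this comparison of magnitudes meaningful.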
The relationship between the actual CTi values obtained from the experimental data and the calculated values obtained from the LSM regression model is presented in Fig. 2. The R2 and RMSE values were 0.617 and 5.99 %, respectively. In general, the accuracy of a constructed regression model increases as the R2 value approaches 1.0 and the RMSE value decreases. The low R2 value was attributed to the features of LSM.
Fig. 2. Relationship between the actual and predicted CTi values for the LSM regression model.
The correlation represented by the LSM is a linear variation of CTi of the composite particles with respect to five fabrication parameters (Arioli M. and Gratton S., 2012; Zhang Y. and Fearn T., 2015). Thus, when the correlation between the fabrication parameters and CTi of the composite particles is not linear, this nonlinear relationship cannot be adequately expressed by a regression model based on the LSM (Arioli M. and Gratton S., 2012; Zhang Y. and Fearn T., 2015). Moreover, if the fabrication parameters are highly related to each other, the coefficients in Eqn. (28) could be anomalous owing to instability in the analytical calculations and the inaccuracy of the regression equation (Arioli M. and Gratton S., 2012; Zhang Y. and Fearn T., 2015).
3.2.2 Construction of the PCR model
In an effort to deal with the inaccuracy and instability of the regression model due to the correlation between the fabrication parameters as described in Section 3.2.1, the parameters were converted to principal components uncorrelated with each other by using PCA as expressed in Eqns. (14) and (15). Because the fabrication parameters highly related to each other were removed in advance, this allowed for prediction of CTi of the composite particles by a combination of fabrication parameters with low correlation (El Ghaziri A. and Qannari E.M., 2015).
The number of principal components (NP.C.) was varied within the range of 1–5, and the relationships between the actual CTi values obtained from the experimental data and the calculated values obtained from the PCR regression models for different NP.C. values are presented in Fig. 3. The R2 values for each regression model are also indicated. When the data points are closer to the dotted line of y = x in the plots, the R2 values for the PCR regression models approach 1.0. Because the R2 values increased with increasing NP.C., all five principal components were applied to the construction of the PCR regression model.
Fig. 3. Actual and predicted CTi values for the PCR regression model with various NP.C. values.
The PCR regression model constructed with NP.C. of 5 is expressed in Eqn. (29):
(29) |
where tPCR,i represents the i-th principal component obtained by PCA. The effects of the principal components on CTi of the composite particles are expressed by each coefficient. The obtained R2 and RMSE values were 0.617 and 5.99 %, respectively. The fact that all of the principal components were applied to the construction of the regression model implies that the correlation between each fabrication parameter prior to PCR processing was not strong (El Ghaziri A. and Qannari E.M., 2015).
When all of the principal components were used to construct the regression model, the values calculated from the PCR regression model were identical to those calculated from the LSM model, because the application of the last principal component means that any effect of the fabrication parameters was not removed. Hence, the contribution of any correlation between the fabrication parameters to the low accuracy of the LSM regression model discussed in Section 3.2.1 was small. Thus, in an attempt to improve the accuracy of the regression models, we next considered the possibility of a nonlinear correlation between the fabrication parameters and CTi of the composite particles.
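The equivalence noted above can be checked numerically. The sketch below uses synthetic data (not the data of this study) and a plain NumPy implementation of PCA followed by least squares; the helper name `fit_predict` is illustrative. It shows that regressing on all principal components reproduces the least-squares fit on the original variables, whereas truncating to fewer components generally does not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 21 samples, 5 explanatory variables, auto-scaled.
X = rng.normal(size=(21, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = rng.normal(size=21)
y = (y - y.mean()) / y.std()


def fit_predict(A, target):
    """Least-squares fit of target on the columns of A; returns fitted values."""
    beta = np.linalg.lstsq(A, target, rcond=None)[0]
    return A @ beta


# PCA: loadings are eigenvectors of X^T X, sorted by decreasing eigenvalue.
eigenvalues, loadings = np.linalg.eigh(X.T @ X)
loadings = loadings[:, np.argsort(eigenvalues)[::-1]]
scores = X @ loadings  # principal component score matrix

y_lsm = fit_predict(X, y)
y_pcr_all = fit_predict(scores, y)         # all five components
y_pcr_two = fit_predict(scores[:, :2], y)  # first two components only

print(np.allclose(y_lsm, y_pcr_all), np.allclose(y_lsm, y_pcr_two))
# True False
```

With all components retained, the scores are only a rotation of the original variables, so both regressions span the same space and give the same fitted values.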
3.2.3 Construction of the SVR model
To consider a nonlinear correlation between the fabrication parameters and CTi of the composite particles, a regression model was constructed using SVR according to Eqns. (17)–(24). Regression models using SVR are constructed by minimizing the structural risk. The hyperparameters (C, ɛ, and γ) of the SVR model were optimized by a comprehensive grid search over the candidates listed in Table 5, selecting the combination that gave the maximum R2 value in the verification results. The C, ɛ, and γ values in the optimized SVR model were 2^−5, 2^0, and 2^−2, respectively.
Table 5. Hyperparameters used in the SVR regression model.
Hyperparameter | Candidate values | Number of candidates |
---|---|---|
C | 2^−5, 2^−4, …, 2^9, 2^10 | 16 |
ɛ | 2^−15, 2^−14, …, 2^−1, 2^0 | 16 |
γ | 2^−20, 2^−19, …, 2^9, 2^10 | 31 |
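To give a sense of the size of this search, the candidate grid can be enumerated as below, reading the tabulated candidates as consecutive powers of two. The variable names are illustrative, and the model fitting that would be performed for each combination is omitted.

```python
from itertools import product

# Candidate values read from the table as powers of two.
C_grid = [2.0 ** k for k in range(-5, 11)]       # 2^-5 ... 2^10
eps_grid = [2.0 ** k for k in range(-15, 1)]     # 2^-15 ... 2^0
gamma_grid = [2.0 ** k for k in range(-20, 11)]  # 2^-20 ... 2^10

grid = list(product(C_grid, eps_grid, gamma_grid))

print(len(C_grid), len(eps_grid), len(gamma_grid), len(grid))
# 16 16 31 7936

# The reported optimum (C, eps, gamma) = (2^-5, 2^0, 2^-2) is one grid point.
print((2.0 ** -5, 2.0 ** 0, 2.0 ** -2) in grid)  # True
```

A comprehensive search therefore evaluates 7,936 hyperparameter combinations. The reported optimum lies at the lower edge of the C range and the upper edge of the ɛ range.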
The relationship between the actual CTi values obtained from the experimental data and the calculated values obtained from the SVR regression model is shown in Fig. 4. The R2 and RMSE values were 0.591 and 5.80 %, respectively. The former value is slightly lower than that obtained for the LSM model (0.617), indicating a lower accuracy. In contrast, the RMSE value was slightly smaller for the SVR model than for the LSM model (5.99 %), indicating a slightly higher accuracy.
Fig. 4. Relationship between the actual and predicted CTi values for the SVR regression model.
This comparison based on the R2 and RMSE values suggests that the SVR and LSM models had similar accuracy. Thus, under the conditions of this study, consideration of the possibility of a nonlinear correlation between the fabrication parameters and CTi of the composite particles resulted in little change in the accuracy of the regression model. Therefore, we next considered the possibility of more complex correlations between the fabrication parameters and CTi of the composite particles.
3.2.4 Construction of the DNN model
To consider more complex correlations between the fabrication parameters and CTi of the composite particles, a regression model was constructed using a DNN. For this model, the hyperparameters (NH.L., NNeur., activation function, and NL.T.) were optimized by exploring which hyperparameters afford the highest R2 values in the verification results.
The variation of the R2 values with NH.L., NNeur., and NL.T. is plotted in Fig. 5. In the case of NH.L., as shown in Fig. 5a), the R2 values were almost constant for NH.L. values in the ranges of 1–5 layers and 6–10 layers but increased slightly when NH.L. was increased from 5 layers to 6 layers. This increase in the R2 values with an increase in NH.L. from 5 layers to 6 layers is caused by improved fit of the relation between the fabrication parameters and CTi of the composite particles. In the case of NNeur., as shown in Fig. 5b), the R2 values tended to increase with increasing NNeur. in the range of 1–6 neurons, after which the R2 values remained almost constant irrespective of NNeur.. This increase in the R2 values with increasing NNeur. in the range of 1–6 neurons is caused by improved fit of the relation between the fabrication parameters and CTi of the composite particles. The minimal variation of the R2 values in the NNeur. range of 6–10 neurons was attributable to the sufficiently good fit at the NNeur. of 6 neurons. Finally, NL.T. was varied in the range of 30–3,500 times. As shown in Fig. 5c), the R2 values rapidly increased as NL.T. was increased from 30 to 700 times. Then, as NL.T. was increased from 700 to 1,500 times, the R2 values increased more gradually. At NL.T. values above 1,500, the R2 values remained almost constant irrespective of NL.T.. Comparison of various activation functions revealed that the tanh afforded the highest R2 value, as summarized in Table 6. Hence, the optimized hyperparameters for the DNN regression model were an NH.L. of six layers, an NNeur. of six neurons, an NL.T. of 1,500 times, and a tanh activation function.
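The selected architecture can be sketched as a small feed-forward network. The snippet assumes fully connected layers with bias terms, NNeur. neurons in every hidden layer, a linear output neuron, and five inputs (V, rTi, t, S, and T); none of these details are stated explicitly in the text. The weights are random placeholders rather than trained values, and all function names are illustrative.

```python
import random
from math import tanh


def layer_sizes(n_inputs, n_hidden_layers, n_neurons, n_outputs=1):
    """Widths of all layers, from input to output."""
    return [n_inputs] + [n_neurons] * n_hidden_layers + [n_outputs]


def count_parameters(sizes):
    """Each layer has fan_in weights plus one bias per output neuron."""
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


def init_network(sizes, seed=0):
    """Random placeholder weights and zero biases (untrained)."""
    rnd = random.Random(seed)
    network = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        weights = [[rnd.uniform(-1.0, 1.0) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        network.append((weights, [0.0] * fan_out))
    return network


def forward(network, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for index, (weights, biases) in enumerate(network):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(network) - 1
        x = z if is_output else [tanh(v) for v in z]
    return x[0]


sizes = layer_sizes(n_inputs=5, n_hidden_layers=6, n_neurons=6)
network = init_network(sizes)

print(count_parameters(sizes))  # 253
print(forward(network, [0.5, -0.8, 0.3, 0.1, 1.0]))
```

Under these assumptions, the network has 253 adjustable weights and biases.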
Fig. 5. R2 values with variation in a) NH.L., b) NNeur., and c) NL.T. for optimizing the DNN regression model.
Table 6. Relationship between the R2 values and activation functions for optimizing the DNN regression model.
Activation function | R2 [−] |
---|---|
Sigmoid | 0.0294 |
tanh | 0.596 |
ReLU | 0.152 |
The relationship between the actual CTi values obtained from the experimental data and the calculated values obtained from the DNN regression model is plotted in Fig. 6. The R2 and RMSE values were 0.941 and 2.19 %, respectively. Comparison of the results obtained for the DNN, LSM, and SVR regression models revealed that the DNN model displayed the highest accuracy, as indicated by its high R2 value and low RMSE value.
Fig. 6. Relationship between the actual and predicted CTi values for the DNN regression model.
The LSM, PCR, SVR, and DNN regression models were used to predict CTi of the SiO2/TiO2 composite particles under various fabrication conditions based on the test data. Moreover, the predictive performances of the constructed models were evaluated by comparison of their Q2-values.
The relationships between the actual CTi values obtained from the experimental data and the calculated values obtained from the LSM, SVR, and DNN models for the training data are plotted in Fig. 7. Furthermore, to verify the predictive performances of the constructed models, the relationships between the actual CTi values obtained from the experimental data and the predicted values obtained from the three models for the test data are shown in Fig. 8. The results for the PCR regression model are excluded from these plots because they were identical to those obtained from the LSM model. The horizontal and vertical axes show the actual values obtained from the experimental data and the predicted values calculated from the training data or test data, respectively. The R2 and Q2 values become closer to 1.0 as the data points approach the dotted line of y = x in the plots. The R2 and Q2 values for each regression model are summarized in Table 7, along with the RMSE values for the training data (RMSEtrain) and test data (RMSEtest).
Comparison of the accuracy of the various regression models by R2 values based on the training data.
Comparison of the predictive performance of the various regression models by Q2 values based on the test data.
Summary of the R2, Q2, and RMSE values for the various regression models.
Model | R2 [−] | RMSEtrain [%] | Q2 [−] | RMSEtest [%] |
---|---|---|---|---|
LSM | 0.617 | 5.99 | −0.510 | 9.28 |
SVR | 0.591 | 5.80 | 0.280 | 7.97 |
DNN | 0.941 | 2.19 | 0.767 | 3.26 |
Comparison of the R2 values for the three regression models revealed that the DNN model had the highest accuracy. Similarly, the DNN model displayed the highest Q2 value, indicating the best predictive performance. For a regression model with high accuracy and predictive performance, higher R2 and Q2 values are required (Gurgenc T. et al., 2020). The high R2 and Q2 values of 0.941 and 0.767, respectively, obtained for the DNN regression model in this study demonstrate the successful construction of a regression model with high accuracy and predictive performance for estimating CTi of SiO2/TiO2 composite particles. The RMSE values for the training data and test data were 2.19 % and 3.26 %, respectively. The average CTi of the SiO2/TiO2 composite particles in the experimental data was 46.3 %. Relative to this average, the RMSE values for the training and test data correspond to approximately 4.7 % and 7.0 %, respectively, and were therefore sufficiently small. Thus, the construction of a regression model for predicting CTi of SiO2/TiO2 composite particles under various fabrication conditions was successfully realized by using the DNN method.
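The metrics discussed above can be sketched as follows, assuming the conventional definitions: the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares, R2 is this quantity on the training data, and Q2 is the same quantity on the test data. The exact formulas used by the authors are given elsewhere in the paper and may differ in detail (for example, in which mean is used for Q2). The function names and the numerical values below are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Applied to training data this plays the role of R2; applied to
    held-out test data it plays the role of Q2 (assumed definition).
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the same units as the target variable."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical coating ratios in %, for illustration only.
y_actual = [40.0, 45.0, 50.0, 55.0]
y_good = [42.0, 44.0, 49.0, 57.0]   # predictions close to the actual values
y_poor = [60.0, 60.0, 60.0, 60.0]   # a constant far from the actual mean

print(r_squared(y_actual, y_good), rmse(y_actual, y_good))
print(r_squared(y_actual, y_poor), rmse(y_actual, y_poor))
```

Under this definition the value is not bounded below by zero. A negative value, such as the Q2 of −0.510 reported for the LSM model, arises when the residual sum of squares exceeds the total sum of squares, that is, when the model predicts the test data worse than a constant equal to their mean would.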
The DNN regression model displayed the highest accuracy in this study because the DNN method considers more complex correlations between explanatory variables and objective variables, including nonlinearities, by varying NH.L. and NNeur. The LSM regression model attempts to predict the CTi values by calculating a formula based on simple relationships involving the explanatory variables. Meanwhile, the PCR regression model has similar features to the LSM model because the main difference between the two methods is the replacement of explanatory variables with principal components. The SVR regression model is constructed by using explanatory variables converted to support vectors by processing with kernel functions. Because the support vectors of SVR models are used in the same manner as the principal components of PCR models, SVR models possess similar characteristics to LSM and PCR models. Therefore, the constructed SVR regression model also predicted the CTi values from a calculation formula involving only simple relationships. In contrast, the application of the DNN method to construct a regression model leads to predictions based on complex correlations between the explanatory variables and objective variables because of the numerous hidden layers and neurons inherent to this approach.
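The limitation of a purely linear formula can be illustrated with a deliberately simple case. The sketch below fits a straight line by least squares to a synthetic response that has a single maximum, and then repeats the fit after adding a nonlinear (squared) feature. The data are artificial and dimensionless, the helper names are hypothetical, and the squared feature merely stands in for the kind of nonlinearity a DNN can learn on its own; it is not the authors' model.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Synthetic, dimensionless data with a maximum at x = 0 (not from the study).
x: List[float] = [float(i) for i in range(-5, 6)]
y: List[float] = [-(xi ** 2) for xi in x]

# Straight-line fit on the raw input: the symmetric peak gives a zero slope.
slope, intercept = fit_line(x, y)
r2_linear = r_squared(y, [slope * xi + intercept for xi in x])

# Same least-squares fit, but on a nonlinear transform of the input.
x_sq = [xi ** 2 for xi in x]
slope_sq, intercept_sq = fit_line(x_sq, y)
r2_squared_feature = r_squared(y, [slope_sq * u + intercept_sq for u in x_sq])

print(r2_linear, r2_squared_feature)
```

In this constructed case the straight line explains none of the variance, whereas the fit with the nonlinear feature reproduces the data exactly. Real process data are noisier and less symmetric, so the contrast is normally less extreme.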
3.4 Analyzing the effects of fabrication conditions on CTi of SiO2/TiO2 composite particles by the DNN regression model
The DNN regression model was applied to analyze the relationship between the fabrication conditions and CTi of the SiO2/TiO2 composite particles. The application range of the DNN regression model with respect to the five fabrication parameters was V = 25–125 mL, rTi = 5.0–15.0 wt%, t = 1–15 min, S = 2,200–5,700 min−1, and T = 295–320 K. Each fabrication parameter serving as an input value was divided into 20 points over the corresponding range, and CTi of the SiO2/TiO2 composite particles was predicted using the DNN regression model.
The calculated effects of the fabrication parameters on CTi of the SiO2/TiO2 composite particles are plotted in Fig. 9. Comparison of the resulting curves revealed that the maximum gradients decreased in the following order: t > T > V > rTi > S. Because a higher gradient indicates a larger influence of the fabrication parameter on CTi of the composite particles, these results suggest that the effects of the fabrication parameters on CTi decrease in the same order. The V, rTi, t, S and T values under the base condition when varying each parameter were 100 mL, 5.0 wt%, 5 min, 2,500 min−1, and 303 K, respectively.
Calculated effects of the fabrication parameters on CTi of the composite particles.
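The one-parameter-at-a-time analysis described above can be sketched as follows. The ranges and base condition are those stated in the text. Several things are assumptions: "20 points" is taken to mean 20 evenly spaced values including both ends of the range, each gradient is evaluated after scaling the swept parameter to the interval 0 to 1 so that parameters with different units can be compared (the source does not state how the gradients were normalized), and `placeholder_model` is a hypothetical stand-in for the trained DNN, which is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Condition = Dict[str, float]

# Application ranges and base condition as given in the text.
RANGES: Dict[str, Tuple[float, float]] = {
    "V": (25.0, 125.0),      # supply volume [mL]
    "rTi": (5.0, 15.0),      # TiO2 addition ratio [wt%]
    "t": (1.0, 15.0),        # operation time [min]
    "S": (2200.0, 5700.0),   # rotor rotation speed [min^-1]
    "T": (295.0, 320.0),     # temperature [K]
}
BASE: Condition = {"V": 100.0, "rTi": 5.0, "t": 5.0, "S": 2500.0, "T": 303.0}
N_POINTS = 20  # assumed: 20 evenly spaced values, end points included


def linspace(lo: float, hi: float, n: int) -> List[float]:
    """n evenly spaced values from lo to hi, inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def sweep(predict: Callable[[Condition], float],
          name: str) -> Tuple[List[float], List[float]]:
    """Vary one parameter over its range; hold the others at the base condition."""
    lo, hi = RANGES[name]
    xs = linspace(lo, hi, N_POINTS)
    ys = [predict({**BASE, name: x}) for x in xs]
    return xs, ys


def max_scaled_gradient(xs: List[float], ys: List[float], name: str) -> float:
    """Largest |dCTi/dx| between neighbouring points, with x scaled to [0, 1]."""
    lo, hi = RANGES[name]
    span = hi - lo
    return max(abs(y1 - y0) / ((x1 - x0) / span)
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def placeholder_model(cond: Condition) -> float:
    """HYPOTHETICAL stand-in for the trained DNN; the coefficients are arbitrary."""
    return 0.1 * cond["t"] + 0.001 * cond["V"]


gradients = {name: max_scaled_gradient(*sweep(placeholder_model, name), name)
             for name in RANGES}
ranking = sorted(gradients, key=gradients.get, reverse=True)
print(ranking)
```

Replacing `placeholder_model` with the prediction function of a trained regression model would give a ranking of the kind reported for Fig. 9. Because every other parameter is fixed at the base condition, such a ranking reflects sensitivity around that condition only and does not capture interactions between parameters.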
The plot of CTi versus t revealed a clear maximum. CTi of the composite particles initially increased with increasing t owing to the gradual coating of TiO2 onto SiO2 over time. However, at higher t values, CTi of the composite particles decreased as a result of exfoliation of the surface coating.
Upon varying T, CTi of the composite particles decreased markedly with increasing T in the low-T range and then remained almost constant with increasing T in the high-T range. These findings suggest that lower T values are beneficial for enhancing CTi of SiO2/TiO2 composite particles.
The variation of V initially had little effect on CTi of the composite particles, which remained almost constant with increasing V in the low-V range. At higher values of V, CTi of the composite particles decreased with increasing V, which was ascribed to a decrease in the contact frequency per single particle.
Examination of the relationship between rTi and CTi of the composite particles revealed that CTi slightly decreased with increasing rTi at lower rTi values. However, at higher rTi values, CTi increased with increasing rTi, which was attributed to the increased proportion of TiO2 particles with respect to SiO2.
Finally, upon increasing S, CTi of the composite particles slightly increased. This was ascribed to the progress of coating TiO2 onto SiO2.
In this study, SiO2/TiO2 composite particles with different CTi values were synthesized under various conditions (V, rTi, t, S, and T). To optimize the fabrication process of SiO2/TiO2 composite particles, regression models were constructed to predict CTi of the composite particles using the LSM, PCR, SVR, and DNN approaches. Furthermore, the regression model with the highest accuracy and predictive performance was employed to analyze the effects of the fabrication parameters on CTi of the SiO2/TiO2 coated composite particles. The obtained results can be summarized as follows:
This study was financially supported by the HOSOKAWA Powder Technology Foundation (No. 20502), Osaka, Japan.
Taichi Kimura
Taichi Kimura has been a Ph.D. student at Doshisha University since 2020. He received his Bachelor and Master of Engineering degrees from Nihon University in 2017 and 2020, respectively. His research interests are the analysis and design of particle fabrication processes using machine learning.
Riko Iwamoto
Riko Iwamoto received her Bachelor of Engineering from Doshisha University in 2022. Her research interests were composite particle synthesis by mechano-chemical processes and materials design using data science. She has worked for a company involved in data analysis since 2022.
Mikio Yoshida
Dr. Mikio Yoshida has been a Professor at Doshisha University since 2022. He received his Ph.D. degree from Doshisha University in 2005. After that, he worked as a post-doctoral researcher at Okayama University. He was employed as an Assistant Professor at Okayama University in 2007 and as an Associate Professor at Doshisha University in 2016. He was a scientific committee member of the 9th World Congress on Particle Technology (WCPT9) 2022 in Madrid, Spain. He has been an editorial board member of Advanced Powder Technology since 2017.
Tatsuya Takahashi
He received his Bachelor of Engineering degree and Master of Engineering in Physics from Kwansei Gakuin University, Japan, in 2013 and 2015, respectively. Since 2015, he has worked at the Institute of Powder Engineering and the ICT Digital Promotion Division, Hosokawa Micron Co., Ltd., Japan. His research topics cover semiconductor materials engineering technology and powder analysis technology.
Shuji Sasabe
Shuji Sasabe is a Division Manager of the ICT Digital Promotion Division of Hosokawa Micron Corp. He received his Master’s degree in electrical engineering from Nippon Institute of Technology, Japan, in 1991. He has been engaged in research on powder processing and powder characterization, and has been involved in the development of many powder characterization instruments. He is engaged in the promotion of digital transformation.
Yoshiyuki Shirakawa
Dr. Yoshiyuki Shirakawa is a Professor at Doshisha University. He received his Ph.D. in 1993 from Niigata University. He started his academic career at Niigata University in 1993 as a post-doctoral fellow under a JSPS Research Fellowship and subsequently worked at Osaka and Doshisha Universities. His research interests are composite particle synthesis by crystallization and mechano-chemical processes and materials design using computer simulations.