In this study, we established a mathematical model of the carbon-containing pellet reduction process and used a neural network model to speed up the prediction process for actual production in the rotary hearth furnace (RHF). In order to obtain enough data to train the neural network, we calculated results under different conditions with the pellet reduction mathematical model. Then, we developed and trained a feed-forward back-propagation neural network model using MATLAB software. The input parameters of the model included the temperature in the furnace, the reduction time, and the size and C/O ratio of the carbon-containing pellet; the output parameter was the final degree of metallization of the carbon-containing pellet. Besides, we optimized the initial weights and thresholds of the model using a genetic algorithm, and compared and analyzed the number of hidden layer neurons, the training algorithm, the learning rate, and the population size. Finally, we chose a 4-10-1 network structure, the Levenberg-Marquardt training algorithm, a learning rate of 0.1 and a population size of 150 as the optimal configuration. The correlation coefficients of the training set and test set data calculated by the model indicate that the established neural network model fits the data well. Therefore, the neural network model combined with the genetic algorithm is a reliable and efficient tool for predicting the reduction metallization rate of carbon-containing pellets in the RHF.
Metallurgical dust is one of the main sources of pollution in the steel production process, and its recovery is of great environmental and economic value. It contains many elements that are valuable for recycling, such as iron and zinc. As global environmental protection policies become more stringent, the effective treatment of metallurgical dust has attracted more attention, and the efficient extraction of iron from it is of great practical importance. Among the technologies for treating metallurgical dust, the RHF process is considered one of the most effective treatment methods.1,2,3) The direct reduction process in the RHF uses carbon-containing pellets as raw material. Generally, blast furnace cyclone ash, steelmaking primary ash, dephosphorization ash, decarburization ash, etc. are mixed in a certain proportion and then prepared into pellets. The dried carbon-containing pellets are placed in the RHF, and after one rotation of the rotating hearth at a furnace temperature of about 1200°C, the iron oxides within the pellets are directly reduced to metallic iron.4,5)
The internal reduction process of the carbon-containing pellet in the direct reduction process of the RHF is intricate. Establishing a pellet model that couples heat transfer with the chemical reactions is of great significance for understanding the reduction process. At the same time, the change of porosity with time during pellet reduction, the diffusion of internal gas, and the mass transfer at the boundary should also be considered. In addition, the factors affecting the direct reduction process of the pellet include the pellet size, shape, composition and the furnace temperature. To study the direct reduction process of the carbon-containing pellet, it is necessary to analyze its internal reduction mechanism,6) reduction kinetics7) and heat transfer process,8,9) which are very complicated.
Many researchers have modeled and analyzed the direct reduction process of carbon-containing pellets in the RHF. Sun and Lu10) established a pellet model considering the chemical reactions coupled with heat and mass transfer in porous media, and found that the heat transfer process during pellet reduction is the main factor limiting the reduction rate. Liu et al.11) established a mathematical model of the carbon-containing pellet material layer in the RHF, taking into account the radiation shielding effect of the upper layer on the lower layer and the heating effect of the furnace. The results showed that a staggered arrangement of the carbon-containing pellets could increase the average temperature of the material layer, and that the furnace temperature had little effect on the zinc removal rate. Wu et al.12) established a model of the direct reduction process of carbon-containing pellets in an industrial-scale RHF. Their results showed that the furnace temperature of the preheating section was higher than that of the pilot-scale RHF, and that the removal of heavy metals and alkali metal elements from the composite pellet was faster than the iron metallization. Kumari et al.13) also established a mathematical model to study the reduction rate parameters and thermal efficiency of iron ore-coal composite pellets under multiple pellet layers in the RHF. Huang and Lu8) and Sun and Lu14) designed a new reactor to study the direct reduction process of a mixture of iron ore powder and coal powder, and found that the gas produced by the chemical reactions affected heat and mass transfer. Halder and Fruehan15,16,17) used an RHF simulator to study the reduction of multi-layer carbon-containing pellets, and the results showed that the reduction process of the carbon-containing pellet was controlled by heat transfer at the initial stage, and later limited by carbon gasification and the reduction reactions of the iron oxides. Although many scholars have studied the direct reduction process of carbon-containing pellets, there are few studies on rapidly predicting the final metallization degree of the pellet.
Although mathematical modeling and commercial software calculations can provide a detailed description of the complex reduction processes, their computational demands are an important consideration when the aim is efficient guidance of industrial production. Therefore, numerical methods such as artificial neural networks and genetic algorithms, which treat the process as a "black box" and can thus offset the higher computational demands, become promising. These methods use algorithms that identify the relationship between the operating conditions (the input parameters of the "black box") and the process output (such as the degree of metallization of the carbon-containing pellet), ignoring the underlying processes in the unit operations, such as chemical reactions and material transfer. Therefore, such numerical methods describe the response of the system based only on the input parameters. However, the neural network must first "learn" the output response of the system to input changes before it can be used for prediction and optimization.
In this study, the pellet reduction model is first established and calculated, and the resulting data are used to build the neural network model; finally, a neural network prediction model is established. Nevertheless, neural networks suffer from some disadvantages, such as getting trapped in local minima and a slow rate of learning. Using an optimization algorithm such as the genetic algorithm can significantly enhance neural network performance.18,19) This study focuses on the process of establishing an effective neural network model combined with a genetic algorithm to predict the metallization rate of carbon-containing pellets. It is worth mentioning that a practical, feasible and economical predictive model was obtained from relatively simple input and output data.
The reduction process of the carbon-containing pellet is very complicated, involving heat transfer, mass transfer, and multi-phase chemical reactions. Many scholars have carried out experimental and mathematical modeling research on the direct reduction process of carbon-containing pellets, and because of their different research purposes the mathematical models also differed. This study takes the carbon-containing pellet as the research object, and a three-dimensional mathematical model describing its direct reduction process, in which the reduction reactions coupled with heat and mass transfer are taken into account, was established using multiphysics software. The correctness of the model was verified by experimental results from the literature.
2.1. Reaction of the Direct Reduction Process of Carbon-containing Pellet
In the process of direct reduction of the carbon-containing pellet, heat is transferred from the furnace to the pellet surface by thermal radiation, and then gradually transferred from the surface to the pellet interior by combined radiation and conduction. At present, the direct reduction of the carbon-containing pellet is mainly regarded as a two-step reaction mechanism: the solid-solid direct reduction reaction between the metal oxides and the carbon particles only initiates the reduction, and the reduction of the pellet mainly depends on the gaseous intermediates CO and CO2. The direct reduction reactions of the carbon-containing pellet are as follows:23)
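As an illustration of the two-step mechanism, a commonly considered set of gas-mediated reduction and gasification reactions is sketched below; this is a hedged example, and the five reactions actually adopted in the model may differ, for instance by including ZnO reduction for zinc-bearing dust.

```latex
% Illustrative reaction set for the two-step mechanism (assumed typical example,
% not necessarily the exact five reactions used in the model).
\begin{align}
  3\,\mathrm{Fe_2O_3} + \mathrm{CO} &\rightarrow 2\,\mathrm{Fe_3O_4} + \mathrm{CO_2}\\
  \mathrm{Fe_3O_4} + \mathrm{CO} &\rightarrow 3\,\mathrm{FeO} + \mathrm{CO_2}\\
  \mathrm{FeO} + \mathrm{CO} &\rightarrow \mathrm{Fe} + \mathrm{CO_2}\\
  \mathrm{C} + \mathrm{CO_2} &\rightarrow 2\,\mathrm{CO} \quad\text{(Boudouard reaction)}\\
  \mathrm{ZnO} + \mathrm{CO} &\rightarrow \mathrm{Zn(g)} + \mathrm{CO_2}
\end{align}
```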
Considering the complexity of the direct reduction process of the carbon-containing pellet, the following assumptions were adopted before establishing the mathematical model.
(1) The solid-solid direct reduction reaction is ignored; only the above five chemical reactions are considered;
(2) The carbon-containing pellet is a porous medium, and the gas and solid phases at the same position in the pellet reach local thermal equilibrium, i.e., the local gas and solid phase temperatures are equal;
(3) The pellet is a standard sphere, and its size and shape remain unchanged during the reduction process;
(4) The initial temperature and the spatial distributions of the gas and solid phase components in the pellet are uniform;
(5) The re-oxidation of the pellet is not considered in the model.
The spherical carbon-containing pellet model is shown in Fig. 1. Its radius is R0. The model was meshed and tested for grid independence. When the number of grid cells reached 8541, the calculation results remained unchanged. Therefore, 8541 grid cells were used in the following analysis.
Carbon-containing pellet physical model. (Online version in color.)
The governing equations of the carbon-containing pellet direct reduction model include the energy conservation equation and the mass transfer equation. A radiation heat transfer boundary condition is adopted at the pellet surface, and the mass transfer boundary is set to convective mass transfer. Due to the chemical reactions, the direct reduction process is accompanied by variations of heat and mass within the pellet.
Energy conservation equation:
(1) |
(2) |
Among them:
(3) |
(4) |
In the equations, θp is the solid volume fraction, θp = 1 − εp, where εp is the porosity; ρp is the solid phase density; cp,p is the solid phase constant-pressure heat capacity; kp is the solid phase thermal conductivity; ρ is the gas phase density; cp is the gas phase constant-pressure heat capacity; and k is the gas phase thermal conductivity. The effective thermal conductivity keff is calculated with a power law.
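As a hedged illustration of how the variables defined above typically enter a local-thermal-equilibrium energy balance for a porous pellet (an assumed generic form, not necessarily identical to Eqs. (1)-(4) of the model):

```latex
% Generic porous-medium energy balance under local thermal equilibrium
% (illustrative form only).
\begin{align}
  (\rho c_p)_{\mathrm{eff}}\,\frac{\partial T}{\partial t}
    &= \nabla\cdot\left(k_{\mathrm{eff}}\nabla T\right) + Q_{\mathrm{reac}},\\
  (\rho c_p)_{\mathrm{eff}} &= \theta_p\,\rho_p\,c_{p,p} + \varepsilon_p\,\rho\,c_p,\\
  k_{\mathrm{eff}} &= k_p^{\,\theta_p}\,k^{\,\varepsilon_p}
  \quad\text{(power-law averaging)},
\end{align}
% where Q_reac collects the heat source terms of the reduction reactions.
```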
Mass transfer equation:
(5) |
(6) |
Among them:
(7) |
(8) |
According to the gas dynamics theory:20)
(9) |
According to the Millington–Quirk model,21) the effective diffusion coefficient is:
(12) Deff,i = εp^(4/3)·Di, where Di is the gas-phase diffusion coefficient from Eq. (9).
In the equations, ρ0 is the initial density of the pellet, Ri is the reaction source term, and ωi is the mass fraction.
The initial condition is as follows.
At t = 0 s, the temperature of the pellet is 293.15 K. The material components are as follows.
(11) |
The boundary conditions are as follows.
When r = R0, the boundary condition of the energy conservation equation is as follows.
(12) keff·(∂T/∂r)|r=R0 = εσ(Tamb^4 − T^4)
where Tamb is the ambient temperature outside the pellet, ε is the pellet surface emissivity, and σ is the Stefan-Boltzmann constant (5.67 × 10−8 W/(m2·K4)).
When r = R0, the boundary condition of the component transfer equation is as follows.
(13) Deff,k·(∂ρk/∂r)|r=R0 = h·(ρg,k,∞ − ρk)
Among them, ρg,k,∞ is the concentration of component k in the bulk gas, ρk is the concentration of component k at the pellet surface, and h is the surface convective mass transfer coefficient.
2.4. Determination of the Main Parameters in the Model
The physical properties of the internal components of the pellet are calculated as follows.
The density of gas or solid phase:
(14) |
Specific heat capacity of pellet at constant pressure:
(15) |
Thermal conductivity of pellet:13)
(16) |
The surface mass transfer coefficient is calculated by empirical formula:22)
(17) |
(18) |
Among them, v is the gas flow velocity at the pellet surface, D is the pellet diameter, and μ is the dynamic viscosity of the gas.
Pellet reduction kinetics:
The direct reduction reaction mechanism of the carbon-containing pellet is very complicated. The Arrhenius formula was used to describe the kinetics of the reduction reactions. For each chemical reaction, the reaction rate is:
(19) |
(20) |
where ci is the molar concentration of the substance, in mol·m−3.
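A hedged illustration of the Arrhenius-type rate form referred to above (an assumed generic expression; the exact rate laws in Eqs. (19)-(20) may differ, for example in the reaction orders):

```latex
% Generic Arrhenius rate expression for reaction j (illustrative form only).
\begin{align}
  r_j = k_j \prod_i c_i,
  \qquad
  k_j = A_j \exp\!\left(-\frac{E_{a,j}}{R\,T}\right),
\end{align}
% where A_j is the pre-exponential factor, E_{a,j} the activation energy,
% R the universal gas constant, and c_i the molar concentrations of the reactants.
```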
2.5. Results and Validation
The degree of metallization (DOM) of the carbon-containing pellet is defined as:
(21) DOM = FeM / FeT
where FeM and FeT are the masses of metallic iron and total iron in the pellet after reduction, respectively.
The degree of metallization is of primary importance and has been studied extensively by experiments. The experimental results obtained by Liu23) were used to validate the pellet model, and a reasonable agreement between the calculated and experimental data was obtained,24) as shown in Fig. 2.
Comparison of experimental value and calculated value at 1000°C and 1200°C. (Online version in color.)
Mathematical modeling is a powerful technique for understanding the behavior of a system and predicting how its output varies with the input parameters, especially for systems that exhibit linear behavior. However, in the case of a nonlinear system, it is difficult to understand and predict the output response and to capture the complex relationships between the variables in the system.25) It is quite challenging to accurately predict such a system by mathematical modeling. For this reason, several alternative methods have been developed to replace mathematical modeling for predicting the output of engineering systems with nonlinear behavior.
In recent years, the development of artificial intelligence has enabled system modeling to effectively predict nonlinear behavior. A large number of studies have shown that big data analysis and artificial intelligence modeling can accurately predict the results of various systems. For example, Kalogirou26) showed that an AI model could accurately predict the combustion process. In addition, Witek-Krowiak et al.27) presented a case study using neural network methods to model a bio-sorption system. Ochoa-Estopier et al.28) pointed out that applying artificial intelligence to the modeling and optimization of engineering systems could reduce calculation time and improve accuracy. Machine learning is a branch of artificial intelligence that guides computers to learn from experience in order to imitate human behavior. Its algorithms use computational methods to learn information directly from data, rather than relying on predetermined equations, and as the number of samples available for learning increases, the performance of the algorithms improves. Machine learning uses two different learning methods, namely supervised learning and unsupervised learning. Supervised learning trains a model to predict future outputs based on known input and output data, whereas unsupervised learning finds hidden patterns or intrinsic structures in the input data. Supervised learning can generate reasonable predictions in response to new input data; it uses classification and regression techniques to develop predictive models.29)
A surrogate model for the mathematical model was established based on an artificial neural network. The motivation for using the neural network model instead of the original mathematical model is to reduce the computation time of the prediction process and enhance work efficiency. In addition, the neural network model makes it particularly convenient to investigate the effects of different parameters on the reduction performance, or to perform sensitivity analysis to guide actual production.
3.1. Introduction to the Artificial Neural Network Model
An artificial neural network is a nonlinear dynamic information processing system composed of a large number of simple neurons. The neurons in an artificial neural network are interconnected in a certain way, similar to biological neural networks, giving the network learning, memory, induction and self-learning capabilities. Specifically, an artificial neural network is a learning algorithm inspired by human neural networks. It is based on nodes called artificial neurons, and signals are transmitted from one neuron to another. The neurons are linked together, and each link carries a weight; the weight indicates the degree of importance and is distributed among the links. In addition, an artificial neural network is characterized by its architecture, which includes the number of layers, the artificial neurons in each layer, the activation function of each layer, and the training algorithm.30)
Among the many artificial neural network models, the back-propagation (BP) neural network, which is trained with the error back-propagation algorithm, has been widely used in the field of data prediction.31) The BP neural network is a multi-layer feed-forward neural network, and its topology is usually composed of an input layer, an output layer and hidden layers. The BP neural network can have one or more hidden layers as needed, and each layer is composed of several neurons. Figure 3 shows the structure of a typical BP network. In the figure, x represents the input data, a and c represent thresholds, w and v represent weights, y represents the network output, and f represents the activation function. The artificial neural network is composed of several groups of neurons. In each neuron of the neural network, the input variables are weighted and summed, and the result is passed through a specific output function, called the activation function. The activation function provides the neural network with nonlinear modeling capability, enabling the neural network method to deal with highly nonlinear, unconstrained and non-convex systems. The output of the neural network model depends on the connection scheme, the weights and the activation functions, and can be expressed as:
(22) y = f(Σi wi·xi + a)
The structure of the BP neural network.
In the formula, f is the activation function, w is the weight value, x is the input vector, and a is the threshold.
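As a minimal MATLAB sketch of this forward pass, using the notation of Fig. 3 (w, a for the hidden layer; v, c for the output layer); the sizes, sample values and random initialization below are illustrative assumptions only:

```matlab
% Minimal 4-10-1 BP forward pass (illustrative sketch).
% w (10x4), a (10x1): hidden-layer weights and thresholds;
% v (1x10), c (scalar): output-layer weights and threshold.
x = [1.0; 25; 1250; 10.5];     % assumed sample: [C/O; size (mm); temperature (C); time (min)]
                               % (in practice x is first normalized with mapminmax)
w = randn(10, 4);  a = randn(10, 1);
v = randn(1, 10);  c = randn;

h = tansig(w * x + a);         % hidden-layer outputs, Eq. (22) applied neuron by neuron
y = purelin(v * h + c);        % network output (normalized degree of metallization)
```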
3.2. BP Neural Network Prediction Model
The reduction of metallurgical dust carbon-containing pellets takes place in a complex, multi-factor coupled furnace flue gas environment. The temperature in the furnace, the reduction reaction time of the carbon-containing pellet, the size of the carbon-containing pellet and the C/O ratio of the pellet ingredients all affect the final metallization rate of the carbon-containing pellet. In order to establish a concise and effective BP neural network model, it is necessary to select data that have an obvious causal relationship with the final metallization rate of the carbon-containing pellet as the input variables of the network.
3.2.1. Data Set Extraction and Pre-processing
In this study, 123 sets of carbon-containing pellet reduction process data, obtained from different calculation conditions and from experimental data of other researchers, were selected as the training and prediction samples for designing the network. In the original data, the inputs of the network comprise four important parameters: the temperature in the furnace, the reduction reaction time of the carbon-containing pellet, the size of the carbon-containing pellet, and the C/O ratio of the pellet. The output variable of the network is the final metallization rate of the carbon-containing pellet. Of the 123 groups of original sample data, groups 1-100 are used as training samples and groups 101-123 as test samples. Table 1 shows the training and testing sample data. The training data set is used for neural network training; the weights and thresholds of the neural network are systematically updated by the BP algorithm so that the network learns the input-output data pairs. At the end of the training phase, the test data are used to evaluate the prediction performance of the neural network.
C/O | SIZE (mm) | TEM (°C) | T (min) | DOM | C/O | SIZE (mm) | TEM (°C) | T (min) | DOM |
---|---|---|---|---|---|---|---|---|---|
1.236 | 22 | 1000 | 5.03 | 0.026 | 1.0 | 15 | 1250 | 7.67 | 0.625 |
1.236 | 22 | 1000 | 7.06 | 0.048 | 1.0 | 15 | 1250 | 9.86 | 0.703 |
1.236 | 22 | 1000 | 8.78 | 0.082 | 1.0 | 15 | 1250 | 12.69 | 0.760 |
1.236 | 22 | 1000 | 10.43 | 0.120 | 1.0 | 15 | 1250 | 15.66 | 0.782 |
1.236 | 22 | 1000 | 12.02 | 0.158 | 1.0 | 15 | 1250 | 18.36 | 0.792 |
1.236 | 22 | 1000 | 13.49 | 0.192 | 1.0 | 15 | 1250 | 21.26 | 0.798 |
1.236 | 22 | 1000 | 14.89 | 0.227 | 1.0 | 15 | 1250 | 24.81 | 0.801 |
1.236 | 22 | 1000 | 16.41 | 0.261 | 1.0 | 20 | 1250 | 2.96 | 0.095 |
1.236 | 22 | 1000 | 17.81 | 0.293 | 1.0 | 20 | 1250 | 4.57 | 0.306 |
1.236 | 22 | 1000 | 19.85 | 0.331 | 1.0 | 20 | 1250 | 5.99 | 0.470 |
1.236 | 22 | 1000 | 21.50 | 0.359 | 1.0 | 20 | 1250 | 7.15 | 0.580 |
1.236 | 22 | 1000 | 23.41 | 0.390 | 1.0 | 20 | 1250 | 8.57 | 0.681 |
1.236 | 22 | 1000 | 25.00 | 0.416 | 1.0 | 20 | 1250 | 10.57 | 0.773 |
1.236 | 22 | 1200 | 2.23 | 0.060 | 1.0 | 20 | 1250 | 12.82 | 0.826 |
1.236 | 22 | 1200 | 2.86 | 0.125 | 1.0 | 20 | 1250 | 15.98 | 0.858 |
1.236 | 22 | 1200 | 3.44 | 0.197 | 1.0 | 20 | 1250 | 18.30 | 0.871 |
1.236 | 22 | 1200 | 4.07 | 0.269 | 1.0 | 20 | 1250 | 20.43 | 0.877 |
1.236 | 22 | 1200 | 4.58 | 0.338 | 1.0 | 20 | 1250 | 22.55 | 0.880 |
1.236 | 22 | 1200 | 5.60 | 0.448 | 1.0 | 20 | 1250 | 24.74 | 0.883 |
1.236 | 22 | 1200 | 6.36 | 0.520 | 1.2 | 25 | 1250 | 2.78 | 0.041 |
1.236 | 22 | 1200 | 7.25 | 0.588 | 1.2 | 25 | 1250 | 4.29 | 0.193 |
1.236 | 22 | 1200 | 8.97 | 0.695 | 1.2 | 25 | 1250 | 5.56 | 0.351 |
1.236 | 22 | 1200 | 11.20 | 0.783 | 1.2 | 25 | 1250 | 7.43 | 0.550 |
1.236 | 22 | 1200 | 13.17 | 0.824 | 1.2 | 25 | 1250 | 9.96 | 0.740 |
1.236 | 22 | 1200 | 15.33 | 0.855 | 1.2 | 25 | 1250 | 12.68 | 0.853 |
1.236 | 22 | 1200 | 17.43 | 0.871 | 1.2 | 25 | 1250 | 16.12 | 0.909 |
1.236 | 22 | 1200 | 19.59 | 0.881 | 1.2 | 25 | 1250 | 19.81 | 0.935 |
1.236 | 22 | 1200 | 22.39 | 0.887 | 1.2 | 25 | 1250 | 22.83 | 0.947 |
1.236 | 22 | 1200 | 24.94 | 0.891 | 1.2 | 25 | 1250 | 24.88 | 0.949 |
1.0 | 25 | 1250 | 3.29 | 0.076 | 1.0 | 25 | 1250 | 3.14 | 0.065 |
1.0 | 25 | 1250 | 4.25 | 0.177 | 1.0 | 25 | 1250 | 3.93 | 0.145 |
1.0 | 25 | 1250 | 4.83 | 0.246 | 1.0 | 25 | 1250 | 4.77 | 0.243 |
1.0 | 25 | 1250 | 5.41 | 0.322 | 1.0 | 25 | 1250 | 5.92 | 0.389 |
1.0 | 25 | 1250 | 5.99 | 0.385 | 1.0 | 25 | 1250 | 7.19 | 0.526 |
1.0 | 25 | 1250 | 6.57 | 0.448 | 1.0 | 25 | 1250 | 8.76 | 0.660 |
1.0 | 25 | 1250 | 7.28 | 0.527 | 1.0 | 25 | 1250 | 11.59 | 0.808 |
1.0 | 25 | 1250 | 8.51 | 0.631 | 1.0 | 25 | 1250 | 13.77 | 0.864 |
1.0 | 25 | 1250 | 9.92 | 0.726 | 1.0 | 25 | 1250 | 16.06 | 0.900 |
1.0 | 25 | 1250 | 11.73 | 0.808 | 1.0 | 25 | 1250 | 19.26 | 0.917 |
1.0 | 25 | 1250 | 13.53 | 0.858 | 1.0 | 25 | 1250 | 22.52 | 0.923 |
1.0 | 25 | 1250 | 15.66 | 0.893 | 1.0 | 25 | 1250 | 24.70 | 0.926 |
1.0 | 25 | 1250 | 17.65 | 0.905 | 0.8 | 25 | 1250 | 3.44 | 0.089 |
1.0 | 25 | 1250 | 19.78 | 0.915 | 0.8 | 25 | 1250 | 4.65 | 0.229 |
1.0 | 25 | 1250 | 21.78 | 0.921 | 0.8 | 25 | 1250 | 5.62 | 0.333 |
1.0 | 25 | 1250 | 23.39 | 0.921 | 0.8 | 25 | 1250 | 7.00 | 0.475 |
1.0 | 25 | 1250 | 24.87 | 0.924 | 0.8 | 25 | 1250 | 8.27 | 0.582 |
1.0 | 30 | 1250 | 3.87 | 0.076 | 0.8 | 25 | 1250 | 10.57 | 0.713 |
1.0 | 30 | 1250 | 4.77 | 0.161 | 0.8 | 25 | 1250 | 13.47 | 0.787 |
1.0 | 30 | 1250 | 5.93 | 0.290 | 0.8 | 25 | 1250 | 17.63 | 0.801 |
1.0 | 30 | 1250 | 6.83 | 0.391 | 0.8 | 25 | 1250 | 21.86 | 0.795 |
1.0 | 30 | 1250 | 7.60 | 0.473 | 0.6 | 25 | 1250 | 3.26 | 0.062 |
1.0 | 30 | 1250 | 8.57 | 0.555 | 0.6 | 25 | 1250 | 4.11 | 0.139 |
1.0 | 30 | 1250 | 9.79 | 0.656 | 0.6 | 25 | 1250 | 4.89 | 0.223 |
1.0 | 30 | 1250 | 11.60 | 0.770 | 0.6 | 25 | 1250 | 5.86 | 0.312 |
1.0 | 30 | 1250 | 13.72 | 0.849 | 0.6 | 25 | 1250 | 7.61 | 0.446 |
1.0 | 30 | 1250 | 19.33 | 0.934 | 0.6 | 25 | 1250 | 10.08 | 0.526 |
1.0 | 30 | 1250 | 21.91 | 0.940 | 0.6 | 25 | 1250 | 12.74 | 0.531 |
1.0 | 30 | 1250 | 24.87 | 0.946 | 0.6 | 25 | 1250 | 16.30 | 0.534 |
1.0 | 15 | 1250 | 2.45 | 0.098 | 0.6 | 25 | 1250 | 19.02 | 0.533 |
1.0 | 15 | 1250 | 3.29 | 0.205 | 0.6 | 25 | 1250 | 22.64 | 0.530 |
1.0 | 15 | 1250 | 4.06 | 0.309 | 0.6 | 25 | 1250 | 24.58 | 0.530 |
1.0 | 15 | 1250 | 5.35 | 0.451 |
In the process of training a neural network, if there are significant differences in the magnitudes of the variables in the sample set, the error transfer method is likely to conceal the contribution of the smaller-magnitude variables when the weights are corrected. Therefore, before training the neural network, the input and output parameters need to be normalized. Among existing normalization methods, the min-max method maintains the relationships between the variables in the original data set without losing generalization ability. At the same time, in order to speed up the convergence of the program and ensure that all data information can effectively affect the calculated final metallization of the carbon-containing pellet, the data were normalized after the network input parameters were selected. The mapminmax function in MATLAB was used to normalize all the data to the range 0-1. The calculation formula is:
(23) y = (ymax − ymin)·(x − xmin)/(xmax − xmin) + ymin
where ymax and ymin are setting parameters, set to 1 and 0, respectively; x is the data to be normalized; and xmax and xmin are the maximum and minimum values of the data to be normalized, respectively.
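A brief MATLAB sketch of this pre-processing step; the variable names are illustrative assumptions (inputs is taken as a 4×N matrix of C/O, size, temperature and time, and targets as a 1×N vector of DOM values):

```matlab
% Normalize inputs and targets to [0, 1] with mapminmax (Eq. (23) applied row-wise).
[inputsN,  psIn]  = mapminmax(inputs,  0, 1);
[targetsN, psOut] = mapminmax(targets, 0, 1);

% After training, network outputs are mapped back to physical DOM values:
% domPred = mapminmax('reverse', yPredN, psOut);
```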
3.2.2. Neural Network Structure and Training Parameters
The choice of the neural network structure directly affects the prediction performance of the network. In function fitting and system identification applications, the multilayer feed-forward network with hidden layers is the most commonly used structure. Among these, the three-layer feed-forward structure with a single hidden layer is currently the most widely used, and its prediction performance is also good. In this study, a three-layer feed-forward neural network was used to predict the final metallization rate of the carbon-containing pellet.
The prediction performance of the neural network depends on the number of hidden layer neurons and the choice of the activation function. Under the premise of meeting the accuracy requirements, the number of neurons is usually determined by a trial-and-error method. For the activation functions, BP networks generally use S-shaped functions, such as the tangent sigmoid and logarithmic sigmoid functions. For the learning rate, a higher learning rate reduces the stability of the system, while a lower learning rate reduces the convergence speed and increases the probability of the network falling into a local minimum. Training algorithms include the Levenberg-Marquardt algorithm (trainlm), the scaled conjugate gradient algorithm (trainscg), the resilient backpropagation algorithm (trainrp), the adaptive learning rate algorithm (traingdx), etc. A specific comparative analysis is given in Section 3.2.5.
3.2.3. Optimization of Weights and Thresholds Based on Genetic Algorithm
The genetic algorithm (GA) has been introduced to optimize neural networks in recent years.32) The common drawbacks of the back-propagation algorithm, namely its slow convergence and the possibility of being trapped in a local minimum, can be overcome by the use of the genetic algorithm. The GA is a computational model that simulates the natural selection and genetic mechanisms of Darwinian evolution, and it searches for the optimal solution by imitating the natural evolution process. Its main feature is that it operates directly on structural objects, without requirements on derivatives or function continuity. It has inherent implicit parallelism and good global optimization capability. It adopts a probabilistic optimization method, does not require deterministic rules, can automatically acquire and guide the optimized search space, and adjusts the search direction adaptively. The GA acts on all individuals in a population, and uses randomization techniques to guide an efficient search of a coded parameter space. Selection, crossover and mutation constitute the genetic operations of the GA, and five elements, namely parameter coding, initial population setting, fitness function design, genetic operation design, and control parameter setting, constitute its core content.33)
The gradient descent method is used to iteratively update the weights and thresholds of the traditional BP neural network. However, improper selection of the initial weights and thresholds makes the BP neural network liable to fall into local minima and slows its convergence. The GA has global optimization capability and fast convergence. Therefore, the GA was introduced into the BP neural network model to search for optimal initial weights and thresholds and thus improve the prediction performance of the neural network.
At the beginning, assuming the population size is N, N individuals with different initial weights and thresholds are randomly generated to form the initial population, which is then coded into genetic coding chains. The population size directly affects the optimization effect of the GA: too small a population may lead to the generation of defective genes, while too large a population may reduce the convergence of the algorithm. A population size assessment was therefore carried out.
In the GA, the higher the fitness of an individual in the population, the better its performance and the greater its probability of becoming a parent. In the optimization process, the fitness of the individuals is continuously improved, and the error norm is continuously reduced. Therefore, the reciprocal of the error norm of the neural network is used as the fitness function of the GA.
The fitness calculation method of each chromosome is as follows.
(24) F = 1 / ‖t − o‖ = 1 / sqrt( Σ(i=1..n) (ti − oi)^2 )
In the formula, ti is the actual value of the metallization rate of the carbon-containing pellet, oi is the predicted value of the metallization rate, and n is the number of prediction samples.
According to the fitness function values, a selection operation is performed in the genetic space, and a certain proportion of the individuals with larger fitness values are selected to enter the next generation. According to the genetic strategy, crossover and mutation are applied to the population to form the next generation. It is then determined whether the number of iterations has been reached, or whether the performance of the population meets the specified criterion. If not, iteration continues or the genetic strategy is modified, and the selection, crossover, and mutation operations are re-executed. When the termination condition is met, the best initial weights and thresholds are obtained. Figure 4 is a flowchart of the GA-based optimization of the neural network weights and thresholds.
The flow chart of neural network weights and thresholds optimized by genetic algorithm in MATLAB. (Online version in color.)
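A condensed MATLAB sketch of this GA-BP coupling is given below. The study used the GAOT toolbox; this sketch instead uses the Global Optimization Toolbox function ga for illustration, and the variable names, search bounds and data split are assumptions:

```matlab
% Sketch: use a GA to choose the initial weights/thresholds of a 4-10-1 BP network.
% inputsN (4xN) and targetsN (1xN) are assumed to be the normalized training data.
net   = feedforwardnet(10, 'trainlm');
net   = configure(net, inputsN, targetsN);    % fix layer sizes so getwb/setwb work
nVars = numel(getwb(net));                    % total number of weights and thresholds

% ga minimizes, so the error norm is minimized (equivalent to maximizing 1/norm, Eq. (24)).
fitFcn = @(wb) norm(sim(setwb(net, wb'), inputsN) - targetsN);

opts   = optimoptions('ga', 'PopulationSize', 150, 'MaxGenerations', 100);
wbBest = ga(fitFcn, nVars, [], [], [], [], ...
            -ones(1, nVars), ones(1, nVars), [], opts);   % assumed search range [-1, 1]

net = setwb(net, wbBest');             % GA result used as the initial point
net = train(net, inputsN, targetsN);   % refined with Levenberg-Marquardt training
```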
Based on the created BP neural network, the prediction samples were simulated and the errors were evaluated. The root mean square error (RMSE), the coefficient of determination (R2) and the mean absolute percentage error (MAPE) were used to evaluate the prediction performance of the neural network. The specific calculation formulas are as follows.
(25) RMSE = sqrt( (1/n)·Σ(i=1..n) (ti − oi)^2 )
(26) R2 = 1 − Σ(i=1..n) (ti − oi)^2 / Σ(i=1..n) (ti − t̄)^2, where t̄ is the mean of the actual values
(27) MAPE = (1/n)·Σ(i=1..n) |(ti − oi)/ti| × 100%
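A small MATLAB helper computing the three metrics, written here as one possible implementation (t denotes the actual DOM values and o the predicted values):

```matlab
% Evaluation metrics for the prediction samples (t: actual DOM, o: predicted DOM).
function [rmse, r2, mape] = evalMetrics(t, o)
    e    = t - o;
    rmse = sqrt(mean(e.^2));                         % Eq. (25)
    r2   = 1 - sum(e.^2) / sum((t - mean(t)).^2);    % Eq. (26), coefficient of determination
    mape = mean(abs(e ./ t)) * 100;                  % Eq. (27), in percent
end
```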
The determination of the optimal configuration of the neural network structure is an important step in neural network prediction research. The optimal configuration depends on several factors, including the number of hidden neurons, the training algorithm, the learning rate and the population size.
Firstly, the number of neurons in the hidden layer was tested. As the number of neurons increases, the prediction accuracy of the neural network improves obviously, but the complexity of the network also increases and overfitting may easily occur. Therefore, a trial-and-error method was used to determine the optimal number of neurons. Compared with increasing the number of hidden layers, increasing the number of neurons in a hidden layer can improve the training accuracy of the network more quickly and effectively. At present, there is no definite rule for selecting the number of neurons in the hidden layer. The number of hidden layer neurons n is generally first estimated from an empirical formula proposed by previous researchers, then compared through tests, and finally the value giving the smallest prediction error is selected. The empirical formula summarized by previous research is as follows.34)
(28) n = sqrt(m + d) + v
where m is the number of neurons in the input layer, d is the number of neurons in the output layer, and v is a constant (1-10).
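A schematic MATLAB loop for this trial-and-error search over the candidate range given by Eq. (28); the data variables and the simple RMSE criterion are illustrative assumptions:

```matlab
% Trial-and-error search for the hidden layer size (candidate range from Eq. (28)).
% xTrain/tTrain and xTest/tTest are assumed to be the normalized training and test sets.
bestRmse = inf;
for h = 3:13                                   % m = 4, d = 1, v = 1..10  ->  n = 3..13
    net = feedforwardnet(h, 'trainlm');
    net.divideFcn = 'dividetrain';             % use all supplied samples for training
    net = train(net, xTrain, tTrain);
    yTest = net(xTest);
    rmse  = sqrt(mean((tTest - yTest).^2));
    if rmse < bestRmse
        bestRmse = rmse;
        bestH    = h;                          % 10 hidden neurons were selected in this study
    end
end
```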
In this study, the input parameters comprise 4 variables and the output parameter 1 variable. According to the above formula, the number of neurons in the hidden layer lies in the range 3-13. The prediction results calculated with different numbers of neurons are shown in Table 2, and the corresponding RMSE and R2 of the BP neural network are shown in Fig. 5. It can be seen that as the number of neurons increases, the RMSE first decreases and then increases, R2 first increases and then decreases, and finally both become stable. When the number of neurons is 10, the RMSE is the smallest and R2 is the largest. In order to ensure the prediction accuracy of the neural network, the number of hidden neurons was therefore set to 10, and the topology of the BP neural network is 4-10-1, as shown in Fig. 6.
Number of neurons | RMSE | R2 |
---|---|---|
3 | 0.0261 | 0.99666 |
4 | 0.02451 | 0.99567 |
5 | 0.01895 | 0.99569 |
6 | 0.01963 | 0.99683 |
7 | 0.01992 | 0.99647 |
8 | 0.01937 | 0.99923 |
9 | 0.01439 | 0.99935 |
10 | 0.00802 | 0.9998 |
11 | 0.01082 | 0.9996 |
12 | 0.01848 | 0.99891 |
13 | 0.01982 | 0.99862 |
The effect of neuron number on RMSE and R2. (Online version in color.)
BP neural network structure of the prediction model. (Online version in color.)
Secondly, a comparative study of different training algorithms was carried out. The Levenberg-Marquardt (trainlm) algorithm has a high convergence rate, and in many cases it achieves a lower RMSE than other algorithms. The scaled conjugate gradient backpropagation algorithm (trainscg) and the resilient backpropagation algorithm (trainrp) are commonly used for training large networks. In addition, the adaptive learning rate algorithm (traingdx) is well suited to some special problems that require slow convergence to the target. The error comparison results of the different training algorithms are shown in Fig. 7. It can be seen that the Levenberg-Marquardt algorithm has the lowest RMSE and the highest R2. Therefore, the Levenberg-Marquardt algorithm was selected as the training algorithm of the neural network.
The effect of training algorithm on RMSE and R2. (Online version in color.)
In addition, the influence of the learning rate was also studied. Figure 8 shows the effect of the learning rate on RMSE and R2. It can be seen that when the learning rate is 0.1, the RMSE is small and the R2 is high, indicating that the neural network model has higher prediction accuracy at a learning rate of 0.1.
The effect of learning rate on RMSE and R2. (Online version in color.)
Then, the effect of the population size was studied, and the results are shown in Fig. 9. It can be seen that as the population size increases, the RMSE tends to decrease first and then increase. In general, increasing the population size helps improve the prediction accuracy. When the population size is 150, the RMSE and R2 of the neural network model are 0.00248 and 0.99969, respectively, giving the best predictive performance.
The effect of population size on RMSE and R2. (Online version in color.)
In addition, the tan-sigmoid function (tansig) and the linear transfer function (purelin) were selected as the activation functions of the hidden layer and the output layer, respectively. According to the comparative analysis of the above parameters, the optimal configuration of the neural network model was obtained, as shown in Table 3. Furthermore, the GA is operated with real-number coding, an initial population size of 150 and 100 generations. The selection function is normGeomSelect, the crossover function is arithXover, the mutation function is nonUnifMutation, and the other parameters use the default values of the GAOT toolbox.
Parameter name | Parameter setting |
---|---|
Training function | trainlm |
Hidden layer activation function | tansig |
Output layer activation function | purelin |
Number of learning iterations | 1000 |
Network learning accuracy | 0.00001 |
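These settings map onto MATLAB network properties roughly as sketched below (an assumption that the standard toolbox interface was used; note that the learning rate of 0.1 applies to the gradient-descent-type algorithms such as traingdx rather than to trainlm):

```matlab
% Final network configuration corresponding to Table 3 (sketch).
net = feedforwardnet(10, 'trainlm');     % 4-10-1 topology, Levenberg-Marquardt training
net.layers{1}.transferFcn = 'tansig';    % hidden-layer activation function
net.layers{2}.transferFcn = 'purelin';   % output-layer activation function
net.trainParam.epochs = 1000;            % number of learning iterations
net.trainParam.goal   = 1e-5;            % network learning accuracy (MSE goal)
% A learning rate of 0.1 (net.trainParam.lr) would be set when using traingdx and
% similar gradient-descent algorithms; trainlm is governed by the damping factor mu.
```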
According to the optimal configuration of the neural network, and combined with the DOM data shown in Table 1, the prediction performance of the neural network was verified. Figure 10 shows the comparison between the predicted and actual DOM values. The results show that the DOM values predicted by the GA-optimized neural network agree well with the sample data.
Performance of neural network optimized by genetic algorithm in DOM prediction (training data, test data). (Online version in color.)
In order to quantitatively evaluate the prediction performance, the RMSE, R2 and MAPE of the DOM prediction results were calculated. The predicted values and errors of the GA-optimized neural network model are shown in Table 4, and the errors are visualized in Fig. 11. For the prediction sample data, the RMSE and MAPE are 0.051% and 0.46%, respectively, reflecting a very small deviation between the predicted and actual values. R2 is 0.99965, which indicates that the established neural network model fits the data very well.
DOM actual value | DOM predicted value | Error (%) | DOM actual value | DOM predicted value | Error (%) |
---|---|---|---|---|---|
0.9494 | 0.9562 | 0.68 | 0.9243 | 0.9247 | 0.18 |
0.6814 | 0.6829 | 0.15 | 0.3118 | 0.3136 | −0.44 |
0.0597 | 0.0468 | −1.29 | 0.4511 | 0.4467 | −0.08 |
0.2461 | 0.2526 | 0.65 | 0.0478 | 0.047 | 1.04 |
0.713 | 0.7091 | −0.39 | 0.7035 | 0.7139 | −1.09 |
0.8235 | 0.8249 | 0.14 | 0.5259 | 0.515 | 0.76 |
0.4156 | 0.4162 | 0.06 | 0.8013 | 0.8089 | −0.47 |
0.4476 | 0.4444 | −0.32 | 0.8999 | 0.8952 | 1.06 |
0.3589 | 0.3595 | 0.06 | 0.0946 | 0.1052 | −0.14 |
0.0823 | 0.0793 | −0.3 | 0.7918 | 0.7904 | 0.13 |
0.5195 | 0.5172 | −0.23 | 0.9211 | 0.9224 | 0.18 |
0.2226 | 0.2231 | 0.05 |
Predicted error of test data set. (Online version in color.)
Figure 12 shows the changes in the MSE and the fitness value of the GA-optimized neural network during the iteration process. It can be seen that at the 96th generation, the mean square error reaches its minimum and the fitness function value has stabilized.
MSE and fitness in the iterative process of neural network optimized by genetic algorithm. (Online version in color.)
In addition, Fig. 13 shows the comparison between the test data set and the predicted metallization rate; it is obvious that the neural network model combined with the GA has high accuracy and reliability for predicting the reduction metallization rate of the carbon-containing pellet.
Comparison between test data set and predicted metallization rate. (Online version in color.)
A sensitivity analysis was also performed by analyzing the cause-effect relationship between the input variables and the model output, and the contribution of the independent variables to the system performance was evaluated. There are multiple algorithms for sensitivity analysis. For example, Ibrahim35) compared seven methods, including the connection weight algorithm, multiple linear regression, and dominance analysis. Olden et al.36) also compared methods such as the connection weight algorithm, Garson's algorithm, partial derivatives and input perturbation. In this study, the modified Garson algorithm37) and the connection weight method38) were used to evaluate the relative importance of the input variables in the pellet reduction process, as given in formulas (29) and (30). The expressions are as follows.
(29) RIx = Σ(y=1..m) [ |wxy|·|wyz| / Σ(x=1..n) |wxy| ] / Σ(x=1..n) Σ(y=1..m) [ |wxy|·|wyz| / Σ(x=1..n) |wxy| ]   (modified Garson algorithm)
(30) RIx = Σ(y=1..m) wxy·wyz   (connection weight method)
In the formulas, RIx is the relative importance of input variable x, n is the number of input neurons, m is the number of hidden neurons, wxy is the connection weight between input neuron x and hidden neuron y, and wyz is the connection weight between hidden neuron y and output neuron z.
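A brief MATLAB sketch of the two importance measures, computed from the weight matrices of the trained network (assuming the standard toolbox storage, i.e., net.IW{1,1} is the 10×4 input-to-hidden weight matrix and net.LW{2,1} the 1×10 hidden-to-output weight matrix):

```matlab
% Relative importance of the 4 inputs of a trained 4-10-1 network (sketch).
W1 = net.IW{1,1};                    % 10 x 4, input-to-hidden weights (wxy)
W2 = net.LW{2,1};                    % 1 x 10, hidden-to-output weights (wyz)

% Connection weight method, Eq. (30): sum over hidden neurons of wxy*wyz.
riCW = (W2 * W1)';                   % 4 x 1, signed relative importance

% Modified Garson algorithm, Eq. (29): absolute weights, normalized contributions.
contrib  = abs(W1) .* abs(W2');                   % 10 x 4, |wxy|*|wyz|
contrib  = contrib ./ sum(abs(W1), 2);            % divide by sum_x |wxy| per hidden neuron
riGarson = sum(contrib, 1)' / sum(contrib(:));    % 4 x 1, sums to 1
```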
The weights and thresholds of the model are shown in Table 5. The results show that the connection weights are both positive and negative, so an offsetting effect exists. In order to avoid this, the modified Garson algorithm calculates all connection weights using their absolute values. In addition, both methods are evaluated based on the final weights obtained from training, and these final weights are affected by the choice of the initial weights.
Hidden neuron | C/O weight | Size weight | Temperature weight | Time weight | Output weight | Hidden bias b1 | Output bias b2
---|---|---|---|---|---|---|---
1 | −0.125 | 0.810 | −0.178 | −0.466 | 0.847 | −0.845 | −0.151 |
2 | 0.494 | 0.377 | −0.364 | 0.206 | −0.323 | 0.013 | |
3 | −0.056 | −0.406 | −0.914 | 0.703 | 0.266 | −0.328 | |
4 | 0.335 | −0.267 | −0.136 | −0.013 | 0.383 | 0.751 | |
5 | 0.410 | −0.508 | −0.570 | −0.788 | −0.779 | 0.234 | |
6 | −0.618 | −0.650 | −0.836 | 0.677 | −0.534 | −0.287 | |
7 | −0.500 | 0.478 | 0.672 | −0.241 | −0.198 | −0.115 | |
8 | 0.372 | −0.868 | 0.323 | −0.175 | 0.633 | −0.411 | |
9 | 0.025 | −0.767 | 0.286 | 0.782 | 0.830 | 0.218 | |
10 | −0.498 | 0.229 | −0.259 | 0.459 | 0.538 | −0.059 |
The relative importance of the variables is ranked in Table 6. The results show that the reduction reaction time and the temperature have the greatest impact on the reduction of the carbon-containing pellet, followed by the pellet size and the carbon-to-oxygen ratio of the pellet. In the connection weight method, the higher the absolute value of the relative importance, the more significant the impact of the variable on the system performance. Combining this with the training data, it can also be seen that the reduction time and temperature have a significant impact on the metallization rate of the carbon-containing pellet. When the reduction time and reduction temperature are sufficiently large, the metallization rate of the carbon-containing pellet increases significantly, while insufficient reduction time and temperature seriously limit the metallization rate of the pellet; therefore, too short a reduction time and too low a reduction temperature should be avoided.
Modified Garson’s algorithm | Connection Weight Approach | ||
---|---|---|---|
Relative importance | Rank | Relative importance | Rank |
0.338 | 1 | 0.806 | 1 |
0.263 | 2 | 0.732 | 2 |
0.212 | 3 | −0.061 | 3 |
0.187 | 4 | −0.053 | 4 |
In this work, the carbon-containing pellet reduction process was modeled, and a neural network surrogate model was established based on the numerical results. The conclusions are summarized as follows.
(1) A reduction model of the carbon-containing pellet was established and verified against experimental results from the literature. The reduction model was used to calculate results under various working conditions in order to provide training and test data for establishing the neural network model.
(2) The temperature in the furnace, the reduction reaction time of the carbon-containing pellet, and the size and C/O ratio of the pellet were taken as the input parameters of the model, and the final metallization of the carbon-containing pellet as the output parameter. A comparative analysis of the number of hidden layer neurons, the training algorithm, the learning rate, and the population size was carried out. Furthermore, the initial weights and thresholds of the neural network were optimized by the GA, which improved the prediction accuracy of the model.
(3) Combined with the DOM data, the prediction performance of the neural network was tested and verified. The results show that the RMSE and MAPE were 0.051% and 0.46%, respectively, reflecting a very small deviation between the predicted and actual values. R2 is 0.9997, indicating that the established neural network model fits the data well. The neural network model combined with the GA is a reliable and efficient tool for predicting the reduction metallization rate of carbon-containing pellets.
(4) According to the weights of the model, the importance of the input parameters was analyzed. The reduction reaction time and temperature have the greatest impact on the reduction of the carbon-containing pellet, followed by the pellet size and the pellet carbon-to-oxygen ratio. This result is basically consistent with the behavior reported in experimental and simulation studies.