2024 Volume 64 Issue 14 Pages 1976-1987
As a critical parameter in blast furnace production, the coal injection rate is not only related to the stability of the furnace condition but is also a vital index for evaluating production economy. In most blast furnaces, this parameter is determined by the operator’s experience. This paper establishes a coal injection rate prediction model based on Catboost (categorical gradient boosting algorithm), which can provide a better basis for operators to control the parameter. First, the collected steel production data were processed; the operational parameters from the previous time step that had the greatest impact on the coal injection rate were selected as the model inputs, and the coal injection rate at the current time step was used as the single model output. Next, the Catboost model was adopted, and the Optuna optimization framework, using Bayesian optimization, was used to tune it (BO-Catboost), enhancing the model’s capabilities and reducing over-fitting. Then, the performance of the Catboost model under different optimization algorithms was compared, and the predictions of the BO-Catboost model were compared with those of the ordinary Catboost, BO-Random Forest and BO-XGboost (Extreme Gradient Boosting) models. The results show that the BO-Catboost model outperforms the other models. Finally, a blast furnace coal injection monitoring system based on Web technology was established to display the coal injection predictions on a dashboard; testing shows that it provides useful guidance for controlling the coal injection rate.
The blast furnace iron smelting process is a nonlinear system that cannot be monitored directly on-site, and it is often called a “black box” because of its uncertainty and high dimensionality. The blast furnace coal injection rate is of great significance for the stability of the furnace condition, production economy and the quality of molten iron. However, because of the opacity of the process and the lag in the monitoring data, the coal injection rate of most domestic blast furnaces depends on the operator’s experience, which is highly subjective and ambiguous and leads to unstable furnace conditions and high production costs. It is therefore necessary to predict the coal injection rate in a timely and accurate manner during blast furnace production, according to the actual processing conditions of the furnace. In recent years, with the development of artificial intelligence and the machine vision industry, many researchers have started to apply various mainstream prediction models to this parameter.
Hou Jia1) used a support vector machine to construct a coal injection prediction model based on the furnace temperature trend. Compared with the BP neural network, the support vector machine overcomes problems such as the tendency to fall into a local optimum and slow convergence, and thus improves the prediction accuracy. However, the recognition ability of support vector machines is strongly influenced by their own parameters, so the prediction accuracy may fluctuate considerably. The French Institute of Iron and Steel Smelting started from research on the smelting mechanism of the blast furnace itself and proposed the Wu index mathematical model based on the chemical reactions in the blast furnace. This model is calculated from data such as the blast furnace top gas composition and the pig iron content, and finally yields the thermal balance furnace thermal index of the high-temperature area.2) Shan et al.3) established a blast furnace coal injection quantity prediction method based on an improved PSO-optimized ELM and compared its predictions with those of the Extreme Learning Machine model using part of the production data of the blast furnace. When furnace conditions fluctuate, its prediction accuracy is slightly higher than that of the single ELM model. S. Li et al.4) established a blast furnace coal ratio prediction model based on fuzzy clustering and grid-search-optimized support vector regression. The average absolute error of the predictions is 1.77 kg/t, the hit rate within 0.5% error is 81.19%, the coefficient of determination R² is 0.9474, and the prediction performance is superior to that of ridge regression and decision tree regression. Fan et al.5) proposed a classification method for polarized SAR images based on support vector machine and energy minimization.
Zhou et al.6) proposed an optimized prediction model for blast furnace coal injection quantity based on a BP neural network and a genetic algorithm, which improved the operating parameters, but the model requires too many optimization target parameters to be set during training.
In this paper, a more effective and robust categorical gradient boosting model (Catboost) is proposed for predicting the coal injection rate. It addresses the insufficient model scalability and cumbersome parameter adjustment found in existing research, and a corresponding system is developed to predict this parameter one hour in advance. Extensive experimental validation shows that the new fusion model greatly reduces the possibility of overfitting while ensuring that all data can be used for learning. The more important parameters are preliminarily screened by combining the metallurgical process with the experience of the workers, a series of data processing steps ensures the accuracy and efficiency of the data used in the model, and the optimal hyperparameters are finally determined through repeated Bayesian optimization, resulting in an average R² of 0.921 and a significant reduction in prediction time.
Catboost (Categorical Boosting) is a machine learning algorithm based on the GBDT framework that was open-sourced in 2017 by the Russian search company Yandex. It uses symmetric binary decision trees as the base learner and can process categorical features efficiently and rationally with fewer parameters.7) In addition, Catboost addresses the problems of gradient bias and prediction shift through its Ordered Boosting scheme, which reduces the risk of over-fitting and ensures that as much of the data as possible can be used for learning. This results in better robustness and generalizability, and makes the algorithm easier to use and more practical.8)
2.1.1. GBDT Algorithmic Framework
First, the same weight is assigned to each sample in the training set; in each iteration, a new decision tree is built according to the gradient descent method and added to the model set. The weights of the training samples are then updated based on the results of each sample on the current model set. When a sample is misclassified, its weight is increased so that it is utilized to a greater extent in the next iteration. By continuously building decision trees in the direction of gradient descent, a strong learner is eventually formed.9)
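The boosting loop can be illustrated with a minimal sketch. It assumes squared loss, for which the negative gradient is simply the residual, and uses single-split stumps on one feature as base learners; the function names and the toy data are hypothetical and are not taken from the paper's implementation.

```python
# Minimal gradient-boosting sketch for squared loss (illustrative only).

def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals by least squares.

    Assumes x contains at least two distinct values.
    Returns (threshold, left_value, right_value).
    """
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    return best[1:]


def predict_stump(stump, xi):
    threshold, lv, rv = stump
    return lv if xi <= threshold else rv


def mean_squared_error(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Return (base prediction, list of stumps, final training predictions)."""
    base = sum(y) / len(y)  # initial model: the mean label
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For the loss 0.5 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
    return base, stumps, preds
```

Each new stump is fitted to what the current ensemble still gets wrong, and the learning rate scales its contribution, so the training error cannot increase from one iteration to the next.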
2.1.2. Symmetric Binary Decision Trees
One of the major innovations of the Catboost algorithm is that it constructs symmetric decision trees10) during training and then builds the model by combining multiple decision trees. As shown in Fig. 1, the first step performs a single split; in each remaining step, all leaves of the previous level are split using the same condition, where the feature with the lowest loss is selected and applied to every node at that level. On the one hand, this balanced tree structure acts as a form of regularization that helps prevent over-fitting; on the other hand, it can also speed up prediction to a certain extent.
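One reason a symmetric tree predicts quickly is that the leaf can be found without branching through a node structure: each level contributes one bit to the leaf index. The sketch below assumes numeric features and a hypothetical representation of the tree as a list of per-level `(feature_index, threshold)` pairs; it is not Catboost's internal data layout.

```python
def oblivious_tree_predict(sample, splits, leaf_values):
    """Predict with a symmetric (oblivious) tree.

    splits:      one (feature_index, threshold) pair per level; every node
                 on a level shares the same condition.
    leaf_values: 2 ** len(splits) leaf predictions.
    """
    index = 0
    for feature, threshold in splits:
        # Append one bit per level: 1 if the condition holds, else 0.
        index = (index << 1) | int(sample[feature] > threshold)
    return leaf_values[index]
```

With two levels, for example, the four leaves are addressed by the bit pairs 00, 01, 10 and 11.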
In order to reduce the over-fitting caused by the complexity of the model, the Catboost algorithm introduces the Ordered Boosting method. The samples are randomly permuted before training, and each sample uses only the samples ranked before it to train a model; the trained model is then used to calculate the first-order and second-order gradients of that sample’s prediction. In other words, a separate model is trained for each sample and used to estimate its gradient, and these gradients are then used to construct the tree structure. The value of each leaf node is calculated using all the samples, and the resulting gradients are finally used to train the base learner, which effectively reduces the gradient estimation error and mitigates prediction shift.
In the algorithm, the average value of data labels is used as the standard for node splitting:11)
(1) |
(2) |
Where,
Although this is a common feature encoding method, its disadvantage is that features contain more information than labels, which can easily lead to conditional bias. To solve this problem, the Catboost algorithm adds a prior term and a weight coefficient:
(3) |
where p is the prior term: for a regression task, the average value of the labels is used as the prior, and for a classification task, the occurrence probability of the positive class is used. a is the weight coefficient of the prior, which acts as a smoothing term that reduces the impact of low-frequency feature values.
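The smoothed encoding can be sketched in a few lines. The version below assumes the form commonly given in the Catboost literature, (sum of labels of earlier samples in the same category + a·p) / (count of earlier samples in the same category + a), and assumes the samples have already been randomly permuted; the function name and the toy data are hypothetical.

```python
def ordered_target_statistic(categories, labels, prior, a=1.0):
    """Encode a categorical column with a prior-smoothed, ordered target mean.

    Each sample is encoded using only the samples that precede it, so its
    own label never leaks into its encoding.
    """
    sums, counts = {}, {}
    encoded = []
    for cat, label in zip(categories, labels):
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        encoded.append((s + a * prior) / (c + a))
        # Only now does the current sample become visible to later ones.
        sums[cat] = s + label
        counts[cat] = c + 1
    return encoded
```

A category that has not been seen yet is encoded as the prior p itself, and the encoding moves toward the category's own label mean as more samples of that category accumulate, which is the smoothing effect attributed to a.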
In addition, the Catboost algorithm can adaptively control the learning rate, which helps the algorithm to better control the contribution of weak learners in each iteration, thus accelerating the improvement of model accuracy.12) The formula is as follows:
(4) |
where, t is the number of iterations, ηt is the learning rate of the t-th iteration, and αt is the average learning rate of the previous t iterations.
2.2. Optuna Hyperparameter Optimization
Optuna is an automated hyperparameter optimization framework proposed by the Japanese deep learning company Preferred Networks. It can dynamically construct a hyperparameter search space and search for a global optimum of the model hyperparameters. It includes the grid search method, the random search method and a Bayesian optimization algorithm,13) and it can prune trials with poor results to improve efficiency.
After comparison, Bayesian optimization principle is used for this optimization process:
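$$p(f \mid D_{1:n}) = \frac{p(D_{1:n} \mid f)\, p(f)}{p(D_{1:n})} \tag{5}$$
$$D_{1:n} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \tag{6}$$
$$y_n = f(x_n) + \varepsilon_n \tag{7}$$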
where D1:n is the known training set, p(f|D1:n) is the posterior probability of f given the training data-set D1:n, p(D1:n|f) is the likelihood distribution of y, p(f) is the prior probability of f, p(D1:n) is the marginal likelihood distribution of f for hyperparameter optimization, xn is the decision vector, yn is the observation value, n is the number of iterations, and εn is the error value.14)
At each iteration, a new hyperparameter vector xn+1 needs to be chosen, and in order to select the optimal xn+1, its corresponding posterior probability distribution needs to be computed:
(8) |
After the posterior probability distribution p(fn+1|D1:n) has been calculated, the search direction is updated by choosing the hyperparameter value xn+1 that is expected to minimize the objective function.15) This procedure is iterated until the optimal parameter combination is obtained.
In summary, Bayesian optimization dynamically adjusts the search over the parameter space using previous results and selects the next parameter combination in each iteration based on the current model performance. It thereby balances exploring new regions of the parameter space against exploiting the regions already known to perform well, which helps avoid over-fitting. Choosing it therefore allows the hyperparameters of the Catboost model to be selected in a better and more efficient way, improving model performance.
2.3. Forecasting Process
The process of predicting the blast furnace coal injection rate with BO-Catboost is shown in Fig. 2.
Due to the non-linearity of the smelting process and the influence of various external factors, the production data of the blast furnace are often incomplete, and even erroneous data will appear when the sensor is damaged.16) If the collected data is directly used for experiments, it will not be able to train the model accurately, so it is necessary to process the collected data before the test, in order to achieve better experimental results.
3.1. Data Screening
This paper uses the production data of a blast furnace in a domestic steel plant from July 2022 to May 2023. The collected data mainly contain charging data, operation data, iron output data, monitoring data, etc., totaling 109 parameters. Because the parameters are measured at different frequencies, Python is used to unify the data to an hourly frequency: data within 90 min are assigned to the first hour and data beyond 90 min to the next hour, which finally gives a 24 h time series. The main data categories are shown in Table 1. All the data are divided into an 80% training set and a 20% test set.
Charging data | Operational data | | | Iron output data | |
---|---|---|---|---|---|
fuel ratio | wind volume | actual wind speed | 7–18 difference in temperature | [Si] quantity | R2 |
 | net air volume | furnace gas index | seal chamber temperature | [C] quantity | R3 |
thermal load | wind temperature | furnace gas | seal chamber N2 | [Mn] quantity | R4 |
 | wind pressure | edge temperature | net gas content | [P] quantity | Mg/Al |
metallurgical cycle | total oxygen | Z-value | barren gas quantity | [S] quantity | CO quantity |
 | oxygen enrichment rate | W-value | bottom flow | [Ti] quantity | CO2 quantity |
today’s batch | top temperature | H-value | blast kinetic | [SiO2] quantity | H2 quantity |
 | top pressure | bottom temperature difference | seal chamber water volume | [CaO] quantity | N2 quantity |
coal injected rate | K-value | upper differential pressure | net gas pressure | [MgO] quantity | [S] standard deviation |
 | index | 1–6 flux | barren gas pressure | [MnO] quantity | [Si] standard deviation |
 | pressure difference | 1–6 difference in temperature | breathability index | [FeO] quantity | CO2/(CO+CO2) |
 | standard wind speed | 7–18 flux | cooling air temperature | [Al2O3] quantity | CH4 quantity |
In order to improve the prediction performance of the model and avoid learning incorrect rules from missing data, some parameters were excluded with reference to the relevant research literature and production experience: the monitoring parameters of different orientations, the hourly material batches, the cylinder brick temperatures, and the parameters with a large missing rate and low reference significance. More than 70 relatively complete parameters were retained.
3.2. Blank Values Processing
Blank values mainly arise from failures of the data collection equipment or from data loss caused by manual errors. They are usually filled by interpolation, for example Lagrangian interpolation or Newton interpolation. The principle is to fit a curve through the points on the left and right sides of the blank value and then evaluate it at the vacant point. This method is simple to apply and is usually fairly accurate.
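As an illustration of filling a blank from its neighbours, the sketch below fits a Lagrange polynomial through up to two known points on each side of a gap and evaluates it at the missing position. The window size, the use of the list index as the time axis, and the function names are assumptions made for this example, not details reported in the paper.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through (xs, ys) at position x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def fill_blanks(series, window=2):
    """Fill None entries using up to `window` known neighbours on each side."""
    filled = list(series)
    for k, value in enumerate(series):
        if value is not None:
            continue
        left = [i for i in range(k - 1, -1, -1) if series[i] is not None][:window]
        right = [i for i in range(k + 1, len(series)) if series[i] is not None][:window]
        idx = sorted(left + right)
        if idx:  # leave the blank untouched if no neighbour is known
            filled[k] = lagrange_interpolate(idx, [series[i] for i in idx], k)
    return filled
```

Keeping the window small is a deliberate choice here: high-degree Lagrange polynomials can oscillate strongly between nodes, so only a few neighbouring points are used for each gap.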
3.3. Outliers Processing
Outliers are values that deviate significantly from the surrounding data because of fluctuations in furnace conditions during production, subjective factors in manual operation, and aging of or damage to sensors.17) This article uses the box plot method to detect and process outliers.18) The box plot consists of five main numerical points: minimum, maximum, first quartile (Q1), median (Q2) and third quartile (Q3); Q1, Q2 and Q3 mark the 25%, 50% and 75% points of all the data, respectively. In the box plot, the upper and lower edges of the box correspond to the third quartile and the first quartile. The length of the box is the interquartile range (IQR = Q3 − Q1), which intuitively displays the degree of data dispersion. Outliers are usually values less than Q1 − 1.5IQR or greater than Q3 + 1.5IQR. In the model, such values are first deleted; the Lagrangian interpolation method is then used to fill in the resulting missing data, and the data after outlier processing are passed to the next step, standardization. Figure 3 shows an example of the box-plot method applied to outliers in the coal injection rate (unit: kg/thm).
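The box-plot rule can be sketched with the Python standard library. The quartile convention (`method="inclusive"`) and the sample values are assumptions for this example; other quartile definitions give slightly different bounds.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) box-plot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def mask_outliers(values, k=1.5):
    """Replace values outside the fences with None so they can be re-filled."""
    low, high = iqr_bounds(values, k)
    return [v if low <= v <= high else None for v in values]
```

Masking outliers as `None` rather than dropping them keeps the time axis intact, so the same interpolation step used for blank values can be applied afterwards.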
Because blast furnace data have different dimensions depending on how they are measured and calculated, order-of-magnitude differences can exaggerate or suppress certain data, or even obliterate them.19) Therefore, it is necessary to normalize the data so that they are transformed into dimensionless values. In this paper, we use min-max (extreme difference) normalization, which generally preserves the structural characteristics of the data:20)
$$X_n = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{9}$$
Where, Xn is the normalized data; X is the original data; Xmin and Xmax are the minimum and maximum values of all data in the collected data-set. The normalization of coal injection rate (unit: kg/thm) is shown in Fig. 4, the x-axis represents the total amount of coal injection rate data processed, while the y-axis represents the pre-processed and post-processed dimensionless values, respectively.
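A minimal sketch of this scaling is given below. The handling of a constant column (mapped to zeros to avoid division by zero) is an assumption added for the example.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using (X - Xmin) / (Xmax - Xmin)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

In practice the minimum and maximum would normally be taken from the training set and reused for the test set and for new data, so that all inputs are scaled consistently.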
Blast furnace smelting is a multi-system, continuous process, and the collected data are sometimes strongly correlated. Using them directly would make the model computation too expensive and would greatly increase the risk of over-fitting. It is therefore necessary to screen the input parameters of the model; a reasonable set of parameters improves the computational efficiency of the model and the accuracy of training. In this paper, with reference to the authors’ previous research and the experience of a large number of blast furnace operators, the parameters that may have a greater influence on the coal injection rate were screened out as the initial selection of input parameters. The parameter names and their code names are shown in Table 2.
Parameter | Code name | Parameter | Code name |
---|---|---|---|
breathability index (Last time) | X1 | fuel ratio (current time) | X11 |
wind volume (Last time) | X2 | coal injected rate (current time) | X12 |
wind temperature (Last time) | X3 | bottom flow (Last time) | X13 |
wind pressure (Last time) | X4 | bottom temperature difference (Last time) | X14 |
thermal load (Last time) | X5 | seal chamber temperature (Last time) | X15 |
total oxygen (Last time) | X6 | seal chamber N2 (Last time) | X16 |
oxygen enrichment rate (Last time) | X7 | cooling air temperature (Last time) | X17 |
top temperature (Last time) | X8 | furnace gas (Last time) | X18 |
top pressure (Last time) | X9 | actual wind speed (Last time) | X19 |
pressure difference (Last time) | X10 | blast kinetic (Last time) | X20 |
Correlation analysis is a statistical method used to measure the degree of association between two or more variables. Commonly used methods include Pearson correlation coefficient method and Spearman rank correlation coefficient method. This article uses the Pearson correlation coefficient method:
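$$W = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{10}$$
where $W$ is the Pearson correlation coefficient, $x_i$ and $y_i$ are the $i$-th observations of the two parameters, $\bar{x}$ and $\bar{y}$ are their mean values, and $n$ is the number of samples.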
As shown in Fig. 5, when the absolute value of the Pearson correlation coefficient is greater than 0.8, the pair falls in the strong correlation range and the correlation between the parameters is abnormally high. Such redundant data not only affect the model accuracy22) but also prolong the model training time, so they need to be eliminated. Through heat map screening, the fuel ratio (X11), which is highly correlated with the coal injection rate (X12), is eliminated. The remaining parameters are breathability index (X1), wind volume (X2), wind temperature (X3), wind pressure (X4), thermal load (X5), total oxygen (X6), oxygen enrichment rate (X7), top temperature (X8), top pressure (X9), pressure difference (X10), bottom flow (X13), bottom temperature difference (X14), seal chamber temperature (X15), seal chamber N2 (X16), cooling air temperature (X17), furnace gas (X18), actual wind speed (X19) and blast kinetic (X20).
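The screening step can be sketched as computing the Pearson coefficient for every pair of columns and listing the pairs whose absolute value exceeds 0.8. The column values below are made-up toy numbers, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strongly_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs
```

For example, with toy columns in which X12 is exactly twice X11, only the pair (X11, X12) is reported, mirroring how the fuel ratio was flagged against the coal injection rate.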
The importance ranking analysis in this paper adopts the mutual information method, which determines the relative importance of variables by calculating the contribution of each parameter to the target parameter and is widely used in feature screening. The main principle is first to evaluate the strength of the relationship between each variable and each factor through factor loadings, and then to rank and select the variables according to the scores. Higher factor loading values indicate a stronger association between the variables and the factor, so these variables can be considered more important:
$$I(X;Y) = \sum_{x \in X}\sum_{y \in Y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \tag{11}$$
where I(X;Y) is the mutual information between variables X and Y, p(x,y) denotes the probability of x and y occurring simultaneously, and p(x) and p(y) denote the probabilities of x and y occurring separately. The greater the mutual information, the stronger the correlation between the variables.
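For discrete (or pre-binned) variables, the mutual information can be estimated directly from empirical frequencies, as in the sketch below. Treating the inputs as discrete is an assumption of this example; continuous furnace measurements would first need binning or a dedicated estimator for continuous data.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Unobserved (x, y) pairs contribute zero and are skipped automatically.
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi
```

Two identical binary sequences give ln 2, the entropy of the variable, while two independent sequences give 0, which matches the reading that larger values mean a stronger relationship.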
The mutual information ranking for the coal injection rate is shown in Fig. 6, in which the features are ranked from top to bottom by importance score. As shown in Fig. 7, after several experiments, and taking into account the importance ranking, the metallurgical process and the efficiency of the model, the threshold was set to 0.1. With a threshold above 0.1, fewer input parameters are retained and the loss of model accuracy is large; with a threshold below 0.1, more input parameters are retained and the total prediction time increases, but the accuracy improves only slightly. After selection, 5 parameters with a small impact on the coal injection rate, such as X8 and X1, were eliminated, and the 13 parameters with importance scores greater than 0.1 were finally selected as the input parameters of the model: X3, X4, X5, X6, X7, X9, X13, X14, X16, X17, X18, X19, X20.
In this paper, Mae (mean absolute error), Rmse (root mean square error) and R² (coefficient of determination) are used to assess the regression performance of the different models and the effect of the different optimization algorithms on the Catboost model:23)
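$$\mathrm{Mae} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{12}$$
$$\mathrm{Rmse} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{13}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{14}$$
where $y_i$ is the true value of the sample, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the true values, and $n$ is the number of samples.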
Mae is the average of the absolute errors and directly reflects the actual size of the prediction error. Rmse penalizes large errors more heavily and thus shows the effect of large deviations on the results. R², also known as the goodness of fit, reflects the proportion of the variation in the dependent variable that is explained by the model and is a statistical indicator commonly used to evaluate regression models. The closer its value is to 1, the better the model fits the data.
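The three indicators can be computed as follows. This is a plain-Python sketch using their standard definitions; the function names and sample values are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Note that Mae and Rmse are expressed in the units of the target, so values computed on normalized data are not directly comparable with errors quoted in kg/thm.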
5.1. Comparison of Different Optimization Algorithms
This article uses the Bayesian optimization method, the grid search optimization method and the random search optimization method in the Optuna library to optimize the Catboost model.
As can be seen from Fig. 8, over the 200 iterations of the three optimization methods, the R² of the Bayesian-optimized model is not only the highest of the three but also smoother, with a more accurate fit to the extreme values. Table 3 shows that both its Rmse and its Mae are smaller than those of the random search and grid search methods and that its stability is better, which makes it more suitable for optimizing the Catboost model. The settings at which each model reaches its optimum are shown in Table 4. Compared with random search and grid search, the Bayesian-optimized model has a smaller learning rate but a greater tree depth and more nodes, and it needs the fewest iterations to reach the optimum, so the decision trees constructed are more robust and efficient. After Bayesian optimization, R² increased by 4.6% and Rmse decreased by 64.9%.
Optimization algorithm | R² (average) | R² (max) | Rmse (average) | Rmse (max) | Mae (average) | Mae (max) |
---|---|---|---|---|---|---|
Bayesian | 0.921 | 0.936 | 5.96×10⁻⁵ | 1.72×10⁻⁴ | 1.31×10⁻⁴ | 4.08×10⁻⁵ |
Grid search | 0.658 | 0.879 | 5.28×10⁻³ | 2.57×10⁻² | 2.06×10⁻⁴ | 7.96×10⁻⁴ |
Random search | 0.619 | 0.925 | 4.53×10⁻³ | 2.56×10⁻² | 1.85×10⁻⁴ | 7.92×10⁻⁴ |
Model \ Setting | Num_nodes | Iterations | Learning rate | Depth |
---|---|---|---|---|
Catboost | \ | 1351 | \ | \ |
BO-Catboost | 127 | 1196 | 0.143 | 7 |
Grid search-Catboost | 63 | 1384 | 0.147 | 6 |
Random search-Catboost | 63 | 1483 | 0.161 | 6 |
This paper uses the BO-Catboost model, the ordinary Catboost model, the BO-Xgboost model and the BO-Random Forest model to predict the coal injection rate, and performs fitting and regression analyses between the predictions and the real values, as shown in Figs. 9 and 10.
As can be seen from Fig. 9, when the fitting performance of the four models is compared on the same coal injection rate data-set, the ordinary Catboost model performs similarly to the BO-Random Forest model, and both fit the extreme values better overall than the BO-Xgboost model. After the hyper-parameters are optimized by the Bayesian algorithm, the BO-Catboost model fits the extreme values noticeably better.
As can be seen from Fig. 10, when the regression performance of the four models is compared on the same coal injection rate data-set, the predicted values of the ordinary Catboost model are more densely clustered overall than those of the BO-Xgboost and BO-Random Forest models. The BO-Catboost model is significantly improved again: its predictions lie closer to and more densely around the true values, with fewer deviating predictions.
In addition, Table 5 shows that, although the prediction accuracy of the BO-Random Forest model is close to that of the BO-Catboost model, its training and prediction times are much longer. Because the BO-Catboost model adds an optimization process, its training time is slightly longer than that of the ordinary Catboost model (1.54 s versus 1.52 s), but its prediction time is shorter (0.016 s versus 0.021 s).
Model | R² | Rmse | Mae | Training time/s | Predicting time/s |
---|---|---|---|---|---|
BO-Catboost | 0.936 | 1.72×10⁻⁴ | 4.08×10⁻⁵ | 1.54 | 0.016 |
Catboost | 0.894 | 4.90×10⁻⁴ | 1.91×10⁻⁴ | 1.52 | 0.021 |
BO-Xgboost | 0.815 | 9.32×10⁻⁴ | 6.85×10⁻⁵ | 1.91 | 0.023 |
BO-Random Forest | 0.903 | 6.45×10⁻³ | 2.25×10⁻⁵ | 2.87 | 0.031 |
In order to display and analyze the blast furnace coal injection rate more intuitively, this paper designed and developed a blast furnace coal injection rate monitoring system based on Web technology. The platform monitors the coal injection rate of the blast furnace in real time and provides data monitoring, alarm information and other functions. Development tools such as IDEA 2021 and Xshell 7 were used. CSS (Cascading Style Sheets), HTML (Hyper Text Markup Language) and JS (JavaScript) technologies24,25) were used for the front-end pages, while the back end was built with Spring Boot. To ensure the stability and security of the system, the platform was deployed on a Linux system and was tested in a steel plant in China in January 2024. To test the model more rigorously, newly collected data were used; the results on these new data are shown in Fig. 11. With an R² approaching 0.91, the BO-Catboost model performs well.
Using CSS, HTML and JS, an interface with convenient interaction and concise content was created. HTML is a markup language that mainly consists of a series of tags describing the various elements in the page; these tags define the content and layout of the web page according to a certain structure. The advantages of CSS and JS are fully utilized in this platform. CSS is mainly used to control the style and layout of the web pages, making them more attractive and easier to read,26) while JS is mainly used to implement the dynamic effects and interactive functions of the pages to enhance the user experience. Combining the two makes the pages richer and more powerful.
6.2. Back-end Design and Spring Boot
The back end is the core support for the entire platform. It uses the Spring Boot27) development framework, which is designed to simplify and accelerate the development and deployment of Spring applications. By providing default configuration, it enables developers to build and deploy applications quickly without manually configuring a large number of framework settings. In addition, Spring Boot supports packaging applications as executable JAR or WAR packages, and with container technologies such as Docker,28) a more lightweight, rapid deployment can be achieved.
6.3. Database Design and Oracle
Because of the sheer volume of data generated during blast furnace production, an independent database is essential. The platform uses an Oracle database, which has excellent scalability and the performance to handle large-scale data and highly concurrent requests. In addition, Oracle Database provides multi-level security measures, including user authentication and permission management,29,30) so it is widely used in enterprise-level applications and large data systems to provide stable and efficient data management.
6.4. Deployment and Load Balancing
The stability and availability of the platform depend on how it is deployed. The platform is deployed on a Linux system, and an Nginx proxy server was introduced for load balancing. Distributing data traffic across multiple servers avoids single points of failure and improves the stability and response speed of the system. At the same time, Nginx is used as a reverse proxy to hide the details of the back-end servers, which prevents malicious requests from reaching the real servers directly and improves the security of the system.31,32)
The platform interface is shown in Fig. 12. The coal injection rate monitoring platform mainly consists of 5 modules: user management module, coal injection rate data monitoring module for blast furnaces 1 to 5, blast furnace alarm information module, system log module, and system setting module.
The user management module records the work number of each person who logs into the system, providing a certain degree of traceability. The data monitoring module uses the BO-Catboost prediction model established in the previous section to display the predicted blast furnace coal injection rate on the dashboard and can switch between blast furnaces, so that the operator can understand the data more intuitively. The alarm information module records alarm information about the coal injection rate during blast furnace production. The system log module records all the coal injection data of the blast furnace for one year, which is convenient for subsequent data analysis and optimization. The system setting module is used to configure basic information in the system.
In this paper, a blast furnace coal injection rate prediction model based on a Catboost model tuned with Optuna’s Bayesian optimization is proposed. The experimental results of applying the model to a steel plant blast furnace in China show that the model, using the rules it has already learned, achieves an R² of about 0.91 on previously unseen data. This indicates that the system can effectively handle the non-linearity, non-equilibrium and multivariate input involved in predicting the coal injection rate, and that it provides useful guidance for controlling the coal injection rate.
(1) Based on correlation analysis combined with metallurgical process experience, the input parameters of the model were screened, and the coal injection rate was taken as the single output. The Catboost model was adopted, and a new fusion model, BO-Catboost, was established by combining the Bayesian-based Optuna optimization algorithm with the Catboost model for predicting the coal injection rate.
(2) By selecting the controllable operational parameters that have a greater impact on the coal injection rate to predict this parameter at the next moment, the uncertainty caused by prediction based on iron output parameters is avoided, and the model is more interpretable.
(3) Comparing Bayesian optimization with the grid search and random search optimization methods shows that Bayesian optimization is the best of the three in terms of both data fitting and error metrics: the average Rmse is only 5.96×10⁻⁵, and the average R² reaches 0.921.
(4) Comparing the BO-Catboost model with the ordinary Catboost, BO-Xgboost and BO-Random Forest models shows that the BO-Catboost model has higher prediction accuracy and faster prediction: the maximum R² reaches 0.936, and the prediction time is reduced to 0.016 s.
(5) The blast furnace coal injection monitoring system based on Web technology can predict the next coal injection rate up to an hour in advance and display it directly on the screen. Its performance on new data shows that it provides useful guidance for controlling the coal injection rate.
This study was supported by the Natural Science Foundation of Hebei Province (E2019209314) and the Natural Science Foundation of Hebei Province (E2022209086).