ISIJ International
Online ISSN : 1347-5460
Print ISSN : 0915-1559
ISSN-L : 0915-1559
Forming Processing and Thermomechanical Treatment
Ensemble Learning Based Methods for Crown Prediction of Hot-Rolled Strip
Guangtao Li, Dianyao Gong, Xing Lu, Dianhua Zhang

2021 Volume 61 Issue 5 Pages 1603-1613

Abstract

The strip crown or profile generated by the cooperation of the finishing mills is affected by many factors, so obtaining an accurate crown has always been a challenge in hot strip rolling. As a solution for ensuring the crown accuracy of hot-rolled strips, this study develops three novel strip crown prediction models using well-performing and efficient tree-based ensemble learning algorithms, namely Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). The comparison of measured and predicted strip crown shows that all developed prediction models perform well, based on the accurate extraction of key features as model inputs, the collection and pre-processing of a large amount of modeling data, and the use of the Bayesian optimization technique. By comparison, the LightGBM model, with both high efficiency and high accuracy, is the most recommended method for hot-rolled strip crown prediction. Besides, the feature importance scores of the input variables calculated from the LightGBM model measure the impact of each input variable on the strip crown, and the results agree well with classical hot rolling theory, indicating that the modeling route is reliable.

1. Introduction

The rapid development of the manufacturing industry has placed higher demands on steel strip shape. The strip crown is one of the most important indicators for evaluating the cross-sectional shape of a strip. However, obtaining an accurate strip crown has always been a difficulty in the hot strip rolling process. As shown in Fig. 1, strip crown control, as a subset of strip shape control, is a complex process consisting of the pre-setting model and adaptive learning model in the process control system (L2) and the control models in the basic automation system (L1).1) The pre-setting model provides the reference values of the bending force and work roll shifting position of each stand for obtaining the target strip crown. The pre-setting model includes calculations of both strip and roll deformation. The calculation of strip deformation is usually based on the Finite Element Method.2,3) The calculation of roll deformation involves the calculation of roll bending and flattening based on the Influence Function Method4,5) or Finite Element Method,6,7) the calculation of roll thermal expansion based on the Finite Element Method8,9) or Finite Difference Method,10,11) and the calculation of roll wear based on discretization or statistical regression methods.12) Since the results of the pre-setting model sometimes have large deviations, the control models in L1 are essential for obtaining the required strip crown.13) According to the difference between the measured crown and the target crown, the crown feedback control eliminates crown deviations by adjusting the bending force in the upstream stands. The feedforward control of the bending force is committed to eliminating the effect of rolling force fluctuations on the strip shape. However, due to the lag of the crown measurement, the crown deviation cannot be corrected immediately by feedback control. Thus, once the setting deviation of the pre-setting model is large, the crown deviation in the strip head is difficult to avoid, considering the high rolling speed. Strip crown prediction is proposed as an available solution to this problem. As shown in Fig. 1, the strip crown prediction model sits in L2 and provides an accurate prediction of the strip crown according to the current rolling conditions and actuator settings before finish rolling of the strip begins. Based on the difference between the predicted crown and the target crown, the crown deviation in the strip head can be reduced by adjusting the bending force in advance.

Fig. 1.

Automatic control system for hot-rolled strip shape. (Online version in color.)

Since the strip crown is affected by many factors, achieving accurate and efficient strip crown prediction is not an easy task. Recently, the successful application of artificial intelligence (AI) techniques in various fields has attracted great attention from the steel rolling industry. Powerful AI techniques can handle problems in complex systems without any assumptions. Therefore, well-performing and efficient machine learning algorithms combined with industrial data are considered a way to address this challenge. Up to now, there have been few studies of strip crown prediction based on AI techniques. Wang et al.14) built a crown prediction model based on the support vector machine (SVM), but the proposed model is only applicable to one hot strip mill. Sikdar and Kumari15) applied structurally simple Artificial Neural Networks (ANN) to predict the crown of strips of different widths. Deng et al.16) developed a crown prediction model based on a Deep Neural Network (DNN). Although the proposed model has high prediction accuracy, the sophisticated modeling process and the expensive calculation cost are not well suited to the requirements of industrial control, which must be fast and simple. Therefore, it is important to apply more advanced AI technology to establish a high-precision, high-efficiency, and easy-to-implement crown prediction model.

Tree-based ensemble algorithms have recently been recognized as among the best and most commonly used supervised learning methods. Ensemble algorithms with the decision tree as the base learner inherit the advantages of the decision tree, such as simplicity, high interpretability, and robustness to anomalies, while overcoming its disadvantages of instability and high variance.17) Random Forest (RF) and Gradient Boosting Decision Tree (GBDT) are two of the most representative tree-based ensemble algorithms; their high efficiency and powerful performance make them stand out from other classical algorithms on many complex problems. The RF algorithm has produced better prediction performance than ANN, SVM, Regression Decision Tree, Ridge Regression (RR), or Stepwise Regression (SR) in problems such as aqueous solubility prediction in medicine development,18) mineral distribution prediction in mineral exploration,19) soil organic carbon prediction in environmental science,20,21) and material properties prediction in metallurgical engineering.22) For the GBDT algorithm, two novel efficient implementations have been proposed in recent years, namely eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM). The better performance of XGBoost or LightGBM, compared to ANN, DNN, SVM, RF, k-nearest neighbor (KNN), the autoregressive integrated moving average model (ARIMA) or Naïve Bayes (NB), has been shown in assessing the potential toxicities of pharmaceuticals and industrial chemicals,23) price prediction,24,25) risk prediction in the financial industry,26) global solar radiation prediction for the use of renewable energy,27) and prediction of bioactive molecules and protein–protein interactions in the chemical and biological fields.28,29)

RF, XGBoost, and LightGBM, with their high prediction accuracy, high efficiency, and simple, easy-to-implement nature, are well suited to solving complex industrial problems. However, they are rarely used in the steel rolling industry, especially for the crown prediction of hot-rolled steel strip. Therefore, one contribution of this study is to apply RF, XGBoost, and LightGBM to establish three novel crown prediction models for improving the strip crown control accuracy. Besides, to obtain excellent prediction performance, we pre-process the large amount of collected data and optimize the developed prediction models using Bayesian optimization. Furthermore, the performance (efficiency and accuracy) of the different models is compared to identify the best one for predicting the strip crown, and the feature importance scores are calculated to measure the impact of each input variable on the strip crown.

2. Hot Strip Rolling Technology

2.1. Definition of Strip Crown

As shown in Fig. 2, the cross-section of the strip is conventionally divided into the center zone, edge drop zone, and feather zone according to the characteristics of the thickness distribution.30) The most commonly used measure in practice is the center crown, defined as the difference between the center thickness and the arithmetic average of either the feather thicknesses or the edge drop thicknesses. The former, expressed by Eq. (1), is called the overall center crown; the latter, expressed by Eq. (2), is called the partial center crown.31) In our study, the strip crown is the difference between the center thickness and the average of the thicknesses at 40 mm from each edge.

C_f = h_c - \frac{h_{f'} + h_{f''}}{2}    (1)

C_d = h_c - \frac{h_{d'} + h_{d''}}{2}    (2)
where C_f is the overall center crown, C_d is the partial center crown, h_c is the center thickness of the cross-section, h_{f'} and h_{f''} are the feather thicknesses at distances f′ = f″ = 9.5–25 mm from the drive and operation sides, and h_{d'} and h_{d''} are the edge drop thicknesses at distances d′ = d″ = 50–70 mm from the drive and operation sides.
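For illustration, the following sketch (not taken from the paper) computes the crown value used in this study from a measured thickness cross-profile; the function name, the use of NumPy, and linear interpolation of the profile are assumptions.

```python
# Minimal sketch: center thickness minus the mean of the thicknesses 40 mm from each edge.
import numpy as np

def strip_crown(positions_mm: np.ndarray, thickness_mm: np.ndarray, width_mm: float,
                edge_offset_mm: float = 40.0) -> float:
    """Return the strip crown in micrometers for a measured cross-profile.

    positions_mm: transverse measurement positions across the width (0 .. width_mm), increasing
    thickness_mm: thickness at each position, in mm
    """
    h_center = np.interp(width_mm / 2.0, positions_mm, thickness_mm)
    h_ds = np.interp(edge_offset_mm, positions_mm, thickness_mm)             # drive side
    h_os = np.interp(width_mm - edge_offset_mm, positions_mm, thickness_mm)  # operation side
    return (h_center - 0.5 * (h_ds + h_os)) * 1000.0                         # mm -> um
```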
Fig. 2.

Cross-section of strip. (Online version in color.)

2.2. Hot Strip Rolling Process

The modeling data of this study are collected from a 1780 mm hot steel strip rolling line in China. Figure 3 shows the entire production process. First, the slab is heated to the desired temperature and the oxide scale is removed by the first descaling process. The intermediate slab is then obtained by a multi-pass roughing rolling process. After the irregular head and tail are sheared and the second descaling process is completed, the target strip appearance is generated through the finishing rolling process. Finally, the finish-rolled strip is water-cooled to achieve the required metallurgical properties and coiled by the down coiler. For the entire hot rolling process, the control of the strip crown primarily depends on the close cooperation of the seven finishing mills. As shown in Fig. 4, the crown control actuators of the first four mills (F1–F4) are the bending and shifting of the Continuously Variable Crown (CVC) work rolls, while the crown control actuator of the last three mills (F5–F7) is only the bending of cylindrical work rolls.

Fig. 3.

Hot strip rolling process. (Online version in color.)

Fig. 4.

Structure of four-high strip mill for (a) F1–F4; (b) F5–F7. (Online version in color.)

3. Methodology

3.1. Random Forest

Random Forest (RF), introduced by Breiman,32) combines the ideas of bagging33) and random feature selection.34,35,36) Each random-forest tree is trained independently on a different resampled training set, generated by selecting samples with replacement from the original training set. Since the trees can be built simultaneously, RF can be parallelized to greatly reduce the training time. When splitting each node to grow a tree, the best split is found based on a random subset of the input features rather than all of them. This injection of randomness is intended to further enhance the diversity of the trained trees, so that a reduced variance for RF is achieved by averaging the predictions of all constructed trees. For a training dataset with N samples, D = {(x_1, y_1), ..., (x_i, y_i), ..., (x_N, y_N)}, where each sample consists of P inputs (x_i ∈ ℝ^P) and one output (y_i ∈ ℝ), the formulation of RF can be expressed as

\psi(x) = \frac{1}{N_{RF}} \sum_{e=1}^{N_{RF}} T_e(x)    (3)
where T_e(x) is a random-forest tree and N_RF is the number of trees. Each tree is grown by repeating the following steps at each split node until the limits set by the tree parameters are reached.

(a) Randomly select Q features from all features of the resampled training set, where Q ≤ max_RF_feature (max_RF_feature is the maximum number of features involved in split finding).

(b) Pick the best feature and split point among the Q candidates.

(c) Split the node into two child nodes.
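As a minimal sketch of Eq. (3) and steps (a)–(c), an RF regressor could be configured as follows, assuming a scikit-learn implementation; the library choice and the parameter values are illustrative and not taken from the paper.

```python
# Minimal sketch of the RF regressor described above (illustrative parameter values).
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=200,    # N_RF: number of trees T_e(x), averaged as in Eq. (3)
    max_features=0.3,    # step (a): random fraction of the P features tried at each split
    bootstrap=True,      # each tree fitted to a resampled (with replacement) training set
    n_jobs=-1,           # trees are independent, so they can be grown in parallel
    random_state=0,
)
# rf.fit(X_train, y_train)           # X_train: N x P inputs, y_train: strip crown
# crown_pred = rf.predict(X_test)    # average of the N_RF tree predictions
```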

3.2. eXtreme Gradient Boosting

Gradient Boosting Decision Tree (GBDT) is a widely used algorithm in which each tree is built along the negative gradient direction of the loss function of the previous model in an additive way.37) Although the prediction accuracy of GBDT is high, the training cost is always expensive. eXtreme Gradient Boosting (XGBoost) is a highly efficient implementation of GBDT.38,39) XGBoost is likewise based on the idea of combining a set of weak learners into a strong ensemble model in an additive way; the general function for XGBoost using N_XGBoost regression trees to predict the output is as follows:

y_i^* = \Phi(x_i) = \sum_{e=1}^{N_{XGBoost}} f_e(x_i), \quad f_e \in \Gamma    (4)
where Γ is the space containing all possible regression trees and f_e(x) = ω_{q(x)}, with q: ℝ^P → {1, 2, ..., U} and ω ∈ ℝ^U. Here q is a function (i.e., the structure of a regression tree) that assigns an example to the corresponding leaf, U is the number of leaves in the tree, and ω is the leaf weight vector. The tree structure q and the leaf weights ω are independent for each f_e. Compared to GBDT, XGBoost is centered on using Newton boosting instead of gradient boosting, and it minimizes the following regularized objective:
\kappa(\Phi) = \sum_i l(y_i, y_i^*) + \sum_e \Omega(f_e)    (5)
where l is the loss function that measures the deviation between the prediction y_i^* and the target y_i, and Ω is the regularization term, which does not exist in the GBDT objective function and is used to penalize the complexity of each tree:
\Omega(f) = \gamma U + \frac{1}{2}\lambda \lVert \omega \rVert^2    (6)
where γ and λ are penalty coefficients. As the values of these two parameters increase, the model becomes more conservative and less prone to overfitting. A second-order approximation is used to optimize the objective quickly. The detailed derivation can be found in Chen and Guestrin.40)

Besides, to speed up the decision tree growing process, XGBoost sorts the features only once before training and stores them in a column block structure for reuse in subsequent iterations. The column block structure not only reduces the cost of sorting the features but also allows XGBoost to parallelize split finding. Therefore, XGBoost shows a significant improvement in learning speed compared to GBDT.
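A minimal sketch of an XGBoost regressor corresponding to Eqs. (4)–(6) is shown below, assuming the xgboost Python package; the parameter values are illustrative placeholders, not the tuned values reported later.

```python
# Minimal sketch of the regularized boosting model in Eqs. (4)-(6).
from xgboost import XGBRegressor

xgb = XGBRegressor(
    n_estimators=500,       # N_XGBoost: number of additive regression trees f_e
    learning_rate=0.05,     # shrinks the contribution of each new tree
    max_depth=6,
    min_child_weight=10,    # minimum sum of instance weight needed in a child node
    gamma=1.0,              # gamma in Eq. (6): penalty per leaf U
    reg_lambda=1.0,         # lambda in Eq. (6): L2 penalty on the leaf weights omega
    subsample=0.8,          # subsample ratio of the training instances
    colsample_bytree=0.8,   # subsample ratio of features
    tree_method="exact",    # pre-sort / column-block based split finding
    objective="reg:squarederror",
)
# xgb.fit(X_train, y_train); crown_pred = xgb.predict(X_test)
```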

3.3. Light Gradient Boosting Machine

Light gradient boosting machine (LightGBM), similar to XGBoost, is another highly efficient implementation of GBDT. LightGBM, based on the ideas of Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), is designed to overcome GBDT’s low efficiency and poor scalability when dealing with data of large size or high dimension.41) GOSS keeps data instances with large gradients (i.e., instances that contribute more to the information gain) and randomly drops instances with small gradients. Although GOSS reduces the number of instances used for training, the distribution of the data is not changed, so training is accelerated without compromising accuracy. EFB exploits the sparsity of the feature space of high-dimensional data: mutually exclusive features (i.e., features that never take nonzero values simultaneously) are bundled into a single feature to speed up training.

Besides, LightGBM uses a histogram-based algorithm instead of the pre-sort-based algorithm (the default algorithm of XGBoost) for decision tree learning.42) The histogram-based algorithm first maps continuous feature values into discrete buckets to form bins and then uses these bins to construct the histogram, so that the cost of calculating the gain for split finding is reduced. Meanwhile, the histograms of the leaf nodes are generated using histogram subtraction, which further improves the training efficiency.

The decision tree growth strategy of LightGBM is leaf-wise with a depth restriction, which also differs from the level-wise strategy used by most GBDTs.43) Level-wise splits all leaf nodes in each layer; since leaf nodes with low gain do not need to be split, this indiscriminate splitting leads to expensive computing costs. Leaf-wise, by contrast, chooses the leaf node with the largest gain in each layer for splitting. Compared to level-wise, leaf-wise makes the algorithm more efficient and accurate, but it also introduces a risk of overfitting. LightGBM therefore adds a maximum depth limit to leaf-wise growth to prevent overfitting.
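A minimal sketch of a LightGBM regressor as described above, assuming the lightgbm Python package; GOSS, EFB and histogram-based split finding are handled internally by the library, and the parameter values shown are illustrative placeholders.

```python
# Minimal sketch of a leaf-wise LightGBM regressor with a depth limit.
from lightgbm import LGBMRegressor

lgbm = LGBMRegressor(
    n_estimators=1000,     # number of boosting trees
    learning_rate=0.05,
    num_leaves=50,         # leaf-wise growth: maximum number of leaves per tree
    max_depth=15,          # depth limit added to leaf-wise growth to prevent overfitting
    min_child_samples=20,  # minimum number of samples at a leaf node
    subsample=0.8,         # bagging_fraction: subsample ratio of the training instances
    subsample_freq=5,      # bagging_freq: how often bagging is performed
    colsample_bytree=0.5,  # feature_fraction: subsample ratio of features
)
# lgbm.fit(X_train, y_train); crown_pred = lgbm.predict(X_test)
```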

4. Experimental Investigation

4.1. Data Preparation

6429 sets of process data related to the strip crown, recorded by the Level 2 automation system and saved in the data warehouse, are collected from the 1780 mm hot rolling line. From all the collected data, 72 important variables, described in Table 1, are selected as inputs to the strip crown prediction model. These variables affect the strip crown through their influence on the roll initial crown, roll elastic deformation, roll thermal expansion, roll wear, or strip deformation.

Table 1. List of inputs of the prediction model for strip crown.
Number | Description | Symbol | Unit
1 | The number of each strip in one rolling period | NO. | –
2–7 | Chemical components of strip: Carbon/Silicon/Manganese/Chromium/Nickel/others | C/Si/Mn/Cr/Ni/others | %
8 | Strip width at the exit of the last finishing mill (F7) | B | mm
9 | Strip thickness at the exit of the last finishing mill (F7) | h | mm
10 | Strip crown at the entrance of the first finishing mill (F1) | CEN | μm
11–17 | Reduction rates in each finishing mill (F1–F7) | r1–r7 | %
18–24 | Rolling force in each finishing mill (F1–F7) | Fr1–Fr7 | kN
25–31 | Bending force in each finishing mill (F1–F7) | Fb1–Fb7 | kN
32–38 | Roll shifting position in each finishing mill (F1–F7) | Ps1–Ps7 | mm
39 | Total rolling distance before rolling this strip in this period | Lm | m
40–44 | Total rolling distance between this rolled strip and the previous one, two, four, eight and fifteen rolled strips in this period | Lm1, Lm2, Lm4, Lm8, Lm15 | m
45 | Total rolling time before rolling this strip in this period | Lw | s
46–50 | Total rolling time between this rolled strip and the previous one, two, four, eight and fifteen rolled strips in this period | Lw1, Lw2, Lw4, Lw8, Lw15 | s
51 | Total rest time before rolling this strip in this period | Lr | s
52–56 | Total rest time between this rolled strip and the previous one, two, four, eight and fifteen rolled strips in this period | Lr1, Lr2, Lr4, Lr8, Lr15 | s
57 | Rolling speed at the exit of the last finishing mill (F7) | VEX | m/s
58 | Strip temperature at the exit of the last finishing mill (F7) | TEX | °C
59–65 | Diameter of work roll in each finishing mill (F1–F7) | Dw1–Dw7 | mm
66–72 | Diameter of backup roll in each finishing mill (F1–F7) | Db1–Db7 | mm

Data pre-processing is one of the most critical steps in improving the accuracy of a data-driven model. Therefore, we pre-process the raw strip data before inputting them to the tree-based ensemble algorithms. As shown in Fig. 5, steel coil data with missing values are first removed from the raw dataset, and then the Pauta criterion and Grubbs criterion are applied to remove outliers. The Pauta criterion is expressed as

\left| x_{ij} - \bar{x}_j \right| > 3\sigma, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, P    (7)
where \bar{x}_j and σ are the average value and standard deviation of variable j, respectively, n is the number of samples, and P is the number of variables.
Fig. 5.

Data pre-processing procedure. (Online version in color.)

The Grubbs criterion identifies outliers based on the relationship between the calculated G and the critical value G(n, α). When G is greater than G(n, α), x_{ij} is judged to be an outlier. G(n, α) is obtained from the Grubbs table, and α is the significance level, equal to 0.01 here. G is calculated by

G = \frac{\max \left| x_{ij} - \bar{x}_j \right|}{\sigma}    (8)

After data pre-processing, 4789 sets of strip data are obtained from the original dataset, some of which are shown in Table 2. These strip data are then randomly divided into two parts in a 9:1 ratio for training and testing the models, respectively.
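A minimal sketch of the pre-processing pipeline of Fig. 5 (removal of missing values, the Pauta criterion of Eq. (7), the Grubbs criterion of Eq. (8) with α = 0.01, and a random 9:1 split) is given below. The file name, column names, and the use of pandas, SciPy and scikit-learn are assumptions, and the Grubbs test is applied here in a single pass per variable for brevity.

```python
# Minimal sketch of the data pre-processing procedure (illustrative, not the paper's code).
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.model_selection import train_test_split

def grubbs_critical(n: int, alpha: float = 0.01) -> float:
    """Critical value G(n, alpha) of the two-sided Grubbs test."""
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))

df = pd.read_csv("strip_crown_data.csv")   # hypothetical export of the Level 2 data
df = df.dropna()                           # remove coils with missing values

features = [c for c in df.columns if c != "crown"]
for col in features:                       # Eq. (7): Pauta (3-sigma) criterion
    mean, std = df[col].mean(), df[col].std()
    if std > 0:
        df = df[(df[col] - mean).abs() <= 3 * std]

for col in features:                       # Eq. (8): Grubbs criterion, alpha = 0.01
    mean, std, n = df[col].mean(), df[col].std(), len(df)
    if std > 0 and n > 2:
        g = (df[col] - mean).abs() / std
        df = df[g <= grubbs_critical(n, alpha=0.01)]

X, y = df[features].values, df["crown"].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
```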

Table 2. Partial experimental data after pre-processing.
No. | Crown (μm) | h (mm) | B (mm) | C (%) | Fr1 (kN) | Fb1 (kN) | Ps1 (mm) | Lm (m) | Lw (s) | Lr (s) | Dw1 (mm)
134.561.9612300.1722583.21000.00−69.9835185.4053184801826.00
262.324.5115600.01319153.2569.2459.0120456.3328529406826.34
358.763.9816120.2428042.7800.00100.007572.0313052815847.43
454.054.7215350.1821030.11421.0489.973978.816731804826.34
538.503.5015300.1626352.62300.00100.0029039.4245153085825.06
478727.001.8312030.0824054.6500.0080.0022885.8829901965819.64
478844.383.5213010.6620240.0100.00−99.9816412.7326062023817.96
478924.883.0212660.0318203.51294.46−9.027300.98755723815.32

4.2. Modeling Process

4.2.1. Bayesian Optimization

The Bayesian optimization technique is a popular method for solving black-box optimization problems in machine learning.44,45,46) The optimization process is based on Bayesian inference and the Gaussian process (GP), and attempts to find the global maximum in a minimum number of steps.47) Since Bayesian optimization takes the results of previous iterations into account when selecting the values for the next iteration, it can approach the optimum more efficiently than grid search and random search.48) Therefore, Bayesian optimization is used to tune the parameters of each model in this study; the parameter adjustment procedure is shown in Fig. 6. The objective function of Bayesian optimization in this study is expressed as:

S_k^* = \arg\max_{S_k \in \theta_k} Z(S_k)    (9)
where Z(S_k) is the objective function score of Model k, i.e., the mean negative mean square error (Mean_Neg.MSE) of Model k resulting from 10-fold cross-validation for a certain combination of parameters S_k, and θ_k is the parameter search space for Model k. The target of Bayesian optimization is to find the combination of parameters S_k^* that maximizes the objective function score Z(S_k^*).
Fig. 6.

The parameter adjustment procedure of Bayesian optimization. (Online version in color.)

As shown in Fig. 6, the algorithm is initialized by randomly generating M initialization points (S_k^1, ..., S_k^M) and calculating their corresponding objective function scores (Z(S_k^1), ..., Z(S_k^M)); each iteration then adds one new point until the iteration limit is reached. When searching for a new point S_k^g (M + 1 ≤ g ≤ J), a Gaussian process is first fitted to the previously explored points (S_k^1, ..., S_k^{g−1}) and their objective function scores (Z(S_k^1), ..., Z(S_k^{g−1})) to obtain the posterior probability of the objective function score at any point. Then, the acquisition function is constructed based on this posterior, and the point corresponding to the maximum of the acquisition function is taken as the new search point S_k^g. As the iterations progress, Bayesian optimization finally provides the set of parameters that maximizes Mean_Neg.MSE of Model k.
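A minimal sketch of the search defined by Eq. (9) and Fig. 6 is given below, written with the scikit-optimize package (an assumption; the paper does not name its implementation) and a reduced LightGBM search space for brevity: the objective is the mean negative MSE from 10-fold cross-validation, M random initialization points are drawn, and the EI acquisition function selects each new point.

```python
# Minimal sketch of Gaussian-process Bayesian optimization of a LightGBM model.
from lightgbm import LGBMRegressor
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize
from skopt.space import Integer, Real

# X_train, y_train: pre-processed training data from the earlier sketch.
space = [                                   # theta_k: a (reduced) parameter search space
    Integer(100, 4000, name="n_estimators"),
    Real(1e-3, 0.1, name="learning_rate"),
    Integer(2, 30, name="max_depth"),
    Integer(2, 500, name="num_leaves"),
]

def objective(params):
    n_estimators, learning_rate, max_depth, num_leaves = params
    model = LGBMRegressor(n_estimators=n_estimators, learning_rate=learning_rate,
                          max_depth=max_depth, num_leaves=num_leaves)
    # Z(S_k): mean negative MSE over 10-fold cross-validation; gp_minimize minimizes,
    # so the sign is flipped here and flipped back when reporting.
    score = cross_val_score(model, X_train, y_train, cv=10,
                            scoring="neg_mean_squared_error").mean()
    return -score

result = gp_minimize(objective, space,
                     n_initial_points=15,   # M random initialization points
                     n_calls=30,            # J evaluations in total
                     acq_func="EI",         # Expected Improvement acquisition function
                     random_state=0)
best_params, best_mean_neg_mse = result.x, -result.fun
```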

4.2.2. Establishment of RF, XGBoost, LightGBM Models

As shown in Table 3, the parameters of RF, XGBoost and LightGBM to be optimized can be grouped into three main categories: (1) process impact parameters, optimized for better prediction accuracy, including the number of trees and the learning rate; (2) tree structure parameters, optimized to deal with over-fitting, including the maximum depth and maximum number of leaves of each tree, the minimum number of samples at a leaf node, the minimum sum of instance weight needed in a child node, and the minimum sample size for node splitting; (3) randomness injection parameters, optimized for faster training speed and to deal with over-fitting, including the subsample ratio of the training instances, the subsample ratio of features for node splitting, and the frequency of bagging.

Table 3. The optimization results for RF, XGBoost, LightGBM within the defined parameter boundaries.
Parameters | RF boundaries | RF results | XGBoost boundaries | XGBoost results | LightGBM boundaries | LightGBM results
The number of trees | (10, 400) | 183 | (100, 4000) | 640 | (100, 4000) | 3053
Learning rate | – | – | (0, 0.1) | 0.055 | (0, 0.1) | 0.034
The maximum depth for each tree | (2, 100) | 98 | (2, 30) | 24 | (2, 30) | 15
The minimum sample size for node splitting | (2, 30) | 2 | – | – | – | –
The maximum number of leaves in each tree | – | – | – | – | (1, 500) | 50
Minimum sum of instance weight needed in a child node | – | – | (1, 30) | 28 | – | –
The minimum sample size at a leaf node | (1, 20) | 2 | – | – | (1, 300) | 21
Subsample ratio of the training instances | – | – | (0.5, 1) | 0.55 | (0.5, 1) | 0.56
Subsample ratio of features for node splitting | (0, 1) | 0.34 | (0, 1) | 0.41 | (0, 1) | 0.50
Frequency for bagging | – | – | – | – | (2, 10) | 4

The bagging algorithm is committed to reducing the variance of the overall model as the number of base learners increases, so that the ability to prevent overfitting is enhanced and the accuracy of the overall model is improved; the bias of the overall model is therefore similar to that of the base learner, and the base learner needs to be strong. The boosting algorithm, in contrast, is committed to integrating multiple weak learners into one stronger learner in an additive way to reduce the bias of the overall model. Therefore, compared with XGBoost and LightGBM, which are based on boosting, RF, which is based on bagging, usually has fewer trees but a more complex tree structure. According to this distinctive feature, the optimization boundaries for the number of trees and the tree structure parameters in RF, XGBoost and LightGBM are defined differently for a more efficient optimization process, as shown in Table 3.

The learning rate in XGBoost and LightGBM scales the contribution of each tree; if it is set to a low value, more trees are needed to fit the training set, but the generalization performance is usually better. Hence, the optimization boundaries for the learning rate and the number of trees are set to very small and very large values, respectively, to find the best combinations.

To find the optimal parameters for each model, different parameter combinations in Bayesian optimization were evaluated, including the number of initialization points M, the maximum number of iterations J, and the choice of acquisition function. The commonly used acquisition functions are Expected Improvement (EI), Upper Confidence Bound (UCB), and Probability of Improvement (PI). Figures 7, 8 and 9 show the results of Bayesian optimization for RF, XGBoost and LightGBM, respectively. As the iterative optimization process progresses, the optimal point is finally determined. The optimization results of RF in Fig. 7 show that the best objective function score is −15.86 μm2, discovered using the acquisition function EI after 30 iterations when M = 20 and J = 50. The optimization results of XGBoost in Fig. 8 indicate that the best objective function score is −12.93 μm2, obtained using the acquisition function PI after 21 iterations based on 15 initialization points. The optimization results of LightGBM in Fig. 9 show that the best objective function score is −11.73 μm2, found using the acquisition function EI after 29 iterations when M = 15 and J = 30. The parameter values of RF, XGBoost and LightGBM leading to the best objective function scores are all listed in Table 3 and are taken as the optimal parameters of the respective models. The constructed RF, XGBoost and LightGBM models are compared in Section 5.
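As an illustration, the three models could be rebuilt with the Table 3 optimization results as follows; the mapping of the Table 3 parameters onto scikit-learn-style arguments is an assumption, not code from the paper.

```python
# Minimal sketch of the three tuned models using the Table 3 results (illustrative mapping).
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

rf_best = RandomForestRegressor(n_estimators=183, max_depth=98, min_samples_split=2,
                                min_samples_leaf=2, max_features=0.34, n_jobs=-1)

xgb_best = XGBRegressor(n_estimators=640, learning_rate=0.055, max_depth=24,
                        min_child_weight=28, subsample=0.55, colsample_bytree=0.41)

lgbm_best = LGBMRegressor(n_estimators=3053, learning_rate=0.034, max_depth=15,
                          num_leaves=50, min_child_samples=21, subsample=0.56,
                          subsample_freq=4, colsample_bytree=0.50)

# for model in (rf_best, xgb_best, lgbm_best):
#     model.fit(X_train, y_train)
```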

Fig. 7.

The results of Bayesian optimization in RF for (a) M = 15, J = 30; (b) M = 20, J = 50. (Online version in color.)

Fig. 8.

The results of Bayesian optimization in XGBoost for (a) M = 15, J = 30; (b) M = 20, J = 50. (Online version in color.)

Fig. 9.

The results of Bayesian optimization in LightGBM for (a) M = 15, J = 30; (b) M = 20, J = 50. (Online version in color.)

5. Results and Discussion

5.1. Comparison of Efficiency for RF, XGBoost and LightGBM

The efficiency of the strip crown prediction model is one of the most important indicators of its performance, and efficient models are always preferred. Hence, as shown in Table 4, the average time consumption of RF, XGBoost and LightGBM in the training and testing phases is compared. All computational procedures for constructing the crown prediction models are implemented in JetBrains PyCharm Community Edition and performed on a computer with an Intel(R) Core(TM) i5-7200U CPU @ 2.50 GHz. The Bayesian optimization time of each model listed in Table 4 is obtained by executing 5 iterations based on 3 random initialization points with the acquisition function EI. Although differences in the initialization points may have a slight effect on the time consumed during Bayesian optimization, the trend of the results in Table 4 is obvious: RF and LightGBM are relatively close in Bayesian optimization time, and both are significantly faster than XGBoost owing to the algorithm design mechanisms mentioned above. When fitting the best model from the training set, LightGBM is slightly more time consuming due to the slightly higher complexity of the constructed model, but the difference is small. Therefore, it can be concluded that RF and LightGBM are significantly more efficient than XGBoost in the training phase. The time required by each model to complete the 479 predictions on the testing set is very short, which means that these methods can quickly give prediction results during hot strip rolling and thus meet the requirements of real-time crown control. In summary, the high efficiency of the LightGBM and RF algorithms makes them more compatible with real field applications.

Table 4. Time-consuming of RF, XGBoost, LightGBM at different phases.
Models | Training phase: Bayesian optimization (s) | Training phase: Build models from the training set using optimal model parameters (s) | Testing phase: Prediction from the testing set using best-developed models (s)
RF | 776 | 7.2 | 0.030
XGBoost | 3724 | 8.2 | 0.015
LightGBM | 415 | 16.4 | 0.12
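A minimal sketch (not from the paper) of how the per-phase times in Table 4 can be measured for one of the tuned models is shown below; the model and data variables come from the earlier sketches.

```python
# Minimal sketch: timing the training and testing phases of one model.
import time

t0 = time.perf_counter()
lgbm_best.fit(X_train, y_train)          # build the model from the training set
t_train = time.perf_counter() - t0

t0 = time.perf_counter()
crown_pred = lgbm_best.predict(X_test)   # the 479 predictions on the testing set
t_test = time.perf_counter() - t0
print(f"training: {t_train:.1f} s, testing: {t_test:.3f} s")
```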

5.2. Comparison of Prediction Accuracy for RF, XGBoost, and LightGBM

Coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are adopted as the performance evaluation criteria for the RF, XGBoost and LightGBM models, and are expressed as

R^2 = 1 - \frac{\sum_{i=1}^{m} (y_i - y_i^*)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2}    (10)

\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i^* - y_i)^2}    (11)

\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| y_i^* - y_i \right|    (12)

\mathrm{MAPE} = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{y_i^* - y_i}{y_i} \right| \times 100\%    (13)
where m is the number of test samples, \bar{y} is the average value of the measured strip crown, and y_i^* and y_i are the predicted and measured strip crown, respectively.
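A minimal sketch of Eqs. (10)–(13) evaluated on the test set, written here in plain NumPy for illustration.

```python
# Minimal sketch of the four evaluation criteria (y: measured crown, y_pred: predicted crown).
import numpy as np

def evaluate(y, y_pred):
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    r2 = 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)   # Eq. (10)
    rmse = np.sqrt(np.mean((y_pred - y) ** 2))                           # Eq. (11)
    mae = np.mean(np.abs(y_pred - y))                                    # Eq. (12)
    mape = np.mean(np.abs((y_pred - y) / y)) * 100.0                     # Eq. (13)
    return r2, rmse, mae, mape
```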

Figure 10 shows the prediction results of the RF, XGBoost, and LightGBM models for the 479 sets of testing data. The color scale describes predictions with different absolute errors (AE). As can be seen from Fig. 10, the prediction distributions of the proposed models are regular, and few predictions (the green points outside the two lines) have an absolute error greater than 10 μm. For high crown control accuracy, the absolute error should be less than 10 μm. Compared to the RF and XGBoost models, LightGBM produces fewer poor predictions (green points) and achieves the largest R2, with a value of 0.9466; the closer the value of R2 is to 1, the more accurate the model.

Fig. 10.

Prediction results for (a) RF; (b) XGBoost; (c) LightGBM. (Online version in color.)

According to the frequency distribution of the absolute error shown in Fig. 11, the absolute error distributions of the three proposed models all approximately follow a symmetrical normal distribution, with the middle part higher than the sides. This feature is particularly prominent for XGBoost and LightGBM, which means that the vast majority of the predictions of these two models have very small errors and only a very small fraction have relatively large errors. The statistics for the proportion of predictions within different absolute error ranges are listed in Table 5.

Fig. 11.

Frequency distribution of absolute error for each model. (Online version in color.)

Table 5. Absolute error statistics for the predictions of different models.
Models | AE = 0 μm | AE = ±5 μm | AE = ±10 μm | AE > 10 μm
RF | 18.37% | 83.09% | 96.66% | 3.34%
XGBoost | 41.96% | 86.01% | 98.12% | 1.88%
LightGBM | 44.98% | 89.12% | 98.75% | 1.25%
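A minimal sketch (not from the paper) of how the proportions in Table 5 can be tallied from the absolute errors on the test set; y_test and crown_pred come from the earlier sketches.

```python
# Minimal sketch: share of test predictions within different absolute error ranges.
import numpy as np

ae = np.abs(crown_pred - y_test)              # absolute error in micrometers
within_5 = np.mean(ae <= 5.0) * 100.0         # share of predictions within +/-5 um
within_10 = np.mean(ae <= 10.0) * 100.0       # share within +/-10 um
beyond_10 = np.mean(ae > 10.0) * 100.0        # share with AE > 10 um
print(f"|AE| <= 5 um: {within_5:.2f}%, |AE| <= 10 um: {within_10:.2f}%, "
      f"AE > 10 um: {beyond_10:.2f}%")
```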

The results of RMSE, MAE and MAPE in Fig. 12 show that the LightGBM model has the smallest values of RMSE, MAE and MAPE, which are 3.27 μm, 1.95 μm and 4.78%, respectively. Based on the above evaluation of prediction accuracy, it can be concluded that the crown prediction model built with the LightGBM algorithm is more accurate than those built with the RF and XGBoost algorithms. Therefore, the LightGBM algorithm combined with Bayesian optimization, with both high efficiency and high accuracy, is the most recommended method for strip crown prediction.

Fig. 12.

Error comparison for RF, XGBoost and LightGBM. (Online version in color.)

5.3. Feature Importance Based on LightGBM

Feature importance refers to the technique of assigning scores to input variables based on their contribution to the output. To better understand the training data and the proposed model, the feature importance scores of the 72 input variables are calculated based on the constructed LightGBM crown prediction model. As can be seen from Fig. 13, the distribution of the feature importance scores is relatively balanced, which fits well with classic hot-rolling theory, i.e., these variables all have a relatively significant impact on the strip crown, indicating that the choice of input variables is appropriate and the established LightGBM model is reliable. The bending force in stand seven (Fb7) achieves the highest feature importance score; since bending force is the means of correcting crown deviations during rolling, the ranking of the feature importance scores of the bending forces provides a basis for adjusting the bending force to obtain the target strip crown. The feature importance scores of the backup roll diameters (Db1–Db7) are very low, mainly because of inadequate data collection for the backup roll diameters: backup roll wear is relatively minor, so the short data collection cycle of this study produced only small fluctuations of the large backup roll diameters, and the backup roll diameter therefore shows a weak impact on the strip crown. Therefore, in our future work, more adequate and richer data will be collected to further improve the prediction accuracy of the LightGBM model.
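As a brief illustration, the feature importance scores can be read directly from the trained LightGBM model as sketched below; feature_names is assumed to hold the 72 input names from Table 1, and lgbm_best is the tuned model from the earlier sketch.

```python
# Minimal sketch: rank the input variables by their LightGBM feature importance scores.
import pandas as pd

importance = pd.Series(lgbm_best.feature_importances_, index=feature_names)
print(importance.sort_values(ascending=False).head(10))   # e.g. Fb7 ranks highest here
```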

Fig. 13.

Feature importance scores of input variables. (Online version in color.)

6. Conclusions and Future Work

In this paper, three novel strip crown prediction models with excellent performance were proposed, based on RF, XGBoost, and LightGBM respectively, for improving the control accuracy of the strip crown. The successful implementation of the strip crown prediction models benefited from the collection and pre-processing of a large amount of actual hot rolling production data and the application of Bayesian optimization to tune the model parameters. Furthermore, the efficiency and prediction accuracy of the proposed models were compared to identify the best-performing one, and the feature importance scores were calculated to measure the impact of each input variable on the strip crown. The conclusions drawn from this study are as follows:

(1) The LightGBM and RF algorithms were significantly more efficient than the XGBoost algorithm for strip crown prediction. The efficiency advantages of the LightGBM and RF algorithms made them more compatible with real field applications.

(2) Bayesian optimization was an effective way to improve model accuracy. For the models optimized by Bayesian optimization, the LightGBM model had the highest prediction accuracy compared to the RF and XGBoost models, which achieved the smallest RMSE, MAE, MAPE, and the largest R2.

(3) The balanced distribution of the feature importance scores verified that the input variables all had a significant impact on the strip crown, which was in good agreement with classic hot-rolling theory, indicating that the choice of input variables was appropriate and the establishment of the LightGBM model was rational.

The LightGBM algorithm combined with Bayesian optimization is the most recommended method for predicting the strip crown in this study. In our future work, more effort will be placed on data collection and pre-processing for higher crown prediction accuracy.

Acknowledgements

This work was supported by National Key R&D Program of China (2017YFB0304100), National Natural Science Foundation of China (51704067, 51774084, 51634002).

Abbreviations

AI: Artificial intelligence

AE: Absolute error

ANN: Artificial neural networks

ARIMA: Autoregressive integrated moving average model

CVC: Continuously variable crown

DNN: Deep neural network

EFB: Exclusive feature bundling

EI: Expected improvement

GOSS: Gradient-based one-side sampling

GP: Gaussian process

KNN: K‑nearest neighbor

LightGBM: Light gradient boosting machine

MAE: Mean absolute error

MAPE: Mean absolute percentage error

NB: Naïve Bayes

PI: Probability of improvement

RF: Random forest

RMSE: Root mean square error

RR: Ridge regression

SVM: Support vector machine

SR: Stepwise regression

UCB: Upper confidence bound

XGBoost: eXtreme Gradient Boosting

References
 
© 2021 The Iron and Steel Institute of Japan.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs license.
https://creativecommons.org/licenses/by-nc-nd/4.0/