Predicting Hot-rolled Strip Crown Using a Hybrid Machine Learning Model

Yafeng Ji; Yu Wen; Wen Peng; Jie Sun

doi:10.2355/isijinternational.ISIJINT-2023-203

Abstract

The stability of strip crown is a crucial factor in ensuring the quality of hot-rolled strips. In this study, a hybrid model based on ensemble learning is developed that incorporates four well-established machine learning (ML) models: support vector machine (SVM), Gaussian process regression (GPR), artificial neural network (ANN), and random forest (RF). To enhance the accuracy and interpretability of the resulting crown model, two pretreatment methods, feature selection and cluster analysis, are employed. A feature selection method based on mechanism analysis and the maximum information coefficient (MIC) is used to obtain an optimized feature subset. The K-means clustering algorithm is used to measure data similarity and to group highly similar data points into clusters. The experimental results indicate that all four single ML models predict strip crown well, with coefficients of determination (R²) above 0.96, and that the hybrid model outperforms each single model in prediction accuracy. Moreover, pretreatment increases R² and decreases the root mean square error (RMSE) of every model, so that the hybrid model established after pretreatment achieves the best overall performance. These findings highlight the potential of the proposed approach for improving the accuracy and reliability of ML models in complex industrial environments.

1. Introduction

In the iron and steel production industry, hot rolling is a crucial method for processing steel. By subjecting the heated metal strip to compression between rollers, plastic deformation occurs, resulting in the production of strip products. The crown is a fundamental metric for assessing the quality of strip steel, typically defined as the variation in thickness along the width of the strip. Given that the central region of the strip tends to become thicker, it is imperative to implement measures to mitigate this effect in order to maintain precise control over the strip’s shape. For iron and steel enterprises, severe deviations in strip crown can interrupt the production process, leading to substandard product quality and eventual product rejection.¹⁾ Therefore, it is essential to establish an accurate prediction model for strip crown to ensure the quality and efficiency of the hot rolling process.

The hot rolling of strip steel is a multivariable, strongly coupled, nonlinear, and time-varying process. To predict strip parameters, researchers have developed traditional physical models, finite element models, and machine learning (ML) models. Physical models are based on analysis of the mechanisms of the hot rolling process. Pour et al.²⁾ studied the thermal crown and thermal wear of work rolls and established a prediction model based on numerical methods. Li et al.³⁾ proposed a tangential velocity field and a strain rate field to better predict rolling force, and discussed the influence of rolling conditions on rolling force and related parameters. However, establishing physical models is complicated and tedious, and many simplifications and approximations are inevitable. Finite element models can overcome some of these limitations by simulating the hot rolling process more realistically. Such a model is established by specifying the mesh and the boundary conditions. Hu et al.⁴⁾ used the finite element method to analyze the evolution of roll surface temperature and stress, and compared the results with a mathematical model. Li et al.⁵⁾ analyzed the deformation of the rolls and strip during hot rolling and tempering with an elastic-plastic finite element method, and proposed a local variable crown work roll profile suitable for industrial application. Although the finite element method offers flexibility in predicting strip parameters, it requires considerable modeling experience.

In recent years, artificial intelligence (AI) has attracted the attention of metallurgical researchers. ML models do not require extensive domain knowledge of metallurgy, and they can solve complex rolling problems by mining data collected at industrial sites. For example, Ding et al.⁶⁾ proposed a method for predicting strip bending and roll seam straightness that combines mechanism models with an ML model, demonstrating the applicability of this fusion approach. Wang et al.⁷⁾ presented an artificial neural network (ANN) optimized with a genetic algorithm (GA) for predicting bending force, and determined the optimal network structure. Liu et al.⁸⁾ used GA and particle swarm optimization (PSO) to optimize an extreme learning machine model for rolling force prediction. Xu et al.⁹⁾ developed a convolutional neural network model to predict the mechanical properties of strip steel and conducted a sensitivity analysis on metallurgical phenomena. Sun et al.¹⁰⁾ designed a flatness prediction model for cold-rolled strip based on kernel partial least squares combined with an ANN, and optimized the set values of the flatness actuators. Li et al.¹¹⁾ developed a cold rolling force prediction model by combining a T-S fuzzy neural network with an analytical model. Ji et al.¹²⁾ proposed a hybrid method that combines ML and GA to establish a strip width deviation prediction model, achieving a trade-off between model accuracy and interpretability.

Although ML methods have been used extensively in steel rolling, most studies of strip crown prediction focus on optimizing model parameters to improve prediction performance. Deng et al.¹³⁾ compared three models for strip crown prediction: ANN, ANN optimized with the non-dominated sorting genetic algorithm II (NSGA-II), and a deep neural network (DNN). Wang et al.¹⁴⁾ proposed an SVM model optimized with an enhanced PSO for predicting strip crown. Ensemble learning has gained increasing attention in numerous fields because of its strong predictive capability.¹⁵⁻¹⁷⁾ Despite its success in many domains, however, it has rarely been applied in the rolling industry. In this paper, we propose several criteria for a practical model: compatibility with physical models, interpretability, accuracy, stability, and generalization. To enhance physical compatibility and interpretability, we propose a feature selection method that is grounded in the physical model and retains fewer feature parameters. A general ML model can learn rules from multidimensional, heterogeneous, and disordered data, but it may still produce a small number of abnormal prediction deviations. In the rolling process, such deviations can degrade the control level of the rolling mill. We therefore propose a hybrid model to mitigate this issue.

The present study addresses the prediction of strip crown with an emphasis on accuracy, stability, and generalization. To this end, a hybrid method that integrates feature selection, cluster analysis, the PSO algorithm, and ensemble learning is proposed. A pretreatment method that combines feature selection and cluster analysis is introduced, and the influence of the numbers of features and clusters on the performance of the prediction model is analyzed. The PSO algorithm is then used to determine the weights of the single models and to establish a hybrid model, and the performance of this model is assessed.

2. Hot Strip Rolling Technology

2.1. Definition of Strip Crown

The difference in thickness between the center of the strip cross-section and a reference point located 40 mm away from the strip edge is commonly referred to as the strip crown.¹⁸⁾ The strip crown can be represented mathematically using Eq. (1).

$$C_{40} = h_c - \frac{h_{40} + h'_{40}}{2} \tag{1}$$

where h_c denotes the strip center thickness, and h₄₀ and h′₄₀ denote the thicknesses at the edge reference points on the two sides of the strip.
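To make the definition concrete, the following is a minimal Python sketch of Eq. (1). It assumes that the three thickness readings are already available in the same unit (μm here). The function name `strip_crown_c40` and the sample readings are illustrative and are not taken from the mill data.

```python
def strip_crown_c40(h_c: float, h_40_left: float, h_40_right: float) -> float:
    """Strip crown C40 per Eq. (1): centre thickness minus the mean of the two
    edge reference thicknesses measured 40 mm from each strip edge."""
    return h_c - (h_40_left + h_40_right) / 2.0


# Hypothetical readings in micrometres: centre 2540, edge points 2505 and 2501.
print(strip_crown_c40(2540.0, 2505.0, 2501.0))  # 37.0
```

A positive C40 means the centre is thicker than the edges, which is the usual case described in the introduction.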

2.2. Hot Rolling Process and Traditional Physical Model

Figure 1 depicts a 1580 mm hot strip production line, in which several complex processes must be coordinated to roll the strip successfully. Cross-coupling between the mill stands further complicates production. In addition, the rolling process is highly sensitive to changes in process parameters. Even slight abnormalities can cause significant errors that may accumulate over time. The production of hot-rolled strip therefore requires close attention to detail and precise control. The following equation is an engineering model for calculating strip crown.¹³⁾

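$$\begin{aligned} C(i) ={}& C(i-1)\,C_\eta(i)\,K_\eta(i) + P(i)\,C_P(i)\,K_P(i) + F(i)\,C_F(i)\,K_F(i) + B(i)\,C_B(i)\,K_B(i) \\ &+ \left[C_{hR}(i) + C_{hW}(i) + C_{hT}(i)\right] C_{hC}(i)\,K_{hC}(i) + C_\delta(i)\,C_{h\delta}(i)\,K_\delta(i) + C_\varepsilon(i), \qquad i = 1,\dots,7 \end{aligned} \tag{2}$$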

where i is the stand number (F1–F7); C(i) is the strip crown at the exit of stand i, so C(i−1) is the crown entering that stand; P(i) is the rolling force; F(i) is the bending force; B(i) is the strip width; C_hR(i), C_hW(i), and C_hT(i) are the basic crown of the work roll, the thermal expansion crown, and the wear crown, respectively; C_δ(i) is the equivalent crown of roll shifting; and C_ε(i) is the model correction constant. The remaining factors in the equation are the influence coefficients and correction coefficients of the process parameters. For example, C_F(i) and K_F(i) are the influence coefficient and correction coefficient of bending force, respectively. These coefficients depend intricately on critical process parameters such as steel grade, rolling force, and bending force, and determining them is laborious. Our objective is to identify and screen the crucial feature variables in this physical model and to evaluate their impact on strip crown. The hybrid machine learning model built from these variables is then compatible with the physical model.
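The structure of Eq. (2) can be illustrated with a short sketch. It evaluates one stand, then chains the result through the finishing stands, so that the exit crown of stand i becomes the entry crown of stand i+1. All coefficient values are placeholders. The dictionary keys (`C_eta`, `K_P`, and so on) are hypothetical labels for the influence and correction coefficients. In practice these coefficients are calibrated per steel grade and stand, and their values are not given in this paper.

```python
def stand_exit_crown(c_prev, P, F, B, C_hR, C_hW, C_hT, C_delta, coef):
    """Exit crown C(i) of one stand, following the term structure of Eq. (2).

    coef maps hypothetical coefficient names to values for the influence
    (C_*) and correction (K_*) factors of stand i.
    """
    return (c_prev * coef["C_eta"] * coef["K_eta"]                # inherited entry crown
            + P * coef["C_P"] * coef["K_P"]                        # rolling force
            + F * coef["C_F"] * coef["K_F"]                        # bending force
            + B * coef["C_B"] * coef["K_B"]                        # strip width
            + (C_hR + C_hW + C_hT) * coef["C_hC"] * coef["K_hC"]   # roll crowns
            + C_delta * coef["C_hdelta"] * coef["K_delta"]         # roll shifting
            + coef["C_eps"])                                       # correction constant


def finishing_exit_crown(c_entry, stands):
    """Chain Eq. (2) over the stands: C(0) is the entry crown, and each
    stand's exit crown is passed on as the next stand's C(i-1).

    stands: list of dicts with keys P, F, B, C_hR, C_hW, C_hT, C_delta, coef.
    """
    c = c_entry
    for s in stands:
        c = stand_exit_crown(c, **s)
    return c


# Placeholder coefficients: every influence/correction factor set to 1, no correction constant.
unit = {k: 1.0 for k in ["C_eta", "K_eta", "C_P", "K_P", "C_F", "K_F",
                         "C_B", "K_B", "C_hC", "K_hC", "C_hdelta", "K_delta"]}
unit["C_eps"] = 0.0
print(stand_exit_crown(10.0, 1.0, 2.0, 3.0, 1.0, 1.0, 1.0, 4.0, unit))  # 23.0
```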

Fig. 1. Hot-rolled strip production line. (Online version in color.)

3. Establishment of ML Model

In this section, the SVM, GPR, ANN, and RF models are introduced. Previous studies have extensively investigated the suitability of these models for nonlinear problems.¹⁹⁻²²⁾ This paper evaluates their performance on a hot rolling data set.

SVMs are machine learning models that use kernel functions to map low-dimensional data nonlinearly into a high-dimensional space. An SVM constructs a decision hyperplane in this space and maximizes the margin between the boundary hyperplanes on either side of it. In this way it can perform pattern classification or nonlinear regression.¹⁹⁾ In this study, the radial basis function (RBF) is used as the kernel function, as shown in Eq. (3), where σ is the kernel width parameter.

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right) \tag{3}$$
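As an illustration of Eq. (3), the sketch below computes the RBF kernel for a single pair of samples and for a whole data matrix. It is a generic NumPy implementation and is not the authors' code. Note that scikit-learn's `SVR(kernel="rbf")` writes this kernel as exp(−γ‖x_i − x_j‖²). If the SVM is fitted with that library, its `gamma` therefore corresponds to 1/(2σ²).

```python
import numpy as np


def rbf_kernel(x_i, x_j, sigma):
    """Eq. (3): K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = np.sum((np.asarray(x_i, float) - np.asarray(x_j, float)) ** 2)
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))


def rbf_gram(X, sigma):
    """Kernel (Gram) matrix over the rows of X, shape (n_samples, n_features)."""
    X = np.asarray(X, float)
    sq = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 against round-off
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

A small σ makes the kernel very local, so only near-identical rolling conditions count as similar. A large σ smooths the fit.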

GPR is a non-parametric regression model based on Bayesian statistics. It provides probabilistic predictions, so a confidence interval can be computed for each point prediction. GPR adapts the number of model parameters to the information in the training samples and incorporates prior knowledge into the modeling process. Combining this prior with the observed data yields a posterior Gaussian process model.²³⁾
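A minimal NumPy sketch of GPR prediction is given below. It assumes a zero-mean prior with an RBF covariance and fixed, hypothetical hyperparameters (`length_scale`, `signal_var`, `noise_var`). The paper does not state which covariance function was used. The sketch returns a posterior mean and a standard deviation, from which an approximate 95% confidence interval follows as mean ± 1.96 × std.

```python
import numpy as np


def gpr_predict(X_train, y_train, X_test, length_scale=1.0, signal_var=1.0, noise_var=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP with an RBF covariance.

    X_train: (n, d), y_train: (n,), X_test: (m, d). Hyperparameters are fixed here
    for illustration; in practice they would be fitted, e.g. by maximizing the
    marginal likelihood.
    """
    def cov(A, B):
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return signal_var * np.exp(-d2 / (2.0 * length_scale ** 2))

    K = cov(X_train, X_train) + noise_var * np.eye(len(X_train))  # prior + noise
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))      # K^{-1} y
    K_s = cov(X_train, X_test)
    mean = K_s.T @ alpha                                            # posterior mean
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v ** 2, axis=0)                       # posterior variance
    return mean, np.sqrt(np.maximum(var, 0.0))
```

For example, `mean, std = gpr_predict(X, y, X_new)` with `X` of shape (n, d) returns two arrays of length `len(X_new)`. The standard deviation describes the latent function and excludes observation noise. It grows toward `sqrt(signal_var)` far from the training data, which is how GPR signals low confidence.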

The ANN originated from the study of biological neural networks and is well suited to nonlinear problems. ANN regression proceeds in two stages. First, in the forward pass, the inputs propagate through the network with the current weights and biases, and the outputs of the hidden-layer and output-layer nodes are computed in turn. Second, the error backpropagation algorithm adjusts the weights and biases according to the prediction deviation.²⁴⁾
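The two stages can be sketched with a one-hidden-layer network in NumPy. The architecture (16 tanh hidden units and one linear output), the learning rate, and the epoch count are assumptions for illustration. The paper does not report the ANN configuration in this section. Inputs are assumed to be scaled as described in Section 5.1.

```python
import numpy as np


def train_mlp(X, y, hidden=16, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer regression network by full-batch backpropagation.

    Loss is half the mean squared error; tanh hidden units, linear output node.
    Returns a prediction function.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1)); b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        # Stage 1: forward pass with the current weights and biases
        H = np.tanh(X @ W1 + b1)            # hidden-layer outputs
        y_hat = H @ W2 + b2                 # output-layer node
        # Stage 2: propagate the prediction deviation backwards
        err = (y_hat - y) / n               # dLoss/dy_hat for half-MSE
        gW2, gb2 = H.T @ err, err.sum(axis=0)
        dZ = (err @ W2.T) * (1.0 - H ** 2)  # chain rule through tanh'
        gW1, gb1 = X.T @ dZ, dZ.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1      # gradient-descent updates
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xn: (np.tanh(Xn @ W1 + b1) @ W2 + b2).ravel()
```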

RF is an ensemble algorithm that combines multiple decision trees (DTs) using bagging. RF incorporates three key concepts: bootstrap sampling, out-of-bag (OOB) data, and random feature selection.²⁵⁾ Bootstrap sampling draws training samples randomly with replacement, which increases sample diversity. The OOB samples, i.e., those not drawn for a given tree, are used to estimate the model's generalization ability. Random feature selection further increases the diversity of the trees and mitigates the overfitting commonly associated with single decision trees. The RF prediction is calculated by Eq. (4).

$$\psi(x) = \frac{1}{t}\sum_{k=1}^{t} R_k(x) \tag{4}$$

where R_k(x) is the prediction of the kth tree and t is the number of trees.
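The sketch below shows how bootstrap sampling, OOB data, random feature selection, and the averaging of Eq. (4) fit together. To stay short, it uses depth-1 regression trees (stumps) instead of fully grown trees. Each tree therefore makes a single split, so drawing the feature subset once per tree is equivalent to drawing it per split. This is a toy stand-in and is not the RF configuration used in the study.

```python
import numpy as np


def fit_stump(X, y, feats):
    """Best single split (depth-1 regression tree) over the candidate features."""
    best_sse, best = np.inf, None
    for j in feats:
        for t in np.unique(X[:, j])[:-1]:          # keep both sides non-empty
            left = X[:, j] <= t
            sse = y[left].var() * left.sum() + y[~left].var() * (~left).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, t, y[left].mean(), y[~left].mean())
    if best is None:                               # no valid split: constant model
        m = y.mean()
        return lambda Z: np.full(len(Z), m)
    j, t, ml, mr = best
    return lambda Z: np.where(Z[:, j] <= t, ml, mr)


def random_forest(X, y, t=50, max_features=None, seed=0):
    """Bagged stumps; returns the Eq. (4) predictor and the OOB RMSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max_features or max(1, d // 3)
    trees, oob_sum, oob_cnt = [], np.zeros(n), np.zeros(n)
    for _ in range(t):
        idx = rng.integers(0, n, n)                # bootstrap: draw n rows with replacement
        oob = np.setdiff1d(np.arange(n), idx)      # rows this tree never saw
        feats = rng.choice(d, m, replace=False)    # random feature subset
        tree = fit_stump(X[idx], y[idx], feats)
        trees.append(tree)
        oob_sum[oob] += tree(X[oob])
        oob_cnt[oob] += 1
    predict = lambda Z: np.mean([tr(Z) for tr in trees], axis=0)   # Eq. (4)
    seen = oob_cnt > 0
    oob_rmse = float(np.sqrt(np.mean((oob_sum[seen] / oob_cnt[seen] - y[seen]) ** 2)))
    return predict, oob_rmse
```

In practice a library implementation such as scikit-learn's `RandomForestRegressor` (with `oob_score=True`) would be used. The sketch only exposes the mechanics.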

4. Hybrid Prediction Model Based on Ensemble Learning

In view of the strong dependence of ML models on data quality, a pretreatment method based on feature selection and cluster analysis is proposed. Subsequently, a hybrid model is established using the resulting data set. The research process of this paper is illustrated in Fig. 2.

Fig. 2. The research process of this paper. (Online version in color.)

4.1. Pretreatment Methods

Feature selection is a process that involves removing irrelevant and redundant features while retaining those that have a significant impact on output variables. When building a crown prediction model, it is crucial to identify the input features that strongly influence the crown. Based on physical models and empirical knowledge, we can select features related to the crown, but it is not clear which features are significant and which are redundant. Therefore, this section proposes a statistical method to mine the rules between features and extract features with a strong correlation with the crown.

The maximum information coefficient (MIC) was introduced by Reshef et al.²⁶⁾ in 2011 to evaluate nonlinear correlations in high-dimensional data. MIC takes values from 0 to 1, and values closer to 1 indicate a stronger correlation. Each input feature is taken in turn as X, and the strip crown after finishing rolling is taken as Y. The MIC between them is then calculated as follows:

$$\mathrm{MIC}[X;Y] = \max_{a \times b < B} \frac{I[X;Y]}{\log_2 \min(a,b)} \tag{5}$$

where a and b are the numbers of bins into which X and Y are discretized, so that the scatter plot of X against Y is divided into an a×b grid; B is the upper limit on the grid size; and I[X;Y] is the mutual information between X and Y, calculated by Eq. (6).

$$I[X;Y] = \sum_{y \in Y}\sum_{x \in X} p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)} \tag{6}$$

where p(x,y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal probability distributions of X and Y, respectively.
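Computing MIC exactly requires a search over grid partitions, as in the algorithm of Reshef et al. The sketch below is a simplified approximation, assumed here for illustration. It discretizes X and Y into equal-frequency bins, evaluates Eq. (6) on every a×b grid with a×b < B, and keeps the largest normalized value, as in Eq. (5). It uses B = n^0.6, the default grid bound proposed by Reshef et al. Because the partitions are not optimized, the sketch can underestimate the true MIC.

```python
import numpy as np


def mutual_information(xb, yb):
    """Eq. (6) in bits for two integer-coded (discretized) variables."""
    joint = np.zeros((xb.max() + 1, yb.max() + 1))
    np.add.at(joint, (xb, yb), 1.0)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)       # marginal of X, shape (a, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)       # marginal of Y, shape (1, b)
    nz = p_xy > 0                               # 0 * log 0 is taken as 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))


def equal_freq_bins(v, k):
    """Assign each value to one of k equal-frequency (quantile) bins by rank."""
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return (ranks * k) // len(v)


def mic_approx(x, y, alpha=0.6):
    """Approximate Eq. (5): max over a*b < B of I[X;Y] / log2(min(a, b))."""
    x, y = np.asarray(x), np.asarray(y)
    B = max(5, int(len(x) ** alpha))            # grid-size bound B(n) = n**alpha
    best = 0.0
    for a in range(2, B):
        for b in range(2, B):
            if a * b >= B:
                continue
            mi = mutual_information(equal_freq_bins(x, a), equal_freq_bins(y, b))
            best = max(best, mi / np.log2(min(a, b)))
    return best
```

Ranking the 39 candidate features then amounts to sorting them by `mic_approx(feature_column, crown)`. For production use, a dedicated MIC implementation that optimizes the partitions would be preferable.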

The K-means clustering algorithm measures the similarity of data points by their distance and partitions the data into clusters. In this study, K-means is applied to the rolling data after dimensionality reduction by feature selection. Because the data within each cluster are highly similar, an ML model can be trained and tested on each cluster separately. The prediction results of all clusters are then combined to form the final result. Our premise is that more homogeneous training samples carry more useful information for an ML model. The data within each cluster are similar and stable, which improves the quality of the training data.
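A minimal sketch of this cluster-then-predict scheme is given below. It runs Lloyd's K-means on the input features (assumed already selected and scaled), fits one model per cluster, and routes each new sample to the model of its nearest centroid. Clustering on the inputs only is an assumption, made because the crown is unknown at prediction time. `fit` is a hypothetical callback that stands for any of the four base learners.

```python
import numpy as np


def assign(X, C):
    """Index of the nearest centroid (squared Euclidean distance) for each row."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)


def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm with random initial centroids drawn from the data."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign(X, C)
        # Move each centroid to the mean of its members (keep it if the cluster empties)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, assign(X, C)


def fit_per_cluster(X, y, k, fit):
    """Train one model per cluster; route new samples to their nearest centroid.

    fit(X_j, y_j) must return a callable predictor. Assumes every cluster is non-empty.
    """
    C, labels = kmeans(X, k)
    models = [fit(X[labels == j], y[labels == j]) for j in range(k)]

    def predict(Z):
        lab = assign(Z, C)
        out = np.empty(len(Z))
        for j in range(k):
            mask = lab == j
            if mask.any():
                out[mask] = models[j](Z[mask])
        return out  # per-cluster predictions merged into one result

    return predict
```

Each cluster receives only a fraction of the data, so the choice of K trades homogeneity against training-set size. This trade-off is analyzed in Section 5.4.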

4.2. Establishment of Hybrid Model

This study proposes a novel hybrid model based on ensemble learning, which combines multiple weak learners into a stronger and more reliable model. Compared with single-model approaches, a hybrid model can better balance the amount of training data against the size of the hypothesis space. It can also compensate for limitations in the search process of individual models. For example, ANNs are prone to getting stuck in local minima, while SVMs are sensitive to outliers and missing data. In addition, a hybrid model can extend the function space and approximate the target function more accurately.²⁷⁾

Ensemble learning can be divided into homogeneous and heterogeneous ensembles according to the ensemble mode.²⁸⁾ This study focuses on building an effective heterogeneous ensemble from widely trusted base learners. Specifically, four ML models, SVM, GPR, ANN, and RF, are selected as base models because they are broadly applicable and can solve a variety of complex regression problems.

Modeling was carried out on a predefined data set. Four ML models were developed in parallel with their respective parameter settings to generate diverse predictions, denoted SVM (T₁), GPR (T₂), ANN (T₃), and RF (T₄). The final strip crown prediction T is obtained by a weighted combination of these outputs, as described by Eq. (7).

$$T = k_1 T_1 + k_2 T_2 + k_3 T_3 + k_4 T_4 \tag{7}$$

where k₁, k₂, k₃, and k₄ are the weight coefficients of the single models. The framework for constructing a hybrid model through a heterogeneous ensemble is shown in Fig. 3.

Fig. 3. Framework of a hybrid model constructed by a heterogeneous ensemble. (Online version in color.)

Because the weights assigned to the single models strongly affect the ensemble, this study analyzes two weight determination methods. Simple averaging ignores the performance of each single model, so poorly performing models receive the same weight as accurate ones. The final prediction therefore contains considerable randomness, and stable, accurate predictions can be degraded into unstable ones. In contrast, the PSO algorithm²⁹,³⁰⁾ searches the entire solution space dynamically for a weight distribution that improves the prediction results. PSO seeks a potential global optimum by iteratively improving a fitness function. Each particle is described by its velocity and position. A particle swarm is initialized to traverse the space, and the best position found so far by each particle (its individual optimum) is recorded. Through information sharing among the particles, the global optimum of the swarm is obtained. In each iteration, the velocity and position of each particle are updated based on its individual optimum and the current global optimum of the swarm. Figure 4 shows the iterative process of the particles.

Fig. 4. Iterative process of particles. (Online version in color.)

The root mean square error (RMSE) is sensitive to large prediction deviations and is therefore used as the fitness function of the PSO algorithm. As constraints, the weight of each single model is restricted to [0,1], and the weights must sum to 1. The PSO parameters used in this study are listed in Table 1.

Table 1. Parameter selection of PSO algorithm.

Parameter	Description	Value
c₁	Cognitive learning factor	1.5
c₂	Social learning factor	1.5
w	Inertia coefficient	0.8
N	Number of particles	30
s	Maximum number of iterations	100
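The weight search can be sketched as a standard global-best PSO. The sketch uses the Table 1 settings (c₁ = c₂ = 1.5, w = 0.8, 30 particles, 100 iterations) and takes the RMSE of the Eq. (7) combination as the fitness. The paper does not state how the [0,1] and sum-to-one constraints are enforced. This sketch assumes that positions are clipped to [0,1] and normalized to sum to 1 before evaluation. The input `preds` holds the single-model predictions T₁ to T₄ on the data used for weighting, which is also an assumption.

```python
import numpy as np


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def normalize(k):
    """Map a particle position to weights in [0, 1] that sum to 1 (assumed constraint handling)."""
    k = np.clip(k, 0.0, 1.0)
    s = k.sum()
    return k / s if s > 0 else np.full_like(k, 1.0 / len(k))


def pso_weights(preds, y, c1=1.5, c2=1.5, w=0.8, n_particles=30, n_iter=100, seed=0):
    """Global-best PSO for the weights k1..k4 of Eq. (7); defaults follow Table 1.

    preds: (n_samples, n_models) array whose columns are T1..T4.
    Returns the normalized best weights and their RMSE.
    """
    rng = np.random.default_rng(seed)
    m = preds.shape[1]
    fitness = lambda k: rmse(y, preds @ normalize(k))    # Eq. (7) scored by RMSE
    x = rng.random((n_particles, m))                       # positions
    v = np.zeros_like(x)                                   # velocities
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()                     # global best position
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, m))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, 0.0, 1.0)
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f                             # update individual bests
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        g = pbest[pbest_f.argmin()].copy()                 # share the swarm's best
    return normalize(g), float(pbest_f.min())
```

Setting every weight to 0.25 instead recovers the simple-averaging baseline discussed above.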

5. Model Analysis and Discussion

5.1. Data Set

The analysis of the traditional physical model in Section 2.2 shows that the strip crown is closely related to process parameters such as rolling force, bending force, roll shifting, strip width, roll thermal expansion, and roll wear. Rolling force and bending force affect the strip crown by deforming the rolls and altering the shape of the roll gap. The amount of roll shifting may change the bending of the rolls, which in turn affects the strip shape. Furthermore, the work rolls are in direct contact with the strip and deform under compression, which leads to thermal expansion and thermal wear. However, these roll quantities are difficult to measure directly in production. Indirect factors that influence them are therefore considered. These include the strip temperatures, which govern heat conduction and hence roll thermal expansion, and the rolling speed, which affects roll wear. The width and exit thickness of the strip directly reflect variations in strip shape and are also included as candidate features. Through this mechanism analysis, 39 rolling parameters were selected as candidate input features, as listed in Table 2.

Table 2. Description of mechanism characteristic parameters.

Variable	Description	Parameter	Unit
P₁-P₇	The rolling force	Rolling force	kN
F₁-F₇	The bending force	Bending force	kN
V₁-V₇	The rolling speed	Rolling speed	m/s
R₁-R₇	The gap value of the rollers	Rolling gap	mm
S₁-S₇	The shifting value of the rollers	Rolling shifting	mm
B	The rolled strip width	Strip width	mm
h	The thickness of the strip after finishing rolling	Exit thickness	mm
T_in	The temperature of the strip prior to the rolling process	Entrance temperature	°C
T_out	The final temperature of the strip after the rolling process	Exit temperature	°C
C₄₀	The strip crown at the exit of the F7 mill stand	Strip crown	μm

The Pauta criterion (the 3σ rule) is an effective approach for identifying and removing anomalous industrial data. It calculates the standard deviation of the data and sets control limits at three standard deviations from the mean. Data points beyond these limits are regarded as outliers.³¹⁾ A total of 2732 sets of rolling data were collected from seven four-high finishing stands. The data were cleaned with the Pauta criterion, which removed 38 abnormal sets and left 2694 sets of normal data. The data set was then divided into training and test sets at a ratio of 9:1. Each model was trained 10 times, and the best result was retained. The training set was used to train and construct the prediction models, and the test set was used to evaluate their generalization ability.
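The cleaning and splitting steps can be sketched as follows. The paper does not state which columns the 3σ test is applied to or whether the test is repeated. This sketch assumes a single pass over all columns (features and crown) and drops a record if any of its values lies outside mean ± 3σ. The random 9:1 split and its rounding are also assumptions.

```python
import numpy as np


def pauta_filter(data):
    """Drop rows where any column lies more than 3 standard deviations from its mean."""
    mu = data.mean(axis=0)
    sd = data.std(axis=0)
    sd[sd == 0] = 1.0                        # constant columns never flag outliers
    keep = np.all(np.abs(data - mu) <= 3.0 * sd, axis=1)
    return data[keep], keep


def split_9_1(n, seed=0):
    """Random 9:1 train/test split of row indices 0..n-1."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(0.9 * n))
    return idx[:n_train], idx[n_train:]
```

For the 2694 cleaned records, `split_9_1` returns 2425 training and 269 test indices.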

To eliminate the effect of differences in order of magnitude among the features, feature scaling is applied. For the SVM, GPR, and ANN models, features containing only positive values are normalized to the range [0,1] by Eq. (8). Features containing both positive and negative values are normalized to the range [-1,1] by Eq. (9).

$$x_{ti} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \tag{8}$$

$$x_{ti} = \frac{2\,(x_i - x_{\min})}{x_{\max} - x_{\min}} - 1 \tag{9}$$

where x_i is the original value, x_max and x_min are its maximum and minimum values, and x_ti is the normalized value.

Because RF splits on feature thresholds, it is insensitive to the scale of the variables, so no feature scaling is applied to the RF model.
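The sketch below applies Eqs. (8) and (9) feature by feature. It makes two assumptions that the paper does not state. First, the minimum and maximum are taken from the training set and reused for the test set. Second, any feature whose training minimum is negative is treated as a signed feature and mapped to [-1,1].

```python
import numpy as np


def fit_scaler(X_train):
    """Per-feature min/max from the training data, plus a flag for signed features."""
    x_min = X_train.min(axis=0)
    x_max = X_train.max(axis=0)
    signed = x_min < 0                        # has negative values -> Eq. (9)
    return x_min, x_max, signed


def transform(X, scaler):
    """Scale features to [0, 1] (Eq. 8) or to [-1, 1] for signed features (Eq. 9)."""
    x_min, x_max, signed = scaler
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # guard constant features
    unit = (X - x_min) / span                             # Eq. (8)
    return np.where(signed, 2.0 * unit - 1.0, unit)       # Eq. (9) where signed
```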

5.2. Evaluation Indicators

Four evaluation indicators were used to compare the performance of the models, including the coefficient of determination (R²), RMSE, mean absolute error (MAE), and mean absolute percentage error (MAPE). Generally, a higher R² value and lower RMSE, MAE, and MAPE values indicate better model performance. The calculation equations are as follows.

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \tag{10}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \tag{11}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i \right\rvert \tag{12}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \tag{13}$$

where y_i and ŷ_i are the measured and predicted values, respectively, and ȳ is the arithmetic mean of y_i.
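Eqs. (10) to (13) translate directly into NumPy, as sketched below. MAPE is returned in percent, which matches the magnitudes reported in Table 5. The function name `evaluate` is illustrative.

```python
import numpy as np


def evaluate(y, y_hat):
    """R2, RMSE, MAE and MAPE (in %) per Eqs. (10)-(13)."""
    y = np.asarray(y, float)
    y_hat = np.asarray(y_hat, float)
    resid = y - y_hat
    return {
        "R2": float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)),  # Eq. (10)
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),                         # Eq. (11)
        "MAE": float(np.mean(np.abs(resid))),                                # Eq. (12)
        "MAPE": float(100.0 * np.mean(np.abs(resid / y))),                   # Eq. (13)
    }
```

For measured crowns of [40, 50, 30] μm and predictions of [38, 51, 33] μm, this gives R² = 0.93, RMSE ≈ 2.160 μm, MAE = 2.0 μm, and MAPE ≈ 5.667%.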

5.3. Feature Variable Selection

The number of features used for training affects both the accuracy and the speed of the algorithm. Redundant features increase training time and can also degrade model accuracy. We therefore computed the MIC values between the crown and each candidate feature.

As shown in Table 3, the rolling parameters are clearly correlated with the crown, and the ranking agrees to some extent with existing physical understanding. For instance, bending force, roll shifting, roll gap, and the rolling forces of the downstream stands have a notable influence on the strip crown and can be used to regulate strip shape during rolling. Note, however, that the ranking in Table 3 is not unique. More comprehensive and diverse data may improve the agreement between the MIC ranking and physical understanding. The top-ranked parameters are chosen as input features, and an error test is conducted for input dimensions ranging from 9 to 39. Figure 5 shows the accuracy of the single ML models for different numbers of features.

Table 3. Ranking of candidate input features for correlation analysis using the MIC method.

Features	MIC	MIC ranking	Features	MIC	MIC ranking	Features	MIC	MIC ranking
S₂	0.8186	1	P₇	0.5184	14	V₅	0.3932	27
S₅	0.8013	2	R₆	0.4980	15	S₃	0.3932	28
S₆	0.7855	3	P₅	0.4926	16	V₁	0.3742	29
R₁	0.7516	4	R₅	0.4884	17	F₅	0.3649	30
S₁	0.7062	5	F₁	0.4655	18	F₂	0.3503	31
F₄	0.6785	6	F₇	0.4430	19	B	0.3415	32
S₄	0.6571	7	F₃	0.4427	20	P₃	0.3307	33
h	0.6301	8	P₆	0.4242	21	V₂	0.3264	34
R₇	0.6296	9	R₂	0.4126	22	F₆	0.2897	35
S₇	0.6204	10	V₃	0.4080	23	P₁	0.2880	36
R₄	0.5954	11	V₇	0.3995	24	P₂	0.2791	37
R₃	0.5936	12	V₆	0.3972	25	T_out	0.2093	38
P₄	0.5237	13	V₄	0.3954	26	T_in	0.1230	39

Fig. 5. Accuracy of single ML models with different numbers of features (R²). (Online version in color.)

Figure 5 shows that the accuracy (R²) of the four ML models increases gradually as the number of input features increases, and it reaches its highest value at 33 features. Considering overall model performance, the top-ranked 33 features are selected as the final input features.

5.4. Clustering Parameter Selection

The number of clusters K is a crucial factor affecting the accuracy of the ML models. As the number of clusters increases, the data set of each cluster shrinks, which may prevent the ML model from effectively mining information from the data. A balance must therefore be found between the number of clusters and the data set size. In this study, numbers of clusters from 2 to 6 were tested. Figure 6 shows the accuracy of the single ML models for different numbers of clusters, where K = 1 denotes the unclustered case.

Fig. 6. Accuracy of single ML models under different clustering numbers (R²). (Online version in color.)

As Fig. 6 shows, the optimal number of clusters differs among the ML models. The SVM model achieves the highest R² and the best prediction performance with 4 clusters, the GPR model with 3, and the ANN and RF models with 2. The SVM model favors the largest number of clusters because it can mine potential relationships between the key features and the target from small data sets. On a large data set, by contrast, it may be difficult to select appropriate SVM parameters, which leads to poorer predictions. As a non-parametric model, GPR is less prone to overfitting even on small data sets, so it also tolerates a relatively large number of clusters. By contrast, insufficient data can make the ANN model unstable and increase its risk of overfitting. Insufficient data also reduce the complexity of the decision trees in the RF model, which may limit their ability to learn complex patterns in the rolling data. Consequently, both the ANN and RF models favor a smaller number of clusters.

5.5. Performance Evaluation

All results reported in this paper are obtained on the test set after training, in order to assess the generalization ability of the models. In this section, the hybrid model built after pretreatment (Hybrid-2) is compared with the process models. These comprise the models constructed directly without pretreatment (SVM-1, GPR-1, ANN-1, RF-1, and Hybrid-1) and the single models developed after pretreatment (SVM-2, GPR-2, ANN-2, and RF-2).

The key parameters of the four ML models were tuned, and the best results from ten runs were combined using Eq. (7). The weights assigned to each single model are listed in Table 4. In the final Hybrid-2 model, the weights of SVM, GPR, ANN, and RF are 0.1916, 0.2322, 0.1458, and 0.4304, respectively.

Table 4. Single model weights.

Single models	Hybrid-1	Hybrid-2
SVM	0.2365	0.1916
GPR	0.2982	0.2322
ANN	0.1242	0.1458
RF	0.3411	0.4304

Table 5 compares the performance of each model. Among the four single models, RF and GPR give the best predictions, with higher R² and lower RMSE, MAE, and MAPE, which indicates that they predict strip crown more effectively. In both hybrid models, the weights in Table 4 rank RF > GPR > SVM > ANN. This ranking follows the prediction accuracy of the single models, showing that more accurate single models contribute more to the hybrid prediction. In other words, the hybrid model down-weights the less accurate single models.

Table 5. Performance comparison between Hybrid-2 model and process models.

Models	R²	RMSE (μm)	MAE (μm)	MAPE (%)
SVM-1	0.9683	1.7032	1.3587	4.1333
GPR-1	0.9719	1.5979	1.2469	3.7178
ANN-1	0.9654	1.7739	1.3838	4.1618
RF-1	0.9738	1.5464	1.1972	3.6143
Hybrid-1	0.9753	1.5124	1.1742	3.5773
SVM-2	0.9714	1.6150	1.2702	3.8824
GPR-2	0.9736	1.5620	1.2142	3.6396
ANN-2	0.9671	1.7420	1.3437	4.0235
RF-2	0.9749	1.5166	1.1792	3.5866
Hybrid-2	0.9766	1.4905	1.1666	3.5442

Table 5 further shows that the R² values of SVM-1, GPR-1, ANN-1, and RF-1 are lower than those of their counterparts SVM-2, GPR-2, ANN-2, and RF-2, and that the RMSE, MAE, and MAPE of the latter are consistently lower. This supports the efficacy of the pretreatment methods for strip crown prediction, and the comparison between the two hybrid models leads to the same conclusion. Compared with the single models, the Hybrid-2 model performs best, with an R² of 0.9766, an RMSE of 1.4905 μm, an MAE of 1.1666 μm, and a MAPE of 3.5442%. Notably, Hybrid-2 outperforms SVM-2, GPR-2, ANN-2, and RF-2 on every indicator, demonstrating superior predictive ability on the test set.

Figures 7 and 8 visualize the fitting performance of each model, with the absolute error at each point shown on a graded color scale. The two dashed lines represent an absolute error of ±4 μm. All established models fit the test data well, and their error histograms approximate a normal distribution. The absolute errors of the hybrid models lie mostly within 4 μm. As shown in Fig. 7(e), 98.14% of the prediction errors of the Hybrid-1 model are below 4 μm. In Fig. 8(e), 98.51% of the prediction errors of the Hybrid-2 model are within 4 μm. These results indicate that the established hybrid models predict strip crown with high stability.
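The share of test samples inside the ±4 μm band can be computed as shown below. The sketch counts errors of at most 4 μm, because the text uses both "below" and "within". The function name and the tolerance argument are illustrative.

```python
import numpy as np


def share_within(y, y_hat, tol=4.0):
    """Percentage of samples whose absolute prediction error is at most tol (μm)."""
    err = np.abs(np.asarray(y, float) - np.asarray(y_hat, float))
    return 100.0 * float(np.mean(err <= tol))
```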

Fig. 7. Performance of the directly built prediction models. (Online version in color.)

Fig. 8. Performance of the prediction models built after pretreatment. (Online version in color.)

To further validate the Hybrid-2 model, an extreme learning machine (ELM) and a deep learning model, a convolutional neural network (CNN), were used as additional baselines. Figure 9 compares their results with those of Hybrid-2. The ELM model performed comparatively poorly, with R² and RMSE values of 0.9650 and 1.7903, respectively. This may be because ELM performance depends strongly on the number of hidden-layer nodes. The CNN model performed better, with R² and RMSE values of 0.9726 and 1.5839, respectively. However, its accuracy may be limited by the size of the data set.

Fig. 9. Performance comparison of Hybrid-2, ELM and CNN models. (Online version in color.)

In summary, the Hybrid-2 model proposed in this study performs best in strip crown prediction. It is robust and stable, and it avoids strong dependence on the choice of model parameters. It meets the accuracy requirements of production and also generalizes well. In addition, the proposed pretreatment methods address both the selection and the processing of the data set, with features selected by combining the mechanism model with the MIC method. As a result, the model is consistent with the physics of the rolling process, which is an important characteristic for real-world applications.
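The MIC step scores how strongly each candidate feature depends on crown, including nonlinear dependence. The sketch below is only a crude stand-in for the MIC of Reshef et al.²⁶⁾. It searches equal-frequency grids with nx·ny ≤ n^0.6 and returns the largest mutual information normalized by log(min(nx, ny)). The real algorithm instead optimizes the grid partitions. Candidate features are assumed to come from mechanism analysis already, and the feature names and the 0.3 threshold are hypothetical.

```python
import numpy as np

def _equal_freq_bins(v, k):
    # Rank-based equal-frequency binning into k bins
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return np.minimum(ranks * k // len(v), k - 1)

def _normalized_mi(x, y, nx, ny):
    # Mutual information of the nx-by-ny grid, divided by log(min(nx, ny))
    joint = np.zeros((nx, ny))
    np.add.at(joint, (_equal_freq_bins(x, nx), _equal_freq_bins(y, ny)), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / (px * py)[nz]))
    return mi / np.log(min(nx, ny))

def approx_mic(x, y, alpha=0.6):
    """Crude MIC approximation over equal-frequency grids (not the full MIC algorithm)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    budget = max(int(len(x) ** alpha), 4)  # grid-size limit nx * ny <= n**alpha
    scores = [_normalized_mi(x, y, nx, ny)
              for nx in range(2, budget // 2 + 1)
              for ny in range(2, budget // nx + 1)]
    return float(max(scores))

def select_features(X, y, names, threshold=0.3):
    """Rank candidate features by approximate MIC and keep those above a hypothetical threshold."""
    scores = {name: approx_mic(X[:, j], y) for j, name in enumerate(names)}
    kept = [n for n in sorted(scores, key=scores.get, reverse=True) if scores[n] >= threshold]
    return kept, scores
```

In this scheme, MIC only screens features that mechanism analysis has already proposed. It is not a replacement for that physical reasoning.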

6. Conclusions

This paper presents a novel approach for predicting the crown of hot-rolled strip using a hybrid ML model. The proposed model combines feature selection, cluster analysis, the PSO algorithm, and ensemble learning. These components complement one another and together overcome the limitations of a single ML model. The effectiveness and robustness of the approach were evaluated through comparative experiments. The main conclusions are as follows.
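The PSO algorithm is listed among the model's components, but this section does not say what it optimizes, such as model hyperparameters or ensemble weights. Purely as an illustration, the sketch below is a generic global-best particle swarm optimizer that minimizes an arbitrary objective within box bounds. The inertia weight `w = 0.7` and acceleration coefficients `c1 = c2 = 1.5` are assumed common defaults, not the authors' settings. The function name `pso_minimize` is hypothetical.

```python
import numpy as np

def pso_minimize(f, lower, upper, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: minimize f over the box [lower, upper] (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                               # each particle's best position
    pbest_val = np.array([f(p) for p in pos])
    g = pbest[np.argmin(pbest_val)].copy()           # swarm's best position
    g_val = pbest_val.min()
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.min() < g_val:
            g_val = pbest_val.min()
            g = pbest[np.argmin(pbest_val)].copy()
    return g, float(g_val)
```

In a crown model, `f` could be a hypothetical function that trains a base model with the candidate hyperparameters and returns its validation RMSE.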

(1) Tests of the four selected single models showed that all four predict strip crown satisfactorily, with R² values above 0.96. Ranked by prediction performance, the models are RF > GPR > SVM > ANN.

(2) The 33-dimensional feature subset, selected by combining mechanism analysis with MIC-based feature selection, proved to be a more appropriate set of input features for modeling.

(3) Incorporating cluster analysis improved the quality of the modeling data and significantly enhanced the predictive performance of the crown model, particularly when sufficient data are available.

(4) The hybrid model built by heterogeneous ensemble learning combines the advantages of the individual single models. Compared with the other relevant models, the proposed Hybrid-2 model achieved the best evaluation indices and prediction performance.
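Conclusion (4) credits the hybrid model's advantage to a heterogeneous ensemble of the SVM, GPR, ANN, and RF base models. This section does not describe the combination rule, so the sketch below uses a simple, openly substituted one. Each base model receives a weight proportional to the inverse of its validation MSE, and the ensemble prediction is the weighted average. The sketch assumes the base models are already trained, so only their prediction arrays, with shape (n_models, n_samples), are passed in. The function names are hypothetical.

```python
import numpy as np

def inverse_mse_weights(val_preds, y_val):
    """Weights proportional to 1/MSE of each base model on a validation set (sums to 1)."""
    val_preds = np.asarray(val_preds, dtype=float)  # shape (n_models, n_samples)
    mse = np.mean((val_preds - np.asarray(y_val, dtype=float)) ** 2, axis=1)
    inv = 1.0 / mse
    return inv / inv.sum()

def ensemble_predict(test_preds, weights):
    """Weighted average of base-model predictions, shape (n_models, n_samples)."""
    return np.asarray(weights, dtype=float) @ np.asarray(test_preds, dtype=float)
```

The ensemble can beat its best member only when the base models' errors are not fully correlated. This is the usual motivation for combining heterogeneous learners rather than several copies of the same one.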

The Hybrid-2 model developed in this study is suitable for predicting hot-rolled strip crown, offering high prediction accuracy, stability, and generalization. In industrial practice, the predicted crown values can serve as a reference for adjusting mill equipment and optimizing the rolling process, helping ensure that the strip meets the required quality standards. Moreover, because the model is trained offline, prediction after training is fast enough to be feasible in practice. The findings of this research are expected to provide guidance for the production of hot-rolled strip.

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (No. 52005358), the Key R&D Program of Shanxi Province (No. 202102020101011), and the Technological Innovation Talent Team Special Plan of Shanxi Province (No. 202204051002002).

References

1) G. Mücke, P. D. Pütz and F. Gorgels: Flat-Rolled Steel Processes: Advanced Technologies, CRC Press, Boca Raton, (2009), 287. https://doi.org/10.1201/9781420072938-c27
2) H. S. S. Pour, H. K. Beheshti, Y. Alizadeh and M. Poursina: Neural Comput. Appl., 24 (2014), 1123. https://doi.org/10.1007/s00521-012-1322-6
3) S. Li, Z. G. Wang and Y. F. Guo: J. Manuf. Process., 47 (2019), 202. https://doi.org/10.1016/j.jmapro.2019.09.037
4) K. J. Hu, Q. H. Shi, W. Q. Han, F. X. Zhu and J. F. Chen: Materials, 13 (2020), 5054. https://doi.org/10.3390/ma13215054
5) H. Li, G. Y. Zhou, A. R. He, Z. H. Zhang, C. H. Yao, C. Liu, J. W. Zhao, W. G. Li and J. Shao: J. Iron Steel Res. Int., 29 (2022), 1619. https://doi.org/10.1007/s42243-022-00762-y
6) J. G. Ding, Y. H. C. He, L. P. Kong and W. Peng: ISIJ Int., 61 (2021), 2540. https://doi.org/10.2355/isijinternational.ISIJINT-2020-357
7) Z. H. Wang, D. Y. Gong, X. Li, G. T. Li and D. H. Zhang: Int. J. Adv. Manuf. Technol., 93 (2017), 3325. https://doi.org/10.1007/s00170-017-0711-5
8) J. Y. Liu, X. X. Liu and T. L. Ba: Complexity, (2019), 3476521. https://doi.org/10.1155/2019/3476521
9) Z. W. Xu, X. M. Liu and K. Zhang: IEEE Access, 7 (2019), 47068. https://doi.org/10.1109/ACCESS.2019.2909586
10) J. Sun, P. F. Shan, Z. Wei, Y. H. Hu, Q. L. Wang, W. Peng and D. H. Zhang: J. Iron Steel Res. Int., 28 (2021), 563. https://doi.org/10.1007/s42243-020-00505-x
11) J. D. Li, X. C. Wang, Q. Yang, Z. Guo, L. B. Song and X. Mao: Int. J. Adv. Manuf. Technol., 121 (2022), 4087. https://doi.org/10.1007/s00170-022-09567-5
12) Y. J. Ji, S. X. Liu, M. C. Zhou, Z. Y. Zhao, X. W. Guo and L. Qi: Inf. Sci., 589 (2022), 360. https://doi.org/10.1016/j.ins.2021.12.063
13) J. F. Deng, J. Sun, W. Peng, Y. H. Hu and D. H. Zhang: Appl. Soft Comput., 78 (2019), 119. https://doi.org/10.1016/j.asoc.2019.02.030
14) Z. H. Wang, Y. M. Liu, D. Y. Gong and D. H. Zhang: Steel Res. Int., 89 (2018), 1800003. https://doi.org/10.1002/srin.201800003
15) K. Siwek, S. Osowski and R. Szupiluk: Int. J. Appl. Math. Comput. Sci., 19 (2009), 303. https://doi.org/10.2478/v10006-009-0026-2
16) J. Heinermann and O. Kramer: Renew. Energy, 89 (2016), 671. https://doi.org/10.1016/j.renene.2015.11.073
17) D. C. Feng, Z. T. Liu, X. D. Wang, Z. M. Jiang and S. X. Liang: Adv. Eng. Inf., 45 (2020), 101126. https://doi.org/10.1016/j.aei.2020.101126
18) Y. Liu, X. J. Wang, J. Sun, G. M. Liu, H. Y. Li and Y. F. Ji: Steel Res. Int., 94 (2023), 2200447. https://doi.org/10.1002/srin.202200447
19) Q. Li, Q. L. Meng, J. J. Cai, H. Yoshino and A. Mochida: Appl. Energy, 86 (2009), 2249. https://doi.org/10.1016/j.apenergy.2008.11.035
20) A. Zeng, H. Ho and Y. Yu: J. Build. Eng., 28 (2020), 101054. https://doi.org/10.1016/j.jobe.2019.101054
21) S. S. Picart, P. Tandeo, E. Autret and B. Gausset: Remote Sens., 10 (2018), 224. https://doi.org/10.3390/rs10020224
22) Z. Li, D. H. Wen, Y. Ma, Q. Wang, G. Q. Chen, R. Q. Zhang, R. Tang and H. He: J. Iron Steel Res. Int., 25 (2018), 717. https://doi.org/10.1007/s42243-018-0104-5
23) S. Sniekers and A. van der Vaart: Electron. J. Stat., 9 (2015), 2475. https://doi.org/10.1214/15-EJS1078
24) X. Yao: Proc. IEEE, 87 (1999), 1423. https://doi.org/10.1109/5.784219
25) L. Breiman: Mach. Learn., 45 (2001), 5. https://doi.org/10.1023/A:1010933404324
26) D. N. Reshef, Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher and P. C. Sabeti: Science, 334 (2011), 1518. https://doi.org/10.1126/science.1205438
27) T. G. Dietterich: Proceedings of the First International Workshop on Multiple Classifier Systems, Springer, Berlin, (2000), 1. https://doi.org/10.1007/3-540-45014-9_1
28) S. S. Rathore and S. Kumar: Expert Syst. Appl., 82 (2017), 357. https://doi.org/10.1016/j.eswa.2017.04.014
29) D. S. Wang, D. P. Tan and L. Liu: Soft Comput., 22 (2018), 387. https://doi.org/10.1007/s00500-016-2474-6
30) J. G. Ding, Y. H. C. He, L. P. Kong and W. Peng: ISIJ Int., 61 (2021), 2540. https://doi.org/10.2355/isijinternational.ISIJINT-2020-357
31) Y. F. Ji, L. B. Song, H. Yuan, H. Y. Li, W. Peng and J. Sun: Appl. Soft Comput., 146 (2023), 110670. https://doi.org/10.1016/j.asoc.2023.110670
