ISIJ International
Online ISSN : 1347-5460
Print ISSN : 0915-1559
ISSN-L : 0915-1559
Instrumentation, Control and System Engineering
An Improved CBR Model Using Time-series Data for Predicting the End-point of a Converter
Mao-qiang Gu, An-jun Xu, Fei Yuan, Xiao-meng He, Zhi-feng Cui

2021, Vol. 61, No. 10, pp. 2564-2570

Abstract

The end-point temperature is one of the key parameters for end-point control in the converter, and accurate prediction of it helps improve the end-point hit rate. According to the data types of the process parameters, an improved CBR model using time-series data (CBR_TM) was proposed to predict the end-point carbon content and temperature in the converter. The case attributes of the model include not only single-value influencing factors, such as the composition and temperature of the hot metal, but also time-series influencing factors, such as lance position and oxygen flow. In case retrieval, the single-value and time-series similarities between cases were calculated with the Euclidean distance and the dynamic time warping algorithm, respectively, and then weighted to obtain a comprehensive similarity. The influence of the weight of the time-series similarity on prediction accuracy was then studied using production data. Finally, the prediction accuracy of the established model was compared with that of models based on SVR and BPNN. The results show that the prediction accuracy of the model first increases and then decreases as the weight of the time-series similarity increases. The accuracy was highest when this weight was 0.4, and the model outperformed the SVR and BPNN models. The established model can meet the requirements of field production.

1. Introduction

Converter steelmaking is a very complicated multi-variable, multi-phase, high-temperature physical and chemical process, characterized by high reaction rates, many influencing factors and complex reactions.1) Converter end-point control is mainly concerned with the end-point carbon content and temperature. Inaccurate end-point control may cause many problems, including increased oxygen content in the molten steel, increased iron loss, extended blowing time and shortened lining life.2) Therefore, increasing the hit rate of converter end-point control helps improve product quality, production rhythm and corporate profits.

At present, converter end-point control models can be divided into static and dynamic control models, and the static control model is the foundation of the dynamic one. According to their modeling principles, static control models can be further classified into mechanism models and data-driven models. The accuracy of mechanism models is rather low, because they rest on idealized assumptions and some of their parameters cannot be obtained under field conditions. With the rapid advance of automation and informatization in steelworks, big data platforms have been established in many plants, so large volumes of production data can now be collected. In this context, data-driven end-point prediction models offer a way to increase the end-point hit rate of converters.

Many methods have been used to predict the end points of steelmaking processes, such as support vector regression, neural networks, decision trees and case-based reasoning (CBR). Gao Chuang et al. proposed static prediction models for converters based on a modified twin support vector machine;3,4,5) Han Min et al. established a static control model for converter steelmaking based on ANFIS and a robust support vector machine;6) He Fei et al. constructed a prediction model for converter end-point phosphorus content based on PCA and a BP neural network;7) Lv Wu et al. proposed an end-point temperature prediction model for LF based on an extreme learning machine;8) Tian Huixin et al. proposed an ensemble extreme learning machine model based on a modified AdaBoost.RT algorithm for predicting the end-point molten steel temperature in LF refining;9) and Han Min et al. established an end-point prediction model for converter steelmaking based on an extreme learning machine evolved by a membrane algorithm.10) Wang Xiaojun et al. proposed end-point temperature prediction models for LF refining based, respectively, on random forest and on an ensemble of regression trees built on bootstrap feature subsets.11,12) He Fei et al. established an end-point prediction model for LF based on CBR.13) Feng Kai et al. proposed an improved CBR model based on mechanism-model similarity to predict the end-point phosphorus content in dephosphorization converters.14) Wang Xinzhe et al. proposed a causality-based CBR model for static control of converter steelmaking.15) Jiang Shenglong et al. proposed a hybrid model based on multiple linear regression and Gaussian process regression for predicting oxygen consumption in the converter.16) Yan Liangtao et al. proposed a model for predicting the end-point carbon content in converter steelmaking based on genetic-algorithm kernel partial least squares regression (GA-KPLSR).17) Cheng Jin et al. established a multi-task learning (MTL) data-driven end-point prediction approach for steelmaking.18) Lv Wu proposed a process modeling method for soft sensing of steel sulphur content during ladle furnace refining.19) Liang Yanrui et al. proposed a two-step case-based reasoning method based on attribute reduction for predicting the end-point phosphorus content.20) Wang Hongbing et al. proposed an integrated CBR model for predicting the end-point temperature of molten steel in AOD.21) Okura Toshinori proposed a high-performance prediction of molten steel temperature in the tundish using a gray-box model.22) Ahmad Iftikhar et al. proposed a model for predicting molten steel temperature in the steelmaking process under uncertainty by integrating a gray-box model with a bootstrap filter.23)

Although the above models predict the end points of steelmaking processes more accurately than mechanism models, they do not fully consider time-series data such as oxygen lance position, oxygen flow and bottom-blowing gas flow, even though these time-series process parameters strongly influence the end-point composition and temperature in the converter. To address this problem, an improved CBR model using time-series data is proposed to predict the end-point carbon content and temperature in the converter.

The remaining sections are organized as follows: Section 2 presents the establishment of the end-point prediction model for the converter, Section 3 presents the experiments and discussion, and Section 4 gives the conclusions.

2. Establishment of the Endpoint Prediction Model in Converter

2.1. The Procedures of Converter Steelmaking Process

The main procedures of a conventional converter process are shown in Fig. 1.24)

Fig. 1. The main procedures of the converter.

(1) In each heat, certain proportions of molten hot metal and scrap were charged into the converter;

(2) The oxygen lance was then lowered to blow oxygen into the molten pool at a certain rate. Meanwhile, auxiliary raw materials (e.g., lime, dolomite and sinter) were added in batches of certain weights, and the oxygen blown into the converter reacted with elements in the hot metal, such as carbon, silicon, manganese and phosphorus, producing slag and furnace gas;

(3) When about 80% of the total oxygen had been blown into the molten pool, a sublance was lowered to measure the carbon content and temperature of the hot metal in the molten pool (i.e., "TSC") and to take samples for testing;

(4) According to “TSC” measurement results, oxygen volume and coolant addition were adjusted in the subsequent blowing process;

(5) When blowing was completed, the sublance was lowered again to measure the carbon content and temperature of the molten steel (i.e., "TSO"), and based on these results a decision was made either to tap or to reblow;

(6) After smelting, the converter was tilted to tap the molten steel into a ladle, and alloys were added during tapping;

(7) After tapping, slag splashing was performed in some cases to protect the converter lining, after which the entire process was complete.

2.2. The Principles of Model

CBR is an important method in the field of artificial intelligence. When a new problem occurs, similar problems that have already been solved, together with their solutions, are retrieved from the case library. By comparing the differences between the new and the previous problems in background and time of occurrence, the previous solutions can be adjusted so that a modified solution can solve the new problem.25) A CBR procedure mainly consists of case description, case retrieval, case reuse, case revision and case retention, of which case retrieval is the key step. A flow chart of the CBR algorithm is shown in Fig. 2.26)

Fig. 2. The process of the CBR model.

2.2.1. Case Description

Case description, also known as case representation, is the basis of case-based reasoning: it expresses a case in a defined form. Generally, a case description comprises a feature description and a solution description. The factors influencing the converter end point are shown in Fig. 3.

Fig. 3. The influencing factors of the converter end point.

The data for the factors influencing the converter end-point composition and temperature fall into the following two types:

(1) Single-valued Data

Single-valued data primarily include information about hot metal (e.g., temperature, weight, carbon content, silicon content, manganese content, phosphorus content and sulphur content), amount of scrap added, amount of auxiliary raw materials added (e.g., lime, dolomite and sinter) and gas consumption (e.g., oxygen and argon).

(2) Time Series Data

Here, time series data consist of oxygen flow, oxygen lance position and bottom-blowing gas flow.

The structure of a case is therefore as shown in Fig. 4: Case = {single-valued dataset, time-series dataset}, where single-valued dataset = {hot metal information (e.g., composition and temperature), auxiliary raw materials (e.g., scrap, lime and dolomite), gas consumption (e.g., oxygen and argon)} and time-series dataset = {oxygen flow, oxygen lance height, argon flow}.
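As an illustration of the case structure in Fig. 4, a case could be held in a small Python container such as the one below. The class name `Case` and all field and key names are hypothetical, and the numbers in the example are made up only to show the shape of the data; they are not production values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Case:
    """One heat in the case library (hypothetical layout mirroring Fig. 4)."""

    # Single-valued attributes: hot-metal composition and temperature, additions,
    # total oxygen and argon consumption (keys are illustrative names).
    single_valued: Dict[str, float]
    # Time-series attributes recorded during blowing, one list per variable.
    time_series: Dict[str, List[float]]
    # Solution part: measured end-point values; None for a new, unsolved case.
    tso_c: Optional[float] = None  # end-point carbon content, %
    tso_t: Optional[float] = None  # end-point temperature, deg C


# Made-up example of a new heat whose end point is still unknown.
new_heat = Case(
    single_valued={"hot_metal_temp": 1300.0, "hot_metal_weight": 275.0},
    time_series={
        "oxygen_flow": [0.0, 450.0, 620.0, 600.0],
        "lance_position": [2850.0, 1800.0, 1500.0, 1400.0],
        "argon_flow": [250.0, 250.0, 280.0, 300.0],
    },
)
```

Keeping the series as plain lists lets heats of different blowing durations store series of different lengths, which is what the DTW-based similarity later has to handle.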

Fig. 4. The analysis of case attributes.

2.2.2. Case Retrieval

Case retrieval finds the same or similar cases in the case library according to the description of the case to be solved. When a case has many feature attributes, however, the case library usually contains no identical case, so a similarity measure is needed to find the most similar cases.

Consistent with the case description, similarity is calculated separately for the single-valued data and the time-series data.

(1) Single-valued Data Similarity

Similarity of single-valued data can be calculated with various measures, such as the Euclidean distance and the grey distance. In this paper, the weighted Euclidean distance is used; the distance between the new case and a case in the case library is given by Eq. (1).

$$d(X,Y)=\sqrt{\sum_{j=1}^{m} w_j\,(x_j-y_j)^2} \tag{1}$$

where m is the number of influencing factors of a case; xj is the jth influencing factor of the new case; yj is the jth influencing factor of a case in the case library; and wj is the weight of the jth influencing factor.

Similarity between the cases can be calculated by the following Eq. (2).   

$$G_{sim}(X,Y)=\frac{1}{1+d(X,Y)} \tag{2}$$
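A minimal sketch of Eqs. (1) and (2) in plain Python follows. It assumes the usual Euclidean form (square root of the weighted sum of squared differences) and attribute values that have already been standardized; the function names are hypothetical.

```python
import math
from typing import Sequence


def weighted_euclidean(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Eq. (1): weighted Euclidean distance between two single-valued attribute vectors."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must all have length m")
    return math.sqrt(sum(wj * (xj - yj) ** 2 for xj, yj, wj in zip(x, y, w)))


def single_value_similarity(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Eq. (2): map the distance into a similarity in (0, 1]; identical cases give 1."""
    return 1.0 / (1.0 + weighted_euclidean(x, y, w))


# Two standardized attributes that differ only in the first one (weight 0.25):
# d = sqrt(0.25 * 1) = 0.5, so G_sim = 1 / 1.5.
print(single_value_similarity([0.0, 1.0], [1.0, 1.0], [0.25, 0.75]))
```

Because G_sim is a monotone decreasing function of d, ranking cases by G_sim is the same as ranking them by distance; the mapping mainly puts the score on the same (0, 1] scale as the time-series similarity so the two can be weighted together.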

(2) Time Series Data Similarity

Similarity of time-series data can be calculated in various ways, which fall roughly into two categories: trajectory-point-based and trajectory-segment-based similarity measures.27) Point-based measures are further divided into those based on global matching and those based on partial matching; global-matching measures include the Euclidean distance, dynamic time warping (DTW) and edit distance with real penalty (ERP). Because the time-series data in the converter have different lengths, DTW, which unlike the point-by-point Euclidean distance can align series of unequal length, is adopted here to calculate their similarity. Consider two time series A = {A1, A2, …, Ai, …, AN} and B = {B1, B2, …, Bj, …, BM}. DTW warps the time axis to obtain the minimum distance between the two series and to determine the optimal matching between their points; the difference between a matched pair Ai and Bj is the distance at that step.

To find the optimal matching, A and B are used to form an N×M matrix d, whose element di,j is the distance between Ai and Bj, as expressed in Eq. (3):

$$d=\begin{bmatrix} d_{1,1} & \cdots & d_{1,M} \\ \vdots & \ddots & \vdots \\ d_{N,1} & \cdots & d_{N,M} \end{bmatrix} \tag{3}$$

Following the idea of dynamic programming, the cumulative distance from the start point (1, 1) to the end point (N, M) of this matrix is obtained with Eq. (4):

$$D_{N,M}=d_{N,M}+\min\{D_{N-1,M},\,D_{N-1,M-1},\,D_{N,M-1}\} \tag{4}$$

where DN,M is the minimum cumulative distance at (N, M): the local distance dN,M plus the smallest cumulative distance among the three neighbouring predecessor cells. Applying the same recursion to every cell, starting from (1, 1), yields DN,M.

The time-series similarity between two time series A = {A1, A2, …, Ai, …, AN} and B = {B1, B2, …, Bj, …, BM} is defined as:

$$D_{sim}(A,B)=\frac{1}{1+D_{N,M}} \tag{5}$$
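The DTW recursion of Eqs. (3) to (5) can be sketched as follows. This is plain, unconstrained DTW (no warping window); the local distance |Ai − Bj| is an assumption, since the text only says "difference", and the function names are hypothetical.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cumulative DTW distance D[N][M] following Eqs. (3) and (4).

    The local distance d[i][j] = |a_i - b_j| is an assumed choice.
    The two series may have different lengths N and M.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    inf = float("inf")
    # Row 0 and column 0 are +inf padding, so the recursion needs no boundary cases;
    # D[0][0] = 0 makes D[1][1] equal to the local distance d[1][1].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[n][m]


def dtw_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Eq. (5): D_sim = 1 / (1 + D_{N,M})."""
    return 1.0 / (1.0 + dtw_distance(a, b))


# A series that merely dwells longer on one value aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The table has N×M cells, so the cost grows with the product of the two series lengths; a production implementation would normally use a vectorized or compiled DTW routine, but the recursion is the same.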

(3) Comprehensive Similarity

The single-valued and time-series similarities are weighted to obtain the comprehensive similarity between cases:

$$S_{sim}=w_{single}\,G_{sim}(X,Y)+w_{time}\,\frac{1}{n}\sum_{i=1}^{n}D_{sim}(A_i,B_i) \tag{6}$$

where n is the number of time-series variables; Ai and Bi are the ith time series of the new case and of the library case, respectively; and wsingle and wtime are the weights of the single-value and time-series similarities, with wsingle + wtime = 1, whose values are determined experimentally.
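Eq. (6) is a convex combination of the two similarity scores. The sketch below takes G_sim and the per-variable D_sim values as already computed inputs; the function name and argument layout are hypothetical.

```python
from typing import Sequence


def comprehensive_similarity(g_sim: float, d_sims: Sequence[float], w_time: float) -> float:
    """Eq. (6): S_sim = w_single * G_sim + w_time * mean of the n time-series similarities."""
    if not 0.0 <= w_time <= 1.0:
        raise ValueError("w_time must lie in [0, 1]")
    if len(d_sims) == 0:
        raise ValueError("need at least one time-series similarity")
    w_single = 1.0 - w_time  # the two weights sum to 1
    return w_single * g_sim + w_time * sum(d_sims) / len(d_sims)


# G_sim = 0.8 and D_sim = 0.5, 0.6, 0.7 for oxygen flow, lance position and argon flow
# (made-up values); with w_time = 0.4: 0.6 * 0.8 + 0.4 * 0.6 = 0.72.
print(comprehensive_similarity(0.8, [0.5, 0.6, 0.7], 0.4))
```

Setting w_time = 0 or 1 reduces the score to the single-value or time-series similarity alone, which correspond to the two extreme settings examined in Section 3.3.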

2.2.3. Case Reuse

After case retrieval, the k-nearest-neighbor (KNN) method is used to obtain the solution from the k most similar cases, as expressed in Eq. (7):

$$T=\frac{\sum_{i=1}^{k}S_iT_i}{\sum_{i=1}^{k}S_i} \tag{7}$$

In the above equation, k denotes the number of reused cases, Si denotes the comprehensive similarity between the new case and the ith most similar case, Ti denotes the solution of the ith most similar case.

3. Experiments and Discussions

3.1. Datasets

To validate the prediction accuracy of the proposed model, 946 records of actual production data from B steelworks were used, divided into a training set of 846 records and a test set of 100 records. Based on the field data, 16 influencing factors were selected: 13 of the single-value type and 3 of the time-series type. The single-value factors are the temperature, weight, and carbon, silicon, manganese and phosphorus contents of the hot metal; the amounts of scrap, lime, dolomite, sinter and heat-supplementing agent added; and the total oxygen and argon consumption. The time-series factors are the oxygen flow, the oxygen lance height and the bottom-blowing argon flow, sampled at 5 s intervals. Statistics of the influencing factors are given in Tables 1 and 2.

Table 1. Statistical results of influencing factors of single-value type.
Influence factorsSymbolsMean.MinimumMaximumStd.
temperature of hot metal/°CX1143710801264.22120.40
Weight of hot metal/tX2297220275.5310.55
W[C]iron/%X34.69783.80014.29250.1442
W[Si]iron/%X40.506220.004730.154410.09165
W[Mn]iron/%X50.272240.008730.160960.02762
W[P]iron/%X60.131640.047300.102610.01130
Weight of Scrap/tX758.924.345.224.453
Amount of Lime/tX816.7811.11310.3601.774
Amount of Dolomite/tX915.1212.3114.0421.009
Amount of Sinter/tX1011.80302.69482.2516
Amount of heat supplementary/tX114.07800.59640.82364
Oxygen consumption/Nm3X12169901170014951631
Argon consumption/Nm3X131231040.7734.29
TSO[C]/%Y10.04430.018480.12810.01546
TSO[T]/°CY216751620171518.2

Table 2. Statistical results of the influencing factors of the time-series type.
Influence factor          Mean    Maximum   Minimum   Min. series length   Max. series length
Oxygen flow (Nm3/min)     462     769       0         12 min               33 min
Lance position (mm)       1411    2850      1197      13 min               34.5 min
Argon flow (Nm3/h)        251     307       122       13 min               34.5 min

where TSO[C] and TSO[T] are the measured end-point carbon content and end-point temperature. The length of a time series in Table 2 is the time interval between its start point and end point.
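To give a sense of scale, the series durations listed in Table 2 (12 to 34.5 min) can be converted to sample counts at the 5 s sampling interval. The sketch assumes one sample per interval (duration divided by interval); whether the start point adds one extra sample depends on the logging system, and the function name is hypothetical.

```python
SAMPLING_INTERVAL_S = 5.0  # sampling interval of the time-series data (Section 3.1)


def samples_in_series(duration_min: float, interval_s: float = SAMPLING_INTERVAL_S) -> int:
    """Approximate number of samples in a series of the given duration (duration / interval)."""
    return round(duration_min * 60.0 / interval_s)


shortest = samples_in_series(12.0)  # shortest series length in Table 2
longest = samples_in_series(34.5)   # longest series length in Table 2
print(shortest, longest)            # 144 414
print("largest DTW matrix:", longest * longest, "cells")  # 171396
```

Series that range from roughly 144 to 414 points cannot be compared point by point, which is why an alignment-based measure such as DTW is used; it also shows that each DTW comparison fills a matrix of up to about 1.7×10^5 cells.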

3.2. Evaluation Metrics

To evaluate the prediction accuracy of the models, three metrics are used: the mean absolute error (MAE), the root mean square error (RMSE) and the end-point hit rate (HitRate), calculated as follows:

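$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i-y_i\right| \tag{8}$$

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} \tag{9}$$

$$\mathrm{HitRate}=\frac{\text{number of cases with }\left|y_i-\hat{y}_i\right|<\mathrm{errorbound}}{n}\times 100\% \tag{10}$$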

where yi and ŷi are the actual and predicted end-point values (carbon content or temperature) of the ith case; n is the number of cases; and errorbound is the allowed error range, which in this paper is 0.02% for the carbon content model and 15°C for the temperature model.
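The three metrics of Eqs. (8) to (10) can be sketched in plain Python as below; the function names and the example numbers are illustrative only. HitRate uses the strict inequality of Eq. (10) and is returned as a percentage.

```python
import math
from typing import Sequence


def _check(y: Sequence[float], y_hat: Sequence[float]) -> None:
    if len(y) != len(y_hat) or len(y) == 0:
        raise ValueError("y and y_hat must be non-empty and of equal length")


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Eq. (8): mean absolute error."""
    _check(y, y_hat)
    return sum(abs(p - a) for a, p in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Eq. (9): root mean square error."""
    _check(y, y_hat)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))


def hit_rate(y: Sequence[float], y_hat: Sequence[float], error_bound: float) -> float:
    """Eq. (10): percentage of cases whose absolute error is below error_bound."""
    _check(y, y_hat)
    hits = sum(1 for a, p in zip(y, y_hat) if abs(a - p) < error_bound)
    return 100.0 * hits / len(y)


# Made-up temperatures (deg C) with the 15 deg C bound: errors are 10, 20 and 5.
actual = [1650.0, 1670.0, 1690.0]
predicted = [1660.0, 1690.0, 1685.0]
print(mae(actual, predicted), rmse(actual, predicted), hit_rate(actual, predicted, 15.0))
# approximately 11.67, 13.23, 66.67
```

RMSE penalizes the single 20°C miss more heavily than MAE does, which is why the two metrics are reported together alongside the hit rate.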

3.3. Results and Analysis

The parameters of CBR_TM were set as follows: the single-valued data were standardized to (−1, 1), their similarity was calculated with the Euclidean distance, the attribute weights were calculated with the entropy weight method, and the number of reused cases was 3. The weights of the single-valued data are listed in Table 3.
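The paper does not spell out its variant of the entropy weight method, so the sketch below shows one common textbook form, as an assumption: each attribute is min-max scaled to [0, 1], turned into proportions p_ij, its entropy e_j = −(1/ln n) Σ p_ij ln p_ij is computed, and the weights are proportional to 1 − e_j. It is not expected to reproduce Table 3 exactly; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def entropy_weights(data: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight method for an n x m matrix (rows = cases, columns = attributes).

    Attributes whose values are spread more unevenly across cases have lower
    entropy and therefore receive larger weights.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two cases")
    m = len(data[0])
    k = 1.0 / math.log(n)
    divergence = []
    for j in range(m):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        # Assumed normalization: min-max scale to [0, 1] so the proportions are non-negative.
        scaled = [(v - lo) / (hi - lo) for v in col] if hi > lo else [0.0] * n
        total = sum(scaled)
        if total == 0.0:
            divergence.append(0.0)  # a constant attribute carries no information
            continue
        probs = [s / total for s in scaled]
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0.0)  # 0*ln(0) taken as 0
        divergence.append(1.0 - entropy)
    d_sum = sum(divergence)
    if d_sum == 0.0:
        raise ValueError("no attribute varies across the cases")
    return [d / d_sum for d in divergence]


# Made-up data: the middle attribute never changes, so it gets zero weight.
print(entropy_weights([[1, 5, 10], [2, 5, 20], [3, 5, 60]]))
```

The method is unsupervised: it looks only at how the attribute values are distributed across cases, not at their relation to the end-point values, which is worth keeping in mind when interpreting the large weight on X11 in Table 3.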

Table 3. The weights of single-valued data.
Factor   X1      X2      X3      X4      X5      X6      X7
Weight   0.0134  0.0119  0.0099  0.0641  0.0116  0.0082  0.0578
Factor   X8      X9      X10     X11     X12     X13
Weight   0.0162  0.0391  0.1653  0.4740  0.0111  0.1174

The dynamic time warping (DTW) algorithm is used to calculate the similarity of the time-series data, which are also standardized to (−1, 1).
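"(−1, 1) standardization" is read here, as an assumption, as min-max scaling into the interval [−1, 1], with the minimum and maximum taken from the training data and reused unchanged for test cases. The function names below are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_range(train_values: Sequence[float]) -> Tuple[float, float]:
    """Minimum and maximum of one variable over the training set."""
    return min(train_values), max(train_values)


def scale_to_minus_one_one(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Min-max scaling to [-1, 1]: x' = 2 (x - lo) / (hi - lo) - 1."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Lance position range from Table 2 (1197 mm to 2850 mm) maps onto [-1, 1].
lo, hi = fit_range([1197.0, 1411.0, 2850.0])
print(scale_to_minus_one_one([1197.0, 2850.0], lo, hi))  # [-1.0, 1.0]
```

Scaling every variable into the same interval keeps variables with large raw magnitudes (oxygen consumption in thousands of Nm3, for example) from dominating the Euclidean and DTW distances purely because of their units.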

The weights wsingle and wtime are important model parameters, and their influence on prediction accuracy was studied in this paper; the settings of wtime are listed in Table 4. When wtime = 0, the model considers only the single-value data; when wtime = 1, it considers only the time-series data.

Table 4. The settings of wtime in the model.
No.   Symbol             wsingle   wtime
1     CBR_TM(1,0)        1         0
2     CBR_TM(0.9,0.1)    0.9       0.1
3     CBR_TM(0.8,0.2)    0.8       0.2
4     CBR_TM(0.7,0.3)    0.7       0.3
5     CBR_TM(0.6,0.4)    0.6       0.4
6     CBR_TM(0.5,0.5)    0.5       0.5
7     CBR_TM(0.4,0.6)    0.4       0.6
8     CBR_TM(0.3,0.7)    0.3       0.7
9     CBR_TM(0.2,0.8)    0.2       0.8
10    CBR_TM(0.1,0.9)    0.1       0.9
11    CBR_TM(0,1)        0         1

The prediction accuracy of the models with different wtime is shown in Figs. 5 and 6.

Fig. 5. Statistical results of evaluation metrics of carbon content prediction models with different wtime.

Fig. 6. Statistical results of evaluation metrics of temperature prediction models with different wtime.

As the figures show, as wtime increases the MAE and RMSE of the models first decrease and then increase, while the HitRate first increases and then decreases; that is, the prediction accuracy of both the carbon content and the temperature models first rises and then falls as wtime increases. The models achieve their highest accuracy at wtime = 0.4. For the carbon content model with wtime = 0.4, the MAE, RMSE and HitRate were 6.034×10−5, 7.032×10−5 and 85%, respectively; compared with the model with wtime = 0, the MAE and RMSE were reduced by 1.048×10−5 and 1.310×10−5, respectively, and the HitRate increased by 9 percentage points. For the temperature model with wtime = 0.4, the MAE, RMSE and HitRate were 8.361°C, 9.687°C and 89%, respectively; compared with the model with wtime = 0, the MAE and RMSE were reduced by 1.51°C and 1.68°C, respectively, and the HitRate increased by 9 percentage points.
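The reported figures also imply the accuracy of the wtime = 0 model and the relative size of the improvement. The short calculation below only adds each reported reduction back to the reported wtime = 0.4 value; no new results are involved. By the same arithmetic, the HitRate at wtime = 0 was 76% for carbon content and 80% for temperature.

```python
from typing import Tuple


def implied_baseline(best: float, reduction: float) -> Tuple[float, float]:
    """Return (value at w_time = 0, relative reduction in %) from reported figures."""
    baseline = best + reduction
    return baseline, 100.0 * reduction / baseline


# (value at w_time = 0.4, reported reduction relative to w_time = 0)
reported = {
    "carbon MAE (x1e-5)": (6.034, 1.048),
    "carbon RMSE (x1e-5)": (7.032, 1.310),
    "temperature MAE (degC)": (8.361, 1.51),
    "temperature RMSE (degC)": (9.687, 1.68),
}
for name, (best, reduction) in reported.items():
    baseline, pct = implied_baseline(best, reduction)
    print(f"{name}: w_time=0 -> {baseline:.3f}, reduction {pct:.1f}%")
```

The printed reductions are 14.8%, 15.7%, 15.3% and 14.8%, so adding the time-series similarity lowers both error metrics by roughly 15% for both prediction targets.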

This shows that using both single-value and time-series data in the prediction model helps improve its accuracy. However, neither weight should be too high; otherwise the influence of the other data type on the end point is effectively ignored and the prediction accuracy decreases.

To further verify the accuracy of the model, prediction models based on support vector regression (SVR) and a back-propagation neural network (BPNN) were also established using the same single-value data.

The SVR model was built with the SVR algorithm of the Python machine learning toolkit scikit-learn, using a polynomial (poly) kernel of degree three.

The BPNN model was built with the Python deep learning toolkit TensorFlow. The network had four layers: an input layer with 13 nodes, two hidden layers with 8 and 5 nodes, and an output layer with 1 node; the activation function was ReLU. The evaluation metrics of the different models are compared in Figs. 7 and 8.

Fig. 7. Comparison of evaluation metrics of different carbon content prediction models.

Fig. 8. Comparison of evaluation metrics of different temperature prediction models.

As the figures show, CBR_TM(0.6,0.4) performed better than CBR_TM(1,0) and the SVR and BPNN models. The model established in this paper meets the field-production requirement of a HitRate of at least 85%.

4. Conclusions

(1) An improved case-based reasoning model using time-series data was established to predict the end point in the converter. Its input variables include not only single-value data, such as the composition and temperature of the hot metal, but also time-series data, such as lance position and oxygen flow, which makes the case attributes more comprehensive. A similarity calculation method for time-series data based on the dynamic time warping algorithm was also proposed, which improves the accuracy of case retrieval.

(2) The influence of wtime on the prediction accuracy of the model was studied. The results show that the prediction accuracy of both the carbon content and the temperature models first increases and then decreases as wtime increases, with the highest accuracy at wtime = 0.4. For the carbon content model with wtime = 0.4, the MAE, RMSE and HitRate were 6.034×10−5, 7.032×10−5 and 85%, respectively; compared with the model with wtime = 0, the MAE and RMSE were reduced by 1.048×10−5 and 1.310×10−5, respectively, and the HitRate increased by 9 percentage points. For the temperature model with wtime = 0.4, the MAE, RMSE and HitRate were 8.361°C, 9.687°C and 89%, respectively; compared with the model with wtime = 0, the MAE and RMSE were reduced by 1.51°C and 1.68°C, respectively, and the HitRate increased by 9 percentage points.

(3) The prediction accuracy of the established model (CBR_TM) was further compared with that of models based on SVR and BPNN. CBR_TM had lower MAE and RMSE and a higher HitRate than both, which demonstrates the validity of the model. CBR_TM can meet the requirements of field production.

Acknowledgement

This work is supported by the National Key Technology R&D Program of China (2017YFB0304000&2017YFB0304001).

References
 
© 2021 The Iron and Steel Institute of Japan.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs license.
https://creativecommons.org/licenses/by-nc-nd/4.0/