Evaluation and Prediction of Blast Furnace Status Based on Big Data Platform of Ironmaking and Data Mining

Hongyang Li; Xiangping Bu; Xiaojie Liu; Xin Li; Hongwei Li; Fulong Liu; Qing Lyu

doi:10.2355/isijinternational.ISIJINT-2020-249

Abstract

Big data applications are developing rapidly in the steel industry. Ironmaking is a multi-department joint production process that constantly generates massive amounts of data. A big data platform is required to organize the ironmaking production data efficiently and to utilize it fully. In this work, we build a comprehensive status evaluation and prediction system for the blast furnace (BF) to support the goals of high production, low consumption, high quality and long life of the BF. The evaluation system is based on the big data platform and uses the factor analysis method, which defines and extracts the hidden common factors in the production indexes of the BF from 19 state parameters and calculates a comprehensive BF status index. The prediction system employs an AdaBoost model that can accurately predict the BF status index 3 hours in advance. Evaluation results show that the proposed BF status index is highly consistent with the actual status of the BF in the selected time period. The degree of coincidence between the BF status index in different time periods and the actual situation is also verified by factor analysis. Although the evaluation and prediction system demonstrates high accuracy in the current production environment, it may still need to be calibrated and updated regularly because BF production changes in the long run. The online comprehensive evaluation and prediction system can effectively assist operators in optimizing BF operation and keeping the BF stable.

1. Introduction

The deep integration of big data technology with the manufacturing industry has triggered far-reaching industrial revolutions.¹⁾ With the improvement of manufacturing technology, the volume of generated data has increased exponentially. The value behind these manufacturing data could increase productivity, system sustainability and product quality.²,³,⁴,⁵,⁶,⁷⁾ Industrial big data is the core of smart manufacturing, and the big data platform architecture is the foundation of data application. The reference architecture model of Industry 4.0 (RAMI4.0) in Germany,⁸⁾ the industrial internet reference architecture (IIRA) in the US, the Japanese Industrial Value Chain Reference Architecture (IVRA) and the Made-in-China 2025 architecture all address interoperability issues in different scenarios. Siemens’ MindSphere industrial big data platform developed components in the form of ‘Internet of Things + Big Data Analysis + Management and Control Decision’.⁹⁾ In iron and steel enterprises, big data platforms suitable for enterprise-level operation, maintenance and data mining have been established. The Internet Data Center of Hebei Iron and Steel Group in China was established using micro-module data centers, a cloud computing resource pool, load balancing and unified resource management.¹⁰⁾ BONC established an intelligent interconnected platform for deep mining of ironmaking data by utilizing the existing basic data of the ironmaking system.¹¹⁾ SHOUGANG Group in China established an enterprise-level big data platform, which used big data analysis technology to integrate whole-process data while improving product quality and reducing costs.¹²⁾ Advanced data mining models have also been used to optimize and improve some operations in the ironmaking process.
BF permeability was modeled with a W-PCA-ML-ELM model to measure the smoothness of BF operation.¹³⁾ An SVR relationship model was used to reflect the exact quantitative relationship between load operation and state variables.¹⁴⁾ The MWCHPCA model monitored the ironmaking process and detected abnormalities in the BF.¹⁵⁾ Zhang studied the prediction of hot metal temperature with different deep learning and shallow learning models.¹⁶⁾ To obtain the best prediction results, a large amount of relevant data was collected and analyzed within a reasonable range.¹⁷,¹⁸⁾ In summary, big data platforms have been widely used in industry, and data mining for the ironmaking process has also been well studied. Although there are several successful cases of big data applications, there are still areas in the ironmaking industry that urgently require big data technology. The BF status plays a unique role in the ironmaking process. Quantitative indexes of the BF status and its prediction in advance are of great significance to BF operators.

The purpose of this study is to establish a quantitative evaluation and prediction system for the status of the BF. The core of the system is the big data platform of ironmaking (BDPI). In this work, we define and implement the overall blast furnace status index (OBFSI) via the factor analysis method by analyzing 19 BF state parameters. We also design and implement an AdaBoost-based BF status index prediction model, which can accurately predict the BF status 3 hours in advance from real-time BF operating parameters. We integrate the evaluation and prediction modules on top of the big data platform using the Python programming language, and the system periodically presents the OBFSI and the prediction result through an efficient user interface. The whole system is able to organize the ironmaking data efficiently, fully utilize the value of the production data and effectively guide field operation.

2. The Construction of BDPI

2.1. The Characteristics of the Ironmaking System Data

The long ironmaking process involves joint production by multiple departments, such as the raw material, sintering, coking, pelletizing, blast furnace and quality inspection workshops, and it generates a large amount of production data. Different production procedures, such as material transportation, energy distribution, production operations and quality inspection, generate different types of data. Because of variations in data demand and limitations of existing technologies, data recording methods and contents differ. For example, the parameters of production equipment need to be recorded in real time; the production plan, quality inspection parameters and production log need to be entered regularly; and the overhaul plan, production reports and accident reports need to be entered from time to time as required. The production data of the 2500 m³ BF for vanadium-titanium ore smelting studied in this paper are stored in different types of databases: real-time data are stored in Wonderware, ironmaking management data in SQL Server, material inspection data in Oracle, and various technical reports on stand-alone computers. These earlier storage methods of iron and steel enterprises can only satisfy the requirements of on-the-fly parameter display and simple quick queries during production; they cannot support complex data analysis. With the comprehensive advantages of the BDPI, the various data can be stored efficiently and organized systematically to meet the requirements of data mining and intelligent manufacturing.

2.2. The Framework of BDPI

The framework of the BDPI is established systematically and normatively using the industrial Internet of Things, edge computing, cloud storage and cloud computing. The functional architecture of the BDPI is shown in Fig. 1.

Fig. 1.

Functional architecture of big data platform. (Online version in color.)

The infrastructure as a service (IaaS) layer of the BDPI delivers a full compute stack as an abstract, virtualized construct for the whole life cycle of the ironmaking process. The main function of the platform as a service (PaaS) layer is to build an extensible operating system on the existing IaaS. The core of PaaS is to clean and integrate the original data via data modeling and to provide various data analysis services, such as multidimensional analysis, scenario analysis, statistical analysis and mining analysis. The application layer of the BDPI equips the system with visual analysis capabilities, which can be used to optimize the supply chain, energy, production, equipment abnormality handling, quality and environment. The system supports multiple simultaneous users. It is able to significantly reduce the difficulty of data analysis and improve data utilization in the iron and steel industry.

The platform collects data by periodically pulling from the original data sources, such as Wonderware, SQL Server, Oracle and Office files. The structure of the BDPI is shown in Fig. 2. Based on the types and storage differences of the original files, the platform stores all non-real-time parameters in a MySQL database to preserve the relationships among the data, and saves the real-time data in the time series database InfluxDB. In this work, real-time data are acquired once per second, and non-real-time data are collected whenever new data arrive. For each practical application scenario, the database of the corresponding BDPI application model is built with the required parameters and receives data from the platform database through Kafka. The results of each model are stored in the MySQL database and presented in the platform system through a browser/server (B/S) architecture.

Fig. 2.

Data flow of big data platform for ironmaking. (Online version in color.)

2.3. The Establishment of the Model

Different analytical and data mining models can be built on the platform according to business requirements, and the workflow of each model is basically the same. First, according to the structure of its own database, each model pre-processes the data provided by MySQL and InfluxDB by handling missing values and removing outliers to generate the training data set. Second, data analysis and mining models are established according to the actual requirements. Finally, the data mining results for the ironmaking process are obtained by applying the data to the model, which helps to improve ironmaking production. On the infrastructure of the BDPI, the analysis models are built with SPSS and Python. To address the difficulty of quantifying the actual status of BF production, a model for evaluating the BF status was established. The model builds a comprehensive index of the BF state from various BF evaluation parameters using the factor analysis method, and its degree of coincidence with actual production is analyzed. The model is also equipped with an AdaBoost prediction algorithm, which is able to predict the BF status index 3 hours in advance from real-time production data.

3. The Establishment of OBFSI

3.1. The Principle of the Factor Analysis

Factor analysis reduces the dimension of the data while keeping the loss of the original information to a minimum. Its objective is to obtain a new comprehensive variable that evaluates the overall original variable matrix. Based on the correlation matrix of the original variables, the method studies the internal dependencies among many variables, groups strongly correlated variables into one common factor, and integrates several common factors into a comprehensive variable. It is assumed that there are p original parameters (utilization factor, coke ratio, hot metal composition, etc.) that evaluate the blast furnace production status from different aspects. After a series of data preprocessing steps, the standardized variable matrix X = (X₁, X₂, …, X_p) corresponding to each time point is obtained. Using factor analysis, the evaluation index of the BF status is obtained by combining the common factors hidden in these BF status parameters. The general model for factor analysis is expressed as:

$$x_i = a_{i1} f_1 + a_{i2} f_2 + \cdots + a_{im} f_m + \varepsilon_i \tag{1}$$

where f₁, f₂, ..., f_m (m ≤ p) are the common factors. The linear combination coefficients a_ij are the factor loads, which reflect the degree of correlation between the evaluation variable x_i and the common factor f_j. The term ε_i is the special (unique) factor of x_i, which represents the influences that are not captured by the m common factors.

The matrix form of the factor analysis model is expressed as:

$$X = AF + \varepsilon \tag{2}$$

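$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pm} \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_p \end{bmatrix} \tag{3}$$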

where X is the vector of original (standardized) variables, A is the factor load matrix, F is the common factor vector, and ε is the special factor vector.

The model matrix satisfies the following conditions:

1) E(F) = 0, var(F) = I_m. Each common factor has mean 0 and variance 1, and the common factors are uncorrelated with each other.

2) E(ε) = 0, var(ε) = diag(Φ₁, Φ₂, ..., Φ_p). Each special factor ε_i has mean 0 and variance Φ_i, and the special factors are uncorrelated with each other.

3) cov(F, ε) = 0, that is, cov(f_j, ε_i) = 0 for all i and j. The common factors and the special factors are uncorrelated with each other.
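As a brief worked consequence of Eq. (2) and the three conditions above (standard factor analysis algebra that the text does not spell out), the covariance of the standardized variables splits into a part due to the common factors and a part due to the special factors. The cross terms vanish by condition 3:

```latex
\operatorname{cov}(X)
  = \operatorname{cov}(AF + \varepsilon)
  = A\,\operatorname{var}(F)\,A^{T} + \operatorname{var}(\varepsilon)
  = AA^{T} + \operatorname{diag}(\Phi_1, \Phi_2, \ldots, \Phi_p)
```

On the diagonal this reads var(x_i) = a_i1² + a_i2² + ... + a_im² + Φ_i, so the variance of each variable is the sum of its squared factor loads plus its special variance.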

3.2. The Selection of the Blast Furnace Parameters

The target of BF production is “high production, low consumption, high quality and long life”. The BF state should therefore be evaluated comprehensively from various production perspectives. The selected BF status parameters are shown in Table 1. They represent many aspects of BF production, and the aspects reflected by the individual status indexes differ significantly. Each individual status index reflects the state of BF production only to a certain extent and may not comprehensively characterize the overall trend of the BF smelting process. A few previous works evaluated the overall BF status using single parameters, which may miss key information. In this work, we use factor analysis to extract valuable information from all state parameters and integrate it into the OBFSI.

Table 1. Results of BF parameter selection.

Number	Code	Physical meaning	Number	Code	Physical meaning
1	HIW	theoretical iron weight	11	[Si]	[Si] in hot metal
2	UC	utilization coefficient	12	BV	blast volume
3	coke rate	coke rate	13	TR	thermal road
4	coal rate	coal rate	14	GUR	gas utilization rate
5	HCC	hour coke consumption	15	PI	permeability index
6	HCI	hour coal injection	16	PD	pressure drop in furnace
7	CA	cullet adding	17	O₂	oxygen enrichment
8	HMT	Hot metal temperature	18	TIBV	tons of iron blast volume
9	[V]	[V] in hot metal	19	BKE	blast kinetic energy
10	[Ti]	[Ti] in hot metal	20	[S]	[S] in hot metal

3.3. The Processes of the Data Processing

1) Time alignment. In the BDPI, the update frequency of the BF status index is defined as 1 hour. According to how each parameter is generated, all original parameters are converted to a 1-hour frequency by increasing or decreasing their frequency as appropriate. The production time of iron is uncontrollable, and the weight can only be measured after the hot metal tank is full. As a result, the actual iron quantity in adjacent hours differs greatly, so the hourly theoretical iron weight (HIW) is used instead of the real iron quantity. The daily blast furnace output is defined as the total iron output accumulated in a 24-hour cycle, and the utilization coefficient (UC) is defined as the quotient of the daily iron output and the volume of the BF. Because of the limitations of production conditions and existing detection methods, the hot metal quality parameters (HMT, [Si], [Ti], [V], [S]) are updated as soon as new values are generated and are otherwise filled with the last available value. Because BF production is continuous, the real-time data (BV) and derived parameters (TR, GUR, PI, etc.), which are generated every second, are converted to a 1-hour interval.

2) Parameter range selection. Because a BF has a long life cycle, its evaluation indexes differ between stages of furnace service, and even between periods within the same stage. Therefore, BF production parameters covering a period of 10 months were selected for analysis. Even during normal production, shutdown and maintenance are necessary, and all BF production data during blow-out periods are abnormal. To avoid the unfavorable influence of abnormal parameters on the OBFSI, the blow-out periods were eliminated according to the production log recorded on site and the real production data of the BF.

3) Data preprocessing and standardization. Because of data transmission delays and manual recording errors, the data recorded in actual BF production inevitably contain missing, duplicated and abnormal values. To ensure high data quality, different methods are used to handle the anomalies in different situations. First, for duplicated values, when the proportion of continuous duplication is less than 5%, redundant data repeated three or more times are deleted so that only one value remains for the same time period; a parameter is deleted if its proportion of continuous duplication is greater than 5%. Second, missing values are handled after deduplication. Parameters with a missing rate below 5% are filled using Lagrange interpolation, and parameters with a missing rate above 5% are removed because they cannot support the subsequent data analysis. Third, for outliers, the predictable outliers during blast furnace maintenance are eliminated, and values of parameters that pass the normal distribution test but lie outside the reasonable interval (Q1 − 1.5·IQR, Q3 + 1.5·IQR) of the box plot method are deleted, where Q1 is the lower quartile of the data, Q3 is the upper quartile and IQR = Q3 − Q1. The results of processing the [Si] outliers are shown in Fig. 3. The distribution of [Si] follows a normal distribution, and values outside the interval (0.0725, 0.1925) were deleted. Outliers also need to be removed according to the specific physical meaning of each indicator. For example, the oxygen enrichment rate can be 0% under normal blast furnace conditions, and such values should not be considered abnormal. Finally, Z-score standardization brings all variables to the same mathematical scale. The formula for Z-score standardization is as follows:

$$x' = \frac{x - \bar{x}}{\sigma} \tag{4}$$

where x is the original data, x̄ is the sample mean, and σ is the sample standard deviation.
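The time alignment of step 1) can be illustrated with a minimal sketch that uses only the Python standard library. The function names `hourly_mean` and `forward_fill`, the timestamps and the sample values are hypothetical, and averaging is only one plausible way to bring 1-second data to a 1-hour interval; the platform's actual conversion rule is not stated in the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def hourly_mean(samples):
    """Average (timestamp, value) samples into 1-hour buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


def forward_fill(hours, readings):
    """Carry the last available reading forward over an hourly grid.

    Hours before the first reading stay None.
    """
    filled, last = {}, None
    for hour in hours:
        last = readings.get(hour, last)
        filled[hour] = last
    return filled


start = datetime(2020, 1, 1, 0, 0, 0)  # hypothetical start time

# Hypothetical blast volume (BV) samples: two hours of 1-second data
bv = [(start + timedelta(seconds=s), 4000.0 if s < 3600 else 4100.0)
      for s in range(7200)]
bv_hourly = hourly_mean(bv)

# Hypothetical [Si] assay: one new value in the first hour, none in the second
si = {start: 0.12}
si_hourly = forward_fill(list(bv_hourly), si)
```

Here the second hour of `si_hourly` repeats the first-hour assay, which mirrors the "fill with the last available value" rule described for the hot metal quality parameters.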

Fig. 3.

Results from the processing of [Si] outliers. (Online version in color.)
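The box plot rule, the Lagrange interpolation of missing values and the Z-score standardization of Eq. (4) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names and the [Si] readings are hypothetical, and `statistics.quantiles` uses its default quartile convention, which may differ slightly from the one used by SPSS.

```python
from statistics import mean, quantiles, stdev


def iqr_filter(values):
    """Keep only values inside (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low < v < high]


def lagrange_fill(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def z_score(values):
    """Standardize with the sample mean and sample standard deviation, Eq. (4)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


si = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.50]  # hypothetical [Si] readings
clean = iqr_filter(si)            # the 0.50 reading falls outside the interval
standardized = z_score(clean)

# Fill a hypothetical gap at hour 2 from neighbours at hours 0, 1 and 3
gap_value = lagrange_fill([0, 1, 3], [0.10, 0.11, 0.13], 2)
```

In practice the interpolation would use only a few neighbouring points around each gap, since high-degree Lagrange polynomials can oscillate strongly.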

3.4. The Processes of the Factor Analysis

1) Applicability test of the original variables for factor analysis. An important prerequisite for factor analysis is that the original variables are correlated. To ensure that the analysis results are meaningful, the original data were examined before factor analysis using Pearson correlation analysis, the KMO test and Bartlett's test of sphericity. The Pearson correlation coefficients are shown in Fig. 4.

Fig. 4.

Results of Pearson correlation coefficient. (Online version in color.)

From Fig. 4 we can see that most of the correlation coefficients between 20 parameters are greater than 0.3, which indicates that there is a certain correlation between variables. KMO is used to compare the relationship between simple and partial correlation coefficients among variables, expressed mathematically as follows:

$$\mathrm{KMO} = \frac{\sum_{i}\sum_{j\neq i} r_{ij}^{2}}{\sum_{i}\sum_{j\neq i} r_{ij}^{2} + \sum_{i}\sum_{j\neq i} p_{ij}^{2}} \tag{5}$$

where r_ij is the simple correlation coefficient of the variables x_i and x_j, and p_ij is the partial correlation coefficient of x_i and x_j with the other variables held fixed. Generally, factor analysis requires the KMO value of the original variables to be greater than 0.5. Bartlett's test of sphericity is also used to test the correlation of variables in the correlation matrix. Its null hypothesis is that the correlation matrix is an identity matrix, so a significance (Sig.) value below 0.05 indicates that the variables are correlated, which meets the requirement of factor analysis. The test results of the original data are shown in Table 2. The KMO value of 0.693 and the Sig. value of 0.000 both fall within the acceptable range, so the original data are suitable for factor analysis. The results of the above three tests show that the blast furnace parameters of the selected types and time period can be analyzed by factor analysis.

Table 2. Results of the KMO and Bartlett sphericity test.

Kaiser-Meyer-Olkin	Bartlett’s approximate chi-square	Bartlett’s df	Bartlett’s Sig.
0.693	99292.317	190	0.000
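The two test statistics can be reproduced from a correlation matrix with a short NumPy sketch. It assumes the usual textbook forms: partial correlations taken from the inverse correlation matrix for Eq. (5), and Bartlett's chi-square approximation with p(p − 1)/2 degrees of freedom. The function names and the 2 × 2 example matrix are hypothetical; the paper itself obtained these values from SPSS.

```python
import numpy as np


def kmo(R):
    """Kaiser-Meyer-Olkin measure from a correlation matrix R, as in Eq. (5)."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)           # partial correlations p_ij
    off = ~np.eye(R.shape[0], dtype=bool)     # mask selecting j != i
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and degrees of freedom for n samples."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    return chi2, p * (p - 1) // 2


R2 = np.array([[1.0, 0.6], [0.6, 1.0]])       # hypothetical correlation matrix
kmo_two_vars = kmo(R2)
chi2, df = bartlett_sphericity(R2, n=100)

# With p = 20 variables the degrees of freedom are 20 * 19 / 2
df_20 = 20 * 19 // 2
```

For 20 variables the degrees of freedom come out as 190, which agrees with the df reported in Table 2.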

2) The commonality of the variables. The commonality h_i² of a variable reflects the proportion of the total variance of the original variable x_i that is explained by the m common factors. The mathematical expression of commonality is as follows:

$$h_i^{2} = \sum_{j=1}^{m} a_{ij}^{2} \tag{6}$$

Because the original variables have been standardized to unit variance, the variance of each variable decomposes as follows:

$$1 = \sum_{j=1}^{m} a_{ij}^{2} + \sigma_i^{2} = h_i^{2} + \sigma_i^{2} \tag{7}$$

where σ_i² is the variance of the special factor. The closer h_i² is to 1, the better the common factors reproduce the original variable and the better the effect of factor analysis. The common factors of the original variables were extracted using the principal components, unweighted least squares, generalized least squares, maximum likelihood and image factoring methods provided by SPSS. The results are shown in Table 3. Table 3 shows that the explanatory effect for the same parameter varies among the methods. The principal component method explains more than half of the variance of every variable except hot metal temperature, whereas each of the other methods leaves several variables below 0.5. For example, the commonality of the pressure drop in the furnace is 0.883 by the principal component method, that is, the m common factors explain 88.3% of the total variance of the pressure drop; the other parameters are interpreted in the same way. The commonality of hot metal temperature is the lowest under all the methods, so hot metal temperature is removed. After comparing the common factor extraction methods, we selected the principal component method, for which the commonality of each remaining variable is greater than 0.5.

Table 3. Results of common factors of original variables under different extraction methods.

	principal components	unweighted least square	generalized least squares	maximum likelihood	image factoring
HIW	0.591	0.499	0.659	0.558	0.484
UC	0.812	0.776	0.852	0.844	0.654
coke rate	0.716	0.649	0.667	0.585	0.561
coal rate	0.853	0.999	0.999	0.999	0.763
HCC	0.695	0.45	0.606	0.487	0.408
HCI	0.862	0.854	0.882	0.866	0.824
CA	0.818	0.242	0.343	0.054	0.052
HMT	0.459	0.008	0.008	0.007	0.006
[V]	0.659	0.559	0.64	0.583	0.418
[Ti]	0.914	0.999	0.999	0.999	0.696
[Si]	0.811	0.775	0.84	0.829	0.661
BV	0.776	0.819	0.997	0.998	0.967
TR	0.502	0.277	0.393	0.286	0.256
GUR	0.573	0.419	0.435	0.337	0.295
PI	0.930	0.902	0.985	0.982	0.972
PD	0.883	0.999	0.999	0.999	0.968
O2	0.722	0.64	0.645	0.564	0.485
TIBV	0.847	0.78	0.697	0.664	0.402
BKE	0.794	0.783	0.755	0.698	0.685
[S]	0.576	0.317	0.415	0.357	0.285
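The commonalities of Eq. (6) under principal component extraction can be sketched with NumPy as follows. The data are synthetic and the function names are hypothetical; the values in Table 3 come from SPSS on the real BF data and are not reproduced here.

```python
import numpy as np


def pca_loadings(R, m):
    """Loads of the first m principal components of a correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:m]         # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def communalities(A):
    """h_i^2 = sum over j of a_ij^2 for each variable, Eq. (6)."""
    return np.sum(A ** 2, axis=1)


rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 5))                     # synthetic data, 5 variables
Z[:, 1] += Z[:, 0]                                # make two columns correlated
R = np.corrcoef(Z, rowvar=False)

h2_two = communalities(pca_loadings(R, m=2))      # two factors retained
h2_all = communalities(pca_loadings(R, m=5))      # all components retained
```

When all components are retained every commonality equals 1, which is Eq. (7) with zero special variance; with fewer factors the shortfall from 1 is the special variance σ_i².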

3) The selection of the number of common factors. The variance contribution of a common factor, g_j², measures its relative importance and reflects the ability of a single factor to explain the total variance of all the original variables. The formula is as follows:

$$g_j^{2} = \sum_{i=1}^{p} a_{ij}^{2} \tag{8}$$

The total variance explained, calculated by the principal component method, is shown in Table 4.

Table 4. Total variance explained.

Component	Initial Eigenvalues			Extraction Sums of Squared Loadings			Rotation Sums of Squared Loadings
Component	Total	% of Variance	Cumulative %	Total	% of Variance	Cumulative %	Total	% of Variance	Cumulative %
1	5.765	30.344	30.344	5.765	30.344	30.344	5.018	26.412	26.412
2	2.246	11.823	42.167	2.246	11.823	42.167	2.257	11.878	38.289
3	1.884	9.917	52.084	1.884	9.917	52.084	1.952	10.276	48.565
4	1.506	7.927	60.011	1.506	7.927	60.011	1.879	9.891	58.457
5	1.225	6.446	66.457	1.225	6.446	66.457	1.405	7.397	65.854
6	1.131	5.952	72.409	1.131	5.952	72.409	1.245	6.555	72.409
7	0.995	5.238	77.647
8	0.754	3.971	81.618
9	0.692	3.643	85.261
10	0.572	3.011	88.272
11	0.487	2.564	90.836
12	0.451	2.374	93.209
13	0.392	2.061	95.27
14	0.272	1.429	96.699
15	0.217	1.144	97.844
16	0.197	1.035	98.878
17	0.118	0.62	99.499
18	0.088	0.462	99.961
19	0.007	0.039	100

According to the selection principle of eigenvalue > 1, the first six common factors are selected. Together they explain 72.409% of the information in the original data, which also satisfies the selection principle that the cumulative variance contribution of the common factors should exceed 70%. Therefore, the first six common factors can reasonably describe the matrix structure of the original variables.
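The selection rule can be checked directly against the initial eigenvalues listed in Table 4. The short sketch below only re-derives the numbers already given in the table; the variable names are illustrative.

```python
# Initial eigenvalues copied from Table 4 (19 components)
eigenvalues = [5.765, 2.246, 1.884, 1.506, 1.225, 1.131, 0.995, 0.754, 0.692,
               0.572, 0.487, 0.451, 0.392, 0.272, 0.217, 0.197, 0.118, 0.088,
               0.007]

p = len(eigenvalues)                                 # 19 standardized variables
kept = [ev for ev in eigenvalues if ev > 1.0]        # eigenvalue > 1 rule
share_each = [ev / p for ev in kept]                 # variance share per factor
cumulative = sum(kept) / p                           # cumulative variance share

print(len(kept), round(100 * cumulative, 1))
```

For standardized variables the eigenvalues sum to the number of variables, so each factor's share is its eigenvalue divided by 19. The seventh eigenvalue (0.995) falls just below the cutoff, which is why six factors are kept.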

4) The rotation of the factor load matrix. The purpose of a factor analysis model is not only to find common factors, but also to uncover the real meaning behind the data by interpreting each common factor. The initial factor load matrix obtained by the principal component method is shown in Table 5. As shown in Table 5, the absolute loads of 11 variables on the first common factor are above 0.5, indicating that the first common factor is highly correlated with these variables. The other common factors show similar results, each being related to different original variables. The first common factor contains too many variables to have a clear metallurgical meaning in BF production and cannot be well explained. Therefore, the original factor load matrix is rotated.

Table 5. The initial factor load matrix column.

	Component
	1	2	3	4	6
HIW	0.667	0.148	0.11	0.302	0.083
UC	0.76	0.253	0.09	0.054	−0.388
coke rate	−0.744	−0.065	−0.007	0.314	0.03
coal rate	0.638	−0.053	−0.172	−0.547	−0.074
HCC	0.446	0.288	0.134	0.585	0.205
HCI	0.86	0.021	−0.089	−0.252	0.024
CA	0.133	0.073	0.007	−0.028	0.089
[V]	0.184	−0.65	0.307	−0.15	−0.103
[Ti]	−0.144	−0.372	0.845	0.01	−0.112
[Si]	−0.404	0.084	0.698	0.151	−0.112
BV	0.841	−0.009	0.12	0.099	0.034
TR	−0.511	0.294	0.046	0.038	−0.26
GUR	−0.51	0.261	−0.074	0.227	−0.062
PI	0.506	−0.588	−0.26	0.442	−0.201
PD	0.234	0.629	0.381	−0.382	0.228
O2	0.518	0.562	0.135	0.25	0.2
TIBV	−0.353	−0.316	−0.157	−0.003	0.757
BKE	0.748	−0.284	−0.049	0.287	0.073
[S]	−0.341	0.324	−0.483	0.113	−0.314

To simplify the structure for better interpretation, the factor load matrix is rotated so that the squared loads are driven toward the extremes of 0 and 1. Using SPSS, the initial factor load matrix was rotated by the orthogonal Varimax, Quartimax and Equamax methods, and the rotated factor matrices were compared to find the rotation best suited to metallurgical process knowledge. The rotated factor matrix that conforms to the metallurgical process was obtained by the Quartimax method and is shown in Table 6.

Table 6. Result of rotation factor matrix.

	Component
	1	2	3	4	5	6
HCI	0.876	0.137	−0.108	−0.088	0.117	0.012
CoalRate	0.794	−0.243	−0.147	−0.162	0.13	0.008
CokeRate	−0.749	−0.076	0.031	0.133	−0.182	0.297
BV	0.736	0.423	0.095	0.035	0.123	0.169
BKE	0.661	0.382	0.063	0.367	−0.042	0.23
TR	−0.574	−0.097	−0.068	−0.127	0.26	−0.17
GUR	−0.568	0.049	−0.175	−0.094	0.037	0.374
HCC	0.147	0.812	−0.01	0.052	0.007	0.038
O₂	0.259	0.71	−0.129	−0.293	0.116	−0.185
TIW	0.46	0.59	0.027	0.059	0.108	−0.076
[Ti]	−0.134	−0.058	0.928	−0.002	0.073	−0.023
[Si]	−0.497	0.112	0.602	−0.211	0.158	0.033
[S]	−0.398	−0.094	−0.563	0.063	0.269	0.098
[V]	0.328	−0.285	0.554	0.282	−0.05	0.106
PD	0.221	0.214	0.052	−0.862	0.081	0.128
PI	0.425	0.161	0.024	0.836	0.047	0.031
TIBV	−0.183	−0.104	−0.016	0.041	−0.88	−0.035
UC	0.474	0.357	−0.03	−0.009	0.585	−0.079
CA	0.055	0.101	−0.011	0.05	−0.008	−0.905
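An orthogonal rotation of this kind can be sketched with the commonly used SVD-based orthomax iteration, in which `gamma = 0` gives Quartimax and `gamma = 1` gives Varimax. This is a generic illustration, not the SPSS routine used in the paper: the function name, the stopping rule and the small load matrix are hypothetical, and SPSS additionally applies Kaiser normalization by default, which is omitted here.

```python
import numpy as np


def orthomax(A, gamma=0.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate loads A; gamma=0 is quartimax, gamma=1 is varimax."""
    p, k = A.shape
    T = np.eye(k)                                  # rotation matrix, starts at identity
    prev = 0.0
    for _ in range(max_iter):
        L = A @ T
        # Gradient of the orthomax criterion with respect to T
        grad = A.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt                                 # closest orthogonal matrix
        if prev > 0 and s.sum() < prev * (1 + tol):
            break                                  # criterion stopped improving
        prev = s.sum()
    return A @ T, T


# Hypothetical unrotated loads for four variables on two factors
loadings = np.array([[0.8, 0.4],
                     [0.7, 0.5],
                     [0.6, -0.5],
                     [0.5, -0.6]])
rotated, T = orthomax(loadings, gamma=0.0)
```

Because `T` is orthogonal, the row sums of squared loads (the commonalities) are unchanged by the rotation; only the distribution of each variable's load across the factors changes.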

As shown in Table 6, the first common factor has absolute loads greater than 0.5 on HCI, CoalRate, CokeRate, BV, BKE, TR and GUR. From the point of view of the metallurgical process, all seven of these indexes are related to energy utilization in BF production. Therefore, the first common factor (CF1) is defined as the energy index. The absolute loads of the second common factor (CF2) exceed 0.5 on HCC, O₂ and TIW, which are closely related to the productivity of the BF, so CF2 is defined as the productivity index. The absolute loads of the third common factor (CF3) exceed 0.5 on the hot metal composition indexes [Ti], [Si], [S] and [V] and do not exceed 0.2 on the other indexes, so CF3 is defined as the hot metal composition index. The absolute loads of the fourth common factor (CF4) exceed 0.8 on PD and PI, which affect the stability of the BF, so CF4 is defined as the stability index. The absolute loads of the fifth common factor (CF5) exceed 0.5 on TIBV and UC, which reflect the degree of enhanced production, so CF5 is defined as the index of enhanced production degree. The sixth common factor (CF6) has a high load only on CA (−0.905), so CF6 is defined as the iron mixing index.
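The grouping described above can be reproduced mechanically from Table 6 by assigning each variable to the factor on which its absolute load is largest, provided that load exceeds 0.5. The sketch below copies the rotated loads from Table 6; the assignment rule itself is an illustrative reading of the text, not code from the paper.

```python
# Rotated loads copied from Table 6 (columns: CF1 to CF6)
rotated = {
    "HCI":      [0.876, 0.137, -0.108, -0.088, 0.117, 0.012],
    "CoalRate": [0.794, -0.243, -0.147, -0.162, 0.13, 0.008],
    "CokeRate": [-0.749, -0.076, 0.031, 0.133, -0.182, 0.297],
    "BV":       [0.736, 0.423, 0.095, 0.035, 0.123, 0.169],
    "BKE":      [0.661, 0.382, 0.063, 0.367, -0.042, 0.23],
    "TR":       [-0.574, -0.097, -0.068, -0.127, 0.26, -0.17],
    "GUR":      [-0.568, 0.049, -0.175, -0.094, 0.037, 0.374],
    "HCC":      [0.147, 0.812, -0.01, 0.052, 0.007, 0.038],
    "O2":       [0.259, 0.71, -0.129, -0.293, 0.116, -0.185],
    "TIW":      [0.46, 0.59, 0.027, 0.059, 0.108, -0.076],
    "[Ti]":     [-0.134, -0.058, 0.928, -0.002, 0.073, -0.023],
    "[Si]":     [-0.497, 0.112, 0.602, -0.211, 0.158, 0.033],
    "[S]":      [-0.398, -0.094, -0.563, 0.063, 0.269, 0.098],
    "[V]":      [0.328, -0.285, 0.554, 0.282, -0.05, 0.106],
    "PD":       [0.221, 0.214, 0.052, -0.862, 0.081, 0.128],
    "PI":       [0.425, 0.161, 0.024, 0.836, 0.047, 0.031],
    "TIBV":     [-0.183, -0.104, -0.016, 0.041, -0.88, -0.035],
    "UC":       [0.474, 0.357, -0.03, -0.009, 0.585, -0.079],
    "CA":       [0.055, 0.101, -0.011, 0.05, -0.008, -0.905],
}

threshold = 0.5
groups = {f"CF{k}": [] for k in range(1, 7)}
for name, loads in rotated.items():
    k = max(range(6), key=lambda j: abs(loads[j]))   # dominant factor
    if abs(loads[k]) > threshold:
        groups[f"CF{k + 1}"].append(name)
```

Every one of the 19 variables is assigned to exactly one factor, and the resulting groups match the six indexes defined in the text.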

5) Factor score Calculation. Through the previous factor analysis process, the raw data is decomposed into a linear combination of six common factors, as shown in Formula (1). The purpose of the study is to obtain the OBFSI by evaluating all parameters.

Each common factor contains all the original variables and can be expressed as a linear combination of the original variables as follows:

$$\hat{f}_j = b_{j1} x_1 + b_{j2} x_2 + \cdots + b_{jp} x_p, \quad j = 1, 2, \ldots, m \tag{9}$$

The common factor matrix is as follows:

$$\hat{F} = BX \tag{10}$$

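where B = (b_jk) is the m × p score coefficient matrix. Using SPSS, the score coefficient matrix was obtained by the regression method, and each common factor is expressed as the product of the score coefficients and the standardized original variables. The variables X1 to X20 follow the numbering in Table 1; X8 (hot metal temperature) was removed and therefore does not appear. The expressions for the six common factors are as follows: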

CF1=-0.043X1+0.008X2-0.188X3+0.309X4 -0.174X5+0.223X6-0.042X7+0.107X9 -0.086X10-0.196X11+0.084X12-0.154X13 -0.147X14-0.039X15+0.118X16-0.079X17 +0.069X18+0.05X19-0.094X20

(11)

CF2=0.252X1+0.045X2+0.067X3-0.219X4 +0.411X5-0.023X6+0.011X7-0.138X9 -0.007X10+0.085X11+0.152X12-0.057X13 +0.074X14+0.07X15+0.061X16+0.309X17 +0.095X18+0.175X19 -0.067X20

(12)

CF3=0.014X1-0.008X2-0.008X3-0.045X4 -0.017X5-0.032X6-0.004X7+0.288X9 +0.47X10+0.297X11+0.059X12-0.045X13 -0.105X14+0.005X15+0.054X16-0.062X17 -0.008X18+0.032X19-0.3X20

(13)

CF4=0.034X1+0.038X2+0.079X3-0.109X4 -0.036X5-0.068X6+0.042X7+0.115X9 -0.016X10-0.097X11+0.002X12-0.018X13 -0.025X14+0.451X15-0.478X16-0.143X17 -0.058X18+0.172X19+0.091X20

(14)

CF5=-0.01X1+0.385X2-0.071X3+0.068X4 -0.1X5+0.013X6-0.039X7+0.026X9 +0.104X10+0.141X11+0.012X12+0.234X13 +0.05X14+0.078X15-0.056X16-0.05X17 -0.677X18-0.076X19 +0.24X20

(15)

CF6=0.052X1+0.036X2-0.21X3-0.042X4 -0.028X5-0.041X6+0.734X7-0.08X9 +0.03X10-0.017X11-0.156X12+0.142X13 -0.292X14-0.006X15-0.142X16+0.132X17 +0.055X18-0.187X19-0.079X20

(16)
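The regression method mentioned above is usually written as B = AᵀR⁻¹, where A is the load matrix and R is the correlation matrix of the standardized variables. The NumPy sketch below illustrates this under that assumption, using synthetic data and unrotated principal component loads for brevity; the function name is hypothetical, and the coefficients in Eqs. (11) to (16) come from SPSS on the real BF data.

```python
import numpy as np


def regression_scores(Z, A):
    """Factor scores by the regression method: B = A^T R^-1, scores = Z B^T."""
    R = np.corrcoef(Z, rowvar=False)
    B = np.linalg.solve(R, A).T                # score coefficient matrix, m x p
    return Z @ B.T, B


rng = np.random.default_rng(1)
raw = rng.normal(size=(400, 4))                # synthetic data, 4 variables
raw[:, 1] += raw[:, 0]
raw[:, 3] += raw[:, 2]
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)   # Eq. (4) per column

# Principal component loads for m = 2 factors
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
top = np.argsort(eigvals)[::-1][:2]
A = eigvecs[:, top] * np.sqrt(eigvals[top])

scores, B = regression_scores(Z, A)            # one row of scores per sample
```

Each row of `B` plays the role of the coefficients in one of the CF expressions, and each row of `scores` is one hourly set of common factor scores. With principal component extraction the scores have zero mean and unit variance.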

Compared with the 19 original indexes, the common factors are more concise and intuitive for evaluating the state of the BF from different perspectives. The hourly common factor scores can be obtained from the common factor formulas and the hourly BF parameters provided by the BDPI. However, for the same blast furnace state at the same time, the six common factors give different judgments and still cannot provide a single value for the OBFSI. The OBFSI is therefore calculated by weighting each common factor with its variance contribution rate:

$$\mathrm{OBFSI} = \frac{\sum_{i=1}^{6} (PV_i \cdot CF_i)}{\sum_{i=1}^{6} PV_i}$$

(17)

where PV_i is the variance contribution rate of the i-th rotated common factor and CF_i is its score. In this way the OBFSI is synthesized from the six common factors. As shown in Fig. 5, we validated the OBFSI against a 10-month production log that covers a variety of actual BF conditions: continuous fluctuations in the early part, serious conditions in the middle part and a marked improvement in the late part. From March to June, the actual BF conditions fluctuated and led to several shut-off operations. In the same period the OBFSI fluctuates around 0, and several values of −100 indicate the abnormal shut-off status. From July to August, the actual condition of the BF remained poor. From July 5 to 15, these serious conditions led to oxygen enrichment being stopped for adjustment. The OBFSI shows a similar trend: it is consistently below 0 from July to August, and the large drop in the score from July 5 to 15 reflects the abnormal adjustment operation. After a series of adjustments, the BF gradually returned to normal status, and the OBFSI rises over the same period. From September to December, the BF state improved and remained good for a long time. Correspondingly, the OBFSI first rises and then stabilizes, with most values greater than 0. This detailed comparison with the production log demonstrates that the index accurately reflects the production situation of the BF and captures its abnormal status. Based on the distribution of OBFSI values over the 10-month test interval, the OBFSI is divided into four grades, as shown in Table 7.

Fig. 5.

The OBFSI per hour. (Online version in color.)

Table 7. Grade rules of the OBFSI.

Status Value	True State
50 < Y	GOOD
−50 < Y < 50	NORMAL
Y < −50	POOR
Y < −100	WARNING
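Equation (17) and the grade rules of Table 7 can be sketched together in a few lines of Python. This is an illustrative sketch only: the variance contribution rates and factor scores below are hypothetical, and the function names are assumptions. Table 7 uses strict inequalities and its POOR and WARNING ranges overlap, so the sketch assumes that WARNING is tested first and that the boundary values ±50 fall into NORMAL.

```python
def obfsi(pv, cf):
    """Variance-weighted average of the common factor scores (Eq. (17))."""
    if len(pv) != len(cf):
        raise ValueError("pv and cf must have the same length")
    return sum(p * c for p, c in zip(pv, cf)) / sum(pv)


def grade(y):
    """Map a status value to a Table 7 grade.

    Assumptions: WARNING is checked before POOR because the two ranges
    overlap in Table 7, and y == 50 or y == -50 is treated as NORMAL.
    """
    if y < -100:
        return "WARNING"
    if y < -50:
        return "POOR"
    if y > 50:
        return "GOOD"
    return "NORMAL"


# Hypothetical variance contribution rates and factor scores for six factors.
pv = [25.0, 20.0, 15.0, 10.0, 5.0, 5.0]
cf = [60.0, 40.0, -20.0, 10.0, 0.0, 80.0]
value = obfsi(pv, cf)
print(value, grade(value))  # 31.25 NORMAL
```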

6) Verification in different periods. The factor analysis behind the OBFSI is based on raw data from March to December 2018. The applicability of the OBFSI can be verified by comparing the index with the actual production situation in a new time range. The OBFSI from January to March 2019 is shown in Fig. 6.

Fig. 6.

Results of the OBFSI from Jan to Mar 2019. (Online version in color.)

In the selected time range, the large fluctuation of the OBFSI in the early stage indicates that the BF state was poor, and several shut-off operations were conducted. After a series of adjustments, the BF state became more stable, with a small range of index changes in the later period. The trend of the OBFSI agrees with the actual situation recorded in the production work log.

4. The Prediction of the State Index

The OBFSI can be obtained by factor analysis with parameters from different aspects of the BF. However, in production the raw data cannot always be generated within the required 1-hour cycle. For some parameters, such as the measured composition of the molten iron, the platform cannot obtain the true value of the current state in time. Consequently, the current BF status index obtained by factor analysis is of limited value for guiding BF production; it only meets the operators' need to evaluate the BF status. To make the OBFSI more useful for operation guidance, we propose a system that predicts the future value of the OBFSI and thereby overcomes the parameter delay issue. The system uses the adaptive boosting (AdaBoost) prediction model and takes the current BF parameters as input.

1) Real-time parameter selection. The input parameters of the forecast model come from the BDPI. To match the hourly prediction frequency of the OBFSI, we select the real-time data of the BF as input; delayed or manually entered data are excluded. Twelve real-time measurable parameters are selected in this study: blast volume (BV), blast pressure (BP), blast temperature (BT), oxygen enrichment (OE), soft water flow rate (SWFR), actual blast velocity (ABV), carbon monoxide (CO), carbon dioxide (CO₂), underpart differential pressure (Under BP), upperpart differential pressure (Upper BP), top temperature (TT) and top gas pressure (TGP). These parameters are stored in the InfluxDB database of the BDPI at a sampling frequency of once per second. The model input parameters for forecasting are obtained with a Python program according to the data preprocessing method introduced earlier.
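To make the step from one-second samples to hourly model inputs concrete, the sketch below averages timestamped readings per clock hour. It rests on assumptions: the arithmetic mean is used as the hourly aggregate, the function name `hourly_mean` is hypothetical, and the readings are synthetic. The actual preprocessing in the BDPI may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_mean(samples):
    """Average (timestamp, value) samples within each clock hour.

    Assumption: the hourly aggregate is the arithmetic mean.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}


# Synthetic 1 Hz signal covering two hours: 0.0 in hour one, 1.0 in hour two.
start = datetime(2018, 3, 1, 0, 0, 0)
samples = [(start + timedelta(seconds=s), float(s // 3600)) for s in range(7200)]
hourly = hourly_mean(samples)
for hour, mean in hourly.items():
    print(hour.isoformat(), mean)
```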

2) Time interval analysis. The correlation between the model input parameters and the OBFSI is analyzed with the maximal information coefficient (MIC). The best lead time for the prediction model is the time interval at which the correlation coefficient between the time-shifted real-time parameters and the current OBFSI is largest. The MIC is calculated as follows:

$$\mathrm{MIC}(x,y) = \max_{X \cdot Y < B} \frac{\int p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy}{\log_2 \min(X,Y)}$$

(18)

where X is the number of cells into which the data are divided in the x direction, Y is the number of cells in the y direction, and B is the data amount raised to the power α. In this paper, α = 0.7. The correlation coefficients between the real-time data and the OBFSI at different time intervals, computed with the minepy library for Python, are shown in Table 8.
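The quantity inside the maximization of Eq. (18) can be illustrated for a single grid. The sketch below computes the mutual information of two series on one equal-width X-by-Y grid and normalizes it by log2 min(X, Y). It is not a full MIC implementation: the search over all grids with X·Y < B is omitted, equal-width cells are an assumption, and the name `grid_normalized_mi` is hypothetical.

```python
import math
from collections import Counter


def grid_normalized_mi(xs, ys, nx, ny):
    """Normalized mutual information of (xs, ys) on one nx-by-ny grid.

    Assumptions: equal-width cells and nx, ny >= 2. Full MIC would take
    the maximum of this value over many grids with nx * ny < B.
    """
    def cell(v, lo, hi, n):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * n), n - 1)

    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    cells = [(cell(x, x_lo, x_hi, nx), cell(y, y_lo, y_hi, ny))
             for x, y in zip(xs, ys)]
    joint = Counter(cells)                 # counts per grid cell
    px = Counter(i for i, _ in cells)      # marginal counts along x
    py = Counter(j for _, j in cells)      # marginal counts along y
    mi = 0.0
    for (i, j), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[i] / n) * (py[j] / n)))
    return mi / math.log2(min(nx, ny))


xs = [float(v) for v in range(100)]
dependent = grid_normalized_mi(xs, xs, 4, 4)             # y fully determined by x
constant = grid_normalized_mi(xs, [1.0] * 100, 4, 4)     # y carries no information
print(dependent, constant)
```

A value near 1 indicates a strong dependence on that grid and a value near 0 indicates none, which is the scale on which the coefficients in Table 8 are reported.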

Table 8. Correlation coefficient between real-time parameters and OBFSI.

	0H	1H	2H	3H	4H	5H	6H	7H	8H
BV	0.349	0.339	0.349	0.359	0.354	0.352	0.348	0.350	0.350
BP	0.196	0.195	0.187	0.201	0.195	0.191	0.189	0.195	0.192
BT	0.241	0.240	0.241	0.252	0.250	0.249	0.249	0.248	0.248
TGP	0.151	0.147	0.156	0.157	0.145	0.152	0.155	0.154	0.149
ABV	0.313	0.303	0.314	0.317	0.311	0.318	0.311	0.317	0.307
OE	0.294	0.286	0.287	0.299	0.289	0.289	0.296	0.296	0.291
SWFR	0.541	0.550	0.548	0.552	0.542	0.545	0.547	0.546	0.542
CO	0.184	0.188	0.185	0.184	0.189	0.181	0.183	0.183	0.186
CO2	0.301	0.301	0.298	0.303	0.295	0.296	0.300	0.294	0.293
TT	0.266	0.269	0.267	0.268	0.259	0.267	0.263	0.267	0.266
Under BP	0.345	0.345	0.333	0.338	0.338	0.328	0.330	0.333	0.327
Upper BP	0.215	0.217	0.222	0.218	0.215	0.214	0.219	0.220	0.218

The correlation between the real-time parameters and the OBFSI differs between time intervals. The 3-hour interval has the largest number of parameters at their maximum correlation. Real-time parameters represent the instantaneous production status of the BF, whereas the OBFSI evaluates its overall status. Based on the time lag of the BF and the MIC analysis, 3 hours is selected as the best time interval: the real-time parameters are used as inputs of the prediction model to forecast the OBFSI 3 hours later.
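The selection rule can be checked directly against Table 8. The sketch below counts, for each time interval, how many parameters reach their maximum coefficient there. The values are copied from Table 8; the function name and the treatment of ties (every interval that reaches the maximum is credited) are assumptions.

```python
LAGS_H = [0, 1, 2, 3, 4, 5, 6, 7, 8]

# Coefficients from Table 8, one list per parameter, ordered 0H to 8H.
MIC_TABLE = {
    "BV":       [0.349, 0.339, 0.349, 0.359, 0.354, 0.352, 0.348, 0.350, 0.350],
    "BP":       [0.196, 0.195, 0.187, 0.201, 0.195, 0.191, 0.189, 0.195, 0.192],
    "BT":       [0.241, 0.240, 0.241, 0.252, 0.250, 0.249, 0.249, 0.248, 0.248],
    "TGP":      [0.151, 0.147, 0.156, 0.157, 0.145, 0.152, 0.155, 0.154, 0.149],
    "ABV":      [0.313, 0.303, 0.314, 0.317, 0.311, 0.318, 0.311, 0.317, 0.307],
    "OE":       [0.294, 0.286, 0.287, 0.299, 0.289, 0.289, 0.296, 0.296, 0.291],
    "SWFR":     [0.541, 0.550, 0.548, 0.552, 0.542, 0.545, 0.547, 0.546, 0.542],
    "CO":       [0.184, 0.188, 0.185, 0.184, 0.189, 0.181, 0.183, 0.183, 0.186],
    "CO2":      [0.301, 0.301, 0.298, 0.303, 0.295, 0.296, 0.300, 0.294, 0.293],
    "TT":       [0.266, 0.269, 0.267, 0.268, 0.259, 0.267, 0.263, 0.267, 0.266],
    "Under BP": [0.345, 0.345, 0.333, 0.338, 0.338, 0.328, 0.330, 0.333, 0.327],
    "Upper BP": [0.215, 0.217, 0.222, 0.218, 0.215, 0.214, 0.219, 0.220, 0.218],
}


def count_best_lags(table, lags):
    """Count how many parameters attain their maximum coefficient at each lag."""
    counts = {lag: 0 for lag in lags}
    for values in table.values():
        best = max(values)
        for lag, v in zip(lags, values):
            if v == best:
                counts[lag] += 1  # ties credit every lag that reaches the maximum
    return counts


counts = count_best_lags(MIC_TABLE, LAGS_H)
best_lag = max(counts, key=counts.get)
print(counts)
print(best_lag)  # 3
```

Seven of the twelve parameters (BV, BP, BT, TGP, OE, SWFR and CO2) peak at 3H, which agrees with the choice of a 3-hour interval.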

3) Establishment and evaluation of the prediction model. The AdaBoost algorithm in the Python scikit-learn library is used to train the forecast model of the OBFSI. The algorithm consists of roughly three steps: 1. Initialize the weight distribution of the training data, giving each training sample the same weight. 2. Iteratively train weak classifiers, updating the sample weight distribution in each round. 3. Combine the trained weak classifiers into a strong classifier according to their weights. The core formula of the AdaBoost algorithm is as follows:

$$H_{final} = \mathrm{sign}(f(x)) = \mathrm{sign}\left( \sum_{t=1}^{T} a_t H_t(x) \right)$$

(19)

where H_final is the final strong classifier, a_t is the weight of the t-th weak classifier, H_t is the t-th weak (base) classifier and T is the number of weak classifiers. The whole data set is randomly divided into a training set and a test set in a 9:1 ratio so that both sets capture the characteristics of the entire parameter cycle. The model parameters are tuned to obtain the best evaluation indices. The parameter setup of AdaBoost and the evaluation indices of the results are shown in Table 9.
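The three steps and Eq. (19) can be illustrated with a small, self-contained binary AdaBoost that uses one-dimensional decision stumps as weak classifiers. This is a textbook sketch on toy data, not the regression model trained with scikit-learn in this study; the function names and the data are assumptions made for illustration.

```python
import math


def fit_stump(xs, ys, w):
    """Return (weighted error, threshold, polarity) of the best 1-D stump."""
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            preds = [pol if x >= thr else -pol for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best


def adaboost_fit(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n                           # step 1: equal sample weights
    model = []
    for _ in range(rounds):
        err, thr, pol = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        a = 0.5 * math.log((1 - err) / err)     # weight a_t of this weak classifier
        model.append((a, thr, pol))
        # Step 2: raise the weights of misclassified samples, then renormalize.
        w = [wi * math.exp(-a * y * (pol if x >= thr else -pol))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, x):
    # Step 3, Eq. (19): sign of the weighted vote of the weak classifiers.
    f = sum(a * (pol if x >= thr else -pol) for a, thr, pol in model)
    return 1 if f >= 0 else -1


# Toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost_fit(xs, ys, rounds=3)
preds = [adaboost_predict(model, x) for x in xs]
print(preds == ys)  # True
```

Each stump alone misclassifies at least three of the ten samples, while the weighted vote of three stumps classifies all of them correctly, which is the effect the combination in Eq. (19) is meant to achieve.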

Table 9. Parameter setting of AdaBoost and evaluation index of prediction results.

Parameter setup		Evaluation index
The base estimator	tree	Train time [s]	2.142
Number of estimators	32	MSE	530.379
Learning rate	0.4	RMSE	23.03
Classification algorithm	SAMME.R	MAE	14.441
Regression loss function	Square ()	R2	0.735
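For readers unfamiliar with the evaluation indices in Table 9, the sketch below shows how MSE, RMSE, MAE and R2 are conventionally defined. The function name and the toy values are assumptions; only the relation RMSE = sqrt(MSE) is checked against the figures reported in Table 9.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 under their conventional definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical toy values, not data from the study.
metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(metrics)

# Consistency of Table 9: the reported RMSE is the square root of the MSE.
print(round(math.sqrt(530.379), 2))  # 23.03
```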

The comparison between the predicted results of the AdaBoost model and the actual values is shown in Fig. 7. Details for April are shown in Fig. 8.

Fig. 7.

Results of the AdaBoost model. (Online version in color.)

Fig. 8.

Details of the results of the AdaBoost model in April. (Online version in color.)

The AdaBoost model predicts the OBFSI effectively throughout the interval. The detailed chart for April shows that the predictions are not only consistent with the actual status index but also capture the abnormal BF status sharply. With the data flow of the BDPI and the fixed, trained prediction model, the BF operators are informed of the OBFSI three hours in advance, which helps to guide BF production.

5. The Interface and Function of the Status Evaluation System

Data processing, data analysis and data prediction are implemented as Python programs in the BDPI. The original data are collected from the database of the iron and steel enterprise and passed through the data flow process of the BDPI to obtain the predicted value of the OBFSI. The indicator system displayed in the BF control room is shown in Fig. 9.

Fig. 9.

Display interface of comprehensive state system of BF. (Online version in color.)

The status indicator system has three core modules: KPI, History Trend and Parameter. The KPI module presents the value and status of the current OBFSI together with the predicted value and status. The trend of the predicted value is also displayed as a reference for the BF operators. The History Trend module dynamically displays the OBFSI of the previous 12 hours. Any historical time can be selected freely, which makes it easy for operators to compare data. The Parameter module dynamically presents the real-time parameters of the BF with different color markers. Any real-time parameter can be added or removed for any historical period, which meets the different analysis needs of operators.

Because blast furnace production is continuous and long-term, the status of the BF varies over time, which may lead to changes in the OBFSI. Therefore, the performance of the OBFSI, which is derived from the analysis of historical data, may degrade in the long run, as may that of the forecasting model. We recommend refreshing all analytical components, including the load factor matrix, the OBFSI and the forecasting models, every half year. In this way the status index can adapt to variations in BF production, continue to meet the requirements of operators and fulfil its purpose of guiding BF production.

6. Conclusions

Based on the BDPI with existing data storage in iron-making enterprises, a comprehensive status evaluation and prediction system for BF is established, and the following conclusions are obtained:

(1) The architecture of the BDPI includes edge intelligence, IaaS, PaaS and APP layers. Raw data are transmitted to specific databases at different levels to meet the requirements of various data analysis models.

(2) The abnormalities of production data include inconsistent time correspondence, missing data, duplicate data, abnormal data and blowing-out data. The original data are processed by time alignment, parameter range selection, data preprocessing and standardization to obtain data that can be used for data mining.

(3) Based on 19 state parameters covering different aspects of the BF, the OBFSI is calculated by the factor analysis method. The index not only conforms to the actual production conditions of the BF but also accurately captures the early stage of abnormal BF conditions. The method is also applicable to different stages of the BF.

(4) Owing to the time lag of BF production, the relationship between the real-time parameters and the OBFSI is strongest at an interval of 3 hours. Using the AdaBoost algorithm and 12 real-time parameters to predict the status of the BF three hours ahead gives R2 = 0.735.

(5) On the basis of the BDPI, software for the comprehensive status evaluation and prediction system for the BF has been developed. The software interface shows the current value, the predicted value and the change of the BF status, along with the historical trend and core parameters. The historical OBFSI and core parameters can be queried and combined arbitrarily, which helps operators to judge the operating trend of the BF and maintain its stable and smooth operation.

Acknowledgement

This work was financially supported by the Key Program of the National Natural Science Foundation of China (U1360205) and the Hebei Province Higher Education Technology Research Project (QN2019200).

References

1) S. Q. Zheng, Y. W. Zong, W. S. Dong and Z. G. Ding: Industry Big Data: Architecture and Application, Shanghai Scientific & Technical Publishers, Shanghai, (2017), 2 (in Chinese).
2) D. Hughes, J. Ueyama, E. Mendiondo, N. Matthys, W. Horré, S. Michiels, C. Huygens, W. Joosen, K. L. Man and S. U. Guan: J. Braz. Comput. Soc., 17 (2011), 85.
3) W. J. Zhang and Y. Lin: Enterp. Inf. Syst., 4 (2010), 99.
4) P. Jiang, K. Ding and J. Leng: Manuf. Lett., 7 (2016), 15.
5) C. Lu, X. Li, L. Gao, W. Liao and J. Yi: Comput. Ind. Eng., 104 (2017), 156.
6) T. Tsuda, S. Inoue, A. Kayahara, S. Imai, T. Tanaka, N. Sato and S. Yasuda: IEEE Trans. Semicond. Manuf., 28 (2015), 229.
7) J. L. Wang and J. Zhang: Int. J. Prod. Res., 54 (2016), 7231. https://doi.org/10.1080/00207543.2016.1174789
8) X. H. Zhang, W. A. Peek, B. Pikas and T. Lee: J. Appl. Bus. Econ., 18 (2016), 97.
9) Siemens: Intell. Manuf., 7 (2019), 24 (in Chinese).
10) J. J. Li: Autom. Appl., 9 (2018), 63 (in Chinese).
11) H. B. Zhao, W. Liu, Y. J. Li, Q. Wang and J. Wu: Big Data Res., 3 (2017), 15 (in Chinese).
12) Z. J. Zhao, W. X. Wang and Y. X. Pang: Nonferrous Met. Eng. Res., 36 (2015), 42 (in Chinese).
13) X. L. Su, S. Zhang, Y. X. Yin and W. D. Xiao: J. Frankl. Inst., 355 (2018), 1663.
14) K. X. Zhang, M. Wu, J. Q. An, W. H. Cao, Z. T. Liu and F. L. Ning: IFAC-PapersOnLine, 50 (2017), 13796.
15) B. Zhou, H. Ye, H. F. Zhang and M. L. Li: Control Eng. Pract., 47 (2016), 1.
16) X. M. Zhang, M. Kano and S. Matsuzaki: Comput. Chem. Eng., 130 (2019), 106575. https://doi.org/10.1016/j.compchemeng.2019.106575
17) P. Patel, M. I. Ali and A. Sheth: IEEE Intell. Syst., 32 (2017), 64.
18) K. S. Wang: Adv. Mater. Res., 1039 (2014), 490.
