PCA-LMNN-Based Fault Diagnosis Method for Ironmaking Processes with Insufficient Faulty Data

Tongshuai Zhang; Hao Ye; Haifeng Zhang; Mingliang Li

doi:10.2355/isijinternational.ISIJINT-2016-101

Abstract

Fault detection and fault classification are important in the modern ironmaking process. Some studies based on principal component analysis (PCA) techniques have been performed for fault detection in the ironmaking process. However, studies on fault classification in the ironmaking process remain limited. In this paper, problems that are related to the classification of abnormalities in blast furnaces are considered. We fuse historical abnormal data that were collected from three real blast furnaces to address the problem of insufficient historical faulty data. To extract common features for the same type of abnormalities, which are not affected by different operating points or different blast furnaces, we propose the use of a contribution vector as a fault feature, which is calculated by the PCA-based technique. The large margin nearest neighbor (LMNN) technique is employed to train a classifier with contribution vectors as inputs. Twenty-one historical abnormalities in three different real blast furnaces are employed to validate the proposed method. The results indicate that this method achieves the desired performance.

1. Introduction

Ironmaking processes are an important part of the modern iron and steel industry.¹⁾ Blast furnaces, which are the main reactors, are the core of the entire ironmaking process.²⁾ In a typical ironmaking process, iron-bearing materials (such as iron ore, sinters, and pellets), coke and flux are charged into the top of a blast furnace, whereas hot dry air, enriching oxygen, fuels (such as tar or pulverized coal) and moisture are blasted into the bottom of the blast furnace. The output, which is composed of liquid molten iron and slag, flows from the bottom of the blast furnace, whereas coal gas and dust are collected at the top of the furnace. With one downward stream and one upward stream, complex reactions occur in the blast furnace.³⁾

A blast furnace should operate at normal status and in a steady state. Because the physical and chemical reactions in a blast furnace are complex, various abnormalities, such as hanging, slipping, channeling, and cold furnace condition, frequently occur in a blast furnace,⁴⁾ which may cause severe accidents if they are not detected and handled in a timely manner. Thus, effective fault diagnosis methods for the ironmaking process, including fault detection, i.e., detection of an abnormality, and fault classification, i.e., identification of the cause or type of abnormality after fault detection, should be developed.

Expert systems have been successfully applied to fault detection and fault classification in ironmaking processes.^5,6,7,8) However, the requirements of comprehensive rules and sufficient historical information about abnormalities, which are not always available, limit the application of these systems in many factories.

To address the limitations of expert system-based methods, data-driven methods for either fault detection or fault classification have been developed in recent years.

Data-driven methods for fault detection in the ironmaking process are primarily based on principal component analysis (PCA). Although PCA has been extensively and successfully applied in the chemical industry,^9,10) its application to the ironmaking process remains limited. A study by Gamero,¹¹⁾ which addressed qualitative trend analysis, and a study by Vanhatalo,¹²⁾ which considered an experimental blast furnace, are representative of the limited research on PCA-based fault detection in the ironmaking process. Motivated by their studies and to address the problem of unknown switching disturbances in hot blast stoves, a two-stage PCA-based method was proposed for the incipient detection of abnormalities in a real blast furnace of the Liuzhou Iron & Steel Co., Ltd.¹³⁾ The method was tested on more real historical data in literature,¹⁴⁾ which were collected from three blast furnaces at the Liuzhou Iron & Steel Co., Ltd.; among a total of 24 abnormalities, 20 were detected before the operators identified them. However, these PCA-based methods cannot be utilized to perform fault classification after fault detection. Based on our previous work,^13,14) this paper focuses on fault classification, i.e., the classification of newly occurring abnormalities after they have been detected by the two-stage PCA-based fault detection method proposed in literature¹³⁾ for the ironmaking processes at the Liuzhou Iron & Steel Co., Ltd.

Numerous data-driven methods for fault classification exist. These methods consist of two phases: offline training and online classification.¹⁵⁾ In offline training, a classifier is trained by using a proper optimization method with historical faulty data, whose fault types are known a priori. Many methods include a step that is referred to as fault feature extraction prior to training, which transforms the original faulty data into a proper form to manifest the fault features,¹⁶⁾ e.g., Li et al.¹⁷⁾ employed the Fourier transform of the original faulty signal as a fault feature. In online classification, new data (or the corresponding fault feature) with an unknown fault type, are fed to the trained classifier, and the output of the classifier is calculated and employed as the classification result.¹⁸⁾ Some data-driven fault classification methods for ironmaking processes, which are primarily based on support vector machines (SVMs) and its variants,^4,19,20,21) and neural networks^22,23) employ original data instead of extracted fault features to train a classifier. Compared with expert system based methods, the merit of these methods is that they do not require comprehensive rules. However, they still meet with a common difficulty that is encountered by all data-driven fault classification methods,²⁴⁾ i.e., the requirement of rich historical faulty data that cover all types of faults to train the classifier, which is frequently impossible in real applications. As shown in Table 3 of Sec. 3, we employ historical data from 2012 to 2014 for 21 confirmed abnormalities for the three blast furnaces at the Liuzhou Iron & Steel Co., Ltd. The lack of historical faulty data is revealed in the following aspects, which renders existing data-driven fault classification methods inapplicable:

➢ Historical data of some abnormality types may be lacking for a blast furnace. For example, historical data of the cold furnace condition were unavailable for both Blast Furnace No. 2 and Blast Furnace No. 5 at the Liuzhou Iron & Steel Co., Ltd.

➢ Blast furnaces may run at different operating points due to changes in production requirements, the quality of raw materials, or the weather. Even for the same types of abnormalities, they may occur at different operating points and cannot be covered by historical faulty data.

➢ Faulty data are very limited. For Blast Furnace No. 5 at the Liuzhou Iron & Steel Co., Ltd., only historical data for three abnormalities were collected.

Table 3. Summary of the detection results of the abnormalities.

Abnormality No.	Blast Furnace	Abnormality Type	Alarm Statistics	Lead Time	Sample Size
1	No. 2	Hanging	T²	20.7 hours	3795
2		Hanging	T²	3.1 hours	944
3		Channeling	SPE	3 hours	952
4		Hanging	T²&SPE	1.6 hours	296
5		Hanging	T²&SPE	0.5 hours	302
6		Hanging	T²&SPE	3.6 hours	564
7		Hanging	T²	1.1 hours	566
8		Hanging	T²	3.0 hours	841
9		Hanging	T²	7.5 hours	951
10		Hanging	T²	3.4 hours	700
11	No. 3	Hanging	T²&SPE	9.3 hours	2790
12		Slipping	SPE	1.6 hours	644
13		Hanging	T²	19.6 hours	1413
14		Hanging	T²	1.4 hours	763
15		Hanging	T²	1.3 hours	342
16		Channeling	SPE	4.9 hours	1019
17		Cold furnace condition	T²&SPE	1.3 hours	491
18		Hanging	T²&SPE	1.5 hours	474
19	No. 5	Hanging	T²&SPE	6.5 hours	1611
20		Channeling	T²&SPE	0.6 hours	326
21		Hanging	T²&SPE	3.3 hours	990

Motivated by a study on predictive maintenance,²⁵⁾ which fused the historical data of different aircraft engines all over the world to address the difficulty of insufficient historical faulty data for each engine, we fuse the historical data of three blast furnaces at the Liuzhou Iron & Steel Co., Ltd. to offer a more complete historical dataset for fault classification, which may cover more abnormality types and operating points than the historical data of each single blast furnace. An important reason why we can do this lies in the consideration that the majority of the monitoring variables in the three blast furnaces are common, and the reaction mechanisms of the ironmaking processes are similar.

However, the manifestation of the same type of abnormalities in different blast furnaces or at different operating points may differ in most cases. To realize the idea of data fusion in feature extraction, which is a step of fault classification, the extraction of common features for the same type of abnormalities that are robust to different operating points and different blast furnaces becomes a key problem to be solved. Note that none of the existing data-driven fault classification methods for ironmaking processes^{4,19,20,21,22,23)} addresses the problem because they directly employ the original process data as fault features.

To address the problem, we propose the use of a contribution vector as a fault feature in this paper. Contribution plots are frequently adopted by PCA-based methods to preliminarily analyze the possible cause of a detected fault, i.e., the process variable with the largest contribution value to the abnormal T² or SPE statistic will be treated as the cause of the fault.¹⁰⁾ As noted in related literature,²⁶⁾ contribution plots do not properly indicate the root cause of a fault. In contrast to existing PCA-based methods, which directly employ contribution values to perform cause analysis of faults, we employ a contribution vector that is composed of the contribution values of all process variables as the fault feature, which will be classified by a classifier. The contribution values reflect the influence of each process variable on the variation of a system from its normal state¹⁰⁾ and an abnormality of a blast furnace is a system variation, while different operating points and different blast furnaces can be regarded as various normal states of a system. Thus, contribution values may be helpful for eliminating the influence of the differences due to different operating points and different blast furnaces. To the best of our knowledge, the use of contribution values as fault features has not been observed in existing fault classification methods.

Many state-of-the-art classification algorithms, such as SVMs and the large margin nearest neighbor (LMNN), can be adopted to realize classification after the fault features have been extracted. In this paper, the LMNN, which is a newly developed classification algorithm that is based on metric learning,²⁷⁾ is adopted as the classifier. Other classification algorithms can also be adopted.

The remainder of the paper is organized as follows: Sec. 2 introduces the basic idea of PCA¹⁰⁾ and a two-stage PCA-based fault detection method for the ironmaking process proposed in our previous work,¹³⁾ which will be employed to extract the contribution vectors in this paper, as well as the LMNN-based classification method,²⁷⁾ which will be applied to train the classifier in this paper. Sec. 3 describes the dataset collected from the Liuzhou Iron & Steel Co., Ltd. Sec. 4 presents the complete PCA-LMNN-based method that is proposed in this paper for the classification of abnormalities of blast furnaces at the Liuzhou Iron & Steel Co., Ltd. and the experimental results that are based on real historical data. Sec. 5 concludes with a discussion of the results and future studies.

2. Preliminaries

2.1. A Brief Introduction to the PCA-Based Fault Detection Method

Let X denote a data matrix for L observations of q variables; X can be decomposed as,²⁸⁾

$$X = T P^{T} = \sum_{i=1}^{q} t_i p_i^{T} = \sum_{i=1}^{s} t_i p_i^{T} + E \tag{1}$$

where t_i, p_i, (i=1, ···, q), and E denote the score vector, the loading vector and the residual matrix, respectively, and s denotes the number of the main score vectors that represent the majority of information about X. Please refer to literature²⁸⁾ for details.

Hotelling’s T² statistic and SPE statistic have been extensively employed by PCA-based monitoring methods, which are defined as¹⁰⁾

$$T^{2}(k) = x(k)^{T} P_s \Sigma_s^{-2} P_s^{T} x(k) \tag{2}$$

and

$$SPE(k) = r(k)^{T} r(k), \quad \text{with } r(k) = x(k) - \hat{x}(k), \quad \hat{x}(k) = P_s P_s^{T} x(k) \tag{3}$$

where x(k) denotes a newly arriving observation at sampling instant k, P_s is a matrix that is composed of the first s columns of P (i.e., p_i for 1≤i≤s), and Σ_s is a diagonal matrix that is composed of the s largest singular values of X. For a given confidence level α, the corresponding thresholds for the T² statistic and SPE statistic can be calculated according to literature.²⁸⁾ For the newly arriving observation x(k), if its corresponding T²(k) or SPE(k) exceeds the threshold, a fault alarm is given.

In addition, after a fault alarm has been given, the contribution of each individual process variable x_i(k)(i=1, ···, q) to the abnormal T² statistic or SPE statistic at the k^th sampling instant can be calculated as²⁶⁾

$$\mathrm{cont}_{T^2,i}(k) = \left(\xi_i^{T} P_s \Sigma_s^{-1} P_s^{T} x(k)\right)^{2}, \qquad \mathrm{cont}_{SPE,i}(k) = \left(\xi_i^{T} r(k)\right)^{2} = r_i(k)^{2}, \qquad i=1,\dots,q \tag{4}$$

where ξ_i=[0 0···1···0]^T denotes the i^th column of an identity matrix. Because $T^{2}(k)=\sum_{i=1}^{q}\mathrm{cont}_{T^2,i}(k)$ and $SPE(k)=\sum_{i=1}^{q}\mathrm{cont}_{SPE,i}(k)$, the percentages of $\mathrm{cont}_{T^2,i}(k)$ and $\mathrm{cont}_{SPE,i}(k)$ in the corresponding statistic (i.e., T² and SPE) reflect the influence of the i^th variable on the variations in the principal space and the residual space, respectively. In traditional PCA-based methods, the variable with the largest contribution to abnormal statistics is usually treated as the cause of the abnormality,¹⁰⁾ which is a preliminary cause analysis instead of a root cause analysis, as mentioned in literature.²⁶⁾
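The statistics in (2) and (3) and the contributions in (4) can be illustrated with a minimal NumPy sketch. The function names `fit_pca` and `statistics_and_contributions` are hypothetical, and two choices are assumptions rather than details taken from the paper: the data are standardized before the decomposition, and the singular values are divided by √(L−1) so that Σ_s² holds the eigenvalues of the sample covariance. Threshold calculation is omitted.

```python
import numpy as np

def fit_pca(X, s):
    """Fit a PCA monitoring model on normal data X (L observations x q variables)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                      # assumption: standardized variables
    # Rows of Vt are the loading vectors p_i of the scaled data.
    _, sing, Vt = np.linalg.svd(Z, full_matrices=False)
    P_s = Vt[:s].T                            # q x s loading matrix
    # Assumption: scale singular values so sigma_s**2 are covariance eigenvalues.
    sigma_s = sing[:s] / np.sqrt(X.shape[0] - 1)
    return mean, std, P_s, sigma_s

def statistics_and_contributions(x, model):
    """Return T2, SPE and the per-variable contributions for one observation x."""
    mean, std, P_s, sigma_s = model
    z = (x - mean) / std
    t2 = z @ P_s @ np.diag(sigma_s ** -2.0) @ P_s.T @ z         # Eq. (2)
    r = z - P_s @ (P_s.T @ z)                                    # residual in Eq. (3)
    spe = r @ r
    cont_t2 = (P_s @ np.diag(1.0 / sigma_s) @ P_s.T @ z) ** 2    # Eq. (4), T2 part
    cont_spe = r ** 2                                            # Eq. (4), SPE part
    return t2, spe, cont_t2, cont_spe
```

Because P_s has orthonormal columns, the contributions returned by this sketch sum to the corresponding statistic, which matches the identity stated above.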

Unknown switching between blast stoves at the Liuzhou Iron & Steel Co., Ltd. introduces severe disturbances to both the modelling procedure and the monitoring procedure if a standard PCA-based method is applied to fault detection in the ironmaking process. Thus, a two-stage PCA-based method was proposed in literature¹³⁾ to address this issue.

In literature,¹³⁾ the first-stage PCA is designed to achieve effective identification, location and removal of the switching disturbances via the comparison of the T² statistic with a threshold in the hypothesis testing framework. All identified disturbances are removed to construct a new training set without disturbances. The second-stage PCA is equivalent to the standard PCA-based monitoring method, with the exception that a procedure to distinguish the switching disturbances from anomalies is added. Interested readers may refer to literature¹³⁾ for a detailed algorithm.

2.2. A Brief Introduction to LMNN-Based Classification

2.2.1. Offline Training Model of LMNN

LMNN, which is proposed by Weinberger et al.,²⁷⁾ is an extensively applied metric learning method. The basic model is introduced as follows.

Let $\{(\vec{x}_i, y_i)\}_{i=1}^{n}$ denote a training set of n labeled samples with the variable vectors $\vec{x}_i \in R^{q}$ and the class labels y_i ∈{1,2,…,m}. LMNN defines the squared distance between $\vec{x}_i$ and $\vec{x}_j$ as²⁷⁾

$$D_M(\vec{x}_i,\vec{x}_j) = (\vec{x}_i-\vec{x}_j)^{T} M (\vec{x}_i-\vec{x}_j), \tag{5}$$

where the Mahalanobis metric M is a positive semidefinite matrix to be determined. The following two terms, which were introduced by Weinberger et al.,²⁷⁾ illustrate how to learn the matrix M in LMNN.

• Target neighbors: For each sample $\vec{x}_i$ with label y_i, its target neighbors are the c nearest samples with the same label y_i, determined by Euclidean distance. The number of target neighbors c should be predetermined.

• Impostors: For a sample $\vec{x}_i$ with label y_i and its farthest target neighbor $\vec{x}_j$, an impostor is any sample $\vec{x}_k$ with label $y_k \neq y_i$ such that

$$D_M(\vec{x}_i,\vec{x}_k) \le D_M(\vec{x}_i,\vec{x}_j) + C \tag{6}$$

where C>0, which is usually 1.

The objective of LMNN is to learn a matrix M that minimizes the distance between each training sample with its “target neighbors” while maximizing the distance between each training sample with its “impostors”, which can be realized by solving the following semidefinite program (SDP)²⁷⁾

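$$\begin{aligned} \min_{M} J = \min_{M}\;\; & (1-\mu)\sum_{(\vec{x}_i,\vec{x}_j)\in S} D_M(\vec{x}_i,\vec{x}_j) + \mu\sum_{i,j,k}\xi_{ijk} \\ \text{s.t.}\;\; & (1)\;\; D_M(\vec{x}_i,\vec{x}_k) - D_M(\vec{x}_i,\vec{x}_j) \ge 1-\xi_{ijk} \quad \forall(\vec{x}_i,\vec{x}_j,\vec{x}_k)\in R \\ & (2)\;\; \xi_{ijk}\ge 0 \\ & (3)\;\; M \succeq 0. \end{aligned} \tag{7}$$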

where J is the loss function, μ∈[0,1] is the weighting parameter that controls the “pull/push” trade-off and {ξ_ijk} are slack variables. S and R are defined as follows:

$$S=\{(\vec{x}_i,\vec{x}_j):\; y_i=y_j \text{ and } \vec{x}_j \text{ belongs to the } c \text{ nearest neighbors of } \vec{x}_i\}$$

$$R=\{(\vec{x}_i,\vec{x}_j,\vec{x}_k):\; (\vec{x}_i,\vec{x}_j)\in S,\; y_i\neq y_k\}.$$

Therefore, a classifier can be trained offline by solving (7) using collected historical data with known labels as the training set, which corresponds to the offline phase of the scheme of fault classification, as mentioned in Sec. 1.
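To make the roles of S, R and the slack variables concrete, the following sketch evaluates the loss J of (7) for a given matrix M. It does not solve the SDP: at the optimum each slack equals the hinge value [1 + D_M(x_i, x_j) − D_M(x_i, x_k)]_+, so the sketch substitutes that expression directly. The function names and the default values c=1 and μ=0.5 are illustrative assumptions.

```python
import numpy as np

def sq_dist(M, a, b):
    """Squared distance D_M(a, b) of Eq. (5)."""
    d = a - b
    return float(d @ M @ d)

def target_neighbors(X, y, c):
    """For each sample, indices of its c nearest same-label samples (Euclidean)."""
    n = len(X)
    neighbors = []
    for i in range(n):
        same = [j for j in range(n) if j != i and y[j] == y[i]]
        same.sort(key=lambda j: np.sum((X[i] - X[j]) ** 2))
        neighbors.append(same[:c])
    return neighbors

def lmnn_loss(M, X, y, c=1, mu=0.5):
    """Evaluate J in (7), with every slack replaced by its optimal hinge value."""
    tn = target_neighbors(X, y, c)
    pull, push = 0.0, 0.0
    for i in range(len(X)):
        for j in tn[i]:                       # pairs (i, j) in S
            d_ij = sq_dist(M, X[i], X[j])
            pull += d_ij
            for k in range(len(X)):           # triples (i, j, k) in R
                if y[k] != y[i]:
                    push += max(0.0, 1.0 + d_ij - sq_dist(M, X[i], X[k]))
    return (1.0 - mu) * pull + mu * push
```

A solver would minimize this quantity over positive semidefinite M; the sketch only shows what is being minimized.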

2.2.2. Energy-Based Online Classification

In this paper, energy based classification²⁹⁾ is adopted. In contrast with distance-based classification,²⁷⁾ which employs the optimal matrix M, that minimizes the loss function J in (7) as a metric, the energy-based classification directly employs the loss function J as a classifier.²⁹⁾

As mentioned in Sec. 1, the classification procedure is implemented online, i.e., for a newly arriving sample $\vec{x}_t$ with an unknown label, energy-based classification considers it as an extra training sample and computes the loss function J for each possible label y_t with the optimal matrix M that is obtained by solving the SDP (7). Then, the unlabeled sample is assigned the label y_optimal that minimizes the total loss function,²⁹⁾ i.e.,

$$y_{\mathrm{optimal}} = \arg\min_{y_t} \Big\{ (1-\mu)\sum_{(\vec{x}_t,\vec{x}_j)\in S} D_M(\vec{x}_t,\vec{x}_j) + \mu \Big( \sum_{(\vec{x}_i,\vec{x}_j,\vec{x}_t)\in R} \big[1 + D_M(\vec{x}_i,\vec{x}_j) - D_M(\vec{x}_i,\vec{x}_t)\big]_{+} + \sum_{(\vec{x}_t,\vec{x}_j,\vec{x}_k)\in R} \big[1 + D_M(\vec{x}_t,\vec{x}_j) - D_M(\vec{x}_t,\vec{x}_k)\big]_{+} \Big) \Big\} \tag{8}$$

where $[a]_{+}=\max(a,0)$.
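The decision rule in (8) can be sketched as follows, assuming that M has already been learned offline. The helper names (`nearest_same_label`, `energy`, `classify`) and the defaults c=1 and μ=0.5 are hypothetical; the three terms in `energy` correspond to the three sums in (8).

```python
import numpy as np

def sq_dist(M, a, b):
    """Squared distance D_M(a, b) of Eq. (5)."""
    d = a - b
    return float(d @ M @ d)

def nearest_same_label(X, y, x, label, c, exclude=None):
    """Indices of the c training samples with `label` closest to x (Euclidean)."""
    idx = [j for j in range(len(X)) if y[j] == label and j != exclude]
    idx.sort(key=lambda j: np.sum((x - X[j]) ** 2))
    return idx[:c]

def energy(M, X, y, x_t, y_t, c=1, mu=0.5):
    """Loss added by treating x_t as a training sample with hypothesized label y_t."""
    tn_t = nearest_same_label(X, y, x_t, y_t, c)
    pull = sum(sq_dist(M, x_t, X[j]) for j in tn_t)          # first sum in (8)
    push = 0.0
    for i in range(len(X)):
        if y[i] == y_t:
            continue
        # x_t acts as an impostor for training sample i (second sum in (8)).
        for j in nearest_same_label(X, y, X[i], y[i], c, exclude=i):
            push += max(0.0, 1.0 + sq_dist(M, X[i], X[j]) - sq_dist(M, X[i], x_t))
        # Training sample i acts as an impostor for x_t (third sum in (8)).
        for j in tn_t:
            push += max(0.0, 1.0 + sq_dist(M, x_t, X[j]) - sq_dist(M, x_t, X[i]))
    return (1.0 - mu) * pull + mu * push

def classify(M, X, y, x_t, c=1, mu=0.5):
    """Return the label with the smallest energy, as in (8)."""
    return min(sorted(set(y)), key=lambda lab: energy(M, X, y, x_t, lab, c, mu))
```

In the usage check accompanying this sketch, the identity matrix stands in for a learned M purely for illustration.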

3. Description of the Monitored Blast Furnaces and the Historical Data

In this paper, only the problem of fault classification of the three blast furnaces at the Liuzhou Iron & Steel Co., Ltd. is considered because a two-stage PCA-based method has been provided in literature^13,14) for their fault detection. The volumes of the blast furnaces are 2650 m³, 2000 m³ and 1500 m³.

Table 1 shows the 35 main process variables of the three blast furnaces that are monitored. Although most variables are common to the three blast furnaces, three variables are available only for Blast Furnaces No. 3 and No. 5, and five variables are available only for Blast Furnace No. 2. Please refer to the footnotes of Table 1 for details.

Table 1. Variable list of the dataset.

No.	Variable	No.	Variable
1	Oxygen enrichment rate (%)	19	Cold blast pressure(2)² (MPa)
2	Blast furnace permeability index	20	Total pressure drop (kPa)
3	CO volume¹ (%)	21	Hot blast pressure(1) (MPa)
4	H₂ volume¹ (%)	22	Hot blast pressure(2)² (MPa)
5	CO₂ volume¹ (%)	23	Actual blast velocity (m/s)
6	Blast velocity at tuyere of blast furnace (m/s)	24	Cold blast temperature² (°C)
7	Enriching oxygen flow (m³/h)	25	Hot blast temperature (°C)
8	Cold blast flow (10⁴ m³/h)	26	Top temperature (1) (°C)
9	Blast momentum (KJ)	27	Top temperature (2) (°C)
10	Blast furnace bosh gas volume (m³)	28	Top temperature (3) (°C)
11	BF bosh gas index	29	Top temperature (4) (°C)
12	Theoretical combustion temperature (°C)	30	Downcomer temperature² (°C)
13	Blast furnace top gas pressure (1) (kPa)	31	Drag coefficient
14	Blast furnace top gas pressure (2) (kPa)	32	Blast humidity
15	Blast furnace top gas pressure (3) (kPa)	33	Coal injection set value (t/h)
16	Blast furnace top gas pressure (4)² (kPa)	34	Actual coal injection rate (t/h)
17	Enriching oxygen pressure (MPa)	35	Actual coal injection in last hour (t)
18	Cold blast pressure(1) (MPa)

1. Variables only available for Blast Furnaces No. 3 and No. 5.

2. Variables only available for Blast Furnace No. 2.

A total of 21 abnormalities are confirmed and recorded for Blast Furnaces No. 2, 3 and 5 from 2012 to 2014, which cover four abnormality types. Table 2 lists the number of times each type of abnormality occurs in each blast furnace. Their corresponding data have been employed to validate the performance of the two-stage PCA-based fault detection method in literature^13,14) and will be utilized to validate the performance of the fault classification proposed in this paper.

Table 2. Summation of the abnormalities.

Abnormalities	Blast Furnace
Abnormalities	No. 2	No. 3	No. 5
Hanging	9	5	2
Channeling	1	1	1
Slipping	0	1	0
Cold furnace condition	0	1	0

In our previous work,¹⁴⁾ all 21 abnormalities have been successfully detected by the T² or SPE statistic of the two-stage PCA. The results are listed in Table 3. For each abnormality, the column Alarm Statistics lists the statistic or statistics (i.e., T², SPE, or both) that exceed the threshold and raise an alarm. The column Lead Time indicates how much earlier the two-stage PCA-based method generates an alarm than the operators.¹⁴⁾

In this paper, the data that correspond to each abnormality of Table 3 will be employed as the abnormal data to train or test the classification method. The starting time is set as the alarm time of the two-stage PCA-based method,¹⁴⁾ and the ending time is set earlier than the time during which operators confirmed the abnormalities. In Table 3, the sample size for each abnormality indicates how many abnormal samples are included in the abnormal data, which is determined by its starting time and ending time.

If the corresponding abnormal samples for each abnormality are employed for training, the known abnormality type will be involved in the training. If the corresponding abnormal samples for each abnormality are employed for testing, then the abnormality type is assumed to be unknown to the algorithm and will be employed to evaluate whether the classification results of the proposed method are correct.

4. Fault Classification Based on the Contribution Vector and LMNN

4.1. Basic Idea and Complete Scheme

Figure 1 shows the scheme of the fault classification method that is proposed in this paper, which includes an offline training phase and an online classification phase, as mentioned in Sec. 1.

Fig. 1. Scheme of fault classification.

As shown in Fig. 1, using the abnormal samples for training that are listed in Table 3 in the offline training procedure of the proposed method, the abnormal samples for each abnormality of each blast furnace are transformed via a contribution vector-based feature extraction algorithm, which will be introduced in Sec. 4.2. All extracted fault features (i.e., contribution vectors) and their known abnormality types are employed to train the classifier, which will be introduced in Sec. 4.3.

In the online classification procedure of the proposed method, as shown in Fig. 1, the contribution vector for a newly arriving abnormal sample of a blast furnace, whose abnormality type is unknown, is calculated by the feature extraction algorithm. Then, the trained classifier will use the contribution vector as input and calculate the output, which indicates which type of abnormality has occurred. Note that we assume that a newly arriving sample must be abnormal because the fault classification procedure is activated by a fault detection procedure, i.e., only after a fault detection algorithm has given an alarm about the occurrence of an abnormality, the newly arriving samples will be employed as the input of the classifier. As mentioned in Sec. 1 and Sec. 3, this paper only focuses on the fault classification, whose detailed algorithms are provided in Sec. 4.3, because the two-stage PCA-based fault detection method for the three blast furnaces has been discussed in literature.¹⁴⁾

4.2. Fault Feature Extraction Based on Variables’ Contributions

As mentioned in Sec. 1, blast furnaces may operate at different operating points and the three blast furnaces are different, which leads to different manifestations of even the same type of abnormality. Therefore, features that are robust to different blast furnaces and different operating points should be extracted.

PCA is a statistical technique that builds a linear model of a process by mapping the original process variables to a set of uncorrelated components through an orthogonal transformation. The data space of the original process is divided into two parts: the principal space and the residual space, which can be monitored by the T² and SPE statistics, respectively. Furthermore, the contributions of the variables to the two statistics can be calculated, and different contributions of the corresponding variables reflect the variation of the process. As mentioned in Sec. 2, the contribution values reflect the influence of each process variable on the variation of a system from its normal state. Because an abnormality of a blast furnace is a variation from the current operating point of the ironmaking process, whereas different operating points and different blast furnaces can be regarded as various normal states of the system, contribution values should be helpful for extracting common features from the data that correspond to the same type of abnormalities regardless of different blast furnaces or different operating points.

In the first step of classification in this paper (refer to Fig. 1), i.e., feature extraction, we propose the use of the contribution vectors, i.e.,

$$\begin{aligned} \mathrm{Cont}_{T^2}(k) &= \left(\mathrm{cont}_{T^2,1}(k),\cdots,\mathrm{cont}_{T^2,i}(k),\cdots,\mathrm{cont}_{T^2,35}(k)\right) \\ \mathrm{Cont}_{SPE}(k) &= \left(\mathrm{cont}_{SPE,1}(k),\cdots,\mathrm{cont}_{SPE,i}(k),\cdots,\mathrm{cont}_{SPE,35}(k)\right) \end{aligned} \tag{9}$$

as the feature vectors, where $\mathrm{cont}_{T^2,i}(k)$ and $\mathrm{cont}_{SPE,i}(k)$, i=1,2,···,35, denote the contributions of variable No. i in Table 1 to the abnormal T² and SPE statistics, respectively, which can be calculated according to (4).

In the following, a numerical simulation example is given to illustrate the benefits of the contribution vector.

Let

$$\mu_1=\begin{pmatrix}1\\8\end{pmatrix},\quad \Sigma_1=\begin{pmatrix}1&0.1\\0.1&1\end{pmatrix},\quad \mu_2=\begin{pmatrix}8\\1\end{pmatrix},\quad \Sigma_2=\begin{pmatrix}1&0.11\\0.11&1.21\end{pmatrix},\quad f_1=\begin{pmatrix}0\\4\end{pmatrix} \text{ and } f_2=\begin{pmatrix}4\\0\end{pmatrix}.$$

Then, let $z_{i,j}=y_i+f_j$, i=1,2; j=1,2, denote a two-dimensional system that operates at two possible operating points with two possible types of faults, where $y_i \sim N(\mu_i, \Sigma_i)$ for i=1,2 denotes normal process operation at the i^th operating point, $f_j \in R^{2}$ for j=1,2 denotes the j^th type of fault, and $z_{i,j}$ denotes the process variable that operates at the i^th operating point with the j^th type of fault.

Figure 2 displays the simulation results for two normal cases, i.e., y₁ and y₂, in which each case has 1000 samples, and four faulty cases, i.e., z_1,1, z_1,2, z_2,1, z_2,2, in which each case has 50 samples.

Fig. 2. Two types of faults at two operating points. (Online version in color.)

In Fig. 2, points of the same color should be classified into the same group, i.e., the points for z_1,1 and z_2,1 (black) should form one group and the points for z_1,2 and z_2,2 (red) should form another. However, the red points for z_1,2 are located between the two sets of black points (for z_1,1 and z_2,1), and the black points for z_2,1 are located between the two sets of red points (for z_1,2 and z_2,2). Because a classifier usually assigns similar data to the same group, it has difficulty yielding correct classification results: it may group z_1,2 with z_1,1 or with z_2,1, and either possibility may further lead to a different wrong classification for z_2,2.

Let $z_{i,j}^{1}$ and $z_{i,j}^{2}$ denote the two elements of $z_{i,j}$, i=1,2; j=1,2. Let $\mathrm{cont}_{T^2,z_{i,j}^{1}}(k)$ and $\mathrm{cont}_{T^2,z_{i,j}^{2}}(k)$ denote the contributions of $z_{i,j}^{1}$ and $z_{i,j}^{2}$ to the T² statistic, respectively, which are calculated according to (4). Define the contribution vector as $\mathrm{Cont}_{z_{i,j}}(k)=\left(\mathrm{cont}_{T^2,z_{i,j}^{1}}(k),\ \mathrm{cont}_{T^2,z_{i,j}^{2}}(k)\right)$, where k=1,···,50. Figure 3 shows the points $\mathrm{Cont}_{z_{i,j}}$, which have a one-to-one relationship with the points $z_{i,j}$ in Fig. 2. We use black stars and black circles to denote $\mathrm{Cont}_{z_{1,1}}$ and $\mathrm{Cont}_{z_{2,1}}$, respectively, and use red stars and red circles to denote $\mathrm{Cont}_{z_{1,2}}$ and $\mathrm{Cont}_{z_{2,2}}$, respectively. Via the transformation, the two types of black points are situated near the vertical axis, and the two types of red points are situated near the horizontal axis. Thus, the black and red points are separated, and the influence of the operating points is eliminated, which enables any type of classifier to make a correct classification with these new points (i.e., contribution vectors) as fault features.

Fig. 3. Contributions of two variables. (Online version in color.)
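The simulation can be approximated with the sketch below, which draws the normal and faulty samples and computes the T² contributions of both variables. Several details are assumptions because the text does not state them: a separate PCA model is fitted on the 1000 normal samples of each operating point, both components are retained (s = q = 2), the data are mean-centered without rescaling, and the random seed is arbitrary. The function name `t2_contributions` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, not from the paper
mu = {1: np.array([1.0, 8.0]), 2: np.array([8.0, 1.0])}
Sigma = {1: np.array([[1.0, 0.1], [0.1, 1.0]]),
         2: np.array([[1.0, 0.11], [0.11, 1.21]])}
f = {1: np.array([0.0, 4.0]), 2: np.array([4.0, 0.0])}

def t2_contributions(normal, faulty):
    """T2 contributions per Eq. (4), keeping both components (assumed s = 2)."""
    mean = normal.mean(axis=0)
    _, sing, Vt = np.linalg.svd(normal - mean, full_matrices=False)
    sigma = sing / np.sqrt(len(normal) - 1)        # covariance-scale singular values
    A = Vt.T @ np.diag(1.0 / sigma) @ Vt           # P Sigma^{-1} P^T (symmetric)
    return ((faulty - mean) @ A.T) ** 2            # one row of contributions per sample

features = {}
for i in (1, 2):
    normal = rng.multivariate_normal(mu[i], Sigma[i], size=1000)       # y_i
    for j in (1, 2):
        z = rng.multivariate_normal(mu[i], Sigma[i], size=50) + f[j]   # z_{i,j}
        features[(i, j)] = t2_contributions(normal, z)
```

Under these assumptions, the first fault type yields contribution vectors dominated by the second variable at both operating points, and the second fault type yields vectors dominated by the first variable, which is the separation described for Fig. 3.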

An additional merit of using a contribution vector as a fault feature is that its use is consistent with the practical experience of operators because operators usually classify abnormalities according to the trends of key variables that deviated from their current running state regardless of different operating points, which indicates that operators believe that the “symptoms” of the same type of abnormalities are similar regardless of the operating point.

We may normalize the feature vector at sampling instant k as

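$$\mathrm{Cont\_normalized}_{*}(k) = \frac{\left[\mathrm{cont}_{*,1}(k),\dots,\mathrm{cont}_{*,i}(k),\dots,\mathrm{cont}_{*,35}(k)\right]}{\sqrt{\sum_{i=1}^{35}\mathrm{cont}_{*,i}(k)^{2}}} \tag{10}$$

to guarantee that $\left\|\mathrm{Cont\_normalized}_{*}(k)\right\|_{2}=1$, where the subscript ‘*’ denotes T² or SPE and has the same meaning in (11).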

To reduce the data scale and suppress the effect of noise, the variables’ contributions can be smoothed by averaging over a ten-sample interval, i.e.,

$$\mathrm{Cont\_aver}_{*}(k) = \frac{1}{10}\sum_{i=1}^{10}\mathrm{Cont\_normalized}_{*}(10k+i-10) \tag{11}$$

Therefore, the sample sizes of the contribution vectors in Sec. 4 are approximately one-tenth of the sample sizes of the abnormal samples listed in the column Sample Size of Table 3 in Sec. 3.
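The normalization in (10) and the ten-sample averaging in (11) can be sketched as below. The function names are hypothetical, and two behaviors are assumptions: every contribution vector is nonzero (which is plausible after an alarm but not guaranteed by the text), and a trailing block with fewer than ten samples is dropped.

```python
import numpy as np

def normalize_rows(cont):
    """Eq. (10): scale each contribution vector (one per row) to unit 2-norm."""
    norms = np.linalg.norm(cont, axis=1, keepdims=True)
    return cont / norms  # assumes no all-zero row

def block_average(cont_normalized, width=10):
    """Eq. (11): average consecutive, non-overlapping blocks of `width` rows."""
    n_blocks = cont_normalized.shape[0] // width
    trimmed = cont_normalized[: n_blocks * width]   # assumption: drop the remainder
    return trimmed.reshape(n_blocks, width, -1).mean(axis=1)
```

For an abnormality with 296 samples, such as abnormality No. 4 in Table 3, this sketch would return 29 averaged feature vectors.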

The algorithm to extract contribution vectors as fault features, whose scheme is shown in Fig. 4, can be summarized as follows.

Fig. 4. Algorithm flowchart of feature extraction algorithm.

Algorithm 1: Feature extraction

1. For abnormal samples x(k), k=1,⋯,N of an abnormality, calculate the T² and SPE statistic based on a two-stage PCA method according to (2) and (3). Determine which statistic exceeds the corresponding threshold.

2. Calculate the contribution of the i^th variable to the abnormal statistic, $\mathrm{cont}_{*,i}(k)$, i=1,⋯,35, according to (4), and compose the contribution vectors $\mathrm{Cont}_{*}(k)$, k=1,⋯,N, according to (9).

3. Normalize the contribution vectors according to (10).

4. Smooth the contribution vectors by averaging over ten-sample intervals according to (11), and obtain the final contribution vectors $\mathrm{Cont\_aver}_{*}(k)$, k=1,⋯,N/10, as the fault feature.

Remark 1: For abnormal samples of an abnormality, Algorithm 1 may generate one set of contribution vectors—T²-based or SPE-based—or two sets of contribution vectors—both T²-based and SPE-based—depending on which statistic exceeds its threshold, as shown in Fig. 4.

According to (9), to fuse the data of the three blast furnaces, we need to calculate the contribution values of all 35 variables in Table 1 for each blast furnace based on its process data. However, slight differences exist among the variable lists of the three blast furnaces (see the footnotes of Table 1): variables No. 16, 19, 22, 24 and 30 exist only in Blast Furnace No. 2 and are missing in Blast Furnaces No. 3 and No. 5, whereas variables No. 3, 4 and 5 exist only in Blast Furnaces No. 3 and No. 5 and are missing in Blast Furnace No. 2. Thus, reasonable values for the contributions of the missing variables to the statistics are needed. In this paper, their contributions are set to 0, which is a simple but effective approach.
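A minimal sketch of the zero-filling step is given below, assuming that the contributions of a furnace are computed only for the variables it actually measures and are then placed into a common 35-dimensional vector. The function name `embed_contributions` and the `available` argument are hypothetical.

```python
import numpy as np

N_VARIABLES = 35  # number of variables in Table 1

def embed_contributions(cont, available):
    """Place contributions of the available variables into 35-dim vectors.

    cont      : array of shape (n_samples, len(available))
    available : 1-based variable numbers from Table 1 measured by this furnace
    Missing variables keep a contribution of 0.
    """
    full = np.zeros((cont.shape[0], N_VARIABLES))
    full[:, np.asarray(available) - 1] = cont   # convert to 0-based columns
    return full
```

For a furnace that lacks variables No. 3, 4 and 5, `available` would be `[v for v in range(1, 36) if v not in (3, 4, 5)]`, and the corresponding three columns of the result stay at zero.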

4.3. Offline Training and Online Classification Algorithms

Let m=4 denote the total number of abnormality types and let K denote the total number of abnormalities covered by the abnormal samples employed for training. According to Fig. 5, which displays the flowchart of the offline training algorithm, K=k₁+k₂+k₃, where k₁, k₂ and k₃ are the numbers of abnormalities occurring in Blast Furnaces No. 2, No. 3 and No. 5, respectively; that is, K is the number of all abnormalities collected from the three blast furnaces. As in the majority of classification methods, we use the integer y_i to represent the type of the i^th abnormality, where y_i=1,⋯,4 represents hanging, channeling, slipping and cold furnace condition, respectively. The i^th abnormality, i=1,⋯,K, comprises N_i samples.

Fig. 5.

Algorithm flowchart of offline training algorithm.

The offline training algorithm can be summarized as follows:

Algorithm 2: Offline training

1. For the i^th abnormality in Fig. 5, i=1,⋯,K, perform feature extraction on the N_i abnormal samples according to Algorithm 1, which is given in Sec. 4.2, and obtain the T²-based contribution vectors Cont_aver_{T²,i}(k) and/or (according to which statistic is abnormal) the SPE-based contribution vectors Cont_aver_{SPE,i}(k), where k=1,⋯,N_i/10.

2. Label the contribution vectors Cont_aver_{T²,i}(k) and/or Cont_aver_{SPE,i}(k), where k=1,⋯,N_i/10, for the i^th abnormality with y_i (i.e., the known type of the i^th abnormality).

3. Use all labeled contribution vectors Cont_aver_{T²,i}(k) and/or Cont_aver_{SPE,i}(k), where k=1,⋯,N_i/10 and i=1,⋯,K, to train the two classifiers based on the LMNN technique by solving the SDP according to (7).

As mentioned in Sec. 1, a complete fault diagnosis procedure includes fault detection and fault classification. However, the online classification procedure will not be activated until the two-stage PCA method gives an alarm, i.e., until the T² statistic or the SPE statistic for newly arriving samples exceeds the corresponding threshold. Thus, the online classification procedure assumes that each newly arriving sample that is fed into the online classification algorithm is an abnormal sample with an unknown abnormality type.

The online classification algorithm, which is shown in Fig. 6, can be summarized as follows.

Fig. 6.

Algorithm flowchart of online classification algorithm.

Algorithm 3: Online classification

1. For every ten newly arriving samples (because the width of the moving window is set to 10 according to (11)), calculate the corresponding contribution vector Cont_aver_*(k), where ‘*’ denotes the abnormal statistic, i.e., T² or SPE, by Algorithm 1 in Sec. 4.2.

2. According to whether the contribution vector is based on T² or SPE, select the corresponding T²- or SPE-based classifier that is trained by Algorithm 2 and calculate its output according to (8) with the feature vector Cont_aver_*(k) as an input. Then, the output of the classifier, which is represented by the number y_*(k)∈{1,2,3,4}, gives the classified abnormality type.
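Step 2 can be sketched as follows. This is a minimal sketch that assumes the decision rule in (8) is the usual nearest-neighbor vote under the learned LMNN metric, i.e., distances are measured as ‖L(x−x_i)‖² with a learned linear map L; the exact form of (8) may differ. The dictionary layout and the names `knn_classify` and `classify_online` are hypothetical.

```python
from collections import Counter
import numpy as np

def knn_classify(x, L, X_train, y_train, c=3):
    """Majority vote among the c nearest training vectors under ||L(x - x_i)||^2."""
    diffs = (np.asarray(X_train) - np.asarray(x)) @ np.asarray(L).T
    d2 = (diffs ** 2).sum(axis=1)
    nearest = np.argsort(d2, kind="stable")[:c]
    votes = Counter(np.asarray(y_train)[nearest].tolist())
    return votes.most_common(1)[0][0]

def classify_online(x, statistic, classifiers, c=3):
    """Select the T2- or SPE-based classifier and return the type in {1,2,3,4}."""
    L, X_train, y_train = classifiers[statistic]
    return knn_classify(x, L, X_train, y_train, c=c)

# Toy data in 2 dimensions (real feature vectors have 35 entries).
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_toy = np.array([1, 1, 1, 2, 2, 2])
classifiers = {"T2": (np.eye(2), X_toy, y_toy), "SPE": (np.eye(2), X_toy, y_toy)}
```

Keeping one (L, training set) pair per statistic mirrors the two separately trained classifiers of Algorithm 2.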

Because an abnormality persists until it is handled by an operator, a newly occurring abnormality produces a succession of abnormal samples and therefore a succession of classification results, which may not always be consistent. In addition, the classification results of T²-based contribution vectors may differ from the classification results of SPE-based contribution vectors. Thus, a final decision is necessary to determine the type of the abnormality from the multiple, possibly inconsistent classification results given by T²- and SPE-based contribution vectors for multiple samples. The decision logic is as follows:

Algorithm 4: Decision logic

1. Set the length threshold L and the majority threshold β.

2. Let M(j) denote the number of times that Type j appears in the L classification results for j=1,⋯,4. Let j_m denote the type that appears most often, i.e., M(j_m)=max_{1≤j≤4}[M(j)]. Then, determine M(j_m) and j_m. In some cases, both the T² and SPE statistics may give alarms in the fault detection stage, which causes Algorithm 3 to yield classification results for both T²- and SPE-based contribution vectors, i.e., M(j_{m,T²}) and M(j_{m,SPE}). A simple and direct method for fusing these two types of classification results is to let j_m=j_{m,*}, where ‘*’ denotes T² or SPE and M(j_{m,*})=max[M(j_{m,T²}), M(j_{m,SPE})]. Then, let M(j_m)=max[M(j_{m,T²}), M(j_{m,SPE})].

3. If M(j_m)/L×100%>β, set the final classification result to j_m because Type j_m has appeared a sufficient number of times among the L classification results (exceeding the threshold β). If M(j_m)/L×100% does not exceed the threshold β, which indicates an insufficient number of consistent classification results, the abnormality type remains undetermined, i.e., the algorithm fails to make the classification.

4.4. Experimental Results

In this section, the proposed method will be tested based on real historical abnormal data collected from three blast furnaces at the Liuzhou Iron & Steel Co., Ltd., as shown in Table 3. Two cases are designed in this paper.

In the first case, Abnormalities No. 19, 20 and 21 of Blast Furnace No. 5 are employed as the testing set, whereas the other abnormalities, i.e., Abnormalities No. 1–No. 18 of Blast Furnaces No. 2 and No. 3, are employed as the training set. In this case, no abnormal samples for Blast Furnace No. 5 exist in the training set, which means that the three testing abnormalities are novel to Blast Furnace No. 5. Table 4 lists the sample sizes of the training set and the testing set.

Table 4. Sample sizes of the training set and testing set.

Feature Vectors	T²-based	SPE-based
Training set	1519	750
Testing set	292	292

In the second case, more abnormalities are employed to test the overall performance of the proposed method, i.e., Abnormalities No. 4, 8, 10, 11, 13, 15, 16, 18, and 21 and part of No. 12 and 17 are employed as the testing set, and Abnormalities No. 1, 2, 3, 5, 6, 7, 9, 14, 19, and 20 and the remaining part of No. 12 and 17 are employed as the training set. Because only one slipping and one cold furnace condition exist in the collected data, which correspond to Abnormalities No. 12 and No. 17, respectively, the data for Abnormalities No. 12 and 17 are each divided into two parts. The first part is employed as training data, and the second part is employed as testing data. Table 6 lists the sample sizes of the training set and the testing set.

Table 6. Sample sizes of the training set and testing set.

	T²-based Contribution Vectors	SPE-based Contribution Vectors	Original Data
Training Set	999	426	1126
Testing Set	812	616	945

4.4.1. Parameter Settings for LMNN-Based Classification

As introduced in Sec. 2, two parameters can be tuned for the LMNN algorithm: the number of target neighbors c and the weighting coefficient μ. In our study, c=3 and μ=0.5 are chosen.

As a rule of thumb, c is chosen as 1 or 3.²⁹⁾ Generally, a larger c suppresses the effect of noise on the classification. Because the value c=3 works well in our practice, c is chosen as 3 in this paper.

The weighting parameter, μ∈[0,1], controls the “pull/push” trade-off and can be tuned via cross validation. However, as reported in Weinberger’s work,²⁹⁾ the optimization of the LMNN is not sensitive to the value of μ. Moreover, the value μ=0.5 works well in our practice.
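The roles of c and μ can be illustrated by evaluating the LMNN objective for a given linear map L. This is a minimal sketch that follows the pull/push form of the loss in Weinberger and Saul,²⁹⁾ i.e., (1−μ) times the sum of squared distances to target neighbors plus μ times the hinge penalty on differently labeled samples that intrude within a unit margin; the notation and scaling may differ from (7). It assumes that target neighbors are the c nearest same-class samples in Euclidean distance and that every class has at least c+1 samples. The function names are hypothetical.

```python
import numpy as np

def target_neighbors(X, y, c=3):
    """Indices of the c nearest same-class samples for each sample."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    neighbors = np.empty((n, c), dtype=int)
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        neighbors[i] = same[np.argsort(d2[i, same])[:c]]
    return neighbors

def lmnn_loss(L, X, y, neighbors, mu=0.5):
    """(1 - mu) * pull + mu * push for the linear map L."""
    Z = X @ L.T
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    pull = 0.0
    push = 0.0
    for i in range(len(X)):
        different = y != y[i]
        for j in neighbors[i]:
            pull += d2[i, j]  # keep target neighbors close
            # Penalize differently labeled samples inside the unit margin.
            push += np.maximum(0.0, 1.0 + d2[i, j] - d2[i, different]).sum()
    return (1.0 - mu) * pull + mu * push

# Two well-separated one-dimensional classes: only the pull term is active.
X_demo = np.array([[0.0], [0.1], [10.0], [10.1]])
y_demo = np.array([0, 0, 1, 1])
tn_demo = target_neighbors(X_demo, y_demo, c=1)
loss_demo = lmnn_loss(np.eye(1), X_demo, y_demo, tn_demo, mu=0.5)
```

A larger μ puts more weight on pushing differently labeled samples out of the margin, whereas a smaller μ emphasizes compact neighborhoods.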

For the decision logic, i.e., Algorithm 4 in Sec. 4.3, the length threshold is the sample size of each abnormality, and the majority threshold is β=90%.

4.4.2. Testing Results for Case I

The classification results of the two classifiers, which are based on T²-based contribution vectors and SPE-based contribution vectors, are listed in Table 5. Regardless of which classifier is employed, the two hanging abnormalities (i.e., Abnormalities No. 19 and 21) are correctly classified with a very low error rate; however, neither classifier can correctly recognize the channeling abnormality (i.e., Abnormality No. 20). The results for hanging indicate the merit of fusing data collected from different blast furnaces for fault classification: although no abnormal samples of Blast Furnace No. 5 are included in the training set, the 14 hanging abnormalities of Blast Furnaces No. 2 and No. 3 in the training set enable the method to recognize the same type of abnormality in Blast Furnace No. 5. The channeling abnormality (i.e., Abnormality No. 20) cannot be correctly classified because only two channeling abnormalities are alarmed by the SPE statistic in the historical data for Blast Furnaces No. 2 and No. 3, which is far from sufficient for training.

Table 5. Experimental results of Case I.

Abnormality No.	Abnormality Type	Sample Size	Classification Error Rate (%), T²-based	Classification Error Rate (%), SPE-based	Correctness of the Classification Result
19	Hanging	161	0.62	0.00	True
20	Channeling	32	100.00	100.00	False
21	Hanging	99	0.00	0.00	True

Table 5 also provides the correctness of the final classification results given by Algorithm 4, which is needed because the classification results given by the two statistics (i.e., T² and SPE) at different instants may not be consistent, as mentioned in Sec. 4.3.

To illustrate the classification procedure, Fig. 7 shows the online classification process for Abnormality No. 21. Subfigure 7(a) shows all the process variables with their original amplitudes; their corresponding units are given in Table 1. Some variables overlap at the bottom of Subfigure 7(a) because their amplitudes are relatively small in comparison with the red, yellow, green and blue ones, which correspond to enriching oxygen flow, blast furnace bosh gas volume, theoretical combustion temperature and hot blast temperature, respectively. Subfigures 7(b) and 7(c) show the changes of the T² statistic and the SPE statistic over time, respectively, according to which the two-stage PCA-based method¹³⁾ generates an alarm, i.e., an abnormality is detected. The time when the abnormality is detected is marked on both Subfigures 7(b) and 7(c). Subfigure 7(d) shows the outputs of the classifiers for both T²-based and SPE-based contribution vectors.

Fig. 7.

Online classification process for Abnormality No. 21. (Online version in color.)

As shown in Fig. 7, a newly occurring abnormality is detected by the T² statistic and the SPE statistic at the 93000^th instant using the two-stage PCA method proposed in the literature.¹³⁾ Then, the classification procedure that is proposed in this paper is activated. Contribution vectors based on the T² statistic and the SPE statistic are calculated by Algorithm 1 in Sec. 4.2, with the exception of the samples that correspond to strong disturbances, which are denoted by dashed green rectangles. For details on the identification of disturbances, please refer to the literature.¹³⁾ Using Algorithm 3 in Sec. 4.3, each of the two classifiers, for the T²-based contribution vector and the SPE-based contribution vector, yields an integer that indicates the possible type of abnormality as output, where the integers 1, 2, 3 and 4 represent hanging, channeling, slipping and cold furnace condition, respectively; the outputs of the two classifiers are denoted by red stars and green stars, respectively, in Subfigure 7(d). All classification results are consistent and correct, i.e., they indicate the same abnormality type. The majority threshold β=90% that is adopted in this paper means that the type of the newly occurring abnormality can be determined if the percentage of the majority decisions exceeds 90%. For Abnormality No. 21, the decision is hanging, which is correct.

4.4.3. Testing Results for Case II

In this case, a larger number of abnormality samples are included in the testing set to evaluate the overall classification performance with limited faulty data, i.e., 11 abnormalities are employed as the testing set and 13 abnormalities are employed as the training set (as mentioned in Sec. 4.4, Abnormality No. 12 and Abnormality No. 17 are each divided into two parts, which are employed by the training set and the testing set, respectively). In addition, the proposed method with the contribution vectors as the input of the classifier is compared with the direct use of the LMNN classification method with the original abnormal samples as the input to demonstrate the advantage of contribution vectors.

Table 7 lists the classification error rates of three classifiers that are based on T²-based contribution vectors, SPE-based contribution vectors and original data, and the correctness of the final classification results given by Algorithm 4. Note that ‘/’ in Table 7 indicates that no corresponding classification results exist because the corresponding abnormalities are not alarmed by the T² or SPE statistic, which causes a difference in the sample sizes between T²-based contribution vectors and SPE-based contribution vectors.

Table 7. Experimental results of Case II.

Abnormality No.	Abnormality Type	Sample Size	Classification Error Rate (%), T²-Based	Classification Error Rate (%), SPE-Based	Classification Error Rate (%), Original Data	Correctness of the Classification Result, Contribution Vector-Based	Correctness of the Classification Result, Original Data-Based
4	Hanging	29	0.00	0.00	20.69	True	False
8	Hanging	84	0.00	/	0.00	True	True
10	Hanging	70	0.00	/	0.00	True	True
11	Hanging	279	2.51	10.04	38.35	True	False
12	Slipping	32	/	0.00	0.00	True	True
13	Hanging	141	0.00	/	0.00	True	True
15	Hanging	34	0.00	/	0.00	True	True
16	Channeling	101	/	0.00	66.34	True	False
17	Cold Furnace Condition	29	3.45	0.00	31.03	True	False
18	Hanging	47	0.00	10.64	57.45	True	False
21	Hanging	99	0.00	10.10	2.02	True	True

From Table 7, the following facts can be concluded:

• The overall performance of the fault classification based on contribution vectors is satisfactory. The majority of the error rates are less than 10%, and only the error rates of Abnormalities No. 11, 18 and 21 by SPE-based contribution vectors are slightly larger than 10%. Therefore, the proposed classification method yields an acceptable classification accuracy.

• The contribution vector-based classification matches or outperforms the original data-based classification for all abnormalities, except that the error rate of the original data-based classification is lower than the error rate of the SPE-based contribution vectors for Abnormality No. 21.

• According to the decision logic in Algorithm 4 of Sec. 4.3, with the majority threshold β=90%, all 11 abnormalities are correctly classified by contribution vector-based classification, whereas only six abnormalities can be correctly determined by the original data-based classification.
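The correctness columns of Table 7 can be cross-checked against the error rates with a short calculation. This is a minimal sketch under two assumptions: the share of the correct type among the L results equals 100% minus the listed error rate, and fusing the T²- and SPE-based results by the larger M(j_m) amounts to taking the lower of the two error rates, which holds when both classifiers produce the same number of results and no wrong type dominates. The names used are hypothetical.

```python
BETA = 90.0  # majority threshold in percent

# (abnormality no., T2 error %, SPE error %, original-data error %);
# None marks '/' in Table 7.
TABLE_7 = [
    (4, 0.00, 0.00, 20.69),
    (8, 0.00, None, 0.00),
    (10, 0.00, None, 0.00),
    (11, 2.51, 10.04, 38.35),
    (12, None, 0.00, 0.00),
    (13, 0.00, None, 0.00),
    (15, 0.00, None, 0.00),
    (16, None, 0.00, 66.34),
    (17, 3.45, 0.00, 31.03),
    (18, 0.00, 10.64, 57.45),
    (21, 0.00, 10.10, 2.02),
]

def passes(error_rates, beta=BETA):
    """True if the best available classifier's correct share exceeds beta."""
    available = [e for e in error_rates if e is not None]
    return bool(available) and 100.0 - min(available) > beta

contribution_ok = [no for no, t2, spe, _ in TABLE_7 if passes([t2, spe])]
original_ok = [no for no, _, _, orig in TABLE_7 if passes([orig])]
```

Under these assumptions, the check reproduces the 11 correct decisions for contribution vectors and the six correct decisions (Abnormalities No. 8, 10, 12, 13, 15 and 21) for the original data.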

Figure 7 in Sec. 4.4.2 has already illustrated the detailed detection and classification procedure for a hanging abnormality, i.e., Abnormality No. 21. To show the procedure for the other three types of abnormalities, the detection and classification procedures of Abnormalities No. 12, 16 and 17, which correspond to slipping, channeling and cold furnace condition, are given in Figs. 8, 9 and 10, respectively. The symbols in Subfigures (a)–(d) of Figs. 8, 9 and 10 have the same meanings as those in Fig. 7. To compare the classification performance based on contribution vectors with that based on original data, Subfigure (e) in Figs. 8, 9 and 10 shows the classification results based on original data. The classification based on contribution vectors outperforms the classification based on original data for Abnormalities No. 16 and 17, i.e., a channeling and a cold furnace condition, as shown in Figs. 9 and 10, whereas the two approaches give the same classification results for Abnormality No. 12, i.e., a slipping, as shown in Fig. 8.

Fig. 8.

Classification results of Abnormality No. 12. (Online version in color.)

Fig. 9.

Classification results of Abnormality No. 16. (Online version in color.)

Fig. 10.

Classification results of Abnormality No. 17. (Online version in color.)

5. Conclusions

In this paper, a PCA-LMNN-based method for the fault classification of the ironmaking processes at the Liuzhou Iron & Steel Co., Ltd. is proposed. Historical faulty data of three different blast furnaces are fused to address the lack of abundant historical data. To overcome the influence of the changes of operating points and the differences among blast furnaces, contribution vectors that are based on the T² or SPE statistic and extracted by a two-stage PCA-based method are employed as fault features. The LMNN technique, which is a suitable choice for multi-class classification, is utilized to train the classifiers. The experimental results show that the proposed method, which combines the advantages of both the multivariate statistical process control technique and the machine learning technique, can obtain the desired performance.

In addition, many interesting problems require future investigation, for example, methods for adding a new abnormality, which is identified by operators, to a historical database.

Another possible study is to improve the classification performance by introducing a nonlinear classifier. Due to the complexity of the ironmaking process, even the root causes of abnormalities that belong to the same type, which are identified by operators, may differ. Therefore, abnormalities clustering, which can be achieved by unsupervised learning techniques, would be meaningful.

Acknowledgments

This study was supported by the National Natural Science Foundation of China under Grants 61290324 and 61490701.

References

1) J. G. Peacey and W. G. Davenport: The Iron Blast Furnace: Theory and Practice, Pergamon Press, New York, (1979), 34.
2) M. Geerdes, H. Toxopeus, C. van der Vliet, R. Chaigneau and T. Vander: Modern Blast Furnace Ironmaking: An Introduction, IOS Press, Amsterdam, (2009), 4.
3) A. S. Foundation: The Making, Shaping, and Treating of Steel: Ironmaking Vol. 2, AISE Steel Foundation, Pittsburgh, (1999), 229.
4) L. Liu, A. Wang, M. Sha, X. Sun and Y. Li: ISIJ Int., 51 (2011), 1474.
5) S. Amano, T. Takarabe, T. Nakamori, H. Oda, M. Taira, S. Watanabe and T. Seki: ISIJ Int., 30 (1990), 105.
6) J. Dorn: IEEE Expert, 11 (1996), 18.
7) H. Lu, B. Gao, L. Zhao, H. Guo and T. Yang: J. Univ. Sci. Technol. B, 3 (2002), 013.
8) A. Klinger, B. Schürz, K. Stohl, M. Schaler and T. Kronberger: Expert Systems Controlling the Iron Making Process in Closed Loop Operation, INTECH Open Access Publisher, Rijeka, (2010), 117.
9) S. Joe Qin: J. Chemom., 17 (2003), 480.
10) J. F. Macgregor and T. Kourti: Control Eng. Pract., 3 (1995), 403.
11) F. I. Gamero, J. Colomer, J. Meléndez and P. Warren: Eng. Appl. Artif. Intell., 19 (2006), 103.
12) E. Vanhatalo: Qual. Reliab. Eng. Int., 26 (2009), 495.
13) T. Zhang, H. Ye, W. Wang and H. Zhang: ISIJ Int., 54 (2014), 2334.
14) T. Zhang, H. Ye and W. Wang: Chinese Automation Congress (CAC), 2015, IEEE, New York, (2015), 893.
15) J. V. Abellan-Nebot and F. R. Subirón: Int. J. Adv. Manuf. Tech., 47 (2010), 237.
16) K. Worden and J. Dulieu-Barton: Struct. Health Monit., 3 (2004), 85.
17) B. Li, M.-Y. Chow, Y. Tipsuwan and J. C. Hung: IEEE Trans. Ind. Electron., 47 (2000), 1060.
18) V. Venkatasubramanian, R. Rengaswamy, S. N. Kavuri and K. Yin: Comput. Chem. Eng., 27 (2003), 327.
19) A. Wang, L. Zhang and N. Gao: Proc. 3rd Int. IEEE Conf. on Intelligent Systems 2006, IEEE, Piscataway, NJ, (2006), 571.
20) A. Wang, L. Zhang, N. Gao and M. Wang: Sixth World Cong. on Intelligent Control and Automation 2006, IEEE, Piscataway, NJ, (2006), 5615.
21) L. Liu, A. Wang, M. Sha and F. Zhao: J. Iron. Steel. Res. Int., 18 (2011), 17.
22) V. R. Radhakrishnan and A. R. Mohamed: J. Process Control, 10 (2000), 509.
23) J. Chen: Eng. Appl. Artif. Intel., 14 (2001), 77.
24) S. Yuan and F. Chu: Mech. Syst. Signal Process, 20 (2006), 939.
25) J. Austin, R. Davis, M. Fletcher, T. Jackson, M. Jessop, B. Liang and A. Pasley: Proc. IEEE, 93 (2005), 496.
26) S. Yoon and J. F. MacGregor: J. Process Control, 11 (2001), 387.
27) K. Q. Weinberger, J. Blitzer and L. K. Saul: Advances in Neural Information Processing Systems 18, MIT Press, Cambridge, (2005), 1473.
28) I. Jolliffe: Principal Component Analysis, Springer-Verlag, New York, (2002), 1.
29) K. Q. Weinberger and L. K. Saul: J. Mach. Learn. Res., 10 (2009), 207.
