Adaptive Weighting Just-in-Time-Learning Quality Prediction Model for an Industrial Blast Furnace

Kun Chen; Yi Liu

doi:10.2355/isijinternational.ISIJINT-2016-292

Abstract

Development of accurate soft sensors for online quality prediction (e.g., silicon content) in an industrial blast furnace is a difficult task. A novel just-in-time-learning (JITL) prediction approach using adaptive feature-weighting for similar samples is developed. First, a dual-objective joint-optimization framework is proposed to introduce both input and output information into the model. Then, a suitable similarity criterion with feature weighting strategy is formulated, which is not considered in conventional JITL methods. Moreover, the trade-off parameter in the joint-optimization problem can be chosen automatically, without the time-consuming cross-validation procedure. The proposed method is applied to online predict the silicon content in an industrial blast furnace in China. Compared with other JITL-based soft sensors, better prediction performance has been obtained.

1. Introduction

The blast furnace ironmaking process is an important unit operation in the manufacturing of iron and steel. It consumes the main energy input to the integrated route of steel production and emits much carbon oxide which is the main cause of the greenhouse effect. As one of the most energy intensive and complicated industrial process, there has been a growing awareness of modeling and controlling the blast furnace ironmaking process for increasing efficiency and reducing cost. The silicon content, indicating the thermal state in blast furnace, is the most important index of pig iron quality. It must be kept at an appropriate level to facilitate the production and stable running of the ironmaking process. Therefore, accurate online prediction of the silicon content in hot metal is very critical.^1,2,3,4,5,6) Extensive research on the thermodynamic and kinetic behaviors occurring inside the blast furnace ironmaking process has been investigated. However, an accurate mechanism model in industrial processes has not been constructed.

Nowadays, a large amount of process data containing useful information can be obtained in industrial blast furnace ironmaking processes. To online predict the silicon content, various data-driven soft sensor modeling approaches, including various neural networks,^{7,8,9,10,11,12,13,14)} partial least squares,^14,15) fuzzy inference systems,¹⁶⁾ nonlinear time series analysis,^17,18,19,20) subspace identification,²¹⁾ support vector regression (SVR) and least squares SVR (LSSVR),^22,23,24) and others^{25,26,27,28,29)} have been investigated. A recent overview of black-box models for short-term silicon content prediction in blast furnaces can be referred to.³⁰⁾ Without substantial understanding of the complicated phenomenology, the data-driven soft sensor models can be built in a quick manner.^30,31,32) Among them, SVR and LSSVR have shown promising prediction performance, especially when the training data are insufficient.^22,23,24,33)

One main disadvantage of most existing data-driven modeling approaches for the silicon content is that a single global model is built. However, it is not enough to describe all the process characteristics only using a single model,^{34,35,36,37,38,39,40)} especially for some complicated regions with insufficient information. To improve the prediction performance, Nurkkala et al.²⁹⁾ presented multiple autoregressive vector models to describe complex systems. On the other hand, although moving-window-based recursive soft sensors can gradually be adapted to new operational conditions, how to choose a suitable moving-window size for complex blast furnace ironmaking processes is difficult.^10,11,17) Additionally, most recursive models may not function well in a new operational region until a sufficient period of time has passed because of the time delay when they adapt themselves to new operational conditions. Recently, the just-in-time LSSVR (JLSSVR) modeling approach has been applied to industrial ironmaking processes.⁴⁰⁾ For a query sample, the JLSSVR based local model is online built using its similar samples. Consequently, a suitable JLSSVR model can better describe process nonlinearity directly. However, there are still two disadvantages of JLSSVR. First, data samples utilized for construction of a JLSSVR model are assigned with the same feature weight. This is inconsistent with many practical applications mainly because each input variable has its special impact on the final quality. Second, for JLSSVR and most current just-in-time modeling methods,^{34,35,36,37,38,40)} similarity measurements are only considered on the input variables. However, the information of output variables is not utilized.

This work mainly develops an adaptive just-in-time-learning (JITL)-based local model for better prediction of the silicon content. First, a dual-objective optimization framework is proposed to preserve the local structure of both input and output variables. Then, according to the importance, weights are assigned to the variables in the projection space. Using the new similarity, the relevant samples can be selected and weighted. Moreover, the trade-off parameter in the optimization problem can be chosen in a efficient manner, without the time-consuming cross-validation. All these improvements can enhance the performance of JITL-based models.

The remainder of this paper is organized as follows. The JLSSVR soft sensor modeling method is described in Section 2. In Section 3, the detailed implementations of the adaptive weighting JLSSVR model are developed. It is evaluated by the silicon content online prediction in an industrial process in Section 4. Comparison studies with other methods are also investigated. Finally, a conclusion is made in Section 5.

2. Local Soft Sensor Model for Online Quality Prediction

2.1. Basic LSSVR Method

Using the kernel learning framework, the SVR/LSSVR soft sensor model is to learn a mapping f: X→Y with a modeling set S = {X, Y} = { x i , y i } i=1 N . A general form of the kernelized nonlinear model for process modeling can be formulated as³⁸⁾

y i =f( x i ;θ ) + e i =f( x i ;w, b ) + e i = w T ϕ( x i ) +b+ e i ,i=1,⋯,N

(1)

where y_i and e_i denote the output measurement and the process noise at instance i, respectively; x_i is usually composed of several measured variables, and f is the wanted model; ϕ is a feature map; θ = [w^T,b]^T; that is, the symbols w and b are the model parameter vector and the bias term, respectively.³⁸⁾ Applying the LSSVR framework to Eq. (7),³³⁾ the following optimization problem is formulated

{ min J( w,b ) = 1 2 ‖ w ‖ 2 + γ 2 ‖ e ‖ 2 s.t. y i - w T ϕ( x i )-b- e i =0, i=1,⋯,N

(2)

where e = [e₁,…,e_N]^T is the approximation error. The user-defined regularization parameter γ (γ > 0) determines the trade-off between the model’s complexity and approximation accuracy, and a suitable choice of γ can prevent over-fitting.³³⁾ After solving the optimization problem in Eq. (2),³³⁾ the solution can be expressed as

{ α=P[ y- 1 1 T Py 1 T P1 ] b= 1 T Py 1 T P1

(3)

where α = [α₁,…,α_N]^T are Lagrange multipliers; y = [y₁,…,y_N]^T; I∈R^N^×^N is a unit matrix and 1 is a vector of ones; P= ( Ω+ I γ ) -1 . The kernel matrix Ω is composed of elements Ω(i,j) = ⟨ ϕ( x i ),ϕ( x j ) ⟩ , ∀i,j = 1,...,N.³³⁾ Briefly, the development of an LSSVR-based soft sensor model amounts to solving a set of linear equations using the kernel transform. Compared with solving a convex quadratic programming problem of the SVR method, LSSVR can be implemented more efficient and easily.³³⁾

Finally, the LSSVR model for prediction of a test sample x_q, i.e., y ˆ q , can be obtained³³⁾

y ˆ q =f(θ, x q ) = ∑ i=1 N α i ⟨ ϕ( x i ) ,ϕ( x q ) ⟩ +b = α T k q +b

(4)

where α=[α₁,…,α_N]^T are Lagrange multipliers related to the model parameter vector w, and k_q(i) = ⟨ ϕ( x i ),ϕ( x q ) ⟩ , ∀i = 1,...,N is a kernel vector.

2.2. JLSSVR-based Local Model

For some industrial processes, the direct application of a global/fixed model with complicated structure is often difficult. Another limitation of the global methods is that it is difficult for them to be updated in a quick manner when the process dynamics are changing.³⁷⁾ Additionally, for some situations with more complex characteristics, the training data samples are not sufficient. Therefore, only using an LSSVR/SVR model for industrial ironmaking processes is still not enough.

To alleviate these problems and construct the local models automatically, the JITL method inspired by the ideas of local modeling has been developed as an alternative to nonlinear process modeling and control.^{34,35,36,37,38,39,40)} As illustrated in Fig. 1, for a query sample x_q, there are three steps to construct a JITL model (take JLSSVR for example). First, select similar samples as a similar set S_q in the database S based on some defined similarity criterions. Second, construct a JLSSVR model f_JLSSVR(x_q) using the relevant dataset S_q. Third, online predict the output y ˆ q for the current query sample x_q and then discard the JLSSVR model f_JLSSVR(x_q).⁴⁰⁾

Fig. 1.

The common flowchart of JITL-based online soft sensor modeling method.

With the same three-step implements, a new JLSSVR model can be constructed for the next query sample. The Euclidean distance-based similarity is commonly utilized.^{34,35,36,37,38,39,40)} The similarity index (SI) between the query sample x_q and the sample x_i in the dataset is defined below³⁵⁾

S x,qi =exp( - d qi ) =exp( -‖ x i - x q ‖ ) , i=1,⋯,N

(5)

where d_qi is the distance similarity between x_q and x_i in the dataset. The value of S_x,qi is bounded between 0 and 1 and when S_x,qi approaches to 1, x_q resembles x_i closely. With the distance-and-angle-based similarity factor, a little better prediction performance can be obtained. However, another parameter for balance of the distance and the angle should be chosen.³⁶⁾ Additionally, some correlation-based similarity criteria have been proposed recently.³⁷⁾ Based on the these similarity criteria, a relevant dataset S_q with n_q similar samples can be adopted to build a JLSSVR model.

For construction of the JLSSVR model, the user-defined parameters include the regularization parameter γ, and the kernel parameter (e.g., the Gaussian kernel K( x i , x j )=exp[ - ‖ x i - x j ‖ 2 /σ ] with the width parameter σ > 0). With a pair of parameters [γ, σ], the JLSSVR model can be constructed. The leave-one-out (LOO) cross validation has been shown to provide a sensible criterion for model selection.⁴¹⁾ However, the LOO criterion for LSSVR is computationally expensive and, therefore, is not suitable for online performance. Fortunately, a fast LOO (FLOO) cross-validation criterion was proposed by Cawley and Talbot.⁴¹⁾ Based on the FLOO criterion,^40,41) the JLSSVR model can be optimized by solving the minimum FLOO-based error with n_q samples.

E n q FLOO = ∑ i=1 n q ‖ e i FLOO ‖ = ∑ i=1 n q ‖ α i P ii + s i 2 /o ‖

(6)

where P_ii is the item at the ith row and ith column of P in Eq. (3), s = P1 = [s₁,⋯,s_{n_q}]^T and o = −1^TP1. The related terms (i.e., P and α_i) are available. Additionally, the computational load of s and o is small. The complexity of FLOO can be reduced to about O( n q 3 ) operations, compared to the basic LOO with about O( n q 4 ) operations.⁴¹⁾ Consequently, the computation is much more efficient and online selection of the parameters of JLSSVR is feasible.

3. Adaptive Weighting Local Soft Sensor Model

Although the criterion in Eq. (5) has been widely adopted in JITL-based methods for process modeling and monitoring, the similarity still may not be described adequately in two folds. First, different input variables are equally weighting, which is inconsistent with many practical situations. Additionally, similarity measurements are only built on the input variables (i.e., in Eq. (5)). No information in output quality variables is used.

Recently, a supervised locality preserving projection (LPP) strategy⁴²⁾ was proposed to construct the similarity measurement using both of input and output information.³⁹⁾

S I ij = S x,ij η S y,ij 1-η

(7)

where S_x,ij is the closeness measurement of the input variables between the i^th and j^th samples; S_y,ij is the closeness measurement of the output variables between the i^th and j^th samples; and 0 ≤ η ≤ 1 is a parameter to balance the importance of input and output information.

With the similarity defined in Eq. (7), an LPP approach is employed to seek the mapping direction, which serves as the weights for variables. Then, relevant samples are selected according to the Euclidean distance in the projection space. Unfortunately, there are still two drawbacks in the LPP-based similarity method. First, the parameter η is selected through the cross-validation method, which is really time-consuming. Second, in the low-dimensional space, the internal variables are assigned with the equal weights, which may cause the same problem as the traditional criterion in.³⁶⁾

The proposed relevant sample selection strategy is to construct an adaptive weighted distance as the similarity criterion by adequately utilizing the information in both of input and output variables. Consequently, the LPP algorithm⁴²⁾ is adopted to keep the local structure of both input and output information. And it is solved in a dual-objective optimization form as a general eigen problem. Then, according to the eigenvalue of projection direction, adaptive weights are assigned to the internal variables to construct the similarity criterion to select relevant samples.

3.1. Feature Extraction Using LPP

One challenge to establish the suitable similarity criterion is to select representative features from various input variables while preserving the local structure as much as possible. To achieve this goal, the LPP approach is employed. LPP⁴²⁾ is a recent dimensionality reduction method which has been successfully applied in information retrieval and pattern recognition areas. LPP aims at preserving the neighborhood structure of the data set, while principal component analysis (PCA) only retains most of the original variance.⁴²⁾ Additionally, LPP shares many of the properties of nonlinear methods such as locally linear embedding⁴³⁾ and Laplacian eigenmaps.⁴⁴⁾ Therefore, compared with PCA, LPP can reveal the intrinsic geometrical structure of the observed data and find more meaningful low-dimensional information hidden in the high-dimensional observations. Moreover, the linear property of LPP makes it efficient in computation and suitable for practical applications.⁴²⁾

Given a set of m-dimensional input variables X = {x₁,…,x_N} with corresponding output variables Y = {y₁,…,y_N} where x_i∈R^m, LPP aims to find a transformation matrix B = [β₁,…,β_d] to project these samples to a low-dimensional sample set Z = {z₁,…,z_N} in R^d (d ≤ m) based on following objective function:⁴²⁾

J LPP =min ∑ i,j=1,⋯,N,i≠j ( z i - z j ) 2 S I ij

(8)

where SI_ij indicates the similarity between the i^th and j^th samples, and z_i is the one-dimensional representation of x_i with a projection vector β, i.e., z_i = β^Tx_i, with z_i as the vector of most important d representation of z_i.

To better utilizing both the secondary and primary information, a dual-objective optimization scheme is proposed:

{ J x =min ∑ i,j=1,⋯,N,i≠j ‖ z i - z j ‖ 2 S x ij J y =min ∑ i,j=1,⋯,N,i≠j ‖ z i - z j ‖ 2 S y ij

(9)

It is expected that, after projection, the nearby points, with both similar input and output variables, are still close in the low-dimensional projection space. However, these two objectives are generally difficult to be obtained at the same time, as the projection directions are usually different. To solve this problem, a trade-off parameter η₁ is introduced to balance these two objectives. Then, the objective function can be described as follows:

J=min η 1 J x +( 1- η 1 ) J y =min ∑ i,j=1,⋯,N,i≠j ‖ z i - z j ‖ 2 ( η 1 S x,ij +( 1- η 1 ) S y,ij )

(10)

where 0 ≤ η₁ ≤ 1. Details on the selection of parameter η₁ will be presented in Section 3.3.

To solve this optimization problem, an orthogonal constraint β^Tβ = 1 is introduced to avoid singularity problem. Then, the optimization problem can be solved through an eigen problem:⁴²⁾

XL X T β=λβ

(11)

where L = D − SI, with SI_ij = η₁S_x,ij + (1 − η₁)S_y,ij and D a diagonal matrix defined as D ii = ∑ j=1 N S I ij . The items β₁,…,β_d are the eigenvectors corresponding to the smallest d eigenvalues λ₁,…,λ_d. And the internal variables z can be obtained as z_i = B^Tx_i with B = [β₁,…,β_d].

3.2. Adaptive Weighting Similarity Criterion

Instead of using the Euclidean distance directly (e.g., Eq. (5)), a weighted distance is adopted here:

d w ij = ∑ k=1 d v k ( z i,k - z j,k ) 2

(12)

where v_k is the feature weight parameter to balance the components of z_i, and z_i,k indicates the k^th element in z_i.

Inspired by the principle component analysis,⁴⁵⁾ an eigenvalue based weighting strategy is formulated to calculate the importance of the latent variables:

v k = λ k -1 ∑ i=1 d λ k -1

(13)

With Eq. (13), the element with a larger eigenvalue will be assigned a relative smaller importance (≈0) and can be ignored to reduce the projection dimension automatically. Then, according to the weighted distance between the training data and the query sample, η_q samples with the largest weighted similarity measurements are selected as relevant samples.

A simple illustration of sample selection in two-dimensional projection space is shown in Fig. 2 to distinguish the proposed strategy and the criterion in.³⁹⁾ Suppose the values of β₁, β₂ are the mapping direction corresponding to the λ₁, λ₂, respectively. The shape of samples with the same similarity indices would be an ellipse in with the long-axis-to-short-axis ratio a:b= λ 1 -1 : λ 2 -1 , rather than a circle in.³⁹⁾ It is straightforward to observe that the more relevant feature would be assigned a larger weight and play a more important role to determine the similarity measurement, which can make the criterion more accurate.

Fig. 2.

Illustration of sample selection with feature weighting in two dimensional projection space.

3.3. Auto Selection of Parameter η₁

For the dual-objective optimization problem in Eq. (9), it is difficult to obtain an optimal solution mainly due to the conflict between the two sub-objectives. To obtain a relatively good solution, the scales and convergence speed of the sub-objectives should be considered carefully to select the parameter η₁.⁴⁶⁾

The solution of Eq. (10) is obtained by solving the eigen problem rather than using an iterative manner. As a result, the convergence speed problem can be avoided. Thus, the parameter η₁ should be selected to balance the scale issue. Inspired by,⁴⁶⁾ the scale of J_x and J_y can be defined as:

S input =ρ( X L x X T ) S output =ρ( X L y X T )

(14)

where L_x and L_y are defined based on S_x,ij and S_y,ij, respectively, ρ(∘) indicates the largest eigenvalue of the matrix ∘ . Thus, η₁ can be obtained by:

η 1 S input =( 1- η 1 ) S output ⇒ η 1 = S output S output + S input

(15)

In summary, the steps to select the relevant samples for construction of a JLSSVR model can be described as follows:

Step 1: Establish the matrices Sx, Sy (The elements in Sx and Sy are S_x,ij and S_y,ij, respectively.), L_x and L_y.

Step 2: Set η₁ automatically according to Eqs. (14) and (15).

Step 3: Solve the optimization problem in Eq. (10) to obtain the corresponding eigenvalues and eigenvectors.

Step 4: Assign adaptive weights to the latent variables and calculate the weighted distance based on Eqs. (12) and (13).

Step 5: Select n_q relevant samples on the basis of the weighted similarity measurement.

Step 6: The JLSSVR soft sensor model can be online constructed using Eqs. (1), (2), (3) and (6).

It should be noted that, by introducing the dual-objective optimization scheme, the local structure of both input and output information can be better preserved. Additionally, using the LPP method, the new variables will be independent with each other by removing the correlation among the process variables. Moreover, the adaptive weighted similarity criterion keeping the local information can help estimate the similarity and select the relevant samples. Consequently, with the properly selected relevant samples, the built local model in a JITL manner would be more accurate to predict the output variables.

4. Industrial Silicon Content Prediction

In this section, the proposed improved JITL-based soft sensor modeling methods are applied to online predict the silicon content of an industrial blast furnace ironmaking process in China. All the data samples have been collected from daily process records and the corresponding laboratory analysis. The process input variables correlated with the product quality (i.e., the silicon content) have been selected.^21,22,24) These input variables include the blast volume, the blast temperature, the top pressure, the gas permeability, the top temperature, the ore/coke ratio, and the pulverized coal injection. The sampling time of most of these input variables is 1 minute. According to expert experience and correlation analysis,³⁰⁾ the time difference between the silicon content and input variables can be selected. For example, the time difference between the silicon content and the top pressure is about 2 h; and the time difference between the silicon content and the gas permeability is about 1 h.

After simply preprocessing the modeling set with 3-sigma criterion, most of obvious outlier samples and missing values have been removed out. After preprocessing, a set of about 260 samples is investigated. The first 150 samples are treated as the historical samples. The rest set of about 110 samples is for testing. The simulation environment in this case is MatLab V2009b with CPU main frequency 2.3 GHz and 4 GB memory.

As discussed in previous research, only a single global model is not enough for description of all the process characteristics in industrial ironmaking processes.⁴⁰⁾ Additionally, the blast furnace dynamics may change, gradually or abruptly, and a fixed soft sensor model validated on earlier data may not perform well on future data.²⁹⁾ Here, JLSSVR is considered as a JITL-based local modeling method. To better illustrate the effect of the proposed method, the adaptive weighted relevant sample selection strategy is applied to acquire the relevant data set for each query sample. Two cases are investigated here. The first one in Section 4.1 is to study the performance of eigenvalue based adaptive weighting approach only using the information in input variables, i.e. η=1 in Eq. (7). The other one in Section 4.2 focuses on the utilizing both of the information in input and output variables and how to auto-select the trade-off parameter η₁ in Eq. (10).

The root-mean-square error (RMSE) and relative RMSE (simply noted as RE) are two common performance indices to quantitatively evaluate the prediction performance of different soft sensor models.⁴⁰⁾ Additionally, the hit rate (HR) index is often adopted in industrial blast furnace ironmaking processes.^{21,22,23,24,25,26,27,28)} Three indices of RMSE, RE, and HR are defined, respectively.

RMSE= ∑ q=1 l ( y q - y ˆ q l ) 2

(16)

RE= ∑ q=1 l ( y q - y ˆ q y q ) 2 /l

(17)

{ HR= ∑ q=1 l H q l ×100% where H q ={ 1, | y ˆ q - y q |<0.1 0, else

(18)

where y ˆ q and y_q are the predicted value and the actual value, respectively. And l is the number of test samples.

4.1. Performance of Weighting Strategy

As aforementioned, the comparison of the feature weighting strategies is investigated only using the information in input variables, i.e. η₁ = 1 in Eq. (7). Two following weighting strategies are considered for the variables in the projected space: (1) equal weighting, where the values of v_k are assigned as v₁ = ··· = v_d in Eq. (12); (2) eigenvalue based weighting, with its weights assigned as Eqs. (12) and (13).

The prediction results of these two weighting strategies are listed in Table 1 with the number of relevant samples n_q = 10, 20, 30, 40, respectively. Additionally, the online prediction results of the silicon content with n_q = 30 are shown in Fig. 3. To show the result more clear, only the first 50 testing samples are plotted in Fig. 3. From Fig. 3 and Table 1, it can be shown that, regardless of the value of n_q (n_q = 10, 20, 30, 40), the eigenvalue based weighting criterion outperforms the equal weighting strategy, with the smaller RMSE and RE indices, and the larger HR value. For this case, the number of similar samples n_q = 30 is a suitable choice.

Table 1. Comparisons of LPP-based JLSSVR soft sensors with different feature weighting strategies for different similar samples (best results are bold and underlined).

Similar samples	Feature weighting strategy	RMSE	RE (%)	HR (%)
10	Eigenvalue based weighting	0.099	19.1	68.2
10	Equal weight	0.108	20.3	65.4
20	Eigenvalue based weighting	0.097	18.2	70.1
20	Equal weight	0.107	20.1	67.3
30	Eigenvalue based weighting	0.095	18.1	71.0
30	Equal weight	0.105	19.9	69.2
40	Eigenvalue based weighting	0.101	18.9	69.2
40	Equal weight	0.109	20.6	66.4

Fig. 3.

Online silicon content prediction error comparison results of JLSSVR soft sensor models with eigenvalue based weighting and equal weighting strategies.

The prediction results indicate that the eigenvalue based weighting criterion can help determine the feature weights and construct the similarity measurement more suitably. This is mainly because the feature weights are assigned according to their relative importance. Consequently, the local structures of initial data samples can be kept and the similarity criterion can be estimated more properly.

4.2. Performance of Utilizing the Output Information

To show the effect of utilizing the output variable information of the training dataset, five LPP-based similarity criteria are utilized to search the mapping direction, followed by the eigenvalue based weighting strategy to calculate the weighted distance in projection space. They are listed as: (1) Sx, only using the input variable information of the training samples; (2) Sy, only using the output information of the training samples; (3) SxSy, that is the criterion S I ij = S x,ij η S y,ij 1-η defined in Eq. (7), with the trade-off parameter η determined by the cross-validation approach; (4) the proposed criterion J = min η₁J_x + (1−η₁)J_y in Eq. (10) with η₁ selected by cross-validation, noted as PC_cv; (5) the proposed criterion J = min η₁J_x + (1−η₁)J_y in Eq. (10) with auto parameter setting for η₁, noted as PC_auto for simplicity.

The comparison results of the aforementioned criteria with n_q = 30 are tabulated in Table 2. From Table 2, one can see that a better prediction performance can be obtained by utilizing the information in both of input and output variables, compared with only using Sx or Sy. The output variables contain some important information and should be adopted. However, it was not explored in most traditional JITL-based soft sensors in industrial processes. The prediction results of traditional JLSSVR^38,40) without LPP-based similarity for searching the similar set are also listed in Table 2. It is inferior to other methods with LPP-based similarity strategies. The adaptive weighting JLSSVR (i.e., PC_auto) and traditional JLSSVR methods are compared using a parity plot shown in Fig. 4. It can be indicated that the adaptive weighting JLSSVR method is more accurate than JLSSVR (the HR index increases from 66.4% to 73.8%).

Table 2. Comparisons of the online prediction errors using six different JLSSVR soft sensor models (best results are bold and underlined).

JLSSVR-based models	Brief description	RMSE	RE (%)	HR (%)
Sx	LPP-based input variable information	0.095	18.2	71.0
Sy	LPP-based output variable information	0.097	18.3	70.1
SxSy	LPP-based input and output variable information	0.093	18.0	72.0
PC_cv	LPP-based feature weighting for input and output variable information (with η₁ selected by cross-validation)	0.087	17.3	73.8
PC_auto	LPP-based feature weighting for input and output variable information (with auto parameter setting for η₁)	0.086	17.4	73.8
JLSSVR⁴⁰⁾	Traditional similarity criterion without LPP	0.107	20.5	66.4

Fig. 4.

The RE index according to possible values of the trade-off parameter η₁ and its auto-setting result.

Moreover, it is shown that the cross-validation based criterion (PC_cv) can also make the model predict well. However, for a query sample with n_q = 30, the total computational time for online modeling and prediction is about 11 s, which is much larger than using the PC_auto method (less than 1 s). It is rather time-consuming for the PC_cv method mainly because all the possible values of η₁ need to be computed to select the proper value by cross-validation. Meanwhile, for the proposed auto parameter setting strategy defined in Eqs. (14) and (15) (PC_auto), the computation is more efficient because only a generalized eigen problem is solved. As illustrated in Fig. 5, the predictive performance of the RE index with possible trade-off parameters η₁ is presented. It is shown that the auto-setting parameter can be a good and suitable choice in practice although it is suboptimal. This is mainly because the optimal value of η₁ shown in Fig. 5 is not known beforehand. In a word, the proposed similarity PC_auto can achieve almost the best performance as the traditional cross-validation method, while saving a lot of computational time.

Fig. 5.

Parity plot based of assay values against the prediction values of the silicon content in the test set using the adaptive weighting JLSSVR and traditional JLSSVR soft sensor models.

Therefore, from all the obtained results and comparison analysis, the proposed PC_auto approach by utilizing both of the input and output information with automatic trade-off parameter determination and the eigenvalue based weighting strategy, can achieve a promising prediction performance. Moreover, compared with the traditional cross-validation approach, it can be implemented in an efficient manner.

5. Conclusion

This paper has proposed a novel JITL-based local soft sensor model with adaptive relevant sample selection for better prediction of the silicon content. The main contributions are three folds: (1) the eigenvalue based adaptive weighting strategy, (2) using a dual-objective optimization framework for employment of the output information, (3) and the auto-selection of the trade-off parameter η₁ in an efficient manner. The superiority of the proposed method is demonstrated and compared with several JLSSVR soft sensors in terms of online prediction of the silicon content in an industrial blast furnace. Note that other JITL-based modeling methods can also be integrated with the proposed relevant sample selection strategy. Additionally, some advanced outlier detection methods can be applied as a preprocessing method to enhance the reliability of quality prediction. Therefore, there are several interesting research directions worth investigating to further enhance the accuracy and transparency of a silicon content prediction model.

Acknowledgment

The authors would like to gratefully acknowledge the National Natural Science Foundation of China (Grant No. 61004136) and Jiangsu Key Laboratory of Process Enhancement & New Energy Equipment Technology (Nanjing University of Technology) for their financial support.

Abbreviations

FLOO: fast leave-one-out

HR: hit rate

JITL: just-in-time learning

JLSSVR: just-in-time least squares support vector regression

LPP: locality preserving projection

LSSVR: least squares support vector regression

RE: relative root-mean-square error

RMSE: root-mean-square error

SI: similarity index

SVR: support vector regression

References

1) K. Sugawara, K. Morimoto, T. Sugawara and J. S. Dranoff: AIChE J., 45 (1999), 574.
2) X. G. Bi, K. Torrssel and O. Wijk: ISIJ Int., 32 (1992), 481.
3) H. Nogami, M. S. Chu and J. Yaji: Comput. Chem. Eng., 29 (2005), 2438.
4) K. Nishioka, T. Maeda and M. Shimizu: ISIJ Int., 45 (2005), 669.
5) V. R. Radhakrishnan and K. M. Ram: J. Process Control, 11 (2001), 565.
6) S. Ueda, S. Natsui, H. Nogami, J. I. Yagi and A. Tatsuro: ISIJ Int., 50 (2010), 914.
7) H. Singh, N. V. Sridhar and B. Deo: Steel Res. Int., 67 (1996), 521.
8) V. R. Radhakrishnan and A. R. Mohamed: J. Process Control, 10 (2000), 509.
9) J. Chen: Eng. Appl. Artif. Intell., 14 (2001), 77.
10) J. Jimenez, J. Mochon, J. S. de Ayalai and F. Obeso: ISIJ Int., 44 (2004), 573.
11) H. Saxen and F. Pettersson: ISIJ Int., 47 (2007), 1732.
12) F. Pettersson, N. Chakraborti and H. Saxén: Appl. Soft Comput., 7 (2007), 387.
13) A. Nurkkala, F. Pettersson and H. Saxen: Ind. Eng. Chem. Res., 50 (2011), 9236.
14) X. J. Hao, F. M. Shen, G. Du, Y. S. Shen and Z. Xie: Steel Res. Int., 76 (2005), 694.
15) T. Bhattacharya: ISIJ Int., 45 (2005), 1943.
16) R. D. Martin, F. Obeso, J. Mochon, R. Barea and J. Jimenez: Ironmaking Steelmaking, 34 (2007), 241.
17) M. Waller and H. Saxen: Indust. Eng. Chem. Res., 39 (2000), 982.
18) C. H. Gao, Z. M. Zhou and J. M. Chen: Ind. Eng. Chem. Res., 47 (2008), 3037.
19) M. Waller and H. Saxen: ISIJ Int., 42 (2002), 316.
20) T. Miyano, S. Kimoto, H. Shibuta, K. Nakashima, Y. Ikenaga and K. Aihara: Phys. D, 135 (2000), 305.
21) J. S. Zeng and C. H. Gao: J. Process Control, 19 (2009), 1519.
22) L. Jian, C. H. Gao and Z. Q. Xia: Steel Res. Int., 82 (2011), 169.
23) C. H. Gao, J. S. Zeng and S. H. Luo: IEEE Trans. Ind. Electron., 59 (2012), 1134.
24) L. Jian, S. Q. Shen and Y. Q. Song: J. Appl. Math., (2012), Article ID 949654.
25) C. H. Gao, J. M. Chen, J. S. Zeng, X. Y. Liu and Y. X. Sun: AIChE J., 55 (2009), 947.
26) C. H. Gao, J. S. Zeng and Z. M. Zhou: AIChE J., 57 (2011), 3448.
27) Y. X. Chu and C. H. Gao: AIChE J., 60 (2014), 2197.
28) S. H. Luo, C. H. Gao, J. S. Zeng and J. Huang: Asian J. Control, 15 (2013), 553.
29) A. Nurkkala, F. Pettersson and H. Saxen: ISIJ Int., 52 (2012), 1763.
30) H. Saxen, C. H. Gao and Z. W. Gao: IEEE Trans. Ind. Inform., 9 (2013), 2213.
31) M. Kano and Y. Nakagawa: Comput. Chem. Eng., 32 (2008), 12.
32) P. Kadlec, B. Gabrys and S. Strandt: Comput. Chem. Eng., 33 (2009), 795.
33) J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor and J. Vandewalle: Least Squares Support Vector Machines, World Scientific, Singapore, (2002).
34) Y. Liu and J. Chen: J. Process Control, 23 (2013), 793.
35) G. Bontempi, M. Birattari and H. Bersini: Int. J. Control, 72 (1999), 643.
36) C. Cheng and M. S. Chiu: Chem. Eng. Sci., 59 (2004), 2801.
37) K. Fujiwara, M. Kano, S. Hasebe and A. Takinami: AIChE J., 55 (2009), 1754.
38) Y. Liu, Z. L. Gao, P. Li and H. Q. Wang: Ind. Eng. Chem. Res., 51 (2012), 4313.
39) K. Chen, J. Ji, H. Q. Wang, Y. Liu and Z. H. Song: Chem. Eng. Res. Des., 89 (2011), 2117.
40) Y. Liu and Z. L. Gao: Ironmaking Steelmaking, 42 (2015), 321.
41) G. C. Cawley and N. L. C. Talbot: Neural Netw., 17 (2004), 1467.
42) X. F. He and P. Niyogi: Proc. Conf. on Advances in Neural Information Processing Systems, MIT Press, Masseachusettes, (2004).
43) S. T. Roweis and L. K. Saul: Science, 290 (2000), No. 5500, 2323.
44) M. Belkin and P. Niyogi: Neural Comput., 15 (2003), No. 6, 1373.
45) X. Q. Liu, U. Kruger, T. Littler, L. Xie and S. Q. Wang: Chemom. Intell. Lab. Syst., 96 (2009), 132.
46) M. G. Zhang, Z. Q. Ge, Z. H. Song and R. W. Fu: Ind. Eng. Chem. Res., 50 (2011), 6837.

Corresponding author

Register with J-STAGE for free!