2017 Volume 57 Issue 1 Pages 107-113
Development of accurate soft sensors for online quality prediction (e.g., of silicon content) in an industrial blast furnace is a difficult task. A novel just-in-time-learning (JITL) prediction approach with adaptive feature weighting for similar samples is developed. First, a dual-objective joint-optimization framework is proposed to introduce both input and output information into the model. Then, a suitable similarity criterion with a feature weighting strategy is formulated, which is not considered in conventional JITL methods. Moreover, the trade-off parameter in the joint-optimization problem can be chosen automatically, without a time-consuming cross-validation procedure. The proposed method is applied to the online prediction of the silicon content in an industrial blast furnace in China. Compared with other JITL-based soft sensors, better prediction performance is obtained.
The blast furnace ironmaking process is an important unit operation in the manufacturing of iron and steel. It consumes the main energy input to the integrated route of steel production and emits much carbon dioxide, a main cause of the greenhouse effect. As one of the most energy-intensive and complicated industrial processes, the blast furnace ironmaking process has attracted growing attention in modeling and control for increasing efficiency and reducing cost. The silicon content, which indicates the thermal state in the blast furnace, is the most important index of pig iron quality. It must be kept at an appropriate level to facilitate production and the stable running of the ironmaking process. Therefore, accurate online prediction of the silicon content in hot metal is critical.1,2,3,4,5,6) Extensive research on the thermodynamic and kinetic behaviors occurring inside the blast furnace has been conducted. However, an accurate mechanistic model for industrial processes has not been constructed.
Nowadays, a large amount of process data containing useful information can be obtained in industrial blast furnace ironmaking processes. For online prediction of the silicon content, various data-driven soft sensor modeling approaches have been investigated, including neural networks,7,8,9,10,11,12,13,14) partial least squares,14,15) fuzzy inference systems,16) nonlinear time series analysis,17,18,19,20) subspace identification,21) support vector regression (SVR) and least squares SVR (LSSVR),22,23,24) and others.25,26,27,28,29) A recent overview of black-box models for short-term silicon content prediction in blast furnaces can be found in Ref. 30). Without a substantial understanding of the complicated phenomenology, data-driven soft sensor models can be built quickly.30,31,32) Among them, SVR and LSSVR have shown promising prediction performance, especially when the training data are insufficient.22,23,24,33)
One main disadvantage of most existing data-driven modeling approaches for the silicon content is that a single global model is built. However, a single model is not enough to describe all the process characteristics,34,35,36,37,38,39,40) especially in complicated regions with insufficient information. To improve the prediction performance, Nurkkala et al.29) presented multiple autoregressive vector models to describe complex systems. On the other hand, although moving-window-based recursive soft sensors can gradually adapt to new operational conditions, choosing a suitable moving-window size for complex blast furnace ironmaking processes is difficult.10,11,17) Additionally, most recursive models may not function well in a new operational region until a sufficient period of time has passed, because of the time delay in adapting themselves to new conditions. Recently, the just-in-time LSSVR (JLSSVR) modeling approach has been applied to industrial ironmaking processes.40) For a query sample, a JLSSVR-based local model is built online using its similar samples. Consequently, a suitable JLSSVR model can better describe process nonlinearity directly. However, JLSSVR still has two disadvantages. First, the data samples used for constructing a JLSSVR model are assigned the same feature weight. This is inconsistent with many practical applications, mainly because each input variable has its own impact on the final quality. Second, for JLSSVR and most current just-in-time modeling methods,34,35,36,37,38,40) similarity measurements consider only the input variables; the information in the output variables is not utilized.
This work develops an adaptive just-in-time-learning (JITL)-based local model for better prediction of the silicon content. First, a dual-objective optimization framework is proposed to preserve the local structure of both input and output variables. Then, weights are assigned to the variables in the projection space according to their importance. Using the new similarity criterion, relevant samples can be selected and weighted. Moreover, the trade-off parameter in the optimization problem can be chosen in an efficient manner, without time-consuming cross-validation. All these improvements enhance the performance of JITL-based models.
The remainder of this paper is organized as follows. The JLSSVR soft sensor modeling method is described in Section 2. In Section 3, the detailed implementations of the adaptive weighting JLSSVR model are developed. It is evaluated by the silicon content online prediction in an industrial process in Section 4. Comparison studies with other methods are also investigated. Finally, a conclusion is made in Section 5.
Using the kernel learning framework, the SVR/LSSVR soft sensor model learns a mapping f: X→Y from a modeling set S = {X, Y} = {x_i, y_i}, i = 1, …, N, where x_i ∈ R^m is an input sample and y_i the corresponding output. The LSSVR model is formulated as the following optimization problem:
(1) min_{w,b,e} J(w, e) = (1/2)‖w‖^2 + (γ/2) Σ_{i=1}^{N} e_i^2, s.t. y_i = w^T φ(x_i) + b + e_i, i = 1, …, N
(2) [0, 1^T; 1, K + I/γ] [b; α] = [0; y], where K_{ij} = K(x_i, x_j) = φ(x_i)^T φ(x_j), 1 = [1, …, 1]^T, and α = [α_1, …, α_N]^T
(3) b = 1^T (K + I/γ)^{-1} y / (1^T (K + I/γ)^{-1} 1), α = (K + I/γ)^{-1} (y − b1)
Finally, the LSSVR prediction for a test sample x_q is
(4) ŷ_q = f_LSSVR(x_q) = Σ_{i=1}^{N} α_i K(x_q, x_i) + b
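As an illustration of the LSSVR training and prediction steps, the following is a minimal numerical sketch (not the authors' implementation; the Gaussian kernel and the parameter values are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Pairwise Gaussian kernel matrix: K_ij = exp(-||x_i - x_j||^2 / sigma^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def lssvr_fit(X, y, gamma=10.0, sigma=1.0):
    # Solve the LSSVR dual linear system for the bias b and coefficients alpha.
    N = len(y)
    K = gaussian_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(N) / gamma  # kernel block with ridge term I/gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]  # alpha, b

def lssvr_predict(Xq, X, alpha, b, sigma=1.0):
    # Prediction: f(x_q) = sum_i alpha_i K(x_q, x_i) + b
    return gaussian_kernel(Xq, X, sigma) @ alpha + b
```

With a large regularization parameter gamma, the model nearly interpolates the training data, since the training residual equals alpha/gamma.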
For some industrial processes, the direct application of a global/fixed model with complicated structure is often difficult. Another limitation of the global methods is that it is difficult for them to be updated in a quick manner when the process dynamics are changing.37) Additionally, for some situations with more complex characteristics, the training data samples are not sufficient. Therefore, only using an LSSVR/SVR model for industrial ironmaking processes is still not enough.
To alleviate these problems and construct local models automatically, the JITL method, inspired by ideas of local modeling, has been developed as an alternative for nonlinear process modeling and control.34,35,36,37,38,39,40) As illustrated in Fig. 1, for a query sample x_q, a JITL model (take JLSSVR for example) is constructed in three steps. First, select similar samples as a similar set S_q from the database S based on a defined similarity criterion. Second, construct a JLSSVR model f_JLSSVR(x_q) using the relevant dataset S_q. Third, online predict the output ŷ_q = f_JLSSVR(x_q).
The common flowchart of the JITL-based online soft sensor modeling method.
With the same three-step implementation, a new JLSSVR model can be constructed for the next query sample. The Euclidean distance-based similarity is commonly utilized.34,35,36,37,38,39,40) The similarity index (SI) between the query sample x_q and a sample x_i in the dataset is defined below:35)
(5) SI(x_q, x_i) = exp(−‖x_q − x_i‖_2)
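The distance-based similarity and the selection of the most similar samples can be sketched as follows (assuming the common exponential form SI = exp(−distance); function names are for illustration only):

```python
import numpy as np

def similarity_indices(xq, X):
    # SI between a query xq and each row of X: exp(-Euclidean distance),
    # so SI equals 1 for an identical sample and decays toward 0.
    d = np.linalg.norm(X - xq, axis=1)
    return np.exp(-d)

def select_similar(xq, X, nq):
    # Indices of the nq most similar historical samples (largest SI first).
    si = similarity_indices(xq, X)
    return np.argsort(si)[::-1][:nq]
```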
For construction of the JLSSVR model, the user-defined parameters include the regularization parameter γ and the kernel parameter (e.g., the width σ of the Gaussian kernel):
(6) K(x_i, x_j) = exp(−‖x_i − x_j‖^2/σ^2)
Although the criterion in Eq. (5) has been widely adopted in JITL-based methods for process modeling and monitoring, the similarity may still be inadequately described, in two respects. First, different input variables are equally weighted, which is inconsistent with many practical situations. Additionally, similarity measurements are built only on the input variables (i.e., in Eq. (5)); no information in the output quality variables is used.
Recently, a supervised locality preserving projection (LPP) strategy42) was proposed to construct the similarity measurement using both input and output information:39)
(7) S_{ij} = η S_{x,ij} + (1 − η) S_{y,ij}, 0 ≤ η ≤ 1, where S_{x,ij} and S_{y,ij} measure the similarity between samples i and j in the input and output spaces, respectively
With the similarity defined in Eq. (7), an LPP approach is employed to seek the mapping direction, which serves as the weights for the variables. Then, relevant samples are selected according to the Euclidean distance in the projection space. Unfortunately, there are still two drawbacks in the LPP-based similarity method. First, the parameter η is selected through cross-validation, which is time-consuming. Second, in the low-dimensional space, the internal variables are assigned equal weights, which may cause the same problem as the traditional criterion in Ref. 36).
The proposed relevant sample selection strategy constructs an adaptively weighted distance as the similarity criterion by adequately utilizing the information in both input and output variables. The LPP algorithm42) is adopted to keep the local structure of both input and output information, and it is solved in a dual-objective optimization form as a general eigen problem. Then, according to the eigenvalues of the projection directions, adaptive weights are assigned to the internal variables to construct the similarity criterion for selecting relevant samples.
3.1. Feature Extraction Using LPP

One challenge in establishing a suitable similarity criterion is to select representative features from the various input variables while preserving the local structure as much as possible. To achieve this goal, the LPP approach is employed. LPP42) is a recent dimensionality reduction method that has been successfully applied in information retrieval and pattern recognition. LPP aims at preserving the neighborhood structure of the data set, while principal component analysis (PCA) only retains most of the original variance.42) Additionally, LPP shares many of the properties of nonlinear methods such as locally linear embedding43) and Laplacian eigenmaps.44) Therefore, compared with PCA, LPP can reveal the intrinsic geometrical structure of the observed data and find more meaningful low-dimensional information hidden in the high-dimensional observations. Moreover, the linear property of LPP makes it computationally efficient and suitable for practical applications.42)
Given a set of m-dimensional input variables X = {x_1, …, x_N} with corresponding output variables Y = {y_1, …, y_N}, where x_i ∈ R^m, LPP aims to find a transformation matrix B = [β_1, …, β_d] to project these samples to a low-dimensional sample set Z = {z_1, …, z_N} in R^d (d ≤ m) based on the following objective function:42)
(8) min_β Σ_{i,j} (z_i − z_j)^2 S_{ij} = min_β β^T X L X^T β, with z_i = β^T x_i, L = D − S, and D_{ii} = Σ_j S_{ij}
To better utilize both the secondary and primary information, a dual-objective optimization scheme is proposed:
(9) min_β J_x = β^T X L_x X^T β and min_β J_y = β^T X L_y X^T β, where L_x and L_y are the Laplacian matrices built from the input similarity S_x and the output similarity S_y, respectively
It is expected that, after projection, nearby points with both similar input and output variables remain close in the low-dimensional projection space. However, these two objectives are generally difficult to achieve simultaneously, as the projection directions are usually different. To solve this problem, a trade-off parameter η_1 is introduced to balance the two objectives. The objective function can then be described as follows:
(10) min_β J = η_1 J_x + (1 − η_1) J_y = β^T X (η_1 L_x + (1 − η_1) L_y) X^T β
To solve this optimization problem, an orthogonality constraint β^T β = 1 is introduced to avoid the singularity problem. The optimization problem can then be solved through an eigen problem:42)
(11) X (η_1 L_x + (1 − η_1) L_y) X^T β = λβ
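A compact sketch of the dual-objective LPP projection follows; the heat-kernel adjacency and the direct symmetric eigen solver are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def heat_kernel_adjacency(V, sigma=1.0):
    # Adjacency weights S_ij = exp(-||v_i - v_j||^2 / sigma^2):
    # large for nearby samples, near zero for distant ones.
    d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def laplacian(S):
    # Graph Laplacian L = D - S with D the diagonal degree matrix.
    return np.diag(S.sum(axis=1)) - S

def dual_objective_lpp(X, Y, eta1=0.5, sigma=1.0):
    # Solve the combined eigen problem under beta^T beta = 1.
    # With samples stored as rows, X.T @ L @ X plays the role of
    # X L X^T in the column-sample convention.
    Lx = laplacian(heat_kernel_adjacency(X, sigma))
    Ly = laplacian(heat_kernel_adjacency(Y, sigma))
    M = X.T @ (eta1 * Lx + (1.0 - eta1) * Ly) @ X
    lam, B = np.linalg.eigh((M + M.T) / 2.0)  # ascending eigenvalues
    return lam, B  # columns of B are unit-norm projection directions
```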
Instead of using the Euclidean distance directly (e.g., Eq. (5)), a weighted distance is adopted here:
(12) d_w(z_q, z_i) = ( Σ_{k=1}^{d} v_k (z_{q,k} − z_{i,k})^2 )^{1/2}, where v_k is the weight of the k-th latent variable
Inspired by principal component analysis,45) an eigenvalue-based weighting strategy is formulated to calculate the importance of the latent variables:
(13) v_k = (1/λ_k) / Σ_{j=1}^{d} (1/λ_j), k = 1, …, d
With Eq. (13), an element with a larger eigenvalue is assigned a relatively small importance (≈0) and can be ignored, reducing the projection dimension automatically. Then, according to the weighted distance between the training data and the query sample, n_q samples with the largest weighted similarity measurements are selected as relevant samples.
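The eigenvalue-based weighting and weighted-distance sample selection can be sketched as follows; the inverse-eigenvalue form of the weights is a plausible assumption consistent with the description above, not necessarily the paper's exact formula:

```python
import numpy as np

def eigen_weights(lam, eps=1e-12):
    # Assumed weighting: importance inversely proportional to the
    # eigenvalue, so directions with large eigenvalues get weight ~ 0
    # and are effectively dropped from the similarity measure.
    inv = 1.0 / (lam + eps)
    return inv / inv.sum()

def weighted_distance(zq, Z, v):
    # Weighted distance in the projection space:
    # sqrt(sum_k v_k * (z_qk - z_ik)^2) for each row z_i of Z.
    return np.sqrt(((Z - zq) ** 2 * v).sum(axis=1))

def select_relevant(zq, Z, v, nq):
    # nq samples with the smallest weighted distance (largest similarity).
    return np.argsort(weighted_distance(zq, Z, v))[:nq]
```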
A simple illustration of sample selection in a two-dimensional projection space is shown in Fig. 2 to distinguish the proposed strategy from the criterion in Ref. 39). Suppose β_1 and β_2 are the mapping directions corresponding to the eigenvalues λ_1 and λ_2, respectively. The set of samples with the same similarity index then forms an ellipse in the projection space, with a long-axis-to-short-axis ratio determined by the eigenvalue-based weights, rather than a circle as under equal weighting.
Illustration of sample selection with feature weighting in two dimensional projection space.
For the dual-objective optimization problem in Eq. (9), it is difficult to obtain an optimal solution, mainly due to the conflict between the two sub-objectives. To obtain a relatively good solution, the scales and convergence speeds of the sub-objectives should be considered carefully when selecting the parameter η_1.46)
The solution of Eq. (10) is obtained by solving the eigen problem rather than in an iterative manner, so the convergence-speed problem is avoided. Thus, the parameter η_1 should be selected to balance the scale issue. Inspired by Ref. 46), the scales of J_x and J_y can be defined as:
(14) scale(J_x) = tr(X L_x X^T), scale(J_y) = tr(X L_y X^T)
(15) η_1 = scale(J_y) / (scale(J_x) + scale(J_y))
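The auto-setting of η_1 can be sketched as a scale-balancing computation (the trace-based scale definition used here is an assumption for illustration; with it, η_1·scale(J_x) and (1 − η_1)·scale(J_y) have comparable magnitudes):

```python
import numpy as np

def auto_eta1(X, Lx, Ly):
    # Balance the two sub-objectives: eta1 is set proportional to the
    # scale of Jy, so the weighted terms eta1*Jx and (1-eta1)*Jy are
    # of comparable size. X stores samples as rows.
    sx = np.trace(X.T @ Lx @ X)  # assumed scale of Jx
    sy = np.trace(X.T @ Ly @ X)  # assumed scale of Jy
    return sy / (sx + sy)
```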
In summary, the steps to select the relevant samples for construction of a JLSSVR model can be described as follows:
Step 1: Establish the matrices S_x and S_y (with elements S_x,ij and S_y,ij, respectively) and the Laplacians L_x and L_y.
Step 2: Set η1 automatically according to Eqs. (14) and (15).
Step 3: Solve the optimization problem in Eq. (10) to obtain the corresponding eigenvalues and eigenvectors.
Step 4: Assign adaptive weights to the latent variables and calculate the weighted distance based on Eqs. (12) and (13).
Step 5: Select nq relevant samples on the basis of the weighted similarity measurement.
Step 6: The JLSSVR soft sensor model can be constructed online using Eqs. (1), (2), (3) and (6).
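The six steps above can be combined into one end-to-end sketch (hedged: the heat-kernel adjacency, the inverse-eigenvalue weights, and the parameter values are illustrative assumptions; names are not from the paper):

```python
import numpy as np

def jitl_predict(xq, X, Y, nq=10, sigma=1.0, gamma=100.0):
    # Step 1: adjacency matrices and Laplacians for inputs and outputs.
    def adj(V):
        d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / sigma ** 2)
    Sx, Sy = adj(X), adj(Y.reshape(len(Y), -1))
    Lx = np.diag(Sx.sum(1)) - Sx
    Ly = np.diag(Sy.sum(1)) - Sy
    # Step 2: auto trade-off parameter (assumed scale-balancing form).
    sx, sy = np.trace(X.T @ Lx @ X), np.trace(X.T @ Ly @ X)
    eta1 = sy / (sx + sy)
    # Step 3: eigen problem under beta^T beta = 1 (samples as rows).
    M = X.T @ (eta1 * Lx + (1.0 - eta1) * Ly) @ X
    lam, B = np.linalg.eigh((M + M.T) / 2.0)
    # Step 4: inverse-eigenvalue weights and weighted distances.
    v = 1.0 / (np.clip(lam, 0.0, None) + 1e-9)
    v /= v.sum()
    Z, zq_p = X @ B, xq @ B
    d = np.sqrt(((Z - zq_p) ** 2 * v).sum(axis=1))
    # Step 5: select the nq most relevant samples.
    idx = np.argsort(d)[:nq]
    Xr, yr = X[idx], Y[idx]
    # Step 6: local LSSVR with Gaussian kernel on the relevant set.
    K = np.exp(-((Xr[:, None, :] - Xr[None, :, :]) ** 2).sum(2) / sigma ** 2)
    A = np.zeros((nq + 1, nq + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(nq) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], yr)))
    b, alpha = sol[0], sol[1:]
    kq = np.exp(-((Xr - xq) ** 2).sum(1) / sigma ** 2)
    return kq @ alpha + b
```

For each new query sample, the whole routine is re-run, which is the just-in-time character of the approach.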
It should be noted that, by introducing the dual-objective optimization scheme, the local structure of both input and output information can be better preserved. Additionally, using the LPP method, the new variables become independent of each other, since the correlation among the process variables is removed. Moreover, the adaptive weighted similarity criterion, which keeps the local information, helps estimate the similarity and select the relevant samples. Consequently, with properly selected relevant samples, the local model built in a JITL manner can predict the output variables more accurately.
In this section, the proposed improved JITL-based soft sensor modeling method is applied to online prediction of the silicon content of an industrial blast furnace ironmaking process in China. All data samples were collected from daily process records and the corresponding laboratory analysis. The process input variables correlated with the product quality (i.e., the silicon content) were selected.21,22,24) These input variables include the blast volume, the blast temperature, the top pressure, the gas permeability, the top temperature, the ore/coke ratio, and the pulverized coal injection. The sampling time of most of these input variables is 1 minute. According to expert experience and correlation analysis,30) the time difference between the silicon content and each input variable can be determined. For example, the time difference between the silicon content and the top pressure is about 2 h, and that between the silicon content and the gas permeability is about 1 h.
After simple preprocessing of the modeling set with the 3-sigma criterion, most obvious outliers and missing values were removed. After preprocessing, a set of about 260 samples is investigated. The first 150 samples are treated as the historical samples; the remaining set of about 110 samples is used for testing. The simulation environment in this case is MATLAB R2009b, with a 2.3 GHz CPU and 4 GB memory.
As discussed in previous research, a single global model is not enough to describe all the process characteristics of industrial ironmaking processes.40) Additionally, the blast furnace dynamics may change, gradually or abruptly, and a fixed soft sensor model validated on earlier data may not perform well on future data.29) Here, JLSSVR is considered as a JITL-based local modeling method. To better illustrate the effect of the proposed method, the adaptive weighted relevant sample selection strategy is applied to acquire the relevant data set for each query sample. Two cases are investigated. The first, in Section 4.1, studies the performance of the eigenvalue-based adaptive weighting approach using only the information in the input variables, i.e., η = 1 in Eq. (7). The second, in Section 4.2, focuses on utilizing the information in both input and output variables and on how to auto-select the trade-off parameter η_1 in Eq. (10).
The root-mean-square error (RMSE) and relative RMSE (denoted RE) are two common performance indices for quantitatively evaluating the prediction performance of different soft sensor models.40) Additionally, the hit rate (HR) index is often adopted in industrial blast furnace ironmaking processes.21,22,23,24,25,26,27,28) The three indices RMSE, RE, and HR are defined, respectively, as:
(16) RMSE = ( (1/N_t) Σ_{i=1}^{N_t} (y_i − ŷ_i)^2 )^{1/2}
(17) RE = ( (1/N_t) Σ_{i=1}^{N_t} ((y_i − ŷ_i)/y_i)^2 )^{1/2} × 100%
(18) HR = ( (1/N_t) Σ_{i=1}^{N_t} H_i ) × 100%, where H_i = 1 if |y_i − ŷ_i| ≤ 0.1 and H_i = 0 otherwise, and N_t is the number of test samples
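The three indices can be computed as follows (the ±0.1 hit-rate tolerance is a common choice for silicon content and is an assumption here):

```python
import numpy as np

def rmse(y, yhat):
    # Root-mean-square error of the predictions.
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return np.sqrt(np.mean((y - yhat) ** 2))

def re_percent(y, yhat):
    # Relative RMSE, expressed in percent.
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 100.0 * np.sqrt(np.mean(((y - yhat) / y) ** 2))

def hit_rate(y, yhat, tol=0.1):
    # Percentage of predictions within +/- tol of the assay value
    # (tol = 0.1 is an assumed tolerance, not taken from the paper).
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 100.0 * np.mean(np.abs(y - yhat) <= tol)
```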
As aforementioned, the comparison of the feature weighting strategies is investigated using only the information in the input variables, i.e., η = 1 in Eq. (7). The following two weighting strategies are considered for the variables in the projected space: (1) equal weighting, where the values of v_k are assigned as v_1 = ··· = v_d in Eq. (12); (2) eigenvalue-based weighting, with weights assigned according to Eqs. (12) and (13).
The prediction results of the two weighting strategies are listed in Table 1 for the number of relevant samples n_q = 10, 20, 30, 40. Additionally, the online prediction results of the silicon content with n_q = 30 are shown in Fig. 3. To show the result more clearly, only the first 50 testing samples are plotted in Fig. 3. From Fig. 3 and Table 1, it can be seen that, regardless of the value of n_q (n_q = 10, 20, 30, 40), the eigenvalue-based weighting criterion outperforms the equal weighting strategy, with smaller RMSE and RE indices and a larger HR value. For this case, n_q = 30 is a suitable choice for the number of similar samples.
| Similar samples n_q | Feature weighting strategy | RMSE | RE (%) | HR (%) |
| --- | --- | --- | --- | --- |
| 10 | Eigenvalue based weighting | 0.099 | 19.1 | 68.2 |
| 10 | Equal weighting | 0.108 | 20.3 | 65.4 |
| 20 | Eigenvalue based weighting | 0.097 | 18.2 | 70.1 |
| 20 | Equal weighting | 0.107 | 20.1 | 67.3 |
| 30 | Eigenvalue based weighting | 0.095 | 18.1 | 71.0 |
| 30 | Equal weighting | 0.105 | 19.9 | 69.2 |
| 40 | Eigenvalue based weighting | 0.101 | 18.9 | 69.2 |
| 40 | Equal weighting | 0.109 | 20.6 | 66.4 |
Online silicon content prediction error comparison results of JLSSVR soft sensor models with eigenvalue based weighting and equal weighting strategies.
The prediction results indicate that the eigenvalue-based weighting criterion helps determine the feature weights and construct the similarity measurement more suitably. This is mainly because the feature weights are assigned according to their relative importance. Consequently, the local structures of the initial data samples can be preserved and the similarity can be estimated more properly.
4.2. Performance of Utilizing the Output Information

To show the effect of utilizing the output variable information of the training dataset, five LPP-based similarity criteria are utilized to search the mapping direction, followed by the eigenvalue-based weighting strategy to calculate the weighted distance in the projection space. They are listed as: (1) Sx, only using the input variable information of the training samples; (2) Sy, only using the output information of the training samples; (3) SxSy, the criterion combining both input and output information; (4) PCcv, the proposed feature-weighted criterion with the trade-off parameter η_1 selected by cross-validation; and (5) PCauto, the proposed criterion with automatic setting of η_1.
The comparison results of the aforementioned criteria with n_q = 30 are tabulated in Table 2. From Table 2, one can see that better prediction performance is obtained by utilizing the information in both input and output variables, compared with using only Sx or Sy. The output variables contain important information and should be adopted; however, this was not explored in most traditional JITL-based soft sensors for industrial processes. The prediction results of traditional JLSSVR38,40) without the LPP-based similarity for searching the similar set are also listed in Table 2; it is inferior to the methods with LPP-based similarity strategies. The adaptive weighting JLSSVR (i.e., PCauto) and traditional JLSSVR methods are compared using the parity plot shown in Fig. 4, which indicates that the adaptive weighting JLSSVR method is more accurate than JLSSVR (the HR index increases from 66.4% to 73.8%).
| JLSSVR-based models | Brief description | RMSE | RE (%) | HR (%) |
| --- | --- | --- | --- | --- |
| Sx | LPP-based input variable information | 0.095 | 18.2 | 71.0 |
| Sy | LPP-based output variable information | 0.097 | 18.3 | 70.1 |
| SxSy | LPP-based input and output variable information | 0.093 | 18.0 | 72.0 |
| PCcv | LPP-based feature weighting for input and output variable information (η_1 selected by cross-validation) | 0.087 | 17.3 | 73.8 |
| PCauto | LPP-based feature weighting for input and output variable information (automatic setting of η_1) | 0.086 | 17.4 | 73.8 |
| JLSSVR40) | Traditional similarity criterion without LPP | 0.107 | 20.5 | 66.4 |
The RE index according to possible values of the trade-off parameter η1 and its auto-setting result.
Moreover, the cross-validation-based criterion (PCcv) also makes the model predict well. However, for a query sample with n_q = 30, the total computational time for online modeling and prediction is about 11 s, much longer than with the PCauto method (less than 1 s). The PCcv method is time-consuming mainly because all possible values of η_1 must be evaluated by cross-validation to select the proper value. Meanwhile, for the proposed auto parameter setting strategy defined in Eqs. (14) and (15) (PCauto), the computation is more efficient because only a generalized eigen problem is solved. Figure 5 presents the predictive performance, in terms of the RE index, over possible values of the trade-off parameter η_1. It shows that the auto-set parameter is a good and suitable choice in practice, although it is suboptimal; this is mainly because the optimal value of η_1 shown in Fig. 5 is not known beforehand. In summary, the proposed similarity PCauto achieves almost the same performance as the traditional cross-validation method while saving considerable computational time.
Parity plot of assay values against predicted values of the silicon content in the test set using the adaptive weighting JLSSVR and traditional JLSSVR soft sensor models.
Therefore, from all the obtained results and comparison analysis, the proposed PCauto approach, which utilizes both the input and output information with automatic trade-off parameter determination and the eigenvalue-based weighting strategy, achieves promising prediction performance. Moreover, compared with the traditional cross-validation approach, it can be implemented in an efficient manner.
This paper has proposed a novel JITL-based local soft sensor model with adaptive relevant sample selection for better prediction of the silicon content. The main contributions are threefold: (1) the eigenvalue-based adaptive weighting strategy; (2) the dual-objective optimization framework for employing the output information; and (3) the efficient auto-selection of the trade-off parameter η_1. The superiority of the proposed method is demonstrated by comparison with several JLSSVR soft sensors in terms of online prediction of the silicon content in an industrial blast furnace. Note that other JITL-based modeling methods can also be integrated with the proposed relevant sample selection strategy. Additionally, advanced outlier detection methods can be applied as preprocessing to enhance the reliability of quality prediction. Therefore, several interesting research directions are worth investigating to further enhance the accuracy and transparency of a silicon content prediction model.
The authors would like to gratefully acknowledge the National Natural Science Foundation of China (Grant No. 61004136) and Jiangsu Key Laboratory of Process Enhancement & New Energy Equipment Technology (Nanjing University of Technology) for their financial support.
FLOO: fast leave-one-out
HR: hit rate
JITL: just-in-time learning
JLSSVR: just-in-time least squares support vector regression
LPP: locality preserving projection
LSSVR: least squares support vector regression
RE: relative root-mean-square error
RMSE: root-mean-square error
SI: similarity index
SVR: support vector regression