ISIJ International
Instrumentation, Control and System Engineering
Ensemble Prediction of Tundish Open Eyes Using Artificial Neural Networks
Alvin Ma, Saikat Chatterjee, Kinnor Chattopadhyay

2019 Volume 59 Issue 7 Pages 1287-1294

Abstract

As global steelmakers feel the economic pinch, the need to improve quality and quantity using what is already readily available increases. This gap can be bridged by innovation and by adapting existing techniques and methodologies from other fields. Steel quality, an important issue, is often not associated with the phenomenon known as the tundish open eye. However, researchers have recently shown the detrimental effects of the associated reoxidation and the resulting deterioration of the final product (slabs/billets). Understanding how this event forms, and mitigating its formation, is an important problem to solve. Models investigating the formation have existed largely in the computational fluid dynamics modelling domain; however, such models can only provide static recommendations and are thus less useful in a dynamic environment. Hence, the development of a reliable model which can “learn on the fly” is very much needed. In the current study, artificial neural network models have been used to predict non-dimensional open eye sizes in the tundish. The dataset has been compiled from previous regression formulations. The performance of the models is assessed using two metrics: 1) the coefficient of multiple determination (R2) and 2) the root mean square error (RMSE). The ANN-based models show significant promise, in particular the ensemble variants, which exhibit increased accuracy and stability across the entire domain and range.

1. Introduction

Often considered a trivial event, the tundish open eye phenomenon has recently been directly associated with final steel quality. To prevent reoxidation during liquid metal transfer from ladle to tundish, an inert gas such as argon is typically introduced around the refractory ladle shroud to shield the melt stream. However, aspiration of argon is inevitable due to the presence of a negative static pressure. Once entrained, the argon bubbles travel downwards into the tundish until buoyancy pushes them upwards, forming a rising bubble plume beneath the ladle shroud that pushes the slag radially outwards and creates an exposed eye (Fig. 1). A similar phenomenon occurs in the ladle, where porous plug purging likewise pushes the slag radially outwards due to the rising plume. Correspondingly, with the current squeeze on steel prices, any iterative improvement in understanding and predicting steel quality using already available data is very much welcome.

Fig. 1.

Schematic depicting the exposure to oxygen at the TOE as a result of the argon aspiration. (Online version in color.)

Because the occurrence of open eyes is inherently physical, the modelling of these events has understandably been, for the most part, limited to physical simulations. Work in this domain has been led by researchers such as Chattopadhyay and Chatterjee,1,2,3,4) while work on ladle physics has been led by researchers such as Mazumdar and Guthrie.5,6,7,8,9,10,11,12) While these models are exhaustive, they are typically relegated to the design and engineering phase due to their inability to perform on-line, dynamic predictions.

Machine learning is a field growing in prominence due to its versatility and strength in building predictive models. While both machine learning models and physical models have benefited from surges in computing power, machine learning models can, once trained, continuously make predictions in real time. The penetration of machine learning into metallurgical applications has been expectedly slow.13,14,15,16,17) There is currently no on-line method which predicts open eye formation. Artificial neural networks, specifically, are a burgeoning field of study within machine learning. Their flexibility with respect to data inputs and variable relationships makes them an ideal model for many different applications. In their most high-profile application,18) the board game Go, artificial neural networks were used to learn from human players, eventually allowing the model to contend against, and defeat, equally high-profile opponents.

The research presented here demonstrates the viability of machine learning techniques in metallurgical process prediction. Specifically, an artificial neural network is explored for predicting open eye formation in process metallurgy, with emphasis on tundish open eye formation.

2. Current Models Used to Predict Tundish Open Eye Size

2.1. Computational Fluid Dynamics Simulation

Models involving physical simulation for predicting tundish open eye size typically rely on relationships describing the physical nature of the phenomenon. An exhaustive description of the underlying flow causing the open eye can be built from these equations, which typically cover turbulence (e.g., Launder and Spalding’s k–ε model19)), interface tracking (e.g., Hirt and Nichols’ volume of fluid method20)) and some form of plume approximation (e.g., Buevich’s discrete phase model21)). These models, while exhaustive, are not able to make predictions in real time. Because they are based on combinations of these components (with varying boundary conditions), they will not be discussed further in this work.

2.2. Non-dimensional Regression

In a recent study, Chatterjee and Chattopadhyay3) demonstrated that a robust relationship can be developed to approximate the slag eye area formed in the tundish as a function of various operating variables. Chatterjee et al. employed a regression model based on operating factors and non-dimensional groups (Froude number, density ratio and Reynolds number). The relationship is as follows:

\[ A_e^* = 282.289 \left( \frac{U_p^2}{gh} \right)^{1.766} \left( \frac{\Delta\rho}{\rho_l} \right)^{1.588} \left( \frac{V_s}{hU_p} \right)^{0.089} \tag{1} \]
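For illustration, Eq. (1) is straightforward to evaluate in code. The following is a minimal sketch in R (the environment adopted later in this work); the function and argument names are ours, chosen to mirror the nomenclature.

```r
# Non-dimensional open eye size per Eq. (1); names mirror the nomenclature.
# Up: plume velocity (m/s), h: depth of upper fluid phase (m),
# drho: density difference (kg/m^3), rho_l: bulk fluid density (kg/m^3),
# Vs: kinematic viscosity of the upper phase (m^2/s)
eye_size <- function(Up, h, drho, rho_l, Vs, g = 9.81) {
  Fr     <- Up^2 / (g * h)    # Froude number
  ratio  <- drho / rho_l      # density ratio
  inv_Re <- Vs / (h * Up)     # inverse Reynolds number
  282.289 * Fr^1.766 * ratio^1.588 * inv_Re^0.089
}
```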

As seen in Fig. 2 (TOP), the effectiveness of the model is clear. However, a strong case can be made against the model’s effectiveness at larger open eye areas. The strength at smaller areas and weakness at larger areas are corroborated by plotting the residuals (Fig. 2 (MID)): larger deviations are seen in the predictions of larger slag areas. Beyond an imperfect model, the larger-than-normal discrepancies can be attributed to two factors: 1) a lack of experimental data points at larger slag eye areas and 2) insufficient model formulation. The former is apparent in Fig. 2 (BOT): the density plot shows a unimodal distribution favouring smaller open eye sizes. Furthermore, as the phenomenon is driven by a complex interplay of factors, the process is inherently stochastic. As this lack of data may be reflective of plant operation, the model formulation must be robust enough to account for such large variances.

Fig. 2.

TOP: Relationship derived in Chatterjee et al.’s research,3) MID: Residuals of Chatterjee et al.’s model,3) and BOT: Density of data utilized in Chatterjee et al.’s model.3) (Online version in color.)

3. Data Visualization

The data in this study have been taken from Chatterjee et al.’s work3) on regression modelling of non-dimensional tundish open eye sizes. In its entirety, the dataset includes 20 predictors (e.g. slag height…) and 1 target variable (the non-dimensional open eye size).

The variables used in this model are those used to form Chatterjee et al.’s model. Akin to describing an apple through its colour, texture and taste, these variables seek to describe the process from multiple different angles. This practice, termed feature engineering, is a well-developed method which can significantly increase the performance of models built on existing data.22)

Because the dataset contains many variables, each with its own relationship to the open eye size, the dataset needs to be trimmed to allow for easier model consumption. Using correlation as defined in Eq. (2), variables which do not appear to be strongly correlated with the eye size can be removed from further model formulation (a short sketch of this screening step follows Eq. (4)). For ease of viewing, the variables in this section have been relabelled as listed in Table 1. To ensure that only variables with high correlation are used, a correlation cut-off of 0.6 is defined, and only those above it are utilized for model formulation. The threshold, along with the variables with and without sufficient correlation, has been plotted in Fig. 3. A scatterplot of the qualifying variables, exhibiting their inter-relationships, has also been included in Fig. 4.

\[ \text{Correlation} = \frac{\text{Covar}(x,y)}{\sqrt{\text{Var}(x)\times\text{Var}(y)}} \tag{2} \]

Where:   

\[ \text{Var}(i) = \frac{\sum{(i-\bar{i})^2}}{n}, \quad i = x, y \tag{3} \]

\[ \text{Covar}(x,y) = \frac{\sum{(x-\bar{x})(y-\bar{y})}}{n} \tag{4} \]
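As a minimal sketch of this screening step in R, each predictor’s correlation with the target can be computed via Eqs. (2), (3), (4) and tested against the 0.6 cut-off. The data frame name dat and the target column Y are placeholders for the compiled dataset.

```r
# Correlation per Eqs. (2)-(4), using population (1/n) variance and covariance
covar  <- function(x, y) mean((x - mean(x)) * (y - mean(y)))              # Eq. (4)
correl <- function(x, y) covar(x, y) / sqrt(covar(x, x) * covar(y, y))   # Eq. (2)

# dat: data frame holding the predictors x1..x21 and target Y (placeholder names)
cors <- sapply(dat[setdiff(names(dat), "Y")], correl, y = dat$Y)
keep <- names(cors)[abs(cors) > 0.6]   # variables passing the cut-off of Fig. 3
```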

Table 1. Variable legend for convenient model formulation.

  Variable                               Symbol
  Slag height (m)                        x1
  Bath height (m)                        x2
  Water flow rate (litre/min)            x3
  Gas flow rate (litre/min)              x4
  Average bubble diameter (mm)           x5
  Percent gas injection                  x6
  Density of slag (ρslag)                x7
  Density difference (ρl − ρslag)        x8
  Dynamic viscosity of upper phase       x9
  Kinematic viscosity of upper phase     x10
  gd³/μl²ρl² × 10⁸                       x11
  H (bath height)/db                     x12
  Dimensionless Up                       x13
  Up (m/s)                               x14
  Fr = Up²/gh                            x15
  Δρ/ρl                                  x16
  1/Re = νslag/hUp                       x17
  log Fr = log(Up²/gh)                   x19
  log(Δρ/ρl)                             x20
  log(1/Re) = log(νslag/hUp)             x21
  Actual area                            Y
Fig. 3.

Correlation values for all variables in Chatterjee et al.’s dataset.3)

Fig. 4.

Relationship between highly correlated variables from Chatterjee et al.’s dataset.3)

4. Optimization of Artificial Neural Networks

4.1. Background

The goal, as with any predictive model formulation, is to identify the coefficients for which the error between model-predicted values and actual experimental values, expressed as the cost function J, is minimized (Eq. (5)):

\[ J = \frac{1}{2}\left( Y - \hat{Y} \right)^2 \tag{5} \]

In the case of neural networks, the predicted value can be described by a series of successive relationships describing the connections between inputs, synapses, nodes and outputs. For a single hidden layer structure this can be written as follows:

\[ f^{(2)}\left( f^{(1)}\left( \begin{bmatrix} X_{1,1} & \cdots & X_{1,i} \\ \vdots & \ddots & \vdots \\ X_{n,1} & \cdots & X_{n,i} \end{bmatrix} \times \begin{bmatrix} W_{1,1}^{(1)} & \cdots & W_{1,k}^{(1)} \\ \vdots & \ddots & \vdots \\ W_{i,1}^{(1)} & \cdots & W_{i,k}^{(1)} \end{bmatrix} \right) \times \begin{bmatrix} W_1^{(2)} \\ \vdots \\ W_k^{(2)} \end{bmatrix} \right) = \hat{Y} \tag{6} \]

where f is a logistic function which introduces non-linearity into the model and is applied twice: once at the hidden nodes (f(1)) and once at the output (f(2)). Visually, a single hidden layer architecture can be seen in Fig. 5.
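Equation (6) translates almost line for line into code. Below is a minimal R sketch of the forward pass for a single hidden layer, assuming the logistic activation named above; bias terms, which practical implementations usually add, are omitted to mirror Eq. (6).

```r
logistic <- function(z) 1 / (1 + exp(-z))   # f: introduces non-linearity

# X:  n x i matrix of inputs
# W1: i x k matrix of input-to-hidden synapse weights, W^(1)
# W2: k x 1 matrix of hidden-to-output synapse weights, W^(2)
forward <- function(X, W1, W2) {
  H <- logistic(X %*% W1)   # f^(1): hidden-node activations
  logistic(H %*% W2)        # f^(2): the prediction, Y-hat
}
```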

Fig. 5.

A Generic Neural Network. (Online version in color.)

While mathematically obtaining the weights is a simple exercise in identifying the minimum point of the cost function, high dimensional problems suffer from the curse of dimensionality. In other words, solving over the entire domain and range at sufficient resolution with brute-force techniques becomes impossible for complex systems. Because of this, a gradient descent algorithm is typically employed to find a converged solution. Imagine a ball placed at a random point on a curved plane: the ball naturally rolls towards the minimum along a trajectory determined by gravity. Similarly, the gradient descent algorithm iteratively follows the direction of steepest descent, allowing a feasible and much faster convergence of the cost function. As mentioned in previous works,23) the step size and learning rate of the algorithm must be carefully selected to avoid errors associated with this type of algorithm.
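The rolling-ball analogy corresponds to the update rule w ← w − α∇J(w). A toy R sketch for a linear model under the cost of Eq. (5), with a fixed learning rate alpha (the adaptive scheme adopted in Section 4.2 tunes this automatically), is shown below.

```r
# Toy gradient descent minimizing J = 0.5 * sum((y - X %*% w)^2)
gradient_descent <- function(X, y, alpha = 0.01, iters = 1000) {
  w <- rep(0, ncol(X))                    # starting point of the "ball"
  for (step in seq_len(iters)) {
    grad <- -t(X) %*% (y - X %*% w)       # analytic gradient of J at w
    w <- w - alpha * as.vector(grad)      # one step downhill
  }
  w
}
```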

4.2. Neural Network Creation and Validation

Currently, many software products are available which can provide the foundation for basic prediction models, ranging from open-source environments such as R and Python to commercial packages such as SAS. Because of its accessibility, the open-source R programming environment is employed here.

The modelling approach uses the optimization methodology developed by Anastasiadis et al.24) In this method, learning rates are adaptive, which removes the need for advance knowledge of optimal hidden nodes, starting weights and learning rates. Furthermore, adaptive learning rates allow better speed and stability compared to algorithms using fixed learning rates.25,26)

To ensure proper formulation and eventual validation, the dataset is randomly split: 60% of the data is used for model formulation (i.e., training of the neural network) and the remaining 40% for validation. This ensures that the model is not overfit and can handle new data. A simple flow chart detailing the entire process has been included in Fig. 6.
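A minimal sketch of this split-train-validate cycle, assuming the neuralnet package (whose "sag" and "slr" options implement the globally convergent, adaptive-learning-rate algorithm of Anastasiadis et al.24)) and the placeholder objects dat and keep from Section 3:

```r
library(neuralnet)

set.seed(42)                                  # reproducible random split
idx   <- sample(nrow(dat), round(0.6 * nrow(dat)))
train <- dat[idx, ]                           # 60%: model formulation
test  <- dat[-idx, ]                          # 40%: validation

# "sag" selects the globally convergent variant with adaptive learning rates;
# the default err.fct = "sse" matches the squared-error cost of Eq. (5)
fml <- as.formula(paste("Y ~", paste(keep, collapse = " + ")))
nn  <- neuralnet(fml, data = train, hidden = 8, algorithm = "sag")

pred <- compute(nn, test[, keep])$net.result  # predictions on unseen data
rmse <- sqrt(mean((test$Y - pred)^2))
```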

Fig. 6.

Flow chart of a single neural network model. (Online version in color.)

5. Model Formulation

5.1. Single Model Neural Network

Neural network models consisting of a single hidden layer comprising 4, 6 and 8 nodes have been trained and deployed; the results are recorded in Table 2 (a sketch of this sweep follows the table). The R2 and RMSE of these models are similar to those seen in Chatterjee et al.’s work.3) Unfortunately, also like Chatterjee et al.’s work, an increase in variance is seen at higher open eye sizes in all cases. This suggests that, for such a complex process, a single neural network may not be ideal. Plots showing actual versus predicted non-dimensional eye sizes for 4, 6 and 8 nodes are included in Figs. 7, 8 and 9 respectively. From the plots, it is also apparent that the upper limit of predictive power has been reached for this type of architecture, and further refinement to stabilize the prediction of larger eye sizes is needed for any useful neural network model. Because the 8-node network performs best, albeit slightly, it is used for further model augmentation.

Table 2. Number of nodes and corresponding performance metrics.

                Number of nodes
                4         6         8
  R2            0.941     0.950     0.958
  RMSE          20.486    19.099    17.759
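The sweep behind Table 2 can be written as a short loop, reusing the split from Section 4.2; here R2 is taken as the squared correlation between predicted and actual values, one common convention.

```r
# Sweep hidden-node counts and record validation metrics for Table 2
for (k in c(4, 6, 8)) {
  nn_k <- neuralnet(fml, data = train, hidden = k, algorithm = "sag")
  pred <- as.vector(compute(nn_k, test[, keep])$net.result)
  cat(sprintf("nodes = %d  R2 = %.3f  RMSE = %.3f\n",
              k, cor(test$Y, pred)^2, sqrt(mean((test$Y - pred)^2))))
}
```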
Fig. 7.

TOP: Predicted versus Actual non-dimensional eye size for a 4-node neural network. BOT: Residuals of 4-node neural network. (Online version in color.)

Fig. 8.

TOP: Predicted versus Actual non-dimensional eye size for a 6-node neural network. BOT: Residuals of 6-node neural network. (Online version in color.)

Fig. 9.

TOP: Predicted versus Actual non-dimensional eye size for an 8-node neural network. BOT: Residuals of 8-node neural network. (Online version in color.)

As an aside, previous research on optimized neural network architectures, in which the number of hidden nodes could be related to the number of input nodes (i.e. input variables), gave rise to the current 4–8 node range.27,28,29,30) As the current research focuses on the utility of this novel modelling technique, further optimization of the hidden nodes and layers for steelmaking will not be discussed.

5.2. Ensemble Prediction: Bootstrap Aggregating

Obtaining more data points to strengthen the prediction is always preferable; however, in practice this may not be possible. Despite the severe limitation in data points at the higher eye sizes in the current study, there are still statistical techniques which can strengthen predictive models.

Bootstrap aggregating is a two-step statistical technique employed to improve the stability and accuracy of machine learning algorithms. The first step, bootstrapping, samples the original dataset with replacement m times. This creates m datasets formed from fragments of the original dataset, possibly containing repeated observations. Once formed, m separate models are trained, one on each bootstrapped dataset, and deployed against the original dataset. The final step consists of aggregating the predictions. Aggregation can take many forms depending on the goals of the model (i.e., classification versus regression); for regression, the average or median of the individual predictions are natural choices.

In this work, 1000 datasets are created by sampling, with replacement, 60% of the original dataset. An equal number of models are formulated and the results aggregated, comparing median against averaged predictions. As before, the remaining 40% is used to test the models.
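A condensed sketch of this bagging loop, under the same placeholder objects (dat, keep, fml) as before; each of the m = 1000 bootstraps resamples 60% of the rows with replacement, and every model is then deployed against the original dataset:

```r
m     <- 1000
preds <- matrix(NA, nrow = nrow(dat), ncol = m)  # one column per bootstrap model

for (j in seq_len(m)) {
  rows <- sample(nrow(dat), round(0.6 * nrow(dat)), replace = TRUE)
  nn_j <- neuralnet(fml, data = dat[rows, ], hidden = 8, algorithm = "sag")
  preds[, j] <- compute(nn_j, dat[, keep])$net.result  # deploy on original data
}

pred_avg <- rowMeans(preds)            # aggregated average
pred_med <- apply(preds, 1, median)    # aggregated median
```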

In total, 1000 distinct artificial neural networks were formulated from 1000 unique datasets. Overall, the model augmentation through bootstrap aggregation showed favourable results: accuracy improved by nearly 35%, as observed in Fig. 11. Furthermore, increased stability can be observed in the residuals, as seen in Fig. 12. The reduced spread of this model augmentation method can be directly attributed to the resampling of sparse data points, allowing the ensemble to learn from segments of the data previously learnt poorly. Despite the stronger accuracy, an element of randomness can still be observed at the higher open eye areas.

5.2.1. How Many is Enough?

In this ensemble, 1000 unique neural network models were generated from an equal number of unique datasets; their aggregated output yields the predicted value, which has been shown to be much more accurate than the regression and single neural network models. Although this type of system is simple to parallelize, developing 1000 neural networks may still not be feasible for larger datasets. It is therefore important to define a point beyond which additional neural networks add negligible value to the prediction. During the formation of the 1000 models, a running calculation of RMSE was tracked for both aggregation techniques (Fig. 13). From the running RMSE plot, two features can be observed: 1) the median aggregation technique consistently performed better than the average and 2) the RMSE stabilizes past 600 neural networks for both techniques. This suggests that, for this particular system, over 600 different models need to be formulated for a stable prediction.
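The stopping analysis simply recomputes the aggregate RMSE as each new model joins the ensemble; a sketch using the prediction matrix preds from the previous block:

```r
# Running RMSE of the growing ensemble for both aggregation schemes
run_rmse <- sapply(seq_len(m), function(j) {
  sub <- preds[, seq_len(j), drop = FALSE]   # first j models only
  c(avg = sqrt(mean((dat$Y - rowMeans(sub))^2)),
    med = sqrt(mean((dat$Y - apply(sub, 1, median))^2)))
})
# run_rmse is a 2 x m matrix; plotting both rows against j reveals the plateau
```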

Fig. 10.

Flow chart of the bootstrap aggregation process. (Online version in color.)

Fig. 11.

Results of neural network ensemble LEFT: Aggregated average, RIGHT: Aggregated median. (Online version in color.)

Fig. 12.

Residuals of neural network ensemble LEFT: Aggregated average, RIGHT: Aggregated median. (Online version in color.)

Fig. 13.

RMSE stability over n-trained Neural Networks. (Online version in color.)

6. Conclusion

The tundish open eye is a complex phenomenon whose effect on steel quality is not yet fully understood and remains an ongoing research endeavour. Equally complex is the process by which it forms, owing to its inherent multi-phase and multi-physics nature. Researchers have sought to understand the process through fundamental analysis, experiments, physical model formulation and CFD modelling. While impactful within the realm of fundamental understanding, a static model is not pragmatic in an industrial setting.

Conversely, the modelling method shown in this work, while relatively simple to implement, has strong predictive capabilities. The base data were gathered and screened to identify the variables having high correlation. From here a model was built and augmented through bootstrap aggregation. With enough data, some basic feature engineering, and a widely available machine learning framework, a complex phenomenon such as the tundish open eye can be better anticipated.

In a plant setting, this methodology can be further augmented via two avenues: 1) with the addition of sensor data, more robust estimation of process parameters becomes possible via methods such as Kalman filters, and 2) parallelizable model training and execution can sustain a population of models capable of making predictions in near real time. The combination of both streams will bolster prediction and, by extension, enable better prescription on the shop floor.

Nomenclature

ANN: Artificial neural network

Covar(x,y): Covariance function

f(n)(): Activation function, where n indexes the layer of synapses between nodes

Fr: Froude number

g: Acceleration due to gravity (ms−2)

h: Depth of upper fluid phase (m)

J: Cost function for the gradient descent algorithm

Log: Base 10 logarithmic function

Δρ: Density difference between lower and upper fluid phase (kgm−3)

ρl: Density of bulk fluid phase (kgm−3)

R2: Coefficient of multiple determination

Re: Reynolds number

RMSE: Root mean square error

Up: Plume velocity (m/s)

Vi: Variable count (i = 1, 2, 3…)

Vs: Kinematic viscosity of upper fluid phase (m2s−1)

Var(i): Variance function

Wi,k: Matrix of synapse weights prior to hidden node where i is the number of input variables and k is the amount of hidden nodes

Wk: Matrix of synapse weights post-hidden nodes where k is the number of hidden nodes

Xn,i: Dataset matrix where n represents the amount of data points (n = 1, 2, 3…) and i the amount of variables

x̄: Average of predictor

Y: Actual non-dimensional open eye size

Ŷ: Predicted non-dimensional open eye size

ȳ: Average non-dimensional open eye size

References
 
© 2019 by The Iron and Steel Institute of Japan