Accurate prediction of oxygen supply in BOF steelmaking is essential for precise endpoint control, energy efficiency, and product quality. However, the high dimensionality and strong feature coupling of industrial data pose significant challenges for effective feature selection. This study proposes a comprehensive evaluation framework that integrates four filter-based methods: Pearson correlation coefficient (PCC), Spearman rank correlation coefficient (SCC), mutual information (MI), and maximal information coefficient (MIC), with five widely used regression models: elastic net (EN), support vector regression (SVR), extreme gradient boosting (XGBoost), deep neural networks (DNN), and k-nearest neighbors (KNN). The framework evaluates prediction accuracy, model sensitivity, and feature importance. Results show that MIC consistently outperformed the other methods, achieving the lowest average RMSE (197.4 m³) and highest R² (0.649), particularly improving the robustness of models sensitive to input features. In contrast, MI resulted in significantly higher errors across all models, with SVR reaching an RMSE of 231.1 m³. Furthermore, the study introduces a hybrid PI-SHAP interpretability approach to construct feature sets that are both predictive and mechanistically meaningful, further reducing DNN prediction error by 2%. The derived feature importance rankings align closely with metallurgical principles, highlighting the dual benefits of interpretable feature selection for accuracy enhancement and domain insight. This work offers a practical framework for feature selection in complex industrial modeling tasks within the steelmaking industry.
BOF steelmaking is a mature and highly efficient process in which high-purity oxygen is blown into the converter at high velocity through a top lance. This promotes the rapid removal of impurities from molten iron and elevates the molten pool temperature, thereby enabling fast and efficient steel production. Among the various process parameters, the volume of supplied oxygen plays a pivotal role, directly influencing furnace temperature control, reaction kinetics, and metallurgical equilibrium. Therefore, developing accurate and stable oxygen supply prediction models is essential for advancing automation and intelligent optimization in BOF operations.
With the continuous advancement of industrial sensing and data acquisition technologies, large volumes of multi-source process data have been accumulated in metallurgical production. This data richness has propelled machine learning to the forefront of process modeling and endpoint prediction. Previous studies1,2,3,4,5,6,7,8) have made substantial progress in BOF endpoint forecasting, additive optimization, and process control, using both static and dynamic modeling approaches. Compared to traditional mechanism-based models grounded in thermodynamic and kinetic principles, data-driven methods offer distinct advantages in capturing nonlinear relationships and complex multivariate couplings, and have emerged as powerful tools for oxygen prediction in BOF steelmaking.
Nevertheless, in real-world industrial applications, high-dimensional, redundant, and noisy input features often degrade model performance and hinder interpretability, making feature selection a critical preprocessing step. Correlation-based techniques, such as the Pearson and Spearman coefficients,9,10,11,12) are commonly employed due to their simplicity and computational efficiency. While effective in reducing feature dimensionality and computational load, they are typically limited to detecting linear or monotonic associations, and thus may fail to uncover more intricate dependencies. Alternative methods, such as Lasso regression,13) decision trees,14) and autoencoders,15) leverage regularization or sparsity principles but often suffer from model dependency and limited generalizability. Model performance-guided selection strategies,1,6,16,17,18,19,20) though more predictive, are constrained by high computational demands and poor scalability in industrial deployment.
In current metallurgical modeling practice, feature selection largely relies on empirical knowledge or correlation-based filtering, with limited support for systematic validation or interpretability. This challenge is particularly pronounced in BOF steelmaking, where process variables exhibit strong batch-to-batch variation, complex interactions, and operator-driven variability. For instance, variables such as hot metal temperature, blowing pattern, and lance flow rate may not exhibit strong linear correlations with oxygen demand but instead operate synergistically. Identifying such mechanism-informed features remains a critical bottleneck in the development of intelligent modeling systems.
To address these challenges, this study proposes a comprehensive evaluation framework that combines four filter-based feature selection methods: Pearson correlation coefficient (PCC), Spearman rank correlation coefficient (SCC), mutual information (MI), and maximal information coefficient (MIC), together with five representative regression models: elastic net (EN), support vector regression (SVR), extreme gradient boosting (XGBoost), deep neural network (DNN), and k-nearest neighbors (KNN). This framework enables a systematic comparison of predictive performance and feature selection adaptability across diverse modeling strategies. Additionally, an interpretable feature selection scheme was introduced by integrating permutation importance (PI) and SHapley additive exPlanations (SHAP), enabling the construction of feature subsets that are both predictive and consistent with metallurgical mechanisms. Comparative analyses were conducted in terms of model accuracy, variable contribution, and feature relevance. Furthermore, the study examined model response patterns to feature construction and validated the resulting feature rankings through engineering interpretation grounded in BOF process knowledge. In contrast to prior research that often focuses on a single model or isolated selection method, this work presents an integrated and interpretable framework for feature selection, offering both theoretical insights and practical guidance for complex industrial modeling scenarios.
Feature selection techniques can be broadly categorized into three groups: filter, wrapper, and embedded methods,21,22,23,24) as illustrated in Fig. 1. Among these, filter methods are the most commonly applied in practical prediction tasks due to their model independence and computational efficiency. To evaluate their applicability in BOF oxygen supply prediction, this study compares four representative filter-based algorithms: PCC, SCC, MI, and MIC. These methods assess the relevance between input features and the target variable from four perspectives: linear correlation, monotonic trends, information dependence, and arbitrary functional associations.

2.1.1. PCC
This method measures the strength of the linear correlation between two continuous variables by normalizing covariance to account for scale differences. The formula is:
$$ r = \frac{\operatorname{Cov}(X,\,Y)}{\sigma_X\,\sigma_Y} \tag{1} $$
where Cov(X, Y) is the covariance between variables X and Y, and σX and σY are their standard deviations. The coefficient r ranges from −1 to 1, with values near ±1 indicating a strong linear relationship.
2.1.2. SCC
This method evaluates monotonic relationships, including both linear and nonlinear trends, based on rank values. It is robust to outliers. The formula is:
$$ \rho = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n(n^{2}-1)} \tag{2} $$
where di is the rank difference for the ith observation, and n is the number of observations.
2.1.3. MI
This method measures general statistical dependence between variables by computing entropy from joint and marginal probability distributions:
$$ I(X;\,Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \tag{3} $$
where p(x,y) is the joint probability distribution, and p(x), p(y) are the marginal distributions of X and Y.
2.1.4. MIC
This method extends MI by employing adaptive grid discretization to capture more complex associations. The formula is:
$$ \operatorname{MIC}(X;\,Y) = \max_{a \cdot b < B} \frac{I(X;\,Y)}{\log_{2}\min(a,\,b)} \tag{4} $$
where a and b are the numbers of bins in the X and Y directions, and B is the maximum grid size, typically B = n^0.6, with n as the sample size.
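To make the comparison concrete, the snippet below sketches how the four filter scores might be computed with scipy, scikit-learn, and the minepy package, whose MIC estimator exposes the B = n^0.6 grid bound through its alpha parameter. The placeholder data, the use of absolute values when ranking the signed correlation scores, and the top-eight cut are illustrative choices, not the plant implementation.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.feature_selection import mutual_info_regression
from minepy import MINE  # MIC estimator

def filter_scores(x, y):
    """Score a single feature x against target y with the four filter methods."""
    pcc, _ = pearsonr(x, y)                              # Eq. (1)
    scc, _ = spearmanr(x, y)                             # Eq. (2)
    mi = mutual_info_regression(x.reshape(-1, 1), y)[0]  # Eq. (3), k-NN estimate
    mine = MINE(alpha=0.6, c=15)                         # alpha=0.6 -> B = n^0.6
    mine.compute_score(x, y)
    return {"PCC": abs(pcc), "SCC": abs(scc), "MI": mi, "MIC": mine.mic()}

# Illustrative data standing in for the 2779 heats x 19 features of Table 2
rng = np.random.default_rng(0)
X = rng.normal(size=(2779, 19))
y = rng.normal(size=2779)

# Rank all features by one chosen score and keep the top eight (Section 3.2)
mic_scores = {f"X{j + 1}": filter_scores(X[:, j], y)["MIC"] for j in range(X.shape[1])}
top8 = sorted(mic_scores, key=mic_scores.get, reverse=True)[:8]
```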
2.2. Model Selection
To comprehensively assess the effectiveness of various filter-based feature selection approaches for BOF oxygen supply prediction, this study employs five representative regression models.25,26,27,28,29) These models span a broad spectrum of complexity, ranging from linear methods to deep learning architectures, and reflect diverse modeling paradigms. Widely used in industrial practice, they facilitate a multifaceted evaluation of how different feature subsets affect predictive performance, providing both complementary insights and comparative benchmarking. The underlying principles and suitable application scenarios for each model are summarized in Table 1.
| Model Name | Principle | Application Scenarios |
|---|---|---|
| EN | Combines L1 and L2 penalties for regularization | Suitable for handling multicollinearity and feature selection in regression tasks |
| SVR | Uses kernel functions to map data into higher dimensions for regression | Suitable for small to medium-sized datasets with complex non-linear relationships |
| XGBoost | Gradient boosting framework that combines multiple weak learners | Suitable for high-dimensional data, non-linear relationships, and tasks requiring fast, accurate predictions |
| DNN | Uses multiple layers of neurons to learn complex patterns in data | Suitable for large-scale datasets with intricate non-linear relationships |
| KNN | Predicts target values based on the closest K neighbors in the feature space | Suitable for small datasets where local similarity provides useful information |
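As a reference point, the five models might be instantiated as follows in the scikit-learn/XGBoost ecosystem. The DNN is stood in for here by scikit-learn's MLPRegressor, and every hyperparameter value shown is a placeholder to be overwritten by the GWO search of Section 3.3.

```python
from sklearn.linear_model import ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor

# Placeholder hyperparameters; in the study each model is tuned by GWO
# within the search ranges later listed in Table 3.
models = {
    "EN": ElasticNet(alpha=0.01, l1_ratio=0.5),
    "SVR": SVR(kernel="rbf", C=10.0, epsilon=0.1),
    "XGBoost": XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1),
    "DNN": MLPRegressor(hidden_layer_sizes=(100, 100, 100), learning_rate_init=0.001),
    "KNN": KNeighborsRegressor(n_neighbors=10, metric="euclidean"),
}
```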
In industrial modeling, the interpretability of model outputs plays a crucial role in enabling practical deployment and fostering user confidence. To enhance both transparency and engineering relevance, this study introduces a hybrid PI-SHAP feature selection approach30,31,32) and systematically compares it with conventional filter-based methods.
PI is a model-agnostic, post-hoc evaluation technique that assesses the global importance of each feature by randomly permuting its values and measuring the resulting impact on model performance. In contrast, SHAP, based on Shapley values from game theory, quantifies the marginal contribution of each feature to individual predictions and aggregates these effects to derive global importance scores.
Whereas PI emphasizes the effect of feature perturbation on overall performance, SHAP reveals the internal attribution pathways that contribute to specific predictions. To capitalize on the strengths of both methods, this study normalizes and integrates their global importance scores to form a unified feature importance metric. This combined measure enables the construction of feature subsets that are simultaneously optimized for predictive accuracy and interpretability.
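A minimal sketch of this fusion is given below, assuming a fitted scikit-learn-style regressor. The min-max normalization and the equal weighting of the two scores are our reading of "normalizes and integrates", which the text does not pin down further; averaging after normalization keeps either score from dominating purely because of scale.

```python
import numpy as np
import shap
from sklearn.inspection import permutation_importance

def pi_shap_importance(model, X, y, feature_names):
    """Fuse permutation importance and mean |SHAP| into one global score."""
    # PI: average performance drop over repeated shuffles of each feature
    pi = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    pi_scores = np.clip(pi.importances_mean, 0.0, None)

    # SHAP: mean absolute Shapley value per feature across all samples
    explainer = shap.Explainer(model.predict, X)
    shap_scores = np.abs(explainer(X).values).mean(axis=0)

    # Min-max normalize each score, then average (equal weighting assumed)
    norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-12)
    fused = 0.5 * norm(pi_scores) + 0.5 * norm(shap_scores)
    return dict(zip(feature_names, fused))
```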
To systematically evaluate the predictive performance of different feature selection strategies for BOF oxygen supply modeling, a unified experimental workflow was developed, as illustrated in Fig. 2. The process began with data cleaning and standardization to ensure consistency in feature dimensions and to eliminate the influence of varying units. Subsequently, each filter-based method was used to rank all features, from which the top eight were selected to construct corresponding feature subsets.

In the modeling phase, five regression models were independently trained on each subset, forming a cross-combination structure of “feature selection method × model type.” To enhance model performance and mitigate hyperparameter tuning bias, the Grey Wolf Optimizer (GWO)33) was employed to optimize key model parameters. Furthermore, a stratified five-fold cross-validation strategy, based on oxygen supply intervals,34) was applied (as shown in Fig. 3) to address sample distribution imbalance and ensure sufficient learning for underrepresented cases.

After training, the optimized models were applied to the entire dataset to evaluate global feature importance using the PI-SHAP approach. Based on the resulting importance scores, new feature subsets were generated and used to retrain the five models. The outcomes were then compared with those obtained using the original filter-based feature sets. Additionally, by examining the feature rankings in conjunction with the underlying physical mechanisms of BOF steelmaking, the practical contribution of key variables was interpreted. This engineering-informed analysis was used to validate both the relevance and interpretability of the selected features.
3.2. Data Processing and Feature Selection
The dataset used in this study was derived from BOF steelmaking production records collected at a large steel plant between January and March 2025. After removing entries with missing values or abnormal heats, a total of 2779 valid samples were retained. Features deemed irrelevant to oxygen supply prediction or unavailable in real time (such as endpoint carbon content and temperature) were excluded. Consequently, 19 key features were selected for modeling, as summarized in Table 2.
| Variable | Label | Minimum | Maximum | Mean | Standard deviation |
|---|---|---|---|---|---|
| Total oxygen supply/m³ | Y | 5128 | 8112 | 6164.87 | 266.83 |
| Shutdown time/min | X1 | 15 | 4700 | 417.03 | 410.71 |
| Height of molten pool level/mm | X2 | 6850 | 8400 | 8108.98 | 132.51 |
| Number of splashes | X3 | 0 | 11 | – | – |
| Calcined lime/kg | X4 | 408 | 7901 | 3513.42 | 783.61 |
| Calcined dolomite/kg | X5 | 0 | 4088 | 2059.06 | 257.85 |
| Dolomite lumps/kg | X6 | 0 | 4062 | 17.38 | 156.82 |
| Calcium ferrite/kg | X7 | 0 | 5254 | 423.12 | 645.75 |
| Ore fines/kg | X8 | 0 | 3141 | 48.75 | 253.42 |
| Magnesium pellet/kg | X9 | 0 | 1004 | 32.26 | 111.99 |
| Coke fines/kg | X10 | 0 | 2860 | 24.96 | 84.44 |
| Blowing mode | X11 | 0 | 10 | – | – |
| Hot metal weight/t | X12 | 92 | 140 | 110.87 | 4.49 |
| Hot metal temperature/°C | X13 | 1251 | 1455 | 1348.32 | 32.03 |
| Scrap steel weight/t | X14 | 12.8 | 51.6 | 38.97 | 4.39 |
| [%C]i/wt% | X15 | 3.51 | 7.09 | 4.68 | 0.33 |
| [%Si]i/wt% | X16 | 0.03 | 1.35 | 0.40 | 0.15 |
| [%Mn]i/wt% | X17 | 0.1 | 0.78 | 0.31 | 0.13 |
| [%P]i/wt% | X18 | 0.076 | 0.215 | 0.116 | 0.020 |
| [%S]i/wt% | X19 | 0 | 0.105 | 0.027 | 0.011 |
Note: [%C]i, [%Si]i, [%Mn]i, [%P]i and [%S]i are the initial C, Si, Mn, P, and S contents of the hot metal, respectively.
Due to significant differences in scale and distribution across the feature variables, all variables were standardized using Z-score normalization:
$$ Z = \frac{X - \mu}{\sigma} \tag{5} $$
where X represents the original data, μ is the mean of the feature, and σ is the standard deviation.
Based on the standardized dataset, the four filter-based methods were applied to evaluate the correlation or information dependency between each feature and the target variable. The top eight features, ranked by their respective scores, were selected to construct feature subsets, which were then used as input for subsequent model training.
3.3. Model Optimization and Evaluation Metrics
During model training, a stratified five-fold cross-validation strategy was employed, based on predefined oxygen supply intervals. Specifically, the dataset was partitioned into three ranges: [5128, 5898), [5898, 6432], and (6432, 8112]. Within each range, samples were further divided into five subsets using a four-training, one-testing split. This cross-validation process was repeated across all subsets, and the average results from the five folds were used as the final performance metrics.
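In code, this stratification amounts to binning the continuous target at the two interval boundaries and handing the bin labels to an ordinary stratified splitter. The sketch below reuses the X, y placeholders and models dictionary from the earlier snippets; the fold-wise error follows the RMSE definition given later in this section.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

model = models["KNN"]  # any candidate from the Section 2.2 sketch

# Bin labels 0/1/2 for [5128, 5898), [5898, 6432], (6432, 8112]; the 6433 edge
# assumes integer-valued oxygen volumes so that y = 6432 stays in the middle bin.
bins = np.digitize(y, [5898, 6433])
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_rmse = []
for train_idx, test_idx in skf.split(X, bins):
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    fold_rmse.append(np.sqrt(np.mean((y[test_idx] - pred) ** 2)))  # RMSE of the fold
print(f"mean RMSE over the 5 folds: {np.mean(fold_rmse):.1f} m³")
```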
To ensure fair and consistent hyperparameter tuning across models, the GWO was used to automatically search for optimal parameters for the five regression models. The GWO population size was set to 30, with a maximum of 100 iterations, and the convergence coefficient followed a linear decay from 2 to 0. Hyperparameter search ranges were determined based on relevant literature25,26,27,28,29) and preliminary experimental results. The detailed search space for each model is presented in Table 3, and a compact sketch of the optimizer follows the table. The best-performing parameter combinations were selected and used in subsequent training and evaluation stages.
| Model | Hyperparameter | Range |
|---|---|---|
| EN | l1_ratio | [0, 1.0] |
| alpha | [0.0001, 10] | |
| SVR | C | [0.1, 100] |
| epsilon | [0.01, 1] | |
| XGBoost | n_estimators | [50, 300] |
| max_depth | [3, 20] | |
| learning_rate | [0.01, 0.5] | |
| DNN | n_hidden_layers | [3, 10] |
| n_units | [20, 200] | |
| learning_rate | [0.0001, 0.1] | |
| KNN | n_neighbors | [1, 50] |
| distance metric | [‘euclidean’, ‘manhattan’] |
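The optimizer itself is generic; a compact sketch matching the settings above (population 30, 100 iterations, convergence coefficient decaying linearly from 2 to 0) is shown below. How encoded positions are decoded into integer or categorical hyperparameters (for example, rounding n_estimators or thresholding the KNN metric choice) is an implementation detail the paper does not spell out; the objective would typically return the mean cross-validation RMSE.

```python
import numpy as np

def gwo(objective, lb, ub, n_wolves=30, n_iter=100, seed=0):
    """Minimal Grey Wolf Optimizer for continuous minimization."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    wolves = rng.uniform(lb, ub, size=(n_wolves, dim))
    fitness = np.array([objective(w) for w in wolves])

    for t in range(n_iter):
        # Alpha, beta, delta: the three best solutions found so far
        alpha, beta, delta = wolves[np.argsort(fitness)[:3]]
        a = 2.0 * (1.0 - t / n_iter)        # convergence coefficient: 2 -> 0
        for i in range(n_wolves):
            x_new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2.0 * a * r1 - a, 2.0 * r2
                x_new += leader - A * np.abs(C * leader - wolves[i])
            wolves[i] = np.clip(x_new / 3.0, lb, ub)   # average pull of the leaders
            fitness[i] = objective(wolves[i])
    return wolves[np.argmin(fitness)], float(fitness.min())
```

A call such as gwo(cv_rmse_xgb, lb=[50, 3, 0.01], ub=[300, 20, 0.5]) would then tune XGBoost over the Table 3 ranges, with the hypothetical objective cv_rmse_xgb rounding the first two coordinates to integers before building the model.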
After model training, two standard evaluation metrics were used: root mean square error (RMSE) and the coefficient of determination (R²). Their mathematical definitions are as follows:
$$ \operatorname{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}} \tag{6} $$
$$ R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}} \tag{7} $$
where yi is the actual value, ŷi is the predicted value, ȳ is the mean of the actual values, and n is the number of samples.
Figure 4 presents the scoring results of PCC, SCC, MI, and MIC for evaluating the correlation or dependency between each input feature and the oxygen supply. Figure 5 summarizes the selection frequency of each variable among the top eight features across the four methods.


PCC and SCC are both correlation-based filter methods, yet their scoring results differ significantly. For example, feature X2 scores 0.33 under PCC, indicating a strong linear relationship with oxygen supply. However, its SCC score decreases to 0.11, suggesting that while a general trend exists, local rank inconsistencies reduce monotonicity. Conversely, feature X5 receives a PCC score of 0.16 but a negative SCC score of −0.02, implying possible rank inversion or local noise, which hinders SCC’s ability to capture consistent monotonic trends.
In contrast, MI and MIC are both grounded in information theory and capable of capturing arbitrary, potentially nonlinear dependencies. However, they differ markedly in computational strategy. MI relies on variable discretization, making it sensitive to sample distribution irregularities. As a result, its scores tend to be lower and less discriminative. For instance, the dependency scores of key features X16 and X17 under MI are only 0.05 and 0.03, respectively, whereas MIC yields higher values of 0.11 and 0.09. This demonstrates MIC’s stronger capability in identifying complex coupling relationships and providing more stable evaluations.
From the final selection results, features X16, X14, X17, and X19 receive high scores across all four methods and are selected consistently, as shown in Fig. 5. This indicates statistical relevance and cross-method agreement. However, not all selected features have a clear causal relationship with oxygen supply from a metallurgical standpoint. For example, although X19 (S content in hot metal) ranks highly, its typical concentration is low and has limited direct impact on oxygen demand. In contrast, elements such as carbon and silicon are more mechanistically relevant. This highlights a limitation of filter-based methods: their vulnerability to spurious statistical patterns. Therefore, feature importance should be interpreted in conjunction with domain knowledge to avoid overestimating variables lacking true mechanistic significance.
4.2. Feature Selection and Model Performance Analysis
The top eight features selected by each filter method were individually used as inputs for training the five regression models. RMSE and R² were employed as evaluation metrics to assess model performance. Figure 6 compares the predictive results across all feature selection–model combinations.

Using the commonly applied PCC method as a baseline, the results show that MIC consistently outperforms the other methods across multiple models, particularly those handling nonlinear relationships. For instance, in the DNN model, MIC-based feature selection reduces RMSE from 209.91 m³ to 198.47 m³ (a 5.46% decrease), while R² improves from 0.58 to 0.64, marking an 11.32% increase. In the EN model, RMSE drops from 207.97 m³ to 199.40 m³ (a 4.12% reduction), and R² rises from 0.59 to 0.64 (an 8.28% improvement), indicating that MIC also enhances performance in linear models. For SVR and KNN, MIC yields moderate gains, with R² increases of 3.03% and 0.94%, and RMSE reductions of 1.82% and 0.60%, respectively.
SCC also performs well, particularly in the DNN model, where it reduces RMSE by 4.70% and increases R² by 9.79%. A consistent performance improvement is observed in models such as EN, SVR, and XGBoost. For example, in XGBoost, SCC raises R² by 2.01% and decreases RMSE by approximately 1.02%. Although less effective than MIC overall, SCC still provides stable and meaningful gains across various models.
In contrast, MI demonstrates consistently poor performance and results in significant prediction degradation. In the XGBoost model, MI increases RMSE to 227.44 m³ (11.42% higher than PCC), while R² drops sharply from 0.60 to 0.46, a 24.02% decrease. The worst case is observed in SVR, where RMSE surges to 231.12 m³ and R² falls to 0.45. These findings suggest that MI fails to capture the underlying feature–target relationships in this task and lacks robustness and reliability.
A sample-level analysis of the effects of different feature selection methods on the predictive performance of the KNN model is presented in Fig. 7. The figure reports the first quartile (Q1), mean, and third quartile (Q3) of absolute prediction errors. The results indicate that MIC yields the most concentrated error distribution among all methods. Specifically, 71.4% of predictions have errors below 200 m³, forming a sharp peak with a narrow tail, suggesting excellent error control and stability. In contrast, MI produces a heavily right-skewed error distribution with a long tail; the Q3 exceeds 240 m³ and the mean error reaches 168 m³, indicating poor containment of high-error samples.

PCC and SCC also perform reasonably well, with mean errors of 154 m³ and 152 m³, respectively. However, both exhibit a higher incidence of large errors compared to MIC, as evidenced by Q3 values of 218 m³ and 217 m³.
Taken together, the performance differences observed across the “feature selection method × model type” combinations reveal that sensitivity to feature selection varies by model. SVR and EN are particularly dependent on the choice of feature construction, with RMSE differences between MI and MIC reaching 37.06 m³ and 31.60 m³, respectively. This suggests that these models lack strong internal feature regulation and are highly sensitive to the quality and relevance of input variables.
While DNNs possess robust feature representation and nonlinear mapping capabilities, they still show substantial performance variation with different feature sets. For example, using MIC-selected features reduces the RMSE from 227.59 m³ to 198.47 m³, a decrease of 29.12 m³. This underscores the importance of incorporating highly relevant and low-redundancy features, even in deep learning models applied to high-dimensional industrial data.
Compared to other models, KNN and XGBoost exhibit greater fault tolerance, showing smaller RMSE fluctuations across different feature selection methods. The maximum RMSE differences for KNN and XGBoost are 24.34 m³ and 23.79 m³, respectively, suggesting that MIC contributes more modest performance gains in these cases.
4.3. Feature Importance Analysis and Mechanistic Interpretation
4.3.1. Performance Validation of the PI-SHAP Method
To assess the practical effectiveness of the PI-SHAP method for feature selection, the top eight features identified through this approach were used for a second round of model training across the five regression models. The resulting performance was compared with that of traditional filter-based methods, as illustrated in Fig. 8.

Compared with the best-performing MIC method, PI-SHAP further improved predictive performance in most models, demonstrating strong generalization and cross-model adaptability. In the DNN model, the feature subset generated by PI-SHAP reduced the RMSE from 198.47 m³ to 194.6 m³, representing the most substantial error reduction among all models. For the KNN model, RMSE decreased from 193.3 m³ (MIC) to 191.3 m³, yielding the lowest error across all evaluated configurations. This indicates that PI-SHAP provides effective local error suppression in non-parametric settings.
For linear or kernel-based models such as EN and SVR, the improvements from PI-SHAP were marginal, suggesting that its advantages are less pronounced when the model’s representational capacity is limited. Nevertheless, PI-SHAP consistently delivered modest performance gains over MIC in most cases and proved particularly beneficial in scenarios involving high model complexity or strong sensitivity to feature interaction structures.
4.3.2. Global Importance Ranking and Correspondence to Metallurgical Mechanisms
Figure 9 presents the PI and SHAP importance scores for each feature across the five regression models. The horizontal axis denotes the importance values, and the length of each bar reflects the relative contribution of the corresponding feature. Overall, the rankings derived from PI and SHAP exhibit a high degree of consistency. The top six features are identical across both methods, confirming the robustness of key variable identification from multiple interpretability perspectives.

These highly ranked features possess clear physical significance within the BOF steelmaking process. X14 (scrap steel weight) and X12 (hot metal weight), as primary input materials, jointly determine the total mass of reactants and directly influence oxygen consumption. X15 ([%C]i) and X16 ([%Si]i) are key reactive elements that combine with oxygen to form products such as CO, CO₂, and SiO₂ during smelting.
Notably, X13 (hot metal temperature) and X2 (height of molten pool level), though not prioritized by traditional filter-based methods, rank relatively high in the interpretability analysis. Hot metal temperature affects the carbon–oxygen reaction rate and thermal control in the furnace: lower temperatures necessitate additional oxygen to maintain thermal equilibrium. The molten pool height influences the effective lance position, which alters oxygen jet distribution and reaction efficiency, suggesting a possible coupling effect.
In addition, X5 (calcined dolomite) and X8 (ore fines), though auxiliary rather than primary reactants, may release oxygen or modify slag composition during smelting, thereby indirectly impacting the oxygen balance. Conversely, features with low and inconsistent importance across models are likely to have limited influence on oxygen supply or be more susceptible to data noise, reducing their modeling value.
In summary, the PI-SHAP framework not only enhances model prediction accuracy but also reliably identifies variables that align with metallurgical mechanisms. These methods provide strong interpretability and practical utility, reinforcing their suitability for feature selection in complex industrial modeling tasks.
4.4. Comprehensive Comparison of Feature Selection Methods
This study systematically compared the performance of four filter-based feature selection methods with that of the PI-SHAP method in the context of BOF oxygen supply modeling. The results demonstrate that filter-based approaches offer advantages such as ease of implementation, computational efficiency, and flexible deployment, particularly in scenarios involving high-dimensional features, constrained computational resources, or lightweight models. For instance, in the EN and KNN models, the MIC method effectively reduced RMSE to 199.4 m³ and 192.5 m³, respectively, achieving performance levels close to the optimum. In contrast, PI-SHAP exhibited stronger performance in high-complexity models such as DNN and SVR, highlighting its advantage in capturing feature interactions and providing interpretable insights under nonlinear conditions.
Overall, a two-stage strategy is recommended: initial dimensionality reduction using a filter-based method such as PCC or MIC, followed by interpretability-guided refinement with PI-SHAP within the selected model framework. This hybrid approach offers the combined benefits of predictive accuracy, efficiency, and mechanistic interpretability—making it well suited for complex industrial modeling tasks.
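Put together, the recommended strategy reduces to a few lines. The sketch below reuses the filter_scores and pi_shap_importance helpers sketched in Section 2, assumes X_df is a pandas DataFrame holding the 19 features of Table 2 with y the total oxygen supply, and treats the size of the stage-one coarse cut as a free choice.

```python
# Stage 1: cheap filter screening with MIC for rapid dimensionality reduction
mic = {c: filter_scores(X_df[c].to_numpy(), y)["MIC"] for c in X_df.columns}
screened = sorted(mic, key=mic.get, reverse=True)[:12]   # coarse cut; size is a free choice

# Stage 2: interpretability-guided refinement within the chosen model framework
model.fit(X_df[screened].to_numpy(), y)
fused = pi_shap_importance(model, X_df[screened].to_numpy(), y, screened)
top8 = sorted(fused, key=fused.get, reverse=True)[:8]
model.fit(X_df[top8].to_numpy(), y)                      # final model on the refined subset
```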
It is worth noting that in current industrial modeling practice, conventional oxygen supply prediction models based on thermodynamic mechanisms remain widely adopted. While such models offer clear theoretical foundations and engineering interpretability, their predictive accuracy and adaptability are often limited when facing raw material fluctuations, unstable process conditions, and complex variable couplings. For example, mechanism-based models35,36) derived from oxygen balance equations typically yield RMSE values ranging from 250 to 300 m³ under real production conditions. In contrast, the PI-SHAP + KNN model proposed in this study achieved an RMSE of 191.3 m³ in high-dimensional multivariate scenarios, demonstrating significantly improved robustness and adaptability.
In summary, the proposed modeling framework not only outperforms traditional physics-based approaches in terms of accuracy and stability, but also exhibits superior generalizability and deployment potential. It provides a practical and scalable pathway for enhancing and complementing conventional mechanistic models through intelligent data-driven strategies.
To address the challenges of high-dimensional redundancy and complex feature coupling in modeling oxygen supply during BOF steelmaking, this study developed a comprehensive evaluation framework that integrates feature selection techniques with predictive modeling. An interpretable feature selection strategy was also introduced for comparative validation. The key conclusions are as follows:
(1) Feature selection significantly influences model performance. Among the methods evaluated, MIC achieved the lowest average RMSE (197.42 m³) and the highest R² (0.64895) across all five models. It notably enhanced predictive accuracy in models sensitive to input features, such as EN, SVR, and DNN. In contrast, MI produced the poorest performance, with the RMSE of SVR reaching 231.12 m³, indicating substantial model degradation.
(2) Model sensitivity to feature selection varies. EN and SVR were highly dependent on the choice of input features, with RMSE differences up to 31.60 m³ and 37.06 m³, respectively. DNN exhibited moderate sensitivity, with MIC reducing its RMSE by 29.12 m³. KNN and XGBoost demonstrated greater robustness, showing relatively limited variation in RMSE across different feature sets.
(3) The PI-SHAP method improves performance and interpretability in complex models. In the DNN model, PI-SHAP reduced RMSE from 198.5 m³ (under MIC) to 194.6 m³, representing a nearly 2% improvement and a reduction of over 33 m³ compared to MI. The selected features were closely aligned with metallurgical mechanisms, enhancing both the credibility and industrial applicability of the models.
(4) A two-stage feature selection strategy is recommended. Initial filtering using methods such as PCC or MIC facilitates rapid dimensionality reduction and redundancy elimination. This can be followed by interpretability-driven refinement using PI-SHAP, enabling the identification of globally or locally important features. This hybrid approach supports both model accuracy and mechanistic insight, making it well suited for complex industrial modeling applications.
This work was supported by the National Natural Science Foundation of China (No. 52074001), the Anhui Provincial Universities Outstanding Scientific Research and Innovation Team Project (2002AH010024), and the Anhui Province Discipline Leader Training Program (DTR2023015).