ISIJ International
Online ISSN : 1347-5460
Print ISSN : 0915-1559
ISSN-L : 0915-1559
Regular Article
An Improved Model Based on YOLO v5s for Intelligent Detection of Center Porosity in Round Bloom
Zi-xuan Xiao, Zheng-hai Zhu, Guang-xu Wei, Shang-Dong Liang, Cheng-cheng Yang, Xiang Zheng, Dong-jian Huang, Fei He

2024 Volume 64 Issue 1 Pages 76-83

Abstract

To address the problem that speed and accuracy cannot both be satisfied by intelligent detection models of center porosity in round bloom, an improved model based on YOLO v5s (the fifth version of You Only Look Once) was developed by establishing a data set of about 10000 images and by conducting a contrast experiment and an ablation experiment with embedded Coordinate Attention and Slim-neck modules. The results show that the improved YOLO v5s has good detection performance: mAP@0.5 on the validation set reaches 99.17%, which is respectively 0.2%, 0.1%, 2.9% and 1.7% higher than Faster RCNN, SSD, YOLO v3-Tiny and YOLO v5s; the detection speed is 86 fps, respectively 514.2% and 168.8% higher than Faster RCNN and SSD, and maintains the speed of the original YOLO v5s while improving its accuracy. The processing time of a single picture in the test set is only 0.015 s, which enables real-time and accurate location of center porosity in round bloom. This study provides a new method for detecting center porosity and is helpful for the development of intelligent detection of defects in continuous casting billet.

1. Introduction

Center porosity is an internal quality defect that is difficult to avoid during the solidification of round bloom. Severe center porosity affects the yield of rolled steels and is not conducive to the production of high-quality steel in iron and steel enterprises. Rapid and accurate detection of center porosity is therefore a basis for improving and ensuring the quality of round bloom.

Normally, defects of continuous casting billet are detected manually by experts. In the inspection process, however, different personnel may reach inconsistent results on the same billet, and the accuracy of inspection is limited by human vision, so the subjective factors of manual inspection greatly affect the quality rating of continuous casting billet. With the rapid development of image processing technology based on computer vision, target detection algorithms offer a non-contact, highly accurate and automated way to assist or replace manual work in the identification and localization of defects. Target detection algorithms are mainly divided into algorithms based on traditional machine learning and those based on deep learning.1) Traditional target detection algorithms rely heavily on manually designed features, and detection results suffer if features are improperly or inadequately extracted. Wu et al.2) used the Adaboost algorithm to coarsely detect cracks in images, applying Gabor wavelets and Canny edge detection to extract features, and then compared detection results for feedback; however, the capacity of the sample library and the number of test samples were insufficient, and the detection speed needed improvement. Neogi et al.3) proposed applying a globally adaptive percentile threshold to gradient images, a method that improved the accuracy of detecting defects of different sizes but was susceptible to error under diverse backgrounds and variable illumination. Traditional detection algorithms address, to a certain extent, the errors arising from subjective factors, yet they are more demanding on the imaging environment, and manually designed features lose robustness and generalization ability when defect shapes are diverse, features are obscure or scenes are cluttered, which makes them hard to apply in practical engineering. With the second wave of machine learning, research on deep learning continues to heat up, and target detection algorithms based on deep learning4) have attracted the attention of metallurgists for their fast recognition speed and strong generalization ability. Zhuang et al.5) analyzed features of surface images of continuous casting billet by training a convolutional neural network with well-chosen parameters, combined with contour extraction methods, to classify whole images; although a high accuracy rate was achieved, the method suffers from fluctuating false-detection and missed-detection rates without high-quality training images. Li et al.6) proposed a YOLO v4 network with an embedded attention mechanism and a custom receptive-field block structure, which enhanced the information collection and feature extraction capability of the network and provided a new method for detecting surface defects. As shown in Figs. 1 and 2, both types of algorithms extract feature information automatically in an end-to-end learning process, largely removing the need for explicit feature design.7) Although the accuracy of two-stage algorithms8,9,10) is relatively high, their detection speed is far inferior to that of one-stage algorithms, so they cannot meet the speed requirements of engineering; one-stage algorithms11,12,13,14,15) are fast enough for real-time detection, but their detection accuracy still needs to be improved.

Fig. 1. Two-stage algorithms based on candidate region. (Online version in color.)

Fig. 2. One-stage algorithms based on regression. (Online version in color.)

In order to meet the accuracy and speed requirements of industrial inspection, this paper uses the YOLO v5s model to identify and locate center porosity in round bloom. Firstly, to address the insufficient sample size and high similarity of training sets in current studies, the data set established in this paper contains a total of 898 images of center porosity of different sizes and grades from two steel grades. Through data augmentation by filtering, adding noise, adjusting lightness and darkness, and rotating angles to simulate complex detection scenarios, the original data set was enlarged to about 10000 images, ensuring sufficiently rich training and validation data to improve the robustness of the model. Secondly, to address the complex structure, numerous parameters and high training configuration required by YOLO v5s, its network structure was adjusted and improved accordingly: Coordinate Attention was embedded in the Backbone to enhance attention to extracted features, and Slim-neck by GSConv was integrated into the Neck to lighten the depth of convolutional layers while preserving the connections between them, so as to balance the detection speed and accuracy of the model. Together these provide an effective detection method for the identification of center porosity in round bloom.

2. Networks Improvement and Experiment Preparation

In this paper, based on YOLO v5s, we first performed data augmentation on the user-defined data set to improve the robustness of the model, then embedded the Coordinate Attention mechanism in the Backbone to improve the feature extraction ability, and finally introduced the Slim-neck lightweight design based on GSConv in the Neck part to reduce network parameters and improve detection speed; the applied modules and improvement methods are summarized in Fig. 4. The resulting improved model, whose network structure is shown in Fig. 3, guarantees efficiency and meets the real-time requirement.

Fig. 3. The network structure of the improved model based on YOLO v5s. (Online version in color.)

Fig. 4. Applied modules and improved methods. (Online version in color.)

2.1. Network Structure of Improved Model Based on YOLO v5s

In modern industrial production, target detection algorithms based on deep learning are used more and more frequently in material defect recognition and location. The YOLO series of algorithms stands out among them because of superior detection speed and accuracy. The main working principle is that, after an image enters the input layer, the output layer directly performs regression prediction on the border position and category of the target. YOLO v5 incorporates the advantages of the earlier YOLO versions; among its variants, YOLO v5s has a relatively small number of parameters and achieves high accuracy with fast detection. Hence this paper chooses YOLO v5s as the base network model and applies transfer learning with pre-trained weights to enhance the generalization ability of the detector. The network structure of YOLO v5s is shown in Fig. 5.

Fig. 5. A network structure of YOLO v5s. (Online version in color.)

The input layer uses Mosaic to enrich the data set. As shown in Fig. 6, randomly selected training images are spliced and combined into new images through random scaling, random cropping and random arrangement16) to improve the accuracy of target detection (a simplified sketch follows Fig. 6). After images of different sizes are scaled by adaptive image scaling, a minimum number of black edges is added to unify the image size, which enhances the training speed and network accuracy of the model. The adaptive anchor frame design can automatically calculate the optimal anchor frame values for different data sets.

Fig. 6. Images processed by Mosaic. (Online version in color.)
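To make the splicing step concrete, the following is a simplified Python/OpenCV sketch of Mosaic, not the actual YOLO v5 implementation; the function name mosaic4, the gray padding value, the output size and the crop ranges are illustrative assumptions, and the bounding-box remapping that real training also requires is omitted.

import random
import cv2
import numpy as np

def mosaic4(images, out_size=640):
    """Splice four images into one Mosaic image (simplified sketch).

    A random center point divides the canvas into four quadrants; each
    input image is randomly cropped and scaled to fill one quadrant.
    """
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # gray padding
    cx = random.randint(out_size // 4, 3 * out_size // 4)  # random mosaic center
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),                 # top-left, top-right
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]   # bottom-left, bottom-right
    for img, (x1, y1, x2, y2) in zip(images, regions):
        h, w = img.shape[:2]
        # Random crop of the source image (random cropping / arrangement)
        ch, cw = random.randint(h // 2, h), random.randint(w // 2, w)
        top, left = random.randint(0, h - ch), random.randint(0, w - cw)
        crop = img[top:top + ch, left:left + cw]
        # Random scaling: resize the crop to exactly fill its quadrant
        canvas[y1:y2, x1:x2] = cv2.resize(crop, (x2 - x1, y2 - y1))
    return canvas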

The main function of the Backbone is feature extraction; its main modules are shown in Fig. 7. CSPNet (Cross Stage Partial Network) uses feature information from different layers to obtain a richer feature map, reducing the number of parameters and the FLOPS of the model and achieving feature reuse. The Focus module consists of four slice operations and one convolution with 32 convolution kernels, reducing the amount of computation and the number of parameters and thereby increasing speed. The SPP17) module uses four different pooling kernels, i.e. 1×1, 5×5, 9×9 and 13×13, to achieve an adaptive-size output (a sketch follows Fig. 7).

Fig. 7. Main modules of Backbone. (Online version in color.)
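A minimal PyTorch sketch of such an SPP module is given below; the 5×5, 9×9 and 13×13 pooling kernels follow the description above (the 1×1 kernel is realized as the identity branch), while the channel counts and the 1×1 reduction/fusion convolutions are illustrative assumptions.

import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial Pyramid Pooling as used in YOLO-style backbones (sketch).

    Parallel max-pooling with kernels 5, 9 and 13 (stride 1, padded so the
    spatial size is preserved) is concatenated with the identity branch,
    giving multi-scale context at a fixed output size.
    """
    def __init__(self, c_in, c_out, kernels=(5, 9, 13)):
        super().__init__()
        c_mid = c_in // 2
        self.reduce = nn.Conv2d(c_in, c_mid, 1, 1)            # 1x1 channel reduction
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernels
        )
        self.fuse = nn.Conv2d(c_mid * (len(kernels) + 1), c_out, 1, 1)

    def forward(self, x):
        x = self.reduce(x)
        return self.fuse(torch.cat([x] + [p(x) for p in self.pools], dim=1))

# Shape check: spatial dimensions are unchanged
y = SPP(512, 512)(torch.randn(1, 512, 20, 20))  # -> torch.Size([1, 512, 20, 20])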

The Neck part uses FPN (Feature Pyramid Network)18) to fuse feature maps of different levels through up-sampling and down-sampling, enhancing the multi-scale information representation and improving the accuracy of target detection. Finally, the Head outputs three feature maps of different sizes and filters the predicted target boxes by NMS (Non-Maximum Suppression),19) sketched below. In summary, each module of YOLO v5s is designed for a good balance of detection speed and accuracy, making it a superior performer in target detection tasks.
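As an illustration of the filtering step, the following is a plain NumPy sketch of greedy NMS; the IoU threshold of 0.45 is an illustrative assumption, not a value reported in this paper.

import numpy as np

def nms(boxes, scores, iou_thr=0.45):
    """Greedy Non-Maximum Suppression (sketch).

    boxes: (N, 4) array in (x1, y1, x2, y2) format; scores: (N,) array.
    Keep the highest-scoring box, drop boxes overlapping it above
    iou_thr, and repeat until no boxes remain.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        # Intersection of the kept box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-9)
        order = order[1:][iou <= iou_thr]   # keep only weakly-overlapping boxes
    return keep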

2.1.1. Improving Backbone with Coordinate Attention

In deep learning models, local information is generally extracted by convolutional kernels, and this local information strongly influences whether an image is correctly recognized. Attention mechanisms20,21,22) are therefore often introduced into models to strengthen the representation of the informative parts while suppressing useless information. The attention mechanism, proposed by Treisman and Gelade,23) mimics the attention mechanism of the human brain by assigning different weights to different parts of the input, focusing on the more critical local information and making more accurate judgments, optimizing the model without imposing a large computational and storage overhead.24)

CA (Coordinate Attention) decomposes location information into the longitudinal and lateral directions, encodes them into the channel attention, and aggregates them into direction-aware and location-aware features. As shown in Fig. 8, feature maps along the height and width are first obtained by pooling along the two directions respectively; the C×1×(H+W) feature map generated by stitching them is then reduced by a convolution operation, after which the batch-normalized feature maps of the two directions are expanded back to the initial number of channels according to the original height and width; passing them through the Sigmoid activation function yields the attention weights in the two directions; finally, the feature map carrying attention weights in width and height is obtained by multiplicative weighting of the original feature map. Compared with some more complex attention mechanisms, CA is generally more computationally efficient. Moreover, CA assigns attention weights to the elements of the input that are relevant to the current position, capturing important information more accurately; in contrast, some traditional attention mechanisms may assign the same weight to the entire sequence, failing to differentiate the importance of different elements.

Fig. 8. The calculation process of Coordinate Attention mechanism. (Online version in color.)
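A PyTorch sketch of this computation, following the steps of Fig. 8, is given below; the reduction ratio and the Hardswish activation are common choices from the original CA design and are assumptions here rather than settings reported in this paper.

import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Coordinate Attention (sketch of the Fig. 8 pipeline).

    Pool along H and W separately, concatenate, reduce channels, split back,
    restore channels per direction, and weight the input by two sigmoids.
    """
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # N x C x H x 1 (pool over W)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # N x C x 1 x W (pool over H)
        self.conv1 = nn.Conv2d(channels, mid, 1)       # shared channel reduction
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)      # restore channels (H branch)
        self.conv_w = nn.Conv2d(mid, channels, 1)      # restore channels (W branch)

    def forward(self, x):
        n, c, h, w = x.shape
        xh = self.pool_h(x)                            # N x C x H x 1
        xw = self.pool_w(x).permute(0, 1, 3, 2)        # N x C x W x 1
        # Stitch the two directions into a C x 1 x (H+W) map, then reduce
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                        # N x C x H x 1
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))    # N x C x 1 x W
        return x * ah * aw   # multiplicative weighting of the original features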

CA can be deployed in either the Backbone or the Neck of the YOLO v5s network structure, with the main aim of enhancing the focus on key features and capturing long-range dependencies among feature positions, ultimately producing a more reasonable output. The final choice was to introduce this mechanism in the Backbone, the feature extraction part, rather than in the Neck, the feature fusion part. If the attention mechanism were added to every stage of the Backbone, it would deepen the network, increase the computational complexity and reduce the computational speed, which would be counterproductive. In this paper, we therefore add the attention mechanism at the last layer of the Backbone, where CA only needs to attend locally to the extracted features. This extends the network's capacity to focus on specific parts of the input and improves the accuracy of the model while reducing the computational effort, simplifying the model and speeding up computation.

2.1.2. Improving Neck with GSConv and VoVGSCSP Modules

In general, as input images pass through the convolutional network, spatial information is gradually transferred to the channels. In this process, some semantic information is lost with each spatial compression or channel expansion of the feature map. Although YOLO v5s can effectively reduce this loss through dense convolutional layers, dense convolutional computation comes at the cost of speed. Existing lightweight designs25,26,27,28,29) such as Xception, MobileNets and ShuffleNets reduce the computational effort through DSC (Depth-wise Separable Convolution), but because DSC separates the channel information of the input image during computation, it gains speed at the cost of accuracy. GSConv, a lightweight convolution, instead reduces the amount of convolution computation while retaining as many hidden connections between channels as possible: it uses a shuffle operation to make the output of the convolution close to that of a standard convolution (SC), thus reducing the loss of semantic information during transfer. It is a method proposed in Ref. 30) to reduce model complexity while maintaining accuracy. The main modules, GSBottleneck and VoVGSCSP, are shown in Fig. 9. VoVGSCSP uses a one-shot aggregation method to design a cross-stage partial network (GSCSP) that replaces the CSPNet of standard convolutions in the Neck to fuse feature information, maintaining accuracy while effectively reducing computational effort and network structure complexity. A sketch of GSConv is given below Fig. 9.

Fig. 9. The structures of a GSBottleneck module and a VoVGSCSP module. (Online version in color.)
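The following is a minimal PyTorch sketch of GSConv; the 5×5 depth-wise kernel and SiLU activations are illustrative assumptions consistent with common implementations, not settings confirmed by this paper.

import torch
import torch.nn as nn

class GSConv(nn.Module):
    """GSConv (sketch): a standard convolution produces half of the output
    channels, a cheap depth-wise convolution produces the other half from
    that result, and a channel shuffle mixes the two halves so the output
    stays close to a standard ("SC") convolution at much lower cost.
    """
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.dense = nn.Sequential(              # standard convolution branch
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.cheap = nn.Sequential(              # depth-wise convolution branch
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        x1 = self.dense(x)
        x2 = self.cheap(x1)
        y = torch.cat([x1, x2], dim=1)           # (N, c_out, H, W)
        # Channel shuffle: interleave the dense and cheap halves
        n, c, h, w = y.shape
        return y.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)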

Using GSConv in the Backbone would increase the loss of semantic information transmitted to the channels. After processing in the Backbone, however, the channel dimension of the feature maps has become large enough, and the spatial dimensions small enough, that little redundant information remains. Applying the GSConv-based modules to process these feature maps in the Neck can accelerate the capture of feature regions. If GSConv were used throughout the whole model, the network would deepen, significantly increasing the computational effort and reducing the detection speed.

2.2. Experiment Preparation

2.2.1. Establishment of a Data Set

This paper focuses on center porosity defects of continuous casting billet, with a data set derived from real industrial inspections. Figure 10 shows images of center porosity formed in different steel grades under the influence of different factors, which enrich the morphology of center porosity and improve the robustness of model training.

Fig. 10. Center porosity samples of round bloom. (Online version in color.)

In the data acquisition process, image quality is affected by the degree of acid corrosion of the defects, the degree of surface oxidation caused by how the specimen is placed after pickling, the brightness of the image at the time of acquisition, reflection of light from the specimen surface, etc. In order to simulate complex industrial scenes and ensure that model training had sufficiently rich samples, this experiment applied filtering, added noise, adjusted brightness and darkness, rotated angles and applied other methods to the original data set for data augmentation (a sketch follows Fig. 11). As shown in Fig. 11, (a) is an original image; (b–d) are processed by Mean Filtering, Median Filtering and Bilateral Filtering, respectively; (e) has added Gaussian noise; (f) has enhanced brightness and (g) adjusted darkness; (h) is rotated by 180°. The original data set was first augmented to about 10000 images, then randomly divided into training, validation and test sets in the ratio 7:2:1, labeled with the image annotation tool LabelImg and saved in PASCAL VOC format.

Fig. 11. Data augmentation of images. (Online version in color.)
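The augmentation operations of Fig. 11 can be sketched with OpenCV as follows; the kernel sizes, noise standard deviation and brightness gains are illustrative assumptions, not the exact values used to build the data set.

import cv2
import numpy as np

def augment_variants(img):
    """Generate the augmentation variants of Fig. 11 (sketch, OpenCV).

    Returns filtered, noisy, brightened/darkened and rotated copies of one
    image, keyed by variant name.
    """
    out = {}
    out["mean"] = cv2.blur(img, (5, 5))                       # Mean Filtering
    out["median"] = cv2.medianBlur(img, 5)                    # Median Filtering
    out["bilateral"] = cv2.bilateralFilter(img, 9, 75, 75)    # Bilateral Filtering
    noise = np.random.normal(0, 15, img.shape)                # Gaussian noise
    out["noisy"] = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    out["bright"] = cv2.convertScaleAbs(img, alpha=1.3, beta=20)   # enhanced brightness
    out["dark"] = cv2.convertScaleAbs(img, alpha=0.7, beta=-20)    # adjusted darkness
    out["rot180"] = cv2.rotate(img, cv2.ROTATE_180)           # rotate 180 degrees
    return out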

2.2.2. Experimental Environment and Evaluation Index

The training, validation and testing of models in this experiment were all run in the same environment. The specific experimental runtime environment settings are shown in Table 1.

Table 1. Experimental runtime environment settings.

Experimental environment configuration:

Operating System    Windows 10 (×64)
CPU                 Intel Core i5 12400F
GPU                 NVIDIA GeForce RTX 3080
Framework           PyTorch
Compile Software    PyCharm

To comprehensively and objectively evaluate the performance of the improved YOLO v5s model proposed in this paper, mAP (Mean Average Precision) and FPS (Frames per second) were used as metrics to measure the performance of the network model, while confidence of the test set was used as an aid to evaluate the model performance.

AP is the evaluation index for each category and is equal to the area enclosed by the P-R curve (Precision-Recall) and the coordinate axes, as shown in Eq. (1). mAP, shown in Eq. (2), is the average of the AP values of all label categories and is used to measure the performance of the whole model; the larger the mAP value, the better the performance. mAP@0.5 denotes the mAP when the IoU (Intersection over Union) threshold is 0.5, and mAP@.5:.95 denotes the average mAP over different IoU thresholds (from 0.5 to 0.95 in steps of 0.05).

  
\mathrm{AP} = \int_{0}^{1} P(r)\,\mathrm{d}r \qquad (1)

\mathrm{mAP} = \frac{\sum_{i=1}^{n} \mathrm{AP}(i)}{n} \qquad (2)

In the formulas: AP(i) is the detection accuracy of category i; n is the number of detected categories.
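A numerical sketch of Eqs. (1) and (2) is given below, assuming the P-R points of each class have already been computed and are sorted by increasing recall; the all-point interpolation follows common practice (e.g. PASCAL VOC) and may differ in detail from the evaluation code used in this study.

import numpy as np

def average_precision(precision, recall):
    """Area under the P-R curve (Eq. (1)), via all-point interpolation."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # make precision monotonically decreasing
    idx = np.where(r[1:] != r[:-1])[0]         # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(ap_per_class):
    """Average of per-class AP values (Eq. (2))."""
    return float(np.mean(ap_per_class))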

FPS is the number of images detected by the model per second, as shown in Eq. (3): the larger it is, the more images the model detects per second. ElapsedTime is the total running time for the model to detect one image.

  
\mathrm{FPS} = \frac{1}{\mathrm{ElapsedTime}} \qquad (3)

Confidence indicates how credible the model considers a detection box to be for a certain class. As shown in Eq. (4), Pr(Object) is the probability that an object exists within the box: 1 if it exists, otherwise 0. When the confidence is greater than the set threshold, the model considers that the detection box contains a target of the corresponding class; otherwise the detection box is regarded as containing no target and is filtered out. The selection of the threshold therefore affects the detection results. Confidence is positively correlated with detection accuracy: the higher the confidence, the greater the probability that the detection box is judged a positive sample; conversely, the lower the confidence, the greater the possibility that it is judged a negative sample.

  
\mathrm{Confidence} = \Pr(\mathrm{Object}) \times \mathrm{IoU}^{\mathrm{Truth}}_{\mathrm{Pred}} \qquad (4)
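For illustration, Eq. (4) can be evaluated as below once the IoU between a predicted box and its ground truth is known; the box coordinates in the usage line are made-up values.

def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format, for Eq. (4)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

# Eq. (4): Pr(Object) is 1 if an object exists, else 0, so for a true
# positive the confidence reduces to the IoU with the ground truth.
confidence = 1 * box_iou((10, 10, 50, 50), (12, 12, 48, 52))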

3. Results and Analysis

3.1. Contrast Experiment

In order to test the detection performance of YOLO v5s, a contrast experiment was set up among models based on the two-stage algorithm Faster RCNN and the one-stage algorithms SSD, YOLO v3-Tiny and YOLO v5s, which had important guiding significance for the final decision to improve YOLO v5s.

The training parameters of the contrast experiment were set as follows: the above models were trained for 50 epochs on the self-defined data set, the initial learning rate was 0.001, the batch size was 32, the momentum was 0.937, and the weight decay was 0.0005. For Faster RCNN and SSD, the last five training weights were evaluated on the validation set and the results averaged; YOLO v3-Tiny, YOLO v5s and the improved model were evaluated with the best weights from training. The experimental results are shown in Table 2:

Table 2. Training and verification results of the contrast experiment.

Algorithm       Backbone Network    mAP@.5:.95    mAP@0.5    FPS (fps)
Faster RCNN     Resnet-50           0.775         0.989      14.2
SSD             Resnet-50           0.794         0.990      31.6
YOLO v3-Tiny    Darknet-53          0.720         0.967      104.2
YOLO v5s        Darknet-53          0.751         0.974      86.2
Ours            Darknet-53          0.814         0.991      86.2

From the experimental results in Table 2 it can be seen that the speed of the model based on the two-stage algorithm Faster RCNN is 14.2 fps, significantly lower than that of one-stage algorithms such as SSD and the YOLO series; its detection speed is far from meeting industrial requirements. YOLO v3-Tiny has the fastest detection speed at 104.2 fps, but it gains this speed by reducing the number of feature layers to simplify the model, resulting in a significant reduction in detection accuracy: its mAP@.5:.95 is 0.720 and mAP@0.5 is 0.967, which cannot meet the accuracy requirements of defect detection. The accuracy of SSD is slightly better than that of YOLO v5s, but its speed is less than half. Considering accuracy and FPS together, the experiment finally chose YOLO v5s as the base detection algorithm for improvement, in order to detect and locate the center porosity target more accurately and quickly. From the final results it can be seen that the improved model maintains the original model's 86.2 fps while mAP@0.5 is improved by 0.017 and mAP@.5:.95 by 0.063.

3.2. Ablation Experiment

In order to test whether the proposed lightweight model incorporating the attention mechanism had a positive impact on YOLO v5s, an ablation experiment was set up, which was of great guiding significance for finalizing the improvement scheme of the YOLO v5s network structure. Based on YOLO v5s, this experiment gradually introduced CA attention and Slim-neck by GSConv for training, validation and testing.

The training parameters for the ablation experiment were set as follows: 50 training epochs on the self-defined data set, an initial learning rate of 0.001, a batch size of 32, a momentum of 0.937, and a weight decay of 0.0005. The unimproved YOLO v5s was trained with the same hyper-parameter settings, and the optimal weights were selected for validation and for detection on the test set. The training and validation results of the ablation experiment are shown in Table 3:

Table 3. Training and validation results of the ablation experiment.

Model    CA (Backbone)    CA (Neck)    GSConv (Neck)    mAP@0.5    FPS (fps)
1        —                —            —                0.9743     86.2
2        √                —            —                0.9715     87.3
3        —                √            —                0.9863     88.4
4        —                —            √                0.9867     88.4
5        √                —            √                0.9917     86.2
6        —                √            √                0.9888     87.0

'—' indicates that the module is not introduced; '√' indicates that it is introduced.

Comparing the results of the improved models with YOLO v5s in Table 3: when only CA is introduced (Models 2 and 3), mAP@0.5 of Model 2 decreases by 0.29% while that of Model 3 improves considerably, by 1.23%. This indicates that when this module is trained in the Backbone, the field of view is already sufficient to fit the data, and too much attention is paid to other, global information, negatively impacting accuracy; training this module in the Neck, however, increases the actual field of view toward the theoretical one and improves performance. When only Slim-neck by GSConv is introduced (Model 4), mAP@0.5 improves significantly, by 1.27%. When CA and Slim-neck by GSConv are combined in the network (Models 5 and 6), mAP@0.5 improves markedly, by 1.79% and 1.49% respectively, both higher than Models 2, 3 and 4, indicating that fusing the two modules improves detection accuracy more effectively than introducing either module alone. Compared with the 86.2 fps of Model 1, the speeds of Models 2, 3, 4 and 6 are all improved, and that of Model 5 is maintained. The key function of the Backbone is feature extraction and that of the Neck is feature fusion, so the CA mechanism can play a more effective role in the Backbone; after the Backbone, the channel dimension of the feature map reaches its maximum and the width and height dimensions their minimum, redundant information is scarce, and there is less information for the CA module to attend to, so speed improves. Synthesizing the accuracy and detection-speed results, Model 5 meets the need for rapid detection while significantly improving detection accuracy, which shows that the improved model in this experiment is effective. From the above it can be concluded that when CA is added at the last layer of the Backbone, the attention mechanism only needs to attend locally to the extracted features, extending the network's ability to focus on specific parts of the input; the model is thereby optimized, improving accuracy while reducing computation, simplifying the model and accelerating calculation. After this stage, the concatenated feature maps are suitable for processing by GSConv, since they contain little redundant information and need no further compression.

After training the above six models, the best training weights were selected to detect the 1000 images in the test set. The results are as follows:

Figures 12 and 13 show the confidence on the test set and the total time for loading the model and detecting the data set. Taking the detection results of Model 1, the original YOLO v5s, as the control, it can be seen from these figures that: the average confidence of Models 2 and 3, which introduce the CA module, improves to 89.4% and 89.6%; the runtime of Model 2 decreases while that of Model 3 increases; the maximum and average confidence of Model 4, which introduces Slim-neck by GSConv, decrease to 94.58% and 89.2%, but its runtime improves by 7.09 s; the minimum confidence of Models 5 and 6, which fuse CA and Slim-neck by GSConv, improves markedly, by 43.2% and 39.5%. Comparing all the models: Model 5 has the highest minimum and average confidence, its average confidence improved by 0.4% over Model 1, while Model 6 has the lowest average confidence at 88.8%; Model 5 has the shortest runtime, with a single-image detection time of only 0.015 s, and Model 6 the longest, indicating that Model 5 offers better accuracy and faster speed than the other models in actual detection.

Fig. 12. Comparison of confidence of test set of six models. (Online version in color.)

Fig. 13. Running time of six models (the total time for loading the model and detecting the data set). (Online version in color.)

As shown in Fig. 14, the numbers of images in the confidence intervals 20%–60%, 60%–80% and 80%–100% were counted. It can be seen that Models 3, 4 and 5 produce no low-value confidences. Model 5 has the highest number of images in the 80%–100% confidence interval, indicating that it has the most positive samples meeting the confidence threshold among the detection results, followed by Models 3, 4, 2, 1 and 6. In brief, the confidence of Model 5 is significantly higher on the test set, which indirectly indicates that Model 5 performs best.

Fig. 14. The number of test results statistical histogram in confidence intervals of 20%–60%, 60%–80%, and 80%–100%. (Online version in color.)

From the combined results of the validation and test sets, it can be concluded that the detection performance of Model 5 is the best: it has high accuracy while ensuring fast detection speed. This verifies the effectiveness of the improved model proposed in this paper, which is informative for the accuracy needs of industrial inspection and for realizing real-time inspection on portable mobile devices.

4. Conclusions

Aiming at the detection of center porosity in round bloom, this paper establishes a data set of about 10000 images of center porosity defects and improves a model based on YOLO v5s. The effectiveness of the improved model is verified by contrast and ablation experiments.

(1) Establishment of a data set: At present, many researchers use deep learning to detect macro-defects in round bloom, but the sample sizes of their data sets are small, with high similarity between test and training samples, which cannot correctly reflect the false-detection and missed-detection rates of a model in actual product defect detection. This paper therefore collects a total of 898 center porosity images covering different steel grades and different forms, and enlarges the data set to about 10000 images by image augmentation, simulating complex scenes, diversifying the samples and improving the robustness of the model.

(2) Contrast experiment: Training and detection on the center porosity data set with Faster RCNN, SSD, YOLO v3-Tiny and YOLO v5s show that the speed of the model based on the two-stage algorithm Faster RCNN is 14.2 fps, while the models based on the one-stage algorithms SSD, YOLO v3-Tiny and YOLO v5s reach 31.6 fps, 104.2 fps and 86.2 fps, respectively. The one-stage models show a clear advantage in detection speed, with YOLO v3-Tiny the fastest; however, the mAP@0.5 of YOLO v3-Tiny is 2.2%, 2.3% and 0.7% lower than the other models. Weighing accuracy and speed together, YOLO v5s was finally selected as the base detection model.

(3) Ablation experiment: Aiming at the complex network structure, numerous parameters, high training configuration and limited frame rate (FPS) of YOLO v5s in real-time detection, this paper introduces the CA mechanism and Slim-neck by GSConv into YOLO v5s and carries out six groups of ablation experiments. The results show that the accuracy of the improved model is significantly improved: mAP@0.5 on the validation set increases to 0.9917, the speed is 86.2 fps, and the total running time on the test set is 290.03 s, meeting the accuracy and speed requirements of round bloom defect detection.

The improved model proposed in this paper has better detection performance and target recognition ability for defects in round bloom. In actual industrial detection, coupling deep learning with artificial experience and exploring it in depth is not only conducive to improving the continuous casting process, but also provides an optimized model for real-time defect detection of continuous casting billet quality, helping the efficient and high-quality production of continuous casting billet.

Acknowledgments

This work was supported financially by the National Natural Science Foundation of China (No. 51974003).

References
 
© 2024 The Iron and Steel Institute of Japan.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs license.
https://creativecommons.org/licenses/by-nc-nd/4.0/