ISIJ International
Online ISSN : 1347-5460
Print ISSN : 0915-1559
ISSN-L : 0915-1559
Regular Article
Strip Steel Surface Defect Classification Method Based on Enhanced Twin Support Vector Machine
Maoxiang Chu, Rongfen Gong, Anna Wang

2014 Volume 54 Issue 1 Pages 119-124

Abstract

The classification of strip steel surface defects is a multi-class classification problem that demands high classification accuracy and efficiency. However, traditional methods are not suited to abnormal datasets, such as large-scale, sparse, unbalanced and corrupted datasets. So a novel classification method based on an enhanced twin support vector machine (TWSVM) and a binary tree is proposed in this paper. According to the density information, the large-scale dataset is pruned, the sparse dataset is supplemented with unlabeled samples, and TWSVM is improved to a multi-density TWSVM (MDTWSVM) solved with an efficient successive overrelaxation (SOR) algorithm. Finally, MDTWSVM and the binary tree are combined to realize multi-class classification. Experiments with the proposed algorithm are carried out on strip steel surface defect datasets. The experimental results show that MDTWSVM has higher accuracy and efficiency than the other multi-class classification methods for strip steel surface defects.

1. Introduction

Surface defect classification is an important step in the inspection of strip steel surface quality, because effective classification can not only improve the quality but also enhance the competitiveness of strip steel products.1) Since many kinds of defects arise during production and processing, such as scarring, dent, scratch, dirt, hole, damage and edge-crack, the classification of strip steel surface defects is a multi-class classification problem. These defects should be classified completely online so that the workers can deal with them as fast as possible. Moreover, in real classification conditions the defect samples are always unbalanced; that is, there are far more samples of common defects than of uncommon ones. Noise must also be considered, because noise samples can degrade the performance of the classifier and even cause classification errors. So the main problem in strip steel surface defect multi-class classification is how to obtain high-accuracy and high-efficiency classification results, especially for large-scale, sparse, unbalanced and corrupted training datasets.

There are many methods for strip steel surface defect classification,1,2,3,4,5,6,7) among which the support vector machine (SVM) has proved to be the most effective classifier. In references 4,5,6,7) the standard or improved SVM is used to classify strip steel surface defects and obtains better classification results than the other methods. However, there is a conflict between accuracy and efficiency in real applications of SVM. Hence some novel SVMs8,9,10,11) have appeared with the development of the technology, such as the fuzzy support vector machine, the granular support vector machine, the generalized eigenvalue proximal support vector machine (GEPSVM), and the twin support vector machine (TWSVM). Among them TWSVM is widely used,12,13,14) because it is not only approximately four times faster than the standard SVM but also well suited to cross-plane datasets.

A novel method that is more effective and accurate, especially for large-scale, sparse, unbalanced and corrupted training datasets, is proposed in this paper. This method is based on the density information of the training samples: the large-scale training dataset is pruned, the sparse training dataset is supplemented with unlabeled samples, and TWSVM is enhanced to the multi-density TWSVM (MDTWSVM). Then MDTWSVM, the successive overrelaxation (SOR) technique and the binary tree are combined to realize high-accuracy and high-efficiency multi-class classification of strip steel surface defects.

This paper is structured as follows. In section 2 TWSVM is introduced briefly. Section 3 focuses on extracting the density information of the training datasets and on processing the large-scale and sparse training datasets. In section 4 the novel MDTWSVM is described and combined with SOR and the binary tree to realize multi-class classification. In section 5 testing experiments are carried out and the corresponding results are analyzed. Conclusions are drawn in section 6.

2. TWSVM

TWSVM was proposed by Jayadeva et al. in 2007 and originated from GEPSVM. It generates two non-parallel planes such that each plane is closer to one of the two classes and as far as possible from the other.11) TWSVM is about four times faster than the standard SVM because it solves two smaller sized quadratic programming problems (QPPs) rather than one large QPP.11) TWSVM is described by the following two primal QPPs:

$$\min_{w_1,b_1}\ \frac{1}{2}\sum_{i=1}^{m_1}\bigl(K(A_i',C')w_1+b_1\bigr)^2+c_1\sum_{j=1}^{m_2}\xi_j \quad \text{s.t.}\ \ K(B_j',C')w_1+b_1\le-1+\xi_j,\ \ \xi_j\ge 0,\ j=1,2,\ldots,m_2 \tag{1}$$

$$\min_{w_2,b_2}\ \frac{1}{2}\sum_{j=1}^{m_2}\bigl(K(B_j',C')w_2+b_2\bigr)^2+c_2\sum_{i=1}^{m_1}\eta_i \quad \text{s.t.}\ \ K(A_i',C')w_2+b_2\ge 1-\eta_i,\ \ \eta_i\ge 0,\ i=1,2,\ldots,m_1 \tag{2}$$

where the sample matrix A = [A1 A2 … Am1]' belongs to class +1, the sample matrix B = [B1 B2 … Bm2]' belongs to class −1, the matrix C = [A' B']', and K is an arbitrary kernel. By introducing Lagrangian vectors, the Wolfe duals of QPPs (1) and (2) are obtained as follows:

$$\min_{\alpha}\ \frac{1}{2}\alpha'G(H'H)^{-1}G'\alpha-e_2'\alpha \quad \text{s.t.}\ \ 0\le\alpha\le c_1 \tag{3}$$

$$\min_{\beta}\ \frac{1}{2}\beta'H(G'G)^{-1}H'\beta-e_1'\beta \quad \text{s.t.}\ \ 0\le\beta\le c_2 \tag{4}$$

where H = [K(A,C') e1] and G = [K(B,C') e2]. QPPs (3) and (4) determine two nonparallel kernel-generated surfaces K(x',C')w1 + b1 = 0 and K(x',C')w2 + b2 = 0. By using the linear kernel K(x',C') = x'C' and defining u1 = C'w1 and u2 = C'w2, the linear TWSVM classification hyper-planes x'u1 + b1 = 0 and x'u2 + b2 = 0 are obtained. A new data sample x is assigned to class +1 or −1 depending on which of the two hyper-planes lies closer to x in terms of perpendicular distance.11)
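As a concrete illustration, the following minimal sketch (Python, with illustrative names) applies this decision rule in the linear case, assuming the plane parameters u1, b1, u2 and b2 have already been obtained by solving QPPs (3) and (4):

```python
import numpy as np

def twsvm_predict_linear(x, u1, b1, u2, b2):
    """Assign x to class +1 or -1 by perpendicular distance to the two planes."""
    d1 = abs(x @ u1 + b1) / np.linalg.norm(u1)   # distance to x'u1 + b1 = 0
    d2 = abs(x @ u2 + b2) / np.linalg.norm(u2)   # distance to x'u2 + b2 = 0
    return +1 if d1 <= d2 else -1
```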

3. Enhanced Methods for Training Dataset

TWSVM is more effective and accurate than the standard SVM. However, the classification results of TWSVM do not satisfy the real-time requirements of strip steel surface defect multi-class classification. Moreover, TWSVM cannot handle large-scale, sparse, unbalanced and corrupted training datasets well. In order to solve these problems, the density information of every sample in each defect dataset is extracted and used: the large-scale dataset is pruned and the sparse dataset is supplemented with unlabeled samples according to that density information. All these methods are applied to TWSVM to enhance its performance and realize optimal classification results.

3.1. Density Information of Training Dataset

Kernel density estimation (KDE) is a well-developed statistical technique.15,16) It studies the characteristics of the data distribution from the data samples themselves and can be used in many real problems.16) In this paper, the KDE method is used to estimate the density information of each sample in the training dataset. Consider a training dataset matrix X = [X1 X2 … Xn]', and suppose that Ωi is the nearest-neighbor region with radius r and central point Xi; then the density information of each sample in the training dataset is defined as follows:

$$d_i=\sum_{\substack{X_t\in\Omega_i\\ t=1,2,\ldots,n}}\exp\!\left(-\frac{\|X_t-X_i\|^2}{(r/2)^2}\right),\qquad i=1,2,\ldots,n \tag{5}$$

where di reflects the density information of the sample Xi, and r is set by the user according to the real problem. The larger di, the more important the sample Xi is in the training dataset. It can be seen from Eq. (5) that the density information of the sample Xi is determined by all samples in {Xt | Xt ∈ Ωi, t = 1,2,…,n}. The closer the sample Xt is to the central sample Xi, the larger its density contribution to Xi. That is, the density information di is determined by the similarity between Xi and the samples Xt in the region Ωi.

The density information di reflects the importance of the sample Xi and can be used effectively when the classifier deals with the training dataset. Take a training dataset with noise samples for example: the density value of a noise sample is very small because the noise sample is isolated and very few samples are similar to it. According to the density values, the noise samples of the training dataset can be deleted.
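The following minimal sketch (Python, illustrative names) computes the densities of Eq. (5), assuming Ωi is taken as the set of samples within Euclidean distance r of Xi, as the definition above suggests:

```python
import numpy as np

def density_info(X, r):
    """Density d_i of every row X_i of X, following Eq. (5)."""
    n = X.shape[0]
    d = np.zeros(n)
    for i in range(n):
        dist2 = np.sum((X - X[i]) ** 2, axis=1)    # ||X_t - X_i||^2 for every t
        in_region = dist2 <= r ** 2                # samples X_t inside Omega_i
        d[i] = np.sum(np.exp(-dist2[in_region] / (r / 2) ** 2))
    return d
```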

3.2. Pruning Samples for Training Dataset

For a large-scale training dataset, it is necessary to prune samples to improve the classification efficiency. The pruning principle is not only to keep the effect on the sample distribution of the training dataset as small as possible, but also to reduce the number of samples. In this paper a method that uses the density information of the large-scale dataset to guide the pruning process is proposed.

Consider a matrix B = [B1 B2 … Bm2]' representing the large-scale training dataset, and suppose that Ωl is the nearest-neighbor region with radius r and central point Bl, and that dl is the density information of sample Bl. Firstly, all density values {dl | l = 1,2,…,m2} are sorted. Then every sample Bl whose density dl is smaller than the threshold d is pruned from the large-scale training dataset, giving the pruned training dataset $\bar{B}=[\bar{B}_1\ \bar{B}_2\cdots\bar{B}_{m^-}]'$. Finally, the density information of the pruned samples of B is set to 0 and the retained samples keep their densities $d_j^-$ in $\bar{B}$. $d_l$ and $d_j^-$ are calculated as follows:

$$d_j^-\in\{d_l\mid I_l=1,\ l=1,2,\ldots,m_2\},\qquad j=1,2,\ldots,m^-$$
$$\text{where}\quad I_l=\begin{cases}1 & d_l>d\\ 0 & d_l\le d\end{cases},\quad l=1,2,\ldots,m_2,\qquad d_l=\sum_{B_t\in\Omega_l}\exp\!\left(-\frac{\|B_t-B_l\|^2}{(r/2)^2}\right),\quad l=1,2,\ldots,m_2 \tag{6}$$

where the threshold d is chosen by the user according to the real problem. It can be seen from the above formulas that all pruned samples still contribute to the density information of $\bar{B}$. Both the classification efficiency and the accuracy can be improved by using the density information $d_j^-$ for the large-scale training dataset.
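A minimal sketch of the pruning rule of Eq. (6), assuming the densities d_l have already been computed on the full set B (for example with the density sketch of section 3.1); names are illustrative:

```python
def prune_by_density(B, densities, d):
    """Eq. (6): keep only samples of B whose density exceeds the threshold d.
    B and densities are numpy arrays; densities holds d_l for the full set B."""
    keep = densities > d              # indicator I_l = 1 for retained samples
    return B[keep], densities[keep]   # pruned set B_bar and its densities d_j^-
```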

3.3. Adding Unlabeled Samples for Training Dataset

For a sparse training dataset, the complete sample distribution cannot be reflected because there are not enough samples in the dataset. To solve this problem, a novel method of selecting samples from the vast pool of unlabeled samples and inserting them into the sparse training dataset is proposed in this paper. Unlabeled samples are added to the sparse training dataset by active learning,17) and their density information is used to guide the process.

Consider a matrix A = [A1 A2 … Am1]' representing the sparse training dataset, and suppose that Ωi is the nearest-neighbor region with radius r and central point Ai, and that di is the density information of sample Ai. Firstly, any unlabeled sample Xp lying in a region Ωi is selected and becomes a labeled sample of the training dataset, which enhances the density of the sample Ai. Secondly, the center Ac of the training dataset is calculated, and a sphere region Ωc with central point Ac and radius R is constructed. Any unlabeled sample Xq that is not in a region Ωi but lies in the region Ωc is also selected and becomes a labeled sample. The new labeled samples are inserted into the training dataset A to obtain the new training dataset $\bar{A}=[\bar{A}_1\ \bar{A}_2\cdots\bar{A}_{m_1}\ \bar{A}_{m_1+1}\ \bar{A}_{m_1+2}\cdots\bar{A}_{m^+}]'=[A_1\ A_2\cdots A_{m_1}\ X_1\ X_2\cdots X_{m^+-m_1}]'$. Finally, the regions Ωi are reconstructed and the sample densities $d_i^+$ are calculated for the new training dataset $\bar{A}$. The parameters Ac, R, di and $d_i^+$ are calculated as follows:

$$A_c=\frac{1}{m_1}\sum_{i=1}^{m_1}A_i,\qquad R=\max_{i=1,2,\ldots,m_1}\|A_i-A_c\|-r \tag{7}$$

$$d_i^+=\begin{cases}d_i+\displaystyle\sum_{X_p\in\Omega_i}\exp\!\left(-\frac{\|X_p-A_i\|^2}{(r/2)^2}\right) & i=1,2,\ldots,m_1\\[2ex] \displaystyle\sum_{X_t\in\Omega_i}\exp\!\left(-\frac{\|X_t-X_{i-m_1}\|^2}{(r/2)^2}\right)+\sum_{A_t\in\Omega_i}\exp\!\left(-\frac{\|A_t-X_{i-m_1}\|^2}{(r/2)^2}\right) & i=m_1+1,m_1+2,\ldots,m^+\end{cases}$$
$$\text{where}\quad d_i=\sum_{A_t\in\Omega_i}\exp\!\left(-\frac{\|A_t-A_i\|^2}{(r/2)^2}\right),\qquad i=1,2,\ldots,m_1 \tag{8}$$

The sample distribution of the sparse training dataset is improved by adding unlabeled samples. It can be seen clearly from the above formulas that the added unlabeled samples enhance the density of not only the original samples but also the whole training dataset. The density information $d_i^+$ can be used to improve the classification accuracy for the sparse training dataset.
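The following sketch outlines the selection step of this section under simplifying assumptions: an unlabeled sample is accepted if it lies inside some neighborhood Ωi of a labeled sample or inside the sphere Ωc of Eq. (7); the bookkeeping of Eq. (8) can then be approximated by recomputing densities on the enlarged set (e.g. with the density sketch of section 3.1). Names and the acceptance test are illustrative:

```python
import numpy as np

def add_unlabeled(A, X_unlabeled, r):
    """Return the enlarged training set A_bar (simplified view of section 3.3)."""
    A_c = A.mean(axis=0)                                    # center of A, Eq. (7)
    R = np.max(np.linalg.norm(A - A_c, axis=1)) - r         # sphere radius, Eq. (7)
    near_labeled = np.array([np.min(np.linalg.norm(A - x, axis=1)) <= r
                             for x in X_unlabeled])         # x inside some Omega_i
    in_sphere = np.linalg.norm(X_unlabeled - A_c, axis=1) <= R
    return np.vstack([A, X_unlabeled[near_labeled | in_sphere]])
```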

4. Multi-density TWSVM and Multi-class Classification Algorithm

4.1. Multi-density TWSVM

In order to obtain high-accuracy and high-efficiency multi-class classification results, especially for large-scale, sparse, unbalanced and corrupted training datasets, TWSVM is improved to the multi-density TWSVM based on the enhanced methods in section 3.

Consider the following binary classification problem. Suppose A = [A1 A2 … Am1]' is a labeled sparse training dataset belonging to class +1; after the new labeled samples are added it becomes $\bar{A}=[\bar{A}_1\ \bar{A}_2\cdots\bar{A}_{m^+}]'$. Suppose B = [B1 B2 … Bm2]' is a labeled large-scale training dataset belonging to class −1, and $\bar{B}=[\bar{B}_1\ \bar{B}_2\cdots\bar{B}_{m^-}]'$ is the training dataset pruned from B. Based on the above, MDTWSVM is described by the following two primal QPPs:

$$\min_{w_1,b_1}\ \frac{1}{2}\sum_{i=1}^{m^+}d_i^+\bigl(K(\bar{A}_i',C')w_1+b_1\bigr)^2+c_1\sum_{j=1}^{m^-}d_j^-\eta_j \quad \text{s.t.}\ \ K(\bar{B}_j',C')w_1+b_1\le-1+\eta_j,\ \ \eta_j\ge 0,\ j=1,2,\ldots,m^- \tag{9}$$

$$\min_{w_2,b_2}\ \frac{1}{2}\sum_{j=1}^{m^-}d_j^-\bigl(K(\bar{B}_j',C')w_2+b_2\bigr)^2+c_2\sum_{i=1}^{m^+}d_i^+\xi_i \quad \text{s.t.}\ \ K(\bar{A}_i',C')w_2+b_2\ge 1-\xi_i,\ \ \xi_i\ge 0,\ i=1,2,\ldots,m^+ \tag{10}$$

where $C'=[\bar{A}'\ \ \bar{B}']$, and c1 and c2 are penalty factors. $d_i^+$ is the density information of the samples in the training dataset $\bar{A}$, and $d_j^-$ is the density information of the samples in the training dataset $\bar{B}$; they can be obtained from Eqs. (6) and (8). Optimizing the primal QPP (9) of MDTWSVM tends to make the hyper-plane close to class +1 and keep it at a distance of at least 1 from class −1. However, because of the added density parameters, for class +1 the samples with large density values are more likely to lie near the hyper-plane, while the samples with small density values have little effect on it. At the same time, for class −1 the density parameters make the hyper-plane keep a distance of at least 1 mainly from the samples with large density values. The multi-density information of the training samples thus effectively reduces the effect of noise samples and improves the classification accuracy. The primal QPP (10) of MDTWSVM yields similar results.

The Lagrangian corresponding to the primal QPP (9) of MDTWSVM is given:   

$$L(w_1,b_1,\alpha_j,\beta_j,\eta_j)=\frac{1}{2}\sum_{i=1}^{m^+}d_i^+\bigl(K(\bar{A}_i',C')w_1+b_1\bigr)^2+c_1\sum_{j=1}^{m^-}d_j^-\eta_j+\sum_{j=1}^{m^-}\alpha_j\bigl(K(\bar{B}_j',C')w_1+b_1+1-\eta_j\bigr)-\sum_{j=1}^{m^-}\beta_j\eta_j \tag{11}$$

Using the Karush-Kuhn-Tucker conditions and a process similar to that for TWSVM in reference 11), the Wolfe dual of the primal QPP (9) of MDTWSVM is obtained:

$$\min_{\alpha}\ \frac{1}{2}\alpha'G(H'D^+H)^{-1}G'\alpha-e_2'\alpha \quad \text{s.t.}\ \ 0\le\alpha\le c_1D^-e_2 \tag{12}$$

where $\alpha=(\alpha_1,\alpha_2,\ldots,\alpha_{m^-})'$, $H=[K(\bar{A},C')\ \ e_1]$ and $G=[K(\bar{B},C')\ \ e_2]$. $D^+$ is a diagonal matrix with diagonal elements $d_i^+$, and $D^-$ is a diagonal matrix with diagonal elements $d_j^-$. The corresponding hyper-plane K(x',C')w1 + b1 = 0 can be obtained by solving QPP (12), and the parameters w1 and b1 are calculated with the following formula:

$$\begin{bmatrix}w_1\\ b_1\end{bmatrix}=-(H'D^+H)^{-1}G'\alpha \tag{13}$$

In a similar way, the Wolfe dual of the primal QPP (10) of MDTWSVM can be obtained, and the hyper-plane K(x',C')w2 + b2 = 0 is determined as follows:

$$\min_{\mu}\ \frac{1}{2}\mu'H(G'D^-G)^{-1}H'\mu-e_1'\mu \quad \text{s.t.}\ \ 0\le\mu\le c_2D^+e_1 \tag{14}$$

$$\begin{bmatrix}w_2\\ b_2\end{bmatrix}=(G'D^-G)^{-1}H'\mu \tag{15}$$

where $\mu=(\mu_1,\mu_2,\ldots,\mu_{m^+})'$. A new data sample x is assigned to class +1 or −1 depending on which of the two hyper-planes lies closer to x in terms of perpendicular distance:

$$i=\arg\min_{k=1,2}\frac{|K(x',C')w_k+b_k|}{\|w_k\|},\qquad x\in\begin{cases}\text{class}\ +1 & \text{if}\ i=1\\ \text{class}\ -1 & \text{if}\ i=2\end{cases} \tag{16}$$
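To make the use of Eqs. (12)-(16) concrete, the sketch below recovers the two surfaces from given dual solutions α and μ and then classifies a new sample. The small regularization term eps*I is an added numerical safeguard not stated in the text, and all names are illustrative:

```python
import numpy as np

def mdtwsvm_planes(K_A, K_B, d_plus, d_minus, alpha, mu, eps=1e-8):
    """Recover (w1, b1) and (w2, b2) from the dual solutions via Eqs. (13), (15).
    K_A = K(A_bar, C'), K_B = K(B_bar, C'); eps*I stabilizes the inverses."""
    H = np.hstack([K_A, np.ones((K_A.shape[0], 1))])      # [K(A_bar, C')  e1]
    G = np.hstack([K_B, np.ones((K_B.shape[0], 1))])      # [K(B_bar, C')  e2]
    Dp, Dm = np.diag(d_plus), np.diag(d_minus)
    I = np.eye(H.shape[1])
    z1 = -np.linalg.solve(H.T @ Dp @ H + eps * I, G.T @ alpha)   # Eq. (13)
    z2 = np.linalg.solve(G.T @ Dm @ G + eps * I, H.T @ mu)       # Eq. (15)
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])

def mdtwsvm_predict(k_x, w1, b1, w2, b2):
    """Eq. (16): k_x = K(x', C'); assign x to the class of the nearer surface."""
    d1 = abs(k_x @ w1 + b1) / np.linalg.norm(w1)
    d2 = abs(k_x @ w2 + b2) / np.linalg.norm(w2)
    return +1 if d1 <= d2 else -1
```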

4.2. SOR Technique for MDTWSVM

In MDTWSVM, two QPPs, (12) and (14), have to be solved. QPP (12) is rewritten in the following form by defining U = G(H'D+H)−1G', c = c1D−e2 and φ = α:

$$\min_{\varphi}\ \frac{1}{2}\varphi'U\varphi-e'\varphi \quad \text{s.t.}\ \ 0\le\varphi\le c \tag{17}$$

SOR for quadratic programs is a highly effective iterative method, which is considerably faster than other methods, and it has been proved that the algorithm converges linearly to a solution.18) In reference 18), the SOR technique has been used effectively to improve the efficiency of solving QPPs. This paper also employs the SOR technique to solve QPP (17). The iterative equation of SOR for QPP (17) is as follows:

$$\varphi_j^{k+1}=\varphi_j^{k}-\rho E_{jj}^{-1}\left(\sum_{l=1}^{j-1}U_{jl}\varphi_l^{k+1}+\sum_{l=j}^{m^-}U_{jl}\varphi_l^{k}-1\right),\qquad j=1,2,\ldots,m^- \tag{18}$$

where ρ ∈ (0,2), Ejj is the diagonal element of the matrix U, and k is the iteration index. It can be seen from Eq. (18) that the iterative process begins with an initial φ0, and then φk+1 is calculated from φk until ||φk+1 − φk|| is less than some prescribed tolerance ε. During the iterative process, φk+1 is restricted to the interval [0,c].

QPP (14) is also rewritten in the form of QPP (17) by defining U = H(G'D−G)−1H', c = c2D+e1 and φ = μ, and is then solved with Eq. (18).
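A minimal sketch of the SOR sweep of Eq. (18) for QPP (17); U and the box bound c follow the definitions above, e is a vector of ones, and the iterate is projected back onto [0, c] after each update, as described. Names are illustrative:

```python
import numpy as np

def sor_solve(U, c, rho=1.0, tol=1e-5, max_iter=1000):
    """Solve min 0.5*phi'U*phi - e'phi  s.t. 0 <= phi <= c  by SOR, Eq. (18).
    U is the matrix defined above, c the vector of upper bounds, rho in (0, 2)."""
    m = U.shape[0]
    phi = np.zeros(m)
    for _ in range(max_iter):
        phi_old = phi.copy()
        for j in range(m):
            # updated values for l < j, previous-sweep values for l >= j
            s = U[j, :j] @ phi[:j] + U[j, j:] @ phi_old[j:] - 1.0
            phi[j] = min(max(phi[j] - rho * s / U[j, j], 0.0), c[j])  # clip to [0, c]
        if np.linalg.norm(phi - phi_old) < tol:
            break
    return phi
```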

4.3. Multi-class Classification Algorithm

The strip steel surface defect classification is a multi-class classification problem. There are many multi-class classification methods based on binary classifiers, such as one-against-one, one-against-rest, decision directed acyclic graph and binary tree.19) Among them, the binary tree is the most widely used. The basic binary tree model for seven classes is shown in Fig. 1.

Fig. 1. Binary tree for seven classes.

It can be seen from Fig. 1 that the binary tree improves the classification efficiency by constructing only six classifiers for seven classes. However, the training dataset may be unbalanced for every binary classifier, so it is necessary to solve the unbalanced problem when the binary tree is used for multi-class classification. In this paper the binary tree is combined with the binary classifier MDTWSVM. In MDTWSVM, class +1 represents the sparse training dataset A and class −1 represents the large-scale training dataset B. When each node classifier of the binary tree is constructed, its two training datasets are assigned to class +1 or class −1 according to their numbers of samples. Suppose there are N classes of strip steel surface defects in the training dataset; the multi-class classification algorithm combining the binary tree and the binary MDTWSVM is then obtained. The algorithm steps are as follows (a code sketch of the whole procedure is given after the steps):

Step 1: analyze the training datasets of the N classes and decide which are sparse and which are large-scale. Then select rational parameters, such as the radius r and the pruning threshold d, add unlabeled samples to the sparse training datasets, and prune samples from the large-scale training datasets to get new training datasets based on the enhanced methods in section 3.

Step 2: select one class of samples as training dataset 1 and the remaining classes as training dataset 2 for the current node of the binary tree. Then determine the sparse training dataset A of class +1 and the large-scale training dataset B of class −1 according to the sizes of training datasets 1 and 2.

Step 3: obtain the enlarged training dataset $\bar{A}$ from A and the pruned training dataset $\bar{B}$ from B according to the information from step 1.

Step 4: calculate the density information $d_j^-$ and $d_i^+$ for the training datasets $\bar{B}$ and $\bar{A}$ according to Eqs. (6) and (8), respectively.

Step 5: select rational parameters c1, c2 and kernel function K.

Step 6: define G, H, D+, D−, U and select the parameter ρ; then calculate the Lagrangian vectors α and μ according to Eq. (18).

Step 7: calculate parameters of the two nonparallel hyper-planes with Eqs. (13) and (15) to obtain the classifier for this node.

Step 8: repeat steps 2 to 7 until all node classifiers of the binary tree are obtained.

Step 9: according to the binary tree, evaluate each node classifier with Eq. (16) until a new defect sample falls into one of the N classes.
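To show how the steps fit together, the outline below strings the earlier sketches (density computation, pruning and sample adding) into one binary-tree training loop; train_node is a hypothetical placeholder for Steps 5-7 (solving QPPs (12) and (14) with SOR and applying Eqs. (13) and (15)), and the node order of Fig. 1 is assumed to follow the class list:

```python
import numpy as np

def train_binary_tree(class_datasets, unlabeled, r, d, train_node):
    """Steps 1-9 in outline form. class_datasets is a list of per-class sample
    matrices; train_node(A_bar, B_bar, d_plus, d_minus) stands in for Steps 5-7."""
    nodes, remaining = [], list(range(len(class_datasets)))
    while len(remaining) > 1:
        one = class_datasets[remaining[0]]                    # Step 2: split-off class
        rest = np.vstack([class_datasets[i] for i in remaining[1:]])
        swapped = one.shape[0] > rest.shape[0]                # smaller set -> class +1
        A, B = (rest, one) if swapped else (one, rest)
        A_bar = add_unlabeled(A, unlabeled, r)                # Step 3 (section 3.3)
        B_bar, d_minus = prune_by_density(B, density_info(B, r), d)   # Steps 3-4
        d_plus = density_info(A_bar, r)                       # Step 4 (Eq. (8), simplified)
        nodes.append((remaining[0], swapped,
                      train_node(A_bar, B_bar, d_plus, d_minus)))     # Steps 5-7
        remaining = remaining[1:]                             # Step 8: next tree node
    return nodes                                              # Step 9 walks these nodes
```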

5. Experiments

5.1. Description of Strip Steel Surface Defect Datasets

It is important to obtain a complete database of strip steel surface defects before classification. It is well known that computer vision technology based on the charge-coupled device (CCD) has been widely used in strip steel surface defect detection. All experimental images of strip steel surface defects in this paper are obtained with the CCD surface inspection systems of large national steel plants. Four classes of defect images with large-scale samples and three classes with sparse samples are selected. These defects are scarring, dent, scratch, dirt, hole, damage and edge-crack, as shown in Fig. 2.

Fig. 2. Images of seven surface defects.

A defect sample is a multi-dimensional vector obtained by extracting features from a defect image. These features have a direct impact on the classification accuracy and efficiency. The commonly extracted defect features include geometry, gray-level, projection, texture and frequency-domain features,20) and each of them involves multi-dimensional information. In this paper 64 defect features are extracted for each defect image, which compose a 64-dimensional defect sample. In order to ensure efficiency, a manifold learning algorithm21) is used to reduce the dimensionality to 34, so each sample is a 34-dimensional vector.
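As a hedged illustration of this preprocessing step, the sketch below assumes a hypothetical extract_features(image) routine returning the 64 defect features and uses Isomap from scikit-learn merely as a stand-in for the manifold learning algorithm, which the text does not specify in detail:

```python
import numpy as np
from sklearn.manifold import Isomap

def build_samples(defect_images, extract_features, target_dim=34):
    """Turn defect images into 34-dimensional samples: 64 features per image,
    then manifold-learning dimensionality reduction."""
    X64 = np.array([extract_features(img) for img in defect_images])   # 64-D vectors
    return Isomap(n_components=target_dim).fit_transform(X64)          # 34-D samples
```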

In this paper, 2330 images covering the seven classes of defects are selected and converted into 34-dimensional samples through feature extraction and dimensionality reduction. These samples are used in the classification experiments and are divided randomly into a training dataset and a testing dataset. The division results are shown in Table 1.

Table 1. Different surface defect data samples.

| Defect type | Number of training samples | Number of testing samples |
|---|---|---|
| Scarring | 477 | 53 |
| Dent | 468 | 52 |
| Scratch | 450 | 50 |
| Dirt | 459 | 51 |
| Hole | 72 | 18 |
| Damage | 81 | 19 |
| Edge-crack | 63 | 17 |

5.2. Parameters Analysis for MDTWSVM

As discussed in sections 3 and 4, six parameters of the MDTWSVM algorithm, namely r, d, ρ, c1, c2 and K, affect the performance of the multi-class classification algorithm. It is necessary to consider how to select optimal values for these parameters when the MDTWSVM algorithm is used for strip steel surface defect multi-class classification. Parameter r is the radius of the nearest-neighbor region, which affects the density information of all samples in the training dataset; the smaller r, the more independent the samples, and vice versa. Parameter d is the pruning threshold, which controls the pruning ratio; too small a d does not meet the required pruning ratio, while too large a d degrades the classification accuracy. Parameter ρ is the relaxation factor of the SOR algorithm, which affects its convergence. When the penalty factors c1 and c2 are optimal, the performance of the MDTWSVM algorithm is satisfactory. The radial basis function (RBF) is often used as the kernel function K of MDTWSVM; its parameter is the kernel radius δ, which directly affects the classification accuracy of MDTWSVM.

In this paper, all training datasets are first normalized and then the six parameters are determined. In general, the radius r and the threshold d should be selected from 0.2 to 0.4, the relaxation factor ρ should lie in the interval (0,2), the penalty factors c1 and c2 are chosen from $2^{-10}$ to $2^{5}$, and the kernel radius δ from $2^{-10}$ to $2^{4}$. These parameters can be optimized with common methods such as the recursive exhaustion technique, gradient descent and genetic algorithms. The tenfold cross-validation method16,18) is used to obtain the optimal parameters in this paper.
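A minimal sketch of this parameter search: grid search over c1, c2 and the RBF radius δ with tenfold cross-validation over the ranges given above. train_and_score is a hypothetical helper that trains the classifier tree on the training folds and returns the accuracy on the held-out fold:

```python
import itertools
import numpy as np

def select_parameters(X, y, train_and_score, n_folds=10, seed=0):
    """Tenfold cross-validated grid search over c1, c2 and the RBF radius delta."""
    grid_c = [2.0 ** p for p in range(-10, 6)]        # c1, c2 in [2^-10, 2^5]
    grid_delta = [2.0 ** p for p in range(-10, 5)]    # delta in [2^-10, 2^4]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    best, best_acc = None, -1.0
    for c1, c2, delta in itertools.product(grid_c, grid_c, grid_delta):
        accs = []
        for i in range(n_folds):
            test_idx = folds[i]
            train_idx = np.hstack([folds[j] for j in range(n_folds) if j != i])
            accs.append(train_and_score(X[train_idx], y[train_idx],
                                        X[test_idx], y[test_idx], c1, c2, delta))
        if np.mean(accs) > best_acc:
            best, best_acc = (c1, c2, delta), float(np.mean(accs))
    return best
```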

5.3. Experiments and Results Analysis

Four experiments are carried out to verify the effectiveness of the multi-class classification algorithm combining MDTWSVM and the binary tree for strip steel surface defects. The first three experiments test the effects of adding samples to the sparse dataset, pruning the large-scale dataset, and using the density information, respectively; the last one compares three different classifiers. All experimental results are presented in order below.

Firstly, the testing results corresponding to different pruning ratios for the large-scale training dataset are shown in Table 2. Note that the pruning ratio is the ratio of the number of pruned samples to the number of original samples. From Table 2 it is clear that the larger the ratio, the shorter the training and testing times, but the lower the classification accuracy. The pruning ratio should be selected by considering both time and accuracy in the real classification problem, so it is set to 0.40 in this paper.

Table 2. Testing results with different pruning ratio.

| Pruning ratio | Training time (s) | Testing time (s) | Accuracy (%) |
|---|---|---|---|
| 0.00 | 2.6447 | 0.2753 | 95.38 |
| 0.30 | 1.4555 | 0.2251 | 95.00 |
| 0.40 | 1.1243 | 0.0902 | 94.62 |
| 0.50 | 0.8713 | 0.0920 | 89.23 |
| 0.70 | 0.1340 | 0.0778 | 76.92 |
| 0.80 | 0.0618 | 0.0362 | 61.15 |

Secondly, the testing results corresponding to different adding ratios for the sparse training dataset are shown in Table 3. Note that the adding ratio is the ratio of the number of added unlabeled samples to the number of original samples, and that this experiment is made with the pruning ratio fixed at 0.4 for the large-scale training dataset. It can be seen from Table 3 that the larger the adding ratio, the longer the training and testing times, whereas the classification accuracy first rises and then falls as the ratio grows. The adding ratio is therefore set to 0.60 in this paper.

Table 3. Testing results with different adding ratio.

| Adding ratio | Training time (s) | Testing time (s) | Accuracy (%) |
|---|---|---|---|
| 0.00 | 1.1243 | 0.0902 | 94.62 |
| 0.30 | 1.1865 | 0.0996 | 95.00 |
| 0.50 | 1.2213 | 0.1022 | 96.15 |
| 0.60 | 1.2623 | 0.1068 | 96.92 |
| 0.70 | 1.3040 | 0.1109 | 96.54 |
| 0.90 | 1.3618 | 0.1141 | 94.23 |

Thirdly, in order to test the effectiveness of the density information in MDTWSVM, the classification accuracy is measured with and without the multi-density (MD) parameters. It can be seen from Table 4 that the accuracy is much higher with MD than without it. So combining the addition of unlabeled samples for the sparse training dataset, the pruning of the large-scale training dataset and the use of multi-density information can enhance the classification accuracy effectively.

Table 4. Classification accuracy of different defects.

| Defect type | Accuracy (%), not using MD | Accuracy (%), using MD |
|---|---|---|
| Scarring | 94.34 | 98.11 |
| Dent | 96.15 | 100 |
| Scratch | 90.00 | 96.00 |
| Dirt | 90.20 | 96.08 |
| Hole | 94.45 | 94.45 |
| Damage | 89.47 | 94.74 |
| Edge-crack | 88.24 | 94.12 |

Finally, in order to verify the effectiveness of the algorithm proposed in this paper, comparative experiments are made with MDTWSVM, the standard SVM and TWSVM. For a fair comparison, the same binary tree is used with all three methods. The results are shown in Table 5. It is clear that the MDTWSVM proposed in this paper outperforms the others in both time cost and classification accuracy.

Table 5. Testing results of three classifiers.

| Classifier | Training time (s) | Testing time (s) | Accuracy (%) |
|---|---|---|---|
| SVM | 91.1512 | 1.1886 | 92.31 |
| TWSVM | 11.3707 | 0.6881 | 93.85 |
| MDTWSVM | 1.2623 | 0.1068 | 96.92 |

6. Conclusions

A novel multi-class classification algorithm for strip steel surface defects is proposed in this paper based on the density information of the training dataset. The density information is used to guide the pruning of the large-scale training dataset, the addition of unlabeled samples to the sparse training dataset, and the improvement of TWSVM into MDTWSVM. The SOR method is used to solve the QPPs of MDTWSVM. The proposed method is well suited to multi-class, large-scale, sparse, unbalanced and noisy sample datasets. Finally, the results of experiments on the strip steel surface defect dataset show that MDTWSVM combined with the binary tree achieves excellent classification accuracy and efficiency.

References
 
© 2014 by The Iron and Steel Institute of Japan