ISIJ International
Online ISSN : 1347-5460
Print ISSN : 0915-1559
ISSN-L : 0915-1559
Regular Article
Strip Steel Surface Defect Classification Method Based on Enhanced Twin Support Vector Machine
Maoxiang Chu, Rongfen Gong, Anna Wang

2014 Volume 54 Issue 1 Pages 119-124

Abstract

The classification of strip steel surface defects is a multi-class classification problem that demands high classification accuracy and efficiency. However, traditional methods are not suited to abnormal datasets, such as large-scale, sparse, unbalanced and corrupted datasets. So a novel classification method based on an enhanced twin support vector machine (TWSVM) and a binary tree is proposed in this paper. According to the density information, the large-scale dataset is pruned, the sparse dataset is supplemented with unlabeled samples, and TWSVM is improved to a multi-density TWSVM (MDTWSVM) solved with an efficient successive overrelaxation (SOR) algorithm. Finally, MDTWSVM and the binary tree are combined to realize multi-class classification. Experiments with the proposed algorithm are carried out on strip steel surface defect datasets. The experimental results show that MDTWSVM has higher accuracy and efficiency than the other multi-class classification methods for strip steel surface defects.

1. Introduction

Surface defect classification is an important step in the inspection of strip steel surface quality, because effective classification can not only improve the quality but also enhance the competitiveness of strip steel products.1) Since many kinds of defects arise during production and processing, such as scarring, dent, scratch, dirt, hole, damage and edge-crack, the classification of strip steel surface defects is a multi-class classification problem. These defects should be classified completely online so that the workers can deal with them as fast as possible. Moreover, in real classification conditions the defect samples are always unbalanced; that is, there are far more samples of common defects than of uncommon ones. Noise must also be considered, because noise samples can degrade the performance of the classifier and even cause classification errors. So the main problem in strip steel surface defect multi-class classification is how to obtain high-accuracy and high-efficiency classification results, especially for large-scale, sparse, unbalanced and corrupted training datasets.

There are many methods for strip steel surface defect classification,1,2,3,4,5,6,7) among which the support vector machine (SVM) has proved to be the most effective classifier. In references 4,5,6,7) the standard or improved SVM is used to classify strip steel surface defects and obtains better classification results than the other methods. However, there is a conflict between accuracy and efficiency in real applications of SVM. Hence some novel SVMs8,9,10,11) have appeared with the development of the technology, such as the fuzzy support vector machine, the granular support vector machine, the generalized eigenvalue proximal support vector machine (GEPSVM), and the twin support vector machine (TWSVM). Among them TWSVM is widely used,12,13,14) because it is not only approximately four times faster than the standard SVM but also well suited to cross-plane datasets.

A novel method that is more effective and accurate, especially for large-scale, sparse, unbalanced and corrupted training datasets, is proposed in this paper. This method is based on the density information of the training samples: the large-scale training dataset is pruned, the sparse training dataset is supplemented with unlabeled samples, and TWSVM is enhanced to the multi-density TWSVM (MDTWSVM). Then MDTWSVM, the successive overrelaxation (SOR) technique and the binary tree are combined to realize high-accuracy and high-efficiency multi-class classification of strip steel surface defects.

This paper is structured as follows. In section 2 TWSVM is introduced briefly. Section 3 focuses on extracting the density information of the training datasets and on processing the large-scale and sparse training datasets. In section 4 the novel MDTWSVM is described and combined with SOR and the binary tree to realize multi-class classification. In section 5 testing experiments are carried out and the corresponding results are analyzed. Conclusions are drawn in section 6.

2. TWSVM

TWSVM was proposed by Jayadeva et al. in 2007 and originated from GEPSVM. It generates two non-parallel planes such that each plane is closer to one of the two classes and as far as possible from the other.11) TWSVM is about four times faster than the standard SVM because it solves two smaller sized quadratic programming problems (QPPs) rather than one large QPP.11) TWSVM is described by the following two primal QPPs:

$$\min_{w_1,b_1}\ \frac{1}{2}\sum_{i=1}^{m_1}\bigl(K(A_i',C')w_1+b_1\bigr)^2+c_1\sum_{j=1}^{m_2}\xi_j \quad \text{s.t.}\ \ K(B_j',C')w_1+b_1\le-1+\xi_j,\ \ \xi_j\ge 0,\ j=1,2,\ldots,m_2 \tag{1}$$

$$\min_{w_2,b_2}\ \frac{1}{2}\sum_{j=1}^{m_2}\bigl(K(B_j',C')w_2+b_2\bigr)^2+c_2\sum_{i=1}^{m_1}\eta_i \quad \text{s.t.}\ \ K(A_i',C')w_2+b_2\ge 1-\eta_i,\ \ \eta_i\ge 0,\ i=1,2,\ldots,m_1 \tag{2}$$

where the sample matrix A = [A1 A2 … Am1]' belongs to class +1, the sample matrix B = [B1 B2 … Bm2]' belongs to class −1, the matrix C = [A' B']', and K is an arbitrary kernel. By introducing Lagrangian vectors, the Wolfe duals of QPPs (1) and (2) are obtained as follows:

$$\min_{\alpha}\ \frac{1}{2}\alpha'G(H'H)^{-1}G'\alpha-e_2'\alpha \quad \text{s.t.}\ \ 0\le\alpha\le c_1 \tag{3}$$

$$\min_{\beta}\ \frac{1}{2}\beta'H(G'G)^{-1}H'\beta-e_1'\beta \quad \text{s.t.}\ \ 0\le\beta\le c_2 \tag{4}$$

where H = [K(A,C') e1] and G = [K(B,C') e2]. QPPs (3) and (4) determine two nonparallel kernel-generated surfaces K(x',C')w1 + b1 = 0 and K(x',C')w2 + b2 = 0. By using the linear kernel K(x',C') = x'C' and defining u1 = C'w1 and u2 = C'w2, the linear TWSVM classification hyper-planes x'u1 + b1 = 0 and x'u2 + b2 = 0 are obtained. A new data sample x is assigned to class +1 or −1 depending on which of the two hyper-planes lies closer to x in terms of perpendicular distance.11)
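As a concrete illustration, the following minimal sketch (Python, with illustrative names) applies this decision rule in the linear case, assuming the plane parameters u1, b1, u2 and b2 have already been obtained by solving QPPs (3) and (4):

```python
import numpy as np

def twsvm_predict_linear(x, u1, b1, u2, b2):
    """Assign x to class +1 or -1 by perpendicular distance to the two planes."""
    d1 = abs(x @ u1 + b1) / np.linalg.norm(u1)   # distance to x'u1 + b1 = 0
    d2 = abs(x @ u2 + b2) / np.linalg.norm(u2)   # distance to x'u2 + b2 = 0
    return +1 if d1 <= d2 else -1
```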

3. Enhanced Methods for Training Dataset

TWSVM is more effective and accurate than the standard SVM. However, the classification results of TWSVM do not satisfy the real-time requirements of strip steel surface defect multi-class classification. Moreover, TWSVM cannot handle large-scale, sparse, unbalanced and corrupted training datasets well. In order to solve these problems, the density information of every sample in each defect dataset is extracted and used: the large-scale dataset is pruned and the sparse dataset is supplemented with unlabeled samples according to that density information. All these methods are applied to TWSVM to enhance its performance and realize optimal classification results.

3.1. Density Information of Training Dataset

Kernel density estimation (KDE) is a well-developed statistical technique.15,16) It studies the characteristics of the data distribution from the data samples themselves and can be used in many real problems.16) In this paper, the KDE method is used to estimate the density information of each sample in the training dataset. Consider a training dataset matrix X = [X1 X2 … Xn]', and suppose that Ωi is the nearest-neighbor region with radius r and central point Xi; then the density information of each sample in the training dataset is defined as follows:

$$d_i=\sum_{\substack{X_t\in\Omega_i\\ t=1,2,\ldots,n}}\exp\!\left(-\frac{\|X_t-X_i\|^2}{(r/2)^2}\right),\qquad i=1,2,\ldots,n \tag{5}$$

where di reflects the density information of the sample Xi, and r is set by the user according to the real problem. The larger di, the more important the sample Xi is in the training dataset. It can be seen from Eq. (5) that the density information of the sample Xi is determined by all samples in {Xt | Xt ∈ Ωi, t = 1,2,…,n}. The closer the sample Xt is to the central sample Xi, the larger its density contribution to Xi. That is, the density information di is determined by the similarity between Xi and the samples Xt in the region Ωi.

The density information di reflects the importance of the sample Xi and can be used effectively when the classifier deals with the training dataset. Take a training dataset with noise samples for example: the density value of a noise sample is very small because the noise sample is isolated and very few samples are similar to it. According to the density values, the noise samples of the training dataset can be deleted.
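The following minimal sketch (Python, illustrative names) computes the densities of Eq. (5), assuming Ωi is taken as the set of samples within Euclidean distance r of Xi, as the definition above suggests:

```python
import numpy as np

def density_info(X, r):
    """Density d_i of every row X_i of X, following Eq. (5)."""
    n = X.shape[0]
    d = np.zeros(n)
    for i in range(n):
        dist2 = np.sum((X - X[i]) ** 2, axis=1)    # ||X_t - X_i||^2 for every t
        in_region = dist2 <= r ** 2                # samples X_t inside Omega_i
        d[i] = np.sum(np.exp(-dist2[in_region] / (r / 2) ** 2))
    return d
```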

3.2. Pruning Samples for Training Dataset

For a large-scale training dataset, it is necessary to prune samples to improve the classification efficiency. The pruning principle is not only to keep the effect on the sample distribution of the training dataset as small as possible, but also to reduce the number of samples. In this paper a method that uses the density information of the large-scale dataset to guide the pruning process is proposed.

Consider a matrix B = [B1 B2 … Bm2]' representing the large-scale training dataset, and suppose that Ωl is the nearest-neighbor region with radius r and central point Bl, and that dl is the density information of sample Bl. Firstly, all density values {dl | l = 1,2,…,m2} are sorted. Then every sample Bl whose density dl is smaller than the threshold d is pruned from the large-scale training dataset, giving the pruned training dataset $\bar{B}=[\bar{B}_1\ \bar{B}_2\cdots\bar{B}_{m^-}]'$. Finally, the density information of the pruned samples of B is set to 0 and the retained samples keep their densities $d_j^-$ in $\bar{B}$. $d_l$ and $d_j^-$ are calculated as follows:

$$d_j^-\in\{d_l\mid I_l=1,\ l=1,2,\ldots,m_2\},\qquad j=1,2,\ldots,m^-$$
$$\text{where}\quad I_l=\begin{cases}1 & d_l>d\\ 0 & d_l\le d\end{cases},\quad l=1,2,\ldots,m_2,\qquad d_l=\sum_{B_t\in\Omega_l}\exp\!\left(-\frac{\|B_t-B_l\|^2}{(r/2)^2}\right),\quad l=1,2,\ldots,m_2 \tag{6}$$

where the threshold d is chosen by the user according to the real problem. It can be seen from the above formulas that all pruned samples still contribute to the density information of $\bar{B}$. Both the classification efficiency and the accuracy can be improved by using the density information $d_j^-$ for the large-scale training dataset.
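A minimal sketch of the pruning rule of Eq. (6), assuming the densities d_l have already been computed on the full set B (for example with the density sketch of section 3.1); names are illustrative:

```python
def prune_by_density(B, densities, d):
    """Eq. (6): keep only samples of B whose density exceeds the threshold d.
    B and densities are numpy arrays; densities holds d_l for the full set B."""
    keep = densities > d              # indicator I_l = 1 for retained samples
    return B[keep], densities[keep]   # pruned set B_bar and its densities d_j^-
```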

3.3. Adding Unlabeled Samples for Training Dataset

For a sparse training dataset, the complete sample distribution cannot be reflected because there are not enough samples in the dataset. To solve this problem, a novel method of selecting samples from the vast pool of unlabeled samples and inserting them into the sparse training dataset is proposed in this paper. Unlabeled samples are added to the sparse training dataset by active learning,17) and their density information is used to guide the process.

Consider a matrix A = [A1 A2 … Am1]' representing the sparse training dataset, and suppose that Ωi is the nearest-neighbor region with radius r and central point Ai, and that di is the density information of sample Ai. Firstly, any unlabeled sample Xp lying in a region Ωi is selected and becomes a labeled sample of the training dataset, which enhances the density of the sample Ai. Secondly, the center Ac of the training dataset is calculated, and a sphere region Ωc with central point Ac and radius R is constructed. Any unlabeled sample Xq that is not in a region Ωi but lies in the region Ωc is also selected and becomes a labeled sample. The new labeled samples are inserted into the training dataset A to obtain the new training dataset $\bar{A}=[\bar{A}_1\ \bar{A}_2\cdots\bar{A}_{m_1}\ \bar{A}_{m_1+1}\ \bar{A}_{m_1+2}\cdots\bar{A}_{m^+}]'=[A_1\ A_2\cdots A_{m_1}\ X_1\ X_2\cdots X_{m^+-m_1}]'$. Finally, the regions Ωi are reconstructed and the sample densities $d_i^+$ are calculated for the new training dataset $\bar{A}$. The parameters Ac, R, di and $d_i^+$ are calculated as follows:

$$A_c=\frac{1}{m_1}\sum_{i=1}^{m_1}A_i,\qquad R=\max_{i=1,2,\ldots,m_1}\|A_i-A_c\|-r \tag{7}$$

$$d_i^+=\begin{cases}d_i+\displaystyle\sum_{X_p\in\Omega_i}\exp\!\left(-\frac{\|X_p-A_i\|^2}{(r/2)^2}\right) & i=1,2,\ldots,m_1\\[2ex] \displaystyle\sum_{X_t\in\Omega_i}\exp\!\left(-\frac{\|X_t-X_{i-m_1}\|^2}{(r/2)^2}\right)+\sum_{A_t\in\Omega_i}\exp\!\left(-\frac{\|A_t-X_{i-m_1}\|^2}{(r/2)^2}\right) & i=m_1+1,m_1+2,\ldots,m^+\end{cases}$$
$$\text{where}\quad d_i=\sum_{A_t\in\Omega_i}\exp\!\left(-\frac{\|A_t-A_i\|^2}{(r/2)^2}\right),\qquad i=1,2,\ldots,m_1 \tag{8}$$

The sample distribution of the sparse training dataset is improved by adding unlabeled samples. It can be seen clearly from the above formulas that the added unlabeled samples enhance the density of not only the original samples but also the whole training dataset. The density information $d_i^+$ can be used to improve the classification accuracy for the sparse training dataset.
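The following sketch outlines the selection step of this section under simplifying assumptions: an unlabeled sample is accepted if it lies inside some neighborhood Ωi of a labeled sample or inside the sphere Ωc of Eq. (7); the bookkeeping of Eq. (8) can then be approximated by recomputing densities on the enlarged set (e.g. with the density sketch of section 3.1). Names and the acceptance test are illustrative:

```python
import numpy as np

def add_unlabeled(A, X_unlabeled, r):
    """Return the enlarged training set A_bar (simplified view of section 3.3)."""
    A_c = A.mean(axis=0)                                    # center of A, Eq. (7)
    R = np.max(np.linalg.norm(A - A_c, axis=1)) - r         # sphere radius, Eq. (7)
    near_labeled = np.array([np.min(np.linalg.norm(A - x, axis=1)) <= r
                             for x in X_unlabeled])         # x inside some Omega_i
    in_sphere = np.linalg.norm(X_unlabeled - A_c, axis=1) <= R
    return np.vstack([A, X_unlabeled[near_labeled | in_sphere]])
```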

4. Multi-density TWSVM and Multi-class Classification Algorithm

4.1. Multi-density TWSVM

In order to obtain high-accuracy and high-efficiency multi-class classification results, especially for large-scale, sparse, unbalanced and corrupted training datasets, TWSVM is improved to the multi-density TWSVM based on the enhanced methods in section 3.

Consider the following binary classification problem. Suppose A = [A1 A2 … Am1]' is a labeled sparse training dataset belonging to class +1; after the new labeled samples are added it becomes $\bar{A}=[\bar{A}_1\ \bar{A}_2\cdots\bar{A}_{m^+}]'$. Suppose B = [B1 B2 … Bm2]' is a labeled large-scale training dataset belonging to class −1, and $\bar{B}=[\bar{B}_1\ \bar{B}_2\cdots\bar{B}_{m^-}]'$ is the training dataset pruned from B. Based on the above, MDTWSVM is described by the following two primal QPPs:

$$\min_{w_1,b_1}\ \frac{1}{2}\sum_{i=1}^{m^+}d_i^+\bigl(K(\bar{A}_i',C')w_1+b_1\bigr)^2+c_1\sum_{j=1}^{m^-}d_j^-\eta_j \quad \text{s.t.}\ \ K(\bar{B}_j',C')w_1+b_1\le-1+\eta_j,\ \ \eta_j\ge 0,\ j=1,2,\ldots,m^- \tag{9}$$

$$\min_{w_2,b_2}\ \frac{1}{2}\sum_{j=1}^{m^-}d_j^-\bigl(K(\bar{B}_j',C')w_2+b_2\bigr)^2+c_2\sum_{i=1}^{m^+}d_i^+\xi_i \quad \text{s.t.}\ \ K(\bar{A}_i',C')w_2+b_2\ge 1-\xi_i,\ \ \xi_i\ge 0,\ i=1,2,\ldots,m^+ \tag{10}$$

where $C'=[\bar{A}'\ \ \bar{B}']$, and c1 and c2 are penalty factors. $d_i^+$ is the density information of the samples in the training dataset $\bar{A}$, and $d_j^-$ is the density information of the samples in the training dataset $\bar{B}$; they can be obtained from Eqs. (6) and (8). Optimizing the primal QPP (9) of MDTWSVM tends to make the hyper-plane close to class +1 and keep it at a distance of at least 1 from class −1. However, because of the added density parameters, for class +1 the samples with large density values are more likely to lie near the hyper-plane, while the samples with small density values have little effect on it. At the same time, for class −1 the density parameters make the hyper-plane keep a distance of at least 1 mainly from the samples with large density values. The multi-density information of the training samples thus effectively reduces the effect of noise samples and improves the classification accuracy. The primal QPP (10) of MDTWSVM yields similar results.

The Lagrangian corresponding to the primal QPP (9) of MDTWSVM is given:   

$$L(w_1,b_1,\alpha_j,\beta_j,\eta_j)=\frac{1}{2}\sum_{i=1}^{m^+}d_i^+\bigl(K(\bar{A}_i',C')w_1+b_1\bigr)^2+c_1\sum_{j=1}^{m^-}d_j^-\eta_j+\sum_{j=1}^{m^-}\alpha_j\bigl(K(\bar{B}_j',C')w_1+b_1+1-\eta_j\bigr)-\sum_{j=1}^{m^-}\beta_j\eta_j \tag{11}$$

Using the Karush-Kuhn-Tucker conditions and a process similar to that for TWSVM in reference 11), the Wolfe dual of the primal QPP (9) of MDTWSVM is obtained:

$$\min_{\alpha}\ \frac{1}{2}\alpha'G(H'D^+H)^{-1}G'\alpha-e_2'\alpha \quad \text{s.t.}\ \ 0\le\alpha\le c_1D^-e_2 \tag{12}$$

where $\alpha=(\alpha_1,\alpha_2,\ldots,\alpha_{m^-})'$, $H=[K(\bar{A},C')\ \ e_1]$ and $G=[K(\bar{B},C')\ \ e_2]$. $D^+$ is a diagonal matrix with diagonal elements $d_i^+$, and $D^-$ is a diagonal matrix with diagonal elements $d_j^-$. The corresponding hyper-plane K(x',C')w1 + b1 = 0 can be obtained by solving QPP (12), and the parameters w1 and b1 are calculated with the following formula:

$$\begin{bmatrix}w_1\\ b_1\end{bmatrix}=-(H'D^+H)^{-1}G'\alpha \tag{13}$$

In a similar way, the Wolfe dual of the primal QPP (10) of MDTWSVM can be obtained, and the hyper-plane K(x',C')w2 + b2 = 0 is determined as follows:

$$\min_{\mu}\ \frac{1}{2}\mu'H(G'D^-G)^{-1}H'\mu-e_1'\mu \quad \text{s.t.}\ \ 0\le\mu\le c_2D^+e_1 \tag{14}$$

$$\begin{bmatrix}w_2\\ b_2\end{bmatrix}=(G'D^-G)^{-1}H'\mu \tag{15}$$

where $\mu=(\mu_1,\mu_2,\ldots,\mu_{m^+})'$. A new data sample x is assigned to class +1 or −1 depending on which of the two hyper-planes lies closer to x in terms of perpendicular distance:

$$i=\arg\min_{k=1,2}\frac{|K(x',C')w_k+b_k|}{\|w_k\|},\qquad x\in\begin{cases}\text{class}\ +1 & \text{if}\ i=1\\ \text{class}\ -1 & \text{if}\ i=2\end{cases} \tag{16}$$
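To make the use of Eqs. (12)-(16) concrete, the sketch below recovers the two surfaces from given dual solutions α and μ and then classifies a new sample. The small regularization term eps*I is an added numerical safeguard not stated in the text, and all names are illustrative:

```python
import numpy as np

def mdtwsvm_planes(K_A, K_B, d_plus, d_minus, alpha, mu, eps=1e-8):
    """Recover (w1, b1) and (w2, b2) from the dual solutions via Eqs. (13), (15).
    K_A = K(A_bar, C'), K_B = K(B_bar, C'); eps*I stabilizes the inverses."""
    H = np.hstack([K_A, np.ones((K_A.shape[0], 1))])      # [K(A_bar, C')  e1]
    G = np.hstack([K_B, np.ones((K_B.shape[0], 1))])      # [K(B_bar, C')  e2]
    Dp, Dm = np.diag(d_plus), np.diag(d_minus)
    I = np.eye(H.shape[1])
    z1 = -np.linalg.solve(H.T @ Dp @ H + eps * I, G.T @ alpha)   # Eq. (13)
    z2 = np.linalg.solve(G.T @ Dm @ G + eps * I, H.T @ mu)       # Eq. (15)
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])

def mdtwsvm_predict(k_x, w1, b1, w2, b2):
    """Eq. (16): k_x = K(x', C'); assign x to the class of the nearer surface."""
    d1 = abs(k_x @ w1 + b1) / np.linalg.norm(w1)
    d2 = abs(k_x @ w2 + b2) / np.linalg.norm(w2)
    return +1 if d1 <= d2 else -1
```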

4.2. SOR Technique for MDTWSVM

In MDTWSVM, two QPPs, (12) and (14), have to be solved. QPP (12) is rewritten in the following form by defining U = G(H'D+H)−1G', c = c1D−e2 and φ = α:

$$\min_{\varphi}\ \frac{1}{2}\varphi'U\varphi-e'\varphi \quad \text{s.t.}\ \ 0\le\varphi\le c \tag{17}$$

SOR for quadratic programs is a highly effective iterative method, which is considerably faster than other methods, and it has been proved that the algorithm converges linearly to a solution.18) In reference 18), the SOR technique has been used effectively to improve the efficiency of solving QPPs. This paper also employs the SOR technique to solve QPP (17). The iterative equation of SOR for QPP (17) is as follows:

$$\varphi_j^{k+1}=\varphi_j^{k}-\rho E_{jj}^{-1}\left(\sum_{l=1}^{j-1}U_{jl}\varphi_l^{k+1}+\sum_{l=j}^{m^-}U_{jl}\varphi_l^{k}-1\right),\qquad j=1,2,\ldots,m^- \tag{18}$$

where ρ ∈ (0,2), Ejj is the diagonal element of the matrix U, and k is the iteration index. It can be seen from Eq. (18) that the iterative process begins with an initial φ0, and then φk+1 is calculated from φk until ||φk+1 − φk|| is less than some prescribed tolerance ε. During the iterative process, φk+1 is restricted to the interval [0,c].

QPP (14) is also rewritten in the form of QPP (17) by defining U = H(G'D−G)−1H', c = c2D+e1 and φ = μ, and is then solved with Eq. (18).
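A minimal sketch of the SOR sweep of Eq. (18) for QPP (17); U and the box bound c follow the definitions above, e is a vector of ones, and the iterate is projected back onto [0, c] after each update, as described. Names are illustrative:

```python
import numpy as np

def sor_solve(U, c, rho=1.0, tol=1e-5, max_iter=1000):
    """Solve min 0.5*phi'U*phi - e'phi  s.t. 0 <= phi <= c  by SOR, Eq. (18).
    U is the matrix defined above, c the vector of upper bounds, rho in (0, 2)."""
    m = U.shape[0]
    phi = np.zeros(m)
    for _ in range(max_iter):
        phi_old = phi.copy()
        for j in range(m):
            # updated values for l < j, previous-sweep values for l >= j
            s = U[j, :j] @ phi[:j] + U[j, j:] @ phi_old[j:] - 1.0
            phi[j] = min(max(phi[j] - rho * s / U[j, j], 0.0), c[j])  # clip to [0, c]
        if np.linalg.norm(phi - phi_old) < tol:
            break
    return phi
```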

4.3. Multi-class Classification Algorithm

The strip steel surface defect classification is a multi-class classification problem. There are many multi-class classification methods based on binary classifiers, such as one-against-one, one-against-rest, decision directed acyclic graph and binary tree.19) Among them, the binary tree is the most widely used. The basic binary tree model for seven classes is shown in Fig. 1.

Fig. 1. Binary tree for seven classes.

It can be seen from Fig. 1 that the binary tree improves the classification efficiency by constructing only six classifiers for seven classes. However, the training dataset may be unbalanced for every binary classifier, so it is necessary to solve the unbalanced problem when the binary tree is used for multi-class classification. In this paper the binary tree is combined with the binary classifier MDTWSVM. In MDTWSVM, class +1 represents the sparse training dataset A and class −1 represents the large-scale training dataset B. When each node classifier of the binary tree is constructed, its two training datasets are assigned to class +1 or class −1 according to their numbers of samples. Suppose there are N classes of strip steel surface defects in the training dataset; the multi-class classification algorithm combining the binary tree and the binary MDTWSVM is then obtained. The algorithm steps are as follows (a code sketch of the whole procedure is given after the steps):

Step 1: analyze the training datasets of the N classes and decide which are sparse and which are large-scale. Then select rational parameters, such as the radius r and the pruning threshold d, add unlabeled samples to the sparse training datasets, and prune samples from the large-scale training datasets to get new training datasets based on the enhanced methods in section 3.

Step 2: select one class of samples as training dataset 1 and the remaining classes as training dataset 2 for the current node of the binary tree. Then determine the sparse training dataset A of class +1 and the large-scale training dataset B of class −1 according to the sizes of training datasets 1 and 2.

Step 3: obtain the enlarged training dataset $\bar{A}$ from A and the pruned training dataset $\bar{B}$ from B according to the information from step 1.

Step 4: calculate the density information $d_j^-$ and $d_i^+$ for the training datasets $\bar{B}$ and $\bar{A}$ according to Eqs. (6) and (8), respectively.

Step 5: select rational parameters c1, c2 and kernel function K.

Step 6: define G, H, D+, D−, U and select the parameter ρ; then calculate the Lagrangian vectors α and μ according to Eq. (18).

Step 7: calculate parameters of the two nonparallel hyper-planes with Eqs. (13) and (15) to obtain the classifier for this node.

Step 8: repeat steps 2 to 7 until all node classifiers of the binary tree are obtained.

Step 9: according to the binary tree, evaluate each node classifier with Eq. (16) until a new defect sample falls into one of the N classes.
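To show how the steps fit together, the outline below strings the earlier sketches (density computation, pruning and sample adding) into one binary-tree training loop; train_node is a hypothetical placeholder for Steps 5-7 (solving QPPs (12) and (14) with SOR and applying Eqs. (13) and (15)), and the node order of Fig. 1 is assumed to follow the class list:

```python
import numpy as np

def train_binary_tree(class_datasets, unlabeled, r, d, train_node):
    """Steps 1-9 in outline form. class_datasets is a list of per-class sample
    matrices; train_node(A_bar, B_bar, d_plus, d_minus) stands in for Steps 5-7."""
    nodes, remaining = [], list(range(len(class_datasets)))
    while len(remaining) > 1:
        one = class_datasets[remaining[0]]                    # Step 2: split-off class
        rest = np.vstack([class_datasets[i] for i in remaining[1:]])
        swapped = one.shape[0] > rest.shape[0]                # smaller set -> class +1
        A, B = (rest, one) if swapped else (one, rest)
        A_bar = add_unlabeled(A, unlabeled, r)                # Step 3 (section 3.3)
        B_bar, d_minus = prune_by_density(B, density_info(B, r), d)   # Steps 3-4
        d_plus = density_info(A_bar, r)                       # Step 4 (Eq. (8), simplified)
        nodes.append((remaining[0], swapped,
                      train_node(A_bar, B_bar, d_plus, d_minus)))     # Steps 5-7
        remaining = remaining[1:]                             # Step 8: next tree node
    return nodes                                              # Step 9 walks these nodes
```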

5. Experiments

5.1. Description of Strip Steel Surface Defect Datasets

It is important to obtain a complete database of strip steel surface defects before classification. It is well known that computer vision technology based on the charge-coupled device (CCD) has been widely used in strip steel surface defect detection. All experimental images of strip steel surface defects in this paper are obtained with the CCD surface inspection systems of large national steel plants. Four classes of defect images with large-scale samples and three classes with sparse samples are selected. These defects are scarring, dent, scratch, dirt, hole, damage and edge-crack, as shown in Fig. 2.

Fig. 2. Images of seven surface defects.

A defect sample is a multi-dimensional vector obtained by extracting features from a defect image. These features have a direct impact on the classification accuracy and efficiency. The commonly extracted defect features include geometry, gray-level, projection, texture and frequency-domain features,20) and each of them involves multi-dimensional information. In this paper 64 defect features are extracted for each defect image, which compose a 64-dimensional defect sample. In order to ensure efficiency, a manifold learning algorithm21) is used to reduce the dimensionality to 34, so each sample is a 34-dimensional vector.
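As a hedged illustration of this preprocessing step, the sketch below assumes a hypothetical extract_features(image) routine returning the 64 defect features and uses Isomap from scikit-learn merely as a stand-in for the manifold learning algorithm, which the text does not specify in detail:

```python
import numpy as np
from sklearn.manifold import Isomap

def build_samples(defect_images, extract_features, target_dim=34):
    """Turn defect images into 34-dimensional samples: 64 features per image,
    then manifold-learning dimensionality reduction."""
    X64 = np.array([extract_features(img) for img in defect_images])   # 64-D vectors
    return Isomap(n_components=target_dim).fit_transform(X64)          # 34-D samples
```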

In this paper, 2330 images covering the seven classes of defects are selected and converted into 34-dimensional samples through feature extraction and dimensionality reduction. These samples are used in the classification experiments and are divided randomly into a training dataset and a testing dataset. The division results are shown in Table 1.

Table 1. Different surface defect data samples.

| Defect type | Number of training samples | Number of testing samples |
|---|---|---|
| Scarring | 477 | 53 |
| Dent | 468 | 52 |
| Scratch | 450 | 50 |
| Dirt | 459 | 51 |
| Hole | 72 | 18 |
| Damage | 81 | 19 |
| Edge-crack | 63 | 17 |

5.2. Parameters Analysis for MDTWSVM

As discussed in sections 3 and 4, six parameters of the MDTWSVM algorithm, namely r, d, ρ, c1, c2 and K, affect the performance of the multi-class classification algorithm. It is necessary to consider how to select optimal values for these parameters when the MDTWSVM algorithm is used for strip steel surface defect multi-class classification. Parameter r is the radius of the nearest-neighbor region, which affects the density information of all samples in the training dataset; the smaller r, the more independent the samples, and vice versa. Parameter d is the pruning threshold, which controls the pruning ratio; too small a d does not meet the required pruning ratio, while too large a d degrades the classification accuracy. Parameter ρ is the relaxation factor of the SOR algorithm, which affects its convergence. When the penalty factors c1 and c2 are optimal, the performance of the MDTWSVM algorithm is satisfactory. The radial basis function (RBF) is often used as the kernel function K of MDTWSVM; its parameter is the kernel radius δ, which directly affects the classification accuracy of MDTWSVM.

In this paper, all training datasets are first normalized and then the six parameters are determined. In general, the radius r and the threshold d should be selected from 0.2 to 0.4, the relaxation factor ρ should lie in the interval (0,2), the penalty factors c1 and c2 are chosen from $2^{-10}$ to $2^{5}$, and the kernel radius δ from $2^{-10}$ to $2^{4}$. These parameters can be optimized with common methods such as the recursive exhaustion technique, gradient descent and genetic algorithms. The tenfold cross-validation method16,18) is used to obtain the optimal parameters in this paper.
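A minimal sketch of this parameter search: grid search over c1, c2 and the RBF radius δ with tenfold cross-validation over the ranges given above. train_and_score is a hypothetical helper that trains the classifier tree on the training folds and returns the accuracy on the held-out fold:

```python
import itertools
import numpy as np

def select_parameters(X, y, train_and_score, n_folds=10, seed=0):
    """Tenfold cross-validated grid search over c1, c2 and the RBF radius delta."""
    grid_c = [2.0 ** p for p in range(-10, 6)]        # c1, c2 in [2^-10, 2^5]
    grid_delta = [2.0 ** p for p in range(-10, 5)]    # delta in [2^-10, 2^4]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    best, best_acc = None, -1.0
    for c1, c2, delta in itertools.product(grid_c, grid_c, grid_delta):
        accs = []
        for i in range(n_folds):
            test_idx = folds[i]
            train_idx = np.hstack([folds[j] for j in range(n_folds) if j != i])
            accs.append(train_and_score(X[train_idx], y[train_idx],
                                        X[test_idx], y[test_idx], c1, c2, delta))
        if np.mean(accs) > best_acc:
            best, best_acc = (c1, c2, delta), float(np.mean(accs))
    return best
```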

5.3. Experiments and Results Analysis

Four experiments are carried out to verify the effectiveness of the multi-class classification algorithm combining MDTWSVM and the binary tree for strip steel surface defects. The first three experiments test the effects of adding samples to the sparse dataset, pruning the large-scale dataset, and using the density information, respectively; the last one compares three different classifiers. All experimental results are presented in order below.

Firstly, the testing results corresponding to different pruning ratios for the large-scale training dataset are shown in Table 2. Note that the pruning ratio is the ratio of the number of pruned samples to the number of original samples. From Table 2 it is clear that the larger the ratio, the shorter the training and testing times, but the lower the classification accuracy. The pruning ratio should be selected by considering both time and accuracy in the real classification problem, so it is set to 0.40 in this paper.

Table 2. Testing results with different pruning ratio.

| Pruning ratio | Training time (s) | Testing time (s) | Accuracy (%) |
|---|---|---|---|
| 0.00 | 2.6447 | 0.2753 | 95.38 |
| 0.30 | 1.4555 | 0.2251 | 95.00 |
| 0.40 | 1.1243 | 0.0902 | 94.62 |
| 0.50 | 0.8713 | 0.0920 | 89.23 |
| 0.70 | 0.1340 | 0.0778 | 76.92 |
| 0.80 | 0.0618 | 0.0362 | 61.15 |

Secondly, the testing results corresponding to different adding ratios for the sparse training dataset are shown in Table 3. Note that the adding ratio is the ratio of the number of added unlabeled samples to the number of original samples, and that this experiment is made with the pruning ratio fixed at 0.4 for the large-scale training dataset. It can be seen from Table 3 that the larger the adding ratio, the longer the training and testing times, whereas the classification accuracy first rises and then falls as the ratio grows. The adding ratio is therefore set to 0.60 in this paper.

Table 3. Testing results with different adding ratio.

| Adding ratio | Training time (s) | Testing time (s) | Accuracy (%) |
|---|---|---|---|
| 0.00 | 1.1243 | 0.0902 | 94.62 |
| 0.30 | 1.1865 | 0.0996 | 95.00 |
| 0.50 | 1.2213 | 0.1022 | 96.15 |
| 0.60 | 1.2623 | 0.1068 | 96.92 |
| 0.70 | 1.3040 | 0.1109 | 96.54 |
| 0.90 | 1.3618 | 0.1141 | 94.23 |

Thirdly, in order to test the effectiveness of the density information in MDTWSVM, the classification accuracy is measured with and without the multi-density (MD) parameters. It can be seen from Table 4 that the accuracy is much higher with MD than without it. So combining the addition of unlabeled samples for the sparse training dataset, the pruning of the large-scale training dataset and the use of multi-density information can enhance the classification accuracy effectively.

Table 4. Classification accuracy of different defects.

| Defect type | Accuracy (%), not using MD | Accuracy (%), using MD |
|---|---|---|
| Scarring | 94.34 | 98.11 |
| Dent | 96.15 | 100 |
| Scratch | 90.00 | 96.00 |
| Dirt | 90.20 | 96.08 |
| Hole | 94.45 | 94.45 |
| Damage | 89.47 | 94.74 |
| Edge-crack | 88.24 | 94.12 |

Finally, in order to verify the effectiveness of the algorithm proposed in this paper, comparative experiments are made with MDTWSVM, the standard SVM and TWSVM. For a fair comparison, the same binary tree is used with all three methods. The results are shown in Table 5. It is clear that the MDTWSVM proposed in this paper outperforms the others in both time cost and classification accuracy.

Table 5. Testing results of three classifiers.

| Classifier | Training time (s) | Testing time (s) | Accuracy (%) |
|---|---|---|---|
| SVM | 91.1512 | 1.1886 | 92.31 |
| TWSVM | 11.3707 | 0.6881 | 93.85 |
| MDTWSVM | 1.2623 | 0.1068 | 96.92 |

6. Conclusions

A novel multi-class classification algorithm for strip steel surface defects is proposed in this paper based on the density information of the training dataset. The density information is used to guide the pruning of the large-scale training dataset, the addition of unlabeled samples to the sparse training dataset, and the improvement of TWSVM into MDTWSVM. The SOR method is used to solve the QPPs of MDTWSVM. The proposed method is well suited to multi-class, large-scale, sparse, unbalanced and noisy sample datasets. Finally, the results of experiments on the strip steel surface defect dataset show that MDTWSVM combined with the binary tree achieves excellent classification accuracy and efficiency.

References
 
© 2014 by The Iron and Steel Institute of Japan