ISIJ International
Regular Article
Learning a Class-specific and Shared Dictionary for Classifying Surface Defects of Steel Sheet
Shiyang Zhou, Youping Chen, Dailin Zhang, Jingming Xie, Yunfei Zhou

2017 Volume 57 Issue 1 Pages 123-130

Abstract

An approach to class-specific and shared dictionary learning (CDSDL) for sparse representation is proposed to classify surface defects of steel sheet. The proposed CDSDL algorithm is modelled as a unified objective function covering a reconstructive error term, a sparsity constraint and discrimination-promoting constraints. With the resulting high-quality dictionary, a compact, reconstructive and discriminative feature representation of an image can be extracted, and classification can then be performed efficiently using the discriminative information contained in the reconstruction error or the sparse coding vector. The CDSDL algorithm is evaluated on a dataset of surface images captured from a practical steel production line. Experimental results indicate that the CDSDL algorithm classifies surface defects of steel sheet more effectively than competing algorithms.

1. Introduction

Machine vision-based classification of surface defects of steel sheet has attracted much attention because of its potential value and the challenges it poses in practical industrial applications.1,2,3,4,5,6,7,8) As a classical pattern recognition problem, its core consists of feature extraction (learning) and classifier design, and most existing work over the past few years has focused on these two aspects to improve classification performance.9,10,11,12,13,14,15,16,17,18,19,20) A variety of features have been studied, including grayscale features,9,10) geometric features,9,10) shape features,9,10,11) texture features,10,12) Gabor filter outputs,13) Fourier spectra,14) wavelet transforms,14,15,16,17) local binary patterns (LBP)18,19) and shearlet transforms;20) classifiers such as artificial neural networks (ANN)12,13,15) and support vector machines (SVM)9,10,14,17,18,19,20) have been discussed. However, these approaches treat feature extraction and classifier training as two separate steps, which makes it difficult to control the interaction between them. In fact, classification methods based on sparse representation and dictionary learning, which have no explicit feature extraction stage, may be more effective for classifying surface defects of steel sheet. Sparse representation and dictionary learning, which can be formulated as follows, has also been successfully applied in many machine vision applications,21) such as face recognition.22)

\min_{x} \|y - D x\|_2^2 + \lambda \|x\|_1 \quad (1)
where y ∈ ℝ^d is a given feature vector, D ∈ ℝ^{d×K} is a dictionary, ‖x‖_1 is the l1-norm of the coding vector x, the l2-norm of each atom d_i is constrained to be at most one, and λ is a positive parameter that balances the tradeoff between reconstruction error and sparsity. The optimization over x and D is carried out by an iterative method composed of two steps: (a) fixing D to update x; (b) fixing x to update D. Equation (1) can be solved efficiently by many kinds of algorithms,36,37,38,39,40) such as orthogonal matching pursuit (OMP).41)
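As a concrete illustration of the sparse coding step in Eq. (1), the following Python sketch implements a greedy pursuit in the spirit of OMP; the random dictionary, the target sparsity and all variable names are illustrative assumptions rather than part of the original formulation.

import numpy as np

def omp(D, y, n_nonzero=5):
    """Greedy orthogonal matching pursuit: approximate y by a sparse
    combination of at most n_nonzero atoms (columns) of D."""
    K = D.shape[1]
    x = np.zeros(K)
    residual = y.copy()
    support = []
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares fit of y on the selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

# toy usage: a random unit-norm dictionary and a test vector
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)          # unit l2-norm atoms
y = rng.standard_normal(64)
x = omp(D, y, n_nonzero=5)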

The choice of the dictionary D plays a key role in obtaining a sparse, high-fidelity and discriminative feature representation of an image. Many state-of-the-art results have shown that a good dictionary can be learned from the training dataset itself.23,24) For example, given a set of feature vectors {y_i}, i = 1, 2, ..., N, in ℝ^d, the goal of dictionary learning is to learn a dictionary D = {d_i}, i = 1, 2, ..., K, in ℝ^{d×K} such that each y_i can be compactly represented as a sparse linear combination of a few atoms from D, while keeping the reconstruction error ‖y_i - D x_i‖_2^2 as small as possible. Existing dictionary learning approaches can be divided into two categories: unsupervised and supervised dictionary learning. For unsupervised methods, such as the method of optimal directions (MOD)25) and K-SVD,26) the learned dictionary may not be best for classification because the class information of the training dataset is not utilized. In recent years, many supervised or discriminative dictionary learning methods have been proposed, such as sparse representation based classification (SRC),22) with which a reconstructive and discriminative dictionary better adapted to classification can be learned. The state-of-the-art classification performance reported in these studies27,28,29,30,31,32,33,34,35) suggests that sparse representation and dictionary learning is a promising approach for classifying surface defects of steel sheet.

As shown in Fig. 1, surface defects of steel sheet can be regarded as local anomalies against a relatively homogeneous background. The defect area (discriminative pattern) occupies only a small portion of an image, while the background texture (shared pattern), which is useful for reconstruction rather than discrimination, is nearly the same across images. Because the class-specific information of defects accounts for only a small proportion of the total information in an image, the feature representations of different defect images are not discriminative. If a traditional discriminative dictionary learning method is adopted, most atoms of the learned dictionary are used to represent the background information, which contributes nothing to distinguishing different defects, and only a small part of the atoms represent the class-specific information of defects. As a result, the discriminative ability of the coding vectors of different defect images is reduced, and poor classification performance may be obtained.

Fig. 1.

Examples of practical surface defects of steel sheet: (a) Folding; (b) Line; (c) Patch; (d) Scratch; (e) Non-defective.

Based on the above analysis, a class-specific and shared dictionary learning (CDSDL) algorithm is proposed, in which several class-specific sub-dictionaries and a single shared sub-dictionary are learned simultaneously. Each class-specific sub-dictionary encodes the pattern exclusive to the corresponding class, and the shared sub-dictionary encodes the explicit and hidden common pattern shared by all classes. With these sub-dictionaries, the exclusive pattern and the shared pattern of an image can be explicitly separated. Because the atoms of these sub-dictionaries capture the inherent structure of a defect image, an effective feature representation of the image for classification can be obtained.

The remainder of this paper is organized as follows. Section 2 briefly reviews related work on the classification of surface defects of steel sheet and on classification based on sparse representation and dictionary learning. Section 3 introduces the proposed CDSDL algorithm, including its formulation and optimization. Section 4 presents experimental results. Finally, conclusions are given.

2. Related Work

Over the past decades, many studies on the classification of surface defects of steel sheet have been reported. Caleb et al. presented a classification method for surface defects of hot rolled steel based on multi-layer perceptron (MLP) and self-organising map (SOM) classifiers.12) Medina et al. introduced an approach to classify six different kinds of defects of flat steel coils by combining an ANN, a k-nearest neighbor (KNN) classifier and a naive Bayesian classifier.13) Ghorai et al. developed an approach for automatic defect detection of hot rolled flat steel based on a three-level Haar feature set.17) Hu et al. established an SVM model to classify five types of surface defects of strip steel.9) Furthermore, they optimized the image feature vector and the SVM kernel function with a hybrid chromosome genetic algorithm and achieved better classification performance.10) Moreover, Chu et al. constructed an invariant feature extraction approach based on smoothed LBP.19) However, these methods largely depend on an appropriate set of hand-crafted features, which require a special design for each specific task.

Recently, classification methods based on sparse representation and dictionary learning, which fall mainly into two categories, have also received increasing attention. The first category aims to improve the discriminative capability of the dictionary.27,28) Ramirez et al. introduced a structured incoherence term to promote the independence of the sub-dictionaries associated with different classes.27) Gao et al. learned both category-specific sub-dictionaries and a shared dictionary by imposing a cross-incoherence constraint between different sub-dictionaries and a self-incoherence constraint within each sub-dictionary.28) For these methods, the reconstruction errors associated with different classes can be used for classification, but the coding vector itself is not discriminative and hence not suitable for classification. The second category aims to improve the discriminative capability of the coding vector based on a discriminative dictionary.29,30,31,32) Huang et al. presented a method that quantifies the discriminative ability of the coding vector by Fisher's discrimination criterion.29) Zhang et al. introduced the classification error term of a linear classifier.30) Furthermore, Jiang et al. exploited a label consistency regularization term.31) These methods learn a single dictionary shared by all classes and treat the coding vector as a new feature representation of the original image. Although the coding vector is discriminative, classification based on class-specific reconstruction errors is not possible. The main difference between the two categories is whether the discrimination is imposed on the dictionary or on the coding vector. If dictionary discrimination and coding vector discrimination were combined, better classification performance might be obtained.

3. Class-specific and Shared Dictionary Learning (CDSDL)

Given a training dataset of c classes, the CDSDL algorithm constructs one discriminative class-specific sub-dictionary for each of the c classes and a single shared sub-dictionary for all classes. Structural incoherence terms are added to the objective function to promote incoherence of the sub-dictionaries and discriminability of the coding vectors. The coding vector over the learned dictionary, which can be directly treated as a new feature representation of an image, contains the crucial information for reconstruction and the discriminative information for classification. Coding vectors of defect images of the same class are similar, while those of different classes are dissimilar. Therefore, both the reconstruction errors associated with different classes and the coding vector can be used for classification. In addition, the discriminative part of the feature representation of a defect image can be amplified and the shared part suppressed by controlling the number of atoms of each sub-dictionary. The CDSDL algorithm is illustrated in Fig. 2.

Fig. 2.

An illustration of the CDSDL algorithm. Each image can be sparsely represented by a few atoms from corresponding class-specific sub-dictionary Di (purple) and shared sub-dictionary Dc+1 (red).

3.1. Formulation of CDSDL

Let Y = [Y_1, ..., Y_i, ..., Y_c] ∈ ℝ^{d×N} denote a training dataset of c classes, where Y_i ∈ ℝ^{d×n_i} is a matrix formed by stacking n_i samples of length d, Σ_{i=1}^{c} n_i = N, d is the dimension of a sample, n_i is the number of samples of the i-th class, and N is the total number of training samples. Let D = [D_1, ..., D_i, ..., D_c, D_{c+1}] ∈ ℝ^{d×K} denote the learned dictionary of K atoms, where D_i ∈ ℝ^{d×k_i}, i = 1, 2, ..., c, is the class-specific sub-dictionary trained from the corresponding training subset Y_i, D_{c+1} ∈ ℝ^{d×k_{c+1}} is the shared sub-dictionary trained from the whole training dataset Y, K = Σ_{i=1}^{c+1} k_i, and k_i is the number of atoms of the i-th sub-dictionary. The objective function of the CDSDL algorithm is as follows.

J = \min_{D,X} F(D,X;Y) + \lambda_1 \|X\|_1 + \lambda_2 G(X) + H(D) \quad \text{s.t.} \; \|D_i(:,j)\|_2 = 1, \ \forall i, j \quad (2)

where

F(D,X;Y) = \sum_{i=1}^{c} \left( \|Y_i - D X_i\|_F^2 + \|Y_i - (D \tilde{Q}_i)((\tilde{Q}_i)^T X_i)\|_F^2 + \lambda_1 \|X_i\|_1 \right) \quad (3)

G(X) = \sum_{i=1}^{c} \|X_i - M_i\|_F^2 - \sum_{i=1}^{c} \|M_i - M\|_F^2 + \sum_{i=1}^{c} \|X_i\|_F^2 \quad (4)

H(D) = \sum_{i=1}^{c+1} \left( \frac{w_1 n_i}{k_i^2} \|D_i^T D_i - I_{k_i}\|_F^2 + \frac{w_2 n_i}{k_i (K - k_i)} \|D_i^T (D Q_{-i})\|_F^2 \right) \quad (5)

F(D, X; Y) is the reconstructive error term, G(X) is a Fisher discrimination term imposed on the coding vector matrix X, H(D) is a structural incoherence term imposed on the dictionary D, and each atom d_j of D is constrained to unit l2-norm to avoid a trivial solution for X.

‖·‖_F denotes the Frobenius norm, ‖X‖_1 is a sparsity regularizer, I_{k_i} is an identity matrix, λ_1, λ_2, w_1 and w_2 are scalar weight parameters, and n_{c+1} = N.

X_i = [X_i^1; ...; X_i^i; ...; X_i^c; X_i^{c+1}] ∈ ℝ^{K×n_i} is the coding matrix corresponding to the training subset Y_i, each column x_i(j) is the coding vector of the j-th training sample of the i-th class, and X_i^i is the coding matrix of Y_i corresponding to the sub-dictionary D_i.

Q_i ∈ ℝ^{K×k_i} is a selection operator; each column has exactly one nonzero element, equal to 1, whose row index equals the column index of the corresponding atom d_j in D. Q̃_i = [Q_i, Q_{c+1}] ∈ ℝ^{K×(k_i + k_{c+1})} and Q_{-i} = [Q_1, ..., Q_{i-1}, Q_{i+1}, ..., Q_c, Q_{c+1}] ∈ ℝ^{K×(K - k_i)}.
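To make the selection operators concrete, the following sketch builds each Q_i (and the stacked complement used for Q_{-i}) from the sub-dictionary sizes and evaluates the incoherence term H(D) of Eq. (5); the sizes, sample counts and weights used here are illustrative assumptions.

import numpy as np

def selection_operators(k_list):
    """k_list = [k_1, ..., k_c, k_{c+1}]: number of atoms per sub-dictionary.
    Returns Q_i (K x k_i) selecting the columns of D that belong to D_i."""
    K = sum(k_list)
    offsets = np.cumsum([0] + list(k_list))
    Q = []
    for i, k in enumerate(k_list):
        Qi = np.zeros((K, k))
        Qi[offsets[i]:offsets[i] + k, :] = np.eye(k)
        Q.append(Qi)
    return Q

def H(D, Q, k_list, n_list, w1=0.01, w2=0.1):
    """Structural incoherence term of Eq. (5): self-incoherence of each
    sub-dictionary plus cross-incoherence against all remaining atoms."""
    K = D.shape[1]
    val = 0.0
    for i, (k_i, n_i) in enumerate(zip(k_list, n_list)):
        Di = D @ Q[i]                                   # columns of D_i
        D_rest = D @ np.hstack([Q[j] for j in range(len(k_list)) if j != i])
        val += w1 * n_i / k_i**2 * np.linalg.norm(Di.T @ Di - np.eye(k_i))**2
        val += w2 * n_i / (k_i * (K - k_i)) * np.linalg.norm(Di.T @ D_rest)**2
    return val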

3.2. Analysis of CDSDL

The term ‖Y_i - D X_i‖_F^2 indicates that the dictionary D should represent Y_i well.

Because D_i is a class-specific sub-dictionary and D_{c+1} is a shared sub-dictionary, the term ‖Y_i - (D Q̃_i)((Q̃_i)^T X_i)‖_F^2 indicates that Y_i should be well represented by [D_i, D_{c+1}] but not by [D_j, D_{c+1}], j ≠ i, which forces samples from the same class to have similar feature representations.

The term G(X) operates directly on the coding vectors and promotes their discrimination between different classes. Based on Fisher's linear discriminant,43) which maximizes the ratio of the between-class scatter to the within-class scatter, we jointly minimize the trace of the within-class scatter matrix S_W(X) and maximize the trace of the between-class scatter matrix S_B(X). S_W(X) and S_B(X) are defined as follows.

S_W(X) = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_i(j) - m_i)(x_i(j) - m_i)^T \quad (6)

S_B(X) = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T \quad (7)

where x_i(j) denotes the coding vector of the j-th training sample of the i-th class, and m_i ∈ ℝ^K and m ∈ ℝ^K are the mean vectors of X_i and X, respectively: m_i = (1/n_i) \sum_{j=1}^{n_i} x_i(j), m = (1/N) \sum_{i=1}^{c} \sum_{j=1}^{n_i} x_i(j).

Therefore   

tr(S_W) = \sum_{i=1}^{c} \|X_i - M_i\|_F^2 \quad (8)

tr(S_B) = \sum_{i=1}^{c} n_i \|m_i - m\|_2^2 = \sum_{i=1}^{c} \|M_i - M\|_F^2 \quad (9)

where M_i ∈ ℝ^{K×n_i} is the matrix whose columns all equal m_i and, for the corresponding M_i, M ∈ ℝ^{K×n_i} is the matrix whose columns all equal m.
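The traces in Eqs. (8) and (9) can be checked numerically with a few lines of NumPy; the per-class coding matrices below are placeholders for illustration.

import numpy as np

def fisher_traces(X_list):
    """X_list[i] is the K x n_i coding matrix of class i.
    Returns (tr(S_W), tr(S_B)) as in Eqs. (8) and (9)."""
    X = np.hstack(X_list)
    m = X.mean(axis=1, keepdims=True)            # global mean vector
    tr_SW, tr_SB = 0.0, 0.0
    for Xi in X_list:
        mi = Xi.mean(axis=1, keepdims=True)      # class mean vector
        tr_SW += np.linalg.norm(Xi - mi)**2      # ||X_i - M_i||_F^2
        tr_SB += Xi.shape[1] * np.linalg.norm(mi - m)**2   # n_i ||m_i - m||_2^2
    return tr_SW, tr_SB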

The combination of ‖X‖_1 and ‖X‖_F^2 makes the solution X more stable.38)

The term ‖D_i^T D_i - I_{k_i}‖_F^2 enforces self-incoherence of each sub-dictionary, which enhances the incoherence within each sub-dictionary and has a direct impact on the computation speed.27,44)

The term ‖D_i^T D_{-i}‖_F^2 is a cross-incoherence constraint between different sub-dictionaries, which promotes the atoms of different sub-dictionaries to be as independent as possible.27,44)

The weights w_1 n_i / k_i^2 and w_2 n_i / (k_i (K - k_i)) alleviate the effect of the imbalance between the number of samples and the number of atoms.28)

Incoherence constraints on sub-dictionaries were proposed in refs. 27,28,35), a Fisher discrimination constraint on the coding vector was presented in refs. 33,34), and a class-specific and shared dictionary was adopted in refs. 28,34,35). However, there are two main differences between the proposed CDSDL algorithm and these algorithms. First, Ramirez27) and Yang33) learn only class-specific sub-dictionaries, which cannot separate the class-specific features from the shared features. Second, the constraints of these algorithms are imposed on either the sub-dictionaries27,28,35) or the coding vectors,33,34) while the proposed CDSDL algorithm explicitly imposes constraints on both the sub-dictionaries and the coding vectors.

3.3. Optimization of CDSDL

The CDSDL optimization problem is not jointly convex, but it is convex with respect to each variable when the others are fixed. The problem can be divided into three sub-problems: (1) solving X with D fixed; (2) solving D_i with X and D_{-i} fixed, where D_{-i} is the sub-matrix obtained by removing D_i from D; (3) solving D_{c+1} with X and D_{-(c+1)} fixed. An iterative algorithm that alternately optimizes the objective function J with respect to the coding matrix X, the class-specific sub-dictionaries D_i and the shared sub-dictionary D_{c+1} is introduced below and summarized in Table 1 (a Python skeleton of the overall loop is sketched after Table 1).

Table 1. Algorithm of class-specific and shared dictionary learning.
Algorithm 1: Class-specific and Shared Dictionary Learning
Input: Training dataset Y = {Y_i}, i = 1, 2, ..., c; sizes of the sub-dictionaries k_i, i = 1, 2, ..., c, c+1; parameters λ_1, λ_2, w_1, w_2 and T.
Output: The learned dictionary D = {D_i}, i = 1, 2, ..., c, c+1, where D_i (i ≤ c) is a class-specific sub-dictionary and D_{c+1} is the shared sub-dictionary.
Initialize: Each class-specific sub-dictionary D_i is initialized by K-SVD on Y_i; the shared sub-dictionary D_{c+1} is initialized by K-SVD on Y.
While the stop criterion is not reached
      Update of X:
          Equation (11) is solved by FISTA and TwIST
      Update of D_i, i = 1, ..., c:
          Equation (18) is solved by MOCOD
      Update of D_{c+1}:
          Equation (22) is solved by MOCOD
End While
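For orientation, a high-level Python skeleton of the alternating scheme in Table 1 is sketched below; initialize_ksvd, update_X, update_Di, update_D_shared and objective are hypothetical helpers standing in for the K-SVD initialization, the FISTA/TwIST step of Eq. (11), the MOCOD steps of Eqs. (18) and (22) and the evaluation of Eq. (2), and are not defined in the paper.

import numpy as np

def cdsdl(Y_list, k_list, T=20, tol=1e-4):
    """Alternating optimization of the CDSDL objective (sketch only).
    Y_list: per-class training matrices Y_1..Y_c;
    k_list: sub-dictionary sizes k_1..k_c, k_{c+1}."""
    c = len(Y_list)
    # class-specific sub-dictionaries from Y_i, shared sub-dictionary from all of Y
    D = [initialize_ksvd(Y_list[i], k_list[i]) for i in range(c)]
    D.append(initialize_ksvd(np.hstack(Y_list), k_list[c]))
    prev_J = np.inf
    for _ in range(T):
        X = update_X(D, Y_list)                  # Eq. (11), FISTA/TwIST
        for i in range(c):
            D[i] = update_Di(D, X, Y_list, i)    # Eq. (18), MOCOD
        D[c] = update_D_shared(D, X, Y_list)     # Eq. (22), MOCOD
        J = objective(D, X, Y_list)              # Eq. (2)
        if abs(prev_J - J) < tol:                # stop criterion
            break
        prev_J = J
    return D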

3.3.1. Update of X

With fixed D, J can be rewritten as follows.   

\min_{X_i} \|Y_i - D X_i\|_F^2 + \|Y_i - D \tilde{Q}_i (\tilde{Q}_i)^T X_i\|_F^2 + \lambda_2 \left( \|X_i - M_i\|_F^2 - \sum_{i=1}^{c} \|M_i - M\|_F^2 + \|X_i\|_F^2 \right) + \lambda_1 \|X_i\|_1 \quad (10)

Equation (10) can be rewritten as follows.   

\min_{X_i} F(X_i) + 2\tau \|X_i\|_1 \quad (11)

where F(X_i) = \|Y_i - D X_i\|_F^2 + \|Y_i - D \tilde{Q}_i (\tilde{Q}_i)^T X_i\|_F^2 + \lambda_2 \left( \|X_i - M_i\|_F^2 - \sum_{i=1}^{c} \|M_i - M\|_F^2 + \|X_i\|_F^2 \right) and \tau = \lambda_1 / 2.

Because F(X_i) is convex and differentiable with respect to X_i, the fast iterative shrinkage thresholding algorithm (FISTA)42) and the two-step iterative shrinkage/thresholding (TwIST) algorithm45) can be employed to solve Eq. (11).

Calculating the first derivative of F(X_i) with respect to X_i (detailed steps are given in Appendix 1), we have

\nabla_{X_i} F(X_i) = 2 D^T (D X_i - Y_i) + 2 \tilde{Q}_i (\tilde{Q}_i)^T D^T \left[ D \tilde{Q}_i (\tilde{Q}_i)^T X_i - Y_i \right] + 2 \lambda_2 \left[ X_i O_i O_i^T + X_i P_i P_i^T - R P_i^T + \sum_{j=1, j \neq i}^{c} \left( X_i Q_i^j (Q_i^j)^T - T_j (Q_i^j)^T \right) \right] \quad (12)

where E_i^j denotes the n_i × n_j all-ones matrix, O_i = I_{n_i×n_i} - E_i^i / n_i, Q_i^j = E_i^j / N, P_i = E_i^i / n_i - Q_i^i, R = \sum_{j=1, j \neq i}^{c} X_j Q_j^i, and T_j = X_j E_j^j / n_j - \sum_{l=1, l \neq i}^{c} X_l Q_l^j.

According to the FISTA and TwIST algorithms, we have

\{X_i\}_{t+1} = (1 - \alpha)\{X_i\}_{t-1} + (\alpha - \beta)\{X_i\}_t + \beta S_{\tau/\sigma}\left[ \{X_i\}_t - \frac{1}{2\sigma} \nabla_{X_i} F(X_i) \right] \quad (13)

where S_{\tau/\sigma}(\cdot) is the soft-threshold shrinkage operator, {X_i}_{t-1} is the previous iterate, {X_i}_t the current iterate, {X_i}_{t+1} the next iterate, and α > 0, β > 0, σ > 0.
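A minimal NumPy sketch of the shrinkage update in Eq. (13) is given below; grad_F stands for a callable returning the gradient of Eq. (12), and the step parameters alpha, beta and sigma are illustrative assumptions.

import numpy as np

def soft_threshold(Z, t):
    """Element-wise soft-threshold shrinkage operator S_t(Z)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def x_update(X_prev, X_cur, grad_F, tau, alpha=1.0, beta=1.0, sigma=1.0):
    """One two-step (TwIST-like) iteration of Eq. (13) for X_i."""
    shrunk = soft_threshold(X_cur - grad_F(X_cur) / (2.0 * sigma), tau / sigma)
    return (1.0 - alpha) * X_prev + (alpha - beta) * X_cur + beta * shrunk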

3.3.2. Update of Di

With X and D_j, j ≠ i, fixed, J can be rewritten as follows.

\min_{D_i} \sum_{i=1}^{c} \left( \|Y_i - D X_i\|_F^2 + \|Y_i - D \tilde{Q}_i (\tilde{Q}_i)^T X_i\|_F^2 \right) + \frac{w_1 n_i}{k_i^2} \|D_i^T D_i - I_{k_i}\|_F^2 + \frac{w_2 n_i}{k_i (K - k_i)} \|D_i^T (D Q_{-i})\|_F^2 \quad (14)

Then, we have   

\min_{D_i} \|Y - D (X_1, \ldots, X_c)\|_F^2 + \left\| Y - D \left[ \tilde{Q}_1 (\tilde{Q}_1)^T X_1, \ldots, \tilde{Q}_c (\tilde{Q}_c)^T X_c \right] \right\|_F^2 + \frac{w_1 n_i}{k_i^2} \|D_i^T D_i - I_{k_i}\|_F^2 + \frac{w_2 n_i}{k_i (K - k_i)} \|D_i^T (D Q_{-i})\|_F^2 \quad (15)

Denote A = [Y, Y] ∈ ℝ^{d×2N}, B = [X_1, ..., X_c, \tilde{Q}_1 (\tilde{Q}_1)^T X_1, ..., \tilde{Q}_c (\tilde{Q}_c)^T X_c] ∈ ℝ^{K×2N} and C = D Q_{-i} ∈ ℝ^{d×(K - k_i)}; then we have

\min_{D_i} \|A - D B\|_F^2 + \frac{w_1 n_i}{k_i^2} \|D_i^T D_i - I_{k_i}\|_F^2 + \frac{w_2 n_i}{k_i (K - k_i)} \|D_i^T C\|_F^2 \quad (16)

Therefore   

\min_{D_i} \left\| A - \sum_{j=1, j \neq i}^{c+1} D_j B_j - D_i B_i \right\|_F^2 + \frac{w_1 n_i}{k_i^2} \|D_i^T D_i - I_{k_i}\|_F^2 + \frac{w_2 n_i}{k_i (K - k_i)} \|D_i^T C\|_F^2 \quad (17)

Denote \tilde{A} = A - \sum_{j=1, j \neq i}^{c+1} D_j B_j ∈ ℝ^{d×2N}; Eq. (17) is then equivalent to the following problem.

\min_{D_i} \|\tilde{A} - D_i B_i\|_F^2 + \frac{w_1 n_i}{k_i^2} \|D_i^T D_i - I_{k_i}\|_F^2 + \frac{w_2 n_i}{k_i (K - k_i)} \|D_i^T C\|_F^2 \quad (18)

where B_i denotes the k_i rows of B corresponding to the sub-dictionary D_i.

Equation (18) can be solved by the method of optimal coherence-constrained directions (MOCOD).44)
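The MOCOD solver of ref. 44) is not reproduced here; as a rough stand-in, the sketch below updates D_i for Eq. (18) with a few projected-gradient steps followed by column renormalization. This is a simplified alternative under assumed step sizes, not the authors' MOCOD implementation.

import numpy as np

def update_Di_pgd(A_tilde, Bi, C, a, b, Di0, n_steps=20, lr=1e-3):
    """Projected-gradient update of D_i for Eq. (18): data term plus
    self-incoherence weight a = w1*n_i/k_i^2 and cross-incoherence weight
    b = w2*n_i/(k_i*(K - k_i)). Columns are renormalized to unit l2-norm."""
    Di = Di0.copy()
    for _ in range(n_steps):
        grad = (-2.0 * (A_tilde - Di @ Bi) @ Bi.T
                + 4.0 * a * Di @ (Di.T @ Di - np.eye(Di.shape[1]))
                + 2.0 * b * (C @ C.T) @ Di)
        Di -= lr * grad
        Di /= np.linalg.norm(Di, axis=0, keepdims=True)   # unit-norm atoms
    return Di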

3.3.3. Update of Dc+1

With fixed X and D-(c+1), J can be rewritten as follows.   

\min_{D_{c+1}} \sum_{i=1}^{c} \|Y_i - D X_i\|_F^2 + \sum_{i=1}^{c} \|Y_i - D_i X_i^i - D_{c+1} X_i^{c+1}\|_F^2 + \frac{w_1 n_{c+1}}{k_{c+1}^2} \|D_{c+1}^T D_{c+1} - I_{k_{c+1}}\|_F^2 + \frac{w_2 n_{c+1}}{k_{c+1} (K - k_{c+1})} \|D_{c+1}^T (D Q_{-(c+1)})\|_F^2 \quad (19)

Because \sum_{i=1}^{c} \|Y_i - D X_i\|_F^2 denotes the global reconstruction error, we have

\min_{D_{c+1}} \left\| Y - \sum_{j=1}^{c} D_j X^j - D_{c+1} X^{c+1} \right\|_F^2 + \sum_{i=1}^{c} \|Y_i - D_i X_i^i - D_{c+1} X_i^{c+1}\|_F^2 + \frac{w_1 n_{c+1}}{k_{c+1}^2} \|D_{c+1}^T D_{c+1} - I_{k_{c+1}}\|_F^2 + \frac{w_2 n_{c+1}}{k_{c+1} (K - k_{c+1})} \|D_{c+1}^T (D Q_{-(c+1)})\|_F^2 \quad (20)

where X^{c+1} = (X_1^{c+1}, X_2^{c+1}, ..., X_c^{c+1}).

Denote A = Y - D Q_{-(c+1)} (Q_{-(c+1)})^T X, B = (Y_1 - D_1 X_1^1, Y_2 - D_2 X_2^2, ..., Y_c - D_c X_c^c) and C = D Q_{-(c+1)}; then we have

\min_{D_{c+1}} \|A - D_{c+1} X^{c+1}\|_F^2 + \|B - D_{c+1} X^{c+1}\|_F^2 + \frac{w_1 n_{c+1}}{k_{c+1}^2} \|D_{c+1}^T D_{c+1} - I_{k_{c+1}}\|_F^2 + \frac{w_2 n_{c+1}}{k_{c+1} (K - k_{c+1})} \|D_{c+1}^T C\|_F^2 \quad (21)

Denote U = [A, B] and V = [X^{c+1}, X^{c+1}]; Eq. (21) is then equivalent to the following problem.

\min_{D_{c+1}} \|U - D_{c+1} V\|_F^2 + \frac{w_1 n_{c+1}}{k_{c+1}^2} \|D_{c+1}^T D_{c+1} - I_{k_{c+1}}\|_F^2 + \frac{w_2 n_{c+1}}{k_{c+1} (K - k_{c+1})} \|D_{c+1}^T C\|_F^2 \quad (22)

Similarly to Eq. (18), Eq. (22) can be solved by the MOCOD algorithm.

3.4. Classification

For the CDSDL algorithm, both the reconstruction errors associated with the different classes and the coding vector can achieve good classification performance. Based on the coding vector, two kinds of classifier are considered.

For the global classifier, a test sample ŷ is coded over the whole dictionary D as follows.

\hat{x} = \arg\min_{x} \|\hat{y} - D x\|_2^2 + \lambda \|x\|_1 \quad (23)

The class of ŷ is assigned according to arg min_i ‖ŷ - D_i x̂_i‖_2^2, i.e. the index i of the smallest class-specific reconstruction error, where x̂ = [x̂_1; ...; x̂_i; ...; x̂_c; x̂_{c+1}] ∈ ℝ^K, i = 1, 2, ..., c, x̂_i is the sub-vector of x̂ corresponding to the sub-dictionary D_i, and the global coding vector x̂_G equals x̂.

For the local classifier, each class-specific sub-dictionary D_i is stacked with the shared sub-dictionary D_{c+1} to construct c compositional dictionaries. A test sample ŷ is coded over each compositional dictionary as follows.

\hat{x}_i = \arg\min_{x} \|\hat{y} - (D_i, D_{c+1}) x\|_2^2 + \lambda \|x\|_1 \quad (24)

The class of ŷ is assigned according to arg min_i ‖ŷ - (D_i, D_{c+1}) x̂_i‖_2^2, i.e. the index i of the smallest reconstruction error, where x̂_i = [x̂_i^i; x̂_i^{c+1}] ∈ ℝ^{k_i + k_{c+1}}, i = 1, 2, ..., c, x̂_i^i is the sub-vector corresponding to the class-specific sub-dictionary D_i, and x̂_i^{c+1} is the sub-vector corresponding to the shared sub-dictionary D_{c+1}.

In fact, x̂_i^{c+1}, which only contributes to reconstruction, can be omitted. Therefore, the c remaining sub-vectors x̂_i^i are stacked together to form the local coding vector x̂_L = [x̂_1^1; ...; x̂_i^i; ...; x̂_c^c] ∈ ℝ^{K - k_{c+1}}.

The vectors x̂_G and x̂_L can also be used directly as the input feature vector of a linear one-versus-rest (OVR) SVM.
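A sketch of the local classification rule of Eq. (24) is shown below; sparse_code stands for any l1 solver (for example the shrinkage iteration above) and is an assumption of this illustration.

import numpy as np

def classify_local(y, D_class, D_shared, sparse_code, lam=0.1):
    """Local classifier: code y over each [D_i, D_shared] and pick the class
    with the smallest reconstruction error (Eq. (24)). Also returns the
    stacked local coding vector x_L built from the class-specific parts."""
    errors, parts = [], []
    for Di in D_class:
        Dcomp = np.hstack([Di, D_shared])
        x = sparse_code(Dcomp, y, lam)              # solve Eq. (24)
        errors.append(np.linalg.norm(y - Dcomp @ x)**2)
        parts.append(x[:Di.shape[1]])               # keep only x_i^i
    x_L = np.concatenate(parts)                     # feature for the OVR SVM
    return int(np.argmin(errors)), x_L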

4. Experimental Results

The dataset of steel surface images shown in Fig. 1 was collected from a practical steel sheet production line. It comprises four kinds of typical surface defects (Folding, Line, Patch and Scratch) and one non-defective class. There are 300 8-bit grayscale images per class, each of size 200×200 pixels. Each image is resized from 200×200 pixels to 40×40 pixels to avoid over-fitting. Because of the non-uniform brightness of the images, each image is normalized to have zero mean and unit l2-norm (a preprocessing sketch is given after Eq. (25)). 50% of the images of each class are randomly chosen as the training dataset, and the rest are used as the testing dataset. The classification accuracy is defined as follows.

Acc = \frac{N_R}{N} \quad (25)
where N_R is the number of test samples that are correctly classified and N is the total number of test samples. To account for random variation, each experiment is repeated 10 times and the mean and standard deviation of the results are reported. The weight parameters λ_1, λ_2, w_1, w_2 and the sparsity T are tuned by 5-fold cross validation; finally, λ_1 = 0.1, λ_2 = 0.001, w_1 = 0.01, w_2 = 0.1 and T = 5 are used. Further fine-tuning may improve the classification performance.
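The preprocessing described above (down-sampling to 40×40, zero mean, unit l2-norm) can be written compactly as follows; the 5×5 block-averaging used for resizing is an illustrative choice, since the paper does not specify the resizing method.

import numpy as np

def preprocess(img):
    """img: 200x200 grayscale image -> normalized 1600-dim feature vector."""
    small = img.astype(np.float64).reshape(40, 5, 40, 5).mean(axis=(1, 3))
    y = small.ravel()
    y -= y.mean()                       # zero mean
    return y / np.linalg.norm(y)        # unit l2-norm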

The relationship between classification accuracy and sub-dictionary size is shown in Fig. 3. Figure 3(a), in which the size of the class-specific sub-dictionaries is fixed, shows that the classification performance decreases as the size of the shared sub-dictionary increases. The likely reasons are that a small shared sub-dictionary is enough to capture the shared features of the defect images, and that a larger shared sub-dictionary may introduce redundancy into the feature vector. Figure 3(b), in which the size of the shared sub-dictionary is fixed, shows that the classification performance improves as the size of the class-specific sub-dictionaries increases. The likely reason is that more discriminative information can be captured by a larger class-specific sub-dictionary. In general, a larger dictionary has a stronger ability to represent details and achieves better classification performance at the expense of an increased computational load, but once the class-specific sub-dictionary reaches a certain size, the classification performance no longer improves. Therefore, a tradeoff must be made between classification performance and computational efficiency. As shown in Fig. 3, with a shared sub-dictionary size of k_s = 2 and a class-specific sub-dictionary size of k_c = 25, CDSDL still achieves a high classification accuracy of 94.25%; from k_c = 50 to k_c = 25, the accuracy drops by only 1.58%. Therefore, the CDSDL algorithm is a simple and efficient way to learn a compact, reconstructive and discriminative dictionary, which boosts classification performance while reducing the computational load.

Fig. 3.

The classification accuracy versus the number of atoms in the sub-dictionaries, where the legend denotes the number of atoms in the class-specific sub-dictionary (top) and in the shared sub-dictionary (bottom), respectively.

Figure 4 shows that the objective function value of the CDSDL algorithm decreases rapidly and converges within about 5–8 iterations.

Fig. 4.

The convergence curve of the CDSDL algorithm.

Figure 5 shows that the global coding matrices of the training dataset and the testing dataset are nearly block-diagonal. Each block indicates that the main part of the coding matrix corresponds to the respective class-specific sub-dictionary. Therefore, the CDSDL algorithm enhances the discrimination of the coding vectors, which improves the classification performance.

Fig. 5.

The visualization image of global coding vector matrix (Folding, Line, Non-defect, Patch, Scratch): training dataset (top); testing dataset (bottom).

The CDSDL algorithm is compared with several state-of-the-art methods, including COPAR,35) FDDL,33) LCKSVD31) and SRC.22) As shown in Table 2, CDSDL with the local classifier achieves 94.25% classification accuracy, compared with 91.85% for COPAR, 51.89% for FDDL, 66.99% for LCKSVD and 56.96% for SRC. Compared with SRC, the baseline method in this experiment, CDSDL improves the classification accuracy by more than 37 percentage points, and it outperforms FDDL by about 42.36 percentage points. CDSDL also improves on the 91.85% achieved by COPAR by 2.4 percentage points. Therefore, CDSDL has better classification performance than these methods. In addition, the local classifier (94.25%) of CDSDL performs much better than the global classifier (83.51%), an improvement of about 10 percentage points. The probable reason is that the number of training samples of each class is relatively large, or that the learned class-specific sub-dictionaries are large.

Table 2. The comparison of classification accuracy of different methods.

Method      Accuracy (%)
COPAR       91.85 ± 1.02
FDDL        51.89 ± 4.31
LCKSVD      66.99 ± 1.42
SRC         56.96 ± 1.20
CDSDL       94.25 ± 0.42

According to Fig. 6, the error rate of scratch is higher than that of the other defects: 33 scratch images are misclassified, 9 as folding, 10 as line and 13 as patch. The possible reasons are as follows. (1) These defects are similar to scratch in size, direction and illumination. (2) The CDSDL algorithm is sensitive to the scratch defect and thus generates an inappropriate feature representation. This phenomenon will be investigated further. Figure 7 shows some scratch samples that are misclassified as other defects.

Fig. 6.

The confusion matrix, where the element in the i-th row and j-th column indicates the number of samples of the j-th class misclassified as the i-th class, j ≠ i.

Fig. 7.

Some scratch samples misclassified as other defects: (a) Scratch→Folding; (b) Scratch→Line; (c) Scratch→Patch.

To evaluate the robustness of the CDSDL algorithm, additive Gaussian noise was added to the surface images. 50% of the images of each class are randomly chosen as the training dataset, and the rest are used as the testing dataset. Gaussian noise with different signal-to-noise ratios (SNR) of 15 dB, 20 dB and 25 dB is added to all training samples. Figure 8 shows the same image corrupted by Gaussian noise at the different SNRs. According to Table 3, CDSDL still achieves 92.45% classification accuracy at an SNR of 25 dB.
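For reference, additive Gaussian noise at a prescribed SNR can be generated as in the sketch below; computing the signal power per image and the random seeding are assumptions of this illustration.

import numpy as np

def add_gaussian_noise(y, snr_db, rng=np.random.default_rng(0)):
    """Add zero-mean Gaussian noise so that 10*log10(P_signal/P_noise) = snr_db."""
    p_signal = np.mean(y**2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return y + rng.normal(0.0, np.sqrt(p_noise), size=y.shape)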

Fig. 8.

The same image corrupted by Gaussian noise: (a) 15 dB; (b) 20 dB; (c) 25 dB; (d) Original.

Table 3. The classification accuracy of the CDSDL algorithm for different Gaussian noise.
SNR (dB)    Accuracy (%)
15          65.77 ± 2.50
20          85.88 ± 1.63
25          92.45 ± 0.29

5. Conclusion

A classification method based on class-specific and shared sub-dictionary learning is presented in this paper. The class-specific and shared sub-dictionary learning is formulated jointly with a reconstructive error term, a sparsity constraint and discrimination-promoting constraints. Each class-specific sub-dictionary represents the samples of the corresponding class well, and the shared sub-dictionary contributes to the reconstruction of all samples. Therefore, the feature representation based on the class-specific and shared sub-dictionaries not only faithfully represents samples from any class, but is also more compact and discriminative for classification. Experimental results show that the CDSDL algorithm achieves better accuracy in classifying surface defects of steel sheet.

Acknowledgements

This work was supported by the Natural Science Foundation of China under grant No. 51174151, and by the Specialized Research Fund for the Key Science and Technology Innovation Plan of Hubei Province under grant No. 2013AAA011. The authors would like to thank Kechen Song and Yungang Tan for providing the defect images. The MATLAB code was revised and optimized from refs. 27,33,35,44).

Appendix

Appendix 1: Derivation of F(Xi) in Eq. (12)

Denote by E_i^j the n_i × n_j all-ones matrix and by I_{n_i×n_i} the n_i × n_i identity matrix, and let O_i = I_{n_i×n_i} - E_i^i / n_i; then we have

\|X_i - M_i\|_F^2 = \left\| X_i I_{n_i \times n_i} - X_i \frac{E_i^i}{n_i} \right\|_F^2 = \|X_i O_i\|_F^2

Denote Q_i^j = E_i^j / N, P_i = E_i^i / n_i - E_i^i / N = E_i^i / n_i - Q_i^i and R = \sum_{j=1, j \neq i}^{c} X_j Q_j^i; then we have

\|M_i - M\|_F^2 = \left\| X_i \frac{E_i^i}{n_i} - \left( X_i \frac{E_i^i}{N} + \sum_{j=1, j \neq i}^{c} X_j \frac{E_j^i}{N} \right) \right\|_F^2 = \left\| X_i \frac{E_i^i}{n_i} - X_i Q_i^i - \sum_{j=1, j \neq i}^{c} X_j Q_j^i \right\|_F^2 = \left\| X_i P_i - \sum_{j=1, j \neq i}^{c} X_j Q_j^i \right\|_F^2 = \|X_i P_i - R\|_F^2

For the i-th class, we have \left\| X_i P_i - \sum_{j=1, j \neq i}^{c} X_j Q_j^i \right\|_F^2.

For the non-i-th classes, we have \sum_{j=1, j \neq i}^{c} \left\| X_j P_j - \sum_{l=1, l \neq j}^{c} X_l Q_l^j \right\|_F^2.

Taking the j-th term of the non-i-th classes, we have

\left\| X_j P_j - \sum_{l=1, l \neq j}^{c} X_l Q_l^j \right\|_F^2 = \left\| X_j \frac{E_j^j}{n_j} - X_j Q_j^j - \sum_{l=1, l \neq j}^{c} X_l Q_l^j \right\|_F^2 = \left\| X_j \frac{E_j^j}{n_j} - \left( X_j Q_j^j + \sum_{l=1, l \neq j}^{c} X_l Q_l^j \right) \right\|_F^2

Separating X_i, we have \left\| X_j \frac{E_j^j}{n_j} - \left( X_i Q_i^j + \sum_{l=1, l \neq i}^{c} X_l Q_l^j \right) \right\|_F^2 = \left\| X_j \frac{E_j^j}{n_j} - \sum_{l=1, l \neq i}^{c} X_l Q_l^j - X_i Q_i^j \right\|_F^2 = \|T_j - X_i Q_i^j\|_F^2

Summing over all non-i-th classes gives \sum_{j=1, j \neq i}^{c} \|T_j - X_i Q_i^j\|_F^2; then, consistently with Eq. (4), we have

\|X_i - M_i\|_F^2 - \sum_{i=1}^{c} \|M_i - M\|_F^2 = \|X_i O_i\|_F^2 - \|X_i P_i - R\|_F^2 - \sum_{j=1, j \neq i}^{c} \|T_j - X_i Q_i^j\|_F^2

Calculating the first derivatives with respect to X_i, we have

\frac{\partial \|X_i O_i\|_F^2}{\partial X_i} = 2 X_i O_i O_i^T, \qquad \frac{\partial \|X_i P_i - R\|_F^2}{\partial X_i} = 2 \left( X_i P_i P_i^T - R P_i^T \right), \qquad \frac{\partial \sum_{j=1, j \neq i}^{c} \|T_j - X_i Q_i^j\|_F^2}{\partial X_i} = 2 \sum_{j=1, j \neq i}^{c} \left[ X_i Q_i^j (Q_i^j)^T - T_j (Q_i^j)^T \right]

References
 
© 2017 by The Iron and Steel Institute of Japan