2017 Volume 57 Issue 1 Pages 123-130
An approach to class-specific and shared dictionary learning (CDSDL) for sparse representation is proposed to classify surface defects of steel sheet. The proposed CDSDL algorithm is modelled as a unified objective function that combines a reconstruction error term with sparsity and discrimination-promoting constraints. With the resulting high-quality dictionary, a compact, reconstructive and discriminative feature representation of an image can be extracted, and classification can then be performed efficiently using the discriminative information carried by the reconstruction error or the sparse coding vector. The CDSDL algorithm is evaluated on a dataset of surface images captured from a practical steel production line. Experimental results indicate that it classifies surface defects of steel sheet more effectively than competing algorithms.
Machine vision-based classification of surface defects of steel sheet has attracted much attention due to its potential value and the challenges it poses in practical industrial applications.1,2,3,4,5,6,7,8) As a classical pattern recognition problem, it centres on feature extraction (or learning) and classifier design, and most existing works of the past few years have focused on these two aspects to enhance classification performance.9,10,11,12,13,14,15,16,17,18,19,20) A variety of features have been studied, including grayscale features,9,10) geometric features,9,10) shape features,9,10,11) texture features,10,12) Gabor filter outputs,13) Fourier spectra,14) wavelet transforms,14,15,16,17) local binary patterns (LBP)18,19) and shearlet transforms;20) the classifiers discussed include the artificial neural network (ANN)12,13,15) and the support vector machine (SVM).9,10,14,17,18,19,20) However, these approaches treat feature extraction and classifier training as two separate steps, which makes it difficult to control the interaction between them. Sparse representation and dictionary learning based classification methods, which have no explicit feature extraction stage, may therefore be more effective for classifying surface defects of steel sheet. Sparse representation and dictionary learning, which can be formulated as follows, have also been applied successfully in many machine vision applications,21) such as face recognition.22)
x̂ = arg min_x ||y − Dx||_2^2 + λ||x||_1   (1)
A choice of dictionary D plays a key role to obtain a sparse, high-fidelity and discriminative feature representation of an image. Many state-of-the-art results have shown that a good dictionary is learned from the training dataset itself.23,24) For example, given a set of feature vectors {yi}i = 1, 2, ..., N in
The surface defects of steel sheet shown in Fig. 1 can be regarded as local anomalies against a relatively homogeneous background. The defect area of an image (the discriminative pattern) occupies only a small portion of the image, while the background texture (the shared pattern), which is useful for reconstruction rather than discrimination, is nearly the same across images. Because the class-specific information of defects accounts for only a small proportion of the total information in an image, the feature representations of different defect images are not discriminative. If a traditional discriminative dictionary learning method is adopted, most of the atoms of the learned dictionary are used to represent background information that contributes nothing to distinguishing defects, and only a small part of the atoms represent the class-specific information. The discriminative ability of the coding vectors of different defect images is therefore reduced, and poor classification performance may result.
Examples of practical surface defects of steel sheet: (a) Folding; (b) Line; (c) Patch; (d) Scratch; (e) Non-defective.
Based on the above analysis, a class-specific and shared dictionary learning (CDSDL) algorithm is proposed, in which several class-specific sub-dictionaries and a single shared sub-dictionary are learned simultaneously. Each class-specific sub-dictionary encodes the exclusive pattern of the corresponding class, and the shared sub-dictionary encodes the explicit and hidden common pattern shared by all the classes. With these sub-dictionaries, the exclusive pattern and the shared pattern of an image can be explicitly separated. Because the atoms of these sub-dictionaries capture the inherent structure of a defect image, an effective feature representation of the image for classification can be obtained.
The remainder of this paper is organized as follows. In Section 2, we briefly introduce some related works about classification of surface defects of steel sheet, sparse representation and dictionary learning based classification. In Section 3, the proposed CDSDL algorithm, including formulation and optimization, is introduced. In Section 4, experimental results are presented. Finally, conclusions are given.
Over the past decades, many works on the classification of surface defects of steel sheet have been reported. Caleb et al. presented a classification method for surface defects of hot rolled steel based on multi-layer perceptron (MLP) and self-organising map (SOM) classifiers.12) Medina et al. introduced an approach to classify six kinds of defects of flat steel coils by combining an ANN, a k-nearest neighbour (KNN) classifier and a naive Bayesian classifier.13) Ghorai et al. developed an automatic defect detection approach for hot rolled flat steel based on a three-level Haar feature set.17) Hu et al. established an SVM model to classify five types of surface defects of strip steel,9) and further optimized the image feature vector and the SVM kernel function with a hybrid chromosome genetic algorithm, obtaining better classification performance.10) Moreover, Chu et al. constructed an invariant feature extraction approach based on a smoothed LBP.19) However, these methods largely depend on an appropriate set of hand-crafted features, which must be specially designed for each task.
Recently, sparse representation and dictionary learning based classification methods, which fall mainly into two categories, have also attracted attention. The first category aims to improve the discriminative capability of the dictionary.27,28) Ramirez et al. introduced a structured incoherence term to promote the independence of the sub-dictionaries associated with different classes.27) Gao et al. learned both category-specific sub-dictionaries and a shared dictionary by imposing a cross-incoherence constraint between different sub-dictionaries and a self-incoherence constraint within each sub-dictionary.28) For these methods, the reconstruction errors associated with different classes can be used for classification, but the coding vector is not discriminative and is therefore unsuitable for classification. The second category aims to improve the discriminative capability of the coding vector over a discriminative dictionary.29,30,31,32) Huang et al. presented a method that quantifies the discriminative ability of the coding vector by Fisher's discrimination criterion.29) Zhang et al. introduced a classification error term for a linear classifier.30) Furthermore, Jiang et al. exploited a label-consistency regularization term.31) These methods learn a single dictionary shared by all classes and treat the coding vector as a new feature representation of the original image. Although the coding vector is discriminative, classification based on class-specific reconstruction errors does not work. The main difference between the two categories is thus whether discrimination is imposed on the dictionary or on the coding vector. If dictionary discrimination and coding vector discrimination can be combined, better classification performance may be gained.
Given a training dataset of c classes, the CDSDL algorithm constructs c discriminative class-specific sub-dictionaries, one for each class, and a shared sub-dictionary for all classes. Structural incoherence terms are added to the objective function to promote the incoherence of the sub-dictionaries and the coding vectors. The coding vector over the learned dictionary, which can be directly treated as a new feature representation of an image, contains crucial information for reconstruction and discriminative information for classification. The coding vectors of defect images of the same class are similar, while those of different classes are dissimilar. Therefore, both the reconstruction errors associated with different classes and the coding vector can be used for classification. In addition, by controlling the number of atoms in each sub-dictionary, we can amplify the discriminative part and suppress the shared part of the feature representation of a defect image. The CDSDL algorithm is illustrated in Fig. 2.
An illustration of the CDSDL algorithm. Each image can be sparsely represented by a few atoms from corresponding class-specific sub-dictionary Di (purple) and shared sub-dictionary Dc+1 (red).
Let Y = [Y1, ..., Yi, ..., Yc]∈
(2)
(3)
(4)
(5)
F(D, X; Y) is a reconstruction error term, G(X) is a Fisher discrimination term imposed on the coding vector matrix X, and H(D) is a structural incoherence term imposed on the dictionary D; each atom dj of D is constrained to unit l2-norm to avoid a trivial solution for X.
Xi = [Xi1; ...; Xii; ...; Xic; Xic+1]∈
Qi∈
The term
Because Di is a class-specific sub-dictionary and Dc+1 is a shared sub-dictionary, the term
The term G(X) operates directly on the coding vectors and promotes their discrimination between different classes. Based on Fisher's linear discriminant,43) which maximizes the ratio of the between-class scatter to the within-class scatter, we jointly minimize the within-class scatter matrix SW(X) and maximize the between-class scatter matrix SB(X), defined as follows.
SW(X) = Σ_{i=1}^{c} Σ_{xk∈Xi} (xk − mi)(xk − mi)^T   (6)
SB(X) = Σ_{i=1}^{c} ni (mi − m)(mi − m)^T   (7)
where mi and m denote the mean coding vector of class i and of all classes, respectively, and ni is the number of samples of class i.
Therefore
(8)
(9)
The combination of
The term
The term
The weight w1
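As a concrete illustration of the scatter matrices SW(X) and SB(X) used in G(X), the following NumPy sketch (a generic implementation, not the authors' code) computes both from a matrix of coding vectors and their class labels.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class scatter S_W and between-class scatter S_B of coding vectors.

    X is a k-by-N matrix whose columns are coding vectors; labels is a
    length-N array of class indices.
    """
    k = X.shape[0]
    m = X.mean(axis=1, keepdims=True)                  # global mean vector
    S_W = np.zeros((k, k))
    S_B = np.zeros((k, k))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)            # class-c mean vector
        d = Xc - mc
        S_W += d @ d.T                                 # spread inside class c
        S_B += Xc.shape[1] * (mc - m) @ (mc - m).T     # spread of class means
    return S_W, S_B
```

A useful sanity check is that SW + SB equals the total scatter (X − m)(X − m)^T.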
An incoherence constraint on sub-dictionaries was proposed in refs. 27), 28) and 35), a Fisher discrimination constraint on coding vectors was presented in refs. 33) and 34), and a class-specific and shared dictionary was adopted in refs. 28), 34) and 35). However, there are two main differences between the proposed CDSDL algorithm and these algorithms. First, Ramirez27) and Yang33) learn only class-specific sub-dictionaries, which cannot separate class-specific features from shared features. Second, the constraints of these algorithms are imposed on either the sub-dictionaries27,28,35) or the coding vectors,33,34) while the proposed CDSDL algorithm explicitly imposes constraints on both.
3.3. Optimization of CDSDL

The optimization of the CDSDL algorithm is not jointly convex, but it is convex with respect to each variable when the others are fixed. The problem can therefore be divided into three sub-problems: (1) solving X with D fixed; (2) solving Di with X and D-j, j ≠ i, fixed, where D-j is the sub-matrix obtained by removing Dj from D; (3) solving Dc+1 with X and D-(c+1) fixed. An iterative algorithm that alternately optimizes the objective function J with respect to the coding vector matrix X, the class-specific sub-dictionaries Di and the shared sub-dictionary Dc+1 is introduced below and summarized in Table 1.
Algorithm 1: Class-specific and Shared Dictionary Learning
Input: Training dataset Y = {Yi}, i = 1, 2, ..., c; size of sub-dictionary ki, i = 1, 2, ..., c, c+1; parameters λ1, λ2, w1, w2 and T.
Output: The learned dictionary D = {Di}, i = 1, 2, ..., c, c+1, where Di (i ≠ c+1) is a class-specific sub-dictionary and Dc+1 is the shared sub-dictionary.
Initialize: Each class-specific sub-dictionary Di is initialized by K-SVD on Yi; the shared sub-dictionary Dc+1 is initialized by K-SVD on Y.
While the stop criterion is not reached
  Update X: solve Eq. (11) by FISTA and TwIST
  Update Di: solve Eq. (18) by MOCOD
  Update Dc+1: solve Eq. (22) by MOCOD
End While
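The alternating structure of Algorithm 1 can be sketched as follows. This is a simplified, hypothetical skeleton: plain ISTA stands in for the FISTA/TwIST coding step, a projected gradient step stands in for MOCOD, and the discriminative and incoherence terms of the CDSDL objective are omitted for brevity.

```python
import numpy as np

def soft(v, t):
    """Element-wise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def alternating_dictionary_learning(Y, k, lam=0.1, n_outer=10, n_inner=50):
    """Alternate a sparse-coding step (D fixed) with a dictionary step (X fixed)."""
    rng = np.random.default_rng(0)
    D = rng.standard_normal((Y.shape[0], k))
    D /= np.linalg.norm(D, axis=0)                       # unit l2-norm atoms
    X = np.zeros((k, Y.shape[1]))
    for _ in range(n_outer):
        # --- coding step: ISTA on min_X ||Y - D X||_F^2 + lam ||X||_1 ---
        step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
        for _ in range(n_inner):
            X = soft(X - step * 2.0 * D.T @ (D @ X - Y), step * lam)
        # --- dictionary step: one gradient step on ||Y - D X||_F^2 ---
        D -= (D @ X - Y) @ X.T / (np.linalg.norm(X, 2) ** 2 + 1e-8)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-8)  # re-project atoms
    return D, X
```

The outer loop mirrors the While block above; each sub-problem decreases the (simplified) objective, so the reconstruction residual shrinks monotonically up to the atom re-normalization.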
3.3.1. Update of X

With fixed D, J can be rewritten as follows.
(10)
Equation (10) can be rewritten as follows.
(11)
Because F(Xi) is convex and differentiable with respect to Xi, the fast iterative shrinkage-thresholding algorithm (FISTA)42) and the two-step iterative shrinkage/thresholding (TwIST) algorithm45) can be employed to solve Eq. (11).
Calculating the first derivative of F(Xi) with respect to Xi (detailed steps are listed in Appendix 1), we have
(12)
According to the FISTA and TwIST algorithms, we have
(13)
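The FISTA update used for the coding sub-problem can be sketched as follows, assuming a standard l1-regularized least-squares form (the actual Eq. (11) contains additional terms that enter through the gradient of Eq. (12)). Each iteration takes a soft-thresholded gradient step from an extrapolated point z rather than from the previous iterate, which accelerates plain ISTA.

```python
import numpy as np

def fista(y, D, lam=0.1, n_iter=100):
    """FISTA for min_x ||y - D x||_2^2 + lam * ||x||_1."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2                # Lipschitz constant
    x = np.zeros(D.shape[1])
    z = x.copy()
    t = 1.0
    for _ in range(n_iter):
        g = z - 2.0 * D.T @ (D @ z - y) / L            # gradient step from z
        x_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x
```

The momentum sequence t gives FISTA its O(1/k^2) convergence rate, versus O(1/k) for plain ISTA.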
3.3.2. Update of Di

With fixed X and D-j, j ≠ i, J can be rewritten as follows.
(14)
Then, we have
(15)
Denote A = [Y, Y]∈
(16)
Therefore
(17)
Denote
(18)
Equation (18) can be solved by the method of optimal coherence-constrained directions (MOCOD).44)
3.3.3. Update of Dc+1

With fixed X and D-(c+1), J can be rewritten as follows.
(19)
Because
(20)
Denote A = Y – DQ−(c+1)(Q−(c+1))TX, B = (Y1−D1
(21)
Denote U = [A, B], V = [Xc+1, Xc+1], Eq. (21) is equivalent to the following problem.
(22)
Similar to Eq. (18), Eq. (22) can be solved by the MOCOD algorithm.
3.4. Classification

For the CDSDL algorithm, both the reconstruction errors associated with different classes and the coding vector can achieve good classification performance. Based on the coding vector, two kinds of classifier are used, as follows.
For a global classifier, a test sample
(23)
According to arg mini
For a local classifier, we stack each class-specific sub-dictionary Di with the shared sub-dictionary Dc+1 to construct c compositional dictionaries. A test sample
(24)
According to arg mini
In fact,
The
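The local classifier described above can be sketched as follows (a hypothetical illustration, not the authors' code): each class is scored by coding the test sample over the compositional dictionary [Di, Dc+1] and measuring the reconstruction residual, and the class with the smallest residual wins. Any sparse coder can be plugged in; the usage below substitutes a plain least-squares coder for simplicity.

```python
import numpy as np

def local_classify(y, class_dicts, shared_dict, coder):
    """Score each class by the residual of coding y over [D_i, D_shared]."""
    residuals = []
    for Di in class_dicts:
        Dcomp = np.hstack([Di, shared_dict])   # compositional dictionary
        x = coder(y, Dcomp)                    # any sparse coder can be used
        residuals.append(np.linalg.norm(y - Dcomp @ x))
    return int(np.argmin(residuals))           # class with smallest residual

# Usage with a least-squares coder standing in for a sparse coder:
rng = np.random.default_rng(4)
class_dicts = [rng.standard_normal((12, 3)) for _ in range(3)]
shared = rng.standard_normal((12, 2))
lstsq_coder = lambda y, D: np.linalg.lstsq(D, y, rcond=None)[0]
y = class_dicts[1] @ np.array([1.0, -2.0, 0.5])   # lies in class 1's span
pred = local_classify(y, class_dicts, shared, lstsq_coder)
```

Because the test sample is constructed in the span of class 1's sub-dictionary, its residual under that compositional dictionary is essentially zero while the other residuals stay positive.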
The dataset of steel surface images shown in Fig. 1 was collected from a practical steel sheet production line. The dataset comprises four kinds of typical surface defects (Folding, Line, Patch and Scratch) and one non-defective kind (Original). There are 300 8-bit grayscale images per class, each of size 200×200 pixels. Each image is resized from 200×200 to 40×40 pixels to mitigate over-fitting. Because the brightness of the images is non-uniform, each image is normalized to zero mean and unit l2-norm. We randomly choose 50% of the images per class as the training dataset; the rest form the testing dataset. The classification accuracy is defined as follows.
Accuracy = (number of correctly classified samples / total number of testing samples) × 100%   (25)
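Eq. (25) amounts to the following one-liner (an illustrative helper, not part of the original code):

```python
def accuracy(predictions, labels):
    """Eq.-(25)-style accuracy: correct classifications as a percentage of the
    total number of testing samples."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)
```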
The relationship between classification accuracy and sub-dictionary size is shown in Fig. 3. Figure 3(a), in which the size of the class-specific sub-dictionaries is fixed, shows that classification performance decreases as the size of the shared sub-dictionary increases. The likely reasons are that a small shared sub-dictionary suffices to capture the shared features of the defect images, and that a larger one may introduce redundancy into the feature vector. Figure 3(b), in which the size of the shared sub-dictionary is fixed, shows that classification performance improves as the size of the class-specific sub-dictionaries increases, presumably because a larger class-specific sub-dictionary captures more discriminative information. In general, a larger dictionary can represent details more faithfully and achieve better classification performance, at the expense of increased computational load, and once the class-specific sub-dictionary reaches a certain size, classification performance no longer improves. A tradeoff must therefore be made between classification performance and computational efficiency. As shown in Fig. 3, with a shared sub-dictionary of size ks = 2 and class-specific sub-dictionaries of size kc = 25, CDSDL still achieves a high classification accuracy of 94.25%; from kc = 50 to kc = 25, the accuracy drops by only 1.58%. Therefore, the CDSDL algorithm is a simple and efficient way to learn a compact, reconstructive and discriminative dictionary that boosts classification performance while reducing computational load.
The classification accuracy versus the number of atoms in sub-dictionary, where the legend denotes the number of atoms in the class-specific sub-dictionary (top) and the shared sub-dictionary (bottom), respectively.
Figure 4 shows that the objective function value of the CDSDL algorithm decreases rapidly and converges within about 5–8 iterations.
The convergence curve of the CDSDL algorithm.
Figure 5 shows that the global coding vector matrices of the training and testing datasets are nearly block-diagonal. Each block indicates the main part of the coding matrix for the corresponding class-specific sub-dictionary. The CDSDL algorithm therefore enhances the discrimination of the coding vectors, which improves classification performance.
The visualization image of global coding vector matrix (Folding, Line, Non-defect, Patch, Scratch): training dataset (top); testing dataset (bottom).
The CDSDL algorithm is compared with several state-of-the-art methods, including COPAR,35) FDDL,33) LCKSVD31) and SRC.22) As shown in Table 2, CDSDL with a local classifier achieves 94.25% classification accuracy, compared with 91.85% for COPAR, 51.89% for FDDL, 66.99% for LCKSVD and 56.96% for SRC. Compared with SRC, the baseline method in this experiment, CDSDL improves the classification accuracy by more than 37 percentage points; it outperforms FDDL by about 42.36 percentage points and exceeds the 91.85% achieved by COPAR by 2.4 percentage points. CDSDL therefore classifies better than all of these methods. In addition, the local classifier of CDSDL (94.25%) performs much better than the global classifier (83.51%), an improvement of about 10 percentage points. The probable reason is that the number of training samples per class is relatively large, or that the learned class-specific sub-dictionaries are large.
Method | Accuracy (%)
--- | ---
COPAR | 91.85±1.02
FDDL | 51.89±4.31
LCKSVD | 66.99±1.42
SRC | 56.96±1.20
CDSDL | 94.25±0.42
According to Fig. 6, the error rate for scratch is higher than for the other defects: 33 scratch images are misclassified, 9 as folding, 10 as line and 13 as patch. The possible reasons are as follows: (1) these defects are similar to scratch in size, direction and illumination; (2) the CDSDL algorithm is sensitive to the scratch defect and thus generates an improper feature representation. We will investigate this phenomenon further. Figure 7 shows some scratch samples that are misclassified as other defects.
The confusion matrix, where the element in the i-th row and j-th column indicates the number of j-th class samples misclassified as the i-th class, j ≠ i.
Some scratch samples misclassified as other defects: (a) Scratch→Folding; (b) Scratch→Line; (c) Scratch→Patch.
To evaluate the robustness of the CDSDL algorithm, additive Gaussian noise was applied to the surface images. We again randomly choose 50% of the images per class as the training dataset, with the rest as the testing dataset. Gaussian noise at different signal-to-noise ratios (SNR) of 15 dB, 20 dB and 25 dB is added to all the training samples. Figure 8 shows the same image corrupted by Gaussian noise at each SNR. According to Table 3, CDSDL still achieves 92.45% classification accuracy at the 25 dB noise level.
The same image corrupted by Gaussian noise: (a) 15 dB; (b) 20 dB; (c) 25 dB; (d) Original.
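The noise injection can be sketched as follows: given a target SNR in decibels, the noise variance is chosen so that the ratio of mean signal power to noise power equals 10^(SNR/10). This is a generic implementation, not the authors' exact procedure.

```python
import numpy as np

def add_awgn(img, snr_db, rng=None):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db,
    where P denotes the mean squared value."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = img.astype(float)
    p_signal = np.mean(x ** 2)                       # mean signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))   # target noise power
    return x + rng.normal(0.0, np.sqrt(p_noise), x.shape)
```

For a 200×200 image the empirical SNR of the corrupted result matches the target to within a small fraction of a decibel.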
SNR (dB) | Accuracy (%)
--- | ---
15 | 65.77±2.50
20 | 85.88±1.63
25 | 92.45±0.29
A classification method based on class-specific and shared sub-dictionary learning has been presented in this paper. The class-specific and shared sub-dictionary learning are jointly formulated with a reconstruction error term and sparsity and discrimination-promoting constraints. Each class-specific sub-dictionary represents samples of the corresponding class well, while the shared sub-dictionary contributes to the reconstruction of all samples. The resulting feature representation, built from the class-specific sub-dictionaries and the shared sub-dictionary, not only faithfully represents samples from any class but is also more compact and discriminative for classification. Experimental results show that the CDSDL algorithm achieves better accuracy in classifying surface defects of steel sheet.
This work was supported by the Natural Science Foundation of China under grant No. 51174151 and by the Specialized Research Fund for the Key Science and Technology Innovation Plan of Hubei Province under grant No. 2013AAA011. The authors would like to thank Kechen Song and Yungang Tan for providing the defect images. The MATLAB code was revised and optimized from refs. 27), 33), 35) and 44).
Appendix 1: Derivation of F(Xi) in Eq. (12)
Denote
Denote
For the i-th class, we have
For the non-i-th classes, we have
Choosing the j-th part of the i-th class, we have
Separating Xi, we have
Expanding any non-i-th class, we have
Calculating the first derivative with respect to Xi, we have