2016 Volume 22 Issue 2 Pages 267-277
This paper presents a study on the rapid and noninvasive detection of waxed chestnuts using hyper-spectral imaging. A visible near-infrared (400–1026 nm) hyper-spectral imaging system was assembled to acquire scattering images from two groups of chestnuts (waxed and non-waxed). The spectra of the samples were extracted from the hyper-spectral images using an image segmentation process. Multiplicative scatter correction (MSC) was then applied to preprocess the original spectra. Effective wavelengths were selected to reduce the computational burden of the hyper-spectral data. Using the seven effective wavelengths obtained from a successive projections algorithm (SPA), three calibration algorithms were compared: partial least squares regression (PLSR), multiple linear regression (MLR) and linear discriminant analysis (LDA). The best model for discriminating between waxed and non-waxed chestnuts was found to be the MSC-SPA-MLR model.
The chestnut (Castanea mollissima Bl.) is an important edible fruit in the northern hemisphere that was once consumed as extensively as the potato (Ferreira-Cardoso et al., 1999) and has become increasingly important in human nutrition because of its nutrient composition and potential health benefits, for example, in reducing coronary heart disease and cancer rates (Sabaté et al., 2001). Frying is the most popular preparation method for chestnuts. However, some unscrupulous traders add industrial wax while frying chestnuts to make them brighter and more attractive. Industrial wax is generally extracted directly from oil, a process that also carries over the polycyclic aromatic hydrocarbons present in the industrial process, which are highly carcinogenic (Bonser et al., 1963). During frying, the industrial wax infiltrates the pulp of the chestnuts, which seriously endangers human health. Hence, a sorting process is necessary to identify and screen out waxed chestnuts.
Traditionally, waxed chestnuts are screened using subjective criteria, on the assumption that brighter and more attractive chestnuts have been treated with industrial wax. Although industrial wax does make chestnuts brighter and more attractive, appearance is not a reliable way to identify waxed chestnuts, because many other factors, such as chestnut variety and roasting process, also affect it. Alternatively, other inspection techniques such as chemical procedures, screening methods, and instrumental methods have been used to provide information about chestnut quality (Prieto et al., 2009). These methods usually use physical techniques and chemical reactions to determine whether chestnuts are waxed or not. However, they are time-consuming, inefficient, destructive, and energy-intensive, and are therefore unsuitable for rapid identification of waxed chestnuts.
Hyper-spectral imaging (also called imaging spectrometry or imaging spectroscopy) is an emerging analytical technology that is receiving growing attention as a rapid, efficient, and non-destructive tool for quantitative determination of the quality and safety of agricultural goods. By combining the advantages of computer vision and spectroscopy in one system, hyper-spectral imaging can generate a spatial image of the spectral variation, which provides structural, chemical, physical and functional information about the samples (Zhang and He, 2013). In a hyper-spectral image, a complete spectrum is recorded for each pixel of the sample over the whole available spectral range. By combining the spectra of all pixels, a three-dimensional hyper-spectral image called a “hypercube” can be established, which has two dimensions that represent the spatial information and a third dimension that describes the spectral information (Wu et al., 2013). Owing to the unique potential of imaging spectroscopy, many researchers have applied this analytical technique to many different types of products, including meat (Kamruzzaman et al., Barbin et al., 2012), fruit (Isaksson and Næs, 1988, Rajkumar et al., 2012), and vegetables (Zhang and Slaughter, 2011, Diezma et al., 2013). However, a hyper-spectral imaging system acquires a substantial number of hyper-spectral images, each composed of a large amount of information, which complicates the process of predicting the value of any single dependent variable.
One way to overcome this problem is to implement the hyper-spectral imaging technique in conjunction with multivariate methods which will preprocess the original obtained spectrum and decrease the amount of data by identifying effective wavelengths for rapid and accurate quantitative or qualitative analysis of food quality (Liu et al., 2014b). Preprocessing of the original spectrum is necessary to make the samples easier to handle and establish a stable and reliable basis for the forecasting model. The preprocessing aims to remove spectral noise, extract useful information, and to weaken or eliminate all unrelated factors impacting the spectrum. Spectrum preprocessing is particularly important for the modelling process, and plays a very important role in the stability of the model (Lu et al., 2010). Selection of effective wavelengths by multivariate analysis is also an important step because the removal of highly correlated variables produces better prediction and a simpler process. The effective wavelength selection algorithms usually consist of choosing a small subset of effective wavelengths that carry the most important information of the whole spectrum. This approach reduces the complexity of the data and improves the stability and predictive capability of the model. In addition, the selection of effective wavelengths saves overall analysis time, making the model more suitable for online automated quality control systems (Liu et al., 2014a). Well-selected effective wavelengths may be even more efficient than using the whole spectrum range in producing robust calibration models that are simple, cost-effective and amenable to automated industrial applications (Wold et al., 1996).
In recent years, the visible-NIR hyper-spectral imaging technique has been employed in a wide range of applications to quantify and control quality parameters with high precision. In particular, it has been used to assess the quality and safety of agricultural and sideline products (Liu et al., 2003, Shahin and Symons, 2012). Li et al. (2011) applied principal component analysis (PCA) and a band ratio coupled with a simple threshold method to develop an algorithm for detecting defects on oranges with a visible-NIR hyper-spectral imaging system. Liu et al. (2014a) applied PCA and partial least squares regression (PLSR) to identify a subset of information-rich wavelengths and developed statistical models to demonstrate the potential of visible-NIR hyper-spectral imaging as an objective and non-destructive method for rapid determination of the color and pH of porcine meat during the salting process. A successive projections algorithm (SPA) was demonstrated to be a useful method for selecting effective wavelengths to improve the stability and predictive capability of a model established by multiple linear regression (MLR) to determine the water content of beef slices (Wu et al., 2013). To the best of our knowledge, no previous investigations have been reported on the detection of waxed chestnuts using hyper-spectral imaging.
In summary, the purpose of this study is to detect waxed chestnuts using hyper-spectral imaging in the visible-NIR region. The specific objectives of this study are to:
Sample Preparation A total of 440 chestnut samples with similar shape (hemisphere-shaped), size (2.5–3 cm in length, 2–2.5 cm in width and 2 cm in height), and color (brown) were collected from two different regions (Yanshan, China and Taishan, China). Before processing, all samples were mixed together and then randomly divided into two groups (1 and 2) of 220 chestnuts each. Both groups were fried for the same time (30 min), at the same temperature (about 300°C) and in the same chestnut frying machine (model WL-25; Run Lian Company, Hebei, China), except that industrial wax (No. CAS 8020-83-5; colorless) was used during frying for group 2 but not for group 1. After processing, all samples were placed in cold storage (4°C) for further study. Before the hyper-spectral images were acquired, all chestnuts were removed from storage and kept at room temperature (25°C) for 4 h to ensure that the experimental conditions were consistent.
Hyper-spectral Imaging System Each sample was imaged individually using a line-scan push-broom imaging spectrometry system, as illustrated in Figure 1. This imaging spectrometry system mainly consists of: an imaging spectrograph (ImSpector V10E, Spectral Imaging Ltd., Oulu, Finland) covering a spectral range of 400–1026 nm; a high-resolution 1392 × 1040 digital camera (Imperx, IGV-B1410M-SC000, USA) for the spectral range 400–1026 nm with a size of 8.978 × 6.708 mm; a camera lens (Schneider Kreuznach, Germany) for the spectral range 400–1026 nm with a numerical aperture of 2.4; a computer that runs the imaging spectrograph software (Spectral Image-VINR and HIS Analyzer; Wuling Company, Taiwan, China) to control the exposure time and wavelength range, generate spatial maps and extract useful spatial information; a specially assembled light unit consisting of a 150-W tungsten halogen lamp and two quartz rectangular lampshades as the light source (Model 3900; Illumination Technologies, Inc., New York, USA); a conveyer belt operated by a stepper motor (Model WN232TA300M-F; Weinaguangke Company, Beijing, China); and a computer running the stepper motor software (DyiTV1.1.5; Weinaguangke Company, Beijing, China), which controls the motor speed, acceleration and travel distance. To preserve the spatial shape of the object in the image, the motor speed was set to 420 µm/s with an exposure time of 2.5 ms during image acquisition. The illumination was focused on the surface of the sample at the same height as the focal plane of the camera lens. The width of each light line on the sample was 12 mm, which was sufficient to cover the detector's field of view. The camera lens was positioned 280 mm above the surface of the samples, and this height could be adjusted by ±10 mm to obtain the best image. The resolution of the imaging spectrograph was 2.73 nm.
The detector performed linear slit scanning along the Y-axis as the sample moved along the X-axis to generate the hyper-spectral images. The working spectral range of the imaging spectroscopy system is 375–1,026 nm with 1,040 spectral bands; however, the spectral range of the tungsten halogen lamp is 400–2,500 nm, resulting in a usable spectral range of 400–1,026 nm. The key steps of the whole procedure are presented in Figure 2. Details of each of the three steps are provided in the following sections.
Hyper-spectral Imaging System.
Key steps in the experimental procedure
Image Processing
Image Acquisition Each sample was scanned line by line by the detector to acquire spectral information over the whole range of the hyper-spectral imaging system. A two-dimensional image (y, λ) with one spatial dimension (y) and one spectral dimension (λ) was acquired at each step. A complete hypercube was generated as the linear slit scanned along the x-direction. The imaging scanning software Spectral Image-VINR integrated all the images into one hyper-spectral image, called a “hypercube”, with dimensions x, y and λ for each sample. In total, hyper-spectral images of 440 chestnut samples were acquired. The hyper-spectral data of all samples were then divided into a calibration set with 240 samples (120 waxed and 120 non-waxed chestnuts) and a prediction set with 200 samples (100 waxed and 100 non-waxed chestnuts) using the Kennard-Stone (K-S) algorithm (Saptoro et al., 2012).
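The Kennard-Stone split mentioned above can be sketched as follows. This is a minimal illustration that assumes Euclidean distance between mean spectra; the function name `kennard_stone` and the toy data are hypothetical, and the text does not say whether the algorithm was run on the pooled samples or separately on each class.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X (n_select >= 2) chosen by Kennard-Stone."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances computed from the Gram matrix.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    dist = np.sqrt(np.maximum(d2, 0.0))
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that is farthest from the selected set.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

# Toy data: five "spectra" with two bands each (made-up values).
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [5.0, 5.0]])
calibration_idx = kennard_stone(X_toy, 3)
prediction_idx = [k for k in range(len(X_toy)) if k not in calibration_idx]
```

The selected indices would form the calibration set and the rest the prediction set, so the calibration samples span the spectral space as evenly as possible.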
Calibration of Hyper-spectral Images The original hyper-spectral images were calibrated with white and dark reference images produced by the imaging analyzer software HIS Analyzer in order to obtain reflectance hyper-spectral images. The white reference image (W) was acquired under the same conditions as the raw images (Io) using a white Teflon board (about 99.99% reflectance). The dark reference image (D1) was obtained by turning off the light source and completely covering the camera lens with its black cap. The sample dark reference image (D2) was obtained in the same way as D1 but with the same exposure time as the raw images of the chestnuts, since the raw images were obtained with a different exposure time than the dark reference image D1. The reflectance hyper-spectral image (Ic) was then calculated from Eq. (1) as follows (ElMasry et al., 2009):
![]() |
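A minimal sketch of this calibration step, assuming Eq. (1) has the common two-reference form Ic = (Io − D2) / (W − D1) applied element-wise. That form, the function name `calibrate_reflectance` and the toy values are assumptions for illustration, not details confirmed by the study.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark_white, dark_sample):
    """Assumed form of Eq. (1): Ic = (Io - D2) / (W - D1), element-wise."""
    raw = np.asarray(raw, dtype=float)
    denom = np.asarray(white, dtype=float) - np.asarray(dark_white, dtype=float)
    # Added safeguard: mark pixels with a zero denominator as NaN.
    denom = np.where(denom == 0, np.nan, denom)
    return (raw - np.asarray(dark_sample, dtype=float)) / denom

# Toy one-pixel, two-band example (made-up detector counts).
Io = np.array([[60.0, 110.0]])
W = np.array([[210.0, 210.0]])
D1 = np.array([[10.0, 10.0]])
D2 = np.array([[10.0, 10.0]])
Ic = calibrate_reflectance(Io, W, D1, D2)
```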
Image Segmentation To extract spectral data from each chestnut in the hyper-spectral image, a segmentation step was developed to isolate the chestnut samples from the all-black background, using a customized process developed in Matlab R2011b, as shown in Figure 3. The segmentation first extracted the hypercube (Figure 3b) from the raw captured image shown in Figure 3a. The image at 430 nm (the lowest reflectance value in the hypercube) was then subtracted from the image at 950 nm (the highest reflectance value in the hypercube). This step produced an image with high contrast between the sample and the background, as shown in Figure 3c. The resulting image was then segmented by a simple global threshold obtained from the image histogram in Matlab R2011b, as shown in Figure 3d. This segmented image was called the “sample mask”, and contained only the chestnut part of the hyper-spectral image. The whole hypercube was masked by the sample mask to obtain the final segmented images, which contained only the sample on an all-black background, as shown in Figure 3e. The final segmented images define the regions of interest (ROI) that were used in the subsequent procedure and analysis. The segmentation process is as described by Wu et al. (2013).
Main steps involved in segmentation of hyper-spectral images. a The original color image, b the corresponding hyper-spectral image (400–1,026 nm), c image resulting from subtraction of two wavelengths: λ950 − λ430, d sample mask, e the corrected hyper-spectral image (400–1,026 nm).
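The band-difference and thresholding steps can be sketched as below. The study only states that a global threshold was taken from the image histogram; Otsu's method is used here as one assumed way of doing that, and the function names and the toy cube are hypothetical.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Global threshold that maximises the between-class variance of the histogram."""
    counts, edges = np.histogram(image, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    weight_bg = np.cumsum(counts)              # pixels at or below each bin
    weight_fg = weight_bg[-1] - weight_bg      # pixels above each bin
    sum_bg = np.cumsum(counts * centers)
    mean_bg = sum_bg / np.maximum(weight_bg, 1)
    mean_fg = (sum_bg[-1] - sum_bg) / np.maximum(weight_fg, 1)
    between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
    return centers[np.argmax(between)]

def segment_sample(cube, wavelengths, high_nm=950.0, low_nm=430.0):
    """Return (mask, masked_cube) for a reflectance cube shaped (rows, cols, bands)."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    hi = int(np.argmin(np.abs(wavelengths - high_nm)))   # band closest to 950 nm
    lo = int(np.argmin(np.abs(wavelengths - low_nm)))    # band closest to 430 nm
    diff = cube[:, :, hi] - cube[:, :, lo]               # high-contrast image
    mask = diff > otsu_threshold(diff)                   # sample mask
    return mask, cube * mask[:, :, None]                 # background set to zero

# Toy cube: black background with a 2 x 2 "chestnut" in the centre (made-up values).
wavelengths = [430.0, 700.0, 950.0]
cube = np.zeros((4, 4, 3))
cube[1:3, 1:3, :] = [0.1, 0.3, 0.6]
mask, masked_cube = segment_sample(cube, wavelengths)
```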
Spectral Analysis
Spectral Data Extraction and preprocessing After the image segmentation, spectral information was extracted from the final segmented images in the ROI. The mean spectral reflectance of the chestnuts in each segmented image was then calculated by averaging the spectral value of all pixels in the ROI. A total of 300 mean spectral reflectance values were obtained by following this procedure, with the help of customized software called HIS Analyzer which was developed using LabVIEW.
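Averaging over the ROI reduces each hypercube to one mean spectrum. A short sketch, assuming the cube is stored as (rows, cols, bands) and the sample mask is a boolean image; `mean_roi_spectrum` and the toy values are hypothetical.

```python
import numpy as np

def mean_roi_spectrum(cube, mask):
    """Average the spectra of all pixels where mask is True."""
    # Boolean indexing yields an array of shape (n_roi_pixels, n_bands).
    return cube[mask].mean(axis=0)

# Toy cube with two ROI pixels (made-up reflectance values).
toy_cube = np.zeros((2, 2, 3))
toy_cube[0, 0] = [0.2, 0.4, 0.6]
toy_cube[1, 1] = [0.4, 0.6, 0.8]
toy_mask = np.array([[True, False], [False, True]])
mean_spectrum = mean_roi_spectrum(toy_cube, toy_mask)
```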
After spectral data extraction, multiplicative scatter correction (MSC) was used for preprocessing. The MSC method uses linear regression of the spectral variables vs. the average spectrum and simultaneously corrects for both multiplicative and additive scatter effects (Isaksson and Næs, 1988). This method has attractive conceptual properties and has given many promising results in other studies (Helland et al., 1995, Geladi et al., 1985, Maleki et al., 2007). Details of MSC for spectral data preprocessing can be found elsewhere (Maleki et al., 2007).
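The regression step of MSC can be sketched as follows: each spectrum is regressed on the mean spectrum, and the fitted intercept (additive effect) and slope (multiplicative effect) are removed. This is a generic illustration of the method, not the software used in the study; the function name `msc` and the toy spectra are hypothetical.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction of spectra shaped (n_samples, n_bands)."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)       # average spectrum as reference
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        # Least-squares fit s ~ a + b * reference (polyfit returns slope first).
        b, a = np.polyfit(reference, s, deg=1)
        corrected[i] = (s - a) / b
    return corrected, reference

# Toy spectra: the same shape distorted by made-up offsets and scale factors.
base = np.array([0.2, 0.4, 0.5, 0.9])
toy_spectra = np.vstack([base, 2.0 * base + 0.1, 0.5 * base - 0.05])
corrected, reference = msc(toy_spectra)
```

In this toy case every spectrum is an exact linear function of the reference, so all corrected spectra collapse onto the reference. For new samples, the reference saved from the calibration set would normally be passed in rather than recomputed.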
Effective Wavelengths Selection In this study, SPA was applied to select the effective wavelengths that contribute most to the identification of waxed and non-waxed chestnuts without retaining redundant information. SPA is a variable selection algorithm designed to solve collinearity problems by selecting variables with minimal redundancy (Araújo et al., 2001). For this purpose, SPA employs a simple projection operation in a vector space to select subsets of variables with minimum collinearity. In what follows, the instrumental response data are arranged in a matrix X of dimensions (N × K) such that the kth variable xk is associated with the kth column vector xk ∈ ℝ^N. Let M = min (N − 1, K) be the maximum number of variables.
SPA comprises two phases. The first phase consists of projections carried out on the X matrix, which generate K chains of M variables each. Each element in a chain is selected in order to display the least collinearity with the previous ones. The second phase of SPA consists of evaluating candidate subsets of variables extracted from the chains generated in the first phase. The candidate subset of m variables starting from xk is defined by the index set {SEL (1, k), SEL (2, k), …, SEL (m, k)}. Since m ranges from one to M and k ranges from one to K, a total of M × K subsets of variables are tested. Different prediction performance metrics could be used to choose the best variable subset (Galvao et al., 2008).
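The first phase can be sketched as below: starting from a chosen column, all columns are repeatedly projected onto the subspace orthogonal to the last selected one, and the column with the largest remaining norm is taken next. This is a simplified illustration of the projection phase only; the second phase (scoring each candidate subset with a prediction metric) is omitted, and the function name `spa_chain` and the toy matrix are hypothetical.

```python
import numpy as np

def spa_chain(X, start, n_vars):
    """Phase 1 of SPA: build one chain of n_vars column indices starting at `start`."""
    Xp = np.array(X, dtype=float)          # working copy that gets projected
    chain = [start]
    for _ in range(n_vars - 1):
        v = Xp[:, chain[-1]].copy()        # last selected (already projected) column
        # Project every column onto the subspace orthogonal to v.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[chain] = -1.0                # never reselect a chosen column
        chain.append(int(np.argmax(norms)))
    return chain

# Toy matrix (N = 4 samples, K = 4 variables, made-up values):
# column 1 is collinear with column 0, column 2 is orthogonal to it.
X_toy = np.array([
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
chain = spa_chain(X_toy, start=0, n_vars=2)
```

Running the function for every starting column k and every length m up to M would give the M × K candidate subsets described above.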
Discrimination Models In this study, partial least squares regression (PLSR), linear discriminant analysis (LDA) and multiple linear regression (MLR) were used to establish discrimination models. PLSR and MLR are normally used for quantification rather than for discriminant analysis. To use them for discrimination, the two classes were coded numerically (1 for non-waxed chestnuts and 2 for waxed chestnuts), and a threshold of 1.5 on the predicted value was used to separate the two classes. PLSR is a method that specifies a linear relationship between a set of predictor variables, X, and a set of dependent (response) variables, Y. The general idea of PLSR is to extract the orthogonal or latent predictor variables that account for as much of the variation of the dependent variable(s) as possible (Farifteh et al., 2007). A detailed description of the PLSR technique can be found in Wold et al. (2001). LDA is a well-established statistical technique. For a two-class problem, one canonical discriminant function is constructed for classification between the two classes. The discriminant function is formulated as a linear combination of the feature variables:
![]() |
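The class coding described above (1 for non-waxed, 2 for waxed, threshold 1.5) can be sketched with an ordinary least-squares MLR fit. This is a minimal illustration on made-up one-variable data; the function names and values are hypothetical and are not the coefficients of the study.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares coefficients (intercept first) for y ~ b0 + X @ b."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_class(coef, X, threshold=1.5):
    """Predicted value below the threshold means class 1, otherwise class 2."""
    A = np.column_stack([np.ones(len(X)), X])
    scores = A @ coef
    return np.where(scores < threshold, 1, 2), scores

# Toy calibration data: one reflectance variable, labels 1 (non-waxed) and 2 (waxed).
X_cal = np.array([[0.1], [0.2], [0.8], [0.9]])
y_cal = np.array([1.0, 1.0, 2.0, 2.0])
coef = fit_mlr(X_cal, y_cal)
labels, scores = predict_class(coef, np.array([[0.3], [0.7]]))
```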
Overview of the spectra The curves of the mean spectral reflectance values of the two groups are shown in Figure 4. In the visible spectral region, the reflectance values increase as the wavelengths grow, which corresponds to the red-brown color and chemical component contents of the chestnut. In the near-infrared spectral region, the reflectance values are enhanced significantly, which means low absorption in the near-infrared spectral region of the chestnut. Comparison of the two curves shows that the mean reflectance of the waxed chestnuts is much lower than the non-waxed chestnuts within the near-infrared spectral region, and slightly lower than the non-waxed chestnuts within the 400–630 nm wavebands. This may be because the face of the waxed chestnuts is covered with a layer of industrial wax which affects the reflection spectrum of the chestnut. Industrial wax is a complex mixture which contains many kinds of hydrocarbons and other non-hydrocarbon compounds, 95% of which are carbon and hydrogen and the other 5% are hetero atoms. The absorption band of C-H in alkane and aromatics is located in the near-infrared spectral region (700–1026 nm in Figure 4); the absorption band of 400–630 nm may be due to the hetero atom (Burns and Ciurczak, 2007). However, when many compounds are considered, their spectral curves may overlap, resulting in a difficult determination task for the content of compounds by direct observation of the spectra (Wu et al., 2013). For discrimination of waxed chestnuts, further research into other methods should be conducted.
Mean spectral reflectance values of the two groups of chestnuts.
Spectral Variation of Chestnut Samples PCA is a technique employed to interpret spectral data by identifying the directions of greatest variability in a multivariate data space (ElMasry et al., 2011). In this study, PCA was applied to identify variations between tested samples that could be attributed to differences in their spectral information. The first two principal components explained only 64.634% of the variation between samples, while the first three principal components explained 92.387%. Figure 5 shows the score plots for the first three principal components from PCA applied to all spectral data extracted from all hyper-spectral images of the chestnut samples. The waxed chestnut group is located on the negative side of PC1 and the positive side of PC3, while most of the non-waxed chestnut group is on the positive side of PC1 and PC3 but lower than the waxed chestnut group. An obvious differentiation can be observed between the two groups of chestnuts. However, the samples cannot be separated into two classes completely, demonstrating the difficulty of classifying the chestnut samples into a waxed group and a non-waxed group based on the PCA score plot alone. Thus, further algorithms are needed to classify chestnuts into waxed and non-waxed groups with high accuracy.
Representation of the score plots of PCA conducted on the hyper-spectral data.
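Scores and explained-variance ratios of the kind reported above can be obtained from a singular value decomposition of the mean-centred spectral matrix. A minimal sketch on random stand-in data; the function name `pca_scores` is hypothetical and the numbers it produces are unrelated to the study.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """Return PCA scores and explained-variance ratios via SVD."""
    Xc = X - X.mean(axis=0)                         # mean-centre each wavelength
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)             # variance share per component
    scores = U[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]

# Random stand-in for a (samples x wavelengths) spectral matrix.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 5))
scores, explained = pca_scores(X_demo, n_components=3)
cumulative = np.cumsum(explained)                   # e.g. share explained by PC1..PC3
```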
Spectral Data Preprocessing Before multivariate statistical analysis, the original spectral matrix (X) was treated with the preprocessing algorithm MSC. In order to facilitate the comparison of the subsequent models, the preprocessed spectral data was saved in a matrix (XM). In the following experiments, each discrimination model will be established using spectral matrices X and XM.
Effective Wavelengths Selection Effective wavelength selection is an important step that reduces redundant information in the hyper-spectral images by retaining only the most important wavelengths. According to previous research (Wu et al., 2011), effective wavelengths may be more efficient than the full spectrum because they contain the information most relevant to the discrimination. In this study, SPA was applied to select effective wavelengths from the original spectral matrix (X) and the preprocessed spectral matrix (XM). The numbers of variables included in the model were selected using Matlab R2011b, and are shown in Figure 6. Figure 6 shows that four variables were finally selected for the original spectral matrix (X) and seven variables for the preprocessed spectral matrix (XM). For these two matrices, four (402.8, 516.8, 977.0 and 1026.5 nm) and seven (401.0, 433.5, 607.4, 731.0, 952.1, 1023.3 and 1026.5 nm) wavelengths were identified as important wavelengths. This suggests that the wavelengths between 400 nm and 630 nm were selected because of the hetero atoms, and those between 700 nm and 1026 nm because of the C-H absorption band of alkanes and aromatics. These specific wavelengths were selected because they carry the largest differences in information between waxed and non-waxed chestnuts. They all lie in the spectral regions where the mean reflectance curves of the two groups differ, as shown in Figure 4. These wavelengths were then used as effective wavelengths instead of the whole spectrum for the discrimination models.
Number of variables included in the SPA model.
Multivariate Statistical Analysis After spectral data preprocessing, PLSR was used to develop calibration models using the whole spectral wavelengths, while PLSR, MLR, and LDA were used with the effective wavelengths. The selection of the best calibration method is important in spectral analysis. For different types of spectral data, the best calibration method may be different. Therefore, the results of different calibration methods need to be compared in order to select the best method. PLSR models based on different spectral matrices were compared in order to verify the effectiveness of MSC and SPA, and the results are shown in Figure 7. When MSC was analyzed, the preprocessed spectral data models exhibited lower root mean square error (RMSE) and a higher coefficient of association r than the whole spectra models and the effective wavelengths models based on the original spectral matrix (X). When SPA was considered, the MSC-SPA-PLSR model obtained lower RMSE and higher r. In addition, once the effective wavelengths were selected, the number of spectral data variables was greatly reduced from 998 to 4 and 7 for X and XM, which not only reduced the burden of data processing but also improved the stability of the models. The above results show that the preprocessed effective wavelength-based models are better than the corresponding full spectra-based models.
PLSR models based on different spectral matrices: a PLSR model, b MSC-PLSR model, c SPA-PLSR model and d MSC-SPA-PLSR model.
MLR is another commonly used calibration algorithm. Although MLR cannot be used with whole spectral wavelengths and is easily affected by collinearity, it can be effective to apply SPA to select effective wavelengths for MLR models, because SPA can reduce the number of variables and collinearity of the spectral data effectively. The obtained MSC-SPA-MLR equation for a spectral matrix XM (Eq. 10) is shown as follows:
![]() |
The results of the comparison between the PLSR and MLR models are shown in Table 1. The results of the MSC-SPA-MLR model are superior to the MSC-SPA-PLSR model, with a decrease in the RMSE of 8.39% and an increase in r of 2.41%. Therefore, MSC-SPA-MLR can be considered to be better than MSC-SPA-PLSR for establishing a quantitative model for waxed chestnuts discrimination.
Spectral set | model | Number of variables | r | RMSE |
---|---|---|---|---|
X | PLSR | 998 | 0.8764 | 0.2408 |
X | SPA-PLSR | 4 | 0.8800 | 0.2398 |
X | SPA-MLR | 4 | 0.8974 | 0.2315 |
XM | MSC-PLSR | 998 | 0.9182 | 0.2379 |
XM | MSC-SPA-PLSR | 6 | 0.9201 | 0.2204 |
XM | MSC-SPA-MLR | 6 | 0.9423 | 0.2019 |
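The relative improvements quoted for MSC-SPA-MLR over MSC-SPA-PLSR follow directly from the table values. The short check below reproduces them; the `rmse` and `pearson_r` helpers use the standard textbook definitions, which are assumed here because the study's own formulas for these metrics are not shown.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def relative_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0

# Values taken from the table: MSC-SPA-PLSR versus MSC-SPA-MLR.
rmse_decrease = -relative_change(0.2204, 0.2019)   # about 8.39 %
r_increase = relative_change(0.9201, 0.9423)       # about 2.41 %
```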
LDA is another algorithm which can be used for classification. In this study, SPSS software was used to apply LDA to classify waxed and non-waxed chestnuts. The preprocessed effective wavelength-based calibration set was then calculated by LDA and the results of the histograms are shown in Figure 8. The figure shows that only four non-waxed samples are incorrectly classified in the waxed category. The obtained MSC-SPA-LDA equation for the spectral matrix XM (Eq. 11) is as follows:
![]() |
Calibration results of LDA from SPSS: a non-waxed chestnuts and b waxed chestnuts.
Once the models were set up, the prediction set with 200 samples (100 waxed and 100 non-waxed chestnuts) was classified using the MSC-SPA-MLR model and the MSC-SPA-LDA model by Eq. (10) and Eq. (11), respectively. The predicted results of the MSC-SPA-MLR and MSC-SPA-LDA models are shown in Figure 9, and the accuracy (in percent) of the predicted results is shown in Table 2. The MSC-SPA-LDA model obtained an accuracy of 94%, while the MSC-SPA-MLR model obtained an accuracy of 98%. These results suggest that the MSC-SPA-MLR model is the best model for discriminating between waxed and non-waxed chestnuts.
Prediction results of: a MSC-SPA-MLR, b MSC-SPA-LDA.
Model | Number of variables | Number of samples | Number of predicted non-waxed samples | Number of predicted waxed samples | Overall accuracy (%) |
---|---|---|---|---|---|
MSC-SPA-MLR | 6 | 100 (non-waxed) | 96 | 4 | 98.0 |
MSC-SPA-MLR | 6 | 100 (waxed) | 0 | 100 | 98.0 |
MSC-SPA-LDA | 6 | 100 (non-waxed) | 92 | 8 | 94.0 |
MSC-SPA-LDA | 6 | 100 (waxed) | 4 | 96 | 94.0 |
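The accuracies in the table are the share of correctly classified samples over both classes, which can be checked from the counts as follows (the function name `overall_accuracy` is hypothetical):

```python
def overall_accuracy(correct_non_waxed, correct_waxed, total):
    """Percentage of samples assigned to their true class."""
    return 100.0 * (correct_non_waxed + correct_waxed) / total

# Correctly classified counts taken from the table (200 prediction samples).
mlr_accuracy = overall_accuracy(96, 100, 200)   # MSC-SPA-MLR
lda_accuracy = overall_accuracy(92, 96, 200)    # MSC-SPA-LDA
```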
Visible-NIR hyper-spectral imaging was successfully used for the noninvasive detection of waxed chestnuts. MSC was shown to be a useful preprocessing method that improved the quality of the calibration models for all of the calibration algorithms tested and also improved the accuracy on the validation data. Rather than traditional data mining strategies, effective wavelength selection was used to reduce the dimensionality by exploiting the collinearity among the full set of wavelengths in the hyper-spectral images, resulting in seven effective wavelengths (401.0, 433.5, 607.4, 731.0, 952.1, 1023.3 and 1026.5 nm) selected by SPA. Compared with the MSC-SPA-PLSR and MSC-SPA-LDA models, the MSC-SPA-MLR model was shown to be the best quantitative model, with a low RMSE of 0.2019, a high r of 0.9423, and a high accuracy of 98%.
Further research should focus on selection of the effective wavelengths of the MSC-SPA-MLR model and development of a simpler and more effective multi-spectral system which will only obtain the effective wavelengths used by the MSC-SPA-MLR model. This will allow the multi-spectral system to be developed into small, convenient, rapid, accurate and noninvasive instruments for detecting waxed chestnuts.
Acknowledgments This work was partially supported by the National High Technology Research and Development Program of China (863 Program) (2013AA030602), the National Natural Science Foundation of China (61378060), the National Science Instrument Important Project (2011YQ14014704), and the Shanghai Municipal Natural Science Foundation (13ZR1427800).