Food Science and Technology Research
Online ISSN : 1881-3984
Print ISSN : 1344-6606
ISSN-L : 1344-6606
Technical paper
Dried Jujube Classification Based on a Double Branch Deep Fusion Convolution Neural Network
Lei Geng, Wenlong Xu, Fang Zhang, Zhitao Xiao, Yanbei Liu

2018 Volume 24 Issue 6 Pages 1007-1015

Abstract

A novel method based on a double branch deep fusion convolution neural network (DDFnet) is developed to classify dried jujubes. First, the structure of the network is designed as double branches. In one branch, the dataset of the jujubes is pre-trained with a model trained by a Squeezenet network on a large-scale ImageNet dataset. The other branch is founded on the structure of Squeezenet, which is composed of fire modules. The feature maps that are output by squeeze and expand convolution layers are fused into fusion modules. Next, a model trained on the dataset with DDFnet is used to achieve the multi-classification of jujubes. Finally, the dataset is classified by the model; it shows good performance with high accuracy rates of 99.6%, 99.8%, 98.5%, and 99.2% for the classification of plump, wizened, cracked, and defective jujubes, respectively. This research demonstrates the feasibility of DDFnet for sorting dried jujubes and enhancing product quality.

Introduction

The jujube (Zizyphus jujuba Mill.), one of the most important and representative fruits in China, is valued for its high nutritional content and is used as a traditional Chinese medicinal and tonic food (Pareek, 2013). The fruit is also an important raw material for the food industry. During picking and transportation, dried jujubes of different qualities become mixed, including fruit with starchy heads, cracked skin, mildew, and worm damage, which severely lowers the overall quality of the product (Li et al., 2009). Jujube quality strongly affects the economic returns of growers and of the jujube industry, and market prices vary with quality. Identifying defective jujubes and sorting jujubes by quality is therefore a crucial step for storage, transportation, and further processing. At present, however, jujube quality classification is primarily manual, which is labour intensive, costly, and inefficient. To meet the demands of the market and the jujube processing industry, an automatic, efficient, and nondestructive method for the multi-classification of dried jujubes is needed.

In recent years, with the rapid development of machine vision and image processing technology, the detection and classification of external fruit quality has made great progress. A range of technologies has been applied to defect detection, including computer vision systems (Zhao et al., 2013; Li et al., 2002), image processing algorithms (Rehkugler et al., 1989; Moradi et al., 2011), NIR transmittance spectroscopy (Khatiwada et al., 2016), X-ray imaging (Schatzki et al., 1997), hyperspectral imaging techniques (Yu et al., 2014; Wang et al., 2011), and convolution neural networks (Liu et al., 2017). However, because jujube cultivation is geographically limited, few studies at home or abroad have examined the detection and classification of jujube quality. At present, there are three primary types of jujube defect detection methods: traditional image processing, hyperspectral imaging, and support vector machines (SVM). Li et al. (2016) proposed a method based on the nonuniformity of the gradient distribution to separate wizened jujubes from a mix of wizened and plump jujubes, using normalized gradient histograms as the texture feature representation. Wu et al. (2016) found that hyperspectral imaging could identify common jujube defects (bruises, insect infestation, and cracks); the hyperspectral images of dried jujubes were evaluated through principal component analysis to select the optimal wavelengths for image recognition. Zhao et al. (2008) extracted the average value and the standard square deviation value of H in the HIS colour model as colour features of dried jujubes and identified defects using a support vector machine. Zhang et al. (2011) presented a machine vision recognition technique based on a least squares support vector machine optimized by particle swarm optimization. Lou et al. (2012) built an SVM model that combined FMs with RGB intensity to sort dried jujubes, and the trained model identified jujubes quickly and accurately. Nevertheless, these methods still have several disadvantages. Multi-classification of different jujube qualities has not yet been realized. Equipment for acquiring hyperspectral images is expensive and difficult to deploy for industrial quality inspection. The precision of identification also warrants further research.

In recent years, accompanied by the rapid development of deep learning, convolutional neural networks (CNNs) have achieved good results in image classification. Lecun et al. (1998) proposed the LeNet network, which was composed of convolutional layers, pooling layers, and fully connected layers. It showed high accuracy in handwriting classification, became a typical network structure, and laid the foundation for the later development of convolutional neural networks. Krizhevsky et al. (2012) proposed AlexNet, which successfully applied the Relu activation layer, local response normalization layer, dropout layer, and overlapping pooling layer in a CNN for the first time, thus accelerating the convergence of network training and reducing overfitting. Simonyan et al. (2014) proposed the VGGnet network, which removed the local response normalization layer and repeatedly stacked pooling layers and convolutional layers with small 3*3 and 1*1 kernels. Memory consumption and computation time were reduced, and pre-training was used to accelerate convergence in VGGnet. Szegedy et al. (2014) created the modular GoogLeNet structure. Its classification accuracy was improved by introducing the Inception module, and an average pooling layer replaced the fully connected layers. To avoid vanishing gradients, GoogLeNet added two auxiliary softmax classifiers for back-propagation. He et al. (2015b) proposed the deep residual network (ResNet) for image recognition. ResNet demonstrated that the depth of convolutional neural networks was crucial for image classification and identification tasks. The residual network solved the problem of vanishing and exploding gradients caused by increasing the depth of the network structure and improved the accuracy and generalizability of the network model. These different network structures show that a convolutional neural network is a stack of mappings. Many structures with higher image classification accuracy have been built by connecting layers such as convolutional layers, pooling layers, activation function layers, and fully connected layers.

Several measures have been adopted to optimize network structure and improve the classification accuracy of the model. He et al. (2015a) proposed the parametric rectified linear unit (PRelu), which generalized the traditional rectified linear unit and accelerated model fitting at almost zero extra computational cost. Oquab et al. (2014) proposed a transfer learning scheme for deep learning in which the network parameters for the target dataset are initialized from a CNN model pre-trained on a large dataset and then fine-tuned on the target dataset. Ioffe et al. (2015) introduced batch normalization (BN), which not only accelerated model convergence but also alleviated gradient dispersion in deep networks. Iandola et al. (2016) proposed SqueezeNet, which introduced fire modules that reduced the number of model parameters while achieving high recognition accuracy. To solve the vanishing gradient problem caused by increasing the number of layers, Huang et al. (2016) proposed the Dense Convolutional Network (DenseNet). Chen et al. (2017) proposed Dual Path Networks (DPN), a simple, efficient, and highly modular network that further enhanced the feature extraction capabilities of convolutional neural networks. The above measures can therefore be used to adjust the network structure and parameters in order to improve network performance.

Inspired by the convolutional neural network, a method for jujube quality classification based on a double branch deep fusion convolutional neural network (DDFnet) is proposed in this paper. Compared to traditional jujube quality identification methods, DDFnet achieved multi-classification and higher classification accuracy in jujube quality recognition.

Materials and Methods

Experimental materials    The dried jujubes (Zizyphus jujuba Mill.) are of the Xinjiang variety. The jujubes are transported to the laboratory from the Cangzhou jujube processing enterprise and are manually sorted into four qualities: plump, wizened, cracked, and defective. In total, 20,000 samples of dried jujubes are used to train and test the convolutional neural network: 5,000 plump jujubes, 5,000 wizened jujubes, 5,000 cracked jujubes, and 5,000 defective jujubes, such as fruit with skin cracks, diseased fruit, wormy fruit, and mildewed fruit. The jujube samples are divided into a training dataset and a validation dataset with a ratio of 4:1, so the training dataset is composed of 16,000 samples and the validation dataset of 4,000 samples. Representative images of the four qualities are shown in Fig. 1.

Fig. 1.

Typical surface appearance of the four qualities of jujube samples.

Image acquisition and pre-processing    The images of dried jujubes are acquired using a HIKVISION MV-CA003-50GC (Hangzhou Hikrobot Technology Co., Ltd, China) colour industrial camera with an 8-mm HIKVISION MVL-HF0828M-6MP (Hangzhou Hikrobot Technology Co., Ltd, China) lens. A brightness-adjustable ring light is used to provide the lighting conditions needed for dried jujube image acquisition. The jujube images are acquired by an image acquisition and preservation software system (version: MVS2.3.1 build20171129 (STD), freeware at http://www.hikrobotics.com/service/soft.htm).

The size of the original jujube images (color images) is 640×480, and there is redundant white background information in the images. To remove the redundant information, traditional image processing algorithms are used to extract the region of interest (ROI) from the original jujube images. First, the images are binarized to highlight the target contour of interest; the binarization threshold is 252. Next, the minimum square outline of a binary image is retrieved to extract the image's ROI. Finally, the dried jujube images are normalized to the same size, 227×227. Dried jujube pre-processed images are shown in Fig. 2.
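The three pre-processing steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function name `extract_roi`, the grayscale conversion by channel averaging, the reading of "minimum square outline" as the smallest square around the bounding box, and the nearest-neighbour resize are all assumptions. Only the threshold (252) and the output size (227×227) come from the text.

```python
import numpy as np

def extract_roi(image, threshold=252, out_size=227):
    """Crop the object from a white background and resize it to out_size x out_size.

    image: H x W x 3 uint8 array with a near-white background (assumed).
    """
    gray = image.mean(axis=2)                 # assumed grayscale conversion
    mask = gray < threshold                   # binarize: darker pixels are foreground
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("no foreground pixels below the threshold")
    crop = image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

    # Place the bounding-box crop at the centre of a white square canvas.
    h, w = crop.shape[:2]
    side = max(h, w)
    canvas = np.full((side, side, image.shape[2]), 255, dtype=image.dtype)
    y0, x0 = (side - h) // 2, (side - w) // 2
    canvas[y0:y0 + h, x0:x0 + w] = crop

    # Nearest-neighbour resize to the network input size.
    idx = np.arange(out_size) * side // out_size
    return canvas[idx][:, idx]

# Synthetic 640x480 example: a dark rectangle on a white background.
demo = np.full((480, 640, 3), 255, dtype=np.uint8)
demo[100:300, 200:350] = (120, 40, 30)
roi = extract_roi(demo)
print(roi.shape)
```

In practice a library resize with interpolation would likely give smoother results than nearest-neighbour sampling; the simple version is used here only to keep the sketch dependency-free.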

Fig. 2.

Image pre-processing of dried jujubes

Convolutional neural network and fusion module    Essentially, a convolution neural network is a multilayer perceptron (Gardner et al., 1998). The network reduces the number of weights and the complexity of the model by means of local connection and weight sharing. Labelled images can be fed directly to the network as training inputs, which avoids the complex feature extraction required by traditional image processing methods. The feature maps are extracted mainly by the convolution layers and pooling layers of the network. The gradient descent method (Yang et al., 1998) is used to minimize the loss function, and the weight parameters are updated layer by layer. The network is trained iteratively to improve classification accuracy. The software used for network training is Visual Studio 2013 (Microsoft, Inc.). A block diagram of network training and model testing is shown in Fig. 3.

Fig. 3.

Network training and model test block diagram

The lightweight network Squeezenet was proposed to reduce model complexity and the number of parameters in a CNN. Squeezenet is primarily composed of fire modules, each of which includes 3 convolutional layers (squeeze1*1, expand1*1, and expand3*3), 3 Relu layers, and 1 concat layer. The output feature maps of expand1*1 and expand3*3 are joined by the concat layer and used as the input to the next fire module. The design strategy of the fire module is to reduce the number of CNN parameters by using smaller convolution kernels and to obtain larger feature maps by delaying downsampling.

To improve the accuracy of jujube identification and classification, a fusion module is proposed as an improved version of the fire module. The fusion module is formed by changing the flow direction of the internal feature maps: the feature maps of squeeze1*1, expand1*1, and expand3*3 are all fused by the concat layer. The feature maps extracted from adjacent fusion modules are also merged to acquire more feature information. At the same time, BN layers and PRelu layers are introduced in the fusion module to accelerate network convergence. The fire module and fusion module are shown in Fig. 4.
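The difference between the two modules comes down to which feature maps reach the concat layer. The NumPy sketch below illustrates this under stated assumptions: the helper names, channel counts, and random weights are hypothetical, Relu is used in both modules for brevity (the text uses BN and PRelu in the fusion module), and the merging of adjacent fusion modules is omitted.

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W); w: (C_out, C_in); stride 1, no padding
    return np.einsum("oc,chw->ohw", w, x)

def conv3x3(x, w):
    # x: (C_in, H, W); w: (C_out, C_in, 3, 3); stride 1, pad 1 keeps H and W
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx],
                             xp[:, dy:dy + h, dx:dx + wd])
    return out

def relu(x):
    return np.maximum(x, 0.0)

def squeeze_expand(x, ws, we1, we3):
    s = relu(conv1x1(x, ws))      # squeeze1*1
    e1 = relu(conv1x1(s, we1))    # expand1*1
    e3 = relu(conv3x3(s, we3))    # expand3*3
    return s, e1, e3

def fire_module(x, ws, we1, we3):
    _, e1, e3 = squeeze_expand(x, ws, we1, we3)
    return np.concatenate([e1, e3], axis=0)       # expand outputs only

def fusion_module(x, ws, we1, we3):
    s, e1, e3 = squeeze_expand(x, ws, we1, we3)
    return np.concatenate([s, e1, e3], axis=0)    # squeeze output is fused too

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))                # hypothetical input feature maps
ws = rng.standard_normal((4, 8))
we1 = rng.standard_normal((16, 4))
we3 = rng.standard_normal((16, 4, 3, 3))
fire_out = fire_module(x, ws, we1, we3)
fusion_out = fusion_module(x, ws, we1, we3)
print(fire_out.shape, fusion_out.shape)
```

With these illustrative sizes the fusion output has four more channels than the fire output, which are exactly the squeeze feature maps that the fire module discards.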

Fig. 4.

Fire and fusion module.

Network structure design    To classify dried jujube quality, a double branch deep fusion convolution neural network (DDFnet) is proposed in this paper. Following a transfer learning strategy (Pan et al., 2010), the first branch is initialized with the model trained by the Squeezenet network on the large-scale ImageNet dataset and is then trained on the dried jujube dataset. This branch consists of 1 convolution layer (conv_1), 8 fire modules (fire_2∼fire_9), 2 maxpooling layers, several Relu layers, and concat layers. The core of the second branch is the fusion module; this branch is composed of 1 convolutional layer (conv_1), 8 fusion modules (fusion_2∼fusion_9), 2 maxpooling layers, several BN layers, PRelu layers, and concat layers. The feature maps from the last fire module and the last fusion module are merged through a concat layer. Conv10_1 and a global average pooling layer are used for classification instead of fully connected layers to reduce the number of parameters and relieve over-fitting.
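The classification stage described above (concat of the two branches, a 1*1 convolution, then global average pooling) can be sketched as follows. This is an illustration under assumptions: the feature map sizes and random weights are hypothetical, and the final softmax is assumed because the text does not name the output function.

```python
import numpy as np

def classification_head(feat_a, feat_b, w, b):
    """feat_a, feat_b: (C, H, W) maps from the two branches; w: (classes, C_a + C_b)."""
    fused = np.concatenate([feat_a, feat_b], axis=0)                 # concat layer
    scores = np.einsum("oc,chw->ohw", w, fused) + b[:, None, None]   # 1*1 convolution
    logits = scores.mean(axis=(1, 2))                                # global average pooling
    exp = np.exp(logits - logits.max())                              # softmax (assumed)
    return exp / exp.sum()

rng = np.random.default_rng(0)
feat_fire = rng.standard_normal((32, 5, 5))     # hypothetical first-branch output
feat_fusion = rng.standard_normal((36, 5, 5))   # hypothetical second-branch output
w = rng.standard_normal((4, 68)) * 0.1          # 4 classes: plump, wizened, cracked, defective
b = np.zeros(4)
probs = classification_head(feat_fire, feat_fusion, w, b)
print(probs.shape)
```

The head has only classes × channels weights plus one bias per class, independent of the spatial size of the feature maps, which is why replacing fully connected layers in this way reduces the parameter count.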

The second branch network fuses the feature maps of different convolution layers, which widens the network structure and improves the accuracy of the classification model. However, the second branch converges slowly during network training. To accelerate convergence, transfer learning is applied in the structure of DDFnet: the first branch is initialized with the model trained by the Squeezenet network. Combining the two branches further improves the generalization ability and classification accuracy of the model. The structure of the double branch deep fusion convolution neural network (DDFnet) is shown in Fig. 5.

Fig. 5.

Double branch deep fusion convolution neural network (DDFnet) structure.

Hyper-parameter setting and network training technique    The settings of hyper-parameters and the choice of network training techniques have a significant impact on model training. Appropriate hyper-parameters can reduce the number of parameters and the amount of calculation during network training. Before a network structure is built, all hyper-parameters need to be specified in advance, including the image pixel size, the number of convolution layers, and the convolution kernel parameters. In addition, network training techniques such as the selection of activation functions and the application of BN layers can accelerate network training and mitigate vanishing gradients.

Images with higher resolution are generally beneficial for network performance. In this paper, the jujube images are 227*227 pixels. Compared to large convolutional kernels, small convolutional kernels can increase network capacity and reduce the number of parameters. The convolution kernel sizes in DDFnet are primarily 3*3 and 1*1. Related parameters of different layers are shown in Table 1.

Table 1. Related parameters of different layers.
Layers        Conv_1   squeeze1*1   expand1*1   expand3*3   Conv_10   Pool_1/3/5
Kernel-size   3*3      1*1          1*1         3*3         1*1       3*3
Stride        2        1            1           1           1         2
Pad           0        0            0           1           0         0
The squeeze1*1, expand1*1, and expand3*3 layers belong to the Fire/Fusion module.
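The parameters in Table 1 determine the spatial size of each layer's output through the usual relation out = floor((in + 2·pad − kernel) / stride) + 1. The short sketch below applies it to the table values. The function name `output_size` is hypothetical, and floor rounding is an assumption, since some frameworks round up for pooling layers.

```python
def output_size(size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer (floor rounding assumed)."""
    return (size + 2 * pad - kernel) // stride + 1

after_conv1 = output_size(227, kernel=3, stride=2, pad=0)          # Conv_1 on a 227*227 input
after_pool1 = output_size(after_conv1, kernel=3, stride=2, pad=0)  # a 3*3, stride-2 pooling layer
after_squeeze = output_size(after_pool1, kernel=1, stride=1, pad=0)
after_expand3 = output_size(after_squeeze, kernel=3, stride=1, pad=1)
print(after_conv1, after_pool1, after_squeeze, after_expand3)
```

Because expand3*3 uses pad 1 with stride 1, its output has the same spatial size as that of expand1*1, which is what allows the concat layer to join the feature maps along the channel axis.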

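The Relu layer (Nair et al., 2010) can leave neurons inactive, because it outputs zero for every negative input. To speed up network convergence and mitigate vanishing gradients, a PRelu layer is introduced in the second branch network. The PRelu layer adds very few parameters and has almost no effect on the amount of calculation. Compared to Relu, PRelu corrects the data distribution and keeps negative values, scaled by a slope parameter that is updated through back-propagation. The PRelu function, in the form given by He et al. (2015a), is as follows:

$$f(x_i) = \max(0, x_i) + a_i \min(0, x_i)$$

where $x_i$ is the input on the $i$-th channel and $a_i$ is the learnable slope for negative inputs; when $a_i = 0$, PRelu reduces to Relu.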
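As a brief illustration of how PRelu differs from Relu, the sketch below uses the standard definition from He et al. (2015a), f(x) = max(0, x) + a·min(0, x). The slope value 0.25 and the sample inputs are illustrative choices, not values reported in this paper.

```python
import numpy as np

def prelu(x, a):
    """PRelu: identity for positive inputs, slope a for negative inputs."""
    return np.maximum(x, 0.0) + a * np.minimum(x, 0.0)

def prelu_grad_a(x):
    """Derivative of the output with respect to the slope a (nonzero only where x < 0)."""
    return np.minimum(x, 0.0)

x = np.array([-2.0, -0.5, 0.0, 1.5])
y_prelu = prelu(x, a=0.25)   # negative inputs are scaled, not zeroed
y_relu = prelu(x, a=0.0)     # a = 0 recovers plain Relu
print(y_prelu, y_relu)
```

Since the gradient with respect to the slope is nonzero for negative inputs, units that plain Relu would silence still receive a training signal.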

A transfer learning strategy is applied to the convolution neural network, which is of great significance for network optimization. A convolution neural network can extract features from a large number of data samples, but it is less effective with small sample sets. When the number of model parameters exceeds the number of data samples, over-fitting occurs easily: the network memorizes the training samples but fails to learn their common features. To address this problem, the first branch of DDFnet starts from a model trained by the Squeezenet network on the large-scale ImageNet dataset and is then trained on the jujube dataset.

An appropriate initialization scheme can help prevent vanishing gradients, and the initialization of parameters also affects the convergence rate of network training. The Xavier (Glorot et al., 2010) parameter initialization method is used to initialize the weights in the second branch network. This method lets feature information flow better through the network by keeping the variance of each layer as equal as possible. As a result, the signal is preserved between the input and output of each layer, and the model trains stably and converges quickly, which leads to a better optimization effect in classification.
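A minimal sketch of Xavier initialization follows, assuming the uniform variant from Glorot et al. (2010), which draws weights from U(−l, l) with l = sqrt(6 / (fan_in + fan_out)) so that the weight variance is 2 / (fan_in + fan_out). The text does not state which variant was used, and the layer sizes below are hypothetical.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Draw a (fan_out, fan_in) weight matrix from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

rng = np.random.default_rng(0)
fan_in, fan_out = 128, 256   # hypothetical; for a k*k convolution, fan_in = C_in * k * k
weights = xavier_uniform(fan_in, fan_out, rng)
target_var = 2.0 / (fan_in + fan_out)
print(weights.var(), target_var)
```

The empirical variance of the sampled weights should be close to the target value, which is the property that keeps activation and gradient magnitudes roughly constant from layer to layer.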

Results and Discussion

Classification models based on a double branch deep fusion convolution neural network (DDFnet)    The model trained by DDFnet is used to identify the jujube quality of the validation dataset; the numbers of misidentified images for plump, wizened, cracked, and defective jujubes are 4, 2, 15, and 8, respectively. The model performs well, with accuracy rates of 99.6%, 99.8%, 98.5%, and 99.2% for the classification of plump, wizened, cracked, and defective jujubes, respectively. The total classification accuracy is 99.3%. The classification results for the different jujube qualities are shown in Table 2.

Table 2. Different qualities of jujube classification accuracy rates.
            Number of incorrect identification           Total accuracy rate (%)
            Plump   Wizened   Cracked   Defective
Plump       -       1         0         3                 99.6
Wizened     0       -         2         0                 99.8
Cracked     3       5         -         7                 98.5
Defective   5       0         3         -                 99.2
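The accuracy rates in Table 2 can be reproduced from the misidentification counts with a short calculation. The sketch assumes that the 4,000 validation images are split evenly, 1,000 per class, which follows from the 4:1 split of 5,000 samples per class if the split is made class by class; the text does not state this explicitly.

```python
import numpy as np

classes = ["plump", "wizened", "cracked", "defective"]
# Rows: true class; columns: class the image was mistaken for (diagonal unused).
errors = np.array([
    [0, 1, 0, 3],
    [0, 0, 2, 0],
    [3, 5, 0, 7],
    [5, 0, 3, 0],
])
per_class_total = 1000   # assumption: even split of the 4,000 validation images

misidentified = errors.sum(axis=1)
accuracy = 100.0 * (per_class_total - misidentified) / per_class_total
overall = 100.0 * (1.0 - misidentified.sum() / (per_class_total * len(classes)))

for name, wrong, acc in zip(classes, misidentified, accuracy):
    print(f"{name}: {wrong} misidentified, accuracy {acc:.1f}%")
print(f"overall accuracy {overall:.3f}%")
```

Under this assumption the 29 misidentified images give an overall accuracy of 99.275%, which rounds to the 99.3% reported in the text.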

Comparing other convolutional neural networks    To the best of our knowledge, this report describes the first attempt to apply a convolutional neural network to the classification of dried jujubes. In the experiment, several convolutional neural networks from the field of image classification are used to classify the qualities of dried jujubes, with good results. The Lenet network consists of 2 convolution layers, 2 pooling layers, 2 fully connected layers, and 1 Relu layer. The model trained by Lenet has an 86.5% accuracy rate on the validation dataset. Squeezenet is composed of 2 convolution layers, 8 fire modules, 4 pooling layers, several Relu layers, and concat layers. The Squeezenet model has a 96.2% accuracy rate in jujube quality classification. When the Squeezenet network is initialized with the model trained on the ImageNet dataset and then trained on the dried jujube dataset, the resulting PreSqueezenet model has a classification accuracy rate of 98.7%. To improve the accuracy rate of dried jujube classification, the fire module is replaced by the fusion module. This network is called a deep fusion convolution neural network (DeepFusionnet), and the classification accuracy rate of the DeepFusionnet model is 98.8%. Because of differences in network structure, number of convolution layers, and training strategy, the convolution neural networks classify with different accuracy. Among these networks, DDFnet has the highest classification accuracy. The classification accuracy rates of the different CNN models are compared in Table 3.

Table 3. Comparison of different CNN models classification accuracy rates.
Network structures Accuracy rate(%)
Lenet 86.5
Squeezenet 96.2
PreSqueezenet 98.7
DeepFusionnet 98.8
DDFnet 99.3

DDFnet is trained iteratively on the training dataset, and a model is generated every 500 iterations, so twenty classification models are generated over 10,000 iterations. The validation dataset is tested with each of the 20 models. Compared with the other convolutional neural networks in terms of classification accuracy rate and loss value, DDFnet has the fastest convergence rate and a markedly higher classification accuracy. It can be seen that pre-training has a marked impact on the network's convergence rate. The accuracy and loss curves of the different networks are shown in Fig. 6.

Fig. 6.

Accuracy and loss graphs of the different networks.

Comparing different classification methods    Before the approach proposed in this paper, other methods were used only to detect a single quality of dried jujube. Hyperspectral image acquisition equipment is expensive, and it is difficult to meet the industrialization needs of jujube quality identification. Hyperspectral imaging technology (Wu et al., 2016) was used to identify defective jujubes with an accuracy of 96%. Traditional image processing technology has been applied to the classification of jujubes, but image feature extraction is difficult. An accuracy rate of 99.01% was achieved for the identification of wizened jujubes with a gradient distribution histogram (Li et al., 2016). Images cannot be used directly as the input to an SVM, so complicated image processing is difficult to avoid. The accuracy rate of defective jujube recognition was 98% with the method based on particle swarm optimization and SVM (Zhang et al., 2011), and 96.2% with the method based on the HIS colour model and SVM (Zhao et al., 2008). Compared to the current dried jujube quality identification methods, the classification accuracy rate of the DDFnet model is the highest, and multi-classification is achieved using a convolutional neural network. The comparison of different classification methods is shown in Table 4.

Table 4. Comparison of different classification methods.
Methods                  Feature extraction methods            Classification results                       Classification accuracy (%)
Li et al. (2016)         Gradient distribution histogram       Only wizened jujubes                         99.01
Wu et al. (2016)         Hyperspectral imaging technique       Only defective jujubes                       96
Zhao et al. (2008)       HIS colour model and SVM              Only defective jujubes                       96.2
Zhang et al. (2011)      Particle swarm optimization and SVM   Only defective jujubes                       98
Proposed in this paper   Convolution neural network DDFnet     Plump, wizened, cracked, defective jujubes   99.3

Conclusions

In this paper, a novel method based on a double branch deep fusion convolution neural network (DDFnet) is proposed to classify dried jujubes of different qualities (plump, wizened, cracked, and defective). The DDFnet structure is designed for network model training. Fusion modules and transfer learning are applied in this network to accelerate convergence and to improve the classification accuracy rate of the model, and BN layers and PRelu layers are introduced. The total accuracy rate is 99.3% for the classification of dried jujubes. Deep learning technology and convolutional neural networks are applied to dried jujube classification for the first time, achieving multi-classification of dried jujubes and obtaining the highest classification accuracy compared to other current methods.

Acknowledgements    This work was supported by the National Natural Science Foundation of China under grant No. 61771340; Tianjin Science and Technology Major Projects and Engineering under grant No. 17ZXHLSY0040, No. 17ZXSCSY0060, and No. 17ZXSCSY0090; Plan Program of Tianjin Educational Science and Research under grant No. 2017KJ087; Tianjin Natural Science Foundation under grant No. 17JCQNJC01400; Program for Innovative Research Team in University of Tianjin under grant No. TD13-5034.

© 2018 by Japanese Society for Food Science and Technology

This article is licensed under a Creative Commons [Attribution-NonCommercial-ShareAlike 4.0 International] license.
https://creativecommons.org/licenses/by-nc-sa/4.0/