The Horticulture Journal
Online ISSN : 2189-0110
Print ISSN : 2189-0102
ISSN-L : 2189-0102
ORIGINAL ARTICLES
Noninvasive Diagnosis of Seedless Fruit Using Deep Learning in Persimmon
Kanae Masuda, Maria Suzuki, Kohei Baba, Kouki Takeshita, Tetsuya Suzuki, Mayu Sugiura, Takeshi Niikawa, Seiichi Uchida, Takashi Akagi

2021 Volume 90 Issue 2 Pages 172-180

Abstract

Noninvasive diagnosis of internal traits in fruit crops is in high demand; however, it generally requires time, costs, and special methods or facilities. Recent progress in deep neural network (or deep learning) techniques allows easy, yet highly accurate, diagnosis with single RGB images, and the latest applications enable visualization of “the reasons for each diagnosis” by backpropagation of the neural networks. Here, we propose an application of deep learning image diagnosis to the classification of internal fruit traits, in this case seedlessness, in persimmon fruit (Diospyros kaki). We examined the classification of seedlessness in persimmon fruit using four convolutional neural network (CNN) models with various layer structures. With only 599 pictures of ‘Fuyu’ persimmon fruit taken from the fruit apex side, the neural networks successfully made a binary classification of seedless and seeded fruits with over 85% accuracy. Among the four CNN models, the VGG16 model, with the simplest layer structure, showed the highest classification accuracy of 89%. Prediction values for the binary classification of seeded fruits increased significantly in proportion to seed numbers in all four CNN models. Furthermore, explainable AI methods, such as Gradient-weighted Class Activation Mapping (Grad-CAM) and Guided Grad-CAM, allowed visualization of the parts and patterns contributing to the diagnosis. The results indicated that finer positions surrounding the apex, which correspond to hypothetical bulges derived from seeds, are an index of seeded fruits. These results suggest the novel potential of deep learning for noninvasive diagnosis of fruit internal traits using simple RGB images, and also provide novel insights into previously unrecognized features of seeded/seedless fruits.

Introduction

In most fruit crops, seedlessness is a desirable trait, both for fresh consumption and in processed fruits. A wide variety of techniques for the production of seedless fruits, such as treatment with phytohormones or chemical compounds and the utilization/breeding of aneuploid/polyploid or genetically seedless cultivars, have been developed (Rotino et al., 1997; Varoquaux et al., 2000). Representative practices are gibberellin treatment in table grapes (Weaver and Pool, 1965; Kimura et al., 1996), utilization of triploids in banana or watermelon (Kihara, 1951; Henderson, 1977), selection of parthenocarpic cultivars in tomato (Lukyanenko, 1991; Mazzucato et al., 1998), and breeding of stenospermocarpic cultivars in grape (Bouquet and Danglot, 1996).

Noninvasive prediction of internal traits or disorders in fruit is important for the selection of high-quality fruit. Internal traits that have been investigated include firmness and soluble solids content in apples (Peng and Lu, 2007), firmness in peaches (Lu and Peng, 2006), pit presence in tart cherries (Qin and Lu, 2005), mechanical damage in tomatoes (Milczarek et al., 2009), and internal defects in pickles (Ariana and Lu, 2010). Many noninvasive assessments of internal fruit traits have been developed based on optical imaging, magnetic resonance imaging (MRI), two-dimensional (2D) X-ray, near-infrared (NIR) spectroscopy, and vibration analysis for a variety of agricultural and food products (Milczarek et al., 2009; Cubero et al., 2011; Lorente et al., 2011). A nondestructive tool for assessing internal traits or disorders is required to offer better quality products. However, in contrast to the techniques described above for stable production of seedless fruits, noninvasive prediction of seedlessness or seed numbers in fruit crops remains little developed, although it would be very useful for commercial production (Varoquaux et al., 2000). In addition, problems remain when applying these tools on site because special facilities and high costs are generally involved (Donis-González et al., 2014). On the other hand, experts can detect some internal traits from the outer appearance, although such “empirical” abilities are cultivated only through long observational experience. Besides, manual prediction by experts is inefficient in terms of time and cost, and may not be suitable for emerging “smart” agricultural techniques.

Deep neural network (or simply deep learning) frameworks may allow users to reproduce the professional eye, with its “empirical” ability to predict fruit internal traits, including seedlessness. Deep learning helps to address the increasing complexity and volume of imaging data and is growing in popularity. Within the fields of image processing and diagnosis, the application of convolutional neural networks (CNNs) is considered a breakthrough (Shin et al., 2016). For plants, deep learning frameworks have to date been successfully applied to the detection of stresses/diseases (Ramcharan et al., 2017; Ferentinos, 2018; Ghosal et al., 2018; Singh et al., 2018). On the other hand, there are only a few applications of deep neural networks to the prediction of fruit internal traits, such as damage detection in blueberry fruits (Wang et al., 2018) or detection of internal disorders in persimmon fruits (Akagi et al., 2020). A major issue in image diagnosis by deep learning has been that the explanatory factors in the image could not be detected. The “visual reason” for predicting a phenomenon often directly indicates early symptoms or the phenomenon itself, contributing to site-specific physiological interpretation. Regarding this issue, recent progress in the field has enabled “explainable artificial intelligence”, in which a sensitivity analysis of the neural network, such as Gradient-weighted Class Activation Mapping (Grad-CAM) (Selvaraju et al., 2017), provides visual explanatory factors on the original image.

In this study, we focused on the prediction of seedlessness or seed numbers in persimmon (Diospyros kaki Thunb.) fruits using deep neural networks. Persimmon is a major fruit crop, especially in East Asia. A wide variety of hexaploid persimmon cultivars bear various numbers of seeds (n = 0–8) per fruit, and mixed seeded and seedless fruits are offered to consumers in markets, except for genetically seedless cultivars such as the nonaploid ‘Hiratanenashi’. Although persimmon fruits with seeds are commercially undesirable, as for most other fruit crops (Varoquaux et al., 2000), practical prediction tools for seed numbers have not been developed. The overall shape of the fruit is thought to be unaffected by seed numbers, although some experts are able to detect seedlessness in a few major cultivars, such as ‘Fuyu’. Here, we aimed to develop deep learning frameworks that predict seedlessness in ‘Fuyu’ persimmon fruits without empirical experience. We also examined back-propagation of the trained neural network models to visualize the characteristics of seeded/seedless persimmon fruits. These results could develop into a novel, effective technique to diagnose seedlessness from simple photo images alone, and provide insights into the “key points” for understanding fruit internal structures from outer appearances, without the need for long empirical observational experience.

Materials and Methods

Assessment of seedlessness in persimmons

A total of 599 fully matured ‘Fuyu’ persimmon fruits were harvested from four independent trees (38, 57, 58, and 92 years old) in late November 2018, at Gifu Prefectural Agricultural Technology Center, Gifu, Japan (N35.441721, E136.699894). The fruits were placed on a gray-background sheet and photographed from the fruit apex side using a digital camera (COOLPIX P520; Nikon Corporation, Japan). An LED positional light (KANSAI RACK, Japan) was set just above the camera and the fruits. The distances from the camera and the positional light to the fruits were 40 cm and 60 cm, respectively. The detailed settings of the camera were an F-value of F/4, exposure time of 1/50 second, ISO-400, and default white balance (auto); the size of each image was 1600 × 1200 pixels. The images were taken in a dim room to avoid the influence of external light. After the photos were taken, the ‘Fuyu’ fruits were dissected to visually annotate seed numbers (n = 0–8).

Image processing and construction of neural networks

This study followed the typical three steps in classification with CNNs, shown in Figure 1: (i) image input, (ii) training on the images, and (iii) classification, followed by (iv) backpropagation of the neural network (described later). The images were resized to 224 × 224 pixels to fit the deep learning frameworks, which were pre-trained with the standard image set from ImageNet (224 × 224 pixels) <http://www.image-net.org/> and could classify an internal disorder in persimmon fruit with high accuracy (Akagi et al., 2020). Note that although this changed the aspect ratio of the original images (1600 × 1200 pixels), we confirmed that the change had no significant effect on classification ability compared with using cropped images of the same aspect ratio, as was also the case for diagnosis of internal disorders (Akagi et al., 2020) and quick softening (M. Suzuki, K. Masuda, and T. Akagi, unpublished data) in persimmon fruit. The resized images were randomly separated into a training set (75%) and a test set (25%). Four standard deep neural network models, VGG16 (Simonyan and Zisserman, 2014), Resnet50 (He et al., 2016), InceptionV3 (Szegedy et al., 2016a), and InceptionResnetV2 (Szegedy et al., 2016b), were examined. Specifically, we used the implementations in the Keras library <http://keras.io/ja/>. For data augmentation, to increase the generalization ability of the networks, new image samples were generated from the training images by random combinations of brightness changes, horizontal flips, vertical flips, and rotations using the ImageDataGenerator function in Keras. We applied the class-weight option, available in Keras, to balance categories with different sample numbers. Training and testing of the models were run on Ubuntu 18.04 (DeepStation DK1000, 16 GB RAM, GPU = 1; UEI Corporation, Japan). We applied stochastic gradient descent (SGD) as the optimizer, with a learning rate of 0.001. The detailed settings of the neural network frames are summarized in Table 1A. The training process was terminated when the loss value for the test set started increasing.
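The random 75/25 split and class-weight balancing described above can be sketched as follows. This is a minimal illustration with synthetic labels, not the authors' code; the weighting formula shown is the common "balanced" heuristic (n_samples / (n_classes × n_c)), which is an assumption about how the Keras class-weight option was populated.

```python
import random
from collections import Counter

def split_and_weight(labels, train_frac=0.75, seed=0):
    """Randomly split sample indices 75/25 and compute per-class weights
    inversely proportional to class frequency in the training set."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_frac)
    train_idx, test_idx = idx[:cut], idx[cut:]
    counts = Counter(labels[i] for i in train_idx)
    n, k = len(train_idx), len(counts)
    # "balanced" heuristic: weight_c = n_samples / (n_classes * n_c)
    weights = {c: n / (k * counts[c]) for c in counts}
    return train_idx, test_idx, weights

# Example: ~25% seedless (label 0) vs. 75% seeded (label 1),
# roughly matching the 599-fruit dataset in this study
labels = [0] * 150 + [1] * 449
train_idx, test_idx, w = split_and_weight(labels)
```

The rarer (seedless) class receives the larger weight, so misclassifying it is penalized more during training.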

Fig. 1

The flow of deep learning diagnosis of seedless persimmon fruit and a visual explanation. Images from the apex side of a total of 599 ‘Fuyu’ persimmon fruits were applied to CNN deep learning analysis to classify into two categories depending on seed number. This flow consists of (i) image input, (ii) training of images, (iii) classification, and (iv) backpropagation of neural networks.

Table 1

Setting (A), and accuracy and loss (B) in classification of seeded and seedless fruits for the four CNN models used in this study.

Evaluation of CNN models and feature extraction

To evaluate the classification performance of the neural network models, we first derived receiver-operating characteristic (ROC) curves (Fan et al., 2006), which trace the transition of the true-positive rate (TPR) against the false-positive rate (FPR), as an index of classification accuracy. The area under the ROC curve (AUC), a common metric of classification accuracy, was then calculated. The feature distributions of the positive and negative samples were visualized on a two-dimensional plane using t-distributed Stochastic Neighbor Embedding (t-SNE) (Maaten and Hinton, 2008).
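As a minimal sketch of these metrics, the ROC curve and its AUC can be computed directly from prediction scores with numpy (score ties are ignored for simplicity; this is an illustration, not the evaluation code used in the study):

```python
import numpy as np

def roc_curve(y_true, scores):
    """TPR and FPR at every score threshold, thresholds descending."""
    order = np.argsort(-scores)          # sort samples by score, high to low
    y = np.asarray(y_true)[order]
    tps = np.cumsum(y)                   # true positives as threshold lowers
    fps = np.cumsum(1 - y)               # false positives
    tpr = np.concatenate(([0.0], tps / y.sum()))
    fpr = np.concatenate(([0.0], fps / (len(y) - y.sum())))
    return fpr, tpr

def auc(fpr, tpr):
    # area under the curve by the trapezoidal rule
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Toy example: 3 positives, 3 negatives; one negative scored above a positive
y = np.array([1, 1, 0, 1, 0, 0])
s = np.array([0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
fpr, tpr = roc_curve(y, s)
```

For this toy example, 8 of the 9 positive/negative pairs are ranked correctly, so the AUC equals 8/9.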

Visual explanation of classification results

To produce “visual explanations” of the individual classification results by the neural network model, Grad-CAM (Gradient-weighted Class Activation Mapping) and Guided Grad-CAM were applied (Selvaraju et al., 2017) in step (iv), backpropagation of the neural network, in Figure 1. Grad-CAM is based on a sensitivity analysis of the unit outputs of a certain layer in the network and produces a coarse localization map highlighting the weighted regions on the original image. An implementation of Grad-CAM and Guided Grad-CAM using the iNNvestigate library (Alber et al., 2019) can be found at <https://github.com/uchidalab/softmaxgradient-lrp>.
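The core of Grad-CAM is a gradient-weighted sum of feature maps followed by a ReLU. A numpy sketch of that step, with synthetic arrays standing in for the activations and class-score gradients that a framework such as TensorFlow would supply:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM localization map: weight each feature map by the spatial
    mean of the class-score gradient, sum over channels, then ReLU.
    activations, gradients: arrays of shape (H, W, K)."""
    alpha = gradients.mean(axis=(0, 1))                       # alpha_k, shape (K,)
    cam = np.tensordot(activations, alpha, axes=([2], [0]))   # (H, W)
    cam = np.maximum(cam, 0.0)                 # ReLU keeps positive evidence only
    if cam.max() > 0:
        cam /= cam.max()                       # normalize to [0, 1] for overlay
    return cam

# Synthetic example: a 14 x 14 convolutional layer with 8 channels
rng = np.random.default_rng(0)
A = rng.random((14, 14, 8))
G = rng.standard_normal((14, 14, 8))
cam = grad_cam(A, G)
```

In the study's setting, the map would be upsampled to 224 × 224 and overlaid on the fruit image to highlight regions supporting the "seeded" or "seedless" decision.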

Distribution of relevance levels

Distributions of relevance levels in Guided Grad-CAM, along with the distance from the outer contour of the fruit, were calculated using all test images (in total, 107 seeded and 42 seedless fruit images) and averaged among the seedless and seeded test samples, respectively. The distribution is represented as a two-dimensional histogram H(r, d) (Fig. 5D, E), where r is the relevance level (given by an explainable-AI method) and d is the (normalized) distance from the outer contour. To generate the histogram, the fruit region is first extracted from each fruit image using a color clustering-based binarization technique, in which all pixels in the image are clustered into two clusters by their RGB values. Mathematical morphology operations are then applied to the fruit region image to remove fragmentary connected components (i.e., noise). Third, a distance transformation is applied to determine the distance from the fruit outer contour at each pixel. Note that the distance is normalized so that the maximum distance becomes 1. Finally, the two-dimensional histogram H(r, d) is obtained by counting the number of pixels with relevance level r and normalized distance d.
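The final counting step can be sketched with np.histogram2d, given per-pixel relevance and distance maps. The clustering-based binarization, morphology, and distance transform that produce those maps in the real pipeline are omitted here, and the arrays below are synthetic stand-ins:

```python
import numpy as np

def relevance_distance_histogram(relevance, distance, bins=20):
    """Joint histogram H(r, d) over pixels of the fruit region.
    relevance: per-pixel relevance levels (e.g. from Guided Grad-CAM).
    distance: per-pixel distance from the fruit contour; normalized here
    so the deepest (most central) pixel maps to d = 1."""
    d = distance / distance.max()          # normalize maximum distance to 1
    H, r_edges, d_edges = np.histogram2d(
        relevance.ravel(), d.ravel(), bins=bins, range=[[0, 1], [0, 1]])
    return H, r_edges, d_edges

rng = np.random.default_rng(1)
rel = rng.random((64, 64))                 # stand-in relevance map
dist = rng.random((64, 64)) * 50           # stand-in (unnormalized) distances
H, re, de = relevance_distance_histogram(rel, dist)
```

Averaging such histograms over the seeded and seedless test images, respectively, yields the two panels compared in the results.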

Results

Deep neural networks discriminate seedlessness in persimmon fruits

In the training/test fruit samples, seeds were located randomly in the eight locules (Fig. 2A). The samples showed a wide distribution of seed numbers between 0–8, and approx. a quarter of the samples were seedless (Fig. 2B). First, to classify “seedless” and “seeded” persimmon fruits, we defined binary categories of “seedless” (seed nos = 0) and “seeded” (seed nos = 1–8). All four neural networks, VGG16, Resnet50, InceptionV3, and InceptionResnetV2, classified seedless and seeded fruits with high accuracy (> 0.85) on the test dataset (Table 1B). Among the four neural network models, VGG16 offered the highest accuracies on the training and test datasets (1.00 and 0.89, respectively). A confusion matrix visualizing the true negative and true positive rates supported the conclusion that VGG16 showed the highest performance (0.81 for true negatives and 0.93 for true positives, respectively) (Fig. 3A). The distribution of prediction values, dissecting the information in the confusion matrix, also showed that VGG16 had the highest performance among the four models (Fig. 3B). The negative and positive samples tended to be distributed separately in all four models, especially in the VGG16 trained model. The ROC-AUC, which evaluates the transition of the true/false-positive rates, showed substantially high values (> 0.93) in all four neural network models (Fig. 3C). In the other models, with deeper layer structures, an increase in epoch numbers often tended to result in overfitting. These results suggested that the classification of seedless fruits was based on certain simple features.

Fig. 2

Distribution of seed numbers in persimmon fruits. (A) Fruit images from the apex side and of a cross-section in the horizontal direction. Seeds appeared to be randomly positioned in the eight locules of persimmon fruit. (B) Distribution of seed numbers in the 599 persimmon fruits used for deep learning. In this study, binary classes for deep learning were defined as negative (seedless: seed nos = 0) and positive (seeded: seed nos = 1–8).

Fig. 3

Evaluation of the performance of the four CNNs. Confusion matrices (A), distributions of prediction values (B), and ROC curves (C) for the test dataset, classified using VGG16, Resnet50, InceptionV3, and InceptionResnetV2. For the ROC curves, the AUC (area under the curve) values correspond to classification performance. Random classification, or no diagnostic ability, gives ROC-AUC = 0.5, while perfect classification gives ROC-AUC = 1.0. Amongst the four CNN models, VGG16 showed the highest AUC value.

Classification performance depends on seed numbers in fruit

The prediction values in the binary classification of seeded and seedless fruits in the four trained models tended to increase in proportion to seed numbers (Fig. 4A). Comparing the prediction values of fruits with one seed and those with two or more seeds, significant differences were detected in the trained VGG16 model (P < 0.01, Student’s t-test). In the other neural network models, the prediction values for “seeded” also increased significantly (P < 0.05 for 1 vs. 2 seeds; see Fig. 4A). Most of the fruits predicted to be “seedless” had fewer than two seeds, while fruits with many seeds tended to be predicted as “seeded”. In the trained VGG16 model, the prediction accuracies were 82.2% between seedless and one-seeded fruits (0 vs. 1) and 84.6% between seedless and one- to two-seeded fruits (0 vs. 1–2), respectively. This suggests that the VGG16 model showed high classification ability even when comparing seedless with few-seeded fruits. t-SNE projections of the features for the test images showed a clear distribution consistent with categorization by seed numbers (0, 1, 2, and > 3 seeds) (Fig. 4B). Amongst the neural network models, the t-SNE result for VGG16 showed the clearest distribution associated with seed numbers, explaining seedless and seeded fruits along a single axis in the 2D feature space.
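The group comparisons above use Student's t-test. As a sketch, the pooled-variance t statistic can be computed with the standard library; the prediction values below are toy numbers, and obtaining a p-value would additionally require the t-distribution CDF (e.g. from scipy.stats):

```python
import math

def students_t(a, b):
    """Two-sample Student's t statistic with pooled variance,
    plus its degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # unbiased sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    t = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

# Toy prediction values for, e.g., 1-seed vs. 2-or-more-seed fruits
one_seed = [0.55, 0.60, 0.58, 0.62, 0.57]
two_plus = [0.80, 0.85, 0.83, 0.88, 0.86]
t, df = students_t(two_plus, one_seed)
```

With 8 degrees of freedom, |t| above the critical value 2.306 corresponds to significance at P < 0.05 (two-sided).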

Fig. 4

Distributions of prediction values and features according to seed numbers. Each model was trained for binary classification of seeded/seedless fruits. (A) Distribution of the prediction values in seedless (seed nos = 0), 1-, 2-, and > 3-seed samples. They showed a clear tendency for prediction values to increase in proportion to seed numbers. * and ** indicate statistical significance at P < 0.05 and P < 0.01, respectively. (B) Visualization of two-dimensional t-SNE features in the convolutional layer immediately before full connection in each CNN model, according to seed numbers. Outlined gold, and filled green, blue, and pink circles show samples with 0, 1, 2, and > 3 seeds, respectively. Outlined/filled squares in each color give averages with standard error (SE) bars.

Visual explanation of reasons for seeded/seedless in persimmon fruits

To visualize the regions relevant to the diagnosis of seedlessness, we applied two explainable-AI methods, Grad-CAM and Guided Grad-CAM (Selvaraju et al., 2017). We targeted relevant regions in the block5_conv3 (immediately before the fully connected layer) and block3_conv3 layers of VGG16 with Grad-CAM, and in block5_conv3 of VGG16 with Guided Grad-CAM (Fig. 5). With Grad-CAM, in comparison to seedless fruits, seeded fruits exhibited substantial relevant regions shifted away from the apex of the fruit in the block5_conv3 layer (see the fruit images with one, two, and six seeds in Fig. 5A). Relevant regions in the shallower layer (block3_conv3) were distributed over finer positions surrounding the apex, which might correspond to the positions of bulges derived from seeds (Fig. 5B). Consistent with this hypothesis, in block3_conv3, the area of the relevant positions increased with the actual seed numbers (see the fruits with one, two, and six seeds in Fig. 5B). Furthermore, the results from Guided Grad-CAM showed finer distributions of relevant regions around the apex in seeded fruits (Fig. 5C), consistent with those from Grad-CAM. On the other hand, seedless fruits tended to show relevant regions in block5_conv3 directly on the apex of the fruit with Grad-CAM (Fig. 5A), and to exhibit broader and weaker relevant regions than seeded fruits in block3_conv3 with Grad-CAM and with Guided Grad-CAM (Fig. 5B–C). Other than the area surrounding the apex, in both seedless and seeded fruits, relevant regions were distributed around the margins of the fruit against the gray background (Fig. 5B–C). These tendencies in relevance levels between seeded and seedless fruits were clarified by the two-dimensional histograms H(r, d) (Fig. 5D for seedless, E for seeded). Both seeded and seedless samples showed higher relevance levels around the outer contours (ca. d ∈ [0, 0.1]). More importantly, a narrow, higher peak was found at the putative apex (ca. d ∈ [0.9, 0.95], indicated by a white arrow in Fig. 5E) of seeded fruits, whereas only a broader peak was found for seedless fruits.

Fig. 5

Visualization of explanation factors in the classification of seeded/seedless fruits. Original images of fruits with zero (seedless) and 1–6 seeds are shown for reference. Distributions of relevant regions in VGG16 (A) block5_conv3 and (B) block3_conv3 with Grad-CAM, and (C) block5_conv3 with Guided Grad-CAM, in seedless, 1-seed, 2-seed, and 6-seed samples. The relevant regions in block5_conv3 (immediately before full connection) with Grad-CAM showed an ambiguous and wide distribution, while the shallower layer (block3_conv3) with Grad-CAM, or Guided Grad-CAM, showed finer distributions surrounding the apex and the outer contours. (D–E) Two-dimensional histograms of relevance levels with Guided Grad-CAM, H(r, d), where r is the relevance level of a pixel and d is the normalized distance of the pixel from the outer contour (d = 0) to the center (d = 1). A white arrow in the seeded fruit histogram (E) marks a peak that corresponds to the position of bulges due to seeds.

Discussion

Internal disorders/structures in persimmon fruit, such as seedlessness, are detectable only by highly experienced experts. Even where this is possible, it is often difficult to pass on the relevant skills to reproduce the discrimination, or to explain the reasons for a diagnosis, because the features of seedlessness are determined by considering multiple factors and are not explainable from a single factor. Our results suggest that noninvasive diagnosis of seedlessness may be immediately reproducible with CNN-based deep neural networks using simple pictures of persimmon fruits. Noninvasive diagnosis of fruit internal traits is in high demand, regardless of fruit crop species, as detection of internal status generally requires special methods or facilities, such as acoustic vibration (Nakano et al., 2018) or ultrasonic inspection (Gaete-Garretón et al., 2005; Mizrach, 2008). Here, we proposed the application of deep neural networks to simple fruit images from normal cameras, in which diagnosis by the trained model took < 0.1 seconds per fruit. This substantially reduces both the cost and time of selecting seedless fruit, and the approach would additionally be useful for detecting other fruit internal traits.

Amongst the four models used in this study, VGG16, which carries the simplest layer structure, was the best fitted for classification of seedlessness in persimmon fruits, with 89% accuracy (Figs. 3 and 4), higher than the 70–80% accuracy of highly experienced fruit selection experts. This suggests that relatively simple features may be involved in the difference between seedless and seeded fruits. The number of training/test images (n = 599) in our analysis was much smaller than in other studies with deep neural networks, such as typical image recognition contests (n > 100,000 in the ImageNet Large Scale Visual Recognition Challenge: ILSVRC). This situation may relate to the inconsistent diagnostic abilities among the models, in which CNNs with highly complicated layers showed no significant improvement in classification. The fact that VGG16, with its simple CNN structure, could diagnose well is a further merit for backpropagation of the CNN. Backpropagation of the neural networks using Grad-CAM and Guided Grad-CAM enabled us to visualize the reason(s) why the network could classify seedless and seeded fruits. In CNNs with complicated layers, such as InceptionV3 or Resnet50, it is hard to backpropagate to the layers far from the fully connected layer, while a CNN with a simple layer structure allows visualization of the upper layers, where the features are not yet highly pooled. As indicated in our results (Fig. 5), visualization of upper layers may often provide finer (or easier) interpretations for physiological or morphological research. Alternatively, the application of Guided Grad-CAM, which improves on Grad-CAM by showing finer localization of relevance values, might also work better. Although not examined in this study, a combination with other backpropagation methods, such as layer-wise relevance propagation (LRP) (Bach et al., 2015) and its derivative tools, may provide a better understanding.

In our analysis, the CNNs were trained to classify the pictures into binary categories, seedless and seeded fruits, although seed number is originally a quantitative phenotype. Importantly, the prediction values for these two categories increased in proportion to seed numbers (Fig. 4). This suggests that CNN models for “regression” could potentially work to estimate seed numbers as a quantitative phenotype. As indicated in the results section, this tendency was also captured in the application of explainable-AI methods, where the regions relevant to a positive classification tended to be distributed more widely around the apex in fruits with more seeds (Fig. 5B–C). Future work associating the relevant regions with the actual internal structure (or seed positions) would allow a proper physiological or morphological interpretation.

For practical use of the CNN models, there are three main issues: image quality, the conditions for taking images, and the accuracy of selecting seedless fruits. First, normal cameras attached to fruit sorting lines (at least in Japan) take approx. 310,000-pixel (640 × 480) images (Kurita et al., 2006). Since we successfully applied approx. 50,000-pixel (224 × 224) images, resized from the 2,000,000-pixel (1600 × 1200) originals, to the CNN models in this study, the input image quality from normal cameras should be sufficient for our CNN models. Second, although this study used well-controlled, uniform conditions for taking images, mimicking fruit sorting lines, the actual conditions for seedless diagnosis may be diverse. More fruit images with a wider variety of backgrounds might improve classification ability for practical use. Test diagnosis on site, with the specific picture backgrounds and light conditions, will be required for practical use. Third, the probability of “actually seedless” fruits in the “predicted seedless” class depends on the threshold of the prediction values. For instance, to maximize the probability of seedless fruit with the VGG16 model (see Fig. 3B for the distribution of prediction values), classification with a threshold prediction value of 0.15 resulted in 87.9% seedless fruit in the predicted seedless category, substantially higher than the 81.0% obtained with a threshold of 0.5. Making these potential improvements in the future would enable practical prediction tools for seedless fruits.
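The effect of the threshold on the purity of the predicted-seedless class can be reproduced with a simple sweep. The scores below are handcrafted toy values, not the paper's distributions; the point is only that lowering the threshold trades yield for a higher fraction of truly seedless fruit:

```python
import numpy as np

def seedless_precision(scores, labels, threshold):
    """Fraction of truly seedless fruits among those predicted seedless.
    scores: predicted probability of 'seeded'; labels: 1 = seeded, 0 = seedless.
    A fruit is called seedless when its score falls below the threshold."""
    predicted_seedless = scores < threshold
    if not predicted_seedless.any():
        return float('nan')
    return float((labels[predicted_seedless] == 0).mean())

# Toy data: 5 seedless fruits (low scores) and 6 seeded fruits (higher scores)
scores = np.array([0.05, 0.10, 0.20, 0.30, 0.60,   # seedless
                   0.20, 0.40, 0.70, 0.80, 0.90, 0.95])  # seeded
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

strict = seedless_precision(scores, labels, 0.15)   # strict threshold
default = seedless_precision(scores, labels, 0.50)  # default threshold
```

Here the strict threshold yields a purer (but smaller) predicted-seedless class, mirroring the 87.9% vs. 81.0% comparison reported for the VGG16 model.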

Conclusion

Our application of deep learning with four CNN models classified seeded/seedless fruits with high accuracy from only 599 RGB images of their outer appearance. Among the four CNNs, VGG16, with the simplest layer structure, offered the highest performance. The increase in prediction values in proportion to seed numbers suggests a potential “regression” approach to quantitatively estimate seed numbers in the future. Visualization of explanation maps with Grad-CAM and Guided Grad-CAM properly pointed out the substantial contributors in the images, which may provide interpretations of, or insights into, physiological and morphological aspects of seedless fruit research. Our results suggest that deep learning can immediately reproduce the “professional eyes” for fruit internal traits, which are usually cultivated over decades of experience, and also explain the reasons for the discrimination.

Acknowledgements

We thank Ryohei Kuroki in the Graduate School of Information Science and Electrical Engineering, Kyushu University, for setting up an analysis environment for deep learning and providing technical codes for convolutional neural networks.

Literature Cited
 
© 2021 The Japanese Society for Horticultural Science (JSHS), All rights reserved.