2022 Volume 91 Issue 3 Pages 408-415
In contrast to the progress in the research on physiological disorders relating to shelf life in fruit crops, it has been difficult to non-destructively predict their occurrence. Recent high-tech instruments have gradually enabled non-destructive predictions for various disorders in some crops, while there are still issues in terms of efficiency and costs. Here, we propose application of a deep neural network (or simply deep learning) to simple RGB images to predict a severe fruit disorder in persimmon, rapid over-softening. With 1,080 RGB images of ‘Soshu’ persimmon fruits, three convolutional neural networks (CNN) were examined to predict rapid over-softened fruits with a binary classification and the date to fruit softening. All of the examined CNN models worked successfully for binary classification of the rapid over-softened fruits and the controls with > 80% accuracy using multiple criteria. Furthermore, the prediction values (or confidence) in the binary classification were correlated to the date to fruit softening. Although the features for classification by deep learning have been thought to be in a black box by conventional standards, recent feature visualization methods (or “explainable” deep learning) have allowed identification of the relevant regions in the original images. We applied Grad-CAM, Guided backpropagation, and layer-wise relevance propagation (LRP) to find early symptoms for CNN classification of rapid over-softened fruits. The focus on the relevant regions tended to be on color unevenness on the surface of the fruit, especially in the peripheral regions. These results suggest that deep learning frameworks could potentially provide new insights into early physiological symptoms of which researchers are unaware.
Shelf life is an important factor determining the quality of fruit crops. To date, breeding of new cultivars tolerant to over-ripening in crops such as apple, tomato, or melon, and development of genetically modified crops, have contributed to the prolongation of shelf life (Smith et al., 1990; Atkinson et al., 2012). In addition to these genetic improvements, control of environmental conditions and artificial chemicals, such as storage in fine-tuned, low temperature controlled or modified atmospheres (CA or MA, respectively) (Brackmann et al., 1993; Park et al., 2018), 1-methylcyclopropene (1-MCP) treatment (Kubo et al., 2003), or layer-by-layer (LBL) edible coating treatments (Ribeiro et al., 2007), have successfully achieved longer shelf life and maintained high quality. On the other hand, non-destructive prediction of shelf life (or internal traits as a wider concept) has also been highly anticipated. For instance, application of acoustic technology (Zude et al., 2006; Suzuki et al., 2015) or an electronic nose technique (Gomez et al., 2008) has been proposed for the prediction of shelf life in horticultural crops, although instrument costs and detection time remain major obstacles to practical application. Furthermore, shelf life is dependent not only on natural maturing behavior, but also on stresses caused by internal disorders or injuries (Nakano et al., 2001). The latter case often appears as rapid over-softening, of which the physiological features are distinct from natural maturation (Fig. 1A).
Appearances of the normal and rapid over-softened persimmon fruits and the date to softening distributions. A. Typical conditions of the rapid over-softened persimmon fruits, in which the epicarp becomes reddish and the pericarp has a melting texture. B. Outer appearances of the normal and rapid over-softened fruits used for the deep learning in this study. At a glance, it is difficult to distinguish them. C, D. Two criteria for the rapid-softened (positive, red) and control (negative, dark green) fruits in the distribution of the days to fruit softening (see Materials and Methods for details).
Persimmon is a major fruit crop, especially in East Asia. Its shelf life depends on both environmental (or storage conditions) and genetic (or cultivar) factors. In some early maturing cultivars in particular, rapid over-softening is becoming a serious problem. Rapid over-softening in persimmon, which occurs on trees or within approx. 10 days after harvest, causes severe water-soaked patches in the fruit flesh, resulting in the loss of marketability (Fig. 1A). This disorder is also often induced after de-astringency treatments, which are indispensable for the marketing of astringent cultivars; these cultivars account for the majority of marketed persimmon (Akagi et al., 2011). However, it is extremely difficult to visually predict rapid over-softening from outer appearances at harvest (Fig. 1B), even for experts with decades of experience. Furthermore, the mechanism behind this disorder remains little understood. Hence, the development of techniques that identify symptoms (or indexes) predictive of rapid over-softening in persimmon would contribute not only to the selection of fruits with long shelf life, but also to insights into its physiological mechanisms.
Recently, deep neural networks (or simply “deep learning”), often referred to as artificial intelligence (AI), have made strong progress in image diagnosis. One key point has been the development of convolutional neural networks (CNN, LeCun et al., 2015) that allow dramatic improvements in performance. A simple 8-layer CNN (AlexNet: Krizhevsky et al., 2012) won the image classification task at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012, and thereafter the winner of ILSVRC 2015, a 152-layer CNN named ResNet (He et al., 2016), exceeded human standards. Freely available CNNs can now be applied to simple, lightweight RGB images captured with an ordinary camera. They can use ambiguous or multiple features for their assessments, which is ideal for the diagnosis of symptoms derived from multi-aspect reactions in plants. In the agricultural field, the application of CNNs has been proposed mainly in image diagnosis of stress and disease in crops (Sladojevic et al., 2016; Singh et al., 2018), or combinations of object recognition and classification (Sa et al., 2016; Ponce et al., 2019; Ni et al., 2020; Osako et al., 2020). CNNs could also assess internal traits of fruits that are not directly observable from the outer appearance, in blueberry with hyperspectral images (Wang et al., 2018), and in persimmon with normal RGB images (Akagi et al., 2020; Masuda et al., 2021). Although the features used by deep neural networks have been difficult to confirm, recent backpropagation-based techniques for CNNs, such as Guided backpropagation, Layer-wise relevance propagation (LRP) (Bach et al., 2015), Gradient-weighted Class Activation Mapping (Grad-CAM) (Selvaraju et al., 2016), and their derivatives (Iwana et al., 2019), known as “explainable AI (X-AI)”, have allowed visualization of featured regions in original images. In other words, their application can provide insights into regions with symptoms (or indexes) for the objective traits.
Here, using an early maturing persimmon cultivar, ‘Soshu’, we attempted to apply multiple explainable CNNs to simple RGB fruit images captured with a normal camera to develop platforms to predict rapid over-softening and shelf life in persimmon fruit and to characterize any symptoms.
A total of 1,080 ‘Soshu’ fruits were harvested from 6–7-year-old trees planted in Fukuoka Agricultural and Forestry Research Center (N33.49873, E130.56339), at the same fully mature stage (skin color chart = 6), in Oct 2019. The RGB images (2992 × 2000 pixels) from the fruit apex side were taken at uniform distance, light, and background conditions, using a Nikon D5200 (digital camera), immediately after harvest. The dates for fruit softening at room temperature after the harvest were recorded as the index of shelf life. According to Sugiura et al. (2012), fruits in which a dent made by touching did not recover were classified as over-softening.
Deep learning model construction
The flow for deep learning binary classification and quantitative regression of rapid over-softening is given in Figure 2. For binary classification by CNNs, we prepared two image datasets. Dataset 1 consisted of fruits that softened within 10 days as positive samples, and the remaining fruits (> 10 days for softening) as negative samples (Fig. 1C). Dataset 2 consisted of fruits that softened earlier and later than 30% of all fruits among the positive and negative samples, respectively (Fig. 1D). For regression tests by CNNs, Dataset 1 and Dataset 2 were annotated with the actual dates showing fruit softening.
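The two labeling criteria can be illustrated with a minimal sketch. It assumes that Dataset 2 takes the earliest 30% of fruits as positives and the latest 30% as negatives, with the middle fruits excluded; the function names, the handling of ties, and the toy softening days are hypothetical and not taken from the study.

```python
def label_dataset1(days, threshold=10):
    """Dataset 1: positive (1) if softened within `threshold` days, else negative (0)."""
    return [1 if d <= threshold else 0 for d in days]


def label_dataset2(days, fraction=0.30):
    """Dataset 2 (assumed reading): earliest `fraction` of fruits are positive (1),
    latest `fraction` are negative (0), and the middle fruits are excluded (None)."""
    n = len(days)
    k = int(n * fraction)
    # Sort fruit indices by days to softening; ties keep their input order.
    order = sorted(range(n), key=lambda i: days[i])
    labels = [None] * n
    for i in order[:k]:
        labels[i] = 1
    for i in order[n - k:]:
        labels[i] = 0
    return labels


# Toy softening days for ten hypothetical fruits.
days = [3, 8, 10, 12, 15, 18, 21, 25, 28, 30]
print(label_dataset1(days))
print(label_dataset2(days))
```

Excluding the middle fruits reduces the number of usable images, which is the trade-off for removing ambiguously softened samples.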
Schematic flow of the deep learning frameworks used in this study. A total of 1,080 ‘Soshu’ fruits were applied to both the classification and regression tasks with CNNs. In the classification task, the samples were classified into binary categories, rapid-softened (positive) and control (negative) fruits, in the two criteria (Fig. 1C, D). Thereafter, backpropagation of the trained CNNs could clarify the regions relevant to the classification, which could provide potential symptoms recognized by CNNs. In the regression task, with the predicted and observed date to fruit softening, we further examined the regression trends.
For image processing, all the images were resized to 224 × 224 pixels, and augmented by vertically and horizontally flipping, rotating, and adjusting brightness with the ImageDataGenerator function in Keras <https://keras.io/>. In the binary classification, we randomly selected 75% of images for training and 25% for validating. Three representative CNNs, VGG16 (Simonyan and Zisserman, 2014), InceptionV3 (Szegedy et al., 2016), and ResNet50 (He et al., 2016) were implemented in Keras 2.2.4, and their fully-connected layers were customized for binary classification. For pretraining, each model was initialized with weights pre-trained on the ImageNet dataset <http://www.image-net.org/>. For the basic setting of the models, we examined four solvers (“SGD”, “Adam”, “Nadam”, and “RMSprop”), and learning rates (0.0001–0.1), and finally adopted SGD as the solver and 0.001 as the learning rate with categorical cross-entropy for the loss function. We examined 5–100 epochs to determine the optimized epochs for each model. To compensate for the class imbalance (or bias in the positive and negative sample numbers), the class weight option was applied in Keras.
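As an illustration of the class-imbalance compensation, the following minimal sketch computes inverse-frequency ("balanced") class weights. The text does not state how the weights were derived, so the formula, the function name, and the toy label counts are assumptions; the resulting dictionary has the form accepted by the `class_weight` argument of Keras `fit`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This inverse-frequency scheme is an assumption for illustration,
    not necessarily the weighting used in the study.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy example: 25 positive (1) and 75 negative (0) images.
labels = [1] * 25 + [0] * 75
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, each misclassified positive image contributes three times as much to the loss as a misclassified negative image, which offsets the 1:3 imbalance in this toy example.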
For regression by CNNs, we first selected 10% of images randomly for testing, then the remaining images were randomly separated into 70% for training and 30% for validating. Xception (Chollet, 2017) was implemented in Keras 2.2.4, and the fully-connected layer was customized to evaluate the Root Mean Squared Error (RMSE) against fruit softening dates. The model was pre-trained with the ImageNet dataset. For the basic setting of the model, we adopted “SGD” as the solver and a learning rate of 0.001. We examined 20–100 epochs to determine the optimized epochs. All models ran on Ubuntu 18.04 (DeepStation DK1000, 16GB RAM, GPU = 1).
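The two-stage split for the regression task can be sketched as follows. The random seed, the rounding of subset sizes, and the function name are assumptions made for illustration only.

```python
import random


def split_indices(n_images, test_frac=0.10, train_frac=0.70, seed=0):
    """Hold out `test_frac` of images for testing, then split the rest
    into training (`train_frac`) and validation (the remainder)."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    indices = list(range(n_images))
    rng.shuffle(indices)
    n_test = round(n_images * test_frac)
    test, rest = indices[:n_test], indices[n_test:]
    n_train = round(len(rest) * train_frac)
    return rest[:n_train], rest[n_train:], test


train, val, test = split_indices(1080)
print(len(train), len(val), len(test))  # 680 292 108
```

Under these assumptions, the 1,080 images yield 108 test, 680 training, and 292 validation images.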
Evaluation of trained CNN model performance
For binary classification, the performance of the trained models was evaluated with the Student’s t-test for the distribution of predictions between rapid over-softened (positive) and control (negative) samples, confusion matrix, ROC-AUC values, Precision-Recall curve, and the F1-score in the test samples. For the confusion matrix and F1-score, the threshold prediction value was set as 0.5. The distribution of the prediction values in the binary classification was also examined for the potential association with the softening dates by Pearson’s product-moment correlation analyses. For regression, Pearson’s product-moment correlation coefficients between the predicted and actual dates of fruit softening were evaluated.
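A minimal sketch of three of these measures is given below: the confusion matrix at a threshold of 0.5, the F1-score as the harmonic mean of precision and recall, and Pearson's product-moment correlation coefficient. For convenience the positive class is coded as 1 here, and a prediction value equal to the threshold is treated as positive; both choices, the function names, and the toy values are assumptions.

```python
import math


def confusion_at_threshold(y_true, scores, threshold=0.5):
    """Return (tp, fp, tn, fn), calling a sample positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, scores):
        pred = 1 if score >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient (inputs must vary)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy labels and prediction values for six hypothetical fruits.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
tp, fp, tn, fn = confusion_at_threshold(y_true, scores)
print(tp, fp, tn, fn, f1_score(tp, fp, fn))
```

The same `pearson_r` function could be applied either to predicted versus observed softening days (regression) or to classification prediction values versus observed softening days.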
Feature visualization in CNNs
Based on a previous report (Akagi et al., 2020), we applied three feature visualization methods, Grad-CAM (Selvaraju et al., 2016), Guided backpropagation (Springenberg et al., 2014), and Layer-wise Relevance Propagation (LRP, Bach et al., 2015), to the trained VGG16 model in the binary classification, which showed the best performance among the models used in this study (see Results section later). Briefly, we implemented Grad-CAM to find the high-impact regions in the feature map at the last convolutional layer (conv3_block3 in VGG16). The implementation and characteristics of these feature visualization methods using the iNNvestigate library (Alber et al., 2019) are available at <https://github.com/uchidalab/softmaxgradient-lrp> and Akagi et al. (2020). The extracted features were localized on the original image as heatmaps.
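The core Grad-CAM computation (Selvaraju et al., 2016) can be sketched in a few lines: each feature-map channel is weighted by the spatial mean of the class-score gradient for that channel, the weighted channels are summed, and negative values are clipped by a ReLU. This is a generic illustration and not the iNNvestigate implementation; in practice the gradients come from backpropagation through the trained network, whereas here they are supplied as toy inputs, and the function name is hypothetical.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from feature maps A[k][i][j] and gradients dScore/dA[k][i][j].

    alpha_k = spatial mean of the gradients for channel k
    heat[i][j] = max(0, sum_k alpha_k * A[k][i][j])
    """
    n_channels = len(feature_maps)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    alphas = [sum(sum(row) for row in grad) / (height * width) for grad in gradients]
    return [
        [
            max(0.0, sum(alphas[k] * feature_maps[k][i][j] for k in range(n_channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]


# Toy example: two 2 x 2 channels; channel 0 supports the class, channel 1 opposes it.
feature_maps = [[[1, 0], [0, 2]], [[0, 3], [1, 0]]]
gradients = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
heat = grad_cam(feature_maps, gradients)
print(heat)  # [[1.0, 0.0], [0.0, 2.0]]
```

The resulting low-resolution map is usually upsampled to the input image size before being overlaid as a heatmap.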
The softening days were distributed continuously from 1 to 30 days (Fig. 1C, D). This reflects the overlap of two distributions: rapid over-softening, a kind of disorder, and normal softening, a measure of maturation. It is therefore difficult to qualitatively distinguish rapid over-softening from the distribution alone. Here, considering the distribution of softening days and empirical criteria (Nakano et al., 2001), we defined the fruits that softened within 10 days as “rapid over-softening” for Dataset 1, although this also included some fruits that had no disorder but matured early. Dataset 2 was for a simple comparison of the earliest and latest 30% of softened fruits, to exclude any effects of the ambiguous samples that softened in the middle of the distribution.
Prediction of rapid over-softening and shelf life
For binary classification, all three applied CNN models, VGG16, ResNet50, and InceptionV3, showed statistically significant classification performance for both Dataset 1 and Dataset 2 (Fig. 3A, B, accuracy = ~87.0% in the test samples, F1-score = ~0.85, ROC-AUC value = ~0.845). We could not detect substantial or consensus differences in performance between the two datasets. Of the three CNN models, VGG16, which has the simplest layer structure of the three, achieved the best performance, reaching 87.0% for accuracy, and 0.85 for the F1-score in Dataset 1 (Fig. 3A). The VGG16 model also achieved better generalization performance than the other two models, in that both accuracy and loss exhibited only small gaps between the training and validation data sets in 20–40 epochs. For further examination of the classification ability of VGG16, a confusion matrix and the distribution of the prediction values are shown in Figure 3C and 3D, respectively. They were consistent with the results of ROC curve analysis and suggested that Dataset 2 may work better for actual prediction of rapid softening, although the statistical values showed no substantial differences between the two datasets (P = 8.04e−13 and 2.30e−12 for the prediction distribution in Dataset 1 and 2, respectively).
Classification abilities of the three CNNs with the two rapid-softening criteria. A. Prediction accuracy (in training and testing), and F1 score in the classifications with VGG16, ResNet50, and InceptionV3, in Dataset 1 and 2, respectively. The F1 score was calculated as the harmonic mean of precision and recall with a threshold prediction value = 0.5. B. ROC curves, ROC-AUC values, and PR curves for the classifications with each model. Although ROC curves (blue lines) in every model were apart from the chance line (in red) (ROC-AUC value > 0.5), VGG16 exhibited relatively better performance both in Dataset 1 and 2. C, D. Comparison of the classification abilities of Dataset 1 and Dataset 2 with the VGG16 model. Although Datasets 1 and 2 showed no substantial differences in terms of significance, the confusion matrix (C) and distribution of the prediction values (or confidence) (D) suggested that Dataset 2 would be better for practical use.
Regression tests with the Xception model showed statistically significant performance with both Dataset 1 and 2 (RMSE (unit: days) = 6.2 and 7.8, respectively), while correlations between the predicted and observed softening days were not considered adequate for actual estimation of shelf life (Fig. 4A, r = 0.242 and 0.228 for Dataset 1 and 2, respectively). On the other hand, in previous reports to predict internal disorders or seedlessness (Akagi et al., 2020; Masuda et al., 2021), prediction values output from binary classification were correlated to the degree of disorders or seed numbers. Consistent with these results, the prediction values in the binary classification of rapid over-softening (here, 0 and 1 for positive and negative, respectively) showed substantially higher correlations to the over-softening days than those with the regression model (Fig. 4B, r = 0.422 and 0.477 for Dataset 1 and 2, respectively). In a comparison of the datasets, Dataset 2, which comprised the earliest and latest 30% of all samples, showed a higher correlation with the same test samples (P = 0.107).
Regression of the days to fruit softening. A. Correlations between the observed days to softening and the days predicted from the CNN regression model. B. Correlations between the observed days to softening and the prediction values (or confidence) for the binary classification. In both panels, Dataset 1 and Dataset 2 are shown in light blue and pink, respectively.
We applied three feature visualization methods, Grad-CAM, Guided backpropagation, and LRP, to the trained VGG16 model, for the 270 testing images with various shelf lives (Fig. 5A). They all tended to exhibit high relevance in the fruit peripheral regions (or contours), where persimmon fruits with calyx-end cracking, an internal disorder, also showed strong relevance in feature visualization (Akagi et al., 2020). Among the three methods, Grad-CAM and Guided backpropagation tended to show wide relevance regions, including glossy areas, while LRP showed very narrow regions for relevance. This difference may be due to the fact that Grad-CAM and Guided backpropagation are both gradient-based visualization methods (Springenberg et al., 2014; Selvaraju et al., 2016), while LRP reveals the input pixels that are highly relevant to the results by decomposing the class likelihood into the input pixels (Bach et al., 2015). Here, we especially focused on the features surrounding the fruit apex, for which physiological signals were easier to access with microscopic analysis than in the peripheral region. Close-ups of the relevant regions in the three methods consistently tended to show color unevenness or a coarse texture on fruit skin (Fig. 5B for a representative). Note that the color or texture patterns in the relevant region were quite varied, so it was difficult to define a consensus visual feature.
Feature visualization of the rapid-softening classification. A. Feature visualization of the four fruits with various softening terms. The prediction values (or confidence) for the positive and negative classes are given on the left side of the original images. For the Grad-CAM and Guided backpropagation, the features for the predicted class are given, while LRP shows features for the positive (or rapid-softening) class. B. Close-ups of the fruitlet surface in the regions with high relevance to the positive class in Grad-CAM (red) and without high relevance (dark green). The highly relevant regions tended to exhibit severe color unevenness.
Rapid over-softening was thought to show few visible symptoms in terms of outer appearance at harvest (as given in Fig. 1B). Our results suggested that, even in normal RGB images only from the apex side, certain features indicating rapid over-softening appear at harvest, and these could be captured with our fine-tuned CNNs with up to 88.4% accuracy. Even so, it may be possible to improve the prediction performance, although we adequately examined the parameters for the training models. The remaining issues are not largely due to the model frameworks, but to the physiological (or biological) characteristics of rapid over-softening. First, we used only images from the fruit apex side, although the opposite calyx side could also include informative features. Persimmon fruit softening is thought to be triggered via physiological reactions in the calyx (Itamura et al., 1991; Nakano et al., 2002). Hence, multi-input CNNs (Abbasi et al., 2019) with both apex and calyx side images could achieve higher performance. Second, in terms of physiological definitions, we could not perfectly distinguish rapid over-softening and normal maturation in the binary categories, as shown in Figure 1C and 1D, in which the days to softening were distributed continuously. In other words, our positive samples may include multiple physiological reactions to earlier softening that could confuse feature extraction. On the other hand, CNN models trained with rapid over-softening could be applied to shelf life regression for all samples, implying that common visual features would work for the prediction of persimmon fruit softening. This is supported by physiological reactions in rapidly softened and normally matured persimmon fruits, in which ethylene production is a common key factor that initiates softening (Nakano et al., 2001, 2002; Wang et al., 2017).
Conventional (or past) deep learning frameworks were not able to explain the reason for the prediction, while more recent explainable AI methods could visualize the features for rapid over-softening in this study. Although the patterns of features were not consistent among the samples, the relevant regions tended to localize in the peripheral region, in which persimmon fruits with calyx-end cracking, an internal disorder, also showed strong relevance in feature visualization (Akagi et al., 2020). Furthermore, the featured region often exhibited color unevenness, which was also consistent with the features in calyx-end cracking. Rapid over-softened persimmon fruits, especially following de-astringency treatment, become partially or fully reddish and rapidly produce a large amount of ethylene (Ortiz, 2005; Nakatsuka et al., 2011; Wang et al., 2017). These facts together suggest that the featured color unevenness in this study could be an index of stresses and micro-softening. Subsequently, the resultant ethylene signals rapidly spread to cause whole fruit softening. This development of explainable AI allows featured region-specific physiological analyses in “potentially rapid softened” fruits, before actual softening. In combination with the explainable AI technique, “featured region-specific” histological or transcriptomic analyses would shed light on incipient physiological reactions related to the symptoms of rapid fruit softening.
Conclusion
Our application of deep learning with three CNN models could successfully classify rapid over-softened fruits with high accuracy from only 1,080 normal picture images of the outer appearance. The prediction values in the classification were correlated to the days to fruit softening; this could allow prediction of the fruit shelf life in persimmon. Feature visualization for the classification found potential symptoms of rapid softening on the surface of the fruits, which tended to be located on regions with subtle color unevenness in the peripheral regions. These results suggested that explainable deep learning can be a useful tool for predicting the occurrence of disorders that even experts cannot detect, and potentially could provide new insights into the physiological interpretations of these disorders.