The Horticulture Journal
Online ISSN : 2189-0110
Print ISSN : 2189-0102
ISSN-L : 2189-0102
INVITED REVIEW
Collaboration with AI in Horticultural Science
Eriko Kuwada, Takashi Akagi

2024 Volume 93 Issue 4 Pages 313-320

Abstract

Artificial intelligence (AI) is becoming increasingly prevalent in a wide variety of scientific fields. Recent progress in deep neural networks, or simply “deep learning”, has been particularly remarkable, leading to the development of valuable technologies for various biological applications. Nevertheless, the application of these AI technologies in the field of horticultural science has not progressed as quickly. In the horticultural field, there is often a tendency to compare, or have AI compete with, the accuracy (or ability) of experts with long experience or of existing systems, which may prevent the widespread adoption of AI technology in horticulture. Current evolving AI technologies go beyond mere prediction and diagnosis; through the application of “explainable AI” techniques, they can allow novel interpretations from a scientific perspective. Their scope extends not only to conventional image analysis, but also to various data formats, including genetic sequences and other numerical array data. Here, we introduce recent developments and the evolution of AI technologies, mainly deep learning, in plant biology and horticultural science. Recent applications of convolutional neural networks (CNNs) in image analyses have allowed the prediction/diagnosis of various invisible traits. Further combined application of explainable AI techniques and physiological assessments may spot features that potentially reveal the mechanisms of objective traits from a novel viewpoint. We also examine prospects for new applications of deep learning in horticultural science, such as for genetic factors or with new algorithms represented by the Transformer.

Introduction

Application of artificial intelligence frameworks to plant biology

Recent progress in artificial intelligence (AI) technologies has gradually allowed not only simple automation, but also a wide variety of advanced classification, regression, object recognition, and sentence/image generation tasks, often exceeding the quality achieved by experts with many years of experience. Machine learning (ML), a type of AI, has classically been applied in many biological fields, in which features are manually curated for model construction. A recent turning point in AI that has received a lot of attention is the development of high-quality deep neural networks, or simply deep learning (DL) frameworks, a type of ML. The concept and basic systems of DL were established in the 1980s. The recent focus on DL stems from its high accuracy in image classification following the development of a convolutional neural network (CNN) named AlexNet, which won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 (Krizhevsky et al., 2012). DL differs from traditional machine learning in that it consists of multiple layers and can automatically learn complex features from large datasets (Janiesch et al., 2021). A CNN is a DL framework built on convolutional layers, which extract local features while preserving translation invariance by applying sliding filters, called kernels, to the input data (Krizhevsky et al., 2012; Simonyan and Zisserman, 2014; Szegedy et al., 2015). More recently, the Transformer framework (Vaswani et al., 2017) has been developed, allowing a remarkable improvement in accuracy, especially in natural language processing (NLP) and generation tasks, as represented by ChatGPT (Generative Pretrained Transformer) (Brown et al., 2020). The most notable feature of the Transformer is its attention mechanism, which enables each element within a sequence (or “token” in the model) to assign weights to all the other elements.
The Transformer is not only applied to NLP tasks, but is also increasingly utilized for other types of biological matrix data, including images and genome sequences (Borhani et al., 2022; Gupta and Shankar, 2023), as partially discussed later.
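The attention mechanism described above can be sketched in a few lines of NumPy. This is an illustrative toy example, not code from any of the cited works; the function name and the three-token input are our own choices, and real Transformers add learned projection matrices and multiple heads on top of this core operation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise token similarities
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each token weights all others
    return weights @ V, weights

# A toy "sequence" of 3 tokens with 4-dimensional embeddings; in
# self-attention the same matrix serves as queries, keys, and values.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)
```

Each row of `w` sums to 1 and records how strongly one token attends to every other token, which is why attention maps lend themselves to the kind of interpretation discussed later for X-AI.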

The application of DL in plant biology was delayed, and CNNs were initially utilized only for simple image tasks, such as species classification (Dyrmann et al., 2016; Grinblat et al., 2016) and disease detection (Mohanty et al., 2016). Recently, DL applications have been developed not only for simple image analyses, but also for physiology and genetics. Most applications target highly sensitive measurements and the automation of phenotypic annotation. A DL technology that automatically measures root hair length along the root axis allowed an advanced study of the involvement of the circadian clock and light in the elongation of root hairs in Arabidopsis thaliana (Ikeda et al., 2023). Similarly, Goh et al. (2023) developed a motion-tracking confocal microscope combined with DL technology, allowing the quantification of cell division dynamics in roots. Although it was difficult to find features specific to early sex differentiation in moss plants, Tomizawa et al. (2023) constructed a DL model to classify the sexes of Marchantia polymorpha. They further visualized the features relevant to sex differentiation via “explainable DL” tools (this concept is discussed in later sections). In contrast to basic plant biology, agricultural science needs DL technologies that help achieve greater efficiency and automation in practical operations, making produce more valuable. Since around 2018, DL techniques have been widely applied to various areas, including yield prediction (Khaki and Wang, 2019; Maimaitijiang et al., 2020; Tanaka et al., 2023), disease detection (Ferentinos, 2018; Saleem et al., 2019), weed detection (Hasan et al., 2021; Rai et al., 2023), and crop quality management (Bhargava and Bansal, 2021).

Implementation of AIs in Horticultural Science

In the last five years, DL has also gradually been applied in horticultural science to realize automation or improved efficiency in tasks such as cultivar classification, crop detection, disorder diagnosis, and quality/yield prediction (Yang and Xu, 2021). For simple applications using only publicly available image data, DL techniques are available to classify species of vegetables (Zhu et al., 2018) or flowers (Cıbuk et al., 2019). Varieties or cultivars with similar appearances (or features) make classification more challenging (Ponce et al., 2019; Rodríguez et al., 2018). Object recognition can play a crucial role in automation, particularly in the use of robotic technology in large farms or orchards. For instance, early signs of diseases, pests, or weeds can be detected by DL techniques (Fuentes et al., 2017; Wu et al., 2021), enabling optimized actions, including the supply of water/fertilizers or weed control. For harvesting operations, DL techniques allow automatic recognition of fruit positions and maturity (Yoshida et al., 2022; Zhang et al., 2020), although robotic implementations that act on these recognition results during harvesting have not yet been sufficiently developed. An image segmentation method with deep learning not only detects blueberry fruits in a bush and their mechanical harvestability, but can also delineate differences in traits among blueberry varieties (Ni et al., 2020). For disorder or quality diagnosis, numerous characteristics have been targets of DL application, including the detection of apple diseases (Jiang et al., 2019; Liu et al., 2017), prediction of multiple fruit disorders in peach (Masuda et al., 2023b), detection of calyx-end cracking in persimmon (Akagi et al., 2020), and detection of fruit internal damage in blueberry (Wang et al., 2019). These examples detected not only direct, but also indirect, symptoms in fruits reflecting the internal states of the targeted disorders, which often allowed the diagnosis of invisible disorders.
Quality diagnosis also involves non-destructive prediction of shelf life with CNN frameworks, such as in persimmon fruit (Suzuki et al., 2022) or melon (Qian et al., 2023), potentially contributing to food waste reduction or branding enhancement. Although normal camera images (or simple RGB images) are still common (and easy to apply) for DL analyses, hyperspectral or near-infrared spectral images have also become good targets in recent years (Xuan et al., 2022; Yu et al., 2018). A benefit of using these (invisible light) images is that they can define novel features that the human eye (or even experts with many years’ experience) cannot recognize. Yield prediction has often applied DL frameworks to images captured by unmanned aerial vehicles (UAVs) (Apolo-Apolo et al., 2020; Chen et al., 2019). Recent studies applied not only images, but also a multimodal approach combining images and meteorological data, achieving higher prediction accuracy (Barriguinha et al., 2022).

One of the bottlenecks in the application of AI in horticultural science is the difficulty of collecting large datasets and ensuring label reliability. The accuracy of DL models is highly dependent on the quantity and quality of training data. To ensure the robustness of a DL prediction, datasets in horticultural science may need to cover extensive conditions, various varieties, and diverse growth environments. Consequently, various image augmentation techniques, particularly generative adversarial networks (GANs) (Goodfellow et al., 2014), have been widely applied. GANs are employed to synthesize datasets tailored to individual situations, and researchers often use GANs to extend their captured data (Abbas et al., 2021; Bird et al., 2022; Lu et al., 2022). In cases where labeled data are limited, techniques such as transfer learning (Ahmad et al., 2021; Behera et al., 2021), semi-supervised learning (Casado-García et al., 2022; Ghosal et al., 2019; Khan et al., 2021), reinforcement learning (Elavarasan and Vincent, 2020), and active learning (Albert-Weiss and Osman, 2022) can also be utilized to enhance DL accuracy.
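GANs themselves are too involved to sketch here, but the simpler, classical augmentation that typically accompanies them is easy to illustrate. The following toy sketch (our own, not from the cited studies) yields label-preserving geometric variants of a single captured fruit image, multiplying a small dataset sixfold:

```python
import numpy as np

def augment(image):
    """Yield simple label-preserving variants of an RGB image array (H, W, 3)."""
    yield image                        # the original capture
    yield np.fliplr(image)             # horizontal mirror
    yield np.flipud(image)             # vertical mirror
    for k in (1, 2, 3):
        yield np.rot90(image, k)       # 90/180/270-degree rotations

# A placeholder 64x64 RGB image standing in for one fruit photograph.
img = np.zeros((64, 64, 3), dtype=np.uint8)
variants = list(augment(img))          # 6 training examples from one capture
```

These transformations are safe only when the label is invariant to them (e.g., cultivar identity); orientation-dependent traits such as calyx-end position would require a more careful choice.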

Interpretable AIs that go beyond mere prediction

As mentioned, DL frameworks are potentially applicable to various horticultural “tasks”. On the other hand, regarding horticultural “science”, how can we find new biological insights with DL beyond mere prediction? Initially, the features a deep neural network applies for prediction (or “the reason DL can predict”) were a black box, making it challenging to interpret DL model judgements. However, recent progress in the development of explainable AI (X-AI) techniques, including Saliency Maps (Simonyan et al., 2013), Grad-CAM and Guided Grad-CAM (Selvaraju et al., 2017; Springenberg et al., 2014), and Layer-wise Relevance Propagation (LRP) (Bach et al., 2015), has gradually allowed visualization of the features DL models use to make predictions. Although X-AI technologies may not attract much attention in the information science field, they may have major advantages and applications in biology, since insights into the features relevant to a prediction are exactly what researchers and farmers want to know. This may also help clarify the empirical intuition of experts with many years of experience if we can reproduce their skills with DL frameworks. Further detailed analysis of the features visualized by X-AI may provide novel insights into objective phenotypes from various physiological aspects, as described in the next section.
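The core idea behind saliency-style X-AI is to ask how sensitive a model's output is to each input pixel. As a hedged toy sketch (our own construction, not the cited methods): real Saliency Maps backpropagate gradients through a trained network, but the same quantity can be approximated for any black-box scorer by finite differences, which makes the concept concrete. The linear "model" below is a deliberately trivial stand-in:

```python
import numpy as np

def sensitivity_map(predict, x, eps=1e-4):
    """Approximate |d predict / d pixel| for every pixel by central differences.

    `predict` is any function mapping a 2-D array to a scalar score;
    a true saliency map would obtain these gradients by backpropagation.
    """
    sal = np.zeros_like(x, dtype=float)
    it = np.nditer(x, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        xp, xm = x.astype(float), x.astype(float)
        xp = xp.copy(); xp[idx] += eps
        xm = xm.copy(); xm[idx] -= eps
        sal[idx] = abs(predict(xp) - predict(xm)) / (2 * eps)
    return sal

# Toy "model": a linear scorer that only looks at the image centre,
# mimicking a network whose decision depends on a small fruit region.
w = np.zeros((8, 8)); w[3:5, 3:5] = 1.0
predict = lambda img: float((img * w).sum())
sal = sensitivity_map(predict, np.ones((8, 8)))
```

The resulting map is non-zero exactly over the centre patch the scorer uses, which is the kind of "where did the model look" evidence that Grad-CAM and LRP provide for real CNNs.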

Here, we give some examples of the application of interpretable image diagnosis with X-AI techniques in horticultural crops. In persimmon, CNNs trained with normal pictures (or RGB images) of the fruit apex side can predict seed numbers in the fruit (Masuda et al., 2021). The application of Grad-CAM visualized the relevance weights, a key factor in the prediction results, mainly in image regions around the apex and in the shading of fruit peripheral regions. These clearly corresponded to actual seed positions and reflected fruit height, which is markedly affected by seed numbers (Masuda et al., 2021). Calyx-end cracking is a representative internal disorder in persimmon fruit, which often severely reduces marketability depending on the degree of the disorder. Experts with decades of experience can diagnose this disorder, although no clear symptoms have been defined. A combination of several CNNs could diagnose calyx-end cracking in persimmon fruit with > 75% accuracy, comparable to fruit-sorting experts with many years of experience (Akagi et al., 2020). Importantly, the application of various X-AIs, including Grad-CAM, LRP, and their derivatives, could visualize biologically interpretable features, reflecting the transition of stress derived from internal cracking via color unevenness. These two examples may be cases in which the visualized features were highly consistent with our biological knowledge. On the other hand, the features spotted by X-AIs are often hard to interpret biologically. Minamikawa et al. (2022) approached the prediction of peelability and hardness of Citrus fruit by integrating conventional statistics and an X-AI technique. Conventional statistics allowed a clear definition of correlations and potential networking among fruit trait characteristics. The features visualized by X-AI were then easily interpretable based on these trait characteristic networks, consistent with the empirical knowledge of breeders with many years’ experience.

Even when we have no prior information regarding the features spotted by X-AI, combinations of further biological assessments focusing on these features may provide novel insights into the physiological reactions involved in the phenomena predicted by AI. Figure 1 exemplifies the idea of the combined application of X-AI and physiological analyses in callus regeneration. Prediction of shoot regeneration from an incipient callus is difficult. However, if it could be diagnosed by deep learning, X-AI could identify the key regions for regeneration. Physiological analysis focusing on these key regions may help detect a reaction specific to the fate of regeneration. Regarding an actual application, rapid over-softening of persimmon fruit is a severe disorder that is mostly unpredictable at harvest, even by experts with decades of experience in fruit sorting. Simple CNNs could predict this disorder from normal photo images at harvest with > 70% accuracy (Suzuki et al., 2022). Even so, the features visualized by Grad-CAM were hard to interpret in terms of fruit softening. The application of transcriptomic analysis to the featured and non-featured regions allowed an interpretation of the physiological reactions specific to the potential symptoms of rapid over-softening from the perspective of gene regulatory networks (Masuda et al., 2023a). Indeed, this AI-plus-omics approach successfully revealed a region-specific radical ethylene-related response, which was consistent with a representative fruit softening mechanism. Such biological interpretations could be made not only with omics datasets, but even with simple microscopic assessments.

Fig. 1

A schematic model for X-AI-guided physiological interpretation of premonitory symptoms. The example shows callus regeneration, for which early diagnosis is difficult. In the later developmental stage, shoot regeneration may be visually diagnosable, but the causal physiological reactions may already have been processed or completed. If deep learning models can predict regeneration from images of early developmental states, X-AI can spot the features relevant to the premonitory symptoms (or potential early causal reactions). Featured region-specific assessments, in comparison with non-featured regions, can provide physiological interpretations of the premonitory reactions in the fate of regeneration.

AIs on genetic factors

Although we may think that DL techniques are applicable only to image or language data, any array (or matrix) data is a potential target of DL. As current DLs are not limited to mere prediction tools, X-AI is thought to be highly compatible with genetic factors and genomic sequences. DNA sequences can be expressed as one-hot arrays, in which the four nucleotide residues, A, C, T, and G, are represented as [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], and [0, 0, 0, 1], respectively (Zou et al., 2019). With this representation, the number of potential application platforms is essentially unlimited. Notwithstanding, in horticultural crops, there have been few examples of applying DL frameworks to genomic analysis. Here, we introduce an example of constructing CNN models that predict gene expression from promoter sequences. Cis-regulatory elements (CREs) are short, non-coding DNA sequences recognized by transcription factors (TFs) that play a central role in the regulation of gene expression. Mutations in CREs have greatly contributed to the evolution of horticultural crops, allowing expression tuning related to agronomically important traits, including berry skin color, flesh color, and fruit size (Alonge et al., 2020; Espley et al., 2009; Kobayashi et al., 2004). Based on these evolutionary aspects, cis-editing has been proposed as one of the main approaches for next-generation breeding, highlighting the importance of targeted expression design to further explore the potential of existing accessions/cultivars (Li et al., 2020). Even so, defining the function of each CRE is still difficult, mainly due to the complexity of the combinations recognized by TFs and the ambiguity of their recognition and regulation patterns.
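The one-hot encoding just described can be written in a few lines. This minimal sketch follows the A, C, T, G column order given in the text (Zou et al., 2019); the function name is our own:

```python
import numpy as np

BASES = "ACTG"  # column order as given in the text: A, C, T, G

def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot array."""
    idx = {b: i for i, b in enumerate(BASES)}
    arr = np.zeros((len(seq), 4), dtype=np.int8)
    for pos, base in enumerate(seq.upper()):
        arr[pos, idx[base]] = 1  # one unit vector per nucleotide
    return arr

enc = one_hot("ACTG")  # each row is the unit vector for that base
```

Once a promoter sequence is in this array form, it can be fed to the same CNN machinery used for images, with kernels sliding along the sequence axis instead of over pixels.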

A two-step DL model trained with genome-wide promoter sequences in tomato successfully predicted gene expression activation at fruit ripening initiation (Akagi et al., 2022). In the first DL step, a fully-connected model predicting CREs from genomic sequences was constructed by training on Arabidopsis DAP-seq data, as CRE sequence patterns are well conserved within a TF subfamily among plant species (O’Malley et al., 2016; Weirauch et al., 2014). Then, the second DL frame (a simple CNN), trained with the resultant CRE arrays for the genome-wide promoter regions, could predict gene expression patterns in tomato fruit. A subsequent X-AI approach identified the CREs or nucleotide residues relevant to the prediction of the expression patterns of each gene, and this was experimentally validated with maturing tomato fruits (Akagi et al., 2022). This DL-based cis-decoding framework could be applicable not only to the characterization of cis-trans regulatory networks, but also to allele design with optimized expression from the viewpoint of actual breeding of horticultural crops (Fig. 2). Figure 2 shows an example of cis-design based on DL and X-AIs. Even if there are no desirable expression variations in the current alleles (or standing allele variations), once a suitable model for the prediction of expression patterns is constructed, feature visualization steps can find the responsible nucleotide residues. Artificial mutation of the responsible residues would efficiently result in a new expression pattern. If an allele with optimized expression is predicted by the DL model, CRISPR-Cas9 may be able to derive the targeted allele, as suggested by Li et al. (2020).
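The intuition behind the CNN step, scanning a promoter for CRE-like motifs, can be illustrated with a single hand-built convolution. This is a toy sketch of our own, not code from Akagi et al. (2022): the 3-bp "motif" TGA and the short promoter string are hypothetical, and a trained model would learn many such kernels rather than being given one:

```python
import numpy as np

def one_hot(seq, bases="ACTG"):
    """One-hot encode DNA using the A/C/T/G column order from the text."""
    arr = np.zeros((len(seq), 4))
    for pos, b in enumerate(seq.upper()):
        arr[pos, bases.index(b)] = 1.0
    return arr

def scan_motif(onehot_seq, kernel):
    """1-D convolution: slide a PWM-like kernel along the sequence and
    return a per-position match score, as a CNN's first layer does."""
    L, k = onehot_seq.shape[0], kernel.shape[0]
    return np.array([(onehot_seq[i:i + k] * kernel).sum()
                     for i in range(L - k + 1)])

# Hypothetical 3-bp "motif" TGA used as the kernel; the score peaks
# exactly where the motif occurs in the (hypothetical) promoter.
kernel = one_hot("TGA")
scores = scan_motif(one_hot("ACTGACC"), kernel)
```

X-AI applied to such a model works in the reverse direction: it asks which positions (and which kernels) most influenced the final expression prediction, which is what identifies candidate CREs and responsible residues.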

Fig. 2

X-AI-guided cis-decoding potentially allows the design of novel alleles with optimized expression patterns. The existing alleles often have no desirable variations in expression patterns. Once a good DL model to predict expression patterns from promoter CREs or nucleotide residues is constructed, an explainable AI-based cis-decoding framework can provide the information on key CREs/residues relevant to their expression. This could lead to the design of novel alleles with optimized expression patterns, which cannot be done with the existing genetic diversity.

Models for gene expression prediction are not limited to the DL approach. The random forest machine learning method with k-mer-based parsing of a genome allowed highly accurate prediction of gene expression related to cold stress in panicoid grasses, resulting in the construction of a cross-species applicable model (Meng et al., 2021). In Arabidopsis, gene expression related to heat and drought stress has been predicted by a CNN on k-mer-featured word bags, and X-AI has identified potential words (or k-mers) relevant to the prediction of the expression patterns (Azodi et al., 2020). Note that the prediction accuracy of these models is highly dependent on each biological context, such as treatments or organs. This means that rating the models’ ability only by prediction accuracy may not be appropriate, and it would be worthwhile to compare the independent features that each model applied for its predictions. In yeast, DL models with much larger-scale sequence data are available to predict gene expression patterns with extremely high accuracy (Vaishnav et al., 2022). In horticultural crops, however (at least as of the time of writing), it is very challenging to prepare such massive sample sizes and genome data volumes. Thus, we propose focusing on the importance of feature interpretation from a biological viewpoint, while understanding the risks associated with imperfect prediction accuracy (Hanson et al., 2023), in horticultural science.
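The k-mer "word bag" featurization used in the studies above reduces a promoter to counts of short subsequences. As a minimal sketch under our own naming (the cited works differ in k and in downstream modeling):

```python
from collections import Counter

def kmer_bag(seq, k=3):
    """Count all overlapping k-mers in a sequence: a bag-of-words feature
    vector in which each distinct k-mer is one 'word'."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# A hypothetical 8-bp promoter fragment yields 6 overlapping 3-mers.
bag = kmer_bag("ATGATGCA", k=3)
```

Such count vectors can then be fed to a random forest or CNN, and X-AI over the model points back to the individual k-mers (candidate motifs) that drove each expression prediction.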

Perspectives with advanced DL frameworks

The recent progress in new AI frameworks has been very rapid, which makes it difficult to cover all techniques in detail. Notwithstanding, a few breakthrough technologies (or algorithms), represented by the Transformer (Vaswani et al., 2017), have been widely implemented in many kinds of applications and will become the new standard for DL frameworks, even in horticultural science. For instance, the AlphaFold series, which allows high-quality prediction of 3D protein structures from amino acid sequences alone (Jumper et al., 2021), is based on the Transformer. Training on massive genome and epigenome datasets with a Transformer, named “Enformer”, achieved extremely accurate prediction of expression patterns in mammals (Avsec et al., 2021). These techniques may easily be transferred to applications in horticultural crops in this post-genome era, although understanding their analytical mechanisms and training sets will be essential for actual use. The prediction (or training) of AlphaFold relies on the co-evolution of residues within a sequence, which can be visualized in an attention map (Jumper et al., 2021; Senior et al., 2020). The training sets are published protein structures that cover a wide range of evolution, but not standing variations. This suggests that AlphaFold may often be unable to predict protein interactions or 3D structures with (recent) single residue substitutions that do not reflect the evolutionary continuity in the trained database. The Transformer has also been applied to image data as the Vision Transformer (ViT) (Dosovitskiy et al., 2020), but its prediction accuracy was much weaker than that of simple CNNs for a binary classification task with three thousand RGB images of persimmon fruit as training data (unpublished data). On the other hand, a combined framework of ViT and CNN achieved perfect classification of plant species (N = 44) with only 44 thousand leaf images as training data (Lee et al., 2023).
Although there may currently be few direct applications of these techniques in horticultural crops, we expect that simpler implementations or improved combined platforms for these advanced AIs will be established more quickly than conventional bioinformatics approaches for handling nucleotide sequences.

© 2024 The Japanese Society for Horticultural Science (JSHS)

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial (BY-NC) License.
https://creativecommons.org/licenses/by-nc/4.0/