Reviews in Agricultural Science
Online ISSN : 2187-090X
A Comparative Study of the Deep Learning Based Image Segmentation Techniques for Fruit Disease Detection
Manju Bagga, Sonali Goyal

2025 Volume 13 Issue 1 Pages 81-104

Abstract

Agriculture’s productivity is a key factor in economic growth. Disease detection in plants is crucial in agriculture because plant diseases are a fairly common occurrence. If sufficient care is not taken in this regard, plants suffer major consequences, which have an impact on the quality, quantity, and productivity of the corresponding products. For instance, both biotic and abiotic agents can cause various diseases in stone fruits and other crops. Early disease patterns and clusters can be identified using computer vision technologies. This work focuses on deep learning-based crop image segmentation research. First, the fundamental concepts and features of deep learning-based crop leaf image segmentation are presented. The state of the research is then outlined and crop image segmentation techniques are summarized, together with an analysis of their respective drawbacks, to map out future development paths. Despite recent remarkable advances in crop segmentation, deep learning-based crop image segmentation still faces research challenges: datasets contain few crop images, image resolution is modest, and segmentation accuracy is not high, so imprecise segmentation results cannot satisfy real-field requirements. With an eye towards these issues, a thorough examination of state-of-the-art deep learning-based crop image segmentation techniques is offered to assist researchers in resolving present problems.

1. Introduction

Since plants are the primary source of food for humans, it is imperative that they be cared for. Around the world, stone fruits are grown on 5.07 million hectares of land, yielding 35.24 million tons of fruit annually. Apricots, peaches, nectarines, plums, and cherries are grown on over 43,000 hectares in India, where they yield about 0.25 million tons of fruit annually. Joshi et al. [1] predicted that if fruit wines are made from increased fruit production, orchardists will benefit financially from greater employment prospects and higher returns.

However, distinguishing healthy from diseased plants is a crucial step in the development of successful agriculture. To keep uninfected plants safe from diseased ones, it is critical to identify the afflicted plants, as suggested by Haq and Ijaz [2]. Algani [3] noted that since most disease symptoms are evident on the leaves, plant leaves are the primary source for infection detection. Reddy and Neeraja [4] observed that the best method for identifying plant infections is to look for leaf disease, because various infections have distinct symptoms. Numerous illnesses affecting stone fruits are brought on by both known viruses and unknown graft-transmissible pathogens, as reported by Khan et al. [5].

Mangoes, like other stone fruits, are susceptible to a number of diseases throughout their lives. One such disease is bacterial canker, which manifests as elevated, yellow-to-brown patches encircled by a continuous white halo. Pale yellow dots on leaves are the initial sign of powdery mildew; they spread rapidly to form massive blotches that can completely cover the surfaces of the petiole, stem, and leaves. Twisted and puckered leaves with black, round scabby patches on the underside are signs of scab. As the illness worsens, leaves yellow and drop, as described by Rao et al. [6].

Alshammari et al. [7] reported that olive oil accounts for about 80% of olive farming output, with table olives making up the remaining 20%. Olive harvests can be impacted by a variety of pests, diseases, and deficiencies, including the mite Aculus olearius, the olive fly, leaf mold, Prays oleae, olive bark beetles, and olive borers, as well as olive wilt, angular leaf spot, Verticillium wilt of olive, olive knot, peacock spot, and bacterial leaf blight. Aculus olearius damage, leaf mold, and leaf spot can all be observed on the leaves of the host olive, as presented by Lachgar et al. [8]. Yao et al. [9] noted that peaches are a significant stone fruit whose production can be impacted by a variety of diseases such as cankers/fruit rot, anthracnose, scab, bacterial spot, Cytospora canker, powdery mildew, peach leaf curl, and others.

The production rate of stone fruits is impacted by four primary diseases, and such disorders have never been easy to detect. Until recently, the sole method for diagnosing plant diseases was visual analysis, i.e., observation with the unaided eye. For accurate disease assessment by specialists in the field, this technique necessitates ongoing crop field surveillance. For vast plant areas, the visual analysis procedure can be exceedingly expensive, labor-intensive, and time-consuming because it necessitates continuous human observation. Meanwhile, the population’s exponential growth is quickly altering the availability of and demand for food. Bedi and Gole [10] noted that such a situation compels society as a whole to consider the application of cutting-edge technology in order to diagnose diseases accurately and early and to apply corrective measures when needed.

Image segmentation algorithms have been shown to be one of the most cost-effective and precise methods for evaluating the characteristics associated with different plant diseases. Among the various methods, deep learning (DL) methods based on artificial intelligence (AI) that use convolutional neural networks (CNNs) have performed well in image recognition tasks. According to Liu et al. [11], the CNN is a useful method for automatically identifying and categorizing pests, illnesses, and lesions. On the other hand, automatic detection and image categorization in the field face difficulties. Concerns regarding diseased images include the complexity of the background surrounding the lesion, lighting variations, the photography hardware, and the angle at which the pictures were captured, as presented by Saradhambal et al. [12].

The importance of identifying lesions rather than merely categorizing them has been emphasized by Barbedo [13]: the category by itself will not yield adequate results unless it is combined with the lesion’s position. Using radial basis function neural networks, an automated technique was presented for separating the fungal pathogen from mango leaves. In comparison to the K-means algorithm, which produces an average specificity of 0.8178 and an average sensitivity of 0.8091, the suggested approach achieves greater performance with an average specificity of 0.9115 and an average sensitivity of 0.9086, as depicted by Chouhan et al. [14].
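
As an aside, the sensitivity and specificity figures quoted above are simple ratios over pixel- or region-level classification outcomes. A minimal Python sketch (the counts below are made up for illustration, not taken from the cited study):

```python
# Illustrative helper (not from the cited study): computing the
# sensitivity and specificity used to compare segmentation methods,
# from confusion-matrix counts.

def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate: diseased pixels found
    specificity = tn / (tn + fp)  # true-negative rate: healthy pixels kept
    return sensitivity, specificity

# Example with made-up counts:
sens, spec = sensitivity_specificity(tp=910, fp=85, tn=915, fn=90)
print(round(sens, 3), round(spec, 3))  # 0.91 0.915
```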

Using marker-controlled watershed segmentation (hue and gradient information), the affected areas of mango leaves are automatically identified. It was discovered by Zeng et al. [15] that the suggested disease identification achieved approximately 90% and 80% accuracy when the affected regions were chosen automatically and manually, respectively. The test dataset was made up of images that were sourced from the internet.

The precision of quantitative evaluation of crop disease severity and the recognition of crop illnesses are directly impacted by the segmentation of disease lesions in leaf images. Research is focused on finding the most effective, high-quality ways to extract diseased regions from crop leaves. Over the past 20 years, lesions have been extracted and recognized using conventional image processing techniques such as edge detection, color space transformation, feature space transformation, and others.

It has proven difficult to develop an automated system for detecting leaf disease in stone fruits. Many diseases might have outwardly similar symptoms, making it challenging to distinguish between them using more nuanced indicators. For example, mango anthracnose and scab share similar visual signs. This study looks at several stone fruit leaf diseases that have been identified and categorized early on using deep learning technologies. In addition, Table 1 presents a comparison of strong deep learning tools, which greatly aids the researcher in selecting one based on their problem description. The literature has been reviewed, and then the topic of segmentation and categorization has been discussed.

Table 1: A comparison of popular open-source deep learning tools

Tools Suitable user interface Practicality
TensorFlow Python Support for distributed applications, high performance, portability, and flexible development
Keras Python Adaptable and makes building neural networks easier
Caffe MATLAB, Python High readability, scalability, speed, quantity of users, and community reach
PyTorch Python, C++ Modularization, support for dynamic neural networks, ease of development and debugging, minimal learning costs
Theano Python Adaptable and very effective
Deeplearning4j Python, Scala, Java Scalable and useful for many applications, including voice, image, and natural language processing. It can also be used to train models on huge datasets.

The segmentation portion receives less attention. Additionally, a more universal plant segmentation technique that works in both controlled and uncontrolled settings needs to be created and put into practice. Since the primary goal of image segmentation is to separate the symptom information from the backdrop, the most important task in a complex environment is how to segment the images while localizing and detecting damaged plant leaves, as noted by Nanehkaran et al. [16]. Using previously trained DL models, numerous authors have investigated the categorization of both single- and multi-biotic leaf diseases. Once the dataset was pre-processed as part of the initial phase of image processing, the authors used classification algorithms to identify the disease’s spots. However, classification models do not yet provide adequate boundary identification for the lesion. Consequently, it is especially crucial to identify these illnesses on the leaves of stone fruits promptly. To identify illnesses on plant leaves, several excellent image-based segmentation techniques have been developed, such as that of Singh and Misra [17].

We searched Google Scholar using the keywords “deep learning-based image segmentation techniques for crop disease detection” or “crop disease detection using image segmentation methods” to find the most recent research in order to better summarize the DL-based image segmentation techniques. The article is structured as follows: Section 2 investigates the concept of crop image segmentation. We cover the definition of deep learning and its applications in Section 3. The primary body of the examined literature is contained in Sections 4 and 5. Section 4 presents five network structures for deep learning-based crop (leaf) image segmentation: Mask R-CNN, U-Net, Seg-Net, Mask Scoring R-CNN, and DeepLabv3+. The segmentation techniques employed recently on stone fruits alone are introduced in Section 5. Evaluation metrics and data sets from well-known crop image analysis tasks are shared in Section 6. Section 7 contains the article’s outlook and summation.

2. Crop image segmentation

2.1 Problem statement

Image segmentation based on crop leaf imaging refers to the use of computer image processing technologies to analyze and process images in order to achieve segmentation, extraction, and three-dimensional reconstruction. To distinguish the affected region from the entire area of the leaf, some studies have employed segmentation algorithms to split an image into two or more useful parts. The accuracy and dependability of crop disease identification can be significantly increased by using fundamental DL-based architectures to analyze lesions and other regions of interest qualitatively or even quantitatively. Image processing is also used to estimate the severity of crop leaf disease, in addition to evaluating fruit quality, as depicted by Patil and Shekhawat [18]. The primary plant parts used as imaging objects at the moment are leaves, stems, and fruit surfaces.

The following steps comprise the image segmentation technique for crop disease detection:

1. Acquire a data set of images of crops, usually consisting of training, validation, and test sets. A common practice in deep learning image processing is to partition the data set into three sections: the training set is used to train the network model; the validation set is used to tune the model’s adjustable parameters; and the test set is used to confirm the model’s final performance.

2. To increase the size of the data set, preprocess and enlarge the image, usually by standardizing the original image and applying random rotation and scaling.

3. To segment the leaf image of a crop, apply the appropriate image segmentation method, then export the segmented images. This essentially entails identifying the infected leaf and stem, measuring the affected area, and determining the infected area’s shape and color.

4. To validate the efficacy of crop image segmentation, it is necessary to establish effective performance indicators for verification. This is a crucial step in the procedure, known as performance evaluation.
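
The first two steps above can be sketched in code. A minimal Python illustration of the dataset split and simple augmentation (file names, split ratios, and the toy transforms are illustrative assumptions, not the pipeline of any cited study):

```python
import random

# Step 1: shuffle and partition image paths into train/val/test lists.
def split_dataset(paths, train=0.7, val=0.15, seed=42):
    rng = random.Random(seed)
    paths = paths[:]                       # copy before shuffling
    rng.shuffle(paths)
    n = len(paths)
    n_train, n_val = int(n * train), int(n * val)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

# Step 2: toy augmentation on an image given as a list of rows --
# random horizontal flip plus a random 90-degree rotation.
def augment(image, rng):
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]           # horizontal flip
    for _ in range(rng.randrange(4)):                   # rotate 0-3 times
        image = [list(row) for row in zip(*image[::-1])]
    return image

paths = [f"leaf_{i:03d}.jpg" for i in range(100)]
train, val, test = split_dataset(paths)
aug = augment([[1, 2], [3, 4]], random.Random(0))
print(len(train), len(val), len(test))  # 70 15 15
```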

2.2 Image segmentation

In the domain of image understanding, image segmentation has emerged as a prominent subject and an enduring problem in the field of machine vision. The term “image segmentation” describes the process of dividing a picture into multiple disconnected regions based on characteristics including color, grayscale, spatial texture, and geometric structure. Within a region these qualities are consistent or similar, while there is noticeable variation between different regions. Image segmentation can be classified as semantic, instance, or panoptic based on the varying coarse and fine granularities of the segmentation process, and segmentation has driven improvements in many fields, whether agriculture, medicine, language translation tools, or genomics, as suggested by Dhawan et al. [19], Bouslimi and Echi [20], Hajari et al. [21], and Sabba et al. [22]. Figures 1 and 2 demonstrate the segmentation of tomato and peach leaves.

Figure 1: Extraction of infected portion from tomato leaf through segmentation
Figure 2: Leaf area segmentation in peach image of infected leaf area

Conventional image segmentation techniques, such as threshold-based, region-based, and edge detection-based segmentation, can also be used for this task. These techniques segment the image using mathematical and digital image processing knowledge. Although the segmentation process is quick and the calculation straightforward, fine-detail accuracy cannot be assured. Deep learning-based segmentation techniques now clearly surpass traditional image segmentation techniques, although the traditional concepts are still valuable to understand. Deep learning-based techniques have made significant progress in image segmentation, and their accuracy has outperformed that of conventional segmentation techniques. The first deep learning system to effectively address image semantics with per-pixel labelling was the fully convolutional network (FCN); this was the first study to segment images using convolutional neural networks, and its authors put forward the notion of fully convolutional networks. Exceptional segmentation networks with a strong advantage in processing fine edges include U-Net, as depicted by Ronneberger et al. [23] and Zhang and Zhang [24], Mask R-CNN by Yao et al. [25], Mask Scoring R-CNN by Huang et al. [26], Seg-Net by Badrinarayanan et al. [27], and DeepLabv3+ by Wang et al. [28].
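
As a concrete illustration of the threshold-based family mentioned above, here is a minimal Otsu's-method sketch in Python with NumPy, run on a synthetic image of dark lesion pixels on a bright leaf (the image and all values are fabricated for illustration):

```python
import numpy as np

# Threshold-based segmentation: Otsu's method picks the grayscale
# threshold that maximizes the between-class variance of the histogram.
def otsu_threshold(gray):
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Synthetic leaf image: bright leaf (~200) with a dark lesion patch (~40).
rng = np.random.default_rng(0)
img = np.clip(rng.normal(200, 10, (64, 64)), 0, 255).astype(np.uint8)
img[20:30, 20:30] = np.clip(rng.normal(40, 10, (10, 10)), 0, 255).astype(np.uint8)

t = otsu_threshold(img)
mask = img < t                 # lesion mask: pixels darker than threshold
print(t, int(mask.sum()))      # threshold falls between the two modes
```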

2.3 Disease recognition techniques for plants

The use of images and machine learning, particularly CNNs, in automated plant disease identification systems has significantly improved the precision of disease examination. CNNs, a branch of artificial intelligence, have gained popularity as a flexible technique for ingesting copious volumes of diverse data and producing accurate forecasts of difficult-to-predict occurrences, as noted by Liu et al. [11].

3. Deep neural network architectures for image segmentation

Deep learning is a class of algorithms and architectures, rather than a single strategy, that can be applied to a variety of problems such as augmented reality, video surveillance, driverless cars, medical image analysis, and crop disease detection; it can even process video in real time to monitor drivers’ actions while they are driving, as presented by Merampudi et al. [29], Sumatri et al. [30], Panwar et al. [31], Iftikhar et al. [32], Yayla et al. [33] and Salami [34]. An overview of supervised and unsupervised deep neural network architectures, including encoder-decoder and autoencoder models, generative adversarial networks, convolutional neural networks, recurrent neural networks, and long short-term memory, is given in this section. These networks are primarily used for image segmentation. Convolutions are a powerful tool for creating semantic activation maps with constituents that naturally comprise different semantic segments. These internal activations have been used in a variety of ways to segment images. Table 2 provides an overview of the main deep learning-based segmentation methods and a succinct explanation of their main contributions.

Table 2: Brief description of segmentation algorithms

Year Segmentation model Segmentation type Description
2022 Mask RCNN Instance segmentation with ResNet50 and ResNetx101 as the backbone architecture Segmenting using masked lesion area to improve accuracy
2022 Mask Scoring RCNN Instance segmentation with ResNet50 and ResNetx101 as the backbone architecture Segmenting using masked lesion area to improve accuracy
2021 U-Net Semantic Segmentation with attention mechanism Multi-scale extraction integration
2020 Seg-Net Semantic Segmentation A fully convolutional neural network approach is used
2019 Faster RCNN Instance segmentation with region proposal network Deep convolutional neural networks with object detection models are used
2019 Seg-Net Semantic Segmentation with encoder-decoder architecture SqueezeNet encoder with depth wise separable convolution
2018 DeepLab Semantic Segmentation Pyramid of spatial pooling, Atrous convolution, and DenseCRF
2017 WNet Semantic Segmentation with encoder-decoder architecture Segmentation in unsupervised learning with normalized cut loss
2017 Attention based Segmentation Instance Segmentation with recurrent Neural Network architecture Focus modules for the segmentation of images
2017 PSPNet Semantic Segmentation Multiple scale pooling to achieve scale-invariant segmentation
2017 Mask R-CNN Instance Segmentation Segmenting using a region proposal network
2015 DeepMask Class Specific Segmentation Segmentation and classification through concurrent learning
2015 FCN Semantic Segmentation Full convolutional layers

3.1 Convolutional neural networks

An animal’s visual cortex served as the biological model for the multilayer neural network known as a CNN. The architecture is especially helpful for applications that process images. Yann LeCun developed the first CNN, with an architecture centered on handwritten character recognition, including postal code interpretation. A deep CNN consists of several layers, such as convolutional, pooling, and fully connected layers. The starting layers of a deep structured network identify features like edges, while later layers merge these elements to create higher-level input qualities. The basic CNN architecture is shown in Fig. 3. Among the most popular CNN architectures used for crop disease detection are GoogleNet by Zhang et al. [35], AlexNet by Rao et al. [6], ResNet by Rajbongshi et al. [36] and VGGNet by Uguz and Uysal [37].
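
The convolution and pooling layers described above can be illustrated with a toy NumPy forward pass (a hand-written edge filter, no training; the shapes and the filter are illustrative):

```python
import numpy as np

# Core CNN building blocks: a 2D convolution (here with a fixed
# vertical-edge filter) followed by 2x2 max pooling. Forward pass only.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(x, size=2):
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.zeros((8, 8))
image[:, 4:] = 1.0                                   # left dark, right bright
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
feat = conv2d(image, sobel_x)                        # responds at the edge
pooled = max_pool(feat)
print(feat.shape, pooled.shape)                      # (6, 6) (3, 3)
```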

3.2 Recurrent neural networks

Recurrent neural networks (RNNs), one of the major types of ANN, are a subclass of neural networks in which the connections among the nodes in a layer form a directed graph along a temporal order. Recurrent connections make it possible to model the relationship between a variable’s current state and its past states. Because the RNN-based method can handle sequential data to create predictions, as in motion recognition, it has drawn a lot of interest, as presented by Rumelhart et al. [38]. Newer RNN models that overcome issues like vanishing gradients, such as Gated Recurrent Units (GRUs) and Long Short-Term Memory networks (LSTMs), allow training on longer sequences. A few recent studies have demonstrated the efficacy of RNN techniques for processing fixed-size data, like an image, in a sequential manner. It has been demonstrated, for instance, that a GRU-based RNN architecture can effectively describe dependencies across various plant observation photos, as observed by Lee et al. [39], and that discriminative regions of images may be captured using LSTM for fine-grained categorization, as observed by Zhao et al. [40].
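
The recurrent connection described above can be sketched as a minimal NumPy forward pass over a sequence (random, untrained weights; the dimensions are illustrative):

```python
import numpy as np

# Minimal RNN cell: the hidden state h_t depends on the current input
# x_t and the previous state h_{t-1} -- the recurrent connection.
rng = np.random.default_rng(1)
input_dim, hidden_dim, seq_len = 4, 8, 5
W_xh = rng.normal(0, 0.1, (hidden_dim, input_dim))
W_hh = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

def rnn_forward(xs):
    """Run the cell over a sequence; return all hidden states."""
    h = np.zeros(hidden_dim)
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # recurrence step
        states.append(h)
    return np.stack(states)

xs = rng.normal(size=(seq_len, input_dim))        # 5 feature vectors
states = rnn_forward(xs)
print(states.shape)                               # (5, 8)
```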

Figure 3: The basic architecture of CNN

3.3 Encoder-Decoder models

A set of models known as encoder-decoders are trained to map data points from an input domain to an output domain using a two-stage network, as described by Goodfellow et al. [41]. The provided image is routed through convolution and pooling blocks in the encoding step to produce feature maps. The feature map is then decoded using a deconvolution technique in the decoding step to create a map the same size as the original image. These models are widely used for image-to-image translation and for sequence-to-sequence modelling in natural language processing applications. The output of these models can be an improved image, such as one that has been enhanced through super-resolution or de-blurring, or a segmentation map. A specific type of encoder-decoder paradigm where the input and output are the same is called an autoencoder. Some researchers combine image segmentation techniques with the region of interest (ROI). For instance, according to Kao et al. [42], a convolutional autoencoder acted as the ROI’s background filter when determining an image’s ROI.
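
A shape-level sketch of the two-stage idea, assuming the simplest possible encoder (average pooling) and decoder (nearest-neighbour upsampling); real models use learned convolutions and deconvolutions:

```python
import numpy as np

# Encoder halves spatial resolution, decoder restores it, so input and
# output sizes match -- the property segmentation maps rely on.
def encode(x):
    """2x2 average pooling: (H, W) -> (H/2, W/2) feature map."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:h * 2, :w * 2].reshape(h, 2, w, 2).mean(axis=(1, 3))

def decode(z):
    """Nearest-neighbour upsampling: (H, W) -> (2H, 2W)."""
    return np.repeat(np.repeat(z, 2, axis=0), 2, axis=1)

image = np.arange(64, dtype=float).reshape(8, 8)
z = encode(image)              # (4, 4) bottleneck representation
out = decode(z)                # (8, 8), same size as the input
print(z.shape, out.shape)      # (4, 4) (8, 8)
```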

3.4 Generative adversarial networks

Generative modelling entails automatically identifying and learning patterns in the incoming data so that the model can produce new instances resembling the original dataset. A generative adversarial network (GAN) can produce synthetic images, accomplishing a significant dataset enlargement with minimal loss of image attributes; being essentially a deep learning model, it has become one of the most popular techniques for unsupervised data augmentation. GANs consist of two parts: a generator that is trained to create new data (for instance, in computer vision, new images from real-world images already in existence) and a discriminator that compares those images with real-world instances to distinguish between real and false images. For the purpose of detecting plant diseases, the authors in Pujari et al. [43] employed an enhanced method that combines generative adversarial networks with CNNs. Abbas et al. [44] presented a deep learning-based technique for identifying tomato leaf diseases that creates artificial images of tomato plant leaves using a conditional generative adversarial network.
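
The generator-discriminator interplay can be sketched with a deliberately tiny one-dimensional GAN (hand-derived gradients, made-up data; a real system would use a framework such as TensorFlow or PyTorch and image data):

```python
import numpy as np

# Toy 1-D GAN sketch (illustrative only): generator g(z) = a*z + b tries
# to mimic "real" data drawn from N(3, 1); a logistic discriminator
# D(x) = sigmoid(w*x + c) tries to tell real samples from generated ones.
rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
a, b = 1.0, 0.0      # generator parameters (starts far from the data)
w, c = 0.1, 0.0      # discriminator parameters
lr = 0.05

for _ in range(2000):
    x_real = rng.normal(3.0, 1.0)
    x_fake = a * rng.normal() + b
    # Discriminator step: ascend log D(x_real) + log(1 - D(x_fake))
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w += lr * ((1 - d_real) * x_real - d_fake * x_fake)
    c += lr * ((1 - d_real) - d_fake)
    # Generator step: ascend log D(x_fake) (non-saturating loss)
    z = rng.normal()
    x_fake = a * z + b
    d_fake = sigmoid(w * x_fake + c)
    a += lr * (1 - d_fake) * w * z
    b += lr * (1 - d_fake) * w

samples = a * rng.normal(size=1000) + b
print(round(float(samples.mean()), 2))  # generated mean drifts toward the real mean
```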

4. Systematic literature review of DL based image segmentation models used for crop disease detection: Assessing current review articles and determining a justification for the current review

In recent years, a significant amount of research has focused on the impact of biotic and abiotic diseases on plant health. While there are few review publications addressing the importance of image segmentation for crop disease diagnosis, some have attempted to provide insights into this complex issue. These reviews differ in their approaches, quality, and findings, with a few focusing on specific areas like image compression, robotic perception, and medical imaging [45, 46, 47, 48, 49, 50]. Some studies have used qualitative techniques, such as thematic analysis, to examine image segmentation algorithms like U-Net and Mask-RCNN, while others have employed quantitative methods like meta-analysis or systematic reviews. This paper stands out by focusing solely on deep learning-based image segmentation for leaf disease detection in agriculture, highlighting its role in increasing production while addressing challenges specific to agriculture, such as dataset limitations and accuracy concerns. It offers valuable insights for researchers by analyzing existing constraints and proposing future research directions. Additionally, this survey provides a comprehensive overview of crop disease detection studies, summarizing their objectives, segmentation methods, and results. By addressing the gaps between general deep learning techniques and their agricultural applications, it serves as a unique resource for advancing agricultural disease detection research.

It is challenging to identify plant diseases in their natural environments due to significant variations in shape, size, texture, color, backdrop, arrangement, and imaging illumination. CNNs have a powerful feature extraction capacity and good feature expression potential for image segmentation, and they require neither extensive image preprocessing nor manual feature extraction. Consequently, in recent years, CNNs have been utilized to segment crop leaf images, with remarkable results in the field of disease detection. In addition, Fig. 4 presents a chronology of some of the best-performing models for DL image segmentation since 2015. Several learning-based segmentation techniques are surveyed in this part, as shown in Table 3.

Figure 4: Progression of image segmentation methods based on deep learning

4.1 Fully convolutional models

The fully convolutional network (FCN) is used for semantic image segmentation, and nowadays it serves as the foundation for nearly all semantic segmentation models. Using convolution, the FCN first extracts and encodes features from the input image. The feature map is then gradually resized using upsampling or deconvolution to match the original image’s size. Plant disease and pest segmentation techniques can be categorized into three groups based on variations in FCN network topology: conventional FCN, U-Net as used by Sodjinou et al. [57] and Chen et al. [58], and Seg-Net as proposed by Badrinarayanan et al. [27]. The network structure of the FCN is shown in Fig. 5.
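
The upsampling/deconvolution step that characterizes the FCN can be sketched in NumPy as a stride-2 transposed convolution restoring a coarse score map to a finer resolution (the kernel and values are illustrative, not from any cited model):

```python
import numpy as np

# FCN idea: features are computed at reduced resolution, then a
# transposed convolution upsamples the coarse map back toward the
# input size so every pixel gets a class score.
def transposed_conv2d(x, kernel, stride=2):
    """Upsample x by `stride` using a transposed convolution."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((h * stride + kh - stride, w * stride + kw - stride))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * kernel
    return out

coarse = np.array([[1.0, 2.0], [3.0, 4.0]])    # 2x2 coarse score map
kernel = np.ones((2, 2))                        # simple 2x2 upsampling kernel
fine = transposed_conv2d(coarse, kernel)        # -> 4x4 map
print(fine.shape)                               # (4, 4)
```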

Figure 5: The Structure of full convolution neural network for Olive Leaf Scorch

Table 3: Examining and contrasting several segmentation methods for the diagnosis of plant diseases

Ref Year Objective Techniques applied Result
Chouhan et al. [14] 2020 A scale-invariant feature transform technique is used to provide an automated method in order to separate the fungal diseases from the mango leaves. Radial basis function neural network Average specificity = 0.9115 and sensitivity = 0.9086
Singh and Misra [17] 2017 Provided a method for image segmentation that is used to automatically detect and categorize plant leaf diseases in pine trees. Genetic algorithm Accuracy 97.6%
Yao et al. [25] 2022 ResNet50 and ResNetx101 are utilized as the backbone architecture for classification, and instance segmentation is employed to obtain detailed information about peach leaves, such as peach disease, masked lesion regions and the severity level of a disease. Mask R-CNN and Mask Scoring R-CNN Focal Loss function improved rate of recognition and segmentation accuracy
Pujari et al. [43] 2013 Identified diseases in mango, grape, and pomegranate using an ANN classifier by employing the Run length Matrix approach to extract textural information from ROI. Watershed techniques, K-means clustering, Thresholding and region growing The respective classification accuracies for the affected and normal fruit varieties are 76.6% and 84.65%.
Gulhane and Gurjar [51] 2011 Utilized to separate individual cotton leaf pixels in an image in order to recognize and classify cotton diseases. Color-image segmentation method Accuracy 90.5%
Bashish et al. [52] 2022 Plant leaf disease detection and categorization using leaf texture characteristics calculations. K-means clustering technique Precision 93%
Revathi and Hemalatha [53] 2014 Demonstrated varying classifier accuracies for cotton leaf disease identification by using color and texture data to identify the edge. Particle swarm optimization feature Selection method Accuracy 94%
Ali et al. [54] 2017 Citrus disease classification based on textural characteristics and color histogram ΔE color difference algorithm Accuracy 99.9% and sensitivity with 0.99 area under the curve
Naranjo-Torres et al. [55] 2019 This technique extracted the region of interest from the Softmax layer and measured the maturity of the fruits. Convolutional autoencoder (CAE) + Backpropagation neural network (BPNN) Accuracy 100%
Khan et al. [56] 2022 Using deep learning, a framework can identify the type of disease and determine how much of a particular tomato leaf is afflicted. Semantic segmentation based deep convolutional neural network (DCNN) Accuracy 97.6%
Sodjinou et al. [57] 2021 To separate weeds and crops using agronomic color pictures U-Net Accuracy 99.19%
Chen et al. [58] 2021 A novel method was developed to increase the accuracy of lesion segmentation in rice leaves by utilizing an attention mechanism and multi-scale extraction integration. U-Net Accuracy 94%
Zabawa et al. [59] 2020 This framework counts the number of grapevine berries in a picture by identifying individual berries in the image using a convolutional neural network. Semantic segmentation Accuracy 94.0%
Shao et al. [60] 2021 This technique integrated the watershed algorithm for dense rice image recognition with the Transfer learning-based localization-based counting fully convolutional neural network model. FCN + watershed algorithm Accuracy 89.88%
Bai et al. [61] 2017 Image segmentation using fuzzy clustering based on neighborhood grayscale data to identify cucumber leaf spot disease Fuzzy C-means Average positive error: 0.04%, average negative error: 0.10%, average segmentation error: 0.12%
Yadav et al. [62] 2021 CNN models that are used for automatic disease identification in peach crops are segmented using grey level slicing on pre-processed leaf pictures. Gray level slicing imaging method Accuracy 98.75%
Joshi et al. [63] 2019 Utilizing black gram leaf image segmentation and enhancement, a fully automated, non-invasive technique is suggested to promptly identify the illnesses. Otsu thresholding + mask + VGG 16 CNN Accuracy 98.2%
Rani and Amsini [64] 2017 Fuzzy set operation for Otsu-based color image segmentation is used to identify the diseased area in litchi fruits and leaves in order to extract the existence of diseased areas from the images. Otsu based color image segmentation Not mentioned
Singh and Shekhawat [65] 2019 A dependable technique for recognizing certain leaf spot-type illnesses that affect olive trees is found. Histogram thresholding + K-means segmentation Not mentioned
Mohapatra et al. [66] 2022 Suggested using convolutional neural networks (CNNs) as a metaheuristic to identify and diagnose diseases in mango leaves. Fuzzy c-means Accuracy 91.2%
Saleem et al. [67] 2021 The suggested leaf vein-seg method uses a cubic support vector machine to segment the leaf’s vein pattern in order to quickly diagnose the disease in mango leaves. Novel leaf vein-seg approach Accuracy 95.5%
Lin et al. [68] 2019 The degree of leaf infection on a cucumber plant was determined by the CNN model, which has a higher computational complexity. U-Net Average pixel accuracy = 96.08%, Dice accuracy = 83.45% and intersection over union = 72.11%
Kaur et al. [69] 2023 The Deep Segmentation CNN model was trained using the labelled, enriched, and augmented data. This semantic segmented data was identified and classified for both single and multiple tomato leaf illnesses. U-Net + Seg-Net Accuracy 98.2%
Wang and Zhang [70] 2018 Suggested a full convolution neural network-based approach to give a crop leaf disease monitoring system a theoretical foundation. FCN Segmentation accuracy 96.26%
Kerkech et al. [71] 2020 The objective is to map out unhealthy regions of the vineyard so that they may be precisely and quickly treated. This will ensure that the vines remain in a healthy state, which is crucial for managing yield. Seg-Net Not mentioned
Stewart et al. [72] 2019 Shown how deep learning-based instance segmentation techniques combined with UAV technology can be used to give precise, high-throughput quantitative measurements of maize plant disease. R-CNN Not mentioned
Wang et al. [73] 2019 A tomato disease detection system is presented that is based on object detection models and deep convolutional neural networks. Faster R-CNN and Mask R-CNN Lowest mean time (0.123)
Deng et al. [34] 2023 Overcoming obstacles such small infected spots and hazy borders in order to accomplish quantitative identification and accurate segmentation of tomato leaf diseases. MC-UNet incorporates Cross-layer Attention Fusion, Multi-scale Convolution, SoftPool, and SeLU activation. Accuracy 91.32%
Divyanth et al. [74] 2023 To precisely locate, divide, and assess the severity of maize disease lesions in challenging field settings. Two-phase method for semantic segmentation that combines DeepLabV3+ and U-Net. Accuracy 96%
Zhu et al. [75] 2024 To enhance apple leaf disease segmentation in intricate settings with inconsistent lighting and overlapping leaves. LD-DeepLabv3+, a two-stage system with adaptive loss and attention processes 98.70% IoU for leaf segmentation and 86.56% IoU for spot extraction
Taji et al. [76] 2023 To better detect and manage plant diseases by classifying and segmenting them. Hybrid model that combines instance/semantic segmentation, transfer learning, and GAN Accuracy 98.78%
Yue et al. [77] 2023 To segment contaminated tomatoes in real time for harvesting and leaf health monitoring. Improved YOLOv8s-Seg using feature fusion, RepBlock, and SimConv. F1-score: 88.7%, mAP at 0.5: 92.2%, inference time: 3.5 ms

Conventional FCN: Wang et al. [70] presented a novel approach to segmenting leaf diseases of the maize crop based on fully convolutional neural networks, in response to the difficulty that typical computer/machine vision is sensitive to changing lighting and complicated backgrounds. This novel method achieved a segmentation accuracy of 96.26%. Wang et al. [78] suggested a method for classifying pests and plant diseases relying on an enhanced fully convolutional network (FCN). After convolution layers had collected feature information from multiple layers of the maize leaf lesion image, this approach performed a deconvolution operation to restore the dimensions and clarity of the original image. Compared with the first FCN technique, the accuracy rate was 95.87%, the segmentation of small affected areas was highlighted, and the integrity of the lesion was preserved.

U-Net: U-Net combines a conventional FCN structure with an encoder-decoder structure. It is characterized by the introduction of a layer-hopping (skip) connection that links the feature map from the encoding stage with that from the decoding stage, which facilitates the recovery of segmentation information. Fifty cucumber powdery mildew leaves collected in a natural setting were segmented using a U-Net based convolutional neural network by Lin et al. [68]. In contrast to the original U-Net, a batch normalization layer was added after each convolution layer to make the network robust to weight initialization. The investigation demonstrates that the U-Net based convolutional neural network is superior to the random forest (RF), K-means, and GBDT techniques currently in use for segmenting powdery mildew on cucumber leaves at the pixel level, with an average pixel accuracy of 96.08%. Even with few samples, the U-Net technique can segment the lesion site against a complicated backdrop rapidly and precisely. The network structure of U-Net is shown in Fig. 6.
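The layer-hopping connection can be sketched at the level of feature-map shapes. The following is a minimal NumPy illustration, not the convolutional U-Net itself: the convolution, batch-normalization, and activation layers are omitted, and the function names are ours.

```python
import numpy as np

def downsample(x):
    """Encoder step: 2x2 max pooling on a (C, H, W) feature map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample(x):
    """Decoder step: nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_connect(encoder_features, decoder_features):
    """The layer-hopping connection: concatenate the encoder feature map
    with the upsampled decoder feature map along the channel axis."""
    return np.concatenate([encoder_features, decoder_features], axis=0)

# Shape walk-through for one encoder/decoder level:
x = np.arange(8 * 16 * 16, dtype=float).reshape(8, 16, 16)  # 8-channel map
enc = downsample(x)            # (8, 8, 8)
dec = upsample(enc)            # back to (8, 16, 16)
fused = skip_connect(x, dec)   # (16, 16, 16): channels are concatenated
```

In the real network, the concatenated map is then convolved again, so the decoder can combine coarse semantic information with the fine spatial detail carried over by the skip connection.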

Seg-Net: Seg-Net likewise uses a traditional encoder-decoder architecture. One of its features is that the decoder's upsampling operation reuses the indices of the encoder's max-pooling operation. The authors in [71] proposed an image segmentation algorithm for unmanned aerial vehicles. Seg-Net was used to segment 480 samples of visible and infrared images into four categories: healthy, symptomatic, shadow, and ground. The recommended method's detection rates on grapevines and leaves were 92% and 87%, respectively. The network structure of Seg-Net is shown in Fig. 7.
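The reuse of max-pooling indices that distinguishes Seg-Net's decoder can be illustrated with a minimal single-channel NumPy sketch (2×2 windows; the function names are ours, not from any Seg-Net implementation):

```python
import numpy as np

def maxpool_with_indices(x):
    """2x2 max pooling that also records, for each window, the flat index
    of the maximum. These are the indices Seg-Net's decoder reuses."""
    h, w = x.shape
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 4)
    idx = windows.argmax(axis=1)
    pooled = windows.max(axis=1).reshape(h // 2, w // 2)
    return pooled, idx

def unpool_with_indices(pooled, idx):
    """Seg-Net style upsampling: place each pooled value back at the
    position its encoder maximum came from; all other positions stay zero."""
    h2, w2 = pooled.shape
    windows = np.zeros((h2 * w2, 4))
    windows[np.arange(h2 * w2), idx] = pooled.ravel()
    return windows.reshape(h2, w2, 2, 2).transpose(0, 2, 1, 3).reshape(h2 * 2, w2 * 2)

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 7, 0],
              [0, 6, 0, 8]], dtype=float)
pooled, idx = maxpool_with_indices(x)    # pooled = [[4, 5], [6, 8]]
restored = unpool_with_indices(pooled, idx)
```

Because the decoder restores values at the exact locations selected by the encoder, boundary information survives the pooling stages without the decoder having to learn upsampling weights.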

Figure 6: The Structure of the U-Net
Figure 7: The Structure of the Seg-Net

4.2 Mask R-CNN

Mask R-CNN is one of the most popular image instance segmentation techniques available today. It can be conceptualized as a multitask learning technique based on a detection and segmentation network. Instance segmentation can be applied, for example, to identify individual lesions and count them when numerous lesions of the same type adhere to or overlap one another; semantic segmentation, in contrast, frequently treats numerous lesions of the same kind collectively. Using images from an unmanned aerial vehicle, Stewart et al. [72] trained a Mask R-CNN model to segment lesions of maize affected by northern leaf blight (NLB). The trained model could correctly identify and segment individual lesions. The IoU between the predicted lesion and the ground truth was 0.73, with an average accuracy of 0.96 at an IoU threshold of 0.50. In addition, several studies use object detection networks in conjunction with the Mask R-CNN architecture to identify plant diseases and pests. Mask R-CNN and Faster R-CNN are the two models that Wang et al. utilized in [73]: Mask R-CNN was used to detect and segment the position and shape of the infected area, whereas Faster R-CNN was utilized to determine the class of tomato infection. The Mask R-CNN network structure for a plant leaf is shown in Fig. 8.

The findings demonstrated that the suggested method is capable of precisely and swiftly identifying 11 classes of tomato infections by delineating the position and shape of damaged patches. Mask R-CNN achieved a high identification rate of 99.64% across all tomato disease classes. The segmentation strategy outperforms classification and detection network approaches in terms of obtaining lesion information. However, it requires a large amount of labelled data, and obtaining that data pixel by pixel can be costly and time-consuming, much as for the detection network.

Figure 8: The network structure of Mask R-CNN

5. Measures for image segmentation models

Depending on the study’s aim, evaluation indices may differ, but so far the majority of studies have concentrated on metrics that measure model correctness. The most widely used metrics are accuracy, precision, pixel accuracy, mean Average Precision (mAP), F1 score, recall, IoU, ROC-AUC, balanced accuracy, and Cohen’s kappa:

Definitions of accuracy, precision [79] and recall are:

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (1)

Precision = TP / (TP + FP)  (2)

Recall = TP / (TP + FN)  (3)

In Formulas (1)–(3), TP (true positive) is the number of lesions on plant leaves that the algorithm correctly identified: the predicted value is 1 and the actual value is also 1. FP (false positive) is the number of lesions the algorithm misidentified: the predicted value is 1 but the actual value is 0. FN (false negative) is the number of lesions that went undetected: the predicted value is 0 when it should be 1. TN (true negative) is the number of non-lesion regions correctly identified as such.
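Assuming binary labels (1 = diseased, 0 = healthy), Formulas (1)–(3) can be computed from these counts as in the following sketch; the function names are illustrative, not from any cited work:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = diseased, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + tn + fp + fn)        # Formula (1)

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0     # Formula (2)

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0     # Formula (3)
```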

mAP is most frequently used to assess detection accuracy. The average precision (AP) for each class in the dataset must first be determined:

AP = Σ_{x=1}^{N(class)} Precision(x) × ΔRecall(x)

In the formula above, Precision(x) and Recall(x) represent the precision and recall at the x-th point of the precision–recall curve, ΔRecall(x) is the change in recall between consecutive points, and N(class) is the number of points summed over.

The mAP is the mean of the average precision values across all N classes in the dataset, where AP_n is the average precision for a particular class n:

mAP = (1/N) Σ_{n=1}^{N} AP_n
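The two formulas above can be sketched as follows, assuming the per-class precision–recall points are already available (function names are illustrative):

```python
def average_precision(precisions, recalls):
    """AP = sum over curve points x of Precision(x) * ΔRecall(x).

    `recalls` is assumed sorted ascending, starting implicitly from 0."""
    ap, prev = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev)   # Precision(x) * ΔRecall(x)
        prev = r
    return ap

def mean_average_precision(ap_per_class):
    """mAP = (1/N) * sum of AP_n over the N classes."""
    return sum(ap_per_class) / len(ap_per_class)
```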

The F1 score is also used to evaluate the correctness of the model; it is determined by both the precision and the recall of the model. The formula is as follows:

F1 score = (2 × Precision × Recall) / (Precision + Recall)

Apart from the F1 score, two other metrics, the macro and weighted F1 scores, are used for determining model performance, especially in multiclass classification scenarios with imbalanced datasets [80].

The Macro F1-Score treats all classes equally, irrespective of frequency, by computing the F1 score separately for each class and then taking the unweighted average, where N represents the total number of classes. The Macro F1-Score is useful when every disease class is equally important: it guarantees that minority classes are given equal weight and offers a comprehensive picture of the model’s behavior across all classes.

Macro F1-Score = (1/N) Σ_{i=1}^{N} F1-Score_i

The Weighted F1-Score takes into account the support, i.e. the number of true instances of each class: the F1 score is calculated for each class and the scores are averaged, weighted by the number of true examples per class. By weighting the contribution of each class in proportion to its occurrence in the dataset, the Weighted F1-Score offers a more accurate analysis of the model’s behavior when some diseases are more common than others. Here w_i is the proportion of true instances for class i.

Weighted F1-Score = Σ_{i=1}^{N} w_i × F1-Score_i

w_i = (Number of true instances in class i) / (Total number of instances)
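The three F1 variants can be sketched as follows, assuming per-class precision/recall or F1 values have already been computed (function names are ours):

```python
def f1_score(precision, recall):
    """F1 = 2 * Precision * Recall / (Precision + Recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(f1_per_class):
    """Unweighted mean of per-class F1 scores: every class counts equally."""
    return sum(f1_per_class) / len(f1_per_class)

def weighted_f1(f1_per_class, support_per_class):
    """Per-class F1 averaged with weights w_i = support_i / total support."""
    total = sum(support_per_class)
    return sum(f * s / total for f, s in zip(f1_per_class, support_per_class))
```

Note how a rare class with a poor F1 drags the macro average down much more than the weighted one, which is exactly the behavior described above.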

Pixel accuracy (PA) is defined as the ratio of accurately identified pixels to total pixels. A pixel correctly predicted to belong to the given class is a true positive, whereas a pixel correctly detected as not belonging to the given class is a true negative.

Pixel Accuracy = (TP + TN) / (TP + TN + FP + FN)

Mean pixel accuracy (MPA), which calculates the percentage of correct pixels for each class and averages it across all classes, is an extension of PA.

The Jaccard Index, also known as Intersection over Union (IoU), is a statistical measure that computes the ratio of the number of pixels common to both the target and prediction masks to the total number of pixels included in either mask. Mean IoU is the IoU averaged across all classes.

IoU (Jaccard Index) = |target ∩ prediction| / |target ∪ prediction|
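A sketch of IoU on binary masks, assuming NumPy arrays in which nonzero pixels belong to the class (returning 1.0 for two empty masks is our convention, not a standard):

```python
import numpy as np

def iou(target, prediction):
    """Jaccard index for two binary masks: |A ∩ B| / |A ∪ B|."""
    target = np.asarray(target, dtype=bool)
    prediction = np.asarray(prediction, dtype=bool)
    union = np.logical_or(target, prediction).sum()
    if union == 0:                # both masks empty: define IoU as 1
        return 1.0
    return np.logical_and(target, prediction).sum() / union

def mean_iou(ious):
    """Mean IoU across classes."""
    return sum(ious) / len(ious)
```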

The ROC-AUC (Receiver Operating Characteristic – Area Under the Curve) metric is frequently employed to evaluate classification tasks such as distinguishing between healthy and unhealthy crop leaves or between disease groups [81]. Healthy and diseased leaf samples are frequently distributed unevenly in leaf datasets; because it assesses performance across all thresholds, ROC-AUC is resilient in such situations. A per-class evaluation or a macro-averaging strategy can be used to extend ROC-AUC when identifying many diseases, in order to evaluate the overall performance of the model. In contrast to accuracy, ROC-AUC provides a more thorough assessment because it evaluates the model’s performance independently of the decision threshold. For instance, when identifying diseases like leaf spot or powdery mildew in crops, ROC-AUC ensures that the model can accurately prioritize unhealthy samples while minimizing false positives, which is crucial for practical deployment in agriculture.

The ROC curve graphically represents the performance of a model across various classification thresholds. It plots the True Positive Rate (TPR), the percentage of correctly identified positive cases (diseased leaves), against the False Positive Rate (FPR), the percentage of falsely identified negative cases (healthy leaves misclassified as diseased).

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)

The AUC (Area Under the Curve) is a single scalar value representing the area under the ROC curve. It gauges the model’s ability to differentiate between the classes: AUC = 1.0 denotes a perfect model with complete class separation, AUC = 0.5 denotes a model that is no more effective than random guessing, and higher AUC values denote better discrimination.
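The AUC can be computed without explicitly tracing the curve, using the equivalent rank statistic: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. A minimal sketch, with an illustrative function name:

```python
def roc_auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney U) identity: the probability
    that a random positive sample scores higher than a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

The O(n²) pairwise loop is fine for illustration; production libraries use a sort-based formulation instead.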

Balanced Accuracy: some leaf diseases may be under-represented in the dataset. If a model accurately detects the majority class (healthy leaves) but fails on the minority class, it may still obtain high overall accuracy [82]. Balanced accuracy addresses this by equally weighting sensitivity and specificity, ensuring that the model’s performance across all classes is appropriately assessed. It is the average of the True Positive Rate (sensitivity) and the True Negative Rate (specificity), and in cases where class distributions are imbalanced it offers a fairer indicator of performance.

Balanced Accuracy = (1/2) × (TP / (TP + FN) + TN / (TN + FP))

Cohen’s Kappa: Cohen’s Kappa offers information about the model’s classification performance beyond accuracy. By taking into consideration the probability that the model makes correct predictions by chance, it provides a more nuanced assessment of the model’s efficacy, especially in multiclass classification scenarios. It is a statistical metric that assesses inter-rater agreement for categorical items while controlling for chance agreement. Its values range from -1 to 1, with 1 denoting complete agreement, 0 denoting no agreement beyond chance, and negative values denoting disagreement [83].

κ = (P0 − Pe) / (1 − Pe)

P0 = (Total correct predictions) / (Total predictions)

Pe = Σ_{i=1}^{k} [(Total predicted instances of class i) × (Total actual instances of class i)] / (Total predictions)²
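A sketch of Cohen’s kappa following the formulas above (labels are given as Python lists; the function name is ours):

```python
def cohens_kappa(y_true, y_pred):
    """kappa = (P0 - Pe) / (1 - Pe), where Pe is the chance agreement
    computed from the marginal class frequencies of each rater."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    p0 = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    pe = sum(
        (y_pred.count(c) * y_true.count(c)) / (n * n) for c in classes
    )
    return (p0 - pe) / (1 - pe)
```

For a model that predicts a single class for every sample, P0 equals that class's frequency and so does Pe, driving kappa to 0 even though plain accuracy may look respectable.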

6. Findings

6.1 To expand the size of the dataset in order to produce more precise outcomes

Deep learning techniques are now used broadly in computer/machine vision applications, including applications in the agricultural sector such as diagnosing plant diseases. Plant disease samples for agriculture are not easily accessible: compared to public repositories, self-gathered datasets are smaller and require more burdensome data labelling. Approximately 80% of the research publications by Arivazhagan and Ligi, Mia et al., Trang et al. and Salamai [84, 85, 86, 87] mentioned the limitation of smaller datasets as a significant issue that ultimately calls for the use of data augmentation. Deep learning techniques have not been broadly applied in the domain of plant disease diagnostics because many plant diseases are uncommon and collecting disease images is expensive; as a result, only a small amount of data is acquired for training the model.

6.2 Intricate background images result in poor accuracy

Large-scale intricate backgrounds may result in more false detections due to background noise in the gathered photos, particularly in low-quality images [88, 89]. Given the limitations of available algorithms, directions for improving various object identification algorithms are examined, and a number of techniques, including attention mechanisms, regional CNNs, and encoder-decoder based architectures, are suggested to enhance performance on images with complicated backgrounds. The employment of such techniques improves the rational allocation of resources: their primary function is to identify an area of interest quickly and dismiss irrelevant information. By exploiting knowledge of these features, it is possible to separate features in plant disease photos and minimize background noise.

6.3 Model architectures

Even though DL-based models have demonstrated encouraging results on difficult benchmarks, there are still open questions about these models. What precisely are deep models learning? What meaning should we assign to the features that these models learn? For a given dataset, what is the minimal neural network architecture that can attain a particular segmentation accuracy? While there are methods for visualizing the learned convolutional kernels of these architectures, a thorough analysis of their underlying behavior is still missing [90]. The ResNet-10 used by Yao et al. [9], for instance, has the best detection rate but requires the most time to train and detect. Therefore, a deeper comprehension of these models’ theoretical underpinnings can aid in the creation of better models tailored to different segmentation circumstances.

6.4 Lesion overlapping

The fuzziness, complexity, and overlap of the many regions in a diseased leaf image make image segmentation an inherently difficult challenge. The majority of diseased leaf segmentation algorithms use gray-level variations between normal, spot, and background pixels, together with a predefined threshold or criterion, to identify spots in leaf images. However, in reality, as Fig. 9 illustrates, the regions of these pixels in a diseased leaf image are typically hazy and unclear, the colors of the normal and spot regions are likewise uneven and ambiguous, and the gray histograms of the diseased leaf image constantly overlap. Therefore, in overlapping cases, the accuracy of disease diagnosis decreases, since the image segmentation approaches are unable to recognize and classify these minute lesions [92]. For increased accuracy, additional morphological traits and larger dataset sizes can be utilized.

Figure 9: Segmenting plant disease leaf images original Leaf image and grayscale image [91]

6.5 Limited to single disease only

The literature reviewed by the authors concentrated on one disease, one class, or two or more illnesses across numerous crops [92]. There are very few studies that address more than one disease on a single leaf. Furthermore, no study targets a specific group of crops, such as pome fruits, cereals, grains, millets, and stone fruits, which are the main sources of energy that humans consume.

7. Conclusion and future directions

Drawing from the research conducted to automate the detection and classification of plant leaves using deep-learning-based image segmentation approaches, the following research points could contribute to the advancement of state-of-the-art methods.

7.1 Plant disease datasets: Insights into strengths and limitations of detection systems

The development of comprehensive databases containing plant disease images in natural settings remains in its early stages. Several existing repositories, listed in Table 4 with their features and web links, provide valuable resources for research. However, many studies rely on images generated in controlled laboratory environments or through data augmentation techniques rather than real-time field images. Fazari et al. [93] highlighted this limitation in studies involving single diseases or multiple diseases affecting a single crop. In this context, Generative Adversarial Networks (GANs) have emerged as a powerful tool to address dataset limitations. By generating synthetic images, GANs can enhance dataset diversity, mitigate imbalances, and improve the performance and generalization of deep learning models. They also allow the simulation of rare or complex disease instances, providing more robust training data. Future research should leverage advanced data acquisition platforms such as IoT-based agricultural monitoring models, unmanned aerial systems, and portable spore-capture devices to improve data variety and coverage. These technologies can facilitate the randomization and scalability of image datasets, enhancing segmentation accuracy and model reliability.

The literature predominantly focuses on datasets addressing single diseases or classes, with limited research on multi-disease, multi-crop datasets. Furthermore, attention is often directed at the entire leaf region rather than the specific affected areas, underscoring the need for advanced crop separation techniques. Studies such as [74], which utilized U-Net and DeepLabV3+ to identify and estimate the severity of maize diseases in field settings, demonstrated the efficacy of precise segmentation techniques in predicting disease severity based on leaf area coverage. Expanding datasets to include diverse conditions and developing robust crop separation methods suitable for both controlled and real-world environments are critical steps toward improving detection system accuracy and applicability.

Table 4: Mostly used public image datasets dedicated to crop disease detection

Dataset name Modality Platform Images Type of annotation Web link
PlantVillage Dataset RGB Images Handheld cameras/smartphones 54,000+ Image-level (Healthy and diseased leaves) https://github.com/spMohanty/PlantVillage-Dataset
Kaggle Plant Disease Dataset RGB Images Handheld cameras 25,000+ Image-level (Healthy and diseased leaves) https://www.kaggle.com/datasets/emmarex/plantdisease
AI Challenger Agricultural Disease Dataset RGB Images Handheld cameras and UAV drones 175,000 Bounding box (for disease localization) https://github.com/foamliu/Crop-Disease-Detection
MangoLeafBD Dataset RGB Images Handheld cameras/smartphones 4000 Image-level (Healthy and diseased leaves) https://data.mendeley.com/datasets/hxsnvwty3r/1
CNN_Olive_Dataset RGB Images Handheld cameras 3400 Image-level (Healthy and diseased leaves) https://github.com/sinanuguz/CNN_olive_dataset
Rice Leaf Disease Dataset RGB Images Handheld cameras 5932 Image-level (Healthy and diseased leaves) https://data.mendeley.com/datasets/dwtn3c6w6p/1
Wheat Disease Dataset RGB Images Ground vehicle and handheld cameras 4900 Image-level (Healthy and diseased leaves) https://www.kaggle.com/datasets/olyadgetch/wheat-leaf-dataset
UCI Machine Learning Repository – Leaf Dataset RGB Images Handheld cameras/smartphones 1000 Image-level (Leaf classification) https://archive.ics.uci.edu/dataset/241/one+hundred+plant+species+leaves+data+set
Tomato Leaf Disease Dataset RGB Images Handheld cameras 18000 Image-level (Healthy and diseased leaves) https://www.kaggle.com/datasets/kaustubhb999/tomatoleaf
PlantDoc Dataset RGB Images Handheld cameras and UAV drones 2551 Bounding Box (Disease regions) https://www.kaggle.com/datasets/abdulhasibuddin/plant-doc-dataset
CASA (Citrus Agricultural Disease Dataset) RGB Images UAV drones and handheld cameras Approx. 85,000 images Image-level and pixel-level (for segmentation) https://data.mendeley.com/datasets/3f83gxmv57/2
Cassava Leaf Disease Dataset RGB Images Handheld cameras/smartphones 21000 Image-level (Healthy and diseased leaves) https://www.kaggle.com/competitions/cassava-leaf-disease-classification/data

7.2 Optimizing accuracy and speed in modern crop disease detection systems

Recent advancements in image processing, machine learning, and deep learning have significantly improved the detection of leaf diseases across various crops. Studies have primarily focused on identifying up to five or six diseases affecting multiple crops or multiple diseases on a single crop [94]. However, specific crop categories such as pome fruits, cereals, grains, millets, and stone fruits, as well as other plant parts like panicles and stems, have received limited attention. While CNNs address memory and computational complexity issues, their real-world applicability is hindered by the lack of automated, intuitive systems with mobile and online accessibility.

Additionally, speed remains a major concern in artificial intelligence applications, as highlighted by Sumathi et al. [95]. Although deep learning techniques deliver superior outcomes compared to traditional methods, they often come with increased computational complexity. To ensure detection accuracy, models must thoroughly analyze image features, which increases computing loads, resulting in slow detection speeds that fail to meet real-time requirements. Reducing computational loads is necessary for improving speed, but it risks compromising training quality, leading to false or missed detections. Thus, designing an effective algorithm that combines speed, accuracy, and computational efficiency is vital for building robust real-time disease detection systems.

References
 
© 2025 The United Graduate Schools of Agricultural Sciences, Japan