2026 年 19 巻 1 号 p. 42-50
An image-processing algorithm for identifying individual crops was developed for labor saving and time-series collection of biological information. Information such as the leaf development frequency is a diagnostic indicator of strawberry growth. The algorithm is designed for drones flying in greenhouses, where location information from the global navigation satellite system (GNSS) cannot be acquired. A drone flies over a crop row and sequentially assigns identification numbers (IDs) to the crops. Object-detection artificial intelligence (AI) is used to estimate crop zones, and IDs are assigned based on the difference in the number of detected crops between frames. With object detection alone, 1.06 % of crops failed to be identified; this rate decreased to 0.31 % with the proposed algorithm. Furthermore, because no failures occurred in consecutive frames, IDs were assigned correctly to all crops.
Strawberry branding has accelerated in recent years, and many strawberry varieties are now available in stores. However, as with other crops, the planted area and yield are declining in Japan, and the domestic market is shrinking. Meanwhile, Japanese strawberries are in high international demand, even at a higher unit price than other domestic products (Shiraki et al., 2023), and exports increased from 180 million JPY in 2012 to 6.2 billion JPY in 2023 (MAFF, 2024). The Ministry of Agriculture, Forestry, and Fisheries (MAFF) “Strategy for Expanding Exports of Agricultural, Forestry, and Fishery Products and Foods” has set an export-value target of 8.6 billion JPY for 2025; thus, further strengthening of the production base is required (MAFF, 2020).
An effective means of achieving this is to establish a data-driven smart production system that provides appropriate cultivation management and environmental control based on crop growth conditions. However, forced cultivation, which is the mainstream cropping system for strawberry production in Japan, is a scientifically unexplained cropping system with unique physiological conditions, such as dormancy (Lee et al., 1968), and a method to evaluate its growth conditions has not been established at the production site.
The authors’ previous studies showed that leaf emergence frequency and time-series changes in the length and area of young individual leaves, from the newest to the third leaf, are effective indicators for evaluating the growth conditions of strawberry cultivars grown under forced cultivation. In addition, the growth of strawberry crops varies depending on changes in environmental conditions, such as temperature differences of only a few degrees, which can easily occur in greenhouses. Therefore, it is important to observe as many crops as possible to provide feedback on the growth diagnosis results for cultivation management (Tsubota et al., 2020).
Aerial images captured by drones are used in outdoor fields to observe many crops and save labor. Examples include classifying nitrogen levels in paddy fields (Yang et al., 2025); estimating potato yields (Njane et al., 2023); assessing disease severity (Sugiura et al., 2016); and diagnosing onion growth (Yamamoto et al., 2020). However, using the global navigation satellite system (GNSS), which is used to control drones, is difficult in greenhouses because of unstable reception conditions caused by the influence of columns, covers, growing materials, and the crops themselves. For this reason, although there are some examples of studies using drones to observe crops in greenhouses (Fadami et al., 2021), the current situation is that observations are mainly made at fixed points (Umeda et al., 2018) or using ground vehicles with low mobility (Shimomoto et al., 2025).
In addition to agriculture, drone control without the GNSS is in high demand in the fields of architecture and civil engineering, and several methods have recently been proposed. For example, control methods using sensors mounted on an aircraft, such as simultaneous localization and mapping (Suzuki, 2017); control methods using external sensors, such as total stations (Ishii et al., 2020); and control methods that use the image recognition of landmarks preinstalled along the flight path, such as 2D codes (Kikuchi et al., 2019) have been developed. Research and development is also active in the agricultural field, such as automatic flight in a tomato greenhouse using motion-capture technology with an infrared camera (Hiraguri et al., 2023) and automatic flight tracing of a line laid out in a passage in a strawberry greenhouse (Tsubota et al., 2023). It is anticipated that drones will be used not only outdoors, but also in greenhouses in the near future.
The authors previously proposed using drone downwash to expose and observe young strawberry leaves hidden by other leaves; the required air velocity was approximately 4–6 m/s. Furthermore, a small quadcopter moving directly above the crop row and photographing it from above enabled labor-saving observation of young leaves (Tsubota et al., 2022a, 2022b).
The identification of individual crops is necessary to observe the growth diagnostic indices targeted in this study over time. New leaf development and petiole elongation can be measured by tracking the same individual. Tanaka et al. (2021) investigated aerial photography and image-analysis methods to classify cabbage crops as individuals when the leaves of adjacent crops did not overlap, based on orthoimages of the field synthesized from aerial drone photography. Guo et al. (2020) developed a method to reconstruct a field in three dimensions from aerial images and separate individual crops and weeds using height differences. Sugiura et al. (2021) proposed a technique to measure the growth of individual crops over time using machine learning and other methods for individual identification based on composite images of a field taken periodically from the air.
However, these methods can only be conducted outdoors at flight altitudes of several tens of meters and are difficult to apply in greenhouses at altitudes of a few meters. When a drone captures downward images in a greenhouse, it can capture only a few individual crops at a time. In addition, markers are usually installed as ground reference points to generate highly accurate 3D reconstructions and orthoimages from images captured while moving; applying the same approach to a greenhouse would require installing many markers and is impractical.
Based on this background, this study examines a method for identifying crops and providing location information using videos taken by a drone in a greenhouse. An object-detection model based on deep learning is used for crop-detection and an image-processing algorithm is developed and verified in terms of accuracy by sequentially assigning identification numbers (IDs) to each crop that appears in the frame. The images of the crops, along with their IDs, are saved for use in future growth diagnoses.
This study used “Tochiotome,” the most widely planted strawberry variety in Japan (Morishita, 2014). To create and evaluate the object-detection model for individual crops, 84 crops grown in 8.4 m long rows with staggered planting at the National Agriculture and Food Research Organization (NARO) (Tsukuba City, Ibaraki, Japan) were used.
To validate the accuracy of the algorithm, 30 crops in a 6 m long, single-planted row at the same organization in Saitama City were used. In both experiments, the crops were planted in late September at a 200 mm spacing and grown in a greenhouse with a minimum temperature setting of 10 °C.
2.2. Experimental device
In this study, a prototype experimental device (Fig. 1) was built and tested under the assumption that a drone flies along the crop rows above the planting bed. The drone unit was fixed to a transport unit and moved stably on rails so that its altitude and position did not vary between runs. Thus, movement accuracy did not affect the subsequent study of the image-processing methods.

The drone unit (X-SPEED 250 B 250MM, ARRIS, Inc., USA) had four rotors, each fitted with a three-blade propeller with 60 mm long blades, arranged point-symmetrically at intervals of 85 mm. A camera (IB-MCT001, Inaba Sangyo, Inc., Japan) was set 155 mm forward and 70 mm downward from the center of the four rotors. This drone unit was the same as that reported in the authors’ previous study (Tsubota et al., 2022b), in which the relationship between the pulse width modulation (PWM) outputs of the rotors and the airflow distribution below them was characterized. In this experimental setup, the drone was mounted on a cart, and the entire system was powered by a 100 V AC power source. The drone itself was equipped with a standard 1,500 mAh battery, which allowed approximately 10 min of flight depending on the flight method.
The transport unit was a cart driven by a stepper motor that moved at a constant speed along a horizontally installed rail. This type of rail was used in Saitama; however, a hot-water pipe was used in Tsukuba because of the shape of the greenhouse. There was no significant difference in performance between the two rails.
The rotors of the drone unit were attached to the transport unit 750 mm above the top of the planting beds. The centers of the four rotors passed directly above the crop planting positions so that the growth points were easily exposed by the generated downward airflow. The rotors were moved along the crop rows while videos were recorded. In each experiment, the downward airflow was set to a wind speed of 4.5 m/s, which was necessary to expose the area near the growth point. The camera resolution was (1,920 × 1,080) pixels at 30 fps, and the horizontal (x-direction) and vertical (y-direction) angles of view were 110 ° and 56 °, respectively. Therefore, in the direction of movement, images of the top surface of the planting bed, located 750 mm from the camera, had a resolution of approximately 0.74 mm/pixel. The transport unit speed was 0.1 m/s.
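The 0.74 mm/pixel figure can be reproduced from the stated geometry under a simple pinhole-camera assumption, taking the vertical image axis (56 °, 1,080 pixels) as the direction of movement; a minimal sketch:

```python
import math

def ground_resolution(fov_deg: float, pixels: int, distance_mm: float) -> float:
    """Ground sampling distance (mm/pixel) for a camera looking straight down."""
    footprint_mm = 2.0 * distance_mm * math.tan(math.radians(fov_deg) / 2.0)
    return footprint_mm / pixels

# Direction of movement: vertical image axis, 56 deg over 1,080 pixels, bed 750 mm away
print(round(ground_resolution(56, 1080, 750), 2))   # -> 0.74 mm/pixel
# Across the row: horizontal image axis, 110 deg over 1,920 pixels
print(round(ground_resolution(110, 1920, 750), 2))  # -> ~1.12 mm/pixel (derived here; not stated in the paper)
```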
2.3. Image-analysis method
2.3.1. Crop-detection model
In the image analysis, the multiple crops observed in an image are separated into individual objects. Two major approaches are used to detect objects in images: deep learning and rule-based image processing without deep learning. In this study, crops were detected using a deep learning crop-detection model.
The detection model was created using videos captured by the observation equipment at different times during a seven-month period (October 19, November 11, and December 16, 2020; January 15, February 16, March 15, and April 15, 2021) at NARO, Tsukuba. Images were extracted every 15 frames from the 30 fps videos so that the same individual crop appeared in several frames. For annotation, 8,480 individual crops were extracted by specifying the zone of each individual, with an average of 1,548 frames per day. Of these, 80 % (6,784) were used for training and 20 % (1,696) for validation. As adjacent crop individuals often overlap closely, clearly separating individual zones is difficult, and in several cases the boundary between two crops is ambiguous. Therefore, the annotation rule was that each zone must include the growth point. Most growth points were located at the center of the individual crops and could be clearly distinguished. The boundaries of each crop were determined subjectively by visual inspection. The image-processing software HALCON (MVTec Software GmbH, Germany) was used for training. The image resolution was reduced to (640 × 352) pixels, and “Enhanced,” which is recommended for high-precision detection, was selected as the object-detection model. The batch size was set to 18 and the number of epochs to 70. To evaluate the detection model, the recall, precision, and F-score (the harmonic mean of the two) were obtained for the validation frames.
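The training itself was performed with HALCON’s built-in object-detection workflow, so no code from it is reproduced here. As an illustration only, the following is a minimal fine-tuning loop with the stated hyperparameters (70 epochs, batch size 18) written with torchvision, assuming a hypothetical `StrawberryDataset` that yields image tensors and box/label targets; it is a sketch of the general approach, not the authors’ implementation.

```python
import torch
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_model(num_classes: int = 2):  # background + "crop"
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

def train(train_dataset, epochs: int = 70, batch_size: int = 18):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = build_model().to(device)
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                        collate_fn=lambda batch: tuple(zip(*batch)))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            losses = model(images, targets)   # dict of detection losses
            loss = sum(losses.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```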
To validate the effectiveness of the crop-detection model, the precision (P (%)), recall (R (%)), and F-score (F (%)) of the model were calculated using Eqs. (1)–(3). A total of 686 frames (98 per day) were selected at 100-frame intervals from the 30 fps videos observed on October 29, November 30, and December 27, 2020 and January 28, February 22, March 25, and April 19, 2021.
$$P = \frac{TP}{TP + FP} \times 100 \tag{1}$$
$$R = \frac{TP}{TP + FN} \times 100 \tag{2}$$
$$F = \frac{2PR}{P + R} \tag{3}$$
where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively. Hereafter, a false negative is referred to as “undetected” and a false positive as a “misdetection.”
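As a minimal sketch, Eqs. (1)–(3) translate directly into code (the counts below are made-up values, not results from this study):

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Precision, recall, and F-score (%) as defined in Eqs. (1)-(3)."""
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Example with illustrative counts only
print(detection_metrics(tp=95, fp=1, fn=8))  # -> (~99.0, ~92.2, ~95.5)
```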
2.3.2. Crop-identification algorithm
The flowchart of the image-processing algorithm is shown in Fig. 2. Identification was performed by assigning a unique ID to each individual crop within a given crop row. The details of the image-processing algorithm are as follows.

First, a frame extracted from the video of one crop row was read (Fig. 2 (a)), and crop detection was performed on that frame using the crop-detection model (Fig. 2 (b)). Because the video was captured starting from the end of the crop row, no crops were detected at first. As the frames progressed (Fig. 2 (g)) and the first crop was detected (Fig. 2 (c)), the identification process began assigning IDs.
Complementation processing was then performed to handle the undetected and misdetected crops produced when object detection failed (Fig. 2 (d)). If no estimated zone was present within 200 mm of an adjacent detected crop, a crop was considered undetected (Fig. 3 (a)), and a temporary ID was assigned to a complemented zone at that position. This zone was treated in the same manner as the crop zones detected by object detection in the subsequent processing (Fig. 2 (e) onward).
Misdetections were then removed. If multiple zones were detected within 200 mm of an adjacent detected crop, the extra zones were judged to be misdetections (Fig. 3 (b)). In addition, if the x-coordinate of the center of gravity of an estimated zone deviated by more than a certain amount from the crop row, the zone was judged to be a misdetection (Fig. 3 (c)).
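A minimal sketch of this complementation and filtering step, assuming each detection is reduced to its centroid in millimetres along and across the row; the 200 mm spacing comes from the planting conditions, whereas the row tolerance `row_tol_mm` and the 1.5× gap margin are illustrative placeholders (the paper only states “a certain amount”):

```python
from dataclasses import dataclass

@dataclass
class Zone:
    x_mm: float        # across-row coordinate of the centroid
    y_mm: float        # along-row coordinate of the centroid
    complemented: bool = False

def complement_and_filter(zones, row_x_mm, spacing_mm=200.0, row_tol_mm=75.0):
    """Remove likely misdetections and fill likely undetected crops (sketch)."""
    # Reject zones whose centroid lies too far from the crop-row line (Fig. 3 (c)).
    kept = [z for z in zones if abs(z.x_mm - row_x_mm) <= row_tol_mm]
    kept.sort(key=lambda z: z.y_mm)

    # Reject duplicate detections closer than the crop spacing (Fig. 3 (b)).
    deduped = []
    for z in kept:
        if deduped and z.y_mm - deduped[-1].y_mm < spacing_mm:
            continue  # treated as a misdetection of the same crop
        deduped.append(z)

    # Fill gaps clearly larger than the crop spacing with complemented zones (Fig. 3 (a)).
    filled = []
    for z in deduped:
        while filled and z.y_mm - filled[-1].y_mm > 1.5 * spacing_mm:
            filled.append(Zone(row_x_mm, filled[-1].y_mm + spacing_mm, complemented=True))
        filled.append(z)
    return filled
```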



Next, the identification process assigned IDs to the crop zones (Fig. 2 (e)). This process used both the crop zones detected by object detection and those estimated by complementation. The crop zone at the top of the frame was defined as ID_top and that at the bottom as ID_bottom. The difference (Cch) between the number of zones counted in the previous frame (Cn−1) and the current frame (Cn) was calculated. If the difference was 0, the same zones were present as in the previous frame, and the IDs from the previous frame were carried over (Figs. 4 (a) and (b)). If the difference was 1, a new crop zone was deemed to have entered the frame, and ID_top increased by 1 (Figs. 4 (b) and (c)). If the difference was −1, the ID_bottom zone of the previous frame was deemed to have left the frame (Figs. 4 (c) and (d)); ID_top remained the same as in the previous frame, and ID_bottom increased by 1.

Fig. 4. Example of ID assignment over four consecutive processed frames: (a) Cn−1 = 3, ID_top = 10, ID_bottom = 8; (b) Cn = 3, so Cch = Cn − Cn−1 = 0, ID_top = 10, ID_bottom = 8; (c) Cn+1 = 4, so Cch = Cn+1 − Cn = 1, ID_top = 11, ID_bottom = 8; (d) Cn+2 = 3, so Cch = Cn+2 − Cn+1 = −1, ID_top = 11, ID_bottom = 9.
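A minimal sketch of this update rule, assuming frames are processed one at a time and that Eq. (4) below guarantees that at most one crop enters or leaves the frame between consecutive processed frames (function and variable names are illustrative):

```python
def update_ids(count_now: int, count_prev: int, id_top: int, id_bottom: int):
    """Update ID_top and ID_bottom from the change in the number of crop zones.

    Cch = 0 keeps both IDs, Cch = +1 means a new crop entered at the top of the
    frame, and Cch = -1 means the bottom crop left the frame (rule of Fig. 4).
    """
    change = count_now - count_prev
    if change == 1:        # a new crop zone appeared at the top of the frame
        id_top += 1
    elif change == -1:     # the bottom crop zone moved out of the frame
        id_bottom += 1
    elif change != 0:
        raise ValueError("Eq. (4) should prevent |Cch| > 1 between processed frames")
    return id_top, id_bottom

# Worked example reproducing Fig. 4 (frames n-1 .. n+2, zone counts 3, 3, 4, 3):
state = (10, 8)                       # (ID_top, ID_bottom) in frame n-1
for prev, now in [(3, 3), (3, 4), (4, 3)]:
    state = update_ids(now, prev, *state)
    print(state)                      # (10, 8), (11, 8), (11, 9)
```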
After the identification process was completed, an image of each crop was saved along with its ID number, detection coordinates, and observation date. Each image had (501 × 501) pixels centered on the center of gravity of the estimated zone and could be used for biological-information measurements (Fig. 2 (f)).
This process was repeated until no zones were detected in the frame at the end of the crop row (Fig. 2 (h)).
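A minimal sketch of this save step, assuming the frame is held as an OpenCV/NumPy array; the border padding keeps the patch size fixed for crops near the image edge, and the file-naming scheme is illustrative (the paper does not specify one):

```python
import cv2
import numpy as np

def save_crop_image(frame: np.ndarray, cx: int, cy: int, crop_id: int,
                    date: str, size: int = 501) -> str:
    """Save a (size x size) patch centred on the crop-zone centroid (cx, cy)."""
    half = size // 2
    padded = cv2.copyMakeBorder(frame, half, half, half, half,
                                cv2.BORDER_CONSTANT, value=0)
    # After padding, the original point (cx, cy) sits at (cx + half, cy + half).
    patch = padded[cy:cy + size, cx:cx + size]
    filename = f"{date}_id{crop_id:03d}_x{cx}_y{cy}.png"  # illustrative naming
    cv2.imwrite(filename, patch)
    return filename
```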
To realize the identification process shown in Figs. 2 (e) and 4, the interval between processed frames is important. If the device moves farther than the distance between crops during this interval, some crops are not captured. To ensure that the topmost and bottommost crops in the image do not change simultaneously between two consecutive frames, the processing frame rate must be sufficiently high relative to the drone flight speed and should satisfy the following condition.
$$k > \frac{v}{Y_{pm}} \tag{4}$$
where k (frames/s) is the processing frame rate, v (mm/s) is the moving speed of the camera, and Ypm (mm) is the distance between crops.
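With the experimental values used later (camera speed 0.1 m/s, i.e., 100 mm/s, and a 200 mm crop spacing), Eq. (4) gives a lower bound of 0.5 fps; a quick check:

```python
def min_frame_rate(v_mm_s: float, crop_spacing_mm: float) -> float:
    """Lower bound on the processing frame rate k from Eq. (4): k > v / Ypm."""
    return v_mm_s / crop_spacing_mm

print(min_frame_rate(100.0, 200.0))   # 0.5 fps; the verification below used 10 fps
```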
2.4. Accuracy verification of crop-identification algorithm
Thirty crops of the ‘Tochiotome’ variety were planted on September 27, 2021, in a single 6 m long, 150 mm wide row with a 200 mm spacing between crops in a greenhouse at NARO, Saitama. Observations were conducted once daily from February 15 to April 21, 2022; however, observations were not possible on February 20, March 12, or March 13 because of problems with the experimental equipment. The videos captured on the remaining days were used to verify the performance of the crop-identification algorithm. In particular, the degree to which the algorithm could handle misdetected or undetected crops and the causes of such cases were examined.
As the speed of the observation device was 0.1 m/s and the distance between crops was 200 mm, the algorithm required a processing rate higher than 0.5 fps according to Eq. (4). In this verification, 10 fps was used (every third frame of the 30 fps video), which provides a sufficient margin. In total, 600 frames per day were tested, for a total of 37,800 frames over the 63 days of observation.
The leaf area and petiole length were examined once per week for 26 of the observed crops, excluding the crops at both ends of the row and two crops in its middle. The leaf area was measured from images (Tsubota et al., 2020), and the petiole length was measured with a tape measure. The leaf position was determined by observing leaf development, and the leaf area and petiole length were recorded up to the 11th leaf.
The crop conditions during the measurement period are shown in Fig. 5. The average leaf area per crop according to the leaf position is shown in Fig. 5 (a). The total leaf area decreased from approximately 75,000 mm2 in mid-February to approximately 60,000 mm2 in mid-March and then increased to approximately 70,000 mm2 in mid-April. After March, newer leaves, such as the 2nd to 5th leaves, grew larger than the older leaves located outside.


Fig. 5. Crop conditions during the measurement period: (a) average leaf area per crop by leaf position; (b) average petiole length. Error bars indicate the maximum and minimum values.
The average petiole lengths, with the mean, maximum, and minimum values, are shown in Fig. 5 (b). The 3rd to 11th leaves, which had completed the elongation period, are shown. For these leaves, the petiole length decreased from approximately 95 mm in mid-February to 77 mm in mid-March and then increased to 87 mm in mid-April. Because there is a leaf blade at the end of the petiole, the whole leaf length exceeded 100 mm at all times, and the leaves partially overlapped with those of neighboring crops under the 200 mm crop spacing.
A crop-detection model was created using 80 % of the frames from the videos of the cultivation row observed in Tsukuba from October to April, selecting one day’s recordings per month. Evaluation with the remaining 20 % of the frames yielded a precision, recall, and F-score of 87 %, 91 %, and 89 %, respectively.
For the 686 frames from another seven days in the same period, which were not used to create the model, the evaluation yielded a precision, recall, and F-score of 99 %, 92 %, and 95 %, respectively (Table 1). The F-score was as high as 98–99 % in October just after planting, in February when growth was poor, and in March and April when many leaves had been removed owing to insect infestation. However, the F-score was relatively low, at 88 %, in November and December, when the plants were fully grown and adjacent crops crossed each other. Even in these cases, the results were comparable to those of the model evaluation, suggesting that the constructed crop-detection model was effective.
Table 1. Evaluation results of the crop-detection model (target observation row, adjacent non-target row, and total).

| Day | Precision (%): Target row | Precision (%): Non-target | Precision (%): Total | Recall (%): Target row | Recall (%): Non-target | Recall (%): Total | F-score (%): Target row | F-score (%): Non-target | F-score (%): Total |
|---|---|---|---|---|---|---|---|---|---|
| 2020/10/29 | 100 | 100 | 100 | 97 | 95 | 96 | 99 | 98 | 98 |
| 2020/11/30 | 100 | 97 | 98 | 92 | 69 | 81 | 96 | 81 | 88 |
| 2020/12/27 | 99 | 98 | 98 | 89 | 71 | 80 | 94 | 82 | 88 |
| 2021/1/28 | 100 | 99 | 99 | 94 | 86 | 90 | 97 | 92 | 94 |
| 2021/2/22 | 98 | 99 | 99 | 98 | 97 | 97 | 98 | 98 | 98 |
| 2021/3/25 | 98 | 99 | 98 | 100 | 100 | 100 | 99 | 99 | 99 |
| 2021/4/19 | 98 | 99 | 98 | 100 | 99 | 100 | 99 | 99 | 99 |
| Total | 99 | 99 | 99 | 96 | 88 | 92 | 97 | 93 | 95 |
In the experimental device, the target observation row was directly below the flight route; however, an adjacent row was also included in the frame. Therefore, to verify the effectiveness of the imaging method based on the airflow of the experimental device, the adjacent non-target row was also evaluated. Comparing the target row with the parallel non-target row, the precision was 99 % for both, whereas the recall was 96 % and 88 %, respectively, i.e., lower for the non-target row. The recall was particularly low, at approximately 70 %, for the non-target row in November and December. In this period, overlapping growth was frequent, suggesting that the characteristic appearance of petioles intersecting at the growth point (Fig. 6) could not be captured when there were no gaps between the crops. These results indicate that the imaging method using airflow, combined with the detection model, was effective for detecting strawberry crops.

Compared with the recall, the precision was high in both rows; nevertheless, there were 46 misdetections among 4,062 crops, of which 29 were fruit bunches and 17 were leaves mistaken for crop individuals. These misdetections were thought to occur because multiple branching peduncles or crossing petioles resembled the growth point and surrounding petioles exposed by the airflow (Fig. 7).


Table 2 lists the test results of the crop-identification algorithm. In total, 37,800 frames containing 71,647 crop zones were processed for identification, and 98.94 % of the zones were identified without complementation.
Table 2. Verification results of the crop-identification algorithm.

| Unit | Processed crop images | Undetected: Total | Undetected: Complemented | Undetected: Failed | Misdetection: Total | Misdetection: Complemented | Misdetection: Failed |
|---|---|---|---|---|---|---|---|
| Number | 71,647 | 304 | 281 | 23 | 461 | 258 | 203 |
| % | – | 0.42 | 0.39 | 0.03 | 0.64 | 0.36 | 0.28 |
A total of 304 crops (0.42 %) were undetected. Of these, 281 were identified by estimating the crop zone from the distance between crops (Fig. 8 (a)). The remaining 23 crops could not be estimated in this way because of incorrectly detected crop zones. The final undetection rate was 0.03 %.
In total, 461 crops (0.64 %) were misdetected. More than half of these (258 crops) could be judged as misdetections from information such as the crop-row coordinate (x-coordinate) and crop spacing (Fig. 8 (b)), whereas the remaining 203 crops could not (Fig. 8 (c)). The final misdetection rate was 0.28 %.
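The percentages in Table 2 follow directly from these counts and the 71,647 processed crop zones; a quick check:

```python
total = 71_647  # crop zones processed over 37,800 frames
for label, count in [("undetected", 304), ("undetected but not complemented", 23),
                     ("misdetected", 461), ("misdetected but not removed", 203)]:
    print(f"{label}: {100 * count / total:.2f} %")
# -> 0.42 %, 0.03 %, 0.64 %, 0.28 %
```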




Figure 9 shows the detailed results of examining the misdetection factors, for which the correction accuracy was lower than for the undetected crops. As described in the evaluation of the detection model in Section 3.1., fruit bunches and leaves were mistakenly identified as crops. A total of 267 misdetections were fruit bunches, of which 37 % were judged as misdetections by the proposed algorithm, whereas 63 % were not. Fruit bunches develop from the growth point near the center of the crop, elongate, and droop toward the aisle. During elongation, a fruit bunch lies at an x-coordinate close to the crop row; hence, the area around the fruit bunch alone was often mistaken for a crop. This occurred mainly between late February and early March.


Misdetections caused by leaves were more frequent in mid-to-late February; at that time, the older outer leaves were larger than the inner leaves, and their petioles were longer than in other periods. Crossing petioles resembled growth points and caused misdetections when they appeared in isolation from other leaves. However, 82 % of the 194 leaf-related misdetections were judged as misdetections by the proposed algorithm, because the x-coordinates of outer leaves with long petioles were far from the crop row; this rate was better than that for the fruit bunches.
By complementing undetected zones and removing misdetections with the proposed algorithm, 99.69 % of the crop zones were correctly identified. Furthermore, because the same crop was captured in multiple frames at different angles, individual IDs were never incorrectly assigned in consecutive frames (Fig. 8 (d)). Therefore, on all observation days, image data were recorded for every one of the 30 crops in the observed row.
To observe more individual crops in greenhouses in the future, the crop rows themselves must also be identified. This can be achieved by assigning a row number when crop rows are observed individually or by placing a specific marker at the end of each row.
From the above results, it is possible to identify individual crops and correctly assign IDs by flying a drone along a strawberry crop row in a greenhouse and capturing video. By repeating this mobile observation over time, the growth of each crop can be tracked, providing a biological-information measurement method that contributes to the growth diagnosis of forced-culture strawberries. Future research aims to develop an image-processing method to calculate growth diagnostic indicators, such as the leaf emergence frequency of each crop (Tsubota et al., 2020), to achieve labor-saving and quantitative evaluation.
This study proposed a method for assigning location information to crops in video images for strawberry growth diagnosis, assuming drone-based imaging in greenhouses where GNSS and similar positioning systems are unavailable. An experimental device simulating a drone was moved over a crop row while capturing downward-facing video, and an image-processing algorithm for identifying crops was developed and its accuracy evaluated.
First, a deep learning-based object-detection model was constructed for crop detection. As strawberries often overlap with neighboring crops, precise boundary detection is challenging. Therefore, approximate crop zones were annotated centered on the growth point, which is visually easy to identify as the center of each crop. A total of 8,480 crop zones annotated on images from various days between October and April were used for training and validation.
When 686 frames from different days in the same period were evaluated, the F-score dropped to 88 % in November and December owing to frequent overlapping growth. Overall, however, the model achieved a precision of 99 %, recall of 92 %, and F-score of 95 %.
Next, a crop-identification algorithm was developed that assigns unique IDs to the crops detected in each frame. By tracking the change in the number of crops between consecutive frames, IDs were assigned sequentially to new crops that appeared as the device moved, and crops that left the field of view were replaced by newly appearing, not-yet-assigned crops.
Although the F-score of the detection model was high, undetected crops (FN) and misdetections (FP) remained. These were corrected using the inter-crop distance and crop-row coordinates. Using daily observation videos of a row of 30 crops over 63 days (mid-February to mid-April), the rates of undetected and misdetected crops were 0.42 % and 0.64 %, respectively, which were reduced to 0.03 % and 0.28 % after correction. Furthermore, no failures occurred in consecutive frames, allowing accurate ID assignment throughout.
To enable labor-saving and quantitative growth diagnosis of strawberries, future research aims to develop image-processing methods for the time-series analysis of identified crops from moving observation videos, using the leaf emergence frequency, leaf area, and petiole length.
To apply this method to aerial videos captured by drones, it is necessary to implement flight control technologies that ensure two key conditions:
1) The distance moved between processed frames must remain shorter than the crop spacing (Eq. (4)).
2) The drone must maintain a sufficiently straight flight such that the aerial imaging area does not deviate from the crop row, even in environments without GNSS.
This research was supported by the MAFF of Japan, “Smart Agriculture Production Area Model Demonstration” (Project No. 5G3C1, Section Title: Demonstration of Intelligent and Remote Strawberry Cultivation Using Local 5G, Project Entity: NARO). The results were obtained from research commissioned by the National Institute of Information and Communications Technology (NICT), Japan (Project No. 23301).
The authors declare no conflicts of interest.
(URLs on references were accessed on 28 January 2026.)