High-throughput field crop phenotyping: current status and challenges

In contrast to the rapid advances made in plant genotyping, plant phenotyping is considered a bottleneck in plant science. This has promoted high-throughput plant phenotyping (HTP) studies, resulting in an exponential increase in phenotyping-related publications. HTP technologies were originally developed for indoor use on model plant species under controlled environments, but the focus has since shifted to crops in the field. Although field HTP is much more difficult to conduct than HTP in controlled environments because of unstable environmental conditions, recent advances in HTP technology have overcome these difficulties, allowing for rapid, efficient, non-destructive, non-invasive, quantitative, repeatable, and objective phenotyping. Recent HTP developments have been accelerated by advances in data analysis, sensors, and robot technologies, including machine learning, image analysis, three-dimensional (3D) reconstruction, image sensors, laser sensors, environmental sensors, and drones, along with high-speed computational resources. This article provides an overview of recent HTP technologies, focusing mainly on canopy-based phenotypes of major crops, such as canopy height, canopy coverage, canopy biomass, and canopy stressed appearance, in addition to crop organ detection and counting in the field. Current topics in field HTP are also presented, followed by a discussion of the low rate of adoption of HTP in practical breeding programs.


Introduction
While plant genotyping throughput has grown faster than Moore's law of computational power, low-throughput plant phenotyping is seen as a bottleneck in plant science, prompting an intensification of studies on high-throughput phenotyping (HTP) in the last decade. Costa et al. (2019) analyzed trends in publications on plant phenomics between 1997 and 2017 and found that the number of these publications increased much more rapidly after 2007 than in other plant science categories. As shown in Fig. 1, this trend accelerated again after 2017. During this period, several plant phenotyping research centers have been founded, including the Australian Plant Phenomics Facility (APPF) in Australia (https://www.plantphenomics.org.au), the Jülich Plant Phenotyping Center (JPPC) in Germany (https://www.fz-juelich.de/ibg/ibg-2/EN/Research/ResearchGroups/JPPC/JPPC_node.html), the National Plant Phenomics Center (NPPC) in the United Kingdom (https://www.plant-phenomics.ac.uk), and the Plant Phenotyping and Imaging Research Center (P2IRC) in Canada. Recent HTP developments have been accelerated by advances in data analysis, sensor, and robot technologies (Roitsch et al. 2019). Machine learning approaches, represented by convolutional neural networks (CNN) (Jiang and Li 2020), have also contributed to advances in newly emerged image-analyzing technologies, such as 3D reconstruction by SfM-MVS (structure from motion and multi-view stereo), which reconstructs the 3D structures of objects by stereo photogrammetry using multiple images of the target objects. Sensor hardware and computer resources have markedly improved, and prices have decreased, making them more widely accessible. The resolution of commercial RGB cameras now stands at 100 million pixels. Similarly, multispectral cameras and light detection and ranging (LiDAR) systems, which used to be extremely expensive, are now also available at reasonable prices. LiDAR reconstructs the 3D structures of objects by scanning the distances to the target objects.
Even the price of hyperspectral cameras, which exceeded 100,000 USD some years ago, is falling rapidly.
In addition, advances in sensor platforms within the field of robotics have supported the progress of HTP (Zhao et al. 2019). In particular, recent advances in unmanned aircraft systems (UASs), also often called unmanned aerial vehicles (UAVs), have made an outstanding contribution to HTP, along with several types of UAS-mountable image sensors, such as RGB, multispectral, hyperspectral, and thermal cameras. Similarly, advances in IoT environment sensors, such as Field Server (Hirafuji et al. 2013), have also supported HTP, particularly considering the importance of understanding G×E (genotype-by-environment interaction).
This article provides an overview of recently developed HTP technologies, focusing on the canopy-based architectural phenotypes of major crops, such as canopy height, canopy coverage, canopy biomass, canopy stressed appearance, and canopy-level crop organ detection and counting. This article does not discuss root phenotyping, which is as important as above-ground phenotyping (Atkinson et al. 2019, Uga 2021), since it is reviewed in the same issue (Teramoto and Uga 2022). Instead, current topics in field HTP are discussed, including the challenges associated with promoting the use of machine learning approaches in HTP.
The dynamic and rapid advances being made in HTP have led stakeholders to expect breeders to adopt HTP in their breeding programs (Fasoula et al. 2020, Watt et al. 2020). However, the adoption of HTP in practical breeding programs is stagnant (Awada et al. 2018, Deery and Jones 2021). In the final part of this review, we briefly discuss the reasons for the low rate of adoption of this technology.

Canopy height, canopy coverage, and biomass
The estimation of biomass-related traits has been widely studied in satellite remote sensing (Liu et al. 2019a). However, given the current resolution of satellite images, satellite-based biomass estimation models cannot be applied at the scale of typical breeding plots. UAS-based monitoring, in contrast, is currently the best fit for the scale of such plots. Moreover, the comparative ease of use and reasonable cost of UASs promote their use in plant breeding.

Canopy height
The efficiency of canopy height estimation, which used to be highly laborious, has been dramatically improved by two types of 3D reconstruction technology: SfM-MVS and LiDAR. SfM-MVS is mainly used with UAS-based RGB (UAS RGB) and/or UAS-based multispectral (UAS multispectral) images, whereas LiDAR systems are usually either fixed in position, looking obliquely down at fields, or mounted on mobile platforms, such as vehicles and gantries. Currently, the 3D reconstruction of canopies using SfM-MVS with UAS images is more scale-efficient than that using ground-based LiDAR. However, 3D reconstruction by SfM-MVS at times fails, depending on the quality of the acquired images and the complexity of the canopy structures, and it also requires more computational resources than LiDAR. Considering that reasonably priced UAS-mountable LiDAR systems are becoming increasingly available, we expect LiDAR to take the lead in 3D reconstruction in the near future.
Examples of canopy height estimation by SfM-MVS have been provided for wheat (Hassan et al. 2019, Khan et al. 2018, Yue et al. 2018b), barley (Wilke et al. 2019), rice (Kawamura et al. 2020), maize (Ziliani et al. 2018), and sorghum (Watanabe et al. 2017), while examples of canopy height estimation by LiDAR have been provided for wheat (Friedli et al. 2016, Jimenez-Berni et al. 2018, Walter et al. 2019a), rice (Phan et al. 2016, Tilly et al. 2014), corn (Friedli et al. 2016), soybean (Friedli et al. 2016), cotton (Sun et al. 2018), and peanut (Yuan et al. 2019). Hu et al. (2018) proposed a method to calibrate the estimated values using a small number of manually observed values. Note that the procedures used to derive canopy height from the 3D point clouds constructed by SfM-MVS or LiDAR differ among these studies.
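As noted above, the exact procedure for deriving canopy height from a point cloud differs among studies. A minimal sketch of one common recipe (an assumption here for illustration, not the method of any specific study cited above) is to subtract a bare-ground elevation from an upper percentile of the crop points in each plot:

```python
import numpy as np

def canopy_height(points_z, ground_z, upper_pct=95):
    """Plot-level canopy height (m) from point-cloud z-coordinates.

    points_z : z-values of 3D points falling inside the plot
    ground_z : ground elevation, e.g. from a bare-soil terrain model
               flown before crop emergence
    upper_pct: percentile of the canopy surface taken as "height";
               studies differ in this choice (mean, 90th, 99th, ...)
    """
    return float(np.percentile(points_z, upper_pct) - ground_z)

# Toy cloud: ground at 10.0 m elevation, canopy surface around 10.8 m
rng = np.random.default_rng(0)
z = np.concatenate([np.full(50, 10.0), rng.normal(10.8, 0.05, 200)])
height = canopy_height(z, ground_z=10.0)
```

A calibration against a few manual measurements, in the spirit of Hu et al. (2018), could then be a simple linear fit of manually observed versus estimated heights.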

Canopy coverage, senescence, and seedling emergence
Canopy coverage is a good indicator of crop growth, particularly when measured sequentially to obtain a growth curve. While such curves used to be almost impossible to obtain easily, high-throughput imaging by UASs or ground vehicles has made this a reality.
Image-based canopy coverage estimation requires accurate crop segmentation from the background. Historically, simple thresholding, based on a value determined by maximum likelihood classification or on color indices such as ExG (Woebbecke et al. 1995), has been used for such segmentation. Guo et al. (2013) raised questions about the robustness of existing methods under the varying illumination and heavily shadowed patches of outdoor fields, and proposed a machine learning based segmentation method, DTSM (decision tree segmentation model), whose accuracy and robustness have been confirmed in wheat, rice, cotton, sugarcane, and sorghum (Guo et al. 2013), and which is now widely used in plant science as a published application, EasyPCC. The canopy coverage of wheat (Jimenez-Berni et al. 2018) and cotton (Sun et al. 2018) has also been estimated using ground-based LiDAR observations. Similarly, the senescence or stay-green of wheat, maize, and sorghum has been evaluated using UAS RGB or multispectral images (Hassan et al. 2018, Liedtke et al. 2020, Makanza et al. 2018), and the seedling emergence of wheat, rice, maize, and potato has been evaluated using UAS RGB images. In a unique study, Bruce et al. (2021) assessed the variation of soybean pubescence using UAS multispectral images.
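For illustration, a minimal version of the classical color-index baseline that DTSM was designed to outperform might look as follows; the ExG formula follows Woebbecke et al. (1995), while the threshold value is an arbitrary assumption:

```python
import numpy as np

def excess_green(rgb):
    """ExG index (Woebbecke et al. 1995) from an HxWx3 RGB image,
    computed on chromatic coordinates r, g, b = R, G, B / (R+G+B)."""
    rgb = np.asarray(rgb, float)
    total = rgb.sum(axis=2) + 1e-9          # avoid division by zero
    r, g, b = (rgb[..., i] / total for i in range(3))
    return 2.0 * g - r - b

def segment_canopy(rgb, threshold=0.1):
    """Binary vegetation mask by simple ExG thresholding."""
    return excess_green(rgb) > threshold

# Toy image: left half green vegetation, right half brownish soil
img = np.zeros((2, 4, 3))
img[:, :2] = [40, 160, 40]    # green pixels
img[:, 2:] = [120, 100, 80]   # soil pixels
mask = segment_canopy(img)
coverage = mask.mean()        # canopy coverage fraction
```

Such fixed thresholds are exactly what break down under the shadowed, variably lit field conditions that motivated the machine-learning approach.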

Biomass and LAI
Unlike the majority of height and canopy coverage estimations, the estimation of aboveground biomass (AGB) and leaf area index (LAI) usually requires some regression to estimate the target trait values. There are two types of estimation. The first type uses vegetation indices, such as the normalized difference vegetation index (NDVI), calculated from spectral reflectance values in multispectral or hyperspectral images captured by UAS or ground cameras, while the second type uses the architectural attributes of plants, such as height and volume, obtained from 3D reconstruction data. The AGB and LAI estimations of wheat (Hu et al. 2021, Khan et al. 2018, Yao et al. 2017, Yue et al. 2018a) and rice (Shu et al. 2021, Tanger et al. 2017, Wang et al. 2021c) are examples of the first type, while estimations of wheat (Deery et al. 2020, Jimenez-Berni et al. 2018, Walter et al. 2019b), soybean (Herrero-Huerta et al. 2020), and cotton (Sun et al. 2018) are examples of the second type. There are also examples where both types are mixed, such as rice (Jiang et al. 2019) and corn (Michez et al. 2018). Riera et al. (2021) used a completely different approach to estimate soybean yield, choosing to count the number of pods from images captured by a ground robot cart.
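A toy sketch of the first, vegetation-index-based type of estimation is given below; the NDVI definition is standard, but all plot values are synthetic, purely for illustration:

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), per pixel or per plot mean."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (nir - red) / (nir + red + 1e-9)

# Regress ground-truth AGB on plot-mean NDVI (synthetic numbers):
plot_ndvi = np.array([0.35, 0.50, 0.62, 0.70, 0.81])
plot_agb = np.array([120.0, 210.0, 300.0, 340.0, 420.0])  # hypothetical g/m^2
slope, intercept = np.polyfit(plot_ndvi, plot_agb, 1)

def predict_agb(v):
    """Estimate AGB for a new plot from its mean NDVI."""
    return slope * v + intercept
```

The second, architecture-based type would instead regress AGB on height or volume metrics derived from the 3D point cloud.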

Crop stress assessments
Methods for the high-throughput phenotyping of abiotic and biotic stresses on crops, including drought, pests, and diseases, have also advanced rapidly, making use of advances in machine learning technologies (Singh et al. 2016). The scope of these works varies from the leaf-scale level to the field level.

Disease assessments
CNN has played an important role in the identification of biotic stress, particularly at the leaf or individual plant level (Boulent et al. 2019). For example, nine different stress-induced phenotypes of single soybean leaves (four diseases, two nutrient deficiencies, herbicide injury, sudden death syndrome, and normal) were classified and quantified with high accuracy using CNN (Ghosal et al. 2018), and ten different stressed appearances on tomato leaves (gray mold, canker, leaf mold, plague, leaf miner, whitefly, low temperature, nutritional excess or deficiency, powdery mildew) were accurately classified using CNN (Fuentes et al. 2017). Furthermore, an accurate and quantitative assessment of disease at the leaf level can help in the identification of efficient resistance genes, as was done for Septoria tritici blotch (STB) in wheat (Yates et al. 2019). Technologies that utilize intact leaf images taken under natural conditions have also advanced for use in the accurate recognition of diseases (Fuentes et al. 2017, Johnson et al. 2021). While leaf-level studies could replace observations by experts and provide objective and repeatable evaluations, improvements in assessment efficiency when applied in the field have yet to be achieved. Thus, canopy-level stress high-throughput phenotyping, mainly by UASs, has also been studied and is expected to dramatically accelerate the assessment of stress in plant breeding (Barbedo 2019). Following the success of disease assessment using ground mobile platforms, including for sugar beet cercospora leaf spot (Atoum et al. 2016) and wheat STB (Walter et al. 2019a), field-level disease assessment by UAS has been widely performed using RGB and/or multispectral images with CNN: northern corn leaf blight (DeChant et al. 2017, Wiesner-Hanks et al. 2019), wheat yellow rust, wheat stripe rust (Schirrmann et al. 2021), rice sheath blight, potato late blight (Duarte-Carvajalino et al. 2018, Sugiura et al. 2016), soybean foliar diseases (Tetila et al. 2017), sugar beet cercospora leaf spot (Altas et al. 2018, Jay et al. 2020), tomato spotted wilt in peanut (Patrick et al. 2017), radish Fusarium wilt (Dang et al. 2020, Ha et al. 2017), and iron deficiency chlorosis in soybean (Dobbels and Lorenz 2019). Taking into account the falling prices of hyperspectral cameras, we can expect this technology to be widely applied for disease assessment in the coming years. Thomas et al. (2018) used this technology for barley powdery mildew at a ground-based phenotyping facility, while Joalland et al. (2018) used the same technology to assess tolerance to sugar beet cyst nematode (SBCN).

Water stress
Canopy surface temperature (CT) is a good indicator of stomatal conductance (Moller et al. 2007, Seguin et al. 1991) because plant surfaces are cooled in proportion to the evaporation rate. Recently, several different types of UAS-mountable thermal cameras have become commercially available, and their use in monitoring CT has been confirmed (Sagan et al. 2019). However, CT changes constantly and rapidly with environmental conditions, including light, temperature, and wind, making consistent and repeatable measurements over crop canopies difficult (Perich et al. 2020). Nevertheless, several approaches have been proposed to achieve reliable CT measurements for maize, fruit trees, wheat (Perich et al. 2020), soybean (Crusiol et al. 2020), and barley (Hoffmann et al. 2016). For example, Perich et al. (2020) used the heritability of CT to identify the optimal timing of the measurement.
Structural changes in plants, such as leaf wilting, which is detectable by image analysis, can also be an indicator of water stress (Srivastava et al. 2017, Wakamori and Mineno 2019). Another way to estimate water stress is to use models or indices based on hyperspectral or multispectral images (Asaari et al. 2019, Romero et al. 2017, Thorp et al. 2018). Flooding stress on soybean has also been assessed using UAS multispectral and thermal images.

Salinity stress
Salinity stress usually causes growth deficiencies. As a result, phenotyping methods for biomass-related traits can be used to identify salinity stress by comparison with control plants. This approach was used by Johansen et al. (2019), who evaluated the response of wild tomato genotypes to salinity stress by comparing growth curves based on canopy coverage estimated from UAS RGB and multispectral time-series images. Similarly, Ivushkin et al. (2019) showed that the photochemical reflectance index (PRI; Gamon et al. 1992) obtained from hyperspectral images could be used to identify stress in salt-treated quinoa plants relative to control plants.
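The PRI itself is a simple two-band ratio (Gamon et al. 1992); a sketch follows, in which the reflectance values are hypothetical:

```python
def pri(r531, r570):
    """Photochemical reflectance index (Gamon et al. 1992):
    PRI = (R531 - R570) / (R531 + R570), from reflectance at 531 nm and
    570 nm, e.g. extracted from the corresponding hyperspectral bands."""
    return (r531 - r570) / (r531 + r570)

# Hypothetical plot-mean reflectances; stressed canopies tend toward
# lower PRI than controls
pri_control = pri(0.050, 0.045)
pri_stressed = pri(0.040, 0.048)
```

In a study like Ivushkin et al. (2019), such plot-level index values would then be compared between treated and control plants.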

Lodging
UAS canopy monitoring provides an opportunity for the high-throughput and quantitative measurement of canopies to evaluate the extent of lodging. Canopy height estimation based on SfM-MVS or LiDAR can be used directly for lodging assessment (Wilke et al. 2019), whereas lodging assessments based on image features, canopy coverage, and NDVI from UAS multispectral images (Han et al. 2018) or on a combination of selected bands of hyperspectral images (Wang et al. 2021c) have also been proposed.

Weed identification
Although weed detection using ground vehicles is well documented, particularly for localized precision herbicide applications, few studies have reported on UAS-based weed detection (Singh et al. 2020). UAS-based weed detection is particularly important when crop traits, such as biomass and canopy coverage, are estimated from fields contaminated by weeds. De Castro et al. (2018) proposed a method to segment weeds in sunflower and cotton fields using random forest classification based on features derived from UAS RGB and multispectral images and on crop heights estimated from UAS RGB images; this approach aimed to identify both broad-leaf and grass weeds (Torres-Sánchez et al. 2021). Huang et al. (2018) demonstrated that rice and weeds can be classified from UAS RGB images using a CNN model, a fully convolutional network (FCN), with transfer learning (Jiang and Li 2020). While current methods are not applicable to complex fields where weeds of various species are intermingled, Skovsen et al. (2021) demonstrated that CNN models can classify white clover, red clover, and weeds in rather complicated canopy images using synthetic training data, which is discussed later in this paper. Variations in hyperspectral reflectance between certain weeds and crops have been reported (Singh et al. 2020), indicating that UAS hyperspectral images could be used to segment weeds from crops.

Canopy-level crop organ detection and counting
The development of automatic crop organ detection and counting technologies in outdoor fields has newly emerged over the last five years, alongside advances in image analysis technologies, mainly based on machine learning. In contrast to controlled indoor conditions, crop organ detection and counting in fields is hindered by variations in environmental conditions, such as light, shadows, wind, and rain, and by heavy occlusion of the organs. In breeding fields, intraspecific variation in shape, size, and color among genotypes compounds these difficulties. Despite this, recent studies on crop organ detection and counting have reported great success, as exemplified below.

Rice panicle detection and counting
A pioneering study (Guo et al. 2015) achieved accurate automatic detection of flowering rice panicles in time-series RGB images captured by ground-based cameras, using the scale-invariant feature transform (SIFT) (Lowe 2004), bag of visual words (BoVW) (Csurka et al. 2004), and a support vector machine (SVM). The study showed that visually very small events, such as rice flowering (anthesis), which occurs at particular times on particular days on particular parts of the panicles, could be automatically detected in images taken under varying natural conditions. Similarly, Desai et al. (2019) used a CNN model, ResNet-50 (Jiang and Li 2020), in place of hand-crafted image features such as SIFT to detect flowering rice panicles, and showed that the heading date of a rice canopy could be estimated from the daily cumulative distribution of the number of detected flowering panicles.
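A simplified reading of the cumulative-distribution idea is sketched below; the 50% threshold is an assumption for illustration, not the criterion used by Desai et al. (2019), and the daily counts are hypothetical:

```python
def heading_date(daily_counts, fraction=0.5):
    """Estimate heading as the first day on which the cumulative number
    of detected flowering panicles reaches a given fraction of the
    season total (the threshold choice here is an assumption)."""
    total = sum(daily_counts)
    cum = 0
    for day, n in enumerate(daily_counts):
        cum += n
        if cum >= fraction * total:
            return day
    return None

# Hypothetical detections per observation day across the flowering window
counts = [0, 0, 2, 5, 14, 30, 22, 9, 3]
day = heading_date(counts)
```

In practice the daily counts would come from the panicle detector running on the image time series.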
Methods to automatically detect and count rice panicles in paddy canopies have been proposed using CNN (Lyu et al. 2021, Xiong et al. 2017). Lyu et al. (2021) used UAS RGB images captured at a comparatively low altitude (1.2 m) with a CNN model, Mask R-CNN (Jiang and Li 2020), and achieved a counting precision of 0.82 (precision = Tp/(Tp + Fp) and recall = Tp/(Tp + Fn), where Tp, Fp, and Fn are the numbers of true positives, false positives, and false negatives in the detections, respectively). The panicle annotation dataset (38,799 patches) used by Lyu et al. (2021) was later expanded to 50,730 patches by filtering the results of automatic panicle detection.
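These detection metrics, precision = Tp/(Tp + Fp) and recall = Tp/(Tp + Fn), are straightforward to compute; the counts below are illustrative, not those of any cited study:

```python
def precision_recall(tp, fp, fn):
    """Detection metrics from true-positive, false-positive, and
    false-negative counts."""
    precision = tp / (tp + fp)   # fraction of detections that are real
    recall = tp / (tp + fn)      # fraction of real objects detected
    return precision, recall

# E.g. 82 correct panicle detections, 18 spurious, 10 missed:
p, r = precision_recall(tp=82, fp=18, fn=10)
```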

Wheat spike detection and counting
The detection and counting of wheat spikes have been widely performed using CNN models, tackling several of the difficulties posed by natural conditions. For example, Madec et al. (2019) used Faster R-CNN (Jiang and Li 2020) to detect and count wheat spikes in ground-based high-resolution RGB images for ear density estimation, achieving R² = 0.91 in spike counting. Hasan et al. (2018) also used Faster R-CNN, achieving R² = 0.93 in spike counting regardless of spike growth stage, with RGB images captured from a hand-pushed cart. Xiong et al. (2019) developed a large annotation dataset of wheat spikes and a CNN model, TasselNetv2, which improves on the structure of TasselNet (Lu et al. 2017), to count wheat spikes. TasselNetv2 achieved not only good spike counting accuracy, even on ground-based RGB images of lower resolution than those used by Madec et al. (2019), but also faster performance than TasselNet. Lu and Cao (2020) proposed TasselNetV2+, adding several modifications to the TasselNetV2 algorithm to improve the computational efficiency of wheat spike detection and counting while retaining accuracy.
Using UAS RGB images captured at altitudes between 7 and 15 m, Zhao et al. (2021a) achieved a wheat spike detection accuracy (IoU) of 0.94 using a CNN model, YOLOv5 (Jiang and Li 2020). In another study, Zhao et al. (2021b) proposed a method for automatically determining the heading date of wheat. Instead of directly detecting the emergence of spikes, they used the inflection points of canopy growth curves estimated from UAS RGB images as an indicator of heading; the mean absolute error of the estimated heading date was 2.81 days. Jin et al. (2019) estimated the stem density of wheat from RGB images of stem cross-sections left on the ground after harvest using Faster R-CNN, and found that the value was a good proxy of ear density. David et al. (2020, 2021) provided a large-scale open benchmark dataset of wheat images through a multilateral international collaboration. The dataset created in 2020 (David et al. 2020) included 4,700 high-resolution wheat images of various genotypes and growth stages collected from several countries around the world, together with 190,000 wheat spike annotations, to accelerate the development of spike detection algorithms. The dataset was used in a global competition, Global Wheat Head Detection (https://www.kaggle.com/c/global-wheat-detection), in which 2,245 teams from around the world participated. The dataset was later updated with 1,722 images from five additional countries and 81,553 additional wheat head annotations (David et al. 2021), and was reexamined and relabeled to improve its quality.

Maize tassel and sorghum head detection and counting

Lu et al. (2017) proposed a CNN model, TasselNet, to count maize tassels in ground-based RGB images. Another study used DTSM (Guo et al. 2013) to detect sorghum heads of various colors.
Because some of the regions detected as sorghum heads contained more than one head, the authors estimated the number of heads in each detected region by SVM using eleven image features, such as the area, perimeter, and roundness of the regions, achieving a precision/recall of 0.87/0.98 for detection and R² = 0.84 for head counting. TasselNetV2+ (Lu and Cao 2020) also improved computational efficiency for maize tassel and sorghum head detection.

Fruit detection and counting
DeepFruits (Sa et al. 2016), a CNN model based on Faster R-CNN, was one of the first to demonstrate the power of CNN for fruit detection. DeepFruits employed both RGB and NIR images as multimodal inputs and was successfully applied to fruits of seven different crop species: sweet pepper, melon, apple, avocado, mango, orange, and strawberry. Kang and Chen (2020) proposed a CNN model, LedNet, to detect apples in orchards, achieving an accuracy (IoU) of 0.85. To promote fruit detection studies, Häni et al. (2020) published a benchmark dataset for apple detection and segmentation that contained 1,000 images and 41,000 annotated instances of apples. Mu et al. (2020) succeeded in detecting highly occluded immature green tomatoes using CNN models (R-CNN and ResNet-101; Jiang and Li 2020), achieving R² = 0.87. Yeom et al. (2018) estimated the number of open cotton bolls using image feature extraction on UAS RGB images captured at an altitude of 15 m. Riera et al. (2021) estimated the number of soybean pods in each breeding plot as the basis for yield estimation using CNN models (VGG and RetinaNet), wherein the images were captured using a video camera mounted on a small field robot that moved between the rows of the plot.
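Several of the detection studies above report accuracy as intersection over union (IoU). For axis-aligned bounding boxes it is computed as follows (a standard definition, shown with toy boxes):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width/height, clamped at zero when boxes do not intersect
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# A predicted box shifted by 1 px against a 10x10 ground-truth box:
score = iou((0, 0, 10, 10), (1, 1, 11, 11))
```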

Model-assisted phenotyping
Model-assisted phenotyping is an approach used to estimate phenotypes that cannot be directly observed, using crop models parameterized by observable phenotypes. Simple examples have already been introduced in the biomass estimation section of this article. Occlusion is an unavoidable issue when phenotyping canopy structures, particularly in the late growth stage, when the foliage architecture becomes complex. For example, one study found that the accuracy of the total leaf area and leaf number of soybean plants estimated from UAS images was much worse in the late growing stage than in the early growing stage (Liu et al. 2021a). To overcome this issue, Liu et al. (2019b) proposed a modeling workflow called the digital plant phenotyping platform (D3P) for wheat, coupling an L-system-based wheat architectural model (ADEL-wheat; Fournier et al. 2003) with observations by HTP. They conducted a simulation study to estimate the model parameters and a green area index (GAI, green plant area per unit ground area) by assimilating the green fraction estimated from RGB images of the canopy into D3P. As a result, they demonstrated that GAI and some architectural parameters, such as the phyllochron, the lamina length of the first leaf, the rate of leaf lamina elongation, the number of green leaves at the start of leaf senescence, and the minimum number of green leaves, were accurately estimated. Data assimilation, in which model parameters are dynamically updated using observed data, is commonly used in satellite-based crop monitoring studies, such as yield estimation (Zhang et al. 2016).
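As a minimal illustration of the parameter-estimation idea behind data assimilation (a toy logistic curve fitted by grid search, not D3P or ADEL-wheat, and with synthetic observations):

```python
import numpy as np

def gai_model(t, k, t50, gmax):
    """Toy logistic green-area-index curve (a stand-in, not ADEL-wheat)."""
    return gmax / (1.0 + np.exp(-k * (np.asarray(t, float) - t50)))

def assimilate(t_obs, gai_obs):
    """Recover the parameters that best reproduce the observations by
    least squares over a coarse grid, mimicking the updating of model
    parameters from HTP observations."""
    best, best_err = None, np.inf
    for k in np.linspace(0.05, 0.5, 10):
        for t50 in np.linspace(20, 80, 13):
            for gmax in np.linspace(1, 6, 11):
                err = np.sum((gai_model(t_obs, k, t50, gmax) - gai_obs) ** 2)
                if err < best_err:
                    best, best_err = (k, t50, gmax), err
    return best

days = np.arange(0, 100, 10)
observed = gai_model(days, k=0.2, t50=50.0, gmax=4.0)  # synthetic "observations"
k_hat, t50_hat, gmax_hat = assimilate(days, observed)
```

Real assimilation schemes update parameters sequentially as each observation arrives, rather than by batch grid search, but the principle of constraining unobservable parameters with observable data is the same.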
Similar data assimilation has been used in several other studies. For example, Blancon et al. (2019) parameterized a green leaf area index (GLAI) dynamic model of maize using GLAI estimated from an empirical relationship between multispectral reflectance, obtained from UAS multispectral images at an altitude of 60 m, and GLAI manually measured at the ground level. They found that the GLAI dynamics were accurately estimated (R² = 0.9), as were the model parameters, including the maximum leaf area and leaf longevity. Additionally, they found that the model parameters and GLAI dynamics were highly heritable (0.65 ≤ H² ≤ 0.98). Similarly, Roth et al. (2020) estimated the beginning of stem elongation, the rate of plant emergence, and the number of tillers of wheat seedlings by SVM and crop modelling based on time-series multi-view-angle UAS RGB images captured at an altitude of 18 m, achieving a tiller number estimation accuracy of R² = 0.86. Ubbens et al. (2020) proposed the latent space phenotype (LSP) to evaluate time-course phenotypic changes caused by abiotic stress factors, such as drought, nitrogen deficiency, and salinity. These phenotypic changes can be very complicated and depend on many factors; as such, they are not easy to quantify, and humans are not always able to identify the different phenotypic responses to different treatments. The authors first obtained abstract low-dimensional vectors that discriminate between time-series images captured under stressed and control conditions by encoding the original images using CNN with extensions to recurrent neural networks (RNN) (Jiang and Li 2020) and long short-term memory (LSTM) (Jiang and Li 2020). The encoding process was no different from widely used CNN-based phenotypic discrimination, such as disease identification (Singh et al. 2016). However, Ubbens et al. (2020) added a CNN-trained decoding process to recover images from the low-dimensional vectors.
The outputs of the decoding process represented image expressions of the different responses to treatment. The authors defined a distance between two decoded images and used the sum of the distances from the first to the last image of the decoded time series as the LSP, which represents the difference in time-course responses to the treatments. They then demonstrated some use cases of LSPs. For example, a QTL analysis based on the LSPs obtained from RILs of the C4 model plant Setaria subjected to water stress treatments identified the same water-stress-related QTLs as reported by Feldman et al. (2018). Gage et al. (2019) also used the concept of LSP for point cloud data acquired by a LiDAR mounted on a phenotyping rover in maize fields to evaluate variations in plant architecture among 698 hybrid genotypes, as 3D point cloud data cannot be directly parameterized to understand variation. First, they created a 2D marginal frequency distribution of the 3D point cloud of the maize crops in each plot. Then, they used two methods of dimension reduction to map the original 2D distributions to LSPs: an autoencoder and principal component analysis (PCA). They trained a CNN encoder and decoder so that the original 2D distribution images (input) were encoded to 16-dimensional vectors as LSPs, and the vectors were decoded back to 2D distribution images (output), minimizing a loss based on the mean squared error between the input and the output. They also used PCA to obtain 16 principal component scores as the LSPs. Some of the LSPs showed heritability as high as that of manually measured architectural traits. In other words, extremely complicated 3D point clouds were summarized into a few latent variables using either a CNN autoencoder or PCA on 2D frequency distributions of the 3D point clouds, and the latent variables were shown to be heritable.
Their results also showed that the partial least squares (PLS) regression model based on the LSPs was able to predict some of the manually measured traits well.
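The PCA route to latent scores can be sketched with a plain SVD; the arrays below are random stand-ins for the flattened 2D frequency distributions, not real rover data:

```python
import numpy as np

def pca_latents(distributions, n_components=16):
    """Map flattened 2D frequency distributions (one row per plot) to a
    few latent scores via PCA, one of the two dimension-reduction routes
    (the other being a CNN autoencoder) described in the text."""
    X = distributions - distributions.mean(axis=0)   # center each feature
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Principal component scores = projections onto the top components
    return U[:, :n_components] * S[:n_components]

rng = np.random.default_rng(1)
plots = rng.random((40, 30 * 20))   # 40 plots, 30x20 histograms (toy data)
latents = pca_latents(plots, n_components=16)
```

Downstream analyses (heritability, PLS regression against manual traits) would then operate on these per-plot latent scores.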

Latent space phenotyping
One possible way to understand the relationship between the latent variables and observable phenotypes is to intentionally fluctuate the latent variables and decode them back to images to see how the fluctuation changes the images. A similar approach of dimension reduction from images followed by image recovery was successful in earlier, simpler image analysis studies on plant phenotyping, such as Yoshioka et al. (2004) and Furuta et al. (1995). We also expect the concept of LSP to be readily applicable to hyperspectral images, in which a tremendously large number of dimensions must be handled and it is difficult to intuitively infer the data structure.
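The fluctuate-and-decode idea can be sketched with PCA standing in for a trained decoder (the data are random placeholders; a real LSP would use the learned CNN decoder instead):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 64))                 # 50 samples of 8x8 flattened "images"
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)

def decode(scores, n=5):
    """Map n latent scores back to image space via the top-n PCA basis."""
    return mean + scores @ Vt[:n]

base = np.zeros(5)
delta = np.zeros(5)
delta[0] = 3.0                           # fluctuate only the first latent
effect = decode(base + delta) - decode(base)   # image-space change it causes
```

Visualizing `effect` as an image would reveal which pixels (and hence which phenotypic features) the perturbed latent variable controls.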

Leaf segmentation and reconstruction in canopy
Leaves are among the most important organs for photosynthesis. Although leaf canopies have historically been evaluated as a mass of leaves, the automatic segmentation of individual leaves has recently been attempted in crops such as sugar beet (Xiao et al. 2020), barley (Paulus et al. 2014), maize (Miao et al. 2020), and wheat (Srivastava et al. 2017), based on 3D point clouds constructed by SfM-MVS or LiDAR. Once such organ segmentation from the point clouds succeeds, surface reconstruction of the segmented point clouds for each organ becomes necessary, as described by Ando et al. (2021). However, these studies focused on the individual plant level and cannot be directly linked to canopy performance, such as light interception efficiency, in the field. Understanding the foliage structure of a crop population is directly linked to evaluating canopy photosynthesis, through its light interception capacity, and canopy productivity, and is expected to help identify genes underlying architectural traits.
Leaves are often heavily occluded in the crop canopy. As shown by Isokane et al. (2018), the detailed 3D architecture of an individual plant can be reconstructed using CNN and multi-view images, even if some parts of the plant are not visible from the outside. This highlights the possibility of constructing virtual crop populations from detailed 3D architectural information acquired at the individual plant level, and of comparing photosynthetic performance among virtual canopies with different plant architectures, as attempted by Liu et al. (2021b).

Interoperable data integration and data management platform
Alongside the rapid advances in HTP, an enormous amount of data, including image data, has accumulated. Building data management platforms for phenotypic data, as well as other omics data and environmental data, is tremendously important for plant science research, in combination with the development of data analysis technologies (Coppens et al. 2017). Because most of the accumulated data are managed in proprietary formats within individual research organizations, or even by the individuals who generate them, data sharing among organizations is inefficient. To accelerate collaborative research and realize interoperability, it is strongly recommended to integrate the various types of data generated by different organizations.
To accelerate such interoperable data management and the development of data platforms, several international standards, such as Crop Ontology (Shrestha et al. 2012), which defines the relationships among crop-related vocabularies, MIAPPE (Minimum Information About a Plant Phenotyping Experiment) (Ćwiek-Kupczyńska et al. 2016), which proposes metadata standards for the data related to plant phenotyping, and BrAPI (Breeding API) (Selby et al. 2019), which efficiently bridges the breeding-related data and software developments, have been proposed. Utilizing these international standards, GnpIS, a data repository for plant phenomics, was developed (Pommier et al. 2019).
This repository allows for long-term access to datasets according to the FAIR principles (Findable, Accessible, Interoperable, and Reusable) (Wilkinson et al. 2016), covering phenotypic and environmental data, and ensures interoperable data integration between phenotypic and genotypic datasets. The use of GnpIS also guarantees interoperability with other data repositories by using international standards that enable such data links.
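For illustration, a BrAPI-style response (the `metadata`/`result.data` envelope defined by the BrAPI specification) can be consumed with a few lines of Python. The payload below is invented for the example and does not come from any real repository.

```python
import json

# Illustrative BrAPI-style response: a `metadata` block with pagination
# and a `result.data` list, as in the BrAPI v2 envelope. Field values
# are made up for this sketch.
payload = json.loads("""
{
  "metadata": {"pagination": {"currentPage": 0, "pageSize": 2,
                              "totalCount": 2, "totalPages": 1}},
  "result": {"data": [
    {"observationVariableName": "canopy height", "value": "84.2"},
    {"observationVariableName": "canopy height", "value": "79.5"}
  ]}
}
""")

# Pull the observations out of the standard envelope and summarize them
values = [float(o["value"]) for o in payload["result"]["data"]]
mean_height = sum(values) / len(values)
```

Because every BrAPI-compliant server uses the same envelope, the same client code works across repositories, which is the interoperability point made above.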
Many phenotyping studies using machine learning have published their training data along with the papers. We have learned that compiling image datasets for wheat spike detection from several organizations around the world has accelerated related studies (David et al. 2020), and we expect this activity to extend across species, aiming at an image archive similar to ImageNet (https://image-net.org/index.php) (Russakovsky et al. 2015), which has been fundamental in supporting the rapid development of general object recognition. As mentioned several times in this paper, 3D reconstruction technologies have been widely used in plant phenotyping to generate 3D information. Griffiths (2020) proposed a 3D print repository for plant data with data standardization, discussing the future perspective of 3D printing technologies in plant phenomics.

Easing training data provisions in machine learning approaches
As mentioned above, image analyses with machine learning technologies, including CNN, have been successfully applied to plant phenotyping, replacing human visual assessments with even higher accuracy. However, machine learning approaches require training datasets. In general, developing a training dataset requires human annotators to manually label the target objects, which costs both labor and time. Moreover, a machine learning-based model developed in one domain cannot be applied to other domains. To ease these annotation costs, several solutions have been proposed in plant phenomics. Ghosal et al. (2019) proposed a weakly supervised deep learning approach inspired by active learning for the detection and counting of sorghum heads in UAS RGB images using CNN models (RetinaNet and ResNet-50). In the weakly supervised approach, a CNN model was first trained with a small number of images. Then, the false negatives and false positives generated during the validation process were repeatedly added to the original training dataset until a good detection performance was achieved. These authors showed that a model trained with 40 images by the weakly supervised approach achieved the same detection performance (R² = 0.88) as a model trained with 283 images. Although the proposed method still requires human interaction to identify false-negative and false-positive results after the validation process, annotation was roughly four times faster on average. Usually, the annotation process requires labeling objects by drawing bounding boxes around them, which is time-consuming when performed visually by humans. To simplify this process, Chandra et al. (2020) proposed a point supervision approach, in which the first annotation step is performed by clicking inside each object instead of drawing bounding boxes, followed by automatic proposals of object regions for the next cycle of weakly supervised training, resulting in a significant reduction in annotation time.
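The iterative scheme described above can be sketched as a generic loop. The callables `train`, `validate`, and `review_errors` below are hypothetical placeholders for the model fitting, validation, and human error-review steps, not part of any published code.

```python
# Hedged sketch of the weakly supervised loop: train on a small labelled
# pool, then repeatedly fold the model's validation mistakes (false
# negatives/positives, corrected by a human) back into the pool until
# performance is good enough.

def weakly_supervised_loop(initial_pool, candidates, target_score,
                           train, validate, review_errors, max_rounds=10):
    pool = list(initial_pool)
    model = None
    for _ in range(max_rounds):
        model = train(pool)                        # fit on the current pool
        score, predictions = validate(model, candidates)
        if score >= target_score:                  # e.g. R^2 >= 0.88
            break
        # A human reviews the mistakes and returns corrected labels
        pool.extend(review_errors(predictions))
    return model, pool
```

The human stays in the loop only to check the model's mistakes, which is why annotation effort shrinks relative to labeling every image from scratch.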

Domain adaptation
A machine learning-based model, such as a CNN model trained in a particular domain, cannot usually be applied in another domain. For example, an orange fruit detection model trained with manual annotations of orange fruits may perform poorly, or even be totally useless, for apple fruit detection. A new training process based on apple images is therefore usually required. This approach is rather ad hoc, requiring an ever-growing number of domain-specific models. In this context, expanding the coverage of a model trained in one domain to another domain without providing a training dataset for the new domain, called domain adaptation, has been a hot topic in machine learning studies. Zhang et al. (2021) proposed a domain adaptation method for fruit detection using CycleGAN, a CNN model based on GAN (Generative Adversarial Networks) (Goodfellow et al. 2014). CycleGAN is often used to transform images from one domain into another to learn the relationship between the two domains. Zhang et al. (2021) applied this feature of CycleGAN to automatically transform training images manually annotated for orange fruit detection into training images for fruits of other crops, such as apple and tomato, without conducting the annotation process for those new crops. They trained a CycleGAN model to transform single orange images into single apple images, and the orange-tree images taken in an orchard were transformed into fake apple images using the trained CycleGAN. The fake images were used to train a CNN model, the Improved-Yolov3 (Jiang and Li 2020) model, to detect apples, using the annotation information made on the original orange-tree images, such as locations and bounding-box sizes, as pseudo-labels. The proposed method also included filtering out improper pseudo-labels to increase the detection accuracy.
The results showed that the precision and recall of the detections by the models trained based on the pseudo-labels were as high as 0.89/0.92 and 0.91/0.94 for apples and tomatoes, respectively.
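The pseudo-labelling idea, reusing source-domain boxes unchanged on the translated images and filtering improper labels, can be sketched as follows. Here `translate` stands in for a trained CycleGAN and `keep_label` for the label filter; both are purely hypothetical placeholders.

```python
# Hedged sketch of CycleGAN-based pseudo-labelling: translate annotated
# source images into the target domain and reuse the source bounding
# boxes as pseudo-labels, dropping any that a filter deems improper.

def make_pseudo_labelled_set(source_images, source_boxes, translate,
                             keep_label=lambda image, box: True):
    fake_images, pseudo_labels = [], []
    for image, boxes in zip(source_images, source_boxes):
        fake = translate(image)               # e.g. orange scene -> fake apple scene
        fake_images.append(fake)
        # Box locations/sizes carry over unchanged; filter improper ones
        pseudo_labels.append([b for b in boxes if keep_label(fake, b)])
    return fake_images, pseudo_labels
```

The design choice is that CycleGAN preserves scene geometry, so object positions survive the translation and the expensive annotation step is done only once, in the source domain.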

Data augmentation and synthetic data
Image data augmentation is a comparatively simple way to expand the training data using existing training images. This expansion is expected to improve the robustness of the trained model and prevent overfitting, without the additional cost of time-consuming processes such as manual annotation. The simplest data augmentations are geometric transformations, such as flipping, rotation, cropping, shifting, zooming, and noise injection, applied randomly to the original training images to increase the volume of training data. Color space transformation of the original training images is another example of augmentation. In addition to these widely used image augmentations, the concept of synthetic data, sometimes called domain randomization, has been applied to plant phenomics to offload the annotation process and construct even more robust models. For example, one study (https://arxiv.org/abs/1807.10931) successfully trained a leaf instance segmentation model based on Mask R-CNN for Arabidopsis by combining existing real training images with images artificially generated from a 3D rendering model. Shete et al. (2020) developed TasselGAN, which could synthesize maize tassel images to be used as training data for tassel detection and segmentation, by merging separately generated artificial tassel and sky images. Toda et al. (2020) demonstrated a successful case of artificial data synthesis in their segmentation of crop seeds. First, they provided 20 single-seed images of 20 barley cultivars and manually annotated a bounding box on each seed image. Then, they repeatedly placed randomly selected single-seed images on a background with random rotations, allowing a certain level of seed overlap, to synthesize an image of the seed pool of a genotype.
Then, they generated 1,200 such seed pool images and trained Mask R-CNN for the segmentation of barley seeds in seed pool images, where some of the seeds were overlapped and occluded, achieving very good segmentation performance on real-world seed pool images. They also showed that the segmented seed images were useful for seed morphological characterization, and that the proposed method was generally applicable to seed segmentation in other crops, such as wheat, rice, oat, and lettuce.
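The simple geometric and noise augmentations listed above can be sketched in a few lines of NumPy; this is an illustrative sketch, not any particular library's implementation, and the image here is random data.

```python
import numpy as np

def augment(image, rng):
    """Randomly apply simple geometric transforms and noise injection."""
    out = image
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)                  # horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))       # 0/90/180/270 degree rotation
    out = out + rng.normal(0, 0.01, out.shape)      # noise injection
    return np.clip(out, 0.0, 1.0)                   # keep valid intensity range

rng = np.random.default_rng(42)
image = rng.random((32, 32, 3))                     # stand-in for a training image
batch = [augment(image, rng) for _ in range(8)]     # 8 augmented variants
```

In practice such transforms are applied on the fly during training, so the effective dataset size grows without storing extra images.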

Understanding CNN black boxes
While CNNs have shown great success in plant phenotyping, sometimes outperforming human visual judgment, in many cases they remain black boxes. Understanding these black boxes sometimes provides useful knowledge. For example, Ghosal et al. (2018) built a CNN model to accurately classify several leaf diseases in soybean and identified a key layer for the classification. The heatmap pattern of the key layer was then used for the quantification (grading) of the diseases. Toda and Okura (2019) attempted to understand the inner workings of the black boxes of CNN disease classifiers trained with publicly available plant disease images by visualizing the status of neurons and layers. As a result, they discovered that the CNN identified disease in a manner similar to human visual judgment. With these findings, they demonstrated that some of the layers that did not contribute to the classifications could be eliminated without degrading the classification performance.
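One common way to peek inside a CNN classifier is a class-activation-map style computation: the final convolutional feature maps are weighted by the classifier weights of a class and summed into a spatial heatmap. The sketch below uses random arrays as stand-ins for a trained network's activations and weights; it is not the implementation of any cited study.

```python
import numpy as np

# Random stand-ins: 8 channels of 7x7 conv activations, and the fully
# connected weights linking those channels to one (e.g. disease) class.
rng = np.random.default_rng(0)
feature_maps = rng.random((8, 7, 7))
class_weights = rng.random(8)

# Weighted sum over channels gives a 7x7 heatmap showing which spatial
# regions drive the class score; normalize it to [0, 1] for display.
heatmap = np.tensordot(class_weights, feature_maps, axes=1)
heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min())
```

Upsampled to the input resolution, such a heatmap can be overlaid on the leaf image to show where the network "looked", which is the kind of interpretation the studies above pursue.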

Discussion
This paper provides a summary of the current status and challenges of HTP, focusing mainly on the technologies used in outdoor fields for architectural crop traits, leaving the topic of root phenotyping uncovered. Based on our findings, we expect HTP to replace methods that are tedious, low-throughput, subjective, destructive, invasive, and qualitative, by covering a broader breeding field in a shorter time, thereby contributing to more efficient plant breeding.
Some studies, such as Tanger et al. (2017) and Walter et al. (2019a), have discussed the usability of HTP in practical breeding. Tanger et al. (2017) evaluated the usability of HTP in rice breeding, targeting a new mapping population of over 1,500 RILs. They were able to scan over 4,500 plots of a 1.5 ha experimental field within two hours using a boom-sprayer-based ground vehicle equipped with multispectral reflectance sensors, ultrasonic canopy height sensors, and infrared sensors. From these data, they estimated vegetation indices and canopy height, and discovered that the QTLs identified for the traits obtained by HTP, even during the flowering stage, corresponded to the QTLs of the manually observed yield-related traits. They concluded that HTP could accelerate breeding, allowing researchers to estimate breeding values and QTL effects at a much earlier stage, in addition to very efficient data collection. Walter et al. (2019b) estimated the biomass and canopy height of wheat breeding fields using LiDAR mounted on a ground vehicle, scanning 7,400 plots/h, and showed that these estimates were highly repeatable, with heritabilities as high as those of the corresponding ground observations, proposing a practical application in their breeding program.
HTP can also generate new traits that were previously difficult to obtain, such as time-series canopy coverage growth patterns, providing new approaches to the study of crops. Furthermore, HTP may eliminate the need for tedious yield phenotyping after harvest by predicting yield and other desired traits with models based on traits that are more easily obtainable before harvest (Parmley et al. 2019), as previously discussed for model-assisted phenotyping. Despite the recent technological success of HTP, which is promising for the acceleration of crop breeding, few have practically adopted this method or demonstrated its results in plant breeding programs (Awada et al. 2018, Deery and Jones 2021). Deery and Jones (2021) emphasized the importance of targeting the needs of breeders rather than pursuing the technologies, through collaboration between phenomics researchers and breeders, while Awada et al. (2018) found that it was unclear to plant breeders how to integrate and utilize the enormous amounts of data generated by HTP in their breeding programs. In summary, existing HTP technologies are technology-oriented rather than breeder-oriented.
Although breeders need an integrated pipeline or tool, most of the HTP technologies currently available, including data management, are segmented, making them difficult for breeders to employ. For example, several UAS applications have been introduced in this paper. Although the original articles make the usage of UAS seem straightforward, in reality it is rather difficult to capture quality images and to process them before data analysis for phenotyping. As summarized by Guo et al. (2021), several complex steps are required to properly acquire and process field images by UAS prior to image processing for phenotyping, making the expected end users hesitant to adopt UAS for their breeding programs.
To solve these issues, easy-to-use phenotyping tools that handle these processes are necessary. EasyIDP (Wang et al. 2021a) for intermediate data processing of UAS images, EasyMPE (Tresch et al. 2019) for microplot extraction, EasyPCC for crop segmentation, and EasyDCP (Feldman et al. 2021) for 3D phenotyping are good examples. The next step would be to integrate these tools into a pipeline on a common data exchange platform with standardized application programming interfaces (APIs) and access to genotypic data.
In addition, many of the traits obtained by HTP are estimated values or newly defined traits, and breeders hesitate to replace traditionally measured values with them. Discussions of such estimated and newly defined traits are therefore needed among crop scientists, including breeders. For example, there is a need to understand that the widely used LAI is a compromised index that cannot exactly reflect canopy foliage architecture, because canopies with the same LAI but different leaf angles do not intercept light equally. Likewise, NDVI, the most popular vegetation index, has a similar background, as it was defined when only a very limited number of reflectance bands was available. Now that hyperspectral images are becoming available at reasonable prices, we may be able to develop new models to monitor crop physical and physiological status with much higher dimensionality and accuracy.
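NDVI itself is just a normalized ratio of two bands, NDVI = (NIR − Red)/(NIR + Red), and is trivial to compute from reflectance data; the reflectance values below are illustrative, not measured.

```python
import numpy as np

# Illustrative red and near-infrared reflectances for three plots
# (roughly dense to sparse canopy); values are made up for this sketch.
red = np.array([0.08, 0.10, 0.30])
nir = np.array([0.50, 0.45, 0.35])

# NDVI = (NIR - Red) / (NIR + Red), bounded in [-1, 1]
ndvi = (nir - red) / (nir + red)
```

The two-band form makes the limitation above concrete: whatever the canopy's leaf angles or finer spectral structure, NDVI collapses it into this single ratio.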
In this review, the phenotyping of non-architectural traits, such as nutritional conditions and photosynthetic activities, has not been discussed despite their importance for crop productivity. It is well known that chlorophyll content can be estimated from spectral reflectance, as is commonly done in SPAD measurements, and it can also be estimated from UAS hyperspectral images (Shu et al. 2021). Fu et al. (2019) also showed the possibility of estimating the photosynthetic capacity of six tobacco genotypes using a model based on hyperspectral reflectances. Furthermore, light-induced fluorescence transients (LIFT) have been used to estimate photosynthetic activities in open canopies. For example, Keller et al. (2019) used LIFT to evaluate photosynthesis in the soybean canopy. A totally different approach was taken by Liu et al. (2021b), who compared photosynthetic performances using virtual canopies with different foliage architectures, as introduced above. Although these technologies are promising, they are still far from practical application in field HTP.

Author Contribution Statement
S.N. wrote the manuscript.

Acknowledgments
The author thanks Dr. Guo Wei of the University of Tokyo for his valuable comments and feedback on this work. This work was partially funded by the CREST Program "Knowledge discovery by constructing AgriBigData" (JPMJCR1512), the SICORP Program "Data science-based farming support system for sustainable crop production under climatic change" (JPMJSC16H2), and the aXis B type project "Development and demonstration of high-performance rice breeding support pipeline for semiarid area" of the Japan Science and Technology Agency (JST).