The Horticulture Journal
Online ISSN : 2189-0110
Print ISSN : 2189-0102
ISSN-L : 2189-0102
Invited Review
Progress in Genomics-based Breeding of Tropical Fruit Tree Species
Kenji Nashima

2025 Volume 94 Issue 3 Pages 296-306

Abstract

Tropical fruits, including bananas (Musa acuminata Colla), pineapples (Ananas comosus (L.) Merr.), and mangoes (Mangifera indica L.), are important edible fruits worldwide. The breeding of tropical fruit tree species poses many challenges compared with the breeding of annual crops because of their long juvenile period and the high cost of raising individuals to maturity in the field. With progress in next-generation sequencing (NGS) technologies, genome sequencing and the identification of trait-associated loci have advanced greatly in recent years. The whole-genome sequences of most of the major tropical fruit tree species have been reported. Recent high-accuracy long-read sequencing technologies have enabled telomere-to-telomere and haplotype-resolved genome sequencing, and these technologies have also been adopted for tropical fruit tree species. Quantitative trait locus (QTL) mapping and genome-wide association studies based on NGS technologies have been conducted on tropical fruit tree species to detect trait-associated loci. These analyses have identified multiple QTLs that may contribute to efficient breeding selection. Two issues have arisen in applying QTL information to breeding selection in pineapple programs in Japan. The first is applicability across various populations, for which developing haplotype-resolved QTL information has been proposed as a solution. The second is how to consider multiple QTLs among multiple traits in DNA-based selection. To utilize multiple QTLs among multiple traits for DNA-based breeding selection, a scheme for accumulating preferred alleles has been proposed and applied to pineapple breeding programs in Japan.

Introduction

Tropical fruit trees producing edible fruits are native to tropical or subtropical regions. Tropical fruit tree species include woody fruit trees (mango (Mangifera indica L.), acerola (Malpighia emarginata DC.), cherimoya (Annona cherimola Mill.), lychee (Litchi chinensis Sonn.), avocado (Persea americana Mill.), and durian (Durio zibethinus L.)) and perennial herbs (banana (Musa acuminata Colla), pineapple (Ananas comosus (L.) Merr.), papaya (Carica papaya L.), passion fruit (Passiflora edulis Sims), and pitaya (Hylocereus undatus (Haw.) Britton & Rose)). Tropical fruits are economically significant among fruit species. According to the statistical database of the Food and Agriculture Organization (FAO) of the United Nations (FAOSTAT), bananas are the world’s most-produced fruit (139 million tons), with mangoes (61 million tons) and pineapples (30 million tons) in the fifth and sixth positions, respectively (FAOSTAT 2023, http://faostat.fao.org/default.aspx, accessed 2024-12-18). These tropical fruits are favored by consumers in tropical countries as well as non-tropical countries; however, most tropical fruit tree species are sensitive to low-temperature conditions and cannot stably produce fruits in areas prone to such conditions. Therefore, tropical fruits are produced in countries in tropical or subtropical regions, many of which are classified as developing countries, and are exported to different parts of the world (mainly developed countries) (FAO, https://openknowledge.fao.org/server/api/core/bitstreams/ecc80c38-d519-4404-9a66-e36460358bd4/content, accessed 2024-12-18).

The breeding of perennial fruit trees, including tropical fruit trees, has many disadvantages compared with that of annual crops, such as lengthy breeding cycles, a long juvenile period, and the high cost of raising individuals to maturity in the field (Yamamoto, 2021). DNA marker-assisted selection systems have been developed for each fruit tree species to overcome these disadvantages. Such DNA marker-assisted selection offers particular benefits for the breeding of fruit trees because the breeding of these species is greatly limited by large tree sizes, long generation cycles, and long juvenile phases (Luby and Shaw, 2001). Quantitative trait locus (QTL) mapping and genome-wide association studies (GWAS) have been performed to develop DNA markers associated with breeding traits and to determine whether genetic variants are associated with a trait. Although whole-genome sequences and genome-wide DNA markers are necessary for these analyses, the development of genomic information resources for tropical fruit trees has fallen behind that of non-tropical fruit tree species. However, next-generation sequencing (NGS) technologies have dramatically promoted the development of genomic information resources. Herein, we review recent progress in whole-genome sequencing and QTL identification in tropical fruit tree species.

Genome sequencing of tropical fruit tree species

The importance of reference genomic sequences in breeding is well established. High-quality reference genome sequences allow the identification of QTLs by QTL mapping and GWAS, both of which require genome-wide marker genotyping. In recent years, NGS-generated short-read sequence mapping has frequently been used to obtain genome-wide single nucleotide polymorphism (SNP) genotypes because of its cost-effectiveness and marker abundance. A reference genome provides the positions of SNP loci and enables the detection of abundant SNPs. In addition, the reference genome contributes to mining and identifying the genes responsible for a QTL by providing lists of the genes located around the QTL and the specific functions of the candidate genes. Transcriptome sequencing (RNA-seq) using a reference genome helps analyze the differences in mRNA sequences and gene expression levels among accessions under different environmental conditions and developmental stages. Despite this usefulness, whole-genome sequencing of tropical fruit tree species has lagged, with most progress occurring since the late 2010s.

The first published genome sequence of a tropical fruit tree species was that of the transgenic ringspot virus-resistant papaya cultivar ‘SunUp’ (Ming et al., 2008). This genome sequence was assembled using Sanger sequencing reads, a very costly process. Subsequently, the NGS-based genome sequences of tropical fruit species were reported (Table 1). In the early 2010s, the Roche Genome Sequencer (GS) FLX system and Illumina sequencing system were applied to the genome sequencing of bananas (D’Hont et al., 2012) and pineapples (Ming et al., 2015). The Roche GS FLX system was the first major commercially available NGS system, and it can read an average sequence length of approximately 700 bp. Long-read sequencing technologies (PacBio series and Oxford Nanopore series) have since become the main sequencing systems used for genome assembly. These technologies provide sequence reads of more than 10 kb, enabling the construction of long and accurately assembled sequences and, ultimately, complete genome assembly from telomere to telomere. PacBio Sequel and later models can provide HiFi reads (Wenger et al., 2019), with a read length of up to 25 kb and an average read accuracy of 99.95%. Using HiFi sequences, telomere-to-telomere gapless chromosome sequences have been generated for a few species. A telomere-to-telomere sequence consists of a single contig with telomere repeats at both ends and without gaps caused by undetermined nucleotide sequences. Among tropical fruit species, telomere-to-telomere sequences have been reported for pineapple (Feng et al., 2024), mango (Wijesundara et al., 2024), banana (Belser et al., 2021), avocado (Yang et al., 2024), durian (Li et al., 2024d), and acerola (Shirasawa et al., 2024).
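The definition of a telomere-to-telomere sequence given above (a single contig, telomere repeats at both ends, no undetermined bases) can be expressed as a simple check. The following is a minimal sketch, not a method from any of the cited assemblies. It assumes the Arabidopsis-type plant telomere repeat unit TTTAGGG, and the function name, window size, and minimum repeat count are hypothetical choices made for illustration.

```python
import re

# Assumption: Arabidopsis-type plant telomere repeat. The repeat unit and the
# thresholds below are illustrative and are not taken from the review.
TELOMERE_UNIT = "TTTAGGG"
REVCOMP_UNIT = "CCCTAAA"  # reverse complement, expected at the 5' end


def is_t2t_gapless(seq: str, min_repeats: int = 3, window: int = 200) -> bool:
    """Return True if seq has no N gaps and telomere repeats at both ends."""
    seq = seq.upper()
    if "N" in seq:  # any undetermined base counts as a gap
        return False
    head, tail = seq[:window], seq[-window:]
    # Look for a run of at least min_repeats tandem units near each end.
    start_ok = re.search(f"(?:{REVCOMP_UNIT}){{{min_repeats},}}", head) is not None
    end_ok = re.search(f"(?:{TELOMERE_UNIT}){{{min_repeats},}}", tail) is not None
    return start_ok and end_ok


# Toy chromosome: telomere repeats flanking an arbitrary gap-free body.
toy_chromosome = "CCCTAAA" * 5 + "ACGT" * 50 + "TTTAGGG" * 5
print(is_t2t_gapless(toy_chromosome))
```

Real assemblies are usually assessed with dedicated tools and additional evidence such as read coverage; this sketch only mirrors the textual definition.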

Table 1

Published genome sequences for fruit tree species.

Long-read technology has enabled the diploid assembly of both haplotypes. In diploid genomes, chromosomal DNA includes two haplotype genomes, one inherited from each parent. Early genome assemblies only provided representative sequences without resolving the haplotype genomes. Progress in NGS technologies and assembly programs has enabled the construction of haplotype-resolved genomes. Haplotype sequence information enables the comparison of allelic sequences, which could help to identify genes that control phenotypes. For example, the haplotype-resolved sequences of the pineapple cultivar ‘Yugafu’ have contributed to the identification of the Ananas comosus WUSCHEL-related homeobox 3 (AcWOX3) gene, which determines a spiny or non-spiny leaf margin phenotype (Nashima et al., 2022). The haplotype-resolved assembly and comparison of haplotype sequences revealed the insertion of an inverted repeat sequence of approximately 20 kb, including AcWOX3, which resulted in an RNAi-derived reduction in the mRNA levels of AcWOX3, leading to the absence of spines. Similarly, QTLs show functional differences between multiple alleles. Haplotype sequences can help identify effective alleles for breeding selection. In addition, the sequence information of the haplotype genome is a crucial resource for gene identification. Although various schemes exist for gene identification, a standard scheme is described below. First, the candidate genomic regions are narrowed down by QTL mapping, GWAS, or fine mapping, or candidate genes are narrowed down based on transcriptomic comparison. Subsequently, causative genes are predicted by searching the narrowed-down genes, followed by comparison of the allelic sequences. The causative genes for QTLs are expected to exhibit differences in genomic sequences between alleles, and their mRNA expression levels or protein sequences should differ between alleles. Such differences can be detected by comparing mRNA sequences and expression levels. RNA sequencing and mapping of the obtained RNA sequences to genome sequences are generally used to elucidate transcriptome information, with a specific public genome sequence used as a reference genome. However, RNA sequence data frequently include RNA sequences derived from different alleles in almost all genes, which can produce ambiguous mapping results when the reads are mapped to a single public sequence. When the reads are mapped to the corresponding allele sequences as references, allele-specific splicing errors leading to reduced gene expression levels are detectable; this information contributes to specifying the functional differences between alleles. Such haplotype-resolved RNA sequence approaches have been reported for certain horticultural crops (Cheng et al., 2021; Han et al., 2024; Zhang et al., 2024a). In tropical fruit species, haplotype-resolved genomes have been reported for pineapple (Feng et al., 2024; Nashima et al., 2022), mango (Singh et al., 2021; Wijesundara et al., 2024), banana (Belser et al., 2021; D’Hont et al., 2012; Li et al., 2024e; Wang et al., 2019; Xie et al., 2024), lychee (Hu et al., 2022), and avocado (Nath et al., 2022). Although at least one genome sequence has been published for each tropical fruit tree species, a further collection of the haplotype sequences of accessions will help promote breeding research.

QTL mapping, GWAS, and gene identification of tropical fruit tree species

QTL mapping and GWAS are methods used to detect QTLs. Specifically, QTL mapping is a statistical method that combines linkage analysis with a statistical model of the phenotypic values of a trait to map QTLs and determine their effects (Iwata et al., 2016). QTL mapping requires a genetic linkage map in addition to genome-wide marker genotypes and trait phenotypes. Therefore, QTL mapping is commonly performed using a single-crossed population to construct genetic linkage maps. In contrast, GWAS requires only genome-wide marker genotypes and trait phenotypes to detect QTLs. GWAS is more suitable for QTL detection in fruit tree species than QTL mapping because the analyzed population is not limited to a single population and can include multiple populations and accessions (Iwata et al., 2016). To date, there have been 24 publications on QTL mapping and GWAS covering eight tropical fruit tree species (Table 2). Among tropical fruit trees, random amplified polymorphic DNA (RAPD) marker-based genotyping and QTL mapping for 62 loci with 253 accessions in papaya represented the earliest generation of QTL mapping (Sondur et al., 1996). Although RAPD markers can detect DNA variants without genomic DNA sequence information, the reproducibility of variant detection is lower than that obtained using other DNA markers (Jones et al., 1997). Subsequently, polymerase chain reaction (PCR)-based genotyping and QTL mapping have been reported for papaya (Blas et al., 2012) and passion fruit (Pereira et al., 2017). The papaya DNA markers were designed using the whole genome sequence of the ‘SunUp’ cultivar (Chen et al., 2007), whereas the DNA markers used in passion fruit QTL mapping were developed by several researchers using Sanger sequencing (Pereira et al., 2013). The early generation of next-generation sequencers, such as Roche GS FLX, enabled shotgun DNA sequences to be obtained relatively easily. Using shotgun sequences, DNA marker development has been performed in tropical fruit species, including pineapple (Nashima et al., 2020), pitaya (Nashima et al., 2021), banana (D’Hont et al., 2012), and mango (Mahato et al., 2016). Although these PCR-based markers have high reproducibility, they require considerable time, cost, and effort to obtain genotype information. Therefore, only a few reports on QTL mapping using PCR-based markers have been published. Low-cost SNP genotyping techniques and NGS-based reduced-representation genome sequencing methods, including genotyping by sequencing (Elshire et al., 2011) and double-digest restriction site-associated DNA sequencing (Peterson et al., 2012), have been developed and are widely used. These techniques sequence only a fraction of the whole genome, which significantly simplifies the resulting dataset compared with that in whole-genome sequencing approaches. These methods enable us to obtain thousands of SNPs for each individual at a lower cost than genotyping using PCR-based DNA markers. This has enabled GWAS with large sample sizes and abundant markers in pineapple (Nashima et al., 2024b; Sanewski, 2022), papaya (Nantawan et al., 2019), banana (Nyine et al., 2019; Osorio-Guarin et al., 2024; Rio et al., 2025; Sardos et al., 2016), and mango (Ma et al., 2024; Mango Genome Consortium et al., 2021).

Table 2

QTL mapping and GWAS for tropical fruit trees.

Most of the genes determining quantitative or qualitative traits in tropical fruit tree species have not yet been identified. In pineapples, trait-associated gene identification has been reported for the spine phenotype in the leaf margin (AcWOX3), white coloration in the flesh (A. comosus carotenoid cleavage dioxygenase 4: AcCCD4), and red coloration in the peel (AcMYB266) (Nashima et al., 2022; Zhang et al., 2024b). For example, AcWOX3 was identified by comparing the haplotype sequences of the genomic region specified by fine mapping. AcCCD4 was identified by QTL analysis and comparative analyses of mRNA expression. AcMYB266 was identified by transcriptome analysis as the gene determining anthocyanin accumulation in the peel. For papaya, Blas et al. (2010) identified the lycopene β-cyclase gene as a flesh color determinant by map-based cloning and comparison of allelic sequences. In the future, additional genes underlying quantitative or qualitative traits in tropical fruit tree species will be determined based on genome sequences and QTL information.

Proposal for practical DNA-based selection based on pineapple breeding in Japan

The QTLs identified by QTL mapping and GWASs are expected to be applied to breeding selection. DNA marker-assisted selection, a method for weeding out undesirable individuals using marker genotypes, can effectively utilize genomic information for breeding. However, the practical use of marker-assisted selection in breeding programs has been hampered by several issues. Here, we discuss two issues in the pineapple breeding program at the Okinawa Prefectural Agricultural Research Center (OPARC) in Japan.

One issue is the allele multiplicity of each QTL. Many cross-combinations are produced in breeding programs to obtain variable populations. In the pineapple breeding program at OPARC, 4,000 pineapple seedlings are obtained from 20 to 30 artificial hybridizations every year (Ogata et al., 2016). For efficient breeding selection, QTLs detected by QTL mapping and GWAS (Nashima et al., 2023, 2024a, b) can be applied to marker-assisted selection. Although each analysis was performed using NGS-based SNP genotyping, an SNP can only distinguish between two types, i.e., reference or alternative alleles, even if three or more alleles exist at the locus. Generally, multiple alleles exist for each trait-associated locus among accessions, and only certain allele(s) are desirable for breeding, whereas others are not. Nashima et al. (2024b) confirmed the applicability of SNPs linked to QTLs in seven breeding populations of pineapple and reported that five SNPs were applicable to all seven populations. Although these SNPs can be reliably utilized in populations derived from crosses between the parents of these seven populations, they have not been validated for other parents. Without addressing allele multiplicity, DNA marker selection will be uncertain and not applicable to practical breeding selection. Reliable breeding selection requires DNA markers that distinguish desirable alleles from the other alleles. Haplotype sequence information can be used to resolve such allele multiplicity. The deployment of haplotypes in breeding is referred to as haplotype-based breeding (Bhat et al., 2021). Minamikawa et al. (2021) focused on the founder haplotype of fruit tree species and applied it to a GWAS and genomic selection of apples. The research group, including the author, obtained haplotype sequences for major founder breeding accessions in OPARC to transform SNP-based QTL information into haplotype-based QTL information (Fig. 1). Combining QTL and haplotype information could resolve the issue of allele multiplicity.

Fig. 1

Schematic depicting the model for developing haplotype-based quantitative trait loci (QTL) information to breed fruit tree species. First, single nucleotide polymorphisms (SNPs) for QTLs are detected by a genome-wide association study (GWAS) using breeding populations. For example, adenine may be preferred over cytosine in certain QTL SNPs. Next, QTL SNPs and neighboring sequences are identified by a search and extracted from the obtained whole haplotype sequences of the founder accessions. Subsequently, haplotype numbers, Hap1 to Hap6, and their QTL SNPs are identified, and this haplotype sequence information is used for genotyping the parents of breeding populations. Finally, trait-haplotype associations are confirmed for each breeding population to reveal which haplotypes are preferred for breeding selection. Although the SNP allele “A” was assessed as preferred from SNP-based GWAS, Hap2 with an “A” SNP was finally judged as not preferred.
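The difference between SNP-based and haplotype-based calls in this schematic can be illustrated with a small lookup. This is a hedged sketch built on invented data: only the labels Hap1 to Hap6, the preference for "A" over "C", and the outcome that Hap2 carries "A" but is not preferred come from the caption; every other assignment below is hypothetical.

```python
# Hypothetical QTL SNP allele carried by each founder haplotype.
HAPLOTYPE_SNP = {
    "Hap1": "A", "Hap2": "A", "Hap3": "C",
    "Hap4": "C", "Hap5": "A", "Hap6": "C",
}

# Assumed outcome of the trait-haplotype association check in the
# breeding populations (True means preferred for selection).
HAPLOTYPE_PREFERRED = {
    "Hap1": True, "Hap2": False, "Hap3": False,
    "Hap4": False, "Hap5": True, "Hap6": False,
}


def snp_based_call(haplotype, preferred_allele="A"):
    """Call a haplotype preferred if it carries the preferred SNP allele."""
    return HAPLOTYPE_SNP[haplotype] == preferred_allele


def haplotype_based_call(haplotype):
    """Call a haplotype preferred according to the haplotype-level check."""
    return HAPLOTYPE_PREFERRED[haplotype]


def discordant_haplotypes():
    """List haplotypes where the SNP-based and haplotype-based calls disagree."""
    return [
        hap for hap in HAPLOTYPE_SNP
        if snp_based_call(hap) != haplotype_based_call(hap)
    ]


print(discordant_haplotypes())
```

With these invented assignments, the only disagreement is Hap2, which an SNP marker alone would wrongly retain.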

Another issue is that DNA marker selection becomes too strict when multiple QTLs are used. When marker-assisted selection is performed using a single DNA marker, one half, one quarter, or three quarters of the individuals carry a desirable genotype and are selected. In theory, applying multiple DNA markers increases selection efficiency drastically by removing all non-ideal genotypes. For example, if each of 10 markers retains half of the individuals, the population retained for further breeding shrinks to 1/1,024 of its original size. However, such strict marker-assisted selection is unacceptable in breeding programs. Nashima et al. (2024b) identified nine QTLs for nine traits, including flesh color, flesh color value (b*), bolting days, days from bolting to harvest, fruit shell color, soluble solid content, acidity, ascorbic acid content, and fruit cracking, in pineapple. There was only a 0.2% probability of obtaining the ideal genotype at the nine loci in an example cross. Weeding out 99.8% of seedlings before planting is too severe for practical breeding selection. Therefore, Nashima et al. (2024b) suggested an alternative strategy, i.e., determining the genotypes of all QTLs for all seedlings and then selecting and planting seedlings based on their genotype rank and field capacity (Fig. 2). This strategy considers all detected QTLs and also predicts the entire phenotype from the accumulation of preferred alleles. Currently, research groups including the author are evaluating the effectiveness of this selection strategy in OPARC.
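The arithmetic behind these figures is a product of per-marker retention fractions. The sketch below assumes that the loci segregate independently (no linkage) and, for the nine-locus case, that each locus retains one half of the seedlings; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
from fractions import Fraction


def retained_fraction(per_marker_fractions):
    """Multiply per-marker retention fractions, assuming independent loci."""
    total = Fraction(1)
    for fraction in per_marker_fractions:
        total *= Fraction(fraction)
    return total


half = Fraction(1, 2)

ten_markers = retained_fraction([half] * 10)
print(ten_markers)  # 1/1024

nine_markers = retained_fraction([half] * 9)
print(f"{float(nine_markers):.2%}")  # 0.20%
```

Under these assumptions, nine loci leave 1/512 of the seedlings, which is consistent with the roughly 0.2% quoted above; crosses in which some loci retain one quarter or three quarters of the individuals would give different values.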

Fig. 2

Model of breeding selection based on multiple quantitative trait loci (QTL). Seedlings from the breeding population are genotyped for all QTL. Elite individuals, considered candidates for new cultivars, possess preferred alleles at multiple QTL. After determining how many preferred alleles each seedling possesses, breeders rank the seedlings and, according to their field capacity, select the individuals to be planted and confirmed for their actual phenotypes. “+” indicates a preferred allele or haplotype, whereas “−” indicates a non-preferred allele or haplotype.
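The ranking model in this figure can be sketched as counting preferred alleles per seedling and planting the top-ranked seedlings up to the field capacity. The scoring rule below (an unweighted count of "+" alleles across all QTL, with ties broken by seedling ID) is an assumption made for illustration; the source does not specify how ranks are computed, and the seedling IDs, trait names, and genotypes are invented.

```python
def preferred_allele_score(genotype):
    """Count preferred ('+') alleles across all QTL of one seedling."""
    return sum(call.count("+") for call in genotype.values())


def select_for_planting(seedlings, field_capacity):
    """Rank seedlings by preferred-allele count and keep the top ones.

    seedlings: dict mapping seedling ID -> dict of QTL name -> genotype
               string such as '+/+', '+/-', or '-/-'.
    Ties are broken by seedling ID so that the result is deterministic.
    """
    ranked = sorted(
        seedlings,
        key=lambda sid: (-preferred_allele_score(seedlings[sid]), sid),
    )
    return ranked[:field_capacity]


# Hypothetical genotypes at three QTL for four seedlings.
example_seedlings = {
    "S1": {"flesh_color": "+/+", "acidity": "+/-", "fruit_cracking": "-/-"},
    "S2": {"flesh_color": "+/-", "acidity": "+/+", "fruit_cracking": "+/+"},
    "S3": {"flesh_color": "-/-", "acidity": "-/-", "fruit_cracking": "+/-"},
    "S4": {"flesh_color": "+/+", "acidity": "+/+", "fruit_cracking": "+/-"},
}

print(select_for_planting(example_seedlings, field_capacity=2))
```

Unlike strict marker-assisted selection, no seedling is discarded for lacking a single ideal genotype; the cut-off is set by how many plants the field can hold. A weighted score reflecting QTL effect sizes or trait priorities would be a natural variation.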

Future perspectives on genomics-based breeding of tropical fruit tree species

Recent progress in sequencing technologies has enabled whole-genome sequencing and SNP genotyping at a lower cost and with higher accuracy. The whole-genome sequences of major tropical fruit tree species have been published, and QTL information has gradually accumulated. Although breeding programs for tropical fruit trees are performed at institutes in tropical and subtropical regions, NGS technologies have not been fully introduced at these institutes. Minamikawa et al. (2018) suggested that phenotype data routinely collected for breeding populations could help increase the power and resolution of GWAS. In addition, breeding resources maintained at each institute could directly help with gathering and sharing QTL information. As QTL information has not yet been fully accumulated for each tropical fruit tree species, further GWAS and QTL mapping research is required.

Literature Cited
 
© 2025 The Japanese Society for Horticultural Science (JSHS)

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial (BY-NC) License.
https://creativecommons.org/licenses/by-nc/4.0/