Sat-BSA: an NGS-based method using local de novo assembly of long reads for rapid identification of genomic structural variations associated with agronomic traits

Tenta Segawa; Chisato Nishiyama; Muluneh Tamiru-Oli; Yu Sugihara; Akira Abe; Hinako Sone; Noriaki Itoh; Mayu Asukai; Aiko Uemura; Kaori Oikawa; Hiroe Utsushi; Ayako Ikegami-Katayama; Tomohiro Imamura; Masashi Mori; Ryohei Terauchi; Hiroki Takagi

doi:10.1270/jsbbs.20148

Abstract

Advances in next generation sequencing (NGS)-based methodologies have accelerated the identifications of simple genetic variants such as point mutations and small insertions/deletions (InDels). Structural variants (SVs) including large InDels and rearrangements provide vital sources of genetic diversity for plant breeding. However, their analysis remains a challenge due to their complex nature. Consequently, novel NGS-based approaches are needed to rapidly and accurately identify SVs. Here, we present an NGS-based bulked-segregant analysis (BSA) technique called Sat-BSA (SVs associated with traits) for identifying SVs controlling traits of interest in crops. Sat-BSA targets allele frequencies at all SNP positions to first identify candidate genomic regions associated with a trait, which is then reconstructed by long reads-based local de novo assembly. Finally, the association between SVs, RNA-seq-based gene expression patterns and trait is evaluated for multiple cultivars to narrow down the candidate genes. We applied Sat-BSA to segregating F₂ progeny obtained from crosses between turnip cultivars with different tuber colors and successfully isolated two genes harboring SVs that are responsible for tuber phenotypes. The current study demonstrates the utility of Sat-BSA for the identification of SVs associated with traits of interest in species with large and heterozygous genomes.

Introduction

World population has exceeded 7.5 billion and is projected to reach about 10 billion by 2050. Of this, about 820 million (more than 10%) currently suffer from hunger (Boliko 2019). To mitigate this problem and meet the growing global food demand, the breeding of climate-resilient, high-yielding and pest and disease resistant/tolerant crop varieties should be a priority for our sustainable lives. One of the techniques widely used in crop breeding is marker assisted selection (MAS) (Ashikari and Matsuoka 2006). In MAS, DNA markers tightly linked to genes controlling traits of interest are used to select the progeny inheriting the desirable traits without the need for phenotyping, which is a time-consuming and laborious step in conventional breeding. This feature of MAS, in combination with the ability to artificially manipulate plant growth and development including germination and flowering time, allows breeders to accelerate crop improvement (Ohnishi et al. 2011). However, the identification of genes or genomic regions underlying target traits, which is a prerequisite for MAS, is still a rate-limiting step in plant breeding.

Recent approaches combining NGS and bulked-segregant analysis (BSA) have enabled rapid gene identifications (Abe et al. 2016, 2018, Xu and Bai 2015). BSA is a technique for identifying genomic regions controlling traits of interest by genotyping bulk DNA samples prepared from individuals showing contrasting phenotypes or trait values in a segregating progeny (Giovannoni et al. 1991, Michelmore et al. 1991). While BSA originally relied on limited numbers of classical DNA marker such as restriction fragment polymorphisms and random amplified polymorphic DNAs, NGS-based BSA makes use of genome-wide single nucleotide polymorphisms (SNPs), allowing high-throughput analysis with improved resolution. MutMap is one such NGS-based BSA technique for identifying causal mutations responsible for traits of interest in mutants (Abe et al. 2012). In MutMap, an F₂ progeny is developed by crossing a mutant to the parental line used for mutagenesis, followed by NGS-based sequencing of a DNA sample pooled from mutant F₂ individuals. The NGS reads are used for the following genotyping: SNP calling and calculating SNP-index (or mutant allele frequencies) at all SNP positions. Finally, the SNP position occupying 100% mutant allele in the bulk DNA sample is defined as the causal mutation responsible for the phenotype. MutMap allows high-accuracy SNP detection, making it suitable for the analyses of mutants generated by chemical mutagens that cause simple genetic variations such as SNPs. However, this feature limits its application to plant species since the development of large mutant collections is basically dependent on self-compatible nature of plants. MutMap is also not suitable for analyzing complex natural genetic variations that include major structural differences such as insertions, deletions and rearrangements due to difficulty with short reads alignment on structurally variable genomic regions.

Natural variations are results of natural and artificial selections accumulating over a long period of environmental adaptation followed by domestication, and therefor are more complex and diverse than variations induced by artificial mutations. Thus, these variations are likely to be beneficial genetic sources for selective breeding because they would be selected and maintained to exert robustness against biotic challenges (Pathogen attack) and abiotic stresses (climate change). As an NGS-based BSA for identifying such natural variations, we previously developed a method we named QTL-seq using progeny derived from crosses between cultivars showing phenotypic differences (Itoh et al. 2019, Takagi et al. 2013). However, because this technique, like MutMap, is based on only “alignment-able” re-sequencing data on the reference sequence to assess genome-wide SNP differences, it often results in genetic mapping on the rough genomic regions (in Mb) associated with trait/phenotype of interest, but not in identification of the causal genes.

Brassica rapa is independently domesticated in Europe and Asia before its cultivation spread worldwide (Cheng et al. 2016, Zhao et al. 2005). Its long history of cultivation and associated extensive natural and artificial selections have led to considerable genetic diversity and the development of many economically important crop species such as Chinese cabbage (ssp. pekinensis), Pak choi (ssp. chinensis), Wutacai (ssp. narinosa), oil seed types (ssp. oleifera) and turnip (ssp. rapa) (Diederichsen 2001, Yarkhunova et al. 2016). Different breeding strategies have been followed for these crops driven by demands of growers and consumers. To accelerate B. rapa breeding via MAS, the identification of genes controlling agronomically important traits is vital. In B. rapa, developing mutant lines in sufficient numbers for forward genetics is difficult due to its self-incompatible nature, which hinders the selfing required to fix heterozygous mutations (Serrat et al. 2014, Takasaki et al. 2000). This also hampers the application of MutMap in B. rapa. We recently improved the QTL-seq protocol for the highly heterozygous B. rapa, successfully identifying genomic regions associated with phenotypic differences in cultivars (Itoh et al. 2019). However, the identification of causal genes within the candidate genomic regions remains to be clarified. The publicly available “reference genome” sequence of B. rapa was constructed using the Chinese cabbage ‘Chiifu’ (Cai et al. 2017, Wang et al. 2011). Consequently, in ‘Chiifu’ genome sequence-based genetic mappings, it is often difficult to identify candidate genes due to structural variants (SVs) between ‘Chiifu’ and the other B. rapa cultivars. Such SVs are often addressed by generating BAC libraries for the cultivar of interest, followed by screening and sequencing of the BAC clones with DNA probes tightly linked to the target genomic region (Kim et al. 2006). However, this procedure is laborious and time-consuming.

In the present study, we developed “Sat-BSA” an NGS-based BSA for rapidly and efficiently identifying SVs associated with traits using segregating progeny derived from crosses between diverse cultivars. Sat-BSA comprises three steps; 1) identifying the candidate genome regions harboring causal genes by QTL-seq that utilizes genome-wide SNP positions differentiating parental cultivars, 2) re-constructing genome of one of the parental cultivars used for QTL-seq by long reads-based local de novo assembly of the target genomic interval, and 3) analyzing the association between phenotype, SVs and gene expression patterns in multiple cultivars to identify candidate genes. To verify the efficiency and accuracy of Sat-BSA, we demonstrate an application to the identification of two genes responsible for tuber color variations in turnip.

Materials and Methods

Plant materials

The B. rapa cultivars ‘Akamaru (AKA)’, ‘Kanazawa Aokabu (KAN)’ and ‘Hinona (HIN)’ and the additional landraces used in this study were obtained from the National Agriculture and Food Research Organization (NARO) Genebank of Japan (https://www.gene.affrc.go.jp/index_en.php). The NARO Genebank accession numbers of the cultivars/landraces were summarized in Supplemental Table 1. The ‘CS2105577’ Arabidopsis T-DNA insertion mutant line was obtained from the SALK Institute for Biological Studies (http://signal.salk.edu).

Library preparation and NGS

For NGS by Illumina, genomic DNA was extracted using the DNeasy Plant Mini Kit (QIAGEN). About 1 or 0.5 μg DNA was sheared with NEBNext dsDNA Fragmentase (NEB) and used for library preparation using TruSeq DNA LT Sample Prep Kit (Illumina) or NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB) according to the manufacturers’ instructions. For constructing RNA-seq libraries, total RNA was extracted with the RNeasy kit (QIAGEN) followed by DNase treatment using TURBO DNase (Thermo Fisher) to remove contaminating DNA. Then, 1 μg RNA was used for library construction with TruSeq RNA Library Prep Kit v2 (Illumina) or NEBNext Ultra II RNA Library Prep Kit for Illumina (NEB). The libraries were subjected to paired-end sequencing using 2 × 75 bp, 2 × 300 bp and 2 × 150 bp cycles on Illumina NextSeq500, MiSeq and HiSeq X platforms, respectively (Supplemental Table 2).

For generating ONT long reads, 1 μg gel-purified DNA (Zymoclean Large Fragment DNA Recovery Kit, ZYMO RESEARCH) was subjected to library construction using the 1D Genomic DNA Ligation Sequencing Kit (SQK-LSK 109, Oxford Nanopore Technologies). The libraries were sequenced on the MinION sequencer (Mk1B, Oxford Nanopore Technologies) according to the manufacturer’s protocol. The resulting ONT long reads were corrected with Canu ver 1.9 (Koren et al. 2017) using default configuration except for the expected total contig size and stop coverage values. The expected total size of assembled contigs was set at 485 Mb, which also corresponded to the B. rapa genome size (Wang et al. 2011). The stop coverage value was decided depending on the total read number in each sample. The ONT reads amount and the stop coverage value used in Canu were summarized in Supplemental Table 3.

QTL-seq analysis

For QTL-seq analysis, all Illumina short reads with more than 20% of their bases having phred quality score of <20 were excluded. The filtered quality short reads were analyzed using the QTL-seq pipeline ver 2s (https://www.ishikawa-pu.ac.jp/staff/staffname/takagi-horoki/, http://genome-e.ibrc.or.jp/home/bioinformatics-team/mutmap). In this pipeline, alignment of reads and conversion of SAM/BAM files were carried out with BWA ver 0.7.15 (Li and Durbin 2009) and SAMtools ver 1.9 (Li et al. 2009), respectively. The publicly available B. rapa genome sequence “BrapaV3.0PacBio.Chr.fa”, excluding the unanchored scaffolds (http://brassicadb.org/brad/downloadOverview.php), was used as a “reference genome”. SNP-index values were calculated at all SNP positions that were homozygous in both parental lines but heterozygous in F₁. Because DNA sample was not available for NGS from the F₁ used for generating F₂ progeny, DNA sample pooled from all F₂ segregating individuals was used instead for filtering SNPs by QTL-seq pipeline. SNP positions with alignment depth of <10 were excluded from QTL-seq analysis.

Local de novo assembly

For local de novo assembly, we first aligned ONT reads to B. rapa “reference genome” by Minimap2 ver 2.17 (Li 2018) and selected the reads that aligned at a candidate genomic region with the information in SAM file. Then, the selected reads were assembled by Canu ver 1.9 using default configuration except for the expected total contig size and stop coverage. The expected total size of assembled contigs was set at the length of candidate genomic region identified by QTL-seq. The stop coverage value was decided depending on the total read number in each sample. The selected ONT reads were summarized in Supplemental Table 4.

ONT reads alignment depth-based analysis of SVs

We first aligned ONT reads to contigs developed by local de novo assembly by Minimap2 ver 2.17. The aligned ONT reads showing mapping quality <60 were excluded from analysis.

RNA-seq analysis

The RNA sequence reads obtained from Illumina sequencer were filtered for quality, and sequence reads having more than 20% of their bases with phred quality score of <20 were excluded (Supplemental Table 2). Then, the filtered reads were aligned to the contigs generated by local de novo assembly with STAR ver 2.0.0 (Dobin et al. 2013). In the configuration of STAR, intron size (option --alignIntronMax) was set at 30 kb and the option for gene prediction (--outSAMstrandField intronMotif) was used. Only those reads that aligned in pair and showed mapping quality >1 were retained. Then, the exon-intron structure of genes were predicted with StringTie ver 2.1.1 (Pertea et al. 2015). For comparing transcript levels of the samples, the RNA-seq reads aligned to exons were counted by featureCount (Liao et al. 2014) followed by normalization of the read counts by transforming them to transcripts per kilobase million (TPM) values (Wagner et al. 2012). Finally, the calculated TPM values at each exon were compared. The genes that showed low TPM value (<10) in the samples used for gene prediction were excluded from analysis.

HPLC analysis

Cyanidin and pelargonidin standards were purchased from Indofine Chemicals (Hillsborough, NJ, USA). Anthocyanins were extracted from 100 mg of fine-grained tuber skin with 1 mL of 75% MeOH containing 2 N HCl and 0.04% ascorbic acid. The mixture was centrifuged at 14,000 rpm for 5 min, and 200 μL of the supernatant was hydrolyzed at 90°C for 50 min. For analysis, the sample was centrifuged at 14,000 rpm for 10 min, and 20 μL of the supernatant was subjected to a high-performance liquid chromatography (HPLC) using a Prominence LC20A instrument (Shimadzu, Japan) equipped with a reversed-phase YMC-ODSA column (4.6 mm i.d. × 150 mm; YMC, Kyoto, Japan) protected by a guard column containing the same material. The elusion was performed under constant flow (1 ml/min) with a linear gradient condition between 0.1% phosphoric acid (buffer A) and acetonitrile (buffer B). The condition used was: 0–15 min, 10–40% B; 15–25 min, 40–60% B; 25–35 min, 60–100% B; 35–45 min 100% B; and 45–55 min 10% B. Analysis was carried out at 37°C, and absorbance at 520 nm was monitored with photodiode array (Biospec-1600; Shimadzu, Japan). Anthocyanidins were identified by co-elution with the standard samples and the absorption spectra of the peaks.

Phenotyping and genotyping in F₂ progeny

Seeds of F₂ progeny obtained from the crosses ‘AKA’ × ‘KAN’ and ‘AKA’ × ‘HIN’, together with their parental lines, were sown in 4.7 × 4.7 cm pots in glasshouse. Eleven days old F₂ seedlings (‘AKA’ × ‘KAN’) were transplanted into a filed at Ishikawa Prefectural University in Nonoichi, Japan, on May 9^th, 2017. One month after transplanting, tuber color was visually assessed. Eighteen days old F₂ seedlings from ‘AKA’ × ‘HIN’ cross were transplanted into a vinyl house on May 2^nd, 2017. Twenty-four days after transplanting, tuber color was visually assessed. F₂ plants were genotyped using DNA markers designed at polymorphic regions detected by whole genome sequencing of the parental lines (Supplemental Table 5).

Transformation of Arabidopsis thaliana

A. thaliana was transformed by floral dip method with Agrobacterium tumefaciens strain EHA105 (Clough and Bent 1998) and the binary vectors pCAMBIA1301MdNcoI (Imamura et al. 2018) carrying transgenes in the T-DNA region. The T-DNA region used for transformation in this study is described in Supplemental Fig. 1. For phenotyping, transgenic and non-transgenic lines were cultivated in an incubator under the following conditions: Photoperiod = 16 h/8 h light/dark, Temperature = 20°C. The color of each plant organ was visually assessed, and anthocyanins were measured by HPLC as described above.

Near isogenic line (NIL) development

NILs were developed by backcrossing the F₁ plants (‘AKA’ × ‘KAN’) to ‘KAN’ four times followed by selfing of BC₄F₁ plants. The BC₄F₂ generation was defined as NIL. At each generation, the individuals with the desirable genotype were selected based on DNA marker data. Details of the DNA markers used for genotyping is provided in Supplemental Table 5.

Results

The principle and operation flow of Sat-BSA

The principle and operation flow of Sat-BSA is explained using two turnip cultivars with red and white tuber skin colors as an example (Fig. 1). In this example, the causal gene responsible for the tuber color difference in the two cultivars is targeted. The first Sat-BSA step is the identification of a candidate genomic region(s) associated with tuber color by QTL-seq (Itoh et al. 2019). To this end, we first cross the cultivar with red tubers (P1) to the second cultivar with white tubers (P2) and generate an F₂ progeny segregating for red:white tubers with 3:1 ratio (Fig. 1A). Multiple individuals with white tubers are pooled in equal amounts to prepare a bulk DNA sample for NGS. In parallel, NGS reads obtained from one of the parents (P1 in this example) are aligned to the publicly available “reference genome” (hereafter called “reference genome”) of B. rapa. Nucleotides of the “reference genome” are then replaced by P1 nucleotides at all homozygous SNP positions to prepare “P1 reference” sequence. The NGS short reads obtained for the white tuber bulk DNA sample is aligned to “P1 reference” to calculate SNP-index values at all SNP positions. In this example, the SNP-index values represent the frequency of P2 alleles in the bulk sample. This is followed by sliding window analysis (SWA) for calculating average of SNP-index values within defined window and step sizes. Finally, the genomic region showing statistically significant peaks identified by SWA of SNP-index values is defined to potentially link to candidate genes associated with white tuber, the phenotype shared by the F₂ individuals pooled for NGS (Fig. 1A).

Fig. 1.

Overview of Sat-BSA. (A) To identify the gene responsible for the red tuber phenotype of P1, P1 is crossed to P2 and segregation of tuber color is assessed in F₂ generation. A single bulk DNA sample pooled from F₂ individual with white tubers is applied to NGS and used to define a candidate genomic interval by QTL-seq analysis. (B) P1 ONT long reads that map to the candidate genomic interval are assembled de novo to reconstruct the P1 sequences. (C) Using the assembled contigs and RNA-Seq reads obtained from tissues that are expected to express the gene of interest, protein-coding genes are predicted, and their structure defined. (D) The structural variations (SVs) within the candidate interval is assessed in multiple cultivars (P1, P2 and P3) based on P-value of Fisher’s exact test comparing depth of aligned ONT reads. (E) Comparison of expression patterns among the cultivars based on transcripts per million (TPM) value of RNA-Seq reads for the identification of differentially expressed genes (DEGs). Candidate genes are determined based on both SVs and DEGs data.

The second Sat-BSA step is de novo assembly of the candidate genomic region identified by QTL-seq using long reads obtained for one of the parents (P1 in our example). A methodological limitation of recent BSAs such as QTL-Seq is absolute dependence on the publicly available “reference genomes”. Reconstructing a “P1 reference” by replacing nucleotides of the “reference genome” by P1 nucleotides at all SNP positions fails to change the structural differences between P1 and the “reference genomes”. Consequently, the genes that are located within P1-specific genomic regions are excluded from the analysis. To avoid this methodological limitation, P1 long reads obtained by Oxford NanoPore Technology sequencer (ONT reads) are aligned to the “reference genome”. The ONT reads that align to the candidate genomic region identified by QTL-seq are selected and assembled de novo (Fig. 1B). Then, structures of the protein-coding genes predicted within the assembled contigs by StringTie are further verified using RNA-seq data obtained from tuber surface, the tissue where the target gene is likely to be expressed (Fig. 1C).

The final step is narrowing down the number of candidate genes by analyzing the association between phenotype, SVs and gene expression patterns in multiple cultivars. In our example, a third cultivar (P3) with the same tuber color as P1 is included for the analysis. SVs including large deletions, insertions and inversions are determined based on differences in number and alignment depth of ONT long reads between cultivars and P-value of Fisher’s exact test (FET) (Fig. 1D). If the total number of ONT reads and alignment depth at the same genomic position in P1 and P2 are 5 Mb and 6 Mb and 12 and 0, respectively, the P-value of FET is calculated based on a 2 × 2 table described by 5,000,000/12 × 6,000,000/0. Both large insertions and inversions are determined by P-value of FET based on the ratio of edge to aligned ONT reads (Supplemental Fig. 2). If ratio of the alignment depth is 10 (P1) and 10 (P2) and edge of the aligned ONT reads is 0 (P1) and 10 (P2) at a given position, P-value of FET is calculated based on a 2 × 2 table described by 10/0 × 10/10. Regions with P-value <0.05 based on both statistical analyses are defined as regions that are structurally different between P1 and P2. For the analysis for SVs, short InDels (less than 5 bp) and SNPs are ignored because such variations cannot be accurately detected by ONT reads alone. For comparing gene expression patterns, RNA-seq data from various organs and conditions are obtained from multiple cultivars (Fig. 1E). RNA-seq reads are aligned to the assembled contigs, and numbers of the reads that align to exons of the predicted genes are normalized by calculating transcripts per kilobase million (TPM) values. P-values of statistical analysis performed on TPM at each exon are compared between cultivars to determine expression levels. The gene(s) having P-value of <0.05 in the analyses of both SVs and gene expression patterns are considered to high-confident candidates associated with tuber color traits. The candidate genes identified by Sat-BSA can be narrowed down further by functional annotation using homology analysis such as Blast (McGinnis and Madden 2004).

The pipeline for Sat-BSA (https://anaconda.org/bioconda/sat-bsa) is available in Bioconda, a public package manager for bioinformatics software (Grüning et al. 2018).

Sat-BSA for identifying genes controlling tuber color in B. rapa

For a proof-of-principle, we applied Sat-BSA to isolate genes controlling tuber color differences in three turnip cultivars ‘Akamaru (AKA)’, ‘Kanazawa Aokabu (KAN)’ and ‘Hinona (HIN)’ that produce red, white and white with purple top tubers, respectively (Fig. 2A). Because anthocyanins are associated with red and purple pigments in B. rapa, we reasoned that the red and purple colors of ‘AKA’ and ‘HIN’ might be due to differences in anthocyanin content or composition (Guo et al. 2014). HPLC analysis confirmed accumulation of the pelargonidin- and cyanidin-based anthocyanins in the red and purple tubers of ‘AKA’ and ‘HIN’, respectively (Fig. 2B). No accumulation of anthocyanins was detected in the white tubers of ‘KAN’. We independently crossed ‘AKA’ to ‘KAN’ and ‘HIN’ and generated F₁ plants that all produced white tubers with purple tops (aboveground tuber parts) and accumulated the cyanidin-based anthocyanin in both progenies.

Fig. 2.

The red and purple tuber phenotypes of the B. rapa cultivars ‘AKA’ and ‘HIN’ are associated with anthocyanin accumulation. (A) Tuber phenotypes of ‘AKA’, ‘KAN’, ‘HIN’ and F₁ (‘AKA’ × ‘KAN’ and ‘AKA’ × ‘HIN’) plants. (B) HPLC chromatogram showing the anthocyanidins detected in tubers of the three cultivars and their F₁ progeny shown in (A). Samples obtained from surfaces of top (aboveground) tuber parts were used for HPLC analysis. Purple and red arrowheads indicate the retention times at which standards of cyanidin and pelargonidin were detected, respectively. (C) Segregating ratio of tuber color in F₂ progenies derived from the crosses between ‘AKA’ × ‘KAN’ and ‘AKA’ × ‘HIN’. The F₂ individuals pooled to prepare bulk DNA samples for next generation sequencing are shown.

The segregation of tuber color was assessed in F₂ progeny obtained from selfing of F₁ plants. Of the 239 F₂ plants obtained from ‘AKA’ × ‘KAN’ cross, 175 and 64 plants produced red or purple and white tubers, respectively (Fig. 2C). This corresponded with mendelian segregation ratio of 3:1 (Chi square test, χ² = 0.40307, P = 0.5255, ns), suggesting a single dominant gene in ‘AKA’ is responsible for the tuber pigment difference between ‘AKA’ and ‘KAN’. Furthermore, the 175 F₂ plants with red or purple tubers had a purple/red ratio of 3:1 (135/40 purple/red; Chi square test: χ² = 0.42857, P = 0.5127, ns). The 232 F₂ individuals obtained from the cross between ‘AKA’ and ‘HIN’ produced either purple or red tubers with a purple/red ratio of 3:1 (185 purple and 47 red, Chi square test, χ² = 2.7816, P = 0.09535, ns), showing that purple color phenotype in ‘HIN’ is a dominant trait over the red color in ‘AKA’ (Fig. 2C). Taken together, the tuber color segregating ratio obtained from both F₂ progenies suggested that a single recessive gene in ‘AKA’ is responsible for red color phenotype in the pigmented tubers.

Sat-BSA identifies a gene responsible for anthocyanin accumulation in ‘AKA’ tubers

For identifying the genomic region associate with ‘AKA’ tuber phenotype, we applied QTL-seq analysis to a single bulk DNA sample pooled from 20 F₂ (‘AKA’ × ‘KAN’) individuals with white tubers (W-bulk) (Fig. 2C). The analysis was carried out using the QTL-seq pipeline version 2s (https://www.ishikawa-pu.ac.jp/staff/staffname/takagi-horoki/, http://genome-e.ibrc.or.jp/home/bioinformatics-team/mutmap). First, “AKA-reference” was developed by replacing nucleotides of the publicly available reference genome for B. rapa “BrapaV3.0.PacBio.Chr.fa” (http://brassicadb.org/brad/downloadOverview.php) with those of ‘AKA’ nucleotides at all homozygous SNP positions. Next, NGS short reads obtained from ‘KAN’, F₁ (‘AKA’ × ‘KAN’) and W-bulk (white tuber bulk) DNA were aligned to “AKA-reference” (Supplemental Table 2). The SNP-index values of W-bulk were calculated genome-wide and spurious SNPs were filtered out with alignment data of ‘KAN’ and the F₁ as previously described (Itoh et al. 2019). Accordingly, we detected a single peak with the highest SNP-index average of 1 within 21.70–23.70 Mb interval on chromosome A07 (Fig. 3A, Supplemental Fig. 3), confirming the 3:1 phenotype segregation ratio. Because the SNP-index values obtained on “AKA-reference” represent the frequencies of ‘KAN’ allele in the bulk DNA sample, the frequency of ‘KAN’ allele within 21.70–23.70 Mb should be closed to 1 among the individuals pooled for W-bulk.

Fig. 3.

Sat-BSA identifies a gene responsible for tuber color in B. rapa. (A) The candidate genomic region identified on chromosome A07 by QTL-seq. The SNP-index values were calculated using Illumina short reads obtained from a bulk DNA sample pooled from 20 F₂ individuals (‘AKA’ × ‘KAN’) with white tubers and aligned to “AKA-reference”. The green dots represent a SNP-index value at a given SNP position. Red line represents the sliding window average SNP-index values of 2 Mb interval with 50 kb increment. The gray lines are the 95% confidence interval of SNP-index value under the null hypothesis of a randomly bulked DNA sample. The red shadow indicates the candidate genomic region harboring the highest average SNP-index value within a 2 Mb region. (B) A summary of contigs generated by local de novo assembly of ONT reads that mapped to the candidate genomic region identified in (A). (C) The tuber tissues used for RNA-seq to identify differentially expressed genes. The tissue used for gene prediction by StringTie is indicated by arrow with dot line. For tissues designated by ‘×2’, samples from two independent plants were used to construction of two RNA-seq libraries. (D) Snapshot of IGV showing the STRG.20 gene region aligned with ‘AKA’, ‘KAN’ and ‘HIN’ ONT reads (top panel). The lower panel shows P-values of FET based on depth of aligned ONT reads. Red and purple lines represent P-values calculated by comparing ‘KAN’ alignment depth with that of ‘AKA’ and ‘HIN’, respectively. The gray horizontal line shows the P-value of 0.05. (E) Snapshot of IGV showing the STRG.20 gene region aligned with Illumina RNA-seq short reads obtained from different tuber parts of ‘AKA’, ‘KAN’ and ‘HIN’.

To reconstruct the candidate genomic region (A07: 21.70–23.70 Mb) for ‘AKA’, we sequenced ‘AKA’ genome using the Oxford Nanopore Technologies sequencer MinION. We then aligned 1,136,281 ‘AKA’ ONT reads to “BrapaV3.0PacBio.Chr.fa”, obtained 12,202 ONT reads that were successfully aligned to the candidate region, and then de novo assembled “the locally aligned ONT reads” using Canu (Koren et al. 2017) (Fig. 3B, Supplemental Tables 3, 4). The assembly generated 191 contigs with a total length of 3.5 Mb, which was longer than the 2.0 Mb candidate genomic region defined by QTL-seq. Next we aligned RNA-seq reads of ‘AKA’ tuber surface, which is expected to express the causal genes, on the putative 481 protein-coding genes predicted by StringTie in the 191 contigs (Pertea et al. 2015) (Fig. 3C, Supplemental Table 2). To narrow down the list of candidate causal genes, the association between tuber color and SVs was analyzed using ‘AKA’, ‘KAN’ and ‘HIN’ ONT reads aligned to the assembled contigs. By comparing ONT reads alignment depth in these cultivars, we identified 50 genes that are structurally different in ‘KAN’ compared to the other two cultivars (P < 0.05, FET) (Fig. 3D). We further analyzed the association between tuber pigmentation and gene expression levels in these three cultivars using RNA-seq data derived from top (aboveground) and lower (belowground) parts of tuber (Fig. 3E). Of the 481 predicted genes, 36 exhibited statistically higher TPM values in red/purple tissues than those in white tissues (Wilcoxon rank sum test: P < 0.05). Importantly, 9 out of the 36 genes showed significant SVs, suggesting they are high-confident candidate genes for ‘AKA’ tuber color (Supplemental Table 6).

Blastx analysis of the 9 candidate genes in plant amino acid database (http://pgdbj.jp) identified a gene we tentatively named as STRG.20. In the database, STRG.20 showed homology to R2R3-Myb, a positive regulator of anthocyanin accumulation in Arabidopsis (Gonzalez et al. 2008). We also found that sequence of STRG.20 is identical to that of a Myb-class transcription factor gene, BrMyb2, recently reported to be associated with anthocyanin accumulation in Chinese cabbage (He et al. 2020). BrMyb2 is a homolog of Arabidopsis thaliana PRODUCTION OF ANTHOCYANIN PRODUCTION1 (PAP1)/AtMYB75 and Petunia hybrida ANTHOCYANIN2 (AN2) (Borevitz et al. 2000, He et al. 2020, Quattrocchio et al. 1999). Alignment of the ONT reads from ‘KAN’, ‘AKA’ and ‘HIN’ revealed that approximately 16 kb region containing the STRG.20 locus was deleted in ‘KAN’ (Fig. 4A, 4B, Supplemental Fig. 4). Using RNA-seq data and qRT-PCR, we confirmed that STRG.20 is expressed in tubers of ‘AKA’ and ‘HIN’, but not in ‘KAN’ (Figs. 3E, 4C). Taken together, our results suggested that STRG.20 is the causal gene controlling tuber color in ‘AKA’ and that deletion of this gene in ‘KAN’ lead to the formation of white tubers.

Fig. 4.

Functional analysis of STRG 20. (A) The structure of STRG 20 locus. Open and gray boxes indicate exons and UTRs of STRG 20, respectively. Primer pairs used for confirming presence of STRG 20 (Fw1 and Rv1) and qRT-PCR (Fw3 and Rv3) were shown by arrows. (B) PCR amplifications of the STRG 20 locus in ‘AKA’, ‘KAN’ and ‘HIN’ using the Fw1/Rv1 primer pairs. The black arrowhead indicates expected sizes of the PCR products. (C) qRT-PCR analysis of the relative expression level of STRG.20 in ‘AKA’ and ‘KAN’. BrEF1α was used as internal control for estimating relative expression levels. Values are mean of three biological replicates, and bars represent standard deviations. Asterisk indicates significant difference (Student t-test, ** P < 0.01). (D) Phenotypes of Arabidopsis WT (Col-0) and a transformants over expressing STRG.20 under CaMV 35S promoter (35S::BrMyb2). Pictures were taken 40 days after sowing. (E) HPLC chromatograms showing anthocyanidin profiles in the roots of WT and the transformants described in (D). The purple arrowhead corresponded to retention time for the detection of cyanidin standards.

To confirm the function of STRG.20, we generated transgenic lines expressing ‘AKA’ STRG.20 under the CaMV 35S promoter in Arabidopsis (Supplemental Figs. 1, 5). The transgenic lines overexpressing STRG.20 produced deep purple colored roots, which was consistent with the phenotype of the Arabidopsis transgenic lines expressing BrMyb2 from Chinese cabbage (He et al. 2020) (Fig. 4D). HPLC analysis confirmed that the purple root phenotype of transgenic lines overexpressing STRG.20 was due to the accumulation of cyanidin-based anthocyanin (Fig. 4E). These results demonstrated that STRG.20 is functionally expressed to activate anthocyanin biosynthesis in tubers of ‘AKA’, whereas it is not expressed in that of ‘KAN’ in the recessive manner, suggesting that the SV specifically occurred in ‘KAN’ is the causative for the transcriptional silence in the tuber. Consequently, in this paper, we named the ‘KAN’ and ‘AKA’ alleles of STRG.20 as KAN-Brmyb2 and AKA-BrMyb2, respectively.

Identification of the gene for red tubers of ‘AKA’

We further applied Sat-BSA to identify the gene responsible for the accumulation pelargonidin-based anthocyanin in the red tubers of ‘AKA’. The purple/red ratio of 3:1 we observed in F₂ progeny developed by crossing ‘AKA’ to ‘KAN’ and ‘HIN’ suggested a single recessive allele of ‘AKA’ is responsible for accumulation of pelargonidin-based anthocyanin (Fig. 2C). We prepared two bulk DNA samples from 20 F₂ individuals with red tubers (R-bulk) in both progeny for NGS. QTL-seq analyses of both samples identified the same genomic region with significantly low SNP-index values within a 2.0 Mb interval on chromosome A10 (A10: 17.65–19.65 Mb) of ‘AKA’ (Fig. 5A, Supplemental Figs. 6, 7). This suggested that segregation of purple and red phenotypes in both progeny is controlled by the same gene with functional alleles in ‘KAN’ and ‘HIN’. To reconstruct the candidate genomic region harboring the functional allele of ‘KAN’ by local de novo assembly, we aligned 1,031,330 ‘KAN’ ONT reads to the B. rapa reference genome “BrapaV3.0PacBio.Chr.fa” and identified 8,206 reads to be specifically mapped to the candidate genomic interval (Supplemental Fig. 8, Supplemental Tables 3, 4). We assembled the 8,206 reads de novo, generated an assembly consisting of 153 contigs in a total length of 3.6 Mb and predicted 610 protein-coding genes using RNA-seq data obtained from the top purple portion of ‘HIN’ tubers. By comparing the depth of ONT reads on the 610 predicted genes between ‘AKA’, ‘KAN’ and ‘HIN’, we identified 69 genes with ‘AKA’-specific SVs (P-value of FET in depth showed <0.05).

Fig. 5.

Sat-BSA identifies the gene responsible for the red tuber phenotype of ‘AKA’. (A) SNP-index plots showing the same candidate genomic intervals independently identified by QTL-seq analysis of bulk DNA samples (R-bulk) prepared from individuals from two F₂ progenies with red colored tuber phenotype described in Fig. 2C. Green dots indicate SNP-index values at a given SNP position. Red lines represent the sliding window average SNP-index values of a 2 Mb interval with 50 kb increment. The gray lines show the 95% confidence interval of SNP-index value under the null hypothesis of randomly bulked F₂ DNA. Intervals in red shadows show regions with the lowest SNP-index values. (B) A snapshot of IGV genome viewer showing the genomic region of STRG.573 (KAN-BrF3ʹH), the candidate gene identified by Sat-BSA. The structure of KAN-BrF3ʹH shown using yellow boxes (Exons and UTRs) and lines (introns) (top). The 4,375 bp TE insertion in the second exon of KAN-BrF3ʹH was shown. The KAN-BrF3ʹH region aligned with ONT reads of ‘AKA’, ‘KAN’ and ‘HIN’ (bottom). (C) A graph describing P-value calculated based on Fisher’s exact test of alignment depth differences between the three cultivars. Green and purple lines represent P-values calculated by comparing ‘AKA’ alignment depth with that of ‘KAN’ and ‘HIN’, respectively. The gray horizontal lines show the P-value of 0.05. (D) Genetic complementation of Arabidopsis f3ʹh mutant with KAN-BrF3ʹH. Phenotypes of mature siliques of WT (Col-0), f3ʹh mutant (‘CS2105577’) and ‘CS2105577’ transformed with KAN-BrF3ʹH, and AcGFP1 (control) were shown (left). HPLC chromatograms showing anthocyanidin profiles in the same siliques (right). The purple and red arrowheads correspond to retention times for the detection of cyanidin and pelargonidin standards, respectively.

We surveyed differentially expressed genes (DEGs) between RNA-seq data of pelargonidin-based red tubers (top and lower parts) and red leaf veins of ‘AKA’, and cyanidin-based purple top of ‘HIN’ tubers by comparing TPM values of expressed genes between the two tissue types, and identified 255 DEGs (P < 0.05, Student t-test). Of these 255 DEGs, 41 DEGs also showed significant SVs between cultivars, suggesting these are the most likely candidates associated with pelargonidin accumulation (Supplemental Table 7). Of the 41 candidate genes, STRG.573 showed the largest difference in average TPM value between the two sample types. The third exon of STRG.573 in ‘HIN’ tissue was highly expressed with the TPM value of 3976.3, while the value was <4 in the ‘AKA’ tissue where it does not express. (Supplemental Fig. 8). STRG.573 was annotated as a flavonoid 3ʹ-hydroxylase (F3ʹH), a cytochrome P450 enzyme crucial for cyanidin-type anthocyanin biosynthesis (Tanaka 2006). We surveyed genomic structure of STRG.573 in detail and identified several sequence variations, one of which is a specific insertion of transposable element (TE) in the second exon of ‘AKA’ STRG.573 (Fig. 5B, 5C). A PCR with the mixed primers spanning the TE insertion site amplified a PCR product containing TE sequence in ‘AKA’ that was longer than the PCR products obtained from ‘HIN’ and ‘KAN’ (Supplemental Fig. 9). Sanger sequencing of the longer PCR product of ‘AKA’ revealed that the TE insertion is 4,375 bp long and encodes a DNA type transposon (class II Harbinger transposon) (Supplemental Fig. 5). RNA-seq reads of ‘AKA’ aligned only to first and second exons of STRG.573, while reads from ‘HIN’ covered all exons with enough depth, suggesting that the TE insertion in ‘AKA’ caused truncation of STRG.573 transcript leading to loss-of-function (Supplemental Fig. 8), while the ‘KAN’ and ‘HIN’ STRG.573 are functional as F3ʹH.

To verify the function of STRG.573, we carried out a genetic complementation test by transforming the Arabidopsis loss of function mutant ‘CS2105577’, which carries a T-DNA insertion in the second exon of F3ʹH, with STRG.573 from ‘KAN’ under the CaMV 35S promoter (Supplemental Figs. 1, 5). WT and ‘CS2105577’ plants produced purple and red siliques that corresponded to the accumulation of cyanidin and pelargonidin-based anthocyanin, respectively (Fig. 5D). The ectopic expression of STRG.573 from ‘KAN’ successfully complemented phenotypes of ‘CS2105577’, which produced purple siliques that accumulated the cyanidin-based anthocyanin. In contrast, siliques of ‘CS2105577’ transformed with AcGFP1 (control) remained red by pelargonidin-based anthocyanins. These results confirmed that STRG.573 from ‘KAN’ function as F3ʹH and is responsible for the red and purple tuber phenotypes in turnip. We accordingly named the ‘KAN’ and ‘AKA’ alleles of STRG.573 as KAN-BrF3ʹH and AKA-Brf3ʹh, respectively. Further analyses of diverse Japanese turnip landraces revealed that the TE insertion in AKA-Brf3ʹh is common among landraces that produce red tubers (Supplemental Fig. 9), suggesting that red colored turnips cultivated in Japan were originally selected from a landrace with the TE insertion in BrF3ʹH.

Functions of AKA-BrMyb2 and AKA-Brf3ʹh in B. rapa

The functions of AKA-BrMyb2 and AKA-Brf3ʹh in B. rapa were further studied by developing near isogenic lines (NILs) carrying one or both of the genes in ‘KAN’ (white tubers) genetic background. We backcrossed an F₁ derived from a cross between ‘AKA’ and ‘KAN’ to ‘KAN’ four times followed a single selfing generation and confirmed inheritance of the desirable alleles at each step using DNA-markers that targeted polymorphic sites in each allele. Theoretically, BC₄F₂ should carry more than 96% of ‘KAN’ genome due to the five times crossing with ‘KAN’. We accordingly defined BC₄F₂ lines carrying different combinations of BrMyb2 and BrF3ʹH alleles (BrMyb2:BrF3ʹH) as ‘KAN’:‘KAN’, ‘KAN’:‘AKA’, ‘AKA’:’KAN’ and ‘AKA’:‘AKA’ and named them NIL-KK, NIL-KA, NIL-AK and NIL-AA, respectively (Fig. 6A). We assessed tuber color in these NILs and observed that while NIL-KK and NIL-KA, both carrying the KAN-Brmyb2 allele, produced white tubers, whereas NIL-AK and NIL-AA that carried the AKA-BrMyb2 allele produced purple and red tubers, respectively. We confirmed by HPLC analysis that the purple tuber of NIL-AK and red tuber of NIL-AA were associated with the synthesis of cyanidin and pelargonidin, respectively (Fig. 6B). Additionally, we cultivated BC₄F₃ generation of both NIL-AK and NIL-AA and these produced the pigmented petioles with less color intensity compared with petioles of ‘AKA’ (Supplemental Fig. 10). These demonstrated that both AKA-BrMyb2 and AKA-Brf3ʹh take parts in anthocyanin biosynthesis, comparable to their homologues in A. thaliana (Borevitz et al. 2000, Schoenbohm et al. 2000). Additionally, comparing the phenotypes of NIL-KA and NIL-AA suggested that AKA-BrMyb acts upstream of AKA-BrF3ʹH in the control of tuber pigmentation. Important to note, both NIL-AK and NIL-AA accumulated anthocyanin only on the aboveground parts of tubers, suggesting light-dependent activation of anthocyanin biosynthesis in B. rapa.

Fig. 6.

Characterization of near isogenic lines (NILs) carrying one or both of AKA-BrMyb2 and AKA-f3ʹh in ‘KAN’ genetic background. (A) Tuber (left) and seed (right) phenotypes of the various NILs. Allele combination is shown for each NIL. (B) HPLC chromatogram showing the anthocyanidins detected in each NIL tuber. The purple and red arrowheads correspond to retention times for the detection of cyanidin and pelargonidin standards, respectively. (C) Flavonoid biosynthetic pathway in plants. Intermediates and genes are indicated in black and blue, respectively. The number of black arrows correspond to the number of homologous identified for each structural gene in B. rapa “reference genome” by homology analysis with Blastn. CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone 3-hydroxylase; F3ʹH, flavonoid-3ʹ-hydroxylase; DFR, dihydroflavonol 4-reductase; ANS, anthocyanidin synthase.

We also observed phenotypic variations in our NILs for seed coat color. The seed coat color of Brassicaceae is primarily determined by flavonoid accumulation (Routaboul et al. 2006). In Brassicaceae, the flavonoids that determine seed color are derivatives of kaempferol and quercetin, which are responsible for brown and yellow colors, respectively. F3ʹH catalyzes the conversion of dihydroquercetin (a precursor of quercetin and pelargonidin) to dihydrokaempferol (a precursor of the kaempferol and cyanidin). The NIL-KK and NIL-AK produced dark brown seeds, while NIL-KA and NIL-AA produced yellow seeds. Taken together, these suggested that BrMyb2 mainly regulates the flavonoid synthetic pathway in aboveground part of tubers and petioles, whereas BrF3ʹH functions both in tubers and seeds.

Discussion

The identification of natural genetic variations associated with agronomically important traits is vital not only for accelerating breeding but also for understanding the molecular mechanisms underlying these traits. Here, we demonstrated that Sat-BSA is an efficient NGS-based BSA technique for identifying SVs responsible for phenotypic differences in cultivars. The reconstruction of local genomic structures based on local de novo assembly of long reads is an essence of Sat-BSA free from a reference genome that necessarily possesses unique structural variations. This enables detection of characteristic SVs within multiple cultivars that are not efficiently detected by a single reference-based BSAs (Takagi et al. 2013).

Local de novo assembly can be executable with low coverage of long reads. Because B. rapa underwent whole genome triplication (Cheng et al. 2013), homologous and repetitive regions make whole genome-scale de novo assembly difficult (Sohn and Nam 2018). In this study, we only needed long reads that provided less than 20× coverage of the B. rapa genome for de novo assembly to accurately reconstruct the genomic regions harboring AKA-BrMyb2 and KAN-BrF3ʹH. The sequence of KAN-BrF3ʹH obtained by local de novo assembly was the same as the Bra009312 sequence in “BrapaV3.0PacBio.Chr.fa” genome (Guo et al. 2014). Whereas, AKA-BrMyb2 has a 3,606 bp deletion within the first intron compared to the sequence of BrMyb2 predicted in “BrapaV3.0PacBio.Chr.fa”. Additionally, the contig (tig00000001) containing AKA-BrMyb2 has another BrMyb2 homolog, which we named as “STRG.48” that shares 99.9% homology with AKA-BrMyb2. AKA-BrMyb2 and STRG.48 are located 100.4 kb apart, suggesting that they are duplicates that occurred in ‘AKA’ genome (Supplemental Fig. 11). However, their functional relationship remains to be elucidated. This duplicated region of AKA-BrMyb2 is structurally different from the “BrapaV3.0PacBio.Chr.fa” reference sequence. It should be noted that whole genome de novo assembly using the ONT reads failed to reconstruct genomic region containing AKA-BrMyb2 and its duplicate, STRG.48. Therefore, local de novo assembly of selected long reads in Sat-BSA is a useful means to accurately reconstruct local genomic sequence by reducing genomic complexity and calculation loads. Recently, Voichek and Weigel (2020) reported a technique for genome-wide associate mapping/study (GWAS) targeting SVs in genomes. The authors used short reads-guided de novo assembly to link SVs to phenotypic variations, which is conceptually similar to that of the local de novo assembly to identify the causal SVs missing in the reference genome that we demonstrate herein. As expected, size of their assembled contigs is shorter (<2 kb) than the contig sizes described in this study (max size 920 kb in the contig for KAN-BrF3ʹH) due to the short reads lacking SV flanking regions. Thus, local de novo assembly of long reads enable to reconstruct relatively large genomic SVs that are, in principle, difficult to detect by short reads only. This feature of Sat-BSA is particularly suitable for analyzing large genomes that are rich in repetitive sequences.

Depth comparison of the aligned ONT reads was effective for detecting SVs in Sat-BSA. Although other genetic variations such as large InDels, recombination, rearrangements and copy number variations that are also common genetic features in plant genomes would contribute to trait diversity in crops including B. rapa (Nishida et al. 2013, Qian et al. 2016, Sun et al. 2018, Wang et al. 2016), most of NGS-based BSAs are based on SNPs for identification of candidate genes due to use of short reads. Sat-BSA provides an opportunity to solve genetic basis of phenotypic traits caused by large SVs among landraces and cultivars, and is complementary to classical SNPs-based BSAs using short reads. It is therefore important to consider SVs when analyzing segregating progeny obtained from crosses between diverse cultivars/landraces for identifying the genetic changes underlying traits of interest. In this study, we applied Sat-BSA to successfully identify the large deletion in KAN-Brmyb2 and 4,375 bp insertion in AKA-Brf3ʹh that caused loss of gene function. The depth comparison of long reads in Sat-BSA is an essential process for identifying SVs, however it is still difficult to capture SNPs and short InDels that are less than 5 bp by using long reads alone due to relative low quality natures of the long reads of the present Oxford Nanopore Technologies or PacBio compared to that of Illumina short reads. However, sequence quality of long reads in the transcribed regions can be improved by alignment of high-quality Illumina short reads obtained from RNA-seq (also known as polishing process), allowing the accurate detection of SNPs and short InDels.

In multiple analyses of the association between phenotypes, SVs and gene expression patterns in cultivars, clarifying genetic interaction and mode-of-inheritance of traits in each cultivar is prerequisite to decide the proper cultivars for comparison. Based on segregation ratio of the two F₂ progenies used in this study (‘AKA’ × ‘KAN’ and ‘AKA’ × ‘HIN’), we were able to identify the cultivars carrying functional alleles of the genes responsible for both tuber phenotypes. Genetic analysis is important to distinguish cases whether similar phenotype is conferred by different genetic basis or not. For example, two cultivars could both produce white tubers due to two independent mutations in different genes inhibiting pigment synthesis. The analysis of association in Sat-BSA is not applicable to such phenotypes. That is why genetic analyses based on segregating progeny obtained from multiple combination of cultivars is important for Sat-BSA. In the case that the same gene causes different phenotypes or results in varying trait values in different cultivar combinations, Sat-BSA can be applied for the study of both qualitative and quantitative traits.

In B. rapa, the isolation of genes controlling tuber pigmentation is important for breeding cultivars with the desired phenotype via MAS and for understanding the molecular mechanisms underlying tuber color development. In both F₂ progenies we studied, anthocyanin phenotypes in pigmented tubers were further divided into those whose entire tubers were pigmented and others that accumulated anthocyanins in the upper or aboveground tuber parts only (Supplemental Fig. 12). This phenotypic variation in tubers suggests that additional genes are required for anthocyanin accumulation in B. rapa tubers dependent on light condition. Light-independent anthocyanin accumulation might be attractive trait for breeding crops with the desired tuber color phenotype even when grown under low light conditions. Further application of Sat-BSA is the analysis of bulk DNA samples pooled from individuals with fully pigmented tubers would lead to the identification of causal gene responsible for light-independent anthocyanin accumulation in ‘AKA’.

Here we demonstrated application of Sat-BSA in a non-model, but a commercially valuable crop, B. rapa. Approximately 40% of flowering plant species including B. rapa shows self-incompatibly, which makes the development of mutant lines for large-scale genetic screens very difficult (Fujii et al. 2016). Consequently, reliable forward genetic approaches applicable to segregating progenies from self-incompatible crops have significant importance for crop improvement. We envisage Sat-BSA to be an efficient forward genetics approach, opening novel opportunities 1) for identifying novel genes underlying phenotypic variations among cultivars, landraces and ecotypes, and 2) for MAS-based breeding of various commercially valuable crops in the fluctuating global climates and markets.

Author Contribution Statement

TS performed Sat-BSA, next generation sequencing and biological analysis, CN performed next generation sequencing, MT conceived and supervised the entire study, YS developed Sat-BSA pipeline, AA performed next generation sequencing, HS observed the segregation for tuber color in the progeny developed between cultivars, NI developed Sat-BSA pipeline, MA developed the progeny between cultivars, AU performed next generation sequencing, KO performed next generation sequencing, HU performed next generation sequencing, AI-K performed HPLC analysis, TI developed transgenic lines, MM developed transgenic lines, RT conceived and supervised the entire study, HT designed the research, supervised the study and wrote the paper. All authors contributed to the manuscript writing.

Acknowledgments

This work was supported by the JSPS KAKENHI [grant number 15H05611] to HT. Computations were partially performed on the NIG supercomputer at ROIS National Institute of Genetics (Mishima, Shizuoka, Japan). Local cultivars/landraces of Brassica rapa used in this study were obtained from the National Agriculture and Food Research Organization (NARO) Genebank and Matsushita seed.

Literature Cited

責任著者(Corresponding author)

J-STAGEへの登録はこちら（無料）