Breeding Science
Online ISSN : 1347-3735
Print ISSN : 1344-7610
ISSN-L : 1344-7610
Reviews
Comparative genomics of Brassicaceae crops
Ashutosh Sharma, Xiaonan Li, Yong Pyo Lim

2014 Volume 64 Issue 1 Pages 3-13

Abstract

The family Brassicaceae is one of the major groups of the plant kingdom and comprises diverse species of great economic, agronomic and scientific importance, including the model plant Arabidopsis. The sequencing of the Arabidopsis genome has revolutionized our knowledge in the field of plant biology and provides a foundation in genomics and comparative biology. Genomic resources have been utilized in Brassica for diversity analyses, construction of genetic maps and identification of agronomic traits. In Brassicaceae, comparative sequence analysis across the species has been utilized to understand genome structure, evolution and the detection of conserved genomic segments. In this review, we focus on the progress made in genetic resource development, genome sequencing and comparative mapping in Brassica and related species. The utilization of genomic resources and next-generation sequencing approaches in improvement of Brassica crops is also discussed.

Introduction

Comparative genomics is the study of similarities and differences at the genomic level to make inferences about the functions and evolution of various biological processes. This is an important field to study genome evolution, sequence collinearity and transfer of information from extensively studied model organisms to species of commercial interest. Genome sequencing of Arabidopsis, a member of the Brassicaceae family, has revolutionized our knowledge in every field of plant biology and laid a foundation for genomics and comparative biology.

The family Brassicaceae is one of the major groups of the plant kingdom, comprising 340–360 genera and over 3,700 species distributed worldwide (Warwick et al. 2006). Many species within the family are of great economic, agronomic and scientific importance. Examples include Brassica napus and B. juncea (oilseed crops); B. rapa (turnip, leaf vegetable); B. oleracea (cabbage, cauliflower, kale, broccoli); Raphanus sativus (vegetable) and Arabidopsis thaliana (model plant). The six most widely cultivated species of the genus Brassica comprise three diploids, B. rapa (AA, 2n = 20), B. nigra (BB, 2n = 16) and B. oleracea (CC, 2n = 18), together with three amphidiploids, B. juncea (AABB, 2n = 36), B. napus (AACC, 2n = 38) and B. carinata (BBCC, 2n = 34). Cytogenetic and hybridization studies have demonstrated that the amphidiploid species are natural hybrids of the diploids and that the six Brassica species are interlinked (Fig. 1, U 1935). Genome evolution and comparative sequence analysis of Brassicaceae have also confirmed the interrelationship of the six Brassica species at the molecular level (Schmidt and Bancroft 2011). In Brassicaceae, genomic studies are mainly focused on cultivated Brassica and their diploid progenitor species, which are compared with the Arabidopsis genome. With the inception of the Multinational Brassica Genome Project (MBGP) in 2002, the international Brassica community agreed to develop more resources for Brassica crops and genome sequencing. In the last decade, significant advances have been made in the generation of genomic resources and in translational research, which will aid Brassica crop improvement (Augustine et al. 2013, Schmidt and Bancroft 2011). Recently, sequencing of B. rapa and ancestral diploids of Brassica has expanded comparative genomic studies, providing resources for the identification of candidate genes for agronomic traits. Comparative mapping of Brassica species with the Arabidopsis genome helps in understanding conserved genetic architecture and genome evolution, and in the identification and functional analysis of genes for important agronomic traits. Genome-wide synteny analyses between the Arabidopsis and Brassica A, B, and C genomes have identified conserved chromosomal blocks and elucidated genome rearrangements and karyotype diversification.
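The chromosome numbers quoted above are consistent with the amphidiploids being hybrids of the diploids: each amphidiploid 2n equals the sum of the 2n values of its two parental genomes. The following is a minimal Python sketch of that arithmetic check, using only the numbers given in the text; the function and variable names are illustrative assumptions, not part of any published tool.

```python
# Diploid genomes and their somatic chromosome numbers (2n), as given in the text:
# A = B. rapa, B = B. nigra, C = B. oleracea.
DIPLOIDS = {"A": 20, "B": 16, "C": 18}

# Amphidiploid species: (parental genomes, 2n reported in the text).
AMPHIDIPLOIDS = {
    "B. juncea": ("AB", 36),
    "B. napus": ("AC", 38),
    "B. carinata": ("BC", 34),
}


def expected_2n(genomes: str) -> int:
    """Sum the 2n values of the parental diploid genomes."""
    return sum(DIPLOIDS[g] for g in genomes)


def check_triangle() -> dict:
    """For each amphidiploid, report whether its 2n equals the parental sum."""
    return {
        species: expected_2n(genomes) == reported
        for species, (genomes, reported) in AMPHIDIPLOIDS.items()
    }


if __name__ == "__main__":
    for species, ok in check_triangle().items():
        print(species, "consistent" if ok else "inconsistent")
```

All three species pass the check, which is the numerical pattern summarized by the Triangle of U.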

Fig. 1

Genomic relationships among six cultivated Brassica species represented by the ‘Triangle of U’. Adapted from U (1935).

Next-generation sequencing (NGS) techniques have been utilized to develop cost-effective and efficient methods for single nucleotide polymorphism (SNP) discovery, genotyping and gene expression studies. In some Brassica species, these techniques have been used for the identification of SNP markers and the construction of linkage maps. Transcriptome analysis has also been used to find different gene expression profiles in response to abiotic and biotic stress and in understanding gene regulatory mechanisms. In this review, we emphasize the advancement of resource development in Brassicaceae, comparative mapping and the recent progress made in sequencing Brassica and related species. We focus on the genomics and genetic improvements made in six cultivated crops of Brassica and Raphanus.

Linkage maps are a prerequisite of comparative mapping

Linkage maps are valuable sources for identification and map-based cloning of important genes, analyses of QTLs for agronomic traits, and comparative mapping. Linkage maps have been developed in almost all the major crops with the advancement of DNA markers, such as RFLP (Restriction Fragment Length Polymorphism), AFLP (Amplified Fragment Length Polymorphism) and sequence-based markers like SSR (Simple Sequence Repeats) and SNPs. Comparative studies of linkage maps between species are useful in predicting diversity, genome evolution and organization. A number of genetic linkage maps have been generated in Brassica, utilizing different sets of markers and mapping populations. Linkage maps provide a basis for genetic architecture analysis of the genome and sequence-based marker data facilitate comparative mapping and the study of genomic relations among species.

In Brassica rapa, more than ten linkage maps have been developed during the last decade, mainly based on molecular markers (RFLP, RAPD, AFLP, and SSR) and using different mapping populations (Kim et al. 2006, Lou et al. 2008, Sakamoto et al. 2008, Song et al. 1991, Suwabe et al. 2004, 2006, Wang et al. 2004). The differing marker sets and populations made these maps difficult to compare with each other without a common reference map. Choi et al. (2007) constructed the first reference genetic map for B. rapa using doubled haploid lines derived from a cross between two diverse Chinese cabbage (B. rapa ssp. pekinensis) inbred lines, “Chiifu-401-42” and “Kenshin-402-43”. The reference linkage map was updated with the addition of 156 BAC-end SSR markers (Kim et al. 2009) and was subsequently used for high-density integrated map construction (Li et al. 2010). Recently, the genome of the B. rapa inbred line Chiifu-401-42 has been completely sequenced under the Brassica rapa Genome Consortium (Wang, X. et al. 2011) and the reference linkage map has facilitated assignment of sequence scaffolds to the chromosomes.

Brassica oleracea, representing the C genome of Brassica, comprises various vegetables, one of which, cabbage (B. oleracea var. capitata), has been considered for the genome sequencing project. In B. oleracea, more than ten linkage maps have been developed using RFLP, AFLP or SSR markers in different mapping populations (Iniguez-Luy et al. 2009, Okazaki et al. 2007, Schmidt and Bancroft 2011). Integrated maps in B. oleracea have also been constructed with RFLP and AFLP markers by Kianian and Quiros (1992) and Sebastian et al. (2002). Available expressed sequence tag (EST) sequences of Arabidopsis and Raphanus have also been explored to construct several other maps and have allowed comparison of the B. oleracea genome with the Arabidopsis genome (Ashutosh et al. 2012, Babula et al. 2003, Kifuji et al. 2013, Kowalski et al. 1994, Lan et al. 2000). A high-density linkage map using Sequence-Related Amplified Polymorphism (SRAP) markers was developed in B. oleracea and identified QTLs of curd formation in cauliflower (Gao et al. 2007). In Brassica, 56,465 non-redundant SSR markers identified from B. oleracea whole-genome shotgun sequences were preferentially located on the C genome, and of these 752 markers showed polymorphism among six B. napus varieties (Li, H. et al. 2011). As the B. oleracea genome sequencing project was launched, a high-density reference map was drafted including 602 SSRs and 625 SNP markers generated from whole-genome shotgun sequences by NGS, covering 1,197.9 cM (Wang, W. et al. 2012). This is also the first map that has allowed the assembled scaffolds to be anchored to pseudochromosomes, which has significantly contributed to Brassica genome studies.

Brassica nigra (BB), one of the diploid Brassica, has not been studied extensively at the genomic level relative to other Brassica species despite a rich source of agronomically important genes in terms of disease resistance, drought tolerance and seed oil quality (Chevre et al. 1996, Sjödin and Glimelius 1989, Struss et al. 1996). In B. nigra, the first linkage map was developed by Truco and Quiros (1994) using isozymes, RFLP and RAPD markers. In total, 124 markers were assigned to 11 linkage groups covering a total distance of 677 cM. A comprehensive linkage map of B. nigra was constructed with 160 DNA probes from Arabidopsis and identified 284 homologous loci covering a 750 cM region (Lagercrantz 1998).

Brassica napus (AACC), a major oilseed crop, is one of the most intensively studied Brassica species in terms of genetics and genomics. More than 30 linkage maps have been developed using various types of mapping populations and different molecular markers for different agronomic traits. The linkage maps in B. napus were developed using AFLP (Mei et al. 2009, Qiu et al. 2006, Radoev et al. 2008), RFLP (Parkin et al. 1995, Uzunova et al. 1995), SRAP (Sun et al. 2007), and SSR (Piquemal et al. 2005, Wang, J. et al. 2011) markers for developmental traits, seed quality and disease resistance. These maps have provided valuable information for rapeseed improvement and for genome structure analysis. Recently, a high-density SNP linkage map, consisting of 5,764 SNP and 1,603 PCR markers, was developed by integrating four DH populations to detect polymorphism level and linkage disequilibrium across different collections (Delourme et al. 2013).

Brassica juncea (AABB), one of the six cultivated Brassica, is the major oilseed crop of India. Relative to B. napus, genetic and genomic studies in B. juncea have been done less intensively, but in recent years the international community has given more attention to B. juncea because of its resistance to salinity and seed shattering. Early linkage maps in B. juncea were developed with RFLP and AFLP markers to investigate various traits (Axelsson et al. 2000, Christianson et al. 2006, Pradhan et al. 2003). A high-density linkage map in B. juncea was developed using AFLP, RFLP, SSR and gene-based markers with a total of 1,148 loci covering 1,840 cM of 18 linkage groups (Ramchiary et al. 2007). Although these linkage maps were useful for breeding and tagging of important traits, they provided limited information for comparative mapping.

Recent work on B. carinata (BBCC), one of the six cultivated Brassica species and one of the three amphidiploids, suggests it has better adaptability and productivity in semi-arid and temperate areas compared to oilseed rape. Being resistant to various diseases and biotic stress, B. carinata is suitable to cultivate in temperate environments (Getinet et al. 1996) and is also a potential crop in biofuel production (Cardone et al. 2003). Although genetic diversity analysis of this species has been carried out, limited work has been done on genomic studies. Recently, a linkage map of B. carinata has been constructed and 212 loci were assigned to 17 linkage groups covering a region of 1,703 cM (Guo et al. 2012).

Raphanus sativus (radish), a member of Brassicaceae, is used all over the world as a vegetable crop with an edible taproot. Although Raphanus is an economically important crop, genetic and genomic research has not progressed as far as in B. rapa and B. napus. A number of genetic maps have been developed in R. sativus using RFLP, AFLP, SSR and EST-SNP markers to analyze QTLs for disease resistance, root shape, flowering time, and pigmentation (Bett and Lydiate 2003, Budahn et al. 2009, Hashida et al. 2013, Kamei et al. 2010, Tsuro et al. 2005, Yu, X. et al. 2013, Zou, Z. et al. 2013). EST-based SNP and SSR markers were utilized to construct dense linkage maps, and alignment of marker sequences to known Brassica sequences identified extensive chromosome homoeology among Brassicaceae (Li, F. et al. 2011, Shirasawa et al. 2011). The Brassica SSR and BAC-end sequence markers have also been explored in R. sativus for identification of QTLs for Fusarium wilt resistance (Yu, X. et al. 2013).

Comparative mapping for identification of conserved genomic segments

Arabidopsis, a member of Brassicaceae, is closely related to Brassica at the genomic sequence level, and shows around 85–90% identity in the exonic regions (Schmidt 2002). The fact that molecular markers of a Brassica species are transferable to other Brassica species helps comparative mapping studies between cultivated Brassica species and with A. thaliana, as well as other Brassicaceae crops. On the basis of comparative mapping studies between A. thaliana and the ancestral karyotypes, 24 crucifer genomic blocks (A–X) have been proposed by Schranz et al. (2006), which are now widely accepted by the scientific community. By comparative genetic mapping between Arabidopsis and Brassica species, the presence of segmental duplications and genome rearrangements of Brassica A, B and C genomes was proposed and confirmed at the micro or macro level (Navabi et al. 2013, O’Neill and Bancroft 2000, Parkin et al. 2005). Lukens et al. (2003) compared mapped RFLP probe sequences of B. oleracea with the Arabidopsis genome sequence and identified 34 genomic collinear regions. In B. oleracea, cDNA or BAC sequence comparisons with Arabidopsis and B. rapa identified conserved collinearity for gene order and content of specific chromosomal segments (Li et al. 2003, Qiu et al. 2009). In B. napus, by sequencing mapped RFLP probes and comparing these with the Arabidopsis genome, Parkin et al. (2005) identified 21 genomic blocks linked to the A and C genomes. Most of these conserved segments were found in six copies, which confirms the proposed hexaploid ancestor for the diploid Brassica progenitors. These genomic segments could be duplicated and rearranged in the present-day B. napus genome. Panjabi et al. (2008) extended comparative work in B. juncea and used Arabidopsis-based polymorphic intron PCR markers to identify conserved chromosomal regions and evolutionary relationships of the A, B and C genomes of Brassica. BAC- and SSR-based linkage maps of B. rapa (Choi et al. 2007, Kim et al. 2009) were adopted as references to anchor sequence contigs of the international B. rapa genome sequence project and facilitated identification of conserved genomic blocks between Arabidopsis and B. rapa. The genome sequence of B. rapa provides an important resource for comparative mapping (BrGSP, Wang, X. et al. 2011). A conserved genomic restructuring in B. napus was confirmed by comparative mapping of dense linkage maps based on SSR and SNP markers with Arabidopsis and B. rapa (Bancroft et al. 2011, Wang, J. et al. 2011). A linkage map was developed in B. carinata and comparative mapping with the Arabidopsis sequence identified conserved ancestral building blocks (Guo et al. 2012). Recently, B. nigra BAC libraries have been sequenced and compared with Arabidopsis chromosome 4 and homologous Brassica A and C genomes, identifying conserved collinearity for gene content and order (Navabi et al. 2013). Extensive chromosome sequence homoeology was also revealed in Raphanus by comparing an EST-SNP-based linkage map (Li, F. et al. 2011) and an EST-SSR map with sequences of Arabidopsis and B. rapa genomes (Shirasawa et al. 2011). Very recently, by comparing the whole-genome sequence of B. rapa with genome sequences or genetic maps of other crucifer species, the conserved genomic block boundaries were re-defined for seven ancestral karyotype blocks, deciphering the diploid ancestral genome of mesohexaploid B. rapa (Cheng et al. 2013).
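To make the idea of a conserved collinear segment concrete, the following Python sketch groups markers, listed in their order along one Brassica linkage group, into runs that hit the same Arabidopsis chromosome with positions changing in a single direction. It is a toy illustration only: the marker names and positions are invented, the `min_size` threshold is an arbitrary assumption, and the published studies cited above used their own, more elaborate criteria.

```python
from typing import List, Tuple

# (marker name, Arabidopsis chromosome, Arabidopsis position in Mb).
# The list order is the marker order along one Brassica linkage group.
Marker = Tuple[str, int, float]


def collinear_blocks(markers: List[Marker], min_size: int = 3) -> List[List[str]]:
    """Return runs of consecutive markers that map to the same Arabidopsis
    chromosome with strictly increasing or strictly decreasing positions."""
    blocks: List[List[str]] = []
    current: List[Marker] = []
    direction = 0  # 0 = not yet known, +1 = increasing, -1 = decreasing

    for m in markers:
        if current:
            prev = current[-1]
            step = (m[2] > prev[2]) - (m[2] < prev[2])  # +1, -1 or 0
            if m[1] == prev[1] and step != 0 and direction in (0, step):
                direction = step
                current.append(m)
                continue
            # The run is broken: keep it if it is long enough, then restart.
            if len(current) >= min_size:
                blocks.append([x[0] for x in current])
            current, direction = [], 0
        current.append(m)

    if len(current) >= min_size:
        blocks.append([x[0] for x in current])
    return blocks


if __name__ == "__main__":
    # Hypothetical data: one forward block on chromosome 1 and one inverted
    # block on chromosome 4, followed by a single unlinked marker.
    example = [
        ("m1", 1, 1.0), ("m2", 1, 1.5), ("m3", 1, 2.2),
        ("m4", 4, 9.0), ("m5", 4, 8.1), ("m6", 4, 7.5),
        ("m7", 2, 3.0),
    ]
    print(collinear_blocks(example))
```

Accepting decreasing as well as increasing runs lets the sketch report inverted segments, which matter because the text describes rearrangements as well as conserved order.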

Functional genomic regions and candidate genes have also been identified by a comparative mapping approach in Brassicas (Schmidt and Bancroft 2011). Mapping and identification of candidate genes in B. rapa by comparative genomic study have been reported: for example, cloning of flowering time FLC genes (Schranz et al. 2002), clubroot resistance genes syntenic to the Arabidopsis chromosome (Suwabe et al. 2006), and mapping QTLs for flowering time (Li et al. 2009). QTLs for clubroot resistance were identified in B. oleracea and comparative analysis of resistance genes was performed between B. rapa and B. oleracea (Nagaoka et al. 2010). In B. juncea, comparative mapping of QTL regions for aliphatic glucosinolate with the corresponding Arabidopsis sequence identified candidate genes regulating the aliphatic glucosinolate biosynthetic pathway (Bisht et al. 2009). In B. oleracea, candidate genes for male fertility were identified by comparing sequence-tagged markers with genome sequences of Arabidopsis and B. rapa (Ashutosh et al. 2012). As the B. napus genome sequence is not available, sequences of the diploid progenitors B. rapa and B. oleracea were utilized in comparative mapping with Arabidopsis to identify candidate genes of QTLs for seed weight in B. napus (Cai et al. 2012). Li et al. (2013) have identified five major functional conserved genomic regions containing QTLs for morphological and yield traits between the A, B and C subgenomes of B. rapa, B. juncea and B. napus. The knowledge gained from comparative analysis has revealed high-level sequence collinearity across Brassicaceae and helps in understanding genome evolution and polyploidization. Comparative genomic studies give confidence in identifying orthologous candidate genes for important agronomic traits in Brassica crops and help in generating an integrated linkage map of species.

Next-generation genotyping techniques in Brassica

Recent advances in NGS technology have facilitated the development of various approaches for simultaneous sequence variant analysis and genotyping. Selected genomic regions or targeted restriction fragments of pooled individuals can be sequenced in a single reaction on a massively parallel sequencing platform. The sequences are aligned to the reference genome to compare assembled individual sequences and to identify variant sites, from which SNPs are discovered. These approaches are cost-effective and highly efficient in generating large amounts of informative data. Different protocols are available, such as complexity reduction of polymorphic sequences (CRoPS, Van Orsouw et al. 2007), restriction-associated DNA sequencing (RADseq, Baird et al. 2008), genotyping by sequencing (GBS, Elshire et al. 2011), and diversity arrays technology (DArT, Jaccoud et al. 2001). Each of these protocols has its advantages and limitations, but all are reliable for SNP discovery and genotyping. These protocols are compared in several reviews (Davey et al. 2011, Nielsen et al. 2011). These genotyping methods have been explored only in a few Brassica species, although transcriptome sequencing techniques have been used for SNP discovery in B. napus and B. rapa (Hu et al. 2012, Trick et al. 2009a, 2009b).
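The core step described above, comparing aligned individual sequences to a reference and reporting variant sites, can be sketched in a few lines of Python. This is a deliberately simplified illustration that assumes the sequences are already aligned to the same length as the reference; real pipelines also handle read mapping, base quality and coverage, none of which is modelled here. The function name, sample names and sequences are hypothetical.

```python
from typing import Dict, List, Tuple

SnpCall = Tuple[int, str, Dict[str, str]]


def call_snps(reference: str, aligned: Dict[str, str]) -> List[SnpCall]:
    """Return (1-based position, reference base, {sample: base}) for every
    site where at least one sample differs from the reference.
    Gap ('-') and 'N' characters are treated as missing data and skipped."""
    for name, seq in aligned.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference length")

    snps: List[SnpCall] = []
    for i, ref_base in enumerate(reference):
        # Collect the informative base calls at this site.
        calls = {
            name: seq[i] for name, seq in aligned.items() if seq[i] not in "-N"
        }
        if any(base != ref_base for base in calls.values()):
            snps.append((i + 1, ref_base, calls))
    return snps


if __name__ == "__main__":
    reference = "ACGTACGT"
    samples = {
        "line1": "ACGTACGT",
        "line2": "ACCTACGA",
        "line3": "ACGTNCGA",
    }
    for position, ref_base, calls in call_snps(reference, samples):
        print(position, ref_base, calls)
```

In this toy data set the variant sites are positions 3 and 8; the `N` in `line3` at position 5 is ignored rather than counted as a variant.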

The RADseq technique used in B. napus identified more than 20,000 SNPs and simultaneously genotyped eight different inbred lines (Bus et al. 2012). This method is simple, cost-effective, efficient and an alternative to transcriptome sequencing in SNP genotyping. DArT markers developed in Brassica and related species have been used in molecular diversity analysis of 89 different accessions of B. napus, B. rapa, B. juncea and B. carinata (Raman et al. 2012). Recently, a consensus linkage map based on DArT markers has been developed in B. napus, consisting of 1,359 markers spanning all 19 chromosomes covering a total of 1,987.2 cM with an average map density of one marker per 1.46 cM (Raman et al. 2013). Most of the DArT markers sequenced and aligned with B. rapa and B. oleracea genomes are useful in comparative mapping and genome evolution studies. Wells et al. (2013) developed a methodology based on pooled PCR product sequencing that incorporates bar-coded amplification tags (BATs) into PCR products. Using this method, targeted gene sequences were screened in a B. napus population and the resulting allele scoring mapped 24 markers on the expected position of the B. napus linkage map. In summary, next-generation high-throughput genotyping techniques are capable of providing increased marker density for genome selection or genome-wide association studies. Furthermore, next-generation genotyping methods need to be explored in Brassica and Raphanus to generate high-density linkage maps.
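The average map density quoted for the DArT consensus map follows directly from dividing map length by marker number. The short Python sketch below reproduces that figure and applies the same calculation to three other maps whose length and marker counts are quoted in this review; the function name and the dictionary layout are illustrative assumptions.

```python
# Map length (cM) and number of markers or loci, as quoted in this review.
MAPS = {
    "B. napus DArT consensus (Raman et al. 2013)": (1987.2, 1359),
    "B. nigra (Truco and Quiros 1994)": (677.0, 124),
    "B. juncea (Ramchiary et al. 2007)": (1840.0, 1148),
    "B. carinata (Guo et al. 2012)": (1703.0, 212),
}


def average_spacing(length_cm: float, n_markers: int) -> float:
    """Average map length per marker in cM (map length / marker count)."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return length_cm / n_markers


if __name__ == "__main__":
    for name, (length_cm, n_markers) in MAPS.items():
        spacing = average_spacing(length_cm, n_markers)
        print(f"{name}: {spacing:.2f} cM per marker")
```

For the DArT consensus map this gives 1,987.2 / 1,359 ≈ 1.46 cM per marker, matching the value reported by Raman et al. (2013).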

EST sequences and transcriptomes

In Brassicaceae, a large number of EST sequences have been generated from all major Brassica crops and related species. In the NCBI database, approximately 1,500,000 EST sequences are available from various tissues exposed to different stress/growth conditions of ten different species of Brassicaceae (excluding Arabidopsis). Love et al. (2005) initiated development of a Brassica microarray by assembling EST sequences. A Brassica 95K EST microarray was developed by clustering and assembling 810,254 Brassica raw ESTs, available in 2007, into non-redundant unigenes (Trick et al. 2009a). These unigenes have been loaded on a web portal (http://brassica.bbsrc.ac.uk) and are a valuable source for comparative mapping and genome analysis. Unigene sets are usable as pseudo-reference sequences for re-sequencing projects by next-generation techniques, for assembling transcriptomes (Trick et al. 2009b) and for SNP chips (Hayward et al. 2012).

In radish, rich EST sequences are available in the public domain and are being used for generation of linkage maps and genetic studies. In total, 3,800 EST-SSR markers were developed from 26,606 ESTs derived from different tissues of R. sativus and a linkage map was constructed using 630 EST-SSR and 213 reported marker loci covering a 1,129.3 cM region (Shirasawa et al. 2011). Available radish ESTs in databases (Radish DB; http://radish.plantbiology.msu.edu) were explored to discover SNPs and to construct an R. sativus linkage map consisting of 726 markers, and 72 syntenic regions to Arabidopsis were identified (Li, F. et al. 2011). RNA-seq-based transcriptome profiling of Raphanus root was utilized for identification of genes in response to metal Pb stress and 22 genes have been validated by quantitative real-time PCR (Wang et al. 2013). In the latest updated information, a total of 311,799 high-quality EST sequences were generated from raw EST data and further assembled in 85,083 unigenes (RadishBase, bioinfo.bti.cornell.edu/radish) (Shen et al. 2013).

In recent years, with the advancement of NGS technology, it has become possible to economically re-sequence whole genomes or generate large amounts of transcriptome data in a short time. These sequences have been utilized for variant analysis to develop genic and functional markers. NGS techniques have been utilized to generate transcriptome sequences in polyploid B. napus and to discover SNPs (Hu et al. 2012, Trick et al. 2009b). Furthermore, the B. napus genome was dissected using transcriptome sequences of parental and mapping population leaf samples, and an SNP linkage map of about 23,000 markers was constructed (Bancroft et al. 2011). Sequence comparison of the B. napus genome with its progenitors B. rapa and B. oleracea revealed genome rearrangements and traced the inheritance of genomic segments. Transcriptome profiling based on deep EST sequencing in B. napus and three other oilseed species revealed both conserved and distinct species-specific expression patterns for genes involved in the synthesis of glycerolipids and their precursors (Troncoso-Ponce et al. 2011). Higgins et al. (2012) employed an NGS-based RNA-seq technique to discriminate A and C genome transcriptomes in amphidiploid B. napus and measured the contribution of each genome to gene expression. The associative transcriptomics approach has been explored in B. napus to identify genomic deletions in QTL regions for glucosinolate content of seeds (Harper et al. 2012). A different gene expression pattern in response to waterlogging has also been identified in B. napus roots at the seedling stage (Zou, X. et al. 2013).

In Brassica rapa, abiotic stress transcriptome studies identified 56 transcription factors and 60 genes commonly expressed under various stresses (Lee et al. 2008). The gene expression pattern in different tissues of B. rapa analysed by RNA-seq revealed transcriptome complexity (Tong et al. 2013). Recently, in B. juncea (Tumourous stem mustard), transcription level analysis was performed to detect gene expression patterns at various stem development stages (Sun et al. 2012). In brief, the advancement of transcriptomic studies helps to understand the complexity of gene expression and regulation networks at various developmental stages and the response to biotic/abiotic stresses; in addition, high-resolution genome dissection provides resources for comparative and functional genomics.

Genome sequencing of the Brassicaceae family

Recent advances in high-throughput sequencing technology have immensely benefitted whole-genome sequencing projects in non-model organisms and have opened a new era in comparative genomics. In the Brassicaceae family, to date, the genomes of ten species have been partially or completely sequenced, including the model plant A. thaliana and cultivated Brassica species, e.g., B. rapa and B. oleracea, summarized in Table 1. The annotated Arabidopsis genome sequence provides a valuable reference, and genome sequences have also been utilized to develop DNA markers and a number of informative linkage maps in cultivated Brassica species, which have been used to identify candidate genes. Most of the ancestral progenitor sequences have been used for genome evolution studies and identification of conserved ancestral genomic segments. The sequencing project of 1,001 accessions of A. thaliana will enable the study of genome-wide association in this species (http://1001genomes.org). These sequences will provide a link of phenotypic diversity with genome variation and generate large resources for the plant community.

Table 1 Summary of genome sequences completed in Brassicaceae species

| Species | Genome size (a) (Mb) | Number of predicted genes | % of genes orthologous to A. thaliana | References |
|---|---|---|---|---|
| Arabidopsis thaliana | 135–157 | 28,710 | 100 | Arabidopsis Genome Initiative 2000 |
| Arabidopsis lyrata | 230–245 | 27,379 | 92 | Hu et al. 2011 |
| Schrenkiella parvula | 140 | 28,901 | 80.2 | Dassanayake et al. 2011 |
| Brassica rapa | 529 | 41,174 | 78.2 | Wang, X. et al. 2011 |
| Capsella rubella | 210–216 | 26,521 | 88 | Slotte et al. 2013 |
| Eutrema salsugineum | 314 | 26,521 | 82.7 | Yang et al. 2013 |
| Leavenworthia alabamica | 316 | 30,343 | 67.7 | Haudry et al. 2013 |
| Sisymbrium irio | 262 | 28,917 | 82.9 | Haudry et al. 2013 |
| Aethionema arabicum | 240 | 23,167 | 72.4 | Haudry et al. 2013 |
| Brassica oleracea | 696 | 45,758 | | Yu et al. 2013 |

(a) Genome size reference from Johnston et al. (2005) or adapted from Haudry et al. (2013).

Sequencing of Brassica genomes is required for finding important genes, understanding genome evolution and improving crops. Considering this, and the importance of Brassica crops, sequencing of the Brassica genomes was initiated in 2002 by the Multinational Brassica Genome Project (MBGP). The B. rapa Chinese cabbage (cv. Chiifu-401) was the first genome selected for sequencing because of its small genome size (529 Mb) and low frequency of repetitive sequences. The draft genome sequence of B. rapa (A genome) has been published by the Brassica rapa Genome Sequencing Project (BrGSP) (Wang, X. et al. 2011). A total of 41,174 protein-coding genes were modeled on the B. rapa genome, and 10 pseudochromosomes were produced by anchoring the assembly with 1,427 markers. The genome sequence of another important vegetable crop, B. oleracea (C genome), has recently been completed using a whole-genome shotgun (WGS) sequencing strategy. A 630 Mb assembled draft genome sequence was obtained, with a scaffold N50 size of 1.457 Mb and contig size of 26.828 kb, and assigned to nine pseudochromosomes containing 45,758 predicted genes (Yu, J. et al. 2013, http://www.ocri-genomics.org/bolbase/index.html).
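The scaffold N50 quoted above is a standard assembly statistic: it is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of the calculation follows; the scaffold lengths in the example are invented for illustration and are not taken from the B. oleracea assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length of
    the scaffold at which the running total first reaches half of the sum."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return ordered[-1]  # not reached for positive lengths


if __name__ == "__main__":
    # Hypothetical scaffold lengths (arbitrary units), totalling 300.
    toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
    print(n50(toy_scaffolds))
```

In the toy example the two longest scaffolds (80 + 70 = 150) already cover half of the 300 assembled units, so the N50 is 70. A larger N50 therefore indicates that the assembly is made up of longer pieces.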

Recently, Haudry et al. (2013) sequenced three genomes of Brassicaceae species, i.e., Leavenworthia alabamica, Sisymbrium irio and Aethionema arabicum. Comparative analysis with the previously sequenced genomes identified 90,000 conserved noncoding sequences (CNS) in Brassicaceae that show evidence of transcriptional and post-transcriptional regulation. Currently, work is in progress to complete the genome sequences of other Brassica and Raphanus species in the near future.

Genomic resources

Advances in sequencing technology have produced vast genomic information and sequence data in major crops. In the past decade, various Brassica databases were integrated on a common platform to facilitate efficient utilization by diverse researchers. An open access integrated database provides annotated genome information, genetic and physical maps, molecular markers, reference maps and gene expression data. The UK Brassica community took the initiative in this direction in 1996 by compiling Brassica sequences and genetic maps to create the BrassicaDB database. A major advance in knowledge sharing came with the initiation of the Multinational Brassica Genome Project (MBGP) in 2002. A number of open access databases are available in Brassicaceae with systematic information on linkage maps, QTL maps, details of mapping populations, BAC libraries, marker data, EST repositories and genome sequences. The annotated B. rapa genome sequence is available on the BRAD Brassica database (IVF-CAAS, China) and BrassEnsembl (Rothamsted Research, UK) web resources. Recently, the B. oleracea genome sequence has become available for comparative analysis on the Bolbase data source (http://www.ocri-genomics.org/bolbase/index.html), although the complete genome for download is yet to be released. Radish Base, a database of genetics and genomics of radish, was recently developed by Cornell University, USA, and consists of SSR, EST, and SNP marker information, linkage maps and organelle genome sequences. Currently, many genetic and genomic resources in Brassicaceae are available and are summarized in Table 2. The integrated knowledge available in the public domain will provide a platform to exchange information and a basis for Brassica crop improvement.

Table 2 List of web resources and databases providing bioinformatics analysis and genomic resources for the Brassicaceae

| Name | URL | Key contents |
|---|---|---|
| ACPFG Applied Bioinformatics group | http://www.appliedbioinformatics.com.au/index.php/Main_Page ; http://www.brassicagenome.net/ | B. rapa genome browser, EST-SNP database, BrassicaDB, CMap to compare genome and genetic map |
| Bolbase | http://www.ocri-genomics.org/bolbase/ | Genomic data of B. oleracea, analysis of genome structure as well as syntenic regions, browse, search and download genome of B. rapa and A. thaliana |
| BRAD | http://brassicadb.org/brad/ | Compilation of sequence datasets including the complete sequence of B. rapa. Annotations of genes orthologous to those in A. thaliana, and genetic markers and genetic maps, BLAST server |
| BrassEnsembl | http://www.brassica.info/BrassEnsembl/index.html | B. rapa genome sequence, consensus integrated genetic maps of the Brassica A and C genomes |
| Brassica Genome Gateway | http://brassica.nbi.ac.uk | Brassica genome sequencing database, Brassica 95K unigene set, the Brassica IGF Project, BrassicaDB |
| Brassica.info | http://www.brassica.info/ | Web-based open source to exchange information relating to Brassica genomics and genetics, registries of reference datasets, nomenclature standards, a compilation of ongoing public domain genome sequencing |
| BrassicaDB | http://brassica.nbi.ac.uk/BrassicaDB/ | Comprehensive sequence data set, genetic maps and markers in Brassica species, BLAST server, physical maps |
| CropStoreDB | http://www.cropstoredb.org | A collection of datasets related to plant and crop genetics, Brassica data implemented |
| Radish Base | http://bioinfo.bti.cornell.edu/cgi-bin/radish/index.cgi | Assembled and annotated ESTs, predicted metabolic pathways, EST-SSR, SNP markers, and genetic maps |
| Radish database | http://radish.plantbiology.msu.edu/index.php/Main_Page | EST sequences, linkage maps, SNP and SSR markers, radish genome sequence updates |

Conclusion and perspectives

Since the accomplishment of genomic sequencing of the model plant A. thaliana, and later B. rapa and B. oleracea, comparative mapping between these species and important Brassicaceae crops has been possible. The presence of duplicated and repetitive DNAs complicates the proper alignment and identification of actual causal genes out of many paralogs. Genome sequence information on the other four cultivated Brassica genomes is still not available. Since the sequences of Brassica species are highly conserved, molecular markers and genomic information obtained for extensively studied B. rapa, B. oleracea and B. napus could be transferred to other commercial Brassica crops. Construction of high-density consensus genetic maps, common marker systems, and genomic sequence information is of great significance for accelerating breeding progress, as it allows comparative QTL mapping analysis, marker-assisted selection and cloning of economically important genes for desired traits.

Although many genomic resources have been established, and genomic sequencing of several Brassicaceae crops has been finished, most comparative studies are on a structural genomics level, and only a handful of genes governing important traits have been identified and functionally characterized. Thus, the genomic and comparative genomic resources that are being established are only a starting point for exploring the variation within Brassicaceae. In the near future, functional genomics should increasingly be used to identify desired genes for directed gene-assisted selection of economically important traits, and to detect genetic variation within the species, by combining various techniques, such as transcriptomic analysis and high-throughput genotyping and phenotypic characterization, to study the expression of duplicated genes under different environmental conditions.

Acknowledgments

Ashutosh Sharma was a recipient of a research fellowship from the Japan Society for the Promotion of Science for Foreign Scientists and his work was supported in part by the Program for Promotion of Basic and Applied Research for Innovations in Bio-oriented Industry (BRAIN), Japan. Work of YPL and XNL was supported by a grant from the Next-Generation BioGreen 21 Program (Plant Molecular Breeding Center No. PJ007992), Rural Development Administration, Republic of Korea. We are grateful to Prof Gareth Jenkins and Stuart Sullivan, University of Glasgow, for their valuable comments on this article.

© 2014 by JAPANESE SOCIETY OF BREEDING