Genes & Genetic Systems
Online ISSN : 1880-5779
Print ISSN : 1341-7568
ISSN-L : 1341-7568
Full papers
Complete chloroplast genome and 45S nrDNA sequences of the medicinal plant species Glycyrrhiza glabra and Glycyrrhiza uralensis
Sang-Ho Kang, Jeong-Hoon Lee, Hyun Oh Lee, Byoung Ohg Ahn, So Youn Won, Seong-Han Sohn, Jung Sun Kim

2018 Volume 93 Issue 3 Pages 83-89

ABSTRACT

Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because they are much sweeter than sucrose. In this study, we present three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences from these two licorice species and their interspecific hybrid. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively. All three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp long. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We also identified simple sequence repeats and tandem repeats, and developed four reliable markers for the authentication of Glycyrrhiza species.

INTRODUCTION

Licorice is a perennial herb of the family Fabaceae. The genus Glycyrrhiza includes about 18 species distributed in Asia, Europe and the Americas. Glycyrrhiza uralensis Fisch. occurs from Central Asia to northeastern China, whereas G. glabra L. is distributed from southern Europe to northwestern China. The roots and stolons of G. uralensis and G. glabra are the source of some of the most important crude drugs in the world (Gibson, 1978), principally glycyrrhizin, an oleanane-type triterpene saponin. Glycyrrhiza plants have traditionally been used as anti-inflammatory (Finney and Somers, 1958; Kroes et al., 1997), antiviral (Fiore et al., 2008), antiallergic (Park et al., 2004) and antiulcer treatments (He et al., 2001). Because licorice extracts are approximately 150 times sweeter than sucrose (Kitagawa, 2002), they are also widely used worldwide as a natural sweetener, with an annual market value of over US $42 million (Parker, 2006). Because licorice is used medicinally, correct authentication of its ingredients is essential for safe use.

Chloroplast (CP) genome sequences are central to plant taxonomy and species authentication because they are highly conserved across plant species. The CP genome typically consists of a large single-copy region, a small single-copy region and two inverted repeats (IRs) (Gary et al., 1984; Shinozaki et al., 1986; Leseberg and Duvall, 2009). Notably, licorice species belong to the inverted repeat-lacking clade (IRLC) (Wojciechowski et al., 2004) of papilionoid legumes, which is characterized by the loss of one IR copy. To date, G. glabra is the only Glycyrrhiza species whose CP genome has been sequenced (Sabir et al., 2014).

The sequence of the 45S nuclear ribosomal DNA (nrDNA), which encodes the 18S, 5.8S and 26S ribosomal RNAs, provides additional information that is useful in plant taxonomy and DNA barcoding (Chen et al., 2014; Techen et al., 2014; Mishra et al., 2016). In particular, the internal transcribed spacers (ITS1 and ITS2) of nrDNA are candidate barcodes (Álvarez and Wendel, 2003; Yao et al., 2010). Although these sequences are valuable for identifying medicinal plants, little is known about how they differ between Glycyrrhiza species.

In the current study, we analyzed the complete CP genome and nrDNA sequences of two Glycyrrhiza species and their interspecific hybrid. We identified 160 polymorphic sites in the CP genome and 10 in the nrDNA that are valuable for identifying and authenticating G. glabra and G. uralensis as well as G. glabra × G. uralensis interspecific hybrids. Despite the usefulness of Glycyrrhiza species as medicinal ingredients and food resources, information on their complete chloroplast genomes and nrDNA sequences has been limited. The results of this study provide insight into the genetic relationships among species of the genus Glycyrrhiza.

MATERIALS AND METHODS

Plant materials and DNA extraction

European licorice (G. glabra; female parent) and Chinese licorice (G. uralensis; male parent) were planted in a greenhouse and artificially crossed in May 2007. In June 2008, stolons were separated from F1 (G. glabra × G. uralensis) seedlings and cultivated, yielding 32 clonal lines of interspecific hybrids. The aerial parts of the two Glycyrrhiza species were collected from Eumseong (36°56′38.68″N, 127°45′17.60″E) and identified by J.-H. L. Voucher specimens (G. glabra: MPS000350-1, G. uralensis: MPS004535, G. glabra × G. uralensis F1: MPS002499) have been deposited at the Korea Medicinal Resources Herbarium, Eumseong, Korea. Total DNA was extracted from young, fully expanded leaves using a modified cetyltrimethylammonium bromide (CTAB) method (Allen et al., 2006). DNA purity and concentration were checked by electrophoresis on a 1.2% agarose gel and with a DropSense96 spectrophotometer (Trinean, Gentbrugge, Belgium). Only high-quality DNA (concentration > 100 ng/μl; A260/A230 > 1.7; A260/A280 = 1.8–2.0) was used for further analysis.
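The DNA quality thresholds above can be expressed as a simple pass/fail filter. The sketch below is illustrative only: the sample names and readings are hypothetical, the field names are our own, and treating the 1.8–2.0 range for A260/A280 as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    """Spectrophotometer readings for one extraction (field names are illustrative)."""
    name: str
    conc_ng_per_ul: float  # DNA concentration in ng/ul
    a260_a230: float       # A260/A230 ratio
    a260_a280: float       # A260/A280 ratio


def passes_qc(sample: DnaSample) -> bool:
    """Apply the thresholds stated in the Methods.

    Assumption: the 1.8-2.0 window for A260/A280 is treated as inclusive.
    """
    return (
        sample.conc_ng_per_ul > 100
        and sample.a260_a230 > 1.7
        and 1.8 <= sample.a260_a280 <= 2.0
    )


if __name__ == "__main__":
    # Hypothetical readings, not measurements from this study
    samples = [
        DnaSample("leaf_A", 152.0, 2.05, 1.91),
        DnaSample("leaf_B", 88.0, 2.10, 1.88),   # too dilute
        DnaSample("leaf_C", 210.0, 1.45, 1.86),  # low A260/A230
    ]
    for s in samples:
        print(s.name, "pass" if passes_qc(s) else "fail")
```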

Illumina sequencing and de novo assembly of CP and nrDNA

Paired-end (PE) libraries with insert sizes of 280 to 430 bp were constructed following the manufacturer’s protocol for the TruSeq PE Cluster Kit (Illumina, San Diego, CA, USA). The libraries were sequenced on an Illumina HiSeq 1000 platform at our in-house facility (Genomics Division, National Institute of Agricultural Sciences, Korea). De novo assembly of the CP genome and nrDNA followed the approach described in Kim et al. (2015). Briefly, low-quality bases with Phred scores below 20 were trimmed using CLC quality trim software. The remaining high-quality reads were assembled into contigs with CLC genome assembler beta 4.06 (CLC, Aarhus, Denmark) at Phyzen (Seongnam, South Korea), using an automatically controlled minimum overlap size of 150–500 bp. The resulting CP contigs were assembled using the G. glabra (KF201590) genome as a reference sequence. The assembled nrDNA contigs fully covered the 45S nrDNA cistron unit and partially covered the intergenic spacer sequences.
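Quality trimming at a Phred threshold of 20 means removing bases whose error probability exceeds 1%. A minimal end-clipping sketch is shown below; it assumes Phred+33 (Sanger/Illumina 1.8+) quality encoding and is not a reimplementation of the CLC quality trim tool used in this study.

```python
def phred_scores(qual: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def trim_low_quality_ends(seq: str, qual: str, threshold: int = 20, offset: int = 33):
    """Clip bases with Phred score < threshold from both ends of a read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual, offset)
    # Walk inward from the 3' end while bases are below the threshold
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    # Then walk inward from the 5' end
    start = 0
    while start < end and scores[start] < threshold:
        start += 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    # '#' = Q2, 'I' = Q40, '5' = Q20 under Phred+33
    print(trim_low_quality_ends("ACGTACGT", "#IIIII5#"))  # ('CGTACG', 'IIIII5')
```

Only end bases are clipped here; internal low-quality bases are kept, which is a simplification compared with window- or score-sum-based trimmers.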

Gene annotation, SNP genotyping and repeat sequence analysis

CP sequences were annotated using DOGMA (http://dogma.ccbb.utexas.edu) (Wyman et al., 2004) and BLAST searches. tRNA genes were identified using DOGMA and tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/) (Schattner et al., 2005). The circular CP genome map was drawn with OGDraw (http://ogdraw.mpimp-golm.mpg.de/) (Lohse et al., 2007). Repeats in the Glycyrrhiza CP sequences were identified using Tandem Repeats Finder, version 4.0 (http://tandem.bu.edu/trf/trf.html) (Benson, 1999), requiring 100% similarity and a minimum size of 10 bp. Simple sequence repeat (SSR) motifs with a minimum length of 10 bp were identified using MISA (http://pgrc.ipk-gatersleben.de/misa/).
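The SSR criterion (perfect repeats of a short motif spanning at least 10 bp) can be illustrated with a regular-expression scan. This is a hypothetical sketch, not MISA's algorithm or parameter set: it searches di- to hexanucleotide motifs (mononucleotides are excluded, as in our analysis), requires at least two copies, and discards motifs that are themselves repeats of a shorter unit so that, for example, ATATAT is reported only as a dinucleotide SSR.

```python
import re
from math import ceil


def minimal_period(motif: str) -> int:
    """Length of the shortest unit that tiles `motif` exactly (ATAT -> 2)."""
    k = len(motif)
    for p in range(1, k + 1):
        if k % p == 0 and motif[:p] * (k // p) == motif:
            return p
    return k


def find_ssrs(seq: str, min_len: int = 10, unit_sizes=(2, 3, 4, 5, 6)):
    """Return (start, motif, copies) for perfect SSRs of total length >= min_len."""
    seq = seq.upper()
    hits = []
    for k in unit_sizes:
        copies = max(2, ceil(min_len / k))  # e.g. 5 copies for k=2, 2 for k=5
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{copies - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if minimal_period(motif) != k:
                continue  # compound motif; reported at its true unit size
            hits.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GC" + "AT" * 6 + "G" + "ACGTC" * 2 + "GG"  # hypothetical sequence
    print(find_ssrs(toy))  # [(2, 'AT', 6), (15, 'ACGTC', 2)]
```

With a 10-bp floor, a pentanucleotide needs only two copies while a dinucleotide needs five, which is one reason pentanucleotide SSRs can dominate counts made under this kind of length threshold.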

Sequence divergence analysis

The CP genome sequence of G. glabra (KF201590) was downloaded from the NCBI database and aligned with the newly determined sequences using MAFFT version 7 (http://mafft.cbrc.jp/alignment/server/). The four CP genomes, G. glabra (KU891817), G. uralensis (KU862308), G. glabra × G. uralensis (KU862307) and G. glabra (KF201590), were compared using mVISTA in Shuffle-LAGAN mode (Frazer et al., 2004).

Identification of polymorphisms that can distinguish Glycyrrhiza species

Four PCR primer pairs (Supplementary Table S1) were designed from CP InDels and nrDNA-specific sequence regions among Glycyrrhiza species to distinguish G. glabra and G. uralensis as well as G. glabra × G. uralensis. PCR conditions were 4 min at 94 ℃, followed by 38 cycles of 94 ℃ for 30 s, 60 ℃ for 30 s and 72 ℃ for 15 s, and a final extension at 72 ℃ for 1 min. PCR products were separated on a 1% agarose gel and stained with a fluorescent dye.
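The thermocycling program above can be written down as structured data, which makes the total programmed hold time easy to check. The class and constant names are our own; temperature ramping between steps is ignored, so real run times on a given cycler will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    seconds: int


# Program transcribed from the Methods text
INITIAL = Step("initial denaturation", 94.0, 4 * 60)
CYCLE = (
    Step("denaturation", 94.0, 30),
    Step("annealing", 60.0, 30),
    Step("extension", 72.0, 15),
)
N_CYCLES = 38
FINAL = Step("final extension", 72.0, 60)


def hold_time_seconds(initial: Step, cycle, n_cycles: int, final: Step) -> int:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


if __name__ == "__main__":
    total = hold_time_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
    print(f"{total} s = {total / 60:.1f} min of programmed holds")  # 3150 s = 52.5 min
```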

RESULTS AND DISCUSSION

After sequencing, we combined de novo assembly with reference-guided strategies, using 587 to 741 Mbp of Illumina PE reads per sample, which represents approximately 226- to 400-fold coverage of the CP genome. The complete CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were circular molecules of 127,895 bp, 127,716 bp and 127,939 bp, respectively (Table 1). CP gene content and gene order were identical among the three Glycyrrhiza genomes (Fig. 1). Consistent with their placement in the IRLC (Wojciechowski et al., 2004) of papilionoid legumes, all three CP genomes lack one copy of the IR. The Glycyrrhiza CP genomes harbor 110 annotated genes: 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes (Table 2). Among these, nine protein-coding genes and six tRNA genes contain a single intron, while one gene (ycf3) contains two introns. The genes infA, rpl22 and rps16 were absent from the Glycyrrhiza CP genomes. Two of these, infA and rpl22, are also missing from the CP genomes of other legumes (Doyle et al., 1995) but are present in the nucleus (Gantt et al., 1991), and loss of rps16 from the CP DNA of Medicago and Populus has been reported (Ueda et al., 2008). Whole-genome alignments of the Glycyrrhiza CP genomes in mVISTA, using the annotated G. glabra genome (KF201590) (Sabir et al., 2014) as a reference, revealed their sequence variation (Fig. 2). As in most angiosperms, coding regions were more highly conserved than intergenic regions. Comparison of the two G. glabra CP genomes (KF201590 and KU891817) revealed 30 single-nucleotide polymorphisms (SNPs) and 24 insertions-deletions (InDels), which may provide valuable information for authenticating Glycyrrhiza species. The CP genome of G. glabra × G. uralensis shared 99.98% and 99.85% nucleotide sequence identity with G. glabra and G. uralensis, respectively. Because G. glabra was the female parent of the cross, this near-identity to G. glabra indicates that Glycyrrhiza species also follow maternal plastid inheritance (Hagemann et al., 2004).
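SNP counts, InDel counts and percent identity can all be derived from a pairwise alignment such as the MAFFT output. The sketch below uses one common set of definitions, which are assumptions rather than the exact rules behind the values reported above: identity is matches divided by alignment columns that are not gaps in both sequences, each run of consecutive gap columns on the same sequence counts as one InDel, and ambiguity codes such as N are counted as mismatches.

```python
def compare_aligned(a: str, b: str) -> dict:
    """Count SNPs, InDel events and percent identity for two aligned sequences ('-' = gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    snps = indels = matches = columns = 0
    gap_side = None  # which sequence carries the current gap run ('a', 'b' or None)
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column is empty in both sequences
        columns += 1
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if side != gap_side:
                indels += 1  # a new gap run starts here
            gap_side = side
            continue
        gap_side = None
        if x == y:
            matches += 1
        else:
            snps += 1
    identity = 100.0 * matches / columns if columns else 0.0
    return {"snps": snps, "indels": indels, "identity_pct": identity}


if __name__ == "__main__":
    # Toy alignment: one 2-bp InDel and one substitution (hypothetical sequences)
    print(compare_aligned("ACGT--ACGTA", "ACGTTTACCTA"))
```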

Table 1. Summary statistics of CP genome and nrDNA sequencing and assembly for G. glabra, G. uralensis and G. glabra × G. uralensis
Scientific name | Sequence amount (Mbp) | CP length (bp) | CP coverage (fold) | CP GenBank Acc. No. | nrDNA type | nrDNA length (bp) | nrDNA coverage (fold) | nrDNA GenBank Acc. No.
G. glabra | 741.68 | 127,895 | 367.81 | KU891817 | Type 1 | 5,947 | 616.43 | KX530462
 | | | | | Type 2 | 5,947 | 600.79 | KX530463
G. uralensis | 721.47 | 127,716 | 225.95 | KU862308 | Type 1 | 5,948 | 1259.83 | KX530461
G. glabra × G. uralensis | 587.42 | 127,939 | 399.91 | KU862307 | Type 1 | 5,948 | 739.21 | KX530459
 | | | | | Type 2 | 5,947 | 684.44 | KX530460
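The relationship between sequencing amount and CP coverage in Table 1 can be checked with simple arithmetic. Assuming (our assumption; the table does not define it) that fold coverage equals the number of read bases mapped to the CP genome divided by its length, the sketch below back-calculates the implied mapped bases and their share of each sequencing run.

```python
# Values transcribed from Table 1 (CP genome columns):
# name -> (sequence amount in Mbp, CP length in bp, CP coverage in fold)
TABLE1_CP = {
    "G. glabra": (741.68, 127_895, 367.81),
    "G. uralensis": (721.47, 127_716, 225.95),
    "G. glabra x G. uralensis": (587.42, 127_939, 399.91),
}


def implied_mapped_mbp(coverage_fold: float, genome_len_bp: int) -> float:
    """Mapped bases (Mbp) implied by a mean depth, assuming depth = mapped bases / length."""
    return coverage_fold * genome_len_bp / 1e6


def plastid_share_pct(total_mbp: float, genome_len_bp: int, coverage_fold: float) -> float:
    """Percentage of all sequenced bases that the implied CP-mapped bases represent."""
    return 100.0 * implied_mapped_mbp(coverage_fold, genome_len_bp) / total_mbp


if __name__ == "__main__":
    for name, (total_mbp, length_bp, fold) in TABLE1_CP.items():
        mapped = implied_mapped_mbp(fold, length_bp)
        share = plastid_share_pct(total_mbp, length_bp, fold)
        print(f"{name}: ~{mapped:.1f} Mbp on the CP genome ({share:.1f}% of all bases)")
```

Under this assumption, only a few percent of each run (roughly 4 to 9%) would derive from the plastid genome, which is why the fold coverage is far below the total sequence amount divided by genome length.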
Fig. 1.

The map of the CP genome of the Glycyrrhiza species. Genes shown outside and inside the outer circle are transcribed clockwise and counterclockwise, respectively. Functionally annotated genes are grouped by color according to the key at the bottom left. The darker gray area in the inner circle shows the GC content.

Table 2. Gene composition in Glycyrrhiza CP genomes
Category of gene group | Group of genes | Names of genes
Self-replication | Ribosomal RNAs | 16S (rrn16), 23S (rrn23), 4.5S (rrn4.5), 5S (rrn5)
 | Transfer RNAs | trnA-UGC*, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnG-GCC, trnG-UCC*, trnH-GUG, trnI-CAU, trnI-GAU*, trnK-UUU*, trnL-UAA*, trnL-UAG, trnL-CAA, trnM-CAU, trnN-GUU, trnP-UGG, trnQ-UUG, trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-UAC*, trnV-GAC, trnW-CCA, trnY-GUA
 | Small subunit of ribosome | rps2, rps3, rps4, rps7, rps8, rps11, rps12*, rps14, rps15, rps18, rps19
 | Large subunit of ribosome | rpl2*, rpl14, rpl16*, rpl20, rpl23, rpl32, rpl33, rpl36
 | RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2
Photosynthesis | NADH dehydrogenase | ndhA*, ndhB*, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
 | Photosystem I | psaA, psaB, psaC, psaI, psaJ, ycf3#
 | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
 | Cytochrome b6/f | petA, petB*, petD*, petG, petL, petN
 | ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI
 | Rubisco | rbcL
Other genes | | accD, ccsA, cemA, clpP, matK
Unknown function | ORFs¥ | ycf1, ycf2, ycf4

* indicates a single intron in the corresponding gene;

# indicates two introns in the corresponding gene;

¥ indicates open reading frames.
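The gene totals in Table 2 can be verified by tallying the listed names. In the sketch below, the gene lists are transcribed from the table; the set of single-intron genes is reconstructed from the intron markers in the table and should be checked against the original annotation, and trnN-GUU is used for the asparagine tRNA.

```python
RRNA = "rrn16 rrn23 rrn4.5 rrn5".split()

TRNA = """trnA-UGC trnC-GCA trnD-GUC trnE-UUC trnF-GAA trnfM-CAU trnG-GCC trnG-UCC
trnH-GUG trnI-CAU trnI-GAU trnK-UUU trnL-UAA trnL-UAG trnL-CAA trnM-CAU trnN-GUU
trnP-UGG trnQ-UUG trnR-ACG trnR-UCU trnS-GCU trnS-GGA trnS-UGA trnT-GGU trnT-UGU
trnV-UAC trnV-GAC trnW-CCA trnY-GUA""".split()

PROTEIN_CODING = """rps2 rps3 rps4 rps7 rps8 rps11 rps12 rps14 rps15 rps18 rps19
rpl2 rpl14 rpl16 rpl20 rpl23 rpl32 rpl33 rpl36 rpoA rpoB rpoC1 rpoC2
ndhA ndhB ndhC ndhD ndhE ndhF ndhG ndhH ndhI ndhJ ndhK
psaA psaB psaC psaI psaJ ycf3
psbA psbB psbC psbD psbE psbF psbH psbI psbJ psbK psbL psbM psbN psbT psbZ
petA petB petD petG petL petN atpA atpB atpE atpF atpH atpI rbcL
accD ccsA cemA clpP matK ycf1 ycf2 ycf4""".split()

# Reconstructed from the intron markers in Table 2
ONE_INTRON = set("""rps12 rpl2 rpl16 rpoC1 ndhA ndhB petB petD atpF
trnA-UGC trnG-UCC trnI-GAU trnK-UUU trnL-UAA trnV-UAC""".split())
TWO_INTRONS = {"ycf3"}


def summarize() -> dict:
    """Count genes per class and intron-containing genes per class."""
    genes = RRNA + TRNA + PROTEIN_CODING
    if len(genes) != len(set(genes)):
        raise ValueError("duplicate gene name in the lists")
    return {
        "total": len(genes),
        "protein_coding": len(PROTEIN_CODING),
        "tRNA": len(TRNA),
        "rRNA": len(RRNA),
        "protein_one_intron": len(ONE_INTRON & set(PROTEIN_CODING)),
        "tRNA_one_intron": len(ONE_INTRON & set(TRNA)),
        "two_introns": len(TWO_INTRONS & set(genes)),
    }


if __name__ == "__main__":
    print(summarize())
```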

Fig. 2.

Comparison of the CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis using G. glabra (KF201590) as a reference sequence. The top line shows gene order (arrows indicate the direction of transcription). Genome regions are color-coded as follows: conserved genes, blue; tRNA and rRNA genes, sky blue; intergenic regions, red.

The nrDNA sequences were assembled into single contigs of either 5,947 bp or 5,948 bp. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type (Table 1). The complete nrDNA unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA (Fig. 3). Average GC content ranged from 53.86% to 53.91% and was nearly identical among the five nrDNA sequences (Fig. 3).

Fig. 3.

Schematic diagram of the nrDNA cistron unit of five Glycyrrhiza sequences. (A) Mapped read depth of the nrDNA cistron unit sequences. (B) A GC content plot was drawn with a window size of 40 nucleotides using UGENE software.
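Overall GC content and a sliding-window GC profile like that in Fig. 3B can be computed as sketched below. This is not UGENE's implementation: the step size is a free parameter we chose, and only unambiguous A, C, G and T bases contribute to the denominator (an assumption about how ambiguous bases are handled).

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); 0.0 if none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_windows(seq: str, window: int = 40, step: int = 1) -> list:
    """GC fraction in each window of `window` nucleotides, advancing by `step`."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    toy = "GC" * 20 + "AT" * 20  # hypothetical 80-nt sequence
    print(f"overall GC: {100 * gc_fraction(toy):.2f}%")  # overall GC: 50.00%
    print(gc_windows(toy, window=40, step=40))           # [1.0, 0.0]
```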

Repeat sequences in the CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were analyzed using Tandem Repeats Finder, version 4.0. A total of 20 unique tandem repeat sequences were detected in the Glycyrrhiza CP genomes (Supplementary Table S2). The tandem repeats ranged from 11 to 39 bp in length, and most were present in two copies. As in Bupleurum falcatum (Shin et al., 2016), most tandem repeats were located in non-coding regions; only three genes (rps11, rpl20 and ycf1) contained tandem repeat sequences. All tandem repeats identified in the Glycyrrhiza CP genomes were shorter than 40 bp, a size range that is sufficient for illegitimate recombination (Sherman-Broyles et al., 2014). SSRs, also known as microsatellites, occur frequently in CP genomes. Mononucleotide SSRs were excluded from this analysis. We identified 350, 349 and 352 SSRs of at least 10 bp in G. glabra, G. uralensis and G. glabra × G. uralensis, respectively (Fig. 4). Pentanucleotide SSRs were the most abundant, accounting for 84% of all SSRs in the CP genomes. Di-, tri- and tetranucleotide repeats were predominantly composed of A and T, reflecting the AT richness of CP genomes (Zhang et al., 2011; Yi and Kim, 2012). These SSRs may serve as genetic markers for phylogenetic studies and medicinal plant authentication (Zhang et al., 2016).

Fig. 4.

Number of simple sequence repeats in the Glycyrrhiza CP genomes. Classification of SSRs by repeat types in G. glabra (A), G. uralensis (B) and G. glabra × G. uralensis (C).

We detected 160 and 10 SNPs in the Glycyrrhiza CP genomes and nrDNAs, respectively (Supplementary Tables S3 and S4). Like SSRs, most SNPs in the CP DNA were located in non-coding regions, whereas SNPs in the nrDNA were detected in ITS1, ITS2 and 26S. We also identified 83 InDels in the Glycyrrhiza CP genomes. PCR primers were designed from CP InDels and nrDNA-specific sequence regions (Supplementary Table S1), and four primer pairs successfully amplified PCR products that distinguish G. glabra from G. uralensis (Fig. 5). The CP primer pairs ycf3F01/ycf3R01, atpHF01/atpHR01 and ycf2F01/ycf2R01 amplified PCR products in all three Glycyrrhiza accessions, whereas the nrDNA primer pair 5.8SF01/5.8SR01 amplified a product only in G. glabra and G. glabra × G. uralensis. These primers can be used as Glycyrrhiza authentication markers.
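InDel markers are scored by differences in amplicon size between templates. The sketch below predicts product length in silico from exact primer matches on the top strand. The primer and template sequences are hypothetical placeholders (the actual primers are listed in Supplementary Table S1), and mismatches, binding on the opposite strand and multiple binding sites are ignored.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, fwd: str, rev: str):
    """Predicted product length for exact primer matches, or None if either site is missing.

    The reverse primer is written 5'->3' on the bottom strand, so its binding
    site appears on the top strand as its reverse complement.
    """
    t = template.upper()
    start = t.find(fwd.upper())
    if start == -1:
        return None
    rev_site = revcomp(rev).upper()
    end = t.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


if __name__ == "__main__":
    fwd, rev = "ACGTACGTAC", "CATGCCATGG"  # hypothetical primers
    allele_1 = "TTTT" + fwd + "G" * 10 + revcomp(rev) + "AAAA"
    allele_2 = "TTTT" + fwd + "G" * 15 + revcomp(rev) + "AAAA"  # 5-bp insertion
    print(amplicon_length(allele_1, fwd, rev), amplicon_length(allele_2, fwd, rev))  # 30 35
```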

Fig. 5.

Validation of InDel and sequence-specific polymorphic sites by PCR analysis of InDel regions from CP genomes and sequence-specific regions from nrDNA. M, 100-bp size marker; GG, GU and F1, G. glabra, G. uralensis and G. glabra × G. uralensis, respectively. 1–4 represent the ycf3F01/ycf3R01, atpHF01/atpHR01, ycf2F01/ycf2R01 and 5.8SF01/5.8SR01 primer pairs, respectively. a and b denote PCR products from CP genome-based and nrDNA-based markers, respectively.

In this study, we sequenced the complete CP genomes and nrDNA of Glycyrrhiza. These species belong to the IRLC of papilionoid legumes, which is characterized by the loss of one IR copy. The complete CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively, and the nrDNA sequences were either 5,947 bp or 5,948 bp. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. We developed four reliable markers for the authentication of Glycyrrhiza species. This study should open up further avenues of research toward a better understanding of the molecular ecology and molecular phylogeny of Glycyrrhiza species.

ACKNOWLEDGMENTS

The authors thank the National Institute of Agricultural Sciences Genome Sequencing Core facility for their services. This work was carried out with the support of the National Institute of Agricultural Sciences (Project No. PJ010889), Republic of Korea.

REFERENCES
 
© 2018 by The Genetics Society of Japan