Breeding Science
Online ISSN : 1347-3735
Print ISSN : 1344-7610
ISSN-L : 1344-7610
Research Papers
Genome-wide characterization of microsatellites and genetic diversity assessment of spinach in the Chinese germplasm collection
Shu-Fen Li, Bing-Xiao Wang, Yu-Jiao Guo, Chuan-Liang Deng, Wu-Jun Gao

2018 Volume 68 Issue 4 Pages 455-464

Abstract

Spinach is a nutritious leafy green vegetable, and it also serves as a model species for studying sex chromosome evolution. Genetic marker development and genome structure analysis are important in breeding practice and theoretical evolution studies of spinach. In this study, the frequency and distribution of different microsatellites in the recently released draft spinach genome were characterized. A total of 261,002 perfect microsatellites were identified (estimated frequency: ~262.1 loci/Mbp). The most abundant microsatellites were tetranucleotide and trinucleotide repeats, accounting for 33.2% and 27.7% of the total number of microsatellites, respectively. A total of 105 primer pairs were designed and screened, and 34 were polymorphic among the tested spinach cultivars. Combined with seven primer sets developed previously, 41 primer pairs were used to investigate genetic diversity among 43 spinach cultivars in China. The average polymorphism information content value of the 41 markers was 0.43, representing an intermediate level. The spinach cultivars had low genetic diversity, and no common factor shared within each group of the UPGMA dendrogram could be detected. This study’s findings facilitate further investigations on the organization of microsatellites in the spinach genome and provide clues for future breeding applications of spinach in China.

Introduction

Microsatellites are short nucleotide repeats that are arranged in tandem and are distributed ubiquitously in prokaryotic and eukaryotic genomes (Buschiazzo and Gemmell 2006, Gur-Arie et al. 2000). They are also called simple sequence repeats (SSRs) or short tandem repeats (STRs). Microsatellites often exhibit length diversity because the numbers of repeat units are hypervariable among individual genotypes (Kashi and King 2006). Owing to their extensive distribution, high polymorphism, and co-dominant inheritance, microsatellites play significant roles in genome organization and are considered the most popular neutral genetic markers (Demenou and Hardy 2017, Huang et al. 2016, Nag and Mitra 2017). Microsatellite markers have been commonly applied in genetic diversity assessment (Chen et al. 2015, Li et al. 2016), phylogenetic analysis (Hilmarsson et al. 2017, Xu et al. 2010), genetic linkage mapping (Somers et al. 2004), and quantitative trait locus mapping (Xia et al. 2014, Zhang et al. 2012). In breeding practice, the development and characterization of genotype-associated microsatellite markers can be of great help to breeders (Miah et al. 2013). In addition, microsatellites play functional roles in the organization of the genome and chromatin (Baldi and Baisnee 2000), the evolution of the genome (Deng et al. 2016, Kejnovský et al. 2013), and the regulation of DNA metabolism (Gendrel et al. 2000, Majewski and Ott 2000) and gene expression (Li et al. 2004).

Owing to next-generation sequencing technology, genome data for a number of plants and animals are now available. Even incomplete genome data can provide abundant microsatellites (Li et al. 2016, Zhu et al. 2012). Genome-wide characterization of microsatellites, combined with marker screening and development, may aid the investigation of genome organization and provide routinely applicable markers for genetic and phylogenetic studies.

Cultivated spinach (Spinacia oleracea) belongs to the genus Spinacia in the family Chenopodiaceae. It is a typically dioecious plant (2n = 2x = 12) with an estimated haploid genome size of 989 Mb (Arumuganathan and Earle 1991). Due to its high mineral, vitamin, carotenoid, and folate content, spinach is a popular nourishing leafy vegetable crop that is cultivated worldwide, primarily in China, the USA, Japan, and Europe (Siemonsma and Piluek 1993). Thus, breeding spinach varieties with favorable traits, such as late bolting, fast growth, high yield, and disease and stress resistance, is a significant undertaking. Abundant genome-wide molecular markers have potential applications in the marker-assisted breeding of spinach and can also be used for the genetic diversity assessment of spinach germplasm collections. In addition, the genome-wide characterization of microsatellites is beneficial for genome organization studies. In particular, spinach is used as a model dioecious plant species for genetic studies on sex determination and cytogenetic studies on sex chromosome evolution (Deng et al. 2013, Khattak et al. 2006, Lan et al. 2006, Onodera et al. 2011). It has been shown that microsatellites may be involved in sex chromosome evolution (Cermak et al. 2008, Kejnovský et al. 2013, Pokorná et al. 2011). Thus, the characterization of microsatellite distribution and variation in the genome of spinach is helpful for further studying the evolution of sex chromosomes in spinach.

Several reports regarding the microsatellite analysis in spinach germplasm have been published (Göl et al. 2017, Groben and Wricke 1998, Khattak et al. 2007, Kuwahara et al. 2014). However, only a limited number of microsatellite markers for spinach are available. In addition, the large-scale genome-wide characterization of microsatellites has yet to be fully studied. The recent release of the draft genome of spinach (Xu et al. 2017) has provided a valuable source that can be used in the genome-wide characterization and marker development of microsatellites. In the present study, a large-scale genome-wide analysis of microsatellites in spinach was conducted, and the genetic diversity of the main spinach cultivars in China was assessed by using the developed microsatellite markers. This study’s findings may add to our understanding of the spinach genome structure with respect to microsatellites and may provide clues regarding the future spinach breeding practice in China.

Materials and Methods

Plant materials and DNA extraction

A total of 43 cultivars were used in assessing the genetic diversity of the spinach germplasm collection in China. The seeds were provided by the Chinese Crop Germplasm Resources Information System. Among the cultivars, one (‘Ribendaye’) was bred in Japan, and the other 42 were bred in 10 different provinces of China (Supplemental Table 1, Supplemental Fig. 1). All plants were grown in an experimental field located at Henan Normal University. Total genomic DNA was isolated from young leaves of each spinach cultivar using a traditional CTAB method (Doyle and Doyle 1987).

Microsatellite detection and analysis

A large-scale, genome-wide microsatellite analysis of the spinach genome was performed using a Perl program. Spinach genome and gene data were downloaded from http://www.spinachbase.org/cgi-bin/spinach/tool/download.cgi. The Perl source code for the microsatellite tool SSRIT (Temnykh et al. 2001) was downloaded from http://www.gramene.org/db/markers/ssrtool and slightly modified to run in batch mode. Perfect repeats with basic motifs of 2 to 8 bp were recorded. The minimum tract lengths were 12 bp (for di- to tetranucleotides), 15 bp (for penta-), 18 bp (for hexa-), 21 bp (for hepta-), and 24 bp (for octanucleotides). Microsatellite density, GC content, motif, repeat length, and repeat number distribution in the spinach genome were analyzed using Microsoft Excel 2007. The repeat motifs analyzed in this study included all variants on both strands of the DNA sequences. For example, the AG motif is equivalent to GA and to the reverse complements CT and TC.
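The detection thresholds above can be illustrated with a minimal Python sketch. This is not the SSRIT code or the authors' modified script; it is a simplified regular-expression search under stated assumptions: only uppercase A/C/G/T tracts are considered, motifs that are themselves repeats of a shorter unit (for example ATAT) are skipped, and compound or overlapping tracts are not treated specially. The names `MIN_TRACT_BP`, `is_primitive`, and `find_perfect_ssrs` are hypothetical.

```python
import re

# Minimum tract length (bp) for each motif size, taken from the Methods text.
MIN_TRACT_BP = {2: 12, 3: 12, 4: 12, 5: 15, 6: 18, 7: 21, 8: 24}


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (e.g. ATAT = AT x 2)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_perfect_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for size, min_bp in MIN_TRACT_BP.items():
        min_repeats = -(-min_bp // size)  # ceiling division
        # One motif copy followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):  # assumption: skip non-primitive motifs
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


# Hypothetical toy sequence: (AG)6 and (AAT)4 pass, (ACGT)2 is too short.
toy = "TTT" + "AG" * 6 + "CCC" + "AAT" * 4 + "GG" + "ACGT" * 2
print(find_perfect_ssrs(toy))
```

With these thresholds the implied minimum repeat counts are 6 for dinucleotides, 4 for trinucleotides, and 3 for tetra- to octanucleotides, which matches the "≥3 repeat units and a minimum length of 12 bp" criterion reported in the Results.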

Transposable element (TE) annotation

To compare the distribution patterns of microsatellites, TEs, and genes in the spinach genome, we annotated TEs according to a previously described method (Harkess et al. 2017) with minor modifications. Briefly, LTRharvest (GenomeTools v1.5.9) was used with default parameters except for ‘-similar 55 -maxdistltr 40000’ to identify full-length LTR retrotransposons. Then, the detected LTR retrotransposons were further classified using the RepeatModeler program (v1.0.10) (http://www.repeatmasker.org/RepeatModeler/). Furthermore, contigs assembled and clustered from whole-genome shotgun MiSeq reads using the RepeatExplorer pipeline (Novák et al. 2013) were also classified using the RepeatClassifier program. The classified LTR retrotransposons and the RepeatExplorer-assembled contigs were combined into a custom transposon database. Repetitive element annotation of the whole genome was performed using RepeatMasker (v4.0.7) with the custom transposon database and default parameters.

Primer design and microsatellite amplification

Primers were designed for 105 randomly selected microsatellite loci. The microsatellite flanking sequences were used for primer design with Primer 3 (Rozen et al. 2000), and the primers were evaluated with Oligo 7 (Rychlik 2007). PCR amplification was performed in a 25-μL reaction volume containing 1 × PCR buffer, 50 ng genomic DNA, 0.5 U Taq polymerase (Transgene, Beijing, China), 0.2 μM of each primer, and 200 μM dNTPs (Transgene). The reaction was carried out using the following cycling conditions: initial denaturation at 95°C for 5 min; 35 cycles of denaturation at 95°C for 1 min, annealing (at a temperature specific to each primer pair) for 40 s, and extension at 72°C for 1 min; and a final extension at 72°C for 7 min. Then, the PCR products were size-fractionated on 8% nondenaturing polyacrylamide gels with the DNA size marker DL500 (Takara, Dalian, China), followed by silver staining to visualize the band patterns.

Microsatellite data analysis

The DNA fragments from microsatellites that generated clear and unambiguous bands were scored for 86 spinach individuals (two individuals for each cultivar). Polymorphism was determined according to the presence or absence of the microsatellite band. Low-frequency bands with fewer than four occurrences were disregarded due to their potential unreliability. The genetic variation was analyzed using POPGEN32 (Yeh et al. 2000). Next, the unweighted pair group method with arithmetic mean (UPGMA) was used to construct a dendrogram in MEGA 6 (Tamura et al. 2013) based on Nei’s unbiased genetic distance. The polymorphic information content (PIC) value of each marker was calculated according to the following formula: $\mathrm{PIC}_i = 1 - \sum_{j=1}^{n} p_{ij}^{2}$. In this formula, $p_{ij}$ is the frequency of the jth pattern for SSR marker i, and n is the total number of different alleles at locus i.
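The PIC formula can be sketched in a few lines of Python: one minus the sum of squared pattern frequencies for a single marker. The function name `pic_from_patterns` and the example pattern labels are hypothetical; the input is assumed to be one scored pattern (allele or banding pattern) per individual.

```python
from collections import Counter


def pic_from_patterns(patterns):
    """PIC for one marker: 1 minus the sum of squared pattern frequencies."""
    counts = Counter(patterns)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one scored individual is required")
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical marker scored in 86 individuals with three patterns.
example = ["a"] * 50 + ["b"] * 30 + ["c"] * 6
print(round(pic_from_patterns(example), 2))
```

A marker with a single pattern gives PIC = 0, and a marker with k equally frequent patterns gives PIC = 1 - 1/k, so values rise with both the number of patterns and the evenness of their frequencies.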

Results

Detection of microsatellites in the spinach genome

Repeats of basic motifs ranging from 2- to 8-bp of the microsatellites were recorded in the recently released 996 Mbp genomic sequences of spinach. A total of 261,002 perfect microsatellites with ≥3 repeat units and a minimum length of 12 bp were detected. On average, the estimated frequency of microsatellites across the genome was one microsatellite every 3.8 kb of the sequence (i.e. 262.1 loci/Mbp). The total length of all detected microsatellite sequences was estimated to be 4,926.8 kb, accounting for about 0.5% of the draft genome assembly of spinach.
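The summary figures in this paragraph follow directly from the locus count, the assembly size, and the total microsatellite length. As a worked check (assuming 1 Mbp = 1,000 kb):

```latex
\begin{align*}
\text{density} &= \frac{261{,}002\ \text{loci}}{996\ \text{Mbp}} \approx 262.1\ \text{loci/Mbp},\\[4pt]
\text{mean spacing} &= \frac{996{,}000\ \text{kb}}{261{,}002\ \text{loci}} \approx 3.8\ \text{kb per locus},\\[4pt]
\text{genome fraction} &= \frac{4{,}926.8\ \text{kb}}{996{,}000\ \text{kb}} \approx 0.0049 \approx 0.5\%.
\end{align*}
```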

Distribution of microsatellite types in the spinach genome

Among the seven nucleotide types, tetranucleotide motifs were the most frequent repeats (86,713), accounting for 33.2% of the total microsatellite loci detected. This motif was followed by the tri- (72,201, 27.7%), di- (39,930, 15.3%), penta- (31,844, 12.2%), hexa- (15,422, 5.9%), hepta- (12,425, 4.8%), and octanucleotides (2,467), which only accounted for 0.9% of the total microsatellites (Table 1).

Table 1 Distribution of perfect microsatellites with ≥3 repeats and minimum 12 bp length in genomic sequences of spinach
Microsatellite type | Count | Relative frequency (%) | Mean repeat number | Density (microsatellites/Mb) | Cumulative sequence length (kb)
Dinucleotide | 39,930 | 15.3 | 12.5 | 40.1 | 995.7
Trinucleotide | 72,201 | 27.7 | 5.8 | 72.5 | 1258.2
Tetranucleotide | 86,713 | 33.2 | 3.2 | 87.1 | 1117.0
Pentanucleotide | 31,844 | 12.2 | 3.4 | 32.0 | 534.7
Hexanucleotide | 15,422 | 5.9 | 3.4 | 15.5 | 318.4
Heptanucleotide | 12,425 | 4.8 | 7.3 | 12.5 | 636.8
Octanucleotide | 2,467 | 0.9 | 3.3 | 2.5 | 66.0
Total/mean | 261,002 | 100 | 5.6 | 262.1 | 4926.8
Total seq. (Mbp) | 996

We also analyzed the repeat number distribution of each microsatellite type (Fig. 1). For all seven microsatellite types, the microsatellite frequency was negatively correlated with the number of repeat units. This trend was more pronounced in the tri- to octanucleotides than in the dinucleotides. Consequently, the mean repeat number of the dinucleotides (12.5) was more than twice that of the trinucleotides (5.8) and nearly four times those of the tetra-, penta-, hexa-, and octanucleotides (3.2, 3.4, 3.4, and 3.3, respectively). Unexpectedly, the mean repeat number of the heptanucleotides was 7.3, higher than those of all other microsatellite types except the dinucleotides. We therefore analyzed the repeat numbers of the heptanucleotides in detail. The results showed an unexpectedly large number of long heptanucleotide repeats: a total of 112 heptanucleotide repeats had more than 100 repeat units. The largest repeat unit number among the heptanucleotides was 846, which was also the largest among all the microsatellites detected. Due to the large differences in the mean repeat numbers, the cumulative sequence length of each microsatellite type was not correlated with its relative frequency. The total length of the trinucleotide repeats was the longest (1258.2 kb), followed by those of the tetra- (1117 kb) and dinucleotides (995.7 kb). The cumulative sequence length of the heptanucleotides (636.8 kb) was even longer than those of the penta- (534.7 kb) and hexanucleotides (318.4 kb). The total length of the octanucleotide repeats was the shortest, only 66 kb (Table 1).
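The relationship described here (cumulative length depends on motif length, count, and mean repeat number together, not on count alone) can be checked against the Table 1 values with a short Python sketch. The estimate below is an approximation introduced for illustration, since the published mean repeat numbers are rounded to one decimal place; the names `TABLE_1` and `estimated_length_kb` are hypothetical.

```python
# Table 1 values: motif length (bp), count, mean repeat number,
# and reported cumulative sequence length (kb).
TABLE_1 = {
    "di": (2, 39930, 12.5, 995.7),
    "tri": (3, 72201, 5.8, 1258.2),
    "tetra": (4, 86713, 3.2, 1117.0),
    "penta": (5, 31844, 3.4, 534.7),
    "hexa": (6, 15422, 3.4, 318.4),
    "hepta": (7, 12425, 7.3, 636.8),
    "octa": (8, 2467, 3.3, 66.0),
}


def estimated_length_kb(motif_bp, count, mean_repeats):
    """Approximate cumulative length: motif length x count x mean repeats."""
    return motif_bp * count * mean_repeats / 1000.0


estimates = {
    name: estimated_length_kb(bp, count, repeats)
    for name, (bp, count, repeats, _reported) in TABLE_1.items()
}

for name, (_bp, _count, _repeats, reported) in TABLE_1.items():
    print(f"{name:>5}: estimated {estimates[name]:8.1f} kb, reported {reported:8.1f} kb")
```

The estimates agree with the reported lengths to within about 2%, and they reproduce the ordering in the text: trinucleotides contribute the most sequence even though tetranucleotides are more numerous, and heptanucleotides exceed penta- and hexanucleotides because of their high mean repeat number.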

Fig. 1

The frequencies of the repeat motifs with respect to the number of motif repeats of microsatellites in the genome sequences of spinach.

Distribution of the microsatellite motifs

A detailed analysis with respect to the nucleotide composition of each type of microsatellites was also performed. Results showed that some repeat motifs occurred more frequently than others in each class.

Among the four dinucleotide motifs, the AT motif was dramatically overrepresented, representing 43.5% of the total dinucleotides. No significant differences were observed among the AC, AG, and CG motifs, which accounted for 19.8%, 18.5%, and 18.3% of the total dinucleotides, respectively. An analysis of the trinucleotide repeats showed that the AAT motif occurred most frequently, accounting for 36.4% of the total trinucleotides, thereby outnumbering the next most abundant motifs, AAC (17.4%) and AAG (10.8%), by 2.1- and 3.4-fold, respectively. Furthermore, the AAT motif was also the most predominant motif in the entire spinach genome, accounting for 10.1% of the total microsatellite loci characterized. By contrast, CCG and ACG were the rarest motifs, representing only 1.2% and 0.8% of the total trinucleotides, respectively. A frequency analysis of the tetranucleotide repeat revealed that the AAAT and AATT were predominant, and together, occupied 55.5% of the total tetranucleotides. Among the longer motif types, the most abundant motifs were AAAAT (20.3%) among the pentanucleotides, AAAAAT (13.4%) among the hexanucleotides, AATAAAT (17.8%) among the heptanucleotides, and TCTTGTAT (33.7%) among the octanucleotides (Supplemental Table 2). Overall, the repeats of AAT, AAAT, AATT, AT, AAC, AAAC, AC, AAG, AG, and CG were remarkably abundant in the spinach genome (Fig. 2).

Fig. 2

The 20 most abundant microsatellite motifs in the spinach genome. The colors black, orange, and purple denote the AT-rich microsatellite motifs, the AT = GC, and the GC-rich motifs, respectively.

Clearly, the AT-rich motifs occurred more frequently than the AT = GC and GC-rich motifs. The AT-rich motifs represented 73.7% of the total repeats, dramatically outnumbering the AT = GC motifs (11.3%) and GC-rich motifs (15.0%) (Supplemental Table 2). Among the top 20 most frequently occurring motifs, 13 were AT-rich, whereas only 3 and 4 were AT = GC and GC-rich motifs, respectively (Fig. 2). Further investigation revealed that the predominance of AT-rich motifs was strongest in the tri- to octanucleotides, whereas the difference was smaller in the dinucleotides.
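Motif counts of this kind require two steps: collapsing every rotation and reverse complement of a motif into one representative, and then classifying the representative by base composition. The Python sketch below illustrates one way to do both. Choosing the alphabetically smallest variant as the representative is an assumption made here for illustration (it happens to give names such as AG, AC, and AAC, as used in the text), and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Reverse complement of a DNA motif (uppercase A/C/G/T only)."""
    return motif.translate(COMPLEMENT)[::-1]


def canonical_motif(motif):
    """Smallest string among all rotations of the motif on both strands."""
    motif = motif.upper()
    variants = []
    for strand in (motif, reverse_complement(motif)):
        for i in range(len(strand)):
            variants.append(strand[i:] + strand[:i])
    return min(variants)


def composition_class(motif):
    """Classify a motif as 'AT-rich', 'GC-rich', or 'AT = GC'."""
    at = sum(motif.upper().count(base) for base in "AT")
    gc = len(motif) - at
    if at > gc:
        return "AT-rich"
    if gc > at:
        return "GC-rich"
    return "AT = GC"


for raw in ("GA", "CT", "TC", "TTG", "CCG"):
    rep = canonical_motif(raw)
    print(raw, "->", rep, composition_class(rep))
```

Under this convention GA, CT, and TC all collapse to AG (an AT = GC motif), TTG collapses to AAC (AT-rich), and CCG is GC-rich.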

Chromosome analysis of the microsatellites

The frequencies and distributions of the different microsatellite types on each spinach chromosome were further investigated. Only 47% of the assembled sequences were anchored to the six chromosomes. A total of 121,527 microsatellite loci, nearly 47% of the total loci, were identified on the six chromosomes, indicating that the microsatellites were distributed relatively evenly across the spinach genome. Although the density of the microsatellites varied among chromosomes, in general, the number of microsatellite loci was correlated with chromosome size. For example, chromosome 4 and chromosome 6 were the largest and smallest chromosomes, and the microsatellite number was the highest and lowest on these two chromosomes, respectively (Fig. 3, Supplemental Table 3). On each chromosome, the distribution patterns of the different microsatellite types were similar to those in the overall genome, i.e. the tetra- and trinucleotides were the most and second most abundant types, and the hexa-, hepta-, and octanucleotides were the least abundant types.

Fig. 3

The abundance of different microsatellite repeats on each chromosome of spinach.

Furthermore, we analyzed the distribution of the microsatellites, annotated genes, and transposable elements (TEs) along each chromosome of spinach. The results showed that the distribution pattern of the microsatellites was in accordance with that of the genic sequences (Fig. 4). However, the microsatellite density was negatively correlated with the TE density, such that the higher the density of the microsatellites, the lower the density of the TEs (Fig. 5).
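One way to quantify such a relationship is to count features in fixed-size windows along a chromosome and correlate the two count series. The Python sketch below shows this under stated assumptions: the paper does not describe a windowing or correlation procedure, so the window size, the Pearson coefficient, the 0-based positions, and the toy data are all hypothetical choices for illustration.

```python
from math import sqrt


def window_counts(positions, chrom_length, window):
    """Count features (0-based start positions) in consecutive windows."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy chromosome of 400 bp split into 100-bp windows.
ssr_positions = [5, 10, 20, 150, 160, 250, 390]
te_positions = [120, 210, 220, 310, 320, 330]

ssr_density = window_counts(ssr_positions, 400, 100)
te_density = window_counts(te_positions, 400, 100)
print(ssr_density, te_density, round(pearson(ssr_density, te_density), 2))
```

In this toy example the microsatellite counts fall as the TE counts rise, giving a negative coefficient; on real data the window size would need to be large enough for each window to contain a meaningful number of features.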

Fig. 4

The distribution of microsatellites and annotated genes along each chromosome.

Fig. 5

The distribution of microsatellites and annotated TEs along each chromosome.

Microsatellite marker screening and polymorphism

A total of 105 microsatellite loci were randomly selected, and the flanking sequences were used for primer design. Amplification was validated and polymorphism was evaluated for these primer pairs using six individuals from three cultivars (‘Hengshan’, ‘Lianchengdong’, and ‘Ribendaye’). All of these primer pairs amplified products, and 34 generated clear polymorphic bands among the different cultivars. We also screened 13 primer pairs developed previously (Khattak et al. 2007). Of these 13 primer sets, 7 produced clear bands and showed polymorphism among the cultivars used in this study. Thus, these 41 stable and polymorphic genomic microsatellite primer pairs were used for further genetic diversity analysis of individuals belonging to the 43 spinach cultivars (two individuals per cultivar) (Table 2). A total of 235 alleles were detected in the 86 individual plants using the 41 loci. Each primer pair amplified 2–10 different alleles, with an average of 5.7 fragments. The primer sets generated 119 polymorphic fragments in the spinach accessions, accounting for 58.3% of the total fragments. The PIC value of each primer pair for the spinach accessions ranged from 0.14 to 0.87, with an average of 0.43 (Table 2).

Table 2 Characteristics of 41 polymorphic microsatellites and primer sets
Primer ID | Forward primer (5′-3′) | Reverse primer (5′-3′) | Repeat motif | NPF/total no. fragments (%) | PIC
Spms6 | AGCTACATCCAATAATGCAA | TAGGATGGTAATGAGAAGGG | (CAC)13 | 1/6 (17) | 0.15
Spms11 | CGTTGATGATAATGGGGAGG | CCTCGATAAGTCTTGATCCG | (GAC)9 | 4/8 (50) | 0.49
Spms15 | GAGGGGTAAGATTGAGGTGA | CGTGCTTCACATACGTGTCC | (GAA)9 | 4/10 (40) | 0.35
Spms16 | CACAGTTGAGGAGGAAACGA | TGATGCAATTAAGGATGTCG | (AG)15 | 2/8 (25) | 0.22
Spms12 | TGCAGCCTCAGAGAACGAGT | TCTTGTATCTGTTGCGAGGT | (GAG)10 | 1/3 (33) | 0.27
Spms19 | TCGACGAACAAAGTGCACAG | TTAAGGCCACGTGTCAGGTT | (AAG)8 | 5/7 (71) | 0.61
Spms21 | CAAGCCAACAATCTACGGTG | GAGGAGAAGAGTAGAGGTCA | (TC)13 | 2/5 (40) | 0.39
Spms22 | CCTGATTCCCGTCTTAGCC | GGTATCGAGGCATTACTGCT | (AT)7 | 2/8 (25) | 0.20
Spms24 | CAATGACGATCTCCTACGAC | ATTAGGGTGGTTCGGGAAAG | (TTG)19 | 4/10 (40) | 0.32
Spms25 | GTAAGTACCTCAGATATCCC | AGTTACAGAGATAGCAACCA | (TATCAA)5 | 5/6 (83) | 0.74
Spms26 | CAATCCGTGACAACCTGCTT | ACCAACTATGGCGGATCCAG | (TA)22 | 1/3 (33) | 0.22
Spms32 | TAACCTAGTGGTCAAAGGAT | TGTTTAACTGCTAGTGAGGG | (CAGATG)3 | 4/4 (100) | 0.71
Spms35 | ACCAGAACACTGCAACAGGA | GTTGTGCTGTGTAGAAGTCC | (CAG)4 | 3/9 (33) | 0.29
Spms37 | GAGGTTCGGATGTGTTGGAC | CCAATTCAGGTGGTGGAGGT | (TGG)4 | 1/3 (33) | 0.33
Spms42 | CCCACCTTGCGAATGTATCC | TATCCGCGATCTCAATCCAC | (GAA)10 | 5/7 (71) | 0.54
Spms45 | TGAGAAATAATTGCTGGAAC | GTCCACTACCAGTCTACCAC | (ACAAC)4 | 1/6 (17) | 0.16
Spms48 | TTCATCTTCTTTGTAGTTGC | CCCCGATATGGTCTCATCTT | (GAA)9 | 3/8 (38) | 0.37
Spms57 | TCTCTCCTCTCAATCAATGC | CTCTCCATCGGCTTTCTGTC | (TTCT)3 | 2/5 (40) | 0.21
Spms58 | CCATGTCCAGAAGAGCAATC | CAGCGTAATATCAGGTGTTC | (CAAT)4 | 1/6 (17) | 0.13
Spms60 | CTGTTGTGTTTTTGCGTTAG | AGATCTGTTGTAGCCGTGAC | (GTTT)3 | 1/3 (33) | 0.30
Spms67 | TGATTCTCCAGTAACACCGA | ACACTATCATAACGCTGAGG | (TC)11 | 1/4 (13) | 0.17
Spms71 | CCACCACATTCTTCATTATT | AGTGAGTGTGTGGGTGGTGA | (CACT)5 | 3/4 (75) | 0.69
Spms75 | AAGTAATAATGATGTCAGCG | CGGTAGTCATCCCAAGTCAG | (ATAGA)3 | 1/5 (20) | 0.19
Spms76 | GATCGAGTATTAAGGGACGG | TCATGCACCACTCTGATTAC | (AG)9 | 7/9 (78) | 0.68
Spms78 | CGTTATCCTCCAAAGTCTCC | CAATGGCGTTACTTCATCTT | (AG)16 | 5/8 (63) | 0.57
Spms79 | TACACAAGCAATCTAGGTGG | TTCTAGTGATGCTGATCCTG | (AG)14 | 1/5 (20) | 0.18
Spms82 | TACAAACTGCAAGGTCTCAA | TTCATCCTTACCTAGTACCA | (TG)8 | 2/3 (67) | 0.49
Spms87 | GTACAATGGATATGATTCTG | CTTCAAGCCACCGAGTCCCA | (AAT)12 | 2/5 (40) | 0.39
Spms91 | AATTGCAGTGTCATTAAGTT | ACTATTTCACAAAGTGAGGC | (TTCAG)3 | 1/3 (33) | 0.33
Spms93 | CAGAATTCAGTTCAGTTCAT | GAACTGAACTTATTGGACCT | (AGTTC)3 | 3/9 (33) | 0.29
Spms95 | TTGTAATCTATAGAATCGTT | AGTGGTTGATTATATTCAGG | (TA)13 | 2/4 (50) | 0.37
Spms97 | AAGGAGGTATGCTTTGGCTA | CCATAGGATTTAGGCCCTGT | (CTGAA)5 | 2/3 (67) | 0.53
Spms99 | CACTATAAACACGTCAGACT | CGGTGAGAAACAAAGTTAGG | (CT)7 | 3/5 (60) | 0.59
Spms102 | ATAAATCACAAACGCAAACT | TTCCGATTGAATCTGCTTAT | (ACA)17 | 5/7 (71) | 0.59
Spms106* | ACTAGTGAGGGGGCCAGTTTACA | CAGCTGAGGCTCTTCTTCTTCTTC | (GAA)9 | 4/5 (80) | 0.69
Spms107* | CTGCTCATTTCTGGTTTGATTGG | TCGGGTGTGTTATGATAGGTTGG | (TTG)19 | 8/9 (89) | 0.69
Spms110* | AGGTAGAGGCAAAGGAAGAGGCA | ACAGAACCGGAAAAAAAAAAAGGG | (AGAGGC)5 | 3/5 (60) | 0.45
Spms113* | AAACTCTTTCTGATGGAGAGC | TTTGGAGGAGAGAGAGTGG | (CT)6(CCA)4 | 1/2 (50) | 0.49
Spms115* | TAGGGTACTGTAGAGGAAGTCG | TGGGAATCTAACATTTGTATGC | (GT)5 | 6/6 (100) | 0.87
Spms117* | CCTCTAGGACCAATAATAATGC | CTCTCAACTTTGCTATCAACC | (GTC)6 | 5/6 (83) | 0.63
Spms118* | AAGAGATCCAAATGCAAAGGAAG | GCAACACTAAAAATACCCTAATCG | (AG)15 | 2/3 (67) | 0.59

Note: NPF: Number of polymorphic fragments; %: the percentage of the polymorphic fragments;

*  indicates the microsatellites markers developed by Khattak et al. (2007).

Genetic diversity among the different spinach cultivars in China

Based on the data from the 41 microsatellite loci, the genetic distance among the 43 spinach cultivars ranged from 0.0745 to 0.4712, with an average of 0.2227. A UPGMA dendrogram based on Nei’s genetic distances illustrates the genetic relationships among the various spinach cultivars (Fig. 6). The 43 spinach cultivars can be classified into three clusters at a branch length of about 0.11. Group III contained four cultivars, including one cultivar derived from Japan and three cultivars from Shanxi, Xinjiang, and Inner Mongolia, respectively. Group II clustered seven cultivars, including two derived from Shanxi and five derived from Henan, Hunan, Jiangsu, Xinjiang, and Inner Mongolia, respectively. Group I, the largest cluster, contained the remaining 32 cultivars. These cultivars came from different regions and covered all 10 provinces from which the cultivars were derived. We also compared the main traits of the cultivars (Supplemental Table 1) with the clustering results. However, no clear correlation between the main traits and the clustering results of the studied cultivars was detected.
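For readers unfamiliar with UPGMA, the clustering step can be sketched in pure Python as repeated merging of the two closest clusters, with distances to a merged cluster averaged in proportion to cluster size. This is only an illustration of the algorithm, not the MEGA 6 procedure used in the study; the function name `upgma`, the cultivar labels, and the distance values are hypothetical, and ties between equally close pairs are broken arbitrarily.

```python
def upgma(labels, distances):
    """Cluster labels by UPGMA.

    distances maps (label_a, label_b) pairs to a pairwise distance.
    Returns a list of (cluster_a, cluster_b, node_height) merges in order.
    """
    sizes = {label: 1 for label in labels}
    d = {frozenset(pair): value for pair, value in distances.items()}
    merges = []
    while len(sizes) > 1:
        closest = min(d, key=d.get)          # pair with the smallest distance
        a, b = sorted(closest)
        merged = "(%s,%s)" % (a, b)
        merges.append((a, b, d[closest] / 2.0))  # node height = half distance
        new_d = {}
        for other in sizes:
            if other in (a, b):
                continue
            # Size-weighted average of the distances to the merged members.
            da = d[frozenset((a, other))]
            db = d[frozenset((b, other))]
            new_d[frozenset((merged, other))] = (
                sizes[a] * da + sizes[b] * db
            ) / (sizes[a] + sizes[b])
        for pair, value in d.items():
            if a not in pair and b not in pair:
                new_d[pair] = value
        d = new_d
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return merges


# Hypothetical distances among four cultivars A-D.
toy_distances = {
    ("A", "B"): 0.10, ("A", "C"): 0.30, ("B", "C"): 0.30,
    ("A", "D"): 0.40, ("B", "D"): 0.40, ("C", "D"): 0.40,
}
for step in upgma(["A", "B", "C", "D"], toy_distances):
    print(step)
```

Cutting the resulting tree at a chosen height (the study uses a branch length of about 0.11) yields the groups; in the toy example A and B join first, then C, then D.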

Fig. 6

UPGMA dendrogram of the spinach cultivars in the Chinese germplasm collection based on Nei’s genetic distance of 41 genomic microsatellite markers.

Discussion

Genome-wide molecular markers are of great value for both practical breeding and theoretical genome structure analysis of spinach, a nutritious vegetable and a model species for studying sex determination and sex chromosome evolution. However, only a limited number of microsatellite markers are available for this species. Although the genomic sequences of spinach have been released, no reports have yet described the frequency and distribution of microsatellites at the whole-genome level. In previous studies, microsatellites were generally detected based on a small number of genomic or EST sequences. As early as 1998, Groben and Wricke detected 50 microsatellites in the published sequences in the EMBL and GenBank databases (Groben and Wricke 1998). Recently, Göl et al. identified 3,852 microsatellites from 18,545 contigs, which represent only 2.5% of the spinach genome (Göl et al. 2017). The present study was based on the entire assembled genome of spinach, which enabled the identification of a far larger number of microsatellites (261,002) than in previous studies. Such data can facilitate further analyses of the frequency and distribution of microsatellites in the entire genome and on each chromosome of spinach. Furthermore, the data provide a convenient resource for developing microsatellite markers, which can be helpful in breeding practice.

The microsatellite density varies extensively among plant genomes, reflecting the flexible tolerance of plant genomes for microsatellites. It has been reported that microsatellite density is usually negatively correlated with genome size in plant species, that is, the microsatellite density decreases as genome size increases (Morgante et al. 2002). We found that the density of the microsatellites in the spinach genome was about 262.1 loci/Mbp, which was generally in accordance with this trend. Spinach has a medium genome size of 989 Mbp, and its microsatellite density is at a medium level. The microsatellite density was higher than those of larger genomes, such as maize (120.5 loci/Mbp, Huo et al. 2008) and wheat (36.7 loci/Mbp, Han et al. 2015); however, it was lower than those of smaller genomes, such as cucumber (551.9 loci/Mbp, Cavagnaro et al. 2010) and Ziziphus jujuba (1075.3 loci/Mbp, Xiao et al. 2015). This observation indicates that microsatellites are not important players in plant genome size variation. In fact, the total length of the microsatellites occupied only 0.5% of the spinach genome. This is a small fraction compared with other repetitive sequences, such as the TEs, which have been reported to contribute greatly to plant genome expansion (Li et al. 2017).

In different plant genomes, the frequencies of the microsatellite repeat types follow various patterns. In general, however, the di-, tri-, and tetranucleotides are the most frequently occurring microsatellites in plants. The frequency analysis of different nucleotide repeats in spinach showed that the tetranucleotides were the most abundant repeats, followed by the tri-, di-, and pentanucleotide repeats. This finding is similar to those reported for a few other plant genomes, such as cucumber (Cavagnaro et al. 2010). However, in most other species, the di- or trinucleotides are overrepresented in the genome. For example, in watermelon (Zhu et al. 2016), papaya (Wang et al. 2008), Vaccinium macrocarpon (Zhu et al. 2012), Elaeis guineensis, and Phoenix dactylifera (Xiao et al. 2016), the dinucleotides are the most abundant, whereas in Papaver somniferum (Celik et al. 2014), Glycine max, Arabidopsis thaliana, Oryza sativa, Sorghum bicolor (Cavagnaro et al. 2010), and Asparagus officinalis (Li et al. 2016), the trinucleotides are typically overrepresented. In rare cases, the pentanucleotides are the most frequent repeats, such as in the diploid cottons Gossypium arboreum and Gossypium raimondii (Lu et al. 2015). These data reflect the variation of genome structure among species. Although the tetranucleotides occurred more frequently than the trinucleotides in the spinach genome, the cumulative length of the trinucleotides was greater owing to their higher mean repeat number. Thus, the trinucleotide repeats contributed greatly to the spinach genome. With respect to the mean repeat number, there is a general trend that the number decreases gradually as motif length increases. This was generally true in the spinach genome, as indicated in the present study. However, there was an exception: the mean repeat number of the heptanucleotides was the second highest, lower only than that of the dinucleotides. Detailed analysis revealed a considerable number of heptanucleotide repeats with high repeat unit numbers. This is a distinct feature of the spinach genome with respect to microsatellites. Hence, further work, such as localization by FISH, should be performed to investigate the potential special contributions of the heptanucleotide repeats to the spinach genome structure.

The analysis of the density of the microsatellites and other elements, such as genes and TEs, on each chromosome can reflect the studied genome structure. Although only 47% of the sequences can be anchored to the spinach chromosomes, the analysis of the microsatellite density, genes, and TEs can provide useful information to help us better understand the spinach genome. The frequency of the microsatellite loci is generally correlated with the chromosome size, indicating the balanced distribution of microsatellites in spinach chromosomes. On each chromosome, the microsatellites were positively associated with the genes but negatively correlated with the TEs generally. This distribution pattern is in agreement with the findings of studies in other plant species (Morgante et al. 2002, Portis et al. 2016). This may be caused by the evolutionary history and dynamics of microsatellites, genes, and TEs. The high frequency of microsatellites in the genic regions of the genome should make them an even more attractive marker type for future genetic analyses in plants (Morgante et al. 2002).

All the designed primer pairs amplified products; however, only 34 showed clear polymorphic bands among the spinach cultivars. The low polymorphism ratio might be due to the high level of similarity between different spinach cultivars. The average PIC value of the microsatellite markers used in this study among the spinach cultivars was 0.43. This value is higher than those of several species, such as P. somniferum (Celik et al. 2014), A. officinalis (Li et al. 2016), and Populus tomentosa (Du et al. 2012), but lower than those of other plants, including Z. jujuba (Xiao et al. 2015) and chickpea (Parida et al. 2015).

Genetic diversity analysis is essential in assessing the heterogeneity of the genetic pool among populations and cultivars, which, in turn, is beneficial for breeding programs and plant resources management (Ghaderi et al. 2014). A total of 41 microsatellite loci were used in this study to investigate the genetic diversity of 43 spinach cultivars. The results revealed that the genetic distance between these cultivars was low, indicating the low heterogeneity and narrow genetic pool of cultivated spinach in China. This finding is a reminder that spinach breeders should make relevant efforts, such as introgression from wild relatives and foreign cultivars, to broaden the genetic pool. Although the studied cultivars could be classified into three groups based on the microsatellite analysis results, no common factor was found to be shared within each of the groups. The foreign cultivar ‘Ribendaye’, which was derived from Japan, grouped with three domestic cultivars, indicating that these cultivars may have similar genetic backgrounds. Each group contained cultivars derived from different provinces, most likely because of seed exchange among different breeding institutes. This again implies that the genetic pool of the domestic spinach cultivars is relatively limited. The main differences between the various cultivars were observed in leaf shape, leaf color, growth duration, and stress resistance. However, distinctive cultivars characteristic of particular regions were lacking. Combined with efforts to extensively explore the good characteristics of the local cultivars, new cultivars should be bred to meet future market demands, such as good taste and high nutrition.

In conclusion, the current study reported the large-scale genome-wide identification and characterization of microsatellites in spinach. The investigation of the distributions and variations of the microsatellite repeats is helpful in further understanding the organization of this type of repetitive DNA element in the spinach genome. In addition, the large sets of microsatellites are valuable sources for developing markers that are useful for genetic studies and breeding applications. The low genetic distance indicates the narrow genetic pool of the Chinese spinach germplasm, reminding us that future spinach breeding efforts should enlarge the genetic pool through the introgression of foreign and/or wild resources.

Acknowledgements

This work was financially supported by grants from the National Natural Science Foundation of China (31470334 and 31770346), the Program for Innovative Research Team (in Science and Technology) in University of Henan Province (17IRTSTHN017), and the Foundation for Young Teachers in Colleges and Universities of Henan Province (2016GGJS-051).

Literature Cited
 
© 2018 by JAPANESE SOCIETY OF BREEDING