2018 Volume 68 Issue 4 Pages 393-403
Analyzing the genetic differences among crop germplasm resources scientifically and accurately is very important for the selection of core accessions, the identification of new cultivars, and the determination of seed purity. However, phenotypic selection per se is not sufficient to identify genetically distinct accessions. In this study, 26 out of 83 simple sequence repeat markers associated or linked with important agronomic traits of cotton in our previous and other published research, corresponding to the 26 chromosomes of Upland cotton (Gossypium hirsutum L.), were selected as core primers for constructing DNA fingerprints. The 26 markers showed clear band patterns, good repeatability and high polymorphism. The average number of alleles, gene diversity index and polymorphism information content were 3.12, 0.4312 and 0.3830, respectively. Using TM-1, a genetic standard line for Upland cotton, as the control, DNA fingerprinting patterns and DNA barcodes were obtained based on the core primers. There was a significant positive correlation between the genetic distance matrix determined using the 26 core primers and that determined using a larger set of 335 primers derived from previous research, further suggesting that the core primers were eminently suitable for DNA fingerprinting in Upland cotton. This study provides a molecular basis for assessing the identity, authenticity and seed purity of cotton cultivars.
Cotton (Gossypium spp.) is an important economic fiber crop, and cotton production has a significant role in the global economy (Stephens and Mosley 1974). Among the four cultivated cotton species, namely Upland cotton (Gossypium hirsutum L.), Sea Island cotton (Gossypium barbadense L.), Asiatic cotton (Gossypium arboreum L.) and African cotton (Gossypium herbaceum L.), Upland cotton (2n = 52, AADD) is the most widely cultivated species worldwide due to its high yield and wide adaptability, representing 94% of the growing area and accounting for 95% of world cotton production (Chen et al. 2007). In the process of breeding and production of Upland cotton, identification of differences among cultivars or germplasm accessions mainly depends on the description of phenotypic traits, such as plant height, plant architecture, early maturity, yield, fiber quality and disease resistance (Chen and Du 2006, Talib et al. 2015, Van Esbroeck et al. 1999). However, these traits vary according to environmental conditions, resulting in non-heritable phenotypic variation within a certain range occurring between different years or at different sites. Therefore, it is not reliable to identify individual accessions on the basis purely of phenotypic traits. Meanwhile, the genetic diversity of Upland cotton is much lower than that of the wild species because of the heavy reliance of commercial Upland cotton production on a limited number of genotypes as well as the long period of domestication of this crop under selection pressure for early maturity, insect resistance and disease resistance (Iqbal et al. 2001); as a consequence, the unambiguous identification of newly bred cultivars becomes more difficult. The application of transgenic technology to cotton breeding means that the number of cultivars with improvement in only a few or even a single gene has increased, so it is difficult to accurately distinguish the improved cultivars from their original parents by phenotypic traits. 
Therefore, establishing an objective, reliable and practicable identification technology to accurately identify different cotton cultivars is urgently needed.
With the rapid development of molecular marker technology, it has become possible to carry out fast and accurate identification of cultivars at the DNA level, which is not sensitive to environmental conditions (Gao et al. 2009). The International Union for the Protection of New Varieties of Plants (UPOV) has included identification of molecular markers into the Distinctness element of DUS (Distinctness, Uniformity and Stability) testing in crop varieties (See UPOV/INF/17 and UPOV/INF/18; http://www.upov.int/information_documents/en/). In China, identification at the DNA level is also an important measure for cultivar quality monitoring, which also provides a theoretical and legal basis for cultivar protection (Wang et al. 2007). DNA barcodes refer to relatively short DNA fragments representing the species, and they are standardizable, easy to amplify, and show sufficient genetic variation (Zhao et al. 2012). A DNA barcode is a DNA fingerprint identity, a digital representation of DNA fingerprinting, which can be used to distinguish different cultivars accurately at the DNA level. Of the various molecular markers, simple sequence repeat markers (SSRs) have the advantages of high polymorphism, good reproducibility, co-dominant inheritance, short amplification products and widespread distribution (Kashi et al. 1997, Röder et al. 1998), making them among the preferred markers for constructing DNA barcodes. Presently, SSRs have been used for DNA fingerprinting or barcode construction in many crops, such as rice (Yan et al. 2011), maize (Li et al. 2005), wheat (Li et al. 2006), soybean (Gao et al. 2009), rapeseed (Chen et al. 2008), sugarcane (Liu et al. 2010), tomato (Dhaliwal et al. 2011), and sesame (Wei et al. 2017), providing a strong guarantee of cultivar identification and intellectual property protection.
In cotton, research into DNA fingerprinting using molecular marker techniques has been reported. Punitha and Raveendran (2010) carried out a DNA fingerprinting study on colored- and white-linted genotypes, using randomly amplified polymorphic DNA (RAPD) markers, and cluster analysis showed clear-cut separation of the colored- and white-linted genotypes, forming three distinct clusters. Kuang et al. (2011) constructed a DNA fingerprinting database of 32 major Upland cotton cultivars from three main cotton regions of China, based on 36 SSRs. The result showed that ten cultivars could be distinguished using nine primers while thirty-two major cultivars could be identified using at least five primer combinations. In the study of Li et al. (2014), DNA fingerprinting of 30 major Upland cotton cultivars was analyzed using 20 SSRs. Four of the 30 cultivars had specific primers, i.e. each of the four cultivars could be distinguished from the others by using one specific primer; each of the remaining 26 cultivars could be distinguished from the others by using at least two primers. The molecular markers used in the above studies were all obtained from the cotton genome database and were therefore not directly related to cotton agronomic traits such as yield, fiber quality and plant architecture. As a result, their efficiency in identifying and distinguishing cultivars was not high, especially for new transgenic cultivars that differ from their parents only in small genomic segments or a few traits.
In our previous studies, SSRs distributed at approximately 10-cM intervals on each of the 26 chromosomes of the cotton tetraploid genetic map (Guo et al. 2007) and SSRs linked to QTL for agriculturally and economically significant traits of cotton (Li et al. 2013, Mei et al. 2004, Nguyen et al. 2004, Qin et al. 2008, Shen et al. 2007, Song and Zhang 2009, Zhang et al. 2003) were used to screen 172 Upland cotton cultivars. As a result, 331 polymorphic markers were obtained. Association mapping for important agronomic traits, including yield, fiber quality, early maturity and plant architecture, was conducted using these SSR markers, and some SSRs associated with target traits for cotton breeders were identified (Li et al. 2016a, 2016b, 2016c, 2017). On the basis of these studies, the SSR markers associated with cotton agronomic traits were used in the current study as core primers to construct DNA barcodes of popular Upland cotton cultivars, providing a molecular basis for assessing the identity, authenticity and seed purity of Upland cotton cultivars.
A total of 168 commercial Upland cotton cultivars, which have been or are being grown in different ecological cotton-growing areas of China, were selected as experimental materials. Of these, 61 were from the Yellow River basin, 26 were from the Yangtze River basin, 50 were from northwestern China, 20 were from northern China, and the remaining 11 were introduced from countries outside China. In 2015, all accessions were planted in the field of Henan Institute of Science and Technology, Xinxiang, Henan (113°52′ E, 35°18′ N, 95 m asl) in single rows with 14–16 plants per row.
Sources and screening of core primers
“Core primers” refers to a set of primers with good polymorphism, stability and repeatability that can serve as a preferred set of primers for related studies. In our previous study, association mapping for important agronomic traits of Upland cotton was carried out using 331 polymorphic SSR markers, and a number of markers significantly associated with yield (Li et al. 2017), fiber quality (Supplemental Table 1, unpublished), early maturity (Li et al. 2016b, 2016c), and plant architecture (Li et al. 2016a) were identified. In this study, core primers were screened from these trait-associated markers. Because individual primers might not exhibit high polymorphism, stability and reproducibility, SSR markers reported in published linkage or association mapping studies to be linked or associated with at least two agronomic traits of cotton (Abdurakhmonov et al. 2009, Cai et al. 2014, Mei et al. 2013, Shao et al. 2014, Sun et al. 2012) were also used to screen the core primers. Three parameters, the number of alleles, the gene diversity index (Di) and the polymorphism information content (PIC), were used to measure the polymorphism level of each primer across the 168 cotton accessions. For marker i, the values of Di and PIC were estimated using the formulas:

$$D_i = 1 - \sum_{j=1}^{n} P_{ij}^{2}$$

$$PIC_i = 1 - \sum_{j=1}^{n} P_{ij}^{2} - \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} 2 P_{ij}^{2} P_{ik}^{2}$$

where n is the total number of alleles of marker i and Pij is the frequency of the jth allele of marker i in the population.
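As a minimal sketch of how these two indices can be computed, the Python functions below assume the standard definitions Di = 1 − ΣPij² and PIC = Di − ΣΣ 2Pij²Pik² (summed over allele pairs j < k). The function names and the example allele frequencies are hypothetical and are not taken from the study's scripts; the frequencies were chosen only because they give values that round to the Di and PIC listed for the two-allele marker MUSS193 in Table 1.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Gene diversity index: Di = 1 - sum_j P_ij^2 for one marker."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = Di - sum over allele pairs (j < k) of 2 * P_ij^2 * P_ik^2."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - cross


# Hypothetical allele frequencies for a two-allele marker (illustrative only).
example_freqs = [0.375, 0.625]
di_value = gene_diversity(example_freqs)
pic_value = polymorphism_information_content(example_freqs)
print(f"Di = {di_value:.4f}, PIC = {pic_value:.4f}")
```

For a two-allele marker the PIC expression reduces to 2pq − 2p²q², which is why PIC is always somewhat smaller than Di.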
PCR detection and genotyping
DNA was extracted from young leaves of cotton plants expressing the phenotype characteristic of the cultivar in each row, using the CTAB method (Paterson et al. 1993). The PCR amplification procedure was as follows: pre-denaturation at 95°C for 4 min; 30 cycles of denaturation at 94°C for 30 s, annealing at 57°C for 45 s and extension at 72°C for 1 min; a final extension at 72°C for 7 min; and a hold at 10°C for 10 min. The amplification products were separated by polyacrylamide gel electrophoresis (PAGE) and detected by silver staining. SSR genotyping was carried out with reference to the methods of Zhao et al. (2012) and Mei et al. (2013), using TM-1, a genetic standard line for Upland cotton, as the control. Briefly, the electrophoresis fingerprinting pattern in TM-1 (numbered 1 in the panel) was designated as 1, patterns matching it were also designated as 1, and different patterns were in turn designated as 2, 3, 4, 5, and so on. In this way, the allelic variation matrix of all cultivars was constituted with the fingerprinting codes at each marker locus.
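The control-based coding scheme can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: each banding pattern is represented by a hypothetical tuple of band sizes, new patterns are numbered in order of first appearance, and the cultivar names other than TM-1 are placeholders.

```python
def assign_pattern_codes(patterns, control="TM-1"):
    """Map each cultivar's banding pattern at one marker to an integer code.

    patterns: dict of cultivar name -> hashable pattern (here a tuple of
    band sizes). The control's pattern is coded 1; every other distinct
    pattern receives the next unused integer in order of first appearance
    (the ordering rule is an assumption of this sketch).
    """
    code_of_pattern = {patterns[control]: 1}
    codes = {}
    for cultivar, pattern in patterns.items():
        if pattern not in code_of_pattern:
            code_of_pattern[pattern] = len(code_of_pattern) + 1
        codes[cultivar] = code_of_pattern[pattern]
    return codes


# Hypothetical band sizes (bp) at a single marker; not data from the paper.
marker_patterns = {
    "TM-1": (180, 195),
    "cultivar_A": (180, 201),
    "cultivar_B": (180, 195),
    "cultivar_C": (176, 195),
}
marker_codes = assign_pattern_codes(marker_patterns)
print(marker_codes)
```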
Construction of DNA barcodes
Only one marker was selected from each of the 26 chromosomes of Upland cotton for construction of DNA barcodes, based on the method developed by Yan et al. (2011) for the construction of the rice molecular identity database. As already noted, TM-1 was used as the control to obtain the fingerprinting codes of each material. After the fingerprinting codes of all the accessions (including TM-1) were obtained using the core primers, they were arranged in chromosome order, with the A subgenome (Chr.01–Chr.13) first and the D subgenome (Chr.14–Chr.26) next, to construct the DNA barcodes of the 168 Upland cotton cultivars.
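The assembly of per-chromosome codes into a barcode can be illustrated with a short Python sketch. The function name is hypothetical, the hyphen between the A and D subgenome blocks follows the layout of the barcodes in Table 2, and the sketch assumes every code is a single digit.

```python
def build_barcode(code_by_chromosome):
    """Concatenate one code per chromosome into an A-block and a D-block.

    code_by_chromosome: dict of chromosome number (1..26) -> pattern code.
    Chr.01-Chr.13 form the A subgenome block and Chr.14-Chr.26 the
    D subgenome block; the blocks are joined with a hyphen.
    """
    if sorted(code_by_chromosome) != list(range(1, 27)):
        raise ValueError("exactly one code is required for each of Chr.01-Chr.26")
    if any(not 1 <= code <= 9 for code in code_by_chromosome.values()):
        raise ValueError("this sketch assumes single-digit codes")
    a_block = "".join(str(code_by_chromosome[c]) for c in range(1, 14))
    d_block = "".join(str(code_by_chromosome[c]) for c in range(14, 27))
    return f"{a_block}-{d_block}"


# The control line carries code 1 at every locus by definition.
control_codes = {chrom: 1 for chrom in range(1, 27)}
control_barcode = build_barcode(control_codes)

# Hypothetical cultivar differing from the control on Chr.02 and Chr.14.
variant_codes = dict(control_codes)
variant_codes[2] = 3
variant_codes[14] = 2
variant_barcode = build_barcode(variant_codes)
print(control_barcode, variant_barcode)
```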
Cluster analysis and Mantel test
For the cluster analysis and Mantel test, the electrophoresis fingerprinting pattern codes of each cultivar (1, 2, 3, 4, 5, etc.) were converted to codes of 0 and 1, which represent the absence and presence, respectively, of each of the corresponding alleles. In this way, the allelic variation matrix of cultivars was constituted with the codes of 0 and 1 at each marker locus. Nei’s genetic distance among the 168 cotton cultivars, based on both the core primers and the larger primer set, was calculated using the nei.dist() function of the R package poppr (Kamvar et al. 2014). The Neighbor-Joining dendrogram for all cultivars was generated in poppr (Kamvar et al. 2014) from the genetic distance matrix based on the core primers. The correlation between the genetic distance matrix determined using the core primers and that determined using the larger primer set was analyzed by the Mantel test with 10000 permutations using the mantel() function of the R package vegan (Oksanen et al. 2017).
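The conversion from pattern codes to a 0/1 matrix can be sketched in Python as below. The sketch makes a simplifying assumption that is not stated in the text: each pattern code at a marker is treated as one presence/absence column. The function name and example values are hypothetical.

```python
def codes_to_binary(codes, n_codes_per_marker):
    """Expand one cultivar's pattern codes into presence/absence values.

    codes: pattern code of the cultivar at each marker (1-based integers).
    n_codes_per_marker: number of distinct codes observed at each marker.
    Each marker contributes one 0/1 column per possible code, so exactly
    one column per marker is set to 1 under this sketch's assumption.
    """
    row = []
    for code, n_codes in zip(codes, n_codes_per_marker):
        if not 1 <= code <= n_codes:
            raise ValueError("code outside the range observed for this marker")
        row.extend(1 if code == k else 0 for k in range(1, n_codes + 1))
    return row


# Hypothetical cultivar scored at two markers with 2 and 3 observed codes.
binary_row = codes_to_binary([1, 3], [2, 3])
print(binary_row)
```

Stacking one such row per cultivar gives a cultivar-by-column 0/1 matrix of the kind that distance functions typically take as input.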
A total of 78 SSRs, each associated with at least two agronomic traits in our previous association mapping research (Li et al. 2016a, 2016b, 2016c, 2017), were located on 24 chromosomes (all except Chr.04 and Chr.05) and three unknown chromosomes (Supplemental Table 2). Here, these markers were repeatedly evaluated, and ultimately 21 SSRs located on 21 chromosomes with clear band patterns, good repeatability and high polymorphism information content were identified (Table 1). They were CGR6078 on Chr.01 (A01), DPL0041 on Chr.02 (A02), MUSS162 on Chr.03 (A03), DPL0852 on Chr.07 (A07), CGR6103 on Chr.08 (A08), DPL0530 on Chr.09 (A09), NAU3467 on Chr.10 (A10), JESPR201 on Chr.11 (A11), CGR5193 on Chr.12 (A12), JESPR204 on Chr.13 (A13), NAU1070 on Chr.14 (D02), NAU3901 on Chr.15 (D01), TMB1268 on Chr.17 (D03), NAU2443 on Chr.18 (D13), NAU1102 on Chr.19 (D05), CGR6022 on Chr.20 (D10), NAU6966 on Chr.22 (D04), NAU5189 on Chr.23 (D09), DPL0068 on Chr.24 (D08), NAU3588 on Chr.25 (D06), and NAU3862 on Chr.26 (D12). In addition, five further SSRs were repeatedly evaluated: MUSS193 on Chr.04 (A04) and DPL0131 on Chr.21 (D11), reported to be linked with at least two agronomic traits by linkage mapping (Shao et al. 2014, Sun et al. 2012), and NAU3269 on Chr.05 (A05), BNL3650 on Chr.06 (A06) and BNL2634 on Chr.16 (D07), reported to be associated with at least two agronomic traits by association mapping (Abdurakhmonov et al. 2009, Cai et al. 2014, Mei et al. 2013). The amplification products of these five SSRs all had clear band patterns and good reproducibility. As a result, a total of 26 SSRs, corresponding to the 26 chromosomes of Upland cotton, were identified as core primers for this study (Table 1).
| Core primer | Chr. (Subgenome) | Forward primer 5′-3′/Reverse primer 5′-3′ | Sequence source | Allele No. | Di | PIC | Agronomic traits associated/linked with markers * | Literature cited |
|---|---|---|---|---|---|---|---|---|
| CGR6078 | Chr.01 (A01) | CATGCAAGAAAGCTGCTCAA/TAGGCATGTGTCTCCGTGTG | Genome | 3 | 0.3515 | 0.3241 | BW, FL, FS, FE, FU | Li et al. (2017); FL, FS, FE and FU are unpublished (Supplemental Table 1, similarly hereinafter) |
| DPL0041 | Chr.02 (A02) | GCATCATATCATGTCCCATTACAC/GGGAGAGAGTGTAGTATGTTTGGG | Genome | 5 | 0.4945 | 0.4666 | LP, FL, FS, FE | Li et al. (2017); FL, FS and FE are unpublished |
| MUSS162 | Chr.03 (A03) | TTGGTTGGTTAATTACGGGG/GGCTTGTATCTCCCAGCAAG | EST | 3 | 0.5675 | 0.5049 | BW, LI | Li et al. (2017) |
| MUSS193 | Chr.04 (A04) | GAAAATGAGCACTTCTCCGC/AATGCGAATTGATCCAACAG | EST | 2 | 0.4688 | 0.3589 | FL (12.9 cM), FM (2.22 cM), FU (6.20 cM), FE (19.71 cM) | Sun et al. (2012) |
| NAU3269 | Chr.05 (A05) | CGACTTAGCCGCCTATTAAA/TTTATCCTCGAACGACTTCC | EST | 2 | 0.3809 | 0.3083 | LY, SY, BN, LP, LI | Mei et al. (2013) |
| BNL3650 | Chr.06 (A06) | TCGATTTCCTTATTTGATTTCTG/AATTTGTCCAGATTCATTCTTCA | Genome | 3 | 0.4808 | 0.4322 | FL, FE | Abdurakhmonov et al. (2009) |
| DPL0852 | Chr.07 (A07) | GTTCCAAATCAATCTCGTGT/GGCTGTTACAGATCAAACTCCC | Genome | 3 | 0.5184 | 0.4602 | BN, FL, FS, FE, FU, HFFBN | Li et al. (2016c, 2017); FL, FS, FE and FU are unpublished |
| CGR6103 | Chr.08 (A08) | CAAAGGATGGGACACAGGTAA/TGCATTAGATACCGAAATGAGC | Genome | 3 | 0.2725 | 0.2537 | SY, FM | Li et al. (2017); FM is unpublished |
| DPL0530 | Chr.09 (A09) | AGACTTACTTAAAGGCACCATTCG/GCAGACTCTTCTGGTGTAACAGTG | Genome | 3 | 0.4062 | 0.3705 | FM, FL, FS, FU | FM, FL, FS and FU are unpublished |
| NAU3467 | Chr.10 (A10) | AGCTAAGCGCTTCAAGTTGT/ACGCATCCTAGAGGTCAGAA | EST | 3 | 0.5159 | 0.4233 | BP, BN, PH, EFB, FBA | Li et al. (2016a, 2016b, 2017) |
| JESPR201 | Chr.11 (A11) | TCGATCAGTTAGGGTTTTGG/CGAATCTCAACCAGATTTCC | Genome | 3 | 0.5616 | 0.4807 | FFBN, HFFBN | Li et al. (2016c) |
| CGR5193 | Chr.12 (A12) | GGCATCAGGTGCCCTCTTA/AGCAAGTCCGGCACAATC | Genome | 3 | 0.1768 | 0.1692 | LY, LP, LI | Li et al. (2017) |
| JESPR204 | Chr.13 (A13) | CTCCAGGTTCAATGGTCTG/GCCATGTTGGACAAGTAGTC | Genome | 2 | 0.2934 | 0.2503 | SP, BP, FM | Li et al. (2016b); FM is unpublished |
| NAU1070 | Chr.14 (D02) | CCCTCCATAACCAAAAGTTG/ACCAACAATGGTGACCTCTT | EST | 3 | 0.6326 | 0.5549 | LP, TFB, EFB, FBA | Li et al. (2016a, 2017) |
| NAU3901 | Chr.15 (D01) | AAGACAAAAGGCAAGGACAC/CTTGGAAAAAGGAAGAGCAG | EST | 4 | 0.4734 | 0.4186 | LP, LI, FM | Li et al. (2017); FM is unpublished |
| BNL2634 | Chr.16 (D07) | AACAACATTGAAAGTCGGGG/CCCAGCTGCTTATTGGTTTC | Genome | 3 | 0.6111 | 0.5355 | FL, FS, FM | Cai et al. (2014) |
| TMB1268 | Chr.17 (D03) | CAGGTACCATTGATGCCAAA/CTCGAAACCTAGTGCCCTGT | Genome | 3 | 0.2917 | 0.2723 | FBP, GP, FFBN | Li et al. (2016b, 2016c) |
| NAU2443 | Chr.18 (D13) | CGTTGAGAAGGAAAGCCTAA/AGCCTGCTTCATGTTCTTTT | EST | 4 | 0.3289 | 0.3081 | FS, FE, FU, HFFBN | Li et al. (2016c); FS, FE and FU are unpublished |
| NAU1102 | Chr.19 (D05) | ATCTCTCTGTCTCCCCCTTC/GCATATCTGGCGGGTATAAT | EST | 3 | 0.6648 | 0.5907 | GP, FFBN | Li et al. (2016b, 2016c) |
| CGR6022 | Chr.20 (D10) | TGTTTGGCATAAACCCGAAG/TTCTCTATAACCTCTACCCGCCTA | Genome | 3 | 0.5709 | 0.5037 | BW, SI | Li et al. (2017) |
| DPL0131 | Chr.21 (D11) | ACATACGGGTTGAAATGTACTCCT/ATGAATGCAGATCATTACGCCT | Genome | 3 | 0.3430 | 0.3164 | FE (0–14.4 cM), FS (0–14.4 cM) | Shao et al. (2014) |
| NAU6966 | Chr.22 (D04) | GTCATCATTATCGTCAAGTC/AAAGTGAGTTAAGAAAGGCT | Unknown | 3 | 0.3909 | 0.3572 | BW, LI, SI | Li et al. (2017) |
| NAU5189 | Chr.23 (D09) | TGTCCCCCAATCATATTTTC/CAACTTCCCAAGCTCGTATT | EST | 3 | 0.6602 | 0.5860 | FM, PH, HFFBN | Li et al. (2016a, 2016c); FM is unpublished |
| DPL0068 | Chr.24 (D08) | GTTCAACAGGTCTGTACCAGTTCC/GCAAATGATCTCTGCCCTGTAA | Genome | 3 | 0.2997 | 0.2780 | FBP, GP | Li et al. (2016b) |
| NAU3588 | Chr.25 (D06) | CCCCATAGGGCATACTTCTA/GCCAACAAGAACAACAACAC | EST | 5 | 0.3216 | 0.3038 | FM, FL, FS | FM, FL and FS are unpublished |
| NAU3862 | Chr.26 (D12) | TTGGAGAGGGAGATTGGTAG/GGATGAACTTTGCTTTAGCC | EST | 3 | 0.1346 | 0.1291 | SY, LP | Li et al. (2017) |
Di = gene diversity index; PIC = polymorphism information content.
The 26 SSRs revealed a total of 81 alleles ranging from 2 to 5 per primer with an average of 3.12. The Di and PIC ranged from 0.1346 to 0.6648 and from 0.1291 to 0.5907, respectively, and the average values were 0.4312 and 0.3830, respectively. The fingerprinting patterns of two representative primers, NAU1102 and DPL0041, on part of the cultivars, are shown in Fig. 1, where NAU1102 and DPL0041 amplified three and five alleles, respectively.
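These summary statistics can be checked directly against the per-primer values in Table 1. The short Python sketch below copies the Allele No., Di and PIC columns in chromosome order (Chr.01 to Chr.26); the variable names are illustrative.

```python
# Per-primer values copied from Table 1, ordered Chr.01 to Chr.26.
allele_counts = [3, 5, 3, 2, 2, 3, 3, 3, 3, 3, 3, 3, 2,
                 3, 4, 3, 3, 4, 3, 3, 3, 3, 3, 3, 5, 3]
di_values = [0.3515, 0.4945, 0.5675, 0.4688, 0.3809, 0.4808, 0.5184,
             0.2725, 0.4062, 0.5159, 0.5616, 0.1768, 0.2934,
             0.6326, 0.4734, 0.6111, 0.2917, 0.3289, 0.6648, 0.5709,
             0.3430, 0.3909, 0.6602, 0.2997, 0.3216, 0.1346]
pic_values = [0.3241, 0.4666, 0.5049, 0.3589, 0.3083, 0.4322, 0.4602,
              0.2537, 0.3705, 0.4233, 0.4807, 0.1692, 0.2503,
              0.5549, 0.4186, 0.5355, 0.2723, 0.3081, 0.5907, 0.5037,
              0.3164, 0.3572, 0.5860, 0.2780, 0.3038, 0.1291]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


total_alleles = sum(allele_counts)
mean_alleles = round(mean(allele_counts), 2)
mean_di = round(mean(di_values), 4)
mean_pic = round(mean(pic_values), 4)
print(total_alleles, mean_alleles, mean_di, mean_pic)
```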

Fig. 1. Electrophoresis fingerprinting patterns of two representative primers, NAU1102 and DPL0041, for part of the Upland cotton cultivars. Lanes 1–48 correspond to cultivars 1–48 in Table 2; the upper panel shows primer NAU1102 and the lower panel primer DPL0041; dotted arrows mark different alleles.
The genomic DNA of 168 cotton cultivars was amplified using the 26 core primers, and their corresponding fingerprints were obtained. Using TM-1 as the control, the electrophoresis fingerprinting pattern detected in TM-1 was denoted as 1, and fingerprinting codes of all cultivars (including TM-1) were obtained, which were recorded as 1, 2, 3, 4, 5 and so on. These codes were organized to form a data set according to the sequential arrangement of chromosome number and the order of A subgenome (Chr.01–Chr.13) first, followed by D subgenome (Chr.14–Chr.26), resulting in a code combination for the SSRs on 26 chromosomes for each cultivar. A total of 26 digits formed the SSR relative molecular identity number, i.e. DNA barcode, unique to each cotton cultivar (Table 2). The electrophoresis fingerprinting patterns corresponding to the barcodes are shown in Fig. 2. Barcodes and electrophoresis fingerprinting patterns can be used for verification of the identity of a specific cultivar, thus achieving the purpose of cultivar identification.
| No. | Cultivar | Ecological origin | DNA barcode * |
|---|---|---|---|
| 1 | TM-1 | USA | 1111111111111-1111111111111 |
| 2 | KK1543 | Former Soviet Union | 1122112222221-2222221121112 |
| 3 | Heishanmian | Northern China | 1121121232111-2332221111221 |
| 4 | Xinluzao1 | Northwestern China | 1121113212311-2332122211231 |
| 5 | Xinluzao2 | Northwestern China | 1212111212211-2312112211311 |
| 6 | Xinluzao3 | Northwestern China | 1121131212222-2312332112112 |
| 7 | Xinluzao4 | Northwestern China | 1311113212212-2322122213111 |
| 8 | Xinluzao5 | Northwestern China | 1111112211211-2412122312111 |
| 9 | Xinluzao6 | Northwestern China | 1431112212211-3312333211131 |
| 10 | Xinluzao7 | Northwestern China | 1112111232311-3312133313211 |
| 11 | Xinluzao8 | Northwestern China | 1112111232311-3312133313211 |
| 12 | Xinluzao9 | Northwestern China | 1111111212211-1412133213111 |
| 13 | Xinluzao10 | Northwestern China | 1131111212111-3422321112111 |
| 14 | 18-3 | Northwestern China | 1122121212111-2332421113111 |
| 15 | Xinluzao11 | Northwestern China | 1111111211211-2312123112141 |
| 16 | Xinluzao12 | Northwestern China | 3131132211211-2412121133111 |
| 17 | Xinluzao13 | Northwestern China | 1311111211212-1422122113111 |
| 18 | Xinluzao15 | Northwestern China | 1111122211131-2312112113111 |
| 19 | Xinluzao16 | Northwestern China | 2112213223112-1322132111111 |
| 20 | Xinluzao17 | Northwestern China | 1111112211111-1312112312111 |
| 21 | Xinluzao18 | Northwestern China | 1221111211111-2312112113131 |
| 22 | Xinluzao19 | Northwestern China | 1221111211211-2312133113121 |
| 23 | Xinluzao20 | Northwestern China | 3521113232112-2222422311131 |
| 24 | Xinluzao21 | Northwestern China | 1521113212112-2322132312111 |
| 25 | Xinluzao22 | Northwestern China | 1111211221212-1322122113111 |
| 26 | Xinluzao23 | Northwestern China | 1511111331312-3322131132221 |
| 27 | Xinluzao24 | Northwestern China | 2311113111212-1332121121311 |
| 28 | Xinluzao25 | Northwestern China | 1412111212211-2313133111111 |
| 29 | Xinluzao26 | Northwestern China | 1512112211111-3312132111111 |
| 30 | Xinluzao27 | Northwestern China | 3412113211211-2312122222211 |
| 31 | Xinluzao28 | Northwestern China | 2311113321212-1332121121311 |
| 32 | Xinluzao29 | Northwestern China | 2521211231111-2332311113111 |
| 33 | Xinluzao30 | Northwestern China | 2321213223212-2312422211131 |
| 34 | Xinluzao31 | Northwestern China | 2311113321212-1321121121321 |
| 35 | Xinluzao32 | Northwestern China | 2111113211111-1422421213111 |
| 36 | Xinluzao33 | Northwestern China | 1112111211312-1322431113111 |
| 37 | Xinluzao34 | Northwestern China | 1512111212111-2322111311111 |
| 38 | Xinluzao35 | Northwestern China | 1321211211111-2322112111111 |
| 39 | Xinluzao36 | Northwestern China | 1131121222111-3412111113311 |
| 40 | Xinluzao37 | Northwestern China | 1312111321112-1422132131311 |
| 41 | Xinluzao38 | Northwestern China | 1122131213211-2323221223111 |
| 42 | Xinluzao39 | Northwestern China | 2311113121212-1322121121311 |
| 43 | Xinluzao40 | Northwestern China | 2111112212112-1322133112212 |
| 44 | Xinluzao41 | Northwestern China | 1111211212211-2422112113111 |
| 45 | Xinluzao42 | Northwestern China | 2122121222211-2412122331111 |
| 46 | Baimian1 | Yellow River | 1111131212211-1412131113111 |
| 47 | Xinluzao46 | Northwestern China | 2211113211211-1412122121111 |
| 48 | Xinluzao47 | Northwestern China | 1232112321212-1322131131111 |
| 49 | Xinluzao48 | Northwestern China | 3121113213111-2332312113111 |
| 50 | Xinluzao49 | Northwestern China | 1321121213211-2322312113111 |
| 51 | Xinluzao51 | Northwestern China | 3112131211211-2322312313111 |
| 52 | Xi9 | Yellow River | 3111113311211-2322312132111 |
| 53 | Xinluzhong36 | Northwestern China | 1212132211311-3322312113111 |
| 54 | CRI8 | Yellow River | 2111111213211-3322112131111 |
| 55 | CRI10 | Yellow River | 1211111212211-2422121111111 |
| 56 | CRI12 | Yellow River | 1132211211122-3322112112111 |
| 57 | CRI13 | Yellow River | 1112111221111-2323122113121 |
| 58 | CRI14 | Yellow River | 3112113213311-2321232212131 |
| 59 | CRI15 | Yellow River | 1111111211211-1412111111111 |
| 60 | Zhong1707 | Yellow River | 1211111211111-1323112111111 |
| 61 | CRI17 | Yellow River | 1112211231211-2323132112111 |
| 62 | CRI18 | Yellow River | 1112121211111-2322133133111 |
| 63 | CRI19 | Yellow River | 3132131212211-2332123111111 |
| 64 | CRI20 | Yellow River | 1122132221212-2412132112111 |
| 65 | CRI22 | Yellow River | 1112232212111-3422123111111 |
| 66 | CRI23 | Yellow River | 1111211211221-3312112133112 |
| 67 | CRI24 | Yellow River | 1111111231111-1112113111111 |
| 68 | CRI25 | Yellow River | 1111211211211-2412112122111 |
| 69 | CRI26 | Yellow River | 1132111232211-3412111112211 |
| 70 | CRI27 | Yellow River | 1111113212211-1412111113311 |
| 71 | CRI30 | Yellow River | 1222111222211-2312133112111 |
| 72 | CRI33 | Yellow River | 1111111211211-1412122111121 |
| 73 | CRI34 | Yellow River | 1111111212211-1322111111111 |
| 74 | CRI35 | Yellow River | 1111111212211-1432111111111 |
| 75 | CRI36 | Yellow River | 1111111211211-1432122111121 |
| 76 | CRI37 | Yellow River | 1112131212311-1312132332111 |
| 77 | CRI40 | Yellow River | 1112213211211-2332112111111 |
| 78 | CRI50 | Yellow River | 1131112212112-1322132112321 |
| 79 | CRI58 | Yellow River | 1511111211111-1313112113231 |
| 80 | CRI64 | Yellow River | 1111112311211-1312122112111 |
| 81 | Zhongzhimian2 | Yellow River | 1112231211311-3412122122111 |
| 82 | Liaomian4 | Northern China | 1111212211211-1312132111121 |
| 83 | Liaomian5 | Northern China | 1131212211211-2312132113111 |
| 84 | Liaomian7 | Northern China | 1112111212311-1312122131111 |
| 85 | Liaomian8 | Northern China | 1112123212311-2423131112111 |
| 86 | Liaomian16 | Northern China | 1111112211121-1312133121112 |
| 87 | Liaomian18 | Northern China | 2211112211211-1312121121111 |
| 88 | Liaomian19 | Northern China | 1111112211111-1312121111111 |
| 89 | Baimian985 | Yellow River | 1111111211111-1112112112111 |
| 90 | Yumian1(CQ) | Yangtze River | 1131111211111-3312412112111 |
| 91 | Baimian5 | Yellow River | 3111211212111-1332112131111 |
| 92 | Yumian1(HN) | Yellow River | 1112111211211-1311112113111 |
| 93 | Yumian5 | Yellow River | 1222113211311-2411132313111 |
| 94 | Yumian7 | Yellow River | 1131222221211-3412222111111 |
| 95 | Yumian12 | Yellow River | 1512131132211-3321133112121 |
| 96 | Yumian21 | Yellow River | 2131111231211-3322133132211 |
| 97 | Lumian1 | Yellow River | 1131211312211-3412112113111 |
| 98 | Lumian4 | Yellow River | 1131231311211-2312112113111 |
| 99 | Lumian6 | Yellow River | 1311111112211-1311112111111 |
| 100 | Lumian10 | Yellow River | 1111111211211-1312112113111 |
| 101 | Lumianyan21 | Yellow River | 1111131212111-1412112113111 |
| 102 | Lumianyan28 | Yellow River | 1111211212211-2412112123111 |
| 103 | Lumianyan29 | Yellow River | 1121112211111-2312332111111 |
| 104 | Shiyuan321 | Yellow River | 1131111211131-3423122111111 |
| 105 | Fenwu195 | Yellow River | 1121111212231-2433122311111 |
| 106 | Qianjiang9 | Yangtze River | 1111111111111-1432132113111 |
| 107 | Xiangmian3 | Yangtze River | 1112111232111-1312112133111 |
| 108 | Xiangmian10 | Yangtze River | 1121213211111-2332132213111 |
| 109 | Yanmian48 | Yangtze River | 1112111212211-1312332312211 |
| 110 | Jiangsumian1 | Yangtze River | 1112211211111-1322322313111 |
| 111 | Sumian1 | Yangtze River | 1312111212211-1332322112211 |
| 112 | Sumian6 | Yangtze River | 3221211211231-2322121111142 |
| 113 | Sumian9 | Yangtze River | 3112211211111-1312312113111 |
| 114 | Sumian10 | Yangtze River | 1132131212111-3322121132111 |
| 115 | Sumian12 | Yangtze River | 3311111232211-1312123112111 |
| 116 | Sumian16 | Yangtze River | 3331212311222-3333121332111 |
| 117 | Xuzhou142 | Yangtze River | 3112221212111-2412313312111 |
| 118 | Qiannong465 | Yangtze River | 1532132332311-3312133112111 |
| 119 | Dongting1 | Yangtze River | 1111212111111-1311111112111 |
| 120 | Xinqiu1 | Yellow River | 1111131331311-3113112112111 |
| 121 | Simian2 | Yangtze River | 1112131211111-1322132113211 |
| 122 | Simian3 | Yangtze River | 1132211232211-3312132112112 |
| 123 | Simian4 | Yangtze River | 1111211231111-1422112213111 |
| 124 | Chuanmian56 | Yangtze River | 2531132231221-2322333113112 |
| 125 | Jinzhong169 | Northern China | 1121111111111-2312122113151 |
| 126 | Jinzhong200 | Northern China | 1111131121132-2312222111112 |
| 127 | Jinmian5(SX) | Northern China | 1222121221111-2322121112231 |
| 128 | Jinmian6 | Northern China | 1211211211212-2311121111331 |
| 129 | Jinmian8 | Northern China | 1531131331111-2313131113211 |
| 130 | Jinmian9 | Northern China | 1232221211131-2411131113213 |
| 131 | Jinmian13 | Yellow River | 1131121211111-3112332113111 |
| 132 | Jinmian14 | Northern China | 1111112212232-1332121111111 |
| 133 | Jinmian24 | Northern China | 1131131212211-3232113111211 |
| 134 | Jinmian29 | Yellow River | 1111112212211-1132122111151 |
| 135 | Jinmian36 | Yellow River | 1321111212212-1332122111111 |
| 136 | Jinmian45 | Yellow River | 1112211212111-1312122131111 |
| 137 | Jinmian1 | Northern China | 1122221212111-2312122133111 |
| 138 | Jinmian2 | Northern China | 3132221211311-3312132132221 |
| 139 | Jinmian4 | Northern China | 1122111221212-2322132312131 |
| 140 | Jinmian5 | Northern China | 1111121211211-2412122111111 |
| 141 | Guoxinmian3 | Yellow River | 3132132212212-1322132133121 |
| 142 | Jimian958 | Yellow River | 1111211212211-2132111123111 |
| 143 | Jimian1 | Yellow River | 1121221211211-2131311113111 |
| 144 | Jimian7 | Yellow River | 1122122213312-1332111113121 |
| 145 | Jimian12 | Yellow River | 1112111321311-1312123113131 |
| 146 | Ejing1 | Yangtze River | 1111111331111-1322123113121 |
| 147 | Emian3 | Yangtze River | 1511211212211-1412111111111 |
| 148 | Emian14 | Yangtze River | 2521131211211-2222132122111 |
| 149 | Esha28 | Yangtze River | 1531132212311-3332132112111 |
| 150 | Jing8891 | Yangtze River | 1111111211111-1312121113131 |
| 151 | Daihongdai | Yangtze River | 1211111211211-1312122211311 |
| 152 | STV2B | USA | 1122112312211-2312133313111 |
| 153 | DPL15 | USA | 1112211211211-2322133113111 |
| 154 | DPL16 | USA | 3512131211211-1331112111121 |
| 155 | Shanmian4 | Yellow River | 1112232212211-1312113132111 |
| 156 | Shan1155 | Yellow River | 1112232211211-2322121112111 |
| 157 | Shan2365 | Yellow River | 1121211211211-2412122123111 |
| 158 | Handan802 | Yellow River | 1132131211211-3112133132111 |
| 159 | Handan885 | Yellow River | 1131121211211-3412132113111 |
| 160 | Ganmian8 | Yangtze River | 1111211211211-1312112113111 |
| 161 | Uganda3 | Uganda | 1121221211131-2312122113113 |
| 162 | Lvzao254 | Northwestern China | 1111132231211-1312132112111 |
| 163 | Shixuan87 | Uzbekistan | 1111122211211-2311121111111 |
| 164 | Beishinuo | USA | 1122121221211-2122122313111 |
| 165 | 99M4 | USA | 1121112231211-2322132113111 |
| 166 | 99M7 | USA | 1331112311232-3333132122113 |
| 167 | 99M8 | USA | 1522112231211-2322112223111 |
| 168 | Lamagan77 | Northwestern China | 3522113211112-2333332313211 |
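To illustrate how two barcodes from Table 2 can be compared, the Python sketch below parses each barcode into its 26 per-chromosome codes and lists the chromosomes at which two cultivars differ. The function names are hypothetical, and the sketch assumes the single-digit, hyphen-separated layout used in the table.

```python
def parse_barcode(barcode):
    """Split an 'A-block-D-block' barcode into 26 integer codes."""
    a_block, d_block = barcode.split("-")
    if len(a_block) != 13 or len(d_block) != 13:
        raise ValueError("expected 13 codes for each subgenome")
    return [int(ch) for ch in a_block + d_block]


def differing_chromosomes(barcode_1, barcode_2):
    """Return chromosome numbers (1..26) at which the two barcodes differ."""
    codes_1 = parse_barcode(barcode_1)
    codes_2 = parse_barcode(barcode_2)
    return [chrom
            for chrom, (x, y) in enumerate(zip(codes_1, codes_2), start=1)
            if x != y]


# Barcodes copied from Table 2 (No. 1, TM-1, and No. 59, CRI15).
tm1_barcode = "1111111111111-1111111111111"
cri15_barcode = "1111111211211-1412111111111"
tm1_vs_cri15 = differing_chromosomes(tm1_barcode, cri15_barcode)
print(tm1_vs_cri15)
```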

Fig. 2. Electrophoresis fingerprinting patterns of the 26 core primers corresponding to the DNA barcodes.
Revealing the genetic relationships among cultivars can provide important supporting information for cultivar identification. Based on genotype data from the 26 core primers, cluster analysis of the 168 Upland cotton cultivars was carried out and a Neighbor-Joining dendrogram was obtained (Fig. 3). The 168 cultivars could be categorized into nine groups (A–I). Group A contained 17 cultivars, of which three came from the Yellow River basin, one from the Yangtze River basin, 12 from northwestern China, and one from abroad. Group B contained 27 cultivars, of which 10 came from the Yellow River basin, three from the Yangtze River basin, six from northwestern China, seven from northern China, and one from abroad. Group C contained 12 cultivars, of which nine came from the Yellow River basin and one each from the Yangtze River basin, northwestern China and northern China. Group D contained 33 cultivars, of which 17 came from the Yellow River basin, nine from the Yangtze River basin, four from northwestern China, one from northern China, and two from abroad. Group E contained 13 cultivars, of which six came from northwestern China, four from northern China, and three from abroad. Group F contained 18 cultivars, of which one came from the Yellow River basin, three from the Yangtze River basin, 11 from northwestern China, one from northern China, and two from abroad. Group G contained 21 cultivars, of which 12 came from the Yellow River basin, two from the Yangtze River basin, three from northwestern China, and four from northern China. Group H contained 11 cultivars, of which three came from the Yellow River basin, four from the Yangtze River basin, three from northwestern China, and one from abroad. Group I contained 16 cultivars, of which six came from the Yellow River basin, three from the Yangtze River basin, four from northwestern China, two from northern China, and one from abroad.

Neighbor-Joining (NJ) dendrogram for 168 Upland cotton cultivars based on the genetic distance matrix determined using 26 core primers (All cultivars were categorized into nine groups, Group A–I).
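The clustering behind a dendrogram such as Fig. 3 can be illustrated with a minimal sketch of the standard Neighbor-Joining procedure. This is not the software used in the study, which is not named in this section; the function name `neighbor_joining`, the five taxa and the toy distance matrix are all hypothetical and serve only to show how pairs are joined and branch lengths assigned.

```python
import numpy as np

def neighbor_joining(dist, names):
    """Neighbor-Joining on a symmetric distance matrix.

    Returns (joins, final_edge): joins is a list of
    (left, right, left_branch, right_branch) tuples in joining order, and
    final_edge is (node_a, node_b, length) for the last remaining pair.
    """
    d = np.array(dist, dtype=float)
    labels = list(names)
    joins = []
    while len(labels) > 2:
        n = len(labels)
        r = d.sum(axis=1)  # total distance from each node to all others
        # Q-matrix: the pair with the smallest Q value is joined next.
        q = (n - 2) * d - r[:, None] - r[None, :]
        np.fill_diagonal(q, np.inf)
        i, j = (int(k) for k in np.unravel_index(np.argmin(q), q.shape))
        if i > j:
            i, j = j, i
        # Branch lengths from the two joined nodes to their new parent.
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        joins.append((labels[i], labels[j], li, lj))
        # Distances from the new parent node to every remaining node.
        to_new = 0.5 * (d[i] + d[j] - d[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        d_next = np.zeros((n - 1, n - 1))
        d_next[:-1, :-1] = d[np.ix_(keep, keep)]
        d_next[-1, :-1] = to_new[keep]
        d_next[:-1, -1] = to_new[keep]
        d = d_next
        labels = [labels[k] for k in keep] + [f"({labels[i]},{labels[j]})"]
    return joins, (labels[0], labels[1], d[0, 1])

# Hypothetical toy data: five taxa with an additive distance matrix.
toy_names = ["a", "b", "c", "d", "e"]
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy_joins, toy_final = neighbor_joining(toy_dist, toy_names)
for left, right, bl, br in toy_joins:
    print(f"join {left} ({bl:.2f}) with {right} ({br:.2f})")
```

For 168 cultivars a dedicated phylogenetics package would normally be used; the sketch only makes the joining criterion explicit.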
Correlation analysis between the genetic distance matrix determined by the 26 core primers and that determined by the 335 primers (330 from our previous work and the other five from published research) was carried out for 165 cultivars using the Mantel test with 10,000 permutations. Three cultivars, TM-1, Baimian5 and Yumian1(HN), were excluded from the 168 cultivars because they had not been genotyped with the 309 primers other than the 26 core primers in our previous work. The R scripts can be found in Supplemental Data 1; the raw genotype data of 168 cultivars for 26 primers and of 165 cultivars for 335 primers can be found in Supplemental Data 2 and Supplemental Data 3, respectively. The results showed a significant positive correlation between the two genetic distance matrices (r = 0.3078, P = 9.999e-05), indicating that the genetic relationships revealed by the 26 core primers were highly similar to those revealed by the 335 primers. This further illustrates that the core primers are reliable for diversity evaluation of Upland cotton cultivars and eminently suitable for DNA fingerprinting in Upland cotton.
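The Mantel test described above can be sketched as follows. This is a minimal Python illustration, not the authors' R script from Supplemental Data 1; the function name `mantel_test`, the simulated matrices and the one-sided P-value convention (observed statistic counted among the permutations) are assumptions. Under that convention the smallest attainable P-value with 10,000 permutations is 1/10,001, or about 9.999e-05, which would be consistent with the reported value if no permuted correlation reached the observed one.

```python
import numpy as np

def mantel_test(d1, d2, permutations=10000, seed=0):
    """One-sided Mantel test for positive correlation between two
    symmetric distance matrices of the same size."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    upper = np.triu_indices(n, k=1)  # each pair counted once
    x = d1[upper]
    r_obs = np.corrcoef(x, d2[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        perm = rng.permutation(n)
        # Permute rows and columns of d2 together to keep it a valid
        # distance matrix while breaking its link to d1.
        r_perm = np.corrcoef(x, d2[np.ix_(perm, perm)][upper])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    # Assumed convention: the observed value counts as one permutation.
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value

# Hypothetical data: distances among random points, plus a noisy copy.
demo_rng = np.random.default_rng(42)
points = demo_rng.normal(size=(20, 3))
base = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
noise = demo_rng.normal(scale=0.5, size=base.shape)
noisy = np.abs(base + (noise + noise.T) / 2)
np.fill_diagonal(noisy, 0.0)
r, p = mantel_test(base, noisy, permutations=999)
print(f"r = {r:.4f}, P = {p:.4g}")
```

Because the entries of a distance matrix are not independent, the permutation of whole rows and columns, rather than of individual distances, is what makes the P-value meaningful.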
A DNA barcode is the digital representation of a DNA fingerprint. Different researchers have used different DNA barcoding methods for their experimental materials. Most studies code marker genotypes as 1 or 0, according to the presence or absence of a band, respectively, and construct DNA fingerprints as a binary (0–1) numeric string (Dhaliwal et al. 2011, Li et al. 2006, Liu et al. 2010, Pan et al. 2008) or a decimal numeric string (Li et al. 2014, Pan et al. 2010). As Upland cotton (2n = 52, AADD) is an allotetraploid, the number of SSR bands is large and the bands appear in groups, so that the SSR band pattern of Upland cotton is more complex than that of diploid crops. Scoring the presence or absence of each band as 1 or 0 might therefore lead to misjudgment. For this reason, whole electrophoresis fingerprinting patterns, rather than individual bands, were assigned codes in this study, following the method of Zhao et al. (2012). Because TM-1, a genetic standard line for Upland cotton, is highly homozygous and widely used in genetic and breeding research, it was used as the control and its electrophoresis fingerprinting pattern was assigned the code 1. The genotype codes of each cultivar for the individual primers were then arranged in the order of the chromosomes on which the markers were located, with the chromosomes of genome A first and those of genome D next, giving a unique DNA barcode for each cotton cultivar. The codes are generally less than 10, and the electrophoresis fingerprinting patterns corresponding to the codes are shown in Fig. 2. For these reasons, we consider our method for DNA fingerprinting in Upland cotton more reasonable and convenient than the traditional methods.
We propose that TM-1 could be used as the standard control material in the future for construction of all DNA barcodes of Upland cotton, so that DNA barcodes from different research groups can be used for comparison and identification on the DNA level among different cultivars.
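The barcode construction described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' procedure: the TM-1 pattern is coded 1 as in the text, but numbering the remaining patterns in order of first appearance is an assumption (the actual pattern-to-code assignment follows Fig. 2), and the marker names, chromosome labels (`A05`, `D02`), cultivar names other than TM-1 and band sizes are all hypothetical.

```python
def assign_pattern_codes(patterns, control="TM-1"):
    """Map each cultivar's band pattern at one primer to an integer code.

    The control's pattern is coded 1; other patterns are numbered 2, 3, ...
    in order of first appearance (an assumption made for this sketch).
    """
    codes = {patterns[control]: 1}
    for pattern in patterns.values():
        if pattern not in codes:
            codes[pattern] = len(codes) + 1
    return {cultivar: codes[pattern] for cultivar, pattern in patterns.items()}

def chromosome_sort_key(chrom):
    """Sort A-genome chromosomes before D-genome ones, numerically within each."""
    return (chrom[0], int(chrom[1:]))

def build_barcodes(genotypes, marker_chrom, control="TM-1"):
    """Concatenate per-primer codes in chromosome order (A first, then D).

    Joining without a separator assumes every code is below 10.
    """
    ordered = sorted(genotypes, key=lambda m: chromosome_sort_key(marker_chrom[m]))
    coded = {m: assign_pattern_codes(genotypes[m], control) for m in ordered}
    cultivars = list(genotypes[ordered[0]])
    return {c: "".join(str(coded[m][c]) for m in ordered) for c in cultivars}

# Hypothetical band patterns (fragment sizes in bp) for two primers.
example_genotypes = {
    "SSR_a": {"TM-1": (150, 160), "CultivarX": (150, 160), "CultivarY": (150, 170)},
    "SSR_b": {"TM-1": (200,), "CultivarX": (210,), "CultivarY": (200,)},
}
example_chrom = {"SSR_a": "D02", "SSR_b": "A05"}
example_barcodes = build_barcodes(example_genotypes, example_chrom)
print(example_barcodes)
```

Coding every primer relative to the same control line is what would make barcodes from different laboratories directly comparable, provided the pattern-to-code table is shared as well.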
In recent years, cotton DNA fingerprinting has been widely reported. Guo et al. (1996) constructed DNA fingerprints of nine main Upland cotton cultivars in China using RAPD markers. Song et al. (1999) distinguished eight cotton cultivars from each other using 26 AFLP markers. Wang and Li (2002) obtained the DNA fingerprints of the brown cotton “Three Lines” (sterile, maintainer, restorer) and their F1 hybrid using AFLP markers. However, these studies were carried out in the early stages of molecular marker development, and the experimental materials used for constructing fingerprints were mainly confined to the target materials, without considering additional background materials. In the process of DNA fingerprinting construction, it is necessary to select more background germplasm for screening primer polymorphism and for enhancing the specificity of cultivar identification (Li et al. 2011). Because the level of polymorphism among Upland cotton cultivars is low, 168 cultivars of Upland cotton were selected as experimental materials for repeated screening of primers, in order to identify SSR loci that covered every chromosome, exhibited high polymorphism and effectively distinguished the genomic characteristics of different cotton accessions. These cultivars acted not only as target materials but also as background materials. They either came from one of the four major cotton-growing areas in China, each with different environmental characteristics, or were introduced from foreign countries, so they covered a wide range of pedigrees, including the Stoneville pedigree (e.g., Lumian6, Sumian1 and Sumian12), the Foster pedigree (e.g., Xinluzao11, Liaomian10 and Liaomian18), the King pedigree (e.g., Heishanmian, Xinluzao10 and Xinluzao23), the Deltapine pedigree (e.g., Xinluzao5, CRI19 and Yumian1), Trice cotton (e.g., Qiannong465) and the Uganda pedigree (e.g., CRI12, Yumian21 and Uganda3).
The Neighbor-Joining dendrogram based on the 26 core primers further reflected the genetic relationships among the cultivars, which were categorized into nine groups. The 168 cultivars were thus used both as background materials to identify genome-wide polymorphic markers and as target materials to identify stable polymorphic markers on the 26 chromosomes of Upland cotton for the subsequent DNA fingerprinting.
SSR markers are among the most widely used molecular markers at present. They are co-dominant, simple, easy to automate and abundant. Since their amplification products are stable, they have great advantages for use in the analysis of cotton genome evolution, genetic diversity and DNA fingerprinting (Guo et al. 2003, Kantartzi et al. 2009, Khan et al. 2000). SSRs can be classified into three types (Li et al. 2014). The first type comprises markers within genes, which can identify genes associated with important agronomic traits in crops. The second type comprises markers closely linked to traits, which can identify disease resistance genes, stress tolerance genes and some important quality trait genes in crops. The third type comprises markers loosely linked with trait genes. When using DNA fingerprinting to identify crop cultivars and to determine their purity, priority should be given to the first two types of markers because of the correlation between markers and traits. In the past, not only was the number of basic primers used for cotton fingerprinting limited, but these primers were also mostly of the third type, so that either the polymorphism level of the primers was low or there was no correspondence between polymorphism and phenotypic differences among cultivars, resulting in low detection efficiency. In this study, the basic primers used for identifying core primers came from two sources. One source was the SSR markers selected from the 26 chromosomes of Upland cotton based on the genetic map constructed in previous research (Guo et al. 2007). Although these markers were mainly of the third type, they were highly polymorphic among accessions because they were derived from the published genetic map. The other source was published SSR markers closely linked to important trait genes of cotton (Li et al. 2013, Mei et al. 2004, Nguyen et al. 2004, Qin et al. 2008, Shen et al. 2007, Song and Zhang 2009, Zhang et al. 2003).
These markers were mainly of the first and the second types. Therefore, compared with previous studies, the basic primers selected in this study had a wider range of sources and could be used for effective screening of polymorphisms among accessions.
Determination of core primers is an important part of a DNA fingerprinting analysis system, and it is also a key step in the commercialization of DNA fingerprint identification. It greatly reduces both the cost of primer synthesis and the intensive work of screening primers. It also allows fingerprinting results from different research groups to be compared and integrated. Pan et al. (2008) used 12 accessions with significantly different phenotypes and genetic backgrounds as the panel germplasm for screening 5,914 pairs of SSR primers, and the results showed that 319 pairs of primers exhibiting suitable amplification and clear bands could be regarded as core primers for determining fingerprints. Moreover, they recommended 13 SSR primer pairs expressing polymorphism in Upland cotton, Sea Island cotton and Asiatic cotton, which could be regarded as the first-choice markers for DNA fingerprinting and germplasm identification. Zhao et al. (2012) selected 12 cotton cultivars derived from different pedigrees and different cotton-growing areas to identify SSR markers exhibiting high levels of polymorphism. Twenty-six SSR primer pairs were tagged onto the corresponding 26 chromosomes of cultivated tetraploid cotton species and were recommended as the first-choice primer pairs for establishing DNA barcodes of cotton cultivars, while another 25 SSR primers could be used as candidate primers. These core primers have laid the foundation for the establishment of a standard DNA fingerprint database of Upland cotton. However, the markers used in the above studies were all obtained from cotton genome databases and are not directly related to cotton agronomic traits, so their efficiency in identifying and distinguishing cultivars is not high. In particular, owing to the application of transgenic technology to cotton breeding in recent years, some new transgenic cultivars differ from their parents only with respect to small genomic segments or a few traits.
It is therefore very difficult to accurately distinguish closely related accessions using a limited number of molecular markers unrelated to phenotypic traits. In this study, the selection of core primers depended on both the level of marker polymorphism and the relationships between markers and agronomic traits. Although the average number of alleles (3.12), gene diversity index (Di, 0.4312) and PIC (0.3830) of the 26 core primers were lower than those reported by Lacape et al. (2007) (an average of 5.6 alleles and a PIC of 0.55) and Moiana et al. (2012) (an average of 6.9 alleles and a PIC of 0.646), they were similar to those reported by Bertini et al. (2006) (an average of 2.13 alleles and a PIC of 0.40), Fang et al. (2013) (an average of 2.64 alleles and a PIC of 0.2869) and Zhao et al. (2015) (an average of 2.26 alleles, a Di of 0.3502 and a PIC of 0.2857). Above all, these markers were all associated/linked with at least one agronomic trait of cotton. Therefore, the 26 core primers identified in this study are eminently suitable for DNA fingerprinting in Upland cotton. This is the main innovation of this study.
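The three summary statistics compared above can be computed per marker as in the following sketch. It assumes the commonly used definitions, gene diversity as 1 − Σp_i² and PIC as 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², where p_i is the frequency of allele i; the formulas actually used are given in the methods of the study, not in this section, and the function name `marker_stats` and the allele calls below are hypothetical.

```python
from collections import Counter
from itertools import combinations

def marker_stats(alleles):
    """Return (number of alleles, gene diversity, PIC) for one marker.

    alleles: a flat list of allele calls across accessions.
    Assumed definitions:
      gene diversity = 1 - sum(p_i^2)
      PIC            = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    freqs = [count / total for count in counts.values()]
    gene_diversity = 1.0 - sum(p * p for p in freqs)
    pic = gene_diversity - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return len(freqs), gene_diversity, pic

# Hypothetical marker with two equally frequent alleles.
n_alleles, diversity, pic = marker_stats(["a"] * 5 + ["b"] * 5)
print(n_alleles, round(diversity, 4), round(pic, 4))
```

Under these definitions PIC can never exceed gene diversity for the same marker, which agrees with the ordering of the reported averages (0.3830 and 0.4312).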
Molecular genetic studies suggest that genetic correlations among different quantitative traits may be a result of gene interaction or pleiotropy (Lehner 2011). For these reasons, molecular markers associated with cotton agronomic traits including yield, fiber quality, early maturity, and plant architecture (Abdurakhmonov et al. 2009, Cai et al. 2014, Li et al. 2016a, 2016b, 2016c, 2017, Mei et al. 2013, Shao et al. 2014, Sun et al. 2012) were selected as core primers to conduct DNA fingerprinting in this study. Because the DNA differences corresponded to phenotypic differences, and also because 11 out of the 26 core primers were derived from ESTs (expressed sequence tags, Table 1) corresponding to parts of specific expressed gene sequences, the ability to distinguish different cultivars, especially those with only small phenotypic differences, was improved. There was a significant positive correlation between the genetic distance matrix calculated from the 26 core primers and that calculated from all 335 pairs of primers, further suggesting that these core primers are eminently suitable for DNA fingerprinting in Upland cotton.
The number of new cotton cultivars has been increasing year by year. As the number of cotton cultivars is not fixed and new cultivars bred by organizations and companies constantly enrich the Germplasm Bank, DNA fingerprint data have also been growing with the input of fingerprint data from the new cultivars. At the same time, as the cultivar pool continues to grow, the identification efficiency of the existing core primers will be affected; therefore, the DNA fingerprint identity data of cultivars should also be extensible. When the existing core primers cannot meet the needs of cultivar identification, data from new, effective core primers can be added to the DNA fingerprinting database. The genome of Upland cotton has now been sequenced (Li et al. 2015, Zhang et al. 2015), and a total of 77,996 SSR markers have been identified based on the sequence information (Wang et al. 2015). Further, a genome-wide single-nucleotide polymorphism (SNP) chip for Upland cotton, the NAUSNP80K array, was successfully developed based on sequencing of ‘TM-1’ and re-sequencing of 100 different Upland cotton cultivars with 5× coverage on average (Cai et al. 2017), which provides an important basis for exploring SNP markers related to important agronomic traits of cotton. In sesame, Wei et al. (2017) recommended a total of 140 polymorphic markers, including 46 SSRs, 47 SNPs and 47 InDels, as a core set of molecular markers for fingerprinting sesame cultivars; in particular, 9 SSRs, 15 SNPs and 14 InDels were sufficient to distinguish all the sesame cultivars. Therefore, further research needs to focus on developing more SSR markers, or even SNP and InDel markers, associated with specific cotton agronomic traits, which would help ensure the accuracy, timeliness and practicality of a cotton DNA fingerprinting database and its applicability to cotton improvement.
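The extensibility requirement discussed above can be made concrete with a small sketch: group cultivars that share an identical barcode and, if any such group exists, append the code from an additional primer. The function names, cultivar names and codes are hypothetical, and the barcode is assumed to be a string of single-digit codes; this is not a description of an existing database.

```python
from collections import defaultdict

def indistinguishable_groups(barcodes):
    """Return groups of cultivars whose barcodes are identical."""
    groups = defaultdict(list)
    for cultivar, code in barcodes.items():
        groups[code].append(cultivar)
    return [sorted(members) for members in groups.values() if len(members) > 1]

def extend_barcodes(barcodes, new_codes):
    """Append the code from one additional primer to every barcode."""
    return {c: barcodes[c] + str(new_codes[c]) for c in barcodes}

# Hypothetical database in which two cultivars cannot be told apart.
current = {"TM-1": "11", "CultivarX": "21", "CultivarY": "21"}
clashes_before = indistinguishable_groups(current)
extended = extend_barcodes(current, {"TM-1": 1, "CultivarX": 1, "CultivarY": 2})
clashes_after = indistinguishable_groups(extended)
print(clashes_before, clashes_after)
```

Running such a check each time new cultivars are entered would show when the current core primers stop being sufficient and a further marker is needed.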
This research was supported by the National Natural Science Foundation of China (31671743), the Innovative Talent Support Program of Science and Technology of Henan Institute of Higher Learning (16HASTIT014), and the Technology Demonstration and Industrialization of Seed Industry Facing the Five Central Asian Countries program (161100510100).