2015 Volume 90 Issue 5 Pages 269-281
AA genome species in the genus Oryza are valuable resources for the improvement of cultivated rice. Oryza rufipogon and O. barthii were the progenitors of the two domesticated rice species, O. sativa and O. glaberrima, respectively. We used chloroplast single-nucleotide repeats (RCt1-10) to evaluate genetic diversity among AA genome species. Higher diversity was detected in the American species O. glumaepatula and the Asian species O. rufipogon. Other chloroplast sequences indicated that O. glumaepatula shares high similarity with O. longistaminata. Insertions of retrotransposable elements, however, indicated a close relationship between O. barthii and O. glumaepatula. To clarify phylogenetic relationships among AA genomes, whole-genome sequences obtained from different species were used to develop chloroplast INDEL markers. The INDEL patterns clearly showed multiple maternal origins of O. glumaepatula, and these complex origins have resulted in high genetic diversity in this species. In contrast, the Australian endemic species O. meridionalis tended to show narrower diversity than the other species. The high variation in O. rufipogon, reconfirmed using the chloroplast INDELs, encompassed the variation in O. meridionalis and part of the variation in O. glumaepatula. The maternal lineages of O. barthii, O. longistaminata and the remainder of O. glumaepatula were phylogenetically close to each other and carried low genetic diversity. They were separated from the other, independent lineages, suggesting that they diverged from a single ancestral maternal lineage and later diverged into separate species, each maintaining gene flow within itself, as SSR compositions suggested. Genetic relationships among AA genome species indicate how these species have evolved and become distributed across four continents.
The evolution of the genus Oryza, especially AA genome species, has been a major focus of rice research (Oka, 1988). AA genome species include two cultigens: O. sativa, which is cultivated worldwide, and O. glaberrima, an endemic species distributed along the River Niger. The progenitors of these cultivated species were once regarded as the O. perennis complex, consisting of four geographical races – Asian, African, American and Oceanian – inhabiting different regions (Morishima, 1969). These were subsequently renamed and defined as different species: O. rufipogon, O. barthii, O. longistaminata, O. glumaepatula and O. meridionalis (Oka, 1988). Oryza sativa was domesticated in East Asia from O. rufipogon (Oka, 1988; Fuller and Sato, 2008; Huang et al., 2012), and O. glaberrima was independently domesticated from O. barthii (Wang et al., 2014).
The major cultivated species, O. sativa, has a deep population structure, due partly to introgression from various O. rufipogon populations and also from other varietal groups (Tang and Morishima, 1997; Ishikawa et al., 2002a, b; Garris et al., 2005; McNally et al., 2009; Molina et al., 2011; Huang et al., 2012). Wide divergence had already arisen between the O. sativa subspecies indica and japonica 86 to 440 thousand years ago, and among AA genome wild species 2 million years ago (Zhu and Ge, 2005; Molina et al., 2011). The deep divergence between the subspecies of O. sativa is evident at the genome level (Huang et al., 2012) and in various measures of divergence (McNally et al., 2009; Molina et al., 2011). japonica is further classified into Temperate-japonica and Tropical-japonica (Oka, 1988; Sato, 1991). Divergence between Tropical-japonica and indica was estimated to have occurred as early as about 3,900 years ago (Molina et al., 2011). Nevertheless, these varietal groups share a monophyletic origin. Because they share identical alleles at several loci controlling major agronomic traits such as non-shattering, both varietal groups are thought to have acquired these shared genetic components through past introgression (Tang and Morishima, 1997; Ishikawa et al., 2002a, b). Past introgression events among wild forms and cultivars have also been inferred from genomic data (Huang et al., 2012). Traces of such introgression can still be observed in weedy rice and in genomic regions where selective sweeps have occurred (Molina et al., 2011). Indeed, strong selective sweeps have been detected at several key domestication genes, such as the non-shattering gene and the white pericarp gene (Li et al., 2006; Lin et al., 2007; Sweeney et al., 2007). In contrast, the endemic species O. glaberrima was domesticated from the annual O. barthii, which is widespread in west tropical Africa (Oka, 1988; Khush, 1997; Semon et al., 2005). Recent genomic data have suggested that O. glaberrima was domesticated from a narrow gene pool within O. barthii (Wang et al., 2014). Because it was domesticated independently of O. rufipogon, O. glaberrima has diverged from O. sativa sufficiently to have developed multiple reproductive barriers (Morishima, 1969). The related perennial species O. longistaminata shares habitats with the other African species but is more widely distributed across Africa and was not involved in domestication.
Two other wild species on other continents, O. meridionalis and O. glumaepatula, have no known history of domestication. O. meridionalis is endemic to Oceania, including New Guinea and Australia, and is characterized by an annual life history and, morphologically, by short anthers (Ng et al., 1981; Vaughan, 1994; Lu, 1999). Its distribution partly overlaps with that of O. rufipogon. Recently, a possible new species has been reported in the same area, which appears to have diverged from the endemic species but acquired a perennial life history (Sotowa et al., 2013; Brozynska et al., 2014). In addition, a conventional perennial type, the so-called Australian O. rufipogon, was found to share a highly similar chloroplast genome with this possible new species and with O. meridionalis. However, Australian O. rufipogon showed no reproductive barrier against Asian O. rufipogon, and nuclear markers also suggested that the Australian type had not diverged from the Asian type. Divergence among AA genome species therefore remains insufficiently understood. Another example is the American species O. glumaepatula, which is distributed in the New World from Cuba to Brazil. Its life form is still uncertain: an ecotype distributed in Central America and northern South America seems to have a perennial habit (Oka, 1988), while another ecotype in tropical Brazil shows an annual-perennial intermediate form (Akimoto et al., 1998). There are thus still opportunities to detect novel diversity in natural populations of this and other species.
A range of molecular markers, including chloroplast sequences, chloroplast single-nucleotide repeats and nuclear simple sequence repeats (SSRs), now allows researchers to gain a comprehensive view of divergent evolution among the AA genome species described above, although these markers have not yet been fully applied to genomes other than those of some O. sativa cultivars. Since the chloroplast genome of O. sativa cv. Nipponbare was completely sequenced (Hiratsuka et al., 1989), its conserved sequences have made it possible to compare divergent accessions and also to compare mitochondrial and nuclear genomes (Sotowa et al., 2013; Brozynska et al., 2014). Re-sequencing is also feasible for any genome in the genus Oryza, with reads aligned against the reference sequence (Waters et al., 2012). In the present study, we used these materials and tools to evaluate the phylogenetic relationships among the Asian, African, American and Oceanian AA genome species.
Wild rice accessions of ranks 1 to 3, representing five AA genome species, were provided by the National BioResource Project in Japan (Nonomura et al., 2010). These accessions form a core collection selected as typical representatives of the historical collections stored at the National Institute of Genetics, Mishima, Japan. Twenty accessions of O. barthii, 20 of O. glumaepatula, 19 of O. longistaminata, 39 of O. rufipogon and 18 of O. meridionalis were examined to clarify their genetic diversity (Table 1). Seven O. rufipogon accessions were categorized as Oceanian O. rufipogon: two originated in Australia and the other five in Papua New Guinea. In a previous report, we characterized these accessions as distinct from Asian O. rufipogon (Sotowa et al., 2013). For comparison with cultivated rice, 20 indica and 20 japonica accessions were also genotyped (Supplementary Table S1). The japonica cultivars had already been classified into Tropical-japonica (Tr-J) and Temperate-japonica (Tm-J) using the method described by Sato (1991). The wild rice accessions were supplied as DNA samples by the National Institute of Genetics, Japan; DNA of the cultivars and additional material was extracted with the general urea method.
| Species, name of accession | Origin | Rank (1-2-3) |
|---|---|---|
| O. glumaepatula | ||
| W1169 | Cuba | 1 |
| W1171 | Cuba | 3 |
| W1183 | Guyana | 3 |
| W1185 | Suriname | 2 |
| W1187 | Brazil | 2 |
| W1189 | Manaus, Brazil | 3 |
| W1191 | Brazil | 3 |
| W1196 | Colombia | 2 |
| W1477 | Brazil | 3 |
| W2140 | Brazil | 3 |
| W2145 | Brazil | 1 |
| W2149 | Brazil | 3 |
| W2160 | Brazil | 3 |
| W2165 | Brazil | 3 |
| W2173 | Brazil | 3 |
| W2184 | Brazil | 3 |
| W2192 | Brazil | 3 |
| W2199 | Brazil | 1 |
| W2201 | Brazil | 3 |
| W2203 | Brazil | 3 |
| O. barthii | ||
| W0042 | No description | 3 |
| W0652 | Sierra Leone | 1 |
| W0698 | Guinea | 2 |
| W0720 | Mali | 2 |
| W0747 | Mali | 2 |
| W1050 | Gambia | 3 |
| W1063 | No description | 3 |
| W1410 | Sierra Leone | 3 |
| W1416 | Sierra Leone | 3 |
| W1443 | Mali | 3 |
| W1467 | Cameroon | 3 |
| W1473 | Chad | 3 |
| W1574 | Nigeria | 3 |
| W1583 | Chad | 3 |
| W1588 | Cameroon | 1 |
| W1605 | Nigeria | 3 |
| W1642 | Botswana | 3 |
| W1643 | Botswana | 3 |
| W1646 | Tanzania | 2 |
| W1702 | Mali | 3 |
| O. longistaminata | ||
| W0643 | Gambia | 2 |
| W0708 | Guinea | 2 |
| W1004 | Ghana | 3 |
| W1232 | Unknown | 3 |
| W1413 | Sierra Leone | 1 |
| W1420 | Mali | 3 |
| W1423 | Mali | 3 |
| W1444 | Ivory Coast | 3 |
| W1448 | Ivory Coast | 3 |
| W1454 | Burkina Faso | 3 |
| W1460 | Dahomey | 3 |
| W1465 | Nigeria | 3 |
| W1504 | Tanzania | 3 |
| W1508 | Unknown | 1 |
| W1540 | Congo | 2 |
| W1570 | Nigeria | 3 |
| W1573 | Nigeria | 3 |
| W1624 | Cameroon | 2 |
| W1650 | Tanzania | 3 |
| O. meridionalis | ||
| W1297 | Darwin, Australia | 2 |
| W1300 | Darwin, Australia | 3 |
| W1625 | Darwin, Australia | 1 |
| W1627 | Australia | 2 |
| W1631 | Kununurra area, Australia | 3 |
| W1635 | Darwin, Australia | 1 |
| W1638 | Queensland, Australia | 3 |
| W2069 | Kununurra area, Australia | 2 |
| W2071 | Kununurra area, Australia | 3 |
| W2077 | from Darwin to Normanton, Australia | 3 |
| W2079 | from Darwin to Normanton, Australia | 2 |
| W2080 | from Darwin to Normanton, Australia | 3 |
| W2081 | Matarauka, Australia | 3 |
| W2100 | Queensland, Australia | 3 |
| W2103 | Queensland, Australia | 2 |
| W2105 | Queensland, Australia | 3 |
| W2112 | Queensland, Australia | 3 |
| W2116 | Queensland, Weipa, North Point, Australia | 3 |
| Asian O. rufipogon | ||
| W0106 | Phulankara, near Cuttack, Orissa, India | 1 |
| W0107 | Pahala, Orissa, India | 3 |
| W0108 | Cuttack, Orissa, India | 3 |
| W0120 | Cuttack, Orissa, India | 1 |
| W0137 | Kadiam, Andhra, India | 3 |
| W0180 | Ngao, Lamphang, Thailand | 3 |
| W0593 | Binjai Rendah, Malaysia | 3 |
| W0610 | Rangoon, Myanmar | 3 |
| W0630 | Magwe, Myanmar | 2 |
| W1294 | Musuan, Mindanao, Philippines | 1 |
| W1551 | Saraburi, Thailand | 3 |
| W1666 | Siliguri, India | 3 |
| W1669 | Orissa, India | 3 |
| W1681 | Orissa, India | 3 |
| W1685 | Orissa, India | 3 |
| W1690 | Chiengrai, Thailand | 3 |
| W1715 | China | 3 |
| W1807 | Sri Lanka | 2 |
| W1852 | Chiang Saen, Thailand | 3 |
| W1865 | Saraburi, Thailand | 3 |
| W1866 | Saraburi, Thailand | 1 |
| W1921 | Saraburi, Thailand | 1 |
| W1939 | Bangkoknoi, Thailand | 3 |
| W1945 | No description | 2 |
| W1981 | Palembang, Indonesia | 3 |
| W2003 | from Pajani to Bombay, India | 1 |
| W2014 | India | 3 |
| W2051 | Hobiganji, Bangladesh | 2 |
| W2263 | Cambodia | 2 |
| W2265 | Laos | 3 |
| W2266 | Laos | 3 |
| W2267 | Laos | 3 |
| Australian O. rufipogon | ||
| W2078 | from Darwin to Normanton, Australia | 2 |
| W2109 | Queensland, Australia | 3 |
| New Guinean O. rufipogon | ||
| W1230 | Baad, Koembe, Papua New Guinea | 3 |
| W1235 | Taram, Papua New Guinea | 3 |
| W1236 | Madang, Papua New Guinea | 2 |
| W1238 | Koembe River, Papua New Guinea | 3 |
| W1239 | Sakor River, Papua New Guinea | 3 |
Chloroplast single-nucleotide repeat markers RCt1-RCt10 were used to evaluate the genetic diversity of the maternal lineages of AA genome species (Ishii and McCouch, 2000; Ishii et al., 2001). Nuclear SSR markers were also used to evaluate genetic diversity (Table 2). Fragments were amplified with Thermopol Taq (New England Biolabs, Tokyo, Japan), electrophoresed on a 6% denaturing polyacrylamide gel, and genotyped after silver nitrate staining (Wang et al., 2012). The plastid genotypes were also subjected to principal component analysis (PCA) to evaluate their relationships. Fragments of the chloroplast genes rpl16 and matK were amplified with the primers listed in Table 2 and sequenced to detect SNPs for determining phylogenetic relationships among the African and American species. All fragments were amplified with Thermopol Taq under the following cycling conditions: initial denaturation at 94 ℃ for 3 min; 30 cycles of 94 ℃ for 10 s, 55 ℃ for 30 s and 72 ℃ for 30 s; and a final extension at 72 ℃ for 5 min. PCR products were purified with a FastGene Gel/PCR extraction kit (Nippon Genetics, Tokyo, Japan) and sequenced with a BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Tokyo, Japan) on an ABI3500 Genetic Analyzer. Retrotransposon insertions reported previously by Cheng et al. (2002) were re-examined. Three insertions, p-SINE1-r705, p-SINE1-r801 and p-SINE1-r806, were selected to clarify relationships among O. barthii, O. longistaminata and O. glumaepatula. PCR was performed with Ex Taq (Takara Bio, Kusatsu, Japan) using the following protocol: initial denaturation at 94 ℃ for 3 min; 30 cycles of 94 ℃ for 10 s, 60 ℃ for 30 s and 72 ℃ for 30 s; and a final extension at 72 ℃ for 5 min. Amplified fragments were electrophoresed on 1.5% agarose gels to determine whether each accession carried each insertion.
| Type of marker | Locus | Forward | Reverse | Chloroplast genome region (INDELs) | INDEL type | Reference | Original data to develop chloroplast INDELs |
|---|---|---|---|---|---|---|---|
| RCt | RCt1 | CATCCTTTTCAATCCAAAATCA | TGCCTGATGTAGGGAAAAGC | (A)10 | | Ishii and McCouch (2000) | |
| | RCt2 | CTGGGGGGGATTATACCTGT | ATATCTCTCATTTCCGACGCA | (A)11 | | Ishii and McCouch (2000) | |
| | RCt3 | TAGGCATAATTCCCAACCCA | CTTATCCATTTGGAGCATAGGG | (A)10 | | Ishii and McCouch (2000) | |
| | RCt4 | ACGGAATTGGAACTTCTTTGG | AAAAGGAGCCTTGGAATGGT | (T)12 | | Ishii and McCouch (2000) | |
| | RCt5 | ATTTGGAATTTGGACATTTTCG | ACTGATTCGTAGGCGTGGAC | (T)10 | | Ishii and McCouch (2000) | |
| | RCt6 | GAATTTTAGAACTTTGAATTTTTTACCC | AAGCGTACCGAAGACTCGAA | (A)10 | | Ishii and McCouch (2000) | |
| | RCt7 | GTGTCATTCTCTAGGCGAAC | AAATATGACAGAAAAGAAAAATAGG | (T)10 | | Ishii and McCouch (2000) | |
| | RCt8 | ATAGTCAAGAAAGAGGATCTAGAAT | ACCGCGATTCAATAAGAGTA | (T)17 | | Ishii and McCouch (2000) | |
| | RCt9 | ATAAGGTTATTCCCCGCTTACC | AAATTGGGGGAATTCGTACC | (T)10 | | Ishii and McCouch (2000) | |
| | RCt10 | TCTTCATTTGGAATCTGGGC | CTATTGATGCAAACGCTGTACC | (T)10 | | Ishii and McCouch (2000) | |
| rpl16 | rpl16-exon1 | ATGCTTAGTGTGTGACTCGTTAG | | | | This study | |
| | rpl16-336f | GGTCTATGAATTACATCATAAAAAG | | | | This study | |
| | rpl16-500f | TTTTTGGAAGCTCCATTGCGAG | | | | Sotowa et al. (2013) | |
| | rpl16-1kb | ATGAGAAGAAACTCTCATGTCC | | | | This study | |
| | rpl16-486r | CAATTTCTCAGTTTTATTAACTCGG | | | | This study | |
| | rpl16-5Preverse | TGTTTACGAAATCTGGTTCTTTTG | | | | This study | |
| | 3P | ATCTGCTACATTTAAAAGGGT | | | | Nakamura et al. (1997) | |
| INDEL (cp genome) | glum-cpINDEL1 | CTCGGACGAATAATCTAATACATGG | CTATGATTCTATGTTCTCCTTAGTG | 46087..46091 | TATAT Deletion | This study | W1171 genome data |
| | glum-cpINDEL2 | ATATATAGTCAAGAAAGAGGATC | ATGAATTAACAAATAAGACAGG | 78424..78429 | TTTTTT Deletion | This study | W1171 genome data |
| | glum-cpINDEL3 | CAAAAATTTTCTCATTGAAACAATC | CAATTTGAGTTACGAAACAAGGGAG | 103827^103828 | GTTTT Insertion | This study | W1171 genome data |
| | glum-cpINDEL4 | TGGCGGCAGTCTCGAAAAAG | CAAGTTCACGAACTAATAAGG | 105208^105209 | ATTCA Insertion | This study | W1171 genome data |
| | rufi-cpINDEL1 | GGATTCACCGAAACAAACAACC | GCCAAATTGAGCAGGTTGCG | 12670..12673 | AGGG Deletion | This study | 45-2 genome data |
| | rufi-cpINDEL2 | TTTGGGGAAGAAAACATCTTCC | TAAACGGAGAGAATCGACTAAG | 14012..14013 | AC Deletion | This study | 45-2 genome data |
| | rufi-cpINDEL3 | AATTGCTCTCACCGCTCTTTC | TAGTCGAATTGTTGTATCAACTC | 17380..17385 | ATAGAA Deletion | This study | 45-2 genome data |
| | rufi-cpINDEL4 | TAATTTGATATGGCTCGGACG | TGCTATGATTCTATGTTCTCC | 46087..46091 | TATAT Deletion | This study | 45-2 genome data |
| | meri-cpINDEL2 | GCCTTGTTCAGGAACTCGACAG | TTGGTTGTACCATTGCATTTCAG | 5852..5856 | CAATC Deletion | This study | GU592208 |
| | meri-cpINDEL3 | AATGGCGCAATGATCTTGGAGA | GAATGGCGATGGCTCGATTTC | 8192^8193 | AGAAA Insertion | This study | GU592208 |
| | meri-cpINDEL5 | AAGTGTGCCTTGCAACCGAG | AAGCAGCAGAACACCTGAAAC | 13566..13567 | T Deletion | This study | GU592208 |
| | meri-cpINDEL8 | GATATATTTGTGCTGGCATTCTC | TTCCAGTGAAAATCATATGCAC | 17379..17383 | ATAGAA Deletion | This study | GU592208 |
| matK | matK | TTGATGCAAGAATTGCCTTTCC | AAAATGCAACACCCTGTTCTGACC | | | This study | |
| SINE-INDEL | p-SINE1 (r705) | TGTTGCGGAACTTGCATTGT | AGAATCAAACTTGACCTGTC | | | Cheng et al. (2002) | |
| | p-SINE1 (r801) | CTTGGCTTATTATTACTGATT | ATGAAAGAATAGCGTAAACAAAT | | | Cheng et al. (2002) | |
| | p-SINE1 (r806) | ATGCAGCTGTAAAGAAGAGT | CAAGATTAAGGCTCATCTGA | | | Cheng et al. (2002) | |
| SSR | AP003436 | GCAGCGAAGCCAACGTAGTCC | CTGCCTTCCCAAACATCTTCTC | | | Wang et al. (2012) | |
| | AP004212 | GGAGGCTCTACTACATATGG | TGGGAAACTATGCATCAGTC | | | Wang et al. (2012) | |
| | +29Cat | CACGATCTAGAAGACGAGAG | CCAAATTACGCCTTCCTACC | | | Wang et al. (2012) | |
| | RM3204 | GCAACCCTTTCTTCCTCCTC | CCAAGGAGAGCGCACTAGC | | | McCouch et al. (2002) | |
| | RM257 | CAGTTCCGAGCAAGAGTACTC | GGATCGGACGTGGCATATG | | | Chen et al. (1997) | |
| | RM3577 | CCGATCCCATTCACAGATTC | CAGTGCCTTGATCGATGTTG | | | McCouch et al. (2002) | |
| | RM17 | TGCCCTGTTATTTTCTTCTCTC | GGTGATCCTTTCCCATTTCA | | | Panaud et al. (1996) | |
W1171 (O. glumaepatula) and a single Thai wild rice accession, 45-2, grown at the Prachinburi Rice Research Center (Rice Department, Thailand), were subjected to next-generation sequencing (NGS) to identify novel insertions and deletions (INDELs) that may be detectable only in non-model accessions. DNA was extracted from mature leaves of these accessions with a DNeasy Plant Mini Kit (QIAGEN, Tokyo, Japan). Whole-genome sequences were obtained on an Illumina HiSeq as 100-bp paired-end reads, yielding 75,832,764 reads for W1171 and 62,762,232 for 45-2. The raw reads were mapped with CLC Genomics Workbench (CLC bio Japan, Tokyo, Japan) to the complete chloroplast genome of O. sativa cv. Nipponbare (GU592207.1), and INDELs were then screened. INDELs with more than 1,000-fold coverage, a count of more than 100 and a frequency above 50% were carried forward to wet-lab experiments. To confirm these variants relative to the Nipponbare chloroplast genome, four INDEL markers were developed from each accession (Table 2). In addition, the complete chloroplast genome of O. meridionalis (GU592208) was aligned with the Nipponbare genome, and four further INDEL markers were developed.
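The INDEL screen is a conjunction of three thresholds. The Python sketch below applies them to hypothetical variant records; the field names (`coverage`, `count`, `frequency`) are assumptions modelled loosely on a variant table, not the exact export format of CLC Genomics Workbench, and all example values are invented. The comparisons are strict, matching "more than" and "above" in the text.

```python
from dataclasses import dataclass


@dataclass
class IndelCall:
    """One candidate INDEL from a variant table (hypothetical field names)."""
    name: str
    coverage: int      # total reads covering the site
    count: int         # reads supporting the INDEL
    frequency: float   # supporting reads as a percentage of coverage


def passes_filter(call: IndelCall,
                  min_coverage: int = 1000,
                  min_count: int = 100,
                  min_frequency: float = 50.0) -> bool:
    """Keep an INDEL only if it exceeds all three thresholds stated in the text."""
    return (call.coverage > min_coverage
            and call.count > min_count
            and call.frequency > min_frequency)


# Invented example values, not data from this study.
candidates = [
    IndelCall("indel_a", coverage=2500, count=2400, frequency=96.0),
    IndelCall("indel_b", coverage=800, count=700, frequency=87.5),    # coverage too low
    IndelCall("indel_c", coverage=3000, count=90, frequency=3.0),     # too few supporting reads
    IndelCall("indel_d", coverage=4000, count=1200, frequency=30.0),  # frequency too low
]
selected = [c.name for c in candidates if passes_filter(c)]
print(selected)  # ['indel_a']
```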
Data analysis
RCt and nuclear SSR genotype data were used to perform PCA and to calculate genetic distances among accessions with GenAlEx (http://biology-assets.anu.edu.au/GenAlEx/Welcome.html). The number of alleles (Na), the number of effective alleles (Ne) and the expected heterozygosity (He) were also calculated with GenAlEx. Ne falls well below Na when some alleles are much rarer than others.
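He was calculated using the formula
$$H_e = 1 - \sum_{i=1}^{k} p_i^2,$$
where $p_i$ is the frequency of the $i$th allele at a locus and $k$ is the number of alleles at that locus.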
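Both statistics depend only on allele frequencies. A minimal Python sketch is given below, assuming haploid (chloroplast) allele calls; it uses He = 1 − Σ p_i² and the standard definition of effective alleles, Ne = 1/Σ p_i², which the text does not spell out. The example locus is invented: a 6:1 split among seven accessions gives He = 0.245, which is consistent with the values reported for Oceanian O. rufipogon at RCt3 and RCt8 in Table 3 (with seven accessions, no other split yields that value under this formula).

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return the frequency p_i of each distinct allele in a list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def expected_heterozygosity(alleles):
    """He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in allele_frequencies(alleles))


def effective_alleles(alleles):
    """Ne = 1 / sum(p_i^2); equals Na only when all alleles are equally frequent."""
    return 1.0 / sum(p * p for p in allele_frequencies(alleles))


# Seven haploid calls at one locus: six share one allele, one carries another.
locus = ["A10"] * 6 + ["A11"]
print(len(set(locus)))                           # Na = 2
print(round(expected_heterozygosity(locus), 3))  # 0.245
print(round(effective_alleles(locus), 3))        # 1.324
```

Here Na = 2 but Ne ≈ 1.32: the rare allele contributes little, which is why Ne falls below Na when allele frequencies are uneven.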
Polymorphism in chloroplast single-nucleotide repeats was found at six of the ten loci examined (Supplementary Table S2). The average expected heterozygosity (He) ranged from 0.176 to 0.542 among species (Table 3). The highest score was for O. glumaepatula (0.542), followed by O. rufipogon (0.528). Our previous study showed that O. rufipogon accessions comprise two diverged maternal lineages (Sotowa et al., 2013), so these accessions were divided into Asian and Oceanian groups, which had distinct He scores of 0.511 and 0.150, respectively. O. sativa showed relatively high genetic diversity when the two varietal groups (indica and japonica) were pooled. These varietal groups diverged at the subspecies level and were strongly influenced by different groups of O. rufipogon. When calculated separately, the genetic diversity of each varietal group was lower than that of O. rufipogon. The japonica subgroups Tm-J and Tr-J have also diverged to some extent; when scored individually, both showed lower values than japonica as a whole. These trends suggest that the varietal groups of O. sativa tend to carry different RCt genotypes.
| Species | No. | RCt1 He | RCt3 He | RCt5 He | RCt6 He | RCt8 He | RCt9 He | Average He |
|---|---|---|---|---|---|---|---|---|
| Cultigen | ||||||||
| O. sativa | 40 | 0.049 | 0.635 | 0.495 | 0.489 | 0.569 | 0.644 | 0.480 |
| indica | 20 | 0.000 | 0.180 | 0.180 | 0.095 | 0.095 | 0.000 | 0.092 |
| japonica | 20 | 0.095 | 0.480 | 0.000 | 0.320 | 0.530 | 0.480 | 0.318 |
| Tm-J | 10 | 0.180 | 0.180 | 0.000 | 0.000 | 0.460 | 0.480 | 0.217 |
| Tr-J | 10 | 0.000 | 0.420 | 0.000 | 0.480 | 0.540 | 0.320 | 0.293 |
| Wild | ||||||||
| O. rufipogon | 39 | 0.650 | 0.537 | 0.264 | 0.375 | 0.570 | 0.670 | 0.528 |
| Asia | 32 | 0.671 | 0.554 | 0.273 | 0.387 | 0.588 | 0.692 | 0.511 |
| Oceania | 7 | 0.000 | 0.245 | 0.000 | 0.000 | 0.245 | 0.408 | 0.150 |
| O. meridionalis | 18 | 0.444 | 0.401 | 0.000 | 0.105 | 0.204 | 0.105 | 0.210 |
| O. glumaepatula | 20 | 0.810 | 0.585 | 0.515 | 0.255 | 0.445 | 0.640 | 0.542 |
| O. barthii | 20 | 0.345 | 0.335 | 0.000 | 0.000 | 0.000 | 0.375 | 0.176 |
| O. longistaminata | 19 | 0.637 | 0.465 | 0.000 | 0.188 | 0.283 | 0.548 | 0.354 |
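The Average column in Table 3 appears to be the unweighted mean of the six per-locus He values. The Python sketch below recomputes that mean for each row (values copied from the table) and flags rows that differ from the reported average by more than 0.001, an assumed tolerance for three-decimal rounding. All rows agree except O. rufipogon and its Asian subset, whose recomputed means (0.511 and 0.528) are the reverse of the reported averages (0.528 and 0.511), suggesting that these two entries may have been transposed.

```python
# Per-locus He (RCt1, RCt3, RCt5, RCt6, RCt8, RCt9) and the reported Average, from Table 3.
TABLE3 = {
    "O. sativa": ([0.049, 0.635, 0.495, 0.489, 0.569, 0.644], 0.480),
    "indica": ([0.000, 0.180, 0.180, 0.095, 0.095, 0.000], 0.092),
    "japonica": ([0.095, 0.480, 0.000, 0.320, 0.530, 0.480], 0.318),
    "Tm-J": ([0.180, 0.180, 0.000, 0.000, 0.460, 0.480], 0.217),
    "Tr-J": ([0.000, 0.420, 0.000, 0.480, 0.540, 0.320], 0.293),
    "O. rufipogon": ([0.650, 0.537, 0.264, 0.375, 0.570, 0.670], 0.528),
    "O. rufipogon (Asia)": ([0.671, 0.554, 0.273, 0.387, 0.588, 0.692], 0.511),
    "O. rufipogon (Oceania)": ([0.000, 0.245, 0.000, 0.000, 0.245, 0.408], 0.150),
    "O. meridionalis": ([0.444, 0.401, 0.000, 0.105, 0.204, 0.105], 0.210),
    "O. glumaepatula": ([0.810, 0.585, 0.515, 0.255, 0.445, 0.640], 0.542),
    "O. barthii": ([0.345, 0.335, 0.000, 0.000, 0.000, 0.375], 0.176),
    "O. longistaminata": ([0.637, 0.465, 0.000, 0.188, 0.283, 0.548], 0.354),
}

TOLERANCE = 0.001  # assumed allowance for three-decimal rounding in the table


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


def check_averages(table, tol=TOLERANCE):
    """Return {row: (recomputed mean, reported average)} for rows that disagree."""
    mismatches = {}
    for row, (per_locus, reported) in table.items():
        recomputed = mean(per_locus)
        if abs(recomputed - reported) > tol:
            mismatches[row] = (round(recomputed, 4), reported)
    return mismatches


print(check_averages(TABLE3))
```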
PCA was used to obtain an overview of diversity among AA genome species. Allelic combinations revealed distinct groups (Fig. 1A). The varietal groups of Asian cultivated rice were clearly separated from each other, but the variation of the cultivars as a whole fell within that of O. rufipogon, showing that the O. rufipogon accessions in the core collection covered the fundamental variation inherited by the cultivars. The Oceanian group was located in a position intermediate between O. rufipogon and O. meridionalis (Fig. 1B). After excluding O. sativa, the relationships among all wild AA genome species were examined by PCA (Fig. 1C). O. glumaepatula overlapped partly with O. longistaminata and O. rufipogon, whereas O. barthii was placed outside the others (Fig. 1D). The PCA plots suggested that O. longistaminata shares a closely related chloroplast genome with O. glumaepatula.

Fig. 1. PCA of O. sativa (indica, Tm-japonica and Tr-japonica) and AA genome species. (A) Varietal groups compared with O. rufipogon. (B) O. sativa, Asian and Oceanian O. rufipogon, and O. meridionalis, an endemic species in Australia. (C) All wild rice species carrying AA genomes. (D) African and American species.
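The PCA itself was run in GenAlEx. As a rough illustration of what such an analysis does with single-nucleotide repeat genotypes, the Python sketch below one-hot encodes haploid multilocus genotypes (one 0/1 column per locus–allele pair, an assumed encoding) and extracts principal components from the covariance matrix by power iteration with deflation. It is not necessarily the procedure GenAlEx uses, and the allele sizes are invented; it only shows that accessions sharing allele combinations land together on the leading component.

```python
import math
import random


def one_hot(genotypes):
    """Encode haploid multilocus genotypes as 0/1 columns, one per (locus, allele) pair."""
    columns = sorted({(locus, allele) for g in genotypes for locus, allele in enumerate(g)})
    return [[1.0 if g[locus] == allele else 0.0 for locus, allele in columns] for g in genotypes]


def pca_scores(matrix, n_components=2, iterations=500, seed=0):
    """Project rows onto the leading principal components (power iteration with deflation)."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Sample covariance matrix between encoded columns.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(n_components):
        # Random start: for one-hot data the all-ones vector lies in the null space.
        v = [rng.random() for _ in range(p)]
        lam = 0.0
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                lam = 0.0
                break
            v, lam = [x / norm for x in w], norm
        if lam == 0.0:
            break
        components.append(v)
        variances.append(lam)
        # Deflate so the next power iteration converges to the next component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(r[j] * c[j] for j in range(p)) for c in components] for r in centred]
    return scores, variances


# Invented allele sizes at three repeat loci for two groups of accessions.
genotypes = [
    (142, 95, 110), (142, 95, 110), (142, 96, 110),  # group A
    (140, 97, 111), (140, 97, 111), (140, 97, 112),  # group B
]
scores, variances = pca_scores(one_hot(genotypes))
for g, s in zip(genotypes, scores):
    print(g, [round(x, 3) for x in s])
```

Power iteration keeps the sketch dependency-free; with NumPy available, an eigendecomposition such as `numpy.linalg.eigh` on the covariance matrix would be the more usual choice.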
A phylogenetic tree was constructed from the RCt genotypes (Supplementary Fig. S1). O. meridionalis, O. barthii and Oceanian O. rufipogon formed distinct clades. O. glumaepatula tended to form two major clades, one with O. rufipogon and one with O. longistaminata. O. rufipogon accessions were widely scattered. The ambiguous divergence reflects the mutable nature of single-nucleotide repeats: except for RCt6, the polymorphic markers carried multiple alleles (Supplementary Table S2), and in one case RCt1 carried seven alleles within a single species. Although the mutability of RCt alleles made phylogenetic relationships ambiguous, PCA of the RCt repeats revealed a novel phylogenetic relationship of O. glumaepatula.
Sequence-based phylogenetic analysis with rpl16 and matK
Parts of the chloroplast genome including rpl16 were sequenced in all species. Because the rpl16 sequence includes RCt8 and several other single-nucleotide repeats, the resulting phylogenetic tree did not show clear relationships (data not shown). Single-nucleotide repeats were therefore excluded from the rpl16 sequence data, and SNP combinations were summarized as haplotypes to examine relationships among O. glumaepatula, O. barthii and O. longistaminata (Table 4). O. longistaminata shared two haplotypes (Haplotypes 2 and 7) with O. rufipogon, one of which (Haplotype 2) was also a major haplotype in O. glumaepatula. Two other haplotypes in O. glumaepatula (Haplotypes 3 and 4) differed from Haplotype 2 by SNPs: both carried an A to G substitution at 1430 nt, and Haplotype 4 carried an additional A to T substitution at 655 nt; otherwise they were identical to Haplotype 2. None of these haplotypes was identical to the two haplotypes in O. barthii: compared with Haplotype 2, Haplotypes 5 and 6 carried a T insertion between 779 and 780 nt, and Haplotype 6 carried an additional G to A substitution at 1180 nt. These variants distinguish O. barthii from the other species.
Variable sites in rpl16 intron 1 and exon 2 (positions in nt):
| Species | No. of accessions | 655 nt | 676 nt | 770^771 nt | 779^780 nt | 1180 nt | 1288 nt | 1397–1398 nt | 1430 nt | 1450 nt | Haplotype | Accessions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| O. sativa | 1 | A | T | – | – | G | C | CG | A | G | 1 | Nipponbare |
| O. glumaepatula | 8 | A | T | – | – | G | C | GC | A | G | 2 | W1171, W1183, W1185, W1196, W1477, W2199, W2201, W2203 |
| | 7 | A | T | – | – | G | C | GC | G | G | 3 | W1191, W1169, W1187, W2140, W2145, W2149, W2192 |
| | 5 | T | T | – | – | G | C | GC | G | G | 4 | W1189, W2160, W2165, W2173, W2184 |
| O. barthii | 16 | A | T | – | T | G | C | GC | A | G | 5 | W0042, W0652, W0698, W0720, W0747, W1050, W1063, W1410, W1416, W1443, W1467, W1473, W1583, W1588, W1605, W1702 |
| | 4 | A | T | – | T | A | C | GC | A | G | 6 | W1574, W1642, W1643, W1646 |
| O. longistaminata | 18 | A | T | – | – | G | C | GC | A | G | 2 | W0643, W0708, W1004, W1232, W1413, W1420, W1423, W1444, W1448, W1454, W1460, W1465, W1504, W1508, W1540, W1570, W1624, W1650 |
| | 1 | A | T | – | – | G | C | GT | A | G | 7 | W1573 |
| O. rufipogon | | | | | | | | | | | | |
| Asia | 23 | A | T | – | – | G | C | GT | A | G | 7 | W0106, W0108, W0137, W0180, W1294, W1551, W1666, W1669, W1681, W1685, W1690, W1715, W1807, W1852, W1865, W1866, W1921, W1939, W1981, W2051, W2263, W2265, W2266 |
| | 7 | A | T | – | – | G | C | GC | A | G | 2 | W0107, W0120, W0593, W0610, W0630, W1945, W2267 |
| | 2 | A | T | – | – | G | C | GC | A | C | 8 | W2003, W2014 |
| Oceania | 2 | A | T | – | – | G | C | GT | A | G | 7 | W1230, W1236 |
| | 2 | A | T | C | – | G | C | GC | A | G | 9 | W2078, W2109 |
| | 3 | A | T | C | – | G | T | GC | A | G | 10 | W1235, W1238, W1239 |
| O. meridionalis | 2 | A | T | C | – | G | C | GC | A | G | 9 | W1635, W2116 |
| | 10 | A | T | C | – | G | T | GC | A | G | 10 | W1631, W1638, W2069, W2071, W2080, W2081, W2100, W2103, W2105, W2112 |
| | 6 | A | A | C | – | G | C | GC | A | G | 11 | W1297, W1300, W1625, W1627, W2077, W2079 |
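Summarizing SNP combinations as haplotypes amounts to grouping accessions by identical allele patterns at the variable sites and then listing the sites where two patterns differ. The Python sketch below illustrates this with a few patterns transcribed from Table 4; the accession subset is only an example, and the sketch assumes single-nucleotide repeats have already been removed. It reproduces the differences described in the text: Haplotype 4 differs from Haplotype 2 at 655 and 1430 nt, and Haplotype 6 differs at 779^780 and 1180 nt.

```python
from collections import defaultdict

# Variable rpl16 sites, in the column order of Table 4.
SITES = ["655", "676", "770^771", "779^780", "1180", "1288", "1397-1398", "1430", "1450"]

# Site patterns for selected haplotypes, copied from Table 4 ("-" = no insertion).
HAPLOTYPES = {
    2: ("A", "T", "-", "-", "G", "C", "GC", "A", "G"),
    3: ("A", "T", "-", "-", "G", "C", "GC", "G", "G"),
    4: ("T", "T", "-", "-", "G", "C", "GC", "G", "G"),
    5: ("A", "T", "-", "T", "G", "C", "GC", "A", "G"),
    6: ("A", "T", "-", "T", "A", "C", "GC", "A", "G"),
}


def group_into_haplotypes(accessions):
    """Collapse accessions with identical site patterns into one haplotype each."""
    groups = defaultdict(list)
    for name, pattern in accessions.items():
        groups[tuple(pattern)].append(name)
    return {pattern: sorted(names) for pattern, names in groups.items()}


def differences(reference, other):
    """List (site, reference allele, other allele) wherever two patterns differ."""
    return [(site, a, b) for site, a, b in zip(SITES, reference, other) if a != b]


# Three O. glumaepatula accessions with their Table 4 patterns.
sample = {"W1171": HAPLOTYPES[2], "W1183": HAPLOTYPES[2], "W1191": HAPLOTYPES[3]}
for pattern, names in group_into_haplotypes(sample).items():
    print(names, pattern)

print(differences(HAPLOTYPES[2], HAPLOTYPES[4]))  # [('655', 'A', 'T'), ('1430', 'A', 'G')]
print(differences(HAPLOTYPES[2], HAPLOTYPES[6]))  # [('779^780', '-', 'T'), ('1180', 'G', 'A')]
```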
To confirm relationships among O. glumaepatula, O. barthii and O. longistaminata, an additional chloroplast gene, matK, was also sequenced. O. glumaepatula and O. longistaminata shared the same matK sequence, identical to that of O. sativa cv. Nipponbare, except that one O. longistaminata accession, W1448, carried a C to T substitution at 684 nt (Supplementary Table S3). All O. barthii accessions carried a unique substitution at 516 nt.
Genetic distance evaluated by nuclear DNA
It has previously been reported that O. glumaepatula shares particular retrotransposon insertions with O. barthii (Cheng et al., 2002). However, the chloroplast markers provided no evidence for this relationship. Nuclear SSR and other markers were therefore used to clarify the phylogenetic relationships and to resolve this apparent inconsistency between the American and African species. Of 28 randomly chosen SSR markers, seven showed sufficient polymorphism. The others were monomorphic across all species or failed to amplify in multiple species, probably because of sequence divergence or genomic rearrangement among species. Averaged over species, He ranged from 0.335 at AP004212 to 0.704 at RM257 (Supplementary Table S4). O. meridionalis showed the lowest diversity among the species. Oceanian O. rufipogon had the second-highest He, after Asian O. rufipogon. This relatively high value for Oceanian O. rufipogon probably reflects the mixture of Asian and Australian components described in our previous paper (Sotowa et al., 2013).
These genotypes were then used to calculate genetic distances and construct a phylogenetic tree. Species-specific clades were clearly recognized for O. meridionalis and O. glumaepatula. O. rufipogon accessions were scattered and partly formed clades with O. longistaminata. Oceanian O. rufipogon accessions W1235 and W1239 formed a clade with the Australian endemic species O. meridionalis (Fig. 2). These two accessions were classified as O. rufipogon on the basis of field observations by Katayama (1968), before O. meridionalis had been described (Ng et al., 1981), so annual plants now assigned to O. meridionalis could have been recorded as O. rufipogon at that time. They should now be referred to as O. meridionalis, although detailed phenotypic observations are needed to confirm their taxonomic classification. The other Oceanian O. rufipogon accessions formed three clades with some of the Asian O. rufipogon accessions. In one of these clades, two Australian O. rufipogon accessions (W2078 and W2109) and one Papua New Guinean O. rufipogon accession (W1238) grouped with Cambodian, Indian and Malaysian O. rufipogon accessions. W1230 grouped with accessions from India, Laos, Myanmar and Thailand, and W1236 with accessions from India, Indonesia, the Philippines and Thailand. O. glumaepatula and O. barthii were not genetically close and tended to form separate clades, except for one O. barthii accession (W1416), which was close to an Indian O. rufipogon accession (W0137).

Fig. 2. Phylogenetic tree constructed by the neighbor-joining method from seven SSR loci. Oceanian accessions are shown in bold. One O. barthii accession, W1416, located next to Indian O. rufipogon, is underlined. The scale indicates genetic distance.
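The tree was built by the neighbor-joining method from pairwise genetic distances. For readers unfamiliar with the method, the Python sketch below implements the standard Saitou–Nei neighbor-joining algorithm on a dict-of-dicts distance matrix. It is an illustration only, not a reproduction of the study's pipeline; the five-taxon matrix is a textbook-style example, not data from this paper, and internal node names (`node1`, `node2`, ...) are arbitrary. Ties in the Q criterion are broken by input order, which can change the topology when distances are tied.

```python
def neighbor_joining(dist):
    """Build an unrooted tree by neighbor joining.

    dist: dict of dicts holding a symmetric distance matrix keyed by taxon name.
    Returns a list of (node, child, branch_length) edges.
    """
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    nodes = list(d)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i][j] for j in nodes if j != i) for i in nodes}
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        counter += 1
        u = f"node{counter}"
        edges += [(u, i, li), (u, j, lj)]
        # Distances from u to every remaining node.
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


names = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
dist = {x: {y: matrix[i][j] for j, y in enumerate(names)} for i, x in enumerate(names)}
tree = neighbor_joining(dist)
for edge in tree:
    print(edge)  # first join: ('node1', 'a', 2.0), ('node1', 'b', 3.0)
```

Because this example matrix is additive, neighbor joining recovers it exactly; real SSR distances are rarely additive, so branch lengths are estimates.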
Cheng et al. (2002) previously inferred a close relationship between O. glumaepatula and O. barthii from SINE insertions. The accessions used in this study were re-examined for the three known SINE insertions (Supplementary Table S5). O. barthii and O. glumaepatula shared the p-SINE1-r806 insertion, whereas only O. glumaepatula carried the p-SINE1-r801 insertion. O. longistaminata alone carried the p-SINE1-r705 insertion. These insertion patterns suggest that the species have diverged and that all accessions within each species originated from a single ancestral population. Some O. rufipogon accessions that clustered with O. longistaminata did not carry the p-SINE1-r705 insertion (data not shown). The genetic relationships inferred from SSR genotypes were imprecise above the species level, although the SSR markers provided an overview of relationships among species.
NGS data
Two accessions, W1171 (O. glumaepatula) and Thai wild rice 45-2 (O. rufipogon), were newly sequenced by NGS and re-sequenced against the chloroplast genome of O. sativa cv. Nipponbare. Accession 45-2 is a wild rice from the Prachinburi Rice Research Center, Rice Department, Thailand; this native wild rice accession was selected to capture polymorphism in natural wild rice populations. Variant calls from Basic Variant Detection indicated several putative rearrangements, insertions and deletions. A chloroplast genome sequence of O. meridionalis was also used to develop INDEL markers. All INDELs except rufi-cpINDEL2 and meri-cpINDEL5 were presumed to be simple insertions or deletions of four to six nucleotides (Table 2), and these INDELs were expected to be less mutable than SSRs (Table 5). One INDEL marker, rufi-cpINDEL3, was recorded as an ATAGAA deletion (Table 2). However, because the site was flanked by incomplete inverted repeat units similar to ATAGAA, this marker generated many alleles and high He scores (Table 5, Supplementary Table S6): O. glumaepatula and O. barthii carried six and seven alleles, respectively, at rufi-cpINDEL3. This marker therefore appeared to behave like a highly mutable SSR marker.
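The explanation for the excess alleles at rufi-cpINDEL3 rests on repeat units resembling ATAGAA, in both orientations, around the site. Such units can be located by scanning an amplicon for windows that match the motif or its reverse complement with at most one mismatch, as sketched below. The marker's actual sequence is not given in this section, so the amplicon is hypothetical, and the one-mismatch threshold and function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def near_motif_hits(seq, motif="ATAGAA", max_mismatches=1):
    """Find windows matching `motif` on either strand with few mismatches.

    Returns (start, strand, mismatches) tuples with 0-based starts; hits on
    the '-' strand are copies of the motif in inverted orientation.
    """
    seq = seq.upper()
    targets = {"+": motif.upper(), "-": reverse_complement(motif.upper())}
    k = len(motif)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for strand, target in targets.items():
            mismatches = sum(1 for x, y in zip(window, target) if x != y)
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


# Hypothetical amplicon: an exact ATAGAA, an inverted copy (TTCTAT) and a
# one-mismatch variant (ATAGTA).
amplicon = "GGATAGAATTCTATCCATAGTACC"
print(near_motif_hits(amplicon))  # [(2, '+', 0), (8, '-', 0), (16, '+', 1)]
```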
| Marker set | Marker | O. glumaepatula | O. barthii | O. longistaminata | O. rufipogon (Asia) | O. rufipogon (Oceania) | O. meridionalis |
|---|---|---|---|---|---|---|---|
| glum-cpINDEL | cpINDEL1 | 0.500 | 0.095 | 0.000 | 0.389 | 0.000 | 0.000 |
| | cpINDEL2 | 0.375 | 0.000 | 0.100 | 0.498 | 0.000 | 0.105 |
| | cpINDEL3 | 0.375 | 0.000 | 0.100 | 0.000 | 0.000 | 0.000 |
| | cpINDEL4 | 0.375 | 0.000 | 0.100 | 0.000 | 0.000 | 0.000 |
| rufi-cpINDEL | cpINDEL1 | 0.000 | 0.000 | 0.000 | 0.342 | 0.490 | 0.000 |
| | cpINDEL2 | 0.000 | 0.000 | 0.000 | 0.170 | 0.000 | 0.000 |
| | cpINDEL3 | 0.750 | 0.759 | 0.100 | 0.117 | 0.000 | 0.000 |
| | cpINDEL4 | 0.000 | 0.000 | 0.000 | 0.342 | 0.000 | 0.000 |
| meri-cpINDEL | cpINDEL2 | 0.495 | 0.000 | 0.000 | 0.000 | 0.408 | 0.000 |
| | cpINDEL3 | 0.000 | 0.000 | 0.188 | 0.000 | 0.000 | 0.000 |
| | cpINDEL5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | cpINDEL8 | 0.000 | 0.000 | 0.488 | 0.117 | 0.408 | 0.000 |
| Average He over 11 loci* | | 0.193 | 0.009 | 0.089 | 0.169 | 0.119 | 0.010 |
| Average He over 12 loci | | 0.239 | 0.071 | 0.090 | 0.165 | 0.109 | 0.009 |

Values are expected heterozygosity (He) for each chloroplast INDEL marker. * Average excluding rufi-cpINDEL3.
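As a check on the two summary rows, the sketch below recomputes the average He per species from the per-marker values in the table, with and without rufi-cpINDEL3; printed at three decimals, the averages reproduce both rows. It also shows the usual single-locus formula He = 1 - sum(p_i^2). Whether a small-sample correction was applied to the per-marker values is not stated in this section, so that formula is an assumption, and the function names are hypothetical.

```python
def expected_heterozygosity(allele_counts):
    """He = 1 - sum(p_i ** 2) for one locus, from allele counts."""
    total = sum(allele_counts)
    return 1.0 - sum((c / total) ** 2 for c in allele_counts)


SPECIES = ["O. glumaepatula", "O. barthii", "O. longistaminata",
           "O. rufipogon (Asia)", "O. rufipogon (Oceania)", "O. meridionalis"]

# Per-marker He values transcribed from the table, columns in SPECIES order.
HE = {
    "glum-cpINDEL1": [0.500, 0.095, 0.000, 0.389, 0.000, 0.000],
    "glum-cpINDEL2": [0.375, 0.000, 0.100, 0.498, 0.000, 0.105],
    "glum-cpINDEL3": [0.375, 0.000, 0.100, 0.000, 0.000, 0.000],
    "glum-cpINDEL4": [0.375, 0.000, 0.100, 0.000, 0.000, 0.000],
    "rufi-cpINDEL1": [0.000, 0.000, 0.000, 0.342, 0.490, 0.000],
    "rufi-cpINDEL2": [0.000, 0.000, 0.000, 0.170, 0.000, 0.000],
    "rufi-cpINDEL3": [0.750, 0.759, 0.100, 0.117, 0.000, 0.000],
    "rufi-cpINDEL4": [0.000, 0.000, 0.000, 0.342, 0.000, 0.000],
    "meri-cpINDEL2": [0.495, 0.000, 0.000, 0.000, 0.408, 0.000],
    "meri-cpINDEL3": [0.000, 0.000, 0.188, 0.000, 0.000, 0.000],
    "meri-cpINDEL5": [0.000, 0.000, 0.000, 0.000, 0.000, 0.000],
    "meri-cpINDEL8": [0.000, 0.000, 0.488, 0.117, 0.408, 0.000],
}


def average_he(table, exclude=()):
    """Mean He per species over all markers not listed in `exclude`."""
    rows = [values for marker, values in table.items() if marker not in exclude]
    return [sum(column) / len(rows) for column in zip(*rows)]


print(expected_heterozygosity([3, 1]))  # 0.375
all_12 = average_he(HE)
excl_11 = average_he(HE, exclude={"rufi-cpINDEL3"})
for name, a12, a11 in zip(SPECIES, all_12, excl_11):
    print(f"{name}: 12 loci {a12:.3f}, 11 loci {a11:.3f}")
```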
A phylogenetic tree constructed with the 11 INDELs, excluding rufi-cpINDEL3, is shown in Fig. 3. Oryza meridionalis accessions tended to form species-specific clades. Four Oceanian O. rufipogon accessions (W1235, W1238, W1239 and W2078) fell into a single clade with O. meridionalis. Two of them, W1235 and W1239, also grouped with O. meridionalis in the tree based on nuclear SSR genotypes (Fig. 2); as discussed above, these two may belong taxonomically to O. meridionalis. In contrast, the other two, W1238 (Papua New Guinea) and W2078 (Australia), grouped in the nuclear SSR tree with Asian O. rufipogon accessions from Cambodia (W2263), India (W1681) and Malaysia (W0593), together with another Australian accession (W2109). Oryza barthii also showed narrow genetic diversity: the species formed a single clade, except for W1702, which nonetheless lay close to the remaining O. barthii accessions. This clade was relatively close to the clade comprising O. longistaminata and several O. glumaepatula accessions. Overall, O. glumaepatula formed several subgroups scattered between O. rufipogon and O. longistaminata.
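The text states that rufi-cpINDEL3 was left out of the Fig. 3 tree but does not name the distance measure. Assuming the simplest choice, the proportion of scored markers at which two accessions differ (a p-distance), a pairwise distance that skips a highly mutable locus might look like the sketch below. The function name and the allele sizes are hypothetical.

```python
def indel_p_distance(acc1, acc2, exclude=()):
    """Proportion of INDEL markers at which two accessions carry different alleles.

    acc1, acc2: dicts mapping marker name -> allele label (e.g. amplicon size
    in bp, or None if not scored). Markers in `exclude` and markers not scored
    in both accessions are ignored.
    """
    compared = differing = 0
    for marker, allele1 in acc1.items():
        allele2 = acc2.get(marker)
        if marker in exclude or allele1 is None or allele2 is None:
            continue
        compared += 1
        if allele1 != allele2:
            differing += 1
    if compared == 0:
        raise ValueError("no marker scored in both accessions")
    return differing / compared


# Hypothetical amplicon sizes (bp) at four of the markers.
x = {"glum-cpINDEL1": 210, "rufi-cpINDEL1": 180, "rufi-cpINDEL3": 250, "meri-cpINDEL2": 300}
y = {"glum-cpINDEL1": 204, "rufi-cpINDEL1": 180, "rufi-cpINDEL3": 262, "meri-cpINDEL2": 300}
print(indel_p_distance(x, y))                             # 0.5
print(indel_p_distance(x, y, exclude={"rufi-cpINDEL3"}))  # 1 of 3 markers differ
```

Dropping the mutable locus changes the distance here from 2/4 to 1/3, which illustrates how a single SSR-like marker can dominate an INDEL-based tree.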

Fig. 3. Phylogenetic tree obtained by the neighbor-joining method based on 11 chloroplast INDEL markers developed from chloroplast genome sequences. The scale indicates genetic distance.
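Both trees were drawn with the neighbor-joining method. The program used is not named in this section, so the following is only an illustrative from-scratch sketch of the standard neighbor-joining algorithm, not the authors' pipeline. It repeatedly joins the pair of nodes that minimizes the Q-criterion and returns an unrooted tree in Newick format. The five-taxon matrix is a common textbook toy example, not data from this study; a matrix built from SSR or INDEL distances could be passed in the same way.

```python
def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    names: taxon labels; dist: symmetric distance matrix (list of lists) with
    a zero diagonal. Requires at least three taxa. Branch lengths are printed
    with four decimals and may be negative when distances are far from additive.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    labels = list(names)
    d = [list(row) for row in dist]
    while len(labels) > 3:
        n = len(labels)
        r = [sum(row) for row in d]
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[p][s] for s in keep] + [new_row[idx]] for idx, p in enumerate(keep)]
        d.append(new_row + [0.0])
        labels = [labels[k] for k in keep] + [joined]
    # Three nodes left: attach them to a single central node.
    a, b, c = labels
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


toy_names = ["a", "b", "c", "d", "e"]
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(toy_names, toy_dist))
```

Because the toy matrix is additive, the sketch recovers its tree exactly. With real marker distances, ties in the Q-criterion are broken here by keeping the first pair found, and other programs may break them differently.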
Phylogenetic trees obtained separately from the INDEL markers developed from each species' chloroplast genome are shown in Supplementary Fig. S2. The tree obtained with the glum-cpINDEL data was similar to the tree in Fig. 3: O. glumaepatula carried multiple maternal lineages, shared partly with O. rufipogon and partly with O. longistaminata. The tree based on the rufi-cpINDEL data was ambiguous because of the high mutability of rufi-cpINDEL3 (Supplementary Fig. S3). Provided that such highly mutable loci are excluded, INDEL markers offer an easy and efficient way to trace maternal lineages.
Among the diploid and tetraploid species of Oryza, the diploid AA genome species are unique in that they include two domesticated species, O. sativa and O. glaberrima. Because the major cultigen O. sativa comprises the divergent indica and japonica groups, O. rufipogon was presumed to carry high diversity, second only to O. glumaepatula. RCt markers can distinguish accessions at the subspecies level (Ishii and McCouch, 2000), but they did not give precise resolution when species were compared. Species distributions detected by PCA did, however, provide an overview of species divergence. Phylogenetic relationships inferred from RCt data were less precise than INDEL- or sequence-based relationships because RCt markers are more mutable than SNPs. Although chloroplast DNA is thought to diverge more slowly than nuclear DNA, the single-nucleotide repeats targeted by RCt markers can mutate at high rates. Sequences of the chloroplast gene rpl16 and its flanking region gave a similar result: when single-nucleotide repeats were excluded from the rpl16 sequences, the resulting phylogenetic tree was relatively clear. The matK sequence contained no such repeats, and the phylogenetic relationships estimated from it were correspondingly simple. Oryza glumaepatula showed high similarity to O. longistaminata in the chloroplast matK and rpl16 sequences. In contrast, O. glumaepatula shared SINE insertions in the nuclear genome with O. barthii but not with O. longistaminata, as described by Cheng et al. (2002). Unlike the chloroplast data and the SINE insertions, nuclear SSR genotypes placed O. glumaepatula apart from both O. barthii and O. longistaminata, which we attribute to the high mutability of SSR markers.
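Excluding single-nucleotide repeats before building a tree can be done by masking alignment columns. The section does not say how the repeats in rpl16 were defined or removed, so the sketch below assumes that a repeat is a run of at least five identical bases (gaps allowed inside the run) in any sequence, and drops every column such a run covers. The function name, the threshold and the alignment are hypothetical.

```python
import re


def drop_homopolymer_columns(aligned, min_run=5):
    """Remove alignment columns covered by a single-nucleotide repeat.

    aligned: equal-length aligned sequences ('-' = gap). A repeat is a run of
    at least `min_run` identical bases, possibly interrupted by gaps, in any
    sequence; every column it spans is dropped from all sequences.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # A base followed by (min_run - 1) or more copies of itself, gaps allowed.
    run = re.compile(r"([ACGT])(?:-*\1){%d,}" % (min_run - 1))
    masked = set()
    for seq in aligned:
        for m in run.finditer(seq.upper()):
            masked.update(range(m.start(), m.end()))
    keep = [c for c in range(len(aligned[0])) if c not in masked]
    return ["".join(seq[c] for c in keep) for seq in aligned]


# Hypothetical rpl16-like alignment: the A run differs only in length.
aln = ["GCTAAAAAA--CGT",
       "GCTAAAAAAAACGT",
       "GTTAAAAA---CGA"]
print(drop_homopolymer_columns(aln))  # ['GCTCGT', 'GCTCGT', 'GTTCGA']
```

After masking, the first two sequences, which differed only in the length of the A run, become identical, while the two substitutions in the third sequence remain.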
These data suggest that all of these species have diverged as biological species, but that their phylogenetic relationships are more complicated than we had anticipated.
SSR markers are highly mutable, as noted above. They offer good resolution within species or even among landraces (Garris et al., 2005; Ootsuka et al., 2014). Here, we applied them to determine how the species are related to one another. Some accessions belonging to different species appeared closely related. Some O. longistaminata and O. rufipogon accessions fell into the same clades, although the p-SINE1-r705 insertion was detected only in O. longistaminata and not in O. rufipogon. This suggests that, because of their mutability, SSR markers do not give reliable resolution above the species level. Species-specific clades were recognized for O. glumaepatula, O. barthii and O. meridionalis. Nevertheless, SSR markers gave an overview of the relationships among these species, which have diverged as independent species.
Because single-nucleotide repeats tend to mutate at a higher rate than other kinds of DNA sequence, we developed INDEL markers from chloroplast genome information. NGS techniques allowed us to obtain genome information from non-model species such as O. glumaepatula and O. meridionalis. Although complete nuclear genome sequences are not available for these species, re-sequencing against a reference chloroplast genome is feasible. Because chloroplast genomes are present in high copy numbers, the detected polymorphisms were highly reliable. Among the INDELs detected, relatively large ones were selected because they were expected to give clearly distinguishable differences. Most INDELs resulted from a simple insertion or deletion event, except for rufi-cpINDEL3, which had six and seven alleles in O. glumaepatula and O. barthii, respectively. Multiple copies of identical or similar motifs within the region amplified by the primers might generate such relatively high numbers of alleles.
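Selecting relatively large INDELs from variant calls amounts to a simple length filter. The export format of Basic Variant Detection is not described here, so the sketch assumes VCF-style REF/ALT records that share an anchor base. The Variant class, select_indels and the 4-bp minimum (chosen to mirror the four- to six-nucleotide INDELs listed in Table 2) are illustrative assumptions, and the records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """One variant call in VCF-style REF/ALT notation (hypothetical records)."""
    pos: int  # 1-based position on the reference chloroplast genome
    ref: str
    alt: str


def select_indels(variants, min_len=4, max_len=None):
    """Keep INDELs with min_len <= length <= max_len; return (variant, length).

    Assumes simple, normalized INDELs; complex REF/ALT replacements would
    need separate handling.
    """
    selected = []
    for v in variants:
        length = abs(len(v.ref) - len(v.alt))
        if length == 0:
            continue  # substitution, not an INDEL
        if length < min_len or (max_len is not None and length > max_len):
            continue
        selected.append((v, length))
    return selected


calls = [
    Variant(1000, "A", "G"),        # SNP
    Variant(2000, "TATAGAA", "T"),  # 6-bp deletion
    Variant(3000, "C", "CA"),       # 1-bp insertion
    Variant(4000, "G", "GTTAA"),    # 4-bp insertion
]
for v, length in select_indels(calls):
    print(v.pos, length)  # "2000 6", then "4000 4"
```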
With the exception of the highly mutable rufi-cpINDEL3, the INDEL markers made it possible to evaluate polymorphism within species. Notably, INDELs developed from a single O. glumaepatula accession were sufficient to reveal heterogeneity within that species, and this heterogeneity proved to be high. The degree of resolution allowed us to define subgroups in this species. These subgroups of O. glumaepatula, distinguished with chloroplast data, are likely to reflect a complex evolutionary history in which O. glumaepatula acquired multiple maternal origins, shared partly with O. longistaminata and partly with O. rufipogon. This complexity may account for O. glumaepatula having the highest He score estimated from RCt markers. During divergence, the presumed proto-O. glumaepatula population, acting as the maternal donor, apparently received substantial gene flow from a proto-O. barthii population before the two diverged as separate species on different continents. As a result, some O. glumaepatula accessions are maternally more similar to O. longistaminata but, as the SINE insertions indicate, more similar to O. barthii at the nuclear level. This discordance between nuclear and cytoplasmic composition arose before these species had completely diverged from each other. The chloroplast INDEL data separated all O. barthii accessions from the other maternal lineages. This distinct maternal origin suggests that O. barthii evolved from a minor maternal lineage carried by the ancestral population.
In contrast to O. glumaepatula, the Australian O. meridionalis accessions showed the lowest heterogeneity. This may be attributable to a founder effect associated with an early migration from Asia to the Australian continent. However, this maternal lineage has not been found in Asia and has probably become extinct there. All accessions of O. rufipogon from Australia and Papua New Guinea shared the same or a similar maternal lineage with O. meridionalis. This suggests that O. meridionalis diverged from Asian O. rufipogon in the past, and that O. rufipogon descendants carrying the same maternal lineage may later have spread into Oceania. Oryza rufipogon accessions were scattered across various clades and covered the maternal lineages of several other AA genome species, suggesting that O. rufipogon may have donated those lineages. The current data therefore point to O. rufipogon as the progenitor among AA genome species, and additional chloroplast INDELs developed from NGS data should allow this to be examined in more detail. Recent studies of complete chloroplast genome sequences suggest that O. longistaminata is closely related to some O. glumaepatula accessions (Kim et al., 2015; Wambugu et al., 2015). The diversity among AA genome species therefore needs to be characterized more precisely. Field surveys of native wild rice populations are still under way, and more intensive field work, combined with NGS, should reveal their phylogenetic relationships more precisely.
This work was funded by a Grant-in-Aid B (Overseas project, No. 25304021) from JSPS. The valuable wild rice accessions used in this study were distributed by the National Institute of Genetics supported by the National BioResources Project, MEXT, Japan. The sequencing facility in the Gene Research Center, Hirosaki University was used.