Inconsistent diversities between nuclear and plastid genomes of AA genome species in the genus Oryza

AA genome species in the genus Oryza are valuable resources for the improvement of cultivated rice. [...] than the other species. The high variation in O. rufipogon, reconfirmed using chloroplast INDELs, encompassed the variation in O. meridionalis and part of the variation in O. glumaepatula. The maternal lineages of O. barthii, O. longistaminata and the remainder of O. glumaepatula were phylogenetically close to each other and carried low genetic diversity. They were separated from the other, independent lineages, suggesting that they had diverged from a single ancestral maternal lineage and later diverged further while maintaining gene flow within their respective species, as indicated by SSR compositions. Genetic relationships among AA genome species indicate how these species have evolved and become distributed across four continents.


INTRODUCTION
The evolution of the genus Oryza, especially AA genome species, has been a major focus of rice research (Oka, 1988). AA genome species include two cultigens: O. sativa, which is cultivated worldwide, and O. glaberrima, an endemic species distributed along the River Niger. The progenitors of these cultivated species were once regarded as the O. perennis complex, consisting of four geographical races (Asian, African, American and Oceanian) inhabiting different regions (Morishima, 1969). These were subsequently renamed and defined as different species: O. rufipogon, O. barthii, O. longistaminata, O. glumaepatula and O. meridionalis (Oka, 1988). Oryza sativa was domesticated in East Asia from O. rufipogon (Oka, 1988; Fuller and Sato, 2008; Huang et al., 2012), and O. glaberrima was independently domesticated from O. barthii (Wang et al., 2014).
The major cultivated species, O. sativa, has a deep population structure, due partly to introgression from various forms of O. rufipogon and also from other varietal groups (Tang and Morishima, 1997; Ishikawa et al., 2002a, b; Garris et al., 2005; McNally et al., 2009; Molina et al., 2011; Huang et al., 2012). Divergence between the O. sativa subspecies indica and japonica had already occurred 86 to 440 thousand years ago, and divergence among AA genome wild species 2 million years ago (Zhu and Ge, 2005; Molina et al., 2011). The deep divergence between the subspecies of O. sativa is evident at the genome level (Huang et al., 2012) and in various measures of divergence (McNally et al., 2009; Molina et al., 2011). Japonica has been further classified into Temperate-japonica and Tropical-japonica (Oka, 1988; Sato, 1991). Divergence between Tropical-japonica and indica was estimated to have occurred about 3,900 years ago (Molina et al., 2011). However, these varietal groups share a monophyletic origin. Because they share identical alleles at several loci regulating major agronomic traits such as non-shattering, both varietal groups are thought to have acquired these common genetic components through past introgression (Tang and Morishima, 1997; Ishikawa et al., 2002a, b). Past introgression events among wild forms and cultivars have also been inferred from genomic data (Huang et al., 2012). Traces of such introgression can still be observed in weedy rice and in parts of the genome where selective sweeps have occurred (Molina et al., 2011). Indeed, strong selective sweeps have occurred at several key domestication genes, such as the non-shattering gene and the white pericarp gene (Li et al., 2006; Lin et al., 2007; Sweeney et al., 2007). In contrast, the endemic species O. glaberrima was domesticated from the annual O. barthii, which is widespread in tropical West Africa (Oka, 1988; Khush, 1997; Semon et al., 2005). Recent genomic data suggest that O. glaberrima was domesticated from a narrow gene pool in O. barthii (Wang et al., 2014). Because O. glaberrima was domesticated independently of O. rufipogon, it has diverged sufficiently from O. sativa to have developed multiple reproductive barriers (Morishima, 1969). The related perennial species O. longistaminata shares its habitat with the other African species but is more widely distributed in Africa and has not been domesticated.
Two other wild species on different continents, O. meridionalis and O. glumaepatula, have not been domesticated. O. meridionalis is endemic to Oceania, including New Guinea and Australia, and is characterized by an annual life history and, morphologically, by short anthers (Ng et al., 1981; Vaughan, 1994; Lu, 1999). Its distribution partly overlaps with that of O. rufipogon. Recently, a possible new species has been reported in the same area; it appears to have diverged from the endemic species but to have acquired a perennial life history (Sotowa et al., 2013; Brozynska et al., 2014). In addition, a conventional perennial type, the so-called Australian O. rufipogon, was found to share a highly similar chloroplast genome with this possible new species and with O. meridionalis. However, the Australian O. rufipogon did not show any reproductive barrier against Asian O. rufipogon, and nuclear markers also suggested that the Australian type had not diverged from the Asian type. Overall, divergence among AA genome species is still insufficiently understood. Another example is the American species O. glumaepatula, which is distributed in the New World from Cuba to Brazil. The life form of this taxon is still uncertain: an ecotype distributed in Central America and northern South America seems to be perennial (Oka, 1988), whereas another ecotype in tropical Brazil shows a form intermediate between annual and perennial (Akimoto et al., 1998). Natural populations of this and other species may therefore still harbor new, previously undetected diversity.
The availability of various molecular markers, including chloroplast sequences, chloroplast single-nucleotide repeats and nuclear simple sequence repeats (SSRs), now allows researchers to obtain a comprehensive picture of divergent evolution among the AA genome species described above, although these markers have not been fully applied to genomes other than those of some O. sativa cultivars. Since the complete sequencing of the chloroplast genome of O. sativa cv. Nipponbare (Hiratsuka et al., 1989), its conserved sequences have made it possible to compare divergent accessions, as well as mitochondrial and nuclear genomes (Sotowa et al., 2013; Brozynska et al., 2014). Re-sequencing is also feasible for any genome in the genus Oryza, with reads aligned against the reference sequence (Waters et al., 2012). In the present study, we used these materials and tools to evaluate the phylogenetic relationships among Asian, African, American and Oceanian AA genome species.

Plant materials
Wild rice accessions of ranks 1 to 3, comprising five AA genome species, were provided by the National BioResource Project in Japan (Nonomura et al., 2010). The accessions form a core collection selected from typical representatives in historical collections stored at the National Institute of Genetics, Mishima, Japan [...] (Sotowa et al., 2013). To compare genetic diversity among the cultivars, 20 indica and 20 [...] Table S1). The japonica cultivars had already been classified into Tropical-japonica (Tr-J) and Temperate-japonica (Tm-J) using the method described by Sato (1991). The wild rice accessions were supplied as DNA samples by the National Institute of Genetics, Japan. DNA from the cultivars and additional materials was extracted with a standard urea method.

Molecular markers
Chloroplast single-nucleotide repeat markers, RCt1-RCt10, were applied to evaluate the genetic diversity of the maternal lineages of AA genome species (Ishii and McCouch, 2000; Ishii et al., 2001). Nuclear SSR markers were also applied to evaluate genetic diversity (Table 2). These fragments were amplified with Thermopol Taq (New England Biolabs, Tokyo, Japan), electrophoresed on a 6% denaturing polyacrylamide gel and genotyped after staining the gel with silver nitrate. The plastid genotypes were also subjected to principal component analysis (PCA) to evaluate their relationships.
The chloroplast genes rpl16 and matK were amplified with the respective primers listed in Table 2 to examine sequence diversification, specifically to detect SNPs for determining phylogenetic relationships among African and American species. All fragments were amplified with Thermopol Taq under the following cycling conditions: initial denaturation at 94 °C for 3 min; 30 cycles of 94 °C for 10 s, 55 °C for 30 s and 72 °C for 30 s; and a final extension at 72 °C for 5 min. The PCR products were purified with a FastGene Gel/PCR extraction kit (Nippon Genetics, Tokyo, Japan) and sequenced with a BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Tokyo, Japan) on an ABI3500 Genetic Analyzer. Retrotransposon insertions were re-examined as reported previously by Cheng et al. (2002).
[...] Panaud et al. (1996) [...] heterozygosity (He), were also calculated with GenAlEx. The effective number of alleles (Ne) will be considerably lower than the actual number of alleles when some alleles are much rarer than others. He was calculated using the formula [...]
Phylogenetic trees based on RCt genotypes, INDELs and SSRs were constructed by the neighbor-joining method using Populations 1.2.31 (http://bioinformatics.org/~tryphon/populations/) and drawn with MEGA 5.0 (Tamura et al., 2011).
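The two diversity statistics mentioned above can be sketched from the allele frequencies p_i observed at a single locus. This minimal sketch assumes the standard definitions Ne = 1 / Σ p_i² and He = 1 - Σ p_i² (no small-sample correction) and treats each accession as contributing one allele call per locus, as for haploid chloroplast markers; the function names are hypothetical. It shows why Ne falls below the raw allele count when one allele dominates.

```python
from collections import Counter


def allele_frequencies(calls):
    """Allele frequencies at one locus; missing calls (None) are ignored."""
    scored = [c for c in calls if c is not None]
    counts = Counter(scored)
    return {allele: n / len(scored) for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without a sample-size correction (assumed form)."""
    return 1.0 - sum(p * p for p in freqs.values())


def effective_alleles(freqs):
    """Ne = 1 / sum(p_i^2); equals the raw allele count only if all alleles are equally common."""
    return 1.0 / sum(p * p for p in freqs.values())


# Hypothetical locus scored in ten accessions: four alleles, one of them common.
locus = ["A"] * 7 + ["B", "C", "D"]
f = allele_frequencies(locus)
print(len(f), round(effective_alleles(f), 2), round(expected_heterozygosity(f), 2))
# prints 4 1.92 0.48
```

Here four alleles are present, but because one has a frequency of 0.7 the locus behaves like one with fewer than two equally frequent alleles.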

Genetic diversity in chloroplast microsatellites
Polymorphism in chloroplast single-nucleotide repeats was found at six of the ten loci examined (Supplementary Table S2). The average expected heterozygosity (He) ranged from 0.176 to 0.542 among species (Table 3). The highest value (0.542) was found in O. glumaepatula, followed by 0.528 in O. rufipogon. Our previous study showed that O. rufipogon accessions comprise two diverged maternal lineages (Sotowa et al., 2013). We therefore divided these accessions into Asian and Oceanian groups, which had distinct He scores of 0.511 and 0.150, respectively. O. sativa showed relatively high genetic diversity when the two varietal groups (indica and japonica) were pooled. These varietal groups diverged at the subspecies level and were strongly influenced by different groups of O. rufipogon. When the genetic diversity of each varietal group was calculated separately, the scores were lower than that of O. rufipogon. The japonica subgroups Tm-J and Tr-J have also diverged to some extent, and their scores were calculated individually; both subgroups showed lower scores than the japonica group as a whole. These trends suggest that the varietal groups of O. sativa tend to carry different RCt genotypes.
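A small worked example, using hypothetical haplotype counts rather than the study's data, illustrates why pooling indica and japonica raises He: each group is nearly fixed for a different RCt haplotype, so within-group diversity is low, while the pooled sample contains three haplotypes at intermediate frequencies. He is again assumed to be 1 - Σ p_i², with one haplotype call per accession.

```python
from collections import Counter


def gene_diversity(haplotypes):
    """He = 1 - sum(p_i^2) for a list of single (haploid) haplotype calls."""
    n = len(haplotypes)
    return 1.0 - sum((c / n) ** 2 for c in Counter(haplotypes).values())


# Hypothetical groups: each mostly carries its own haplotype and shares one rare haplotype.
indica = ["H1"] * 8 + ["H2"] * 2
japonica = ["H3"] * 8 + ["H2"] * 2

print(round(gene_diversity(indica), 2))             # 0.32
print(round(gene_diversity(japonica), 2))           # 0.32
print(round(gene_diversity(indica + japonica), 2))  # 0.64
```

The pooled value is twice that of either group even though no new haplotype was introduced, which is the pattern expected when varietal groups carry different RCt genotypes.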
To obtain an overview of diversity among AA genome species, PCA was performed. Allelic combinations revealed distinct groups (Fig. 1A). The subgroups of the Asian cultivated species were clearly separated from each other, but the variation of the cultivars as a whole fell within that of O. rufipogon. This shows that the O. rufipogon accessions in the core collection encompass the fundamental variation inherited by the cultivated accessions. In the PCA, the Oceanian group was located in a position intermediate between O. rufipogon and O. meridionalis (Fig. 1B). After excluding O. sativa, the relationships of all wild AA genome species were confirmed by PCA (Fig. 1C). O. glumaepatula overlapped partly with O. longistaminata and O. rufipogon, whereas O. barthii was placed outside the others (Fig. 1D). The PCA plots suggested that O. longistaminata shares a closely related chloroplast genome with O. glumaepatula.
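The PCA step can be sketched in plain Python as follows. The coding of RCt alleles for PCA is not specified in the text, so this sketch assumes each allele at each locus is turned into a 0/1 indicator column (one-hot coding), the columns are centred, and the leading components are found by power iteration with deflation; all names and the example genotypes are hypothetical. The sign of each component is arbitrary, as in any PCA.

```python
import math
import random


def one_hot(genotypes):
    """Encode multilocus genotypes (equal-length tuples of allele labels) as 0/1 indicator rows."""
    n_loci = len(genotypes[0])
    columns = [(j, allele)
               for j in range(n_loci)
               for allele in sorted({g[j] for g in genotypes})]
    return [[1.0 if g[j] == allele else 0.0 for j, allele in columns] for g in genotypes]


def center(rows):
    """Subtract each column mean so the PCA is on centred data."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(x):
    """Sample covariance matrix of centred data (needs at least two accessions)."""
    n, p = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(c, iters=500, seed=0):
    """Power iteration for the largest eigenvalue/eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in c]
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in c]
        norm = math.sqrt(sum(a * a for a in w))
        if norm < 1e-12:  # no variance left in this direction
            return [0.0] * len(c), 0.0
        v = [a / norm for a in w]
    cv = [sum(a * b for a, b in zip(row, v)) for row in c]
    return v, sum(a * b for a, b in zip(v, cv))


def pca_scores(genotypes, k=2):
    """Project accessions onto the first k principal components of the indicator matrix."""
    x = center(one_hot(genotypes))
    c = covariance(x)
    scores = [[] for _ in x]
    for _ in range(k):
        v, lam = leading_eigenpair(c)
        for row, out in zip(x, scores):
            out.append(sum(a * b for a, b in zip(row, v)))
        # Deflation: remove the component just found so the next pass finds the next one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return scores


# Hypothetical three-locus genotypes for six accessions.
accessions = {
    "acc1": ("A", "T", "G"), "acc2": ("A", "T", "G"), "acc3": ("A", "T", "C"),
    "acc4": ("G", "C", "G"), "acc5": ("G", "C", "G"), "acc6": ("G", "C", "C"),
}
for name, (pc1, pc2) in zip(accessions, pca_scores(list(accessions.values()))):
    print(f"{name}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
```

Dedicated statistics software would normally be used for real data; the hand-rolled version only makes the encoding and projection steps explicit.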
A phylogenetic tree was constructed with RCt genotypes (Supplementary Fig. S1) [...] (Cheng et al., 2002). However, the chloroplast markers did not show any evidence for this. Nuclear SSR markers and other markers were therefore applied to clarify the phylogenetic relationships and thus to resolve the apparent inconsistency in the genetic relationships between American and African species. Twenty-eight SSR markers were randomly chosen, of which seven showed sufficient polymorphism. The remaining markers were monomorphic across all species or failed to amplify from the DNA templates of multiple species, probably because of sequence divergence among species or genomic rearrangement. The average He scores across all species ranged from 0.335 at AP004212 to 0.704 at RM257 (Supplementary Table S4). O. meridionalis showed the lowest diversity among the species. Oceanian O. rufipogon showed a high He score, second only to Asian O. rufipogon. This relatively high value for Oceanian O. rufipogon is probably due to the mixture of Asian and Australian components described in our previous paper (Sotowa et al., 2013).
These genotypes were then used to calculate genetic distances and construct a phylogenetic tree (Fig. 2). The past classification of Oceanian O. rufipogon W1235 and W1239 was based on a field observation by Katayama (1968) (Cheng et al., 2002).
Table fragment (column headers not recovered):
Asia | 23 | A | T | --G | C | GT | A | G | 7 | W0106, W0108, W0137, W0180, W1294, W1551, W1666, W1669, W1681, W1685, W1690, W1715, W1807, W1852, W1865, W1866, W1921, W1939, W1981, W2051, W2263, W2265, W2266
 | 7 | A | T | --G | C | GC | A | G | 2 | W0107, W0120, W0593, W0610, W0630, W1945, W2267
Accessions used in this experiment were re-examined for the three known SINE insertions (Supplementary Table S5). O. barthii and O. glumaepatula shared the p-SINE1-r806 insertion, whereas only O. glumaepatula carried the p-SINE1-r801 insertion. O. longistaminata carried the p-SINE1-r705 insertion, but the other species did not. These insertions suggest that the species have diverged from one another and that all accessions within each species originated from a single ancestral population. Some O. rufipogon accessions that fell in the same clades as O. longistaminata did not carry the p-SINE1-r705 insertion (data not shown). The genetic relationships inferred from SSR genotypes were imprecise beyond the species level, but SSR markers provided an overview of relationships among species.
Two accessions, W1171 (O. glumaepatula) and Thai wild rice 45-2 (O. rufipogon), were newly resequenced by next-generation sequencing (NGS), and their reads were aligned against the chloroplast genome of O. sativa cv. Nipponbare. Accession 45-2 was obtained from the Prachinburi Rice Research Center, Rice Department, Thailand; this native wild rice accession was chosen to capture polymorphism present in natural wild rice populations. Based on the output of Basic Variant Detection, several rearrangements, insertions and deletions were inferred. An additional chloroplast genome, from O. meridionalis, was also used to develop INDELs.
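The route from genotypes to a tree can be sketched as below. Populations 1.2.31 offers several genetic distances, and which one was used is not stated; this sketch substitutes a simple allele-mismatch distance (the proportion of loci at which two accessions differ), which is an assumption. The neighbor-joining step is a straightforward implementation of the standard algorithm that returns a Newick string; the accession names and genotypes are hypothetical.

```python
def mismatch_distance(a, b):
    """Proportion of loci with different allele calls; loci missing (None) in either are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no loci scored in both accessions")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_table(genotypes):
    """All pairwise distances as a dict of dicts keyed by accession name."""
    return {a: {b: mismatch_distance(ga, gb) for b, gb in genotypes.items() if b != a}
            for a, ga in genotypes.items()}


def neighbor_joining(dist):
    """Neighbor-joining on a dict-of-dicts distance table; returns a Newick string."""
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    nodes = list(d)
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Choose the pair that minimises the Q criterion.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        # Negative branch lengths can arise with non-additive data; clamp them to zero.
        u = f"({i}:{max(li, 0.0):.4f},{j}:{max(lj, 0.0):.4f})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    half = d[a][b] / 2
    return f"({a}:{half:.4f},{b}:{half:.4f});"


# Hypothetical four-locus genotypes.
genotypes = {
    "acc1": ("A", "A", "A", "A"), "acc2": ("A", "A", "A", "T"),
    "acc3": ("T", "T", "T", "A"), "acc4": ("T", "T", "T", "T"),
}
print(neighbor_joining(distance_table(genotypes)))
# prints ((acc1:0.1250,acc2:0.1250):0.3125,(acc3:0.1250,acc4:0.1250):0.3125);
```

The Newick output can be opened in tree viewers such as MEGA for drawing.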
All INDELs except rufi-cpINDEL2 and meri-cpINDEL5 were presumed to be simple four- to six-nucleotide insertions or deletions (Table 2). These INDELs were expected to be less mutable than SSRs (Table 5). One INDEL marker, rufi-cpINDEL3, was denoted as an ATAGAA deletion (Table 2). However, because the site was flanked by incomplete inverted repeat units similar to ATAGAA, this marker generated a high number of alleles and high He scores (Table 5, Supplementary Table S6). O. glumaepatula and O. barthii carried six and seven alleles, respectively, at rufi-cpINDEL3. This marker thus appeared to behave like a highly mutable SSR marker.

NGS data
A phylogenetic tree drawn with the 11 INDELs other than rufi-cpINDEL3 is shown in Fig. 3 [...] Supplementary Fig. S2. The phylogenetic tree obtained with the glum-cpINDEL data was similar to the tree in Fig. 3: O. glumaepatula carried multiple maternal lineages, shared partly with O. rufipogon and partly with O. longistaminata. The tree based on the rufi-cpINDEL data was ambiguous because of the high mutability of rufi-cpINDEL3 (Supplementary Fig. S3). Thus, INDEL markers of low mutability can help to trace maternal lineages easily and efficiently.

DISCUSSION
Diploid AA genome species are unique among the diploid and tetraploid species of Oryza in that both domesticated forms, O. sativa and O. glaberrima, belong to this genome group. Because the major cultigen, O. sativa, comprises the diverged indica and japonica groups, its progenitor O. rufipogon was presumed to carry high diversity, and its chloroplast diversity was indeed second only to that of O. glumaepatula. RCt markers have helped to distinguish accessions at the subspecies level (Ishii and McCouch, 2000), but they did not give precise resolution at the species level. On the other hand, species distributions detected by PCA provided an overview of species divergence. However, phylogenetic relationships obtained from RCt marker data were less precise than INDEL- or sequence-based relationships because RCt markers are far more mutable than SNPs. Chloroplast DNA is generally inferred to evolve more slowly than nuclear DNA, but high mutation rates can occur in the single-nucleotide repeats targeted by RCt markers. Sequences of the chloroplast gene rpl16 and its flanking region showed a similar pattern: when single-nucleotide repeats were excluded from the rpl16 sequences, the resulting phylogenetic tree was relatively clear. The matK sequence did not include such repeats, and phylogenetic relationships estimated from matK sequence data were therefore straightforward. Oryza glumaepatula shared high similarity with O. longistaminata in the chloroplast sequences of matK and rpl16. In contrast, O. glumaepatula shared SINE insertions in the nuclear genome with O. barthii, but not with O. longistaminata, as described by Cheng et al. (2002). Unlike the relationships obtained from chloroplast data and SINE insertions, nuclear SSR genotypes showed that O. glumaepatula was independent of both O. barthii and O. longistaminata. This reflects the nature of SSR markers, which mutate readily.
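Excluding single-nucleotide repeats from an aligned sequence set, as done for rpl16 before tree building, can be sketched as follows. The run-length threshold (min_run=8) and the choice to drop the affected alignment columns from every sequence are assumptions for illustration, not the criterion stated in the study; gaps inside a run are tolerated because length variation in a homopolymer usually appears as alignment gaps.

```python
import re


def repeat_columns(aligned, min_run=8):
    """Alignment columns covered, in any sequence, by a run of >= min_run identical
    bases; alignment gaps ('-') inside a run are tolerated."""
    pattern = re.compile(r"([ACGT])(?:-*\1){%d,}" % (min_run - 1))
    cols = set()
    for seq in aligned.values():
        for m in pattern.finditer(seq.upper()):
            cols.update(range(m.start(), m.end()))
    return cols


def drop_columns(aligned, cols):
    """Remove the given alignment columns from every sequence."""
    return {name: "".join(c for i, c in enumerate(seq) if i not in cols)
            for name, seq in aligned.items()}


# Hypothetical aligned fragments that differ in the length of a poly-A tract
# and carry one genuine substitution at column 2.
aligned = {
    "s1": "ACGT" + "A" * 9 + "GCT",
    "s2": "ACGT" + "A" * 7 + "--" + "GCT",
    "s3": "ACCT" + "A" * 8 + "-" + "GCT",
}
print(drop_columns(aligned, repeat_columns(aligned)))
# prints {'s1': 'ACGTGCT', 's2': 'ACGTGCT', 's3': 'ACCTGCT'}
```

After trimming, only the substitution remains to separate s3 from the other two sequences, so the repeat-length variation no longer drives the tree.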
These data suggested that all species have diverged as biological species, but the phylogenetic relationships were more complicated than we had anticipated.
SSR markers possess higher mutability, as noted above. They offer good resolution within species or even among landraces (Garris et al., 2005; Ootsuka et al., 2014). In this report, they were applied to determine how species are related to each other. Some accessions belonging to different species showed close relatedness. In the case of O. longistaminata and O. rufipogon, some accessions from both species were included in the same clades, although the p-SINE1-r705 insertion was detected only in O. longistaminata and not in O.