A large deletion within intron 20 sequence of single-copy PolA1 gene as a useful marker for the speciation in Oryza AA-genome species

The Oryza AA-genome complex comprises five wild species: O. rufipogon, O. barthii, O. longistaminata, O. glumaepatula, and O. meridionalis. Evolutionary relationships among these five wild species have remained contentious and inconclusive. We found that intron 20 of PolA1, a single-copy nuclear gene, was short (S-type: 141–142 bp) in O. rufipogon, O. barthii, and O. glumaepatula, whereas long introns (L-type: ca. 1.5 kb) were found in O. longistaminata and O. meridionalis. Because Oryza species containing the BB, CC, EE, FF, and GG genomes showed L-type introns, the S-type intron was probably derived from the L-type intron by the deletion of a 1.4 kb fragment through intramolecular homologous recombination between two tandem TTTTGC repeats. Excluding the large deleted sequence, the intron 20 sequence of O. barthii was identical to that of O. longistaminata. As more than 3,470 accessions of O. rufipogon and O. sativa also contained the same intron 20 sequence as O. longistaminata except for a single T-nucleotide deletion, which was shared with O. glumaepatula, the deletion of the T-nucleotide probably occurred in the L-type intron 20 of O. longistaminata. The deletions of the large 1.4 kb fragment and of the single T-nucleotide within intron 20 of the PolA1 gene are considered useful DNA markers for studying the evolutionary relationships among Oryza AA-genome species.


Introduction
The genus Oryza (2n = 24 to 48) comprises 24 wild species representing 11 genomes: AA, BB, CC, BBCC, CCDD, EE, FF, GG, HHJJ, HHKK and KKLL. It has two cultivated species, O. sativa L. and O. glaberrima Steud., while the other five species, O. rufipogon (including O. nivara), O. barthii, O. longistaminata, O. glumaepatula and O. meridionalis, are regarded as wild species of the AA-genome in the Oryza sativa complex (Ge et al. 1999, Vaughan 1994). The wild species of the AA-genome have been recognized as genetic resources for various kinds of useful genes to improve cultivated rice (Jena 2010).
Two cultivated rice species, O. sativa in Asia and O. glaberrima in Africa, are thought to have originated from O. rufipogon and O. barthii, respectively (Morishima et al. 1992, Oka 1988). A previous study by Second (1985) pointed out, based on isozyme analysis, that AA-genome species including O. barthii and O. glumaepatula were closely related. Many phylogenetic studies of Oryza species have been performed with DNA molecular markers such as RFLP (Wang et al. 1992), RAPD (Ishii et al. 1996), AFLP (Aggarwal et al. 1999), rDNA spacer (Cordesse et al. 1992), transposon (Kanazawa et al. 2000), catalase gene (Iwamoto et al. 1999), short interspersed element (Motohashi et al. 1997), and genome-wide sequence analysis (Zhu and Ge 2005); however, the detailed phylogenetic relationships of AA-genome species are still inconsistent among these studies. Guo and Ge (2005) reported that monophyly of Oryzeae at the tribe level was strongly supported by both individual and combined analyses of cytoplasmic and nuclear sequences. Reconstruction of a phylogenetic tree based on combinations of sequence data from different sources, such as plastid, mitochondrial and nuclear DNA, produced complicated phylogenetic relationships among wild AA-genome species (Duan et al. 2007).
We are interested in the involvement of the PolA1 gene in speciation because it encodes a species-specific protein tag sequence not only in plants but also in animals, fungi, and protists (Nakamura 2016). PolA1, a single-copy nuclear gene, encodes the largest subunit of RNA polymerase I, which plays an essential role in 45S rRNA transcription (Seither et al. 1997). PolA1 consists of 21 exons and spans approximately 15.0 kb on chromosome 6 in Oryza sativa subsp. japonica 'Nipponbare' (LOC_Os06g40950 in the Rice Genome Annotation Project and Os06g0612200 in the Rice Annotation Project Database, Kawahara et al. 2013). Recently, particular DNA sequences from exons 19 to 21 of the PolA1 gene have been useful for elucidating the phylogenetic relationships of Petunia (Zhang et al. 2008), Oryza, Triticum (Takahashi et al. 2010), Brassica (Fareed et al. 2016), and Triticum-Aegilops and Hordeum (Rai et al. 2012).
In this study, we found that the intron 20 sequences of the PolA1 gene were differentiated in length into S-type (141-142 bp) and L-type (ca. 1.5 kb) in Oryza AA-genome species. As Oryza species outside the AA-genome had the L-type intron, the S-type intron probably originated from the L-type intron by the deletion of a 1.4 kb DNA fragment. This result provides evidence for the evolutionary relationships among Oryza AA-genome species.

Plant materials and DNA extraction
Almost all accessions of Oryza species used in this study were provided by the National Institute of Genetics, Japan. Two accessions of O. longistaminata were obtained from the Genebank of the International Rice Research Institute (IRRI). The 30 accessions are listed in Table 1.
Young leaves (ca. 100 mg) of seedlings were frozen in 2-ml plastic tubes with liquid nitrogen and crushed into a fine powder. Total DNA was extracted following Doyle and Doyle (1987) and used for PCR and sequence analyses.

PCR amplification and direct sequencing
As shown in Figs. 1A and 2A, DNA fragments containing intron 20 (S-type and L-type) were amplified by PCR using two different primer pairs, a and b, and e and f, which were located in exon 20 and exon 21 of the PolA1 gene, respectively. The primers, listed in Supplemental Table 2, were designed based on the sequence of the rice PolA1 gene (NC_029261, DDBJ). PCR amplification was performed with ExTaq DNA polymerase (TaKaRa, Shiga, Japan) according to the manufacturer's instructions. The PCR conditions were 40 cycles of 94°C for 1 min for denaturation, 58°C for 1 min for annealing, and 72°C for 2 min for elongation in a PTC200 thermocycler (MJ Research, Waltham, MA, USA).
The amplified PCR products were subjected to 1.0-1.5% agarose gel electrophoresis and purified using a PCR purification kit (QIAquick; Qiagen, CA, USA). DNA sequences of the purified PCR products were determined by direct sequencing with the same primers as used for PCR amplification in an automated DNA sequencer ABI310 (Applied Biosystems, CA, USA) with a Big Dye Terminator Cycle Sequencing kit (Applied Biosystems, USA). Sequences of the L-type intron 20 were determined using the additional sequencing primers c, d1 (AA-genome species), and d2 (species other than AA-genome) (Supplemental Table 2). The determined intron 20 sequences of the PolA1 genes in Oryza species were registered in the DDBJ under accession nos. LC638415-LC638446.

Data analysis
Sequences of PCR products read by direct sequencing were analyzed to determine the positions of the donor and acceptor sites of intron 20 in the PolA1 gene using the NCBI web-based BLAST server (Altschul et al. 1990). The sequences were aligned using CLUSTAL W (Thompson et al. 1994), and the alignment was then manually adjusted using Genetyx Software ver. 6.0 (Software Development Co., Tokyo, Japan). The phylogenetic tree of intron 20 sequences was constructed using the Neighbor-joining method with bootstrap estimates from 1,000 replicates in the MEGA6 software (Tamura et al. 2011). We analyzed SNPs in intron 20 of the PolA1 gene (positions 24412641-24426383 on chromosome 6) in 3,024 accessions of O. sativa (http://iric.irri.org/resources/3000-genomes-project) from the International  to confirm sharing of the same single T-nucleotide deletion in the S-type intron 20 sequences.
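As an illustration of the kind of distance calculation that underlies a Neighbor-joining tree, the following minimal Python sketch computes the proportion of differing sites (p-distance) between aligned sequences, skipping any column in which either sequence has a gap. It is not the MEGA6 procedure used in this study; the distance model, the gap treatment, the function names, and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence carries a gap ('-') are skipped
    (pairwise deletion). This is an illustrative choice, not
    necessarily the setting used by the authors.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name_a, name_b): p-distance} for every pair of sequences."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Toy alignment (hypothetical sequences, not data from this study).
toy_alignment = {
    "seq1": "ACGTTTGCA",
    "seq2": "ACGT-TGCA",  # single-base deletion relative to seq1
    "seq3": "ACGATTGCT",
}
toy_distances = distance_matrix(toy_alignment)
```

Under this gap treatment, seq1 and seq2 have a distance of zero although they differ by a single-base deletion, which shows why an indel may need to be scored separately from substitution-based distances.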

Results and Discussion
Over the past half century, various molecular approaches have been used effectively to resolve controversies in evolution and biosystematics that had remained unresolved despite many efforts using conventional approaches (Avise 1995). Although it is difficult to infer the direction of speciation by comparing associations of SNPs and DNA markers, large insertions/deletions inside a single-copy conserved gene, such as PolA1, are thought to be good markers for determining evolutionary relationships. A previous study by Takahashi et al. (2009) reported that PCR products containing the intron 19 sequence of the PolA1 gene differed in length among AA, EE, FF and GG genome species in the genus Oryza, while the amplicon sizes were identical among AA, BB, and CC genome species. In this study, using the primer pair a and b (Fig. 1A), amplified DNA fragments containing the intron 20 sequences of the PolA1 gene were differentiated into two types, long type (L-type) and short type (S-type), in Oryza species. As shown in Fig. 1B, Oryza species outside the AA-genome showed the L-type intron (Table 1). Of two AA-genome accessions, O. sativa Ac221 showed the S-type (141 bp), while O. longistaminata W0708 contained both S- and L-types. This result suggested that L-type introns were ancestral to S-type introns and that the large deletion within intron 20 happened after the AA-genome species originated.
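The length-based distinction between the two intron types can be expressed as a simple rule. The Python sketch below is a hypothetical illustration only: the cut-off values (200 bp and 1,000 bp) and the function names are assumptions chosen to separate the reported S-type (141-142 bp) and L-type (ca. 1.5 kb) lengths, and were not used in this study.

```python
def classify_intron20(length_bp, s_max=200, l_min=1000):
    """Classify an intron 20 length as 'S-type', 'L-type', or 'unclassified'.

    s_max and l_min are hypothetical thresholds for illustration;
    the study reports S-type as 141-142 bp and L-type as ca. 1.5 kb.
    """
    if length_bp <= 0:
        raise ValueError("length must be a positive number of base pairs")
    if length_bp <= s_max:
        return "S-type"
    if length_bp >= l_min:
        return "L-type"
    return "unclassified"


def classify_accession(lengths_bp):
    """Return the sorted set of intron types found in one accession.

    An accession yielding two amplicons may carry both types.
    """
    return sorted({classify_intron20(length) for length in lengths_bp})


# Example lengths taken from the ranges reported in the text.
single_type = classify_accession([141])
both_types = classify_accession([141, 1500])
```

An accession that yields both a short and a long fragment, as reported here for O. longistaminata W0708, would be returned with both labels.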
Within AA-genome species, using a different primer pair, e and f (Fig. 2A), all accessions of O. rufipogon showed the S-type intron 20 except for two accessions, W1235 and W1239 (Fig. 2B, Supplemental Table 1). As Sotowa et al. (2013) reported that these two New Guinea accessions (W1235 and W1239) shared the same deletions in the nuclear genome with O. meridionalis, they were misclassified as O. rufipogon (Lam et al. 2020). Although the L-type intron 20 sequences of several accessions could not be determined because of sequence heterogeneity (Table 1), a Neighbor-joining phylogenetic tree of the L-type sequences in Oryza species was constructed (Fig. 1C). Two AA-genome species, O. longistaminata and O. meridionalis, had closely related L-type sequences. Oryza punctata (BB-genome) and O. officinalis, O. eichingeri, and O. rhizomatis (CC-genome species) formed a single clade, which was closely related to that of the AA-genome species. Oryza australiensis (EE), O. brachyantha (FF), and O. granulata (GG) formed paraphyletic groups that were distantly related to the AA-genome species.
The phylogenetic analysis based on the L-type intron 20 sequence (Fig. 1C) was consistent with those based on nuclear ribosomal DNA sequence (Kim et al. 2015) and multiple SINE inserts (Cheng et al. 2002), which supported the position of O. longistaminata as the basal AA-genome species. The ancestor of the Asian Oryza AA-genome species diverged from the ancestor of O. longistaminata in Africa, involving changes from perennial to annual and sympatric speciation during the course of evolution (Cheng et al. 2002, Iwamoto et al. 1999, Ohtsubo et al. 2004).
We found that a large DNA fragment (ca. 1.4 kb) was probably deleted between two tandem TTTTGC repeats in the L-type intron, which resulted in the S-type intron (141-142 bp) (Fig. 2C). Identical tandem repeats were also present at the same positions in the L-type intron of O. officinalis W0002 (Fig. 3A). Excluding the large 1.4 kb sequence, the sequences of the S- and L-type introns were highly homologous (Fig. 3A). In detail, the intron 20 sequence of O. barthii W0652 and W1416 was identical to that of O. longistaminata W1232. Both the annual W0106 (O. nivara) and perennial W1956 accessions of O. rufipogon, as well as temperate japonica 'Nipponbare', tropical japonica Ac221, and indica Ac130 of O. sativa, contained the same intron 20 sequence as O. longistaminata W1232 except for a single T-nucleotide deletion (Fig. 3A). Interestingly, the same single T-nucleotide deletion was shared with two accessions (W1169, W1185) of O. glumaepatula.
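The proposed origin of the S-type intron, loss of the segment lying between two copies of the TTTTGC repeat with one copy retained, can be sketched in Python as follows. This is a simplified model under stated assumptions: the flanking and internal sequences are hypothetical toy strings rather than real intron 20 sequence, and the function simply takes the first and last occurrence of the repeat as the two recombining copies.

```python
REPEAT = "TTTTGC"  # repeat motif reported in the text


def delete_between_repeats(sequence, repeat=REPEAT):
    """Remove the segment between the first and last copy of `repeat`.

    One copy of the repeat is kept, as expected when recombination
    between two direct repeats excises the intervening DNA.
    If fewer than two copies are present, the sequence is returned
    unchanged.
    """
    first = sequence.find(repeat)
    last = sequence.rfind(repeat)
    if first == -1 or last == first:
        return sequence
    # Keep everything up to the end of the first copy, then resume
    # after the end of the last copy.
    return sequence[: first + len(repeat)] + sequence[last + len(repeat):]


# Toy L-type intron (hypothetical flanks and internal segment).
left_flank = "GTAAGC"
internal_segment = "CAGA" * 5  # stands in for the ca. 1.4 kb fragment
right_flank = "CCAG"
l_type = left_flank + REPEAT + internal_segment + REPEAT + right_flank

s_type = delete_between_repeats(l_type)
deleted_bp = len(l_type) - len(s_type)
```

In this toy case the deleted length equals the internal segment plus one repeat copy, which mirrors the way a single recombination event between the two repeats could convert an L-type intron into an S-type intron.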
The evolutionary relationships among O. rufipogon, O. barthii, O. longistaminata, O. glumaepatula and O. meridionalis have long been a subject of controversy (Vaughan et al. 2005, Wang et al. 1992, Zhu and Ge 2005).  (Fig. 3C). Then, deletions of the same 1.4 kb fragment arose independently during the speciation from O. longistaminata to O. barthii and from an O. longistaminata-like species to O. rufipogon and O. meridionalis. Extensive sequence analysis of intron 20 of the PolA1 gene will be necessary in AA-genome species other than O. rufipogon and O. sativa to reveal which of the two scenarios is correct.
A previous study by Akimoto et al. (1997) presumed that O. glumaepatula was associated with both O. rufipogon and O. longistaminata. In this study, as O. glumaepatula shared the S-type intron 20 sequence containing the single T-nucleotide deletion with O. rufipogon (Fig. 3A), it might have originated in relation to O. rufipogon (Vaughan et al. 2005, Yin et al. 2015). The two cultivated species, O. sativa and O. glaberrima, originated independently from O. rufipogon in South-east Asia and O. barthii in West Africa, respectively. These origins of the two cultigens have been supported by many studies (Aggarwal et al. 1999, Chang 1976, Zhu and Ge 2005).
In this study, we found that intron 20 sequences of the PolA1 gene differed by an order of magnitude in length between two types, the L-type (ca. 1.5 kb) and the S-type (141-142 bp). The annual species O. meridionalis also originated from the perennial species O. longistaminata in Australia. Although the mechanism underlying the speciation of these new species from O. longistaminata remains to be resolved, ancestral O. longistaminata outside Africa was displaced over time by newly arisen species. These findings suggest that O. longistaminata may formerly have been distributed not only in Africa but also in Asia and Australia.