Edited by Yoshio Sano. Hisako Ohtsubo: Corresponding author. E-mail: hohtsubo@ims.u-tokyo.ac.jp
SINEs (Short INterspersed Elements) are 70–500 bp repetitive DNA sequences that proliferate via transcription followed by reverse transcription. SINEs are found in a wide variety of eukaryotes, including animals, fungi and plants (Umeda et al., 1991; Okada, 1991; Yoshioka et al., 1993; Deragon et al., 1994; Kachroo et al., 1995). SINEs contain an internal RNA polymerase III promoter, which is involved in their transcription, but have no open reading frames. The 5’-regions of SINEs are related to tRNA (Lawrence et al., 1985; Daniels and Deininger, 1985; Endoh and Okada, 1986; Yoshioka et al., 1993), to 7SL RNA, as shown for animal SINEs such as the primate Alu and rodent B1 family elements (Weiner, 1980; Ullu and Tschudi, 1984), or to 5S rRNA, as in SINE3 in the zebrafish genome (Kapitonov and Jurka, 2003). All these SINEs have a poly(A) tract or an A- or T-rich sequence in their 3’-end regions. The 3’-regions of some SINEs show similarity to the 3’-end regions of LINEs (non-LTR retrotransposons). Specific families of SINEs are found only in closely related species, and it has thus been postulated that each family of SINEs originated relatively recently on the evolutionary time scale (Kido et al., 1991).
The rice genus, Oryza, comprises approximately 22 species with six diploid genome types AA, BB, CC, EE, FF, GG and four tetraploid genome types BBCC, CCDD, HHJJ and HHKK, and these species are divided into several complexes (Khush, 1997; Ge et al., 1999; Vaughan et al., 2003). Of these complexes, the Oryza sativa complex contains seven AA-genome species, including two cultivated species, O. sativa and Oryza glaberrima. The Oryza officinalis complex, which is most closely related to the O. sativa complex, contains nine species including diploid (BB, CC, and EE) and tetraploid (BBCC and CCDD) species.
The first plant SINE, which is named p-SINE1 (plant Short INterspersed Element No. 1), was identified in the introns of the Waxy gene in O. sativa (Umeda et al., 1991), and since then a large number of p-SINE1 members have been identified from rice species in the O. sativa and O. officinalis complexes (Mochizuki et al., 1993; Motohashi et al., 1997; Cheng et al., 2002, 2003; Xu, 2004). These include members of the RA (Recently Amplified) subfamily that consists of two groups, RAα and RAβ (see Ohtsubo et al., 2004), most of which show insertion polymorphisms within O. sativa and its ancestral wild species O. rufipogon with the AA genome. The presence or absence of these polymorphic members has been used for phylogenetic analysis of strains in these species (Cheng et al., 2003; also Xu, 2004).
In this study, we identified and characterized two new families of SINEs from rice. Seventeen members of the p-SINE2 family and 24 members of the p-SINE3 family were screened among rice strains of species with different genome types. Consensus sequences derived from the alignments of the sequences of members of each p-SINE family revealed that their 5’-end regions with the polymerase III promoter show significant homology with the 5’-end region of p-SINE1, but not with the 3’-end region of p-SINE1. The deduced RNA secondary structures of p-SINE1, p-SINE2 and p-SINE3 were, however, similar to one another. Similar to the p-SINE1 family members, the members of the p-SINE2 or p-SINE3 family were located at random on the 12 rice chromosomes. Sequence divergence observed among members of each p-SINE family suggests that p-SINE2 and p-SINE3 were derived from p-SINE1.
Rice strains used are described in the Results and Discussion section. Total genomic DNA samples of rice strains were previously described (Cheng et al., 2002; Xu, 2004).
Nucleotide sequence searches in databases (DDBJ, EMBL, and GenBank) were performed with the BLAST program (Altschul et al., 1990). Primary nucleotide sequences were analyzed with the GENETYX-Mac 12 system program. Multiple sequences were aligned with GENETYX-Mac 12 and Clustal W version 1.7 (Thompson et al., 1994). A phylogenetic tree was constructed from the nucleotide sequences of p-SINE members with Clustal W version 1.7. The mean genetic distance was calculated according to Lenoir et al. (2001).
The PCR analysis was performed with Ex Taq DNA polymerase (Takara), as described previously (Motohashi et al., 1997). The presence or absence of each p-SINE member was determined by identifying one unique PCR fragment with or without a p-SINE member after electrophoresis in a 1.8% agarose gel. When the fragments differed in size or when two or more bands were generated, the presence or absence of the p-SINE member in the fragments was confirmed by Southern hybridization or by direct sequencing of the PCR products, as described previously (Cheng et al., 2003).
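The scoring rule described above can be illustrated with a short sketch. It is not the procedure used in this study: the function name, the sizing tolerance and the fragment sizes in the usage note are hypothetical, and the sketch assumes that the expected size of the empty-site fragment and the length of the inserted element are known for the locus.

```python
from typing import List


def score_locus(bands_bp: List[int], empty_bp: int, element_bp: int,
                tolerance_bp: int = 20) -> str:
    """Call one p-SINE locus from the PCR band sizes observed on a gel.

    empty_bp     : expected fragment size without the element (assumed known)
    element_bp   : length of the element plus one target site duplication
    tolerance_bp : hypothetical allowance for gel sizing error; it should be
                   well below element_bp / 2 so the two calls cannot overlap
    """
    filled_bp = empty_bp + element_bp
    if not bands_bp:
        return "no_product"
    if len(bands_bp) > 1:
        # Two or more bands: the text calls for Southern hybridization
        # or direct sequencing before a call is made.
        return "confirm"
    band = bands_bp[0]
    if abs(band - filled_bp) <= tolerance_bp:
        return "present"
    if abs(band - empty_bp) <= tolerance_bp:
        return "absent"
    # A single band of unexpected size also needs confirmation.
    return "confirm"
```

For a hypothetical locus with a 300 bp empty site and a 220 bp insertion, `score_locus([520], 300, 220)` gives `"present"` and `score_locus([300, 520], 300, 220)` gives `"confirm"`.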
Nucleotide sequence data with information for p-SINE2 and its members (r2001–r2010 and r2012–r2018) and for p-SINE3 and its members (r3001–r3024) appear in the DDBJ/EMBL/GenBank International Nucleotide Sequence Databases under the accession numbers AB206875–AB206884, AB206886–AB206893 and AB206894–AB206918, respectively.
We have previously identified and characterized many p-SINE1 members in several O. sativa strains. All of them appear to be very closely related to one another, except for two members, r24 and r3011 (Fig. 1A). We first searched genomic DNA databases (http://search.usricegenome.org/) for sequences homologous to each of the two members in chromosomes 3 and 10 of O. sativa var. Nipponbare. We identified several sequences showing strong homology with each member, suggesting that these two members belong to new SINE families. The two new SINE families were, therefore, designated as p-SINE2 and p-SINE3 to distinguish them from p-SINE1. Using a consensus sequence derived from the members of each family, we identified 17 members of p-SINE2 and 24 members of p-SINE3 in O. sativa in nucleotide sequence databases (Fig. 2). The members of p-SINE2 and p-SINE3 each formed a branch distinct from the other and from p-SINE1 (Fig. 1B), confirming that p-SINE2 and p-SINE3 are new families distinct from p-SINE1. Almost all members of p-SINE2 and p-SINE3 are flanked by direct repeats of a sequence, 8–20 bp in length, at a target site, except one member of p-SINE2 and three members of p-SINE3 (Fig. 2). The consensus sequences derived from the nucleotide sequences of all the members of each family (Fig. 2) were compared with the p-SINE1 consensus sequence and found to have significant homology (about 80%) in their 5’-end regions, which contain the polymerase III promoter (A box and B box), but poor homology (less than 40%) in their 3’-end regions, although all contained T-rich tails at their 3’ ends (Fig. 3). The 5’-end regions had no significant homology to any other SINEs. This suggests that the three p-SINE families originated from a common ancestor.
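The direct repeats flanking an element can be found by comparing the sequence immediately upstream of the element with the sequence immediately downstream. The following is a minimal sketch that assumes perfect repeats; the function name and the flank sequences in the test are hypothetical, and the default length bounds are taken from the 8–20 bp range given above. Real target site duplications may contain mismatches, which this sketch does not handle.

```python
from typing import Optional


def find_tsd(left_flank: str, right_flank: str,
             min_len: int = 8, max_len: int = 20) -> Optional[str]:
    """Return the longest perfect direct repeat flanking an element.

    left_flank  : sequence ending at the base just before the element
    right_flank : sequence starting at the base just after the element
    A target site duplication has the layout [TSD][element][TSD], so the
    end of the left flank must equal the start of the right flank.
    """
    left = left_flank.upper()
    right = right_flank.upper()
    longest = min(max_len, len(left), len(right))
    # Try the longest candidate first and shorten until a match is found.
    for k in range(longest, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return None
```

A member for which the function returns `None` would correspond to the exceptions noted above, although a mismatch-tolerant comparison would be needed before concluding that no repeat is present.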
Fig. 1. Phylogenetic trees of p-SINE members. A. A phylogenetic tree of p-SINE1-family members. B. A phylogenetic tree of p-SINE2- and p-SINE3-family members. Only two p-SINE1 members are included. These trees were constructed based on their nucleotide sequences. The scale bar equals a distance of 0.1.
Fig. 2. Alignments of nucleotide sequences of members of two new p-SINE families. (A) p-SINE2 family members. (B) p-SINE3 family members. All members of p-SINE2 and p-SINE3 were identified from O. sativa japonica variety Nipponbare, except r3001, which was identified from O. sativa indica variety 93–11. The consensus sequences of p-SINE2 and p-SINE3 are shown at the top. Bars denote identical nucleotides to those in the respective consensus sequences and slashes indicate gaps introduced to maximize homology. Sequences corresponding to the A- and B-boxes of the polymerase III promoter are shown by boldface letters. Almost all members of p-SINE2 and p-SINE3 families are flanked by direct repeats of a target site sequence, 9–20 bp in length, which are double-underlined, except one member (r2012) of p-SINE2 and three members (r3007, r3016A and r3018) of p-SINE3.
Fig. 3. Comparison of consensus sequences of three p-SINE elements from rice. Schematic structures of the three elements are shown at the top. RNA polymerase III promoter elements (A box and B box), the AT rich region and the T-rich tail are indicated by different geometrical patterns. Alignments of consensus sequences of p-SINE1, p-SINE2 and p-SINE3 are shown at the bottom. Identical nucleotides and gaps are indicated by asterisks and bars, respectively. The nucleotides of the A- and B-boxes are shown in boldface letters.
The RNA secondary structure of p-SINE1 has been previously determined (Fig. 4; Osawa, 2003). Based on this structure, the RNA secondary structures of p-SINE2 and p-SINE3 were deduced (Fig. 4). They were found to be similar to one another, although they contained many nucleotide substitutions at their 3’-end regions. It is particularly interesting that a stem-loop structure seen in the 3’-end region in the RNA secondary structure is highly conserved, despite the large number of substituted nucleotides (Fig. 4). The conservation of the stem-loop structure suggests that the structure has an important role in p-SINE retroposition.
Fig. 4. RNA secondary structures of three rice p-SINE elements. Structures shown in A–C are derived from consensus sequences of p-SINE1, p-SINE2 and p-SINE3, respectively. The structure of p-SINE1 shown in A has been determined previously (Osawa, 2003). The nucleotides substituted in p-SINE2 and p-SINE3 are indicated by letters in red in comparison with the p-SINE1 sequence.
Like the p-SINE1 members, the p-SINE2 and p-SINE3 members appear to be dispersed randomly along each of the 12 chromosomes (Fig. 5; Ohtsubo et al., 2004), unlike other retrotransposons and autonomous transposable elements, which cluster in the heterochromatin of pericentromeric regions (Sasaki et al., 2002; Feng et al., 2002; The Rice Chromosome 10 Sequencing Consortium, 2003). Of the 17 p-SINE2 members, 10 were present in putative or hypothetical exons or introns, or within 0.5 kb of genes or putative coding regions (Table 1). Of the 24 p-SINE3 members, 12 were present in putative or hypothetical exons or introns, or within 0.5 kb of genes or putative coding regions (Table 1). This suggests that p-SINE2 and p-SINE3 tend to be inserted within or near genes, like p-SINE1 (Ohtsubo et al., 2004), other SINEs from Arabidopsis, and Alu elements from human (Lenoir et al., 2001; Grover et al., 2004). Although SINEs may have an impact on gene expression (Oldridge et al., 1999; Deininger and Batzer, 1999), the presence of p-SINE elements in gene-rich regions may be explained by their short size, which may be better tolerated in such regions than that of longer elements, and by their non-autonomous nature, which may favor their survival in gene-rich regions.
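The criterion "within or within 0.5 kb of a gene" amounts to an interval-overlap test with a padded gene interval. The sketch below illustrates it under stated assumptions: the function name and all coordinates are hypothetical, and positions are taken to be inclusive coordinates on the same chromosome. It does not reproduce the annotation procedure behind Table 1.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive, start <= end


def near_gene(element: Interval, genes: List[Interval],
              window_bp: int = 500) -> bool:
    """True if the element overlaps a gene or lies within window_bp of one."""
    e_start, e_end = element
    for g_start, g_end in genes:
        # Pad the gene by window_bp on both sides, then test for overlap.
        if e_start <= g_end + window_bp and e_end >= g_start - window_bp:
            return True
    return False
```

With the counts reported above, 10 of 17 p-SINE2 members (about 59%) and 12 of 24 p-SINE3 members (50%) meet this kind of criterion.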
Fig. 5. Chromosomal locations of p-SINE members. A. p-SINE2 family members. B. p-SINE3 family members. Positions of the members are located on marker-based physical maps derived from the International Rice Genome Sequencing Project homepage (http://rgp.dna.affrc.go.jp/IRGSP/download.html). Horizontal bars on each of 12 rice chromosomes indicate members present in the Nipponbare genome, except a p-SINE3 member r3001 on chromosome 11, which is not present in the Nipponbare genome but in the 93-11 genome. A p-SINE3 member r3005 on chromosome 1 is not present in the 93-11 genome. Centromeric regions (solid boxes) are depicted according to the physical maps shown in the URL above.
Table 1. Locations of p-SINE2 and p-SINE3 family members
We examined the presence or absence of the p-SINE2 members at their respective loci in 19 strains of species with various genome types (Table 2) by PCR, using a pair of primers that hybridize to the flanking regions of each p-SINE2 member. All p-SINE2 members were present in the strains of all or some species with the AA, BB, BBCC, CC, CCDD or EE genome, but appeared to be absent in the strains of species with the FF, GG or HHJJ genome (Table 2). This shows that they are distributed in the species of the O. sativa and O. officinalis complexes, which suggests that the p-SINE2 family originated in an ancestor of the species with the AA, BB, CC, DD or EE genome. Almost all the p-SINE2 members characterized show no insertion polymorphism in the O. sativa complex (Table 2). This suggests that the p-SINE2 family members may have retroposed in the past and have been stably maintained in recent times.
Table 2. The presence or absence of p-SINE2 family members at respective loci in the rice strains of various species
We examined the presence or absence of p-SINE3 members in some of the strains analyzed above and obtained results suggesting that p-SINE3 members are present in the strains of the AA-genome species, but not in those of the non-AA genome species, such as O. punctata and O. officinalis. Therefore, we examined 39 strains of seven AA genome species (Table 3), for the presence or absence of the p-SINE3 members, as described above. Some p-SINE3 members (such as r3007, r3010, and r3015) were present in the strains of all species with the AA genome (Table 3). This suggests that p-SINE3 originated in a common ancestral strain of the species with the AA genome. Other p-SINE3 members (such as r3001, r3005, r3009 and r3018) showed intra-species insertion polymorphisms. This suggests that p-SINE3 was amplified recently on an evolutionary time scale.
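The distinction drawn above, between members present in all strains and members showing intra-species insertion polymorphism, can be expressed as a simple classification of a presence/absence table. The sketch below is illustrative only: the function names, category labels and the species and strain identifiers in the test are hypothetical and do not reproduce the data of Table 3.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (species, strain) -> True (present), False (absent) or None (not scored)
Calls = Dict[Tuple[str, str], Optional[bool]]


def polymorphic_species(calls: Calls) -> List[str]:
    """Species in which the member is present in some strains and absent in others."""
    states = defaultdict(set)
    for (species, _strain), present in calls.items():
        if present is not None:
            states[species].add(present)
    return sorted(s for s, seen in states.items() if seen == {True, False})


def classify_member(calls: Calls) -> str:
    """Summarize the distribution of one p-SINE member across strains."""
    scored = [p for p in calls.values() if p is not None]
    if not scored:
        return "unscored"
    if polymorphic_species(calls):
        return "intra-species polymorphic"
    if all(scored):
        return "present in all scored strains"
    if not any(scored):
        return "absent in all scored strains"
    return "differs between species only"
```

Under the reasoning in the text, a member in the first category would point to recent amplification, whereas a member present in all scored strains of all AA-genome species would point to an insertion in a common ancestor.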
Table 3. The presence or absence of p-SINE3 family members at respective loci in various rice strains
The consensus sequence derived from nucleotide sequences of members of an element family approximates the sequence of the founder element (Jurka, 1998), and the genetic distance of the members from its consensus sequence is related to the age of the family. To estimate the age of each of the three p-SINE families, we aligned the nucleotide sequences of 52 members of p-SINE1, 17 of p-SINE2 and 24 of p-SINE3 with the consensus sequence of each element and calculated the genetic distance of the members of each family. The mean distance from the consensus sequence was determined to be 0.0595 for the p-SINE3 family members, which was much smaller than the values (0.151 and 0.114) for the p-SINE1 and p-SINE2 family members, respectively. This finding indicates that the p-SINE3 family is younger than the p-SINE1 and p-SINE2 families. This is consistent with the fact that the p-SINE3 family members are present only in the strains of the species with the AA genome, whereas members of the p-SINE1 and p-SINE2 families are present not only in strains of the AA-genome species, but also in those of the non-AA genome species, as described in the previous section. The above finding also indicates that p-SINE1 is older than p-SINE2. As described earlier, the three p-SINE families are thought to have originated from a common ancestor. The largest mean distance, that of p-SINE1, suggests that p-SINE2 and p-SINE3 were derived from p-SINE1.
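The idea of a mean distance from the consensus can be sketched as follows. The distance measure here is an assumption: the sketch uses the uncorrected proportion of mismatches over ungapped aligned positions, with an optional Jukes-Cantor correction, which may differ from the calculation of Lenoir et al. (2001) used in this study. The function names and the toy sequences in the test are hypothetical.

```python
import math
from typing import List

GAP_CHARS = "-/"  # "/" marks gaps in the alignments of Fig. 2


def p_distance(member: str, consensus: str) -> float:
    """Proportion of mismatches over aligned positions without gaps."""
    if len(member) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(member.upper(), consensus.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple substitutions at one site."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def mean_distance(members: List[str], consensus: str,
                  corrected: bool = False) -> float:
    """Mean distance of aligned family members from their consensus."""
    distances = [p_distance(m, consensus) for m in members]
    if corrected:
        distances = [jukes_cantor(d) for d in distances]
    return sum(distances) / len(distances)
```

Whatever the exact measure, the comparison in the text relies only on the ordering of the family means: a smaller mean distance from the consensus is read as a younger family.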
SINEs are non-autonomous elements that must use the enzymatic machinery from an autonomous retroelement in trans for their retroposition. Since some SINEs and LINEs share a 3’-tail sequence, the most probable candidates for providing the retroposition machinery to SINEs are retrotranspositionally active LINE elements (Ohshima et al., 1996; Smit, 1996; Gilbert and Labuda, 1999; see Okada et al., 1997 for a review), which suggests that the LINE-encoded RTase mobilizes the passive SINE by recognizing the 3’ tail of the SINE RNA as the template for reverse transcription (Luan et al., 1993; Luan and Eickbush, 1995; Ohshima et al., 1996). A HeLa-cell retrotransposition assay was used to demonstrate that the proteins encoded by an eel LINE element could function in trans to mobilize an eel SINE (Kajikawa and Okada, 2002). Recently, it has been reported that a tagged reporter gene driven by transcription of a ‘young’ Alu sequence could be trans-mobilized by human LINE L1 (Dewannieux et al., 2003). This indicates that the mobilization of SINE elements is mediated by their partner LINE elements. Given the differences in their 3’-end regions, it is therefore possible that the mobilization of the three SINE elements in rice has been mediated by their respective putative partner LINEs. As stated above, the p-SINE2 and p-SINE3 families are thought to be derived from the p-SINE1 family, possibly by nucleotide substitutions in the 3’-end region. Therefore, we speculate that the partners of the p-SINE2 and p-SINE3 elements are derived from that of the p-SINE1 family. However, no rice LINE elements that share the 3’-end regions with any p-SINE elements have been identified to date.
This work was supported by a grant from the Ministry of Education, Culture, Sports, Science and Technology of Japan.