Epigenetic regulation of transcription and possible functions of mammalian short interspersed elements, SINEs

Short interspersed elements (SINEs) are a class of retrotransposons, which amplify their copy numbers in their host genomes by retrotransposition. More than a million copies of SINEs are present in a mammalian genome, constituting over 10% of the total genomic sequence. In contrast to the other two classes of retrotransposons, long interspersed elements (LINEs) and long terminal repeat (LTR) elements, SINEs are transcribed by RNA polymerase III. However, like LINEs and LTR elements, SINE transcription is likely regulated by epigenetic mechanisms such as DNA methylation, at least for human Alu and mouse B1. Whereas SINEs and other transposable elements have long been regarded as selfish or junk DNA, recent studies have revealed that they play functional roles at their genomic locations, for example, as distal enhancers, chromatin boundaries and binding sites for many transcription factors. These activities imply that SINE retrotransposition has shaped the regulatory networks and chromatin landscapes of their hosts. Whereas the epigenetic mechanisms are thought to have originated as a host defense system against the proliferation of parasitic elements, this review discusses the possibility that the same mechanisms are also used to regulate SINE-derived functions.


INTRODUCTION
About half of the nucleotides in mammalian genomes are derived from retrotransposable elements, which include short interspersed elements (SINEs), long interspersed elements (LINEs) and long terminal repeat (LTR) elements (Deininger and Batzer, 2002; Deininger et al., 2003). These elements can amplify their copy numbers by transposition via an RNA intermediate, called retrotransposition (Fig. 1). SINE is a collective name for highly iterative sequences of typically 100-500 base pairs (bp) in length (Singer, 1982). Examples of mammalian SINEs are human Alu (named for the presence of an AluI cleavage site) (Rubin et al., 1980; Deininger et al., 1981) and mouse B1 and B2 (named for their homology to double-stranded regions in nuclear pre-mRNAs, called dsRNA-B) (Kramerov et al., 1979). Currently, 175 SINE families have been identified in a wide variety of eukaryotes (Vassetzky and Kramerov, 2013). In mammals, more than a million copies of SINEs are present in a genome, making up > 10% of the total genomic sequence (Lander et al., 2001; Waterston et al., 2002; Lindblad-Toh et al., 2005; Mikkelsen et al., 2005, 2007; Elsik et al., 2009). Whereas transposable elements have been considered selfish or junk DNA (Ohno, 1972; Doolittle and Sapienza, 1980; Orgel and Crick, 1980), recent findings from genomic and epigenomic research suggest that some of their copies have functional roles in gene regulation and chromatin organization. This review provides an overview of the structure, transcriptional regulation and possible functions of SINEs in mammals, and discusses how these functions are regulated. We hope that this review will be helpful not only for researchers of transposons but also for those working on genome-wide studies.
considerably different between the families, they are all descended from cellular RNAs transcribed by RNA polymerase III (Pol III). Thus, in addition to the definition by Singer (1982), transcription by Pol III is currently an important criterion in the definition of SINEs (Okada, 1991). Most SINEs such as mouse B2 are of tRNA origin (Daniels and Deininger, 1985; Sakamoto and Okada, 1985), whereas a few families such as human Alu and mouse B1 originated from 7SL RNA (Weiner, 1980; Ullu and Tschudi, 1984), an RNA component of the signal recognition particle, and a few from 5S ribosomal RNA (Kapitonov and Jurka, 2003). These Pol III genes and SINEs have promoter elements called A- and B-boxes in the transcribed region (Geiduschek and Kassavetis, 2001; Schramm and Hernandez, 2002) (Fig. 1). For transcription by the Pol III enzyme, the A- and B-boxes are first recognized by a six-subunit protein complex, TFIIIC. This protein-DNA interaction leads to the binding of the three-subunit TFIIIB complex, which recruits the Pol III enzyme. Transcription starts upstream of the A-box, proceeds through the promoter region, and ends at the termination signal, which is a simple run (4 or more) of thymidine residues. Right after the discovery of SINEs, Jagadeeswaran et al. (1981) proposed that SINEs are amplified within the genome by retrotransposition. SINEs do not encode any protein for their autonomous mobilization, whereas LINEs encode a protein with reverse transcriptase and endonuclease activities essential for LINE retrotransposition (Moran et al., 1996) (Fig. 1B). The LINE-encoded reverse transcriptase recognizes the RNA from which it was translated (Wei et al., 2001), and initiates the mobility reactions termed target-primed reverse transcription or TPRT (Luan et al., 1993); it cleaves target genomic DNA and reverse transcribes the bound RNA using the site of DNA cleavage as a primer (Fig. 1C). The RNA region recognized by LINE reverse transcriptases is usually the 3′ untranslated region with a specific sequence (Kajikawa and Okada, 2002; Anzai et al., 2005), or the polyadenylated tail in the case of LINE-1 (L1) reverse transcriptases (Wei et al., 2001).
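The termination rule described above (a run of four or more thymidine residues downstream of the start site) can be illustrated with a minimal Python sketch. The function names, the regular expression, and the choice to report the sequence through the end of the T-run are assumptions made for illustration only; real Pol III transcripts typically end within the run, and the toy sequence below is not a real SINE.

```python
import re

# Terminator as described in the text: a run of four or more T residues
# on the coding strand. The regex is an illustrative assumption.
TERMINATOR = re.compile(r"T{4,}")


def first_terminator(seq, start=0):
    """Return (start, end) of the first T-run of length >= 4 at or after `start`, or None."""
    match = TERMINATOR.search(seq.upper(), start)
    return (match.start(), match.end()) if match else None


def predicted_transcript(seq, tss=0):
    """Return the sequence from the start site through the first terminator run.

    Simplification: the whole T-run is included. Returns None if no
    terminator is found downstream of `tss`.
    """
    hit = first_terminator(seq, tss)
    if hit is None:
        return None
    return seq[tss:hit[1]]


if __name__ == "__main__":
    toy = "ACGTTTACGTTTTTGA"  # hypothetical sequence; the first TTT run is too short
    print(first_terminator(toy))
    print(predicted_transcript(toy))
```

Because the terminator is simply the first qualifying T-run, the length of a transcript from a newly inserted copy depends on the flanking genomic sequence, which is why Fig. 1 describes the terminator as occurring by chance.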
An important feature of the SINE structure is that the 3′ portion of a SINE shows homology to the 3′ portion of its partner LINE present in the same genome (Ohshima et al., 1996; Ohshima and Okada, 2005) (Fig. 1A), which enables the SINE RNA to be reverse transcribed by the partner LINE reverse transcriptase (Fig. 1C). Human Alu ends with a polyA sequence (Fig. 2) so that it can be mobilized by the human L1 reverse transcriptase (Dewannieux et al., 2003), which recognizes the polyA region in the RNA. Mouse B1 and B2 also end with a polyA-like sequence (Fig. 2), and their mobility depends on the mouse L1 reverse transcriptase (Dewannieux and Heidmann, 2005). Because SINE RNAs carry the sequence information of the A- and B-boxes, their retrotransposed copies retain the promoter sequence.

THE MYSTERIES OF DISTRIBUTION
Because of the target-primed mechanism, the insertion sites of LINEs and their cognate SINEs are dictated by the LINE-encoded endonucleases, which have target-site specificity; for example, 5′-TTAAAA-3′ for human L1 (Jurka, 1997; Cost and Boeke, 1998). Therefore, these LINEs and SINEs should be inserted in similar genomic sequences. However, they occupy distinct parts of the genomes, and their regional densities are negatively correlated with each other (Fig. 3): SINEs and LINEs reside predominantly in R and G bands, respectively (Korenberg and Rykowski, 1988). The SINE-rich regions are also rich in genes, whereas the LINE-rich regions are gene-poor (Lander et al., 2001). The LINE distribution can be explained by their potentially harmful effects when they are inserted within or close to genes. However, it cannot be easily explained why SINEs should have been excluded from gene-poor regions. Although the exact nature of selection is still controversial (Medstrand et al., 2002; Jurka et al., 2004), it can be hypothesized that positive selection has been operating on SINEs inserted in or close to genes during evolution, if we assume that SINEs have functional roles in gene regulation (Britten, 1996). The idea of selective retention is consistent with the observation that Alu and B1 have accumulated in similar sets of genes and orthologous regions (Waterston et al., 2002; Tsirigos and Rigoutsos, 2009), even though they have proliferated independently in the primate and rodent lineages. In any event, the different distribution patterns of SINEs and LINEs could be involved in the formation of distinctive chromatin domains in the nucleus. For instance, the peripheral space of the nucleus is usually transcriptionally silent, whereas active transcription occurs in the interior space. Interestingly, we and others have shown that LINE sequences accumulate at the nuclear periphery whereas SINEs are enriched in the interior space (Bolzer et al., 2005; Guelen et al., 2008; Solovei et al., 2009; Ichiyanagi et al., 2011), suggesting their involvement in the evolution of nuclear architecture and function.
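The negative correlation between regional SINE and LINE densities can be quantified along the following lines. This is a minimal sketch, assuming that element start coordinates for one chromosome are already available as integer lists; the 0.5-Mb window matches the one stated for Fig. 3, whereas the function names, the use of raw counts per window (rather than copies per megabase), and the toy coordinates are illustrative assumptions, not the procedure used to produce the figure.

```python
from math import sqrt

WINDOW = 500_000  # 0.5 Mb, the window size stated for Fig. 3


def window_counts(starts, chrom_length, window=WINDOW):
    """Count element start positions in consecutive fixed-size windows."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in starts:
        counts[pos // window] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical coordinates on a 2-Mb toy chromosome (not real data).
    sine_starts = [10, 20, 30, 600_000, 700_000, 1_100_000]
    line_starts = [700_000, 1_200_000, 1_300_000, 1_600_000, 1_700_000, 1_800_000]
    sine = window_counts(sine_starts, 2_000_000)
    line = window_counts(line_starts, 2_000_000)
    print(sine, line, pearson(sine, line))
```

A coefficient close to -1 on real annotation would correspond to the inverse pattern visible in Fig. 3; the toy data are constructed to be perfectly anticorrelated.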
The age distribution of SINE subfamilies is also a mystery. Each retrotransposon family is divided into subfamilies based on their sequence similarity. For example, human Alu consists of AluJ, AluS, and AluY subfamilies, which can be further divided into sub-subfamilies. From the sequence alignments of individual genomic copies, the consensus sequences of respective subfamilies can be constructed. These consensus sequences likely represent the sequences of master loci, which were amplified in the past by retrotransposition (Deininger et al., 1992). Because these retrotransposed copies have then accumulated mutations, the divergence of individual genomic copies from the consensus sequence roughly correlates with the evolutionary time elapsed since retrotransposition. As shown in Fig. 4 for mouse SINEs (see Ohshima et al., 2003 for the Alu subfamilies), the statistics of their divergence clearly indicate that each subfamily has a distinct evolutionary period of amplification. Interestingly, while one subfamily is being inactivated, the activity of another subfamily with a similar (but not identical) sequence increases. Although the mechanisms of subfamily extinction are not fully understood, these observations suggest that the host defense system acts individually on subfamilies in a sequence-specific manner.
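The use of divergence from the consensus as a rough proxy for the age of a copy can be illustrated as follows. This is a simplified sketch: it assumes each genomic copy has already been aligned to its subfamily consensus (equal-length strings, with "-" marking gaps), counts plain mismatches, and bins the results at 1% intervals as in Fig. 4. The divergence values in that figure were taken from RepeatMasker output, which computes divergence differently; the function names and toy sequences here are hypothetical.

```python
from collections import Counter


def percent_divergence(copy, consensus):
    """Percent mismatches between two pre-aligned sequences, skipping gap columns."""
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps in this simplified measure
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


def divergence_histogram(divergences, bin_width=1.0):
    """Fraction of copies per divergence bin, keyed by the lower bin edge."""
    counts = Counter(int(d // bin_width) * bin_width for d in divergences)
    total = len(divergences)
    return {edge: n / total for edge, n in sorted(counts.items())}


if __name__ == "__main__":
    consensus = "ACGTACGTAA"  # toy consensus, not a real SINE
    copies = ["ACGTACGTAA", "ACGTACGTAC", "AC-TACGTAC"]
    values = [percent_divergence(c, consensus) for c in copies]
    print(values)
    print(divergence_histogram(values))
```

Plotting such a histogram separately for each subfamily gives curves of the kind shown in Fig. 4, where a peak at low divergence indicates recent amplification.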

EPIGENETIC REGULATION OF SINE TRANSCRIPTION
The vast majority of genomic SINE copies have been genetically inactivated by mutations disrupting the promoter function, and only a small subset retains the transcriptional potential. Thus, SINE-derived transcripts are generally very scarce or undetectable in somatic tissues (Adeniyi-Jones and Zasloff, 1985; Kaplan et al., 1985; Paulson and Schmid, 1986; Bachvarova, 1988; Ichiyanagi et al., 2011), although SINEs outnumber their parental RNA genes. On the other hand, spermatogenic cells, oocytes, and early embryos allow SINE transcription. The tissue-specific expression implies that SINE transcription is epigenetically regulated. In mammals, cytosine methylation of genomic DNA at CpG sites is one of the major epigenetic modifications that repress genes, LINEs and LTR elements (Walsh et al., 1998; Bird, 2002; Bourc'his and Bestor, 2004; Tsumura et al., 2006), all of which are transcribed by RNA polymerase II (Pol II). The human Alu of ~290 bp in length carries up to 25 CpG sites both inside and outside of its Pol III promoter (Fig. 2A). These CpG sites are highly methylated in somatic tissues (Hellmann-Blumberg et al., 1993; Kochanek et al., 1993; Xie et al., 2009), and in vitro experiments have demonstrated that CpG methylation inhibits Pol III transcription from Alu and tRNA genes (Besser et al., 1990; Englander et al., 1993; Kochanek et al., 1993; Liu and Schmid, 1993). DNA methylation likely interferes with TFIIIC binding to the A- and B-boxes, so that Pol III cannot be loaded onto the promoter. In addition, Alu is marked with lysine-9 trimethylated histone H3 (H3K9me3) (Kondo and Issa, 2003), which is important for heterochromatin formation and repression of LTR elements (Matsui et al., 2010; Karimi et al., 2011). Accordingly, chromatin immunoprecipitation for Pol III has shown that the vast majority of Alu elements are not bound by Pol III in cultured cells and are thus transcriptionally silent (Oler et al., 2010). The mouse B1 is also derived from 7SL RNA, and contains up to 8 CpG sites in its ~145-bp sequence (Fig. 2B). The DNA methylation level at B1 elements (high in somatic cells and relatively low in germ cells and preimplantation embryos) negatively correlates with the B1 RNA abundance (Ichiyanagi et al., 2011), suggesting transcriptional regulation by DNA methylation. On the other hand, mouse B2 transcripts can be detected in some somatic tissues (Bachvarova, 1988; Li et al., 1999), even though B2 elements (up to 6 CpG sites in ~190 bp; Fig. 2C) are DNA-methylated in somatic cells (Ichiyanagi et al., unpublished data). CpG is absent from the A- and B-boxes of the B2 promoter, suggesting a milder effect of DNA methylation on TFIIIC binding. For both mouse SINEs, histone modifications are not well characterized.
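The CpG counts quoted above, and the question of whether any CpG falls inside a promoter box, can be checked on any element sequence with a short script. The sketch below is illustrative only: the function names are hypothetical, coordinates are 0-based and half-open, and the toy sequence and box coordinates are invented rather than taken from an Alu, B1 or B2 consensus.

```python
def cpg_positions(seq):
    """Return 0-based positions of the C in every CG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def cpgs_in_region(seq, start, end):
    """CpG positions lying entirely within the half-open interval [start, end)."""
    return [p for p in cpg_positions(seq) if start <= p and p + 2 <= end]


if __name__ == "__main__":
    toy = "ACGTCGCGA"      # toy sequence, not a real SINE
    box = (3, 9)           # hypothetical promoter-box coordinates
    print(len(cpg_positions(toy)), "CpG sites in total")
    print(cpgs_in_region(toy, *box), "inside the box")
```

Applied to a consensus sequence with annotated A- and B-box coordinates, an empty result from `cpgs_in_region` would correspond to the situation described for the B2 promoter.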
The germline is a battleground between retrotransposons and their hosts, where their potentially deleterious new insertions can be transmitted to the next generation. Whereas DNA methylation plays an important role in LINE and LTR repression, genomic DNA methylation is stripped off in primordial germ cells, which are the precursors of gametes, in mammals (Reik et al., 2001; Sasaki and Matsui, 2008). Under such circumstances, developing germ cells produce a class of small RNAs (24 to 33 nucleotides long) called piRNAs by the actions of PIWI-family Argonaute proteins (Siomi et al., 2011; Chuma and Nakano, 2013). The majority of piRNAs are derived from retrotransposons, and the elimination of piRNAs from male germ cells by mutations in Mili, Miwi2, or MitoPLD results in LINE overexpression and hypomethylation (Aravin et al., 2007; Kuramochi-Miyagawa et al., 2008; Watanabe et al., 2011). The piRNA-dependent transposon silencing in the germline is also observed in insects (see the review by Saito, 2013). Therefore, piRNAs constitute the germline defense system against retrotransposons in animals, and the piRNA-dependent DNA methylation is analogous to the small RNA-mediated DNA methylation at transposons and repeats in plants (see the review by Ito, 2013). However, the piRNA defense system does not seem to regulate SINEs, since our recent study revealed that DNA methylation and RNA abundance of mouse B1 are not affected in Mili-/- or MitoPLD-/- male germ cells (Ichiyanagi et al., 2011). SINE transcription may be more tolerated because it is not sufficient for retrotransposition. Presumably, LINE repression is a more important issue for the hosts, because LINEs play a pivotal role in both LINE and SINE amplification.

Fig. 4. Statistics of nucleotide divergence of mouse SINE families. Fraction of genomic copies (y-axis) with indicated nucleotide divergence from the consensus sequences (x-axis, 1% interval) is shown for B1 (A) and B2 (B) subfamilies. Subfamily classification and their divergence were taken from RepeatMasker outputs downloaded from the UCSC genome browser (http://genome.ucsc.edu/). Because of high similarity in the divergence distribution, the data for B1_Mm, B1_Mus1, and B1_Mus2 are merged and shown as B1_Mm. The data for B1F and PB1 (B1F/PB1), as well as those for B2_Mm1a and B2_Mm1t (B2_Mm1), are also merged.

POSSIBLE SINE FUNCTIONS
SINEs produce RNAs that are similar to functional cellular RNAs, and the expression of some mammalian SINEs, namely human Alu, mouse B1 and B2, and rabbit C, is induced upon cell stress such as heat shock (Fornace and Mitchell, 1986; Liu et al., 1995; Li et al., 1999), suggesting that SINE-derived RNAs have a functional role in the stress response (Schmid, 1998). Indeed, the Alu and B2 RNAs, of different origins (7SL vs. tRNA), inhibit Pol II transcription via direct interaction with the Pol II enzyme in vitro, which is implicated in the transcriptional gene repression observed during heat shock (Allen et al., 2004; Espinoza et al., 2004; Mariner et al., 2008). Moreover, the Alu RNA has been shown to increase the rate of protein synthesis via inhibition of the PKR kinase (Chu et al., 1998). In cases where an exonic SINE (usually in the 5′ or 3′ UTR) is co-transcribed by Pol II, it can regulate the host gene expression as a constituent of the mRNA sequence; B1-containing mRNAs are targeted by siRNAs during very early embryogenesis (Ohnishi et al., 2012), and an inverted Alu pair in an mRNA can form a duplex and serve as a target for RNA editing (Athanasiadis et al., 2004; Kim et al., 2004; Levanon et al., 2004).
On the other hand, evidence has also been accumulating that SINEs can function as DNA elements at their genomic locations. First, Alu provides a site of alternative splicing when inserted in introns (Sorek et al., 2002; Lev-Maor et al., 2003) (Fig. 5A), changing the splicing pattern of host genes. Thus, the presence of > 500,000 intronic Alu copies in the human genome implies that Alu has a significant impact on the evolution of the human proteome (Gotea and Makalowski, 2006). Second, some Alu and B1 copies contain binding sites for transcription factors such as SP1, p53, NFκB, retinoic acid receptors (RARs), and aryl hydrocarbon receptor (AhR) (Vansant and Reynolds, 1995; Piedrafita et al., 1996; Oei et al., 2004; Polak and Domany, 2006; Apostolou and Thanos, 2008; Roman et al., 2008; Zemojtel et al., 2009) (Fig. 5B), suggesting that Alu and B1 retrotransposition has shaped the regulatory networks in their hosts (Feschotte, 2008). Third, some genomic copies of ancient SINEs, namely AmnSINE1, LF-SINE, and MIR (Ther-1), serve as tissue-specific distal enhancers conserved in mammalian genomes (Bejerano et al., 2006; Nishihara et al., 2006; Santangelo et al., 2007; Sasaki et al., 2008; Tashiro et al., 2011; Nakanishi et al., 2012) (Fig. 5C). Interestingly, the regulated genes are involved in mammalian-specific traits such as brain functions, suggesting a role in the establishment of mammals (Okada et al., 2010). Fourth, Alu induces formation of stable nucleosomes positioned in phase around the elements (Fig. 5D), which would obstruct transcription initiation when located near a promoter or enhancer, and attenuate transcription elongation when located in a gene body (Englander and Howard, 1995; Tanaka et al., 2010; Bettecken et al., 2011). Fifth, mouse B1 copies residing close to (less than 1 kb from) transcription start sites of Pol II genes are involved in gene repression in somatic tissues, which could explain why germ cell-specific genes harbor many SINE copies in their promoter regions (Ichiyanagi et al., 2011) (Fig. 5E). The repression of these genes correlates with hypermethylation of the B1 sequences in somatic tissues but not in germ cells, whereas the regions across the transcription start sites are unmethylated in both somatic and germ cells. Although the detailed mechanisms are under investigation, the results suggest that B1 retrotransposition into promoters has an impact on evolutionary changes in gene expression patterns. Finally, a copy of B2 in mouse chromosome 11 has been shown to function as a developmentally regulated boundary between euchromatin and heterochromatin (Lunyak et al., 2007). B2 carries not only a Pol III promoter but also a Pol II promoter in the reverse direction (Ferrigno et al., 2001), and the boundary formation requires both Pol II and Pol III transcription at that B2 copy (Lunyak et al., 2007). Some mouse B1 copies can also function as a boundary when bound by two transcription factors, Slug and AhR (Roman et al., 2011). Since TFIIIC can form a chromatin boundary at its binding sites independently of Pol III (Noma et al., 2006), it is possible that these SINE copies serve as TFIIIC binding sites for boundary formation (Lunyak and Atallah, 2011). CTCF is another, and better known, protein that forms a chromatin boundary with a moderate sequence specificity for DNA binding (Phillips and Corces, 2009). Recently, a genome-wide comparative study on CTCF binding revealed that a number of species-specific CTCF-binding sites have been created by SINE sequences, such as rodent B2 and B3, dog CAN, and opossum Mar1, during mammalian evolution (Schmidt et al., 2012). In relation to the boundary of histone modifications, it was also shown recently that topological domains, characterized by frequent local chromatin interactions in nuclei, are separated by small regions enriched with SINEs and tRNA genes (Dixon et al., 2012). Taken together, SINEs seem to play important roles in shaping higher-order chromatin regulation.
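The promoter-proximal criterion mentioned above for B1 copies (less than 1 kb from a transcription start site) can be expressed as a simple interval test. The sketch below is illustrative and makes several assumptions: all coordinates lie on a single chromosome, SINE copies are given as 0-based half-open intervals, strand is ignored, and the function names and toy coordinates are hypothetical rather than taken from the cited study.

```python
def distance_to_tss(start, end, tss):
    """Distance in bp from the interval [start, end) to a TSS position; 0 if it overlaps."""
    if start <= tss < end:
        return 0
    if tss < start:
        return start - tss
    return tss - (end - 1)


def promoter_proximal(sines, tss_positions, max_dist=1000):
    """Return the SINE intervals lying less than `max_dist` bp from any TSS."""
    return [
        (start, end)
        for start, end in sines
        if any(distance_to_tss(start, end, tss) < max_dist for tss in tss_positions)
    ]


if __name__ == "__main__":
    # Hypothetical B1-sized intervals and a single hypothetical TSS.
    sines = [(100, 245), (5000, 5145), (1800, 1945)]
    print(promoter_proximal(sines, [1000]))
```

For genome-scale annotation a sorted index or an interval tree would be preferable to this quadratic scan, but the criterion itself is unchanged.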

EPIGENETIC SYSTEMS REVISITED: AGAINST SINE TRANSCRIPTION, SINE FUNCTIONS, OR BOTH?
The modifications on DNA and histones play a pivotal role in transcriptional repression of retrotransposons. In addition, the heterochromatic modifications likely suppress ectopic recombination between transposon copies, which causes genomic rearrangements. Hence, these epigenetic modifications are thought of as a host defense system against the parasitic elements. On the other hand, it is conceivable that the same epigenetic modifications also regulate the SINE functions. For instance, the heterochromatic modifications could inhibit undesired transcription-factor binding and the formation of an undesired chromatin boundary. The activities of the tissue-specific enhancers must be regulated epigenetically, and the B1-induced somatic repression has been shown to involve DNA methylation, as stated above. Therefore, it is tempting to speculate that the primary role of epigenetic modifications in SINEs is to regulate these SINE-encoded functions. In this regard, the epigenetic regulation of TFIIIC binding is of great interest, because it is involved not only in SINE transcription but possibly in SINE functions as well. Recently, chromatin immunoprecipitation of the subunits of Pol III, TFIIIB, and TFIIIC became feasible (Moqtaderi et al., 2010; Oler et al., 2010; Carriere et al., 2012), providing a good prospect for future research.

CONCLUDING REMARKS
Given their enormous copy numbers, a deeper understanding of SINEs is now indispensable for genomic, epigenomic, and evolutionary studies. SINEs and other retrotransposons have clearly shaped the host genomes by changing genomic sequences, splicing patterns, regulatory networks, chromatin networks, and developmental programs. However, despite intensive studies on human and mouse SINEs, the mysteries of their genomic distribution and evolutionary dynamics remain unsolved. Moreover, many other mammalian SINEs are as yet uncharacterized in terms of transcription and function. Therefore, future research should be directed toward understanding how Pol III and TFIIIC binding is regulated, what molecules are involved in SINE-derived enhancers and chromatin boundaries, how SINE retrotransposition is regulated in the germline and early development, and to what extent the proposed functions are conserved or diverged in a wide variety of mammalian and non-mammalian SINEs. Undoubtedly, epigenetic regulation will be pivotal for these studies.

Fig. 1. SINEs, LINEs and their retrotransposition. A. Typical SINEs contain A- and B-box sequences and a region (shown in green) homologous to a LINE in the same genome. To produce a SINE transcript (shown as a wavy line), TFIIIC binds to the A- and B-boxes and recruits TFIIIB and Pol III. The transcription ends at a downstream terminator, TTTT, occurring by chance. B. LINE encodes a protein with reverse transcriptase (RT) and endonuclease (EN) activities. C. Mechanism of target-primed reverse transcription for LINE/SINE retrotransposition. The LINE protein cleaves the target DNA and initiates reverse transcription (cDNA synthesis) of SINE RNA. Recognition of the RNA template is mediated via the LINE homology region present in the 3′ region of the SINE. This figure represents the main mobility pathway, and all the other proposed pathways also involve the reverse transcriptase-template RNA recognition (Ichiyanagi et al., 2007).

Fig. 2. Structure and position of CG dinucleotides in human and mouse SINEs. The positions of CG sites in human Alu (top) and mouse B1 (middle) and B2 (bottom) are shown as lollipops. The 7SL- or tRNA-like regions and polyA sequences are also indicated.

Fig. 3. The inversely correlated local LINE and SINE densities in the mouse genome. The chromosomal bands (top), SINE density (middle, copies per megabase), and LINE density (bottom) are shown for each mouse chromosome. The SINE and LINE densities are shown in yellow-to-green and yellow-to-red gradients, respectively. The window size for density calculation is 0.5 Mb.

Fig. 5. Possible SINE functions as DNA elements. (A) SINEs inserted in genes provide a splice site, resulting in generation of a SINE-containing isoform (isoform 2). (B) Transcription factor (TF) binding sites in a SINE sequence may affect neighboring genes. (C) SINEs can regulate gene expression at a distance as tissue-specific enhancers. (D) Alu forms a stable dinucleosome and stabilizes nucleosome positioning in the neighboring regions. This may affect the promoter/enhancer function or transcriptional elongation. (E) SINE methylation-mediated gene regulation. When the SINE DNA is methylated (somatic cells, top), the neighboring gene is down-regulated. Upon SINE demethylation (germ cells, bottom), the gene expression is activated. (F) A boundary between heterochromatin (left region) and euchromatin (right region). TFIIIC, CTCF, other factors, or a combination of these may be involved in the boundary formation at the SINE.