2023, Volume 98, Issue 3, pp. 121-154
Genome sequencing revealed that nearly half of the human genome is comprised of transposable elements. Although most of these elements have been rendered inactive due to mutations, full-length intact long interspersed element-1 (LINE-1 or L1) copies retain the ability to mobilize through RNA intermediates by a so-called “copy-and-paste” mechanism, termed retrotransposition. L1 is the only known autonomous mobile genetic element in the genome, and its retrotransposition contributes to inter- or intra-individual genetic variation within the human population. However, L1 retrotransposition also poses a threat to genome integrity due to gene disruption and chromosomal instability. Moreover, recent studies suggest that aberrant L1 expression can impact human health by causing diseases such as cancer and chronic inflammation that might lead to autoimmune disorders. To counteract these adverse effects, the host cells have evolved multiple layers of defense mechanisms at the epigenetic, RNA and protein levels. Intriguingly, several host factors have also been reported to facilitate L1 retrotransposition, suggesting that there is competition between negative and positive regulation of L1 by host factors. Here, we summarize the known host proteins that regulate L1 activity at different stages of the replication cycle and discuss how these factors modulate disease-associated phenotypes caused by L1.
Approximately 20 years have passed since the initial draft of a human genome sequence was released (Lander et al., 2001; Venter et al., 2001). A surprising discovery was that the protein-coding exons cover only a small fraction (~1.5%) of the human genome, with the number of genes estimated to be ~20,000. The rest of the genome is comprised of “non-coding” regions, initially viewed as “junk” DNA. However, in light of extensive subsequent studies, it is now evident that non-coding DNA sequences are crucial to dictate gene expression, as they may include regulatory elements such as promoters, enhancers, introns or untranslated regions (UTRs) for respective mRNAs. Some non-coding DNA sequences are transcribed by RNA polymerases into functional and structural non-coding RNAs, including ribosomal RNAs (RNA polymerase I), long non-coding RNAs and microRNAs (RNA polymerase II), and transfer, small nucleolar and small nuclear RNAs (RNA polymerase III) (Lander, 2011; Kaikkonen and Adelman, 2018). The rest of the non-coding regions mainly consist of repetitive DNA sequences such as transposable elements (TEs) (Lander et al., 2001), which are transcribed by RNA polymerase II or III (Swergold, 1990; Chu et al., 1995). Surprisingly, TEs comprise nearly half of the human genome, in line with early estimates from seminal DNA melting and reassociation experiments performed over 50 years ago (Waring and Britten, 1966; Britten and Kohne, 1968). Although these mobile genetic elements contribute to phenotypic variation and pathologies, much remains to be elucidated about their mechanisms of mobilization and their interplay with host genomes. 
In this review, we offer an overview of the impacts and mechanisms of human long interspersed element-1 (hereinafter L1) retrotransposition, discuss how L1 and other retrotransposons can be a major cause of multiple diseases through retrotransposition-dependent and -independent mechanisms, and focus on the dynamic interaction between L1 and host factors in the evolutionary struggle to avoid potentially harmful effects provoked by L1 expression.
The idea that genetic elements are not static and have the ability to mobilize their sequences into different sites in the genome was originally proposed by Barbara McClintock after observing the phenomenon in maize chromosomes; the finding was described in a seminal paper published in 1950 (McClintock, 1950). The mobile genetic elements observed in maize were later characterized as Activator/Dissociation transposons (McClintock, 1950; Fedoroff et al., 1983). Later, it was discovered that TEs not only exist in most organisms, but are also abundant, in some cases covering almost the entirety of the genome (e.g., ~85% of the maize genome) (Schnable et al., 2009). However, many TEs have lost the ability to mobilize due to accumulated mutations and the host–TE arms race that occurs throughout evolution. In general, TEs are classified into class I (retrotransposon) and class II (DNA transposon), based on their mode of mobilization, through either an RNA or a DNA intermediate, respectively. Although TEs were previously regarded as “junk” DNAs due to their repetitive and abundant nature, more evidence has shown that they contribute to organismal complexity through genome rearrangement; in addition, some TEs have acquired regulatory functions that impact gene expression and splicing (Han et al., 2008; Cordaux and Batzer, 2009; Chuong et al., 2017; Rodriguez-Martin et al., 2020).
DNA transposons
A majority of DNA transposons “jump” by removing (cutting) their sequences from the original sites and inserting (pasting) them into a new genomic location through transposase enzymatic activity (Fig. 1A). In this “cut-and-paste” mechanism, the transposase activity can be provided by either the same element (in cis) or a different DNA transposon (in trans). These elements are usually flanked by inverted terminal repeat (ITR) sequences that serve as recognition sites for the transposase. DNA transposons are inactive in mammals, with bats as an interesting exception (Ray et al., 2008; Mitra et al., 2014), and make up ~3% of the human genome (Lander et al., 2001). Although all DNA transposons are inactive in humans due to accumulated mutations, some have been domesticated and have acquired host cellular function(s) over evolutionary timescales. Among the best-studied examples are the recombination activating gene 1 (RAG1) and RAG2 proteins, which belong to the Transib DNA transposon family and are important for V(D)J recombination in adaptive immunity in jawed vertebrates (Agrawal et al., 1998; Kapitonov and Jurka, 2001; Kim et al., 2015; Zhang et al., 2019). In addition, a few DNA transposons have been repurposed as transgene insertion tools for molecular genetics applications (e.g., Sleeping Beauty (Mátés et al., 2009) and piggyBac (Ding et al., 2005)). Although most DNA transposons transpose via a “cut-and-paste” mechanism, some (e.g., Helitrons (Kapitonov and Jurka, 2001) and Mavericks (Pritham et al., 2007)) employ a copy-and-paste mechanism that is distinct from retrotransposons (see the next section). DNA transposons have been extensively reviewed elsewhere, and will not be further discussed in this review (Feschotte and Pritham, 2007; Hickman and Dyda, 2015).
Fig. 1. Classification and content of retrotransposons in the human genome. (A) Two modes of transposition. Upper: DNA transposons mobilize their sequences through a “cut-and-paste” mechanism, where the transposon in the original site (black line) is removed and inserted into a new genomic location (purple line). Lower: Retrotransposons amplify their sequences using RNA intermediates, also known as a “copy-and-paste” mechanism. The RNA sequences are converted into cDNA by reverse transcription before insertion of the cDNA sequences into the genome. (B) Classification and content of human retrotransposons. Left: The pie chart represents the fractions of transposable elements in the human genome. The protein-coding regions cover <2% of the genome, while almost half of the genome consists of transposable elements. Non-LTR retrotransposons make up the largest portion amongst transposable elements, including L1 (~16.9%), Alu (~10.6%) and SVA (~0.2%). LTR retrotransposons/endogenous retroviruses (ERVs) cover ~8.3% of the genome, while DNA transposons cover ~2.8% and other retrotransposons cover ~6.0% of the genome. Respective sequence structures, nucleotide lengths and copy numbers of each retrotransposon are shown on the right. The full-length retrotransposition-competent L1 sequence is approximately 6 kb in length, with a bidirectional RNA polymerase II promoter in the 5′ UTR, ORF1 (yellow box) and a short intergenic spacer region followed by ORF2 (blue box) that contains endonuclease (EN), reverse transcriptase (RT) and cysteine-rich domains (C). The L1 copy number is estimated to be ~516,000, which is nearly half of the Alu copy number (~1,090,000). The Alu sequence is ~300 bp in length and is made up of two monomers (left and right monomer). The left monomer contains an RNA polymerase III promoter with well-conserved A and B boxes. SVA is the least abundant non-LTR retrotransposon, with only ~2,700 copies and a length of approximately 2 kb. 
It contains hexameric repeat (CCCTCT)n, Alu-like, variable number of tandem repeat (VNTR) and SINE-R regions. An LTR retrotransposon or ERV is approximately 10 kb in length with flanking 5′ and 3′ LTR regions. It encodes Gag, Pol and Env proteins, like typical retroviruses, albeit truncated in some LTRs. Expression of the LTR retrotransposon is driven by an RNA polymerase II promoter within the 5′ LTR sequence.
Retrotransposons mobilize through a “copy-and-paste” mechanism termed retrotransposition, wherein their RNA intermediates are reverse transcribed into cDNAs during their integration (Fig. 1A). Retrotransposons are classified as long terminal repeat (LTR) and non-LTR elements based on the presence or absence of LTR sequence(s), which usually flank internal LTR retrotransposon sequences. LTR retrotransposons, also known as endogenous retroviruses (ERVs), comprise ~8% of the human genome (Fig. 1B) and are thought to be incapable of retrotransposition in humans due to point mutations, truncations and internal recombination, which sometimes results in solo LTRs (Lander et al., 2001). Nonetheless, some human-specific ERVs (HERVs) are transcribed, especially during embryogenesis, where a transcriptional network is rewired for developmental progression (Schmitt et al., 2013; Grow et al., 2015; Bannert et al., 2018; Tokuyama et al., 2018). HERV-K (HML-2), the most transcriptionally active and studied HERV, has intact ORFs (extensively reviewed by Garcia-Montojo et al., 2018; Xue et al., 2020). Although no retrotransposition-competent HERV has been reported, it is possible that HERV-encoded proteins from intact ORFs work in trans with other viruses to assemble infectious virus particles (Tokuyama et al., 2018; Ueda et al., 2020). Sequence analyses of more human genomes may reveal rare de novo insertions of HERVs (Richardson et al., 2015). Moreover, ERVs such as Syncytin and PEG10, both of which play pivotal roles in placental development, have been domesticated, suggesting an intimate association of ERVs with the evolution of mammals (Mi et al., 2000; Ono et al., 2006).
Non-LTR retrotransposons make up the majority of the human TEs. Among them, L1 is the only autonomous retrotransposon (Moran et al., 1996), while the rest of the non-LTR retrotransposons (i.e., short interspersed elements (SINEs) including Alu and SINE-VNTR-Alu (SVA), or processed pseudogenes) depend on the L1-encoded reverse transcriptase to mobilize their sequences in trans (Esnault et al., 2000; Dewannieux et al., 2003; Hancks et al., 2011; Raiz et al., 2012). In terms of genome coverage, L1 comprises ~16.9% of the human genome, followed by Alu (~10.6%), and SVA (~0.2%) (Fig. 1B); however, the copy number of Alu (~1.1 million copies) is higher than that of L1 (~516,000 copies), followed by processed pseudogenes (~8,000 copies, collectively) and SVA (~2,700 copies) (Lander et al., 2001) (Fig. 1B). A full-length L1 is approximately 6 kb in length (Scott et al., 1987), whereas Alu and SVA elements are only ~300 bp and ~2 kb, respectively (Fig. 1B). The details of Alu and SVA retrotransposons have been reviewed elsewhere (Hancks and Kazazian, 2010, 2016; Richardson et al., 2015).
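The coverage and copy-number figures quoted above can be cross-checked with simple arithmetic: dividing the genomic span of each element by its copy number gives the average length per copy. The sketch below is only a back-of-the-envelope estimate. It assumes a haploid genome size of ~3.1 Gb, which is an illustrative value not taken from this review, and the function name is hypothetical.

```python
# Rough consistency check of the coverage and copy-number figures quoted above.
# GENOME_BP is an assumed haploid genome size, used only for illustration.
GENOME_BP = 3.1e9

# name: (fraction of genome, copy number, full-length size in bp), as quoted in the text
elements = {
    "L1": (0.169, 516_000, 6_000),
    "Alu": (0.106, 1_090_000, 300),
    "SVA": (0.002, 2_700, 2_000),
}


def mean_copy_length(fraction, copies, genome_bp=GENOME_BP):
    """Average length per copy implied by genome coverage and copy number."""
    return fraction * genome_bp / copies


for name, (fraction, copies, full_len) in elements.items():
    mean_len = mean_copy_length(fraction, copies)
    print(f"{name}: mean copy ~{mean_len:,.0f} bp vs full length ~{full_len:,} bp")
```

Under this assumption the average L1 copy comes out at roughly 1 kb, about one sixth of a full-length element, which is consistent with most L1 copies being truncated, whereas the average Alu copy is close to its ~300 bp full length. Because the input percentages are rounded, the SVA value in particular should be read as an order-of-magnitude figure.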
Although L1s are the most abundant TEs by genomic coverage, the vast majority of L1 copies have been rendered inactive due to mutational processes including base substitutions, 5′ (or 3′) truncation or rearrangements during genome evolution (Grimaldi et al., 1984; Richardson et al., 2015). There remain, however, ~80–100 full-length L1s in the genome that retain the ability to retrotranspose (Sassaman et al., 1997; Brouha et al., 2003). Among these, a small number of highly active retrotransposition-competent human L1s (RC-L1s) have been implicated in the majority of human lineage-specific insertions (Skowronski et al., 1988; Sassaman et al., 1997; Myers et al., 2002; Brouha et al., 2003; Boissinot et al., 2004; Beck et al., 2010; Ewing and Kazazian, 2010; Huang et al., 2010; Philippe et al., 2016; Deininger et al., 2017) and de novo disease-producing L1 insertions (Hancks and Kazazian, 2012, 2016; Kazazian and Moran, 2017). L1 retrotransposition events in the germline (Ostertag et al., 2002; Richardson et al., 2017) or during early embryonic development (Garcia-Perez et al., 2007b; van den Hurk et al., 2007; Kano et al., 2009; Richardson et al., 2017; Feusier et al., 2019) can generate inter-individual genetic variation. L1 retrotransposition events in somatic cells, including neuronal progenitor cells (Muotri et al., 2005; Coufal et al., 2009; Faulkner and Garcia-Perez, 2017; Sanchez-Luque et al., 2019), post-mitotic neurons (Macia et al., 2017) and several cancers (e.g., Iskow et al., 2010; Lee et al., 2012; Solyom et al., 2012b; Shukla et al., 2013; Helman et al., 2014; Tubio et al., 2014; Rodić et al., 2015; Scott et al., 2016; Rodriguez-Martin et al., 2020), can generate intra-individual genetic variation. 
In addition to giving rise to insertional mutations, L1 retrotransposition events can lead to intra-chromosomal deletions or duplications, and, more rarely, inter-chromosomal translocation events (Gilbert et al., 2002, 2005; Symer et al., 2002; Beck et al., 2011; Richardson et al., 2015) (see also “Genomic alteration by L1 retrotransposition”).
RC-L1s are ~6 kb in length and consist of a 5′ untranslated region (UTR) that exhibits both sense and antisense promoter activities (Swergold, 1990; Speek, 2001; Olovnikov et al., 2007; Alexandrova et al., 2012), two open reading frames (ORF1 and ORF2) (Scott et al., 1987; Dombroski et al., 1991; Moran et al., 1996) and a 3′ UTR with a weak polyadenylation signal (Holmes et al., 1994; Moran et al., 1999; Goodier et al., 2000; Pickeral et al., 2000) (Fig. 2). Since more than ~500,000 L1 copies are interspersed throughout the genome, L1-containing sequences are often co-transcribed as a part of other RNAs that start from proximal gene promoters instead of canonical L1 promoters in the 5′ UTR. However, as these L1s are mainly located within introns or 3′ UTR regions, most of them are destined to be spliced out, or no longer undergo translation (Deininger et al., 2017). In contrast, transcription of RC-L1s is initiated by an internal TATA-less promoter located within the L1 5′ UTR (reviewed by Furano, 2000; Hermant and Torres-Padilla, 2021). Thus, it has been challenging to distinguish between genuine RC-L1 transcripts and other RNAs that contain L1 sequences embedded in the intronic and/or untranslated regions. Recently, however, substantial progress has been made with sequencing technologies and advanced computational tools to identify genuine RC-L1 transcripts with high confidence, allowing more accurate analysis and mapping of L1 transcripts (Philippe et al., 2016; Lanciano and Cristofari, 2020).
Fig. 2. L1 transcription, translation and RNP formation. A full-length retrotransposition-competent L1 structure is depicted at the center. L1 contains a 5′ untranslated region (UTR), open reading frame 1 (ORF1) (yellow box), ORF2 (blue box) and a 3′ UTR followed by a poly(A) tract (An). The ORF1 and ORF2 sequences are separated by the inter-ORF spacer sequence. The ORF1-encoded protein (ORF1p) has a coiled-coil domain (CC), an RNA recognition motif (RRM) and a C-terminal domain (CTD). The ORF2-encoded protein (ORF2p) possesses endonuclease (EN) and reverse transcriptase (RT) domains, and a less characterized cysteine-rich domain (C). (Upper left) Several transcription factors including YY1, RUNX3, Ets, Sp1 and SOX11 facilitate L1 transcription. SOX2 may share its binding sites with SOX11 and repress L1 transcription. YY1 is critical to define the start position of a full-length L1 transcript in the 5′ UTR, which ensures that L1 copies maintain full-length sequences after successive rounds of retrotransposition. The antisense promoter (ASP) in the 5′ UTR is known to produce chimeric transcripts of L1s and their 5′ flanking genomic sequences, which potentially undergo translation. (Upper right) The L1 3′ UTR contains a weak polyadenylation signal. RNA polymerase II can bypass the L1 polyadenylation signal and continue to transcribe L1 3′ flanking genomic sequences until it encounters a downstream polyadenylation signal. As a result, these L1 flanking genomic DNAs can be inserted into new genomic loci during L1 retrotransposition by a process termed 3′ transduction. (Lower left) Bicistronic translation of L1 RNA through an unconventional termination–reinitiation mechanism. After ORF1 translation termination at the stop codon, the ribosome may continue to scan the RNA and re-initiate ORF2 translation when it encounters the first AUG codon of ORF2. (Lower right) L1 ribonucleoprotein particle (RNP) formation. 
ORF1p and ORF2p show a strong cis-preference and bind to their encoding RNA to form the L1 RNP and retrotranspose L1 sequences.
Several transcription factors have been reported to promote L1 expression: Yin Yang 1 (YY1) (Becker et al., 1993; Athanikar et al., 2004), RUNX family transcription factor 3 (or RUNT-related transcription factor 3) (RUNX3) (Yang et al., 2003), Ets proto-oncogene (ETS) family members (Yang et al., 1998), specificity protein 1 (or Sp1 transcription factor) (Sp1) (Yang et al., 1998) and SRY-box transcription factor (SOX) family members (Tchénio et al., 2000; Muotri et al., 2005). The binding sites of these factors are seemingly non-overlapping, suggesting an independent or cooperative mode of L1 transactivation (Athanikar et al., 2004; Alexandrova et al., 2012). L1 transcription can initiate at multiple sites within the 5′ UTR region, but YY1 confines the transcription start position to the +1 site of the L1 5′ UTR, which is vital to maintain the intact promoter sequence after repeated rounds of L1 retrotransposition and to preserve autonomous mobilization of L1s (Athanikar et al., 2004) (Fig. 2, Promoter). In order to comprehensively screen for L1 transcriptional regulators, Sun et al. (2018) developed the MapRRCon tool and found an additional 175 transcription factors that potentially bind to the L1 5′ UTR sequence; however, future studies are necessary to validate these candidate factors.
The L1 5′ UTR region exhibits antisense promoter (ASP) activity (Speek, 2001). This ASP allows RNA polymerase II to transcribe in the antisense direction relative to the L1 sequence, towards 5′-flanking genomic regions of the L1 element. This activity results in chimeric transcripts containing partial L1 sequences and their 5′-flanking genomic sequences (Nigumann et al., 2002). Moreover, such L1 antisense transcripts can also produce a peptide termed ORF0, found in humans and other primates; splicing into adjacent exons can result in ORF0 fusion proteins (Denli et al., 2015). ORF0 facilitates L1 retrotransposition and localizes in promyelocytic leukemia (PML) bodies, phase-separated nuclear structures that are involved in a wide range of nuclear processes (reviewed by Corpet et al., 2020). On the other hand, L1 antisense transcripts may also provide a source for small interfering (si) RNA production, which potentially reduces L1 transcripts (Yang and Kazazian, 2006; Chen et al., 2012). The direct roles of L1 antisense transcripts and ORF0 peptides in retrotransposition require further elucidation.
It is evident that L1 retrotransposition is mutagenic. The primary defense system counteracting retrotransposition throughout evolution is transcriptional repression via epigenetic modifications of the L1 promoter (for details, see “Host defense mechanisms against L1 expression and retrotransposition”). Indeed, L1 5′ UTR sequences are highly GC-rich, and CpG islands in the 5′ UTRs are highly methylated, enforcing transcriptional repression (Yoder et al., 1997). Intriguingly, while YY1 is required for the accurate transcriptional initiation of L1s, mutation of the YY1 binding site in the L1 5′ UTR allows L1 to escape DNA methylation, facilitating retrotransposition in somatic tissues where most L1 loci are epigenetically silenced (Sanchez-Luque et al., 2019). It is noteworthy that a rapid and marked decrease of global DNA methylation observed during embryogenesis and primordial germ cell development allows a large number of TEs including L1s to be highly expressed (reviewed by Eckersley-Maslin et al., 2018). At present, it is still debated whether L1 reactivation during these developmental stages directly leads to an increase of L1 retrotransposition events. However, retrotransposition-independent roles for L1 sequences in germline and early embryonic development have been suggested by several studies. For example, a transcriptionally active state of L1 loci may induce a dynamic alteration of global chromatin configuration, or L1 RNA may directly affect transcriptional reprogramming, thereby regulating gene expression that is indispensable for germline or early embryonic development (Jachowicz et al., 2017; Percharde et al., 2018, 2020; Yamanaka et al., 2019).
The L1 3′ UTR has been less well characterized. It generally contains a weak polyadenylation signal that often allows RNA polymerase II to transcribe through the boundary of full-length L1s (Fig. 2, Terminator). Thus, the 3′ flanking genomic sequences of L1s can also be transcribed continuously until RNA polymerase II incidentally encounters a downstream polyadenylation signal in genomic DNA. This leads to the generation of chimeric transcripts containing the flanking genomic sequences downstream of the L1 polyadenylation signal. As reverse transcription begins at the 3′ end of the chimeric transcripts, the 3′ flanking genomic sequences of L1s are also reverse transcribed and integrated into the genome through retrotransposition, in a process termed 3′ transduction (Holmes et al., 1994; Moran et al., 1999; Goodier et al., 2000; Pickeral et al., 2000). These additional transduced sequences that are generated in retrotransposition can in principle provide new promoters, alternative splicing sites, exons or premature polyadenylation signals at a new genomic location (Moran et al., 1996, 1999). Importantly, highly active L1 copies in the human genome are frequently accompanied by such 3′ transduced sequences during retrotransposition. In terms of mapping “hot” L1 loci, these transduced sequences at the new L1 insertion sites can be used to trace and identify the original copies of the active L1s and their chromosomal locations (Holmes et al., 1994; Beck et al., 2010; Macfarlane et al., 2013). With this method, recurrent events of L1 retrotransposition and the original sources of additional L1 copies have been extensively analyzed in cancer genomes (Tubio et al., 2014; Rodriguez-Martin et al., 2020). An intriguing question is why such poor polyadenylation activity is retained within L1 3′ UTRs. 
Since an L1 insertion in the same orientation as genomic transcription could increase the chance of acquiring unexpected premature polyadenylation signals, it is hypothesized that weak L1-derived polyadenylation activity would produce less deleterious impacts on the transcriptome in the host cells and, in turn, possibly allow more pervasive L1 retrotransposition throughout the genome (Richardson et al., 2015). An additional feature of the 3′ UTR is a conserved polypurine-rich tract that may form a G-quadruplex structure (Usdin and Furano, 1989). While this sequence is not critical for L1 retrotransposition in a cell culture-based assay (Moran et al., 1996), chemical stabilization of G-quadruplex formation facilitates L1 retrotransposition (Sahakyan et al., 2017), suggesting that the RNA or DNA secondary structure itself, or a structure-specific binding protein, modulates L1 mobilization. However, the exact function of the polypurine-rich sequence in L1 transcription and/or retrotransposition remains unclear.
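The read-through logic behind 3′ transduction, and the use of transduced sequences to trace source elements, can be illustrated with a toy model. This is a deliberately simplified sketch, not a method from the cited studies: the sequences, locus names and function names are hypothetical, the polyadenylation signal is reduced to the canonical AATAAA hexamer, and the transcript is assumed to end immediately after that hexamer.

```python
# Toy model of 3' transduction. The L1 polyadenylation signal is weak, so
# transcription may read through into 3' flanking DNA and stop at the next
# downstream signal. All sequences and names are hypothetical.
POLYA_SIGNAL = "AATAAA"  # canonical polyadenylation hexamer (simplification)


def transcribe(locus, l1_end, read_through):
    """Return the transcript from `locus`, whose L1 body ends at index `l1_end`.

    Without read-through the transcript ends at the L1 boundary. With
    read-through it extends to the end of the first POLYA_SIGNAL found in
    the 3' flank, or to the end of the locus if none is found.
    """
    if not read_through:
        return locus[:l1_end]
    hit = locus.find(POLYA_SIGNAL, l1_end)
    end = len(locus) if hit == -1 else hit + len(POLYA_SIGNAL)
    return locus[:end]


def transduced_sequence(locus, l1_end, read_through):
    """Flanking sequence carried along with the L1 transcript."""
    return transcribe(locus, l1_end, read_through)[l1_end:]


def trace_source(transduction, candidate_flanks):
    """Names of candidate loci whose 3' flank begins with the transduction."""
    return [name for name, flank in candidate_flanks.items()
            if transduction and flank.startswith(transduction)]


l1_body = "GGGAGGAGCC" + "AATAAA"   # stand-in for an L1 ending in its own weak signal
flank = "CTGACCTGATCAATAAAGGTT"     # hypothetical 3' flanking genomic DNA
locus = l1_body + flank

tag = transduced_sequence(locus, len(l1_body), read_through=True)
print(tag)
print(trace_source(tag, {"locus_A": flank, "locus_B": "TTGGCCAATAAA"}))
```

In this toy case the read-through transcript carries the first 17 bases of the flank, and only the locus whose flank begins with that sequence is returned as the candidate source. Real analyses must additionally handle sequencing errors, partial transductions and repeated flanks.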
L1-encoded proteins and L1 RNP formation
L1 RNA is exported into the cytoplasm where the two L1-encoded proteins, ORF1p and ORF2p, are translated. These two proteins preferentially bind their own encoding RNA in cis to form L1 ribonucleoprotein particles (L1 RNPs) (Fig. 2, L1 RNP formation), an event that is necessary but not sufficient for retrotransposition (Martin, 1991; Hohjoh and Singer, 1996; Esnault et al., 2000; Wei et al., 2001; Kulpa and Moran, 2005, 2006; Doucet et al., 2010). ORF1p is produced at a high level, and its endogenous expression in early embryogenic, germ or cancer cells is readily detectable by western blotting or immunohistochemistry (Hohjoh and Singer, 1996; Garcia-Perez et al., 2007b; Soper et al., 2008; Rodić et al., 2014; Doucet-O’Hare et al., 2015; Jachowicz et al., 2017; Payer and Burns, 2019). In contrast, it is technically difficult to detect endogenous ORF2p (Ardeljan et al., 2019), probably because of its extremely low translation efficiency, due to an unconventional reinitiation process of ribosome scanning after ORF1p translation termination (Alisch et al., 2006; Doucet et al., 2010) (Fig. 2, Bicistronic translation). An internal spacer sequence that separates the ORF1 and ORF2 sequences is minimally conserved, even among mammals. While no internal ribosome entry site (IRES) activity has been reported in the human L1 spacer sequence (Alisch et al., 2006), the mouse inter-ORF spacer retains a functional IRES, and nucleolin may act as a facilitator of ORF2p translation by maintaining IRES function (Li et al., 2006; Peddigari et al., 2013). Of note, ORF2p translation occurs independently of the canonical first-AUG codon; moreover, L1 RNA undergoes ORF2p translation even when the ORF1 sequence is replaced with a different ORF such as a GFP derivative (Alisch et al., 2006). These data raise two provocative questions: does translation commonly occur downstream of the first ORF in mRNAs? 
And do such second ORF-encoded proteins have any physiological functions, with ORF2p as a precedent? Although several lines of evidence suggest that translation of polycistronic mRNAs, some of which encode “microproteins”, occurs more frequently than expected in eukaryotes (Kondo et al., 2007; Brubaker et al., 2014; Karginov et al., 2017; Makarewich et al., 2022), further proteomic studies and functional validation of these products will be needed to elucidate these questions.
ORF1p is an ~40-kDa protein with RNA-binding and nucleic acid chaperone activities (Hohjoh and Singer, 1996; Kolosha and Martin, 1997, 2003; Martin and Bushman, 2001; Martin et al., 2003, 2005; Kulpa and Moran, 2005; Khazina et al., 2011). The conserved RNA recognition motif (RRM) and the C-terminal domain (CTD) of ORF1p are required for RNA binding (i.e., RNP formation) and L1 retrotransposition (Moran et al., 1996; Kulpa and Moran, 2005; Martin et al., 2005; Januszyk et al., 2007; Khazina and Weichenrieder, 2009; Doucet et al., 2010; Khazina et al., 2011). ORF1p forms a metastable homotrimer, mediated by a coiled-coil (CC) domain (Fig. 2) (Martin et al., 2003; Khazina et al., 2011; Khazina and Weichenrieder, 2018). The external N-terminal region of the CC domain, which is essential for L1 retrotransposition, may adopt a disordered conformation, making it difficult to obtain structural information (Khazina and Weichenrieder, 2018; Newton et al., 2021). This disordered N-terminal region is a potential substrate for phosphorylation by proline-directed protein kinases; in turn, phosphorylation enhances ORF1p binding to peptidyl prolyl isomerase 1 (Pin1) (Cook et al., 2015). This leads to the hypothesis that Pin1-dependent isomerization of proline side chains next to phospho-serine residues affects the conformation of ORF1p or its interaction with other factors required for retrotransposition (Liou et al., 2011; Cook et al., 2015).
ORF2p is an ~150-kDa protein that possesses endonuclease (EN) and reverse transcriptase (RT) activities (Fig. 2) (Mathias et al., 1991; Feng et al., 1996; Moran et al., 1996; Doucet et al., 2010). These biochemical activities are vital for L1 retrotransposition (Feng et al., 1996; Moran et al., 1996). The ORF2p EN domain, which is located at its N-terminus, has a strong similarity to apurinic/apyrimidinic (AP) endonucleases (Feng et al., 1996). As predicted by this similarity, ORF2p EN activity introduces a single-stranded (ss) nick on a double-stranded (ds) DNA in vitro (Feng et al., 1996; Cost and Boeke, 1998; Cost et al., 2002); however, in contrast to AP endonucleases, an apurinic DNA site is not the preferred substrate for ORF2p EN (Feng et al., 1996). In addition to the unconventional translation of ORF2p, it is also likely that host defense mechanisms tightly limit the amount of ORF2p so as to minimize the potentially detrimental effects provoked by ORF2p EN, such as DNA break formation that can lead to apoptosis or early onset of cellular senescence (Belgnaoui et al., 2006; Gasior et al., 2006; Belancio et al., 2010; Kines et al., 2014; Miyoshi et al., 2019).
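As a rough check on how the two ORFs fill the ~6 kb element, the protein masses quoted above (~40 kDa for ORF1p and ~150 kDa for ORF2p) can be converted into approximate coding lengths. The sketch assumes an average residue mass of ~110 Da, a common rule of thumb rather than a value from this review, so the results are estimates only. The function name is hypothetical.

```python
# Approximate coding length from protein mass, assuming ~110 Da per residue
# (rule-of-thumb average; an assumption, not a figure from the review).
AVG_RESIDUE_DA = 110.0
L1_LENGTH_BP = 6000  # full-length L1, ~6 kb as quoted in the text


def coding_length_bp(mass_kda):
    """Estimated ORF length in bp (3 nt per codon, stop codon ignored)."""
    residues = mass_kda * 1000.0 / AVG_RESIDUE_DA
    return 3.0 * residues


orf1_bp = coding_length_bp(40)    # ORF1p, ~40 kDa
orf2_bp = coding_length_bp(150)   # ORF2p, ~150 kDa
coding_fraction = (orf1_bp + orf2_bp) / L1_LENGTH_BP
print(f"ORF1 ~{orf1_bp:,.0f} bp, ORF2 ~{orf2_bp:,.0f} bp, "
      f"coding fraction ~{coding_fraction:.0%}")
```

With these assumptions ORF1 and ORF2 come out at roughly 1.1 kb and 4.1 kb, so the two ORFs account for most of a full-length element, leaving the remainder for the 5′ UTR, the inter-ORF spacer and the 3′ UTR.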
ORF2p RT activity was initially demonstrated using a chimeric protein of ORF2 fused to a budding yeast Ty family retrotransposon (Mathias et al., 1991; Dombroski et al., 1994). The ORF2p RT domain exhibits high sequence similarity with that of telomerase, retroviruses and other retroelements including group II introns (Xiong and Eickbush, 1990; Malik et al., 1999; Stamos et al., 2017), suggesting that they may have diverged from a common ancestral element during evolution (Nakamura and Cech, 1998; Belfort et al., 2011). The similarity between retrotransposons and telomerase extends beyond the sequence level; a mechanism of action analogous to reverse transcription may operate at the ends of chromosomes, as non-LTR retrotransposons maintain chromosomal termini by repeated retrotransposition to telomeres in fruit flies, where no obvious ortholog of telomerase has been found (reviewed by Pardue and DeBaryshe, 2011). In mammals, EN-deficient L1s can also mobilize into deprotected telomeres (see “Genomic alteration by L1 retrotransposition”) (Morrish et al., 2007; Kopera et al., 2011). These lines of evidence partly support a hypothesis that linear chromosomes emerged during a process wherein broken DNA ends at repetitive sequences began to be fixed and maintained using the repeat-associated retrotransposon RTs in ancestral organisms with circular chromosomes (de Lange, 2015).
ORF1p is able to bind RNA in a sequence-independent fashion (Hohjoh and Singer, 1996, 1997; Kolosha and Martin, 2003; Khazina et al., 2011), while RNA binding by ORF2p requires a poly(A) sequence (Fig. 2, L1 RNP formation) (Esnault et al., 2000; Doucet et al., 2015). Therefore, these two proteins bind to RNA independently; moreover, the apparent ORF1p–ORF2p interaction is largely abolished by RNase treatment (Taylor et al., 2013). As mentioned earlier, much less ORF2p is expressed than ORF1p; it is unclear whether such a difference is necessary for RC-L1 RNP formation. An epitope tagging strategy enables us to detect ORF2p in various cultured cells, and to purify the RC-L1 RNP complex, allowing accurate stoichiometric analysis of ORF1p and ORF2p in the L1 RNPs as well as identification of host factors that regulate L1 RNP formation (Goodier et al., 2004, 2010; Doucet et al., 2010; Taylor et al., 2013, 2018; Mita et al., 2018; Miyoshi et al., 2019). So far, PABPC1, a cytoplasmic poly(A) RNA-binding protein, is the only factor that has been reported to facilitate both L1 RNP formation and retrotransposition (Dai et al., 2012) (Fig. 3).
Host factors that facilitate L1 retrotransposition. After transcription, L1 RNA is exported into the cytoplasm, followed by ORF1p and ORF2p translation. Both proteins preferentially bind to their encoding RNA in cis to form a ribonucleoprotein (RNP) complex, which is necessary for retrotransposition. PABPC1, a poly(A)-binding protein, stabilizes L1 RNP formation. ESCRT, a membrane budding/fusion complex, may facilitate the nuclear transport of the L1 RNP. ORF1p and/or ORF2p can also act in trans on non-autonomous retrotransposons such as Alu, a 7SL-derived sequence. ORF1p but not ORF2p is dispensable for Alu retrotransposition. Several host factors, including the 7SL RNA-binding proteins SRP9 and SRP14, also bind to Alu RNA and are critical for Alu retrotransposition. Once the L1 RNPs gain access to the nucleus, TPRT is initiated by ORF2p EN cleavage at a degenerate consensus sequence such as 5′-TTTT/AA-3′ (the slash indicates the ORF2p cleavage site), generating a free 3′-hydroxyl group that is used as a primer for reverse transcription by ORF2p RT. Nucleolytic digestion by ORF2p EN triggers PARP2 recruitment, thereby increasing the local concentration of poly(ADP-ribose) at the L1 integration site, which, in turn, provides a chance for RPA binding. A cellular RNase H such as RNase H2 may play a role in L1 RNA/cDNA hybrid clearance after reverse transcription. Subsequently, RPA may protect (–) single-strand L1 cDNA to facilitate L1 retrotransposition. In addition, ORF2p directly interacts with PCNA. The detailed mechanisms of second-strand genomic DNA cleavage, (+) strand L1 cDNA synthesis and L1 cDNA ligation to genomic DNA remain unclear.
Of note, analysis of L1 subcellular localization revealed that L1 RNPs form distinct cytoplasmic foci that require the RNA-binding ability of ORF1p (Goodier et al., 2007; Doucet et al., 2010). Moreover, several reports demonstrated that L1 RNPs localize in stress granules (SGs), membraneless cytoplasmic regulatory bodies that form via liquid–liquid phase separation (reviewed by Boeynaems et al., 2018) under various cellular stress conditions (Anderson and Kedersha, 2002; Goodier et al., 2007; Doucet et al., 2010; Protter and Parker, 2016). SG components, including Moloney murine leukemia virus 10 protein (MOV10), APOBEC3 (apolipoprotein B mRNA editing enzyme catalytic polypeptide-like 3) and SAM and HD domain-containing protein 1 (SAMHD1), regulate L1 retrotransposition, likely through cytoplasmic co-localization with the L1 RNP foci (Goodier et al., 2012; Horn et al., 2014; Hu et al., 2015). However, L1 RNP formation alone may be insufficient to induce SG formation (Ariumi et al., 2018; Pereira et al., 2018). Further studies are required to elucidate whether SG formation is integral to L1 RNP foci formation in the cytoplasm. Intriguingly, an in vitro study demonstrated that ORF1p alone forms a phase-separated liquid droplet (Newton et al., 2021); however, since the presence of L1 RNA in the ORF1p-containing droplet has not been examined, it remains unclear whether the in vitro ORF1p condensate is equivalent to the in vivo L1 cytoplasmic foci.
Several lines of genetic and biochemical evidence support the hypothesis that both ORF1p and ORF2p preferentially retrotranspose L1 RNA in cis (Esnault et al., 2000; Wei et al., 2001; Kulpa and Moran, 2006; Doucet et al., 2010; Taylor et al., 2013; Luqman-Fatah et al., 2022); however, these proteins can also act in trans on non-autonomous retrotransposons such as Alu and SVA (Dewannieux et al., 2003; Hancks et al., 2011; Raiz et al., 2012) or other mRNAs to produce processed pseudogenes (Esnault et al., 2000) (Fig. 3). These RNA sequences harbor or terminate with a poly(A) sequence that is required for ORF2p binding (Boeke, 1997; Dewannieux et al., 2003; Doucet et al., 2015). For example, the high frequency of Alu retrotransposition can be attributed to a poly(A) tract near the end of Alu RNA. In addition, Alu RNAs are derived from 7SL RNA and retain the ability to interact with the ribosome-binding proteins SRP9 and SRP14, thereby placing Alu RNAs in close proximity to the ribosomes during translation of nascent ORF2p; such a close association can in turn promote the “hijacking” of ORF2p by Alu RNAs (Sarrowa et al., 1997; Bennett et al., 2008; Ahl et al., 2015; Doucet et al., 2015). Additional host factors or mechanisms underlying the high Alu retrotransposition frequency may be discovered with further investigation.
Target-site primed reverse transcription (TPRT)
The mechanism by which L1 RNPs gain access to the nucleus remains unknown, but it does not appear to strictly require nuclear envelope breakdown since L1 retrotransposition occurs in non-dividing cells such as G1-arrested or post-mitotic neuronal cells (Kubo et al., 2006; Macia et al., 2017). It has also been hypothesized that the positively charged N-terminal region of ORF1p, which forms a metastable coiled-coil domain, is involved in membrane fusion akin to a virus–host interaction (Khazina and Weichenrieder, 2018). In addition, the ESCRT (endosomal sorting complex required for transport) machinery, which is involved in membrane budding/fusion, interacts with ORF1p and facilitates L1 retrotransposition (Horn et al., 2017). These observations lead to the hypothesis that a membrane fusion process aids L1 RNP trafficking from the cytoplasm into the nucleus where L1 RNPs initiate retrotransposition without nuclear membrane breakdown (Fig. 3). In contrast, cell division promotes L1 mobilization (Shi et al., 2007; Xie et al., 2013), and nuclear L1 RNP signals and L1 retrotransposition frequency peak during the S phase (Mita et al., 2018). Recent studies also support a mechanism linking L1 retrotransposition to DNA replication (Mita et al., 2018, 2020; Flasch et al., 2019; Sultana et al., 2019; Ardeljan et al., 2020; Rodriguez-Martin et al., 2020). Future studies are required to clarify how L1 RNPs access the nucleus and how L1 retrotransposition is regulated throughout the cell cycle.
Genomic integration of L1 cDNAs occurs via TPRT, the mechanism of which was originally studied in the Bombyx mori R2Bm retrotransposon (Luan et al., 1993; Feng et al., 1996; Cost et al., 2002) (Fig. 3). ORF2p EN cleaves a degenerate consensus sequence, 5′-TTTT/AA-3′ (the slash indicates the typical site for scission by ORF2p EN), leaving a nick with a 3′-hydroxyl and 5′-phosphate group (Feng et al., 1996; Cost and Boeke, 1998; Cost et al., 2002; Flasch et al., 2019). The cleavage sequences observed are variable both in vitro and in vivo, as the ORF2p EN cleavage site specificity is somewhat relaxed (Feng et al., 1996; Moran et al., 1996; Cost and Boeke, 1998; Cost et al., 2002; Flasch et al., 2019; Sultana et al., 2019). After nicking, the resulting T-rich genomic DNA strand liberated by ORF2p EN is annealed to an L1 RNA poly(A) tail, and is used as a primer by ORF2p RT to reverse transcribe the first strand (or [-] strand) of the L1 cDNA (Kulpa and Moran, 2006; Monot et al., 2013; Doucet et al., 2015). During this step, PCNA, a known ORF2p-interacting protein that normally functions as a sliding DNA clamp for DNA polymerases, likely enhances L1 retrotransposition efficiency by recruiting ORF2p to potential initiation sites or by increasing ORF2p RT processivity, although this has not yet been demonstrated conclusively in vitro (Taylor et al., 2013). Moreover, poly(ADP-ribose) polymerase 2 (PARP2) acts as a sensor that recognizes the nick generated by ORF2p EN, which, in turn, activates PARP2 and enhances poly(ADP-ribose) synthesis (Miyoshi et al., 2019) (Fig. 3). Subsequently, replication protein A (RPA), a major ssDNA-binding protein in DNA repair and replication, binds to L1 integration sites via poly(ADP-ribose) generated by PARP2, and may protect ssDNA intermediates generated during TPRT from cytidine deamination by the potent L1 inhibitor APOBEC3A, or perhaps unscheduled nucleolytic attack by cellular nucleases (Miyoshi et al., 2019). 
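As a rough illustration of the consensus described above, the following Python sketch scans a DNA string for exact matches to 5′-TTTT/AA-3′ on both strands and reports where the nick would fall. The function names, the 0-based coordinate convention and the exact-match criterion are assumptions made for illustration only; because ORF2p EN site specificity is relaxed, an exact-match scan of this kind would miss many genuine cleavage sites.

```python
# Illustrative sketch only: names and conventions are hypothetical.
CONSENSUS = "TTTTAA"  # 5'-TTTT/AA-3'
NICK_OFFSET = 4       # the nick falls between TTTT and AA

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_consensus_nicks(seq: str) -> list:
    """Find exact consensus matches on both strands.

    Returns (strand, boundary) pairs. `boundary` is a 0-based position on
    the top strand: the nick lies between bases boundary-1 and boundary.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        start = s.find(CONSENSUS)
        while start != -1:
            boundary = start + NICK_OFFSET
            # Convert bottom-strand coordinates back to top-strand ones.
            hits.append((strand, boundary if strand == "+" else n - boundary))
            start = s.find(CONSENSUS, start + 1)  # allow overlapping matches
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

For a toy input such as `"GGTTTTAACC"`, the function reports a single top-strand site with the nick between the fourth T and the first A.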
The process of ssDNA generation from L1 RNA/cDNA hybrids remains unclear, since ORF2p does not exhibit detectable RNase H activity. Notably, RNase H activity is required for mobilization of group II introns (Smith et al., 2005; Piskareva and Schmatchenko, 2006). It is therefore likely that RNase H2 compensates for the lack of RNase H activity in L1 RNPs to liberate the ss L1 cDNA strand during or after reverse transcription (Benitez-Guijarro et al., 2018).
It has been hypothesized that a second nick of the genomic DNA occurs at a site downstream of the initial nick, resulting in target-site duplications (TSDs) flanking the inserted L1 sequences; TSDs are a hallmark of the L1 retrotransposition process (Moran et al., 1996; Gilbert et al., 2002, 2005; Symer et al., 2002) (Fig. 4A). Given that R2Bm can generate double-strand breaks (DSBs) at the target site (Luan et al., 1993), the second nick could be catalyzed by ORF2p EN, but unknown cellular nucleases are also likely involved in this process. Similarly, (+) strand L1 cDNA synthesis is poorly understood. In the final step of L1 retrotransposition, where L1 cDNA is ligated to flanking genomic DNA, host DNA repair factors and DNA ligases undoubtedly play a pivotal role. Non-homologous end joining (NHEJ), a well-known DSB repair pathway, is required for L1 retrotransposition in a chicken lymphoblast cell line (Suzuki et al., 2009); however, Chinese hamster ovary (CHO) cells defective in NHEJ can still carry out conventional L1 retrotransposition similar to that in NHEJ-proficient cells (Morrish et al., 2002). L1 retrotransposition is frequently accompanied by 5′ truncation of the L1 cDNA insertion, probably due to host defense mechanisms. In this case, microhomology is frequently observed at the junctions between genomic DNA and the 5′ ends of L1 copies (Gilbert et al., 2005; Zingler et al., 2005; Kojima, 2010), suggesting the involvement of an alternative NHEJ pathway, which involves annealing of short stretches of microhomology at or close to DSBs. Indeed, another poly(ADP-ribose) polymerase, PARP1, which is crucial for alternative NHEJ (Ray Chaudhuri and Nussenzweig, 2017), interacts with L1 RNPs, and PARP1 knockdown or PARP inhibitor treatment leads to a marked reduction of L1 retrotransposition (Taylor et al., 2013; Miyoshi et al., 2019). 
Taken together, the detailed mechanisms of opposite-strand genomic DNA cleavage, (+) strand L1 cDNA synthesis and L1 cDNA ligation to target-site genomic DNA remain to be elucidated. In addition to the L1-encoded proteins, host-encoded proteins likely participate in these processes (Taylor et al., 2013, 2018; Liu et al., 2018; Miyoshi et al., 2019).
Genomic alteration by L1 retrotransposition. (A) TSD or deletion in L1 retrotransposition. ORF2p EN activity creates a nick at the degenerate consensus sequence 5′-TTTT/AA-3′. The liberated 3′-hydroxyl group is used to initiate reverse transcription by ORF2p RT (dashed blue line) during TPRT. The second-strand cleavage frequently occurs downstream of the first-strand nick site (left), resulting in the generation of duplicated sequences (green lines) at both ends of the L1 integration site (blue lines). In contrast, when an upstream site is chosen as the other end of the L1 cDNA insertion (right), target-site deletion (orange line) rather than duplication may take place. It remains unclear whether second-strand cleavage is induced by a specific nuclease, or a pre-existing DNA break is exploited, during L1 retrotransposition. (B) Multiple fates of L1-mediated genomic alterations. L1 (or Alu and SVA) retrotransposition causes different types of genomic instability that depend on the specific cDNA insertion mechanisms (blue boxes). In most cases, insertional mutations of L1 retrotransposition are accompanied by TSDs (green triangles) of ≤ 50 bp, which can occasionally lead to disruption of exon sequences (orange boxes). Large genomic deletion or duplication (i.e., large TSD formation, green boxes) likely occurs through a second-strand cleavage or pre-existing DNA break far upstream or downstream of the original nicked site to ligate the L1 cDNA end to genomic DNA. In addition, intra-chromosomal duplication can also occur via a complex L1 integration process independently of TSD formation (not shown in the figure). When L1 cDNA is fused to DNA breaks in different chromosomes (light blue and pink, respectively), this retrotransposition eventually leads to chromosomal translocation.
Endogenous L1 copies can provide a template for strand invasion of the newly synthesized L1 cDNA, which may result in chromosomal deletion, duplication and translocation during L1 retrotransposition. In connection with translocation, a breakage–fusion–bridge cycle may be triggered by chromosomal fusion mediated by L1 retrotransposition. (C) Endonuclease-dependent and -independent L1 retrotransposition. ORF2p EN activity (scissors) is required for a conventional L1 retrotransposition associated with TSD formation (green triangles) as described in (A). However, EN-deficient ORF2p can still retrotranspose L1 sequences (blue boxes) in a cell line lacking both the functional NHEJ and p53 pathways, likely employing unrepaired DSBs to initiate reverse transcription. This endonuclease-independent L1 retrotransposition usually occurs without typical hallmarks of retrotransposition, such as an ORF2p EN cleavage site, TSD formation and terminal poly(A) insertion (An).
It is evident that ORF1p is required for L1 retrotransposition (Moran et al., 1996; Kulpa and Moran, 2005; Doucet et al., 2010; Khazina et al., 2011); however, its exact role(s) in TPRT remains to be elucidated. The nucleic acid chaperone activity of ORF1p is assumed to facilitate the initial step of TPRT (e.g., strand exchange between L1 RNA and target-site genomic DNA) (Martin and Bushman, 2001), although ORF1p is not inherently obligatory for Alu retrotransposition, which requires ORF2p RT activity (Dewannieux et al., 2003). Since a loss-of-function ORF1p results in variable lengths of L1 cDNA synthesis by ORF2p in vitro (Doucet et al., 2010), ORF1p appears to be involved in positioning ORF2p on L1 RNA to accurately initiate reverse transcription from the 3′ end of the template RNA. In Alu retrotransposition, SRP9/14 may act as the nucleic acid chaperone instead of ORF1p (Dewannieux et al., 2003). The detailed function of ORF1p during TPRT requires further investigation. We provide a list of the known host factors that facilitate L1 retrotransposition in Table 1.
Protein symbol | Protein name | Notes | Mechanism of activation | References |
---|---|---|---|---|
Transcription | ||||
YY1 | Yin Yang 1 or YY1 transcription factor | transcription factor | upregulation of L1 transcription | Athanikar et al., 2004; Becker et al., 1993 |
RUNX3 | RUNX family transcription factor 3 or RUNT-related transcription factor 3 | transcription factor | upregulation of L1 transcription | Yang et al., 2003 |
ETS1 | Ets proto-oncogene 1, transcription factor | transcription factor | upregulation of L1 transcription | Yang et al., 1998 |
Sp1 | specificity protein 1 or Sp1 transcription factor | transcription factor | upregulation of L1 transcription | Yang et al., 1998 |
SOX11 | SRY-box transcription factor 11 | transcription factor | upregulation of L1 transcription | Muotri et al., 2005; Tchenio, 2000 |
RNA and protein | ||||
PABPC1 | poly(A)-binding protein cytoplasmic 1 | poly(A) RNA-binding protein | RNP stabilization | Dai et al., 2012 |
PDPKs | proline-directed protein kinases | | increasing the Pin1 and ORF1p interaction | Cook et al., 2015 |
Pin1 | peptidyl prolyl isomerase 1 | | unknown | Cook et al., 2015 |
UPF1 | regulator of nonsense transcripts 1 | nonsense-mediated decay | unknown | Taylor et al., 2013 |
Transport | ||||
ESCRT | endosomal sorting complex required for transport | | RNP formation or nuclear transport? | Horn et al., 2017 |
TPRT intermediates | ||||
PCNA | proliferating cell nuclear antigen | sliding clamp in DNA replication | supporting ORF2p activities | Taylor et al., 2013 |
PARP1 | poly(ADP-ribose) polymerase 1 | PARP family | unknown | Taylor et al., 2013; Miyoshi et al., 2019 |
PARP2 | poly(ADP-ribose) polymerase 2 | PARP family | RPA recruitment to TPRT intermediates | Miyoshi et al., 2019 |
RPA | replication protein A | single-strand DNA binding protein | protection of TPRT intermediates | Miyoshi et al., 2019 |
RNase H2 | ribonuclease H2 | ISG | RNA clearance after reverse transcription | Benitez-Guijarro et al., 2018 |
In addition to single-nucleotide variants (SNVs) and short insertions or deletions (indels) (e.g., < 50 bp), a large number of genomic alterations that span more than 50 bp, including deletions, duplications, inversions, insertions and translocations, are detected as naturally occurring structural variants (SVs) in the human genome (Sudmant et al., 2015; Collins et al., 2020; Nesta et al., 2021). De novo L1 or Alu insertion has been estimated to occur at a rate of one per ~20–200 or ~20–40 live births, respectively, with a chance of causing SVs (Kazazian, 1999; Li et al., 2001; Cordaux et al., 2006; Xing et al., 2009; Ewing and Kazazian, 2010; Wimmer et al., 2011; Hancks and Kazazian, 2012; Feusier et al., 2019). Thus, retrotransposition-mediated SVs continue to generate genetic diversity in human populations (Beck et al., 2010; Huang et al., 2010). Indeed, approximately one-fourth of human SVs associate with L1, Alu, SVA and processed pseudogene-derived sequences (Xing et al., 2009; Sudmant et al., 2015). Moreover, de novo insertions of these elements have been documented by sequencing analysis of disease-causing gene mutations and by whole-genome analysis in various types of cancers (Lee et al., 2012; Tubio et al., 2014; Hancks and Kazazian, 2016; Kazazian and Moran, 2017; Rodriguez-Martin et al., 2020).
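To make the estimated rates above more concrete, the short Python sketch below converts "one insertion per N live births" into an expected number of de novo insertions for a cohort. The cohort size of 10,000 births and the function name are hypothetical choices for illustration; the ranges are simply the approximate bounds quoted in the text, and the calculation assumes insertions occur independently at a constant rate.

```python
# Illustrative arithmetic only: cohort size and names are hypothetical.
def expected_insertions(births: int, per_births_low: int, per_births_high: int):
    """Expected number of de novo insertions in a cohort.

    A rate of "one per ~low-high live births" gives an expected count
    between births/high (lower bound) and births/low (upper bound).
    """
    return births / per_births_high, births / per_births_low


cohort = 10_000  # hypothetical number of live births

l1_range = expected_insertions(cohort, 20, 200)   # L1: one per ~20-200 births
alu_range = expected_insertions(cohort, 20, 40)   # Alu: one per ~20-40 births

print(f"L1:  {l1_range[0]:.0f}-{l1_range[1]:.0f} expected insertions")
print(f"Alu: {alu_range[0]:.0f}-{alu_range[1]:.0f} expected insertions")
```

Under these assumptions, a cohort of 10,000 births would be expected to carry on the order of 50 to 500 new L1 insertions and 250 to 500 new Alu insertions.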
In addition to insertional mutations, L1-mediated retrotransposition occasionally manifests as target-site genomic alterations. A conventional L1 retrotransposition event is frequently accompanied by the L1 retrotransposition hallmark, TSD formation. As mentioned above, the tendency toward hypothetical downstream opposite-strand nicking may result in the frequent duplication of the genomic DNA fragment between the first- and second-strand cleavage sites at both ends of the L1 cDNA sequence (Fig. 4A). In contrast, when second-strand cleavage occurs upstream of L1-mediated first-strand cleavage, perhaps through endonucleolytic digestion or a naturally occurring DNA break, the resultant insertion is accompanied by a deletion, rather than a duplication, of target-site genomic DNA (Fig. 4A) (Gilbert et al., 2002, 2005; Symer et al., 2002; Rodriguez-Martin et al., 2020). In a third possible scenario, no duplication or deletion of target-site DNA occurs when the second strand is cleaved exactly opposite the first-strand nick.
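The three scenarios above can be summarized in a small Python sketch that classifies the target-site outcome from the relative positions of the two cleavage events. The coordinate convention (larger coordinates are "downstream" of the first-strand nick), the type and function names, and the idea of reducing each cleavage to a single integer position are all assumptions made for illustration; the sketch does not model how the second-strand break is actually generated, which remains unclear.

```python
# Illustrative sketch only: names and coordinate convention are hypothetical.
from typing import NamedTuple


class TargetSiteOutcome(NamedTuple):
    kind: str       # "duplication", "deletion" or "none"
    length_bp: int  # size of the duplicated or deleted target-site segment


def classify_target_site(first_nick: int, second_nick: int) -> TargetSiteOutcome:
    """Classify the target-site alteration accompanying an insertion.

    Positions are coordinates on the target DNA, oriented so that a
    larger value lies downstream of the first-strand nick.
    """
    offset = second_nick - first_nick
    if offset > 0:
        # Second-strand cleavage downstream: the intervening segment is
        # copied to both ends of the insertion (a TSD).
        return TargetSiteOutcome("duplication", offset)
    if offset < 0:
        # Second-strand cleavage upstream: the intervening segment is lost.
        return TargetSiteOutcome("deletion", -offset)
    # Cleavage exactly opposite the first nick: no duplication or deletion.
    return TargetSiteOutcome("none", 0)
```

For example, `classify_target_site(100, 115)` yields a 15-bp duplication, whereas `classify_target_site(100, 92)` yields an 8-bp deletion.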
The enzymatic activities responsible for the second-strand cleavage during TPRT remain unclear. In addition, second-strand target sequences exhibit no obvious similarity to the consensus ORF2p EN cleavage site, and the positions do not appear to be strictly determined, suggesting the involvement of cellular nucleases or DNA breaks (Gilbert et al., 2002, 2005; Symer et al., 2002; Rodriguez-Martin et al., 2020). Since second-strand cleavage may occur in the vicinity of the first-strand nick (Gilbert et al., 2005), TSDs or deletions are usually relatively short (e.g., less than ~50 bp for TSDs) (Lee et al., 2012). However, long TSD formation or large genomic alterations ranging in size from several kb to Mb are also detected in neurons, several types of cancers and disease-causing mutations (Gilbert et al., 2002, 2005; Symer et al., 2002; Erwin et al., 2016; Hancks and Kazazian, 2016; Payer and Burns, 2019; Rodriguez-Martin et al., 2020). These gross target-site alterations could arise from several plausible events: second-strand cleavage that occurs at a position far from the initial endonucleolytic nick; strand invasion and annealing between a pre-existing and a newly synthesized L1 sequence; or exploitation of naturally occurring DNA breaks instead of breaks arising from ORF2p EN activity during L1 cDNA insertion (Fig. 4A, 4B and 4C). Moreover, intra- and/or inter-chromosomal translocations are also generated by L1 retrotransposition, in cases where the L1 cDNA end is presumably ligated to DSB ends on other chromosomes (Gilbert et al., 2002, 2005; Symer et al., 2002; Beck et al., 2011; Richardson et al., 2015). 
It should be mentioned that these L1-mediated translocations may cause dicentric chromosome formation followed by chromosomal breakage between the two centromeres during mitosis, leading to a breakage–fusion–bridge cycle (Rodriguez-Martin et al., 2020), which was originally proposed by Barbara McClintock as a model for chromosomal instability (McClintock, 1941). These multiple fates of L1-mediated genomic rearrangements strongly suggest that host DNA repair or replication machinery participates in L1 retrotransposition (Fig. 4B). In support of this notion, sequencing analyses, proteomic approaches and gene knockdown/knockout screening have identified a number of DNA repair factors as L1 retrotransposition regulators (Taylor et al., 2013, 2018; Liu et al., 2018; Mita et al., 2018, 2020; Flasch et al., 2019; Miyoshi et al., 2019; Sultana et al., 2019). Indeed, TSDs are less frequently observed in the context of knockdown of breast cancer 1 (BRCA1), a multifunctional tumor suppressor gene in which inherited mutations predispose to breast and ovarian cancer (Mita et al., 2020); in contrast, TSD length increases when the nucleotide excision repair (NER) pathway is defective (Servant et al., 2017).
ORF2p EN is crucial for the initiation of canonical L1 integration, which is accompanied by typical structural hallmarks such as cleavage at the L1 EN target consensus and poly(A) tail insertion, as well as TSD formation. An alternative pathway that does not require the ORF2p EN activity has also been demonstrated, termed endonuclease-independent (ENi) retrotransposition (Fig. 4C) (Morrish et al., 2002). In p53- and NHEJ-deficient cells (e.g., with mutation or knockdown of XRCC4, DNA-PKcs, XLF or DNA ligase IV), unrepaired DSBs may be employed by EN-deficient L1 RNPs to initiate reverse transcription at genomic DNA ends, leading to L1 insertions lacking the hallmarks described above (Morrish et al., 2002, 2007; Coufal et al., 2011; Kopera et al., 2011). Of note, de-protected telomeres in a DNA-PKcs-mutated CHO cell line are recognized as DSBs and subjected to ENi retrotransposition (Morrish et al., 2007). ENi retrotransposition was also corroborated in a cell line deficient in the Fanconi anemia (FA) pathway (Flasch et al., 2019), which is required for the repair of DNA inter-strand crosslinks and stalled/broken DNA replication forks. Intriguingly, BRCA1 acts in the FA pathway (Garcia-Higuera et al., 2001; Sawyer et al., 2015) and its knockdown results in elevated ENi retrotransposition frequency (Coufal et al., 2011). These cellular environments may recapitulate a model wherein ancient, endonuclease-lacking LINE elements utilized DNA breaks or DNA replication intermediates such as lagging-strand fragments (i.e., Okazaki fragments) as primers from which to initiate cDNA synthesis during retrotransposition (Kopera et al., 2011; Flasch et al., 2019). L1 insertions that exhibit features similar to ENi retrotransposition events are found not only in a disease-causing genomic deletion but also in the human genome reference sequence (Sen et al., 2007; Morisada et al., 2010). 
Future studies will shed light on how ORF2p accesses DSB ends and co-opts host repair factors for ENi retrotransposition.
ORF2p is also able to reverse transcribe non-retrotransposon sequences including small nuclear (sn) and small nucleolar RNAs that are conjugated to L1 sequences (reviewed by Richardson et al., 2015). In this process, the RNA ligase RtcB, which is required for tRNA maturation, ligates L1 RNA and U6 snRNA to provide a template for reverse transcription, giving rise to a chimeric cDNA sequence comprised of L1 and U6 during retrotransposition (Buzdin et al., 2002; Gilbert et al., 2005; Garcia-Perez et al., 2007a; Moldovan et al., 2019). As described earlier, due to a weak polyadenylation signal within the L1 3′ UTR, the 3′-flanking genomic information is frequently co-transcribed, carried over on the L1 RNA, and retrotransposed with the source L1s (i.e., 3′ transduction) (Holmes et al., 1994; Moran et al., 1999; Goodier et al., 2000; Pickeral et al., 2000). In contrast, when transcription begins upstream of an L1 promoter (e.g., a proximal gene promoter) and RNA polymerase II transcribes the full-length L1 sequence, the L1 5′-flanking genomic region can be retrotransposed with the L1 sequence (i.e., 5′ transduction) (Lander et al., 2001).
High processivity of ORF2p RT was observed relative to Moloney murine leukemia retrovirus RT (Piskareva and Schmatchenko, 2006); nevertheless, a large number of L1 retrotransposition events result in 5′-truncated insertions (Lander et al., 2001; Myers et al., 2002; Gilbert et al., 2005). This discrepancy requires further study, but it appears that host DNA repair activity and/or defense systems restrict full-length L1 cDNA synthesis and/or remove TPRT intermediates (see “Host defense mechanisms against L1 expression and retrotransposition”). Intriguingly, retrotransposition with severe (long-range) 5′ truncation leads to a complete loss of the L1 cDNA, thereby leaving only a poly(A) tract or the 3′ transduced sequence in the resultant insertion (Moran et al., 1999; Solyom et al., 2012a; Tubio et al., 2014). Moreover, target-site cleavage by the ORF2p EN activity can potentially trigger large-scale genomic deletion in the absence of any cDNA insertion in neuronal cells (Erwin et al., 2016). Whether such a deletion event is a neuron-specific phenomenon or is common in other cell types remains unclear.
Retrotransposons are repetitive and interspersed throughout genomes; hence, SVs occur not only via retrotransposition-mediated genomic alteration but also through recombination-mediated events between pre-existing retrotransposon sequences. When DSBs within retrotransposons are repaired via homologous recombination, unequal crossing-over between the two different copies leads to non-allelic homologous recombination (NAHR), which occasionally results in disease-causing mutations and accounts for deletions, duplications and inversions in primate genome evolution (Deininger and Batzer, 1999; Callinan and Batzer, 2006; Sen et al., 2006; Han et al., 2008). NAHR events involving Alu elements occur more frequently than those between L1s (Cordaux and Batzer, 2009), presumably due to the higher copy number and similarity of full-length Alu sequences (Sen et al., 2006; Han et al., 2008). Alu–Alu recombination may also cause more complex genomic rearrangements brought about by chromothripsis, in which chromosomal shattering with multiple and concurrent DNA breaks may be repaired by recombination events at an inter- or intra-chromosomal level (Nazaryan-Petersen et al., 2016, 2020). Intriguingly, similar to the brain study (Erwin et al., 2016), ORF2p EN cleavage of genomic DNA also appears to be involved in this striking rearrangement, suggesting a concomitant occurrence of multiple events including DNA breaks by ORF2p EN (perhaps accompanied by retrotransposition) and subsequent NAHR events (Nazaryan-Petersen et al., 2016).
Aberrant expression of L1s poses a threat to genome integrity. To keep L1s in check, host cells employ multiple defense mechanisms that function at different stages of the L1 retrotransposition cycle (reviewed by Ariumi, 2016; Goodier, 2016). In the following sections, we focus on well characterized host factors that restrict L1 activities (Fig. 5) and provide a compilation of host factors restricting L1 retrotransposition (Table 2).
Host factors that restrict L1s. L1 inhibitors annotated at different stages of the L1 retrotransposition cycle. Upper left (nucleus): nucleosomes and the DNA strands are shown as a “beads-on-a-string” model. Most L1s are silenced by the histone H3 modification H3K9me3 (black clover symbols on the nucleosome beads) and DNA methylation (black lollipops on the DNA string). H3K9me3 is maintained by KAP1, SETDB1 and the HUSH complex. Losses of H3K9me3 and DNA methylation (open clovers and lollipops, respectively) activate L1 expression. Histone H3 is also modified by HDACs to remove acetylation and inhibit L1 expression. DNA methylation on the L1 5′ UTR is maintained by DNMTs, MeCP2 and PLZF. Proteins involved in epigenetic silencing are shown as dark-gray ovals. At the transcriptional level, p53 and SOX2 inhibit L1 through 5′ UTR binding (light-gray rectangles). Lower (cytoplasm): after L1 RNA is transcribed and exported into the cytoplasm, piRNA, siRNA and miRNA (miR-128 and let-7 miRNA) target the L1 RNA for translational impairment or degradation. The remaining L1 RNA undergoes translation to generate ORF1p trimers (yellow spheres) and ORF2p (blue rectangle) that bind to L1 RNA to form the L1 RNP. Several ISG proteins (red rectangles) inhibit L1 RNP, RNA or the encoded proteins (ORF1p and ORF2p). L1 RNP accumulates in the cytoplasm to form discrete foci-like structures that may or may not co-localize with stress granules. Some ISGs including ZAP (ZC3HAV1), MOV10, SAMHD1 and APOBEC3 family proteins have been observed to co-localize with L1 foci in stress granules. The co-localization of L1 foci with stress granules may lead to inhibition through autophagy clearance. Cytoplasmic L1 cDNA may trigger the cGAS–STING pathway, while RIG-I-like receptors (RLRs) may recognize L1 RNA; both pathways induce the expression of type I interferons that potentially upregulate the expression of L1-inhibiting ISGs. 
Upper right (nucleus): a fraction of L1 RNPs can gain access to the nucleus and undergo TPRT. Homologous recombination factors such as BRCA1, ERCC1/XPF complex and APOBEC3A inhibit the TPRT process through different mechanisms (purple rectangles). KZNFs may recognize de novo-inserted L1s, although no KZNF that is specific for evolutionarily young and active L1s has been reported. Transcription of intronless de novo-inserted L1s is recognized by PPHLN1 of the HUSH complex, resulting in the recruitment of chromatin modifiers such as SETDB1, KAP1 and MORC2; this leads to histone H3K9 trimethylation (H3K9me3) as well as DNA methylation. This may cause heterochromatinization of the insertion site and thus affect the expression of neighboring genes.
Protein or RNA symbol | Protein/complex name | Notes |
---|---|---|
Epigenetic silencing | ||
DNMT1 | DNA methyltransferase 1 | DNA methyltransferase |
DNMT3A | DNA methyltransferase 3 alpha | DNA methyltransferase |
DNMT3B | DNA methyltransferase 3 beta | DNA methyltransferase |
MECP2 | methyl-CpG-binding protein 2 | chromatin binding protein |
PLZF | promyelocytic leukemia zinc finger protein | POK family of transcription factors |
NuRD | nucleosomal and remodeling deacetylase complex | chromatin remodeler |
HDAC1 | histone deacetylase 1 | E2f-Rb family complex |
HDAC2 | histone deacetylase 2 | E2f-Rb family complex |
Rb family (Rb, p107, p130) | retinoblastoma-associated proteins | E2f-Rb family complex |
ZNF93 | zinc finger protein 93 | KZNF |
TRIM28/KAP1 | tripartite motif containing 28/KRAB-associated protein 1 | chromatin regulator |
SIRT6 | sirtuin 6 | deacetylase and mono-ADP ribosyltransferase |
MPP8 | M-phase phosphoprotein 8 | HUSH complex component |
TASOR | transgene activation suppressor | HUSH complex component |
PPHLN1 | periphilin | HUSH complex component |
MORC2 | MORC family CW-type zinc finger 2 | chromatin regulator |
SETDB1 | SET domain bifurcated histone lysine methyltransferase 1 | histone methyltransferase |
FBXO44 | F-box only protein 44 | chromatin regulator |
SUV39H1 | histone-lysine N-methyltransferase SUV39H1 | histone methyltransferase |
Transcriptional repression | ||
SOX2 | SRY-box transcription factor 2 | transcription factor |
p53 | cellular tumor antigen p53 | transcription factor |
RNA and protein | ||
piRNA | PIWI-interacting RNAs | small silencing RNA |
siRNA | small interfering RNAs | small silencing RNA |
miR-128 | microRNA-128 | small silencing RNA |
let-7 miRNA | let-7 microRNA | small silencing RNA |
Drosha-DGCR8 | microprocessor (Drosha-DGCR8) | small silencing RNA component |
Condensin II & GAIT | condensin II & the gamma activated inhibitor of translation | chromatin remodeler |
MOV10 | RNA helicase MOV10 | ISG |
TREX1 | three-prime repair exonuclease 1 | ISG |
SAMHD1 | SAM domain and HD domain containing protein 1 | ISG |
RNase H2 | ribonuclease H2 | ISG |
ADAR1 | adenosine deaminase acting on RNA 1 | ISG |
ADAR2 | adenosine deaminase acting on RNA 2 | ISG |
ZC3HAV1 (ZAP) | zinc finger antiviral protein ZC3HAV1 | ISG |
OAS-RNase L | 2′,5′-oligoadenylate synthetase (OAS)-RNase L | ISG |
AID | activation-induced cytidine deaminase | ISG |
APOBEC1 | apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 1 | ISG |
APOBEC3B | apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3B | ISG |
APOBEC3C | apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3C | ISG |
APOBEC3F | apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3F | ISG |
ATG5 | autophagy-related gene 5 | autophagy component |
TUT4 | terminal uridylyl transferase 4 | uridylase |
TEX19.1 | testis-expressed protein 19.1 | |
TPRT intermediates | ||
APOBEC3A | apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3A | ISG |
ERCC1/XPF | DNA excision repair protein ERCC-1/DNA repair endonuclease XPF | nucleotide excision repair factors |
BRCA1 (+HR factors) | breast cancer type 1 susceptibility protein | Fanconi anemia (HR factors) |
ATM | Ataxia telangiectasia mutated or ATM serine/threonine kinase | PI3 kinase |
TUT7 | terminal uridylyl transferase 7 | uridylase |
Most L1s are inactive due to point mutations, 5′ truncations, inversions or internal deletions (Lander et al., 2001; Ostertag and Kazazian, 2001). Furthermore, it is well established that the majority of the extant full-length L1s are transcriptionally repressed in most tissues through DNA methylation in the L1 5′ UTR region, which contains a canonical CpG island (Woodcock et al., 1997; Bourc’his and Bestor, 2004; Coufal et al., 2009). DNA methyltransferase 1 (DNMT1) maintains DNA methylation on newly replicated DNA strands in hemi-methylated dsDNA, while DNMT3A and DNMT3B introduce de novo DNA methylation at L1 sequences (Liang et al., 2002; Kato et al., 2007; Castro-Diaz et al., 2014; Li et al., 2015). However, there is differential regulation among L1 families, with evolutionarily younger L1s representing the main targets of DNA methylation. It is noteworthy that the rodent-specific DNMT3C also mainly targets evolutionarily young L1s in the male germline (type A, T and Gf) (Barau et al., 2016). Intriguingly, L1 copies repressed by DNA methylation do not overlap much with L1 elements that are repressed by KAP1 (an H3K9me3 histone modifier; see below), suggesting mutually exclusive regulation of L1s between DNA methylation and histone modification (Castro-Diaz et al., 2014). DNMT inhibitors such as cytosine analogs including 5-azacytidine and 5-aza-2′-deoxycytidine (decitabine) can also induce robust upregulation of L1s in colorectal cancer cell lines (Yang et al., 2004; Weber et al., 2010; Mehdipour et al., 2020). In addition, two DNA methylation regulatory proteins, promyelocytic leukemia zinc finger protein (PLZF) (Puszyk et al., 2013) and methyl-CpG-binding protein 2 (MeCP2), were also reported to inhibit L1 transcription (Fig. 5, DNA methylation) (Yu et al., 2001; Muotri et al., 2010).
Histone modifications are essential in maintaining broad L1 silencing as they repress the expression of both evolutionarily young and old L1s. The transcription factor E2F and the retinoblastoma (Rb) family form a complex that binds to CpG-rich promoter regions including L1 5′ UTR promoter regions, as shown in HeLa and mouse embryonic fibroblast (MEF) cells (Montoya-Durango et al., 2009; Montoya-Durango and Ramos, 2011). Depletion of the Rb family proteins (Rb, p107, p130) reduces repressive histone modification marks such as H3K9me3 and H4K20me3, which are associated with heterochromatin formation in both mouse and human L1s (Montoya-Durango and Ramos, 2011; Ren et al., 2021). In addition, the NuRD (nucleosomal and remodeling deacetylase) multiprotein complex was later identified as the repressor complex that associates with the E2F/Rb complex, leading to the recruitment of histone deacetylases HDAC1 and HDAC2 to L1 promoter regions (Fig. 5, deacetylation) (Montoya-Durango et al., 2009, 2016).
To establish new histone repressive marks on de novo-inserted L1s, the host cell may recognize unique sequences of the element to recruit heterochromatin formation factors. KRAB zinc finger proteins (KZNFs) have evolved to recognize specific sequence features of the newly inserted retrotransposons to repress their transcription. However, this form of suppression can drive the evolution of new elements that lose or mutate their uniquely recognized regions. This host–L1 arms race model was elegantly demonstrated by Jacobs et al. (2014), who showed that two KZNFs, namely ZNF91 and ZNF93, evolved to specifically recognize SVAs and older primate-specific L1s, respectively, but that this was followed by a mutation that facilitated L1 escape from ZNF93 suppression. ZNF93 suppresses older L1s but not younger L1 families including human-specific L1s, due to a 129-bp deletion in the 5′ UTR (Jacobs et al., 2014). KZNFs suppress L1 activation by recruiting KRAB-associated protein 1 (KAP1) (also known as tripartite motif containing 28 (TRIM28)). KAP1 silences a broad range of retrotransposons by establishing heterochromatin through the recruitment of histone methyltransferases, including SET domain bifurcated histone lysine methyltransferase 1 (SETDB1), for H3K9me3 deposition (Fig. 5, H3K9me3 maintenance) (Rowe et al., 2010; Turelli et al., 2014). KAP1 was also reported to promote L1 silencing together with Sirtuin 6 (SIRT6), which binds to the L1 5′ UTR and mono-ADP-ribosylates KAP1 to establish heterochromatin at these L1 loci (Van Meter et al., 2014). Similarly, F-box only protein 44 (FBXO44) binds to H3K9me3 to recruit another histone methyltransferase, SUV39H1, to also suppress L1 expression during DNA replication (Shen et al., 2021).
Genome-wide CRISPR/Cas9 knockout screening revealed that the human silencing hub (HUSH) complex (made up of three proteins: M-phase phosphoprotein 8 (MPP8), periphilin (PPHLN1) and transgene activation suppressor (TASOR)), along with MORC family CW-type zinc finger 2 (MORC2), suppresses the expression of evolutionarily younger L1 families through H3K9me3 establishment on the active L1 copies by recruiting heterochromatin regulators such as SETDB1 (Fig. 5, HUSH complex) (Tchasovnikarova et al., 2015; Liu et al., 2018; Seczynska et al., 2022). In contrast to KZNFs, which recognize specific sequences of the retrotransposons, PPHLN1, a component of the HUSH complex, recognizes actively expressed intronless cDNAs including retroviruses and retrotransposons, making the HUSH complex a broad and fast-acting repressor of newly integrated retroelements (Seczynska et al., 2022). Corroborating studies indicate that L1 RNA is upregulated by knockdown of the HUSH complex component MPP8 (Tunbak et al., 2020; Gu et al., 2021).
In a working model, epigenetic silencing of L1 starts with the detection of de novo L1 insertion sites, which is mediated by the HUSH complex and perhaps an unidentified KZNF, followed by recruitment of KAP1 and SETDB1 (Matsui et al., 2010) to establish H3K9me3-dependent heterochromatin in the L1 integration sites. Repressive histone marks may be maintained and spread by the HUSH and MORC2 complexes (Douse et al., 2020). Histone deacetylation by HDACs may synergistically reinforce the sequence of these steps to maintain L1 suppression, as L1 is also actively inhibited through histone deacetylation (Fig. 5) (Montoya-Durango et al., 2009, 2016; Garcia-Perez et al., 2010). As mentioned previously, the mutation allowing escape from ZNF93-mediated repression occurred on young L1s (e.g., L1Hs and L1PA2) (Jacobs et al., 2014); however, repression mediated by the HUSH complex suggests that the young L1s can still be silenced through both DNA methylation and H3K9me3 (Liu et al., 2018; Seczynska et al., 2022). Indeed, MPP8, which binds to H3K9me3, was shown to mediate de novo DNA methylation via DNMT3A (Kokura et al., 2010; Chang et al., 2011). Epigenetic silencing (both DNA and histone methylation) mainly targets younger transcriptionally active L1s; however, some older L1s still retain repressive histone marks that may affect neighboring gene expression (Rowe et al., 2010).
Several transcription factors such as YY1 are known to promote L1 expression as described previously, whereas others have been reported to downregulate L1 expression instead. Negative transcriptional regulators of L1 include the SRY-box transcription factor 2 (SOX2) (Tchénio et al., 2000; Muotri et al., 2005) and the tumor suppressor protein p53 (Fig. 5) (Wylie et al., 2016; Rodriguez-Martin et al., 2020; Tiwari et al., 2020). Recently, it was shown that p53 directly binds to the L1 5′ UTR to suppress transcription (Tiwari et al., 2020). Although most L1s are silenced by epigenetic regulation and/or transcription factor-mediated suppression, aberrant expression of L1s is commonly observed in many cancer cells (Burns, 2017, 2020), in aged mice due to a decline in epigenetic silencing and in SIRT6 knockout mice (De Cecco et al., 2019; Simon et al., 2019). Moreover, as mentioned earlier, reprogramming during embryogenesis reactivates silenced L1s, which peaks between the 2- and 16-cell stages (Jachowicz et al., 2017; Percharde et al., 2018). Since L1 expression is inevitable, the host employs extra layers of defenses against the reactivated retrotransposons, which are discussed in the next sections.
Post-transcriptional (RNA) regulation
Small silencing RNA pathways including PIWI-interacting RNAs (piRNAs), siRNAs and microRNAs (miRNAs) (Fig. 5) destabilize and silence a variety of TEs; in particular, the piRNA pathway is thought to have evolved to limit retrotransposon expansion (reviewed by Ghildiyal and Zamore, 2009; Roberts et al., 2014; Ozata et al., 2019). The expression of piRNAs is critical for silencing L1s in the germline and during epigenetic reprogramming, when a surge of retrotransposon expression could detrimentally affect genome integrity, as shown in mice (Soper et al., 2008; De Fazio et al., 2011; Pezic et al., 2014; Malki et al., 2019), golden hamsters (Hasuwa et al., 2021; Zhang et al., 2021) and humans (Marchetto et al., 2013). Outside the germline, piRNA function is not fully understood (Ross et al., 2014; Perera et al., 2019); siRNAs and miRNAs are seemingly more important in somatic cells. Since the L1 promoter is bidirectional, it could generate an endogenous dsRNA to form L1-targeting siRNA to silence L1 expression (Yang and Kazazian, 2006; Chen et al., 2012). Finally, the miRNA molecules miR-128 and let-7 inhibit L1 retrotransposition. miR-128 inhibits L1 through its direct binding to L1 RNA (Hamdorf et al., 2015) as well as the repression of transportin 1 (TNPO1), which may assist the nuclear import of L1 RNPs (Idica et al., 2017). let-7 also binds to the ORF2 coding region of L1 RNA and impairs its translation (Tristán-Ramos et al., 2020).
It has been reported that L1 RNA levels are markedly reduced by other host factors including hnRNPL (Peddigari et al., 2013), Microprocessor Drosha-DGCR8 (Heras et al., 2014) and MOV10 RNA helicase (Arjan-Odedra et al., 2012; Goodier et al., 2012; Li et al., 2013; Warkocki et al., 2018). Although the exact role of hnRNPL in the reduction of L1 RNA levels remains to be elucidated, hnRNPL may reduce transcriptional elongation but promote the use of cryptic splice sites and premature polyadenylation signals in L1 RNA (Peddigari et al., 2013). Drosha-DGCR8, an miRNA-processing complex, also reduces the steady-state level of L1 RNA by an miRNA-independent unknown mechanism (Heras et al., 2014), while MOV10 was shown to impede L1 RNP formation, reducing the amount of both L1 ORF1p and L1 RNA (Goodier et al., 2012; Li et al., 2013; Choi et al., 2018).
Post-translational (protein) regulation
Type I interferons (IFNs) are cytokines with broad functions that include combating viral infection (reviewed by Ivashkiv and Donlin, 2014; McNab et al., 2015). L1 is suggested to have an antagonistic relationship with type I IFNs, reminiscent of the virus–host interaction: treating cells with type I IFNs suppresses L1 retrotransposition, while knockdown of the IFN receptor IFNAR allows L1s to efficiently mobilize (Goodier et al., 2015; Yu et al., 2015). The IFN-mediated L1 retrotransposition inhibition is most likely mediated by interferon-stimulated genes (ISGs). This section focuses on the ISGs that are reported to inhibit L1 retrotransposition (Fig. 5, ISG proteins), some of which have been introduced earlier (see also “L1 retrotransposition cycle and host factors that facilitate L1 mobilization”).
TREX1, mutations in which were found in patients with a congenital autoimmune disease, Aicardi–Goutières syndrome (AGS), is an exonuclease domain-containing protein that degrades cytoplasmic ssDNA generated from retrotransposons including L1s (Stetson et al., 2008). A specific TREX1 knockout in neurons resulted in a marked increase of L1 ssDNA that led to a type I IFN response (Thomas et al., 2017). Besides degrading cytoplasmic ssDNA, TREX1 is also able to reduce ORF1p levels independently of its exonuclease activity (Li et al., 2017).
SAMHD1 is another AGS-linked protein that inhibits retrovirus infection through dNTP depletion by its triphosphohydrolase activity (Goldstone et al., 2011). However, the triphosphohydrolase activity of SAMHD1 is not involved in L1 retrotransposition inhibition (Zhao et al., 2013); rather, SAMHD1 may inhibit L1 retrotransposition through blocking ORF2p RT activity (Zhao et al., 2013), and/or through promoting stress granule formation (Hu et al., 2015). At variance with these reports, Herrmann et al. (2018) suggested that SAMHD1 affects neither ORF2p RT nor stress granule formation. Thus, a comprehensive mechanism of action of SAMHD1 against L1 retrotransposition requires further investigation. The regulation of L1s by RNase H2A and RNase H2B (also AGS-associated proteins) is likewise controversial, as contrary results were reported: on the one hand, these RNases may assist L1 retrotransposition through degradation of L1 RNA after reverse transcription, releasing the first-strand L1 cDNA from a DNA/RNA hybrid for second-strand cDNA synthesis (Bartsch et al., 2017; Benitez-Guijarro et al., 2018); on the other hand, the same RNases may impede L1s by associating with the potent L1 inhibitor MOV10 (Choi et al., 2018).
ADAR1, yet another AGS-linked protein, catalyzes the deamination of adenosine to produce inosine in dsRNA (George et al., 2014) and inhibits L1 retrotransposition (Orecchini et al., 2017); however, as the catalytic activity is dispensable for the L1 inhibition and ADAR1 does not alter the amount of either L1 RNA or ORF1p, the mechanism remains obscure (Orecchini et al., 2017). The same research group recently reported that ADAR2 also inhibits L1 retrotransposition independent of its deaminase activity (Frassinelli et al., 2021). In 2015, two independent papers reported that the zinc finger antiviral protein ZC3HAV1 (ZAP) inhibits L1 retrotransposition (Goodier et al., 2015; Moldovan and Moran, 2015). ZAP colocalizes with and destabilizes L1 RNA and ORF1p in cytoplasmic foci, suggesting that it directly regulates L1 RNP stability (Moldovan and Moran, 2015).
The 2′,5′-oligoadenylate synthetase (OAS)/RNase L antiviral activity begins with detection of dsRNA by OAS, promoting the formation of an RNase L dimer, which in turn cleaves single-stranded regions of viral RNA (Dong and Silverman, 1995). The same mode of action was shown to be used in L1 RNA degradation and L1 retrotransposition inhibition (Zhang et al., 2014), suggesting that L1s form dsRNAs that trigger this antiviral system. MOV10 RNA helicase is known to be one of the most potent L1 retrotransposition inhibitors (Arjan-Odedra et al., 2012; Goodier et al., 2012). This may be due to two modes of L1 inhibition, through L1 RNA destabilization and ORF2p RT inhibition. In conjunction with MOV10, the destabilization of L1 RNA is mediated by TUT4-dependent uridylation and perhaps their co-localization in cytoplasmic foci, while TUT7-dependent uridylation of L1 RNA inhibits ORF2p RT activity (Warkocki et al., 2018).
The activation-induced cytidine deaminase (AID)/APOBEC family proteins are cytidine deaminases that function in C-to-U deamination on nascent ssDNA produced during viral infections (Harris et al., 2003; Vieira and Soares, 2013). These proteins have a wide range of inhibitory effects against not only exogenous retroviruses but also retrotransposons (Turelli et al., 2004; Bogerd et al., 2006a, 2006b; Chen et al., 2006; Chiu et al., 2006; Muckenfuss et al., 2006; Stenglein and Harris, 2006; Hulme et al., 2007; Kinomoto et al., 2007; Niewiadomska et al., 2007; MacDuff et al., 2009; Ikeda et al., 2011; Wissing et al., 2011; Metzner et al., 2012; Lindič et al., 2013; Richardson et al., 2014). AID blocks translation of L1 proteins by its direct binding to L1 RNA, thereby inhibiting L1 retrotransposition (MacDuff et al., 2009; Metzner et al., 2012). In the APOBEC3 family, APOBEC3A and APOBEC3B strongly reduce both L1 and Alu retrotransposition efficiency (Bogerd et al., 2006b; Chen et al., 2006; Muckenfuss et al., 2006; Stenglein and Harris, 2006; Niewiadomska et al., 2007; Wissing et al., 2011; Richardson et al., 2014). APOBEC3C and APOBEC3F also inhibit L1 retrotransposition, albeit less potently than APOBEC3A and APOBEC3B (Bogerd et al., 2006b; Muckenfuss et al., 2006; Niewiadomska et al., 2007); in contrast, APOBEC3G and APOBEC3H did not cause any reduction in L1 retrotransposition efficiency, but instead inhibited Alu retrotransposition (Turelli et al., 2004; Chiu et al., 2006; Hulme et al., 2007). To further complicate the picture, APOBEC3B, APOBEC3C and APOBEC3F inhibit L1 through a deamination-independent mechanism (Stenglein and Harris, 2006; Horn et al., 2014); a similar mechanism was also found in AID (MacDuff et al., 2009) and APOBEC1 (Ikeda et al., 2011). Only APOBEC3A inhibits L1 retrotransposition by deamination, which is described in the next section. 
Although the exact mechanism of APOBEC3B, APOBEC3C and APOBEC3F remains to be elucidated, co-localization of APOBEC3s with retroviruses and retrotransposons in stress granules or processing bodies was suggested to also sequester L1 RNPs and/or mediate L1 RNP degradation (Wichroski et al., 2006; Gallois-Montbrun et al., 2007; Horn et al., 2014; Goodier, 2016). In addition, APOBEC3C binding to L1 RNPs may inhibit ORF2p reverse transcription (Horn et al., 2014).
Several distinct pathways have also been suggested to restrict L1 activities. Autophagy targets retrotransposon RNAs and/or RNPs for degradation and limits L1 retrotransposition, as autophagy-related gene 5 (ATG5) depletion by siRNA or treatment with the autophagy inhibitor bafilomycin reduces L1 RNA decay and increases L1 retrotransposition (Guo et al., 2014). Reduction of L1 RNA by autophagy may involve the accumulation of L1 RNPs in stress granules, which could be cleared in autophagosomes (Fig. 5) (Buchan et al., 2013). Condensin II, an essential component of the chromosome architecture during mitosis, forms a complex with GAIT, which suppresses translation of L1 proteins through its binding to the L1 3′ UTR (Ward et al., 2017). This was surprising, since condensin II in this case functions in the cytoplasm instead of the nucleus, where it plays its canonical role (Green et al., 2012; Hirano, 2016). Finally, testis-expressed protein 19.1 (TEX19.1) inhibits L1 retrotransposition in the germline by recruiting the E3 ubiquitin ligase UBR2 to ubiquitinate L1 ORF1p for proteasomal degradation (MacLennan et al., 2017).
TPRT intermediates
As described in the previous section, among the APOBEC family proteins, only APOBEC3A was shown to inhibit L1 retrotransposition by deamination (Fig. 5). Although no deaminated signature was found on de novo L1 insertion sequences (Muckenfuss et al., 2006), the deamination of C-to-U on the L1s was finally observed by Richardson et al. (2014) using RNase H in vitro or a uracil DNA glycosylase-deficient cell line in vivo, suggesting that APOBEC3A acts on ss L1 cDNAs where the resultant deaminated strand is immediately cleaved by a cellular nuclease during TPRT.
While several DNA repair factors facilitate L1 retrotransposition (see “L1 retrotransposition cycle and host factors that facilitate L1 mobilization”), it should be noted that the host also employs several DNA repair factors to inhibit L1 retrotransposition. The ERCC1/XPF nuclease complex, essential for the nucleotide excision repair pathway, inhibits L1 retrotransposition, possibly by its cleavage activity on L1 cDNA in TPRT intermediates (Fig. 5) (Gasior and Deininger, 2008; Servant et al., 2017). In some instances, the TPRT intermediates can exhibit conflicts with DNA replication forks where FA factors including BRCA1, which are key regulators in HR, inhibit L1 cDNA insertion (Ardeljan et al., 2020; Mita et al., 2020). Interestingly, a lack of the FA proteins leads to synthetic lethality when L1 is overexpressed in telomerase-immortalized retinal pigment epithelium-1 (RPE) cells (Ardeljan et al., 2020). These data raise the possibility that the FA pathway is vital to maintain cellular fitness when DNA replication forks encounter excessive L1 retrotransposition intermediates, and failure of this mechanism may lead to genome instability in FA-deficient tumors. Finally, it is noteworthy that BRCA1 can bind to L1 RNA and reduce ORF2p translation, implying a DNA repair-independent role for this protein in the cytoplasm (Mita et al., 2020).
Even with the transcriptional repression of L1s by the host, some L1s can still escape epigenetic silencing, leading to relatively high expression under certain circumstances. These include: (1) global DNA hypomethylation that usually occurs with aging (reviewed by Unnikrishnan et al., 2018); (2) local DNA hypomethylation and/or loss of histone repressive marks in the L1 promoter regions, which are typically found in cancer cells (Thayer et al., 1993; Alves et al., 1996; Shukla et al., 2013; Tubio et al., 2014; Scott et al., 2016; Burns, 2017, 2020); and (3) deregulation or mutations of L1 inhibitors, described in previous sections. Since L1 retrotransposition generates de novo insertions of L1 cDNA sequences into the genome, it could be harmful to genomic integrity, especially in the case of exonic insertions. The iconic finding of an L1 retrotransposition-mediated genetic mutation was reported by Kazazian et al. in 1988, where the L1 sequence was inserted into an exon of the blood coagulation factor gene Factor VIII in two patients with hemophilia A (Kazazian et al., 1988). In 1992, Miki et al. first identified a cancer driver mutation resulting from an L1 insertion in a tumor suppressor gene, APC, in colon cancer cells (Miki et al., 1992). Since then, more than 100 cases of mutagenic L1, Alu, SVA and processed pseudogene insertions associated with human diseases have been reported, and it is now established that de novo L1 insertions can drive or contribute to sporadic cases of genetic diseases, which has been extensively reviewed (Hancks and Kazazian, 2016; Burns, 2017, 2020; Kazazian and Moran, 2017; Terry and Devine, 2020). The obvious explanation is that exonic insertions cause indel and frameshift mutations, while L1 insertions within introns and untranslated regions may also affect RNA stability and splicing (e.g., exon skipping and intron inclusion) (Beck et al., 2011; Hancks and Kazazian, 2016).
However, genomic alterations caused by L1s may be underappreciated. As discussed in “Genomic alteration by L1 retrotransposition”, a recent genome analysis based on the Pan-Cancer Analysis of Whole Genomes (PCAWG) project revealed that L1 insertions potentially contribute to chromosomal rearrangements, including duplications, inversions, translocations and 3′ transductions, in various types of cancer cells (Rodriguez-Martin et al., 2020). Moreover, high L1 expression is a hallmark of human cancer, and approximately half of all cancers have somatic L1 retrotransposition, suggesting that L1-mediated genomic alterations are more common in cancer cells than was previously assumed (Tubio et al., 2014; Rodriguez-Martin et al., 2020). Due to these features, L1 ORF1p in particular has been proposed as a biomarker for cancer diagnosis (Rodić et al., 2014; Ardeljan et al., 2020; Cohen et al., 2020).
L1 retrotransposition is cytotoxic, which is potentially due to ORF2p EN activity generating DSBs that can be visualized by γ-H2AX signals (Gasior et al., 2006; Miyoshi et al., 2019), and replication stress (Flasch et al., 2019; Ardeljan et al., 2020; Mita et al., 2020). In addition to DNA damage, L1 intermediates can also induce inflammatory cytokines (discussed in the next section); as a case in point, knockout of a type I IFN receptor (IFNAR) reduced L1-mediated cell toxicity (Ardeljan et al., 2020). L1-mediated DNA damage may lead to apoptosis (Belgnaoui et al., 2006), senescence (Wallace et al., 2008) and cell cycle arrest, due to replication stress overwhelming replication-coupled DNA repair factors (e.g., FA proteins). As a major guardian of genomic integrity, p53 plays a central role in the cellular response to L1-mediated genetic damage and rearrangements; not surprisingly, p53 depletion permits cell growth and L1 retrotransposition (Ardeljan et al., 2020).
Beyond retrotransposition: retrotransposons and immune responses
Nucleic acid sensing employed in the innate immune system is one of the predominant pathways of virus detection in the cytoplasm. The host has evolved several receptors that are able to differentiate between self and non-self nucleic acids (Janeway, 1992) or, alternatively, to detect danger signals that are associated with cellular damage (Matzinger, 2002). These receptors are referred to as pattern recognition receptors (PRRs) and are generally categorized as DNA or RNA sensors. Once a PRR recognizes anomalous cytoplasmic nucleic acids, a downstream signaling cascade will be activated to induce the expression of cytokines including type I IFN, depending on the particular pathway, which has been extensively reviewed elsewhere (Motwani et al., 2019; Hopfner and Hornung, 2020; Rehwinkel and Gack, 2020; Onomoto et al., 2021). Subsequently, detection of the cytokines by their respective receptors (e.g., IFNAR in the case of type I IFN) will lead to the upregulation of ISGs, collectively also known as IFN signatures, which function to inhibit viral replication. Although the type I IFN response is vital for host protection against pathogens, aberrant chronic and/or episodic activation of type I IFN is known as a hallmark in many autoimmune diseases such as AGS, systemic lupus erythematosus (SLE) and Sjögren syndrome (SS) (Ivashkiv and Donlin, 2014; Tsokos et al., 2016; Crow et al., 2019; Ukadike and Mustelin, 2021).
The main culprit behind chronic inflammation in autoimmune patients was historically thought to be viruses; however, recent evidence has shifted the view towards endogenous elements, including retrotransposons, as the prime suspects for the cause of persistent inflammation. Most active retrotransposons have been repressed by the host or have accumulated mutations that impair their mobilization; however, retrotransposon intermediates (RNA and/or cDNA) including ERV, L1 and Alu still induce the type I IFN response in humans. How these are detected as non-self or danger elements by PRRs remains to be elucidated. ERV dsRNA as a trigger of the innate immune response has been extensively reviewed by Grandi and Tramontano (2018). In the case of L1, both L1 RNA (Mavragani et al., 2016; Zhao et al., 2018; Tunbak et al., 2020) and the cDNA (Brégnard et al., 2016; De Cecco et al., 2019; Simon et al., 2019; Zhao et al., 2021) produced by ORF2p RT induce the type I IFN response. Treatment with RT inhibitors such as tenofovir or nevirapine ameliorates the inflammatory phenotypes caused by L1 expression, supporting the idea that L1 cDNA induces the immune response and exacerbates inflammation (Fig. 5, cGAS) (Thomas et al., 2017; De Cecco et al., 2019; Simon et al., 2019). In addition, knockdown of cyclic GMP–AMP synthase (cGAS), the cytoplasmic DNA sensor, abrogated immune response activation (Simon et al., 2019; Zhao et al., 2021). However, the question of how L1 cDNA accumulates in the cytoplasm remains unanswered, as it is widely believed that the L1 RNA is reverse transcribed in the nucleus during the TPRT reaction. Although Alu RNA was recently shown to be reverse transcribed in the cytoplasm, resulting in age-related macular degeneration in a retrotransposition-independent manner (Fukuda et al., 2021), there is no conclusive evidence for L1 cytoplasmic reverse transcription to date. 
Instead of cytoplasmic reverse transcription, one hypothesis postulates that L1 cDNA from abortive TPRT is exported into the cytoplasm (Brégnard et al., 2016). The origin of cytoplasmic L1 cDNA remains an important question to be answered.
In addition to cGAS, cytoplasmic RNA sensors, i.e., the RIG-I-like receptors (RLRs), MDA5 and RIG-I, recognize L1 RNA, and thus also induce the type I IFN response (Fig. 5, RLRs) (Zhao et al., 2018; Tunbak et al., 2020). The exact sequences or RNA structure(s) involved in this response are largely unknown; however, it is likely that L1 RNA forms secondary structures such as hairpins or dsRNA that could be sensed by MDA5 and/or RIG-I (Chiappinelli et al., 2015; Cuellar et al., 2017; Tunbak et al., 2020), both of which are activated by dsRNA. Although RIG-I also detects uncapped and unprocessed 5′-phosphate RNA molecules, it is unlikely that RIG-I recognizes L1 RNA through these features as L1 RNA is transcribed by RNA polymerase II and is probably 5′ capped (Swergold, 1990; Becker et al., 1993; Athanikar et al., 2004; Dmitriev et al., 2007). Interestingly, L1 RNA is recognized by TLR7/8 (Mavragani et al., 2016), which are membrane proteins that only sense RNA molecules in endosomes and/or extracellular regions. It is currently unknown whether L1 RNA/RNPs gain access to endosomes and how these TLRs detect L1 RNA, but it may occur upon cell death or via secretion of exosomes/microvesicles, which cause L1 RNA to be released into the extracellular space (Balaj et al., 2011; Kawamura et al., 2019).
Besides L1, Alu RNA was also shown to strongly activate the innate immune response. Ro60, an RNA-binding protein and a common autoantigen in SLE patients, was found to directly bind Alu RNA (Hung et al., 2015). Depletion of Ro60 significantly increases Alu RNA levels and the type I IFN response, suggesting that Ro60 binding to Alu RNA is important to suppress the immune response (Hung et al., 2015). The use of DNA methyltransferase inhibitors such as 5-azacytidine and decitabine induces an innate immune response that is mediated by inverted-repeat Alu (Mehdipour et al., 2020) and ERV dsRNA (Chiappinelli et al., 2015). In addition, depletion of ADAR1 upregulates Alu expression, leading to a stronger immune response. Unlike L1, Alu RNA is one of the major RNA molecules edited by ADAR1, which causes destabilization of Alu RNA to reduce the immune response (Athanasiadis et al., 2004; Levanon et al., 2004; Chung et al., 2018; Mehdipour et al., 2020; Nichols et al., 2021).
Retrotransposon-mediated immune responses may explain the chronic inflammation in autoimmune patients with no history of persistent viral infection. Indeed, retrotransposon detection by PRRs causes type I IFN induction and is readily observed in some autoimmune patients. Intriguingly, all seven genes that are linked to AGS regulate L1 (Crow et al., 2015): TREX1 (Stetson et al., 2008; Li et al., 2017; Thomas et al., 2017), SAMHD1 (Zhao et al., 2013; Hu et al., 2015; White et al., 2016; Herrmann et al., 2018), RNASEH2A, RNASEH2B, RNASEH2C (Benitez-Guijarro et al., 2018; Choi et al., 2018), IFIH1 (MDA5-encoding gene) (Zhao et al., 2018; Tunbak et al., 2020) and ADAR1 (Orecchini et al., 2017). A high IFN signature is a common hallmark in SLE and SS patients and, in parallel, L1 intermediate (RNA and/or DNA) levels are higher in SLE patients in comparison to healthy individuals (Mavragani et al., 2016, 2018). Typical antigens recognized by SLE autoantibodies such as Ro60 and Lupus La are also L1 RNP-interacting proteins (Goodier et al., 2013; Moldovan and Moran, 2015). Recently, an L1 ORF1p autoantibody was detected in SLE patients (Carter et al., 2020; Crow, 2020). Based on these observations, RT inhibitors have been proposed for autoimmune disease therapy to suppress L1-mediated cDNA production (Volkman and Stetson, 2014). However, a caveat to this approach is that since L1 RNA can also trigger the innate immune response, RT inhibitors may not completely eliminate the retrotransposon-mediated immune responses. Instead, an alternative drug or drug combination may be required for more efficient treatment.
These retrotransposon-mediated immune responses are detrimental to the host due to a persistent activation of the innate immune response; however, an acute upregulation may be beneficial in tumor elimination. Epigenetic therapy for cancer treatment activates HERV, Alu and L1 to elicit a cytotoxic effect that leads to cell death (Chiappinelli et al., 2015; Roulois et al., 2015; Jones et al., 2019). Upregulation of L1 and SINE by SETDB1 or FBXO44 knockdown (Griffin et al., 2021; Shen et al., 2021), or DNA methyltransferase inhibitor treatment (Mehdipour et al., 2020), induces the innate immune response and secretion of inflammatory cytokines that reactivate exhausted T cells, followed by enhanced immune infiltration to eliminate cancer cells (Mehdipour et al., 2020; Griffin et al., 2021; Shen et al., 2021). A combination of epigenetic therapy with knockdown of L1 and/or Alu repressors such as SETDB1 (Cuellar et al., 2017; Jones et al., 2019) or ADAR1 (Mehdipour et al., 2020) has been suggested to increase the efficacy of cancer treatments. The identification of new host factors that strongly inhibit retrotransposons will yield more candidates that could be used in combination with epigenetic and/or immune-based cancer therapies, considering the importance of the innate immune response in T cell maturation. Of note, L1 may also inhibit acute myeloid leukemia (AML), as knockdown of the HUSH complex component MPP8 allowed a marked increase of L1 transcripts, thereby substantially reducing AML cell viability (Gu et al., 2021). The DNA damage response may be involved in inducing cell cycle arrest, since the levels of γ-H2AX, a DSB marker, and p21, a cyclin-dependent kinase inhibitor, increased in MPP8-depleted or L1-activated MLL-AF9-transformed leukemia cells. In addition to the DNA damage response, it is possible that cell death occurs through the IFN response elicited by L1 RNAs (Cuellar et al., 2017). 
However, because relying on retrotransposons to induce cancer cell death may do more harm than good to the patient owing to genomic instability and inflammatory phenotypes, further careful studies are needed to weigh the benefits of exploiting retrotransposons for cancer treatment.
From originally being dismissed as “junk” DNA, the focus of TE studies has progressed into an era of discovering the physiological roles that retrotransposons play in our cells, which impact our development and health. Historically, it has been difficult to study retrotransposons due to their repetitive nature; however, the development of new tools such as long read sequencing and big data analysis makes it easier and faster to map the expressed retrotransposons to their individual loci, find intact ORFs, and study the impacts of retrotransposon expression on cellular homeostasis. Isolation of highly expressed retrotransposons such as “hot” L1s in humans has accelerated the understanding of L1 retrotransposition mechanisms as well as the discovery of L1-associated host factors using L1 expression-permissive cell lines. Most L1s are transcriptionally repressed in somatic cells by epigenetic silencing, but some cells still express L1s, and L1 expression is intriguingly indispensable for normal embryonic progression. However, since high L1 expression is implicated in a plethora of diseases including genetic disorders, cancer and inflammatory phenotypes through both retrotransposition-dependent and -independent pathways, the view of L1 and other retrotransposons as parasitic elements may be upheld. In addition, as described in this review, a large number of host factors that restrict L1 activities at different stages of the replication cycle have been reported, suggesting that unregulated L1 expression is detrimental to the host. However, at the same time, several host factors assist L1 retrotransposition. With these diverse host regulators of L1s in mind, we leave readers with some outstanding questions.
Outstanding questions
1. L1s have been interspersed in the host genome through millions of years of evolution; however, some L1s remain active in humans and other species. Despite their potential to cause harmful effects on the host, is there any evolutionary advantage of having a small subset of active L1 copies?
2. Although multiple layers of host factors limit L1 activities (e.g., transcription, RNA stability, translation and genomic integration), some host factors facilitate L1 retrotransposition. How have L1s recruited and exploited host systems to increase their copy number? Is this evidence of a TE–host arms race? Or could L1 retrotransposition be permitted in order to increase intra-individual genetic variation, such as in neural progenitor cells?
3. L1s and other retrotransposons induce the innate immune response, which is implicated in autoimmune phenotypes. However, could retrotransposon expression act in immune training or function to maintain a basal level of IFNs for immune defense? Since L1s are interspersed, could they also function as a surveillance system to detect aberrant states in the chromatin structure, such as unregulated loss of repressive heterochromatin? Loss of heterochromatin during tumor development may simultaneously cause aberrant expression of genes and their neighboring L1(s), ERV(s) or Alu(s), which in turn might induce an immune response as a danger signal to eliminate potential cancerous cells.
The authors declare no competing interests.
We thank Drs. F. Ishikawa, J. A. Hejna and S. R. Richardson for discussions and critical reading of the manuscript. A. L.-F. was supported by JASSO and MEXT Scholarships. T. M. was supported by JSPS KAKENHI (Grant Numbers 21K19219 and 22H02600), ISHIZUE 2021 of Kyoto University Research Development Programs, and research grants from the Takeda Science Foundation, the Sumitomo Foundation for Basic Science Research Projects and the Astellas Foundation for Research on Metabolic Disorders.