Genes & Genetic Systems
Online ISSN : 1880-5779
Print ISSN : 1341-7568
ISSN-L : 1341-7568
Special reviews
Retrotransposon-derived transcripts and their functions in immunity and disease
Mahoko Takahashi Ueda
JOURNAL OPEN ACCESS FULL-TEXT HTML

2023 Volume 98 Issue 6 Pages 305-319

ABSTRACT

Retrotransposons, which account for approximately 42% of the human genome, have been increasingly recognized as “non-self” pathogen-associated molecular patterns (PAMPs) due to their virus-like sequences. In abnormal conditions such as cancer and viral infections, retrotransposons that are aberrantly expressed due to impaired epigenetic suppression display PAMPs, leading to their recognition by pattern recognition receptors (PRRs) of the innate immune system and triggering inflammation. This viral mimicry mechanism has been observed in various human diseases, including aging and autoimmune disorders. However, recent evidence suggests that retrotransposons possess highly regulated immune reactivity and play important roles in the development and function of the immune system. In this review, I discuss a wide range of retrotransposon-derived transcripts, their role as targets in immune recognition, and the diseases associated with retrotransposon activity. Furthermore, I explore the implications of chimeric transcripts formed between retrotransposons and known gene mRNAs, which have been previously underestimated, for the increase of immune-related gene isoforms and their influence on immune function. Retrotransposon-derived transcripts have profound and multifaceted effects on immune system function. The aim of this comprehensive review is to provide a better understanding of the complex relationship between retrotransposon transcripts and immune defense.

INTRODUCTION

The human genome, once considered a static blueprint, is now recognized as a dynamic landscape that is continuously shaped by evolutionary forces. Within this vast genome, retrotransposons serve as hidden treasures of genetic elements, primarily composed of diverse families such as endogenous retroviruses (ERVs), long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). These retrotransposons, remnants of mobile DNA elements integrated into our genome over long periods of time, were previously regarded as “genomic parasites” (Smit, 1999; International Human Genome Sequencing Consortium, 2001). However, they have now emerged as essential components that are capable of influencing the functions of our immune system. Retrotransposons have played dual roles as drivers and participants in the evolutionary advancement of our immune system (Göke and Ng, 2016; Grandi and Tramontano, 2018b; Ferrari et al., 2021; Russ and Iordanskiy, 2023). Certain sequences within retrotransposons possess functions that combat pathogens, fine-tune immune responses, and maintain the delicate balance between defense and tolerance, thereby contributing to the acquisition of novel mechanisms within our immune system (Göke and Ng, 2016). Moreover, accumulating evidence implicates the dysregulation of retrotransposons and their derived genes in disease processes, disrupting normal immune function and contributing to the development of various conditions such as cancer, autoimmune diseases and neurological disorders (e.g., reviewed by Grandi and Tramontano, 2018a; Tam et al., 2019; Liu et al., 2020). However, recent research has revealed that retrotransposons harbor highly regulated immune responsivity, playing critical roles in immune system development and function, and profoundly impacting immune surveillance of cancer, as well as resistance and response to infections (Goodier, 2016; Jansz and Faulkner, 2021).

Therefore, in this review, I focus on the relationship between retrotransposon-derived transcripts and the immune system. The impact of retrotransposon-derived transcripts on immune functions is extensive and multifaceted. These transcripts actively participate in the regulation of immune responses, modulating inflammation, immune cell differentiation and immune cell activation (e.g., Mustelin and Ukadike, 2020). By summarizing the structural features, life cycle, molecular mechanisms and implications for disease development of retrotransposon transcripts, I aim to provide a comprehensive picture of the complex interplay among retrotransposons, their transcripts and our immune defense.

RETROTRANSPOSON STRUCTURE AND LIFE CYCLE

The immune system is a remarkable defense mechanism that protects the body from invading pathogens and maintains overall health. Throughout evolution, the immune system has undergone significant adaptations to effectively combat a wide range of microbial challenges. Among the contributors to the complexity and functional diversity of the immune system are transcripts derived from retrotransposons, mainly ERVs, LINEs and SINEs (Göke and Ng, 2016; Grandi and Tramontano, 2018b; Ferrari et al., 2021; Russ and Iordanskiy, 2023). These elements fall under two broad categories: long terminal repeat (LTR) and non-LTR retrotransposons. While this review will explore both LTR and non-LTR retrotransposons, special attention will be given to ERVs as representatives of LTR retrotransposons, and to LINEs and SINEs, including Alu elements, within the non-LTR category. These elements contain sequences derived from virus-like proteins (Nakagawa and Takahashi, 2016) and have played important roles in the formation of the immune system through conjugation and functional diversification processes. While all retrotransposon groups were active in the early stages of evolution, their activities are now limited to specific types of retrotransposons due to their selfish and highly mutagenic nature. Recent studies have revealed that retrotransposons function as a source of novel genetic material and contribute to the evolution of various biological processes, including immunity (Grandi and Tramontano, 2018a; Ferrari et al., 2021).

LTR retrotransposons

LTR retrotransposons are characterized by their LTRs, the identical or nearly identical sequences that flank the element at both ends. These LTRs serve as control regions for transcription initiation and termination. This review will focus on ERVs as a representative subset of LTR retrotransposons, due to their prevalence and impact on human health. ERVs, which make up approximately 8% of the human genome, consist of genetic sequences derived from exogenous viruses. They typically exhibit a retroviral genome structure, comprising up to four open reading frames (ORFs) for genes and LTRs as gene control regions (Fig. 1A). These ORFs include gag, which codes for the essential capsid protein for retroviral replication, pro and pol, which encode enzymatic activities, and env, which encodes the envelope glycoprotein that mediates viral entry. Many mammalian ERVs have lost the ORF for env, which is essential for infection of new target cells. When ERVs lose the env gene, they often show heightened retrotransposition activity, leading to a substantial increase in their proliferation within the genome — approximately a 30-fold boost (Magiorkinis et al., 2012). The IAPs (intracisternal A-particles) in mice serve as a prime example of this phenomenon. The majority of IAP sequences have lost env, which has contributed to their widespread distribution and proliferation within the mouse genome (Ribet, 2008).

Fig. 1. Retrotransposon structures. Schematic structures of (A) an LTR retrotransposon, and (B) non-LTR retrotransposons. Each retrotransposon, upon integration into the genome, creates a duplicated target site (gray arrowheads) and incorporates a poly(A) tail. EN: endonuclease; RT: reverse transcriptase; PAS: polyadenylation signal. Autonomous transcription is driven by Pol II or Pol III and terminates at the PAS. In (B), the red dashed box in the 5′ UTR of LINE highlights ORF0, which is transcribed in the antisense orientation and is specific to primate LINE-1.

ERVs are generally transcribed from a Pol II promoter within the LTR, and their polyprotein products are subsequently cleaved by proteases encoded by the pro gene (Fig. 1A). Reverse transcription occurs when reverse transcriptase binds to a specific region called the primer binding site in the ERV RNA, and transfer RNA (tRNA) molecules function as primers for reverse transcription (Wilhelm and Wilhelm, 2001). The resulting cDNA product then associates with the enzyme integrase, encoded by pol, and is incorporated into the host genome through a process mechanistically similar to the cut-and-paste transposition catalyzed by DNA transposases (Curcio and Derbyshire, 2003; Hickman and Dyda, 2016).

In most cases, the ability of ERVs to replicate and produce virus particles has been lost due to accumulated mutations and indels during evolution. However, certain species still harbor ERV sequences that retain functional ORFs and replication capacity. For example, the mouse genome contains numerous active ERVs, including those derived from mouse mammary tumor virus (MMTV) and murine leukemia virus (MLV), which can infect as both endogenous and exogenous retroviruses (Stocking and Kozak, 2008). In contrast, the human genome harbors evolutionarily ancient ERVs that have integrated over millions of years, but currently lacks observed instances of intact ERVs that can be expressed as infectious virions.

Non-LTR retrotransposons

A significant portion of the genome is composed of non-LTR retrotransposons, a type of retrotransposon that lacks LTRs. Among these non-LTR retroelements, LINEs represent the most abundant group in the human genome, accounting for approximately 20% (International Human Genome Sequencing Consortium, 2001). Notably, LINE-1 (or L1) is the only autonomously active transposon still present in humans, being capable of mobilizing itself to new genomic locations through retrotransposition. Although there are around 500,000 copies of LINE-1 in the human genome, only a small fraction, approximately 80–100 copies, retain their mobility (Brouha et al., 2003; Beck et al., 2010; Orecchini et al., 2017). A complete LINE-1 element consists of a promoter located in the 5′ untranslated region (UTR), the ORF1 gene encoding an RNA-binding protein, the ORF2 gene with endonuclease and reverse transcriptase activities, and the 3′ UTR that provides a poly(A) tail (Goodier, 2016) (Fig. 1B, top). While the precise function of the ORF1 protein in LINE-1 elements is not fully elucidated, one study has indicated that it forms oligomers and is involved in the recognition and transport of template RNA into the nucleus (Richardson et al., 2015).

Once in the nucleus, the transported RNA utilizes a mechanism called target-primed reverse transcription (TPRT), which is facilitated by the endonuclease and reverse transcriptase activities of the ORF2 protein (Luan et al., 1993; Moran et al., 1996; Richardson et al., 2015). TPRT initiation in LINE-1 occurs through an endonuclease-mediated single-strand nick at the typical 5′-TT/AAAA-3′ site. Additionally, the ORF2 protein can bind not only to LINE-1 mRNA but also to other retrotransposons and unrelated RNA molecules. Notably, the retrotransposition of non-autonomous retrotransposons such as SINEs, which do not encode proteins themselves, relies on the presence of LINE-1 ORF2 (Dewannieux et al., 2003; Richardson et al., 2015).
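As a rough illustration of the nick-site preference described above, the following minimal sketch scans a DNA sequence on both strands for exact matches to 5′-TT/AAAA-3′ and reports where the predicted single-strand nick would fall. The function names, the exact-match rule and the coordinate convention are assumptions made for illustration only; they are not taken from the cited studies, and an exact-match scan will miss any degenerate sites.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_candidate_nick_sites(seq: str, motif: str = "TTAAAA", cut_offset: int = 2):
    """List (strand, boundary) pairs for exact matches to `motif`.

    `boundary` is the number of forward-strand bases lying to the left of the
    predicted nick, so a plus-strand match TT/AAAA starting at index i gives
    boundary i + 2. Minus-strand matches are converted to the same
    forward-strand coordinate system.
    """
    seq = seq.upper()
    sites = []
    for strand, target in (("+", seq), ("-", reverse_complement(seq))):
        i = target.find(motif)
        while i != -1:  # step by one base so overlapping matches are kept
            cut = i + cut_offset
            sites.append((strand, cut if strand == "+" else len(seq) - cut))
            i = target.find(motif, i + 1)
    return sorted(sites)


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real genomic locus.
    print(find_candidate_nick_sites("GGTTAAAACC"))
```

Reporting both strands in one coordinate system keeps the output comparable with annotation tracks, which are usually given on the forward strand.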

In addition to the elements mentioned earlier, it is important to note that primate LINE-1 exhibits some unique characteristics. The LINE-1 5′ UTR contains a primate-specific ORF0 in the antisense orientation (Denli et al., 2015) (Fig. 1B, top). ORF0 possesses its own promoter and is distinguished by a highly conserved strong Kozak sequence that facilitates translation initiation. ORF0 enhances LINE-1 mobility, potentially contributing to retrotransposon-mediated genetic diversity in primate genomes.

One group of non-autonomous retrotransposons is SINEs. In mammals, SINEs are primarily derived from the reverse transcription of tRNA molecules (Okada, 1991). They possess a complex structure consisting of a 5′ head (or left monomer) with diverse origins, a central body (or right monomer) with various origins, and a 3′ tail that is often associated with the 3′ end of LINEs (Kramerov and Vassetzky, 2011) (Fig. 1B, bottom). The 5′ base sequence of SINEs plays a crucial role in determining their origin and classification into subfamilies (Bao et al., 2015). Upstream sequences of SINEs can regulate their transcription (Chesnokov and Schmid, 1996; Tatosyan et al., 2020), and an RNA polymerase (Pol) III promoter is present within the sequence (Deininger, 2011). Among the subfamilies of SINEs, Alus are exceptionally abundant in the human genome, accounting for approximately 11% with over one million copies (International Human Genome Sequencing Consortium, 2001). They are derived from the 7SL RNA gene and are specific to primates.

DIVERSE TRANSCRIPTS DERIVED FROM RETROTRANSPOSONS

Transcription of retrotransposons involves different mechanisms depending on the type of element. ERVs utilize an internal RNA Pol II promoter located within the LTR, while LINEs have a promoter in their 5′ UTR (Fig. 1A and 1B, top). For ERVs, transcription terminates at the polyadenylation signal within the 3′ LTR (U3 or R segment). For LINEs, termination typically occurs within the 3′ UTR, but it is prone to read-through transcription (for more details on read-through transcription, see the discussion later in this section) (Honigman et al., 1985; Dombroski et al., 1991).

SINEs initiate transcription through an internal Pol III promoter (Fig. 1B, bottom) and typically terminate transcription upon encountering a specific run of T residues in the DNA template (A residues [An] in the mRNA). In addition to this, SINEs can also be transcribed by Pol II when located within gene introns. When transcribed by RNA Pol II, transcription often terminates due to specific motifs in the flanking sequences (Conti et al., 2014). The transcribed SINEs, particularly Alu elements, contribute to double-stranded RNA (dsRNA) formation in human cells. They can uniquely form dsRNA structures by themselves, mainly due to their ability to create hairpins or open dsRNA hybrids (Kawahara and Nishikura, 2006; Bazak et al., 2014; Sadeq et al., 2021). Interestingly, the activity of Pol III is increased under stress conditions, such as viral infection and heat shock (Berger and Strub, 2011).

While I have focused so far on dsRNA formation specifically from SINEs, it is important to note that diverse transcripts of other retrotransposons can also give rise to dsRNA through various mechanisms. One such mechanism involves the complementarity of sense and antisense transcripts (for a comprehensive review, see Sadeq et al., 2021). These dsRNAs can be byproducts of convergent transcription, where both strands of DNA are transcribed in opposite directions (Fig. 2A, top). In the case of ERVs, although they are less commonly associated with dsRNA, bidirectional LTR promoters can facilitate dsRNA formation (Domansky et al., 2000; Chiappinelli et al., 2015). The other mechanism by which retrotransposons form dsRNA is not fully understood, but it is hypothesized that retrotransposons on separate, nearby transcripts can hybridize with each other to form dsRNA (e.g., Kim et al., 2019). Additionally, neighboring retrotransposons oriented in opposite directions in a transcript may form hairpin structures (Fig. 2A, bottom). These mechanisms collectively contribute to the diversity of dsRNA transcripts.
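The idea that two retrotransposon copies in opposite orientations can base-pair, whether on one transcript (hairpin) or on two separate transcripts, can be made concrete with a small sketch. The function below scores how well two equal-length RNA segments pair when aligned antiparallel. It is a deliberately simplified, hypothetical helper: it counts only Watson-Crick pairs, ignores G-U wobble pairs, gaps and thermodynamics, and is not a substitute for dedicated RNA folding software.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


def pairing_fraction(left: str, right: str) -> float:
    """Fraction of positions at which `left` pairs with `right`.

    The two segments are aligned antiparallel without gaps, as they would be
    in the stem of a hairpin or in an intermolecular duplex.
    """
    if not left or len(left) != len(right):
        raise ValueError("segments must be non-empty and of equal length")
    target = reverse_complement_rna(right)
    matches = sum(a == b for a, b in zip(left.upper(), target))
    return matches / len(left)


if __name__ == "__main__":
    # Hypothetical 8-nt segments; real inverted repeats are far longer.
    sense = "GGCUAAGC"
    inverted_copy = reverse_complement_rna(sense)
    print(pairing_fraction(sense, inverted_copy))
```

A perfectly inverted copy scores 1.0, and each mismatch lowers the score by one over the segment length, which gives a crude proxy for how diverged two copies can be before a stable duplex becomes unlikely.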

Fig. 2. Diverse transcripts of retrotransposons. (A) Generation of dsRNA. (Upper) dsRNA produced through convergent transcription and typical read-through transcript (chimeric or fusion transcript) of LINEs. (Lower) dsRNA can also be formed by adjacent retrotransposons oriented in opposite directions, applicable to all types of retrotransposons including ERVs, LINEs and SINEs. (B) Various chimeric transcripts between retrotransposons and neighboring genes/lncRNAs. Dashed lines represent spliced introns. (Top) Chimeric transcripts generated via LTRs with a splicing event. Note that the coding sequence of the ERV is marked for emphasis but does not necessarily represent a full-length/intact ERV. (Middle) When retrotransposons are inserted into gene regions, they generate more complex chimeric transcripts that include the retrotransposon. These retrotransposons can become part of an exon or induce convergent transcription, leading to the generation of dsRNA. (Bottom) In the most common scenario, a retrotransposon is inserted into the 3′ UTR region and becomes exonized.

Beyond the formation of dsRNAs, retrotransposons contribute to the transcriptome in various ways, particularly impacting coding genes and long non-coding RNAs (lncRNAs). Indeed, research indicates that over one-third of transcripts encoding human proteins and more than three-quarters of lncRNAs contain exons derived from retrotransposons (Kelley and Rinn, 2012; Kapusta et al., 2013). For example, LINE-1 elements exhibit weak polyadenylation signal activity, leading to a significant portion of transcripts undergoing 3′ read-through transcription (Holmes et al., 1994; Moran et al., 1999) (Fig. 2A, top). In contrast, incomplete transcripts of retrotransposons can arise from early polyadenylation or splicing events (Perepelitsa-Belancio and Deininger, 2003; Belancio et al., 2006; Schrom et al., 2013; Thompson et al., 2016) (Fig. 2B, top). When retrotransposons are inserted into genomic regions containing genes, they introduce an additional layer of complexity to their transcripts, resulting in chimeric transcripts of exonized sequences that may further contribute to coding genes or lncRNAs (Fig. 2B, middle and bottom).

RETROTRANSPOSON EXPRESSION AND IMMUNE RESPONSES

All transcripts derived from retrotransposons have the potential to participate in immune responses due to their sequence features and insertion locations. The direct transcription of retrotransposons, whether full-length or fragments, can trigger immune reactions. Normally, retrotransposon expression is suppressed through epigenetic silencing under steady-state conditions (Rowe and Trono, 2011). However, retrotransposon transcription can be activated during viral infections and other events (Stauffer et al., 2001). For example, in individuals infected with HIV-1, both RNA and protein expression of HERV-K are increased in blood samples (Contreras-Galindo et al., 2006, 2007). Similarly, SARS-CoV-2 infections can induce the expression of ERVs and other retrotransposons in human lung-derived cells and peripheral blood mononuclear cells (Marston et al., 2021; Charvet et al., 2023).

One mechanism underlying the increased expression of ERVs involves their association with small ubiquitin-like modifier (SUMO) proteins (Everett et al., 2013). SUMOylation is a dynamic post-translational modification that is crucial for regulating various cellular processes, including transcription, mRNA processing and chromatin remodeling (Zhao, 2018). While TRIM28 (KAP1) is known to suppress ERV expression (Lee et al., 2018; Tie et al., 2018), viral infections can lead to the loss of SUMO-modified TRIM28, resulting in ERV derepression (Schmidt et al., 2019). However, the impact of TRIM28 on retrotransposons, excluding ERVs, is context-dependent. In mouse embryonic stem cells (Rowe et al., 2010) and neural progenitor cells (Fasching et al., 2015), TRIM28 moderately inhibits LINEs but not SINEs. Conversely, in mouse dendritic cells, TRIM28 specifically inhibits ERVs, but not LINEs or SINEs (Chikuma et al., 2021). Activation of retrotransposons other than ERVs correlates with increased LINE-1 transcription and global DNA demethylation in systemic lupus erythematosus (SLE) patients (Huang et al., 2014; Mavragani et al., 2016, 2018). Reduced expression of DNMT1 and DNMT3a, DNA methyltransferases, is believed to be associated with this phenomenon (Balada et al., 2008; Nawrocki et al., 2017). Notably, cancer chemotherapy with DNA demethylating agents can induce retrotransposon activation (Chiappinelli et al., 2015; Roulois et al., 2015).

RETROTRANSPOSONS ARE RECOGNIZED AS PAMPs

Toll-like receptor family

In the immune system, innate immune sensors called pattern recognition receptors (PRRs) recognize specific molecules derived from pathogens such as viruses (Fig. 3). These molecules, known as pathogen-associated molecular patterns (PAMPs), include lipids, proteins, glycans and nucleic acids (DNA and RNA). For instance, during their replication cycle, viruses produce dsRNAs that are recognized as PAMPs, initiating an immune response and activating the viral defense pathway (Chow et al., 2015). Similarly, retrotransposons are believed to mimic viruses and to be recognized as PAMPs, thereby promoting immune responses (Young et al., 2012; Chiappinelli et al., 2015; Roulois et al., 2015; Kassiotis and Stoye, 2016). This phenomenon is known as viral mimicry. Toll-like receptors (TLRs) are the major PRRs responsible for detecting PAMPs. In humans, TLRs consist of groups 1–10, while other species have their own particular TLRs (Hidmark et al., 2012; Sameer and Nissar, 2021). Specifically, TLR3 and TLR7/8 are mainly localized in endosomes, where they detect dsRNAs and single-stranded RNAs (ssRNAs), respectively, whereas TLR4 is localized on the cell surface (Fig. 3).

Fig. 3. An overview of innate immunity within cells, focusing on the recognition of retrotransposon-derived RNAs and proteins. The pattern recognition receptors include Toll-like receptors (TLRs) and the cytosolic RIG-I-like receptor family (RIG-I, MDA5). dsRNA: double-stranded RNA; ssRNA: single-stranded RNA.

TLR signaling in humans is mainly mediated by two pathways: the MyD88 (myeloid differentiation primary response 88)-dependent and the TRIF (TIR domain-containing adapter-inducing interferon-β)-dependent pathways (reviewed in Akira et al., 2006). All TLRs except TLR3 activate inflammatory genes and cytokines such as IL-6 and TNF-α through NF-κB and MAPK signaling using the MyD88-dependent pathway. TLR3 instead signals exclusively through the TRIF-dependent pathway to activate IRF3 and IRF7 signaling, which in turn promotes the expression of type I interferons (IFN-I) and interferon-stimulated genes. TLR3 is also known to be activated by dsRNAs derived from HERVs (Chiappinelli et al., 2015) (Fig. 3). Meanwhile, TLR4 is versatile in its signaling, capable of utilizing both MyD88- and TRIF-dependent pathways. TLR4 detects lipopolysaccharides and can also recognize viral products that contain HERV env-derived proteins as PAMPs via its MyD88-dependent pathway (Duperray et al., 2015) (Fig. 3).
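The adaptor usage described in this paragraph can be summarized as a small lookup structure. The sketch below simply restates the text (MyD88 for all human TLRs except TLR3, TRIF for TLR3 and TLR4); the names `ADAPTOR_OUTPUTS` and `adaptors_for` are hypothetical, and the mapping is a simplification that omits co-adaptors and cell-type differences.

```python
# Downstream signaling and transcriptional outputs per adaptor, as summarized
# in the paragraph above (simplified).
ADAPTOR_OUTPUTS = {
    "MyD88": ("NF-kB and MAPK signaling",
              "inflammatory genes and cytokines (e.g., IL-6, TNF-alpha)"),
    "TRIF": ("IRF3 and IRF7 signaling",
             "type I interferons and interferon-stimulated genes"),
}


def adaptors_for(tlr_number: int) -> tuple:
    """Return the adaptor(s) used by human TLR1-TLR10 according to the text."""
    if not 1 <= tlr_number <= 10:
        raise ValueError("human TLRs are numbered 1 to 10")
    if tlr_number == 3:
        return ("TRIF",)          # TLR3 signals through TRIF only
    if tlr_number == 4:
        return ("MyD88", "TRIF")  # TLR4 can use both pathways
    return ("MyD88",)


if __name__ == "__main__":
    for n in (3, 4, 7, 8):
        for adaptor in adaptors_for(n):
            signaling, output = ADAPTOR_OUTPUTS[adaptor]
            print(f"TLR{n}: {adaptor} -> {signaling} -> {output}")
```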

In multiple sclerosis (MS) patients, the surface subunit of the env protein of HERV-W has been shown to interact with TLR4 and its co-receptor CD14 (Rolland et al., 2006). Multiple HERV families, including HERV-K, HERV-H and HERV-W, have been detected in MS patients, with HERV-W identified as the most relevant in genome-wide association studies (GWAS) meta-analysis (Morandi et al., 2017). HERV-K/HERV-W have also been implicated in various neural disorders and are detected in patients’ blood, cerebrospinal fluid and brain tissue (Johnston et al., 2001; Jeong et al., 2010; Douville et al., 2011; Douville and Nath, 2014). RNA derived from HERV-K (HML-2) has been demonstrated to be recognized by ssRNA-sensing TLR7 in mice and TLR8 in humans (Fig. 3). Additionally, the env gene of HML-2 contains a sequence motif (GUUGUGU) similar to HIV ssRNA40, which induces NF-κB activation in human macrophages, suggesting potential interactions with TLR8 (Dembny et al., 2020).
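For readers who wish to check a transcript for the GUUGUGU motif mentioned above, a minimal sketch follows. The function name is hypothetical, the search is an exact string match only, and the presence of the motif in a sequence does not by itself demonstrate recognition by TLR8.

```python
def find_motif(seq: str, motif: str = "GUUGUGU") -> list:
    """Return 0-based start positions of `motif` in an RNA or DNA sequence.

    DNA input is accepted by converting T to U; overlapping matches are kept.
    """
    rna = seq.upper().replace("T", "U")
    positions = []
    i = rna.find(motif)
    while i != -1:
        positions.append(i)
        i = rna.find(motif, i + 1)
    return positions


if __name__ == "__main__":
    # Hypothetical toy sequence, not an actual HML-2 env fragment.
    print(find_motif("AAGUUGUGUCC"))
```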

Retinoic acid-inducible gene I family

Retinoic acid-inducible gene I (RIG-I)-like receptors (RLRs) are cytoplasmic PRRs comprising three members: RIG-I, melanoma differentiation-associated protein 5 (MDA5) and laboratory of genetics and physiology 2 (LGP2). While MDA5 recognizes viral dsRNA (Wu et al., 2013), RIG-I can detect both ssRNA and dsRNA (Gurtler and Bowie, 2013) (Fig. 3). Although limited research has been conducted on the interaction between RLRs and retrotransposons, it has been demonstrated that RNA derived from HERVs (HML-2) can bind to and activate RLRs (Mikhalkevich et al., 2021). Similarly, RNA derived from LINEs is recognized as PAMPs by various innate immune sensors, such as RIG-I and MDA5. The secondary structure of LINE RNA is detected by innate immune sensors including RNA helicases and RNA sensors. Recent RNA immunoprecipitation sequencing (RIP-seq) analysis examining the interaction between LINE-1 and ZCCHC3, a co-receptor of RIG-I, MDA5 and TLR3, identified a significant interaction between high-CpG LINE-1 RNA and ZCCHC3 (Šulc et al., 2021). While the direct binding of LINE-1 CpG motifs to RIG-I or MDA5 remains unclear, the involvement of LINE-1 expression in innate immune responses mediated by ZCCHC3 has been demonstrated. Additionally, Šulc et al. reported that the majority of complementary segments identified in the genome (88%) are inverted-repeat Alu elements. Interestingly, LINE-1 and Alu elements are predominantly present in regions with low or no activity under steady-state conditions. Thus, the expression of these retrotransposons under epigenetic stress may activate PRRs, leading to the elimination of uncontrollable cells and the maintenance of tissue homeostasis, suggesting a potential defense mechanism in the body (Chen et al., 2012; Ishak and De Carvalho, 2020).

RETROTRANSPOSON EXPRESSION AND ITS IMPLICATIONS IN DISEASE

Retrotransposons are transcribed in various cellular states and have been found to be expressed in multiple diseases and their progression (Leonova et al., 2013; Chiappinelli et al., 2015; Roulois et al., 2015; Tanne et al., 2015; Sheng et al., 2018; Mehdipour et al., 2020). Extensive research has focused on the association between retrotransposon expression and autoimmune diseases, where the immune system mistakenly attacks the body’s own tissues (reviewed in Rice et al., 2018). Increased retrotransposon expression has been observed in conditions such as autism spectrum disorder, amyotrophic lateral sclerosis, schizophrenia and attention deficit hyperactivity disorder, suggesting their impact on disease susceptibility (Balestrieri et al., 2014, 2019; Longinetti and Fang, 2019; Tamouza et al., 2021). Notably, certain sequences derived from ERVs contribute to the production of self-reactive antibodies and dysregulation of immune tolerance mechanisms. In diseases like SLE and rheumatoid arthritis, ERV activation and subsequent production of virus-like particles can trigger inflammatory responses, contributing to chronic autoimmune conditions (Freimanis et al., 2010; Tokuyama et al., 2018, 2021). Additionally, retrotransposons have been implicated in aging and chronic inflammation associated with aging, as the relaxation of epigenetic control mechanisms may contribute to their activation (De Cecco et al., 2019; Chikuma et al., 2021).

One of the extensively studied areas regarding the association with retrotransposons is cancer. Aberrant expression of retroelements has been observed in various types of cancer, including breast cancer, lung cancer, ovarian cancer and others (Wang-Johanning et al., 2001, 2007; Chen et al., 2012; Patnala et al., 2014; Tang et al., 2017; Jung et al., 2018; Mendis et al., 2019; Ng et al., 2023). In particular, the HML-2 env protein has been suggested to have oncogenic potential (Bannert et al., 2018). LINE-1 elements show highly elevated activity in cancer, with reports of more than 100 new insertions in tumors (Helman et al., 2014; Tubio et al., 2014; Rodić et al., 2015; Tang et al., 2017; Jung et al., 2018; Rodriguez-Martin et al., 2020). Recent studies have also revealed interactions between ERVs and LINE-1 elements in the context of cancer, where ERV promoters activate neighboring LINE-1 elements, leading to retrotransposition events (Gerdes et al., 2016; Jansz and Faulkner, 2021). Additionally, LINE-1 elements interact with various intracellular factors and pathways involved in the DNA damage response and RNA processing, contributing to genome instability and potentially influencing cancer development (Gasior et al., 2006; Lee et al., 2012; Rodić et al., 2015; Pizarro and Cristofari, 2016; Servant et al., 2017; Ardeljan et al., 2020).

In cancer, the regulation of retrotransposon expression involves multiple layers of complexity. For instance, DNA methylation is often decreased in cancerous cells, leading to the derepression of normally silenced retrotransposons such as LINE-1 (McKerrow et al., 2022; Alkailani and Gibbings, 2023). Furthermore, certain transcription factors like c-Myc, known to be overexpressed in various cancers, can bind to retrotransposon elements and activate their transcription (Alkailani and Gibbings, 2023). Proteins involved in the RNA interference pathway, such as Dicer and Argonaute, are often dysregulated in cancer, affecting the post-transcriptional control of retrotransposon-derived RNAs (Yang and Kazazian, 2006).

It is worth noting that, although dysregulation of retrotransposons can contribute to tumorigenesis, retrotransposon activation can also have immunostimulatory effects, thus enhancing antiviral defenses. Notably, in melanoma patients treated with immune checkpoint therapy, a high level of retrotransposon and viral defense gene expression was significantly associated with a durable clinical response (Chiappinelli et al., 2015). However, in certain patients, the immune activation driven by aberrant retrotransposon expression can exacerbate symptoms or contribute to autoimmune disorders, highlighting the complex relationship between retrotransposons and immune function.

While the precise mechanisms linking retrotransposons to disease development are still under investigation, it is increasingly evident that retrotransposons play multifaceted roles in various pathological processes. The identification of retrotransposons associated with different diseases holds the potential to generate valuable biomarkers for early detection and monitoring. Furthermore, strategies targeting the activity of retrotransposons, such as epigenetic modifications and immune regulation, offer promising approaches for novel therapeutic interventions in diseases in which retrotransposons are involved.

RETROTRANSPOSON-DERIVED GENES

During the course of host genome evolution, certain gene sequences contained within retrotransposons have been co-opted and retained with actual physiological functions. One of the best-known examples is the syncytin gene. Syncytins, derived from the env of ERVs, have been found to contribute to placental development by promoting cell–cell fusion within the syncytiotrophoblast layer (Blond et al., 2000; Mi et al., 2000). These syncytin genes originated from the env gene of retroviruses that infected the germline of ancestral mammals, and different sequences have independently integrated into the genomes of various mammalian lineages. In humans, two highly expressed syncytin genes, syncytin-1 (ERVWE1, HERV-W) and syncytin-2 (ERVFRDE1), play crucial roles in placental development. Syncytin proteins are also involved in immune regulation and tolerance between the mother and fetus, leading to extensive studies on the interaction between ERVs and the immune system in reproductive biology (see reviews for details: Dupressoir et al., 2012; Durnaoglu et al., 2021). For example, Syncytin-1 has been suggested to suppress the activation of T cells and natural killer cells, potentially preventing rejection of the fetus by the maternal immune system (Mangeney et al., 2007; Holder et al., 2012; Tolosa et al., 2015). However, Syncytin-1 has also been shown to increase the interferon response in certain contexts. Notably, Syncytin-1 has been detected in patients with schizophrenia and autoimmune diseases such as MS (Antony et al., 2004; Leboyer et al., 2013). On the other hand, Syncytin-2 serves distinct roles in both placental development and immune regulation. While it also facilitates the formation of the syncytiotrophoblast layer, Syncytin-2 uniquely modulates immune responses by suppressing T cell activity through exosome-mediated mechanisms (Lokossou et al., 2020).

In addition to syncytin genes, another env-derived gene known as suppressyn (Sugimoto et al., 2013, 2019; Frank et al., 2022) also plays a role in placental development and has been implicated in antiviral functions and immune responses. Unlike Syncytins, Suppressyn (SUPYN) lacks a transmembrane domain but retains the ability to bind to the amino acid transporter ASCT2 (SLC1A5). ASCT2 is a receptor for exogenous mammalian type D retroviruses (Tailor et al., 1999; Yoshikawa et al., 2012; Shimode et al., 2013), and the interaction between SUPYN and ASCT2 is believed to restrict retroviral infection in developing embryos and the germline (Frank et al., 2022).

Examples of retrotransposon-derived genes that possess functional roles preserved in the host genome, similar to syncytin genes, are relatively few. Co-opted genes like syncytins have not been identified in LINE elements. However, a comprehensive search for virus-like proteins in mammalian genomes has identified numerous ERVs and LINEs with ORFs of over 80 amino acids (Nakagawa and Takahashi, 2016). Among these ERV ORFs in humans, 9.7% (1,243/12,879) are predicted to have the potential for transcription based on CAGE-seq and codon-based analyses (Ueda et al., 2020). In fact, a novel env-derived gene called Ervpb1, specifically expressed in hematopoietic cells, was identified from the list of ERV ORFs (Matsuzawa et al., 2021). The function of the Ervpb1 gene is not well understood, but, similar to syncytin genes, it contains highly conserved putative receptor binding domains and immunosuppressive domains, suggesting an impact on immune responses. Uncovering new genes from these unannotated ERV ORFs in the genome and elucidating their functions could contribute to advancements in immunology.
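To make the ORF length criterion concrete, the following is a minimal sketch of a naive ORF scan in Python. It is not the pipeline used in the cited studies, whose details are not given here. The function name `find_orfs`, the requirement that an ORF begin with ATG, and the toy sequence are assumptions for illustration only. The "over 80 amino acids" cutoff is the only value taken from the text.

```python
# Naive six-frame ORF scan (illustrative sketch, not the published method).
# Assumption: an ORF runs from an ATG to the first in-frame stop codon.

STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=80):
    """Return (strand, start, end, aa_length) for ORFs longer than min_aa.

    Coordinates are 0-based, end-exclusive, and given on the scanned strand
    (i.e. on the reverse-complemented sequence for strand "-").
    aa_length counts the start codon and excludes the stop codon.
    """
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF
                elif start is not None and codon in STOPS:
                    aa_len = (i - start) // 3
                    if aa_len > min_aa:
                        found.append((strand, start, i + 3, aa_len))
                    start = None  # close it and look for the next ATG
    return found


# Hypothetical toy sequence: ATG + 85 alanine codons + stop codon.
toy = "ATG" + "GCT" * 85 + "TAA"
orfs = find_orfs(toy)
```

A scan of this kind only nominates candidates. As the text notes, additional evidence such as CAGE-seq signal or codon-based analysis is needed before an ORF can be considered potentially transcribed.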

RETROTRANSPOSON-CONTAINING GENES

Transcription of retrotransposons results in the generation of chimeric transcripts in which retrotransposon fragments or entire elements are incorporated into mature mRNA molecules (Rebollo et al., 2012). This phenomenon has contributed to the diversification of gene isoforms. Recent advancements in long-read sequencing technologies have shed light on the significant number of these chimeric transcripts. A study utilizing Nanopore RNA-seq to identify isoforms in 29 human immune cell subsets has revealed notable incorporation of retrotransposons into the 3′ UTR of immune-related genes (Inamo et al., 2022). These novel isoforms contain evolutionarily new retrotransposon insertions that likely occurred after the divergence of humans and other mammals, specifically integrating into immune-related gene loci (Inamo et al., 2022). The diversity of 3′ UTR sequences resulting from retrotransposon insertions is likely to induce changes in RNA-binding protein interactions, leading to functional alterations in immune-related genes, including differences in mRNA stability and translation levels. For instance, an Alu insertion polymorphism was identified in the 3′ UTR of the ADIPOQ gene, which plays a role in anti-inflammatory and immune regulatory functions (Kojima et al., 2023). The expression level of the isoform with the Alu insertion differed significantly from that of the isoform without the Alu insertion (Kojima et al., 2023). Furthermore, the isoform with the Alu insertion showed an inverse correlation with the expression level of FAM120A, an RNA-binding protein, suggesting the potential impact of retrotransposon insertions on inter-individual and population-level genetic variation in immune responses (Kojima et al., 2023). These findings indicate that retrotransposons contribute to the diversity of immune-related genes and, more specifically, serve as important factors providing diverse protein-binding sites in the 3′ UTR.

Additionally, specific HERV groups are found in a reverse orientation to the 3′ UTR of gene clusters regulated by STAT1. In cancer cells, the methylation-mediated repression of these loci is released upon exposure to IFN-γ, resulting in bidirectional transcription of STAT1-activated gene promoters and antisense ERV, leading to the transcription of SPARCS (stimulated 3 prime antisense retroviral coding sequences) (Cañadas et al., 2018). SPARCS activate TLR3 by generating dsRNA, inducing a type I IFN response, and triggering positive feedback of innate immune signaling.

However, the presence of retrotransposons incorporated into known genes may pose challenges in detection using short-read sequencing methods due to mapping issues. Furthermore, their expression may be limited to specific conditions, such as cellular stimulation. As a result, the abundance of chimeric transcripts formed by retrotransposons and known genes is likely to be underestimated.

FUTURE DIRECTIONS

Understanding the precise mechanisms by which retrotransposon-derived transcripts contribute to immune responses and regulation is an ongoing and dynamic area of research. While the focus has often been on the aspects of retrotransposon-derived transcripts that enhance immune responses, recent discoveries have revealed that some retrotransposons actually play a role in suppressing immune reactions. For example, LINE-1 Lx9c11 has been found to act as an lncRNA that controls Schlafen genes and protects mice from viral infections (Bartonicek et al., 2022). Investigating the diverse functions of retrotransposon transcripts and elucidating the associated molecular mechanisms and pathways will provide valuable insights into the complex interplay between retrotransposons and the immune system.

Current technical challenges in retrotransposon analysis

There are several technical challenges and future research directions in this field. One major challenge is accurately detecting retrotransposons within the vast amount of genomic data and understanding their characteristics. Retrotransposons are highly repetitive sequences, making their identification and annotation difficult using traditional short-read sequencing approaches. Furthermore, distinguishing true expression changes of retrotransposons, which can act as PAMPs, from changes in the expression of host genes containing retrotransposon insertions remains challenging. To address these challenges, a combination of long-read sequencing for accurate annotation and short-read sequencing for expression data can be utilized. Ongoing efforts to improve the accuracy of retrotransposon quantification using long-read sequencing technologies, such as PacBio and Oxford Nanopore sequencing, are focused on advancements in throughput, error rate reduction, and the development of tools for more accurate quantification (Hu et al., 2021; Zhu and Liao, 2023).
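One simple first step toward separating the two signals described above is to label each annotated retrotransposon copy by its position relative to host genes. The Python sketch below illustrates this with plain interval overlap. The `Interval` type, the function names, the toy coordinates and the three labels are all hypothetical, and it rests on an assumption rather than an established rule: that reads from intergenic copies are more likely to reflect autonomous retrotransposon transcription, whereas reads from same-strand intragenic copies may simply come from the host transcript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"
    name: str


def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_elements(elements, genes):
    """Label each retrotransposon copy relative to annotated host genes."""
    labels = {}
    for te in elements:
        hits = [g for g in genes if overlaps(te, g)]
        if not hits:
            labels[te.name] = "intergenic"
        elif any(g.strand == te.strand for g in hits):
            # Signal here may come from the host gene transcript.
            labels[te.name] = "intragenic_sense"
        else:
            labels[te.name] = "intragenic_antisense"
    return labels


# Hypothetical toy annotation, not real coordinates.
genes = [Interval("chr1", 1000, 5000, "+", "GENE_A")]
elements = [
    Interval("chr1", 1200, 1500, "+", "TE1"),
    Interval("chr1", 4000, 4300, "-", "TE2"),
    Interval("chr1", 8000, 8300, "+", "TE3"),
    Interval("chr2", 1200, 1500, "+", "TE4"),
]
labels = classify_elements(elements, genes)
```

The nested loop is quadratic and only suits small annotations; a genome-scale analysis would use an interval index. The labels are a triage aid and do not by themselves establish which transcript a read derives from, which is where long-read isoform information becomes useful.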

Additionally, understanding the landscape of retrotransposon transcripts and their associated functions across different species and individuals requires comprehensive genomic sequencing. Recent advancements in long-read sequencing technology have made significant progress in this regard. For example, the telomere-to-telomere (T2T) genome was sequenced using a combination of Nanopore and PacBio technologies, resulting in a fully decoded human genome that features precise assembly of retrotransposon-rich regions (Nurk et al., 2022). Furthermore, pan-genomes based on the T2T genome have been constructed using PacBio sequencers, providing additional layers of genomic diversity and accuracy (Gao et al., 2023; Liao et al., 2023). Specifically, Liao et al. (2023) generated a pan-genome draft that includes phased diploid assemblies from a genetically diverse cohort, covering over 99% of the expected sequences with over 99% accuracy at both structural and nucleotide levels. This ongoing project aims to encompass the genomes of 350 individuals, enabling more accurate and comprehensive representation of genomic variation, including retrotransposons. Integrating such pan-genome information with expression analyses will provide valuable insights into complex immune-related genes and the impact of retrotransposon transcripts on diseases.

Prospects for the disease relevance and therapeutic applications of retrotransposons

As mentioned earlier, the association between ERVs and the immune system has long been suggested due to their viral mimicry properties. However, recent studies have begun to reveal the potential contributions of transcripts derived from LINEs and retrotransposition events to immune signaling and disease development. A comprehensive understanding of the crosstalk between different retrotransposon families, including ERVs and LINEs, will shed light on their collective impact on host biology.

Exploring the functional relevance of retrotransposon-derived genes in specific diseases holds promise as a future research direction. Integration with genetic analyses such as eQTL (expression quantitative trait locus), sQTL (splicing quantitative trait locus) and GWAS can help narrow down candidate sequences for further investigation. By unraveling the precise mechanisms through which these elements contribute to disease onset, novel therapeutic targets and diagnostic markers can be discovered. Targeted interventions aimed at modulating retrotransposon activity are particularly promising in diseases where abnormal expression of these elements is observed, presenting opportunities for precision medicine approaches.

CONCLUDING REMARKS

Research on retrotransposons has unveiled their intricate involvement in immune responses and disease. Future investigations will require leveraging long-read data, exploring associations with other retrotransposon families, and conducting analyses to uncover the disease-specific relevance and functions of retrotransposon-derived transcripts and genes. By pursuing these directions, we can maximize our understanding of retrotransposons as key players in immunity and pave the way for innovative therapeutic strategies and advancements in diagnostics.

FUNDING

This work was supported by a Japan Society for the Promotion of Science Grant-in-Aid for Scientific Research (C) (grant number: 21K06817).

ACKNOWLEDGMENTS

I would like to express my gratitude to Dr. Yuta Kochi and Dr. Kensuke Yamaguchi for their valuable discussions. I also extend my thanks to all colleagues involved in the studies discussed in this review. I apologize to those whose work has not been referenced here.

REFERENCES
 
© 2024 The Author(s).

This is an open access article distributed under the terms of the Creative Commons BY 4.0 International (Attribution) License (https://creativecommons.org/licenses/by/4.0/legalcode), which permits the unrestricted distribution, reproduction and use of the article provided the original source and authors are credited.