2023 Volume 46 Issue 2 Pages 139-146
Repeat-associated non-AUG (RAN) translation is a pathogenic mechanism in which repetitive sequences are translated into aggregation-prone proteins from multiple reading frames, even without a canonical AUG start codon. Since its discovery in spinocerebellar ataxia type 8 (SCA8) and myotonic dystrophy type 1 (DM1), RAN translation has been found to occur in the context of 12 disease-linked repeat expansions. This review discusses recent advances in understanding the regulatory mechanisms controlling RAN translation and its contribution to the pathophysiology of repeat expansion diseases. We discuss the key findings in the context of Fragile X Tremor/Ataxia Syndrome (FXTAS), a neurodegenerative disorder caused by a CGG repeat expansion in the 5′ untranslated region of FMR1.
DNA tandem repeats are the most unstable genetic elements found ubiquitously throughout the human genome.1) Not surprisingly, their expansion is known to cause more than 40 heritable diseases, most of which are neurological.2,3) Whether and how a given repeat expansion causes disease depends on multiple factors, including the expanded repeat sequence, its size and location within the mutant gene, and the host gene’s function.4) Historically, repeat expansions in non-coding regions were presumed to cause human diseases exclusively via two potentially overlapping mechanisms: [1] loss of host gene function and [2] RNA-mediated toxicity.5) However, the discovery of repeat-associated non-AUG (RAN) translation has added another twist to the already complex molecular pathways causing this group of diseases. RAN translation is a pathogenic mechanism in which repetitive sequences are translated into aggregation-prone proteins from multiple reading frames, even without a canonical AUG start codon.6) Initially discovered in spinocerebellar ataxia type 8 (SCA8) and myotonic dystrophy type 1 (DM1),6) RAN translation is now known to occur in the context of 12 disease-linked repeat expansions,7–12) making it a compelling therapeutic target common to this family of diseases (Table 1).
Disease | Gene | Repeat sequence | Genomic context | Translated protein | Reference |
---|---|---|---|---|---|
Spinocerebellar ataxia type 8 | ATXN8▪ATXN8OS | CTG▪CAG | 3′ Untranslated region | Poly-Q, Poly-A, Poly-S | 6,80) |
Myotonic dystrophy type 1 | DMPK | CTG▪CAG | 3′ Untranslated region | Poly-Q | 6) |
Amyotrophic lateral sclerosis-frontotemporal lobar dementia | C9ORF72 | GGGGCC▪GGCCCC | Intron | Poly-GP, Poly-GA, Poly-GR ▪ Poly-PR, Poly-GP, Poly-PA | 81–85) |
Fragile X tremor and ataxia syndrome | FMR1 | CGG▪CCG | 5′ Untranslated region | FMRpoly-G, FMRpoly-A ▪ ASFMRpoly-P, ASFMRpoly-A | 40,86) |
Fragile X primary ovarian insufficiency | FMR1 | CGG▪CCG | 5′ Untranslated region | FMRpoly-G | 87) |
Spinocerebellar ataxia type 2 | ATXN2 | CAG▪CTG | Open reading frame | Poly-Q and Poly-A | 8,52) |
Huntington’s disease | HTT | CAG▪CTG | Open reading frame | Poly-S, Poly-A ▪ Poly-C, Poly-L | 88) |
Spinocerebellar ataxia type 3 | ATXN3 | CAG▪CTG | Open reading frame | Poly-Q, Poly-A | 89) |
Spinocerebellar ataxia type 31 | BEAN1▪TK2 | TGGAA | Intron | Poly-WNGME | 90) |
Spinocerebellar ataxia type 36 | NOP56 | TGGGCC▪GGCCCA | Intron | Poly-WA, Poly-GP, Poly-GL ▪ Poly-GP, Poly-AQ, Poly-PR | 9,10) |
Neuronal intranuclear inclusion disease | NOTCH2NLC | CGG▪CCG | 5′ Untranslated region | uN2Cpoly-G | 11,12) |
Fuchs’ endothelial corneal dystrophy | TCF4 | CTG▪CAG | Intron | Poly-C | 91) |
Myotonic dystrophy type 2 | CNBP | CCTG▪CAGG | Intron | Poly-LPAC ▪ Poly-QAGR | 92) |
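The paired repeat sequences in Table 1 (for example CTG▪CAG or GGGGCC▪GGCCCC) are related as sense and antisense strands, which is why several expansions yield RAN products from both directions. The following Python sketch illustrates that relationship by computing the reverse complement of the sense repeat units listed in the table. The function name `reverse_complement` and the list `sense_units` are illustrative choices, not part of the original work.

```python
# Watson-Crick complement for each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Sense-strand repeat units taken from the "Repeat sequence" column of Table 1.
sense_units = ["CTG", "GGGGCC", "CGG", "CAG", "TGGGCC", "CCTG"]

for unit in sense_units:
    # Prints each unit next to the unit expected on the antisense transcript.
    print(unit, reverse_complement(unit))
```

Each printed pair can be compared with the corresponding table row, for instance CCTG with CAGG for myotonic dystrophy type 2.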
Eukaryotic mRNA translation typically begins at the AUG codon closest to the mRNA’s 5′ end, which is capped with 7-methylguanosine. The initiation step of this process involves more than ten eukaryotic initiation factors (eIFs), methionyl initiator tRNA (Met-tRNAiMet), and the small ribosomal subunit.13) The cap-binding complex eIF4F, made of the cap-binding subunit eIF4E, the large connector protein eIF4G, and the helicase eIF4A, first binds the mRNA 5′-cap and then becomes a scaffold for the recruitment of the 43S ribosomal pre-initiation complex (PIC) loaded with eIFs 1, 1A, 2, 3, 5 and Met-tRNAiMet.14) The 5′-loaded PIC then searches for the start codon closest to the cap in a process called scanning. Start codon selection depends on the gate-keeper protein eIF1 bound to the ribosome’s decoding site, the P-site,15,16) and on the control of its binding and release mediated by PIC-bound eIFs and their subunits, such as eIF1A, eIF5, eIF3c and eIF2b.17–21)
However, translation does not always initiate with AUG codons, and the surrounding sequence context (such as the Kozak sequence) dramatically affects the efficiency of this critical step of gene expression.22) In keeping with this notion, genome-wide translation profiling has revealed numerous sites of non-AUG translation initiation, and their biological significance has been highlighted in recent years.23,24) Furthermore, translational control has emerged as a promising area of research given the advent of comprehensive genome-wide translatome analyses.25,26) Non-AUG initiation was believed to occur at “near-cognate” start codons with one-base substitution compared to AUG, such as CUG and GUG.27) Thus, the discovery of RAN translation without involving typical near-cognate start codons was a mystery not only to medical experts but also to basic molecular biologists.
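To make the definition of a “near-cognate” start codon concrete, the following Python sketch enumerates every codon that differs from AUG by exactly one base and counts mismatches for any other codon. It only illustrates the definition given above; which of these codons actually support initiation in cells is an empirical question that the sketch does not address. The function names `near_cognate_codons` and `mismatches_to_aug` are hypothetical.

```python
BASES = "ACGU"


def near_cognate_codons(start: str = "AUG") -> list[str]:
    """List every codon that differs from `start` at exactly one position."""
    codons = []
    for pos, original in enumerate(start):
        for base in BASES:
            if base != original:
                # Substitute a single base and keep the other two unchanged.
                codons.append(start[:pos] + base + start[pos + 1:])
    return codons


def mismatches_to_aug(codon: str) -> int:
    """Count the positions at which a codon differs from AUG."""
    return sum(1 for a, b in zip(codon.upper(), "AUG") if a != b)


codons = near_cognate_codons()
print(len(codons), codons)

# CUG and GUG are one substitution away from AUG, whereas the repeat
# codons of a CGG tract (CGG, GGC, GCG) differ at two or three positions.
for codon in ["CUG", "GUG", "CGG", "GGC", "GCG"]:
    print(codon, mismatches_to_aug(codon))
```

By this count, none of the codons found within a pure CGG repeat qualifies as near-cognate, which is one way to see why initiation within such repeats was unexpected.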
This review discusses recent advances in understanding the regulatory mechanisms controlling RAN translation and its contribution to the pathophysiology of repeat expansion diseases. We discuss the following key points in the context of Fragile X Tremor/Ataxia Syndrome (FXTAS), a neurodegenerative disorder caused by a CGG repeat expansion in the 5′ untranslated region of FMR1:
FXTAS is an adult-onset, progressive neurodegenerative condition characterized by action tremor, cerebellar ataxia, cognitive decline, and parkinsonism linked to generalized brain atrophy.28) The disease is caused by a moderate expansion of a CGG repeat in the 5′ untranslated region (UTR) of the FMR1 gene.29,30) Expansions of >200 CGG repeats in the same gene were first linked to Fragile X syndrome, the most common genetic form of autism, which is caused by the complete epigenetic silencing of FMR1.29–31) Whereas neurologically healthy individuals carry fewer than 45 repeats at this locus, FXTAS patients fall within the premutation range of 55–200 repeats, which results in the upregulation of FMR1 mRNA.32) Patients carrying larger repeat lengths of 100–200 show a mild reduction in Fragile X Mental Retardation Protein (FMRP), the gene’s protein product32) (Fig. 1).
Boxes show the FMR1 gene structure in healthy and affected individuals. CGG, the area spanning the CGG repeat expansion: the size of its lettering correlates with the approximate size of the repeat expansion. Curves, FMR1 mRNA molecules. Circles, FMRP molecules. Neurologically healthy individuals typically carry between 5 and 44 CGG trinucleotide repeats in the 5′-UTR of FMR1. Repeat length in the fragile X premutation range is 55 to 200, resulting in the upregulation of FMR1 mRNA, a modest decrease in FMRP, and a higher risk of acquiring FXTAS. Repeat length in the full mutation is >200; FMR1 transcription is abolished due to DNA hypermethylation, and the lack of FMRP results in FXS.
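The repeat-length ranges described above can be summarized as a small lookup, sketched here in Python. The thresholds are those stated in the text (fewer than 45 repeats in healthy individuals, 55–200 in the premutation range, more than 200 in the full mutation). The text does not name the 45–54 interval, so the sketch labels it as unclassified rather than guessing a category. The function name `classify_fmr1_cgg` is hypothetical, and the sketch is not a diagnostic tool.

```python
def classify_fmr1_cgg(repeats: int) -> str:
    """Map an FMR1 CGG repeat count to the ranges described in the text."""
    if repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if repeats < 45:
        return "normal"
    if repeats < 55:
        # Assumption: this interval is not defined in the text.
        return "unclassified (45-54)"
    if repeats <= 200:
        return "premutation"
    return "full mutation"


for n in (30, 50, 90, 103, 250):
    print(n, classify_fmr1_cgg(n))
```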
Because of the elevated transcription of FMR1 seen in patients, a gain-of-function mechanism mediated by the mutant RNA has been proposed to be the primary driver of neurodegeneration in this disease.33) The mutant transcripts are hypothesized to sequester multiple RNA-binding proteins (RBPs) into pathological RNA-containing aggregates called RNA foci, thus preventing them from fulfilling their normal function. Indeed, RBPs such as Pur α, HNRNP A2/B1, CUGBP1, and Sam68 have been detected in Drosophila disease models and FXTAS patient brains.34–36) However, neuropathological examination of FXTAS brains also revealed the presence of ubiquitinated inclusions in neurons and astrocytes containing molecular chaperones that do not directly interact with the CGG repeat-bearing transcript.37,38) Such inclusions resemble neuronal intranuclear aggregates in polyglutamine disorders, where the primary pathological substrates are composed of proteins.39) These studies suggest that FXTAS may not be caused exclusively by RNA-mediated toxicity and that a protein-mediated pathogenic mechanism may be contributing to FXTAS pathogenesis.
In 2013, the Todd group at the University of Michigan resolved this paradox by demonstrating that the FMR1 CGG repeats are RAN-translated in at least two reading frames, resulting in the accumulation of FMRpolyG and FMRpolyA.40) FMRpolyG contributed to CGG repeat toxicity in Drosophila and mouse models and accumulated in FXTAS patients’ brains.40) Follow-up neuropathological analyses revealed that FMRpolyG is a near-obligate component of the neuronal intranuclear inclusions characteristic of the disease.41) The Charlet–Berguerand group generated transgenic mouse models expressing CGG repeat RNA with or without FMRpolyG.42) Their in vivo models showed that the expression of CGG repeat RNA alone was not pathogenic.42) In contrast, FMRpolyG accumulation triggered neurodegeneration, mainly through the sequestration of LAP2β and the disruption of nuclear lamina architecture.42) Consistent with these findings, astrocyte-specific expression of the disease-linked CGG repeats was also sufficient to cause intranuclear inclusions, FMRpolyG accumulation, and motor dysfunction in mice.43) Collectively, these studies indicate that RAN translation is a driving force behind neurodegeneration in FXTAS.
The regulatory mechanisms surrounding RAN translation have remained elusive. Nevertheless, researchers from the Todd laboratory (University of Michigan) have shown that RAN translation shares molecular requirements with AUG-dependent translation in the context of the FMR1 CGG repeats.44) They engineered a series of reporters in which the FMR1 CGG repeats are positioned in multiple reading frames upstream of a nanoluciferase (nLuc) and 3xFLAG tag lacking an AUG initiation signal (Fig. 2A). In this manner, fusion proteins are produced only if translation initiates within the FMR1 5′ UTR sequence. In line with previous observations, the 0 reading frame did not form a Poly-R product. However, fusion proteins were produced from the +1 and +2 frames, indicating the formation of FMRpolyG and FMRpolyA, respectively.44) Western blot analyses revealed that translation of FMRpolyG initiates upstream of the CGG repeats, while the production of FMRpolyA begins within the CGG repeat motif.44) As expected, ACG and GUG near-cognate initiation codons located 5′ to the CGG repeat tract were found to drive translation in the +1 frame.44) Furthermore, experiments using an in vitro translation system demonstrated an m7G cap-dependent pathway involving the cap-binding protein eIF4E and the RNA helicase eIF4A for both the +1 and +2 reading frames.44)
(A) Structure of RAN translation reporter mRNA. m7G, 5′ end cap. pA, poly A sequence. Green part, derived from FMR1 5′ leader sequence. Red half arrows indicate RAN translation start sites. Purple part, nLuc-3xFLAG tag protein coding region. Wedge indicates the location of the 1 (+1) or 2 (+2) base insertion introduced to bring the corresponding RAN reading frame in-frame with nLuc. The table on the left shows the peptide products examined by each reporter. (B) Schematic diagram for a possible cap-dependent RAN translation initiation mechanism (+2 frame). The thick line is the mRNA containing the CGG repeat sequence. Hairpin is the secondary structure presumed to be formed by the CGG repeats.46) The bottom of the hairpin is unzipped, allowing the stalled PIC to initiate at a RAN start codon. The orange oval represents the scanning ribosome. Red half arrow indicates the RAN translation start site. The blue square marks a repeat start codon presumed to be utilized by RAN translation. The blue oval indicates FMRpolyA produced from the +2 frame. This mechanism is applicable to other types of RAN translation that initiate within the repeats.
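The reading-frame logic behind these reporters can be illustrated with a short Python sketch that reads a pure CGG repeat tract in the 0, +1, and +2 frames. It uses only the three codons that occur in such a tract, with their standard genetic code assignments (CGG, arginine; GGC, glycine; GCG, alanine). It is a simplified illustration that ignores the FMR1 5′ leader sequence upstream of the repeats, where FMRpolyG translation is reported to initiate. The names `CODON_TO_AA` and `translate_repeat` are hypothetical.

```python
# Minimal codon table: only the codons present in a pure CGG repeat tract.
CODON_TO_AA = {"CGG": "R", "GGC": "G", "GCG": "A"}


def translate_repeat(unit: str, copies: int, frame: int) -> str:
    """Translate a tandem repeat starting at offset `frame` (0, 1, or 2)."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    rna = unit * copies
    # Step through the tract three bases at a time, dropping any partial codon.
    codons = [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]
    return "".join(CODON_TO_AA[codon] for codon in codons)


for frame, label in [(0, "0 frame"), (1, "+1 frame"), (2, "+2 frame")]:
    print(label, translate_repeat("CGG", 10, frame))
```

The 0 frame reads as poly-arginine, the +1 frame as poly-glycine, and the +2 frame as poly-alanine, matching the Poly-R, FMRpolyG, and FMRpolyA products discussed for the reporters.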
The cap-dependent translation of the +1 reading frame came as no surprise because its major start sites involved near-cognate initiation codons (Fig. 2A). In contrast, it was striking that the +2 frame was also cap- and eIF4E/4A-dependent, because the repeat codon GCG, not the 5′-cap, was previously thought to directly recruit the 43S PIC to mediate translation in this frame. Instead, the eIF4E/4A-dependence of FMRpolyA translation suggested that the 5′-loaded PIC is involved in initiating this reading frame. The mechanism of cap-dependent RAN translation of the +2 frame remains elusive. However, translation from “near-cognate” start codons is known to be enhanced by a secondary structure located approximately 20 bases downstream of the start codon.45) Because this distance matches the size of the 3′ half of the mRNA bound by a translating ribosome, it is presumed that the scanning ribosome stalls at the secondary structure located at its leading edge, holding the start codon in its P-site for long enough to initiate. In this way, the ribosome mis-initiates at the start codon at a higher frequency. By the same token, RAN translation initiated within the repeat can be understood as a rare mis-initiation event caused by ribosome stalling at a secondary structure formed by the trinucleotide repeats46–48) (Fig. 2B) (also see below).
Given that multiple pathogenic mechanisms synergize to drive neurodegeneration in repeat expansion disorders,49,50) identifying RAN translation-specific regulators will provide the ultimate toxicity test for proteins generated by this pathogenic process.51) In recent years, tremendous strides have been taken to develop genetic and chemical approaches to block RAN translation without compromising canonical AUG translation.52–55) In search of a specific inhibitor of RAN translation, the Asano group focused on the translation regulator, eIF5-mimic protein (5MP). It is known that 5MP suppresses general non-AUG translation.56) Since RAN translation of the FMR1 CGG repeats is known to be cap-dependent,44) the group hypothesized that the expression of 5MP can inhibit RAN translation in this genetic context.57)
Humans have two copies of 5MP: 5MP1/BZW2 and 5MP2/BZW1.58,59) Using the previously engineered RAN translation reporters from the Todd lab, the Asano group investigated the effects of 5MP1 and 5MP2 overexpression on RAN translation. Both 5MP1 and 5MP2 suppressed the translation of FMRpolyG and FMRpolyA. They also investigated 5MP binding to the ribosomal PIC and showed that 5MP binds the ribosome by interacting with eIF3c to repress non-AUG translation. Through “whole-lane” mass-spectrometry analyses, they demonstrated that 5MP is bound to the PIC in both human and fly cells. Importantly, the introduction of a mutation that disrupts the interaction with eIF3c prevented 5MP1 from both binding the PIC and inhibiting RAN translation. Thus, 5MP1 suppresses non-AUG translation in an eIF3c-dependent manner57) (Fig. 3A).
(A) Left, Structure of the 43S pre-initiation complex during 5ʹ scanning. The mRNAs regulated by this protein complex are found below. eIF, eukaryotic translation initiation factor. The light gray area represents the ribosome. Part of eIF5 and eIF3c (3c1, 3c2) suppresses the function of eIF1 leading to slightly inaccurate translation initiation. Right, when 5MP is part of the complex, the inhibition by those factors is released, eIF1 binds stably, and the start codon is accurately recognized. Translation of AUG-initiated mRNA is promoted and non-AUG translation (e.g., CUG, GUG, RAN translation) is suppressed. (B) Expression of pathogenic CGG repeat sequence in the compound eye of Drosophila. The left (GFP) is a control strain. Expression of the CGG repeats causes RAN translation-mediated photoreceptor cell death manifesting as a rough eye phenotype. The three on the right are flies expressing 5MP/Kra. RAN translation is suppressed by 5MP, resulting in normal compound eyes.
Based on this result, the Todd lab generated Drosophila models overexpressing 5MP and tested the functional consequences of inhibiting RAN translation in vivo. In flies, 5MP is encoded by the krasavietz (kra) gene.59) The group crossed two UAS-5MP/Kra overexpressing lines with a previously constructed GMR-Gal4, UAS-(CGG)103-EGFP line, which exhibits a rough-eye phenotype due to photoreceptor cell death driven by CGG repeat toxicity. Kra overexpression led to the near-complete loss of FMRpolyG aggregates and rescued the rough-eye phenotype in this model (Fig. 3B). To further validate their findings, they investigated the effects of Kra expression in Drosophila models ubiquitously expressing (CGG)90-EGFP, which exhibit reduced life spans. Kra overexpression significantly extended the life spans of (CGG)90-EGFP flies. Conversely, Kra knockdown further reduced the life span of these flies. The results of these experiments highlight that 5MP/Kra is a suppressor of RAN translation in vivo.57)
Based on our reporter assays, the efficiency of RAN translation in the FMR1 CGG repeat context is merely 3% at +1 frame and less than 1% at +2 frame compared to the AUG initiation reporter excluding the repeats.57) Similar RAN translation efficiencies have been observed in the context of the TAF1 hexanucleotide repeat expansion linked to X-linked dystonia-parkinsonism (Reyes et al., MDS-22-0489, DOI:10.1002/mds.29183).60) On the other hand, a previous study claimed that the translation of the +1 frame is approx. 30–40% compared to AUG-initiated repeat constructs using purified mRNA in an in-vitro translation system.44) The main limitation of this system is its relaxed cap-dependence and start codon selection stringency. Thus, the experiment is appropriate to evaluate cap-dependence qualitatively but can exaggerate the measured RAN translation efficiency.
On the other hand, our results show that the same eIF3-mediated mechanism suppresses non-AUG translation.57) Although homopolymeric repeat proteins may destabilize nLuc, our results strongly suggest that RAN translation is a highly inefficient process. Thus, RAN translation can be viewed as faulty initiation and can be controlled by enhancing the efficiency of its AUG-initiated counterpart (Fig. 2B).
The inefficient nature of RAN translation is consistent with the slow accumulation of aggregation-prone proteins in long-lived cells such as neurons.57,61) If RAN translation is harmful, why would evolution favor the conservation of repetitive elements across the human genome? It has been pointed out that the CGG repeat sequence upstream of FMR1 is conserved in mammals and has expanded from roughly eight repeats to 20 or more copies, especially in primates.62) The Todd group proposed an exciting hypothesis that CGG repeats serve a physiological function by controlling FMRP translation.63) In FMR1 mRNA, the +1 and +2 frames of the CGG repeat sequence are in a different reading frame relative to the FMR1 coding region, each forming an upstream open reading frame (uORF). In eukaryotes, translation of a uORF suppresses translation of the downstream reading frame because initiation is 5′ cap-dependent.64) Although the efficiency of RAN translation is not high, multiple ribosome initiation complexes are thought to dwell in the repetitive sequences and suppress the translation of downstream open reading frames.65–67) Interestingly, translation of the uORF containing the CGG repeats is antagonistic to the downstream FMR1 translation, and hence, the activation of glutamate receptors at synapses suppresses uORF translation, resulting in FMR1 translation.64)
FMRP, encoded by FMR1, is an RNA-binding protein that suppresses the translation of mRNAs in the brain, including synaptic proteins involved in autism spectrum disorders.68) When glutamate receptors are activated at synapses, FMRP is ubiquitinated and rapidly degraded, while triggering the translation of FMR1 mRNA, generating new FMRP.69,70) RAN translation of the CGG repetitive sequence suppresses FMRP synthesis, which is vital for the balance between FMRP degradation and translational supplementation during synaptic activation, and, thus, for synaptic homeostasis.63)
RAN translation can occur by means of aberrant start codon recognition by the ribosome, which can be suppressed by the translation regulatory factor 5MP.57) FXTAS is known to occur in 40% of men with CGG repeat expansions in the premutation range.71,72) Other repeat expansion disorders associated with RAN translation, like SCA8, also exhibit reduced penetrance and do not affect all mutation carriers.73–76) This phenomenon may be attributed to differences in the expression levels of translational regulators such as 5MP,57) repeat interruptions that stabilize RNA secondary structures,77) or protein-degrading factors such as PSMB5.78) Many laboratories around the world are currently searching for such disease-modifying targets.
On the other hand, studies have also shown that the onset of repeat expansion disorders is partially mediated by RNA toxicity. In keeping with this notion, FMRpolyG was shown to directly bind the CGG repeat-bearing transcript to cause the cell-to-cell spreading of pathology and motor dysfunction in mice.79) This finding indicates that the repeat-bearing RNA transcript contributes to disease pathogenesis. Consequently, a better mechanistic understanding of RAN translation will require us to probe how RNA structures interact with scanning ribosomes and their translation products. Elucidating the roles of 5MP and other translational regulatory factors in RAN translation, particularly concerning FMRP expression and neurophysiology, will thus mark a significant turning point for understanding the pathophysiology of repeat expansion disorders and potential treatment options for this group of genetic diseases.
KA’s research is funded by an Innovative Award from Terry Johnson Cancer Center, KSU; an NIH grant (GM124671); an NSF Research Grant (1412250); and JSPS KAKENHI (18K19963). CJFR was supported by a Ph.D. scholarship from the Katholischer Akademischer Ausländer-Dienst (KAAD).
The authors declare no conflict of interest.