2018 Volume 43 Issue 1 Pages 75-83
Although the definition of a noncoding RNA (ncRNA) is an RNA molecule that does not encode a protein, recent evidence has revealed that some ncRNAs are indeed translated to give rise to small polypeptides (usually containing fewer than 100 amino acids). Despite their small size, however, these peptides are often biologically relevant in that they are required for a variety of cellular processes. In this review, we summarize the production and functions of peptides that have been recently identified as translation products of putative ncRNAs.
Key words: long noncoding RNA (lncRNA), circular RNA (circRNA), primary miRNA (pri-miRNA), translation, peptide
Noncoding RNAs (ncRNAs) have been defined as a class of RNA molecules that are transcribed from genomic DNA but do not encode proteins. They include traditional RNA molecules such as transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs) as well as other small RNAs such as microRNAs (miRNAs), Piwi-interacting RNAs (piRNAs), small nuclear RNAs (snRNAs), and small nucleolar RNAs (snoRNAs), many of which play a variety of important biological roles (Pasut et al., 2016).
Among ncRNAs, RNA molecules longer than 200 nucleotides are categorized as long ncRNAs (lncRNAs). Their identification was facilitated by high-throughput sequencing of cDNA clones (Carninci et al., 2005; Consortium et al., 2007). Although lncRNAs lack apparent open reading frames (ORFs) for proteins of >100 amino acids, recent studies have identified a subset of lncRNAs that actually encode polypeptides of <100 amino acids (Fig. 1A) (Anderson et al., 2015; Kondo et al., 2010; Matsumoto et al., 2017; Nelson et al., 2016; Pauli et al., 2014).
Fig. 1. Schematic models for the processing and translation of putative ncRNAs. (A) Peptides encoded by lncRNAs. Many mature lncRNAs are modified with a 5' cap and 3' poly(A) tail and are exported from the nucleus to the cytosol, where some of those containing small ORFs are engaged by ribosomes and translated. (B) Peptides encoded by circRNAs. A noncanonical form of splicing, known as back-splicing, generates circRNAs, which lack a 5' cap and 3' poly(A) tail. These RNAs are exported to the cytosol and can be translated if they possess an IRES element. (C) Peptides encoded by pri-miRNAs. Plant pri-miRNAs have been shown to encode peptides. In both animals and plants, pri-miRNAs are synthesized by RNAPII and are modified with a 5' cap and 3' poly(A) tail. In plants, most pri-miRNAs are processed by DCL1 into miRNA duplexes, which are transported to the cytosol to form the RNA-induced silencing complex (RISC) that mediates suppression of target mRNAs. However, some pri-miRNAs containing small ORFs may be transported to the cytosol without processing and then translated.
Another newly identified class of “translated” ncRNAs comprises circular RNAs (circRNAs), which are localized mainly in the cytosol (Hansen et al., 2013; Memczak et al., 2013). These RNA molecules consist only of exon sequences. They are produced by a back-splicing reaction that forms a covalent bond between the 3' end of an exon and the 5' end of an upstream exon (Fig. 1B). Back-splicing is promoted by reverse-complementary sequences in the flanking introns, such as human Alu repeats (Zhang et al., 2014). Because circRNAs lack a 5' cap, they cannot in principle undergo cap-dependent translation. However, a synthetic circRNA containing an internal ribosome entry site (IRES) was shown to be translated in vitro (Chen and Sarnow, 1995). This finding raised the possibility that endogenous circRNAs with an IRES might also be translated. Indeed, recent studies have shown that endogenous circRNAs give rise to peptides in a cap-independent manner (Fig. 1B) (Legnini et al., 2017; Pamudurti et al., 2017). Some circRNAs even generate proteins of >100 amino acids (Legnini et al., 2017).
Primary miRNAs (pri-miRNAs) are the precursor ncRNAs that are enzymatically processed, beginning in the nucleus, to yield mature miRNAs (Ambros, 2004; Bartel, 2004). They have also been shown to encode peptides (Fig. 1C) (Lauressergues et al., 2015). In both animals and plants, these RNA molecules are synthesized by RNA polymerase II (RNAPII), the same enzyme that synthesizes protein-coding mRNAs, and they are modified with a 5' cap and 3' poly(A) tail. These properties suggest that pri-miRNAs may produce proteins if they are transported to the cytosol without processing (Waterhouse and Hellens, 2015). Notably, pri-miR171b of Medicago truncatula and pri-miR165a of Arabidopsis thaliana were found to be translated into peptides that enhance the expression of the corresponding miRNAs (Fig. 1C). How these pri-miRNAs escape processing by Dicer-like 1 (DCL1) remains unknown, however.
In this review, we focus on ncRNAs that do actually encode peptides—including lncRNAs, circRNAs, and pri-miRNAs—and we address the functions and biological relevance of these molecules.
Most coding RNAs contain one long ORF and many short ORFs, and the long ORF typically encodes a functional protein (Fig. 2A). In contrast, putative ncRNAs that encode hidden peptides contain only short ORFs, usually of <300 nucleotides. The longest of these ORFs frequently does not encode the functional peptide, which makes it difficult to predict which ORF encodes a hidden peptide (Fig. 2A). Cross-species genomic comparisons are useful for ORF prediction because translated regions tend to be conserved under evolutionary pressure. PhyloCSF is a computational tool that identifies likely coding regions in multispecies alignments. It does so by detecting the high frequencies of synonymous codon substitutions and conservative amino acid substitutions that are characteristic of protein-coding sequence (Fig. 2B) (Lin et al., 2011).
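The longest ORF is a poor guide in hidden peptide–coding RNAs. A minimal Python sketch follows, assuming a forward-strand transcript, ATG as the only start codon, and the standard stop codons. It enumerates every ATG-initiated ORF in the three forward frames so that each candidate can be scored separately (for example, by conservation) rather than keeping only the longest. The function name `find_orfs` and the `min_codons` threshold are illustrative choices, not part of any published pipeline.

```python
# Minimal sketch: enumerate ATG-initiated ORFs in the three forward frames.
# Assumptions: forward strand only, ATG as the sole start codon, standard stops.
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 10) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ORF; end is exclusive and includes the stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF; internal in-frame ATGs are ignored
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:  # codon count includes the stop
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs
```

For example, `find_orfs(seq, min_codons=11)` keeps ORFs of at least 10 amino acids plus a stop codon. `max(orfs, key=lambda o: o[2] - o[1])` would return the longest candidate, which, as noted above, is not necessarily the one encoding the functional peptide.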
Fig. 2. Identification of hidden peptides. (A) ORF prediction. Whereas the longest ORFs encode functional proteins in typical coding RNAs, they do not necessarily encode functional peptides in hidden peptide–coding RNAs. (B) PhyloCSF computational analysis. The analysis for the SPAR polypeptide is shown, with the coding region yielding a positive score for frame 3. Ex, exon. (C) Ribosome profiling. The polysome fraction is purified by sucrose density gradient centrifugation and subjected to nuclease digestion, with the resultant ribosome-protected RNA fragments then being isolated, sequenced, and mapped to the genome. Polysomes can also be isolated by affinity pull-down of tagged ribosomes. (D) Peptidomics. Peptides are concentrated and then analyzed by MS, and the MS/MS spectra are compared with a custom peptide sequence database generated by prediction based on RNA sequences. Spectra that do not match known proteins are selected for identification of novel hidden peptides.
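The conservation signal that PhyloCSF exploits can be illustrated with a toy pairwise comparison: in a translated ORF, differences between orthologous codons tend to be synonymous. The sketch below counts identical, synonymous, and nonsynonymous codon differences between two gap-free, in-frame aligned sequences, using the standard genetic code. It is not PhyloCSF, which scores multispecies alignments with codon substitution models. The function name and the pairwise setting are assumptions made for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def classify_codon_changes(ref: str, other: str) -> dict[str, int]:
    """Count codon differences between two equal-length, in-frame, gap-free sequences."""
    ref = ref.upper().replace("U", "T")
    other = other.upper().replace("U", "T")
    if len(ref) != len(other) or len(ref) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    counts = {"identical": 0, "synonymous": 0, "nonsynonymous": 0}
    for i in range(0, len(ref), 3):
        a, b = ref[i:i + 3], other[i:i + 3]
        if a == b:
            counts["identical"] += 1
        elif CODE[a] == CODE[b]:
            counts["synonymous"] += 1  # different codon, same amino acid
        else:
            counts["nonsynonymous"] += 1
    return counts
```

A high ratio of synonymous to nonsynonymous changes across an ORF is the kind of pattern that points to selection on the encoded peptide rather than on the RNA sequence alone.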
Experimental approaches, such as deep sequencing–based ribosome profiling and mass spectrometry (MS)–based peptidomics, have also been established for ORF identification. Ribosome profiling relies on deep sequencing to detect ribosome-protected RNA fragments, with such protection being indicative of active translation (Fig. 2C) (Ingolia, 2016; Ingolia et al., 2009). In this approach, RNA molecules with bound ribosomes, known as polysomes, are purified by sucrose density gradient centrifugation or affinity pull-down of epitope-tagged ribosomes and are then treated with nucleases, resulting in the generation of ribosome-protected RNA fragments. These fragments are sequenced and mapped to the genome in order to identify the region of active translation (Fig. 2C).
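A widely used check on ribosome profiling data, not described in the text above, relies on three-nucleotide periodicity. Footprints from actively translating ribosomes advance codon by codon, so their inferred P-sites concentrate in one frame of the translated ORF. The Python sketch below computes that frame distribution for reads already mapped to 0-based transcript coordinates. The P-site offset of 12 nt is an assumption that must be calibrated for each data set, and `frame_profile` is a hypothetical helper.

```python
from collections import Counter

# Assumed distance (nt) from a footprint's 5' end to its P-site; calibrate per data set.
P_SITE_OFFSET = 12


def frame_profile(read_starts, orf_start, orf_end, offset=P_SITE_OFFSET):
    """Fractions of inferred P-sites in frames 0, 1, and 2 of the ORF [orf_start, orf_end)."""
    frames = Counter()
    for s in read_starts:  # 0-based 5' ends of mapped footprints
        p = s + offset
        if orf_start <= p < orf_end:
            frames[(p - orf_start) % 3] += 1
    total = sum(frames.values())
    if total == 0:
        return (0.0, 0.0, 0.0)  # no footprints fall inside the ORF
    return tuple(frames[f] / total for f in range(3))
```

A profile dominated by frame 0 is consistent with translation of the candidate ORF. Roughly equal fractions suggest that the protected fragments may not come from ribosomes decoding that frame.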
Whereas ribosome profiling provides indirect evidence of translation, MS can detect translation products directly. Peptidomics is a modified form of proteomics that combines MS with computational prediction (Fig. 2D) (Ma et al., 2016; Matsumoto et al., 2017; Slavoff et al., 2013). Peptides are usually concentrated with a molecular weight cutoff (MWCO) filter or by sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) to reduce the background signal from larger annotated proteins. MS/MS spectra are then compared with a custom peptide database. This database comprises all possible short ORFs predicted from cDNA sequences in public databases and from RNA-sequencing data sets. Matched peptides corresponding to known annotated proteins or peptides are discarded, and the remaining unique peptides are annotated as novel hidden peptides (Fig. 2D).
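The database-matching logic of peptidomics can be sketched under simplifying assumptions:

- a search engine has already assigned peptide sequences to the MS/MS spectra;
- trypsin cleaves after K or R unless the next residue is P;
- missed cleavages are ignored.

Candidate sORF products are digested in silico, and observed peptides that also arise from annotated proteins are discarded. All function names and the minimum peptide length are illustrative.

```python
import re


def tryptic_peptides(protein: str, min_len: int = 6) -> set[str]:
    """Simplified in silico trypsin digest: cut after K/R unless followed by P; no missed cleavages."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)  # zero-width split (Python 3.7+)
    return {p for p in pieces if len(p) >= min_len}


def novel_hits(observed, candidate_sorfs, annotated_proteins, min_len=6):
    """Map observed peptides that derive only from candidate sORFs to the sORF ids they match."""
    annotated = set()
    for prot in annotated_proteins:
        annotated |= tryptic_peptides(prot, min_len)
    index = {}  # tryptic peptide -> candidate sORF ids that produce it
    for orf_id, seq in candidate_sorfs.items():
        for pep in tryptic_peptides(seq, min_len):
            index.setdefault(pep, []).append(orf_id)
    # Keep peptides explained by a candidate sORF but not by any annotated protein.
    return {pep: sorted(index[pep]) for pep in observed
            if pep in index and pep not in annotated}
```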
In Drosophila embryos, the specialized cytoplasm in the posterior region of the egg, referred to as pole plasm, is essential for the formation of germ cells. Pole plasm contains electron-dense structures known as polar granules that become incorporated into the germ cell precursors known as pole cells. An RNA molecule, designated pgc, was found to be localized to polar granules and was originally identified as an untranslatable RNA on the basis of the absence of conserved ORFs for products of >100 amino acids (Nakamura et al., 1996). However, completion of the Drosophila genomic sequence revealed an error in the original pgc sequence and thereby led to the identification of an ORF encoding a 71–amino acid polypeptide that was highly conserved among 12 Drosophila species (Hanyu-Nakamura et al., 2008). Immunostaining with antibodies to the Pgc polypeptide revealed its localization in pole cells.
In many animal species, repression of RNAPII-dependent transcription in primordial germ cells is a common mechanism that maintains these cells by preventing their somatic differentiation. This transcriptional repression is associated with loss of phosphorylation of the COOH-terminal domain (CTD) of RNAPII. The Pgc polypeptide was found to interact with positive transcription elongation factor b (P-TEFb), which phosphorylates the RNAPII CTD. In Pgc-deficient pole cells, CTD phosphorylation was not suppressed (Hanyu-Nakamura et al., 2008; Martinho et al., 2004). Although Pgc had no apparent effect on the kinase activity of P-TEFb, it inhibited the recruitment of P-TEFb to transcription sites and thereby prevented CTD phosphorylation (Fig. 3A).
Fig. 3. Molecular mechanisms of action for peptides encoded by lncRNAs. (A) Pgc interacts with P-TEFb and thereby inhibits its recruitment to transcription sites, resulting in attenuation of CTD phosphorylation and RNAPII-dependent transcription. (B) The binding of Pri to Ubr3 promotes ubiquitylation of the NH2-terminal region of Svb by Ubr3 and consequent proteasome-dependent processing of the long inactive form of Svb to the short active one. (C) Apela functions as an agonist for the G protein–coupled receptor Aplnr. (D) MLN and Dworf associate with and modulate the Ca2+ pump activity of SERCA, and they thereby regulate the Ca2+ concentration of the SR. (E) SPAR binds to the v-ATPase complex at the lysosomal membrane and inhibits amino acid–dependent mTORC1 activation through regulation of the v-ATPase–Ragulator supercomplex.
An mRNA-like lncRNA, designated pri, was identified in Drosophila melanogaster as a poly(A)-containing RNA with no ORFs for products of >100 amino acids (Inagaki et al., 2005). The pri RNA is expressed in seven stripes during early development and in metameric and bilateral clusters during mid-embryogenesis. Mutant flies deficient in pri RNA die during embryogenesis. The embryos show aberrant tracheal architecture and a smooth cuticle lacking trichomes, the phenotype that gave the gene its name, polished rice (pri). Five small and highly conserved ORFs were detected in the pri RNA sequence, and ORFs 1 to 4 encode products that contain the amino acid sequence LDPTGQY or LDPTGTY. Translation of these four ORFs, but not of ORF 5, was detected with a series of constructs in which the sequence for green fluorescent protein was substituted for each ORF in the full-length transcript. ORFs 1 to 3 encode peptides of 11 amino acids, whereas ORF 4 encodes a peptide of 32 amino acids. The phenotype of pri-deficient mutant flies was completely rescued by introduction of any one of ORFs 1 to 4, suggesting that the shared amino acid sequence is responsible for pri function (Kondo et al., 2007).
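The rescue experiments point to the shared motif, so a simple way to check which candidate ORF products carry it is a regular-expression scan. In the sketch below, the pattern `LDPTG[QT]Y` encodes the two variants named in the text. The function name is hypothetical, and the caller supplies the peptide sequences; the actual Pri peptide sequences are not reproduced here.

```python
import re

# Shared Pri motif as described in the text: LDPTGQY or LDPTGTY.
PRI_MOTIF = re.compile(r"LDPTG[QT]Y")


def motif_hits(peptides: dict[str, str]) -> dict[str, list[int]]:
    """Map each peptide name to the 0-based start positions of the motif in its sequence."""
    return {name: [m.start() for m in PRI_MOTIF.finditer(seq)]
            for name, seq in peptides.items()}
```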
The peptides encoded by ORFs 1 to 4 of pri RNA were shown to regulate the transcription factor encoded by shavenbaby (svb) (Kondo et al., 2010). Expression of Svb target genes is greatly attenuated in pri-deficient mutants, and mutation of svb results in an aberrant trichome pattern (Payre et al., 1999; Sucena et al., 2003; Zanet et al., 2015). Svb is translated as a long inactive form, and Pri peptides promote truncation of its NH2-terminal region to generate the short active form of the protein (Kondo et al., 2010).
A genome-wide RNA interference (RNAi) screen suggested that the ubiquitin-proteasome system contributes to the Pri-dependent processing of Svb. The NH2-terminal region of Svb undergoes ubiquitylation mediated by Ubr3, followed by proteasome-dependent processing that converts the long inactive form of Svb to the short active one. Pri binds to Ubr3, and this interaction promotes the association of Ubr3 with the NH2-terminal region of Svb (Fig. 3B). Ubr3 is able to recognize other substrates in the absence of Pri, suggesting that Pri is required for the substrate selectivity of Ubr3 toward Svb (Zanet et al., 2015).
(3) Apela
Apela (also known as ELABELA, Ende, and Toddler) is a hormonal peptide that acts as an agonist for the G protein–coupled receptor Aplnr (Apelin receptor). The Apela transcript had been annotated as a lncRNA in zebrafish (ENSDARG00000094729), mouse (Gm10664), and human (LOC100506013), but it was found to be a peptide-coding mRNA in zebrafish (Chng et al., 2013; Pauli et al., 2014). Apela consists of 54 amino acids, including a signal peptide sequence, and is highly conserved among vertebrates. Loss of Apela in zebrafish results in embryonic death. The mutant embryos show failure of heart formation, posterior accumulation of blood cells, malformation of pharyngeal endoderm, and abnormal left-right positioning and formation of the liver. These phenotypes resemble those of Aplnr mutants (Scott et al., 2007; Zeng et al., 2007), supporting the function of Apela as an agonist for Aplnr (Fig. 3C).
About half of Apela-deficient mice also die during embryogenesis as a result of hypovascularity of the yolk sac and placenta as well as embryonic vascular malformations (Ho et al., 2017), which are again similar to the phenotypes of Aplnr-deficient mice (Kang et al., 2013). Although Aplnr is expressed in the yolk sac mesoderm that gives rise to endothelial cells, the Apela mRNA is not detectable in the endothelial precursors of the yolk sac. In contrast, Apela is expressed in the chorionic trophoblast of the developing placenta. The Apela peptide is secreted from the placenta and circulates in the blood of dams. Apela-deficient pregnant mice manifest preeclampsia-like signs including proteinuria and hypertension, indicating that Apela is a pregnancy hormone secreted by the developing placenta. Moreover, this preeclampsia-like condition is ameliorated by infusion of a recombinant Apela peptide, suggesting that Apela might have clinical applications in human pregnancy (Ho et al., 2017).
(4) SERCA regulators
The myoregulin (MLN) transcript was originally identified as a skeletal muscle–specific lncRNA conserved among vertebrates (LINC00948 in human and 2310015B20Rik in mouse), but it contains a conserved ORF for a 46–amino acid peptide (Anderson et al., 2015). Translation of MLN RNA was confirmed by both in vitro translation and in vivo FLAG-tag knock-in assays. The predicted ORF product is a type II single-pass transmembrane peptide that shows substantial structural similarity to phospholamban (PLN) and sarcolipin (SLN), both of which are also type II single-pass transmembrane peptides. PLN and SLN interact with and thereby regulate the Ca2+ pump activity of sarco/endoplasmic reticulum Ca2+-ATPase (SERCA) in the sarcoplasmic reticulum (SR) membrane. Such regulation is important for muscle performance, and its impairment can give rise to cardiovascular disease (Kranias and Hajjar, 2012). The MLN peptide is also localized at the SR membrane. There it associates with SERCA and regulates the Ca2+ concentration of the SR by inhibiting SERCA pump activity (Fig. 3D). Mice deficient in MLN manifest enhanced Ca2+ handling and skeletal muscle performance (Anderson et al., 2015).
The Dworf peptide, which also regulates SERCA pump activity, was originally identified as the putative product of a lncRNA (LOC100507537 in human and NONMMUG026737 in mouse) (Nelson et al., 2016). Immunoblot analysis with antibodies to the encoded peptide sequence confirmed production of Dworf, which is expressed specifically in heart and soleus muscle. Dworf is a transmembrane peptide that localizes to the SR membrane, where it binds to SERCA and promotes its pump activity (Fig. 3D). The role of Dworf in vivo was examined in two mouse models: Dworf-deficient mice, and mice overexpressing Dworf under control of the cardiomyocyte-specific α-myosin heavy chain gene promoter. SERCA activity was increased in the hearts of Dworf-overexpressing mice. Conversely, the affinity of SERCA for Ca2+ was slightly reduced in the hearts of Dworf-deficient mice. Consistent with Dworf being most highly expressed in soleus, this decline in affinity was more prominent in soleus than in the heart of Dworf-deficient mice. Likewise, in keeping with the lack of Dworf expression in quadriceps, SERCA affinity in this muscle was unchanged in Dworf-deficient mice (Nelson et al., 2016).
A bioinformatics approach identified two additional SERCA-inhibiting peptides, endoregulin (ELN) and another-regulin (ALN). Both are translated from putative lncRNAs (1110017F19Rik/SMIM6 and 1810037I17Rik, respectively, in mouse) and contain a SERCA-binding motif (Anderson et al., 2016). Whereas MLN, PLN, and SLN are abundant in muscle, ELN and ALN are expressed in nonmuscle tissues, suggesting that a common mechanism may control intracellular Ca2+ dynamics in both muscle and nonmuscle tissues.
(5) SPAR
To identify novel peptides encoded by putative lncRNAs, we adopted a proteomics approach. We selected a 90–amino acid polypeptide translated from human LINC00961 for further characterization because of its high sequence conservation between human and mouse. We termed this polypeptide Small regulatory Polypeptide of Amino Acid Response (SPAR, also known as SPAAR). Immunoblot analysis with antibodies to SPAR verified expression of the endogenous polypeptide in both a human cell line and mouse tissue. SPAR possesses a single putative transmembrane domain and is localized at the lysosomal membrane. Immunoprecipitation followed by MS identified SPAR-interacting proteins, including the vacuolar-type H+-ATPase (v-ATPase) complex, whose proton pump activity is required for lysosomal acidification. v-ATPase also regulates activation of mammalian target of rapamycin complex 1 (mTORC1) in response to amino acid stimulation through interaction with the Ragulator complex at the lysosome (Bar-Peled and Sabatini, 2014). Notably, SPAR had no obvious effect on lysosomal acidification but inhibited amino acid–induced mTORC1 activation through regulation of the v-ATPase–Ragulator supercomplex (Fig. 3E) (Matsumoto et al., 2017).
To evaluate the biological functions of SPAR in vivo, we have generated SPAR-deficient mice. These animals manifest no apparent developmental abnormalities, but their skeletal muscle shows enhanced regeneration capacity after acute injury as well as enhanced mTORC1 activation. These observations are consistent with the abundance of Spar mRNA in skeletal muscle of wild-type mice as well as with the fact that amino acid–induced mTORC1 activation is required for regeneration after muscle injury, and they suggest that mTORC1 activation is precisely regulated by the SPAR polypeptide in a tissue-specific manner (Matsumoto et al., 2017).
Analysis of published ribosome footprinting data sets identified 37 circRNAs in Drosophila with footprint reads spanning circRNA-specific junctions, suggesting that these circRNAs are actually translated (Pamudurti et al., 2017). Translation of one of these circRNAs, circMbl, was confirmed by forced expression of an intron-exon-intron minigene in both Drosophila S2 cells and transgenic flies. In this minigene, the V5 tag sequence is placed upstream of the initiating methionine codon. It therefore becomes joined in frame to the COOH-terminus of the predicted product only after circularization, so a V5-tagged polypeptide can be produced only from the circular RNA. By contrast, no translation was observed with corresponding constructs for circHaspin and circCamK1, neither of which was represented in the ribosome footprinting data sets. As mentioned above, the absence of a 5' cap implies that circRNAs are translated in a cap-independent manner (Fig. 1B). Indeed, expression of the V5-tagged polypeptide encoded by circMbl was not affected by overexpression of 4E-BP, a protein that inhibits cap-dependent translation. Moreover, both in vivo and in vitro assays revealed that the untranslated region of circMbl supports cap-independent translation, suggesting the presence of a putative IRES element (Pamudurti et al., 2017).
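The key evidence here is a footprint read that crosses the circRNA-specific junction. This sequence exists in the circle but not in the linear genome or in linear transcripts. The sketch below tests whether a read matches across the back-splice junction of a circRNA, given the circRNA's exon sequence, and requires a minimum overhang on each side. It uses exact string matching as a stand-in for a real read aligner. The function name and the `min_overhang` default are assumptions.

```python
def spans_backsplice(read: str, circ_exons: str, min_overhang: int = 5) -> bool:
    """True if `read` matches across the back-splice junction of a circRNA.

    `circ_exons` is the circRNA's exon sequence written 5'->3'; in the circle
    its last base is joined to its first base.
    """
    n = len(circ_exons)
    if len(read) > n:
        return False  # reads longer than the circle are ignored in this sketch
    # Append the start of the circle so that junction-spanning matches are visible.
    doubled = circ_exons + circ_exons[:len(read) - 1]
    start = doubled.find(read)
    while start != -1:
        left = n - start          # read bases upstream of the junction
        right = len(read) - left  # read bases downstream of the junction
        if left >= min_overhang and right >= min_overhang:
            return True
        start = doubled.find(read, start + 1)
    return False
```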
Although expression of the endogenous peptide encoded by circMbl was barely detectable in total fly lysates, immunoblot analysis revealed a band of the expected size (slightly smaller than 10 kDa) in a synaptosomal fraction prepared from fly heads (Pamudurti et al., 2017). The signal intensity for this band was increased by starvation, which promotes cap-independent translation, or by overexpression of FOXO, which inhibits cap-dependent translation. These results thus suggest that the endogenous peptide encoded by circMbl is indeed produced by translation in a cap-independent manner (Pamudurti et al., 2017).
(2) circZNF609
Certain circRNAs conserved between human and mouse are highly expressed in myoblasts of both species, and their expression increases further during myoblast differentiation. A similar trend is observed for other circRNAs during neuronal differentiation, whereas the expression levels of a specific subset of circRNAs in myoblasts are altered in individuals with Duchenne muscular dystrophy (DMD) (Rybak-Wolf et al., 2015). Using criteria such as phylogenetic conservation and expression level, 29 circRNAs were selected for further characterization. The candidates came from circRNAs differentially expressed during human myoblast differentiation or in DMD. RNAi-based functional screening of these 29 circRNAs highlighted circZNF609, whose depletion inhibited human myoblast proliferation. Whereas circZNF609 is down-regulated during myogenesis, it is up-regulated in myoblasts of DMD patients. This suggests that its overexpression might contribute to the delayed myoblast differentiation observed in these patients (Legnini et al., 2017).
A predicted ORF of 753 nucleotides spanning the back-splice junction has been identified in circZNF609. Its protein-coding potential was validated with an expression vector in which the 3×FLAG tag sequence is placed upstream of the initiating methionine codon. The tag becomes fused in frame to the COOH-terminus of the predicted protein only after back-splicing, so the 3×FLAG-tagged protein can be produced only from the circular RNA. Expression of the 3×FLAG-tagged protein was induced by heat shock, consistent with the activation of cap-independent translation by various external stimuli (Legnini et al., 2017).
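Because a circRNA has no end, a reading frame can run through the back-splice junction, as the 753-nucleotide ORF of circZNF609 does. The following sketch starts at a given ATG and follows codons around a circular sequence until it reaches a stop codon. It then reports the ORF length and whether the ORF crosses the junction. It assumes the standard stop codons and caps the search at a few laps, because a circle without an in-frame stop never terminates. `circular_orf` and `max_laps` are illustrative names.

```python
STOPS = {"TAA", "TAG", "TGA"}


def circular_orf(circ_seq: str, start: int, max_laps: int = 3):
    """Follow codons from an ATG at `start` around a circular RNA until a stop codon.

    Returns (length_nt_including_stop, crosses_junction), or None if no stop
    is reached within `max_laps` passes around the circle.
    """
    seq = circ_seq.upper().replace("U", "T")
    n = len(seq)
    if not 0 <= start < n:
        raise ValueError("start must lie within the circle")
    unrolled = seq * (max_laps + 1)  # linear copy long enough for codons to wrap
    if unrolled[start:start + 3] != "ATG":
        raise ValueError("no ATG at start")
    pos = start
    while pos + 3 <= start + max_laps * n:
        if unrolled[pos:pos + 3] in STOPS:
            end = pos + 3
            # The junction lies between seq[n - 1] and seq[0], i.e. at offset n
            # in the unrolled copy; an ORF ending past it has crossed the junction.
            return end - start, end > n
        pos += 3
    return None
```

A tag placed upstream of the ATG in a linear construct behaves like the wrapped codons here: it is read in frame only when the sequence is closed into a circle.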
Peptide-encoding pri-miRNAs have recently been identified in plants (Lauressergues et al., 2015), consistent with the fact that pri-miRNAs are transcribed by RNAPII. In M. truncatula, miR171b regulates lateral root formation, and pri-miR171b contains two putative ORFs encoding peptides of 20 and 5 amino acids. To test whether these ORFs are translated, a β-glucuronidase reporter gene was fused to each ATG start codon. Translation was detected only when the reporter was fused to the start codon of the 20–amino acid ORF. Endogenous expression of the encoded 20–amino acid peptide, designated miPEP171b, was detected by both immunoblot analysis and immunostaining with specific antibodies, and expression was apparent at lateral root initiation sites. Another pri-miRNA–encoded peptide, miPEP165a, was identified in A. thaliana. Its amino acid sequence is highly conserved in Brassicales, and immunoblot analysis with antibodies to miPEP165a confirmed its endogenous expression (Lauressergues et al., 2015).
miPEP171b and miPEP165a have similar functions. Treatment of M. truncatula roots with synthetic miPEP171b increased endogenous expression of miR171b and reduced lateral root density, a phenotype that was mimicked by forced expression of miR171b. This effect was specific: miPEP171b did not affect the abundance of other miRNAs. Treatment of A. thaliana seedlings with miPEP165a likewise resulted in accumulation of miR165a. Cotreatment with cordycepin, an inhibitor of RNA synthesis, attenuated this induction of miR165a, indicating that miPEPs enhance transcription of their own pri-miRNAs (Lauressergues et al., 2015).
We have here described examples of biologically relevant peptides encoded by lncRNAs, circRNAs, and pri-miRNAs. Although not covered by this review, many other ncRNA-encoded (“hidden”) peptides have been identified. More such peptides are expected to exist, but their detection is technically difficult. Short peptides yield few fragments on trypsin digestion and are therefore poorly suited to analysis by MS. The development of techniques tailored to the identification of such short peptides is thus awaited. The biological activities of hidden peptides might also provide a basis for novel therapeutics, including for diseases that are currently intractable.
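The detection problem described above can be made concrete with an in silico digest. A short peptide may produce no tryptic fragment within the length range that MS identifies well. The sketch assumes cleavage after K or R except before P, no missed cleavages, and an illustrative window of 7 to 25 residues. These parameters, and the function name, are assumptions rather than values from the text.

```python
import re


def detectable_tryptic_peptides(protein: str, min_len: int = 7, max_len: int = 25) -> list[str]:
    """Tryptic fragments (cut after K/R, not before P) within an assumed MS-friendly length window."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)  # zero-width split (Python 3.7+)
    return [p for p in pieces if min_len <= len(p) <= max_len]
```

Under these assumptions, a 14-residue sequence rich in K and R can yield nothing in the window. A longer protein, by contrast, typically contributes many fragments, which is why short hidden peptides are easily missed.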
We apologize to researchers whose work is not cited in this review as a result of space constraints. This work was supported by JST PRESTO program JPMJPR14MB (to A.M.) as well as by JSPS/MEXT KAKENHI grants 25221303 and 17H06301 (to K.I.N.).