Proceedings of the Japan Academy, Series B
Online ISSN : 1349-2896
Print ISSN : 0386-2208
ISSN-L : 0386-2208
Reviews
Discovery of m7G-cap in eukaryotic mRNAs
Yasuhiro FURUICHI

2015 Volume 91 Issue 8 Pages 394-409

Abstract

Terminal structure analysis of an insect cytoplasmic polyhedrosis virus (CPV) genome RNA in the early 1970s at the National Institute of Genetics in Japan yielded a 2′-O-methylated nucleotide at the 5′ end of the double-stranded RNA genome. This finding prompted me to add S-adenosyl-L-methionine, a natural methylation donor, to the in vitro transcription reaction of viruses that contain RNA polymerase. This effort resulted in unprecedented mRNA synthesis that generates a unique blocked and methylated 5′ terminal structure (later referred to as “cap” or “m7G-cap”) in the transcription of silkworm CPV, human reovirus and vaccinia virus, all of which contain RNA polymerase in their virus particles. Initial studies with viruses paved the way to the discovery of the 5′-cap m7GpppNm structure generally present in cellular mRNAs of eukaryotes. I participated in those studies and was able to explain the pathway of cap synthesis and the significance of the 5′ cap (and capping) in gene expression processes, including transcription and protein synthesis. In this review article I concentrate on the description of these initial studies that eventually led us to a new paradigm of mRNA capping.

I. Introduction

Cap structures of the type m7GpppNmpNm are at the 5′ ends of nearly all eukaryotic cellular and viral mRNAs (Fig. 1).1)

Fig. 1.

Cap structure. m7GpppNmpNm- (Cap 2), representing the m7G linked to the 5′-end of the primary transcript via a 5′-5′ triphosphate.

The cap is added to cellular mRNA precursors made by RNA polymerase II and to transcripts of viruses that replicate in the nucleus. It is added during the initial phases of transcription and before other RNA processing events, including internal N6A methylation, 3′-poly(A) addition and exon splicing.2),3) Most capped mRNAs have a single methyl group on the terminal G residue at the N-7 position, whereas adjacent nucleotides can be 2′-O-methylated to different extents, providing a basis for cap nomenclature: m7GpppN (Cap 0), m7GpppNm (Cap 1), and m7GpppNmpNm (Cap 2).4) Methyl groups in Cap 0 and Cap 1 structures are added in the nucleus; additional 2′-O-methylation in Cap 2 is a cytoplasmic event. In many lower eukaryotes, including yeast, mRNAs contain mainly Cap 0, and higher organisms usually have more extensively methylated caps.1)–3) Despite variations in the extent of methylation, the important biological consequences of a cap structure appear to correlate with the N7-methyl on the 5′-terminal G and the two pyrophosphoryl bonds that connect m7G in a 5′-5′ linkage to the first nucleotide of mRNA. Caps increase mRNA stability by protecting against 5′→3′ exonucleolytic degradation.5)–7) Splicing accuracy and efficiency are both increased by the presence of 5′-terminal m7GpppN, possibly because of the participation of nuclear cap-binding protein(s) analogous to the cytoplasmic cap-binding initiation factor required for eukaryotic protein synthesis.8)–10) The 5′-terminal m7GpppN may also participate in other critical steps in gene expression, including transport from nucleus to cytoplasm.
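The cap nomenclature above can be restated as a small classifier. The following Python sketch is only an illustration of the naming rule, not part of the original studies: the function name `classify_cap` and the regular expression are assumptions introduced here, and the parser understands only the simple shorthand m7GpppN(m)pN(m), not forms such as m6Am.

```python
import re

# Shorthand used in this review: "m7GpppN" (Cap 0), "m7GpppNm" (Cap 1),
# "m7GpppNmpNm" (Cap 2). A trailing "m" after a nucleotide marks
# 2'-O-methylation. N may be A, C, G, U or the placeholder N.
# Hypothetical pattern; base-modified forms such as m6Am are not handled.
CAP_PATTERN = re.compile(r"^m7Gppp([ACGUN])(m?)(?:p([ACGUN])(m?))?")


def classify_cap(five_prime_end: str) -> str:
    """Return 'Cap 0', 'Cap 1' or 'Cap 2' for a 5' end written in shorthand."""
    match = CAP_PATTERN.match(five_prime_end)
    if match is None:
        raise ValueError(f"not an m7G-capped 5' end: {five_prime_end!r}")
    first_methylated = match.group(2) == "m"
    second_methylated = match.group(4) == "m"
    if first_methylated and second_methylated:
        return "Cap 2"
    if first_methylated:
        return "Cap 1"
    return "Cap 0"
```

For instance, `classify_cap("m7GpppAmp-")` would report the CPV mRNA terminus as Cap 1, and an uncapped end such as "pppA" raises a `ValueError`.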

The 5′ cap and 3′-poly (A) are hallmark structures of eukaryotic mRNAs and are not present in rRNA, tRNA or prokaryotic mRNAs. These two structures have been the subjects of many review articles on their biological significance. Since their discovery in the early 1970s, the 5′- and 3′-terminal elements have contributed to the progress of molecular biology in many important ways including their utilization to distinguish and isolate various eukaryotic mRNAs. This article describes how the cap structure m7GpppNm was discovered, how the cap is synthesized, and what kind of biological functions the cap has in eukaryotic cellular events.

II. Historical view of the discovery of cap structure

A. Prologue.

Before the identification of the cap in several viral mRNAs in 1975, the 5′ ends of all eukaryotic mRNAs, like those of Escherichia coli and bacteriophage mRNA, were generally believed to be triphosphorylated pppN forms, although no eukaryotic mRNA had been reported as 5′ end-labeled by the standard, polynucleotide kinase method. Large-scale isolation of mRNA from eukaryotic cells was technically difficult, and the prevailing biochemical studies on E. coli and its phage had clearly shown that the mRNAs were triphosphorylated. E. coli mRNA contained pppA and pppG 5′ ends, and all RNA phages, such as MS2, Q-beta, and R17 contained a 5′-triphosphorylated purine nucleoside in the genomic positive-strand RNA (and thus mRNA) extracted from purified particles. In contrast, rRNAs and tRNAs, which are abundant in both eukaryotic and bacterial cells, had been shown to contain monophosphorylated 5′ ends.11) Modified methylated nucleotides were believed to be present only in tRNA and rRNA. Here, like other important biological findings made in the 1970s, including 3′-polyadenylation, splicing, and reverse transcription, viral systems had a pivotal role in defining the structure of the cap and the biological significance of the cap in the 5′ terminus of mRNA.

B. Stage 1: Identification of 2′-O-methylated nucleotides at the 5′ ends of CPV genomes.

At the National Institute of Genetics in Japan, Kin'ichiro Miura, a senior co-researcher at the institute, and I tried to define the 5′- and 3′-terminal sequences of the dsRNA of cytoplasmic polyhedrosis virus (CPV), which infects the silkworm Bombyx mori. Our initial interest was to understand how the 10 genomic segments of CPV could be correctly arranged within the viral particle (Fig. 2A).12),13) We divided the experiments in half: 3′ sequencing by me and 5′ sequencing by Miura. The 3′ sequencing was done by a classical method comprising a series of chemical reactions, with stepwise reductive labeling of 3′-terminal nucleotides by 3H-borohydride after converting the cis-diol ribose moiety to two aldehyde residues by periodate-mediated oxidation (Fig. 2B). The 3H-labeled 3′-terminal nucleoside was cleaved by ribonuclease and analyzed by two-dimensional paper chromatography, which can identify nucleoside derivatives (A′, C′, G′ and U′) with modified ribose residues. The method can also label the penultimate nucleotide after removal of the 3′ terminal nucleotide by a β-elimination reaction with aniline and subsequent removal of the 3′-phosphate by alkaline phosphatase. These efforts yielded identical terminal sequences of   

for each dsRNA segment that was separated by polyacrylamide gel electrophoresis (Fig. 2A).12) Unexpectedly, however, nearly 30% of the 3H radioactivity was always incorporated into an undefined component of dsRNA, referred to as non-nucleoside material (NNM), while the remaining 3H counts were evenly distributed between the two terminal nucleosides C′ and U′ that were identified after digestion of labeled RNA by ribonucleases (Fig. 2B and C).13)

Fig. 2.

CPV dsRNA genome segments and the 3′-terminal analysis. A: Polyacrylamide gel electrophoresis of CPV genome dsRNA segments. B: 3′ 3H-labeling of CPV genome dsRNAs by periodate oxidation followed by [3H]-NaBH4. C: Distribution of 3H-radioactivity in 3′-terminal nucleosides and NNM (non-nucleoside material). Parts of the data are reprinted from the paper by Furuichi and Miura (1973).13)

The NNM could have been dismissed as an artifact, but I analyzed its properties intensively. The NNM (which we later proved to be Cap 1) was resistant to treatments with alkali, protease, DNase, and ribonucleases, but was sensitive to snake venom phosphodiesterase as characterized by paper chromatography. It had a net negative charge of −4.5 at neutral pH and a size of 5.4 Å as determined by DEAE-cellulose column chromatography in the presence of 7 M urea.13)

Miura, however, had difficulty labeling the 5′ terminal nucleotides (possibly G and A, based on the predicted end-to-end duplex structure of CPV dsRNA) by treatment with polynucleotide kinase and [γ-32P]ATP.14) Labeling was achieved, however, with dsRNA whose 3′-terminal nucleotides C and U had been removed by sequential oxidation (by sodium periodate), β-elimination (by aniline), and dephosphorylation (by alkaline phosphatase). The processed RNA yielded a clear-cut result of 3′-terminal C, for both strands, in the penultimate positions to the terminal C and U, indicating that the 3′-terminal sequences of CPV RNA are ---pCpC and ---pCpU. Curiously, the NNM that was labeled by 3H in the first round of 3′ terminal nucleoside analysis was eliminated after the second 3H-labeling of the penultimate C. The dsRNA that was shortened by the removal of the terminal nucleosides C and U provided a far better substrate for 5′-32P-labeling by polynucleotide kinase and [γ-32P]-labeled ATP than the unprocessed dsRNA.

These results remained unclear until the end-to-end base-paired structure of the CPV dsRNA genome was clarified later and one of the 5′ termini was found to be protected by the m7G-cap. The capped 5′ end provides an additional 2′-3′ cis-diol moiety of ribose, besides the two 3′ terminal riboses, all of which are susceptible to periodate oxidation and the β-elimination reaction by aniline.

These were indeed enigmatic observations at the time. As it turned out, however, the 3H-labeling by NaBH4 took place on the ribose moiety of the m7G-cap to yield 3H-NNM, and the removal of the cap and the 3′ terminal nucleotides converted the 5′ ends of CPV dsRNA to a one-base-overhang structure, which facilitated phosphorylation of 5′ pAm and pG by polynucleotide kinase (Fig. 3).

Fig. 3.

Reaction scheme employed for analysis of 5′ and 3′ terminal structures of CPV genome RNA and retrospective identification of NNM and cap structure. (I): CPV genome RNA. (II): Periodate-oxidized RNA, in which the 2′ and 3′ OH were oxidized to aldehydes. (III): [3H]-NaBH4-reduced RNA, which contains 3H-labeled CH2OH (shown in blue). The shaded structure later turned out to be NNM. (IV): CPV RNA produced from (III) by aniline treatment, which removes the ribose-oxidized nucleoside. (V): Alkaline phosphatase-treated CPV RNA (IV), which is devoid of sensitive phosphates. (VI): 5′-32P-labeled CPV RNA, which was phosphorylated by polynucleotide kinase and [γ-32P]ATP (the radioactive phosphates are shown in red).

Fig. 3 depicts the reaction scheme employed for analysis of the terminal structure of CPV RNA, together with a retrospective explanation for the outcome of the individual reactions.

Subsequently, the 5′-nucleotides of CPV RNA were found to have 5′ pG and 5′ pA that made a perfect match with the complementary strand in the form:   

However, Miura was puzzled by the slight difference in mobility of 32P-labelled pA* compared with authentic pA in the two-dimensional paper chromatography system used to identify the 32P-labelled 5′-nucleotides: the pA* obtained by P1 nuclease digestion migrated slightly faster than the authentic marker pA. Further analyses of 5′ oligonucleotides resulting from the 5′-32P-labelled RNA after digestion by guanine-specific RNase T1, pyrimidine-specific pancreatic ribonuclease A or nonspecific ribonuclease T2 indicated that pG was the penultimate nucleotide to both 5′ pA* and 5′ pG, establishing pA*pG and pGpG as the 5′ sequences of CPV dsRNA segments. More importantly, the results showed that the 2′-OH of the ribose in pA* of pA*pG was modified, because the phosphodiester linkage between pA* and pG could not be digested by RNase T2, which requires a free 2′-OH for 2′,3′-cyclic phosphate formation as an RNA hydrolysis intermediate.

Because these were the days when chemistry was used to identify the modified bases often found in tRNA, we readily obtained authentic 2′-O-methyl-pA to compare its chromatographic mobility with that of the 32P-labeled pA*. Both migrated to the identical position, showing clearly that A* was 2′-O-methyl-adenosine. This finding indicated for the first time that virus RNAs, such as genomic CPV RNA, contain methylated residues.14) Accordingly, CPV genome RNA was considered to be in an end-to-end structure:   

A similar study by Miura et al.15) showed that human reovirus dsRNA genomes also contained a 2′-O-methylated nucleotide in the 5′ terminal sequence of pGmpCp in one of the strands. Despite the exciting findings of 2′-O-methylated nucleotides in the 5′ sequence of virus RNAs, the NNM in CPV dsRNA and its structural relationship to the 5′ pAmpGp remained unclear, although apparently NNM somehow protected the strand containing pAmpGp from phosphorylation by polynucleotide kinase and from end-labelling.

C. Stage 2: Identification of mRNA strands formed in vitro by CPV transcriptase.

Viruses in the Reoviridae family containing dsRNA, including the prototype human reovirus, contain RNA polymerase, which conservatively copies the plus strands in duplex genomic RNAs to produce viral mRNAs. Thus, one of the two strands comprising the dsRNA has the same polarity as mRNA. Synthesis of viral mRNAs by in vitro transcription using purified reovirus had already been demonstrated.16) Shimotohno et al.17),18) examined which of the two strands of CPV dsRNA had the same polarity as mRNA, and they found that CPV RNAs synthesized in vitro by virus-associated RNA polymerase had the same polarity as the pAmpGp-strand. Those CPV transcripts made in vitro were labeled by [γ-32P]ATP in the 5′-triphosphorylated form, 32p-ppApGp-, suggesting that the strand with 5′-pAmpG in the genomic dsRNA has the same polarity as virus mRNA.   

As a result, two questions were raised: 1) when is the methyl group in 5′-pAmpG added, and 2) is the methyl group found in the genomic RNA added during mRNA synthesis or during genome RNA synthesis, for example, to assemble segmented genomes?

D. Stage 3: “Methylation-coupled transcription” in CPV mRNA synthesis that uses S-adenosyl-L-methionine.

In the fall of 1973, I tested the idea that CPV transcriptase may require the methyl donor S-adenosyl-L-methionine (AdoMet), considering that methylation of mRNA might be a prerequisite for natural viral mRNA synthesis in infected silkworm cells. The results were indeed surprising, far beyond expectation, because the CPV transcriptase activity was stimulated nearly 100-fold, and mRNA synthesis appeared to proceed in an AdoMet-dependent manner (Fig. 4).19)

Fig. 4.

Stimulation of CPV mRNA synthesis by S-adenosyl-L-methionine. Data, with slight modifications, are taken from the paper by Furuichi (1974).19)

In addition, an experiment that used [3H-methyl]-labeled AdoMet clearly showed for the first time that virus mRNAs were methylated. The methylation of mRNA appeared to occur at the initiation of transcription and a limited number of methyl groups from [3H] methyl-labeled AdoMet were incorporated into each molecule of CPV RNA.19) One methyl group each was incorporated into the 2′-OH residue of 5′ terminal pA and into NNM that links to the 5′-terminal pAm in a structure of NNM∼pAm (representing the pAm-containing NNM). When [γ-32P]ATP was included in the CPV transcription in the presence of AdoMet, the product mRNA did not contain the 32P-labeled phosphate, in contrast to the 5′-pppA containing mRNA synthesized in the absence of AdoMet, indicating that the γ phosphate of 5′-terminal 32pppA was somehow eliminated when AdoMet was included in the transcription reaction mixture.19)

By contrast, the radioactive β phosphate in [β-32P]ATP remained in the newly synthesized mRNA and was found in the 5′ NNM∼pAm. This finding later helped me to deduce the overall mechanism of cap synthesis. In fact, the 32ppA moiety of p-32ppA was incorporated into the CPV mRNA in the form of NNM∼32ppAm, in which the 32P-labeled phosphate was resistant to alkaline phosphatase. I reported some of these unique observations as “Methylation-coupled transcription” in volume 1 of Nucleic Acids Research in 1974.19) Since this report, methylation in mRNA synthesis has quickly become popular, and AdoMet has been added in many laboratories as a regular component of the in vitro transcription system of cells and viruses that contain RNA polymerase.20)–22)

At the same time, Perry and Kelley23) reported the presence of 2.2 residues of methylated nucleotides in every 1000 residues in mRNA containing poly(A) prepared in vivo from mouse L cells cultured in the presence of [3H-methyl]methionine. Here, earlier findings of 3′-poly (A) in vaccinia virus and cellular mRNAs, and an invention of a purification method for mRNAs containing poly (A) by oligo(dT)-cellulose column chromatography greatly facilitated isolation and analysis of cellular mRNAs. Desrosiers et al.24) reported that mRNAs from Novikoff hepatoma cells contain m7G, 2′-O-methylated nucleotides and m6Ap, suggesting that the presence of methylated nucleosides in mRNA may reflect a cellular mechanism to process certain mRNA sequences. After discussions of methylation on cellular mRNAs and reovirus RNAs at the 1974 Gordon Research Conference, Rottman, Shatkin and Perry25) jointly predicted m7GppNm as the 5′ structure of eukaryotic mRNA in the newly started journal Cell. This structure (a freehand sketch) was close to m7GpppNm, but was incorrect with respect to the number of phosphates: two phosphates, instead of the correct three phosphates, between m7G and the first nucleotide of RNA. Those authors regrettably did not have solid chemical data supporting the 5′ blocked structure, nor did they have knowledge about the mechanism of how the structure was synthesized. The report was just too early to conclude the unprecedented blocked mRNA structure consisting of three phosphates, two methyl groups and one plus charge at the 5′ terminus. Thus, 1974 was the dawn of mRNA methylation; it was very competitive in the race to reach the truth, but it was certainly close to a complete explanation of the cap structure.

E. Stage 4: Characterization of NNM′∼ppAm and NNM′-ppGm, the 5′ terminal ends of CPV and reovirus mRNAs.

In June 1974, I joined Shatkin at the Roche Institute of Molecular Biology to continue research on mRNA methylation using reovirus and CPV. Soon, a similar NNM∼ppGm (representing the pGm-containing NNM) was found in the reovirus mRNA synthesized by reovirus transcriptase in vitro in the presence of AdoMet. The 32P-labeled β-phosphate of [β,γ-32P]-labeled guanosine triphosphate (GTP) was incorporated into reovirus mRNA, suggesting that CPV and reovirus mRNAs share a similar 5′-structure, although the first nucleotide of the virus mRNA was different. Meanwhile, the NNM of CPV and reovirus remained to be characterized. The 5′ m7GppNm structure proposed by Rottman et al.25) failed to account for the 5′ NNM∼ppAm and NNM∼ppGm in CPV and reovirus mRNAs, respectively, because the net charge of the postulated m7GppNm was −1 short owing to the + charge in m7G while NNM had −2. The NNM of CPV and reovirus mRNAs synthesized in the presence of [3H-methyl] AdoMet was found to contain a 3H-methyl-labeled material that might be m7G or m7A, or a yet unidentified AdoMet-derivative (AdoMet has a +1 charge). An experiment that used [2-14C-methionyl]-AdoMet (containing a non-radioactive methyl group) failed to label the mRNAs of either CPV or reovirus, showing that the NNM did not contain AdoMet itself. These results clearly limited the candidates for NNM to a methylated nucleotide, namely an m7G- or m7A-related compound.

Regarding the origin of the phosphates in the 5′ NNM∼ppAm of CPV mRNA, the β phosphate of adenosine triphosphate (ATP) was shown to be included in the structure by the experiment with [β-32P]ATP prepared by a newly invented procedure (the details were published later in 1977),26) while the γ phosphate of ATP was eliminated. Finally, the experiment that used [α-32P]-labeled GTP and 3H-methyl-labeled AdoMet showed that both 32P and 3H radioactivity were incorporated together into CPV NNM∼ppAm and reovirus NNM∼ppGm, indicating that NNM is the structure containing 7-methylguanosine 5′-monophosphate, m7Gp. These results also clearly established that the 5′ terminal structure of CPV mRNA is m7GpppAmp-, which fitted in all aspects with the mysterious 3H-borohydride-labeled NNM found before in the analysis of CPV genome RNA. Indeed, CPV m7GpppAmp- had a net negative charge of −4.5 and a size of about 5.4 Å, and is susceptible to digestion by venom phosphodiesterase.13)
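The labeling logic of these experiments can be summarized in a few lines of code. The sketch below is a bookkeeping illustration only, assuming the pathway described in this review (loss of the γ phosphate of the initiating nucleotide, then transfer of GMP from GTP); the names `phosphates_in_cap_bridge` and `label_ends_up_in_cap` are hypothetical, and the model considers only the 5′-5′ triphosphate bridge, not phosphates incorporated internally during chain elongation.

```python
from typing import Set, Tuple

# A label is (nucleoside triphosphate, phosphate position), e.g. ("ATP", "beta").
Label = Tuple[str, str]


def phosphates_in_cap_bridge(initiating_ntp: str) -> Set[Label]:
    """Return the phosphates that form the bridge m7G-p-p-p-N.

    initiating_ntp is the first nucleotide of the transcript:
    "ATP" for CPV mRNA, "GTP" for reovirus mRNA.
    """
    # The initiating nucleotide keeps its alpha and beta phosphates;
    # its gamma phosphate is eliminated (pppN -> ppN).
    retained = {(initiating_ntp, "alpha"), (initiating_ntp, "beta")}
    # The blocking G arrives as GMP, i.e. with the alpha phosphate of GTP;
    # the beta and gamma phosphates of that GTP are released.
    retained.add(("GTP", "alpha"))
    return retained


def label_ends_up_in_cap(label: Label, initiating_ntp: str) -> bool:
    """True if a 32P label at this position is expected in the cap bridge."""
    return label in phosphates_in_cap_bridge(initiating_ntp)
```

Under these assumptions the model reproduces the observations: for CPV, the β label of ATP and the α label of GTP are retained while the γ label of ATP is lost; for reovirus, where GTP is also the initiating nucleotide, the β label of GTP is retained.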

All these studies explained the initial questions regarding the genome structure of CPV and reovirus dsRNAs, and the mRNA structures synthesized in the presence of AdoMet.27),28)   

At the same time, in Japan and the USA, similar structures of m7GpppAm and m7GpppGm were found in vaccinia virus mRNA synthesized in vitro in the presence of AdoMet by virion-associated RNA polymerase.21),22) Adding AdoMet to the in vitro viral transcription system, an approach that I had initiated, was so simple that the method encouraged many researchers. This method has indeed led to new insight into eukaryotic virus transcription, resulting in new mRNAs with methylation and a blocked 5′-terminal structure.19) The next question was “Does this bizarre structure found in viral mRNAs also exist in eukaryotic cellular mRNAs?”

F. Stage 5: Adoption of the nickname “cap” for the “blocked and methylated structure”.

The m7GpppAm and m7GpppGm found in viral mRNAs were referred to as “blocked and methylated structures” in the initial publications. However, they were soon given the short nickname of “cap” in a study to confirm the precursor-product relationship between heterogeneous nuclear RNA (hnRNA) and cytoplasmic mRNA. Poly(A)-containing mRNA of HeLa cells and precursor hnRNA were examined for their 5′-terminal structure after 32P-labeling and oligo(dT)-cellulose selection. A collaboration between the Shatkin and Darnell groups at the Rockefeller University clearly showed that both the mRNA in the cytoplasm and its precursor hnRNA in the nucleus contained the same general “blocked and methylated structures”, m7GpppNm.29) In one manuscript preparation meeting, Darnell suggested that we needed a nickname to replace the complicated “blocked and methylated 5′ structure” and said “Let’s call it ‘cap’ for short”. The term cap was initially introduced by Rottman et al.25) in their prediction of m7GppNm, but it was soon popularized by the Shatkin and Darnell groups to mean m7GpppNm.30) Subsequent experiments done by us with 32P pulse-labeled hnRNA suggested that caps were made at the initiation of transcription. The m7G caps were added to the products of RNA polymerase II shortly after transcriptional initiation within a narrow window of nascent oligonucleotides (+20 to +40), indicating that the process is rapid and efficient.30),31) The name cap quickly became widespread in the fields of molecular biology and virology, and has since been extended to terms describing mRNA processing, such as “capping mRNA” or “decapping enzymes”.32)

III. Caps in eukaryotic mRNAs

A. Cellular mRNAs.

After the discovery of the cap structure in several virus mRNAs by us and others, studies were made worldwide to examine if the cap structure m7GpppN is present in mRNAs of various organisms. For example, I identified caps in mRNAs and precursor hnRNA of HeLa cells.29),30)

Also, Adams and Cory33),34) in Australia reported the bizarre 5′ terminal structure (namely the caps) in mRNA of mouse myeloma cells. Soon, the cap structure of m7GpppN(m), consisting of 7-methylguanosine linked to the 5′ end of the transcript by a 5′-5′ triphosphate bridge, was found in all eukaryotic cells examined (Table 1). Prokaryotic cells, such as E. coli, however, do not contain a cap in their mRNAs, suggesting that caps in mRNA are restricted to eukaryotic cells, which contain a nucleus. An interesting trend is seen in the degree of methylation with respect to the evolutionary level of eukaryotes. Most lower eukaryotes, including yeast, fungi and amebas, were found to contain the less methylated Cap 0 (m7GpppN); higher eukaryotes, including humans, have Cap 1 (m7GpppNm) and Cap 2 (m7GpppNmpNm), in which the first and second nucleotides of mRNA are 2′-O-methylated. In addition to methylation in caps, one or two N-6-methyladenosine (m6A) residues were found in every one thousand nucleotides, as mentioned before by Perry and Kelley23) and by Desrosiers et al.24) Because methylation on the cap generally became more complex as organisms evolved, the existence of a cap on mRNA and its structural complexity are among the characteristics that distinguish eukaryotes from prokaryotes and higher eukaryotes from lower eukaryotes.

Table 1. Caps in eukaryotic cellular mRNAs
Columns: Source; Types of cap a) (Cap 0, Cap 1, Cap 2); Other methylations
Human HeLa cells N1=Pum,Pym,m6Am N2=Pum,Pym m6A
HeLa cell histone N1=Am,m6Am,Gm N2=Pum,Pym n.d.
Mouse myeloma N1=Pum,Pym,m6Am N2=Pum,Pym m6A
Mouse erythroid N1=Pum,Pym,m6Am N2=Pum,Pym m6A
Mouse fibroblasts N1=Pum,Pym,m6Am N2=Pum,Pym m6A
Mouse kidney N1=Nm N2=Nm m6A
Rat hepatoma N1=Pum,Pym,m6Am N2=Pum,Pym m6A
Hamster kidney (BHK-21) N1=Pum,Pym,m6Am N2=Pum,Pym m6A,m5C
Monkey kidney (BSC-1) N1=Pum,Pym,m6Am N2=Pum,Pym m6A
Mouse immunoglobulin N1=Gm,m6Am N2=Am m6A
Human globin N1=m6Am,Am N2=Cm n.d.
Human gp 130 N1=Nm n.d. n.d.
Human WS b) N1=Am,Gm,Um n.d. n.d.
Human RTS c) N1=Am,Gm,Um n.d. n.d.
Mouse globin N1=m6Am,Am N2=Cm n.d.
Rabbit globin N1=m6Am N2=Cm n.d.
Duck globin N1=Nm n.d. n.d.
Chick ovalbumin N1=A N1=m6Am,Am N2=Pym n.d.
Trout protamine N1=Nm n.d.
Drosophila N1=C,A,G N1=Pym>Pum N2=Pum,Pym n.d.
Bombyx mori silk fibroin N1=Am N2=Um m6A
Aedes albopictus N1=Pum,Pym N2=Pum,Pym m6A
C. elegans TMG-capped mRNA d): 2,2,7m3GpppN(m)-
Tobacco hornworm oocyte GpppN n.d.
Brine shrimp N1=Am,Gm n.d. m6A
Sea urchin embryo N1=Pum>Pym n.d. m6A
Slime mold N1=A>G N1=Am (10%) n.d. n.d.
Neurospora N1=A>G n.d. n.d. n.d.
Yeast S. cerevisiae N1=A>G n.d. n.d. n.d.
Wheat embryo N1 n.d. n.d. n.d.
Maize N1 n.d. n.d. n.d.
Soybean seeds N1 N1=Nm n.d. n.d.

a) Cap 0: m7GpppN1pN2p-, Cap 1: m7GpppN1mpN2p-, Cap 2: m7GpppN1mpN2mp-. n.d., not done; —, absent.

b) WS: human Werner syndrome gene transcript.

c) RTS: human Rothmund-Thomson syndrome gene transcript.

d) TMG: trimethylguanosine (2,2,7m3G).

B. Virus mRNAs.

Viruses adopt various types of strategies for their parasitic replication and proliferation in infected cells. With a few exceptions, notably the picornavirus group,35) eukaryotic viral mRNAs were found to contain the same cap structure as cellular mRNAs (Table 2), irrespective of their genomic structures (for example, DNA or RNA, single-stranded or double-stranded, negative-strand or positive-strand) and their replication strategies. This is probably because capped mRNAs are the fundamental, functional form of mRNA in host cells. Accordingly, the degree of methylation on viral mRNA caps correlates with host mRNA cap methylation, i.e., mRNAs of viruses that infect higher eukaryotic cells generally contain Cap 1 and Cap 2 structures, whereas mRNAs of viruses that infect unicellular hosts and plant cells contain Cap 0. After finding caps in mRNAs of cytoplasmic viruses, such as insect CPV, human reovirus and vaccinia virus, I investigated oncogenic viruses, such as avian Rous sarcoma virus and adenovirus, which have a replication phase in the nucleus. The studies showed that these two viruses contained capped mRNAs synthesized by RNA polymerase II of host cells.36),37)

Table 2. 5′-structures of virus mRNAs
Viruses Genome Transcription 5′ Structures Other methylations
Genome 5′ end mRNA 5′ end
Mammals/Birds (RNA type)
 Reo dsRNA(±) cytoplasm m7GpppGm(+)/ppG(−) m7GpppGmpCmp n.d.
 Rous sarcoma ssRNA(+) nucleus m7GpppGmpCp n.d. 10–12 m6A
 Avian sarcoma ssRNA(+) nucleus m7GpppGmpCp n.d. 10 m6A
 Moloney murine leukemia ssRNA(+) nucleus m7GpppGmp n.d. 15–23 m6A
 Feline leukemia ssRNA(+) nucleus m7GpppGmp m7GpppGmpAp 10 m6A
 Sindbis ssRNA(+) cytoplasm m7GpppApUpYpGpb m7GpppApUpG m5C
 Calici ssRNA(+) cytoplasm protein combined base unspecified  
 Dengue ssRNA(+) cytoplasm m7GpppAmpNp m7Gpppm6Amp n.d.
 VSV ssRNA(−) cytoplasm (p)ppA m7Gpppm6AmpAmp n.d.
 Influenza ssRNA(−) nucleus pppA m7GpppNmp 3 m6A
 Newcastle disease ssRNA(−) nucleus n.d. m7GpppNmp n.d.
 Polio ssRNA(+) cytoplasm protein-pUp pUp n.d.
 EMC ssRNA(+) cytoplasm protein-pUp pNp n.d.
Mammals (DNA type)
 Vaccinia DNA cytoplasm m7Gpppm6Amp/Gmp n.d.
 Adeno DNA nucleus m7Gpppm6Amp/AmpN2mp m6A,m5C
 Simian virus 40 DNA nucleus m7Gpppm6Amp/Gmp/Cmp m6A
 Herpes simplex type 1 DNA nucleus m7GpppN1mpN2mp m6A
 Polyoma DNA nucleus m7Gpppm6Amp/AmpN2mp m6A

IV. Mechanism of cap synthesis

Subsequent to the discovery of the cap structure in mRNAs of many viruses, normal cells and tumor cells, I began to explain the mechanism of mRNA capping by studying the process associated with the in vitro synthesis of reovirus mRNA by viral cores. Because the components in the capping reaction and the origin of the three phosphates that constitute the two pyrophosphate linkages had been identified previously during the characterization of the cap structure, a series of five reactions, each catalyzed by a distinct enzymatic activity, was readily found to be required to form the Cap 1 structure (Fig. 5).38) The same general mechanism was also shown for CPV by Shimotohno et al.39) and for vaccinia virus by Moss et al.,40) and was confirmed for various cellular mRNAs with purified enzymes from yeast and mammalian cells.41) For most cellular and nuclear viral mRNAs, transcriptional initiation sites are the capping sites, and capping and subsequent methylations occur by the scheme illustrated in Fig. 5. The 5′ end of the nascent RNA is first modified by removal of the γ-phosphate by RNA triphosphatase (RTase) to yield a diphosphorylated end (ppNpNp-). It is then capped by adding guanosine monophosphate (GMP) transferred from GTP by mRNA guanylyltransferase (GTase) to form GpppNpNp-. In mammalian cells, the GTase, which is present together with RTase in a bifunctional protein, reacts with GTP to make a covalent GMP–enzyme complex by phosphoamide linkage to the ε-amino group of the lysine residue in the signature KXDG sequence. Subsequent methylations catalysed by methyltransferases using AdoMet produce the cap structure m7GpppN(m)pN(m)p-. The N7-methylation on guanine occurs cotranscriptionally, catalysed by the RNA (guanine-7-) methyltransferase (MTase) that binds to GTase-Pol II complexes. The first 2′-O-ribose methylation is m7G-cap dependent and occurs in the nucleus, while the second 2′-O-ribose methylation occurs in the cytoplasm. Accordingly, capping is completed at an early stage of transcription, when the nascent RNA is no more than ∼30 nucleotides long; co-transcriptional rather than post-transcriptional capping is likely for cellular mRNA synthesis. Consistent with selective RNA capping on the Pol II transcripts, capping enzymes were found to bind to the Pol II C-terminal domain heptad (7 amino acids) repeat sequences in the largest subunit after they are specifically phosphorylated on serine residues and during the switch from initiation to processive elongation in mRNA synthesis.42),43)
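The order of the modification steps described above can be followed with a simple string-rewriting sketch. This is a toy model under stated assumptions, not a description of the enzymes themselves: the 5′ end is written in the shorthand of this review (for example pppNpNp-), every function name is hypothetical, and only the modification steps spelled out in the text are modeled (the synthesis of the nascent RNA itself is taken as given).

```python
CAP = "m7Gppp"  # methylated blocking G plus the triphosphate bridge


def remove_gamma_phosphate(end: str) -> str:
    """RNA triphosphatase (RTase): pppN... -> ppN..."""
    if not end.startswith("ppp"):
        raise ValueError("RTase needs a 5'-triphosphate end")
    return end[1:]


def add_gmp(end: str) -> str:
    """Guanylyltransferase (GTase): ppN... + GTP -> GpppN..."""
    if not end.startswith("pp") or end.startswith("ppp"):
        raise ValueError("GTase needs a 5'-diphosphate end")
    return "Gp" + end


def methylate_n7(end: str) -> str:
    """Guanine-7-methyltransferase (MTase) with AdoMet: GpppN... -> m7GpppN..."""
    if not end.startswith("Gppp"):
        raise ValueError("MTase needs a GpppN end")
    return "m7" + end


def methylate_first_ribose(end: str) -> str:
    """First 2'-O-methylation (m7G-cap dependent): m7GpppNp... -> m7GpppNmp..."""
    i = len(CAP) + 1  # position just after the first nucleotide
    if not end.startswith(CAP) or end[i] == "m":
        raise ValueError("needs an m7G-capped end with an unmethylated N1")
    return end[:i] + "m" + end[i:]


def methylate_second_ribose(end: str) -> str:
    """Second 2'-O-methylation: m7GpppNmpNp... -> m7GpppNmpNmp..."""
    i = len(CAP)
    if not end.startswith(CAP) or end[i + 1:i + 3] != "mp":
        raise ValueError("needs a Cap 1 end")
    j = i + 4  # position just after the second nucleotide
    return end[:j] + "m" + end[j:]


def cap_pathway(nascent: str = "pppNpNp-") -> list:
    """Apply the steps in order and return every intermediate 5' end."""
    stages = [nascent]
    for step in (remove_gamma_phosphate, add_gmp, methylate_n7,
                 methylate_first_ribose, methylate_second_ribose):
        stages.append(step(stages[-1]))
    return stages
```

Calling `cap_pathway()` walks from pppNpNp- through ppNpNp-, GpppNpNp-, m7GpppNpNp- and m7GpppNmpNp- (Cap 1) to m7GpppNmpNmp- (Cap 2). The guard in `methylate_first_ribose` mirrors the statement that the first 2′-O-methylation depends on the m7G-cap.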

Fig. 5.

Mechanism of cap formation.

Genes and enzymes participating in capping reaction.

Genes encoding enzymes participating in individual capping reactions have been cloned and characterized from several sources. In the yeast Saccharomyces cerevisiae and the fungus Candida albicans, the capping enzymes RTase, GTase and MTase are encoded by three separate genes. However, human RTase and GTase reside on the same polypeptide as a two-domain protein, and MTase is encoded by a separate gene. Chu and Shatkin44) showed by gene silencing studies that each reaction participating in cap formation is essential for proliferation of cells and viruses. Later, although I had left the capping field in 1984 and returned to Japan, I collaborated with Shatkin again to identify the location of the capping genes: the human GTase and RTase genes were mapped on chromosome 6q16, and the MTase gene on chromosome 18p11.22–p11.23.45)

I investigated thoroughly the strange behavior of AdoMet-directed methylation-coupled transcription that triggered an expansion of mRNA methylation research.46) The stimulatory effect of AdoMet was found to be due to a lowering of the Km for the initiating ATP, so that the viral RNA polymerase can more readily form the first dinucleotide pppApG. Kinetic studies indicated that AdoMet stimulated RNA polymerase through an allosteric effect by interacting with the methyltransferase contained in the CPV transcription apparatus.47) This unusual stimulation of transcription by AdoMet was also observed in vitro with the transcription system of spring viremia of carp virus.48)
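Why a lower Km makes initiation easier can be illustrated with the standard Michaelis–Menten rate law, v = Vmax[S]/(Km + [S]). The sketch below is only an illustration: it assumes simple Michaelis–Menten behavior for the initiating ATP, and every number in it is hypothetical and in arbitrary units, not a value measured for CPV.

```python
def michaelis_menten_rate(substrate_conc: float, km: float, vmax: float = 1.0) -> float:
    """Initial rate v = vmax * [S] / (Km + [S])."""
    if substrate_conc < 0 or km <= 0:
        raise ValueError("substrate_conc must be >= 0 and km must be > 0")
    return vmax * substrate_conc / (km + substrate_conc)


# Hypothetical values in arbitrary units, chosen only to show the trend.
ATP_CONC = 0.1
KM_WITHOUT_ADOMET = 1.0   # assumed higher Km in the absence of AdoMet
KM_WITH_ADOMET = 0.1      # assumed lower Km in the presence of AdoMet

rate_without = michaelis_menten_rate(ATP_CONC, KM_WITHOUT_ADOMET)
rate_with = michaelis_menten_rate(ATP_CONC, KM_WITH_ADOMET)
```

At the same ATP concentration and the same Vmax, the lower Km gives the higher initiation rate, and the difference is largest when ATP is well below Km.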

When and where does the capping reaction occur in cellular mRNA synthesis?

Earlier, I found that HeLa cell mRNAs contained a cap structure at the 5′-terminus.29) Subsequently, Salditt-Georgieff et al.30) found that addition of 5′-cap structures occurs early in HeLa hnRNA synthesis. Similarly, Sommer et al.37) reported that adenovirus RNAs in the nucleus and cytoplasm of HeLa cells were both capped. Because capping appears to be completed in the nucleus at a very early stage of transcription, co-transcriptional capping, rather than post-transcriptional capping, is postulated for cellular mRNA synthesis in vivo.49) Biochemically, post-transcriptional capping can occur in vitro on 5′ triphosphorylated RNA, oligonucleotides, or even on ppNpN dinucleotide structures, by the action of purified 5′-RTase and GTase.50) However, this may not occur in vivo because a strong 5′-exonuclease activity that hydrolyzes uncapped RNA exists in the nucleus, and nascent RNAs would be digested readily before the completion of transcription unless their 5′ ends are protected by m7G-cap.5),6) As the action of 5′-exonuclease yields 5′ mono-phosphorylated RNAs, which are no longer substrates for re-capping, capping of nascent RNAs must be completed when they are short enough to be protected by RNA polymerase II complexes.

Historically, I first proposed the prototype concept of “Methylation-coupled transcription” for CPV mRNA synthesis before the cap structure was identified.19) Next, “Post-transcriptional capping” was shown by Ensinger et al.50) using a purified vaccinia virus capping enzyme. Finally, the concept of “Co-transcriptional capping in the initiation of transcription in vivo” has been supported on the basis that a direct interaction occurs between RNA polymerase II and capping enzymes in the initiation of transcription.42),43)

V. Biological functions of m7G-cap in mRNA

Cap stabilizes mRNA and stimulates mRNA translation.

To investigate the potential biological function of the 5′ cap in mRNA, Furuichi and Shatkin51) devised a method to prepare reovirus mRNAs having three different types of 5′ termini, i.e., m7GpppG(m)-, GpppG- and ppG-, based on the synthetic pathway shown in Fig. 5. Here, inclusion of AdoHcy, a methylation inhibitor, and pyrophosphatase in the reaction mixture produced predominantly GpppG-mRNA, while the presence of AdoHcy and a high concentration of pyrophosphate induced a reverse reaction of pyrophosphorolysis and produced ppG-mRNA. When 32P-labelled reovirus mRNAs having the three different types of 5′ structures were microinjected into Xenopus oocytes or incubated in the wheat germ protein-synthesizing system, m7GpppGm-mRNA was the most stable and was incorporated into polysomes for protein synthesis, while GpppG-mRNA remained stable but was not incorporated into polysomes (Fig. 6).5) By contrast, ppG-mRNA was unstable and was quickly degraded to 5′-monophosphorylated nucleotides. These results clearly support two important conclusions:

  (i) A 5′-blocked structure is important for mRNA stability;
  (ii) m7G in the cap structure is required for initiation of translation.
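The two conclusions can be restated as a simple qualitative rule applied to the three 5′ termini used in the experiment. The sketch below is only a restatement of the observations summarized above; the function name and the string-matching convention are hypothetical, and the rule makes no quantitative claim about half-lives or translation rates.

```python
def predicted_fate(five_prime_end: str) -> dict:
    """Apply the two conclusions to a 5' terminus written as in the text.

    Expected inputs are strings such as "m7GpppGm-", "GpppG-" or "ppG-".
    """
    blocked = "Gppp" in five_prime_end                 # 5'-5' blocked terminus present
    n7_methylated = five_prime_end.startswith("m7Gppp")
    return {
        "stable": blocked,                             # conclusion (i)
        "translated": blocked and n7_methylated,       # conclusion (ii)
    }


fates = {end: predicted_fate(end) for end in ("m7GpppGm-", "GpppG-", "ppG-")}
```

Under this rule only m7GpppGm-mRNA is both stable and translated, GpppG-mRNA is stable but not translated, and ppG-mRNA is neither, which matches the outcome described for the oocyte and wheat germ experiments.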

Indeed, Hickey et al.52) showed that m7GMP, a cap analogue, is a strong and specific inhibitor of eukaryotic translation, suggesting that the cap is important for mRNA translation. A number of other studies used the in vitro translation systems of wheat germ extracts or rabbit reticulocyte lysates, and they all showed the importance of a cap structure in protein synthesis. I participated in a study by Muthukrishnan et al.53) that showed that the 5′-terminal 7-methylguanosine of a cap is required for translation in the wheat germ system, and in another study by Both et al.54) that showed that ribosome binding to mRNA requires the 5′-terminal 7-methylguanosine. However, I value most highly the microinjection experiment with frog oocytes and reovirus mRNAs having the three different types of 5′ structures, because it was done under conditions close to those of a natural living cell, and its results showed most explicitly the fundamental role of the mRNA cap structure in the cytoplasm.

Fig. 6.

Two important biological functions of m7G-cap: mRNA stabilization and stimulation of protein synthesis at initiation. The 32P-labelled reovirus mRNAs having the three different types of 5′ structures, namely m7GpppGm-, GpppG- and ppG-, were prepared by the method of Furuichi and Shatkin (1976)51) and were either microinjected into frog oocytes or incubated in the wheat germ protein-synthesizing extract, and were then examined for mRNA stability and translational activity. Data are from the paper by Furuichi et al. (1977).5)

The study also predicted the presence in the nucleus and cytoplasm of a 5′-to-3′ exonuclease that preferentially digests uncapped RNA to 5′ mononucleotides exonucleolytically. Indeed, such an exonuclease, XRN1, having the expected enzymatic properties and the responsibility for digestion of decapped mRNA, was later found in S. cerevisiae and mammalian cells.55) Studies with different reovirus mRNAs indicated that transcripts with 5′-terminal m7GpppGm were preferentially translated in extracts of both plant and animal cells. More direct evidence came from findings that the 5′ cap of mRNA is part of the ribosome-binding site in 40S initiation complexes, because caps were retained in the sequences protected against RNase digestion. Furthermore, the protected capped mRNA fragments rebound efficiently, while only low levels of ribosome rebinding were observed using the corresponding uncapped fragments that were collected from RNase-treated 80S complexes. These findings on the sizes, sequences and initiation properties of 40S versus 80S ribosome-protected mRNAs provided insights that led to the “scanning model” proposed by Kozak to explain how eukaryotic ribosomes select initiation regions in mRNAs, usually the first AUG codon downstream of the cap.56),57)

Cap function as a hallmark of regulation of eukaryotic protein synthesis.

The effects of the 5′ cap on mRNA translation were studied extensively, and it was soon found that most eukaryotic translation is initiated with a cap-binding protein (later named eIF4E) that was identified by Sonenberg et al.58) This finding opened a new stage of cap research, directed at the regulation of protein synthesis and at understanding how viruses manage to replicate by making use of the cap-dependent systems of living host cells. Selection of capped over uncapped transcripts occurred at the level of mRNA binding to 40S ribosomal subunits. Several other observations provided support for the idea that caps are recognized during the early stages of protein synthesis and have a pivotal role in regulation of translation.10)

Many studies in the last 40 years uncovered the molecular mechanisms underlying the regulation of protein synthesis and yielded various important concepts, such as “cap-dependent translation” (or “m7G-dependent translation”), “cap-independent translation by virus strategy”, and “cap-snatching reaction by influenza virus”.58) The “5′ terminal cap as the 40S ribosome entry site”, “cap-dependent ribosome-scanning model”, “5′ cap-recognition for assembly of protein synthesis initiation complex”, and “internal ribosome entry site IRES in the uncapped mRNAs” are other names given to some of the cap-related biochemical reactions. These concepts and the new terminology coined for them capture the essence of many cap-mediated reactions and are now widely used in understanding the regulation of protein synthesis and virus replication strategies.59)–61)

Roles of 5′ caps in pre-mRNA splicing and nucleocytoplasmic transport.

Besides protein synthesis, caps were found to have important roles in the splicing of pre-mRNAs, cooperating with spliceosomes.62) The cap on pre-mRNA interacts with splicing components at the adjacent 5′ splice site to process the first exon and remove the first intron. This interaction is mediated by a nuclear cap-binding complex (CBC) that consists of two tightly associated proteins, CBP20 and CBP80. The m7G-cap on pre-mRNA is specifically bound by CBC, which facilitates association of the cap-proximal 5′ splice site with U1 snRNP. The CBC remains bound to the cap during RNA processing and has an active role in both splicing and RNA export. The CBC-mediated interaction between U1 snRNP and the 5′ splice site is one of the earliest steps in spliceosome assembly and is conserved in humans and yeast. CBC also participates in 3′ polyadenylation and nucleocytoplasmic transport of mRNA.63) One study indicated that the human mRNA export machinery complex, which mediates cap-first export of mRNA through nuclear pores, interacts with CBP80 and binds RNA close to the cap in a splicing-dependent interaction, facilitating the export of spliced mRNAs.64)

Use of capping site in future diagnosis of human diseases; profiling of gene expression in patients by transcription start sites and promoter analysis with gene CAGE technology and next generation sequencers.

Nucleotide N in the 5′-cap m7GpppN(m) is the initiating nucleotide in gene transcription by RNA polymerase II (Pol II). Unique features of the cap structure that include 5′-5′ pyrophosphate linkages and a free 2′-3′ cis-hydroxyl group in m7G ribose permit isolation of RNA sequences adjacent to and upstream of 5′ terminal caps. For example, I isolated the capped 5′-terminal oligonucleotide of retrovirus RNA by affinity column chromatography on a borate resin that binds specifically to the 2′-3′ cis-OH groups in m7G ribose.35) Suzuki and Sugano65) developed cap-modifying technology that removes m7Gpp from mRNA with tobacco acid pyrophosphatase and recaps the monophosphorylated 5′-ends with an oligonucleotide adapter using RNA ligase. This method, termed “oligo-capping”, was used to determine the exact transcription initiation sites of individual genes and to prepare libraries of full-size cDNA clones.65) Kanamori-Katayama et al.66) developed another procedure, based on biotinylation of periodate-oxidized m7G residues, to collect cDNA/RNA hybrid copies containing mRNA 5′ terminal sequences. The cDNA fragments synthesized by reverse transcriptase are captured first by streptavidin, and then are hybridized to the oligo-dT on the flow cell surface. This cap-trapping method is referred to as HeliScopeCAGE (cap analysis of gene expression using HeliScope single-molecule sequencer). The isolated first-strand cDNAs are sequenced directly by an advanced generation sequencer. With this technology, a large short-tag CAGE library was made available to study transcription events on a genome-wide scale.66) Analysis of cap-trapped sequences by high-throughput sequencers enables genome-wide quantification of expression, and diagnosis (or forecasting) of diseases.
This innovative cap-based technology will surely open a new field for understanding cell biology at various levels, including the nature of pluripotent and adult stem cells and their various differentiated progeny, as well as the nature of altered gene action in cancer initiation and progression.
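The quantification step in such cap-based profiling can be sketched as counting how many cap-trapped reads start at each genomic position and normalizing by library size. The example below is a minimal illustration under stated assumptions: the input format, the function name and the tags-per-million normalization are choices made here for clarity and are not taken from the HeliScopeCAGE protocol itself.

```python
from collections import Counter


def count_tss_tags(read_starts):
    """Count cap-trapped read 5' ends per candidate transcription start site.

    read_starts: iterable of (chromosome, strand, position) tuples, one per
    read, giving where the 5' end of the read maps (hypothetical input format).
    Returns {site: {"tags": raw count, "tpm": tags per million mapped reads}}.
    """
    counts = Counter(read_starts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        site: {"tags": n, "tpm": n * 1_000_000 / total}
        for site, n in counts.items()
    }


# Toy input: five reads supporting three candidate start sites.
toy_reads = [
    ("chr1", "+", 100),
    ("chr1", "+", 100),
    ("chr1", "+", 100),
    ("chr1", "+", 105),
    ("chr2", "-", 500),
]
toy_profile = count_tss_tags(toy_reads)
```

Because each tag marks a capped 5′ end, the position with the most tags within a promoter region indicates the preferred start site, and comparing normalized counts between samples gives the expression profile discussed above.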

VI. Conclusion

Discovery and impact of a cap on molecular biology past and future.

Methylation of genomic DNA and mRNA is now widely accepted as a major pathway to regulate both transcription and translation in eukaryotes. The m7G-cap formed on mRNA is fundamental for eukaryotic gene expression. Almost 40 years have passed since the discovery of the cap, during which much has been learned about the importance of biochemical mechanisms of capping and the downstream effects on translation and its regulation. Genetic and RNAi studies have shown the lethal consequences of capping failure in yeast and higher organisms, confirming the importance of mRNA capping. In addition, comparative biological studies have shown evolutionary and functional conservation of capping from unicellular to multicellular organisms.

By contrast, viruses have developed their own strategies to inactivate or exploit the host capping system so as to advance their replication cycle preferentially or to hijack the machinery of host cells for their survival. I have previously reviewed extensively the diverse strategies of cap formation among viruses that infect humans and domestic animals, such as influenza viruses, coronaviruses and vesicular stomatitis virus.1) This information should be useful for drug screening in pharmaceutical companies seeking efficacious antiviral compounds with few or no adverse effects.

I was fortunate to be one of the discoverers of this important cap structure and was able to share with many colleagues and competitors the excitement of explaining how it is made and what it does. The year 1975 was the year in which the m7G-cap first appeared on the science bulletin board. It was also the first year in which Current Contents identified the most-cited author, and I was honored to be selected as the first one.67) One day, there was a telephone call from the Current Contents office asking “Why is this structure so important?” Although I could not give the right answer then, I hope that the information described in this review article will not disappoint the questioner now. In addition to the various cap effects on translational regulation that we have learned of in the last four decades, future studies of capping sites and promoter-based analyses of gene expression by next-generation sequencers should lead to a deeper understanding of dynamic transcriptional regulation.

Profile

Yasuhiro Furuichi was born in 1940 at Joshin in the present North Korea, which was part of Japan before 1945. He graduated from the University of Toyama in 1964 and entered the graduate school of the University of Tokyo, because he wished to study the chemistry of nucleic acids in the laboratory of Prof. Tyunosin Ukita. In 1969, he received a Ph.D. and obtained a position at the National Institute of Genetics, where he worked with Dr. Kin’ichiro Miura to characterize the structure of the CPV dsRNA genome. In 1974, he moved to Dr. Aaron J. Shatkin’s laboratory at the Roche Institute of Molecular Biology in the U.S.A. to extend his findings on mRNA methylation with CPV to other virus systems. He was cited by Current Contents as “most-cited author in 1975”, and he received an award from the Japanese Biochemical Society in 1976 for his finding of mRNA methylation and discovery of the 5′ cap structure present in eukaryotic mRNAs. In 1980–1985, he served as an associate editor of the Journal of Virology in the field of RNA viruses.

In 1985, he returned to Japan to direct the drug discovery program of Nippon Roche Research Center, where his team succeeded in finding the endothelin antagonist bosentan, which has long been used as a specific medicine for pulmonary hypertension. For seven years from 1993, he directed the national project AGENE Research Institute to elucidate the molecular basis of Werner syndrome, a genetic disease of premature aging that is prevalent in Japan. In 2000, he started the company GeneCare Research Institute to develop the findings made at AGENE into anticancer drugs free from adverse effects. For five years from 2008, he directed the national project Hokuriku Innovation Cluster for Health Sciences. For the last 14 years, since 2001, he has been an adviser to the Sakigake front-running programs of the Japanese government, in which he continues to encourage young elite scientists to conduct innovative research and to discover something new of value.

Acknowledgement

I would like to dedicate this review to the late Dr. Kin’ichiro Miura and Dr. Aaron J. Shatkin, whom I respected as mentors, and with whom I worked on mRNA caps throughout the process of their discovery and the subsequent investigation of their biological significance. I thank Ms. Midori Kawaguchi and Ms. Aika Takahashi at the GeneCare Research Institute and Dr. Yoshihito Ueno and Mr. Kosuke Nakamoto at Gifu University for assistance in compiling this manuscript.

References
 
© 2015 The Japan Academy