Biological and Pharmaceutical Bulletin
Online ISSN : 1347-5215
Print ISSN : 0918-6158
ISSN-L : 0918-6158
Regular Articles
Synonymous and Biased Codon Usage by MERS CoV Papain-Like and 3CL-Proteases
Mahmoud Kandeel, Abdallah Altaher

2017 Volume 40 Issue 7 Pages 1086-1091

Abstract

Middle East respiratory syndrome coronavirus (MERS CoV) is a recently emerged virus that causes a fatal respiratory disease and poses a concern for a global epidemic. MERS CoV encodes 2 proteases, 3C-like protease (3CLpro) and papain-like protease (PLpro). These proteases share in processing MERS CoV polyproteins at different sites to yield 16 nonstructural proteins. In this work, we provide evidence that MERS CoV 3CLpro and PLpro are subject to different genetic and evolutionary influences that shape the protein sequence, codon usage pattern, and codon usage bias. Compositional bias is present in both proteins due to a preference for AT nucleotides. Thymidine (T) was highly preferred at the third position of codons and in the preferred and overrepresented codons of PLpro, but was replaced by guanosine (G) in 3CLpro. Compositional constraints were important in PLpro but not in 3CLpro. Directed mutation pressure seems to have a strong influence on 3CLpro codon usage, which is more than 30-fold higher than that in PLpro. Translational selection was evident with PLpro but not with 3CLpro. Both proteins show low CpG frequencies, which may make them less immunogenic. Correspondence analysis reveals the presence of 3 genetic clusters based on codon usage in PLpro and 3CLpro. The two proteins share one common cluster, and each has 2 distinct clusters. As revealed by correspondence analysis, the number of influences on codon usage is restricted in MERS CoV 3CLpro. In contrast, PLpro is controlled by a broader range of compositional, mutational, and other influences. This may be due to the multifunctional protease, deubiquitination, and innate immunity-suppressing profiles of PLpro.

Middle East respiratory syndrome coronavirus (MERS CoV) causes a recently emerged viral infection that has a high case fatality rate and causes death from severe respiratory and renal dysfunction.1,2) Infections first occurred in the Arabian Peninsula and extended to include several other countries worldwide.3,4) The recent discovery of the virus in China and Korea has led to fatal disease and fear of MERS CoV epidemics.5,6)

MERS CoV is a single-stranded RNA virus with a relatively large genome of 30 kb. The MERS CoV genome encodes one large polyprotein 1ab, the spike protein, the envelope protein, and the nucleocapsid.7) At the 5′ end of the MERS CoV genome, there is one large open reading frame (ORF), ORF1ab, that is translated into polyprotein 1ab (PP1ab). PP1ab comprises various nonstructural proteins (NSPs) that are essential for virus replication. PP1ab is processed by 2 virus-encoded proteases, papain-like protease (PLpro) and 3C-like protease (3CLpro).8,9) These proteases share in the post-translational cleavage of PP1ab to release 16 MERS CoV NSPs. Each of these proteases recognizes specific sequences or amino acid signatures as cleavage sites along PP1ab. PLpro cleaves the first 3 positions in PP1ab, releasing 3 proteins (NSP 1–3), and 3CLpro cleaves the rest of the sites to release 13 MERS CoV proteins (NSP 4–16). Therefore, 3CLpro is the main virus protease, and it has strict protease activity conserved in all coronaviruses.10,11) In contrast, MERS CoV PLpro is a multifunctional protein, combining its classical protease activity with other downstream activities essential for virus survival, host immune resistance, and virus replication. MERS CoV PLpro has de-ISGylation (removal of interferon-stimulated gene 15) and deubiquitination activities.12–14) These activities are essential to block host interferon regulation and to interfere with host innate immunity.

Bioinformatics and computational tools are now widely used to understand genomics, proteomics, and drug targeting of various components in living organisms.15–17) Understanding the gene’s composition and subsequent analysis of codon usage is indispensable in understanding virus replication and evolution and targeting the unique aspects of virus biology.18,19)

For the 20 amino acids, 61 sense codons are used. The number of codons for each amino acid ranges from 1 to 6. Synonymous codons are alternative codons that encode the same amino acid.20) Because the selection of codons is a nonrandom process, codon bias arises when one codon is preferentially used at the expense of its synonymous codons; synonymous codons are not used equally within or between genomes.21) Compositional constraints and selection pressure are the major forces that influence codon usage.22)

Viruses depend on host machinery to produce viral proteins. The translation efficiency of virus genes depends on the compatibility between the codon usage of the virus and that of the host. Processing of the virus polyprotein, maturation, and assembly of the virus require the action of the virus-encoded proteases. In this work, we analyzed MERS CoV PLpro and 3CLpro from 206 sequences for each protein. The results obtained indicate differences between MERS CoV 3CLpro and PLpro in codon usage pattern, codon usage bias, and the forces that influence codon usage. These data will help to further the understanding of the MERS CoV replication process and its unique aspects, to improve drug targeting and knowledge of virus evolution mechanisms.

MATERIALS AND METHODS

Construction of Protease Dataset

GenBank was searched for MERS CoV full genomes using Geneious software. The sequences of PLpro and 3CLpro were retrieved and stored under the accession numbers of their original genomes. The files were exported in FASTA format for further analysis with CodonW software.

Codon Usage Pattern

A, G, C, and T frequencies, GC%, and AT% were calculated using CLC Genomics Workbench. Codon usage indices, dinucleotide frequencies, the frequency of GC at the first (GCs) or third position (GC3s) of codons, the frequency of nucleotides at the codon third position (A3s, T3s, G3s, and C3s), Gravy (hydropathic index), Aromo (frequency of aromatic amino acids), and correspondence analysis (COA) were calculated using CodonW 1.4.2.
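The composition measures named in this section can be illustrated with a short sketch. The Python below is not the CLC Genomics Workbench or CodonW implementation. It is a minimal illustration that assumes an in-frame DNA coding sequence and a simplified definition of GC3s (the GC fraction at the third position of all codons except ATG, TGG, and the stop codons). The function names and the toy sequence are hypothetical.

```python
from collections import Counter

# Codons left out of GC3s in this sketch: Met (ATG) and Trp (TGG) have no
# synonymous alternative, and stop codons do not encode an amino acid.
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def base_frequencies(seq):
    """Return the fraction of A, C, G and T in a DNA sequence."""
    seq = seq.upper()
    counts = Counter(b for b in seq if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def gc3s(seq):
    """GC fraction at the third position of synonymous codons (simplified)."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    thirds = [c[2] for c in codons if c not in NON_SYNONYMOUS]
    return sum(b in "GC" for b in thirds) / len(thirds)


# Hypothetical in-frame toy sequence: ATG GCT GCC TTA TTG GGT TGG TAA
toy = "ATGGCTGCCTTATTGGGTTGGTAA"
```

For the toy sequence, five codons count toward GC3s (GCT, GCC, TTA, TTG, GGT), and two of them end in G or C. The position-specific indices A3s, T3s, G3s, and C3s reported by CodonW use their own denominators, so they need not sum to 1 and are not reproduced here.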

Relative Synonymous Codons Usage

Relative synonymous codon usage (RSCU) is the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that amino acid were used equally. RSCU is calculated by using the following equation:

$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\dfrac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}} \qquad (1)$$

where X_{ij} is the observed number of the jth codon for the ith amino acid, and n_i is the total number of synonymous codons for that amino acid.
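As a rough illustration of the RSCU calculation, the sketch below divides each codon count by the mean count of its synonymous family under the standard genetic code. It is an illustrative sketch, not the CodonW implementation; the function name `rscu` and the choice to skip Met, Trp, stop codons, and amino acids absent from the sequence are assumptions made here.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(seq):
    """Return {codon: RSCU} for amino acids with at least 2 synonymous codons."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families, one per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonymous alternative
        for c in family:
            # observed count divided by the mean count of the family
            values[c] = counts[c] * len(family) / total
    return values
```

For the hypothetical sequence `GCTGCTGCTGCC` (four alanine codons), `rscu` gives 3.0 for GCT, 1.0 for GCC, and 0 for GCA and GCG, so the values within a family sum to the number of synonymous codons. Dropping Met and Trp leaves 59 codons with an RSCU value.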

Relative Dinucleotide Frequencies

Comparing the observed and expected frequencies of dinucleotides is important in order to correctly estimate the forces controlling codon bias. Relative dinucleotide frequency was calculated by the following formula:

$$\rho_{XY} = \frac{f(XY)}{f(X)\,f(Y)} \qquad (2)$$

where f(XY) is the observed frequency of the dinucleotide XY and f(X) and f(Y) are the frequencies of the single nucleotides X and Y.

Overrepresentation and underrepresentation of dinucleotides are indicated by values of >1.23 and <0.78, respectively.23)
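The ratio of observed to expected dinucleotide frequency, together with the 1.23 and 0.78 thresholds, can be sketched as follows. This is a minimal illustration that assumes a sequence containing only A, C, G, and T and counts overlapping dinucleotides along a single strand; the function names are hypothetical and this is not the CodonW code.

```python
from collections import Counter


def relative_dinucleotide_abundance(seq):
    """Return {dinucleotide: observed frequency / expected frequency}."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = (mono[x] / n) * (mono[y] / n)
            if expected > 0:
                ratios[x + y] = (di[x + y] / (n - 1)) / expected
    return ratios


def classify(ratio):
    """Apply the thresholds quoted in the text."""
    if ratio > 1.23:
        return "overrepresented"
    if ratio < 0.78:
        return "underrepresented"
    return "within expected range"
```

With this definition, a CpG value well below 0.78 across all sequences would correspond to the CpG depletion discussed in the Results.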

Effective Number of Codons

The effective number of codons (ENc) is an important measure of the codon bias in a gene. ENc values range from 20 to 61.24) Higher values of ENc indicate lower codon bias. Extreme bias is regarded as a value <35 and indicates that only a small number of codons is used for each amino acid.25)

ENc Plot

The ENc plot is a tool to assess the influence of mutational pressure on codon bias.24) ENc values are plotted against GC3s, and the resultant values are compared to a standard curve.
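The text does not give the formula for the standard curve. A commonly used form, assumed here, is Wright's expected ENc under GC3s composition alone: ENc = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s. The sketch below evaluates that curve so that an observed point can be compared with it; the function names are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENc when codon usage depends only on GC3s (assumed Wright curve)."""
    s = gc3s
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def lies_below_curve(observed_enc, gc3s):
    """True when an observed point falls under the standard curve."""
    return observed_enc < expected_enc(gc3s)
```

Using the mean values reported in Table 3, the expected ENc is about 55.0 for PLpro (GC3s = 0.34) and about 60.5 for 3CLpro (GC3s = 0.49), both above the observed means of 51.9 and 52.9, which agrees with the points lying under the curve in Fig. 1.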

Codon Usage Bias Mediated by Natural Selection

Codon usage bias mediated by natural selection was investigated by using neutrality plot, codon adaptation, aromaticity (AROMO), and hydropathicity (GRAVY) indices.26)
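The two protein-level indices can be sketched briefly. The code below assumes that GRAVY is the mean Kyte-Doolittle hydropathy of the residues and that AROMO is the fraction of aromatic residues (Phe, Trp, Tyr); these are the usual definitions, but the function names are hypothetical and this is not the CodonW code.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FWY")


def gravy(protein):
    """Mean hydropathy; positive is hydrophobic, negative is hydrophilic."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromo(protein):
    """Fraction of aromatic residues (Phe, Trp, Tyr)."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    return sum(aa in AROMATIC for aa in residues) / len(residues)
```

On this scale a hypothetical hydrophobic peptide such as `AIL` scores positive and a charged peptide such as `KDE` scores negative.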

Multivariate COA

Multivariate statistical analysis was used to estimate the relation between the samples and variables. Each target was presented as a 59-dimension vector corresponding to the RSCU of each codon. Data trends can be assessed via their position along each plotting axis.

Correlation Analysis

GraphPad Prism software (GraphPad Inc., U.S.A.) was used in all correlation measurements.

RESULTS

MERS CoV Protease Nucleotide Content

A total of 206 sequences were retrieved for both MERS CoV PLpro and 3CLpro (Supplementary Table 1). The most abundant nucleotide was T (31%), and nucleotide frequencies were in the following order: T>A>G>C (Table 1). Despite the high similarity in nucleotide composition, the frequencies of nucleotides at the third position of codons showed marked differences. These differences account for important variations in codon usage and codon usage bias. In PLpro, the order of NT3s was T3s>A3s>G3s>C3s, whereas that for 3CLpro was G3s>A3s>T3s>C3s. The most frequent nucleotide in NT3s in PLpro was T3s with a frequency of 0.5. In contrast, 3CLpro was revealed to have G3s as the most predominant NT3s with a frequency of 0.37.

Table 1. Analysis of Nucleotide Contents in MERS CoV PLpro and 3CLpro

                              Nucleotide      PLpro    3CLpro
Frequencies of nucleotides    Adenine (A)     0.26     0.25
                              Cytosine (C)    0.18     0.20
                              Guanine (G)     0.23     0.23
                              Thymine (T)     0.31     0.31
                              C+G             0.42     0.43
                              A+T             0.58     0.57
NT3s                          T3s             0.5      0.24
                              C3s             0.22     0.23
                              A3s             0.3      0.36
                              G3s             0.21     0.37

RSCU Analysis

RSCU values of codons are classified into 3 groups: i) underrepresented codons (negative bias) with RSCU values <0.6; ii) represented codons (no bias) with RSCU values between 0.6 and 1.6; and iii) overrepresented codons with RSCU values >1.6 (positive bias). In the preferred codons, A3s and T3s were the most predominant in PLpro, whereas G3s and C3s were the most predominant in 3CLpro (Table 2). Ten amino acids (50%) had different preferred codons (Leu, Ile, Val, Ser, Ala, Tyr, Asn, Lys, Asp, Arg) in PLpro and 3CLpro. The underrepresented or negatively biased codons were CUC, CUG, AUC, UCC, UCG, GCC, GCG, UAC, ACC, and GAC for PLpro and UUC, CUU, CUC, AUU, GUC, UCU, UCG, UAU, and GAU for 3CLpro. Therefore, all negatively biased codons in PLpro contain G3s/C3s, and most underrepresented codons in 3CLpro contain U3s. The overbiased codons in PLpro typically contained U3s/A3s (UUA, AUU, GUU, UCU, UCA, CCU, ACA, GCU, AAU), whereas 3CLpro had balanced RSCU values for the 4 nucleotides (UUG, AUA, UCA, CCU, GCA, UAC, GAC, AGG, GGU).

Table 2. Average RSCU Values from PLpro and 3CLpro

The preferred codon for every amino acid is displayed in bold. A diagonal line is placed in cells with negative codon bias. Highlighted cells are overbiased codons.

Dinucleotide Frequencies

The relative abundance of the 16 dinucleotides can be taken as a measure of codon usage bias in genes. All overrepresented dinucleotides lacked CpG. CpG and CpC were the most underrepresented dinucleotides.

ENc and ENc Plot

ENc values range from 20 to 61. The lowest value indicates a high degree of codon bias resulting from a low number of codons being preferentially used for amino acids. The ENc values for MERS CoV PLpro and 3CLpro were 51.9 and 52.9, respectively (Table 3).

Table 3. Codon Usage Indices of PLpro and 3CLpro

           ENc      GC3s     GC       Gravy     Aromo
PLpro      51.9     0.34     0.42     −0.01     0.09
3CLpro     52.9     0.49     0.44     1.04      0.07

The influence of compositional constraints and mutational pressure on codon usage can be evaluated by an ENc plot. Points lying on the standard curve of the GC3s–ENc relation indicate compositional bias. Points lying under the standard curve indicate the influence of other forces acting as mutational bias. In PLpro and 3CLpro, all points are situated under the curve, thus indicating the influence of compositional constraints on codon usage bias (Fig. 1). Correlation analysis reveals a significant correlation between GC3s and ENc (r=0.55, p<0.0001) in PLpro, indicating that PLpro has additional codon usage bias acting together with mutational bias. In 3CLpro, there was no significant correlation between GC3s and ENc (r=0.05), indicating minimal influence of compositional constraints in MERS CoV 3CLpro.

Fig. 1. ENc Plot of MERS CoV PLpro and 3CLpro

GC3s is plotted against ENc. The linear relation represents the GC3s–ENc regression. The continuous line represents the standard curve of the GC3s–ENc relation.

Natural Selection and Codon Usage Bias

A neutrality plot was used to determine the effect of natural selection and mutation pressure on the codon usage of MERS CoV proteases (Fig. 2). Genes lying on the line with a slope of unity imply neutral mutation by random selection. The degree to which the slope declines toward the x-axis indicates how strongly directed mutation influences codon bias.

Fig. 2. Neutrality Plot of MERS CoV PLpro and 3CLpro

In this plot, G3s is plotted against G1, 2s. The slope of this relation is regarded as the speed at which selection and mutation forces evolve. The regression coefficient represents the mutation-selection equilibrium coefficient. In 3CLpro, G3s and G1, 2s were negatively correlated (r=−0.5, p<0.001). Furthermore, neutrality was extremely low at 1.38%. This low value indicates an extremely large amount of directed mutational pressure on 3CLpro. In PLpro, the regression slope was 0.43 and the correlation coefficient was 0.9 with a p-value <0.001. Therefore, PLpro has a relative neutrality of 43%, which indicates the presence of directed mutational pressure. However, other factors, such as selection, contribute much more strongly in 3CLpro.
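The way a regression slope becomes a neutrality percentage can be sketched as follows. The code assumes the conventional neutrality plot, with GC content at codon positions 1 and 2 (GC12) on the y-axis and GC content at position 3 (GC3) on the x-axis; the text labels these axes G1, 2s and G3s. The function names and the example points are hypothetical, and the points are not data from this study.

```python
def gc12_gc3(seq):
    """Return (GC12, GC3) for an in-frame coding sequence."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    gc12 = sum(c[0] in "GC" for c in codons) + sum(c[1] in "GC" for c in codons)
    gc3 = sum(c[2] in "GC" for c in codons)
    return gc12 / (2 * len(codons)), gc3 / len(codons)


def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Made-up points lying exactly on a line of slope 0.43 (not study data).
gc3_values = [0.30, 0.40, 0.50]
gc12_values = [0.10 + 0.43 * v for v in gc3_values]
slope = regression_slope(gc3_values, gc12_values)
neutrality_percent = 100 * slope
```

A slope of 0.43 is read as 43% relative neutrality, and a slope near 1 would mean that positions 1 and 2 vary with position 3 as expected under neutral mutation.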

In MERS CoV PLpro, Gravy and Aromo were significantly correlated with ENc and GC3s (Table 4), which indicates that Gravy and Aromo influence codon usage in MERS CoV PLpro. In 3CLpro, Gravy and Aromo were not correlated with ENc and GC3s. Therefore, hydrophobic values and aromaticity of amino acid composition moderately influenced codon usage in PLpro but not in 3CLpro. The slightly negative Gravy (−0.01) indicates that MERS CoV PLpro is marginally hydrophilic, whereas the positive Gravy of 3CLpro (1.04) indicates a hydrophobic protein.

Table 4. Correlation of ENc, GC3s, Gravy, and Aromo Values for MERS CoV PLpro and 3CLpro

PLpro
           ENc          GC3s         Gravy        Aromo
ENc        –            0.39***      −0.41***     0.26***
GC3s       0.39***      –            −0.41***     0.56***
Gravy      −0.41***     −0.41***     –            −0.36***
Aromo      0.26***      0.56***      −0.36***     –

3CLpro
           ENc          GC3s         Gravy        Aromo
ENc        –            0.05         0.04         −0.07
GC3s       0.05         –            1.00***      −0.99***
Gravy      0.04         1.00***      –            −0.99***
Aromo      −0.07        −0.99***     −0.99***     –

*** p<0.001, ** p<0.01, * p<0.05.

Correspondence Analysis

Major trends in variation in codon usage were analyzed by COA. In 3CLpro, the first 2 axes of COA were associated with 87 and 3%, respectively, of variability in the measured genes. Similarly, the first 2 axes caused most of the variability, 56.6 and 7.6%, in PLpro. According to COA, 3CLpro seems to be under a restricted number of forces that affect selection of 3CLpro codon usage. In PLpro, high relative inertia from the first axis (56.6%) with gradually increasing cumulative inertia (data not shown) indicates that diverse forces influence codon usage.

COA of MERS CoV strains based on RSCU values reveals the presence of 3 genetic clusters of MERS CoV 3CLpro and PLpro (Fig. 3). It is notable that both proteases share one common cluster and possess 2 different clusters. Analysis of the position of different codons by axis revealed that most of the variations in PLpro were due to G3s nucleotides, whereas in 3CLpro all nucleotides contributed equally to the variability.

Fig. 3. COA Analysis of RSCU Values from MERS CoV PLpro and 3CLpro

DISCUSSION

Computational tools are now gaining marked attention for their use in the investigation of microbial genomes, evolution, codon usage, and new drug targets.27–29) In this work, we analyzed the nucleotide composition of MERS CoV virus-encoded proteases to understand the forces that control codon usage and codon usage bias.

The nucleotide composition of MERS CoV proteases is in agreement with the general nucleotide composition of RNA viruses. The higher AT% and low GC% in MERS CoV proteases concurs with the accepted low GC frequency in similar coronaviruses, such as severe acute respiratory syndrome (SARS) CoV, and other RNA viruses.30–32) The high ENc values (>51) in MERS CoV proteases indicate a low degree of codon bias and the presence of several codon choices for every amino acid. In this context, RNA viruses have high ENc values, which enable them to adapt to a broad range of hosts with diverse codon preferences.31)

Analysis of RSCU reveals marked differences between PLpro and 3CLpro. In PLpro, but not in 3CLpro, A/T was the most frequent nucleotide in the preferred codons and NT3s and was less common in negatively biased codons. The positive bias observed in PLpro in preferred codons and NT3s coincides with similar high representation profiles in the pandemic influenza virus H1N1 and in H3N2.26) In contrast, G3s is more common in the preferred codons in 3CLpro. This result indicates that the usage of preferred codons is mostly affected by compositional constraints in PLpro but not in 3CLpro, which shows different nucleotide preferences. Furthermore, an ENc plot reveals that mutational bias and other forces control codon usage in PLpro, which seems to be affected by directed mutational pressure (estimated by neutrality plot). The lack of correlation between ENc and GC3s in 3CLpro confirms the minimal effect of nucleotide composition on codon usage in this enzyme. In addition, 3CLpro was under highly directed mutational pressure, which affected its codon usage, as revealed by its low neutrality.

The dinucleotide profile indicates a common feature of most viral proteins: lower CpG frequencies. The lower CpG rate is thought to be a way for the proteins to escape the host immune response, which recognizes unmethylated CpG as an immune stimulator or a signature against foreign nucleic acids.33) On the basis of COA, MERS CoV strains are clustered into 5 genetic clusters according to their proteases. Within these groups, only one cluster included PLpro and 3CLpro, indicating the presence of subtle differences in the evolutionary patterns of MERS CoV proteases. Analyzing the axes of COA in both proteases reveals that the first axis has a lower degree of influence (56.6%) in PLpro and the following 3 axes have a greater contribution. This discrepancy roughly indicates that stronger and more diverse influences act on PLpro. In contrast, in 3CLpro, the first axis accounts for 87% of variability, and weaker contributions are made from the second, third, and fourth axes.

In conclusion, in this study MERS CoV proteases were analyzed for codon usage and codon usage bias. MERS CoV encodes 2 proteases, PLpro and 3CLpro. The main protease is 3CLpro, which cleaves most of the sites in the virus polyprotein. PLpro is a multifunctional protein that possesses protease and other immune-related activities. We report differences in codon usage patterns and codon usage bias. Codon bias was low overall, as indicated by relatively high ENc values. Although both proteins have compositional bias with higher AT%, PLpro was biased against G nucleotides in its codon third position as well as in rare codons. In contrast, 3CLpro preferentially used G nucleotides. Compositional constraints, mutational pressure, and translational selection influenced codon bias in PLpro, whereas directed mutational pressure had the greatest influence on 3CLpro. Correspondence analysis of MERS CoV proteases reveals the presence of 5 different MERS CoV genetic clusters, including one common and 4 variant clusters. The codon usage pattern of MERS CoV PLpro might be associated with its broad function for virus processing and host-related interactions.

Acknowledgments

This work was supported by the Deanship of Scientific Research, King Faisal University, Grant number 160016. URL: https://www.kfu.edu.sa/en/deans/research/pages/home-new.aspx.

Conflict of Interest

The authors declare no conflict of interest.

Supplementary Materials

The online version of this article contains supplementary materials.

REFERENCES
 
© 2017 The Pharmaceutical Society of Japan