2020 Volume 17 Pages 14-24
In the present study, thermodynamic properties of coarse-grained protein models have been studied by an extended ensemble method. Two types of protein model were analyzed; one is categorized into a fast folder and the other into a slow folder. Both models exhibit the following thermodynamic transitions: the collapse transition between random coil states and spatially compact, but non-native states and the folding transition between the collapsed states and the folded native states. Caloric curve for the fast folder shows strong statistical ensemble dependence, while almost no ensemble dependence is found for the slow folder. Microcanonical caloric curve for the fast folder exhibits S-shaped temperature dependence on the internal energy around the collapse transition which is reminiscent of the van der Waals loop observed for the first order transition; at the transition temperature, the collapsed and random coil states coexist dynamically. The corresponding microcanonical heat capacity is found to have negative region around the transition. This kind of exotic behaviors could be utilized to distinguish fast folding proteins.
Coarse-grained protein models studied exhibit two thermodynamic transitions: the collapse transition between random coil and spatially compact, but non-native states and the folding transition between the collapse and native states. Caloric curve for a fast folding model shows strong statistical ensemble dependence especially around the collapse transition, while almost no ensemble dependence is found for a slow folder. Microcanonical caloric curve for the fast folder exhibits S-shaped temperature dependence on the internal energy around the collapse transition; the corresponding microcanonical heat capacity is found to have negative region around the transition. This kind of exotic behaviors could be utilized to distinguish fast folding proteins.
Proteins are heteropolymers consisting of amino acid residues linearly linked by peptide bonds. Protein molecules are known to spontaneously fold into compact three-dimensional structures and to biologically function in the native states. Since the pioneering experiments by Anfinsen and coworkers [1], it has been known that the amino acid sequence uniquely determines the native states. To understand how the information coded in the amino acid sequence is translated into the corresponding three-dimensional structure is the core of the protein folding problem. Understanding the problem would allow us to predict three-dimensional structures in the native states only by the knowledge of the amino acid sequence. Despite many remarkable advances in the last decades [2–7], the protein folding problem is still far from a solution.
It is important to point out that not all polypeptides are proteins; only a very small subset of all the possible sequences of the 20 naturally occurring amino acids has been selected by evolution. While natural proteins fold into uniquely determined native states quickly, a generic polypeptide does not. Much effort has been made to understand this issue; one important perspective is provided by the consistency principle [2] which states that various energy terms in proteins, for example, short- and long-range interactions, are consistent with each other. This idea has been extended as the principle of minimum frustration and applied to the random energy model to shed light on the phase behavior of proteins [8]. These theoretical efforts have been unified as a perspective assuming that protein has a funnel-shaped energy landscape with a bias toward native structures [3,9].
In the present study, we have adopted a coarse-grained protein model developed by Honeycutt and Thirumalai [10] to investigate thermodynamic behavior associated with protein folding. This HT model has originally been introduced to examine the possibility that metastable states are relevant for protein folding. In this model, conformational frustration prevents efficient relaxation to the global potential energy minimum. To model a fast folding protein, the original model has been modified with the help of the consistency principle [11–14], the resulting coarse-grained model is referred to be a Go-like HT model. The original and Go-like HT models have been studied theoretically [10–18]. Potential energy landscape (PEL) of the HT models has been explored by the eigenvector following and basin-hopping methods [12,14]; while the original HT model is characterized by the frustrated PEL around the global potential energy minimum, the Go-like HT model where the frustration in the original model is removed exhibits a funnel type PEL. Thermodynamic properties have been studied by the histogram method and several enhanced sampling methods [11,13,15–18]. In the present study, we have applied our enhanced sampling method, the multicanonical generalized hybrid Monte Carlo method [19] to the original and Go-like HT models to explore thermodynamic transitions of the protein models. Special attention has been paid to the statistical ensemble dependence of caloric curves to distinguish fast folders from slow folders. Convexity of the microcanonical entropy is demonstrated to play a key role to characterize the fast and slow folders.
We first briefly review the multicanonical ensemble method. The multicanonical ensemble [20–23] is an artificial statistical ensemble whose probability density in the potential energy space ρ_{mc}(U) is given using the density of the potential energy states
(1) |
where W(U) is a weight function to realize the constant flat distribution of the potential energy. The obvious choice of the weight function is provided by
The generalized hybrid Monte Carlo (GHMC) method is a method to combine MD and MC methods. The configuration of the system is generated by an equation of motion and the trial configuration is accepted according to the Metropolis criterion as in MC method. Here, we briefly describe our GHMC method to generate the multicanonical ensemble [19]. We first regard the multicanonical ensemble as a fictitious canonical ensemble by introducing the following multicanonical potential: U_{mc}(U)=k_{B}T_{0}W(U({r_{i}}) where T_{0} is an arbitrary temperature. Then, the multicanonical distribution can be interpreted as a fictitious canonical distribution at the temperature T_{0} by
(2) |
where k_{B} is the Boltzmann constant. Here, we introduce the following Hamiltonian H_{mc}:
(3) |
where p_{i} is the fictitious momentum and m_{i} is the associated fictitious mass of an ith particle. Using the Hamiltonian H_{mc}, we obtain the following equations of motion [22,23]:
(4) |
Given an initial state, the system state is evolved by numerically integrating the above equations over n_{MD} steps with the time increment Δt using a symplectic integrator [29]. The map from the initial state to the trial state is represented as U_{Δ}_{τ}: ({r_{i}},{p_{i}})→({r_{i}'},{p_{i}'}) where Δτ=n_{MD}×Δt. The trial state is accepted by the following Metropolis criterion:
(5) |
where
(6) |
The momentum flip is needed to satisfy the detail balance condition in the phase space. It is noted that the Hamiltonian is invariant under the momentum flip. Then, we finally apply the momentum flip and define the next accepted state vector:
In the GHMC method, the momentum is partially refreshed at each GHMC step. We mix the momentum p with a Gaussian noise vector u drawn from the Maxwell distribution at the temperature T_{0} that is carried out by the following equation:
(7) |
Here, u_{i} is generated by u_{i} = (m_{i}k_{B}T_{0})^{1/2}ξ_{i} where each component of ξ_{i} is given by the Gaussian random number with zero mean and unit variance. The angle ϕ is introduced in the range of 0<ϕ≤π/2 that controls the ratio of the momentum mixing. It is worthwhile to note that the standard hybrid Monte Carlo [30,31] is recovered by ϕ = π/2 where the momenta are fully refreshed at each GHMC step.
Here, we summarize the algorithm of the multicanonical GHMC method. We start with an initial state of the system ({r_{i}},{p_{i}}) and mix Gaussian noise vector u_{i} drawn from the Maxwell distribution at the temperature T_{0}: p_{i}←p_{i} cos ϕ+u_{i} sin ϕ for i = 1, ..., N. Then, molecular dynamics calculation based on Eq. (4) is performed for the time increment Δτ = n_{MD}×Δt to generate a trial state that is accepted by the probability
(8) |
In the present study, we have adopted a coarse-grained model of a protein molecule developed by Honeycutt and Thirumalai [10]; hereafter, we refer it HT model. The model consists of N beads. Each bead corresponds to a coarse-grained amino acid residue. Three types of beads are introduced in this model; hydrophobic (B), hydrophilic (L), and neutral (N). The primary sequence of the HT model for N=46 is as follows: B_{9}N_{3}(LB)_{4}N_{3}B_{9}N_{3}(LB)_{5}L. The potential energy for the HT model consists of the following terms: bond-length, bond-angle, torsion-dihedral, and nonbonded potential terms [32]:
(9) |
The bond-length energy U_{bond} is given by
(10) |
where k_{r} = 400ε/a^{2}, ε is the average strength of the hydrophobic interaction, and a is the equilibrium bond length; a = ε = 1 in the present study. The bond-angle energy is given by
(11) |
where k_{θ} = 20ε/rad^{2}, θ_{i} is the bond angle defined by the successive three beads along the chain i, i + 1, i + 2, and θ_{0} = 1.8326 rad is the equilibrium bond angle. The torsion dihedral energy is represented as
(12) |
where ϕ_{i} is the dihedral angle defined by the successive four beads i, i + 1, i + 2, i + 3; A_{i} = 0 and B_{i} = 0.2ε if two or more neutral (N) beads are included, otherwise A_{i} = B_{i} = 1.2ε. The nonbonded energy is given by
(13) |
where r_{ij} = |r_{i} – r_{j}|. If both i and j beads are the hydrophobic (B), C_{ij} = D_{ij} = 1; C_{ij} = 2/3 and D_{ij} = –1 for LL and LB pairs. In the case of a pair including at least one neutral (N) bead, C_{ij} = 1 and D_{ij} = 0. The global minimum energy structure is known to be a β-barrel structure with an energy of U_{0} = –49.2635 for the HT model; the minimum energy structure is defined to be the native structure of the HT model. The potential energy landscape around the native structure is found to be highly frustrated. Here, we introduce a Go-like HT model guided by the consistency principle [11–14]; the original HT model has been modified by removing the attractive interactions between non-contacting hydrophobic beads in the native structure of the HT model. By this treatment, the native structure is specifically stabilized and the frustration in the original model is expected to be greatly reduced. In the present study, the Go-like model is constructed by setting D_{ij} = 0 for the pair whose interparticle distance r_{ij} is longer than 1.3a in the native structure of the original model [11]. The resulting minimum potential energy is found to be U_{0} = –41.0228 that is higher than that of the original model due to switching off the attractive interaction between noncontact pairs in the global minimum structure, while the native structure itself is intact.
In this section, we describe details on the calculations of the original and Go-like HT models. Algorithmic parameters of the multicanonical GHMC method are the following: the fictitious temperature T_{0}, mixing angle ϕ, MD time step Δt and MD steps n_{MD} in single GHMC step. The fictitious temperature was set to be 0.5. As guided by our previous results [19], we adopted ϕ = π/8 and n_{MD} = 20 for both original and Go-like HT models in the present calculations. We set Δt = 0.015 for the original HT model and Δt = 0.020 for the Go-like HT model. Total number of the GHMC steps amounted to 2.0×10^{7} for the original HT model and 4.0×10^{7} for the Go-like HT model. The potential energy distribution was designed to be flat in the energy range [–43, 95] for the HT model and [–34, 95] for the Go-like HT model. The potential energy distributions calculated by our multicanonical GHMC are presented in Figure 1. We find that the distribution in the range specified above is flat enough, demonstrating efficient sampling in configuration space. To perform the multicanonical calculation, we first numerically determined the multicanonical weight W(U). For the original HT model, we performed the replica exchange HMC calculations. 30 replicas were used to cover the energy range to be flatten; the exchange probabilities were in the range from 40 to 60%. 1.0 × 10^{6} HMC steps were carried out for each replica. Then, in order to obtain W(U), the density of potential energy states
Potential energy distributions generated by the multicanonical GHMC calculations for the HT model (upper panel) and the Go-like HT model (lower panel).
We first show the time series of the potential energy of the protein models along the multicanonical GHMC trajectory. The potential energies are presented in Figure 2. It is found that the broad range of the potential energy is covered by the multicanonical GHMC method, as compared to the standard canonical simulation, see Figure 7. The multicanonical calculations sample various structures from spatially extended random coils to compact structures around the global potential energy minimum. While the potential energy continuously changes in the case of the original HT model, intermittent fluctuation of the potential energy is observed for the Go-like HT model. To reveal the underlying potential energy landscape, we have applied the inherent structure analysis [33–39] to the multicanonical GHMC results. In this analysis, the structures along the trajectory are instantaneously quenched down to the absolute zero; each structure is mapped to the potential energy minimum nearest in the configuration space. In the present study, this mapping was performed by the conjugate gradient method. As mentioned in Sec. 3, the minimum potential energy is given by U_{0} = –49.2635 for the original HT model and by U_{0} = –41.0228 for the Go-like HT model. For the original HT model, almost energetically degenerate inherent structures are found around the global minimum, demonstrating frustrated energy landscape. On the other hand, the inherent structures around the global minimum are sparsely distributed for the Go-like HT model; the energy of the global minimum structure is found to be well separated from those of metastable structures. In Table 1, the number of the inherent structures found is listed for both the original and Go-like HT models. For the original HT model, the number is in good agreement with a previous work [17] in low energy ranges; in higher energy ranges, larger number of inherent structures are found in comparison with the previous work. This demonstrates that broader sampling in the configuration space has been realized by our multicanonical GHMC calculation. In the case of the Go-like HT model, the number of the inherent structures is reduced especially in the low energy ranges, corresponding to the sparse potential energy distribution of the inherent structures seen in Figure 2.
The potential energy U and the potential energy of the associated inherent structure U_{IS} as a function of GHMC steps for the HT model (left panels) and the Go-like HT model (right panels). The snapshot gives the global minimum energy structure. Each bead corresponds to a coarse-grained amino acid residue. Three types of beads are introduced in this model; hydrophobic (blue), hydrophilic (yellow), and neutral (red).
HT model | Go-like HT model | ||||
---|---|---|---|---|---|
Energy range | Present work | STMD | Energy range | Present work | |
U_{0}<U≤–49 | 5 | 5 | U_{0}<U≤–41 | 1 | |
–49<U≤–48 | 35 | 35 | –41<U≤–40 | 1 | |
–48<U≤–47 | 159 | 149 | –40<U≤–39 | 5 | |
–47<U≤–46 | 383 | 309 | –39<U≤–38 | 15 | |
–46<U≤–45 | 742 | 547 | –38<U≤–37 | 31 | |
–45<U≤–44 | 1222 | 760 | –37<U≤–36 | 80 | |
–44<U≤–43 | 1886 | 918 | –36<U≤–35 | 457 |
STMD indicates the statistical temperature molecular dynamics results reported in Ref. [17].
Using the configurations distributed according to the multicanonical density, we can evaluate the canonical ensemble average of physical quantities. The canonically averaged value of a physical quantity A(r_{1}, ..., r_{N}) is given by
(14) |
where ‹…› indicates the canonical ensemble average while ‹…›_{mc} indicates the multicanonical ensemble average. We also analyze the results on the basis of the microcanonical ensemble. The thermodynamic potential of the microcanonical ensemble is given by the entropy S(E) calculated using the density of states Ω(E) by
(15) |
where E is the internal energy. In the present multicanonical calculations, the density of potential energy states
(16) |
where C_{N} = (2mπ)^{3}^{N}^{/2}Z/Γ(3N/2); here Z is the partition function and Γ is the gamma function. Using the entropy, for example, the temperature T(E) is obtained by
(17) |
We present the caloric curves for the original and Go-like HT models in Figure 3. The internal energy E by the canonical ensemble is given as follows:
The internal energy E (top panel), the heat capacity C (middle panel) and the radius of gyration R_{g} (bottom panel) as a function of temperature T. Blue curves for the original HT model and red curves for the Go-like HT model; solid lines indicate the canonical ensemble results and dashed lines in the upper panel indicate the microcanonical ensemble results. The internal energy for the Go-like HT model around T = 0.57 is given in an inset.
(18) |
The associated heat capacity is calculated using the variance of the potential energy
(19) |
For both models, the internal energy decreases on cooling. For the Go-like HT model, steep decrease is found around T = 0.61. The corresponding heat capacity has a clear peak around the temperature. The peak signals the transition from a random coil state at high temperature to a spatially compact collapsed, but non-native state at low temperature, which has been verified by examining the order parameter introduced below. Regarding the spatial extent, we find steep decrease of the radius of gyration R_{ɡ} around the transition on cooling, as shown in Figure 3. According to the previous studies [16,41–43], we call it the collapse transition. For the original HT model, similar but much milder temperature dependence is found for the caloric curve and the heat capacity; a peak in C also signals the collapse transition. In Figure 3, the microcanonical caloric curve, E vs T = (∂S/∂E)^{–1} is also presented. For the Go-like HT model, remarkable ensemble dependence is observed; the caloric curve is found to exhibit S-shaped dependence around the collapse transition temperature T_{θ}, which is reminiscent of the van der Waals loops regarding the first order transition [44]. This point will further be discussed later. In the microcanonical caloric curve, we find another inflection point around T = 0.57 corresponding to the folding transition temperature characterized by the order parameter below. This observation suggests that the microcanonical caloric curve contains richer information compared with the canonical counterpart. For the original HT model, almost no ensemble dependence is found for the caloric curve. This type of strong ensemble dependence of the caloric curve has been observed for a fast folding protein model characterized by two states, folded and random coil states [45]. In Ref. [45], the model has been demonstrated to show a first-order type transition; the above statistical ensemble dependence is known to be observed for finite systems exhibiting first-order type transitions. In the present study, the HT models are characterized by three states, folded, collapsed and random coil states.
To quantitatively analyze the folding transition to the native state, the following order parameter is introduced [16,42]:
(20) |
where
(21) |
The calculated results are presented in Figure 4. At high temperature, the averaged order parameter is vanishingly small for the both models, since the proteins are in random coil states. For the Go-like HT model, the order parameter rapidly increases around T = 0.6 with lowering the temperature; the associated fluctuation χ_{QQ} has a peak at T_{f} = 0.57 that indicates the folding transition from the collapsed state to the native state. For the original HT model, on the other hand, the averaged order parameter gradually increases; a peak in the fluctuation function is found at much lower temperature T_{f} = 0.19 in comparison with the Go-like HT model. The transition temperatures are collected in Table 2; the transition temperatures are in good agreement with previously reported values [11]. Using the transition temperatures, the following parameter σ can be introduced and is known to be correlated with the folding time of proteins [43]:
The order parameter Q (upper panel) and the order parameter fluctuation χ_{QQ} (lower panel) as a function of temperature T; blue curves for the original HT model and red curves for the Go-like HT model.
T_{f} | T_{θ} | σ | |
---|---|---|---|
HT model | 0.19 (0.19) | 0.65 (0.66) | 0.71 (0.71) |
Go-like HT model | 0.57 (0.59) | 0.61 (0.61) | 0.07 (0.03) |
The parameter σ is defined by σ=(T_{θ}–T_{f})/T_{θ}. Values in parentheses are given in Ref. [11].
(22) |
where T_{θ} is the collapse transition temperature. Protein molecules for σ<0.1 are empirically known to show fast folding and those for σ>0.6 slow folding. As seen in the Table, the Go-like HT model falls in the range of the fast folder and the original model in the slow folder.
To further gain the insight on the nature of the transitions, we present the entropy S(E) as a function of the internal energy E around the collapse transition in Figure 5. While the entropy is found to be a concave function of E for the original HT model, the entropy has a convex E dependence in some energy region for the Go-like HT model. In the convex region, we can draw a tangent common to two energy values; the slope is found to be indeed the inverse T_{θ}. The internal energy distribution at the temperature T_{θ} is also presented in Figure 5. The internal energy indicates bimodal distribution for the Go-like HT model. The lower energy peak corresponds to the collapsed state and the higher energy peak the random coil state; the peak energy difference gives the latent heat as seen in the first order transition. Thus, the collapse transition for the Go-like HT model could be categorized into the first order type transition. For the original HT model, the energy distribution is unimodal, suggesting the collapse transition could be categorized into a continuous type transition. As shown in Figure 6, the energy distribution for the both models at the folding transition temperature is found to be unimodal; again, the folding transition for the both models could be a continuous type. In Figure 7, the instantaneous potential energy and the order parameter are plotted along the canonical GHMC trajectory controlled at the temperature T_{θ}. For the Go-like HT model, low and high potential energy states are clearly shown to coexist dynamically; the former is the collapsed state and the latter the random coil state. Typical configurations of the collapsed and the random coil states are presented in Figure 7. It is worthwhile to mention that the collapsed state has relatively large value of the order parameter Q. This is due to the fact that the collapsed state for the Go-like HT model has a barrel region consisting of three β-strands common to the native structure. To further clarify this point, we introduce a partial order parameter Q_{p} that is defined by Eq. (20) only using the above β-barrel region. This parameter is normalized to be unity if the region is similar enough with the β-barrel found in the native state. In Figure 8, the calculated Q_{p} is presented. The large Q_{p} values are observed for the collapsed state. The trend is clearer for the partial order parameters calculated using the inherent structures which have larger values close to unity. This demonstrates that the collapsed state could be regarded to be a partially folded state. On the other hand, the collapsed state for the original HT model has no such partial structural similarity with the native state.
Microcanonical entropy S(E) and the potential energy distribution P(E) at the collapse transition temperature T_{θ} for the HT model (left panels) and the Go-like HT model (right panels). Dashed line in the right upper panel indicates T_{θ}.
Microcanonical entropy S(E) and the potential energy distribution P(E) at the folding transition temperature T_{f} for the HT model (left panels) and the Go-like HT model (right panels).
The potential energy U and the order parameter Q at the collapse transition temperature T_{θ} as a function of the canonical GHMC steps for the HT model (left panels) and the Go-like HT model (right panels). The left and right figures are typical configurations of the collapsed and the random coil states for the Go-like HT model, respectively.
The partial order parameter Q_{p} (upper panel) and the corresponding partial order parameter calculated by the inherent structure Q_{IS, p} (lower panel) for the Go-like HT model at the collapse transition temperature T_{θ} as a function of the canonical GHMC steps.
In the microcanonical ensemble, it might seem that a stable internal energy intermediate between the collapsed and random coil states could be given by phase separation; however, the system is too small to phase-separate. The van der Waals loop type behavior observed for the caloric curve can be understood by the kinetic energy distribution at fixed internal energy E. The kinetic energy distribution P(K; E) for a given internal energy E can be written by [46]
(23) |
where K is the kinetic energy and s=3N–6. The calculated result is presented in Figure 9. For the Go-like HT model, in the vicinity of the collapse transition, the kinetic energy distribution broadens toward lower energy with increasing the internal energy. This arises from the system climbing up the potential energy surface, accompanying the β-barrel deformation in the collapsed state. The resulting averaged kinetic energy shows the S-shaped dependence as seen in the microcanonical caloric curve. For the original model, monotonic dependence of the averaged kinetic energy is observed, although the kinetic energy distribution broadens at the collapse transition. The S-shaped dependence seen for the Go-like HT model yields the exotic energy dependence of the heat capacity. The microcanonical heat capacity is evaluated by
The kinetic energy distribution as a function of the internal energy E for the HT model (upper panel) and the Go-like HT model (lower panel). Dashed lines indicate the averaged kinetic energy as a function of the internal energy.
(24) |
The calculated heat capacity is presented in Figure 10. For the Go-like HT model, the negative heat capacity is found around the collapse transition, reflecting the S-shaped loop in the microcanonical caloric curve. Indeed, this is due to the fact that an increase in the internal energy causes a temperature reduction. A peak is found around the folding transition arising from the inflection point in the microcanonical caloric curve. For the original HT model, a broad peak is found at the collapse transition.
The microcanonical heat capacity as a function of the internal energy E for the HT (blue curve) and the Go-like HT (red curve) models.
In the present study, we have performed the multicanonical generalized hybrid Monte Carlo (GHMC) calculations for coarse-grained protein models. One is a model proposed by Honeycutt and Thirumalai (HT) and the other is a modified version called a Go-like HT model that the global minimum structure is specifically stabilized. While the original HT model is characterized by highly frustrated energy landscape near the global potential energy minimum, the Go-like HT model has a deep global potential energy minimum of the native state. The protein models exhibit two types of thermodynamic transitions; the collapse transition between random coil states and spatially compact, but non-native states and the folding transition between the collapsed states and the folded native states. The collapsed state for the Go-like HT model actually is found to be a partially folded state, while the collapsed state for the original HT model has no structural similarity with the native folded state. An empirical parameter regarding foldability defined using the transition temperatures indicates that the original HT model is a slow folder and the Go-like HT model is a fast folder. Caloric curve for the Go-like HT model is found to show strong statistical ensemble dependence, while that for the original model does not. Microcanonical caloric curve for the Go-like HT model shows S-shaped temperature dependence of the internal energy around the collapse transition that is reminiscent of the van der Waals loop characteristic of the first order transition; at the transition temperature, the collapsed and random coil states coexist dynamically. The associated microcanonical heat capacity has negative region around the transition. On the other hand, the original HT model does not show such exotic behaviors. This ensemble dependence and the associated negative heat capacity could be utilized to distinguish fast folding proteins. An important future direction of the present study includes extension of our treatment to more realistic all atom models to explore the energy landscape of hydrated proteins; such a study with the help of integral equation theories of liquids [47,48] is in progress in our research group.
The authors declare that they have no conflict of interest.
N. M. performed all the simulations and analyses. N. M and S. M. directed the entire project and co-wrote the manuscript.