2024 Volume 24 Pages 25-47
In structure-based drug design, the fragment molecular orbital (FMO) method, which can quantitatively evaluate interaction energy with high accuracy, helps identify critical amino acid residues and types of inter- and intramolecular interactions for molecular recognition of ligands, peptides, and antigens. In this study, we performed FMO calculations using 679 apoprotein structures, classified them according to their characteristic amino acid residue pair groups (e.g., charged, polar, hydrophobic, and aromatic residues), and statistically analyzed the characteristics of their intramolecular interactions by using inter-fragment interaction energy (IFIE). The average total IFIE between amino acid residues was stronger in the order of pairs of oppositely charged residues, pairs of charged and neutral polar residues, pairs of neutral polar residues, and pairs of hydrophobic residues. Focusing on the dispersion interaction energy obtained using energy decomposition, except for the pairs of both charged residues, pairs of amino acid residues that consisted of aromatic rings had the largest attractive interaction energy compared to other residue pair groups. Furthermore, we compared the intramolecular interaction energy between the FMO and the classic molecular mechanics (MM) methods. The latter tended to estimate the attractive and repulsive interaction energies were weaker than those using the FMO method. As for the dispersion force, the van der Waals interaction energy obtained by the MM method tended to be partially repulsive of amino acid residue pairs with relatively short distances. Such statistical energy interaction analyses of amino acid residue pairs not only depend on understanding the intramolecular interactions of proteins, but is also expected to act as the indicators of IFIEs in molecular recognition (e.g., antigen–antibody and peptide drugs). Furthermore, analysis of critical amino acid residues and their interactions will help understand proteins' molecular bonding and structural stability.
1. Introduction
In structure-based drug design (SBDD), such as in silico drug discovery, analyzing protein-ligand interactions and protein binding sites using docking simulations that are based on known 3D structural information of target proteins is critical for screening and designing inhibitors and substrates. Traditionally, classical molecular mechanics (MM) models, using empirical potentials as molecular force fields, have been widely used for biomolecular simulations because of their reliability and low computational costs. With the emergence of simulation by quantum mechanical (QM) calculation, it has become possible to solve the electronic state of molecules with higher accuracy than with the MM method. The fragment molecular orbital method (FMO) [1–4], one of the QM methods, efficiently processes the QM calculation for all the atoms of the entire system ab initio, which was conventionally expensive, as it divides a target protein into small fragments. Additionally, the FMO method can estimate inter-fragment interaction energy (IFIE) with high accuracy. Since IFIE can be decomposed into its energy components using pair interaction energy decomposition analysis (PIEDA) [5–7], analysis of their interactions can be applied to drug discovery, such as the selection of lead compounds for SBDD [8–13].
Previously, several reports have analyzed the intermolecular and intramolecular interactions calculated using FMO from the viewpoint of amino acid residues. Studies [14] that performed FMO calculations on proteins analyzed 35 G-protein coupled receptors (GPCR) crystal structures and revealed a relationship between the strength of interactions between residues and the chemical properties of the transmembrane domain. These results also lead to an understanding of how interactions contribute to structural stability, ligand binding, and activation of GPCRs. In studies [15], FMO calculations have been performed on agonists/antagonists of the ligand-binding domain of nuclear receptors (NRs). This report suggests that interactions between amino acid residues in helix 12 play a vital role in the pharmacological function of NRs. Research focusing on intermolecular interactions has analyzed the correlation between IFIE by FMO and structural stabilization, as well as the mechanism of stability in protein-DNA/RNA complexes [16,17]. Other studies have used FMO to analyze protein-protein interactions and identify residues important for binding [18]. Thus, more precise interaction analysis at the amino acid residue level can lead to an understanding of the structural stability of proteins, their functions, and control mechanisms. Furthermore, amino acid residues contributing to ligand binding or binding to other molecules can be identified and used to design peptides and antibodies.
On the other hand, in the amino acid residue-based interaction analysis using the conventional MM method, a study [19] performed molecular mechanics calculations on 500 antigen-antibody complexes, and pairwise interactions were statistically analyzed for each type of amino acid residue. There are also studies [20–22] that analyze interactions using not only the MM method but also the QM method. However, these studies target only specific proteins or analyze model systems of amino acid side chain pairs of proteins and do not analyze a large number of actual proteins using QM.
FMODB [23–26] is a web-based public database for storing, managing, and sharing fragment molecular orbital (FMO) computational datasets describing complex interactions between biomacromolecules. Recently, even novices without prior knowledge can openly access FMO data calculated and analyzed by experienced researchers via such FMODB, quickly obtaining a large amount of data. In our previous study [25], we statistically analyzed IFIE and PIEDA energy components for various interaction types, such as hydrogen bonding, CH/π, and ion-pair interactions, among fragment pairs in all QM-calculated data registered in FMODB. These results can be used as indicators to estimate the interaction intensity of IFIE and PIEDA components. However, statistical analysis of amino acid residues in the FMO-calculated data of apoproteins deposited in FMODB has not yet been performed, and the characteristics of IFIE and PIEDA components in amino acid residue pairs have not been thoroughly analyzed or verified.
In this study, we performed quantum chemical calculations with the FMO method on a relatively large structural data set of actual apoprotein systems obtained from the Protein Data Bank (PDB) [27]. In the FMO calculations performed in this study, amino acid residues cleaved at specific positions of the conventional fragmentation method were treated as single fragments, and the apoprotein molecule was divided for the calculations. In this paper, we statistically analyzed the characteristics of IFIE and PIEDA values for all amino acid residue pairs obtained by calculation, focusing on the type of amino acid residue. Finally, to highlight the characteristics and advantages of the interaction energy derived from FMO calculations, we compared these energies with those obtained using the conventional MM method. This study provides an index of IFIE for intra-protein interactions using the FMO method and helps to elucidate the interaction energy in the molecular recognition of antigen–antibody and peptide drugs.
2. Materials and Methods
2.1 Dataset
Based on resolution and quality criteria, 679 apo structures were selected from X-ray crystal structures in the Protein Data Bank [27] and used for FMO calculations and analysis. The PDB IDs of these structures are
listed in Table S1. The selected protein structures ranged from 0.81 to 1.55 Å, with an average resolution of about 1.38 Å. The number of amino acid residues ranged from 30 to 632, with an average of 178 residues (Figure S1). Figure 1 shows the lowest, median, and highest number of residues among the 679 protein apo structures. The most prevalent protein family in the structures used was the hydrolases.
Figure 1.Protein structures based on the number of amino acid residues
(a) Structure with the lowest number of residues, 30 (PDB: 1P9L). (b) Structure with the median number of residues, 178 (PDB: 4ANN). (c) Structure with the highest number of residues, 632 (PDB: 3L84).
2.2 FMO calculation
For the FMO calculation of each apoprotein used in the analysis, we used a semi-automated FMO protocol [24], which was developed to make structural model/ing, creating input files, and executing calculations easier. The protocol used the Molecular Operating Environment (MOE) [28], an integrated computational chemistry system, to preprocess protein structure preparation as follows: First, 3D structural data were corrected using the “Structure Preparation” function of MOE. Only the missing atoms were complemented, and the intermediate missing residues were capped with NME and ACE. Next, when the “Protonate3D” function is run, the optimal ionization state at pH 7.0 is determined, and hydrogen atoms are added according to that ionization state. N-terminal and C-terminal residues were treated as NH3+ and COO− ions, respectively. Finally, geometry optimization calculations using MM were performed only on hydrogen atoms and complemented atoms, with all the heavy atoms fixed under force field Amber10:EHT conditions. After the preparation of the structure, FMO calculations were performed following the protocol. At that time, the structure optimization was performed with all molecules, such as protein, small molecules, and water molecules, in the PDB structure. For the apoprotein structure after structural optimization, the molecular system to calculate FMO was extracted under the following conditions. If the molecular system contained small molecules such as metal ions or inorganic crystallizing agents, we performed FMO calculations without removing those small molecules. Furthermore, for water molecules, if small molecules were included in the molecular system, only water molecules within a distance of 3 Å were retained, and other water molecules were deleted; its FMO calculations were performed. If the small molecules were not included, all water molecules were removed before the FMO calculation. The software used for calculation was ABINIT-MP MIZUHO 3.0 [29]. The protein was split into amino acid residue-level fragments using the automatic fragmentation function (Figure S2), and FMO calculations at the MP2/6-31G* level were performed. The results of each FMO calculation for the structures used in this study are publicly available in the FMODB [23].
2.3 Inter-fragments interaction energy analysis
The FMO method was developed for efficiently performing quantum chemical calculations on biopolymers[1–3]. It divided a protein into amino acid residue fragment units and solved the electronic state of the entire system using a two-body approximation. In addition, as the molecular orbital calculation for each fragment monomer and dimer can be calculated in parallel, the calculation cost can be considerably reduced, and the calculation can be executed efficiently. The total energy of the FMO calculation for the entire system is expressed in Equation (1), where each fragment (monomer) and fragment pair (dimer) were calculated under the environmental electrostatic potential of its surrounding fragments using the Schrödinger equation.
Here,
PIEDA enables a more detailed analysis of interactions between fragments. As shown in Figure S2, in the general division scheme in FMO, the FMO calculation is performed by splitting the fragments at the bond between the Cα and C=O carbon atoms of the main chain in the amino acid. Similarly, this paper treated fragments divided at the cleavage positions shown in Figure S2 as individual units of amino acid residues, which were then used for statistical analysis.
In contrast, in the MM method, the interaction energy in the molecular force field [25] was calculated using the MOE version 2019.0101 [28]. Calculations were performed in a vacuum using the Amber10EHT force field. In the MM method, the electrostatic interaction energy
Interactions between fragment pairs were detected from the geometric arrangement of fragment pairs using the prolig_Calculate function available in the MOE. The prolig_Calculate function calculates hydrogen bonds; π-orbital interactions (e.g., CH/π, cation/π, and π–π interactions); ionic interactions; interatomic distances; and bond orientation angles. The energy cut-off value for detecting each interaction was set at −0.5 kcal/mol, the default setting.
3. Results and Discussion
3.1 Analyses of IFIE and PIEDA for amino acid residue pairs
3.1.1 Fragments used for analysis
We performed FMO calculations using 679 structures in the PDB and analyzed the interaction energies between each fragment. In FMO calculation, a protein molecule is divided into small molecular units called fragments; in this research's automatic fragment division, one amino acid residue was treated as one fragment for calculations and analysis. However, for Cys residues with disulfide (S-S) bonds, the pair of bonded Cys residues is treated as one fragment. In this study, we aimed to investigate the characteristics of each fragment, that is, each amino acid residue, so if two Cys residues were treated as one fragment, the IFIE of the pair would be more significant than usual, which may affect the analysis. Therefore, the 258 Cys residues with disulfides were excluded from this analysis. Pairs containing metal, water, solvent, N-terminus, and C-terminus were also excluded as outliers from the statistics. Furthermore, our previous study suggested a high possibility that steric hindrance has occurred when
Figure 2. Heatmap of the total number of fragment pairs
The total number of fragment pairs (614,937 pairs) of the 679 FMO calculations is shown as a heat map. The numbers indicate the number of pairs (per 1,000). The amino acid residue names of the pairs are shown on the horizontal and vertical axes, and the shade of green indicates the total number. The shade becomes darker as the total increases. The maximum number of pairs is 14,330, represented as 14.3 on the graph, while the minimum is 103, shown as 0.1.
Figure 2 shows a heatmap of the number of fragment pairs (i.e., the number of pairs of amino acid residues) categorized by amino acid residues’ characteristics for the total 614,937 fragment pairs. Overall, it can be seen that the ratio of pairs of hydrophobic and neutral Leu, Ile, and Val was high. A total of 210 pairs of fragments were observed, among which the most numerous were Leu–Val, with 14,330 pairs.
Figure 3 shows the total number of fragments, and Table 1 shows the abundance ratio of each fragment. Leu had the highest ratio of fragments, comprising 11.78% of the total, followed by Val with 8.77%. The lowest ratio among all residues was Cys, at 0.92%, likely due to the exclusion of disulfide-bonded Cys.
Table 1. The abundance ratio of fragments for each amino acid residue
Residue | Asp | Glu | Arg | Lys | His | Asn | Cys | Gln | Ser | Thr |
Type | Chg | Chg | Chg | Chg |
NP Aro |
NP | NP | NP | NP | NP |
Frequency (%) | 4.68 | 5.21 | 5.25 | 4.95 | 1.76 | 3.81 | 0.92 | 3.41 | 5.07 | 5.26 |
Residue | Tyr | Ala | Gly | Ile | Leu | Met | Pro | Val | Phe | Trp |
Type |
NP Aro |
Hyd | Hyd | Hyd | Hyd | Hyd | Hyd | Hyd |
Hyd Aro |
Hyd Aro |
Frequency (%) | 4.39 | 8.21 | 5.62 | 7.20 | 11.78 | 2.42 | 3.82 | 8.77 | 5.41 | 2.04 |
The percentage (%) of each amino acid residue present in all the fragments (1,229,874 fragments) of the analyzed data (679 FMO calculations) is listed. Below the residue name, its classification is listed based on its characteristics: “Chg” represents a charged residue, “NP” denotes a neutral polar residue, “Hyd” indicates a hydrophobic residue, and “Aro” signifies a residue with an aromatic ring.
Figure 3. The total number of each amino acid residue in all the fragments
The total number of each amino acid residue present in all the fragments (1,229,874 fragments) from the analyzed data (679 FMO calculations) is listed.
We classified amino acid residues according to their properties into pairs consisting of charged residues, charged and neutral polar residues, neutral polar residues, and hydrophobic residues, respectively. Interaction analysis was performed based on these classifications. The characteristics of each pair group are as follows:
Charged residue pairs
Pairs consisting of charged amino acid residues. This group consists of pairs of basic residues (Arg and Lys) charged to +1e and acidic residues (Asp and Glu) charged to −1e.
Charged and neutral polar residue pairs
Pairs consisting of a charged residue and a polar residue. Charged residues are also polar, but the polar residues shown here refer to neutral polar residues. Thr, Asn, Ser, Gln, and Cys are polar residues that differ from other charged residues in this respect. His and Tyr are polar residues with aromatic rings. This study excluded fragments, including two Cys residues with disulfide bonds, from the analysis because the automatic fragmentation in the FMO calculation did not cleave the covalent bond.
Neutral polar residue pairs
Pairs of amino acid residues that are polar and neutral. Neutral polar residues included Thr, Asn, Ser, Gln, and Cys. His and Tyr, with aromatic rings, were the polar residues included.
Hydrophobic residue pairs
Pairs consisting of hydrophobic and nonpolar residues. The targets were Ile, Leu, Val, Gly, Pro, Ala, Met, as well as hydrophobic aromatic ring-bearing residues, Trp and Phe.
Figure S4 shows the spatial positions of four types of pairs: charged residue pairs, charged-neutral polar residue pairs, neutral polar residue pairs, and hydrophobic residue pairs, for the three structures in Figure 1.
3.1.2 total IFIE
Figure 4 shows a heat map of the average total IFIE of each fragment pair, and the bar graph below shows the mean of the total IFIE average value in the vertical columns of the heatmap. The total IFIE represents the sum of interaction energies between each residue pair and is a critical factor related to the stability of protein tertiary structure and folding.
Focusing on the interaction between charged amino acid residues, the total IFIE calculated using the FMO method showed a stronger interaction energy compared with other types of residue pairs. The largest average total IFIE values were found for pairs of basic residues (Arg and Lys, charged +1e) and acidic residues (Asp and Glu, charged −1e) with their average energy values ranged from −63.9 to −68.9 kcal/mol. This is owing to strong ionic interactions from positively and negatively charged functional groups, driven by charge-derived Coulomb forces, resulting in relatively large attractive interaction energies between the pairs. Among the attractive interactions between the positively and negatively charged pairs, the Arg–Asp pair had the strongest total IFIE average value of −68.9 kcal/mol, the weakest being the Lys–Glu pair with −63.9 kcal/mol. As shown in Figure 5, the attractive interaction energy value of the Arg pair is greater than that of the Lys pair because the guanidinium group, which has one more −NH group, facilitates interactions, such as hydrogen bonding. The strong interactions between these positively and negatively charged amino acid residue pairs play a crucial role in stabilizing the protein's tertiary structure. Conversely, the repulsive interaction energies of total IFIE were remarkable, 29.4~37.2 kcal/mol, for same positive- or negative-charged fragment pairs. This strong repulsive interaction energy can contribute to protein instability, potentially destabilizing the protein's tertiary structure.
Figure 4. The mean total IFIE value for each amino acid residue pair
This heat map represents the mean total IFIE value calculated using the FMO method for each fragment pair (amino acid residue pair). All cells are colored differently from blue to red according to the mean value of IFIE: colors from red to orange (−20 to 0 kcal/mol), from light blue to dark blue (0 to +20 kcal/mol). The bar graph below represents the average of the mean total IFIE values from the vertical columns of the heatmap. The numbers on the bar graph indicate their standard deviations.
In interactions between charged and neutral polar residues, due to the exclusion of disulfide-bonded fragments, only the pairs involving Cys exhibited a weak total IFIE. However, in general, the average energy value of such pairs was relatively strong at −5.9 to −11.3 kcal/mol. Among the attractive interactions between charged and polar amino acid residue pairs, the Asp–Asn pair had the greatest total IFIE average of −11.3 kcal/mol, and the weakest interaction pairs were Arg–Thr and Lys–Thr with −5.9 kcal/mol, excluding the Cys-containing pair. Furthermore, the attractive interaction energy was stronger in pairs with negatively charged residues than in those with positively charged residues, with the strongest interactions found in pairs between neutral polar residues and Asp.
For pairs comprised of polar amino acid residues, the total IFIE was not very strong, ranging from −2.9 kcal/mol to −6.9 kcal/mol. Among them, the largest values were −6.9 kcal/mol for the Asn–Asn and −6.5 kcal/mol for the Gln–Gln pair, as the side chains had hydrogen bond donors and acceptors in the amide group.
Figure 5. Structural comparison between the Arg–Asp and Lys–Asp pairs
(a) An example structure of the Arg–Asp. (b) An example structure of the Lys–Asp pair, where the distance between residues is approximately 1.6 Å. In the case of the Arg side chain, there are two hydrogen bonds of the guanidinium group that can form hydrogen bonds with the carboxyl group of the Asp side chain. However, in the case of the Lys side chain, there is only one hydrogen bond formed by the amino group.
Finally, the hydrophobic residue pair group had the weakest total IFIE among all residue pairs, and the value ranged from −2.0 kcal/mol to −3.9 kcal/mol. The weakest overall mean total IFIE values were for Pro–Pro and Gly–Gly at −2.0 kcal/mol, belonging to this hydrophobic group. Conversely, Trp–Pro (−3.9 kcal/mol) and Trp–Trp (−3.8 kcal/mol) had the largest average values of attractive interaction energies in this residue group.
Figure 6 shows a stacked graph for 20 amino acid residues, averaging the interaction energies [total IFIE, electrostatic interaction energy (ES), exchange repulsion energy (EX), charge transfer energy with higher order mixed terms (CT+mix), and dispersion force (DI)] for all the pairs involving that particular residue. Overall, charged residues had more characteristic average energies of total IFIE and ES components than other amino acid residues. The strength of the total IFIE value was ordered as Asp (−12.7 kcal/mol) > Glu (−12.3 kcal/mol) > Arg (−11.9 kcal/mol) > Lys (−11.6 kcal/mol) in Table S3. Contrary to other charged residues, Asn and Gln, which are neutral polar residues, had slightly stronger attractive interaction energies in total IFIE and ES components. These results stemmed from the side chains possessing carbonyl (C=O) and amino (−NH2) groups, serving respectively as hydrogen bond acceptors and donors, thus facilitating the formation of hydrogen bonds. In all residues, the energy value of the CT+mix component did not show much difference compared to the ES componen. However, this was due to the relatively small ratio of the CT+mix component to the total IFIE, obscuring the differences. The repulsive interaction energy of the EX component was slightly greater for neutral polar residues than for hydrophobic residues. For His, Tyr, Phe, and Trp, which have aromatic rings, the attractive interaction energy of the DI component was slightly more powerful than that of other residues.
Figure 6. The mean interaction energy value for each amino acid residue
This bar and line graphs show the average interaction energies (kcal/mol) calculated using the FMO method for all the fragment pairs (e.g., Asp–Asp, Asp–Glu, Asp–Arg, and Ala–Lys: a total of 20 pairs for Asp) containing a specific amino acid residue. The purple line indicates the total IFIE, while the blue bar represents the ES, the orange bar the EX, the green bar the CT+mix, and the red bar the DI.
Next, the interaction energies between each fragment pair were decomposed using PIEDA, and statistical analyses were performed to reveal the contributions of interaction energy components in each amino acid pair.
3.1.2 Electrostatic interaction energy
The resolved ES energy component represents the electrostatic interaction energy, typically observed in the hydrogen bonding and ionic interactions. Figure S5 shows the average energy value between each fragment pair for ES, similar to the total IFIE. Compared with total IFIE (Figure 4.), the ES component displays similar overall trends. In particular, for pairs of charged residues with the highest attractive interaction energy of ranging from −62.1 to −67.7 kcal/mol, the ES component constitutes a more significant proportion of the total IFIE than it does in other pairs (Figures 4 and 6). This occurs because both residues of the pair are charged, allowing them to form both hydrogen bonds and ionic interactions. This suggests that electrostatic interactions are a crucial component of protein stability in these pairs.
3.1.3 Exchange repulsion energy
Figure S6 is the average value graph of the EX component among the PIEDA energy components and represents the exchange-repulsion energy. When the EX value between amino acid residue pairs is high, electron clouds between the residues tend to repel each other as they come closer, increasing steric hindrance. This could affect the protein's tertiary structure, leading to instability.
The largest repulsive interaction energy was found in the pairs of the positively charged Arg with the negatively charged Asp and Glu, i.e., Arg−Asp or Arg−Glu pairs, and was 5.6 kcal/mol. Conversely, the pair comprising of the same positively charged (Lys) and negatively charged residue, the Lys–Asp/Glu pair, had a difference of 1.8 kcal/mol compared to Arg–Asp/Glu. The pairs of charged and neutral polar residues exhibited the next highest repulsive interaction energies, ranging from 2.0 to 4.7 kcal/mol. Pairs of residues, one negatively charged and the other polar tended to have greater repulsive interaction energies than those with one positively charged and the other polar. Among them, Asp–Ser was 4.7 kcal/mol, Asp–Thr was 4.3 kcal/mol, and Asp–Tyr was 3.8 kcal/mol, which was more potent than the other pairs. Ser, Thr, and Tyr have hydroxyl groups on the side chains, and the trend was consistent with the ES heatmap (Figure S5). This suggests that the larger repulsive interaction energy is influenced by the closer proximity due to hydrogen bonds and ionic interactions (Figure 7).
Among pairs comprised of two neutral polar residues, the His–Tyr pair had a noticeably high energy value of 3.6 kcal/mol. Both His and Tyr have bulky side chains (aromatic rings), and the hydroxyl group of Tyr and the imidazole group of His can act as both hydrogen bond donors and acceptors. Therefore, it is suggested that hydrogen bonds are easily formed, and the relatively short fragment pair distance leads to strong repulsive interaction energy. These phenomena may increase steric hindrance and affect the stability of the protein.
Figure 7. Structural comparison of Asp–Ser, Asp–Thr, and Asp–Asn pair
(a) Examples of the Asp–Ser pair, (b) the Asp–Thr pair, and (c) the Asp–Asn pair are examples that exhibit the highest EX values. For the Ser and Thr pairs, the distance between the hydroxyl group on the side chain and the carboxylic acid on the Asp side chain (a charged residue) is only 1.32 Å. On the other hand, in the Asn pair, which also involves a neutral polar residue, the distance from the amino group on the side chain is 1.59 Å.
3.1.4 Charge transfer interaction energy
The CT+mix term is the charge transfer interaction energy, and the attractive interaction energy value becomes greater when charge transfers, involving hydrogen bonding, occur. This may enhance the interaction between molecules and improve the structural stability of the protein. Figure S7 shows the average energy values of the CT+mix component.
Similar to total IFIE and ES, in the CT+mix values, the charge−charge pairs were the strongest (−2.6 to −3.3 kcal/mol), and the following highest energy values were the charge−neutral polar residue pairs (−1.1 to −2.6 kcal/mol). Moreover, the pairs comprising negatively charged and neutral polar residues showed slightly greater energy values than those of the positively charged and neutral polar residues, likely owing to the presence of a carboxylic acid group. Among these pairs, the highest attractive energy values were observed for the Asp−Ser pair (–2.6 kcal/mol) and the Asp−Thr pair (–2.5 kcal/mol). Although the hydroxyl groups (–OH) in the side chains of Ser and Thr could influence the interaction, the Asp–Tyr pair, which also contains a hydroxyl group, was not particularly strong, observed at –2.1 kcal/mol. This is due to the characteristic feature that Ser and Thr, unlike Tyr, lack a benzene ring and have the main chain's –NH group relatively close. Both the side chain –OH group and the main chain –NH group can form hydrogen bonds with the Asp side chain carboxylate, potentially resulting in higher attractive interaction energy values for the CT+mix term compared to pairs involving Tyr (Figure 8).
For pairs with neutral polar residues, the average energies of the CT+mix components were somewhat larger than the attractive energies of other neutral residue pairs (i.e., hydrophobic pairs), ranging from −1.1 to −1.7 kcal/mol. Among them, the Asn–Asn, Gln–Gln, and Tyr–Tyr pairs had relatively strong attractive interaction energy at −1.7 kcal/mol, and all of these were residue pairs containing both hydrogen bond donors and acceptors in their side chains.
Figure 8. Structural comparison of Asp–Ser, Asp–Thr, and Asp–Tyr pairs
(a) Asp–Ser pairs, (b) Asp–Thr pairs, and (c) examples of hydrogen-bonded structures in Asp–Tyr pairs. In the case of Ser and Thr, two hydrogen bonds can be formed between the carboxylic acid of the Asp side chain (a charged residue) and the hydroxyl group of the side chain or the −NH group of the main chain. In contrast, Tyr can form a hydrogen bond with one of the side-chain hydroxyl groups.
3.1.5 Dispersion force
The DI term in FMO represents the dispersion interaction energy and is closely related to π–π and CH/π interactions, which are hydrophobic interactions mainly related to dispersion forces. These interactions play a crucial role in facilitating intramolecular interactions and shaping the folding structure of proteins.
Figure 9. The mean DI energy for each amino acid residue pair
This heat map represents the mean value of DI calculated using the FMO method for each fragment pair (amino acid residue pair). All cells are colored according to the mean DI value: from red to orange, −20 to 0 kcal/mol; from light to dark blue, 0 to +20 kcal/mol. The bar graph below represents the average of the mean DI values from the vertical columns of the heatmap. The numbers on the bar graph indicate their standard deviations.
Figure 9 shows a heatmap of the mean energy values of the DI components. As shown in the heatmap, except for pairs of charged residues, the average energy values of the DI component were characteristic for pairs with aromatic rings. Those with aromatic rings on one or both residues exhibited greater attractive interaction energies than pairs without aromatic rings. Focusing on pairs in which both residues have aromatic rings, the pair with the strongest DI component energy value was the Trp–Trp pair with −3.2 kcal/mol. The second strongest DI value was that of the Trp–Tyr pair (−2.9 kcal/mol), and the third strongest was that of the Tyr–Tyr or Phe–Tyr pair (−2.8 kcal/mol). The pair with the weakest attractive interaction energy was His–Phe with −2.2 kcal/mol. The strong average attractive energy value of Trp–Trp pairs likely resulted from their two aromatic rings, making these pairs more prone to forming CH/π or π–π interactions than other aromatic ring residues (Figure 10). The difference between Tyr and Phe is the presence of hydroxyl groups, and this difference could affect their dispersing power. Even between the same aromatic rings, the strength of dispersion forces varies depending on the partner, and selecting different aromatic residue combinations may potentially impact the stability of proteins.
Figure 10. Structure of Trp–Trp pairs
(a) An example structure of a Trp–Trp pair with CH/π interaction, and (b) an example structure of a Trp–Trp pair with π–π interaction. The two ring structures make it easier for Trp–Trp pairs to interact with each other than for pairs involving Tyr, Phe, or His.
Pairs comprising charged residues and residues with aromatic rings had relatively strong DI energy: −3.2 kcal/mol for the Arg–Trp pair, −2.8 kcal/mol for the Arg–Tyr and Lys–Trp pairs, and −2.7 kcal/mol for the Lys–Tyr pair. Moreover, Trp, Tyr, and Phe differed depending on whether the partner was a positively charged residue (Arg or Lys) or a negatively charged residue (Asp or Glu), and the dispersion energy was greater in pairs with positively charged residues. As shown in Figure 11, this indicates that the long carbon atoms in the side chains of Arg and Lys are more likely to form CH/π interactions; the –NH groups in the side chains can also form NH/π and cation/π interactions. On the other hand, in the case of His, the attractive interaction energy with negatively charged residues was larger than that with positively charged residues, which showed properties different from those of other aromatic ring residues.
Comparing the average values for each amino acid residue instead of pairs (Table S3), the overall range of the DI energy values was −1.27 to −2.27 kcal/mol, whereas for residues with aromatic rings (His, Tyr, Phe, and Trp), the dispersion energy per fragment decreased as follows: Trp (−2.37 kcal/mol) > Tyr (−2.23 kcal/mol) > His (−2.08 kcal/mol) > Phe (−2.02 kcal/mol). Additionally, in hydrophobic residues with the weak overall total IFIE energy values, the proportion of DI values to the total IFIE is higher than the CT+mix values. In hydrophobic residues, dispersion interactions are crucial in stabilizing the hydrophobic residues, particularly influencing intramolecular interactions and protein folding. Thus, by observing the dispersion interaction energy (DI) using the FMO method, we can gain a more detailed understanding of these interactions without overlooking relatively weak π-orbital interactions, enabling insights into the structure and function of proteins.
Figure 11. Structure of Arg–Tyr pairs
(a) This is an example structure with a CH/π interacting pair between the Arg side chain carbon and the Tyr aromatic ring in the Arg–Tyr pair. (b) An example of the structure with an NH/π (cation/π) interacting pair between the –NH group of the Arg side chain and the Tyr aromatic ring in the Arg–Tyr pair.
3.2 Comparing interaction energies between the FMO and MM methods for amino acid residue pairs
The interaction energies of each pair were calculated using the MM method in classical mechanics and compared to interaction energies obtained using the FMO method in quantum chemical calculations.
3.2.1 Total interaction energy using the MM method
Figure 12 shows the average total IFIE value for each fragment pair,
Figure 12. The mean total IFIE value for each amino acid residue pair
This heatmap represents the mean of the MM-based IFIE for each fragment pair (amino acid residue pair). All cells are colored according to the mean value of total IFIE calculated using the MM method: red to orange, −20 to 0 kcal/mol; light to dark blue, 0 to +20 kcal/mol. The bar graph below represents the average of the mean total IFIE values from the vertical columns of the heatmap. The numbers on the bar graph indicate their standard deviations.
In pairs of charged residues, which are constructed as the basic and acidic amino acid residues, the average energy values obtained using the MM method are approximately −48.4 to −51.8 kcal/mol, which is about −15.5 to −17.1 kcal/mol smaller than those obtained using the FMO method. For pairs comprised of charged and neutral polar residues, the total energy of the MM method tended to be weaker than that of the FMO method. For example, for the Asp–Asn pair, the total IFIE calculated using the FMO method was −11.3 kcal/mol, while the total IFIE calculated using the MM method was −6.8 kcal/mol, showing a difference of −4.5 kcal/mol. A near similar tendency was observed for pairs comprised of neutral polar residues; the total IFIE calculated using the FMO method for Asn–Asn pairs was −6.9 kcal/mol, and for Gln–Gln was −6.5 kcal/mol, whereas the total IFIE calculated using the MM method was −3.9 kcal/mol and −3.6 kcal/mol, for the Asn–Asn and Gln–Gln pairs, respectively. The difference in total IFIE between the FMO and MM methods was approximately 3 kcal/mol. As shown by the average interaction energies of the fragment units calculated using the FMO and MM methods (Figure 6, Table S3, Table S4), total FMO method IFIE showed that all residues were within the range of −3.10 to −12.69 kcal/mol, whereas the total MM method IFIE was within the range of −1.72 to −8.29 kcal/mol.
Figure 13. The mean MM method interaction energy value for each amino acid residue
This bar and line graphs show the average interaction energies (kcal/mol) calculated using the MM method for all the fragment pairs (e.g., Asp–Asp, Asp–Glu, Asp–Arg, and Ala–Lys: a total of 20 pairs for Asp) containing a specific amino acid residue. The purple line represents the total interaction energy, the blue bar the electrostatic interaction energy, and the red bar the van der Waals interaction, all calculated using the MM method.
3.2.2 Electrostatic interaction energy calculated using the MM method
Figure S8 shows the average value of the electrostatic interaction energy,
3.2.3 van der Waals interaction energy calculated using the MM method
Finally, Figure S9 shows the average van der Waals interaction energies,
Focusing on pairs comprised of aromatic rings, especially Tyr, Figure 14 shows the distance dependence of the dispersion energy values between FMO and MM methods. The interaction energy calculated by the FMO method for Tyr and aromatic ring pairs shows strong values ranging from −7.3 to −4.4 kcal/mol at distances between 1.5 and 2.0 Å, and overall, the energy weakens as the distance increases. In contrast, the vdW interaction energy by the MM method ranges from −1.9 to 4.8 kcal/mol at a distance of 1.5 to 2.0 Å, and reaches approximately 0 kcal/mol at about 1.8 Å. Subsequently, the energy becomes the strongest between 2.3 and 2.7 Å, and then gradually weakens. This means that while in the MM method, steric hindrance between molecules is observed as a repulsive force, especially over short distances, the FMO method allows a more accurate evaluation of the strength of the π-π/CH-π interaction at these distances. It is recommended to use the MM and FMO methods according to the accuracy of the interaction energy appropriate for computational purposes, such as large-scale structural evaluation, initial screening, and precise interaction analysis.
Figure 14. Comparison between DI component and vdW interaction energies in of Tyr and residues with aromatic rings
Interaction energies for Tyr–aromatic ring pairs (Tyr–Tyr, Tyr–His, Tyr–Phe, and Tyr–Trp); (a) DI values calculated using the FMO method, and (b) vdW values calculated using the MM method. The horizontal axis is the distance between pairs. This distance indicates the nearest neighbor, based on atomic distance between fragment pairs.
In this study, to understand the standard values of IFIE and PIEDA, we performed a statistical analysis of intramolecular interactions in apoproteins for all amino acid residue pairs in a comprehensive manner. On the other hand, the classifications of amino acid residue pairs into the secondary structure types or the surface and inner regions of the molecule reported by Qiao et al. [31] are also interesting for understanding the folding of apoproteins. Qiao et al. [31] analyzed antigen-antibody interactions based on MM calculations, focusing on charged polar residues, uncharged polar residues, and hydrophobic residues in 350 antigen-antibody complex structures. They examined the stability of the antigen's interfacial structure by analyzing residues classified into the secondary structures and antigen residues located at the interface between antigen and antibody. Statistical analysis of interaction energies by classifying amino acid residues of protein into the secondary structure types and the surface and inner regions of the molecule helps in understanding the structural stability of proteins. In the future, we would like to add these classifications of amino acid residues and further advance statistical analysis using PIEDA energy components. Although we treated amino acid residues as one fragment in this paper, we also plan to perform interaction energy analysis by calculating and analyzing the main chain and side chain of amino acid residues separately.
4. Conclusion
In this study, we performed quantum chemical calculations on 679 apoprotein structures using the FMO method and statistically analyzed the characteristics of intramolecular interactions within the protein. In particular, we investigated the interaction energy between amino acid residue pairs by type using energy component decomposition using PIEDA based on the FMO method and analyzed the energy strength of each pair. Pairs with neutral polar residues exhibit the next highest total IFIE, primarily dominated by ES components, but the significant presence of CT+mix components indicates the importance of hydrogen bonding. The ease and strength of hydrogen bond formation are influenced by the arrangement and orientation of the side chains of polar residues, which is a factor in the specificity of each pair. The statistical analysis in this paper will help identify which types of side chains are most suitable for forming hydrogen bonds. Although the total IFIE is weak in pairs of hydrophobic residues, the proportion of DI is high, suggesting that dispersion forces play a significant role in the stability of hydrophobic residues and contribute to the folding and structural stabilization of proteins. Pairs with aromatic rings have particularly high DI values, indicating that π-π and CH-π interactions promote intermolecular stability. Furthermore, among residues with aromatic rings, the strength of DI changes depending on the type, number, and side chains of aromatic rings, and each pair shows a different interaction pattern.
On the other hand, the total IFIE by the MM method was lower overall than that by the FMO method and was particularly noticeable for pairs containing charged residues. For pairs containing aromatic rings, the van der Waals interaction energy of the MM method is underestimated, suggesting that the FMO method can more accurately evaluate the π-π and CH-π interactions. This also leads to understanding the importance of aromatic rings in protein structure and function. The statistical analysis in this study provides crucial indicators for determining the specific interaction energy values between amino acid residues calculated using the FMO method, aiding in deepening our understanding of protein structure and function. Furthermore, these insights can be applied to analyze molecular recognition in antigen-antibody interactions and the design of peptide-based pharmaceuticals.
Acknowledgment
This study was carried out as part of the FMO Drug Design Consortium (FMODD) activities. We also thank Prof. Yuji Mochizuki of Rikkyo University, Dr. Tatsuya Nakano of the National Institute of Health Sciences, and Dr. Yoshio Okiyama of Kobe University for providing ABINIT-MP and for general discussions related to FMO calculations. The FMO calculations were performed using the K computer through the HPCI System Research Project (Project ID: hp180147) and the HOKUSAI supercomputer (ID: Q18306) provided by the RIKEN Center for Computational Science. This research was partially supported by the Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS)) from AMED under Grant Numbers JP19am0101113 and JP23ama121030. Finally, D.T. and C.W. acknowledge JSPS KAKENHI Grant No. 18K06619. C.W. acknowledges JST PRESTO Grant No. JPMJPR18GD.
We also thank Editage (www.editage.jp) for English language editing.