Chemical and Pharmaceutical Bulletin
Online ISSN : 1347-5223
Print ISSN : 0009-2363
ISSN-L : 0009-2363
Current Topics: Review
Computer-Aided Drug Design Using the Fragment Molecular Orbital Method: Current Status and Future Applications for SBDD
Daisuke Takaya

2024 Volume 72 Issue 9 Pages 781-786

Abstract

Owing to the increasing use of computers, computer-aided drug design (CADD) has become an essential component of drug discovery research. In structure-based drug design (SBDD), including inhibitor design and in silico screening of drug target molecules, concordance with wet experimental data is important to provide insights on unique perspectives derived from calculations. The fragment molecular orbital (FMO) method is a quantum chemical method that facilitates precise energy calculations. Its fragmentation scheme makes it possible to apply quantum chemical methods to biological macromolecules for energy calculations based on electron behavior. Furthermore, interaction energies calculated on a residue-by-residue basis via fragmentation aid in the analysis of interactions between the target and ligand molecule residues and in molecular design. In this review, we outline the recent developments in SBDD and FMO methods and highlight the prospects of developing machine learning approaches for large computational data using the FMO method.

1. Introduction

Structure-based drug design (SBDD) aims to control the functions of biomacromolecules, such as proteins and nucleic acids, using other molecules with strong binding affinity for these macromolecules. This notion of control encompasses both the inhibition and activation of functions. The design of ligand molecules, especially seed/lead/drug molecules, using computational methods is also known as computer-aided drug design (CADD). Ligand molecules may be small molecules with a molecular weight of approximately 500, medium-size molecules (such as natural compounds and cyclic peptides), or large proteins (such as antibody molecules). Depending on the context, CADD is also known as virtual or in silico screening. Receptor structure-based virtual screening is referred to as SBVS. These terms differ from each other in terms of the researcher’s focus. As implied by the term “screening,” these methods can be used to select hit compounds from a large compound database, such as ZINC,1) either qualitatively or quantitatively. For example, SBVS predicts a hit compound by focusing on the complementarity between the ligand molecule and binding site of the receptor. Screening can be performed using personal computers or computer clusters; however, recent studies have used many computational resources, including cloud computers.

We previously conducted several drug discovery projects, including the in silico screening of CaMKK2,2) DOCK,3) HCV NS3/4A protease,4) and Sult2B1b.5) Quantification of the interactions between the ligand molecule and target biopolymer is necessary in SBDD. Numerical evaluations of these interactions may be based on the physicochemical interaction energy, statistical evaluations, or heuristic methods, and they aid in the design of compounds with strong binding affinity. Evaluation of these interactions involves docking scores and binding free energies, including entropy terms, such as those from the molecular mechanics Poisson–Boltzmann surface area (MM-PBSA) and molecular mechanics generalized Born surface area (MM-GBSA) methods.6)

The fragment molecular orbital (FMO) method, proposed by Dr. Kitaura in 1999, is used to evaluate such interactions and is a quantum chemical calculation method applicable to large biomacromolecules, such as proteins.7) The FMO method is implemented in software such as ABINIT-MP8) and GAMESS.9,10) Owing to its improved computational performance, the FMO method is now applicable to quantum chemical calculations of macromolecules with more than 3000 residues, such as the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein.11) This method is also applicable to the interaction analysis of X-ray crystal structures12,13) and the activity prediction of small molecules.2) This paper introduces topics related to CADD and FMO applications.

2. Use of FMO Calculations in CADD

The FMO method divides macromolecules into fragments for calculations. FMO fragments include amino acid residues for proteins, bases for nucleic acids, and ligand molecules (Fig. 1, Section 1). The interaction of each moiety with a residue can be evaluated by dividing a large ligand molecule into small substituents and using them as FMO fragments. Notably, for calculation accuracy, fragmentation is performed at sp3 bonds rather than at the peptide bonds, which have sp2 character.14) An advantage of this type of fragmentation is that quantum chemical calculations for entire giant molecules, such as proteins, can be performed in a practical amount of time. Although the FMO method can be used to calculate the total energy of a molecule, the total energy alone does not provide sufficient information to understand large and complex molecules, such as proteins. The fragmentation approach in the FMO method is therefore used to gain insights into molecular functions by analyzing the interaction energies, specifically the intra- and inter-molecular interactions, between fragments.15)
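As a rough illustration of how fragment calculations are combined, the commonly quoted two-body (FMO2) expansion of the total energy can be sketched as follows. The symbols are notation chosen here, not taken from this review: E_I is the energy of fragment (monomer) I and E_IJ is the energy of the fragment pair (dimer) IJ, each computed in the electrostatic field of the remaining fragments. The embedding-potential correction used in actual implementations is omitted for simplicity.

```latex
E^{\mathrm{FMO2}} \approx \sum_{I} E_{I} + \sum_{I>J} \left( E_{IJ} - E_{I} - E_{J} \right)
```

In this simplified picture, each pair term in the second sum corresponds to the interaction energy between fragments I and J, which is the quantity analyzed in the following sections.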

Fig. 1. Schematic Diagram Showing the Utilization of Data from the Fragment Molecular Orbital (FMO) Method for Structure-Based Drug Design (SBDD)

This scheme shows the preparation of FMO data and prediction of inhibition activity as tasks. In “1. Calculations,” a cartoon model of the protein–ligand complex structure of hematopoietic prostaglandin D synthase (H-PGDS) with F092 (PDB code: 5YWX) and the 2D structure of F092 are shown. In “2. Collections” and “3. Applications,” an activity prediction model was built using machine learning with PIEDA as tabular data.

As the FMO fragments include the binding-site residues of biomacromolecules, such as proteins and nucleic acids, that function as the receptor structure for the drug target, inter-fragment interactions related to the ligand molecule can be used as a blueprint for drug design. For example, this method can be used to quantitatively determine the specific residues on the receptor, as well as the parts of the ligand molecule, that strongly interact with each other (Fig. 1, Section 1).

The interaction energy between fragments is called the inter-fragment interaction energy (IFIE) or pair interaction energy (PIE). Pair interaction energy decomposition analysis (PIEDA) decomposes the IFIE into four components, as expressed by the equation presented in Table 1. Associations between the IFIEs, PIEDA, and interaction types, collected from studies interpreting the PIEDA values and their interactions, are summarized in Table 1. ES indicates the electrostatic interaction, EX indicates the exchange repulsion, CT + mix indicates the charge transfer plus mix term (simply denoted as CT), and DI indicates the dispersion force. Regarding the typical physicochemical interactions and their relationships with these terms, electrostatic interactions, such as the salt bridge, are considered when ES is strong,2) interatomic repulsion when EX is strong,16) hydrogen bonding when ES16) or ES with CT12) is strong, and dispersion interactions, such as CH/π17) and π/π18) interactions, when DI is strong. Studies on leukocyte-specific protein tyrosine kinases (LCKs) and inhibitors using the FMO method have reported that CH/π interactions are essential for compound binding to the target protein.19) Halogen bonds can be mainly determined using ES and DI.20)
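The decomposition described above can be written compactly as a sum of the four terms. This is a sketch in notation chosen here (the subscript IJ denotes a fragment pair) and may differ in form from the equation in the original Table 1.

```latex
\Delta E_{IJ}^{\mathrm{IFIE}} = \Delta E_{IJ}^{\mathrm{ES}} + \Delta E_{IJ}^{\mathrm{EX}} + \Delta E_{IJ}^{\mathrm{CT+mix}} + \Delta E_{IJ}^{\mathrm{DI}}
```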

Table 1. Relationships among the Inter-Fragment Interaction Energy (IFIE), Pair Interaction Energy Decomposition Analysis (PIEDA), and Typical Interactions, Highlighting the Dominant Terms of PIEDA Corresponding to Each Interaction Type

Check marks in the PIEDA columns indicate the dominant terms that are likely to have a large absolute value. For simplicity, CT + mix is abbreviated as CT.
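The mapping from dominant PIEDA terms to typical interaction types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the 50% dominance cutoff, and the numerical values are hypothetical, and the lookup table merely restates the associations described in the text rather than reproducing Table 1.

```python
# Hypothetical helper: suggest an interaction type from the dominant PIEDA terms.
# Term names (ES, EX, CT, DI) follow the text; the 50% cutoff is an arbitrary
# illustrative assumption, not a published criterion.

def dominant_terms(pieda, fraction=0.5):
    """Return the terms whose absolute value is at least `fraction` of the largest one."""
    largest = max(abs(v) for v in pieda.values())
    if largest == 0.0:
        return frozenset()
    return frozenset(k for k, v in pieda.items() if abs(v) >= fraction * largest)

# Associations restated from the text (CT stands for CT + mix).
HINTS = {
    frozenset({"ES"}): "electrostatic (e.g. salt bridge) or hydrogen bond",
    frozenset({"ES", "CT"}): "hydrogen bond",
    frozenset({"DI"}): "dispersion (e.g. CH/pi or pi/pi)",
    frozenset({"ES", "DI"}): "possible halogen bond",
    frozenset({"EX"}): "interatomic repulsion",
}

def interaction_hint(pieda):
    """Look up a coarse interaction label; fall back when no pattern matches."""
    return HINTS.get(dominant_terms(pieda), "mixed or unclassified")

# Made-up PIEDA values in kcal/mol for a single fragment pair.
example = {"ES": -1.2, "EX": 2.0, "CT": -0.8, "DI": -6.5}
print(interaction_hint(example))
```

Such a lookup is only a first-pass hint; the geometry of the fragment pair still needs to be inspected before assigning an interaction type.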

3. Example of Interaction Analysis by the FMO Method

Here, I introduce a case study using the FMO method for the interaction analysis of hematopoietic prostaglandin D synthase (H-PGDS). In this study, high-resolution (1.74 Å) crystal structures of the newly synthesized compound, F092, and its complexes were obtained. Multiple water molecules were present in the binding site, regardless of whether it was a ligand complex or an apo-structure. Six of these water molecules were quantitatively evaluated using the FMO method for their binding to the ligand molecule and to Arg12, Gly13, Gln36, Asp96, Trp104, and Lys112. The essential cofactor glutathione (GSH) showed a strong interaction with F092. The IFIE of Leu119 with the ligand is repulsive (IFIE(Leu119-F092) = +10.09 kcal/mol), but Leu119 binds strongly to F092 via water molecules (IFIE(Leu119-Water-F092) = −18.16 kcal/mol). IFIE and PIEDA can analyze such interactions in detail (Fig. 2). IFIE was used to determine whether each residue at the binding site contributes to attraction or repulsion (Fig. 2, upper part). PIEDA can visualize interactions using the four components mentioned above. Beyond the attraction/repulsion view shown in Fig. 2 (upper part), the PIEDA result for Trp104 was dominated by DI, indicating that a π–π interaction is formed with the pyrimidine moiety of F092 (Fig. 2, lower part).

Fig. 2. Summary of the Interaction Energies of F092 with Amino Acid Residues, Water Molecules, and GSH in the H-PGDS Complex

Figures in the top row show a summary of the inter-fragment interaction energy (IFIE) analysis of the F092 fragment (yellow). Fragments with attractive and repulsive interactions are shown in red and blue, respectively. Figures in the bottom row show the results of PIEDA for the F092 fragment (yellow). Main components of the stabilizing interactions of the fragments are represented by the color scheme in the legend. Reprinted from “Characterization of crystal water molecules in a high-affinity inhibitor and hematopoietic prostaglandin D synthase complex by interaction energy,” Bioorganic & Medicinal Chemistry 26 (2018) 4726–4734, Copyright (2018), with permission from Elsevier.

4. Relationship between Experiments and the FMO Interaction Energy

IFIE and PIEDA can be used for inhibitor interaction analysis and the design of potent compounds.21) FMO fragments include residues in the boundary region of the protein; previous studies have identified hotspots at protein–protein interaction interfaces, which are residues showing a large binding free energy change (approximately 3.5 kcal/mol) upon alanine scanning.22,23) However, explaining experimental observations based on the energy calculated using the FMO method remains challenging. In practice, various energy thresholds have been used to identify the key interactions among all the inter-fragment interactions of the target molecule. A threshold of |PIE| > 2 kcal/mol was used for the analysis of the protein–ligand interactions of prostaglandin H(2) synthase-1,24) a threshold of |PIE| > 3 kcal/mol was used for hotspot residue detection,23) and a similar criterion, specifically < −3 kcal/mol for any component of PIEDA, was used for the analysis of the interaction between the antigen and spike protein of SARS-CoV-2.25) Additionally, the IFIE of conserved water molecules at the binding site of H-PGDS is approximately ≤ −8 kcal/mol, even though a |IFIE| > 3 kcal/mol threshold was used.18) The appropriate threshold depends on the calculation level used in the FMO method, considerations of implicit or explicit solvation, and the target structure.

As shown here, some studies use absolute values for the threshold. The values of IFIE can be positive or negative, with a positive IFIE if the electrostatic interaction or exchange repulsion term is strongly repulsive. For example, when examining the fragment interaction of a binding site with a ligand molecule, it is possible that the IFIE of an individual fragment is positive even though summation of IFIEs of the entire ligand molecule is negative. The explicit absolute thresholds seem to be designed to identify interactions of interest even from such positive IFIEs, while the negative thresholds are seemingly intended to focus on the attractive interactions.
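The two kinds of thresholds discussed above can be contrasted with a short Python sketch. The function names, fragment labels, and energy values are hypothetical; only the threshold values (3 kcal/mol in absolute value, and −3 kcal/mol for any PIEDA component) are taken from the text.

```python
# Hypothetical sketch: absolute-value versus attractive-only thresholds.

def key_by_absolute(ifie, threshold=3.0):
    """Keep fragments with |IFIE| > threshold (kcal/mol); repulsive pairs are retained."""
    return {name: e for name, e in ifie.items() if abs(e) > threshold}

def key_by_attractive_component(pieda, threshold=-3.0):
    """Keep fragments where any PIEDA component is below threshold (attractive only)."""
    return {
        name: terms
        for name, terms in pieda.items()
        if any(value < threshold for value in terms.values())
    }

# Made-up energies (kcal/mol) for three placeholder fragments.
pieda = {
    "RES_A": {"ES": -10.0, "EX": 4.0, "CT": -2.4, "DI": -4.0},
    "RES_B": {"ES": 6.0, "EX": 1.0, "CT": -0.5, "DI": -1.4},
    "RES_C": {"ES": -0.5, "EX": 0.2, "CT": -0.2, "DI": -0.5},
}
# In this sketch the IFIE is taken as the sum of the four PIEDA components.
ifie = {name: sum(terms.values()) for name, terms in pieda.items()}

print(sorted(key_by_absolute(ifie)))
print(sorted(key_by_attractive_component(pieda)))
```

With these placeholder values, the absolute threshold retains the repulsive fragment RES_B, whereas the attractive-only criterion does not, which mirrors the difference in intent described above.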

5. Example of Inhibitor Discovery Study by FMO Method

The FMO method is also used in the development of inhibitors, and we applied it in an inhibitor discovery study. In silico screening for CaM-dependent protein kinase kinase 2 (CaMKK2) inhibitors identified several inhibitors with IC50 < 10 µM. Additionally, crystal structures of the complexes were determined. FMO analysis of the X-ray crystal structures revealed that the conformation of residues in the binding site affected the binding of the inhibitor. The relationship between the total energy of the FMO method and the inhibitory activity of the compound has been previously discussed.2) Several studies have reported correlations between the inhibitory activity values and energies determined from FMO calculations.17,26,27) The energy calculated at the second-order Møller–Plesset perturbation (MP2) level of the FMO method can include DI, which cannot be handled by the usual molecular force field. The IFIE of the FMO method precisely calculates the enthalpy but not the entropy. The activity values can be well accounted for in some cases, such as in the study of human estrogen receptor-α.27) Correlations between the experimental binding free energies and IFIEs in the FMO method can be improved by combination with molecular dynamics calculations.28) One way to account for entropy is to introduce solvent effects through the simultaneous use of the FMO method and MM-PBSA, as in the study of Pim1.17) In drug design, pre-evaluation of the computational methods used to predict activity and investigation of the computational conditions, based on experimental inhibitor data for the target receptor, are necessary for reasonably accurate predictions. In the case study of CaMKK2, FMO-HF, FMO-MP2, FMO+MM-PBSA, and the optimization of the binding site coordinates were examined, and FMO+MM-PBSA with optimization was found to be the best (R = −0.89) among all conditions.
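The kind of correlation coefficient quoted above (R) can be computed as a Pearson correlation between calculated energies and measured activities. The following is a minimal sketch, assuming activities are expressed as pIC50; the function name and all numerical values are hypothetical and are not data from the CaMKK2 study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)

# Made-up example: calculated binding energies (kcal/mol) and pIC50 values.
energies = [-95.0, -88.0, -80.0, -72.0, -60.0]
pic50 = [7.9, 7.1, 6.8, 6.0, 5.2]

r = pearson_r(energies, pic50)
print(f"R = {r:.2f}")
```

A negative R is expected when more negative (more stabilizing) energies accompany higher activities, which is consistent with the sign of the value reported for CaMKK2.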

In addition to the use of total energy for activity prediction, other statistical methods are available to predict activity values, such as the classical quantitative structure–activity relationship (QSAR),29) which uses a small number of descriptors, such as log P or molecular hydrophobicity, and machine learning,30) which uses a larger number of features. In recent years, machine learning has become surprisingly easy to perform using laptops, Python as an interface, a wealth of information on the Internet, and open-source codes. However, the amount of data required for building an activity prediction model remains controversial. In the field of cheminformatics, which mainly focuses on information on chemical structures, more than 1500 descriptors have been reported.31) Since SBDD often uses the complex structure of a target protein and ligand molecules, PIEDA calculated by the FMO method can be adopted as the descriptor of the interaction. If the number of residues in the target protein is approximately 300 (referring to the number of residues in CaMKK2), then the number of descriptors obtained from PIEDA is expected to be 1200 (four PIEDA components per residue). Machine learning using a few thousand descriptors can be performed within a reasonable amount of time. However, when the amount of training data is small compared to the number of descriptors, we must be careful about overfitting and the curse of dimensionality,32) which are generally known in the machine learning field. Referring to previous studies on the number of training data for machine learning33) and assuming that 10 times the number of descriptors is required, inhibitor data on the order of 10000 or more would be needed for that target protein to create a reasonable machine learning model. Computational scientists with a background in data science can prepare large amounts of data, but it is difficult for researchers in other fields to do so. Furthermore, if PIEDA is used as the descriptor of the interactions, FMO calculations should be performed for all complex structures. Therefore, reducing the number of descriptors or adopting methods that are less prone to overfitting should be explored. The amount of data required to prepare for training must always be carefully considered.
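The descriptor and data-size estimate in this paragraph is simple arithmetic, sketched below. The function names are hypothetical; the residue count (300), the four PIEDA terms, and the factor of 10 come from the text, and the factor is a rule of thumb rather than a guarantee.

```python
# Four PIEDA components per residue, as described in the text (CT = CT + mix).
PIEDA_TERMS = ("ES", "EX", "CT", "DI")

def n_pieda_descriptors(n_residues, n_terms=len(PIEDA_TERMS)):
    """Number of PIEDA descriptors for a ligand interacting with each residue."""
    return n_residues * n_terms

def n_samples_rule_of_thumb(n_descriptors, samples_per_descriptor=10):
    """Training-set size suggested by a 'k samples per descriptor' rule of thumb."""
    return n_descriptors * samples_per_descriptor

n_desc = n_pieda_descriptors(300)            # approximate size of CaMKK2
n_needed = n_samples_rule_of_thumb(n_desc)   # assumes the 10x heuristic
print(n_desc, n_needed)
```

Restricting the descriptors to, for example, binding-site residues would shrink both numbers proportionally, which is one motivation for descriptor selection.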

6. Accumulation of FMO Data

As current artificial intelligence (AI) approaches are based on machine learning, it is necessary to accumulate sufficient data; however, it is difficult to estimate in advance the amount of data required to build a highly accurate AI. The success of AlphaFold2 (AF2)34) in CASP14,35) a significant breakthrough in the development of advanced methods for predicting protein structures, a field related to SBDD, provides valuable insights. A review article on AF236) noted that even before AF2 development (specifically in 2005), the library of single-domain protein structures was essentially complete, but information on the query protein could not be matched to the closest template in a structural library, such as the Protein Data Bank (PDB). This task has only recently been made possible using deep neural networks (DNNs).36) In the field of SBDD, ChooseLD, a protein–ligand docking method that uses a target-specific library composed of discrete data of known binding modes, has been reported. Although ChooseLD does not use AI-based approaches, its prediction performance, measured by the conformational RMSD of the ligand, improved with the degree of similarity of the compounds in the library.37) It is conceivable that, although a sufficient amount of data might already be available for constructing prediction models for drug targets, AI-based methods are not yet being utilized optimally. Given that the quality of training data significantly influences the accuracy of predictions in machine learning, the necessity for data collection is unlikely to diminish in the future.

Against this background, we constructed the FMO database (FMODB),20) a database for collecting quantum chemical calculation data for biological macromolecules. As of October 2023, 36904 FMO data entries covering 7778 PDB IDs have been registered. For use by non-specialists in the FMO method, a graphical interface is provided, for example, a bar chart of PIEDA from the viewpoint of the fragment of interest, such as a ligand molecule. The IFIE and PIEDA values obtained by the FMO method are useful in drug design because they allow the interpretation of receptor–ligand interactions from a quantum chemical point of view. For example, FMO-MP2 includes a DI term, which allows precise evaluation of interactions due to the dispersion force, which is related to relatively weak interactions, such as CH/π and π/π interactions. Accurate calculation of the interaction energies using the FMO method requires high-resolution X-ray crystal structure data. However, structural data, including those of small molecules, which are of primary interest for drug discovery, are still limited. There are also issues related to the determination of the ionization state of the ligand molecule. Despite the challenges in accurately predicting the binding modes, FMO calculations should additionally be performed on model structures generated by protein–ligand docking or alternative AI methods.38)

Both the total energy and PIEDA can be used as descriptors in machine learning, but the latter provides a richer description of the interaction than the former, as PIEDA facilitates a four-component quantitative evaluation of the interaction of each residue with the ligand molecule. The numerical data of PIEDA can be treated as tabular data describing the ligand interactions (Fig. 1, Section 3). For example, multiple regression analysis can be performed by setting the target variable as a continuous value, such as a binding constant or inhibitory activity. In general, tabular data are within the scope of machine learning and can be analyzed using various machine learning techniques, such as Random Forest,30) which is a widely recognized method. Many studies have reported successful SBDD by including PIEDA as the descriptor in combination with appropriate descriptor selection methods.39) DNNs can also be applied to tabular data, but they do not perform as well as expected.40) Various approaches have been developed for accurate predictions using tabular data.41,42) Performance may be improved by developing a method to represent the features of biological macromolecules and small molecules without using tabular data and then building an AI on that representation. Accumulation of data in FMODB may promote research on the performance of PIEDA as a machine learning descriptor. In addition, as the FMO method can calculate structure-specific atomic partial charges in a protein, such charge data have been used as training data for DNN models that predict the atomic partial charges of proteins over a set of dynamic structures obtained via molecular dynamics calculations, using the atom-centered symmetry function as a descriptor.43)
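To make the idea of PIEDA as tabular data concrete, the sketch below flattens per-residue PIEDA values for each complex into one fixed-width row (one column per residue and term). The function names, residue labels, and values are hypothetical, and filling a missing residue with 0.0 is an assumption made here for illustration.

```python
# Four PIEDA components per residue (CT stands for CT + mix).
PIEDA_TERMS = ("ES", "EX", "CT", "DI")

def column_names(residues):
    """Column labels such as 'R1_ES', in a fixed residue-by-term order."""
    return [f"{res}_{term}" for res in residues for term in PIEDA_TERMS]

def to_row(pieda_by_residue, residues):
    """Flatten one complex into a feature row; absent entries become 0.0 (assumption)."""
    row = []
    for res in residues:
        terms = pieda_by_residue.get(res, {})
        row.extend(terms.get(term, 0.0) for term in PIEDA_TERMS)
    return row

# Made-up PIEDA values (kcal/mol) for two complexes over two placeholder residues.
residues = ["R1", "R2"]
complexes = {
    "ligand_a": {"R1": {"ES": -5.0, "EX": 1.5, "CT": -1.0, "DI": -2.0},
                 "R2": {"ES": -0.3, "EX": 0.1, "CT": -0.1, "DI": -3.2}},
    "ligand_b": {"R1": {"ES": -2.0, "EX": 0.5, "CT": -0.4, "DI": -1.1}},
}

header = column_names(residues)
table = {name: to_row(pieda, residues) for name, pieda in complexes.items()}
print(header)
print(table["ligand_b"])
```

Rows built this way, paired with a measured activity per complex, form the kind of table that multiple regression or Random Forest methods take as input.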

7. Conclusion

In summary, I discussed the application of the FMO method for SBDD. All studies outlined in this review indicate that appropriate simulation conditions for the target molecule should be determined prior to FMO analysis. To date, user-friendly software for the prediction of molecular activity with high accuracy that can be used by people unfamiliar with computational chemistry has not been developed. In the future, more innovative prediction methods may emerge owing to the research progress on methodological development and data accumulation.

Acknowledgments

This study was supported by JSPS KAKENHI (Grant Number: JP23K11320).

Conflict of Interest

The author declares no conflict of interest.

References
 
© 2024 The Pharmaceutical Society of Japan