Development of an automated fragment molecular orbital (FMO) calculation protocol toward construction of quantum mechanical calculation database for large biomolecules

We developed an automated FMO calculation protocol (Auto-FMO protocol) to calculate huge numbers of protein and ligand complexes, such as drug discovery targets, by an ab initio FMO method. The protocol performs not only FMO calculations but also pre-processing of input structures by homology modeling of missing atoms and subsequent MM-based optimization, as well as post-processing of calculation results. In addition, QM/MM optimization of complex structures, conformational searches of ligand structures in solvent, and MM-PBSA/GBSA calculations can be optionally carried out. In this paper, FMO calculations for 149 X-ray complex structures of estrogen receptor α and p38 MAP kinase were performed at the K computer and in-house PC cluster server by using the Auto-FMO protocol. To demonstrate the usefulness of the Auto-FMO protocol, we compared the ligand binding interaction energies by the Auto-FMO protocol with those of manually prepared data. In most cases, the data calculated by the Auto-FMO protocol showed reasonable agreement with the manually prepared data. Further improvement of the protocol is necessary for the treatment of ionization and tautomerization at the structure preparation stage, because some outlier data were observed due to these issues. The Auto-FMO protocol provides a powerful tool to deal with huge numbers of complexes for drug design, as well as for the construction of the FMO database (http://drugdesign.riken.jp/FMODB/) released in 2019. Chem-Bio Informatics Journal, Vol.19, pp.5-18 (2019) 6


A. Structure preparation of the automated FMO calculation protocol.
Flowchart of the structure preparation for the automated FMO calculation protocol (Auto-FMO protocol) shown in Figure 1. The details are described as follows.
(1) Input data. An MDB file including fields of receptor structures, ligand structures, and unique IDs as the protocol's input file is prepared. The data of each PDB file should be registered as one entry of the MDB file.
(2) Complementation of missing atoms. Each entry of the above MDB file is loaded, and the heavy atoms of the complex are fixed at the original coordinates of the input data. Next, the missing atoms of the PDB data are complemented by one of the following two procedures. First, both missing atoms and missing residues are complemented by homology modeling, using the sequence file given in the FASTA format.
Second, the residues next to the missing residues are capped by NME and ACE groups without complementation, and the partial missing atoms of the side chains are corrected using the "StructurePreparation" function. In the complementation of missing atoms step, the addition of hydrogen atoms is simultaneously conducted.
(2.1) Homology modeling and structure preparation. The SVL function on MOE and its options used in this paper for homology modeling are as follows. Although the options are mainly based on the default settings, several options, nSideModels, intermediate_refine, protonate3D, and final_refine, can be changed according to the user's preference. The SVL function and its options for structure correction used in this paper are shown, as follows.
(3) Local minimization of complemented atoms. Structural minimization of only the complemented atoms in procedure (2) is performed using the force field selected by the option. The SVL function and its options of structural minimization used in this paper are shown, as follows. Although the options are mainly based on the default settings, rigidHOH and verbose were set to active in this protocol. (4) Constraint setup of user specifications. In the optimizing step, the following four options can be selected. The other atoms are fixed and remain at their original PDB coordinates. The default setting was used in this paper.

Selection of optimizing atoms on the protocol's option.
Option 1. All hydrogen atoms of the complex structure (default).
Option 2. All hydrogen atoms of the complex structure; heavy atoms of the ligand molecule.
Option 3. All hydrogen atoms of the complex structure; heavy atoms of the ligand molecule; heavy atoms of the receptor and solvent molecules within threshold distance from the ligand. The threshold distance can be specified by the protocol's option.
Option 4. All atoms of the complex structure.
(5) Final minimization of unrestrained atoms. Final minimization of unrestrained atoms is performed by the following procedure. The complex structure is treated by the selected force field, and the partial charges of the complex are subsequently calculated. Structure minimization is performed by the MM function under the same options as in procedure (3).

(6)
Save structures used for FMO calculation. The obtained optimized structure of each MDB entry is stored in the PDB file format.
Subsequently, the structures are sent to the FMO calculation step. Table S1 summarizes the entries with large differences between the protocol data and the manually prepared data for p38α. In the three complexes (PDB IDs: 3O8P, 1OUK, and 3GFE) with different ligand charges between the protocol data and the manual data, the differences of the IFIE values were more than 100 kcal/mol. In the next entry (PDB ID: 3MW1), the ligand charges of both structures are equal; however the protonation state is different. The ligand structure (ligand name: MIH) by the protocol has two additional hydrogen atoms due to the treatment of the rare N-oxide moiety. As a result, the difference in the ligand binding energy recorded was 80 kcal/mol. There are different tautomerization states in the three complexes (PDB IDs: 3FLS, 3FML, 3FMM). The IFIE difference of the complex (PDB ID: 3HEC) resulted from the different protonated nitrogen atoms of the piperazine ring (ligand name: STI).

B. Difference of IFIE
In the case of the 3QUE entry, the charge and tautomerization state of the ligand obtained by the protocol are the same as those in the manually prepared data. Here, Figure S1 shows the three-dimensional structures of the ligand binding pocket for both data sets. For the complementation of the missing residues by homology modeling, the complemented structures around the ligand showed significant differences. Glu173 formed a hydrogen bond with the ligand in the manually prepared data, while no hydrogen bond between the ligand and Glu173 was detected in the protocol data ( Figure S1). The drastic conformational change of the Glu173 side chain caused a 20 kcal/mol difference in the ligand binding energies.  Figure S1. Three-dimensional structures of the p38α and ligand (MIH) complex (PDB ID: 3QUE, Chain: A) in the Auto-FMO protocoll data (A) and the manually prepared data (B) The ligand and amino acid residues including the X-ray crystal structure are shown by yellow ball and stick models and the cyan line, respectively. Complemented structures regarding missing atoms are highlighted in orange. Glu173 is shown by an orange ball and stick model, because a very large conformational difference between the protocol data and the manual data was observed.
S7 Figure S2 shows the correlation between the Auto-FMO protocol data and the manual data, excluding the obviously different data listed in Table S1. As a result, the correlations of all data (black), the neutral ligand charge data (red), and the positively charged data (blue) between the protocol and manual data were dramatically improved. Figure S2. Validation of the binding energies compared with the manual data and the Auto-FMO protocol data without the obvious nine outliers in the p38α dataset The neutral and positively charged ligands are marked in red and blue, respectively.