2024 Volume 10 Article ID: 2023-0038
The fragment molecular orbital (FMO) method is an efficient quantum chemical method suitable for calculating the electronic structures of large molecular systems. FMO can be accelerated by several approximations, an important one being the approximation to the environmental electrostatic potential (ESP) exerted on the fragment monomers or dimers. The environmental ESP is often approximated using the Mulliken atomic orbital charge (AOC) for proximal fragment dimers (ESP-AOC approximation) and the Mulliken point charge (PTC) for distant dimers (ESP-PTC approximation). Recently, another approximation method based on Cholesky decomposition with adaptive metric has been proposed for environmental ESP (ESP-CDAM, Okiyama et al., Bull. Chem. Soc. Jpn., 2021, 94, 91). In the current article, the energy gradient is derived under the ESP-CDAM approximation and implemented in FMO-based molecular dynamics (FMO-MD). Several test FMO-MD simulations are performed to compare the accuracy of the ESP-CDAM approximation with that of the conventional methods. The results show that ESP-CDAM is more accurate than ESP-AOC.
The ab initio fragment molecular orbital (FMO) method, first published in 1999 [1], is a highly efficient and accurate tool for electronic structure calculations of large molecular systems, especially biomolecules [2,3]. In FMO, a molecular system is divided into fragments; self-consistent field (SCF) calculations are performed for fragment monomers, dimers (FMO2) [1], trimers (FMO3) [4], tetramers (FMO4) [5], and so on; and finally the total energy of the system is reconstructed. FMO2 is usually used as the default option, but the accuracy of FMO can be systematically improved by including higher terms such as FMO3 and FMO4. However, to achieve the full improvement in FMO performance, we need to accelerate FMO2 because FMO3 and FMO4 are performed by adding correction terms to FMO2.
FMO2 can be accelerated by several approximations. The most important is the dimer-ES approximation [6], which reduces the overall cost by replacing the SCF calculation of a distant fragment pair with a simple electrostatic calculation. In addition, we can accelerate the environmental electrostatic potential (ESP) terms to further improve the throughput performance of FMO2. In the conventional FMO2 protocol [7,8], the environmental ESP is often calculated by the ESP-AOC approximation for the proximal region of a monomer using the atomic orbital (AO) charge (AOC) population analysis, and also by the ESP-PTC approximation for the distant region using the Mulliken atomic point charge (PTC). However, in another FMO2 protocol for environmental ESP [9], the ESP from the proximal region is treated by the ESP-CDAM approximation based on the Cholesky decomposition [10] with adaptive metric (CDAM) [11].
In this article, we report on the implementation of the latter protocol, namely the approximation of the environmental ESP using CDAM, in the ABINIT-MP program [3, 12, 13]. We present the semi-analytic formula of the potential energy gradient (negative force) at the Hartree-Fock (HF) and second-order Møller-Plesset perturbation theory (MP2) levels under the approximation. We show single-point energy and gradient calculations [14] as well as FMO-based molecular dynamics (FMO-MD) simulations for several molecular systems [15,16,17,18] to demonstrate the power of the ESP-CDAM/FMO implemented here.
In the subsequent mathematical discussion, a fragment monomer is referred to simply as a “monomer” and a pair of fragments as a “dimer.” Indices I, J, and K specify monomers; i, j, and k, orbitals; and μ, ν, λ and σ, AOs. Atomic nuclei are represented by indices A, B, and C; their charges by ZA, ZB, and ZC; and their 3D coordinates by A, B, and C.
2.1 2.1 HF/FMO2.We review the outline of the FMO2 energy and gradient calculation at the HF level [1,14] to facilitate description of the implementation of ESP-CDAM to FMO2. See Figure 1 for a schematic diagram of the FMO2 algorithm.
Algorithm of FMO2.
In FMO2, the total energy of the molecular system, E, is approximated by the following equation, where EI and EIJ refer to the total energies of monomer I and dimer IJ, respectively, and Nf is the number of fragments.
(2.1.1) |
Similarly, the electron density of the molecular system, ρ(r), is approximated as follows.
(2.1.2) |
The electronic structure of each monomer is calculated with an SCF loop, which is iterated until all the monomer densities become self-consistent with each other by the self-consistent charge (SCC) loop. In contrast, the SCF loop for the dimers is performed only once with the charges of monomers obtained in the previous SCC loop.
Below, we describe the monomer and dimer SCF calculations introduced in equations (2.1.3–11), namely, how to calculate equations (2.1.1) and (2.1.2) under the HF approximation. We postulate that each of the monomers and dimers forms a closed shell. Then, the Hartree‒Fock‒Roothaan equation [19] of a monomer or dimer X( = I or IJ) becomes
(2.1.3) |
using the variables defined below.
(2.1.4) |
(2.1.5) |
(2.1.6) |
(2.1.7) |
(2.1.8) |
(2.1.9) |
(2.1.10) |
(2.1.11) |
(2.1.11) |
The naive algorithm of FMO2 described above can be accelerated by several approximations, two of which are especially effective [7]. One is the approximation of the environmental ESP contained in the monomer and dimer, equation (2.1.3), and the other is the dimer-ES approximation for distant dimers [6].
To introduce approximations to the environmental ESP, we need to reformulate the FMO2 energy, equation (2.1.1), as
(2.1.12) |
Here, E´X is the energy of monomer or dimer X from which the environmental ESP has been deleted. Defining the difference density matrix (ΔD) by
(2.1.13) |
we
can apply different ESP approximations for
monomers and dimers using equation (2.1.12), which
contains ESP for dimers only [7].
(2.1.14) |
Under the ESP-CDAM approximation [9], the
environmental ESP acting on a monomer or dimer,
(2.1.15) |
When matrix G is symmetric positive semidefinite, the decomposition of its inverse can be given by
(2.1.16) |
In the CDAM method, the Cholesky basis,
(2.1.17) |
where
M is an AO pair from which the linear dependence
has been deleted, and
GMN =
(M|N). Let
K be the index of the monomer
exerting the environmental ESP, whose Cholesky
basis is
(2.1.18) |
(2.1.19) |
(2.1.20) |
The dimer-ES approximation of ESP for distant dimers is expressed as
(2.1.21) |
which avoids SCF calculations for many dimers.
We have so far described only the energy
calculations. We describe additional formula
necessary to calculate the energy gradient of FMO2
[14]. The
contribution of the third term of equation
(2.1.5), i.e., the projection operator, is
neglected because it would amount to only
(2.1.22) |
if
no ESP approximations are applied. In the ESP-CDAM
approximation,
(2.1.23) |
In the differentiation of ESP from the surrounding Nf − 2 fragments in the dimer calculation, from a strict viewpoint, it is necessary to solve the coupled perturbed Hartree‒Fock (CPHF) equation because the MO of the monomer has not been determined variationally [26, 27]. Nonetheless, the utilization of equations in Ref. [14] was shown to give sufficient precision based on a numerical analysis.
The FMO2 energy is thus calculated at the HF level by the above formulas.
Note that we have chosen not to use the fully analytical energy gradient by the Self-Consistent Z-vector (SCZV) method [23, 26, 27] for CDAM, despite its accuracy superiority over the original semi-analytical gradient [14], due to a particular drawback. The SCZV approach requires that ESP is not switched by distance, which makes only the ESP-PTC approximation valid and invalidates more practical ESP approximations, such as ESP-AOC-PTC. Unfortunately, the ESP-PTC approximation is too coarse to be applicable in FMO-MD, while acceleration algorithms such as Ewald and PPPC are required for FMO-MD simulations of large molecules. Since these methods divide the interactions into proximal and distant ones, they cannot be implemented with the SCZV method. We have therefore decided not to include SCZV in CDAM.
2.2 MP2/FMO2.In ABINIT-MP, the implementation of the MP2 first derivative (or nuclear gradient) [29,30,31,32] was based on a series of expressions given by Ref. [28], where an AO‒MO hybrid strategy was adopted for an efficient parallelized integral-direct processing of one-body density matrix generations with MO indices. These generated D and W are to be back-transformed to the corresponding AO-based matrices, as was just done for the non-separable two-body density term (ГNS) evaluated as the MP2 amplitudes. The construction of the energy-weighted density matrices W implemented is somewhat lengthy, as shown in equations 181 to 187 of Gordon’s paper [28]; therefore, the listing is omitted here. For simplicity of description, the MP2 derivative equations including the HF contributions are hereafter given AO label. Namely, the gradient with respect to nuclear coordinate a( = x, y, or z element of A) is expressed as
(2.2.1) |
where
(2.2.2) |
and
(2.2.3) |
The total FMO2-MP2 energy,
(2.2.4) |
The first derivative of
(2.2.5) |
where
X = I for a
monomer and X =
IJ for a dimer. The following
holds if we denote the environmental ESP
calculated by the electron density obtained by the
HF level monomer SCC as
(2.2.6) |
If we define equation (2.2.7) as
(2.2.7) |
then it follows that
(2.2.8) |
where
(2.2.9) |
and
(2.2.10) |
The MP2/FMO2 energy and gradient thus calculated are used in FMO-MD at the MP2 level [32].
2.3 Test calculations.The new approximations were implemented in a local version of ABINIT-MP Open series Ver. 1.0 Rev. 15 [12]. Three molecular systems were used for test calculations: a cluster of 400 water molecules W400, the Trp-cage polypeptide (1L2Y) [33], and the GESKG droplet. The GESKG droplet is our in-house benchmark system consisting of a polarizable neutral peptide (NH3+-Gly-Glu-Ser-Lys-Gly-CO2−) and a surrounding sphere of 206 water molecules (Figure 2).
The GESKG droplet visualized with BioStation Viewer graphical software [41].
FMO2 calculations were performed at MP2/6-31G*. Each water molecule was regarded as a fragment. The polypeptides were fragmented so that an amino acid residue should form a fragment unit, but at Cα to avoid breaking peptide bonds. The dimer-ES approximation was applied only to those dimers separated within a preset threshold (Ldimer) measured in van der Waals units [6]. Test calculations were performed on a 32-core Intel Xenon computer at Rikkyo University and a 40-core computer of the same type at Hoshi University.
The GESKG droplet was subjected to FMO-MD simulation as the final test of the ESP-CDAM energy gradient with MP2. The droplet was energy-minimized by the Broyden–Fletcher–Goldfarb–Shanno method [35] for 50 steps, given a kinetic energy of 200 K, heated to 300 K with the Nosé‒Hoover chains thermostat [34] with a time step of 0.5 fs for 50 fs, and then subjected to several test runs with various time steps each for 50 fs without any thermostat or constraint to draw a double logarithmic plot of [time step] vs. [root mean square deviation of total energy (RMSD (Et))]. The simulations were performed with the FMO-MD option [36] of the ABINIT-MP program.
Before investigating ESP-CDAM, we determined the optimal value of Ldimer, the threshold for the dimer-ES approximation. The errors in energy and gradient were examined for W400, 1L2Y, and the GEKSG droplet at FMO2-MP2/6-31G* (Figures 3 and 4). The energy error dropped sharply at of Ldimer = 2.0. Therefore, we concluded that of Ldimer should be appropriate and used this value for subsequent calculations.
A. Dependence on Ldimer of the computation error in the energy at FMO2-MP2/6-31G*. The error was calculated as the difference in energy between the presence and absence of the dimer-ES approximation without the environmental ESP approximation. B. An enlarged view of the region where (1.5 ≤ Ldimer ≤ 3.0) in Figure A.
A. Dependence on Ldimer of the computation error in the gradient (RMSD) at FMO2-MP2/6-31G*. The error was calculated as the difference in gradient between the presence and absence of the dimer-ES approximation without the environmental ESP approximation. B. An enlarged view of the region where (1.5 ≤ Ldimer ≤ 3.0) in Figure A.
Next, we investigated the accuracy of the ESP-CDAM for W400 at FMO2-MP2/6-31G*. We calculated the energy and gradient with no ESP approximation (ESP-NON) as a reference and also with the ESP-CDAM and ESP-AOC approximations to evaluate the accuracy of the energy and gradient of the proximal fragment dimers. From the results, we calculated the simple difference for the total energy and the RMSD for the gradient between the reference and the ESP-CDAM and ESP-AOC data to represent the errors due to the approximations (Table 1). The error due to ESP-CDAM in the total energy was as small as −0.5578 mhartree, while the error due to ESP-AOC was 10 times larger. Similarly, the error in the energy gradient was only 0.4574 mhartree/bohr for ESP-CDAM, but as large as approximately 38 mhartree/bohr for ESP-AOC. On the other hand, the calculation speed of ESP-AOC was more than six times faster than that of ESP-CDAM. This decrease in speed is due to the overhead of Cholesky decomposition for fragments as small as water molecules.
Environmental ESPapproximation | ESP-NON | ESP-CDAM | ESP-AOC |
Total energy(mhartree) | 0.0 | -0.5578 | -11.2070 |
Gradient(mhartree/bohr) | 0.0 | 0.4574 | 38.1551 |
Time (s) | 6602.0 | 5870.3 | 995.2 |
We then tested the ESP approximations to the polypeptides, i.e., Trp-cage (Table 2) and the GESKG droplet (Table 3). We found that the acceptable errors in total energy and gradient for Trp-cage were −0.0851 mhartree and 0.7578 mhartree/bohr, respectively, and that those for the GESKG droplet were 0.7516 mhartree and 1.8183 mhartree/bohr.
Environmental ESPapproximation | ESP-NON | ESP-CDAM | ESP-AOC |
Total energy(mhartree) | 0.0 | -0.0851 | 12.2303 |
Gradient(mhartree/bohr) | 0.0 | 0.7578 | 2.6267 |
Time (s) | 27897.2 | 17462.7 | 17001.6 |
Environmental ESPapproximation | ESP-NON | ESP-CDAM | ESP-AOC |
Total energy(mhartree) | 0.0 | 0.7516 | -59.2469 |
Gradient(mhartree/bohr) | 0.0 | 1.8183 | 1.3140 |
Time (s) | 10345.5 | 5259.6 | 3469.7 |
We also compared the accuracy and speed of the ESP-CDAM approximation with that of the conventional approximation, ESP-AOC, using the GESKG droplet system as the target (Table 3). The error in total energy due to ESP-CDAM (approximately 0.8 mhartree) was significantly lower than that due to ESP-AOC (approximately −59 mhartree). The error in energy gradient due to ESP-CDAM (~1.8 mhartree/bohr) was similar to that of the other two (~1.3 mhartree/bohr), but not as pronounced as the error in total energy. The reason for the relatively low gradient accuracy compared to the energy accuracy is not clear and should be investigated in the future. The computational speed of ESP-CDAM was about 1.5 times lower than that of the ESP-AOC approximation. Thus, the ESP-CDAM approximation was more accurate than the other approximation, although slower. ESP-CDAM was only 3% slower in the case of the dehydrated polypeptides (Tables 2), but 50% slower in the case of the GESKG droplet vs. ESP-AOC. Again, we suspect that this difference in behavior is due to the overhead of Cholesky decomposition in calculating fragments as small as water molecules, the dominant species in the GESKG droplet. To overcome this overhead, we plan to speed up ESP-CDAM by reducing the number of small fragments using the tree method [37].
To test the ESP-CDAM approximation, we ran several short FMO-MP2-MD simulations of the GESKG droplet and compared the accuracies and computational speeds of ESP-CDAM and ESP-AOC with those of ESP-NON. We excluded ESP-AOC-PTC from the test because its accuracy is similar to that of ESP-AOC and also because we wanted to avoid the influence of the error in the environmental ESP due to distant fragments. We performed FMO-MD simulations with different time steps each for 50 fs to draw double logarithmic plots of [time step] vs. [RMSD (Et)]. The plot should become a straight line with a slope of 2 in the case of the velocity‒Verlet time integration used this time; hence, deviation of the slope from 2 indicates degradation of the accuracy of the energy gradient [38].
The resulting plots showed that ESP-CDAM has an accuracy comparable to that of ESP-NON and better than that of ESP-AOC (Figure 5). However, all three plots deviated from the ideal slope of 2. The deviation could have been mitigated, if not avoided, by using the analytical gradient [8, 39] and/or by including higher terms [40]. Nevertheless, the current result is sufficient for the evaluation of ESP-CDAM, since we focus only on the difference between ESP-CDAM and the reference ESP-NON or the conventionally used ESP-AOC. Taken together, the present results indicate that ESP-CDAM may be a promising alternative to ESP-AOC, especially for gradient calculation in FMO-MD.
Double logarithmic plots of [Time step] vs [RMSD (Et)] of three ESP approximations.
We investigated the accuracy of the newly implemented ESP-CDAM approximation for environmental ESP. The errors in energy and gradient due to ESP-CDAM against the reference values (ESP-NON) were less than 1 mhartree and about 0.5–2 mhartree/bohr, respectively, for W400, Trp cage, and the GESKG droplet. The errors were similar to or smaller than those obtained using the conventional ESP-AOC approximation. Additional FMO-MD testing of the GESKG droplet showed that the ESP-CDAM gradient was practically as accurate as that of ESP-NON and significantly better than that of ESP-AOC. However, further improvement of the accuracy of the energy gradient in the ESP-CDAM approximation is a future challenge.
The computational speed of ESP-CDAM was twice that of ESP-NON, but significantly slower than that of ESP-AOC. We attribute the slower speed of ESP-CDAM to the overhead of Cholesky decomposition for fragments as small as water molecules. We nonetheless believe that the ESP-CDAM option can be accelerated by using the tree method, which approximates multiple fragments as a multipole [37].
We thank Prof. Kaori Fukuzawa of Osaka University and Dr. Chiduru Watanabe of RIKEN for their support and cooperation. Some of the FMO calculations were performed on the Oakforest-PACS supercomputer (project ID: hp200101) and the Fugaku supercomputer (project ID: hp210130). This study was supported by JSPS Kakenhi (JP19K12010), the Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research, BINDS) from the Japan Agency for Medical Research and Development (AMED, grant number JP21am0101113), and the FMO Drug Design Consortium (FMODD). Y. M would thank Rikkyo University Special Fund for Research (SFR).