2025 Volume 22 Issue 2 Article ID: e220011
Ligand–receptor docking simulation is difficult when the biomolecules have high intrinsic flexibility. If some knowledge on the ligand–receptor complex structure or inter-molecular contact sites are presented in advance, the difficulty of docking problem considerably decreases. This paper proposes a generalized-ensemble method “cartesian-space division mD-VcMD” (or CSD-mD-VcMD), which calculates stable complex structures without assist of experimental knowledge on the complex structure. This method is an extension of our previous method that requires the knowledge on the ligand–receptor complex structure in advance. Both the present and previous methods enhance the conformational sampling, and finally produce a binding free-energy landscape starting from a completely dissociated conformation, and provide a free-energy landscape. We applied the present method to same system studied by the previous method: A ligand (ribocil A or ribocil B) binding to an RNA (the aptamer domain of the FMN riboswitch). The two methods produced similar results, which explained experimental data. For instance, ribocil B bound to the aptamer’s deep binding pocket more strongly than ribocil A did. However, this does not mean that two methods have a similar performance. Note that the present method did not use the experimental knowledge of binding sites although the previous method was supported by the knowledge. The RNA-ligand binding site could be a cryptic site because RNA and ligand are highly flexible in general. The current study showed that CSD-mD-VcMD is actually useful to obtain a binding free-energy landscape of a flexible system, i.e., the RNA-ligand interacting system.
A novel conformation sampling method, CSD-mD-VcMD, was proposed to compute a free-energy landscape of binding of a flexible receptor and ligand. This method is free from experimental knowledge of the system, and predicts the complex structures. This method was applied to systems consisting of an RNA (the aptamer domain of FMN riboswitch) and a ligand (ribocil A or ribocil B). The resultant free-energy landscape was consistent with that obtained from our previous sampling method, which was done using the knowledge of the experimental complex structure.
In this paper, we develop a novel conformational sampling method based on our previously proposed approach. This method was applied to an RNA system. Below, we first discuss the challenges of studying a structurally flexible system, focusing on an RNA–ligand system. Next, we briefly review computational methods that have contributed to conformational sampling of biomolecular systems with atomistic details. We then describe our previous sampling method and its drawbacks, which naturally lead to the motivation for developing our currently proposed method to overcome the drawbacks.
RNA has been an important target for small-molecular drug candidates [1–8], and efforts for predicting the RNA–ligand complex structure have been accumulated [9–12]. On the other hands, those studies are uncovering that the flexibilities of RNA and ligand make the complex–structure prediction highly difficult [13,14].
The difficulty induced by structural flexibility of molecules can be seen also in proteins: “induced fit” or “induced folding” [15,16], where a part of the protein or ligand varies its structure to form a stabler complex. One of extreme cases for this difficulty of molecular binding is a fuzzy complex [17] or a multi-modal interaction ensemble [18], where many binding modes appear at the room temperature by the thermal fluctuations. We note an interesting phenomenon called a “coupled folding and binding mechanism” that is observed in an intrinsically unstructured protein: An unstructured protein or a partly unstructured protein in an isolated state adopts a well-defined tertiary structure upon binding to a partner molecule to exert its biological activity [19–22]. In previous study, we applied a generalized-ensemble method, the multicanonical molecular dynamics simulation [23] to a system of Ref. 20, which consists of an
The above examples suggest the usefulness of effective sampling methods to model or predict complex structures of flexible biomolecules and to investigate a mechanism of the complex formation. Methods in a group of generalized ensemble method sample the conformational space of biomolecular system by applying a bias force to the system or modulate a thermodynamic parameter of the to cause readily conformational transitions [25–28]. Importantly, a statistical weight (thermodynamic weight) is assigned to any of sampled conformations (snapshots) based on the physico-chemical principle. This enables us to compute the free-energy landscape (or the conformational distribution of the system) by assigning the density of conformation (potential of mean force) to the conformational space.
One may suspect the accuracy of the resultant free-energy landscape because of the inaccuracy of the force-field parameters, and we admit that no perfect force-field parameters exist so far. However, our previous study, which used a generalized ensemble method named mD-VcMD simulation, developed by us [29], showed qualitative agreement with experimental result [30]: Binding of ribocil B to the aptamer domain of the FMN riboswitch is stronger than that of ribocil A. Our subsequent analysis showed that the free-energy landscape for ribocil B has a funnel-like shape, and the landscape for ribocil A is rather ragged. On the other hands, Ref. 14 noted that the binding poses of ligands to RNA are determined poorly although the overall RNA structures are well reproduced by correct prediction of the coaxial stacking and tertiary contacts in RNA. This difficulty of the complex-structure prediction may increase the value of the generalized ensemble method because our simulation has provided some distinct RNA–ligand complex structures (binding poses) characterized by the statistical weight. Because the binding poses from computation are characterized by thermodynamic weight, the binding-poses can be used for candidates for the complex structure with a priority ranking.
Before explaining to computational methods, here we explain some biophysical significance of the simulation system (an RNA-ligand system) studied in the present paper: The aptamer domain of the FMN riboswitch (RNA) binding to ribocil A or ribocil B (ligand), which was studied using our previous method. In the present study, we applied the new method to the same system to compare the results between the two studies. Details of the system has been presented in our previous paper and the referred paper in the paper [31–36]. Thus, we explain it briefly. The FMN riboswitch exists in mRNA of a bacterium, Bacillus subtilis, which consists of the aptamer domain and an expression platform. The FMN riboswitch relates to gene-expression of riboflavin (also known as vitamin B2), which is essentially important for the life support of the bacterium. Importantly, binding of a ligand to the deep binding pocket of the aptamer domain causes a structural change in the expression platform although the mechanism for this structural change has not yet been understood well, and finally the riboflavin’s biosynthesis is suppressed by the ligand binding. The stronger the ligand binding, the stronger the biosynthesis suppression. Thus, the investigation of the ligand-receptor binding mechanism is useful for drug discovery of antibiotics.
Analyses to understand statistical mechanics of molecular systems are achieved by taking advantages of the generalized ensemble methods, and those methods were also developed to prevent conformational trapping and then raise the sampling efficiency in the research fields of physics and biophysics. Well-known methods are the replica exchange sampling [37–39], the multicanonical sampling [40–43], and adaptive umbrella sampling [44,45]. Benefit of the generalized ensemble methods are not only their sampling efficiency but also an ability to assign a statistical weight (or thermodynamic weight) to sampled conformations. Furthermore, several generalized ensemble methods have been developed [46–65], which have similarities with the replica exchange, multicanonical, or adaptive umbrella sampling although they were developed independent of replica exchange, multicanonical, or adaptive umbrella sampling. These methods had been used mainly for understanding conformational changes of a single biomolecule in an early stage, and gradually expanded to study on ligand–receptor interactions. In these methods as explained above, a bias force is introduced in the system or a thermodynamic parameter of the system is modulated during the simulation.
Here we note that there is another group of computation methods [66–71] that aim to obtain directly kinetic constants, which are required to investigate conformational transitions among regions/states set in the conformational space of a molecular system: Given two states, say
Because the above-mentioned methods search the conformations in a high-dimensional conformational space, the computation is usually expensive. A way to tackle this problem is to assume a one-dimensional (1D) line, which can be a curve line in the high-dimensional space, and sample the conformations only in the 1D line. We note that the free-energy difference (the difference of the potential of mean force) between the two ends of the 1D line can be calculated with relatively low computational cost. We call this method “line sampling” for convenience in this paper. Importantly, the line sampling has been used to calculate a binding/unbinding free energy (free-energy difference between unbound and bound states) [64,65,72–84]. However, the line sampling cannot discover low-energy regions (basins) out of the 1D pathway in the high-dimensional conformational space. This drawback is crucial when we do not know stable complex structures in advance.
To compute the free-energy landscape of a system consisting of two biomolecules (a ligand and a receptor) in a wide conformational space, which contains unbound configurations, various semi-stable complexes, and native-like complexes, we proposed a generalized ensemble method, a “virtual-system coupled molecular dynamics” (or “mD-VcMD”) simulation method [85,86]. Here we refer to this mD-VcMD as “DSD-mD-VcMD” for convenience (reason of this naming is explained later). The DSD-mD-VcMD method controls the relative positioning of the ligand and the receptor by introducing multiple reaction coordinates (multiple-RCs) and by enhancing the conformational motions in the multiple-RC space. Therefore, this method is suitable to investigate ligand–receptor docking (details are explained later). Furthermore, the DSD-mD-VcMD method can assign a statistical weight (thermodynamic weight) to each of the sampled conformations, and consequently provides a free-energy landscape of molecular binding/unbinding. However, the DSD-mD-VcMD method requires knowledge of the ligand–receptor binding sites to define RCs in advance (details are explained later). This point is a drawback to be applied to a highly flexible system such as the RNA–ligand system because different stable complex structures may be hidden in the conformational space, which is far from the ligand–receptor binding sites postulated in the introduced RCs.
In this study we present a novel sampling method, which uses multiple-RCs free from postulating the ligand–receptor binding sites in advance, and test this method to an RNA–ligand system. Therefore, we do not use the knowledge of the ligand–receptor binding sites, which are known experimentally. Here, we repeat that the present and previous methods are referred to as “CSD-mD-VcMD” and “DSD-mD-VcMD” methods, respectively, and note that the collective term for the two method is “mD-VcMD”. If the present method has proposed a procedure useful to understand the complex formation of the highly flexible molecules (i.e., RNA and ligand) and to present the most and semi-stable complex structures, we believe that the present work has a value for the RNA–ligand complex structure prediction.
Because the molecular conformational space to be sampled is vast, we have performed the mD-VcMD simulation iteratively [29]: When the
Therefore, the difference of DSD- and CSD-mD-VcMD simulations is not the data-preparation method but the definition (or setting) of the multiple-RCs. It is important that the multiple-RCs used in DSD-mD-VcMD are defined based on a priori knowledge of the ligand–receptor binding sites whereas those for CSD-mD-VcMD are free from binding-site information (reason is explained later).
Systems studiedWe first explain the molecular system studied for the present study. In the previous DSD-mD-VcMD simulation [29], we studied an RNA–ligand system consisting of the aptamer domain of the Fusobacterium nucleatum flavin mononucleotide (FMN) riboswitch (receptor) and ribocil A or ribocil B (ligand). Because ribocil A and B are stereoisomers (Figure 1), we examined two similar systems. We studied these two systems again for the present CSD-mD-VcMD simulation.
We referred to the system consisting of ribicil A and the aptamer domain of the FMN riboswitch as the “Ribo-A” system, and did to the system of ribocil B and the aptamer domain as the “Ribo-B” system which is consistent with the previous study [29]. Supplementary Figure S1 demonstrates conformations of the Ribo-A and Ribo-B systems, both of which are completely dissociated configurations of the receptor and ligand in an explicit solvent. These were used for the initial configurations for the previous DSD-mD-VcMD study: See Ref. 29 for detailed methods to generate the systems. These configurations were also used for generating the initial conformations for the current CSD-mD-VcMD simulations.
Biophysical properties of the aptamer domain regarding the ligand binding have been studied experimentally [87–90], and an X-ray crystallography reported a complex structure of the aptamer domain with ribocil B (PDB ID: 5c45; resolution 2.93
In the present study, we perform the CSD-mD-VcMD simulation assuming that we do not know either the RNA–ligand X-ray complex structure nor their ligand–receptor binding sites, and compare the results with those from the DSD-mD-VcMD simulation. Remember that DSD-mD-VcMD used the knowledge of the binding sites of the complex [29]. As it will be explained later, both the DSD-mD-VcMD and the CSD-mD-VcMD simulations started from the unbound conformations.
Multiple-reaction coordinatesNow, we explain multiple reaction coordinates (multiple-RCs), which are set before executing the mD-VcMD simulation [86]. We first explain how to control the RCs for sampling the conformation space widely. After that, we explain a drawback of DSD-mD-VcMD when the ligand-binding site is not known well. Each RC is introduced as a function of atomic coordinates in the system, otherwise we cannot control the ligand–receptor mutual positioning. We refer to a space constructed by the RCs as “RC space”. In the previous studies, we applied distances between two atom groups for the multiple-RCs [29,92–96]. If one of RCs is defined by an inter-molecular distance, then the RC can control association or dissociation of the two molecules. This is why mD-VcMD is suitable for studying the ligand–receptor binding/unbinding. A graphical image of an RC is given in Supplementary Figure S1 of Ref. 86 or Supplementary Figure S4 of Ref. 29. To clarify that the used RC space is a distance space, we have referred to those mD-VcMD simulations as “distance-space-division mD-VcMD” simulations or “DSD-mD-VcMD”.
Whereas we have used multiple RCs usually, Figure 2 presents only a single RC to explain DSD-mD-VcMD. Figure 2 presents schematically a system in that the single RC is represented by the red line between two positions fixed in either the receptor or the ligand. The two molecules go away mutually when the RC increases, and the two molecules approach when the RC decreases.
In mD-VcMD, each RC is divided into small intervals (called “zones”), to which ordinal numbers, such as 1’st, …, 9’th, are assigned as in Figure 2. Actually, we divide the RC axis into more than 9 zones usually. Now, we consider a process that a ligand approaches the ligand-binding site of the receptor from a fully dissociated position. This process is recognized as a series of inter-zone transitions in the scheme of DSD-mD-VcMD. For instance, if the ligand is in the 8’th zone in Figure 2A, the ligand’s approach to its binding site may be a series of inter-zone transitions such as 8’th → 7’th → … → 3’rd → 2’nd. We did not consider the transition from the 2’nd to the 1’st zone in Figure 2A because the 1’st zone is in the interior of the receptor in this figure: An atom collision occurs between the ligand and the receptor in the 1’st zone, which is rejected energetically during a simulation. The inter-zone transitions are controlled by assigning a transition probability between adjacent zones (inter-zone transition probability). Therefore, division of RC into zones is essentially important as discussed in detail in Ref. 86. This transition probability is also explained briefly in this paper later. After finishing the simulation, a statistical weight (thermodynamic weight) is assigned to any sampled conformation by using a re-weighting technique in mD-VcMD [86].
The RC in Figure 2A has a benefit for realizing the molecular binding. The ligand-binding site of the receptor (the red filled circle in the figure) is involved in the 2’nd zone and the volume of the 2’nd zone is smaller than the other zones except for the 1’st zone, which means that the conformation can encounter the ligand-binding site readily when the ligand moves from the 8’th zone to the 2’nd zone. The strategy to search the binding site from a point distant from the binding site is simple: If the zone ordinal No. is set to decrease with reaching the binding site, the ligand reaches the ligand-binding site by moving the ligand so that its zone ordinal No. decreases. An important thing is: The knowledge on the ligand-binding sites is mandatory to set such a convenient RC.
When the ligand-binding site is unknown, one may set an end of the RC randomly at a position of the receptor as exemplified by the “
As described in the Introduction section, the DSD-mD-VcMD may work inefficiently when applied to a highly flexible system, when the real complex structure may be far from the postulated complex structure, or when the complex structure may have a variety by the conformational fluctuations. Therefore, a robust method free from the postulation on the multiple-RCs is required, which is the “cartesian-space division mD-VcMD” or “CSD-mD-VcMD” method.
Setting zones directly to the cartesian spaceFigure 3 shows the difference of the multiple-RC setting between DSD-mD-VcMD and CSD-mD-VcMD. Figure 3A is a scheme for DSD-mD-VcMD, in which three distance-based RCs (RC1, RC2, and RC3) are introduced, all of which are bridging the receptor and the ligand. Therefore, this 3D-RC space is a distance space. Those multiple-RCs are divided into small zones and the molecular motions are controlled by the inter-zone transition probability as explained above. Because Figure 3A involves three RCs (remember that Figure 2 involved a single RC), we can control elaborately the mutual positioning of the two molecules.
An RC can be an arbitrary function expressed by the system’s atomic coordinates [86]. Figure 3B shows a scheme for CSD-mD-VcMD. First, we put the centroid of the aptamer domain of the FMN riboswitch at the body center of the periodic boundary box, as explained in see the Systems–Studied subsection of the Materials–and–Methods section and Supplementally Figure S1. The orientation of the aptamer domain regarding to the periodic box was randomly set. The x-, y-, and z-axes of the cartesian coordinate space are parallel to sides of the periodic boundary box. Therefore, the three RCs
Because the shape of the aptamer domain was relatively spherical and compact, the orientation of the aptamer domain was set randomly in the periodic box. On the other hands, one may set the orientation artificially when the receptor’s shape is deviated considerably from the spherical shape (for instance, when the receptor has a rod shape).
3D-RC space for CSD-mD-VcMDHere, we explain details of the RC setting for CSD-mD-VcMD. We first assume that the x, y, and z coordinate axes of the 3D cartesian space are parallel to the three sides of the cubic boundary box (Figure 3B). Next, we set the
Details for division of the 3D-RC space into small 3D zones are explained elsewhere [86]. We note that the space-division procedure is independent of the type of RC whatever the 3D-RCs are distances (Figure 3A) or 3D-cartesan coordinates (Figure 3B). In the mD-VcMD simulation, the conformation is confined in a 3D-RC zone (“current zone”) for a while and then transitions from the current zone to one of neighboring zones. Supplementary Section 1 explains details for the division of the 3D-RC space (Supplementary Figures S2A and B) and procedure for the inter-zone transition (Supplementary Figures S2C). The mD-VcMD simulation produces a distribution function (density function) of the system assigned to each zone, which is sampled during the simulation. A zone is denoted by
The density assigned to a zone
As mentioned in the beginning of the Materials–and–Methods section, the mD-VcMD simulation is done iteratively. This is because we set the ligand-movable box (Figure 3B) largely and accordingly the majority of the biophysically important conformations are not sampled in a single simulation run. Because iterative simulations have been performed in DSD-mD-VcMD, we also did the CSD-mD-VcMD in the present study. We numbered the iterations as “the first iterative stage”, “the secondt iterative stage”, and so on. When the
In DSD-mD-VcMD, to increase the sampling efficiency further, we have performed
We set the simulation length (denoted as
The mD-VcMD algorithm was first implemented to an MD simulation program myPresto/omegagene [27] and next to GROMACS [98] with the PLUMED plug-in [97]. Both the CSD- and DSD-mD-VcMD simulations can be done using GROMACS with PLUMED. The simulation conditions were the following: LINCS [101] to constrain the covalent-bond lengths relating to hydrogen atoms, the Nosé–Hoover thermostat [102,103] to control the simulation temperature, and the zero-dipole summation method [104–106] to compute the long-range electrostatic interactions. No structural restraint was applied to the system except for weak position restraints on some receptor atoms to suppress the translational and rotational motions of the receptor as explained later. As mentioned above, the time-step of simulation was set to 2 fs (
The TSUBAME3.0 and 4.0 computer systems are constructed by many cpus and GPUs, and each user can use a part of them depending on how the computer system is crowded. Usually, we have used 30–100 GPUs at the same time and executed 1 run by a combination of 1 cpu and 1 GPU. In the present study a run (
One might infer that the resultant conformational ensemble is affected by the confinement of the system’s conformation in 3D zones for intervals of
The systems studied are displayed in Supplementary Figure S1 and the force fields used were explained in Ref. 29. Here, we explain briefly the force fields: Those for the aptamer domain are from the ff99bsc0χOL3 (or χOL3) [107] and that for water molecule is from the 3-point optimal point charge (OPC3) water model [108]. The force field for
Figure S1 demonstrates that the aptamer domain of the FMN riboswitch was put at the center of the periodic box (cubic shape) filled by a solvent and that the ligand was put near a corner of the periodic boundary box for both systems. Therefore, the ligand and the receptor were completely dissociated. For multiple-run execution in each iterative stage, we generated
Figure 3B shows schematically the periodic boundary box and the ligand-movable box, which is put inside the periodic boundary box. See subsection of “Setting Zones Directly to the Cartesian Space” above for the method to put the aptamer domain in the ligand-movable box. The periodic boundary box was set so that their sides were parallel to the x, y, and z coordinate axes with dimension from
The CSD-mD-VcMD simulation produces the density of the ligand position in each zone at the simulation temperature (
We used the following position-restraint function, which has been implemented in GROMACS [98]:
(1) |
where
As described above, we set
Position of each sub-box is determined as follows: First, we pick up a single 3D zone randomly from the ligand-movable box. Next, a sub-box is put so that the picked zone is at the center of the sub-box. The small cubes in Figure 4A and the small squares in Figure 4B are the picked zones, which are located at the centers of sub-boxes. We put 20 sub-boxes at the beginning of each iterative stage of CSD-mD-VcMD in the present study. The sub-boxes may overlap mutually as illustrated in Figure 4C. In an iterative stage, we confined 50 runs in each sub-box. The total number of runs
Supplementary Section 4 explains how to prepare the 50 initial conformations in a sub-box for the
Before starting the simulation, all the zones are not sampled (i.e., all zones are empty). With proceeding the iterative stages, the number of sampled zones increases. After finishing the
Here, we define two quantities:
Above, we introduced ratios
The function
For this purpose, we introduce a “cumulative spatial density” of the ligand around the aptamer domain. First, all zones existing in the ligand-movable box are expressed as
(2) |
We call these 3D zones,
Next, we accumulate
(3) |
Because
Finaly, by setting
(4) |
Because we know the correspondence between
(5) |
The spatial pattern of
In the crystallographic complex structure (PDB ID: 5c45; resolution 2.93
Our computation [29] showed that the ligand can reach the binding site from both surfaces for both the Ribo-A and Ribo-B systems because the ligand’s density
Our previous study [29] quantified the ligand’s molecular orientation by two vectors
Because the vector
We next defined a scalar product,
We quantify the ligand’s structural difference between a snapshot and the X-ray complex structure by a well-known quantity “a root-mean-square-deviation” (“
(6) |
where
Remember that we applied a weak position restraint on some atoms in the aptamer domain (Equation 1) and that the restrained atoms were set to be distant from the ligand-binding pocket (Supplementary Figure S4). Because of the small force constant
Here we note a problem, which we call “degeneracy problem” in this paper, which may induce misleading in analyzing the resultant density
In short, when we set the reaction coordinate according to the Figure 2B in the DSD-mD-VcMD sampling, the conformations with the lowest and second-lowest binding-energy sites cannot be discriminated in the density function
As mentioned, the present CSD-mD-VcMD and previous DSD-mD-VcMD simulations [29] used the same molecular systems, which consist of the aptamer domain of the FMN riboswitch (receptor) and ribocil A or B (ligand). In DSD-mD-VcMD, we introduced three distance-based RCs, where two of the three RCs were set using the knowledge of the native-complex structure (PDB ID: 5c45; resolution 2.93
The extra iterative stages in CSD-mD-VcMD compering to DSD-mD-VcMD may be because we did not use the knowledge of the binding sites of the complex structure. However, we note that if the knowledge is inaccurate or the molecular system is highly flexible, the DSD-mD-vcMD simulation may require more extra iterative stages than the CSD-mD-VcMD simulation. We discuss this point later.
In CSD-mD-VcMD, we set the number of runs,
In each iterative stage, 20 sub-boxes were put randomly in the ligand-movable box, and 50 runs were confined in each of the 20 sub-boxes as explained in the Materials–and–Methods section. Sub-panels “Ite
In the first sub-panel “Ite
To analyze the process of the system reaching an equilibrium, we plot the ratios
Figure 6B plots the correlation coefficient
The bumpy behavior of
As explained in the above subsection, we performed the simulation up to the 85–th iterative stage although
Although Figure 7 demonstrated the difference of the free-energy landscape around the ligand-binding pocket between the two systems, in a broader perspective of the free-energy landscape the two systems have a similarity: The ligand density decreased rapidly when the ligand departed from the surface of the receptor. Therefore, our study shows computationally that the complex formation tends to have a superiority against being in unbound conformations, whereas the system’s state appearing in a real solution depends on the concentration of the ligand and the receptor.
The P1 and P4 helices of the riboswitch fluctuated largely during the simulation, which was also observed in our previous DSD-mD-VcMD simulation [29]. As explained in the Introduction–section, ribocil B and ribocil A are a strong and weak suppressor of the riboflavin biosynthesis. Thus, we expected that the thermal fluctuations of the P1 and P4 helices may be different between the Ribo-A anhd Ribo-B systems. However, unfortunately, the fluctuation patterns of the P1 and P4 helices did not show substantial difference between the two systems (data not shown). This result is also found in our previous DSD-mD-VcMD simulation. The expression platform was removed in both simulations and the lengths of the P1 and P4 helix may be too short to produce the motional difference between the two systems.
Cumulative densityThe above subsection gave us an impression that the free-energy landscape of Ribo-B may be more funnel-like than that of Ribo-A. To analyze more the degree of the funnel feature of the landscape, we introduced the cumulative spatial density
Not only the current simulation (CSD-mD-vcMD) but also the previous simulation (DSD-mD-VcMD) showed that ribocil B binds more strongly to the aptamer domain than ribocil A does. In other words, the lowest free-energy basin in the deep binding pocket in the aptamer domain for the Ribo-B system is deeper than that for the Ribo-A system in the free-energy landscape. Importantly, these computational results agree qualitatively with an experiment [30]. As investigated in detail in the previous simulation, this difference of the free-energy landscape between the two systems caused by the difference of inter-molecular stacking: The inter-molecular stacking was formed more firmly in the Ribo-B system than in Ribo-A system. However, the simulation showed that the force-fields involve some inaccuracy. Therefore, we consider that more examination of the force field should be done carefully to discuss quantitatively the free-energy landscape.
Radial distribution-like functionFigures 7 and 8 indicated that the lowest free energy was assigned to the ligand binding in the ligand-binding pocket. Now, we show concretely the conformations in the ligand-binding pocket. To pick up those conformations we used the radial distribution-like function
Remember that the high-density region (red-colored contours) of the Ribo-A system were split into two spots (Figure 7A), although the two spots fused into one in Supplementary Figure S10A. This inconsistency is because Figure 7A was generated by the centroid of the R3 ring (i.e., the 3D-RCs were set to the centroid of the R3 ring in CSD-mD-VcMD), whereas Figure S10A was calculated using the position of the entire-ligand centroid. As mentioned in the Materials–and–Methods–section, the ligand’s rotation around the entire-ligand centroid does not move the entire-ligand centroid position largely. In contrast, the R3 ring quantifies the ruggedness of the binding pocket, which results in the distinct high-density spots in the binding pocket in Figure 7A. We presume that it may be better to select a small portion of ligand for the multiple-RCs in the CSD-mD-VcMD.
The entire-centroids of ribicol B in the cluster
We show the complex structures of clusters identified in Figure 9. However, before it, we show the X-ray complex structure (PDB ID: 5c45) in Supplementary Figure S11, which indicates existence of three inter-molecular stackings (A48–R4, G62–R1, and A85–R3). The stackings formed in the X-ray complex structure [91] are referred to as “native stackings” in this paper. We also found two inter-molecular hydrogen bonds (H-bonds): One is from the atom O07 of ribocil B to the O2’ atom of A48 and the other is from the atom O07 of ribocil B to the N7 atom of A99, as shown in Supplementary Figure S11. We show that H-bonds regarding the atom O07 of ribocil are formed frequently to surrounding nucleotides in the clusters.
Now, we show the complex structures of the five clusters in the main text and in the supplementary file: Native-like cluster of Ribo-A is Figure 10,
Figure 10 demonstrates complex structures from the native-like cluster of the Ribo-A system. Figure 10A displays 10 snapshots of ribocil A picked randomly from the native-like cluster peak of Ribo-A. These ribocil-A structures overlap well one another and also to ribocil B in the X-ray structure (Figure 10B). Most of these complex structures form three native stackings, and an H-bond regarding the O07 atom of ribocil A are formed (Figure 10C). This figure was drawn by picking up a single snapshot randomly from the ten conformations in Figure 10A.
Figure 11 is complex structures assigned to the native-like cluster from the Ribo-B system. Figure 11A displays 10 snapshots of ribocil B picked randomly from the native-like cluster of Ribo-B. These ribocil-A structures also overlap well one another and also to the X-ray position. Most of these complex structures form three native stackings between ribocil and the aptamer domain, and the two H-bonds regarding the O07 atom of ribocil A are formed (Figure 10C). This figure was drawn by picking up a single snapshot randomly from Figure 11A. Of course, the number of native-like H bonds depends on the snapshots although the majority of the complex structures involves two native-like H-bonds (data not shown).
The native-like clusters (Figures 10 and 11) were involved the native-stackings and the native-like H-bonds, which are presumably stabilizing factors of the native-like complex structure. In the other clusters, we also found various stackings (native and non-native stackings). We also found that the clusters
The present study showed that CSD-mD-VcMD simulation sampled various complex structures, and the native-like cluster for Ribo-B was not the largest. The native-like complex structures were characterized by the native-like stackings and the native-like H-bonds. This result may suggest that the force field is inaccurate: We argued the same problem in our previous DSD-mD-VcMD study. If the force field is more accurate, the native-like complex structure is more stabilized.
Existence of the non-native clusters (
In the above subsection, we analyzed ligand’s structures in the ligand-binding pocket. In this subsection, we analyze the ligand’s density in the ligand-binding pocket. As shown in Supplementary Figure S6, there is a tunnel that connects the front and rear surfaces of the aptamer domain. Our previous study showed that non-zero ligand’s density occupies the tunnel, and we named the tunnel “density tunnel” [29]. In this subsection, we analyze again the ligand’s density in the density tunnel in detail.
Figures 12A and B focus on the density tunnel, where two and single high-density spots exist in the Ribo-A and Ribo-A systems, respectively. This figure demonstrates apparently that the front surface and the rear surface of the aptamer domain was connected by non-zero density region (the bright-yellow contours):
Of course, the connection of the density recovers when we decrease the density level of the cyan contours. We presume from Figure 12 that free-energy barrier height from the lowest free-energy spot for Ribo-B is higher than that for Ribo-A because the ratio of density between the largest density spot (
In the present CSD-mD-VcMD study,
On the other hand, we should note that the only quantity usable to analyze the molecular structures in detail is snapshots, which were output taking an interval of
Last in this subsection, we note that a poorly sampled RC region can be sampled selectively even in the CSD-mD-VcMD simulation: We can set a sub-box so that the poorly sampled region is involved in the sub-box in the next iterative stage. If the native-complex structure is known in advance, the sub-boxes can be set so that the sub-box are near the native-complex structure, although the present CSD-mD-VcMD simulation did not use the knowledge of the native-complex structure.
Ordering of ligand’s molecular orientations in the density tunnelThe quantities regarding the ligand’s molecular orientation around the aptamer domain were defined in Supplementary Section 6:
Now, we divided the density tunnel into “front tunnel” and “rear tunnel”. Similarly, each of the front and rear tunnels was divided into “entrance” (a cyan broken-line rectangle) and “middle” (a red broken-line rectangle) as shown in Figures 13A and B. Apparently, the ordering of
Figures 13A and B showed that the ligand’s orientational ordering was weak in the middle regions (red broken line rectangles). This weak ordering in the middle regions disagrees with our previous DSD-mD-VcMD study: We could detect many
First, we describe the similarities of results between the present and previous [29] methods in many aspects. In both studies, ribocil B bound to the binding pocket of the aptamer domain more strongly than ribocil A did. The current study has shown that the free-energy landscape for the Ribo-B system is funnel-like whereas the landscape for the Ribo-A system is rather rugged. The spatial density
Furthermore, for both systems, the ligand’s molecular orientation
To investigate
Next, we argue the ligand’s poses detected in the binding pocket. We obtained five distinct clusters: The native-like,
One important issue of this paper is the effect of the structural flexibility (conformational fluctuations) of RNA and ligand on the complex structure. Recent studies on the RNA–ligand complex have uncovered the importance of the structural flexibility. In other words, the flexibilities of RNA and ligand make the complex–structure prediction highly difficult [3,13,14]. Interestingly, even though the overall shape of RNA is well conserved, different binding poses of the ligand to RNA are possible [14]. Our computation results may relate to these studies. The
As discussed above, the prediction of the RNA–ligand complex structure is difficult when the high structural flexibility of the molecules or the force-field inaccuracy is involved in the computation. We admit that we cannot solve this difficulty only from the current results of the DSD-mD-VcMD method. However, the current method can propose possible complex structures (candidates) based on the statistical mechanics (or thermodynamics) with assigning a probability to each candidate. Second, our method can use various force fields by switching an input file of GROMACS (i.e., topology file), which may useful to evaluate accuracy and inaccuracy of the examined force fields. Third, the free-energy landscape can provide not only the complex structures but also the pathways along that various semi-stable complex structures appear. Then, we may obtain a deeper understanding on the RNA-ligand binding process and their structure flexibility.
Last in this paper, we discuss the extensibility and reliability of CSD-mD-VcMD to a system other than the RNA system. As explained, we developed the CSD-mD-VcMD method to be applied to a flexible system and/or a system in that the receptor–ligand interaction sites are not known well. Therefore, one suitable target for our method is an intrinsically disordered protein (or an intrinsically disordered region of an ordered protein) interacting to a flexible ligand. Besides, both the CSD- and DSD-mD-VcMD methods are achieved with using PLUMED. The difference of CSD and DSD methods is only the input file for PLUMED. Thus, we can examine various reaction coordinates other than CSD and DSD by preparing a different PLUMED input file.
The authors declare no conflicts of interest.
The fundamental idea of CSD-mD-VcMD was generated by J. H., K. K. and S. S. Then, Y. F. proposed to apply the CSD-mD-VcMD method to investigate the RNA-ligand binding mechanism and helped some data analyses to make clearer the meaning of the simulation results. K. K. and S. S. implemented the cartesian-space-division (CSD) procedure to GROMACS and PLUMED. G.-J. B. and N. K. determined the RNA–ligand simulation system to the aptamer domain of the FNM riboswitch and robocil A (or B), and generated the input files for the computation. I. F. and T. T. discussed some methodologically important points and helped some data analyses. Simulation and paper-writing were mainly done by J. H.
We uploaded important snapshots obtained from the current simulation to the Biological Structure Model Archive (BSM-Arc). The entry for this study is https://bsma.pdbj.org/entry/77 We are planning to make the code available in near future after taking the necessary steps to address the regulatory issues due to compliance with the Foreign Exchange and Foreign Trade Act.
This work was supported by JSPS KAKENHI Grants No. 21K06052 (J.H.) and 20H03229 (N.K.), and performed in part under the Cooperative Research Program of the Institute for Protein Research, Osaka University, CR22-02, CR-23-02 and CR24-02. J.H. and K.N. acknowledge support by the HPCI System Research Project (Project IDs: hp170020, hp180050, hp190018, hp200063, hp210005, hp220015, hp230003, and hp230011, hp240005, hp240032, hp240167). J.H., I.F., N.K., and Y.F. are supported by the Project Focused on Developing Key Technology for Discovering and Manufacturing Drugs for Next-Generation Treatment and Diagnosis (project ID: 23ae0121028h0003; 2018–2021 and 24ae0121028h0004: 2021–) from AMED and the Japan Biological Informatics Consortium (JBiC). GA-mD-VcMD was performed on TSUBAME3.0 and TSUBAME4.0 supercomputers at the Tokyo Institute of Technology. We used UCSF Chimera ver. 1.15 for drawing molecular structures and gnuplot for graphs. Other analyses were done using homemade programs.