A virtual system-coupled molecular dynamics simulation free from experimental knowledge on binding sites: Application to RNA-ligand binding free-energy landscape

Junichi Higo; Kota Kasahara; Shun Sakuraba; Gert-Jan Bekker; Narutoshi Kamiya; Ikuo Fukuda; Takuya Takahashi; Yoshifumi Fukunishi

doi:10.2142/biophysico.bppb-v22.0011

Abstract

Ligand–receptor docking simulation is difficult when the biomolecules have high intrinsic flexibility. If some knowledge on the ligand–receptor complex structure or inter-molecular contact sites are presented in advance, the difficulty of docking problem considerably decreases. This paper proposes a generalized-ensemble method “cartesian-space division mD-VcMD” (or CSD-mD-VcMD), which calculates stable complex structures without assist of experimental knowledge on the complex structure. This method is an extension of our previous method that requires the knowledge on the ligand–receptor complex structure in advance. Both the present and previous methods enhance the conformational sampling, and finally produce a binding free-energy landscape starting from a completely dissociated conformation, and provide a free-energy landscape. We applied the present method to same system studied by the previous method: A ligand (ribocil A or ribocil B) binding to an RNA (the aptamer domain of the FMN riboswitch). The two methods produced similar results, which explained experimental data. For instance, ribocil B bound to the aptamer’s deep binding pocket more strongly than ribocil A did. However, this does not mean that two methods have a similar performance. Note that the present method did not use the experimental knowledge of binding sites although the previous method was supported by the knowledge. The RNA-ligand binding site could be a cryptic site because RNA and ligand are highly flexible in general. The current study showed that CSD-mD-VcMD is actually useful to obtain a binding free-energy landscape of a flexible system, i.e., the RNA-ligand interacting system.

Significance

A novel conformation sampling method, CSD-mD-VcMD, was proposed to compute a free-energy landscape of binding of a flexible receptor and ligand. This method is free from experimental knowledge of the system, and predicts the complex structures. This method was applied to systems consisting of an RNA (the aptamer domain of FMN riboswitch) and a ligand (ribocil A or ribocil B). The resultant free-energy landscape was consistent with that obtained from our previous sampling method, which was done using the knowledge of the experimental complex structure.

Introduction

In this paper, we develop a novel conformational sampling method based on our previously proposed approach. This method was applied to an RNA system. Below, we first discuss the challenges of studying a structurally flexible system, focusing on an RNA–ligand system. Next, we briefly review computational methods that have contributed to conformational sampling of biomolecular systems with atomistic details. We then describe our previous sampling method and its drawbacks, which naturally lead to the motivation for developing our currently proposed method to overcome the drawbacks.

RNA has been an important target for small-molecular drug candidates [1–8], and efforts for predicting the RNA–ligand complex structure have been accumulated [9–12]. On the other hands, those studies are uncovering that the flexibilities of RNA and ligand make the complex–structure prediction highly difficult [13,14].

The difficulty induced by structural flexibility of molecules can be seen also in proteins: “induced fit” or “induced folding” [15,16], where a part of the protein or ligand varies its structure to form a stabler complex. One of extreme cases for this difficulty of molecular binding is a fuzzy complex [17] or a multi-modal interaction ensemble [18], where many binding modes appear at the room temperature by the thermal fluctuations. We note an interesting phenomenon called a “coupled folding and binding mechanism” that is observed in an intrinsically unstructured protein: An unstructured protein or a partly unstructured protein in an isolated state adopts a well-defined tertiary structure upon binding to a partner molecule to exert its biological activity [19–22]. In previous study, we applied a generalized-ensemble method, the multicanonical molecular dynamics simulation [23] to a system of Ref. 20, which consists of an α-helix bundle protein (mSin3B) and a 15-residue segment taken from a neural restrictive silencer factor (NRSF). Although this 15-residue segment is disordered in the isolated state, it adopts an α-helix when it binds to the cleft of mSin3B [20]. Therefore, in the initial conformation of our simulation, mSin3B and the segment were dissociated and the segment was disordered. Importantly, in the resultant free-energy landscape, the lowest free-energy basin coincided to the experimentally determined complex structure, where the 15-residue NRSF segment folded into an α-helix in the cleft of mSin3B [24]. In addition, the obtained free-energy landscape included various semi-stable free-energy basins with distinct tertiary structures of the system. Another interesting point of this landscape is that some basins were connected by relatively low free-energy pathways, although the other basins were spaced by high free-energy barriers. This means that the free-energy landscape provides possible pathways of conformational changes among different complex structures.

The above examples suggest the usefulness of effective sampling methods to model or predict complex structures of flexible biomolecules and to investigate a mechanism of the complex formation. Methods in a group of generalized ensemble method sample the conformational space of biomolecular system by applying a bias force to the system or modulate a thermodynamic parameter of the to cause readily conformational transitions [25–28]. Importantly, a statistical weight (thermodynamic weight) is assigned to any of sampled conformations (snapshots) based on the physico-chemical principle. This enables us to compute the free-energy landscape (or the conformational distribution of the system) by assigning the density of conformation (potential of mean force) to the conformational space.

One may suspect the accuracy of the resultant free-energy landscape because of the inaccuracy of the force-field parameters, and we admit that no perfect force-field parameters exist so far. However, our previous study, which used a generalized ensemble method named mD-VcMD simulation, developed by us [29], showed qualitative agreement with experimental result [30]: Binding of ribocil B to the aptamer domain of the FMN riboswitch is stronger than that of ribocil A. Our subsequent analysis showed that the free-energy landscape for ribocil B has a funnel-like shape, and the landscape for ribocil A is rather ragged. On the other hands, Ref. 14 noted that the binding poses of ligands to RNA are determined poorly although the overall RNA structures are well reproduced by correct prediction of the coaxial stacking and tertiary contacts in RNA. This difficulty of the complex-structure prediction may increase the value of the generalized ensemble method because our simulation has provided some distinct RNA–ligand complex structures (binding poses) characterized by the statistical weight. Because the binding poses from computation are characterized by thermodynamic weight, the binding-poses can be used for candidates for the complex structure with a priority ranking.

Before explaining to computational methods, here we explain some biophysical significance of the simulation system (an RNA-ligand system) studied in the present paper: The aptamer domain of the FMN riboswitch (RNA) binding to ribocil A or ribocil B (ligand), which was studied using our previous method. In the present study, we applied the new method to the same system to compare the results between the two studies. Details of the system has been presented in our previous paper and the referred paper in the paper [31–36]. Thus, we explain it briefly. The FMN riboswitch exists in mRNA of a bacterium, Bacillus subtilis, which consists of the aptamer domain and an expression platform. The FMN riboswitch relates to gene-expression of riboflavin (also known as vitamin B₂), which is essentially important for the life support of the bacterium. Importantly, binding of a ligand to the deep binding pocket of the aptamer domain causes a structural change in the expression platform although the mechanism for this structural change has not yet been understood well, and finally the riboflavin’s biosynthesis is suppressed by the ligand binding. The stronger the ligand binding, the stronger the biosynthesis suppression. Thus, the investigation of the ligand-receptor binding mechanism is useful for drug discovery of antibiotics.

Analyses to understand statistical mechanics of molecular systems are achieved by taking advantages of the generalized ensemble methods, and those methods were also developed to prevent conformational trapping and then raise the sampling efficiency in the research fields of physics and biophysics. Well-known methods are the replica exchange sampling [37–39], the multicanonical sampling [40–43], and adaptive umbrella sampling [44,45]. Benefit of the generalized ensemble methods are not only their sampling efficiency but also an ability to assign a statistical weight (or thermodynamic weight) to sampled conformations. Furthermore, several generalized ensemble methods have been developed [46–65], which have similarities with the replica exchange, multicanonical, or adaptive umbrella sampling although they were developed independent of replica exchange, multicanonical, or adaptive umbrella sampling. These methods had been used mainly for understanding conformational changes of a single biomolecule in an early stage, and gradually expanded to study on ligand–receptor interactions. In these methods as explained above, a bias force is introduced in the system or a thermodynamic parameter of the system is modulated during the simulation.

Here we note that there is another group of computation methods [66–71] that aim to obtain directly kinetic constants, which are required to investigate conformational transitions among regions/states set in the conformational space of a molecular system: Given two states, say sA and sB, the free energy difference (ΔG(sA,sB)) between the states is defined by ΔG(sA,sB)=−RTln[rA→B/rB→A], where rA→B and rB→A are, respectively, rate constants from stare sA to sB and from sB to sA: R and T are the gas constant and temperature, respectively. By setting the states appropriately, one can obtain a map of ΔG in a wide conformational space, i.e., a free-energy landscape.

Because the above-mentioned methods search the conformations in a high-dimensional conformational space, the computation is usually expensive. A way to tackle this problem is to assume a one-dimensional (1D) line, which can be a curve line in the high-dimensional space, and sample the conformations only in the 1D line. We note that the free-energy difference (the difference of the potential of mean force) between the two ends of the 1D line can be calculated with relatively low computational cost. We call this method “line sampling” for convenience in this paper. Importantly, the line sampling has been used to calculate a binding/unbinding free energy (free-energy difference between unbound and bound states) [64,65,72–84]. However, the line sampling cannot discover low-energy regions (basins) out of the 1D pathway in the high-dimensional conformational space. This drawback is crucial when we do not know stable complex structures in advance.

To compute the free-energy landscape of a system consisting of two biomolecules (a ligand and a receptor) in a wide conformational space, which contains unbound configurations, various semi-stable complexes, and native-like complexes, we proposed a generalized ensemble method, a “virtual-system coupled molecular dynamics” (or “mD-VcMD”) simulation method [85,86]. Here we refer to this mD-VcMD as “DSD-mD-VcMD” for convenience (reason of this naming is explained later). The DSD-mD-VcMD method controls the relative positioning of the ligand and the receptor by introducing multiple reaction coordinates (multiple-RCs) and by enhancing the conformational motions in the multiple-RC space. Therefore, this method is suitable to investigate ligand–receptor docking (details are explained later). Furthermore, the DSD-mD-VcMD method can assign a statistical weight (thermodynamic weight) to each of the sampled conformations, and consequently provides a free-energy landscape of molecular binding/unbinding. However, the DSD-mD-VcMD method requires knowledge of the ligand–receptor binding sites to define RCs in advance (details are explained later). This point is a drawback to be applied to a highly flexible system such as the RNA–ligand system because different stable complex structures may be hidden in the conformational space, which is far from the ligand–receptor binding sites postulated in the introduced RCs.

In this study we present a novel sampling method, which uses multiple-RCs free from postulating the ligand–receptor binding sites in advance, and test this method to an RNA–ligand system. Therefore, we do not use the knowledge of the ligand–receptor binding sites, which are known experimentally. Here, we repeat that the present and previous methods are referred to as “CSD-mD-VcMD” and “DSD-mD-VcMD” methods, respectively, and note that the collective term for the two method is “mD-VcMD”. If the present method has proposed a procedure useful to understand the complex formation of the highly flexible molecules (i.e., RNA and ligand) and to present the most and semi-stable complex structures, we believe that the present work has a value for the RNA–ligand complex structure prediction.

Materials and methods

Because the molecular conformational space to be sampled is vast, we have performed the mD-VcMD simulation iteratively [29]: When the M-th iterative stage has finished, the data files for the next iterative stage are prepared, and then the (M+1)-th iterative stage starts. Through this cycle, the sampled region of the conformational space is expanded. So far, we have proposed three types of data-preparation methods for mD-VcMD: The “original” [85,86], a “subzone-based” [29,86], and a “genetic-algorithm-guided” [29,86] data-preparation methods. Both the present DSD-mD-VcMD and the previous CSD-mD-VcMD simulations used the subzone-based data-preparation method. See Re. 86 for details of the subzone-based method.

Therefore, the difference of DSD- and CSD-mD-VcMD simulations is not the data-preparation method but the definition (or setting) of the multiple-RCs. It is important that the multiple-RCs used in DSD-mD-VcMD are defined based on a priori knowledge of the ligand–receptor binding sites whereas those for CSD-mD-VcMD are free from binding-site information (reason is explained later).

Systems studied

We first explain the molecular system studied for the present study. In the previous DSD-mD-VcMD simulation [29], we studied an RNA–ligand system consisting of the aptamer domain of the Fusobacterium nucleatum flavin mononucleotide (FMN) riboswitch (receptor) and ribocil A or ribocil B (ligand). Because ribocil A and B are stereoisomers (Figure 1), we examined two similar systems. We studied these two systems again for the present CSD-mD-VcMD simulation.

Figure 1 Two ligands: (A) ribocil A and (B) ribocil B, which are optical isomers mutually. Ligands are flexible in simulation because all dihedral bonds are treated as rotatable. Red hydrogen atom labeled “H” is the atom different between ribocil A and ribocil B. Four rings in each ribocil are named “R1”, “R2”, “R3”, and “R4” for convenience as shown in figure. This ring naming was used uniformly in this paper and in Ref. 29.

We referred to the system consisting of ribicil A and the aptamer domain of the FMN riboswitch as the “Ribo-A” system, and did to the system of ribocil B and the aptamer domain as the “Ribo-B” system which is consistent with the previous study [29]. Supplementary Figure S1 demonstrates conformations of the Ribo-A and Ribo-B systems, both of which are completely dissociated configurations of the receptor and ligand in an explicit solvent. These were used for the initial configurations for the previous DSD-mD-VcMD study: See Ref. 29 for detailed methods to generate the systems. These configurations were also used for generating the initial conformations for the current CSD-mD-VcMD simulations.

Biophysical properties of the aptamer domain regarding the ligand binding have been studied experimentally [87–90], and an X-ray crystallography reported a complex structure of the aptamer domain with ribocil B (PDB ID: 5c45; resolution 2.93 Å) [91]. In this complex, ribocil B binds to the deep pocket of the aptamer domain. Interestingly, an experiment [30] has shown that binding of ribocil B to the aptamer domain is considerably stronger than that of ribocil A, in spite that the two complex structures (the Ribo-A and Ribo-B complexes) are almost identical. Our previous DSD-mD-VcMD simulation provided the following results [29]. i) In the lowest free-energy conformation, both ribocil A and B were in the binding pocket of the aptamer domain, although the ligands shifted by a few angstroms from the position in the X-ray complex structure. ii) Ribocil B bound to the aptamer domain more strongly than ribocil A did, which is consistent with the experimental observation [30] as mentioned above. iii) Both ribocil A and B approached the binding site with passing two tunnels that were completely different to each other. iv) For both ligands, a ligand’s orientational selection mechanism exerted at the entrance of the either tunnel, which is an important mechanism to lead the ligand to the ligand-binding site smoothly. v) There were some ligand-binding modes in the deep binding pocket of the aptamer domain other than the experimentally reported complex structure whereas the overall tertiary structure of the aptamer domain was maintained around the X-ray structure (PDB ID: 5c45).

In the present study, we perform the CSD-mD-VcMD simulation assuming that we do not know either the RNA–ligand X-ray complex structure nor their ligand–receptor binding sites, and compare the results with those from the DSD-mD-VcMD simulation. Remember that DSD-mD-VcMD used the knowledge of the binding sites of the complex [29]. As it will be explained later, both the DSD-mD-VcMD and the CSD-mD-VcMD simulations started from the unbound conformations.

Multiple-reaction coordinates

Now, we explain multiple reaction coordinates (multiple-RCs), which are set before executing the mD-VcMD simulation [86]. We first explain how to control the RCs for sampling the conformation space widely. After that, we explain a drawback of DSD-mD-VcMD when the ligand-binding site is not known well. Each RC is introduced as a function of atomic coordinates in the system, otherwise we cannot control the ligand–receptor mutual positioning. We refer to a space constructed by the RCs as “RC space”. In the previous studies, we applied distances between two atom groups for the multiple-RCs [29,92–96]. If one of RCs is defined by an inter-molecular distance, then the RC can control association or dissociation of the two molecules. This is why mD-VcMD is suitable for studying the ligand–receptor binding/unbinding. A graphical image of an RC is given in Supplementary Figure S1 of Ref. 86 or Supplementary Figure S4 of Ref. 29. To clarify that the used RC space is a distance space, we have referred to those mD-VcMD simulations as “distance-space-division mD-VcMD” simulations or “DSD-mD-VcMD”.

Whereas we have used multiple RCs usually, Figure 2 presents only a single RC to explain DSD-mD-VcMD. Figure 2 presents schematically a system in that the single RC is represented by the red line between two positions fixed in either the receptor or the ligand. The two molecules go away mutually when the RC increases, and the two molecules approach when the RC decreases.

Figure 2 Two types of distance-based reaction coordinate (RC), which are set to the same ligand–receptor system to perform DSD-mD-VcMD. Although multiple RCs are usually used in DSD-mD-VcMD, only one RC is introduced in each panel for simplicity. (A) RC (thick red line) is defined by distance between ligand and a position near the ligand-binding site (filled red circle) of receptor. (B) RC is distance between the ligand and a position distant from the ligand-binding site of the receptor. We assume that red and black filled circles are the lowest and second-lowest energy sites for ligand binding, respectively, on the receptor surface. In both panels, RC axes are divided into small intervals (called “zones” in this paper), which are numbered from the 1’st to 9’th. Checked shells are mentioned in text.

In mD-VcMD, each RC is divided into small intervals (called “zones”), to which ordinal numbers, such as 1’st, …, 9’th, are assigned as in Figure 2. Actually, we divide the RC axis into more than 9 zones usually. Now, we consider a process that a ligand approaches the ligand-binding site of the receptor from a fully dissociated position. This process is recognized as a series of inter-zone transitions in the scheme of DSD-mD-VcMD. For instance, if the ligand is in the 8’th zone in Figure 2A, the ligand’s approach to its binding site may be a series of inter-zone transitions such as 8’th → 7’th → … → 3’rd → 2’nd. We did not consider the transition from the 2’nd to the 1’st zone in Figure 2A because the 1’st zone is in the interior of the receptor in this figure: An atom collision occurs between the ligand and the receptor in the 1’st zone, which is rejected energetically during a simulation. The inter-zone transitions are controlled by assigning a transition probability between adjacent zones (inter-zone transition probability). Therefore, division of RC into zones is essentially important as discussed in detail in Ref. 86. This transition probability is also explained briefly in this paper later. After finishing the simulation, a statistical weight (thermodynamic weight) is assigned to any sampled conformation by using a re-weighting technique in mD-VcMD [86].

The RC in Figure 2A has a benefit for realizing the molecular binding. The ligand-binding site of the receptor (the red filled circle in the figure) is involved in the 2’nd zone and the volume of the 2’nd zone is smaller than the other zones except for the 1’st zone, which means that the conformation can encounter the ligand-binding site readily when the ligand moves from the 8’th zone to the 2’nd zone. The strategy to search the binding site from a point distant from the binding site is simple: If the zone ordinal No. is set to decrease with reaching the binding site, the ligand reaches the ligand-binding site by moving the ligand so that its zone ordinal No. decreases. An important thing is: The knowledge on the ligand-binding sites is mandatory to set such a convenient RC.

When the ligand-binding site is unknown, one may set an end of the RC randomly at a position of the receptor as exemplified by the “×” mark in Figure 2B, which is unfortunately distant from the ligand-binding site (the red filled circle). In Figure 2B the ligand-binding site is in the 6’th zone, and the volume of the 6’th zone is larger than that of the 2’nd zone. Thus, the ligand in the 6’th zone in Figure 2B has a smaller probability of detecting the ligand-binding site than in the 2’nd zone in Figure 2A. In other words, a longer simulation may be required to search the ligand-binding site when the ligand-binding site is unknown.

As described in the Introduction section, the DSD-mD-VcMD may work inefficiently when applied to a highly flexible system, when the real complex structure may be far from the postulated complex structure, or when the complex structure may have a variety by the conformational fluctuations. Therefore, a robust method free from the postulation on the multiple-RCs is required, which is the “cartesian-space division mD-VcMD” or “CSD-mD-VcMD” method.

Setting zones directly to the cartesian space

Figure 3 shows the difference of the multiple-RC setting between DSD-mD-VcMD and CSD-mD-VcMD. Figure 3A is a scheme for DSD-mD-VcMD, in which three distance-based RCs (RC1, RC2, and RC3) are introduced, all of which are bridging the receptor and the ligand. Therefore, this 3D-RC space is a distance space. Those multiple-RCs are divided into small zones and the molecular motions are controlled by the inter-zone transition probability as explained above. Because Figure 3A involves three RCs (remember that Figure 2 involved a single RC), we can control elaborately the mutual positioning of the two molecules.

Figure 3 (A) 3D-RCs, RC1, RC2, and RC3, introduced for DSD-mD-VcMD. Although all RCs are defined by distances between receptor and ligand in this figure, some of RCs may be set by an intra-receptor or intra-ligand distance, which specifies the distance between two portions within the receptor. (B) 3 RCs directly assigned to periodic boundary box (i.e., 3D cartesian coordinate space). Terms and symbols appearing in panel are mentioned in text. See figure caption for Figure 2 for red and black filled circles.

An RC can be an arbitrary function expressed by the system’s atomic coordinates [86]. Figure 3B shows a scheme for CSD-mD-VcMD. First, we put the centroid of the aptamer domain of the FMN riboswitch at the body center of the periodic boundary box, as explained in see the Systems–Studied subsection of the Materials–and–Methods section and Supplementally Figure S1. The orientation of the aptamer domain regarding to the periodic box was randomly set. The x-, y-, and z-axes of the cartesian coordinate space are parallel to sides of the periodic boundary box. Therefore, the three RCs [RC1,RC2,RC3] are defined as the 3D cartesian coordinates set to the ligand (marked as “×”). In the present study, the ligand’s position was represented by the centroid of the ring R3 of the ligand (see Figure 1 for the position of ring R3). Therefore, [RC1,RC2,RC3]=[xR3,yR3,zR3], where xR3, yR3, and zR3 are, respectively, the x, y, and z coordinates of the centroid of ring R3 in the 3D cartesian coordinate space. We did not used the entire-ribocil centroid for the RCs because the ligand rotation around the entire-ribocil centroid does not vary the values of RCs. In other words, the entire-ribocil centroid does not have an enough resolution to discriminate the ligand’s orientational variety. Later in the Result–section, we exemplify a case in that the entire-ligand centroid specifies insufficiently the orientational variety.

Because the shape of the aptamer domain was relatively spherical and compact, the orientation of the aptamer domain was set randomly in the periodic box. On the other hands, one may set the orientation artificially when the receptor’s shape is deviated considerably from the spherical shape (for instance, when the receptor has a rod shape).

3D-RC space for CSD-mD-VcMD

Here, we explain details of the RC setting for CSD-mD-VcMD. We first assume that the x, y, and z coordinate axes of the 3D cartesian space are parallel to the three sides of the cubic boundary box (Figure 3B). Next, we set the RC1, RC2, and RC3 axes to be parallel to the x, y, and z coordinate axes, respectively: RC1//x, RC2//y, and RC3//z. Then, we divide the 3D cartesian space (i.e., periodic boundary box) into small 3D zones: The yellow region labeled “a zone” in Figure 3B is an example of the zone for CSD-mD-VcMD. We set a periphery region (denoted as “margin” in Figure 3B) in the periodic boundary box, and did not allow [RC1, RC2, RC3] (i.e., the centroid of the ring R3 of the ligand) to enter into the margin during the simulation. In contrast, the ligand moves freely in the region called “the ligand-movable box” in Figure 3B. Therefore, only the ligand-movable box is divided into small 3D zones. A ligand’s density (i.e., the probability of detecting the ring R3 of the ligand) in a zone is calculated only in the ligand-movable box during the mD-VcMD simulation. In both CSD- and DSD-mD-VcMD methods, the inter-zone transition procedure is implemented in the PLUMED plug-in [97] combined with GROMACS [98]. Therefore, the computational cost for CSD-mD-VcMD and DSD-mD-VcMD simulations are almost the same as that for a conventional MD simulation, because the assignment of a conformation to a virtual zone [RC1,RC2,RC3] is done simply by PLUMED. Thus, the GROMACS program runs without modification. We explain later how to control the inter-zone transitional motions of the ligand.

Zones in Multi-RC space and Inter-zone transitions

Details for division of the 3D-RC space into small 3D zones are explained elsewhere [86]. We note that the space-division procedure is independent of the type of RC whatever the 3D-RCs are distances (Figure 3A) or 3D-cartesan coordinates (Figure 3B). In the mD-VcMD simulation, the conformation is confined in a 3D-RC zone (“current zone”) for a while and then transitions from the current zone to one of neighboring zones. Supplementary Section 1 explains details for the division of the 3D-RC space (Supplementary Figures S2A and B) and procedure for the inter-zone transition (Supplementary Figures S2C). The mD-VcMD simulation produces a distribution function (density function) of the system assigned to each zone, which is sampled during the simulation. A zone is denoted by L, which is a 3D vector when multiple RCs are introduced for sampling.

The density assigned to a zone L is expressed as Qcano(L). When we specify the iterative stage M to the density, Qcano(L) is denoted as Qcano[M](L). The method to calculate Qcano[M](L) from the simulation is explained in Supplementary Equations S1 and 2 of Supplementary Section 2. The inter-zone transition probability is explained in Supplementary Section 3, where Supplementary Equations S3–7 provide rigorous derivation of the inter-zone transition probability, and Supplementary Figure S3 shows schematically the transitions. With traveling among various 3D-RC zones via the inter-zone transitions in the ligand-movable box (Figure 3B), we can obtain the density of the system in each 3D zone, which is explained below.

CSD-mD-VcMD simulation

As mentioned in the beginning of the Materials–and–Methods section, the mD-VcMD simulation is done iteratively. This is because we set the ligand-movable box (Figure 3B) largely and accordingly the majority of the biophysically important conformations are not sampled in a single simulation run. Because iterative simulations have been performed in DSD-mD-VcMD, we also did the CSD-mD-VcMD in the present study. We numbered the iterations as “the first iterative stage”, “the secondt iterative stage”, and so on. When the M-th iterative stage has finished, we calculate a density function Qcano[M](L), which is a probability of detecting the system in zone L computed from the trajectories from the 1-st to M-th iterative stages. See Supplementary Section 2 or Ref. 86 for details for calculation of Qcano[M](L).

In DSD-mD-VcMD, to increase the sampling efficiency further, we have performed Nrun runs in parallel in each iterative stage, where the runs start from different conformations. This parallel-job procedure is also used in CSD-mD-VcMD, and set Nrun=1,000 in this study. In the previous DSD-mD-VcMD simulation we set Nrun=1,024. Those MD trajectories starting from different initial conformations converges to a canonical ensemble in theory [99,100].

We set the simulation length (denoted as Δtrun) of each of Nrun runs to 2.0ns (1×106 steps with time-step of 2 fs) for both of the Ribo-A and Ribo-B systems. Therefore, the total length for an iterative stage was 2.0μs (=2.0ns×1,000runs). As reported later, we performed 85 iterations for each system. Thus, the total simulation length for each system is 170.0μs (=2.0ns×1,000×85iterations). We show in the Results–and–Discussion section that the density Qcano[M](L) converges before the 85-th iterative stage. We saved a snapshot at every 1×105 steps (0.2ns) in each run of each iterative stage. Thus, the number of saved snapshots from the 85 iterations is 8.5×105 (=170μs/0.2ns) for each system. Because we set Δτtrns to 200 fs (100 steps) in the present study, inter-zone transition events occur 104 times (=1×106/100) in a single run including self-transitions.

The mD-VcMD algorithm was first implemented to an MD simulation program myPresto/omegagene [27] and next to GROMACS [98] with the PLUMED plug-in [97]. Both the CSD- and DSD-mD-VcMD simulations can be done using GROMACS with PLUMED. The simulation conditions were the following: LINCS [101] to constrain the covalent-bond lengths relating to hydrogen atoms, the Nosé–Hoover thermostat [102,103] to control the simulation temperature, and the zero-dipole summation method [104–106] to compute the long-range electrostatic interactions. No structural restraint was applied to the system except for weak position restraints on some receptor atoms to suppress the translational and rotational motions of the receptor as explained later. As mentioned above, the time-step of simulation was set to 2 fs (Δt=2 fs) and the simulation temperature to 300K. Runs were performed on the TSUBAME3.0 (Tokyo Institute of Technology) and TSUBAME4.0 (Institute of Science Tokyo) supercomputers, both of which use GPU. The force field parameters are explained in the next subsection.

The TSUBAME3.0 and 4.0 computer systems are constructed by many cpus and GPUs, and each user can use a part of them depending on how the computer system is crowded. Usually, we have used 30–100 GPUs at the same time and executed 1 run by a combination of 1 cpu and 1 GPU. In the present study a run (1×106 steps) required about 16 minutes using TSUBAME4.0 system. Then, 85 iterations, each of which consisted of 1,000 runs, took about 9 days for each of Ribo-A and Ribo-B systems executing 30 runs in the same time. Although this computer resource is large, we note that the computation speed and the number of cpu/GPU are increasing rapidly. We are planning to examine larger biomolecules in future.

One might infer that the resultant conformational ensemble is affected by the confinement of the system’s conformation in 3D zones for intervals of Δτtrns. This criticism is valid for a conventional MD simulation that uses an artificial potential. In mD-VcMD, however, the bias effect by the zone confinement is removed from the resultant ensemble by a reweighting technique [29,86]. Thus, the thermodynamic weight assigned to a snapshot is a quantity without the confinement. On the other hands, the position restraints introduced in CSD-mD-VcMD may affect the resultant ensemble. Therefore, we set the restraints considerably weak as explained later and checked that the weakness of the restraints. Ref. 29 explains that mD-VcMD is different from umbrella sampling.

Force fields, ligand-movable box, and weak position restraints on RNA

The systems studied are displayed in Supplementary Figure S1 and the force fields used were explained in Ref. 29. Here, we explain briefly the force fields: Those for the aptamer domain are from the ff99bsc0χOL3 (or χ_OL3) [107] and that for water molecule is from the 3-point optimal point charge (OPC3) water model [108]. The force field for Mg2+ ion is from a recent study on divalent ions [109] and those for K+ and Cl− ions are from another recent study on monovalent ions [110]. We have modeled the force field parameters for ribocil A and B by us: The atomic partial charges of ribocil A or B are calculated quantum-chemically using Gaussian09 [111] at the HF/6-31G* level, followed by RESP fitting [112]. Then, the obtained atomic partial charges are incorporated with the general AMBER force field 2 (GAFF2) [113], which was designed to be compatible with conventional AMBER force-fields.

Figure S1 demonstrates that the aptamer domain of the FMN riboswitch was put at the center of the periodic box (cubic shape) filled by a solvent and that the ligand was put near a corner of the periodic boundary box for both systems. Therefore, the ligand and the receptor were completely dissociated. For multiple-run execution in each iterative stage, we generated Nrun (=1,000) conformations from this the dissociated conformation for the initial conformations of the CSD-mD-VcMD simulation. See the legend of Figure S1 for the number of constituent atoms and the size of the periodic boundary box.

Figure 3B shows schematically the periodic boundary box and the ligand-movable box, which is put inside the periodic boundary box. See subsection of “Setting Zones Directly to the Cartesian Space” above for the method to put the aptamer domain in the ligand-movable box. The periodic boundary box was set so that their sides were parallel to the x, y, and z coordinate axes with dimension from 0.0Å to 97.3511Å for the Ribo-A system and from 0.0Å to 97.3304Å for the Ribo-B system. The ligand-movable box was set as [RCmin(h),RCmax(h)]=[5.0Å,92.0Å] (∀h) for both systems in the present study. Note that solvent molecules can move in the margin region, although the centroid of the ring R3 of the ligands cannot enter into the margin. Each range of [RCmin(h),RCmax(h)] was divided into 144 zones (nvs(h)=144) for all of the three RC axes (Supplementary Table S1). The lower and upper boundaries of each zone, [ζi(h)]low and [ζi(h)]up, are listed in Supplementary Table S2. The zone width was set to 1.2Å for all zones: [ζi(h)]up−[ζi(h)]low=1.2Å (∀h and ∀i). This value of 1.2Å was determined from a preliminary study (data not shown). The RC-axis division was the same for all RC axes (Supplementary Tables S1 and S2).

The CSD-mD-VcMD simulation produces the density of the ligand position in each zone at the simulation temperature (300 K) in the ligand-movable box. Note that the zones are fixed to the 3D cartesian space (not to the receptor). Therefore, if the receptor can translate and rotate freely in the cartesian space, the ligand’s density may converge to a flat distribution function in the cartesian space. However, in that case, a considerably long simulation is required to obtain the flat density (i.e., convergence). Thus, to avoid the long simulation, we apply weak position restraints on some atoms of the aptamer domain: Eight atoms (P, OP1, OP2, O5', C5', C4', C3', and O3) of 15 nucleotides presented by the stick model in Supplementary Figure S4 are the position-restrained atoms: 120 (=8×15) atoms. Those atoms were selected not to suppress the motions of the binding pocket of the aptamer domain because those atoms are relatively distant from the ligand–binding site. Furthermore, the position-restraint was not applied to P1 and P4 helices because an experiment has reported that these two helices move largely in thermal fluctuations [91].

We used the following position-restraint function, which has been implemented in GROMACS [98]:

E p r = c p r ∑ i | r i – r i ( r e f ) | 2 ,

(1)

where ri is the position of the i-th position-restrained atom and ri(ref) is the reference position (Supplementary Figure S4), to which the restraint is applied. To realize a weak position restraint, we set the coefficient cpr in Equation 1 to a small value: cpr=10kJ/mol×nm2=0.024kcal/mol×Å2. We show in the Results–and–Discussion section that the aptamer domain fluctuates largely and that both ribocil A and B could bind to the deep binding pocket of the aptamer domain during the simulation.

Introduction of sub-box: Hierarchical space division

As described above, we set 144 zones for each RC axis and prepared 1,000 initial conformations for each system. So, we may start the CSD-mD-VcMD simulation. However, the 1443 zones existing in the 3D-RC space require a large amount of memory in computation. Therefore, to decrease the memory space, we introduce smaller boxes (denoted as “sub-boxes”) in the ligand-movable box and confine runs to move only in a sub-box. The sub-box is a cubic consisting of nSB3 zones, which is currently set as nSB=61. Thus, the number of 3D zones in the sub-box is considerably smaller than that in the ligand-movable box: (nSB/nvs(h))3=(61/144)3=0.076. Figure 4A illustrates two sub-boxes, sb1 and sb2, put in the ligand-movable box, and Figure 4B is a two-dimensional version of Figure 4A. The sub-box sb1 is fully involved in the ligand-movable box, and sb2 is done partially. Therefore, a run confined in sb1 can visit all zones (i.e., 613 zones) in the sub-box, whereas a run confined in sb2 does not sample zones outside the ligand-movable box. The red-colored regions of the zones in Figure 4B are those sampled by the runs. Figure 4C illustrates a situation where many sub-boxes are put in the ligand-movable box.

Figure 4 (A) Two sub-boxes, denoted as sb1 and sb2, put in ligand-movable box (blue region). Boxes shown by gray lines represent the periodic boundary box. Although the number of sub-boxes is 20 in the actual CSD-mD-VcMD, it is smaller than 20 in this figure. Small cube in each sub-box is the central cube of the sub-box. (B) Two-dimensional representation of panel A. Red area for each sub-box is mentioned in main text. (C) More sub-boxes than 2 are put in the sampling region, where the sub-boxed overlap mutually, and some regions are not occupied by sub-boxes in this panel. Because the sub-boxes are put randomly at the beginning of each iterative stage, the unoccupied regions are occupied with increasing the iterative stage.

Position of each sub-box is determined as follows: First, we pick up a single 3D zone randomly from the ligand-movable box. Next, a sub-box is put so that the picked zone is at the center of the sub-box. The small cubes in Figure 4A and the small squares in Figure 4B are the picked zones, which are located at the centers of sub-boxes. We put 20 sub-boxes at the beginning of each iterative stage of CSD-mD-VcMD in the present study. The sub-boxes may overlap mutually as illustrated in Figure 4C. In an iterative stage, we confined 50 runs in each sub-box. The total number of runs Nrun done in an iterative stage was 1,000 (Nrun=20×50) for the present study. The number of sub-boxes and the number of runs in each sub-box are adjustable parameters.

Supplementary Section 4 explains how to prepare the 50 initial conformations in a sub-box for the (M+1)-th iterative stage after the M-th iteration has finished. We note that the initial conformations for the first iterative stage should be prepare carefully, which is also explained in Supplementary Section 4 and Supplementary Figure S5 demonstrates actual positioning of ligands for the first iterative stage.

Ratios of sampled zones

Before starting the simulation, all the zones are not sampled (i.e., all zones are empty). With proceeding the iterative stages, the number of sampled zones increases. After finishing the M-th iterative stage, we classify each 3D zone into three types depending on the situation of the zone: “Perfectly-sampled”, “imperfectly-sampled”, and “empty”. In a perfectly-sampled zone, all of the eight subzones in the 3D zone have been sampled during the first to the M-th iterative stages. In an imperfectly sampled zone, on the other hand, one or more subzones have not yet sampled during the iterative stages. In an empty zone, no subzone has been sampled yet.

Here, we define two quantities: Rp[M] and Rp+ip[M]. Rp[M] is the ratio of the perfectly-sampled zones to all the 3D zones (i.e., [nvs(h)]3=1443) up to the M-th iterative stage, and Rp+ip[M] is the ratio of perfectly-sampled and imperfectly-sampled zones (or 1443− the number of empty zones). To analyze the convergence of Qcano[M](L) with proceeding the iterative stage, we use Rp[M] and Rp+ip[M].

Correlation of spatial pattern between Qcano(I)(L) and Qcano(J)(L)

Above, we introduced ratios Rp[M] and Rp+ip[M] to analyze the sampled zones. Now, we introduce another quantity, a correlation coefficient, Ccorr(I;J), to analyze a similarity of the 3D-RC spatial patterns between Qcano[I](L) and Qcano[J](L). The value of Ccorr(I;J) ranges from −1 to +1. The larger the Ccorr(I;J), the more similar the Qcano[I](L) to Qcano[J](L). Remember that Qcano[M](L) is the system’s spatial density calculated using from the first to the M-th iterative stages. Therefore, Qcano[J](L) expresses a future form of Qcano[I](L). Because Qcano[M](L) converges to a canonical distribution (thermally equilibrated distribution assigned to L at the simulation temperature) with increasing M [99,100], the correlation coefficient can be a good quantity to assess the convergence of Qcano[M](L). The actual form of Ccorr(I;J) is given in Supplementary Equation S8, and derivation of Ccorr(I;J) is explained by Supplementary Equations S9–14 of Supplementary Section 5.

Cumulative spatial density of ligand around receptor

The function Qcano(L) estimated by the CSD-mD-VcMD simulation expresses the spatial density of the ligand (strictly speaking, the centroid of the R3 ring of ribocil) because L is the position of a 3D zone fixed to the 3D cartesian space. Therefore, the larger the Qcano(L) in a zone L, the larger the ligand’s density in the zone. This gives us an image of the property of the free energy landscape (funnel-like or ragged-like). In this subsection, we analyze quantitatively the degree of the funnel-like property of the free-energy landscape.

For this purpose, we introduce a “cumulative spatial density” of the ligand around the aptamer domain. First, all zones existing in the ligand-movable box are expressed as {L1,L2,L3,⋯,LNe} where Ne is the number of 3D zones in the ligand-movable box: Ne=nvs(1)nvs(2)nvs(3)=1443. Then, we rearrange the zones so that their densities {Qcano(L1),⋯,Qcano(LNe)} are arranged in a descending order, and the 3D zones are renamed as: Λ1,Λ2,Λ3,…,ΛNe according to the descending order of densities. By this definition, the following inequality is satisfied:

Q c a n o ( Λ 1 ) ≥ Q c a n o ( Λ 2 ) ≥ … ≥ Q c a n o ( Λ N e ) .

(2)

We call these 3D zones, Λ1,Λ2,Λ3,…,ΛNe, as “density-rearranged zones”.

Next, we accumulate Qcano(Λi) in a descending order of Λi as:

Δ n = ∑ i = 1 n Q c a n o ( Λ i ) .

(3)

Because Δn is probability, it satisfies an inequality: 0≤Δn≤1 (remember that Qcano(L) is originally normalized by Equation 1). If we set Δn=0.5 for instance, then we can assemble zones that contribute to the system’s probability by 50% from the highest-density zones.

Finaly, by setting Δn to a value, we can define a cumulative density Qcum as:

Q c u m ( Λ i ; Δ n ) = { 1 ( i f Λ i i s i n v o l v e d i n Δ n ) 0 ( e l s e ) .

(4)

Because we know the correspondence between Li and Λk, we can re-write this equation as:

Q c u m ( L i ; Δ n ) = { 1 ( i f L i i s i n v o l v e d i n Δ n ) 0 ( e l s e ) ,

(5)

The spatial pattern of Qcum(Li;Δn) teaches us the shape of the free-energy landscape: Funnel-like or ragged shape.

Ligand binding pathways to the binding site of the aptamer domain

In the crystallographic complex structure (PDB ID: 5c45; resolution 2.93 Å), ribocil B is buried deeply in the pocket of the aptamer domain [91]. Supplementary Figure S6 shows that the binding ligand is visible from completely opposite directions (front and rear views) of the surface of the aptamer domain. The naming of “front” and “rear” in Figure S6 is not universal but assigned by us arbitrarily in our previous work [29] for convenience. This figure clarifies that the ligand-binding site is in a deep pocket and that the pocket forms a tunnel connecting the front and rear surfaces of the aptamer domain. We cannot judge which is the plausible ligand-binding pathway only from the crystallographic structure.

Our computation [29] showed that the ligand can reach the binding site from both surfaces for both the Ribo-A and Ribo-B systems because the ligand’s density Qcano(L) did not vanish in the entire tunnel region connecting the front and rear surfaces (see Figure 4b and d of Ref. 29). Thus, we named this tunnel as “density tunnel”. On the other hands, we presumed that entering from the front surface leads the ligand to the binding site more smoothly than entering from the rear surface does. Our analyses showed that ligand’s orientation should rotate largely in the density tunnel before reaching the binding sites when the ligand enters from the rear surface [29]. In other words, entering from the rear surface requires climbing a free-energy barrier resulted from the molecular rotation. This point is checked also in the present study.

Molecular orientation of ribocil around the aptamer domain

Our previous study [29] quantified the ligand’s molecular orientation by two vectors e↑ and e←, which are fixed in the ligand. Note that the vector eα (α=↑or←) moves according to the ligand’s motion. Thus, the vectors can be good quantities to represent the molecular orientation. We also analyze the ordering of eα in the present study. Supplementary Equations S15–17 of Supplementary Section 6 and Supplementary Figure S7 present the method to calculate eα.

Because the vector eα can be assigned to a spatial site in the 3D cartesian space, we can investigate the spatial pattern of eα around the aptamer domain. Then, we divided the cartesian space into small cubes (size is 2×2×2Å3), whose positions are represented by the body centers of the cubes rcube. By using the cubes, we can calculate the thermal average of eα at 300K in each cube: <eα(rcube)> (Equation S14). The larger the |<eα(rcube)>| (i.e., the larger the norm of <eα(rcube)>) at site rcube, the more the ordering of eα at the site.

We next defined a scalar product, SPα(rcube), between eα(X−ray) and <eα(rcube)> (Equation S16), where eα(X−ray) is eα of the ligand in the X-ray complex structure (PDB ID: 5c45). The larger the SPα(rcube) at site rcube, the similar the <eα(rcube)> to eα(X−ray) at this site.

Ligand’s RMSD and radial distribution-like function

We quantify the ligand’s structural difference between a snapshot and the X-ray complex structure by a well-known quantity “a root-mean-square-deviation” (“rmsdlig”):

r m s d l i g = Σ k | r k − r k ( X ) | 2 N l i g ,

(6)

where rk and rk(X) are, respectively, the position of the k-th heavy atom of the ligand in the snapshot and that of the corresponding atom in the X-ray complex structure, and Nlig is the number of heavy atoms in ribocil (Nlig=27 for both ribocil A and B).

Remember that we applied a weak position restraint on some atoms in the aptamer domain (Equation 1) and that the restrained atoms were set to be distant from the ligand-binding pocket (Supplementary Figure S4). Because of the small force constant cpr in Equation 1, the aptamer domain exhibited whole-molecular motions (we show large fluctuations later), and then the resultant rmsdlig was affected by the whole molecular motions of the aptamer domain. This means that the whole-molecular motions should be removed from ligand’s fluctuations to analyze rmsdlig, and consequently the removal of the whole-molecular motions is equivalent to superposition of some atoms of the aptamer domain of the snapshots to those in the X-ray complex structure. We selected the superimpose atoms to surround the ligand-binding pocket (Supplementary Figure S8). Note that this superposition was applied to the system after the CSD-mD-VcMD simulation. All the atoms in the snapshots move according to the superposition of the selected atoms, and Equation 6 is calculated. If a snapshot has a large rmsdlig, it means that the ligand structure in the snapshot and that in the X-ray structure are different largely to each other translationally, orientationally, and structurally. Supplementary Section 7 provides some notes for calculating rmsdlig and the radial distribution-like function ρrd(r).

Degeneracy problem in DSD-mD-VcMD and alleviation of the problem in CSD-mD-VcMD

Here we note a problem, which we call “degeneracy problem” in this paper, which may induce misleading in analyzing the resultant density Qcano(L). In Figures 2 and 3, the lowest and second-lowest binding-energy sites for receptor–ligand binding are shown by the red- and black-filed circles. See the figure caption of Figure 2. Here we note that a serious difficulty, degeneracy problem, may occur, depending on the mutual positioning of low binding-energy sites (the sites of the red- and black-filed circles in this case). The word “degeneracy” is usually used to discuss the energy of the system in physics. However, we use this word to discuss the structure of the system.

In short, when we set the reaction coordinate according to the Figure 2B in the DSD-mD-VcMD sampling, the conformations with the lowest and second-lowest binding-energy sites cannot be discriminated in the density function Qcano (i.e., two conformations are degenerated at the same position in the density function). Then, this may cause misinterpretation of the resultant free-energy landscape. In contrast, in the CSD-mD-VcMD sampling, the lowest and second-lowest binding-energy sites are distinguished (the degeneracy vanished) in the resultant density function. We explained this problem in Supplementary Section 8.

Results

As mentioned, the present CSD-mD-VcMD and previous DSD-mD-VcMD simulations [29] used the same molecular systems, which consist of the aptamer domain of the FMN riboswitch (receptor) and ribocil A or B (ligand). In DSD-mD-VcMD, we introduced three distance-based RCs, where two of the three RCs were set using the knowledge of the native-complex structure (PDB ID: 5c45; resolution 2.93 Å) [91]. Therefore, the ligand approached the position in the native complex position (see Figure 1 of Ref. 29 for actual setting of RCs) when the two RCs became small. The remaining one RC was set to control the cleft width of the aptamer domain, which may also help the ligand to enter into the binding pocket. Consequently, we performed 45-th iterative stages till the density function converged in the 3D distance space. In contrast, in CSD-mD-VcMD, the sub-boxes were set randomly in the ligand-movable box, and performed 85 iterative stages were performed for both systems as explained later.

The extra iterative stages in CSD-mD-VcMD compering to DSD-mD-VcMD may be because we did not use the knowledge of the binding sites of the complex structure. However, we note that if the knowledge is inaccurate or the molecular system is highly flexible, the DSD-mD-vcMD simulation may require more extra iterative stages than the CSD-mD-VcMD simulation. We discuss this point later.

In CSD-mD-VcMD, we set the number of runs, Nrun, performed in each iterative stage as Nrun=1,000, which is slightly smaller than that taken for our previous DSD-mD-VcMD study: Nrun=1,024. The simulation length of each run was exactly the same (1×106 steps with time step of 2 fs) with that of the previous study.

Convergence process

In each iterative stage, 20 sub-boxes were put randomly in the ligand-movable box, and 50 runs were confined in each of the 20 sub-boxes as explained in the Materials–and–Methods section. Sub-panels “Ite [1:M]” of Figures 5A and B demonstrate the sampled regions during the first to the M-th iterative stages for Ribo-A and the Ribo-B, respectively. Red-regions in the figures represent sampled zones during the iterative stages, and no color region is assigned to zones that were not sampled during the iterative stages (empty-zone region).

Figure 5 Pannels in (A) and (B) are calculated for Ribo-A and Ribo-B systems, respectively. If a zone is colored in red in a sub-panel labeled “Ite [1:M]”, then the zone is sampled during the first to the M-th iterative stages. Other zones, to which no color is assigned, are non-sampled zones (empty zones) during the iterative stages. The aptamer domain of the FMN riboswitch is put at the center of periodic boundary box, and the P1 and P4 helices are indicated by broken-line circles in panels of “Ite [1:M]”.

In the first sub-panel “Ite [1:1]” of each of Figures 5A and B, the positions of the individual sub-boxes are seen relatively clearly, although sub-boxes are fused one another with increasing M: See sub-panels “Ite [1:5]” for instance. With proceeding M, sampled regions increased rapidly for both systems. However, the volume occupied by the aptamer domain was not involved in the red contours in both systems because the ligand could not overlap the aptamer domain. On the other hands, the positions of the P1 and P4 helices of the aptamer domain were involved in the red contours when M increased. This is because the two helices fluctuated largely during the simulations. Remember that we did not apply the position restraints to the P1 and P4 helices (Supplementary Figure S4) because an experiment [91] reports that the P1 and P4 helices fluctuate largely.

To analyze the process of the system reaching an equilibrium, we plot the ratios Rp[M] and Rp+ip[M] as a function of iterative stage M. As shown in Figure 6A, the dependency of the ratios Rp[M] and Rp+ip[M] on M was similar between the two systems: Both Rp[M] and Rp+ip[M] increased rapidly up to M=30 and slowly up to M=40. After that, the ratios reached a plateau more slowly in range of 40<M≤85. Therefore, one may consider that Qcano[M](L) almost reached an equilibrium for M≥40 by seeing the similar manner of Rp[M] and Rp+ip[M] between the two systems. However, the correlation coefficient Ccorr(M;85) converged differently between the two systems as shown in the next paragraph. We note that both Rp[M] and Rp+ip[M] did not reach 100% with slowly increasing to a value smaller than 100%. This behavior is natural because the ligand cannot crash into the interior of the receptor as shown in Figure 5.

Figure 6 (A) Ratios Rp[M] and Rp+ip[M] as a function of iterative stage M. Procedure to calculate the ratios is given in Materials–and–Methods section. (B) Correlation coefficient Ccorr(M;85) (Supplementary Equation S7) as a function of M.

Figure 6B plots the correlation coefficient Ccorr(M;85) (Supplementary Equation S8) as a function of M. The coefficient Ccorr for the Ribo-B system increased quickly in range of 1≤M≤40 and reached almost plateau in range of M>40. This behavior of Ccorr is similar to those of the ratios Rp[M] and Rp+ip[M] in Figure 6A, although a peak at M=9 in Figure 6B for Ribo-B is not recognized in Figure 6A. In contrast, the behavior of Ccorr for Ribo-A was different from that for Ribo-B. The Ccorr for Ribo-A increased quickly in 33<M<37, did slowly in 38≤M,≤79, and finally reached plateau in 79≤M≤84 (see inset of Figure 6B).

The bumpy behavior of Ccorr in early iterative stages for both systems (see Ccorr for 5≤M≤20) is because the number of sampled zones used to calculate Ccorr in the early iterative stages was smaller than that used in latter iterative stages. Remember that an empty zone was eliminated in Supplementary Equation S8. Therefore, in the early iterative stages, Ccorr involved relatively large statistical errors.

Spatial density of ligand around the aptamer domain

As explained in the above subsection, we performed the simulation up to the 85–th iterative stage although Qcano[M](L) has been converged up to the 79-th iterative stage for both systems. We rewrite Qcano[M](L) at M=85 as Qcano(L) with omitting term “[M]” and consider that Qcano(L) is an equilibrated special density. Figure 7 illustrates Qcano(L), which demonstrates the ligand distribution around the aptamer domain. Apparently, the highest-density spots (i.e., the red-contour regions of Figure 7) were seen in the deep ligand-binding pocket in both the Ribo-A and Ribo-B systems. However, the two systems show different spatial patterns of Qcano(L): The red-contour region of Ribo-B was found only in the binding-pocket as a single spot (Figure 7B), although the red-contour region of Ribo-A was split into two spots (Figure 7A) in the binding-pocket. Furthermore, in Ribo-A, the red-contours were found in two regions other than in the binding-pocket: See black broken-line circles in Figure 7A. Therefore, it is likely that the density for ribocil B was localized well in the small volume in the binding-pocket whereas the density for ribocil A dispersed. This tendency of density localization is found for the second-highest iso-density regions (blue-colored contours) and third-highest iso-density regions (yellow-colored contours). These results gave us a strong impression that the free-energy landscape for Ribo-B is funnel like and that for Ribo-A is rugged. In the next subsection, we analyze this impression more quantitatively.

Figure 7 Spatial density Qcano(L) of ribocil A (A) and ribocil B (B) in the ligand-movable box. Note that Qcano(L) is normalized by Supplementary Equation S1. Shown density is from the iterative stage 85, in which Qcano(L) converged (Figure 6B). Differently colored iso-density levels are presented by differently colored contours as indicated in inset. ρmax is the maximum density assigned to a zone for each system: ρmax=10−2.45 for Ribo-A and ρmax=10−2.03 for Ribo-B. Positions of ligand-binding pocket, P1 helix, and P4 helix are indicated by words “binding site”, “P1 helix” and “P4 helix”, respectively. Ligand-binding pocket is also indicated by brown broken-line circle. Black broken-line circles are mentioned in main text.

Although Figure 7 demonstrated the difference of the free-energy landscape around the ligand-binding pocket between the two systems, in a broader perspective of the free-energy landscape the two systems have a similarity: The ligand density decreased rapidly when the ligand departed from the surface of the receptor. Therefore, our study shows computationally that the complex formation tends to have a superiority against being in unbound conformations, whereas the system’s state appearing in a real solution depends on the concentration of the ligand and the receptor.

The P1 and P4 helices of the riboswitch fluctuated largely during the simulation, which was also observed in our previous DSD-mD-VcMD simulation [29]. As explained in the Introduction–section, ribocil B and ribocil A are a strong and weak suppressor of the riboflavin biosynthesis. Thus, we expected that the thermal fluctuations of the P1 and P4 helices may be different between the Ribo-A anhd Ribo-B systems. However, unfortunately, the fluctuation patterns of the P1 and P4 helices did not show substantial difference between the two systems (data not shown). This result is also found in our previous DSD-mD-VcMD simulation. The expression platform was removed in both simulations and the lengths of the P1 and P4 helix may be too short to produce the motional difference between the two systems.

Cumulative density

The above subsection gave us an impression that the free-energy landscape of Ribo-B may be more funnel-like than that of Ribo-A. To analyze more the degree of the funnel feature of the landscape, we introduced the cumulative spatial density Qcum(Li;Δn) (Equation 5). Figure 8 demonstrates Qcum(Li;Δn) at Δn=0.6,0.8 and 0.9. All of the zones in the ligand-movable box are colored in red when we set to Δn=1.0, and in contrast, all of them are not colored when we set to Δn=0.0. Figure 8A shows that 60% (Δn=0.6) of the density of ribocil A existed not only in the binding-pocket but also in two other regions on the surface of the aptamer domain in the Ribo-A system. Note that these two regions are also found in Figure 7A. In contrast, Figure 8B demonstrates that the 60% of the ligand’s density localized only in the binding pocket in the Ribo-B system. With increasing Δn to 0.8 and 0.9, the volume of the occupied zones increased for both systems. However, the tendency of density localization in the deep binding pocket was more remarkable in the Ribo-B system than in the Ribo-A system. Therefore, Figure 8 indicates clearly that the free-energy landscape for Ribo-B is funnel-like and that the landscape for Ribo-A was rather rugged. Supplementary Figure S9 presents schematically the free-energy landscapes for the two systems: The Ribo-A system has a rugged free-energy landscape, whereas the Ribo-B does a funnel-like landscape.

Figure 8 Cumulative spatial density Qcum(Li;Δn) (Equation 5) at three values of Δn: Δn=0.6,0.8 and 0.9. (A) Qcum for Ribo-A. (B) Qcum for Ribo-B.

Not only the current simulation (CSD-mD-vcMD) but also the previous simulation (DSD-mD-VcMD) showed that ribocil B binds more strongly to the aptamer domain than ribocil A does. In other words, the lowest free-energy basin in the deep binding pocket in the aptamer domain for the Ribo-B system is deeper than that for the Ribo-A system in the free-energy landscape. Importantly, these computational results agree qualitatively with an experiment [30]. As investigated in detail in the previous simulation, this difference of the free-energy landscape between the two systems caused by the difference of inter-molecular stacking: The inter-molecular stacking was formed more firmly in the Ribo-B system than in Ribo-A system. However, the simulation showed that the force-fields involve some inaccuracy. Therefore, we consider that more examination of the force field should be done carefully to discuss quantitatively the free-energy landscape.

Radial distribution-like function

Figures 7 and 8 indicated that the lowest free energy was assigned to the ligand binding in the ligand-binding pocket. Now, we show concretely the conformations in the ligand-binding pocket. To pick up those conformations we used the radial distribution-like function ρrd(rmsdlig) for each system. Figure 9 indicates that ρrd for the Rino-A system has double peaks (i.e., two clusters) at rmsdlig≈3.75Å (denoted as A1 in the figure) and rmsdlig≈5.25Å (A2) and that ρrd for the Ribo-B does a single peak (i.e., single cluster) at rmsdlig≈6.25Å (B1). This figure also shows existence of a small peak (denoted as “native-like”) in the range of rmsdlig≤2.0Å for both systems. These clusters were named by us for convenience. Supplementary Figure S10 indicates apparently that the ribocil conformations from those clusters were in the binding-pocket. Supplementary Figure S10A displays the positions of the entire-ligand centroid of snapshots taken from the A1, A2 and native-like clusters from the Rino-A system, and Supplementary Figure S10B does those from the B1 and native-like clusters.

Figure 9 Radial distribution-like function ρrd(rmsdlig) as a function of a root mean square deviation rmsdlig (Equation 6) for Ribo-A (blue line) and for Ribo-B (red line). Names of peak of the functions, “peak A1”, “peak A2”, “peak B1”, and “native-like”, are mentioned in main text.

Remember that the high-density region (red-colored contours) of the Ribo-A system were split into two spots (Figure 7A), although the two spots fused into one in Supplementary Figure S10A. This inconsistency is because Figure 7A was generated by the centroid of the R3 ring (i.e., the 3D-RCs were set to the centroid of the R3 ring in CSD-mD-VcMD), whereas Figure S10A was calculated using the position of the entire-ligand centroid. As mentioned in the Materials–and–Methods–section, the ligand’s rotation around the entire-ligand centroid does not move the entire-ligand centroid position largely. In contrast, the R3 ring quantifies the ruggedness of the binding pocket, which results in the distinct high-density spots in the binding pocket in Figure 7A. We presume that it may be better to select a small portion of ligand for the multiple-RCs in the CSD-mD-VcMD.

The entire-centroids of ribicol B in the cluster B1 in Figure 9 forms a single cluster in Figure S10B, and remember that ribocil B exhibited a single density peak in the ligand-binding pocket. This may mean that the binding-pocket of the aptamer domain behaved as a less bumpy tunnel when ribocil B is in the pocket.

Complex structures in the ligand-binding pocket

We show the complex structures of clusters identified in Figure 9. However, before it, we show the X-ray complex structure (PDB ID: 5c45) in Supplementary Figure S11, which indicates existence of three inter-molecular stackings (A48–R4, G62–R1, and A85–R3). The stackings formed in the X-ray complex structure [91] are referred to as “native stackings” in this paper. We also found two inter-molecular hydrogen bonds (H-bonds): One is from the atom O07 of ribocil B to the O2’ atom of A48 and the other is from the atom O07 of ribocil B to the N7 atom of A99, as shown in Supplementary Figure S11. We show that H-bonds regarding the atom O07 of ribocil are formed frequently to surrounding nucleotides in the clusters.

Now, we show the complex structures of the five clusters in the main text and in the supplementary file: Native-like cluster of Ribo-A is Figure 10, A1 cluster from Ribo-A is in Supplementary Figure S12, and A2 cluster from Ribo-A is in Supplementary Figure S13. The structural comparison of A1 and A2 clusters is shown in Supplementary Figure S14. The native-like cluster of Ribo-B is in Figure 11, and B1 cluster from Ribo-B is in Supplementary Figure S15.

Figure 10 (A) Ten snapshots of ribocil A, picked randomly from the native-like cluster of Ribo-A (Figure 9), which are presented on the frame of the aptamer domain of Figure S1. (B) Ten ribocil-A structures with removing the aptamer domain from panel (A) in green-colored stick model. Magenta-colored stick model is ribocil B in the X-ray complex structure (PDB ID: 5c45). See Figure 1 for four rings R1, R2, R3, and R4 in ribocil. (C) A conformation randomly picked from ten snapshots in panel (A), where the aptamer domain was replaced by one pairing with the picked ribocil A. Three nucleotides, which form native stackings (A48, B62, and A85) with ribocil are shown. Panel (C) provides a native-like H-bond formed in the shown complex structure. The number of H-bonds depends on the conformation in the native-like cluster.

Figure 11 (A) Ten snapshots of ribocil B picked randomly from the native-like cluster of Ribo-B (Figure 9), which are presented on the frame of the aptamer domain of Figure S1. (B) Ten ribocil-A structures with removing the aptamer domain from panel (A) in green-colored stick model. Magenta-colored stick model is ribocil B in the X-ray complex structure (PDB ID: 5c45). See Figure 1 for four rings R1, R2, R3, and R4 in ribocil. (C) A conformation randomly picked from ten snapshots in panel (A), where the aptamer domain was replaced by one pairing with the picked ribocil A. Three nucleotides, which form native stackings (A48, B62, and A85) with ribocil are shown. Panel (C) provides two native-like H-bonds formed in the shown complex structure.

Figure 10 demonstrates complex structures from the native-like cluster of the Ribo-A system. Figure 10A displays 10 snapshots of ribocil A picked randomly from the native-like cluster peak of Ribo-A. These ribocil-A structures overlap well one another and also to ribocil B in the X-ray structure (Figure 10B). Most of these complex structures form three native stackings, and an H-bond regarding the O07 atom of ribocil A are formed (Figure 10C). This figure was drawn by picking up a single snapshot randomly from the ten conformations in Figure 10A.

Figure 11 is complex structures assigned to the native-like cluster from the Ribo-B system. Figure 11A displays 10 snapshots of ribocil B picked randomly from the native-like cluster of Ribo-B. These ribocil-A structures also overlap well one another and also to the X-ray position. Most of these complex structures form three native stackings between ribocil and the aptamer domain, and the two H-bonds regarding the O07 atom of ribocil A are formed (Figure 10C). This figure was drawn by picking up a single snapshot randomly from Figure 11A. Of course, the number of native-like H bonds depends on the snapshots although the majority of the complex structures involves two native-like H-bonds (data not shown).

The native-like clusters (Figures 10 and 11) were involved the native-stackings and the native-like H-bonds, which are presumably stabilizing factors of the native-like complex structure. In the other clusters, we also found various stackings (native and non-native stackings). We also found that the clusters A1, A2, and B1 are frequently characterized by H-bonds regarding the O07 atom of ribocil, although the binding atoms for the H-bonds were not those in the native-like structures. We repeat here that these stackings and H-bonds stabilized the complexes of the clusters A1, A2, and B1.

The present study showed that CSD-mD-VcMD simulation sampled various complex structures, and the native-like cluster for Ribo-B was not the largest. The native-like complex structures were characterized by the native-like stackings and the native-like H-bonds. This result may suggest that the force field is inaccurate: We argued the same problem in our previous DSD-mD-VcMD study. If the force field is more accurate, the native-like complex structure is more stabilized.

Existence of the non-native clusters (A1, A2, and B1) suggests a possibility of the force-field inaccuracy. On the other hands, Supplementary Section 9 discussed that those non-native clusters are stabilized naturally by inter-molecular stackings and inter-molecular H-bonds. As mentioned in the Introduction–section, the flexibilities of RNA and ligand make the complex–structure prediction highly difficult [3,13,14]. Even keeping the coaxial stacking and tertiary contacts in RNA, some RNA–ligand complex structures are possible [14]. Furthermore, the system in our simulation was in solution, whereas the experimental structure of this system is from an X-ray crystallography [91]. Change of the environmental condition may vary the system’s flexibility (this point is discussed later). Based on these arguments, we presume that the complex structure is fluctuating among different conformations in solution. Therefore, a sampling method that can propose a variety of complex structures is important when the system is flexible.

Ligand’s density in the density tunnel and two binding pathways

In the above subsection, we analyzed ligand’s structures in the ligand-binding pocket. In this subsection, we analyze the ligand’s density in the ligand-binding pocket. As shown in Supplementary Figure S6, there is a tunnel that connects the front and rear surfaces of the aptamer domain. Our previous study showed that non-zero ligand’s density occupies the tunnel, and we named the tunnel “density tunnel” [29]. In this subsection, we analyze again the ligand’s density in the density tunnel in detail.

Figures 12A and B focus on the density tunnel, where two and single high-density spots exist in the Ribo-A and Ribo-A systems, respectively. This figure demonstrates apparently that the front surface and the rear surface of the aptamer domain was connected by non-zero density region (the bright-yellow contours): Qcano(L)=ρmax×10−2.55 for Ribo-A (Figure 12A) and Qcano(L)=ρmax×10−3.47 for Ribo-B (Figure 12B). Thus, the ligand can go through the density tunnel from either surface. This result agrees with that from our previous DSD-mD-VcMD study [29]. On the other hands, the density of the cyan contours (Qcano(L)=ρmax×10−1.85 for Ribo-A and Qcano(L)=ρmax×10−2.77 for Ribo-B) was disconnected by the red dotted line in each panel of Figure 12. This means that a free-energy barrier exists long the red dotted line in both systems. The red dotted line is at a position slightly shifted from the highest-density spots (the red-contours) toward the rear surface. We note that the free-energy barrier was also found in the DSD-mD-VcMD study [29] and the barrier position was also similar with the present study.

Figure 12 Spatial density of ligand in the ligand-binding pocket. Density is shown at three counter levels indicated by insets. Value for ρmax is given in main text. (A) Ribo-A and (B) Ribo-B. Terms appearing in the two panels are mentioned in main text. Position of P1 and P4 helices are indicated by “P1 helix” and “P4 helix”, respectively.

Of course, the connection of the density recovers when we decrease the density level of the cyan contours. We presume from Figure 12 that free-energy barrier height from the lowest free-energy spot for Ribo-B is higher than that for Ribo-A because the ratio of density between the largest density spot (Qcano(L)=ρmax) and cyan-colored contours is ρmaxρmax×10−1.85≈70.8 for Ribo-A and ρmaxρmax×10−3.02≈1047.3 for Ribo-B.

In the present CSD-mD-VcMD study, Qcano(L) was computed using all conformations appearing in the entire simulation (i.e., the number of all simulation steps): 85×109 conformations (=106 steps ×103 runs ×85 iterations) for each system. In contrast, in DSD-mD-VcMD, Qcano(L) was computed from snapshots output during the simulation: 4.608×106 snapshots (=10 snapshots ×1024 runs ×45 iterations) for each system [29] because a snapshot was output at every 105 steps of simulation (i.e., a single run produced only 10 snapshots). In this sense, the accuracy of Qcano(L) from CSD-mD-VcMD is considerably higher than that from DSD-mD-VcMD: (85×109)/(4.608×106)≈2×104. On the other hands, in DSD-mD-VcMD, a small-RC region was sampled selectively: If the small-RC region has not been sampled well up to an iterative stage (say, the M-th iterative state), the small-RC region can be sampled selectively in the (M+1)-th iterative stage [29,86]. Consequently, DSD-mD-VcMD produced many conformations in the small-RC range than CSD-mD-VcMD did.

On the other hand, we should note that the only quantity usable to analyze the molecular structures in detail is snapshots, which were output taking an interval of 105 steps in both CSD-mD-VcMD and DSD-mD-VcMD. Because we performed presently the CSD-mD-VcMD simulation assuming that the native complex structure is unknown, the number of snapshots in the density tunnel is small. This point is discussed in the next subsection.

Last in this subsection, we note that a poorly sampled RC region can be sampled selectively even in the CSD-mD-VcMD simulation: We can set a sub-box so that the poorly sampled region is involved in the sub-box in the next iterative stage. If the native-complex structure is known in advance, the sub-boxes can be set so that the sub-box are near the native-complex structure, although the present CSD-mD-VcMD simulation did not use the knowledge of the native-complex structure.

Ordering of ligand’s molecular orientations in the density tunnel

The quantities regarding the ligand’s molecular orientation around the aptamer domain were defined in Supplementary Section 6: eα, <eα>, and SPα (α=↑ or ←) at a fixed site rcube in the periodic boundary box (or ligand-movable box). Remember that the weak position restraints were applied to 15 nucleotides of the aptamer domain (Supplementary Figure S4). The spatial pattern of vector <e↑> (Supplementary Equation S15) is provided in Figures 13A and B of the main text. We did not provide the vector <e←> because it was not informative, which means that <e←> was randomized in the density tunnel.

Figure 13 Vector <e↑> (Supplementary Equation S14) at each site for Ribo-A (A) and Ribo-B (B). Norms of <e↑> are presented in inset. Words appearing in the figures are explained in main test. Scalar product SP↑ (Supplementary Equation S16) at each site for Ribo-A (C) and for Ribo-B (D). Only two types of SP↑ are displayed by different colors depending on their norm (see inset).

Now, we divided the density tunnel into “front tunnel” and “rear tunnel”. Similarly, each of the front and rear tunnels was divided into “entrance” (a cyan broken-line rectangle) and “middle” (a red broken-line rectangle) as shown in Figures 13A and B. Apparently, the ordering of e↑ was strong in the entrance region for both the front and rear tunnels because relatively large |<e↑>| vectors were assigned to the entrances. This strong ordering of e↑ in the entrances is similar with that found in our previous DSD-mD-VcMD study [29]. It is likely that a mechanism called “ligand’s orientation selection” may work in the current ligand–RNA systems, which was observed computationally in the binding of a medicine and a GPCR [92]. Figures 13C and D demonstrate that the scalar product SP↑ is relatively large (|SP↑|≥0.5) in both entrances for both systems although the ordering in the front entrance is stronger than that in the rear entrance. This result suggests that when the ligand enters into the tunnel from the rear entrance, the ligand should change its molecular orientation largely before the ligand reaches the binding position with the experimentally observed binding pose. It is likely that the orientational rearrangement requires overcoming the free energy barrier (red broken line in Figures 12A and B). We note that this mechanism was also found in our previous DSD-mD-VcMD study [29].

Figures 13A and B showed that the ligand’s orientational ordering was weak in the middle regions (red broken line rectangles). This weak ordering in the middle regions disagrees with our previous DSD-mD-VcMD study: We could detect many <e↑> vectors in the middle regions in the study by the effect of selective sampling explained in the above subsection. We have analyzed the snapshots from the current CSD-mD-VcMD simulation and found a simple reason: The number of snapshots output in the middle regions of the density tunnel was small in the present CSD-mD-VcMD simulation. We note that even if a small number of snapshots are output to a zone, it does not necessarily mean that the density assigned to the zones is small [86]. We also noticed that the front middle region had more <e↑> vectors than the rear middle region did in Figures 13A and B. This is simply because more snapshots were output to the front middle region than to the rear middle region.

Discussion

First, we describe the similarities of results between the present and previous [29] methods in many aspects. In both studies, ribocil B bound to the binding pocket of the aptamer domain more strongly than ribocil A did. The current study has shown that the free-energy landscape for the Ribo-B system is funnel-like whereas the landscape for the Ribo-A system is rather rugged. The spatial density Qcano(L) showed the existence of a tunnel (named density tunnel) connecting the front and rear entrances to the ligand-binding pocket for both systems: It is likely that the ligand can reach the ligand-binding site of the aptamer domain by entering either the front or rear entrance. However, ligand’s entering from the rear entrance requires overcoming a free-energy barrier to reach the ligand-binding position, which agreed with the result from the previous paper.

Furthermore, for both systems, the ligand’s molecular orientation <e↑> was ordered in both the front and rear entrances (Figures 13A and B), and <e↑> in the front entrance of the density tunnel tended to be parallel to that in the X-ray complex structure (Figures 13C and D). Therefore, the ligand can reach the ligand-binding site smoothly without a large orientational rearrangement when the ligand enters from the front entrance. We presume that this ligand’s orientational ordering suggests a ligand-binding mechanism (ligand’s orientation selection) that was proposed in our work of a ligand (bosentan) binding to the deep binding site of a GPCR protein [92]. In contrast, the ligand entering from the rear entrance tended to have an anti-parallel molecular orientation to that in the X-ray complex structure. Thus, the ligand’s orientation is required to rotate largely to reach the ligand-binding site in that case. In other words, the approach from the rear entrance to the binding-site accompanies a free-energy barrier. These results also agree with our previous study [29].

To investigate <e↑> in the middle regions of the density tunnel (see Figures 13A and B for definition of the middle regions), we tried to pick up snapshots from the middle regions. However, the number of snapshots from the middle regions was considerably small. Remember that the sub-boxes were set randomly in the ligand-movable box in the present study and that <e↑> was calculated using only the output snapshots (a snapshot outputs every 105 simulation steps). Contrarily, in DSD-mD-VcMD, we can sample selectively small-RC regions, which involve the native-like complex conformations, with proceeding the iterative stage. This difference of data collection between CSD- and DSD-mD-VcMD procedures is resulted from whether the native-complex structure is known or unknown. However, we note that selective sampling of poorly-sampled RC zones can be used even when the native complex structure is unknown in CSD-mD-VcMD.

Next, we argue the ligand’s poses detected in the binding pocket. We obtained five distinct clusters: The native-like, A1 and A2 clusters from the Ribo-A system, and the native-like and B1 clusters from the Ribo-B system. As mentioned in Results section, the native-like cluster was less stable than the other clusters in Figure 9. However, existence of the native-like complex structure suggests that the current sampling method has an ability to sample a variety of conformations involving the native-like complex. Thus, the stability of the native complex structure may be stabilized by modifying the force fields.

One important issue of this paper is the effect of the structural flexibility (conformational fluctuations) of RNA and ligand on the complex structure. Recent studies on the RNA–ligand complex have uncovered the importance of the structural flexibility. In other words, the flexibilities of RNA and ligand make the complex–structure prediction highly difficult [3,13,14]. Interestingly, even though the overall shape of RNA is well conserved, different binding poses of the ligand to RNA are possible [14]. Our computation results may relate to these studies. The A1, A2, and B1 clusters were different from the native-like cluster, and each cluster was stabilized by the inter-molecular stackings and inter-molecular hydrogen bonds (Supplementary Figures S12–15). Our results may be example that some stable clusters other than the experimentally observed complex can exist in the binding-pocket of RNA with satisfying a statistically acceptable criterion (these clusters are basins in the free-energy landscape at the room temperature in solution). We calculated the structural fluctuations of the aptamer domain (see Supplementary Equations S18–21 for the calculation method of the fluctuations) and found that the fluctuations were larger than those in the crystal (Supplementary Section 10, Supplementary Table S3, and Supplementary Figure S16), whereas the patterns of the structural fluctuations were very similar between in solution and in crystal. This result also supports that multiple complex forms are possible in the binding pocket in solution.

As discussed above, the prediction of the RNA–ligand complex structure is difficult when the high structural flexibility of the molecules or the force-field inaccuracy is involved in the computation. We admit that we cannot solve this difficulty only from the current results of the DSD-mD-VcMD method. However, the current method can propose possible complex structures (candidates) based on the statistical mechanics (or thermodynamics) with assigning a probability to each candidate. Second, our method can use various force fields by switching an input file of GROMACS (i.e., topology file), which may useful to evaluate accuracy and inaccuracy of the examined force fields. Third, the free-energy landscape can provide not only the complex structures but also the pathways along that various semi-stable complex structures appear. Then, we may obtain a deeper understanding on the RNA-ligand binding process and their structure flexibility.

Last in this paper, we discuss the extensibility and reliability of CSD-mD-VcMD to a system other than the RNA system. As explained, we developed the CSD-mD-VcMD method to be applied to a flexible system and/or a system in that the receptor–ligand interaction sites are not known well. Therefore, one suitable target for our method is an intrinsically disordered protein (or an intrinsically disordered region of an ordered protein) interacting to a flexible ligand. Besides, both the CSD- and DSD-mD-VcMD methods are achieved with using PLUMED. The difference of CSD and DSD methods is only the input file for PLUMED. Thus, we can examine various reaction coordinates other than CSD and DSD by preparing a different PLUMED input file.

Conflict of interest

The authors declare no conflicts of interest.

Author contributions

The fundamental idea of CSD-mD-VcMD was generated by J. H., K. K. and S. S. Then, Y. F. proposed to apply the CSD-mD-VcMD method to investigate the RNA-ligand binding mechanism and helped some data analyses to make clearer the meaning of the simulation results. K. K. and S. S. implemented the cartesian-space-division (CSD) procedure to GROMACS and PLUMED. G.-J. B. and N. K. determined the RNA–ligand simulation system to the aptamer domain of the FNM riboswitch and robocil A (or B), and generated the input files for the computation. I. F. and T. T. discussed some methodologically important points and helped some data analyses. Simulation and paper-writing were mainly done by J. H.

Data availability

We uploaded important snapshots obtained from the current simulation to the Biological Structure Model Archive (BSM-Arc). The entry for this study is https://bsma.pdbj.org/entry/77 We are planning to make the code available in near future after taking the necessary steps to address the regulatory issues due to compliance with the Foreign Exchange and Foreign Trade Act.

Acknowledgements

This work was supported by JSPS KAKENHI Grants No. 21K06052 (J.H.) and 20H03229 (N.K.), and performed in part under the Cooperative Research Program of the Institute for Protein Research, Osaka University, CR22-02, CR-23-02 and CR24-02. J.H. and K.N. acknowledge support by the HPCI System Research Project (Project IDs: hp170020, hp180050, hp190018, hp200063, hp210005, hp220015, hp230003, and hp230011, hp240005, hp240032, hp240167). J.H., I.F., N.K., and Y.F. are supported by the Project Focused on Developing Key Technology for Discovering and Manufacturing Drugs for Next-Generation Treatment and Diagnosis (project ID: 23ae0121028h0003; 2018–2021 and 24ae0121028h0004: 2021–) from AMED and the Japan Biological Informatics Consortium (JBiC). GA-mD-VcMD was performed on TSUBAME3.0 and TSUBAME4.0 supercomputers at the Tokyo Institute of Technology. We used UCSF Chimera ver. 1.15 for drawing molecular structures and gnuplot for graphs. Other analyses were done using homemade programs.

References

Corresponding author

Register with J-STAGE for free!