Biophysics and Physicobiology
Online ISSN : 2189-4779
ISSN-L : 2189-4779
Review Article (Invited)
Molecular dynamics analysis of biomolecular systems including nucleic acids
Takeru KamedaAkinori AwazuYuichi Togashi
Author information
JOURNAL OPEN ACCESS FULL-TEXT HTML
Supplementary material

2022 Volume 19 Article ID: e190027

Details
Abstract

With the recent progress in structural biology and genome biology, structural dynamics of molecular systems that include nucleic acids has attracted attention in the context of gene regulation. The structure–function relationship is an important topic that highlights the importance of the physicochemical properties of nucleotides, as well as that of amino acids in proteins. Simulations are a useful tool for the detailed analysis of molecular dynamics that complement experiments in molecular biology; however, molecular simulation of nucleic acids is less well developed than that of proteins partly due to the physical nature of nucleic acids. In this review, we briefly describe the current status and future directions of the field as a guide to promote collaboration between experimentalists and computational biologists.

Significance

DNA carries genomic information, and RNA serves as a catalyst or regulator, as well as a temporal copy of the information. These nucleic acids are not merely information media but also parts of the information-processing machinery, and hence their physical properties are important for the operation of the system. However, compared with proteins and lipids, molecular simulation of nucleic acids is intrinsically difficult and requires further improvement. Here, we provide an overview of recent progress in this field and discuss future directions.

Introduction

Biological systems are characterized by a large variety of macromolecules, the functions of which are inseparable from their structure. In particular, the operation of molecular machines, such as enzymes, is coupled with their conformational changes, analogous to macroscopic machines. Structural biology from the viewpoint of physics and chemistry has thus expanded its scope of application and attracted broad attention in the life sciences. X-ray crystallography [13] and nuclear magnetic resonance (NMR) spectroscopy [4,5] have been applied to determine many protein structures, and recently, cryo-electron microscopy (cryoEM) has become a powerful tool for solving large complexes [611]. Analysis using these methods is driving progress in structural biology [12], and allowed protein structure and function (e.g., enzyme activity) to be unveiled in parallel [13]. However, progress in this area needs to proceed further, as there remain many proteins lacking a known structure.

These methods have also been applied to other biomolecules. Since the discovery of the double-helix structure of DNA by Watson and Crick [14], nucleic acids have been also targeted [1517]. Similar to the amino acid sequence of proteins, the nucleotide sequence affects the structure and physical properties of nucleic acids [18,19]. Complexes of nucleic acids and proteins play crucial roles in the cell. Nucleosomes, as constituent units of chromatin, are of particular importance, and their structure has been determined [2022]. Efforts to elucidate the relationships between nucleosome structure and dynamics and transcription-regulation mechanisms are now acknowledged in the emerging field of chromatin biology [2325].

Although structural biological methods have contributed to solving a large number of structures, information on their dynamics is limited [26]. Analysis of structural dynamics is particularly important if the target molecule shows large conformational changes or secondary structure transitions; however, crystallography and cryoEM can provide only single static structures in most cases, and the spatiotemporal resolution of NMR is not very high. Molecular simulation has thus become a complement to these experimental methods [2730]. In this review, an extended version of the Japanese article [31], we briefly introduce the basics of molecular dynamics (MD) simulations and then showcase recent advances in MD studies of proteins and nucleic acids. Finally, we discuss the current status and future directions of molecular modeling and simulation for nucleosomes and chromatin.

Molecular Dynamics (MD) Simulation

In this section, we briefly overview modeling and MD simulation techniques for nucleic acids and other biomolecules (for a general introduction to MD simulations, see Supplementary Text S1).

Force Fields for MD Simulation

Classical MD assumes that the force on each particle is calculated as a function of particle positions; i.e., a force field, although often represented in the form of a potential energy function. A force field is an approximation of quantum mechanics [32], and model parameters are determined by experiments or quantum chemistry calculations of small systems. Therefore, the parameters (e.g., bond stiffness) of a particular force field vary [33].

For biomolecules, such as proteins, nucleic acids, and lipids, AMBER [34,35], CHARMM [36,37], GROMOS [38], and OPLS [39,40] force fields are commonly used, with each having strengths and weaknesses according to the application. For example, AMBER provides variants of force fields adapted for different types of systems and is frequently used for nucleic acids. CHARMM is popular e.g. for membrane-bound proteins and has been increasingly used for nucleic acids, especially since the CHARMM36 update with improved accuracy [36]. An accurate force field specialized for DNA has been recently developed [41]. These force fields are periodically updated and will be further improved in the future [42].

Coarse-grained (CG) Models

A defined force field theoretically enables the performance of MD simulations for any system; however, system size and time scale are practically limited by computational costs. Using all-atom models, the accessible time scale is only nanoseconds to milliseconds, no longer than the folding time of small proteins [43,44].

Coarse-grained (CG) models are often used to mitigate this problem [4548]. Coarse-graining reduces the number of particles (or variables) in the model (e.g., using a single particle as a substitute for atoms in an amino acid residue within a protein) [49,50], which can decrease the calculation of interactions and also increase the time step allowed, drastically saving computation time and costs (Fig. 1). Various CG models for proteins and nucleic acids have been proposed; however, their accuracy and application are limited due to the simplified form, which forfeits microscopic details such as side-chain conformations and interactions [51]. Currently, applications of CG models are problem-dependent, and improvements in their use for proteins and nucleic acids have been in progress [5255]. Furthermore, long fibers of DNA and chromatin have been modeled as polymer chains [5662], and efficient simulation methods have been considered [63].

Figure 1 

Coarse-grained (CG) models. (a) 3SPN (3-site-per-nucleotide) model [65,66]. Several atoms in the all-atom model are represented by a single particle, as shown in the right panel. Figure reprinted from [66] with permission. (b) OxDNA model [67]. Each nucleotide is assumed as a rigid body with a backbone and a base parts. Figure reproduced from [68] under the Creative Commons Attribution-NonCommercial License. (c) Elastic network (1P1N) model. Each nucleotide is represented by a particle, and all the interactions are approximated as linear springs. Figure reproduced from [55] under the Creative Commons Attribution License.

Note that the use of massively parallel supercomputers enables simulations of large systems, e.g. all-atom models of cytoplasm [64], and even larger systems with billions of atoms (e.g. a whole cell) will be targeted. In contrast, computing time for each simulation step cannot be drastically reduced, even in a simulation of a small molecule by a state-of-the-art supercomputer; which is a so-called strong scaling problem. Hence, coarse-grained and/or simplified models will be still useful for slow (longer than milliseconds) phenomena also in the future.

Extended Ensemble Methods

MD simulation numerically reproduces time series for coordinates that represent molecular motion. Considering the snapshots as samples of the structural ensemble allows evaluation of statistical or physical quantities, such as the frequency of secondary structure formation in proteins [69,70]. However, MD simulation in equilibrium is not always adequate to sufficiently explore the structural space to allow estimation of properties [71,72]. In MD simulation, the initial conformation is usually established according to an experimentally known structure. If the molecule acquires another stable conformation that is separated by a high free energy barrier (Fig. 2a), acquisition of the other conformation by the system within a realistic computational cost is likely impossible due to the barrier [69,7376]. Countermeasures to address these situations have been applied.

Figure 2 

The free energy barrier. (a) Local free energy minima separated by high energetic barriers can significantly limit conformational sampling within a limited simulation time (left; the simulation starts at state A and ends before reaching another state, B), which distorts the estimated free energy landscape (right). (b) Schematics of the temperature replica-exchange method. Transitions are possible at a high temperature. The exchange probability obeys the detailed balance condition, which maintains the distribution and hence the free energy profile at each temperature unchanged.

An extended- or generalized-ensemble method can be applied, typically in combination with Monte Carlo simulations and also with MD simulations. Extended-ensemble methods make it easier to overcome free energy barriers by artificially creating high-energy states that appear more frequently than what would normally occur in the canonical ensemble. Multi-canonical and simulated-tempering methods are traditional examples along with replica-exchange [7781] and umbrella-sampling [8285] methods.

These enhanced methods can be largely classified into two groups; one of which depends on pre-defined collective variable(s) (CV), and the other does not require CVs. Although the dynamics of macromolecules involve thousands to millions of degrees of freedom (DoF), usually, only a small number of DoF are relevant to the function. Hence, the dynamics are often described by low-dimensional (typically one to several) generalized coordinates, or CVs, determined by a certain projection from the original coordinates; e.g., end-to-end or domain-to-domain distances may serve as CVs.

As long as the behavior is well captured by the CVs, conformational states can be efficiently sampled by artificially controlling the dynamics in the CV space. The umbrella-sampling method is a common CV-based method, which cancels the free energy barrier by the addition of a bias-potential function based on the CV(s) [86,87]. This allows conformational states within a certain range of coordinates to be intensively explored. Moreover, using multiple coordinate ranges can allow the broad sampling of states that include the barrier region, and the probability distribution (and hence the free energy landscape) can be recalculated by canceling the effect of the bias potential [88,89]. Metadynamics [90] and adaptive biasing force [91] methods are also based on CVs. If the initial and final states of the transition are known, besides, path sampling techniques such as the string method are convenient for searching the structural transition path [92,93].

In contrast, CV-free methods are useful when the system involves complex phenomena that cannot be represented by low-dimensional coordinates. The replica-exchange method is a parallel version of simulated tempering (also known as parallel tempering), where multiple replicas of the same system are simulated at different temperatures or other parameters in parallel. In the case of temperature replica exchange, replicas are exchanged between two temperatures and at a probability that does not change the equilibrium distribution. By using an appropriate set of temperatures, the system can escape from (meta)stable states at a high temperature (Fig. 2b). Replica-exchange MD simulation is frequently used for various problems such as protein-folding [94,95]. Multicanonical MD [96] and accelerated MD [97] methods also belong to this class.

Although these methods are frequently used for proteins, there remain problems with their application to systems that include nucleic acids.

MD Simulation of Nucleic Acids

In this section, we focus on MD studies on simplex nucleic acids, such as DNA and RNA. In the case of proteins, MD simulations usually initiate from an already-folded structure according to structural data deposited in the protein data bank (PDB), and homology modeling is applied when necessary [98,99]. By contrast, there are limited cases of experimentally determined simplex nucleic acid structures except some short segments [100103], given that they do not generally fold into a specific stable structure. For double-stranded structures of nucleic acids, structures can be modeled by stacking individual base pairs (Fig. 3). Specifically, base-pair configurations are determined by base-step parameters depending on the specific nucleotides; e.g. X3DNA [104,105]. Double-stranded (ds) DNA structures can theoretically be generated for any nucleotide sequence in this way [52,106,107], resulting in predicted structures that are potentially applicable for MD simulation.

Figure 3 

Base-pair and base-step parameters. Relative positioning (translation and rotation) of bases in a base pair (base-pair parameters) or between adjacent base pairs (base-step parameters) [104,105]. Bases are shown as planes. Figure derived from [104] under the Creative Commons Attribution License.

In addition to common B-DNA, other structures including A- and Z-DNA can be observed under special conditions [108], and their formation or transition from B-DNA has been studied [109,110]. Such structural transitions have also been the target of MD simulations, in particular their solution conditions, free energy barriers, and deformation sites [110112]. Additionally, excessive mechanical stress can reportedly induce transitions between DNA structures [113,114], with transitions depending on the chemical modification of DNA previously studied by MD simulation [115].

Chemical modification frequently occurs in DNA in the eukaryotic genome and acts as a biochemical signal [116,117]. For example, DNA methylation is broadly observed and biologically important, as 5-methylated cytosine (5mC or mC) in CpG islands (regions where CpG dinucleotides are condensed) is a well-known gene-silencing signal involved in regulating cell differentiation [118120]. The effects of DNA methylation have been investigated by MD simulation, with findings ranging from those associated with microscopic interactions with nearby atoms (Fig. 4) and hydration [121124] to functional behavior, such as strand separation [125] and homologous sequence recognition [126], as well as alterations relevant to nucleosome formation and positioning [127,128]. In a previous study, we focused on microscopic mechanical properties represented by base-pair and base-step dynamics (Fig. 3) and systematically evaluated their dependence on methylation patterns (Fig. 4) [124]. Moreover, these dependencies may explain the mechanism of (macroscopic) methylation-induced DNA stiffening observed experimentally [128,129]. Notably, other DNA modification besides methylation have gained increasing attention, and their effects have been discussed [125,128].

Figure 4 

Directed effects of methylation on dsDNA dynamics. Methylation is shown in orange. Both the tilt and twist (Fig. 3) modes are affected by methylation; however, the ranges of the effects (blue or red squares) differ for the modes. Figure reproduced from [124] under the Creative Commons Attribution License.

Compared with DNA, little is known about the physical properties of RNA either through experimental or simulated observations. Simplex RNA typically does not form a specific structure similar to dsDNA. The backbone of RNA is flexible and can result in promiscuous base-pairing between complementary sequences; therefore, its folding pattern is not simply related to the nucleotide sequence, and hence methods to determine RNA structure have been sought. RNA secondary structures have been a primary focus of discussion regarding prediction methods [130132]. Structures of protein–RNA complexes have been determined by crystallography [133135], and although methods for predicting RNA structure remain under development, the three-dimensional (3D) structure of RNA has been investigated in the context of riboswitches [136138]. Recently, noncoding RNA was intensively studied and its function in gene regulation elucidated. In these fields, it is likely that the roles of RNA as molecular machines will be a research focus and for which the structure–function relationship is important.

Although MD simulations have been applied to RNA molecules with experimentally known structures, there remains room for improvement in force field accuracy [139] and the analysis of dynamics [140]. Specifically, the experimental data needed for calibration (e.g., the crystallographic B-factor) is still not satisfactory, and additional datasets would be valuable for the refinement of force fields [141,142]. For dynamics analysis, the definition of modes of motion or reaction coordinates is important; however, there is a lack of consensus for RNA, unlike proteins, where functional domains are known in many cases. In the case of single-stranded RNA, structural behavior is extremely diverse, and its classification requires both novel experiments and theories. Currently, the behavior of only short RNA segments (e.g., hairpin structure [143], tetraloop formations [144], and base stacking [145,146]) has been discussed. In addition, nucleotide dependence and/or chemical modification in many specific systems have been discussed [147].

Another area involves comparison of DNA and RNA. Although they differ only by an oxygen atom in the nucleotide monomer, disparities are observed in the flexibility of double-stranded molecules [148], backbone dynamics and base stacking [146], and twisting and stretching stiffness [149,150]. For example, when stretched, DNA overwinds resulting in a smaller helical radius, whereas RNA unwinds [150]. Moreover, the geometries estimated by simulation also differ. Such distinct features of DNA and RNA are also likely important to their roles in molecular complexes (e.g., RNA polymerase [151] and CRISPR-Cas9 [152]).

MD Analysis of Nucleosomes

Nucleic acids do not always work alone and sometimes form a complex with proteins. Transcription factors such as p53 interact with DNA, and their recognition and binding mechanisms have been discussed by MD simulations [153]. The CRISPR-Cas9 complex, broadly utilized for genome editing, is also a target of MD analysis [152,154]. Ribosomes are protein-RNA complexed machinery essential for protein synthesis, though too large for all-atom MD as a whole [155,156].

Of special interest are nucleosomes. Chromatin comprises nucleosomes, which represent complexes of DNA and histone proteins [20,158161]. Nucleosome structures have been determined by crystallography and cryoEM, resulting in numerous structural datasets available in the PDB for different complexes [162165]. Moreover, determination of conformations of multiple nucleosomes along the DNA is also possible [166,167]; however, proper utilization of these data remains an issue for theoretical and computational biologists.

Nucleosome positioning on genomic DNA has been extensively investigated experimentally. Sequencing of nucleosomal DNA (e.g., MNase-seq) can demonstrate nucleosome affinity [168170], which can be affected by DNA mechanics depending on the DNA sequence. However, there are few structural studies addressing the sequence-dependence of such interactions. Nevertheless, special sequences required for stable nucleosome formation are commonly employed in structural-determination experiments [20,21], whereas computational studies usually adopt the sequence present in the given structure. Additionally, nucleosome sliding has been discussed in the context of MD simulations [171173], with results showing that dsDNA dynamics are dependent on nucleotide sequence, which thus alters the sliding behavior. For example, a previous study evaluating the “601 sequence” (known for highly stable positioning) showed discrete transitions, whereas simple repeat sequences allowed continuous sliding [173]. These studies applied CG models of DNA and protein, and despite their omission of atomic details, the results successfully defined the effects of nucleotide sequences and have been further applied to other investigations of chromatin dynamics [25,174176].

Nucleosome unwrapping (i.e., extracting DNA from core histones) has been investigated using either all-atom or coarse-grained models in order to determine specifics regarding regulation of the nucleosome structure and the accessibility of nucleosomal DNA. Of particular interest are the pioneer factors that interact with nucleosomal DNA to affect gene transcription [177]. The energetic barriers to changes in nucleosome conformation (i.e., resistance to unwrapping) and the roles of histones have been surveyed [157,178182]. The stable barrel-like structure is amenable to characterizing nucleosome motion. Such studies use the end-to-end distance of DNA as a coordinate (CV), which allows evaluation of structural dynamics according to those structural coordinates. Recently, long MD simulations revealed plausible modes of nucleosome unwrapping at the beginning of sliding (Fig. 5) [157]. The behavior of neighboring or overlapping nucleosomes [183] is also current targets of detailed MD analyses [184]. Such studies promote future prediction of nucleosome dynamics.

Figure 5 

Unwrapping of DNA from nucleosomes. Typical modes of DNA unwrapping from the nucleosome core according to an MD simulation. Interactions between DNA and histone tails modulate the extent of unwrapping. Figure reproduced from [157] under the Creative Commons Attribution License.

Systems of single to several nucleosomes are now good targets for MD simulations. Currently, the majority of MD studies of nucleosomes focus on histones rather than DNA. For example, the effects of histone modification (e.g., methylation at specific positions) on the nucleosome dynamics have been investigated [185,186], with the findings largely consistent with those from biochemical studies. Furthermore, the roles of histone tails in nucleosome unwrapping [157] or alignment [187] have been elucidated, with the dynamics of these tails affecting overall structural stability [188193] and offering insight into structures lacking some histones [194].

Because DNA and core histones have large negative and positive charges, respectively, they strongly attract each other and stabilize the barrel-like structure of the nucleosome; however, this also makes MD studies difficult, in that such stability precludes observation of conformational changes within a realistic time scale (e.g., microseconds for all-atom models). Therefore, detection of subtle differences based on changes in single nucleotides may require higher computational costs.

Additionally, the tails of histones are flexible and strongly attract nucleosomal DNA [189,194], resulting in observation of spontaneous detachment a rare event in MD simulations. A possible solution to this problem involves the application of methods that involve variable (reduced) strengths of electrostatic interactions (e.g., REST2 [195,196]).

Summary and Future Perspectives

In this review, we described analytical methods for evaluating the structural dynamics of molecular systems that include nucleic acids. We focused on efficient sampling methods for assessing conformational states and their necessity due to the prohibitive computational cost of all-atom MD simulation over physiological time scales. Because nucleic acids are becoming higher profile targets of bioengineering and drug discovery (e.g. mRNA vaccines [197199]), determining the structural dynamics of nucleic acids is of increasing importance, and effective computational methods for this research are needed.

Many proteins form a specific folded structure characterized by secondary structures or functional domains, whereas nucleic acids do not usually fold into a single structure, making definition of their structural states or order parameters nontrivial. Although dsDNA is typically characterized by local structural variables, these cannot be used to represent meso- or macro-scopic structural properties. This is further complicated in the case of RNA, which lacks consensus parameters necessary for structural analysis. The application of CV-based methods to nucleic acids is thus obstructed in many cases.

In some specific systems, we can define CVs suitable for the dynamics of nucleic acids therein. We recently proposed a method to estimate the free energy landscape for the binding of base pairs without destroying the structure by applying adaptive biasing force MD [91], one of the CV-based methods; in which the distance between the bases in each pair could be chosen as a CV (Fig. 6) [155,156]. Although the system size is limited by its relatively large computational cost, application of this method demonstrated the binding dynamics of codons and anticodons in the ribosome (pre-initiation complex), thereby offering insight into the mechanical basis of start-codon recognition, and confirming that the estimated affinity was consistent with the experimentally-observed initiation frequency. This method would likely be useful for the analysis of nucleotide-dependent behavior in other systems.

Figure 6 

Pairing of mRNA and tRNA in the ribosome. (a) Binding and unbinding dynamics. Binding is facilitated by stabilization of the first base pair, followed by formation of the second and third base pairs (left, downward arrows). Unbinding appears to be initiated by dissociation of the third base pair (right, upward arrow) and enhanced by stretching (right, thick arrow). (b) The adaptive biasing force (ABF) method. Mean forces in terms of the CV (conformational coordinate) ξ are calculated through the potential of mean force. Additional bias that cancels the mean forces is applied in order to unfavor already well-sampled regions and allow efficient sampling of conformations in high free energy regions (e.g. mismatched base-pairing in this case) without changing the temperature. Figure reproduced from [155] under the Creative Commons Attribution License.

On the other hand, CV-free methods have shortcomings that become crucial for systems with nucleic acids. For example, following denaturation of dsDNA, the strands do not revert to the original double-stranded structure within the typical time scale of an MD simulation; therefore, sampling methods involving complete separation of strands (e.g. temperature replica-exchange MD) cannot work efficiently. Nevertheless, for complex behavior, CV-free methods need to be adopted, and there are choices other than the classical replica-exchange. The Gaussian accelerated MD method, for example, was successfully applied to the CRISPR–Cas9 system [152,154].

Machine learning approaches are getting popular also in the field of structural biology. Among them, 3D-structure prediction by AlphaFold2 [200,201] and RoseTTAFold [202] have attracted attention [203]. These methods rely on structural data available in the PDB and provide not only a predicted structure but also additional information valuable for MD analysis, e.g. to define CVs [204]. Another trend is machine learning force fields. Unlike classical force fields, the potential function does not depend on the pre-defined types of atoms, instead fits quantum mechanics calculations using machine learning [205,206]. These techniques are indeed powerful, however, their statistical nature requires a large amount of data in general. Compared with proteins, available data sets are still limited for nucleic acids, and reliable prediction of structure and dynamics requires the accumulation of additional data. Although a similar structure prediction technique was proposed for RNA [207], there remains significant room for improvement.

In the field of drug discovery, recently, RNA-targeted drugs have attracted increased attention [208]. Currently, most studies targeting RNA with small molecules are aimed at inhibition of splicing or translation (such as antibiotics), using small molecules as structural restraints, and simulation-based techniques have not been fully utilized yet [209,210]. Development of MD analysis for the complex behavior of nucleic acids will open the possibility of rational design of nucleic-acid-targeted drugs. As well as existing drug-target proteins such as receptors or kinases, hopefully, interactions between nucleic acids and ligands will be efficiently surveyed and designed in near future.

Given the diversity of biopolymers and biopolymer complexes, only a very limited portion of possible combinations can be studied. This is especially problematic when the molecular function or structure–function relationship of these biopolymers is unclear. Solutions toward the effectual survey of such molecules and finding of their universal laws include the application of informatics, which allows the comparison of items represented by sequences, although frequently at the exclusion of physicochemical properties. Therefore, combinations of computational biophysics and bioinformatics (including multi-omics analyses) will be powerful tools for the analysis of biomolecular dynamics [211].

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author Contributions

TK, AA, and YT wrote the paper. A preliminary version of this work, DOI: 10.48550/arXiv.2111.10749, was deposited in the arXiv on November 21, 2021.

Acknowledgements

The authors are grateful to S. Tate, T. Yamamoto, N. Sakamoto, M. M. Suzuki, I. Nikaido, R. Erban, K. Asano, S. Shinkai, and H. Nishimori for fruitful discussions. This work was supported by JSPS KAKENHI (grant No. JP18KK0388).

References
 
© 2022 THE BIOPHYSICAL SOCIETY OF JAPAN
feedback
Top