Structural Biochemistry/Molecular Modeling/Molecular Dynamics

Theoretical Overview
Classical molecular dynamics (MD) is a branch of computational chemistry that focuses on simulating the motion of particles according to Newton's Laws of Motion. Atoms are approximated as "balls on springs," which allows for the application of the following laws:

$$ f = ma $$

$$ v = v_0 + at $$

$$ x = vt + \frac 1 2 a t^2 $$

Where $$ x $$ is position, $$ v $$ is velocity, $$ a $$ is acceleration, and $$ t $$ is time.

Classical MD approximations treat bond and angle potentials as harmonic, and dihedrals as periodic. This allows for expression of the energy of a system with an equation similar to the following used by the AMBER molecular dynamics program:

$$ E_{\rm total} = \sum_{\rm bonds} K_r (r - r_{eq})^2 + \sum_{\rm angles} K_\theta (\theta - \theta_{eq})^2 + \sum_{\rm dihedrals} {V_n \over 2} [1 + {\rm cos}(n\phi - \gamma)] + \sum_{i<j} \left [ {A_{ij} \over R_{ij}^{12}} - {B_{ij} \over R_{ij}^6} + {q_iq_j \over \epsilon R_{ij}} \right ] + \sum_{\rm H-bonds} \left [ {C_{ij} \over R_{ij}^{12}} - {D_{ij} \over R_{ij}^{10}} \right ] $$

Simulations are conducted in a series of "time steps." In each step, the force on each atom is calculated according to the derivative of the energy equation. The atom is moved a certain distance corresponding to the magnitude and direction of the force over the size of the time step. The process is repeated over many time steps, iteratively calculating forces and solving the equations of motion based on the accelerations from the forces, and produces results that mimic complex behavior from docking of pharmaceutical compounds to the folding of simple proteins.



Quantum Effects
Although atoms are known to behave according to the Schroedinger equation rather than by Newton's Laws, classical methods can be used with reasonable accuracy. A good indication of this can be given by checking the De Broglie wavelength of the particles involved :

$$ \Lambda = \sqrt{ \frac {2\pi\hbar^2} {Mk_BT} } $$

where $$ M $$ is the atomic mass and $$ T $$ is the temperature. The classical approximation is justified when $$ \Lambda \ll $$ the mean nearest neighbor separation between particles. So for heavier systems, such as organic molecules, the classical approximation is reasonable, but for light systems such as H2, He, or Ne, the method loses considerable accuracy.

Since atoms are approximated as discrete particles, interactions of smaller particles cannot be correctly modeled with classical MD. Making and breaking chemical bonds, noncovalent reaction intermediates, and proton or electron tunneling all cannot be simulated with this method. Simulations at very low temperature are also problematic, as near absolute zero classical mechanics are increasingly disfavored for quantum explanations. Quantum molecular dynamics methods exist for simulations that can be more electronically realistic, but at much greater computational cost.

System size and timescale
The number of atoms and amount of time that may be simulated is directly related to available computing power. Although the entirety of a small virus has been simulated before, the majority of fine-grained, all-atom simulations that take place are on the scale of proteins rather than organelles or complex assemblies. This is because the length of time that is necessary to simulate to observe results directly relates to system size:

Table - Timescales of molecular processes
So, the larger the system, the longer the simulation needs to be run to observe relevant behavior. Today, computer hardware specifically designed for molecular dynamics simulations can run approximately 17 microseconds of a 23,000 atom system in one day with a 0.5 femtosecond timestep. On the same system, a general-purpose parallel supercomputer can obtain on the order of a few hundred nanoseconds of simulation time.

Thermostats
Using an approach based on Newton's equations of motion, a simulation is obtained in the "microcanonical ensemble", which is a constant energy system (nVE). This is unrealistic, however, and in order to produce accurate comparisons to experiment, simulations are most commonly done in the canonical ensemble, which features constant temperature (nVT). This temperature must be maintained with an algorithm, which is called a thermostat. There are several thermostats that are commonly used, each of which has its advantages and disadvantages in terms of reliability, accuracy, and computational expense.

A typical simulation involves an MD equilibration phase, at which the structure is simulated at constant temperature for a certain amount of time at 0K, a heating phase where the system is gradually heated to the desired temperature (often 300K), and a final production simulation at that constant temperature. The initial equilibration is necessary not to move the structure to a chemical equilibrium but rather to an energetic minimum (which will usually be near the chemical equilibrium state). If the equilibration were omitted and the simulation run in a region with high energy, the simulation will continue to sample highly energetic phase space, producing results that will be inconsistent with experimental data.

The Langevin Theromostat
This thermostat follows the Langevin equation of motion rather than Newton's, where a frictional force proportional to the velocity is added to the conservative force, which adjusts the kinetic energy of the particle so that the temperature is correct.

The system is then described by the equation:

$$ ma = -v\xi + f(r) + f' $$

Where $$m$$, $$v$$ and $$a$$ are mass, velocity, and acceleration of the particle. $$\xi$$ is a frictional constant, and $$f'$$ is a random force that will add kinetic energy to the particle. It is sampled from a Gaussian distribution where the variance is a function of the set temperature and the time step. This balances the frictional force to maintain the temperature at the set value.

The Anderson Thermostat
This thermostat couples the system to a heat bath at the desired temperature. The bath is represented by collisions with a stocastic particle on randomly selected system particles.

It has been shown that the addition of stochastic collisions to the Newtonian MD system results in the simulation being a Markov chain that is irreducible and aperiodic. This implies that the generated distribution with Anderson's algorithm is not a canonical distribution.

This thermostat will randomly affect velocities of particles, which results in nonphysical dynamics in the simulated system.

The Gaussian Thermostat
This thermostat strongly couples the system with a heat bath set to the desired temperature. Here, the change in momentum for each particle is represented as:

$$\frac{dp_i}{dt} = \displaystyle\sum_{j}^{N} (F_{ij}|q_i-q_j|) - p_i\alpha $$

This rescaling destroys dynamic correlations very quickly, and therefore does not maintain the canonical ensemble.

The Berendsen Thermostat
This thermostat, first presented in 1984 involves re-scaling of the velocities of each particle in a manner that controls the temperature.

Similar to the Gaussian thermostat, the system is coupled to a heat bath with the desired temperature, although this algorithm uses weak rather than strong coupling, where the temperature of the system is corrected such that the deviation in temperature decays exponentially with some time constant $$\Tau$$ proportional to the timestep:

$$\frac{dT}{dt} = \frac{T_0-T}{\Tau}$$

The Nose-Hoover Thermostat
This thermostat adds an extra term, $$\eta$$ to the Newtonian equations of motion, producing the following second-order differential:

$$\frac{dp_n}{dt} = \displaystyle\sum_{i}^{N} \frac{|p_i|^2}{2m_i} - \frac{3}{2}nk_BT $$

Where $$m$$ is mass, $$T$$ is the desired temperature, and $$k_B$$ is the Boltzmann constant. The heat bath is given some mass, $$Q$$, which for Nose's suggested value of $$6nk_BT$$ produces the above equation for the momentum change of each individual particle.

With this thermostat, the energy of the simulated system fluctuates, but the total energy of the system and heat bath is conserved. If the system is ergodic (all accessible microstates are equally probable), then this system gives the canonical ensemble.

Force fields
Molecular dynamics simulations require a "force field," which lists the parameters involved in each type of interaction in the energy equation. For example, the AMBER force field requires parameters for bond equilibrium length, bond force constant for every possible combination of two atom types, angle equilibrium value, and angle force for every combination of three atom types, dihedral equilibrium phase, dihedral force constant, and dihedral periodicity for every combination of four atom types. Additional parameters also describe the nonbonded (electrostatic) forces for each atom as well.

The development and analysis of these parameters, which collectively are called a force field, is a focus of molecular dynamics research.

Force field development
Parameters for classical molecular dynamics ideally lead to behavior that closely approximates that of actual molecules on an atomistic level, which are in reality governed by quantum phenomena. The classical parameters are developed such that this is true- once an energy and force equation (such as the AMBER equation) has been made to establish the paradigm for the system, parameters are derived such that calculated values with them match those calculated with quantum mechanical methods or experiment. Due to the lack of availability of experimental data for the wide variety of molecules and molecular conformations necessary to develop a force field, data from quantum calculations on the systems are used instead.

Force fields are developed for specific classes of molecules. For example, AMBER provides the Amber ff99SB forcefield for proteins, the Glycam forcefield for carbohydrates, and the newly developed lipid forcefield GAFFLipid. In addition, a "General AMBER Force Field," or GAFF, has been developed for the parameterization of small ligands and other molecules that do not fit into the broad force field categories.

The accuracy of a force field is critical to simulation accuracy. Simple terms, such as dihedral force constants, can have an extremely large effect on overall simulations. In fact, inaccurate dihedral parameters for the protein backbone $$\phi$$ and $$\psi$$ angles produced simulations that were excessively disposed to alpha helix formation, a fact which affected numerous results.

Historically, force field development has been complicated by the lack of availability of energy and force data to fit to. Force field parameters are generated by fitting the parameters such that the energy of a structure or the predicted forces on each atom predicted using the parameters matches the energy or forces generated by ab initio quantum calculations. These calculations are extremely expensive, and as a result considerable extrapolation is usually made from one set of parameters to a general one in order to avoid re-doing the calculations. For example, using parameters generated for the protein backbone of tetra-alanine as describing the backbone of any protein.

The availability of increasing computing power, more efficient algorithms, and knowledge of past methods proves promising for future parameter development. For example, side-chain parameters for various amino acids may be calculated separately and then added in to a general parameter set in order to characterize each amino acid individually.

Force field accuracy
Classical force fields have been developed by fitting the classical parameters such that they give results consistent with experimental and quantum data. They are categorized by the type of biomolecule- protein, carbohydrate, and lipid forcefields have been developed separately for all major MD programs. However, this categorization is still quite broad, and can neglect system-specific interactions.