13.1 Conformational sampling of ala-dipeptide in water using Gaussian accelerated molecular dynamics

The Gaussian accelerated Molecular Dynamics (GaMD) method [1,2] accelerates the conformational sampling of biomolecules by adding a harmonic boost potential to smooth the potential energy of a biological system:

\( \displaystyle U'(x) = U(x)+\Delta U^{\text{GaMD}}\left(U(x)\right),\\ \displaystyle \Delta U^{\text{GaMD}}\left(U(x)\right) = \begin{cases}\frac{1}{2}k\left\{E-U(x)\right\}^2 & (U(x) < E)\\ 0 & (U(x)\ge E),\end{cases}\)

where \(x\) is the configuration of the system, \(U\) is the potential energy, \(\Delta U^{\text{GaMD}}\) is the boost potential of GaMD, the prime represents the value biased by GaMD boosting, \(k\) is a harmonic force constant, and \(E\) is the threshold of GaMD. As shown in Figure 1, we can see that the boost potential is added to the original potential surface, which lowers the energy barrier between basins.

Figure 1. Scheme of GaMD boosting.

GaMD does not require predefined reaction coordinates, which is a significant advantage because expert knowledge of the biomolecular systems of interest is essential for the definition of reaction coordinates. The use of the harmonic boost potential allows us to recover the unbiased free-energy changes through cumulant expansion to the second order, which resolves the practical reweighting problems in the original accelerated MD [3]. Here, we demonstrate a GaMD simulation by employing the alanine dipeptide as an example.

Note: GPU acceleration is available in GaMD simulations. If you have GPUs, we strongly recommend you to use GPUs. Please see here for GENESIS installation to GPU machines.


Preparation

First of all, let’s download the tutorial file (tutorial-13.1.tar.gz ). This tutorial consists of six steps: 1) system setup, 2) minimization, heating, and equilibration, 3) determination of initial parameters of GaMD by a short simulation, 4) determination of GaMD parameters with updating, 5) production run of GaMD, and 6) trajectory analysis. Control files for GENESIS are already included in the download file.

# Download the tutorial file
$ cd /home/user/GENESIS/Tutorials
$ mv ~/Downloads/tutorial-13.1.tar.gz ./
$ tar -xzvf tutorial-13.1.tar.gz
$ cd tutorial-13.1
$ ls
1_setup  2_equil  3_gamd_init  4_gamd_update  5_gamd_prod  6_anal


1. Setup

In this tutorial, we perform GaMD of alanine dipeptide in water. In the following commands, we build the system using tLeap which is included in AmberTools. If you do not have AmberTools, please download it from here. We obtain ionize.pdb, ionize.inpcrd, and ionize.prmtop as the input files for GENESIS.

# Change directory for the system setup
$ cd 1_setup
$ ls
alad.pdb  run.sh

# Make inpcrd and prmtop files using tLeap
$ run.sh
$ ls
alad.pdb  ionize.inpcrd  ionize.pdb  ionize.prmtop  leap.log  run.sh

# View the initial structure
$ vmd ionize.prmtop ionize.pdb


2. Minimization, heating, and equilibration

We carry out energy minimization, heating, and equilibration of the system. Please refer to Tutorial 3.2, for details about input options.

# Change directory for equilibration
$ cd ../2_equil
$ ls
run.sh  run1.inp  run2.inp  run3.inp

First, we perform a 5000-step minimization of the system to remove non-physical steric clashes. run1.inp is a control file for the minimization.

# Perform minimization
$ /home/user/GENESIS/bin/spdyn -np 8 run1.inp > run1.out

Next, we heat up the system from 0.1 K to 300 K during a 100-picosecond molecular dynamics simulation. run2.inp is a control file for heating.

# Perform heating
$ /home/user/GENESIS/bin/spdyn -np 8 run2.inp > run2.out

In the equilibration run, we carry out a 100-ps MD simulation at 300 K and 1 atm using the Bussi thermostat and barostat. run3.inp is a control file for equilibration. The equations of motion are integrated with the velocity Verlet algorithm with the time step of 2 fs, where the SHAKE algorithm is used for bond constraint.

# Perform equilibration
$ /home/user/GENESIS/bin/spdyn -np 8 run3.inp > run3.out

# View the trajectory using VMD
$ vmd ../1_setup/ionize.prmtop -dcd run3.dcd


3.
Determination of initial parameters of GaMD by a short simulation

In this tutorial, we use the dual boost scheme of GaMD, in which not only the dihedral potential but also the total potential of the system are independently boosted by harmonic potentials. We first determine initial parameters for GaMD dual boosting (i.e., pot_max, pot_min, pot_ave, pot_dev, dih_max, dih_min, dih_ave, dih_dev). The harmonic constant \(k\) and the threshold \(E\) in the above equation are determined from these parameters. Please refer to GENESIS manual or the original papers of GaMD [1,2] for more detail. To obtain the initial guess of the boosting potential, we perform a short MD simulation without GaMD boosting. The following command executes the simulation:

$ cd ../3_gamd_init
$ /home/user/GENESIS/bin/spdyn -np 8 run.inp > run.out

The following shows the most important parts in run.inp. In [GAMD] section, we set the keyword related to GaMD. GaMD is turned on by gamd = yes. boost = no means that GaMD boosts are not applied but GaMD parameters are updated from the trajectory. In this example, we use the dual boost scheme and set the threshold of GaMD to the lower bound, which are specified by boost_type = DUAL and thresh_type = LOWER, respectively. sigma0_pot and sigma0_dih are the upper limit of the standard deviation of the total potential boost and that of the dihedral potential boost, respectively. These parameters are required for accurate reweighting. We here set these values to 6 kcal/mol. update_period in [GaMD] section is an update frequency of GaMD parameters. GaMD parameters are updated every update_period steps. At the same time, the updated parameters are output to gamdfile in [OUTPUT]. For the initial determination of parameters, update_period should be set to the same as nsteps in [DYNAMICS].

[OUTPUT]
dcdfile  = run.dcd
rstfile  = run.rst
gamdfile = run.gamd

[GAMD]
gamd          = yes
boost         = no
boost_type    = DUAL
thresh_type   = LOWER
sigma0_pot    = 6.0
sigma0_dih    = 6.0
update_period = 500000

The example of GaMD output file (gamdfile) is shown below.

            STEP         POT_MAX         POT_MIN         POT_AVE         POT_DEV          POT_TH          POT_KC         POT_KC0         DIH_MAX         DIH_MIN         DIH_AVE         DIH_DEV          DIH_TH          DIH_KC         DIH_KC0
 --------------- --------------- --------------- --------------- ---------------
          500000     -20927.9358     -21470.9234     -21170.3356         66.0818     -20927.9358          0.0003          0.1640         18.8888          8.4258         11.0364          1.0713         18.8888          0.0956          1.0000


4. 
Determination of GaMD parameters with updating

In this step, we determine GaMD parameters by updating them at each update_period period. The following command executes the simulation:

$ cd ../4_gamd_update
$ /home/user/GENESIS/bin/spdyn -np 8 run.inp > run.out

The most important parts in run.inp are shown below. To turn on GaMD boosting, we set boost = yes in this step. The values of pot_max, pot_min, pot_ave, pot_dev, dih_max, dih_min, dih_ave, and dih_dev are initially set to the parameters determined in the previous step (i.e., the values in the last line in gamdfile). We here specify the update frequency of GaMD parameters by update_period = 500. From the energy trajectory of 500 steps, the maximum, minimum, average, and standard deviation of the energy are calculated, and then the corresponding GaMD parameters are updated. The updated parameters are output to gamdfile in [OUTPUT] every update_period = 500 period.

[OUTPUT]
dcdfile  = run.dcd
rstfile  = run.rst
gamdfile = run.gamd

[GAMD]
gamd          = yes
boost         = yes
boost_type    = DUAL
thresh_type   = LOWER
sigma0_pot    = 6.0
sigma0_dih    = 6.0
update_period = 500
pot_max       = -20927.9358
pot_min       = -21470.9234
pot_ave       = -21170.3356
pot_dev       = 66.0818
dih_max       = 18.8888
dih_min       = 8.4258
dih_ave       = 11.0364
dih_dev       = 1.0713

The example of GaMD output file is shown below. Parameter updating should be continued until the parameters converge. The parameters in the last line of gamdfile are used for the production run.

            STEP         POT_MAX         POT_MIN         POT_AVE         POT_DEV          POT_TH          POT_KC         POT_KC0         DIH_MAX         DIH_MIN         DIH_AVE         DIH_DEV          DIH_TH          DIH_KC         DIH_KC0
 --------------- --------------- --------------- --------------- ---------------
             500     -20927.9358     -21470.9234     -21023.5243         31.8438     -20927.9358          0.0004          0.2287         18.8888          8.4258         11.5085          1.0981         18.8888          0.0956          1.0000


            1000     -20927.9358     -21470.9234     -21019.7042         30.3392     -20927.9358          0.0004          0.2380         18.8888          8.4258         12.0291          1.4921         18.8888          0.0956          1.0000



         2499500     -20630.7328     -21470.9234     -20854.8429         44.3726     -20630.7328          0.0002          0.1844         28.6641          8.4258         13.6859          1.8830         28.6641          0.0494          1.0000


         2500000     -20630.7328     -21470.9234     -20854.8335         44.3752     -20630.7328          0.0002          0.1844         28.6641          8.4258         13.6857          1.8830         28.6641          0.0494          1.0000


5. 
Production run of GaMD

In this step, we perform the production run of GaMD. The following command executes the simulation:

$ cd ../5_gamd_prod
$ /home/user/GENESIS/bin/spdyn -np 8 run.inp > run.out

The most important parts in run.inp are shown below. The values of pot_max, pot_min, pot_ave, pot_dev, dih_max, dih_min, dih_ave, and dih_dev are set to the parameters obtained in the previous step. Since GaMD parameters should not be changed in production MDs, we specify update_period = 0. We notice that the frequency of output should be high, i.e., eneout_period = 50 and crdout_period = 50 in order to reweight the GaMD trajectory accurately.

[OUTPUT]
dcdfile  = run.dcd
rstfile  = run.rst
gamdfile = run.gamd

[GAMD]
gamd          = yes
boost         = yes
boost_type    = DUAL
thresh_type   = LOWER
sigma0_pot    = 6.0
sigma0_dih    = 6.0
update_period = 0
pot_max       = -20630.7328
pot_min       = -21470.9234
pot_ave       = -20854.8335
pot_dev       = 44.3752
dih_max       = 28.6641
dih_min       = 8.4258
dih_ave       = 13.6857
dih_dev       = 1.8830


6.
Analysis

6.1 Calculation of dihedral angles of alanine dipeptide

We calculate the free-energy landscape in a 2-dimensional dihedral angle space. First, we extract dihedral angles from the trajectory. The following command calculates the dihedral angles of the alanine dipeptide:

$ cd ../6_anal/1_calc_cv
$ ./run.sh

The content of run.sh is shown below. This script calculates Φ (torsion 1) and Ψ (torsion 2) dihedral angles (shown in Figure 2) of alanine dipeptide from run.dcd which is obtained by the GaMD simulation. The values of the dihedrals are output to “cv.dat”.

#!/bin/bash

rm cv.dat

cat << EOF > cv.inp
[INPUT]
prmtopfile = ../../1_setup/ionize.prmtop
reffile    = ../../1_setup/ionize.pdb

[OUTPUT]
torfile = cv.dat

[TRAJECTORY]
trjfile1      = ../../5_gamd_prod/run.dcd
md_step1      = 1000000                      # number of MD steps
mdout_period1 = 1                            # MD output period
ana_period1   = 1                            # analysis period
repeat1       = 1
trj_format    = DCD                          # (PDB/DCD)
trj_type      = COOR+BOX                     # (COOR/COOR+BOX)

[OPTION]
check_only = NO
torsion1   = 1:ACE:C 2:ALA:N  2:ALA:CA 2:ALA:C       # PHI angle
torsion2   = 2:ALA:N 2:ALA:CA 2:ALA:C  3:NME:N       # PSI angle
EOF

trj_analysis cv.inp

rm cv.inp

Figure 2. Dihedral angles of alanine dipeptide.

6.2 Reweighting

The distribution of dihedral angles obtained in the previous step is not yet reweighted, that is, the free energy landscape of the dihedrals is still imposed by the GaMD boost potentials. We here reweight the free-energy landscape. The unbiased probability distribution of the CV of interest can be obtained by following equations:

\( \displaystyle p(\xi) = \frac{\int \delta (\xi-\xi(x))\exp [-\beta U(x)] dx}{\int \exp [-\beta U(x)] dx} \\ \displaystyle =\frac{\int \delta (\xi-\xi(x))\exp [\beta \Delta U^{\text{GaMD}}(x)] \exp [-\beta U'(x)] dx}{\int \delta (\xi-\xi(x)) \exp [-\beta U'(x)] dx} \frac{\int \delta (\xi-\xi(x))\exp [-\beta U'(x)] dx}{\int \exp [-\beta U'(x)] dx} \\ \displaystyle \frac{\int \exp [-\beta U'(x)] dx}{\int \exp [\beta \Delta U^{\text{GaMD}}(x)] \exp [-\beta U'(x)] dx} \\ \displaystyle = \left<\exp[\beta \Delta U^{\text{GaMD}}]\right>’_{\xi} \frac{p'(\xi)}{\left<\exp[\beta \Delta U^{\text{GaMD}}]\right>’},\)

where \(\xi\) is the CV. First, we extract the GaMD boost potentials \(\Delta U^{\text{GaMD}}\) from the GENESIS log file.

$ cd ../2_reweight_gamd
$ grep "^INFO" ../../5_gamd_prod/run.out > out.dat

The example of out.dat is shown below. The values in 17th and 18th columns correspond to the boost potentials of the total potential and dihedral energies, respectively.

INFO:       STEP            TIME       TOTAL_ENE   POTENTIAL_ENE     KINETIC_ENE            RMSG            BOND           ANGLE        DIHEDRAL        IMPROPER         VDWAALS   DISP-CORR_ENE           ELECT     TEMPERATURE          VOLUME  POTENTIAL_GAMD   DIHEDRAL_GAMD
INFO:         50          0.1000     -16913.7735     -21081.7653       4151.8457         13.0088          3.0848         10.7171         11.3267          0.2241       3287.3677       -180.4632     -24214.0226        310.7003      67975.4663          8.7200          7.4261
INFO:        100          0.2000     -16910.0513     -21044.4349       4122.8594         13.1644          5.5444          8.3032         14.7615          0.3908       3308.6533       -180.4632     -24201.6250        308.5311      67975.4663          6.7491          4.7751


From out.dat and cv.dat, we can obtain the reweighted free-energy landscape according to above equations. However, the convergence of \(  \left<\exp[\beta \Delta U^{\text{GaMD}}]\right>’_{\xi} \) is very slow, which causes the large statistical error. To reduce the statistical error, the cumulant expansion to the second order has been employed for GaMD simulations [4]. We do the cumulant expansion by the following commands.

./gamd_reweight_2d.py -t 300.0 -c 10 -n 1000000 --Xmin -180 --Xmax 180 --Xdel 10 --Ymin -180 --Ymax 180 --Ydel 10 --cv_file ../1_calc_cv/cv.dat --out_file out.dat --Xcyclic --Ycyclic
./plot_pmf_2d.py

gamd_reweight_2d.py reweights the free-energy landscape. The absolute temperature, the number of samples, the minimum value of the CV, the maximum value of the CV, the grid size of the bin for the CV are assigned by “-t”, “-n”, “–Xmin”, “–Xmax”, “–Xdel”, respectively. “-c” represents the cutoff of the number of samples in each bin. If the number of samples in a certain bin is below 10, the probability (or free energy) in the bin is omitted. “–Xcyclic” represents the periodicity of the CV. This script outputs the free energies in each bin “pmf.dat” and grid points of the CVs “xi.dat” and “yi.dat”. plot_pmf_2d.py draws the two-dimensional free-energy landscape using “pmf.dat”, “xi.dat”, and “yi.dat”.

The free-energy landscape reconstructed from the obtained dihedral angles and GaMD boost potentials is shown in Figure 3. The population of each conformational state is consistent with the results of the earlier studies in the literature.

Figure 3. The reweighted free-energy landscape of two dihedral angles of alanine dipeptide.

You can obtain the free-energy landscape without reweighting according to the following commands.

./gamd_reweight_2d.py -t 300.0 -c 10 -n 1000000 --Xmin -180 --Xmax 180 --Xdel 10 --Ymin -180 --Ymax 180 --Ydel 10 --cv_file ../1_calc_cv/cv.dat --out_file out.dat --Xcyclic --Ycyclic --noreweight
./plot_pmf_2d.py

The unreweighted free-energy landscape constructed from the obtained dihedral angles is shown in Figure 4. Due to the GaMD biases, the populations of the αR and C7ex conformations are overestimated, while the β/C5 conformation is underestimated. We can confirm that these overestimation and underestimations are modified in the above reweighting.

Figure 4. The unreweighted free-energy landscape of two dihedral angles of alanine dipeptide.


References

  1. Y. Miao, V. A. Feher, and J. A. McCammon. Gaussian Accelerated Molecular Dynamics: Unconstrained Enhanced Sampling and Free Energy Calculation. J. Chem. Theory Comput., 11, 3584–3595 (2015).
  2. Y. T. Pang, Y. Miao, Y. Wang, and J. A. McCammon. Gaussian Accelerated Molecular Dynamics in NAMD. J. Chem. Theory Comput., 13, 9–19 (2017).
  3. D. Hamelberg, J. Mongan, and J. A. McCammon. Accelerated Molecular Dynamics: A Promising and Efficient Simulation Method for Biomolecules. J. Chem. Phys., 120, 11919–11929 (2004).
  4. Y. Miao, W. Sinko, L. Pierce, D. Bucher, R. C. Walker, and J. A. McCammon. Improved Reweighting of Accelerated Molecular Dynamics Simulations for Free Energy Calculation. J. Chem. Theory Comput., 10, 2677–2689 (2014).

Written by Hiraku Oshima@RIKEN BDR
September, 2, 2019