10.1 REMD simulation of alanine-tripeptide in water

In this tutorial, we illustrate how to run temperature replica exchange molecular dynamics (T-REMD) simulations of Alanine tri-peptide and calculate PMF of distance distribution. T-REMD is an enhanced sampling method that parallely simulate the system at a range of different temperatures and periodically exchanging between them. As a result of sampling at several temperature, the simulation can efficiently overcome well known problem of conventional MD, wherein the conformation could be trapped in a local minimum. For further details of REMD method, please refer to [1]. 


Preparation

First, download the tutorial file(tutorial19-10.1.zip ). This tutorial consists of seven steps: 1) system setup, 2) energy minimization, 3) heating, 4) equilibration, 5) REMD equilibration, 6) production run, and 7) trajectory analysis. Control files for GENESIS are already included in the download file. Since we use the CHARMM force field, we make a symbolic link to the CHARMM toppar directory (see Tutorial 2).

$ cd Tutorials
$ mv ~/Downloads/tutorial19-10.1.zip ./
$ unzip tutorialq9-10.1.zip
$ cd tutorial-10.1
$ ln -s ../../Others/CHARMM/toppar ./
$ ls 
1_setup  2_minimize  3_heat 4_equilibrate  5_equilibrate_remd 6_production  7_analysis  


1. Setup

To setup the system, please follow the steps in the basic tutorial (see Tutorial 3.2).


2. Minimization

The first step of the simulation is to run energy minimization of the system, in order to remove atomic clashes in the initial structure. For further details of the control file and choice of parameters, please refer to basic tutorials section (see Tutorial 3.2).

Let’s change directory to perform energy minimization, and view the control file of GENESIS.

#Change directory to 2_minimize
$ cd 2_minimize
$ ls
$ step2.inp  
$ vi step2.inp

#Input control file.

[INPUT]
topfile = ../1_setup/top_all36_prot.rtf     # topology file
parfile = ../1_setup/par_all36m_prot.prm     # parameter file
strfile = ../1_setup/toppar_water_ions.str  # stream file
psffile = ../1_setup/wbox.psf               # protein structure file
pdbfile = ../1_setup/wbox.pdb               # PDB file
reffile = ../1_setup/wbox.pdb               # PDB file for restraints

[OUTPUT]
dcdfile    = step2.dcd   # DCD trajectory file
rstfile    = step2.rst   # restart file

[ENERGY]
forcefield       = CHARMM   # CHARMM force field
electrostatic    = PME      # Particle Mesh Ewald method
switchdist       = 10.0     # switch distance
cutoffdist       = 12.0     # cutoff distance
pairlistdist     = 13.5     # pair-list distance
pme_ngrid_x      = 40       # grid size_x in [PME]
pme_ngrid_y      = 40       # grid size_y in [PME]
pme_ngrid_z      = 40       # grid size_z in [PME]
contact_check    = yes
vdw_force_switch = yes
 
[MINIMIZE]
method           = SD       # Steepest descent method
nsteps           = 2000     # number of minimization steps
eneout_period    = 100      # energy output period
crdout_period    = 100      # coordinates output period
rstout_period    = 1000     # restart output period
 
[BOUNDARY]
type             = PBC      # periodic boundary condition
box_size_x       = 50.200   # box size (x) in [PBC]
box_size_y       = 50.200   # box size (y) in [PBC]
box_size_z       = 50.200   # box size (z) in [PBC] 

[SELECTION]
group1           = sid:PROA and heavy
 
[RESTRAINTS]
nfunctions       = 1        # number of functions
function1        = POSI     # restraint function type
constant1        = 1.0      # force constant
select_index1    = 1        # restrained groups

To execute the calculation use the following command:

# Run energy minimization (it takes ~13 seconds)
$ export OMP_NUM_THREADS=5
$ mpirun -np 8 /home/user/GENESIS-1.7.1/bin/spdyn step2.inp > log

After the calculation, check the trajectory by using VMD. We can see that the atoms are slightly moved, but the atomic clashes are actually removed. In this step, we used 8 MPI processors and 5 OpenMP threads, namely, 40 CPU cores.


3. Heating

The second step of the simulation is to heat up the system, with restraint on the peptide heavy atoms, to 300 K. The heating is performed via annealing process wherein the temperature is increased by 3 K every 500 steps. Total simulation is 100 ps and it finishes within 5 minutes using spdyn and 40 CPU cores. Note that in this step the integrator is Leap and the thermostat is Langevin; both will be changed in the following steps. The control input file as follow:

#Change directory to 3_heat
$ cd ../3_heat
$ ls
$ step3.inp  
$ vi step3.inp

#Input control file.
[INPUT]
topfile = ../1_setup/top_all36_prot.rtf     # topology file
parfile = ../1_setup/par_all36m_prot.prm    # parameter file
strfile = ../1_setup/toppar_water_ions.str  # stream file
psffile = ../1_setup/wbox.psf               # protein structure file
pdbfile = ../1_setup/wbox.pdb               # PDB file
reffile = ../1_setup/wbox.pdb               # PDB file for restraints
rstfile = ../2_minimize/step2.rst                       # restart file

[OUTPUT]
dcdfile  = step3.dcd       # DCD trajectory file
rstfile  = step3.rst       # restart file

[ENERGY]
electrostatic    = PME       # [CUTOFF,PME]
switchdist       = 10.0      # switch distance
cutoffdist       = 12.0      # cutoff distance
pairlistdist     = 13.5      # pairlist distance
pme_ngrid_x      = 40        # grid size_x in [PME]
pme_ngrid_y      = 40        # grid size_y in [PME]
pme_ngrid_z      = 40        # grid size_z in [PME]
vdw_force_switch = yes
contact_check    = yes

[DYNAMICS]
integrator       = LEAP      # [LEAP,VVER]
nsteps           = 50000     # number of MD steps
timestep         = 0.002     # timestep (ps)
eneout_period    = 500       # energy output period
crdout_period    = 500       # coordinates output period
rstout_period    = 50000     # restart output period
annealing        = YES       # simulated annealing
anneal_period    = 500       # annealing period
dtemperature     = 3         # temperature change at annealing (K)
 
[CONSTRAINTS]
rigid_bond     = YES       # constraints all bonds 
                           # involving hydrogen
 
[ENSEMBLE]
ensemble       = NVT       # [NVE,NVT,NPT]
tpcontrol      = LANGEVIN  # thermostat
temperature    = 0.1       # initial temperature (K)
 
[BOUNDARY]
type           = PBC       # [PBC,NOBC]

[SELECTION]
group1           = sid:PROA and heavy

[RESTRAINTS]
nfunctions       = 1        # number of functions
function1        = POSI     # restraint function type
constant1        = 1.0      # force constant
select_index1    = 1        # restrained groups

To execute the simulation, we use similar commands as previous step.

# Run heating step (it takes ~5 minutes)
$ export OMP_NUM_THREADS=5
$ mpirun -np 8 /home/user/GENESIS-1.7.1/bin/spdyn step3.inp > log


4. 
Equilibration

The next step before REMD simulations is to equilibrate the system using two step, for 100 ps each; 1) We equilibrate the system upon removing heavy atoms restraints in NVT ensemble, then 2) We equilibrate the simulation box using NPT ensemble. Note that in these steps we switch the integrator to Velocity Verlet (VVER) and the thermostat to Bussi; for more information regarding choice of thermostat and its effect on simulation, please check [2]. The control file (step4.1) for the first equilibration step as follow:

#Change directory to 4_equilibrate
$ cd ../4_equilibirate
$ ls
$ step4.1.inp step4.2.inp   
$ vi step4.1.inp

#Input control file.
[INPUT]
topfile = ../1_setup/top_all36_prot.rtf     # topology file
parfile = ../1_setup/par_all36m_prot.prm    # parameter file
strfile = ../1_setup/toppar_water_ions.str  # stream file
psffile = ../1_setup/wbox.psf               # protein structure file
pdbfile = ../1_setup/wbox.pdb               # PDB file
rstfile = ../3_heat/step3.rst               # restart file

[OUTPUT]
dcdfile  = step4.1.dcd       # DCD trajectory file
rstfile  = step4.1.rst       # restart file

[ENERGY]
electrostatic    = PME       # [CUTOFF,PME]
switchdist       = 10.0      # switch distance
cutoffdist       = 12.0      # cutoff distance
pairlistdist     = 13.5      # pairlist distance
pme_ngrid_x      = 40        # grid size_x in [PME]
pme_ngrid_y      = 40        # grid size_y in [PME]
pme_ngrid_z      = 40        # grid size_z in [PME]
vdw_force_switch = yes
 
[DYNAMICS]
integrator    = VVER      # [LEAP,VVER]
nsteps        = 50000     # number of MD steps
timestep      = 0.002     # timestep (ps)
eneout_period = 500       # energy output period
crdout_period = 500       # coordinates output period
rstout_period = 10000     # restart output period
stoptr_period = 10        # remove translational and rotational motions period
 
[CONSTRAINTS]
rigid_bond    = YES       # constraints all bonds involving hydrogen
 
[ENSEMBLE]
ensemble      = NVT       # [NVE,NVT,NPT]
tpcontrol     = BUSSI     # thermostat and barostat
temperature   = 300.0     # initial temperature (K)
pressure      = 1.0       # target pressure (atm)
 
[BOUNDARY]
type           = PBC      # [PBC,NOBC]

The second equilibration step control file is:

#Input control file.
[INPUT]
topfile = ../1_setup/top_all36_prot.rtf     # topology file
parfile = ../1_setup/par_all36m_prot.prm    # parameter file
strfile = ../1_setup/toppar_water_ions.str  # stream file
psffile = ../1_setup/wbox.psf               # protein structure file
pdbfile = ../1_setup/wbox.pdb               # PDB file
rstfile = step4.1.rst                       # restart file

[OUTPUT]
dcdfile  = step4.2.dcd       # DCD trajectory file
rstfile  = step4.2.rst       # restart file

[ENERGY]
electrostatic    = PME       # [CUTOFF,PME]
switchdist       = 10.0      # switch distance
cutoffdist       = 12.0      # cutoff distance
pairlistdist     = 13.5      # pairlist distance
pme_ngrid_x      = 40        # grid size_x in [PME]
pme_ngrid_y      = 40        # grid size_y in [PME]
pme_ngrid_z      = 40        # grid size_z in [PME]
vdw_force_switch = yes
 
[DYNAMICS]
integrator    = VVER      # [LEAP,VVER]
nsteps        = 50000     # number of MD steps
timestep      = 0.002     # timestep (ps)
eneout_period = 500       # energy output period
crdout_period = 500       # coordinates output period
rstout_period = 10000     # restart output period
stoptr_period = 10        # remove translational and rotational motions period
 
[CONSTRAINTS]
rigid_bond    = YES       # constraints all bonds involving hydrogen
 
[ENSEMBLE]
ensemble      = NPT       # [NVE,NVT,NPT]
tpcontrol     = BUSSI     # thermostat and barostat
temperature   = 300.0     # initial temperature (K)
pressure      = 1.0       # target pressure (atm)
 
[BOUNDARY]
type           = PBC      # [PBC,NOBC]

To run both steps, we use the following commands.

# Run equilibration steps. First and second steps take ~5 minutes each. 
$ export OMP_NUM_THREADS=5
$ mpirun -np 8 /home/user/GENESIS-1.7.1/bin/spdyn step4.1.inp > log

$ export OMP_NUM_THREADS=5
$ mpirun -np 8 /home/user/GENESIS-1.7.1/bin/spdyn step4.2.inp > log


5. REMD 
Equilibration

Before starting T-REMD simulation, we must determine the number of replicas and the temperatures of replicas. These temperatures need to be chosen carefully, because they will greatly affect the results of T-REMD simulations and probabilities of the replica exchanges. If the difference of temperatures between adjacent replicas is small, the exchange probability is high, but the number of replicas needed for such simulation is also large. Therefore, we need to find the optimal temperature intervals to perform efficient REMD simulations.

We recommend using the web server REMD Temperature generator [3]. This tool automatically generates the number of replicas and their temperatures according to the information we input. We show the example of the input for the T-REMD simulation of the solvated trialanine system:

temp_generator_input_table

Replica exchange probabilities are often set to 0.2 – 0.3, and here we set the replica exchange probability to 0.25. The lower and upper temperature limits are set to 300 K and 350 K, respectively. Since we use SHAKE and SETTLE algorithms in the simulation, “constraints in the protein” is set to “Bonds to hydrogen only”, and “Constraints in water” to “rigid”. We use all-atom model, so we select “All H” for the parameter of “Hydrogens in proteins”. The numbers of protein atoms and the water molecules are input according to the numbers in "../1_setup/wbox.pdb“. We can count the number of the water molecules in the system as follows:

# count the number of TIP3P water molecules
$ grep "TIP3" ../1_setup/wbox.pdb | wc -l | awk '{print $1 / 3}'

When we fill in all the parameters and submit them, the server generates the summary, temperatures and energies of 20 replicas in a few seconds as follows.

In the table shown above, there is the parameter “Simulation type“. We can select either “NPT” or “NVT” for the parameter, but, when we select “NVT“, the server returns an error message. Thus we choose “NPT” here, even though we use NVT ensemble in our T-REMD simulation.

temp_generator_resultUsing these generated temperature parameters, we start our T-REMD simulation. First, each replica must be equilibrated at the selected temperature just like conventional MD simulations. The following command performs a 500 ps NVT MD simulations. Here we use 640 (= 8 MPI x 80 OpenMP) CPU cores for the REMD simulations. Note that in this step as well as the next step (production run) we further change the integrator to RESPA (VRES) in order to use 2.5 femto second time step. (see Recommended control parameters for ver. 1.4) Make sure that you use NVT ensemble, as using NPT ensemble make significant differences in the potential energy of each replica and subsequently hinder the exchange between them. The T-REMD equilibration control file is:

#Change directory to 5_equilibrate_remd
$ cd ../5_equilibrate_remd
$ ls
$ step5.inp   
$ vi step5.inp

#Input control file.
[INPUT]
topfile = ../1_setup/top_all36_prot.rtf        # topology file
parfile = ../1_setup/par_all36m_prot.prm       # parameter file
strfile = ../1_setup/toppar_water_ions.str     # stream file
psffile = ../1_setup/wbox.psf                  # protein structure file
pdbfile = ../1_setup/wbox.pdb                  # PDB file
rstfile = ../4_equilibrate/step4.2.rst         # restart file

[OUTPUT]
dcdfile  = step5_rep{}.dcd       # DCD trajectory file
rstfile  = step5_rep{}.rst       # restart file
remfile  = step5_rep{}.rem       # replica exchange ID file
logfile  = step5_rep{}.log       # log file of each replica

[ENERGY]
forcefield       = CHARMM    # CHARMM force field
electrostatic    = PME       # [CUTOFF,PME]
switchdist       = 10.0      # switch distance
cutoffdist       = 12.0      # cutoff distance
pairlistdist     = 13.5      # pairlist distance
pme_ngrid_x      = 40        # grid size_x in [PME]
pme_ngrid_y      = 40        # grid size_y in [PME]
pme_ngrid_z      = 40        # grid size_z in [PME]
vdw_force_switch = yes

[DYNAMICS]
integrator        = VRES      # [LEAP,VVER]
nsteps            = 200000    # number of MD steps
timestep          = 0.0025    # timestep (ps)
eneout_period     = 500       # energy output period
crdout_period     = 500       # coordinates output period
rstout_period     = 10000     # restart output period
stoptr_period     = 10        # remove translational and rotational motions period
elec_long_period  = 2
thermostat_period = 10
barostat_period   = 10
nbupdate_period   = 10

[CONSTRAINTS]
rigid_bond       = YES         # constraints all bonds involving hydrogen

[ENSEMBLE]
ensemble      = NVT            # [NVE,NVT,NPT]
tpcontrol     = BUSSI          # thermostat and barostat
temperature   = 300.0          # initial temperature (K)
pressure      = 1.0            # target pressure (atm)
[BOUNDARY]
type          = PBC            # periodic boundary condition

[REMD]
dimension        = 1            # number of parameter types
exchange_period  = 0            # NO exchange for equilibration

type1            = temperature  # T-REMD
nreplica1        = 20           # number of replicas
parameters1      = 300.00 302.53 305.09 307.65 310.24 \
                   312.85 315.47 318.12 320.78 323.46 \
                   326.16 328.87 331.61 334.37 337.14 \
                   339.94 342.75 345.59 348.44 351.26

In [INPUT] section, we specify step4.2.rst file as the restart file of the simulation run. In the REMD simulation, the 20 (equal to the number of replicas specified nreplica in [REMD] section explained below) copies are automatically made from this restart file.

In [OUTPUT] section, in addition to dcdfile and rstfile names, we give also logfile and remfile. If we perform REMD simulations, file names of logfile and remfile are also required. logfile gives a log file of MD simulation for each replica. remfile gives replica exchange parameter. In this section, “{}” returns a replica index number from 1 to 20.

When we want to run REMD simulations, we add [REMD] section in the control file. In [REMD] section, the number of dimensions is set to dimension = 1, and the type of the exchanged variable is set to type1 = temperature for T-REMD simulation. The number of replicas is set to nreplica1 = 20, and we assign the replica temperatures to the variable parameter1. If exchange_period = 0, then no exchanges occur during the run, and so we set exchange_period to 0 for equilibration of all replicas.

In [ENSEMBLE] section, we change NPT ensemble to NVT ensemble.  Though we can perform REMD simulations both in  NVT and NPT ensemble with GENESIS, we recommend performing T-REMD simulations in NVT ensemble, because in a temperature exceeding the water boiling point, the system may be disrupted.

Other parameters in [INPUT][ENERGY][ENSEMBLE][BOUNDARY] and [CONSTRAINTS] sections are not changed from the previous runs (“step4.2.inp“).

To run T-REM equilibiration, we use the following commands.

# Run REMD-equilibiration step. It take 26 minutes to finish.  
$ export OMP_NUM_THREADS=5
$ mpirun -np 80 /home/user/GENESIS-1.7.1/bin/spdyn step5.inp > log


6. Production run

Since we have now completed all preparation steps, now we can start running the production simulation. We ran a short simulation for 10 ns for the purpose of tutorial, however longer simulation is actually needed to obtain better energy results. The following control file is used to run the simulation in NVT ensemble.

#Change directory to 6_production
$ cd ../6_production
$ ls
$ step6.inp 
$ vi step6.inp

#Input control file.
[INPUT]
topfile = ../1_setup/top_all36_prot.rtf           # topology file
parfile = ../1_setup/par_all36m_prot.prm          # parameter file
strfile = ../1_setup/toppar_water_ions.str        # stream file
psffile = ../1_setup/wbox.psf                     # protein structure file
pdbfile = ../1_setup/wbox.pdb                     # PDB file
rstfile = ../5_equilibrate_remd/step5_rep{}.rst   # restart file

[OUTPUT]
dcdfile  = step6_rep{}.dcd       # DCD trajectory file
rstfile  = step6_rep{}.rst       # restart file
remfile  = step6_rep{}.rem       # replica exchange ID file
logfile  = step6_rep{}.log       # log file of each replica

[ENERGY]
forcefield       = CHARMM    # CHARMM force field
electrostatic    = PME       # [CUTOFF,PME]
switchdist       = 10.0      # switch distance
cutoffdist       = 12.0      # cutoff distance
pairlistdist     = 13.5      # pairlist distance
pme_ngrid_x      = 40        # grid size_x in [PME]
pme_ngrid_y      = 40        # grid size_y in [PME]
pme_ngrid_z      = 40        # grid size_z in [PME]
vdw_force_switch = yes

[DYNAMICS]
integrator        = VRES      # [LEAP,VVER]
nsteps            = 4000000
timestep          = 0.0025    # timestep (ps)
eneout_period     = 500       # energy output period
crdout_period     = 500       # coordinates output period
rstout_period     = 10000     # restart output period
stoptr_period     = 10        # remove translational and rotational motions period
elec_long_period  = 2
thermostat_period = 10
barostat_period   = 10
nbupdate_period   = 10

[CONSTRAINTS]
rigid_bond       = YES       # constraints all bonds involving hydrogen

[ENSEMBLE]
ensemble      = NVT       # [NVE,NVT,NPT]
tpcontrol     = BUSSI     # thermostat and barostat
temperature   = 300.0     # initial temperature (K)
pressure      = 1.0       # target pressure (atm)

[BOUNDARY]
type             = PBC       # periodic boundary condition

[REMD]
dimension        = 1            # number of parameter types
exchange_period  = 2500         # Exchange Interval

type1            = temperature  # T-REMD
nreplica1        = 20           # number of replicas
parameters1      = 300.00 302.53 305.09 307.65 310.24 \
                   312.85 315.47 318.12 320.78 323.46 \
                   326.16 328.87 331.61 334.37 337.14 \
                   339.94 342.75 345.59 348.44 351.26

To run T-REM production run, we use the following commands.

# Run T-REMD production step. It take 10 hours to finish.  
$ export OMP_NUM_THREADS=5
$ mpirun -np 80 /home/user/GENESIS-1.7.1/bin/spdyn step6.inp > step6.log

In [INPUT] section, rstfile points to the output files from the equilibration. “{}” returns a series of the 20 input files.

In [REMD] section, the period between replica exchanges is specified by exchange_period = 2500. Then replica exchange attempts occurs every 2500 steps, that is, every 6.5 ps. The other parameters in [REMD] section should not be changed from those in the input files of your previous runs.


7. Analysis

In this tutorial, we mainly focus on calculating PMF of the end to end distance distribution at 300 K. In which, we use all temperatures trajectory upon applying the Multistate Bennett Acceptance Ratio (MBAR) re-weighting method. For information on MBAR method, please check [4]. However, before calculating PMF, we first check the simulation by calculating acceptance ratio, replica random walk and temperature potential energy distribution.

In T-REMD control file, we setup the exchange_period=2500 which means replica exchange is attempt every 6.5 ps. In the log output of the REMD simulation, we can see the information about replica-exchange attempts at every exchange_period steps.

REMD> Step:    2497500   Dimension:    1   ExchangePattern:    2
  Replica      ExchangeTrial             AcceptanceRatio      Before       After
        1         16 >    17   A         230 /       500     339.940     342.750
        2         19 >    18   R         209 /       500     348.440     348.440
        3         12 >    13   A         194 /       500     328.870     331.610
        4          4 >     5   R         194 /       500     307.650     307.650
        5          6 >     7   R         196 /       500     312.850     312.850
        6         10 >    11   A         222 /       500     323.460     326.160
        7          5 >     4   R         194 /       500     310.240     310.240
        8          7 >     6   R         196 /       500     315.470     315.470
        9         14 >    15   R         199 /       500     334.370     334.370
       10         11 >    10   A         222 /       500     326.160     323.460
       11         13 >    12   A         194 /       500     331.610     328.870
       12          3 >     2   A         192 /       500     305.090     302.530
       13         20 >     0   N           0 /         0     351.260     351.260
       14          8 >     9   R         185 /       500     318.120     318.120
       15         17 >    16   A         230 /       500     342.750     339.940
       16         18 >    19   R         209 /       500     345.590     345.590
       17          1 >     0   N           0 /         0     300.000     300.000
       18         15 >    14   R         199 /       500     337.140     337.140
       19          2 >     3   A         192 /       500     302.530     305.090
       20          9 >     8   R         185 /       500     320.780     320.780
  
Parameter    :    342.750   348.440   331.610   307.650   312.850   326.160   310.240   315.470   334.370   323.460   328.870   302.530   351.260   318.120   339.940   345.590   300.000   337.140   305.090   320.780
RepIDtoParmID:         17        19        13         4         6        11         5         7        14        10        12         2        20         8        16        18         1        15         3         9
ParmIDtoRepID:         17        12        19         4         7         5         8        14        20        10         6        11         3         9        18        15         1        16         2        13

REMD> Step:    2500000   Dimension:    1   ExchangePattern:    1
  Replica      ExchangeTrial             AcceptanceRatio      Before       After
        1         17 >    18   A         210 /       500     342.750     345.590
        2         19 >    20   R         212 /       500     348.440     348.440
        3         13 >    14   A         215 /       500     331.610     334.370
        4          4 >     3   R         197 /       500     307.650     307.650
        5          6 >     5   R         213 /       500     312.850     312.850
        6         11 >    12   A         212 /       500     326.160     328.870
        7          5 >     6   R         213 /       500     310.240     310.240
        8          7 >     8   R         178 /       500     315.470     315.470
        9         14 >    13   A         215 /       500     334.370     331.610
       10         10 >     9   R         205 /       500     323.460     323.460
       11         12 >    11   A         212 /       500     328.870     326.160
       12          2 >     1   R         194 /       500     302.530     302.530
       13         20 >    19   R         212 /       500     351.260     351.260
       14          8 >     7   R         178 /       500     318.120     318.120
       15         16 >    15   A         224 /       500     339.940     337.140
       16         18 >    17   A         210 /       500     345.590     342.750
       17          1 >     2   R         194 /       500     300.000     300.000
       18         15 >    16   A         224 /       500     337.140     339.940
       19          3 >     4   R         197 /       500     305.090     305.090
       20          9 >    10   R         205 /       500     320.780     320.780

Parameter    :    345.590   348.440   334.370   307.650   312.850   328.870   310.240   315.470   331.610   323.460   326.160   302.530   351.260   318.120   337.140   342.750   300.000   339.940   305.090   320.780
RepIDtoParmID:         18        19        14         4         6        12         5         7        13        10        11         2        20         8        15        17         1        16         3         9
ParmIDtoRepID:         17        12        19         4         7         5         8        14        20        10        11         6         9         3        15        18        16         1         2        13

In this log file, we should pay attention to the AcceptanceRatio values. If those values are  much lower than target, you should review the parameters of your simulation, such as modifying the temperature range. In this table, 'A' and 'R' mean that the exchange at this step is accepted or rejected respectively. The last two columns show replica temperatures before and after the exchange trials, respectively.

Lines in blue summarize the locations and parameters after replica exchanges. The Parameter line gives the temperature of each replica in T-REMD simulation. The RepIDtoParmID line stands for the permutation function that converts Replica ID to Parameter ID. For example, in the 1st column, 18 is written, which means that the temperature of Replica 1 is set to 345.59 K. The ParmIDtoRepID line also represents the permutation function that converts Parameter ID to Replica ID. For example, in the 6th column, 5 is written, which means that Parameter 6 (corresponding to the replica temperature, 313.00 K) is located in Replica 5.

In REMD preparation step using temperature generator, we setup the probability of exchange to 0.25. So we first check the simulation acceptance ratio and check if it matches our original selection, see next section.

# change directory
$ cd ../7_analysis
$ ls
1_calc_ratio  2_plot_index  3_sort_dcd  4_plot_potential  5_end_end_distance 6_MBAR 7__PMF

7.1. Calculate the acceptance ratio of each replica

As we mentioned in the previous section, acceptance ratio of replica exchange is one of the important factors that determine the efficiency of REMD simulations. The acceptance ratio is displayed in a standard log output “step6.log“, and we examine the data from the last step. Here, we show an example how to examine the data. Note that the acceptance ratio of replica “A” to “B” is identical to “B” to “A”, and thus we calculate only “A” to “B”. For this calculation, you can use the script “calc_ratio.sh“.

# change directory
$ cd 1_calc_ratio
$ ls 
$ calc_ratio.sh
# make the file executable and use it
$ chmod u+x calc_ratio.sh
$ ./calc_ratio.sh

1 > 2 0.37
3 > 4 0.39125
5 > 6 0.3725
7 > 8 0.4075
9 > 10 0.39625
11 > 12 0.4075
13 > 14 0.41
15 > 16 0.39875
17 > 18 0.4375
19 > 20 0.455

The file “calc_ratio.sh” contains the following commands:

# get acceptance ratios between adjacent parameter IDs
$ grep "  1 >     2" ../../6_production/step6.log | tail -1  > acceptance_ratio.dat
$ grep "  3 >     4" ../../6_production/step6.log | tail -1 >> acceptance_ratio.dat
$ grep "  5 >     6" ../../6_production/step6.log | tail -1 >> acceptance_ratio.dat
....
$ grep " 19 >    20" ../../6_production/step6.log | tail -1 >> acceptance_ratio.dat

# calculate the ratios
$ awk '{print $2,$3,$4,$6/$8}' acceptance_ratio.dat

7.2. Plot time courses of replica indices and temperatures

To examine the random walks of each replica in temperature space, we analyze time course of the replica indices. We need to plot the values of the “ParmIDtoRepID” lines from step6.log for a chosen starting replica temperature, for example 300 K (first column), versus time. Using following commands, we can get replica IDs in each snapshot.

# change directory
$ cd ../2_plot_index
$ ls
$ plot_index.sh plot_temperature.sh step_number.sh plot_repID-Tempreture.gnuplot plot_parmID-repID.gnuplot 

# make the file executable and use it
$ chmod u+x plot_index.sh
$ ./plot_index.sh

The file “plot_index.sh” contains the following commands:

# get replica IDs in each snapshot
$ grep "ParmIDtoRepID:" ../../6_production/step6.log | sed 's/ParmIDtoRepID:/ /' > T-REMD_parmID-repID.dat

The generated file does not include time steps. So we use step_number.sh to generate and combine step number to replica index. The step_number.sh file includes the folllowing commands.

# get step number and combine to replica index
$grep "REMD> Step:" ../../6_production/step6.log | cut -c 12-25 > step.log
$paste step.log T-REMD_parmID-repID.dat > T-REMD_parmID-repID.log

Using following gnuplot commands, we can plot the replica IDs in each snapshot.

# plot replica IDs in each snapshot
$ gnuplot plot_parmID-repID.gnuplot
#The parmID-repID.gnuplot include the following commands.

set terminal jpeg
set output "300.00K.jpg"
set yrange [0:21]
set mxtics
set mytics
set xlabel "Time (ns)"
set ylabel "Replica ID"
plot "T-REMD_parmID-repID.log" using ($1*0.00625):2  with points pt 5 ps 0.5 lt 1 title "300.00 K"

# You can plot other replica IDs by repeating previous step as follow:
plot "T-REMD_parmID-repID.log" using ($1*0.00625):11 with points pt 7 ps 0.5 lt 2 title "323.46 K"
plot "T-REMD_parmID-repID.log" using ($1*0.00625):21 with points pt 9 ps 0.5 lt 3 title "351.26 K"
triala_T-REMD_ParmID-1 triala_T-REMD_ParmID-10 triala_T-REMD_ParmID-20

These graphs indicate that the temperatures (parameterIDs) visit randomly each replica, and thus random walks in the temperature spaces are successfully realized.

We also plot time courses of temperatures in one replica. We need to plot one column in the “Parameter :” lines in step6.log versus time. Using following commands. we can get replica temperatures in each snapshot.

# make the file executable and use it
$ chmod u+x plot_temperature.sh
$ ./plot_temperature.sh

The file “plot_temperature.sh” contains the following commands:

# get replica temperatures in each snapshot
$ grep "Parameter    :" ../../6_production/step6.log | sed 's/Parameter    :/ /' > T-REMD_repID-temperatrue.dat

As previous step, we can use gunplot script to plot the replica IDs in each snapshot.

# plot tempreture IDs in each snapshot
$ gnuplot plot_repID-Temperature.gnuplot
#The plot_repID-Temperature.gnuplot include the following commands.

set terminal jpeg
set output "RepID1_10_20.jpg"
set yrange [0:21]
set mxtics
set mytics
set xlabel "Time (ns)"
set ylabel "Temperature (K)"
plot "T-REMD_repID-Temperature.log" using ($1*0.00625):2  with lines lt 3 title "repID=1 ",\
"T-REMD_repID-Temperature.log" using ($1*0.00625):11 with lines lt 2 title "repID=10",\
"T-REMD_repID-Temperature.log" using ($1*0.00625):21 with lines lt 1 title "repID=20"
triala_T-REMD_RepID_tempratures

The temperatures of each replica during the simulation are distributed in all temperatures assigned. It means that correct annealing of the system is realized.


7.3. Sort coordinates in DCD trajectory files by parameters

The temperature in output DCD files of T-REMD simulation have all range of temperatures, due to the exchange. Therefore, to analyze the simulation further, we first need to sort the frames in the trajectory based on their temperature. To do that, we use GENESIS analysis tool (remd_convert). Sorting is done based on the information written in remfiles generated from the REMD simulation. Concomitantly, we also sort log files for each replica based on temperature parameters

# change directory
$ cd ../3_sort
$ ls 
$ step7.remd_convert.inp
# Sort frames by parameters
$ /home/user/GENESIS-1.7.1/bin/remd_convert step7.remd_convert.inp | tee step7.remd_convert.log 

# The step7.remd_convert.inp control file as follow 
[INPUT]
psffile = ../../1_setup/wbox.psf
reffile = ../../1_setup/wbox.pdb
dcdfile = ../../6_production/step6_rep{}.dcd
remfile = ../../6_production/step6_rep{}.rem
logfile = ../../6_production/step6_rep{}.log

[OUTPUT]
pdbfile = trialanine.pdb
trjfile = remd_paramID{}_trialanine.dcd
logfile = remd_paramID{}_trialanine.log

[SELECTION]
group1 = atomno:1-42
group2 = resno:2 and (an:N or an:CA or an:C or an:O)

[FITTING]
fitting_method = TR+ROT
fitting_atom   = 2
mass_weight    = YES

[OPTION]
check_only    = NO
convert_type  = PARAMETER
exchange_period =        2500
crdout_period   =        500
eneout_period   =        500
num_replicas  = 20
convert_ids   =                # (empty = all)
nsteps        = 4000000
trjout_format = DCD
trjout_type   = COOR+BOX
trjout_atom   = 1
pbc_correct   = NO

Now we have sorted temperature log and DCD file which will be used in the following analysis steps.


7.4. Plot potential energy distribution for each temperature

Now as we already sorted the log files for each temperature parameters, we plot potential energy distribution to ensure sufficient overlap between all parameters. First, grep command we extract potential energies and step number, similar to previous plot of replica index.

# change directory
$ cd ../4_plot_potential
$ ls
$ plot_potential.sh step_numer.sh plot_potential.gnuplot

# make the file executable and use it
$ chmod u+x plot_potential.sh
$ ./plot_index.sh

The file “plot_index.sh” contains the following commands:

# get potential energies for each temperature
$ grep "INFO:" ../3_sort_dcd/remd_paramID1_trialanine.log  | tail -n +2 | awk '{print  $5}' > potential_energy_rep1.dat

$ grep "INFO:" ../3_sort_dcd/remd_paramID2_trialanine.log  | tail -n +2 | awk '{print  $5}' > potential_energy_rep2.dat
.
.
.
$ grep "INFO:" ../3_sort_dcd/remd_paramID20_trialanine.log  | tail -n +2 | awk '{print  $5}' > potential_energy_rep20.dat

The generated file does not include time steps. So we use step_number.sh to generate and combine step number to replica index. The step_number.sh file includes the following commands.

# get step number and combine to replica index
$grep "REMD> Step:" ../../6_production/step6.log | cut -c 12-25 > step.log
$paste step.log potential_energy_rep20.datID.dat > potential_energy_repID.log

We can plot the potential energy distribution using the provided plot_potential.gnuplot script

# plot potential energy distribution
$ gnuplot plot_potential.gnuplot
#The plot_potential.gnuplot include the following commands.

set terminal jpeg
set output "pot-tempretures.jpg"
set key below
set xlabel "Potential Energy kcal/mol"
set ylabel "Probability"
binwidth=50
bin(x,width)=width*floor(x/width)
ndata=1600
plot for [k=1:20] "potential_energy_rep".k.".pot" u (bin($2,binwidth)):(1.0/ndata) t "Tempreture".k with lines smooth freq
triala_T-REMD_RepID_tempratures

As can bee seen from the figure the potential energies of all temperatures have good overlap.


7.5. Calculating end to end distance

In order to calculate PMF of end to end distance distribution, in the current subsection we calculate the distance between the two terminal alanine (OY_HNT). In which we use GENESIS analysis tool trj_analysis as follow:

# change directory
$ cd ../5_end_end_distance
$ ls
$ step7.trj_end_end.inp

# The control file step7.trj_end_end.inp includes the following
[INPUT]
psffile = ../../1_setup/wbox.psf             # protein structure file
reffile = ../../1_setup/wbox.pdb             # PDB file

[OUTPUT]
disfile = parameter_ID1.dis

[TRAJECTORY]
trjfile1      = ../3_sort_dcd/remd_paramID1_trialanine.dcd       # output trajectory file
md_step1      = 4000000                          # number of MD steps
mdout_period1 = 500                              # MD output period
ana_period1   = 500                              # analysis period
trj_format    = DCD                              # (PDB/DCD)
trj_type      = COOR+BOX                         # (COOR/COOR+BOX)
trj_natom     = 42

[OPTION]
check_only     = NO
distance1      = PROA:1:ALA:OY PROA:3:ALA:HNT

# We run the analysis using the following command.
$ /home/user/GENESIS-1.7.1/bin/trj_analysis step7.trj_end_end.inp| tee step7.trj_end_end.log 

# Note that this step is repeated for all temperatures.

7.6. MBAR analysis

In order to use conformers from temperatures higher than the target temperature (300K), we apply GENESIS mbar_analysis tool where we use our sorted potential energy files as cv.

# change directory
$ cd ../6_MBAR
$ ls
$ step7.MBAR.inp

# The control file step7.MBAR.inp includes the following
[INPUT]
cvfile             = ../4_plot_potential/potential_energy_rep{}.pot # input cv file

[OUTPUT]
fenefile           = fene.dat  
weightfile         = weight{}.dat     
                               
[MBAR]
num_replicas       = 20
input_type         = REMD 
dimension          = 1
nblocks            = 1
self_iteration     = 100
newton_iteration   = 100
temperature        = 300.00 302.53 305.09 307.65 310.24 312.85 315.47 318.12 320.78 323.46 326.16 328.87 331.61 334.37 337.14 339.94 342.75 345.59 348.44 351.26
target_temperature = 300.00
tolerance = 10E-08
# We run the analysis using the following command.
$ /home/user/GENESIS-1.7.1/bin/mbar_analysis step7.MBAR.inp| tee step7.MBAR.log 

This produces “fene.dat” file containing the evaluated relative free energies  and 20 “weight*.dat” files containing the weights of each snapshot for each replica. For example, weight1.dat is as follows:


$ less weight1.dat
1  1.0065891378709005E-004
2  1.1736458534447951E-004
3  8.4343499264392580E-005
4  8.6783253986960640E-005
5  9.2384946365274938E-005
6  3.8196034013116886E-005
7  1.0958058585433545E-004
8  6.7623790893945265E-005
9  1.0870199631442051E-004
10 1.0330517310658403E-004

7.7. Calculating PMF of distance distribution

The final step of this tutorial is to use the calculated distances in 7.5. and weight files from MBAR analysis (7.6.) to calculate PMF of end to end distance distribution in alanine tripeptide. We use another tool in GENESIS (pmf_analysis) as follow:

# change directory
$ cd ../7_PMF
$ ls
$ step7.PMF.inp plot_potential.gnuplot

# The control file step7.PMF.inp includes the following
[INPUT]
cvfile             = ../5_end_end_distance/parameter_ID{}.dis        # input cv file
weightfile         = ../6_MBAR/weight{}.dat

[OUTPUT]
pmffile      = dist.pmf


[OPTION]
nreplica     = 20             # number of replicas
dimension    = 1              # dimension of cv space
temperature  = 300 
grids1       = 0 15 61    # (min max num_of_bins)
band_width1  = 0.25            # sigma of gaussian kernel
                              # should be comparable or smaller than the grid size
                              # (pmf_analysis creates histogram by accumulating gaussians)
is_periodic1 = NO            # periodicity of cv1

# We run the analysis using the following command: 
$ /home/user/GENESIS-1.7.1/bin/pmf_analysis step7.PMF.inp| tee step7.PMF.log 

We plot PMF of end to end distance using the provided plot_pmf.gnuplot script

# plot PMF 
$ gnuplot plot_potential.gnuplot
#The plot_pmfl.gnuplot include the following commands.

set terminal jpeg
set output "PMF_end_end.jpg"
set xrange [0:13]
set yrange [0:10]
set xlabel "End to End Distance (Å)"
set ylabel "PMF (kcal/mol)"
plot "dist.pmf" u 1:3 w l notitle

We can see that there is the global energy minimum around r = 10.125 Å and a local energy minimum around r = 2.88 Å. The latter corresponds to the α-helix conformation, where the hydrogen bond between OY and HNT is formed. These results suggest that in water the alanine tripeptide tends to form an extended conformation rather than α-helix. 

References

  1. Y. Sugita et al., Chemical Physics Letters, 314, 141–151 (1999).
  2. J. E. basconi et al., J. Chem. Theory Comput. 9, 2887-2899 (2013).
  3. A. Patriksson et al,  Phys. Chem. Chem. Phys.10, 2073-2077 (2008).
  4. M. Shirts et al., J. Chem. Phys., 129, 124105-124114 (2008).

Written by Daisuke Matsuoka@RIKEN Theoretical molecular science laboratory

Updated by Hisham Dokainish@RIKEN Theoretical molecular science laboratory
August, 28, 2019

Updated by Hisham Dokainish@RIKEN Theoretical molecular science laboratory
March, 31, 2021