12.3 gREST simulation of trialanine in water

In this tutorial, we illustrate how to perform and analyze the replica exchange with solute tempering (REST) simulations for alanine tri-peptide. REST was originally developed by Berne et al.[1, 2] to improve the performance of temperature replica exchange MD (T-REMD) simulations. Instead of increase the temperature of the whole system, temperature of the “solute” region is virtually increased. This modification significantly reduces the required number of replicas. In GENESIS, new scheme of REST, referred to as generalized REST (gREST) [3], is implemented. In gREST, the solute region is defined as a part of a molecule and/or a part of the potential energy terms. The rational choice of the solute region in the gREST framework effectively limits the energy space covered in REST and reduces the number of required replica. In the previous REMD tutorial, 20 replicas were needed while in this tutorial only 4 replicas are needed to cover the same temperature range.

When the temperature of the system (= target temperature) and the solute temperature are \(T_0\) and \(T_m\), respectively, the energy in gREST is modified as follows:

\( \displaystyle \beta_0 U_{m} = \beta_0 \left[ \frac{\beta_m}{\beta_0} U_{uu} + \left( \frac{\beta_m}{\beta_0} \right)^{l/n} U_{uv} + U_{vv} \right]\),

where \(\beta_0\) and \(\beta_m\) are the inverse temperatures, \(u\) and \(v\) in the subscript represent the solute and solvent regions, respectively, and \(U_{uv}\) is the interaction between the solute and solvent. For example, the dihedral energy term can be used as the solute. Please note that in the temperature-REMD, each replica has the different temperature of the system, while in gREST all replicas have the same temperature of the system (= \(T_0\)). Only the potential energy (\(U_m\)) is changed by scaling the energy terms in the solute.


Preparation

First, download the tutorial file(tutorial22-12.3). This tutorial consists of five steps: 1) system setup, 2) energy minimization and pre-equilibration, 3) equilibration,  4) production run, and 5) trajectory analysis. Control files for GENESIS are already included in the download file. Since we use the CHARMM force field, we make a symbolic link to the CHARMM toppar directory (see Tutorial 2.2).

# Download the tutorial file
$ cd Tutorials
$ mv ~/Downloads/tutorial22-12.3.zip ./
$ unzip tutorial22-12.3.zip
$ cd tutorial-12.3
$ ls
1_setup/ 2_minimize_pre-equi/ 3_equilibrate/ 4_production/ 5_analysis/

# Make a symbolic link to the CHARMM toppar directory (see Tutorial 2.2)
$ ln -s ../../Data/Parameters/toppar_c36_jul21 ./toppar

# Setup the directory for binary files of GENESIS 2.0 
$ export GENESIS_BIN_DIR=../../../GENESIS_Tutorials-2022/Programs/genesis-2.0/bin/


1. Setup

To setup the system, please follow the steps in the basic tutorial (see Tutorial 3.2). We use the same input PDB and PSF files as in Tutorial 3.2.

# Prepare the input files
$ cd 1_setup
$ ln -s ../../tutorial-3.2/1_setup/3_solvate/wbox.pdb ./
$ ln -s ../../tutorial-3.2/1_setup/3_solvate/wbox.psf ./


2.
Minimization and pre-equilibration

Non-physical steric clashes or non-equilibrium geometries in the initial structure must be resolved before the gREST simulation. Here, we do this via five steps: 1) minimization (step2.1), 2) equilibration in NVT ensemble (step2.2), 3) relaxation of the simulation box in NPT ensemble with positional restraints (step2.3), 4) equilibration in NPT ensemble without positional restraints, and 5) equilibration in NPT ensemble with 3.5 fs time step and r-RESPA (step2.5). In the 2_minimize_pre-equil directory, all control files “step2.*.inp” are given. For further details of the control file and choice of parameters, please refer to basic tutorials section (see Tutorial 3.2) and Tutorial 10.1 for simulations with large time step.

Let’s change directory.

#Change directory to 2_minimize_pre-equi
$ cd ../2_minimize_pre-equi
$ ls
step2.1.inp step2.2.inp step2.3.inp step2.4.inp step2.5.inp
2.1. Minimization

The first step of the simulation is to run energy minimization of the system, in order to remove atomic clashes in the initial structure. The control file “step2.1.inp” is as follows:

[INPUT]
topfile = ../toppar/top_all36_prot.rtf     # topology file
parfile = ../toppar/par_all36m_prot.prm    # parameter file
strfile = ../toppar/toppar_water_ions.str  # stream file
psffile = ../1_setup/wbox.psf              # protein structure file
pdbfile = ../1_setup/wbox.pdb              # PDB file

[OUTPUT]
dcdfile = step2.1.dcd          # DCD trajectory file
rstfile = step2.1.rst          # restart file

[ENERGY]
forcefield       = CHARMM  # [CHARMM]
electrostatic    = PME     # [PME]
switchdist       = 10.0    # switch distance
cutoffdist       = 12.0    # cutoff distance
pairlistdist     = 13.5    # pair-list distance
vdw_force_switch = YES     # force switch option for van der Waals
contact_check    = YES     # check atomic clash

[MINIMIZE]
method           = SD      # [SD]
nsteps           = 2000    # number of minimization steps
eneout_period    =   50    # energy output period
crdout_period    =   50    # coordinates output period
rstout_period    = 2000    # restart output period

[BOUNDARY]
type             = PBC     # [PBC]
box_size_x       = 50.2    # box size (x) in [PBC]
box_size_y       = 50.2    # box size (y) in [PBC]
box_size_z       = 50.2    # box size (z) in [PBC]

[SELECTION]
group1           = sid:PROA and heavy  # restraint group 1

[RESTRAINTS]
nfunctions       = 1        # number of functions
function1        = POSI     # restraint function type
direction1       = ALL      # direction [ALL,X,Y,Z]
constant1        = 1.0      # force constant
select_index1    = 1        # restrained groups

To execute the calculation, use the following command:

# Run energy minimization
$ export OMP_NUM_THREADS=5
$ mpirun -np 8 $GENESIS_BIN_DIR/spdyn step2.1.inp > step2.1.out 

After the calculation, check the trajectory by using VMD. We can see that the atoms are slightly moved, but the atomic clashes are actually removed. In this step, we used 8 MPI processors and 5 OpenMP threads, namely, 40 CPU cores.

2.2. Equilibration of the system at 300 K

The second step of the simulation is to heat up the system, with restraint on the peptide heavy atoms, to 300 K. The heating is performed via annealing process wherein the temperature is increased by 3 K every 500 steps. Total simulation is 100 ps. Note that in this step the integrator is Velocity Verlet (VVER) and Bussi thermostat. Please view the control input file (step2.2.inp). The following is the important options in the control file.

[DYNAMICS]
integrator       =   VVER   # [LEAP,VVER]
nsteps           =  50000   # number of MD steps
timestep         =  0.002   # timestep (ps)
eneout_period    =    500   # energy output period
crdout_period    =    500   # coordinates output period
rstout_period    =  50000   # restart output period
annealing        =    YES   # simulated annealing
anneal_period    =    500   # annealing period
dtemperature     =      3   # temperature change at annealing (K)
iseed            =  31415   # random number seed

[CONSTRAINTS]
rigid_bond       = YES      # constraints all bonds involving hydrogen

[ENSEMBLE]
ensemble         = NVT      # [NVE,NVT,NPT,NPAT,NPgT]
tpcontrol        = BUSSI    # [NO,BERENDSEN,BUSSI,NHC]
temperature      = 0.1      # initial and target temperature (K)

To execute the simulation, we use similar commands as previous step.

# Run heating step
$ export OMP_NUM_THREADS=5
$ mpirun -np 8 $GENESIS_BIN_DIR/spdyn step2.2.inp > step2.2.out
2.3. Relaxation of the simulation box

After the system reached the desired temperature 300.00 K, we relax the simulation box in NPT ensemble at 300 K and 1 atm for 50 ps. The following is the important options in the control file (step2.3.inp).

[DYNAMICS]       
integrator       =   VVER   # [LEAP,VVER]
nsteps           =  25000   # number of MD steps 
timestep         =  0.002   # timestep (ps)
eneout_period    =    500   # energy output period
crdout_period    =    500   # coordinates output period
rstout_period    =  25000   # restart output period

[ENSEMBLE]
ensemble         = NPT      # [NVE,NVT,NPT,NPAT,NPgT]
tpcontrol        = BUSSI    # [NO,BERENDSEN,BUSSI,NHC]
temperature      = 300      # initial and target temperature (K)
pressure         = 1.0      # target pressure (atm)

To execute the simulation, we use the following command.

# Run relaxation of the system with restraints
$ export OMP_NUM_THREADS=5
$ mpirun -np 8 $GENESIS_BIN_DIR/spdyn step2.3.inp > step2.3.out
2.4. Equilibration with no positional restraints

We equilibrate all atom positions including the peptide atoms without positional restraints (removes [RESTRAINTS]section). Here we turn on the functions of hydrogen mass repartitioning (HMR) and group temperature/pressure (group T/P) to use the 3.5 fs time step in the next equilibration step.  Please note that this option is only available in GENESIS v2.0 and later. The following is the important options in the control file (step2.4.inp).

[DYNAMICS]
integrator       =   VVER   # [LEAP,VVER,VRES]
nsteps           =  25000   # number of MD steps (50 ps)
timestep         =  0.002   # timestep (2 fs)
eneout_period    =    500   # energy output period
crdout_period    =    500   # coordinates output period
rstout_period    =  25000   # restart output period
hydrogen_mr      = yes      # Turn on HMR
hmr_ratio        = 3.0
hmr_ratio_xh1    = 2.0
hmr_target       = solute   # Apply HMR only to solute 

[ENSEMBLE]
ensemble         = NPT      # [NVE,NVT,NPT,NPAT,NPgT]
tpcontrol        = BUSSI    # [NO,BERENDSEN,BUSSI,LANGEVIN]
temperature      = 300      # initial and target temperature (K)
pressure         = 1.0      # target pressure (atm)
group_tp         = YES      # usage of group tempeature and pressure

To run this step, we use the following command.

# Run equilbration without restraints
$ export OMP_NUM_THREADS=5
$ mpirun -np 8 $GENESIS_BIN_DIR/spdyn step2.4.inp > step2.4.out
2.5. Equilibration with 3.5 fs time step

Finally we run an additional 105-ps pre-equilibration simulation in NPT ensemble. In this step, we equilibrate the system with the multiple time step integrator, r-RESPA (VRES, timestep of 3.5 fs). The integration with the longer time step requires HMR and group T/P options. The following is the important options in the control file (step2.5.inp).

[DYNAMICS]
integrator        =   VRES  # [LEAP,VVER,VRES]
nsteps            =  30000  # number of MD steps (105 ps)
timestep          = 0.0035  # timestep (3.5 fs)
eneout_period     =    300  # energy output period (1.05 ps)
crdout_period     =    300  # coordinates output period (1.05 ps)
rstout_period     =  30000  # restart output period
nbupdate_period   =      6  # nonbond update period
elec_long_period  =      2  # period of reciprocal space calculation
thermostat_period =      6  # period of thermostat update
barostat_period   =      6  # period of barostat update
hydrogen_mr       = yes     # Turn on HMR
hmr_ratio         = 3.0
hmr_ratio_xh1     = 2.0
hmr_target        = solute  # Apply HMR only to solute

[ENSEMBLE]
ensemble         = NPT      # [NVE,NVT,NPT,NPAT,NPgT]
tpcontrol        = BUSSI    # [NO,BERENDSEN,BUSSI,LANGEVIN]
temperature      = 300      # initial and target temperature (K)
pressure         = 1.0      # target pressure (atm)
group_tp         = YES      # usage of group tempeature and pressure

To run this step, we use the following command.

# Run equilbration without restraints
$ export OMP_NUM_THREADS=5
$ mpirun -np 8 $GENESIS_BIN_DIR/spdyn step2.5.inp > step2.5.out

3. Equilibration for gREST

In this step, we equilibrate the system for each replica without replica exchange. Let’s move to the directory for equilibration of gREST.

#Change directory to 3_equilibrate
$ cd ../3_equilibrate
$ ls
step3.inp 

The control file of gREST is very similar to that of T-REMD. The only difference is that gREST needs select_indexN and param_typeN to specify the “solute” region. The rest of system is regarded as “solvent”. In this example, we define the the LJ and dihedral angle of potential energy terms in Ala3 molecule as the solute region (param_type1 = D L in [REMD] section and group1 = ai:1-42 in [SELECTION] section).  In default, all of potential energy terms are treated as the solute (param_type1 = ALL in [REMD] section). If you want to treat other parts of potential energy terms as the solute, param_typeN parameter may be modified (e.g. param_type1 = C CM to specify only charge and CMAP terms as solute.).

We here use four replicas with the solute temperature range of 300.00 K – 351.26 K (parameters1 = 300.00 318.12 337.14 351.26 in [REMD] section) as a example. If you apply gREST to your system, you must determine the number and the temperatures of replicas. To find the number of replicas and temperatures, please refer to the appendix ((see Automatic Parameter Tuning for REMD, REUS, REST)). Each replica must be equilibrated at the selected temperature just like conventional MD simulations. Note that in this step as well as the next step (production run) we use NVT ensemble. The r-RESPA integrator is combined with HMR and group T/P in order to use 3.5 fs time step. The following is the important options in the control file (step3.inp).

[REMD]
dimension        = 1            # number of parameter types
exchange_period  = 0            # NO exchange for equilibration
iseed            = 3141592
type1            = REST         # Replica Exchange with Solute Tempering
nreplica1        = 4            # number of replicas
parameters1      = 300.00 318.12 337.14 351.26 # list of solute temperatures
select_index1    = 1            # solute region selection
param_type1      = D L          # function types
                                # valid keywords are:
                                # ALL (default), BOND(B), ANGLE(A),
                                # DIHEDRAL(D), IMPROPER(I), CMAP(CM),
                                # CHARGE(C), LJ(L)...
                                # See manual for further details.

[SELECTION]
group1           = ai:1-42

[DYNAMICS]
integrator        =   VRES  # [LEAP,VVER,VRES]
nsteps            =  30000  # number of MD steps (105ps)
timestep          = 0.0035  # timestep (3.5fs)
eneout_period     =    300  # energy output period (1.05ps)
crdout_period     =    300  # coordinates output period (1.05ps)
rstout_period     =   3000  # restart output period
nbupdate_period   =      6  # nonbond update period
elec_long_period  =      2  # period of reciprocal space calculation
thermostat_period =      6  # period of thermostat update
barostat_period   =      6  # period of barostat update
hydrogen_mr       = yes     # Turn on HMR
hmr_ratio         = 3.0
hmr_ratio_xh1     = 2.0
hmr_target        = solute  # Apply HMR only to solute

[ENSEMBLE]
ensemble        = NVT       # [NVE,NVT,NPT]
tpcontrol       = BUSSI     # thermostat and barostat
temperature     = 300.0     # initial temperature (K)
pressure        = 1.0       # target pressure (atm)
group_tp        = YES

The following command performs a 105 ps NVT gREST simulations. Here we use 160 (= 32 MPI x 5 OpenMP) CPU cores for the simulations.

# Run gREST equilibiration step  
$ export OMP_NUM_THREADS=5
$ mpirun -np 32 $GENESIS_BIN_DIR/spdyn step3.inp > step3.out

4. Production

Since we have now completed all preparation steps, now we can start running the production simulation. Let’s move to the directory for production of gREST.

#Change directory
$ cd ../4_production
$ ls
step4.inp 

We run a short simulation for 10.5 ns for the purpose of tutorial, however longer simulation might be needed for conversion. The following control file is used to run the simulation in NVT ensemble. One needs to modify the control file for the equilibration such that the exchange_period has a non-zero value (e.g. 3000 steps (10.5 ps)) and [INPUT] and [OUTPUT] files are properly refered. In order to analyze the free energy, you need to select the analysis_grest=YES in [REMD]. The option enables calculation of energies with different solute temperatures. The energies with different solute temperatures of each replica are written in step4_rep{}.ene. The important part of the control file (step4.inp) is as follows:

[REMD]
dimension        = 1            # number of parameter types
exchange_period  = 3000         # exchange per 3000*3.5fs = 10.5 ps
type1            = REST         # Replica Exchange with Solute Tempering
nreplica1        = 4            # number of replicas
parameters1      = 300.00 318.12 337.14 351.26 # list of solute temperatures
select_index1    = 1            # solute region selection
param_type1      = D L          # function types
                                # valid keywords are:
                                # ALL (default), BOND, ANGLE,
                                # DIHEDRAL, IMPROPER, CMAP,
                                # CHARGE LJ...
                                # See manual for further details.
analysis_grest   = YES          

[SELECTION]
group1           = ai:1-42

[DYNAMICS]
integrator        =    VRES # [LEAP,VVER,VRES]
nsteps            = 3000000 # number of MD steps (10.5 ns)
timestep          =  0.0035 # timestep (3.5 fs)
eneout_period     =     300 # energy output period (1.05 ps)
crdout_period     =     300 # coordinates output period (1.05 ps)
rstout_period     =   30000 # restart output period (105 ps)
nbupdate_period   =       6 # nonbond update period
elec_long_period  =       2 # period of reciprocal space calculation
thermostat_period =       6 # period of thermostat update
barostat_period   =       6 # period of barostat update
hydrogen_mr       = yes     # Turn on HMR
hmr_ratio         = 3.0
hmr_ratio_xh1     = 2.0
hmr_target        = solute  # Apply HMR only to solute

[ENSEMBLE]
ensemble         = NVT       # [NVE,NVT,NPT]
tpcontrol        = BUSSI     # thermostat and barostat
temperature      = 300.00    # K
group_tp         = YES       # usage of group tempeature and pressure  

To run gREST production run, we use the following commands.

# Run gREST production step
$ export OMP_NUM_THREADS=5
$ mpirun -np 32 $GENESIS_BIN_DIR/spdyn step4.inp > step4.out

5. Analysis

In this tutorial, we mainly focus on calculating PMF of the end to end distance distribution at 300 K. In which, we use all temperatures trajectory upon applying the Multistate Bennett Acceptance Ratio (MBAR) re-weighting method. For information on MBAR method, please check [4]. However, before calculating PMF, we first check the simulation by calculating acceptance ratio, replica random walk and temperature potential energy distribution.

In gREST control file, we setup the exchange_period=3000 which means replica exchange is attempt every 10.5 ps. In the log output of the gREST simulation, we can see the information about replica-exchange attempts at every exchange_period steps.

REMD> Step:    2982000   Dimension:    1   ExchangePattern:    1
  Replica      ExchangeTrial             AcceptanceRatio      Before       After
        1          4 >     3   A         400 /       497     351.260     337.140
        2          2 >     1   A         394 /       497     318.120     300.000
        3          1 >     2   A         394 /       497     300.000     318.120
        4          3 >     4   A         400 /       497     337.140     351.260
  Parameter    :    337.140   300.000   318.120   351.260
  RepIDtoParmID:          3         1         2         4
  ParmIDtoRepID:          2         3         1         4

In this log file, we should pay attention to the AcceptanceRatio values. If those values are  much lower than target, you should review the parameters of your simulation, such as modifying the temperature range. In this table, 'A' and 'R' mean that the exchange at this step is accepted or rejected, respectively. The last two columns show replica temperatures before and after the exchange trials, respectively.

Lines in red summarize the locations and parameters after replica exchanges. The Parameter line gives the temperature of each replica in gREST simulation. The RepIDtoParmID line stands for the permutation function that converts Replica ID to Parameter ID. For example, in the 1st column, 3 is written, which means that the temperature of Replica 1 is set to 337.140 K. The ParmIDtoRepID line also represents the permutation function that converts Parameter ID to Replica ID. For example, in the 3th column, 1 is written, which means that Parameter 3 (corresponding to the replica temperature, 337.140 K) is located in Replica 1.

Now, please change the directory to  analysis and proceed with the following steps:

# change directory
$ cd ../5_analysis
$ ls
1_calc_ratio  2_plot_index  3_sort  4_clac_dist  5_mbar  6_pmf
5.1. Calculate the acceptance ratio of each replica

Acceptance ratio of replica exchange is one of the important factors that determine the efficiency of gREST simulations. The acceptance ratio is displayed in a standard log output “step4.out“, and we examine the data from the last step. Here, we show an example how to examine the data. Note that the acceptance ratio of replica “A” to “B” is identical to “B” to “A”, and thus we calculate only “A” to “B”. For this calculation, you can use the script “calc_ratio.sh“.

# change directory
$ cd 1_calc_ratio
$ ls
calc_ratio.sh

# make the file executable and use it
$ chmod u+x calc_ratio.sh
$ ./calc_ratio.sh

1 > 2 0.826
2 > 3 0.804
3 > 4 0.894

The file “calc_ratio.sh” contains the following commands:

# get acceptance ratios between adjacent parameter IDs
$ grep "  1 >     2" ../../4_production/step4.out | tail -1  > acceptance_ratio.dat
$ grep "  2 >     3" ../../4_production/step4.out | tail -1 >> acceptance_ratio.dat
$ grep "  3 >     4" ../../4_production/step4.out | tail -1 >> acceptance_ratio.dat

# calculate the ratios
$ awk '{print $2,$3,$4,$6/$8}' acceptance_ratio.dat

Note that the average acceptance ration in this case is quite high, representing that a much larger temperature range can be achieved with 4 replica. For consistency and comparison to REMD tutorial, the temperature range was kept the same.

5.2. Plot time courses of replica indices and temperatures

To examine the random walks of each replica in temperature space, we analyze time course of the replica indices. We need to plot the values of the “ParmIDtoRepID” lines from step4.out for a chosen starting replica temperature, for example 300 K (first column), versus time. Using following commands, we can get replica IDs in each snapshot.

# change directory
$ cd ../2_plot_index
$ ls
plot_index.sh plot_temperature.sh

# make the file executable and use it
$ chmod u+x plot_index.sh
$ ./plot_index.sh

The file “plot_index.sh” contains the following commands:

# get replica IDs in each snapshot
$ grep "ParmIDtoRepID:" ../../4_production/step4.out | sed 's/ParmIDtoRepID:/ /' > param.dat

Using following gnuplot commands, we can plot the replica IDs in each snapshot.

# make input file for gnuplot
cat << EOF > tmp.plt
set terminal png
set yrange [0:5] 
set mxtics 
set mytics 
set xlabel "Time (ns)" 
set ylabel "Replica ID"

# plot replica IDs which visited 300.00 K 
set output "output_index1.png"
plot "param.dat" using (\$0*10.5/1000):1 with points pt 5 ps 0.5 lt 1 title "300.00 K"

# plot replica IDs which visited 318.12 K 
set output "output_index2.png"
plot "param.dat" using (\$0*10.5/1000):2 with points pt 5 ps 0.5 lt 1 title "318.12 K"

# plot replica IDs which visited 337.14 K 
set output "output_index3.png"
plot "param.dat" using (\$0*10.5/1000):3 with points pt 5 ps 0.5 lt 1 title "337.14 K"

# plot replica IDs which visited 351.26 K 
set output "output_index4.png"
plot "param.dat" using (\$0*10.5/1000):4 with points pt 5 ps 0.5 lt 1 title "351.26 K"
EOF

# execute gnuplot
gnuplot ./tmp.plt

This graph indicate that the temperatures (300 K) visit randomly each replica, and thus random walks in the temperature spaces are successfully realized.

We also plot time courses of temperatures in one replica. We need to plot one column in the “Parameter :” lines in step4.out versus time. Using following commands. we can get replica temperatures in each snapshot.

# make the file executable and use it
$ chmod u+x plot_temperature.sh
$ ./plot_temperature.sh

The file “plot_temperature.sh” contains the following commands:

# get replica temperatures in each snapshot
$ grep "Parameter    :" ../../4_production/step4.out | sed 's/Parameter    :/ /' > temperature.dat

As previous step, we can use gunplot script to plot the parameter IDs (= replica temperatures)  in each snapshot.

# make input file for gnuplot
cat << EOF > tmp.plt
set term png
set mxtics
set mytics
set xlabel "Time (ps)"
set ylabel "Temperature (K)"
set output "output_temperature_rep1.png"
plot "temperature.dat" using (\$0*10.5/1000):1  with lines lc rgb "blue" title "repID=1 "
set output "output_temperature_rep2.png"
plot "temperature.dat" using (\$0*10.5/1000):2  with lines lc rgb "blue" title "repID=2 "
set output "output_temperature_rep3.png"
plot "temperature.dat" using (\$0*10.5/1000):3  with lines lc rgb "blue" title "repID=3 "
set output "output_temperature_rep4.png"
plot "temperature.dat" using (\$0*10.5/1000):4  with lines lc rgb "blue" title "repID=4 "
EOF

# execute gnuplot
gnuplot tmp.plt

The temperatures of each replica during the simulation are distributed in all temperatures assigned. It means that correct annealing of the system is realized.

5.3. Sort coordinates in DCD trajectory files by parameters

The temperature in output DCD files of gREST simulation have all range of temperatures, due to the exchange. Therefore, to analyze the simulation further, we first need to sort the frames in the trajectory based on their temperature. To do that, we use GENESIS analysis tool (remd_convert). Sorting is done based on the information written in remfiles generated from the gREST simulation. Concomitantly, we also sort log and energy files for each replica based on temperature parameters. The sorted ene files will be used in the next step in MBAR analysis.

# change directory
$ cd ../3_sort
$ ls 
remd_convert.inp

# Sort frames by parameters
$ $GENESIS_BIN_DIR/remd_convert remd_convert.inp | tee remd_convert.out

This example sorts the trajectory of the solute temperature of 300.00 K (parameterID = 1) (convert_type = PARAMETER, convert_ids = 1 in [OPTION] section). The system is aligned to the backbone of central alanine during the sorting (fitting_atom = 2 in [FITTING] section, group2 = resno:2 and (an:N or an:CA or an:C or an:O in [SELECTION] section), and the water molecules are removed from the sorted trajectory (trjout_atom = 1 in [OPTION] section, group1 = ai:1-42 in [SELECTION] section). The control file “remd_convert.inp” is as follows:

[INPUT]
psffile = ../../1_setup/wbox.psf
reffile = ../../1_setup/wbox.pdb
dcdfile = ../../4_production/step4_rep{}.dcd    # see remd_conv.sh
remfile = ../../4_production/step4_rep{}.rem    # see remd_conv.sh
enefile = ../../4_production/step4_rep{}.ene  # enefile of gREST

[OUTPUT]
pdbfile         = param.pdb    # pdbfile
trjfile         = param{}.dcd  # sorted dcdfile
enefile         = param{}.ene  # sorted enefile

[SELECTION]
group1          = ai:1-42   # only tri-alanine
group2          = resno:2 and (an:N or an:CA or an:C or an:O) # backbone of 2nd ALA

[FITTING]
fitting_method  = TR+ROT    # center-of-mass translation + rotation fitting
fitting_atom    = 2         # fitting to backbone of central alanine
mass_weight     = YES       # mass-weighted fitting

[OPTION]
check_only      = NO
convert_type    = PARAMETER
convert_ids     =           # only lowest T replicas
num_replicas    = 4
nsteps          = 3000000
exchange_period = 3000
crdout_period   = 300       # trjectory output frequency in REMD
eneout_period   = 300       # energy output frequency in REMD
trjout_format   = DCD
trjout_type     = COOR+BOX
trjout_atom     = 1         # output only tri-alanine moiety
pbc_correct     = NO        # nothing will happen when water mols excluded

Now we have sorted temperature log, ene and DCD file which will be used in the following analysis steps.

One can check the sorted trajectory by following commands:

# Check the sorted dcd file for ParameterID 1 (300.00 K)
$ vmd ./param.pdb ./param1.dcd

param*.ene” files contain the energies at different solute temperatures at the snapshots of each replica. For example, param1.ene is shown below.  The second column represents the potential energy at 300.00 K. The third, fourth, and fifth columns represent the potential energies, which are estimated at 318.12 K, 337.14 K, and 351.26 K, respectively, using the trajectory of 300.00 K.

 300       -38420.8021     -38420.7701     -38420.6700     -38420.5615
 600       -38437.8129     -38437.7333     -38437.5876     -38437.4485
 900       -38454.1105     -38454.6977     -38455.1828     -38455.4708
1200       -38523.4070     -38523.3613     -38523.2382     -38523.1079
1500       -38452.2631     -38452.5241     -38452.6885     -38452.7519
1800       -38472.7160     -38472.7599     -38472.7295     -38472.6667

5.4. Calculating end to end distance

In order to calculate the potential of the mean force (PMF) of the end-to-end distance distribution, in the current subsection we calculate the distance between the two terminal alanine (OY_HNT).

# change directory
$ cd ../4_calc_distance
$ ls
calc_dist.sh

# make the file executable and execute the script
$ chmod u+x calc_dist.sh
$ ./calc_dist.sh

In the script “calc_dist.sh”, we use GENESIS analysis tool trj_analysis as follow:

for i in 1 2 3 4; do

# Make input files for each Parameter ID.
cat << EOF > inp${i}
[INPUT]
reffile = ../3_sort/param.pdb

[OUTPUT]
disfile = param${i}.dis

[TRAJECTORY]
trjfile1      = ../3_sort/param${i}.dcd
md_step1      = 3000000         # number of MD steps
mdout_period1 = 300             # MD output period
ana_period1   = 300             # analysis period
repeat1       = 1
trj_format    = DCD             # (PDB/DCD)
trj_type      = COOR+BOX        # (COOR/COOR+BOX)
trj_natom     = 0               # (0:uses reference PDB atom count)

[OPTION]
check_only    = NO              # (YES/NO)
distance1     = PROA:1:ALA:OY PROA:3:ALA:HNT
EOF

# Execute trj_analysis
trj_analysis ./inp${i}

done
5.5. MBAR analysis

In order to use conformers from temperatures higher than the target temperature (300K), we use the MBAR method. MBAR reweights each snapshot of each replica into the target temperature and provide the unbiased weight for each snapshot:

\( \displaystyle W_{jn} = \frac{1}{c} \frac{\exp [-\beta_0 U_0(x_{jn})]}{\sum_k N_k \exp [\beta_0 (f_k – U_k(x_{jn}))]},\)

where \(x_{jn}\) is a configuration of snapshot \(n\) at replica \(j\), \(f_k\) is the free energy of replica \(k\), and \(c\) is a normalization constant. \(U_0\) is the potential energy function at \(\beta_0\). \(U_k(x_{jn})\) corresponds to energies in param*.ene files. Please note that the values of the solute temperatures are not required for reweighting because their information is already included in \(U_k(x_{jn})\).

We apply GENESIS mbar_analysis tool where we use our sorted energy files as cv.

# change directory
$ cd ../5_mbar
$ ls
mbar_analysis.inp


# We run the analysis using the following command.
$ $GENESIS_BIN_DIR/mbar_analysis mbar_analysis.inp | tee mbar_analysis.log 

The control file ” mbar_analysis.inp” is as follows:

[INPUT]
cvfile             = ../3_sort/param{}.ene # input cv file

[OUTPUT]
fenefile           = fene.dat
weightfile         = weight{}.dat

[MBAR]
nreplica           = 4
input_type         = REST
dimension          = 1
nblocks            = 1
temperature        = 300.00
target_temperature = 300.00

input_type is set to REST to reweight gREST trajectories. temperature and target_temperature are set to 300.00. As explained above, the information about solute temperatures are not needed for reweighting. mbar_analysis produces “fene.dat” file containing the evaluated relative free energies  and 4 “weight*.dat” files containing the weights of each snapshot for each replica, which are reweighted to 300.00 K. For example, weight1.dat is shown below. The first line corresponds to the weight of the first snapshot in param1.dcd. Each weight represents the probability of each snapshot at 300.00 K.

$ head weight1.dat
         300  1.836119160862883E-005
         600  1.962571423551713E-005
         900  4.720421550879687E-006
        1200  1.883861114197157E-005
        1500  1.100339116123613E-005
        1800  1.636948804025372E-005
        2100  1.812549281313518E-005
        2400  2.113452205982935E-005
        2700  8.409791663698055E-006
        3000  1.536017798686323E-005


5.6. Calculating PMF of distance distribution

The final step of this tutorial is to use the calculated distances in 5.4. and weight files from MBAR analysis (5.5.) to calculate the PMF of the end-to-end distance distribution in Ala3. We use another tool in GENESIS (pmf_analysis) as follow:

# change directory
$ cd ../6_pmf
$ ls
pmf_analysis.inp plot_pmf.sh

# We run the analysis using the following command: 
$ $GENESIS_BIN_DIR/pmf_analysis pmf_analysis.inp | tee pmf_analysis.log 

The control file ” pmf_analysis.inp” is as follows:

[INPUT]
cvfile       = ../4_calc_dist/param{}.dis
weightfile   = ../5_mbar/weight{}.dat

[OUTPUT]
pmffile      = dist.pmf

[OPTION]
nreplica     = 4             # number of replicas
dimension    = 1             # dimension of cv space
temperature  = 300.0
grids1       = 0 15 101      # (min max num_of_bins)
band_width1  = 0.1
is_periodic1 = NO            # periodicity of cv1

We plot the PMF using the provided script, “plot_pmf.sh”.

# make the file executable and plot PMF
$ chmod u+x plot_pmf.sh
$ ./plot_pmf.sh

We can see that there is the global energy minimum around r = 10 Å and a local energy minimum around r = 2.8 Å. The latter corresponds to the α-helix conformation, where the hydrogen bond between OY and HNT is formed. These results suggest that in water the alanine tripeptide tends to form an extended conformation rather than α-helix. 

References

  1. P. Liu, B. Kim, R. A. Friesner, B. J. Berne, Replica exchange with solute tempering: A method for sampling biological systems in explicit water. Proc. Natl. Acad. Sci. U.S.A.102, 13749–13754 (2005).
  2. L. Wang, R. A. Friesner, B. J. Berne, Replica exchange with solute scaling: A more 683efficient version of replica exchange with solute tempering (REST2). J. Phys. Chem. B 115, 9431–9438 (2011).
  3. M. Kamiya, Y. Sugita, Flexible selection of the solute region in replica exchange with solute tempering: Application to protein-folding simulations. J. Chem. Phys. 149, 072304 (2018)
  4. M. Shirts et al., J. Chem. Phys., 129, 124105-124114 (2008).

Written Mar 1, 2018 by Motoshi Kamiya@RIKEN Computational biophysics research team

Updated Feb 25, 2019 by Yasuhiro Matsunaga@RIKEN R-CCS

Updated Aug 31, 2019 by Suyong Re@RIKEN BDR

Updated Jul 2, 2020 by Chigusa Kobayashi@RIKEN R-CCS

Updated Jul 1, 2022 by Hisham Dokainish@RIKEN CPR

Updated Sep 14, 2022 by Hiraku Oshima@RIKEN BDR