Glycans (carbohydrates) are a major group of biopolymers with structural and functional importance in biology. Protein glycosylation is a post-translational modification observed in all domains of life.1 It is estimated that most proteins in nature are glycosylated, and the most abundant type of protein glycosylation is N-glycosylation.2 Studying glycans and glycoconjugates are important to understand the many processes for which they are involved, including protein folding, solvation, trafficking, protease protection, and more.3

Experimental identification of glycans are particularly challenging compared to proteins or nucleic acids because they can exhibit extensive branching. Their synthesis is not explicitly encoded in genes, but rather result from the differential expression of carbohydrate enzymes.4 Furthermore, there exist many structural and stereochemical isomers for each monomer, producing an enormous space for possible glycan structures. Simulation of glycans is therefore of particular importance to provide a theoretical reference to experimental measurements.

Meanwhile, simulation of glycans also presents challenges. Glycans have many degrees of freedom, thereby we must consider a vast set of conformations to accurately model their behavior. Due to the many sites for intramolecular hydrogen bonds, the energetic barrier between conformers is high. While enhanced sampling methods may be important for accurately simulating the dynamics of glycan systems, we will perform conventional molecular dynamics (MD) simulation for this tutorial.

Setup System in CHARMM-GUI

CHARMM-GUI offers a Glycan Reader & Modeler convenient for setting up simulations of glycan systems using the CHARMM force field.5 Let us use this tool to obtain the initial coordinates and parameter files for the simulation.

From the Input Generator column of CHARMM-GUI, select Glycan Reader & Modeler. Because our system is glycan only, select “Glycan Only System” from the bottom of the page.

Building the Structure

Let us build the glycan molecule from its sequence. Specify whether the anomer is α or β, select the monosaccharide, and define the positions of the glycosidic bonds. If branches are present, click the + button of the unit from which branching occurs. No trouble if you make a mistake in entering the structures, just click the button to delete a unit. Confirm the structure displayed in Sequence Graph is consistent with the reference structure. Once complete, select “Next Step: Generate PDB/PSF”.

Solvation

The built molecule is embedded in three dimensional coordinates. This model is subsequently solvated in a waterbox. Let us select “Fit Waterbox Size to Protein Size” from the Waterbox Size Options section. Specify the edge distance to be 17.0Å for a rectangular box. We deselect “Include Ions”. Select “Next Step: Solvate Molecule”.

Periodic Boundary Condition

The glycan model is now solvated in a waterbox with no ions. Let us simply fit the grid information for particle-mesh Ewald (PME) fast Fourier transform (FFT) automatically to the system. In the Periodic Boundary Condition Options section, select “Generate grid information for PME FFT automatically”, and move to next section by clicking “Next Step: Setup Periodic Boundary Condition”.

Input File Generation

The simulation system has been set up. We select the input files for equilibration and production. Let us use the CHARMM36m force field in the Force Field Options. Select “GENESIS” from the Input Generation Options. We will modify these files later, but for now, select “NVT Ensemble” for both equilibration and dynamics inputs. Finally, set the temperature to 300.0K. Complete the selections by selecting “Next Step: Generate Equilibration and Dynamics inputs”.

Download

The input files are now ready to download. Select “download.tgz” from the right to obtain files as a compressed directory. Transfer the compressed directory to a project folder to perform simulation with GENESIS.

Extract the directory and navigate to the genesis/ subdirectory.

# extract the directory of generated files
tar -xzf charmm-gui.tgz
# go to the directory containing the GENESIS input files
cd charmm-gui-4212660946/genesis/
ls
output
 restraints
 step3_input.crd
 step3_input.pdb
 step3_input.psf
 step4.0_minimization.inp
 step4.1_equilibration.inp
 step5_production.inp
 sysinfo.dat

The input scripts can be modified for our use. Consider downloading the input scripts here, or modify them yourself.

Minimization

We first perform energy minimization to the system to obtain a representative starting structure to simulation. The downloaded script, step4.0_minimization.inp can be used with no modification. Let us inspect the input script.

more step4.0_minimization.inp
output
 [INPUT]
 topfile = ../toppar/top_all36_prot.rtf, ../toppar/top_all36_na.rtf, ../toppar/top_all36_carb.rtf, ../toppar/top_all36_lipid.rtf, ../toppar/top_all36_cgenff.rtf, ../toppar/top_interface.rtf 
 parfile = ../toppar/par_all36m_prot.prm, ../toppar/par_all36_na.prm, ../toppar/par_all36_carb.prm, ../toppar/par_all36_lipid.prm, ../toppar/par_all36_cgenff.prm, ../toppar/par_interface.prm 
 strfile = ../toppar/toppar_all36_moreions.str, ../toppar/toppar_all36_nano_lig.str, ../toppar/toppar_all36_nano_lig_patch.str, ../toppar/toppar_all36_synthetic_polymer.str, ../toppar/toppar_all36_synthetic_polymer_patch.str, ../toppar/toppar_all36_polymer_solvent.str, ../toppar/toppar_water_ions.str, ../toppar/toppar_dum_noble_gases.str, ../toppar/toppar_ions_won.str, ../toppar/toppar_all36_prot_arg0.str, ../toppar/toppar_all36_prot_c36m_d_aminoacids.str, ../toppar/toppar_all36_prot_fluoro_alkanes.str, ../toppar/toppar_all36_prot_heme.str, ../toppar/toppar_all36_prot_na_combined.str, ../toppar/toppar_all36_prot_retinol.str, ../toppar/toppar_all36_prot_model.str, ../toppar/toppar_all36_prot_modify_res.str, ../toppar/toppar_all36_na_nad_ppi.str, ../toppar/toppar_all36_na_rna_modified.str, ../toppar/toppar_all36_lipid_sphingo.str, ../toppar/toppar_all36_lipid_archaeal.str, ../toppar/toppar_all36_lipid_bacterial.str, ../toppar/toppar_all36_lipid_cardiolipin.str, ../toppar/toppar_all36_lipid_cholesterol.str, ../toppar/toppar_all36_lipid_dag.str, ../toppar/toppar_all36_lipid_inositol.str, ../toppar/toppar_all36_lipid_lnp.str, ../toppar/toppar_all36_lipid_lps.str, ../toppar/toppar_all36_lipid_mycobacterial.str, ../toppar/toppar_all36_lipid_miscellaneous.str, ../toppar/toppar_all36_lipid_model.str, ../toppar/toppar_all36_lipid_prot.str, ../toppar/toppar_all36_lipid_tag.str, ../toppar/toppar_all36_lipid_yeast.str, ../toppar/toppar_all36_lipid_hmmm.str, ../toppar/toppar_all36_lipid_detergent.str, ../toppar/toppar_all36_lipid_ether.str, ../toppar/toppar_all36_carb_glycolipid.str, ../toppar/toppar_all36_carb_glycopeptide.str, ../toppar/toppar_all36_carb_imlab.str, ../toppar/toppar_all36_label_spin.str, ../toppar/toppar_all36_label_fluorophore.str 
 
 psffile = step3_input.psf                 # protein structure file
 pdbfile = step3_input.pdb                 # PDB file
 reffile = step3_input.pdb
 localresfile = restraints/step4.0_minimization.dihe
  
 [OUTPUT]
 rstfile = step4.0_minimization.rst
 
 [ENERGY]
 forcefield      = CHARMM        # [CHARMM]
 electrostatic   = PME           # [CUTOFF,PME]
 switchdist      = 10.0          # switch distance
 cutoffdist      = 12.0          # cutoff distance
 pairlistdist    = 13.5          # pair-list distance
 pme_nspline     = 4
 water_model     = NONE
 vdw_force_switch = YES
 contact_check   = YES          # avoid atomic crash
 
 [MINIMIZE]
 method          = SD
 nsteps          = 10000
 rstout_period   = 100
  
 [CONSTRAINTS]
 rigid_bond      = NO
 fast_water      = NO
 shake_tolerance = 1.0D-10
 
 [BOUNDARY]
 type            = PBC           # [PBC]
 box_size_x      = 67
 box_size_y      = 67
 box_size_z      = 67
 
 [SELECTION]
 group1          = (sid:CARB) and not hydrogen
 
 [RESTRAINTS]
 nfunctions      = 1
 
 function1       = POSI
 constant1       = 1.0
 select_index1   = 1

The file begins by listing the topology, parameter, and structure files downloaded from the CHARMM-GUI. We will only be using those for carbohydrates (e.g. ../toppar/top_all36_carb.rtf).

Then we have the protein structure file (PSF), protein database (PDB) file, and the reference PDB file for the system as we prepared it on the CHARMM-GUI. The output for the minimization calculation is a restart file, named step4.0_minimization.rst. The various parameters are listed in the subsequent sections.

Run the minimization as follows:

export OMP_NUM_THREADS=8
bindir=/home/user/genesis/bin

mpirun -np 8 ${bindir}/spdyn step4.0_minimization.inp > step4.0_minimization.log

Confirm we performed the calculation as we expected. Let us take the log file and extract the system information during minimization. Extract the second and third fields, which contain the step number and potential energy, respectively. Then convert the values from space-separated to comma-separated and save to file.

# extract relevant information from the log file and save to CSV
grep 'INFO:' step4.0_minimization.log | awk '{print $2,$3}' | sed 's/ /,/g' > results/step4.0_minimization.csv
# print the first few lines
head results/step4.0_minimization.csv
output
 ##STEP,POTENTIAL_ENE
 0,-111631.6654
 10,-111672.9057
 20,-111712.9913
 30,-111751.4066
 40,-111788.6135
 50,-111824.8910
 60,-111860.2577
 70,-111894.8479
 80,-111928.7750

Visualize the time dependent potential energy. Let us use the Python packages Pandas6 and Seaborn7 for this tutorial.

# import relevant packages
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# load data
df = pd.read_csv("results/step4.0_minimization.csv")

# Format the figure. Here we use a white background, 
# colorblind-friendly palette, and remove the line on the right and top
sns.set_theme(style= 'white', palette='colorblind', 
              rc={"axes.spines.right": False,  "axes.spines.top": False, 
              'xtick.bottom': True, 'ytick.left': True})

# define the figure, plot a line from the data, and show figure
plt.figure()
fig = sns.lineplot(data = df, x = 'STEP', y = 'POTENTIAL_ENE')
# label the axes
fig.set(xlabel ="step", ylabel = "potential energy")
plt.show()
plot

We confirm the potential energy is decreasing at each step as expected. Because the potential energy is plateauing, we observe the minimization is sufficiently converged.

Equilibration

We need to equilibrate the system in preparation to a production run. We first perform equilibration in the NPT ensemble to determine the system volume at 1.0atm pressure. Then we let the system equilibrate with the NVT ensemble.

NPT Equilibration

Let us use the downloaded file, step4.1_equilibration.inp as a template and write step4.1_equilibrationNPT.inp.

more step4.1_equilibrationNPT.inp
output
 [INPUT]
 topfile = ../toppar/top_all36_prot.rtf, ../toppar/top_all36_na.rtf, ../toppar/top_all36_carb.rtf, ../toppar/top_all36_lipid.rtf, ../toppar/top_all36_cgenff.rtf, ../toppar/top_interface.rtf 
 parfile = ../toppar/par_all36m_prot.prm, ../toppar/par_all36_na.prm, ../toppar/par_all36_carb.prm, ../toppar/par_all36_lipid.prm, ../toppar/par_all36_cgenff.prm, ../toppar/par_interface.prm 
 strfile = ../toppar/toppar_all36_moreions.str, ../toppar/toppar_all36_nano_lig.str, ../toppar/toppar_all36_nano_lig_patch.str, ../toppar/toppar_all36_synthetic_polymer.str, ../toppar/toppar_all36_synthetic_polymer_patch.str, ../toppar/toppar_all36_polymer_solvent.str, ../toppar/toppar_water_ions.str, ../toppar/toppar_dum_noble_gases.str, ../toppar/toppar_ions_won.str, ../toppar/toppar_all36_prot_arg0.str, ../toppar/toppar_all36_prot_c36m_d_aminoacids.str, ../toppar/toppar_all36_prot_fluoro_alkanes.str, ../toppar/toppar_all36_prot_heme.str, ../toppar/toppar_all36_prot_na_combined.str, ../toppar/toppar_all36_prot_retinol.str, ../toppar/toppar_all36_prot_model.str, ../toppar/toppar_all36_prot_modify_res.str, ../toppar/toppar_all36_na_nad_ppi.str, ../toppar/toppar_all36_na_rna_modified.str, ../toppar/toppar_all36_lipid_sphingo.str, ../toppar/toppar_all36_lipid_archaeal.str, ../toppar/toppar_all36_lipid_bacterial.str, ../toppar/toppar_all36_lipid_cardiolipin.str, ../toppar/toppar_all36_lipid_cholesterol.str, ../toppar/toppar_all36_lipid_dag.str, ../toppar/toppar_all36_lipid_inositol.str, ../toppar/toppar_all36_lipid_lnp.str, ../toppar/toppar_all36_lipid_lps.str, ../toppar/toppar_all36_lipid_mycobacterial.str, ../toppar/toppar_all36_lipid_miscellaneous.str, ../toppar/toppar_all36_lipid_model.str, ../toppar/toppar_all36_lipid_prot.str, ../toppar/toppar_all36_lipid_tag.str, ../toppar/toppar_all36_lipid_yeast.str, ../toppar/toppar_all36_lipid_hmmm.str, ../toppar/toppar_all36_lipid_detergent.str, ../toppar/toppar_all36_lipid_ether.str, ../toppar/toppar_all36_carb_glycolipid.str, ../toppar/toppar_all36_carb_glycopeptide.str, ../toppar/toppar_all36_carb_imlab.str, ../toppar/toppar_all36_label_spin.str, ../toppar/toppar_all36_label_fluorophore.str 
 
 psffile = step3_input.psf                 # protein structure file
 pdbfile = step3_input.pdb                 # PDB file
 reffile = step3_input.pdb
 rstfile = step4.0_minimization.rst              # restart file
 localresfile = restraints/step4.1_equilibration.dihe
  
 [OUTPUT]
 rstfile = step4.1_equilibrationNPT.rst
 dcdfile = step4.1_equilibrationNPT.dcd
 
 [ENERGY]
 forcefield      = CHARMM        # [CHARMM]
 electrostatic   = PME           # [CUTOFF,PME]
 switchdist      = 10.0          # switch distance
 cutoffdist      = 12.0          # cutoff distance
 pairlistdist    = 13.5          # pair-list distance
 pme_nspline     = 4
 water_model     = NONE
 vdw_force_switch = YES
 
 [DYNAMICS]
 integrator      = LEAP          # [LEAP,VVER]
 timestep        = 0.001         # timestep (ps)
 nsteps          = 1000000        # number of MD steps
 crdout_period   = 5000
 eneout_period   = 1000          # energy output period
 rstout_period   = 1000      
 nbupdate_period = 10
  
 [CONSTRAINTS]
 rigid_bond      = YES           # constraints all bonds involving hydrogen
 fast_water      = YES         
 shake_tolerance = 1.0D-10
 
 [ENSEMBLE]
 ensemble        = NPT           # [NVE,NVT,NPT]
 tpcontrol       = LANGEVIN      # thermostat and barostat
 temperature     = 300
 gamma_t         = 1.0
 pressure = 1.0
 isotropy = ISO
 
 [BOUNDARY]
 type            = PBC           # [PBC]
 
 [SELECTION]
 group1          = (sid:CARB) and not hydrogen
 
 [RESTRAINTS]
 nfunctions      = 1
 
 function1       = POSI
 constant1       = 1.0
 select_index1   = 1

For the [OUTPUT] section, we specified the generated files are NPT ensemble. We also modify the [ENSEMBLE] section. We set the ensemble to NPT and define the target pressure to 1.0 atm. We also set isotropy to ISO to couple the XYZ dimensions.

Perform the equilibration using NPT ensemble.

mpirun -np 8 ${bindir}/spdyn step4.1_equilibrationNPT.inp > step4.1_equilibrationNPT.log

Again, confirm the calculation proceeded as expected. Extract the relevant information from the log file, and visualize relevant variables.

# save the log to csv
grep 'INFO:' step4.1_equilibrationNPT.log | awk '{print $3,$16,$17,$18,$22}' | sed 's/ /,/g' > results/step4.1_equilibrationNPT.csv

head results/step4.1_equilibrationNPT.csv
output
 ##TIME,TEMPERATURE,VOLUME,BOXX,PRESSURE
 0.0000,300.2294,300763.0000,67.0000,-782.1020
 1.0000,242.9415,281611.6217,65.5466,95.1258
 2.0000,273.5984,281125.2820,65.5088,110.5847
 3.0000,282.1065,279581.6117,65.3887,110.2059
 4.0000,289.9704,279243.7750,65.3624,-0.6163
 5.0000,292.7471,281674.2495,65.5515,-211.6280
 6.0000,300.1268,281377.5038,65.5284,39.0473
 7.0000,299.6401,282233.8679,65.5948,9.3761
 8.0000,299.3061,282535.8499,65.6182,167.9042
equilibrationNPT = pd.read_csv('results/step4.1_equilibrationNPT.csv')

# define the figure, plot a line from the data, and show figure
plt.figure()
fig = sns.lineplot(data = equilibrationNPT, x = 'TIME', y = 'PRESSURE', lw= 0.5)
# label the axes
fig.set(xlabel ="time (ps)", ylabel = "pressure (atm)")
# add line to reference pressure
fig.axhline(1.0, alpha = 0.5, linestyle ='--')

plt.show()
plot

The pressure is initially very negative. It approaches 1.0atm and oscillates around the target value. The pressure is maintained by adjusting the volume of the system. Observe the change over time of the edge length of the simulation box.

plt.figure()
fig = sns.lineplot(data = equilibrationNPT, x = 'TIME', y = 'BOXX', lw= 0.5)
fig.set(xlabel ="time (ps)", ylabel = "edge length (Å)")
plt.show()
plot

We observe the edge length approaches a value between 65.5Å and 66.0Å over time. While the fluctuation in the pressure may appear large, we observe the change in volume to maintain the pressure is quite small. Let us also confirm the system has approached the target temperature of 300. K.

plt.figure()
fig = sns.lineplot(data = equilibrationNPT, x = 'TIME', y = 'TEMPERATURE', lw= 0.5)
fig.set(xlabel ="time (ps)", ylabel = "temperature (K)")
fig.axhline(300., alpha = 0.5, linestyle ='--')
plt.show()
plot

NVT Equilibration

Let us take the last frame from the NPT equilibration and use it as the input for the NVT equilibration. The control file, step4.2_equilibrationNVT.inp, was written using step4.1_equilibration.inp as the template. Let us view the control file.

more step4.2_equilibrationNVT.inp
output
 [INPUT]
 topfile = ../toppar/top_all36_prot.rtf, ../toppar/top_all36_na.rtf, ../toppar/top_all36_carb.rtf, ../toppar/top_all36_lipid.rtf, ../toppar/top_all36_cgenff.rtf, ../toppar/top_interface.rtf 
 parfile = ../toppar/par_all36m_prot.prm, ../toppar/par_all36_na.prm, ../toppar/par_all36_carb.prm, ../toppar/par_all36_lipid.prm, ../toppar/par_all36_cgenff.prm, ../toppar/par_interface.prm 
 strfile = ../toppar/toppar_all36_moreions.str, ../toppar/toppar_all36_nano_lig.str, ../toppar/toppar_all36_nano_lig_patch.str, ../toppar/toppar_all36_synthetic_polymer.str, ../toppar/toppar_all36_synthetic_polymer_patch.str, ../toppar/toppar_all36_polymer_solvent.str, ../toppar/toppar_water_ions.str, ../toppar/toppar_dum_noble_gases.str, ../toppar/toppar_ions_won.str, ../toppar/toppar_all36_prot_arg0.str, ../toppar/toppar_all36_prot_c36m_d_aminoacids.str, ../toppar/toppar_all36_prot_fluoro_alkanes.str, ../toppar/toppar_all36_prot_heme.str, ../toppar/toppar_all36_prot_na_combined.str, ../toppar/toppar_all36_prot_retinol.str, ../toppar/toppar_all36_prot_model.str, ../toppar/toppar_all36_prot_modify_res.str, ../toppar/toppar_all36_na_nad_ppi.str, ../toppar/toppar_all36_na_rna_modified.str, ../toppar/toppar_all36_lipid_sphingo.str, ../toppar/toppar_all36_lipid_archaeal.str, ../toppar/toppar_all36_lipid_bacterial.str, ../toppar/toppar_all36_lipid_cardiolipin.str, ../toppar/toppar_all36_lipid_cholesterol.str, ../toppar/toppar_all36_lipid_dag.str, ../toppar/toppar_all36_lipid_inositol.str, ../toppar/toppar_all36_lipid_lnp.str, ../toppar/toppar_all36_lipid_lps.str, ../toppar/toppar_all36_lipid_mycobacterial.str, ../toppar/toppar_all36_lipid_miscellaneous.str, ../toppar/toppar_all36_lipid_model.str, ../toppar/toppar_all36_lipid_prot.str, ../toppar/toppar_all36_lipid_tag.str, ../toppar/toppar_all36_lipid_yeast.str, ../toppar/toppar_all36_lipid_hmmm.str, ../toppar/toppar_all36_lipid_detergent.str, ../toppar/toppar_all36_lipid_ether.str, ../toppar/toppar_all36_carb_glycolipid.str, ../toppar/toppar_all36_carb_glycopeptide.str, ../toppar/toppar_all36_carb_imlab.str, ../toppar/toppar_all36_label_spin.str, ../toppar/toppar_all36_label_fluorophore.str 
 
 psffile = step3_input.psf                 # protein structure file
 pdbfile = step3_input.pdb                 # PDB file
 reffile = step3_input.pdb
 rstfile = step4.1_equilibrationNPT.rst              # restart file
 localresfile = restraints/step4.1_equilibration.dihe
  
 [OUTPUT]
 rstfile = step4.2_equilibrationNVT.rst
 dcdfile = step4.2_equilibrationNVT.dcd
 
 [ENERGY]
 forcefield      = CHARMM        # [CHARMM]
 electrostatic   = PME           # [CUTOFF,PME]
 switchdist      = 10.0          # switch distance
 cutoffdist      = 12.0          # cutoff distance
 pairlistdist    = 13.5          # pair-list distance
 pme_nspline     = 4
 water_model     = NONE
 vdw_force_switch = YES
 
 [DYNAMICS]
 integrator      = VRES          # [LEAP,VVER,VRES]
 timestep        = 0.0025         # timestep (ps)
 nsteps          = 400000        # number of MD steps
 crdout_period   = 5000
 eneout_period   = 1000          # energy output period
 rstout_period   = 1000      
 nbupdate_period = 10
  
 [CONSTRAINTS]
 rigid_bond      = YES           # constraints all bonds involving hydrogen
 fast_water      = YES         
 shake_tolerance = 1.0D-10
 
 [ENSEMBLE]
 ensemble        = NVT           # [NVE,NVT,NPT]
 tpcontrol       = BUSSI      # thermostat and barostat
 temperature     = 300
 gamma_t         = 1.0
 pressure = 1.0
 isotropy = ISO
 group_tp        = YES  # usage of group tempeature and pressure
 
 [BOUNDARY]
 type            = PBC           # [PBC]
 
 [SELECTION]
 group1          = (sid:CARB) and not hydrogen
 
 [RESTRAINTS]
 nfunctions      = 1
 
 function1       = POSI
 constant1       = 1.0
 select_index1   = 1

We modified the file I/O, and changed the ensemble to NVT. Run the equilibration.

mpirun -np 8 ${bindir}/spdyn step4.2_equilibrationNVT.inp > step4.2_equilibrationNVT.log

Confirm the simulation proceeded as expected. Confirm the temperature and pressure are stable over time. There should be no change to volume.

# save the log to csv
grep 'INFO:' step4.2_equilibrationNVT.log | awk '{print $3,$16,$17}' | sed 's/ /,/g' > results/step4.2_equilibrationNVT.csv

head results/step4.2_equilibrationNVT.csv
output
 ##TIME,TEMPERATURE,VOLUME
 1.0000,298.4705,283222.3930
 2.0000,299.3117,283222.3930
 3.0000,298.7946,283222.3930
 4.0000,299.0869,283222.3930
 5.0000,300.4319,283222.3930
 6.0000,298.4922,283222.3930
 7.0000,301.1420,283222.3930
 8.0000,303.3589,283222.3930
 9.0000,301.2478,283222.3930
equilibrationNVT = pd.read_csv('results/step4.2_equilibrationNVT.csv')

plt.figure()
fig = sns.lineplot(data = equilibrationNVT, x = 'TIME', y = 'TEMPERATURE', lw= 0.5)
fig.set(xlabel ="time (ps)", ylabel = "temperature (K)")
fig.axhline(300., alpha = 0.5, linestyle ='--')
plt.show()
plot

plt.figure()
fig = sns.lineplot(data = equilibrationNVT, x = 'TIME', y = 'VOLUME', lw= 0.5)
fig.set(xlabel ="time (ps)", ylabel = "volume (Å^3)")
plt.show()
plot

Production

The N-glycan is now ready for a productive run. Let us simulate the system at 300K for 1.0ns using step size of 2.5fs. The input script is as follows.

more step5.0_productionNVT.inp
output
 [INPUT]
 topfile = ../toppar/top_all36_prot.rtf, ../toppar/top_all36_na.rtf, ../toppar/top_all36_carb.rtf, ../toppar/top_all36_lipid.rtf, ../toppar/top_all36_cgenff.rtf, ../toppar/top_interface.rtf 
 parfile = ../toppar/par_all36m_prot.prm, ../toppar/par_all36_na.prm, ../toppar/par_all36_carb.prm, ../toppar/par_all36_lipid.prm, ../toppar/par_all36_cgenff.prm, ../toppar/par_interface.prm 
 strfile = ../toppar/toppar_all36_moreions.str, ../toppar/toppar_all36_nano_lig.str, ../toppar/toppar_all36_nano_lig_patch.str, ../toppar/toppar_all36_synthetic_polymer.str, ../toppar/toppar_all36_synthetic_polymer_patch.str, ../toppar/toppar_all36_polymer_solvent.str, ../toppar/toppar_water_ions.str, ../toppar/toppar_dum_noble_gases.str, ../toppar/toppar_ions_won.str, ../toppar/toppar_all36_prot_arg0.str, ../toppar/toppar_all36_prot_c36m_d_aminoacids.str, ../toppar/toppar_all36_prot_fluoro_alkanes.str, ../toppar/toppar_all36_prot_heme.str, ../toppar/toppar_all36_prot_na_combined.str, ../toppar/toppar_all36_prot_retinol.str, ../toppar/toppar_all36_prot_model.str, ../toppar/toppar_all36_prot_modify_res.str, ../toppar/toppar_all36_na_nad_ppi.str, ../toppar/toppar_all36_na_rna_modified.str, ../toppar/toppar_all36_lipid_sphingo.str, ../toppar/toppar_all36_lipid_archaeal.str, ../toppar/toppar_all36_lipid_bacterial.str, ../toppar/toppar_all36_lipid_cardiolipin.str, ../toppar/toppar_all36_lipid_cholesterol.str, ../toppar/toppar_all36_lipid_dag.str, ../toppar/toppar_all36_lipid_inositol.str, ../toppar/toppar_all36_lipid_lnp.str, ../toppar/toppar_all36_lipid_lps.str, ../toppar/toppar_all36_lipid_mycobacterial.str, ../toppar/toppar_all36_lipid_miscellaneous.str, ../toppar/toppar_all36_lipid_model.str, ../toppar/toppar_all36_lipid_prot.str, ../toppar/toppar_all36_lipid_tag.str, ../toppar/toppar_all36_lipid_yeast.str, ../toppar/toppar_all36_lipid_hmmm.str, ../toppar/toppar_all36_lipid_detergent.str, ../toppar/toppar_all36_lipid_ether.str, ../toppar/toppar_all36_carb_glycolipid.str, ../toppar/toppar_all36_carb_glycopeptide.str, ../toppar/toppar_all36_carb_imlab.str, ../toppar/toppar_all36_label_spin.str, ../toppar/toppar_all36_label_fluorophore.str 
 
 psffile = step3_input.psf                 # protein structure file
 pdbfile = step3_input.pdb                 # PDB file
 reffile = step3_input.pdb
 rstfile = step4.3_equilibrationNVT_noRestraint.rst       # restart file
  
 [OUTPUT]
 rstfile = step5.0_productionNVT.rst
 dcdfile = step5.0_productionNVT.dcd
 
 [ENERGY]
 forcefield      = CHARMM        # [CHARMM]
 electrostatic   = PME           # [CUTOFF,PME]
 switchdist      = 10.0          # switch distance
 cutoffdist      = 12.0          # cutoff distance
 pairlistdist    = 13.5          # pair-list distance
 pme_nspline     = 4
 water_model     = NONE
 vdw_force_switch = YES
 
 [DYNAMICS]
 integrator      = VRES          # [LEAP,VVER]
 timestep        = 0.0025         # timestep (ps)
 nsteps          = 400000        # number of MD steps
 crdout_period   = 5000
 eneout_period   = 1000          # energy output period
 rstout_period   = 1000      
 nbupdate_period = 10
 
 [CONSTRAINTS]
 rigid_bond      = YES           # constraints all bonds involving hydrogen
 fast_water      = YES         
 shake_tolerance = 1.0D-10
 
 [ENSEMBLE]
 ensemble        = NVT           # [NVE,NVT,NPT]
 tpcontrol       = BUSSI      # thermostat and barostat
 temperature     = 300
 group_tp        = YES
 
 [BOUNDARY]
 type            = PBC           # [PBC]
 
 [SELECTION]
 group1          = (sid:CARB) and not hydrogen

Submit the calculation

mpirun -np 8 {params.bin_path}spdyn step5.0_productionNVT.inp > step5.0_productionNVT.log

Analysis

Let us confirm the simulation was performed at the target temperature of 300.K. Extract the temperature and time from the log file.

# save the log to csv
grep 'INFO:' step5.0_productionNVT.log | awk '{print $3,$16}' | sed 's/ /,/g' > results/step5.0_productionNVT.csv

head results/step5.0_productionNVT.csv
output
 ##TIME,TEMPERATURE
 2.0000,298.4089
 4.0000,301.4387
 6.0000,304.3761
 8.0000,303.9433
 10.0000,298.7027
 12.0000,299.7664
 14.0000,299.7396
 16.0000,299.1172
 18.0000,298.2199
productionNVT = pd.read_csv('results/step5.0_productionNVT.csv')

plt.figure()
fig = sns.lineplot(data = productionNVT, x = 'TIME', y = 'TEMPERATURE', lw= 0.5)
fig.set(xlabel ="time (ps)", ylabel = "temperature (K)")
fig.axhline(300., alpha = 0.5, linestyle ='--')
plt.show()
plot

As expected, the trajectory fluctuates near the target temperature of 300.K. Let us also visualize the trajectory.

It may be useful to record the number of intramolecular hydrogen bonds over time, as a change in conformation must overcome existing hydrogen bonds to produce a new set of hydrogen bonds. We use the hbond_analysis tool in GENESIS to perform this. Let us take a look at the input file.

more step6.0_hbondAnalysis.inp
output
 # control parameters in hbond_analysis
  
 [INPUT]
 psffile        = step3_input.psf       # protein structure file
 pdbfile        = step3_input.pdb       # PDB file
 
 [OUTPUT]
 txtfile        = step6.0_hbondAnalysis/hbond_analysis.txt      # text file
 hb_listfile  = step6.0_hbondAnalysis/hbond_analysis_().hb_list    # parallel-IO H-bond list file
  
 [TRAJECTORY]
 trjfile1       = step5.0_productionNVT.dcd      # trajectory file
 md_step1       = 500000               # number of MD steps
 mdout_period1  = 5000              # MD output period
 # ana_period1    = 1               # analysis period
 # repeat1        = 1
 trj_format     = DCD             # (PDB/DCD)
 trj_type       = COOR+BOX        # (COOR/COOR+BOX)
 trj_natom      = 0               # (0:uses reference PDB atom count)
  
 [ENSEMBLE]
 ensemble     = NVT             # (NVT/NPT/NVE)
  
 [BOUNDARY]
 type          = PBC       # [PBC]
 box_size_x    = 0.0       # box size (x) in [PBC]
 box_size_y    = 0.0       # box size (y) in [PBC]
 box_size_z    = 0.0       # box size (z) in [PBC]
 domain_x      = 2         # domain size (x)
 domain_y      = 2         # domain size (y)
 domain_z      = 2         # domain size (z)
 num_cells_x   = 8         # number of cells (x)
 num_cells_y   = 8         # number of cells (x)
 num_cells_z   = 8         # number of cells (x)
 
  
 [SELECTION]
  group1         = atomno:1-246             # selection group 1
  group2         = atomno:1-246 # selection group 2
 # mole_name1     = protein  P1:1:TYR P1:5:MET
 # mole_name2     = lipid    OLEO:PCGL:OLEO
  
 [SPANA_OPTION]
 # wrap        = no         # wrap trajectories
 buffer        = 10
 box_size      = TRAJECTORY # (TRAJECTORY / MAX / MANUAL)
 
 [HBOND_OPTION]
 recenter      = 1                # recenter
 output_type   = Count_Atom       # (Count_snap / Count_Atom)
 analysis_atom = 1                # atom group for searching H-bond partners
 target_atom   = 2                # serach the HB partners from this atom group
 solvent_list  = TIP3 POPC        # molecule names treated as solvent (only for count_atom)
 HB_distance = 3.4              # the upper limit of (D .. A) distance (default: 3.4 A)
 DHA_angle   = 120.0            # the lower limit of (D-H .. A) angle (default: 120.0 deg)
 HDA_angle   =  20.0            # the upper limit of (H-D .. A) angle (default:  20.0 deg)

Under “[SELECTION]”, we provide the atom indices corresponding only to the N-glycan molecule as group 1 & 2, not solvent molecules. Therefore, we will only record intramolecular hydrogen bonds. Under “[HBOND_OPTION]”, we specify the output_type to be Count_Atom. This counts the hydrogen bonds observed between a particular pair of atoms. Count_snap, on the other hand, will record the number of hydrogen bonds in a given snapshot.

Let us run the hbond_analysis binary with the input file.

mkdir step6.0_hbondAnalysis

mpirun -np 8 ../../../bin/hbond_analysis step6.0_hbondAnalysis.inp > step6.0_hbondAnalysis.log

The output text file contains the number of hydrogen bonds observed. Let us extract the information for analysis.

# skip lines with comments, and get the lines containing data
# save to CSV
echo 'H-bond_count,atom_ana,residue_ana,res_ana_num,atom_tar,residue_tar,res_tar_num' > step6.0_hbondAnalysis.csv

grep -v '#' step6.0_hbondAnalysis/hbond_analysis.txt | grep '\.\.' | awk '{print $1,$3,$4,$5,$8,$9,$10}' | sed 's/ /,/g' >> step6.0_hbondAnalysis.csv

head step6.0_hbondAnalysis.csv
output
 ##H-bond_count,atom_ana,residue_ana,res_ana_num,atom_tar,residue_tar,res_tar_num
 4,O,BGLCNA,1,O3,BGLCNA,1
 4,O3,BGLCNA,1,O,BGLCNA,1
 4,O3,BGLCNA,1,O5,BGLCNA,2
 1,O3,BGLCNA,1,O6,BGLCNA,2
 4,O5,BGLCNA,2,O3,BGLCNA,1
 2,O,BGLCNA,2,O3,BGLCNA,2
 3,O,BGLCNA,2,O6,AMAN,8
 2,O3,BGLCNA,2,O,BGLCNA,2
 11,O3,BGLCNA,2,O5,BMAN,3

While the hydrogen bond count over the trajectory is given by atom, let us get the values by glycan subunit. Read the data to a DataFrame and record the hydrogen bond count by subunit.

df = pd.read_csv('step6.0_hbondAnalysis.csv')
# sum the number of hydrogen bonds by interacting residues
df = df.groupby(by = ['res_ana_num', 'res_tar_num'], as_index = False)['H-bond_count'].agg('sum')
# convert the DataFrame from long to wide format
df_wide = df.pivot_table(index='res_ana_num', columns='res_tar_num', values='H-bond_count')
# Also, there are 100 frames considered. Let us get the average number 
# of hydrogen bonds between residues by dividing by the number of frames.
df_wide /= 100.

plt.figure()
fig = sns.heatmap(data = df_wide)
fig.set(xlabel ="subunit index", ylabel = "subunit index")
plt.show()
plot

We observe each subunit engage in some degree of hydrogen bonding interactions over the trajectory. The subunits 3 (β-D-mannose) exhibited the highest hydrogen bonding count on average, having 0.44 bond with subunit 2 (β-D-N-acetylglucosamine) and 0.38 bond with subunit 10 (α-D-mannose).

Summary

We simulated a core N-glycan in water for 1.0 ns. The system was equilibrated to the expected temperature and pressure. We may pursue a longer simulation time or enhanced sampling methods to ensure the conformational space has been sufficiently explored.

Bibliography

(1) Schwarz, F.; Aebi, M. Mechanisms and Principles of n-Linked Protein Glycosylation. Current opinion in structural biology 2011, 21 (5), 576–582.
(2) Apweiler, R.; Hermjakob, H.; Sharon, N. On the Frequency of Protein Glycosylation, as Deduced from Analysis of the SWISS-PROT Database. Biochimica et Biophysica Acta (BBA)-General Subjects 1999, 1473 (1), 4–8.
(3) Dwek, R. A. Biological Importance of Glycosylation. In Molecular recognition and inclusion; Springer, 1998; pp 1–6.
(4) Krasnova, L.; Wong, C.-H. Understanding the Chemistry and Biology of Glycosylation with Glycan Synthesis. Annual review of biochemistry 2016, 85, 599–630.
(5) Park, S.-J.; Lee, J.; Qi, Y.; Kern, N. R.; Lee, H. S.; Jo, S.; Joung, I.; Joo, K.; Lee, J.; Im, W. CHARMM-GUI Glycan Modeler for Modeling and Simulation of Carbohydrates and Glycoconjugates. Glycobiology 2019, 29 (4), 320–331.
(6) McKinney, W.; others. Data Structures for Statistical Computing in Python. In Proceedings of the 9th python in science conference; Austin, TX, 2010; Vol. 445, pp 51–56.
(7) Waskom, M.; Botvinnik, O.; O’Kane, D.; Hobson, P.; Lukauskas, S.; Gemperline, D. C.; Augspurger, T.; Halchenko, Y.; Cole, J. B.; Warmenhoven, J.; Ruiter, J. de; Pye, C.; Hoyer, S.; Vanderplas, J.; Villalba, S.; Kunter, G.; Quintero, E.; Bachant, P.; Martin, M.; Meyer, K.; Miles, A.; Ram, Y.; Yarkoni, T.; Williams, M. L.; Evans, C.; Fitzgerald, C.; Brian; Fonnesbeck, C.; Lee, A.; Qalieh, A. Mwaskom/Seaborn: V0.8.1 (September 2017); Zenodo, 2017. https://doi.org/10.5281/zenodo.883859.

Written by Kiyoto Aramis Tanemura @ Michigan State University
June 18, 2022
Updated by Chigusa Kobayashi@RIKEN Center for Computational Science
May 30, 2024