### 16.4 Vibrational analysis of a phosphate ion in solution

#### 1. Introduction

In this section, we will calculate the infrared (IR) spectrum of a phosphate ion (H2PO4) in water solution using QM/MM. This system has been studied by QM/MM [1, 2] and ab initio MD . These works have reported strong solvation effects on the IR spectrum, i.e., there is a large spectral shift in the spectrum between the gas-phase and solution due to strong hydrogen bonds (HBs) of H2PO4 and water molecules. The treatment of HB interaction is of essential importance to reproduce the experimental IR spectrum accurately.

Here, we carry out QM/MM calculations, in which the solute is treated by DFT at the level of B3LYP/cc-pVDZ and the water molecules by TIP3P force field. The QM calculations are performed using Gaussian16. The IR spectrum is obtained by solving the vibrational Schrödinger equation (VSE),

$$\displaystyle \left[ -\frac{1}{2} \sum_{i=1}^f \frac{\partial^2}{\partial Q_i^2} + V(\mathbf{Q}) \right]\Psi_n(\mathbf{Q}) = E_n \Psi_n(\mathbf{Q}),$$

where $$f$$ is the number of degrees of freedom, $$\mathbf{Q} = (Q_1, Q_2, \cdots, Q_f)$$ is the vibrational modes, and $$V(\mathbf{Q})$$ is the potential energy surface (PES). In the harmonic approximation, the PES is truncated at the second-order Taylor expansion around the equilibrium geometry,

$$\displaystyle V(\mathbf{Q}) \simeq V_0 + \sum_{i=1}^f c_{ii} Q_i^2,$$

where the analytical solution to the VSE can be obtained. We also demonstrate anharmonic vibrational calculations, which incorporate the third- and higher-order terms of the PES. The latter is performed by combining GENESIS with SINDO program. SINDO is used to generate the anharmonic potential  and to solve the VSE by the second-order vibrational quasi-degenerate perturbation theory (VQDPT2) [5,6].

#### 2. Preparation

Download the tutorial file (tutorial19-16.4.zip or github), unzip it, and proceed to tutorial-16.4. This directory contains nine sub-directories.

$unzip tutorial19-16.4.zip$ cd tutorial-16.4
$ls 0_qmmm_md/ 1_setup/ 2_minimize/ 3_harmonic/ 4_qff/ 5_grid/ 6_vib/ sindo/ toppar/ 0.qmmm_md contains input files to setup H2PO4 in a water box and to perform QM/MM-MD simulations using DFTB3. The simulation can be performed in the same way as in Section 16.2. Although we just leave the input files and don’t go into the details here, interested readers are encouraged to try them out. 1.setup contains the outcome of the QM/MM-MD simulation, $ ls 1_setup
cmd_100.crd    cmd_100.pdb    cmd_100.psf    qmmm_prod.rst

cmd_100.pdb/psf is H2PO4 in a water sphere of 40 Å diameter obtained from a MD snapshot using qmmm_generator, and qmmm_prod.rst is a restart file of the QM/MM-MD simulation. toppar is already included in the zip file,

$ls toppar po4.prm po4.rtf po4.rtf/prm contain the topology and parameter of H2PO4 and TIP3P water. #### 3. Geometry optimization Let’s proceed to 2_minimize, $ cd 2_minimize
$ls gaussian.com minimize.gpi qmmm_min.inp run.sh runGau.sh gaussian.com is a template to generate an input file of Gaussian, and runGau.sh is a script to invoke Gaussian from GENESIS. qmmm_min.inp is a control file of GENESIS, and run.sh is a script to run GENESIS. We will look into these files in the following. ##### (a) The template file of Gaussian Important keywords of gaussisan.com are listed below. %chk=gaussian.chk (1) ... scf(conver=8) iop(4/5=100) NoSymm Charge Force Prop=(Field,Read) pop=mk (2) ... #coordinate# (3) #charge# (4) #elec_field# (5) (1) The name of a checkpoint file must be “gaussian.chk”. Never remove or change this line. (2) These are keywords for Gaussian. scf(conver=8) runs the SCF calculation with a convergence criterion of 10-8 Hartree. The threshold is reduced by 2 upon restart. iop(4/5=100) is a directive to read molecular orbitals (MO) from a checkpoint file if present. NoSymm prevents reorientation of axis. Charge reads MM coordinates and charges. Force calculates the force in addition to the energy. Prop=(Field,Read) calculates the electric field at given positions. pop=mk calculates the ESP charge of QM atoms obtained by fitting the electrostatic potential generated by the electron density to that of atomic charges. Note that this option must be present if macro=YES in [MINIMIZE] (see below). (3), (4), (5) are replaced by the coordinates of QM atoms, the coordinates and charge of MM atoms, and the coordinates of MM atoms, respectively, at runtime by GENESIS. See Appendix A for further details on the input options. ##### (b) The script file to run Gaussian The first part of runGau.sh is shown below. The text in red must be properly set by users, export g16root=/path/to/gaussian (1) export GAUSS_EXEDIR=$g16root/g16
export GAUSS_EXEBIN=$g16root/g16/g16 export PATH=$PATH:$GAUSS_EXEDIR export LD_LIBRARY_PATH=${GAUSS_EXEDIR}:${LD_LIBRARY_PATH} # --- Set the path for a scratch folder --- scratch=/scr/$USER                      (2)
# (optional)
# --- Set a chkpoint file to read initial MOs ---
#initialchk='../path/to/initial.chk'    (3)

(1) Set a path to a folder where Gaussian is installed.

(2) Set a path to a scratch folder. If your machine has a fast, local disk equipped, it is better to use it as a scratch folder. Guassian (and most QM programs) writes a large intermediate file, so that the speed of disk access may significantly affect the performance.

(3) Optionally, set a checkpoint file to read initial MO. (See Section 4)

Note that GENESIS calls the script in the following way,

$./run.sh The convergence of optimization is checked by the magnitude of root-mean-square gradient (RMSG) and the maximum gradient (MAXG). The tolerance is set to be 0.36 and 0.54 kcal mol-1 Å-1 by default. These values can be changed by tol_rmsg and tol_maxg in the [MINIMIZE] section. In this example, the convergence will reach in around 50 cycles. The following message appears in the end of the log file when the convergence is reached.  >>>>> STOP: RMSG and MAXG becomes sufficiently small Let us plot the variation of the potential energy and MAXG as a function of step number. > grep INFO qmmm_min.out > minimize.log 2>&1 > gnuplot minimize.gpi Fig. 1. The variation of the potential energy and maximum gradient (MAXG) as a function of minimization steps. Because the variation in the potential energy is huge in the first few steps, you may adjust the range of y-axis by modifying minimize.gpi, > vi minimize.gpi ... set yrange [-418535:-418500] ... > gnuplot minimize.gpi Fig. 2. Same as Figure 1, but with a different range in the y-axis. You can see that both energy and MAXG become small along the step. Note that the energy is converged with an accuracy of 1 kcal mol-1 around the 30-th step, while it takes 20 more steps to converge MAXG. A similar behavior is often seen for geometric parameters. #### 4. Harmonic vibrational analysis Now, we carry out the harmonic vibrational analysis. In this calculation, the Hessian matrix of a partial system is calculated by numerical differentiations of the gradient. It requires the gradient at 6N + 1 grid points, where N is the number of atoms of the target molecule. The calculation is parallelized over the grid points, i.e., multiple QM calculations run simultaneously for different grid points. Furthermore, we use the molecular orbitals (MOs) obtained at the equilibrium geometry for the initial MOs of SCF calculations at each grid point. The restart of MOs significantly speed up the SCF convergence. $ cd 3_harmonic
$ls gaussian.com plotIR.gpi plotIR.sh qmmm_harm.inp run.sh runGau.sh qmmm_harm.inp is the control file of GENESIS. [INPUT] and [OUTPUT] sections look like this: [INPUT] topfile = ../toppar/po4.rtf # topology file parfile = ../toppar/po4.prm # parameter file psffile = ../1_setup/cmd_100.psf # protein structure file pdbfile = ../1_setup/cmd_100.pdb # PDB file rstfile = ../2_minimize/qmmm_min.rst (1) [OUTPUT] minfofile = qmmm_harm.minfo (2) (1) restarts from the optimized structure obtained in 2_minimize. (2) minfofile is an output file, where the normal modes and harmonic frequencies are written. Instead of [MINIMIZE], we now have a new section, [VIBRATION]. [VIBRATION] runmode = HARM (1) nreplica = 2 (2) vibatm_select_index = 3 (3) output_minfo_atm = 4 (4) [SELECTION] ... group3 = sid:PO4 (5) group4 = sid:PO4 around_res:3.0 (6)  (1) sets the calculation to the harmonic analysis. (2) specifies the number of QM (=MPI) processes to run in parallel. (3) and (5) set the target molecule of vibrational calculations to H2PO4. (4) and (6) set the atoms printed to the minfo file to H2PO4 and water molecules within 3.0 Å of H2PO4. gaussian.com is the same as the previous one in 2_minimize. runGau.sh is also the same except that we use the MOs of the last run for the initial MOs, # --- Set a chkpoint file to read initial MOs --- initialchk='../2_minimize/qmmm_min.0/gaussian.chk' In run.sh, we specify 2 MPI processes. Note that the number of MPI processes must be the same or multiples of nreplica in [VIBRATION]. Assuming that we use 1 node with 16-core CPU, export QM_NUM_THREADS=8 export OMP_NUM_THREADS=8 # 1) Open MPI # mpirun -np 2 --map-by node:pe=${QM_NUM_THREADS} $GENESIS qmmm_harm.inp > qmmm_harm.out 2>&1 # 2) Intel MPI # #mpirun -np 2 -ppn 2$GENESIS qmmm_harm.inp > qmmm_harm.out 2>$1 With this setting, we will have 2 MPI processes running in a node. Let’s run the program, $ chmod +x run.sh
$./run.sh In the output file (qmmm_harm.out), we can check the atoms selected for vibrational analysis, Enter vibrational analysis Cartesian coordinates of atoms for vib. analysis 1 1 PO4 1 PO42 P -5.0216935451 -10.7075582221 -5.3890709445 2 2 PO4 1 PO42 O1 -5.6128257332 -10.8802681545 -3.8550956045 3 3 PO4 1 PO42 H1 -5.4714433764 -11.7843520102 -3.5086043226 4 4 PO4 1 PO42 O2 -5.7403129966 -9.2977042064 -5.7845382483 5 5 PO4 1 PO42 H2 -6.7222316005 -9.3549834289 -5.7006580436 6 6 PO4 1 PO42 O3 -3.5100721734 -10.4488167274 -5.4038611074 7 7 PO4 1 PO42 O4 -5.5363941548 -11.8686855373 -6.2753623567 Then, we see the iteration over grid points, Generate Hessian matrix by num. diff. of gradients Loop over atoms Done for 0 input replicaID = 1 energy = -418534.145104 Done for 21 O2 +Y replicaID = 2 Done for 1 P +X replicaID = 1 Done for 22 O2 +Y replicaID = 2 Done for 2 P +X replicaID = 1 ... Done for 42 O4 +Z replicaID = 2 RMSD of the gradient at the input geometry = 0.198811D+00 [kcal mol-1 Angs-1]  It is a good practice to make sure that the gradient at the input geometry (= equilibrium geometry) is sufficiently small. You will see a warning message if the gradient is large. In that case, check if the input geometry (rstfile) is correct, if the preceding minimization is converged, and so on. Then, the harmonic frequencies (in cm-1), infrared (IR) intensity (in km mol-1), and normal displacement vectors are printed,  Harmonic frequencies and normal displacement vectors mode 1 2 3 4 5 omega 123.7917 155.1630 163.7271 194.8849 229.0114 IR int. 6.4719 5.9877 5.0536 5.2343 3.0750 1 P X 0.2251 -0.0904 -0.3468 -0.1093 -0.1671 1 P Y 0.0003 0.5412 0.0397 -0.2286 0.0281 1 P Z 0.3616 0.0503 0.3070 0.3317 -0.1136 The minfo file (qmmm_harm.minfo) can be visulalized using JSindo. See Appendix C how to download and install the program. JSindo is invoked by the following command (on your local computer) $ . ../sindo/sindo-4.0/sindovars.sh
$java JSindo Then, proceed to File → Open → Choose qmmm_harm.minfo. You will see the molecule show up on the screen. Then, goto Show → Vibrational Data. It will bring up a table of vibrational frequencies. By clicking one of the frequency, the corresponding vibrational mode is depicted in the molecular view. Fig. 3. Visualization of vibrational modes using JSindo. The IR spectrum can be plotted using HarmSpectrum tool in SINDO. plotIR.sh is a script to do this, $ cat plotIR.sh
#!/bin/bash

. ../sindo/sindo-4.0/sindovars.sh
java HarmSpectrum qmmm_harm.minfo 5 300 1800 1 > harmonic.spectrum
gnuplot plotIR.gpi

The arguments of HarmSpectrum are the name of minfo file, the width of Lorentz function, the range of plot (300 – 1800 cm-1), and the interval of the plot data (1 cm-1). By running the script,

$chmod +x plotIR.sh$ ./plotIR.sh
$ls plotIR.sh plotIR.gpi plotIR.pdf ...  we obtain a plot of the IR spectrum, plotIR.pdf, which looks like this, Fig. 4. IR spectrum of H2PO4– in solution obtained by the harmonic approximation. #### 5. Anharmonic vibrational calculations We now perform anharmonic vibrational analysis. Since this section requires SINDO program, first see Appendix C and setup the program if you haven’t done it yet. ##### (a) Generation of the anharmonic PES We first generate the anharmonic PES. The functional form of the PES is a combination of the quartic force field (QFF) and a one-dimensional (1D) grid PES. QFF reads, $$\displaystyle V^{\mathrm{QFF}}(\mathbf{Q}) = V_0 + \sum_{i=1}^f c_{ii} Q_i^2 + \sum_{i, j, k}^f c_{ijk} Q_iQ_jQ_k + \sum_{i, j, k,l}^f c_{ijkl} Q_iQ_jQ_kQ_l,$$ and the 1D-grid PES is, $$\displaystyle V^{\mathrm{1D-grid}} (\mathbf{Q}) = \sum_{i=1}^f V_i(Q_i)$$ where $$V_i(Q_i)$$ is a potential energy function along $$Q_i$$ in a descritized, grid representation. Proceed to 4_qff, $ cd 4_qff
$ls gaussian.com makePES.xml makePES1.sh* makePES2.sh* qmmm_qff.inp run.sh* runGau.sh* makePES1.sh is a script to invoke RunMakePES tool of SINDO. $ cat makePES1.sh
#!/bin/bash

. ../sindo/sindo-4.0/sindovars.sh
java RunMakePES -f makePES.xml > makePES.out1  2>&1

makePES.xml is the input file of RunMakePES to generate QFF. We don’t go into detail of this file here, but interested readers are referred to the Users’ guide ot SINDO. Run the script by typing,

$chmod +x makePES1.sh$ ./makePES1.sh
$ls gaussian.com makePES.xml makePES2.sh* qmmm_qff.inp runGau.sh* makePES.out1 makePES1.sh* makeQFF.xyz run.sh* The command creates makeQFF.xyz. This file contains coordinates of the target molecule, i.e., H2PO4, at grid points where the energy and gradient are needed. qmmm_qff.inp is the control file of GENESIS, which is the same as qmmm_harm.inp except for the [VIBRATION] section, [VIBRATION] runmode = QFF (1) nreplica = 2 vibatm_select_index = 3 gridfile = makeQFF.xyz (2) minfo_folder = minfo.files (3) (1) sets the calculation to QFF generation. (2) specifies the file that contains the coordinates of grid points. (3) specifies the directory where intermediate minfo files are stored. run.sh is a script to run GENESIS, which is the same as before. Now, let’s run the job, $ chmod +x run.sh
$./run.sh We can check the iteration over the grid points in the output, qmmm_qff.out,  Compute energy at grid points: minfo files created in [ minfo.files ] Done for mkqff-eq : replicaID = 2 Done for mkqff8-0 : replicaID = 1 Done for mkqff8-2 : replicaID = 1 Done for mkqff8-1 : replicaID = 2 ... and the information at each grid point is written to minfo.files/mkqff-*.minfo. $ ls minfo.files
mkqff10-0.minfo     mkqff14_8-0.minfo   mkqff17_13-3.minfo  mkqff19_15-3.minfo
mkqff10-1.minfo     mkqff14_8-1.minfo   mkqff17_14-0.minfo  mkqff19_16-0.minfo
...

When the GENESIS job is done, we invoke the RunMakePES tool again,

$cat makePES2.sh #!/bin/bash . ../sindo/sindo-4.0/sindovars.sh java RunMakePES -f makePES.xml >& makePES.out2$ chmod +x makePES2.sh
$./makePES2.sh$ ls
gaussian.com   makePES.xml    makePES.out1  makePES.out2  makeQFF.xyz_0
minfo.files/   prop_no_1.mop  qmmm_qff.0/   qmmm_qff.1/   qmmm_qff.inp
qmmm_qff.out   run.sh         runGau.sh

Although the command and the input file for RunMakePES are the same as before, the program retrieves the information from minfo.files/*.minfo, and prints the QFF coefficients to prop_no_1.mop.

The grid PES can be generated in the same manner. Proceed to 5_grid,

$cd 5_grid$ ls
gaussian_ene.com  makePES1.sh      qmmm_grid.inp     runGau.sh
makePES.xml       makePES2.sh      run.sh

qmmm_grid.inp is an input of GENESIS, in which the [VIBRATION] section looks like this,

[VIBRATION]
runmode             = GRID                (1)
nreplica            = 2
vibatm_select_index = 3
gridfile            = makeGrid.xyz        (2)
datafile            = makeGrid.dat        (3)

(1) sets the calculation to grid PES generation.

(2) specifies the file that contains the coordinates of grid points.

(3) specifies the file where the energy and dipole moment are written.

In the grid PES generation, we calculate the energy and dipole moment, but don’t need the gradient. Therefore, the template of Gaussian, gaussian_ene.com, doesn’t have the “force” keyword.

#P B3LYP/cc-pVDZ EmpiricalDispersion=GD3
scf(conver=8) iop(4/5=100) NoSymm
Charge  (Force is removed)

Now, let’s run the job.

$chmod +x run.sh makePES1.sh makePES2.sh$ ./makePES1.sh
$./run.sh$ ./makePES2.sh

The first command (makePES1.sh) writes the coordinates of grid points to makeGrid.xyz, and the second command (run.sh) invokes GENESIS. We can check the iteration over the grid points in the output, qmmm_grid.out,

 Compute energy at grid points: data written to [ makeGrid.dat ]
Done for          mkg-q9-11-0 :    replicaID =     1
Done for               mkg-eq :    replicaID =     2
Done for          mkg-q9-11-1 :    replicaID =     2...

and the energy and dipole is written to makeGrid.dat.

$ls minfo.files mkg-q9-11-0, -666.961159903, 7.591214970E+00, 2.097339590E+01, 1.154947680E+01 mkg-q9-11-2, -666.971926490, 7.601926380E+00, 2.088108250E+01, 1.158874200E+01 mkg-q9-11-4, -666.976106344, 7.612145000E+00, 2.080164720E+01, 1.161928860E+01 ... The last command (makePES2.sh) creates the grid PES and dipole moment surface (DMS) and stores the information to *.pot and *.dipole, respectively. $ ls *pot
eq.pot	 q11.pot  q13.pot  q15.pot  q17.pot  q19.pot  q21.pot
q10.pot  q12.pot  q14.pot  q16.pot  q18.pot  q20.pot  q9.pot
$ls *dipole eq.dipole q12.dipole q15.dipole q18.dipole q21.dipole q10.dipole q13.dipole q16.dipole q19.dipole q9.dipole q11.dipole q14.dipole q17.dipole q20.dipole ##### (b) VQDPT2 calculations Finally, we carry out VQDPT2 calculations and obtain the anharmonic spectrum. The PES is a composite of QFF and 1D-grid PES, $$\displaystyle V(\mathbf{Q}) = V^{\mathrm{QFF’}}(\mathbf{Q}) + V^{\mathrm{1D-grid}} (\mathbf{Q}),$$ where $$V^{\mathrm{QFF’}}$$ is a QFF without the 1-mode terms, $$\displaystyle V^{\mathrm{QFF’}}(\mathbf{Q}) = V^{\mathrm{QFF}} – \sum_{i=1}^f [ c_{ii} Q_i^2 + c_{iii} Q_i^3 + c_{iiii} Q_i^4].$$ Proceed to 6_vib, $ cd 6_vib
$ls plotIR.gpi run.sh vqdpt2.inp run.sh is a script to set the comosite PES and to run VQDPT2. The first part of the script looks like this, export POTDIR=./pes if [ -e${POTDIR} ]; then
rm -r ${POTDIR} fi mkdir${POTDIR}
cp ../4_qff/prop_no_1.mop ${POTDIR} cp ../5_grid/*.pot${POTDIR}
cp ../5_grid/*.dipole     ${POTDIR} The information of the PES in 4_qff and 5_grid is gathered to $POTDIR (pes). The composit PES above is specified in this way, becuase the grid PES takes higher priority than QFF. vqdpt2.inp is the input file to carry out VQDPT2 calculations. See SINDO website for the details about the input file.

Let’s run the program,

sindo < vqdpt2.inp > vqdpt2.out 2>&1

gnuplot plotIR.gpi

sindo creates, vqdpt-IR.spectrum, and gnuplot plots the resulting spectrum, which looks like this.

group2 = sid:PO4 around_res: 3.0 or sid:PO4