Prat-Resina's blog: The Simplest Potential of Mean Force using Molecular Dynamics

Xavier Prat-Resina

Show code

calcBarrier <- function(dens,init,endit){
  #this function calculates the barrier in a dens$log array
  #the dens array comes from the hist function, the log is calculated separately
  #init is the first angle and endit is the last angle
  firstindex = which.min( abs(dens$mids - init ) )
  lastindex = which.min( abs(dens$mids - endit ) )
  DE= max(dens$log[firstindex:lastindex]) - min(dens$log[firstindex:lastindex])
  return(DE)
}
op = function(x, d=2) sprintf(paste0("%1.",d,"f"), x)

1 What is this?

The goal of this post is exclusively pedagogical: learning to calculate PMFs in the simplest system for chemists. Some context here, you have a background in physical chemistry and computational chemistry but never really quite understood the meaning of free energy of reaction, aka Potential of Mean Force (PMF) and how to use molecular simulations to calculate it.

I want to give a very quick overview of tools from statistical thermodynamics to study reactions. There are lots of textbooks covering this material and there’s online tutorials with complex biological systems. But I wonder if I can cover the simplest of the systems (conformational rotation of ethane) with the simplest of the approaches (unbiased molecular dynamics).

For higher barriers and higher number of atoms you will have to use more sophisticated approaches to calculate the free energy paths (check this review). And that may be the main roadblock for students who want to understand it. You are told to use tools such as metadynamics or string-based replica exchange dynamics without really knowing why. You may feel like you are not grasping the fundamentals… or at least that’s how I felt back when I was a grad student.

2 Some theory: reaction paths, free energy, and molecular dynamics

In computational chemistry we are interested in identifying the molecular structures that participate in a chemical reaction, what we call a reaction mechanism or reaction path that connects reactants and products. Invoking the second law of thermodynamics (principle of minimum energy) we identify the lowest energy structures as the most stable and therefore the most likely path. Sometimes a good estimation is to calculate the potential energy of the system. Or at most you may just need to approximate your surface with a parabola (build a Hessian, diagonalize it, get the frequencies and calculate the harmonic vibrational partition function). You probably have run a gaussian freq calculation before?

But that won’t be the case if your system is wiggly and floppy (read here that your reactants is a collection of conformers or a rugged surface, not a single deep minimum) or your reaction requires a large rearrangement (high entropy change between reactants and transition state). That’s when you need to sample the phase space. The phase space is what we call all the positions and velocities available at a given temperature.

I’m going to throw academic rigor out of the window and I will simplify the equations saying that the free energy along a coordinate “r” can be obtained by calculating the density (\(\rho(r)\) : think of a histogram of probability) along that coordinate “r”

\[ G(r) = -RT ln(\rho(r)) \] In other words values of “r” that are more probable will show higher density and therefore lower free energy. In other words, this density is just a histogram of the values of your reaction coordinate during a molecular dynamics simulation.

I remember being confused about this result. How could it be that just a histogram of a coordinate could contain the precious free energy information. To answer that question we need to unpack this density function.

The density \(\rho(r)\) has in it all the thermal and stability information about the path:

\[ \rho(r) = \frac{\int{ \delta(q-r) e^{(-V(q)/RT)} dq} }{\int{ e^{(-V(q)/RT)} dq}} \] “V(q)” is the potential energy of the system, “q” is all the coordinates of our system and the symbol \(\delta(q-r)\) just Dirac’s delta function, which is just a way to say that we’re just counting states of a given value “r”.

Let me translate that into English. Because we are going to be running a molecular dynamics simulation at a given temperature, the states that the simulation visits are already following the Boltzmann factor \(e^{(-V(q)/RT)}\). This means that high energy states are less visited than low energy states, which is exactly what the integral is doing.

Notice that the integral is overall all positions. We hope we’ll visit all positions after some longish simulation. This is, simplified, the ergodic hypothesis, we will converge the integral (summation of states) that goes over all cases if we run the simulation over a long enough time. It is not exactly true that all of the states that we visit in a molecular dynamics run they all belong to the statistical ensemble. Two consecutive steps are too correlated and they probably overrepresenting that area of the phase space. This is why we will usually take measurements every “x” steps of a dynamics run. The longer “x” is the less correlated two measurements will be, but the longer the dynamics will need to be run. The objective of this strategy is to collect a true Markov chain.

For a useful discussion on ergodic hypothesis using molecular dynamics check this article.

3 Studying the rotation of ethane: Free energy

Disclaimer: The very concept of Temperature and some of the assumptions from Statistical Mechanics are not valid for a single molecule simulation. That being said, it is still conceptually useful to study a single molecule of ethane for how fast calculations are and how conceptually simple the system is.

We are a running a simple molecular dynamics simulations with the NAMD code. You may download the input and outputs used in this post here We will use VMD to load the trajectory file (dcd) and save the dihedral data. Assuming we’ve done all that, let the analysis begin!

3.1 Analysis of a 1ns trajectory

Show code

setwd("~/Teaching/Pchem/S18/Pchem_SharedFolder/10_Modeling/ethane")
dihed600 = read.table("./dihedral.txt")
dihed300 = read.table("./dihedral_300.txt")

Let’s plot how the dihedral evolves during a 1 ns simulation (1fs/step = 1E6 steps). We have saved the structure every 5 steps, which means we have 1E6/5 = 2E5 structures.

Show code

plot( dihed300$V2,type="l",main="Ethane dihedral @300K/1ns",ylab="Dihedral",xlab="Structure/5steps")

Show code

#legend("bottomleft",legend=c("Ethane Dihedral 300K"))

So, not much is happening, during most of the 1ns simulation, the dihedral remains at -60 and by structure 150000 it switches to +60 with some swing back and forth.

This same information can be obtained by calculating a histogram

Show code

hist(dihed300$V2,breaks=120,main="Ethane dihedral @300K/1ns",xlab="dihedral")

Show code

#legend("topright",legend=c("Ethane Dihedral 300K"))

This is a somewhat unexpected result. We would liked to see three peaks corresponding to three minima where the staggered conformation of ethane is most stable. The first conclusion that one may extract is that the dynamics has not explored all the available phase space given the 1ns simulation and therefore given that we have not converged the phase space integral we cannot calculate the free energy. So, we have two options if we want to explore it further:

Option 1 is to increase the temperature
Option 2 is to run the simulation longer than 1ns
Option 3 is to add some additional potential or bias that makes the simulations explore other regions.

In this post we will explore option 1 and 2.

3.2 Option 1: Effect of increasing the temperature

If we are interested in measuring the free energy barrier of the dihedral rotation at 300K. Increasing the temperature will improve our exploration of the phase space, but it won’t be the same ensemble as in 300K and therefore the free energy won’t be the same. But for the sake of argument, let’s just compare how much more the dihedral changes when increasing the temperature Remember, temperature for 8 atoms is ill-defined and it does not make much sense to exactly specifiy it. But the atoms will have higher velocity and therefore higher chance to overcome the energy barrier.

Show code

plot( dihed600$V2,type="l",col=c("red"),ylim=c(-200,200),main="Ethane dihedral MD@300K and @600K/1ns",ylab="Dihedral",xlab="Structure/5steps")
lines( dihed300$V2,col=c("black"))
legend("bottomleft",legend=c("Dihedral@300K","Dihedral@600K"),text.col=c("black","red"))

So there is clearly more exploration at a higher temperature. Let’s compare the histograms. I randomly decide to make a bin every three degrees giving a total of 360/3 = 120 bins.

Let’s display the histogram

Show code

denshist600 =hist(dihed600$V2,breaks=120,main="Ethane dihedral MD@600K/1ns",xlab="dihedral",xlim = c(-180,180))

Show code

#plot(denshist600$mids, denshist600$density,type="l",col=c("red"),xlim=c(-180,180))
#legend("topright",legend=c("Dihedral@600K"),text.col=c("red"))

Now let’s calculate the free energy \(\Delta G = -RT ln(\rho)\) in kcal/mol (using R = 1.985E-3). As we said in the first section, the density \(\rho(r)\) is just the histogram of the dihedral shown above. We’ll calculate the logarithm of the density and multiply it by (-RT).

Show code

R = 1.985E-3
Temp = 600.0
denshist600$log = -R*Temp*log(denshist600$density)
plot(denshist600$mids,denshist600$log,type="l",col=c("red"),xlim=c(-180,180),ylim = c(5,10),main = "Free Energy - Ethane Rotation",xlab = "Dihedral",ylab = "G(kcal/mol)")
legend("topright",legend=c("Free Energy@600K 1ns "),text.col=c("red"))

Show code

#record barriers
barr_all_f5_1ns_1 = toString(op(calcBarrier(denshist600,-179.9,-110)))
barr_all_f5_1ns_2 = toString(op(calcBarrier(denshist600,-70,10)))
barr_all_f5_1ns_3 = toString(op(calcBarrier(denshist600,50,130)))

Qualitatively, the barrier is between 1 and 1.5 kcal/mol at 600K, remember that free energy barriers will depend on the temperature. Also, notice that the dihedral barriers should be the same for all the staggered-eclipsed transitions. It looks like that the dihedral at -60 degrees is more stable than at 60 degrees, which makes no physical sense. We’ll have to discard some data, read below for more info about this.

Let’s calculate the barrier between the -60 and 0 degrees as precisely as we can: 3.02 which is different than the one between +60 and +120 which is 2.33. They should not be different as all hydrogens are identical. This discrepancy means that we explored all phase space but not equally or at least we are oversampling the minimum at -60 which makes it too stable. Let’s remember this result for later.

3.3 Option 2: Longer dynamics

If we run a longer molecular dynamics, say five times longer, 5 ns (5E6 steps) at 300 K we can see that we cover all of the dihedral’s phase space. The details are the same, 300K, saving a structure every 5 steps.

Show code

setwd("~/Teaching/Pchem/S18/Pchem_SharedFolder/10_Modeling/ethane")
dihed300_5ns = read.table("./dihed_300_5ns.txt")
denshist300_5ns =hist(dihed300_5ns$V2,breaks=120,main="Ethane diehdral MD@300K/5ns",xlab = "dihedral",xlim = c(-180,180),plot=TRUE)

Show code

#plot(denshist300_5ns$mids, denshist300_5ns$density,type="l",col=c("red"),xlim=c(-180,180))
#legend("topright",legend=c("Dihedral@300K - 5ns"),text.col=c("red"))

Compared to the 300K@1ns the histograms above are now showing that at 5ns at 300K we are exploring all the dihedral range. However, the 180 degree seems to be more visited than the -60 and +60, which still does not make physical sense. Let’s calculate the free energy using the above histogram to see its impact on the value of free energy of rotation.

Show code

init = 10020 # discard the first 50000 steps (50 ps)
final = 1000020
freq = 1 # every 50 steps (5x10)
R = 1.985E-3
Temp = 300.0
thisDihed = dihed300_5ns$V2[seq(init,final,freq)]

dens = hist(thisDihed,breaks=120,plot = FALSE)
dens$log = -R*Temp*log(dens$density)
plot(dens$mids, dens$log,type="l",col=c("red"),xlim=c(-180,180),ylim = c(2,7),main = "Free Energy - Ethane Rotation",xlab = "Dihedral",ylab = "G(kcal/mol)")
legend("topright",legend=c("Free Energy@300K-5ns "),text.col=c("red"))

Show code

#record barriers
barr_1s_f5_5ns_1 = toString(op(calcBarrier(dens,-179.99,-110)))
barr_1s_f5_5ns_2 = toString(op(calcBarrier(dens,-70,10)))
barr_1s_f5_5ns_3 = toString(op(calcBarrier(dens,50,130)))

If you look at the R code above you’ll see that I discarded the first 50ps of simulations. This is a common procedure to eliminate the first equilibration steps. The barrier is now 2.58 kcal/mol between -60 and 0 degrees and 2.65 kcal/mol between +60 and +120 degrees.

4 Studying the different options for the PMF

For the above calculation of rotation barrier we have made some random decisions. We have collected a dihedral measurement every 5 steps of the simulation, we have discarded the first 50 ps but used the rest. So we have some valid questions ahead:

Should we measure the dihedral less frequently? How often?
Should we discard the first part of the simulation? How much?
What tells us when we have converged results?

Below is the code for: 5ns - Every 10 steps - Discard the first 50ps - Use the rest

Show code

init = 10020 # discard the first 50000 steps (50 ps)
final = 1000020
freq = 2 # every 50 steps (5x10)
R = 1.985E-3
Temp = 300.0
thisDihed = dihed300_5ns$V2[seq(init,final,freq)]

dens = hist(thisDihed,breaks=120,xlim = c(-180,180),plot = FALSE)
dens$log = -R*Temp*log(dens$density)
plot(dens$mids, dens$log,type="l",col=c("red"),xlim=c(-180,180),ylim = c(2,7),main = "Free Energy - Ethane Rotation",xlab = "Dihedral",ylab = "G(kcal/mol)")
legend("topright",legend=c("300K - Every 10 steps - from 50ps to 5ns "),text.col=c("red"))

Show code

#record barriers
barr_1s_f10_5ns_1 = toString(op(calcBarrier(dens,-179.99,-110)))
barr_1s_f10_5ns_2 = toString(op(calcBarrier(dens,-70,10)))
barr_1s_f10_5ns_3 = toString(op(calcBarrier(dens,50,130)))

Below is the code for: 5ns - Every 50 steps - Discard the first 50ps - Use the rest. We will not plot the rest of free energy profiles, but you may do it by uncommenting the plot lines in the R code.

Show code

init = 10020 # discard the first 50000 steps (50 ps)
final = 1000020
freq = 10 # every 50 steps (5x10)
R = 1.985E-3
Temp = 300 
thisDihed = dihed300_5ns$V2[seq(init,final,freq)]

dens = hist(thisDihed,breaks=120,xlim = c(-180,180),plot = FALSE)
dens$log = -R*Temp*log(dens$density)
#plot(dens$mids, dens$log,type="l",col=c("red"),xlim=c(-180,180),ylim = c(2,7),main = "Free Energy - Ethane Rotation",xlab = "Dihedral",ylab = "G(kcal/mol)")
#legend("topright",legend=c("300K - Every 50 steps - from 50ps to 5ns "),text.col=c("red"))

#record barriers
barr_1s_f50_5ns_1 = toString(op(calcBarrier(dens,-179.99,-110)))
barr_1s_f50_5ns_2 = toString(op(calcBarrier(dens,-70,10)))
barr_1s_f50_5ns_3 = toString(op(calcBarrier(dens,50,130)))

Below is the code for: 5ns - Every 5 steps - Discard the first 3ns - Use the rest

Show code

init = 600020 # discard the first 3ns (5x6E5)
final = 1000020
freq = 1 # every 50 steps (5x10)
R = 1.985E-3
Temp = 300 
thisDihed = dihed300_5ns$V2[seq(init,final,freq)]

dens = hist(thisDihed,breaks=120,xlim = c(-180,180),plot = FALSE)
dens$log = -R*Temp*log(dens$density)
#plot(dens$mids, dens$log,type="l",col=c("red"),xlim=c(-180,180),ylim = c(2,7),main = "Free Energy - Ethane Rotation",xlab = "Dihedral",ylab = "G(kcal/mol)")
#legend("topright",legend=c("300K - Every 50 steps - from 50ps to 5ns "),text.col=c("red"))

#record barriers
barr_2s_f5_5ns_1 = toString(op(calcBarrier(dens,-179.99,-110)))
barr_2s_f5_5ns_2 = toString(op(calcBarrier(dens,-70,10)))
barr_2s_f5_5ns_3 = toString(op(calcBarrier(dens,50,130)))

Below is the code for: 5ns - Every 10 steps - Discard the first 3ns - Use the rest

Show code

init = 600020 # discard the first 3ns (5x6E5)
final = 1000020
freq = 2 # every 10 steps (5x10)
R = 1.985E-3
Temp = 300 
thisDihed = dihed300_5ns$V2[seq(init,final,freq)]

dens = hist(thisDihed,breaks=120,xlim = c(-180,180),plot = FALSE)
dens$log = -R*Temp*log(dens$density)
#plot(dens$mids, dens$log,type="l",col=c("red"),xlim=c(-180,180),ylim = c(2,7),main = "Free Energy - Ethane Rotation",xlab = "Dihedral",ylab = "G(kcal/mol)")
#legend("topright",legend=c("300K - Every 10 steps - from 3ns to 5ns "),text.col=c("red"))

#record barriers
barr_2s_f10_5ns_1 = toString(op(calcBarrier(dens,-179.99,-110)))
barr_2s_f10_5ns_2 = toString(op(calcBarrier(dens,-70,10)))
barr_2s_f10_5ns_3 = toString(op(calcBarrier(dens,50,130)))

Below is the code for: 5ns - Every 50 steps - Discard the first 3ns - Use the rest

Show code

init = 600020 # discard the first 3ns (5x6E5)
final = 1000020
freq = 10 # every 50 steps (5x10)
R = 1.985E-3
Temp = 300 
thisDihed = dihed300_5ns$V2[seq(init,final,freq)]

dens = hist(thisDihed,breaks=120,xlim = c(-180,180),plot = FALSE)
dens$log = -R*Temp*log(dens$density)
#plot(dens$mids, dens$log,type="l",col=c("red"),xlim=c(-180,180),ylim = c(2,7),main = "Free Energy - Ethane Rotation",xlab = "Dihedral",ylab = "G(kcal/mol)")
#legend("topright",legend=c("300K - Every 50 steps - 50-5ns "),text.col=c("red"))
#record barriers
barr_2s_f50_5ns_1 = toString(op(calcBarrier(dens,-179.99,-110)))
barr_2s_f50_5ns_2 = toString(op(calcBarrier(dens,-70,10)))
barr_2s_f50_5ns_3 = toString(op(calcBarrier(dens,50,130)))

The table below compares the barrier of rotation for all these different options. At this point we will avoid using statistical tools to assess how converged or how different these results are. For a longer discussion and a more systematic approach on assessing the convergence of simulations, check the paper by Grossfield and Zuckerman entitled Quantifying uncertainty and sampling quality in biomolecular simulations

Simulation Simulation / Freq / sample	-180°→ -120°	-60°→0°	+60°→+120°
600K 1ns/5 steps/0 - 1ns	2.68	3.02	2.33
300K 5ns/5 steps/50ps- 5ns	2.82	2.58	2.65
300K 5ns/10 steps/50ps- 5ns	2.85	2.60	2.66
300K 5ns/50 steps/50ps- 5ns	2.79	2.63	2.65
300K 5ns/5 steps/3ns- 5ns	2.66	2.91	2.86
300K 5ns/10 steps/3ns- 5ns	2.71	2.93	2.84
300K 5ns/50 steps/3ns- 5ns	2.70	3.04	2.85

Notice that for a simple system such as ethane and for a relatively long simulation such as 5ns one can observe differences of the order of 0.5 kcal/mol. In the paper cited above by Grossfield and Zuckerman, the authors list in section 4 a series of recommendations. The take home message is that there is still a fair amount of heuristic analysis to settle on the calculation of thermodynamic properties using molecular simulation.

The Simplest Potential of Mean Force using Molecular Dynamics

1 What is this?

2 Some theory: reaction paths, free energy, and molecular dynamics

3 Studying the rotation of ethane: Free energy

3.1 Analysis of a 1ns trajectory

3.2 Option 1: Effect of increasing the temperature

3.3 Option 2: Longer dynamics

4 Studying the different options for the PMF

Citation