Calculate the Potential of Mean Force from MD of an enzymatic reaction

Modeling R

Parsing files, plotting coordinates, calculating free energy

Xavier Prat-Resina https://pratresina.umn.edu (University of Minnesota Rochester)https://r.umn.edu
08-17-2021

Introduction

One of the daunting tasks when running molecular simulations with thousands of atoms is to understand the results of your calculations. You have many files, long simulations, and it may be hard to reduce all that information into the chemistry of arrow pushing mechanism that you are used to. Granted that if you have the time you want to spend it looking at the actual structures on VMD/PyMol, but your supervisor/collaborator doesn’t have time for that and wants to see graphs, tables and evidence of what you are talking about.

Here I will try to show some basic analysis of an umbrella sampling simulation to calculate the free energy profile along a reaction coordinate. Mostly the purpose of this post is to show some basic R scripting to manipulate and plot data.

The reaction and the reaction coordinates

The reaction to study is a trans-acetalization. In terms of organic chemistry is a nucleophilic addition and nucleophilic elimination. The goal here is to understand if that happens through a stable intermediate and what level of synchronocity exists between the distances labeled as “A3” and “A4”.

The active site of the reaction we want to study

Our molecular dynamics simulations were based on the scanning of the asymmetric stretch coordinate (A3-A4) under an umbrella potential.

Load trajectory files and measure time-dependent magnitudes

Considering that we used CHARMM to run the simulations we will use CHARMM’s scripting analysis tools. Alternative tools such as MDAnalysis are very powerful, but for the sake of pedagogy I will use a DIY approach, even if it takes us longer. The following is a CHARMM script that loads the trajectory file dcd and measures the distances involved in the reaction coordinate.

!load the dcd trajectories
open unit 51 read name @inputdir/@window.dcd

!Open data output file
open write form UNIT 28 name @d/@window.a3.data
open write form UNIT 29 name @d/@window.a4.data
open write form UNIT 30 name @d/@window.515o2_h.data
open write form UNIT 31 name @d/@window.h_o3.data
open write form UNIT 32 name @d/@window.477o1_h.data
open write form UNIT 33 name @d/@window.515o1_h.data
open write form UNIT 34 name @d/@window.477o2_h.data

CORREL MAXSERIES 34 MAXTIMESTEPS 60000 MAXATOMS 8500

ENTER A3 DIST sele ((resid 1 .and. type O3 .and. segid S).or.(resid 3 .and. type C1 .and. segid G)) end
ENTER A4 DIST sele ((resid 477 .and. type OD1 .and. segid C).or.(resid 3 .and. type C1 .and. segid G)) end
ENTER D1 DIST sele ((resid 515 .and. type OE2 .and. segid C).or.(resid 1 .and. type HO3 .and. segid S)) end
ENTER D2 DIST sele ((resid 1 .and. type O3 .and. segid S).or.(resid 1 .and. type HO3 .and. segid S)) end
ENTER D3 DIST sele ((resid 477 .and. type OD1 .and. segid C).or.(resid 1 .and. type HO3 .and. segid S)) end
ENTER D4 DIST sele ((resid 515 .and. type OE1 .and. segid C).or.(resid 1 .and. type HO3 .and. segid S)) end
ENTER D5 DIST sele ((resid 477 .and. type OD2 .and. segid C).or.(resid 1 .and. type HO3 .and. segid S)) end

TRAJ  FIRSTU 51 NUNIT 1 begin 1 
!If your run didn't finish, you need to tell traj when to stop:
!TRAJ  FIRSTU 51 NUNIT 1 begin 1 stop 8580

write A3 unit 28 dumb time
write A4 unit 29 dumb time
write D1 unit 30 dumb time
write D2 unit 31 dumb time
write D3 unit 32 dumb time
write D4 unit 33 dumb time
write D5 unit 34 dumb time

stop

The code above needs to be applied to each window of simulation which will give reaction coordinate files for each window. We will use R to load all these files and analyze them.

Show code
getFiles <- function(myPath,pattern){
  setwd(myPath)
  listOfFiles = Sys.glob("eqq*.data")
  listOfFiles = listOfFiles[grepl(pattern,listOfFiles)]
  n = listOfFiles[grepl("n",listOfFiles)]
  p = listOfFiles[grepl("p",listOfFiles)]
  listOfFiles = c(sort(n,decreasing = TRUE),p)
  return(listOfFiles)
}
buildDataFrame <- function(maxlen,myPath, removepat, filenames){
  setwd(myPath)
  thisDataFrame <- data.frame(matrix(NA,nrow = maxlen, ncol = length(filenames)))
  for (i in 1:length(filenames)){
    thisCol = read.table(filenames[i],header = FALSE)
    thisColName = gsub(removepat,"",filenames[i])
    names(thisDataFrame)[i] = thisColName
    #R must have a function, but thisCol may be shorter than maxlen, so better to do it one by one
    for (j in 1:length(thisCol$V2)){
      thisDataFrame[[thisColName]][j] = thisCol$V2[j]
    }
  }
  return(thisDataFrame)
}

myPath = "/Users/xavier/Gd/Research/Chile/Camilo/analysis_dist/"
a3_files = getFiles(myPath,"a3")
a4_files = getFiles(myPath,"a4")

#this assumes that the first window of the list has the maximum length (a finished job)
#some windows were not run to the finish so they will have NA at the end
sample = read.table(a3_files[1],header = FALSE)
maxlen = length(sample$V2)
a3 = buildDataFrame(maxlen,myPath,".a3.data",a3_files)
a4 = buildDataFrame(maxlen,myPath,".a4.data",a4_files)
a3a4 = a3-a4

fe_a3a4 = read.csv("/Users/xavier/Gd/Research/Chile/Camilo/10ns/free_energy",header = FALSE, sep = "\t")

Analysis of reaction coordinate: nucleophilic addition/elimination O - C - O

Dynamic analysis

A brief statistical summary (using R’s function summery) can give information about each window:

Show code
for (i in 1:ncol(a3a4)){
   print(paste("Showing results for the Rc=A3-A4 for window:",names(a3a4)[i],sep = ""))
   print(summary(a3a4[,i]))
}
[1] "Showing results for the Rc=A3-A4 for window:eqq_n17"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -1.870  -1.749  -1.716  -1.716  -1.683  -1.575 
[1] "Showing results for the Rc=A3-A4 for window:eqq_n15"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -1.676  -1.554  -1.522  -1.522  -1.491  -1.370 
[1] "Showing results for the Rc=A3-A4 for window:eqq_n13"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -1.503  -1.371  -1.338  -1.338  -1.304  -1.183 
[1] "Showing results for the Rc=A3-A4 for window:eqq_n11"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -1.291  -1.176  -1.143  -1.144  -1.114  -1.010 
[1] "Showing results for the Rc=A3-A4 for window:eqq_n09"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-1.1195 -1.0145 -0.9868 -0.9857 -0.9554 -0.8627     142 
[1] "Showing results for the Rc=A3-A4 for window:eqq_n07"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.9724 -0.8650 -0.8388 -0.8377 -0.8073 -0.6646 
[1] "Showing results for the Rc=A3-A4 for window:eqq_n05"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.8267 -0.6944 -0.6540 -0.6543 -0.6174 -0.4623 
[1] "Showing results for the Rc=A3-A4 for window:eqq_n04"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-0.7823 -0.5912 -0.5475 -0.5481 -0.5018 -0.3225     107 
[1] "Showing results for the Rc=A3-A4 for window:eqq_n03"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.6324 -0.4406 -0.3930 -0.3934 -0.3449 -0.1337 
[1] "Showing results for the Rc=A3-A4 for window:eqq_n02"
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.40947 -0.27016 -0.22751 -0.22572 -0.18576 -0.00251 
[1] "Showing results for the Rc=A3-A4 for window:eqq_n01"
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.30212 -0.12723 -0.08176 -0.08421 -0.03752  0.12434 
[1] "Showing results for the Rc=A3-A4 for window:eqq_p00"
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.151132  0.003918  0.042896  0.043282  0.084240  0.225363 
[1] "Showing results for the Rc=A3-A4 for window:eqq_p01"
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.008844  0.129696  0.170706  0.169833  0.208377  0.365419 
[1] "Showing results for the Rc=A3-A4 for window:eqq_p02"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1217  0.3071  0.3493  0.3496  0.3943  0.5379 
[1] "Showing results for the Rc=A3-A4 for window:eqq_p04"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3570  0.4893  0.5232  0.5237  0.5586  0.6624 
[1] "Showing results for the Rc=A3-A4 for window:eqq_p06"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.5663  0.6972  0.7309  0.7298  0.7635  0.8702 
[1] "Showing results for the Rc=A3-A4 for window:eqq_p07"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.6487  0.7753  0.8095  0.8089  0.8429  0.9580 
[1] "Showing results for the Rc=A3-A4 for window:eqq_p09"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 0.7698  0.9134  0.9479  0.9487  0.9829  1.1488     339 
[1] "Showing results for the Rc=A3-A4 for window:eqq_p11"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.9461  1.0755  1.1146  1.1131  1.1487  1.2664 
[1] "Showing results for the Rc=A3-A4 for window:eqq_p13"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.145   1.271   1.308   1.306   1.343   1.498 
[1] "Showing results for the Rc=A3-A4 for window:eqq_p15"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.350   1.455   1.488   1.491   1.527   1.657 

In addition to the above average data and in order to get a more visual and dynamic overview of what the reaction coordinates look like in each of our windows we can plot the two distances that make up our reaction coordinate. A3 in red and A4 in blue.

Show code
#this builds a 3x3 lattice. oma and mar take care of margins
par(mfrow=c(3,3),
    oma = c(5,4,0,0) + 0.1,
    mar = c(0,0,1,1) + 0.1)


for (i in 1:length(a3_files)){
  #I'm using ohfiles just for naming and finding windows
  colname = gsub(".a3.data","",a3_files[i])
  #only show y-axis numbers on the first graph of the row
  #print(colname)
  if ( (i+2)%%3 == 0){
    plot( a3[[colname]],ylim=c(0.5,3.5),type="l",col="red" ,ann = FALSE,xaxt='n')
    lines(a4[[colname]],ylim=c(0.5,3.5),type="l",col="blue",ann = FALSE,xaxt='n')
  }else{
    plot( a3[[colname]],ylim=c(0.5,3.5),type="l",col="red" ,ann = FALSE,xaxt='n',yaxt='n')
    lines(a4[[colname]],ylim=c(0.5,3.5),type="l",col="blue",ann = FALSE,xaxt='n')
  }
  rcValue = gsub("eqq_","",colname)
  graphTitle = paste(rcValue,"Avg:A3=",round(mean(a3[[colname]], na.rm = TRUE),digits = 2),
                                " A4=",round(mean(a4[[colname]], na.rm = TRUE),digits = 2),sep = "")
  title(graphTitle,line=-7)
}

The chemistry question is wether the nucleophilic attack (A4: blue) happens at the same time as the elimination (A3: red) or not. The elimination (A3) only starts happening once the attacking nucleophile (A4 blue) is close enough at window n04. It is also true that while there is a tetra-coordinate intermediate at window p00, the distances do not belong to a regular O-C distance (1.5) and they are rather long around 2.0. This fact allows to predict without looking a the free energy that it cannot possibly be a stable intermediate. Therefore the nucleophilic/elimination reaction happens rather synchronously and in one step.

Umbrella Sampling

When plotting a histogram one can build a histogram of all windows at the same time:

Show code
dens = hist(data.matrix(a3a4),breaks = 300,plot = FALSE)
plot(dens$mids,dens$density,type="l",main="A3-A4")

Notice that the above histograms does not distinguish between windows. In order to see the overlap between windows we will build a histogram for each window.

Show code
plot(0,0,xlim = c(-2,1.70),ylim = c(0,200),type = "n")
cl = rainbow(ncol(a3a4))
## here you set the font size 
op <- par(cex = 0.5)
for (i in 1:ncol(a3a4)){
  dens = hist(a3a4[,i],breaks = 25,plot = FALSE)
  lines(dens$mids,dens$counts,col=cl[i],type='l')
}
legend("topleft",legend = names(a3a4),text.col = cl)

This representation allows us to see that there’s a gap in some values at R=0.6. This is bad because each window must significantly overlap with the neighboring windows. The reason is the integration constants: \[ \Delta G = -RT ln(\rho) + Constant\] In the straight umbrella sampling (not WHAM) we make it so that the G of the two neighboring windows are the same at the overlapping point. If there’s no overlapping the free energy is not a continuous function. We are going to continue with the analysis but in a real context we would have to run more windows to cover those gaps shown in the histograms above.

We can still plot the free energy along this coordinate using the WHAM method. We will use Alan Grossfield’s WHAM code.

Show code
plot(fe_a3a4$V1,fe_a3a4$V2)

Analysis of additional reaction coordinates: proton transfers

The fact that we were able to obtain an apparently continous free energy profile does not necessarily mean that the chosen reaction coordinate (A3-A4) is the right one. We know that proton transfers play a significant role and it could well be that while are scanning A3-A4 the protons are switching back and forth to accommodate the scanning rather than following the true reaction path.

Dynamic analysis

We know that there’s a proton than can jump from three different sites: Asp477, Glu515, and O3 (the substrate leaving group)

Let’s start by loading up the files and calculate basic statistical for each window.

Show code
#Get file names
o1_477h_files = getFiles(myPath,"477o1_h")
o2_477h_files = getFiles(myPath,"477o2_h")
o1_515h_files = getFiles(myPath,"515o1_h")
o2_515h_files = getFiles(myPath,"515o2_h")
o3h_files = getFiles(myPath,"h_o3")

o1_477h = buildDataFrame(maxlen,myPath,".477o1_h.data",o1_477h_files)
o2_477h = buildDataFrame(maxlen,myPath,".477o2_h.data",o2_477h_files)
o1_515h = buildDataFrame(maxlen,myPath,".515o1_h.data",o1_515h_files)
o2_515h = buildDataFrame(maxlen,myPath,".515o2_h.data",o2_515h_files)
h_o3 =    buildDataFrame(maxlen,myPath,".h_o3.data",o3h_files)

A carboxylic acid has two potential oxygens that can hold a proton. We must first make sure that we are monitoring the right O-H distance. In other words, identify the oxygen that is holding or interacting with the proton.

For that we can have a quick look at averages and minimum distances during each trajectory.

Show code
for (i in 1:length(o1_477h_files)){
  colname = gsub(".477o1_h.data","",o1_477h_files[i])
  myLine = paste(colname,"AspO1-H Ave",sprintf('%.2f',mean(o1_477h[[colname]],na.rm = TRUE)),
                                 "Min",sprintf('%.2f',min(o1_477h[[colname]],na.rm = TRUE)),
                         "AspO2-H Ave",sprintf('%.2f',mean(o2_477h[[colname]],na.rm = TRUE)),
                                 "Min",sprintf('%.2f',min(o2_477h[[colname]],na.rm = TRUE)),
                 sep = " : ")
  print(myLine)
}
[1] "eqq_n17 : AspO1-H Ave : 1.75 : Min : 1.20 : AspO2-H Ave : 3.52 : Min : 2.85"
[1] "eqq_n15 : AspO1-H Ave : 2.00 : Min : 1.10 : AspO2-H Ave : 3.86 : Min : 2.89"
[1] "eqq_n13 : AspO1-H Ave : 1.84 : Min : 1.12 : AspO2-H Ave : 3.94 : Min : 3.23"
[1] "eqq_n11 : AspO1-H Ave : 1.18 : Min : 1.00 : AspO2-H Ave : 3.25 : Min : 2.91"
[1] "eqq_n09 : AspO1-H Ave : 1.18 : Min : 1.00 : AspO2-H Ave : 3.26 : Min : 2.96"
[1] "eqq_n07 : AspO1-H Ave : 2.28 : Min : 1.01 : AspO2-H Ave : 4.31 : Min : 3.05"
[1] "eqq_n05 : AspO1-H Ave : 3.48 : Min : 2.62 : AspO2-H Ave : 5.47 : Min : 4.66"
[1] "eqq_n04 : AspO1-H Ave : 3.51 : Min : 2.53 : AspO2-H Ave : 5.52 : Min : 4.51"
[1] "eqq_n03 : AspO1-H Ave : 3.56 : Min : 2.77 : AspO2-H Ave : 5.49 : Min : 4.68"
[1] "eqq_n02 : AspO1-H Ave : 3.74 : Min : 3.19 : AspO2-H Ave : 5.58 : Min : 4.76"
[1] "eqq_n01 : AspO1-H Ave : 3.84 : Min : 3.32 : AspO2-H Ave : 5.63 : Min : 4.88"
[1] "eqq_p00 : AspO1-H Ave : 3.91 : Min : 3.35 : AspO2-H Ave : 5.56 : Min : 4.79"
[1] "eqq_p01 : AspO1-H Ave : 3.89 : Min : 3.38 : AspO2-H Ave : 5.53 : Min : 4.68"
[1] "eqq_p02 : AspO1-H Ave : 4.03 : Min : 3.46 : AspO2-H Ave : 5.71 : Min : 5.06"
[1] "eqq_p04 : AspO1-H Ave : 3.74 : Min : 3.29 : AspO2-H Ave : 5.34 : Min : 4.32"
[1] "eqq_p06 : AspO1-H Ave : 3.61 : Min : 3.19 : AspO2-H Ave : 5.43 : Min : 4.96"
[1] "eqq_p07 : AspO1-H Ave : 3.63 : Min : 3.20 : AspO2-H Ave : 5.44 : Min : 4.99"
[1] "eqq_p09 : AspO1-H Ave : 3.67 : Min : 3.20 : AspO2-H Ave : 5.49 : Min : 5.03"
[1] "eqq_p11 : AspO1-H Ave : 3.74 : Min : 3.23 : AspO2-H Ave : 5.57 : Min : 4.94"
[1] "eqq_p13 : AspO1-H Ave : 3.75 : Min : 3.22 : AspO2-H Ave : 5.66 : Min : 4.99"
[1] "eqq_p15 : AspO1-H Ave : 3.94 : Min : 3.35 : AspO2-H Ave : 5.79 : Min : 5.17"

So we can conclude that Asp477 has “O1” as the active oxygen because the O2-H distance is always longer and therfore irrelevant.

Show code
for (i in 1:length(o1_515h_files)){
  colname = gsub(".515o1_h.data","",o1_515h_files[i])
  myLine = paste(colname,"GluO1-H Ave",sprintf('%.2f',mean(o1_515h[[colname]],na.rm = TRUE)),
                                 "Min",sprintf('%.2f',min(o1_515h[[colname]],na.rm = TRUE)),
                         "GluO2-H Ave",sprintf('%.2f',mean(o2_515h[[colname]],na.rm = TRUE)),
                                 "Min",sprintf('%.2f',min(o2_515h[[colname]],na.rm = TRUE)),
                 sep = " : ")
  print(myLine)
}
[1] "eqq_n17 : GluO1-H Ave : 2.47 : Min : 2.24 : GluO2-H Ave : 1.10 : Min : 0.95"
[1] "eqq_n15 : GluO1-H Ave : 2.47 : Min : 2.21 : GluO2-H Ave : 1.08 : Min : 0.94"
[1] "eqq_n13 : GluO1-H Ave : 2.43 : Min : 2.17 : GluO2-H Ave : 1.09 : Min : 0.95"
[1] "eqq_n11 : GluO1-H Ave : 2.63 : Min : 2.25 : GluO2-H Ave : 1.37 : Min : 1.07"
[1] "eqq_n09 : GluO1-H Ave : 2.66 : Min : 2.24 : GluO2-H Ave : 1.40 : Min : 1.07"
[1] "eqq_n07 : GluO1-H Ave : 2.56 : Min : 2.23 : GluO2-H Ave : 1.20 : Min : 0.94"
[1] "eqq_n05 : GluO1-H Ave : 2.45 : Min : 2.14 : GluO2-H Ave : 1.07 : Min : 0.94"
[1] "eqq_n04 : GluO1-H Ave : 2.45 : Min : 2.13 : GluO2-H Ave : 1.08 : Min : 0.94"
[1] "eqq_n03 : GluO1-H Ave : 2.48 : Min : 2.19 : GluO2-H Ave : 1.18 : Min : 0.95"
[1] "eqq_n02 : GluO1-H Ave : 2.53 : Min : 2.16 : GluO2-H Ave : 1.31 : Min : 1.04"
[1] "eqq_n01 : GluO1-H Ave : 2.59 : Min : 2.19 : GluO2-H Ave : 1.43 : Min : 1.10"
[1] "eqq_p00 : GluO1-H Ave : 2.56 : Min : 2.07 : GluO2-H Ave : 1.48 : Min : 1.15"
[1] "eqq_p01 : GluO1-H Ave : 2.25 : Min : 1.29 : GluO2-H Ave : 1.89 : Min : 1.23"
[1] "eqq_p02 : GluO1-H Ave : 1.85 : Min : 1.26 : GluO2-H Ave : 2.35 : Min : 1.45"
[1] "eqq_p04 : GluO1-H Ave : 2.03 : Min : 1.36 : GluO2-H Ave : 2.22 : Min : 1.41"
[1] "eqq_p06 : GluO1-H Ave : 2.76 : Min : 2.29 : GluO2-H Ave : 1.61 : Min : 1.17"
[1] "eqq_p07 : GluO1-H Ave : 2.71 : Min : 2.28 : GluO2-H Ave : 1.59 : Min : 1.19"
[1] "eqq_p09 : GluO1-H Ave : 2.68 : Min : 1.73 : GluO2-H Ave : 1.56 : Min : 1.20"
[1] "eqq_p11 : GluO1-H Ave : 2.65 : Min : 2.03 : GluO2-H Ave : 1.56 : Min : 1.22"
[1] "eqq_p13 : GluO1-H Ave : 2.42 : Min : 1.30 : GluO2-H Ave : 2.06 : Min : 1.35"
[1] "eqq_p15 : GluO1-H Ave : 1.69 : Min : 1.35 : GluO2-H Ave : 2.72 : Min : 2.30"

From the above we see that both oxygens of Glu515 are active at a different times of the reaction.

Finally we can look at when the proton makes it to its final destination, the leaving group O3.

Show code
for (i in 1:length(o3h_files)){
  colname = gsub(".h_o3.data","",o3h_files[i])
  myLine = paste(colname,"O3-H Ave",sprintf('%.2f',mean(h_o3[[colname]],na.rm = TRUE)),
                                 "Min",sprintf('%.2f',min(h_o3[[colname]],na.rm = TRUE)),
                 sep = " : ")
  print(myLine)
}
[1] "eqq_n17 : O3-H Ave : 2.62 : Min : 1.92"
[1] "eqq_n15 : O3-H Ave : 2.30 : Min : 1.46"
[1] "eqq_n13 : O3-H Ave : 2.92 : Min : 1.80"
[1] "eqq_n11 : O3-H Ave : 3.47 : Min : 2.88"
[1] "eqq_n09 : O3-H Ave : 2.90 : Min : 2.39"
[1] "eqq_n07 : O3-H Ave : 2.23 : Min : 1.35"
[1] "eqq_n05 : O3-H Ave : 1.61 : Min : 1.21"
[1] "eqq_n04 : O3-H Ave : 1.61 : Min : 1.21"
[1] "eqq_n03 : O3-H Ave : 1.40 : Min : 1.03"
[1] "eqq_n02 : O3-H Ave : 1.23 : Min : 0.98"
[1] "eqq_n01 : O3-H Ave : 1.16 : Min : 0.98"
[1] "eqq_p00 : O3-H Ave : 1.13 : Min : 0.96"
[1] "eqq_p01 : O3-H Ave : 1.09 : Min : 0.92"
[1] "eqq_p02 : O3-H Ave : 1.06 : Min : 0.93"
[1] "eqq_p04 : O3-H Ave : 1.06 : Min : 0.94"
[1] "eqq_p06 : O3-H Ave : 1.10 : Min : 0.92"
[1] "eqq_p07 : O3-H Ave : 1.10 : Min : 0.94"
[1] "eqq_p09 : O3-H Ave : 1.09 : Min : 0.95"
[1] "eqq_p11 : O3-H Ave : 1.08 : Min : 0.95"
[1] "eqq_p13 : O3-H Ave : 1.06 : Min : 0.93"
[1] "eqq_p15 : O3-H Ave : 1.06 : Min : 0.95"

We can see that it arrives as soon as the window Rc=-0.4 and it stays there for the rest of the reaction.

So it looks like in terms of proton transfer lots of things are happening among the four oxygens being mentioned.

Except for Asp477O2-H distance identified as not relevant, the other four will be taken into account in the below representation: The Asp477-O1 (red), the two Glu515-O (1:blue and 2:green) and the leaving group O3(black).

Show code
#this builds a 3x3 lattice. oma and mar take care of margins
par(mfrow=c(2,3),
    oma = c(5,4,0,0) + 0.1,
    mar = c(0,0,1,1) + 0.1)

for (i in 1:length(h_o3)){
  #I'm using ohfiles just for naming and finding windows
  colname = gsub(".h_o3.data","",o3h_files[i])
  #only show y-axis numbers on the first graph of the row
  #print(colname)
  if ( (i+2)%%3 == 0){
    plot(o1_477h[[colname]],ylim=c(0.5,4.5),type="l",col="red",ann = FALSE,xaxt='n')
    lines(o1_515h[[colname]],ylim=c(0.5,4.5),type="l",col="blue",ann = FALSE,xaxt='n')
    lines(o2_515h[[colname]],ylim=c(0.5,4.5),type="l",col="green",ann = FALSE,xaxt='n')
    lines(h_o3[[colname]],ylim=c(0.5,4.5),type="l",col="black",ann = FALSE,xaxt='n')
  }else{
    plot(o1_477h[[colname]],ylim=c(0.5,4.5),type="l",col="red",ann = FALSE,xaxt='n',yaxt='n')
    lines(o1_515h[[colname]],ylim=c(0.5,4.5),type="l",col="blue",ann = FALSE,xaxt='n',yaxt='n')
    lines(o2_515h[[colname]],ylim=c(0.5,4.5),type="l",col="green",ann = FALSE,xaxt='n',yaxt='n')
    lines(h_o3[[colname]],ylim=c(0.5,4.5),type="l",col="black",ann = FALSE,xaxt='n',yaxt='n')
  }
  rcValue = gsub("eqq_","",colname)
  graphTitle = rcValue
                            
  title(graphTitle)
}

Let’s read carefully the above representations, from products p15 to reactants n17

Conclusion

It is true that we were able to obtain a smooth free energy profile that took us from reactants to products. But at this point the work is not done. Some of the proton transfers in the negative side of the reaction coordinate, specially n11 to n07 will require longer simulations and discard any non-equilibrium arrangements. The goal here is to make sure that we can justify each proton transfer as a genuine and chemically justifiable rather than an artifact of scanning an incomplete reaction coordinate pulled across different energy valleys.

Citation

For attribution, please cite this work as

Prat-Resina (2021, Aug. 17). Prat-Resina's blog: Calculate the Potential of Mean Force from MD of an enzymatic reaction. Retrieved from https://xavierprat.github.io/Blog/posts/calculating_pmf_with_umbrella/

BibTeX citation

@misc{prat-resina2021calculate,
  author = {Prat-Resina, Xavier},
  title = {Prat-Resina's blog: Calculate the Potential of Mean Force from MD of an enzymatic reaction},
  url = {https://xavierprat.github.io/Blog/posts/calculating_pmf_with_umbrella/},
  year = {2021}
}