From e8d57e957d9170b076abca2f2393b2eb989f449a Mon Sep 17 00:00:00 2001
From: Michael Shirts
Date: Sat, 27 Apr 2024 11:49:02 -0600
Subject: [PATCH] Update paper.md with some comments.

Some additional rephrasings and clarifications. Also adding another reference in (will do in another commit)
---
 joss_paper/paper.md | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/joss_paper/paper.md b/joss_paper/paper.md
index b6aafebe..e848bcee 100644
--- a/joss_paper/paper.md
+++ b/joss_paper/paper.md
@@ -54,18 +54,18 @@ Other free energies extracted from simulations are useful in solution thermodyna
 Molecular dynamics (MD) packages such as GROMACS [@pronk2013gromacs], AMBER [@case2014ff14sb], NAMD [@phillips2020scalable], and GOMC [@cummings2021open] are used to run free energy simulations and many of these packages also contain tools for the subsequent processing of simulation data into free energies.
 However, there are no standard output formats and analysis tools implement different algorithms for the different stages of the free energy data processing pipeline.
 Therefore, it is very difficult to analyze data from different MD packages in a consistent manner.
-Furthermore, the native analysis tools do not always implement current best practices [@klimovich2015guidelines] or are out of date
-Overall, the coupling between data generation and analysis in most MD packages hinders seamless collaboration and comparison of results across.
+Furthermore, the native analysis tools do not always implement current best practices [@klimovich2015guidelines,@mey2020bestpractices] or are out of date.
+Overall, the lack of coupling between data generation and analysis in most MD packages hinders seamless collaboration and comparison of results across different implementations of data generation for free energy calculations.
-*alchemlyb* addresses this problem by focusing only on the data analysis with the goal to provide a unified interface for working with free energy data.
-In an initial step data are read from the native MD package file formats and then organized into a common standard data structure, a pandas Dataframe.
-Additional functions enable subsampling or decorrelation of data and applying estimators from statistical mechanics to derive free energy quantities.
-Overall, *alchemlyb* implements modular building blocks to simplify the process of extracting crucial thermodynamic insights from molecular simulations in a uniform manner.
+*alchemlyb* addresses this problem by focusing only the data analysis portion of this process with the goal to provide a unified interface for working with free energy data generated from different MD packages.
+In an initial step, data are read from the native MD package file formats and then organized into a common standard data structure, organized as a pandas Dataframe.
+Additional functions enable subsampling or decorrelation of data and applying statistical mechanical estimators to extract the free energies and thermodynamic expectations as well associated metrics of quality.
+*alchemlyb* implements these workflows using modular building blocks to simplify the process of extracting crucial thermodynamic insights from molecular simulations in a uniform manner.
 *alchemlyb* succeeds the widely-used but now deprecated [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) tool [@klimovich2015guidelines], which combined pre-processing, free energy estimation, and plotting in a single script.
-`alchemical-analysis.py` was not thoroughly tested and hard to integrate into modern workflows due to its monolithic design.
+`alchemical-analysis.py` was not thoroughly tested and hard to integrate into modern workflows due to its monolithic design, as well as remaining in python 2.
 *alchemlyb* improves over its predecessor with a modular, function based design and thorough testing of all components using continuous integration.
-Thus, *alchemlyb* is primarily a library that enables users to easily use well-tested building blocks with in their own tools while additionally providing examples of complete end-to-end workflows.
+Thus, *alchemlyb* is a library that enables users to easily use well-tested building blocks with in their own tools while additionally providing examples of complete end-to-end workflows.
 This innovation enables consistent processing of free energy data from diverse MD packages, facilitating streamlined comparison and combination of results.
 Notably, *alchemlyb*'s robust and user-friendly nature has led to its integration into other automated workflow libraries such as BioSimSpace [@hedges2023suite].
@@ -74,33 +74,31 @@ This further enhances its accessibility and usability within broader scientific
 # Implementation
-Solvation free energy, a key physical property often computed by computational chemists, involves constructing two end states: one where the ligand interacts with water and itself (coupled state), and the other where the ligand interacts only with itself, mimicking the pure solvent and ligand in the gas phase (decoupled state).
+Transfer free energies, key physical property often computed by computational chemists, involves constructing two end states where a target molecule interacts with different environments For example, in a solvation free energy calculation, at one state (the coupled state) it interacts with a solvent (in the case hydration free energies, water), and the other where the ligand has no intermolecular interactions (the decoupled state), mimicking the transfer of a ligand at infinite dilution in the solvent at one end of the process and then ligand in the gas phase at the other.
 The solvation free energy is then obtained by calculating the free energy difference between these two end states.
 To achieve this, it is crucial to ensure sufficient overlap in phase space between the coupled and decoupled states, a condition often challenging to achieve.
-Overlapping is facilitated by introducing a parameter `lambda` ($\lambda $) that connects the two end-states, resulting in a series of intermediate states.
-MD engines simulate the system at these states, generating and accumulating free energy data.
+The creation of overlapping states is facilitated by introducing a parameter `lambda` ($\lambda$) that continuously connects the functional form of the two end-states, resulting in a series of intermediate states each with a $\lambda$ value ranging from 0 to 1 (inclusive) are simulated. MD engines simulate the system at these states at these intermediate alchemical states, generating and accumulating free energy data discussed below.
 *alchemlyb* offers specific parsers designed to load raw free energy data from various MD engines, converting them into standard `pandas` `DataFrames`.
 Two types of free energy data are considered: Hamiltonian gradients (`dHdl`, $dH/d\lambda$) at all lambda states, suitable for thermodynamic integration (TI) estimators [@kirkwood1935statistical], and reduced potential energy differences between lambda states (`u_nk`, $u_{nk}$), which are used for free energy perturbation (FEP) estimators [@zwanzig1954high].
 In *alchemlyb*, TI [@paliwal2011benchmark] and TI with Gaussian quadrature [@gusev2023active] estimators are implemented in the TI category of estimators.
 FEP category estimators include Bennett Acceptance Ratio (BAR) [@bennett1976efficient] and Multistate BAR (MBAR) [@shirts2008statistically].
-These estimators assume uncorrelated samples, and *alchemlyb* provides tools for data resampling based on autocorrelation times [@chodera2007use].
+These estimators assume uncorrelated samples in order to give unbiased estimates of the uncertainties, and *alchemlyb* provides tools for data resampling based on autocorrelation times [@chodera2007use].
 To evaluate the accuracy of the free energy estimate, *alchemlyb* offers a range of assessment tools.
 The error of the TI method is correlated with the average curvature [@pham2011identifying], while the error of FEP estimators depends on the overlap in sampled energy distributions [@pohorille2010good].
-*alchemlyb* visualizes the smoothness of the integrand for TI estimators and the overlap matrix for FEP estimators.
-Additionally, the accumulated samples should be collected from equilibrated simulations, and *alchemlyb* has tools for plotting the convergence of the free energy estimate as a function of simulation time [@yang2004free] to detect potentially un-equilibrated data.
+*alchemlyb* creates visualizations of the smoothness of the integrand for TI estimators and the overlap matrix for FEP estimators, which can be qualitatively and quantitatively analyzed to determine the degree of overlap between simulated alchemical states, and suggest whether additional simulations should be run.
+For statistical validity, the accumulated samples should be collected from equilibrated simulations, an *alchemlyb* thus also has tools for plotting the convergence of the free energy estimate as a function of simulation time [@yang2004free] to detect the presence of potentially un-equilibrated data.
 *alchemlyb* offers all these tools as a library for users to customize each stage of the analysis (Figure 1).
-Additionally, *alchemlyb* provides an automated end-to-end workflow that reads in the raw input data and performs decorrelation, estimation, and quality plotting of the estimate.
+Additionally, *alchemlyb* provides an automated end-to-end workflow that carries out all stages of the analysing, reading in the raw input data and performs decorrelation, estimation, and quality plotting of the estimates.
 This workflow allows for the estimation of quantities such as solvation free energy with minimal code.
 Moreover, this facilitates more complex calculations, such as absolute binding free energy, which is the free energy difference between the solvation free energy of the ligand in water and the solvation free energy of the ligand in the protein's binding pocket.
 ![The building blocks of *alchemlyb*](Fig1.pdf)
-
 # Acknowledgements
 Some work on alchemlyb was supported by grants from the National Institutes of Health (Award No R01GM118772 to O.B.) and the National Science Foundation (award ACI-1443054 to O.B.).