From 557d13d25d876c9054785da1e1bd1cbbca0fc524 Mon Sep 17 00:00:00 2001
From: Michael Shirts
Date: Sat, 30 Dec 2023 12:05:24 -0700
Subject: [PATCH] MRS comments (#339)

* MRS comments

Some notes from MRS

* Update joss_paper/paper.md

Co-authored-by: Zhiyi Wu

---------

Co-authored-by: Zhiyi Wu
---
 joss_paper/paper.md | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/joss_paper/paper.md b/joss_paper/paper.md
index 1401a11d..55042034 100644
--- a/joss_paper/paper.md
+++ b/joss_paper/paper.md
@@ -13,14 +13,19 @@ authors:
   - name: David Dotson
     equal-contrib: true # (This is how you can denote equal contributions between multiple authors)
     affiliation: 2
+  - name: Michael R. Shirts
+    orcid: 0000-0003-3249-1097
+    affiliation: 3
   - name: Author with no affiliation
     corresponding: true # (This is how to denote the corresponding author)
     affiliation: 3
 affiliations:
- - name: Exscientia, Oxford, UK.
+ - name: Exscientia, Oxford, UK
    index: 1
  - name: Institution Name, Country
    index: 2
+ - name: University of Colorado Boulder, Boulder, Colorado, USA
+   index: 3
 date: 29 December 2023
 bibliography: paper.bib

@@ -29,44 +34,41 @@ bibliography: paper.bib

 # Summary

-*Alchemlyb* is a dedicated open-source software package tailored for the analysis of alchemical free energy calculations, an integral part of computational chemistry and biology, notably in the field of drug discovery. The software spans a wide range of functions, starting with the extraction of raw data from Molecular Dynamics (MD) engines, moving on to data preprocessing tasks such as decorrelation, using various estimators to derive free energy estimates, and finally providing quality analysis tools for data convergence checking.
+*Alchemlyb* is a dedicated open-source software package tailored for the analysis of alchemical free energy calculations, an integral part of computational chemistry and biology, most notably in the field of drug discovery. The software spans a wide range of functions, starting with the extraction of raw data from molecular dynamics (MD) engines, moving on to data preprocessing tasks such as decorrelation of time series, using various estimators to derive free energy estimates from simulation samples, and finally providing quality analysis tools for data convergence checking.

 A distinctive attribute of *alchemlyb* is its streamlined, end-to-end analysis process reminiscent of the now-discontinued [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) workflow. This user-friendly workflow facilitates navigation through the entire analysis pipeline, from the initial data input stage to the final result derivation, enabling researchers from diverse scientific backgrounds, and not solely computational chemistry specialists, to utilize alchemlyb effectively.

 # Statement of need

-In the pharmaceutical sector, the utilization of computational chemistry techniques is integral for evaluating potential drug compounds based on their protein-binding affinity [@deng2009computations]. Notably, relative/absolute binding free energy calculations are routinely employed for this purpose [@merz2010drug]. The resultant free energy data is essential for understanding binding affinity throughout various stages of drug discovery, such as hit identification and lead optimization [@merz2010drug]. The *alchemlyb* software adeptly processes this data, providing crucial insights and establishing itself as an indispensable asset in computational chemistry [^1].
+In the pharmaceutical sector, computational chemistry techniques are integral for evaluating potential drug compounds based on their protein binding affinity [@deng2009computations]. Notably, absolute binding free energy calculations between proteins and ligands or relative binding affinity of ligands to the same protein are routinely employed for this purpose [@merz2010drug]. The resultant estimates of these free energies are essential for understanding binding affinity throughout various stages of drug discovery, such as hit identification and lead optimization [@merz2010drug]. Other free energies extracted from simulations are useful in solution thermodynamics, chemical engineering, environmental science, and material science. The *alchemlyb* software processes the raw data from MD simulations using key estimators from statistical mechanics, drastically simplifying the process of extracting crucial thermodynamic insights from molecular simulations [^1].

 [^1]: As of 29/12/2023, *alchemlyb* has been downloaded 23,922 times from [conda-forge](https://anaconda.org/conda-forge/alchemlyb/files).

-In the realm of computational research, various molecular dynamics (MD) engines, including GROMACS [@pronk2013gromacs], AMBER [@case2014ff14sb], GOMC [@cummings2021open], and NAMD [@phillips2020scalable], offer distinct tools for conducting free energy calculations. However, the diversity in output formats and analysis tools among different MD engines complicates the research process. Data generated by each engine requires unique processing and analysis methods, hindering seamless collaboration and comparison of results.
-
+Various molecular dynamics (MD) engines, including GROMACS [@pronk2013gromacs], AMBER [@case2014ff14sb], GOMC [@cummings2021open], and NAMD [@phillips2020scalable], offer distinct tools for performing free energy calculations. However, the diversity in output formats and analysis tools among different MD engines complicates the research process. Data generated by each engine requires individualized processing and analysis methods, hindering seamless collaboration and comparison of results.

-Addressing this complexity is the [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) tool [@klimovich2015guidelines], which precedes *alchemlyb*. Although [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) has been deprecated, *alchemlyb* continues to provide a unified, engine-agnostic analysis workflow. Unlike its predecessor, *alchemlyb* breaks down components into individual tools, allowing users to customize their analysis. This innovation enables consistent processing of free energy data from diverse MD engines, facilitating streamlined comparison and combination of results.
+The [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) tool [@klimovich2015guidelines], which preceded *alchemlyb*, addressed this problem. Now that [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) has been deprecated, *alchemlyb* continues to provide a unified, engine-agnostic analysis workflow. Unlike its predecessor, *alchemlyb* breaks down components of the workflow into modular tools, allowing users to more easily customize their analysis. This innovation enables consistent processing of free energy data from diverse MD engines, facilitating streamlined comparison and combination of results.

-Notably, *alchemlyb*'s robust and user-friendly nature has led to its integration into other automated workflow libraries such as Biosimspace [@hedges2023suite]. This further enhances its accessibility and usability within broader scientific workflows, reinforcing its position as a versatile and essential tool in the field of computational chemistry.
+Notably, *alchemlyb*'s robust and user-friendly nature has led to its integration into other automated workflow libraries such as BioSimSpace [@hedges2023suite]. This further enhances its accessibility and usability within broader scientific workflows, reinforcing its position as a versatile and essential tool in the field of computational chemistry.

 # Implementation

-The binding free energy of a drug within a protein is defined as the disparity in free energy between the drug's end-state in the protein's binding pocket and its alternative end-state in a solution, typically water. Absolute binding free energy calculations employ a thermodynamic cycle that establishes a connection between these two end-states through two alchemical legs, namely the bound and free legs (Figure 1). In the bound leg, the drug is decoupled from the binding pocket, while in the free leg, the same drug is decoupled from the solvent. The resulting free energy difference represents the energy required to transfer the drug from the solvent to the protein binding pocket, constituting the binding free energy of the drug.
+The binding free energy of a drug within a protein is defined as the difference in free energy between the drug's end-state in the protein's binding pocket and its alternative end-state in a solution, typically water. Absolute binding free energy calculations employ a thermodynamic cycle that establishes a connection between these two end-states through two alchemical legs, namely the bound and free legs (Figure 1). In the bound leg, the drug is decoupled from the binding pocket, while in the free leg, the same drug is decoupled from the solvent. The resulting free energy difference represents the energy required to transfer the drug from the solvent to the protein binding pocket, constituting the binding free energy of the drug.

-![The thermodynamics cycle of Absolute binding free energy calculation](Fig1.pdf)
+![The thermodynamics cycle of absolute binding free energy calculation](Fig1.pdf)

 To determine the free energy difference associated with decoupling a drug from its environment, it is essential to ensure sufficient overlap in phase space between the coupled and decoupled states, a condition often challenging to achieve. Overlapping is facilitated by introducing a parameter lambda ($\lambda $) that connects the two end-states, leading to the creation of a series of intermediate states. MD engines are employed to simulate the system at these states, generating and accumulating free energy data.

-*Alchemlyb* offers specific parsers designed to load raw free energy data from various MD engines, converting them into standard pandas dataframes. Two types of free energy data are considered: potential energy differences between adjacent lambda states (u_nk), suitable for free energy perturbation (FEP) methods [@zwanzig1954high], and $dU/d\lambda $ at all lambda states, suitable for thermodynamic integration (TI) methods [@kirkwood1935statistical].
+*Alchemlyb* offers specific parsers designed to load raw free energy data from various MD engines, converting them into standard pandas dataframes. Two types of free energy data are considered: potential energy differences between lambda states (u_nk), which are used for free energy perturbation (FEP) methods [@zwanzig1954high], and $dU/d\lambda $ at all lambda states, suitable for thermodynamic integration (TI) methods [@kirkwood1935statistical].

-In *alchemlyb*, TI [@paliwal2011benchmark] and TI with Gaussian quadrature [@gusev2023active] methods are implemented in the TI category. Perturbation category methods include Bennett Acceptance Ratio (BAR) [@bennett1976efficient] and Multistate BAR (MBAR) [@shirts2008statistically]. These methods necessitate uncorrelated samples, and *alchemlyb* provides tools for data resampling based on autocorrelation times [@chodera2007use].
+In *alchemlyb*, TI [@paliwal2011benchmark] and TI with Gaussian quadrature [@gusev2023active] methods are implemented in the TI category. Perturbation category methods include Bennett Acceptance Ratio (BAR) [@bennett1976efficient] and Multistate BAR (MBAR) [@shirts2008statistically]. These methods assume uncorrelated samples, and *alchemlyb* provides tools for data resampling based on autocorrelation times [@chodera2007use].

-To evaluate the accuracy of the free energy estimate, *alchemlyb* offers specific assessment tools. The error of the TI method is correlated with the average curvature [@pham2011identifying], while the error of perturbation methods depends on the overlap in sampled energy distributions [@pohorille2010good]. *Alchemlyb* visualizes the smoothness of the integrand for TI methods and the overlap matrix for perturbation methods. Additionally, the accumulated samples should be at an equilibrated state, and *alchemlyb* allows for plotting the convergence of the free energy estimate as a function of simulation time [@yang2004free] to detect potentially un-equilibrated data.
+To evaluate the accuracy of the free energy estimate, *alchemlyb* offers a range of assessment tools. The error of the TI method is correlated with the average curvature [@pham2011identifying], while the error of perturbation methods depends on the overlap in sampled energy distributions [@pohorille2010good]. *Alchemlyb* visualizes the smoothness of the integrand for TI methods and the overlap matrix for perturbation methods. Additionally, the accumulated samples should be collected from equilibrated simulations, and *alchemlyb* has tools for plotting the convergence of the free energy estimate as a function of simulation time [@yang2004free] to detect potentially un-equilibrated data.

-
-*Alchemlyb* offers all these tools as a library for users to customize each stage of the analysis (Figure 2). Additionally, *alchemlyb* provides an automated end-to-end tool that reads in the raw input data and performs the decorelation, estimation, and quality plotting of the estimate. This automated workflow allows users to experience a similar process as [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) [@klimovich2015guidelines], which is the predecessor of *alchemlyb*.
+*Alchemlyb* offers all these tools as a library for users to customize each stage of the analysis (Figure 2). Additionally, *alchemlyb* provides an automated end-to-end workflow that reads in the raw input data and performs the decorrelation, estimation, and quality plotting of the estimate, similar to the predecessor [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) [@klimovich2015guidelines].

 ![The building blocks of *alchemlyb*](Fig2.pdf)

-
 # Acknowledgements

 We acknowledge contributions from XXXXX during the genesis of this project.