From 081e90c834b5a1794e2a5b95801639e7c91e7e02 Mon Sep 17 00:00:00 2001
From: David Dotson
Date: Mon, 8 Apr 2024 19:52:02 -0700
Subject: [PATCH] Small edits, clauses set as newlines for easier diffs

---
 joss_paper/paper.md | 77 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 61 insertions(+), 16 deletions(-)

diff --git a/joss_paper/paper.md b/joss_paper/paper.md
index 5df0587c..c5e3fbfb 100644
--- a/joss_paper/paper.md
+++ b/joss_paper/paper.md
@@ -22,7 +22,7 @@ authors:
 affiliations:
  - name: Exscientia, Oxford, UK
    index: 1
- - name: Institution Name, Country
+ - name: Datryllic LLC, Phoenix, AZ, USA
    index: 2
  - name: University of Colorado Boulder, Boulder, Colorado, USA
    index: 3
@@ -34,41 +34,86 @@ bibliography: paper.bib

 # Summary

-*Alchemlyb* is a dedicated open-source software package tailored for the analysis of alchemical free energy calculations, an integral part of computational chemistry and biology, most notably in the field of drug discovery. The software spans a wide range of functions, starting with the extraction of raw data from molecular dynamics (MD) engines, moving on to data preprocessing tasks such as decorrelation of time series, using various estimators to derive free energy estimates from simulation samples, and finally providing quality analysis tools for data convergence checking.
+*Alchemlyb* is a dedicated open-source software package tailored for the analysis of alchemical free energy calculations,
+ an integral part of computational chemistry and biology,
+ most notably in the field of drug discovery.
+The software spans a wide range of functions,
+ starting with the extraction of raw data from molecular dynamics (MD) engines,
+ moving on to data preprocessing tasks such as decorrelation of time series,
+ using various estimators to derive free energy estimates from simulation samples,
+ and finally providing quality analysis tools for data convergence checking.

-A distinctive attribute of *alchemlyb* is its streamlined, end-to-end analysis workflow. This user-friendly workflow facilitates navigation through the entire analysis pipeline, from the initial data input stage to the final result derivation. This attribute enhances accessibility, enabling researchers from diverse scientific backgrounds, and not solely computational chemistry specialists, to utilize alchemlyb effectively.
+A distinctive attribute of *alchemlyb* is its streamlined, end-to-end analysis workflow.
+This user-friendly workflow facilitates navigation through the entire analysis pipeline,
+ from the initial data input stage to the final result derivation. This attribute enhances accessibility,
+ enabling researchers from diverse scientific backgrounds,
+ and not solely computational chemistry specialists,
+ to utilize *alchemlyb* effectively.

 # Statement of need

-In the pharmaceutical sector, computational chemistry techniques are integral for evaluating potential drug compounds based on their protein binding affinity [@deng2009computations]. Notably, absolute binding free energy calculations between proteins and ligands or relative binding affinity of ligands to the same protein are routinely employed for this purpose [@merz2010drug]. The resultant estimates of these free energies are essential for understanding binding affinity throughout various stages of drug discovery, such as hit identification and lead optimization [@merz2010drug]. Other free energies extracted from simulations are useful in solution thermodynamics, chemical engineering, environmental science, and material science. The *alchemlyb* software processes the raw data from MD simulations using key estimators from statistical mechanics, drastically simplifying the process of extracting crucial thermodynamic insights from molecular simulations .
+In the pharmaceutical sector, computational chemistry techniques are integral for evaluating potential drug compounds based on their protein binding affinity [@deng2009computations].
+Notably, absolute binding free energy calculations between proteins and ligands or relative binding affinity of ligands to the same protein are routinely employed for this purpose [@merz2010drug].
+The resultant estimates of these free energies are essential for understanding binding affinity throughout various stages of drug discovery, such as hit identification and lead optimization [@merz2010drug].
+Other free energies extracted from simulations are useful in solution thermodynamics, chemical engineering, environmental science, and material science.
+The *alchemlyb* software processes the raw data from MD simulations using key estimators from statistical mechanics, drastically simplifying the process of extracting crucial thermodynamic insights from molecular simulations.

-Various molecular dynamics (MD) engines, including GROMACS [@pronk2013gromacs], AMBER [@case2014ff14sb], GOMC [@cummings2021open], and NAMD [@phillips2020scalable], offer distinct tools for performing free energy calculations. However, the diversity in output formats and analysis tools among different MD engines complicates the research process. Data generated by each engine requires individualized processing and analysis methods, hindering seamless collaboration and comparison of results.
+Various molecular dynamics (MD) engines, including GROMACS [@pronk2013gromacs], AMBER [@case2014ff14sb], GOMC [@cummings2021open], and NAMD [@phillips2020scalable],
+ offer distinct tools for performing free energy calculations.
+However, the diversity in output formats and analysis tools among different MD engines complicates the research process.
+Data generated by each engine requires individualized processing and analysis methods, hindering seamless collaboration and comparison of results.

-THe [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) tool [@klimovich2015guidelines], which preceeded *alchemlyb*, addressed this problem. Now that [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) has been deprecated, *alchemlyb* continues to provide a unified, engine-agnostic analysis workflow. Unlike its predecessor, *alchemlyb* breaks down components of the workflow into modular tools, allowing users to more easily customize their analysis. This innovation enables consistent processing of free energy data from diverse MD engines, facilitating streamlined comparison and combination of results.
+THe [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) tool [@klimovich2015guidelines], which preceeded *alchemlyb*, addressed this problem.
+Now that [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) has been deprecated,
+ *alchemlyb* continues to provide a unified, engine-agnostic analysis workflow.
+Unlike its predecessor, *alchemlyb* breaks down components of the workflow into modular tools,
+ allowing users to more easily customize their analysis.
+This innovation enables consistent processing of free energy data from diverse MD engines, facilitating streamlined comparison and combination of results.

-Notably, *alchemlyb*'s robust and user-friendly nature has led to its integration into other automated workflow libraries such as BioSimSpace [@hedges2023suite]. This further enhances its accessibility and usability within broader scientific workflows, reinforcing its position as a versatile and essential tool in the field of computational chemistry [^1].
+Notably, *alchemlyb*'s robust and user-friendly nature has led to its integration into other automated workflow libraries such as BioSimSpace [@hedges2023suite].
+This further enhances its accessibility and usability within broader scientific workflows,
+ reinforcing its position as a versatile and essential tool in the field of computational chemistry [^1].

 [^1]: As of 29/12/2023, *alchemlyb* has been downloaded 23,922 times from [conda-forge](https://anaconda.org/conda-forge/alchemlyb/files).

 # Implementation

-Solvation free energy, a key physical property often computed by computational chemists, involves constructing two end states: one where the ligand interacts with water and itself (coupled state), and the other where the ligand interacts only with itself, mimicking the pure solvent and ligand in the gas phase (decoupled state). The solvation free energy is then obtained by calculating the free energy difference between these two end states. To achieve this, it is crucial to ensure sufficient overlap in phase space between the coupled and decoupled states, a condition often challenging to achieve. Overlapping is facilitated by introducing a parameter lambda ($\lambda $) that connects the two end-states, resulting in a series of intermediate states. MD engines simulate the system at these states, generating and accumulating free energy data.
+Solvation free energy, a key physical property often computed by computational chemists, involves constructing two end states:
+ one where the ligand interacts with water and itself (coupled state),
+ and the other where the ligand interacts only with itself, mimicking the pure solvent and ligand in the gas phase (decoupled state).
+The solvation free energy is then obtained by calculating the free energy difference between these two end states.
+To achieve this, it is crucial to ensure sufficient overlap in phase space between the coupled and decoupled states, a condition often challenging to achieve.
+Overlapping is facilitated by introducing a parameter `lambda` ($\lambda $) that connects the two end-states, resulting in a series of intermediate states.
+MD engines simulate the system at these states, generating and accumulating free energy data.
+
+*alchemlyb* offers specific parsers designed to load raw free energy data from various MD engines, converting them into standard `pandas` `DataFrames`.
+Two types of free energy data are considered: reduced potential energy differences between lambda states (`u_nk`, $u_{nk}$), which are used for free energy perturbation (FEP) estimators [@zwanzig1954high],
+ and $dU/d\lambda$ at all lambda states, suitable for thermodynamic integration (TI) estimators [@kirkwood1935statistical].
+
+In *alchemlyb*, TI [@paliwal2011benchmark] and TI with Gaussian quadrature [@gusev2023active] estimators are implemented in the TI category of estimators.
+FEP category estimators include Bennett Acceptance Ratio (BAR) [@bennett1976efficient] and Multistate BAR (MBAR) [@shirts2008statistically].
+These estimators assume uncorrelated samples, and *alchemlyb* provides tools for data resampling based on autocorrelation times [@chodera2007use].
+
+To evaluate the accuracy of the free energy estimate, *alchemlyb* offers a range of assessment tools.
+The error of the TI method is correlated with the average curvature [@pham2011identifying], while the error of FEP estimators depends on the overlap in sampled energy distributions [@pohorille2010good].
+*alchemlyb* visualizes the smoothness of the integrand for TI estimators and the overlap matrix for FEP estimators.
+Additionally, the accumulated samples should be collected from equilibrated simulations,
+ and *alchemlyb* has tools for plotting the convergence of the free energy estimate as a function of simulation time [@yang2004free] to detect potentially un-equilibrated data.
+
+*Alchemlyb* offers all these tools as a library for users to customize each stage of the analysis (Figure 1).
+Additionally, *alchemlyb* provides an automated end-to-end workflow that reads in the raw input data and performs decorrelation, estimation, and quality plotting of the estimate.
+This workflow allows for the estimation of quantities such as solvation free energy with minimal code.
+Moreover, this facilitates more complex calculations, such as absolute binding free energy, which is the free energy difference between the solvation free energy of the ligand in water and the solvation free energy of the ligand in the protein's binding pocket.

-*Alchemlyb* offers specific parsers designed to load raw free energy data from various MD engines, converting them into standard pandas dataframes. Two types of free energy data are considered: potential energy differences between lambda states (u_nk), which are used for free energy perturbation (FEP) methods [@zwanzig1954high], and $dU/d\lambda $ at all lambda states, suitable for thermodynamic integration (TI) methods [@kirkwood1935statistical].
-
-In *alchemlyb*, TI [@paliwal2011benchmark] and TI with Gaussian quadrature [@gusev2023active] methods are implemented in the TI category. Perturbation category methods include Bennett Acceptance Ratio (BAR) [@bennett1976efficient] and Multistate BAR (MBAR) [@shirts2008statistically]. These methods assume uncorrelated samples, and *alchemlyb* provides tools for data resampling based on autocorrelation times [@chodera2007use].
-
-To evaluate the accuracy of the free energy estimate, *alchemlyb* offers a range of assessment tools. The error of the TI method is correlated with the average curvature [@pham2011identifying], while the error of perturbation methods depends on the overlap in sampled energy distributions [@pohorille2010good]. *Alchemlyb* visualizes the smoothness of the integrand for TI methods and the overlap matrix for perturbation methods. Additionally, the accumulated samples should be collected from equilibrated simulations, and *alchemlyb* has tools for plotting the convergence of the free energy estimate as a function of simulation time [@yang2004free] to detect potentially un-equilibrated data.
+![The building blocks of *alchemlyb*](Fig1.pdf)

-*Alchemlyb* offers all these tools as a library for users to customize each stage of the analysis (Figure 1). Additionally, *alchemlyb* provides an automated end-to-end workflow that reads in the raw input data and performs decorrelation, estimation, and quality plotting of the estimate. This workflow allows for the estimation of quantities such as solvation free energy with minimal code. Moreover, one could it facilitates more complex calculations, such as absolute binding free energy, which is the free energy difference between the solvation free energy of the ligand in water and the solvation free energy of the ligand in the protein's binding pocket.
-![The building blocks of *alchemlyb*](Fig1.pdf)

 # Acknowledgements

-O.B. and D.D. designed the project. Z.W., D.D., contribute to the new features. Z.W., D.D., O.B. maintain the codebase. Z.W., M.R.S wrote the manuscript.
+O.B. and D.D. designed the project. Z.W., D.D., contributed to the new features. Z.W., D.D., O.B. maintain the codebase. Z.W., M.R.S wrote the manuscript.

 # References