
Commit

edit
xiki-tempula committed Dec 29, 2023
1 parent e8976be commit 7816081
Showing 3 changed files with 15 additions and 10 deletions.
Binary file added joss_paper/Fig1.pdf
Binary file added joss_paper/Fig2.pdf
25 changes: 15 additions & 10 deletions joss_paper/paper.md
@@ -29,37 +29,42 @@ bibliography: paper.bib

# Summary

*Alchemlyb* is an open-source software package for the analysis of alchemical free energy calculations, which are an integral part of computational chemistry and biology, notably in drug discovery. Its functionality spans the whole analysis: extracting raw data from molecular dynamics (MD) engines, preprocessing the data (for example, decorrelation), deriving free energy estimates with a range of estimators, and assessing the quality and convergence of the results.

A distinctive attribute of *alchemlyb* is its streamlined, end-to-end analysis workflow, reminiscent of the now-discontinued [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) tool. This workflow guides users through the entire analysis pipeline, from the initial data input to the final result, enabling researchers from diverse scientific backgrounds, not only computational chemistry specialists, to use *alchemlyb* effectively.

# Statement of need

In the pharmaceutical sector, computational chemistry techniques are integral to evaluating potential drug compounds based on their protein-binding affinity [@deng2009computations]. Notably, relative and absolute binding free energy calculations are routinely employed for this purpose [@merz2010drug]. The resulting free energy data are essential for understanding binding affinity throughout the stages of drug discovery, such as hit identification and lead optimization [@merz2010drug]. *Alchemlyb* turns these data into free energy estimates and quality diagnostics, and has become a widely used tool in computational chemistry[^1].

[^1]: As of 29/12/2023, *alchemlyb* has been downloaded 23,922 times from [conda-forge](https://anaconda.org/conda-forge/alchemlyb/files).

Various molecular dynamics (MD) engines, including GROMACS [@pronk2013gromacs], AMBER [@case2014ff14sb], GOMC [@cummings2021open], and NAMD [@phillips2020scalable], offer distinct tools for conducting free energy calculations. However, the diversity of output formats and analysis tools among MD engines complicates the research process: data generated by each engine require their own processing and analysis methods, which hinders seamless collaboration and the comparison of results.


The [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) tool [@klimovich2015guidelines], which predates *alchemlyb*, addressed this complexity. Although alchemical-analysis.py has since been deprecated, *alchemlyb* continues to provide a unified, engine-agnostic analysis workflow. Unlike its predecessor, *alchemlyb* breaks the analysis down into individual components, allowing users to customize each step. This design enables consistent processing of free energy data from diverse MD engines, making it straightforward to compare and combine results.


Notably, *alchemlyb*'s robustness and ease of use have led to its integration into other automated workflow libraries such as BioSimSpace [@hedges2023suite]. This further extends its accessibility and usability within broader scientific workflows and reinforces its position as a versatile tool in computational chemistry.

# Implementation

The binding free energy of a drug to a protein is defined as the difference in free energy between two end states: the drug in the protein's binding pocket and the drug in solution, typically water. Absolute binding free energy calculations employ a thermodynamic cycle that connects these two end states through two alchemical legs, the bound and free legs (Figure 1). In the bound leg, the drug is decoupled from the binding pocket; in the free leg, the same drug is decoupled from the solvent. The difference between the two legs gives the free energy change of transferring the drug from the solvent to the protein binding pocket, which is the binding free energy of the drug.
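
As a worked illustration of the cycle, the following sketch combines the two legs into a binding free energy. It assumes that both legs are reported as free energies of decoupling, deliberately omits the restraint and standard-state corrections a real absolute calculation needs, and uses made-up leg values (e.g. in kcal/mol). `binding_free_energy` is a hypothetical helper, not an *alchemlyb* function.

```python
import math


def binding_free_energy(dg_free_leg, dg_bound_leg, err_free=0.0, err_bound=0.0):
    """Combine the two decoupling legs of an absolute binding free energy cycle.

    dg_free_leg:  free energy of decoupling the drug from the solvent.
    dg_bound_leg: free energy of decoupling the drug from the binding pocket.
    Restraint and standard-state corrections are intentionally left out.
    """
    # Around the cycle: decouple in solvent, then recouple in the pocket,
    # so the bound leg enters with the opposite sign.
    dg_bind = dg_free_leg - dg_bound_leg
    # The legs are simulated independently, so their errors add in quadrature.
    err = math.sqrt(err_free ** 2 + err_bound ** 2)
    return dg_bind, err


# Hypothetical leg values, for illustration only.
dg, err = binding_free_energy(10.0, 18.0, err_free=0.3, err_bound=0.4)
print(f"{dg:.1f} +/- {err:.1f}")  # -8.0 +/- 0.5
```

A negative result means that the drug is more stable in the pocket than in solution, i.e. that it binds.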

![The thermodynamic cycle of an absolute binding free energy calculation](Fig1.pdf)

To determine the free energy difference associated with decoupling a drug from its environment, the coupled and decoupled states must overlap sufficiently in phase space, a condition that is often challenging to achieve. Overlap is obtained by introducing a parameter lambda ($\lambda$) that connects the two end states, creating a series of intermediate states. MD engines then simulate the system at these states, generating and accumulating free energy data.
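
To make the role of the intermediate states concrete, here is a minimal sketch on a hypothetical toy system rather than an MD simulation: a one-dimensional harmonic oscillator whose reduced potential is linearly interpolated in $\lambda$, so the exact answer, $\frac{1}{2}\ln(k_1/k_0)$, is known. Samples are drawn at each state of an evenly spaced $\lambda$ schedule, exponential averaging (the Zwanzig FEP relation) is applied only between adjacent states, where the distributions overlap well, and the stage estimates are summed. The names `staged_fep`, `K0` and `K1` are illustrative assumptions.

```python
import math
import random

# Hypothetical toy model, not an MD system: a 1-D harmonic oscillator whose
# reduced potential (in kT) is linearly interpolated between two end states,
#   u_lambda(x) = 0.5 * ((1 - lambda) * K0 + lambda * K1) * x**2,
# so the exact end-state free energy difference is 0.5 * ln(K1 / K0).
K0, K1 = 1.0, 20.0


def force_constant(lam):
    return (1.0 - lam) * K0 + lam * K1


def reduced_potential(x, lam):
    return 0.5 * force_constant(lam) * x * x


def sample_state(lam, n, rng):
    # Boltzmann samples of a harmonic state are Gaussian with variance 1 / k.
    sigma = 1.0 / math.sqrt(force_constant(lam))
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def staged_fep(lambdas, n_per_state=2000, seed=0):
    """Sum Zwanzig estimates, -ln <exp(-(u_b - u_a))>_a, over adjacent states."""
    rng = random.Random(seed)
    total = 0.0
    for lam_a, lam_b in zip(lambdas[:-1], lambdas[1:]):
        xs = sample_state(lam_a, n_per_state, rng)
        weights = [math.exp(-(reduced_potential(x, lam_b) - reduced_potential(x, lam_a)))
                   for x in xs]
        total += -math.log(sum(weights) / len(weights))
    return total


lambdas = [i / 10 for i in range(11)]  # 11 evenly spaced lambda states
exact = 0.5 * math.log(K1 / K0)
print(f"staged FEP: {staged_fep(lambdas):.3f}  exact: {exact:.3f}")
```

Real MD engines use more elaborate, typically soft-core, pathways between the end states, but the structure is the same: many closely spaced states, each overlapping with its neighbours.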

*Alchemlyb* offers parsers that load raw free energy data from various MD engines and convert them into standard pandas DataFrames. Two types of free energy data are considered: potential energy differences between lambda states (`u_nk`), suitable for free energy perturbation (FEP) methods [@zwanzig1954high], and $dU/d\lambda$ at all lambda states, suitable for thermodynamic integration (TI) methods [@kirkwood1935statistical].
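
To illustrate the two data types, the following sketch uses a hypothetical harmonic toy model and plain Python lists instead of *alchemlyb*'s pandas layout. For one configuration it records the reduced potential (in units of $k_BT$) at every $\lambda$ state, from which the energy differences between states used by FEP follow, and the $dU/d\lambda$ value used by TI. `u_nk_row` and `dhdl` are hypothetical helper names.

```python
# Toy stand-in for what a parser extracts from one frame (plain lists, not
# alchemlyb's DataFrame layout). Hypothetical harmonic model, in kT:
#   u_lambda(x) = 0.5 * ((1 - lambda) * K0 + lambda * K1) * x**2
K0, K1 = 1.0, 20.0
LAMBDAS = [0.0, 0.25, 0.5, 0.75, 1.0]


def reduced_potential(x, lam):
    return 0.5 * ((1.0 - lam) * K0 + lam * K1) * x * x


def u_nk_row(x):
    """FEP-type data: the sample's reduced potential at every lambda state."""
    return [reduced_potential(x, lam) for lam in LAMBDAS]


def dhdl(x):
    """TI-type data: dU/dlambda, which is lambda-independent for this linear model."""
    return 0.5 * (K1 - K0) * x * x


x = 1.0  # one configuration, e.g. sampled at lambda = 0.5
row = u_nk_row(x)
diffs = [b - a for a, b in zip(row[:-1], row[1:])]  # differences between adjacent states
print(row)      # [0.5, 2.875, 5.25, 7.625, 10.0]
print(dhdl(x))  # 9.5
```

Because the toy potential is linear in $\lambda$, each adjacent-state difference equals $\Delta\lambda \cdot dU/d\lambda$ (here $0.25 \times 9.5 = 2.375$); with the nonlinear pathways used in practice, the two data types carry different information.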

In *alchemlyb*, the TI category comprises TI [@paliwal2011benchmark] and TI with Gaussian quadrature [@gusev2023active]; the perturbation category includes the Bennett Acceptance Ratio (BAR) [@bennett1976efficient] and Multistate BAR (MBAR) [@shirts2008statistically]. These estimators require uncorrelated samples, and *alchemlyb* provides tools for subsampling the data based on autocorrelation times [@chodera2007use].
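
The decorrelation step can be sketched as follows, assuming the statistical inefficiency definition $g = 1 + 2\sum_{t}(1 - t/N)\,C(t)$ from [@chodera2007use], with the sum truncated at the first non-positive normalized autocorrelation $C(t)$ and one sample kept every $\lceil g \rceil$ steps. This is a plain-Python illustration on a synthetic AR(1) series with known $g = (1+\phi)/(1-\phi)$; *alchemlyb*'s own preprocessing functions operate on its DataFrames and differ in details.

```python
import math
import random


def statistical_inefficiency(series):
    """g = 1 + 2 * sum_t (1 - t/N) * C(t), truncated at the first C(t) <= 0."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        return 1.0  # a constant series carries no correlation information
    g = 1.0
    for t in range(1, n):
        # Normalized autocorrelation at lag t.
        c = sum(dev[i] * dev[i + t] for i in range(n - t)) / ((n - t) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)


def subsample(series):
    """Keep every ceil(g)-th sample so the retained samples are roughly independent."""
    stride = math.ceil(statistical_inefficiency(series))
    return series[::stride]


# Correlated test data: an AR(1) process with exact g = (1 + phi) / (1 - phi).
rng = random.Random(1)
phi, x, data = 0.9, 0.0, []
for _ in range(20000):
    x = phi * x + rng.gauss(0.0, 1.0)
    data.append(x)

g = statistical_inefficiency(data)
print(f"g = {g:.1f} (exact {(1 + phi) / (1 - phi):.0f}), kept {len(subsample(data))} of {len(data)}")
```

Subsampling with a fixed stride is the simplest choice; it discards data but leaves the estimators' independence assumption approximately satisfied.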

To evaluate the accuracy of the free energy estimate, *alchemlyb* offers dedicated assessment tools. The error of the TI method correlates with the average curvature of the integrand [@pham2011identifying], while the error of perturbation methods depends on the overlap between sampled energy distributions [@pohorille2010good]. *Alchemlyb* therefore visualizes the smoothness of the integrand for TI methods and the overlap matrix for perturbation methods. In addition, the accumulated samples should be drawn from an equilibrated state, and *alchemlyb* can plot the convergence of the free energy estimate as a function of simulation time [@yang2004free] to detect potentially unequilibrated data.
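
The link between curvature and TI error can be seen with a small sketch: integrate a smooth, known $\langle dU/d\lambda \rangle$ curve (the analytic mean for a hypothetical harmonic toy model with $k_0 = 1$, $k_1 = 4$) with the trapezoidal rule on increasingly dense $\lambda$ grids, and report both the integration error and a finite-difference estimate of the largest second derivative. This only illustrates the principle; it is not how *alchemlyb* computes or plots its diagnostics.

```python
import math

# Hypothetical TI integrand: the analytic <dU/dlambda> of a linearly
# interpolated harmonic toy model, 0.5 * (k1 - k0) / k(lambda), in kT.
K0, K1 = 1.0, 4.0


def mean_dudl(lam):
    return 0.5 * (K1 - K0) / ((1.0 - lam) * K0 + lam * K1)


def trapezoid(xs, ys):
    """Trapezoidal rule over a (possibly uneven) lambda grid."""
    return sum(0.5 * (x1 - x0) * (y0 + y1)
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def max_curvature(xs, ys):
    """Largest |f''| from central second differences on an evenly spaced grid."""
    h = xs[1] - xs[0]
    return max(abs(ys[i - 1] - 2.0 * ys[i] + ys[i + 1]) / h ** 2
               for i in range(1, len(ys) - 1))


exact = 0.5 * math.log(K1 / K0)
errors = {}
for n_states in (5, 11, 21):
    lams = [i / (n_states - 1) for i in range(n_states)]
    vals = [mean_dudl(lam) for lam in lams]
    errors[n_states] = trapezoid(lams, vals) - exact
    print(f"{n_states:2d} states: error {errors[n_states]:+.4f}, "
          f"max |f''| ~ {max_curvature(lams, vals):.1f}")
```

For the trapezoidal rule, the leading error term scales as $h^2$ times the second derivative of the integrand, so a strongly curved $\langle dU/d\lambda \rangle$ profile calls for more closely spaced $\lambda$ states in that region.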

*Alchemlyb* offers all these tools as a library, so that users can customize each stage of the analysis (Figure 2). Additionally, *alchemlyb* provides an automated end-to-end tool that reads in the raw input data and performs decorrelation, estimation, and quality plotting of the estimate. This automated workflow gives users a process similar to that of [alchemical-analysis.py](https://github.com/MobleyLab/alchemical-analysis) [@klimovich2015guidelines], the predecessor of *alchemlyb*.

![The building blocks of *alchemlyb*](Fig2.pdf)


# Acknowledgements
