Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add entropy and enthalpy #406

Merged
merged 15 commits into from
Nov 14, 2024
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions CHANGES
Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,12 @@ The rules for this file:
* accompany each entry with github issue/PR number (Issue #xyz)
* release numbers follow "Semantic Versioning" https://semver.org

**/**/**** xiki-tempula
**/**/**** xiki-tempula, jaclark5

* 2.5.0

Enhancements
- Added matrices for entropy and enthalpy for MBAR (PR #406)
jaclark5 marked this conversation as resolved.
Show resolved Hide resolved
- Parallelise read and preprocess for ABFE workflow. (PR #371)


Expand All @@ -27,7 +28,7 @@ Enhancements
Fixes
- [doc] tutorial: use alchemlyb.concat (PR #399)
- Resolve pandas FutureWarnings in bar_.py and mbar_.py (issue #408 PR #407)


09/17/2024 jaclark5, orbeckst

Expand Down
20 changes: 20 additions & 0 deletions src/alchemlyb/estimators/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,10 @@ class _EstimatorMixOut:
_d_delta_f_ = None
_delta_f_ = None
_states_ = None
_d_delta_h_ = None
_delta_h_ = None
_d_delta_s_ = None
_delta_s_ = None

@property
def d_delta_f_(self):
Expand All @@ -14,6 +18,22 @@ def d_delta_f_(self):
def delta_f_(self):
return self._delta_f_

@property
def d_delta_h_(self):
return self._d_delta_h_

@property
def delta_h_(self):
return self._delta_h_

@property
def d_delta_s_(self):
return self._d_delta_s_

@property
def delta_s_(self):
return self._delta_s_

@property
def states_(self):
return self._states_
60 changes: 52 additions & 8 deletions src/alchemlyb/estimators/mbar_.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,20 @@ class MBAR(BaseEstimator, _EstimatorMixOut):
The estimated statistical uncertainty (one standard deviation) in
dimensionless free energy differences.

delta_h_ : DataFrame
The estimated dimensionless enthalpy difference between each state.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is enthalpy dimensionless? Should it be having energy unit?

Copy link
Contributor Author

@jaclark5 jaclark5 Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiki-tempula Do you want me to change the docstring? I put this since the existing documentation for free energy is:

    delta_f_ : DataFrame
        The estimated dimensionless free energy difference between each state.

and I'm treating enthalpy and entropy the same as free energy... it seems I should change entropy appropriately though. See the new commit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then I think this is fine. I will change this in a later PR.


d_delta_h_ : DataFrame
The estimated statistical uncertainty (one standard deviation) in
dimensionless enthalpy differences.

delta_s_ : DataFrame, optional
The estimated dimensionless entropy difference between each state.

d_delta_s_ : DataFrame
The estimated statistical uncertainty (one standard deviation) in
dimensionless entropy differences.

theta_ : DataFrame
The theta matrix.

Expand All @@ -80,9 +94,11 @@ class MBAR(BaseEstimator, _EstimatorMixOut):
.. versionchanged:: 2.1.0
`n_bootstraps` option added.
.. versionchanged:: 2.4.0
Handle initial estimate, initial_f_k, from bar in the instance
that not all lambda states represented as column headers are
represented in the indices of u_nk.
Handle initial estimate, initial_f_k, from bar in the instance
that not all lambda states represented as column headers are
represented in the indices of u_nk.
.. versionchanged:: 2.5.0
Added computation of enthalpy and entropy

"""

Expand Down Expand Up @@ -110,7 +126,7 @@ def __init__(
# handle for pymbar.MBAR object
self._mbar = None

def fit(self, u_nk):
def fit(self, u_nk, compute_entropy_enthalpy=False):
"""
Compute overlap matrix of reduced potentials using multi-state
Bennett acceptance ratio.
Expand All @@ -121,6 +137,9 @@ def fit(self, u_nk):
``u_nk[n, k]`` is the reduced potential energy of uncorrelated
configuration ``n`` evaluated at state ``k``.

compute_entropy_enthalpy : bool, optional, default=False
Compute entropy and enthalpy.

"""
# sort by state so that rows from same state are in contiguous blocks
u_nk = u_nk.sort_index(level=u_nk.index.names[1:])
Expand Down Expand Up @@ -171,13 +190,18 @@ def fit(self, u_nk):
solver_protocol=self.method,
n_bootstraps=self.n_bootstraps,
)
if self.n_bootstraps == 0:
uncertainty_method = None
else:
uncertainty_method = "bootstrap"

uncertainty_method = None if self.n_bootstraps == 0 else "bootstrap"
out = self._mbar.compute_free_energy_differences(
return_theta=True, uncertainty_method=uncertainty_method
)
if compute_entropy_enthalpy:
out.update(
self._mbar.compute_entropy_and_enthalpy(
uncertainty_method=uncertainty_method
)
)

self._delta_f_ = pd.DataFrame(
out["Delta_f"], columns=self._states_, index=self._states_
)
Expand All @@ -187,9 +211,29 @@ def fit(self, u_nk):
self.theta_ = pd.DataFrame(
out["Theta"], columns=self._states_, index=self._states_
)
if compute_entropy_enthalpy:
self._delta_h_ = pd.DataFrame(
out["Delta_u"], columns=self._states_, index=self._states_
)
self._d_delta_h_ = pd.DataFrame(
out["dDelta_u"], columns=self._states_, index=self._states_
)
self._delta_s_ = pd.DataFrame(
out["Delta_s"], columns=self._states_, index=self._states_
)
self._d_delta_s_ = pd.DataFrame(
out["dDelta_s"], columns=self._states_, index=self._states_
)

self._delta_f_.attrs = u_nk.attrs
self._d_delta_f_.attrs = u_nk.attrs
if compute_entropy_enthalpy:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you do add the units back.

self._delta_h_.attrs = u_nk.attrs
self._d_delta_h_.attrs = u_nk.attrs
self._delta_s_.attrs = u_nk.attrs
self._d_delta_s_.attrs = u_nk.attrs
self._delta_s_.attrs["energy_unit"] = "k"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does entropy has unit of k? I think it shall be u_nk.attrs["energy_unit"] + '/K'?

Copy link
Collaborator

@xiki-tempula xiki-tempula Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'm fine for this unit to be the same as other energy unit (though I know it is wrong) as it would mean that the unit conversion would work as it is (between J and cal).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Free energy is in units of energy (J) and when scaled by kT, is in units of kT. Similarly entropy is in units of energy/unit temperature (J/K) and so is scaled by k, with the units of k.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced entropy should be unitless, entropy should be J/degree, right? Even if passed through some unitless quantities, probably the code should be clear what is going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to have the correct units, so I'll look into the unit conversion, I didn't consider that feature.

Copy link
Contributor Author

@jaclark5 jaclark5 Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way alchemlyb defines the units as u_nk.attrs['energy_unit'] = "kT". Adopting the same treatment for entropy to similarly have the correct scaling quantity, 'k' would be used to follow suit.
If alchemlyb's definition of u_nk.attrs['energy_unit'] = "kT" is not the message that is best to send, that's outside of this PR. Although I like it, I think implies that it's dimensionless while also indicating what quantities should be used to restore the units (although I hope that an average user would know).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The energy_unit, allowed to be kT; kcal/mol and kJ/mol, it is just defaults to kT. I think k is more like a scientific constant for most people so having it as unit is kind of bad.
The unit conversion is supported between these three units, which is just a scaling factor.
So the easiest solution is to just set the entropy to have the same unit as enthalpy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To adjust the unit conversion functionality to accommodate entropy, should I add the appropriate duplicated function of to_kT as to_k or add if-statements to detect the case where it's entropy? I'm leaning toward the former since running a matrix of entropy values through to_kT() would feel wrong. This would mean I would duplicate to_kcalmol(df, T=None) as to_kcalmolK(df) and to_kJmol(df, T=None) as to_kJmolK(df).

Copy link
Contributor Author

@jaclark5 jaclark5 Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiki-tempula entropy can't have the same units at kT, otherwise the existing functions would multiply it by a temperature resulting in S*T instead of S. If that's what you want to do we can and be clear in the documentation.

Although the units would still be incorrect all the time with a label of "entropy", maybe adjusting the attribute to be delta_sT would be best.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we could have enthalpy in kT, kJ/mol and kcal/mol.
The entropy should then be in k, kJ/mol/T and kcal/mol/T.

In the current implementation we could have a case where enthalpy is kJ/mol and it is labelled correctly as kJ/mol. But we have entropy in kJ/mol/T but is labelled as k.

self._d_delta_s_.attrs["energy_unit"] = "k"

return self

Expand Down
25 changes: 25 additions & 0 deletions src/alchemlyb/tests/test_fep_estimators.py
Original file line number Diff line number Diff line change
Expand Up @@ -198,6 +198,31 @@ def test_bootstrap(gmx_benzene_Coulomb_u_nk):
assert mbar_bootstrap_err != mbar_err


def test_enthalpy_entropy_mbar(gmx_benzene_Coulomb_u_nk):
mbar = MBAR()
u_nk = alchemlyb.concat(gmx_benzene_Coulomb_u_nk)
mbar.fit(u_nk, compute_entropy_enthalpy=True)

assert mbar.delta_f_.iloc[0, :].to_numpy() == pytest.approx(
np.array([0.0, 1.619069, 2.557990, 2.986302, 3.041156]), abs=1e-6
)
assert mbar.delta_h_.iloc[0, :].to_numpy() == pytest.approx(
np.array([0.0, 1.241970, 1.695000, 1.706555, 1.388906]), abs=1e-6
)
assert mbar.delta_s_.iloc[0, :].to_numpy() == pytest.approx(
np.array([0.0, -0.377099, -0.862990, -1.279746, -1.652250]), abs=1e-6
)
assert mbar.d_delta_f_.iloc[0, :].to_numpy() == pytest.approx(
np.array([0.0, 0.008802, 0.014432, 0.018097, 0.020879]), abs=1e-6
)
assert mbar.d_delta_h_.iloc[0, :].to_numpy() == pytest.approx(
np.array([0.0, 0.011598, 0.016538, 0.018077, 0.018940]), abs=1e-6
)
assert mbar.d_delta_s_.iloc[0, :].to_numpy() == pytest.approx(
np.array([0.0, 0.012242, 0.019519, 0.023606, 0.026107]), abs=1e-6
)


def test_wrong_initial_f_k():
with pytest.raises(
ValueError, match="Only `BAR` is supported as string input to `initial_f_k`"
Expand Down
Loading