Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add entropy and enthalpy #406

Merged
merged 15 commits into from
Nov 14, 2024

Conversation

jaclark5
Copy link
Contributor

Add enthalpy and entropy outputs for MBAR and BAR (via 2-state MBAR).

Copy link

codecov bot commented Sep 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.80%. Comparing base (9e8139e) to head (d4dbeb0).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #406      +/-   ##
==========================================
+ Coverage   98.78%   98.80%   +0.01%     
==========================================
  Files          28       28              
  Lines        1982     2008      +26     
  Branches      354      356       +2     
==========================================
+ Hits         1958     1984      +26     
  Misses          2        2              
  Partials       22       22              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Parameters
----------
u_nk : DataFrame
u_nk[n,k] is the reduced potential energy of uncorrelated
configuration n evaluated at state k.

use_mbar : bool, optional, default=False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry do you mind explain to me why do you want to do this in BAR? I think if you want the enthalpic and entropic contributions, you would just run MBAR?

@xiki-tempula
Copy link
Collaborator

I'm happy with the MBAR changes, it looks simple and good. I kind of not too happy with the BAR one as it looks very complicated and I think if one want to use MBAR to compute entropy and enthalpy.
One shall just use MBAR?

Or one could do a BAR calculation first and then run MBAR again to get entropy and enthalpy. Instead of use BAR and then run MBAR inside BAR to get entropy and enthalpy.

@mrshirts
Copy link

@jaclark5 and I have been talking over things (she can discuss her side here) and the current thought is that MBAR is better for entropy and enthalpies, though there are some questions about the calculations she is still interested in.

The BAR algorithm is more surely converging since there are better methods for 1-dimensional root finding (I think I coded up false position) than N-dimensional root finding. One can start with BAR, and then use those solutions to initialize 2-state MBAR (should just take 1-2 steps as it locks in numerically, as the values should be the same).

@jaclark5
Copy link
Contributor Author

So the suggestion is to remove the use_mbar flag, and instead run 2-state mbar only for the entropies / enthalpies? If that's the case, then if calculate_enthalpy_entropy is True then BAR.fit() will run BAR and 2-state MBAR in fit(). I can make that change if it sounds agreeable.

@xiki-tempula
Copy link
Collaborator

Would this kind of interface work?

bar = BAR()
bar.fit(u_nk)
mbar=MBAR(initial_guess=bar._delta_f) # Need to do some data massage to be able to get this in as initial guess.
mbar.fit(u_nk, compute_entropy_enthalpy=True)

Note that if you just need to use BAR to provide an initial guess for the MBAR, you could just do

mbar=MBAR(initial_guess='bar') 
mbar.fit(u_nk, compute_entropy_enthalpy=True)

@jaclark5
Copy link
Contributor Author

jaclark5 commented Oct 1, 2024

If you had a populated matrix with all the U(X_i, lambda_j) values, that would yield a different results than 2-state MBAR, which is more akin to BAR. We could include a method that strips u_nk of energy values that are not from adjacent states, then your suggestion would work, but getting an estimate from BAR isn't needed, it can go straight to MBAR.
What if BAR is left alone and has no entropy/enthalpy functionality, but an option is added to the initialization of the MBAR class to strip energy values from nonadjacent states in u_nk; something like 2_state=False by default, which would produce the same result as the current BAR(use_mbar=True)... I think... I would want to test how MBAR handles a sparse matrix. If that isn't equivalent, then a for-loop, like the loop in the BAR class, is needed in the MBAR.fit() method and I think that will look worse, but maybe more appropriate than calling the MBAR class in the BAR class.

Ultimately my motivation is to get enthalpy/entropy values using 2-state MBAR with the simplicity that alchemlyb offers. It turns out that nonadjacent values of U(x_i,lambda_j) in LAMMPS have a numerical issue and I cannot use MBAR, except in the 2-state case. Although it seems that the enthalpy and entropy values for the gromacs benzene example are pretty different for full MBAR vs 2-state MBAR case so I'm not sure if I'll move forward with reporting these values until that is rectified. When/if it is, I would like for my results to be reproducible in some way with this PR.

@mrshirts
Copy link

mrshirts commented Oct 1, 2024

I think... I would want to test how MBAR handles a sparse matrix.

It can't, it will give incorrect results. It's not equivalent.

The advantage of a BAR interface to enthalpy-entropy, I see, is when you did NOT collect the full MBAR data, and only have neighboring values.

Although it seems that the enthalpy and entropy values for the gromacs benzene example are pretty different for full MBAR vs 2-state MBAR case so I'

Yeah, I realize this is still pending with me to look at, things have been piling up.

@jaclark5
Copy link
Contributor Author

jaclark5 commented Oct 1, 2024

Ok, so if there is merit to offering a calculation of 2-state MBAR (which I have need for, and it seems could be more generally used as-well), there are a few options:

  1. Keep the PR as is where the BAR class can contain a 2-state MBAR option within the existing for-loop structure for BAR
  2. Leave the BAR class alone, and and add a 2-state MBAR option to the MBAR class and add the same sort of for-loop structure that is present in the BAR class.
  3. Fill in the blank

@xiki-tempula
Copy link
Collaborator

For the MBAR part, I think I'm happy with the current state, where a simple call to pymbar gives the entropy and enthalpy.
With regard to the BAR part, I think it is a rather peculiar use case. We might need more discussion for it to be included.

@jaclark5 jaclark5 force-pushed the issue_entropy_enthalpy branch from 163a0ff to 74b8341 Compare November 12, 2024 20:07
@jaclark5
Copy link
Contributor Author

Hey @xiki-tempula, I removed the changes to BAR. I was able to resolve my issues with MBAR and LAMMPS. I think you'll find this PR acceptable.

@@ -57,6 +57,20 @@ class MBAR(BaseEstimator, _EstimatorMixOut):
The estimated statistical uncertainty (one standard deviation) in
dimensionless free energy differences.

delta_h_ : DataFrame
The estimated dimensionless enthalpy difference between each state.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is enthalpy dimensionless? Should it be having energy unit?

Copy link
Contributor Author

@jaclark5 jaclark5 Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiki-tempula Do you want me to change the docstring? I put this since the existing documentation for free energy is:

    delta_f_ : DataFrame
        The estimated dimensionless free energy difference between each state.

and I'm treating enthalpy and entropy the same as free energy... it seems I should change entropy appropriately though. See the new commit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then I think this is fine. I will change this in a later PR.

CHANGES Show resolved Hide resolved

self._delta_f_.attrs = u_nk.attrs
self._d_delta_f_.attrs = u_nk.attrs
if compute_entropy_enthalpy:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you do add the units back.

Copy link
Collaborator

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. Thanks. I'm a bit concerned with the description of dimensionless enthalpy as it has unit on it but I guess it is trivial.

self._d_delta_h_.attrs = u_nk.attrs
self._delta_s_.attrs = u_nk.attrs
self._d_delta_s_.attrs = u_nk.attrs
self._delta_s_.attrs["energy_unit"] = "k"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does entropy has unit of k? I think it shall be u_nk.attrs["energy_unit"] + '/K'?

Copy link
Collaborator

@xiki-tempula xiki-tempula Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'm fine for this unit to be the same as other energy unit (though I know it is wrong) as it would mean that the unit conversion would work as it is (between J and cal).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Free energy is in units of energy (J) and when scaled by kT, is in units of kT. Similarly entropy is in units of energy/unit temperature (J/K) and so is scaled by k, with the units of k.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced entropy should be unitless, entropy should be J/degree, right? Even if passed through some unitless quantities, probably the code should be clear what is going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to have the correct units, so I'll look into the unit conversion, I didn't consider that feature.

Copy link
Contributor Author

@jaclark5 jaclark5 Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way alchemlyb defines the units as u_nk.attrs['energy_unit'] = "kT". Adopting the same treatment for entropy to similarly have the correct scaling quantity, 'k' would be used to follow suit.
If alchemlyb's definition of u_nk.attrs['energy_unit'] = "kT" is not the message that is best to send, that's outside of this PR. Although I like it, I think implies that it's dimensionless while also indicating what quantities should be used to restore the units (although I hope that an average user would know).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The energy_unit, allowed to be kT; kcal/mol and kJ/mol, it is just defaults to kT. I think k is more like a scientific constant for most people so having it as unit is kind of bad.
The unit conversion is supported between these three units, which is just a scaling factor.
So the easiest solution is to just set the entropy to have the same unit as enthalpy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To adjust the unit conversion functionality to accommodate entropy, should I add the appropriate duplicated function of to_kT as to_k or add if-statements to detect the case where it's entropy? I'm leaning toward the former since running a matrix of entropy values through to_kT() would feel wrong. This would mean I would duplicate to_kcalmol(df, T=None) as to_kcalmolK(df) and to_kJmol(df, T=None) as to_kJmolK(df).

Copy link
Contributor Author

@jaclark5 jaclark5 Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiki-tempula entropy can't have the same units at kT, otherwise the existing functions would multiply it by a temperature resulting in S*T instead of S. If that's what you want to do we can and be clear in the documentation.

Although the units would still be incorrect all the time with a label of "entropy", maybe adjusting the attribute to be delta_sT would be best.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we could have enthalpy in kT, kJ/mol and kcal/mol.
The entropy should then be in k, kJ/mol/T and kcal/mol/T.

In the current implementation we could have a case where enthalpy is kJ/mol and it is labelled correctly as kJ/mol. But we have entropy in kJ/mol/T but is labelled as k.

@jaclark5 jaclark5 force-pushed the issue_entropy_enthalpy branch from c09fbb1 to d8850fd Compare November 13, 2024 15:34
self._d_delta_h_ = pd.DataFrame(
out["dDelta_u"], columns=self._states_, index=self._states_
)
self._delta_sT_ = pd.DataFrame(
Copy link
Collaborator

@xiki-tempula xiki-tempula Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy for this value to be s*T. Is the mbar output s or s*T?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the mbar output is reduced entropy, so S/k or sT/(kT) so it's consistent.

Copy link
Collaborator

@xiki-tempula xiki-tempula Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that the unit is the same but I worried about the magnitude.
The G=H-sT so if the Delta_s is s then the Delta_f should be the same as Delta_u-Delta_s * T. If Delta_s is sT, then Delta_f should be Delta_u-Delta_s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree, so I'm changing the variable names to make that clear. I'll do whatever the maintainers agree on, maybe @mrshirts or @orbeckst have an opinion?

Summary: Entropy has different units and which need to be handled properly. While the units of enthalpy and free energy are attr['energy_units'] = 'kT', 'kcal/mol' or 'kJ/mol', the equivalent for entropy would then be 'k', 'kcal/mol/K' or 'kJ/mol/K'. The downside of this difference in units would require either having copies of the units.py functions for entropy specifically or add a series of if statements to the functions in units.py to detect and handle entropy units. Also, it's awkward to say that dimensionless entropy has units of 'k'

Alternative: By having the entropy matrix represent S*T (denoted with the variable name delta_sT_) the object can retain the same units as enthalpy and free energy and use the same unit conversion functions, making everything cleaner internally, and avoid the awkward unit "definition" of 'k' for S.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'm merging this PR. I have checked the numerical values and can verify that Delta_s is not s but rather s*T. So it is in the unit of kT, kJ/mol and kcal/mol.

Copy link
Collaborator

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xiki-tempula xiki-tempula merged commit 968d4f2 into alchemistry:master Nov 14, 2024
4 of 8 checks passed
jaclark5 added a commit to jaclark5/alchemlyb that referenced this pull request Nov 15, 2024
* Add enthalpy and entropy calculations

* Add enthalpy and entropy calculations

* Update tests for coverage

* black

* Add bootstrapping to BAR().fit(use_mbar=True) and adapt tests for code coverage

* Update CHANGES

* Update get_group syntax

* Update method default in bar when use_mbar==True

* Remove enthalpy/entropy for BAR

* Address review comments

* Update docstrings for units.py for the case of entropy values

* Update entropy attribute name from delta_s to delta_sT

* Undo irrelevant addition
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants