
Moving average #381

Merged
merged 24 commits from moving_average into alchemistry:master on Sep 14, 2024

Conversation

jaclark5
Contributor

Adds Moving Average convergence and visualization functions, along with some changes to BAR to make this feasible. #380


codecov bot commented Jul 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.78%. Comparing base (4cba9ed) to head (750cb0d).
Report is 25 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #381      +/-   ##
==========================================
+ Coverage   98.73%   98.78%   +0.05%     
==========================================
  Files          28       28              
  Lines        1895     1981      +86     
  Branches      407      435      +28     
==========================================
+ Hits         1871     1957      +86     
  Misses          2        2              
  Partials       22       22              


@jaclark5
Contributor Author

jaclark5 commented Jul 24, 2024

The convergence and visualization modules don't appear to be present in the alchemtest repository; how should this be handled? I found them in the main repository, not in alchemtest.

@jaclark5 jaclark5 force-pushed the moving_average branch 2 times, most recently from ee2937e to b5fad16 on July 25, 2024 19:19
@jaclark5
Contributor Author

I'm not sure how to resolve the outstanding codecov issues. Even after entering and executing the lines in an if-statement, codecov still labels it as "partially covered".

@jaclark5 jaclark5 force-pushed the moving_average branch 2 times, most recently from c22cfd3 to cb2c664 on July 25, 2024 20:36
@xiki-tempula
Collaborator

codecov will label a line as partially covered if there isn't an explicit else branch and the else path has not been exercised. I have not had a close look yet, but in terms of metrics I think the coverage of this PR is already decent.
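For illustration, here is a minimal, hypothetical example of the pattern being described (not code from this PR): an `if` with no explicit `else` has an implicit fall-through branch that coverage tools count separately.

```python
def clamp_nonnegative(x):
    # If the test suite only ever calls this with x < 0, the implicit
    # "else" (fall-through) path of the if is never exercised, and
    # branch-coverage tools report the if line as partially covered.
    if x < 0:
        x = 0
    return x

print(clamp_nonnegative(-5))  # 0 (takes the if branch)
print(clamp_nonnegative(3))   # 3 (takes the fall-through branch)
```

Calling the function with arguments on both sides of the condition, as above, is what makes the branch fully covered.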

@jaclark5
Contributor Author

For reference, when this code was in the lammps branch you asked why states was necessary instead of self._states_. When BAR is run with a single fep-lambda value (e.g., 0.1) while self._states_ reflects the larger set of states in the columns (e.g., 0.0, 0.1, 0.2, ..., 1.0), BAR complains about a mismatch between a (1, 1) matrix [from fep-lambda] and an (11, 11) matrix [from self._states_].
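A small numpy sketch of the shape mismatch described above (the variable names here are illustrative stand-ins, not alchemlyb internals):

```python
import numpy as np

# The u_nk columns cover all 11 fep-lambda states...
all_states = [round(0.1 * i, 1) for i in range(11)]  # plays the role of self._states_
# ...but this run only sampled a single fep-lambda value
sampled_states = [0.1]

# Building the free-energy matrix from all column states gives an 11x11
# array, while the sampled data only support a 1x1 result; the estimator
# then complains about the (1, 1) vs (11, 11) mismatch unless it is
# restricted to the states actually sampled
full = np.zeros((len(all_states), len(all_states)))
reduced = np.zeros((len(sampled_states), len(sampled_states)))
print(full.shape, reduced.shape)  # (11, 11) (1, 1)
```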

@xiki-tempula
Collaborator

Do you mind giving me a ping when you are happy for this PR to be reviewed? Thanks.

@jaclark5
Contributor Author

@xiki-tempula Ok it's ready now!

Collaborator

@xiki-tempula xiki-tempula left a comment


Thank you very much for the PR; it's very well-written! I have a couple of comments.

It would be important to consider supporting multiple lambda values. MD engines like Gromacs often utilize several lambda windows, with each lambda potentially changing at different rates. It could be problematic if this isn't supported.

docs/convergence.rst (outdated)
@@ -94,7 +95,16 @@ def forward_backward_convergence(
# select estimator class by name
my_estimator = estimators_dispatch[estimator](**kwargs)
logger.info(f"Use {estimator} estimator for convergence analysis.")


# Check that each df in the list has only one value of lambda
Collaborator


Sorry, do you mind reminding me why one cannot have more than one value of lambda?
I might be wrong, but I think that in principle one could run forward_backward_convergence on more than one lambda?

Contributor Author


The purpose of this function is to assess whether a particular production run has converged, and the lambda state of that system must be constant throughout a DataFrame for this assessment. If the lambda state changes later in the trajectory (toward the bottom rows of the DataFrame), the result of this function would not make sense or be useful: a user might eventually find their mistake, or they may wrongly conclude that their trajectory is not long enough. This check will help a user quickly produce useful results.

Collaborator


Thank you for the clarification. I think there might be some misunderstanding here. It seems that you're referring to lambda dynamics, where a lambda value is constantly changing at different points in the trajectory. I agree that this isn't supported in this repository.

However, the use case I'm referring to involves MD engines like Gromacs, where multiple windows with different lambda values can run simultaneously. For example, you might have windows such as (coul lambda=0, vdw lambda=0), (coul lambda=0.5, vdw lambda=0), (coul lambda=1, vdw lambda=0), (coul lambda=1, vdw lambda=0.5), and (coul lambda=1, vdw lambda=1). Each lambda window represents an independent simulation, and within each simulation, the lambda value does not change.

The function in question would, for example, take the first 10% of data from all the windows to derive an MBAR estimate, then take the first 20% of data from all the windows to derive another MBAR estimate. This approach ensures that each independent lambda window is appropriately considered.
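That scheme can be sketched as follows. This is an illustrative helper under assumed names (`forward_fractions`, `windows`), not the actual forward_backward_convergence implementation:

```python
import pandas as pd

def forward_fractions(df_list, num=10):
    """Yield the first i/num fraction of every lambda window, for i = 1..num.

    Each yielded list of truncated windows would be fed to an estimator
    (e.g. MBAR) to obtain one point of the forward convergence curve.
    """
    for i in range(1, num + 1):
        frac = i / num
        yield [df.iloc[: max(1, int(len(df) * frac))] for df in df_list]

# Two independent lambda windows, possibly of different lengths
windows = [pd.DataFrame({"u": range(100)}), pd.DataFrame({"u": range(50)})]
for subset in forward_fractions(windows, num=10):
    pass  # estimate the free energy from `subset` here
```

Note that every window contributes its own first 10%, 20%, ... of samples, so each independent simulation is considered at every point of the curve.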

Contributor Author


I agree, and the way the forward_backward_convergence function handles that is to have each of those windows provided as a separate DataFrame in df_list. This section is meant to ensure that each DataFrame has a constant set of lambda values independently, not that all the provided DataFrames contain the same set of lambda values.
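A minimal sketch of such a per-DataFrame check, assuming the usual alchemlyb-style MultiIndex with a time level followed by lambda levels (the frame below is fabricated for illustration):

```python
import pandas as pd

# Fabricated u_nk-style window: index levels (time, coul-lambda, vdw-lambda)
idx = pd.MultiIndex.from_tuples(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    names=["time", "coul-lambda", "vdw-lambda"],
)
df = pd.DataFrame({"0.0": [0.1, 0.2, 0.3]}, index=idx)

# Dropping the time level leaves only the lambda levels; a well-formed
# window sits at exactly one lambda state for its whole trajectory
lambda_states = set(df.reset_index("time").index)
if len(lambda_states) > 1:
    raise ValueError("each DataFrame must have a single constant lambda state")
print(lambda_states)  # {(0.0, 0.0)}
```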

Collaborator

@xiki-tempula xiki-tempula Sep 4, 2024


I see what you mean now. But what does

ind = [j for j in range(len(lambda_values[0])) if len(list(set([x[j] for x in lambda_values]))) > 1][0]

do? I guess if you want the lambda value to be the same, then

if len(set(df.reset_index('time').index)) > 1:
    raise Exception

should be enough?

Contributor Author


I tried to make this flexible with respect to the number of indices available in the DataFrame, since any of fep-lambda, vdw-lambda, or coul-lambda could take multiple values. Given that lambda_values is a list of unique lambda sets, e.g., [[1]], [[0],[1]], [[0,0]], or [[0,0],[0,1]], this line identifies the index that is changing: for [[0],[1]], ind=0, and for [[0,0],[0,1]], ind=1.
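A runnable illustration of that line, using the example sets from this comment (the wrapper function name `changing_index` is hypothetical, added only for the demonstration):

```python
def changing_index(lambda_values):
    # Return the index of the first lambda level whose value differs
    # across the unique lambda tuples of a window list
    return [
        j
        for j in range(len(lambda_values[0]))
        if len(list(set([x[j] for x in lambda_values]))) > 1
    ][0]

print(changing_index([(0,), (1,)]))      # 0: the single fep-lambda level changes
print(changing_index([(0, 0), (0, 1)]))  # 1: first level constant, second changes
```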

Collaborator


This is more complicated than I thought. My assumption is that for each column there will only be one float for each of fep-lambda, vdw-lambda, or coul-lambda. Is LAMMPS giving this kind of output?

Contributor Author

@jaclark5 jaclark5 Sep 10, 2024


Yes, I agree that each DataFrame will have lambda columns that each contain a single value of lambda, but additional DataFrames with different lambda values may be added to the list.

The complication arises in the case where the u_nk index levels vdw-lambda and coul-lambda are both present, so that len(df.index[0]) == 3; at that point it wouldn't matter which simulation engine was used to create u_nk.

src/alchemlyb/convergence/convergence.py (outdated)
src/alchemlyb/convergence/convergence.py (outdated)
src/alchemlyb/convergence/convergence.py
src/alchemlyb/estimators/bar_.py
src/alchemlyb/estimators/bar_.py
src/alchemlyb/tests/test_convergence.py (outdated)
src/alchemlyb/tests/test_convergence.py
src/alchemlyb/tests/test_visualisation.py
@xiki-tempula xiki-tempula merged commit b1b6d4f into alchemistry:master Sep 14, 2024
8 checks passed
@orbeckst orbeckst mentioned this pull request Sep 14, 2024
@orbeckst
Copy link
Member

@jaclark5 we forgot to add you to AUTHORS in this PR.

I'll fix it in PR #395 .

orbeckst added a commit that referenced this pull request Sep 16, 2024
AUTHORS entry had been forgotten in PR #381
orbeckst added a commit that referenced this pull request Sep 16, 2024
* add CITATION.cff file
* close #394
* add all AUTHORS with ORCIDs and affiliation (as far as they are confirmed); order based on AUTHORS
* add alchemlyb PIs
* add contributors that were listed in JOSS paper acknowledgements (PR #328) but had NOT been listed in AUTHORS (inserted in chronological order)
* only add emails for maintainers/PIs
* add Software Heritage Identifier
* add paper citation section for MBAR, decorrelation, and preliminary citation for JOSS paper #71 
* add one new contributor (from recent PR #381) to AUTHORS and CITATIONS.cff
orbeckst pushed a commit that referenced this pull request Sep 17, 2024
* fix bug introduced in PR #381: there was a change to creating the delta_f_ matrix,
  which resulted in the columns and indices being tuples that were in the wrong order 
  for single lambda computations.
* ensure that columns are in the correct order by explicitly sorting
* add a test for the delta_f_ columns
jaclark5 added a commit to jaclark5/alchemlyb that referenced this pull request Sep 19, 2024
* Add moving_average function for visualization and convergence testing

* Update versionadded

* Run Black

* Bug fix bar_.py states

* Update Changelog

* Update the docs

* Add tests

* Formatting to align with Black

* Update tests

* Refactor moving_average to align with forward_backward_convergence function

* Update tests

* Update test_convergence and lambda tests in convergence.moving_average

* Adjust convergence.py and tests for codecoverage

* black

* Update moving_average to block_average for more accurate descriptive name

* Address reviewer comments

* Update test to align with changed handling of dfs of different length in block_average

* Remove incorrect propagation of error in BAR

* Add tests and error catch for ill-constructed BAR input, u_nk

* black

* Updated version comments

---------

Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
jaclark5 pushed a commit to jaclark5/alchemlyb that referenced this pull request Sep 19, 2024
AUTHORS entry had been forgotten in PR alchemistry#381
jaclark5 added a commit to jaclark5/alchemlyb that referenced this pull request Sep 19, 2024
* fix bug introduced in PR alchemistry#381: there was a change to creating the delta_f_ matrix,
  which resulted in the columns and indices being tuples that were in the wrong order 
  for single lambda computations.
* ensure that columns are in the correct order by explicitly sorting
* add a test for the delta_f_ columns
jaclark5 added a commit to jaclark5/alchemlyb that referenced this pull request Nov 15, 2024
* Add moving_average function for visualization and convergence testing
jaclark5 added a commit to jaclark5/alchemlyb that referenced this pull request Nov 15, 2024
* fix bug introduced in PR alchemistry#381 (delta_f_ column and index ordering for single lambda computations)