Moving average #381
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #381      +/-   ##
==========================================
+ Coverage   98.73%   98.78%   +0.05%
==========================================
  Files          28       28
  Lines        1895     1981      +86
  Branches      407      435      +28
==========================================
+ Hits         1871     1957      +86
  Misses          2        2
  Partials       22       22
I'm not sure how to resolve the outstanding codecov issues. Even after the tests enter and execute the lines in an if-statement, codecov labels it as "partially covered".
So codecov will label a line as partially covered if there isn't an explicit else statement and the else path has not been exercised. I have not had a close look yet, but in terms of the metric I think the coverage of this PR is already decent.
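To illustrate the partial-coverage behavior described above, here is a minimal sketch using a hypothetical function (`clip_negative` is not part of this PR). With branch coverage, an `if` without an `else` still has two outcomes: the body runs or it is skipped. A test suite that only ever enters the body leaves the skip path unexercised, which is typically what gets reported as a partial line.

```python
def clip_negative(value):
    """Hypothetical helper: replace negative inputs with zero."""
    if value < 0:  # two branches: condition true (enter body) or false (skip body)
        value = 0
    return value


# Exercises only the "true" branch. If this were the only test, a
# branch-coverage tool would typically mark the `if` line as partial.
assert clip_negative(-3) == 0

# Exercises the implicit "else" (condition false), covering the other branch.
assert clip_negative(5) == 5
```

Adding a test input that skips the body is usually enough; an explicit `else` is not required.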
For reference, when this code was in the |
Do you mind giving me a ping when you are happy for this PR to be reviewed? Thanks.
@xiki-tempula Ok it's ready now!
Thank you very much for the PR; it's very well-written! I have a couple of comments.
It would be important to consider supporting multiple lambda values. MD engines like Gromacs often utilize several lambda windows, with each lambda potentially changing at different rates. It could be problematic if this isn't supported.
@@ -94,7 +95,16 @@ def forward_backward_convergence(
    # select estimator class by name
    my_estimator = estimators_dispatch[estimator](**kwargs)
    logger.info(f"Use {estimator} estimator for convergence analysis.")

    # Check that each df in the list has only one value of lambda
Sorry, do you mind reminding me why one cannot have more than one value of lambda?
I might be wrong, but I think in principle one could run forward_backward_convergence on more than one lambda?
The purpose of this function is to assess whether a particular production run has converged. The lambda state of that system must be constant throughout a dataframe for this assessment. If the lambda state changes later on in the trajectory (toward the bottom rows of the dataframe), the result of this function would not make sense or be useful. A user might eventually find their mistake, or they might conclude that their trajectory is not long enough. This check will help a user quickly produce useful results.
Thank you for the clarification. I think there might be some misunderstanding here. It seems that you're referring to lambda dynamics, where a lambda value is constantly changing at different points in the trajectory. I agree that this isn't supported in this repository.
However, the use case I'm referring to involves MD engines like Gromacs, where multiple windows with different lambda values can run simultaneously. For example, you might have windows such as (coul lambda=0, vdw lambda=0), (coul lambda=0.5, vdw lambda=0), (coul lambda=1, vdw lambda=0), (coul lambda=1, vdw lambda=0.5), and (coul lambda=1, vdw lambda=1). Each lambda window represents an independent simulation, and within each simulation, the lambda value does not change.
The function in question would, for example, take the first 10% of data from all the windows to derive an MBAR estimate, then take the first 20% of data from all the windows to derive another MBAR estimate. This approach ensures that each independent lambda window is appropriately considered.
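A minimal sketch of the slicing scheme described above, assuming each lambda window is represented by a plain list of samples as a stand-in for a DataFrame. The name `forward_slices` and the data are illustrative only and are not taken from this repository; the estimator call is left as a comment.

```python
def forward_slices(windows, num=10):
    """Yield, for each step i = 1..num, the first i/num of every window.

    `windows` is a list of per-lambda-window sample lists (a stand-in for
    the list of DataFrames); every window is truncated independently.
    """
    for i in range(1, num + 1):
        yield [w[: len(w) * i // num] for w in windows]


# Three independent lambda windows with 10 samples each.
windows = [list(range(10)), list(range(100, 110)), list(range(200, 210))]

for step, subset in enumerate(forward_slices(windows, num=5), start=1):
    # In a real analysis, an estimator such as MBAR would be fit to `subset` here.
    print(step, [len(w) for w in subset])
```

Because each window is sliced on its own, windows with different lambda values (or different lengths) all contribute the same fraction of their data at every step.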
I agree, and the way the forward_backward_convergence function handles that is to have each of those windows provided as a separate DataFrame in the df_list. This section is meant to ensure that each DataFrame independently has a constant set of lambda values, not that all the provided DataFrames contain the same set of lambda values.
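A minimal sketch of that per-DataFrame check, using plain tuples as a stand-in for the rows of a DataFrame's index (time first, then one or more lambda values). The function name and data layout are assumptions for illustration, not the code in this PR.

```python
def check_constant_lambda(index_rows):
    """Raise ValueError if the lambda part of the index varies within one window.

    `index_rows` is a list of tuples (time, lambda_1, ..., lambda_n).
    Returns the single lambda tuple shared by all rows.
    """
    if not index_rows:
        raise ValueError("Empty window: no index rows to check")
    lambda_sets = {row[1:] for row in index_rows}  # drop the time entry
    if len(lambda_sets) > 1:
        raise ValueError(
            f"Expected one set of lambda values, found {sorted(lambda_sets)}"
        )
    return next(iter(lambda_sets))


# Each window is constant on its own, even though the windows differ.
window_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
window_b = [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)]
for window in (window_a, window_b):
    check_constant_lambda(window)  # passes silently

# A window whose lambda changes partway through is rejected.
bad_window = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)]
try:
    check_constant_lambda(bad_window)
except ValueError as err:
    print(err)
```

The check runs once per window, so a list of windows with different lambda values passes as long as each window is internally constant.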
I see what you mean now. But what does
ind = [j for j in range(len(lambda_values[0])) if len(list(set([x[j] for x in lambda_values]))) > 1][0]
do? I guess if you want the lambda value to be the same, then
if len(set(df.reset_index('time').index)) > 1:
    raise Exception
should be enough?
I tried to make this flexible with respect to the number of index levels in the dataframe, as any of fep-lambda, vdw-lambda, or coul-lambda could have multiple values. lambda_values is a list of unique lambda sets, e.g., [[1]], [[0],[1]], [[0,0]], or [[0,0], [0,1]]. This line identifies the index that is changing, so for [[0],[1]], ind=0, and for [[0,0], [0,1]], ind=1.
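The same lookup can be written as a small function, shown here as a sketch rather than the PR's code (`changing_index` is a hypothetical name). One difference from the one-liner quoted earlier: taken on its own, that expression raises an IndexError when no position changes, whereas this version returns None.

```python
def changing_index(lambda_values):
    """Return the first position whose value differs across the lambda sets.

    `lambda_values` is a list of unique lambda sets, e.g. [[0, 0], [0, 1]].
    Returns None when every position is constant (e.g. a single lambda set).
    """
    for j in range(len(lambda_values[0])):
        if len({x[j] for x in lambda_values}) > 1:
            return j
    return None


# The examples from the comment above.
assert changing_index([[0], [1]]) == 0
assert changing_index([[0, 0], [0, 1]]) == 1

# Only one unique lambda set: nothing changes.
assert changing_index([[1]]) is None
assert changing_index([[0, 0]]) is None
```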
This is more complicated than I thought. My assumption is that for each column there will only be one float for either fep-lambda, vdw-lambda, or coul-lambda. Is LAMMPS giving this kind of output?
Yes I agree that each DataFrame will have lambda columns that each contain a single value of lambda, but that additional DataFrames may be added to the list with different lambda values.
The complication arises from the case where the u_nk columns vdw-lambda and coul-lambda are both present, so len(df.index[0]) == 3. At that point it wouldn't matter which simulation engine was used to create u_nk.
… in block_average
AUTHORS entry had been forgotten in PR #381
* add CITATION.cff file
* close #394
* add all AUTHORS with ORCIDs and affiliation (as far as they are confirmed); order based on AUTHORS
* add alchemlyb PIs
* add contributors that were listed in JOSS paper acknowledgements (PR #328) but had NOT been listed in AUTHORS (inserted in chronological order)
* only add emails for maintainers/PIs
* add Software Heritage Identifier
* add paper citation section for MBAR, decorrelation, and preliminary citation for JOSS paper #71
* add one new contributor (from recent PR #381) to AUTHORS and CITATIONS.cff
* fix bug introduced in PR #381: there was a change to creating the delta_f_ matrix, which resulted in the columns and indices being tuples that were in the wrong order for single lambda computations.
* ensure that columns are in the correct order by explicitly sorting
* add a test for the delta_f_ columns
* Add moving_average function for visualization and convergence testing
* Update versionadded
* Run Black
* Bug fix bar_.py states
* Update Changelog
* Update the docs
* Add tests
* Formatting to align with Black
* Update tests
* Refactor moving_average to align with forward_backward_convergence function
* Update tests
* Update test_convergence and lambda tests in convergence.moving_average
* Adjust convergence.py and tests for codecoverage
* black
* Update moving_average to block_average for more accurate descriptive name
* Address reviewer comments
* Update test to align with changed handling of dfs of different length in block_average
* Remove incorrect propagation of error in BAR
* Add tests and error catch for ill constructed BAR input, u_nk
* black
* Updated version comments
---------
Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
Addition of Moving Average convergence and visualization functions, in addition to some changes to BAR to make this feasible. #380