-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallel read and preprocess the data #371
Parallel read and preprocess the data #371
Conversation
…tps://github.com/alchemistry/alchemlyb into 359-speed-up-the-readpreprocess-in-abfe-workflow
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #371 +/- ##
=======================================
Coverage 98.78% 98.78%
=======================================
Files 28 28
Lines 1978 1982 +4
Branches 435 436 +1
=======================================
+ Hits 1954 1958 +4
Misses 2 2
Partials 22 22 ☔ View full report in Codecov by Sentry. |
0f9ddae
to
d5fe5dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks like a powerful new feature for the workflow. Do you have some simple performance benchmarks?
My primary concerns are (see comments)
- Making sure that joblib is explicitly installed.
- Is the new default
n_jobs=-1
safe? - add docs
import pytest | ||
from alchemtest.amber import load_bace_example | ||
from alchemtest.gmx import load_ABFE | ||
from joblib import parallel_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just import joblib
so that it's clearer below what comes from joblib? In this way any un-qualified functions and classes are alchemlyb and everything else is external.
suffix="xvg", | ||
T=310, | ||
) | ||
with parallel_config(backend="threading"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find
with parallel_config(backend="threading"): | |
with joblib.parallel_config(backend="threading"): |
clearer when quickly reading the code.
src/alchemlyb/workflows/abfe.py
Outdated
@@ -115,7 +116,7 @@ def __init__( | |||
else: | |||
raise NotImplementedError(f"{software} parser not found.") | |||
|
|||
def read(self, read_u_nk=True, read_dHdl=True): | |||
def read(self, read_u_nk=True, read_dHdl=True, n_jobs=-1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is default -1 really always the best choice? Did you try on a machine with, say, 16 cores, and 16 hyperthreaded cores (or really anything with hyperthreads)?
I am willing to make -1 the default if this is not throwing surprises for users. Otherwise the conservative 1 would be better and users can then explicitly enable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add versionchanged to docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a paragraph about parallelization: how to enable it, what it does (for each file), any potential problems...
src/alchemlyb/workflows/abfe.py
Outdated
@@ -201,6 +219,7 @@ def run( | |||
overlap="O_MBAR.pdf", | |||
breakdown=True, | |||
forwrev=None, | |||
n_jobs=-1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
- Is -1 safe as new default?
- add docs (explanation)
- add versionchanged
src/alchemlyb/workflows/abfe.py
Outdated
@@ -307,7 +329,7 @@ def update_units(self, units=None): | |||
logger.info(f"Set unit to {units}.") | |||
self.units = units or None | |||
|
|||
def preprocess(self, skiptime=0, uncorr="dE", threshold=50): | |||
def preprocess(self, skiptime=0, uncorr="dE", threshold=50, n_jobs=-1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
- Is -1 safe as new default?
- add docs (explanation)
- add versionchanged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all my comments and especially the extensive docs.
Use joblib to parallelise the read and preprocess.