-
-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functionality to select backend #107
Conversation
I also made a fuction to see the available backends and get the current backend
Yay! This is what I was looking for, thank you very much for taking the time to work on this. I'm really busy these days, but I'll do a proper review of this ASAP. From a quick look, it looks good!
Ah! I did not think about this issue at all, great that you bring it up. I should think about it a bit. I don't like the idea of having to change the molecule (as you guessed), but there might be no way around it. In the latter case, your proposed solution sounds sensible. |
Awesome! Another point I wasn't sure of in the beginning was if the graph module had to be reimported after a backend change, but from my testing this doesn't seem to be necessary. (I thought that maybe the inital As for the code review, don't worry, there is no rush. After your review we can decide how to move forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @Jnelen, I left a few comments from a first pass of review.
You are right, we can work first on this before adding some tests.
Mmmh, I just noticed that the CI tests failed. Besides the style/formatting ones that should be fixable with |
This commit contains the suggested changes regarding making the available_backends variable private, and refactoring the get_available_backends method to available_backends Co-authored-by: Rocco Meli <r.meli@bluemail.ch>
…le_backends everywhere where possible
Hi @RMeli , Thanks a lot for your review! It's really helpful and I'm learning a lot. I tried to implement/add your suggestions as well as I could. Other things that still should be addressed as well I suppose are:
Kind regards, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, at the moment I can't really think of a better way.
Me neither, and as I mentioned this is not necessarily a blocker. I'll try to look more closely at the mypy
error, and maybe also have a look at how matplotlib
switches between different plotting backends.
adding more tests (with the set_backend() feature)
fixing the molecule.to_graph() edge case caching (which gives problems when switching backends)
Definitely, but let's make sure that the design is robust before doing this. I think we are nearly there, thanks to your work!
For the molecular graph, the solution you suggested sounds sensible. Another possibility is to compute the graph on the fly, since caching it already caused bugs previously (see #17).
print warning when a certain backend isn't installed
Good suggestion for the _validate_backend function. I just updated to code to implement this. For the alias dictionary, I went with "generating" it, since I think it might more useful when more backends are added? If you don't like it we can also just replace it with defining the dictionary "manually". Also, i'm bad at coming up with variable names, so feel free to change them how you see fit. Finally regarding the caching of molecular graphs, I agree that just always doing it on the fly is a reasonable solution as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the alias dictionary, I went with "generating" it, since I think it might more useful when more backends are added?
I think that's good, it's less error prone.
If you don't like it we can also just replace it with defining
I also added a warning when a backend is not installed.
Thanks, I think this is useful.
I'll leave the decision up to you as for what path to take!
I'll experiment with this and see what's best.
In terms of tests, we need to add a CI configuration that has both networkx
and graph-tool
installed, and then a specific test (or set of tests) that only run if multiple backends are installed. I'd do something like the following (it's just a sketch):
def test_set_backend():
import networks as nx
import graph_tool as gt
# A = ...
set_backend("network")
G = graph.graph_from_adjacency_matrix(A)
assert isinstance(G, nx.Graph)
set_backend("network")
G = graph.graph_from_adjacency_matrix(A)
assert isinstance(G, gt.Graph)
Can you please also run black
and pre-commits
, in order to fix the formatting issues that are failing in CI (pre-commit
and flake8
actions)? Together with the change for mypy
I think CI should come back all green!
Thanks a lot for your work, much appreciated!
spyrmsd/graph.py
Outdated
) | ||
_available_backends.append("networkx") | ||
except ImportError: | ||
warnings.warn("The networkx backend does not seem to be installed.", stacklevel=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the effect of stacklevel=2
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is a weird one.. I also have this with the set_backend method.
Basically, without it, this is the warning I recieved when setting the backend looks like this:
spyrmsd/spyrmsd/graph.py:70: UserWarning: The backend is already graph-tool warnings.warn(f"The backend is already {backend}")
So for some reason it's referring to the part where the warning is raised in the graph.py file. I found this odd, and after some googling I found the stacklevel argument. After setting the stacklevel=2, this is the warning that's produced:
customtest_backend.py:32: UserWarning: The backend is already graph-tool graph.set_backend("GT")
Now it refers to the part of the user's code in that actually caused the warning to happen. I found this more sensible, but I agree it's peculiar. I can remove all instances where I used stacklevel=2 and remove that part if it's more suitable.
add __all__ back refactor _alias_backendDict to _alias_to_backend
Co-authored-by: Rocco Meli <r.meli@bluemail.ch>
Would you want a separate set of tests? Or would you also like to modify existing tests (for example the rsmd tests) to run each of them with both backends?
Yes, I just incorporated most of the changes. I also (finally) used the pre-commit hooks. Apologies, I haven't really used this before, but i'm very glad to learn about it! After learning more about it, I think this is a very good practice which is an important part of excellent software development. I think we're making good progress and getting close! Thank you so much for your patience and your in-depth feedback and review. |
Looks like To be more concrete, I think something like the following should work def _dummy(*args, **kwargs):
"""
Dummy function for backend not set.
"""
raises NotImplementedError("No backend is set.")
num_vertices = _dummy
num_edges = _dummy
# etc...
# In set_backend, assign the function to the proper backend
# num_vertices = nx_num_vertices
# etc... Edit: I'm back to the keyboard and I tested this and it indeed works. Funnily enough, when I typed
I think making sure that the backend is switched and that the functions you added work is the main thing in this PR. So I'd add tests where the tests for A bonus for this PR would be to print out which backend is used by default when running the tests, so that it is clear in CI. Running all existing tests with both backends is something for another PR, which can also simplify the CI matrix quite a lot (right now, there is one configuration per graph library). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this before, but from a few tests I did, the set_backend
function should not return anything. It should just set the _current_backend
, which needs to be set as global
at the top of the function (you need global
to be able to modify such variables, while for read-only it is not needed).
FYI this is the quick test I added, which will need to be skipped when there is only one backend installed:
def test_set_backend() -> None:
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 1]])
import networkx as nx
import graph_tool as gt
graph.set_backend("networkx")
assert graph.get_backend() == "networkx"
Gnx = graph.graph_from_adjacency_matrix(A)
assert isinstance(Gnx, nx.Graph)
graph.set_backend("graph-tool")
assert graph.get_backend() == "graph-tool"
Ggt = graph.graph_from_adjacency_matrix(A)
assert isinstance(Ggt, gt.Graph)
with pytest.raises(ValueError, match="backend is not recognized or supported"):
graph.set_backend("unknown")
Co-authored-by: Rocco Meli <r.meli@bluemail.ch>
remove __all__ assignment
You're right, this seems to work! Thanks a lot for the suggestion, I think it would have taken me a long time to come up with something like this myself!
I would imagine this shouldn't be too difficult by using something like
For the tests, couldn't we use the graph._available_backends, and use those as fixtures? At the start of every test (or start of the test file), we can just set the backend using the fixture as an input. I think this approach should work if you only one backend, but also would make the tests a lot more flexible when adding new backends. Maybe I'm too optimistic, but if we can make it work like this, we could reduce the CI matrix and just install both nx and gt, and it will automatically run both. But as you said, maybe it'd be cleaner to do this from a separate PR and get this to work first. |
For the simple tests of the functions you added (like the small example I did), I think using def test_set_backend() -> None:
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 1]])
nx = pyrest.importskip("networkx")
gt = pytest.importskip("grap_tool")
graph.set_backend("networkx")
assert graph.get_backend() == "networkx"
Gnx = graph.graph_from_adjacency_matrix(A)
assert isinstance(Gnx, nx.Graph)
graph.set_backend("graph-tool")
assert graph.get_backend() == "graph-tool"
Ggt = graph.graph_from_adjacency_matrix(A)
assert isinstance(Ggt, gt.Graph)
with pytest.raises(ValueError, match="backend is not recognized or supported"):
graph.set_backend("unknown")
Yes, this is what I had in mind. Ideally, I'd like a solution that does not need to change every test function (i.e. run the fixture for setting the backend for all the tests), but the general idea is the same. However, I think this is better off on another PR to separate the concerns (if you want to work on that great, otherwise I can take care of this more boring stuff).
Right, I think your suggestion of caching based on the backend is good. We can clean it up later if we then decide that building on the fly is cleaner. A test for this will be also needed. |
Fantastic, thanks!
Sounds good to me!
Yes, exactly. This is definitely good enough to get this PR merged. The only thing that's missing is a CI configuration that actually runs the test, since in the current setup I only install one backend per configuration. Do you want to look into this or should I? (I'm asking because some people dislike CI work and I don't want to put this burden on you.) If you would like to look into it, you'll need to
|
Ah ofcourse, I didn't think about that, but it makes a lot of sense!
I'll take a look! I'm planning to add CI to a project I will work on soon anyway, so I will need to learn it anyway. Why not start now haha. I think I can handle it, but If for some reason I can't get it to work I will let you know so you can take over! |
That's great, thanks! |
Co-authored-by: Rocco Meli <r.meli@bluemail.ch>
…sn't support graph-tool)
It has been a great experience, so thank you for that!
I'm on it! Also, what about the documentation, can you update it? Or how does that work? |
Glad to hear that, thank you for bearing with me!
Dang! I always forget about the documentation. At this point I think it's better off on another PR. This one is definitely ready to go! I'll open a couple of follow-up issues with TODOs to track these stuff. If you want to keep working on it, feel free to ping me on the new issues and I can assign them to you, otherwise I'll take care of them (eventually). |
I feel the opposite is more appropriate! Thank you for the patience and being very helpful in general
Yeah I agree completely!
Great! I think from here some more things can still be improved and I'm always happy to help! But the first thing I want to work on now is the parallel RMSD calculation with timeout. First I want to make it work for my application, and after that I'll see if I can adapt it to a clean spyrmsd function. After that I can try to help with some of the other stuff we talked about, like backend support for the standalone tool and cleaning up/simplifying the CI. I also have some other projects coming up however, so things might get busy for me, but I will try to keep an eye out and help where possible! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Jnelen, LGTM!
* add details on NotImplementedError exceptions * changelog * remove versioneer * changelog * fix warning * fix tests with pytest 8 * update changelog * update ci * add setuptools for python 12 * setuptools in ci * use miniconda, conda is missing from macos-14 * miniforge * fixes * mamba * bump min python version * update changelog * Update build system to flit_core * Add entry to changelog * Update CHANGELOG.md * Delete .gitattributes * Delete .lgtm.yml * Update CHANGELOG.md * Add functionality to select backend (#107) * Documentation and CLI for backend selection (#112) * Add rustworkx backend (#111) * Update graph.py to support set_backend function I also made a fuction to see the available backends and get the current backend * Apply private _available_backends suggestions from code review This commit contains the suggested changes regarding making the available_backends variable private, and refactoring the get_available_backends method to available_backends Co-authored-by: Rocco Meli <r.meli@bluemail.ch> * Print warning when backend is already set + make sure we use _available_backends everywhere where possible * remove reliance on environment variables * make _validate_backend function print warning when a certain backend isn't installed * Use precommit hooks add __all__ back refactor _alias_backendDict to _alias_to_backend * Update ValueError message Co-authored-by: Rocco Meli <r.meli@bluemail.ch> * Don't return the backend when setting it Co-authored-by: Rocco Meli <r.meli@bluemail.ch> * Add dummy function to make mypy happy remove __all__ assignment * fist play with rustworkx backend * add documentation for backend selection * add cli backend selection * cleanup * changelog * apply @Jnelen suggestion and add warning filter * make molecule test also more robust * add rx to ci * add rustworkx to all backends tests * add back graphtool * add rx to test all backends and add verbose mode * Update test_molecule.py --------- Co-authored-by: jnelen <jnelen@ucam.edu> Co-authored-by: Jochem Nelen <78348388+Jnelen@users.noreply.github.com> * Update .readthedocs.yml (#114) * Update .readthedocs.yml * rm duplicate key * prepare release 0.7.0 --------- Co-authored-by: Thomas Kluyver <takowl@gmail.com> Co-authored-by: Jochem Nelen <78348388+Jnelen@users.noreply.github.com> Co-authored-by: jnelen <jnelen@ucam.edu>
Hi,
Here is my second try at implementing the backend selection feature. I've been quite busy, so that's why it took a bit longer than I wanted to put this pull request. Either way, below is a short summary:
I only modified graph.py, and I tried to change as little as possible.
I use environment variables to check and update backend. I also added a function to check all available backends, and to see the current backend. (That last one maybe is a bit redundant and could be removed).
Everything seems to work fine, except there is one edge-case/"bug" with the to_graph() functionality of a molecule: if you use this before and after switching to another backend for the same molecule, it will still return the graph of the old backend because of the cache. I would propose to make use of a dictionary structure, where the (standardized) names of the backends serve as keys and the values the graphs. I could implement this quite easily, but ofcourse this would require some tweaks to molecule.py. That's why I waited with implementing this in order to first hear your opinion. Maybe there's a better way?
Also, I didn't add any tests yes, since I'd like your feedback/opinion first! After that i'll gladly spend some more time to polish everything!
Close #68