-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cluster Info #455
base: dev
Are you sure you want to change the base?
Cluster Info #455
Conversation
Draft solution to GH-342 with the ability to extend to other types of generic config/info in the future. Add `gempyor.info` module with exported function `get_cluster_info` which pulls and parses an information yaml into a pydantic model.
Added specification for path modifications to the cluster info (namely to add AWS cli on rockfish to path). Specification includes the path to add, prepend vs append, and a flag to error if path is missing.
* Added `gempyor.info._infer_cluster_from_fqdn` internal utility. * `gempyor.info.get_cluster_info` now accepts `None` for `name`. * Minor testing fixes.
Fleshed out documentation for the `gempyor.info` module. Including module level docs and docstrings for the individual classes. Fleshed out examples.
Made the `ValueError` from `_infer_cluster_from_fqdn` optional, but default to `True` for current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems overall good, but I have a hard concern re including longleaf / rockfish specs in what's supposed to be a general community library.
I get that for our use, it's pretty ideal to have these here. But seems like the wrong answer for not-us users, and taking the easy short cut for us is maybe meaning we aren't thinking hard enough about what's the right answer for other users.
Set the `$FLEPI_PATH` environment variable in the gempyor CI GitHub action, because the internal `gempyor.info._get_info` function needs it to know where to look for config YAMLs.
89b289e
to
86dbb75
Compare
Good point. There are alternatives:
It's pretty easy to change the code to accommodate either use case so I have no qualms with either. Can also add in fallbacks like first check @pearsonca @jcblemai @MacdonaldJoshuaCaleb @saraloo @anjalika-nande what makes the most sense for y'all? |
I kinda think project path? Each project should represent all the pertinent data for executing that project. I could see a place for system / user "profile" like data - that seems like the next level version of this, as we're going to have think a bit more generically about handling various project-wide-defaults. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like including module-level documentation before import statements isn't something that has been done a lot in flepiMoP, even though it should be like this. Glad you are starting that here (though, I guess that's primarily because we don't have very much module-level documentation yet...)
info/cluster/schema.yml
Outdated
@@ -0,0 +1,42 @@ | |||
$schema: https://json-schema.org/draft/2020-12/schema | |||
$id: https://example.com/product.schema.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably copy paste error?
To me |
I agree with Sara on this |
To me, this should ultimately look roughly like the .Rprofile / .gitconfig / etc approach - look in the immediate place, then defer to some user site-wide file, then a system wide, etc. |
This seems more natural to me too! |
This is very good work and addresses the past discussion on yaml cluster configuration. I feel that it's helpful to have our rockfish and longleaf configuration inside the repository, as a living example for others to build their on their cluster and for us to have them all updated at the same time (and if flepiMoP is run on many more cluster by others, we can remove that at a later time). But I agree with Carl that this might mean we forget other use cases, so we'd need to be careful. I'd be for the
(adding then look at the flepiMoP repo)) thing which is common for these tools, instead of another environment variable. |
Sounds like the consensus is:
Does that sound correct before I begin making these edits (thumbs up for yes)? |
Updated the search paths to 1) current working directory, 2) `$FLEPI_INFO_PATH`, and finally 3) `$FLEPI_PATH`. Per feedback: #455 (comment).
c0b2c11
to
c8c055c
Compare
Added a GitHub action workflow to convert the yamls in `info/` and validate them against the schema file in the corresponding directory.
c8c055c
to
5da3fa4
Compare
This PR now implements the agreed upon search order in #455 (comment) and is ready for review again. |
Describe your changes.
This pull request adds an
info
module togempyor
with:_get_info
helper for pulling a configuration YAML given a category and class to parse into,get_cluster_info
for pulling the configuration associated with a particular HPC cluster, and_infer_cluster_from_fqdn
to guess the HPC cluster from the FQDN.New module/functionality include docstrings and unit tests. This PR also makes it easy to add future categories of generic info/config as needed.
Does this pull request make any user interface changes? If so please describe.
The user interface changes are contained in
gempyor.info
and only contains new functionality, no breaking changes.Those are reflected in updates to the documentation in docstrings, but could be added to the GitBook if needed. However I think the docstrings are comprehensive and are useful for the target audience.
What does your pull request address? Tag relevant issues.
This pull request addresses GH-342, and was spun out of GH-394.