Changed behaviour of source._open_dataset
to:
#681
- Search for data & coordinate variables, rather than just data variables.
- Don't check to remove unnecessary coordinates & variables from the dataset, as this automatically removes all requested coordinate variables.
- If no data variables are found, load the first dataset returned: this avoids concatenation issues resulting from trying to concatenate along nonexistent dimensions.

Added a 'test_request_coord_vars' test to test/test_source to ensure the following behaviour:

- Only the requested data variables & the coordinates they depend on are returned if only data variables are requested (no change from previous behaviour).
- The entire dataset (all data & coordinate variables) is returned if no variables are requested (no change from previous behaviour).
- Only the requested coordinate variables are returned if only coordinate variables are requested (updated behaviour).
- The requested data variables, the coordinates they depend on, and any additional requested coordinate variables are returned if both data and coordinate variables are requested (updated behaviour).
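The "load the first dataset" fallback can be pictured with a toy example. This is only a sketch of the idea, not the code in `_open_dataset`: plain dicts stand in for datasets, and `combine_datasets`, `static_files`, and `time_slices` are hypothetical names introduced for illustration.

```python
def combine_datasets(datasets, data_vars):
    """Toy stand-in for the combine step (hypothetical helper).

    Each dataset is a dict mapping a variable name to a list of values.
    """
    if not data_vars:
        # No data variables were found, so there is nothing to
        # concatenate along: return the first dataset unchanged.
        return datasets[0]
    # Otherwise join each data variable end to end across the datasets.
    return {
        name: [value for ds in datasets for value in ds[name]]
        for name in data_vars
    }


# Static files carry only coordinate-like variables (assumed example data).
static_files = [{"area": [1.0, 2.0]}, {"area": [1.0, 2.0]}]
time_slices = [{"tas": [280.0]}, {"tas": [281.0]}]

static_result = combine_datasets(static_files, data_vars=[])
series_result = combine_datasets(time_slices, data_vars=["tas"])
```

With no data variables the first static file comes back as-is, while the `tas` slices are joined into a single series.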
```
datasets = [
    ds.set_coords(set(ds.variables) - set(ds.attrs[OPTIONS['vars_key']]))
    for ds in datasets
]
```
to _open_dataset - ought to fix failing test. Removal was based on the wrong assumption that this check always removes specified coordinate variables, which is only true if they are not passed in the __init__ of the ESMDataSource class.
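The set arithmetic inside that comprehension can be looked at on its own. The sketch below assumes the attribute stored under `OPTIONS['vars_key']` holds the names treated as data variables; `names_to_promote` and the variable names are hypothetical. In xarray, `Dataset.set_coords` then marks the resulting names as coordinates.

```python
def names_to_promote(all_variables, data_variables):
    """Names that would be passed to set_coords (hypothetical helper).

    Everything in the dataset that is not listed as a data variable
    is left over after the set difference.
    """
    return set(all_variables) - set(data_variables)


# Assumed example: one data variable plus four coordinate-like variables.
variables = ["tas", "time", "lat", "lon", "area"]
promoted = names_to_promote(variables, ["tas"])
```

Here `promoted` holds every name except `tas`, so only `tas` would remain a data variable.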
… return first dataset if so - necessary for indexing static files
19c3f51 to 287812d
… return first dataset if so - necessary for indexing static files
Appears to be a potential intermittent issue with the Read the Docs build - unable to reproduce reliably locally.
thank you so much for this addition, @charles-turner-1! the recent test failures have highlighted the need to revamp our testing infrastructure. one area of focus is reducing our reliance on cloud-based datasets, which are prone to changes without prior notice. additionally, network connectivity glitches often lead to intermittent test failures.
No worries @andersy005 - think this one is ready for review now. Our infrastructure for interacting with model outputs at ACCESS-NRI builds pretty heavily/directly on intake-esm, so would be happy to assist with identifying/improving flaky tests.
@mgrover1 If you're able to review that would be great
Looks great to me - I like this way of approaching the problem of coordinates and such. We can resolve the doc build issues in additional PRs
Change Summary
Changes discovery of variables in source._open_dataset to additionally include coordinate variables, with the following behaviour:
- Only the requested data variables & the coordinates they depend on are returned if only data variables are requested (no change from previous behaviour).
- The entire dataset (all data & coordinate variables) is returned if no variables are requested (no change from previous behaviour).
- Only the requested coordinate variables are returned if only coordinate variables are requested (updated behaviour).
- The requested data variables, the coordinates they depend on, and any additional requested coordinate variables are returned if both data and coordinate variables are requested (updated behaviour).
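
The four cases above can be written down as a small selection rule. This is a minimal sketch of the described behaviour, not the implementation in `source._open_dataset`: `select_variables`, `dependencies`, and `all_coords` are hypothetical names, and the dataset is modelled as plain sets and dicts rather than an xarray object.

```python
def select_variables(dependencies, all_coords, requested=None):
    """Return the set of variable names kept for a request (hypothetical helper).

    dependencies: mapping of data variable -> coordinates it depends on
    all_coords:   every coordinate variable present in the dataset
    requested:    names asked for, or None/empty for no filtering
    """
    if not requested:
        # No variables requested: the entire dataset is returned.
        return set(dependencies) | set(all_coords)

    requested = set(requested)
    data = requested & set(dependencies)   # requested data variables
    coords = requested & set(all_coords)   # requested coordinate variables

    kept = data | coords
    for name in data:
        # Coordinates a requested data variable depends on always come along.
        kept |= set(dependencies[name])
    return kept


# Assumed example dataset: two data variables and one extra static coordinate.
dependencies = {"tas": {"time", "lat", "lon"}, "pr": {"time", "lat", "lon"}}
all_coords = {"time", "lat", "lon", "area"}

only_data = select_variables(dependencies, all_coords, ["tas"])
everything = select_variables(dependencies, all_coords)
only_coord = select_variables(dependencies, all_coords, ["area"])
mixed = select_variables(dependencies, all_coords, ["tas", "area"])
```

In this toy model, requesting only `area` yields just that coordinate, and requesting `tas` together with `area` yields `tas`, its three dimension coordinates, and `area`.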
Sync fork (intake_esm/cat.py)
Related issue number
Closes #660
Checklist