Added a new documentation page for faster GRIB aggregations #495

Merged
7 commits merged on Sep 9, 2024


Anu-Ra-g (Contributor)

This PR adds a new page in the kerchunk documentation for faster reference consolidation for GRIB files.

Anu-Ra-g changed the title from "Added a new page for faster aggregations" to "Added a new documentation page for faster GRIB aggregations" on Aug 27, 2024
docs/source/reference_aggregation.rst (outdated, resolved)
docs/source/reference_aggregation.rst (outdated, resolved)
GRIB Aggregations
-----------------

This new method for reference aggregation, developed by **Camus Energy**, is based on GRIB2 files. Utilizing
Member

It won't be "new" for long, so drop this.

I would put the restrictions first:

  • must have .idx files
  • specialised for time-series data, each file having identical message structure

@emfdavid: do you have an opinion on whether Camus wants to be referenced here, and how that would look?

Contributor Author

@emfdavid Should I mention Camus Energy like @martindurant asked here?

Contributor

I checked in with the team and it would be great if you could link https://www.camus.energy/ and if there is a place for attribution you can include my github handle @emfdavid, otherwise the contributors list is fine.

docs/source/reference_aggregation.rst (outdated, resolved)
docs/source/reference_aggregation.rst (outdated, resolved)
docs/source/reference_aggregation.rst (outdated, resolved)
Comment on lines 114 to 115
The index in the ``idx`` file indexes the GRIB messages, whereas the ``k_index`` (kerchunk index)
we build as part of this workflow indexes the variables in those messages.
Member

This note explains my question above, but is not very clear. Map out the steps we need to do before launching into the details.

Contributor Author

Steps for how we build the index? Should I include the code for building the index?

Member

No code, just brief points.
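To make the distinction between the two indexes concrete, here is a minimal sketch, not kerchunk's implementation, assuming a wgrib2-style text ``.idx`` inventory in which each line reads ``message:byte_offset:d=date:variable:level:forecast:``. ``parse_idx`` and ``variable_index`` are hypothetical helpers: the first recovers one entry per GRIB message (the ``.idx`` view), and the second re-keys those entries by variable, level and forecast, roughly the role the ``k_index`` plays. The sample inventory lines and offsets are made up for illustration.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdxEntry:
    msg: str                # message number, e.g. "3" or "3.1" for a sub-message
    offset: int             # byte offset of the message in the GRIB file
    length: Optional[int]   # bytes up to the next message; None for the last one
    date: str               # reference time taken from the "d=" field
    varname: str
    level: str
    forecast: str


def parse_idx(text: str) -> list[IdxEntry]:
    """Parse a wgrib2-style .idx inventory (assumed layout): one entry per GRIB message."""
    rows = [line.split(":") for line in text.splitlines() if line.strip()]
    # Sub-messages can share an offset, so measure lengths to the next *larger* offset.
    offsets = sorted({int(fields[1]) for fields in rows})
    entries = []
    for fields in rows:
        offset = int(fields[1])
        nxt = bisect.bisect_right(offsets, offset)
        length = offsets[nxt] - offset if nxt < len(offsets) else None
        entries.append(IdxEntry(
            msg=fields[0],
            offset=offset,
            length=length,
            date=fields[2].removeprefix("d="),
            varname=fields[3],
            level=fields[4],
            forecast=fields[5],
        ))
    return entries


def variable_index(entries: list[IdxEntry]) -> dict:
    """Re-key message entries by (variable, level, forecast) -> (offset, length).

    Hypothetical illustration of a variable-level index; a later duplicate
    of the same key overwrites an earlier one.
    """
    return {(e.varname, e.level, e.forecast): (e.offset, e.length) for e in entries}


# Made-up sample inventory; offsets are illustrative only.
SAMPLE_IDX = """\
1:0:d=2024082700:PRMSL:mean sea level:anl:
2:1000:d=2024082700:TMP:2 m above ground:anl:
3:2500:d=2024082700:UGRD:10 m above ground:anl:
"""

for entry in parse_idx(SAMPLE_IDX):
    print(entry.msg, entry.offset, entry.length, entry.varname)
```

Lengths are inferred from the next larger offset, so the length of the final message stays unknown unless the file size is also known.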

docs/source/reference_aggregation.rst (outdated, resolved)
Contributor

emfdavid left a comment

Great start - one suggestion re limitations.

- The ``.idx`` file must be of *text* type.
- Only specialised for time-series data, where GRIB files
  have an *identical* structure.
- Aggregation only works for files of a specific **forecast horizon**.
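One way to check the *identical structure* restriction before aggregating is to compare the ordered message inventories of the candidate files while ignoring dates and byte offsets. The helpers below are hypothetical, sketched under the assumption of a wgrib2-style text ``.idx`` layout (``message:offset:d=date:variable:level:forecast:``). Because the forecast field (e.g. ``6 hour fcst``) is part of the signature, files from different forecast horizons are reported as different, consistent with the per-horizon limitation.

```python
def message_signature(idx_text: str) -> list[tuple[str, str, str]]:
    """Ordered (variable, level, forecast) triples from a wgrib2-style .idx text.

    Dates and byte offsets are ignored because they differ between runs
    even when the message structure is identical.
    """
    signature = []
    for line in idx_text.splitlines():
        if not line.strip():
            continue
        fields = line.split(":")
        signature.append((fields[3], fields[4], fields[5]))
    return signature


def same_structure(idx_texts: list[str]) -> bool:
    """True if every inventory has the same message sequence as the first one."""
    signatures = [message_signature(t) for t in idx_texts]
    return all(sig == signatures[0] for sig in signatures[1:])


# Made-up inventories: two runs at the same horizon, plus one at a different horizon.
RUN_00Z = (
    "1:0:d=2024082700:TMP:2 m above ground:6 hour fcst:\n"
    "2:800:d=2024082700:UGRD:10 m above ground:6 hour fcst:\n"
)
RUN_06Z = (
    "1:0:d=2024082706:TMP:2 m above ground:6 hour fcst:\n"
    "2:812:d=2024082706:UGRD:10 m above ground:6 hour fcst:\n"
)
RUN_00Z_F12 = (
    "1:0:d=2024082700:TMP:2 m above ground:12 hour fcst:\n"
    "2:795:d=2024082700:UGRD:10 m above ground:12 hour fcst:\n"
)

print(same_structure([RUN_00Z, RUN_06Z]))      # True
print(same_structure([RUN_00Z, RUN_00Z_F12]))  # False
```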
Contributor

emfdavid, Aug 28, 2024


The reference index can be combined across many horizons but each horizon must be indexed separately.
Looking forward to seeing what you make of the reinflate API; there you can see that all of the FMRC slices are supported against a collection of indexed data from many horizons, runtimes and valid times.

Member

> the reinflate api

ooh, what is this?

Contributor

The method to turn the k_index and the metadata back into a ref_spec you can use in zarr/xarray
https://github.com/asascience-open/nextgen-dmac/blob/main/grib_index_aggregation/dynamic_zarr_store.py#L198
I think @Anu-Ra-g is already working on adding it into kerchunk?

Member

Oh, in that case I suspect it works already, right @Anu-Ra-g? But you can only work on one set of horizons OR one set of timepoints, not both at once? Something like that.

Contributor

I think you can return an array with multiple dimensions.
I didn't have a strong use for this, so I struggled to do something general and practical.
For instance, if you request by horizon, you can provide multiple horizon axes and your dimensions should include 'horizon' and 'valid_time'. Similarly, you can request multiple runtimes and then your dimensions should include 'runtime' and 'step'.
Honestly, I'm not sure if this is helpful or overcomplicated.
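The dimension pairs described above follow from the usual forecast bookkeeping identity ``valid_time = runtime + step``, where the step is the forecast horizon. The sketch below only illustrates that coordinate arithmetic with the standard library; it is not the reinflate method in the linked ``dynamic_zarr_store.py``, and ``by_runtime`` and ``by_horizon`` are hypothetical names. Requesting by runtime keys the data on ``(runtime, step)``, while requesting by horizon keys it on ``(horizon, valid_time)``.

```python
from datetime import datetime, timedelta


def by_runtime(runtimes, steps):
    """Map (runtime, step) -> valid_time: one forecast trajectory per run."""
    return {(run, step): run + step for run in runtimes for step in steps}


def by_horizon(runtimes, horizons):
    """Map (horizon, valid_time) -> runtime: one time series per fixed lead time."""
    return {(h, run + h): run for run in runtimes for h in horizons}


runs = [datetime(2024, 8, 27, 0), datetime(2024, 8, 27, 6)]
steps = [timedelta(hours=0), timedelta(hours=6)]
six_hours = timedelta(hours=6)

# Runtime view: the 00Z run's 6-hour step is valid at 06Z.
print(by_runtime(runs, steps)[(runs[0], six_hours)])
# Horizon view: the 6-hour horizon valid at 12Z comes from the 06Z run.
print(by_horizon(runs, [six_hours])[(six_hours, datetime(2024, 8, 27, 12))])
```

Both views index the same underlying messages; they differ only in which pair of coordinates is treated as the array dimensions.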

Contributor Author

@martindurant I tried it out with one set of horizons with the original code. Actually, I'm still figuring out the reinflating part of the code, aggregation types and the new indexes.

I noticed that reinflating can also work with a grib_tree model made from a single GRIB file.
@emfdavid can you confirm this in this notebook that I made?

martindurant (Member)

Let me know when this PR is ready for another look.

Anu-Ra-g (Contributor Author)

@martindurant I've made the changes you suggested. It is ready for review.

martindurant merged commit af4c5dd into fsspec:main on Sep 9, 2024
5 checks passed