xarray engine tries and fails to open GeoTIFF instead of letting rasterio take precedence #579

Closed
chpolste opened this issue Jan 10, 2025 · 2 comments · Fixed by #591
Labels
bug Something isn't working

Comments

@chpolste
Member

What happened?

Opening a GeoTIFF file with xarray's open_dataset, open_dataarray or open_mfdataset raises an error when earthkit-data is installed in the environment. The same file opens without problems when earthkit-data is not installed. The traceback shows that the earthkit engine for xarray tries and fails to open the file instead of letting rioxarray's rasterio backend take precedence.

The earthkit engine for xarray should either defer to the rasterio engine for GeoTIFF files or open them correctly.
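
To illustrate what "giving precedence" could mean, here is a toy model of engine selection. It assumes that xarray asks each registered backend in turn whether it can open the file and uses the first one that says yes. ToyBackend, guess_engine, and the accepts/declines parameters are hypothetical names for this sketch only; they are not the earthkit-data or rioxarray implementation, and the actual change in #591 may work differently.

```python
from pathlib import Path


class ToyBackend:
    """Hypothetical stand-in for an xarray backend entrypoint."""

    def __init__(self, name, accepts=None, declines=()):
        self.name = name
        self.accepts = accepts          # set of suffixes, or None for "any file"
        self.declines = set(declines)   # suffixes this backend always refuses

    def guess_can_open(self, filename):
        suffix = Path(filename).suffix.lower()
        if suffix in self.declines:
            return False
        return self.accepts is None or suffix in self.accepts


def guess_engine(backends, filename):
    """Return the name of the first backend that claims the file (assumed behaviour)."""
    for backend in backends:
        if backend.guess_can_open(filename):
            return backend.name
    raise ValueError(f"no backend claims {filename!r}")


rasterio = ToyBackend("rasterio", accepts={".tif", ".tiff"})

# A backend that claims every file shadows rasterio when it is asked first.
greedy = ToyBackend("earthkit")
print(guess_engine([greedy, rasterio], "dgm50hs_col_32_368_5616_nw.tif"))

# A backend that declines GeoTIFF suffixes lets rasterio take over.
polite = ToyBackend("earthkit", declines={".tif", ".tiff"})
print(guess_engine([polite, rasterio], "dgm50hs_col_32_368_5616_nw.tif"))
```

In this model the fix lives entirely in the backend's own guess_can_open, so no change to the selection order is needed.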

What are the steps to reproduce the bug?

In an environment with xarray and rioxarray, but not earthkit-data:

$ python
Python 3.13.1 | packaged by conda-forge | (main, Jan  8 2025, 09:15:59) [Clang 18.1.8 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray as xr
>>> xr.open_dataarray("dgm50hs_col_32_368_5616_nw.tif")
<xarray.DataArray 'band_data' (band: 3, y: 294, x: 315)> Size: 1MB
[277830 values with dtype=float32]
Coordinates:
  * band         (band) int64 24B 1 2 3
  * x            (x) float64 3kB 3.68e+05 3.681e+05 ... 3.839e+05 3.84e+05
  * y            (y) float64 2kB 5.632e+06 5.632e+06 ... 5.616e+06 5.616e+06
    spatial_ref  int64 8B ...
Attributes:
    TIFFTAG_XRESOLUTION:     96
    TIFFTAG_YRESOLUTION:     96
    TIFFTAG_RESOLUTIONUNIT:  2 (pixels/inch)
    AREA_OR_POINT:           Area

After installing earthkit-data:

$ pip install earthkit-data
...
$ python
Python 3.13.1 | packaged by conda-forge | (main, Jan  8 2025, 09:15:59) [Clang 18.1.8 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray as xr
>>> xr.open_dataarray("dgm50hs_col_32_368_5616_nw.tif")
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    xr.open_dataarray("dgm50hs_col_32_368_5616_nw.tif")
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/test/lib/python3.13/site-packages/xarray/backends/api.py", line 869, in open_dataarray
    dataset = open_dataset(
        filename_or_obj,
    ...<15 lines>...
        **kwargs,
    )
  File "/opt/homebrew/Caskroom/miniforge/base/envs/test/lib/python3.13/site-packages/xarray/backends/api.py", line 679, in open_dataset
    backend_ds = backend.open_dataset(
        filename_or_obj,
    ...<2 lines>...
        **kwargs,
    )
  File "/opt/homebrew/Caskroom/miniforge/base/envs/test/lib/python3.13/site-packages/earthkit/data/utils/xarray/engine.py", line 292, in open_dataset
    return SingleDatasetBuilder(fieldlist, profile, from_xr=True, backend_kwargs=_kwargs).build()
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/test/lib/python3.13/site-packages/earthkit/data/utils/xarray/builder.py", line 586, in build
    ds_sorted, _ = self.parse(self.ds, self.profile)
                   ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/test/lib/python3.13/site-packages/earthkit/data/utils/xarray/builder.py", line 546, in parse
    profile.update(ds_xr)
    ~~~~~~~~~~~~~~^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/test/lib/python3.13/site-packages/earthkit/data/utils/xarray/profile.py", line 323, in update
    raise ValueError(f"No metadata values found for variable key {self.variable_key}")
ValueError: No metadata values found for variable key param

With earthkit-data installed, the file can only be read by selecting the engine explicitly:

>>> xr.open_dataarray("dgm50hs_col_32_368_5616_nw.tif", engine="rasterio")
<xarray.DataArray 'band_data' (band: 3, y: 294, x: 315)> Size: 1MB
[277830 values with dtype=float32]
Coordinates:
  * band         (band) int64 24B 1 2 3
  * x            (x) float64 3kB 3.68e+05 3.681e+05 ... 3.839e+05 3.84e+05
  * y            (y) float64 2kB 5.632e+06 5.632e+06 ... 5.616e+06 5.616e+06
    spatial_ref  int64 8B ...
Attributes:
    TIFFTAG_XRESOLUTION:     96
    TIFFTAG_YRESOLUTION:     96
    TIFFTAG_RESOLUTIONUNIT:  2 (pixels/inch)
    AREA_OR_POINT:           Area
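
Until a fixed release is installed, the override can be applied automatically for GeoTIFF paths. The helper below is a minimal sketch; open_kwargs_for and GEOTIFF_SUFFIXES are hypothetical names introduced here, and the check relies only on the file suffix, which is an assumption about how the files are named.

```python
from pathlib import Path

# Suffixes treated as GeoTIFF in this sketch (assumption: files are named conventionally).
GEOTIFF_SUFFIXES = {".tif", ".tiff"}


def open_kwargs_for(path):
    """Return extra keyword arguments for xarray's open_* functions.

    GeoTIFF paths get engine="rasterio"; everything else is left to
    xarray's automatic engine selection.
    """
    if Path(path).suffix.lower() in GEOTIFF_SUFFIXES:
        return {"engine": "rasterio"}
    return {}


# Intended use (requires xarray and rioxarray):
#   import xarray as xr
#   path = "dgm50hs_col_32_368_5616_nw.tif"
#   da = xr.open_dataarray(path, **open_kwargs_for(path))
```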

Version

earthkit-data 0.12.0

Platform (OS and architecture)

macOS, Linux

Relevant log output

No response

Accompanying data

No response

Organisation

ECMWF

@sandorkertesz
Collaborator

@chpolste, thank you for reporting this issue. The upcoming hotfix release will fix this problem.

@sandorkertesz
Collaborator

Fixed by #591

Available in 0.12.1
