KeyError: key :Ti not found #392

Closed
Balinus opened this issue May 14, 2024 · 3 comments
Labels
documentation Improvements or additions to documentation

Comments

@Balinus
Contributor

Balinus commented May 14, 2024

Hello, I have the following error when using YAXArrays.Datasets.open_mfdataset. The files represent daily data (1st file is day 1, 2nd file is second day, etc). It is ERA5-Land data downloaded from Copernicus (I do not have the downloading script sadly).

using NetCDF
using YAXArrays
using Glob

repbrut = "/path/to/files"
patterns = "*copernicus_era5_land_surface.nc"

files = glob(patterns, repbrut)
obs = YAXArrays.Datasets.open_mfdataset(files[1:10]) # loading only a subset of the 3000 files

KeyError: key :Ti not found

Stacktrace:
  [1] getindex
    @ ./dict.jl:498 [inlined]
  [2] _broadcast_getindex_evalf
    @ ./broadcast.jl:709 [inlined]
  [3] _broadcast_getindex
    @ ./broadcast.jl:682 [inlined]
  [4] #31
    @ ./broadcast.jl:1118 [inlined]
  [5] ntuple
    @ ./ntuple.jl:50 [inlined]
  [6] copy
    @ ./broadcast.jl:1118 [inlined]
  [7] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getindex), Tuple{Base.RefValue{Dict{Symbol, Any}}, Tuple{Symbol, Symbol, Symbol}}})
    @ Base.Broadcast ./broadcast.jl:903
  [8] merge_datasets(dslist::Vector{YAXArrays.Datasets.Dataset})
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:903
  [9] open_mfdataset(g::Vector{String})
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:280
 [10] top-level scope
    @ In[4]:5

I can open the files individually, for example:

ds1 = open_dataset(files[1])

YAXArray Dataset
Shared Axes: 
 longitude Sampled{Float32} -82.0f0:0.1f0:-50.0f0 ForwardOrdered Regular Points,
 latitude  Sampled{Float32} 64.0f0:-0.1f0:42.0f0 ReverseOrdered Regular Points,
↗ Ti        Sampled{DateTime} [1950-01-01T00:00:00, …, 1950-01-01T23:00:00] ForwardOrdered Irregular Points
Variables: 
snowc, e, skt, asn, d2m, stl1, t2m, lai_lv, u10, sro, ssrd, src, v10, lai_hv, sp, sd, rsn, evaow, sde, sf, tp, ro, 
Properties: Dict{String, Any}("history" => "2024-05-10 22:57:09 GMT by grib_to_netcdf-2.28.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf -S param -o /cache/data2/adaptor.mars.internal-1715381827.014326-19889-5-81cba0ea-74b0-4995-b5c3-8458c0c8abd5.nc /cache/tmp/81cba0ea-74b0-4995-b5c3-8458c0c8abd5-adaptor.mars.internal-1715381789.8065639-19889-3-tmp.grib", "Conventions" => "CF-1.6")

ds2 = open_dataset(files[2])

YAXArray Dataset
Shared Axes: 
 longitude Sampled{Float32} -82.0f0:0.1f0:-50.0f0 ForwardOrdered Regular Points,
 latitude  Sampled{Float32} 64.0f0:-0.1f0:42.0f0 ReverseOrdered Regular Points,
↗ Ti        Sampled{DateTime} [1950-01-02T00:00:00, …, 1950-01-02T23:00:00] ForwardOrdered Irregular Points
Variables: 
snowc, e, skt, asn, d2m, stl1, lai_lv, t2m, u10, sro, ssrd, src, v10, lai_hv, sp, sd, rsn, evaow, sde, sf, tp, ro, 
Properties: Dict{String, Any}("history" => "2024-05-10 22:54:16 GMT by grib_to_netcdf-2.28.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf -S param -o /cache/data3/adaptor.mars.internal-1715381653.851235-8099-3-1961ccb9-cd31-4fe2-b913-5973053f1ab1.nc /cache/tmp/1961ccb9-cd31-4fe2-b913-5973053f1ab1-adaptor.mars.internal-1715381615.6087704-8099-3-tmp.grib", "Conventions" => "CF-1.6")

but I am unable to merge the datasets:

newds = YAXArrays.Datasets.merge_datasets([ds1, ds2])

KeyError: key :Ti not found

Stacktrace:
 [1] getindex
   @ ./dict.jl:498 [inlined]
 [2] _broadcast_getindex_evalf
   @ ./broadcast.jl:709 [inlined]
 [3] _broadcast_getindex
   @ ./broadcast.jl:682 [inlined]
 [4] #31
   @ ./broadcast.jl:1118 [inlined]
 [5] ntuple
   @ ./ntuple.jl:50 [inlined]
 [6] copy
   @ ./broadcast.jl:1118 [inlined]
 [7] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getindex), Tuple{Base.RefValue{Dict{Symbol, Any}}, Tuple{Symbol, Symbol, Symbol}}})
   @ Base.Broadcast ./broadcast.jl:903
 [8] merge_datasets(dslist::Vector{YAXArrays.Datasets.Dataset})
   @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:903
 [9] top-level scope
   @ In[15]:1

As far as I can tell, :Ti is present in both files here (and in all 3000 files I have), but somehow it does not seem to be picked up.
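
For readers following the trace: frame [7] shows that the failing call is a broadcast of getindex over a Dict{Symbol, Any} and a tuple of three Symbols, so the KeyError means that one requested name is missing from that dictionary, not necessarily from the files. Below is a minimal sketch in plain Julia that reproduces the same error pattern. The dictionary contents and the name `axes_by_name` are hypothetical, since the trace does not show which keys merge_datasets actually stores.

```julia
# Hypothetical stand-in for the Dict{Symbol, Any} seen in frame [7].
# The real keys used inside merge_datasets are not visible in the trace;
# :time is an assumed example of a key that differs from :Ti.
axes_by_name = Dict{Symbol,Any}(:longitude => 1, :latitude => 2, :time => 3)

# Three requested names, matching the Tuple{Symbol, Symbol, Symbol} in the trace.
wanted = (:longitude, :latitude, :Ti)

# Broadcasting getindex over the tuple throws as soon as one key is absent.
err = try
    getindex.(Ref(axes_by_name), wanted)
catch e
    e
end

# Listing the absent names is a quick way to see which lookup fails.
missing_names = [name for name in wanted if !haskey(axes_by_name, name)]
println(missing_names)
```

If the real dictionary is keyed by a different name for the time axis than the one requested, this is the error one would expect, even though every file contains a time axis.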

(Climat) pkg> st

  [179af706] CFTime v0.1.3
  [a93c6f00] DataFrames v1.6.1
  [0703355e] DimensionalData v0.27.2
  [31c24e10] Distributions v0.25.108
  [85f8d34a] NCDatasets v0.14.4
  [30363a11] NetCDF v0.11.8
  [90b8fcef] YAXArrayBase v0.6.1
  [c21b50f5] YAXArrays v0.5.6
⌃ [0a941bbe] Zarr v0.9.3
  [ade2ca70] Dates
  [10745b16] Statistics v1.10.0

From Manifest

[fcd2136c] DiskArrayTools v0.1.10
⌅ [3c3547ce] DiskArrays v0.3.23
@felixcremer
Member

That is something that we should fix. As a stopgap, you could extract all cubes from the datasets, use cat(cubes..., dims=Ti) to merge them, and wrap the concatenated cubes in a Dataset.

@Balinus
Contributor Author

Balinus commented May 14, 2024

ok, thanks, I'll see what I can do.

I didn't calculate the number of files correctly... it is 27_000 files.

I am doing the following, but I get a warning about lookup values not matching. Perhaps the variable names are not in the same order in every file, which creates the problem? See "t2m" and "lai_lv" in both lists.

cubes = Cube.(files[1:4])
ds2 = cat(cubes..., dims=:Ti);

Warning: Lookup values for Dim{:Variable} of 
["snowc", "e", "skt", "asn", "d2m", "stl1", "t2m", "lai_lv", "u10", "sro", "ssrd", "src", "v10", "lai_hv", "sp", "sd", "rsn", "evaow", "sde", "sf", "tp", "ro"] 
and
["snowc", "e", "skt", "asn", "d2m", "stl1", "lai_lv", "t2m", "u10", "sro", "ssrd", "src", "v10", "lai_hv", "sp", "sd", "rsn", "evaow", "sde", "sf", "tp", "ro"] do not match. Can't `cat` AbstractDimArray, applying to `parent` object.
└ @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/yZgLJ/src/Dimensions/primitives.jl:774
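
The two lists in the warning can be compared directly to check that suspicion. Here is a short sketch in plain Julia, with the names copied from the warning. `names_a`, `names_b` and `perm` are illustrative names, and applying the permutation to the actual cubes is left out because that step depends on the YAXArrays version.

```julia
names_a = ["snowc", "e", "skt", "asn", "d2m", "stl1", "t2m", "lai_lv", "u10", "sro", "ssrd",
           "src", "v10", "lai_hv", "sp", "sd", "rsn", "evaow", "sde", "sf", "tp", "ro"]
names_b = ["snowc", "e", "skt", "asn", "d2m", "stl1", "lai_lv", "t2m", "u10", "sro", "ssrd",
           "src", "v10", "lai_hv", "sp", "sd", "rsn", "evaow", "sde", "sf", "tp", "ro"]

# Do both files hold the same set of variables?
same_variables = Set(names_a) == Set(names_b)

# Positions at which the order differs between the two lists.
mismatched = findall(names_a .!= names_b)

# Permutation that reorders the second list to match the first.
# Int.(...) fails loudly if a name from names_a is absent in names_b.
perm = Int.(indexin(names_a, names_b))

println(same_variables)
println(mismatched)
println(names_b[perm] == names_a)
```

For these lists the variables are identical and the mismatch is confined to positions 7 and 8, where "t2m" and "lai_lv" are swapped, so bringing every cube to a common variable order before calling cat should avoid the warning.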

lazarusA added the documentation label on Sep 21, 2024
@lazarusA
Collaborator

Closed by #470 and #481. If there are other edge cases, please open a new issue with an MWE.
