Does YAXArrays.jl support interop with xarray? #483

Open
gRox167 opened this issue Dec 17, 2024 · 8 comments

Comments

@gRox167

gRox167 commented Dec 17, 2024

If we have a .nc or .zarr file generated by xarray, can we read and manipulate it with the full functionality of YAXArrays.jl?

  • I tried opening a .zarr file generated by xarray and it works perfectly.

I just don't know whether this is intentionally supported or just happens to work.

Also, in the other direction: if we have files generated by YAXArrays.jl, is it possible to use all of xarray's functionality on them?

@lazarusA
Collaborator

Yes, and if something doesn't work, please open more issues.

@felixcremer
Member

There is also https://github.com/meggart/PyYAXArrays.jl for interop within a session, so that you can load your data with xarray and then use YAXArrays functionality for the analysis. It is experimental at the moment, so you might not want to rely on it too much.

@gRox167
Author

gRox167 commented Dec 17, 2024

Thanks for the clarification! Closing this issue now.

@gRox167 gRox167 closed this as completed Dec 17, 2024
@Balinus
Contributor

Balinus commented Dec 17, 2024

If we have a .nc or .zarr file generated by xarray, can we read and manipulate it with the full functionality of YAXArrays.jl?

  • I tried opening a .zarr file generated by xarray and it works perfectly.

I just don't know whether this is intentionally supported or just happens to work.

Also, in the other direction: if we have files generated by YAXArrays.jl, is it possible to use all of xarray's functionality on them?

For clarification, this works because both packages (xarray and YAXArrays.jl) read standardized data in the NetCDF and Zarr formats (and in other metadata-based formats that I know less about).
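
To illustrate why a shared format gives interop, here is a minimal sketch in Python (standard library only), assuming a Zarr v2 store. In that layout each array is a directory whose metadata lives in plain JSON files (`.zarray` and `.zattrs`), and xarray records dimension names under the `_ARRAY_DIMENSIONS` attribute. The helper `describe_zarr_array` and the `temperature` array are hypothetical names for this example, and the metadata is written by hand as a stand-in for what xarray would produce.

```python
import json
import tempfile
from pathlib import Path


def describe_zarr_array(array_dir):
    """Summarize the Zarr v2 metadata files of a single array directory."""
    array_dir = Path(array_dir)
    meta = json.loads((array_dir / ".zarray").read_text())
    attrs_file = array_dir / ".zattrs"
    attrs = json.loads(attrs_file.read_text()) if attrs_file.exists() else {}
    return {
        "shape": tuple(meta["shape"]),
        "chunks": tuple(meta["chunks"]),
        "dtype": meta["dtype"],
        # xarray stores dimension names under this attribute in Zarr v2 stores.
        "dims": tuple(attrs.get("_ARRAY_DIMENSIONS", ())),
    }


# Hand-written stand-in for the metadata of one array (no chunk data needed).
with tempfile.TemporaryDirectory() as store:
    array_dir = Path(store) / "temperature"
    array_dir.mkdir()
    (array_dir / ".zarray").write_text(json.dumps({
        "zarr_format": 2, "shape": [4, 6], "chunks": [2, 6],
        "dtype": "<f4", "order": "C", "fill_value": None,
        "compressor": None, "filters": None,
    }))
    (array_dir / ".zattrs").write_text(
        json.dumps({"_ARRAY_DIMENSIONS": ["time", "lat"]})
    )
    summary = describe_zarr_array(array_dir)

print(summary)
```

Any reader that understands these JSON files can reconstruct the shape, dtype, and dimension names, independent of the language that wrote them.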

@gRox167
Author

gRox167 commented Dec 19, 2024

I found an inconsistency when opening np.complex64 data saved by xarray. YAXArrays.jl reads it as ComplexF64, which corresponds to np.complex128.

using PythonCall
pyexec("
import numpy as np
import xarray as xr
", Main)
@pyexec """
data = np.random.random((3, 5)) + 1j*np.random.random((3, 5))
data = data.astype(np.complex64)
da = xr.DataArray(data, dims=["x", "y"], name="random_complex")
dtype = data.dtype
ds = xr.Dataset({"random_complex": da})
ds.to_zarr("random_complex.zarr", mode="w")
""" => dtype

This returns

Python: dtype('complex64')

which means our data has the np.complex64 dtype, where each element consists of two float32 values.

using YAXArrays, Zarr
open_dataset("random_complex.zarr", driver=:zarr)["random_complex"]

This returns

┌ 5×3 YAXArray{ComplexF64, 2} ┐
├─────────────────────────────┴────────────────────────────────────────── dims ┐
  ↓ y Sampled{Int64} 1:5 ForwardOrdered Regular Points,
  → x Sampled{Int64} 1:3 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 1 entry:
  "name" => "random_complex"
├─────────────────────────────────────────────────────────────── loaded lazily ┤
  data size: 240.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

Here the element type is ComplexF64, where each element consists of two Float64 values. We also cannot index this array; doing so fails with:

uncompressed data is not a multiple of sizeof(ComplexF64)

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] decompress!(dest::Vector{ComplexF64}, src::Vector{UInt8})
    @ Blosc ~/.julia/packages/Blosc/jk4Np/src/Blosc.jl:184
  [3] zuncompress!
    @ ~/.julia/packages/Zarr/3QSdj/src/Compressors.jl:68 [inlined]
  [4] zuncompress!
    @ ~/.julia/packages/Zarr/3QSdj/src/Compressors.jl:14 [inlined]
  [5] uncompress_raw!(a::Matrix{ComplexF64}, z::ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, curchunk::Vector{UInt8})
    @ Zarr ~/.julia/packages/Zarr/3QSdj/src/ZArray.jl:261
  [6] uncompress_to_output!(aout::Matrix{ComplexF64}, output_base_offsets::Tuple{Int64, Int64}, z::ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, chunk_compressed::Vector{UInt8}, current_chunk_offsets::Tuple{Int64, Int64}, a::Matrix{ComplexF64}, indranges::Tuple{UnitRange{Int64}, UnitRange{Int64}})
    @ Zarr ~/.julia/packages/Zarr/3QSdj/src/ZArray.jl:270
  [7] readblock!(aout::Matrix{ComplexF64}, z::ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, r::CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}})
    @ Zarr ~/.julia/packages/Zarr/3QSdj/src/ZArray.jl:178
  [8] readblock!
    @ ~/.julia/packages/Zarr/3QSdj/src/ZArray.jl:247 [inlined]
  [9] readblock_sizecheck!
    @ ~/.julia/packages/DiskArrays/ny95C/src/diskarray.jl:337 [inlined]
 [10] getindex_disk_nobatch!(out::Nothing, a::ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, i::Tuple{Colon})
    @ DiskArrays ~/.julia/packages/DiskArrays/ny95C/src/diskarray.jl:247
 [11] getindex_disk!(out::Nothing, a::ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, i::Function)
    @ DiskArrays ~/.julia/packages/DiskArrays/ny95C/src/diskarray.jl:260
 [12] getindex_disk
    @ ~/.julia/packages/DiskArrays/ny95C/src/diskarray.jl:218 [inlined]
 [13] getindex
    @ ~/.julia/packages/DiskArrays/ny95C/src/diskarray.jl:370 [inlined]
 [14] getindex(A::YAXArray{ComplexF64, 2, ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, Tuple{Dim{:y, DimensionalData.Dimensions.Lookups.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Regular{Int64}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}, Dim{:x, DimensionalData.Dimensions.Lookups.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Regular{Int64}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}}, Dict{String, Any}}, i::Colon)
    @ DimensionalData ~/.julia/packages/DimensionalData/oXUIT/src/array/indexing.jl:61

@gRox167 gRox167 reopened this Dec 19, 2024
@felixcremer
Member

That seems to be specifically a Zarr.jl issue. There is a mismatch in how the Python and Julia Zarr implementations interpret the Zarr metadata.
We would have to change the Zarr.sizemapf function for complex numbers and multiply the sizeof by 2.
I might be able to open a PR later on.
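
The size arithmetic behind the error can be sketched in Python (standard library only). The sketch assumes the store records the dtype with NumPy's typestr for complex64, `<c8`, where the number is the size of the whole element in bytes. The function `itemsize_per_component` is a hypothetical misreading used only for illustration and may not match how Zarr.jl actually parses the dtype.

```python
import re


def parse_typestr(typestr):
    """Split a NumPy-style typestr such as '<c8' into (byteorder, kind, size)."""
    match = re.fullmatch(r"([<>|=])([a-zA-Z])(\d+)", typestr)
    if match is None:
        raise ValueError(f"unsupported typestr: {typestr!r}")
    byteorder, kind, size = match.groups()
    return byteorder, kind, int(size)


def itemsize_numpy(typestr):
    # NumPy convention: the number is the size of the whole element in bytes.
    _, _, size = parse_typestr(typestr)
    return size


def itemsize_per_component(typestr):
    # Hypothetical misreading: treat the number as the size of each of the
    # real and imaginary parts of a complex element.
    _, kind, size = parse_typestr(typestr)
    return 2 * size if kind == "c" else size


n_elements = 3 * 5                                   # the 3x5 array from the report
typestr = "<c8"                                      # NumPy's typestr for complex64
chunk_bytes = n_elements * itemsize_numpy(typestr)   # bytes actually written
misread = itemsize_per_component(typestr)            # element size under the misreading
remainder = chunk_bytes % misread

print(chunk_bytes, misread, remainder)  # prints "120 16 8"
```

Under these assumptions the single chunk holds 120 bytes, which is not a multiple of 16, matching the "not a multiple of sizeof(ComplexF64)" error. The 16-byte element size also agrees with the 240 bytes (15 × 16) shown in the array display above.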

@lazarusA
Collaborator

See JuliaIO/Zarr.jl#168.

@felixcremer
Member

That might be related but is not the main issue.
