-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing in other packages #2
Comments
Nice work! Do you think it would also make sense to apply this for GDAL raster grids in https://github.com/yeesian/ArchGDAL.jl, or at least https://github.com/evetion/GeoArrays.jl? @yeesian @evetion. |
Thanks for the feedback. Yes, both of them should definitely be on the radar. I just looked at the ArchGDAL raster interface and implementation there should be straightforward. |
Great. I guess GeoArrays.jl can be removed from the list. Right know it only supports reading the entire grid. And if it would add support for subsets, since it uses ArchGDAL for IO, it could profit from ArchGDAL's use of DiskArrays. |
This looks good. Seems pretty comprehensive for indexing it will be great to have everything standardised. My vote is also to implement it in ArchGDAL, then GeoData.jl can use it too. I recently added some basic array indexing to ArchGDAL, we could build off that. |
Ok, I created an initial PR for ArchGDAL. Maybe it would be a good idea to start registering ChunkedArrayBase and DiskArrays so that testing gets easier for dependent packages? |
Great! Yeah I think registering may be good. Since both ChunkedArrayBase and DiskArrays are pure julia, no-deps, small projects, why do you prefer to split it into two packages? May be a good choice, but just wondering. |
I don't know what the best strategy would be here. I thought there might be packages that want to use the concept of chunking (maybe some DistributedArray-like package) that don't have the slow-access characteristics to be tackled here, but from a maintenance point of view I think you are right and it is better to merge these packages. I will copy the code from ChunkedArrayBase here then. Are there any suggestions where DiskArrays should live? Is JuliaGeo the right place for it or should it stay in my namespace for now. |
Ok yeah then I agree on merging them. Since it is potentially quite a foundational package with many reverse dependencies, placing it in an organization is probably a good idea. Since it is created for and often used with geospatial rasters, JuliaGeo could be a good place for it to live. Another good option would be in https://github.com/JuliaArrays, living along the other special Array types with package names |
I was also wondering if it should be in JuliaArrays... HDF5 is a lot more general than geospatial data, and there are probably other non geo applications. |
I think since testing the PRs depends so much on this package being registered, I decided to register the package as it is for now. We can still move to an org later. |
Support for HDF5.jl would be great! |
@oschulz I just opened JuliaIO/HDF5.jl#615 to discuss this |
Oh, thanks - almost forgot about this! I should also check if I can use this in UpROOT.jl. |
I've implemented this in https://github.com/evetion/GeoArrays.jl, based on the ArchGDAL.jl implementation. Only thing that it breaks is my support for missing values, for which I normally read the arrays into memory and convert them to But |
Great to hear about the GeoArrays implementation. Is there anything specific that could be done about the missing values on the DiskArrays side. I am struggling with this now and then as well and have some attempts at solving this e.g. here https://github.com/meggart/DiskArrayTools.jl/blob/6728512ed9c33e1fa261bf38b2de3b97589ce310/src/DiskArrayTools.jl#L155-L192 |
Last time I tried, DiskArrays in NCDatasets I had some issues with types of undefined size like String and probable also NetCDF4 variable-length arrays:
See here: Alexander-Barth/NCDatasets.jl#79 (comment) Can DiskArrays now be used with an arrays of Strings? |
@Alexander-Barth have you seen and tried this: #24 sorry for not notifying you earlier, let me know if this works for you. |
Thanks a lot @meggart for let me know. I hope that I can find time soon to test this, but right now it is a bit difficult for me. I had just a look to your patch: """
A fallback element size for arrays to determine a where elements have unknown
size like strings. Defaults to 100MB
"""
const fallback_element_size = Ref(100) Should this be 100 bytes in doc strings ? |
I think the general indexing functionality for this package is there now, so that one could use this package as a dependency for NetCDF, HDF5 or Zarr without losing functionality. Here we can colltect a list to the packages that might profit from DiskArrays and link to the respective PRs:
@visr @rafaqz @Alexander-Barth any suggestions and feedback would be highly appreciated
The text was updated successfully, but these errors were encountered: