Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zip triggers lots of invalidations #175

Closed
charleskawczynski opened this issue Jun 14, 2024 · 11 comments · Fixed by #193
Closed

zip triggers lots of invalidations #175

charleskawczynski opened this issue Jun 14, 2024 · 11 comments · Fixed by #193

Comments

@charleskawczynski
Copy link
Contributor

charleskawczynski commented Jun 14, 2024

We have a workload over at ClimaAtmos.jl, where we use SnoopCompile's invalidations script, and the result is:

[ Info: 4296 methods invalidated for 178 functions (showing 10 functions)
┌─────────────────────────────────────────────────────────┬───────────────┬───────────────┬─────────────────┐
│ <file name>:<line number>                               │ Function Name │ Invalidations │ Invalidations % │
│                                                         │               │               │     (xᵢ/∑x)     │
├─────────────────────────────────────────────────────────┼───────────────┼───────────────┼─────────────────┤
│ DiskArrays/bZBJE/src/zip.jl:83                          │      zip      │     3247      │       14        │
│ DiskArrays/bZBJE/src/zip.jl:70                          │      zip      │     3247      │       14        │
│ DiskArrays/bZBJE/src/zip.jl:83                          │      zip      │     3247      │       14        │
│ FillArrays/eOEVm/src/fillbroadcast.jl:78                │  broadcasted  │     1220      │        5        │
│ CommonDataModel/G3moc/src/groupby.jl:267                │  broadcasted  │     1220      │        5        │
│ ClimaCore/ANgUC/src/MatrixFields/field_name_dict.jl:331 │  broadcasted  │     1220      │        5        │
│ ClimaCore/ANgUC/src/MatrixFields/field_name_dict.jl:475 │  broadcasted  │     1220      │        5        │
│ ClimaCore/ANgUC/src/MatrixFields/lazy_operators.jl:44   │  broadcasted  │     1220      │        5        │
│ GeometryBasics/ebXl0/src/offsetintegers.jl:40           │    convert    │      834      │        4        │
│ DiskArrays/bZBJE/src/generator.jl:53                    │   Generator   │      439      │        2        │
└─────────────────────────────────────────────────────────┴───────────────┴───────────────┴─────────────────┘

So, DiskArrays' zip seems to invalidate 9741 methods.

@rafaqz
Copy link
Collaborator

rafaqz commented Jun 14, 2024

Ugh damn. We have a plan to fix zip in a more fundamental way but it will be a while off.

Is there anything specific about our definition that is leading to invalidations?

(Generators are also not great it seems)

@rafaqz
Copy link
Collaborator

rafaqz commented Jun 14, 2024

It would be good to remove those, but NCDatasets.jl uses the macros to avoid inheriting AbstractVariable < : AbstractDiskArray

We (myself and @meggart) would very much like to remove those macros from the package but @Alexander-Barth may not.

@charleskawczynski
Copy link
Contributor Author

Maybe this is a question for @Alexander-Barth, but why would a type that doesn't subtype AbstractDiskArray have zip return DiskZip?

@rafaqz
Copy link
Collaborator

rafaqz commented Jun 17, 2024

DiskArrays provides a macro system to implement everything without explicit inheritance - the idea was that some packages wouldn't use the interface otherwise.

CommonDataModel.AbstractVariable does a lot of work in CommonDataModel.jl that doesn't always need disk array handling. So packages apply the macros to some concrete Variable types rather than making AbstractVariable <: AbstractDiskArray.

That causes a lot of issues, like this one. I'm hoping over time we can identify the reasons AbstractVariable doesn't need disk behaviors and add traits to DiskArrays to remove any performance loss for those types, and eventually switch everything over to just being <: AbstractDiskArray. But I'm not sure who could do that work when.

There is some discussion here:
JuliaGeo/CommonDataModel.jl#8

@Alexander-Barth
Copy link
Contributor

I would not be opposed to remove the zip macro, but I am not sure to what extent changes are necessary on the CommonDataModel/NCDatasets side.

As we have seen Zarr and NetCDF require sometimes different trade-offs. Some dataset are also entirely in memory or "virtual" (in the sense that values are computed based on indices, such as longitude/latitude arrays based on the projection). As long as I would be able to provide more specific routines (without ambiguities) in case when an alternative implementations is more performant, then I can also subtype directly from AbstractDiskArray.

@rafaqz
Copy link
Collaborator

rafaqz commented Jun 24, 2024

You can always provide more specific methods than AbstractDiskArray for specific variable types. There may be a few boilerplate methods needed to avoid ambiguity, yes.

What is required is a DiskArrays dependency in CommonDataModel.jl and defining AbstractVariable <: AbstrackDiskArray. Then probably default fallback methods to forward haschunks and eachunk to the parent array.

readblock! and writeblock! Would need to be defined for CFVariable and maybe some other wrapper arrays types.

Then you can delete the macros wherever they are applied.

(This will mean broadcasts, reductions and other nice things that are currently broken will work for CFDiskArray)

@charleskawczynski
Copy link
Contributor Author

xref: CliMA/ClimaCore.jl#1812

@charleskawczynski
Copy link
Contributor Author

Hi @rafaqz, I was looking back at this, and I'm wondering if we can at least alleviate the invalidations by making AccessCountDiskArray a subtype of AbstractDiskArray, and then skip defining the same zip methods on AccessCountDiskArray, since the fallback will be the same.

@rafaqz
Copy link
Collaborator

rafaqz commented Sep 19, 2024

Ok yes that's no problem at all. Feel free to PR

@charleskawczynski
Copy link
Contributor Author

Ok, thanks, I'll give this a try

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants