optimization for associative reductions #189
Conversation
I think we also need the version with a function as first argument. And also, is `count` really in that list? Especially with the function? I thought it would need some tweaking.
I think the root problem is that some internal dispatch changed in Julia Base. All of these were working in previous Julia versions because they used to fall back to
Oh right, the dispatch on internals may have broken, sorry I should have realised that. We also just switched this to the main method in DimensionalData.jl. Sorry @Alexander-Barth for the misdirection! I guess we just need to edit the

But probably we can keep the tests to catch these things in future, especially if we explicitly test the access count of those arrays after the reductions, to make sure we are not hitting fallbacks. We clearly are not doing that at the moment.
I am not sure about this, we have these tests: https://github.com/meggart/DiskArrays.jl/blob/ccd3092ca6f32c2663ded1d74176aa9dd222a5ef/test/runtests.jl#L114 which also test the access count. Also, the test that was added in this PR already passes the access-count test on the current main:

```julia
using DiskArrays
using DiskArrays.TestTypes
using Test

A = rand(1:10, 30, 30)
DA = AccessCountDiskArray(A, chunksize=(2, 2));
r = sum(DA);
@test getindex_count(DA) == length(eachchunk(DA))
```

Here we test that every chunk is accessed exactly once. Please try this on main; I would guess the slow performance in #188 must come from somewhere else, or from some operation we are not yet hitting with our tests.
Ok, I did some digging and realized that currently we do not hit our mapreduce methods but rather the iterator fallback, which does read the data chunk by chunk but is less efficient and less numerically stable than this implementation. Will launch Cthulhu to find out where this goes wrong.
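The chunk-by-chunk strategy under discussion can be sketched roughly as follows. This is a minimal illustration using a plain in-memory array, with a hand-built iterator of index ranges standing in for `eachchunk`; it is not DiskArrays' actual implementation, just the idea that an associative reduction can reduce each chunk once and then combine the partial results:

```julia
# Sketch: associative reduction done chunk by chunk.
# `chunks` is assumed to be an iterator of index-range tuples
# (a stand-in for DiskArrays' eachchunk).
function chunked_sum(A, chunks)
    partials = [sum(view(A, idx...)) for idx in chunks]
    return sum(partials)   # associativity lets us combine partial results
end

A = rand(1:10, 30, 30)
chunks = ((i:min(i + 1, 30), j:min(j + 1, 30)) for i in 1:2:30, j in 1:2:30)
@assert chunked_sum(A, chunks) == sum(A)
```

Because each chunk is read exactly once, an access-counting wrapper like the one in the test above would record one read per chunk, rather than the element-by-element access pattern of the iterator fallback.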
Ok, I just opened #191 which also solves the performance issues described in #188. My suggestion for this PR would be to merge anyway, because of the following benefits:
So I would definitely want to add the specialized functions from this PR for
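One benefit of specialized reducers (as opposed to a generic fallback) is early stopping for predicates like `any` and `all`: once a chunk decides the result, no further chunks need to be read. A hypothetical sketch, again using a plain array and hand-built index ranges rather than the package's real internals:

```julia
# Sketch: early stopping for `any` — later chunks are never read
# once a chunk containing a match has been found.
function chunked_any(pred, A, chunks)
    for idx in chunks
        any(pred, view(A, idx...)) && return true   # stop at first hit
    end
    return false
end

A = zeros(Int, 30, 30); A[5, 5] = 1
chunks = ((i:min(i + 1, 30), j:min(j + 1, 30)) for i in 1:2:30, j in 1:2:30)
@assert chunked_any(==(1), A, chunks)
```

For a disk-backed array this matters because each skipped chunk is a skipped I/O operation, which is exactly what the "test for early stopping" in this PR is meant to verify.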
Seems this has been pretty productive! Good to have all those things working and tested. I think you're right - I don't quite understand how
Superseded by #196
* optimization for associative reductions (sum, prod, count, all, any, minimum and maximum)
* add test for associative reductions
* added functional form of reducers
* add test for early stopping
* Update src/mapreduce.jl (Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>)

Co-authored-by: Alexander Barth <barth.alexander@gmail.com>
Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>
This PR addresses #188