optimization for associative reductions #189

Closed
wants to merge 2 commits

Conversation

Alexander-Barth
Contributor

This PR addresses #188

@rafaqz
Collaborator

rafaqz commented Aug 28, 2024

I think we also need the version with a function as first argument.

And also, does count really belong in that list? Especially with the function form? I thought it would need some tweaking.
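For reference, a minimal sketch of the two call forms in question, using the AccessCountDiskArray test wrapper from DiskArrays.TestTypes that also appears later in this thread (the data here is illustrative only):

using DiskArrays
using DiskArrays.TestTypes

A = rand(10, 10)
DA = AccessCountDiskArray(A, chunksize=(2, 2))

sum(DA)        # plain form: covered by the methods in this PR
sum(abs2, DA)  # function as first argument: would also need a specialized method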

@meggart
Collaborator

meggart commented Aug 28, 2024

I think the root problem is that some internal dispatch changed in Julia Base. All of these were working in previous Julia versions because they used to fall back to reduce or foldl, for which we have an efficient implementation. So I am wondering if we could rather make this work again instead of hand-picking a few reduction functions that can be fixed and maybe missing some along the way.
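For context, a rough sketch (not the package's actual code) of the kind of chunk-wise reduction such an efficient fallback performs, using only the public eachchunk interface; chunked_mapreduce and its arguments are hypothetical names:

using DiskArrays: eachchunk

# Hypothetical chunk-wise reduction: read each chunk once, reduce it in memory,
# and combine the per-chunk results. Splitting the work per chunk is only valid
# because `op` is associative.
function chunked_mapreduce(f, op, A; init)
    acc = init
    for c in eachchunk(A)                                     # tuples of chunk index ranges
        acc = op(acc, mapreduce(f, op, A[c...]; init=init))   # one read per chunk
    end
    return acc
end

# e.g. chunked_mapreduce(identity, +, DA; init=zero(eltype(DA)))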

@rafaqz
Collaborator

rafaqz commented Aug 28, 2024

Oh right, the dispatch on internals may have broken; sorry, I should have realised that. We also just switched this to be the main method in DimensionalData.jl:

https://github.com/meggart/DiskArrays.jl/blob/ccd3092ca6f32c2663ded1d74176aa9dd222a5ef/src/mapreduce.jl#L7-L12

Sorry @Alexander-Barth for the misdirection! I guess we just need to edit the mapreduce implementation instead. It's basically the generalisation of the methods in this PR.

But probably we can keep the tests to catch these things in future, especially if we explicitly test the access count of those arrays after the reductions, to make sure we are not hitting fallbacks. We clearly are not doing that at the moment.

@meggart
Collaborator

meggart commented Aug 29, 2024

> But probably we can keep the tests to catch these things in future, especially if we explicitly test the access count of those arrays after the reductions, to make sure we are not hitting fallbacks. We clearly are not doing that at the moment.

I am not sure about this; we have these tests: https://github.com/meggart/DiskArrays.jl/blob/ccd3092ca6f32c2663ded1d74176aa9dd222a5ef/test/runtests.jl#L114, which also test the access count. Also, the test that was added in this PR already passes the access count check on the current main:

using DiskArrays
using DiskArrays.TestTypes
using Test

# Chunked test array that records every getindex call
A = rand(1:10, 30, 30)
DA = AccessCountDiskArray(A, chunksize=(2, 2));

r = sum(DA);

# Every chunk should have been read exactly once
@test getindex_count(DA) == length(eachchunk(DA))

Here we test that every chunk is accessed exactly once; please try this on main. So I would guess the slow performance in #188 must come from somewhere else, or from some operation we are not yet hitting with our tests.

@meggart
Collaborator

meggart commented Aug 29, 2024

Ok, I did some digging and realized that currently we do not hit our mapreduce methods but rather the iterator fallback, which does read the data chunk by chunk but is less efficient and less numerically stable than this implementation. I will launch Cthulhu to find out where this goes wrong.

@meggart
Collaborator

meggart commented Aug 29, 2024

Ok, I just opened #191, which also solves the performance issues described in #188. My suggestion would be to merge this PR anyway, because of the following benefits:

  1. The PR adds some more safety in case Julia Base switches its mapreducedim internal signatures again in future Julia versions.
  2. In the case of sum for very large or very small chunks, this PR should also be more accurate than the fallback mapreduce, i.e. when already hitting floating-point issues within a chunk or when adding too many chunks in Float32 or Float16 precision (see the sketch after this list; it would be great to have some examples where this happens in the unit tests, I could also do this later if you don't have time).
  3. With this PR, any and all can be much faster because of early loop exit when the first true or false value, respectively, is encountered, which we would not get from the generic fallback.
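To illustrate point 2, a hedged example (not from the package's test suite) of how sequential Float32 accumulation goes wrong once the running total gets large, and how combining per-chunk partial sums avoids it:

n  = 20_000_000
xs = fill(1.0f0, n)

# Sequential left-to-right accumulation: once the running total reaches 2^24 = 16_777_216,
# adding 1.0f0 no longer changes it, so the result is far too small.
seq = foldl(+, xs)            # 1.6777216f7 instead of 2.0f7

# Summing in blocks and combining the partial sums keeps every partial total small,
# which is essentially what a chunk-wise reduction does.
blocksize = 1_000_000
chunked = sum(sum(view(xs, i:min(i + blocksize - 1, n))) for i in 1:blocksize:n)   # 2.0f7, exact here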

So I would definitely want to add the specialized functions from this PR for sum, prod, any and all, ideally with some accompanying tests that check these advantages. For count, maximum, and minimum I am quite neutral, as the only benefit would be protection from future Base interface changes; it would be great to hear your opinions.

@rafaqz
Collaborator

rafaqz commented Aug 29, 2024

Seems this has been pretty productive! Good to have all those things working and tested.

I think you're right: sum, prod, any and all are the real beneficiaries of not using mapreduce. maximum and minimum are probably worth keeping too for reliability.

I don't quite understand how count is working over chunks here, because it returns an Int but does not work on an iterator of Ints. I thought it would need sum as the outer function and count as the inner.
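For what it's worth, a hedged sketch of that combination: count applied within each chunk returns an Int, so the per-chunk results have to be combined with an outer sum rather than another count. chunked_count is a hypothetical name, not the PR's implementation:

using DiskArrays: eachchunk

# Hypothetical chunk-wise count: inner `count` per chunk, outer sum over the Int results.
function chunked_count(pred, A)
    total = 0
    for c in eachchunk(A)
        total += count(pred, A[c...])   # count within the chunk read into memory
    end
    return total
end

# Applying `count` again to the per-chunk Ints would fail, since `count` expects Bool values.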

meggart mentioned this pull request Oct 18, 2024
@meggart
Collaborator

meggart commented Oct 18, 2024

Superseded by #196

meggart closed this Oct 18, 2024
meggart added a commit that referenced this pull request Oct 18, 2024
* optimization for associative reductions (sum, prod, count, all, any, minimum and maximum)

* add test for associative reductions

* added functional form of reducers

* add test for early stopping

* Update src/mapreduce.jl

Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>

---------

Co-authored-by: Alexander Barth <barth.alexander@gmail.com>
Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>