Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix map! undefined value exposure #56673

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

adienes
Copy link
Contributor

@adienes adienes commented Nov 24, 2024

hopefully more targeted / less disruptive than #50647 , would (at least partially) fix #36235

may need extra documentation explaining that with this implementation, @inbounds map!(dest, As) will always iterate through the indices of As[1] regardless of the other sizes

tbh, the existing implementation is pretty fundamentally incorrect and does not match the behavior explicitly described in the docstring, so I think that this PR (or #50647 , or something else similar) should merge regardless of benchmarks impact. however, I would appreciate if someone could trigger a benchmark run just so we can see the impact :)

@adienes
Copy link
Contributor Author

adienes commented Nov 24, 2024

pretty sure build failures are unrelated

@LilithHafner
Copy link
Member

LilithHafner commented Nov 27, 2024

This looks like a simple typo:

@boundscheck LinearIndices(dest) == idxs1 && all(x -> LinearIndices(x) == idxs1, As)

That's not how @boundscheck works. That won't throw even if the indices don't line up. Would it work to simply fix that typo or would that be too strict because the typo has been present for six years?

@adienes
Copy link
Contributor Author

adienes commented Nov 27, 2024

typo aside, it's checking the wrong thing. that's checking that literally all the indices are equal.

map is supposed to stop when any of the constituent arguments are exhausted --- this should be honored even in the presence of @inbounds ! I only made it skip this check in the presence of @inbounds in the PR because I figured there would be complaints if the added indices intersection impacted performance

but for a correct implementation to match the docstring, I think intersect(map(eachindex ∘ LinearIndices, As)...) --- or something else to that effect must be computed, @inbounds or no

this is one of those cases where Julia skipped directly to "make it fast" before "make it right" ☹️

@LilithHafner LilithHafner added the bugfix This change fixes an existing bug label Nov 27, 2024
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this significant issue. Much appreciated. A related issue that you might want to be aware of, but don't need to fix in this PR is #56696.

I agree that it is acceptable to take some performance regressions in exchange for fixing this bug. I have some refactoring suggestions to make it a bit simpler.

I'm planning to run benchmarks once we agree on an implementation, but if you want me to run them right away instead I'd be happy to do that.

Comment on lines 3465 to 3470
idxs1 = eachindex(LinearIndices(As[1]))
@boundscheck begin
idxs1 = intersect(map(eachindex ∘ LinearIndices, As)...)
checkbounds(dest, idxs1)
end
for i = idxs1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
idxs1 = eachindex(LinearIndices(As[1]))
@boundscheck begin
idxs1 = intersect(map(eachindex LinearIndices, As)...)
checkbounds(dest, idxs1)
end
for i = idxs1
idxs = mapreduce(LinearIndices, intersect, As)
checkbounds(dest, inds)
for i = idxs
  • I don't see why we need eachindex is here
  • We should only include bounds-check elision if benchmarks reveal that it makes a difference. In this case, we would also need to add @propagate_inbounds to map! to make it take effect.
  • I think that mapreduce(LinearIndices, intersect, As) is more elegant than intersect(map(LinearIndices, As)...), unless there is some performance reason to use the latter.
  • If always compute the intersection of indices then inds is a better name than inds1

Copy link
Contributor Author

@adienes adienes Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the comments!

the eachindex I added for the following reason

julia> mapreduce(LinearIndices, intersect, [A, B])
3-element Vector{Int64}:
 1
 2
 3

julia> mapreduce(eachindex ∘ LinearIndices, intersect, [A, B])
Base.OneTo(3)

so I figured that if we can hit fast-path intersection for ranges rather than collecting the indices we should. Another solution might be to add a fast intersect(::LinearIndices, ::LinearIndices) ?

the mapreduce formulation indeed looks nicer to me as well.

  • If always compute the intersection of indices then inds is a better name than inds1

Agreed! but I suppose we should decide if in the first place @inbounds should allow skipping the check and iterate through the indices of the first argument? I think this is a decision on semantics rather than implementation that I am not empowered to make. the docs should probably be updated if this is intended

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the eachindex I added for the following reason

Good reason.

I suppose we should decide if in the first place @inbounds should allow skipping the check and iterate through the indices of the first argument?

@inbounds should never expose documented semantics that are otherwise not present. This is because you can't count on bounds checks being removed. For exmaple, Julia could be run with -check-bound=yes or the boundschecking might not be inlined into the callsite where @inbounds is present. In either case, the bounds checks will remain. Consequently I think the answer to that question is no: @inbounds should not change the semantics other than to skip bounds checks.

@@ -910,6 +910,9 @@ include("generic_map_tests.jl")
generic_map_tests(map, map!)
@test_throws ArgumentError map!(-, [1])

# Issue #30624
@test map!(+, [0,0,0], [1,2], [10,20,30], [100]) == [111,0,0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you also add a test for #36235 (comment), please?

@adienes
Copy link
Contributor Author

adienes commented Nov 28, 2024

  • added a bunch more tests (many failing or segfaulting on main)
  • removed all @inbounds until it's clear that these are needed for performance, we can re-add
  • I would like to abdicate responsibility for fixing OffsetArrays interactions :)

@LilithHafner LilithHafner added minor change Marginal behavior change acceptable for a minor release needs pkgeval Tests for all registered packages should be run with this change labels Dec 2, 2024
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, map!(+, ones(1), ones(2, 2)) and map!(+, ones(1), ones(2, 2), ones(2, 2)) worked and this makes them throw (as the docstring indicates they should).

Performance is, at a glance, sometimes slightly better and sometimes slightly worse than it was on master. Correctness first, performance optimization second so I'm not too concerned.

I like that the implementation for map_n! is a bit simpler now and that it clearly aligns with the implementations of 1-arg and 2-arg map.

@@ -910,6 +910,26 @@ include("generic_map_tests.jl")
generic_map_tests(map, map!)
@test_throws ArgumentError map!(-, [1])

function test_30624()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a testset?

@LilithHafner
Copy link
Member

  • I would like to abdicate responsibility for fixing OffsetArrays interactions :)

Unfortunately we have to deal with them before merging because otherwise existing, valid, code that uses OffsetArrays will break. For example,

julia> R = fill(0, -4:-1);

julia> B = OffsetArray(reshape(1:12, 4, 3), -5, 6);

julia> minimum!(R, B)
4-element OffsetArray(::Vector{Int64}, -4:-1) with eltype Int64 with indices -4:-1:
 1
 2
 3
 4

This is a test failure from CI which fails because the implementation of minimum! (which lives in Base) for arrays with unconventional indexing relies on a bug that this PR fixes:

julia> x = fill(0, 1:3);

julia> y = fill(1, 2:4);

julia> map!(identity, x, y)
3-element OffsetArray(::Vector{Int64}, 1:3) with eltype Int64 with indices 1:3:
 1
 1
 1

@nanosoldier runtests()

@adienes
Copy link
Contributor Author

adienes commented Dec 2, 2024

I guess we'll have to iterate through destination idxs separately to fix the OffsetArrays issue? then dest will be treated just as a fully generic iterable / assignable collection.

function map!(f::F, dest::AbstractArray, A::AbstractArray, B::AbstractArray) where F
    inds = intersect(eachindex(LinearIndices(A)), eachindex(LinearIndices(B)))
    @boundscheck length(dest) >= length(inds)
    dest_idx = firstindex(dest)
    for i = inds
        dest[dest_idx] = f(A[i], B[i])
        dest_idx = nextind(dest, dest_idx)
    end
    return dest
end

I could be wrong but this seems more likely to hurt performance.

Would there be a meaningful difference here between iterating through firstindex, nextindex of dest directly vs through LinearIndices(dest) ?

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@LilithHafner
Copy link
Member

Oh, wow! Nanosoldier looks clean. Rerunning failures to be sure:

@nanosoldier runtests(["ApproxLogFunction", "TestSetExtensions", "FunctionOperators", "RemoteFiles", "ChebyshevFiltering", "EinExprs", "Andes", "OpenTelemetryExporterPrometheus", "RegressionTables", "GLPK", "CDDLib", "OrdinaryDiffEqSSPRK", "MPIMeasurements", "ManifoldDiffEq", "OceanRobots", "DynamicMovementPrimitives", "ActiveInference", "Yunir", "ImmersedLayers", "FrequencySweep", "LowRankIntegrators", "NetworkJumpProcesses", "BoxCox"])

@LilithHafner
Copy link
Member

I guess we'll have to iterate through destination idxs separately to fix the OffsetArrays issue?

Given how nice PkgEval is looking, another option would be to fix the minimum! implementation to not rely on the map! bug.

Would there be a meaningful difference here between iterating through firstindex, nextindex of dest directly vs through LinearIndices(dest) ?

I doubt it, but hopefully that won't be necessary. If we can implement it in a way that isn't practically breaking, I'd love to make map! take offset indices into account (like map) rather than iterating separately.

@adienes
Copy link
Contributor Author

adienes commented Dec 4, 2024

I'm finding all the indirection in reducedim.jl pretty inscrutable 😓

but I guess the "real" issue is in mapfirst! ?

function mapfirst!(f::F, R::AbstractArray, A::AbstractArray{<:Any,N}) where {N, F}
    lsiz = check_reducedims(R, A)
    t = _firstreducedslice(axes(R), axes(A))
    map!(f, R, view(A, t...))
end

in your example, we have

julia> t
(IdOffsetRange(values=-4:-1, indices=-4:-1), 7:7)

so then

julia> LinearIndices(view(B, t...)) |> collect
4×1 Matrix{Int64}:
 1
 2
 3
 4

on main, that final map! call will happily map indices [1 2 3 4] into [-4, -3, -2, -1] whereas this PR makes it (correctly?) throw a BoundsError

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@adienes
Copy link
Contributor Author

adienes commented Dec 4, 2024

actually, @mbauman 's tour de force #55318 with this PR rebased on top seems to solve the offset arrays case. so maybe the solution is rebase & wait to get more help on that one?

@adienes
Copy link
Contributor Author

adienes commented Dec 4, 2024

at the very least, this PR would swap "bad" bugs (segfaults, undefined behavior) for "regular" bugs (something throwing a BoundsError when it should not)

@adienes
Copy link
Contributor Author

adienes commented Jan 26, 2025

updates #30389. also breadcrumb to #46352

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug minor change Marginal behavior change acceptable for a minor release needs pkgeval Tests for all registered packages should be run with this change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

map! resulting in undefined values
3 participants