StaticArrays woes #413

Closed

ExpandingMan opened this issue Aug 17, 2024 · 7 comments
Labels
backend Related to one or more autodiff backends

Comments

ExpandingMan (Contributor) commented Aug 17, 2024

There currently seem to be quite a few issues with StaticArrays on many different back-ends. In most cases the problem is inefficiency due to inappropriate allocations (sometimes quite severe); in other cases there are outright errors. This issue documents the various problems. Note that very few, if any, of these are actually issues with DifferentiationInterface.jl itself, but rather with the back-ends.

In what follows we will use

using BenchmarkTools
using DifferentiationInterface  # re-exports AutoEnzyme, AutoForwardDiff, AutoFiniteDiff
using StaticArrays
# plus the backend package itself (Enzyme, ForwardDiff, FiniteDiff) in each section below

f1(x) = sum(x .* x)
f2(x) = x .* x

testarray() = @SVector ones(4)

x = testarray()

Enzyme

Improper Allocation in Gradient

◖◗ @btime gradient($f1, $(AutoEnzyme()), $x)
  107.511 ns (9 allocations: 688 bytes)
4-element SVector{4, Float64} with indices SOneTo(4):
 2.0
 2.0
 2.0
 2.0

This is likely due to insufficient specialization of Enzyme.gradient for StaticArrays. I have confirmed that a raw Enzyme.autodiff call is efficient and does not allocate. I'm attempting to address this, among other things, in this PR.
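
For reference, the raw call I benchmarked looks roughly like this (a sketch on my part; the exact activity annotations that Enzyme.gradient uses internally may differ):

using Enzyme, StaticArrays

f1(x) = sum(x .* x)
x = @SVector ones(4)

# SVector is immutable, so Active is the natural activity annotation; autodiff
# returns a nested tuple holding the derivative of each Active argument, hence
# the [1][1].
dx = Enzyme.autodiff(Enzyme.Reverse, f1, Enzyme.Active, Enzyme.Active(x))[1][1]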

Invalid Construction in Jacobian

◖◗ @btime jacobian($f2, $(AutoEnzyme()), $x)
ERROR: DimensionMismatch: No precise constructor for SVector{4, Float64} found. Length of input was 1.
Stacktrace:
  [1] _no_precise_size(SA::Type, x::Tuple{Float64})
    @ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:169
  [2] length_match_size(::Type{SVector{4, Float64}}, x::Tuple{Float64})
    @ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:138
  [3] adapt_size(::Type{SVector{4, Float64}}, x::Tuple{Float64})
    @ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:107
  [4] construct_type(::Type{SVector{4, Float64}}, x::Tuple{Float64})
    @ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:95
  [5] Type
    @ ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:175 [inlined]
  [6] #3
    @ ~/.julia/packages/Enzyme/OOd6p/ext/EnzymeStaticArraysExt.jl:16 [inlined]
  [7] macro expansion
    @ ./ntuple.jl:72 [inlined]
  [8] ntuple
    @ ./ntuple.jl:69 [inlined]
  [9] onehot
    @ ~/.julia/packages/Enzyme/OOd6p/ext/EnzymeStaticArraysExt.jl:14 [inlined]
 [10] #98
    @ ~/.julia/packages/Enzyme/OOd6p/src/Enzyme.jl:1149 [inlined]
 [11] ntuple
    @ ./ntuple.jl:48 [inlined]
 [12] chunkedonehot
    @ ~/.julia/packages/Enzyme/OOd6p/src/Enzyme.jl:1147 [inlined]
 [13] prepare_jacobian(f::Function, backend::AutoEnzyme{Nothing, Nothing}, x::SVector{4, Float64})
    @ DifferentiationInterfaceEnzymeExt ~/.julia/packages/DifferentiationInterface/cuZBh/ext/DifferentiationInterfaceEnzymeExt/forward_onearg.jl:131
 [14] jacobian(f::typeof(f2), backend::AutoEnzyme{Nothing, Nothing}, x::SVector{4, Float64})
    @ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/cuZBh/src/fallbacks/no_extras.jl:9
 [15] var"##core#509"(f2#506::typeof(f2), 507::AutoEnzyme{Nothing, Nothing}, x#508::SVector{4, Float64})
    @ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:561
 [16] var"##sample#510"(::Tuple{typeof(f2), AutoEnzyme{Nothing, Nothing}, SVector{4, Float64}}, __params::BenchmarkTools.Parameters)
    @ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:570
 [17] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; maxevals::Int64, kwargs::@Kwargs{})
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:187
 [18] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:182
 [19] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [20] invokelatest
    @ ./essentials.jl:889 [inlined]
 [21] #lineartrial#46
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:51 [inlined]
 [22] lineartrial
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:50 [inlined]
 [23] tune!(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, verbose::Bool, pad::String, kwargs::@Kwargs{})
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:300
 [24] tune!
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:289 [inlined]
 [25] tune!(b::BenchmarkTools.Benchmark)
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:289
 [26] top-level scope
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:666

This error occurs when an insufficiently narrow StaticArrays type is used as a constructor. Again, this calls for more specialization within Enzyme. This may be fixed by this PR.
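
For illustration, the underlying constructor failure reproduces on its own (a minimal repro of mine, not the exact code path in Enzyme's extension):

using StaticArrays

# Applying a fixed-length SVector type to a tuple of the wrong length throws
# the same DimensionMismatch reported in the stacktrace above.
SVector{4, Float64}((1.0,))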

ForwardDiff (solved by #414!)

Improper Allocation In Gradient

◖◗ @btime gradient($f1, $(AutoForwardDiff()), $x)
  247.977 ns (6 allocations: 608 bytes)
4-element SVector{4, Float64} with indices SOneTo(4):
 2.0
 2.0
 2.0
 2.0

This seems to be due to a type instability in ForwardDiff.GradientConfig, which that package mostly relies on the compiler to elide, but which does not get elided when used through DifferentiationInterface.jl. In my opinion that flaw runs pretty deep in ForwardDiff.jl, as it plays fast and loose with types that can only be inferred at runtime, but there is a patch here that I believe would fix this specific issue.
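
In the meantime, one mitigation worth trying (my suggestion, independent of that patch) is to pin the chunk size in the backend object so it never has to be inferred at runtime:

using DifferentiationInterface, StaticArrays
using ForwardDiff: ForwardDiff

f1(x) = sum(x .* x)
x = @SVector ones(4)

# AutoForwardDiff accepts a static chunk size; whether this eliminates all of
# the allocations here is untested on my end.
backend = AutoForwardDiff(; chunksize=4)
gradient(f1, backend, x)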

Improper Allocation in Jacobian

◖◗ @btime jacobian($f2, $(AutoForwardDiff()), $x)
  282.829 ns (6 allocations: 704 bytes)
4×4 SMatrix{4, 4, Float64, 16} with indices SOneTo(4)×SOneTo(4):
 2.0  0.0  0.0  0.0
 0.0  2.0  0.0  0.0
 0.0  0.0  2.0  0.0
 0.0  0.0  0.0  2.0

I think this is the same issue as the above.

FiniteDiff

I'm less familiar with the internals of this package, but it claims to be non-allocating and compatible with StaticArrays.

Gradient uses setindex!

◖◗ @btime gradient($f1, $(AutoFiniteDiff()), $x);
ERROR: setindex!(::SVector{4, Float64}, value, ::Int) is not defined.
 Hint: Use `MArray` or `SizedArray` to create a mutable static array
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] setindex!(a::SVector{4, Float64}, value::Float64, i::Int64)
    @ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/indexing.jl:3
  [3] macro expansion
    @ ~/.julia/packages/StaticArrays/MSJcA/src/broadcast.jl:159 [inlined]
  [4] _broadcast!
    @ ~/.julia/packages/StaticArrays/MSJcA/src/broadcast.jl:143 [inlined]
  [5] _copyto!
    @ ~/.julia/packages/StaticArrays/MSJcA/src/broadcast.jl:70 [inlined]
  [6] copyto!
    @ ~/.julia/packages/StaticArrays/MSJcA/src/broadcast.jl:63 [inlined]
  [7] materialize!
    @ ./broadcast.jl:914 [inlined]
  [8] materialize!
    @ ./broadcast.jl:911 [inlined]
  [9] finite_difference_gradient!(df::SVector{…}, f::typeof(f1), x::SVector{…}, cache::FiniteDiff.GradientCache{…}; relstep::Float64, absstep::Float64, dir::Bool)
    @ FiniteDiff ~/.julia/packages/FiniteDiff/wm6iC/src/gradients.jl:241
 [10] finite_difference_gradient(f::Function, x::SVector{…}, cache::FiniteDiff.GradientCache{…}; relstep::Float64, absstep::Float64, dir::Bool)
    @ FiniteDiff ~/.julia/packages/FiniteDiff/wm6iC/src/gradients.jl:222
 [11] finite_difference_gradient(f::Function, x::SVector{…}, cache::FiniteDiff.GradientCache{…})
    @ FiniteDiff ~/.julia/packages/FiniteDiff/wm6iC/src/gradients.jl:209
 [12] gradient(f::Function, ::AutoFiniteDiff{…}, x::SVector{…}, extras::DifferentiationInterfaceFiniteDiffExt.FiniteDiffGradientExtras{…})
    @ DifferentiationInterfaceFiniteDiffExt ~/.julia/packages/DifferentiationInterface/cuZBh/ext/DifferentiationInterfaceFiniteDiffExt/onearg.jl:95
 [13] gradient(f::typeof(f1), backend::AutoFiniteDiff{Val{:forward}, Val{:forward}, Val{:hcentral}}, x::SVector{4, Float64})
    @ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/cuZBh/src/fallbacks/no_extras.jl:9
 [14] var"##core#600"(f1#597::typeof(f1), 598::AutoFiniteDiff{Val{…}, Val{…}, Val{…}}, x#599::SVector{4, Float64})
    @ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:561
 [15] var"##sample#601"(::Tuple{typeof(f1), AutoFiniteDiff{…}, SVector{…}}, __params::BenchmarkTools.Parameters)
    @ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:570
 [16] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; maxevals::Int64, kwargs::@Kwargs{})
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:187
 [17] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:182
 [18] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [19] invokelatest
    @ ./essentials.jl:889 [inlined]
 [20] #lineartrial#46
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:51 [inlined]
 [21] lineartrial
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:50 [inlined]
 [22] tune!(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, verbose::Bool, pad::String, kwargs::@Kwargs{})
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:300
 [23] tune!
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:289 [inlined]
 [24] tune!(b::BenchmarkTools.Benchmark)
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:289
 [25] top-level scope
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:666
Some type information was truncated. Use `show(err)` to see complete types.

This one might actually be a problem with DifferentiationInterface.jl itself, because there are surely methods somewhere in FiniteDiff that don't rely on mutation.
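
A quick way to check is to call FiniteDiff's native entry point directly (as it turns out further down in this thread, that errors too, which points at FiniteDiff rather than DI):

using FiniteDiff: FiniteDiff
using StaticArrays

f1(x) = sum(x .* x)
x = @SVector ones(4)

# Fails with the same setindex! complaint, so the mutation happens inside FiniteDiff.
FiniteDiff.finite_difference_gradient(f1, x)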

Inappropriate Allocations in Jacobian

◖◗ @btime jacobian($f2, $(AutoFiniteDiff()), $x)
  345.040 ns (44 allocations: 1.84 KiB)
4×4 SMatrix{4, 4, Float64, 16} with indices SOneTo(4)×SOneTo(4):
 2.0  0.0  0.0  0.0
 0.0  2.0  0.0  0.0
 0.0  0.0  2.0  0.0
 0.0  0.0  0.0  2.0

I'm less than completely confident this is indeed a bug, but it likely is, as I don't really see why this computation would have to allocate.
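
A useful diagnostic (a sketch, not run here) is to benchmark FiniteDiff's native Jacobian on the same input, which isolates whether the allocations come from DI's wrapper or from FiniteDiff itself:

using BenchmarkTools, StaticArrays
using FiniteDiff: FiniteDiff

f2(x) = x .* x
x = @SVector ones(4)

# If this also allocates, the problem lies in FiniteDiff's internals rather
# than in DifferentiationInterface.jl.
@btime FiniteDiff.finite_difference_jacobian($f2, $x)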

TODO: Others?

There are definitely lots of similar issues with other backends, but I haven't documented them yet. However, many of those other back-ends give fewer guarantees about performance with StaticArrays, so there are likely only 4 or 5 backends (including those listed above) where performance complaints are valid.

gdalle (Member) commented Aug 17, 2024

For those cases in which you see improper allocations, can you benchmark again using DI's preparation functionality, as outlined in our docs? That's how you can get fully non-allocating behavior, e.g. for ForwardDiff.
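
For concreteness, preparation with the DI version in this thread looks like the following (extras are created once and passed as the last argument; the argument order has changed in later releases):

using DifferentiationInterface, StaticArrays
using ForwardDiff: ForwardDiff

f1(x) = sum(x .* x)
x = @SVector ones(4)
backend = AutoForwardDiff()

# Prepare once (possibly allocating), then reuse the extras across many calls.
extras = prepare_gradient(f1, backend, x)
gradient(f1, backend, x, extras)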

ExpandingMan (Author) commented Aug 17, 2024

As far as I can tell, for the most part what prepare_ does is redundant with whatever would normally happen when using those packages directly. In a case where you have ordinary arrays, it may prepare a buffer to avoid allocations during differentiation itself, but this should not be the case when both the objects in the Extras structs and all involved arrays are SArrays. It's not clear why users should have to allocate an MArray as a manual step when in most cases (e.g. ForwardDiff) this is not necessary when using the backend directly. In other words, based on my current understanding of both this package and the back-ends, the StaticArrays allocations should be considered a bug whether or not prepare_ is used (at least for most backends).

(Note: re-reading this, I realize I'm conflating prepare_ with the mutating versions. However, I believe the same logic holds: they still should not allocate when everything involved is static.)

To further elucidate this point, note

◖◗ @btime ForwardDiff.gradient($f1, $x)
  10.972 ns (0 allocations: 0 bytes)
4-element SVector{4, Float64} with indices SOneTo(4):
 2.0
 2.0
 2.0
 2.0

It seems reasonable to me to expect this to behave the same way as gradient(f1, AutoForwardDiff(), x), regardless of whether this generalizes to all cases.

@gdalle gdalle added the backend Related to one or more autodiff backends label Aug 17, 2024
gdalle (Member) commented Aug 17, 2024

That's where you're at least partly wrong. My priority in designing this package was to obtain optimal performance after preparation. So by default, DI.gradient (when called without extras) first prepares the operator (creating an extras object) and then calls the gradient with said preparation. I can also make DI.gradient call ForwardDiff.gradient directly while skipping preparation, which would probably give you the behavior you want

gdalle (Member) commented Aug 17, 2024

It just wasn't a priority because I thought most performance-focused users would leverage preparation

ExpandingMan (Author) commented

It's looking like #414 entirely solves the ForwardDiff cases.

gdalle (Member) commented Oct 11, 2024

Hi @ExpandingMan. I have recently given a lot of thought to StaticArrays and I think we are now able to close this issue.

Enzyme

This is a whole can of worms and evolving fast, so we should probably open specific issues when the need arises. See also #558

ForwardDiff

The finishing touch is #571, which should make pushforward allocation-free (please double-check).
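
One way to double-check (a sketch; the exact seed type for pushforward has varied across DI versions, so adapt the tangent argument to your release):

using BenchmarkTools, StaticArrays
using DifferentiationInterface
using ForwardDiff: ForwardDiff

f1(x) = sum(x .* x)
x = @SVector ones(4)
dx = @SVector ones(4)

# In recent DI releases the seed is a tuple of tangents; expect 0 allocations
# if the fix landed as intended.
@btime pushforward($f1, $(AutoForwardDiff()), $x, $((dx,)))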

FiniteDiff

I actually don't think StaticArrays are supported by their native gradient function either. In any case, the behavior of both packages seems coherent:

using ADTypes
import DifferentiationInterface as DI
using FiniteDiff: FiniteDiff
using StaticArrays

x = SVector(1.0, 2.0)

DI.gradient(sum, AutoFiniteDiff(), x)  # errors
FiniteDiff.finite_difference_gradient(sum, x) # errors

DI.jacobian(identity, AutoFiniteDiff(), x) # works
FiniteDiff.finite_difference_jacobian(identity, x) # works

DI.hessian(sum, AutoFiniteDiff(), x) # works
FiniteDiff.finite_difference_hessian(sum, x) # works

Testing

Thanks to #571, DifferentiationInterfaceTest works better with StaticArrays. You can even benchmark code in a few lines to check where allocations occur:

using DifferentiationInterface, DifferentiationInterfaceTest
using ForwardDiff: ForwardDiff
using StaticArrays

backends = [AutoForwardDiff()]
scenarios = static_scenarios()  # modify this with your own scenarios
data = benchmark_differentiation(backends, scenarios; logging=true)  # returns a DataFrame

ExpandingMan (Author) commented

Thanks. Indeed, I've also been losing track: recently some of the hacks I'd been using to fix things have broken, but I still see some allocations I haven't tracked down yet. It'll probably take a couple of weeks for everything to sort itself out and for me to again understand everything that's going on. I will open specific issues if and as I find them, and if I find issues in other repos relevant to this I will ping you.
