wip: overhaul EscapeAnalysis.jl #56849

aviatesk · 2024-12-17T11:58:51Z

This PR aims to implement the new design for EscapeAnalysis.jl as proposed in https://hackmd.io/XKTmg0R0Tt2giLW56mWZ3A. While the actual implementation deviates slightly from the design document, the high-level goals and concepts remain the same.

The current EscapeAnalysis.jl is based on the old escape analysis design of Java's Graal compiler¹ and suffers from the following issues:

Flow-insensitive analysis: To simplify the implementation, EscapeAnalysis.jl currently maintains and updates a single escape state for the entire method being analyzed, preventing flow-sensitive analysis.
Backward analysis: To improve the efficiency of escape information propagation, the analysis is performed backward, but this makes it hard to handle alias information (i.e. information about object fields), which inherently propagates forward.
Sloppy convergence check: The current algorithm iterates over the IR until all states converge. This leads to unnecessary iterations even for simple cases.
No support for inter-procedural alias analysis: Alias analysis is currently limited to intra-procedural contexts, significantly reducing its precision when encountering non-inlined calls.

The new EscapeAnalysis.jl takes inspiration from the Graal team's (relatively) recent paper² on partial escape analysis and adopts the following design:

Flow-sensitive analysis: The new analysis, similar to our type inference, uses flow-sensitive abstract interpretation, allowing the propagation of escape/alias information based on control flow.
Forward analysis: The new analysis focuses on propagating information along the control flow and is implemented as a forward analysis.
Convergence check with a basic-block-based working set: Like our type inference, the new algorithm manages a working set of basic blocks during analysis iterations, enabling more efficient convergence checks.
Inter-procedural alias analysis: By appropriately caching and propagating alias information inter-procedurally, the new design aims to preserve alias analysis precision even for IR containing non-inlined calls.
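
To make the forward, working-set-based idea concrete, here is a minimal sketch on a toy CFG. It is not the code in this PR: `ToyBlock` and `analyze_toy` are hypothetical names, and the "escape state" is reduced to a set of escaped object ids per basic block, joined by set union.

```julia
# Toy forward, flow-sensitive escape analysis driven by a basic-block working set.
# `ToyBlock` and `analyze_toy` are hypothetical; they are not part of EscapeAnalysis.jl.
struct ToyBlock
    escapes::Vector{Int}  # ids of objects that this block causes to escape
    succs::Vector{Int}    # indices of successor blocks
end

function analyze_toy(blocks::Vector{ToyBlock})
    n = length(blocks)
    instates  = [Set{Int}() for _ in 1:n]  # escaped objects at block entry
    outstates = [Set{Int}() for _ in 1:n]  # escaped objects at block exit
    visited = falses(n)
    workset = [1]                          # start from the entry block
    while !isempty(workset)
        bb = popfirst!(workset)
        # transfer function: everything escaped on entry, plus this block's escapes
        out = union(instates[bb], blocks[bb].escapes)
        if visited[bb] && out == outstates[bb]
            continue                       # nothing changed: do not revisit successors
        end
        visited[bb] = true
        outstates[bb] = out
        for succ in blocks[bb].succs
            union!(instates[succ], out)    # join along the control-flow edge
            succ in workset || push!(workset, succ)
        end
    end
    return outstates
end

# Block 1 allocates object 1 and branches; block 2 returns a cached value;
# block 3 stores object 1 into a global.
blocks = [ToyBlock(Int[], [2, 3]), ToyBlock(Int[], Int[]), ToyBlock([1], Int[])]
out = analyze_toy(blocks)
1 in out[2]  # false: object 1 has not escaped on the cache-hit path
1 in out[3]  # true: object 1 escapes on the cache-miss path
```

Because each block keeps its own state, the same object can be "not escaped" in one block and "escaped" in another, which a single method-wide state cannot express. A block's successors are requeued only when that block's exit state changes.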

With this new analysis, a more powerful SROA (or load-forwarding) and better type inference for capturing closures become possible. It may also enable optimizations like allocation sinking and stack allocation in the future.

To showcase the capabilities of the new EscapeAnalysis.jl, here are the analysis results for the example presented in that paper²:

mutable struct Key
    idx::Int
    ref
    Key(idx::Int, @nospecialize(ref)) = new(idx, ref)
end
import Base: ==
key1::Key == key2::Key =
    key1.idx == key2.idx && key1.ref === key2.ref

global cache_key::Key
global cache_value

function get_value(idx::Int, ref)
    global cache_key, cache_value
    key = Key(idx, ref)
    if key == cache_key
        return cache_value
    else
        cache_key = key
        cache_value = create_value(key)
        return cache_value
    end
end

julia> result = code_escapes(get_value, (Int,Any))
get_value(✓ idx::Int64, X ref::Any) in Main at REPL[11]:1
3 1 ── X  %1  = %new(Main.Key, _2, _3)::Key                            │╻  Key
4 │    X  %2  = Main.cache_key::Key                                    │  
  │    ✓  %3  =   builtin Base.getfield(%1, :idx)::Int64 (↦ _2)        │╻  ==
  │    X  %4  =   builtin Base.getfield(%2, :idx)::Int64 (↦ X)         ││┃  getproperty
  │    ◌  %5  =   builtin (%3 === %4)::Bool                            ││╻  ==
  └─── ◌        goto #3 if not %5                                      ││ 
  2 ── X  %7  =   builtin Base.getfield(%1, :ref)::Any (↦ _3)          ││╻  getproperty
  │    X  %8  =   builtin Base.getfield(%2, :ref)::Any (↦ X)           │││
  │    ✓  %9  =   builtin (%7 === %8)::Bool                            ││ 
  └─── ◌        goto #4                                                ││ 
  3 ── ◌        goto #4                                                ││ 
  4 ┄─ ✓  %12 = φ (#2 => %9, #3 => false)::Bool                        │  
  └─── ◌        goto #6 if not %12                                     │  
5 5 ── X  %14 = Main.cache_value::Any                                  │  
  └─── ◌        return %14                                             │  
  6 ── ◌        nothing::Nothing                                       │  
  7 ── ◌        nothing::Nothing                                       │  
7 8 ── X          builtin Base.setglobal!(Main, :cache_key, %1)::Key   │  
8 │    X  %19 = Main.create_value::Any                                 │  
  │    X  %20 =   dynamic (%19)(%1)::Any                               │  
  │    X  %21 =   builtin Core.get_binding_type(Main, :cache_value)::Type 
  │    ◌  %22 =   builtin (%20 isa %21)::Bool                          │  
  └─── ◌        goto #10 if not %22                                    │  
  9 ── ◌        goto #11                                               │  
  10 ─ X  %25 =   dynamic Base.convert(%21, %20)::Any                  │  
  11 ┄ X  %26 = φ (#9 => %20, #10 => %25)::Any                         │  
  │    X          builtin Base.setglobal!(Main, :cache_value, %26)::Any│  
9 │    X  %28 = Main.cache_value::Any                                  │  
  └─── ◌        return %28                                             │

julia> EscapeAnalysis.has_no_escape(result.eresult.bbescapes[5][SSAValue(1)])
true

Here, builtin Base.getfield(%1, :idx)::Int64 (↦ _2) and builtin Base.getfield(%1, :ref)::Any (↦ _3) indicate that load-forwarding is possible for these getfield operations. Additionally, EscapeAnalysis.has_no_escape(result.eresult.bbescapes[5][SSAValue(1)]) shows that %1 = %new(Main.Key, _2, _3)::Key does not (yet) escape in basic block 5, which demonstrates that allocation sinking optimization is possible for this IR.
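
For illustration only, this is roughly what load-forwarding plus allocation sinking could produce for this function if written by hand. This PR does not perform this transformation. `KeySketch`, `create_value_sketch`, and the `sketch_*` globals are hypothetical stand-ins so that the snippet runs on its own.

```julia
# Hypothetical stand-ins for `Key`, `create_value`, and the cache globals above.
mutable struct KeySketch
    idx::Int
    ref
end
create_value_sketch(key::KeySketch) = (key.idx, key.ref)

sketch_key = KeySketch(0, nothing)
sketch_value = nothing

function get_value_sunk(idx::Int, ref)
    global sketch_key, sketch_value
    # Forwarded loads: `key.idx` and `key.ref` are replaced by `idx` and `ref`,
    # so the comparison no longer needs the freshly allocated key.
    if idx == sketch_key.idx && ref === sketch_key.ref
        return sketch_value            # cache hit: no key is allocated at all
    else
        key = KeySketch(idx, ref)      # allocation sunk into the path where it escapes
        sketch_key = key
        sketch_value = create_value_sketch(key)
        return sketch_value
    end
end
```

On the cache-hit path the key never escapes, so after sinking it is never allocated there. This corresponds to the "does not (yet) escape in basic block 5" result above.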

While the new analysis captures significantly more information, its performance has not degraded substantially (median 15.792 μs on master vs. 24.416 μs with this PR for the example above):

julia> @benchmark code_escapes(ir, 3) setup=(
           ir = first(only(Base.code_ircode(get_value, (Int,Any)))))

# master
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  14.750 μs …  8.301 ms  ┊ GC (min … max): 0.00% … 99.46%
 Time  (median):     15.792 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.176 μs ± 82.903 μs  ┊ GC (mean ± σ):  4.81% ±  0.99%

    ▄▇███▇▆▅▃▃▃▂▂▁▁▁ ▁   ▁     ▁                              ▂
  ▅███████████████████████████████▇██▇▆▆▆▄▆▅▆▅▅▅▄▅▄▅▅▄▅▅▅▃▅▅▂ █
  14.8 μs      Histogram: log(frequency) by time      24.8 μs <

 Memory estimate: 14.91 KiB, allocs estimate: 302.

# this PR
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  22.333 μs …  7.978 ms  ┊ GC (min … max): 0.00% … 99.05%
 Time  (median):     24.416 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   26.968 μs ± 95.222 μs  ┊ GC (mean ± σ):  4.85% ±  1.40%

     ▃█▄▂                                                      
  ▂▃▇████▇▆▄▄▃▄▃▄▃▃▃▃▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  22.3 μs         Histogram: frequency by time        40.7 μs <

 Memory estimate: 38.84 KiB, allocs estimate: 719.

For simpler cases, performance has actually improved (median 10.708 μs on master vs. 4.399 μs with this PR):

julia> @benchmark code_escapes(ir, 1) setup=(
           ir = Base.code_ircode() do
               x = Ref{String}()
               x[] = "foo"
               out1 = x[]
               x[] = "bar"
               out2 = x[]
               return x, out1, out2
           end |> only |> first)

# master
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   9.791 μs … 55.917 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.708 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.109 μs ±  1.635 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃▅▆███▇▆▆▅▄▄▃▂▂▂▂▂▁▁▁▁▁ ▁ ▁   ▁                            ▂
  ▇████████████████████████████████▇▇▇█▇▇▇▇▆▆▆▆▆▅▅▅▄▄▄▄▃▄▅▄▅▄ █
  9.79 μs      Histogram: log(frequency) by time      17.7 μs <

 Memory estimate: 13.66 KiB, allocs estimate: 330.

# this PR
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.065 μs …  1.369 ms  ┊ GC (min … max): 0.00% … 99.19%
 Time  (median):     4.399 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.675 μs ± 13.651 μs  ┊ GC (mean ± σ):  2.90% ±  0.99%

       ▁▂█▇▇▅▃                                                
  ▁▁▁▄▆███████▆▅▅▅▄▄▄▄▄▄▄▄▃▃▃▃▂▂▂▂▂▂▂▁▁▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  4.07 μs        Histogram: frequency by time        5.88 μs <

 Memory estimate: 4.95 KiB, allocs estimate: 103.

Furthermore, while their paper does not discuss inter-procedural alias analysis in much detail, this PR also aims to implement inter-procedural alias analysis. Ultimately, the goal is to successfully analyze targets such as the following:

function issue56561_2(a)
    x = sin(a)
    x = identity(x)
    return (a -> a + x)(a)
end
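
As background on why such a target is hard today, here is a small sketch. It assumes the lowering behavior of current Julia releases, and `boxed_capture` is a hypothetical name. A captured variable that is assigned more than once is stored in a `Core.Box`, so the closure only sees an untyped field.

```julia
# `boxed_capture` is a hypothetical helper mirroring the shape of `issue56561_2`.
function boxed_capture(a)
    x = sin(a)
    x = identity(x)      # second assignment to a variable that is later captured
    return b -> b + x    # the closure captures `x`
end

f = boxed_capture(1.0)
fieldtype(typeof(f), :x)   # Core.Box on current releases (assumption stated above)
f(2.0)                     # still computes 2.0 + sin(1.0), but through the box
```

Knowing that the box does not escape and which value it holds at the call site is the kind of alias information the inter-procedural design is meant to provide.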

This PR is still a work in progress, and the following TODO list outlines the remaining tasks:

1. Switch to the new flow-sensitive and working-set-based algorithm
2. Implement the new alias analysis design (locally)
3. Reimplement inter-procedural escape information propagation
4. Implement the inter-procedural alias analysis design
5. Optimize performance
6. Enhance sroa_mutables! using EscapeAnalysis
   1. Make MemoryInfo CFG-aware
7. Enable optimizations for capturing closures in combination with "perform inference using optimizer-derived type information" (#56687)

(the actual application to optimizations (6–7) may be addressed in a separate PR)

¹ Thomas Kotzmann and Hanspeter Mössenböck. 2005. Escape analysis in the context of dynamic compilation and deoptimization. In Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments (VEE '05). Association for Computing Machinery, New York, NY, USA, 111–120. https://doi.org/10.1145/1064979.1064996
² Lukas Stadler, Thomas Würthinger, and Hanspeter Mössenböck. 2014. Partial Escape Analysis and Scalar Replacement for Java. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '14). Association for Computing Machinery, New York, NY, USA, 165–174. https://doi.org/10.1145/2544137.2544157

oscardssmith · 2024-12-17T15:20:08Z

can you move the EscapeState->EscapeResult rename to a separate commit? I feel like it would make review easier.

aviatesk · 2024-12-17T15:53:32Z

The EscapeResult data structure is closely tied to this overhaul, so it’s difficult to separate them.

StefanKarpinski · 2024-12-17T16:38:44Z

Rename in a first mechanical commit and then do the rest of the change?

aviatesk · 2024-12-19T06:41:12Z

Of course it is possible to perform the rename, but the name EscapeResult in the previous code did not hold much meaning. This is because the previous analysis managed and updated a single state, which could then be returned directly as the final result. In that sense, EscapeState == EscapeResult.
In the analysis being implemented now, multiple BlockEscapeState structures manage the analysis state for each basic block. These are distinct from the final analysis result, EscapeResult, which represents the escape information for the entire method. So the rename was necessary and kind of tied to this refactoring.

That said, if it makes the review process easier, it might be worth trying.

aviatesk · 2024-12-19T06:42:02Z

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier · 2024-12-19T19:31:14Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

- switched to the working set & flow-sensitive algorithm
- implemented the new alias analysis design
- added more tests
- recovered inter-procedural propagation of escape information
- made EA able to handle `new_nodes`
- add explicit `nstmts::Int` field to `BlockEscapeState`
- detect top escape with `TopLiveness`
- define `AnalyzableIRElement` type alias
- manage `aliasset` globally instead of on a per-block basis:
  While `aliasset` is necessary for propagating escape information, the convergence of the analysis is determined by the convergence of escape information, so the convergence of `aliasset` is not strictly required.
- rename `escape_xxx` to `analyze_xxx`
- propagate current state to handler state correctly
  Even if there are no changes made on current statement.
- `analyze_invoke`: propagate the return value escape information
- use generator for aliasset instead of collecting into array

topolarity · 2025-01-10T16:32:17Z