wip: overhaul EscapeAnalysis.jl #56849

Open
wants to merge 2 commits into
base: master

@aviatesk aviatesk commented Dec 17, 2024

This PR aims to implement the new design for EscapeAnalysis.jl as proposed in https://hackmd.io/XKTmg0R0Tt2giLW56mWZ3A. While the actual implementation deviates slightly from the design document, the high-level goals and concepts remain the same.

The current EscapeAnalysis.jl is based on the old escape analysis design of Java's Graal compiler[1] and suffers from the following issues:

  • Flow-insensitive analysis: To simplify the implementation, EscapeAnalysis.jl currently maintains and updates a single escape state for the entire method being analyzed, preventing flow-sensitive analysis.
  • Backward analysis: To improve the efficiency of escape information propagation, the analysis is performed backward, but this makes it hard to handle alias information (i.e., information about object fields), which inherently propagates forward.
  • Sloppy convergence check: The current algorithm iterates over the IR until all states converge. This leads to unnecessary iterations even for simple cases.
  • No support for inter-procedural alias analysis: Alias analysis is currently limited to intra-procedural contexts, significantly reducing its precision when encountering non-inlined calls.

The new EscapeAnalysis.jl takes inspiration from the Graal team's (relatively) recent paper[2] on partial escape analysis and adopts the following design:

  • Flow-sensitive analysis: The new analysis, similar to our type inference, uses flow-sensitive abstract interpretation, allowing the propagation of escape/alias information based on control flow.
  • Forward analysis: The new analysis focuses on propagating information along the control flow and is implemented as a forward analysis.
  • Convergence check with a basic-block-based working set: Like our type inference, the new algorithm manages a working set of basic blocks during analysis iterations, enabling more efficient convergence checks.
  • Inter-procedural alias analysis: By appropriately caching and propagating alias information inter-procedurally, the new design aims to preserve alias analysis precision even for IR containing non-inlined calls.
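To make the flow-sensitive, forward, working-set-based scheme above more concrete, here is a minimal sketch on a toy control-flow graph. It is not the PR's implementation: `ToyBlock`, `analyze`, and the "set of escaped variables" state are hypothetical names and simplifications chosen only for illustration.

```julia
# Toy forward dataflow analysis driven by a working set of basic blocks.
# All names here are hypothetical and are not taken from EscapeAnalysis.jl.

# A block lists the variables it causes to escape and its successor blocks.
struct ToyBlock
    escapes::Vector{Symbol}
    succs::Vector{Int}
end

function analyze(blocks::Vector{ToyBlock})
    n = length(blocks)
    # One state per block: the variables that may have escaped on some path
    # reaching the block entry (instates) or the block exit (outstates).
    instates = [Set{Symbol}() for _ in 1:n]
    outstates = [Set{Symbol}() for _ in 1:n]
    visited = falses(n)
    workset = BitSet([1])          # start from the entry block
    visited[1] = true
    while !isempty(workset)
        bb = first(workset)        # smallest pending block index
        delete!(workset, bb)
        out = union(instates[bb], blocks[bb].escapes)
        outstates[bb] = out
        for succ in blocks[bb].succs
            # Join along the edge; re-queue the successor only if it has not
            # been visited yet or its entry state actually grows.
            if !visited[succ] || !issubset(out, instates[succ])
                visited[succ] = true
                union!(instates[succ], out)
                push!(workset, succ)
            end
        end
    end
    return instates, outstates
end

# CFG shaped like a branch: block 1 splits into 2 and 3, which merge in 4.
# Only block 3 makes `:key` escape.
blocks = [
    ToyBlock(Symbol[], [2, 3]),
    ToyBlock(Symbol[], [4]),
    ToyBlock([:key], [4]),
    ToyBlock(Symbol[], Int[]),
]
instates, outstates = analyze(blocks)
```

Because each block keeps its own state, `:key` is recorded as escaped at the exit of block 3 and at the entry of the merge block 4, but not in block 2. A single method-wide state could not express that difference.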

With this new analysis, a more powerful SROA (or load-forwarding) and better type inference for capturing closures become possible. It may also enable optimizations like allocation sinking and stack allocation in the future.

As a showcase of the capabilities achieved by the new EscapeAnalysis.jl, the analysis results for the examples presented in their paper[2] are as follows:

mutable struct Key
    idx::Int
    ref
    Key(idx::Int, @nospecialize(ref)) = new(idx, ref)
end
import Base: ==
key1::Key == key2::Key =
    key1.idx == key2.idx && key1.ref === key2.ref

global cache_key::Key
global cache_value

function get_value(idx::Int, ref)
    global cache_key, cache_value
    key = Key(idx, ref)
    if key == cache_key
        return cache_value
    else
        cache_key = key
        cache_value = create_value(key)
        return cache_value
    end
end

julia> result = code_escapes(get_value, (Int,Any))
get_value(✓ idx::Int64, X ref::Any) in Main at REPL[11]:1
3 1 ── X  %1  = %new(Main.Key, _2, _3)::Key                            │╻  Key
4 │    X  %2  = Main.cache_key::Key                                    │  
  │    ✓  %3  =   builtin Base.getfield(%1, :idx)::Int64 (↦ _2)        │╻  ==
  │    X  %4  =   builtin Base.getfield(%2, :idx)::Int64 (↦ X)         ││┃  getproperty
  │    ◌  %5  =   builtin (%3 === %4)::Bool                            ││╻  ==
  └─── ◌        goto #3 if not %5                                      ││ 
  2 ── X  %7  =   builtin Base.getfield(%1, :ref)::Any (↦ _3)          ││╻  getproperty
  │    X  %8  =   builtin Base.getfield(%2, :ref)::Any (↦ X)           │││
  │    ✓  %9  =   builtin (%7 === %8)::Bool                            ││ 
  └─── ◌        goto #4                                                ││ 
  3 ── ◌        goto #4                                                ││ 
  4 ┄─ ✓  %12 = φ (#2 => %9, #3 => false)::Bool                        │  
  └─── ◌        goto #6 if not %12                                     │  
5 5 ── X  %14 = Main.cache_value::Any                                  │  
  └─── ◌        return %14
  6 ── ◌        nothing::Nothing
  7 ── ◌        nothing::Nothing
7 8 ── X          builtin Base.setglobal!(Main, :cache_key, %1)::Key
8 │    X  %19 = Main.create_value::Any                                 │  
  │    X  %20 =   dynamic (%19)(%1)::Any                               │  
  │    X  %21 =   builtin Core.get_binding_type(Main, :cache_value)::Type 
  │    ◌  %22 =   builtin (%20 isa %21)::Bool                          │  
  └─── ◌        goto #10 if not %22                                    │  
  9 ── ◌        goto #11                                               │  
  10 ─ X  %25 =   dynamic Base.convert(%21, %20)::Any
  11 ┄ X  %26 = φ (#9 => %20, #10 => %25)::Any                         │  
  │    X          builtin Base.setglobal!(Main, :cache_value, %26)::Any
9 │    X  %28 = Main.cache_value::Any                                  │  
  └─── ◌        return %28                                             │

julia> EscapeAnalysis.has_no_escape(result.eresult.bbescapes[5][SSAValue(1)])
true

Here, builtin Base.getfield(%1, :idx)::Int64 (↦ _2) and builtin Base.getfield(%1, :ref)::Any (↦ _3) indicate that load-forwarding is possible for these getfield operations. Additionally, EscapeAnalysis.has_no_escape(result.eresult.bbescapes[5][SSAValue(1)]) shows that %1 = %new(Main.Key, _2, _3)::Key does not (yet) escape in basic block 5, which demonstrates that allocation sinking optimization is possible for this IR.
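As an illustration of what these two findings would allow, the following is a hand-written variant of `get_value` with the loads forwarded and the allocation sunk into the escaping branch. This is only a sketch of the intended end result, not output of the PR: `get_value_sunk`, the stand-in `create_value`, and the initial global values are assumptions made so the example runs on its own.

```julia
# Hand-transformed variant of `get_value`, for illustration only.
mutable struct Key
    idx::Int
    ref
    Key(idx::Int, @nospecialize(ref)) = new(idx, ref)
end

# Hypothetical stand-in; the PR description does not define `create_value`.
create_value(key::Key) = (key.idx, key.ref)

# Initial values are assumptions so that the example is self-contained.
global cache_key::Key = Key(0, nothing)
global cache_value = nothing

function get_value_sunk(idx::Int, ref)
    global cache_key, cache_value
    # Load forwarding: `key.idx` and `key.ref` are replaced by `idx` and `ref`,
    # so the comparison no longer needs the `Key` object.
    if idx == cache_key.idx && ref === cache_key.ref
        return cache_value           # no `Key` is allocated on this path
    else
        key = Key(idx, ref)          # allocation sunk into the escaping branch
        cache_key = key
        cache_value = create_value(key)
        return cache_value
    end
end
```

On a cache hit this version never allocates a `Key`, which is the effect partial escape analysis is meant to justify. On a miss it behaves like the original and stores the freshly allocated key in the global.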

While the new analysis captures significantly more information, its cost stays moderate: on this example the median time rises from 15.792 μs to 24.416 μs and the memory estimate from 14.91 KiB to 38.84 KiB:

julia> @benchmark code_escapes(ir, 3) setup=(
           ir = first(only(Base.code_ircode(get_value, (Int,Any)))))
# master
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  14.750 μs …  8.301 ms  ┊ GC (min … max): 0.00% … 99.46%
 Time  (median):     15.792 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.176 μs ± 82.903 μs  ┊ GC (mean ± σ):  4.81% ±  0.99%

    ▄▇███▇▆▅▃▃▃▂▂▁▁▁ ▁   ▁     ▁                              ▂
  ▅███████████████████████████████▇██▇▆▆▆▄▆▅▆▅▅▅▄▅▄▅▅▄▅▅▅▃▅▅▂ █
  14.8 μs      Histogram: log(frequency) by time      24.8 μs <

 Memory estimate: 14.91 KiB, allocs estimate: 302.

# this PR
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  22.333 μs …  7.978 ms  ┊ GC (min … max): 0.00% … 99.05%
 Time  (median):     24.416 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   26.968 μs ± 95.222 μs  ┊ GC (mean ± σ):  4.85% ±  1.40%

     ▃█▄▂                                                      
  ▂▃▇████▇▆▄▄▃▄▃▄▃▃▃▃▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  22.3 μs         Histogram: frequency by time        40.7 μs <

 Memory estimate: 38.84 KiB, allocs estimate: 719.

In particular, performance has improved for simpler cases:

julia> @benchmark code_escapes(ir, 1) setup=(
           ir = Base.code_ircode() do
               x = Ref{String}()
               x[] = "foo"
               out1 = x[]
               x[] = "bar"
               out2 = x[]
               return x, out1, out2
           end |> only |> first)
# master
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   9.791 μs … 55.917 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.708 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.109 μs ±  1.635 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃▅▆███▇▆▆▅▄▄▃▂▂▂▂▂▁▁▁▁▁ ▁ ▁   ▁                            ▂
  ▇████████████████████████████████▇▇▇█▇▇▇▇▆▆▆▆▆▅▅▅▄▄▄▄▃▄▅▄▅▄ █
  9.79 μs      Histogram: log(frequency) by time      17.7 μs <

 Memory estimate: 13.66 KiB, allocs estimate: 330.

# this PR
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.065 μs …  1.369 ms  ┊ GC (min … max): 0.00% … 99.19%
 Time  (median):     4.399 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.675 μs ± 13.651 μs  ┊ GC (mean ± σ):  2.90% ±  0.99%

       ▁▂█▇▇▅▃                                                
  ▁▁▁▄▆███████▆▅▅▅▄▄▄▄▄▄▄▄▃▃▃▃▂▂▂▂▂▂▂▁▁▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  4.07 μs        Histogram: frequency by time        5.88 μs <

 Memory estimate: 4.95 KiB, allocs estimate: 103.

Furthermore, while the partial escape analysis paper does not discuss inter-procedural alias analysis in much detail, this PR also aims to implement it. Ultimately, the goal is to successfully analyze targets such as the following:

function issue56561_2(a)
    x = sin(a)
    x = identity(x)
    return (a -> a + x)(a)
end

This PR is still a work in progress, and the following TODO list outlines the remaining tasks:

  1. Switch to the new flow-sensitive and working-set-based algorithm
  2. Implement the new alias analysis design (locally)
  3. Reimplement inter-procedural escape information propagation
  4. Implement the inter-procedural alias analysis design
  5. Optimize performance
  6. Enhance sroa_mutables! using EscapeAnalysis
    1. Make MemoryInfo CFG-aware
  7. Enable optimizations for capturing closures in combination with "perform inference using optimizer-derived type information" (#56687)

(the actual application to optimizations (6–7) may be addressed in a separate PR)

Footnotes

  1. Thomas Kotzmann and Hanspeter Mössenböck. 2005. Escape analysis in the context of dynamic compilation and deoptimization. In Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments (VEE '05). Association for Computing Machinery, New York, NY, USA, 111–120. https://doi.org/10.1145/1064979.1064996

  2. Lukas Stadler, Thomas Würthinger, and Hanspeter Mössenböck. 2018. Partial Escape Analysis and Scalar Replacement for Java. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '14). Association for Computing Machinery, New York, NY, USA, 165–174. https://doi.org/10.1145/2544137.2544157

@aviatesk aviatesk changed the title from "wip: EA overhaul" to "wip: overhaul EscapeAnalysis.jl" Dec 17, 2024
Base automatically changed from avi/EA-cleanup to master December 17, 2024 14:16
@aviatesk aviatesk force-pushed the avi/EA-overhaul branch 2 times, most recently from 0e1fc63 to 66eb5d2 Compare December 17, 2024 14:40
@oscardssmith
Member

Can you move the EscapeState -> EscapeResult rename to a separate commit? I feel like it would make review easier.

@aviatesk
Member Author

The EscapeResult data structure is closely tied to this overhaul, so it’s difficult to separate them.

@oscardssmith oscardssmith added the performance, compiler:optimizer, needs nanosoldier run, and needs pkgeval labels Dec 17, 2024
@StefanKarpinski
Member

Rename in a first mechanical commit and then do the rest of the change?

@aviatesk aviatesk force-pushed the avi/EA-overhaul branch 5 times, most recently from 5664ae5 to 1b3d957 Compare December 19, 2024 06:28
@aviatesk
Member Author

Of course it is possible to perform the rename, but the name EscapeResult in the previous code did not hold much meaning. This is because the previous analysis managed and updated a single state, which could then be returned directly as the final result. In that sense, EscapeState == EscapeResult.
In the analysis being implemented now, multiple BlockEscapeState structures manage the analysis state for each basic block. These are distinct from the final analysis result, EscapeResult, which represents the escape information for the entire method. So the rename was necessary and kind of tied to this refactoring.

That said, if it makes the review process easier, it might be worth trying.

@aviatesk
Member Author

@nanosoldier runbenchmarks("inference", vs=":master")

@aviatesk aviatesk force-pushed the avi/EA-overhaul branch 3 times, most recently from c0a6fa9 to f918549 Compare December 19, 2024 18:08
@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk aviatesk force-pushed the avi/EA-overhaul branch 6 times, most recently from df1b173 to b3bc64f Compare December 25, 2024 19:45
@aviatesk aviatesk force-pushed the avi/EA-overhaul branch 2 times, most recently from 6dacb43 to 9554f8c Compare December 26, 2024 09:19

- switched to the working set & flow-sensitive algorithm
- implemented the new alias analysis design
- added more tests
- recovered inter-procedural propagation of escape information
- made EA able to handle `new_nodes`
- add explicit `nstmts::Int` field to `BlockEscapeState`
- detect top escape with `TopLiveness`
- define `AnalyzableIRElement` type alias
- manage `aliasset` globally instead of on a per-block basis:
  While `aliasset` is necessary for propagating escape information, the
  convergence of the analysis is determined by the convergence of escape
  information, so the convergence of `aliasset` is not strictly required.
- rename `escape_xxx` to `analyze_xxx`
- propagate current state to handler state correctly
  Even if there are no changes made on current statement.
- `analyze_invoke`: propagate the return value escape information
- use generator for aliasset instead of collecting into array
x::Liveness == y::Liveness = begin
    @nospecialize
    if x === ⊥ₗ
        return y === ⊥ₗ

Suggested change
- return y === ⊥ₗ
+ return (y === ⊥ₗ) || (y isa PCLiveness && length(y.pcs) == 0)

If I got the lattice right, these are also lattice-equal (although I guess that's not strictly required)
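The point about lattice equality can be sketched on a toy lattice. The types below (`ToyLiveness`, `ToyBottom`, `ToyPCLiveness`) and the functions `leq` and `lattice_equal` are hypothetical stand-ins, and the sketch assumes, as the comment does, that an empty set of program counters carries the same information as the bottom element.

```julia
# Toy liveness lattice; names are hypothetical, not the PR's definitions.
abstract type ToyLiveness end
struct ToyBottom <: ToyLiveness end        # stand-in for ⊥ₗ
struct ToyPCLiveness <: ToyLiveness
    pcs::BitSet                            # program counters where the value is live
end

# Partial order: bottom is below everything, PC sets are ordered by inclusion,
# and a PC set is below bottom only if it is empty.
leq(x::ToyBottom, y::ToyLiveness) = true
leq(x::ToyPCLiveness, y::ToyBottom) = isempty(x.pcs)
leq(x::ToyPCLiveness, y::ToyPCLiveness) = issubset(x.pcs, y.pcs)

# Lattice equality derived from the order rather than from object identity.
lattice_equal(x::ToyLiveness, y::ToyLiveness) = leq(x, y) && leq(y, x)

empty_pcs = ToyPCLiveness(BitSet())
live_at_3 = ToyPCLiveness(BitSet([3]))
same_as_bottom = lattice_equal(ToyBottom(), empty_pcs)
differs_from_bottom = lattice_equal(ToyBottom(), live_at_3)
```

Under that assumption `same_as_bottom` is true and `differs_from_bottom` is false, which is the case the suggested change adds to `==`.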

"""
- x::EscapeInfo ⊔ y::EscapeInfo = begin
+ x::EscapeInfo ⊔ₑꜝ y::EscapeInfo = begin

It took me a while to realize that ⊔ₑꜝ is an in-place union. The ꜝ is hard enough to read that I think we might need a clearer call-out for the mutation.

Maybe joinₑ!(...)?
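A small sketch of what a named in-place join could look like, following Julia's convention that mutating functions end in `!`. `ToyEscapeInfo`, its two fields, and the `Bool` return value signalling a change are assumptions for illustration; only the name `joinₑ!` comes from the suggestion above.

```julia
# Toy escape state; the fields are hypothetical simplifications.
mutable struct ToyEscapeInfo
    returned::Bool
    thrown::Bool
end

# Non-mutating join: builds and returns a fresh value.
joinₑ(x::ToyEscapeInfo, y::ToyEscapeInfo) =
    ToyEscapeInfo(x.returned | y.returned, x.thrown | y.thrown)

# Mutating join: merges `y` into `x` in place and reports whether `x` changed,
# which a working-set algorithm could use as its convergence signal.
function joinₑ!(x::ToyEscapeInfo, y::ToyEscapeInfo)
    changed = (!x.returned && y.returned) || (!x.thrown && y.thrown)
    x.returned |= y.returned
    x.thrown |= y.thrown
    return changed
end

state = ToyEscapeInfo(false, false)
first_join_changed = joinₑ!(state, ToyEscapeInfo(true, false))
second_join_changed = joinₑ!(state, ToyEscapeInfo(true, false))
```

With a spelled-out name the mutation is visible at every call site, and the second call returning `false` shows how the same function can double as a cheap "did anything change" check.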

        return new(aliases)
    end
end
struct UnknownMemoryInfo <: MemoryInfo end # not part of the `⊑ₘ` lattice, just a marker for `ssamemoryinfo`

Why not include this in the lattice as usual? Seems like an easy inclusion

@nospecialize
if x isa MustAliasMemoryInfo
    if y isa MustAliasMemoryInfo
        return x.alias === y.alias

How does this handle must-alias transitivity?

I'm thinking of must-alias chains e.g. x must-alias y and y must-alias z - Do we choose a 'representative' object for the set of must-aliases?

Thinking also of cases like a may-alias {x,y} and then we prove x must-alias y, which implies a must-alias y
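One common way to get a representative per must-alias set is a union-find (disjoint-set) structure. The sketch below shows that technique on integer value ids; it is not what the PR does (the excerpt above compares `x.alias === y.alias` directly), and `AliasClasses`, `find_rep!`, `must_alias!`, `is_must_alias`, and `collapse` are hypothetical names.

```julia
# Union-find over value ids 1..n; parent[i] == i marks a class representative.
struct AliasClasses
    parent::Vector{Int}
end
AliasClasses(n::Int) = AliasClasses(collect(1:n))

# Find the representative of x, shortening the path along the way.
function find_rep!(c::AliasClasses, x::Int)
    while c.parent[x] != x
        c.parent[x] = c.parent[c.parent[x]]   # path halving
        x = c.parent[x]
    end
    return x
end

# Record "x must-alias y"; the smaller id becomes the representative.
function must_alias!(c::AliasClasses, x::Int, y::Int)
    rx, ry = find_rep!(c, x), find_rep!(c, y)
    rx == ry || (c.parent[max(rx, ry)] = min(rx, ry))
    return min(rx, ry)
end

is_must_alias(c::AliasClasses, x::Int, y::Int) = find_rep!(c, x) == find_rep!(c, y)

# Map a may-alias set to its distinct representatives; a single remaining
# representative means the may-alias has become a must-alias.
collapse(c::AliasClasses, mayset) = unique(find_rep!(c, v) for v in mayset)

c = AliasClasses(4)
must_alias!(c, 1, 2)            # x must-alias y
must_alias!(c, 2, 3)            # y must-alias z
transitive = is_must_alias(c, 1, 3)
unrelated = is_must_alias(c, 1, 4)
collapsed = collapse(c, [2, 3])
```

Transitivity falls out of comparing representatives, and the second case in the comment corresponds to `collapse` returning a single element for a set whose members were later proven to must-alias each other.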

# By incorporating some form of CFG information into `MemoryInfo`, it becomes possible
# to enable load-forwarding even in cases where conflicts occur by inserting φ-nodes.

abstract type MemoryInfo end

Rename to AliasInfo? Gives me a more immediate sense of what it does, anyway

end

abstract type ObjectInfo end
struct HasUnanalyzedMemory <: ObjectInfo end

Suggested change
- struct HasUnanalyzedMemory <: ObjectInfo end
+ struct NoMemoryContents <: ObjectInfo end

Just a suggestion, but maybe closer to the intuition of this type as the oinfo lattice bottom

end
xfields, yfields = x.fields, y.fields
@goto compare_xfields_yfields

Maybe worth a separate function

@@ -225,59 +328,114 @@ end
The non-strict partial order over [`EscapeInfo`](@ref).
"""
x::EscapeInfo ⊑ₑ y::EscapeInfo = begin

It looks like this relation is no longer used - is that temporary?

Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) needs nanosoldier run This PR should have benchmarks run on it needs pkgeval Tests for all registered packages should be run with this change performance Must go faster
5 participants