Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(cruby): reuse XPathContext objects #3378

Merged
merged 1 commit into from
Dec 16, 2024

Conversation

flavorjones
Copy link
Member

What problem is this PR intended to solve?

Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize).

This PR makes the following changes:

  • the XPathContext object ...
    • can accept nil as the value arg to #register_ns and #register_variable to deregister the namespace or variable
    • tracks namespaces and variables registered through those two methods
    • has a new #reset method that deregisters all namespaces and variables registered
    • has a new #node= method to set the current node being searched
  • all the Searchable methods ...
    • use a thread-local XPathContext object that is only available in a block yielded by #get_xpath_context
    • and that context object is #reset when the block returns

There is an escape hatch that I will leave undocumented, which is to set the env var NOKOGIRI_DEOPTIMIZE_XPATH, out of an abundance of caution.

Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file:

$ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       large: normal     3.790k i/100ms
Calculating -------------------------------------
       large: normal     37.556k (± 1.7%) i/s   (26.63 μs/i) -    189.500k in   5.047390s
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       small: normal    11.726k i/100ms
Calculating -------------------------------------
       small: normal    113.719k (± 2.5%) i/s    (8.79 μs/i) -    574.574k in   5.055648s

$ ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    large: optimized     4.609k i/100ms
Calculating -------------------------------------
    large: optimized     48.107k (± 1.6%) i/s   (20.79 μs/i) -    244.277k in   5.079041s

Comparison:
    large: optimized:    48107.3 i/s
       large: normal:    37555.7 i/s - 1.28x  slower

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    small: optimized    32.014k i/100ms
Calculating -------------------------------------
    small: optimized    319.760k (± 0.6%) i/s    (3.13 μs/i) -      1.601M in   5.006140s

Comparison:
    small: optimized:   319759.6 i/s
       small: normal:   113719.0 i/s - 2.81x  slower

What alternatives were considered?

I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling xmlXPathRegisterAllFunctions to re-register them took us back to the original performance profile.

The win here is to avoid re-calling xmlXPathRegisterAllFunctions, and for that we track registered variables and namespaces, and de-register them after the query eval completes.

Have you included adequate test coverage?

Yes.

Does this change affect the behavior of either the C or the Java implementations?

CRuby only.

@flavorjones flavorjones added this to the v1.18.0 milestone Dec 14, 2024
@flavorjones
Copy link
Member Author

@tenderlove if you get some time to review this, I'd really appreciate it. You can just focus on the final commit (everything else is nonfunctional refactoring). ❤️

@flavorjones flavorjones force-pushed the flavorjones-reuse-xpath-context branch from f92c678 to 334ad08 Compare December 14, 2024 21:12
@flavorjones flavorjones mentioned this pull request Dec 14, 2024
@flavorjones flavorjones force-pushed the flavorjones-reuse-xpath-context branch 2 times, most recently from b760c62 to 1729fd5 Compare December 14, 2024 21:19
flavorjones added a commit that referenced this pull request Dec 14, 2024
**What problem is this PR intended to solve?**

Some cleanup and refactoring that was originally in #3378 but is
generally valuable even if that PR doesn't land, so I split it out.
@flavorjones flavorjones force-pushed the flavorjones-reuse-xpath-context branch from 1729fd5 to beecfa4 Compare December 14, 2024 22:19
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize).

This PR makes the following changes:

- the `XPathContext` object ...
  - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable
  - tracks namespaces and variables registered through those two methods
  - has a new `#reset` method that deregisters all namespaces and variables registered
  - has a new `#node=` method to set the current node being searched
- all the `Searchable` methods ...
  - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context`
  - and that context object is `#reset` when the block returns

There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution.

Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file:

```
$ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       large: normal     3.790k i/100ms
Calculating -------------------------------------
       large: normal     37.556k (± 1.7%) i/s   (26.63 μs/i) -    189.500k in   5.047390s
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       small: normal    11.726k i/100ms
Calculating -------------------------------------
       small: normal    113.719k (± 2.5%) i/s    (8.79 μs/i) -    574.574k in   5.055648s

$ ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    large: optimized     4.609k i/100ms
Calculating -------------------------------------
    large: optimized     48.107k (± 1.6%) i/s   (20.79 μs/i) -    244.277k in   5.079041s

Comparison:
    large: optimized:    48107.3 i/s
       large: normal:    37555.7 i/s - 1.28x  slower

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    small: optimized    32.014k i/100ms
Calculating -------------------------------------
    small: optimized    319.760k (± 0.6%) i/s    (3.13 μs/i) -      1.601M in   5.006140s

Comparison:
    small: optimized:   319759.6 i/s
       small: normal:   113719.0 i/s - 2.81x  slower
```

I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile.

The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
@flavorjones flavorjones force-pushed the flavorjones-reuse-xpath-context branch from beecfa4 to 969bea9 Compare December 14, 2024 22:21
@flavorjones flavorjones mentioned this pull request Dec 14, 2024
14 tasks
@flavorjones
Copy link
Member Author

Going to merge this so I can ship a 1.18 release candidate today. But if you have any feedback, the final release is still a couple of weeks away.

@flavorjones flavorjones merged commit cef057f into main Dec 16, 2024
145 checks passed
@flavorjones flavorjones deleted the flavorjones-reuse-xpath-context branch December 16, 2024 12:55
@nwellnhof
Copy link

Note that if you use a single XPath context and support custom XPath extension functions, a custom function could evaluate XPath expressions recursively which will lead to corruption of context variables. This is mostly due to some design mistakes in libxml2. So you should backup and restore the following members of xmlXPathContext before and after evaluating an expression:

  • doc
  • node
  • contextSize
  • proximityPosition

Other settings that could change like namespaces should be backed and restored up as well. Or you could simply disallow recursive evaluation in some way.

@flavorjones
Copy link
Member Author

@nwellnhof thank you for the advice! doc and node are being set explicitly every time, but I'll save/restore contextSize and proximityPosition before the final release.

@flavorjones
Copy link
Member Author

@nwellnhof Would it be OK to set contextSize and proximityPosition back to their defaults before every evaluation, rather than saving/restoring? Looks like xmlXPathNewContext sets them to -1.

flavorjones added a commit that referenced this pull request Dec 20, 2024
If we're re-using the xmlXPathContext object, there's a chance that
the context variables will be trashed by recursive custom functions.

In #3378 (comment),
Nick advised:

> Note that if you use a single XPath context and support custom XPath
> extension functions, a custom function could evaluate XPath
> expressions recursively which will lead to corruption of context
> variables. This is mostly due to some design mistakes in libxml2.

So let's set these context variables back to their default.
@flavorjones
Copy link
Member Author

See #3386

@nwellnhof
Copy link

doc and node are being set explicitly every time

That's not enough. Consider:

  • xmlXPathEval is called for the first time and context->node is set.
  • An extension function is called which evaluates another XPath expression.
  • xmlXPathEval is called for the second time and context->node is set to a new value.
  • xmlXPathEval returns but context->node still contains the new value.
  • The rest of the original evaluation continues with the wrong context node.

Looks like xmlXPathNewContext sets them to -1.

I'd suggest to set both values to 1 once after creating an XPath context, but it doesn't matter much outside of XSLT.

@flavorjones
Copy link
Member Author

@nwellnhof Oh! That makes sense, I misunderstood your original message. It hadn't even occurred to me that people would want to do that.

Alternatively, I could create a new xpath context object if there's a recursive call being made.

@nwellnhof
Copy link

Yes, it seems unlikely that Nokogiri users would want to do that. But in libxslt, this happens all the time.

I think the underlying issue is that context creation is too slow in libxml2 because the XPath functions are registered each time. If this was fixed, it shouldn't be too costly to recreate a context for each evaluation.

@flavorjones
Copy link
Member Author

@nwellnhof Yes, in my benchmark, xmlXPathRegisterAllFunctions dominated the initialization overhead.

@flavorjones
Copy link
Member Author

@nwellnhof If it's interesting, I implemented a branch in #3388 where I copy an initialized xmlXPathContext struct (i.e., I deep-copy funcHash instead of calling xmlXPathRegisterAllFunctions) and it's slightly faster but not an overwhelming win.

@nwellnhof
Copy link

I reworked XPath function lookup here: https://gitlab.gnome.org/GNOME/libxml2/-/commit/bf5fcf6e646bb51a0f6a3655a1d64bea97274867

xmlXPathNewContext should be a lot faster now.

@flavorjones
Copy link
Member Author

flavorjones commented Dec 21, 2024

@nwellnhof Thanks for that!

Here's how the above benchmark looks for small ~6kb files:

Comparison:
         small: edge:   306454.7 i/s
      small: patched:   225456.0 i/s - 1.36x  slower
         small: main:   124554.9 i/s - 2.46x  slower

or, dropping the "re-use the object" comparison:

Comparison:
            small: patched:   242901.7 i/s
               small: main:   127486.0 i/s - 1.91x  slower

where:

  • "edge" is Nokogiri v1.18.0.rc1 re-using the context object
  • "patched" is using libxml2 v2.13.5 with GNOME/libxml2@bf5fcf6e patched in
  • "main" is Nokogiri v1.17 creating a new XPathContext for each query

That's a really good win!

flavorjones added a commit that referenced this pull request Dec 21, 2024
See extended discussion at #3378

Benchmark comparing this commit against v1.17.x ("main"):

  Comparison:
           large: main:     3910.6 i/s
        large: patched:     3759.6 i/s - same-ish: difference falls within error

  Comparison:
        small: patched:   242901.7 i/s
           small: main:   127486.0 i/s - 1.91x  slower

I think we could get greater performance gains by re-using
XPathContext objects, but only at the cost of a significant amount of
additional complexity, since in order to properly support recursive
XPath evaluation, Nokogiri would have to push and pop "stack frames"
containing:

- internal state contextSize and proximityPosition
- registered namespaces
- registered variables
- function lookup handler

That feels like a lot of code for a small win. Comparatively, pulling
in this upstream patch is still a 2x speedup for zero additional complexity.
flavorjones added a commit that referenced this pull request Dec 21, 2024
See extended discussion at #3378

Benchmark comparing this commit against v1.17.x ("main"):

  Comparison:
           large: main:     3910.6 i/s
        large: patched:     3759.6 i/s - same-ish: difference falls within error

  Comparison:
        small: patched:   242901.7 i/s
           small: main:   127486.0 i/s - 1.91x  slower

I think we could get greater performance gains by re-using
XPathContext objects, but only at the cost of a significant amount of
additional complexity, since in order to properly support recursive
XPath evaluation, Nokogiri would have to push and pop "stack frames"
containing:

- internal state contextSize and proximityPosition
- registered namespaces
- registered variables
- function lookup handler

That feels like a lot of code for a small win. Comparatively, pulling
in this upstream patch is still a 2x speedup for zero additional complexity.
flavorjones added a commit that referenced this pull request Dec 21, 2024
See extended discussion at #3378

Benchmark comparing this commit against v1.17.x ("main"):

  Comparison:
           large: main:     3910.6 i/s
        large: patched:     3759.6 i/s - same-ish: difference falls within error

  Comparison:
        small: patched:   242901.7 i/s
           small: main:   127486.0 i/s - 1.91x  slower

I think we could get greater performance gains by re-using
XPathContext objects, but only at the cost of a significant amount of
additional complexity, since in order to properly support recursive
XPath evaluation, Nokogiri would have to push and pop "stack frames"
containing:

- internal state contextSize and proximityPosition
- registered namespaces
- registered variables
- function lookup handler

That feels like a lot of code for a small win. Comparatively, pulling
in this upstream patch is still a 2x speedup for zero additional complexity.
@nwellnhof
Copy link

Maybe I'm missing something, but looking at commit cef057f, I can't see where namespaces (or variables) are unregistered from the global context after evaluation.

On another note, if you really want to speed up XPath processing, you should think about supporting compiled expressions.

flavorjones added a commit that referenced this pull request Dec 21, 2024
**What problem is this PR intended to solve?**

After chatting with @nwellnhof in #3378, he went ahead and made an
performance-focused change upstream in libxml2/gnome@bf5fcf6e that I'm
pulling in here to the vendored library.

Benchmark comparing this PR against v1.17.x ("main"):

    Comparison:
             large: main:     3910.6 i/s
large: patched: 3759.6 i/s - same-ish: difference falls within error

    Comparison:
          small: patched:   242901.7 i/s
             small: main:   127486.0 i/s - 1.91x  slower

We could get greater performance gains by re-using `XPathContext`
objects, but only at the cost of a significant amount of additional
complexity, since in order to properly support recursive XPath
evaluation, Nokogiri would have to push and pop "stack frames"
containing:

- internal state contextSize and proximityPosition
- registered namespaces
- registered variables
- function lookup handler

That feels like a lot of code for a small win.

Comparatively, pulling in this upstream patch is still a ~2x speedup
with zero additional complexity. That's a win.

**Have you included adequate test coverage?**

N/A


**Does this change affect the behavior of either the C or the Java
implementations?**

Performance improvement impacting CRuby only.
@flavorjones
Copy link
Member Author

flavorjones commented Dec 21, 2024

@nwellnhof I reverted most of this PR in #3389, which I just merged. So what's on main no longer re-uses the XPathContext struct/object.

But if it's interesting, the ruby method XPathContext#reset was what was deregistering variables and namespaces. This method was called from an ensure block after the evaluation returned.

@flavorjones
Copy link
Member Author

flavorjones commented Dec 21, 2024

@nwellnhof re: compiled expressions, I did implement it in #3380 but I was not able to benchmark as large of a performance difference as I expected. Any advice you'd have on when to compile expressions might be helpful for me, both to benchmark and to help users understand when to do it.

Similarly: I have a branch that enables the XPath cache, and it did show some pretty good speedup on some blunt benchmarks, but again I don't know enough about how it works to give people advice on when to enable it.

@nwellnhof
Copy link

nwellnhof commented Dec 21, 2024

but I was not able to benchmark much of a difference. Any advice you'd have on gaining a better understanding of when this would be an improvement would be very valuable.

It should be most noticeable with expressions that evaluate quickly like @attr. But maybe compilation isn't that much of an overhead after all. The XPath compiler also still misses some important optimizations.

I also have a branch that enabled the XPath cache, and it did show some pretty good speedup on some blunt benchmarks, but I don't know enough about how it works to give people advice on when to enable it.

The cache speeds up allocation of XPath objects. In general, it's a good idea to enable it.

Another useful optimization is xmlXPathOrderDocElems which speeds up sorting of node sets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants