perf(cruby): reuse XPathContext objects #3378

flavorjones · 2024-12-14T21:06:46Z

What problem is this PR intended to solve?

Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize).

This PR makes the following changes:

the XPathContext object ...
- can accept nil as the value arg to #register_ns and #register_variable to deregister the namespace or variable
- tracks namespaces and variables registered through those two methods
- has a new #reset method that deregisters all namespaces and variables registered
- has a new #node= method to set the current node being searched
all the Searchable methods ...
- use a thread-local XPathContext object that is only available in a block yielded by #get_xpath_context
- and that context object is #reset when the block returns

There is an escape hatch that I will leave undocumented, which is to set the env var NOKOGIRI_DEOPTIMIZE_XPATH, out of an abundance of caution.

Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file:

$ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       large: normal     3.790k i/100ms
Calculating -------------------------------------
       large: normal     37.556k (± 1.7%) i/s   (26.63 μs/i) -    189.500k in   5.047390s
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       small: normal    11.726k i/100ms
Calculating -------------------------------------
       small: normal    113.719k (± 2.5%) i/s    (8.79 μs/i) -    574.574k in   5.055648s

$ ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    large: optimized     4.609k i/100ms
Calculating -------------------------------------
    large: optimized     48.107k (± 1.6%) i/s   (20.79 μs/i) -    244.277k in   5.079041s

Comparison:
    large: optimized:    48107.3 i/s
       large: normal:    37555.7 i/s - 1.28x  slower

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    small: optimized    32.014k i/100ms
Calculating -------------------------------------
    small: optimized    319.760k (± 0.6%) i/s    (3.13 μs/i) -      1.601M in   5.006140s

Comparison:
    small: optimized:   319759.6 i/s
       small: normal:   113719.0 i/s - 2.81x  slower

What alternatives were considered?

I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling xmlXPathRegisterAllFunctions to re-register them took us back to the original performance profile.

The win here is to avoid re-calling xmlXPathRegisterAllFunctions, and for that we track registered variables and namespaces, and de-register them after the query eval completes.

Have you included adequate test coverage?

Yes.

Does this change affect the behavior of either the C or the Java implementations?

CRuby only.

flavorjones · 2024-12-14T21:08:19Z

@tenderlove if you get some time to review this, I'd really appreciate it. You can just focus on the final commit (everything else is nonfunctional refactoring). ❤️

**What problem is this PR intended to solve?** Some cleanup and refactoring that was originally in #3378 but is generally valuable even if that PR doesn't land, so I split it out.

Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize). This PR makes the following changes: - the `XPathContext` object ... - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable - tracks namespaces and variables registered through those two methods - has a new `#reset` method that deregisters all namespaces and variables registered - has a new `#node=` method to set the current node being searched - all the `Searchable` methods ... - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context` - and that context object is `#reset` when the block returns There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution. Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file: ``` $ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: normal 3.790k i/100ms Calculating ------------------------------------- large: normal 37.556k (± 1.7%) i/s (26.63 μs/i) - 189.500k in 5.047390s ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: normal 11.726k i/100ms Calculating ------------------------------------- small: normal 113.719k (± 2.5%) i/s (8.79 μs/i) - 574.574k in 5.055648s $ ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: optimized 4.609k i/100ms Calculating ------------------------------------- large: optimized 48.107k (± 1.6%) i/s (20.79 μs/i) - 244.277k in 5.079041s Comparison: large: optimized: 48107.3 i/s large: normal: 37555.7 i/s - 1.28x slower ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: optimized 32.014k i/100ms Calculating ------------------------------------- small: optimized 319.760k (± 0.6%) i/s (3.13 μs/i) - 1.601M in 5.006140s Comparison: small: optimized: 319759.6 i/s small: normal: 113719.0 i/s - 2.81x slower ``` I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile. The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.

flavorjones · 2024-12-16T12:55:41Z

Going to merge this so I can ship a 1.18 release candidate today. But if you have any feedback, the final release is still a couple of weeks away.

nwellnhof · 2024-12-20T13:21:00Z

Note that if you use a single XPath context and support custom XPath extension functions, a custom function could evaluate XPath expressions recursively which will lead to corruption of context variables. This is mostly due to some design mistakes in libxml2. So you should backup and restore the following members of xmlXPathContext before and after evaluating an expression:

doc
node
contextSize
proximityPosition

Other settings that could change like namespaces should be backed and restored up as well. Or you could simply disallow recursive evaluation in some way.

flavorjones · 2024-12-20T18:25:10Z

@nwellnhof thank you for the advice! doc and node are being set explicitly every time, but I'll save/restore contextSize and proximityPosition before the final release.

flavorjones · 2024-12-20T18:31:26Z

@nwellnhof Would it be OK to set contextSize and proximityPosition back to their defaults before every evaluation, rather than saving/restoring? Looks like xmlXPathNewContext sets them to -1.

If we're re-using the xmlXPathContext object, there's a chance that the context variables will be trashed by recursive custom functions. In #3378 (comment), Nick advised: > Note that if you use a single XPath context and support custom XPath > extension functions, a custom function could evaluate XPath > expressions recursively which will lead to corruption of context > variables. This is mostly due to some design mistakes in libxml2. So let's set these context variables back to their default.

flavorjones · 2024-12-20T18:43:13Z

See #3386

nwellnhof · 2024-12-20T18:46:44Z

doc and node are being set explicitly every time

That's not enough. Consider:

xmlXPathEval is called for the first time and context->node is set.
An extension function is called which evaluates another XPath expression.
xmlXPathEval is called for the second time and context->node is set to a new value.
xmlXPathEval returns but context->node still contains the new value.
The rest of the original evaluation continues with the wrong context node.

Looks like xmlXPathNewContext sets them to -1.

I'd suggest to set both values to 1 once after creating an XPath context, but it doesn't matter much outside of XSLT.

flavorjones · 2024-12-20T19:02:16Z

@nwellnhof Oh! That makes sense, I misunderstood your original message. It hadn't even occurred to me that people would want to do that.

Alternatively, I could create a new xpath context object if there's a recursive call being made.

nwellnhof · 2024-12-20T19:13:59Z

Yes, it seems unlikely that Nokogiri users would want to do that. But in libxslt, this happens all the time.

I think the underlying issue is that context creation is too slow in libxml2 because the XPath functions are registered each time. If this was fixed, it shouldn't be too costly to recreate a context for each evaluation.

flavorjones · 2024-12-20T19:16:07Z

@nwellnhof Yes, in my benchmark, xmlXPathRegisterAllFunctions dominated the initialization overhead.

flavorjones · 2024-12-20T22:26:07Z

@nwellnhof If it's interesting, I implemented a branch in #3388 where I copy an initialized xmlXPathContext struct (i.e., I deep-copy funcHash instead of calling xmlXPathRegisterAllFunctions) and it's slightly faster but not an overwhelming win.

nwellnhof · 2024-12-21T16:27:07Z

I reworked XPath function lookup here: https://gitlab.gnome.org/GNOME/libxml2/-/commit/bf5fcf6e646bb51a0f6a3655a1d64bea97274867

xmlXPathNewContext should be a lot faster now.

flavorjones · 2024-12-21T19:49:23Z

@nwellnhof Thanks for that!

Here's how the above benchmark looks for small ~6kb files:

Comparison:
         small: edge:   306454.7 i/s
      small: patched:   225456.0 i/s - 1.36x  slower
         small: main:   124554.9 i/s - 2.46x  slower

or, dropping the "re-use the object" comparison:

Comparison:
            small: patched:   242901.7 i/s
               small: main:   127486.0 i/s - 1.91x  slower

where:

"edge" is Nokogiri v1.18.0.rc1 re-using the context object
"patched" is using libxml2 v2.13.5 with GNOME/libxml2@bf5fcf6e patched in
"main" is Nokogiri v1.17 creating a new XPathContext for each query

That's a really good win!

See extended discussion at #3378 Benchmark comparing this commit against v1.17.x ("main"): Comparison: large: main: 3910.6 i/s large: patched: 3759.6 i/s - same-ish: difference falls within error Comparison: small: patched: 242901.7 i/s small: main: 127486.0 i/s - 1.91x slower I think we could get greater performance gains by re-using XPathContext objects, but only at the cost of a significant amount of additional complexity, since in order to properly support recursive XPath evaluation, Nokogiri would have to push and pop "stack frames" containing: - internal state contextSize and proximityPosition - registered namespaces - registered variables - function lookup handler That feels like a lot of code for a small win. Comparatively, pulling in this upstream patch is still a 2x speedup for zero additional complexity.

nwellnhof · 2024-12-21T20:14:31Z

Maybe I'm missing something, but looking at commit cef057f, I can't see where namespaces (or variables) are unregistered from the global context after evaluation.

On another note, if you really want to speed up XPath processing, you should think about supporting compiled expressions.

@nwellnhof

**What problem is this PR intended to solve?** After chatting with @nwellnhof in #3378, he went ahead and made an performance-focused change upstream in libxml2/gnome@bf5fcf6e that I'm pulling in here to the vendored library. Benchmark comparing this PR against v1.17.x ("main"): Comparison: large: main: 3910.6 i/s large: patched: 3759.6 i/s - same-ish: difference falls within error Comparison: small: patched: 242901.7 i/s small: main: 127486.0 i/s - 1.91x slower We could get greater performance gains by re-using `XPathContext` objects, but only at the cost of a significant amount of additional complexity, since in order to properly support recursive XPath evaluation, Nokogiri would have to push and pop "stack frames" containing: - internal state contextSize and proximityPosition - registered namespaces - registered variables - function lookup handler That feels like a lot of code for a small win. Comparatively, pulling in this upstream patch is still a ~2x speedup with zero additional complexity. That's a win. **Have you included adequate test coverage?** N/A **Does this change affect the behavior of either the C or the Java implementations?** Performance improvement impacting CRuby only.

flavorjones · 2024-12-21T22:14:14Z

@nwellnhof I reverted most of this PR in #3389, which I just merged. So what's on main no longer re-uses the XPathContext struct/object.

But if it's interesting, the ruby method XPathContext#reset was what was deregistering variables and namespaces. This method was called from an ensure block after the evaluation returned.

flavorjones · 2024-12-21T22:16:11Z

@nwellnhof re: compiled expressions, I did implement it in #3380 but I was not able to benchmark as large of a performance difference as I expected. Any advice you'd have on when to compile expressions might be helpful for me, both to benchmark and to help users understand when to do it.

Similarly: I have a branch that enables the XPath cache, and it did show some pretty good speedup on some blunt benchmarks, but again I don't know enough about how it works to give people advice on when to enable it.

nwellnhof · 2024-12-21T22:43:16Z

but I was not able to benchmark much of a difference. Any advice you'd have on gaining a better understanding of when this would be an improvement would be very valuable.

It should be most noticeable with expressions that evaluate quickly like @attr. But maybe compilation isn't that much of an overhead after all. The XPath compiler also still misses some important optimizations.

I also have a branch that enabled the XPath cache, and it did show some pretty good speedup on some blunt benchmarks, but I don't know enough about how it works to give people advice on when to enable it.

The cache speeds up allocation of XPath objects. In general, it's a good idea to enable it.

Another useful optimization is xmlXPathOrderDocElems which speeds up sorting of node sets.

flavorjones added this to the v1.18.0 milestone Dec 14, 2024

flavorjones assigned tenderlove Dec 14, 2024

flavorjones force-pushed the flavorjones-reuse-xpath-context branch from f92c678 to 334ad08 Compare December 14, 2024 21:12

flavorjones mentioned this pull request Dec 14, 2024

Minor cleanup #3379

Merged

flavorjones force-pushed the flavorjones-reuse-xpath-context branch 2 times, most recently from b760c62 to 1729fd5 Compare December 14, 2024 21:19

flavorjones unassigned tenderlove Dec 14, 2024

flavorjones requested a review from tenderlove December 14, 2024 21:19

flavorjones mentioned this pull request Dec 14, 2024

[perf] Explore re-using XPathContext objects, and compiling XPath expressions #3266

Closed

flavorjones added a commit that referenced this pull request Dec 14, 2024

Minor cleanup (#3379)

c9b26b0

**What problem is this PR intended to solve?** Some cleanup and refactoring that was originally in #3378 but is generally valuable even if that PR doesn't land, so I split it out.

flavorjones force-pushed the flavorjones-reuse-xpath-context branch from 1729fd5 to beecfa4 Compare December 14, 2024 22:19

flavorjones force-pushed the flavorjones-reuse-xpath-context branch from beecfa4 to 969bea9 Compare December 14, 2024 22:21

flavorjones mentioned this pull request Dec 14, 2024

release planning: v1.18.0 #3367

Open

14 tasks

flavorjones added the topic/performance label Dec 15, 2024

flavorjones merged commit cef057f into main Dec 16, 2024
145 checks passed

flavorjones deleted the flavorjones-reuse-xpath-context branch December 16, 2024 12:55

flavorjones mentioned this pull request Dec 20, 2024

Reset xmlXPathContext state variables before each evaluation #3386

Closed

flavorjones mentioned this pull request Dec 20, 2024

Try to initialize xmlXPathContext faster #3388

Closed

flavorjones mentioned this pull request Dec 21, 2024

perf: use libxml2 patch for faster xpath context init #3389

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(cruby): reuse XPathContext objects #3378

perf(cruby): reuse XPathContext objects #3378

flavorjones commented Dec 14, 2024

flavorjones commented Dec 14, 2024

flavorjones commented Dec 16, 2024

nwellnhof commented Dec 20, 2024

flavorjones commented Dec 20, 2024

flavorjones commented Dec 20, 2024

flavorjones commented Dec 20, 2024

nwellnhof commented Dec 20, 2024

flavorjones commented Dec 20, 2024

nwellnhof commented Dec 20, 2024

flavorjones commented Dec 20, 2024

flavorjones commented Dec 20, 2024

nwellnhof commented Dec 21, 2024

flavorjones commented Dec 21, 2024 •

edited

Loading

nwellnhof commented Dec 21, 2024

flavorjones commented Dec 21, 2024 •

edited

Loading

flavorjones commented Dec 21, 2024 •

edited

Loading

nwellnhof commented Dec 21, 2024 •

edited

Loading

perf(cruby): reuse XPathContext objects #3378

perf(cruby): reuse XPathContext objects #3378

Conversation

flavorjones commented Dec 14, 2024

flavorjones commented Dec 14, 2024

flavorjones commented Dec 16, 2024

nwellnhof commented Dec 20, 2024

flavorjones commented Dec 20, 2024

flavorjones commented Dec 20, 2024

flavorjones commented Dec 20, 2024

nwellnhof commented Dec 20, 2024

flavorjones commented Dec 20, 2024

nwellnhof commented Dec 20, 2024

flavorjones commented Dec 20, 2024

flavorjones commented Dec 20, 2024

nwellnhof commented Dec 21, 2024

flavorjones commented Dec 21, 2024 • edited Loading

nwellnhof commented Dec 21, 2024

flavorjones commented Dec 21, 2024 • edited Loading

flavorjones commented Dec 21, 2024 • edited Loading

nwellnhof commented Dec 21, 2024 • edited Loading

flavorjones commented Dec 21, 2024 •

edited

Loading

flavorjones commented Dec 21, 2024 •

edited

Loading

flavorjones commented Dec 21, 2024 •

edited

Loading

nwellnhof commented Dec 21, 2024 •

edited

Loading