Support compiled XPath expressions #3380

Draft: wants to merge 2 commits into main

flavorjones (Member)

What problem is this PR intended to solve?

There has been some discussion, summarized at #3266, about exposing libxml2's support for compiled XPath expressions. The idea is that, if you have a complex expression that you use a lot and you don't want to pay the cost of parsing/compiling it multiple times, then you can compile it once and presumably your document search will be faster.

This PR implements a new T_DATA class, XML::XPath::Expression, which stores the result of compiling an XPath expression via xmlXPathCompile. The XPathContext class knows how to accept either a String or an Expression.

However, I'm not seeing noticeable improvements in speed, though my benchmark may not capture the benefits.

I'm posting this as a draft in case someone wants to write me a benchmark that shows compiled XPath expressions are compellingly faster than just using Strings. Right now, based on what I'm seeing, I'm not at all sure the complexity is worth the benefit.

Have you included adequate test coverage?

Yes.

Does this change affect the behavior of either the C or the Java implementations?

This optimization is available on CRuby only. The intent, though, is that the shorthand methods Nokogiri::XML::XPath.expression and Nokogiri::CSS.selector will be no-ops on JRuby (returning the string argument), so code that uses Expressions can be portable across both implementations.
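The "compile once, stay portable" idea can be sketched in plain Ruby. `CompiledXPathCache` is a hypothetical helper, not part of Nokogiri or this PR. It compiles each query string at most once using the `Nokogiri::XML::XPath::Expression.new` constructor proposed here, and falls back to returning the String itself when that class is not defined (assumed to be the case on a Nokogiri build without this PR, and possibly on JRuby).

```ruby
# Hypothetical helper (not part of Nokogiri or this PR): compile each XPath
# string at most once and reuse the result on later calls.
module CompiledXPathCache
  @cache = {}

  # Returns a compiled Nokogiri::XML::XPath::Expression when the class
  # proposed in this PR is defined; otherwise returns the String unchanged,
  # which Nokogiri's #xpath methods already accept.
  def self.fetch(query)
    @cache[query] ||=
      if defined?(Nokogiri::XML::XPath::Expression)
        Nokogiri::XML::XPath::Expression.new(query)
      else
        query
      end
  end
end
```

A caller would write `doc.xpath(CompiledXPathCache.fetch("//p"))` and get the same result either way. Only the cost of reparsing the query differs.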

flavorjones force-pushed the flavorjones-compiled-xpath-queries branch from a40965b to 930e231 on December 21, 2024 22:21

flavorjones (Member, Author) commented Dec 21, 2024

An example benchmark script:

```ruby
#! /usr/bin/env ruby

require "bundler/inline"

# Load Nokogiri from the local checkout (this PR's branch) plus benchmark-ips.
gemfile do
  source "https://rubygems.org"
  gem "nokogiri", path: "."
  gem "benchmark-ips"
end

# doc_large is loaded but not exercised by the reports below.
doc_large = Nokogiri::HTML5.parse(File.read(File.join(__dir__, "../test/files/tlm.html")))
doc_small = Nokogiri::HTML5.parse(File.read(File.join(__dir__, "../test/files/noencoding.html")))

Benchmark.ips do |x|
  x.warmup = 0 # skip the warmup phase
  expression_str = "//p[nokogiri-builtin:css-class(@class,'br0') and count(preceding-sibling::*)=0]"
  # Compile the same query once, outside the timed blocks.
  expression_comp = Nokogiri::XML::XPath::Expression.new(expression_str)

  # Both reports run the same query and expect no matches in the small document.
  x.report("small: compiled") do
    doc_small.xpath(expression_comp).length == 0 or raise("nope")
  end

  x.report("small: string") do
    doc_small.xpath(expression_str).length == 0 or raise("nope")
  end

  x.compare!
end
```

This outputs:

```
Calculating -------------------------------------
     small: compiled     56.468k (±12.3%) i/s   (17.71 μs/i) -    269.438k in   4.948751s
       small: string     49.666k (±14.9%) i/s   (20.13 μs/i) -    236.924k in   4.955582s

Comparison:
     small: compiled:    56468.3 i/s
       small: string:    49665.7 i/s - same-ish: difference falls within error
```
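As a rough sanity check on those numbers, each ± figure can be treated as a symmetric band around its mean. The script below is plain arithmetic on the reported figures. It is a simplification, not a reimplementation of how benchmark-ips decides "same-ish". On average the compiled path is about 1.14x faster, but the two bands overlap, which fits the "same-ish" verdict.

```ruby
# Figures copied from the benchmark-ips output above (iterations per second).
compiled_ips = 56_468.3
string_ips   = 49_665.7
compiled_err = 0.123 # ±12.3%
string_err   = 0.149 # ±14.9%

# Simplification: treat each "±" as a symmetric band around the mean.
compiled_band = (compiled_ips * (1 - compiled_err))..(compiled_ips * (1 + compiled_err))
string_band   = (string_ips * (1 - string_err))..(string_ips * (1 + string_err))

# Two closed intervals overlap when each starts before the other ends.
overlap = compiled_band.begin <= string_band.end && string_band.begin <= compiled_band.end
speedup = compiled_ips / string_ips

puts format("compiled: %.0f..%.0f i/s", compiled_band.begin, compiled_band.end)
puts format("string:   %.0f..%.0f i/s", string_band.begin, string_band.end)
puts format("speedup: %.2fx, bands overlap: %s", speedup, overlap)
```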
