Make available the XPath path of the node where a SyntaxError occurred during Schema validation (#3316)

**What problem is this PR intended to solve?**

When validating an XML document with a schema it is difficult to
impossible to know where the error happened in the document. With the
path on the error, we can identify where in the document the error
occurred.

**Have you included adequate test coverage?**

I believe so. I updated the schema tests to check that error paths are
returned correctly.

**Does this change affect the behavior of either the C or the Java
implementations?**

This changes the C implementation slightly: the path is now set during
error creation.
I looked into whether xerces-j could do the same, but I couldn't find
anything on the SAXException that would help.

---------

Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
ryanong and flavorjones authored Oct 18, 2024
1 parent 0df9227 commit cda444f
Showing 6 changed files with 35 additions and 1 deletion.
6 changes: 6 additions & 0 deletions ext/nokogiri/xml_syntax_error.c
@@ -44,6 +44,7 @@ noko__error_raise(void *ctx, xmlErrorConstPtr error)
 VALUE
 noko_xml_syntax_error__wrap(xmlErrorConstPtr error)
 {
+  xmlChar *c_path ;
   VALUE msg, e, klass;

   klass = cNokogiriXmlSyntaxError;
@@ -61,16 +62,21 @@ noko_xml_syntax_error__wrap(xmlErrorConstPtr error)
   );

   if (error) {
+    c_path = xmlGetNodePath(error->node);
+
     rb_iv_set(e, "@domain", INT2NUM(error->domain));
     rb_iv_set(e, "@code", INT2NUM(error->code));
     rb_iv_set(e, "@level", INT2NUM((short)error->level));
     rb_iv_set(e, "@file", RBSTR_OR_QNIL(error->file));
     rb_iv_set(e, "@line", INT2NUM(error->line));
+    rb_iv_set(e, "@path", RBSTR_OR_QNIL(c_path));
     rb_iv_set(e, "@str1", RBSTR_OR_QNIL(error->str1));
     rb_iv_set(e, "@str2", RBSTR_OR_QNIL(error->str2));
     rb_iv_set(e, "@str3", RBSTR_OR_QNIL(error->str3));
     rb_iv_set(e, "@int1", INT2NUM(error->int1));
     rb_iv_set(e, "@column", INT2NUM(error->int2));
+
+    xmlFree(c_path);
   }

   return e;
9 changes: 9 additions & 0 deletions lib/nokogiri/xml/syntax_error.rb
@@ -24,6 +24,15 @@ def aggregate(errors)
       attr_reader :level
       attr_reader :file
       attr_reader :line
+
+      # The XPath path of the node that caused the error when validating a `Nokogiri::XML::Document`.
+      #
+      # This attribute will only be non-nil when the error is emitted by `Schema#validate` on
+      # Document objects. It will return `nil` for DOM parsing errors and for errors emitted during
+      # Schema validation of files.
+      #
+      # ⚠ `#path` is not supported on JRuby, where it will always return `nil`.
+      attr_reader :path
       attr_reader :str1
       attr_reader :str2
       attr_reader :str3
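The doc comment above implies that callers should treat `#path` as optional. A minimal sketch of one way to consume it, assuming a caller that wants a human-readable location and falls back to the line number when no path is available. `ErrorStub` and `error_location` are hypothetical names introduced for illustration; the Struct stands in for `Nokogiri::XML::SyntaxError` so the sketch runs without Nokogiri installed, and the message text is made up.

```ruby
# Hypothetical stand-in for Nokogiri::XML::SyntaxError, exposing only the
# readers this sketch needs. Real errors would come from Schema#validate.
ErrorStub = Struct.new(:message, :line, :path, keyword_init: true)

# Prefer the XPath location when the error carries one; otherwise fall back
# to the line number, and finally to a generic description.
def error_location(error)
  return "at #{error.path}" if error.path
  return "on line #{error.line}" if error.line

  "at an unknown location"
end

# An error as it might look after validating a Document (path present).
document_error = ErrorStub.new(
  message: "invalid value", # illustrative message text, not libxml2 output
  line: 12,
  path: "/purchaseOrder/billTo/state",
)

# An error as it might look after validating a file, or on JRuby (path nil).
file_error = ErrorStub.new(message: "invalid value", line: 12, path: nil)

puts error_location(document_error)
puts error_location(file_error)
```

With real Nokogiri errors, the same helper could be applied to each element of the array returned by `xsd.validate(doc)`, since those objects respond to `#path` and `#line`.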
2 changes: 2 additions & 0 deletions test/html4/test_document.rb
@@ -813,6 +813,7 @@ def test_silencing_nonparse_errors_during_attribute_insertion_1262
       Nokogiri::HTML4.parse(input, nil, nil, parse_options)
     end
     assert_match(/Parser without recover option encountered error or warning/, exception.to_s)
+    assert_nil(exception.path)
   end
 end

@@ -835,6 +836,7 @@ def test_silencing_nonparse_errors_during_attribute_insertion_1262
       Nokogiri::HTML4.parse(input, nil, "UTF-8", parse_options)
     end
     assert_match(/Parser without recover option encountered error or warning/, exception.to_s)
+    assert_nil(exception.path)
   end
 end

3 changes: 2 additions & 1 deletion test/xml/test_document.rb
@@ -1087,9 +1087,10 @@ def test_can_be_closed
   let(:parse_options) { xml_strict }

   it "raises exception on parse error" do
-    assert_raises Nokogiri::SyntaxError do
+    error = assert_raises Nokogiri::SyntaxError do
       Nokogiri::XML.parse(input, nil, nil, parse_options)
     end
+    assert_nil(error.path)
   end
 end

15 changes: 15 additions & 0 deletions test/xml/test_schema.rb
@@ -159,6 +159,17 @@ class TestNokogiriXMLSchema < Nokogiri::TestCase

   assert(errors = xsd.validate(doc))
   assert_equal(2, errors.length)
+  if Nokogiri.uses_libxml?
+    assert_equal(
+      ["/purchaseOrder/billTo/state", "/purchaseOrder/shipTo/state"],
+      errors.map(&:path).sort,
+    )
+  else
+    assert_equal(
+      [nil, nil],
+      errors.map(&:path).sort,
+    )
+  end
 end

 it "validate_invalid_file" do
@@ -171,6 +182,10 @@ class TestNokogiriXMLSchema < Nokogiri::TestCase

   assert(errors = xsd.validate(tempfile.path))
   assert_equal(2, errors.length)
+  assert_equal(
+    [nil, nil],
+    errors.map(&:path).sort,
+  )
 end

 it "validate_non_document" do
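The schema tests above cover two outcomes: validating a Document under libxml2 yields one XPath per error, while validating a file (or running on JRuby) yields `nil` for every error. A minimal sketch of how calling code might report errors by location under both outcomes, assuming it wants `nil` paths bucketed together. `SchemaErrorStub` and `messages_by_path` are hypothetical names used only for illustration, and the Struct stands in for the error objects returned by `Schema#validate`.

```ruby
# Hypothetical stand-in for the error objects returned by Schema#validate.
SchemaErrorStub = Struct.new(:message, :path)

# Group error messages by XPath, placing errors without a path
# (file validation, JRuby) under a single placeholder key.
def messages_by_path(errors)
  errors
    .group_by { |error| error.path || "(no path)" }
    .transform_values { |group| group.map(&:message) }
end

# Shape of the results when a Document is validated under libxml2.
with_paths = [
  SchemaErrorStub.new("invalid state", "/purchaseOrder/shipTo/state"),
  SchemaErrorStub.new("invalid state", "/purchaseOrder/billTo/state"),
]

# Shape of the results when a file is validated, or on JRuby.
without_paths = [
  SchemaErrorStub.new("invalid state", nil),
  SchemaErrorStub.new("invalid state", nil),
]

p messages_by_path(with_paths).keys.sort
p messages_by_path(without_paths)
```

Substituting a placeholder for `nil` keeps the keys sortable as strings when only some errors carry a path; an array mixing `nil` and strings cannot be sorted directly.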
1 change: 1 addition & 0 deletions test/xml/test_syntax_error.rb
@@ -58,5 +58,6 @@
       assert_nil error.column
       assert_nil error.level
     end
+    assert_nil error.path
   end
 end
