-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new tests for syntax and ill-formed parser errors and fix... emm... errors #684
Conversation
- `eof_1` and `bad_1` replaced by `syntax::decl` tests. - `test_start` replaced by `syntax::tag::normal` tests. - `test_offset_err_comment` and `test_offset_err_comment_trim_text` replaced by `syntax::comment` tests. Checks of position of preceding start tag is not meaningful for that tests, so such replacement is justified. failures (150): syntax::cdata::unclosed1::async_tokio syntax::cdata::unclosed1::borrowed syntax::cdata::unclosed1::buffered syntax::cdata::unclosed2::async_tokio syntax::cdata::unclosed2::borrowed syntax::cdata::unclosed2::buffered syntax::cdata::unclosed3::async_tokio syntax::cdata::unclosed3::borrowed syntax::cdata::unclosed3::buffered syntax::cdata::unclosed4::async_tokio syntax::cdata::unclosed4::borrowed syntax::cdata::unclosed4::buffered syntax::cdata::unclosed5::async_tokio syntax::cdata::unclosed5::borrowed syntax::cdata::unclosed5::buffered syntax::cdata::unclosed6::async_tokio syntax::cdata::unclosed6::borrowed syntax::cdata::unclosed6::buffered syntax::cdata::unclosed7::async_tokio syntax::cdata::unclosed7::borrowed syntax::cdata::unclosed7::buffered syntax::cdata::unclosed8::async_tokio syntax::cdata::unclosed8::borrowed syntax::cdata::unclosed8::buffered syntax::cdata::unclosed9::async_tokio syntax::cdata::unclosed9::borrowed syntax::cdata::unclosed9::buffered syntax::comment::unclosed1::async_tokio syntax::comment::unclosed1::borrowed syntax::comment::unclosed1::buffered syntax::comment::unclosed2::async_tokio syntax::comment::unclosed2::borrowed syntax::comment::unclosed2::buffered syntax::comment::unclosed3::async_tokio syntax::comment::unclosed3::borrowed syntax::comment::unclosed3::buffered syntax::comment::unclosed4::async_tokio syntax::comment::unclosed4::borrowed syntax::comment::unclosed4::buffered syntax::comment::unclosed5::async_tokio syntax::comment::unclosed5::borrowed syntax::comment::unclosed5::buffered syntax::comment::unclosed6::async_tokio syntax::comment::unclosed6::borrowed syntax::comment::unclosed6::buffered syntax::comment::unclosed7::async_tokio syntax::comment::unclosed7::borrowed syntax::comment::unclosed7::buffered syntax::decl::normal1::async_tokio syntax::decl::normal1::borrowed syntax::decl::normal1::buffered syntax::decl::unclosed1::async_tokio syntax::decl::unclosed1::borrowed syntax::decl::unclosed1::buffered syntax::decl::unclosed2::async_tokio syntax::decl::unclosed2::borrowed syntax::decl::unclosed2::buffered syntax::decl::unclosed3::async_tokio syntax::decl::unclosed3::borrowed syntax::decl::unclosed3::buffered syntax::decl::unclosed4::async_tokio syntax::decl::unclosed4::borrowed syntax::decl::unclosed4::buffered syntax::doctype::unclosed1::async_tokio syntax::doctype::unclosed1::borrowed syntax::doctype::unclosed1::buffered syntax::doctype::unclosed2::async_tokio syntax::doctype::unclosed2::borrowed syntax::doctype::unclosed2::buffered syntax::doctype::unclosed3::async_tokio syntax::doctype::unclosed3::borrowed syntax::doctype::unclosed3::buffered syntax::doctype::unclosed4::async_tokio syntax::doctype::unclosed4::borrowed syntax::doctype::unclosed4::buffered syntax::doctype::unclosed5::async_tokio syntax::doctype::unclosed5::borrowed syntax::doctype::unclosed5::buffered syntax::doctype::unclosed6::async_tokio syntax::doctype::unclosed6::borrowed syntax::doctype::unclosed6::buffered syntax::doctype::unclosed7::async_tokio syntax::doctype::unclosed7::borrowed syntax::doctype::unclosed7::buffered syntax::doctype::unclosed8::async_tokio syntax::doctype::unclosed8::borrowed syntax::doctype::unclosed8::buffered syntax::doctype::unclosed9::async_tokio syntax::doctype::unclosed9::borrowed syntax::doctype::unclosed9::buffered syntax::pi::normal_empty::async_tokio syntax::pi::normal_empty::borrowed syntax::pi::normal_empty::buffered syntax::pi::unclosed10::async_tokio syntax::pi::unclosed10::borrowed syntax::pi::unclosed10::buffered syntax::pi::unclosed1::async_tokio syntax::pi::unclosed1::borrowed syntax::pi::unclosed1::buffered syntax::pi::unclosed2::async_tokio syntax::pi::unclosed2::borrowed syntax::pi::unclosed2::buffered syntax::pi::unclosed3::async_tokio syntax::pi::unclosed3::borrowed syntax::pi::unclosed3::buffered syntax::pi::unclosed4::async_tokio syntax::pi::unclosed4::borrowed syntax::pi::unclosed4::buffered syntax::pi::unclosed5::async_tokio syntax::pi::unclosed5::borrowed syntax::pi::unclosed5::buffered syntax::pi::unclosed6::async_tokio syntax::pi::unclosed6::borrowed syntax::pi::unclosed6::buffered syntax::pi::unclosed7::async_tokio syntax::pi::unclosed7::borrowed syntax::pi::unclosed7::buffered syntax::pi::unclosed8::async_tokio syntax::pi::unclosed8::borrowed syntax::pi::unclosed8::buffered syntax::pi::unclosed9::async_tokio syntax::pi::unclosed9::borrowed syntax::pi::unclosed9::buffered syntax::tag::unclosed1::async_tokio syntax::tag::unclosed1::borrowed syntax::tag::unclosed1::buffered syntax::tag::unclosed2::async_tokio syntax::tag::unclosed2::borrowed syntax::tag::unclosed2::buffered syntax::tag::unclosed3::async_tokio syntax::tag::unclosed3::borrowed syntax::tag::unclosed3::buffered syntax::tag::unclosed4::async_tokio syntax::tag::unclosed4::borrowed syntax::tag::unclosed4::buffered syntax::tag::unclosed5::async_tokio syntax::tag::unclosed5::borrowed syntax::tag::unclosed5::buffered syntax::tag::unclosed6::async_tokio syntax::tag::unclosed6::borrowed syntax::tag::unclosed6::buffered syntax::tag::unclosed7::async_tokio syntax::tag::unclosed7::borrowed syntax::tag::unclosed7::buffered syntax::unclosed_bang1::async_tokio syntax::unclosed_bang1::borrowed syntax::unclosed_bang1::buffered syntax::unclosed_bang2::async_tokio syntax::unclosed_bang2::borrowed syntax::unclosed_bang2::buffered
…nted to `<` Some code become ugly, but anyway it will be rewritten soon failures (6): syntax::decl::normal1::async_tokio syntax::decl::normal1::borrowed syntax::decl::normal1::buffered syntax::pi::normal_empty::async_tokio syntax::pi::normal_empty::borrowed syntax::pi::normal_empty::buffered
Although empty target name isnot allowed, we would check that (or change parsing) the same way as currently we check the presence of mandatory `version` attribute in XML declaration. This requires introducing separate type for PI content which will be done in tafia#650 Fixed (3): syntax::pi::normal_empty::async_tokio syntax::pi::normal_empty::borrowed syntax::pi::normal_empty::buffered failures (3): syntax::decl::normal1::async_tokio syntax::decl::normal1::borrowed syntax::decl::normal1::buffered
This is more consistent with the way how we check presence of the mandatory `version` attribute in XML declaration. That check currently only performed when you try to read version. Also, technically in the grammar space is a part of version info: https://www.w3.org/TR/xml11/#NT-XMLDecl Fixed (3): syntax::decl::normal1::async_tokio syntax::decl::normal1::borrowed syntax::decl::normal1::buffered
failures (15): ill_formed::double_hyphen_in_comment2::async_tokio ill_formed::double_hyphen_in_comment2::borrowed ill_formed::double_hyphen_in_comment2::buffered ill_formed::double_hyphen_in_comment3::async_tokio ill_formed::double_hyphen_in_comment3::borrowed ill_formed::double_hyphen_in_comment3::buffered ill_formed::double_hyphen_in_comment4::async_tokio ill_formed::double_hyphen_in_comment4::borrowed ill_formed::double_hyphen_in_comment4::buffered ill_formed::missed_doctype_name1::async_tokio ill_formed::missed_doctype_name1::borrowed ill_formed::missed_doctype_name1::buffered ill_formed::missed_doctype_name2::async_tokio ill_formed::missed_doctype_name2::borrowed ill_formed::missed_doctype_name2::buffered
…omment)` `p` in the original code was the number of found occurrence and not a byte offset into `buf` Fixed (9): ill_formed::double_hyphen_in_comment2::async_tokio ill_formed::double_hyphen_in_comment2::borrowed ill_formed::double_hyphen_in_comment2::buffered ill_formed::double_hyphen_in_comment3::async_tokio ill_formed::double_hyphen_in_comment3::borrowed ill_formed::double_hyphen_in_comment3::buffered ill_formed::double_hyphen_in_comment4::async_tokio ill_formed::double_hyphen_in_comment4::borrowed ill_formed::double_hyphen_in_comment4::buffered failures (6): ill_formed::missed_doctype_name1::async_tokio ill_formed::missed_doctype_name1::borrowed ill_formed::missed_doctype_name1::buffered ill_formed::missed_doctype_name2::async_tokio ill_formed::missed_doctype_name2::borrowed ill_formed::missed_doctype_name2::buffered
…me)` Fixed (6): ill_formed::missed_doctype_name1::async_tokio ill_formed::missed_doctype_name1::borrowed ill_formed::missed_doctype_name1::buffered ill_formed::missed_doctype_name2::async_tokio ill_formed::missed_doctype_name2::borrowed ill_formed::missed_doctype_name2::buffered
src/errors.rs
Outdated
@@ -103,6 +114,12 @@ pub enum IllFormedError { | |||
impl fmt::Display for IllFormedError { | |||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | |||
match self { | |||
Self::MissedVersion(None) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, DeclWithoutVersion
makes more sense as the name. MissedVersion
doesn't provide very much context.
The existing tags using that pattern e.g. MissedEnd
are a little better, enough to make it OK in my opinion, but on reflection maybe it would be better to be more explicit e.g. MissingEndTag
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to keep errors with a similar meaning named similar. What about
MissingDeclVersion
MissingDoctypeName
MissingEndTag
MismatchedEndTag
UnmatchedEndTag
DoubleHyphenInComment
(not changed)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MissingDeclVersion
is fine, I have no issue with those names.
return Err(Error::IllFormed(IllFormedError::MissedDoctypeName)); | ||
match buf[8..].iter().position(|&b| !is_whitespace(b)) { | ||
Some(start) => Ok(Event::DocType(BytesText::wrap( | ||
&buf[8 + start..], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be kind of nice if we could document these constants here and above. With a bit of thought most of them are not so difficult to figure out, but they're not obvious at first clance.
e.g.
BangType::DocType if uncased_starts_with(buf, b"!DOCTYPE") => {
let doctype_tag_len = 8;
match buf[doctype_tag_len..].iter().position(|&b| !is_whitespace(b)) {
Some(start) => Ok(Event::DocType(BytesText::wrap(
&buf[doctype_tag_len + start..],
Feel free to tweak the details (such as const
vs let
, doctype_tag_len
vs DOCTYPE_TAG.len()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 87 - 113 could use the same treatment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that code already slightly rewritten (I'll push draft PR soon) and does not contain magic numbers (or they documented)
tests/reader-errors.rs
Outdated
// ^= 9 | ||
err!(missed_doctype_name2("<!DOCTYPE \t\r\n>") => 13: IllFormedError::MissedDoctypeName); | ||
// ^= 13 | ||
ok!(missed_doctype_name3("<!DOCTYPE \t\r\nx>") => Event::DocType(BytesText::new("x"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you intend to leave these names as missed_doctype_name3
(and below, mismatched_end1
, and double_hyphen_in_comment1
) for testing variants that are well-formed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The main motivation is that these names of the "test about MissedDoctypeName
error number one... number two... and so on". The final test is a canary test that ensures that after all we able to parse correct representation without mentioned error.
I could wrap that tests into module named missed_doctype_name
(snake cased error name), but I decide, that indent level would be not very comfort. The similar names of tests for the particular error is useful for running of only related tests via cargo test ... -- missed_doctype_name
.
In the end, I thought that just a common name with a number is quite convenient and concise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improvements are suggested. This looks OK otherwise.
Co-authored-by: Daniel Alley <dalley@redhat.com>
Codecov ReportAttention:
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #684 +/- ##
==========================================
+ Coverage 65.14% 65.31% +0.16%
==========================================
Files 38 38
Lines 17885 17920 +35
==========================================
+ Hits 11651 11704 +53
+ Misses 6234 6216 -18
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
This PR is a prepare step for rewriting parser from scratch. First of all I want to guarantee that behaviour will not be changed, especially conditions on which errors is returned and a final state after the error.
Thanks to that tests, I found several bugs related to result of
.buffer_position()
that should report error position.This PR changes how some ill-formed XML constructs are processed:
<?xml?>
now will be threated as syntactically correct XML declaration, although it is not well-formed. Note, however, that some online validators allows such declaration. Previously it was reported asEvent::PI
. According to the specification, aversion
attribute is mandatory, but I follow the quick-xml ideology, that we do not do checks that could slow down parsing, but that checks are performed by demand. In that case when you callBytesDecl::version()
. Maybe that is not the better way to do things, but this PR not about it.<??>
now will be parsed as syntactically correct processing instruction, although it is not well-formed, previously error is returned. Again, some online validators allows that. Currently there is no way to enforce well-formedless check for PI, but I think, that could be done in Processing instructions should have a dedicated type #650