-
-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(yaml) Possible performance regression in snakeyaml 1.18 #67
Comments
Ok. Thank you for reporting this. I don't remember if It is too bad SnakeYAML has severe performance issues in general (it's literally 10-20x slower than, say, json parsing, or 10x xml); otherwise it works very well. |
"10x [slower than] xml [parsing]?" With XML being so verbose it's hard to imagine anything being slower than XML parsing. That's amazing. |
@chrisco484 there is no fundamental reason it should be, since at low level textual codecs really should converge to similar speeds relative to raw size (that is, more verbose formats would be bit slower just since there's more text to decode). On the other hand, XML parsers have been rather aggressively optimized so parsers like Aalto get pretty decent throughput, in 10s of megabytes per second per core. It would be nice if YAML part could be improved. |
Yes, I guess the XML world has had 20 years to optimize whereas YAML is
relatively recent in its ever growing adoption.
The theoretical max speed of any YAML parser should be quicker than the
theoretical max speed of any XML parser given YAML doesn't have the
verbosity of XML but, as you say, the difference may not be large - but
still would be noticeable on larger data sets I would think.
…________________________________
From: Tatu Saloranta [mailto:notifications@github.com]
Sent: Thursday, 5 April 2018 8:42 AM
To: FasterXML/jackson-dataformats-text
Cc: Chris Colman; Mention
Subject: Re: [FasterXML/jackson-dataformats-text] (yaml) Possible
performance regression in snakeyaml 1.18 (#67)
@chrisco484 <https://github.com/chrisco484> there is no fundamental
reason it should be, since at low level textual codecs really should
converge to similar speeds relative to raw size (that is, more verbose
formats would be bit slower just since there's more text to decode). On
the other hand, XML parsers have been rather aggressively optimized so
parsers like Aalto get pretty decent throughput, in 10s of megabytes per
second per core. It would be nice if YAML part could be improved.
-
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#67 (comment)
mment-378767743> , or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA6iPxUcSd1hM5VyfzHiW
9ZaqcT6l6JLks5tlUwcgaJpZM4R3EVP> .
<https://github.com/notifications/beacon/AA6iP62050QDKvbGURyVhfEvhZIs3MK
Zks5tlUwcgaJpZM4R3EVP.gif>
|
Right. And one problem with YAML is that the rules for decoding are rather complicated (specification is huge, and with all the variations to support wiki-style, or not, it gets hairy quick), which makes it much more work to optimize. As long as YAML is mostly used for configuration handling this isn't much of a problem. |
yeah, i was thinking of using it for data integration/migration because
of Yaml's smaller serialised output.
…________________________________
From: Tatu Saloranta [mailto:notifications@github.com]
Sent: Thursday, 5 April 2018 12:59 PM
To: FasterXML/jackson-dataformats-text
Cc: Chris Colman; Mention
Subject: Re: [FasterXML/jackson-dataformats-text] (yaml) Possible
performance regression in snakeyaml 1.18 (#67)
Right. And one problem with YAML is that the rules for decoding are
rather complicated (specification is huge, and with all the variations
to support wiki-style, or not, it gets hairy quick), which makes it much
more work to optimize.
But I don't think that's where most of the time is spent right now: I
think it's just SnakeYAML using quite complex processing pipeline.
As long as YAML is mostly used for configuration handling this isn't
much of a problem.
But it could be a problem for anyone using YAML for data interchange
(request/response).
-
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#67 (comment)
mment-378806463> , or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA6iP6u8pz8JDh3EY4fq4
6JSCKrdKrr8ks5tlYh1gaJpZM4R3EVP> .
<https://github.com/notifications/beacon/AA6iP1KaINMBPNTXCEL1O9wp7xveA_A
kks5tlYh1gaJpZM4R3EVP.gif>
|
@tlrx Looks like |
fyi: snakeyaml
cc: @asomov |
I just ran into this performance problem (parsing a 30KB string on Android) and spent a couple of hours looking at the snakeyaml code, before finding my way here. The main problem in 1.18 is a O(n-squared) loop in the Reader.peek() function, which has indeed been completely rewritten and fixed in later versions. Forcing snakeyaml to 1.21 (with jackson 2.9.5) in my build gives acceptable parsing performance on Android. It does break serialization though, as mentioned above. Generally speaking, there is a lot of seemingly unnecessary string buffering/duplication/copying going on in the snakeyaml parsing code (optimisation is hard!), so I am not at all surprised if performance is bad. |
@chrisco484 : the YAML 1.2 spec is very big. Chomping, flow styles, scalar styles (especially the folded style), implicit types, anchors/aliases/recursive structures, complex keys are not present in XML and they are not often used. Nevertheless the parser must always take care of all these details. |
@yborovikov: can I help to adapt jackson 2.9.5 to work with SnakeYAML 1.21 ? |
Quick note: next patch version would be I have no objections to changes in 2.9(.6) that allow use of newer version (and dependency to newer version), as long as change would still work with older version(s) that Jackson 2.9 has relied on. For 2.10, new version should definitely be used, so above caveat is only wrt 2.9.x patches. |
@asomov Theoretically, reinstating the This way Jackson |
I am confused now. I thought that this issue is all about migrating to SnakeYAML 1.21 |
@asomov Well, migrating to 1.21 is fairly simple and is covered by this PR: https://github.com/FasterXML/jackson-dataformats-text/pull/82/files However, it would prevent consumers from using SnakeYAML 1.19 (and older - which might be desirable for some people due to the alleged performance regression) with the newer Jackson 2.9.x. There's an option of dynamically finding the right constructor at runtime (to maintain compatibility with older SnakeYAML versions as @cowtowncoder requested above), yet, arguably, it's a bit ugly. So, simply applying #82 and releasing it as a part of Jackson 2.10 might be what the maintainers prefer. Ultimately, I'm confused now, too :) |
SnakeYAML version was upgraded, as per #81, so I think this is resolved now. |
Hi,
I spotted a performance regression in snakeyaml introduced in version 1.18+. This issue has been reported on the snakeyaml mailing list here and seems to affect the parsing of large text values.
It affects
jackson-dataformats-text
starting version 2.9.1 as it depends on snakeyaml 1.18. Version 1.19 is also problematic but version 1.17 is ok.I understand that this is not a
jackson-dataformats-text
issue but I'd like 1) to raise awareness on this regression and 2) to ensure thatjackson-dataformats-text
dependency on snakeyaml will be updated to a newer version that fixes the regression (if such version is released)The text was updated successfully, but these errors were encountered: