(yaml) Possible performance regression in snakeyaml 1.18 #67

tlrx · 2018-02-02T11:45:10Z

Hi,

I spotted a performance regression in snakeyaml introduced in version 1.18+. This issue has been reported on the snakeyaml mailing list here and seems to affect the parsing of large text values.

It affects jackson-dataformats-text starting version 2.9.1 as it depends on snakeyaml 1.18. Version 1.19 is also problematic but version 1.17 is ok.

I understand that this is not a jackson-dataformats-text issue but I'd like 1) to raise awareness on this regression and 2) to ensure that jackson-dataformats-text dependency on snakeyaml will be updated to a newer version that fixes the regression (if such version is released)

The text was updated successfully, but these errors were encountered:

cowtowncoder · 2018-02-02T18:04:02Z

Ok. Thank you for reporting this. I don't remember if 1.18 was specifically needed for a bug fix or not. If not, could gracefully downgrade until 1.20 (or whatever with fix) comes along.

It is too bad SnakeYAML has severe performance issues in general (it's literally 10-20x slower than, say, json parsing, or 10x xml); otherwise it works very well.

chrisco484 · 2018-04-04T19:01:51Z

"10x [slower than] xml [parsing]?"

With XML being so verbose it's hard to imagine anything being slower than XML parsing. That's amazing.

cowtowncoder · 2018-04-04T22:41:31Z

@chrisco484 there is no fundamental reason it should be, since at low level textual codecs really should converge to similar speeds relative to raw size (that is, more verbose formats would be bit slower just since there's more text to decode). On the other hand, XML parsers have been rather aggressively optimized so parsers like Aalto get pretty decent throughput, in 10s of megabytes per second per core. It would be nice if YAML part could be improved.

chrisco484 · 2018-04-05T00:57:18Z

Yes, I guess the XML world has had 20 years to optimize whereas YAML is relatively recent in its ever growing adoption. The theoretical max speed of any YAML parser should be quicker than the theoretical max speed of any XML parser given YAML doesn't have the verbosity of XML but, as you say, the difference may not be large - but still would be noticeable on larger data sets I would think.

…

________________________________ From: Tatu Saloranta [mailto:notifications@github.com] Sent: Thursday, 5 April 2018 8:42 AM To: FasterXML/jackson-dataformats-text Cc: Chris Colman; Mention Subject: Re: [FasterXML/jackson-dataformats-text] (yaml) Possible performance regression in snakeyaml 1.18 (#67) @chrisco484 <https://github.com/chrisco484> there is no fundamental reason it should be, since at low level textual codecs really should converge to similar speeds relative to raw size (that is, more verbose formats would be bit slower just since there's more text to decode). On the other hand, XML parsers have been rather aggressively optimized so parsers like Aalto get pretty decent throughput, in 10s of megabytes per second per core. It would be nice if YAML part could be improved. - You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#67 (comment) mment-378767743> , or mute the thread <https://github.com/notifications/unsubscribe-auth/AA6iPxUcSd1hM5VyfzHiW 9ZaqcT6l6JLks5tlUwcgaJpZM4R3EVP> . <https://github.com/notifications/beacon/AA6iP62050QDKvbGURyVhfEvhZIs3MK Zks5tlUwcgaJpZM4R3EVP.gif>

cowtowncoder · 2018-04-05T02:59:00Z

Right. And one problem with YAML is that the rules for decoding are rather complicated (specification is huge, and with all the variations to support wiki-style, or not, it gets hairy quick), which makes it much more work to optimize.
But I don't think that's where most of the time is spent right now: I think it's just SnakeYAML using quite complex processing pipeline.

As long as YAML is mostly used for configuration handling this isn't much of a problem.
But it could be a problem for anyone using YAML for data interchange (request/response).

chrisco484 · 2018-04-05T08:32:22Z

yeah, i was thinking of using it for data integration/migration because of Yaml's smaller serialised output.

…

________________________________ From: Tatu Saloranta [mailto:notifications@github.com] Sent: Thursday, 5 April 2018 12:59 PM To: FasterXML/jackson-dataformats-text Cc: Chris Colman; Mention Subject: Re: [FasterXML/jackson-dataformats-text] (yaml) Possible performance regression in snakeyaml 1.18 (#67) Right. And one problem with YAML is that the rules for decoding are rather complicated (specification is huge, and with all the variations to support wiki-style, or not, it gets hairy quick), which makes it much more work to optimize. But I don't think that's where most of the time is spent right now: I think it's just SnakeYAML using quite complex processing pipeline. As long as YAML is mostly used for configuration handling this isn't much of a problem. But it could be a problem for anyone using YAML for data interchange (request/response). - You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#67 (comment) mment-378806463> , or mute the thread <https://github.com/notifications/unsubscribe-auth/AA6iP6u8pz8JDh3EY4fq4 6JSCKrdKrr8ks5tlYh1gaJpZM4R3EVP> . <https://github.com/notifications/beacon/AA6iP1KaINMBPNTXCEL1O9wp7xveA_A kks5tlYh1gaJpZM4R3EVP.gif>

cowtowncoder · 2018-04-12T00:00:25Z

@tlrx Looks like 1.20 is out. Not sure if that differs from 1.19 in this aspect; based on discussion it sounds like probably not?

yborovikov · 2018-04-25T19:08:53Z

fyi: snakeyaml 1.20 introduced a breaking (for jackson 2.9.5) change in this commit https://bitbucket.org/asomov/snakeyaml/commits/6cb855839ebc1f3796cff26aa6e5423f1395b975.

java.lang.NoSuchMethodError: org.yaml.snakeyaml.events.MappingStartEvent.<init>(Ljava/lang/String;Ljava/lang/String;ZLorg/yaml/snakeyaml/error/Mark;Lorg/yaml/snakeyaml/error/Mark;Ljava/lang/Boolean;)V

    at com.fasterxml.jackson.dataformat.yaml.YAMLGenerator.writeStartObject(YAMLGenerator.java:489)
    at com.fasterxml.jackson.core.base.GeneratorBase.writeStartObject(GeneratorBase.java:286)
    at com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:626)
    at com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:33)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
    at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3893)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3207)
    ...

cc: @asomov

40ft · 2018-05-24T07:30:21Z

I just ran into this performance problem (parsing a 30KB string on Android) and spent a couple of hours looking at the snakeyaml code, before finding my way here.

The main problem in 1.18 is a O(n-squared) loop in the Reader.peek() function, which has indeed been completely rewritten and fixed in later versions. Forcing snakeyaml to 1.21 (with jackson 2.9.5) in my build gives acceptable parsing performance on Android. It does break serialization though, as mentioned above.

Generally speaking, there is a lot of seemingly unnecessary string buffering/duplication/copying going on in the snakeyaml parsing code (optimisation is hard!), so I am not at all surprised if performance is bad.

asomov · 2018-05-24T07:55:01Z

@chrisco484 : the YAML 1.2 spec is very big. Chomping, flow styles, scalar styles (especially the folded style), implicit types, anchors/aliases/recursive structures, complex keys are not present in XML and they are not often used. Nevertheless the parser must always take care of all these details.

asomov · 2018-05-24T07:56:59Z

@yborovikov: can I help to adapt jackson 2.9.5 to work with SnakeYAML 1.21 ?

cowtowncoder · 2018-05-24T21:10:05Z

Quick note: next patch version would be 2.9.6, but I have decided that there probably should be minor upgrade of 2.10 for Jackson, before bigger changes in 3.0.

I have no objections to changes in 2.9(.6) that allow use of newer version (and dependency to newer version), as long as change would still work with older version(s) that Jackson 2.9 has relied on.
This because users sometimes have slightly inconsistent set of dependencies, and I don't want to add risk in patch versions.

For 2.10, new version should definitely be used, so above caveat is only wrt 2.9.x patches.

yborovikov · 2018-05-29T16:39:10Z

@asomov Theoretically, reinstating the ScalarEvent(String anchor, String tag, ImplicitTuple implicit, String value, Mark startMark, Mark endMark, Character style) constructor (and marking it as @Deprecated) should restore snakeyaml's compatibility with jackson's YAMLGenerator.

This way Jackson 2.9.x would be able to use both older and newer versions of snakeyaml (with the exception of 1.20 and 1.21).

asomov · 2018-05-29T17:08:51Z

I am confused now. I thought that this issue is all about migrating to SnakeYAML 1.21

yborovikov · 2018-05-29T17:25:07Z

@asomov Well, migrating to 1.21 is fairly simple and is covered by this PR: https://github.com/FasterXML/jackson-dataformats-text/pull/82/files

However, it would prevent consumers from using SnakeYAML 1.19 (and older - which might be desirable for some people due to the alleged performance regression) with the newer Jackson 2.9.x.

There's an option of dynamically finding the right constructor at runtime (to maintain compatibility with older SnakeYAML versions as @cowtowncoder requested above), yet, arguably, it's a bit ugly.

So, simply applying #82 and releasing it as a part of Jackson 2.10 might be what the maintainers prefer. Ultimately, I'm confused now, too :)

cowtowncoder · 2018-09-26T04:46:57Z

SnakeYAML version was upgraded, as per #81, so I think this is resolved now.

cowtowncoder added the 2.9 label Feb 2, 2018

cowtowncoder added the yaml Issue related to YAML format backend label Apr 11, 2018

yborovikov mentioned this issue Apr 25, 2018

Fix #81 (update snakeyaml version to 1.21) #82

Closed

tlrx mentioned this issue Jun 20, 2018

snakeyaml conflict with Spring Boot 2.0.2.RELEASE and up elastic/elasticsearch#31459

Closed

cowtowncoder closed this as completed Sep 26, 2018

skarsaune mentioned this issue Apr 7, 2021

Do not rely on whatever kubernetes context is current at the client (allow specifying context) jolokia/jolokia#453

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(yaml) Possible performance regression in snakeyaml 1.18 #67

(yaml) Possible performance regression in snakeyaml 1.18 #67

tlrx commented Feb 2, 2018 •

edited

Loading

cowtowncoder commented Feb 2, 2018

chrisco484 commented Apr 4, 2018

cowtowncoder commented Apr 4, 2018

chrisco484 commented Apr 5, 2018 via email

cowtowncoder commented Apr 5, 2018

chrisco484 commented Apr 5, 2018 via email

cowtowncoder commented Apr 12, 2018 •

edited

Loading

yborovikov commented Apr 25, 2018 •

edited

Loading

40ft commented May 24, 2018

asomov commented May 24, 2018 •

edited

Loading

asomov commented May 24, 2018

cowtowncoder commented May 24, 2018

yborovikov commented May 29, 2018

asomov commented May 29, 2018

yborovikov commented May 29, 2018

cowtowncoder commented Sep 26, 2018

(yaml) Possible performance regression in snakeyaml 1.18 #67

(yaml) Possible performance regression in snakeyaml 1.18 #67

Comments

tlrx commented Feb 2, 2018 • edited Loading

cowtowncoder commented Feb 2, 2018

chrisco484 commented Apr 4, 2018

cowtowncoder commented Apr 4, 2018

chrisco484 commented Apr 5, 2018 via email

cowtowncoder commented Apr 5, 2018

chrisco484 commented Apr 5, 2018 via email

cowtowncoder commented Apr 12, 2018 • edited Loading

yborovikov commented Apr 25, 2018 • edited Loading

40ft commented May 24, 2018

asomov commented May 24, 2018 • edited Loading

asomov commented May 24, 2018

cowtowncoder commented May 24, 2018

yborovikov commented May 29, 2018

asomov commented May 29, 2018

yborovikov commented May 29, 2018

cowtowncoder commented Sep 26, 2018

tlrx commented Feb 2, 2018 •

edited

Loading

cowtowncoder commented Apr 12, 2018 •

edited

Loading

yborovikov commented Apr 25, 2018 •

edited

Loading

asomov commented May 24, 2018 •

edited

Loading