Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OF-2628: Avoid polynomial regular expression used on uncontrolled data #2221

Merged

Conversation

guusdk
Copy link
Member

@guusdk guusdk commented Jul 18, 2023

Prevent possibility of regular expression execution depending on user-provided values taking exceptionally long / many resources to complete

This commit replaces the regular expression with a solution that doesn't use regular expressions. Arguably, the complexity of this code does not change much.

Prevent possibility of regular expression execution depending on user-provided values taking exceptionally long / many resources to complete

This commit replaces the regular expression with a solution that doesn't use regular expressions. Arguably, the complexity of this code does not change much.
@guusdk
Copy link
Member Author

guusdk commented Jul 18, 2023

Is this a solution that covers all edge cases, and is roughly not much worse in performance as compared to the original?

@Fishbowler
Copy link
Member

I wonder if we can test efficacy AND performance with a unit test pre- and post-change

Copy link
Contributor

@GregDThomas GregDThomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and is better in that it allows hex char references.

I can't offer any opinion on performance, other than perhaps suggest it probably doesn't matter too much unless it's awful, which I doubt?

@Fishbowler
Copy link
Member

Added some unit tests and ran them on main and on this branch a bunch of times via mvn -Dtest=XMLLightweightParserTest#testHasIllegalCharacterReferences* test - sometimes one is faster, sometimes the other. There's no perceivable difference.

@Fishbowler
Copy link
Member

I won't merge unit Guus gives my unit tests a look. They're really very brief...

@GregDThomas
Copy link
Contributor

Might want to replace @test with @RepeatedTest(10_000) - or some other number, but they look pretty quick to run - to get a bigger sample for timing purposes.

@guusdk
Copy link
Member Author

guusdk commented Jul 22, 2023

The original should also have allowed for hex-based character references - but the fact that that wasn't clear is probably a good indication that this change isn't the worst, from a readability perspective.

As for the test:

  • /me points at org.jivesoftware.openfire.nio.XmlNumericCharacterReferenceTest
  • Although it's good to get some anecdotal evidence, I don't think we should use unit testing for automated load comparison, to be honest.

@Fishbowler
Copy link
Member

Thanks Guus. I totally didn't spot those, and they're much better functional coverage. Reverted.

I agree with not using unit tests for automated load comparison in the general case. It requires constrained and understood code paths and an environment where the performance variables are somewhat understood. Understanding the results are also important (e.g. one run with a 10% isn't interesting, but a consistent 10% execution time change very much is).

@Fishbowler Fishbowler merged commit 9fea45d into igniterealtime:main Jul 25, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants