-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optional Regular Expression cache to MatchesOperator #5837
Conversation
Hi, thanks for the PR. Do you have please some measurements for the performance improvement? |
Hi - nothing specific. A sample ide run went from 16 seconds to 14 seconds for me on a simple application with some matches. The approach to avoid .matches in a loop is well known. https://www.baeldung.com/java-regex-performance |
The test that did not complete on jenkins works fine locally. Is this an issue with the code I changed?
|
@JaredDavis22 Thank you for the PR! Below, some suggestions. |
@@ -0,0 +1,23 @@ | |||
package org.drools.model.operators; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add Apache License header just like other java sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - will change
// Regular expression is second parameter | ||
assertThat(instance.eval("a","a")).isTrue(); | ||
assertThat(instance.eval("a","b")).isFalse(); | ||
assertThat(instance.eval("a","a")).isTrue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 13 and 15 are identical. Is it intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes back to my bad test case. I do not know how to create a test case that would run for both cases where the system property "drools.matches.compiled.cache.count" is 0 and is a positive number > 0 with the one INSTANCE. I see you did it with the benchmark, but that uses jmh. I manually ran the test case with the system property set to 100. When the cache is used row 15 tests the previously stored Pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made these changes to improve the tests but did not commit them. Are they better/uglier/? Test looks better, main class looks worse and no longer inlines the size test in the maps removeEldestEntry.
public enum MatchesOperator implements Operator.SingleValue<String, String> {
INSTANCE;
private static final String CACHE_MATCHES_COMPILED_MAX_PROPERTY = "drools.matches.compiled.cache.count";
// default to 0 for no cache, not final for unit tests
private static int MAX_SIZE_CACHE = Integer.parseInt(System.getProperty(CACHE_MATCHES_COMPILED_MAX_PROPERTY, "0"));
// store Pattern for regular expressions using the regular expression as the key up to MAX_SIZE_CACHE entries.
public final Map<String, Pattern> patternMap = Collections.synchronizedMap(new LinkedHashMap<>() {
@Override
protected boolean removeEldestEntry( Map.Entry<String, Pattern> eldest ) {
return size() > (MAX_SIZE_CACHE);
}
});
// for unit testing
public void forceCacheSize(int size) {
MAX_SIZE_CACHE = size;
patternMap.clear();
}
// for unit testing
public void reInitialize() {
forceCacheSize(Integer.parseInt(System.getProperty(CACHE_MATCHES_COMPILED_MAX_PROPERTY, "0")));
}
public class MatchesOperatorTest {
@Test
public void testMatchesOperatorCache() {
MatchesOperator instance = MatchesOperator.INSTANCE;
instance.forceCacheSize(100);
// s1 maybe null
assertThat(instance.eval(null,"anything")).isFalse();
// cache enabled
// Regular expression is second parameter
assertThat(instance.eval("a","a")).isTrue();
assertThat(instance.eval("a","b")).isFalse();
assertThat(instance.eval("a","a")).isTrue(); // regular expression "a" in map.
assertThat(instance.eval("b","b")).isTrue(); // regular expression "b" in map.
assertThat(instance.eval("c","b")).isFalse(); // regular expression "b" in map.
}
@Test
public void testMatchesOperatorNoCache() {
MatchesOperator instance = MatchesOperator.INSTANCE;
instance.forceCacheSize(0);
// s1 maybe null
assertThat(instance.eval(null,"anything")).isFalse();
assertThat(instance.eval("a","a")).isTrue();
assertThat(instance.eval("a","b")).isFalse();
assertThat(instance.eval("b","b")).isTrue();
}
@After
public void resetCache() {
MatchesOperator instance = MatchesOperator.INSTANCE;
instance.reInitialize();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JaredDavis22 Thanks. Yes, having test purpose methods is good to manage such static field conditions. Furthermore, you can make forceCacheSize
and reInitialize
package private (removing public
) because MatchesOperator
and MatchesOperatorTest
are in the same package.
The error seems to be a jenkins environment issue. We can ignore. |
Retrospectively, filed a github issue: #5841 @JaredDavis22 Please file a github issue when you raise a PR. Thanks! |
I created a benchmark PR: apache/incubator-kie-benchmarks#283 Local results:
Notable improvement with |
public enum MatchesOperator implements Operator.SingleValue<String, String> { | ||
|
||
INSTANCE; | ||
|
||
// not final due to unit tests | ||
private int MAX_SIZE_CACHE = getMaxSizeCache(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this field and also the Map be static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we have a singleton only usage since it is an enum. Does that make all class variables in a sense static? Want me to change it? I did not benchmark it either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed them to static and pushed
GHA windows : Flaky test, ignore for this PR (I filed #5847 I'll investigate it, but it doesn't block this PR)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
* Add optional Regular Expression cache to MatchesOperator * Improve test cases, variable names and comments. * Added static to two fields
Improves performance of matches operator by optionally caching the Pattern instances compiled from the regular expression.
To use the cache add the following property before MatchesOperator is initialized.
System.setProperty("drools.matches.compiled.cache.count", "100");
I did not know how to create a test case that would run for both cases where the system property "drools.matches.compiled.cache.count" is 0 and is a positive number > 0.