Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional Regular Expression cache to MatchesOperator #5837

Merged
merged 3 commits into from
Apr 16, 2024
Merged

Add optional Regular Expression cache to MatchesOperator #5837

merged 3 commits into from
Apr 16, 2024

Conversation

JaredDavis22
Copy link
Contributor

Improves performance of matches operator by optionally caching the Pattern instances compiled from the regular expression.

To use the cache add the following property before MatchesOperator is initialized.

System.setProperty("drools.matches.compiled.cache.count", "100");

I did not know how to create a test case that would run for both cases where the system property "drools.matches.compiled.cache.count" is 0 and is a positive number > 0.

@baldimir
Copy link
Contributor

Hi, thanks for the PR. Do you have please some measurements for the performance improvement?

@JaredDavis22
Copy link
Contributor Author

Hi, thanks for the PR. Do you have please some measurements for the performance improvement?

Hi - nothing specific. A sample ide run went from 16 seconds to 14 seconds for me on a simple application with some matches. The approach to avoid .matches in a loop is well known. https://www.baeldung.com/java-regex-performance

@JaredDavis22
Copy link
Contributor Author

The test that did not complete on jenkins works fine locally. Is this an issue with the code I changed?

[2024-04-10T15:27:28.988Z] [INFO] Running org.drools.mvel.integrationtests.concurrency.ConcurrentInsertionsToSubnetworksTest
[2024-04-10T15:27:50.781Z] Cannot contact jenkins-shared-ubuntu-2: java.lang.InterruptedException
[2024-04-10T15:30:51.697Z] [INFO] apache/incubator-kie-drools failed. Won't execute remaining commands and projects

@tkobayas
Copy link
Contributor

@JaredDavis22 Thank you for the PR! Below, some suggestions.

@@ -0,0 +1,23 @@
package org.drools.model.operators;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add Apache License header just like other java sources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - will change

// Regular expression is second parameter
assertThat(instance.eval("a","a")).isTrue();
assertThat(instance.eval("a","b")).isFalse();
assertThat(instance.eval("a","a")).isTrue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 13 and 15 are identical. Is it intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes back to my bad test case. I do not know how to create a test case that would run for both cases where the system property "drools.matches.compiled.cache.count" is 0 and is a positive number > 0 with the one INSTANCE. I see you did it with the benchmark, but that uses jmh. I manually ran the test case with the system property set to 100. When the cache is used row 15 tests the previously stored Pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made these changes to improve the tests but did not commit them. Are they better/uglier/? Test looks better, main class looks worse and no longer inlines the size test in the maps removeEldestEntry.

public enum MatchesOperator implements Operator.SingleValue<String, String> {

    INSTANCE;

    private static final String CACHE_MATCHES_COMPILED_MAX_PROPERTY = "drools.matches.compiled.cache.count";

    // default to 0 for no cache, not final for unit tests
    private static int MAX_SIZE_CACHE = Integer.parseInt(System.getProperty(CACHE_MATCHES_COMPILED_MAX_PROPERTY, "0"));

    // store Pattern for regular expressions using the regular expression as the key up to MAX_SIZE_CACHE entries.
    public final Map<String, Pattern> patternMap = Collections.synchronizedMap(new LinkedHashMap<>() {
        @Override
        protected boolean removeEldestEntry( Map.Entry<String, Pattern> eldest ) {
            return size() > (MAX_SIZE_CACHE);
        }
    });

    // for unit testing
    public void forceCacheSize(int size) {
        MAX_SIZE_CACHE = size;
        patternMap.clear();
    }
    // for unit testing
    public void reInitialize() {
        forceCacheSize(Integer.parseInt(System.getProperty(CACHE_MATCHES_COMPILED_MAX_PROPERTY, "0")));
    }

public class MatchesOperatorTest {


    @Test
    public void testMatchesOperatorCache() {
        MatchesOperator instance = MatchesOperator.INSTANCE;
        instance.forceCacheSize(100);

        // s1 maybe null
        assertThat(instance.eval(null,"anything")).isFalse();

        // cache enabled
        // Regular expression is second parameter
        assertThat(instance.eval("a","a")).isTrue();
        assertThat(instance.eval("a","b")).isFalse();
        assertThat(instance.eval("a","a")).isTrue(); // regular expression "a" in map.
        assertThat(instance.eval("b","b")).isTrue(); // regular expression "b" in map.
        assertThat(instance.eval("c","b")).isFalse(); // regular expression "b" in map.
    }

    @Test
    public void testMatchesOperatorNoCache() {
        MatchesOperator instance = MatchesOperator.INSTANCE;
        instance.forceCacheSize(0);
        // s1 maybe null
        assertThat(instance.eval(null,"anything")).isFalse();
        assertThat(instance.eval("a","a")).isTrue();
        assertThat(instance.eval("a","b")).isFalse();
        assertThat(instance.eval("b","b")).isTrue();
    }

    @After
    public void resetCache() {
        MatchesOperator instance = MatchesOperator.INSTANCE;
        instance.reInitialize();
    }


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JaredDavis22 Thanks. Yes, having test purpose methods is good to manage such static field conditions. Furthermore, you can make forceCacheSize and reInitialize package private (removing public) because MatchesOperator and MatchesOperatorTest are in the same package.

@tkobayas
Copy link
Contributor

The test that did not complete on jenkins works fine locally. Is this an issue with the code I changed?

The error seems to be a jenkins environment issue. We can ignore.

@tkobayas
Copy link
Contributor

Retrospectively, filed a github issue: #5841

@JaredDavis22 Please file a github issue when you raise a PR. Thanks!

@tkobayas
Copy link
Contributor

I created a benchmark PR: apache/incubator-kie-benchmarks#283

Local results:

Benchmark              (_factsNumber)  (_rulesNumber)  (cacheEnabled)  Mode  Cnt  Score   Error  Units
MatchesBenchmark.test              32              16            true    ss   20  2.389 ± 0.278  ms/op
MatchesBenchmark.test              32              16           false    ss   20  2.157 ± 0.439  ms/op
MatchesBenchmark.test              32              32            true    ss   20  2.627 ± 0.517  ms/op
MatchesBenchmark.test              32              32           false    ss   20  3.466 ± 0.822  ms/op
MatchesBenchmark.test             256              16            true    ss   20  2.650 ± 0.521  ms/op
MatchesBenchmark.test             256              16           false    ss   20  4.716 ± 2.079  ms/op
MatchesBenchmark.test             256              32            true    ss   20  4.852 ± 0.945  ms/op
MatchesBenchmark.test             256              32           false    ss   20  7.036 ± 1.410  ms/op

Notable improvement with cacheEnabled=true especially with many facts.

@tkobayas tkobayas requested a review from baldimir April 15, 2024 05:13
public enum MatchesOperator implements Operator.SingleValue<String, String> {

INSTANCE;

// not final due to unit tests
private int MAX_SIZE_CACHE = getMaxSizeCache();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this field and also the Map be static?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we have a singleton only usage since it is an enum. Does that make all class variables in a sense static? Want me to change it? I did not benchmark it either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed them to static and pushed

@tkobayas
Copy link
Contributor

tkobayas commented Apr 16, 2024

GHA windows : Flaky test, ignore for this PR (I filed #5847 I'll investigate it, but it doesn't block this PR)

2024-04-15T15:46:02.5654413Z [ERROR] Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.26 s <<< FAILURE! -- in org.drools.mvel.integrationtests.KieContainerTest^M
2024-04-15T15:46:02.5991469Z [ERROR] org.drools.mvel.integrationtests.KieContainerTest.testIncrementalCompilationSynchronization[KieBase type=CLOUD_IDENTITY] -- Time elapsed: 20.02 s <<< ERROR!^M
2024-04-15T15:46:02.5993926Z org.junit.runners.model.TestTimedOutException: test timed out after 20000 milliseconds^M
2024-04-15T15:46:02.5995481Z    at app//org.drools.core.common.ConcurrentNodeMemories.getNodeMemory(ConcurrentNodeMemories.java:91)^M
2024-04-15T15:46:02.5997472Z    at app//org.drools.core.phreak.RuntimeSegmentUtilities.updateRiaAndTerminalMemory(RuntimeSegmentUtilities.java:174)^M
2024-04-15T15:46:02.5999561Z    at app//org.drools.core.phreak.RuntimeSegmentUtilities.restoreSegmentFromPrototype(RuntimeSegmentUtilities.java:102)^M
2024-04-15T15:46:02.6001565Z    at app//org.drools.core.phreak.RuntimeSegmentUtilities.getOrCreateSegmentMemory(RuntimeSegmentUtilities.java:66)^M
2024-04-15T15:46:02.6003146Z    at app//org.drools.core.common.Memory.getOrCreateSegmentMemory(Memory.java:38)^M
2024-04-15T15:46:02.6004585Z    at app//org.drools.core.reteoo.LeftInputAdapterNode.doInsertObject(LeftInputAdapterNode.java:174)^M
2024-04-15T15:46:02.6006133Z    at app//org.drools.core.reteoo.LeftInputAdapterNode.assertObject(LeftInputAdapterNode.java:162)^M
2024-04-15T15:46:02.6007938Z    at app//org.drools.core.reteoo.SingleObjectSinkAdapter.propagateAssertObject(SingleObjectSinkAdapter.java:73)^M
2024-04-15T15:46:02.6009589Z    at app//org.drools.core.reteoo.ObjectTypeNode.propagateAssert(ObjectTypeNode.java:216)^M
2024-04-15T15:46:02.6010921Z    at app//org.drools.core.reteoo.ObjectTypeNode.assertInitialFact(ObjectTypeNode.java:184)^M
2024-04-15T15:46:02.6012621Z    at app//org.drools.kiesession.session.StatefulKnowledgeSessionImpl.initInitialFact(StatefulKnowledgeSessionImpl.java:599)^M
2024-04-15T15:46:02.6014544Z    at app//org.drools.kiesession.session.StatefulKnowledgeSessionImpl.<init>(StatefulKnowledgeSessionImpl.java:351)^M
2024-04-15T15:46:02.6016417Z    at app//org.drools.kiesession.session.StatefulKnowledgeSessionImpl.<init>(StatefulKnowledgeSessionImpl.java:276)^M
2024-04-15T15:46:02.6018387Z    at app//org.drools.kiesession.factory.PhreakWorkingMemoryFactory.createWorkingMemory(PhreakWorkingMemoryFactory.java:40)^M
2024-04-15T15:46:02.6020567Z    at app//org.drools.kiesession.factory.RuntimeComponentFactoryImpl.createStatefulSession(RuntimeComponentFactoryImpl.java:99)^M
2024-04-15T15:46:02.6038425Z    at app//org.drools.kiesession.rulebase.SessionsAwareKnowledgeBase.newKieSession(SessionsAwareKnowledgeBase.java:187)^M
2024-04-15T15:46:02.6040453Z    at app//org.drools.kiesession.rulebase.SessionsAwareKnowledgeBase.newKieSession(SessionsAwareKnowledgeBase.java:165)^M
2024-04-15T15:46:02.6042321Z    at app//org.drools.compiler.kie.builder.impl.KieContainerImpl.newKieSession(KieContainerImpl.java:615)^M
2024-04-15T15:46:02.6043919Z    at app//org.drools.compiler.kie.builder.impl.KieContainerImpl.newKieSession(KieContainerImpl.java:533)^M
2024-04-15T15:46:02.6045513Z    at app//org.drools.compiler.kie.builder.impl.KieContainerImpl.newKieSession(KieContainerImpl.java:516)^M
2024-04-15T15:46:02.6047558Z    at app//org.drools.mvel.integrationtests.KieContainerTest.testIncrementalCompilationSynchronization(KieContainerTest.java:280)^M

Copy link
Contributor

@tkobayas tkobayas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@tkobayas tkobayas merged commit ac6b6fe into apache:main Apr 16, 2024
9 checks passed
@JaredDavis22 JaredDavis22 deleted the matchesCached branch April 17, 2024 12:40
rgdoliveira pushed a commit to rgdoliveira/drools that referenced this pull request Apr 23, 2024
* Add optional Regular Expression cache to MatchesOperator

* Improve test cases, variable names and comments.

* Added static to two fields
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants