-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new rule PinotSeminJoinDistinctProjectRule to apply a distinct to a semi join right side project #14758
Add a new rule PinotSeminJoinDistinctProjectRule to apply a distinct to a semi join right side project #14758
Conversation
6ff9572
to
df34109
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14758 +/- ##
============================================
+ Coverage 61.75% 63.87% +2.12%
- Complexity 207 1612 +1405
============================================
Files 2436 2705 +269
Lines 133233 150995 +17762
Branches 20636 23323 +2687
============================================
+ Hits 82274 96445 +14171
- Misses 44911 47336 +2425
- Partials 6048 7214 +1166
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
df34109
to
18efb4f
Compare
Will this rule prevent us from doing:
We don't always want to pay overhead of distinct when we already know the value is unique, or there are very few duplicates. Even without distinct, the result is still correct, just not necessary the most efficient way to execute. For the example query, in order to achieve the desired query plan, we can write it as:
|
I agree with you on the rewrite part. Technically this should be controllable or we may use a hint to turn on/off the rule? |
56cdf1e
to
753eef2
Compare
Amended this PR to add a custom rule to add the |
Only the queries with right |
I'm not sure if this hint is useful. If user knows to add this hint, they should just write the query as semi join |
The query mentioned in the PR description is already planned as a Semi join in the query plan. |
From user perspective, adding rule is definitely simpler than rewrite query. And this kind of thing cqn only work for power users who knows how to check query plans and rewrite queries as well.
|
Because it matches the conditions above: right side of JOIN has distinct values; there is no project to right side after join.
What I meant is that currently in order to apply the rule, user needs to provide a query hint. From my own experience, knowing how to apply the hint is harder than rewriting the query. |
...nner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotSeminJoinDistinctProjectRule.java
Outdated
Show resolved
Hide resolved
When there is no project to the right table, there are 4 ways to execute:
Ideally, we want to able to control which one to use because each of them might be best for certain use cases. |
753eef2
to
919c784
Compare
…to a semi join right side project
919c784
to
83870e1
Compare
Add a new rule
PinotSeminJoinDistinctProjectRule
to apply a distinct to a semi join right side project using hint:append_distinct_to_semi_join_project='true'
The goal is to apply a distinct on the Semi Join Right side Project Node.
E.g. in
ColocatedJoinEngineQuickStart
:the query:
The old query plan is:
The new query plan is: