-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BETWEEN support to NumericalFilterOptimizer #14163
Merged
Jackie-Jiang
merged 6 commits into
apache:master
from
yashmayya:between-numerical-filter-optimizer
Oct 11, 2024
Merged
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
f7a17ef
Add BETWEEN support to NumericalFilterOptimizer
yashmayya 41c3f48
Add check to query generator ensuring that BETWEEN on numerical liter…
yashmayya aafeedd
Remove filter prune for BETWEEN to avoid breaking tests
yashmayya 772c564
Cleanup conditional logic in NumericalFilterOptimizer::optimizeChild
yashmayya 337a088
Address review comments
yashmayya 454fa24
Add TODO to consider unifying rewrite logic for range and between exp…
yashmayya File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this is a reduced version of the existing
rewriteRangeExpression()
. To keep the behavior consistent, are we able to extract out the logic of rewriting lower bound and upper bound, then use the common logic for bothrewriteRangeExpression()
andrewriteBetweenExpression()
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider consolidating the two but the major difference is that in
rewriteRangeExpression
we're rewriting the operators too (and avoiding manually modifying the literal value other than the cast) which we don't want to do forBETWEEN
(as discussed in #14111). The other alternative would be to changerewriteRangeExpression
to avoid rewriting the operators but I thought it'd be better not to change the behavior of the existing optimizer?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part I still don't follow. Do you mean we should not rewrite
BETWEEN
the same way as other range filters, or is it too complicated? As long as we computed lower and upper bound, we should be able to assemble it back to aBETWEEN
.It is also fine to do it separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, basically this. For instance, taking
intCol >= 2.5
as an example.2.5
is cast to2
(int), and then the>=
is rewritten to>
becauseactual - converted > 0
resulting inintCol > 2
. ForBETWEEN
, we want to instead rewriteintCol BETWEEN 2.5 AND y
tointCol BETWEEN 3 AND y
. We could change the logic for regular range filter to rewriteintCol >= 2.5
tointCol >= 3
instead to match theBETWEEN
rewrite logic - is that what you're suggesting? There are some other differences too though. For instance,floatCol < longLiteral
can be rewritten tofloatCol <= castedFloatLiteral
depending on the comparison betweenlongLiteral
andcastedFloatLiteral
. We can't do the same forBETWEEN
though, and we simply skip any conversion in these cases, allowing the server to do the cast. Given these differences, it seemed better overall to keep these rewrites separate, what do you think?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation. Ideally we want the behavior to be consistent across all range filters. Given the difficulties and the scope of this PR, let's add a TODO to revisit in the future