Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix v1 query engine behavior for aggregations without group by where the limit is zero #13564

Merged
merged 1 commit into from
Dec 24, 2024

Conversation

yashmayya
Copy link
Collaborator

  • Aggregation queries without GROUP BY clauses typically aren't used with a LIMIT clause since it semantically doesn't make sense.
  • However, it isn't considered invalid, and most databases (including Postgres and MySQL) return the same aggregation results (i.e., over the entire table) regardless of what the LIMIT is - except if the LIMIT is 0, in which case an empty result set is returned.
  • The v1 query engine in Pinot follows the former but not the latter - i.e., it returns the same aggregation results (i.e., over the entire table) regardless of what the LIMIT is (even if it is 0).
  • This patch standardizes the Pinot behavior in this case by introducing an EmptyAggregationOperator (somewhat similar to the EmptySelectionOperator):
> EXPLAIN PLAN FOR SELECT SUM(ArrDelay) FROM airlineStats LIMIT 0;

PLAN_START(numSegmentsForThisPlan:31)
BROKER_REDUCE(limit:0)
COMBINE_AGGREGATE
AGGREGATE_EMPTY
> SELECT SUM(ArrDelay) FROM airlineStats LIMIT 0;

{"resultTable":{"dataSchema":{"columnNames":["sum(ArrDelay)"],"columnDataTypes":["DOUBLE"]},"rows":[]},"numRowsResultSet":0,"partialResult":false,"exceptions":[],"numGroupsLimitReached":false,"timeUsedMs":3,"requestId":"11345490000000018","brokerId":"Broker_192.168.29.25_8000","numDocsScanned":0,"totalDocs":9746,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":0,"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":31,"numSegmentsProcessed":31,"numSegmentsMatched":0,"numConsumingSegmentsQueried":0,"numConsumingSegmentsProcessed":0,"numConsumingSegmentsMatched":0,"minConsumingFreshnessTimeMs":0,"numSegmentsPrunedByBroker":0,"numSegmentsPrunedByServer":0,"numSegmentsPrunedInvalid":0,"numSegmentsPrunedByLimit":0,"numSegmentsPrunedByValue":0,"brokerReduceTimeMs":0,"offlineThreadCpuTimeNs":0,"realtimeThreadCpuTimeNs":0,"offlineSystemActivitiesCpuTimeNs":0,"realtimeSystemActivitiesCpuTimeNs":0,"offlineResponseSerializationCpuTimeNs":0,"realtimeResponseSerializationCpuTimeNs":0,"offlineTotalCpuTimeNs":0,"realtimeTotalCpuTimeNs":0,"explainPlanNumEmptyFilterSegments":0,"explainPlanNumMatchAllFilterSegments":0,"traceInfo":{}}
> SET useMultiStageEngine = true; EXPLAIN PLAN FOR SELECT SUM(ArrDelay) FROM airlineStats LIMIT 0;

Execution Plan
LogicalValues(tuples=[[]])

@codecov-commenter
Copy link

codecov-commenter commented Jul 9, 2024

Codecov Report

Attention: Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

Project coverage is 62.09%. Comparing base (59551e4) to head (6184918).
Report is 731 commits behind head on master.

Files Patch % Lines
.../core/operator/query/EmptyAggregationOperator.java 0.00% 8 Missing ⚠️
...erator/blocks/results/AggregationResultsBlock.java 0.00% 1 Missing and 2 partials ⚠️
.../combine/merger/AggregationResultsBlockMerger.java 0.00% 1 Missing and 1 partial ⚠️
...rg/apache/pinot/core/plan/AggregationPlanNode.java 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13564      +/-   ##
============================================
+ Coverage     61.75%   62.09%   +0.34%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2559     +123     
  Lines        133233   140935    +7702     
  Branches      20636    21870    +1234     
============================================
+ Hits          82274    87518    +5244     
- Misses        44911    46794    +1883     
- Partials       6048     6623     +575     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.05% <0.00%> (+0.34%) ⬆️
java-21 61.98% <0.00%> (+0.35%) ⬆️
skip-bytebuffers-false 62.07% <0.00%> (+0.33%) ⬆️
skip-bytebuffers-true 61.94% <0.00%> (+34.21%) ⬆️
temurin 62.09% <0.00%> (+0.34%) ⬆️
unittests 62.09% <0.00%> (+0.34%) ⬆️
unittests1 46.67% <0.00%> (-0.22%) ⬇️
unittests2 27.65% <0.00%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For aggregation function without group-by, we should be able to short circuit it on the broker side and always get the correct data schema.
For selection/group-by, ideally we also want to short circuit on the broker side since schema is already enforced. One missing piece is to derive the transform result type based on the input type.
You may also take a look at #13057 and we can probably solve them together.

@siddharthteotia
Copy link
Contributor

@jackjlli - Can you help review this please ?

Copy link
Member

@jackjlli jackjlli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on regardless of whether it's an aggregate query, selection query, order-by query or not, if the limit is set to 0, we can always short circuit it on broker side to fail fast.

Copy link
Contributor

@gortiz gortiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although #13057 and #13831 may superseded this PR, this PR should fix one of the issues. As a note for the other two PRs, please study whether changes in this PR are still needed once the other two are ready to merge.

Copy link
Contributor

@gortiz gortiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although #13057 and #13831 may superseded this PR, this PR should fix one of the issues. As a note for the other two PRs, please study whether changes in this PR are still needed once the other two are ready to merge.

@gortiz gortiz merged commit b398fed into apache:master Dec 24, 2024
20 checks passed
@Jackie-Jiang Jackie-Jiang added backward-incompat Referenced by PRs that introduce or fix backward compat issues query labels Dec 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backward-incompat Referenced by PRs that introduce or fix backward compat issues bugfix query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants