Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return improved dataschema for empty results when all segments are pruned by broker #13831

Merged
merged 2 commits into from
Jan 24, 2025

Conversation

vvivekiyer
Copy link
Contributor

When all segments are pruned on the server, we construct a empty responseBlock . As this response block is constructed without access to segment data, we default to STRING as the datatype for the column - here is the code link . This problem does not exist for pure Aggregation queries.

This PR fixes that by just processing one segment.

Added tests to verify the code changes.

@vvivekiyer vvivekiyer force-pushed the fix_empty_results_schema branch from 377ea73 to 514c218 Compare August 17, 2024 01:05
@Jackie-Jiang
Copy link
Contributor

#13057 (all segments pruned on broker) is similar to the problem solved by this PR. We should be able to construct the empty response based on the schema. Do you want to also help fix that?

@vvivekiyer
Copy link
Contributor Author

@Jackie-Jiang Sure, I can address it.

We have two options to solve it:

  1. If all segments are pruned on broker, send the query to 1 server with 1 segment ( with optimization to not send for aggregation only queries) . The code change already made in this PR will help return the correct schema. This will automatically handle deriving schema types for group-by, select, and transform result types. This will add some overhead to these empty queries but should be negligle.

  2. Construct an empty result-table and derive the data-types for the query using schema. For transform types, would need to look into how to get the type.

I'm thinking of going with option (1). Thoughts?

Additionally a followup could be to add a broker-side LIMIT_0_PRUNER that will help short-circuit faster instead of routing queries to all servers.

@Jackie-Jiang
Copy link
Contributor

1 is not always possible in the following cases:

  • There is no segment at all
  • There are only empty segments

I prefer 2 because we should be able to derive the data type for transform based on input types in order to be SQL compatible. If we cannot get that right now, we can leave a TODO and put a type as placeholder.

@jadami10
Copy link
Contributor

+1 to jackie's idea of leaving the column types as TODOs.

We've noticed internally that the bigger issue is we're strictly missing the result table. We don't even use the column types from Pinot since the code usually 1) assumes the types ahead of time or 2) infers it from the json result. But the column names and rows are typically assumed to be there.

@vvivekiyer
Copy link
Contributor Author

I'm working on the broker side changes as well. I'll create that as a separate PR.

@vvivekiyer
Copy link
Contributor Author

vvivekiyer commented Jan 22, 2025

Lost track of this PR.

Please take a look @jadami10 @gortiz @albertobastos

@vvivekiyer vvivekiyer force-pushed the fix_empty_results_schema branch from d430a8d to 04291d3 Compare January 22, 2025 21:59
@vvivekiyer vvivekiyer changed the title Return correct dataschema for empty results Return correct dataschema for empty results when all segments are pruned by broker Jan 22, 2025
@vvivekiyer vvivekiyer changed the title Return correct dataschema for empty results when all segments are pruned by broker Return improved dataschema for empty results when all segments are pruned by broker Jan 22, 2025
@vvivekiyer vvivekiyer force-pushed the fix_empty_results_schema branch from 04291d3 to ff95e23 Compare January 22, 2025 22:03
@codecov-commenter
Copy link

codecov-commenter commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 63.72%. Comparing base (59551e4) to head (96d2c89).
Report is 1617 commits behind head on master.

Files with missing lines Patch % Lines
...sthandler/BaseSingleStageBrokerRequestHandler.java 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13831      +/-   ##
============================================
+ Coverage     61.75%   63.72%   +1.96%     
- Complexity      207     1469    +1262     
============================================
  Files          2436     2708     +272     
  Lines        133233   151424   +18191     
  Branches      20636    23376    +2740     
============================================
+ Hits          82274    96489   +14215     
- Misses        44911    47687    +2776     
- Partials       6048     7248    +1200     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.70% <0.00%> (+1.99%) ⬆️
java-21 63.61% <0.00%> (+1.98%) ⬆️
skip-bytebuffers-false 63.72% <0.00%> (+1.97%) ⬆️
skip-bytebuffers-true 63.59% <0.00%> (+35.86%) ⬆️
temurin 63.72% <0.00%> (+1.96%) ⬆️
unittests 63.71% <0.00%> (+1.96%) ⬆️
unittests1 56.28% <ø> (+9.39%) ⬆️
unittests2 34.02% <0.00%> (+6.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@jadami10 jadami10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering fillEmptyResponseSchema was improved in the linked PR, this seems like a straightforward addition

@vvivekiyer vvivekiyer merged commit a29ecd7 into apache:master Jan 24, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants