Enable query cancellation for MSQE + cancel using client-provided id #14823

Merged (34 commits) on Feb 11, 2025

Contributor

@albertobastos albertobastos commented Jan 15, 2025

  • Enables the query cancellation feature for MSQE queries (it wasn't supported until now).
  • Lets the client set a clientQueryId query option, which can then be used with the clientQuery/{clientQueryId} endpoint.
  • Adds a sleep(ms) function, currently recommended only for testing purposes.

Some refactoring was involved to reuse as much of the cancellation logic as possible between SSQE and MSQE.

}
String clientQueryId = extractClientQueryId(sqlNodeAndOptions);
if (StringUtils.isBlank(clientQueryId)) {
return null;
Contributor

(nit) in general we don't recommend returning NULL as a coding practice
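
One common alternative to returning null is to return an Optional. The following is a minimal sketch of that idea, not necessarily what the PR ended up doing. The class name and the plain Map of query options are hypothetical stand-ins; the real extractClientQueryId takes a SqlNodeAndOptions.

```java
import java.util.Map;
import java.util.Optional;

final class ClientQueryIdExample {
  // Hypothetical stand-in for the PR's extractClientQueryId:
  // here the query options are modeled as a plain Map.
  static Optional<String> extractClientQueryId(Map<String, String> queryOptions) {
    String value = queryOptions.get("clientQueryId");
    // Treat missing, empty and whitespace-only values as "no id provided".
    if (value == null || value.trim().isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(value);
  }

  public static void main(String[] args) {
    // prints my-query-1
    System.out.println(extractClientQueryId(Map.of("clientQueryId", "my-query-1")).orElse("<none>"));
    // prints false
    System.out.println(extractClientQueryId(Map.of()).isPresent());
  }
}
```

With this shape the caller has to decide explicitly what to do when no client id was supplied, instead of risking a NullPointerException later.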

@siddharthteotia
Contributor

Is ClientQueryID a new concept? Is it the same as requestID?

How does the support added here improve the existing query cancellation (which is also exposed to the user, IIRC)?

@codecov-commenter

codecov-commenter commented Jan 16, 2025

Codecov Report

Attention: Patch coverage is 27.48815% with 153 lines in your changes missing coverage. Please review.

Project coverage is 63.62%. Comparing base (59551e4) to head (f4ad417).
Report is 1688 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...oller/api/resources/PinotRunningQueryResource.java    0.00%     76 Missing ⚠️
.../pinot/query/service/dispatch/QueryDispatcher.java    40.62%    13 Missing and 6 partials ⚠️
...roker/requesthandler/BaseBrokerRequestHandler.java    52.94%    12 Missing and 4 partials ⚠️
...pinot/broker/api/resources/PinotClientRequest.java    0.00%     11 Missing ⚠️
...r/requesthandler/BrokerRequestHandlerDelegate.java    0.00%     9 Missing ⚠️
...sthandler/BaseSingleStageBrokerRequestHandler.java    70.00%    3 Missing and 3 partials ⚠️
...requesthandler/MultiStageBrokerRequestHandler.java    50.00%    5 Missing ⚠️
...inot/common/function/scalar/DateTimeFunctions.java    42.85%    3 Missing and 1 partial ⚠️
...common/response/broker/BrokerResponseNativeV2.java    0.00%     3 Missing ⚠️
...roker/requesthandler/TimeSeriesRequestHandler.java    0.00%     2 Missing ⚠️
... and 2 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14823      +/-   ##
============================================
+ Coverage     61.75%   63.62%   +1.87%     
- Complexity      207     1482    +1275     
============================================
  Files          2436     2727     +291     
  Lines        133233   152607   +19374     
  Branches      20636    23582    +2946     
============================================
+ Hits          82274    97099   +14825     
- Misses        44911    48186    +3275     
- Partials       6048     7322    +1274     
Flag                      Coverage Δ
custom-integration1       100.00% <ø> (+99.99%) ⬆️
integration               100.00% <ø> (+99.99%) ⬆️
integration1              100.00% <ø> (+99.99%) ⬆️
integration2              0.00% <ø> (ø)
java-11                   63.58% <27.48%> (+1.87%) ⬆️
java-21                   63.52% <27.48%> (+1.89%) ⬆️
skip-bytebuffers-false    63.62% <27.48%> (+1.87%) ⬆️
skip-bytebuffers-true     63.48% <27.48%> (+35.75%) ⬆️
temurin                   63.62% <27.48%> (+1.87%) ⬆️
unittests                 63.62% <27.48%> (+1.87%) ⬆️
unittests1                56.23% <42.85%> (+9.34%) ⬆️
unittests2                33.97% <19.43%> (+6.23%) ⬆️

@albertobastos
Contributor Author

> Is ClientQueryID a new concept? Is it same as requestID ?
>
> How does the support added here improve the existing Query Cancellation (which is also exposed to user IIRC) ?

Hi Siddharth,

AFAIK, the current cancellation feature depends on the internal requestId generated by the broker itself. That request ID is not returned until the query completes, so an external user must first ask for the active running queries, find in the returned array the requestId assigned to the query of interest (by comparing the query body), and finally use the cancel operation to abort it. That's two round trips between the user and the cluster.

With a client-provided ID, the user can skip one step and go straight to the cancel operation, using their own ID to abort the query.

@albertobastos
Contributor Author

For some extra context, the end goal is to enable a "Cancel" button in the UI that the customer can use to abort an ongoing query. With a query ID provided by the customer or by the UI itself, that can be done without retrieving any internal ID.

@albertobastos albertobastos changed the title from "add clientQueryId and its cancel operation" to "Enable query cancellation for MSQE + cancel using client-provided id" Jan 27, 2025
@albertobastos albertobastos marked this pull request as ready for review January 29, 2025 14:19
Contributor

@gortiz gortiz left a comment

I'm adding several comments but I wasn't able to read the whole PR.

Although I'm asking for changes, it is a good PR overall. We just need to finish the last mile.

Comment on lines 554 to 557
} catch (InterruptedException e) {
//TODO: handle interruption
//Thread.currentThread().interrupt();
}
Contributor

we need to fix this TODO before merging

Contributor Author

Any suggestion on how we should deal with an interruption here? Just log a warning and skip the sleep, or propagate the error?

Contributor

Given that you cannot throw InterruptedException here, you have to set the interrupt flag again and probably throw another exception.

Contributor Author

Makes total sense f8d23cd
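
The pattern suggested in this thread (restore the interrupt flag, then throw an unchecked exception) can be sketched as follows. This is a minimal illustration under assumptions, not the code from the referenced commit: the class name, the return value and the choice of IllegalStateException are all hypothetical.

```java
final class SleepExample {
  // Hypothetical helper, not the PR's actual sleep(ms) function.
  static long sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      // Thread.sleep clears the interrupt flag when it throws, so set it again
      // to let callers further up the stack observe the interruption.
      Thread.currentThread().interrupt();
      // The method cannot declare InterruptedException, so wrap it in an unchecked one.
      throw new IllegalStateException("Interrupted while sleeping", e);
    }
    return millis;
  }

  public static void main(String[] args) {
    // Simulate an interruption that arrives before the sleep starts.
    Thread.currentThread().interrupt();
    try {
      sleep(1000);
    } catch (IllegalStateException e) {
      // prints "interrupt flag restored: true"
      System.out.println("interrupt flag restored: " + Thread.currentThread().isInterrupted());
    }
  }
}
```

Swallowing the exception, as the TODO version did, would hide the interruption from the rest of the call stack, which matters when interruption is the mechanism used to stop a running query.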

Comment on lines +103 to +111
boolean enableQueryCancellation =
Boolean.parseBoolean(config.getProperty(CommonConstants.Broker.CONFIG_OF_BROKER_ENABLE_QUERY_CANCELLATION));
if (enableQueryCancellation) {
_queriesById = new ConcurrentHashMap<>();
_clientQueryIds = new ConcurrentHashMap<>();
} else {
_queriesById = null;
_clientQueryIds = null;
}
Contributor

It is not something we introduced in this PR, but something I think we need to take care of in the future:

We use BaseBrokerRequestHandler as the root/common state for the broker, probably for historical reasons, but that assumption does not hold. A single broker may have SSE, MSE, GRPC and even TSE queries running at the same time. It would be a better design to have a shared state between them instead of the trick we do with the delegate.

This is something we need to improve in the future

Contributor Author

I agree with the shared state refactor, we should write it down somewhere so we actually do it ;-)

@@ -179,6 +198,9 @@ protected abstract BrokerResponse handleRequest(long requestId, String query, Sq
@Nullable HttpHeaders httpHeaders, AccessControl accessControl)
throws Exception;

protected abstract boolean handleCancel(long queryId, int timeoutMs, Executor executor,
Contributor

We need javadoc here to explain how it should work. At least we should say that queryId may be a client or pinot generated id.

Contributor Author

Actually the queryId received here always refers to a broker-generated internal id. The clientQueryId -> brokerQueryId translation is done by BaseBrokerRequestHandler.cancelQueryByClientId.

Added some minimal javadoc here: 52998d3

I tried to mimic the current code design for handleRequest, but the existence of two handleRequest methods here is a bit confusing:

  • A public method implemented by the interface and called from the endpoint layer that receives a SqlNodeAndOptions parameter.
  • A protected method called from the previous one and already receiving a requestId and the query's string itself.

To add to the confusion, neither of the two methods has javadoc.

This could probably be designed better if we move forward with the proposed shared state design.
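
The clientQueryId -> brokerQueryId translation described above can be pictured with two maps, in the spirit of the _queriesById and _clientQueryIds fields shown earlier in the review. The sketch below is a simplified model under assumptions, not the actual BaseBrokerRequestHandler code: the class, the register method and the use of a plain String for the running query are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class QueryRegistrySketch {
  // Broker-generated request id -> running query (modeled here as its SQL text).
  private final Map<Long, String> queriesById = new ConcurrentHashMap<>();
  // Client-provided id -> broker-generated request id.
  private final Map<String, Long> clientQueryIds = new ConcurrentHashMap<>();

  // Hypothetical registration hook, called when a query starts running.
  void register(long requestId, String clientQueryId, String query) {
    queriesById.put(requestId, query);
    if (clientQueryId != null && !clientQueryId.trim().isEmpty()) {
      clientQueryIds.put(clientQueryId, requestId);
    }
  }

  // Cancels by the broker-generated id; returns true if the query was still registered.
  boolean cancelQuery(long requestId) {
    // Drop the client id that points at this request, if there is one.
    clientQueryIds.values().remove(Long.valueOf(requestId));
    return queriesById.remove(requestId) != null;
  }

  // Translates the client id to the broker id, then reuses the regular cancel path.
  boolean cancelQueryByClientId(String clientQueryId) {
    if (clientQueryId == null) {
      return false;
    }
    Long requestId = clientQueryIds.get(clientQueryId);
    return requestId != null && cancelQuery(requestId);
  }

  public static void main(String[] args) {
    QueryRegistrySketch registry = new QueryRegistrySketch();
    registry.register(42L, "ui-cancel-1", "SELECT COUNT(*) FROM myTable");
    System.out.println(registry.cancelQueryByClientId("ui-cancel-1")); // prints true
    System.out.println(registry.cancelQueryByClientId("ui-cancel-1")); // prints false
  }
}
```

Keeping the translation in one place means the engine-specific cancel logic only ever deals with broker-generated ids, which matches the description of handleCancel above.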

Contributor Author

@albertobastos albertobastos left a comment

Thanks for the review, @gortiz

Besides my doubts about how to handle the sleep interruption (is it really necessary now that we only enable it during tests?) and some future tasks and refactors derived from the PR, I believe I have followed your advice on all your suggestions.

@gortiz gortiz merged commit f1509f8 into apache:master Feb 11, 2025
20 of 21 checks passed
@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes multi-stage Related to the multi-stage query engine documentation labels Feb 12, 2025