-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit GroupByCombineOperator to use 2 * numCores threads instead of creating one task per operator #14843
Limit GroupByCombineOperator to use 2 * numCores threads instead of creating one task per operator #14843
Conversation
…reating one task per operator
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #14843 +/- ##
============================================
+ Coverage 61.75% 63.75% +2.00%
- Complexity 207 1612 +1405
============================================
Files 2436 2708 +272
Lines 133233 151293 +18060
Branches 20636 23357 +2721
============================================
+ Hits 82274 96464 +14190
- Misses 44911 47594 +2683
- Partials 6048 7235 +1187
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Can we verify that empirically with a test? |
I updated the existing Old (without changes from this PR)
New (with changes from this PR)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Thanks Yash! I know running these tests is very time consuming, but given it wasn't possible limit the change to only MSE, I wanted to be sure.
Math.min(numSegments, maxExecutionThreads)
wheremaxExecutionThreads
is calculated asMath.max(1, Math.min(10, numCores / 2))
(unless it is explicitly specified in the query options usingmaxExecutionThreads
).GroupByCombineOperator
specifically, thismaxExecutionThreads
is currently overridden to be equal to the number of segments being processed (see here by default for some reason.2 * numCores
by default (this one).numTasks
are the number of async tasks submitted to the executor service, the actual number of threads used by the server would be limited to the fixed number of threads available in the executor service.QueryRunner
- i.e., this cached thread pool executor that can spawn infinite threads. This means that for leaf stage operators that involve a group by, the number of threads spawned for a single multi-stage query on each server can be equal to the number of segments being processed on that server (could be thousands of threads which is super problematic).GroupByCombineOperator
by default to be the default number of query worker threads (which is2 * numCores
). This shouldn't impact SSQE performance while also avoiding the creation of a huge number of threads in MSQE.