-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flushing Proxy Channels at CPU side upon reaching the Inflight Request Limit #415
base: main
Are you sure you want to change the base?
Conversation
@@ -11,6 +11,8 @@ | |||
|
|||
namespace mscclpp { | |||
|
|||
constexpr int MAX_INFLIGHT_REQUEST = 500; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the default CQ size. It's a constant number or can be configured by users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value is 1024, depending on the hardware the value can be bigger, for example the Mellanox HCA can support up to 2048. So the user can configure a different value, but must be small than the hardware specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value is 1024, depending on the hardware the value can be bigger, for example the Mellanox HCA can support up to 2048. So the user can configure a different value, but must be small than the hardware specification.
Can we make this value configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -85,6 +93,7 @@ ProxyHandlerResult ProxyService::handleTrigger(ProxyTrigger triggerRaw) { | |||
if (trigger->fields.type & TriggerSync) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we just say trigger->fields.type & TriggerSync || inflightRequests > MAX_INFLIGHT_REQUEST
here to merge the same block above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current way might have performance issue. When we call flush
, it does busy polling until all inflight requests appear in CQ. This might waste some CPU cycles which instead could be used for submitting new requests. A better logic could be:
while (proxy_thread_running) {
if (there_are_new_requests_to_submit) submit_requests();
inflight += num_submitted_requests;
if (there_are_new_requests_completed) complete_requests();
inflight -= num_completed_requests;
}
But current connection::flush
api doesn't do this. Perhaps we need to expose something like https://github.com/microsoft/mscclpp/blob/main/src/ib.cc#L288 to proxy thread.
No description provided.