-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify the releasing Zap endpoints from IO thread process #1303
Conversation
The patch is a part of the bigger patch to improve the thread_stats command. |
abbc8eb
to
2daaf53
Compare
lib/src/zap/zap_priv.h
Outdated
@@ -478,14 +478,23 @@ double zap_env_dbl(char *name, double default_value); | |||
zap_err_t zap_io_thread_ep_assign(zap_ep_t ep, int tpi); | |||
|
|||
/** | |||
* Release \c ep from the zap io thread. | |||
* Disable \c ep assigned to the zap io thread. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explanation on zap_io_thread_ep_release()
and zap_io_thread_ep_remove()
are confusing to me. Not sure how to make it better though ... maybe just saying release
means the endpoint won't be processed by the thread any more (i.e. no more event callbacks to the application) but the endpoint still keep a reference to the thread for further stat data access, while the remove
means disassociate the endpoint from the thread and reducing thead's ep load? I'm not sure ...
@nichamon I think the code looks OK to me. Just a nit on the function documentation. |
2daaf53
to
affb893
Compare
The patch separates the process of reducing the number of Zap endpoints associated with an IO thread from the process of releasing the endpoint from the IO thread. Additionally, it defers the nullification of `zap->thread` until the endpoint reference reaches zero. `zap_sock` and `zap_ugni` need to remove their events from epoll before delivering the disconnected event to applications, preventing the reception of additional epoll events. However, without this patch, this results in resetting `zap->thread` to NULL. Thus, the applications cannot retrive Zap thread information in the disconnected path. The patch addresses this limitation.
affb893
to
71d1f6e
Compare
I think this creates a race between disconnect and ldms_xprt_update. With this change isn't it possible to have an ldms_xprt_update succeed after a disconnect? |
The patch keeps the existing logic for event removal from the epoll by I made this patch so that Zap's applications can obtain the IO thread statistics of a Zap endpoint in the disconnected path. @narategithub has suggested another change to facilitate this. The suggestion was to add a pointer to the |
@nichamon thank you for looking into this... |
Is everyone happy with this?? |
@tom95858 I think this patch looks OK to me. |
The patch separates the process of reducing the number of Zap endpoints associated with an IO thread from the process of releasing the endpoint from the IO thread. Additionally, it defers the nullification of
zap->thread
until the endpoint reference reaches zero.zap_sock
andzap_ugni
need to remove their events from epoll before delivering the disconnected event to applications, preventing the reception of additional epoll events. However, without this patch, this results in resettingzap->thread
to NULL. Thus, the applications cannot retrive Zap thread information in the disconnected path. The patch addresses this limitation.