Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ldms_stream_close race fix #1294

Merged
merged 3 commits into from
Oct 27, 2023

Conversation

narategithub
Copy link
Collaborator

This pull request includes

  • ldms_stream_close() race fix
  • Python ldms.Xprt.stream_unscribe() GIL deadlock fix
  • Exporting ovis_log_set_level_by_name() to Python ldms for debugging / verification purpose.

The following is the description of the ldms_stream_close() race:

Stream-Client-Entry (sce) is a tailq entry that lives in
stream->client_tq (list of clients subscribing the stream) and
client->stream_tq (list of streams the client subscribed to) to
manage client-stream relation. When a stream client closes, it is
unbound from the stream, and the sce is removed from the list. This
leads to a use-after-free when another thread iterate through the
stream->client_tq and unlock to call the application callback
function. This patch modifies the unbind procedure to NOT remove the
sce, and just unbind the client - stream from the sce. The unbound
sce is later removed from the client->stream_tq list when the client
is closed, and is removed from stream->client_tq when the callback
processing thread detects an unbound sce.

ldms.Xprt.stream_unsubscribe() held GIL (Python's Global Interpreter
Lock) while it `sem_wait()` for the result. This resulted in a dead lock
on the GIL. This patch removed the unnecessary GIL holding in
ldms.Xprt.stream_unsubscribe() function.
This is used in test script to control DEBUG logging (e.g. for
verification).
Stream-Client-Entry (sce) is a tailq entry that lives in
`stream->client_tq` (list of clients subscribing the stream) and
`client->stream_tq` (list of streams the client subscribed to) to
manage client-stream relation. When a stream client closes, it is
unbound from the stream, and the `sce` is removed from the list. This
leads to a use-after-free when another thread iterate through the
`stream->client_tq` and unlock to call the application callback
function. This patch modifies the unbind procedure to NOT remove the
`sce`, and just unbind the client - stream from the `sce`. The unbound
`sce` is later removed from the `client->stream_tq` list when the client
is closed, and is removed from `stream->client_tq` when the callback
processing thread detects an unbound `sce`.
@narategithub narategithub requested a review from tom95858 October 27, 2023 03:30
@tom95858 tom95858 merged commit 3fe7765 into ovis-hpc:OVIS-4 Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants