-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Staging/iiod segfault #1243
base: main
Are you sure you want to change the base?
Staging/iiod segfault #1243
Conversation
On iiod_io_cancel(), io->write_token is protected by the iiod_responder lock while on iiod_io_wait_for_command_done() was using the iiod_io lock. That is surely not fine, so always use the iiod_responder lock to protect io->write_token. Also, make sure the NULL check in iiod_enqueue_command() is properly locked. Signed-off-by: Nuno Sá <nuno.sa@analog.com>
536bae6
to
e4c2a48
Compare
v2:
|
task.c
Outdated
@@ -94,6 +87,54 @@ static int iio_task_run(void *d) | |||
|
|||
return 0; | |||
} | |||
static int __iio_task_sync(struct iio_task_token *token, unsigned int timeout_ms, bool token_destroy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's best to avoid using a double underscore at the beginning of an identifier name, because it is reserved for system use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can change that. Though I find it very unlikely to have any kind of conflict with the above symbol :)
The solution looks complex but at the same time necessary to make the IIOD stable. LGTM! |
Not seeing any other way. We could go with refcounts but I fear that would be more complex... Also looks like a good practice to let any ongoing block to finish the current operation... But IIOD actually blocks (takes a lot of time) sending a reply back to the client (the dequeued block IIRC) which seems there might be something to be improved on the client side. |
75b7010
to
e4c2a48
Compare
I have also tested your changes with iio_rwdev and ad9361-iiostream and haven't encountered any issues. The IIOD seems stable. |
Make sure to hit ctrl-c for exiting... That was triggering the issue for me. You'll likely see that it now takes sometime to exit the app and the blocking point is io_submit() on IIOD. Not really sure why... Anyways, there's a "fix" that can be done in iio_stream_destroy() on the client side which makes sense to me... I'll send it in another PR. But I'll still dig a bit more just to make sure thereś nothing fundamentally wrong in IIOD causing io_submit() to block for whatever reason. |
We need to completely synchronize handle_free_blocks() with operations that might enqueue them in another task (like handle_transfer_block()). Otherwise we can race in a way that a block is freed but it will still be enqueued in another task which will lead to a user after free. For doing this we add get_iio_block_unlocked() and lock around it so we can synchronize beyond getting the block. Note that this still allows for a block to be enqueued in a task and then freed while in use by that task. But it does not allow for a block to be freed __while__ being enqueued which will allow for a future fix where we can properly sync tokens before freeing the block. Signed-off-by: Nuno Sá <nuno.sa@analog.com>
We were getting the buffer lock in get_iio_block(), releasing it and then grabbing it again. It does not make sense so let's make use of get_iio_block_unlocked() and grab the lock for the whole time. Also, there's no point in going two times over the buffer list of blocks. Just pass the buffer_entry pointer inside get_iio_block_unlocked(). Signed-off-by: Nuno Sá <nuno.sa@analog.com>
By default, the struct iio_task_token token is destroyed after syncing the task. In preparation for adding tokens with no bounded lifetimes (meaning that outer callers can create and destroy tokens as needed), split iio_task_sync() so we can control if the token is automatically destroyed or not. Signed-off-by: Nuno Sá <nuno.sa@analog.com>
In preparation for adding support for creating and destroying struct iio_task_token tokens with full control of their lifetimes, add an helper function to find tokens in the task list as that code will latter be reused outside of iio_task_cancel(). Signed-off-by: Nuno Sá <nuno.sa@analog.com>
This change is also in preparation for adding support for creating and destroying struct iio_task_token tokens with full control of their lifetimes. Hence, iio_task_token_do_enqueue() is added so that we can reuse the code that adds a token to a task (and better control if it's an existing vs a new token). Signed-off-by: Nuno Sá <nuno.sa@analog.com>
This adds APIs to create, enqueue and destroy (destroying just turns the already existing function public) struct iio_task_token tokens. This will be important in fixing a race between freeing IIO blocks and enqueue/dequeueing them since with this, we will be able to properly sync on the tokens. Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Normally, iio_task_cancel() is combined with __iio_task_sync(). Hence let's add an helper that combines these two calls. Note that it will automatically call __iio_task_sync() with token_destroy set to false. Signed-off-by: Nuno Sá <nuno.sa@analog.com>
IIOD was often crashing during client removal when buffering. The reason was that blocks could be enqueued in the (d)enqueue tasks, then freed and only then the tasks would process the already freed block. In order to fix this (and now that we have no possible race between handle_free_block() and handle_transfer_block()), we need to sync on the (d)enqueue block tokens before freeing it. Like this we can be sure that the block won't be used after it is freed. Signed-off-by: Nuno Sá <nuno.sa@analog.com>
e4c2a48
to
b917d12
Compare
v3:
|
PR Description
This series fixes a bug where iiod crashes (very often actually). To reproduce the issue one just needs to start a buffering session (I tested with the network backend) with iio_rwdev and hit CTRL-C. IIOD crashed fairy easy and the reason is that blocks could be enqueued in the (d)enqueue tasks, then freed and only then the tasks would process the already freed block.
the first patch is a bit unrelated but it fixes some issues regarding locking of the iiod_io write_token. Then, it's all about preparing the path (re-working task.c) for the real fix in the last commit.
PR Type
PR Checklist