Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Staging/iiod segfault #1243

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Staging/iiod segfault #1243

wants to merge 9 commits into from

Conversation

nunojsa
Copy link
Contributor

@nunojsa nunojsa commented Feb 11, 2025

PR Description

This series fixes a bug where iiod crashes (very often actually). To reproduce the issue one just needs to start a buffering session (I tested with the network backend) with iio_rwdev and hit CTRL-C. IIOD crashed fairy easy and the reason is that blocks could be enqueued in the (d)enqueue tasks, then freed and only then the tasks would process the already freed block.

the first patch is a bit unrelated but it fixes some issues regarding locking of the iiod_io write_token. Then, it's all about preparing the path (re-working task.c) for the real fix in the last commit.

PR Type

  • Bug fix (a change that fixes an issue)
  • New feature (a change that adds new functionality)
  • Breaking change (a change that affects other repos or cause CIs to fail)

PR Checklist

  • I have conducted a self-review of my own code changes
  • I have commented new code, particulary complex or unclear areas
  • I have checked that I did not intoduced new warnings or errors (CI output)
  • I have checked that components that use libiio did not get broken
  • I have updated the documentation accordingly (GitHub Pages, READMEs, etc)

On iiod_io_cancel(), io->write_token is protected by the iiod_responder
lock while on iiod_io_wait_for_command_done() was using the iiod_io
lock. That is surely not fine, so always use the iiod_responder lock to
protect io->write_token.

Also, make sure the NULL check in iiod_enqueue_command() is properly
locked.

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
@nunojsa
Copy link
Contributor Author

nunojsa commented Feb 11, 2025

v2:

  • Fix some indentation issues on iio_task_token_find()

task.c Outdated
@@ -94,6 +87,54 @@ static int iio_task_run(void *d)

return 0;
}
static int __iio_task_sync(struct iio_task_token *token, unsigned int timeout_ms, bool token_destroy)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's best to avoid using a double underscore at the beginning of an identifier name, because it is reserved for system use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can change that. Though I find it very unlikely to have any kind of conflict with the above symbol :)

@dNechita
Copy link
Contributor

The solution looks complex but at the same time necessary to make the IIOD stable. LGTM!

@nunojsa
Copy link
Contributor Author

nunojsa commented Feb 12, 2025

The solution looks complex but at the same time necessary to make the IIOD stable. LGTM!

Not seeing any other way. We could go with refcounts but I fear that would be more complex... Also looks like a good practice to let any ongoing block to finish the current operation... But IIOD actually blocks (takes a lot of time) sending a reply back to the client (the dequeued block IIRC) which seems there might be something to be improved on the client side.

@nunojsa nunojsa force-pushed the staging/iiod-segfault branch from 75b7010 to e4c2a48 Compare February 12, 2025 17:07
@dNechita
Copy link
Contributor

I have also tested your changes with iio_rwdev and ad9361-iiostream and haven't encountered any issues. The IIOD seems stable.

@nunojsa
Copy link
Contributor Author

nunojsa commented Feb 13, 2025

I have also tested your changes with iio_rwdev and ad9361-iiostream and haven't encountered any issues. The IIOD seems stable.

Make sure to hit ctrl-c for exiting... That was triggering the issue for me. You'll likely see that it now takes sometime to exit the app and the blocking point is io_submit() on IIOD. Not really sure why... Anyways, there's a "fix" that can be done in iio_stream_destroy() on the client side which makes sense to me... I'll send it in another PR. But I'll still dig a bit more just to make sure thereś nothing fundamentally wrong in IIOD causing io_submit() to block for whatever reason.

We need to completely synchronize handle_free_blocks() with operations
that might enqueue them in another task (like handle_transfer_block()).
Otherwise we can race in a way that a block is freed but it will still
be enqueued in another task which will lead to a user after free. For
doing this we add get_iio_block_unlocked() and lock around it so we can
synchronize beyond getting the block.

Note that this still allows for a block to be enqueued in a task and
then freed while in use by that task. But it does not allow for a block
to be freed __while__ being enqueued which will allow for a future fix where
we can properly sync tokens before freeing the block.

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
We were getting the buffer lock in get_iio_block(), releasing it and
then grabbing it again. It does not make sense so let's make use of
get_iio_block_unlocked() and grab the lock for the whole time. Also,
there's no point in going two times over the buffer list of blocks. Just
pass the buffer_entry pointer inside get_iio_block_unlocked().

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
By default, the struct iio_task_token token is destroyed after syncing the
task. In preparation for adding tokens with no bounded lifetimes
(meaning that outer callers can create and destroy tokens as needed),
split iio_task_sync() so we can control if the token is automatically
destroyed or not.

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
In preparation for adding support for creating and destroying
struct iio_task_token tokens with full control of their lifetimes, add
an helper function to find tokens in the task list as that code will
latter be reused outside of iio_task_cancel().

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
This change is also in preparation for adding support for creating and
destroying struct iio_task_token tokens with full control of their
lifetimes. Hence, iio_task_token_do_enqueue() is added so that we can
reuse the code that adds a token to a task (and better control if it's
an existing vs a new token).

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
This adds APIs to create, enqueue and destroy (destroying just turns the
already existing function public)  struct iio_task_token tokens. This
will be important in fixing a race between freeing IIO blocks and
enqueue/dequeueing them since with this, we will be able to properly sync
on the tokens.

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Normally, iio_task_cancel() is combined with __iio_task_sync(). Hence
let's add an helper that combines these two calls. Note that it will
automatically call __iio_task_sync() with token_destroy set to false.

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
IIOD was often crashing during client removal when buffering. The
reason was that blocks could be enqueued in the (d)enqueue tasks, then
freed and only then the tasks would process the already freed block. In
order to fix this (and now that we have no possible race between
handle_free_block() and handle_transfer_block()), we need to sync on the
(d)enqueue block tokens before freeing it. Like this we can be
sure that the block won't be used after it is freed.

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
@nunojsa nunojsa force-pushed the staging/iiod-segfault branch from e4c2a48 to b917d12 Compare February 14, 2025 11:18
@nunojsa
Copy link
Contributor Author

nunojsa commented Feb 14, 2025

v3:

  • Renamed __iio_task_sync();
  • Fixed codacy complain (an actual deadlock);

@nunojsa nunojsa mentioned this pull request Feb 14, 2025
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants