Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std assumes that accessing an already-initialized TLS variable is async-signal-safe #133698

Open
RalfJung opened this issue Dec 1, 2024 · 9 comments
Labels
A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows A-thread-locals Area: Thread local storage (TLS) C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Dec 1, 2024

TLS accesses are not documented to be async-signal-safe, but they mostly are – except for the first access to a variable. We rely on that: we install signal handlers that access thread-local variables, but we ensure they have been accessed before setting up the signal handler. This would be a good thing to discuss with the runtimes we depend on, since hopefully they can provide this as an actual guarantee, or else we'll learn that we can't rely on it.

@joboet @the8472 in terms of interaction with POSIX, what exactly are we relying on here? Or are the kind of TLS variables we are using not part of POSIX? Is it a libc thing, a linker thing, or something else?

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Dec 1, 2024
@joboet
Copy link
Member

joboet commented Dec 1, 2024

in terms of interaction with POSIX, what exactly are we relying on here? Or are the kind of TLS variables we are using not part of POSIX? Is it a libc thing, a linker thing, or something else?

That depends on whether native TLS is supported. If it is, then TLS is provided by the combined efforts of the dynamic linker and libc (it further depends on the TLS access model, e.g. local-exec cannot actually be made async-signal unsafe). If it is not, we rely on pthread keys, those are provided by libc.

@the8472
Copy link
Member

the8472 commented Dec 1, 2024

Both. It's part of posix when using pthread_key_create & co. It's part of a libc's ABI, compiler backends and linking when using #[thread_local]
It depends on individual platforms which one gets used.

but we ensure they have been accessed before setting up the signal handler.

TLS values can get destroyed and then require reinitialization if some code runs during or after thread local cleanup. So having them initialized once is not a guarantee that the allocation will not happen again.

we install signal handlers that access thread-local variables

Hrrm, maybe we could put those variables somewhere in the signal stack allocation instead, but I don't know of a async-signal-safe way to retrieve the stack base.

@joboet
Copy link
Member

joboet commented Dec 1, 2024

TLS values can get destroyed and then require reinitialization if some code runs during or after thread local cleanup. So having them initialized once is not a guarantee that the allocation will not happen again.

Currently, the signal stack is uninstalled before destructors are run, so stack overflows will just directly crash. #131282 will change this, but it also switches the TLS variables to local_pointer!s, which do not have a destructor and are always available once initialized.

@joboet
Copy link
Member

joboet commented Dec 1, 2024

Hrrm, maybe we could put those variables somewhere in the signal stack allocation instead, but I don't know of a async-signal-safe way to retrieve the stack base.

That sigaltstack is not documented to be AS-safe seems like an oversight considering the existence of SS_ONSTACK (which indicates that one is running on the alternate signal stack), so we could use that.

@RalfJung
Copy link
Member Author

RalfJung commented Dec 1, 2024

it further depends on the TLS access model, e.g. local-exec cannot actually be made async-signal unsafe

If we let people change the TLS model (rust-lang/compiler-team#805), we better make sure that std is sound with all of them.

@the8472
Copy link
Member

the8472 commented Dec 1, 2024

Hrrm, maybe we could put those variables somewhere in the signal stack allocation instead, but I don't know of a async-signal-safe way to retrieve the stack base.

That sigaltstack is not documented to be AS-safe seems like an oversight considering the existence of SS_ONSTACK (which indicates that one is running on the alternate signal stack), so we could use that.

The gnu documentation says it's only as-unsafe on hurd.
https://www.gnu.org/software/libc/manual/html_node/Signal-Stack.html#index-sigaltstack

@joboet
Copy link
Member

joboet commented Dec 1, 2024

If we let people change the TLS model (rust-lang/compiler-team#805), we better make sure that std is sound with all of them.

That MCP is about TLS dialects, which are a bit different than models. Both the local-exec and initial-exec TLS models have AS-safe accesses by default, because the TLS access is performed simply by offsetting from the thread pointer. The general-dynamic and local-dynamic models are the problematic ones, independent of dialect, since they do a function call to retrieve the address, and that function is not always AS-safe (e.g. glibc lazily allocates the TLS storage, which makes the first access AS-unsafe). Here's a good summary of all this: https://maskray.me/blog/2021-02-14-all-about-thread-local-storage (it even mentions signal safety).

@joboet
Copy link
Member

joboet commented Dec 1, 2024

The gnu documentation says it's only as-unsafe on hurd. https://www.gnu.org/software/libc/manual/html_node/Signal-Stack.html#index-sigaltstack

That's unfortunate, but might not be deal-breaking. I just discovered this passage in the SUS (section 2.4.4):

In the presence of signals, all functions defined by this volume of POSIX.1-2024 shall behave as defined when called from or interrupted by a signal-catching function, with the exception that when a signal interrupts an unsafe function or function-like macro, or equivalent [...], and the signal-catching function calls an unsafe function or function-like macro, the behavior is undefined.

So as long as we require users to change the signal handler before calling sigaltstack, it will have defined behaviour (or at least it should). Sorry, I misunderstood that passage ("an unsafe function" not "this unsafe function").

@RalfJung
Copy link
Member Author

RalfJung commented Dec 1, 2024

That MCP is about TLS dialects, which are a bit different than models.

Ah, I didn't realize there's two (orthogonal?) degrees of freedom here. Thanks for pointing that out.

@joboet joboet added T-libs Relevant to the library team, which will review and decide on the PR/issue. A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows A-thread-locals Area: Thread local storage (TLS) labels Dec 1, 2024
@jieyouxu jieyouxu added C-bug Category: This is a bug. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows A-thread-locals Area: Thread local storage (TLS) C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants