-
libcu++ introduces the notion of a "thread scope" on its synchronization primitives like
e.g., a However, it is not clear if an object with a particular scope can or should be reused by different threads that belong to a particular scope. For example:
Here we have an atomic scoped to a single thread, but used from different threads. The current specification for thread scopes says that:
In other words, so long as threads in different scopes don't attempt to concurrently use the same object, there appears to be no problem. The Note that this code "works", but is it intentional to allow this behavior? Consider another example where we have a
The spec says nothing that would exclude this from being a well-defined program. Is this intentional? If not, do we think it is important to allow this behavior? @ogiroux I'm especially interested in hearing what you think about this. |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 9 replies
-
I think it's important that it be allowed in C++, so this was intentional when it was written. Consider the first example with the atomic int, except replace it with a plain a non-atomic int instead -- that is obviously correct. I think it's important to be able to explain atomic int as a superset of an int, and not a super-subset of an int. The barrier case looks different on the surface, but I think it should be allowed that: 1) barrier objects could be constructed once in a pool and reused across many blocks in the life of a grid, and 2) that the CPU be allowed to construct this pool. TL;DR: I think it would be aberrant if atomic types were somehow weaker than non-atomic ones; thread scopes determine if you'll get atomic or non-atomic semantics for a given conflict but doesn't change what those semantics mean. |
Beta Was this translation helpful? Give feedback.
-
Question is regarding the thread_scope state composed into the atomic.
Assertion is that construction of an object (atomic, barrier, ...) with specified thread_scope allows (implies) that the object may be constructed to have state specific to the thread_scope in which it was constructed. |
Beta Was this translation helpful? Give feedback.
-
For posterity, we decided to place a specific limitation on For all other data structures and state spaces, it remains valid to reuse those objects across different thread groups. See: NVIDIA/cccl#75 |
Beta Was this translation helpful? Give feedback.
For posterity, we decided to place a specific limitation on
cuda::barrier<cuda::thread_scope_block>
in__shared__
memory such that it can only be used by the threads in the CTA of the thread that constructed it. This was necessary compromise in order to leverage certain hardware acceleration features.For all other data structures and state spaces, it remains valid to reuse those objects across different thread groups.
See: NVIDIA/cccl#75