Implement Numa affinity for worker threads #1130
@maikel I can run this code if you need NUMA machines to benchmark on.
/ok to test
My current test machine has 2 numa nodes:
I will contact you for more data when I'm done with an initial attempt.
/ok to test
/ok to test
static_thread_pool(
  std::uint32_t threadCount,
  bwos_params params = {},
  numa_policy* numa = get_numa_policy());
i'm not a fan of raw pointers in public interfaces. why does numa_policy
need to be dynamically polymorphic?
Would you prefer a template parameter for the NUMA policy? There is actually no reason to use type erasure when all member functions are defined inline anyway. That way I wouldn't need to hard-code an allocator either.
We have discussed this and agreed that the planned improvements will happen in a separate PR.
/ok to test
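A minimal sketch of what a template-parameter policy could look like, for illustration only. All names here (`single_node_policy_sketch`, `two_node_block_policy_sketch`, `pool_sketch`) are hypothetical and are not the types from this PR; the pool is a skeleton that only records which node each worker would be assigned to.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical policy: every worker lives on node 0.
// All members are inline, so no virtual dispatch is needed.
struct single_node_policy_sketch {
  std::size_t num_nodes() const noexcept { return 1; }
  std::size_t thread_index_to_node(std::size_t) const noexcept { return 0; }
};

// Hypothetical policy: first half of the workers on node 0, the rest on node 1.
struct two_node_block_policy_sketch {
  std::size_t thread_count;
  std::size_t num_nodes() const noexcept { return 2; }
  std::size_t thread_index_to_node(std::size_t i) const noexcept {
    return i < thread_count / 2 ? 0 : 1;
  }
};

// Hypothetical pool skeleton (NOT the real static_thread_pool): the policy is
// a template parameter stored by value instead of a raw pointer.
template <class NumaPolicy = single_node_policy_sketch>
class pool_sketch {
 public:
  explicit pool_sketch(std::size_t thread_count, NumaPolicy policy = {})
    : policy_(std::move(policy))
    , node_of_thread_(thread_count) {
    // Record the node each worker index maps to under the chosen policy.
    for (std::size_t i = 0; i < thread_count; ++i) {
      node_of_thread_[i] = policy_.thread_index_to_node(i);
    }
  }

  std::size_t node_of(std::size_t thread_index) const {
    return node_of_thread_.at(thread_index);
  }

 private:
  NumaPolicy policy_;
  std::vector<std::size_t> node_of_thread_;
};
```

The trade-off in this sketch is that the policy type becomes part of the pool's type, so pools with different policies are distinct types.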
This PR introduces NUMA awareness for the `static_thread_pool`.
Summary of Changes
- `STDEXEC_ENABLE_NUMA` to explicitly opt in to NUMA awareness. Note that `stdexec` must be linked with `libnuma` if it is enabled. We might also introduce a new CMake target specifically for `exec` or for `exec::static_thread_pool`.
- `static_thread_pool` takes an additional pointer to `exec::numa_policy`, which defaults to `exec::get_default_numa_policy()`. This NUMA policy defines a distribution mapping, which maps a thread number to a NUMA node. It also provides a member function to bind the current thread to a specified NUMA node. If `STDEXEC_ENABLE_NUMA` is false, the NUMA policy does nothing.
- `exec::numa_allocator<T>`, which allocates memory on a specified NUMA node.
- `static_thread_pool::get_scheduler_on(numa_node_mask)` and `static_thread_pool::get_scheduler_on(cpu_mask)` to return a scheduler that schedules with the specified constraints.
API Design
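To illustrate the policy described in the summary, here is a minimal sketch of a dynamically polymorphic policy with a thread-to-node mapping and a bind hook. The type and member names (`numa_policy_sketch`, `thread_index_to_node`, `bind_to_node`, and so on) are assumptions made for this example and may differ from the actual `exec::numa_policy`. The round-robin distribution is also an assumption, and binding is a placeholder that does not call into `libnuma`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the PR's numa_policy; names and signatures are assumed.
struct numa_policy_sketch {
  virtual ~numa_policy_sketch() = default;
  virtual std::size_t num_nodes() const = 0;
  // Distribution mapping: worker thread index -> NUMA node index.
  virtual std::size_t thread_index_to_node(std::size_t thread_index) const = 0;
  // Bind the calling thread to `node`; returns whether the request was accepted.
  virtual bool bind_to_node(std::size_t node) = 0;
};

// Behaviour one might expect when NUMA support is disabled:
// a single node and a bind that does nothing.
struct no_numa_policy_sketch final : numa_policy_sketch {
  std::size_t num_nodes() const override { return 1; }
  std::size_t thread_index_to_node(std::size_t) const override { return 0; }
  bool bind_to_node(std::size_t) override { return true; }
};

// Assumed distribution: thread i goes to node i % node_count.
struct round_robin_policy_sketch final : numa_policy_sketch {
  explicit round_robin_policy_sketch(std::size_t node_count)
    : node_count_(node_count) {
    assert(node_count > 0);
  }
  std::size_t num_nodes() const override { return node_count_; }
  std::size_t thread_index_to_node(std::size_t thread_index) const override {
    return thread_index % node_count_;
  }
  // Placeholder: only validates the node index. A real policy would ask the
  // operating system (for example through libnuma) to pin the thread.
  bool bind_to_node(std::size_t node) override { return node < node_count_; }

 private:
  std::size_t node_count_;
};

// Count how many of `thread_count` workers the policy places on each node.
inline std::vector<std::size_t>
threads_per_node(const numa_policy_sketch& policy, std::size_t thread_count) {
  std::vector<std::size_t> counts(policy.num_nodes(), 0);
  for (std::size_t i = 0; i < thread_count; ++i) {
    counts[policy.thread_index_to_node(i)] += 1;
  }
  return counts;
}
```

In this sketch, a worker thread would call `bind_to_node(policy.thread_index_to_node(index))` once at startup, before it begins executing tasks.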
This PR introduces the interface `numa_policy`.
The thread pool takes a pointer to `numa_policy` as an optional argument to customize the distribution of worker threads to NUMA nodes.
The `static_thread_pool` offers several ways to get a scheduler with certain properties.
Benchmarks
We ran a benchmark on a machine with 2 NUMA nodes. Each node has 14 cores (28 threads).
We ran the nested schedule benchmark from the previous PR and saw that scaling is better with NUMA affinity enabled.
The OS scheduler already does a very good job with fewer than 28 threads.
Max throughput improves by roughly 20% with NUMA affinity enabled.