tests: dynamic_thread_stack: Fix multicore and incoherent cache #68311
Conversation
Hm. How is the dynamic stack incoherent? The only incoherent writable memory should be static thread stacks.
Really the issue should be the opposite: we should want the dynamic stacks to be cached (to avoid a performance disaster), but they aren't, because nothing in the system heap understands the intel_adsp caching weirdness. Don't I remember a patch a while back that made sure we don't enable both KERNEL_COHERENCE and DYNAMIC_THREAD at the same time? While in theory they can be made to work, they really can't in a way that a practical system could ship.
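(For illustration only: a mutual exclusion like that could be expressed as a build-time guard along these lines; the real gate, if one exists, would more likely live in Kconfig, and none of this is quoted from the tree.)

```c
/* Hypothetical sketch: reject configs that enable both options, since
 * dynamic stacks come from heap memory the coherence layer can't cover.
 */
#if defined(CONFIG_KERNEL_COHERENCE) && defined(CONFIG_DYNAMIC_THREAD)
#error "DYNAMIC_THREAD is not supported together with KERNEL_COHERENCE"
#endif
```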
@@ -111,6 +112,8 @@ ZTEST(dynamic_thread_stack, test_dynamic_thread_stack_alloc)
	}
}

	sys_cache_data_flush_all();
Should add a comment explaining the specific coherence problems on intel_adsp, otherwise this looks like voodoo
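(For illustration, the kind of comment being requested might read roughly like this; a sketch, not text from the PR:)

```c
/* intel_adsp data caches are not coherent across cores: stack memory
 * written through this core's cache may not be visible to another core
 * that later runs one of the spawned threads, so flush it all out
 * before they can be scheduled elsewhere.
 */
sys_cache_data_flush_all();
```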
That is fair. Btw, I just checked that this is not enough if the thread allocating these stacks switches to a different CPU before flushing the cache.
Actually thinking more on this, I think I'm a -1. Workarounds like this simply aren't supposed to be needed. On incoherent xtensa platforms, you get uncached/coherent memory everywhere, for correctness.[1] You can get to cached memory if you insist, but that becomes an app design decision[2], and the app becomes responsible for validation.
I really don't think this test is doing any of that, not having been written with an eye to the xtensa cache madness. If it managed to get its hands on incoherent memory, then either it's breaking rules about the use of its own stack in a way that KERNEL_COHERENCE can't catch, or we have a bug somewhere else in the platform layer. Either way we should fix that elsewhere, not patch the test.
[1] With the sole exception of thread stack memory, for which there's a mildly complicated validation framework in place to catch mistakes like e.g. putting shared kernel data on the stack.
[2] e.g. the SOF application heap allows you to specify whether you want cached memory or not.
Why not? That is not the system heap, but a heap defined in the application for this purpose only.

Agree, the application is responsible. In this case the application is the test itself, isn't it? That is why the test defines a heap in the cached region. The change, though, is not enough: if the thread that is allocating stacks changes CPU before flushing the cache, we have a problem. One way to handle this is pinning the thread that allocates the stacks, spawns the threads, and releases these resources (sketched below).

Isn't that what we are doing here? Defining a heap in a cached area and being responsible for it?
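(A minimal sketch of that pinning idea, assuming CONFIG_SCHED_CPU_PIN; the function name is hypothetical and this is not code from the PR:)

```c
#include <zephyr/cache.h>
#include <zephyr/kernel.h>

/* Keep the allocating thread on a single CPU so its cached view of the
 * heap stays self-consistent across alloc, spawn and free, then flush
 * once everything has been written.
 */
static void alloc_spawn_release(void)
{
	k_thread_cpu_pin(k_current_get(), 0); /* pin to CPU 0 for the test */

	/* ... k_thread_stack_alloc(), k_thread_create(), k_thread_join()
	 * and k_thread_stack_free() all run from this CPU ...
	 */

	sys_cache_data_flush_all();
}
```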
Again though: how is the test getting the cached memory to begin with? Everything the default linker gives you (except statically defined thread stacks) should be uncached. That's where the bug is.
The heap is being defined in the __incoherent section: https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/kernel/threads/dynamic_thread_stack/src/main.c#L22 Though, I have a different approach: move the heap to coherent memory and get the cached pointer for the allocated stack. I know, that is taking advantage of the memory being mapped twice, but the alternative is really to put the heap in incoherent memory and ensure that whoever manipulates that heap is aware of it. Threads just using the allocated stack do not need any further change, since the kernel expects stacks to be cached. Please take a look at the new version.
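(A sketch of what that approach amounts to, assuming Zephyr's sys_heap and cache-pointer helpers; coherent_heap and the alignment argument are stand-ins for the test's actual definitions, not code from the PR:)

```c
#include <zephyr/cache.h>
#include <zephyr/kernel.h>
#include <zephyr/sys/sys_heap.h>

static struct sys_heap coherent_heap; /* assumed to sit over uncached RAM */

/* The heap metadata stays in coherent (uncached) memory, so it is safe
 * across CPUs; the caller gets the cached alias of the stack buffer,
 * which is what the kernel expects a stack to be.
 */
static k_thread_stack_t *stack_alloc_cached(size_t size, size_t align)
{
	void *buf = sys_heap_aligned_alloc(&coherent_heap, align, size);

	return (buf == NULL) ? NULL
			     : (k_thread_stack_t *)sys_cache_cached_ptr_get(buf);
}
```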
(force-pushed from d141d38 to e6b7318)
@andyross that is an alternative to https://gist.github.com/ceolin/83009ab1938ca6134a218cdab99aa441 What do you think? Do you see a problem with the one in this PR? I don't see a reason to disable dynamic thread stacks on targets with incoherent memory; it seems to me that whoever uses them should be responsible for ensuring cached memory is used properly.
@@ -114,10 +114,27 @@ ZTEST(dynamic_thread_stack, test_dynamic_thread_stack_alloc)
	/* spwan our threads */
	for (size_t i = 0; i < N; ++i) {
		tflag[i] = false;
#ifdef CONFIG_XTENSA
with #68140, this will not be needed :)
was hoping for that :)
Ah, bingo. OK, that's the root cause. That won't work, because the heap stores its metadata internally, and needless to say that's going to break badly if it's allowed to skew between CPUs. Heaps on intel_adsp can only work over uncached memory; SOF makes caching work "inside" the blocks by padding/aligning them to cache lines, converting to a cached pointer, and invalidating on free. It's a lot more work than just putting it in the right section.

My preference would be to fix this by just removing that section assignment and eating the performance cost. Moving to a SOF-style heap would be possible but IMHO probably not worth it, especially given that the future of the platform is an MMU world where you wouldn't be using a sys_heap for stack allocation anyway (i.e. you'd just use a page or two and map them, vs. trying to allocate the region contiguously).
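(A rough sketch of the SOF-style scheme described above, assuming Zephyr's sys_heap and cache APIs; unlike SOF, the free path here takes the block size from the caller, and both function names are hypothetical:)

```c
#include <zephyr/cache.h>
#include <zephyr/sys/sys_heap.h>
#include <zephyr/sys/util.h>

static struct sys_heap uncached_heap; /* metadata lives in coherent RAM */

/* Round every block out to whole cache lines so no line is shared with
 * heap metadata or a neighboring chunk, then hand out the cached alias.
 */
static void *cached_block_alloc(size_t bytes)
{
	size_t line = sys_cache_data_line_size_get();
	void *blk = sys_heap_aligned_alloc(&uncached_heap, line,
					   ROUND_UP(bytes, line));

	return (blk == NULL) ? NULL : sys_cache_cached_ptr_get(blk);
}

/* Invalidate the cached alias before freeing so stale dirty lines can't
 * be written back over a block that has since been reallocated.
 */
static void cached_block_free(void *ptr, size_t bytes)
{
	size_t line = sys_cache_data_line_size_get();

	sys_cache_data_invd_range(ptr, ROUND_UP(bytes, line));
	sys_heap_free(&uncached_heap, sys_cache_uncached_ptr_get(ptr));
}
```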
Yep, got this the hard way :/ Btw, that can work if you ensure that the thread manipulating the heap does not change CPU.
Did you see any problem with the current change? In the current implementation the heap is in uncached memory, but for the stack buffer we get a cached pointer (just because the memory is mapped twice). If we have a target that does not map the memory twice, it will need to be more careful with the memory. The buffer is page aligned so we can map it properly for userspace; the metadata, AFAIU, is outside these pages and shouldn't be accessible from the user thread. Am I missing something?
Oh, sorry. I didn't look at the filename. The heap is in the test, not the subsystem code. Honestly, can't you just revert the __incoherent bit? It's a test; we don't care about performance per se.

As far as whether this particular fix is enough, my guess is no, actually. It might fix the specific problem seen, but in the general case any mutation of the heap metadata (i.e. any alloc or free) needs to be coherent, which means it's not enough to flush in one spot: every time you're about to change the heap you need to get every other CPU to invalidate the region that is going to be changed (which isn't really well-bounded; it can touch chunks other than the ones you pass) and then flush that region from your own core. That's even harder than a SOF-style cached-blocks-inside-uncached-heap trick. Just not worth it, IMHO. Removing the __incoherent is the right thing.
It is already removed :) And there is no flush; I'd figured out the problem. The other proposal I had was to have only one pinned thread manipulating the heap, but that is out of scope for this test. I put the heap in coherent memory, but I am using the cached region for the stack.
 * region but the architecture maps addressable RAM twice in
 * two different regions in a way that for any given pointer,
 * it is possible to convert it to/from a cached version.
 *
This is closer but still not quite right: the trick only works if the stack size and alignment are padded to fit within an integer number of cache lines, with no overlap on either side that might be used (e.g. for heap metadata) from another core. Or alternatively (since the allocation is done in subsystem code) you can trim out the cache-aligned subrange of the stack memory and use that.
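(The "trim out the cache-aligned subrange" variant, as a minimal sketch with a hypothetical helper:)

```c
#include <stdint.h>
#include <zephyr/cache.h>
#include <zephyr/sys/util.h>

/* Shrink [buf, buf + size) to the largest inner range that starts and
 * ends on cache line boundaries, so no usable line is shared with
 * whatever sits on either side (e.g. heap metadata).
 */
static void *cache_line_subrange(void *buf, size_t size, size_t *out_size)
{
	size_t line = sys_cache_data_line_size_get();
	uintptr_t start = ROUND_UP((uintptr_t)buf, line);
	uintptr_t end = ROUND_DOWN((uintptr_t)buf + size, line);

	*out_size = (end > start) ? (end - start) : 0;
	return (void *)start;
}
```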
That makes sense. Thinking about it a little more, isn't that a general requirement for targets with incoherent memory? I mean, even static stacks should be cache aligned, right? Otherwise when flushing them we may be overwriting adjacent data used by other cores. I don't remember having seen this; I will take a look. Or I am just wrong about it :)
(force-pushed from e6b7318 to 98e1169)
@andyross you were right, we should just pay the price in dynamic stacks. Keeping them in cached regions is overcomplicated: it is not only about cache alignment, but when userspace is enabled the stack address is used to track the kernel object, and when we pass a cached pointer the kernel is not able to get the kernel object associated with it, requiring a lot of additional checks in different places.
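(The object-tracking problem in miniature, as a hypothetical check with assumed semantics: the cached alias is simply a different address from the one the kernel registered, so any address-keyed lookup misses:)

```c
#include <stdbool.h>
#include <zephyr/cache.h>
#include <zephyr/kernel.h>

/* On intel_adsp the same RAM is visible at two addresses. A stack
 * registered under its uncached address will not be found if a user
 * thread later presents the cached alias to a syscall.
 */
static bool lookup_would_miss(k_thread_stack_t *stack)
{
	void *alias = sys_cache_cached_ptr_get(stack);

	return alias != (void *)stack;
}
```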
Yeah. Cache incoherency is awful. I'll wear the scars from the APL DSP SMP bringup for the rest of my life.
@ceolin please fix that typo
When allowing dynamic thread stack allocation, the stack may come from the heap in coherent memory; trying to use cached memory is overcomplicated because of heap metadata and cache line sizes. Also, when userspace is enabled, stacks have to be page aligned and the address of the stack is used to track kernel objects. Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
When running it in a multicore environment with an incoherent cache, it is possible that the thread allocating dynamic stacks is switched to a different CPU. In this situation further accesses to that memory (like when releasing resources) will be invalid. Fixes #67515 Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
(force-pushed from 98e1169 to 3ea82dd)
refresh +1