pass KernelOps into SharedMemoryCache constructor #1531

jcosborn · 2025-01-04T23:00:24Z

This will allow the SYCL target to allocate shared memory at kernel launch and pass the pointer in through the KernelOps structure.

maddyscientist

I've left some comments on the PR, but the main issue I have is with
1.) The lack of documentation regarding using KernelOps. E.g., how does someone who is writing a new kernel that uses shared memory know what to do. We need a extensive wiki page describing this.
2.) How KernelOps will be tested in CI without any SYCL target. How can we ensure that KernelOps are developed correctly?

We can discuss in our call.

include/gauge_fix_ovr_hit_devf.cuh

maddyscientist · 2025-01-15T01:08:52Z

include/kernels/coarse_op_kernel.cuh

@@ -303,7 +304,7 @@ namespace quda {
 #pragma unroll
    for (int s = 0; s < uvSpin; s++) {
      if constexpr (Arg::compute_max) {
-        uv_max = fmax(UV[s].abs_max(), uv_max);
+        uv_max = max(UV[s].abs_max(), uv_max);


This file uses a mixed of quda::max and max. Can you make it consistent?

quda::max was needed in some places to avoid conflict with local variables named max.
I've renamed the local variables and replaced quda::max with just max.
768f37b

maddyscientist · 2025-01-15T01:13:34Z

include/kernels/gauge_fix_ovr.cuh

@@ -205,16 +221,19 @@ namespace quda {
        int idx1 = (((x[3] * X[2] + x[2]) * X[1] + x[1]) * X[0] + x[0]) >> 1;
        Link link1 = arg.u(mu, idx1, 1 - parity);

-        switch (Arg::type) {
+        if constexpr (Arg::type == 3) {


Why the change here from switch to if ? The former is more logical.

This has to be constexpr now since the different routines being called in each case have different variants of the SharedMemoryCache. This corresponding KernelOps are selected here https://github.com/lattice/quda/pull/1531/files#diff-aa538a7e33ac26c68858653fe4c25222aae7dc96f95459c62bf86ea5acd254b4R135 at compile time. Due to the compile-time consistency checking of the SharedMemoryCache instantiation, we can't compile the branches that don't have the correct KernelOps even if they are never called.

maddyscientist · 2025-01-15T01:20:01Z

include/gauge_fix_ovr_hit_devf.cuh

-    SharedMemoryCache<Float> cache;
-    auto elems = cache.data();
+    SharedMemoryCache<array<Float, 4>> cache(ftor);
+    Float *elems = &(*cache.data())[0];


I am not liking this change. Why is this pathology necessary?

Yes, there was no need to use an inner array. I've made it similar to the others now.
99632a5

include/kernels/block_transpose.cuh

jcosborn · 2025-01-16T16:15:55Z

I've left some comments on the PR, but the main issue I have is with 1.) The lack of documentation regarding using KernelOps. E.g., how does someone who is writing a new kernel that uses shared memory know what to do. We need a extensive wiki page describing this. 2.) How KernelOps will be tested in CI without any SYCL target. How can we ensure that KernelOps are developed correctly?

We can discuss in our call.

I've added some documentation here
https://github.com/lattice/quda/wiki/KernelOps-usage
It will need to be updated as more pieces are merged in from the SYCL branch.
Let me know if anything isn't clear so far.

As we discussed on the call, the KernelOps handling should be fairly robust (once fully merged), so that instantiating tracked operations will require passing in a kernel functor that inherited from the appropriate KernelOps type. This type checking happens at compile time.

maddyscientist · 2025-01-16T17:20:18Z

I've added some documentation here https://github.com/lattice/quda/wiki/KernelOps-usage
It will need to be updated as more pieces are merged in from the SYCL branch. Let me know if anything isn't clear so far.

Documentation is exactly what I wanted, thanks @jcosborn.

… others

weinbe2 · 2025-01-30T04:15:03Z

This design looks good to me given the requirements, thanks @jcosborn. The wiki page is greatly appreciated.

I don't "like" the decltype(*this) that needed to be used in one of the coarse-related file, but since it's a one-off in a single file I'll let it go. I understand why template deduction couldn't be used without a bunch of unnecessary function argument rearranging.

My only request is for the wiki page; I think the links to files + line-numbers should be modified to be permalinks to a specific commit as opposed to just develop. This isn't a blocking requirement for this PR; if anything it can be updated with the commit id of the merge.

jcosborn · 2025-01-30T20:07:10Z

Thanks for the review. Making the wiki use permalinks is a good idea. Once this is merged I'll update it with permalinks to develop.

pass KernelOps into SharedMemoryCache constructor

bc48268

jcosborn requested a review from a team as a code owner January 4, 2025 23:00

maddyscientist requested changes Jan 15, 2025

View reviewed changes

use auto instead of an explicit type

4c190e3

jcosborn added 8 commits January 16, 2025 11:25

attempt to fix CI

00e5f08

try fixing CI again

82ed89b

fix CI

a1ff6e8

change GaugeFixHit_NoAtomicAdd SharedMemoryCache to be similar to the…

99632a5

… others

replace quda::max with max and rename conflicting local variables

768f37b

reuse color_spinor_t definition

4cc7964

update CI

af110f6

Merge branch 'develop' into feature/sycl-merge

57a0ef3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pass KernelOps into SharedMemoryCache constructor #1531

pass KernelOps into SharedMemoryCache constructor #1531

jcosborn commented Jan 4, 2025

maddyscientist left a comment

maddyscientist Jan 15, 2025

jcosborn Jan 16, 2025

maddyscientist Jan 15, 2025

jcosborn Jan 16, 2025

maddyscientist Jan 15, 2025

jcosborn Jan 16, 2025

jcosborn commented Jan 16, 2025

maddyscientist commented Jan 16, 2025 •

edited

Loading

weinbe2 commented Jan 30, 2025

jcosborn commented Jan 30, 2025

pass KernelOps into SharedMemoryCache constructor #1531

Are you sure you want to change the base?

pass KernelOps into SharedMemoryCache constructor #1531

Conversation

jcosborn commented Jan 4, 2025

maddyscientist left a comment

Choose a reason for hiding this comment

maddyscientist Jan 15, 2025

Choose a reason for hiding this comment

jcosborn Jan 16, 2025

Choose a reason for hiding this comment

maddyscientist Jan 15, 2025

Choose a reason for hiding this comment

jcosborn Jan 16, 2025

Choose a reason for hiding this comment

maddyscientist Jan 15, 2025

Choose a reason for hiding this comment

jcosborn Jan 16, 2025

Choose a reason for hiding this comment

jcosborn commented Jan 16, 2025

maddyscientist commented Jan 16, 2025 • edited Loading

weinbe2 commented Jan 30, 2025

jcosborn commented Jan 30, 2025

maddyscientist commented Jan 16, 2025 •

edited

Loading