-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pass KernelOps into SharedMemoryCache constructor #1531
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some comments on the PR, but the main issue I have is with
1.) The lack of documentation regarding using KernelOps
. E.g., how does someone who is writing a new kernel that uses shared memory know what to do. We need a extensive wiki page describing this.
2.) How KernelOps
will be tested in CI without any SYCL target. How can we ensure that KernelOps
are developed correctly?
We can discuss in our call.
@@ -303,7 +304,7 @@ namespace quda { | |||
#pragma unroll | |||
for (int s = 0; s < uvSpin; s++) { | |||
if constexpr (Arg::compute_max) { | |||
uv_max = fmax(UV[s].abs_max(), uv_max); | |||
uv_max = max(UV[s].abs_max(), uv_max); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file uses a mixed of quda::max
and max
. Can you make it consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quda::max
was needed in some places to avoid conflict with local variables named max
.
I've renamed the local variables and replaced quda::max
with just max
.
768f37b
@@ -205,16 +221,19 @@ namespace quda { | |||
int idx1 = (((x[3] * X[2] + x[2]) * X[1] + x[1]) * X[0] + x[0]) >> 1; | |||
Link link1 = arg.u(mu, idx1, 1 - parity); | |||
|
|||
switch (Arg::type) { | |||
if constexpr (Arg::type == 3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the change here from switch
to if
? The former is more logical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to be constexpr
now since the different routines being called in each case have different variants of the SharedMemoryCache
. This corresponding KernelOps
are selected here https://github.com/lattice/quda/pull/1531/files#diff-aa538a7e33ac26c68858653fe4c25222aae7dc96f95459c62bf86ea5acd254b4R135 at compile time. Due to the compile-time consistency checking of the SharedMemoryCache
instantiation, we can't compile the branches that don't have the correct KernelOps
even if they are never called.
include/gauge_fix_ovr_hit_devf.cuh
Outdated
SharedMemoryCache<Float> cache; | ||
auto elems = cache.data(); | ||
SharedMemoryCache<array<Float, 4>> cache(ftor); | ||
Float *elems = &(*cache.data())[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not liking this change. Why is this pathology necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there was no need to use an inner array. I've made it similar to the others now.
99632a5
I've added some documentation here As we discussed on the call, the |
Documentation is exactly what I wanted, thanks @jcosborn. |
This design looks good to me given the requirements, thanks @jcosborn. The wiki page is greatly appreciated. I don't "like" the My only request is for the wiki page; I think the links to files + line-numbers should be modified to be permalinks to a specific commit as opposed to just |
Thanks for the review. Making the wiki use permalinks is a good idea. Once this is merged I'll update it with permalinks to develop. |
This will allow the SYCL target to allocate shared memory at kernel launch and pass the pointer in through the KernelOps structure.