Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix duplicate kernel naming in reduce-then-scan kernels #2040

Merged
merged 3 commits into from
Feb 3, 2025

Conversation

mmichel11
Copy link
Contributor

@mmichel11 mmichel11 commented Feb 3, 2025

Description

#2031 reverted the change in #1997 which always passed execution policies through reduce-then-scan as const l-value references. After this, we began to see duplicate kernel name compilation errors in scan_2.pass.

This observed bug is caused by the fact that the execution policy is passed to the submitter function objects as a forwarding reference in the reduce-then-scan submitters. This means that even though the kernel names have been properly generated to be "unique", different cv / ref qualifiers of the execution policy will lead to separate function template instantiations and will be submitting separate kernels as a result. This subtle issue results in duplicate kernel names even if the kernels themselves are logically the same as the same name is used throughout.

To fix this, I have switched the function call operators of the submitters to accept a sycl::queue as this is all that is needed at this level.

Further Details

Here is a minimal reproducer of the underlying issue to go alongside the explanation:

#include <sycl/sycl.hpp>
#include <utility>

template <typename KernelName, typename Dummy>
void foo(Dummy&&, sycl::queue q)
{
    q.submit([](sycl::handler& cgh){
        cgh.single_task<KernelName>([] {
            sycl::ext::oneapi::experimental::printf("hello world\n");
        });
    }).wait();
}

struct dummy
{
};

class kernel;

int main()
{
    sycl::queue q;
    dummy d;
    
    // We will see a compiler error due to duplicate kernel names.
    foo<kernel>(d, q);
    foo<kernel>(std::move(d), q);
}

Signed-off-by: Matthew Michel <matthew.michel@intel.com>
@danhoeflinger
Copy link
Contributor

danhoeflinger commented Feb 3, 2025

Wow this is very sneaky, and also affects ALL submitters which accept forwarding references (not exclusive to policies).

danhoeflinger
danhoeflinger previously approved these changes Feb 3, 2025
Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think we need to do an audit of all submitters to look for other problems where we might be compiling more kernels than we intend by accepting forwarding references.

@SergeyKopienko
Copy link
Contributor

As far as I remember we never fixed these problems by this way. More standard practice - to pass execution policy into submitters. What special things we have in modified submitters for use this approach?

@SergeyKopienko
Copy link
Contributor

"To fix this, I have switched the function call operators of the submitters to accept a sycl::queue as this is all that is needed at this level."

  • I think at this moment your approach is acceptable only in this one submitter. I think in other cases we need have execution policy too.

@mmichel11
Copy link
Contributor Author

As far as I remember we never fixed these problems by this way. More standard practice - to pass execution policy into submitters. What special things we have in modified submitters for use this approach?

The problem I mentioned here exists in every single submitter call. I just filed #2041 which uses reduce as an example. I do not think our testing tries to detect this issue. We just got lucky with the scan_2.pass failure to realize what is happening internally.

@mmichel11
Copy link
Contributor Author

mmichel11 commented Feb 3, 2025

"To fix this, I have switched the function call operators of the submitters to accept a sycl::queue as this is all that is needed at this level."

  • I think at this moment your approach is acceptable only in this one submitter. I think in other cases we need have execution policy too.

It is correct. I have not reviewed the other submitters. This patch is to fix a test issue to unblock internal CI. We need to consider how to best fix this issue in the long-term. One alternative would be to just accept _ExecutionPolicy by value.

@SergeyKopienko
Copy link
Contributor

Thanks, @mmichel11, after offline discussion I understood what happen...

Copy link
Contributor

@rarutyun rarutyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this fix is correct, given that how we deal with the same type of error in other places.

The thing is with the proposed change we create an inconsistency precedent of how we handle kernel name in different places of the code. I don't see a good reason for that.

So, for now I would recommend to add ExecutionPolicy as a part of the name for kernel_generator, that we use for those kernel names.

@danhoeflinger
Copy link
Contributor

danhoeflinger commented Feb 3, 2025

I don't think this fix is correct, given that how we deal with the same type of error in other places.

The thing is with the proposed change we create an inconsistency precedent of how we handle kernel name in different places of the code. I don't see a good reason for that.

So, for now I would recommend to add ExecutionPolicy as a part of the name for kernel_generator, that we use for those kernel names.

Do we really want to compile multiple kernels based on what policy reference flavor we pass in to the submitter?
It seems like this should be one call unified call to the submitter, and one kernel compilation. It can be a big overhead otherwise.

@SergeyKopienko
Copy link
Contributor

SergeyKopienko commented Feb 3, 2025

I don't think this fix is correct, given that how we deal with the same type of error in other places.
The thing is with the proposed change we create an inconsistency precedent of how we handle kernel name in different places of the code. I don't see a good reason for that.
So, for now I would recommend to add ExecutionPolicy as a part of the name for kernel_generator, that we use for those kernel names.

Do we really want to compile multiple kernels based on what policy reference type we pass in to the submitter? It seems like this should be one call unified call to the submitter, and one kernel compilation. It can be a big overhead otherwise.

We have a lot of usages of __kernel_name_generator with _ExecutionPolicy in template params.
So it's looks like as common solution for now.

But I agree that problem really has place but we should have some common solution for all cases, not only for this one.

Moreover, probably the usages of __kernel_name_generator without _ExecutionPolicy in template params looks like more error-prone, I think. It works only by real use cases.

Signed-off-by: Matthew Michel <matthew.michel@intel.com>
@mmichel11
Copy link
Contributor Author

mmichel11 commented Feb 3, 2025

@danhoeflinger I talked with @rarutyun here, and we came to the agreement that we should keep a consistent approach in the immediate-term (add _ExecutionPolicy to the kernel name similar to what is done in radix sort). It still has this limitation as you noted, but it is the same for every other oneDPL algorithm.

As a follow-up, we need a library wide discussion on what approach should be used to address the problem in the long-term.

@rarutyun
Copy link
Contributor

rarutyun commented Feb 3, 2025

@danhoeflinger I talked with @rarutyun here, and we came to the agreement that we should keep a consistent approach in the immediate-term (add _ExecutionPolicy to the kernel name similar to what is done in radix sort). It still has this limitation as you noted, but it is the same for every other oneDPL algorithm.

As a follow-up, we need a library wide discussion on what approach should be used to address the problem in the long-term.

@danhoeflinger, Yes. I agree with the problem. Let's fix it library-wide. Applying fixes here and there without having a oneDPL strategy could potentially hide the problem. So, +1 to all what Matt said.

Copy link
Contributor

@rarutyun rarutyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fixes the problem (I assume you've checked that, and it does) fill free to merge as soon as CI passes.

Thanks.

@danhoeflinger
Copy link
Contributor

OK, no objections from me as long as we have a plan to fix this on the whole.

@mmichel11 mmichel11 merged commit a045eac into main Feb 3, 2025
22 checks passed
@mmichel11 mmichel11 deleted the dev/mmichel11/fix_duplicate_kernel_names_scan_2 branch February 3, 2025 20:28
Copy link
Contributor

@SergeyKopienko SergeyKopienko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

mmichel11 added a commit that referenced this pull request Feb 6, 2025
…iler optimization detection (#2046)

Reverts #2040 and instead uses the `__OPTIMIZE__` macro defined by clang-based compilers to detect -O0 compilation and compile reduce-then-scan paths with a sub-group size of 16 to work around hardware bugs on older integrated graphics architectures. This avoids the performance impact of the kernel bundle approach.
---------

Signed-off-by: Matthew Michel <matthew.michel@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants