Skip test when atomic operations are not supported on GPU. #7117
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
|
Btw, let me provide a bit more background here: PyTorch recently introduced this commit, pytorch/pytorch@fc5fda1, which changes the behavior of the API used in dgl/tests/python/common/ops/test_ops.py (lines 410 to 415 in 8e6cbd6).
Sadly, even though we know that A100 is the first GPU generation that supports bf16 operations [1] [2], we can no longer use this API (torch.cuda.is_bf16_supported()) to query for bf16 operation support. cc @TristonC @nv-dlasalle @frozenbugs
As of now, this commit has not propagated to any PyTorch release branch, so the failure can only be reproduced with a nightly PyTorch build.
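To make the background above concrete, here is a minimal sketch of how a test could decide to skip bf16 cases by querying the device's compute capability rather than torch.cuda.is_bf16_supported(). It is not the code in this PR. The helper names (supports_bf16_atomics, current_capability, should_skip_bf16_test) and the (8, 0) threshold constant are assumptions made for illustration, based on A100 (compute capability 8.0) being the first generation mentioned above.

```python
from typing import Optional, Tuple

# Assumption for this sketch: bf16 atomics need compute capability 8.0
# (Ampere, e.g. A100) or newer.
MIN_BF16_ATOMIC_CAPABILITY = (8, 0)


def supports_bf16_atomics(capability: Optional[Tuple[int, int]]) -> bool:
    """Hypothetical helper: True if a GPU with this (major, minor)
    compute capability is assumed to support bf16 atomic operations."""
    if capability is None:  # no CUDA device available
        return False
    return tuple(capability) >= MIN_BF16_ATOMIC_CAPABILITY


def current_capability() -> Optional[Tuple[int, int]]:
    """Return the current device's compute capability, or None when
    PyTorch or a CUDA device is not available."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


def should_skip_bf16_test(dtype_name: str) -> bool:
    """Hypothetical helper: skip only bf16 cases on devices that are
    assumed to lack bf16 atomics."""
    if dtype_name != "bfloat16":
        return False
    return not supports_bf16_atomics(current_capability())
```

A test could call should_skip_bf16_test("bfloat16") at the top and skip itself when it returns True. Checking the capability directly keeps the decision independent of how torch.cuda.is_bf16_supported() behaves in a given PyTorch build.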
@chang-l To clarify the issue: because DGL has some operations implemented only via atomics, we cannot support these GPUs, whereas PyTorch has non-atomic versions of its operators and can therefore implement BF16 operations on these GPUs?
@nv-dlasalle Then do we need to update the assertions in https://github.com/dmlc/dgl/blob/master/src/array/cuda/atomic.cuh?
I am okay with the current fix in this PR.
Description
The following tests are failing for dtype=torch.bfloat16 with the error message:
Checklist
Please feel free to remove inapplicable items for your PR.
Changes