-
Notifications
You must be signed in to change notification settings - Fork 668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(cmake) Fix cuda arch selection #1091
(cmake) Fix cuda arch selection #1091
Conversation
… sure we build only selected cubins and only ptx for latest capability
@@ -125,7 +125,7 @@ jobs: | |||
docker run --platform linux/$build_arch -i -w /src -v $PWD:/src $image sh -c \ | |||
"apt-get update \ | |||
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends cmake \ | |||
&& cmake -DCOMPUTE_BACKEND=cuda -DNO_CUBLASLT=${NO_CUBLASLT} . \ | |||
&& cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=\"50;52;60;61;70;75;80;86;89;90\" -DNO_CUBLASLT=${NO_CUBLASLT} . \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I selected these for the CI to bring parity with what the old Makefile
was selecting. When we get around to doing tests on GPU instances, we may want to narrow this down to "native" so we don't waste so much build time.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Thanks @matthewdouglas! Great work. @rickardp @wkpark @akx Could one of you please do a first review of this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I left a few comments / questions
@@ -82,6 +83,31 @@ if(BUILD_CUDA) | |||
message(FATAL_ERROR "CUDA Version > 12 is not supported") | |||
endif() | |||
|
|||
# CMake < 3.23.0 does not define CMAKE_CUDA_ARCHITECTURES_ALL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to just kill this code and bump cmake requirement to 3.23.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought about that, but considering Ubuntu 22.04 LTS as a popular host system, the default is 3.22.1. I believe this is what is on the Docker images that we're using too. I didn't want this PR to change too much outside of the scope of the issues I saw.
I know we could install a newer cmake on the Docker builds though, and GitHub's latest runner images all have CMake 3.28.x on them. So maybe a separate PR to bump up the minimum?
# Ensure we build the PTX for the latest version. | ||
# This is similar to the "all" and "all-major" options in CMake >= 23. | ||
# TODO: Consider bumping CMake requirement and using CMAKE_CUDA_ARCHITECTURES=[all | native] by default | ||
list(REMOVE_DUPLICATES COMPUTE_CAPABILITY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside of removing unsupported archs is that we now remove them silently if the build agent no longer supports them (for example after a CUDA update). Maybe it would be better to fail if specifying an an architecture that is not available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that CI/release builds should be specific and set COMPUTE_CAPABILITY (or CMAKE_CUDA_ARCHITECTURES if moving to use that directly). But otherwise, all we're doing here is adding the virtual architecture to the highest capability selected, e.g. generating both compute_90 and sm_90: arch=compute_90,code=[compute_90,sm_90]
. This is for forward compatibility.
We already have a guard for CUDA >= 13, so that would require manual intervention. We'd find out ahead of time as the architectures to be dropped would be deprecated in 12.x first. That should emit warnings from nvcc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense
753df25
into
bitsandbytes-foundation:main
Thanks @matthewdouglas, this is really helpful! Really glad to have your support with this :) Would you be up to join our Slack channel to be able to coordinate about stuff every now and then? If yes, please send me an email at my firstname at huggingface dot co with the email that I should use for the invite. I'm planning to merge some more workflow related PRs these days and aim to do a release to test PyPi by the end of the week in preparation for a full release next week. |
And thanks @rickardp for the review, very much appreciated as well! |
I noticed a few problems with the way the compute capabilities would be applied when compiling.
Currently
CMAKE_CUDA_ARCHITECTURES_ALL
is used to create a default list. This is defined incmake>=3.23.0
but our minimum requirement is below that,cmake==3.22.1
. The GitHub Actions for the Ubuntu builds are running the 3.21.0 version, and therefore have empty lists:Ubuntu, x86-64, CUDA 12.1.0
However, the Windows builds have a newer version of CMake set up, so it's closer to what we expect:
Windows, x86-64, CUDA 12.1
Besides this issue, there's two others resolved here:
CMAKE_CUDA_ARCHITECTURES
is not user-defined, it's initialized to thenvcc
default. In the case of 12.x, this is adding to the actual flags used when invoking nvcc:--generate-code=arch=compute_52,code=[sm_52]
After these changes, the generated flags look like this (Windows, CUDA 12.3, all capabilities selected)
Note that in a future PR, I'd like to consider replacing the build option
-DCOMPUTE_CAPABILITY=...
with the more canonicalCMAKE_CUDA_ARCHITECTURES
option. That can be a later change stacked after this PR.