Add NVIDIA GPU runners and run CUDA tests again #18814

Open
ScottTodd opened this issue Oct 17, 2024 · 0 comments
Labels
codegen/nvvm: NVVM code generation compiler backend
codegen/spirv: SPIR-V code generation compiler backend
hal/cuda: Runtime CUDA HAL backend
hal/vulkan: Runtime Vulkan GPU HAL backend
infrastructure: Relating to build systems, CI, or testing

Comments

@ScottTodd
Member

We used to have some Linux runners with NVIDIA T4 and A100 GPUs on the GCP runner cluster. Our new Azure runner cluster currently only has Linux CPU runners.

We have unit tests and larger test suites that use the CUDA and Vulkan HAL backends. None of these are particularly CPU-heavy, so we could get by with a 4- or 8-core CPU and an attached GPU, if such a configuration is available.

The target is presubmit (`pull_request` and `push` events), with a load of around 50-100 runs per day (up to 400).
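For a rough sense of how many GPU runners that load implies, here is a back-of-the-envelope sketch. The job duration (15 minutes), the active window (10 hours per day), and the 70% utilization target are all assumptions for illustration, not measured numbers from our CI, and `runners_needed` is a hypothetical helper.

```python
import math


def runners_needed(
    runs_per_day: int,
    minutes_per_run: float,
    active_hours: float = 24.0,
    utilization: float = 0.7,
) -> int:
    """Estimate how many concurrent runners keep average demand under a utilization target.

    All inputs other than runs_per_day are assumptions; this ignores bursts and queueing.
    """
    # Total GPU-runner minutes requested per day.
    busy_minutes = runs_per_day * minutes_per_run
    # Minutes a single runner can absorb per day at the target utilization.
    available_minutes_per_runner = active_hours * 60 * utilization
    # Always provision at least one runner, even for a zero-load estimate.
    return max(1, math.ceil(busy_minutes / available_minutes_per_runner))


if __name__ == "__main__":
    # Load levels taken from the issue text: typical low, typical high, and peak.
    for runs in (50, 100, 400):
        count = runners_needed(runs, minutes_per_run=15, active_hours=10)
        print(f"{runs} runs/day -> {count} runner(s)")
```

This only averages demand over the active window, so real presubmit traffic (which tends to arrive in bursts) would likely need some headroom on top of it.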

We can start with a single GPU type, but eventually we could also run nightly tests across a wide range of data center and consumer cards, such as T4, A100, H100, 1080, and 2080. We will also want to run benchmarks eventually, which will need persistent runners with cached model weights and some tuned hardware/driver settings.
