Add NVIDIA GPU runners and run CUDA tests again #18814
Labels
codegen/nvvm
NVVM code generation compiler backend
codegen/spirv
SPIR-V code generation compiler backend
hal/cuda
Runtime CUDA HAL backend
hal/vulkan
Runtime Vulkan GPU HAL backend
infrastructure
Relating to build systems, CI, or testing
We used to have some Linux runners with NVIDIA T4 and A100 GPUs on the GCP runner cluster. Our new Azure runner cluster currently only has Linux CPU runners.
We have unit tests and larger test suites that use the CUDA and Vulkan HAL backends. None of these are particularly CPU-heavy, so we could get by with a 4- or 8-core CPU and an attached GPU, if such a configuration is available.
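As a rough illustration of the runner shape described above, here is a minimal Python sketch of a pre-flight check that a candidate machine has enough CPU cores and a visible NVIDIA GPU. The helper names and the `MIN_CPU_CORES` threshold are hypothetical and not part of the project; the sketch only assumes that `nvidia-smi` is on the `PATH` when a driver is installed, and it does not check Vulkan support.

```python
import os
import shutil
import subprocess

# Assumption: lower bound taken from the "4 or 8 core" estimate in this issue.
MIN_CPU_CORES = 4


def parse_gpu_names(csv_output: str) -> list[str]:
    """Return one GPU name per non-empty line of nvidia-smi CSV output."""
    return [line.strip() for line in csv_output.splitlines() if line.strip()]


def list_nvidia_gpus() -> list[str]:
    """Return names of visible NVIDIA GPUs, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_names(result.stdout)


def runner_is_suitable(cpu_cores: int, gpus: list[str]) -> bool:
    """Hypothetical suitability rule: enough cores and at least one GPU."""
    return cpu_cores >= MIN_CPU_CORES and len(gpus) > 0


if __name__ == "__main__":
    cores = os.cpu_count() or 0
    gpus = list_nvidia_gpus()
    print(f"cores={cores} gpus={gpus} suitable={runner_is_suitable(cores, gpus)}")
```

A check like this could run as the first step of a GPU job so that a misconfigured runner fails fast instead of partway through the test suite.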
Target should be presubmit (pull_request and push events), with a load of around 50-100 (up to 400) runs per day.
Can start with a single GPU type, but eventually we could also run tests nightly across a wide range of data center and consumer cards, like T4, A100, H100, 1080, 2080, etc. We will also want to run benchmarks eventually, which will need persistent runners with cached model weights and some tuned hardware/driver settings.
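To get a feel for how many GPU runners the presubmit load above might require, the following Python sketch does a back-of-the-envelope capacity estimate. Only the runs-per-day figures (50, 100, 400) come from this issue. The 15-minute job duration, the 8-hour active window, and the 70% target utilization are placeholder assumptions that should be replaced with measured values.

```python
import math


def runners_needed(
    runs_per_day: float,
    minutes_per_run: float,
    active_hours_per_day: float = 24.0,
    target_utilization: float = 0.7,
) -> int:
    """Estimate how many single-job runners cover a daily presubmit load.

    Divides the total busy minutes per day by the minutes one runner can
    usefully provide, rounding up. Always returns at least one runner.
    """
    busy_minutes = runs_per_day * minutes_per_run
    available_minutes = active_hours_per_day * 60 * target_utilization
    return max(1, math.ceil(busy_minutes / available_minutes))


if __name__ == "__main__":
    # Assumptions (not from the issue): 15-minute jobs, load concentrated
    # in an 8-hour working window, runners kept at most 70% busy.
    for runs in (50, 100, 400):
        count = runners_needed(runs, minutes_per_run=15, active_hours_per_day=8)
        print(f"{runs} runs/day -> {count} runner(s)")
```

The estimate ignores queueing effects and bursty push traffic, so it is a lower bound for sizing rather than a latency guarantee.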