Header-only CUDA/HIP library for using vector types and low-precision floating-point types (16-bit, 8-bit) in GPU code
performance cpp gpu cuda kernel-tuner hip vectorization floating-point half-precision mixed-precision low-precision bfloat16 header-only-library reduced-precision
Updated Dec 2, 2024 - C++