Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions.
Updated Sep 8, 2024 · Cuda
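For context, the WMMA API exposes Tensor Cores through warp-level fragment types. The following is a minimal illustrative sketch (not the repository's optimized kernels): one warp computes a single 16x16 tile of C = A * B, with A, B, and C assumed to be row-major half-precision matrices of size 16x16.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes a single 16x16x16 HGEMM tile on Tensor Cores via WMMA.
// Launch with one warp, e.g. wmma_hgemm_16x16x16<<<1, 32>>>(dA, dB, dC);
__global__ void wmma_hgemm_16x16x16(const half *A, const half *B, half *C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> c_frag;

    wmma::fill_fragment(c_frag, __float2half(0.0f)); // zero the accumulator
    wmma::load_matrix_sync(a_frag, A, 16);           // leading dimension = 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A * B on Tensor Cores
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```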
Multiple GEMM operators are constructed with CUTLASS to support LLM inference.
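As a rough sketch of what a CUTLASS-based operator looks like, the snippet below instantiates a device-level Tensor Core HGEMM with CUTLASS's `cutlass::gemm::device::Gemm` template. The layouts, the SM80 target, and the names `M`, `N`, `K`, `d_A`, `d_B`, `d_C`, `alpha`, `beta` are illustrative assumptions, not the repository's actual operator set.

```cuda
#include <cutlass/gemm/device/gemm.h>

// fp16 inputs/outputs, fp32 accumulation, Tensor Core math on SM80.
using Gemm = cutlass::gemm::device::Gemm<
    cutlass::half_t, cutlass::layout::RowMajor,     // A: M x K
    cutlass::half_t, cutlass::layout::ColumnMajor,  // B: K x N
    cutlass::half_t, cutlass::layout::RowMajor,     // C: M x N
    float,                                          // accumulator
    cutlass::arch::OpClassTensorOp,                 // use Tensor Cores
    cutlass::arch::Sm80>;

cutlass::Status run_hgemm(int M, int N, int K,
                          cutlass::half_t const *d_A, cutlass::half_t const *d_B,
                          cutlass::half_t *d_C, float alpha, float beta) {
    Gemm gemm_op;
    return gemm_op({{M, N, K},
                    {d_A, K},       // leading dimension of row-major A
                    {d_B, K},       // leading dimension of column-major B
                    {d_C, N},       // C source
                    {d_C, N},       // D destination (written in place)
                    {alpha, beta}}); // D = alpha * A*B + beta * C
}
```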
Several common methods of matrix multiplication are implemented on the CPU and on NVIDIA GPUs using C++11 and CUDA.
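A minimal sketch of the naive GPU baseline such comparisons usually start from: one thread per output element of C = A * B, all matrices row-major single precision. Kernel and variable names are illustrative, not taken from the repository.

```cuda
// Naive GEMM: each thread computes one element of C (M x N) = A (M x K) * B (K x N).
__global__ void sgemm_naive(int M, int N, int K,
                            const float *A, const float *B, float *C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col]; // dot product of row and column
        C[row * N + col] = acc;
    }
}
```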
Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.
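The core building block of such kernels is the warp-level `mma.sync` PTX instruction. Below is a minimal device-side wrapper around the fp16 m16n8k16 variant (requires SM80 or newer); it assumes the A, B, and C fragments have already been loaded into registers (e.g. via `ldmatrix`), and register counts follow the PTX ISA (A: 4 x .b32, B: 2 x .b32, C/D: 2 x .b32). This is an illustrative sketch, not the repository's fused back-to-back kernel.

```cuda
#include <cstdint>

// D = A * B + C for one m16n8k16 tile, fp16 inputs and fp16 accumulation.
__device__ void mma_m16n8k16_fp16(uint32_t *d, const uint32_t *a,
                                  const uint32_t *b, const uint32_t *c) {
    asm volatile(
        "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
        "{%0, %1}, {%2, %3, %4, %5}, {%6, %7}, {%8, %9};\n"
        : "=r"(d[0]), "=r"(d[1])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]),
          "r"(c[0]), "r"(c[1]));
}
```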
A C library for matrix calculations.