SpargeAttention: a training-free sparse attention method that can accelerate inference for any model.
attention quantization mlsys inference-acceleration ai-infra vision-transformer sparse-attention llm sageattention
Updated Feb 25, 2025 - Cuda