ita9naiwa/attention-impl: attention implementation

CUDA PyTorch functions for LLMs

For study purposes.

Implemented attention variants:

Naive Attention
Attention with KV cache
Attention with non-contiguous memory
Single Query Attention with non-contiguous KV cache (PagedAttention with block size 1)
Multi Query Attention with non-contiguous KV cache (for Speculative Decoding)
Rotary Embedding
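A minimal pure-Python reference for the first two items, assuming a single head, no batching, and a causal (decoder-style) mask. The names `attend`, `naive_causal_attention`, and `KVCache` are hypothetical and do not mirror the repository's CUDA kernels. The sketch shows why a KV cache works: each decoding step appends one key/value pair and attends over everything cached so far, which yields the same outputs as recomputing attention over the whole prefix at every step.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scaled dot-product attention of one query vector over lists of key/value vectors.
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def naive_causal_attention(Q, K, V):
    # Position t attends to positions 0..t, recomputed from scratch for every t.
    return [attend(Q[t], K[: t + 1], V[: t + 1]) for t in range(len(Q))]

class KVCache:
    # Hypothetical append-only cache used during token-by-token decoding.
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Store the new token's key/value, then attend over every cached entry.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[0.5, 0.1], [0.2, 0.9], [0.3, 0.3]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

full = naive_causal_attention(Q, K, V)
cache = KVCache()
incremental = [cache.step(q, k, v) for q, k, v in zip(Q, K, V)]
print(full == incremental)  # True
```

The cached version does O(t) work per new token instead of O(t²) for recomputing the whole prefix, at the cost of storing every past key and value.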
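The non-contiguous variants can be illustrated in the same style. In this sketch (assumptions: pure Python, one head, one query; `paged_single_query_attention` and the pool/`block_table` layout are hypothetical, not the repository's API), keys and values for several sequences share one physical pool, and each sequence keeps a table of slot indices. With a block size of 1, entry `i` of the table is simply the slot holding logical token `i`, so attention gathers through the table instead of reading a contiguous range.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def paged_single_query_attention(q, key_pool, value_pool, block_table):
    # block_table[i] is the pool slot of logical token i (block size 1).
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, key_pool[slot])) * scale for slot in block_table]
    weights = softmax(scores)
    dim_v = len(value_pool[block_table[0]])
    return [
        sum(w * value_pool[slot][j] for w, slot in zip(weights, block_table))
        for j in range(dim_v)
    ]

# A shared pool holding tokens from two sequences, interleaved as they were allocated.
key_pool = [[0.1, 0.2], [0.9, 0.0], [0.3, 0.7], [0.5, 0.5], [0.0, 1.0], [0.8, 0.4]]
value_pool = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 1.0], [3.0, 0.5], [0.5, 2.0]]
table_a = [0, 2, 5]  # sequence A owns slots 0, 2, 5
table_b = [1, 3, 4]  # sequence B owns slots 1, 3, 4

q = [0.6, -0.2]
paged = paged_single_query_attention(q, key_pool, value_pool, table_a)
out_b = paged_single_query_attention(q, key_pool, value_pool, table_b)

# Reference: copy sequence A into contiguous storage first, then attend in order.
contig_k = [key_pool[s] for s in table_a]
contig_v = [value_pool[s] for s in table_a]
contiguous = paged_single_query_attention(q, contig_k, contig_v, list(range(len(table_a))))
print(paged == contiguous)  # True
```

Because slots are allocated per token, a sequence can grow without reserving a large contiguous region up front; the trade-off is one extra indirection per key/value read.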
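For the speculative-decoding variant, this sketch reads "multi query" as several draft tokens of one sequence being scored in a single pass (an assumption; the term can also mean multi-query attention with shared K/V heads). Each draft query must see the cached context plus only the draft tokens at or before its own position. The helper `paged_multi_query_attention` is hypothetical, and it assumes a block-size-1 slot table that maps every logical token (context and drafts) to a slot in a shared pool.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scaled dot-product attention of one query vector over lists of key/value vectors.
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def paged_multi_query_attention(queries, key_pool, value_pool, block_table):
    # block_table covers context + draft tokens; the queries are the last len(queries) of them.
    n_ctx = len(block_table) - len(queries)
    outputs = []
    for i, q in enumerate(queries):
        # Causal mask: draft token i sees all context tokens and draft tokens 0..i.
        visible = block_table[: n_ctx + i + 1]
        keys = [key_pool[s] for s in visible]
        values = [value_pool[s] for s in visible]
        outputs.append(attend(q, keys, values))
    return outputs

key_pool = [[0.2, 0.1], [0.7, 0.3], [0.4, 0.9], [0.1, 0.6], [0.8, 0.8], [0.5, 0.2]]
value_pool = [[1.0, 0.0], [0.0, 2.0], [1.5, 1.0], [0.5, 0.5], [2.0, 1.0], [1.0, 3.0]]
block_table = [4, 0, 2, 5, 1]  # 3 cached context tokens followed by 2 draft tokens
drafts = [[0.3, -0.4], [1.0, 0.2]]

verified = paged_multi_query_attention(drafts, key_pool, value_pool, block_table)

# Reference: score the draft tokens one at a time, as plain single-query decoding would.
one_by_one = [
    attend(q, [key_pool[s] for s in block_table[: 3 + i + 1]],
              [value_pool[s] for s in block_table[: 3 + i + 1]])
    for i, q in enumerate(drafts)
]
print(verified == one_by_one)  # True
```

Scoring all drafts in one call lets a GPU kernel read the shared context keys and values once for every query instead of once per draft token.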
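Rotary embedding rotates pairs of query/key features by a position-dependent angle, so the query-key dot product depends only on the offset between the two positions. This sketch assumes the interleaved pairing `(x[2i], x[2i+1])` and frequency base 10000 of the original RoFormer formulation; other implementations (e.g. GPT-NeoX style) pair `x[i]` with `x[i + d/2]`, and the repository's `rotary_embedding.cu` may use either. `rotary_embed` is a hypothetical name.

```python
import math

def rotary_embed(x, pos, base=10000.0):
    # Rotate feature pair (x[2i], x[2i+1]) by angle pos * base**(-2i/d).
    d = len(x)
    if d % 2:
        raise ValueError("feature dimension must be even")
    out = list(x)
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = [0.3, -1.2, 0.7, 2.0]
k = [1.1, 0.4, -0.5, 0.9]

# The same offset (2) at different absolute positions gives the same score up to rounding.
s1 = dot(rotary_embed(q, 5), rotary_embed(k, 3))
s2 = dot(rotary_embed(q, 12), rotary_embed(k, 10))
print(abs(s1 - s2) < 1e-9)  # True
```

Since each pair is rotated rather than scaled, vector norms are preserved, and position 0 leaves the input unchanged.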

Repository files:

tests
.gitignore
README.md
attention_kernel.cu
fused_matmul_kernel.cu
norm_kernel.cu
ops.h
packed_attention_kernel.cu
pybind.cpp
rotary_embedding.cu
setup.py
util.cuh