Implementation of the sparse attention pattern proposed by the DeepSeek team in their Native Sparse Attention paper.
$ pip install native-sparse-attention-pytorch
import torch
from native_sparse_attention_pytorch import SparseAttention

attn = SparseAttention(
    dim = 512,
    dim_head = 64,
    heads = 8,
    sliding_window_size = 2,
    compress_block_size = 4,
    selection_block_size = 4,
    num_selected_blocks = 2
)

tokens = torch.randn(2, 31, 512)

attended = attn(tokens)

assert tokens.shape == attended.shape
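In the paper, each query attends through three branches: attention over compressed block summaries, attention over a small number of selected fine-grained blocks, and a causal sliding window; the constructor arguments above size those branches. The standalone sketch below only visualizes the window-plus-selection sparsity pattern as a dense boolean mask, using random stand-in block scores (in NSA the block importances come from the compressed-attention branch) and treating sliding_window_size as a token count; it is an illustration of the pattern, not the library's fused implementation.

import torch

# illustrative only: materialize which keys each query may attend to as a
# dense boolean matrix; the library never builds this mask explicitly

seq_len              = 32
sliding_window_size  = 2   # taken here as a token-level causal window
selection_block_size = 4
num_selected_blocks  = 2

q_idx = torch.arange(seq_len).unsqueeze(1)   # (seq, 1)
k_idx = torch.arange(seq_len).unsqueeze(0)   # (1, seq)

# causal sliding window branch
window_mask = (k_idx <= q_idx) & (k_idx > q_idx - sliding_window_size)

# selected-block branch, with random stand-in scores per (query, block)
num_blocks   = seq_len // selection_block_size
block_start  = torch.arange(num_blocks) * selection_block_size
selectable   = block_start.unsqueeze(0) <= q_idx                         # (seq, blocks)
block_scores = torch.randn(seq_len, num_blocks).masked_fill(~selectable, float('-inf'))

topk = block_scores.topk(num_selected_blocks, dim = -1).indices          # (seq, k)

selected_blocks = torch.zeros(seq_len, num_blocks, dtype = torch.bool)
selected_blocks.scatter_(1, topk, torch.ones_like(topk, dtype = torch.bool))
selected_blocks &= selectable                                            # drop acausal picks

block_of_key  = torch.arange(seq_len) // selection_block_size            # (seq,)
selected_mask = selected_blocks[:, block_of_key] & (k_idx <= q_idx)      # (seq, seq)

attend_mask = window_mask | selected_mask                                # True = may attend
print(attend_mask[:8, :8].int())

Row i of attend_mask marks the keys query i may attend to; the actual module computes the branches with block-sparse kernels and combines them with learned gates, as described in the paper.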
Enwik8 language modeling
$ pip install .[examples]
Then
$ python train.py
To record some of your experiments, just run wandb login first, then enable wandb tracking in the training script.
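train.py trains a small character-level model on enwik8. As a rough sketch of how SparseAttention might be dropped into such an autoregressive model: the class below, SparseAttnLM, and its hyperparameters are illustrative, not the architecture actually used in train.py, and it assumes the module enforces its causal sparse pattern internally.

import torch
from torch import nn
from native_sparse_attention_pytorch import SparseAttention

class SparseAttnLM(nn.Module):
    def __init__(self, num_tokens = 256, dim = 512, depth = 4, heads = 8, dim_head = 64):
        super().__init__()
        self.embed = nn.Embedding(num_tokens, dim)

        self.layers = nn.ModuleList([])
        for _ in range(depth):
            self.layers.append(nn.ModuleList([
                nn.LayerNorm(dim),
                SparseAttention(
                    dim = dim,
                    dim_head = dim_head,
                    heads = heads,
                    sliding_window_size = 2,
                    compress_block_size = 4,
                    selection_block_size = 4,
                    num_selected_blocks = 2
                ),
                nn.Sequential(
                    nn.LayerNorm(dim),
                    nn.Linear(dim, dim * 4),
                    nn.GELU(),
                    nn.Linear(dim * 4, dim)
                )
            ]))

        self.norm = nn.LayerNorm(dim)
        self.to_logits = nn.Linear(dim, num_tokens)

    def forward(self, ids):                      # ids: (batch, seq) of byte ids
        x = self.embed(ids)
        for norm, attn, ff in self.layers:
            x = attn(norm(x)) + x                # sparse attention block with residual
            x = ff(x) + x                        # feedforward block with residual
        return self.to_logits(self.norm(x))      # (batch, seq, num_tokens)

# one toy optimization step on random byte sequences
model = SparseAttnLM()
optim = torch.optim.Adam(model.parameters(), lr = 3e-4)

ids = torch.randint(0, 256, (2, 65))
logits = model(ids[:, :-1])                      # predict the next byte at every position
loss = nn.functional.cross_entropy(logits.transpose(1, 2), ids[:, 1:])

loss.backward()
optim.step()
optim.zero_grad()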
Citations

@inproceedings{Yuan2025NativeSA,
    title     = {Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention},
    author    = {Jingyang Yuan and Huazuo Gao and Damai Dai and Junyu Luo and Liang Zhao and Zhengyan Zhang and Zhenda Xie and Y. X. Wei and Lean Wang and Zhiping Xiao and Yuqing Wang and Chong Ruan and Ming Zhang and Wenfeng Liang and Wangding Zeng},
    year      = {2025},
    url       = {https://api.semanticscholar.org/CorpusID:276408911}
}

@inproceedings{Keles2022OnTC,
    title     = {On The Computational Complexity of Self-Attention},
    author    = {Feyza Duman Keles and Pruthuvi Maheshakya Wijewardena and Chinmay Hegde},
    booktitle = {International Conference on Algorithmic Learning Theory},
    year      = {2022},
    url       = {https://api.semanticscholar.org/CorpusID:252198880}
}