Native Sparse Attention (wip)

Implementation of the sparse attention pattern proposed by the DeepSeek team in their Native Sparse Attention paper

Install

$ pip install native-sparse-attention-pytorch

Usage

import torch
from native_sparse_attention_pytorch import SparseAttention

attn = SparseAttention(
    dim = 512,                   # model dimension
    dim_head = 64,               # dimension per attention head
    heads = 8,                   # number of attention heads
    sliding_window_size = 2,     # size of the local sliding window branch
    compress_block_size = 4,     # block size for the coarse compressed attention branch
    selection_block_size = 4,    # block size for fine-grained block selection
    num_selected_blocks = 2      # number of blocks picked by the selection branch
)

tokens = torch.randn(2, 31, 512) # (batch, sequence, dim)

attended = attn(tokens)

assert tokens.shape == attended.shape
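
The returned tensor has the same shape as the input, so the module can be dropped into a standard transformer block. As a rough sketch (not part of this library) of how that might look, reusing only the constructor arguments shown above:

import torch
from torch import nn
from native_sparse_attention_pytorch import SparseAttention

class SparseAttentionBlock(nn.Module):
    # hypothetical pre-norm residual block wrapping SparseAttention
    def __init__(self, dim = 512):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = SparseAttention(
            dim = dim,
            dim_head = 64,
            heads = 8,
            sliding_window_size = 2,
            compress_block_size = 4,
            selection_block_size = 4,
            num_selected_blocks = 2
        )
        self.ff_norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim)
        )

    def forward(self, x):
        # residual sparse attention followed by a residual feedforward
        x = self.attn(self.attn_norm(x)) + x
        x = self.ff(self.ff_norm(x)) + x
        return x

block = SparseAttentionBlock()
out = block(torch.randn(2, 31, 512))
assert out.shape == (2, 31, 512)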

Example

Enwik8 language modeling

$ pip install .[examples]

Then

$ python train.py

To record your experiments with Weights & Biases, run wandb login first, then modify the training script to enable tracking
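
train.py takes care of the actual enwik8 data and model. Purely as a hypothetical sketch of what a single character-level training step with SparseAttention could look like (the hyperparameters and tiny model below are illustrative, not the ones defined in the repository, and random bytes stand in for enwik8):

import torch
from torch import nn
import torch.nn.functional as F
from native_sparse_attention_pytorch import SparseAttention

DIM, SEQ_LEN, NUM_TOKENS = 512, 256, 256  # hypothetical hyperparameters

class TinyCharLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_TOKENS, DIM)
        self.attn = SparseAttention(
            dim = DIM,
            dim_head = 64,
            heads = 8,
            sliding_window_size = 32,
            compress_block_size = 16,
            selection_block_size = 16,
            num_selected_blocks = 4
        )
        self.norm = nn.LayerNorm(DIM)
        self.to_logits = nn.Linear(DIM, NUM_TOKENS)

    def forward(self, ids):
        x = self.embed(ids)
        x = self.attn(x) + x
        return self.to_logits(self.norm(x))

model = TinyCharLM()
optim = torch.optim.Adam(model.parameters(), lr = 3e-4)

# random bytes stand in for enwik8 characters
data = torch.randint(0, NUM_TOKENS, (4, SEQ_LEN + 1))
inp, target = data[:, :-1], data[:, 1:]

# one training step of next-character prediction
# note: a real autoregressive setup also needs causal masking within the
# attention - consult train.py in the repository for the actual treatment
optim.zero_grad()
logits = model(inp)
loss = F.cross_entropy(logits.transpose(1, 2), target)
loss.backward()
optim.step()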

Citations

@inproceedings{Yuan2025NativeSA,
    title   = {Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention},
    author  = {Jingyang Yuan and Huazuo Gao and Damai Dai and Junyu Luo and Liang Zhao and Zhengyan Zhang and Zhenda Xie and Y. X. Wei and Lean Wang and Zhiping Xiao and Yuqing Wang and Chong Ruan and Ming Zhang and Wenfeng Liang and Wangding Zeng},
    year    = {2025},
    url     = {https://api.semanticscholar.org/CorpusID:276408911}
}
@inproceedings{Keles2022OnTC,
    title   = {On The Computational Complexity of Self-Attention},
    author  = {Feyza Duman Keles and Pruthuvi Maheshakya Wijewardena and Chinmay Hegde},
    booktitle = {International Conference on Algorithmic Learning Theory},
    year    = {2022},
    url     = {https://api.semanticscholar.org/CorpusID:252198880}
}
