Add DeepSeek R1 distill Llama models (#1922)
Co-authored-by: Ali Alshaarawy <ali.al-shaarawy@cerebras.net>
ali-alshaar7 and Ali Alshaarawy authored Jan 27, 2025
1 parent e7338f6 commit 0031d55
Showing 4 changed files with 53 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -142,6 +142,7 @@ Every model is written from scratch to maximize performance and remove layers of
| Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | [Hui, Binyuan et al. 2024](https://arxiv.org/abs/2409.12186) |
| Qwen2.5 Math | 1.5B, 7B, 72B | Alibaba Group | [An, Yang et al. 2024](https://arxiv.org/abs/2409.12122) |
| QwQ | 32B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwq-32b-preview/) |
| R1 Distill Llama | 8B, 70B | DeepSeek AI | [DeepSeek AI 2025](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) |
| SmolLM2 | 135M, 360M, 1.7B | Hugging Face | [Hugging Face 2024](https://github.com/huggingface/smollm) |
| Salamandra | 2B, 7B | Barcelona Supercomputing Centre | [BSC-LTC 2024](https://github.com/BSC-LTC/salamandra) |
| StableCode | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |
47 changes: 47 additions & 0 deletions litgpt/config.py
@@ -2400,5 +2400,52 @@ def norm_class(self) -> Type:
copy["hf_config"]["name"] = c["hf_config"]["name"].format(kind)
configs.append(copy)

###############
# DeepSeek R1 Distill
###############

r1_distill_llama = [
# https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/blob/main/config.json
dict(
name="R1-Distill-Llama-8B",
hf_config=dict(org="deepseek-ai", name="DeepSeek-R1-Distill-Llama-8B"),
block_size=131072,
vocab_size=128000,
padded_vocab_size=128256,
n_layer=32,
n_head=32,
n_query_groups=8,
rotary_percentage=1.0,
parallel_residual=False,
bias=False,
norm_class_name="RMSNorm",
mlp_class_name="LLaMAMLP",
intermediate_size=14336,
rope_base=500000,
rope_adjustments=dict(factor=8.0, low_freq_factor=1.0, high_freq_factor=4.0, original_max_seq_len=8192)
),
# https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/config.json
dict(
name="R1-Distill-Llama-70B",
hf_config=dict(org="deepseek-ai", name="DeepSeek-R1-Distill-Llama-70B"),
block_size=131072,
vocab_size=128000,
padded_vocab_size=128256,
n_layer=80,
n_head=64,
n_embd=8192,
n_query_groups=8,
rotary_percentage=1.0,
parallel_residual=False,
bias=False,
norm_class_name="RMSNorm",
mlp_class_name="LLaMAMLP",
intermediate_size=28672,
rope_base=500000,
rope_adjustments=dict(factor=8.0, low_freq_factor=1.0, high_freq_factor=4.0, original_max_seq_len=8192)
),
]

configs.extend(r1_distill_llama)

name_to_config = {config["name"]: config for config in configs}
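
As a quick check, the new entries resolve by name through LitGPT's `Config` class. A minimal sketch, assuming the public `litgpt.Config.from_name` API:

```python
from litgpt import Config

# Look up the 8B distill configuration registered above by its short name
cfg = Config.from_name("R1-Distill-Llama-8B")
print(cfg.n_layer, cfg.n_query_groups, cfg.rope_base)  # 32 8 500000
```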
2 changes: 2 additions & 0 deletions tests/test_model.py
@@ -225,6 +225,8 @@ def test_against_original_open_llama_3b(device, dtype):
{"name": "Llama-3.2-1B"},
{"name": "Llama-3.2-3B"},
{"name": "Llama-3.3-70B-Instruct"},
{"name": "R1-Distill-Llama-8B"},
{"name": "R1-Distill-Llama-70B"},
],
)
@pytest.mark.parametrize(
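
The new cases can be exercised in isolation with pytest's keyword filter, e.g. `pytest tests/test_model.py -k "R1-Distill"` (assuming the parametrized test ids include the config name, as the entries above suggest).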
3 changes: 3 additions & 0 deletions tutorials/download_model_weights.md
@@ -41,6 +41,7 @@ LitGPT supports a variety of LLM architectures with publicly available weights.
| Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | [Hui, Binyuan et al. 2024](https://arxiv.org/abs/2409.12186) |
| Qwen2.5 Math | 1.5B, 7B, 72B | Alibaba Group | [An, Yang et al. 2024](https://arxiv.org/abs/2409.12122) |
| QwQ | 32B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwq-32b-preview/) |
| R1 Distill Llama | 8B, 70B | DeepSeek AI | [DeepSeek AI 2025](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) |
| RedPajama-INCITE | 3B, 7B | Together | [Together 2023](https://together.ai/blog/redpajama-models-v1) |
| SmolLM2 | 135M, 360M, 1.7B | Hugging Face | [Hugging Face 2024](https://github.com/huggingface/smollm) |
| StableCode | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |
@@ -87,6 +88,8 @@ codellama/CodeLlama-7b-Python-hf
databricks/dolly-v2-12b
databricks/dolly-v2-3b
databricks/dolly-v2-7b
deepseek-ai/DeepSeek-R1-Distill-Llama-8B
deepseek-ai/DeepSeek-R1-Distill-Llama-70B
EleutherAI/pythia-1.4b
EleutherAI/pythia-1.4b-deduped
EleutherAI/pythia-12b
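
Once listed here, the checkpoint can be used through LitGPT's high-level Python API. A sketch assuming the `LLM.load` entry point from LitGPT's README, which fetches the weights if they are not already present:

```python
from litgpt import LLM

# Load the 8B distill checkpoint (downloads on first use)
llm = LLM.load("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
print(llm.generate("What is 2 + 2? Think step by step.", max_new_tokens=256))
```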
