Turn off 2:4 sparse compression until supported in vllm (#1092)
This PR temporarily disables the newly added Sparse24 compression
feature in the example script, since support for it is not yet
available in vLLM.

Support for Sparse24 compression is being added in vLLM via [this
PR](vllm-project/vllm#12097). Once that PR is
merged, this change will be reverted to re-enable the feature.
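The gating pattern used here can be sketched in isolation. The helper below is hypothetical (not part of the example script); it only illustrates how the keyword arguments passed to `model.save_pretrained` might be assembled, with `disable_sparse_compression` forced on while downstream vLLM support is missing:

```python
def build_save_kwargs(fp8_enabled: bool, vllm_supports_sparse24: bool) -> dict:
    """Hypothetical helper: assemble kwargs for model.save_pretrained.

    `vllm_supports_sparse24` would flip to True once the linked vLLM PR
    is merged, re-enabling 2:4 sparse compression on save.
    """
    kwargs = {"save_compressed": fp8_enabled}
    if not vllm_supports_sparse24:
        # Until vLLM can load Sparse24-compressed checkpoints,
        # keep sparse compression off when saving.
        kwargs["disable_sparse_compression"] = True
    return kwargs
```

With support absent, `build_save_kwargs(True, False)` yields both flags, matching the diff below; once support lands, the extra flag is simply dropped.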

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
rahul-tuli authored Jan 23, 2025

1 parent e48d9db commit 7610854
Showing 1 changed file with 3 additions and 1 deletion.
1 changed file: examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py (3 additions, 1 deletion)

@@ -116,5 +116,7 @@ def get_recipe(fp8_enabled):
 print("==========================================\n")

 # Save compressed model and tokenizer
-model.save_pretrained(save_dir, save_compressed=args.fp8)
+model.save_pretrained(
+    save_dir, save_compressed=args.fp8, disable_sparse_compression=True
+)
 tokenizer.save_pretrained(save_dir)
