Turn off 2:4 sparse compression until supported in vllm (#1092)
This PR temporarily disables the newly added Sparse24 compression
feature in the example script, since support for it is not yet
available in vLLM.

Support for Sparse24 compression is being added in vLLM via [this
PR](vllm-project/vllm#12097). Once that PR is
merged, this change will be reverted to re-enable the feature.
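The gating pattern used here can be sketched in isolation. The helper below is hypothetical (not part of the example script); it only illustrates how the keyword arguments passed to `model.save_pretrained` might be assembled, with `disable_sparse_compression` forced on while downstream vLLM support is missing:

```python
def build_save_kwargs(fp8_enabled: bool, vllm_supports_sparse24: bool) -> dict:
    """Hypothetical helper: assemble kwargs for model.save_pretrained.

    `vllm_supports_sparse24` would flip to True once the linked vLLM PR
    is merged, re-enabling 2:4 sparse compression on save.
    """
    kwargs = {"save_compressed": fp8_enabled}
    if not vllm_supports_sparse24:
        # Until vLLM can load Sparse24-compressed checkpoints,
        # keep sparse compression off when saving.
        kwargs["disable_sparse_compression"] = True
    return kwargs
```

With support absent, `build_save_kwargs(True, False)` yields both flags, matching the diff below; once support lands, the extra flag is simply dropped.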

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
rahul-tuli authored Jan 23, 2025

1 parent e48d9db commit 7610854
Showing 1 changed file with 3 additions and 1 deletion.
1 changed file: examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py (3 additions, 1 deletion)

@@ -116,5 +116,7 @@ def get_recipe(fp8_enabled):
 print("==========================================\n")

 # Save compressed model and tokenizer
-model.save_pretrained(save_dir, save_compressed=args.fp8)
+model.save_pretrained(
+    save_dir, save_compressed=args.fp8, disable_sparse_compression=True
+)
 tokenizer.save_pretrained(save_dir)
