Sparse 2:4 + FP8 Quantization e2e vLLM tests (#1073)
SUMMARY:
- Add 2:4 Sparsity + FP8 Quantization e2e tests

TEST PLAN:
- Models produced by the tests:
  - nm-testing/TinyLlama-1.1B-Chat-v1.0-sparse2of4_fp8_dynamic-e2e
  - nm-testing/TinyLlama-1.1B-Chat-v1.0-sparse2of4_only-e2e
- Verified to run e2e with vLLM

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
dsikka authored and kylesayrs committed Jan 15, 2025
1 parent 83a3828 commit f9e4d7c
Showing 5 changed files with 51 additions and 1 deletion.
7 changes: 7 additions & 0 deletions tests/e2e/vLLM/configs/sparse2of4_fp8_dynamic.yaml
@@ -0,0 +1,7 @@
cadence: "nightly"
test_type: "regression"
model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
recipe: tests/e2e/vLLM/recipes/Sparse_2of4/recipe_sparse_2of4_fp8_dynamic.yaml
scheme: sparse2of4_fp8_dynamic
dataset_id: HuggingFaceH4/ultrachat_200k
dataset_split: train_sft
8 changes: 8 additions & 0 deletions tests/e2e/vLLM/configs/sparse_24.yaml
@@ -0,0 +1,8 @@
cadence: "nightly"
test_type: "regression"
model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
recipe: tests/e2e/vLLM/recipes/Sparse_2of4/recipe_sparse_2of4.yaml
scheme: sparse2of4_only
dataset_id: HuggingFaceH4/ultrachat_200k
dataset_split: train_sft
save_compressed: False
6 changes: 6 additions & 0 deletions tests/e2e/vLLM/recipes/Sparse_2of4/recipe_sparse_2of4.yaml
@@ -0,0 +1,6 @@
sparsity_stage:
  sparsity_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      sequential_update: false
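The `mask_structure: "2:4"` setting above means that within every contiguous group of four weights, two are zeroed. A minimal sketch of that constraint, using magnitude as a stand-in saliency criterion (illustrative only; SparseGPTModifier selects which weights to prune using second-order calibration information, not plain magnitude):

```python
def apply_2of4_mask(weights):
    """Zero the 2 smallest-magnitude values in each group of 4 weights.

    Magnitude-based stand-in for 2:4 structured pruning; the real
    SparseGPTModifier uses Hessian-based saliency, not magnitude.
    """
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the 2 largest-magnitude entries in this group survive
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]),
                      reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

print(apply_2of4_mask([0.9, -0.1, 0.05, -0.8, 0.3, 0.2, -0.7, 0.0]))
# → [0.9, 0.0, 0.0, -0.8, 0.3, 0.0, -0.7, 0.0]
```

Every group of four retains exactly two nonzeros, which is the structure GPU sparse tensor cores (and vLLM's 2:4 kernels) can accelerate.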
25 changes: 25 additions & 0 deletions tests/e2e/vLLM/recipes/Sparse_2of4/recipe_sparse_2of4_fp8_dynamic.yaml
@@ -0,0 +1,25 @@
sparsity_stage:
  run_type: oneshot
  sparsity_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      sequential_update: false
quantization_stage:
  run_type: oneshot
  quantization_modifiers:
    ConstantPruningModifier:
      targets: [
        're:.*q_proj.weight',
        're:.*k_proj.weight',
        're:.*v_proj.weight',
        're:.*o_proj.weight',
        're:.*gate_proj.weight',
        're:.*up_proj.weight',
        're:.*down_proj.weight',
      ]
      start: 0
    QuantizationModifier:
      targets: ["Linear"]
      ignore: ["lm_head"]
      scheme: "FP8_DYNAMIC"
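The `re:`-prefixed targets in the `ConstantPruningModifier` are regular expressions matched against parameter names: they pin the 2:4 masks from the first stage onto the attention and MLP projection weights, so FP8 quantization in the second stage cannot reintroduce pruned values. A sketch of the name matching, assuming the `re:` prefix simply marks the remainder of the string as a Python regex (the exact matching semantics inside llm-compressor may differ):

```python
import re

# Targets copied from the recipe above
TARGETS = [
    're:.*q_proj.weight', 're:.*k_proj.weight', 're:.*v_proj.weight',
    're:.*o_proj.weight', 're:.*gate_proj.weight', 're:.*up_proj.weight',
    're:.*down_proj.weight',
]

def is_targeted(param_name: str) -> bool:
    """True if any 're:'-prefixed pattern matches the full parameter name."""
    return any(re.fullmatch(t[len('re:'):], param_name) for t in TARGETS)

print(is_targeted('model.layers.0.self_attn.q_proj.weight'))  # True
print(is_targeted('lm_head.weight'))                          # False
```

Note that `lm_head` is excluded both here and in the `QuantizationModifier`'s `ignore` list, so the output projection stays dense and unquantized.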
6 changes: 5 additions & 1 deletion tests/e2e/vLLM/test_vllm.py
@@ -21,6 +21,7 @@
    vllm_installed = False
    logger.warning("vllm is not installed. This test will be skipped")


HF_MODEL_HUB_NAME = "nm-testing"

TEST_DATA_FILE = os.environ.get("TEST_DATA_FILE", "")
@@ -74,6 +75,7 @@ def set_up(self):
        self.recipe = eval_config.get("recipe")
        self.quant_type = eval_config.get("quant_type")
        self.save_dir = eval_config.get("save_dir")
+       self.save_compressed = eval_config.get("save_compressed", True)

        logger.info("========== RUNNING ==============")
        logger.info(self.scheme)
@@ -113,7 +115,9 @@ def test_vllm(self):
        self._check_session_contains_recipe()

        logger.info("================= SAVING TO DISK ======================")
-       oneshot_model.save_pretrained(self.save_dir)
+       oneshot_model.save_pretrained(
+           self.save_dir, save_compressed=self.save_compressed
+       )
        tokenizer.save_pretrained(self.save_dir)
        recipe_path = os.path.join(self.save_dir, "recipe.yaml")

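The test change reads an optional `save_compressed` flag from the eval config and defaults it to `True`; `sparse_24.yaml` sets `save_compressed: False` so that model is written out dense. A minimal sketch of the lookup, with plain dicts standing in for the parsed YAML configs:

```python
# Plain dicts stand in for the parsed YAML test configs above.
sparse2of4_fp8_dynamic = {"scheme": "sparse2of4_fp8_dynamic"}        # flag omitted
sparse_24 = {"scheme": "sparse2of4_only", "save_compressed": False}  # explicit opt-out

def resolve_save_compressed(eval_config: dict) -> bool:
    """Mirror the test's lookup: save compressed unless the config opts out."""
    return eval_config.get("save_compressed", True)

print(resolve_save_compressed(sparse2of4_fp8_dynamic))  # True  (default)
print(resolve_save_compressed(sparse_24))               # False (opt-out)
```

Using `dict.get` with a default keeps every existing config working unchanged, since only configs that explicitly set the key deviate from compressed saving.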
