Add FP8 Support #4

Satrat · 2024-06-24T22:12:06Z

Migrating this PR from sparseml: neuralmagic/sparseml#2306

* Add code for inferring the compression format from the model
* End-to-end FP8 example
* End-to-end tests
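The first item can be illustrated with a minimal sketch of how a compression format could be inferred from a model's quantization settings. Everything in it is hypothetical: the `QuantArgs`/`LayerScheme` dataclasses, the `infer_compression_format` name, and the format strings are assumptions for illustration, not the code added in this PR. The idea is to collect the distinct weight and activation settings across layers and map them to a storage format. When layers disagree, it falls back to a generic format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical stand-ins for per-layer quantization settings; field names are assumptions.
@dataclass(frozen=True)
class QuantArgs:
    num_bits: int
    type: str  # "int" or "float"

@dataclass(frozen=True)
class LayerScheme:
    weights: Optional[QuantArgs] = None
    input_activations: Optional[QuantArgs] = None

def infer_compression_format(schemes: Sequence[LayerScheme]) -> str:
    """Pick a storage format from the schemes attached to a model's layers.

    The returned names are illustrative placeholders, not a confirmed enum.
    """
    # Distinct settings across all layers (QuantArgs is frozen, so it is hashable).
    weight_args = {s.weights for s in schemes if s.weights is not None}
    act_args = {s.input_activations for s in schemes if s.input_activations is not None}

    if not weight_args:
        return "dense"  # nothing quantized: save a regular checkpoint
    if len(weight_args) == 1:
        (w,) = weight_args
        if w.type == "float" and w.num_bits == 8:
            return "float-quantized"  # FP8 weights
        if w.type == "int" and w.num_bits == 8 and act_args:
            return "int-quantized"  # INT8 weights and activations
        if w.type == "int" and w.num_bits == 4 and not act_args:
            return "pack-quantized"  # weight-only INT4 can be bit-packed
    # Mixed or unrecognized settings: store quantized values without packing.
    return "naive-quantized"
```

For example, a model whose linear layers all carry FP8 weights and FP8 input activations maps to `"float-quantized"`. A model that mixes INT8 and INT4 weights falls through to the generic format.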

robertgshaw2-redhat · 2024-06-24T22:47:51Z

tests/llmcompressor/transformers/compression/recipes/new_quant_fp8.yaml

@@ -0,0 +1,19 @@
+quant_stage:
+  quant_modifiers:
+    GPTQModifier:


Nice that GPTQ works out of the box.

But note that we shouldn't need GPTQ for FP8,

so in our examples it should just be the QuantizationModifier.

Perhaps we should also have a test that matches the expected user flow for FP8?

Can do as a follow-up.

Gotcha, I just updated the PR with this change (both in the example and the integration test). However, the nice scheme/target UX is only available for GPTQModifier, so it made the example a bit messier. I'll move the scheme parsing to QuantizationModifier (QM) in a follow-up PR.
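The scheme/target UX mentioned above presumably amounts to expanding a preset name plus a list of target layer types into an explicit quantization config. Below is a hypothetical sketch of such parsing, which could sit in front of either modifier. The `PRESET_SCHEMES` table, its field names and preset contents, and `expand_scheme` are all assumptions for illustration, not the library's actual presets or API.

```python
from copy import deepcopy
from typing import Dict, List, Optional

# Hypothetical preset table; keys, field names, and values are illustrative only.
PRESET_SCHEMES: Dict[str, dict] = {
    "FP8": {
        "weights": {"num_bits": 8, "type": "float", "symmetric": True, "strategy": "tensor"},
        "input_activations": {"num_bits": 8, "type": "float", "symmetric": True, "strategy": "tensor"},
    },
    "W4A16": {
        "weights": {"num_bits": 4, "type": "int", "symmetric": True,
                    "strategy": "group", "group_size": 128},
    },
}

def expand_scheme(scheme: str, targets: List[str],
                  ignore: Optional[List[str]] = None) -> dict:
    """Turn a preset name plus target layer types into an explicit config dict."""
    try:
        preset = PRESET_SCHEMES[scheme.upper()]
    except KeyError:
        raise ValueError(
            f"unknown scheme {scheme!r}; expected one of {sorted(PRESET_SCHEMES)}"
        ) from None
    # deepcopy so callers can edit the result without mutating the shared preset
    group = {"targets": list(targets), **deepcopy(preset)}
    return {"config_groups": {"group_0": group}, "ignore": list(ignore or [])}
```

With this shape, a call like `expand_scheme("FP8", ["Linear"], ["lm_head"])` would replace a hand-written config block in the example. Keeping the parsing outside any one modifier is what would let QuantizationModifier and GPTQModifier share it.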

examples/quantization/llama7b_fp8_quantization.py

* Apply quantization config implementation
* add TODO
* integrate full lifecycle support, QuantizationStatus updates, add tinyllama test
* fix comment

add fp8 changes from sparseml

f674365

Satrat requested review from bfineran and robertgshaw2-redhat June 24, 2024 22:13

bfineran approved these changes Jun 24, 2024

robertgshaw2-redhat reviewed Jun 24, 2024

examples/quantization/llama7b_fp8_quantization.py (outdated)

update fp8 to use base quant modifier

fab9363

Satrat requested a review from robertgshaw2-redhat June 25, 2024 16:19

Satrat merged commit dcca2be into main Jun 25, 2024
8 of 12 checks passed

Satrat deleted the fp8_support branch June 25, 2024 18:14