Releases: vllm-project/llm-compressor

v0.3.1

12 Dec 13:25 · c3608a0

Full Changelog: 0.3.0...0.3.1

v0.3.0

13 Nov 05:22 · 93832a6

What's New in v0.3.0

Key Features and Improvements

  • GPTQ Quantized-weight Sequential Updating (#177): Introduced sequential updating for GPTQ quantization: layers are compressed one at a time, so later layers are calibrated against already-quantized upstream weights, improving the accuracy of the compressed model.
  • Auto-Infer Mappings for SmoothQuantModifier (#119): Mappings are now inferred automatically from the model architecture, making SmoothQuant easier to apply across model families; both modifiers are combined in the first sketch after this list.
  • Improved Sparse Compression Usability (#191): Added support for targeted sparse compression with specific ignore rules during inference, allowing for more flexible model configurations.
  • Generic Wrapper for Any Hugging Face Model (#185): Added the wrap_hf_model_class utility, extending support to Hugging Face models that are not based on AutoModelForCausalLM; see the second sketch after this list.
  • Observer Restructure (#837): Introduced calibration and frozen steps within QuantizationModifier, moving Observers from compressed-tensors to llm-compressor.
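
The first two items compose in a single oneshot call. A minimal sketch of that flow; the TinyLlama checkpoint, the open_platypus calibration set, and the W8A8 scheme name are illustrative assumptions, not part of these release notes:

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# SmoothQuant mappings are inferred from the model architecture (#119);
# GPTQ compresses layers sequentially, so later layers are calibrated
# against already-quantized upstream weights (#177).
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative checkpoint
    dataset="open_platypus",                     # illustrative calibration set
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=512,
    output_dir="TinyLlama-1.1B-W8A8",
)
```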
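For architectures outside AutoModelForCausalLM, the wrapper from #185 can be applied to the relevant class before loading. A hedged sketch; the import path and the vision-language checkpoint are assumptions for illustration:

```python
from transformers import AutoModelForVision2Seq
from llmcompressor.transformers import wrap_hf_model_class  # assumed import path

# Wrap the class so its save_pretrained understands compression configs (#185)
WrappedVision2Seq = wrap_hf_model_class(AutoModelForVision2Seq)

model = WrappedVision2Seq.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",  # illustrative multimodal checkpoint
    device_map="auto",
    torch_dtype="auto",
)
```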

Bug Fixes

  • Fix Tied Tensors Bug (#659)
  • Observer Initialization in GPTQ Wrapper (#883)
  • Sparsity Reload Testing (#882)

Documentation

  • Updated SmoothQuant Tutorial (#115): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.

Full Changelog: 0.2.0...0.3.0

v0.2.0

23 Sep 22:24 · 2e0035f

Full Changelog: 0.1.0...0.2.0

v0.1.0

12 Aug 15:37 · 066d1e4

What's Changed

  • Address Test Failures by @Satrat in #1
  • Remove SparseZoo Usage by @Satrat in #2
  • SparseML Cleanup by @markurtz in #6
  • Remove all references to Neural Magic copyright within LLM Compressor by @markurtz in #7
  • Add FP8 Support by @Satrat in #4 (see the sketch after this list)
  • Fix Weekly Test Failure by @Satrat in #8
  • Add Scheme UX for QuantizationModifier by @Satrat in #9
  • Add Group Quantization Test Case by @Satrat in #10
  • Loguru logging standardization for LLM Compressor by @markurtz in #11
  • Clarify Function Names for Logging by @Satrat in #12
  • [ Examples ] E2E Examples by @robertgshaw2-neuralmagic in #5
  • Update setup.py by @robertgshaw2-neuralmagic in #15
  • SmoothQuant Mapping Defaults by @Satrat in #13
  • Initial README by @bfineran in #3
  • [Bug] Fix validation errors for smoothquant modifier + update examples by @rahul-tuli in #19
  • [MOE Quantization] Warn against "undercalibrated" modules by @dbogunowicz in #20
  • Port SparseML Remote Code Fix by @Satrat in #21
  • Update Quantization Save Defaults by @Satrat in #22
  • [Bugfix] Add fix to preserve modifier order when passed as a list by @rahul-tuli in #26
  • GPTQ - move calibration of quantization params to after Hessian calibration by @bfineran in #25
  • Fix typos by @eldarkurtic in #31
  • Remove ceiling from datasets dep by @mgoin in #27
  • Revert naive compression format by @Satrat in #32
  • Fix layerwise targets by @Satrat in #36
  • Move Weight Update Out Of Loop by @Satrat in #40
  • Fix End Epoch Default by @Satrat in #39
  • Fix typos in example for w8a8 quant by @eldarkurtic in #38
  • Model Offloading Support Pt 2 by @Satrat in #34
  • set version to 1.0.0 for release by @bfineran in #44
  • Update version for first release by @markurtz in #50
  • BugFix: Update TRL example scripts to point to the right SFTTrainer by @rahul-tuli in #51
  • Update examples/quantization_24_sparse_w4a16 README by @dbarbuzzi in #52
  • Fix Failing Transformers Tests by @Satrat in #53
  • Offloading Bug Fix by @Satrat in #58
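
The FP8 support (#4) and the scheme UX for QuantizationModifier (#9) from this list let a recipe be written as a named preset. A minimal sketch; the FP8_DYNAMIC preset name and the checkpoint are assumptions drawn from the project's later examples:

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# A named scheme replaces hand-written quantization args (#9);
# dynamic FP8 activation quantization needs no calibration data (#4).
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",  # assumed preset name
    ignore=["lm_head"],
)

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative checkpoint
    recipe=recipe,
    output_dir="TinyLlama-1.1B-FP8",
)
```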

Full Changelog: https://github.com/vllm-project/llm-compressor/commits/0.1.0