Releases: vllm-project/llm-compressor

v0.3.1

12 Dec 13:25 · c3608a0

Full Changelog: 0.3.0...0.3.1

v0.3.0

13 Nov 05:22 · 93832a6

What's New in v0.3.0

Key Features and Improvements

  • GPTQ Quantized-weight Sequential Updating (#177): Introduced sequential updating for GPTQ quantization: layers are compressed one at a time, so later layers are calibrated against already-quantized upstream weights, improving the accuracy of the compressed model.
  • Auto-Infer Mappings for SmoothQuantModifier (#119): Mappings are now inferred automatically from the model architecture, making SmoothQuant easier to apply across model families; both modifiers are combined in the first sketch after this list.
  • Improved Sparse Compression Usability (#191): Added support for targeted sparse compression with specific ignore rules during inference, allowing for more flexible model configurations.
  • Generic Wrapper for Any Hugging Face Model (#185): Added the wrap_hf_model_class utility, extending support to Hugging Face models that are not based on AutoModelForCausalLM; see the second sketch after this list.
  • Observer Restructure (#837): Introduced calibration and frozen steps within QuantizationModifier, moving Observers from compressed-tensors to llm-compressor.
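
The first two items compose in a single oneshot call. A minimal sketch of that flow; the TinyLlama checkpoint, the open_platypus calibration set, and the W8A8 scheme name are illustrative assumptions, not part of these release notes:

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# SmoothQuant mappings are inferred from the model architecture (#119);
# GPTQ compresses layers sequentially, so later layers are calibrated
# against already-quantized upstream weights (#177).
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative checkpoint
    dataset="open_platypus",                     # illustrative calibration set
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=512,
    output_dir="TinyLlama-1.1B-W8A8",
)
```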
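For architectures outside AutoModelForCausalLM, the wrapper from #185 can be applied to the relevant class before loading. A hedged sketch; the import path and the vision-language checkpoint are assumptions for illustration:

```python
from transformers import AutoModelForVision2Seq
from llmcompressor.transformers import wrap_hf_model_class  # assumed import path

# Wrap the class so its save_pretrained understands compression configs (#185)
WrappedVision2Seq = wrap_hf_model_class(AutoModelForVision2Seq)

model = WrappedVision2Seq.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",  # illustrative multimodal checkpoint
    device_map="auto",
    torch_dtype="auto",
)
```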

Bug Fixes

  • Fix Tied Tensors Bug (#659)
  • Observer Initialization in GPTQ Wrapper (#883)
  • Sparsity Reload Testing (#882)

Documentation

  • Updated SmoothQuant Tutorial (#115): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.

Full Changelog: 0.2.0...0.3.0

v0.2.0

23 Sep 22:24 · 2e0035f

Full Changelog: 0.1.0...0.2.0

v0.1.0

12 Aug 15:37 · 066d1e4

What's Changed

  • Address Test Failures by @Satrat in #1
  • Remove SparseZoo Usage by @Satrat in #2
  • SparseML Cleanup by @markurtz in #6
  • Remove all references to Neural Magic copyright within LLM Compressor by @markurtz in #7
  • Add FP8 Support by @Satrat in #4 (see the sketch after this list)
  • Fix Weekly Test Failure by @Satrat in #8
  • Add Scheme UX for QuantizationModifier by @Satrat in #9
  • Add Group Quantization Test Case by @Satrat in #10
  • Loguru logging standardization for LLM Compressor by @markurtz in #11
  • Clarify Function Names for Logging by @Satrat in #12
  • [ Examples ] E2E Examples by @robertgshaw2-neuralmagic in #5
  • Update setup.py by @robertgshaw2-neuralmagic in #15
  • SmoothQuant Mapping Defaults by @Satrat in #13
  • Initial README by @bfineran in #3
  • [Bug] Fix validation errors for smoothquant modifier + update examples by @rahul-tuli in #19
  • [MOE Quantization] Warn against "undercalibrated" modules by @dbogunowicz in #20
  • Port SparseML Remote Code Fix by @Satrat in #21
  • Update Quantization Save Defaults by @Satrat in #22
  • [Bugfix] Add fix to preserve modifier order when passed as a list by @rahul-tuli in #26
  • GPTQ - move calibration of quantization params to after Hessian calibration by @bfineran in #25
  • Fix typos by @eldarkurtic in #31
  • Remove ceiling from datasets dep by @mgoin in #27
  • Revert naive compression format by @Satrat in #32
  • Fix layerwise targets by @Satrat in #36
  • Move Weight Update Out Of Loop by @Satrat in #40
  • Fix End Epoch Default by @Satrat in #39
  • Fix typos in example for w8a8 quant by @eldarkurtic in #38
  • Model Offloading Support Pt 2 by @Satrat in #34
  • set version to 1.0.0 for release by @bfineran in #44
  • Update version for first release by @markurtz in #50
  • BugFix: Update TRL example scripts to point to the right SFTTrainer by @rahul-tuli in #51
  • Update examples/quantization_24_sparse_w4a16 README by @dbarbuzzi in #52
  • Fix Failing Transformers Tests by @Satrat in #53
  • Offloading Bug Fix by @Satrat in #58
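
The FP8 support (#4) and the scheme UX for QuantizationModifier (#9) from this list let a recipe be written as a named preset. A minimal sketch; the FP8_DYNAMIC preset name and the checkpoint are assumptions drawn from the project's later examples:

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# A named scheme replaces hand-written quantization args (#9);
# dynamic FP8 activation quantization needs no calibration data (#4).
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",  # assumed preset name
    ignore=["lm_head"],
)

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative checkpoint
    recipe=recipe,
    output_dir="TinyLlama-1.1B-FP8",
)
```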

Full Changelog: https://github.com/vllm-project/llm-compressor/commits/0.1.0