
Release v0.11.0

@Giuseppe5 released this 10 Oct 12:31

Breaking Changes

  • Remove ONNX QOp export (#917)
  • QuantTensor can no longer have empty metadata fields (e.g., scale, bit_width) (#819)
  • Bias quantization now requires an explicit bit-width (#839)
  • QuantLayers do not expose quant_metadata directly. This is delegated to the proxies (#883)
  • QuantDropout has been removed (#861)
  • QuantMaxPool has been removed (#858)
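The stricter QuantTensor contract above can be pictured with a minimal sketch. This is illustrative only: the class and field names mirror the concepts, not the exact Brevitas API. The point is that a quantized tensor must now always carry its full quantization metadata:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntQuantTensorSketch:
    """Conceptual sketch of a quantized tensor whose metadata
    fields may no longer be empty (names are illustrative)."""
    value: float
    scale: float
    zero_point: float
    bit_width: int

    def __post_init__(self):
        # Reject construction with missing quantization metadata.
        for name in ("scale", "zero_point", "bit_width"):
            if getattr(self, name) is None:
                raise ValueError(f"{name} must be set")
```

Under this contract, code that previously constructed quantized tensors with partial metadata must now supply every field up front.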

Highlights

  • Support for OCP/FNUZ FP8 quantization

    • Compatibility with QAT/PTQ, including all current PTQ algorithms implemented (GPTQ, LearnedRound, GPFQ, etc.)
    • Possibility to fully customize the minifloat configuration (i.e., select mantissa/exponent bit-width, exponent bias, etc.)
    • Support for ONNX QDQ export
  • Support for OCP MX Quantization

    • Compatibility with QAT/PTQ, including all current PTQ algorithms implemented (GPTQ, LearnedRound, GPFQ, etc.)
    • Possibility to fully customize the minifloat configuration (i.e., select mantissa/exponent bit-width, exponent bias, group size, etc.)
  • New QuantTensor subclasses:

    • FloatQuantTensor: supports OCP FP formats and general minifloat quantization
    • GroupwiseQuantTensor: supports OCP MX formats and general groupwise int/minifloat quantization
  • Support for Channel splitting

  • Support for HQO optimization for zero point

  • Support for HQO optimization for scale (prototype)

  • Improved SDXL entrypoint under brevitas_examples

  • Improved LLM entrypoint under brevitas_examples

    • Compatibility with accelerate
  • Prototype support for torch.compile:

    • Check PR #1006 for an example on how to use it
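To make the minifloat support above concrete, here is a small, self-contained sketch of rounding a value to a configurable mantissa/exponent format. The helper name and structure are hypothetical illustrations of the concept, not the Brevitas API; special-value encodings (inf/NaN) are ignored:

```python
import math

def quantize_minifloat(x: float, exp_bits: int = 4, mant_bits: int = 3,
                       exp_bias: int = 7) -> float:
    """Round x to the nearest value representable in a generic
    minifloat format (illustrative sketch, not the Brevitas API)."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    mag = abs(x)
    emin = 1 - exp_bias                       # smallest normal exponent
    emax = 2 ** exp_bits - 1 - exp_bias       # largest exponent (no inf/NaN reserved)
    e = max(math.floor(math.log2(mag)), emin) # subnormals share emin
    spacing = 2.0 ** (e - mant_bits)          # gap between representable values
    q = round(mag / spacing) * spacing        # round-to-nearest-even
    # Clamp to the largest representable magnitude.
    max_val = (2 - 2 ** -mant_bits) * 2.0 ** emax
    return sign * min(q, max_val)
```

For example, with the default 4-bit exponent and 3-bit mantissa, 0.3 rounds to 0.3125 and out-of-range inputs saturate at the format's maximum.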
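Similarly, MX-style groupwise quantization shares a single power-of-two scale across each small group of elements. The sketch below (hypothetical helpers, not the Brevitas API) uses signed-integer elements for simplicity; the MX specification additionally allows minifloat element types:

```python
import math

def mx_quantize_group(group, elem_bits=8):
    """Quantize one group of values with a shared power-of-two scale
    and signed-integer elements (illustrative sketch)."""
    qmax = 2 ** (elem_bits - 1) - 1                # e.g. 127 for 8-bit
    amax = max(abs(v) for v in group)
    if amax == 0.0:
        return [0.0] * len(group)
    # Smallest power-of-two scale that maps amax into the integer range.
    scale = 2.0 ** math.ceil(math.log2(amax / qmax))
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in group]

def mx_quantize(values, group_size=32, elem_bits=8):
    """Split a flat list into groups, each quantized with its own scale."""
    out = []
    for i in range(0, len(values), group_size):
        out.extend(mx_quantize_group(values[i:i + group_size], elem_bits))
    return out
```

Because every group picks its own scale, a single outlier only degrades precision within its group rather than across the whole tensor, which is the main appeal of groupwise formats.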

What's Changed

For a more comprehensive list of changes and fixes, see the list below:

New Contributors

Full Changelog: v0.10.2...v0.11.0