Add Idefics3/SmolVLM quant support via traceable class #1095

leon-seidel · 2025-01-24T14:23:11Z

SUMMARY:
This PR adds a traceable Idefics3 class, following the new [guide](https://github.com/vllm-project/llm-compressor/blob/main/src/llmcompressor/transformers/tracing/GUIDE.md), to allow W4A16 quantization of Idefics3 and SmolVLM (which share the same architecture). Idefics3 seems to require a max_sequence_length of 4096. I based the example script on the Phi 3 Vision example, because the dataset loading approach from the Llava example led to OOM on 64 GB of RAM.

TEST PLAN:
Tested on an A100 with Idefics3 using 512 samples and on a 4060 Ti with SmolVLM using 128 samples.
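
For readers unfamiliar with the scheme name, here is a minimal sketch of what a W4A16 weight format with group size 128 stores: 4-bit integer weights plus one scale per group of 128 values, with activations left in 16-bit. It assumes symmetric quantization and plain round-to-nearest, which is a simplification for illustration and not necessarily the algorithm the example script uses. All function names are hypothetical and are not part of llm-compressor.

```python
import random


def quantize_group(weights, num_bits=4):
    """Symmetric round-to-nearest quantization of one group of weights.

    Returns the integer codes and the single scale shared by the group.
    """
    qmax = 2 ** (num_bits - 1) - 1      # 7 for 4 bits
    qmin = -(2 ** (num_bits - 1))       # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    # An all-zero group gets a dummy scale so we never divide by zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def quantize_row(row, group_size=128, num_bits=4):
    """Split one weight row into groups and quantize each independently."""
    return [
        quantize_group(row[start:start + group_size], num_bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups):
    """Rebuild approximate floating-point weights from codes and scales."""
    restored = []
    for codes, scale in groups:
        restored.extend(code * scale for code in codes)
    return restored


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical weight row; real layers have thousands of columns.
    row = [random.gauss(0.0, 0.02) for _ in range(256)]
    groups = quantize_row(row)
    restored = dequantize_row(groups)
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"{len(groups)} groups, worst absolute error {worst:.6f}")
```

Because each group has its own scale, an outlier weight only degrades the precision of the 128 values it shares a group with, rather than the whole row.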

kylesayrs

Thank you for your contribution @leon-seidel! This looks great to me, I look forward to getting this landed!

src/llmcompressor/transformers/tracing/idefics3.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

kylesayrs · 2025-01-24T18:24:21Z

Hi @leon-seidel

I've run your example end to end and it looks good! All that's left is to fix the quality tests and this is good to land!

pip install -e ./[dev]
make style
make quality

kylesayrs · 2025-01-24T20:53:11Z

Base

hf-multimodal (pretrained=HuggingFaceM4/Idefics3-8B-Llama3,dtype=bfloat16,add_bos_token=True,convert_img_format=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|     Tasks      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------------|------:|------|-----:|------|---|-----:|---|-----:|
|Computer Science|      0|none  |     0|acc   |↑  |0.3333|±  |0.0875|

W4A16

hf-multimodal (pretrained=Idefics3-8B-Llama3-W4A16-G128,dtype=bfloat16,add_bos_token=True,convert_img_format=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|     Tasks      |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------------|------:|------|-----:|------|---|----:|---|-----:|
|Computer Science|      0|none  |     0|acc   |↑  |  0.3|±  |0.0851|
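
To put the two tables in perspective, the sketch below checks how large the accuracy gap is relative to the reported standard errors. It rests on two assumptions that are not stated in the thread: that the standard error is computed from the sample (n-1) variance of 0/1 scores, and that the two runs can be treated as independent (they actually share the same questions, so this is only a rough check). Under the first assumption the reported values are consistent with 30 questions, 10 correct for the base model and 9 correct for the quantized one.

```python
import math


def sample_stderr(correct, total):
    """Standard error of the mean of 0/1 scores, using the (n-1) sample variance."""
    p = correct / total
    return math.sqrt(p * (1 - p) / (total - 1))


def z_score(acc_a, se_a, acc_b, se_b):
    """Gap between two accuracies in units of their combined standard error."""
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Values copied from the two result tables above.
base_acc, base_se = 0.3333, 0.0875
quant_acc, quant_se = 0.3, 0.0851

# Assumed question counts (10/30 and 9/30) reproduce the reported errors.
print(round(sample_stderr(10, 30), 4), round(sample_stderr(9, 30), 4))

# The gap is roughly 0.27 combined standard errors, i.e. well within noise.
print(round(z_score(base_acc, base_se, quant_acc, quant_se), 2))
```

On this reading, the drop from 0.3333 to 0.3 corresponds to a single question, so this task alone cannot distinguish the quantized model from the base model.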

Signed-off-by: Leon Seidel <leon.seidel@fau.de>

leon-seidel · 2025-01-24T23:27:44Z

Quality should be fixed now, thanks for your help! Also great tutorial on making the models traceable in the first place. I tried it on my own finetuned Idefics3 model and can't see any deterioration in the outputs!

examples/multimodal_vision/idefics3_example.py

SUMMARY:
Adding a traceable Idefics3 class following the new [guide](https://github.com/vllm-project/llm-compressor/blob/main/src/llmcompressor/transformers/tracing/GUIDE.md) to allow W4A16 quants of Idefics3 and SmolVLM (which share the same architecture). Idefics3 seems to require a max_sequence_length of 4096 and I copied the example from the Phi 3 Vision example as the dataset loading approach from the Llava example led to OOM on 64 GB RAM.

TEST PLAN:
Tested on A100 with Idefics3 @512 samples and on a 4060 Ti with SmolVLM @128 samples.

---------

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>

Add Idefics3/SmolVLM

25e8b2c

kylesayrs added the ready label Jan 24, 2025

kylesayrs reviewed Jan 24, 2025

src/llmcompressor/transformers/tracing/idefics3.py (resolved)

Update src/llmcompressor/transformers/tracing/idefics3.py

4ffff65

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

Merge branch 'main' into main

bdf0c18

Style and Quality

c174c25

Signed-off-by: Leon Seidel <leon.seidel@fau.de>

kylesayrs requested review from mgoin, kylesayrs, dsikka, rahul-tuli and brian-dellabetta and removed request for mgoin and rahul-tuli January 25, 2025 01:26

kylesayrs previously approved these changes Jan 25, 2025

kylesayrs reviewed Jan 25, 2025

examples/multimodal_vision/idefics3_example.py (outdated, resolved)

Remove reference to phi3_vision

ebc6c0d

leon-seidel dismissed kylesayrs’s stale review via ebc6c0d January 25, 2025 17:46

kylesayrs approved these changes Jan 25, 2025

dsikka approved these changes Jan 27, 2025

dsikka merged commit 55cfa1b into vllm-project:main Jan 27, 2025
4 checks passed
