Support FP4 gemm and FP4 checkpoints #3899

Open
wants to merge 4 commits into main

Conversation

trevor-m
Contributor

Motivation

This PR adds support for ModelOpt FP4 quantized models.
It was tested with an FP4-quantized Llama 3.1 model.

This work was adapted from the following vLLM PRs; thanks @pavanimajety, @kaixih, and @kushanam!
vllm-project/vllm#12784
vllm-project/vllm#13571
vllm-project/vllm#12520

Modifications

Adds two operations to sgl-kernel:

  • scaled_fp4_quant - quantizes a bf16 or fp16 input to fp4 and returns the input scales in block-interleaved format
  • cutlass_scaled_fp4_mm - performs the fp4 GEMM via CUTLASS (see the sketch after this list)
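
A minimal sketch of how the two ops fit together in an FP4 linear forward pass. Argument names and exact signatures here are assumptions based on the descriptions above and the referenced vLLM PRs, not the final sgl-kernel API:

```python
# Minimal sketch of an FP4 matmul using the two new ops.
# Argument names/signatures are assumptions, not the final sgl-kernel API.
import torch
from sgl_kernel import scaled_fp4_quant, cutlass_scaled_fp4_mm

def fp4_matmul(x, w_fp4, w_blockscale, x_global_scale, alpha,
               out_dtype=torch.bfloat16):
    # Quantize bf16/fp16 activations to fp4 on the fly; this also returns the
    # per-block activation scales in the block-interleaved layout the GEMM expects.
    x_fp4, x_blockscale = scaled_fp4_quant(x, x_global_scale)
    # FP4 x FP4 GEMM with per-block scales; alpha folds the global scales back in.
    return cutlass_scaled_fp4_mm(x_fp4, w_fp4, x_blockscale, w_blockscale,
                                 alpha, out_dtype)
```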

Adds the modelopt_fp4 quantization method: ModelOptFp4Config and ModelOptFp4LinearMethod use the new fp4 kernels for linear layers.
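
A rough end-to-end serving sketch under these assumptions: the model path is a placeholder, and passing the quantization method explicitly is assumed here (it may instead be picked up automatically from the checkpoint's quantization config):

```python
# Rough serving sketch; model_path is a placeholder and the explicit
# quantization argument is an assumption (it may be auto-detected).
import sglang as sgl

llm = sgl.Engine(
    model_path="/path/to/llama-3.1-fp4-checkpoint",
    quantization="modelopt_fp4",
)
out = llm.generate("The capital of France is", {"max_new_tokens": 8})
print(out["text"])
llm.shutdown()
```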

Checklist

  • Fix NaN issue by using getCurrentCUDAStream(); also apply the rounding patch from TensorRT-LLM (not needed for the NaN fix).
  • Add FP4 unit tests.