sync with 0.7.1 #308
Open
dtrifiro wants to merge 464 commits into opendatahub-io:main from dtrifiro:sync-with-0.7.0
+59,121 -22,724
Commits
This pull request is big! We're only showing the most recent 250 commits
Commits on Jan 14, 2025
Commits on Jan 15, 2025
Commits on Jan 16, 2025
[Core] Default to using per_token quantization for fp8 when cutlass is supported. (vllm-project#8651)
Commits on Jan 17, 2025
[V1] Move more control of kv cache initialization from model_executor to EngineCore (vllm-project#11960)
Commits on Jan 18, 2025
Commits on Jan 19, 2025
Commits on Jan 20, 2025
Commits on Jan 21, 2025
[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (vllm-project#12222)
[Documentation][AMD] Add information about prebuilt ROCm vLLM docker for perf validation purpose (vllm-project#12281)
Commits on Jan 22, 2025
Commits on Jan 23, 2025
[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD (vllm-project#12282)
[BugFix] Fix parameter names and `process_after_weight_loading` for W4A16 MoE Group Act Order (vllm-project#11528)
Commits on Jan 24, 2025
[Bugfix][Kernel] FA3 Fix - RuntimeError: This flash attention build only supports pack_gqa (for build size reasons). (vllm-project#12405)
Commits on Jan 25, 2025
Commits on Jan 26, 2025
Commits on Jan 27, 2025
[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and `prompt_logprobs` with ChunkedPrefill (vllm-project#10132)
Commits on Jan 28, 2025
Commits on Jan 29, 2025
[Bugfix] handle alignment of arguments in convert_sparse_cross_attention_mask_to_dense (vllm-project#12347)
Commits on Jan 30, 2025
Commits on Jan 31, 2025
[Kernel][Quantization] Integrate block-quantized CUTLASS kernels for DeepSeekV3 (vllm-project#12587)
Commits on Feb 1, 2025
[CI/Build] Add label automation for structured-output, speculative-decoding, v1 (vllm-project#12280)
Commits on Feb 4, 2025