sync with 0.7.1 #308

Open

dtrifiro wants to merge 464 commits into opendatahub-io:main from dtrifiro:sync-with-0.7.0

+59,121 -22,724

This pull request is big! We're only showing the most recent 250 commits

Commits on Jan 14, 2025

[Misc] Merge bitsandbytes_stacked_params_mapping and packed_modules_mapping (vllm-project#11924)
jeejeelee
authored

Commits on Jan 15, 2025

[Kernel] Support MulAndSilu (vllm-project#11624)
jeejeelee
authored
[HPU][Bugfix] Don't use /dev/accel/accel0 for HPU autodetection in setup.py (vllm-project#12046)
kzawora-intel
authored
[Platform] move current_memory_usage() into platform (vllm-project#11369)
shen-shanshan
authored
[V1][BugFix] Fix edge case in VLM scheduling (vllm-project#12065)
WoosukKwon
authored
[Misc] Add multipstep chunked-prefill support for FlashInfer (vllm-project#10467)
elfiegg
authored
[core] Turn off GPU communication overlap for Ray executor (vllm-project#12051)
ruisearch42
authored
[core] platform agnostic executor via collective_rpc (vllm-project#11256)
youkaichao
authored
[Doc] Update examples to remove SparseAutoModelForCausalLM (vllm-project#12062)
kylesayrs
authored
[V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (vllm-project#12003)
heheda12345
authored
Fix: cases with empty sparsity config (vllm-project#12057)
rahul-tuli
authored
Type-fix: make execute_model output type optional (vllm-project#12020)
youngkent
authored
[Platform] Do not raise error if _Backend is not found (vllm-project#12023)
wangxiyuan and MengqingCao
authored
[Model]: Support internlm3 (vllm-project#12037)
RunningLeon
authored
Misc: allow to use proxy in HTTPConnection (vllm-project#12042)
zhouyuan
authored
[Misc][Quark] Upstream Quark format to VLLM (vllm-project#10765)

authored
[Doc]: Update OpenAI-Compatible Server documents (vllm-project#12082)
maang-h
authored
[Bugfix] use right truncation for non-generative tasks (vllm-project#12050)
joerunde
authored
[V1][Core] Autotune encoder cache budget (vllm-project#11895)
ywang96
authored
[Bugfix] Fix _get_lora_device for HQQ marlin (vllm-project#12090)
varun-sundar-rabindranath and Varun Sundar Rabindranath
authored
Allow hip sources to be directly included when compiling for rocm. (vllm-project#12087)
tvirolai-amd
authored

Commits on Feb 4, 2025

Sync with upstream @ v0.7.1
dtrifiro
committed