[pull] main from vllm-project:main #32

pull · 2024-05-21T21:06:50Z

See Commits and Changes for more details.

Can you help keep this open source service alive? 💖 Please sponsor : )


openshift-ci · 2024-05-21T21:07:03Z

Hi @pull[bot]. Thanks for your PR.

I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
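The "/ok-to-test on its own line" rule described above can be illustrated with a minimal Python sketch. This is not Prow's actual implementation; the function name `has_ok_to_test` is hypothetical, and the sketch assumes that surrounding whitespace on the command line is ignored.

```python
def has_ok_to_test(comment_body: str) -> bool:
    """Return True if any line of the comment consists solely of /ok-to-test.

    Hypothetical helper for illustration; leading and trailing whitespace
    on the line is ignored (an assumption, not documented bot behavior).
    """
    return any(line.strip() == "/ok-to-test" for line in comment_body.splitlines())
```

Under these assumptions, a comment such as "please /ok-to-test this" would not count, because the command shares its line with other text.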


…-Small model (#4799) Co-authored-by: beagleski <yunanzhang@microsoft.com> Co-authored-by: bapatra <bapatra@microsoft.com> Co-authored-by: Barun Patra <codedecde@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>


Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com> Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com> Co-authored-by: Alexei V. Ivanov <alexei.ivanov@amd.com> Co-authored-by: omkarkakarparthi <okakarpa>


openshift-ci · 2024-05-31T09:01:24Z

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: pull[bot]

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
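As a rough illustration of the two approval commands above, the following Python sketch classifies a comment. It is not the bot's real logic: the function name `approval_action` is hypothetical, and it assumes that each command sits on its own line and that the last command in a comment wins.

```python
from typing import Optional


def approval_action(comment_body: str) -> Optional[str]:
    """Classify a comment as 'approve', 'cancel', or None.

    Hypothetical helper. Assumptions: a command must be the only text on
    its line, and the last command found in the comment takes effect.
    """
    action = None
    for line in comment_body.splitlines():
        words = line.split()
        if words == ["/approve"]:
            action = "approve"
        elif words == ["/approve", "cancel"]:
            action = "cancel"
    return action
```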

Adds support for multi-lora adapters. Passing tests added over in this PR: https://github.ibm.com/ai-foundation/tgis-deploy-tests/pull/25/files --------- Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>


mgoin and others added 3 commits May 21, 2024 09:06

[CI/Build] Codespell ignore build/ directory (#4945)

757b62c

[Bugfix] Fix flag name for max_seq_len_to_capture (#4935)

14772ee

Signed-off-by: kerthcet <kerthcet@gmail.com>

[Bugfix][Kernel] Add head size check for attention backend selection (#4944)

99eff67

openshift-ci bot requested review from dtrifiro and rpancham May 21, 2024 21:06

openshift-ci bot added the needs-ok-to-test label May 21, 2024

pull bot added ⤵️ pull and removed needs-ok-to-test labels May 21, 2024

sasha0552 and others added 2 commits May 22, 2024 01:32

[Frontend] Dynamic RoPE scaling (#4638)

9b9a10d

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

5f6d10c

dtrifiro added the ok-to-test label May 22, 2024

rkooo567 and others added 18 commits May 22, 2024 09:02

[misc] remove comments that were supposed to be removed (#4977)

c74c913

[Kernel] Fixup for CUTLASS kernels in CUDA graphs (#4954)

8674f98

Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs

[Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893)

a3a73ab

The 2nd PR for #4532. This PR supports loading FP8 kv-cache scaling factors from a FP8 checkpoint (with .kv_scale parameter).
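To make the idea of per-layer scaling factors stored under a `.kv_scale` parameter more concrete, here is a minimal Python sketch. It is not the code from #4893: the function name `collect_kv_scales`, the flat name-to-value checkpoint mapping, and the example parameter names are all assumptions made for illustration.

```python
from typing import Dict, Mapping

KV_SCALE_SUFFIX = ".kv_scale"


def collect_kv_scales(checkpoint: Mapping[str, float]) -> Dict[str, float]:
    """Map each layer prefix to its kv-cache scaling factor.

    Hypothetical helper: assumes the checkpoint is a flat mapping from
    parameter name to a scalar value, and that scaling factors are the
    entries whose names end in '.kv_scale'.
    """
    scales = {}
    for name, value in checkpoint.items():
        if name.endswith(KV_SCALE_SUFFIX):
            # Strip the suffix so the key identifies the owning layer.
            scales[name[: -len(KV_SCALE_SUFFIX)]] = float(value)
    return scales
```

Entries without the suffix (weights, biases) are simply ignored, so a checkpoint with no scaling factors yields an empty mapping.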

[Model] LoRA gptbigcode implementation (#3949)

97b0300

[Core] Eliminate parallel worker per-step task scheduling overhead (#4894)

eb6d3c2

[Minor] Fix small typo in llama.py: QKVParallelLinear -> QuantizationConfig (#4991)

a36de68

[Misc] Take user preference in attention selector (#4960)

ee3eea0

Marlin 24 prefill performance improvement (about 25% better on average) (#4983)

6066253

[Bugfix] Update Dockerfile.cpu to fix NameError: name 'vllm_ops' is not defined (#5009)

2ba80be

[Core][1/N] Support send/recv in PyNCCL Groups (#4988)

5eda2ea

Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>

[Kernel] Initial Activation Quantization Support (#4525)

a124232

Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

[Core]: Option To Use Prompt Token Ids Inside Logits Processor (#4985)

e3470f8

Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university>

[Doc] add ccache guide in doc (#5012)

6a50f4c

Co-authored-by: Michael Goin <michael@neuralmagic.com>

[Bugfix] Fix Mistral v0.3 Weight Loading (#5005)

9197709

Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>

[Core][Bugfix]: fix prefix caching for blockv2 (#4764)

e64fde4

Co-authored-by: Lei Wen <wenlei03@qiyi.com>

[Misc] add logging level env var (#5045)

325c119

[Dynamic Spec Decoding] Minor fix for disabling speculative decoding (#5000)

d5a1697

Etelis and others added 18 commits May 29, 2024 16:13

[Bugfix] logprobs is not compatible with the OpenAI spec #4795 (#5031)

7c3604f

[Doc][Build] update after removing vllm-nccl (#5103)

4fbcb0f

Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

[Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter (#5108)

5bf185a

[BUGFIX] [FRONTEND] Correct chat logprobs (#5029)

87d41c8

Co-authored-by: Breno Faria <breno.faria@intrafind.com>

[Bugfix] Automatically Detect SparseML models (#5119)

d910816

[CI/Build] increase wheel size limit to 200 MB (#5130)

f758505

[Misc] remove duplicate definition of seq_lens_tensor in model_runner.py (#5129)

d79d9ea

[Doc] Use intersphinx and update entrypoints docs (#5125)

a9bcc7a

add doc about serving option on dstack (#3074)

429d897

Co-authored-by: Roger Wang <ywang@roblox.com>

Bump version to v0.4.3 (#5046)

87a658c

[Build] Disable sm_90a in cu11 (#5141)

45a1a69

[Bugfix] Avoid Warnings in SparseML Activation Quantization (#5120)

b35be54

[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5) (#5136)

6d21fa1

Fix cutlass sm_90a version in CMakeList

533c217

[Model] Support MAP-NEO model (#5081)

a22dea5

Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>

Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)" (#5149)

e9d3aa0

[Misc]: optimize eager mode host time (#4196)

a377f0b

Co-authored-by: xuhao <xuhao@cambricon.com>

dtrifiro marked this pull request as ready for review May 31, 2024 09:00

openshift-ci bot removed the do-not-merge/work-in-progress label May 31, 2024

dtrifiro added the lgtm and approved labels May 31, 2024

openshift-ci bot requested review from heyselbi and vaibhavjainwiz May 31, 2024 09:01

dtrifiro enabled auto-merge (rebase) May 31, 2024 09:49

dtrifiro merged commit 527c996 into opendatahub-io:main May 31, 2024
15 of 16 checks passed

Xaenalt pushed a commit that referenced this pull request Sep 18, 2024

Add release docs for Gaudi (#32)

b6f5584

* add gaudi installation readme * readme writeup * Create README_GAUDI.md * Update README.md * Update README_GAUDI.md * Update README.md * Update readmes

prarit pushed a commit to prarit/vllm that referenced this pull request Oct 18, 2024

Merge pull request opendatahub-io#32 from ROCm/gshtras-patch-1

69ce080

Update linear.py
