(DO NOT MERGE) IBM release WIP #76

This pull-request has been approved by: prashantgupta24
Once this PR has been reviewed and has the lgtm label, please assign terrytangyuan for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

prashantgupta24 · 2024-07-01T19:08:45Z

/test all

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>


mgoin and others added 30 commits July 1, 2024 11:54

[Doc] Documentation on supported hardware for quantization methods (vllm-project#5745)

f9fa4e4

[BugFix] exclude version 1.15.0 for modelscope (vllm-project#5668)

299af70

[ci][test] fix ca test in main (vllm-project#5746)

a455d65

[LoRA] Add support for pinning lora adapters in the LRU cache (vllm-project#5603)

f4c1a10

[CI][Hardware][Intel GPU] add Intel GPU(XPU) ci pipeline (vllm-project#5616)

b0b518d

[Model] Support Qwen-VL and Qwen-VL-Chat models with text-only inputs (vllm-project#5710)

bc4ae91

Co-authored-by: Roger Wang <ywang@roblox.com>

[Misc] Remove vllm-project#4789 workaround left in vllm/entrypoints/openai/run_batch.py (vllm-project#5756)

ec2ed1b

[Bugfix] Fix pin_lora error in TPU executor (vllm-project#5760)

a77856f

[Docs][TPU] Add installation tip for TPU (vllm-project#5761)

06baabc

[core][distributed] improve shared memory broadcast (vllm-project#5754)

7cd7a7a

[BugFix] [Kernel] Add Cutlass2x fallback kernels (vllm-project#5744)

945732a

Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

[Distributed] Add send and recv helpers (vllm-project#5719)

7923319

[Bugfix] Add phi3v resize for dynamic shape and fix torchvision requirement (vllm-project#5772)

f41fff4

[doc][faq] add warning to download models for every nodes (vllm-project#5783)

a6d3e9e

[Doc] Add "Suggest edit" button to doc pages (vllm-project#5789)

1408567

[Doc] Add Phi-3-medium to list of supported models (vllm-project#5788)

ab86561

[Bugfix] Fix FlexibleArgumentParser replaces _ with - for actual args (vllm-project#5795)

657a3f8

[ci] Remove aws template (vllm-project#5757)

c9b8f8a

Signed-off-by: kevin <kevin@anyscale.com>

[Doc] Add notice about breaking changes to VLMs (vllm-project#5818)

65b7543

[Speculative Decoding] Support draft model on different tensor-parallel size than target model (vllm-project#5414)

79df20a

[Misc] Remove useless code in cpu_worker (vllm-project#5824)

1187a29

[Core] Add fault tolerance for RayTokenizerGroupPool (vllm-project#5748)

54b3304

[doc][distributed] add both gloo and nccl tests (vllm-project#5834)

976f4fa

[CI/Build] Add unit testing for FlexibleArgumentParser (vllm-project#5798)

f88e861

[Misc] Update w4a16 compressed-tensors support to include w8a16 (vllm-project#5794)

caf1017

[Hardware][TPU] Refactor TPU backend (vllm-project#5831)

a50a2e9

[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (vllm-project#5422)

8509fef

[Hardware][TPU] Raise errors for unsupported sampling params (vllm-project#5850)

9595933

[CI/Build] Add E2E tests for MLPSpeculator (vllm-project#5791)

b2f42b3

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

[Bugfix] Fix assertion in NeuronExecutor (vllm-project#5841)

2100b12

robertgshaw2-redhat and others added 18 commits July 1, 2024 11:54

[ CI/Build ] LM Eval Harness Based CI Testing (vllm-project#5838)

2c3044d

Co-authored-by: Robert Shaw <rshaw@neuralmagic>

[Bugfix][CI/Build][Hardware][AMD] Install matching torchvision to fix AMD tests (vllm-project#5949)

ee9c4d1

[CI/Build] Temporarily Remove Phi3-Vision from TP Test (vllm-project#5989)

eddb80a

[CI/Build] Reuse code for checking output consistency (vllm-project#5988)

075c3f9

[CI/Build] [3/3] Reorganize entrypoints tests (vllm-project#5966)

c14b831

[ci][distributed] fix device count call

045125b

[ci][distributed] fix some cuda init that makes it necessary to use spawn (vllm-project#5991)

[Frontend]: Support base64 embedding (vllm-project#5935)

7443549

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

[Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. (vllm-project#5909)

6b3a037

Co-authored-by: sang <sangcho@anyscale.com>

[ CI ] Temporarily Disable Large LM-Eval Tests (vllm-project#6005)

1e2049e

Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic>

[Misc] Fix get_min_capability (vllm-project#5971)

5b65eb0

[ Misc ] Refactor w8a8 to use process_weights_after_load (Simplify Weight Loading) (vllm-project#5940)

bde1a5a

Co-authored-by: Robert Shaw <rshaw@neuralmagic>

[misc][cuda] use nvml to avoid accidentally cuda initialization (vllm-project#6007)

169f3df

[Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker (vllm-project#5348)

b1d1398

[ CI ] Re-enable Large Model LM Eval (vllm-project#6031)

567df3b

[doc][misc] remove deprecated api server in doc (vllm-project#6037)

7325ac0

[Misc] update benchmark backend for scalellm (vllm-project#6018)

979fcb5

[doc][misc] further lower visibility of simple api server (vllm-project#6041)

c544ecf

Co-authored-by: Simon Mo <simon.mo@hey.com>

Squash 4645

0558bcc

openshift-ci bot requested review from dtrifiro and rpancham July 1, 2024 19:08

prashantgupta24 changed the title ~~July 1 upstream 4645~~ IBM release WIP Jul 1, 2024

prashantgupta24 changed the title ~~IBM release WIP~~ (DO NOT MERGE) IBM release working Jul 1, 2024

prashantgupta24 changed the title ~~(DO NOT MERGE) IBM release working~~ (DO NOT MERGE) IBM release WIP Jul 1, 2024

🚧 add adapter changes

2987012

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>

prashantgupta24 closed this Jul 3, 2024

prashantgupta24 deleted the july-1-upstream-4645 branch July 3, 2024 14:04

Xaenalt pushed a commit that referenced this pull request Sep 18, 2024

Remove allgather workaround in logits_processor (#76)

90f900c

prarit pushed a commit to prarit/vllm that referenced this pull request Oct 18, 2024

Fix gradlib fp8 output (opendatahub-io#76)

52df169

* fix gradlib fp8 output
* add condition check for existing tune result
* fix linter
* fix import order
* fix lint
