
[Model] PP support for embedding models and update docs #9090

Merged
merged 23 commits from embedding-pp into main on Oct 6, 2024

Conversation

@DarkLight1337 DarkLight1337 (Member) commented Oct 5, 2024

I found that PP actually didn't work for embedding models because it wasn't implemented in the embedding model runner. I've updated the embedding model runner to support PP and cleaned up the existing code in the models so that embedding and CausalLM models can share their weight-loading logic.

I have also updated the Supported Models page (again) with additional models.
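
To make the weight-sharing idea concrete, here is a minimal, hypothetical sketch in plain PyTorch (these are stand-in classes, not vLLM's actual ones): the embedding model wraps the same decoder backbone that the CausalLM model uses and delegates weight loading to it, so the mapping from checkpoint names to parameters lives in one place. The PP changes themselves live in the embedding model runner and are not shown here.

from typing import Iterable, Tuple

import torch
import torch.nn as nn


class DecoderBackbone(nn.Module):
    # Stand-in for the shared transformer stack (e.g. a Llama-style decoder).
    def __init__(self, hidden_size: int = 64, vocab_size: int = 128) -> None:
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.layers = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(2))

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.embed_tokens(input_ids)
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden

    def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]) -> None:
        # The single place that knows how checkpoint names map onto parameters.
        params = dict(self.named_parameters())
        for name, tensor in weights:
            if name in params:
                params[name].data.copy_(tensor)


class EmbeddingModel(nn.Module):
    # Pooling head on top of the shared backbone; reuses its weight loading.
    def __init__(self) -> None:
        super().__init__()
        self.model = DecoderBackbone()

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.model(input_ids).mean(dim=1)  # simple mean pooling

    def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]) -> None:
        self.model.load_weights(weights)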


github-actions bot commented Oct 5, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

Comment on lines +112 to +116
if (self.observability_config is not None
        and self.observability_config.collect_model_forward_time):
    model_forward_start = torch.cuda.Event(enable_timing=True)
    model_forward_end = torch.cuda.Event(enable_timing=True)
    model_forward_start.record()
@DarkLight1337 DarkLight1337 (Member Author) commented Oct 5, 2024

Copied from GPUModelRunner. I don't really like how we're repeating this code though...
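
For what it's worth, one hypothetical way to factor that repeated block into a shared helper (just a sketch, not something this PR adds) would be a small context manager around the CUDA events:

from contextlib import contextmanager
from typing import Iterator, Optional

import torch


@contextmanager
def cuda_forward_timer(enabled: bool) -> Iterator[Optional[dict]]:
    # Records model forward time with CUDA events when enabled; yields a dict
    # that receives the elapsed time in milliseconds once the block exits.
    if not enabled or not torch.cuda.is_available():
        yield None
        return
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    result: dict = {}
    try:
        yield result
    finally:
        end.record()
        torch.cuda.synchronize()
        result["forward_time_ms"] = start.elapsed_time(end)

# usage sketch:
#     with cuda_forward_timer(collect_forward_time) as timing:
#         hidden_states = model(...)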

@DarkLight1337 (Member Author)

@zhuzilin can you help check whether Qwen2.5-Math-RM-72B can be used in a PP setting? The model is too big for me to test.

@DarkLight1337 DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Oct 5, 2024
@youkaichao (Member)

@DarkLight1337 you can test it with dummy weights and manually change the hidden size in the config to make the model smaller
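
For reference, one way to follow this suggestion is to save a shrunken copy of the config locally and load it with dummy (randomly initialized) weights. This is a rough sketch; the engine arguments and config fields are assumptions based on a recent vLLM and the Qwen2 config layout, and may need adjusting:

from transformers import AutoConfig, AutoTokenizer
from vllm import LLM

model_id = "Qwen/Qwen2.5-Math-RM-72B"
tiny_dir = "./qwen2.5-math-rm-tiny"

# Shrink the model so it fits on a small test machine.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.hidden_size = 512
config.intermediate_size = 1024
config.num_hidden_layers = 4
config.num_attention_heads = 8
config.num_key_value_heads = 8
config.save_pretrained(tiny_dir)
AutoTokenizer.from_pretrained(model_id, trust_remote_code=True).save_pretrained(tiny_dir)

# Load with dummy weights under pipeline parallelism (no 72B download needed).
llm = LLM(
    model=tiny_dir,
    load_format="dummy",
    pipeline_parallel_size=2,
    trust_remote_code=True,
)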

@youkaichao (Member)

hand it over to @andoorve for review

@andoorve andoorve (Contributor) left a comment

Added one minor comment, otherwise looks straightforward to me!

"baichuan-inc/Baichuan-7B": PPTestSettings.fast(trust_remote_code=True),
"baichuan-inc/Baichuan2-13B-Chat": PPTestSettings.fast(trust_remote_code=True), # noqa: E501
"bigscience/bloomz-1b1": PPTestSettings.fast(),
"THUDM/chatglm3-6b": PPTestSettings.fast(trust_remote_code=True),
"CohereForAI/c4ai-command-r-v01": PPTestSettings.fast(tp_base=2, trust_remote_code=True), # noqa: E501
# TODO: Test on larger GPU
# "databricks/dbrx-instruct": PPTestSettings.fast(),
"databricks/dbrx-instruct": PPTestSettings.fast(tp_base=4),
Contributor

This hardcoded value of 4 will only work on 8-GPU machines. It might be a bit confusing; comments should be added so it can be adjusted as necessary.
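
(As a side note, a hypothetical guard, not part of this PR, could make such a test skip cleanly on machines with fewer GPUs than the tp_base x pp_size product requires:

import pytest
import torch


def require_gpus(num_required: int) -> None:
    # Skip the current test when fewer than num_required GPUs are visible.
    available = torch.cuda.device_count()
    if available < num_required:
        pytest.skip(f"need {num_required} GPUs, found {available}")
)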

Contributor

We might not need all of them either; maybe a few as examples are sufficient, with some comments. Otherwise the list needs to be kept up to date.

Member Author

I have added some comments in the code. See if it works for you.

Contributor

The comments look good to me, but would it be possible to run this test with tp_base of 8? I.e. does this test automatically work with 2 nodes? This doesn't need to block but just something to think about.

Member Author

It requires 16 GPUs to run. What is the setup you used to run those models? I have a comment indicating that tp_base is just an indication of the model size and may have to be adjusted further.

@youkaichao youkaichao (Member) left a comment

please address comments from @andoorve

@DarkLight1337 DarkLight1337 changed the title from "[Model] PP support for embedding models" to "[Model] PP support for embedding models and update docs" on Oct 6, 2024
@ywang96 ywang96 (Member) left a comment

Small nit

docs/source/models/supported_models.rst (review comment resolved)
DarkLight1337 and others added 2 commits October 6, 2024 11:43
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
@DarkLight1337 DarkLight1337 merged commit b22b798 into main Oct 6, 2024
58 checks passed
@DarkLight1337 DarkLight1337 deleted the embedding-pp branch October 6, 2024 08:35
liuyanyi pushed a commit to liuyanyi/vllm that referenced this pull request Oct 6, 2024
…#9090)

Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
@gshtras gshtras (Contributor) commented Oct 7, 2024

Not yet sure why, but this introduces a regression on ROCm:
python benchmarks/benchmark_latency.py --model meta-llama/Llama-2-70b-chat-hf -tp 8
crashes during weight loading
