
[Model] Support NVLM-D and fix QK Norm in InternViT #9045

Merged: 31 commits merged into main from nvlm_d on Oct 7, 2024
Conversation

@DarkLight1337 (Member) commented Oct 3, 2024

Implement NVLM-D model based on InternVL.

While testing the model, @ywang96 found that the existing implementation of parallel attention in InternViT does not work with QK normalization. Thanks @Isotr0py for fixing this!

FIX #9040
FIX #9041
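
As a minimal, self-contained illustration of the QK Norm issue (the sizes and weights below are assumed, InternViT-like placeholders, and this is not the actual fix in this PR): when attention heads are sharded across tensor-parallel ranks, naively applying the QK LayerNorm to each rank's local slice computes different statistics than normalizing over the full hidden size, so the sharded model no longer matches the unsharded one.

    import torch
    import torch.nn.functional as F

    embed_dim, tp_size = 3200, 4            # hypothetical InternViT-like sizes
    local_dim = embed_dim // tp_size
    q = torch.randn(2, 8, embed_dim)        # (batch, seq, hidden) queries
    weight = torch.rand(embed_dim)          # stand-in for the checkpoint's norm weight

    # Reference: normalize over the full hidden size, then take rank 0's slice.
    ref = F.layer_norm(q, (embed_dim,), weight)[..., :local_dim]

    # Naive TP: rank 0 only holds its local slice and normalizes over local_dim.
    naive = F.layer_norm(q[..., :local_dim], (local_dim,), weight[:local_dim])

    print(torch.allclose(ref, naive))       # False: the per-shard statistics differ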

github-actions bot commented Oct 3, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small but essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

Comment on lines 63 to 70

    "Qwen2VLForConditionalGeneration":
        ("qwen2_vl", "Qwen2VLForConditionalGeneration"),

@DarkLight1337 (Member, Author) Oct 3, 2024

This entry is redundant, as it is already listed under the multimodal models, so I'm removing it.

Comment on lines 80 to 100
# yapf: disable
_MULTIMODAL_MODELS = {
    "Blip2ForConditionalGeneration": ("blip2", "Blip2ForConditionalGeneration"),
    "ChameleonForConditionalGeneration": ("chameleon", "ChameleonForConditionalGeneration"),  # noqa: E501
    "FuyuForCausalLM": ("fuyu", "FuyuForCausalLM"),
    "InternVLChatModel": ("internvl", "InternVLChatModel"),
    "LlavaForConditionalGeneration": ("llava", "LlavaForConditionalGeneration"),
    "LlavaNextForConditionalGeneration": ("llava_next", "LlavaNextForConditionalGeneration"),  # noqa: E501
    "LlavaNextVideoForConditionalGeneration": ("llava_next_video", "LlavaNextVideoForConditionalGeneration"),  # noqa: E501
    "LlavaOnevisionForConditionalGeneration": ("llava_onevision", "LlavaOnevisionForConditionalGeneration"),  # noqa: E501
    "MiniCPMV": ("minicpmv", "MiniCPMV"),
    "MllamaForConditionalGeneration": ("mllama", "MllamaForConditionalGeneration"),  # noqa: E501
    "NVLM_D": ("nvlm_d", "InternVLChatModel"),
    "PaliGemmaForConditionalGeneration": ("paligemma", "PaliGemmaForConditionalGeneration"),  # noqa: E501
    "Phi3VForCausalLM": ("phi3v", "Phi3VForCausalLM"),
    "PixtralForConditionalGeneration": ("pixtral", "PixtralForConditionalGeneration"),  # noqa: E501
    "QWenLMHeadModel": ("qwen", "QWenLMHeadModel"),
    "Qwen2VLForConditionalGeneration": ("qwen2_vl", "Qwen2VLForConditionalGeneration"),  # noqa: E501
    "UltravoxModel": ("ultravox", "UltravoxModel"),
}
# yapf: enable
@DarkLight1337 (Member, Author)
Re-sorted the list in alphabetical order and enforced one line per model for readability.
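
For readers unfamiliar with this registry, here is a rough sketch (a hypothetical helper, not vLLM's actual loader) of how such a mapping is typically consumed: the architecture name from the HF config resolves to a (module, class) pair that is imported lazily, which is why the new NVLM_D entry can simply point at the InternVLChatModel class exposed by the new nvlm_d module.

    import importlib

    _MULTIMODAL_MODELS = {
        "InternVLChatModel": ("internvl", "InternVLChatModel"),
        "NVLM_D": ("nvlm_d", "InternVLChatModel"),
    }

    def resolve_model_cls(architecture: str):
        # Look up the (module, class) pair and import the module only when needed.
        module_name, cls_name = _MULTIMODAL_MODELS[architecture]
        module = importlib.import_module(f"vllm.model_executor.models.{module_name}")
        return getattr(module, cls_name)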

@DarkLight1337 (Member, Author)
I've fixed the errors up to but not including merging multimodal embeddings. We probably need to implement additional logic to handle tile tagging.
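
To give a rough idea of what tile tagging means here (the names below are illustrative, not this PR's implementation): NVLM-D-style prompts interleave a text tag per image tile with that tile's block of image placeholder tokens, so the embedding-merge step has to account for the tag tokens sitting between tiles.

    IMG_PLACEHOLDER = "<image>"  # hypothetical placeholder string

    def build_tiled_prompt(num_tiles: int, tokens_per_tile: int) -> str:
        # One textual tag per tile, followed by that tile's placeholder tokens.
        parts = [f"<tile_{i}>" + IMG_PLACEHOLDER * tokens_per_tile
                 for i in range(1, num_tiles + 1)]
        return "".join(parts)

    print(build_tiled_prompt(num_tiles=2, tokens_per_tile=4))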

@mgoin (Sponsor, Collaborator) commented Oct 3, 2024

(EDIT: this was resolved by the latest commits.) I had a failure when trying to load the model weights:

vllm serve nvidia/NVLM-D-72B --tensor-parallel-size 4     
...
  File "/home/mgoin/code/vllm/vllm/model_executor/model_loader/loader.py", line 403, in load_model
    model.load_weights(self._get_all_weights(model_config, model))
  File "/home/mgoin/code/vllm/vllm/model_executor/models/internvl.py", line 564, in load_weights
    self.vision_model.load_weights(weights_group["vision_model"])
  File "/home/mgoin/code/vllm/vllm/model_executor/models/intern_vit.py", line 366, in load_weights
    weight_loader(param, loaded_weight)
  File "/home/mgoin/code/vllm/vllm/model_executor/model_loader/weight_utils.py", line 537, in default_weight_loader
    assert param.size() == loaded_weight.size(), (
AssertionError: Attempted to load weight (torch.Size([12288, 3200])) into parameter (torch.Size([9600, 3200]))

(UPDATE) Now I see an error during model initialization:

vllm serve nvidia/NVLM-D-72B --tensor-parallel-size 4 --enforce-eager --max-num-seqs 16
...
Traceback (most recent call last):
  File "/home/mgoin/code/vllm/vllm/worker/model_runner_base.py", line 116, in _wrapper
    return func(*args, **kwargs)
  File "/home/mgoin/code/vllm/vllm/worker/model_runner.py", line 1644, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/home/mgoin/venvs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mgoin/venvs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mgoin/code/vllm/vllm/model_executor/models/internvl.py", line 533, in forward
    inputs_embeds = merge_multimodal_embeddings(
  File "/home/mgoin/code/vllm/vllm/model_executor/models/utils.py", line 169, in merge_multimodal_embeddings
    mask = (input_ids == placeholder_token_id)
RuntimeError: The size of tensor a (98304) must match the size of tensor b (4) at non-singleton dimension 0

To aid debugging I made a FP8 model checkpoint: https://huggingface.co/nm-testing/NVLM-D-72B-FP8-dynamic
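
The second traceback above can be reproduced in isolation (the shapes and token ids below are hypothetical, not vLLM internals): comparing input_ids element-wise against a vector of several placeholder ids cannot broadcast, while a single scalar id, or torch.isin for several ids, produces the intended boolean mask.

    import torch

    input_ids = torch.randint(0, 32000, (98304,))
    placeholder_ids = torch.tensor([151665, 151666, 151667, 151668])  # hypothetical ids

    # mask = (input_ids == placeholder_ids)   # RuntimeError: sizes 98304 vs 4
    mask = torch.isin(input_ids, placeholder_ids)     # boolean mask over all placeholder ids
    mask_single = input_ids == placeholder_ids[0]     # works for a single scalar id
    print(mask.sum().item(), mask_single.sum().item())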

DarkLight1337 marked this pull request as ready for review on October 4, 2024 at 03:37.
@DarkLight1337 (Member, Author)
OK I'm able to use the model in online serving now. The outputs seem reasonable.

@mgoin (Sponsor, Collaborator) commented Oct 4, 2024

Yup it sounds reasonable to me 😸 Nice work!

vllm serve nm-testing/NVLM-D-72B-FP8-dynamic --tensor-parallel-size 4 --enforce-eager --max-num-seqs 16
[Screenshot, 2024-10-03 9:07 PM]

@DarkLight1337 (Member, Author)
Now I just have to set up the offline examples...

@mgoin (Sponsor, Collaborator) commented Oct 4, 2024

@DarkLight1337 I plugged it into the existing run_internvl example within offline_inference_vision_language.py

[Screenshot, 2024-10-03 9:18 PM]

This was the output, which seems reasonable:

Processed prompts: 100%|█████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00,  1.33it/s, est. speed input: 4667.61 toks/s, output: 84.87 toks/s]
The image features a tall, slender structure that resembles a communications tower, partially obscured by the branches and blossoms of cherry trees in full bloom. The structure has a white and light blue color scheme and is topped with an antenna. The cherry blossoms, with their delicate pink flowers, frame the tower, creating a picturesque
The image features a tall, white tower with a distinctive design, partially obscured by cherry blossom trees in full bloom. The tower is likely a telecommunications or observation tower, characterized by its lattice structure and observation deck near the top. The cherry blossoms, with their delicate pink flowers, frame the tower, creating a picturesque scene
The image features a tall, white tower with a distinctive design, partially obscured by cherry blossom trees in full bloom. The cherry blossoms, with their delicate pink flowers, create a beautiful contrast against the blue sky. The tower's structure is intricate, with a combination of straight and curved lines, and it appears to be
The image shows a tall building with a spire, surrounded by cherry blossom trees in full bloom. The building is white and has a modern architectural style, with a distinctive spire that tapers off at the top. The cherry blossom trees are in the foreground, with their pink and white flowers creating a beautiful contrast against

@DarkLight1337 (Member, Author) commented Oct 4, 2024

That is odd; I am getting completely nonsensical results on my end.

@DarkLight1337 (Member, Author)
@mgoin Can you check whether the multi-image example also works?

@DarkLight1337 (Member, Author)
If I set num_prompts=1 then I don't get this problem.

ywang96 self-assigned this on Oct 4, 2024.
@DarkLight1337 (Member, Author) commented Oct 4, 2024

It seems to be an issue on the machine I am using to test the model. I can't run any models with both TP>1 and max_num_seqs>1 there.

Update: Thanks @ywang96 for helping test this!

DarkLight1337 changed the title from "[Model] Support NVLM-D" to "[Model] Support NVLM-D and fix QK Norm in InternViT" on Oct 7, 2024.
@ywang96 (Member) left a review comment

LGTM! Per our offline discussion, feel free to consolidate the ViT attention module for the two models.

@anonymousz97

Hi, I'm not able to build the new vLLM. Has anyone tried building this PR from source?

@DarkLight1337 (Member, Author)

What error are you running into specifically?

DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Oct 7, 2024.
DarkLight1337 enabled auto-merge (squash) on October 7, 2024 at 09:56.
@anonymousz97 commented Oct 7, 2024

It reports that numpy is not installed, even though I have already completed the installation. I used pip install -e .

@DarkLight1337 (Member, Author)

Does this also happen on the main branch? This sounds similar to #8851.

@anonymousz97

Yes, I think it is the same, but I am using Python 3.10 with numpy 1.26.4 on Ubuntu. I have already read #8851 but have not seen a solution there yet.

@DarkLight1337 (Member, Author)

I suggest you provide more details in that issue then, since it's not specific to this PR.

@anonymousz97

Ah yes, I will do that. Thanks!

DarkLight1337 merged commit 151ef4e into main on Oct 7, 2024.
60 checks passed
DarkLight1337 deleted the nvlm_d branch on October 7, 2024 at 12:17.
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)
Linked issues: [New Model]: nvidia/NVLM-D-72B; [New Model]: NVLM 1.0
4 participants