
[Model] Support NVLM-D and fix QK Norm in InternViT #9045

Merged: 31 commits merged into main from nvlm_d on Oct 7, 2024
Conversation

@DarkLight1337 (Member) commented Oct 3, 2024

Implement NVLM-D model based on InternVL.

While testing the model, @ywang96 found that the existing implementation of parallel attention in InternViT does not work with QK normalization. Thanks @Isotr0py for fixing this!

FIX #9040
FIX #9041
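
As a minimal, self-contained illustration of the QK Norm issue (the sizes and weights below are assumed, InternViT-like placeholders, and this is not the actual fix in this PR): when attention heads are sharded across tensor-parallel ranks, naively applying the QK LayerNorm to each rank's local slice computes different statistics than normalizing over the full hidden size, so the sharded model no longer matches the unsharded one.

    import torch
    import torch.nn.functional as F

    embed_dim, tp_size = 3200, 4            # hypothetical InternViT-like sizes
    local_dim = embed_dim // tp_size
    q = torch.randn(2, 8, embed_dim)        # (batch, seq, hidden) queries
    weight = torch.rand(embed_dim)          # stand-in for the checkpoint's norm weight

    # Reference: normalize over the full hidden size, then take rank 0's slice.
    ref = F.layer_norm(q, (embed_dim,), weight)[..., :local_dim]

    # Naive TP: rank 0 only holds its local slice and normalizes over local_dim.
    naive = F.layer_norm(q[..., :local_dim], (local_dim,), weight[:local_dim])

    print(torch.allclose(ref, naive))       # False: the per-shard statistics differ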

github-actions bot commented Oct 3, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small but essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

Comment on lines 63 to 70

    "Qwen2VLForConditionalGeneration":
        ("qwen2_vl", "Qwen2VLForConditionalGeneration"),

@DarkLight1337 (Member, Author) Oct 3, 2024

This entry is redundant, as it is already listed under the multimodal models, so I'm removing it.

Comment on lines 80 to 100
# yapf: disable
_MULTIMODAL_MODELS = {
    "Blip2ForConditionalGeneration": ("blip2", "Blip2ForConditionalGeneration"),
    "ChameleonForConditionalGeneration": ("chameleon", "ChameleonForConditionalGeneration"),  # noqa: E501
    "FuyuForCausalLM": ("fuyu", "FuyuForCausalLM"),
    "InternVLChatModel": ("internvl", "InternVLChatModel"),
    "LlavaForConditionalGeneration": ("llava", "LlavaForConditionalGeneration"),
    "LlavaNextForConditionalGeneration": ("llava_next", "LlavaNextForConditionalGeneration"),  # noqa: E501
    "LlavaNextVideoForConditionalGeneration": ("llava_next_video", "LlavaNextVideoForConditionalGeneration"),  # noqa: E501
    "LlavaOnevisionForConditionalGeneration": ("llava_onevision", "LlavaOnevisionForConditionalGeneration"),  # noqa: E501
    "MiniCPMV": ("minicpmv", "MiniCPMV"),
    "MllamaForConditionalGeneration": ("mllama", "MllamaForConditionalGeneration"),  # noqa: E501
    "NVLM_D": ("nvlm_d", "InternVLChatModel"),
    "PaliGemmaForConditionalGeneration": ("paligemma", "PaliGemmaForConditionalGeneration"),  # noqa: E501
    "Phi3VForCausalLM": ("phi3v", "Phi3VForCausalLM"),
    "PixtralForConditionalGeneration": ("pixtral", "PixtralForConditionalGeneration"),  # noqa: E501
    "QWenLMHeadModel": ("qwen", "QWenLMHeadModel"),
    "Qwen2VLForConditionalGeneration": ("qwen2_vl", "Qwen2VLForConditionalGeneration"),  # noqa: E501
    "UltravoxModel": ("ultravox", "UltravoxModel"),
}
# yapf: enable
@DarkLight1337 (Member, Author)
Re-sorted the list in alphabetical order and enforced one line per model for readability.
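
For readers unfamiliar with this registry, here is a rough sketch (a hypothetical helper, not vLLM's actual loader) of how such a mapping is typically consumed: the architecture name from the HF config resolves to a (module, class) pair that is imported lazily, which is why the new NVLM_D entry can simply point at the InternVLChatModel class exposed by the new nvlm_d module.

    import importlib

    _MULTIMODAL_MODELS = {
        "InternVLChatModel": ("internvl", "InternVLChatModel"),
        "NVLM_D": ("nvlm_d", "InternVLChatModel"),
    }

    def resolve_model_cls(architecture: str):
        # Look up the (module, class) pair and import the module only when needed.
        module_name, cls_name = _MULTIMODAL_MODELS[architecture]
        module = importlib.import_module(f"vllm.model_executor.models.{module_name}")
        return getattr(module, cls_name)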

@DarkLight1337 (Member, Author)
I've fixed the errors up to but not including merging multimodal embeddings. We probably need to implement additional logic to handle tile tagging.
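
To give a rough idea of what tile tagging means here (the names below are illustrative, not this PR's implementation): NVLM-D-style prompts interleave a text tag per image tile with that tile's block of image placeholder tokens, so the embedding-merge step has to account for the tag tokens sitting between tiles.

    IMG_PLACEHOLDER = "<image>"  # hypothetical placeholder string

    def build_tiled_prompt(num_tiles: int, tokens_per_tile: int) -> str:
        # One textual tag per tile, followed by that tile's placeholder tokens.
        parts = [f"<tile_{i}>" + IMG_PLACEHOLDER * tokens_per_tile
                 for i in range(1, num_tiles + 1)]
        return "".join(parts)

    print(build_tiled_prompt(num_tiles=2, tokens_per_tile=4))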

@mgoin (Sponsor, Collaborator) commented Oct 3, 2024

(EDIT: this was resolved by the latest commits.) I had a failure when trying to load the model weights:

vllm serve nvidia/NVLM-D-72B --tensor-parallel-size 4     
...
  File "/home/mgoin/code/vllm/vllm/model_executor/model_loader/loader.py", line 403, in load_model
    model.load_weights(self._get_all_weights(model_config, model))
  File "/home/mgoin/code/vllm/vllm/model_executor/models/internvl.py", line 564, in load_weights
    self.vision_model.load_weights(weights_group["vision_model"])
  File "/home/mgoin/code/vllm/vllm/model_executor/models/intern_vit.py", line 366, in load_weights
    weight_loader(param, loaded_weight)
  File "/home/mgoin/code/vllm/vllm/model_executor/model_loader/weight_utils.py", line 537, in default_weight_loader
    assert param.size() == loaded_weight.size(), (
AssertionError: Attempted to load weight (torch.Size([12288, 3200])) into parameter (torch.Size([9600, 3200]))

(UPDATE) Now I see an error during model initialization:

vllm serve nvidia/NVLM-D-72B --tensor-parallel-size 4 --enforce-eager --max-num-seqs 16
...
Traceback (most recent call last):
  File "/home/mgoin/code/vllm/vllm/worker/model_runner_base.py", line 116, in _wrapper
    return func(*args, **kwargs)
  File "/home/mgoin/code/vllm/vllm/worker/model_runner.py", line 1644, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/home/mgoin/venvs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mgoin/venvs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mgoin/code/vllm/vllm/model_executor/models/internvl.py", line 533, in forward
    inputs_embeds = merge_multimodal_embeddings(
  File "/home/mgoin/code/vllm/vllm/model_executor/models/utils.py", line 169, in merge_multimodal_embeddings
    mask = (input_ids == placeholder_token_id)
RuntimeError: The size of tensor a (98304) must match the size of tensor b (4) at non-singleton dimension 0

To aid debugging I made a FP8 model checkpoint: https://huggingface.co/nm-testing/NVLM-D-72B-FP8-dynamic
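
The second traceback above can be reproduced in isolation (the shapes and token ids below are hypothetical, not vLLM internals): comparing input_ids element-wise against a vector of several placeholder ids cannot broadcast, while a single scalar id, or torch.isin for several ids, produces the intended boolean mask.

    import torch

    input_ids = torch.randint(0, 32000, (98304,))
    placeholder_ids = torch.tensor([151665, 151666, 151667, 151668])  # hypothetical ids

    # mask = (input_ids == placeholder_ids)   # RuntimeError: sizes 98304 vs 4
    mask = torch.isin(input_ids, placeholder_ids)     # boolean mask over all placeholder ids
    mask_single = input_ids == placeholder_ids[0]     # works for a single scalar id
    print(mask.sum().item(), mask_single.sum().item())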

DarkLight1337 marked this pull request as ready for review on October 4, 2024 at 03:37.
@DarkLight1337 (Member, Author)
OK I'm able to use the model in online serving now. The outputs seem reasonable.

@mgoin (Sponsor, Collaborator) commented Oct 4, 2024

Yup it sounds reasonable to me 😸 Nice work!

vllm serve nm-testing/NVLM-D-72B-FP8-dynamic --tensor-parallel-size 4 --enforce-eager --max-num-seqs 16
[Screenshot, 2024-10-03 9:07 PM]

@DarkLight1337 (Member, Author)
Now I just have to set up the offline examples...

@mgoin (Sponsor, Collaborator) commented Oct 4, 2024

@DarkLight1337 I plugged it into the existing run_internvl example within offline_inference_vision_language.py

[Screenshot, 2024-10-03 9:18 PM]

This was the output, which seems reasonable:

Processed prompts: 100%|█████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00,  1.33it/s, est. speed input: 4667.61 toks/s, output: 84.87 toks/s]
The image features a tall, slender structure that resembles a communications tower, partially obscured by the branches and blossoms of cherry trees in full bloom. The structure has a white and light blue color scheme and is topped with an antenna. The cherry blossoms, with their delicate pink flowers, frame the tower, creating a picturesque
The image features a tall, white tower with a distinctive design, partially obscured by cherry blossom trees in full bloom. The tower is likely a telecommunications or observation tower, characterized by its lattice structure and observation deck near the top. The cherry blossoms, with their delicate pink flowers, frame the tower, creating a picturesque scene
The image features a tall, white tower with a distinctive design, partially obscured by cherry blossom trees in full bloom. The cherry blossoms, with their delicate pink flowers, create a beautiful contrast against the blue sky. The tower's structure is intricate, with a combination of straight and curved lines, and it appears to be
The image shows a tall building with a spire, surrounded by cherry blossom trees in full bloom. The building is white and has a modern architectural style, with a distinctive spire that tapers off at the top. The cherry blossom trees are in the foreground, with their pink and white flowers creating a beautiful contrast against

@DarkLight1337 (Member, Author) commented Oct 4, 2024

That is odd; I am getting completely nonsensical results on my end.

@DarkLight1337 (Member, Author)
@mgoin Can you check whether the multi-image example also works?

@DarkLight1337 (Member, Author)
If I set num_prompts=1 then I don't get this problem.

ywang96 self-assigned this on Oct 4, 2024.
@DarkLight1337 (Member, Author) commented Oct 4, 2024

It seems to be an issue on the machine I am using to test the model. I can't run any models with both TP>1 and max_num_seqs>1 there.

Update: Thanks @ywang96 for helping test this!

DarkLight1337 changed the title from "[Model] Support NVLM-D" to "[Model] Support NVLM-D and fix QK Norm in InternViT" on Oct 7, 2024.
@ywang96 (Member) left a review comment

LGTM! Per our offline discussion, feel free to consolidate the ViT attention module for the two models.

@anonymousz97

Hi, I'm not able to build the new vLLM. Has anyone tried building this PR from source?

@DarkLight1337 (Member, Author)

What error are you running into specifically?

DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Oct 7, 2024.
DarkLight1337 enabled auto-merge (squash) on October 7, 2024 at 09:56.
@anonymousz97 commented Oct 7, 2024

It reports that numpy is not installed, even though I have already completed the installation. I used pip install -e .

@DarkLight1337 (Member, Author)

Does this also happen on the main branch? This sounds similar to #8851.

@anonymousz97

Yes, I think it is the same, but I am using Python 3.10 with numpy 1.26.4 on Ubuntu. I have already read #8851 but have not seen a solution there yet.

@DarkLight1337 (Member, Author)

I suggest you provide more details in that issue then, since it's not specific to this PR.

@anonymousz97

Ah yes, I will do that. Thanks!

DarkLight1337 merged commit 151ef4e into main on Oct 7, 2024.
60 checks passed
DarkLight1337 deleted the nvlm_d branch on October 7, 2024 at 12:17.
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)
Linked issues: [New Model]: nvidia/NVLM-D-72B; [New Model]: NVLM 1.0
4 participants