[Model] Support NVLM-D and fix QK Norm in InternViT #9045
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
"Qwen2VLForConditionalGeneration": | ||
("qwen2_vl", "Qwen2VLForConditionalGeneration"), |
This is redundant as it's already in multimodal models, so I'm removing this.
```python
# yapf: disable
_MULTIMODAL_MODELS = {
    "Blip2ForConditionalGeneration": ("blip2", "Blip2ForConditionalGeneration"),
    "ChameleonForConditionalGeneration": ("chameleon", "ChameleonForConditionalGeneration"),  # noqa: E501
    "FuyuForCausalLM": ("fuyu", "FuyuForCausalLM"),
    "InternVLChatModel": ("internvl", "InternVLChatModel"),
    "LlavaForConditionalGeneration": ("llava", "LlavaForConditionalGeneration"),
    "LlavaNextForConditionalGeneration": ("llava_next", "LlavaNextForConditionalGeneration"),  # noqa: E501
    "LlavaNextVideoForConditionalGeneration": ("llava_next_video", "LlavaNextVideoForConditionalGeneration"),  # noqa: E501
    "LlavaOnevisionForConditionalGeneration": ("llava_onevision", "LlavaOnevisionForConditionalGeneration"),  # noqa: E501
    "MiniCPMV": ("minicpmv", "MiniCPMV"),
    "MllamaForConditionalGeneration": ("mllama", "MllamaForConditionalGeneration"),  # noqa: E501
    "NVLM_D": ("nvlm_d", "InternVLChatModel"),
    "PaliGemmaForConditionalGeneration": ("paligemma", "PaliGemmaForConditionalGeneration"),  # noqa: E501
    "Phi3VForCausalLM": ("phi3v", "Phi3VForCausalLM"),
    "PixtralForConditionalGeneration": ("pixtral", "PixtralForConditionalGeneration"),  # noqa: E501
    "QWenLMHeadModel": ("qwen", "QWenLMHeadModel"),
    "Qwen2VLForConditionalGeneration": ("qwen2_vl", "Qwen2VLForConditionalGeneration"),  # noqa: E501
    "UltravoxModel": ("ultravox", "UltravoxModel"),
}
# yapf: enable
```
Re-sorted the list in alphabetical order and enforced one line per model to make it more readable.
I've fixed the errors up to, but not including, merging the multimodal embeddings. We probably need to implement additional logic to handle tile tagging.
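For context, NVLM-D's dynamic high-resolution input splits each image into tiles and, per the NVLM paper's 1-D tile tagging, inserts a text tag before each tile's features (plus a tag for the global thumbnail) before merging them into the language embeddings. Here is a minimal sketch of what building such a tagged placeholder could look like; the tag strings follow the paper's description, and the helper name and `<IMG_CONTEXT>` placeholder are illustrative, not the actual vLLM implementation:

```python
# Illustrative sketch only: tag formats follow the NVLM paper's 1-D tile
# tagging; the helper name and placeholder tokens are hypothetical.
def build_tile_tagged_prompt(num_tiles: int, tokens_per_tile: int) -> str:
    """Flatten image placeholders with a text tag prepended to each tile."""
    parts = ["<Image>"]
    for i in range(1, num_tiles + 1):
        # Each dynamic-resolution tile gets its own positional tag.
        parts.append(f"<tile_{i}>" + "<IMG_CONTEXT>" * tokens_per_tile)
    # The downsampled whole-image thumbnail is tagged separately.
    parts.append("<tile_global_thumbnail>" + "<IMG_CONTEXT>" * tokens_per_tile)
    parts.append("</Image>")
    return "".join(parts)

print(build_tile_tagged_prompt(num_tiles=2, tokens_per_tile=4))
```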
(EDIT: this was resolved by the latest commits.) I had a failure when trying to load the model weights.
(UPDATE) Now I see an error during model initialization.
To aid debugging, I made an FP8 model checkpoint: https://huggingface.co/nm-testing/NVLM-D-72B-FP8-dynamic
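For anyone reproducing this, loading that checkpoint in vLLM would look roughly like the sketch below; the `tensor_parallel_size` and `max_model_len` values are illustrative settings, not requirements:

```python
from vllm import LLM, SamplingParams

# NVLM-D ships custom config/processing code, hence trust_remote_code=True.
# TP size and context length here are example settings for a multi-GPU node.
llm = LLM(
    model="nm-testing/NVLM-D-72B-FP8-dynamic",
    trust_remote_code=True,
    tensor_parallel_size=4,
    max_model_len=8192,
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Describe NVLM-D in one sentence."], params)
print(outputs[0].outputs[0].text)
```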
OK, I'm able to use the model in online serving now. The outputs seem reasonable.
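As a sketch of that online-serving path (assuming the server was started with the OpenAI-compatible entrypoint, e.g. `vllm serve nm-testing/NVLM-D-72B-FP8-dynamic --trust-remote-code`), a request with an image would look like this; the image URL is a placeholder:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server listens on port 8000 by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nm-testing/NVLM-D-72B-FP8-dynamic",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            # Placeholder URL -- substitute any reachable image.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
    max_tokens=64,
)
print(response.choices[0].message.content)
```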
Now I just have to set up the offline examples...
@DarkLight1337 I plugged it into the existing offline example, and the output seemed reasonable.
That is odd; I am getting completely nonsensical results on my end.
@mgoin Can you check whether the multi-image example also works?
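For reference, a multi-image request through vLLM's offline API looks roughly like the sketch below; the `<image>` placeholder layout and file names are illustrative and may not match the final example script:

```python
from PIL import Image
from vllm import LLM, SamplingParams

# limit_mm_per_prompt raises the per-prompt image cap above the default of 1.
llm = LLM(
    model="nm-testing/NVLM-D-72B-FP8-dynamic",
    trust_remote_code=True,
    limit_mm_per_prompt={"image": 2},
)

prompt = ("Image-1: <image>\nImage-2: <image>\n"
          "What are the differences between these two images?")
images = [Image.open("img1.jpg"), Image.open("img2.jpg")]  # placeholder files

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": images}},
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```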
If I set
It seems to be an issue with the machine I am using to test the model; I can't run any models with both.
Update: Thanks @ywang96 for helping test this!
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
LGTM! Per our offline discussion, feel free to consolidate the ViT attention module for the two models.
Hi, I'm not able to build the new vLLM. Has anyone tried building this PR from source?
What error are you running into specifically?
It shows a problem with numpy not being installed, even though I've already done the installation. I use
Does this also happen on the main branch? This sounds similar to #8851.
Implement the NVLM-D model based on InternVL.
While testing the model, @ywang96 found that the existing implementation of parallel attention in InternViT does not work with QK normalization. Thanks @Isotr0py for fixing this!
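For readers unfamiliar with the bug: InternViT applies its QK normalization over the flattened all-heads dimension, so a tensor-parallel rank that holds only a slice of the heads computes the wrong statistics if it normalizes its local shard. Below is a self-contained illustration of the mismatch (simplified shapes; requires PyTorch >= 2.4 for `nn.RMSNorm`; this is not the PR's actual code):

```python
import torch
import torch.nn as nn

# InternViT normalizes Q/K over the flattened (num_heads * head_dim) axis.
embed_dim, tp_size = 8, 2
norm = nn.RMSNorm(embed_dim)  # weight defaults to ones

q_full = torch.randn(1, 3, embed_dim)   # [batch, seq, embed_dim]
correct = norm(q_full)                  # statistics span all heads

# A naive TP port normalizes each rank's shard independently, so the RMS
# statistics only cover that rank's heads -- this is the bug being fixed.
shards = q_full.chunk(tp_size, dim=-1)
naive = torch.cat(
    [nn.functional.rms_norm(s, (embed_dim // tp_size,)) for s in shards],
    dim=-1,
)
print(torch.allclose(correct, naive))  # False: the two disagree

# One way to fix this under tensor parallelism is to all-gather Q/K across
# ranks, apply the full-width norm, then re-split along the head dimension.
```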
FIX #9040
FIX #9041