sync with 0.7.2 #315

Merged
merged 68 commits into from
Feb 7, 2025
@dtrifiro dtrifiro commented Feb 7, 2025

mgoin and others added 30 commits February 1, 2025 16:16
Word "evolved" was mistyped

Signed-off-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>

Fix vllm-project#12647
The `get_quant_method` of `moe_wna16` always returns a MoE method,
a GPTQ-based linear method, or an AWQ-based linear method, even when the
target module is an attention layer.


https://github.com/vllm-project/vllm/blob/baeded25699f9f4851843306f27f685c4d4ee7c5/vllm/attention/layer.py#L86-L92
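The dispatch problem described above can be sketched as follows. This is a minimal illustration, not the vLLM implementation: the three layer classes and the returned strings are hypothetical stand-ins, and the assumption is that the intended behavior is to return `None` for any layer that is neither a MoE layer nor a linear layer.

```python
from typing import Optional


class Attention:
    """Hypothetical stand-in for an attention layer."""


class LinearBase:
    """Hypothetical stand-in for a quantizable linear layer."""


class FusedMoE:
    """Hypothetical stand-in for a fused mixture-of-experts layer."""


def get_quant_method(layer: object) -> Optional[str]:
    """Pick a quantization method by layer type (illustrative only)."""
    if isinstance(layer, FusedMoE):
        return "moe_wna16"
    if isinstance(layer, LinearBase):
        return "gptq_or_awq_linear"
    # Anything else, including attention, gets no quantization method.
    return None
```

Checking the layer type explicitly, rather than falling through to a linear method by default, is what keeps attention layers unquantized in this sketch.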

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
I noticed during testing that I was getting a lot of these deprecation
warnings about `lora_local_path`:

```
DeprecationWarning: The 'lora_local_path' attribute is deprecated
     and will be removed in a future version.
     Please use 'lora_path' instead.
```

The check used for emitting this warning was always True, even when the
parameter was not actually specified. It will always be in
`__struct_fields__`. We should be checking for a non-None value,
instead.
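The difference between the two checks can be sketched as below. `LoRARequestSketch` is a hypothetical plain-Python stand-in that defines `__struct_fields__` by hand to mimic a struct class; it is not the real request class, and `count_warnings` is a helper added only for this illustration.

```python
import warnings
from typing import Optional


class LoRARequestSketch:
    """Hypothetical stand-in that mimics a struct's field tuple."""

    __struct_fields__ = ("lora_path", "lora_local_path")

    def __init__(self, lora_path: str = "",
                 lora_local_path: Optional[str] = None) -> None:
        self.lora_path = lora_path
        self.lora_local_path = lora_local_path
        # Buggy check (always True, the field is always declared):
        #     if "lora_local_path" in self.__struct_fields__:
        # Fixed check: warn only when a value was actually supplied.
        if self.lora_local_path is not None:
            warnings.warn(
                "The 'lora_local_path' attribute is deprecated and will be "
                "removed in a future version. Please use 'lora_path' instead.",
                DeprecationWarning,
                stacklevel=2,
            )


def count_warnings(**kwargs) -> int:
    """Construct a request and count the warnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        LoRARequestSketch(**kwargs)
    return len(caught)
```

With the fixed check, constructing a request with only `lora_path` emits no warning, while passing `lora_local_path` emits exactly one.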

Signed-off-by: Russell Bryant <rbryant@redhat.com>

A small optimization to avoid creating a new `ConstantList` every time `request.kv_block_hashes` is used.
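One way such an optimization could look is to build the read-only view once and reuse it. The sketch below is an assumption about the general technique, not the actual change: `ConstantListSketch` and `RequestSketch` are hypothetical classes, and `append_kv_block_hash` is an illustrative helper.

```python
from collections.abc import Sequence
from typing import List


class ConstantListSketch(Sequence):
    """Hypothetical read-only view over a list (not vLLM's ConstantList)."""

    def __init__(self, data: List[int]) -> None:
        self._data = data

    def __getitem__(self, index):
        return self._data[index]

    def __len__(self) -> int:
        return len(self._data)


class RequestSketch:
    """Hypothetical request that exposes its block hashes read-only."""

    def __init__(self) -> None:
        self._kv_block_hashes: List[int] = []
        # Build the view once; it wraps the backing list, so later
        # appends are visible without creating a new wrapper per access.
        self.kv_block_hashes = ConstantListSketch(self._kv_block_hashes)

    def append_kv_block_hash(self, block_hash: int) -> None:
        self._kv_block_hashes.append(block_hash)
```

Because the view holds a reference to the backing list rather than a copy, every access returns the same object and still reflects the latest contents.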

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
…anager (vllm-project#12608)

As mentioned in RFC vllm-project#12254,
this PR combines `allocate_slots` and `append_slots`.

There should be no functional change, except that the decode path now
also raises an exception when `num_tokens` is zero (as prefill already
does); the unit test case is changed accordingly.
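A single entry point serving both prefill and decode could be sketched as follows. `KVCacheManagerSketch` is a hypothetical, heavily simplified allocator that only counts blocks; its fields, return values, and error message are assumptions made for illustration and do not mirror the real manager.

```python
from typing import Dict, Optional


class KVCacheManagerSketch:
    """Hypothetical block-counting allocator (not vLLM's manager)."""

    def __init__(self, block_size: int, num_free_blocks: int) -> None:
        self.block_size = block_size
        self.num_free_blocks = num_free_blocks
        self.tokens: Dict[str, int] = {}  # tokens stored per request
        self.blocks: Dict[str, int] = {}  # blocks held per request

    def allocate_slots(self, request_id: str, num_tokens: int) -> Optional[int]:
        """Allocate for a new request (prefill) or an existing one (decode).

        Returns the number of newly allocated blocks, or None when there
        are not enough free blocks.
        """
        if num_tokens == 0:
            # Same rule for prefill and decode.
            raise ValueError("num_tokens must be greater than 0")
        total_tokens = self.tokens.get(request_id, 0) + num_tokens
        needed = -(-total_tokens // self.block_size)  # ceiling division
        new_blocks = needed - self.blocks.get(request_id, 0)
        if new_blocks > self.num_free_blocks:
            return None
        self.num_free_blocks -= new_blocks
        self.tokens[request_id] = total_tokens
        self.blocks[request_id] = needed
        return new_blocks
```

An unseen `request_id` simply starts from zero tokens and zero blocks, so the prefill and decode cases share one code path.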

@comaniac @rickyyx @WoosukKwon @youkaichao @heheda12345 @simon-mo

---------

Signed-off-by: Shawn Du <shawnd200@outlook.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
…lm-project#12628)

- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files
    
    This commit adds SPDX license headers to python source files as
    recommended to the project by the Linux Foundation. These headers
    provide a concise way that is both human and machine readable for
    communicating license information for each source file. It helps
    avoid any ambiguity about the license of the code and can also be
    easily used by tools to help manage license compliance.
    
    The Linux Foundation runs license scans against the codebase to help
    ensure we are in compliance with the licenses of the code we use,
    including dependencies. Having these headers in place helps that
    tool do its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1c
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>
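A header check of this kind can be sketched in a few lines. This is an illustrative sketch, not the hook added by the commit: the function names are hypothetical, and the exact header text (`Apache-2.0`, placed on the first line after an optional shebang) is an assumption.

```python
# Assumed header text; the real project header may differ.
SPDX_HEADER = "# SPDX-License-Identifier: Apache-2.0"


def has_spdx_header(source: str) -> bool:
    """Return True if the header is the first line after any shebang."""
    lines = source.splitlines()
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    return bool(lines) and lines[0].strip() == SPDX_HEADER


def add_spdx_header(source: str) -> str:
    """Insert the header if it is missing, keeping a shebang first."""
    if has_spdx_header(source):
        return source
    lines = source.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        shebang = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return shebang + SPDX_HEADER + "\n" + "".join(lines[1:])
    return SPDX_HEADER + "\n" + source
```

A pre-commit hook built on this idea would read each staged Python file, call `has_spdx_header`, and fail (or rewrite the file with `add_spdx_header`) when the header is absent.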

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
…ct#12667)

As more and more people try DeepSeek models with multi-node
inference, vllm-project#7815 is being hit more
frequently. Let's give users a clear message.

Signed-off-by: youkaichao <youkaichao@gmail.com>
sgl_moe_align_block_size is based on:


sgl-project/sglang@ded9fcd

moe_align_block_size is based on:


sgl-project/sglang@ba5112f

Signed-off-by: Yang Chen <yangche@fb.com>
…oject#12669)

When people use DeepSeek models, they find that they need to resolve a
`cv2` version conflict; see https://zhuanlan.zhihu.com/p/21064432691 .

This PR adds a check for the conflict and makes all imports of `cv2` lazy.
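The lazy-import part can be sketched as below. This is a generic illustration, not the code from the PR: `lazy_import` and `resize_frame` are hypothetical helpers, and the error message wording is an assumption. The version-conflict check itself is not shown because the commit message does not describe it.

```python
import importlib
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Import a module at call time and fail with a readable message."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"Optional dependency {name!r} could not be imported; "
            "check that it is installed and not in conflict."
        ) from exc


def resize_frame(frame, size):
    """Hypothetical helper: cv2 is imported only when this is called."""
    cv2 = lazy_import("cv2")
    return cv2.resize(frame, size)
```

Importing the module that defines `resize_frame` no longer touches `cv2`, so users who never process images or video are not affected by a broken installation.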

---------

Signed-off-by: youkaichao <youkaichao@gmail.com>
…llm-project#12570)

Fix AWQ quant loading of the new R1 model.

The new optimized MoE kernels for a large number of experts (`moe_wna16`)
use AWQ quantization, which requires the attention layers to be in 16-bit.

The current merge has broken this; `get_quant_method` must
return None for it to work correctly again.

---------

Signed-off-by: Srikanth Srinivas <srikanth@astrum.ai>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Beim <beim2015@outlook.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Ryan N <ryan.nguyen@centml.ai>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Shawn Du <shawnd200@outlook.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Beim <805908499@qq.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Nishidha <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Ryan Nguyen <96593302+xpbowler@users.noreply.github.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Co-authored-by: fade_away <1028552010@qq.com>
Co-authored-by: weilong.yu <weilong.yu@shopee.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Eldar Kurtic <eldarkurtic314@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
Co-authored-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Shawn Du <shawnd200@outlook.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Fixes problems like vllm-project#12635,
vllm-project#12636, and
vllm-project#12565.

---------

Signed-off-by: youkaichao <youkaichao@gmail.com>
# Adds support for `transformers` as a backend

Following huggingface/transformers#35235, a
bunch of models should already be supported, and we are ramping up
support for more models.

Thanks @Isotr0py for the TP support, and @hmellor for his help as well!
This includes:
- `trust_remote_code=True` support: any model on the hub can be
natively supported if it implements attention the correct way
- tensor parallel support

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…aled mm (vllm-project#12696)

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
…oject#12553)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
@dtrifiro dtrifiro requested a review from njhill as a code owner February 7, 2025 18:02

openshift-ci bot commented Feb 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dtrifiro


@dtrifiro dtrifiro merged commit 1da3720 into opendatahub-io:main Feb 7, 2025
2 of 6 checks passed
@dtrifiro dtrifiro deleted the sync-with-0.7.2 branch February 7, 2025 18:56
groenenboomj pushed a commit that referenced this pull request Feb 24, 2025
…with the existing tuning, prior to moving all the way forward to release/3.2.x; Using the correct hipblaslt version in the name (#315)