(DO NOT MERGE) IBM release WIP #76
…vllm-project#5710) Co-authored-by: Roger Wang <ywang@roblox.com>
…penai/run_batch.py (vllm-project#5756)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Signed-off-by: kevin <kevin@anyscale.com>
…lel size than target model (vllm-project#5414)
…ements, test fixes (vllm-project#5422)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
[ci][distributed] fix some cuda init that makes it necessary to use spawn (vllm-project#5991)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
…xpected modules. (vllm-project#5909) Co-authored-by: sang <sangcho@anyscale.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic>
…Weight Loading) (vllm-project#5940) Co-authored-by: Robert Shaw <rshaw@neuralmagic>
… Spec Decode Worker (vllm-project#5348)
…ct#6041) Co-authored-by: Simon Mo <simon.mo@hey.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: prashantgupta24
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/test all
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
* fix gradlib fp8 output
* add condition check for existing tune result
* fix linter
* fix import order
* fix lint
All vllm integration tests passing on this image!