Sync with upstream@v0.4.3-60-gbaa15a9e #47

Merged
30 commits merged into opendatahub-io:main on Jun 10, 2024

Conversation

github-actions bot commented Jun 7, 2024

github-actions bot added the code-sync (Sync with upstream) label Jun 7, 2024
openshift-ci bot requested review from heyselbi and vaibhavjainwiz June 7, 2024 04:31
openshift-ci bot commented Jun 7, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: github-actions[bot]
Once this PR has been reviewed and has the lgtm label, please assign danielezonca for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot commented Jun 7, 2024

Hi @github-actions[bot]. Thanks for your PR.

I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

youkaichao and others added 3 commits June 6, 2024 22:15
Switch from torch._scaled_mm to vLLM's CUTLASS FP8 kernels when they are supported; we are seeing a 5-15% improvement in end-to-end performance on neuralmagic/Meta-Llama-3-8B-Instruct-FP8.

See https://docs.google.com/spreadsheets/d/1GiAnmzyGHgZ6zL_LDSTm35Bdrt4A8AaFEurDlISYYA4/ for some quick end-to-end benchmarks and #5144 for comparisons across different GEMM sizes.
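
The "when supported" dispatch described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not vLLM's implementation: select_fp8_gemm, cutlass_gemm, and fallback_gemm are hypothetical names, and both stand-in kernels operate on plain Python lists so that the block runs without a GPU or torch.

```python
from typing import Callable, List

Matrix = List[List[float]]
Gemm = Callable[[Matrix, Matrix], Matrix]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference matrix multiply shared by both stand-in kernels."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cutlass_gemm(a: Matrix, b: Matrix) -> Matrix:
    # Hypothetical stand-in for the CUTLASS FP8 path.
    return matmul(a, b)


def fallback_gemm(a: Matrix, b: Matrix) -> Matrix:
    # Hypothetical stand-in for the torch._scaled_mm path.
    return matmul(a, b)


def select_fp8_gemm(cutlass_supported: bool) -> Gemm:
    """Choose the kernel once, up front, based on a support flag."""
    return cutlass_gemm if cutlass_supported else fallback_gemm
```

Selecting the kernel once keeps the support check out of the per-call path; the fallback remains available on hardware or toolchains where the faster kernel cannot be used.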
dtrifiro commented Jun 7, 2024

/ok-to-test
/approved

JamesLim-sy and others added 11 commits June 7, 2024 13:35
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: team <calvinn.ng@ahrefs.com>
Bug description:
With torch 2.4.0.dev20240603+cu121, cutlass_fp8_supported outputs False, and the (capability, version) pair before the comparison is (90, 11111111112).

This PR fixes the support check for FP8 CUTLASS (cutlass_fp8_supported), which was introduced in #5183.
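
A version value like the one reported is what Python produces when a CUDA version string such as "12.1" is indexed character by character and the first character is multiplied by 10: the result is string repetition followed by concatenation, not arithmetic. Whether that is the exact cause here is an assumption, since the commit message only reports the value. The sketch below reproduces the effect and shows one way to encode the version numerically. All function names and the thresholds in supported are illustrative, not vLLM's.

```python
def encode_version_buggy(cuda_version: str) -> str:
    # Indexing a string yields characters, so "1" * 10 repeats the
    # character ten times and "+" appends the next character.
    return cuda_version[0] * 10 + cuda_version[1]


def encode_version_fixed(cuda_version: str) -> int:
    # Parse major and minor as integers before combining them.
    major, minor = cuda_version.split(".")[:2]
    return int(major) * 10 + int(minor)


def supported(capability: int, version: int,
              min_capability: int = 90, min_version: int = 120) -> bool:
    # Placeholder thresholds, assumed for illustration only.
    return capability >= min_capability and version >= min_version
```

Splitting on "." before converting also handles two-digit major or minor components, which character indexing would not.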
[CI/Test] improve robustness of test by replacing del with context manager (hf_runner) (#5347)
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner) (#5357)
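
The two commits above replace explicit del cleanup with context managers. A minimal sketch of the pattern follows, using a hypothetical Runner class rather than the actual hf_runner or vllm_runner fixtures. With del, cleanup is skipped if the test raises before reaching it; with a context manager, the exit hook runs either way.

```python
class Runner:
    """Hypothetical stand-in for a test runner that holds a model."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.released = False

    def generate(self, prompt: str) -> str:
        return f"{self.name}: {prompt}"

    def __enter__(self) -> "Runner":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # Runs on normal exit and when the body raises.
        self.released = True
        return False  # do not swallow exceptions


def run_and_fail(runner: Runner) -> None:
    # Simulates a test body that fails before any manual cleanup.
    with runner:
        runner.generate("hello")
        raise RuntimeError("test failure")
```

In a real test the exit hook would release the model and any accelerator memory; here it only flips a flag so the behavior is observable.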
dtrifiro commented

/ok-to-test

dtrifiro merged commit e332d6f into opendatahub-io:main Jun 10, 2024
0 of 3 checks passed
prarit pushed a commit to prarit/vllm that referenced this pull request Oct 18, 2024
* support quark

* using torch/all.h

* loading weight from quark output

* support both ammo and quark

* Update doc

* fix load ammo

* fix linter

* fix isort
Labels
code-sync Sync with upstream ok-to-test