
Add inference throughput benchmark on-prem vllm #331

Merged
Merged 16 commits into main from benchmark-infernece-throughput-onperm-vllm on Jan 16, 2024

Conversation

@WuhanMonkey (Contributor) commented Dec 15, 2023

What does this PR do?

This is the first PR in a series adding inference throughput benchmarks for Llama 2 models.
It adds benchmark scripts, sample input prompts, and instructions for running the throughput benchmark on-prem with vLLM containers.
The reasons for adding these and the upcoming benchmarks are explained in the README file.
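The benchmark scripts themselves are not shown in this conversation, but a minimal throughput-measurement loop against an on-prem vLLM container might look like the sketch below. This is an illustrative assumption, not the PR's actual script: the endpoint URL, model name, prompts, concurrency level, and generation parameters are placeholders, and it assumes the container exposes vLLM's OpenAI-compatible completions API.

```python
# Hypothetical sketch of an on-prem vLLM throughput measurement (not the PR's script).
# Assumes a vLLM container serving the OpenAI-compatible API at ENDPOINT.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:8000/v1/completions"      # placeholder server address
MODEL = "meta-llama/Llama-2-7b-chat-hf"                 # placeholder model name
PROMPTS = ["Write a short story about a robot."] * 32   # stand-in for the sample input prompts
CONCURRENCY = 8                                          # concurrent requests is a key benchmark knob

def send_request(prompt: str) -> int:
    """Send one completion request and return the number of generated tokens."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": 256, "temperature": 0.8}
    resp = requests.post(ENDPOINT, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["usage"]["completion_tokens"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    token_counts = list(pool.map(send_request, PROMPTS))
elapsed = time.perf_counter() - start

print(f"{len(PROMPTS)} requests, {sum(token_counts)} output tokens in {elapsed:.1f}s")
print(f"Throughput: {sum(token_counts) / elapsed:.1f} output tokens/sec")
```

In practice such a measurement is swept across concurrency levels and prompt/output lengths, which is the kind of configuration the added instructions are meant to document.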

Feature/Issue validation/testing

Please describe the tests you ran to verify your changes and summarize the relevant results. Provide instructions so they can be reproduced.
Please also list any relevant details of your test configuration.

  • Test A
    Logs for Test A

  • Test B
    Logs for Test B

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

@WuhanMonkey changed the title from "PR for inference throughput benchmark on-perm vllm" to "Add inference throughput benchmark on-perm vllm" on Dec 15, 2023
@jeffxtang (Contributor) left a comment

Nice benchmark addition to the repo! Just added some text edit suggestions.

Review suggestions (now outdated and resolved) on README.md, benchmarks/inference_throughput/README.md, and benchmarks/inference_throughput/on-perm/README.md.
@WuhanMonkey (Contributor, Author) commented:
Addressed comments. Need code testing.

@HamidShojanazeri (Contributor) left a comment

Thanks @WuhanMonkey for the PR; I added a few comments. I also suggest renaming the folder to benchmarks/inference/on-prem/vllm instead of inference_throughput.

Review comments (now outdated and resolved) on benchmarks/inference_throughput/README.md and benchmarks/inference_throughput/on-perm/README.md.
@WuhanMonkey changed the title from "Add inference throughput benchmark on-perm vllm" to "Add inference throughput benchmark on-prem vllm" on Jan 3, 2024
@bilaalmirza commented:
Great new benchmark in the repo! I've made some text edits for clarity.

@HamidShojanazeri (Contributor) left a comment

LGTM, please resolve the lint error.

@WuhanMonkey force-pushed the benchmark-infernece-throughput-onperm-vllm branch from 04726df to ff323f4 on January 12, 2024 22:51
@WuhanMonkey requested a review from jeffxtang on January 12, 2024 22:53
@WuhanMonkey merged commit 689e57b into main on Jan 16, 2024
3 checks passed
@WuhanMonkey deleted the benchmark-infernece-throughput-onperm-vllm branch on January 16, 2024 17:25
5 participants