Add inference throughput benchmark on-prem vllm #331
Conversation
Nice benchmark addition to the repo! Just added some text edit suggestions.
Add lint disable for dead link false alarm
…github.com/facebookresearch/llama-recipes into benchmark-infernece-throughput-onperm-vllm
Addressed comments. Need code testing.
Thanks @WuhanMonkey for the PR, added a few comments. I also suggest renaming the folder to benchmarks/inference/on-prem/vllm instead of inference_throughput.
benchmarks/inference_throughput/on-perm/vllm/chat_vllm_benchmark.py — six review threads, all resolved (outdated)
Great new benchmark in the repo! I've made some text edits for clarity.
LGTM, please resolve the lint error.
Force-pushed from 04726df to ff323f4.
What does this PR do?
This is the first PR in a series adding inference throughput benchmarks for Llama 2 models.
It adds benchmark scripts, sample input prompts, and instructions for running the throughput benchmark on-prem against vLLM containers.
The rationale for these and the upcoming benchmarks is given in the README file.
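As context for what such a benchmark measures, here is a minimal illustrative sketch (this is not the PR's chat_vllm_benchmark.py): it sends a batch of concurrent chat requests to a vLLM server's OpenAI-compatible endpoint and reports aggregate output tokens per second. The endpoint URL, port, model name, prompt, and concurrency level below are assumed values for illustration only.

```python
# Illustrative sketch of a chat throughput benchmark against a vLLM server
# started with its OpenAI-compatible API, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf
# All constants below are assumptions, not values from the PR.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed default host/port
MODEL = "meta-llama/Llama-2-7b-chat-hf"                 # assumed model name
CONCURRENCY = 8                                         # assumed number of parallel requests
PROMPT = "Summarize the benefits of on-prem LLM serving in three sentences."

def one_request(_: int) -> int:
    """Send one chat completion request and return the completion token count."""
    resp = requests.post(
        ENDPOINT,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": PROMPT}],
            "max_tokens": 256,
        },
        timeout=300,
    )
    resp.raise_for_status()
    # The OpenAI-compatible response carries token accounting under "usage".
    return resp.json()["usage"]["completion_tokens"]

start = time.time()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    token_counts = list(pool.map(one_request, range(CONCURRENCY)))
elapsed = time.time() - start

total = sum(token_counts)
print(f"{total} output tokens in {elapsed:.1f}s "
      f"-> {total / elapsed:.1f} tokens/s across {CONCURRENCY} concurrent requests")
```

A fuller benchmark would typically sweep concurrency levels and prompt/output lengths, and report latency percentiles alongside raw throughput.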
Feature/Issue validation/testing
Please describe the tests you ran to verify your changes, along with a summary of the relevant results, and provide instructions so they can be reproduced.
Please also list any relevant details of your test configuration.
Test A
Logs for Test A
Test B
Logs for Test B
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Thanks for contributing 🎉!