Add inference throughput benchmark on-prem vllm #331
Conversation
Nice benchmark addition to the repo! Just added some text edit suggestions.
Add lint disable for dead link false alarm
…github.com/facebookresearch/llama-recipes into benchmark-infernece-throughput-onperm-vllm
Addressed comments. Need code testing.
Thanks @WuhanMonkey for the PR, added a few comments. I also suggest renaming the folder to benchmarks/inference/on-prem/vllm instead of inference_throughput.
benchmarks/inference_throughput/on-perm/vllm/chat_vllm_benchmark.py — six review threads, all resolved (outdated)
Great new benchmark in the repo! I've made some text edits for clarity.
LGTM, please resolve the lint error.
Force-pushed from 04726df to ff323f4.
What does this PR do?
This is the first PR in a series adding inference throughput benchmarks for Llama 2 models.
It adds benchmark scripts, sample input prompts, and instructions for running the throughput benchmark on-prem against vLLM containers.
The rationale for these and the upcoming benchmarks is given in the README file.
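As context for what such a benchmark measures, here is a minimal illustrative sketch (this is not the PR's chat_vllm_benchmark.py): it sends a batch of concurrent chat requests to a vLLM server's OpenAI-compatible endpoint and reports aggregate output tokens per second. The endpoint URL, port, model name, prompt, and concurrency level below are assumed values for illustration only.

```python
# Illustrative sketch of a chat throughput benchmark against a vLLM server
# started with its OpenAI-compatible API, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf
# All constants below are assumptions, not values from the PR.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed default host/port
MODEL = "meta-llama/Llama-2-7b-chat-hf"                 # assumed model name
CONCURRENCY = 8                                         # assumed number of parallel requests
PROMPT = "Summarize the benefits of on-prem LLM serving in three sentences."

def one_request(_: int) -> int:
    """Send one chat completion request and return the completion token count."""
    resp = requests.post(
        ENDPOINT,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": PROMPT}],
            "max_tokens": 256,
        },
        timeout=300,
    )
    resp.raise_for_status()
    # The OpenAI-compatible response carries token accounting under "usage".
    return resp.json()["usage"]["completion_tokens"]

start = time.time()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    token_counts = list(pool.map(one_request, range(CONCURRENCY)))
elapsed = time.time() - start

total = sum(token_counts)
print(f"{total} output tokens in {elapsed:.1f}s "
      f"-> {total / elapsed:.1f} tokens/s across {CONCURRENCY} concurrent requests")
```

A fuller benchmark would typically sweep concurrency levels and prompt/output lengths, and report latency percentiles alongside raw throughput.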
Feature/Issue validation/testing
Please describe the tests you ran to verify your changes, along with a summary of the relevant results, and provide instructions so they can be reproduced.
Please also list any relevant details of your test configuration.
Test A
Logs for Test A
Test B
Logs for Test B
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Thanks for contributing 🎉!