[misc] CUDA Time Layerwise Profiler #8337
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
This is great! Since the script is fairly technically involved and relies on exact attributes existing, could you add a simple e2e test to run in CI so we know if torch updates break it?
Co-authored-by: Michael Goin <michael@neuralmagic.com>
What's the easiest way to do this? Just add a pytest test, or invoke offline_profile somehow? Are there instructions on how to register something with Buildkite, or are all pytest folders already run automatically? @mgoin added examples test
LGTM and works well!
Layerwise profiler to see how much CUDA time (GPU kernels) is spent in each module/layer.
Example of how to run a profile:
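A sketch of invoking the profiling script. The entry point name comes from the `offline_profile` mention in this thread; the model name and flags shown here are placeholders, so check the script's `--help` for the actual arguments.

```shell
# Run a layerwise CUDA-time profile and save the trace as JSON.
# Flags below are illustrative, not the script's confirmed interface.
python examples/offline_profile.py \
    --model meta-llama/Llama-2-7b-hf \
    --batch-size 4 \
    --prompt-len 512 \
    --json profile_results/llama2-7b
```

Saving the trace to a file lets the summary-table and graph utilities below consume it without re-running the (expensive) profile.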
Then there are some utilities for looking at the profile breakdown, e.g. to get a summary table of the prefill phase you can run:
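A sketch of printing the prefill summary table from the saved trace. The utility path and flag names here are assumptions, not confirmed from the PR diff; substitute the actual helper script the PR ships.

```shell
# Print a per-layer summary of CUDA time for the prefill phase.
# Script path and flags are hypothetical placeholders.
python tools/profiler/print_layerwise_table.py \
    --json-trace profile_results/llama2-7b.json \
    --phase prefill \
    --table summary
```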
Or to view it as a graph you can run:
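A sketch of rendering the breakdown as a graph from the same JSON trace. As above, the script name, output directory, and metric flag are illustrative assumptions.

```shell
# Render the layerwise breakdown as plots written to an output directory.
# Script path and flags are hypothetical placeholders.
python tools/profiler/visualize_layerwise_profile.py \
    --json-trace profile_results/llama2-7b.json \
    --output-directory profile_breakdown \
    --plot-metric pct_cuda_time
```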