[Bug]: TP with external_launcher is not working with vLLM version 0.8.0 and above #15895
Comments
That seems to be true; I can reproduce it on the main branch. This also produces different results:
@toslali-ibm can you try to bisect to find which commit is responsible? Following https://blog.vllm.ai/2025/01/10/dev-experience.html, you can find wheels for all commits.
I am able to get identical generations if I use vLLM 0.7.3. I will try the wheels to identify which commit broke this behavior.
I think I found the breaking commit.
[output before the commit] vs. [output after the commit]
CC @youkaichao
But that commit sets a fixed random seed, right? Why would that produce different results? 🤔
@toslali-ibm @youkaichao #14274 sets the seed to
@toslali-ibm @youkaichao Please see https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/reproduciblity.py
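For reference, a minimal sketch of the pattern that example demonstrates, assuming the seed is passed explicitly when constructing the `LLM` (the model, prompt, and seed value below are illustrative, not taken from the issue):

```python
from vllm import LLM, SamplingParams

# Illustrative sketch: fix the seed at engine construction so that repeated
# runs sample the same tokens. Model, prompt, and seed value are placeholders.
llm = LLM(model="facebook/opt-125m", seed=0)
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```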
@WoosukKwon thanks for the reminder! Indeed I can see that cc10281 adds seed to |
Your current environment
The output of `examples/offline_inference/torchrun_example.py`
🐛 Describe the bug
When I run the script with `torchrun --nproc-per-node=2 torchrun_example.py`, the ranks produce different outputs (vLLM 0.8.0 and onward). When I try it with 0.7.3, it works.
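For context, a reduced sketch in the spirit of `examples/offline_inference/torchrun_example.py`, run under the torchrun command above (the model and prompts here are illustrative, not the ones from the script); each rank prints its own generations, which is where the divergence between ranks shows up:

```python
import torch.distributed as dist

from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# With distributed_executor_backend="external_launcher", torchrun owns the
# processes and every rank builds the same engine; all ranks are expected to
# produce identical generations for the same prompts.
llm = LLM(
    model="facebook/opt-125m",
    tensor_parallel_size=2,
    distributed_executor_backend="external_launcher",
)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    # Printing per rank makes it easy to spot when the ranks disagree.
    print(f"rank {dist.get_rank()}: {output.outputs[0].text!r}")
```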
CC @youkaichao