add option to skip initial sync in Manager #117

Open
d4l3k opened this issue Feb 22, 2025 · 0 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers), lighthouse (Lighthouse and quorum related), rust

Comments

d4l3k (Member) commented Feb 22, 2025

We currently always heal on step 0 to avoid synchronization issues. We want an option to skip this initial sync for users who set the PyTorch seed so that all ranks are initialized with the same values.

This should match the name init_sync from pytorch/pytorch#142824.
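
A minimal sketch of how such a flag could gate the step-0 heal, written in plain Python. `ToyManager` and `should_heal` are hypothetical names used only for illustration and are not the existing Manager API; only the flag name `init_sync` comes from this issue. The sketch also ignores which replica would act as the recovery source.

```python
from dataclasses import dataclass


@dataclass
class ToyManager:
    """Hypothetical stand-in for Manager, for illustration only."""

    init_sync: bool = True  # flag name taken from this issue; default keeps current behavior
    step: int = 0

    def should_heal(self, max_step: int) -> bool:
        """Return True if this replica should recover state from a peer.

        max_step is assumed to be the highest step reported in the quorum.
        """
        if self.step < max_step:
            # Behind the rest of the quorum: always recover.
            return True
        if self.step == 0 and max_step == 0:
            # Initial sync: every replica is at step 0, so healing is only
            # needed when the replicas may hold different initial values.
            return self.init_sync
        return False
```

With the default `init_sync=True` the behavior is unchanged. A user who seeds every rank identically could pass `init_sync=False` to skip the step-0 transfer while still recovering whenever a replica falls behind.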

A bonus would be to randomly initialize a value in Manager so we can detect whether the ranks are seeded identically and raise an error if there is a mismatch on the first quorum.
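
One way the seed check could look, as a rough sketch. It uses Python's `random.Random` as a stand-in for the PyTorch RNG, and `draw_init_token` and `check_init_tokens` are hypothetical helpers, not existing functions. How the tokens would be exchanged during the first quorum is not shown.

```python
import random
from typing import Sequence


def draw_init_token(rng: random.Random) -> int:
    """Draw one value from the rank's RNG at construction time.

    Ranks that were seeded identically draw the same token.
    Note that drawing advances the RNG state.
    """
    return rng.getrandbits(64)


def check_init_tokens(tokens: Sequence[int]) -> None:
    """Raise if the tokens collected on the first quorum disagree."""
    if len(set(tokens)) > 1:
        raise RuntimeError(
            "init_sync is disabled but ranks drew different init tokens; "
            "ranks do not appear to share the same seed"
        )


# Usage: four ranks with the same seed pass the check.
same_seed = [draw_init_token(random.Random(1234)) for _ in range(4)]
check_init_tokens(same_seed)
```

Because the token is drawn from the same generator that initializes the model, a real implementation would have to draw it on every rank at the same point so the later random state stays aligned across ranks.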

Relevant code:

d4l3k added the lighthouse, rust, enhancement, and good first issue labels Feb 22, 2025