
Fix parallel load #271


Status: Open. Wants to merge 7 commits into base branch main.


@jlamypoirier (Collaborator) commented on May 15, 2025

✨ Description

Fix: #244

This should fix all the bugs described in #244:

  • The parameter-count check now takes duplicated parameters into account (fixes tensor-parallel and pipeline-parallel loading).
  • `sequence_tensor_parallel` no longer uses `_set_implicit_default`, since its value is an override rather than an implicit default. (This fixes the bug reported for sequence data parallel, though it doesn't technically relate to that feature.)
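The first fix can be illustrated with a minimal sketch. It assumes a hypothetical per-rank `ShardInfo` record and counts each shard once per `(name, shard_index)` key, so a parameter that is replicated across tensor-parallel ranks, or shared (tied) between pipeline stages, is not counted twice. `ShardInfo`, `count_unique_elements` and `naive_count` are illustrative names, not the project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardInfo:
    # Hypothetical record of one parameter shard held by one rank.
    name: str         # global parameter name
    rank: int         # rank holding this shard
    shard_index: int  # which slice of the parameter (0 if not split)
    numel: int        # number of elements in this shard


def count_unique_elements(shards):
    """Count parameter elements, counting each (name, shard_index) once.

    Copies of the same shard held by several ranks (replicated or tied
    parameters) share a key and therefore contribute only once.
    """
    seen = {}
    for shard in shards:
        key = (shard.name, shard.shard_index)
        if key in seen:
            # A duplicate copy must have the same size as the first one seen.
            if seen[key] != shard.numel:
                raise ValueError(f"Inconsistent duplicate shard {key}")
            continue
        seen[key] = shard.numel
    return sum(seen.values())


def naive_count(shards):
    # Summing every rank's shards overcounts replicated or tied parameters.
    return sum(s.numel for s in shards)


# Two tensor-parallel ranks: "weight" (8 elements) is split in two halves,
# while "norm" (4 elements) is replicated on both ranks.
shards = [
    ShardInfo("weight", rank=0, shard_index=0, numel=4),
    ShardInfo("weight", rank=1, shard_index=1, numel=4),
    ShardInfo("norm", rank=0, shard_index=0, numel=4),
    ShardInfo("norm", rank=1, shard_index=0, numel=4),
]
print(count_unique_elements(shards))  # 12
print(naive_count(shards))  # 16
```

Here the naive sum reports 16 elements instead of the expected 12, so a check comparing against it would fail spuriously even though the loaded model is correct.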
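The second fix hinges on the difference between an implicit default and an override. The sketch below assumes that `_set_implicit_default` only fills in a value the user left unset, and uses a hypothetical rule (sequence-tensor parallelism forced off when there is no tensor parallelism) purely for illustration. `ParallelConfig`, `_override` and `validate` are hypothetical and do not reflect the project's actual validation logic.

```python
from dataclasses import dataclass

_UNSET = object()  # sentinel meaning "the user did not set this field"


@dataclass
class ParallelConfig:
    # Hypothetical config; only `sequence_tensor_parallel` and
    # `_set_implicit_default` are names taken from the PR text.
    tensor_parallel: int = 1
    sequence_tensor_parallel: object = _UNSET

    def _set_implicit_default(self, name, value):
        # Implicit default: fill in a value only if the user left it unset.
        if getattr(self, name) is _UNSET:
            setattr(self, name, value)

    def _override(self, name, value):
        # Override: force the value, even if the user set something else.
        setattr(self, name, value)

    def validate(self, use_override=True):
        if self.tensor_parallel == 1:
            # Assumed rule: sequence-tensor parallelism is meaningless
            # without tensor parallelism, so it must end up disabled.
            if use_override:
                self._override("sequence_tensor_parallel", False)
            else:
                self._set_implicit_default("sequence_tensor_parallel", False)
        else:
            self._set_implicit_default("sequence_tensor_parallel", False)
        return self


# A user enables sequence-tensor parallelism without tensor parallelism.
buggy = ParallelConfig(tensor_parallel=1, sequence_tensor_parallel=True).validate(use_override=False)
fixed = ParallelConfig(tensor_parallel=1, sequence_tensor_parallel=True).validate(use_override=True)
print(buggy.sequence_tensor_parallel)  # True: the implicit default is skipped
print(fixed.sequence_tensor_parallel)  # False: the override is applied
```

With the implicit default, the user's explicit value silently survives and produces an inconsistent configuration; an override enforces the constraint regardless of user input.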

This PR also addresses other minor issues and failing tests, and includes a tentative fix for #249.

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@jlamypoirier marked this pull request as ready for review on May 16, 2025, 17:07.
@jlamypoirier mentioned this pull request on May 16, 2025 (18 tasks).
Development

Successfully merging this pull request may close these issues:

  • Bugs in Distributed Loading of Non-Distributed Checkpoints