Hierarchical CP implementation (Ulysses + Ring) #1209
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
/te-ci pytorch
@xrennvidia thanks for the PR! I left a few comments and also edited the PR description a bit. Let me know if it's accurate. Thanks!
…gine into xren/cp_a2a_p2p
It's accurate, thanks.
LGTM
Description
This PR adds a hierarchical implementation of context parallelism (CP) to attention, combining the Ulysses and Ring approaches. It uses all-to-all (A2A) communication within low-level CP groups (e.g., over NVLink) and point-to-point (P2P) communication across high-level CP groups (e.g., over InfiniBand). For more details, please refer to LongVILA and USP.
This implementation supports:
- FusedAttention, FlashAttention (… FusedAttention only)
- causal, no_mask
- sbhd, bshd
- no_bias
Type of change
Checklist: