[PyTorch] Improve CP P2P efficiency #1208

Open
wants to merge 1 commit into
base: main

yenchenlin

Description

The original implementation keeps each attention layer's entire K and V in memory.
Instead, we can discard each K, V chunk as soon as it has been used, so that each GPU holds only the current and the next K, V.

Memory profiling results using the toy unit-test

Original:
Screenshot 2024-09-26 at 1 13 04 PM

This PR:
Screenshot 2024-09-26 at 1 13 16 PM

The profile shows that K, V chunks are released as soon as they are no longer needed.
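The idea of holding only the current and the next K, V can be illustrated with a small, framework-free simulation. This is a minimal sketch, not the code in this PR: `simulate_ring`, its parameters, and the ring direction (each rank receives from the previous rank) are assumptions made for illustration, and chunk ids stand in for the actual K, V tensors.

```python
def simulate_ring(cp_size: int, rank: int, keep_all: bool):
    """Toy model of one rank in a context-parallel ring exchange.

    Returns (order in which chunk ids are processed, peak number of
    K, V chunks alive on this rank at the same time).
    `simulate_ring` is a hypothetical helper, not a library function.
    """
    live = {rank}        # chunk ids whose K, V this rank currently holds
    current = rank       # chunk the rank attends to in this step
    processed = []
    peak = len(live)

    for step in range(cp_size):
        has_next = step + 1 < cp_size
        if has_next:
            # Assumed ring direction: the next chunk arrives from the
            # previous rank while the current chunk is being used.
            incoming = (rank - step - 1) % cp_size
            live.add(incoming)
        peak = max(peak, len(live))

        processed.append(current)  # attention against `current` would run here

        if not keep_all:
            live.discard(current)  # release the used chunk right away
        if has_next:
            current = incoming

    return processed, peak


if __name__ == "__main__":
    # Baseline: every received chunk stays referenced until the loop ends.
    print(simulate_ring(cp_size=4, rank=1, keep_all=True))   # ([1, 0, 3, 2], 4)
    # Double buffering: only the current and the next chunk are alive.
    print(simulate_ring(cp_size=4, rank=1, keep_all=False))  # ([1, 0, 3, 2], 2)
```

In this toy model both variants process the chunks in the same order; only the peak number of live chunks differs, growing with `cp_size` in the baseline and staying at two when used chunks are dropped.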

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Signed-off-by: Yen-Chen Lin <yenchenl@nvidia.com>
@yenchenlin yenchenlin changed the title Improve CP P2P efficiency [PyTorch] Improve CP P2P efficiency Sep 26, 2024
@xrennvidia xrennvidia assigned xrennvidia and unassigned xrennvidia Sep 26, 2024
@xrennvidia xrennvidia self-requested a review September 26, 2024 20:26