[PyTorch] Improve CP P2P efficiency #1208

yenchenlin · 2024-09-26T20:18:13Z

Description

The original implementation keeps the whole K, V of each attention layer in memory.
Instead, we can discard each K, V chunk as soon as it has been used, so that each GPU holds only the current and the next K, V.
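
To illustrate the idea, here is a minimal single-process sketch of the buffering scheme. It is not the code in this PR: the function `ring_schedule`, the `keep_all` flag, and the ring direction are hypothetical, strings stand in for K, V tensors, and no real P2P communication or attention is performed. It only counts how many K, V chunks one rank holds at the same time in the forward loop.

```python
# Hypothetical simulation of a context-parallel (CP) ring exchange of K, V chunks.
# Strings stand in for tensors; nothing is actually sent between GPUs.

def ring_schedule(cp_size, rank, keep_all):
    """Return (order of chunk ids processed, peak number of K, V chunks held)."""
    held = {rank: f"kv_chunk_{rank}"}  # chunk id -> stand-in for the K, V tensors
    current = rank
    processed = []
    peak = len(held)
    for step in range(cp_size):
        nxt = None
        if step + 1 < cp_size:
            # Stand-in for the P2P receive of the next chunk from the neighbor rank
            # (the ring direction here is an assumption).
            nxt = (rank - step - 1) % cp_size
            held[nxt] = f"kv_chunk_{nxt}"
        peak = max(peak, len(held))
        processed.append(current)  # stand-in for attention(Q_local, K_cur, V_cur)
        if not keep_all:
            del held[current]  # release the used chunk right away
        current = nxt
    return processed, peak


if __name__ == "__main__":
    for keep_all in (True, False):
        order, peak = ring_schedule(cp_size=4, rank=1, keep_all=keep_all)
        print(f"keep_all={keep_all}: order={order}, peak chunks held={peak}")
```

In this toy model, keeping every chunk makes the peak grow with the CP size, while dropping used chunks caps it at two (the current and the next K, V). The sketch does not model the backward pass.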

Memory profiling results using the toy unit-test

Original:

This MR:

One can see that the unused K, V are immediately released.

Type of change

Documentation change (change only to the documentation, either a fix or new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactor

Signed-off-by: Yen-Chen Lin <yenchenl@nvidia.com>

yenchenlin force-pushed the improve-p2p-efficiency branch from ef2ff72 to 6c4a4e5 Compare September 26, 2024 20:20

Improve CP P2P efficiency

4906662

Signed-off-by: Yen-Chen Lin <yenchenl@nvidia.com>

yenchenlin force-pushed the improve-p2p-efficiency branch from f908546 to 4906662 Compare September 26, 2024 20:23

yenchenlin changed the title ~~Improve CP P2P efficiency~~ [PyTorch] Improve CP P2P efficiency Sep 26, 2024

xrennvidia assigned xrennvidia and unassigned xrennvidia Sep 26, 2024

xrennvidia self-requested a review September 26, 2024 20:26
