[JAX] Expose sliding window attn to TE-JAX API #1205

huanghua1994 · 2024-09-25T23:09:14Z

Description

Recent models employ sliding window attention (SWA). Some frameworks use cuDNN fused attention through the TE-JAX Flash Attention API. The SWA support has not been exposed to this API yet. However, on the backend TE does have support for the SWA. This PR expose the SWA support to the Flash Attention API.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refractor

Changes

Please list the changes introduced in this PR:

Expose sliding window attention to the TE-JAX API

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Hua Huang <huah@nvidia.com>

for more information, see https://pre-commit.ci

mgoldfarb-nvidia

Thank you for this PR! Have a few small questions/comments.

tests/jax/test_fused_attn.py

transformer_engine/jax/cpp_extensions/attention.py

tests/jax/test_fused_attn.py

mingxu1067 · 2024-09-26T14:40:49Z

Could you port the SWA to flax and praxis modules as well?

Signed-off-by: Hua Huang <huah@nvidia.com>

mgoldfarb-nvidia

LGTM! Please address @mingxu1067 comments and will be in good shape.

Signed-off-by: Hua Huang <huah@nvidia.com>

for more information, see https://pre-commit.ci

kocchop · 2024-09-29T02:32:47Z

Hi, MaxText is using the DotProductAttention API from transformer_engine/jax/flax/transformer.py. It'd be super useful to expose the SWA to this

huanghua1994 · 2024-09-29T04:44:27Z

Hi @kocchop I am working on it, should be able to commit new changes next week

Will update tests/jax/test_praxis_layers.py next Signed-off-by: Hua Huang <huah@nvidia.com>

for more information, see https://pre-commit.ci

Signed-off-by: Hua Huang <huah@nvidia.com>

huanghua1994 · 2024-10-01T17:22:13Z

PR #1212 affects this PR. Maybe we should wait for this PR.

mingxu1067

LGTM, pending for CI and #1212

transformer_engine/jax/attention.py

tests/jax/test_fused_attn.py

mingxu1067 · 2024-10-01T19:51:50Z

@zlsh80826, could you help review this PR as well? Thanky you.

Signed-off-by: Hua Huang <huah@nvidia.com>

for more information, see https://pre-commit.ci

zlsh80826 · 2024-10-02T14:20:22Z

@zlsh80826, could you help review this PR as well? Thanky you.

Sure, I will review it by this week.

tests/jax/test_fused_attn.py

zlsh80826 · 2024-10-03T14:32:48Z

tests/jax/utils.py

@@ -943,7 +971,14 @@ def __post_init__(self):

    @nn.compact
    def __call__(self, inputs, encoder_mask=None, deterministic=False):
-        del self.self_attn_mask_type  # dummy, just align to TE's impl
+        # Currently cuDNN backend only supports SWA for causal/padding_causal, follow this
+        if self.self_attn_mask_type in ["causal", "padding_causal"] and self.window_size[0] > 0:


What is the expected behavior if self.self_attn_mask_type == 'padding' and self.window_size[0] > 0?

In test_fused_attn.py, if we use padding mask and a window_size[0] > 0, no supported C++ backend is available, and the tests are all skipped. The EncoderLayer and DecoderLayer classes in this utils.py will only be used in test_layer.py. Added some code in test_layer.py to skip if window_size[0] > 0 and not using causal/padding_causal mask.

The logical between test_layer.py and utils.py should be seperated. We don't know when will other people make changes to test_layer.py or utils.py standalone. If someone don't know the relationship between them, they might only change only one side and spend lots of time on debugging.

Currently, I noticed that this call doesn’t handle self.self_attn_mask_type == 'padding', nor does it provide any warning or error. It’s important not to assume this function will only be used by test_layer.py or padding will not be passed, as new developers may not be aware of these assumptions. If padding is not supported at the moment, it would be best to raise an exception within this function to prevent unintended behavior.

Does the following code look good?

if self.window_size[0] > 0: if self.self_attn_mask_type in ["causal", "padding_causal"]: encoder_mask = apply_swa_mask( self.self_attn_mask_type, encoder_mask, self.window_size, ) else: raise NotImplementedError("cuDNN only supports SWA for causal and padding_causal")

tests/jax/test_fused_attn.py

Signed-off-by: Hua Huang <huah@nvidia.com>

for more information, see https://pre-commit.ci

zlsh80826 · 2024-10-05T12:12:08Z

/te-ci jax

zlsh80826 · 2024-10-06T16:30:39Z

transformer_engine/jax/cpp_extensions/attention.py

@@ -1042,6 +1056,10 @@ class FusedAttnCPWithAllGatherFwdPrimitive(FusedAttnFwdPrimitive):
    def partition(config, mesh, arg_infos, result_infos):
        # Call base implementation for non-context parallel mesh to avoid unecessary work.
        is_context_parallel = get_mesh_axis_size(config.cp_axis, mesh) > 1
+        if is_context_parallel and config.window_size[0] > -1:
+            assert (


I think raise NotImplementedError is more suitable here. Actually assert may be ignored in the optimization mode in python or release mode in C++.

I use assert here to follow the earlier code in this file, for example, the asserts in fused_attn_fwd() (lines 1353 to 1366). Do I only need to replace the asserts in my commits with raise NotImplementedError?

ok, let's just keep the assert style. But the if statement here doesn't look need.

if is_context_parallel and config.window_size[0] > -1: assert ( is_context_parallel and config.window_size[0] == -1 ), "Sliding window attention is not supported when context parallelism is enabled"

equals to

assert ( is_context_parallel and config.window_size[0] == -1 ), "Sliding window attention is not supported when context parallelism is enabled"

zlsh80826 · 2024-10-06T16:30:46Z

transformer_engine/jax/cpp_extensions/attention.py

@@ -1136,6 +1154,10 @@ class FusedAttnCPWithAllGatherBwdPrimitive(FusedAttnBwdPrimitive):
    def partition(config, mesh, arg_infos, result_infos):
        # Call base implementation for non-context parallel mesh to avoid unecessary work.
        is_context_parallel = get_mesh_axis_size(config.cp_axis, mesh) > 1
+        if is_context_parallel and config.window_size[0] > -1:


Same as above

zlsh80826 · 2024-10-06T16:35:51Z

transformer_engine/jax/flax/transformer.py

        def convert_to_softmax_type(attn_mask_type, mask):
            """Convert the attn_mask_type to SoftmaxType"""
-            # mask is ignored for no_mask and causal_mask
-            if attn_mask_type in [AttnMaskType.NO_MASK, AttnMaskType.CAUSAL_MASK]:
+            # mask is ignored for no_mask and causal_mask without sliding window


We need to raise a ValueError when we got self.window_size[0] >= -1 and AttnMaskType == NO_MASK or PADDING_MASK

transformer_engine/jax/flax/transformer.py

zlsh80826 · 2024-10-06T16:57:26Z

transformer_engine/jax/attention.py

+         attn_mask_type                              |   window_size
+    -------------------------------------------------------------------------
+    NO_MASK, PADDING_MASK                            | (-1, -1) or (>=0, >=0)
+    CAUSAL_MASK                                      | (-1,  0) or (>=0, 0)
+    PADDING_CAUSAL_MASK                              | (-1,  0) or (>=0, 0)
+    CAUSAL_BOTTOM_RIGHT_MASK                         | (-1,  0) or (>=0, 0)
+    PADDING_CAUSAL_BOTTOM_RIGHT_MASK                 | (-1,  0) or (>=0, 0)


Hi @cyanguwa, do you think to use None for no sliding window a better idea? I found there are lots of logical with greater than -1 but it looks like not easily to maintain. I think using None as no sliding window can enhance the code readability and maintainability.

I have used None for the higher-level APIs, for example, 1, 2, 3, 4, but I've used a check_set_window_size function to make sure window_size is consistent with mask type before passing it further down.

Expose JAX sliding window attn API

7163946

Signed-off-by: Hua Huang <huah@nvidia.com>

huanghua1994 added enhancement New feature or request jax labels Sep 25, 2024

huanghua1994 requested review from mingxu1067 and mgoldfarb-nvidia September 25, 2024 23:09

[pre-commit.ci] auto fixes from pre-commit.com hooks

6a5b20c

for more information, see https://pre-commit.ci

mgoldfarb-nvidia requested changes Sep 26, 2024

View reviewed changes

tests/jax/test_fused_attn.py Outdated Show resolved Hide resolved

transformer_engine/jax/cpp_extensions/attention.py Outdated Show resolved Hide resolved

tests/jax/test_fused_attn.py Show resolved Hide resolved

No SWA in context parallel; fix RNG seed in test

3dae9ad

Signed-off-by: Hua Huang <huah@nvidia.com>

mgoldfarb-nvidia approved these changes Sep 27, 2024

View reviewed changes

Hua Huang and others added 2 commits September 27, 2024 10:56

Handle SAW API discrepancy in cuDNN and Python

813287e

Signed-off-by: Hua Huang <huah@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

f9ee6e5

for more information, see https://pre-commit.ci

Hua Huang and others added 3 commits September 30, 2024 16:30

Add SAW API for flax, all tests passed

f56b28a

Will update tests/jax/test_praxis_layers.py next Signed-off-by: Hua Huang <huah@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

6f4df48

for more information, see https://pre-commit.ci

Update test_praxis_layers.py for SWA, test passed

6440665

Signed-off-by: Hua Huang <huah@nvidia.com>

huanghua1994 changed the title ~~Expose sliding window attn to TE-JAX API~~ [JAX] Expose sliding window attn to TE-JAX API Oct 1, 2024

mingxu1067 approved these changes Oct 1, 2024

View reviewed changes

transformer_engine/jax/attention.py Outdated Show resolved Hide resolved

tests/jax/test_fused_attn.py Outdated Show resolved Hide resolved

mingxu1067 requested a review from zlsh80826 October 1, 2024 19:51

Hua Huang and others added 2 commits October 1, 2024 15:28

Use tuple window_size; update for PR NVIDIA#1212

52a2d1c

Signed-off-by: Hua Huang <huah@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

a4e647e

for more information, see https://pre-commit.ci

zlsh80826 reviewed Oct 3, 2024

View reviewed changes

Hua Huang and others added 2 commits October 3, 2024 10:18

Add and adjust some pytest.skip

b420504

Signed-off-by: Hua Huang <huah@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

ddb3fe7

for more information, see https://pre-commit.ci

zlsh80826 reviewed Oct 6, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[JAX] Expose sliding window attn to TE-JAX API #1205

[JAX] Expose sliding window attn to TE-JAX API #1205

huanghua1994 commented Sep 25, 2024 •

edited

Loading

mgoldfarb-nvidia left a comment

mingxu1067 commented Sep 26, 2024

mgoldfarb-nvidia left a comment

kocchop commented Sep 29, 2024 •

edited

Loading

huanghua1994 commented Sep 29, 2024

huanghua1994 commented Oct 1, 2024

mingxu1067 left a comment

mingxu1067 commented Oct 1, 2024

zlsh80826 commented Oct 2, 2024

zlsh80826 Oct 3, 2024

huanghua1994 Oct 3, 2024

zlsh80826 Oct 5, 2024

huanghua1994 Oct 7, 2024

zlsh80826 Oct 8, 2024

zlsh80826 commented Oct 5, 2024

zlsh80826 Oct 6, 2024

huanghua1994 Oct 7, 2024

zlsh80826 Oct 8, 2024

zlsh80826 Oct 6, 2024

huanghua1994 Oct 7, 2024

zlsh80826 Oct 6, 2024

zlsh80826 Oct 6, 2024

cyanguwa Oct 7, 2024

[JAX] Expose sliding window attn to TE-JAX API #1205

Are you sure you want to change the base?

[JAX] Expose sliding window attn to TE-JAX API #1205

Conversation

huanghua1994 commented Sep 25, 2024 • edited Loading

Description

Type of change

Changes

Checklist:

mgoldfarb-nvidia left a comment

Choose a reason for hiding this comment

mingxu1067 commented Sep 26, 2024

mgoldfarb-nvidia left a comment

Choose a reason for hiding this comment

kocchop commented Sep 29, 2024 • edited Loading

huanghua1994 commented Sep 29, 2024

huanghua1994 commented Oct 1, 2024

mingxu1067 left a comment

Choose a reason for hiding this comment

mingxu1067 commented Oct 1, 2024

zlsh80826 commented Oct 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zlsh80826 commented Oct 5, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

huanghua1994 commented Sep 25, 2024 •

edited

Loading

kocchop commented Sep 29, 2024 •

edited

Loading