[Bug] Fix init issue for layer_norm in sequence_parallel for non-CUDA device. #450
Since sequence_parallel was added to the layernorm path in Megatron-DeepSpeed, the current implementation breaks on non-CUDA devices: those devices fall back to `from torch.nn import LayerNorm`, and `torch.nn.LayerNorm` has no `sequence_parallel` attribute, so initialization fails with an error on non-CUDA devices.
This PR fixes that issue.
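As a minimal sketch of the kind of fix involved (hypothetical code, not the exact patch in this PR), one way to make the non-CUDA path tolerate the extra keyword is a thin subclass of `torch.nn.LayerNorm` that accepts and records `sequence_parallel`:

```python
# Hypothetical sketch: wrap torch.nn.LayerNorm so it tolerates the
# `sequence_parallel` keyword that the CUDA fused LayerNorm accepts.
from torch.nn import LayerNorm as TorchLayerNorm


class LayerNorm(TorchLayerNorm):
    """Drop-in layernorm for non-CUDA devices in sequence-parallel runs."""

    def __init__(self, normalized_shape, eps=1e-5,
                 sequence_parallel=False, **kwargs):
        super().__init__(normalized_shape, eps=eps, **kwargs)
        self.sequence_parallel = sequence_parallel
        # Tag the affine parameters so downstream code that inspects
        # `param.sequence_parallel` treats them consistently with the
        # fused CUDA implementation.
        if self.weight is not None:
            setattr(self.weight, 'sequence_parallel', sequence_parallel)
        if self.bias is not None:
            setattr(self.bias, 'sequence_parallel', sequence_parallel)
```

With a wrapper like this, call sites can construct the layernorm with `sequence_parallel=...` regardless of device, instead of branching on whether the fused CUDA kernel is available.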