Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed. #7816
Conversation
I found some errors in the checks, how can I fix them? It's strange because I didn't modify the code there.

You can do the following:
Force-pushed d2bf131 to 6536fc8.
I've already fixed the code formatting issues in the checks.

@sayakpaul

Thanks, Yiyi. I am alright with the PR because the number of changes is extremely minimal.
```python
if is_torch_npu_available() and query.dtype in (torch.float16, torch.bfloat16):
    # torch_npu fused attention kernel; note that `pre_tockens`/`next_tockens`
    # are spelled this way in the torch_npu API itself.
    hidden_states = torch_npu.npu_fusion_attention(
        query,
        key,
        value,
        attn.heads,
        input_layout="BNSD",  # batch, num_heads, sequence, head_dim
        pse=None,
        atten_mask=attention_mask,
        scale=1.0 / math.sqrt(query.shape[-1]),
        pre_tockens=65536,
        next_tockens=65536,
        keep_prob=1.0,  # no attention dropout
        sync=False,
        inner_precise=0,
    )[0]
```
Hmm, so when Torch NPU is available it will default to using `torch_npu.npu_fusion_attention`, right? But our current library-wide behaviour is that when PyTorch 1.x is available we rely on `AttnProcessor`, and when PyTorch 2.x is available we rely on `AttnProcessor2_0`, which uses `F.scaled_dot_product_attention()`. These are the two default attention processors we use based on the available PyTorch version.

So, with that in mind, I find this slightly problematic because we are moving away from that conceptual understanding. Folks who use the library already know that `F.scaled_dot_product_attention()` is used on PyTorch 2.x unless stated otherwise. Therefore, I think it might be better to have an `AttnProcessorNPU` class and use that instead. In that class we can also do proper error handling, such as erroring out if `query.dtype` is not `torch.float16` or `torch.bfloat16`.
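For illustration, a minimal sketch of the guard being proposed (class and method shape are assumptions here, not the final implementation):

```python
import torch


class AttnProcessorNPU:
    """Sketch of an NPU attention processor: only the dtype guard discussed
    above is shown; the actual fused-attention computation is elided."""

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        query = attn.to_q(hidden_states)
        if query.dtype not in (torch.float16, torch.bfloat16):
            raise ValueError(
                "torch_npu fused attention only supports torch.float16 and "
                f"torch.bfloat16 inputs, but got {query.dtype}."
            )
        ...  # compute key/value and call torch_npu.npu_fusion_attention as above
```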
I would like to ask @yiyixuxu and @DN6 for their opinions here too.
Alright, I understand your point. It would be a good idea to separate out the torch_npu flash attention into its own module.
Force-pushed e4a39ae to 63c6045.
I've separated the NPU flash attention into its own module and made it switchable via a parameter.
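Usage then looks roughly like this (a sketch; the exact method name on the model is an assumption based on this change, not a confirmed API):

```python
import torch

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
    torch_dtype=torch.float16,
)
# Opt in explicitly instead of switching silently whenever torch_npu is present.
unet.enable_npu_flash_attention()  # assumed switch name; sets the NPU attention processor
```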
```python
# context: existing eager GEGLU path
return hidden_states * self.gelu(gate)

# added: fused NPU path
if is_torch_npu_available():
    hidden_states = self.proj(hidden_states, *args)
    return torch_npu.npu_geglu(hidden_states, dim=-1, approximate=1)[0]
```
Can we not use the existing `self.gelu()` when using NPU?
Compared to `self.gelu()`, `torch_npu.npu_geglu` runs faster and saves memory on NPU.
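A hypothetical micro-benchmark sketching that comparison (requires an Ascend NPU with `torch_npu` installed; the `torch.npu.synchronize()` calls are assumed to mirror the CUDA API, and the fused op is assumed to use the tanh GELU approximation):

```python
import time

import torch
import torch_npu  # noqa: F401 -- registers the "npu" device

# Simulated projection output: last dim is 2x the hidden size (value + gate).
x = torch.randn(2, 4096, 2560, dtype=torch.float16, device="npu")


def eager_geglu(t):
    # Unfused path: chunk, GELU on the gate, elementwise multiply.
    h, gate = t.chunk(2, dim=-1)
    return h * torch.nn.functional.gelu(gate, approximate="tanh")


for name, fn in [
    ("eager", eager_geglu),
    ("fused", lambda t: torch_npu.npu_geglu(t, dim=-1, approximate=1)[0]),
]:
    torch.npu.synchronize()
    start = time.time()
    for _ in range(100):
        fn(x)
    torch.npu.synchronize()
    print(name, time.time() - start)
```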
Thanks for working on this.
For me, the following would be nice to add before we merge:
- Documentation -- add an entry about the NPU processor to https://huggingface.co/docs/diffusers/main/en/api/attnprocessor
- Test: similar to `test_set_xformers_attn_processor_for_determinism`.
@yiyixuxu could you review the changes introduced to the core modules of the library and comment?
src/diffusers/models/activations.py (outdated)
```python
if is_torch_npu_available():
    hidden_states = self.proj(hidden_states, *args)
    return torch_npu.npu_geglu(hidden_states, dim=-1, approximate=1)[0]
else:
    hidden_states, gate = self.proj(hidden_states).chunk(2, dim=-1)
    return hidden_states * self.gelu(gate)
```
Suggested change:

```diff
-if is_torch_npu_available():
-    hidden_states = self.proj(hidden_states, *args)
-    return torch_npu.npu_geglu(hidden_states, dim=-1, approximate=1)[0]
-else:
-    hidden_states, gate = self.proj(hidden_states).chunk(2, dim=-1)
-    return hidden_states * self.gelu(gate)
+hidden_states = self.proj(hidden_states)
+if is_torch_npu_available():
+    return torch_npu.npu_geglu(hidden_states, dim=-1, approximate=1)[0]
+else:
+    hidden_states, gate = hidden_states.chunk(2, dim=-1)
+    return hidden_states * self.gelu(gate)
```

(This hoists the `self.proj` call out of the branch so both paths share it.)
Sure, I'll add unit tests and documentation later.

I've updated the code. @sayakpaul
```python
@unittest.skipIf(
    torch_device != "npu" or not is_torch_npu_available(),
    reason="torch npu flash attention is only available with NPU and `torch_npu` installed",
)
def test_set_torch_npu_flash_attn_processor_determinism(self):
```
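The body of such a test might look something like this (a sketch modeled on the xformers determinism test; `prepare_init_args_and_inputs_for_common` comes from the shared model tester mixin, and the `enable_npu_flash_attention` call is an assumed name):

```python
def test_set_torch_npu_flash_attn_processor_determinism(self):
    init_dict, inputs_dict = self.prepare_init_args_and_inputs_for_common()
    model = self.model_class(**init_dict).to(torch_device)

    with torch.no_grad():
        # Default processor output as the reference.
        output = model(**inputs_dict)[0]

        # Running the NPU processor twice should be deterministic.
        model.enable_npu_flash_attention()
        output_npu_1 = model(**inputs_dict)[0]
        output_npu_2 = model(**inputs_dict)[0]

    assert torch.allclose(output_npu_1, output_npu_2, atol=0)
    # The fused kernel should stay numerically close to the default path.
    assert torch.allclose(output, output_npu_1, atol=1e-2)
```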
Test seems quite nice to me. Thanks for working on it!
Hi @sayakpaul.
Yes, it needs a review from our core maintainer @yiyixuxu.
thanks!
This PR was referenced in diffusers commit 5823736: Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed. (huggingface/diffusers#7816)
What does this PR do?
Added support for SDXL fine-tuning on Ascend NPU and fixed the bug that caused a hang when saving models with the DeepSpeed distributed framework. DeepSpeed requires saving weights on every device; saving weights only on the main process would cause the other ranks to hang.
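As a rough illustration of that saving pattern (a sketch assuming an `accelerate`-based training loop in the style of the diffusers example scripts, not the exact diff from this PR):

```python
from accelerate import Accelerator
from accelerate.utils import DistributedType


def save_checkpoint(accelerator: Accelerator, output_dir: str) -> None:
    # Under DeepSpeed ZeRO, checkpoint saving is a collective operation: every
    # rank must reach the save call. Gating it on the main process alone leaves
    # the other ranks waiting forever -- the hang described above.
    if accelerator.distributed_type == DistributedType.DEEPSPEED or accelerator.is_main_process:
        accelerator.save_state(output_dir)
```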
Fixes # (issue)
Before submitting
documentation guidelines, and
here are tips on formatting docstrings.
I fine-tuned SDXL on Ascend NPU, and the results are good. I hope diffusers can support more devices.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.