
update the logic of is_sequential_cpu_offload #7788

Merged 4 commits into main on May 1, 2024
Conversation

yiyixuxu (Collaborator)

Follow-up to huggingface/accelerate#2701.

When sequential CPU offloading is enabled for the pipeline, accelerate installs an AlignDevicesHook on each model component; if the model contains a buffer, it instead installs a SequentialHook wrapping two AlignDevicesHooks.

Currently, we assume the model is sequentially offloaded only when the hook is an AlignDevicesHook. In this PR, I updated the logic to also cover the scenario where a SequentialHook is created.
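The updated check described above could be sketched as follows. This is a minimal, self-contained approximation: `AlignDevicesHook` and `SequentialHook` here are stand-ins for the classes in `accelerate.hooks`, and the helper name `is_sequential_cpu_offload` is illustrative.

```python
# Stand-ins for accelerate.hooks.AlignDevicesHook / SequentialHook so the
# sketch runs without accelerate installed.
class AlignDevicesHook:
    pass

class SequentialHook:
    def __init__(self, *hooks):
        self.hooks = list(hooks)

def is_sequential_cpu_offload(module):
    # A module counts as sequentially offloaded when accelerate attached
    # either a bare AlignDevicesHook, or (for modules with buffers) a
    # SequentialHook whose sub-hooks are all AlignDevicesHooks.
    hook = getattr(module, "_hf_hook", None)
    if isinstance(hook, AlignDevicesHook):
        return True
    if isinstance(hook, SequentialHook):
        return all(isinstance(h, AlignDevicesHook) for h in hook.hooks)
    return False

class DummyModule:
    pass

with_buffer = DummyModule()
with_buffer._hf_hook = SequentialHook(AlignDevicesHook(), AlignDevicesHook())
without_hook = DummyModule()

print(is_sequential_cpu_offload(with_buffer))   # True
print(is_sequential_cpu_offload(without_hook))  # False
```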

@@ -1373,6 +1373,7 @@ def test_sequential_cpu_offload_forward_pass(self, expected_max_diff=1e-4):
output_without_offload = pipe(**inputs)[0]

pipe.enable_sequential_cpu_offload()
assert pipe._execution_device.type == pipe._offload_device.type
Collaborator Author

Add a check to make sure the `_execution_device` attribute of the pipeline works as expected, since we rely on the accelerate hooks to infer the device.

def _execution_device(self):
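For context, the idea of inferring the execution device from accelerate hooks might be sketched like this. Everything here is an illustrative simplification, not the actual diffusers implementation: `PipelineSketch`, the stand-in `AlignDevicesHook`, and the use of plain strings for devices are all assumptions.

```python
class AlignDevicesHook:  # stand-in for accelerate.hooks.AlignDevicesHook
    def __init__(self, execution_device=None):
        self.execution_device = execution_device

class PipelineSketch:
    """Illustrative pipeline holding named model components."""

    def __init__(self, **components):
        self.components = components
        self._offload_device = "cpu"

    @property
    def _execution_device(self):
        # Walk the components: the first accelerate hook carrying a concrete
        # execution_device tells us where compute will actually run, even
        # though the offloaded weights currently live on the CPU.
        for component in self.components.values():
            hook = getattr(component, "_hf_hook", None)
            if hook is not None and getattr(hook, "execution_device", None) is not None:
                return hook.execution_device
        return self._offload_device

class DummyModule:
    pass

unet = DummyModule()
unet._hf_hook = AlignDevicesHook(execution_device="cuda:0")
pipe = PipelineSketch(unet=unet)
print(pipe._execution_device)  # cuda:0
```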

Member

Does it also make sense to check if the SequentialHook is properly installed when a model in a pipeline has buffers?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines 1510 to 1514
if hasattr(v, "_hf_hook"):
if isinstance(v._hf_hook, accelerate.hooks.SequentialHook):
for hook in v._hf_hook.hooks:
if not isinstance(hook, accelerate.hooks.AlignDevicesHook):
offloaded_modules_with_incorrect_hooks[k] = type(v._hf_hook.hooks[0])
Member

Could you explain this part of the check a bit?

sayakpaul (Member) left a comment

Thank you! Left some comments.

SunMarc (Member) left a comment

LGTM! Thanks for updating the logic. +1 on @sayakpaul's comments.


inputs = self.get_dummy_inputs(generator_device)
output_with_offload = pipe(**inputs)[0]

max_diff = np.abs(to_np(output_with_offload) - to_np(output_without_offload)).max()
self.assertLess(max_diff, expected_max_diff, "CPU offloading should not affect the inference results")

# make sure all `torch.nn.Module` components (except those in `self._exclude_from_cpu_offload`) are offloaded correctly
Collaborator Author

cc @DN6
Are these offload tests being run? E.g. I randomly checked DiT, and all of its offloading tests are failing, but I did not see that in the reports (I looked in the slow tests too and did not see any).

Member

Could it be because DiT has low usage, and hence it's not picked up by `utils/fetch_torch_cuda_pipeline_test_matrix.py` and ultimately not included here?

tests/pipelines/${{ matrix.module }}

Collaborator

Yeah, DiT has low usage so those slow tests are skipped. They should be in the nightly tests though.

@yiyixuxu yiyixuxu merged commit 21a7ff1 into main May 1, 2024
17 checks passed
yiyixuxu (Collaborator Author) commented May 1, 2024

a follow-up (low-priority) item is to see if we can include the models we have been excluding from offloading

@yiyixuxu yiyixuxu deleted the offload-cleanup branch May 1, 2024 16:27
XSE42 added a commit to XSE42/diffusers3d that referenced this pull request May 13, 2024
diffusers commit 21a7ff1
    update the logic of `is_sequential_cpu_offload` (huggingface/diffusers#7788)
keepdying commented Jul 24, 2024

I think instead of just checking for AlignDevicesHook, we should check that its offload attribute is True to determine whether the module is offloaded to CPU. In some cases the check fails even though it shouldn't, e.g. when a pipeline is initialized with its `__init__` method and the required components are loaded with `.from_pretrained(model_path, device_map={"": 0})`.
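The suggestion could be sketched like this. The hook class is again a stand-in for `accelerate.hooks.AlignDevicesHook` (which does accept an `offload` argument), and the helper name `is_offloaded_to_cpu` is hypothetical.

```python
class AlignDevicesHook:  # stand-in for accelerate.hooks.AlignDevicesHook
    def __init__(self, offload=False):
        self.offload = offload

def is_offloaded_to_cpu(module):
    # Require the hook's `offload` flag in addition to its type, so a
    # component placed directly on a device via device_map (which still
    # gets an AlignDevicesHook) is not misclassified as offloaded.
    hook = getattr(module, "_hf_hook", None)
    return isinstance(hook, AlignDevicesHook) and hook.offload

class DummyModule:
    pass

on_gpu = DummyModule()
on_gpu._hf_hook = AlignDevicesHook(offload=False)  # e.g. device_map={"": 0}
offloaded = DummyModule()
offloaded._hf_hook = AlignDevicesHook(offload=True)

print(is_offloaded_to_cpu(on_gpu), is_offloaded_to_cpu(offloaded))  # False True
```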

SunMarc (Member) commented Jul 24, 2024

Hey @keepdying, could you share a minimal reproducer of the error you are facing in a separate issue? We can definitely switch to checking the offload attribute.
