update the logic of `is_sequential_cpu_offload`
#7788
Conversation
```python
@@ -1373,6 +1373,7 @@ def test_sequential_cpu_offload_forward_pass(self, expected_max_diff=1e-4):
        output_without_offload = pipe(**inputs)[0]

        pipe.enable_sequential_cpu_offload()
        assert pipe._execution_device.type == pipe._offload_device.type
```
Add a check to make sure the `_execution_device` attribute of the pipeline works as expected, since we rely on the accelerate hooks to infer the device.
```python
    def _execution_device(self):
```
Does it also make sense to check that the `SequentialHook` is properly installed when a model in a pipeline has buffers?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
if hasattr(v, "_hf_hook"):
    if isinstance(v._hf_hook, accelerate.hooks.SequentialHook):
        for hook in v._hf_hook.hooks:
            if not isinstance(hook, accelerate.hooks.AlignDevicesHook):
                offloaded_modules_with_incorrect_hooks[k] = type(v._hf_hook.hooks[0])
```
Could you explain this part of the check a bit?
Thank you! Left some comments.
LGTM! Thanks for updating the logic. +1 for @sayakpaul's comments.
```python
        inputs = self.get_dummy_inputs(generator_device)
        output_with_offload = pipe(**inputs)[0]

        max_diff = np.abs(to_np(output_with_offload) - to_np(output_without_offload)).max()
        self.assertLess(max_diff, expected_max_diff, "CPU offloading should not affect the inference results")

        # make sure all `torch.nn.Module` components (except those in `self._exclude_from_cpu_offload`) are offloaded correctly
```
cc @DN6
Are these offload tests being run? For example, I randomly checked DiT and all of its offloading tests are failing, but I did not see that in the reports (I looked in the slow tests too and did not see any).
Could it be because DiT has low usage, and hence it's not picked up by `utils/fetch_torch_cuda_pipeline_test_matrix.py` and ultimately not included here?

`tests/pipelines/${{ matrix.module }}`
Yeah, DiT has low usage, so those slow tests are skipped. They should be in the nightly tests, though.
A follow-up (low-priority) item is to see if we can include the models we have been excluding from offloading.
diffusers commit 21a7ff1 update the logic of `is_sequential_cpu_offload` (huggingface/diffusers#7788)
I think instead of just checking for `AlignDevicesHook`, we should check that its `offload` attribute is `True` to determine whether the module is offloaded to CPU. In some cases the check fails even though it shouldn't, e.g. when a pipeline is initialized with its …
Hey @keepdying, could you share a minimal reproducer of the error that you are facing in a separate issue? We can definitely switch to checking the `offload` attribute.
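The stricter check suggested above could be sketched roughly as follows. The class below is a minimal stand-in for accelerate's `AlignDevicesHook` (which does expose an `offload` flag), used only so the shape of the check is visible without the real library:

```python
# Stand-in for accelerate.hooks.AlignDevicesHook, for illustration only;
# the real hook carries many more attributes and behaviors.
class AlignDevicesHook:
    def __init__(self, offload=False):
        self.offload = offload


def is_module_cpu_offloaded(hook):
    # Proposed: require not just the hook type but also offload=True,
    # so a hook installed for device placement alone does not count.
    return isinstance(hook, AlignDevicesHook) and getattr(hook, "offload", False)


print(is_module_cpu_offloaded(AlignDevicesHook(offload=True)))   # True
print(is_module_cpu_offloaded(AlignDevicesHook(offload=False)))  # False
```

This is a sketch of the suggestion, not the code that landed in the PR.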
Follow-up on this: huggingface/accelerate#2701
When the sequential CPU offloading method is enabled for the pipeline, accelerate will try to install an `AlignDevicesHook` on each model component; if the model contains a buffer, it will instead install a `SequentialHook` with two `AlignDevicesHook`s.
Currently, we assume that the model is sequentially offloaded only if the hook is an `AlignDevicesHook`. In this PR I updated the logic to also cover the scenario where a `SequentialHook` is created.
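The updated detection described above could be sketched like this, with stand-in classes in place of `accelerate.hooks` so the logic is runnable on its own (the real check in diffusers operates on a module's `_hf_hook` and imports the hook classes from accelerate):

```python
# Stand-ins for accelerate.hooks.AlignDevicesHook / SequentialHook,
# for illustration only.
class AlignDevicesHook:
    pass


class SequentialHook:
    def __init__(self, *hooks):
        self.hooks = list(hooks)


def is_sequential_cpu_offload(hook):
    # Old logic: only a bare AlignDevicesHook counted as sequential offload.
    # New logic: a hook container whose first sub-hook is an AlignDevicesHook
    # (the pattern installed for models with buffers) also counts.
    return isinstance(hook, AlignDevicesHook) or (
        hasattr(hook, "hooks") and isinstance(hook.hooks[0], AlignDevicesHook)
    )


print(is_sequential_cpu_offload(AlignDevicesHook()))  # True: no buffers
print(is_sequential_cpu_offload(SequentialHook(AlignDevicesHook(), AlignDevicesHook())))  # True: buffers
print(is_sequential_cpu_offload(object()))  # False: no offload hook
```

Checking `hooks[0]` mirrors the pattern quoted in the diff above; it assumes accelerate puts an `AlignDevicesHook` first when it builds the `SequentialHook`.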