
GPU/CPU Offloading and preload_module_classes #3415

Open
1 of 4 tasks
Giuseppe5 opened this issue Feb 27, 2025 · 0 comments

Giuseppe5 commented Feb 27, 2025

System Info

accelerate==1.4.0
torch==2.5.0

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Hi everyone,

When dispatching a model across GPU and CPU for inference, it is possible to pass a list of preload_module_classes to dispatch_model; this makes sure that modules of those classes are moved to GPU as a whole and that no _hf_hook is attached to their nested modules (a toy sketch of the setup is below).
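For reference, here is a minimal toy version of the setup; the class names and shapes are made up for illustration and it assumes a CUDA GPU at index 0:

```python
import torch
from torch import nn
from accelerate import dispatch_model


class Scale(nn.Module):
    """Toy stand-in for the nested module attached to every Linear layer."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.tensor(0.5))

    def forward(self, x):
        return x * self.weight


class MyLinear(nn.Linear):
    """Linear layer carrying a nested submodule."""

    def __init__(self, in_features, out_features):
        super().__init__(in_features, out_features)
        self.scale = Scale()

    def forward(self, x):
        return super().forward(self.scale(x))


model = nn.Sequential(MyLinear(16, 16), MyLinear(16, 16))

# Block "0" stays on the GPU, block "1" is offloaded to CPU.
device_map = {"0": 0, "1": "cpu"}

model = dispatch_model(
    model,
    device_map=device_map,
    # Treat MyLinear as an atomic unit: move it (and its nested Scale) as a
    # whole instead of attaching hooks to its children.
    preload_module_classes=["MyLinear"],
)
```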

One issue I noticed is that for the modules that are not offloaded to CPU, the preload_module_classes argument is ignored, and an _hf_hook is still attached to all of their nested modules.

I understand the principle behind this; however, I am running into an issue when combining accelerate with torch.compile.

In particular, I'm trying to compile only the nested modules that I attach to every Linear layer, and the presence of _hf_hook on those submodules causes excessive recompilation. My goal is therefore to pass Linear to preload_module_classes so that no hooks are attached to any Linear submodules, roughly as sketched below.
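Concretely, the compile pattern looks roughly like this (continuing the toy MyLinear/Scale names from the sketch above, not my exact code):

```python
# Compile only the nested modules; the rest of the model stays eager because
# accelerate's hooks live on the outer modules.
for module in model.modules():
    if isinstance(module, MyLinear):
        module.scale.compile()  # in-place torch.compile of the nested module
```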

Passing Linear in preload_module_classes achieves this for the CPU-offloaded modules, but not for the ones kept on GPU, whose nested modules still have an _hf_hook attached.
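A quick way to see the asymmetry after dispatching is simply to walk the module tree:

```python
# List which modules still carry an accelerate hook after dispatch_model.
for name, module in model.named_modules():
    hook = getattr(module, "_hf_hook", None)
    if hook is not None:
        print(f"{name}: {type(hook).__name__}")

# With the toy model above, this matches the behaviour described here:
# "1.scale" (under the CPU-offloaded block) has no hook, while "0.scale"
# (under the GPU-resident block) still has one.
```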

Taking a step back from this specific problem, the bigger issue is that calling torch.compile on a model dispatched with accelerate incurs excessive recompilation, and everything above is a best-effort strategy to work around that.

If this is not clear, I can put together a script to reproduce the issue.

Expected behavior

I don't see any side effect in having the GPU-placed modules behave like the CPU-offloaded ones, and this would simply mean changing here:

attach_execution_device_hook(module, execution_device[module_name], skip_keys=skip_keys, tied_params_map=tied_params_map)

to

attach_execution_device_hook(module, execution_device[module_name], skip_keys=skip_keys, tied_params_map=tied_params_map, preload_module_classes=preload_module_classes)

@muellerzr
