
GPU/CPU Offloading and preload_module_classes #3415

Open
1 of 4 tasks
Giuseppe5 opened this issue Feb 27, 2025 · 0 comments

Giuseppe5 commented Feb 27, 2025

System Info

accelerate==1.4.0
torch==2.5.0

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Hi everyone,

When dispatching a model across GPU and CPU for inference, it is possible to pass a list of preload_module_classes to dispatch_model; this makes sure that modules of those classes are moved to GPU as a whole and that no _hf_hook is attached to their nested modules (a toy sketch of the setup is below).
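For reference, here is a minimal toy version of the setup; the class names and shapes are made up for illustration and it assumes a CUDA GPU at index 0:

```python
import torch
from torch import nn
from accelerate import dispatch_model


class Scale(nn.Module):
    """Toy stand-in for the nested module attached to every Linear layer."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.tensor(0.5))

    def forward(self, x):
        return x * self.weight


class MyLinear(nn.Linear):
    """Linear layer carrying a nested submodule."""

    def __init__(self, in_features, out_features):
        super().__init__(in_features, out_features)
        self.scale = Scale()

    def forward(self, x):
        return super().forward(self.scale(x))


model = nn.Sequential(MyLinear(16, 16), MyLinear(16, 16))

# Block "0" stays on the GPU, block "1" is offloaded to CPU.
device_map = {"0": 0, "1": "cpu"}

model = dispatch_model(
    model,
    device_map=device_map,
    # Treat MyLinear as an atomic unit: move it (and its nested Scale) as a
    # whole instead of attaching hooks to its children.
    preload_module_classes=["MyLinear"],
)
```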

One issue I noticed is that for the modules that are not offloaded to CPU, the preload_module_classes argument is ignored, and an _hf_hook is still attached to all of their nested modules.

I understand the principle behind this; however, I am running into an issue when combining accelerate with torch.compile.

In particular, I'm trying to compile only the nested modules that I attach to every Linear layer, and the presence of _hf_hook on those submodules causes excessive recompilation. My goal is therefore to pass Linear to preload_module_classes so that no hooks are attached to any Linear submodules, roughly as sketched below.
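Concretely, the compile pattern looks roughly like this (continuing the toy MyLinear/Scale names from the sketch above, not my exact code):

```python
# Compile only the nested modules; the rest of the model stays eager because
# accelerate's hooks live on the outer modules.
for module in model.modules():
    if isinstance(module, MyLinear):
        module.scale.compile()  # in-place torch.compile of the nested module
```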

Passing Linear in preload_module_classes achieves this for the CPU-offloaded modules, but not for the ones kept on GPU, whose nested modules still have an _hf_hook attached.
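A quick way to see the asymmetry after dispatching is simply to walk the module tree:

```python
# List which modules still carry an accelerate hook after dispatch_model.
for name, module in model.named_modules():
    hook = getattr(module, "_hf_hook", None)
    if hook is not None:
        print(f"{name}: {type(hook).__name__}")

# With the toy model above, this matches the behaviour described here:
# "1.scale" (under the CPU-offloaded block) has no hook, while "0.scale"
# (under the GPU-resident block) still has one.
```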

Taking a step back from this specific problem, the bigger issue is that calling torch.compile on a model dispatched with accelerate incurs excessive recompilation, and everything above is a best-effort strategy to work around that.

If this is not clear, I can put together a script to reproduce the issue.

Expected behavior

I don't see any side effect in having the GPU-placed modules behave like the CPU-offloaded ones, and this would simply mean changing here:

attach_execution_device_hook(module, execution_device[module_name], skip_keys=skip_keys, tied_params_map=tied_params_map)

to

attach_execution_device_hook(module, execution_device[module_name], skip_keys=skip_keys, tied_params_map=tied_params_map, preload_module_classes=preload_module_classes)

@muellerzr
