Tasks

One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)

Reproduction
Hi everyone,
When dispatching a model across GPU/CPU for inference purposes, I see that it is possible to pass a list of preload_module_classes to dispatch_model, and this will make sure that all the nested modules are moved to GPU and no _hf_hook is attached. One issue I noticed is that for the modules that are not offloaded to CPU, the preload_module_classes flag is ignored, and a _hf_hook is still attached to all the nested modules.
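For concreteness, here is a minimal sketch of the kind of call I mean (the model name and memory limits are just placeholders, not my actual setup):

```python
from accelerate import dispatch_model, infer_auto_device_map
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

# Force a GPU/CPU split so that some layers stay on GPU and some are offloaded.
device_map = infer_auto_device_map(model, max_memory={0: "200MiB", "cpu": "10GiB"})

# Ask accelerate to preload Linear layers: the modules nested inside them should be
# moved together with them and should not get a _hf_hook of their own.
model = dispatch_model(model, device_map=device_map, preload_module_classes=["Linear"])
```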
I understand the principle behind this; however, I am having an issue when combining Accelerate with torch.compile.
In particular, I'm trying to compile only the nested modules that I attach to every Linear layer, and the presence of _hf_hook causes excessive recompilation. My goal is therefore to pass Linear to preload_module_classes and have no hooks attached to any Linear submodule.
This works for the CPU-offloaded modules, but not for the ones kept on GPU, which still have a _hf_hook attached.
Taking a step back from this specific problem, I guess the bigger issue is that calling torch.compile on a model dispatched with Accelerate incurs excessive recompilation, and everything above is a best-effort strategy to work around it.
If this is not clear, I can come up with a proper script to reproduce the issue; a rough sketch is below.
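Something along these lines, continuing the placeholder setup above, is the shape of what I'm doing (in my real code I compile only the custom modules attached to each Linear; here compiling the whole dispatched model stands in for that):

```python
import torch

# Inspect which Linear layers still carry a _hf_hook after dispatch.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        print(name, hasattr(module, "_hf_hook"))

# Compile and run the dispatched model; the leftover hooks are what trigger the
# excessive recompilation described above.
compiled = torch.compile(model)
input_ids = torch.randint(0, 1000, (1, 16), device="cuda")
with torch.no_grad():
    compiled(input_ids)
```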
Expected behavior
I don't see any side effect in having the GPU-placed modules behave like the CPU-offloaded ones, and this would simply mean changing here:
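Independently of the exact lines, the behaviour I'd expect is that after dispatching with preload_module_classes=["Linear"] a check like the following passes, for both the GPU-resident and the offloaded layers (again just a sketch on the placeholder setup above):

```python
import torch

# Expected: no module nested inside a Linear carries a _hf_hook, regardless of
# whether that Linear was kept on GPU or offloaded to CPU.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        for child_name, child in module.named_modules():
            if child is not module:
                assert not hasattr(child, "_hf_hook"), f"{name}.{child_name} still has a hook"
```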
@muellerzr