Thanks for your great work.
When I use Accelerate to train our model with a StyleGAN2 discriminator, StyleGAN2 tries to compile its custom CUDA op, and the launch then fails with the error below.
Could you tell me how to fix it?
Setting up PyTorch plugin "bias_act_plugin"... /usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Done.
E0220 14:42:00.513000 139816943049600 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: -11) local_rank: 0 (pid: 60422) of binary: /usr/local/bin/python
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/usr/local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1073, in launch_command
multi_gpu_launcher(args)
File "/usr/local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 718, in multi_gpu_launcher
distrib_run.run(args)
File "/usr/local/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/usr/local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
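For reference, the warning in the log asks for `TORCH_CUDA_ARCH_LIST` to be set. Below is a minimal sketch of one way to do that from Python before the plugin is built. The helper names `arch_list_from_capabilities` and `set_cuda_arch_list` are hypothetical and not part of PyTorch or Accelerate. This only restricts which architectures get compiled and silences the warning; whether it has any effect on the `exitcode: -11` crash is not established by the log.

```python
import os


def arch_list_from_capabilities(capabilities):
    """Format (major, minor) compute capabilities as a TORCH_CUDA_ARCH_LIST value.

    Hypothetical helper: duplicates are dropped and entries are sorted,
    e.g. [(8, 6), (8, 0)] becomes "8.0;8.6".
    """
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


def set_cuda_arch_list():
    """Set TORCH_CUDA_ARCH_LIST from the visible GPUs.

    Returns the value that was set, or None if torch or CUDA is unavailable.
    Assumption: this runs before any custom op is compiled.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    caps = [
        torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())
    ]
    value = arch_list_from_capabilities(caps)
    os.environ["TORCH_CUDA_ARCH_LIST"] = value
    return value
```

Calling `set_cuda_arch_list()` at the very top of the training script, before the StyleGAN ops are imported, should make the variable visible to `torch.utils.cpp_extension` in each launched process. Exporting the variable in the shell before running `accelerate launch` is an equivalent alternative.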
I'm having a similar issue with stylegan3, with the "UserWarning: TORCH_CUDA_ARCH_LIST is not set" during training. I've tried specifying architecture in the training script like so: