Training on Tesla K80 #24

Open
StuteePatil opened this issue Jul 28, 2021 · 3 comments

Comments

@StuteePatil

Hi,
Training the model on a Tesla K80 gives the following error. Does the model require a specific GPU architecture for training?

File "train.py", line 290, in
main()
File "train.py", line 50, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/anaconda/envs/vits/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/anaconda/envs/vits/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/anaconda/envs/vits/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/anaconda/envs/vits/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/media/hdd1tb/tts-VITS/vits-main/train.py", line 117, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/media/hdd1tb/tts-VITS/vits-main/train.py", line 162, in train_and_evaluate
hps.data.mel_fmax
File "/media/hdd1tb/tts-VITS/vits-main/mel_processing.py", line 105, in mel_spectrogram_torch
center=center, pad_mode='reflect', normalized=False, onesided=True)
File "/anaconda/envs/vits/lib/python3.7/site-packages/torch/functional.py", line 465, in stft
return _VF.stft(input, n_fft, hop_length, win_length, window, normalized, onesided)
RuntimeError: cuFFT doesn't support signals of half type with compute capability less than SM_53, but the device containing input half tensor only has SM_37
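
For context, the failure is in the STFT inside `mel_spectrogram_torch`: cuFFT only implements half-precision FFTs on GPUs with compute capability 5.3 or higher, and the K80 reports SM 3.7. A small standalone sketch (not part of the VITS code) to check what your card reports:

```python
import torch

# cuFFT only supports half-precision (fp16) FFTs on GPUs with
# compute capability >= 5.3 (SM_53); the Tesla K80 is SM_37, so an
# fp16 STFT in mel_processing.py raises the error above.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
if (major, minor) < (5, 3):
    print("fp16 FFTs are unsupported on this GPU; run the STFT in fp32 instead.")
```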

@nikich340

Did you try setting "fp16_run": false in the config?
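
If I recall the layout correctly, that flag lives in the `train` section of the config JSON (e.g. `configs/ljs_base.json`); a minimal excerpt with the other keys omitted:

```json
{
  "train": {
    "fp16_run": false
  }
}
```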

@skilomlg

You cannot train VITS on a K80. K80s are very weak GPUs. You need at least a Tesla P100 or a T4 to avoid errors when training. One explanation is that K80s don't have enough memory for training.

@nikich340

> You cannot train VITS on a K80. K80s are very weak GPUs. You need at least a Tesla P100 or a T4 to avoid errors when training. One explanation is that K80s don't have enough memory for training.

That's not true. You can train on any GPU that supports CUDA, but you have to set a batch size that fits in memory. A smaller batch size reduces the resulting quality, true, but that doesn't mean "you cannot train".
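
For example, on a memory-limited card you would lower `batch_size` in the same `train` section of the config; the exact value that fits is hardware-dependent, and 8 here is only an illustration:

```json
{
  "train": {
    "fp16_run": false,
    "batch_size": 8
  }
}
```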
