During distributed training with PyTorch, the number of training steps reported by the Trainer increases with the number of processes.
To reproduce:
Transformers: 4.41.2
Torch: 2.3.1
Accelerate: 0.31.0
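For completeness, a minimal script along these lines shows the behavior (the model, dataset, and hyperparameters below are placeholders, not the exact setup from this report); launch with e.g. `torchrun --nproc_per_node=4 repro.py`:

```python
# Hypothetical minimal repro -- model, dataset, and hyperparameters are placeholders.
# Launch with e.g. `torchrun --nproc_per_node=4 repro.py` or `accelerate launch repro.py`.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)


def main():
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

    dataset = load_dataset("imdb", split="train")
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True,
    )

    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=8,
        num_train_epochs=1,
    )
    trainer = Trainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
    # Compare the max_steps / progress-bar total reported with 1 process vs. 4 processes:
    # with a duplicated dataloader it comes out ~4x larger than expected on 4 processes.
    trainer.train()


if __name__ == "__main__":
    main()
```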
This happens because the dataloader is duplicated across the 4 GPUs instead of being sharded. HF moved the sharding logic into Accelerate: the `Accelerator` is what prepares the dataloader for the current training configuration.
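For reference, this is roughly what the sharding is supposed to do (a sketch, not the Trainer's exact code path): `accelerator.prepare()` wraps the dataloader so each process only iterates over its own shard, which is what shortens the per-process length.

```python
# Sketch: how Accelerate shards a dataloader across processes.
# Run with e.g. `accelerate launch --num_processes 4 shard_demo.py` (names are illustrative).
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

dataset = TensorDataset(torch.arange(1_000))
dataloader = DataLoader(dataset, batch_size=10)

# Before prepare(): every process sees the full 100 batches.
print(f"rank {accelerator.process_index}: unprepared len = {len(dataloader)}")

# After prepare(): each process only sees its shard, ~25 batches with 4 processes.
dataloader = accelerator.prepare(dataloader)
print(f"rank {accelerator.process_index}: prepared len = {len(dataloader)}")
```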
https://github.com/huggingface/transformers/blob/e65502951593a76844e872fee9c56b805598538a/src/transformers/trainer.py#L904
Here, the correct number of training steps should be 125K.
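As a rough back-of-the-envelope illustration of the inflation (the numbers below are placeholders, not the actual dataset size from this report): the Trainer derives `max_steps` from `len(train_dataloader)`, so if every rank iterates over the full dataset, the reported step count ends up `world_size` times too large.

```python
# Placeholder numbers to illustrate the step-count inflation (not the real dataset).
dataset_len = 80_000
per_device_batch_size = 8
world_size = 4

# Per-rank batches with a properly sharded dataloader:
sharded_steps_per_epoch = dataset_len // (per_device_batch_size * world_size)   # 2_500

# Per-rank batches when the dataloader is duplicated on every rank:
duplicated_steps_per_epoch = dataset_len // per_device_batch_size               # 10_000

# The Trainer computes max_steps from len(train_dataloader), so duplication
# multiplies the reported number of training steps by world_size.
print(sharded_steps_per_epoch, duplicated_steps_per_epoch)
```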