This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

A question about dataloader setting: shuffle? #103

Open
Classmate-Huang opened this issue May 12, 2022 · 2 comments

Comments

@Classmate-Huang

Excellent work!

I see that your training script does not set shuffle=True when loading data. Does this setting have any effect on performance?

Does using shuffle=True have a positive or a negative effect?

@Yuxin-Du-Lab

Same question. Have you reached a conclusion? Thx

@Yuxin-Du-Lab

To add an explanation:
shuffle in DistributedSampler is True by default. When a DataLoader is given a DistributedSampler, the sampler handles the shuffling, so you should not also set shuffle=True in the DataLoader; the two options are mutually exclusive, and the DataLoader raises a ValueError if both are set. The sampler shuffles the indices with the same seed on every process and then hands each process its own disjoint subset, so each process sees different data in random order. To get a different order in each epoch, call sampler.set_epoch(epoch) before iterating. Therefore, in a distributed environment, set shuffle only in the DistributedSampler.
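
A minimal sketch of this behavior, not taken from the repository's training script. The toy dataset, the helper name indices_for, and the two-process world size are assumptions made for illustration; num_replicas and rank are passed explicitly so the snippet runs in a single process without initializing a process group.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset of 8 samples (assumption); each value doubles as its own index.
dataset = TensorDataset(torch.arange(8))


def indices_for(rank, epoch, world_size=2):
    """Return the sample indices that one simulated process sees in one epoch."""
    # shuffle=True is the default; it is spelled out here for clarity.
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=True, seed=0
    )
    # Reseeds the permutation for this epoch; call it before each epoch.
    sampler.set_epoch(epoch)
    # Note: no shuffle= argument on the DataLoader, the sampler does the shuffling.
    loader = DataLoader(dataset, batch_size=4, sampler=sampler)
    return [int(i) for (batch,) in loader for i in batch]


rank0 = indices_for(rank=0, epoch=0)
rank1 = indices_for(rank=1, epoch=0)

# Passing both shuffle=True and a sampler is rejected by the DataLoader.
try:
    DataLoader(
        dataset,
        batch_size=4,
        shuffle=True,
        sampler=DistributedSampler(dataset, num_replicas=2, rank=0),
    )
    both_allowed = True
except ValueError:
    both_allowed = False
```

Here rank0 and rank1 together cover every sample exactly once, which is why shuffling in the sampler is enough on its own.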
