A question about dataloader setting: `shuffle`? #103

Classmate-Huang · 2022-05-12T09:31:54Z

Excellent work!

I see that your training script does not set shuffle=True when loading data. Does this setting have any effect on performance?

Does using shuffle=True have a positive or a negative effect?

Yuxin-Du-Lab · 2023-03-12T15:39:44Z

Same question. Have you reached a conclusion? Thx

Yuxin-Du-Lab · 2023-03-14T07:52:15Z

Adding an explanation:
`shuffle` in the `DistributedSampler` is `True` by default. When the sampler already shuffles, you do not need to set `shuffle` in the `DataLoader` that uses it; passing both `sampler` and `shuffle=True` to a `DataLoader` raises a `ValueError`. The `DistributedSampler` uses the same seed on every process, so all processes compute the same permutation for a given epoch and each one takes its own disjoint slice of it. To get a different order in each epoch, call `sampler.set_epoch(epoch)` at the start of the epoch. Therefore, in a distributed environment, set shuffle only in the `DistributedSampler`.
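For anyone who wants to check this locally, here is a minimal sketch of the behaviour described above. It is not the repository's training script: the toy dataset, the helper `indices_for`, and the world size of 2 are assumptions made for illustration. `num_replicas` and `rank` are passed explicitly so the snippet runs in a single process without initializing a process group.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset (assumption): 8 samples whose values equal their indices.
dataset = TensorDataset(torch.arange(8))


def indices_for(rank, epoch, world_size=2):
    """Hypothetical helper: indices one rank would see in one epoch."""
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=True, seed=0
    )
    sampler.set_epoch(epoch)  # the permutation is derived from seed + epoch
    return list(sampler)


# Both ranks share one permutation and take disjoint slices of it.
rank0 = indices_for(0, epoch=0)
rank1 = indices_for(1, epoch=0)
covers_dataset = sorted(rank0 + rank1) == list(range(8))

# Typical usage: shuffle lives in the sampler, not in the DataLoader.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

num_batches = 0
for epoch in range(2):
    sampler.set_epoch(epoch)  # without this, every epoch reuses the same order
    for (batch,) in loader:
        num_batches += 1

# Combining a sampler with shuffle=True is rejected by DataLoader.
try:
    DataLoader(dataset, batch_size=2, sampler=sampler, shuffle=True)
    combining_raises = False
except ValueError:
    combining_raises = True
```

In a real multi-GPU run you would normally leave out `num_replicas` and `rank` so the sampler reads them from the initialized process group.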
