
Can we shuffle dataset first and then split the dataset into training set and validation set? #649

Answered by wanghan-iapcm
theAfish asked this question in Q&A

All data in a dataset generated by dpgen are critical: every selected data point was estimated to be predicted with poor accuracy and was added to the training dataset to improve the quality of the model. In other words, removing any data from the dataset may reduce the accuracy of the model. It is therefore recommended to generate an independent validation set rather than splitting the dataset generated by dpgen.
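
As a rough illustration of this recommendation, here is a minimal sketch that keeps every dpgen system for training and points validation at separately generated systems. It assumes a DeePMD-kit-style training input in which `training_data` and `validation_data` each carry their own `systems` list. The helper `build_training_section` and all directory names are hypothetical and only serve the example.

```python
import json


def build_training_section(dpgen_systems, validation_systems):
    """Use all dpgen systems for training and separate systems for validation.

    Hypothetical helper: the key names below assume a DeePMD-kit-style input.
    """
    # An independent validation set must not reuse any dpgen system.
    overlap = set(dpgen_systems) & set(validation_systems)
    if overlap:
        raise ValueError(
            f"validation set is not independent, shared systems: {sorted(overlap)}"
        )
    return {
        "training_data": {"systems": list(dpgen_systems)},
        "validation_data": {"systems": list(validation_systems)},
    }


# Hypothetical directory names, for illustration only.
dpgen_systems = ["dpgen_data/sys.000", "dpgen_data/sys.001"]
validation_systems = ["independent_val/sys.000"]

section = build_training_section(dpgen_systems, validation_systems)
print(json.dumps(section, indent=2))
```

The overlap check is only a guard against accidentally listing a dpgen system as validation data. How the independent set is produced (for example, from a separate simulation) is left open here.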

Answer selected by AnguseZhang