
[Have you overcome overfitting problem] #3

Open
vuthede opened this issue Oct 5, 2020 · 8 comments

Comments

vuthede commented Oct 5, 2020

Hi there,
@vitrioil I just want to ask: have you overcome the overfitting problem that you reported in the README?
Do you have any idea what is causing the overfitting, and any ideas for overcoming it?
Also, how much data did you train on? Thanks.

vitrioil (Owner) commented Oct 5, 2020

Hi,

I did not solve the issue. I trained with around 20k audio clips, for 2-person speech separation only. I would now assume that more data than that is required. I did not experiment much with the model, simply because training a single epoch always took 1-2 days, so a couple of epochs would take weeks. This will vary depending on your GPU and VRAM availability; I would say more than 16GB of VRAM would help. So there is a lot of opportunity to tweak the model. I also found this; it could be helpful.

vuthede (Author) commented Oct 5, 2020

Hi, thanks for your quick reply.
In the paper, it seems they do some preprocessing to remove noise from the input. Do you think that might help?

vitrioil (Owner) commented Oct 6, 2020

You mean while preparing the dataset? Well, I've seen someone mention that here. However, adding additional noise such as AudioSet clips might help regularise.
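
Here's a minimal sketch of that kind of augmentation, assuming clips are already loaded as mono float waveforms at a common sample rate. The function name and the SNR convention are my own choices for illustration, not something from this repo:

```python
import numpy as np

def mix_with_noise(speech, noise, snr_db=10.0, rng=None):
    """Mix a noise clip into a speech clip at a target SNR in dB.

    Both inputs are mono float waveforms at the same sample rate.
    """
    rng = rng if rng is not None else np.random.default_rng()

    # Loop the noise if it is shorter than the speech, then crop a
    # random window of the same length as the speech.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    start = int(rng.integers(0, len(noise) - len(speech) + 1))
    noise = noise[start:start + len(speech)]

    # Scale the noise so that 10*log10(P_speech / P_noise) == snr_db.
    speech_power = float(np.mean(speech ** 2)) + 1e-10
    noise_power = float(np.mean(noise ** 2)) + 1e-10
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

During dataset generation, each 2-speaker mixture could additionally be passed through something like mix_with_noise with a random AudioSet clip and a random SNR (say, 0-20 dB), so the model never sees perfectly clean mixtures.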

vuthede (Author) commented Oct 7, 2020

Yeah, thanks.
Although you ran into the overfitting problem, I am curious: could your model distinguish the voices of the 2 people to some extent?

vitrioil (Owner) commented Oct 7, 2020

Yes, in certain instances you could make out who the main speaker was in the separated output, but not always. Sometimes it was only noise, or a mix of both speakers; for the most part the output was noisy. All of this also applied to the training data, though to a lesser extent. As I said, a lot of time is required for a model/dataset this big.

JuanFMontesinos commented

Probably related to #4

MordehayM commented

Hi @vitrioil,
Regarding the 20k audio clips: do you mean you downloaded 200 videos and extracted the 20k audio clips from them? (200*199/2 = 19,900, which is the number of pairwise combinations available for creating the mixed clips from 200 videos.)

vitrioil (Owner) commented

Hi @MordehayM,

I believe it was 20k unique clips. 200C2 is indeed about 19.9k; however, not all combinations are considered.

There is a parameter, REMOVE_RANDOM_CHANCE (in audio_mixer_generator.py), which prevents the number of combinations from blowing up; otherwise a very large number of files would be created. By default the value is 0.9.

Hence, I was not taking all combinations of files.
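
For intuition, here is a rough sketch of how such a parameter could behave, assuming each candidate pair is independently skipped with probability REMOVE_RANDOM_CHANCE. The helper mixture_pairs is hypothetical, and the actual logic in audio_mixer_generator.py may differ:

```python
import itertools
import random
from math import comb

REMOVE_RANDOM_CHANCE = 0.9  # default mentioned above

def mixture_pairs(files, remove_chance=REMOVE_RANDOM_CHANCE, seed=0):
    """Yield pairs of clips to mix, randomly dropping most combinations
    so the generated dataset stays a manageable size."""
    rnd = random.Random(seed)
    for a, b in itertools.combinations(files, 2):
        if rnd.random() < remove_chance:
            continue  # skip this pair; no mixture file is written for it
        yield a, b

# Rough expected counts for 200 source videos:
total = comb(200, 2)        # 19900 possible pairs
kept = total * (1 - 0.9)    # ~1990 pairs kept in expectation
```

Under this reading, dropping pairs independently still iterates over all O(n²) combinations, but it bounds the number of mixture files actually written to roughly (1 - REMOVE_RANDOM_CHANCE) of the total.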
