A quick, easy-to-follow illustration of how to do speaker identification on your own data with NVIDIA's NeMo
- record data
- data prep
- config
- fine tune
- inference
- evaluation
- preprocess into intended format
- slice into audio samples of around 4 seconds each
Recommended tool: pydub
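The slicing step could look roughly like the sketch below. It is only an illustration: the function names, the 2-second minimum for the last clip, and the 16 kHz mono output are assumptions, not something prescribed by NeMo or by these notes. The boundary calculation is kept separate from the pydub calls so it can be checked without any audio files.

```python
from pathlib import Path

CHUNK_MS = 4000      # target clip length, about 4 seconds
MIN_TAIL_MS = 2000   # assumption: drop a trailing remainder shorter than this


def chunk_bounds(total_ms, chunk_ms=CHUNK_MS, min_tail_ms=MIN_TAIL_MS):
    """Return (start_ms, end_ms) pairs that cover a recording of total_ms."""
    bounds = []
    for start in range(0, total_ms, chunk_ms):
        end = min(start + chunk_ms, total_ms)
        if end - start >= min_tail_ms:
            bounds.append((start, end))
    return bounds


def slice_file(src_path, out_dir):
    """Cut one recording into clips and write them as WAV files."""
    # Imported here so chunk_bounds can be used without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(src_path)
    # Assumption: the model expects 16 kHz mono audio; adjust to your config.
    audio = audio.set_frame_rate(16000).set_channels(1)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(src_path).stem

    written = []
    for i, (start, end) in enumerate(chunk_bounds(len(audio))):
        clip_path = out_dir / f"{stem}_{i:04d}.wav"
        # pydub slices AudioSegment objects by milliseconds.
        audio[start:end].export(str(clip_path), format="wav")
        written.append(clip_path)
    return written
```

For example, `slice_file("recordings/alice.wav", "clips/alice")` (hypothetical paths) would write `alice_0000.wav`, `alice_0001.wav`, and so on. Keeping one output folder per speaker makes it easy to derive labels later. Reading formats other than WAV with pydub requires ffmpeg to be installed.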
- change from default config
- documentation & instructions
- fine tune script
- inference script