Compendium for the paper by Karhila, Smolander, Ylinen & Kurimo submitted to Interspeech 2019
Contents:
- List of the audio files used to train the recogniser
- Python pickle of N-best hypotheses of recognition results and assorted metadata
- phoneset.html: Description of the phoneme set used (HTML preview)
- scoring_experiment.ipynb: Jupyter notebook for running the scoring experiment
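To illustrate the kind of distance the PWLD methods in the results compute, here is a minimal weighted Levenshtein distance over phoneme sequences. It assumes that PWLD is an edit distance with per-pair substitution costs, which is a reading of the name and not something this README states. The function names and the toy vowel-based cost rule are hypothetical. The actual weights (baseline or data-driven) and the way the notebook turns an N-best list into a single score are not reproduced here.

```python
from typing import Callable, Sequence


def weighted_levenshtein(
    ref: Sequence[str],
    hyp: Sequence[str],
    sub_cost: Callable[[str, str], float],
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Minimum total cost of editing the reference phonemes into the hypothesis."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = cheapest alignment of ref[:i] with hyp[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + del_cost
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = ref[i - 1], hyp[j - 1]
            diag = dp[i - 1][j - 1] + (0.0 if a == b else sub_cost(a, b))
            dp[i][j] = min(diag, dp[i - 1][j] + del_cost, dp[i][j - 1] + ins_cost)
    return dp[n][m]


# Hypothetical cost rule: swapping one vowel for another is cheaper than
# any other substitution. A real PWLD would use a full phoneme-pair table.
VOWELS = set("aeiouy")


def toy_sub_cost(a: str, b: str) -> float:
    return 0.5 if a in VOWELS and b in VOWELS else 1.0


if __name__ == "__main__":
    ref = ["k", "a", "t"]
    print(weighted_levenshtein(ref, ["k", "e", "t"], toy_sub_cost))  # 0.5
    print(weighted_levenshtein(ref, ["k", "a", "t", "s"], toy_sub_cost))  # 1.0
```

Making `sub_cost` a function keeps the dynamic programme unchanged whether the weights come from a hand-made table or are learned from data.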
Over several runs of the experiments, the correlations between the predicted scores and the human annotations are:
| Method | Correlation (all) | Correlation (outliers removed) |
|---|---|---|
| Baseline PWLD | 0.43 ± 0.00 | 0.47 ± 0.00 |
| Data-driven PWLD | 0.47 ± 0.00 | 0.53 ± 0.01 |
| Random Forest | 0.50 ± 0.00 | 0.54 ± 0.01 |
| Support Vector | 0.51 ± 0.02 | 0.55 ± 0.04 |
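Figures of this shape can be produced roughly as follows. This is a minimal sketch under several assumptions that the README does not confirm: the correlation is Pearson's r, "±" is the standard deviation across runs, and outliers are items with a large residual from a least-squares fit of human scores on predicted scores. The outlier rule, the `z_max` threshold and all function names are hypothetical, and the demo uses synthetic data, not the experiment's scores.

```python
import numpy as np


def pearson(pred: np.ndarray, human: np.ndarray) -> float:
    # Pearson's r (assumed; the README does not name the correlation measure).
    return float(np.corrcoef(pred, human)[0, 1])


def drop_outliers(pred: np.ndarray, human: np.ndarray, z_max: float = 2.0):
    # Hypothetical outlier rule: fit human ~ pred by least squares and drop items
    # whose residual exceeds z_max times the standard deviation of the residuals.
    slope, intercept = np.polyfit(pred, human, 1)
    resid = human - (slope * pred + intercept)
    keep = np.abs(resid) <= z_max * resid.std()
    return pred[keep], human[keep]


def summarise(runs):
    """Mean and standard deviation of r across runs, with and without outliers.

    `runs` is a list of (pred, human) array pairs, one pair per experiment run.
    """
    r_all = [pearson(p, h) for p, h in runs]
    r_trim = [pearson(*drop_outliers(p, h)) for p, h in runs]
    return (np.mean(r_all), np.std(r_all)), (np.mean(r_trim), np.std(r_trim))


if __name__ == "__main__":
    # Synthetic stand-in data: three runs of noisy predictions of 200 human scores.
    rng = np.random.default_rng(0)
    human = rng.uniform(1, 5, size=200)
    runs = [(human + rng.normal(0, 1, size=200), human) for _ in range(3)]
    (m_all, s_all), (m_trim, s_trim) = summarise(runs)
    print(f"all: {m_all:.2f} ± {s_all:.2f} | outliers removed: {m_trim:.2f} ± {s_trim:.2f}")
```

Note that removing badly fitted items will by construction tend to raise the correlation, so the outlier rule used in the notebook matters when comparing the two columns.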