NLP-speech-to-text

Convert speech to text using HuggingFace, comparing Wav2Vec2 versus OpenAI Whisper

Data

Speech samples included a subset of sentences recorded for this study:

Reuter, T., Sullivan, M., & Lew-Williams, C. (2021). Look at that: Spatial deixis reveals experience- related changes in prediction. Language Acquisition. https://doi.org/10.1080/10489223.2021.1932905

Audio for lab-based experiments are very clean. So this should be an easy transcription task.

Conclusion

IMO, Whisper beats Wav2Vec2 in at least 3 ways:

More performant.

Transcribed 20% faster.
Future enhancements could increase speed.

More accurate.

Transcribed "apple" versus "apples" correctly.
Spelled "doggies" correctly as "doggies", not as "DOGGIYS".

More nuanced.

Transcribed 3 sentences with emphatic punctuation (! instead of .)
Punctuation indicates emphasis and emotion, useful for downstream sentiment analysis.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
speech_samples		speech_samples
README.md		README.md
speech_to_text.ipynb		speech_to_text.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NLP-speech-to-text

Data

Conclusion

About

Languages

tracyreuter/NLP-speech-to-text

Folders and files

Latest commit

History

Repository files navigation

NLP-speech-to-text

Data

Conclusion

About

Topics

Resources

Stars

Watchers

Forks

Languages