Koel Labs builds tools that provide real-time feedback to improve your pronunciation.
Check out the project proposal for details.
So you are learning a new language: you have done the flashcards and memorized the grammar patterns, and yet, after endless hours of practice, your speech sounds broken, unnatural, and slow. Over 48% of people who speak another language feel anxious about their accent, and 35% of English speakers want to get rid of their accent when speaking a foreign language (Babel Language Anxiety Study). While challenging, mastering the sound and tone of a language can transform a person's ability to communicate effectively. But committing new mouth movements to muscle memory and memorizing inconsistent rules about syllable stress is tedious, demoralizing to do alone (and expensive to do with a human tutor), and requires great discipline.
Mastering the pronunciation, tones, and style of a language is hard without a personal tutor, and many learners don't even know where to start improving their pronunciation.
Large language classrooms in schools, with 30-40+ students per teacher, provide little personal feedback on speaking and conversation practice. An AI-based tool that gives this personal feedback could significantly supplement such classrooms.
Most language learning apps do not provide pronunciation feedback at all (or only superficially, by flagging whether each word was recognized), leaving it up to learners to assess their own pronunciation by comparing it with provided audio clips. It is hard, if not impossible, to evaluate oneself while learning a language; an outside observer (preferably someone already fluent in the language) is needed for a proper assessment. The apps that do incorporate pronunciation are unengaging and inaccessible to people who do not know linguistic phonemes, and no app holistically covers stress/pitch accent, intonation, tone, and cadence.
The main contribution we aim to make is real-time personal feedback. This should come in the form of actionable corrections on grammar and pronunciation delivered in an interactive format: a conversation with a language coach powered by an LLM and audio models, so that learners can ask questions about the pronunciation feedback as they practice engaging dialogues from their favorite shows. We want to go beyond the existing approach of leaving the learner to compare their pronunciation with lackluster provided audio clips.
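As a rough sketch, the conversational layer could be as thin as wrapping pronunciation analysis in a chat prompt. The client library, model name, and prompt below are placeholders rather than a committed stack:

```python
# Hypothetical sketch of the coaching loop: pronunciation analysis goes in,
# a conversational explanation comes out. The OpenAI client and model name
# are stand-ins; any chat-capable LLM could fill this role.
from openai import OpenAI

client = OpenAI()

def coach_reply(pronunciation_notes: str, learner_question: str) -> str:
    """Turn raw pronunciation analysis into a conversational answer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a friendly pronunciation coach. Explain "
                    "corrections in plain language and never show raw IPA."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Analysis of my last attempt:\n{pronunciation_notes}\n\n"
                    f"My question: {learner_question}"
                ),
            },
        ],
    )
    return response.choices[0].message.content
```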
Behind the scenes, this will take advantage of IPA phonemic transcription (symbolic representations of sound). Our goal is to hide this from the learner, since most people are not familiar with IPA phonemes. Instead, the app will identify mispronounced words and explain how their syllables are pronounced using examples from the learner's native language (when applicable; not all sounds exist in all languages) and words they already know how to pronounce that share sounds with the ones they struggle with.
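To make the hidden IPA layer concrete, here is a minimal sketch of locating mispronunciations by aligning the learner's phoneme sequence against a reference one; `difflib` is a stand-in for a phonetically aware aligner that would weight substitutions by articulatory similarity:

```python
# Minimal sketch: align learner vs. reference IPA phoneme sequences to find
# mispronounced segments. difflib is a stand-in for phonetically aware
# alignment; real phoneme inputs would come from an IPA transcription model.
from difflib import SequenceMatcher

def find_mispronunciations(reference: list[str], learner: list[str]):
    """Yield (expected, actual) phoneme spans where the learner diverged."""
    matcher = SequenceMatcher(a=reference, b=learner, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            yield reference[i1:i2], learner[j1:j2]

# Example: a learner replacing the English "th" /θ/ with /s/ in "think".
expected = ["θ", "ɪ", "ŋ", "k"]
actual = ["s", "ɪ", "ŋ", "k"]
for exp, act in find_mispronunciations(expected, actual):
    print(f"expected {exp}, heard {act}")  # expected ['θ'], heard ['s']
```

The diff spans are exactly what gets passed to the coaching layer above, so the learner sees "your 'th' came out as 's'" rather than raw IPA symbols.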
The grand vision is to serve every language learner hoping to improve their pronunciation. The MVP will focus on learners who have lived in the US for a few years, speak a language other than English natively, and still struggle with their English pronunciation. We know several such people in our families and neighborhoods who are interested in being the first user testers, making this a practical target audience (all three of us co-founders come from families of immigrants or are immigrants ourselves!).
Our product is sufficiently differentiated from existing solutions yet sits within a proven market: digital language learning. Duolingo, for example, generated $531 million in revenue in 2023 and is growing fast.
- We have experience fine-tuning/training speech recognition/synthesis models
- We have experience designing production RAG+LLM-powered applications with tens of thousands of monthly active users at various startups (e.g., Gooey.AI)
- We are actively involved in NLP and low-resource language research at University of Washington labs (e.g., UbiComp, ICTD, Yulia's NLP Lab)
- We are great software engineers (interned at T-Mobile, Gooey.AI, and more)
- We have experience learning languages
- We are funded by Mozilla Builders
- Acquire funding from Mozilla Builders
- Evaluate existing IPA transcription models
- Fine-tune/train and evaluate a model that does IPA transcription with timestamps for English (see the sketch after this list)
- Create a proof of concept feedback pipeline
- Create a web application that allows desktop users to practice audio clips from their favorite shows
- Evaluate existing models for pitch accent, ToBI intonation labeling, stress accent detection, etc.
- Curate a small evaluation dataset for prompt engineering and LLM choice. This will include a handful of everyday speech phenomena to ensure the model can explain the relevant tongue positioning, etc.
- Create and iterate on an LLM pipeline to provide feedback to the user
- Create/license visuals/animations for each phoneme (there are fewer than 40 relevant for English) and curate a database of common words in different languages that can be used to explain English sounds to their native speakers
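For the IPA-transcription milestone referenced above, a starting point might look like the sketch below. It assumes the publicly available facebook/wav2vec2-lv-60-espeak-cv-ft checkpoint (a CTC model whose vocabulary is IPA-style phonemes) and derives rough timestamps from CTC frame positions; it illustrates the approach, not our fine-tuned model:

```python
# Sketch of "IPA transcription with timestamps": greedy CTC decoding with a
# public phoneme checkpoint, timestamps inferred from frame positions.
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

CHECKPOINT = "facebook/wav2vec2-lv-60-espeak-cv-ft"  # emits IPA-style phonemes
processor = AutoProcessor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForCTC.from_pretrained(CHECKPOINT)

def phonemes_with_timestamps(waveform, sample_rate=16_000):
    """Return (phoneme, start_time_seconds) pairs for a mono 16 kHz waveform."""
    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits[0]  # (frames, vocab)
    frame_ids = logits.argmax(dim=-1).tolist()
    # wav2vec2 downsamples 16 kHz audio by a factor of 320, so each CTC
    # frame covers ~20 ms of audio.
    seconds_per_frame = 320 / sample_rate
    phonemes, previous = [], None
    for frame, token_id in enumerate(frame_ids):
        # Collapse CTC repeats and skip the blank (the pad token).
        if token_id != previous and token_id != processor.tokenizer.pad_token_id:
            phonemes.append(
                (processor.tokenizer.decode([token_id]), frame * seconds_per_frame)
            )
        previous = token_id
    return phonemes
```

Greedy per-frame decoding like this only gives approximate phoneme onsets; precise boundaries would call for a forced aligner.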
- Support more languages for learning
- Expand the number of available audio sources (e.g., audiobooks from LibriVox)
- A recommendation system that suggests movies/shows appropriate for a learner's language level
- Alexander Metzger (he/him/his) — Founder and CEO — experienced with fine-tuning/training ASR/TTS models, MLOps, full-stack development, startups, and open source.
- Aruna Srivastava (she/her) — Co-Founder and Machine Learning Scientist — experienced with NLP, MLOps, deep learning, and linguistics.
- Ruslan Mukhamedvaleev (he/him) — Co-Founder and Full-Stack Software Developer — experienced with full-stack development (React, NextJS, VueJS), UI/UX design, and branding.
Shipping with ❤️ from Seattle, Washington.
Ethical considerations when teaching language at scale include not pushing a single "standard" dialect (e.g., White-American pronunciation) and thereby perpetuating narrow expectations of how words should be pronounced. We hope to address this by offering a wide variety of movies/shows with different dialects for learners to choose from (and by leveraging audio models with diverse dialect support).
Frontend application code is open-sourced under the FSL-1.1-Apache-2.0 license.
Model weights, training code, cleaned datasets, etc. are open-sourced under the GNU Affero General Public License. The exceptions are a few models and Hugging Face Spaces released during the builders program under the Mozilla Public License.
We are excited to have participated in the 2024 cohort of the Mozilla Builders Accelerator.
- Machine Learning data processing, training, and evaluation code: Link
- Web Application: Link
- Python ML inference server: Link
Check out our contributing guidelines here. You are welcome to get involved with suggestions/feedback through issues on our public repos. Support through the PayPal and Patreon links at the top of the README is always appreciated; it goes towards our data collection and model training efforts and supports us financially. Happy language learning!