Arpagen: A Corpus and Baseline for Phoneme-Level Text Generation

We explore the performance of a phoneme-based text generation model. Character-based models have a small set of possible inputs and therefore incur high computational costs to model long-term dependencies. Word-based models are accurate and require less computation, but in contrast to character-based models they have an overwhelming input size, with tens of thousands of possible unique words. A phoneme-based model attempts to bridge this gap by offering more unique inputs than a character-based model but substantially fewer than a word-based model. We evaluate the performance of this phoneme-based model against character-based and word-based models using BLEU, ROUGE, and human-based metrics.
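To make the three input granularities concrete, here is a minimal sketch that tokenizes one sentence at the character, phoneme, and word level. It assumes the phoneme inventory is ARPAbet (as the project name suggests) with CMUdict-style stress digits. The tiny `TOY_LEXICON`, the `<w>` word-boundary token, and the function names are hypothetical and for illustration only; they are not taken from the project's code.

```python
# Hypothetical toy lexicon for illustration; entries use ARPAbet symbols
# with CMUdict-style stress digits. A real system would load a full
# pronunciation dictionary instead.
TOY_LEXICON = {
    "the": ["DH", "AH0"],
    "cat": ["K", "AE1", "T"],
    "sat": ["S", "AE1", "T"],
}


def char_tokens(text):
    """Character-level tokens: tiny vocabulary, long sequences."""
    return list(text)


def word_tokens(text):
    """Word-level tokens: short sequences, very large vocabulary."""
    return text.lower().split()


def phoneme_tokens(text, lexicon=TOY_LEXICON, boundary="<w>"):
    """Phoneme-level tokens, with an assumed boundary marker between words."""
    tokens = []
    for word in word_tokens(text):
        tokens.extend(lexicon[word])  # KeyError for words missing from the lexicon
        tokens.append(boundary)
    return tokens[:-1]  # drop the trailing boundary (no-op on an empty list)


sentence = "the cat sat"
chars = char_tokens(sentence)
words = word_tokens(sentence)
phones = phoneme_tokens(sentence)

print(len(chars), chars)
print(len(phones), phones)
print(len(words), words)
```

On a sentence this short the sequence lengths are not very informative. The trade-off described above is about vocabulary size: a few dozen characters, a fixed phoneme inventory that is somewhat larger, and tens of thousands of words.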

Final project for LIGN 167 Deep Learning for Natural Language Understanding, UCSD.
