GitHub - Niger-Volta-LTI/menyo-20k_MT

MENYO-20k: A Multi-domain English - Yorùbá Corpus for Machine Translation

MENYO-20k is a multi-domain parallel dataset with texts obtained from news articles, TED talks, movie transcripts, radio transcripts, science and technology texts, and other short articles curated from the web and from professional translators. The dataset has 20,100 parallel sentences split into 10,070 training sentences, 3,397 development sentences, and 6,633 test sentences (3,419 multi-domain, 1,714 news domain, and 1,500 TED talks speech transcript domain). The development and test sets are available upon request.
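
As a quick sanity check on the figures above, the following minimal Python sketch confirms that the reported split sizes add up to the stated totals. The dictionary keys (`train`, `dev`, `test`, `ted_talks`, and so on) are illustrative labels chosen here, not file or column names from the repository.

```python
# Split sizes as reported in the dataset description.
# Key names are illustrative labels, not names used in the repository.
splits = {"train": 10_070, "dev": 3_397, "test": 6_633}
test_domains = {"multi-domain": 3_419, "news": 1_714, "ted_talks": 1_500}

total = sum(splits.values())
test_total = sum(test_domains.values())

# The three splits should account for all 20,100 parallel sentences,
# and the per-domain test counts should account for the whole test set.
assert total == 20_100
assert test_total == splits["test"]

# Print each split with its share of the full corpus.
for name, size in splits.items():
    print(f"{name}: {size} sentences ({size / total:.1%} of the corpus)")
```

Roughly half of the corpus is training data, which is worth keeping in mind when comparing against larger parallel corpora.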

License

The dataset is for non-commercial use only, because some of the data sources, such as TED talks and JW News, require permission for commercial use.

Contributors:

David I. Adelani, Jesujoba O. Alabi, Damilola Adebonojo, Adesina Ayeni, Mofe Adeyemi, Ayodele Awokoya, Adebayo O. Adeojo, Babatunde O. Popoola, Olumide Awokoya, Modupe Olaniyi, Princess Folasade, Tolulope Adelani, and Oluyemisi Olaose

Acknowledgement:

This project was supported by the AI4D language dataset fellowship through K4All and Zindi Africa.

If you use this dataset, please cite it:

@dataset{david_ifeoluwa_adelani_2020_4297448,
  author       = {David Ifeoluwa Adelani and
                  Jesujoba O. Alabi and
                  Damilola Adebonojo and
                  Adesina Ayeni and
                  Mofe Adeyemi and
                  Ayodele Awokoya},
  title        = {{MENYO-20k: A Multi-domain English - Yorùbá Corpus 
                   for Machine Translation}},
  month        = nov,
  year         = 2020,
  publisher    = {Zenodo},
  version      = {1.0},
  doi          = {10.5281/zenodo.4297448},
  url          = {https://doi.org/10.5281/zenodo.4297448}
}
