BAEC

  • The Bangor Arabic–English Code-switching (BAEC) corpus consists of 45,251 words and is 436 KB in size.

  • It was collected from different Facebook pages.

  • It includes code-switching between:

    1. MSA and English;
    2. the Saudi dialect and English;
    3. the Egyptian dialect and English.
  • It is manually annotated and provided in XML format.
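
Because the corpus is distributed as XML, a first step is often to check which elements a file contains and roughly how much text it holds. The sketch below does this without assuming anything about the BAEC schema: it walks every element, counts tag names, and counts whitespace-separated tokens. The sample string and its tag and attribute names (`doc`, `seg`, `lang`) are made up for illustration and are not the corpus's actual markup.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_xml(root):
    """Return (tag_counts, token_count) for an XML tree, with no schema assumptions."""
    tag_counts = Counter()
    token_count = 0
    for elem in root.iter():
        tag_counts[elem.tag] += 1
        # Count whitespace-separated tokens in the text directly inside each
        # element and in the tail text that follows its closing tag.
        for chunk in (elem.text, elem.tail):
            if chunk:
                token_count += len(chunk.split())
    return tag_counts, token_count


# Tiny made-up sample; these tag and attribute names are NOT the BAEC schema.
sample = "<doc><seg lang='en'>hello world</seg><seg lang='ar'>مرحبا</seg></doc>"
tags, tokens = summarize_xml(ET.fromstring(sample))
print(tags, tokens)
```

To run it on a real file, pass `ET.parse(path).getroot()` to `summarize_xml`, where `path` points to your local copy of the corpus. The token count from simple whitespace splitting may not match the 45,251 words reported above, since the corpus may have been tokenized differently.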

If you use the BAEC corpus, please cite this paper:

Tarmom, T., Teahan, W., Atwell, E. and Alsalka, M.A., 2020. Compression versus traditional machine learning classifiers to detect code-switching in varieties and dialects: Arabic as a case study. Natural Language Engineering, 26(6), pp.663-676.