Repository for the article "A Measure for Transparent Comparison of Linguistic Diversity in Multilingual NLP Data Sets" (Findings of NAACL 2024)
Tanja Samardzic, Ximena Gutierrez, Christian Bentz, Steven Moran, Olga Pelloni
Data/ Contains the corpus statistics necessary to calculate the diversity measures.
Analyses/ Contains several notebooks that calculate the diversity measures reported in the paper.