This repository contains the Italian corpus collected through the Dodiom game, a collaborative project between the UNIOR NLP Research Group and the NLP Research Group from the Department of Artificial Intelligence and Data Engineering of Istanbul University.
The game has been tested on two languages, Turkish and Italian, with the aim of building a corpus of multiword expressions in both their idiomatic and literal uses through a gamified crowdsourcing approach.
The repository contains a collection of Italian idioms and the corresponding examples suggested by the players of a game with a purpose titled Dodiom for the Italian language (https://t.me/dodiom_it_bot).
The overall Dodiom dataset for the Italian language includes a total of 6,730 samples, split into two sub-datasets: i) with-reward, containing 5,286 samples obtained during a game session in which monetary rewards were given to the best player of each day, and ii) without-reward, containing 1,444 samples.
Each provided example is displayed with the related idiom, the category (idiom/non-idiom) assigned by the player, the total number of likes/dislikes received from other players, any reports provided about vulgarity, improper usage of the platform etc., and the overall calculated rating (dislikes over likes).
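As a rough illustration of the fields listed above, the following Python sketch models one sample and computes its rating. It is not taken from the repository: the class name, the field names and the example values are hypothetical, and it assumes that "dislikes over likes" means the ratio of dislikes to likes, which may differ from how the released rating was actually calculated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One player-submitted example (field names are hypothetical)."""
    idiom: str
    sentence: str
    category: str      # "idiom" or "non-idiom", as assigned by the player
    likes: int
    dislikes: int
    reports: int = 0   # number of reports (vulgarity, improper usage, etc.)


def rating(sample: Sample) -> Optional[float]:
    """Return dislikes divided by likes (assumed reading of the rating).

    Returns None when the sample has no likes, since the ratio is undefined.
    """
    if sample.likes == 0:
        return None
    return sample.dislikes / sample.likes


# Placeholder values for illustration only, not real corpus data.
example = Sample(
    idiom="<idiom>",
    sentence="<sentence>",
    category="idiom",
    likes=4,
    dislikes=1,
)
print(rating(example))  # 0.25
```

Under this reading, lower values indicate examples that other players judged more favourably. Samples with no likes are left unrated here rather than given an arbitrary value.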
The repository also contains the corpus annotated according to an annotation scheme composed of 12 parameters to assess the quality of the sample sentences submitted by the players for the different idioms suggested during the game.
Project coordinator: Prof. PhD Johanna Monti (University of Naples L'Orientale)
Project assistant: PhD Raffaele Manna
Annotators:
- Giuseppina Morza
- Adriana Capasso
- Giovanna Carandente
When using the Italian Dodiom Corpus, please cite:
Morza, G., Manna, R., & Monti, J. (2022, June). Assessing the Quality of an Italian Crowdsourced Idiom Corpus: the Dodiom Experiment. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 4205-4211).
Eryiğit, G., Şentaş, A., & Monti, J. (2023). Gamified crowdsourcing for idiom corpora construction. Natural Language Engineering, 29(4), 909-941.