Italian Dodiom Corpus

This repository contains the Italian corpus collected through the Dodiom game, a collaborative project between the UNIOR NLP Research Group and the NLP Research Group of the Department of Artificial Intelligence and Data Engineering of Istanbul University.

The game has been tested on two languages, namely Turkish and Italian, with the aim of building a corpus of multiword expressions in both their idiomatic and literal uses through a gamified crowdsourcing approach.

The repository contains a collection of Italian idioms and the corresponding examples suggested by the players of a game with a purpose titled Dodiom for the Italian language (https://t.me/dodiom_it_bot).

The overall Dodiom dataset for the Italian language includes a total of 6,730 samples, split into two sub-datasets: i) with-reward, containing 5,286 samples obtained during a session of the game in which monetary rewards were given to the best player of each day, and ii) without-reward, containing 1,444 samples.

Each provided example is displayed with the related idiom, the category (idiom/non-idiom) assigned by the player, the total number of likes/dislikes received from other players, any reports provided about vulgarity, improper usage of the platform etc., and the overall calculated rating (dislikes over likes).
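As a rough illustration of the fields listed above, the following Python sketch models one sample as a record. The class name, field names, and the example values are hypothetical and are not taken from the corpus files; reading "dislikes over likes" as a ratio is also an assumption, so check the actual column names and rating values in the data before relying on it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One player-submitted example. All field names are hypothetical."""
    idiom: str       # the idiom the example was written for
    sentence: str    # the example submitted by the player
    category: str    # "idiom" or "non-idiom", as assigned by the player
    likes: int       # likes received from other players
    dislikes: int    # dislikes received from other players
    reports: int = 0  # reports (vulgarity, improper use of the platform, ...)

    def rating(self) -> Optional[float]:
        """Dislikes divided by likes (assumed reading of the rating).

        Returns None when there are no likes, since the ratio is undefined.
        """
        if self.likes == 0:
            return None
        return self.dislikes / self.likes


# Invented values, for illustration only.
example = Sample(
    idiom="rompere il ghiaccio",
    sentence="Ha raccontato una barzelletta per rompere il ghiaccio.",
    category="idiom",
    likes=8,
    dislikes=2,
)
print(example.rating())
```

Under this reading, a lower rating indicates a better-received example, and samples without any likes need separate handling.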

The repository also contains the corpus annotated according to an annotation scheme composed of 12 parameters to assess the quality of the sample sentences submitted by the players for the different idioms suggested during the game.

Project coordinator: Prof. Johanna Monti, PhD (University of Naples L'Orientale)

Project assistant: Raffaele Manna, PhD

Annotators:

Giuseppina Morza
Adriana Capasso
Giovanna Carandente

When using the Italian Dodiom Corpus please cite:

Morza, G., Manna, R., & Monti, J. (2022, June). Assessing the Quality of an Italian Crowdsourced Idiom Corpus: the Dodiom Experiment. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 4205-4211).

Eryiğit, G., Şentaş, A., & Monti, J. (2023). Gamified crowdsourcing for idiom corpora construction. Natural Language Engineering, 29(4), 909-941.
