Data used to assess the quality of Italian Crowdsourced Idiom Corpus: Dodiom

unior-nlp-research-group/italian-dodiom-corpus

Italian Dodiom Corpus

This repository contains the Italian corpus collected through the Dodiom game, a collaborative project between the UNIOR NLP Research Group and the NLP Research Group from the Department of Artificial Intelligence and Data Engineering of Istanbul University.

The game has been tested on two languages, namely Turkish and Italian, with the aim of building a corpus of multiword expressions in both their idiomatic and literal uses through a gamified crowdsourcing approach.

The repository contains a collection of Italian idioms and the corresponding examples suggested by the players of a game with a purpose titled Dodiom for the Italian language (https://t.me/dodiom_it_bot).

The overall Dodiom dataset for the Italian language includes a total of 6,730 samples, split into two sub-datasets: i) with-reward, containing 5,286 samples collected during a session of the game in which monetary rewards were given to the best player of each day, and ii) without-reward, containing 1,444 samples.

Each provided example is displayed with the related idiom, the category (idiom/non-idiom) assigned by the player, the total number of likes/dislikes received from other players, any reports provided about vulgarity, improper usage of the platform etc., and the overall calculated rating (dislikes over likes).
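As a rough illustration of the record layout described above, the following Python sketch models one sample and counts samples per category. The class name `Sample`, its field names, the helper `count_by_category`, and the example records are all hypothetical: they are not taken from the corpus files, whose actual format and column names may differ. The rating is stored as given rather than recomputed.

```python
from collections import Counter
from dataclasses import dataclass

# Sub-dataset sizes as stated in this README.
WITH_REWARD_SIZE = 5286
WITHOUT_REWARD_SIZE = 1444


@dataclass
class Sample:
    """One player-submitted example (hypothetical field names)."""
    idiom: str        # the idiom the example was submitted for
    sentence: str     # the example sentence written by the player
    category: str     # "idiom" or "non-idiom", as assigned by the player
    likes: int        # likes received from other players
    dislikes: int     # dislikes received from other players
    reports: int      # number of reports (vulgarity, improper usage, etc.)
    rating: float     # overall rating, stored as provided in the data


def count_by_category(samples):
    """Count how many samples fall into each player-assigned category."""
    return Counter(sample.category for sample in samples)


# Invented placeholder records, not real corpus entries.
samples = [
    Sample("idiom A", "placeholder sentence 1", "idiom", 3, 1, 0, 0.0),
    Sample("idiom A", "placeholder sentence 2", "non-idiom", 2, 0, 0, 0.0),
    Sample("idiom B", "placeholder sentence 3", "idiom", 1, 1, 1, 0.0),
]

counts = count_by_category(samples)
total_size = WITH_REWARD_SIZE + WITHOUT_REWARD_SIZE

print(counts)
print(total_size)
```

The two size constants simply restate the figures given for the with-reward and without-reward sub-datasets, so their sum can be checked against the stated total.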

The repository also contains the corpus annotated according to an annotation scheme composed of 12 parameters to assess the quality of the sample sentences submitted by the players for the different idioms suggested during the game.

Project coordinator: Prof. Johanna Monti, PhD (University of Naples L'Orientale)

Project assistant: Raffaele Manna, PhD

Annotators:

  1. Giuseppina Morza
  2. Adriana Capasso
  3. Giovanna Carandente

When using the Italian Dodiom Corpus please cite:

Morza, G., Manna, R., & Monti, J. (2022, June). Assessing the Quality of an Italian Crowdsourced Idiom Corpus: the Dodiom Experiment. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 4205-4211).

Eryiğit, G., Şentaş, A., & Monti, J. (2023). Gamified crowdsourcing for idiom corpora construction. Natural Language Engineering, 29(4), 909-941.
