
SemEval-2019-task-6-HAD

Evaluation of Offensive Tweets with Target Classification. For more details, see the CodaLab page for OffensEval 2019 (SemEval 2019, Task 6).

Sub-tasks

Sub-task A - Offensive language identification (Offensive / Not Offensive)

  • 15 Jan 2019: test data release; 17 Jan 2019: submission deadline

Sub-task B - Automatic categorization of offense types (Targeted Insult and Threats / Untargeted)

  • 22 Jan 2019: test data release; 24 Jan 2019: submission deadline

Sub-task C - Offense target identification (Target: Individual / Group / Other)

  • 29 Jan 2019: test data release; 31 Jan 2019: submission deadline

Contributors

Himanshu Bansal University of Tübingen himanshu.bansal@student.uni-tuebingen.de
Daniel Nagel University of Tübingen daniel.nagel@student.uni-tuebingen.de
Anita Soloveva Lomonosov MSU, University of Tübingen anita.soloveva@student.uni-tuebingen.de

Preprocessing

  1. Lowercasing
  2. Removing URLs, @USER mentions, the characters “ : . , — ˜ ”, digits, and single quotation marks, except in abbreviations and possessives (e.g. u’re → u’re, but about’ → about)
  3. Using ‘=’, ‘!’, ‘?’ and ‘/’ as token splitters (e.g. something!important → something important)
  4. Parsing hashtags (see Christos Baziotis et al., 2017)
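
The steps above can be sketched roughly as follows. This is a reconstruction, not the authors' code: the exact character set, the apostrophe rule, and the hashtag handling are assumptions, and a trivial `#`-stripping placeholder stands in for real hashtag segmentation (step 4).

```python
import re

def preprocess(tweet: str) -> list[str]:
    """Rough sketch of preprocessing steps 1-4 (character set is an assumption)."""
    t = tweet.lower()                          # 1. lowercasing
    t = re.sub(r"https?://\S+|@user", " ", t)  # 2. remove URLs and @USER mentions
    t = re.sub(r"[“”:.,—˜\d]", " ", t)         # 2. remove listed characters and digits
    # keep apostrophes inside words (u're), drop dangling ones (about')
    t = re.sub(r"(?<![a-z])'|'(?![a-z])", " ", t)
    t = re.sub(r"[=!?/]", " ", t)              # 3. token splitters
    # 4. hashtag parsing would go here (e.g. ekphrasis-style segmentation);
    #    as a placeholder we simply strip the '#' sign
    t = t.replace("#", " ")
    return t.split()

print(preprocess("Check this!important link http://t.co/x @USER u're #great"))
# → ['check', 'this', 'important', 'link', "u're", 'great']
```

Note the apostrophe regex only removes quotes that do not sit between two letters, which matches the u’re / about’ example above (straight quotes used here for simplicity).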

Model

We use a unidirectional LSTM-based classifier.
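
The classifier code itself is not included in this README. As an illustration of what "unidirectional" means here, the minimal numpy sketch below runs a single LSTM cell left to right over a sequence and keeps only the final hidden state, which a dense softmax head would then classify. All dimensions, weights, and initialisations are assumptions for the sketch, not the authors' settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(xs, hidden=8, seed=0):
    """Run one unidirectional LSTM cell over a sequence of input vectors.

    xs: array of shape (seq_len, input_dim). Returns the final hidden state,
    which a classifier head (e.g. softmax over the labels) would consume.
    """
    rng = np.random.default_rng(seed)
    d = xs.shape[1]
    # one stacked weight matrix for the input, forget, output and cell gates
    W = rng.normal(scale=0.1, size=(4 * hidden, d + hidden))
    b = np.zeros(4 * hidden)
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in xs:                       # left-to-right pass: unidirectional
        z = W @ np.concatenate([x, h]) + b
        i, f, o = (sigmoid(z[k * hidden:(k + 1) * hidden]) for k in range(3))
        g = np.tanh(z[3 * hidden:])
        c = f * c + i * g              # update cell state
        h = o * np.tanh(c)             # update hidden state
    return h

h = lstm_forward(np.ones((5, 3)))      # 5 tokens, 3-dim embeddings
print(h.shape)                         # (8,)
```

A bidirectional variant (used in the post-competition phase below) would run a second cell right to left and concatenate the two final states.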

Sub-task A: Approaches

1. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions)
2. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with manually created offensive word list
3. Parsing hashtags + LSTM model (architecture parameters are optimized by SVM predictions)
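
The word-list postprocessing in approach 2 is not spelled out above. A plausible minimal reading, with made-up list entries and the OLID label names OFF/NOT, is to override a NOT prediction whenever a listed word occurs in the tweet:

```python
# Hypothetical sketch: the actual word list and override rule are not
# published in this README.
OFFENSIVE_WORDS = {"idiot", "moron"}  # placeholder entries

def postprocess(tokens, model_label):
    """Flip a NOT prediction to OFF if any listed word appears in the tokens."""
    if model_label == "NOT" and OFFENSIVE_WORDS & set(tokens):
        return "OFF"
    return model_label

print(postprocess(["you", "idiot"], "NOT"))  # OFF
```

Such a rule can only raise recall on the OFF class; an OFF prediction is never overturned here.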

Sub-task B: Approaches

1. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions)
2. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with a manually created database of potential insult targets. A large part of it consists of names from top Twitter profiles in the USA, the UK, Saudi Arabia, Brazil, India, Spain, Iran, Iraq, Turkey, Russia and Germany.

Sub-task C: Approaches

1. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions)
2. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with a manually created database of potential insult targets, split into the categories (IND), (GRP) and (OTH).
3. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with the manually created database (see Sub-task C, approach 2) plus personal pronouns, including their contractions.
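
Approaches 2 and 3 are described only loosely above; the sketch below shows one way a categorised target database plus individual-pronoun cues could override the model's output. The database entries, pronoun set, and precedence order are placeholders, not the actual resources.

```python
# Placeholder target database split by category, as in approach 2;
# individual-pronoun cues extend it as in approach 3.
TARGET_DB = {"realdonaldtrump": "IND", "liberals": "GRP", "media": "OTH"}
PRONOUNS_IND = {"he", "she", "you", "u're", "he's", "she's"}

def postprocess_target(tokens, model_label):
    """Prefer a database hit, then an individual-pronoun cue, else the model label."""
    for tok in tokens:
        if tok in TARGET_DB:
            return TARGET_DB[tok]
    if PRONOUNS_IND & set(tokens):
        return "IND"
    return model_label

print(postprocess_target(["the", "media", "lies"], "GRP"))  # OTH
```

The pronoun fallback only helps the (IND) class, since first- and second-person contractions rarely signal group or other targets.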

Post-competition phase

Model

A BiLSTM-based classifier.

Sub-task A: Approaches

1. All preprocessing steps + BiLSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with manually created offensive word list
2. All preprocessing steps + BiLSTM model (architecture parameters are optimized by SVM predictions) with FastText word embeddings + Postprocessing with manually created offensive word list
3. Parsing hashtags + BiLSTM model (architecture parameters are optimized by SVM predictions) with ELMo word embeddings + Postprocessing with manually created offensive word list

All datasets are available on request via email.
