Evaluation of Offensive Tweets with Target Classification. For more details: CodaLab, OffensEval 2019 (SemEval 2019 - Task 6)
- 15 Jan 2019: A test data release - 17 Jan 2019: Submission deadline
- 22 Jan 2019: A test data release - 24 Jan 2019: Submission deadline
- 29 Jan 2019: A test data release - 31 Jan 2019: Submission deadline
Himanshu Bansal University of Tübingen himanshu.bansal@student.uni-tuebingen.de
Daniel Nagel University of Tübingen daniel.nagel@student.uni-tuebingen.de
Anita Soloveva Lomonosov MSU, University of Tübingen anita.soloveva@student.uni-tuebingen.de
- Lowercasing
- Removing URLs, @USER mentions, the characters “ : . , — ˜ ”, digits, and single quotation marks, except those inside contractions and possessives (e.g. u’re → u’re, but about’ → about)
- Using ‘=’, ‘!’, ‘?’ and ‘/’ as token splitters (e.g. something!important → something important)
- Parsing hashtags (see Baziotis et al. 2017)
We are using a unidirectional LSTM-based classifier
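The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the function name `preprocess` and the regular expressions are assumptions, the removed character set is read as `: . , — ˜` (plus the ASCII tilde), and a quotation mark is kept only when it sits between two letters. Hashtag parsing is left out because it relies on an external segmenter.

```python
import re

# Assumed patterns; the original implementation is not shown in this document.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@user\b")           # "@USER" after lowercasing
SPLIT_RE = re.compile(r"[=!?/]")           # characters used as token splitters
STRIP_RE = re.compile(r"[:.,—˜~\d]")       # listed characters and digits
# A single quote is dropped unless it has a letter on both sides.
QUOTE_RE = re.compile(r"(?<![a-z])['’]|['’](?![a-z])")


def preprocess(tweet):
    """Return a list of cleaned, lowercased tokens for one tweet."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)    # URLs go first, before ':' '.' '/' are touched
    text = USER_RE.sub(" ", text)
    text = SPLIT_RE.sub(" ", text)  # something!important -> something important
    text = STRIP_RE.sub("", text)
    text = QUOTE_RE.sub("", text)   # about’ -> about, but u’re stays u’re
    return text.split()
```

For example, `preprocess("something!important")` yields the two tokens `something` and `important`.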
1. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions)
2. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with manually created offensive word list
3. Parsing hashtags + LSTM model (architecture parameters are optimized by SVM predictions)
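The word-list postprocessing mentioned in the second configuration is not specified further here. One plausible reading, shown purely as a sketch, is to override a non-offensive prediction whenever a token matches the list. The override rule, the function name `postprocess_offensive`, the placeholder entries, and the use of the OffensEval `OFF`/`NOT` labels are all assumptions.

```python
# Hypothetical placeholder entries; the team's manually created list is not reproduced here.
OFFENSIVE_WORDS = frozenset({"idiot", "moron"})


def postprocess_offensive(tokens, predicted_label, lexicon=OFFENSIVE_WORDS):
    """Flip a NOT prediction to OFF if any token is in the lexicon (assumed rule)."""
    if predicted_label == "NOT" and any(token in lexicon for token in tokens):
        return "OFF"
    # Otherwise the classifier's prediction is kept unchanged.
    return predicted_label
```

Under this reading the list can only add offensive labels, never remove them, so it trades some precision for recall.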
1. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions)
2. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with a manually created database of potential insult victims as targets. A large part of the database consists of the names of representatives of top Twitter profiles from the USA, the UK, Saudi Arabia, Brazil, India, Spain, Iran, Iraq, Turkey, Russia and Germany.
1. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions)
2. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with a manually created database of potential insult victims as targets, which is split into the categories (IND), (GRP) and (OTH).
3. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with the manually created database (see Sub-task C: 2) and personal pronouns, including their contractions.
BiLSTM-based classifier
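The category-based postprocessing described above could look roughly like the sketch below. How a database match interacts with the model's prediction is not stated, so the rule used here (the first matching category replaces the prediction) is an assumption, as are the function name `postprocess_target`, the placeholder entries, and the assignment of particular pronouns to (IND) or (GRP).

```python
# Hypothetical placeholder entries; the real database and pronoun lists are not shown here.
TARGETS = {
    "IND": {"you", "you’re", "you're", "he", "she"},
    "GRP": {"they", "they’re", "they're", "liberals"},
    "OTH": {"twitter"},
}


def postprocess_target(tokens, predicted_label, targets=TARGETS):
    """Return the first category with a matching token, else the model's label."""
    for label in ("IND", "GRP", "OTH"):  # assumed precedence order
        if any(token in targets[label] for token in tokens):
            return label
    return predicted_label
```

A tweet with no database or pronoun match keeps whatever label the classifier assigned.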
1. All preprocessing steps + BiLSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with manually created offensive word list
2. All preprocessing steps + BiLSTM model (architecture parameters are optimized by SVM predictions) with FastText word embeddings + Postprocessing with manually created offensive word list
3. Parsing hashtags + BiLSTM model (architecture parameters are optimized by SVM predictions) with ELMo word embeddings + Postprocessing with manually created offensive word list