
SemEval-2019-task-6-HAD

Evaluation of Offensive Tweets with Target Classification. For more details, see the CodaLab page for OffensEval 2019 (SemEval 2019, Task 6).

Sub-tasks

Sub-task A - Offensive language identification (Offensive / Not Offensive)

  • 15 Jan 2019: test data release; 17 Jan 2019: submission deadline

Sub-task B - Automatic categorization of offense types (Targeted Insult and Threats / Untargeted)

  • 22 Jan 2019: test data release; 24 Jan 2019: submission deadline

Sub-task C - Offense target identification (Target: Individual / Group / Other)

  • 29 Jan 2019: test data release; 31 Jan 2019: submission deadline

Contributors

Himanshu Bansal University of Tübingen himanshu.bansal@student.uni-tuebingen.de
Daniel Nagel University of Tübingen daniel.nagel@student.uni-tuebingen.de
Anita Soloveva Lomonosov MSU, University of Tübingen anita.soloveva@student.uni-tuebingen.de

Preprocessing

  1. Lowercasing
  2. Removing URLs, @USER mentions, the characters “ : . , — ˜ ”, digits, and single quotation marks, except in abbreviations and possessives (e.g. u’re → u’re, but about’ → about)
  3. Using ‘=’, ‘!’, ‘?’ and ‘/’ as token splitters (e.g. something!important → something important)
  4. Parsing hashtags (see Christos Baziotis et al., 2017)
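
The steps above can be sketched roughly as follows. This is a reconstruction, not the authors' code: the exact character set, the apostrophe rule, and the hashtag handling are assumptions, and a trivial `#`-stripping placeholder stands in for real hashtag segmentation (step 4).

```python
import re

def preprocess(tweet: str) -> list[str]:
    """Rough sketch of preprocessing steps 1-4 (character set is an assumption)."""
    t = tweet.lower()                          # 1. lowercasing
    t = re.sub(r"https?://\S+|@user", " ", t)  # 2. remove URLs and @USER mentions
    t = re.sub(r"[“”:.,—˜\d]", " ", t)         # 2. remove listed characters and digits
    # keep apostrophes inside words (u're), drop dangling ones (about')
    t = re.sub(r"(?<![a-z])'|'(?![a-z])", " ", t)
    t = re.sub(r"[=!?/]", " ", t)              # 3. token splitters
    # 4. hashtag parsing would go here (e.g. ekphrasis-style segmentation);
    #    as a placeholder we simply strip the '#' sign
    t = t.replace("#", " ")
    return t.split()

print(preprocess("Check this!important link http://t.co/x @USER u're #great"))
# → ['check', 'this', 'important', 'link', "u're", 'great']
```

Note the apostrophe regex only removes quotes that do not sit between two letters, which matches the u’re / about’ example above (straight quotes used here for simplicity).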

Model

We use a unidirectional LSTM-based classifier.
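
The classifier code itself is not included in this README. As an illustration of what "unidirectional" means here, the minimal numpy sketch below runs a single LSTM cell left to right over a sequence and keeps only the final hidden state, which a dense softmax head would then classify. All dimensions, weights, and initialisations are assumptions for the sketch, not the authors' settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(xs, hidden=8, seed=0):
    """Run one unidirectional LSTM cell over a sequence of input vectors.

    xs: array of shape (seq_len, input_dim). Returns the final hidden state,
    which a classifier head (e.g. softmax over the labels) would consume.
    """
    rng = np.random.default_rng(seed)
    d = xs.shape[1]
    # one stacked weight matrix for the input, forget, output and cell gates
    W = rng.normal(scale=0.1, size=(4 * hidden, d + hidden))
    b = np.zeros(4 * hidden)
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in xs:                       # left-to-right pass: unidirectional
        z = W @ np.concatenate([x, h]) + b
        i, f, o = (sigmoid(z[k * hidden:(k + 1) * hidden]) for k in range(3))
        g = np.tanh(z[3 * hidden:])
        c = f * c + i * g              # update cell state
        h = o * np.tanh(c)             # update hidden state
    return h

h = lstm_forward(np.ones((5, 3)))      # 5 tokens, 3-dim embeddings
print(h.shape)                         # (8,)
```

A bidirectional variant (used in the post-competition phase below) would run a second cell right to left and concatenate the two final states.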

Sub-task A: Approaches

1. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions)
2. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with manually created offensive word list
3. Parsing hashtags + LSTM model (architecture parameters are optimized by SVM predictions)
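
The word-list postprocessing in approach 2 is not spelled out above. A plausible minimal reading, with made-up list entries and the OLID label names OFF/NOT, is to override a NOT prediction whenever a listed word occurs in the tweet:

```python
# Hypothetical sketch: the actual word list and override rule are not
# published in this README.
OFFENSIVE_WORDS = {"idiot", "moron"}  # placeholder entries

def postprocess(tokens, model_label):
    """Flip a NOT prediction to OFF if any listed word appears in the tokens."""
    if model_label == "NOT" and OFFENSIVE_WORDS & set(tokens):
        return "OFF"
    return model_label

print(postprocess(["you", "idiot"], "NOT"))  # OFF
```

Such a rule can only raise recall on the OFF class; an OFF prediction is never overturned here.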

Sub-task B: Approaches

1. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions)
2. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with a manually created database of potential insult targets. A large part of it consists of names from top Twitter profiles in the USA, the UK, Saudi Arabia, Brazil, India, Spain, Iran, Iraq, Turkey, Russia and Germany.

Sub-task C: Approaches

1. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions)
2. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with a manually created database of potential insult targets, split into the categories (IND), (GRP) and (OTH).
3. All preprocessing steps + LSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with the manually created database (see Sub-task C, approach 2) plus personal pronouns, including their contractions.
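
Approaches 2 and 3 are described only loosely above; the sketch below shows one way a categorised target database plus individual-pronoun cues could override the model's output. The database entries, pronoun set, and precedence order are placeholders, not the actual resources.

```python
# Placeholder target database split by category, as in approach 2;
# individual-pronoun cues extend it as in approach 3.
TARGET_DB = {"realdonaldtrump": "IND", "liberals": "GRP", "media": "OTH"}
PRONOUNS_IND = {"he", "she", "you", "u're", "he's", "she's"}

def postprocess_target(tokens, model_label):
    """Prefer a database hit, then an individual-pronoun cue, else the model label."""
    for tok in tokens:
        if tok in TARGET_DB:
            return TARGET_DB[tok]
    if PRONOUNS_IND & set(tokens):
        return "IND"
    return model_label

print(postprocess_target(["the", "media", "lies"], "GRP"))  # OTH
```

The pronoun fallback only helps the (IND) class, since first- and second-person contractions rarely signal group or other targets.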

Post-competition phase

Model

A BiLSTM-based classifier.

Sub-task A: Approaches

1. All preprocessing steps + BiLSTM model (architecture parameters are optimized by SVM predictions) + Postprocessing with manually created offensive word list
2. All preprocessing steps + BiLSTM model (architecture parameters are optimized by SVM predictions) with FastText word embeddings + Postprocessing with manually created offensive word list
3. Parsing hashtags + BiLSTM model (architecture parameters are optimized by SVM predictions) with ELMo word embeddings + Postprocessing with manually created offensive word list

All datasets are available on request via email.
