This project uses the Jigsaw Unintended Bias in Toxicity Classification dataset, available on Kaggle (https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data):
- train.csv
- test.csv
In this project, gender is chosen as the identity attribute for identifying bias.
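As a hedged illustration of what this gender-focused setup looks like, here is a minimal sketch of loading the training data and flagging gender-related comments. The identity column names (`male`, `female`, `transgender`, `other_gender`) match the Kaggle dataset; the file path is an assumption.

```python
import pandas as pd

# Load the competition training data (path assumes Kaggle's layout).
train = pd.read_csv("train.csv")

# Gender-related identity columns in the Jigsaw dataset; values are the
# fraction of annotators who tagged the comment with that identity.
GENDER_COLS = ["male", "female", "transgender", "other_gender"]

# Flag comments that mention any gender identity at all.
train["gender_mention"] = (train[GENDER_COLS].fillna(0) > 0).any(axis=1)
print(train["gender_mention"].mean())
```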
- Data_Preparation.ipynb: prepares the data so that it can be used in BERT_Data-Classification.ipynb, which in turn lets us study bias (a minimal preparation sketch appears below, after the notebook list).
- BERT_Data-Classification.ipynb: performs text classification by fine-tuning a BERT-based model (a fine-tuning sketch appears below, after the notebook list).
- bias-toxicity-classification.ipynb: performs toxicity classification using Logistic Regression and a single-layer LSTM architecture. Its workflow (sketches of the modeling and metric steps follow the list):
  - Importing libraries
  - Data cleaning
  - Exploratory data analysis
  - Data splitting
  - Toxicity classification with Logistic Regression
  - Toxicity classification with a single-LSTM-layer architecture
  - Comparing overall AUC with the competition's bias-aware AUC metrics
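The exact preprocessing lives in Data_Preparation.ipynb; the following is only a sketch of the kind of preparation involved. The 0.5 threshold follows the competition's definition of a toxic comment; the output filename is an assumption.

```python
import pandas as pd

train = pd.read_csv("train.csv")

# The competition treats target >= 0.5 as toxic; binarize accordingly.
train["toxic"] = (train["target"] >= 0.5).astype(int)

# Keep only the fields the downstream classification notebooks need.
prepared = train[["id", "comment_text", "toxic",
                  "male", "female", "transgender", "other_gender"]]
prepared.to_csv("prepared_train.csv", index=False)
```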
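For BERT_Data-Classification.ipynb, a minimal single-step fine-tuning sketch using the Hugging Face transformers library is shown below; the notebook's actual model variant, hyperparameters, and training loop may differ.

```python
import torch
from torch.optim import AdamW
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Placeholder comments and labels standing in for the prepared dataset.
texts = ["you are wonderful", "you are awful"]
labels = torch.tensor([0, 1])

enc = tokenizer(texts, padding=True, truncation=True,
                max_length=128, return_tensors="pt")

# One optimization step of standard sequence-classification fine-tuning.
optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**enc, labels=labels).loss
loss.backward()
optimizer.step()
```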
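A hedged sketch of the Logistic Regression baseline, assuming TF-IDF features (the notebook's actual feature extraction may differ) and the prepared file from the preparation sketch above:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

train = pd.read_csv("prepared_train.csv")
X_tr, X_va, y_tr, y_va = train_test_split(
    train["comment_text"], train["toxic"], test_size=0.2, random_state=42)

# TF-IDF features feeding a plain logistic regression baseline.
vectorizer = TfidfVectorizer(max_features=50_000)
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_tr), y_tr)

probs = clf.predict_proba(vectorizer.transform(X_va))[:, 1]
print("validation AUC:", roc_auc_score(y_va, probs))
```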
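A sketch of the single-LSTM-layer architecture in Keras; vocabulary size, sequence length, and unit counts are assumptions, not the notebook's actual settings.

```python
from tensorflow.keras.layers import Dense, Embedding, Input, LSTM
from tensorflow.keras.models import Model

MAX_LEN, VOCAB_SIZE, EMB_DIM = 220, 50_000, 128  # assumed sizes

# Single LSTM layer over learned embeddings, with a sigmoid toxicity head.
inp = Input(shape=(MAX_LEN,))
x = Embedding(VOCAB_SIZE, EMB_DIM)(inp)
x = LSTM(64)(x)
out = Dense(1, activation="sigmoid")(x)

model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
model.summary()
```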
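The AUC comparison refers to the competition's bias-aware AUC variants (subgroup AUC, BPSN AUC, BNSP AUC) computed per identity subgroup alongside overall AUC. A hedged sketch of those metrics follows; the array names are placeholders, and the p = -5 power mean follows the competition's published formula.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(y_true, y_pred, subgroup):
    """AUC restricted to comments that mention the identity subgroup."""
    return roc_auc_score(y_true[subgroup], y_pred[subgroup])

def bpsn_auc(y_true, y_pred, subgroup):
    """Background Positive, Subgroup Negative AUC."""
    mask = (subgroup & (y_true == 0)) | (~subgroup & (y_true == 1))
    return roc_auc_score(y_true[mask], y_pred[mask])

def bnsp_auc(y_true, y_pred, subgroup):
    """Background Negative, Subgroup Positive AUC."""
    mask = (subgroup & (y_true == 1)) | (~subgroup & (y_true == 0))
    return roc_auc_score(y_true[mask], y_pred[mask])

def power_mean(values, p=-5.0):
    # Generalized mean; p = -5 emphasizes the worst-performing subgroup.
    return float(np.mean(np.asarray(values) ** p) ** (1.0 / p))
```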