This project performs sentiment analysis on the ArmanEmo dataset, which consists of Persian text labeled with seven different emotional categories. The goal is to classify the emotions of unseen text using state-of-the-art NLP models.
- Mohammad Sadegh Poulaei
- Fatemeh Askari
The ArmanEmo dataset contains text from various sources like social media and Digikala reviews. Each sentence is assigned one of seven emotional labels. The data is provided in two files: `train.tsv` and `test.tsv`.
The preprocessing involved the following steps:
- Converted `tsv` files to `csv`.
- Removed non-Persian characters and symbols.
- Normalized elongated words (e.g., "خیییلللی" → "خیلی").
- Removed Arabic letters and symbols like `_` and `#`.
- Removed Persian numbers.
- Applied further normalization using `parsivar`.
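The preprocessing steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual code: the character whitelist, the rule that collapses runs of three or more identical characters, the two-column `text<TAB>label` layout, and the helper names `clean_text` and `tsv_to_csv` are all assumptions. The final `parsivar` normalization step is left out so the sketch has no third-party dependencies.

```python
import csv
import re

# Assumption: this whitelist approximates "Persian characters".
# Arabic-only letters, digits, Latin text, and symbols such as _ and # are dropped.
PERSIAN_LETTERS = "آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهیئ"
ZWNJ = "\u200c"  # zero-width non-joiner, used inside many Persian words

NOT_ALLOWED = re.compile(f"[^{PERSIAN_LETTERS}{ZWNJ} ]")
ELONGATION = re.compile(r"(.)\1{2,}")  # three or more repeats of one character
SPACES = re.compile(r" +")


def clean_text(text):
    """Keep whitelisted characters, collapse elongations, tidy whitespace."""
    text = NOT_ALLOWED.sub(" ", text)   # replace everything else with a space
    text = ELONGATION.sub(r"\1", text)  # collapse long runs to a single character
    return SPACES.sub(" ", text).strip()


def tsv_to_csv(tsv_path, csv_path):
    """Convert a text<TAB>label file to CSV, cleaning the text column."""
    with open(tsv_path, encoding="utf-8", newline="") as src, \
         open(csv_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.writer(dst)
        for row in reader:
            if len(row) != 2:  # skip malformed lines (assumed two-column layout)
                continue
            text, label = row
            writer.writerow([clean_text(text), label])
```

Collapsing only runs of three or more characters is a deliberate choice in this sketch: it leaves legitimate doubled letters untouched, at the cost of missing two-character elongations.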
We experimented with several transformer models for emotion classification:
- ParsBERT: A Persian-specific BERT model fine-tuned for sentiment tasks.
- ALBERT: A lightweight version of BERT with parameter sharing.
- roberta_facebook: A robust, general-purpose model.
- persian_xlm_roberta_large (Final model): A multilingual version of RoBERTa trained on large datasets, achieving the best accuracy on the test set.
  - Dynamic masking allows for a different mask pattern per mini-batch.
  - Trained on over 100 languages and 2.5 TB of data.
  - Achieved the highest accuracy on the ArmanEmo test set, reaching 69% accuracy after 15 epochs.
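To illustrate the dynamic masking mentioned above, the sketch below re-samples the masked positions every time a sequence is used, so the same sentence can get a different pattern in each mini-batch. It is a simplified illustration rather than the model's actual pretraining code: the `dynamic_mask` helper, the `<mask>` string, and the 15% default rate are assumptions, and the real scheme also sometimes substitutes a random token or keeps the original one.

```python
import random

MASK_TOKEN = "<mask>"  # placeholder mask symbol used in this sketch


def dynamic_mask(tokens, mask_prob=0.15, rng=random):
    """Return a masked copy of `tokens`, sampling positions afresh on each call."""
    return [MASK_TOKEN if rng.random() < mask_prob else tok for tok in tokens]


tokens = ["آرزوی", "موفقیت", "و", "پیروزی", "ایران", "در", "جام", "جهانی"]
rng = random.Random(0)

# The mask is drawn again for every epoch instead of being fixed once
# during preprocessing (which would be static masking).
for epoch in range(3):
    print(epoch, dynamic_mask(tokens, mask_prob=0.3, rng=rng))
```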
We used the following metrics to evaluate model performance:
- Accuracy
- F1 Score
- Precision
- Recall
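As a reference for how these four metrics relate, here is a small pure-Python sketch that computes them from lists of true and predicted labels. The averaging method used in the project is not stated, so macro-averaging (an unweighted mean over classes) is an assumption here, and `macro_metrics` is a hypothetical helper name.

```python
def macro_metrics(y_true, y_pred):
    """Compute accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


scores = macro_metrics(
    ["Happy", "Happy", "Sad", "Angry"],
    ["Happy", "Sad", "Sad", "Angry"],
)
print(scores)
```

With a weighted average instead, each class's score would be multiplied by its share of the true labels, which can give noticeably different numbers on an imbalanced dataset.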
The best-performing model, `persian_xlm_roberta_large`, achieved the following on the test set:
- Accuracy: 69%
- F1 Score: 71%
- Precision: 73%
- Recall: 71%
Some example sentences with their predicted emotions:
Sentence | True Label | Predicted Label |
---|---|---|
"آرزوی موفقیت و پیروزی ایران در جام جهانی" | Happy | Other |
"میخندم ولی دلم پر از غم است" | Happy | Sad |
"آهنگ عالی، راننده خطی صداش کم نمیشه" | Happy | Angry |
To run the code and reproduce the results:
- Clone the repository: `git clone https://github.com/mspoulaei/armanemo-sentiment-analysis.git`
- Open `DL_Project_Model_{model_name}.ipynb` in Google Colab and run it, where `{model_name}` stands for the model you want to run.