Automatic Detection of Depression on COVID-19 Tweets

Project realized by Eleonora Mancini and Eleonora Misino as part of the Natural Language Processing exam of the Master's degree in Artificial Intelligence at the University of Bologna (academic year 2019-2020).

The purpose of this project is to analyze labelling strategies aimed at identifying signs of depression in users' tweets. The tweets used in this analysis refer to a specific period of the COVID-19 pandemic. In particular, the objective is to understand whether the strategies studied can identify evident signs of depression among users during the pandemic. Four different strategies were developed and analyzed. We did not arrive at a robust solution, but this project highlights some interesting aspects that could be the starting point for a more in-depth analysis.

Data

Project Workflow

  1. COVID-19 Tweets
  • Exploratory Data Analysis
  • Preprocessing
  • Topic Modeling: Latent Dirichlet Allocation
  • Tweet labelling through three strategies, which we call TWINT, VADER, and NRCLex
  • Labelling Comparison
  • Unsupervised Analysis (LSA and Clustering)
  2. CLPsych Dataset
  • Exploratory Data Analysis
  • Feature Extraction
  • Tweet Classification

Please refer to the notebooks folder for a more detailed description.
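This README does not specify how the Labelling Comparison step measures agreement between strategies. As one possible approach, the sketch below compares every pair of strategies using raw percent agreement and Cohen's kappa. It assumes that each strategy assigns one binary label per tweet (1 = depressed, 0 = not depressed). The function names and the toy labels are hypothetical and are not taken from the project notebooks.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of tweets on which two labelling strategies assign the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(a)
    p_o = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability that both strategies pick the same label
    # if each labelled tweets independently with its own label frequencies.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both strategies use the same single label everywhere
    return (p_o - p_e) / (1 - p_e)


def compare_strategies(labels):
    """Map each pair of strategy names to (percent agreement, kappa)."""
    return {
        (s1, s2): (percent_agreement(labels[s1], labels[s2]),
                   cohen_kappa(labels[s1], labels[s2]))
        for s1, s2 in combinations(labels, 2)
    }


if __name__ == "__main__":
    # Hypothetical labels for 8 tweets (illustration only, not project data).
    labels = {
        "TWINT":  [1, 0, 0, 1, 0, 0, 1, 0],
        "VADER":  [1, 0, 1, 1, 0, 0, 0, 0],
        "NRCLex": [0, 0, 1, 1, 0, 1, 0, 0],
    }
    for (s1, s2), (agree, kappa) in compare_strategies(labels).items():
        print(f"{s1} vs {s2}: agreement={agree:.2f}, kappa={kappa:.2f}")
    # TWINT vs VADER: agreement=0.75, kappa=0.47
    # TWINT vs NRCLex: agreement=0.50, kappa=-0.07
    # VADER vs NRCLex: agreement=0.75, kappa=0.47
```

Kappa is reported alongside raw agreement because, when most tweets fall into a single class, two strategies can agree on many tweets purely by chance. Kappa discounts that chance agreement.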

Running the code

To reproduce our results:

  • Download the data (please note that the CLPsych dataset is not publicly available)
  • Download the notebooks from here
  • Run the NLP_Project.ipynb notebook first, then the CLPsych.ipynb notebook

Results

A detailed analysis of the results can be found here.

Authors

Eleonora Mancini, Eleonora Misino

License

This project is licensed under the MIT License - see the LICENSE file for details.