This project demonstrates sentiment analysis of tweets about US airlines, using the BERT architecture to classify each tweet into one of three classes: positive, negative, or neutral. The dataset is the Twitter US Airline Sentiment Dataset, which contains tweets about six major US airlines labeled as positive, neutral, or negative.
In this project, we fine-tune the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model to perform sentiment analysis on tweets. The main steps of the project are as follows:
- Data Preprocessing: Tweets are cleaned by removing links, usernames, emojis, and non-alphabetic characters.
- Label Encoding: Sentiment labels ("positive", "neutral", "negative") are encoded as numerical values.
- Model: A pre-trained BERT model for sequence classification is loaded and fine-tuned for the sentiment analysis task.
- Training: The model is trained on a custom dataset containing the tweet text and encoded labels.
- Evaluation: Model performance is evaluated on the test set using classification metrics such as accuracy.
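The preprocessing and label-encoding steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not necessarily the code in sentiment_analysis.py: the function names `clean_tweet` and `encode_label`, the regular expressions, and the integer assigned to each label are all assumptions.

```python
import re

# Assumed mapping; the integers used by the project itself may differ.
LABEL_TO_ID = {"negative": 0, "neutral": 1, "positive": 2}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
USER_RE = re.compile(r"@\w+")                    # @usernames
NON_ALPHA_RE = re.compile(r"[^A-Za-z\s]")        # emojis, digits, punctuation


def clean_tweet(text: str) -> str:
    """Remove links, usernames, emojis, and other non-alphabetic characters."""
    # Links and usernames go first, while their punctuation is still intact.
    text = URL_RE.sub(" ", text)
    text = USER_RE.sub(" ", text)
    # Emojis, digits, and punctuation are all non-alphabetic,
    # so a single pattern covers them.
    text = NON_ALPHA_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


def encode_label(label: str) -> int:
    """Map a sentiment string to its (assumed) integer class id."""
    return LABEL_TO_ID[label.strip().lower()]
```

With these definitions, `clean_tweet("@united thanks!! http://t.co/abc123 flight 22 was great")` returns `"thanks flight was great"`, and `encode_label("positive")` returns `2` under the assumed mapping.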
Before running the Python application, you'll need the following:
- Python 3.10.12 installed
- Necessary packages (install using pip install -r requirements.txt)
Step 1. Clone the repository to your local machine and switch to the code directory
git clone https://github.com/gautamgc17/Bert-for-Sentiment-Analysis.git
cd Bert-for-Sentiment-Analysis
Step 2. Create a virtual environment and install dependencies
pip install virtualenv
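Create a new virtual environment for the project and activate it. The activation command shown is for Windows; on macOS or Linux, use source env/bin/activate instead.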
virtualenv env
env\Scripts\activate
Now install the project dependencies, which are listed in requirements.txt, in this virtual environment.
pip install -r requirements.txt
Step 3. Download the dataset
Download the Twitter US Airline Sentiment Dataset and save it as Tweets.csv in the repository's root directory.
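Before training, it can help to confirm that the file is in place and readable. The sketch below is an optional check, not part of the repository; the helper names `load_tweets` and `label_counts` are hypothetical, and the column names `text` and `airline_sentiment` are assumptions about the CSV header that you should adjust if your copy differs.

```python
import csv
from collections import Counter
from pathlib import Path


def load_tweets(path, text_col="text", label_col="airline_sentiment"):
    """Return a list of (tweet_text, sentiment_label) pairs from the CSV."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; save the dataset there first")
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early with a clear message if the assumed columns are absent.
        missing = {text_col, label_col} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing expected columns: {sorted(missing)}")
        return [(row[text_col], row[label_col]) for row in reader]


def label_counts(rows):
    """Count how many tweets carry each sentiment label."""
    return dict(Counter(label for _, label in rows))
```

Calling `label_counts(load_tweets("Tweets.csv"))` from the repository root shows how many tweets fall into each class, which is a quick way to spot a truncated or mislabeled download.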
Step 4. Run the Python application for model training
python sentiment_analysis.py
Accuracy obtained on the test set: 87%
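For reference, accuracy is the fraction of test tweets whose predicted class matches the true class. The sketch below shows that calculation, along with per-class recall, in plain Python. The function names are hypothetical and the labels in the usage note are toy values for illustration only, not the project's results.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its examples predicted correctly."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {label: hits[label] / totals[label] for label in totals}
```

On the toy inputs `y_true = [0, 1, 2, 2]` and `y_pred = [0, 1, 1, 2]`, `accuracy` returns `0.75` and `per_class_recall` returns `{0: 1.0, 1: 1.0, 2: 0.5}`. Per-class recall is worth checking alongside accuracy whenever the classes are imbalanced, since a high overall accuracy can hide weak performance on a smaller class.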