Twitter Sentiment Analysis

Requirements

The original dataset, train.csv, contains 99,899 tweets.

Preprocessing usually depends on the type of data under analysis. For Twitter sentiment analysis, the following preprocessing steps were applied:

Removing words that match a particular pattern, e.g. user handles such as @user1
Removing punctuation, numbers and apostrophes
Tokenization
Fixing word length by shortening elongated words; spell correction could also be applied, e.g. converting juuuusssttttt to just
Removing stop words and words with length less than 2
Lemmatization
Removing rare words and the most frequently occurring words
Some manual corrections
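The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in twitter_sentiment.py: the tiny STOP_WORDS set, the lower-casing, the rule that collapses a run of 3+ repeated characters down to 2, and the frequency thresholds are all hypothetical choices. Lemmatization and the manual corrections are left out, since lemmatization needs a lexical resource such as NLTK's WordNetLemmatizer. Because length fixing alone turns juuuusssttttt into juusstt, a spell corrector would still be needed to reach just.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real pipeline would use a
# fuller list such as NLTK's English stop words.
STOP_WORDS = {"a", "am", "an", "the", "is", "are", "was", "to", "of",
              "and", "in", "on", "it", "i", "so"}


def clean_tweet(text):
    """Per-tweet steps: handle removal, character cleanup, tokenization,
    word-length fixing, and stop word / short word removal."""
    # Remove user handles such as @user1.
    text = re.sub(r"@\w+", " ", text)
    # Drop apostrophes (so "don't" becomes "dont"), then replace every other
    # non-letter character (punctuation, digits, '#') with a space.
    text = text.replace("'", "")
    text = re.sub(r"[^a-zA-Z]", " ", text)
    # Tokenize on whitespace; lower-casing is an assumption of this sketch.
    tokens = text.lower().split()
    # Fix word length: collapse any run of 3+ identical characters to 2.
    tokens = [re.sub(r"(.)\1{2,}", r"\1\1", tok) for tok in tokens]
    # Remove stop words and tokens shorter than 2 characters.
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= 2]


def filter_by_frequency(tokenized, min_count=2, top_n=1):
    """Corpus-level step: drop words seen fewer than min_count times and the
    top_n most frequent words. Both thresholds are illustrative defaults."""
    counts = Counter(tok for tweet in tokenized for tok in tweet)
    most_frequent = {word for word, _ in counts.most_common(top_n)}
    return [[t for t in tweet
             if counts[t] >= min_count and t not in most_frequent]
            for tweet in tokenized]


if __name__ == "__main__":
    tweets = [
        "@user1 I am juuuusssttttt so happy!!! 2day",
        "@user2 happy happy day, don't worry",
        "what a happy day #blessed",
    ]
    cleaned = [clean_tweet(t) for t in tweets]
    print(cleaned[0])  # ['juusstt', 'happy', 'day']
    print(filter_by_frequency(cleaned))  # [['day'], ['day'], ['day']]
```

On this toy corpus the frequency filter removes "happy" as the most frequent word and every word seen only once, leaving "day"; on the full 99,899-tweet dataset the thresholds would need tuning.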
