Preprocessing a tweets dataset using NLTK.
As we all know, we live in the era of data, and most of that data is unstructured. According to an article from MongoDB:
From 80 to 90 percent of data generated and collected by organizations is unstructured, and its volumes are growing rapidly, many times faster than the rate of growth for structured databases.
So part of our work is to handle and clean this data so that it becomes useful and meaningful.
Here is my work, done as part of an assignment on natural language preprocessing.
I'm a beginner, so any suggestions for improvement, even small ones, will be appreciated.
Link to the dataset: https://www.kaggle.com/manchunhui/us-election-2020-tweets
Link to the article: https://www.mongodb.com/unstructured-data