Amazon reviews - Sentiment Analysis

“A fool's brain digests philosophy into madness, science into superstition and art into pedantry. Hence a university education.” George Bernard Shaw

📌 Table of Contents

Introduction
Preprocessing and Cleaning dataset
Story Generation and Visualization from reviews
Text reviews
Extracting Features from Cleaned reviews
Model Building: Sentiment Analysis
College Project

📝 Introduction

This project proposes an analysis of consumer behaviour to help a business build an effective, targeted marketing strategy.

To do this, we build predictive models on datasets compiled from two e-commerce giants, Amazon and Walmart.

• ***Build a sentiment analysis model*** to predict the effect of customer reviews on sales.

• ***Build a market basket model on the Amazon dataset***. This will enable the business to predict consumer behaviour by suggesting complementary goods to purchase.

• ***Analyse the conversion rates*** in this dataset, with a view to building a model to increase them.

• ***Examine customer sensitivity to price*** by building a linear regression model on the Walmart dataset.

The retail industry has taken a 180-degree turn with the rise of online shopping. In 2019, retail e-commerce sales worldwide amounted to 3.53 trillion US dollars, and e-retail revenues are projected to grow to 6.54 trillion US dollars by 2022.

According to an Invespcro (2020) report, the global e-commerce market was predicted to exceed 4 trillion US dollars in 2020, and one in every four online consumers makes a purchase once a week.

🚀 Preprocessing and Cleaning dataset

Importing Libraries

Visualization libraries

pandas, seaborn, matplotlib.pyplot, plotly.express (as px)

NLP and text-processing libraries

nltk, re, WordCloud, PorterStemmer, TfidfVectorizer, stopwords, word_tokenize, TextBlob

Machine Learning libraries

sklearn, SVC, LabelEncoder, StandardScaler, normalize (from sklearn.preprocessing), ExtraTreesClassifier, GridSearchCV

Machine Learning Models

LogisticRegression, DecisionTreeClassifier, BernoulliNB, KNeighborsClassifier, OneVsRestClassifier

train_test_split (from sklearn.model_selection), label_binarize (from sklearn.preprocessing)

Other Libraries

Counter, SMOTE, CountVectorizer

⌛️ Dataset features

uniq_id, product_name, manufacturer, price, number_available_in_stock, number_of_reviews, number_of_answered_questions, average_review_rating, amazon_category_and_sub_category, customers_who_bought_this_also_bought, description, product_information, product_description, items_customers_buy_after_viewing_this_item, customer_questions_and_answers, customer_reviews, sellers

🎤 Story Generation and Visualization from reviews

By going further with the exploratory analysis of the review texts, we try to understand which features contribute to the sentiment category.

Prior analysis assumptions:

The higher the rating, the more positive the sentiment.
There will be many more positive reviews than negative or neutral ones, which can bias the model.
These assumptions will be verified with our plots and with further text analysis.
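Both assumptions can be checked once every review carries a sentiment label. The sketch below assumes the label is derived from the star rating with the cut-offs 4-5 = Positive, 3 to just under 4 = Neutral, below 3 = Negative; these thresholds are an assumption, not taken from the project. It then uses Counter to measure how skewed the classes are. The ratings are made-up example data.

```python
from collections import Counter


def rating_to_sentiment(rating):
    """Map a 1-5 star rating to a sentiment label (assumed thresholds)."""
    if rating >= 4:
        return "Positive"
    if rating >= 3:
        return "Neutral"
    return "Negative"


def class_shares(labels):
    """Return each label's share of all labels, to reveal class imbalance."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up ratings for illustration only
ratings = [5, 4, 5, 3, 1, 5, 2]
sentiments = [rating_to_sentiment(r) for r in ratings]
shares = class_shares(sentiments)
print(shares)
```

A large Positive share in the real data would confirm the second assumption and motivate the resampling step described later.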

🏃 Text reviews

Review text punctuation and custom stop words

NLTK's stop-word list contains words such as not, hasn't and wouldn't, which actually convey negative sentiment. Removing them can flip the meaning of a review ("not good" becomes "good") and end up contradicting the target variable (sentiment). So I have curated a stop-word list that leaves out negations and other negative words.
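A minimal sketch of this cleaning step in plain Python. The stop-word set below is a short hypothetical stand-in for NLTK's much longer English list; the idea shown is that negations are subtracted from the list before filtering, so a negated review keeps its negation.

```python
import re

# Hypothetical, heavily shortened stop-word list (NLTK's English list is far longer)
BASE_STOP_WORDS = {
    "a", "an", "the", "is", "it", "this", "and", "to", "of",
    "not", "no", "nor", "isn't", "hasn't", "wouldn't", "don't",
}

# Negations we keep, because removing them can flip a review's sentiment
NEGATIONS = {"not", "no", "nor", "isn't", "hasn't", "wouldn't", "don't"}

CURATED_STOP_WORDS = BASE_STOP_WORDS - NEGATIONS


def clean_review(text):
    """Lower-case, strip punctuation (keeping apostrophes) and drop curated stop words."""
    text = text.lower()
    # Replace anything that is not a letter, an apostrophe or whitespace with a space
    text = re.sub(r"[^a-z'\s]", " ", text)
    return [token for token in text.split() if token not in CURATED_STOP_WORDS]


print(clean_review("This is NOT a good product!"))
# ['not', 'good', 'product']
```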

Creating additional features for text analysis.

Create polarity, review length and word count

Polarity: TextBlob scores the sentiment of each review on a scale from -1 to 1, where -1 is the most negative and 1 the most positive.

Review length: the number of characters in the review, including letters and spaces.

Word count: the number of words in each customer review.
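The two length-based features can be computed directly from the text, as in the sketch below; the field names review_len and word_count are illustrative assumptions. Polarity is left out so the sketch runs without extra packages; with TextBlob installed it is available as `TextBlob(text).sentiment.polarity`, a float in [-1, 1].

```python
def review_length(text):
    """Number of characters in the review, letters and spaces included."""
    return len(text)


def word_count(text):
    """Number of whitespace-separated words in the review."""
    return len(text.split())


# Made-up reviews for illustration only
reviews = ["great value for money", "stopped working after a week"]
features = [
    {"review": r, "review_len": review_length(r), "word_count": word_count(r)}
    for r in reviews
]
for row in features:
    print(row)
```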

🚀 Extracting Features from Cleaned reviews

Before we build the sentiment analysis model, we need to convert the review texts into numeric vectors, because a machine learning model cannot work with raw words and their sentiment. In this project we use the TF-IDF (term frequency-inverse document frequency) method to convert the texts.
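To show what TF-IDF computes, here is a from-scratch sketch in plain Python. It assumes the same default weighting as scikit-learn's TfidfVectorizer (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 normalisation of each document vector); in practice the project uses TfidfVectorizer itself. The documents are made-up examples.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: rare terms get larger weights than common ones
    idf = {term: math.log((1 + n_docs) / (1 + freq)) + 1 for term, freq in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts
        weights = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


docs = ["good product", "bad product", "good good value"]
vectors = tfidf(docs)
print(vectors[1])
```

In the second document "bad" outweighs "product", because "bad" appears in only one review while "product" appears in two.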

Encoding the target variable (sentiment)
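The sentiment labels have to become integers before model training. The sketch below mirrors what scikit-learn's LabelEncoder does, assigning integers to the classes in sorted order; the function name fit_label_encoder is hypothetical.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer, with classes sorted alphabetically."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


sentiments = ["Positive", "Negative", "Neutral", "Positive"]
mapping = fit_label_encoder(sentiments)
encoded = [mapping[s] for s in sentiments]
print(mapping)  # {'Negative': 0, 'Neutral': 1, 'Positive': 2}
print(encoded)  # [2, 0, 1, 2]
```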

Stemming the reviews

Stemming derives the root form of an inflected word by stripping its suffixes. Here we take the customer reviews and reduce each word to its stem.

A related technique, lemmatization, also reduces words to a root form, but the result (the lemma) is always a valid dictionary word with a semantic meaning.
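To make the idea concrete, here is a toy suffix stripper. It is an assumption-laden simplification, not nltk's PorterStemmer: it applies the exact plural rules of step 1a of Porter's algorithm and a simplified version of step 1b (drop -ed or -ing only if the remaining stem contains a vowel), without Porter's later clean-up rules.

```python
VOWELS = set("aeiou")


def has_vowel(stem):
    """True if the candidate stem contains at least one vowel."""
    return any(ch in VOWELS for ch in stem)


def simple_stem(word):
    """Toy stemmer loosely modelled on step 1 of Porter's algorithm."""
    w = word.lower()
    # Step 1a (Porter's plural rules)
    if w.endswith("sses"):
        w = w[:-2]  # caresses -> caress
    elif w.endswith("ies"):
        w = w[:-2]  # ponies -> poni
    elif w.endswith("ss"):
        pass  # caress stays caress
    elif w.endswith("s"):
        w = w[:-1]  # cats -> cat
    # Simplified step 1b: strip -ing / -ed only if a vowel remains in the stem
    for suffix in ("ing", "ed"):
        if w.endswith(suffix) and has_vowel(w[: -len(suffix)]):
            w = w[: -len(suffix)]
            break
    return w


words = ["caresses", "ponies", "cats", "played", "loving", "sing"]
print([simple_stem(w) for w in words])
# ['caress', 'poni', 'cat', 'play', 'lov', 'sing']
```

Stems such as "poni" and "lov" are not dictionary words; this is exactly the difference from lemmatization, which would map "ponies" to "pony".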

Handling the imbalanced target with SMOTE

We noticed that there are many more positive sentiments than negative and neutral ones, so it is crucial to balance the classes. SMOTE (Synthetic Minority Oversampling Technique) is used to address this imbalance. Rather than simply replicating minority-class examples, it creates new synthetic examples by interpolating between a minority-class example and one of its nearest minority-class neighbours.
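SMOTE itself is usually taken from the imbalanced-learn package. The plain-Python sketch below reimplements only its core idea so the interpolation step is visible: pick a minority sample, pick one of its k nearest minority neighbours, and place a new point at a random position on the segment between them. The function names, parameters and toy data are hypothetical.

```python
import math
import random


def nearest_neighbours(points, index, k):
    """Indices of the k points closest (Euclidean) to points[index], excluding itself."""
    distances = sorted(
        (math.dist(points[index], p), j) for j, p in enumerate(points) if j != index
    )
    return [j for _, j in distances[:k]]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating towards near neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest_neighbours(minority, i, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        base, neighbour = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class lying on the line y = x
minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
new_points = smote(minority, n_new=6, k=2, seed=42)
print(new_points)
```

Every synthetic point lies between two real minority points (here, on the line y = x), which is why SMOTE adds variety instead of exact duplicates. Resampling is normally applied to the training split only, so that synthetic points do not leak into the test set.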

📕 Model Building: Sentiment Analysis

Sentiment analysis refers to the use of natural language processing, text analysis, computational linguistics and biometrics to systematically identify, extract, quantify and study affective states and subjective information. Understanding people's emotions is essential for businesses, since customers can now express their thoughts and feelings more openly than ever before. It is very hard for a human to read every single line and identify the emotion behind the user's experience. Machine learning models make it possible to analyse customer feedback automatically, from product reviews and survey responses to social media conversations, which allows a business to tailor its products and services to customer needs.
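The project trains several scikit-learn classifiers, including BernoulliNB. To show what one of them does internally, here is a from-scratch Bernoulli Naive Bayes sketch on binary word-presence features, assuming the same Laplace smoothing as scikit-learn's default alpha=1.0. The function names and the tiny training set are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_bernoulli_nb(docs, labels, alpha=1.0):
    """Estimate class priors and per-class word-presence probabilities."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    class_counts = Counter(labels)
    doc_freq = defaultdict(Counter)  # class -> word -> number of docs containing it
    for doc, label in zip(docs, labels):
        for word in set(doc.split()):
            doc_freq[label][word] += 1
    model = {}
    for label, n_label in class_counts.items():
        # Laplace-smoothed probability that a word is present in a review of this class
        probs = {w: (doc_freq[label][w] + alpha) / (n_label + 2 * alpha) for w in vocab}
        model[label] = (math.log(n_label / len(docs)), probs)
    return vocab, model


def predict(vocab, model, doc):
    """Return the class with the highest log-posterior for a new review."""
    present = set(doc.split())
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for word in vocab:  # words outside the training vocabulary are ignored
            p = probs[word]
            # Bernoulli NB also scores absent words, via log(1 - p)
            score += math.log(p) if word in present else math.log(1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, made up for illustration
docs = [
    "great product love it",
    "love this great value",
    "terrible waste of money",
    "broke after a week terrible",
]
labels = ["positive", "positive", "negative", "negative"]
vocab, model = fit_bernoulli_nb(docs, labels)
print(predict(vocab, model, "great love"))  # positive
print(predict(vocab, model, "terrible waste"))  # negative
```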

🎉 College Project

CCT COLLEGE DUBLIN

Higher Diploma in Science in Data Analytics for Business

Under Supervision of: GRAHAM GLANVILLE & MARK MORRISSEY

Released in March 2021.

This project is under the MIT license.

Made with love by Sirlene Andreis 💚🚀

Repository files (AndreisSirlene/Sentiment-analysis-reviews):

Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products.csv.zip
New amazon data.ipynb
ProjectMaster March10.ipynb
README.md
market-basket-analysis-walmart.ipynb
walmart_product_data_2.csv
