provider-sentiment

Overview

This project analyzes user reviews of the mobile apps of four Indonesian cellular operators: MyTelkomsel, MyXL, MyIM3, and MySF (Smartfren). Reviews are scraped from the Google Play Store and analyzed with natural language processing (NLP) techniques to understand user satisfaction and identify common issues.

Goals

Sentiment Classification: To classify reviews of each provider's app as positive or negative, giving a clear picture of customer satisfaction.
Identify Key Issues: To identify common complaints, praises, or suggestions shared by customers, helping each operator understand the issues that need immediate attention or improvement.
Competitor Comparison: To compare sentiment scores between the four providers, allowing for a better understanding of public perception and brand image relative to each other.

Tech Stacks

| Framework/Technology | Role |
| --- | --- |
| Kedro | Structuring data engineering and data science pipelines |
| PostgreSQL | Serves as a data lake for raw data and a data warehouse for preprocessed data |
| Docker | Containerizes the entire project |
| Apache Airflow | Schedules workflows as DAGs |
| Scikit-learn | TF-IDF vectorizer and support vector machine |
| PyTorch | Building the LSTM model and training IndoBERT |
| Tableau | Creating visual dashboards and reports |

How to install dependencies

Declare any dependencies in requirements.txt for pip installation.

To install them, run:

pip install -r requirements.txt

How to run ETL and ML pipeline using Docker

Change to the project root directory:
```
cd sentiment-provider-app
```
Initialize Airflow inside Docker:
```
docker-compose up init-airflow -d
```
-d = Detached mode: run containers in the background

Run docker-compose:
```
docker-compose up
```
To open Airflow, visit this link in your browser:
```
http://localhost:8080/
```

How to stop the running services:

docker-compose down -v

-v = Remove named volumes declared in the "volumes" section of the Compose file and anonymous volumes attached to containers

How to Access API

Change to the deploy directory:
```
cd deploy
```
Run the API
```
uvicorn api:app --reload
```

Test the API

curl -X 'POST' \
'http://127.0.0.1:8000/predict' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"text": "aplikasi ini bagus tapi sinyalnya jelek dan kadang lemot"
}'

You should receive a JSON response:

{ "Sentiment": "Negative" }
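The same request can be sent from Python. The block below is a minimal sketch using only the standard library; it assumes the API is running locally as started above, and the function names `build_request` and `predict_sentiment` are hypothetical helpers, not part of the project.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust if the API runs elsewhere.
API_URL = "http://127.0.0.1:8000/predict"


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    payload = json.dumps({"text": text}).encode("utf-8")
    headers = {"accept": "application/json", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


def predict_sentiment(text: str, url: str = API_URL) -> str:
    """Send one review to the running API and return the predicted label."""
    with urllib.request.urlopen(build_request(text, url), timeout=10) as response:
        body = json.load(response)
    # Key name assumed from the sample response shown above.
    return body["Sentiment"]
```

With the server running, `predict_sentiment("aplikasi ini bagus tapi sinyalnya jelek dan kadang lemot")` sends the same review as the curl command.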

ETL Pipelines

The Extract-Transform-Load (ETL) pipeline consists of the following steps:

Extract
- Scrape review data from the Google Play Store
- Save the data as CSV files locally for manual labeling
- Load the labeled dataset into PostgreSQL
Transform
- Combine datasets
- Remove missing values
- Remove reviews that contain only emoji
- Case folding
- Add a space after punctuation marks so that words do not merge when punctuation is removed
  Example
```
 Input: "Aplikasi yang sangat buruk,jelek,pembohong"
 Output: "Aplikasi yang sangat buruk, jelek, pembohong"
```
- Remove punctuation characters
- Remove non-ASCII characters from the input text
- Remove URLs
- Stemming (reduce words to their root form)
- Replace slang words with their formal equivalents using the colloquial-indonesian-lexicon dictionary
- Remove specific irrelevant words, such as brand names
- Fix letter repetition
  Example
```
"mmantap" -> "mantap",
"mannntap" -> "mantap",
"mantapp" -> "mantap"
```
- Remove reviews with fewer than 2 words
- Label encoding
- Remove reviews that are empty strings after preprocessing
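Several of the rule-based Transform steps above can be sketched with Python's standard library. This is an illustrative sketch, not the project's actual code: the names `clean_review` and `keep_review` are hypothetical, and URLs are removed before the punctuation steps (an assumption made here so that the URL pattern still matches). Stemming, slang replacement, and brand-name removal are left out because they depend on external resources.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Punctuation mark immediately followed by a non-space character.
PUNCT_NO_SPACE_RE = re.compile(r"([,.!?;:])(?=\S)")
# A run of the same lowercase letter, e.g. "nnn".
REPEAT_RE = re.compile(r"([a-z])\1+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_review(text: str) -> str:
    """Apply the rule-based cleaning steps to a single review."""
    text = text.lower()                             # case folding
    text = URL_RE.sub(" ", text)                    # remove URLs
    text = PUNCT_NO_SPACE_RE.sub(r"\1 ", text)      # add space after punctuation
    text = text.translate(PUNCT_TABLE)              # remove punctuation
    text = text.encode("ascii", "ignore").decode()  # drop non-ASCII (incl. emoji)
    text = REPEAT_RE.sub(r"\1", text)               # collapse repeated letters
    return " ".join(text.split())                   # normalise whitespace


def keep_review(cleaned: str, min_words: int = 2) -> bool:
    """Keep only reviews that still have at least `min_words` words."""
    return len(cleaned.split()) >= min_words
```

Collapsing every repeated letter to a single one reproduces the examples above, but it also shortens words that legitimately contain double letters (for instance "maaf"), so a real pipeline may need a more careful rule.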
Load
- Store the transformed data in PostgreSQL, which serves as the data warehouse
- Data in the data warehouse can be used for dashboards and machine learning
Machine Learning
Several models were compared to find the best result:
- Support Vector Machine (K-fold cross validation & grid search hyperparameter tuning)
- LSTM (PyTorch)
- IndoBERT transformer model
- Gemini LLM
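
As an illustration of the first approach, the sketch below combines a TF-IDF vectorizer and a support vector machine in a scikit-learn pipeline and tunes it with grid search over stratified k-fold cross validation. The toy reviews, the parameter grid, and the number of folds are assumptions chosen for the example; they are not the project's actual data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Made-up, already-cleaned reviews (1 = positive, 0 = negative).
texts = [
    "aplikasi bagus dan cepat", "sangat membantu mantap",
    "mudah digunakan bagus", "pelayanan cepat mantap",
    "aplikasi sangat membantu", "bagus mudah dan cepat",
    "aplikasi lemot dan error", "sinyal jelek sekali",
    "sering error jelek", "lemot sekali parah",
    "aplikasi parah sering gagal", "gagal login sinyal jelek",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Vectorizer and classifier are tuned together so that TF-IDF is
# fitted only on the training part of each fold.
pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svm", SVC())])

# Hypothetical search space; the project's real grid is not documented here.
param_grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
search.fit(texts, labels)

print("best parameters:", search.best_params_)
print("mean cross-validated F1:", search.best_score_)
```

After fitting, `search.predict([...])` uses the best pipeline refitted on all of the data.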
  
| Model | F1 Score (%) |
| --- | --- |
| SVM | 82.353 |
| SVM (Grid Search) | 87.850 |
| LSTM | 82.393 |
| IndoBERT | 97.48 |
| Gemini LLM | 93.913 |
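
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a binary task in plain Python; how the scores above were averaged is not stated, so treating one class as "positive" is an assumption, and `f1_score_binary` is a hypothetical helper name.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two true positives, one false positive, one false negative.
score = f1_score_binary([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(f"{score * 100:.3f}")
```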

anggapark/sentiment-provider-app
