E-Commerce Customer Churn Analysis and Prediction

This project predicts customer churn for an online retail company using machine learning. The dataset, sourced from Kaggle, was examined through exploratory data analysis (EDA), preprocessed, and then used to train Random Forest, XGBoost, and Logistic Regression classifiers. K-Means, DBSCAN, and hierarchical clustering were also applied to identify customer segments.

Project Overview

Introduction: The goal is to predict customer churn and perform customer segmentation to tailor promotional strategies.
Exploratory Data Analysis (EDA): Inspecting data shape, column types, correlations, class imbalance, and missing values.
Data Preprocessing: Handling missing values and outliers, encoding categorical variables, and balancing the imbalanced classes.
Classification Methods: Training Random Forest, XGBoost, and Logistic Regression, each with and without class balancing.
Clustering Methods: Utilizing K-Means, DBSCAN, and Hierarchical clustering techniques.
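
The EDA and preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the toy DataFrame, its column names (tenure, order_count, payment_mode, churn), the IQR clipping rule, and the median imputation are all assumptions made for the example, since the README does not state the Kaggle schema or the exact techniques used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are placeholders, not the Kaggle schema.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "tenure": rng.integers(0, 36, n).astype(float),
    "order_count": rng.poisson(3, n).astype(float),
    "payment_mode": rng.choice(["card", "wallet", "cod"], n),
    "churn": (rng.random(n) < 0.2).astype(int),
})
# Inject 15 missing values so the imputation step has something to do.
df.loc[rng.choice(n, 15, replace=False), "tenure"] = np.nan

# EDA: shape, column types, missing values, and class balance.
print(df.shape)
print(df.dtypes)
missing = df.isna().sum()
churn_rate = df["churn"].mean()
print(missing)
print(f"churn rate: {churn_rate:.2f}")

# Outliers: clip one numeric column to 1.5 * IQR beyond the quartiles (assumed rule).
q1, q3 = df["order_count"].quantile([0.25, 0.75])
iqr = q3 - q1
df["order_count"] = df["order_count"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Preprocessing: median-impute and scale numeric columns, one-hot encode categoricals.
numeric = ["tenure", "order_count"]
categorical = ["payment_mode"]
preprocess = ColumnTransformer(
    [
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    sparse_threshold=0,  # always return a dense array
)
X = preprocess.fit_transform(df.drop(columns="churn"))
y = df["churn"].to_numpy()
print(X.shape)
```

Wrapping the steps in a ColumnTransformer keeps imputation and encoding attached to the model, so the same transformations can be fitted on training data only and reused on held-out data.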

Classification Results

Random Forest: Achieved high accuracy, precision, and AUC-ROC; slightly lower recall.
XGBoost: Outperformed other classifiers in most metrics.
Logistic Regression: Showed comparatively lower scores.
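
A comparison of this kind could be set up along the following lines. The sketch uses synthetic data from scikit-learn rather than the Kaggle dataset, and it uses class_weight="balanced" as one possible balancing strategy; the README does not say which balancing method the project used, so that choice is an assumption. XGBoost is left out of the runnable code to keep it dependent on scikit-learn only. The numbers it prints are not the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 15% positives) standing in for the churn table.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           weights=[0.85, 0.15], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "random_forest_balanced": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "logistic_regression_balanced": LogisticRegression(
        max_iter=1000, class_weight="balanced"),
    # xgboost.XGBClassifier follows the same fit/predict_proba interface and
    # could be added here if the xgboost package is installed.
}

def evaluate(model):
    """Fit on the training split and score on the held-out split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the churn class
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }

results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

On imbalanced data, accuracy alone can look high even when few churners are caught, which is why recall and AUC-ROC are reported alongside it.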

Clustering Results

K-Means with t-SNE: Produced the most accurate clusters of the three clustering methods.
DBSCAN: Demonstrated less accurate clustering.
Hierarchical Clustering: Used for visualizing dendrogram structure.
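
The three clustering approaches above might be wired together as in the sketch below. It runs on synthetic blobs rather than the customer data, and every parameter (three clusters, perplexity=30, eps=3.0, min_samples=5, Ward linkage) is an untuned assumption chosen for illustration, not a value taken from the project.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic "customers": 300 rows, 8 numeric features, 3 latent groups (assumed).
X_raw, _ = make_blobs(n_samples=300, n_features=8, centers=3, random_state=0)
X = StandardScaler().fit_transform(X_raw)

# Reduce to 2-D with t-SNE, then run K-Means on the embedding.
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(X)
kmeans_labels = KMeans(n_clusters=3, n_init=10,
                       random_state=0).fit_predict(embedding)
kmeans_silhouette = silhouette_score(embedding, kmeans_labels)
print(f"K-Means silhouette on the t-SNE embedding: {kmeans_silhouette:.3f}")

# DBSCAN on the same embedding; the label -1 marks points treated as noise.
dbscan_labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(embedding)
n_dbscan_clusters = len(set(dbscan_labels) - {-1})
n_noise = int(np.sum(dbscan_labels == -1))
print(f"DBSCAN found {n_dbscan_clusters} clusters and {n_noise} noise points")

# Ward linkage on the scaled features; no_plot=True returns the dendrogram
# layout as a dict without requiring matplotlib.
Z = linkage(X, method="ward")
tree = dendrogram(Z, no_plot=True)
print(f"dendrogram leaves: {len(tree['leaves'])}")
```

A silhouette score computed on a t-SNE embedding describes separation in the embedding, not in the original feature space, so it is best read as a visual-quality check rather than a definitive measure of segment quality.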

Acknowledgments

Dataset source: Kaggle.
Libraries used: pandas, scikit-learn, xgboost, seaborn, and others.
