E-Commerce Customer Churn Analysis and Prediction

This project predicts customer churn for an online retail company using machine learning algorithms. The dataset, sourced from Kaggle, was studied through extensive exploratory data analysis (EDA), preprocessed, and modeled with classification methods including Random Forest and XGBoost. Clustering techniques such as K-Means and DBSCAN were also used to identify customer segments.

Project Overview

  1. Introduction: The goal is to predict customer churn and perform customer segmentation to tailor promotional strategies.
  2. Exploratory Data Analysis (EDA): Analyzing data shape, data types, correlations, class imbalance, and missing values.
  3. Data Preprocessing: Handling missing values and outliers, encoding categorical variables, and balancing the imbalanced classes.
  4. Classification Methods: Training Random Forest, XGBoost, and Logistic Regression, each with and without data balancing.
  5. Clustering Methods: Utilizing K-Means, DBSCAN, and Hierarchical clustering techniques.
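The README does not say which imputation, outlier, encoding, or balancing techniques were used. The sketch below shows one common way to carry out the preprocessing step, under these assumptions: median imputation for numeric columns and most-frequent imputation for categorical ones, clipping to 1.5 × IQR fences, one-hot encoding with `pd.get_dummies`, and random oversampling with `sklearn.utils.resample`. The functions `clean_features` and `oversample` and the columns `Tenure`, `OrderCount`, `PreferredPaymentMode` and `Churn` are hypothetical and are not taken from the project or the Kaggle schema.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample


def clean_features(df: pd.DataFrame, target: str = "Churn") -> pd.DataFrame:
    """Impute missing values, cap outliers, and one-hot encode categoricals."""
    df = df.copy()
    numeric = df.select_dtypes(include="number").columns.drop(target)
    categorical = df.select_dtypes(exclude="number").columns

    # Missing values: median for numeric columns, most frequent value for categoricals.
    df[numeric] = df[numeric].fillna(df[numeric].median())
    for col in categorical:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Outliers: clip each numeric column to its own 1.5 * IQR fences.
    q1, q3 = df[numeric].quantile(0.25), df[numeric].quantile(0.75)
    iqr = q3 - q1
    df[numeric] = df[numeric].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)

    # Encoding: one-hot encode categoricals, dropping the first level of each.
    return pd.get_dummies(df, columns=list(categorical), drop_first=True)


def oversample(df: pd.DataFrame, target: str = "Churn", random_state: int = 0) -> pd.DataFrame:
    """Randomly duplicate minority-class rows until both classes are the same size."""
    counts = df[target].value_counts()
    minority = df[df[target] == counts.idxmin()]
    majority = df[df[target] == counts.idxmax()]
    upsampled = resample(minority, replace=True, n_samples=len(majority),
                         random_state=random_state)
    combined = pd.concat([majority, upsampled])
    # Shuffle so the classes are interleaved, then renumber the rows.
    return combined.sample(frac=1, random_state=random_state).reset_index(drop=True)


# Hypothetical toy data; column names are illustrative only.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Tenure": rng.integers(0, 30, 200).astype(float),
    "OrderCount": rng.poisson(3, 200).astype(float),
    "PreferredPaymentMode": rng.choice(["Card", "UPI", "COD"], 200),
    "Churn": (rng.random(200) < 0.17).astype(int),
})
demo.loc[::10, "Tenure"] = np.nan  # inject some missing values

balanced = oversample(clean_features(demo))
print(balanced["Churn"].value_counts())
```

In a real pipeline, apply the oversampling only to the training split after the train/test split. Otherwise, duplicated minority rows can land in both splits and inflate the evaluation scores.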

Classification Results

  • Random Forest: Achieved high accuracy, precision, and AUC-ROC; slightly lower recall.
  • XGBoost: Outperformed other classifiers in most metrics.
  • Logistic Regression: Showed comparatively lower scores.
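The comparison above reports accuracy, precision, recall, and AUC-ROC for each classifier. The sketch below shows one way such a comparison might be set up with and without balancing. It rests on several assumptions. The data is synthetic (`make_classification` with roughly 17% positives), not the project's dataset. Balancing uses random oversampling of the training split only. scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the sketch runs with scikit-learn alone. `xgboost.XGBClassifier` implements the scikit-learn estimator interface (`fit`, `predict`, `predict_proba`) and could be placed in the `models` dictionary instead. The helpers `oversample` and `compare_models` are hypothetical, and the printed scores are not the project's results.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample


def oversample(X, y, random_state=0):
    """Duplicate random minority (label 1) rows until both classes are equal in size."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    extra = resample(minority, replace=True,
                     n_samples=len(majority) - len(minority), random_state=random_state)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]


def compare_models(X, y, balance=False, random_state=0):
    """Fit each classifier and return a DataFrame of test-set metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    if balance:
        # Balance only the training split so the test set keeps the real class ratio.
        X_train, y_train = oversample(X_train, y_train, random_state)

    models = {
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
        # Stand-in for XGBoost; swap in xgboost.XGBClassifier if it is installed.
        "Gradient Boosting": GradientBoostingClassifier(random_state=random_state),
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        proba = model.predict_proba(X_test)[:, 1]  # probability of churn (class 1)
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred),
            "auc_roc": roc_auc_score(y_test, proba),
        })
    return pd.DataFrame(rows).set_index("model")


# Synthetic, imbalanced stand-in data (about 17% positives).
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           weights=[0.83, 0.17], random_state=0)
print(compare_models(X, y, balance=False).round(3))
print(compare_models(X, y, balance=True).round(3))
```

Running both configurations side by side helps explain a pattern like the one reported for Random Forest. Oversampling usually trades some precision for recall on the minority class.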

Clustering Results

  • K-Means with t-SNE: Produced the most accurate clusters of the methods tested.
  • DBSCAN: Produced less accurate clusters.
  • Hierarchical Clustering: Used to visualize the dendrogram structure.
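The README does not say whether t-SNE served as the input to K-Means or only as a visualization, and it does not say how cluster quality was judged. The sketch below takes one reading. It embeds standardized features in 2-D with t-SNE, then runs K-Means and DBSCAN on that embedding. It builds a Ward linkage with SciPy for the dendrogram and scores every labeling with the silhouette coefficient on the standardized features. The silhouette scoring, the `eps`/`min_samples` values, the helper names `silhouette_or_nan` and `segment`, and the `make_blobs` data are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def silhouette_or_nan(X, labels):
    """Silhouette over non-noise points; NaN when fewer than two clusters remain."""
    mask = labels != -1  # DBSCAN marks noise points with -1
    if len(np.unique(labels[mask])) < 2:
        return float("nan")
    return silhouette_score(X[mask], labels[mask])


def segment(X, n_clusters=4, eps=3.0, min_samples=10, random_state=0):
    X_scaled = StandardScaler().fit_transform(X)

    # t-SNE: 2-D embedding that K-Means and DBSCAN then cluster.
    embedding = TSNE(n_components=2, init="pca",
                     random_state=random_state).fit_transform(X_scaled)
    km_labels = KMeans(n_clusters=n_clusters, n_init=10,
                       random_state=random_state).fit_predict(embedding)
    # eps and min_samples are illustrative and need tuning on real data.
    db_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embedding)

    # Hierarchical (Ward) clustering on the scaled features, cut into n_clusters groups.
    Z = linkage(X_scaled, method="ward")
    hier_labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # no_plot=True returns the dendrogram layout; drop it (with matplotlib) to draw the tree.
    tree = dendrogram(Z, truncate_mode="lastp", p=12, no_plot=True)

    # Score every labeling in the same (scaled feature) space so they are comparable.
    scores = {
        "kmeans_tsne": silhouette_or_nan(X_scaled, km_labels),
        "dbscan_tsne": silhouette_or_nan(X_scaled, db_labels),
        "hierarchical": silhouette_or_nan(X_scaled, hier_labels),
    }
    return embedding, km_labels, db_labels, hier_labels, tree, scores


# Synthetic stand-in data with four groups.
X, _ = make_blobs(n_samples=300, centers=4, n_features=6, random_state=0)
embedding, km_labels, db_labels, hier_labels, tree, scores = segment(X)
print(scores)
```

Because t-SNE distorts global distances, clusters found in the embedding can look cleaner than they are in feature space. Scoring every method in the same feature space keeps the comparison fair.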

Acknowledgments

  • Dataset source: Kaggle.
  • Libraries used: pandas, scikit-learn, xgboost, seaborn, and others.