This project segments customers by their characteristics using both clustering and classification techniques: K-Means, Hierarchical Clustering, DBSCAN, Decision Trees, Random Forests, Neural Networks, and SVM.
The unsupervised models discover the segments, and the supervised models learn to assign customers to them, giving an automated, scalable, and interpretable segmentation workflow.
Before applying clustering techniques, exploratory data analysis (EDA) is performed to understand the structure and distribution of customer attributes.
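A minimal EDA sketch in base R might look like the following. The `customers` data frame and its column names (`age`, `annual_income`, `spending_score`) are hypothetical stand-ins, since this README does not list the real attributes.

```r
# Hypothetical customer data; replace with the project's real dataset.
set.seed(42)
customers <- data.frame(
  age            = round(runif(200, 18, 70)),
  annual_income  = round(rnorm(200, mean = 60, sd = 15)),
  spending_score = round(runif(200, 1, 100))
)

str(customers)                           # column types and dimensions
summary(customers)                       # per-column distribution summary
missing_per_column <- colSums(is.na(customers))
print(missing_per_column)                # missing values per attribute
correlations <- cor(customers)           # pairwise correlations (numeric columns)
print(round(correlations, 2))
```

Checking scale and missingness first matters because distance-based methods such as K-Means are sensitive to both.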
- K-Means is a centroid-based clustering method that partitions data into K clusters based on similarity.
- The Elbow Method and Silhouette Analysis are used to determine the optimal number of clusters.
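As a rough sketch of how the two criteria above could be computed, the code below fits K-Means for several values of K on synthetic data. The data, the column names, and the range `2:6` are assumptions for illustration, not the project's actual settings.

```r
library(cluster)  # silhouette()

# Hypothetical data: three synthetic customer groups.
set.seed(42)
customers <- data.frame(
  annual_income  = c(rnorm(50, 30, 5), rnorm(50, 60, 5), rnorm(50, 90, 5)),
  spending_score = c(rnorm(50, 20, 5), rnorm(50, 50, 5), rnorm(50, 80, 5))
)
x <- scale(customers)  # standardize so no attribute dominates the distance

ks <- 2:6
# Elbow Method: total within-cluster sum of squares for each K.
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)
# Silhouette Analysis: average silhouette width for each K.
avg_sil <- sapply(ks, function(k) {
  km_k <- kmeans(x, centers = k, nstart = 25)
  mean(silhouette(km_k$cluster, dist(x))[, "sil_width"])
})

best_k <- ks[which.max(avg_sil)]  # K with the highest average silhouette
km <- kmeans(x, centers = best_k, nstart = 25)
print(data.frame(k = ks, wss = wss, avg_sil = avg_sil))
print(table(km$cluster))
```

The elbow is read off the `wss` column by eye (where the decrease flattens), while the silhouette gives a single number per K that can be maximized directly.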
📌 K-Means Cluster Visualization:
- Hierarchical clustering builds a tree-like structure (dendrogram) to visualize relationships between customers.
- Ward’s method is used to minimize variance within clusters.
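The sketch below shows one way to run Ward's method with base R's `hclust()`. It assumes Euclidean distances and uses `method = "ward.D2"`, which applies Ward's criterion to unsquared distances; the data and the choice of three clusters are hypothetical.

```r
# Hypothetical data: three synthetic customer groups.
set.seed(42)
customers <- data.frame(
  annual_income  = c(rnorm(50, 30, 5), rnorm(50, 60, 5), rnorm(50, 90, 5)),
  spending_score = c(rnorm(50, 20, 5), rnorm(50, 50, 5), rnorm(50, 80, 5))
)
x <- scale(customers)

# Agglomerative clustering with Ward's minimum-variance criterion.
hc <- hclust(dist(x, method = "euclidean"), method = "ward.D2")

# Draw the dendrogram only in an interactive session.
if (interactive()) {
  plot(hc, labels = FALSE, main = "Customer dendrogram (Ward)")
  rect.hclust(hc, k = 3, border = "red")  # outline the 3-cluster cut
}

clusters <- cutree(hc, k = 3)  # cut the tree into 3 clusters (assumed k)
print(table(clusters))
```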
📌 Dendrogram (Hierarchical Clustering):
📌 Hierarchical Clusters Visualization:
- DBSCAN is a density-based clustering algorithm that groups core samples and detects noise.
- This method is effective for identifying arbitrarily shaped clusters and handling outliers.
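A minimal DBSCAN sketch using the `dbscan` package could look like this. The synthetic data (two dense groups plus scattered points) and the values `eps = 0.3` and `minPts = 5` are illustrative assumptions; real data would need its own tuning.

```r
library(dbscan)

# Hypothetical data: two dense groups plus a few scattered outliers.
set.seed(42)
customers <- data.frame(
  annual_income  = c(rnorm(100, 30, 3), rnorm(100, 80, 3), runif(10, 0, 120)),
  spending_score = c(rnorm(100, 30, 3), rnorm(100, 70, 3), runif(10, 0, 100))
)
x <- scale(customers)

# The k-nearest-neighbor distance plot helps pick eps (look for the "knee").
if (interactive()) kNNdistplot(x, k = 5)

db <- dbscan(x, eps = 0.3, minPts = 5)  # assumed parameter values
n_noise <- sum(db$cluster == 0)         # cluster label 0 marks noise points
print(table(db$cluster))
cat("Noise points:", n_noise, "\n")
```

Unlike K-Means, the number of clusters is not specified up front; it follows from the density parameters.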
- A Decision Tree is trained to classify customers into clusters based on feature splits.
- It provides an interpretable model for understanding customer segmentation.
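One way to sketch this step is to label customers with K-Means and then fit an `rpart` tree that predicts those labels. The data, the 70/30 split, and the use of K-Means labels as the target are assumptions made for illustration.

```r
library(rpart)

# Hypothetical data: three synthetic customer groups.
set.seed(42)
customers <- data.frame(
  annual_income  = c(rnorm(50, 30, 5), rnorm(50, 60, 5), rnorm(50, 90, 5)),
  spending_score = c(rnorm(50, 20, 5), rnorm(50, 50, 5), rnorm(50, 80, 5))
)
features <- c("annual_income", "spending_score")

# Use K-Means cluster labels as the classification target (assumed workflow).
customers$segment <- factor(
  kmeans(scale(customers[features]), centers = 3, nstart = 25)$cluster
)

train_idx <- sample(nrow(customers), size = round(0.7 * nrow(customers)))
train <- customers[train_idx, ]
test  <- customers[-train_idx, ]

tree <- rpart(segment ~ ., data = train, method = "class")
pred <- predict(tree, test, type = "class")
accuracy <- mean(pred == test$segment)

print(tree)  # the split rules are the interpretable part
cat("Hold-out accuracy:", round(accuracy, 3), "\n")
```

The printed splits (for example, thresholds on income or spending) describe each segment in plain terms; `rpart.plot::rpart.plot(tree)` can draw the same tree.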
- A Random Forest classifier is used to improve classification accuracy.
- Feature importance ranking helps understand which attributes drive customer segmentation.
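The following sketch fits a `randomForest` classifier and ranks features by importance. The data are synthetic, `age` is deliberately unrelated to the segments so that its low importance is visible, and `ntree = 500` and `mtry = 2` are assumed values.

```r
library(randomForest)

# Hypothetical data: segments depend on income and spending, not on age.
set.seed(42)
n <- 50
customers <- data.frame(
  age            = round(runif(3 * n, 18, 70)),
  annual_income  = c(rnorm(n, 30, 5), rnorm(n, 60, 5), rnorm(n, 90, 5)),
  spending_score = c(rnorm(n, 20, 5), rnorm(n, 50, 5), rnorm(n, 80, 5))
)
features <- c("annual_income", "spending_score")
customers$segment <- factor(
  kmeans(scale(customers[features]), centers = 3, nstart = 25)$cluster
)

rf <- randomForest(segment ~ ., data = customers,
                   ntree = 500, mtry = 2, importance = TRUE)

imp <- importance(rf)  # includes MeanDecreaseAccuracy and MeanDecreaseGini
print(imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), ])
print(rf$confusion)    # out-of-bag confusion matrix

if (interactive()) varImpPlot(rf)  # importance plot
```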
- SVM is used to find the best decision boundary for segmenting customers.
- The kernel trick is applied to handle non-linearity in the data.
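A short sketch with `e1071::svm()` illustrates the idea. The radial kernel, `cost = 1`, the synthetic data, and the K-Means-derived labels are all assumptions for this example.

```r
library(e1071)

# Hypothetical data: three synthetic customer groups.
set.seed(42)
customers <- data.frame(
  annual_income  = c(rnorm(50, 30, 5), rnorm(50, 60, 5), rnorm(50, 90, 5)),
  spending_score = c(rnorm(50, 20, 5), rnorm(50, 50, 5), rnorm(50, 80, 5))
)
features <- c("annual_income", "spending_score")
customers$segment <- factor(
  kmeans(scale(customers[features]), centers = 3, nstart = 25)$cluster
)

train_idx <- sample(nrow(customers), size = round(0.7 * nrow(customers)))
train <- customers[train_idx, ]
test  <- customers[-train_idx, ]

# The radial (RBF) kernel allows non-linear decision boundaries.
fit <- svm(segment ~ ., data = train, kernel = "radial", cost = 1)
pred <- predict(fit, test)
accuracy <- mean(pred == test$segment)

print(table(predicted = pred, actual = test$segment))
cat("Hold-out accuracy:", round(accuracy, 3), "\n")
```

Swapping `kernel = "radial"` for `"linear"` or `"polynomial"` is a quick way to compare how each boundary shape fits the segments.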
1️⃣ Clone the Repository:
git clone https://github.com/yourusername/customer-segmentation.git
cd customer-segmentation
2️⃣ Install Required Libraries in R:
install.packages(c("tidyverse", "dbscan", "factoextra", "cluster",
"ggplot2", "rpart", "rpart.plot", "randomForest",
"nnet", "e1071"))
3️⃣ Run the Segmentation Script:
source("customer_segmentation.R")
- DBSCAN: Modify `eps` and `minPts` to adjust cluster density.
- Decision Trees & Random Forests: Tune `ntree` and `mtry` (both `randomForest` arguments) for better classification.
- Neural Networks: Adjust the hidden-layer size (`size`), regularization (`decay`), and iterations (`maxit`).
- SVM: Change kernel type (`linear`, `radial`, `polynomial`) to improve separation.
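As an illustration of the neural network parameters listed above, the sketch below passes `size`, `decay`, and `maxit` to `nnet::nnet()`. The data, the K-Means-derived labels, and the specific values are hypothetical, not the project's tuned settings.

```r
library(nnet)

# Hypothetical data: three synthetic customer groups.
set.seed(42)
customers <- data.frame(
  annual_income  = c(rnorm(50, 30, 5), rnorm(50, 60, 5), rnorm(50, 90, 5)),
  spending_score = c(rnorm(50, 20, 5), rnorm(50, 50, 5), rnorm(50, 80, 5))
)
features <- c("annual_income", "spending_score")
customers[features] <- scale(customers[features])  # nnet trains better on scaled inputs
customers$segment <- factor(
  kmeans(customers[features], centers = 3, nstart = 25)$cluster
)

train_idx <- sample(nrow(customers), size = round(0.7 * nrow(customers)))
train <- customers[train_idx, ]
test  <- customers[-train_idx, ]

fit <- nnet(segment ~ ., data = train,
            size  = 5,     # units in the single hidden layer
            decay = 0.01,  # weight decay (regularization)
            maxit = 200,   # maximum training iterations
            trace = FALSE)

pred <- predict(fit, test, type = "class")  # character vector of class labels
accuracy <- mean(pred == as.character(test$segment))
cat("Hold-out accuracy:", round(accuracy, 3), "\n")
```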
This project is open-source and available under the MIT License.
✅ This project provides a comprehensive approach to customer segmentation using advanced machine learning techniques.
💡 Use this repository to improve your understanding of clustering, classification, and model evaluation! 🚀
👨‍💻 Author: Joel Mande
📧 Email: joelwanjala09@gmail.com
🌐 LinkedIn: Joel Mande