GitHub - timothynn/Palmer-Penguins-Clustering: EDA, Clustering for Penguins dataset

Overview

This project aims to cluster penguins into different groups based on their physical characteristics using unsupervised learning algorithms. The project will involve gathering penguin data, cleaning and preprocessing the data, selecting appropriate unsupervised learning algorithms, and evaluating the performance of the clustering models.

Goals

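To cluster penguins into distinct, well-separated groups (there are no ground-truth labels during clustering, so quality is judged with the evaluation metrics listed under Evaluation Metrics rather than with accuracy)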
To gain experience in data preprocessing, feature selection, and unsupervised learning algorithms
To create a reusable clustering pipeline for future projects

Data

Data Source: Palmer Penguin Dataset

Data Description: The data describes penguins of different species, including physical characteristics such as bill (beak) length, flipper length, and body mass. The data has 344 instances and 17 features.

Data Preprocessing Steps:

Remove duplicate instances
Remove missing values
Normalize the data
Feature selection and engineering
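
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the three-column frame is a hypothetical stand-in for the real file, the column names are assumed, and "normalize" is interpreted here as z-score standardization with scikit-learn's StandardScaler (min-max scaling would be another reasonable reading).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in rows; a real run would load the Palmer Penguins file instead.
raw = pd.DataFrame({
    "bill_length_mm": [39.1, 39.1, 46.5, np.nan, 50.0],
    "flipper_length_mm": [181.0, 181.0, 210.0, 195.0, 230.0],
    "body_mass_g": [3750.0, 3750.0, 4500.0, 3800.0, 5700.0],
})

def preprocess(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Drop duplicate rows, drop rows with missing feature values, then z-score."""
    clean = df.drop_duplicates().dropna(subset=features)
    scaled = StandardScaler().fit_transform(clean[features])
    return pd.DataFrame(scaled, columns=features, index=clean.index)

# Assumed feature selection: three numeric body measurements.
features = ["bill_length_mm", "flipper_length_mm", "body_mass_g"]
X = preprocess(raw, features)
print(X.shape)
```

Scaling matters here because body mass is measured in grams and would otherwise dominate any distance-based clustering over the millimetre-scale columns.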

Tasks

Planning Phase

Define problem statement and project goals
Gather and clean data
Perform exploratory data analysis
Select appropriate unsupervised learning algorithms

Implementation Phase

Train and test clustering models
Fine-tune models
Evaluate model performance
Select final clustering model
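
One way the train-and-test step might look, assuming a scikit-learn pipeline that bundles scaling with K-Means so the same object can be reused on new data. The synthetic blobs, the 80/20 split, and `n_clusters=3` are illustrative assumptions, not values taken from the project notebooks.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the penguin measurements (three well-separated groups).
X, _ = make_blobs(
    n_samples=150, centers=[[0, 0], [5, 5], [-5, 5]], cluster_std=0.5, random_state=0
)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Scaling and clustering live in one object, so both are fitted on the training rows only.
pipe = make_pipeline(
    StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=0)
)
pipe.fit(X_train)

# Assign held-out rows to the learned clusters and score them in the scaled space.
test_labels = pipe.predict(X_test)
X_test_scaled = pipe[:-1].transform(X_test)
test_score = silhouette_score(X_test_scaled, test_labels)
print(f"held-out silhouette: {test_score:.3f}")
```

Note that only estimators with a `predict` method (K-Means among the three listed algorithms) can label unseen rows this way; the scikit-learn implementations of hierarchical clustering and DBSCAN would need to be refitted.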

Deployment Phase

Deploy model to production (if applicable)
Document project findings and conclusions
Create a blog post or portfolio entry about the project

Unsupervised Learning Algorithms

K-Means Clustering
Hierarchical Clustering
DBSCAN Clustering
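
As a rough sketch, the three algorithms can be fitted with scikit-learn as shown below. The data is synthetic, and every hyperparameter (`n_clusters=3`, Ward linkage, `eps=0.3`, `min_samples=5`) is an assumption chosen for the illustration rather than a setting from this project.

```python
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for scaled penguin measurements.
X, _ = make_blobs(
    n_samples=150, centers=[[0, 0], [5, 5], [-5, 5]], cluster_std=0.5, random_state=0
)
X = StandardScaler().fit_transform(X)

# K-Means: the number of clusters is fixed up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical (agglomerative) clustering: merges points bottom-up, cut at 3 clusters.
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# DBSCAN: density-based, infers the cluster count; label -1 marks noise points.
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Agreement between two labelings, ignoring how the cluster ids are numbered.
agreement = adjusted_rand_score(kmeans_labels, hier_labels)
n_noise = int((dbscan_labels == -1).sum())
print(f"k-means vs hierarchical ARI: {agreement:.2f}, DBSCAN noise points: {n_noise}")
```

Cluster ids are arbitrary, so labelings from different algorithms should be compared with a permutation-invariant measure such as the adjusted Rand index rather than by direct equality.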

Evaluation Metrics

Silhouette Score
Elbow Method (inertia plotted against the number of clusters, used to choose k)
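
A minimal sketch of how both metrics might be computed for K-Means, assuming scikit-learn and synthetic data in place of the penguin features. The range of k values is an arbitrary choice for the illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the preprocessed penguin features.
X, _ = make_blobs(
    n_samples=150, centers=[[0, 0], [5, 5], [-5, 5]], cluster_std=0.5, random_state=0
)

inertias = {}     # elbow method: within-cluster sum of squares for each k
silhouettes = {}  # silhouette score: only defined for k >= 2

for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    if k >= 2:
        silhouettes[k] = silhouette_score(X, model.labels_)

# The "elbow" is the k after which inertia stops dropping sharply (read off a plot);
# the silhouette score gives a direct numeric criterion instead.
best_k = max(silhouettes, key=silhouettes.get)
for k in inertias:
    sil = f"{silhouettes[k]:.3f}" if k in silhouettes else "n/a"
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={sil}")
print("k with highest silhouette:", best_k)
```

Silhouette scores lie between -1 and 1, with higher values indicating tighter, better-separated clusters. Inertia always shrinks as k grows, which is why the elbow method looks for the bend in the curve rather than the minimum.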

Folders and files

.github/workflows
.vscode
data
.gitignore
1 - EDA.ipynb
LICENSE
README.md
main-notebook.ipynb
