
My first GitHub repository to host a K-means Python program


KerrFitzgerald/KerrMeans


GitHub repository to host Python code (K_Means_Vectorized.py) for the K-means clustering algorithms. The algorithms used can be found in "Information Theory, Inference, and Learning Algorithms" by David J.C. MacKay.
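For reference, one iteration of hard K-means can be written as below. This is a summary in notation modelled on MacKay's, not a quotation from the book; the factor of 1/2 in the distance does not change which mean is nearest:

```latex
% Distance between a mean m and a data point x in D dimensions
d(\mathbf{m}, \mathbf{x}) = \tfrac{1}{2} \sum_{i=1}^{D} (m_i - x_i)^2

% Assignment step: each point n is assigned to its nearest mean
\hat{k}^{(n)} = \arg\min_{k} \, d\big(\mathbf{m}^{(k)}, \mathbf{x}^{(n)}\big),
\qquad
r_k^{(n)} = \begin{cases} 1 & \text{if } \hat{k}^{(n)} = k \\ 0 & \text{otherwise} \end{cases}

% Update step: each mean moves to the average of the points assigned to it
\mathbf{m}^{(k)} = \frac{\sum_{n} r_k^{(n)} \, \mathbf{x}^{(n)}}{R^{(k)}},
\qquad
R^{(k)} = \sum_{n} r_k^{(n)}
```

The two steps are repeated until the assignments, and hence the means, stop changing.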

This repository also contains a Jupyter Notebook (Lab Session k-means Clustering Wine Chemical Composition Data.ipynb), which applies K-means clustering using the scikit-learn Python library. The notebook has been used for interactive MSc Data Science teaching.
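A minimal sketch of a comparable scikit-learn workflow is shown below. It assumes the copy of the wine data bundled with scikit-learn (`sklearn.datasets.load_wine`) rather than the UCI download, and the choices of K = 3, feature standardization and `random_state=0` are illustrative assumptions, not taken from the notebook:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Load the wine data bundled with scikit-learn (178 samples, 13 features).
X, y = load_wine(return_X_y=True)

# Standardize features so that no single chemical measurement dominates the
# Euclidean distances used by K-means.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means with K = 3; n_init runs the algorithm from 10 random
# initializations and keeps the run with the lowest inertia.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X_scaled)

print("cluster sizes:", np.bincount(labels))
print("inertia:", model.inertia_)
```

Standardizing first matters for this dataset because its features are measured on very different scales.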

A number of improvements could be made to this code and repository. Examples include:

  1. Implement Updated Soft K-means Version 1 algorithm
  2. Implement Soft K-means Version 2 algorithm
  3. Implement Soft K-means Version 3 algorithm
  4. Implement the silhouette method
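As a possible starting point for the soft K-means items, here is a minimal NumPy sketch of soft K-means version 1 as it is commonly read from MacKay: each responsibility is a softmax of -β·d over the clusters, with a single stiffness β, and each mean becomes the responsibility-weighted average of the data. The function name `soft_kmeans_v1`, the random initialization, the tolerance and the synthetic test data are assumptions, and the "Updated" variant listed above is not reproduced:

```python
import numpy as np


def soft_kmeans_v1(X, k, beta=1.0, n_iter=100, tol=1e-8, seed=0):
    """Hypothetical soft K-means sketch with a single stiffness beta.

    X is an (N, D) array. Returns (means, responsibilities), where the
    responsibilities come from the final iteration.
    """
    rng = np.random.default_rng(seed)
    # Initialize the means at k distinct, randomly chosen data points.
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # d[n, j] = 0.5 * squared Euclidean distance from point n to mean j.
        d = 0.5 * ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        # Responsibilities: softmax of -beta * d over clusters. Subtracting
        # each row's minimum distance first keeps exp() from overflowing.
        r = np.exp(-beta * (d - d.min(axis=1, keepdims=True)))
        r /= r.sum(axis=1, keepdims=True)
        # Update: each mean is the responsibility-weighted average of the data.
        R = r.sum(axis=0)  # total responsibility of each cluster, shape (k,)
        new_means = (r.T @ X) / R[:, None]
        moved = np.max(np.abs(new_means - means))
        means = new_means
        if moved < tol:  # stop once the means no longer move
            break
    return means, r


# Usage on synthetic data: two well-separated 2D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
means, r = soft_kmeans_v1(X, k=2, beta=2.0)
print(np.round(means[np.argsort(means[:, 0])], 2))
```

A large β makes the assignments close to hard K-means, while a small β lets each point share its weight across several clusters.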

Completed improvements include:

  1. Add a feature to stop the 'update loop' once the mean values no longer move
  2. Reduce the number of print statements by using loops and a Python dictionary
  3. Expand the code to work with more than 3 dimensions
  4. Add an option for 2D functionality using an 'if' statement and an additional input parameter
  5. Split the code into functions, which also eases implementation of the elbow method
  6. Improve the test data so that it contains 5 dimensions (Moved to OLD_CODE)
  7. Implement the elbow method
  8. Improve the test data so that it contains 10 dimensions (Moved to OLD_CODE)
  9. Change the code to automate the elbow method
  10. Vectorize the distance and update calculations
  11. Add the ability to run statistically, testing different random-number seeds
  12. Adhere to the PEP 8 Python style guide
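Several of the completed items (stopping once the means stop moving, the vectorized distance and update calculations, repeated runs over random seeds and the automated elbow method) can be illustrated together. The sketch below is a hypothetical standalone NumPy version, not the code in K_Means_Vectorized.py; the names `kmeans` and `elbow`, the synthetic three-blob data and keeping the lowest inertia over seeds are all assumptions:

```python
import numpy as np


def kmeans(X, k, max_iter=300, seed=0):
    """Hypothetical vectorized hard K-means. Returns (means, labels, inertia)."""
    rng = np.random.default_rng(seed)
    # Initialize the means at k distinct, randomly chosen data points.
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Squared distances from every point to every mean via broadcasting,
        # shape (N, k), with no explicit Python loop over points.
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Vectorized update: one-hot responsibilities give per-cluster sums
        # and counts; an empty cluster keeps its previous mean.
        one_hot = np.eye(k)[labels]          # (N, k)
        counts = one_hot.sum(axis=0)         # points per cluster, (k,)
        sums = one_hot.T @ X                 # (k, D)
        new_means = np.where(counts[:, None] > 0,
                             sums / np.maximum(counts, 1)[:, None], means)
        # Stop the update loop once the means no longer move.
        if np.array_equal(new_means, means):
            break
        means = new_means
    # Final assignment and inertia (sum of squared distances to nearest mean).
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return means, labels, inertia


def elbow(X, k_max=8, seeds=range(5)):
    """Lowest inertia over several random seeds for each K = 1..k_max."""
    return [min(kmeans(X, k, seed=s)[2] for s in seeds)
            for k in range(1, k_max + 1)]


# Usage on synthetic data: three well-separated 2D blobs of 40 points each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(40, 2)) for c in centers])
inertias = elbow(X, k_max=6)
for k, inertia in enumerate(inertias, start=1):
    print(f"K={k}: inertia={inertia:.1f}")
```

On data like this the inertia should drop sharply up to K = 3 and flatten afterwards; that bend is the "elbow". Taking the minimum over seeds reduces the chance that a poor random initialization distorts the curve.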

The 'wine' dataset is available at: https://archive.ics.uci.edu/ml/datasets.php
