GitHub repository hosting Python code (K_Means_Vectorized.py) for the K-means algorithms. The algorithms used can be found in "Information Theory, Inference, and Learning Algorithms" by David J.C. MacKay.
This repository also contains a Jupyter Notebook (Lab Session k-means Clustering Wine Chemical Composition Data.ipynb) which applies K-means algorithms using the scikit-learn Python library. This notebook has been used for interactive MSc Data Science undergraduate teaching.
A number of improvements could be made to this code and repository. Examples include:
- Implement Updated Soft K-means Version 1 algorithm
- Implement Soft K-means Version 2 algorithm
- Implement Soft K-means Version 3 algorithm
- Implement the silhouette method
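
As a starting point for the soft K-means items above, here is a minimal sketch. It is not code from this repository. It assumes a formulation in which every point receives a responsibility proportional to exp(-beta * d) for each cluster, with d taken as half the squared Euclidean distance and beta a user-chosen stiffness. The function name `soft_kmeans` and all of its parameters are hypothetical.

```python
import numpy as np


def soft_kmeans(data, n_clusters, beta, n_iter=100, seed=0):
    """Soft K-means sketch (assumed formulation, hypothetical names).

    Returns (means, responsibilities); responsibilities has shape (N, K).
    """
    rng = np.random.default_rng(seed)
    # Initialise the means at randomly chosen, distinct data points.
    start = rng.choice(len(data), size=n_clusters, replace=False)
    means = data[start].astype(float)
    for _ in range(n_iter):
        # Assignment step: half squared Euclidean distance, shape (N, K).
        diff = data[:, None, :] - means[None, :, :]
        dist = 0.5 * np.sum(diff ** 2, axis=2)
        # Subtract each row's minimum before exponentiating (numerical stability).
        resp = np.exp(-beta * (dist - dist.min(axis=1, keepdims=True)))
        resp /= resp.sum(axis=1, keepdims=True)
        # Update step: responsibility-weighted mean of the data.
        totals = resp.sum(axis=0)
        new_means = means.copy()
        active = totals > 0  # a cluster with zero total keeps its old mean
        new_means[active] = (resp.T @ data)[active] / totals[active, None]
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, resp
```

A hard assignment can be recovered with `resp.argmax(axis=1)`. Larger values of `beta` make the responsibilities sharper, so the result approaches ordinary K-means.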
Completed improvements include:
- Add a feature to stop the 'update loop' once the mean values stop moving
- Reduce the number of print statements by using loops and a Python dictionary
- Expand code to work with more than 3 dimensions
- Add option for 2D functionality using an 'if' statement and an additional input parameter
- Split code into functions, which may also ease implementation of the elbow method
- Improve the test data so that it contains 5 dimensions (Moved to OLD_CODE)
- Implement elbow method
- Improve the test data so that it contains 10 dimensions (Moved to OLD_CODE)
- Change code to allow automated elbow method
- Vectorize the distance and update calculation
- Add the capability to run repeated statistical tests with different random number seeds
- Adhere to the PEP 8 Python style guide
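
Several of the completed items (vectorized distance and update calculations, stopping once the means no longer move, and the elbow method) can be illustrated with a short sketch. This is an illustrative version written under assumptions, not the contents of K_Means_Vectorized.py. The names `kmeans` and `elbow_curve`, their parameters, and the random initialisation from data points are all hypothetical.

```python
import numpy as np


def kmeans(data, n_clusters, max_iter=100, seed=0):
    """Vectorized hard K-means sketch; returns (means, labels, inertia)."""
    rng = np.random.default_rng(seed)
    # Initialise the means at randomly chosen, distinct data points.
    start = rng.choice(len(data), size=n_clusters, replace=False)
    means = data[start].astype(float)
    for _ in range(max_iter):
        # Squared distances for all point/mean pairs at once, shape (N, K).
        sq_dist = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # One-hot assignment matrix lets the update be a matrix product.
        one_hot = np.eye(n_clusters)[labels]
        counts = one_hot.sum(axis=0)
        new_means = means.copy()
        filled = counts > 0  # an empty cluster keeps its previous mean
        new_means[filled] = (one_hot.T @ data)[filled] / counts[filled, None]
        # Stop the update loop once the means do not move.
        if np.array_equal(new_means, means):
            break
        means = new_means
    # Recompute assignments and the within-cluster sum of squares.
    sq_dist = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dist.argmin(axis=1)
    inertia = sq_dist[np.arange(len(data)), labels].sum()
    return means, labels, inertia


def elbow_curve(data, max_clusters, seed=0):
    """Within-cluster sum of squares for K = 1 .. max_clusters."""
    return [kmeans(data, k, seed=seed)[2] for k in range(1, max_clusters + 1)]
```

Plotting the values returned by `elbow_curve` against K and looking for the point where the curve flattens gives the usual elbow heuristic. Calling `kmeans` with several different `seed` values is one way to check how sensitive the result is to initialisation.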
The 'wine' dataset is available at: https://archive.ics.uci.edu/ml/datasets.php