GitHub - ankursharma-iitd/clustering-data-mining: This is the 2nd assignment of the Data Mining Course (COL 761) at IIT Delhi.

Clustering

K-means, DBSCAN, OPTICS

This repository contains from-scratch implementations of k-means, DBSCAN, and OPTICS, written as part of Homework 2 of the Data Mining course (COL761) at IIT Delhi.

The dataset has the following format: each line corresponds to one n-dimensional point, and the dimensions are separated by spaces. The number of dimensions can be up to 5, and the number of points up to roughly 1 million. The feature values are floating-point numbers.

3 4 5... 
1 7 8... 
...
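
As an illustration of the input format above, here is a minimal Python sketch of a loader. It is not taken from the repository's code; the function names `parse_points` and `load_points` are hypothetical, and rejecting blank or ragged lines is an assumption made so that each point's ID stays equal to its line number.

```python
from typing import Iterable, List


def parse_points(lines: Iterable[str]) -> List[List[float]]:
    """Parse one point per line, with dimensions separated by whitespace."""
    points: List[List[float]] = []
    for line_no, line in enumerate(lines):
        fields = line.split()
        if not fields:
            # Assumption: blank lines are rejected so that the index of a
            # point in the returned list always equals its line number.
            raise ValueError(f"line {line_no}: empty line")
        if points and len(fields) != len(points[0]):
            raise ValueError(f"line {line_no}: inconsistent number of dimensions")
        points.append([float(value) for value in fields])
    return points


def load_points(path: str) -> List[List[float]]:
    """Read a dataset file such as dataset.txt into a list of points."""
    with open(path) as handle:
        return parse_points(handle)
```

For example, `parse_points(["3 4 5", "1 7 8"])` yields two 3-dimensional points whose list indices (0 and 1) match their line numbers.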

sh compile.sh clones the GitHub repository and compiles the code for all implementations.
sh <rollno>.sh -kmeans <k> executes the k-means algorithm with k as the number of clusters and produces the cluster assignment of each data point.
sh <rollno>.sh -dbscan -<minPts> <epsilon> executes DBSCAN and produces the cluster assignment of each data point.
sh <rollno>.sh -optics -<minPts> <epsilon> executes OPTICS and plots the reachability data using matplotlib.
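
To make the k-means command concrete, the following is a minimal sketch of the standard Lloyd iteration in plain Python. It is not the repository's kmeans.cpp or kmeans.py; the names `kmeans` and `squared_distance`, the random initialisation from k data points, and the `max_iters` and `seed` parameters are all assumptions chosen for illustration.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iters: int = 100, seed: int = 0) -> List[int]:
    """Return one cluster label (0..k-1) per point, indexed by line number."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points picked at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

On a dataset near the stated upper bound of about 1 million points, a pure-Python loop like this would be slow; it is meant only to show the assignment and update steps.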

Output format for DBSCAN and k-means

Files called dbscan.txt and kmeans.txt are produced by DBSCAN and k-means respectively. Each file has the following format:

#<cluster ID>
<Point 1 line no>
<Point 2 line no>
...
#<cluster ID> 
<Point 1 line no>
<Point 2 line no> 
...

The “#” marks the start of a cluster ID and is followed by the line numbers of all points belonging to that cluster. Line numbers start from 0 and can be treated as the ID of each point. For DBSCAN, all outliers are grouped under the special cluster ID “#outlier”. <cluster ID> values are integers only, assigned starting from 0.
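
The output format described above can be sketched in Python as a writer and a matching reader. This is an illustrative sketch, not the repository's code: the function names are hypothetical, and using the label -1 for DBSCAN outliers and placing the “#outlier” section last are assumptions.

```python
from typing import Dict, List, Sequence


def format_clusters(labels: Sequence[int], outlier_label: int = -1) -> str:
    """Render per-point labels (indexed by line number) in the cluster format."""
    clusters: Dict[int, List[int]] = {}
    outliers: List[int] = []
    for line_no, label in enumerate(labels):
        if label == outlier_label:  # assumed convention for DBSCAN noise
            outliers.append(line_no)
        else:
            clusters.setdefault(label, []).append(line_no)
    lines: List[str] = []
    for cluster_id in sorted(clusters):
        lines.append(f"#{cluster_id}")
        lines.extend(str(n) for n in clusters[cluster_id])
    if outliers:
        lines.append("#outlier")
        lines.extend(str(n) for n in outliers)
    return "\n".join(lines) + "\n"


def parse_clusters(text: str) -> Dict[str, List[int]]:
    """Read the cluster format back into {cluster ID: [line numbers]}."""
    result: Dict[str, List[int]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line[1:]  # "0", "1", ... or "outlier"
            result.setdefault(current, [])
        elif current is None:
            raise ValueError("point listed before any '#<cluster ID>' header")
        else:
            result[current].append(int(line))
    return result
```

Writing `format_clusters(labels)` to kmeans.txt or dbscan.txt would give a file in the layout shown above, and `parse_clusters` recovers the grouping from such a file.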

Folders and files

plots
.gitignore
2015CS50278.sh
2015CS50278.zip
KDTreeVectorOfVectorsAdaptor.h
README.md
README.txt
Updated_Assignment2.pdf
check_accuracy.py
compile.sh
dataset.txt
dataset_gen.py
dbscan.cpp
dbscan.py
install.sh
kmeans.cpp
kmeans.py
nanoflann.hpp
new_dataset.txt
new_dataset_gen.py
optics.py
updated_plot.py
