A Parallel Implementation of The Apriori Algorithm on AiMOS Supercomputer Using CUDA and MPI

(For implementation details, see Report.pdf. For compilation instructions, see README.pdf.)

For this project, we implemented the Apriori algorithm for frequent pattern mining on the DCS cluster of AiMOS using CUDA and MPI. The most computationally expensive part of the Apriori algorithm is scanning the whole dataset at each level. This step can be massively parallelized because it consists of independent subset tests between each transaction in the dataset and each candidate pattern.

To implement the algorithm in CUDA, we used bitset representations of the itemsets. This lets us perform operations such as subset testing and set intersection with simple bitwise operations, which are very fast on both CPU and GPU. Each GPU thread is assigned one candidate pattern and performs the subset tests for it, so the dataset is scanned for all candidate patterns simultaneously. To take advantage of the multiple MPI ranks and GPUs available on AiMOS, we partitioned the dataset and assigned a different part to each MPI rank. After each rank computes the support counts on its own partition, the total support is obtained by summing the counts across all ranks with an MPI all-reduce operation. To speed up the reading of large datasets, each MPI rank reads its own contiguous block of the data file in parallel using MPI read. MPI write is used to make writing the output file more efficient.

We verified the scalability of this implementation through extensive experiments, both on massively parallel configurations and on large datasets. We also measured the performance of the different parts of the program and identified potential limiting factors. The current implementation can be improved further, but it already achieves good results on benchmark datasets.
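The bitset subset test and the per-partition support counting described above can be sketched on the CPU as follows. This is a minimal illustration, not the repository's code: the names `Itemset`, `is_subset`, `local_support`, and `total_support` are hypothetical, a single 64-bit word stands in for the bitset (so at most 64 distinct items), and the sum over partitions is done in a plain loop where the real program would use an MPI all-reduce across ranks. The outer loop over candidates corresponds to the work that would be given to one GPU thread per candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: an itemset fits in one 64-bit word; bit i is set when item i is present.
using Itemset = std::uint64_t;

// A candidate is contained in a transaction when every candidate bit
// is also set in the transaction.
inline bool is_subset(Itemset candidate, Itemset transaction) {
    return (candidate & transaction) == candidate;
}

// Support of every candidate within transactions[begin, end).
// On a GPU, each iteration of the outer loop would be one thread.
std::vector<std::uint32_t> local_support(const std::vector<Itemset>& candidates,
                                         const std::vector<Itemset>& transactions,
                                         std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= transactions.size());
    std::vector<std::uint32_t> support(candidates.size(), 0);
    for (std::size_t c = 0; c < candidates.size(); ++c) {
        for (std::size_t t = begin; t < end; ++t) {
            if (is_subset(candidates[c], transactions[t])) {
                ++support[c];
            }
        }
    }
    return support;
}

// Splits the transactions into num_parts contiguous blocks (one per
// hypothetical rank), counts support in each block, and sums the counts.
// The summation stands in for an all-reduce with a sum operator.
std::vector<std::uint32_t> total_support(const std::vector<Itemset>& candidates,
                                         const std::vector<Itemset>& transactions,
                                         std::size_t num_parts) {
    assert(num_parts > 0);
    std::vector<std::uint32_t> total(candidates.size(), 0);
    const std::size_t n = transactions.size();
    for (std::size_t p = 0; p < num_parts; ++p) {
        const std::size_t begin = n * p / num_parts;
        const std::size_t end = n * (p + 1) / num_parts;
        const std::vector<std::uint32_t> local =
            local_support(candidates, transactions, begin, end);
        for (std::size_t c = 0; c < candidates.size(); ++c) {
            total[c] += local[c];
        }
    }
    return total;
}
```

Because each partition's counts are simply added together, the result does not depend on how many partitions are used, which is what makes the all-reduce step sufficient to combine the ranks' results.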

Repository contents:
datasets
reader
.gitignore
Makefile
README.md
README.pdf
Report.pdf
apriori
apriori.cpp
apriori.h
compsup.cpp
compsup.cu
compsup.h
dataset.h
getticks.h
slurm-156064(sample output).out
slurmSpectrum.sh

shamim-hussain/parallel-apriori-with-cuda-and-mpi
