Project for Programming Massively Parallel Processors
Load the following modules before compiling on octopus:

    module load cuda
    module load gcc/10.1.0

Progress:
- GPU0: Done
- GPU1: Done (not efficient)
- GPU2: Done, optimized (most efficient version)
- GPU3: Done
- GPU4: TODO

Our results for matrix 3 (126 ms):
![Results.png](https://github.com/chriskhalil/SPM_GPU/blob/28f30831a0deb00317e76b385c3ea65616f6ff1d/Results.png)