A collection of GPU kernels implemented one day at a time, progressing from basic to advanced concepts.
- NVIDIA GPU with CUDA support
- CUDA Toolkit installed
- Python 3.11+
- PyTorch
- Day 1 - Basic Vector Addition in CUDA
- Day 2 - Vector Addition with Python/PyTorch Bindings
- Day 3 - RGB to Grayscale Conversion
- Day 4 - RGB to Blurred Image Conversion
- Day 5 - Simple Matrix Multiplication
- Day 6 - Coalesced Matrix Multiplication
- Day 7 - GELU Activation Function
- Day 8 - Naive Batch Normalisation
- Day 9 - Sigmoid Activation Function
- Day 10 - Tanh Activation Function and Tiled Matrix Multiplication
- Day 11 - Dynamic Tiled Matrix Multiplication
- Day 12 - Layer Normalisation using Shared Memory
- Day 13 - Matrix Transpose
- Day 14 - Softmax using shared memory
- Day 15 - GELU Forward and Backward Kernels
- Day 16 - Querying GPU Properties
- Day 17 - Custom NF4 Quantization Implementation
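
As a rough orientation for Day 1, the snippet below is a minimal CPU-only sketch in plain Python of the indexing pattern a basic CUDA vector-addition kernel typically uses. It is not the code from this repository: the function name `vector_add_emulated` and the `block_dim` parameter are hypothetical, and the nested loops only emulate what GPU threads would do in parallel.

```python
def vector_add_emulated(a, b, block_dim=4):
    """Emulate a 1D CUDA launch on the CPU (hypothetical helper, not repo code)."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    n = len(a)
    out = [0.0] * n
    # Ceiling division so the grid covers all n elements.
    grid_dim = (n + block_dim - 1) // block_dim
    for block_idx in range(grid_dim):          # plays the role of blockIdx.x
        for thread_idx in range(block_dim):    # plays the role of threadIdx.x
            # Global element index: blockIdx.x * blockDim.x + threadIdx.x.
            i = block_idx * block_dim + thread_idx
            # Bounds guard: threads in the last block may fall past the end.
            if i < n:
                out[i] = a[i] + b[i]
    return out


if __name__ == "__main__":
    print(vector_add_emulated([1.0, 2.0, 3.0, 4.0, 5.0],
                              [10.0, 20.0, 30.0, 40.0, 50.0]))
```

With five elements and `block_dim=4`, the emulated grid has two blocks, so three of the eight "threads" are masked out by the bounds check. The same guard is what keeps a real kernel from writing out of range when the vector length is not a multiple of the block size.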