This repository hosts microbenchmarks and recommendation-system operations for Gaudi-v2. Our work focuses on characterizing the hardware and providing efficient implementations of key operations. Specifically, we aim to deliver the following contributions:
- Microbenchmarks: Evaluation of compute, memory, and communication primitives on Gaudi-v2.
- TPC (Tensor Processor Core) Kernel for EmbeddingBag: Implementation of a table-batched EmbeddingBag operation optimized for Gaudi-v2.
- TPC Kernel for Data Preprocessing: Development of a custom kernel for efficient data preprocessing.

We have implemented the EmbeddingBag operation for Gaudi-v2 and are actively working on the remaining kernels and benchmarks. This README will be updated as they become available.
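
For readers unfamiliar with the operation, the following is a minimal pure-Python sketch of what a table-batched EmbeddingBag computes. It is a reference for the semantics only, not the TPC kernel in this repository. It assumes sum pooling and the offsets convention of PyTorch's `torch.nn.EmbeddingBag` (each offset marks the start of a bag), and that per-table outputs are concatenated along the feature dimension. The function names `embedding_bag_sum` and `table_batched_embedding_bag` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence

Table = Sequence[Sequence[float]]  # one embedding table: rows x dim


def embedding_bag_sum(table: Table, indices: Sequence[int],
                      offsets: Sequence[int]) -> List[List[float]]:
    """Sum-pool rows of a single table.

    Bag b covers indices[offsets[b]:offsets[b + 1]]; the last bag runs to
    the end of `indices` (assumed convention, as in torch.nn.EmbeddingBag).
    """
    dim = len(table[0])
    bounds = list(offsets) + [len(indices)]
    pooled = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        acc = [0.0] * dim  # an empty bag pools to all zeros
        for idx in indices[start:end]:
            row = table[idx]
            for d in range(dim):
                acc[d] += row[d]
        pooled.append(acc)
    return pooled


def table_batched_embedding_bag(tables: Sequence[Table],
                                indices_per_table: Sequence[Sequence[int]],
                                offsets_per_table: Sequence[Sequence[int]]
                                ) -> List[List[float]]:
    """Pool every table, then concatenate the results per sample.

    Assumes every table sees the same batch size (same number of bags).
    """
    per_table = [
        embedding_bag_sum(t, i, o)
        for t, i, o in zip(tables, indices_per_table, offsets_per_table)
    ]
    batch_size = len(per_table[0])
    return [
        [value for pooled in per_table for value in pooled[b]]
        for b in range(batch_size)
    ]


# Two tables (dim 2 and dim 1), batch of two samples.
tables = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[10.0], [20.0]],
]
indices_per_table = [[0, 2, 1], [1, 0, 1]]
offsets_per_table = [[0, 2], [0, 1]]

result = table_batched_embedding_bag(tables, indices_per_table, offsets_per_table)
print(result)  # [[6.0, 8.0, 20.0], [3.0, 4.0, 30.0]]
```

A batched formulation like this lets a single kernel launch serve all tables, rather than one launch per table. How the actual kernel lays out tables, indices, and offsets in memory is not specified here.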