Parallel Hash Join Optimizations in Rust

Authors: Zhidong (David) Guo, Ye (Ethan) Yuan

[Project Website](https://cmu-15618-team.github.io/parallel-hash-join/)



Overview

This repository contains the implementation and performance analysis of in-memory parallel hash join algorithms. The project is a deep exploration into optimizing hash join operations, a fundamental component of relational database management systems (DBMS), by leveraging parallelism and advanced memory management techniques in Rust.

The project is part of Carnegie Mellon University's 15-618 Parallel Computer Architecture and Programming course (Spring 2024).

Table of Contents

  • Introduction
  • Features
  • Methodology
  • Implementation Details
  • Performance Evaluation
  • Setup and Usage
  • Results
  • Contributors
  • License

Introduction

Join operations are crucial for relational databases, often determining the performance of SQL queries. We focus on the hash join algorithm due to its efficiency in handling large datasets. Our project evaluates two main variants of parallel hash joins:

  • Shared Hash Join: All threads build and use a single shared hash table.
  • Partitioned Hash Join: Input relations are divided into partitions, and each thread operates on its local partition.

We analyze these designs using both static and dynamic task scheduling and present a comprehensive performance analysis across various workloads with different data distributions.
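
To make the contrast concrete, here is a minimal type-level sketch of the two designs; the struct and field names are hypothetical and not taken from this repository.

```rust
use parking_lot::Mutex;
use std::sync::Arc;

type Key = u64;
type Payload = u64;

// Shared hash join: one table for all threads; concurrent builds synchronize
// on per-bucket locks (hypothetical layout).
struct SharedHashJoin {
    buckets: Arc<Vec<Mutex<Vec<(Key, Payload)>>>>,
}

// Partitioned hash join: the inputs are first split by hash; each thread then
// builds and probes a private, unsynchronized table for its own partitions.
struct PartitionedHashJoin {
    inner_partitions: Vec<Vec<(Key, Payload)>>,
    outer_partitions: Vec<Vec<(Key, Payload)>>,
}
```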

Features

  • Rust Implementation: Written in Rust for memory safety and performance.
  • Dynamic Scheduling: Uses Rayon for efficient work-stealing and load balancing.
  • Cache Utilization Analysis: Detailed investigation into cache performance and optimization.
  • Support for Skewed and Uniform Data: Benchmarked using datasets with different skewness levels, including uniform, low skew, and high skew distributions.

Methodology

Our hash join algorithm is divided into three main phases:

  1. Partition Phase (optional): Divides tables into partitions to improve cache efficiency.
  2. Build Phase: Constructs hash tables from the inner relation.
  3. Probe Phase: Probes the outer relation against the hash tables to generate joined tuples.

The performance of these phases was measured and analyzed under various configurations to optimize parallel execution.
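
A simplified, single-threaded sketch of the three phases is shown below; the tuple layout ((key, payload) pairs of u64), the partition count, and the stand-in hash function are assumptions, not the repository's actual code.

```rust
// Stand-in for xxh3; any 64-bit hash works for this outline.
fn hash(key: u64) -> u64 {
    key.wrapping_mul(0x9E3779B97F4A7C15)
}

const NUM_PARTITIONS: usize = 64; // illustrative

// 1. Partition phase (optional): split a relation by hash so each partition
//    stays small enough to be cache-resident during build/probe.
fn partition(rel: &[(u64, u64)]) -> Vec<Vec<(u64, u64)>> {
    let mut parts = vec![Vec::new(); NUM_PARTITIONS];
    for &(key, payload) in rel {
        parts[(hash(key) as usize) % NUM_PARTITIONS].push((key, payload));
    }
    parts
}

// 2. Build phase: construct a chained hash table over the inner relation.
fn build(inner: &[(u64, u64)], num_buckets: usize) -> Vec<Vec<(u64, u64)>> {
    let mut table = vec![Vec::new(); num_buckets];
    for &(key, payload) in inner {
        table[(hash(key) as usize) % num_buckets].push((key, payload));
    }
    table
}

// 3. Probe phase: look up each outer tuple and emit matching joined tuples.
fn probe(table: &[Vec<(u64, u64)>], outer: &[(u64, u64)]) -> Vec<(u64, u64, u64)> {
    let mut out = Vec::new();
    for &(key, o_payload) in outer {
        for &(k, i_payload) in &table[(hash(key) as usize) % table.len()] {
            if k == key {
                out.push((key, i_payload, o_payload));
            }
        }
    }
    out
}

fn main() {
    let inner = vec![(1, 100), (2, 200)];
    let outer = vec![(1, 10), (1, 11), (3, 30)];
    // Partition both sides the same way, then build/probe per partition pair.
    let (ip, op) = (partition(&inner), partition(&outer));
    let joined: Vec<_> = ip
        .iter()
        .zip(&op)
        .flat_map(|(i, o)| probe(&build(i, 16), o))
        .collect();
    assert_eq!(joined.len(), 2); // key 1 matches twice
}
```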

Scheduling Strategies

  • Static Scheduling: Fixed assignment of partitions to threads.
  • Dynamic Scheduling: Threads dynamically pick partitions, reducing workload imbalance.
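
The sketch below contrasts the two strategies over a set of partition tasks; `process`, the partition count, and the thread count are placeholders, and the repository's actual task granularity may differ.

```rust
use rayon::prelude::*;
use std::thread;

// One "task" is processing a single partition; `process` stands in for
// build + probe on that partition.
fn process(partition_id: usize) {
    let _ = partition_id; // ... build and probe the partition ...
}

// Static scheduling: partitions are split into equal contiguous chunks,
// one chunk per thread, fixed up front.
fn run_static(num_partitions: usize, num_threads: usize) {
    let chunk = num_partitions.div_ceil(num_threads);
    thread::scope(|s| {
        for t in 0..num_threads {
            s.spawn(move || {
                for p in (t * chunk)..((t + 1) * chunk).min(num_partitions) {
                    process(p);
                }
            });
        }
    });
}

// Dynamic scheduling: Rayon's work-stealing pool hands out partitions as
// threads become free, smoothing out skewed partition sizes.
fn run_dynamic(num_partitions: usize) {
    (0..num_partitions).into_par_iter().for_each(process);
}

fn main() {
    run_static(64, 8);
    run_dynamic(64);
}
```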

Hash Functions

We used xxh3 as our hash function for its high performance, and parking_lot::Mutex for efficient synchronization in our Rust implementation.
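
A minimal sketch of the hashed, lock-protected insert path, assuming the xxhash-rust crate (with its xxh3 feature) as the xxh3 binding; the repository may use a different binding, and the bucket layout here is illustrative.

```rust
use parking_lot::Mutex;
use xxhash_rust::xxh3::xxh3_64; // assumption: the `xxhash-rust` crate, `xxh3` feature

// Map a key to a bucket index with xxh3, then append under that bucket's lock.
// parking_lot::Mutex::lock() returns the guard directly (no poisoning Result),
// which keeps the hot insert path short.
fn insert(buckets: &[Mutex<Vec<(u64, u64)>>], key: u64, payload: u64) {
    let idx = (xxh3_64(&key.to_le_bytes()) as usize) % buckets.len();
    buckets[idx].lock().push((key, payload));
}

fn main() {
    let buckets: Vec<Mutex<Vec<(u64, u64)>>> =
        (0..1024).map(|_| Mutex::new(Vec::new())).collect();
    insert(&buckets, 42, 7);
    assert_eq!(buckets.iter().map(|b| b.lock().len()).sum::<usize>(), 1);
}
```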

Implementation Details

  • Programming Language: Rust
  • Libraries:
    • Rayon for parallel execution
    • parking_lot for synchronization primitives
    • boxcar for lock-free vector operations

Data Structures

  • Hash Buckets: Each bucket is a parking_lot::Mutex<Vec<_>> for high read throughput.
  • Partition Buffers: Implemented with boxcar::Vec for efficient parallel writes.
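
A minimal sketch of both structures, assuming the boxcar crate's push/count API and a (key, payload) tuple layout; names and sizes are illustrative rather than taken from this repository.

```rust
use parking_lot::Mutex;

// Shared hash table: a Vec of buckets, each bucket its own Mutex<Vec<..>>,
// so concurrent inserts only contend when they hash to the same bucket.
type Bucket = Mutex<Vec<(u64, u64)>>;

fn new_table(num_buckets: usize) -> Vec<Bucket> {
    (0..num_buckets).map(|_| Mutex::new(Vec::new())).collect()
}

// Partition buffers: boxcar::Vec allows concurrent appends through a shared
// reference during the partition phase (API assumed from the `boxcar` crate).
fn scatter(tuples: &[(u64, u64)], buffers: &[boxcar::Vec<(u64, u64)>]) {
    for &(key, payload) in tuples {
        let p = (key as usize) % buffers.len(); // stand-in for the real hash
        buffers[p].push((key, payload));
    }
}

fn main() {
    let table = new_table(8);
    table[3].lock().push((3, 30));

    let buffers: Vec<boxcar::Vec<(u64, u64)>> =
        (0..4).map(|_| boxcar::Vec::new()).collect();
    std::thread::scope(|s| {
        s.spawn(|| scatter(&[(1, 10), (2, 20)], &buffers));
        s.spawn(|| scatter(&[(5, 50), (6, 60)], &buffers));
    });
    assert_eq!(buffers.iter().map(|b| b.count()).sum::<usize>(), 4);
    assert_eq!(table[3].lock().len(), 1);
}
```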

Performance Evaluation

Hardware

  • Platform: Pittsburgh Supercomputing Center (PSC)
  • CPU: AMD EPYC 7742 with 64 cores
  • Memory: 256GB RAM

Experiments

We benchmarked the performance using varying skew parameters to evaluate the efficiency and scalability of the implementations. The analysis covered:

  • Speedup Analysis: Evaluated the overall and per-phase speedup.
  • Cache Analysis: Investigated cache miss rates across configurations.
  • Synchronization Overhead: Measured CPU cycles per output tuple for different scheduling and partitioning strategies.

Setup and Usage

Prerequisites

  • Rust Compiler (version 1.77.2 or higher)
  • Cargo for dependency management

Installation

  1. Clone the repository:
    git clone https://github.com/cmu-15618-team/parallel-hash-join.git
    cd parallel-hash-join
  2. Build the project:
    cargo build --release

Running the Benchmarks

To execute the benchmarks:

cargo run --release -- --inner-tuple-num <number> --outer-ratio <ratio> --threads <number>

Modify the parameters to customize the workload and evaluate performance.
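
For example (the values below are purely illustrative, not defaults from this repository):

cargo run --release -- --inner-tuple-num 16000000 --outer-ratio 16 --threads 8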

Results

  • Speedup: Achieved up to 6.5x speedup using 8 cores in the probe phase.
  • Cache Optimization: Partitioned hash join reduced cache misses significantly under uniform data distribution.
  • Synchronization Overhead: Shared hash join performed better under skewed data distributions due to reduced synchronization.

For detailed performance graphs and analysis, refer to our report.

Contributors

  • Zhidong (David) Guo
  • Ye (Ethan) Yuan

License

This project is for educational purposes only as part of the 15-618 course at Carnegie Mellon University. Please do not distribute.
