updated README
YumingxuanGuo committed Mar 23, 2023
1 parent 9496e2d commit 83d5e85
Showing 1 changed file with 38 additions and 20 deletions.
58 changes: 38 additions & 20 deletions README.md
@@ -1,16 +1,35 @@
# FeatherDB

Version: 0.2.0
Version: 0.3.0

## Introduction

FeatherDB is an on-disk, persistent, concurrent, unreliable, non-transactional, non-relational, centralized database.
FeatherDB is a disk-based, concurrent, transactional, unreliable, non-relational, centralized database management system implemented in Rust,
written for educational purposes.

## What's New
In the most recent release, FeatherDB evolved from purely in-memory to disk-based with persistence.
We chose LSM-tree as the main key-value storage engine for its great performance in write operations.

Five methods are provided as the interface:
In our latest release, FeatherDB introduces support for concurrent ACID transactions based on Multi-Version Concurrency Control (MVCC). We now offer two isolation levels: Snapshot Isolation (SI) and Serializable Snapshot Isolation (SSI).

### Snapshot Isolation (SI)

Snapshot Isolation enables multiple transactions to execute concurrently by creating a distinct snapshot of the database for each transaction. This approach avoids the need for locks and ensures that write operations do not block read operations, reducing contention between transactions and enhancing overall system performance. Furthermore, SI versions all data, allowing queries on historical data. To provide ACID transactions, commits are atomic: a transaction operates on a consistent snapshot of the key/value store from the beginning of the transaction, and any write conflicts lead to serialization errors that require retries.

Reference: [toyDB](https://github.com/erikgrinaker/toydb).
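
To make the snapshot and conflict rules above concrete, here is a minimal single-threaded sketch. It is not FeatherDB's or toyDB's actual code: `MvccStore`, `Txn`, and `CommitError` are hypothetical names, everything lives in memory, and conflicts are checked at commit time (first committer wins), which is only one of several possible designs.

```rust
use std::collections::BTreeMap;

/// Hypothetical versioned store: key -> (commit version -> value).
/// A `None` value marks a delete.
#[derive(Default)]
struct MvccStore {
    versions: BTreeMap<Vec<u8>, BTreeMap<u64, Option<Vec<u8>>>>,
    last_commit: u64,
}

/// A transaction reads from the snapshot taken at `begin` and buffers its writes.
struct Txn {
    snapshot: u64,
    writes: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
}

#[derive(Debug, PartialEq)]
enum CommitError {
    /// Another transaction committed a write to the same key after our snapshot.
    Serialization,
}

impl Txn {
    fn set(&mut self, key: &[u8], value: Vec<u8>) {
        self.writes.insert(key.to_vec(), Some(value));
    }

    fn delete(&mut self, key: &[u8]) {
        self.writes.insert(key.to_vec(), None);
    }
}

impl MvccStore {
    /// Starts a transaction whose snapshot is the latest committed version.
    fn begin(&self) -> Txn {
        Txn { snapshot: self.last_commit, writes: BTreeMap::new() }
    }

    /// Returns the transaction's own buffered write if there is one,
    /// otherwise the newest committed version visible at its snapshot.
    fn get(&self, txn: &Txn, key: &[u8]) -> Option<Vec<u8>> {
        if let Some(own) = txn.writes.get(key) {
            return own.clone();
        }
        self.versions
            .get(key)?
            .range(..=txn.snapshot)
            .next_back()
            .and_then(|(_, value)| value.clone())
    }

    /// Applies all buffered writes atomically under a new version, or rejects
    /// the transaction if any written key changed after the snapshot.
    fn commit(&mut self, txn: Txn) -> Result<u64, CommitError> {
        for key in txn.writes.keys() {
            if let Some(history) = self.versions.get(key) {
                if history.range(txn.snapshot + 1..).next().is_some() {
                    return Err(CommitError::Serialization);
                }
            }
        }
        self.last_commit += 1;
        let version = self.last_commit;
        for (key, value) in txn.writes {
            self.versions.entry(key).or_default().insert(version, value);
        }
        Ok(version)
    }
}
```

In this sketch, a caller that receives `CommitError::Serialization` would begin a fresh transaction and retry. Because old versions are never overwritten, a reader that began earlier keeps seeing its own snapshot even after later commits.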

### Serializable Snapshot Isolation (SSI)

While SI eliminates most inconsistencies, including phantom reads, it is not fully serializable, as it may exhibit write skew anomalies. To address this, we have introduced a separate mechanism that detects transactions with consecutive read-write dependencies, a necessary but not sufficient condition for write skew. When such a transaction is detected, it is aborted, rolled back, and retried. This approach conservatively eliminates write skew and ensures serializability, although some benign transactions may also be aborted. We accept these false positives because they are infrequent.

Reference: [Serializable Isolation for Snapshot Databases](https://courses.cs.washington.edu/courses/cse444/08au/544M/READING-LIST/fekete-sigmod2008.pdf).
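
The detection idea can be sketched as follows, loosely following the flag-based approach in the referenced paper. This is an illustrative simplification, not FeatherDB's implementation: `RwTracker`, `RwFlags`, and `record_rw` are hypothetical names, and the sketch assumes the caller only reports dependencies between concurrent transactions.

```rust
use std::collections::HashMap;

type TxnId = u64;

/// Per-transaction flags: does it have an incoming and/or an outgoing
/// read-write dependency with a concurrent transaction?
#[derive(Default, Clone, Copy)]
struct RwFlags {
    incoming: bool,
    outgoing: bool,
}

#[derive(Default)]
struct RwTracker {
    flags: HashMap<TxnId, RwFlags>,
}

impl RwTracker {
    /// Records that `reader` read a version that the concurrent `writer`
    /// overwrote (a dependency from reader to writer). Returns a transaction
    /// that now has both an incoming and an outgoing dependency, if any;
    /// the caller would abort and retry it.
    fn record_rw(&mut self, reader: TxnId, writer: TxnId) -> Option<TxnId> {
        self.flags.entry(reader).or_default().outgoing = true;
        self.flags.entry(writer).or_default().incoming = true;
        for id in [reader, writer] {
            let f = self.flags[&id];
            if f.incoming && f.outgoing {
                return Some(id);
            }
        }
        None
    }

    /// Forgets a transaction once it has committed or aborted.
    fn finish(&mut self, txn: TxnId) {
        self.flags.remove(&txn);
    }
}
```

In the classic write skew, transaction 1 reads `x` and writes `y` while transaction 2 reads `y` and writes `x`. The first reported dependency is harmless on its own; the second gives one transaction both flags, so it is selected for abort. A transaction can carry both flags without any real anomaly, which is where the false positives come from.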

## Key-Value Storage

FeatherDB has evolved from a purely in-memory solution to a disk-based, persistent storage system.
We've chosen the LSM-tree as the primary key-value storage engine due to its excellent write performance.

We provide five methods as the interface:

```Rust
/// Sets a value for a key, replacing the existing value if any.
fn set(&self, key: &[u8], value: Vec<u8>) -> Result<()>;
@@ -28,33 +47,32 @@ fn scan(&self, range: Range) -> Result<KvScan>;
fn flush(&self) -> Result<()>;
```

LSM-tree contains two different data structures: memtable for in-memory data, and sstable (sorted string table) for on-disk data.
Once the data are flushed as sstables to disk, they become immutable; so mutable operations are only performed in the memtables.
The LSM-tree comprises two distinct data structures: a memtable for in-memory data and an SSTable (Sorted String Table) for on-disk data. Once data is flushed to disk as SSTables, it becomes immutable; mutable operations are only performed in the memtables.
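
As a rough illustration of the memtable/SSTable split, the sketch below keeps everything in memory and uses a `BTreeMap` as the memtable and a sorted `Vec` as a stand-in for an on-disk SSTable. `ToyLsm` and `SsTable` are hypothetical names, and the methods take `&mut self` for simplicity, unlike FeatherDB's `&self` interface.

```rust
use std::collections::BTreeMap;

/// Immutable sorted run standing in for an on-disk SSTable.
/// A `None` value is a tombstone left by a delete.
struct SsTable {
    entries: Vec<(Vec<u8>, Option<Vec<u8>>)>,
}

impl SsTable {
    /// Binary search works because entries are sorted by key.
    fn get(&self, key: &[u8]) -> Option<&Option<Vec<u8>>> {
        self.entries
            .binary_search_by(|(k, _)| k.as_slice().cmp(key))
            .ok()
            .map(|i| &self.entries[i].1)
    }
}

#[derive(Default)]
struct ToyLsm {
    /// All mutations land here first.
    memtable: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
    /// Flushed runs, oldest first; never modified after creation.
    sstables: Vec<SsTable>,
}

impl ToyLsm {
    fn set(&mut self, key: &[u8], value: Vec<u8>) {
        self.memtable.insert(key.to_vec(), Some(value));
    }

    /// Deletes write a tombstone instead of touching flushed data.
    fn delete(&mut self, key: &[u8]) {
        self.memtable.insert(key.to_vec(), None);
    }

    /// Freezes the current memtable into a new immutable sorted run.
    fn flush(&mut self) {
        if self.memtable.is_empty() {
            return;
        }
        let entries = std::mem::take(&mut self.memtable).into_iter().collect();
        self.sstables.push(SsTable { entries });
    }

    /// Checks the memtable first, then the runs from newest to oldest,
    /// so the most recent write (or tombstone) for a key wins.
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        if let Some(value) = self.memtable.get(key) {
            return value.clone();
        }
        for table in self.sstables.iter().rev() {
            if let Some(value) = table.get(key) {
                return value.clone();
            }
        }
        None
    }
}
```

Note that outdated values and tombstones stay in older runs until something merges them away, which is the job of the compaction step listed under "What's Next".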

We used a lock-free skiplist as the underlying implementation for the memtable, assuring thread-safety in concurrent operations.
As a plus, all the methods in the interface take immutable references, allowing us to access the storage with simply `Arc<LsmTree>`, instead of `Arc<RwLock<LsmTree>>`.
We employ a lock-free skiplist as the underlying implementation for the memtable, ensuring thread-safety in concurrent operations.
Additionally, all interface methods take immutable references, allowing us to access the storage with a simple `Arc<LsmTree>` instead of `Arc<RwLock<LsmTree>>`.
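
The effect of `&self` methods on sharing can be sketched with the standard library alone. Note the assumption: `SharedStore` below is a hypothetical type that hides a `Mutex` inside as a stand-in for the lock-free skiplist, so it is not lock-free; it only shows why callers need `Arc<T>` and no outer `RwLock` when synchronization is internal.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical store whose methods all take `&self`.
/// The internal Mutex is a stand-in for a lock-free structure.
struct SharedStore {
    inner: Mutex<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl SharedStore {
    fn new() -> Self {
        SharedStore { inner: Mutex::new(BTreeMap::new()) }
    }

    fn set(&self, key: &[u8], value: Vec<u8>) {
        self.inner.lock().unwrap().insert(key.to_vec(), value);
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.inner.lock().unwrap().get(key).cloned()
    }
}

/// Spawns `writers` threads that each insert one key through a cloned Arc.
/// No `Arc<RwLock<SharedStore>>` is needed because `set` takes `&self`.
fn write_from_threads(store: &Arc<SharedStore>, writers: u8) {
    let handles: Vec<_> = (0..writers)
        .map(|i| {
            let store = Arc::clone(store);
            thread::spawn(move || store.set(&[i], vec![i]))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```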

We also provided the iterator functionality in the `scan` method.
This offers efficient key-range traversals and, possibly in the future, support for SQL queries.
We also offer iterator functionality in the `scan` method,
enabling efficient key-range traversals and laying the foundation for potential future SQL query support.
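
A key-range traversal over sorted data can be sketched with `BTreeMap::range`. This is only an illustration: the free function `scan` below takes explicit start and end keys, whereas FeatherDB's own `Range` and `KvScan` types are not shown in this README and are not modeled here.

```rust
use std::collections::{btree_map, BTreeMap};
use std::ops::Bound;

/// Lazily iterates over all entries with `start <= key < end`, in key order.
/// An inverted range yields nothing instead of panicking.
fn scan<'a>(
    table: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    start: &[u8],
    end: &[u8],
) -> btree_map::Range<'a, Vec<u8>, Vec<u8>> {
    // BTreeMap::range panics if start > end, so clamp to an empty range.
    let end = if end < start { start } else { end };
    table.range::<[u8], _>((Bound::Included(start), Bound::Excluded(end)))
}
```

Because the returned iterator is lazy and ordered, a caller can stop early or layer filters on top of it, which is the kind of building block a future SQL executor would need.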

## Usage

Currently, there is no command-line interface available.
At the moment, we do not provide a command-line interface for FeatherDB. However, we are actively working on this feature, and it will be available in future releases. Please follow our GitHub repository for updates and new releases.

## What's Next

Some of the most imminent developments include:
We have several exciting developments planned for the near future and beyond:

* Write-ahead-log (WAL): the fail-safe that guarantees all operations will be performed eventually even after unexpected crashes.
* Relational Model: We're working on an SQL interface that will include projections, filters, joins, aggregates, and transactions to provide a comprehensive relational model experience.

* Level compaction: the "merge" part of the LSM-tree that reduces the storage overhead for outdated and deleted entries.
In the longer term, we have the following enhancements in mind:

* Bloom filter: an optimization technique that improves key searching performance.
* Distributed System: We plan to implement a Raft-based distributed consensus engine for linearizable state machine replication, expanding FeatherDB's capabilities in distributed environments.

* Transaction: a transactional engine that supports ACID transactions under different isolation levels.
* Write-ahead log (WAL): To ensure durability and data integrity, we will introduce a WAL mechanism that guarantees all operations will eventually be performed, even after unexpected crashes.

Further upcoming ones include:
* Leveled Compaction: As an essential part of the LSM-tree, we will implement leveled compaction to minimize storage overhead for outdated and deleted entries, optimizing storage utilization.

* Relational model: an SQL interface including projections, filters, joins, aggregates, and transactions.
* Bloom Filter: To enhance key searching performance, we will incorporate Bloom filters as an optimization technique, reducing unnecessary disk I/O operations.

* Distributed system: a Raft-based distributed consensus engine for linearizable state machine replication.
Stay tuned for these upcoming features and improvements in FeatherDB. Be sure to follow our GitHub repository to stay up-to-date with our latest developments and releases.
