VStream: A Distributed Streaming Vector Search System

Introduction

VStream is a distributed streaming vector search system with the following features:

Vector search on streaming data
Dynamic data partitioning
Hierarchical storage mechanism
Vector compression
Hot-cold separation

Build & Run

Requirements

Linux
gcc >= 11
cmake >= 3.10
Java 8
Maven >= 3.8.6
Flink 1.18
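
The minimum versions above can be checked before building. The following is a minimal sketch, not part of the repository: the helper names `version_ge` and `check_min_version` are hypothetical, and it assumes GNU `sort -V`, which is available on Linux. Java 8 and Flink 1.18 are listed as exact versions, so they are not covered here.

```shell
# version_ge A B: succeeds when version A >= version B.
# Uses GNU version sort: the smaller of the two versions sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_min_version NAME DETECTED MINIMUM: report whether a tool is new enough.
check_min_version() {
  if version_ge "$2" "$3"; then
    echo "ok: $1 $2 (>= $3)"
  else
    echo "too old: $1 $2 (need >= $3)" >&2
    return 1
  fi
}

# Example calls (commented out so the sketch has no side effects):
# check_min_version gcc   "$(gcc -dumpfullversion)" 11
# check_min_version cmake "$(cmake --version | head -n 1 | awk '{print $3}')" 3.10
# check_min_version maven "$(mvn -v | head -n 1 | awk '{print $3}')" 3.8.6
```

A plain string or numeric comparison would rank 3.9 above 3.10, which is why the sketch relies on version sort.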

Build

bash build.sh

After building, make sure to do the following on every machine in your Flink cluster:

Copy build/java/librocksdbjni-shared.so into a directory listed in $LD_LIBRARY_PATH and rename it to librocksdbjni.so.
Copy build/java/rocksdbjni_classes.jar to the $FLINK_HOME/lib directory.
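
The two post-build steps above can be scripted for one machine roughly as follows. This is a sketch under assumptions: the function name `install_vstream_libs` is hypothetical, and the native library directory is whichever directory you have placed on $LD_LIBRARY_PATH for the Flink processes.

```shell
# install_vstream_libs BUILD_JAVA_DIR NATIVE_LIB_DIR FLINK_HOME
#   BUILD_JAVA_DIR  directory produced by the build (build/java)
#   NATIVE_LIB_DIR  a directory that is listed in $LD_LIBRARY_PATH
#   FLINK_HOME      root of the Flink installation
install_vstream_libs() {
  build_java_dir=$1
  native_lib_dir=$2
  flink_home=$3

  mkdir -p "$native_lib_dir" || return 1
  # Copy the shared library under the name the JNI loader expects.
  cp "$build_java_dir/librocksdbjni-shared.so" \
     "$native_lib_dir/librocksdbjni.so" || return 1
  # Fails if $FLINK_HOME/lib does not exist, rather than creating it.
  cp "$build_java_dir/rocksdbjni_classes.jar" "$flink_home/lib/" || return 1
}
```

For example, `install_vstream_libs build/java /opt/vstream/native "$FLINK_HOME"`, where /opt/vstream/native is a placeholder path. The same call has to be repeated on every machine in the cluster.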

Data Preparation

Before running the experiments, upload the vector dataset to HDFS. The dataset is expected to be in SIFT format.
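
A quick sanity check before uploading can catch a truncated or wrongly formatted file. The sketch below assumes that "SIFT format" means the .fvecs layout used by the SIFT benchmark datasets, where each vector is stored as a 4-byte little-endian integer d followed by d 4-byte floats; this README does not spell the layout out, so treat that as an assumption. The function names and the HDFS target directory are hypothetical. `od` reads integers in host byte order, so the check is only meaningful on little-endian machines such as x86-64.

```shell
# fvecs_dim FILE: print the dimension stored in the first record header.
fvecs_dim() {
  od -An -t d4 -N 4 "$1" | tr -d ' \n'
}

# fvecs_check FILE: succeed if the file size is a whole number of
# (4 + 4*d)-byte records, and print the vector count and dimension.
fvecs_check() {
  dim=$(fvecs_dim "$1")
  size=$(wc -c < "$1" | tr -d ' ')
  [ -n "$dim" ] && [ "$dim" -gt 0 ] || return 1
  record=$((4 + 4 * dim))
  [ $((size % record)) -eq 0 ] || return 1
  echo "$((size / record)) vectors of dimension $dim"
}

# upload_dataset FILE HDFS_DIR: validate the file, then copy it to HDFS.
upload_dataset() {
  fvecs_check "$1" || { echo "not a valid .fvecs file: $1" >&2; return 1; }
  hdfs dfs -mkdir -p "$2" && hdfs dfs -put -f "$1" "$2/"
}
```

For example, `upload_dataset sift_base.fvecs /data/sift`, where both the file name and the HDFS directory are placeholders. The uploaded location should match the HDFS settings in your configuration file.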

Configurations

An example configuration file is given in flink-frontend/src/main/resources/params.yaml. It contains runtime parameters for HDFS, RocksDB, the HNSW index and the Flink job. For the meaning of each parameter, see flink-frontend/src/main/java/cn/edu/zju/daily/util/Parameters.java.

Run

Run the experiment pipeline by submitting the Flink job:

flink run -c cn.edu.zju.daily.VStreamSearchJob ./build/flink-frontend/vstream-1.1.jar <params.yaml>

where <params.yaml> is the path to the configuration file.

Comparing with the Baselines

This repo contains baseline solutions using Milvus, Qdrant and Chroma.

Milvus

Start a Milvus 2.3 cluster.
Fill in the Milvus root information in the configuration file.
Run:

flink run -c cn.edu.zju.daily.MilvusSeparatedStreamSearchJob ./build/flink-frontend/vstream-1.1.jar <params.yaml>

Qdrant

Start a Qdrant 1.12.1 cluster.
Fill in the Qdrant-related parameters in the configuration file.
Run:

flink run -c cn.edu.zju.daily.QdrantSeparatedStreamSearchJob ./build/flink-frontend/vstream-1.1.jar <params.yaml>

Chroma

Start one Chroma instance (version 0.5.12) per unit of Flink parallelism on the local machine (for example, four instances for a parallelism of 4). You can use this script:

flink-frontend/scripts/chroma/start-chroma-cluster.sh

Fill in the Chroma-related parameters in the configuration file.
Run:

flink run -c cn.edu.zju.daily.ChromaSeparatedStreamSearchJob ./build/flink-frontend/vstream-1.1.jar <params.yaml>
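
The VStream job and the three baseline jobs are submitted with the same jar and differ only in the main class, so a small wrapper can select the class by name. This is a sketch, not a script shipped with the repository: `job_class` and `run_job` are hypothetical helpers, while the class names and jar path are the ones used in this README.

```shell
# job_class SYSTEM: print the Flink main class for the given system.
job_class() {
  case "$1" in
    vstream) echo "cn.edu.zju.daily.VStreamSearchJob" ;;
    milvus)  echo "cn.edu.zju.daily.MilvusSeparatedStreamSearchJob" ;;
    qdrant)  echo "cn.edu.zju.daily.QdrantSeparatedStreamSearchJob" ;;
    chroma)  echo "cn.edu.zju.daily.ChromaSeparatedStreamSearchJob" ;;
    *) echo "unknown system: $1" >&2; return 1 ;;
  esac
}

# run_job SYSTEM PARAMS_FILE: submit the matching job to Flink.
run_job() {
  class=$(job_class "$1") || return 1
  [ -f "$2" ] || { echo "missing configuration file: $2" >&2; return 1; }
  flink run -c "$class" ./build/flink-frontend/vstream-1.1.jar "$2"
}
```

For example, `run_job qdrant params.yaml` submits the Qdrant baseline. The wrapper does not start the external systems; the Milvus, Qdrant or Chroma instances still have to be running and configured as described above.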

Notice

This system uses code from RocksDB and Apache Flink, both of which are licensed under the Apache License 2.0.
