This repo is the implementation of SeRF: Segment Graph for Range Filtering Approximate Nearest Neighbor Search.
file | description |
---|---|
segment_graph_1d.h | SegmentGraph for halfbounded query |
segment_graph_2d.h | 2D-SegmentGraph for arbitrary query |
Note
[09/09/2024] We've optimized compile options and added SIMD support. By default, construction and search operations now run in parallel with SIMD, resulting in an approximate 1.5x boost in query speed.
SeRF can be built from source using CMake. The only dependencies required are C++11 and OpenMP.
We have tested the following build instructions on macOS (ARM) and Linux (x86-64):
mkdir build && cd build
cmake ..
make
Running example benchmark on DEEP dataset:
./benchmark/serf_halfbound -N 10000 -dataset_path [path_to_deep_base.fvecs] -query_path [path_to_deep_query.fvecs]
./benchmark/serf_arbitrary -N 10000 -dataset_path [path_to_deep_base.fvecs] -query_path [path_to_deep_query.fvecs]
# Running on sample dataset under "sample_data"
./benchmark/serf_halfbound -N 10000 -dataset_path ../sample_data/deep_10k.fvecs -query_path ../sample_data/deep_query.fvecs
./benchmark/serf_arbitrary -N 10000 -dataset_path ../sample_data/deep_10k.fvecs -query_path ../sample_data/deep_query.fvecs
Parameters:
-dataset_path
: The base dataset path for indexing, pre-sorted by search key-query_path
: The query vectors path-N
: The top-N number of vector using for indexing, load all vectors if not specify
Optional Parameters:
-dataset
: The dataset name, supported values:deep,yt8m-audio,wiki-image
-index_k
: The maximum number of neighbors of HNSW index.-ef_con
: ef_construction for HNSW index.-ef_max
: The maximum number of neighbors for building SeRF.
index_k1
, ef_con
, ef_max
, ef_search
support multiple values, separating by ,
.
Sample command:
./benchmark/serf_arbitrary -N 10000 -dataset_path ../sample_data/deep_10k.fvecs -query_path ../sample_data/deep_query.fvecs -ef_search 40,60,80,100 -index_k 16
We hardcoded some parameters for the ease of demonstration, you can change them in the code and recompile.
Dataset | Data type | Dimensions | Search Key |
---|---|---|---|
DEEP | float | 96 | Synthetic |
Youtube-Audio | float | 128 | Video Release Time |
WIT-Image | float | 1024 | Image Size |
@article{SeRF,
author = {Zuo, Chaoji and Qiao, Miao and Zhou, Wenchao and Li, Feifei and Deng, Dong},
title = {SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search},
year = {2024},
issue_date = {February 2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {2},
number = {1},
url = {https://doi.org/10.1145/3639324},
doi = {10.1145/3639324}
}