This repo is the implementation of On Dynamic Range-Filtering Approximate Nearest Neighbor Search/. It can support Range-filtering approximate nearest neighbor search (RFANNS) with index steaming insertion.
file | description |
---|---|
segment_graph_2d.h | 2D-SegmentGraph for arbitrary query |
compact_graph.h | DSG for arbitrary query and insertion |
mkdir build && cd build
cmake ..
make
Example Usage of Running DGS on DEEP dataset:
./benchmark/build_index -N 10000 -k 16 -ef_max 500 -dataset deep -dataset_path [path_to_deep_base.fvecs] -index_path [path_to_save_index.bin] -method compact
./benchmark/query_index -N 10000 -k 16 -ef_max 500 -dataset_path [path_to_deep_base.fvecs] -query_path [path_to_deep_query.fvecs] -groundtruth_path [path_to_your_groundtruth] -index_path [path_to_your_index.bin] -method compact
- -N: The number of top-N vectors to use for indexing.
- -k: The number of nearest neighbors to consider during index construction. A larger k may improve accuracy but increase indexing time and memory usage.
- -ef_max: The maximum size of candidate list from once ANN-Search during inserting each point.
- -dataset: The name of the dataset (e.g., deep, yt8m-video, wiki-image). This is used for logging and result organization.
- -method: The indexing method to use. Options include:
- compact: Uses the Compact DSG method for arbitrary queries and insertions.
- Seg2D: Uses the 2D Segment Graph method for arbitrary queries.
- -dataset_path: The path to the base dataset file for indexing. The dataset should be pre-sorted by the search key.
- -index_path: The directory path where the constructed index will be saved.
- -N: The number of top-N vectors to use for querying.
- -k: Should be aligned to your built index
- -ef_max: Should be aligned to your built index
- -dataset: The name of the dataset (should match the value used in build_index).
- -method: The indexing method used (should match the value used in build_index).
- -dataset_path: The path to the base dataset file.
- -query_path: The path to the query vectors file. These are the vectors for which nearest neighbors will be searched.
- -groundtruth_path: The path to the ground truth file for evaluating query accuracy. This file should contain the true nearest neighbors for the queries.
- -index_path: The path to the saved index file generated by build_index.
We hardcoded some parameters, you can change them in the code and recompile.
Dataset | Data type | Dimensions | Search Key |
---|---|---|---|
DEEP | float | 96 | Synthetic |
Youtube-Video | float | 1024 | Video Release Time |
WIT-Image | float | 2048 | Image Size |