Skip to content

Latest commit

 

History

History
62 lines (46 loc) · 2.96 KB

README.md

File metadata and controls

62 lines (46 loc) · 2.96 KB

DSG

This repo is the implementation of On Dynamic Range-Filtering Approximate Nearest Neighbor Search/. It can support Range-filtering approximate nearest neighbor search (RFANNS) with index steaming insertion.

file description
segment_graph_2d.h 2D-SegmentGraph for arbitrary query
compact_graph.h DSG for arbitrary query and insertion

Quick Start

Compile and Run

mkdir build && cd build
cmake ..
make

Example Usage of Running DGS on DEEP dataset:

./benchmark/build_index -N 10000 -k 16 -ef_max 500 -dataset deep -dataset_path [path_to_deep_base.fvecs] -index_path [path_to_save_index.bin] -method compact
./benchmark/query_index -N 10000 -k 16 -ef_max 500 -dataset_path [path_to_deep_base.fvecs] -query_path [path_to_deep_query.fvecs] -groundtruth_path [path_to_your_groundtruth] -index_path [path_to_your_index.bin] -method compact

build_index Parameters

  • -N: The number of top-N vectors to use for indexing.
  • -k: The number of nearest neighbors to consider during index construction. A larger k may improve accuracy but increase indexing time and memory usage.
  • -ef_max: The maximum size of candidate list from once ANN-Search during inserting each point.
  • -dataset: The name of the dataset (e.g., deep, yt8m-video, wiki-image). This is used for logging and result organization.
  • -method: The indexing method to use. Options include:
    • compact: Uses the Compact DSG method for arbitrary queries and insertions.
    • Seg2D: Uses the 2D Segment Graph method for arbitrary queries.
  • -dataset_path: The path to the base dataset file for indexing. The dataset should be pre-sorted by the search key.
  • -index_path: The directory path where the constructed index will be saved.

query_index Parameters

  • -N: The number of top-N vectors to use for querying.
  • -k: Should be aligned to your built index
  • -ef_max: Should be aligned to your built index
  • -dataset: The name of the dataset (should match the value used in build_index).
  • -method: The indexing method used (should match the value used in build_index).
  • -dataset_path: The path to the base dataset file.
  • -query_path: The path to the query vectors file. These are the vectors for which nearest neighbors will be searched.
  • -groundtruth_path: The path to the ground truth file for evaluating query accuracy. This file should contain the true nearest neighbors for the queries.
  • -index_path: The path to the saved index file generated by build_index.

We hardcoded some parameters, you can change them in the code and recompile.

Dataset

Dataset Data type Dimensions Search Key
DEEP float 96 Synthetic
Youtube-Video float 1024 Video Release Time
WIT-Image float 2048 Image Size