Measuring the ANN Recall vs. Performance Trade-Off

Example project for measuring the recall of approximate nearest neighbour (ANN) search against exact search, at a scale that fits on a local developer notebook.

In this example we use:

  • ~6.7m documents of English Wikipedia (wikipedia/20230601.en from TF datasets)
  • 633 questions from WikiQA
  • all-MiniLM-L6-v2 as the embedding model
  • OpenSearch 2.17.1 with the kNN plugin and the Lucene vector backend
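
As a point of reference, the embedding side is small enough to try interactively. A minimal sketch of what the model does (the sample texts are placeholders; corpus_embed.py streams the real articles and batches the work):

from sentence_transformers import SentenceTransformer

# Hypothetical stand-ins for Wikipedia articles; corpus_embed.py reads the
# real ones from the TF dataset.
articles = [
    "Anarchism is a political philosophy and movement ...",
    "Autism is a neurodevelopmental condition ...",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
# Encode only the first 1000 characters of each article, as in step 1 below.
embeddings = model.encode([a[:1000] for a in articles])
print(embeddings.shape)  # (2, 384) -- all-MiniLM-L6-v2 yields 384-dim vectors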

Setup

Virtual environment and dependencies

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Start OpenSearch with Dashboards

docker compose up
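
Once the containers are up, a quick sanity check from Python (a sketch assuming OpenSearch is reachable on localhost:9200 with the security plugin disabled; adjust host, SSL, and credentials to match the compose file):

from opensearchpy import OpenSearch

# Assumes a plain-HTTP, unauthenticated local cluster as started above.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
print(client.info()["version"]["number"])  # expect 2.17.1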

Run

To reproduce the results, run the following steps:

  1. Create text embeddings of the first 1000 characters of each of the 6.7m English Wikipedia articles. This takes about 10h on an M1 Max MacBook but only needs to be computed once.
python corpus_embed.py
  2. Index the text embeddings into OpenSearch. We also index the title/text into text fields for possible later experiments. Indexing the whole dataset takes about 2h on the M1 Max, and the index takes around 80GB of disk space.
python corpus_indexing.py
  3. Calculate the ANN vs. exact kNN metrics. We take the WikiQA questions and compute the exact nearest neighbours against the embeddings from step 1. We then run the questions against the OpenSearch index and compute the recall of the returned approximate nearest neighbours (see the sketch after this list).
python calculate_ann_metrics.py
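
The recall computation in step 3 reduces to comparing two top-k id lists per question. A minimal NumPy sketch with toy data (the function and variable names are illustrative; calculate_ann_metrics.py does the same against the real embeddings and OpenSearch responses):

import numpy as np

def exact_top_k(query_vec, corpus_embeddings, k):
    # Brute-force nearest neighbours; vectors are assumed L2-normalised,
    # so the dot product equals the cosine similarity.
    scores = corpus_embeddings @ query_vec
    return set(np.argsort(-scores)[:k])

def recall_at_k(ann_ids, exact_ids):
    # Fraction of the exact top-k that the ANN result list recovered.
    return len(set(ann_ids) & exact_ids) / len(exact_ids)

# Toy data; the project uses the 6.7m Wikipedia vectors and real ANN results.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 384))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query = corpus[0]

exact = exact_top_k(query, corpus, k=10)
hits = sorted(exact)[:8]                                 # pretend ANN found 8 of 10
misses = [i for i in range(1000) if i not in exact][:2]  # plus 2 wrong ids
print(recall_at_k(hits + misses, exact))                 # -> 0.8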

Limitations / Possible Extensions

  • This example only evaluates a single embedding model. Adjust corpus_embed.py to create embeddings for multiple models.
  • This example only indexes into a single shard with no replica. To try ANN against a multi-shard index, adjust corpus_indexing.py and create an index with multiple shards/replicas. You might need to increase the available OpenSearch heap size to accommodate the additional overhead of having more than one shard.
  • This example only iterates through the query-side kNN parameter k. To also try different server-side parameter values for efConstruction and m, adjust corpus_indexing.py (see the mapping sketch after this list).
  • This example does not try any backend other than the Lucene kNN engine for OpenSearch. To try Faiss or nmslib, again expand the indexing code.
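
For the server-side parameters mentioned above, the HNSW build settings live in the index mapping. A hedged sketch of what such a mapping could look like with the Lucene engine (field name, parameter values, and space type are illustrative, not the project's actual settings from corpus_indexing.py):

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Illustrative index body; values are examples, not the project's settings.
body = {
    "settings": {"index": {"knn": True, "number_of_shards": 1, "number_of_replicas": 0}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 384,  # all-MiniLM-L6-v2 output size
                "method": {
                    "name": "hnsw",
                    "engine": "lucene",  # swap in "faiss" or "nmslib" to try other backends
                    "space_type": "cosinesimil",
                    "parameters": {"m": 16, "ef_construction": 128},
                },
            }
        }
    },
}
client.indices.create(index="wikipedia-ann", body=body)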
