This demo shows how the vector search capability of Atlas Search can be used to query a body of text. It uses the Hugging Face sentence-transformers/all-MiniLM-L6-v2 model to map both the sentences in the text and the questions you ask into a 384-dimensional dense vector space.
Save your body of text to the corpus.txt file, or feel free to use the sample provided, which is the background information from the Warner Bros. Discovery Wikipedia page.
Open params.py and configure your Atlas connection, along with the names of the database and collection in which you'd like to store your text.
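params.py is not reproduced here, so the following is only a hypothetical sketch of what it might contain. The variable names (mongo_uri, database, collection), the ATLAS_URI environment variable, and the values are all assumptions; match them to whatever encode.py and search.py actually import.

```python
# params.py (hypothetical sketch): names and values are assumptions,
# not taken from the real file.
import os

# Atlas connection string. Reading it from an environment variable
# (ATLAS_URI, an assumed name) keeps credentials out of source control;
# the fallback is a placeholder, not a real cluster.
mongo_uri = os.environ.get(
    "ATLAS_URI",
    "mongodb+srv://<user>:<password>@<cluster>.mongodb.net/",
)

# Database and collection that will hold the sentences and their vectors
# (hypothetical names).
database = "vector_demo"
collection = "corpus"
```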
Install the Sentence Transformers library (the all-MiniLM-L6-v2 model itself is downloaded on first use):
pip install -U sentence-transformers
Run the encoder, which stores each sentence along with its dense vector in MongoDB:
python3 encode.py
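encode.py itself is not shown, but the step can be sketched as follows. This is a minimal sketch under assumptions: sentences are split with a naive regular expression, the text field is named sentence (hypothetical; only docVector is fixed by the index definition), and the embedding function is passed in so the logic can be exercised without downloading the model.

```python
import re
from typing import Callable, Dict, List, Sequence

def split_sentences(text: str) -> List[str]:
    # Naive split after ".", "!" or "?" followed by whitespace. Note that
    # abbreviations such as "Bros." will also trigger a split.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def build_documents(
    text: str,
    embed: Callable[[List[str]], Sequence[Sequence[float]]],
) -> List[Dict]:
    # Encode all sentences in one batch, then pair each sentence with its
    # vector. float() converts NumPy scalars into plain floats BSON can store.
    sentences = split_sentences(text)
    vectors = embed(sentences)
    return [
        # "sentence" is a hypothetical field name; "docVector" matches the index.
        {"sentence": s, "docVector": [float(x) for x in v]}
        for s, v in zip(sentences, vectors)
    ]

# Usage with the real model (requires sentence-transformers and pymongo):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# with open("corpus.txt") as f:
#     docs = build_documents(f.read(), model.encode)
# coll.insert_many(docs)  # coll: a pymongo Collection built from params.py
```

Batching the whole corpus into a single encode call is usually faster than encoding sentence by sentence; for very large files you would chunk the input instead.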
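Create a search index named default on the collection, using the following definition (the query below does not name an index, so Atlas Search uses the one called default):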
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "docVector": {
        "type": "knnVector",
        "dimensions": 384,
        "similarity": "euclidean"
      }
    }
  }
}
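The definition compares vectors by Euclidean distance. For unit-length vectors, squared Euclidean distance equals 2 - 2 * cosine similarity, so both metrics rank neighbors in the same order. The sketch below checks that identity in plain Python; whether your stored embeddings are actually unit-length depends on how encode.py produces them, so treat that as an assumption to verify.

```python
import math
from typing import List, Sequence

def normalize(v: Sequence[float]) -> List[float]:
    # Scale a vector to unit length (Euclidean norm of 1).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Straight-line distance between two points.
    return math.dist(a, b)

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine of the angle between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

q = normalize([1.0, 2.0, 2.0])
d = normalize([2.0, 1.0, 2.0])
# For unit vectors, euclidean(q, d) ** 2 matches 2 - 2 * cosine(q, d)
# up to floating-point rounding.
```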
You are now ready to ask questions about your body of text! The search.py script uses the same Sentence Transformers model to encode your question and submits the resulting vector to Atlas Search to find the answer.
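The command-line side of search.py can be sketched as follows. This is a hypothetical sketch, not the script's actual code: only the -q flag and the output layout come from the example, while the --question long form, the function names, and the sentence field are assumptions.

```python
import argparse
from typing import List, Optional

def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # -q matches the example command; the long form --question is an assumption.
    parser = argparse.ArgumentParser(description="Ask a question about the corpus.")
    parser.add_argument("-q", "--question", required=True,
                        help="question to encode and send to Atlas Search")
    return parser.parse_args(argv)

def format_answer(answer: str) -> str:
    # Mirror the output layout of the example: title, underline, answer.
    title = "Atlas Search's Answer:"
    return "\n".join([title, "-" * len(title), answer])

# In the real script, roughly (needs the model and an Atlas cluster):
#   args = parse_args()
#   vector = model.encode(args.question)
#   doc = next(coll.aggregate(pipeline), None)  # pipeline: $search + $limit
#   print(format_answer(doc["sentence"]))       # "sentence" is hypothetical
```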
For example:
python3 search.py -q "Who founded TBS?"
Atlas Search's Answer:
----------------------
In 1965, Turner Broadcasting System was founded by Ted Turner and based in Atlanta, Georgia.
Try other questions, such as:
"Who did Discovery acquire?"
"When was Warner Bros. founded?"
"Who did Warner purchase in 1982?"
This is the simple aggregation pipeline passed to MongoDB:
[
  {
    "$search": {
      "knnBeta": {
        "vector": <generated query vector>,
        "path": "docVector",
        "k": 150 // Number of nearest neighbors (nn) to return
      }
    }
  },
  {
    "$limit": 1 // Let's assume the first result is correct :-).
  }
]
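In Python, the same pipeline can be built as a list of dictionaries and passed to pymongo's Collection.aggregate. This is a minimal sketch; the function name and its defaults (which mirror the values above) are illustrative, not taken from search.py.

```python
from typing import Any, Dict, List, Sequence

def build_knn_pipeline(
    query_vector: Sequence[float],
    path: str = "docVector",
    k: int = 150,
    limit: int = 1,
) -> List[Dict[str, Any]]:
    return [
        {
            "$search": {
                "knnBeta": {
                    # Convert NumPy scalars (e.g. from model.encode) to plain floats.
                    "vector": [float(x) for x in query_vector],
                    "path": path,  # must match the knnVector field in the index
                    "k": k,        # number of nearest neighbors to retrieve
                }
            }
        },
        {"$limit": limit},  # keep only the top-scoring sentence
    ]

# Usage (requires pymongo and the model):
# results = list(coll.aggregate(build_knn_pipeline(model.encode(question))))
```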
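The knnBeta operator uses the Hierarchical Navigable Small World (HNSW) algorithm to perform approximate nearest-neighbor search over the stored vectors, which is what makes this semantic search possible. The same kNN support in Atlas Search applies to other similarity problems, such as finding products similar to a selected product or searching for images.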
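HNSW avoids comparing the query with every stored vector by walking a layered proximity graph, trading a small amount of accuracy for speed. The sketch below is not HNSW; it is the exact brute-force k-nearest-neighbor search that HNSW approximates, which can serve as a baseline when checking results on a small corpus.

```python
import heapq
import math
from typing import List, Sequence, Tuple

def exact_knn(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    # Compare the query against every stored vector (O(n * d) work) and keep
    # the k smallest Euclidean distances as (index, distance) pairs.
    distances = ((i, math.dist(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, distances, key=lambda pair: pair[1])

# Example: the nearest two of three points to the origin are indices 2 and 1.
result = exact_knn([0.0, 0.0], [[3.0, 4.0], [1.0, 1.0], [0.0, 0.5]], k=2)
```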