This project demonstrates how to use the Exxa batch inference API to enhance Retrieval-Augmented Generation (RAG) systems by implementing Anthropic's Contextual Retrieval approach. The goal is to add crucial context before embedding documents into a vector database, thereby improving the accuracy and relevance of retrieved information.
Traditional RAG systems often lose context when documents are split into chunks. This project addresses this issue by adding chunk-specific explanatory context before embedding, as proposed by Anthropic.
For example, a chunk might contain the text: "The company's revenue grew by 3% over the previous quarter."
Without additional context, it is unclear which company or time period this refers to. By adding context, we can make this information more useful for retrieval.
Anthropic's solution is to generate a short, chunk-specific context for each chunk using a language model, and to prepend that context to the chunk before embedding, which improves retrieval performance. For more details, refer to Anthropic's blog post on Contextual Retrieval (linked below).
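In practice, each chunk is sent to the model together with the full document, and the generated context is prepended to the chunk before it is embedded. Here is a minimal sketch of the idea; the prompt wording and the `llm_complete` helper are illustrative, not the exact ones used in this repository:

```python
# Prompt wording here is illustrative, not the exact prompt used in this repo.
CONTEXT_PROMPT = """<document>
{document}
</document>

Here is a chunk from the document above:
<chunk>
{chunk}
</chunk>

Write a short context that situates this chunk within the overall document,
to improve search retrieval of the chunk. Answer only with the context."""


def contextualize(document: str, chunk: str, llm_complete) -> str:
    """Generate context for a chunk and prepend it before embedding.

    `llm_complete` is any callable that sends a prompt to a language
    model and returns its completion as a string.
    """
    context = llm_complete(CONTEXT_PROMPT.format(document=document, chunk=chunk))
    return f"{context}\n\n{chunk}"  # the combined text is what gets embedded
```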
This method might seem costly at first glance, as it requires passing the full document through a language model once for every chunk.
However, Exxa's batch inference API makes this process efficient and cost-effective:
- Prefix Caching: By caching prefix tokens that are shared across the requests in a batch, Exxa reduces the number of tokens that must be processed per document (see the sketch below).
- Cost Efficiency: Exxa explicitly optimizes for the lowest cost per million tokens processed for a given model, rather than optimizing for latency, as most other providers do.
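To see why prefix caching helps here: every per-chunk request for a given document starts with the same long document text, so those shared tokens only need to be processed once per batch. A rough sketch of how such requests might be laid out; the request shape is a simplification for illustration, not Exxa's actual API:

```python
def build_requests(document: str, chunks: list[str]) -> list[dict]:
    """Build one contextualization request per chunk of a document.

    Every prompt starts with the identical document text, so a backend
    with prefix caching processes those shared tokens only once and
    reuses the cached result for every chunk of the same document.
    """
    shared_prefix = f"<document>\n{document}\n</document>\n\n"  # identical across requests
    return [
        {"prompt": shared_prefix + f"Situate this chunk:\n<chunk>\n{chunk}\n</chunk>"}
        for chunk in chunks
    ]
```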
1. Clone the repository and install dependencies:

   ```bash
   git clone https://github.com/withexxa/contextual-retrieval.git
   cd contextual-retrieval
   pip install -r requirements.txt
   ```

2. Set up your Exxa API credentials by exporting your API key as an environment variable:

   ```bash
   export EXXA_API_KEY='your_api_key_here'
   ```

3. Create a batch inference request: run the script to create a batch inference request for the example document, or add your own documents to the `documents` directory (in `.md` format in this example):

   ```bash
   python 1_create_batch_requests.py
   ```

4. Fetch the batch inference results: once the batch is processed, fetch the results:

   ```bash
   python 2_fetch_batch_results.py
   ```

5. Use the results: the contextualized chunks of each document `{filename}.md` are saved to a `{filename}.jsonl` file in the `output` directory, ready for embedding (see the sketch after this list).
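From there, the output files can be fed straight into an embedding pipeline. A sketch of what that might look like; the `contextualized_chunk` field name is an assumption, so inspect the actual output files for the exact schema:

```python
import json
from pathlib import Path


def load_contextualized_chunks(output_dir: str = "output") -> list[str]:
    """Read every {filename}.jsonl file and collect the contextualized chunks."""
    texts = []
    for path in Path(output_dir).glob("*.jsonl"):
        with path.open() as f:
            for line in f:
                record = json.loads(line)
                # "contextualized_chunk" is an assumed field name -- inspect
                # your output files to confirm the actual schema.
                texts.append(record["contextualized_chunk"])
    return texts


# The collected texts can then be embedded and indexed with any vector store,
# e.g. embeddings = embed_model.encode(load_contextualized_chunks())
```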
For more information on Anthropic's Contextual Retrieval approach, you might want to check out their blog post and cookbook on the topic.
This project is licensed under the MIT License. See the LICENSE file for details.