A Retrieval Augmented Generation application that combines the power of Large Language Models with document retrieval capabilities.
- Overview
- Features
- Prerequisites
- Installation
- Usage
- Example Outputs
- Architecture
- Debugging
- Customize
- Resources
- Local Ollama Setup
- Contributing
This application leverages OpenAI's GPT-4 and other LLMs to generate contextually relevant responses based on user input. It searches through a database of over 1,000 websites to provide accurate information, with special handling for disambiguating between entities with identical names. It can be used for semantic search and context-aware question answering over any text dataset.
The application demonstrates an interesting use case of distinguishing between two different people named "Michał Żarnecki" in different contexts, showcasing the power of context-aware information retrieval.
📖 For a detailed explanation of concepts used in this application, check out my article on Medium.
- Multiple LLM support (GPT-4, Claude-3.5, Llama3.2, Bielik, Gemini2)
- Vector database for efficient information retrieval
- Web interface, API endpoints, and CLI access
- Context-aware response generation
- Docker-based setup for easy deployment
- Docker and Docker Compose (Installation Guide)
- Install Dependencies
  cd app/src && composer install
- Configure Environment
  - Copy `.env-sample` to `.env` in `app/src`
  - Choose your model in `.env`:
    MODEL=<model-option> # Options: GPT-4o, Claude-3.5, Llama3.2, Mixtral, Bielik, Gemini2, DeepSeek, DeepSeek-R1-7B, DeepSeek-Coder-v2
- API Configuration
  Option A: local models via Ollama
  - No API key required (go directly to step 4)
  - Requires more CPU/RAM; a GPU is recommended for better performance
  - Uses Ollama for local model serving
  Option B: API-based models
  - Requires an API key
  - Lower resource requirements
  - Add to `.env` (or modify the other env variable related to the chosen model):
    OPENAI_API_KEY=your-api-key
  - Get an OpenAI API key from the OpenAI Platform (or the key related to another API-based model)
  - Get a DeepSeek API key from the DeepSeek Platform
  - Get a Claude API key from the Claude Platform
  - Get a Gemini API key from the Gemini Platform
- Launch Application
  docker-compose up
  Note: When using API access to an LLM (any option other than Ollama), run
  docker-compose -f docker-compose-llm-api.yaml up
  instead, to avoid wasting time downloading models to the local environment.
  Note: The initial document transformation may take a long time. By default, only part of the documents is loaded. To process all documents, modify `$skipFirstN` in `app/src/service/DocumentLoader.php:20`.
- Access Application
- Wait for the setup completion message:
php-app | Loaded documents complete
php-app | Postgres is ready - executing command
php-app | [Sat Nov 02 11:32:28.365214 2024] [core:notice] [pid 1:tid 1] AH00094: Command line: 'apache2 -D FOREGROUND'
- Open http://127.0.0.1:2037 in your browser
Visit http://127.0.0.1:2037 and enter your query.
curl -d '{"prompt":"what is result of 2+2?"}' \
-H "Content-Type: application/json" \
-X POST \
"http://127.0.0.1:2037/process.php?api"
docker exec -it php-app sh
php minicli rag
Input: What is the result of 2 + 2?
Response: The result of 2 + 2 is 4.
Input: Is Michał Żarnecki programmer the same person as Michał Żarnecki audio engineer?
Response: These are two different individuals:
- The programmer specializes in Python, PHP, JavaScript, and AI/ML technologies
- The audio engineer (1946-2016) was a renowned sound director in Polish film
Two string-comparison metrics are implemented that compare the generated answer to an expected text. They are not the best solution, as they are based on comparing token appearances and require a reference text to be provided:
- ROUGE
- BLEU
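As a toy illustration of how such token-overlap metrics behave (an invented example, not taken from the implementation), compare the reference "the cat sat on the mat" with the candidate "the cat lay on the mat":

```
matching unigrams  = the, cat, on, the, mat           -> overlap = 5
ROUGE-1 recall     = overlap / reference length = 5/6 ≈ 0.83
ROUGE-1 precision  = overlap / candidate length = 5/6 ≈ 0.83
ROUGE-1 F1         = 2PR / (P + R)              = 5/6 ≈ 0.83
```

A fluent paraphrase that shares few tokens with the reference would score poorly, which is why these metrics are complemented by an LLM-based criteria evaluator.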
The second evaluator is a criteria evaluator, which passes the prompt and the generated answer to the GPT-4o model and asks for a 1-5 point evaluation on the following criteria:
- correctness: Is the answer accurate, and free of mistakes?
- helpfulness: Does the response provide value or solve the user's problem effectively?
- relevance: Does the answer address the question accurately?
- conciseness: Is the answer free of unnecessary details?
- clarity: Is the language clear and understandable?
- factual_accuracy: Are the facts provided correct?
- insensitivity: Does the response avoid dismissing, invalidating, or overlooking cultural or social sensitivities?
- maliciousness: Does the response avoid promoting harm, hatred, or ill intent?
- harmfulness: Does the response avoid causing potential harm or discomfort to individuals or groups?
- coherence: Does the response maintain logical flow and structure?
- misogyny: Does the response avoid sexist language, stereotypes, or any form of gender-based bias?
- criminality: Does the response avoid promoting illegal activities or providing guidance on committing crimes?
- controversiality: Does the response avoid unnecessarily sparking divisive or sensitive debates?
- creativity: (optional) Is the response innovative or insightful?
$criteriaEvaluator = new CriteriaEvaluator();
$tokenSimilarityEvaluator = new StringComparisonEvaluator();
$compareResp = "Michał Żarnecki the programmer is not the same person as Michał Żarnecki the audio engineer.
Michał Żarnecki the programmer is still living, while Michał Żarnecki the audio engineer died in 2016. They cannot be the same person.
Michał Żarnecki the programmer designs systems and programs AI-based solutions. He is also a lecturer.
Michał Żarnecki the audio engineer was also a sound director who created music for famous Polish movies.";
$resp['evaluation'] = [
'ROUGE' => $tokenSimilarityEvaluator->calculateROUGE($compareResp, $response),
'BLEU' => $tokenSimilarityEvaluator->calculateBLEU($compareResp, $response),
'criteria' => $criteriaEvaluator->evaluate($payload->getRagPrompt(), $response)
];
Response:
{
"ROUGE": {
"recall": 0.23,
"precision": 0.3,
"f1": 0.26
},
"BLEU": 0.22,
"criteria": {
"correctness": 5,
"helpfulness": 4,
"relevance": 4,
"conciseness": 5,
"clarity": 4,
"factual_accuracy": 4,
"insensitivity": 5,
"maliciousness": 0,
"harmfulness": 0,
"coherence": 1,
"misogyny": 0,
"criminality": 0,
"controversiality": 0,
"creativity": 1
}
}
Results for the "info about Michał Żarnecki" RAG example:
Gemini2, Claude-3.5 Sonnet, and GPT-4o were good at this task, with Gemini2 scoring highest. Bielik responded incorrectly. Mixtral and Llama 3.2 were uncertain in their responses.
To rebuild after PHP script changes:
docker-compose rm
docker rmi -f php-rag
docker-compose up
To rebuild after pg_vector db related changes:
docker-compose rm
docker rmi -f ankane/pgvector
docker-compose up
- Use different LLMs.
  You can pick from the available LLMs: GPT-4o, Claude-3.5, Llama3.2, Mixtral, Bielik, DeepSeek, DeepSeek-R1-7B, DeepSeek-Coder-v2, Gemini2
  To use other models, modify the model name in the LLM client class for the model provider, for example `app/src/service/openai/GeneratedTextFromGPTProvider.php:13`:
  final class GeneratedTextFromGPTProvider extends AbstractGPTAPIClient
      implements StageInterface, GeneratedTextProviderInterface
  {
      private string $model = 'gpt-4o';
      // ...
  }
- Use a different embeddings model.
  Modify `app/src/loadDocuments.php:13` and `app/src/process.php:20`.
  Put there one of the classes that implement `TextEncoderInterface`, or create your own class that satisfies the interface.
  Embedding size can have an impact on text-matching precision.
- Modify the system prompt.
  Modify the system prompt text in `\service\PromptResolver::getSystemPrompt()`.
  You can add additional instructions, example solutions (one-shot/few-shot), or patterns of reasoning (chain of thought) there.
private function getSystemPrompt(): string
{
return 'You are a helpful assistant that answers questions based on source documents.' . PHP_EOL;
}
- Use a different number of retrieved documents.
  Change `$limit` in `DocumentProvider::getSimilarDocuments()`:
public function getSimilarDocuments(
string $prompt,
string $embeddingPrompt,
bool $useReranking = false,
int $limit = 10,
string $distanceFunction = 'l2'
) {
- Use reranking.
  If too many documents are passed to the LLM, it may focus on the wrong information; if the number is too small, on the other hand, the most important sources may be missed.
  Set `Payload::$useReranking` to `true` in `app/src/process.php:25`.
- Use a different text-matching algorithm.
  Change `$distanceFunction` in `DocumentProvider::getSimilarDocuments()`.
  Pick one of l2|cosine|innerProduct, or add support for another one (see https://github.com/pgvector/pgvector, section "Querying"):
public function getSimilarDocuments(
string $prompt,
string $embeddingPrompt,
bool $useReranking = false,
int $limit = 10,
string $distanceFunction = 'l2'
) {
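For orientation, pgvector exposes each of these distance functions as a SQL operator. A hedged sketch of what the underlying queries look like (the `documents` table and `embedding` column names are hypothetical, not taken from this codebase):

```sql
-- l2: Euclidean distance
SELECT id FROM documents ORDER BY embedding <-> '[0.1, 0.2, 0.3]' LIMIT 10;

-- cosine distance
SELECT id FROM documents ORDER BY embedding <=> '[0.1, 0.2, 0.3]' LIMIT 10;

-- inner product (pgvector returns the negated value, so ascending ORDER BY works)
SELECT id FROM documents ORDER BY embedding <#> '[0.1, 0.2, 0.3]' LIMIT 10;
```

Cosine distance is a common default for normalized text embeddings; l2 and inner product can rank differently when vector magnitudes vary.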
- Dataset: "Website Classification" by Hetul Mehta on Kaggle
- Related Articles:
- Download Ollama
- Pull required models:
  ollama pull llama3:latest
  ollama pull mxbai-embed-large
- Verify installation:
ollama list
- Start server:
ollama serve
- Use the `MxbaiTextEncoder.php` class in `app/src/loadDocuments.php`
Found a bug or have an improvement in mind? Please:
- Report issues
- Submit pull requests
- Contact: michal@zarnecki.pl
Your contributions make this project better for everyone!