This project is a Retrieval-Augmented Generation (RAG) system designed for querying and summarizing documents. It supports GPU and CPU environments, and the backend can run using either Flask (RAG.py) or FastAPI (RAG_fastapi.py). The frontend provides a user-friendly interface for uploading documents and interacting with the system.
The RAG system uses vector-based search and large language models (LLMs) to enable:
- Document uploads and storage.
- Advanced querying using embeddings.
- Document summarization.
It supports various embedding providers (e.g., OpenAI, Llama, Groq) and works in GPU and CPU environments for optimized performance.
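To make the embedding-based querying listed above concrete, here is a minimal, self-contained sketch of vector search: stored chunks are ranked by cosine similarity to the query vector. The three-dimensional vectors, the chunk texts, and the `top_k` helper are invented for illustration; the real system obtains embeddings from the configured provider and delegates the search to a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index: (chunk text, embedding). Real embeddings have hundreds of dimensions.
chunks = [
    ("GPU setup notes", [0.9, 0.1, 0.0]),
    ("Frontend styling guide", [0.0, 0.2, 0.9]),
    ("CUDA driver checklist", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]
results = top_k(query, chunks, k=2)
print(results)
```

In a RAG pipeline the retrieved chunks would then be passed to the LLM as context for answering the query or producing a summary.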
The RAG system supports multiple vector databases (e.g., FAISS, Milvus, Pinecone, Qdrant, Weaviate) for efficient storage and retrieval of embeddings. The dedicated Database_Readme.md file provides detailed instructions for configuring and using each supported database.
Below is a visual representation of the RAG system architecture and user interface:
- Python 3.11 (with Conda for backend environment management)
- Node.js (for the frontend)
- Docker (optional for containerized deployment)
- Git (for cloning the repository)
Before running the backend, create a .env file in the backend folder and add the following environment variables:
# Authentication Keys
LLAMA_CLOUD_API_KEY= # API key for Llama Cloud
OPENAI_API_KEY= # API key for OpenAI
GROQ_API_KEY= # API key for Groq
# Vector Database Configuration
VECTOR_DB_PATH=vector_dbs/ # Path to store vector databases
USE_GPU=true # Set to "true" for GPU usage or "false" for CPU
# Pinecone Configuration
PINECONE_API_KEY= # API key for Pinecone
# Weaviate Configuration
WEAVIATE_API_KEY= # API key for Weaviate
WEAVIATE_CLUSTER_URL= # Cluster URL for Weaviate
# Qdrant Configuration
QDRANT_API_KEY= # API key for Qdrant (optional)
QDRANT_ADMIN_API_KEY= # Admin API key for Qdrant (optional)
QDRANT_CLUSTER_URL= # Cluster URL for Qdrant (optional)
- Note: Replace the placeholder values with your actual API keys and paths.
- Make sure VECTOR_DB_PATH matches the directory where vector databases will be stored.
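As a rough illustration of how these variables might be consumed, the sketch below turns an environment mapping into a typed settings object. How the backend actually loads the .env file is not shown in this README, so `BackendSettings`, `load_settings`, and the default values are hypothetical; the function accepts any mapping, so it works with `os.environ` once the file has been loaded by whatever mechanism the project uses.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BackendSettings:
    """Hypothetical container for a few of the variables listed above."""
    vector_db_path: Path
    use_gpu: bool
    openai_api_key: str


def load_settings(env=None):
    """Build settings from a mapping of environment variables (defaults are assumptions)."""
    env = os.environ if env is None else env
    return BackendSettings(
        vector_db_path=Path(env.get("VECTOR_DB_PATH", "vector_dbs/")),
        # Only the literal string "true" (any case) enables the GPU path.
        use_gpu=env.get("USE_GPU", "false").strip().lower() == "true",
        openai_api_key=env.get("OPENAI_API_KEY", ""),
    )


settings = load_settings({"VECTOR_DB_PATH": "vector_dbs/", "USE_GPU": "true"})
print(settings)
```

Parsing the values once at startup keeps string comparisons such as the `USE_GPU` check out of the rest of the code.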
- Navigate to the backend directory:
  cd backend
- Create and activate a Conda environment:
  conda env create -f src/environment.yml
  conda activate rag_env
- Install dependencies:
  pip install -r requirements.txt
- Run the backend:
  - For Flask:
    python src/RAG.py
  - For FastAPI:
    python src/RAG_fastapi.py
- Navigate to the frontend directory:
  cd frontend
- Install dependencies:
  npm install
- Start the development server:
  npm start
The frontend will be available at http://localhost:3000.
The project includes Docker support for both GPU and CPU environments.
- Build the Docker image:
  - GPU Environment:
    docker build --build-arg USE_GPU=true -f backend/Backend.gpu.dockerfile -t rag-system-gpu .
  - CPU Environment:
    docker build --build-arg USE_GPU=false -f backend/Backend.cpu.dockerfile -t rag-system-cpu .
- Run the Docker container:
  - GPU:
    docker-compose -f docker-compose.gpu.yml up
  - CPU:
    docker-compose -f docker-compose.cpu.yml up
To start the backend directly without Docker:
- Flask:
python backend/src/RAG.py
- FastAPI:
python backend/src/RAG_fastapi.py
The backend dynamically supports both GPU and CPU setups:
- For GPU-based setups, ensure faiss-gpu is installed and the USE_GPU environment variable is set to true.
- For CPU setups, faiss-cpu should be installed instead.
If using Docker, the provided docker-compose files will handle these configurations automatically.
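The GPU/CPU selection described above could look roughly like the following sketch. It is an assumption about the logic, not the project's actual code: `resolve_device` is a hypothetical helper that honors `USE_GPU` but falls back to CPU when FAISS is missing or reports no GPUs. The only FAISS call used is `faiss.get_num_gpus()`, guarded with `hasattr` because it may not be present in every build.

```python
import os


def resolve_device(env=None):
    """Return "gpu" or "cpu" based on USE_GPU and what the installed FAISS build reports."""
    env = os.environ if env is None else env
    wants_gpu = env.get("USE_GPU", "false").strip().lower() == "true"
    if not wants_gpu:
        return "cpu"
    try:
        import faiss  # provided by either faiss-gpu or faiss-cpu
    except ImportError:
        # FAISS is not installed at all; nothing can run on the GPU.
        return "cpu"
    has_gpu = hasattr(faiss, "get_num_gpus") and faiss.get_num_gpus() > 0
    return "gpu" if has_gpu else "cpu"


print(resolve_device({"USE_GPU": "false"}))
```

Falling back silently is a design choice; a stricter variant could raise an error when `USE_GPU=true` but no GPU is visible, so that a misconfigured container fails fast.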
To switch between Flask and FastAPI when using Docker:
- Set the RAG_SERVER environment variable:
  - Flask:
    docker run -e RAG_SERVER=flask -p 5000:5000 rag-system
  - FastAPI:
    docker run -e RAG_SERVER=fastapi -p 5000:5000 rag-system
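The switch above can be pictured as a small dispatcher that maps `RAG_SERVER` to the script to launch. The contents of start_server.sh and main.py are not shown in this README, so the sketch below is only a guess at equivalent logic; `SERVER_SCRIPTS`, `server_command`, and the Flask default are hypothetical, and the script paths are relative to the backend directory as in the setup steps.

```python
import os

# Hypothetical mapping from RAG_SERVER values to the backend scripts named in this README.
SERVER_SCRIPTS = {
    "flask": "src/RAG.py",
    "fastapi": "src/RAG_fastapi.py",
}


def server_command(env=None):
    """Return the command that starts the backend selected by RAG_SERVER."""
    env = os.environ if env is None else env
    # Defaulting to Flask is an assumption, not documented behavior.
    choice = env.get("RAG_SERVER", "flask").strip().lower()
    if choice not in SERVER_SCRIPTS:
        valid = ", ".join(sorted(SERVER_SCRIPTS))
        raise ValueError(f"Unknown RAG_SERVER {choice!r}; expected one of: {valid}")
    return ["python", SERVER_SCRIPTS[choice]]


print(server_command({"RAG_SERVER": "fastapi"}))
```

The returned list can be handed to `subprocess.run(server_command(), check=True)` from the backend directory. Rejecting unknown values makes a typo in `RAG_SERVER` visible immediately instead of starting the wrong server.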
project-root/
│
├── README.md # Main project documentation
├── LICENSE # Project license
├── .gitignore # Git ignore file
├── Database_Readme.md # Detailed documentation for database setup
├── docker-compose.yml # Default Docker Compose configuration
├── docker-compose.gpu.yml # Docker Compose configuration for GPU
├── docker-compose.cpu.yml # Docker Compose configuration for CPU
│
├── assets/ # Folder for project assets (images, diagrams, etc.)
│
├── data/ # General data folder
│
├── frontend/ # Frontend (React) directory
│ ├── rag-ui/ # React application
│ │ ├── public/ # Static assets
│ │ ├── src/ # Source code
│ │ ├── package.json # Frontend dependencies
│ │ ├── .env # Frontend environment configuration
│
├── backend/ # Backend (Python) directory
│ ├── __pycache__/ # Python cache files
│ ├── data/ # Data-related files
│ ├── vector_dbs/ # Vector DB index files
│ ├── src/ # Source code
│ │ ├── __pycache__/ # Python cache files for `src`
│ │ ├── adapters/ # Vector database adapters
│ │ │ ├── __pycache__/ # Python cache files for `adapters`
│ │ │ ├── __init__.py # Package initialization
│ │ │ ├── faiss_adapter.py # FAISS vector DB adapter
│ │ │ ├── milvus_adapter.py # Milvus vector DB adapter
│ │ │ ├── pinecone_adapter.py # Pinecone vector DB adapter
│ │ │ ├── qdrant_adapter.py # Qdrant vector DB adapter
│ │ │ ├── weaviate_adapter.py # Weaviate vector DB adapter
│ │ ├── add_to_vector_db.py # Script to add documents to vector DB
│ │ ├── config.yaml # Configuration file for databases
│ │ ├── embedding_config.py # Embedding model configuration
│ │ ├── embedding_initializer.py # Embedding model initialization logic
│ │ ├── environment.yml # Conda environment file
│ │ ├── id_map.pkl # ID mapping for vectors
│ │ ├── main.py # Unified entry point for the backend
│ │ ├── RAG_fastapi.py # FastAPI-based backend
│ │ ├── rag_models.py # RAG models and processing logic
│ │ ├── RAG.py # Flask-based backend
│ │ ├── VectorDB.py # Vector database management logic
│ ├── .env # Backend environment configuration
│ ├── Backend.cpu.dockerfile # Dockerfile for CPU-specific setup
│ ├── Backend.gpu.dockerfile # Dockerfile for GPU-specific setup
│ ├── id_map.pkl # ID mapping file
│ ├── requirements.txt # Backend dependencies
│ ├── start_server.sh # Script to dynamically select Flask or FastAPI
- Flask and FastAPI Support: Run the backend with Flask (RAG.py) or FastAPI (RAG_fastapi.py).
- GPU and CPU Compatibility: Dynamically handles GPU or CPU setups based on the environment.
- Document Querying: Upload documents and perform advanced queries using embeddings.
- Summarization: Generate document summaries using vector-based retrieval.
- Permission Errors:
  - Ensure the data/ and backend/vector_dbs/ directories are writable:
    chmod -R 755 backend/vector_dbs
    chmod -R 755 data
- Missing Dependencies:
  - Install missing Python packages:
    pip install -r backend/requirements.txt
- Docker Issues:
  - Check container logs:
    docker logs <container_id>
- CORS Issues:
  - Ensure the backend allows requests from the frontend (FastAPI shown; app is the existing FastAPI instance):
    from fastapi.middleware.cors import CORSMiddleware

    app.add_middleware(
        CORSMiddleware,
        allow_origins=["http://localhost:3000"],
        allow_methods=["*"],
        allow_headers=["*"],
    )
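To check whether the CORS configuration is taking effect, one option is to send the same preflight request a browser would send and inspect the response headers. The sketch below uses only the standard library; `origin_allowed` and `preflight_allows` are hypothetical helpers, and the backend URL you pass in is an assumption (port 5000 comes from the docker run examples, but the endpoint path is not documented here).

```python
import urllib.error
import urllib.request

FRONTEND_ORIGIN = "http://localhost:3000"


def origin_allowed(response_headers, origin=FRONTEND_ORIGIN):
    """True if the Access-Control-Allow-Origin header permits the given origin."""
    allowed = response_headers.get("Access-Control-Allow-Origin", "")
    return allowed in ("*", origin)


def preflight_allows(url, origin=FRONTEND_ORIGIN):
    """Send a CORS preflight (OPTIONS) request to url and report whether origin is accepted."""
    request = urllib.request.Request(
        url,
        method="OPTIONS",
        headers={"Origin": origin, "Access-Control-Request-Method": "POST"},
    )
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return origin_allowed(response.headers, origin)
    except urllib.error.HTTPError as error:
        # A rejected preflight typically comes back as a 4xx response; check its headers too.
        return origin_allowed(error.headers, origin)


print(origin_allowed({"Access-Control-Allow-Origin": "http://localhost:3000"}))
```

With the backend running, `preflight_allows("http://localhost:5000/")` should return True once the middleware is configured. If the backend is unreachable, `urllib.error.URLError` propagates, which points to a connectivity problem and not a CORS one.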