RAG System for Document Query and Summarization with Flask or FastAPI

This project is a Retrieval-Augmented Generation (RAG) system designed for querying and summarizing documents. It supports GPU and CPU environments, and the backend can run using either Flask (RAG.py) or FastAPI (RAG_fastapi.py). The frontend provides a user-friendly interface for uploading documents and interacting with the system.

Overview

The RAG system uses vector-based search and large language models (LLMs) to enable:

Document uploads and storage.
Advanced querying using embeddings.
Document summarization.

It supports several embedding providers (e.g., OpenAI, Llama, Groq) and runs in both GPU and CPU environments.

The RAG system now supports multiple vector databases for efficient storage and retrieval of embeddings. A dedicated Database_Readme.md file provides detailed instructions for configuring and using each supported database.
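
To make embedding-based retrieval concrete, here is a minimal sketch in plain Python. It is not the project's implementation, which delegates storage and search to the supported vector databases; the function names `cosine_similarity` and `top_k`, the file names, and the toy 3-dimensional vectors are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real ones come from an embedding model
# and have hundreds or thousands of dimensions.
index = {
    "intro.pdf": [1.0, 0.0, 0.0],
    "pricing.pdf": [0.0, 1.0, 0.0],
    "faq.pdf": [0.7, 0.7, 0.0],
}

results = top_k([1.0, 0.1, 0.0], index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A vector database performs the same ranking, but with index structures that avoid comparing the query against every stored vector.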

Database Configurations

For detailed information on supported vector databases (e.g., FAISS, Milvus, Pinecone, Qdrant, Weaviate) and their configurations, see Database_Readme.md.

Preview

Below is a visual representation of the RAG system architecture and user interface:

User Interface

Setup and Installation

Prerequisites

Python 3.11 (with Conda for backend environment management)
Node.js (for the frontend)
Docker (optional, for containerized deployment)
Git (for cloning the repository)

Environment Configuration

Before running the backend, create a .env file in the backend folder and add the following environment variables:

# Authentication Keys
LLAMA_CLOUD_API_KEY=          # API key for Llama Cloud
OPENAI_API_KEY=               # API key for OpenAI
GROQ_API_KEY=                 # API key for Groq

# Vector Database Configuration
VECTOR_DB_PATH=vector_dbs/    # Path to store vector databases
USE_GPU=true                  # Set to "true" for GPU usage or "false" for CPU

# Pinecone Configuration
PINECONE_API_KEY=             # API key for Pinecone

# Weaviate Configuration
WEAVIATE_API_KEY=             # API key for Weaviate
WEAVIATE_CLUSTER_URL=         # Cluster URL for Weaviate

# Qdrant Configuration
QDRANT_API_KEY=               # API key for Qdrant (optional)
QDRANT_ADMIN_API_KEY=         # Admin API key for Qdrant (optional)
QDRANT_CLUSTER_URL=           # Cluster URL for Qdrant (optional)

Note: Fill in the empty values with your actual API keys and paths.
Make sure VECTOR_DB_PATH points to the directory where the vector databases will be stored.
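
Before starting the backend, it can help to confirm that the .env file actually contains the keys you need. The following is a minimal sketch using only the standard library; `parse_env_text` and `missing_keys` are hypothetical helpers, not part of this repository, and the sample values are made up. Inline-comment handling may vary between .env loaders, so the sketch strips trailing `# ...` comments explicitly.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "true   # use the GPU".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]


# Sample content in the same layout as the .env template above (fake values).
sample = """
# Authentication Keys
OPENAI_API_KEY=sk-example
GROQ_API_KEY=                 # API key for Groq
VECTOR_DB_PATH=vector_dbs/    # Path to store vector databases
USE_GPU=true                  # Set to "true" for GPU usage or "false" for CPU
"""

config = parse_env_text(sample)
unset = missing_keys(config, ["OPENAI_API_KEY", "GROQ_API_KEY"])
print("Missing or empty keys:", unset)
```

To check a real file, pass its contents instead of `sample`, for example `parse_env_text(open("backend/.env").read())`.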

Backend Setup

Navigate to the backend directory:
```
cd backend
```

Create and activate a Conda environment:

conda env create -f src/environment.yml
conda activate rag_env

Install dependencies:
```
pip install -r requirements.txt
```

Run the backend:

For Flask:
```
python src/RAG.py
```
For FastAPI:
```
python src/RAG_fastapi.py
```

Frontend Setup

From the project root, navigate to the React application directory:
```
cd frontend/rag-ui
```
Install dependencies:
```
npm install
```
Start the development server:
```
npm start
```

The frontend will be available at http://localhost:3000.

Docker Setup

The project includes Docker support for both GPU and CPU environments.

Build the Docker image:

GPU Environment:

docker build --build-arg USE_GPU=true -f backend/Backend.gpu.dockerfile -t rag-system-gpu .

CPU Environment:

docker build --build-arg USE_GPU=false -f backend/Backend.cpu.dockerfile -t rag-system-cpu .

Run the Docker container:

GPU:

docker-compose -f docker-compose.gpu.yml up

CPU:

docker-compose -f docker-compose.cpu.yml up

Usage

Running the Backend

To start the backend directly, without Docker, run one of the following from the project root:

Flask:
```
python backend/src/RAG.py
```
FastAPI:
```
python backend/src/RAG_fastapi.py
```

GPU and CPU Configurations

The backend dynamically supports both GPU and CPU setups:

For GPU-based setups, ensure faiss-gpu is installed and the USE_GPU environment variable is set to true.
For CPU setups, faiss-cpu should be installed instead.

If using Docker, the provided docker-compose files will handle these configurations automatically.
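
One way a backend could honor USE_GPU is sketched below. This is an assumption about the approach, not code from this repository; `resolve_device` is a hypothetical helper. It relies only on the fact that GPU builds of FAISS expose `get_num_gpus`, and it falls back to CPU when FAISS is not installed, when that function is missing, or when no device is reported.

```python
import os
from types import SimpleNamespace


def resolve_device(env, faiss_module=None):
    """Return "gpu" only if USE_GPU is true and FAISS reports a GPU."""
    wants_gpu = env.get("USE_GPU", "false").strip().lower() == "true"
    if not wants_gpu:
        return "cpu"
    if faiss_module is None:
        try:
            import faiss as faiss_module  # provided by faiss-gpu or faiss-cpu
        except ImportError:
            return "cpu"
    # CPU-only builds may not define get_num_gpus at all.
    get_num_gpus = getattr(faiss_module, "get_num_gpus", None)
    if get_num_gpus is None or get_num_gpus() < 1:
        return "cpu"
    return "gpu"


# Stand-ins for the faiss module, so the sketch runs without FAISS installed.
fake_gpu_build = SimpleNamespace(get_num_gpus=lambda: 1)
fake_cpu_build = SimpleNamespace()

print(resolve_device({"USE_GPU": "true"}, fake_gpu_build))
print(resolve_device({"USE_GPU": "true"}, fake_cpu_build))
print(resolve_device(os.environ))
```

Falling back to CPU instead of raising keeps a misconfigured machine usable, at the cost of hiding the misconfiguration; logging the chosen device at startup is a reasonable complement.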

Switching Between Flask and FastAPI

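To switch between Flask and FastAPI when using Docker, set the RAG_SERVER environment variable when starting the container. The examples below use the CPU image tag from the Docker Setup section; substitute rag-system-gpu for the GPU image.

Flask:

docker run -e RAG_SERVER=flask -p 5000:5000 rag-system-cpu

FastAPI:

docker run -e RAG_SERVER=fastapi -p 5000:5000 rag-system-cpu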
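
The selection logic behind RAG_SERVER could look roughly like the sketch below. The contents of start_server.sh are not shown in this README, so this is an illustration under assumptions: `server_command` is a hypothetical helper, the default of `flask` is a guess, and the script paths are the ones used in the Backend Setup commands.

```python
import os

# Script paths relative to the backend directory, as in Backend Setup.
SERVER_SCRIPTS = {
    "flask": "src/RAG.py",
    "fastapi": "src/RAG_fastapi.py",
}


def server_command(env, default="flask"):
    """Map the RAG_SERVER setting to the command that starts the backend."""
    choice = env.get("RAG_SERVER", default).strip().lower()
    if choice not in SERVER_SCRIPTS:
        valid = ", ".join(sorted(SERVER_SCRIPTS))
        raise ValueError(f"RAG_SERVER must be one of: {valid} (got {choice!r})")
    return ["python", SERVER_SCRIPTS[choice]]


command = server_command(os.environ)
print("Would run:", " ".join(command))
# To actually launch the server: subprocess.run(command, check=True)
```

Rejecting unknown values early makes a typo in RAG_SERVER fail at startup instead of silently starting the wrong server.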

Project Structure

project-root/
│
├── README.md                       # Main project documentation
├── LICENSE                         # Project license
├── .gitignore                      # Git ignore file
├── Database_Readme.md              # Detailed documentation for database setup
├── docker-compose.yml              # Default Docker Compose configuration
├── docker-compose.gpu.yml          # Docker Compose configuration for GPU
├── docker-compose.cpu.yml          # Docker Compose configuration for CPU
│
├── assets/                         # Folder for project assets (images, diagrams, etc.)
│
├── data/                           # General data folder
│
├── frontend/                       # Frontend (React) directory
│   ├── rag-ui/                     # React application
│   │   ├── public/                 # Static assets
│   │   ├── src/                    # Source code
│   │   ├── package.json            # Frontend dependencies
│   │   ├── .env                    # Frontend environment configuration
│
├── backend/                        # Backend (Python) directory
│   ├── __pycache__/                # Python cache files
│   ├── data/                       # Data-related files
│   ├── vector_dbs/                 # Vector DB index files
│   ├── src/                        # Source code
│   │   ├── __pycache__/            # Python cache files for `src`
│   │   ├── adapters/               # Vector database adapters
│   │   │   ├── __pycache__/        # Python cache files for `adapters`
│   │   │   ├── __init__.py         # Package initialization
│   │   │   ├── faiss_adapter.py    # FAISS vector DB adapter
│   │   │   ├── milvus_adapter.py   # Milvus vector DB adapter
│   │   │   ├── pinecone_adapter.py # Pinecone vector DB adapter
│   │   │   ├── qdrant_adapter.py   # Qdrant vector DB adapter
│   │   │   ├── weaviate_adapter.py # Weaviate vector DB adapter
│   │   ├── add_to_vector_db.py     # Script to add documents to vector DB
│   │   ├── config.yaml             # Configuration file for databases
│   │   ├── embedding_config.py     # Embedding model configuration
│   │   ├── embedding_initializer.py # Embedding model initialization logic
│   │   ├── environment.yml         # Conda environment file
│   │   ├── id_map.pkl              # ID mapping for vectors
│   │   ├── main.py                 # Unified entry point for the backend
│   │   ├── RAG_fastapi.py          # FastAPI-based backend
│   │   ├── rag_models.py           # RAG models and processing logic
│   │   ├── RAG.py                  # Flask-based backend
│   │   ├── VectorDB.py             # Vector database management logic
│   ├── .env                        # Backend environment configuration
│   ├── Backend.cpu.dockerfile      # Dockerfile for CPU-specific setup
│   ├── Backend.gpu.dockerfile      # Dockerfile for GPU-specific setup
│   ├── id_map.pkl                  # ID mapping file
│   ├── requirements.txt            # Backend dependencies
│   ├── start_server.sh             # Script to dynamically select Flask or FastAPI

Key Features

Flask and FastAPI Support: Run the backend with Flask (RAG.py) or FastAPI (RAG_fastapi.py).
GPU and CPU Compatibility: Dynamically handles GPU or CPU setups based on the environment.
Document Querying: Upload documents and perform advanced queries using embeddings.
Summarization: Generate document summaries using vector-based retrieval.

Troubleshooting

Permission Errors:
- Ensure the data/ and backend/vector_dbs/ directories are writable:
```
chmod -R 755 backend/vector_dbs
chmod -R 755 data
```
Missing Dependencies:
- Install missing Python packages:
```
pip install -r backend/requirements.txt
```
Docker Issues:
- Check container logs:
```
docker logs <container_id>
```

CORS Issues:

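Ensure the backend allows requests from the frontend origin. For the FastAPI backend (RAG_fastapi.py), register CORSMiddleware on the existing app instance: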

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["*"],
    allow_headers=["*"],
)
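
A quick way to confirm the setting took effect is to send a CORS preflight request and inspect the response headers. The sketch below uses only the standard library; `build_preflight`, `origin_allowed`, and `check_cors` are hypothetical helpers, and the backend URL (port 5000, as in the docker run examples) is an assumption to adjust for your setup.

```python
import urllib.error
import urllib.request

FRONTEND_ORIGIN = "http://localhost:3000"


def build_preflight(url, origin, method="POST"):
    """Build the OPTIONS request a browser sends before a cross-origin call."""
    return urllib.request.Request(
        url,
        method="OPTIONS",
        headers={"Origin": origin, "Access-Control-Request-Method": method},
    )


def origin_allowed(headers, origin):
    """True if the response headers grant access to the given origin."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get("access-control-allow-origin") in ("*", origin)


def check_cors(url, origin=FRONTEND_ORIGIN):
    """Send the preflight to a running backend and report the result."""
    request = build_preflight(url, origin)
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return origin_allowed(response.headers, origin)
    except urllib.error.HTTPError as err:
        # Error responses still carry headers worth inspecting.
        return origin_allowed(err.headers, origin)


# Example (requires a running backend; the URL is an assumption):
# print(check_cors("http://localhost:5000/"))
```

If `check_cors` returns False while the browser still reports CORS errors, compare the origin in the browser's address bar with the `allow_origins` list; the scheme, host, and port must all match.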
