Multi-Document Retrieval with Watsonx 😻

---
title: Chat-with-Multiple-Documents-Using-Streamlit-and-Watsonx
emoji: 😻
colorFrom: purple
colorTo: pink
sdk: streamlit
sdk_version: 1.40.0
app_file: app.py
pinned: false
---


A Streamlit-powered app for querying multiple document types using Watsonx and LangChain.

This project lets users upload files in various formats (PDF, DOCX, CSV, JSON, YAML, HTML, and more) and get contextually accurate answers to questions about them, using IBM Watsonx LLMs and LangChain. The app provides a single interface for retrieval-augmented generation (RAG) over the uploaded documents.

Note: The app runs on machines with modest specifications, but for faster indexing and response times I recommend a more powerful machine.

Live App

Link to live app

Features

File Support: Supports multiple file formats such as PDFs, Word documents, PowerPoint presentations, CSV, JSON, YAML, HTML, and plain text.
Watsonx LLM Integration: Uses IBM Watsonx LLMs to answer queries and generate responses.
Embeddings: Uses HuggingFace embeddings for document indexing.
RAG (Retrieval-Augmented Generation): Combines retrieval over the uploaded documents with an LLM so that answers are grounded in their content.
Streamlit Interface: Provides an intuitive, browser-based user interface.
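To make the RAG flow concrete, here is a minimal, dependency-free sketch of its steps. It splits documents into overlapping chunks, scores each chunk against the question, keeps the best matches, and places them in the prompt. This is an illustration under stated assumptions, not the code in app.py:

- The bag-of-words `embed` function is a toy stand-in for the HuggingFace embedding model.
- Sorting a list replaces a real vector store.
- The final call to a Watsonx model is omitted.
- All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into windows of `chunk_size` words that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Ground the question in the retrieved chunks before it goes to the LLM."""
    joined = "\n\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}\nAnswer:"


docs = [
    "Invoices are due within 30 days of receipt.",
    "The office is closed on public holidays.",
    "Late invoices incur a 2 percent monthly fee.",
]
chunks = [c for d in docs for c in chunk_text(d)]
top = retrieve("When are invoices due?", chunks, k=1)
prompt = build_prompt("When are invoices due?", top)
print(prompt)
```

In the real app, the prompt would be sent to the selected Watsonx model. A learned embedding model also matches paraphrases that this word-overlap score would miss.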

Installation

Follow these steps to clone and run the project locally:

Prerequisites

Python 3.8 or later.
pip (the Python package manager).
An IBM Watsonx API key and project ID.
Git.
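If you want to confirm the tooling prerequisites before cloning, a small script can check them. This is a hypothetical helper, not part of the repository. It checks the Python version, whether pip is installed for the running interpreter, and whether `git` is on PATH. The Watsonx API key and project ID cannot be verified this way and are not checked.

```python
import importlib.util
import shutil
import sys
from typing import List, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []
    # Compare (major, minor) of the running interpreter against the minimum.
    if tuple(sys.version_info[:2]) < min_python:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    # Check that pip is installed for this interpreter (needed for requirements.txt).
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not installed for this interpreter")
    # git must be on PATH for `git clone`.
    if shutil.which("git") is None:
        problems.append("git was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("All checked prerequisites are available." if not issues else "\n".join(issues))
```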

Clone the Repository

```
git clone https://github.com/Abd-al-RahmanH/Multi-Doc-Retrieval-Watsonx.git
cd Multi-Doc-Retrieval-Watsonx
```

Install Dependencies

Create a virtual environment (optional but recommended):

```
python -m venv env
source env/bin/activate  # On Windows: .\env\Scripts\activate
```

Install required Python packages:
```
pip install -r requirements.txt
```

Set Environment Variables

Create a .env file in the project directory with the following keys:

```
WATSONX_API_KEY=<your_watsonx_api_key>
WATSONX_PROJECT_ID=<your_watsonx_project_id>
```
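For reference, one way code can pick these values up is sketched below. It is a stdlib-only assumption, not the loader used in app.py; libraries such as python-dotenv are a common alternative. The sketch parses simple KEY=VALUE lines, lets real environment variables override the file, and fails early with a clear message if either key is missing. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Tuple

REQUIRED_KEYS = ("WATSONX_API_KEY", "WATSONX_PROJECT_ID")


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    file = Path(path)
    if not file.exists():
        return values
    for raw in file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. KEY="value".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_watsonx_credentials(path: str = ".env") -> Tuple[str, str]:
    """Return (api_key, project_id); real environment variables win over the file."""
    file_values = load_env_file(path)
    resolved = {k: os.environ.get(k) or file_values.get(k, "") for k in REQUIRED_KEYS}
    missing = [k for k, v in resolved.items() if not v]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return resolved["WATSONX_API_KEY"], resolved["WATSONX_PROJECT_ID"]
```

Calling `get_watsonx_credentials()` at startup surfaces a missing key immediately rather than at the first model request.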

Run the App

Start the Streamlit app by running:
```
streamlit run app.py
```
Open the URL displayed in your terminal (usually http://localhost:8501) to access the app.

How to Use

Upload Documents: Drag and drop supported files (e.g., PDF, DOCX, JSON) into the app sidebar.
Select Model and Parameters: Choose a Watsonx model and configure settings such as the maximum number of output tokens and the decoding method.
Ask Questions: Enter queries in the chat input to get answers based on the uploaded documents.
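The model settings from the sidebar end up as a small dictionary of generation parameters. The sketch below shows one way to validate those choices. It has three labelled assumptions:

- The key names (`decoding_method`, `max_new_tokens`, `temperature`) are assumed to match the watsonx.ai text-generation parameters.
- The default values are hypothetical and not read from app.py.
- The function name is illustrative.

```python
from typing import Any, Dict, Optional

DECODING_METHODS = ("greedy", "sample")


def build_generation_params(
    max_new_tokens: int = 300,                 # hypothetical default
    decoding_method: str = "greedy",
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Validate UI choices and return a parameter dict (key names are an assumption)."""
    if decoding_method not in DECODING_METHODS:
        raise ValueError(f"decoding_method must be one of {DECODING_METHODS}")
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be positive")
    params: Dict[str, Any] = {
        "decoding_method": decoding_method,
        "max_new_tokens": max_new_tokens,
    }
    # Temperature only matters when sampling; greedy decoding always picks
    # the most likely next token, so it is left out in that case.
    if decoding_method == "sample":
        params["temperature"] = 0.7 if temperature is None else temperature  # hypothetical default
    return params
```

Keeping greedy as the default makes answers repeatable for the same question and documents; switching to sampling trades that for more varied wording.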

Project Structure

```
Multi-Doc-Retrieval-Watsonx/
├── app.py               # Main application file
├── requirements.txt     # Python dependencies
├── README.md            # Project documentation
└── .env                 # Environment variables (not included in repo, create manually)
```

Dependencies

Streamlit: For building the user interface.
LangChain: For document retrieval and RAG implementation.
HuggingFace Transformers: For computing document embeddings (vector representations).
Watsonx Foundation Models: For querying and text generation.
Various Python Libraries: For file handling, including pandas, python-docx, python-pptx, and more.
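File handling can be pictured as a dispatch on the file extension, with each format converted to plain text before chunking. The sketch below is hypothetical and covers only formats the standard library can read: TXT, CSV, JSON, and HTML. PDF, DOCX, and PPTX need third-party parsers such as the python-docx and python-pptx packages listed above, and YAML needs a YAML library, so they are intentionally left out. The app itself may rely on LangChain document loaders instead.

```python
import csv
import io
import json
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable, Dict


class _TextExtractor(HTMLParser):
    """Collect visible text from HTML, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def _read_html(data: bytes) -> str:
    parser = _TextExtractor()
    parser.feed(data.decode("utf-8", errors="replace"))
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


def _read_csv(data: bytes) -> str:
    rows = csv.reader(io.StringIO(data.decode("utf-8")))
    return "\n".join(", ".join(row) for row in rows)


def _read_json(data: bytes) -> str:
    return json.dumps(json.loads(data), indent=2)


# Hypothetical extension-to-loader table; each loader turns raw bytes into text.
LOADERS: Dict[str, Callable[[bytes], str]] = {
    ".txt": lambda b: b.decode("utf-8"),
    ".csv": _read_csv,
    ".json": _read_json,
    ".html": _read_html,
    ".htm": _read_html,
}


def extract_text(filename: str, data: bytes) -> str:
    """Dispatch on the (case-insensitive) file extension."""
    suffix = Path(filename).suffix.lower()
    loader = LOADERS.get(suffix)
    if loader is None:
        raise ValueError(f"Unsupported file type in this sketch: {suffix or filename}")
    return loader(data)
```

Adding a format then means registering one more function in `LOADERS`.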

Contributing

We welcome contributions! If you'd like to improve this project:

Fork the repository.
Create a feature branch: git checkout -b feature-name
Commit your changes: git commit -m 'Add a new feature'
Push to the branch: git push origin feature-name
Open a Pull Request.

More Blogs and Interesting Projects

For more blogs and interesting projects, visit my personal website: https://abdulrahmanh.com

License

This project is licensed under the MIT License. See the LICENSE file for details.
