This project provides an interactive Streamlit-based web application that allows users to upload PDF and CSV files, store their content in a vector database using LangChain and Chroma, and query the uploaded documents using OpenAI's LLMs (e.g., GPT-3.5-turbo). The app intelligently retrieves relevant information from the documents and provides citations for the sources.
- Upload and Process Documents:
  - Upload multiple PDF and CSV files.
  - Extract content using LangChain's document loaders (the sketch after this feature list shows the overall flow).
- Vector Database Storage:
  - Store document embeddings in a persistent Chroma vector database.
- Interactive Query System:
  - Ask questions about the uploaded documents.
  - Retrieve answers along with source citations.
- Download Cited Files:
  - Easily download files cited in the query response.
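The snippet below sketches how the first three features typically fit together with LangChain, Chroma, and OpenAI. It is not lifted from app.py: the file names are placeholders, and the imports follow the classic langchain package layout (newer releases move the loaders and vector store into langchain_community), so the actual code in this repository may differ.

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader, PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# 1. Load uploaded documents (file names are placeholders).
docs = []
docs += PyPDFLoader("uploaded_files/report.pdf").load()
docs += CSVLoader("uploaded_files/data.csv").load()

# 2. Embed the documents and persist them in a local Chroma database.
vectordb = Chroma.from_documents(
    docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="chromadb",
)

# 3. Answer a question and keep the source documents for citations.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
)
result = qa({"query": "What does the report conclude?"})
print(result["result"])
for source in result["source_documents"]:
    print("cited:", source.metadata.get("source"))
```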
- Streamlit: For creating the web interface.
- LangChain: For document processing and retrieval.
- Chroma: As the vector database for storing embeddings.
- OpenAI API: For LLM-based query answering.
- Python: The core language for building the application.
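These tools are installed from requirements.txt (see the installation steps below). As a rough guide only, a requirements file for this stack often looks something like the following; the exact package set and version pins in the repository are authoritative and may differ:

```text
streamlit
langchain
chromadb
openai
pypdf            # PDF parsing for LangChain's PDF loader (assumed)
python-dotenv    # loading OPENAI_API_KEY from .env (assumed)
```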
- Python 3.10 or later
- OpenAI API Key
- Clone the Repository:
  git clone git@github.com:stacksapien/smart-doc-search.git
  cd smart-doc-search
- Set Up a Virtual Environment:
  python3 -m venv env
  source env/bin/activate  # On Windows: .\env\Scripts\activate
- Install Dependencies:
  pip install -r requirements.txt
- Configure Environment Variables: Create a file named .env in the root directory and add your OpenAI API key (a sketch of how the app can load it follows these steps):
  OPENAI_API_KEY=your_openai_api_key
- Run the Application:
  streamlit run app.py
- Access the App: Open your browser and navigate to:
  http://localhost:8501
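As referenced in the environment-variable step above, the key in .env has to reach the OpenAI client at runtime. A minimal way to do that is with python-dotenv; whether app.py actually uses this package or reads the environment some other way is an assumption here:

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

# Pull OPENAI_API_KEY (and anything else in .env) into the process environment.
load_dotenv()

if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file.")
```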
- Upload one or more PDF or CSV files using the file uploader.
- Uploaded files are processed and stored in the uploaded_files directory.
- Enter your query in the text box provided.
- The app retrieves relevant answers from the uploaded documents and displays the sources.
- Files cited in the response are available for download (see the sketch below).
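The Streamlit pieces behind this flow typically look like the sketch below; the widget labels, the placeholder file name, and the exact wiring in app.py are assumptions rather than the project's actual code:

```python
import os

import streamlit as st

UPLOAD_DIR = "uploaded_files"
os.makedirs(UPLOAD_DIR, exist_ok=True)

# Accept one or more PDF/CSV files and persist them to uploaded_files/.
uploads = st.file_uploader(
    "Upload PDF or CSV files", type=["pdf", "csv"], accept_multiple_files=True
)
for uploaded in uploads or []:
    with open(os.path.join(UPLOAD_DIR, uploaded.name), "wb") as out:
        out.write(uploaded.getbuffer())

# Offer a cited file back for download (the path would come from the
# sources returned with the answer; "example.pdf" is a placeholder).
cited_path = os.path.join(UPLOAD_DIR, "example.pdf")
if os.path.exists(cited_path):
    with open(cited_path, "rb") as fh:
        st.download_button(
            "Download cited file",
            data=fh.read(),
            file_name=os.path.basename(cited_path),
        )
```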
smart-doc-search/
│
├── app.py # Main Streamlit application
├── requirements.txt # List of Python dependencies
├── .env # Environment variables (not included in Git)
├── uploaded_files/ # Directory for storing uploaded files
├── chromadb/ # Directory for persistent Chroma vector database
└── README.md # Project documentation
- Launch an Ubuntu EC2 instance and configure security groups to allow inbound traffic on ports 22 and 8501.
- SSH into the instance and set up Python, Streamlit, and the application as per the installation instructions.
- Use a terminal multiplexer such as tmux or screen to keep the app running (see the example after this list).
- Configure a reverse proxy (e.g., Nginx) to serve the Streamlit app under your domain.
- Enable HTTPS using Certbot for SSL certificates.
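As an example of the tmux approach mentioned above, the app can be started in a detachable session and bound to all interfaces so the reverse proxy (or the instance's public IP) can reach it; the session name is arbitrary:

```bash
tmux new -s smart-doc-search          # start a detachable session
streamlit run app.py --server.address 0.0.0.0 --server.port 8501
# Detach with Ctrl-b then d; reattach later with: tmux attach -t smart-doc-search
```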
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch:
  git checkout -b feature-name
- Commit your changes:
  git commit -m "Add feature-name"
- Push to the branch:
  git push origin feature-name
- Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
If you encounter any issues or have feature requests, please open an issue.
- Vishal Verma - LinkedIn
Feel free to reach out with any questions or feedback!