PDFQuery is a Streamlit-based web application that allows users to upload PDF files and ask questions about the content within those files. The app uses the LangChain framework, Google Generative AI for embeddings, and FAISS for vector storage to provide detailed answers based on the context of the uploaded PDFs.
- Upload multiple PDF files.
- Extract text from PDF files.
- Split text into manageable chunks.
- Embed text chunks using Google Generative AI embeddings.
- Store embeddings using FAISS.
- Perform similarity search to find relevant document chunks.
- Use a conversational AI model to answer questions based on the content of the PDFs.
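The chunking and similarity-search steps listed above can be illustrated with a small standard-library sketch. This is not the app's code: the real app relies on LangChain, Google Generative AI embeddings, and FAISS, whereas here `split_text`, `embed`, `cosine`, and `top_k` are hypothetical helpers, and the "embedding" is a toy bag-of-words count vector used only to show the idea.

```python
import math
from collections import Counter


def split_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:k]
```

In the real pipeline, a vector index such as FAISS plays the role of `top_k`, and a learned embedding model replaces `embed`; the overlap between chunks helps keep sentences that straddle a chunk boundary retrievable.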
- Python 3.8 or higher
- Streamlit
- Google API Key for Generative AI
- Clone the repository:
    git clone https://github.com/Rishitabansal9/PdfQuery.git
    cd PdfQuery
- Create a virtual environment and activate it:
    python -m venv venv
    .\venv\Scripts\activate      # On Windows
    source venv/bin/activate     # On macOS/Linux
- Install the required packages:
    pip install -r requirements.txt
- Set up your environment variables: create a .env file in the root directory of the project and add your Google API key:
    GOOGLE_API_KEY=your_google_api_key
- Run the Streamlit app:
    streamlit run app.py
- Upload PDF files.
- Ask questions:
  - Enter your question in the text input box on the main page.
  - The application displays an answer based on the content of the uploaded PDFs.
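To make the question-answering step more concrete, here is a minimal sketch of how retrieved chunks and a user question might be combined into a single prompt for the conversational model. The function name `build_prompt` and the prompt wording are assumptions for illustration; the app's actual prompt template is not shown in this README.

```python
def build_prompt(question, context_chunks):
    """Assemble a grounded prompt from retrieved chunks and the user's question."""
    # Join the retrieved chunks with blank lines so the model sees them as separate passages.
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Asking the model to admit when the context lacks the answer is a common way to reduce responses that are not supported by the uploaded PDFs.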
- app.py: The main Streamlit application file.
- requirements.txt: The list of required Python packages.
- .env: File to store environment variables (e.g., Google API key).
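As a rough illustration of how the .env file feeds the app, the sketch below reads `KEY=value` lines into the process environment using only the standard library. The helper `load_env_file` is hypothetical; the app itself may well use a package such as python-dotenv for this, which is an assumption, not something this README confirms.

```python
import os


def load_env_file(path=".env"):
    """Read KEY=value lines into os.environ without overriding existing values."""
    loaded = {}
    if not os.path.exists(path):
        return loaded
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines, comments, and anything that is not KEY=value.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            os.environ.setdefault(key, value)
            loaded[key] = value
    return loaded


if __name__ == "__main__":
    load_env_file()
    if not os.environ.get("GOOGLE_API_KEY"):
        print("GOOGLE_API_KEY is not set; add it to .env before running the app.")
```

Keeping the key in .env rather than in app.py makes it easier to keep credentials out of version control.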
Contributions are welcome! Please feel free to submit a Pull Request.