A question-answering system built with LangChain, FAISS, and Streamlit that allows users to query documents using the Tentris API for embeddings and chat completion.
- Document-based question answering
- Custom embedding model integration
- Simple UI
- Benchmark dataset generation for QA evaluation
Install dependencies:
pip install streamlit langchain langchain-community faiss-cpu python-dotenv openai
- Navigate to the project directory.
- Create a new .env file in the root folder (same level as main.py).
- Add the following content to the .env file:
TENTRIS_BASE_URL_EMBEDDINGS="http://tentris-ml.cs.upb.de:8502/v1"
TENTRIS_BASE_URL_CHAT="http://tentris-ml.cs.upb.de:8501/v1"
TENTRIS_API_KEY="your-api-key-here"
Replace your-api-key-here with your actual API key. (A sketch of how these variables might be wired into the app appears after the setup steps below.)
- Place your document in the data/ directory as speech.txt.
- Run the application:
streamlit run main.py
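For reference, here is a minimal sketch of how the environment variables above might be loaded and passed to LangChain's OpenAI-compatible clients. The model name "tentris" and the exact classes used in main.py are assumptions for illustration, not a copy of the application code.

```python
import os
from dotenv import load_dotenv
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.chat_models import ChatOpenAI

# Read the TENTRIS_* variables from the .env file in the project root.
load_dotenv()

# Embedding client pointed at the Tentris embeddings endpoint.
embeddings = OpenAIEmbeddings(
    openai_api_base=os.environ["TENTRIS_BASE_URL_EMBEDDINGS"],
    openai_api_key=os.environ["TENTRIS_API_KEY"],
    model="tentris",  # assumed model name; use whatever the endpoint expects
)

# Chat client pointed at the Tentris chat-completion endpoint.
llm = ChatOpenAI(
    openai_api_base=os.environ["TENTRIS_BASE_URL_CHAT"],
    openai_api_key=os.environ["TENTRIS_API_KEY"],
    model="tentris",  # assumed model name
    temperature=0,
)
```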
- Open your browser and go to http://localhost:8501
- Type your question in the text box
- Click 'Get Answer'
- Wait for the response (a sketch of the underlying retrieval flow appears below)
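Under the hood, the app is a standard LangChain retrieval pipeline over a FAISS index. The sketch below (reusing the embeddings and llm clients from the previous sketch; chunk sizes and widget labels are illustrative assumptions, not the exact main.py code) shows one way it could be put together:

```python
import streamlit as st
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Load and chunk the document, then build a FAISS index with the Tentris embeddings.
docs = TextLoader("data/speech.txt").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)
vectorstore = FAISS.from_documents(chunks, embeddings)  # `embeddings` from the previous sketch

# Retrieval-augmented QA chain backed by the Tentris chat model.
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())

st.title("Document Q&A")
question = st.text_input("Type your question")
if st.button("Get Answer") and question:
    result = qa_chain.invoke({"query": question})
    st.write(result["result"])
```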
The benchmark_dataset.py script generates a test dataset to evaluate the Q&A system. It processes documents, creates a knowledge base, and generates Q&A pairs using the Tentris API.
The Giskard Python library provides RAGET (RAG Evaluation Toolkit), which automatically generates a benchmark dataset. RAGET works by:
- Generating a list of questions, reference answers, and reference contexts directly from the knowledge base of your RAG system.
- Producing test datasets that can evaluate the retrieval, generation, and overall quality of your RAG system.
This includes simple questions, as well as more complex variations (e.g., situational, double, or conversational questions) designed to target specific components of the RAG pipeline.
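Based on Giskard's RAGET documentation, generation follows roughly the pattern below. The chunking, number of questions, and agent description are illustrative assumptions; note also that RAGET itself calls an LLM and an embedding model to write the questions, and configuring those clients is not shown here.

```python
import pandas as pd
from giskard.rag import KnowledgeBase, generate_testset

# Build a knowledge base from the document; splitting on blank lines is a crude
# stand-in for whatever chunking benchmark_dataset.py actually uses.
text = open("data/speech.txt", encoding="utf-8").read()
df = pd.DataFrame({"text": [chunk for chunk in text.split("\n\n") if chunk.strip()]})
knowledge_base = KnowledgeBase(df)

# Generate questions, reference answers, and reference contexts from the knowledge base.
testset = generate_testset(
    knowledge_base,
    num_questions=30,
    agent_description="A chatbot answering questions about the document in data/speech.txt",
)

# Persist the benchmark for later evaluation.
testset.save("benchmark_dataset.jsonl")
```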
Install required libraries:
pip install pandas giskard
- Prepare the document: place speech.txt in the data/ folder.
- Set the environment variables as described above.
- Run the benchmark_dataset.py script.
The dataset_eval_rag.json file is manually generated, focusing on simple node questions, multi-hop strategies, and some more complex queries. It provides a set of example questions along with their corresponding answers and is useful for evaluating the performance of the RAG (Retrieval-Augmented Generation) system.
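As a sketch of how such a dataset could be consumed, the loop below compares the system's answers against the references. The question/answer field names are hypothetical and should be adjusted to the actual structure of dataset_eval_rag.json; qa_chain refers to the chain from the earlier sketch.

```python
import json

# Load the manually curated benchmark; the field names below are assumptions.
with open("dataset_eval_rag.json", encoding="utf-8") as f:
    examples = json.load(f)

for example in examples:
    question = example["question"]   # hypothetical key
    reference = example["answer"]    # hypothetical key
    prediction = qa_chain.invoke({"query": question})["result"]
    print(f"Q: {question}\nExpected: {reference}\nGot: {prediction}\n")
```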