This project demonstrates a Retrieval-Augmented Generation (RAG) implementation designed to answer questions from medical literature. The RAG model combines retrieval-based techniques with generative AI to produce accurate and contextually relevant responses.
The model is built and tested using the following medical books:
- Medical Book: General medical reference.
- Gray's Anatomy for Students: Detailed anatomy reference.
- Harrison's Principles of Internal Medicine: Comprehensive internal medicine resource.
- Oxford Handbook of Clinical Medicine: Clinical reference for healthcare professionals.
- Where There Is No Doctor: Health care guide for rural and remote areas.
- Current Medical Diagnosis & Treatment: Diagnostic and treatment guidelines.
- Davidson’s Principles and Practice of Medicine: Principles of modern medicine.
- Harrison’s Pulmonary and Critical Care Medicine: Specialized reference for pulmonary medicine.
Each book is processed to extract relevant information, chunked into smaller text segments, and stored for retrieval during question answering.
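The walkthrough later in this README loads a single book; as a minimal sketch (assuming all eight books sit as PDFs under `data/`), the whole shelf can be loaded in one pass:

```python
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader

# Load every book under data/ into one list of page-level Documents
all_pages = []
for pdf_path in sorted(Path("data").glob("*.pdf")):
    loader = PyPDFLoader(str(pdf_path))
    all_pages.extend(loader.load())  # PyPDFLoader yields one Document per page
```

The resulting `all_pages` list can then be fed to the same splitting and vectorstore steps described below.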
- Python 3.10.0
- LangChain: Framework for building language model applications.
- Chroma: Vectorstore for document embeddings.
- Google Generative AI: For embedding generation and answer generation.
- FAISS: For efficient similarity search.

The full tech stack can be installed from [requirements.txt](https://github.com/eeshan15/Vaidya-GPT-Prototype/blob/main/requirements.txt), and the complete code is in [test.ipynb](https://github.com/eeshan15/Vaidya-GPT-Prototype/blob/main/test.ipynb).
- `langchain`
- `langchain_community`
- `langchain_chroma`
- `langchain_google_genai`
- `PyPDFLoader`: For loading PDF data.
- `dotenv`: For managing environment variables (see the sketch after this list).
- `FAISS`: For vector-based similarity search.
- `tqdm`: For progress visualization.
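As a usage note for `dotenv`: the Google Generative AI clients read the API key from the environment. A minimal sketch, assuming the key is stored as `GOOGLE_API_KEY` in a local `.env` file:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # copies key=value pairs from .env into os.environ
assert os.getenv("GOOGLE_API_KEY"), "Set GOOGLE_API_KEY in .env"
```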
- Document Loader: PDFs are loaded using `PyPDFLoader`.
- Text Splitting: Documents are split into chunks of 1000 characters using `RecursiveCharacterTextSplitter`.
- Embeddings: Generated using the Google Generative AI embeddings model (`embedding-001`).
- Vectorstore: Chunks and their embeddings are stored in Chroma or FAISS for efficient retrieval (a FAISS sketch follows this list).
- Generative AI: Uses Gemini 1.5 Pro for natural language response generation.
- Retrieval Chain: Combines retrieved documents with a generative model to answer questions.
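The walkthrough below persists to Chroma; storing the same chunks in FAISS instead is a small variation. A minimal sketch, assuming the `docs` and `embeddings` objects from the vectorstore-generation code below (the index path `faiss_index` is illustrative):

```python
from langchain_community.vectorstores import FAISS

# Build an in-memory FAISS index from the chunks and save it to disk
faiss_store = FAISS.from_documents(docs, embeddings)
faiss_store.save_local("faiss_index")

# Reload it later; recent langchain_community versions require opting in
# to pickle deserialization for local indexes you trust
faiss_store = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)
hits = faiss_store.similarity_search("What is Myopia?", k=10)
```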
- Multiple Document Support: The model is designed to handle multiple books and provide consolidated answers.
- Customizable Prompting: The system uses dynamic prompts to tailor responses based on retrieved content (see the sketch after this list).
- High Accuracy: Tested to deliver accurate and concise answers to medical queries.
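The exact prompt lives in `test.ipynb`; the sketch below only illustrates the pattern, with placeholder wording, of a prompt whose `{context}` slot is filled dynamically with retrieved content:

```python
from langchain_core.prompts import ChatPromptTemplate

# {context} receives the retrieved chunks; {input} receives the user question
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a medical assistant. Answer using only the context below. "
     "If the context does not contain the answer, say so.\n\n{context}"),
    ("human", "{input}"),
])
```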
- Clone the repository and navigate to the project directory.
- Install dependencies (note the actual pip package names: `pypdf` backs `PyPDFLoader`, `python-dotenv` provides `dotenv`, and `faiss-cpu` provides FAISS):

```bash
pip install langchain langchain_community langchain_chroma langchain_google_genai pypdf python-dotenv faiss-cpu tqdm
```
- Place your medical books in the `data/` directory.
- Open `test.ipynb` in Jupyter Notebook or an equivalent editor.
- Run all the cells sequentially.
Run the following code to generate and save the vectorstore:

```python
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma

load_dotenv()  # exposes GOOGLE_API_KEY from .env to the Google clients

# Load and process documents
loader = PyPDFLoader("data/Medical_book.pdf")
data = loader.load()

# Split into 1000-character chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
docs = text_splitter.split_documents(data)

# Create the vectorstore; with persist_directory set, langchain_chroma
# writes the collection to disk automatically (no explicit persist() call)
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectorstore = Chroma.from_documents(docs, embeddings, persist_directory="vectorstore")
```
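On subsequent runs the persisted store can be reopened instead of rebuilt; a minimal sketch, assuming the same `vectorstore` directory:

```python
# Reopen the persisted collection without re-embedding the books
vectorstore = Chroma(
    persist_directory="vectorstore",
    embedding_function=GoogleGenerativeAIEmbeddings(model="models/embedding-001"),
)
```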
After generating the vectorstore, run the following snippet to query. It assumes a `rag_chain` built as in `test.ipynb`; a sketch of that construction follows below.

```python
# Retrieve the 10 most similar chunks for each query
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 10})

question = "What is Myopia?"
response = rag_chain.invoke({"input": question})
print(response["answer"])
```
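The `rag_chain` object above is defined in `test.ipynb`. A minimal sketch of constructing one, assuming Gemini 1.5 Pro via `ChatGoogleGenerativeAI` and LangChain's stuff-documents pattern (the prompt wording here is illustrative):

```python
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

# Gemini 1.5 Pro as the generator, per the architecture notes above
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only this context:\n\n{context}"),
    ("human", "{input}"),
])

# Stuff the retrieved chunks into {context}, then generate the answer
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
```

`create_retrieval_chain` returns a dict containing both the retrieved `context` and the generated `answer`, which is why the query snippet reads `response["answer"]`.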
- Extend the model to include more specialized datasets.
- Optimize retrieval and generation for faster response times.
- Integrate a user-friendly interface with Streamlit.
This RAG model effectively answers complex medical questions by leveraging retrieval and generation, offering a valuable tool for students, practitioners, and researchers in medicine.