Welcome to the Disha Chatbot GitHub repository! This project is an innovative solution designed to streamline the user experience for navigating the IIIT Nagpur website. Built with cutting-edge Machine Learning (ML), Natural Language Processing (NLP), and Large Language Models (LLMs), Disha provides instant, user-friendly responses to a variety of queries.
- Enables natural and intuitive conversations.
- Provides accurate and contextual answers to queries about IIIT Nagpur.
- Extracts text and images from IIIT Nagpur’s website using OCR.
- Structures data into a comprehensive JSON format for training.
- Combines fine-tuned LLMs and Retrieval-Augmented Generation (RAG) for precise answers.
- Responses are verified for maximum reliability.
- Measures output quality using BLEU, ROUGE-L, Semantic Similarity, and Human Score metrics.
- LLaMA-3.2-1B: Fine-tuned with rank values R-8, R-16, R-32, and Phi-3.5.
- Phi-3.5-mini
- PEFT Techniques: Efficient fine-tuning with LoRA and QLoRA.
- Retrieves accurate, contextually relevant data from external databases.
- Utilizes:
- Pinecone: Vector database for optimized search and retrieval.
- LangChain: For seamless data pipelines.
- Google Gemini API: Provides accurate, summarized answers.
Model | BLEU | ROUGE-L | Semantic Similarity | Human Evaluation | Trained Parameters |
---|---|---|---|---|---|
LLAMA-3.2-1b (R=8) | 0.925700 | 0.964550 | 0.998106 | 0.934744 | 12,156,928 |
LLAMA-3.2-1b (R=16) | 0.925950 | 0.964757 | 0.998106 | 0.942012 | 24,313,856 |
LLAMA-3.2-1b (R=32) | 0.924404 | 0.963656 | 0.998096 | 0.946338 | 48,627,712 |
Phi 3.5 Mini | 0.785048 | 0.886750 | 0.998205 | 0.852504 | 29,884,416 |
RAG | 0.964902 | 0.996087 | 0.995800 | 0.967379 | 0 |
- Integrates RAG and fine-tuned LLMs for robust performance.
- Ensures all critical details are included in responses.
- Delivers user-friendly, conversational interactions.
- Expand language support beyond Hindi and English.
- Enhance scalability for larger datasets and more complex queries.
- Integrate additional evaluation metrics to improve accuracy.
Feel free to fork, contribute, and enhance Disha for broader applications!