# Semantic Search Engine

This project implements a Semantic Search Engine using LangChain, Pinecone, and Hugging Face Embeddings. It allows users to perform semantic searches on product and service data, providing context-aware results using pre-trained language models.
## Table of Contents

- Features
- Technology Stack
- Installation
- Usage
- Environment Variables
- Project Structure
- Data Used
- How It Works
- Logging
## Features

- Semantic Search: finds relevant products and services based on natural-language queries.
- Conversational QA: uses a pre-trained language model to answer queries in context.
- Vector Indexing: efficient search and retrieval using Pinecone.
- Dynamic Document Creation: converts product and service data into document objects for indexing.
## Technology Stack

- Python
- LangChain
- Pinecone
- Hugging Face Embeddings
- ChatGroq
- python-dotenv (environment variable management)
- pandas (data handling)
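For reference, a `requirements.txt` covering this stack might look like the following. The package names are the usual PyPI distributions; the exact set and versions pinned in this repository may differ:

```
langchain
langchain-groq
pinecone-client
sentence-transformers
python-dotenv
pandas
jupyter
```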
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/your-repo/semantic-search-engine.git
   cd semantic-search-engine
   ```

2. Create a virtual environment and activate it:

   ```bash
   python -m venv venv
   source venv/bin/activate   # On Linux/Mac
   venv\Scripts\activate      # On Windows
   ```

3. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up your `.env` file with the required API keys (see Environment Variables).

5. Run the Jupyter Notebook to initialize the embeddings and Pinecone index:

   ```bash
   jupyter notebook SearchEngine_Pinecone.ipynb
   ```
## Usage

Use the semantic search functions to perform searches on the product and service data.
## Environment Variables

Create a `.env` file in the project root directory and add the following variables:

```
GROQ_API_KEY=your_chatgroq_api_key
PINECONE_KEY=your_pinecone_api_key
PINECONE_ENV=your_pinecone_environment
```

Replace the placeholder values with your actual API keys.
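The project loads these variables with the python-dotenv package. As a minimal standard-library sketch of what `load_dotenv()` does with this file, the loader below reads `KEY=value` lines into `os.environ` (the file name `demo.env` and its contents are illustrative only):

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Read KEY=value lines from a .env-style file into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments, and lines without '='.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: values already set in the real environment win.
            os.environ.setdefault(key.strip(), value.strip())

# Demo: write a sample file, then load it.
with open("demo.env", "w") as fh:
    fh.write("# API keys\nPINECONE_ENV=your_pinecone_environment\n\n")
load_env_file("demo.env")
```

In the real notebook, `from dotenv import load_dotenv; load_dotenv()` achieves the same effect.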
## Project Structure

```
semantic-search-engine/
├── SearchEngine_Pinecone.ipynb
├── requirements.txt
├── .env
└── README.md
```
## Data Used

The project includes example data for products and services:

- Products: contains product IDs, names, descriptions, gender, and base color.
- Services: contains service IDs, names, descriptions, and categories.
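To illustrate how such rows become indexable documents, the sketch below builds plain dicts with the `page_content`/`metadata` shape that LangChain's `Document` class uses. The sample product values and the helper name are hypothetical; in the notebook you would construct `langchain_core.documents.Document` objects instead of dicts:

```python
# Hypothetical sample row matching the product fields described above.
products = [
    {"id": "P001", "name": "Running Shoes",
     "description": "Lightweight shoes for daily runs",
     "gender": "Unisex", "base_color": "Blue"},
]

def product_to_document(p: dict) -> dict:
    """Map a product row to a document-like dict for indexing."""
    return {
        # Text that gets embedded and searched.
        "page_content": f"{p['name']}. {p['description']}",
        # Structured fields kept alongside the text for filtering.
        "metadata": {"id": p["id"], "gender": p["gender"],
                     "base_color": p["base_color"]},
    }

docs = [product_to_document(p) for p in products]
```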
## How It Works

1. The `.env` file is used to securely manage API keys for Pinecone and ChatGroq.
2. A fine-tuned SentenceTransformer model generates embeddings for queries and documents.
3. Products and services are converted into document objects that can be indexed in Pinecone.
4. The documents are indexed in Pinecone for fast and efficient retrieval.
5. Queries are processed using LangChain to provide context-aware responses from the indexed data.
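The retrieval step (4) boils down to ranking documents by vector similarity. The toy below does this with cosine similarity over hand-made 3-dimensional vectors; Pinecone performs the same comparison at scale over the real SBERT embeddings, so the vectors and document names here are placeholders:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embeddings for two indexed documents.
doc_vectors = {
    "running shoes": [0.9, 0.1, 0.0],
    "yoga mat":      [0.1, 0.8, 0.2],
}
# Pretend embedding of the query "shoes for jogging".
query_vec = [0.85, 0.15, 0.05]

# Retrieval: pick the document whose vector is closest to the query.
best = max(doc_vectors, key=lambda d: cosine(query_vec, doc_vectors[d]))
```

Because the query vector points in nearly the same direction as the "running shoes" vector, that document ranks first even though the query shares no exact keywords with it; this is what makes the search semantic rather than lexical.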
## Logging

The project logs important events and errors to `semantic_search.log`. Example entries:

```
2025-01-07 10:00:00 - INFO - HuggingFaceEmbeddings initialized with model: models/fine-tuned-sbert-triplet
2025-01-07 10:05:00 - INFO - ChatGroq LLM initialized successfully.
```