Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine.

EulerSearch/embedding_studio

Embedding Studio

👉 Try the Live Demo

Python 3.10 • CUDA 11.7.1 • Docker Compose

Website • Documentation • Challenges & Solutions • Use Cases

Embedding Studio is an innovative open-source framework designed to transform embedding models and vector databases into comprehensive, self-improving search engines. With built-in clickstream collection, continuous model refinement, and intelligent vector optimization, it creates a feedback loop that enhances search quality over time based on real user interactions.

Community Support
Embedding Studio grows with our team's enthusiasm. Your star on the repository helps us keep developing.
Join us in reaching our goal:

Progress

Features

Core Capabilities

  1. 🔄 Full-Cycle Search Engine - Transform your vector database into a complete search solution
  2. 🖱️ User Feedback Collection - Automatically gather clickstream and session data
  3. 🚀 Continuous Improvement - Enhance search quality on the fly without long waiting periods
  4. 📊 Performance Monitoring - Track search quality metrics through comprehensive dashboards
  5. 🎯 Iterative Fine-Tuning - Improve your embedding model through user interaction data
  6. 🔁 Blue-Green Deployment - Zero-downtime deployment of improved embedding models
  7. 💾 Multi-Source Integration - Connect to various data sources (S3, GCP, PostgreSQL, etc.)
  8. 🧠 Vector Optimization - Apply post-training adjustments for incremental improvements
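
The Continuous Improvement and Vector Optimization capabilities above can be illustrated with a toy post-training adjustment. This is a minimal sketch, assuming a simple rule that pulls a clicked item's stored vector toward the query embedding and pushes a shown-but-skipped item slightly away, without retraining the model. `adjust_item_vector`, the learning rate `lr`, and the update rule itself are hypothetical and are not Embedding Studio's actual algorithm.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returns a copy unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def adjust_item_vector(item: Vector, query: Vector, clicked: bool, lr: float = 0.1) -> Vector:
    """Hypothetical post-training adjustment of one stored item vector.

    A click moves the item vector a fraction `lr` of the way toward the query
    embedding; a skip moves it the same fraction away. Only the stored vector
    changes; the embedding model is left untouched.
    """
    sign = 1.0 if clicked else -1.0
    moved = [x + sign * lr * (q - x) for x, q in zip(item, query)]
    return normalize(moved)


# Usage: a click should make the item more similar to the query that produced it.
query = normalize([1.0, 0.0])
item = normalize([0.6, 0.8])
print(cosine(query, item), cosine(query, adjust_item_vector(item, query, clicked=True)))
```

Because each update touches a single stored vector, this kind of adjustment is cheap compared with a full fine-tuning run, which is the trade-off the feature list describes.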

Specialized Features

  • 📈 Personalization Support - Create user-specific vector adjustments based on individual behavior
  • 💬 Suggestion System - Generate intelligent query autocompletions based on user patterns
  • 🔎 Category Prediction - Automatically identify relevant categories for search queries
  • 🔀 Multi-Modal Support - Work with text, images, and structured data in one framework
  • 🧩 Plugin Architecture - Extend functionality through a comprehensive plugin system
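
To make the Suggestion System idea concrete, here is a minimal sketch of query autocompletion, assuming suggestions are past queries that start with the typed prefix, ranked by how often users issued them. The `QuerySuggester` class and its methods are hypothetical illustrations, not part of Embedding Studio's API.

```python
from collections import Counter
from typing import Iterable, List


class QuerySuggester:
    """Hypothetical prefix-based autocompletion built from a log of past queries."""

    def __init__(self, past_queries: Iterable[str]) -> None:
        # Count how often each normalized (trimmed, lowercased) query was issued.
        self.counts = Counter(q.strip().lower() for q in past_queries if q.strip())

    def record(self, query: str) -> None:
        """Add one more observed query so suggestions follow changing user patterns."""
        q = query.strip().lower()
        if q:
            self.counts[q] += 1

    def suggest(self, prefix: str, limit: int = 3) -> List[str]:
        """Return up to `limit` past queries starting with `prefix`, most frequent first."""
        p = prefix.lstrip().lower()
        matches = [q for q in self.counts if q.startswith(p)]
        # Sort by descending frequency, then alphabetically so ties are stable.
        matches.sort(key=lambda q: (-self.counts[q], q))
        return matches[:limit]


# Usage
suggester = QuerySuggester(["red dress", "red dress", "red shoes", "blue jeans"])
print(suggester.suggest("red"))
```

A linear scan is enough for a sketch; a production system would more likely use a trie or a database index over the query log.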

In Development (*)

  • 📑 Zero-Shot Query Parser - Mix structured and unstructured search queries
  • 📚 Catalog Pre-Training - Fine-tune embedding models on your specific content before deployment
  • 📊 Advanced Analytics - More detailed insights into search performance and user behavior

(*) - Features in active development

When is Embedding Studio the Best Fit?

More about it here.

  • 📚💼 Rich Content Collections - Businesses with extensive catalogs and unstructured data
  • 🛍️🤝 Customer-Centric Platforms - Applications prioritizing personalized user experiences
  • 🔄📊 Dynamic Content - Platforms with evolving content and changing user preferences
  • 🔍🧠 Complex Queries - Systems handling nuanced and multifaceted search needs
  • 🔄📊 Mixed Data Types - Applications integrating different data formats in search
  • 🔄🚀 Continuous Improvement - Platforms seeking ongoing optimization through user interactions
  • 💵💡 Cost-Conscious Organizations - Teams looking for powerful yet affordable solutions

Challenges Solved

Disclaimer: Embedding Studio is not another Vector Database - it's a framework that transforms your Vector Database into a complete Search Engine with all necessary components.

  • ✅ Cold Start Problems - Jump-start search quality with minimal data
  • ✅ Static Search Quality - Create systems that improve automatically over time
  • ✅ Long Improvement Cycles - Reduce frustration with fast feedback loops
  • ✅ Resource-Heavy Reindexing - Optimize the updating process for better performance
  • ✅ Hybrid Search Complexity - Seamlessly combine structured and unstructured search
  • ✅ Query Understanding - Parse natural language queries more effectively
  • ✅ New Content Discovery - Ensure fresh items get proper visibility

More about challenges and solutions here.

System Architecture

Embedding Studio uses a modular, service-based architecture:

Core Components

  • API Service - Central coordination point for applications
  • Vector Database - PostgreSQL with pgvector for embedding storage
  • Clickstream System - Captures and processes user interactions
  • Worker Services:
    • Fine-Tuning Worker - Handles model training and improvement
    • Inference Worker - Manages Triton Inference Server for embeddings
    • Improvement Worker - Processes incremental vector adjustments
    • Upsertion Worker - Manages content updates and indexing
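
The Vector Database component answers searches by nearest-neighbour lookup. In PostgreSQL with pgvector, a query ordered by `embedding <=> query_vector` ranks rows by cosine distance. The sketch below reproduces that ranking in memory with an exact brute-force scan, assuming non-zero vectors; `top_k` and the toy index are hypothetical and only illustrate the measure, not how Embedding Studio calls the database.

```python
import heapq
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Cosine distance, i.e. 1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)


def top_k(index: Dict[str, Vector], query: Vector, k: int = 2) -> List[Tuple[str, float]]:
    """Exact k-nearest-neighbour search: score every stored vector, keep the k closest."""
    scored = ((cosine_distance(vec, query), item_id) for item_id, vec in index.items())
    best = heapq.nsmallest(k, scored)
    return [(item_id, dist) for dist, item_id in best]


# Usage: "a" points almost the same way as the query, "c" is next, "b" is farthest.
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k(index, [1.0, 0.1], k=2))
```

A real deployment relies on the database and, for large collections, an approximate index instead of scanning every row.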

Data Flow

  1. Content Ingestion - Load data from various sources
  2. User Interaction - Collect clickstream data through API endpoints
  3. Fine-Tuning - Use interaction data to improve embedding models
  4. Model Deployment - Update inference service with improved models
  5. Search and Retrieval - Deliver better results based on fine-tuned models
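
Steps 2 and 3 turn user interactions into training signal. One common way to do this, shown here as a minimal sketch, is to treat clicked results as positives and shown-but-unclicked results as negatives, producing `(query, positive, negative)` triplets for contrastive fine-tuning. The `Session` record, `to_triplets`, and the click-as-positive heuristic are assumptions for illustration, not Embedding Studio's clickstream schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Session:
    """Hypothetical clickstream record: one query, the results shown, and the clicks."""
    query: str
    shown: List[str]
    clicked: List[str] = field(default_factory=list)


def to_triplets(sessions: List[Session]) -> List[Tuple[str, str, str]]:
    """Turn sessions into (query, positive, negative) examples for fine-tuning.

    Clicked results are treated as positives; results that were shown but not
    clicked are treated as negatives. Sessions without clicks are skipped,
    since they carry no positive signal.
    """
    triplets = []
    for s in sessions:
        clicked = set(s.clicked)
        if not clicked:
            continue
        negatives = [item for item in s.shown if item not in clicked]
        for pos in s.clicked:
            for neg in negatives:
                triplets.append((s.query, pos, neg))
    return triplets


# Usage
sessions = [
    Session("red dress", shown=["d1", "d2", "d3"], clicked=["d2"]),
    Session("boots", shown=["b1"]),  # no clicks, so no triplets
]
print(to_triplets(sessions))
```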

Comparison with Traditional Approaches

Embedding Studio Chart

Our framework lets you continuously fine-tune your model based on user experience, so your search returns more accurate results for user queries and adapts to them faster.

$${\color{red}RED:}$$ Typical search solutions without enhancements, such as Full-Text Search (FTS), Nearest Neighbor Search (NNS), and others, are marked in red on the graph. Without additional tools, their search quality stays unchanged over time.

$${\color{orange}ORANGE:}$$ Orange marks solutions that accumulate some feedback (clicks, reviews, votes, discussions, etc.) and then trigger a full model retraining. The main problem with these solutions is that full retraining is slow and expensive, so they cannot react quickly to changes (for example, when a product suddenly sees a surge in demand and the search system has not yet adapted to it).

$${\color{#6666ff}INDIGO:}$$ We propose a solution that collects user feedback and rapidly retrains the model on the difference between the old and new versions. This produces a smoother, more relevant search quality curve for your system.

Getting Started

Prerequisites

  • Docker Compose v2.17.0+
  • For fine-tuning: NVIDIA GPU with CUDA support
  • Minimum 8GB RAM allocated to Docker

Documentation

For comprehensive documentation:

Plugin System

Embedding Studio features a powerful plugin architecture allowing extension of:

  • Data loaders for different sources
  • Text and image processors
  • Fine-tuning methods
  • Vector optimization strategies
  • Query processing logic

Create custom plugins by extending base classes and implementing your specific logic.
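
The extend-a-base-class pattern can be sketched generically. The example below is a minimal illustration, assuming a query-processing plugin type and a name-keyed registry; `QueryProcessorPlugin`, `register`, `PLUGIN_REGISTRY`, and `run_pipeline` are all hypothetical names and do not reflect Embedding Studio's real plugin classes, so consult the project documentation for the actual base classes to extend.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class QueryProcessorPlugin(ABC):
    """Hypothetical base class for a query-processing plugin (not Embedding Studio's API)."""

    name: str = "base"

    @abstractmethod
    def process(self, query: str) -> str:
        """Transform a raw user query before it is embedded."""


# Registry mapping a plugin's name to its class.
PLUGIN_REGISTRY: Dict[str, Type[QueryProcessorPlugin]] = {}


def register(cls: Type[QueryProcessorPlugin]) -> Type[QueryProcessorPlugin]:
    """Class decorator that records a plugin under its `name`."""
    PLUGIN_REGISTRY[cls.name] = cls
    return cls


@register
class LowercasePlugin(QueryProcessorPlugin):
    name = "lowercase"

    def process(self, query: str) -> str:
        return query.lower()


@register
class CollapseSpacesPlugin(QueryProcessorPlugin):
    name = "collapse_spaces"

    def process(self, query: str) -> str:
        # Trim the ends and replace any run of whitespace with a single space.
        return " ".join(query.split())


def run_pipeline(query: str, plugin_names: List[str]) -> str:
    """Instantiate and apply the named plugins in order."""
    for plugin_name in plugin_names:
        query = PLUGIN_REGISTRY[plugin_name]().process(query)
    return query


print(run_pipeline("  Red   DRESS ", ["lowercase", "collapse_spaces"]))
```

Keeping plugins behind a small abstract interface means new behaviour is added by writing a subclass, without modifying the code that runs the pipeline.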

Contributing

We welcome contributions to Embedding Studio! To contribute:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

Please check our contributing guidelines for detailed information.

📬 Contact Us

EulerSearch Inc.
3416, 1007 N Orange St. 4th Floor,
Wilmington, DE, New Castle, US, 19801
Contact Email: aleksandr.iudaev@eulersearch.com
Phone: +34 (691) 454 148
LinkedIn: https://www.linkedin.com/in/alexanderyudaev/

License

Embedding Studio is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.