GitHub - EulerSearch/embedding_studio: Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine.


Embedding Studio is an innovative open-source framework designed to transform embedding models and vector databases into comprehensive, self-improving search engines. With built-in clickstream collection, continuous model refinement, and intelligent vector optimization, it creates a feedback loop that enhances search quality over time based on real user interactions.

Community Support

Embedding Studio grows with our team's enthusiasm. Your star on the repository helps us keep developing.

Features

Core Capabilities

🔄 Full-Cycle Search Engine - Transform your vector database into a complete search solution
🖱️ User Feedback Collection - Automatically gather clickstream and session data
🚀 Continuous Improvement - Enhance search quality on the fly, without long waiting periods
📊 Performance Monitoring - Track search quality metrics through comprehensive dashboards
🎯 Iterative Fine-Tuning - Improve your embedding model through user interaction data
🔍 Blue-Green Deployment - Zero-downtime deployment of improved embedding models
💾 Multi-Source Integration - Connect to various data sources (S3, GCP, PostgreSQL, etc.)
🧠 Vector Optimization - Apply post-training adjustments for incremental improvements

Specialized Features

📈 Personalization Support - Create user-specific vector adjustments based on individual behavior
💬 Suggestion System - Generate intelligent query autocompletions based on user patterns
🔎 Category Prediction - Automatically identify relevant categories for search queries
🔤 Multi-Modal Support - Work with text, images, and structured data in one framework
🧩 Plugin Architecture - Extend functionality through a comprehensive plugin system

In Development (*)

📑 Zero-Shot Query Parser - Mix structured and unstructured search queries
📚 Catalog Pre-Training - Fine-tune embedding models on your specific content before deployment
📊 Advanced Analytics - More detailed insights into search performance and user behavior

(*) - Features in active development

When is Embedding Studio the Best Fit?

More about it here.

📚💼 Rich Content Collections - Businesses with extensive catalogs and unstructured data
🛍️🤝 Customer-Centric Platforms - Applications prioritizing personalized user experiences
🔄📊 Dynamic Content - Platforms with evolving content and changing user preferences
🔍🧠 Complex Queries - Systems handling nuanced and multifaceted search needs
🔄📊 Mixed Data Types - Applications integrating different data formats in search
🔄🚀 Continuous Improvement - Platforms seeking ongoing optimization through user interactions
💵💡 Cost-Conscious Organizations - Teams looking for powerful yet affordable solutions

Challenges Solved

Disclaimer: Embedding Studio is not another Vector Database - it's a framework that transforms your Vector Database into a complete Search Engine with all necessary components.

✅ Cold Start Problems - Jump-start search quality with minimal data
✅ Static Search Quality - Create systems that improve automatically over time
✅ Long Improvement Cycles - Reduce frustration with fast feedback loops
✅ Resource-Heavy Reindexing - Optimize the updating process for better performance
✅ Hybrid Search Complexity - Seamlessly combine structured and unstructured search
✅ Query Understanding - Parse natural language queries more effectively
✅ New Content Discovery - Ensure fresh items get proper visibility

More about challenges and solutions here

System Architecture

Embedding Studio uses a modular, service-based architecture:

Core Components

API Service - Central coordination point for applications
Vector Database - PostgreSQL with pgvector for embedding storage
Clickstream System - Captures and processes user interactions
Worker Services:
- Fine-Tuning Worker - Handles model training and improvement
- Inference Worker - Manages Triton Inference Server for embeddings
- Improvement Worker - Processes incremental vector adjustments
- Upsertion Worker - Manages content updates and indexing

Data Flow

Content Ingestion - Load data from various sources
User Interaction - Collect clickstream data through API endpoints
Fine-Tuning - Use interaction data to improve embedding models
Model Deployment - Update inference service with improved models
Search and Retrieval - Deliver better results based on fine-tuned models
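
The five steps above can be pictured with a toy loop in plain Python. This is only a sketch under loud assumptions: the in-memory `index`, the `search` and `apply_click_feedback` helpers, the sample vectors, and the "pull the clicked item toward the query" update are all illustrative inventions. They are not Embedding Studio's API, and the update rule is a stand-in for fine-tuning, not the framework's actual algorithm.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, top_k=3):
    """Search and retrieval: rank item ids by similarity to the query."""
    ranked = sorted(
        index,
        key=lambda item_id: cosine(query_vec, index[item_id]),
        reverse=True,
    )
    return ranked[:top_k]


def apply_click_feedback(index, query_vec, clicked_id, learning_rate=0.5):
    """Hypothetical improvement step: move the clicked item toward the query."""
    old = index[clicked_id]
    index[clicked_id] = [
        v + learning_rate * (q - v) for v, q in zip(old, query_vec)
    ]


# Content ingestion: made-up item vectors standing in for real embeddings.
index = {
    "red-shoes": [0.9, 0.1, 0.0],
    "blue-shoes": [0.7, 0.3, 0.1],
    "red-hat": [0.2, 0.9, 0.1],
}
query = [0.6, 0.6, 0.0]

before = search(query, index)

# User interaction: the user clicks "red-hat" for this query.
clickstream = [{"query": query, "clicked": "red-hat"}]

# Fine-tuning and deployment, collapsed into one in-place update here.
for event in clickstream:
    apply_click_feedback(index, event["query"], event["clicked"])

after = search(query, index)
print("before:", before)
print("after: ", after)
```

With these sample vectors the clicked item moves from second to first place for the same query, which is the kind of feedback-driven shift the data flow describes.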

Comparison with Traditional Approaches

Our framework lets you continuously fine-tune your model based on user experience, so that search results for user queries are produced faster and more accurately.

$${\color{red}RED:}$$ On the graph, typical search solutions without enhancements, such as Full Text Searching (FTS), Nearest Neighbor Search (NNS), and others, are marked in red. Without the use of additional tools, the search quality remains unchanged over time.

$${\color{orange}ORANGE:}$$ Solutions marked in orange accumulate some feedback (clicks, reviews, votes, discussions, etc.) and then initiate a full model retraining. The primary issue is that full model retraining is time-consuming and expensive, so these solutions cannot make reactive adjustments (for example, when a product suddenly experiences increased demand and the search system has not yet adapted to it).

$${\color{#6666ff}INDIGO:}$$ We propose a solution that allows collecting user feedback and rapidly retraining the model on the difference between the old and new versions. This enables a smoother and more relevant search quality curve for your system.

Getting Started

Prerequisites

Docker Compose v2.17.0+
For fine-tuning: NVIDIA GPU with CUDA support
Minimum 8GB RAM allocated to Docker
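
As a convenience, the Docker Compose requirement above can be checked from Python. This is a minimal sketch, not part of Embedding Studio: the helper names (`parse_version`, `meets_minimum`, `installed_compose_version`) are made up for illustration, and it assumes the Compose v2 CLI plugin is available as `docker compose`.

```python
import re
import subprocess

# Minimum version taken from the prerequisites list above.
MIN_COMPOSE = (2, 17, 0)


def parse_version(text):
    """Extract (major, minor, patch) from strings like 'v2.17.0' or '2.24.5-desktop.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_text, minimum=MIN_COMPOSE):
    """Return True if the reported version is at least the required one."""
    return parse_version(version_text) >= minimum


def installed_compose_version():
    """Ask the Compose v2 CLI plugin for its version string."""
    result = subprocess.run(
        ["docker", "compose", "version", "--short"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `meets_minimum(installed_compose_version())` on a machine with Docker installed reports whether the local Compose is new enough. The GPU and RAM requirements are not covered by this sketch.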

Documentation

For comprehensive documentation:

Plugin System

Embedding Studio features a powerful plugin architecture allowing extension of:

Data loaders for different sources
Text and image processors
Fine-tuning methods
Vector optimization strategies
Query processing logic

Create custom plugins by extending base classes and implementing your specific logic.

Contributing

We welcome contributions to Embedding Studio! To contribute:

Fork the repository
Create a feature branch
Submit a pull request

Please check our contributing guidelines for detailed information.

📬 Contact Us

EulerSearch Inc.
3416, 1007 N Orange St. 4th Floor,
Wilmington, DE, New Castle, US, 19801
Contact Email: aleksandr.iudaev@eulersearch.com
Phone: +34 (691) 454 148
LinkedIn: https://www.linkedin.com/in/alexanderyudaev/

License

Embedding Studio is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

