ConduitLLM's router provides intelligent distribution of requests across different LLM providers and models. This functionality enables failover capabilities, load balancing, and cost optimization by directing requests to the most appropriate model deployment based on availability, cost, and performance.
Note: The router supports both text and multimodal (vision) models. Routing, fallback, and cost optimization features apply to all supported model types.
Supported Providers: ConduitLLM supports routing across providers including OpenAI, Anthropic, Cohere, Gemini, Fireworks, and OpenRouter.
The routing system consists of three main components:
The `DefaultLLMRouter` is the core implementation that handles the actual routing logic:
- Multiple Routing Strategies: Supports different ways to distribute requests:
  - Simple: Uses the first available deployment for the requested model
  - Random: Randomly selects from available deployments
  - Round-Robin: Cycles through available deployments sequentially
- Health Tracking: Monitors the health status of model deployments:
  - Tracks successful and failed requests
  - Calculates success rates and error frequencies
  - Automatically marks unreliable deployments as unhealthy
- Fallback Capabilities: When a model fails, the router can:
  - Try alternative deployments of the same model
  - Fall back to alternative models based on configuration
  - Return appropriate errors when no options remain
- Retry Logic: Implements sophisticated retry handling:
  - Exponential backoff for temporary failures
  - Configurable maximum retry attempts
  - Differentiation between retryable and non-retryable errors
- Streaming Support: Properly manages streaming completions:
  - Handles async iterators for streaming responses
  - Maintains consistent streaming behavior across providers
  - Properly propagates streaming errors
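The three routing strategies can be sketched as follows. This is an illustrative Python sketch, not ConduitLLM's actual implementation (which lives in `DefaultLLMRouter`); the `StrategyRouter` class and its method names are hypothetical.

```python
import itertools
import random


class StrategyRouter:
    """Illustrative deployment selection for the three routing strategies."""

    def __init__(self, deployments):
        self.deployments = deployments                 # list of deployment dicts
        self._round_robin = itertools.cycle(deployments)

    def select(self, strategy):
        active = [d for d in self.deployments if d["isActive"]]
        if not active:
            raise RuntimeError("no active deployments")
        if strategy == "simple":
            return active[0]                           # first available deployment
        if strategy == "random":
            return random.choice(active)               # uniform random choice
        if strategy == "round-robin":
            # Cycle sequentially, skipping inactive entries; safe because
            # at least one active deployment exists (checked above).
            while True:
                d = next(self._round_robin)
                if d["isActive"]:
                    return d
        raise ValueError(f"unknown strategy: {strategy}")
```

Note that round-robin is the only strategy that carries state (the cycle position), which is why it needs care under concurrency, as discussed later.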
The `RouterConfig` provides the configuration model for the router:
- Strategy Selection: String-based selection of routing strategies
- Model Deployment Specifications: Defines which models are available
- Fallback Configuration: Specifies fallback paths between models
Example configuration:
```json
{
  "strategy": "round-robin",
  "deployments": [
    {
      "model": "gpt-4-equivalent",
      "provider": "openai-provider-id",
      "weight": 1.0,
      "isActive": true
    },
    {
      "model": "gpt-4-equivalent",
      "provider": "anthropic-provider-id",
      "weight": 0.5,
      "isActive": true
    }
  ],
  "fallbacks": [
    {
      "primaryModel": "gpt-4-equivalent",
      "fallbackModels": ["gpt-3.5-equivalent", "command-equivalent"]
    }
  ]
}
```
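For readers who prefer types to raw JSON, the configuration shape above maps onto a small set of records. This is a hypothetical Python mirror of `RouterConfig` for illustration, not the project's actual (C#) model classes.

```python
from dataclasses import dataclass, field


@dataclass
class ModelDeployment:
    model: str            # generic model alias, e.g. "gpt-4-equivalent"
    provider: str         # provider credential/connection id
    weight: float = 1.0   # relative share of traffic
    is_active: bool = True


@dataclass
class FallbackRule:
    primary_model: str
    fallback_models: list  # tried in order when the primary fails


@dataclass
class RouterConfig:
    strategy: str = "simple"
    deployments: list = field(default_factory=list)
    fallbacks: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw):
        """Build a RouterConfig from the JSON shape shown above."""
        return cls(
            strategy=raw.get("strategy", "simple"),
            deployments=[
                ModelDeployment(d["model"], d["provider"],
                                d.get("weight", 1.0), d.get("isActive", True))
                for d in raw.get("deployments", [])
            ],
            fallbacks=[
                FallbackRule(f["primaryModel"], f["fallbackModels"])
                for f in raw.get("fallbacks", [])
            ],
        )
```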
The `RouterService` manages the router configuration in the database:
- Configuration Management: CRUD operations for router settings
- Model Deployment Management: Add, update, and remove model deployments
- Fallback Configuration: Define fallback paths between models
- Router Initialization: Create and configure router instances
The Simple strategy is the most basic routing approach: it uses the first available deployment for the requested model.
- Advantages: Predictable behavior, minimal overhead
- Disadvantages: No load balancing, single point of failure
- Use Case: Development environments, simple deployments
The Random strategy randomly selects from the available deployments.
- Advantages: Basic load balancing, no state to maintain
- Disadvantages: Potential for uneven distribution
- Use Case: Multiple deployments of similar capability/cost
The Round-Robin strategy cycles through the available deployments in sequence.
- Advantages: Fair load distribution, predictable pattern
- Disadvantages: Requires state maintenance
- Use Case: Production environments with multiple similar deployments
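The `weight` field shown in the configuration examples can bias selection toward cheaper or better-performing deployments. A minimal sketch of weight-aware random selection, assuming weights are relative probabilities (the exact semantics of `weight` in ConduitLLM may differ):

```python
import random


def weighted_select(deployments, rng=random):
    """Pick an active deployment with probability proportional to its weight."""
    active = [d for d in deployments if d.get("isActive", True)]
    if not active:
        raise RuntimeError("no active deployments")
    weights = [d.get("weight", 1.0) for d in active]
    # random.choices draws k samples weighted by the given relative weights
    return rng.choices(active, weights=weights, k=1)[0]
```

Under this interpretation, a deployment with `weight: 0.5` receives roughly half the traffic of one with `weight: 1.0`.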
The fallback system allows the router to try alternative options when a request fails:
When a specific deployment fails, the router tries another deployment of the same model:
- Model X on Provider A fails
- Router tries Model X on Provider B
- If successful, the request is fulfilled
When all deployments of a model fail, the router tries an alternative model:
- All deployments of Model X fail
- Router checks fallback configuration
- Router tries Model Y (fallback model)
- If successful, the request is fulfilled
Example fallback configuration:

```json
{
  "fallbacks": [
    {
      "primaryModel": "gpt-4-equivalent",
      "fallbackModels": ["gpt-3.5-equivalent", "claude-equivalent"]
    },
    {
      "primaryModel": "gpt-3.5-equivalent",
      "fallbackModels": ["command-equivalent"]
    }
  ]
}
```
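Model-level fallback resolution amounts to an ordered walk over candidates: the primary model first, then its configured fallbacks. An illustrative Python sketch, with hypothetical function names:

```python
def candidate_models(model, fallbacks):
    """Return the primary model followed by its configured fallbacks, in order."""
    chain = [model]
    for rule in fallbacks:
        if rule["primaryModel"] == model:
            chain.extend(rule["fallbackModels"])
            break
    return chain


def try_with_fallbacks(model, fallbacks, send):
    """Call `send(candidate)` for each candidate model until one succeeds."""
    last_error = None
    for candidate in candidate_models(model, fallbacks):
        try:
            return send(candidate)
        except RuntimeError as exc:   # stand-in for a provider/model failure
            last_error = exc
    raise RuntimeError(f"all candidates failed for {model}") from last_error
```

In the sketch, `send` stands in for dispatching the request to the selected deployment; the real router would first exhaust alternative deployments of each model before moving down the chain.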
The router tracks the health of model deployments to ensure reliability:
- Success Rate: Percentage of successful requests
- Error Frequency: Rate of errors over time
- Response Time: Average and percentile response times
Based on metrics, deployments are marked as:
- Healthy: Available for routing
- Degraded: Available but with caution
- Unhealthy: Temporarily excluded from routing
Unhealthy deployments are periodically tested to check if they've recovered:
- Circuit Breaker Pattern: Allows occasional test requests
- Automatic Recovery: Restores deployment to rotation when healthy
- Manual Override: Admin can force deployment status
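The health-tracking and recovery behavior described above follows the classic circuit-breaker pattern. A minimal sketch; the threshold and cooldown values are illustrative, not ConduitLLM's actual defaults:

```python
import time


class CircuitBreaker:
    """Tracks failures for one deployment and gates whether it may be routed to."""

    def __init__(self, failure_threshold=5, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # set when the breaker trips (deployment unhealthy)

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # recovered: restore deployment to rotation

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # mark deployment unhealthy

    def allow_request(self):
        if self.opened_at is None:
            return True  # healthy: route normally
        # Unhealthy: allow an occasional probe once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown
```

A successful probe after the cooldown calls `record_success`, restoring the deployment; a failed probe trips the breaker again.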
- Client sends request for a generic model
- Router selects a deployment based on strategy
- Router sends request to the selected provider
- If successful, response is returned to client
- If failed, router tries fallback options
- If all options fail, error is returned to client
The router handles various error types:
- Transient Errors: Automatically retried with backoff
- Provider Errors: Trigger fallback to alternative providers
- Model Errors: Trigger fallback to alternative models
- Catastrophic Errors: Returned to client with helpful context
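Retrying transient errors with exponential backoff can be sketched as follows. The delay values and the error classification are illustrative, and a real streaming router would use an async delay rather than `time.sleep`:

```python
import time


class TransientError(Exception):
    """Stand-in for an error class the router treats as retryable."""


def retry_with_backoff(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry `operation` on transient errors, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the error to trigger fallback
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    # Non-retryable exceptions propagate immediately to the fallback logic.
```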
- Caching: Deployment health status is cached
- Concurrency: Router handles concurrent requests safely
- Overhead: Minimal latency added by routing logic
A minimal single-provider configuration using the simple strategy:

```json
{
  "strategy": "simple",
  "deployments": [
    {
      "model": "chat-model",
      "provider": "openai-provider-id",
      "weight": 1.0,
      "isActive": true
    }
  ]
}
```
A load-balanced configuration distributing traffic across three providers with round-robin:

```json
{
  "strategy": "round-robin",
  "deployments": [
    {
      "model": "chat-model",
      "provider": "openai-provider-id",
      "weight": 1.0,
      "isActive": true
    },
    {
      "model": "chat-model",
      "provider": "anthropic-provider-id",
      "weight": 1.0,
      "isActive": true
    },
    {
      "model": "chat-model",
      "provider": "cohere-provider-id",
      "weight": 0.5,
      "isActive": true
    }
  ]
}
```
A configuration that pairs a single deployment with a model-level fallback:

```json
{
  "strategy": "simple",
  "deployments": [
    {
      "model": "gpt-4-equivalent",
      "provider": "openai-provider-id",
      "weight": 1.0,
      "isActive": true
    }
  ],
  "fallbacks": [
    {
      "primaryModel": "gpt-4-equivalent",
      "fallbackModels": ["gpt-3.5-equivalent"]
    }
  ]
}
```
```
GET    /api/router/config
PUT    /api/router/strategy
POST   /api/router/deployments
PUT    /api/router/deployments/{id}
DELETE /api/router/deployments/{id}
POST   /api/router/fallbacks
```
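As an illustration, updating the routing strategy through the management API might look like the following. The base URL and request payload shape are assumptions; consult the API Reference for the authoritative schema.

```python
import json
import urllib.request


def build_strategy_request(strategy, api_key,
                           base_url="http://localhost:5000"):
    """Build a PUT /api/router/strategy request (payload shape assumed)."""
    return urllib.request.Request(
        f"{base_url}/api/router/strategy",
        data=json.dumps({"strategy": strategy}).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="PUT",
    )
```

The built request would then be sent with `urllib.request.urlopen(req)` (or any HTTP client of your choice).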
See the API Reference for detailed endpoint documentation.
The Router can be configured through the WebUI:
- Navigate to the Configuration page
- Select the Router tab
- Configure the routing strategy
- Add and manage model deployments
- Configure fallback paths
- Multiple Providers: Configure multiple providers for critical models
- Fallback Chains: Create thoughtful fallback paths from expensive to cheaper models
- Weights: Use weights to control traffic distribution based on cost and performance
- Health Monitoring: Regularly review deployment health in the WebUI
- Testing: Test fallback behavior before relying on it in production
- Cost Optimization: The router can optimize for cost by considering model pricing, including vision/multimodal models, when distributing requests and configuring fallbacks.