Features / Routing
Intelligent Model Routing
Three routing strategies. A declarative DSL. Automatic fallback chains. Route by cost, latency, capability, or any combination.
Routing Strategies
Automatically selects the cheapest model that meets your quality threshold. Configure quality tiers and let Gatekeeper minimize spend.
routing:
strategy: cost-optimized
quality_threshold: 0.85
tiers:
- models: [llama-3.1-8b, mistral-7b]
max_cost_per_1k: 0.0001
- models: [claude-3-haiku, gpt-3.5-turbo]
max_cost_per_1k: 0.001
- models: [gpt-4o, claude-3-5-sonnet]
max_cost_per_1k: 0.01 # fallback for complex
Fallback Chain
Define primary, secondary, and tertiary providers. Failure classes, retry order, and timing are validated during assisted onboarding before hard claims are published.
# Three-tier fallback chain
failover:
primary: openai/gpt-4o
secondary: anthropic/claude-3-5-sonnet # if openai 429/5xx
tertiary: groq/llama-3.1-70b # last resort
retry_on: [429, 500, 502, 503, 504]
timeout_ms: 8000
max_retries: 2
Model Capability Matching
Route based on request content. If the message contains an image, go to a vision model. If context is huge, go to Gemini 1.5 Pro.
# Route by content type
rules:
- if: contains(request.messages, "image")
route_to: vision_capable_models
- if: request.max_tokens > 4096
route_to: long_output_models
- if: request.cost_ceiling < 0.005
route_to: budget_models
Cost Ceiling
Set a per-request cost ceiling. Downgrade behavior and eligible fallback models are validated during assisted onboarding.
# Never spend more than $0.01 per request
per_request:
max_cost_usd: 0.01
on_exceed: use_cheaper_model # or: reject | truncate
# Monthly cap per key
monthly:
max_cost_usd: 500
on_exceed: block_with_429