Features / Routing

Intelligent Model Routing

Three routing strategies. A declarative DSL. Automatic fallback chains. Route by cost, latency, capability, or any combination.

Routing Strategies

Automatically selects the cheapest model that meets your quality threshold. Configure quality tiers and let Gatekeeper minimize spend.

routing:

strategy: cost-optimized

quality_threshold: 0.85

tiers:

- models: [llama-3.1-8b, mistral-7b]

max_cost_per_1k: 0.0001

- models: [claude-3-haiku, gpt-3.5-turbo]

max_cost_per_1k: 0.001

- models: [gpt-4o, claude-3-5-sonnet]

max_cost_per_1k: 0.01 # fallback for complex

Fallback Chain

Define primary, secondary, and tertiary providers. Failure classes, retry order, and timing are validated during assisted onboarding before hard claims are published.

# Three-tier fallback chain

failover:

primary: openai/gpt-4o

secondary: anthropic/claude-3-5-sonnet # if openai 429/5xx

tertiary: groq/llama-3.1-70b # last resort

retry_on: [429, 500, 502, 503, 504]

timeout_ms: 8000

max_retries: 2

Model Capability Matching

Route based on request content. If the message contains an image, go to a vision model. If context is huge, go to Gemini 1.5 Pro.

# Route by content type

rules:

- if: contains(request.messages, "image")

route_to: vision_capable_models

- if: request.max_tokens > 4096

route_to: long_output_models

- if: request.cost_ceiling < 0.005

route_to: budget_models

Cost Ceiling

Set a per-request cost ceiling. Downgrade behavior and eligible fallback models are validated during assisted onboarding.

# Never spend more than $0.01 per request

per_request:

max_cost_usd: 0.01

on_exceed: use_cheaper_model # or: reject | truncate

# Monthly cap per key

monthly:

max_cost_usd: 500

on_exceed: block_with_429

Read the Docs Architecture Overview →