Gatekeeper
Features / Routing

Intelligent Model Routing

Three routing strategies. A declarative DSL. Automatic fallback chains. Route by cost, latency, capability, or any combination.

Routing Strategies

Automatically selects the cheapest model that meets your quality threshold. Configure quality tiers and let Gatekeeper minimize spend.

routing:
strategy: cost-optimized
quality_threshold: 0.85
tiers:
- models: [llama-3.1-8b, mistral-7b]
max_cost_per_1k: 0.0001
- models: [claude-3-haiku, gpt-3.5-turbo]
max_cost_per_1k: 0.001
- models: [gpt-4o, claude-3-5-sonnet]
max_cost_per_1k: 0.01 # fallback for complex

Fallback Chain

Define primary, secondary, and tertiary providers. Failure classes, retry order, and timing are validated during assisted onboarding before hard claims are published.

# Three-tier fallback chain
failover:
primary: openai/gpt-4o
secondary: anthropic/claude-3-5-sonnet # if openai 429/5xx
tertiary: groq/llama-3.1-70b # last resort
retry_on: [429, 500, 502, 503, 504]
timeout_ms: 8000
max_retries: 2

Model Capability Matching

Route based on request content. If the message contains an image, go to a vision model. If context is huge, go to Gemini 1.5 Pro.

# Route by content type
rules:
- if: contains(request.messages, "image")
route_to: vision_capable_models
- if: request.max_tokens > 4096
route_to: long_output_models
- if: request.cost_ceiling < 0.005
route_to: budget_models

Cost Ceiling

Set a per-request cost ceiling. Downgrade behavior and eligible fallback models are validated during assisted onboarding.

# Never spend more than $0.01 per request
per_request:
max_cost_usd: 0.01
on_exceed: use_cheaper_model # or: reject | truncate
# Monthly cap per key
monthly:
max_cost_usd: 500
on_exceed: block_with_429