Gatekeeper
Performance Benchmarks

Routing target to Custom Model

Gatekeeper's routing overhead covers key validation, RBAC checks, budget enforcement, and provider selection, all of which complete within the routing target before your request reaches the provider.

Routing overhead: Routing target (evidence pending)
Providers: Targets (validated during setup)
Support scope: Boundary (defined during onboarding)
Key validation: key-validation target (offline validation)

One-Click Latency Test

Simulate a request through the Gatekeeper routing pipeline
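
As a rough illustration of what such a simulated request could look like, the sketch below runs the four stages named above (key validation, RBAC check, budget enforcement, provider selection) in order and times each one. Every name in it (`Request`, `VALID_KEYS`, `ROLE_MODELS`, `BUDGET_REMAINING`, `MODEL_PROVIDERS`, `route`) is hypothetical and stands in for whatever Gatekeeper does internally; it is not Gatekeeper's API.

```python
import time
from dataclasses import dataclass


@dataclass
class Request:
    api_key: str
    role: str
    model: str
    estimated_cost: float


# Hypothetical in-memory state, for illustration only.
VALID_KEYS = {"demo-key"}
ROLE_MODELS = {"developer": {"model-a", "model-b"}}
BUDGET_REMAINING = {"demo-key": 10.0}
MODEL_PROVIDERS = {"model-a": ["provider-1", "provider-2"], "model-b": ["provider-2"]}


def validate_key(req):
    if req.api_key not in VALID_KEYS:
        raise PermissionError("unknown API key")


def check_rbac(req):
    if req.model not in ROLE_MODELS.get(req.role, set()):
        raise PermissionError("role may not use this model")


def enforce_budget(req):
    if BUDGET_REMAINING.get(req.api_key, 0.0) < req.estimated_cost:
        raise RuntimeError("budget exceeded")


def select_provider(req):
    # Simplest possible policy: first configured provider for the model.
    return MODEL_PROVIDERS[req.model][0]


STAGES = [
    ("key_validation", validate_key),
    ("rbac_check", check_rbac),
    ("budget_enforcement", enforce_budget),
    ("provider_selection", select_provider),
]


def route(req):
    """Run every stage in order; return (provider, per-stage timings in ms)."""
    timings = {}
    result = None
    for name, stage in STAGES:
        start = time.perf_counter()
        result = stage(req)
        timings[name] = (time.perf_counter() - start) * 1000.0
    # The last stage is provider selection, so its result is the provider.
    return result, timings
```

Calling `route(Request("demo-key", "developer", "model-a", 0.5))` returns the chosen provider and a dictionary of stage timings; summing the timings gives the simulated routing overhead for that request.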

Routing Overhead Validation

Latency is measured during assisted onboarding, before any public benchmark claims are made.

Gateway      Note                 Routing overhead
Gatekeeper   target validation    5ms
Direct SDK   baseline reference   0ms
LiteLLM      comparison target    15ms
OpenRouter   comparison target    25ms
Portkey      cloud SaaS           18ms

* Direct SDK = 0ms overhead but requires one HTTP client per provider. Gatekeeper validates target provider routing during assisted onboarding.

Throughput Validation

Concurrent routing capacity is measured per target deployment.

Gateway      Note                Reported figure
Gatekeeper   assisted proof      12000ms
LiteLLM      comparison target   800ms
OpenRouter   comparison target   3000ms

Provider Failover Validation

Failover behavior is tested during assisted onboarding, before any production claims are made.

Failure detection: Health-check behavior (Validate)
Failover time: Switch to next provider (Validate)
Client impact: Response format proof (Validate)
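
The failover check can be sketched as follows: inject a timeout into the primary provider, fall through to the next one, and record how long the switch took. This is a minimal sketch under stated assumptions, not Gatekeeper's implementation; `ProviderTimeout`, `call_with_failover`, `failing_provider`, and `backup_provider` are all hypothetical names, and the providers are plain Python callables rather than real HTTP clients.

```python
import time


class ProviderTimeout(Exception):
    """Raised by a provider callable to signal a (simulated) timeout."""


def call_with_failover(providers, prompt):
    """Try providers in order.

    providers: list of (name, callable) pairs.
    Returns (provider_name, response, failover_ms), where failover_ms is the
    time from the first failure to the successful response (0.0 if the first
    provider succeeded).
    """
    first_failure = None
    for name, call in providers:
        try:
            response = call(prompt)
        except ProviderTimeout:
            if first_failure is None:
                first_failure = time.perf_counter()
            continue
        if first_failure is None:
            failover_ms = 0.0
        else:
            failover_ms = (time.perf_counter() - first_failure) * 1000.0
        return name, response, failover_ms
    raise RuntimeError("all providers failed")


def failing_provider(prompt):
    # Injected fault: always times out.
    raise ProviderTimeout("injected timeout")


def backup_provider(prompt):
    # Returns the same response shape the client would expect from any provider.
    return {"provider": "backup", "text": f"echo: {prompt}"}
```

With `[("primary", failing_provider), ("backup", backup_provider)]` as the provider list, the call returns the backup's response together with the measured switch time, which covers both the failover-time and client-impact checks in one run.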

Provider Latency Heatmap

Median (p50) and 99th percentile (p99) response times by provider (ms, excluding routing overhead)

Provider    p50 (ms)   p99 (ms)
Groq        120        280
OpenAI      280        620
Anthropic   310        680
Google      240        540
Together    190        420
Mistral     260        580
Cohere      220        490
Fireworks   160        350
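
For readers who want to reproduce p50 and p99 figures from their own latency samples, one common approach is the nearest-rank percentile. The sketch below assumes that method; the page does not say which percentile definition was used, and the `percentile` helper is a hypothetical name.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples.

    pct must be an integer between 1 and 100.
    """
    if not samples:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(samples):
    """Return the two statistics reported per provider."""
    return {"p50": percentile(samples, 50), "p99": percentile(samples, 99)}
```

Given a list of per-request response times in milliseconds for one provider, `summarize` returns the median and the 99th percentile in the same units.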

Methodology

  • Routing overhead measured from TCP accept to first byte forwarded to provider (excludes provider latency)
  • Tests run on a single Gatekeeper node (2 vCPU, 4GB RAM) on Contabo VPS
  • Provider latencies measured from the same host, averaging 1000 requests per provider
  • Throughput measured with k6, 200 concurrent virtual users, 60-second sustained load
  • Failover measured by injecting a provider timeout and recording time to first byte from backup provider
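
The sustained-load setup described above (a fixed number of concurrent virtual users looping for a fixed duration) can be approximated in a few lines. This is not the k6 script used for the published tests; it is a stand-alone Python sketch in which `stub_gateway_call` is a hypothetical placeholder for a real HTTP request to the gateway.

```python
import asyncio
import time


async def stub_gateway_call():
    # Placeholder for one request to the gateway; replace with a real client call.
    await asyncio.sleep(0.001)


async def virtual_user(deadline, counter):
    # Each virtual user sends requests back to back until the deadline passes.
    while time.perf_counter() < deadline:
        await stub_gateway_call()
        counter[0] += 1


async def run_load(virtual_users, duration_s):
    """Return (total_requests, requests_per_second) for one load run."""
    counter = [0]
    start = time.perf_counter()
    deadline = start + duration_s
    await asyncio.gather(
        *(virtual_user(deadline, counter) for _ in range(virtual_users))
    )
    elapsed = time.perf_counter() - start
    return counter[0], counter[0] / elapsed
```

Running `asyncio.run(run_load(virtual_users=200, duration_s=60))` mirrors the 200-user, 60-second shape of the methodology; the resulting number only reflects the stub until a real request is substituted.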