Gatekeeper Architecture

How Gatekeeper Routes Your AI Requests

Every request travels through a deterministic, cryptographically verified pipeline. Token validation is offline. Rate limiting uses CRDTs. Every request is logged to JetStream.

Request Flow

1. Client Request: custom OpenAI-compatible client
2. Ed25519 Key Validation: offline, no server call (offline validation target)
3. ZK Rate Limit Check: prove within limit without revealing usage (ZK proof target)
4. CRDT Token Budget: GCounter, distributed, no coordinator, no Redis
5. Model Router: cost / latency / capability optimized (~2ms)
6. Provider Pool: target providers, JetStream health tracking
7. SSE Streaming Response: OpenAI-compatible stream
8. NATS Audit Log: JetStream BH_AUDIT, tamper-evident

Total Gatekeeper overhead: ~4ms (steps 2–5) · Provider network latency not included

Token validation is offline

Ed25519 token validation runs offline (a validation target): no network call, no OCSP, no PKI server. The token itself is the credential.

Rate limiting via CRDT GCounter

Distributed nodes converge on budget usage without a coordinator. No Redis, no central database, no single point of failure.

JetStream stream targets

Audit, load shedding, health tracking, and routing state are validated in the target environment.

3 Routing Strategies

Select per request via the X-Routing-Mode header, or set a default per API key.
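The precedence described above can be sketched as follows, assuming the header overrides the per-key default. The mode names (`cost`, `latency`, `capability`) are guesses based on the Model Router step, and `resolve_routing_mode` is a hypothetical helper, not a documented Gatekeeper function.

```python
# Hypothetical mode names; the page only states that there are three strategies.
KNOWN_MODES = {"cost", "latency", "capability"}


def resolve_routing_mode(headers: dict[str, str], key_default: str) -> str:
    """Return the per-request mode if a valid header is present, else the key default."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    requested = normalized.get("x-routing-mode", "").strip().lower()
    if requested in KNOWN_MODES:
        return requested
    # Assumption: a missing or unrecognized value falls back to the API key's default.
    return key_default
```

Falling back silently on an unknown value is one possible policy; rejecting the request with a 4xx would be an equally reasonable choice.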

Cost-optimized routing sends each request to the cheapest model that meets the quality threshold. Simple queries go to Llama on Groq at $0.00008/1K tokens. Complex queries escalate to GPT-4o or Claude only when the task complexity score requires it.

  • Cost reduction validated during assisted onboarding
  • Task complexity scored before routing
  • Quality threshold configurable per API key
  • Cost tracked via CRDT GCounter across nodes
query: "What is 2+2?"
→ llama-3.1-8b on Groq ($0.00008/1K)
query: "Write a legal contract"
→ claude-3-5-sonnet ($0.003/1K)
query: "Generate unit tests"
→ deepseek-r1 ($0.0005/1K)
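
The selection rule in these examples can be sketched roughly as below. The prices are the ones quoted on this page; the quality scores, the 0 to 1 complexity scale, and the `route_by_cost` helper are all invented for illustration and do not describe Gatekeeper's actual scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_1k_tokens: float  # prices quoted on this page
    quality: float            # hypothetical 0..1 score, not from the source


# Quality scores are made up for this sketch.
CATALOG = [
    Model("llama-3.1-8b", 0.00008, 0.3),
    Model("deepseek-r1", 0.0005, 0.6),
    Model("claude-3-5-sonnet", 0.003, 0.9),
]


def route_by_cost(complexity: float, catalog: list[Model] = CATALOG) -> Model:
    """Pick the cheapest model whose quality meets the task's complexity score."""
    eligible = [m for m in catalog if m.quality >= complexity]
    if not eligible:
        # Assumption: if nothing meets the threshold, use the highest-quality model.
        return max(catalog, key=lambda m: m.quality)
    return min(eligible, key=lambda m: m.usd_per_1k_tokens)
```

With these made-up scores, a complexity of 0.1 selects llama-3.1-8b and 0.8 selects claude-3-5-sonnet, which mirrors the escalation pattern in the examples.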

JetStream Stream Targets

Stream provisioning, load shedding, audit, and health tracking are validated during assisted onboarding.

  • BH_AUDIT: all requests, append-only, tamper-evident
  • BH_EVENTS: provider failover and routing decisions
  • BH_HEALTH: provider p99 latency time series
  • BH_ALERTS: budget exhaustion and SLA breach alerts
  • BH_SESSIONS: API key session state
  • BH_CONFIG: routing rules and capability matrix (KV)
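
The page does not say how BH_AUDIT achieves tamper evidence. One common technique is a hash chain, where each record's hash covers the previous record's hash. The sketch below illustrates that idea only; the record layout and the `append_record` and `verify_chain` helpers are assumptions, not Gatekeeper's format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def append_record(log: list[dict], payload: dict) -> dict:
    """Append a record whose hash covers the payload and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
    record = {"prev": prev_hash, "payload": payload, "hash": digest}
    log.append(record)
    return record


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; an edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        body = json.dumps(record["payload"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
        if record["prev"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True
```

A plain chain like this detects edits and reordering but not truncation from the end; catching that would need the latest hash to be anchored somewhere outside the log.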

Failover Validation

Provider error classes, retry order, timing, health tracking, and client response behavior are tested in the target environment before hard failover claims are published.

# Failover chain (configured in gatekeeper.yaml)
failover_chain:
  - provider: openai
    priority: 1
  - provider: anthropic
    priority: 2   # if openai returns 429/5xx
  - provider: groq
    priority: 3   # final fallback, $0.00008/1K
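
A chain like this could be walked as in the sketch below. It assumes failover happens only on 429 and 5xx responses, as the config comment suggests. `ProviderError`, `call_with_failover`, and the `send` callback are hypothetical names, and retry timing and health tracking are left out.

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error carrying the provider's HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"provider returned {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Mirrors the comment in the config: fail over on 429 or any 5xx.
    return status == 429 or 500 <= status <= 599


def call_with_failover(chain: list[dict], send: Callable[[str], str]) -> str:
    """Try providers in ascending priority; move on only for retryable errors."""
    last_error = None
    for entry in sorted(chain, key=lambda e: e["priority"]):
        try:
            return send(entry["provider"])
        except ProviderError as err:
            if not is_retryable(err.status):
                raise  # e.g. a 400 is the caller's problem; do not fail over
            last_error = err
    raise RuntimeError("all providers in the failover chain failed") from last_error
```

Sorting by `priority` rather than trusting list order keeps the behavior stable if entries in the file are rearranged.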

CRDT Token Budget

Distributed rate limiting targets a GCounter CRDT from internal/crdt/. Usage gossip, convergence, and budget behavior are validated during assisted onboarding.

GCounter.Increment(key, tokens_used)
GCounter.Value(key) <= budget // eventually consistent: each node's view converges after gossip
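
For readers unfamiliar with the data structure, here is a minimal sketch of a textbook grow-only counter. It is not the code in internal/crdt/; the class layout and method names are illustrative, and it models one counter per budget key rather than the keyed API shown above.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per node."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int) -> None:
        if amount < 0:
            raise ValueError("a GCounter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        # Total usage as seen by this replica (a lower bound on the global total).
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of gossip order or duplication.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

Because each replica sees only a lower bound until gossip arrives, a budget check against the local value can briefly admit more usage than the budget allows. How much overshoot is acceptable is a deployment decision.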

Zero Single Point of Failure

Gatekeeper runs as multiple stateless nodes. JetStream provides durable message delivery and audit. BH_HEALTH KV tracks provider state. Any node can handle any request: no sticky sessions, no shared mutable state outside CRDTs.

No central Redis · No sticky sessions · CRDT consensus · JetStream durable