Architecture

How Gatekeeper Routes Your AI Requests

Every request travels through a deterministic, cryptographically verified pipeline. Tokens are validated offline, rate limits are tracked with CRDTs, and every request is logged to JetStream.

Request Flow


1. Client Request: any OpenAI-compatible client
2. Ed25519 Key Validation: offline, no server call (offline validation target)
3. ZK Rate Limit Check: prove the request is within the limit without revealing usage (ZK proof target)
4. CRDT Token Budget: GCounter, distributed, no coordinator, no Redis
5. Model Router: optimized for cost, latency, or capability (~2ms)
6. Provider Pool: target providers, with JetStream health tracking
7. SSE Streaming Response: OpenAI-compatible stream
8. NATS Audit Log: JetStream stream BH_AUDIT, tamper-evident

Total Gatekeeper overhead: ~4ms (steps 2–5) · Provider network latency not included

Token validation is offline

Ed25519 signatures are checked offline (offline validation target): no network call, no OCSP, no PKI server. The token is the credential.

Rate limiting via CRDT GCounter

Distributed nodes agree on budget without a coordinator. No Redis, no central DB, no SPOF.

JetStream stream targets

Audit, load shedding, health tracking, and routing state are validated in the target environment.

3 Routing Strategies

Select per request via the X-Routing-Mode header, or set a default per API key.
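The precedence between the header and the per-key default can be sketched as below. This is only an illustration under assumptions: the mode names `cost`, `latency`, and `capability` are hypothetical values inferred from the router description, `resolve_routing_mode` is a made-up helper, and falling back to the key default on an unknown value is an assumed policy rather than documented behavior.

```python
# Hypothetical mode names; the page does not list the accepted header values.
VALID_MODES = {"cost", "latency", "capability"}


def resolve_routing_mode(headers, key_default="cost"):
    """Pick the routing mode for one request.

    The X-Routing-Mode header wins when it carries a known value;
    otherwise the API key's default applies (assumed policy).
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    mode = lowered.get("x-routing-mode", "").strip().lower()
    return mode if mode in VALID_MODES else key_default
```

For example, `resolve_routing_mode({"X-Routing-Mode": "latency"})` selects the latency mode for that request only, while a request without the header uses the key's default.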

Cost-optimized routing sends each request to the cheapest model that meets the quality threshold. Simple queries go to Llama on Groq at $0.00008/1K tokens; complex queries escalate to GPT-4o or Claude only when the task complexity score requires it.

Cost reduction validated during assisted onboarding
Task complexity scored before routing
Quality threshold configurable per API key
Cost tracked via CRDT GCounter across nodes

query: "What is 2+2?"
→ llama-3.1-8b on Groq ($0.00008/1K)

query: "Write a legal contract"
→ claude-3-5-sonnet ($0.003/1K)

query: "Generate unit tests"
→ deepseek-r1 ($0.0005/1K)
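The selection rule behind these examples can be sketched as "cheapest model whose quality clears the bar". This is a minimal illustration, not Gatekeeper's router: the `Model` record, the `quality` scores, and the `route_by_cost` function are hypothetical, and the complexity scoring step is left out (the required quality is simply passed in). Only the model names and per-1K prices come from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_1k_tokens: float  # prices taken from the examples above
    quality: float            # hypothetical 0..1 capability score


CATALOG = (
    Model("llama-3.1-8b", 0.00008, 0.5),
    Model("deepseek-r1", 0.0005, 0.7),
    Model("claude-3-5-sonnet", 0.003, 0.9),
)


def route_by_cost(required_quality, catalog=CATALOG):
    """Return the cheapest model whose quality meets the requirement."""
    eligible = [m for m in catalog if m.quality >= required_quality]
    if not eligible:
        raise LookupError("no model meets the quality threshold")
    return min(eligible, key=lambda m: m.usd_per_1k_tokens)
```

With these assumed scores, a low requirement such as `route_by_cost(0.3)` picks the Llama model, and raising the requirement moves the choice up the price list only as far as needed.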

JetStream Stream Targets

Stream provisioning, load shedding, audit, and health tracking are validated during assisted onboarding.

Stream: contents
BH_AUDIT: all requests; append-only, tamper-evident
BH_EVENTS: provider failover and routing decisions
BH_HEALTH: provider p99 latency time series
BH_ALERTS: budget exhaustion and SLA breach alerts
BH_SESSIONS: API key session state
BH_CONFIG: routing rules and capability matrix (KV)
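The page does not say how BH_AUDIT achieves tamper evidence. One common technique for append-only logs is a hash chain, where each entry commits to the hash of the previous one. The sketch below shows that technique only as an assumption about how such a property can be obtained; `append`, `verify`, and the record fields are hypothetical, and nothing here uses the NATS API.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, record):
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log, record):
    """Append a record, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash,
                "hash": _digest(prev_hash, record)})


def verify(log):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Editing any stored record after the fact changes its digest, so `verify` returns `False` from that entry onward unless every later hash is also rewritten.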

Failover Validation

Provider error classes, retry order, timing, health tracking, and client response behavior are tested in the target environment before hard failover claims are published.

# Failover chain (configured in gatekeeper.yaml)
failover_chain:
  - provider: openai
    priority: 1
  - provider: anthropic
    priority: 2   # if openai returns 429/5xx
  - provider: groq
    priority: 3   # final fallback, $0.00008/1K
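How a chain like this might be walked at request time is sketched below. It is an illustration under assumptions, not the tested failover behavior: `call_with_failover` and the `send` callback are hypothetical, and the only rule taken from the config is that a 429 or 5xx response moves on to the next provider in priority order.

```python
FAILOVER_CHAIN = [
    {"provider": "openai", "priority": 1},
    {"provider": "anthropic", "priority": 2},
    {"provider": "groq", "priority": 3},
]


def is_retryable(status):
    """429 (rate limited) and 5xx (server error) trigger failover."""
    return status == 429 or 500 <= status <= 599


def call_with_failover(send, chain=FAILOVER_CHAIN):
    """Try providers in priority order until one gives a non-retryable status.

    `send(provider)` is a caller-supplied function returning (status, body).
    Returns (provider, status, body, attempts).
    """
    attempts = []
    for entry in sorted(chain, key=lambda e: e["priority"]):
        status, body = send(entry["provider"])
        attempts.append((entry["provider"], status))
        if not is_retryable(status):
            return entry["provider"], status, body, attempts
    raise RuntimeError(f"all providers failed: {attempts}")
```

Note that a non-retryable error such as 400 is returned to the caller as is, since retrying the same request against another provider would not be expected to fix a client-side problem.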

CRDT Token Budget

Distributed rate limiting targets a GCounter CRDT from internal/crdt/. Usage gossip, convergence, and budget behavior are validated during assisted onboarding.

GCounter.Increment(key, tokens_used)   // each node adds only to its own slot
GCounter.Value(key) <= budget          // eventually consistent: nodes converge after gossip
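The two calls above map onto a standard grow-only counter. The sketch below is a generic GCounter, not the code in internal/crdt/: the class layout, the `merge` method, and the node identifiers are assumptions made for illustration.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per node."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # key -> {node_id: tokens counted by that node}

    def increment(self, key, tokens_used):
        if tokens_used < 0:
            raise ValueError("a GCounter can only grow")
        slots = self.counts.setdefault(key, {})
        slots[self.node_id] = slots.get(self.node_id, 0) + tokens_used

    def value(self, key):
        # Total usage as currently known to this node.
        return sum(self.counts.get(key, {}).values())

    def merge(self, other):
        """Take the per-node maximum; commutative, associative, idempotent."""
        for key, slots in other.counts.items():
            mine = self.counts.setdefault(key, {})
            for node, count in slots.items():
                mine[node] = max(mine.get(node, 0), count)
```

Because each node sees other nodes' usage only after a merge, its local `value` can lag the true total, so a budget check may briefly admit requests past the limit between gossip rounds. Once all nodes have exchanged state, they report the same total without any coordinator.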

Zero Single Point of Failure

Gatekeeper runs as multiple stateless nodes. JetStream provides durable message delivery and audit. BH_HEALTH KV tracks provider state. Any node can handle any request: no sticky sessions, no shared mutable state outside CRDTs.

No central Redis · No sticky sessions · CRDT convergence · JetStream durable
