One API Endpoint for 290+ Models: How It Works
Every AI provider has a slightly different API. OpenAI has /v1/chat/completions. Anthropic has /v1/messages. Google has Vertex AI. AWS Bedrock has its own format. Gatekeeper normalizes all of them to a single endpoint so your application code never needs to change.
The API Fragmentation Problem
When you use multiple AI providers directly, you end up with different SDK imports, different request/response schemas, different streaming protocols, and different error codes in each service. Switching providers requires code changes everywhere the AI call happens.
Without a gateway
# OpenAI call
from openai import OpenAI
client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(...)

# Anthropic call (different SDK!)
import anthropic
client = anthropic.Anthropic(api_key="sk-ant-...")
response = client.messages.create(...)

# Google (yet another SDK)
import vertexai
# ... 20 more lines of setup
With Gatekeeper
# One SDK, any model
from openai import OpenAI
client = OpenAI(
api_key="sk-gk-myapp",
base_url="http://gatekeeper/v1"
)
# GPT-4o
client.chat.completions.create(
model="gpt-4o", ...)
# Claude (same code!)
client.chat.completions.create(
model="claude-3-5-sonnet", ...)

How Translation Works
When a request arrives at Gatekeeper, the routing engine identifies the target provider from the model name. A provider-specific adapter then translates the normalized request into the format the provider expects.
Incoming: POST /v1/chat/completions
{
"model": "claude-3-5-sonnet-20241022",
"messages": [{"role": "user", "content": "Hello"}]
}
│
▼ Routing Engine
│ model → anthropic adapter
│
▼ Translation
Outgoing: POST https://api.anthropic.com/v1/messages
{
"model": "claude-3-5-sonnet-20241022",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 4096
} + X-Api-Key header

The adapter also translates the response back to OpenAI format before returning it to the caller. Your application sees a consistent response shape regardless of provider.
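A rough sketch of what such a translation layer might look like. The function names `translate_request` and `translate_response` are illustrative, not Gatekeeper's actual internals; the field mappings follow the two providers' documented schemas:

```python
def translate_request(openai_req: dict) -> dict:
    """Map an OpenAI-style chat request to Anthropic's /v1/messages shape."""
    system_parts = [m["content"] for m in openai_req["messages"]
                    if m["role"] == "system"]
    anthropic_req = {
        "model": openai_req["model"],
        # Anthropic takes system prompts as a top-level field, not a message
        "messages": [m for m in openai_req["messages"]
                     if m["role"] != "system"],
        # Anthropic requires max_tokens; fall back to a default if unset
        "max_tokens": openai_req.get("max_tokens", 4096),
    }
    if system_parts:
        anthropic_req["system"] = "\n".join(system_parts)
    return anthropic_req

def translate_response(anthropic_resp: dict) -> dict:
    """Map an Anthropic messages response back to OpenAI chat format."""
    text = "".join(block["text"] for block in anthropic_resp["content"]
                   if block["type"] == "text")
    stop = anthropic_resp["stop_reason"]
    return {
        "object": "chat.completion",
        "model": anthropic_resp["model"],
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop" if stop == "end_turn" else stop,
        }],
        "usage": {
            "prompt_tokens": anthropic_resp["usage"]["input_tokens"],
            "completion_tokens": anthropic_resp["usage"]["output_tokens"],
        },
    }
```

A real adapter also has to handle tool calls, images, and error mapping, but the core job is exactly this kind of field-by-field reshaping in both directions.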
Model Aliases: True Portability
The most powerful use of Gatekeeper's unified endpoint is model aliases. You can define model: "my-chat-model" in your application and point it at any underlying model — without changing application code.
{
"my-chat-model": "gpt-4o",
"my-fast-model": "gpt-4o-mini",
"my-long-context": "claude-3-5-sonnet-20241022"
}

When you want to switch my-chat-model from GPT-4o to Claude Sonnet, you change one line in the dashboard. All applications using that alias immediately route to the new model — no deployments, no PRs.
This also enables A/B testing: point an alias at two models with traffic splitting to compare quality and cost before committing to a migration.
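As a sketch of how alias resolution with traffic splitting could work. The `ALIASES` table and `resolve` helper are hypothetical, not Gatekeeper's actual implementation:

```python
import random

# Hypothetical alias table: each alias maps to (model, traffic_weight) pairs.
ALIASES = {
    "my-chat-model": [("gpt-4o", 0.9), ("claude-3-5-sonnet-20241022", 0.1)],
    "my-fast-model": [("gpt-4o-mini", 1.0)],
}

def resolve(model: str, rng: random.Random = random) -> str:
    """Resolve an alias to a concrete model, honoring traffic weights."""
    targets = ALIASES.get(model)
    if targets is None:
        return model  # not an alias; pass the model name through unchanged
    r = rng.random()
    cumulative = 0.0
    for name, weight in targets:
        cumulative += weight
        if r < cumulative:
            return name
    return targets[-1][0]  # guard against floating-point rounding
```

Because resolution happens at the gateway, shifting 10% of traffic to a candidate model is a config edit, not a code change in any calling application.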
Streaming Works Everywhere
Streaming is normalized too. Whether the underlying provider uses SSE, chunked JSON, or a proprietary format, Gatekeeper returns standard OpenAI-format SSE to the caller.
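Under the hood, that normalization amounts to re-emitting each provider event as an OpenAI-format chunk. A minimal sketch for Anthropic's event stream follows; the `normalize_event` helper is illustrative, though the event shapes on both sides follow the providers' documented streaming formats:

```python
import json

def normalize_event(event_type: str, data: dict, model: str):
    """Return an OpenAI-format SSE line for one Anthropic stream event,
    or None for events with no client-visible equivalent."""
    if event_type == "content_block_delta" and data["delta"]["type"] == "text_delta":
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0,
                         "delta": {"content": data["delta"]["text"]},
                         "finish_reason": None}],
        }
        return f"data: {json.dumps(chunk)}\n\n"
    if event_type == "message_stop":
        return "data: [DONE]\n\n"
    return None  # ping, message_start, etc. produce no output chunk
```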
# Streaming works the same for all providers
stream = client.chat.completions.create(
model="claude-3-5-sonnet-20241022", # Anthropic
messages=[{"role": "user", "content": "Write a haiku"}],
stream=True # Same as OpenAI streaming
)
for chunk in stream:
print(chunk.choices[0].delta.content or "", end="")

Why This Matters for Your Architecture
Switch providers in the dashboard, not in code. OpenAI raises prices? One config change.
Route different feature types to the optimal model without maintaining multiple SDK integrations.
Junior devs only need to know the OpenAI SDK. Provider-specific quirks are Gatekeeper's problem.
One place to see spend across OpenAI, Anthropic, and Google — not three separate dashboards.