One API Endpoint for 290+ Models: How It Works
Every AI provider has a slightly different API. OpenAI has /v1/chat/completions. Anthropic has /v1/messages. Google has Vertex AI. AWS Bedrock has its own format. Gatekeeper normalizes all of them to a single endpoint so your application code never needs to change.
The API Fragmentation Problem
When you use multiple AI providers directly, you end up with different SDK imports, different request/response schemas, different streaming protocols, and different error codes in each service. Switching providers requires code changes everywhere the AI call happens.
Without a gateway
# OpenAI call
from openai import OpenAI
client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(...)

# Anthropic call (different SDK!)
import anthropic
client = anthropic.Anthropic(api_key="sk-ant-...")
response = client.messages.create(...)

# Google (yet another SDK)
import vertexai
# ... 20 more lines of setup
With Gatekeeper
# One SDK, any model
from openai import OpenAI
client = OpenAI(
api_key="sk-gk-myapp",
base_url="http://gatekeeper/v1"
)
# GPT-4o
client.chat.completions.create(
model="gpt-4o", ...)
# Claude (same code!)
client.chat.completions.create(
model="claude-3-5-sonnet", ...)

How Translation Works
When a request arrives at Gatekeeper, the routing engine identifies the target provider from the model name. A provider-specific adapter then translates the normalized request into the format the provider expects.
Incoming: POST /v1/chat/completions
{
"model": "claude-3-5-sonnet-20241022",
"messages": [{"role": "user", "content": "Hello"}]
}
│
▼ Routing Engine
│ model → anthropic adapter
│
▼ Translation
Outgoing: POST https://api.anthropic.com/v1/messages
{
"model": "claude-3-5-sonnet-20241022",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 4096
} + X-Api-Key header

The adapter also translates the response back to OpenAI format before returning it to the caller. Your application sees a consistent response shape regardless of provider.
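A rough sketch of what such a translation layer might look like. The function names `translate_request` and `translate_response` are illustrative, not Gatekeeper's actual internals; the field mappings follow the two providers' documented schemas:

```python
def translate_request(openai_req: dict) -> dict:
    """Map an OpenAI-style chat request to Anthropic's /v1/messages shape."""
    system_parts = [m["content"] for m in openai_req["messages"]
                    if m["role"] == "system"]
    anthropic_req = {
        "model": openai_req["model"],
        # Anthropic takes system prompts as a top-level field, not a message
        "messages": [m for m in openai_req["messages"]
                     if m["role"] != "system"],
        # Anthropic requires max_tokens; fall back to a default if unset
        "max_tokens": openai_req.get("max_tokens", 4096),
    }
    if system_parts:
        anthropic_req["system"] = "\n".join(system_parts)
    return anthropic_req

def translate_response(anthropic_resp: dict) -> dict:
    """Map an Anthropic messages response back to OpenAI chat format."""
    text = "".join(block["text"] for block in anthropic_resp["content"]
                   if block["type"] == "text")
    stop = anthropic_resp["stop_reason"]
    return {
        "object": "chat.completion",
        "model": anthropic_resp["model"],
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop" if stop == "end_turn" else stop,
        }],
        "usage": {
            "prompt_tokens": anthropic_resp["usage"]["input_tokens"],
            "completion_tokens": anthropic_resp["usage"]["output_tokens"],
        },
    }
```

A real adapter also has to handle tool calls, images, and error mapping, but the core job is exactly this kind of field-by-field reshaping in both directions.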
Model Aliases: True Portability
The most powerful use of Gatekeeper's unified endpoint is model aliases. You can define model: "my-chat-model" in your application and point it at any underlying model — without changing application code.
{
"my-chat-model": "gpt-4o",
"my-fast-model": "gpt-4o-mini",
"my-long-context": "claude-3-5-sonnet-20241022"
}

When you want to switch my-chat-model from GPT-4o to Claude Sonnet, you change one line in the dashboard. All applications using that alias immediately route to the new model — no deployments, no PRs.
This also enables A/B testing: point an alias at two models with traffic splitting to compare quality and cost before committing to a migration.
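As a sketch of how alias resolution with traffic splitting could work. The `ALIASES` table and `resolve` helper are hypothetical, not Gatekeeper's actual implementation:

```python
import random

# Hypothetical alias table: each alias maps to (model, traffic_weight) pairs.
ALIASES = {
    "my-chat-model": [("gpt-4o", 0.9), ("claude-3-5-sonnet-20241022", 0.1)],
    "my-fast-model": [("gpt-4o-mini", 1.0)],
}

def resolve(model: str, rng: random.Random = random) -> str:
    """Resolve an alias to a concrete model, honoring traffic weights."""
    targets = ALIASES.get(model)
    if targets is None:
        return model  # not an alias; pass the model name through unchanged
    r = rng.random()
    cumulative = 0.0
    for name, weight in targets:
        cumulative += weight
        if r < cumulative:
            return name
    return targets[-1][0]  # guard against floating-point rounding
```

Because resolution happens at the gateway, shifting 10% of traffic to a candidate model is a config edit, not a code change in any calling application.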
Streaming Works Everywhere
Streaming is normalized too. Whether the underlying provider uses SSE, chunked JSON, or a proprietary format, Gatekeeper returns standard OpenAI-format SSE to the caller.
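Under the hood, that normalization amounts to re-emitting each provider event as an OpenAI-format chunk. A minimal sketch for Anthropic's event stream follows; the `normalize_event` helper is illustrative, though the event shapes on both sides follow the providers' documented streaming formats:

```python
import json

def normalize_event(event_type: str, data: dict, model: str):
    """Return an OpenAI-format SSE line for one Anthropic stream event,
    or None for events with no client-visible equivalent."""
    if event_type == "content_block_delta" and data["delta"]["type"] == "text_delta":
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0,
                         "delta": {"content": data["delta"]["text"]},
                         "finish_reason": None}],
        }
        return f"data: {json.dumps(chunk)}\n\n"
    if event_type == "message_stop":
        return "data: [DONE]\n\n"
    return None  # ping, message_start, etc. produce no output chunk
```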
# Streaming works the same for all providers
stream = client.chat.completions.create(
model="claude-3-5-sonnet-20241022", # Anthropic
messages=[{"role": "user", "content": "Write a haiku"}],
stream=True # Same as OpenAI streaming
)
for chunk in stream:
print(chunk.choices[0].delta.content or "", end="")

Why This Matters for Your Architecture
Switch providers in the dashboard, not in code. OpenAI raises prices? One config change.
Route different feature types to the optimal model without maintaining multiple SDK integrations.
Junior devs only need to know the OpenAI SDK. Provider-specific quirks are Gatekeeper's problem.
One place to see spend across OpenAI, Anthropic, and Google — not three separate dashboards.