# Stop Your AI Bill From Surprising You
A common story: an engineering team integrates GPT-4o into a new feature. Works great in testing. Ships to production. End of month: a $4,200 bill. The culprit was a batch job that ran 10,000 requests with long prompts. Nobody set a limit. Nobody knew until Stripe charged the card.
## Why AI Bills Get Out of Hand
AI spending has three properties that make it hard to control with traditional cloud cost tools:
1. **Per-request pricing is non-obvious.** A developer adding "summarize this document" to a feature doesn't think "this will cost $0.003 per call." At 1 million calls per month, that is $3,000.
2. **Token counts are unpredictable.** User-submitted content (documents, code, chat messages) varies wildly in length. A feature that costs $50/month in testing costs $2,000/month when users submit long documents.
3. **Multiple teams, one bill.** Engineering, marketing, support, and data science all use the same API keys. When the bill comes in, nobody knows which team is responsible for which line item.
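The arithmetic behind the first point is worth making explicit. A minimal sketch of per-request cost: the input price for GPT-4o ($5/1M tokens) comes from the comparison table later in this article, while the output price and the token counts for a "summarize" call are illustrative assumptions.

```python
# Rough per-request cost. Tokens are billed per million; input and output
# are priced differently (output prices here are illustrative assumptions).
PRICE_PER_1M = {
    "gpt-4o":      {"input": 5.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICE_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A hypothetical "summarize this document" call: ~300 input, ~100 output tokens
per_call = request_cost("gpt-4o", 300, 100)
print(f"${per_call:.4f} per call")                        # $0.0030 per call
print(f"${per_call * 1_000_000:,.0f}/month at 1M calls")  # $3,000/month at 1M calls
```

Three-tenths of a cent per call looks like rounding error; at a million calls it is a line item nobody approved.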
## Step 1: Set Budget Limits
The highest-ROI action is adding a budget limit to every virtual key or team. A limit does not stop you from building — it stops you from getting a surprise at the end of the month.
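Conceptually, the budget gate is a per-request check. The sketch below is a hypothetical illustration of how `"block"` and `"warn"` differ in behavior, not Gatekeeper's actual implementation:

```python
# Illustrative budget gate: "block" rejects requests once the budget is
# exhausted; "warn" lets them through but flags them for alerting.
class BudgetExceeded(Exception):
    pass

def check_budget(spent: float, limit: float, action: str) -> str:
    if spent < limit:
        return "allow"
    if action == "block":
        raise BudgetExceeded(f"spent ${spent:.2f} of ${limit:.2f} limit")
    return "allow_with_warning"  # action == "warn"

print(check_budget(150.0, 200.0, "block"))  # allow
print(check_budget(210.0, 200.0, "warn"))   # allow_with_warning
```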
```bash
# Create a key for the marketing team with a $200/month hard limit
# (budget_action "block" rejects requests once the budget is spent)
curl -X POST http://localhost:4000/v1/keys \
  -H "Authorization: Bearer sk-gk-master" \
  -d '{
    "name": "marketing-chatbot",
    "budget_limit": 200.00,
    "budget_period": "monthly",
    "budget_action": "block"
  }'
```

> **Tip:** Start with `budget_action: "warn"` for the first week. This lets you observe real usage without blocking anyone. Switch to `"block"` once you understand the actual spending pattern.

## Step 2: Attribute Costs to Teams
The fastest way to create cost accountability is one virtual key per team (or per application). When spend spikes, you immediately know which key — and therefore which team — is responsible.
| Team | Key prefix | Monthly budget |
|---|---|---|
| Engineering | `sk-gk-eng-*` | $500/mo |
| Marketing | `sk-gk-mkt-*` | $200/mo |
| Support | `sk-gk-sup-*` | $150/mo |
Gatekeeper's usage dashboard breaks down cost by virtual key, model, provider, and time period. At the end of the month, you can show each team exactly what they spent and on which models.
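With one key per team, attribution reduces to a group-by over request logs. A hedged sketch, assuming per-request records with `key` and `cost` fields (the record shape is hypothetical, not Gatekeeper's actual export schema):

```python
from collections import defaultdict

# Hypothetical per-request log entries exported from the gateway
requests = [
    {"key": "sk-gk-eng-01", "model": "gpt-4o",      "cost": 0.012},
    {"key": "sk-gk-mkt-01", "model": "gpt-4o-mini", "cost": 0.0004},
    {"key": "sk-gk-eng-02", "model": "gpt-4o",      "cost": 0.020},
]

def spend_by_team(logs):
    totals = defaultdict(float)
    for r in logs:
        # The key prefix encodes the team: sk-gk-<team>-<n>
        team = r["key"].split("-")[2]
        totals[team] += r["cost"]
    return dict(totals)

print(spend_by_team(requests))
```

The same grouping works by model or provider; the point is that the key prefix makes ownership a field you can aggregate on, not a question you have to ask.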
## Step 3: Set Alert Thresholds
Budget limits are a safety net. Alerts are an early-warning system. Configure alerts to fire when a key hits 50% and 80% of its monthly budget — giving you time to investigate before anything gets blocked.
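The threshold logic itself is simple. An illustrative sketch (not Gatekeeper's implementation) of deciding which alerts fire at a given spend level:

```python
def fired_alerts(spent: float, budget: float, thresholds=(50, 80, 100)):
    """Return the percentage thresholds the current spend has crossed."""
    pct = spent / budget * 100
    return [t for t in thresholds if pct >= t]

print(fired_alerts(120.0, 200.0))  # [50]  (60% of budget)
print(fired_alerts(200.0, 200.0))  # [50, 80, 100]
```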
```
# Or via API:
PATCH /v1/keys/key_01abc
{
  "alerts": [
    { "threshold_pct": 50, "channel": "email" },
    { "threshold_pct": 80, "channel": "slack_webhook" },
    { "threshold_pct": 100, "channel": "pagerduty" }
  ]
}
```

## Step 4: Right-Size Your Models
GPT-4o and Claude Sonnet are not always the right choice. Many use cases — classification, summarization, simple Q&A — work equally well with smaller, cheaper models.
| Use case | Expensive choice | Cheaper alternative | Savings |
|---|---|---|---|
| Ticket classification | GPT-4o ($5/1M) | GPT-4o-mini ($0.15/1M) | 97% |
| Document summary | Claude Sonnet ($3/1M) | Claude Haiku ($0.25/1M) | 92% |
| Code completion | GPT-4o ($5/1M) | DeepSeek-V3 ($0.28/1M) | 94% |
| RAG Q&A | GPT-4o ($5/1M) | Llama 3.3-70B ($0.59/1M) | 88% |
Use Gatekeeper's model aliases to A/B test cheaper models transparently. Route 10% of traffic to the cheaper model, check quality metrics, then ramp up if quality holds.
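The alias-based rollout above amounts to weighted random routing. A minimal client-side illustration (the model names and 90/10 split are from this article's example; in practice Gatekeeper resolves the alias server-side):

```python
import random

# Route ~10% of traffic to the cheaper candidate model
ROUTES = [("gpt-4o", 0.90), ("gpt-4o-mini", 0.10)]

def pick_model(routes=ROUTES, rng=random):
    models, weights = zip(*routes)
    return rng.choices(models, weights=weights, k=1)[0]

# Sanity-check the split over many simulated requests
random.seed(0)
counts = {m: 0 for m, _ in ROUTES}
for _ in range(10_000):
    counts[pick_model()] += 1
print(counts)  # roughly 9,000 / 1,000
```

Compare quality metrics between the two buckets before shifting the weights; the routing change itself is invisible to callers, which is the point of an alias.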
## The 30-Minute Quick-Win Checklist

- [ ] Create one virtual key per team or application
- [ ] Set a monthly budget limit on every key (start with `budget_action: "warn"`)
- [ ] Configure alerts at 50% and 80% of each budget
- [ ] Pick one high-volume use case and A/B test a cheaper model

Budget limits and team attribution work out of the box. No extra configuration.