A transparent reverse-proxy that sits between your apps and AI providers. Real-time budget enforcement, anomaly detection, and Denial-of-Wallet attack prevention — with zero code changes.
Every token costs money. Without guardrails, a single misconfigured agent loop can burn $10,000+ in minutes. You only find out when the invoice arrives.
A misconfigured workflow, a retry loop, or an ambitious agent can rack up five-figure bills overnight. No provider will stop it.
Malicious actors weaponize your API keys — flooding expensive models, inflating prompts, or triggering recursive agent chains to drain your budget.
No real-time view of per-key, per-project, per-model costs. No budget enforcement. No anomaly alerts. You're flying blind.
Every request passes through a battle-tested pipeline. Budget enforcement, anomaly detection, circuit breaking, caching — all at sub-millisecond speed.
Rate limits in dollars, not requests. Set $/min, $/hr, daily, and monthly caps per API key. Micro-USD precision with atomic CAS operations for lock-free concurrency.
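As a rough illustration of a dollar-denominated limit, the sketch below tracks spend in integer micro-USD inside a rolling window. The `SpendWindow` class and its method names are hypothetical, not CostShield's API, and this single-threaded version does not reproduce the atomic CAS behaviour described above.

```python
from collections import deque

MICRO = 1_000_000  # micro-USD per dollar; integer math avoids float drift


class SpendWindow:
    """Caps spend inside a rolling time window, tracked in integer micro-USD."""

    def __init__(self, limit_usd: float, window_seconds: float):
        self.limit_micro = round(limit_usd * MICRO)
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, cost_micro), oldest first
        self.total_micro = 0

    def try_spend(self, cost_usd: float, now: float) -> bool:
        """Record the cost and return True if it fits, otherwise return False."""
        # Drop charges that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            _, old_cost = self.events.popleft()
            self.total_micro -= old_cost
        cost_micro = round(cost_usd * MICRO)
        if self.total_micro + cost_micro > self.limit_micro:
            return False
        self.events.append((now, cost_micro))
        self.total_micro += cost_micro
        return True


per_minute = SpendWindow(limit_usd=1.00, window_seconds=60)
first = per_minute.try_spend(0.60, now=0.0)    # fits under $1.00/min
second = per_minute.try_spend(0.60, now=10.0)  # would reach $1.20, rejected
third = per_minute.try_spend(0.60, now=61.0)   # the first charge has expired
```

One such window per cap ($/min, $/hr, daily, monthly) could be checked in sequence, rejecting the request if any of them refuses the charge.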
Z-Score (3σ deviation), IQR (quartile fences), EMA (exponential moving average), and CUSUM (cumulative sum change-point detection). Consensus-based scoring — configurable to require any, majority, or all algorithms to agree.
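To make the consensus idea concrete, here is a minimal sketch of four detectors voting on one new cost sample. Only the 3σ threshold comes from the description above; the function names, the other thresholds (IQR multiplier, EMA ratio, CUSUM slack and limit), and the sample data are illustrative assumptions rather than CostShield's actual tuning.

```python
import statistics


def zscore_flag(history, value, threshold=3.0):
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return stdev > 0 and abs(value - mean) / stdev > threshold


def iqr_flag(history, value, k=1.5):
    q1, _, q3 = statistics.quantiles(history, n=4)
    spread = q3 - q1
    return value < q1 - k * spread or value > q3 + k * spread


def ema_flag(history, value, alpha=0.3, ratio=2.0):
    ema = history[0]
    for x in history[1:]:
        ema = alpha * x + (1 - alpha) * ema
    return value > ratio * ema  # flag when the sample far exceeds the average


def cusum_flag(history, value, slack=0.5, limit=5.0):
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history) or 1.0
    s = 0.0
    for x in list(history) + [value]:
        # One-sided CUSUM on standardized deviations above the mean.
        s = max(0.0, s + (x - mean) / stdev - slack)
    return s > limit


DETECTORS = (zscore_flag, iqr_flag, ema_flag, cusum_flag)


def is_anomalous(history, value, mode="majority"):
    hits = sum(bool(detect(history, value)) for detect in DETECTORS)
    if mode == "any":
        return hits >= 1
    if mode == "all":
        return hits == len(DETECTORS)
    return hits > len(DETECTORS) / 2  # strict majority


# Hypothetical per-minute spend in USD for one API key.
history = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 0.9]
spike_all = is_anomalous(history, 5.0, mode="all")
mild_any = is_anomalous(history, 1.5, mode="any")
mild_majority = is_anomalous(history, 1.5, mode="majority")
```

With this data, a jump to $5.00 trips every detector, while $1.50 trips only the Z-Score and IQR checks, so it counts as anomalous under "any" but not under a strict majority.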
Stock-market inspired 5-state protection: Closed → Warn (50%) → Critical (75%) → Emergency (90%) → Open (100%). Immediate escalation, cooldown-gated de-escalation.
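The state ladder can be sketched as a small state machine. The thresholds follow the percentages above; the class name, the 300-second cooldown, and the choice to drop straight to the target state once the cooldown has passed are assumptions made for illustration.

```python
from enum import IntEnum


class State(IntEnum):
    CLOSED = 0
    WARN = 1
    CRITICAL = 2
    EMERGENCY = 3
    OPEN = 4


# Utilization floors, highest first.
THRESHOLDS = [
    (1.00, State.OPEN),
    (0.90, State.EMERGENCY),
    (0.75, State.CRITICAL),
    (0.50, State.WARN),
]


def target_state(utilization: float) -> State:
    for floor, state in THRESHOLDS:
        if utilization >= floor:
            return state
    return State.CLOSED


class FinancialBreaker:
    def __init__(self, cooldown_seconds: float = 300.0):
        self.state = State.CLOSED
        self.cooldown_seconds = cooldown_seconds
        self.last_change = 0.0

    def update(self, spent: float, budget: float, now: float) -> State:
        target = target_state(spent / budget)
        if target > self.state:
            # Escalation takes effect immediately.
            self.state, self.last_change = target, now
        elif target < self.state and now - self.last_change >= self.cooldown_seconds:
            # De-escalation waits until the cooldown has elapsed.
            self.state, self.last_change = target, now
        return self.state


breaker = FinancialBreaker(cooldown_seconds=300)
s1 = breaker.update(80, 100, now=0)     # 80% used: jumps to CRITICAL
s2 = breaker.update(40, 100, now=100)   # spend fell, but still in cooldown
s3 = breaker.update(40, 100, now=400)   # cooldown over: back to CLOSED
s4 = breaker.update(100, 100, now=401)  # budget exhausted: OPEN
```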
When budgets run low, auto-downgrade models (GPT-4.1 → GPT-4.1-mini → GPT-4.1-nano) instead of blocking. Also reduces max_tokens and disables streaming. Service stays up.
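A downgrade step along these lines might look like the following sketch. The model ladder is taken from the text above; the `degrade_request` function, the 25% and 10% remaining-budget cut-offs, and the 512-token cap are hypothetical values chosen only to show the shape of the logic.

```python
DOWNGRADE_LADDER = ["gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano"]


def degrade_request(request: dict, budget_remaining_ratio: float) -> dict:
    """Return a copy of the request, softened when the budget is running low."""
    degraded = dict(request)
    if budget_remaining_ratio >= 0.25:  # assumed cut-off: plenty of budget left
        return degraded

    model = degraded.get("model")
    if model in DOWNGRADE_LADDER:
        # Step down one rung, or two when less than 10% of the budget remains.
        steps = 2 if budget_remaining_ratio < 0.10 else 1
        index = min(DOWNGRADE_LADDER.index(model) + steps, len(DOWNGRADE_LADDER) - 1)
        degraded["model"] = DOWNGRADE_LADDER[index]

    degraded["max_tokens"] = min(degraded.get("max_tokens", 1024), 512)
    degraded["stream"] = False
    return degraded


original = {"model": "gpt-4.1", "max_tokens": 4096, "stream": True}
healthy = degrade_request(original, budget_remaining_ratio=0.50)
low = degrade_request(original, budget_remaining_ratio=0.20)
very_low = degrade_request(original, budget_remaining_ratio=0.05)
```

The request is copied rather than mutated, so the caller can still log what the application originally asked for.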
Graph-based cycle detection catches runaway agent chains before they spiral. Tracks call depth, semantic similarity, and pattern frequency across agentic workflows.
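The graph-based part of this check can be illustrated with a small call graph that refuses any edge closing a loop. The `CallGraph` class, the agent names, and the depth limit of 8 are assumptions; semantic similarity and pattern frequency are not covered by this sketch.

```python
from collections import defaultdict


class CallGraph:
    """Directed graph of agent-to-agent calls with cycle and depth checks."""

    def __init__(self, max_depth: int = 8):
        self.edges = defaultdict(set)
        self.max_depth = max_depth

    def _reaches(self, start: str, goal: str) -> bool:
        # Iterative depth-first search from start, looking for goal.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def record_call(self, caller: str, callee: str, depth: int) -> str:
        if depth > self.max_depth:
            return "too_deep"
        # If callee can already reach caller, this edge would close a loop.
        if self._reaches(callee, caller):
            return "cycle"
        self.edges[caller].add(callee)
        return "ok"


graph = CallGraph()
r1 = graph.record_call("planner", "researcher", depth=1)
r2 = graph.record_call("researcher", "writer", depth=2)
r3 = graph.record_call("writer", "planner", depth=3)  # closes the loop
r4 = graph.record_call("writer", "reviewer", depth=9)  # exceeds max depth
```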
Moka LRU cache deduplicates identical requests. Zero tokens consumed, zero cost incurred. Sub-microsecond lookup latency. Massive savings on repetitive queries.
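Deduplicating identical requests usually comes down to hashing a canonical form of the request body. The sketch below uses a plain dictionary as a stand-in for the Moka cache, with no eviction; `cache_key`, `ResponseCache`, and the fake upstream function are hypothetical names.

```python
import hashlib
import json


def cache_key(request: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, request: dict, upstream):
        key = cache_key(request)
        if key in self._store:
            self.hits += 1  # served from cache: no upstream call, no tokens
            return self._store[key]
        self.misses += 1
        response = upstream(request)
        self._store[key] = response
        return response


upstream_calls = []


def fake_upstream(request: dict) -> str:
    upstream_calls.append(request)
    return "response for " + request["prompt"]


cache = ResponseCache()
a = cache.get_or_call({"model": "m", "prompt": "hello"}, fake_upstream)
b = cache.get_or_call({"prompt": "hello", "model": "m"}, fake_upstream)  # same request, different key order
c = cache.get_or_call({"model": "m", "prompt": "goodbye"}, fake_upstream)
```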
Even streaming responses are monitored in real-time. If budget is exceeded mid-stream, CostShield terminates the SSE connection immediately — no runaway streaming costs.
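Mid-stream enforcement can be pictured as a generator that meters each chunk and stops forwarding once the budget would be exceeded. The `metered_stream` function is hypothetical, and counting whitespace-separated words as tokens is a crude stand-in for a real tokenizer.

```python
def metered_stream(chunks, budget_micro: int, cost_per_token_micro: int):
    """Yield chunks until the running cost would exceed the budget."""
    spent = 0
    for chunk in chunks:
        # Assumption: one whitespace-separated word is treated as one token.
        spent += len(chunk.split()) * cost_per_token_micro
        if spent > budget_micro:
            return  # stop the stream; the crossing chunk is not forwarded
        yield chunk


upstream_chunks = ["The quick", "brown fox", "jumps over", "the dog"]
delivered = list(
    metered_stream(upstream_chunks, budget_micro=50, cost_per_token_micro=10)
)
```

In a real proxy the early return would correspond to closing the upstream SSE connection, so the provider stops generating tokens that nobody will pay for.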
Organization → Project → API Key. Per-key budgets, per-project limits, per-org spending caps. Plan-based rate limits with full isolation between tenants.
Every AI API call passes through 12 security and cost-control stages. All in Rust. All under a millisecond.
Validate x-costshield-key, resolve organization, project, and API key. Check permissions and plan limits.
Auto-detect which AI provider is being targeted from the request path and headers. Route to correct adapter.
Pre-compute expected cost from token estimation, model pricing, and billing type (per-token, per-image, per-character, per-second).
Compare estimated cost against per-minute, hourly, daily, and monthly budget limits. Block or degrade if exceeded.
Evaluate 5-state financial circuit breaker. Escalate through Warn → Critical → Emergency → Open based on budget utilization.
Graph-based cycle detection for agentic workflows. Catches recursive call chains before they become infinite cost spirals.
Moka LRU semantic cache lookup. If hit: return cached response instantly, zero tokens consumed, zero cost.
Provisionally reserve the estimated cost from the key's budget. Prevents over-commitment during concurrent requests.
Proxy the request to the upstream AI provider. Support for both standard HTTP and SSE streaming responses.
Extract actual usage metrics from the provider's response. Tokens consumed, model used, finish reason, latency.
Reconcile actual cost vs. estimated. Release excess reservation or charge additional. Update sliding window buckets.
Add x-costshield-cost, x-costshield-budget-remaining headers. Return the AI response to the application.
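The estimate, reserve, and reconcile stages of the pipeline above fit together roughly as in this sketch. The `KeyBudget` class, the `cost_micro` helper, and the prices for `example-model` are all hypothetical; the prices are placeholders, not real provider pricing.

```python
# Hypothetical prices in USD per million tokens. A price of $X per million
# tokens is the same as X micro-USD per token, which keeps the math simple.
PRICES = {"example-model": {"input": 2.00, "output": 8.00}}


def cost_micro(model: str, input_tokens: int, output_tokens: int) -> int:
    price = PRICES[model]
    return round(input_tokens * price["input"] + output_tokens * price["output"])


class KeyBudget:
    def __init__(self, limit_micro: int):
        self.limit_micro = limit_micro
        self.spent_micro = 0
        self.reserved_micro = 0

    def remaining_micro(self) -> int:
        return self.limit_micro - self.spent_micro - self.reserved_micro

    def reserve(self, estimate_micro: int) -> bool:
        """Hold the estimated cost so concurrent requests cannot over-commit."""
        if estimate_micro > self.remaining_micro():
            return False
        self.reserved_micro += estimate_micro
        return True

    def reconcile(self, estimate_micro: int, actual_micro: int) -> None:
        """Release the hold and charge what the provider actually reported."""
        self.reserved_micro -= estimate_micro
        self.spent_micro += actual_micro


budget = KeyBudget(limit_micro=10_000)                 # a $0.01 budget
estimate = cost_micro("example-model", 1000, 500)      # 6,000 micro-USD
first_ok = budget.reserve(estimate)                    # hold placed
second_ok = budget.reserve(estimate)                   # rejected: would over-commit
actual = cost_micro("example-model", 1000, 200)        # response was shorter
budget.reconcile(estimate, actual)
remaining = budget.remaining_micro()
```

The value of `remaining` after reconciliation is the kind of figure a gateway could report back in a header such as x-costshield-budget-remaining.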
From OpenAI to self-hosted Ollama — CostShield understands every provider's pricing model, billing type, and API format. Unified cost control for all.
CostShield ships with a built-in DoW Attack Simulator that throws 7 distinct attack patterns at your gateway. Every defense is proven, not theoretical.
10,000 requests in rapid succession to overwhelm budget tracking.
100K-token payloads designed to maximize per-request cost.
Recursive agent chains that trigger infinite call cascades.
Targeting the most expensive models (GPT-4.1, Claude Opus) to drain budgets fast.
Slow cost escalation that stays below anomaly thresholds until it's too late.
Short, intense bursts designed to slip through rate-limit windows.
Distributed attacks across multiple API keys to evade per-key detection.
HTTP/1.1 reverse proxy with SSE streaming support. Full 12-step pipeline. Sub-millisecond overhead. Lock-free concurrency with DashMap and atomic CAS.
RESTful CRUD for orgs, projects, keys, and alerts. Real-time WebSocket cost events. Prometheus metrics endpoint for observability.
Real-time cost monitoring, budget health visualization, provider analytics, anomaly alert feed. WebSocket-powered live updates.
Write-through cache: DashMap in-memory + async background writes. TimescaleDB hypertables for time-series cost data. Full migration system.
Native SDKs for Python, Node.js, and Go. Full API coverage for budget management, usage analytics, and alert configuration.
from costshield import CostShieldClient

client = CostShieldClient(
    "http://localhost:3000",
    "your-admin-key"
)

# Check budget status
budget = client.get_budget_status()
print(f"Daily spend: ${budget.daily_spent_usd}")
print(f"Remaining: ${budget.daily_remaining_usd}")

# Set alert
client.create_alert(
    threshold_usd=50.0,
    channel="slack"
)
import { CostShieldClient } from '@costshield/sdk'

const client = new CostShieldClient(
  'http://localhost:3000',
  'your-admin-key'
)

// Check budget status
const budget = await client.getBudgetStatus()
console.log(`Daily spend: $${budget.dailySpentUsd}`)

// Real-time cost stream
client.onCostEvent((event) => {
  console.log(`$${event.cost_usd} - ${event.model}`)
})
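package main

import (
	"context"
	"fmt"
	"log"
	"github.com/san-techie21/astra-costshield/sdks/go/costshield"
)

func main() {
	ctx := context.Background()
	client := costshield.NewClient(
		"http://localhost:3000",
		"your-admin-key",
	)
	// Check budget status and stop on error instead of ignoring it
	budget, err := client.GetBudgetStatus(ctx)
	if err != nil {
		log.Fatalf("get budget status: %v", err)
	}
	fmt.Printf("Daily spend: $%.2f\n", budget.DailySpentUsd)
}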
docker compose up -d
helm install costshield
terraform apply
cargo run -p costshield-gateway
Self-host the full engine at zero cost. Managed cloud coming soon.
Deploy CostShield in minutes. Protect your budget from day one.