The World's First AI Cost Firewall

Your AI Spend Has
No Firewall.
Until Now.

A transparent reverse proxy that sits between your apps and AI providers. It adds real-time budget enforcement, anomaly detection, and Denial-of-Wallet attack prevention with zero code changes.

15+
AI Providers
50+
Models Tracked
12
Step Pipeline
<1ms
Proxy Overhead
Zero Code Changes
from openai import OpenAI

# Before: direct to OpenAI
client = OpenAI(base_url="https://api.openai.com/v1")

# After: through CostShield (that's it!)
client = OpenAI(base_url="http://localhost:8080/v1")

# CostShield handles everything: auth, budget, anomaly detection,
# circuit breaker, caching, cost tracking... transparently.

AI APIs are a financial black hole

Every token costs money. Without guardrails, a single misconfigured agent loop can burn $10,000+ in minutes. You only find out when the invoice arrives.

Runaway AI Bills

A misconfigured workflow, a retry loop, or an ambitious agent can rack up five-figure bills overnight. No provider will stop it.

$10K+ burned in one night

Denial-of-Wallet Attacks

Malicious actors weaponize your API keys — flooding expensive models, inflating prompts, or triggering recursive agent chains to drain your budget.

7 known attack vectors

Zero Spending Visibility

No real-time view of per-key, per-project, per-model costs. No budget enforcement. No anomaly alerts. You're flying blind.

0 guardrails by default

8 defense layers. One proxy.

Every request passes through a battle-tested pipeline. Budget enforcement, anomaly detection, circuit breaking, caching — all at sub-millisecond speed.

Budget-Denominated Rate Limiting

Rate limits in dollars, not requests. Set $/min, $/hr, daily, and monthly caps per API key. Micro-USD precision with atomic CAS operations for lock-free concurrency.

CARLE Algorithm
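
As a rough illustration of what "rate limits in dollars" means, here is a minimal Python sketch of a fixed-window spend limiter. It is not CostShield's Rust implementation or the CARLE algorithm itself: the class name `DollarWindowLimiter` and its methods are hypothetical, and a plain lock stands in for the atomic CAS loop mentioned above. The one idea it borrows directly is storing money as integer micro-USD.

```python
import threading
import time

MICRO = 1_000_000  # micro-USD per dollar; integers avoid float drift


class DollarWindowLimiter:
    """Fixed-window spend limiter in micro-USD (hypothetical, illustrative only)."""

    def __init__(self, limit_usd: float, window_seconds: float, clock=time.monotonic):
        self.limit_micro = round(limit_usd * MICRO)
        self.window_seconds = window_seconds
        self.clock = clock
        self._lock = threading.Lock()  # stand-in for a lock-free CAS loop
        self._window_start = clock()
        self._spent_micro = 0

    def try_spend(self, cost_usd: float) -> bool:
        """Record the spend and return True, or return False if it would exceed the cap."""
        cost_micro = round(cost_usd * MICRO)
        with self._lock:
            now = self.clock()
            if now - self._window_start >= self.window_seconds:
                # A new window has started: forget the previous window's spend.
                self._window_start = now
                self._spent_micro = 0
            if self._spent_micro + cost_micro > self.limit_micro:
                return False
            self._spent_micro += cost_micro
            return True

    def remaining_usd(self) -> float:
        with self._lock:
            return (self.limit_micro - self._spent_micro) / MICRO
```

A $/min cap would be `DollarWindowLimiter(1.00, 60)`; hourly, daily, and monthly caps are the same object with a longer window, and a request passes only if every limiter accepts it.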

4-Algorithm Anomaly Detection

Z-Score (3σ deviation), IQR (quartile fences), EMA (exponential moving average), and CUSUM (cumulative sum change-point detection). Consensus-based scoring — configurable to require any, majority, or all algorithms to agree.

Multi-Algorithm Ensemble
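
To make the consensus idea concrete, the following Python sketch runs four simple detectors over a history of per-request costs and combines their votes. The function names, the default thresholds (other than the 3σ z-score cutoff), and the exact form of the EMA and CUSUM checks are assumptions chosen for illustration, not CostShield's actual parameters.

```python
import statistics


def zscore_flag(history, x, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    return std > 0 and abs(x - mean) / std > threshold


def iqr_flag(history, x, k=1.5):
    """Flag values outside the quartile fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    return x < q1 - k * iqr or x > q3 + k * iqr


def ema_flag(history, x, alpha=0.3, ratio=2.0):
    """Flag values above `ratio` times the exponential moving average (assumed rule)."""
    ema = history[0]
    for value in history[1:]:
        ema = alpha * value + (1 - alpha) * ema
    return x > ratio * ema


def cusum_flag(history, x, slack_sigmas=0.5, limit_sigmas=5.0):
    """One-sided CUSUM: accumulate upward drift beyond a slack, alarm past a limit."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    if std == 0:
        return x != mean
    s = 0.0
    for value in list(history) + [x]:
        s = max(0.0, s + (value - mean - slack_sigmas * std))
    return s > limit_sigmas * std


def is_anomalous(history, x, mode="majority"):
    """Combine the four votes; mode is 'any', 'majority', or 'all'."""
    votes = [f(history, x) for f in (zscore_flag, iqr_flag, ema_flag, cusum_flag)]
    hits = sum(votes)
    if mode == "any":
        return hits >= 1
    if mode == "all":
        return hits == len(votes)
    return hits > len(votes) / 2  # strict majority: at least 3 of 4
```

Requiring agreement trades sensitivity for fewer false alarms: `"any"` reacts to a single detector, while `"all"` fires only on spikes that every method recognizes.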

Financial Circuit Breaker

Stock-market inspired 5-state protection: Closed → Warn (50%) → Critical (75%) → Emergency (90%) → Open (100%). Immediate escalation, cooldown-gated de-escalation.
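
The state machine described above can be sketched in a few lines of Python. The thresholds come from the text; the class name `FinancialBreaker`, the 60-second default cooldown, and the choice to step down one level at a time are assumptions made for this illustration.

```python
import time
from enum import IntEnum


class BreakerState(IntEnum):
    CLOSED = 0
    WARN = 1
    CRITICAL = 2
    EMERGENCY = 3
    OPEN = 4


# Budget-utilization thresholds from the text: 50%, 75%, 90%, 100%.
THRESHOLDS = [
    (1.00, BreakerState.OPEN),
    (0.90, BreakerState.EMERGENCY),
    (0.75, BreakerState.CRITICAL),
    (0.50, BreakerState.WARN),
]


def target_state(utilization: float) -> BreakerState:
    """Map a utilization ratio (spent / budget) to the state it calls for."""
    for threshold, state in THRESHOLDS:
        if utilization >= threshold:
            return state
    return BreakerState.CLOSED


class FinancialBreaker:
    """Hypothetical breaker: escalates at once, de-escalates only after a cooldown."""

    def __init__(self, cooldown_seconds: float = 60.0, clock=time.monotonic):
        self.state = BreakerState.CLOSED
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self._last_change = clock()

    def update(self, spent: float, budget: float) -> BreakerState:
        target = target_state(spent / budget)
        now = self.clock()
        if target > self.state:
            # Escalate immediately, skipping levels if needed.
            self.state, self._last_change = target, now
        elif target < self.state and now - self._last_change >= self.cooldown_seconds:
            # Step down a single level per cooldown period (an assumption).
            self.state, self._last_change = BreakerState(self.state - 1), now
        return self.state
```

The asymmetry is the point: a spike trips the breaker in one update, but a brief dip in utilization cannot reopen the floodgates before the cooldown has passed.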

Graceful Degradation

When budgets run low, auto-downgrade models (GPT-4.1 → GPT-4.1-mini → GPT-4.1-nano) instead of blocking. Also reduces max_tokens and disables streaming. Service stays up.
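
A minimal Python sketch of this downgrade policy follows. The model chain is the one named above; the `ChatRequest` type, the 25% and 10% remaining-budget cutoffs, and the 512-token cap are hypothetical values picked for the example.

```python
from dataclasses import dataclass, replace

# Downgrade order taken from the text above.
DOWNGRADE_CHAIN = ["gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano"]


@dataclass(frozen=True)
class ChatRequest:
    model: str
    max_tokens: int
    stream: bool


def degrade(request: ChatRequest, remaining_fraction: float) -> ChatRequest:
    """Return a cheaper variant of the request as the remaining budget shrinks."""
    if remaining_fraction > 0.25 or request.model not in DOWNGRADE_CHAIN:
        return request  # plenty of budget left, or a model with no known fallback
    # One step down below 25% remaining, two steps below 10% (assumed cutoffs).
    steps = 1 if remaining_fraction > 0.10 else 2
    index = min(DOWNGRADE_CHAIN.index(request.model) + steps, len(DOWNGRADE_CHAIN) - 1)
    return replace(
        request,
        model=DOWNGRADE_CHAIN[index],
        max_tokens=min(request.max_tokens, 512),  # cap output length
        stream=False,  # disable streaming while degraded
    )
```

The request is rewritten rather than rejected, so callers keep getting answers, only from a cheaper model with a tighter output cap.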

Agentic Loop Detection

Graph-based cycle detection catches runaway agent chains before they spiral. Tracks call depth, semantic similarity, and pattern frequency across agentic workflows.
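
The cycle-detection part can be illustrated with a small directed call graph in Python. This sketch covers only call depth and graph cycles; semantic similarity and pattern frequency are left out. The `CallGraph` class, the agent names, and the default depth limit are hypothetical.

```python
from collections import defaultdict


class CallGraph:
    """Hypothetical tracker of caller -> callee edges between agents."""

    def __init__(self, max_depth: int = 10):
        self.edges = defaultdict(set)
        self.max_depth = max_depth

    def _reaches(self, start: str, goal: str) -> bool:
        """Iterative depth-first search: can `goal` be reached from `start`?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def record_call(self, caller: str, callee: str, depth: int):
        """Return a reason string if the call should be blocked, else None."""
        if depth > self.max_depth:
            return "max call depth exceeded"
        # Adding caller -> callee closes a cycle if callee already reaches caller.
        if self._reaches(callee, caller):
            return "cycle detected"
        self.edges[caller].add(callee)
        return None
```

Checking reachability before inserting the edge means the loop is refused on the call that would close it, before any tokens are spent on the repeated chain.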

Semantic Response Cache

Moka LRU cache deduplicates identical requests. Zero tokens consumed, zero cost incurred. Sub-microsecond lookup latency. Massive savings on repetitive queries.
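
Deduplicating identical requests comes down to hashing a canonical form of the request body and looking it up. The Python sketch below uses an `OrderedDict` as a stand-in for the Moka cache; `request_key` and `ResponseCache` are hypothetical names, and the canonical-JSON keying scheme is an assumption.

```python
import hashlib
import json
from collections import OrderedDict


def request_key(body: dict) -> str:
    """Hash a canonical JSON encoding so key order does not affect the cache key."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    """Small LRU cache keyed by request hash (illustrative only)."""

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, body: dict):
        key = request_key(body)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, body: dict, response) -> None:
        key = request_key(body)
        self._entries[key] = response
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
```

A hit returns the stored response without contacting the provider, which is where the zero-token, zero-cost claim comes from.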

Mid-Stream SSE Enforcement

Even streaming responses are monitored in real-time. If budget is exceeded mid-stream, CostShield terminates the SSE connection immediately — no runaway streaming costs.
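
One way to picture mid-stream enforcement is a generator that meters each chunk as it passes through and stops once the running cost exceeds what is left. This Python sketch assumes integer micro-USD pricing and a naive whitespace token counter; the function name and the shape of the final error event are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def enforce_stream_budget(
    chunks: Iterable[str],
    micro_usd_per_token: int,
    remaining_micro_usd: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> Iterator[str]:
    """Forward chunks until their cumulative cost exceeds the remaining budget."""
    spent = 0
    for chunk in chunks:
        spent += count_tokens(chunk) * micro_usd_per_token
        if spent > remaining_micro_usd:
            # Stop forwarding. A real proxy would also close the upstream connection
            # so the provider stops generating (and billing) further tokens.
            yield "event: error\ndata: budget exceeded\n\n"
            return
        yield chunk
```

Because the check runs per chunk, the overshoot is bounded by the cost of a single chunk rather than the whole response.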

Multi-Tenant Hierarchy

Organization → Project → API Key. Per-key budgets, per-project limits, per-org spending caps. Plan-based rate limits with full isolation between tenants.
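
The hierarchy can be modeled as a chain of budget nodes where a charge must fit under every ancestor's cap. In this Python sketch, `BudgetNode` and `try_charge` are hypothetical names and the dollar limits are made up; it is meant to show the all-or-nothing check, not CostShield's data model.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class BudgetNode:
    name: str
    limit_usd: float
    parent: Optional["BudgetNode"] = None
    spent_usd: float = 0.0

    def chain(self) -> Iterator["BudgetNode"]:
        """Yield this node, then its ancestors up to the organization."""
        node = self
        while node is not None:
            yield node
            node = node.parent


def try_charge(key: BudgetNode, cost_usd: float) -> Optional[str]:
    """Charge key, project, and org together; return the blocking level's name if any."""
    nodes = list(key.chain())
    for node in nodes:
        if node.spent_usd + cost_usd > node.limit_usd:
            return node.name  # nothing is charged when any level would overflow
    for node in nodes:
        node.spent_usd += cost_usd
    return None


org = BudgetNode("org", 100.0)
project = BudgetNode("project", 50.0, parent=org)
key_a = BudgetNode("key-a", 30.0, parent=project)
key_b = BudgetNode("key-b", 30.0, parent=project)
```

With these limits, `key_a` can spend its full $30, but a $25 charge on `key_b` is then refused at the project level, which is how spend spread across several keys still hits a shared cap.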

The 12-Step Request Pipeline

Every AI API call passes through 12 security and cost-control stages. All in Rust. All under a millisecond.

Your App → CostShield → AI Provider
01

Authenticate

Validate x-costshield-key, resolve organization, project, and API key. Check permissions and plan limits.

02

Detect Provider

Auto-detect which AI provider is being targeted from the request path and headers. Route to correct adapter.

03

Estimate Cost

Pre-compute expected cost from token estimation, model pricing, and billing type (per-token, per-image, per-character, per-second).

04

Check Budget

Compare estimated cost against per-minute, hourly, daily, and monthly budget limits. Block or degrade if exceeded.

05

Circuit Breaker

Evaluate 5-state financial circuit breaker. Escalate through Warn → Critical → Emergency → Open based on budget utilization.

06

Loop Detection

Graph-based cycle detection for agentic workflows. Catches recursive call chains before they become infinite cost spirals.

07

Check Cache

Moka LRU semantic cache lookup. If hit: return cached response instantly, zero tokens consumed, zero cost.

08

Reserve Cost

Provisionally reserve the estimated cost from the key's budget. Prevents over-commitment during concurrent requests.

09

Forward Request

Proxy the request to the upstream AI provider. Support for both standard HTTP and SSE streaming responses.

10

Parse Response

Extract actual usage metrics from the provider's response. Tokens consumed, model used, finish reason, latency.

11

True-Up Cost

Reconcile actual cost vs. estimated. Release excess reservation or charge additional. Update sliding window buckets.

12

Return Response

Add x-costshield-cost, x-costshield-budget-remaining headers. Return the AI response to the application.
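
Steps 03, 08, and 11 fit together as estimate, reserve, then true-up. The Python sketch below walks through that arithmetic in integer micro-USD. The model name, the prices, and the `KeyBudget` class are hypothetical and serve only to illustrate the bookkeeping.

```python
# Hypothetical prices in micro-USD per token.
# 2 micro-USD per token equals $2 per million tokens.
PRICES = {"example-model": {"input": 2, "output": 8}}


def cost_micro(model: str, input_tokens: int, output_tokens: int) -> int:
    """Per-token billing: tokens times price, in integer micro-USD."""
    price = PRICES[model]
    return input_tokens * price["input"] + output_tokens * price["output"]


class KeyBudget:
    """Tracks settled spend plus open reservations for one API key."""

    def __init__(self, limit_micro: int):
        self.limit_micro = limit_micro
        self.committed_micro = 0

    def reserve(self, estimate_micro: int) -> bool:
        """Step 08: hold the estimate so concurrent requests cannot over-commit."""
        if self.committed_micro + estimate_micro > self.limit_micro:
            return False
        self.committed_micro += estimate_micro
        return True

    def true_up(self, estimate_micro: int, actual_micro: int) -> None:
        """Step 11: replace the reservation with the actual cost."""
        # A positive difference charges extra; a negative one releases the excess.
        self.committed_micro += actual_micro - estimate_micro


budget = KeyBudget(limit_micro=10_000)  # a $0.01 cap
# Step 03: estimate with max_tokens (500) as the worst-case output length.
estimate = cost_micro("example-model", input_tokens=1000, output_tokens=500)
```

Here the estimate is 6,000 micro-USD. If the provider reports only 200 output tokens, the actual cost is 3,600 and the true-up releases 2,400 back to the key, which makes room for a request that the reservation alone would have blocked.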

15+ providers. 50+ models. One proxy.

From OpenAI to self-hosted Ollama — CostShield understands every provider's pricing model, billing type, and API format. Unified cost control for all.

OpenAI
Anthropic
Google Gemini
DeepSeek
Mistral
Groq
Together AI
Fireworks AI
Cohere
Perplexity
xAI (Grok)
AWS Bedrock
Azure OpenAI
OpenRouter
Replicate
ElevenLabs
Self-Hosted / Ollama
Billing types: Per Token, Per Token (Tiered), Per Image, Per Character, Per Second (Audio), Per Compute Time, Per Request

Validated against 7 attack vectors

CostShield ships with a built-in DoW Attack Simulator that throws 7 distinct attack patterns at your gateway. Every defense is proven, not theoretical.

01

Volume Flood

10,000 requests in rapid succession to overwhelm budget tracking.

02

Prompt Inflation

100K-token payloads designed to maximize per-request cost.

03

Agentic Loop

Recursive agent chains that trigger infinite call cascades.

04

Model Exploitation

Targeting the most expensive models (GPT-4.1, Claude Opus) to drain budgets fast.

05

Gradual Ramp

Slow cost escalation that stays below anomaly thresholds until it's too late.

06

Burst Storm

Short, intense bursts designed to slip through rate-limit windows.

07

Multi-Key Scatter

Distributed attacks across multiple API keys to evade per-key detection.

cargo run -p dow-simulator --release
Volume Flood — BLOCKED — 10,000 requests, $0.00 leaked
Prompt Inflation — BLOCKED — 100K tokens/req, budget enforced
Agentic Loop — BLOCKED — cycle detected at depth 3
Model Exploitation — DEGRADED — GPT-4.1 → GPT-4.1-nano
Gradual Ramp — DETECTED — CUSUM anomaly at request #47
Burst Storm — BLOCKED — $/min budget exceeded
Multi-Key Scatter — BLOCKED — org-level budget enforced
 
Result: 7/7 attacks neutralized. $0.00 total leakage.

Rust-powered. Production-grade.

Gateway Proxy
Rust + Hyper + Tokio

HTTP/1.1 reverse proxy with SSE streaming support. Full 12-step pipeline. Sub-millisecond overhead. Lock-free concurrency with DashMap and atomic CAS.

Management API
Rust + Axum 0.8

RESTful CRUD for orgs, projects, keys, and alerts. Real-time WebSocket cost events. Prometheus metrics endpoint for observability.

Dashboard
Next.js 15 + React 19 + Tailwind 4

Real-time cost monitoring, budget health visualization, provider analytics, anomaly alert feed. WebSocket-powered live updates.

Persistence
PostgreSQL 16 + TimescaleDB

Write-through cache: DashMap in-memory + async background writes. TimescaleDB hypertables for time-series cost data. Full migration system.

SDKs for every stack

Native SDKs for Python, Node.js, and Go. Full API coverage for budget management, usage analytics, and alert configuration.

pip install costshield

from costshield import CostShieldClient

client = CostShieldClient(
    "http://localhost:3000",
    "your-admin-key"
)

# Check budget status
budget = client.get_budget_status()
print(f"Daily spend: ${budget.daily_spent_usd}")
print(f"Remaining:   ${budget.daily_remaining_usd}")

# Set alert
client.create_alert(
    threshold_usd=50.0,
    channel="slack"
)

npm install @costshield/sdk

import { CostShieldClient } from '@costshield/sdk'

const client = new CostShieldClient(
  'http://localhost:3000',
  'your-admin-key'
)

// Check budget status
const budget = await client.getBudgetStatus()
console.log(`Daily spend: $${budget.dailySpentUsd}`)

// Real-time cost stream
client.onCostEvent((event) => {
  console.log(`$${event.cost_usd} ${event.model}`)
})

go get github.com/san-techie21/astra-costshield/sdks/go

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/san-techie21/astra-costshield/sdks/go/costshield"
)

func main() {
    client := costshield.NewClient(
        "http://localhost:3000",
        "your-admin-key",
    )

    // Check budget status
    ctx := context.Background()
    budget, err := client.GetBudgetStatus(ctx)
    if err != nil {
        log.Fatalf("get budget status: %v", err)
    }
    fmt.Printf("Daily spend: $%.2f\n", budget.DailySpentUsd)
}

Docker. Kubernetes. Terraform. Your call.

Docker

docker compose up -d

Kubernetes

helm install costshield

Terraform

terraform apply

From Source

cargo run -p costshield-gateway

Open source core. Free forever.

Self-host the full engine at zero cost. Managed cloud coming soon.

Community
$0
forever / self-hosted
  • Full 12-step proxy pipeline
  • Budget-denominated rate limiting
  • 4-algorithm anomaly detection
  • Financial circuit breaker
  • Graceful model degradation
  • Semantic response cache
  • Agentic loop detection
  • 15+ provider adapters
  • Multi-tenant hierarchy
  • Dashboard + REST API
  • Docker + Helm + Terraform
  • Python, Node.js, Go SDKs
Get Started Free

Stop burning money on AI APIs.

Deploy CostShield in minutes. Protect your budget from day one.