The World's First AI Cost Firewall

Your AI Spend Has No Firewall. Until Now.

A transparent reverse-proxy that sits between your apps and AI providers. Real-time budget enforcement, anomaly detection, and Denial-of-Wallet attack prevention — with zero code changes.


15+ AI Providers

50+ Models Tracked

12-Step Pipeline

<1ms Proxy Overhead

Tests Passing

Zero Code Changes

from openai import OpenAI

# Before: direct to OpenAI
client = OpenAI(base_url="https://api.openai.com/v1")

# After: through CostShield (only the base URL changes)
client = OpenAI(base_url="http://localhost:8080/v1")

# CostShield handles everything transparently: auth, budget, anomaly detection,
# circuit breaker, caching, cost tracking.

The Problem

AI APIs are a financial black hole

Every token costs money. Without guardrails, a single misconfigured agent loop can burn $10,000+ in minutes. You only find out when the invoice arrives.

Runaway AI Bills

A misconfigured workflow, a retry loop, or an ambitious agent can rack up five-figure bills overnight. No provider will stop it.

$10K+ burned in one night

Denial-of-Wallet Attacks

Malicious actors weaponize your API keys — flooding expensive models, inflating prompts, or triggering recursive agent chains to drain your budget.

7 known attack vectors

Zero Spending Visibility

No real-time view of per-key, per-project, per-model costs. No budget enforcement. No anomaly alerts. You're flying blind.

0 guardrails by default

Core Engine

8 defense layers. One proxy.

Every request passes through a battle-tested pipeline. Budget enforcement, anomaly detection, circuit breaking, caching — all at sub-millisecond speed.

Budget-Denominated Rate Limiting

Rate limits in dollars, not requests. Set $/min, $/hr, daily, and monthly caps per API key. Micro-USD precision with atomic CAS operations for lock-free concurrency.

CARLE Algorithm
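To make the idea of a dollar-denominated limit concrete, here is a minimal Python sketch of one sliding spend window. It is not CostShield's implementation: the class name `DollarWindow` and its methods are hypothetical, and the lock-free CAS mentioned above is not modeled (this version is single-threaded). It only shows integer micro-USD accounting against a $/min cap.

```python
from collections import deque

MICRO = 1_000_000  # micro-USD per dollar; integers avoid float drift


class DollarWindow:
    """Sliding-window spend limiter for one key and one window length (hypothetical)."""

    def __init__(self, limit_usd: float, window_s: float):
        self.limit_micro = round(limit_usd * MICRO)
        self.window_s = window_s
        self.events = deque()  # (timestamp, cost_micro) pairs, oldest first
        self.spent_micro = 0

    def _expire(self, now: float) -> None:
        # Drop spend that has aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_s:
            _, cost = self.events.popleft()
            self.spent_micro -= cost

    def try_spend(self, cost_usd: float, now: float) -> bool:
        """Record the spend and return True only if it fits under the cap."""
        self._expire(now)
        cost_micro = round(cost_usd * MICRO)
        if self.spent_micro + cost_micro > self.limit_micro:
            return False
        self.events.append((now, cost_micro))
        self.spent_micro += cost_micro
        return True


per_minute = DollarWindow(limit_usd=1.00, window_s=60)
print(per_minute.try_spend(0.60, now=0.0))   # fits under $1.00/min
print(per_minute.try_spend(0.60, now=10.0))  # would exceed the cap
```

Hourly, daily, and monthly caps could be expressed as further windows on the same key, with a request admitted only if every window accepts it.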

4-Algorithm Anomaly Detection

Z-Score (3σ deviation), IQR (quartile fences), EMA (exponential moving average), and CUSUM (cumulative sum change-point detection). Consensus-based scoring — configurable to require any, majority, or all algorithms to agree.

Multi-Algorithm Ensemble
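The consensus idea can be sketched in a few lines of Python. Only the 3σ threshold and the any / majority / all modes come from the description above; the other parameters (IQR multiplier 1.5, EMA smoothing 0.3 with a 2x ratio, CUSUM slack 0.5 and threshold 5) and all function names are illustrative assumptions, not CostShield's actual tuning.

```python
import statistics


def zscore_flag(history, x, k=3.0):
    # Flag values more than k standard deviations from the mean.
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    return sd > 0 and abs(x - mean) > k * sd


def iqr_flag(history, x, k=1.5):
    # Flag values outside the quartile fences (k is an assumed multiplier).
    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    return x < q1 - k * iqr or x > q3 + k * iqr


def ema_flag(history, x, alpha=0.3, ratio=2.0):
    # Flag values above `ratio` times the exponential moving average.
    ema = history[0]
    for v in history[1:]:
        ema = alpha * v + (1 - alpha) * ema
    return x > ratio * ema


def cusum_flag(history, x, slack=0.5, threshold=5.0):
    # One-sided CUSUM on standardized values; catches sustained upward drift.
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history) or 1.0
    s = 0.0
    for v in list(history) + [x]:
        s = max(0.0, s + (v - mean) / sd - slack)
    return s > threshold


def is_anomalous(history, x, mode="majority"):
    votes = [f(history, x) for f in (zscore_flag, iqr_flag, ema_flag, cusum_flag)]
    agree = sum(votes)
    if mode == "any":
        return agree >= 1
    if mode == "all":
        return agree == len(votes)
    return agree > len(votes) / 2  # strict majority: 3 of 4


costs = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0]  # made-up per-request costs
print(is_anomalous(costs, 5.0, mode="all"))
print(is_anomalous(costs, 1.45, mode="any"), is_anomalous(costs, 1.45, mode="majority"))
```

In this sketch a borderline value can trip the two outlier tests while leaving EMA and CUSUM quiet, which is the case the consensus mode is meant to arbitrate.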

Financial Circuit Breaker

Stock-market-inspired 5-state protection: Closed → Warn (50%) → Critical (75%) → Emergency (90%) → Open (100%). Immediate escalation, cooldown-gated de-escalation.
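The state machine above can be sketched as follows. The five states and the 50 / 75 / 90 / 100% thresholds come from the text; the class name, the 300-second default cooldown, and the choice to step down one level per cooldown are assumptions made for illustration.

```python
from enum import IntEnum


class State(IntEnum):
    CLOSED = 0
    WARN = 1
    CRITICAL = 2
    EMERGENCY = 3
    OPEN = 4


# Budget-utilization floors from the text, checked from highest to lowest.
THRESHOLDS = [
    (1.00, State.OPEN),
    (0.90, State.EMERGENCY),
    (0.75, State.CRITICAL),
    (0.50, State.WARN),
]


def target_state(utilization: float) -> State:
    for floor, state in THRESHOLDS:
        if utilization >= floor:
            return state
    return State.CLOSED


class FinancialBreaker:
    def __init__(self, cooldown_s: float = 300.0):  # cooldown value is hypothetical
        self.state = State.CLOSED
        self.cooldown_s = cooldown_s
        self.last_change = 0.0

    def update(self, utilization: float, now: float) -> State:
        target = target_state(utilization)
        if target > self.state:
            # Escalate immediately, skipping intermediate states if needed.
            self.state, self.last_change = target, now
        elif target < self.state and now - self.last_change >= self.cooldown_s:
            # De-escalate one level at a time, and only after the cooldown.
            self.state, self.last_change = State(self.state - 1), now
        return self.state


breaker = FinancialBreaker(cooldown_s=60)
print(breaker.update(0.95, now=1.0).name)   # jumps straight to EMERGENCY
print(breaker.update(0.10, now=30.0).name)  # still EMERGENCY: cooldown not elapsed
```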

Graceful Degradation

When budgets run low, auto-downgrade models (GPT-4.1 → GPT-4.1-mini → GPT-4.1-nano) instead of blocking. Also reduces max_tokens and disables streaming. Service stays up.
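As a rough illustration of the downgrade path, the sketch below rewrites a request body when the remaining budget drops. The model chain comes from the text; the 25% and 10% trigger points, the 512-token cap, and the function name `degrade` are assumptions, not documented CostShield settings.

```python
DOWNGRADE_CHAIN = ["gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano"]


def degrade(request: dict, remaining_ratio: float) -> dict:
    """Return a possibly cheaper copy of `request`; the input is not mutated.

    remaining_ratio is the fraction of the budget still unspent (0.0 to 1.0).
    """
    out = dict(request)
    if remaining_ratio >= 0.25:  # assumed threshold: plenty of budget left
        return out

    # Assumed policy: one step down below 25%, two steps below 10%.
    steps = 2 if remaining_ratio < 0.10 else 1
    model = out.get("model")
    if model in DOWNGRADE_CHAIN:
        idx = min(DOWNGRADE_CHAIN.index(model) + steps, len(DOWNGRADE_CHAIN) - 1)
        out["model"] = DOWNGRADE_CHAIN[idx]

    out["max_tokens"] = min(out.get("max_tokens", 1024), 512)  # assumed cap
    out["stream"] = False
    return out


req = {"model": "gpt-4.1", "max_tokens": 4096, "stream": True}
print(degrade(req, remaining_ratio=0.20))
```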

Agentic Loop Detection

Graph-based cycle detection catches runaway agent chains before they spiral. Tracks call depth, semantic similarity, and pattern frequency across agentic workflows.
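A minimal sketch of the graph-based part: each agent-to-agent call is a directed edge, and a call is rejected if it would close a cycle or exceed a depth limit. The `CallGraph` class, the agent names, and the depth limit are hypothetical, and the semantic-similarity and pattern-frequency signals mentioned above are not modeled here.

```python
from collections import defaultdict


class CallGraph:
    def __init__(self, max_depth: int = 8):  # depth limit is an assumed default
        self.edges = defaultdict(set)  # caller -> set of callees
        self.max_depth = max_depth

    def _reaches(self, start: str, goal: str) -> bool:
        # Iterative depth-first search over recorded edges.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def record_call(self, caller: str, callee: str, depth: int):
        """Return a reason string if the call should be blocked, else None."""
        if depth > self.max_depth:
            return "max depth exceeded"
        if self._reaches(callee, caller):
            # callee can already reach caller, so this edge would close a cycle.
            return "cycle detected"
        self.edges[caller].add(callee)
        return None


graph = CallGraph(max_depth=3)
graph.record_call("planner", "researcher", depth=1)
graph.record_call("researcher", "writer", depth=2)
print(graph.record_call("writer", "planner", depth=3))
```

A production detector would likely tolerate some bounded re-entry rather than blocking the first back-edge; this sketch takes the strictest reading.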

Semantic Response Cache

Moka LRU cache deduplicates identical requests. Zero tokens consumed, zero cost incurred. Sub-microsecond lookup latency. Massive savings on repetitive queries.
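Moka is a Rust crate, so the following is only a Python analogue of the deduplication idea: hash a canonical form of the request body and keep responses in a small LRU map. The `ResponseCache` class is hypothetical, it matches identical requests only (no semantic matching), and its eviction is plain LRU.

```python
import hashlib
import json
from collections import OrderedDict


class ResponseCache:
    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self.store = OrderedDict()  # key -> response, least recently used first

    @staticmethod
    def key_for(request: dict) -> str:
        # Canonical JSON so that key order in the body does not change the hash.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, request: dict):
        key = self.key_for(request)
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, request: dict, response) -> None:
        key = self.key_for(request)
        self.store[key] = response
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry


cache = ResponseCache(capacity=2)
cache.put({"model": "m", "prompt": "hi"}, "hello")
print(cache.get({"prompt": "hi", "model": "m"}))  # same request, different key order
```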

Mid-Stream SSE Enforcement

Even streaming responses are monitored in real-time. If budget is exceeded mid-stream, CostShield terminates the SSE connection immediately — no runaway streaming costs.
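One way to picture mid-stream enforcement is a generator that meters each chunk and stops forwarding once the budget is crossed. The chunk shape (`tokens`, `data`), the error event, and the function name are assumptions for this sketch; a real proxy would also close the upstream connection so the provider stops generating.

```python
def enforce_stream(chunks, budget_micro: int, cost_per_token_micro: int):
    """Yield chunks until cumulative cost would exceed budget_micro.

    `chunks` is any iterable of dicts with a "tokens" count (hypothetical shape).
    """
    spent = 0
    for chunk in chunks:
        spent += chunk["tokens"] * cost_per_token_micro
        if spent > budget_micro:
            # Do not forward the chunk that breaks the budget; signal and stop.
            yield {"event": "error", "data": "budget_exceeded"}
            return
        yield chunk


stream = [
    {"tokens": 10, "data": "The"},
    {"tokens": 10, "data": " quick"},
    {"tokens": 10, "data": " fox"},
]
for item in enforce_stream(stream, budget_micro=25, cost_per_token_micro=1):
    print(item)
```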

Multi-Tenant Hierarchy

Organization → Project → API Key. Per-key budgets, per-project limits, per-org spending caps. Plan-based rate limits with full isolation between tenants.
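The hierarchy implies that a charge must fit under the cap at every level, not just the key. A small sketch, with a hypothetical `Scope` type and `charge` helper and made-up caps in micro-USD:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    name: str
    cap_micro: int
    spent_micro: int = 0
    parent: Optional["Scope"] = None


def charge(key: Scope, cost_micro: int) -> Optional[str]:
    """Charge the key and every ancestor, or charge nothing.

    Returns None on success, or the name of the first scope whose cap
    would be exceeded.
    """
    chain = []
    node = key
    while node is not None:
        if node.spent_micro + cost_micro > node.cap_micro:
            return node.name
        chain.append(node)
        node = node.parent
    for node in chain:
        node.spent_micro += cost_micro
    return None


org = Scope("org", cap_micro=100)
project = Scope("project", cap_micro=60, parent=org)
key_a = Scope("key-a", cap_micro=50, parent=project)
key_b = Scope("key-b", cap_micro=50, parent=project)

print(charge(key_a, 40))  # fits at every level
print(charge(key_b, 30))  # key-b has room, but the project cap does not
```

Checking parent scopes is also what would stop spend that is spread thinly across many keys.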

Architecture

The 12-Step Request Pipeline

Every AI API call passes through 12 security and cost-control stages. All in Rust. All under a millisecond.

Your App

CostShield

AI Provider

Authenticate

Validate x-costshield-key, resolve organization, project, and API key. Check permissions and plan limits.

Detect Provider

Auto-detect which AI provider is being targeted from the request path and headers. Route to correct adapter.

Estimate Cost

Pre-compute expected cost from token estimation, model pricing, and billing type (per-token, per-image, per-character, per-second).
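For the per-token case, a pre-flight estimate might look like the sketch below. The price table, the model name, and the 4-characters-per-token heuristic are placeholders rather than CostShield's pricing data or tokenizer; the other billing types (per-image, per-character, per-second) would need their own formulas.

```python
# Hypothetical prices in micro-USD per 1M tokens ($2 input, $8 output).
PRICES_MICRO_PER_MTOK = {
    "example-model": {"input": 2_000_000, "output": 8_000_000},
}


def estimate_tokens(text: str) -> int:
    # Crude heuristic: about 4 characters per token. A real tokenizer differs.
    return max(1, len(text) // 4)


def estimate_cost_micro(model: str, prompt: str, max_tokens: int) -> int:
    """Worst-case cost in micro-USD, assuming the model emits max_tokens."""
    price = PRICES_MICRO_PER_MTOK[model]
    input_tokens = estimate_tokens(prompt)
    total = input_tokens * price["input"] + max_tokens * price["output"]
    return -(-total // 1_000_000)  # integer ceiling division


cost = estimate_cost_micro("example-model", "x" * 4000, max_tokens=500)
print(cost, "micro-USD")
```

Estimating against `max_tokens` deliberately overshoots, so that a later reconciliation step would usually release budget rather than charge extra.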

Check Budget

Compare estimated cost against per-minute, hourly, daily, and monthly budget limits. Block or degrade if exceeded.

Circuit Breaker

Evaluate 5-state financial circuit breaker. Escalate through Warn → Critical → Emergency → Open based on budget utilization.

Loop Detection

Graph-based cycle detection for agentic workflows. Catches recursive call chains before they become infinite cost spirals.

Check Cache

Moka LRU semantic cache lookup. If hit: return cached response instantly, zero tokens consumed, zero cost.

Reserve Cost

Provisionally reserve the estimated cost from the key's budget. Prevents over-commitment during concurrent requests.

Forward Request

Proxy the request to the upstream AI provider. Support for both standard HTTP and SSE streaming responses.

Parse Response

Extract actual usage metrics from the provider's response. Tokens consumed, model used, finish reason, latency.

True-Up Cost

Reconcile actual cost vs. estimated. Release excess reservation or charge additional. Update sliding window buckets.
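The reserve and true-up steps fit together as a two-phase accounting pattern. This single-threaded sketch uses a hypothetical `KeyBudget` class with amounts in micro-USD; the concurrent, atomic version and the sliding-window bucket update are out of scope here.

```python
class KeyBudget:
    def __init__(self, limit_micro: int):
        self.limit_micro = limit_micro
        self.committed_micro = 0  # settled spend from finished requests
        self.reserved_micro = 0   # estimates held by in-flight requests

    def reserve(self, estimate_micro: int) -> bool:
        """Hold the estimate if committed + reserved spend stays within the limit."""
        if self.committed_micro + self.reserved_micro + estimate_micro > self.limit_micro:
            return False
        self.reserved_micro += estimate_micro
        return True

    def true_up(self, estimate_micro: int, actual_micro: int) -> None:
        """Replace a held estimate with the actual cost from the provider response."""
        self.reserved_micro -= estimate_micro
        self.committed_micro += actual_micro


budget = KeyBudget(limit_micro=10_000)
print(budget.reserve(6_000))   # first request holds 6,000
print(budget.reserve(6_000))   # second concurrent request is refused
budget.true_up(6_000, 2_500)   # first request actually cost 2,500
print(budget.reserve(6_000))   # now there is room again
```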

Return Response

Add x-costshield-cost, x-costshield-budget-remaining headers. Return the AI response to the application.

Universal Support

15+ providers. 50+ models. One proxy.

From OpenAI to self-hosted Ollama — CostShield understands every provider's pricing model, billing type, and API format. Unified cost control for all.

OpenAI

Anthropic

Google Gemini

DeepSeek

Mistral

Groq

Together AI

Fireworks AI

Cohere

Perplexity

xAI (Grok)

AWS Bedrock

Azure OpenAI

OpenRouter

Replicate

ElevenLabs

Self-Hosted / Ollama

Per Token · Per Token (Tiered) · Per Image · Per Character · Per Second (Audio) · Per Compute Time · Per Request

Battle-Tested

Validated against 7 attack vectors

CostShield ships with a built-in DoW Attack Simulator that throws 7 distinct attack patterns at your gateway. Every defense is proven, not theoretical.

Volume Flood

10,000 requests in rapid succession to overwhelm budget tracking.

Prompt Inflation

100K-token payloads designed to maximize per-request cost.

Agentic Loop

Recursive agent chains that trigger infinite call cascades.

Model Exploitation

Targeting the most expensive models (GPT-4.1, Claude Opus) to drain budgets fast.

Gradual Ramp

Slow cost escalation that stays below anomaly thresholds until it's too late.

Burst Storm

Short, intense bursts designed to slip through rate-limit windows.

Multi-Key Scatter

Distributed attacks across multiple API keys to evade per-key detection.

cargo run -p dow-simulator --release

✔ Volume Flood — BLOCKED — 10,000 requests, $0.00 leaked

✔ Prompt Inflation — BLOCKED — 100K tokens/req, budget enforced

✔ Agentic Loop — BLOCKED — cycle detected at depth 3

✔ Model Exploitation — DEGRADED — GPT-4.1 → GPT-4.1-nano

✔ Gradual Ramp — DETECTED — CUSUM anomaly at request #47

✔ Burst Storm — BLOCKED — $/min budget exceeded

✔ Multi-Key Scatter — BLOCKED — org-level budget enforced

Result: 7/7 attacks neutralized. $0.00 total leakage.

Built Different

Rust-powered. Production-grade.

Gateway Proxy

Rust + Hyper + Tokio

HTTP/1.1 reverse proxy with SSE streaming support. Full 12-step pipeline. Sub-millisecond overhead. Lock-free concurrency with DashMap and atomic CAS.

Management API

Rust + Axum 0.8

RESTful CRUD for orgs, projects, keys, and alerts. Real-time WebSocket cost events. Prometheus metrics endpoint for observability.

Dashboard

Next.js 15 + React 19 + Tailwind 4

Real-time cost monitoring, budget health visualization, provider analytics, anomaly alert feed. WebSocket-powered live updates.

Persistence

PostgreSQL 16 + TimescaleDB

Write-through cache: DashMap in-memory + async background writes. TimescaleDB hypertables for time-series cost data. Full migration system.

Developer Experience

SDKs for every stack

Native SDKs for Python, Node.js, and Go. Full API coverage for budget management, usage analytics, and alert configuration.

pip install costshield

from costshield import CostShieldClient

client = CostShieldClient(
    "http://localhost:3000",
    "your-admin-key"
)

# Check budget status
budget = client.get_budget_status()
print(f"Daily spend: ${budget.daily_spent_usd}")
print(f"Remaining:   ${budget.daily_remaining_usd}")

# Set alert
client.create_alert(
    threshold_usd=50.0,
    channel="slack"
)

npm install @costshield/sdk

import { CostShieldClient } from '@costshield/sdk'

const client = new CostShieldClient(
  'http://localhost:3000',
  'your-admin-key'
)

// Check budget status
const budget = await client.getBudgetStatus()
console.log(`Daily spend: $${budget.dailySpentUsd}`)

// Real-time cost stream
client.onCostEvent((event) => {
  console.log(`$${event.cost_usd} — ${event.model}`)
})

go get github.com/san-techie21/astra-costshield/sdks/go

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/san-techie21/astra-costshield/sdks/go/costshield"
)

func main() {
    client := costshield.NewClient(
        "http://localhost:3000",
        "your-admin-key",
    )

    // Check budget status
    ctx := context.Background()
    budget, err := client.GetBudgetStatus(ctx)
    if err != nil {
        log.Fatalf("get budget status: %v", err)
    }
    fmt.Printf("Daily spend: $%.2f\n", budget.DailySpentUsd)
}

Deploy Anywhere

Docker. Kubernetes. Terraform. Your call.

Docker

docker compose up -d

Kubernetes

helm install costshield

Terraform

terraform apply

From Source

cargo run -p costshield-gateway

Pricing

Open source core. Free forever.

Self-host the full engine at zero cost. Managed cloud coming soon.

Community

Free forever / self-hosted

Full 12-step proxy pipeline
Budget-denominated rate limiting
4-algorithm anomaly detection
Financial circuit breaker
Graceful model degradation
Semantic response cache
Agentic loop detection
15+ provider adapters
Multi-tenant hierarchy
Dashboard + REST API
Docker + Helm + Terraform
Python, Node.js, Go SDKs

Get Started Free

Coming Soon

Cloud

TBD

managed service

Everything in Community
Managed infrastructure
Global edge deployment
SSO / SAML authentication
Slack & PagerDuty alerts
Custom SLAs
Dedicated support
SOC 2 compliance
Audit logging
