This document describes the system architecture, data flow, components, and security model of Metrx.
System Overview
Metrx is a distributed system for real-time cost tracking and outcome management of AI agents:
┌─────────────────────────────────────────────────────────────────┐
│ Client Applications (Your Code) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Metrx SDK / HTTP Client │ │
│ │ - Routes requests through Gateway │ │
│ │ - Adds authentication & tracking headers │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────┬──────────────────────────────────┘
│
HTTP/JSON
│
┌──────────────────────────────▼──────────────────────────────────┐
│ Metrx Gateway (Cloudflare Worker) │
│ Runs on edge, low added latency │
├──────────────────────────────────────────────────────────────────┤
│ 1. Authenticate API key (KV cache → Supabase) │
│ 2. Check usage limits against monthly quota │
│ 3. Extract custom headers (X-Agent-ID, X-Session-ID, etc.) │
│ 4. Route to correct LLM provider (OpenAI, Anthropic, etc.) │
│ 5. Proxy request with provider auth │
│ 6. Stream response back to client │
│ 7. Calculate cost from token counts │
│ 8. Queue event to Redis (fire-and-forget) │
└──────────────────────────────┬──────────────────────────────────┘
│
┌─────────────────────┴──────────────┬──────────────┐
│ │ │
HTTP/JSON HTTPS │
│ │ │
┌────▼────────────────────────┐ ┌───────▼──────┐ ┌────▼──────┐
│ LLM Providers │ │ Cloudflare KV│ │ Upstash │
├─────────────────────────────┤ │ (API Key │ │ Redis │
│ - OpenAI │ │ Cache) │ │ (Event │
│ - Anthropic │ └──────────────┘ │ Queue) │
│ - Google (Gemini) │ └───────┬────┘
│ - xAI (Grok) │ │
│ - Others │ Event Stream
└─────────────────────────────┘ │
│
┌────────────────────────────────────────────────────▼─────┐
│ Worker Processes (BullMQ) │
├──────────────────────────────────────────────────────────┤
│ 1. Consume events from Redis queue │
│ 2. Enrich event data (org lookup, pricing) │
│ 3. Write to Supabase (events, usage counters) │
│ 4. Trigger webhooks │
│ 5. Update real-time metrics │
│ 6. Process inferred outcomes │
└──────────────────┬───────────────────────────────────────┘
│
PostgreSQL/HTTPS
│
┌──────────────────▼──────────────────┐
│ Supabase (PostgreSQL + Auth) │
├───────────────────────────────────────┤
│ Tables: │
│ - organizations │
│ - api_keys │
│ - events (LLM calls) │
│ - sessions │
│ - outcomes │
│ - monthly_usage_counters │
│ - webhooks │
└──────────────────┬──────────────────┘
│
┌───────────────────────────▼──────────────────────────┐
│ Web Dashboard (Next.js) │
├───────────────────────────────────────────────────────┤
│ - Real-time cost dashboards │
│ - Agent & session metrics │
│ - Team management │
│ - Webhook configuration │
│ - Billing integration (Stripe) │
└───────────────────────────────────────────────────────┘
Component Details
1. Gateway (Cloudflare Worker)
Purpose: Transparent proxy for LLM API calls with low added latency (see Latency Budget below).
Location: /apps/gateway
Key Features:
- Runs globally on Cloudflare edge (low latency)
- API key authentication with KV caching
- Transparent request forwarding (OpenAI API compatible)
- Token usage extraction & cost calculation
- Real-time event queueing to Redis
Latency Budget: <100ms p95 added latency
Critical Dependencies:
- Cloudflare KV (API key cache)
- Upstash Redis (event queue)
- Supabase (org lookup on cache miss)
- LLM provider APIs (OpenAI, Anthropic, Google, xAI)
Failure Modes:
- Auth cache miss + Redis down: Returns 503 (won’t process requests)
- Provider timeout: Returns 504 with retry information
- Request size > 10MB: Returns 413
- Rate limit exceeded: Returns 429
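The failure modes above amount to a lookup from failure kind to HTTP status. The following is a minimal sketch of that mapping, not the gateway's actual code: the `GatewayFailure` names, `statusFor`, and `checkRequestSize` are hypothetical, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```typescript
// Hypothetical failure kinds; the names are illustrative, not Metrx identifiers.
type GatewayFailure =
  | "auth_unavailable" // auth cache miss while a required dependency is down
  | "provider_timeout"
  | "request_too_large"
  | "rate_limited";

// Assumption: "10MB" is interpreted as 10 * 1024 * 1024 bytes.
const MAX_REQUEST_BYTES = 10 * 1024 * 1024;

// Status codes as listed under Failure Modes.
const STATUS_BY_FAILURE: Record<GatewayFailure, number> = {
  auth_unavailable: 503,
  provider_timeout: 504,
  request_too_large: 413,
  rate_limited: 429,
};

function statusFor(failure: GatewayFailure): number {
  return STATUS_BY_FAILURE[failure];
}

// Returns the failure for an oversized body, or null if the size is acceptable.
function checkRequestSize(contentLengthBytes: number): GatewayFailure | null {
  return contentLengthBytes > MAX_REQUEST_BYTES ? "request_too_large" : null;
}
```

Keeping the mapping in one table makes it easy to check that every failure kind has exactly one status code.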
2. Web Dashboard (Next.js)
Purpose: User interface for cost tracking, team management, and outcome tracking.
Location: /apps/web
Key Features:
- Real-time cost dashboards (powered by Supabase subscriptions)
- Agent and session drill-down
- Team/user management (via Clerk)
- Outcome tracking and business ROI calculation
- Billing (Stripe integration)
- Webhook management
Authentication: Clerk (OAuth + JWT)
Database Access: Supabase client (RLS enforced)
3. Worker Processes (BullMQ)
Purpose: Asynchronous processing of events from the event queue.
Location: /workers (or separate service)
Key Features:
- Event dequeuing from Redis
- Database writes (Supabase)
- Webhook dispatching
- Monthly usage counter updates
- Session aggregation
Job Queue: BullMQ (Redis-backed)
Scaling: Horizontally scalable (multiple worker instances)
Processing Latency: P99 <5 seconds from event creation to database
4. Database (Supabase / PostgreSQL)
Purpose: Durable storage for organizations, events, sessions, outcomes, and configuration.
Key Tables:
organizations
CREATE TABLE organizations (
id UUID PRIMARY KEY,
name VARCHAR,
tier VARCHAR, -- starter, lite, pro, business, enterprise
is_active BOOLEAN,
monthly_call_limit INTEGER,
stripe_customer_id VARCHAR,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
api_keys
CREATE TABLE api_keys (
id UUID PRIMARY KEY,
org_id UUID REFERENCES organizations(id),
key_hash VARCHAR UNIQUE, -- SHA-256(api_key)
name VARCHAR,
last_used_at TIMESTAMP,
created_at TIMESTAMP,
is_active BOOLEAN
);
events
CREATE TABLE events (
id UUID PRIMARY KEY,
org_id UUID REFERENCES organizations(id),
agent_key VARCHAR, -- custom identifier
provider VARCHAR, -- openai, anthropic, google, xai
model VARCHAR,
input_tokens INTEGER,
output_tokens INTEGER,
cost_microcents INTEGER, -- 1/1,000,000th of a cent
latency_ms INTEGER,
status VARCHAR, -- success, error
error_message VARCHAR,
occurred_at TIMESTAMP,
session_id VARCHAR, -- nullable, for grouping
customer_id VARCHAR, -- nullable, for multi-tenant
gateway_request_id VARCHAR, -- for tracing
is_streaming BOOLEAN,
is_cached BOOLEAN,
input_length INTEGER,
output_length INTEGER,
created_at TIMESTAMP
);
-- Indexes for fast queries
CREATE INDEX idx_events_org_id ON events(org_id);
CREATE INDEX idx_events_occurred_at ON events(occurred_at);
CREATE INDEX idx_events_agent_key ON events(agent_key);
CREATE INDEX idx_events_session_id ON events(session_id);
sessions
CREATE TABLE sessions (
id UUID PRIMARY KEY,
org_id UUID REFERENCES organizations(id),
agent_key VARCHAR,
session_id VARCHAR,
status VARCHAR, -- active, completed
started_at TIMESTAMP,
completed_at TIMESTAMP,
event_count INTEGER,
total_cost_microcents INTEGER,
total_latency_ms INTEGER,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
outcomes
CREATE TABLE outcomes (
id UUID PRIMARY KEY,
org_id UUID REFERENCES organizations(id),
session_id UUID REFERENCES sessions(id), -- type must match sessions.id (UUID)
name VARCHAR,
confirmed BOOLEAN,
confirmed_by VARCHAR, -- user_id
confirmed_at TIMESTAMP,
metadata JSONB, -- arbitrary outcome data
created_at TIMESTAMP,
updated_at TIMESTAMP
);
monthly_usage_counters
CREATE TABLE monthly_usage_counters (
id UUID PRIMARY KEY,
org_id UUID REFERENCES organizations(id),
year INTEGER,
month INTEGER,
call_count INTEGER,
total_cost_microcents INTEGER,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
webhooks
CREATE TABLE webhooks (
id UUID PRIMARY KEY,
org_id UUID REFERENCES organizations(id),
url VARCHAR,
events VARCHAR[], -- ['event.created', 'outcome.confirmed']
secret VARCHAR, -- HMAC secret for signature
is_active BOOLEAN,
last_triggered_at TIMESTAMP,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
Row-Level Security (RLS): All tables use RLS policies to ensure users only access their organization’s data.
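The cost columns in the tables above store integer microcents. The sketch below shows one way such a value could be derived from token counts. It assumes that a microcent is one millionth of a cent (so 1 USD = 100,000,000 microcents) and that prices are quoted in USD per million tokens. `ModelPricing`, `costMicrocents`, and the prices used in the usage note are hypothetical.

```typescript
// Assumption: 1 microcent = 1e-6 cents, so 1 USD = 100 cents = 100,000,000 microcents.
const MICROCENTS_PER_USD = 100_000_000;

// Hypothetical pricing shape: USD per one million tokens.
interface ModelPricing {
  inputUsdPerMTok: number;
  outputUsdPerMTok: number;
}

// Converts token counts to an integer microcent amount, rounding to the nearest unit.
function costMicrocents(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing,
): number {
  const usd =
    (inputTokens * pricing.inputUsdPerMTok + outputTokens * pricing.outputUsdPerMTok) /
    1_000_000;
  return Math.round(usd * MICROCENTS_PER_USD);
}

function microcentsToUsd(microcents: number): number {
  return microcents / MICROCENTS_PER_USD;
}
```

With made-up prices of 3 and 15 USD per million tokens, 1,000 input tokens and 500 output tokens cost 0.0105 USD, or 1,050,000 microcents. Under the same assumption, a 32-bit PostgreSQL INTEGER tops out at roughly 21.47 USD, so aggregate columns may need a wider type such as BIGINT.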
Data Flow
Request Path (Real-Time, Synchronous)
Client Request
↓
Gateway receives request
↓
Authenticate API key
├─ Check KV cache (fast path, 90% hit rate)
└─ Miss? Query Supabase + cache result
↓
Check usage limits
├─ Query monthly_usage_counters for org
└─ Reject if over limit (return 429)
↓
Extract custom headers (X-Agent-ID, X-Session-ID, etc.)
↓
Parse request & resolve provider
↓
Forward to LLM provider API
├─ Stream response back to client
└─ Capture token usage from final chunk
↓
Calculate cost (from model pricing + tokens)
↓
Queue event to Redis (fire-and-forget, `<1ms`)
↓
Return response to client
Total Latency to Client: Provider latency + <100ms gateway overhead
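The authentication and usage-limit steps of the request path can be sketched as a cache-aside lookup followed by a quota check. This is an illustration under assumptions, not the gateway's implementation. A `Map` stands in for Cloudflare KV and another for the Supabase lookup, `KeyAuthenticator` and `preflightStatus` are hypothetical names, returning 401 for an unknown key is an assumption, and so is treating "over limit" as `>=`.

```typescript
interface OrgRecord {
  orgId: string;
  monthlyCallLimit: number;
}

interface CacheEntry {
  org: OrgRecord;
  expiresAt: number; // epoch milliseconds
}

const CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour, matching the documented cache lifetime

class KeyAuthenticator {
  private cache = new Map<string, CacheEntry>(); // stand-in for Cloudflare KV
  private db: Map<string, OrgRecord>; // stand-in for the Supabase lookup
  dbLookups = 0; // counts slow-path queries, for illustration only

  constructor(db: Map<string, OrgRecord>) {
    this.db = db;
  }

  authenticate(keyHash: string, now: number): OrgRecord | null {
    const hit = this.cache.get(keyHash);
    if (hit && hit.expiresAt > now) return hit.org; // fast path: cache hit
    this.dbLookups += 1; // slow path: query the database, then cache the result
    const org = this.db.get(keyHash) ?? null;
    if (org) this.cache.set(keyHash, { org, expiresAt: now + CACHE_TTL_MS });
    return org;
  }
}

// Status the gateway would answer with before proxying to a provider.
function preflightStatus(org: OrgRecord | null, callsThisMonth: number): number {
  if (!org) return 401; // assumption: unknown keys are rejected with 401
  if (callsThisMonth >= org.monthlyCallLimit) return 429; // over quota
  return 200;
}
```

Passing `now` in explicitly keeps the sketch deterministic, which makes the expiry behaviour easy to exercise in tests.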
Event Processing Path (Async, Background)
Event in Redis queue
↓
BullMQ worker dequeues event
↓
Enrich event metadata
└─ Org lookup, pricing verification
↓
Write to Supabase
├─ Insert into events table
└─ Increment monthly_usage_counters
↓
Update session aggregates (if session_id present)
↓
Trigger webhooks (HTTP POST to registered URLs)
↓
Process inferred outcomes (ML-based)
↓
Update real-time metrics for dashboard
Total Latency: P99 <5 seconds
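The write steps of the event processing path (insert the event, increment the monthly counter, update session aggregates) can be sketched with in-memory stand-ins for the tables. This is a simplified illustration and not the BullMQ job handler itself. `InMemoryStore` and `processEvent` are hypothetical, and bucketing events by UTC month is an assumption.

```typescript
interface UsageEvent {
  orgId: string;
  sessionId?: string; // optional, mirrors the nullable session_id column
  costMicrocents: number;
  latencyMs: number;
  occurredAt: Date;
}

interface UsageCounter {
  callCount: number;
  totalCostMicrocents: number;
}

interface SessionAggregate {
  eventCount: number;
  totalCostMicrocents: number;
  totalLatencyMs: number;
}

// In-memory stand-in for the events, monthly_usage_counters and sessions tables.
class InMemoryStore {
  events: UsageEvent[] = [];
  counters = new Map<string, UsageCounter>();
  sessions = new Map<string, SessionAggregate>();
}

function processEvent(e: UsageEvent, store: InMemoryStore): void {
  store.events.push(e); // insert into events

  // Increment the per-org counter for the event's UTC year and month.
  const month = e.occurredAt.getUTCMonth() + 1;
  const counterKey = `${e.orgId}:${e.occurredAt.getUTCFullYear()}-${month}`;
  const counter = store.counters.get(counterKey) ?? { callCount: 0, totalCostMicrocents: 0 };
  counter.callCount += 1;
  counter.totalCostMicrocents += e.costMicrocents;
  store.counters.set(counterKey, counter);

  // Update session aggregates only when a session id is present.
  if (e.sessionId !== undefined) {
    const sessionKey = `${e.orgId}:${e.sessionId}`;
    const session = store.sessions.get(sessionKey) ?? {
      eventCount: 0,
      totalCostMicrocents: 0,
      totalLatencyMs: 0,
    };
    session.eventCount += 1;
    session.totalCostMicrocents += e.costMicrocents;
    session.totalLatencyMs += e.latencyMs;
    store.sessions.set(sessionKey, session);
  }
}
```

In a real worker these updates would be database writes, and the counter increment would need to be atomic because several worker instances may process events for the same organization concurrently.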
Outcome Tracking Path
Business outcome occurs (e.g., customer satisfied)
↓
Dashboard user confirms outcome (or via API)
↓
Write to outcomes table
↓
Trigger outcome.confirmed webhook
↓
Calculate session ROI
├─ Total cost for session
└─ Assign value to outcome
↓
Update dashboards & reports
Security Model
API Key Security
- Generation: Client generates random 32-byte key
- Storage: Only SHA-256 hash stored in database
- Transmission: Over TLS 1.3 only
- Caching: Hashed key cached in Cloudflare KV for 1 hour
- Rotation: Users can rotate keys anytime; old key invalidated immediately
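The generation and storage rules above can be illustrated with a short sketch: a key made of 32 random bytes, with only its SHA-256 digest kept for lookup. It uses Node's `node:crypto` for brevity, and an edge runtime would more likely use the Web Crypto API. The hex encoding and the function names `generateApiKey` and `hashApiKey` are assumptions.

```typescript
import { createHash, randomBytes } from "node:crypto";

// 32 random bytes, hex-encoded (64 characters). The encoding is an assumption.
function generateApiKey(): string {
  return randomBytes(32).toString("hex");
}

// Only this digest would be stored (api_keys.key_hash) or cached.
function hashApiKey(apiKey: string): string {
  return createHash("sha256").update(apiKey).digest("hex");
}
```

Because the hash is deterministic, the gateway can hash an incoming key and look the digest up without ever storing the plaintext key.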
Network Security
- TLS 1.3: All external communication encrypted
- HTTPS Only: Gateway rejects HTTP requests
- CORS: Restricted to configured origins (default: open in dev, restricted in prod)
- Rate Limiting: Per-org limits + per-IP burst limits
- Request Size: Max 10MB per request
Data Isolation
- Row-Level Security: Supabase RLS policies enforce org isolation
- API Keys: Scoped to single org
- Webhooks: Only receive events from their org
- Dashboard: Users only see their org’s data
Authentication & Authorization
Gateway: API key + org_id lookup
Web Dashboard: Clerk OAuth + JWT session
Webhooks: HMAC-SHA256 signature verification
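A webhook receiver can verify the HMAC-SHA256 signature roughly as follows. This is a sketch under assumptions: the document does not specify how the signature is encoded or transported, so hex encoding is assumed here, and `signPayload` and `verifySignature` are hypothetical names. It uses Node's `node:crypto`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the hex-encoded HMAC-SHA256 of the raw request body.
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Recomputes the signature and compares it in constant time.
function verifySignature(secret: string, body: string, signatureHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The signature should be computed over the raw body bytes as received, since re-serializing parsed JSON can change the bytes and invalidate the signature.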
Performance Characteristics
Latency (p95)
| Operation | Latency | Notes |
|---|---|---|
| Health check | 1ms | No auth required |
| Auth (cache hit) | 5ms | KV lookup |
| Auth (cache miss) | 50ms | Supabase query + caching |
| Chat completion | Provider + 50ms | Gateway overhead |
| Streaming (end-to-end) | Provider | Transparent passthrough |
| Event logging | 1ms | Fire-and-forget to Redis |
| Event processing | <5s | BullMQ worker P99 |
Throughput
| Tier | RPM Limit | Rationale |
|---|---|---|
| Starter | 10 | Free tier, no burst |
| Lite | 100 | 10K calls/mo ÷ 30d ÷ 24h ÷ 60m ≈ 0.23 RPM avg |
| Pro | 500 | 100K calls/mo ÷ 30d ÷ 24h ÷ 60m ≈ 2.3 RPM avg |
| Business | 5,000 | 1M calls/mo ÷ 30d ÷ 24h ÷ 60m ≈ 23 RPM avg |
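The relationship between a monthly quota and an average requests-per-minute figure can be worked through in a few lines. The sketch assumes a 30-day month (43,200 minutes). `averageRpm` and `burstHeadroom` are hypothetical helpers, and the limits are copied from the table.

```typescript
// Assumption: a 30-day month, i.e. 30 * 24 * 60 = 43,200 minutes.
const MINUTES_PER_MONTH = 30 * 24 * 60;

// RPM limits per tier, as listed in the Throughput table.
const RPM_LIMITS = { starter: 10, lite: 100, pro: 500, business: 5000 } as const;
type Tier = keyof typeof RPM_LIMITS;

// Average request rate if a monthly quota were spread evenly over the month.
function averageRpm(callsPerMonth: number): number {
  return callsPerMonth / MINUTES_PER_MONTH;
}

// How many times the average rate the tier's RPM limit allows.
function burstHeadroom(tier: Tier, callsPerMonth: number): number {
  return RPM_LIMITS[tier] / averageRpm(callsPerMonth);
}
```

For a quota of 10,000 calls per month the average is about 0.23 RPM, so a limit of 100 RPM leaves roughly 432 times the average rate available for bursts.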
Scalability
- Gateway: Horizontally scalable across Cloudflare edge locations
- Workers: Horizontally scalable (add more worker instances)
- Database: Supabase handles auto-scaling; events table can store years of data
- Redis: Upstash Redis auto-scales; queue typically empty (sub-second processing)
Integration Points
LLM Providers
The Gateway proxies to:
- OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo, etc.
- Anthropic: Claude 3 (Opus, Sonnet, Haiku)
- Google: Gemini Pro
- xAI: Grok
Each provider has its own authentication & API format, abstracted by the gateway.
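One plausible way to abstract over providers is to resolve the provider from the requested model name. The document does not say how the gateway actually routes requests, so the prefix table and `resolveProvider` below are purely illustrative assumptions.

```typescript
type Provider = "openai" | "anthropic" | "google" | "xai";

// Assumed model-name prefixes; the gateway's real routing rules may differ.
const MODEL_PREFIXES: ReadonlyArray<readonly [string, Provider]> = [
  ["gpt-", "openai"],
  ["claude-", "anthropic"],
  ["gemini-", "google"],
  ["grok-", "xai"],
];

// Returns the provider for a model name, or null when no prefix matches.
function resolveProvider(model: string): Provider | null {
  const normalized = model.toLowerCase();
  for (const [prefix, provider] of MODEL_PREFIXES) {
    if (normalized.startsWith(prefix)) return provider;
  }
  return null;
}
```

Returning `null` for unknown models lets the caller reject the request explicitly instead of silently forwarding it to a default provider.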
Billing Integration
- Stripe: Monthly subscription billing + usage-based overage charges
- Webhook: Triggered when usage exceeds plan limit
Observability
- Sentry: Error tracking for gateway and workers
- OpenTelemetry: Traces exported to your observability backend
- Webhooks: Real-time event stream for custom logging
Deployment Architecture
Production
Cloudflare (Global CDN)
↓
├─ Gateway (Cloudflare Workers)
├─ KV (API key cache)
└─ Pages (Static assets for docs)
AWS / Vercel
├─ Web Dashboard (Next.js)
└─ Worker processes (EC2 / Vercel Functions)
Supabase (Managed PostgreSQL)
└─ All persistent data
Upstash (Managed Redis)
└─ Event queue
Stripe
└─ Billing & payments
Clerk
└─ Authentication & user management
Self-Hosting
See Self-Hosting Guide for details on running on your infrastructure.
Monitoring & Observability
Key Metrics
Real-Time (Dashboard):
- Cost per agent (last 24h)
- Call volume by model
- Avg latency
- Error rate
Historical (Reports):
- Cost trends (daily, weekly, monthly)
- Model usage distribution
- Customer chargeback calculation
- Outcome success rate & ROI
Alerts
- Cost spike detected (> threshold)
- Rate limit approached (> 80% of quota)
- Provider error rate spike
- Webhook delivery failures
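Two of the alerts above reduce to simple threshold checks, sketched below. `UsageSnapshot` and `evaluateAlerts` are hypothetical, the cost threshold is assumed to be a per-organization setting, and the 80% ratio comes from the list.

```typescript
type Alert = "cost_spike" | "rate_limit_approached";

interface UsageSnapshot {
  callsThisMonth: number;
  monthlyCallLimit: number;
  costLast24hMicrocents: number;
  costThresholdMicrocents: number; // hypothetical per-org setting
}

const QUOTA_WARN_RATIO = 0.8; // alert above 80% of quota

// Returns the alerts that apply to the given snapshot.
function evaluateAlerts(s: UsageSnapshot): Alert[] {
  const alerts: Alert[] = [];
  if (s.costLast24hMicrocents > s.costThresholdMicrocents) {
    alerts.push("cost_spike");
  }
  if (s.callsThisMonth > QUOTA_WARN_RATIO * s.monthlyCallLimit) {
    alerts.push("rate_limit_approached");
  }
  return alerts;
}
```

Provider error-rate spikes and webhook delivery failures would need time-series data and delivery logs, so they are left out of this sketch.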
Future Enhancements
- Model fine-tuning recommendations
- Automatic prompt optimization
- Budget forecasting & alerts
- Team collaboration features
- Advanced analytics & ML-powered insights
Next Steps: See Self-Hosting Guide for deployment details or API Reference for integration.