Architecture

This document describes the system architecture, data flow, components, and security model of Metrx.

System Overview

Metrx is a distributed system for real-time cost tracking and outcome management of AI agents:

┌─────────────────────────────────────────────────────────────────┐
│ Client Applications (Your Code)                                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Metrx SDK / HTTP Client                                  │   │
│  │ - Routes requests through Gateway                        │   │
│  │ - Adds authentication & tracking headers                 │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                   │
└──────────────────────────────┬──────────────────────────────────┘

                        HTTP/JSON

┌──────────────────────────────▼──────────────────────────────────┐
│ Metrx Gateway (Cloudflare Worker)                                │
│ Runs on edge, <100ms p95 added latency                           │
├──────────────────────────────────────────────────────────────────┤
│ 1. Authenticate API key (KV cache → Supabase)                    │
│ 2. Check usage limits against monthly quota                      │
│ 3. Extract custom headers (X-Agent-ID, X-Session-ID, etc.)       │
│ 4. Route to correct LLM provider (OpenAI, Anthropic, etc.)       │
│ 5. Proxy request with provider auth                              │
│ 6. Stream response back to client                                │
│ 7. Calculate cost from token counts                              │
│ 8. Queue event to Redis (fire-and-forget)                        │
└──────────────────────────────┬──────────────────────────────────┘

         ┌─────────────────────┴──────────────┬──────────────┐
         │                                    │              │
    HTTP/JSON                          HTTPS                │
         │                                    │              │
    ┌────▼────────────────────────┐  ┌───────▼──────┐  ┌────▼──────┐
    │ LLM Providers               │  │ Cloudflare KV│  │ Upstash   │
    ├─────────────────────────────┤  │ (API Key     │  │ Redis     │
    │ - OpenAI                    │  │  Cache)      │  │ (Event    │
    │ - Anthropic                 │  └──────────────┘  │  Queue)   │
    │ - Google (Gemini)           │                    └───────┬────┘
    │ - xAI (Grok)                │                           │
    │ - Others                    │                   Event Stream
    └─────────────────────────────┘                           │

         ┌────────────────────────────────────────────────────▼─────┐
         │ Worker Processes (BullMQ)                                 │
         ├──────────────────────────────────────────────────────────┤
         │ 1. Consume events from Redis queue                       │
         │ 2. Enrich event data (org lookup, pricing)               │
         │ 3. Write to Supabase (events, usage counters)            │
         │ 4. Trigger webhooks                                      │
         │ 5. Update real-time metrics                              │
         │ 6. Process inferred outcomes                             │
         └──────────────────┬───────────────────────────────────────┘

                   PostgreSQL/HTTPS

         ┌──────────────────▼──────────────────┐
         │ Supabase (PostgreSQL + Auth)         │
         ├───────────────────────────────────────┤
         │ Tables:                               │
         │ - organizations                       │
         │ - api_keys                            │
         │ - events (LLM calls)                  │
         │ - sessions                            │
         │ - outcomes                            │
         │ - monthly_usage_counters              │
         │ - webhooks                            │
         └──────────────────┬──────────────────┘

┌───────────────────────────▼──────────────────────────┐
│ Web Dashboard (Next.js)                              │
├───────────────────────────────────────────────────────┤
│ - Real-time cost dashboards                          │
│ - Agent & session metrics                            │
│ - Team management                                    │
│ - Webhook configuration                              │
│ - Billing integration (Stripe)                       │
└───────────────────────────────────────────────────────┘

Component Details

1. Gateway (Cloudflare Worker)

Purpose: Transparent proxy for LLM API calls with low added latency.

Location: /apps/gateway

Key Features:

  • Runs globally on Cloudflare edge (low latency)
  • API key authentication with KV caching
  • Transparent request forwarding (OpenAI API compatible)
  • Token usage extraction & cost calculation
  • Real-time event queueing to Redis

Latency Budget: <100ms p95 added latency

Critical Dependencies:

  • Cloudflare KV (API key cache)
  • Upstash Redis (event queue)
  • Supabase (org lookup on cache miss)
  • LLM provider APIs (OpenAI, Anthropic, Google, xAI)

Failure Modes:

  • Auth cache miss + Supabase down: Returns 503 (the request cannot be authenticated)
  • Provider timeout: Returns 504 with retry information
  • Request size > 10MB: Returns 413
  • Rate limit exceeded: Returns 429

2. Web Dashboard (Next.js)

Purpose: User interface for cost tracking, team management, and outcome tracking.

Location: /apps/web

Key Features:

  • Real-time cost dashboards (powered by Supabase subscriptions)
  • Agent and session drill-down
  • Team/user management (via Clerk)
  • Outcome tracking and business ROI calculation
  • Billing (Stripe integration)
  • Webhook management

Authentication: Clerk (OAuth + JWT)

Database Access: Supabase client (RLS enforced)

3. Worker Processes (BullMQ)

Purpose: Asynchronous processing of events from the event queue.

Location: /workers (or separate service)

Key Features:

  • Event dequeuing from Redis
  • Database writes (Supabase)
  • Webhook dispatching
  • Monthly usage counter updates
  • Session aggregation

Job Queue: BullMQ (Redis-backed)

Scaling: Horizontally scalable (multiple worker instances)

Processing Latency: P99 <5 seconds from event creation to database

4. Database (Supabase / PostgreSQL)

Purpose: Durable storage for organizations, events, sessions, outcomes, and configuration.

Key Tables:

organizations

CREATE TABLE organizations (
  id UUID PRIMARY KEY,
  name VARCHAR,
  tier VARCHAR, -- starter, lite, pro, business, enterprise
  is_active BOOLEAN,
  monthly_call_limit INTEGER,
  stripe_customer_id VARCHAR,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

api_keys

CREATE TABLE api_keys (
  id UUID PRIMARY KEY,
  org_id UUID REFERENCES organizations(id),
  key_hash VARCHAR UNIQUE, -- SHA-256(api_key)
  name VARCHAR,
  last_used_at TIMESTAMP,
  created_at TIMESTAMP,
  is_active BOOLEAN
);

events

CREATE TABLE events (
  id UUID PRIMARY KEY,
  org_id UUID REFERENCES organizations(id),
  agent_key VARCHAR, -- custom identifier
  provider VARCHAR, -- openai, anthropic, google, xai
  model VARCHAR,
  input_tokens INTEGER,
  output_tokens INTEGER,
  cost_microcents INTEGER, -- 1/1,000,000th of a cent
  latency_ms INTEGER,
  status VARCHAR, -- success, error
  error_message VARCHAR,
  occurred_at TIMESTAMP,
  session_id VARCHAR, -- nullable, for grouping
  customer_id VARCHAR, -- nullable, for multi-tenant
  gateway_request_id VARCHAR, -- for tracing
  is_streaming BOOLEAN,
  is_cached BOOLEAN,
  input_length INTEGER,
  output_length INTEGER,
  created_at TIMESTAMP
);
 
-- Indexes for fast queries
CREATE INDEX idx_events_org_id ON events(org_id);
CREATE INDEX idx_events_occurred_at ON events(occurred_at);
CREATE INDEX idx_events_agent_key ON events(agent_key);
CREATE INDEX idx_events_session_id ON events(session_id);

sessions

CREATE TABLE sessions (
  id UUID PRIMARY KEY,
  org_id UUID REFERENCES organizations(id),
  agent_key VARCHAR,
  session_id VARCHAR,
  status VARCHAR, -- active, completed
  started_at TIMESTAMP,
  completed_at TIMESTAMP,
  event_count INTEGER,
  total_cost_microcents INTEGER,
  total_latency_ms INTEGER,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

outcomes

CREATE TABLE outcomes (
  id UUID PRIMARY KEY,
  org_id UUID REFERENCES organizations(id),
  session_id UUID REFERENCES sessions(id), -- type must match sessions.id
  name VARCHAR,
  confirmed BOOLEAN,
  confirmed_by VARCHAR, -- user_id
  confirmed_at TIMESTAMP,
  metadata JSONB, -- arbitrary outcome data
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

monthly_usage_counters

CREATE TABLE monthly_usage_counters (
  id UUID PRIMARY KEY,
  org_id UUID REFERENCES organizations(id),
  year INTEGER,
  month INTEGER,
  call_count INTEGER,
  total_cost_microcents INTEGER,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

webhooks

CREATE TABLE webhooks (
  id UUID PRIMARY KEY,
  org_id UUID REFERENCES organizations(id),
  url VARCHAR,
  events VARCHAR[], -- ['event.created', 'outcome.confirmed']
  secret VARCHAR, -- HMAC secret for signature
  is_active BOOLEAN,
  last_triggered_at TIMESTAMP,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

Row-Level Security (RLS): All tables use RLS policies to ensure users only access their organization’s data.


Data Flow

Request Path (Real-Time, Synchronous)

Client Request

Gateway receives request

Authenticate API key
  ├─ Check KV cache (fast path, 90% hit rate)
  └─ Miss? Query Supabase + cache result

Check usage limits
  ├─ Query monthly_usage_counters for org
  └─ Reject if over limit (return 429)

Extract custom headers (X-Agent-ID, X-Session-ID, etc.)

Parse request & resolve provider

Forward to LLM provider API
  ├─ Stream response back to client
  └─ Capture token usage from final chunk

Calculate cost (from model pricing + tokens)

Queue event to Redis (fire-and-forget, <1ms)

Return response to client

Total Latency to Client: Provider latency + <100ms gateway overhead
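
The request path above can be sketched as follows. This is a minimal illustration in Python, not the gateway's actual code (which runs as a Cloudflare Worker). The in-memory dictionaries stand in for Cloudflare KV, Supabase, and the Redis queue, and every name and number here (Gateway, PRICING, the price figures, the 401 for an unknown key) is an assumption made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical prices in microcents per token; real pricing is not given here.
PRICING = {"example-model": {"input": 300, "output": 600}}


@dataclass
class Gateway:
    kv_cache: dict = field(default_factory=dict)  # stands in for Cloudflare KV
    api_keys: dict = field(default_factory=dict)  # key_hash -> org_id (Supabase)
    usage: dict = field(default_factory=dict)     # org_id -> calls this month
    limits: dict = field(default_factory=dict)    # org_id -> monthly_call_limit
    queue: list = field(default_factory=list)     # stands in for the Redis queue

    def authenticate(self, api_key: str) -> Optional[str]:
        """Return the org_id for a key, checking the cache before the database."""
        key_hash = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
        if key_hash in self.kv_cache:             # fast path: cache hit
            return self.kv_cache[key_hash]
        org_id = self.api_keys.get(key_hash)      # slow path: database lookup
        if org_id is not None:
            self.kv_cache[key_hash] = org_id      # cache the result for next time
        return org_id

    def check_quota(self, org_id: str) -> bool:
        """True while the org is under its monthly call limit."""
        return self.usage.get(org_id, 0) < self.limits.get(org_id, 0)

    def handle(self, api_key: str, model: str,
               input_tokens: int, output_tokens: int) -> int:
        """Return an HTTP status; the token counts stand in for the provider reply."""
        org_id = self.authenticate(api_key)
        if org_id is None:
            return 401                            # assumed status for a bad key
        if not self.check_quota(org_id):
            return 429
        price = PRICING[model]
        cost = input_tokens * price["input"] + output_tokens * price["output"]
        # Fire-and-forget: the worker, not the gateway, updates usage counters.
        self.queue.append({"org_id": org_id, "model": model,
                           "cost_microcents": cost})
        return 200
```

The sketch only reads the usage counter, because the counter is incremented later by the worker processes.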

Event Processing Path (Async, Background)

Event in Redis queue

BullMQ worker dequeues event

Enrich event metadata
  └─ Org lookup, pricing verification

Write to Supabase
  ├─ Insert into events table
  └─ Increment monthly_usage_counters

Update session aggregates (if session_id present)

Trigger webhooks (HTTP POST to registered URLs)

Process inferred outcomes (ML-based)

Update real-time metrics for dashboard

Total Latency: P99 <5 seconds
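
The write and aggregation steps of this path can be sketched as below. This is an illustration under stated assumptions, not the worker's real code: Store is a hypothetical in-memory stand-in for the events, monthly_usage_counters, and sessions tables, events are plain dictionaries with a datetime in occurred_at, and enrichment, webhooks, and outcome inference are left out.

```python
from collections import defaultdict
from datetime import datetime


class Store:
    """In-memory stand-in for the events, counters, and sessions tables."""

    def __init__(self):
        self.events = []
        self.counters = defaultdict(
            lambda: {"call_count": 0, "total_cost_microcents": 0})
        self.sessions = defaultdict(
            lambda: {"event_count": 0, "total_cost_microcents": 0,
                     "total_latency_ms": 0})


def process_event(store: Store, event: dict) -> None:
    """Apply one dequeued event: insert it, then update the aggregates."""
    store.events.append(event)

    # Usage counters are keyed by organization and calendar month.
    occurred: datetime = event["occurred_at"]
    counter = store.counters[(event["org_id"], occurred.year, occurred.month)]
    counter["call_count"] += 1
    counter["total_cost_microcents"] += event["cost_microcents"]

    # Session aggregates are only touched when the event carries a session_id.
    session_id = event.get("session_id")
    if session_id is not None:
        session = store.sessions[(event["org_id"], session_id)]
        session["event_count"] += 1
        session["total_cost_microcents"] += event["cost_microcents"]
        session["total_latency_ms"] += event["latency_ms"]
```

In a real database the insert and both counter updates would normally share a transaction so that a retried job cannot count the same event twice.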

Outcome Tracking Path

Business outcome occurs (e.g., customer satisfied)

Dashboard user confirms outcome (or via API)

Write to outcomes table

Trigger outcome.confirmed webhook

Calculate session ROI
  ├─ Total cost for session
  └─ Assign value to outcome

Update dashboards & reports
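
The ROI step can be illustrated with a short sketch. The document does not give the formula, so this assumes the conventional definition, (value - cost) / cost, with both amounts in microcents; session_roi is a hypothetical name.

```python
def session_roi(total_cost_microcents: int, outcome_value_microcents: int) -> float:
    """Return (value - cost) / cost for one session.

    Both arguments use the same unit (microcents here). A session with no
    recorded cost has no meaningful ratio, so it is rejected.
    """
    if total_cost_microcents <= 0:
        raise ValueError("session cost must be positive")
    gain = outcome_value_microcents - total_cost_microcents
    return gain / total_cost_microcents
```

For example, a session costing 2,000,000 microcents whose outcome is valued at 10,000,000 microcents gives an ROI of 4.0, that is, 400%.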

Security Model

API Key Security

  1. Generation: Client generates random 32-byte key
  2. Storage: Only SHA-256 hash stored in database
  3. Transmission: Over TLS 1.3 only
  4. Caching: Hashed key cached in Cloudflare KV for 1 hour
  5. Rotation: Users can rotate keys anytime; old key invalidated immediately
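
Generation, hashing, and lookup can be sketched with the Python standard library. The hex encoding and the function names are assumptions made for this example; the document only specifies a random 32-byte key and a stored SHA-256 hash.

```python
import hashlib
import hmac
import secrets


def generate_api_key() -> str:
    """Return a random 32-byte key, hex-encoded (64 characters)."""
    return secrets.token_hex(32)


def hash_api_key(api_key: str) -> str:
    """Return the SHA-256 digest that is stored in place of the key itself."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def key_matches(presented_key: str, stored_hash: str) -> bool:
    """Compare digests in constant time to avoid leaking timing information."""
    presented_hash = hash_api_key(presented_key)
    return hmac.compare_digest(presented_hash.encode("ascii"),
                               stored_hash.encode("ascii"))
```

Because only the digest is stored, a leaked api_keys table does not reveal usable keys, and the same digest can serve as the cache key in KV.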

Network Security

  1. TLS 1.3: All external communication encrypted
  2. HTTPS Only: Gateway rejects HTTP requests
  3. CORS: Restricted to configured origins (default: open in dev, restricted in prod)
  4. Rate Limiting: Per-org limits + per-IP burst limits
  5. Request Size: Max 10MB per request
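
The document does not say which algorithm enforces the rate limits. A token bucket is one common choice, because it allows short bursts while capping the sustained rate. A minimal sketch follows; the class name and parameters are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity   # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        """Consume one token if available; `now` is a timestamp in seconds."""
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

One bucket per org and one per client IP would cover both limits named above; a request that finds its bucket empty would receive a 429.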

Data Isolation

  1. Row-Level Security: Supabase RLS policies enforce org isolation
  2. API Keys: Scoped to single org
  3. Webhooks: Only receive events from their org
  4. Dashboard: Users only see their org’s data

Authentication & Authorization

Gateway: API key + org_id lookup

Web Dashboard: Clerk OAuth + JWT session

Webhooks: HMAC-SHA256 signature verification
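
Webhook signing and verification can be sketched as below. This assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body, keyed with the webhook's secret. The encoding, the function names, and how the signature is transmitted are assumptions, since the document does not specify them.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw body under the webhook secret."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature.encode("utf-8"))
```

The receiver should verify against the exact bytes it received, before any JSON parsing, because re-serializing the payload can change the bytes and invalidate the signature.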


Performance Characteristics

Latency (p95)

Operation               Latency          Notes
Health check            1ms              No auth required
Auth (cache hit)        5ms              KV lookup
Auth (cache miss)       50ms             Supabase query + caching
Chat completion         Provider + 50ms  Gateway overhead
Streaming (end-to-end)  Provider         Transparent passthrough
Event logging           1ms              Fire-and-forget to Redis
Event processing        <5s              BullMQ worker P99

Throughput

Tier      RPM Limit  Rationale
Starter   10         Free tier, no burst
Lite      100        10K calls/mo ÷ 30d ÷ 24h ÷ 60m ≈ 0.23 RPM avg
Pro       500        100K calls/mo ÷ 30d ÷ 24h ÷ 60m ≈ 2.3 RPM avg
Business  5,000      1M calls/mo ÷ 30d ÷ 24h ÷ 60m ≈ 23 RPM avg

Scalability

  • Gateway: Horizontally scalable across Cloudflare edge locations
  • Workers: Horizontally scalable (add more worker instances)
  • Database: Supabase handles auto-scaling; events table can store years of data
  • Redis: Upstash Redis auto-scales; queue typically empty (sub-second processing)

Integration Points

LLM Providers

The Gateway proxies to:

  • OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo, etc.
  • Anthropic: Claude 3 (Opus, Sonnet, Haiku)
  • Google: Gemini Pro
  • xAI: Grok

Each provider has its own authentication & API format, abstracted by the gateway.
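
How the gateway picks a provider is not described here. One simple approach is to match on the model name prefix, sketched below. The prefix table and resolve_provider are hypothetical and cover only the providers listed above.

```python
# Assumed mapping from model-name prefix to provider identifier.
PROVIDER_PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "grok-": "xai",
}


def resolve_provider(model: str) -> str:
    """Return the provider for a model name, or raise if none is configured."""
    name = model.lower()
    for prefix, provider in PROVIDER_PREFIXES.items():
        if name.startswith(prefix):
            return provider
    raise ValueError(f"no provider configured for model {model!r}")
```

The returned identifier uses the same values as the provider column of the events table (openai, anthropic, google, xai).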

Billing Integration

  • Stripe: Monthly subscription billing + usage-based overage charges
  • Webhook: Triggered when usage exceeds plan limit

Observability

  • Sentry: Error tracking for gateway and workers
  • OpenTelemetry: Traces exported to your observability backend
  • Webhooks: Real-time event stream for custom logging

Deployment Architecture

Production

Cloudflare (Global CDN)
  ├─ Gateway (Cloudflare Workers)
  ├─ KV (API key cache)
  └─ Pages (Static assets for docs)

AWS / Vercel
  ├─ Web Dashboard (Next.js)
  └─ Worker processes (EC2 / Vercel Functions)

Supabase (Managed PostgreSQL)
  └─ All persistent data

Upstash (Managed Redis)
  └─ Event queue

Stripe
  └─ Billing & payments

Clerk
  └─ Authentication & user management

Self-Hosting

See Self-Hosting Guide for details on running on your infrastructure.


Monitoring & Observability

Key Metrics

Real-Time (Dashboard):

  • Cost per agent (last 24h)
  • Call volume by model
  • Avg latency
  • Error rate

Historical (Reports):

  • Cost trends (daily, weekly, monthly)
  • Model usage distribution
  • Customer chargeback calculation
  • Outcome success rate & ROI

Alerts

  • Cost spike detected (> threshold)
  • Rate limit approached (> 80% of quota)
  • Provider error rate spike
  • Webhook delivery failures
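
As an illustration of the quota alert, the check below fires once usage passes 80% of the monthly limit. The function name and the handling of a zero limit are assumptions made for the example.

```python
def quota_alert_due(call_count: int, monthly_call_limit: int,
                    threshold: float = 0.8) -> bool:
    """True when usage exceeds `threshold` (a fraction) of the monthly limit."""
    if monthly_call_limit <= 0:
        return False  # no limit configured, so there is nothing to approach
    return call_count / monthly_call_limit > threshold
```

With a limit of 10,000 calls, the alert fires from call 8,001 onward.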

Future Enhancements

  • Model fine-tuning recommendations
  • Automatic prompt optimization
  • Budget forecasting & alerts
  • Team collaboration features
  • Advanced analytics & ML-powered insights

Next Steps: See Self-Hosting Guide for deployment details or API Reference for integration.