Self-Hosting

This guide covers deploying Metrx on your own infrastructure for complete control and data privacy.

Table of Contents

  1. Prerequisites
  2. Architecture Overview
  3. Environment Setup
  4. Database Setup
  5. Deploying the Gateway
  6. Deploying the Web Dashboard
  7. Running Background Workers
  8. SSL/TLS Configuration
  9. Monitoring & Logging
  10. Troubleshooting

Prerequisites

System Requirements

  • Node.js: 18.0.0 or higher
  • npm: 10.9.0 or higher
  • Docker: (optional) for containerized deployment
  • Kubernetes: (optional) for advanced deployments

External Services

  • PostgreSQL: 12+ (managed Supabase or self-hosted)
  • Redis: 6.0+ (for event queue and caching)
  • Clerk: (for authentication) or self-hosted auth solution
  • Stripe: (for billing) or payment provider
  • DNS: Domain name for Gateway and Web app

Infrastructure Options

  • VPS: AWS EC2, DigitalOcean, Linode, Hetzner
  • Kubernetes: GKE, EKS, AKS
  • Docker Compose: Single-machine deployment
  • Vercel/Netlify: For the Web dashboard only (Next.js)

Architecture Overview

Self-hosted Metrx consists of:

┌──────────────────────────────────────┐
│ Your Infrastructure                  │
├──────────────────────────────────────┤
│                                      │
│  ┌────────────────────────────────┐  │
│  │ Gateway (Node.js Worker)       │  │
│  │ Port: 8787 (HTTP proxy)        │  │
│  └────────────────────────────────┘  │
│                                      │
│  ┌────────────────────────────────┐  │
│  │ Web Dashboard (Next.js)        │  │
│  │ Port: 3000 (Web app)           │  │
│  └────────────────────────────────┘  │
│                                      │
│  ┌────────────────────────────────┐  │
│  │ Workers (Node.js / BullMQ)     │  │
│  │ Port: 9000 (internal only)     │  │
│  └────────────────────────────────┘  │
│                                      │
├──────────────────────────────────────┤
│ PostgreSQL (Database)                │
│ Redis (Event Queue)                  │
├──────────────────────────────────────┤
│ External Services (read-only)        │
│ - OpenAI, Anthropic, etc. APIs       │
│ - Stripe (billing)                   │
│ - Clerk (auth)                       │
└──────────────────────────────────────┘

Environment Setup

1. Clone the Repository

git clone https://github.com/metrxbot/metrxbot.git
cd metrxbot
npm install

2. Create Environment Files

Root .env.production:

# === App Configuration ===
NODE_ENV=production
NEXT_PUBLIC_APP_URL=https://app.yourdomain.com
NEXT_PUBLIC_GATEWAY_URL=https://gateway.yourdomain.com
 
# === Supabase (PostgreSQL) ===
NEXT_PUBLIC_SUPABASE_URL=https://your-supabase.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJxxx...
SUPABASE_SERVICE_ROLE_KEY=eyJxxx...
 
# === Clerk Authentication ===
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_live_xxx
CLERK_SECRET_KEY=sk_live_xxx
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up
NEXT_PUBLIC_CLERK_AFTER_SIGN_IN_URL=/
NEXT_PUBLIC_CLERK_AFTER_SIGN_UP_URL=/onboarding
 
# === Stripe Billing ===
STRIPE_SECRET_KEY=sk_live_xxx
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_live_xxx
STRIPE_WEBHOOK_SECRET=whsec_xxx
STRIPE_PRICE_LITE=price_xxx
STRIPE_PRICE_PRO=price_xxx
STRIPE_PRICE_BUSINESS=price_xxx
 
# === Redis (Event Queue) ===
UPSTASH_REDIS_REST_URL=https://your-redis.upstash.io
UPSTASH_REDIS_REST_TOKEN=xxx
 
# === Email (Resend) ===
RESEND_API_KEY=re_xxx
RESEND_FROM_EMAIL=Metrx <noreply@metrxbot.com>
 
# === Error Tracking (Sentry) ===
NEXT_PUBLIC_SENTRY_DSN=https://xxx@sentry.io/xxx
SENTRY_AUTH_TOKEN=sntrys_xxx
SENTRY_ORG=your-org
SENTRY_PROJECT=metrxbot
 
# === Feature Flags ===
NEXT_PUBLIC_DEMO_MODE=false
ALERT_EMAIL_ENABLED=true
ALERT_THRESHOLD_COST_SPIKE_PCT=200
ALERT_RATE_LIMIT_PER_ORG_HOUR=5
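Before deploying, it helps to fail fast on missing configuration. A short standalone script (not part of Metrx; the variable names are taken from the file above, and the list is a subset you can extend) can verify the critical ones:

```typescript
// check-env.ts — verify critical variables from .env.production are set.
const required: string[] = [
  'NEXT_PUBLIC_SUPABASE_URL',
  'SUPABASE_SERVICE_ROLE_KEY',
  'CLERK_SECRET_KEY',
  'STRIPE_SECRET_KEY',
  'UPSTASH_REDIS_REST_URL',
];

// Return the names of required variables that are unset or empty.
function missingVars(env: Record<string, string | undefined>): string[] {
  return required.filter((name) => !env[name]);
}

const missing = missingVars(process.env);
if (missing.length > 0) {
  console.error(`Missing required env vars: ${missing.join(', ')}`);
} else {
  console.log('All required env vars present');
}
```

Run it with a TypeScript runner such as `npx tsx check-env.ts` before each deploy.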

Gateway apps/gateway/.env.production:

For self-hosting, the hosted Cloudflare Workers runtime is replaced by a Node.js process. Configure it with:

# Create apps/gateway/.env.production
SUPABASE_URL=https://your-supabase.supabase.co
SUPABASE_SERVICE_KEY=eyJxxx...
UPSTASH_REDIS_URL=https://your-redis.upstash.io
UPSTASH_REDIS_TOKEN=xxx
ENVIRONMENT=production
ALLOWED_ORIGINS=https://app.yourdomain.com

3. Build

Dependencies were already installed in step 1, so just build the workspace:

npm run build

Database Setup

Option A: Managed Supabase

  1. Create an account at supabase.com
  2. Create a new project
  3. Note your API URL and keys
  4. Run migrations:
# Install Supabase CLI
npm install -g supabase
 
# Link to your Supabase project
supabase link --project-ref your-project-ref
 
# Run migrations
supabase migration up

Option B: Self-Hosted PostgreSQL

With Docker Compose

Create docker-compose.yml:

version: '3.8'
 
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: metrxbot
      POSTGRES_PASSWORD: secure_password_here
      POSTGRES_DB: metrxbot
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U metrxbot"]
      interval: 10s
      timeout: 5s
      retries: 5
 
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
 
volumes:
  postgres_data:
  redis_data:

Start services:

docker-compose up -d

Create Database Schema

psql -h localhost -U metrxbot -d metrxbot -f supabase/migrations/001_init.sql
psql -h localhost -U metrxbot -d metrxbot -f supabase/migrations/002_indexes.sql
psql -h localhost -U metrxbot -d metrxbot -f supabase/migrations/003_rls.sql
# ... run all migrations
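Running each file by hand gets error-prone as migrations accumulate. A small helper can apply every migration in numbered order, stopping at the first failure (connection flags match the commands above):

```shell
# run_migrations: apply every .sql file in a directory in lexical (numbered) order.
# Usage: run_migrations supabase/migrations
run_migrations() {
  dir="$1"
  for f in "$dir"/*.sql; do
    # If the glob did not match, $f is the literal pattern and the file does not exist.
    [ -e "$f" ] || { echo "no migrations found in $dir" >&2; return 1; }
    echo "applying $f"
    psql -h localhost -U metrxbot -d metrxbot -f "$f" || return 1
  done
}
```

Because shell globs expand in lexical order, the `001_`, `002_`, ... prefixes guarantee migrations run in sequence.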

Option C: Managed PostgreSQL (AWS RDS, Azure Database)

  1. Create RDS instance
  2. Create database and user
  3. Update connection string in .env.production
  4. Run migrations

AWS RDS Example

# Create RDS instance
aws rds create-db-instance \
  --db-instance-identifier metrxbot \
  --db-instance-class db.t3.micro \
  --engine postgres \
  --master-username metrxbot \
  --master-user-password <secure_password>
 
# Get endpoint
aws rds describe-db-instances --db-instance-identifier metrxbot
 
# Update .env.production
POSTGRES_URL=postgresql://metrxbot:<password>@metrxbot.xxx.rds.amazonaws.com:5432/metrxbot

Deploying the Gateway

The Gateway sits on the critical request path: every proxied LLM call passes through it, so it must be highly available and low-latency.

Option A: Docker on Single Server

Build Docker image:

cd apps/gateway
docker build -t agentledger-gateway:latest .

Create .env.production:

SUPABASE_URL=https://your-supabase.supabase.co
SUPABASE_SERVICE_KEY=eyJxxx...
UPSTASH_REDIS_URL=https://your-redis.upstash.io
UPSTASH_REDIS_TOKEN=xxx
ENVIRONMENT=production
ALLOWED_ORIGINS=https://app.yourdomain.com
PORT=8787

Run container:

docker run -d \
  --name agentledger-gateway \
  --restart unless-stopped \
  -p 8787:8787 \
  --env-file .env.production \
  agentledger-gateway:latest

Option B: Kubernetes Deployment

Create k8s/gateway-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: agentledger-gateway
  labels:
    app: agentledger-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agentledger-gateway
  template:
    metadata:
      labels:
        app: agentledger-gateway
    spec:
      containers:
      - name: gateway
        image: agentledger-gateway:latest
        ports:
        - containerPort: 8787
        env:
        - name: SUPABASE_URL
          valueFrom:
            configMapKeyRef:
              name: metrxbot-config
              key: supabase_url
        - name: SUPABASE_SERVICE_KEY
          valueFrom:
            secretKeyRef:
              name: metrxbot-secrets
              key: supabase_service_key
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8787
          initialDelaySeconds: 10
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: agentledger-gateway
spec:
  selector:
    app: agentledger-gateway
  ports:
  - protocol: TCP
    port: 8787
    targetPort: 8787
  type: LoadBalancer

Deploy:

kubectl apply -f k8s/gateway-deployment.yaml

Option C: AWS Lambda + API Gateway

Create serverless.yml:

service: agentledger-gateway
 
provider:
  name: aws
  runtime: nodejs18.x
  environment:
    SUPABASE_URL: ${env:SUPABASE_URL}
    SUPABASE_SERVICE_KEY: ${env:SUPABASE_SERVICE_KEY}
    UPSTASH_REDIS_URL: ${env:UPSTASH_REDIS_URL}
    UPSTASH_REDIS_TOKEN: ${env:UPSTASH_REDIS_TOKEN}
 
functions:
  gateway:
    handler: apps/gateway/src/index.handler
    events:
      - http:
          path: /{proxy+}
          method: ANY
          cors: true

Deploy:

npm install -g serverless
serverless deploy

Load Balancing & High Availability

With Nginx:

upstream gateway {
    least_conn;
    server 10.0.1.10:8787;
    server 10.0.1.11:8787;
    server 10.0.1.12:8787;
}
 
server {
    listen 443 ssl;
    server_name gateway.yourdomain.com;
 
    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
 
    location / {
        proxy_pass http://gateway;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
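The open-source Nginx build supports passive health checks: a backend is marked down for `fail_timeout` after `max_fails` consecutive failures, so traffic routes around a dead Gateway instance automatically. A variant of the upstream block above:

```nginx
upstream gateway {
    least_conn;
    server 10.0.1.10:8787 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:8787 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8787 max_fails=3 fail_timeout=30s;
}
```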

Deploying the Web Dashboard

The Web Dashboard is a Next.js application.

Option A: Vercel (Easiest)

npm install -g vercel
vercel login
vercel deploy --prod

Option B: Self-Hosted Docker

Build:

cd apps/web
docker build -t agentledger-web:latest .

Run:

docker run -d \
  --name agentledger-web \
  -p 3000:3000 \
  --env-file .env.production \
  agentledger-web:latest

Option C: Kubernetes

Create k8s/web-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: agentledger-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agentledger-web
  template:
    metadata:
      labels:
        app: agentledger-web
    spec:
      containers:
      - name: web
        image: agentledger-web:latest
        ports:
        - containerPort: 3000
        env:
        - name: NEXT_PUBLIC_SUPABASE_URL
          valueFrom:
            configMapKeyRef:
              name: metrxbot-config
              key: supabase_url
        # ... other env vars
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /api/health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: agentledger-web
spec:
  selector:
    app: agentledger-web
  ports:
  - protocol: TCP
    port: 3000
    targetPort: 3000
  type: LoadBalancer

Running Background Workers

Background workers process events from Redis and update the database.
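A BullMQ processor is just an async function that receives a job and returns a result; throwing marks the job failed and triggers a retry. A minimal sketch of that shape (the payload fields are illustrative, not the real schema; the actual processors in ./processors do the database writes):

```typescript
// Shape of a BullMQ processor: async (job) => result.
// Throwing marks the job failed so BullMQ can retry it per the queue settings.
interface EventJobData {
  org_id: string;
  model: string;
  tokens: number;
}

async function processEvent(job: { data: EventJobData }): Promise<{ ok: true }> {
  if (!job.data.org_id) {
    // Reject malformed payloads; BullMQ records the error on the job.
    throw new Error('missing org_id');
  }
  // A real processor would insert the event into Postgres here.
  return { ok: true };
}
```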

Option A: Standalone Process

Create workers/start.ts:

import { Worker } from 'bullmq';
import { EventProcessor } from './processors/event';
import { OutcomeProcessor } from './processors/outcome';
 
const redis = {
  host: process.env.REDIS_HOST || 'localhost',
  port: parseInt(process.env.REDIS_PORT || '6379'),
};
 
// Process events
const eventWorker = new Worker('events', EventProcessor, { connection: redis });
eventWorker.on('completed', (job) => {
  console.log(`Event ${job.id} processed successfully`);
});
eventWorker.on('failed', (job, err) => {
  // job may be undefined if the failure happened outside a specific job
  console.error(`Event ${job?.id ?? 'unknown'} failed:`, err);
});
 
// Process outcomes
const outcomeWorker = new Worker('outcomes', OutcomeProcessor, { connection: redis });
outcomeWorker.on('completed', (job) => {
  console.log(`Outcome ${job.id} processed successfully`);
});
 
console.log('Background workers started');

Run:

npm run workers:start
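In production the workers should also drain cleanly on SIGTERM so in-flight jobs finish before the process exits. A sketch of the helper (the worker names match start.ts above; anything with an async close() fits the interface):

```typescript
// closeAll: await every worker's close() before the process exits.
// Wire it up in start.ts, e.g.:
//   process.on('SIGTERM', () =>
//     closeAll([eventWorker, outcomeWorker]).then(() => process.exit(0)));
interface Closable {
  close(): Promise<void>;
}

async function closeAll(workers: Closable[]): Promise<number> {
  await Promise.all(workers.map((w) => w.close()));
  return workers.length; // how many workers were closed
}
```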

Option B: Docker Container

Dockerfile:

FROM node:18
 
WORKDIR /app
 
COPY package*.json ./
RUN npm ci --omit=dev
 
COPY . .
 
CMD ["npm", "run", "workers:start"]

Option C: Kubernetes CronJob

For periodic tasks (monthly billing, cleanup):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: metrxbot-billing
spec:
  schedule: "0 0 1 * *"  # 1st day of month
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: billing
            image: agentledger-workers:latest
            command: ["npm", "run", "billing:process"]
            env:
            - name: SUPABASE_URL
              valueFrom:
                configMapKeyRef:
                  name: metrxbot-config
                  key: supabase_url
          restartPolicy: OnFailure

SSL/TLS Configuration

Using Let’s Encrypt with Certbot

# Install Certbot
sudo apt-get install certbot python3-certbot-nginx
 
# Obtain certificate
sudo certbot certonly --nginx -d gateway.yourdomain.com -d app.yourdomain.com
 
# Auto-renewal
sudo systemctl enable certbot.timer
sudo systemctl start certbot.timer

Using Cloudflare SSL

  1. Point DNS to your server and proxy it through Cloudflare
  2. Install a Cloudflare origin certificate on your server and enable Full (Strict) SSL mode
  3. Avoid Flexible mode: it encrypts only browser-to-Cloudflare traffic and leaves the Cloudflare-to-origin hop in plaintext

Self-Signed Certificate (Development Only)

openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes
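Node.js clients reject self-signed certificates by default. Rather than disabling TLS verification, point Node at the certificate:

```shell
# Trust the self-signed cert for local Node.js processes (development only)
export NODE_EXTRA_CA_CERTS="$(pwd)/cert.pem"
```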

Monitoring & Logging

Application Logs

With Docker Compose:

docker-compose logs -f gateway
docker-compose logs -f web
docker-compose logs -f workers

With Kubernetes:

kubectl logs -f deployment/agentledger-gateway
kubectl logs -f deployment/agentledger-web

Metrics with Prometheus

Create prometheus.yml:

global:
  scrape_interval: 15s
 
scrape_configs:
  - job_name: 'gateway'
    static_configs:
      - targets: ['localhost:8787']
  - job_name: 'web'
    static_configs:
      - targets: ['localhost:3000']
  - job_name: 'redis'
    static_configs:
      # Redis has no built-in Prometheus endpoint; run redis_exporter alongside it
      - targets: ['localhost:9121']
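Prometheus scrapes a /metrics endpoint in the text exposition format. If the gateway does not already expose one, a hand-rolled counter is enough to start (a sketch; a client library such as prom-client is the usual choice, and the metric name here is an assumption):

```typescript
// Render one counter in Prometheus text exposition format.
// Serve the output of renderMetrics() from GET /metrics on the gateway's HTTP server.
let requestsTotal = 0;

function recordRequest(): void {
  requestsTotal++;
}

function renderMetrics(): string {
  return [
    '# HELP gateway_requests_total Total HTTP requests handled',
    '# TYPE gateway_requests_total counter',
    `gateway_requests_total ${requestsTotal}`,
    '', // exposition format ends with a newline
  ].join('\n');
}
```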

Centralized Logging with ELK Stack

Filebeat config:

filebeat.inputs:
- type: docker
  enabled: true
  containers:
    ids:
      - '*'
 
output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "metrxbot-%{+yyyy.MM.dd}"

Alerting with Alertmanager

global:
  resolve_timeout: 5m
 
route:
  group_by: ['alertname']
  receiver: 'slack'
 
receivers:
- name: 'slack'
  slack_configs:
  - api_url: 'YOUR_SLACK_WEBHOOK_URL'
    channel: '#alerts'

Troubleshooting

Gateway won’t start

Check logs:

docker logs agentledger-gateway

Verify environment variables:

env | grep SUPABASE
env | grep REDIS

Test database connection:

psql -h your-postgres-host -U metrxbot -d metrxbot -c "SELECT 1"

High latency

  1. Check Gateway load (should be <80% CPU)
  2. Check Redis latency: redis-cli ping (should be <10ms)
  3. Check database query times
  4. Check network latency to LLM providers

Events not being processed

  1. Check Redis queue length (BullMQ keeps waiting jobs in bull:<queue>:wait): redis-cli llen bull:events:wait
  2. Check worker logs: docker logs agentledger-workers
  3. Verify database has events table: psql ... -c "\dt events"

Database growing too large

  1. Archive old events to cold storage
  2. Implement data retention policy
  3. Use table partitioning by date
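For the partitioning route, Postgres native range partitioning lets you drop a whole month instantly instead of running a slow DELETE. A sketch (table and column names are assumptions; adapt to the actual schema):

```sql
-- Monthly range partitioning for an events table (names are illustrative).
CREATE TABLE events_partitioned (
  id          bigint GENERATED ALWAYS AS IDENTITY,
  org_id      uuid NOT NULL,
  created_at  timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_01 PARTITION OF events_partitioned
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Retiring a month is then a metadata operation:
-- DROP TABLE events_2025_01;
```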

Upgrade Path

To upgrade Metrx:

# Backup database
pg_dump metrxbot > backup-$(date +%Y%m%d).sql
 
# Pull latest code
git pull origin main
 
# Run migrations
npm run migrate
 
# Rebuild and deploy
npm run build
docker build -t agentledger-gateway:latest apps/gateway/
docker push agentledger-gateway:latest
 
# Rolling restart (Kubernetes)
kubectl rollout restart deployment/agentledger-gateway
kubectl rollout status deployment/agentledger-gateway

Next Steps: See Architecture for system design details or API Reference for integration.