This guide covers deploying Metrx on your own infrastructure for complete control and data privacy.
Table of Contents
- Prerequisites
- Architecture Overview
- Environment Setup
- Database Setup
- Deploying the Gateway
- Deploying the Web Dashboard
- Running Background Workers
- SSL/TLS Configuration
- Monitoring & Logging
- Troubleshooting
Prerequisites
System Requirements
- Node.js: 18.0.0 or higher
- npm: 10.9.0 or higher
- Docker: (optional) for containerized deployment
- Kubernetes: (optional) for advanced deployments
External Services
- PostgreSQL: 12+ (managed Supabase or self-hosted)
- Redis: 6.0+ (for event queue and caching)
- Clerk: (for authentication) or self-hosted auth solution
- Stripe: (for billing) or payment provider
- DNS: Domain name for Gateway and Web app
Infrastructure Options
- VPS: AWS EC2, DigitalOcean, Linode, Hetzner
- Kubernetes: GKE, EKS, AKS
- Docker Compose: Single-machine deployment
- Vercel/Netlify: For the Web dashboard only (Next.js)
Architecture Overview
Self-hosted Metrx consists of:
┌──────────────────────────────────────┐
│ Your Infrastructure │
├──────────────────────────────────────┤
│ │
│ ┌────────────────────────────────┐ │
│ │ Gateway (Node.js Worker) │ │
│ │ Port: 8787 (HTTP proxy) │ │
│ └────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────┐ │
│ │ Web Dashboard (Next.js) │ │
│ │ Port: 3000 (Web app) │ │
│ └────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────┐ │
│ │ Workers (Node.js / BullMQ) │ │
│ │ Port: 9000 (internal only) │ │
│ └────────────────────────────────┘ │
│ │
├──────────────────────────────────────┤
│ PostgreSQL (Database) │
│ Redis (Event Queue) │
├──────────────────────────────────────┤
│ External Services (read-only) │
│ - OpenAI, Anthropic, etc. APIs │
│ - Stripe (billing) │
│ - Clerk (auth) │
└──────────────────────────────────────┘

Environment Setup
1. Clone the Repository
git clone https://github.com/metrxbot/metrxbot.git
cd metrxbot
npm install

2. Create Environment Files
Root .env.production:
# === App Configuration ===
NODE_ENV=production
NEXT_PUBLIC_APP_URL=https://app.yourdomain.com
NEXT_PUBLIC_GATEWAY_URL=https://gateway.yourdomain.com
# === Supabase (PostgreSQL) ===
NEXT_PUBLIC_SUPABASE_URL=https://your-supabase.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJxxx...
SUPABASE_SERVICE_ROLE_KEY=eyJxxx...
# === Clerk Authentication ===
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_live_xxx
CLERK_SECRET_KEY=sk_live_xxx
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up
NEXT_PUBLIC_CLERK_AFTER_SIGN_IN_URL=/
NEXT_PUBLIC_CLERK_AFTER_SIGN_UP_URL=/onboarding
# === Stripe Billing ===
STRIPE_SECRET_KEY=sk_live_xxx
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_live_xxx
STRIPE_WEBHOOK_SECRET=whsec_xxx
STRIPE_PRICE_LITE=price_xxx
STRIPE_PRICE_PRO=price_xxx
STRIPE_PRICE_BUSINESS=price_xxx
# === Redis (Event Queue) ===
UPSTASH_REDIS_REST_URL=https://your-redis.upstash.io
UPSTASH_REDIS_REST_TOKEN=xxx
# === Email (Resend) ===
RESEND_API_KEY=re_xxx
RESEND_FROM_EMAIL="Metrx <noreply@metrxbot.com>"
# === Error Tracking (Sentry) ===
NEXT_PUBLIC_SENTRY_DSN=https://xxx@sentry.io/xxx
SENTRY_AUTH_TOKEN=sntrys_xxx
SENTRY_ORG=your-org
SENTRY_PROJECT=metrxbot
# === Feature Flags ===
NEXT_PUBLIC_DEMO_MODE=false
ALERT_EMAIL_ENABLED=true
ALERT_THRESHOLD_COST_SPIKE_PCT=200
ALERT_RATE_LIMIT_PER_ORG_HOUR=5

Gateway apps/gateway/.env.production:
For self-hosting, replace Cloudflare Workers with a Node.js worker:
# Create apps/gateway/.env.production
SUPABASE_URL=https://your-supabase.supabase.co
SUPABASE_SERVICE_KEY=eyJxxx...
UPSTASH_REDIS_URL=https://your-redis.upstash.io
UPSTASH_REDIS_TOKEN=xxx
ENVIRONMENT=production
ALLOWED_ORIGINS=https://app.yourdomain.com

3. Install Dependencies
npm install
npm run build

Database Setup
Option A: Managed Supabase (Recommended)
- Create account at supabase.com
- Create a new project
- Note your API URL and keys
- Run migrations:
# Install Supabase CLI (global npm installs are not supported; add it as a dev dependency)
npm install supabase --save-dev

# Link to your Supabase project
npx supabase link --project-ref your-project-ref

# Run migrations
npx supabase migration up

Option B: Self-Hosted PostgreSQL
With Docker Compose
Create docker-compose.yml:
version: '3.8'
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: metrxbot
      POSTGRES_PASSWORD: secure_password_here
      POSTGRES_DB: metrxbot
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U metrxbot"]
      interval: 10s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  postgres_data:
  redis_data:

Start services:
docker-compose up -d

Create Database Schema
psql -h localhost -U metrxbot -d metrxbot -f supabase/migrations/001_init.sql
psql -h localhost -U metrxbot -d metrxbot -f supabase/migrations/002_indexes.sql
psql -h localhost -U metrxbot -d metrxbot -f supabase/migrations/003_rls.sql
# ... run all migrations

Option C: Managed PostgreSQL (AWS RDS, Azure Database)
- Create RDS instance
- Create database and user
- Update connection string in .env.production
- Run migrations
AWS RDS Example
# Create RDS instance (--allocated-storage is required; 20 GiB is the minimum for gp2)
aws rds create-db-instance \
  --db-instance-identifier metrxbot \
  --db-instance-class db.t3.micro \
  --engine postgres \
  --allocated-storage 20 \
  --master-username metrxbot \
  --master-user-password <secure_password>

# Get endpoint
aws rds describe-db-instances --db-instance-identifier metrxbot

# Update .env.production
POSTGRES_URL=postgresql://metrxbot:<password>@metrxbot.xxx.rds.amazonaws.com:5432/metrxbot

Deploying the Gateway
The Gateway is the critical request path. It should be highly available and low-latency.
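Whichever option you deploy with, a quick smoke test confirms the Gateway is up before you route traffic to it. The sketch below assumes the Gateway exposes the /health endpoint used by the Kubernetes liveness probe later in this section; the retry helper itself is plain POSIX shell.

```shell
#!/bin/sh
# retry: run a command up to N times, sleeping 1s between attempts.
# Succeeds as soon as the command does; fails if every attempt fails.
retry() {
  attempts=$1; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Smoke test (assumes the /health endpoint from the liveness probe config):
# retry 5 curl -fsS https://gateway.yourdomain.com/health
```

Running this after a deploy (and before flipping DNS or the load balancer) catches misconfigured env files early.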
Option A: Docker on Single Server
Build Docker image:
cd apps/gateway
docker build -t agentledger-gateway:latest .

Create .env.production:
SUPABASE_URL=https://your-supabase.supabase.co
SUPABASE_SERVICE_KEY=eyJxxx...
UPSTASH_REDIS_URL=https://your-redis.upstash.io
UPSTASH_REDIS_TOKEN=xxx
ENVIRONMENT=production
ALLOWED_ORIGINS=https://app.yourdomain.com
PORT=8787

Run container:
docker run -d \
  --name agentledger-gateway \
  -p 8787:8787 \
  --env-file .env.production \
  agentledger-gateway:latest

Option B: Kubernetes Deployment
Create k8s/gateway-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agentledger-gateway
  labels:
    app: agentledger-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agentledger-gateway
  template:
    metadata:
      labels:
        app: agentledger-gateway
    spec:
      containers:
        - name: gateway
          image: agentledger-gateway:latest
          ports:
            - containerPort: 8787
          env:
            - name: SUPABASE_URL
              valueFrom:
                configMapKeyRef:
                  name: metrxbot-config
                  key: supabase_url
            - name: SUPABASE_SERVICE_KEY
              valueFrom:
                secretKeyRef:
                  name: metrxbot-secrets
                  key: supabase_service_key
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8787
            initialDelaySeconds: 10
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: agentledger-gateway
spec:
  selector:
    app: agentledger-gateway
  ports:
    - protocol: TCP
      port: 8787
      targetPort: 8787
  type: LoadBalancer

Deploy:
kubectl apply -f k8s/gateway-deployment.yaml

Option C: AWS Lambda + API Gateway
Create serverless.yml:
service: agentledger-gateway
provider:
  name: aws
  runtime: nodejs18.x
  environment:
    SUPABASE_URL: ${env:SUPABASE_URL}
    SUPABASE_SERVICE_KEY: ${env:SUPABASE_SERVICE_KEY}
    UPSTASH_REDIS_URL: ${env:UPSTASH_REDIS_URL}
    UPSTASH_REDIS_TOKEN: ${env:UPSTASH_REDIS_TOKEN}
functions:
  gateway:
    handler: apps/gateway/src/index.handler
    events:
      - http:
          path: /{proxy+}
          method: ANY
          cors: true

Deploy:
npm install -g serverless
serverless deploy

Load Balancing & High Availability
With Nginx:
upstream gateway {
    least_conn;
    server 10.0.1.10:8787;
    server 10.0.1.11:8787;
    server 10.0.1.12:8787;
}

server {
    listen 443 ssl;
    server_name gateway.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://gateway;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Deploying the Web Dashboard
The Web Dashboard is a Next.js application.
Option A: Vercel (Easiest)
npm install -g vercel
vercel login
vercel deploy --prod

Option B: Self-Hosted Docker
Build:
cd apps/web
docker build -t agentledger-web:latest .

Run:
docker run -d \
  --name agentledger-web \
  -p 3000:3000 \
  --env-file .env.production \
  agentledger-web:latest

Option C: Kubernetes
Create k8s/web-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agentledger-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agentledger-web
  template:
    metadata:
      labels:
        app: agentledger-web
    spec:
      containers:
        - name: web
          image: agentledger-web:latest
          ports:
            - containerPort: 3000
          env:
            - name: NEXT_PUBLIC_SUPABASE_URL
              valueFrom:
                configMapKeyRef:
                  name: metrxbot-config
                  key: supabase_url
            # ... other env vars
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: agentledger-web
spec:
  selector:
    app: agentledger-web
  ports:
    - protocol: TCP
      port: 3000
      targetPort: 3000
  type: LoadBalancer

Running Background Workers
Background workers process events from Redis and update the database.
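Whichever option below you use, the workers need Redis to be reachable at startup, so it helps to gate startup on a successful PING rather than crash-loop. A minimal sketch (assumes redis-cli is installed on the host):

```shell
#!/bin/sh
# wait_for_redis: poll `redis-cli ping` until it answers PONG or we give up.
wait_for_redis() {
  host=${1:-localhost}; port=${2:-6379}; tries=${3:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if redis-cli -h "$host" -p "$port" ping 2>/dev/null | grep -q PONG; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Redis at $host:$port not reachable after $tries attempts" >&2
  return 1
}

# Gate worker startup on Redis availability:
# wait_for_redis "${REDIS_HOST:-localhost}" "${REDIS_PORT:-6379}" && npm run workers:start
```

The same guard works as a container entrypoint wrapper or a Kubernetes initContainer command.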
Option A: Standalone Process
Create workers/start.ts:
import { Worker } from 'bullmq';
import { EventProcessor } from './processors/event';
import { OutcomeProcessor } from './processors/outcome';

const redis = {
  host: process.env.REDIS_HOST || 'localhost',
  port: parseInt(process.env.REDIS_PORT || '6379', 10),
};

// Process events
const eventWorker = new Worker('events', EventProcessor, { connection: redis });
eventWorker.on('completed', (job) => {
  console.log(`Event ${job.id} processed successfully`);
});
eventWorker.on('failed', (job, err) => {
  // job can be undefined if it could not be loaded
  console.error(`Event ${job?.id} failed:`, err);
});

// Process outcomes
const outcomeWorker = new Worker('outcomes', OutcomeProcessor, { connection: redis });
outcomeWorker.on('completed', (job) => {
  console.log(`Outcome ${job.id} processed successfully`);
});
outcomeWorker.on('failed', (job, err) => {
  console.error(`Outcome ${job?.id} failed:`, err);
});

console.log('Background workers started');

Run:
npm run workers:start

Option B: Docker Container
Dockerfile:
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
CMD ["npm", "run", "workers:start"]

Option C: Kubernetes CronJob
For periodic tasks (monthly billing, cleanup):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: metrxbot-billing
spec:
  schedule: "0 0 1 * *" # 1st day of month
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: billing
              image: agentledger-workers:latest
              command: ["npm", "run", "billing:process"]
              env:
                - name: SUPABASE_URL
                  valueFrom:
                    configMapKeyRef:
                      name: metrxbot-config
                      key: supabase_url
          restartPolicy: OnFailure

SSL/TLS Configuration
Using Let’s Encrypt with Certbot
# Install Certbot
sudo apt-get install certbot python3-certbot-nginx
# Obtain certificate
sudo certbot certonly --nginx -d gateway.yourdomain.com -d app.yourdomain.com
# Auto-renewal
sudo systemctl enable certbot.timer
sudo systemctl start certbot.timer

Using Cloudflare SSL (Flexible)
- Point DNS to your server (proxied through Cloudflare)
- Enable SSL in the Cloudflare dashboard
- Note: "Flexible" mode only encrypts traffic between the browser and Cloudflare; Cloudflare connects to your origin over plain HTTP. For production, use "Full (strict)" with a valid certificate on the origin so Gateway traffic is encrypted end-to-end.
Self-Signed Certificate (Development Only)
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes

Monitoring & Logging
Application Logs
With Docker Compose:
docker-compose logs -f gateway
docker-compose logs -f web
docker-compose logs -f workers

With Kubernetes:
kubectl logs -f deployment/agentledger-gateway
kubectl logs -f deployment/agentledger-web

Metrics with Prometheus
Create prometheus.yml:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'gateway'
    static_configs:
      - targets: ['localhost:8787']
  - job_name: 'web'
    static_configs:
      - targets: ['localhost:3000']
  - job_name: 'redis'
    static_configs:
      # Redis does not expose Prometheus metrics itself; run redis_exporter
      # (default port 9121) and scrape that instead of port 6379.
      - targets: ['localhost:9121']

Centralized Logging with ELK Stack
Filebeat config:
filebeat.inputs:
  - type: docker
    enabled: true
    containers:
      ids:
        - '*'

output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "metrxbot-%{+yyyy.MM.dd}"

Alerting with Alertmanager
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  receiver: 'slack'
receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#alerts'

Troubleshooting
Gateway won’t start
Check logs:
docker logs agentledger-gateway

Verify environment variables:
env | grep SUPABASE
env | grep REDIS

Test database connection:
psql -h your-postgres-host -U metrxbot -d metrxbot -c "SELECT 1"

High latency
- Check Gateway load (CPU should stay below 80%)
- Check Redis latency: redis-cli ping (should respond in under 10ms)
- Check database query times
- Check network latency to LLM providers
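Curl's built-in timing variables give a quick per-phase breakdown of where provider latency is spent. A diagnostic sketch; api.openai.com is just an example target, and the trailing echo keeps the command from failing hard when the network is unreachable:

```shell
#!/bin/sh
# Print DNS / TCP / TLS / total timings for one request to a provider endpoint.
# -o /dev/null discards the response body; -w prints curl timing variables.
curl -s -o /dev/null \
  -w 'dns=%{time_namelookup}s tcp=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n' \
  https://api.openai.com/v1/models || echo 'request failed'
```

If total is high but tls is low, the time is in the provider's processing rather than your network path.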
Events not being processed
- Check Redis queue length: redis-cli llen events
- Check worker logs: docker logs agentledger-workers
- Verify the database has an events table: psql ... -c "\dt events"
Database growing too large
- Archive old events to cold storage
- Implement data retention policy
- Use table partitioning by date
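A minimal retention job can be a scheduled psql call fed by a small SQL generator. This is a sketch only: it assumes the events table has a created_at timestamp column (verify against your actual schema), and you should back up before running destructive SQL.

```shell
#!/bin/sh
# build_retention_sql: emit a DELETE statement for a given retention window.
# ASSUMPTION: the events table has a created_at column — check your schema first.
build_retention_sql() {
  days=$1
  printf "DELETE FROM events WHERE created_at < now() - interval '%s days';\n" "$days"
}

# Pipe into psql on a cron schedule, e.g. a 90-day window:
# build_retention_sql 90 | psql -h "$POSTGRES_HOST" -U metrxbot -d metrxbot
```

Follow the delete with VACUUM (ANALYZE) events, or move to date-based partitions so old data can be dropped as whole partitions instead.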
Upgrade Path
To upgrade Metrx:
# Backup database
pg_dump metrxbot > backup-$(date +%Y%m%d).sql
# Pull latest code
git pull origin main
# Run migrations
npm run migrate
# Rebuild and deploy
npm run build
docker build -t agentledger-gateway:latest apps/gateway/
docker push agentledger-gateway:latest
# Rolling restart (Kubernetes)
kubectl rollout restart deployment/agentledger-gateway

Next Steps: See Architecture for system design details or API Reference for integration.