Production Deployment Guide

⚠️ FUTURE WORK REFERENCE - This document is for planning and reference only

Status: 📋 Analysis complete, implementation NOT started
Purpose: Reference material for future production deployment implementation
Last Updated: 2025-01-17

🚧 IMPORTANT NOTES:

DNS Architecture Changing: See GitHub Issue #30 - PowerDNS will be replaced with CoreDNS file plugin

Not Production Ready: Scripts and procedures need to be created and tested

Architecture May Evolve: Recommendations here are based on the current dev setup and may change

Use for Planning: Use this as a reference when implementing production deployment

📋 Table of Contents

Detailed Analysis

Architecture Analysis
Component-by-Component Breakdown
Security Considerations
Deployment Procedures (Future)
Operations & Maintenance
Cost & Resource Planning

Executive Summary

This document provides analysis and planning for deploying Holistix Forge to production on an Ubuntu VPS. It estimates that about 85% of the development scripts and 90% of the architecture can be reused for production with minimal adaptation.

Key Findings

Good News:

✅ Development environment designed with production parity in mind
✅ Most scripts work with minor modifications
✅ Architecture is production-ready
✅ Main differences are simplifications (fewer components in production)

Important Caveats:

⚠️ DNS architecture is changing (see Issue #30)
⚠️ Production scripts need to be created
⚠️ Full deployment testing required
⚠️ Security hardening needs implementation

Deployment Strategy

Approach: Maximize reuse from development setup, implement only necessary differences.

Main Differences:

❌ No dev container - Install directly on Ubuntu VPS
⚠️ DNS: PowerDNS on port 53 (but changing to CoreDNS file plugin per #30)
✏️ SSL: Let's Encrypt with DNS-01 challenge (instead of mkcert)
➕ systemd services for process management
➕ Security hardening (firewall, SSH, passwords)
➕ Monitoring alerts and automated backups

Key Decisions

Decision 1: Direct Install (No Dev Container) ✅

Recommendation: Install services directly on Ubuntu VPS

Reasoning:

Dev container provides no functional benefit in production
Simplifies operations and debugging
Better performance (no container overhead)
Standard Linux administration tools

Impact: Need to adapt scripts that assume container environment

Decision 2: DNS Architecture ⚠️ CHANGING

⚠️ CRITICAL NOTE: This decision will change when Issue #30 is implemented.

Current Plan (in this doc): PowerDNS on port 53
Future Plan (Issue #30): CoreDNS file plugin with wildcard DNS
Result: Production will be even simpler (no database, no dynamic DNS operations)

Current Recommendation: PowerDNS on port 53 directly

Current Reasoning:

Production has domain delegation (simpler than dev)
CoreDNS only needed for local dev forwarding
Single DNS server instead of two

Future Recommendation (after #30):

Use CoreDNS with file plugin
Static zone files with wildcard DNS (*.domain)
No PowerDNS, no database, no dynamic operations
Even simpler architecture
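To make the wildcard idea concrete, the sketch below prints what a static zone for CoreDNS's `file` plugin could look like. It is not the design from Issue #30: the function name `render_zone`, the `ns1` host name, the single IPv4 address and the SOA timer values are all assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: print a static zone for CoreDNS's file plugin in which one
# wildcard A record answers for every gateway/container subdomain.
# render_zone, the ns1 host and the SOA timers are hypothetical placeholders.

render_zone() {
  local domain=$1 ip=$2 serial=$3
  cat <<EOF
\$ORIGIN ${domain}.
\$TTL 300
@    IN SOA ns1.${domain}. hostmaster.${domain}. ( ${serial} 7200 3600 1209600 300 )
@    IN NS  ns1.${domain}.
ns1  IN A   ${ip}
@    IN A   ${ip}
*    IN A   ${ip}
EOF
}

# A matching Corefile entry would point the file plugin at the generated zone:
#   your-domain.com {
#       file /etc/coredns/db.your-domain.com
#       errors
#       log
#   }
```

For example, `render_zone your-domain.com 203.0.113.10 2025011701 > /etc/coredns/db.your-domain.com` would write the zone. The wildcard does not match the apex itself, which is why `@` gets its own A record, and the serial only needs bumping when the file changes.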

Impact: Wait for Issue #30 before implementing DNS setup

Decision 3: Let's Encrypt SSL ✅

Recommendation: Let's Encrypt with DNS-01 challenge for wildcard certificates

Reasoning:

Free and fully automated
Wildcard support (critical for dynamic gateways/containers)
Industry standard
Automatic renewal

Requirements:

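DNS provider API access (Cloudflare, Route53, etc.)
Certbot with the matching DNS plugin
If the zone is delegated to the VPS's own DNS server (see Decision 2), the `_acme-challenge` TXT record must be published there instead, for example via certbot's RFC 2136 plugin against a server that accepts dynamic updates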

Impact: Need DNS provider API credentials
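As a rough sketch of the DNS-01 flow, assuming Cloudflare as the provider: certbot's Cloudflare plugin creates the `_acme-challenge` TXT record through the API and requests one certificate covering both the apex and the wildcard. The function name, the credentials path and the `DRY_RUN` switch are illustrative assumptions, not existing project scripts.

```shell
#!/usr/bin/env bash
# Sketch: request one certificate for the apex and the wildcard via DNS-01,
# using certbot's Cloudflare plugin (Ubuntu package: python3-certbot-dns-cloudflare).
# The credentials file path is a hypothetical default; keep that file at mode 0600.
# With DRY_RUN=1 the command is printed instead of executed.

issue_wildcard_cert() {
  local domain=$1 email=$2
  local creds=${3:-/etc/holistix/secrets/cloudflare.ini}
  set -- certbot certonly --non-interactive --agree-tos -m "$email" \
    --dns-cloudflare --dns-cloudflare-credentials "$creds" \
    -d "$domain" -d "*.$domain"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Note that `*.your-domain.com` matches exactly one label: a name such as `gw-1.your-domain.com` is covered, `a.gw-1.your-domain.com` is not. On Ubuntu the certbot package installs a systemd timer for renewal, and `certbot renew --dry-run` exercises that path against the staging server.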

Decision 4: Pre-Built Artifacts ✅

Recommendation: Build locally or in CI/CD, deploy artifacts only

Reasoning:

No source code on production server
Faster deployments
No build tools needed on production
Better security

Impact: Need deployment pipeline or local build process

Decision 5: systemd Services ✅

Recommendation: Use systemd for all service management

Reasoning:

Auto-restart on crash
Start on boot
Resource limits
Standard logging (journalctl)
Standard operations (systemctl)

Impact: Need to create systemd service files

Development vs Production Comparison

Quick Reference Table

Component	Development	Production	Changes Required
Host Environment	Dev Container (Ubuntu)	Ubuntu VPS	❌ Remove container layer
PostgreSQL	`apt install`	`apt install`	✅ Same install ✏️ Harden config
Nginx	`apt install`	`apt install`	✅ Same install ➕ Security headers
DNS	CoreDNS + PowerDNS	⚠️ TBD (see #30)	⚠️ Wait for Issue #30
SSL	mkcert	Let's Encrypt	✏️ Change SSL automation
Services	Manual start	systemd	➕ Create service files
Node.js	NodeSource 24.x	NodeSource 24.x	✅ Same
Docker	Docker Desktop	Docker Engine	✅ Same (for gateways)
Monitoring	Optional	Required	✅ Same stack + alerts

DNS Architecture Comparison

⚠️ NOTE: This comparison assumes current architecture. See Issue #30 for planned changes.

Aspect	Development	Production (Current)	Production (Future #30)
Tiers	Two-tier	Single-tier	Single-tier
DNS Servers	CoreDNS + PowerDNS	PowerDNS only	CoreDNS only
Port	53 (CoreDNS), 5300 (PowerDNS)	53 (PowerDNS)	53 (CoreDNS)
Database	PostgreSQL for PowerDNS	PostgreSQL for PowerDNS	None!
Dynamic DNS	Yes (via API)	Yes (via API)	No (wildcard)
Complexity	Medium	Low	Very Low

Why Different?

Dev: Need both local (*.domain.local) and external DNS forwarding
Prod (current): Domain delegation handles routing, no forwarding needed
Prod (future): Wildcard DNS eliminates need for dynamic records!

SSL/TLS Comparison

Aspect	Development	Production
Tool	mkcert	Let's Encrypt (certbot)
Certificate Type	Local dev CA (mkcert)	Publicly trusted CA
Wildcard	✅ `*.domain.local`	✅ `*.your-domain.com`
Challenge	N/A	DNS-01 (for wildcard)
Renewal	Manual re-issue (long-lived certs)	Automatic (90-day certs, renewed by certbot before expiry)
Client Trust	Manual CA install	Automatic (browser trusted)
Cost	Free	Free

Service Management Comparison

Aspect	Development	Production
Ganymede	Manual `node main.js &`	systemd service
DNS	Manual start	systemd service
Nginx	System service	systemd service
Auto-start	❌ Manual	✅ On boot
Restart on Crash	❌ No	✅ Yes
Resource Limits	❌ None	✅ systemd limits
Logging	Files	journalctl + files

Security Comparison

Aspect	Development	Production
Firewall	❌ Not configured	✅ ufw with strict rules
SSH	Default	✅ Hardened (no root, key-only)
DB Password	`devpassword`	Strong random (32 chars)
DB User	postgres superuser	✅ Limited app user
SSL/TLS	Local dev CA (mkcert)	Publicly trusted CA
Rate Limiting	❌ None	✅ Nginx limits
Security Headers	❌ None	✅ X-Frame, CSP, etc.
Auto Updates	Manual	✅ unattended-upgrades

Implementation Roadmap

Phase 1: Core Infrastructure (Week 1)

Goal: Get VPS ready with basic services

Tasks:

[ ] Provision Ubuntu 24.04 VPS
[ ] Configure SSH hardening
[ ] Setup firewall (ufw)
[ ] Configure DNS at domain registrar
[ ] Install Node.js, PostgreSQL, Nginx, Docker
[ ] Setup Let's Encrypt SSL

Deliverables:

Accessible VPS with hardened SSH
Domain pointing to VPS
SSL certificate working
Core dependencies installed

Estimated Time: 8 hours (+ 24-48h DNS propagation wait)

Phase 2: Script Adaptation (Week 2)

Goal: Create production-specific scripts

Tasks:

[ ] Wait for Issue #30 DNS architecture decision
[ ] Create scripts/production/setup-production.sh
[ ] Create systemd service files
[ ] Adapt create-env.sh for production
[ ] Create scripts/production/deploy.sh
[ ] Create backup scripts
[ ] Document all procedures

Deliverables:

Production setup script
Production environment creation script
Deployment automation
Backup automation
systemd service templates

Estimated Time: 16 hours

Phase 3: Deployment & Testing (Week 3)

Goal: Deploy and verify full stack

Tasks:

[ ] Build application artifacts
[ ] Run production setup
[ ] Create production environment
[ ] Deploy artifacts
[ ] Start services
[ ] Test all functionality
[ ] Fix issues
[ ] Security audit

Deliverables:

Working production deployment
Test results documentation
Issue tracking for bugs

Estimated Time: 24 hours

Phase 4: Operations Setup (Week 4)

Goal: Production-ready operations

Tasks:

[ ] Configure Grafana alerts
[ ] Setup external uptime monitoring
[ ] Test backup/restore procedures
[ ] Create runbooks
[ ] Load testing
[ ] Disaster recovery plan
[ ] CI/CD integration

Deliverables:

Monitoring and alerting configured
Tested backup/restore procedures
Operational runbooks
CI/CD pipeline

Estimated Time: 24 hours

Total Timeline: ~4 weeks

Architecture Analysis

What We Have (Development)

Components:

Main dev container (Ubuntu 24.04)
PostgreSQL database
PowerDNS (port 5300) + CoreDNS (port 53)
Nginx for SSL and routing
Ganymede API (Express.js)
Gateway pool (Docker containers)
User containers (Docker)
Monitoring stack (Grafana, Loki, Tempo)

Strengths:

✅ Complete local development environment
✅ Production parity in architecture
✅ Comprehensive automation scripts
✅ Well-documented setup

Production Gaps:

⚠️ mkcert SSL (need Let's Encrypt)
⚠️ Manual process management (need systemd)
⚠️ Weak security defaults
⚠️ No monitoring alerts
⚠️ No automated backups

Component Reusability Matrix

Component	Reusability	Notes
PostgreSQL setup	85%	Add hardening steps
DNS (PowerDNS)	⚠️ TBD	Wait for Issue #30
DNS (CoreDNS)	⚠️ TBD	May keep with file plugin
Nginx config	85%	Change SSL paths, add headers
Ganymede app	95%	No code changes
Gateway pool	95%	Pre-built artifacts, health checks, resource limits
Frontend build	95%	Production build config, cache headers
Monitoring	100%	Add alerts

Overall Reusability: 85%

Component-by-Component Breakdown

PostgreSQL

Development:

Installed via apt install postgresql
Default configuration
Weak password (devpassword)
Superuser used directly

Production Adaptations:

✅ Keep apt install postgresql (same)
✏️ Generate strong random password
✏️ Create limited application user (already in create-env.sh!)
➕ Configure connection limits
➕ Enable SSL/TLS for connections
➕ Setup automated backups
➕ Add monitoring

Script Impact:

setup-postgres.sh - Add hardening (85% reusable)
create-env.sh - Already creates app user ✅

DNS (⚠️ Architecture Changing)

See Issue #30 for planned architecture changes.

Current Plan (may be obsolete):

PowerDNS on port 53
Remove CoreDNS

Future Plan (Issue #30):

CoreDNS with file plugin on port 53
Static zone files with wildcard DNS
Remove PowerDNS entirely

Recommendation: Wait for Issue #30 before implementing DNS in production

Nginx

Development:

SSL termination with mkcert
Proxy to Ganymede and gateways
Basic configuration

Production Adaptations:

✏️ Use Let's Encrypt SSL certificates
➕ Add security headers (X-Frame-Options, CSP, etc.)
➕ Add rate limiting
➕ Add gzip compression
✅ Keep proxy configuration (same)
✅ Keep dynamic gateway configs (same)

Security Headers:

# server {} level; a location {} that sets its own add_header stops inheriting these
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
# X-XSS-Protection is deprecated; "0" turns off the legacy browser filter
add_header X-XSS-Protection "0" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline';" always;

Reusability: 85%
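Once the security headers are deployed, a quick way to confirm they are actually served is to inspect the raw response headers. The sketch below reads a header dump (for example from `curl -sI https://your-domain.com`) and reports which of a subset of the snippet's headers are absent; the function name is hypothetical and the header list is an assumption to adjust as the config evolves.

```shell
#!/usr/bin/env bash
# Sketch: report which expected security headers are missing from an HTTP
# response header dump read on stdin (e.g. `curl -sI https://your-domain.com`).
# Exits 0 when all are present, 1 otherwise.

check_security_headers() {
  local headers missing=0 h
  headers=$(cat)   # read the whole header dump from stdin
  for h in X-Frame-Options X-Content-Type-Options Referrer-Policy Content-Security-Policy; do
    # Header names are case-insensitive (HTTP/2 sends them lowercase).
    if ! printf '%s\n' "$headers" | grep -qi "^$h:"; then
      echo "missing: $h"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `curl -sI https://your-domain.com | check_security_headers`. A common reason for failures on some paths: Nginx only inherits `add_header` directives from the enclosing level when the current `location` defines none of its own.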

Ganymede (API Server)

Development:

Runs manually via node main.js &
Logs to file
No restart on crash

Production Adaptations:

✏️ Run via systemd service
✏️ Use production environment variables
➕ Add resource limits (systemd MemoryMax, CPUQuota)
➕ Add security sandboxing (systemd directives)
✅ Keep application code (no changes needed)
✅ Keep database schema (no changes)

systemd Service Example:

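# /etc/systemd/system/ganymede@.service
# Template unit: the instance name is the environment, e.g. ganymede@prod
[Unit]
Description=Holistix Ganymede API (%i)
After=network-online.target postgresql.service
Wants=network-online.target

[Service]
Type=simple
User=holistix
WorkingDirectory=/opt/holistix/%i
EnvironmentFile=/opt/holistix/%i/.env.ganymede

# Security
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
# strict makes the whole filesystem read-only; re-allow the app's own dirs
# ("-" skips paths that don't exist; add any other paths the app writes to)
ReadWritePaths=-/opt/holistix/%i/logs -/opt/holistix/%i/org-data

# Resources
MemoryMax=2G
CPUQuota=200%

# Restart
Restart=on-failure
RestartSec=5s

ExecStart=/usr/bin/node dist/packages/app-ganymede/main.js

# Needed for "systemctl enable" to start the service on boot
[Install]
WantedBy=multi-user.target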

Reusability: 95%

Gateway Pool

Development:

Docker containers
HTTP build distribution
Dynamic allocation

Production Adaptations:

✅ Keep Docker containers (same)
✅ Keep allocation logic (same)
✏️ Use pre-built artifacts instead of HTTP distribution
➕ Add container health checks
➕ Add resource limits (Docker --memory, --cpus)
✅ Keep lifecycle management (same)

Reusability: 95%

Frontend

Development:

Built with Vite
Served by Nginx
Hot reload in dev mode

Production Adaptations:

✅ Keep Vite build process (same)
✏️ Build with --configuration=production
➕ Add cache headers in Nginx
➕ Add CDN integration (optional)
✅ Keep Nginx serving (same)

Reusability: 95%

Security Considerations

Firewall Configuration

Required Ports:

# SSH
ufw allow 22/tcp

# HTTP/HTTPS
ufw allow 80/tcp
ufw allow 443/tcp

# DNS
ufw allow 53/tcp
ufw allow 53/udp

# Block everything else
ufw default deny incoming
ufw default allow outgoing

ufw enable

SSH Hardening

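# /etc/ssh/sshd_config (validate with `sshd -t` before restarting ssh, and keep
# the current session open until a key-based login in a new one succeeds)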
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
LoginGraceTime 20

Database Security

Strong Passwords:

# Generate a 32-character random password (hex: 128 bits, and no '/', '+' or '='
# characters that would need escaping in connection URLs)
DB_PASSWORD=$(openssl rand -hex 16)

Limited Privileges:

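-- App user gets DML only (run as the schema owner, connected to ganymede_prod)
GRANT USAGE ON SCHEMA public TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;
-- No CREATE, DROP, or user management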

SSL Enforcement:

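# postgresql.conf: turn TLS on (needs ssl_cert_file / ssl_key_file)
ssl = on
# pg_hba.conf: enforcement comes from using only "hostssl" (not "host") rows
hostssl  all  app_user  127.0.0.1/32  scram-sha-256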

Secrets Management

DO NOT:

❌ Commit secrets to git
❌ Store in plain text

DO:

✅ Use environment files with 0600 permissions
✅ Store in /etc/holistix/secrets/
✅ Consider secret management tools (Vault, AWS Secrets Manager)
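A minimal sketch of the "environment files with 0600 permissions" rule: the helper below sets one `KEY=value` line, writing through a temporary file created with owner-only permissions and renaming it into place, so the file is never readable by other users. The function name and the example path are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: write (or replace) KEY=VALUE in an env file kept at mode 0600.
# set_secret is a hypothetical helper name.

set_secret() {
  local file=$1 key=$2 value=$3 tmp
  # Create the temp file next to the target with umask 077, so it is
  # owner-only from the start and the final mv is an atomic rename.
  tmp=$( umask 077 && mktemp "${file}.XXXXXX" ) || return 1
  if [ -f "$file" ]; then
    # Copy every other key; grep exits 1 when nothing remains, which is fine.
    grep -v "^${key}=" "$file" > "$tmp" || true
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  chmod 600 "$tmp"
  mv -f "$tmp" "$file"
}
```

For example: `set_secret /etc/holistix/secrets/ganymede.env DB_PASSWORD "$(openssl rand -hex 16)"` (file name hypothetical). systemd reads `EnvironmentFile=` itself before switching to `User=`, so a root-owned 0600 file works for the Ganymede unit.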

Rate Limiting

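# Nginx rate limiting
# http {} context: 10MB shared zone keyed by client IP, 10 requests/second per IP
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

# server {} context: up to 20 excess requests are queued and delayed;
# beyond that, requests are rejected (503 by default)
location / {
    limit_req zone=api burst=20;
}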

Deployment Procedures (Future)

⚠️ NOTE: These procedures are for reference only. They need to be tested and refined before use.

Prerequisites

VPS Requirements:

Ubuntu 24.04 LTS
4 vCPU, 8GB RAM, 100GB SSD (minimum)
Static public IP
Cost: ~$35-50/month

Domain Requirements:

Owned domain name
DNS registrar access
Ability to configure NS records

DNS Provider API:

Cloudflare (recommended)
Route 53 (AWS)
Or other certbot-supported provider

Deployment Steps (High-Level)

VPS Provisioning

Create VPS instance
Configure SSH hardening
Setup firewall

DNS Configuration

Configure domain delegation
Wait for propagation

Application Setup

Install dependencies
Setup PostgreSQL
Setup DNS (wait for Issue #30)
Setup Let's Encrypt

Environment Creation

Build applications
Create production environment
Configure systemd services

Verification

Test DNS resolution
Test HTTPS
Test API
Test frontend

Operations Setup
Configure monitoring
Setup backups
Configure alerts

Detailed procedures: to be written once a test deployment has been completed.

Operations & Maintenance

Monitoring

Components to Monitor:

System metrics (CPU, RAM, disk, network)
Application metrics (API requests, response times)
Database metrics (connections, queries)
Gateway pool status
Container metrics

Tools:

Grafana (dashboards)
Loki (logs)
Tempo (traces)
OTLP Collector
UptimeRobot (external uptime)

Alert Rules:

Gateway pool exhausted
Disk usage > 80%
High memory usage
SSL certificate expiring (< 30 days)
API errors > threshold
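For the certificate-expiry rule, a check like the sketch below could run from cron or a systemd timer and feed whichever alerting channel is chosen. It relies on `openssl x509 -checkend`; the function name and the default certificate path (certbot's usual `live/` layout, with a placeholder domain) are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit 0) when a certificate expires within N days (default 30).

cert_expiring_soon() {
  local cert=${1:-/etc/letsencrypt/live/your-domain.com/fullchain.pem}
  local days=${2:-30}
  # `openssl x509 -checkend N` exits 0 if the cert is still valid N seconds
  # from now, and non-zero if it will have expired (or cannot be read).
  if openssl x509 -noout -checkend $(( days * 86400 )) -in "$cert" >/dev/null; then
    return 1   # still valid after $days days: nothing to report
  fi
  return 0     # expiring soon, or unreadable: raise the alert
}
```

`cert_expiring_soon && echo 'certificate needs attention'` prints only when fewer than 30 days remain. A missing or unreadable certificate also counts as expiring, so a broken path raises the alert rather than silencing it.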

Backups

What to Backup:

PostgreSQL databases (all ganymede_*)
Organization data (org-data/ directory)
Nginx configurations
Environment files (.env.*)
SSL certificates (auto-renewed, but backup for safety)

Backup Schedule:

Daily backups at 2 AM
Keep last 7 days
Weekly backups kept for 4 weeks
Monthly backups kept for 12 months
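One way to implement all three tiers of this schedule is sketched below; without something like it, the plain `-mtime +7` cleanup in the example backup script deletes the weekly and monthly copies along with the old daily ones. Keeping Sunday backups as weekly copies and 1st-of-month backups as monthly copies, approximating 12 months as 365 days, the function names, and the reliance on GNU `date` are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the 7-daily / 4-weekly / 12-monthly schedule (see assumptions above).

days_between() {
  # Whole days from date $1 to date $2 (YYYY-MM-DD), computed in UTC.
  echo $(( ( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ) / 86400 ))
}

should_keep() {
  local backup_date=$1 today=$2 age
  age=$(days_between "$backup_date" "$today")
  if [ "$age" -lt 7 ]; then
    return 0                                  # daily tier: last 7 days
  fi
  if [ "$age" -lt 28 ] && [ "$(date -u -d "$backup_date" +%u)" = 7 ]; then
    return 0                                  # weekly tier: Sundays, 4 weeks
  fi
  if [ "$age" -lt 365 ] && [ "$(date -u -d "$backup_date" +%d)" = 01 ]; then
    return 0                                  # monthly tier: 1st of month
  fi
  return 1
}

prune_backups() {
  # Delete *.gz files in $1 that no tier keeps; $2 overrides "today" (YYYY-MM-DD).
  local dir=$1 today=${2:-$(date -u +%F)} f stamp
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue
    # Extract YYYYMMDD from names like ganymede_prod_20250117_020000.sql.gz
    stamp=$(printf '%s\n' "${f##*/}" | sed -n 's/.*_\([0-9]\{8\}\)_[0-9]\{6\}\..*/\1/p')
    [ -n "$stamp" ] || continue               # leave unrecognised files alone
    stamp=$(printf '%s\n' "$stamp" | sed 's/\(....\)\(..\)\(..\)/\1-\2-\3/')
    if ! should_keep "$stamp" "$today"; then
      rm -f -- "$f"
    fi
  done
}
```

It would replace the `find` cleanup step, e.g. `prune_backups /opt/holistix/backups/postgres` and `prune_backups /opt/holistix/backups/org-data`. Files that do not follow the `_YYYYMMDD_HHMMSS` naming are left untouched.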

Backup Script (Example):

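#!/bin/bash
# Nightly backup example. Assumes the user running it can pg_dump ganymede_prod
# (e.g. credentials in ~/.pgpass) and read the org-data directory.
set -euo pipefail

BACKUP_DIR="/opt/holistix/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR/postgres" "$BACKUP_DIR/org-data"

# Backup PostgreSQL (pipefail makes a pg_dump error fail the script, not just gzip's)
pg_dump ganymede_prod | gzip > "$BACKUP_DIR/postgres/ganymede_prod_${DATE}.sql.gz"

# Backup org-data (-C stores relative paths instead of /opt/holistix/...)
tar -czf "$BACKUP_DIR/org-data/org-data_${DATE}.tar.gz" -C /opt/holistix/prod org-data

# Cleanup: delete backup files (never directories) at least 8 days old (-mtime +7).
# This covers only the daily tier; weekly/monthly retention needs more logic.
find "$BACKUP_DIR" -type f -name '*.gz' -mtime +7 -delete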

Common Operations

Deploy Updates:

# Build locally
npx nx run-many --target=build --all --configuration=production
ARTIFACT="holistix-$(git rev-parse --short HEAD).tar.gz"
tar -czf "$ARTIFACT" dist/

# Deploy to VPS (exact file name, so older artifacts in /tmp are never picked up)
scp "$ARTIFACT" holistix@VPS:/tmp/
ssh holistix@VPS "cd /opt/holistix/prod && tar -xzf /tmp/$ARTIFACT"
# Restarting a system unit needs root (e.g. a sudoers rule for this command)
ssh holistix@VPS "sudo systemctl restart ganymede@prod"

Scale Gateway Pool:

# Add 5 more gateways
ENV_NAME=prod DOMAIN=your-domain.com \
  ./scripts/local-dev/gateway-pool.sh create 5 /opt/holistix/monorepo

View Logs:

# System logs
journalctl -u ganymede@prod -f

# Application logs
tail -f /opt/holistix/prod/logs/ganymede.log

# Gateway logs
docker logs -f gw-pool-0

Cost & Resource Planning

VPS Cost Estimates

Minimum (testing):

2 vCPU, 4GB RAM, 50GB SSD
$15-25/month
Suitable for: Testing, small deployment

Recommended (production):

4 vCPU, 8GB RAM, 100GB SSD
$35-50/month
Suitable for: Production, 10-50 users

High Performance:

8 vCPU, 16GB RAM, 200GB SSD
$80-120/month
Suitable for: Large deployment, 100+ users
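When choosing between these tiers, it helps to estimate gateway capacity directly. The helper below is a worked example, not a project script: it subtracts fixed reservations from total RAM and divides the remainder by the per-gateway figure. The inputs used here (2GB PostgreSQL, 2GB Ganymede, 1GB user containers, 1GB OS, 300MB per gateway) come from the Resource Distribution breakdown; treating 1GB as 1024MB is an assumption.

```shell
#!/usr/bin/env bash
# Worked sizing helper: how many gateways fit in the RAM left over after the
# fixed allocations? All values in MB. The name max_gateways is hypothetical.

max_gateways() {
  local total=$1 per_gateway=$2 reserved=0 mb
  shift 2
  for mb in "$@"; do
    reserved=$(( reserved + mb ))   # PostgreSQL, Ganymede, user containers, OS, ...
  done
  if [ "$reserved" -ge "$total" ]; then
    echo 0                          # nothing left for the pool
  else
    echo $(( (total - reserved) / per_gateway ))   # integer division rounds down
  fi
}
```

`max_gateways 8192 300 2048 2048 1024 1024` prints 6; the same reservations on the 16GB tier (`16384`) leave room for 34 gateways, while on the 4GB testing tier (`4096`) the result is 0, meaning that tier needs smaller allocations across the board.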

Resource Distribution (8GB VPS)

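PostgreSQL:      2GB
Ganymede:        2GB
Gateway Pool:    3GB (10 gateways @ 300MB each)
User Containers: 1GB (2-4 containers)
System:          1GB (OS overhead)
Total:           9GB

As listed, these allocations add up to 1GB more than the VPS has. Either treat them as ceilings that will not all be reached at once, or size the pool to the ~2GB that remains after the other allocations (about 6 gateways at 300MB each).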

Storage Planning

Application:    500MB (dist + node_modules)
PostgreSQL:     1-5GB (depends on usage)
Logs:           1-2GB (with rotation)
Backups:        5-10GB (7 days of DB backups)
User Data:      Variable (org-data files)
Total:          ~10-20GB typical

Bandwidth Estimation

Per User Per Day:

Initial load: ~2MB (frontend bundle)
WebSocket: ~1MB (collaboration)
API requests: ~1MB

Example: 100 active users:

Daily: ~400MB
Monthly: ~12GB
Well within typical 4TB bandwidth limits

Script Reusability Summary

Scripts That Work As-Is (100%)

install-node.sh - Node.js installation
build-images.sh - Gateway Docker image
gateway-pool.sh - Gateway pool management
envctl-monitor.sh - Environment monitoring
build-frontend.sh - Frontend build

Scripts Needing Minor Changes (85-95%)

setup-postgres.sh - Add hardening steps
create-env.sh - Replace mkcert with Let's Encrypt, adapt paths
envctl.sh - Add systemd support
install-system-deps.sh - Minor tweaks

Scripts Not Needed in Production

setup-coredns.sh - DNS architecture changing (Issue #30)
update-coredns.sh - DNS architecture changing
install-mkcert.sh - Using Let's Encrypt instead

New Scripts Needed

scripts/production/setup-production.sh - Main production setup
scripts/production/setup-letsencrypt.sh - SSL automation
scripts/production/create-systemd-services.sh - Service files
scripts/production/harden-system.sh - Security hardening
scripts/production/deploy.sh - Deployment automation
scripts/production/backup-all.sh - Backup automation
scripts/production/restore.sh - Disaster recovery
scripts/production/health-check.sh - Deep health check

Timeline Estimates

Development Setup (First Time)

Create dev container: 10 min
Run setup-all.sh: 15-20 min
Create environment: 5 min
Build frontend: 5 min
Configure host DNS: 10 min
Total: 45-50 minutes

Production Setup (Estimated)

Provision VPS: 10 min
DNS delegation: 5 min (+ 24-48h wait)
SSH & security: 30 min
Run production setup: 20-30 min
SSL certificate: 5 min
Create environment: 10 min
Deploy artifacts: 10 min
Testing: 30 min
Monitoring setup: 30 min
Total: 2.5-3 hours (+ DNS propagation wait)

Risk Assessment

Development Risks (Low)

Dev container crash → Restart
Data loss → Not production data
Security breach → Local network only

Production Risks (High)

Risk	Impact	Mitigation
VPS crash	HIGH	Monitoring + alerts + backups
Database corruption	HIGH	Daily backups + replication
Security breach	HIGH	Hardening + updates + monitoring
SSL expiry	MEDIUM	Auto-renewal + alerts
DNS failure	MEDIUM	Health checks
Disk full	MEDIUM	Monitoring + log rotation
Gateway exhaustion	MEDIUM	Pool size alerts

Next Steps

Before Starting Implementation

✅ Read this document - Understand architecture and decisions
⚠️ Wait for Issue #30 - DNS architecture decision
📋 Create GitHub issue - Track production deployment work
🧪 Plan testing strategy - How to verify deployment

Implementation Order

Phase 1: Core infrastructure (VPS, security, dependencies)
Phase 2: Script adaptation (after Issue #30)
Phase 3: Deployment testing (staging environment)
Phase 4: Operations setup (monitoring, backups)

Success Criteria

[ ] Production deployment works end-to-end
[ ] All services auto-start on boot
[ ] Monitoring and alerts configured
[ ] Backups tested and working
[ ] Security audit passed
[ ] Load testing passed
[ ] Documentation complete

Conclusion

The Holistix Forge local development environment is remarkably production-ready. The main work required is:

Wait for DNS simplification (Issue #30)
Remove dev container layer (install directly)
Add Let's Encrypt SSL (instead of mkcert)
Create systemd services (proper management)
Implement security hardening (firewall, SSH, etc.)
Setup operations (monitoring, backups, alerts)

Key Insight: 85% of development work transfers to production. The architecture is solid, the foundation is there. The main task is creating and testing the production-specific scripts and procedures.

Related Documentation

Local Development Guide - Development environment setup
DNS Complete Guide - DNS architecture details
Gateway Architecture - Gateway system design
System Architecture - Overall system design
GitHub Issue #30 - DNS simplification

Document Status: 📋 Planning/Reference Only
Implementation Status: Not started - waiting for Issue #30
Next Action: Create GitHub issue to track implementation work
Maintainer: Core team
