Production Deployment Guide
⚠️ FUTURE WORK REFERENCE - This document is for planning and reference only
Status: Analysis complete, implementation NOT started
Purpose: Reference material for future production deployment implementation
Last Updated: 2025-01-17
🚧 IMPORTANT NOTES:
- DNS Architecture Changing: See GitHub Issue #30 - PowerDNS will be replaced with CoreDNS file plugin
- Not Production Ready: Scripts and procedures need to be created and tested
- Architecture May Evolve: Recommendations here are based on the current dev setup and may change
- Use for Planning: Use this as a reference when implementing production deployment
Table of Contents
Quick Reference
Detailed Analysis
- Architecture Analysis
- Component-by-Component Breakdown
- Security Considerations
- Deployment Procedures (Future)
- Operations & Maintenance
- Cost & Resource Planning
Executive Summary
This document provides comprehensive analysis and planning for Holistix Forge production deployment on Ubuntu VPS. The analysis shows that 85% of development scripts and 90% of architecture can be reused for production with minimal adaptation.
Key Findings
Good News:
- ✅ Development environment designed with production parity in mind
- ✅ Most scripts work with minor modifications
- ✅ Architecture is production-ready
- ✅ Main differences are simplifications (fewer components in production)
Important Caveats:
- ⚠️ DNS architecture is changing (see Issue #30)
- ⚠️ Production scripts need to be created
- ⚠️ Full deployment testing required
- ⚠️ Security hardening needs implementation
Deployment Strategy
Approach: Maximize reuse from development setup, implement only necessary differences.
Main Differences:
- ❌ No dev container - Install directly on Ubuntu VPS
- ⚠️ DNS: PowerDNS on port 53 (but changing to CoreDNS file plugin per #30)
- ✏️ SSL: Let's Encrypt with DNS-01 challenge (instead of mkcert)
- ➕ systemd services for process management
- ➕ Security hardening (firewall, SSH, passwords)
- ➕ Monitoring alerts and automated backups
Key Decisions
Decision 1: Direct Install (No Dev Container) ✅
Recommendation: Install services directly on Ubuntu VPS
Reasoning:
- Dev container provides no functional benefit in production
- Simplifies operations and debugging
- Better performance (no container overhead)
- Standard Linux administration tools
Impact: Need to adapt scripts that assume container environment
Decision 2: DNS Architecture ⚠️ CHANGING
⚠️ CRITICAL NOTE: This decision will change when Issue #30 is implemented.
Current Plan (in this doc): PowerDNS on port 53
Future Plan (Issue #30): CoreDNS file plugin with wildcard DNS
Result: Production will be even simpler (no database, no dynamic DNS operations)
Current Recommendation: PowerDNS on port 53 directly
Current Reasoning:
- Production has domain delegation (simpler than dev)
- CoreDNS only needed for local dev forwarding
- Single DNS server instead of two
Future Recommendation (after #30):
- Use CoreDNS with file plugin
- Static zone files with wildcard DNS (`*.domain`)
- No PowerDNS, no database, no dynamic operations
- Even simpler architecture
Impact: Wait for Issue #30 before implementing DNS setup
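As a rough illustration of the future plan, the sketch below generates a static zone file with a wildcard record and a matching Corefile for the CoreDNS `file` plugin. The function name, record layout, TTLs, and output directory are assumptions made for the example; the real layout will depend on how Issue #30 is resolved.

```shell
#!/usr/bin/env bash
# Sketch: generate a static zone file and a Corefile for the CoreDNS "file" plugin.
# The domain, IP address, TTLs, and output directory are illustrative placeholders.

# write_zone DOMAIN IPV4 OUTDIR
write_zone() {
  local domain="$1" ip="$2" outdir="$3"
  local serial
  serial="$(date -u +%Y%m%d)01"   # date-based SOA serial, e.g. 2025011701

  mkdir -p "$outdir"

  # RFC 1035 zone file: SOA, NS, apex A record, and a wildcard A record.
  cat > "$outdir/db.$domain" <<EOF
\$ORIGIN $domain.
\$TTL 300
@    IN SOA ns1.$domain. hostmaster.$domain. (
         $serial ; serial
         3600    ; refresh
         600     ; retry
         604800  ; expire
         300 )   ; negative-cache TTL
@    IN NS  ns1.$domain.
ns1  IN A   $ip
@    IN A   $ip
*    IN A   $ip
EOF

  # Corefile: serve the zone authoritatively on port 53.
  cat > "$outdir/Corefile" <<EOF
$domain:53 {
    file $outdir/db.$domain
    errors
    log
}
EOF
}
```

For example, `write_zone your-domain.com 203.0.113.10 /etc/coredns` would produce both files; with the wildcard record in place, new gateway or container hostnames resolve without any per-host DNS update.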
Decision 3: Let's Encrypt SSL ✅
Recommendation: Let's Encrypt with DNS-01 challenge for wildcard certificates
Reasoning:
- Free and fully automated
- Wildcard support (critical for dynamic gateways/containers)
- Industry standard
- Automatic renewal
Requirements:
- DNS provider API access (Cloudflare, Route53, etc.)
- Certbot with DNS plugin
Impact: Need DNS provider API credentials
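A wildcard request via DNS-01 might look like the sketch below. It assumes the Cloudflare DNS plugin for certbot (`certbot-dns-cloudflare`) is installed; the credentials path and the helper name are hypothetical. The function only prints the command so it can be reviewed before anything is run against Let's Encrypt.

```shell
#!/usr/bin/env bash
# Sketch: build the certbot command for a wildcard certificate via DNS-01.
# Assumes the certbot-dns-cloudflare plugin; the credentials path is hypothetical.

# certbot_wildcard_cmd DOMAIN EMAIL [CREDENTIALS_FILE]
# Prints the certbot invocation instead of executing it.
certbot_wildcard_cmd() {
  local domain="$1" email="$2"
  local creds="${3:-/etc/letsencrypt/cloudflare.ini}"
  # The wildcard name is single-quoted in the output so a shell will not glob it.
  printf '%s ' \
    certbot certonly \
    --dns-cloudflare \
    --dns-cloudflare-credentials "$creds" \
    --non-interactive --agree-tos \
    -m "$email" \
    -d "$domain" \
    -d "'*.$domain'"
  printf '\n'
}
```

Running `certbot_wildcard_cmd your-domain.com ops@your-domain.com` prints a single command line that covers both the apex domain and `*.your-domain.com`. Other providers use a different plugin flag and credentials option.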
Decision 4: Pre-Built Artifacts ✅
Recommendation: Build locally or in CI/CD, deploy artifacts only
Reasoning:
- No source code on production server
- Faster deployments
- No build tools needed on production
- Better security
Impact: Need deployment pipeline or local build process
Decision 5: systemd Services ✅
Recommendation: Use systemd for all service management
Reasoning:
- Auto-restart on crash
- Start on boot
- Resource limits
- Standard logging (journalctl)
- Standard operations (systemctl)
Impact: Need to create systemd service files
Development vs Production Comparison
Quick Reference Table
| Component | Development | Production | Changes Required |
|---|---|---|---|
| Host Environment | Dev Container (Ubuntu) | Ubuntu VPS | ❌ Remove container layer |
| PostgreSQL | `apt install` | `apt install` | ✅ Same install ✏️ Harden config |
| Nginx | `apt install` | `apt install` | ✅ Same install ➕ Security headers |
| DNS | CoreDNS + PowerDNS | ⚠️ TBD (see #30) | ⚠️ Wait for Issue #30 |
| SSL | mkcert | Let's Encrypt | ✏️ Change SSL automation |
| Services | Manual start | systemd | ➕ Create service files |
| Node.js | NodeSource 24.x | NodeSource 24.x | ✅ Same |
| Docker | Docker Desktop | Docker Engine | ✅ Same (for gateways) |
| Monitoring | Optional | Required | ✅ Same stack + alerts |
DNS Architecture Comparison
⚠️ NOTE: This comparison assumes current architecture. See Issue #30 for planned changes.
| Aspect | Development | Production (Current) | Production (Future #30) |
|---|---|---|---|
| Tiers | Two-tier | Single-tier | Single-tier |
| DNS Servers | CoreDNS + PowerDNS | PowerDNS only | CoreDNS only |
| Port | 53 (CoreDNS), 5300 (PowerDNS) | 53 (PowerDNS) | 53 (CoreDNS) |
| Database | PostgreSQL for PowerDNS | PostgreSQL for PowerDNS | None! |
| Dynamic DNS | Yes (via API) | Yes (via API) | No (wildcard) |
| Complexity | Medium | Low | Very Low |
Why Different?
- Dev: Need both local (`*.domain.local`) and external DNS forwarding
- Prod (current): Domain delegation handles routing, no forwarding needed
- Prod (future): Wildcard DNS eliminates need for dynamic records!
SSL/TLS Comparison
| Aspect | Development | Production |
|---|---|---|
| Tool | mkcert | Let's Encrypt (certbot) |
| Certificate Type | Locally trusted CA (mkcert) | Publicly trusted CA |
| Wildcard | ✅ `*.domain.local` | ✅ `*.your-domain.com` |
| Challenge | N/A | DNS-01 (for wildcard) |
| Renewal | Manual re-issue | Automatic (90-day certificates, renewed before expiry) |
| Client Trust | Manual CA install | Automatic (browser trusted) |
| Cost | Free | Free |
Service Management Comparison
| Aspect | Development | Production |
|---|---|---|
| Ganymede | Manual `node main.js &` | systemd service |
| DNS | Manual start | systemd service |
| Nginx | System service | systemd service |
| Auto-start | ❌ Manual | ✅ On boot |
| Restart on Crash | ❌ No | ✅ Yes |
| Resource Limits | ❌ None | ✅ systemd limits |
| Logging | Files | journalctl + files |
Security Comparison
| Aspect | Development | Production |
|---|---|---|
| Firewall | ❌ Not configured | ✅ ufw with strict rules |
| SSH | Default | ✅ Hardened (no root, key-only) |
| DB Password | `devpassword` | Strong random (32 chars) |
| DB User | postgres superuser | ✅ Limited app user |
| SSL/TLS | Locally trusted CA (mkcert) | Publicly trusted CA |
| Rate Limiting | ❌ None | ✅ Nginx limits |
| Security Headers | ❌ None | ✅ X-Frame-Options, CSP, etc. |
| Auto Updates | Manual | ✅ unattended-upgrades |
Implementation Roadmap
Phase 1: Core Infrastructure (Week 1)
Goal: Get VPS ready with basic services
Tasks:
- [ ] Provision Ubuntu 24.04 VPS
- [ ] Configure SSH hardening
- [ ] Setup firewall (ufw)
- [ ] Configure DNS at domain registrar
- [ ] Install Node.js, PostgreSQL, Nginx, Docker
- [ ] Setup Let's Encrypt SSL
Deliverables:
- Accessible VPS with hardened SSH
- Domain pointing to VPS
- SSL certificate working
- Core dependencies installed
Estimated Time: 8 hours (+ 24-48h DNS propagation wait)
Phase 2: Script Adaptation (Week 2)
Goal: Create production-specific scripts
Tasks:
- [ ] Wait for Issue #30 DNS architecture decision
- [ ] Create `scripts/production/setup-production.sh`
- [ ] Create systemd service files
- [ ] Adapt `create-env.sh` for production
- [ ] Create `scripts/production/deploy.sh`
- [ ] Create backup scripts
- [ ] Document all procedures
Deliverables:
- Production setup script
- Production environment creation script
- Deployment automation
- Backup automation
- systemd service templates
Estimated Time: 16 hours
Phase 3: Deployment & Testing (Week 3)
Goal: Deploy and verify full stack
Tasks:
- [ ] Build application artifacts
- [ ] Run production setup
- [ ] Create production environment
- [ ] Deploy artifacts
- [ ] Start services
- [ ] Test all functionality
- [ ] Fix issues
- [ ] Security audit
Deliverables:
- Working production deployment
- Test results documentation
- Issue tracking for bugs
Estimated Time: 24 hours
Phase 4: Operations Setup (Week 4)
Goal: Production-ready operations
Tasks:
- [ ] Configure Grafana alerts
- [ ] Setup external uptime monitoring
- [ ] Test backup/restore procedures
- [ ] Create runbooks
- [ ] Load testing
- [ ] Disaster recovery plan
- [ ] CI/CD integration
Deliverables:
- Monitoring and alerting configured
- Tested backup/restore procedures
- Operational runbooks
- CI/CD pipeline
Estimated Time: 24 hours
Total Timeline: ~4 weeks
Architecture Analysis
What We Have (Development)
Components:
- Main dev container (Ubuntu 24.04)
- PostgreSQL database
- PowerDNS (port 5300) + CoreDNS (port 53)
- Nginx for SSL and routing
- Ganymede API (Express.js)
- Gateway pool (Docker containers)
- User containers (Docker)
- Monitoring stack (Grafana, Loki, Tempo)
Strengths:
- ✅ Complete local development environment
- ✅ Production parity in architecture
- ✅ Comprehensive automation scripts
- ✅ Well-documented setup
Production Gaps:
- ⚠️ mkcert SSL (need Let's Encrypt)
- ⚠️ Manual process management (need systemd)
- ⚠️ Weak security defaults
- ⚠️ No monitoring alerts
- ⚠️ No automated backups
Component Reusability Matrix
| Component | Reusability | Notes |
|---|---|---|
| PostgreSQL setup | 85% | Add hardening steps |
| DNS (PowerDNS) | ⚠️ TBD | Wait for Issue #30 |
| DNS (CoreDNS) | ⚠️ TBD | May keep with file plugin |
| Nginx config | 85% | Change SSL paths, add headers |
| Ganymede app | 95% | No code changes |
| Gateway pool | 95% | Add health checks and resource limits |
| Frontend build | 95% | Production build configuration, cache headers |
| Monitoring | 100% | Add alerts |
Overall Reusability: 85%
Component-by-Component Breakdown
PostgreSQL
Development:
- Installed via `apt install postgresql`
- Default configuration
- Weak password (`devpassword`)
- Superuser used directly
Production Adaptations:
- ✅ Keep `apt install postgresql` (same)
- ✏️ Generate strong random password
- ✏️ Create limited application user (already in `create-env.sh`!)
- ➕ Configure connection limits
- ➕ Enable SSL/TLS for connections
- ➕ Setup automated backups
- ➕ Add monitoring
Script Impact:
- `setup-postgres.sh` - Add hardening (85% reusable)
- `create-env.sh` - Already creates app user ✅
DNS (⚠️ Architecture Changing)
See Issue #30 for planned architecture changes.
Current Plan (may be obsolete):
- PowerDNS on port 53
- Remove CoreDNS
Future Plan (Issue #30):
- CoreDNS with file plugin on port 53
- Static zone files with wildcard DNS
- Remove PowerDNS entirely
Recommendation: Wait for Issue #30 before implementing DNS in production
Nginx
Development:
- SSL termination with mkcert
- Proxy to Ganymede and gateways
- Basic configuration
Production Adaptations:
- ✏️ Use Let's Encrypt SSL certificates
- ➕ Add security headers (X-Frame-Options, CSP, etc.)
- ➕ Add rate limiting
- ➕ Add gzip compression
- ✅ Keep proxy configuration (same)
- ✅ Keep dynamic gateway configs (same)
Security Headers:
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline';" always;
Reusability: 85%
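Once the headers are configured, a quick check can confirm that Nginx really sends them. The sketch below reads raw response headers on stdin and lists any of the five headers above that are missing; the function name is an assumption made for this example.

```shell
#!/usr/bin/env bash
# Sketch: report which of the security headers listed above are missing from
# an HTTP response. Reads raw response headers on stdin.

# missing_security_headers < headers.txt
# Prints one missing header name per line; prints nothing if all are present.
missing_security_headers() {
  local headers name
  # Normalise line endings and case so matching is case-insensitive.
  headers="$(tr -d '\r' | tr '[:upper:]' '[:lower:]')"
  for name in x-frame-options x-content-type-options x-xss-protection \
              referrer-policy content-security-policy; do
    if ! printf '%s\n' "$headers" | grep -q "^$name:"; then
      printf '%s\n' "$name"
    fi
  done
}
```

A typical call would be `curl -sI https://your-domain.com | missing_security_headers`, where empty output means every header was found.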
Ganymede (API Server)
Development:
- Runs manually via `node main.js &`
- Logs to file
- No restart on crash
Production Adaptations:
- ✏️ Run via systemd service
- ✏️ Use production environment variables
- ➕ Add resource limits (systemd MemoryMax, CPUQuota)
- ➕ Add security sandboxing (systemd directives)
- ✅ Keep application code (no changes needed)
- ✅ Keep database schema (no changes)
systemd Service Example:
[Service]
Type=simple
User=holistix
WorkingDirectory=/opt/holistix/prod
EnvironmentFile=/opt/holistix/prod/.env.ganymede
# Security
NoNewPrivileges=true
PrivateTmp=true
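ProtectSystem=strict
# ProtectSystem=strict makes the file system read-only for the service,
# so the application log directory has to be made writable explicitly
ReadWritePaths=/opt/holistix/prod/logs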
ProtectHome=true
# Resources
MemoryMax=2G
CPUQuota=200%
# Restart
Restart=on-failure
RestartSec=5s
ExecStart=/usr/bin/node dist/packages/app-ganymede/main.js
Reusability: 95%
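The operations commands later in this guide refer to the service as `ganymede@prod`, which suggests a systemd template unit. The sketch below writes such a template, assuming each environment lives under `/opt/holistix/<env>`; the unit name, the `[Unit]` and `[Install]` sections, and the helper function are assumptions, and the hardening and resource directives from the example above would still need to be added.

```shell
#!/usr/bin/env bash
# Sketch: write a systemd *template* unit (ganymede@.service) so one file can
# serve several environments, e.g. ganymede@prod. Paths mirror the example
# above; the unit name and target directory are assumptions.

# write_ganymede_unit [UNIT_DIR]
write_ganymede_unit() {
  local unit_dir="${1:-/etc/systemd/system}"
  mkdir -p "$unit_dir"
  # Quoted heredoc delimiter: %i is left for systemd to expand to the instance name.
  cat > "$unit_dir/ganymede@.service" <<'EOF'
[Unit]
Description=Ganymede API (%i)
After=network-online.target postgresql.service
Wants=network-online.target

[Service]
Type=simple
User=holistix
WorkingDirectory=/opt/holistix/%i
EnvironmentFile=/opt/holistix/%i/.env.ganymede
ExecStart=/usr/bin/node dist/packages/app-ganymede/main.js
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF
}
```

After writing the file as root, `systemctl daemon-reload` followed by `systemctl enable --now ganymede@prod` would start the instance and register it for boot.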
Gateway Pool
Development:
- Docker containers
- HTTP build distribution
- Dynamic allocation
Production Adaptations:
- ✅ Keep Docker containers (same)
- ✅ Keep allocation logic (same)
- ✏️ Use pre-built artifacts instead of HTTP distribution
- ➕ Add container health checks
- ➕ Add resource limits (Docker `--memory`, `--cpus`)
- ✅ Keep lifecycle management (same)
Reusability: 95%
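The health-check and resource-limit items could translate into `docker run` flags along these lines. The image name, health endpoint, port, and limit values are hypothetical; the 300 MB memory figure follows the per-gateway estimate used in the resource planning section.

```shell
#!/usr/bin/env bash
# Sketch: docker arguments for one gateway container with resource limits and
# a health check. Image name, health endpoint, and limits are hypothetical.

# gateway_run_args NAME IMAGE
# Prints the docker arguments one per line so they can be inspected or passed on.
gateway_run_args() {
  local name="$1" image="$2"
  local args=(
    run --detach
    --name "$name"
    --restart unless-stopped
    --memory 300m
    --cpus 0.5
    --health-cmd "curl -fsS http://localhost:8080/health || exit 1"
    --health-interval 30s
    --health-timeout 5s
    --health-retries 3
    "$image"
  )
  printf '%s\n' "${args[@]}"
}
```

To start a container from this, the output can be read back into an array, for example `mapfile -t args < <(gateway_run_args gw-pool-0 gateway:latest)` followed by `docker "${args[@]}"`.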
Frontend
Development:
- Built with Vite
- Served by Nginx
- Hot reload in dev mode
Production Adaptations:
- ✅ Keep Vite build process (same)
- ✏️ Build with `--configuration=production`
- ➕ Add cache headers in Nginx
- ➕ Add CDN integration (optional)
- ✅ Keep Nginx serving (same)
Reusability: 95%
Security Considerations
Firewall Configuration
Required Ports:
# SSH
ufw allow 22/tcp
# HTTP/HTTPS
ufw allow 80/tcp
ufw allow 443/tcp
# DNS
ufw allow 53/tcp
ufw allow 53/udp
# Block everything else
ufw default deny incoming
ufw default allow outgoing
ufw enable
SSH Hardening
# /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
LoginGraceTime 20
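Because a bad SSH configuration can lock you out of a VPS, it helps to check the file before reloading the daemon. The sketch below verifies that the three access-control settings above are present in a given file; the function name is an assumption, and it checks only the file it is given, not drop-in includes.

```shell
#!/usr/bin/env bash
# Sketch: verify that an sshd_config file contains the key hardening settings
# above before the SSH daemon is reloaded.

# sshd_config_hardened FILE -> succeeds when every required setting is present
sshd_config_hardened() {
  local file="$1" setting
  local required=(
    "PermitRootLogin no"
    "PasswordAuthentication no"
    "PubkeyAuthentication yes"
  )
  for setting in "${required[@]}"; do
    # Match "Key value" on its own line, ignoring case and extra whitespace.
    if ! grep -Eiq "^[[:space:]]*${setting// /[[:space:]]+}[[:space:]]*$" "$file"; then
      printf 'missing: %s\n' "$setting" >&2
      return 1
    fi
  done
  return 0
}
```

A cautious sequence is `sshd_config_hardened /etc/ssh/sshd_config && sudo sshd -t` before reloading the service, while keeping an existing SSH session open until a new login has been confirmed.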
Database Security
Strong Passwords:
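# Generate a 32-character random password (24 random bytes, base64-encoded)
DB_PASSWORD=$(openssl rand -base64 24)
Limited Privileges:
-- App user has only necessary permissions (schema name assumed to be public)
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
-- No CREATE, DROP, or user management
SSL Enforcement:
# postgresql.conf: enable SSL (this alone does not make it mandatory)
ssl = on
# pg_hba.conf: use hostssl instead of host so TCP connections must use SSL
# (address range shown for local connections; adjust as needed)
hostssl all all 127.0.0.1/32 scram-sha-256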
Secrets Management
DO NOT:
- ❌ Commit secrets to git
- ❌ Store in world-readable plain-text files
DO:
- ✅ Use environment files with 0600 permissions
- ✅ Store in `/etc/holistix/secrets/`
- ✅ Consider secret management tools (Vault, AWS Secrets Manager)
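The password and file-permission advice can be combined in one small helper. This is a minimal sketch, assuming `openssl` is available; the function name and the `DB_PASSWORD` variable name are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: generate a database password and store it in an env file that only
# the owner can read. The file path and variable name are illustrative.

# write_db_secret FILE
write_db_secret() {
  local file="$1" password
  # 24 random bytes encode to exactly 32 base64 characters (no padding).
  password="$(openssl rand -base64 24)"
  (
    umask 077                      # files created in this subshell get mode 0600
    printf 'DB_PASSWORD=%s\n' "$password" > "$file"
  )
  chmod 600 "$file"                # also tightens a file that already existed
}
```

For instance, `write_db_secret /etc/holistix/secrets/.env.ganymede` leaves a file readable only by its owner, which a systemd `EnvironmentFile=` line can then reference.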
Rate Limiting
# Nginx rate limiting: limit_req_zone belongs in the http block, limit_req in a server or location block
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
location / {
limit_req zone=api burst=20;
}
Deployment Procedures (Future)
⚠️ NOTE: These procedures are for reference only. They need to be tested and refined before use.
Prerequisites
VPS Requirements:
- Ubuntu 24.04 LTS
- 4 vCPU, 8GB RAM, 100GB SSD (minimum)
- Static public IP
- Cost: ~$35-50/month
Domain Requirements:
- Owned domain name
- DNS registrar access
- Ability to configure NS records
DNS Provider API:
- Cloudflare (recommended)
- Route 53 (AWS)
- Or other certbot-supported provider
Deployment Steps (High-Level)
1. VPS Provisioning
   - Create VPS instance
   - Configure SSH hardening
   - Setup firewall
2. DNS Configuration
   - Configure domain delegation
   - Wait for propagation
3. Application Setup
   - Install dependencies
   - Setup PostgreSQL
   - Setup DNS (wait for Issue #30)
   - Setup Let's Encrypt
4. Environment Creation
   - Build applications
   - Create production environment
   - Configure systemd services
5. Verification
   - Test DNS resolution
   - Test HTTPS
   - Test API
   - Test frontend
6. Operations Setup
   - Configure monitoring
   - Setup backups
   - Configure alerts
Detailed procedures: Create after testing deployment.
Operations & Maintenance
Monitoring
Components to Monitor:
- System metrics (CPU, RAM, disk, network)
- Application metrics (API requests, response times)
- Database metrics (connections, queries)
- Gateway pool status
- Container metrics
Tools:
- Grafana (dashboards)
- Loki (logs)
- Tempo (traces)
- OTLP Collector
- UptimeRobot (external uptime)
Alert Rules:
- Gateway pool exhausted
- Disk usage > 80%
- High memory usage
- SSL certificate expiring (< 30 days)
- API errors > threshold
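Until Grafana alerting is configured, two of these rules can be approximated with plain shell checks run from cron or a health-check script. The helper names below are assumptions; the thresholds follow the list above.

```shell
#!/usr/bin/env bash
# Sketch: the disk-usage and certificate-expiry alert rules as shell checks.
# Thresholds follow the list above; function names are illustrative.

# disk_usage_pct MOUNTPOINT -> prints the used percentage as an integer
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# over_threshold VALUE LIMIT -> succeeds when VALUE is strictly greater than LIMIT
over_threshold() {
  [ "$1" -gt "$2" ]
}

# cert_expires_within CERT_FILE DAYS -> succeeds when the certificate expires
# within DAYS (an unreadable file also counts, so the check fails safe)
cert_expires_within() {
  local cert="$1" days="$2"
  ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null 2>&1
}
```

A cron job could then run something like `over_threshold "$(disk_usage_pct /)" 80 && echo "disk usage above 80%"` and pipe the message to whatever notification channel is in use.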
Backups
What to Backup:
- PostgreSQL databases (all `ganymede_*`)
- Organization data (`org-data/` directory)
- Nginx configurations
- Environment files (`.env.*`)
- SSL certificates (auto-renewed, but backup for safety)
Backup Schedule:
- Daily backups at 2 AM
- Keep last 7 days
- Weekly backups kept for 4 weeks
- Monthly backups kept for 12 months
Backup Script (Example):
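#!/bin/bash
set -euo pipefail
BACKUP_DIR="/opt/holistix/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR/postgres" "$BACKUP_DIR/org-data"
# Backup PostgreSQL
pg_dump ganymede_prod | gzip > "$BACKUP_DIR/postgres/ganymede_prod_${DATE}.sql.gz"
# Backup org-data (archive paths relative to the environment directory)
tar -czf "$BACKUP_DIR/org-data/org-data_${DATE}.tar.gz" -C /opt/holistix/prod org-data
# Cleanup: delete backup files older than 7 days. This covers the daily
# retention only; weekly and monthly copies must be kept elsewhere or
# excluded from this cleanup.
find "$BACKUP_DIR" -type f -mtime +7 -delete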
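A flat seven-day cleanup does not implement the weekly and monthly tiers of the schedule. One possible retention rule is sketched below, assuming GNU `date` and backups identified by a `YYYY-MM-DD` date. Treating Sunday backups as the weekly copy and first-of-month backups as the monthly copy is an assumption, not something the schedule specifies.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a backup should be kept under a 7-daily / 4-weekly /
# 12-monthly schedule. Assumes GNU date and YYYY-MM-DD dates. The choice of
# Sunday (weekly) and the 1st of the month (monthly) is an assumption.

# should_keep BACKUP_DATE TODAY -> succeeds when the backup must be kept
should_keep() {
  local backup="$1" today="$2"
  local age_days dow dom
  age_days=$(( ( $(date -u -d "$today" +%s) - $(date -u -d "$backup" +%s) ) / 86400 ))
  dow="$(date -u -d "$backup" +%u)"   # 1 = Monday ... 7 = Sunday
  dom="$(date -u -d "$backup" +%d)"   # day of month, zero-padded

  [ "$age_days" -lt 7 ] && return 0                          # daily tier
  [ "$dow" = "7" ] && [ "$age_days" -le 28 ] && return 0     # weekly tier
  [ "$dom" = "01" ] && [ "$age_days" -le 365 ] && return 0   # monthly tier
  return 1
}
```

A cleanup job would extract the date from each backup file name and delete only the files for which `should_keep` fails.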
Common Operations
Deploy Updates:
# Build locally
npx nx run-many --target=build --all --configuration=production
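ARTIFACT="holistix-$(git rev-parse --short HEAD).tar.gz"
tar -czf "$ARTIFACT" dist/
# Deploy to VPS (naming the exact artifact avoids matching older tarballs in /tmp)
scp "$ARTIFACT" holistix@VPS:/tmp/
ssh holistix@VPS "cd /opt/holistix/prod && tar -xzf /tmp/$ARTIFACT"
# Restarting a system service needs root (assumes the holistix user may run this via sudo)
ssh holistix@VPS "sudo systemctl restart ganymede@prod"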
Scale Gateway Pool:
# Add 5 more gateways
ENV_NAME=prod DOMAIN=your-domain.com \
./scripts/local-dev/gateway-pool.sh create 5 /opt/holistix/monorepo
View Logs:
# System logs
journalctl -u ganymede@prod -f
# Application logs
tail -f /opt/holistix/prod/logs/ganymede.log
# Gateway logs
docker logs -f gw-pool-0
Cost & Resource Planning
VPS Cost Estimates
Minimum (testing):
- 2 vCPU, 4GB RAM, 50GB SSD
- $15-25/month
- Suitable for: Testing, small deployment
Recommended (production):
- 4 vCPU, 8GB RAM, 100GB SSD
- $35-50/month
- Suitable for: Production, 10-50 users
High Performance:
- 8 vCPU, 16GB RAM, 200GB SSD
- $80-120/month
- Suitable for: Large deployment, 100+ users
Resource Distribution (8GB VPS)
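PostgreSQL: 2GB
Ganymede: 2GB
Gateway Pool: 3GB (10 gateways @ 300MB each)
User Containers: 1GB (2-4 containers)
System: 1GB (OS overhead)
Total: 9GB, which exceeds the 8GB available. Trim the budget (for example, a smaller gateway pool) or size the VPS up before relying on these figures.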
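The allocation can be sanity-checked with a few lines of shell arithmetic. The sketch below copies the per-component figures from the list above (in GB) and compares their sum with the VPS size; the variable and function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: add up the per-component memory figures above and compare the total
# with the VPS size. Values are in GB and copied from the list.

declare -A budget_gb=(
  [postgresql]=2
  [ganymede]=2
  [gateway_pool]=3
  [user_containers]=1
  [system]=1
)

# total_budget_gb -> prints the sum of all entries
total_budget_gb() {
  local total=0 value
  for value in "${budget_gb[@]}"; do
    total=$(( total + value ))
  done
  printf '%s\n' "$total"
}

vps_gb=8
total="$(total_budget_gb)"
if [ "$total" -gt "$vps_gb" ]; then
  printf 'Over budget: %sGB allocated, %sGB available\n' "$total" "$vps_gb"
else
  printf 'Within budget: %sGB of %sGB\n' "$total" "$vps_gb"
fi
```

With the figures as listed, the script reports the plan as over budget, so at least one allocation has to shrink or the VPS has to grow.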
Storage Planning
Application: 500MB (dist + node_modules)
PostgreSQL: 1-5GB (depends on usage)
Logs: 1-2GB (with rotation)
Backups: 5-10GB (7 days of DB backups)
User Data: Variable (org-data files)
Total: ~10-20GB typical
Bandwidth Estimation
Per User Per Day:
- Initial load: ~2MB (frontend bundle)
- WebSocket: ~1MB (collaboration)
- API requests: ~1MB
Example: 100 active users:
- Daily: ~400MB
- Monthly: ~12GB
- Well within typical 4TB bandwidth limits
Script Reusability Summary
Scripts That Work As-Is (100%)
- `install-node.sh` - Node.js installation
- `build-images.sh` - Gateway Docker image
- `gateway-pool.sh` - Gateway pool management
- `envctl-monitor.sh` - Environment monitoring
- `build-frontend.sh` - Frontend build
Scripts Needing Minor Changes (85-95%)
- `setup-postgres.sh` - Add hardening steps
- `create-env.sh` - Replace mkcert with Let's Encrypt, adapt paths
- `envctl.sh` - Add systemd support
- `install-system-deps.sh` - Minor tweaks
Scripts Not Needed in Production
- `setup-coredns.sh` - DNS architecture changing (Issue #30)
- `update-coredns.sh` - DNS architecture changing
- `install-mkcert.sh` - Using Let's Encrypt instead
New Scripts Needed
- `scripts/production/setup-production.sh` - Main production setup
- `scripts/production/setup-letsencrypt.sh` - SSL automation
- `scripts/production/create-systemd-services.sh` - Service files
- `scripts/production/harden-system.sh` - Security hardening
- `scripts/production/deploy.sh` - Deployment automation
- `scripts/production/backup-all.sh` - Backup automation
- `scripts/production/restore.sh` - Disaster recovery
- `scripts/production/health-check.sh` - Deep health check
Timeline Estimates
Development Setup (First Time)
- Create dev container: 10 min
- Run `setup-all.sh`: 15-20 min
- Create environment: 5 min
- Build frontend: 5 min
- Configure host DNS: 10 min
- Total: 45-50 minutes
Production Setup (Estimated)
- Provision VPS: 10 min
- DNS delegation: 5 min (+ 24-48h wait)
- SSH & security: 30 min
- Run production setup: 20-30 min
- SSL certificate: 5 min
- Create environment: 10 min
- Deploy artifacts: 10 min
- Testing: 30 min
- Monitoring setup: 30 min
- Total: 2.5-3 hours (+ DNS propagation wait)
Risk Assessment
Development Risks (Low)
- Dev container crash → Restart
- Data loss → Not production data
- Security breach → Local network only
Production Risks (High)
| Risk | Impact | Mitigation |
|---|---|---|
| VPS crash | HIGH | Monitoring + alerts + backups |
| Database corruption | HIGH | Daily backups + replication |
| Security breach | HIGH | Hardening + updates + monitoring |
| SSL expiry | MEDIUM | Auto-renewal + alerts |
| DNS failure | MEDIUM | Health checks |
| Disk full | MEDIUM | Monitoring + log rotation |
| Gateway exhaustion | MEDIUM | Pool size alerts |
Next Steps
Before Starting Implementation
- ✅ Read this document - Understand architecture and decisions
- ⚠️ Wait for Issue #30 - DNS architecture decision
- Create GitHub issue - Track production deployment work
- 🧪 Plan testing strategy - How to verify deployment
Implementation Order
- Phase 1: Core infrastructure (VPS, security, dependencies)
- Phase 2: Script adaptation (after Issue #30)
- Phase 3: Deployment testing (staging environment)
- Phase 4: Operations setup (monitoring, backups)
Success Criteria
- [ ] Production deployment works end-to-end
- [ ] All services auto-start on boot
- [ ] Monitoring and alerts configured
- [ ] Backups tested and working
- [ ] Security audit passed
- [ ] Load testing passed
- [ ] Documentation complete
Conclusion
The Holistix Forge local development environment is architecturally close to production-ready, although the production scripts and procedures do not exist yet. The main work required is:
- Wait for DNS simplification (Issue #30)
- Remove dev container layer (install directly)
- Add Let's Encrypt SSL (instead of mkcert)
- Create systemd services (proper management)
- Implement security hardening (firewall, SSH, etc.)
- Setup operations (monitoring, backups, alerts)
Key Insight: 85% of development work transfers to production. The architecture is solid, the foundation is there. The main task is creating and testing the production-specific scripts and procedures.
Related Documentation
- Local Development Guide - Development environment setup
- DNS Complete Guide - DNS architecture details
- Gateway Architecture - Gateway system design
- System Architecture - Overall system design
- GitHub Issue #30 - DNS simplification
Document Status: Planning/Reference Only
Implementation Status: Not started - waiting for Issue #30
Next Action: Create GitHub issue to track implementation work
Maintainer: Core team