#!/bin/bash
# Module 8-11: Secrets, How-To, Troubleshooting, Architecture

cat << 'EOFGUIDES'

---

## 8️⃣ SECRETS & PASSWORDS

### Vault Access

All secrets are stored in HashiCorp Vault.

```bash
# Set Vault environment variables
export VAULT_ADDR="http://127.0.0.1:8200"
export VAULT_TOKEN="hvs.jYguDdf2IzobXG8b9QWyATV8"

# List all available secret paths
vault kv list aimpress/

# Get a specific secret
vault kv get aimpress/<path>

# Get secret in JSON format
vault kv get -format=json aimpress/<path> | jq '.data.data'

# Add new secret
vault kv put aimpress/<path> \
  username="admin" \
  password="secret123" \
  api_key="xyz"

# Update existing secret
vault kv patch aimpress/<path> password="new_password"

# Delete secret
vault kv delete aimpress/<path>

# View secret history
vault kv metadata get aimpress/<path>
```

### Vault Secret Structure

```
secret/
├── monitoring/
│   ├── slack_webhook
│   └── alert_email
└── outline/
    └── scripts/
        ├── api_token
        ├── api_url
        ├── reports_collection_id
        └── technical_kb_collection_id

aimpress/
├── postgres/
│   ├── admin      # Main PostgreSQL admin
│   ├── outline    # Outline database user
│   └── odoo       # Odoo database user
├── odoo           # Odoo admin credentials
├── authentik      # Authentik admin credentials
├── grafana        # Grafana admin credentials
└── portainer      # Portainer admin credentials
```

### Vault Unseal (if sealed)

```bash
# Check Vault status
docker exec vault vault status

# Unseal Vault
/opt/00-infrastructure/vault/unseal.sh

# Or manually with keys
docker exec vault vault operator unseal <key-1>
docker exec vault vault operator unseal <key-2>
docker exec vault vault operator unseal <key-3>
```

---

## 9️⃣ HOW TO GUIDES

### Restart a Service

```bash
# Quick restart
docker restart <container>

# Graceful restart with docker-compose
cd /opt/<service-dir>/
docker-compose restart <service>

# Full restart (if config changed)
cd /opt/<service-dir>/
docker-compose down
docker-compose up -d

# Restart all services in compose file
cd /opt/<service-dir>/
docker-compose restart
```

### View Logs

```bash
# Real-time logs (follow)
docker logs -f --tail 100 <container>

# Last 500 lines
docker logs --tail 500 <container>

# Logs with timestamps
docker logs -t <container>

# Logs from last hour
docker logs --since 1h <container>

# Logs from specific time
docker logs --since "2025-11-03T10:00:00" <container>

# Save logs to file
docker logs <container> > /tmp/container.log

# Search in logs
docker logs <container> 2>&1 | grep "error"
```

### Update a Container

```bash
# 1. BACKUP FIRST!
/opt/05-backups/scripts/backup-full-enhanced.sh

# 2. Pull new image
cd /opt/<service-dir>/
docker-compose pull

# 3. Recreate container
docker-compose up -d

# 4. Check logs
docker logs -f <container>

# 5. Verify service is working
curl -I https://<domain>
```

### Delete a Container/Service

```bash
# ⚠️ WARNING: This will delete data! Backup first!

# 1. Create backup
/opt/05-backups/scripts/backup-full-enhanced.sh

# 2. Stop and remove container
docker stop <container>
docker rm <container>

# 3. Remove volumes (if not needed)
docker volume ls | grep <service>
docker volume rm <volume>

# 4. Remove from docker-compose.yml
cd /opt/<service-dir>/
nano docker-compose.yml  # Delete service section

# 5. Restart remaining services
docker-compose up -d
```

### Change Password

#### For Web Applications

```bash
# 1. Login to web interface
# 2. Go to Settings → Users → Change Password
# 3. Save new password to Vault
export VAULT_ADDR="http://127.0.0.1:8200"
export VAULT_TOKEN="hvs.jYguDdf2IzobXG8b9QWyATV8"
vault kv put aimpress/<service> password="<new-password>"
```

#### For PostgreSQL

```bash
# 1. Generate strong password
NEW_PASS=$(openssl rand -base64 32)
echo "New password: $NEW_PASS"

# 2. Change in database
docker exec postgres-main psql -U aimpress_admin -c \
  "ALTER USER <user> WITH PASSWORD '$NEW_PASS';"

# 3. Save to Vault
vault kv put aimpress/postgres/<user> password="$NEW_PASS"

# 4. Update application config
cd /opt/<service-dir>/
nano docker-compose.yml  # Update DB_PASSWORD

# 5. Recreate application (a plain restart does not pick up compose file changes)
docker-compose up -d
```

### Clean Up Disk Space

```bash
# Clean Docker (removes unused images, containers, volumes)
docker system prune -af --volumes

# Clean old logs
find /opt -name "*.log" -mtime +7 -delete
find /var/log -name "*.log" -mtime +7 -exec truncate -s 0 {} \;

# Clean old backups (keep last 60 days)
find /mnt/backups -name "*.gz" -mtime +60 -delete

# Find large files (review the list before removing anything)
find /opt /mnt -type f -size +1G -exec ls -lh {} \;

# Check what's using space
du -sh /opt/* /mnt/* | sort -h
```

### Add New Service

```bash
# 1. Create directory structure
mkdir -p /opt/03-business/<service>/
cd /opt/03-business/<service>/

# 2. Create docker-compose.yml
nano docker-compose.yml

# 3. Add to Traefik network
docker network connect traefik-public <container>

# 4. Start service
docker-compose up -d

# 5. Check logs
docker logs -f <container>

# 6. Add credentials to Vault
vault kv put aimpress/<service> \
  username="admin" \
  password="$(openssl rand -base64 32)"
```

---

## 🔟 TROUBLESHOOTING

### Website Not Responding (502/503 Error)

```bash
# 1. Check Traefik (reverse proxy)
docker logs traefik --tail 100 | grep -i error

# 2. Check if target container is running
docker ps | grep <service>

# 3. Check target container logs
docker logs --tail 100 <container>

# 4. Test direct connection (bypass Traefik)
docker exec <container> curl -I localhost:<port>

# 5. Check if service is in correct network
docker network inspect traefik-public | grep <service>

# 6. Restart Traefik
docker restart traefik

# 7. Restart service
docker restart <container>
```

### Database Connection Errors

```bash
# 1. Check PostgreSQL is running
docker ps | grep postgres-main
docker logs postgres-main --tail 100

# 2. Test database connection
docker exec postgres-main psql -U aimpress_admin -c "SELECT version();"

# 3. Check disk space (common issue)
df -h /mnt/psql-data

# 4. Check database credentials
vault kv get aimpress/postgres/<user>

# 5. Verify application database config
cd /opt/<service-dir>/
cat docker-compose.yml | grep -i database

# 6. Restart PostgreSQL (careful!)
docker restart postgres-main

# 7. Check PostgreSQL connections
docker exec postgres-main psql -U aimpress_admin -c \
  "SELECT * FROM pg_stat_activity;"
```

### Container Keeps Restarting

```bash
# 1. Check logs for error
docker logs --tail 200 <container>

# 2. Check restart count
docker inspect --format='{{.RestartCount}}' <container>

# 3. Check resource usage
docker stats --no-stream
free -h
df -h

# 4. Stop auto-restart temporarily
docker update --restart=no <container>

# 5. Start manually to see full error
docker start <container> && docker logs -f <container>

# 6. Check for port conflicts
netstat -tulpn | grep <port>

# 7. Check healthcheck
docker inspect <container> | grep -A 20 Health
```

### Out of Disk Space

```bash
# 1. Find what's using space
du -sh /opt/* /mnt/* /var/* | sort -h | tail -20

# 2. Check Docker disk usage
docker system df

# 3. Clean Docker
docker system prune -af --volumes

# 4. Clean logs
journalctl --vacuum-time=7d
find /var/log -name "*.log" -mtime +7 -exec rm {} \;

# 5. Clean old backups
find /mnt/backups -mtime +30 -delete

# 6. Remove stopped containers
docker container prune -f

# 7. Remove unused volumes
docker volume prune -f
```

### SSL Certificate Issues

```bash
# 1. Check Traefik logs for certificate errors
docker logs traefik 2>&1 | grep -i "certificate\|acme\|letsencrypt"

# 2. Check certificate status
docker exec traefik cat /letsencrypt/acme.json | jq '.Certificates'

# 3. Force certificate renewal
docker exec traefik rm -rf /letsencrypt/acme.json
docker restart traefik
# Wait 2-3 minutes for new certificates

# 4. Check domain DNS
dig <domain>
nslookup <domain>

# 5. Test certificate
openssl s_client -connect <domain>:443 -servername <domain>
```

### High Memory Usage

```bash
# 1. Check which containers use most memory
docker stats --no-stream | sort -k4 -h

# 2. Check system memory
free -h
top -bn1 | head -20

# 3. Find memory-heavy processes
ps aux --sort=-%mem | head -10

# 4. Restart heavy containers
docker restart <container>

# 5. Check for memory leaks
docker logs <container> 2>&1 | grep -i "memory\|oom"

# 6. Add memory limits to docker-compose.yml
# deploy:
#   resources:
#     limits:
#       memory: 1G
```

### Vault Sealed/Inaccessible

```bash
# 1. Check Vault logs
docker logs vault --tail 50

# 2. Check if sealed
docker exec vault vault status

# 3. Unseal Vault
/opt/00-infrastructure/vault/unseal.sh

# 4. If unseal script doesn't work
docker exec vault vault operator unseal <key-1>
docker exec vault vault operator unseal <key-2>
docker exec vault vault operator unseal <key-3>

# 5. Get new token if expired (interactive prompt, so -it is required)
docker exec -it vault vault login
# Enter root token
```

---

## 1️⃣1️⃣ SYSTEM ARCHITECTURE

### High-Level Overview

```
                  INTERNET
                      ↓
           Cloudflare (DNS + CDN)
                      ↓
          OVH Server (51.89.231.46)
                      ↓
        Traefik (Reverse Proxy + SSL)
                ↓           ↓
        ┌───────┴───────────┴──────┬────────┐
        ↓                          ↓        ↓
    Services                  Monitoring  Tools
  ┌─────┴──────┐              ┌────┴────┐ ┌──┴───┐
  ↓            ↓              ↓         ↓ ↓      ↓
 Wiki         N8N          Grafana    Loki Portainer
 Odoo       Supabase      Prometheus         Vault
Authentik     ...
  ↓            ↓
  └────────────┴────────┐
                        ↓
                 Infrastructure
              ┌──────┬──────┬──────┐
              ↓      ↓      ↓      ↓
        PostgreSQL Redis  Vault  Storage
     (/mnt/psql-data)        (/mnt/backups)
```

### Request Flow

1. **User** visits `https://wiki.ai-impress.com`
2. **Cloudflare** resolves DNS → 51.89.231.46
3. **Traefik** receives HTTPS request (port 443)
4. **Traefik** terminates SSL (Let's Encrypt certificate)
5. **Traefik** routes to Outline container via internal network
6. **Outline** authenticates user (via Authentik if needed)
7. **Outline** queries PostgreSQL for content
8. **Response** flows back: Outline → Traefik → User

**Critical points:** If either Traefik or PostgreSQL fails, all services go down.

### Directory Structure

```
/opt/                              # All applications
├── 00-infrastructure/             # Core infrastructure
│   ├── traefik/                   # Reverse proxy + SSL
│   ├── postgres/                  # PostgreSQL main
│   ├── vault/                     # Secrets storage
│   │   ├── .vault-token           # Root token (keep safe!)
│   │   └── unseal.sh              # Unseal script
│   └── redis/                     # Cache & queues
│
├── 01-network/                    # Network services
│   └── authentik/                 # SSO authentication
│
├── 02-core/                       # Core business apps
│   ├── outline/                   # Wiki
│   ├── n8n/                       # Automation
│   └── supabase/                  # Backend platform
│
├── 03-business/                   # Business applications
│   ├── odoo/                      # ERP
│   └── documenso/                 # Document signing
│
├── 04-tools/                      # Management tools
│   ├── monitoring/                # Grafana, Prometheus, Loki
│   │   └── docker-compose.yml
│   ├── portainer/                 # Docker UI
│   └── uptime-kuma/               # Uptime monitoring
│
└── 05-backups/                    # Backup system
    ├── scripts/                   # 👈 ALL ADMIN SCRIPTS
    │   ├── admin.sh               # Main admin interface
    │   ├── health-check-alerting.sh   # Health monitoring
    │   ├── backup-full-enhanced.sh    # Full backups
    │   ├── upload-to-outline.sh   # Report uploads
    │   └── ...
    ├── logs/                      # Script logs
    ├── reports/                   # JSON reports
    └── config-versions/           # Config backups

/opt/infrastructure-docs/          # Documentation
├── scripts/
│   ├── server-full-report.sh      # Generate this report
│   └── modules/                   # Report modules
├── reports/                       # Generated MD reports
├── ADMIN_GUIDE.md                 # Administrator guide
└── SUMMARY.md                     # Setup summary

/mnt/                              # Mounted storage
├── backups/                       # 💾 ALL BACKUPS HERE
│   ├── *.tar.gz                   # System backups
│   ├── *.sql.gz                   # Database backups
│   └── vault-export-*.json        # Vault backups
│
└── psql-data/                     # 🗄️ ALL DATABASES HERE
    ├── pgdata/                    # postgres-main
    ├── supabase-db/               # Supabase
    └── pgadmin-data/              # pgAdmin
```

### Service Dependencies

**Level 1: Infrastructure (Cannot delete)**
- Traefik → All websites depend on it
- PostgreSQL → All applications need database
- Redis → N8N, Authentik, others need cache
- Vault → All passwords stored here

**Level 2: Network**
- Authentik → Provides SSO for many services

**Level 3: Applications**
- Outline, N8N, Odoo, etc → Can delete if backed up

**Level 4: Tools**
- Grafana, Portainer, etc → Can recreate

### Network Architecture

**Docker Networks:**
- `traefik-public` - Public-facing services
- `postgres-network` - Database connections
- `redis-network` - Redis connections
- `authentik-network` - Auth services
- Individual service networks

**Port Mapping:**
- `80` → Traefik HTTP (redirects to HTTPS)
- `443` → Traefik HTTPS
- `22` → SSH
- Internal services: No direct external access

### Backup Strategy

**Automated backups:**
- **Daily 02:00** - Configuration backup
- **Daily 03:00** - Full system backup
- **Every 4 hours** - Vault secrets export
- **Weekly Saturday** - Database maintenance

**Backup locations:**
- `/mnt/backups/` - Primary backup location
- Vault exports in same location

**What's backed up:**
- All databases (PostgreSQL)
- All Docker configurations
- All Vault secrets
- Application data volumes

### Monitoring Setup

- **Grafana** → Visualizations
- **Prometheus** → Metrics collection
- **Loki** → Log aggregation
- **cAdvisor** → Container metrics
- **Node Exporter** → System metrics
- **Alertmanager** → Alert routing
- **Blackbox Exporter** → Website monitoring

**Alert flow:** Health check → Detects problem → Sends to Slack + Email

---

## 📞 QUICK REFERENCE GUIDE

### Main Scripts

```bash
# System status
/opt/05-backups/scripts/admin.sh status

# Health check + alerts
/opt/05-backups/scripts/health-check-alerting.sh

# Generate full report
/opt/infrastructure-docs/scripts/server-full-report.sh

# Upload to Outline
/opt/05-backups/scripts/upload-to-outline.sh latest-report

# Full backup
/opt/05-backups/scripts/backup-full-enhanced.sh
```

### Emergency Commands

```bash
# Restart all containers
docker restart $(docker ps -q)

# Stop all containers
docker stop $(docker ps -q)

# Clean up everything
docker system prune -af --volumes

# Show recent logs from containers that are not running
docker ps -a --format '{{.Names}} {{.Status}}' | grep -v ' Up ' | awk '{print $1}' | xargs -r -I {} docker logs --tail 50 {}
```

### Useful Aliases (Add to .bashrc)

```bash
alias dps='docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"'
alias dlogs='docker logs -f --tail 100'
alias dstats='docker stats --no-stream'
alias dclean='docker system prune -af'
alias vaultlogin='export VAULT_ADDR=http://127.0.0.1:8200 && export VAULT_TOKEN=hvs.jYguDdf2IzobXG8b9QWyATV8'
```

### Important URLs

- **Wiki:** https://wiki.ai-impress.com
- **N8N:** https://n8n.ai-impress.com
- **Monitoring:** https://grafana.ai-impress.com
- **Uptime:** https://uptime.ai-impress.com
- **Docker UI:** https://portainer.ai-impress.com

### Support Resources

- **Documentation:** https://wiki.ai-impress.com/collection/technical-knowledge-base-vCle1zKdMA
- **Reports:** https://wiki.ai-impress.com/collection/reports-yRgWpcsnJ7
- **Slack:** #alerts channel
- **Email:** admin@ai-impress.com

---

**Report End**

**Generated:** TIMESTAMP_PLACEHOLDER
**Version:** 5.0.0 (Modular)
**Modules:** 7 sections

EOFGUIDES