Set up three-tier synchronization: Syncthing (real-time), GitHub (version control), rsync (disaster recovery). Includes complete documentation for future Claude sessions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
678 lines
17 KiB
Bash
Executable file
#!/bin/bash
# Module 8-11: Secrets, How-To, Troubleshooting, Architecture

cat << 'EOFGUIDES'
---

## 8️⃣ SECRETS & PASSWORDS

### Vault Access

All secrets are stored in HashiCorp Vault for security.

```bash
# Set Vault environment variables
export VAULT_ADDR="http://127.0.0.1:8200"
# Read the token from the token file listed under Directory Structure
# (assumed to contain only the token); never paste the token into docs or scripts
export VAULT_TOKEN="$(cat /opt/00-infrastructure/vault/.vault-token)"

# List all available secret paths
vault kv list aimpress/

# Get a specific secret
vault kv get aimpress/<service-name>

# Get secret in JSON format
vault kv get -format=json aimpress/<service> | jq '.data.data'

# Add new secret (kv put replaces every key stored at this path)
vault kv put aimpress/<service> \
    username="admin" \
    password="secret123" \
    api_key="xyz"

# Update one key of an existing secret (other keys are kept)
vault kv patch aimpress/<service> password="new_password"

# Delete secret
vault kv delete aimpress/<service>

# View secret history
vault kv metadata get aimpress/<service>
```
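
The two exports above can be wrapped in a small helper so the token is never typed or pasted. This is a minimal sketch, assuming the root token is stored on its own in `/opt/00-infrastructure/vault/.vault-token` (the path shown under Directory Structure); `vault_env` is a hypothetical helper name, not an existing script on the server.

```bash
# Load Vault connection settings from a token file (hypothetical helper).
# $1 = token file; defaults to the path from the directory listing (assumption).
vault_env() {
    local token_file="${1:-/opt/00-infrastructure/vault/.vault-token}"
    if [ ! -r "$token_file" ]; then
        echo "Cannot read token file: $token_file" >&2
        return 1
    fi
    # Keep an already exported VAULT_ADDR, otherwise use the local default
    export VAULT_ADDR="${VAULT_ADDR:-http://127.0.0.1:8200}"
    # Strip whitespace and the trailing newline from the stored token
    VAULT_TOKEN="$(tr -d '[:space:]' < "$token_file")"
    export VAULT_TOKEN
}
```

After sourcing the function, `vault_env && vault kv list aimpress/` would behave like the manual exports, without the token appearing in shell history.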

### Vault Secret Structure

```
secret/
├── monitoring/
│   ├── slack_webhook
│   └── alert_email
└── outline/
    └── scripts/
        ├── api_token
        ├── api_url
        ├── reports_collection_id
        └── technical_kb_collection_id

aimpress/
├── postgres/
│   ├── admin            # Main PostgreSQL admin
│   ├── outline          # Outline database user
│   └── odoo             # Odoo database user
├── odoo                 # Odoo admin credentials
├── authentik            # Authentik admin credentials
├── grafana              # Grafana admin credentials
└── portainer            # Portainer admin credentials
```

### Vault Unseal (if sealed)

```bash
# Check Vault status
docker exec vault vault status

# Unseal Vault
/opt/00-infrastructure/vault/unseal.sh

# Or manually with keys
docker exec vault vault operator unseal <key1>
docker exec vault vault operator unseal <key2>
docker exec vault vault operator unseal <key3>
```
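
A scripted check can decide whether unsealing is needed at all. The sketch below relies on the exit codes of `vault status` (0 = unsealed, 2 = sealed, 1 = error) and assumes the container is named `vault`, as in the commands above; `vault_seal_state` is a hypothetical helper name.

```bash
# Report the seal state of Vault based on the exit code of `vault status`.
vault_seal_state() {
    local rc=0
    # docker exec passes through the exit code of the command it runs
    docker exec vault vault status >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "unsealed" ;;
        2) echo "sealed" ;;
        *) echo "error" ;;   # container down, Vault unreachable, etc.
    esac
}

# Usage: only call the unseal script when Vault actually reports "sealed"
# [ "$(vault_seal_state)" = "sealed" ] && /opt/00-infrastructure/vault/unseal.sh
```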

---

## 9️⃣ HOW TO GUIDES

### Restart a Service

```bash
# Quick restart
docker restart <container-name>

# Graceful restart with docker-compose
cd /opt/<service-directory>
docker-compose restart <service-name>

# Full restart (if config changed)
cd /opt/<service-directory>
docker-compose down
docker-compose up -d

# Restart all services in compose file
cd /opt/<service-directory>
docker-compose restart
```

### View Logs

```bash
# Real-time logs (follow)
docker logs -f --tail 100 <container-name>

# Last 500 lines
docker logs --tail 500 <container-name>

# Logs with timestamps
docker logs -t <container-name>

# Logs from last hour
docker logs --since 1h <container-name>

# Logs from specific time
docker logs --since "2025-11-03T10:00:00" <container-name>

# Save logs to file (stdout and stderr)
docker logs <container-name> > /tmp/container.log 2>&1

# Search in logs
docker logs <container-name> 2>&1 | grep "error"
```

### Update a Container

```bash
# 1. BACKUP FIRST!
/opt/05-backups/scripts/backup-full-enhanced.sh

# 2. Pull new image
cd /opt/<service-directory>
docker-compose pull <service-name>

# 3. Recreate container
docker-compose up -d <service-name>

# 4. Check logs
docker logs -f <container-name>

# 5. Verify service is working
curl -I https://<service-url>
```

### Delete a Container/Service

```bash
# ⚠️ WARNING: This will delete data! Backup first!

# 1. Create backup
/opt/05-backups/scripts/backup-full-enhanced.sh

# 2. Stop and remove container
docker stop <container-name>
docker rm <container-name>

# 3. Remove volumes (if not needed)
docker volume ls | grep <service>
docker volume rm <volume-name>

# 4. Remove from docker-compose.yml
cd /opt/<service-directory>
nano docker-compose.yml  # Delete service section

# 5. Restart remaining services
docker-compose up -d
```

### Change Password

#### For Web Applications

```bash
# 1. Login to web interface
# 2. Go to Settings → Users → Change Password
# 3. Save new password to Vault
#    (kv patch keeps the other keys; kv put would replace the whole secret)

export VAULT_ADDR="http://127.0.0.1:8200"
export VAULT_TOKEN="$(cat /opt/00-infrastructure/vault/.vault-token)"
vault kv patch aimpress/<service> password="<new-password>"
```

#### For PostgreSQL

```bash
# 1. Generate strong password
NEW_PASS=$(openssl rand -base64 32)
echo "New password: $NEW_PASS"

# 2. Change in database
docker exec postgres-main psql -U aimpress_admin -c \
    "ALTER USER <username> WITH PASSWORD '$NEW_PASS';"

# 3. Save to Vault (kv patch keeps the other keys of the secret)
vault kv patch aimpress/postgres/<service> password="$NEW_PASS"

# 4. Update application config
cd /opt/<service-path>
nano docker-compose.yml  # Update DB_PASSWORD

# 5. Recreate the application so the new DB_PASSWORD is applied
#    (docker-compose restart does not re-read docker-compose.yml)
docker-compose up -d
```
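
Base64 output can contain `/`, `+` and `=`, which need escaping if the password is later embedded in a connection URL. As a hedged alternative, the sketch below produces a hex-only password; `gen_hex_password` is a hypothetical helper, and whether any service here actually uses URL-style connection strings is an assumption.

```bash
# Generate a password made only of hex digits (0-9, a-f), so it needs no
# escaping in SQL literals, YAML values, or connection URLs.
# $1 = number of random bytes (default 24, giving 48 hex characters)
gen_hex_password() {
    local bytes="${1:-24}"
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -hex "$bytes"
    else
        # Fallback without openssl: read from the kernel random source
        head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
        echo
    fi
}

# Usage: drop-in replacement for step 1 above
# NEW_PASS=$(gen_hex_password)
```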

### Clean Up Disk Space

```bash
# Clean Docker (removes ALL unused images, stopped containers, and unused volumes)
# ⚠️ Volumes that no container is currently using can be deleted; back up first
docker system prune -af --volumes

# Clean old logs
find /opt -name "*.log" -mtime +7 -delete
find /var/log -name "*.log" -mtime +7 -exec truncate -s 0 {} \;

# Clean old backups (keep last 60 days)
find /mnt/backups -name "*.gz" -mtime +60 -delete

# Find large files (review the list before removing anything)
find /opt /mnt -type f -size +1G -exec ls -lh {} \;

# Check what's using space
du -sh /opt/* /mnt/* | sort -h
```
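
Because `-delete` is irreversible, it can help to list the matches first. The sketch below runs the same kind of `find` with `-print` instead of `-delete`; `list_old_files` is a hypothetical helper name.

```bash
# Print the files a cleanup would remove, without deleting anything.
# $1 = directory, $2 = filename pattern, $3 = minimum age in days
list_old_files() {
    local dir="$1" pattern="$2" days="$3"
    find "$dir" -type f -name "$pattern" -mtime +"$days" -print
}

# Usage: review the list, then run the original command with -delete
# list_old_files /mnt/backups "*.gz" 60
```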

### Add New Service

```bash
# 1. Create directory structure
mkdir -p /opt/03-business/<service-name>
cd /opt/03-business/<service-name>

# 2. Create docker-compose.yml
nano docker-compose.yml

# 3. Start service
docker-compose up -d

# 4. Add to Traefik network (the container must exist first;
#    alternatively declare traefik-public in docker-compose.yml)
docker network connect traefik-public <container-name>

# 5. Check logs
docker logs -f <container-name>

# 6. Add credentials to Vault
vault kv put aimpress/<service> \
    username="admin" \
    password="$(openssl rand -base64 32)"
```

---

## 🔟 TROUBLESHOOTING

### Website Not Responding (502/503 Error)

```bash
# 1. Check Traefik (reverse proxy); docker logs may write to stderr
docker logs traefik --tail 100 2>&1 | grep -i error

# 2. Check if target container is running
docker ps | grep <service-name>

# 3. Check target container logs
docker logs <service-name> --tail 100

# 4. Test direct connection (bypass Traefik)
docker exec <service-name> curl -I localhost:<port>

# 5. Check if service is in correct network
docker network inspect traefik-public | grep <service-name>

# 6. Restart Traefik
docker restart traefik

# 7. Restart service
docker restart <service-name>
```
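
Step 2 can be made script-friendly by asking Docker for the container state directly instead of grepping `docker ps`. A minimal sketch, with `container_state` as a hypothetical helper name:

```bash
# Print the Docker state of a container ("running", "exited", "restarting", ...)
# or "missing" if no container with that name exists.
container_state() {
    local name="$1" state
    if state=$(docker inspect --format '{{.State.Status}}' "$name" 2>/dev/null); then
        echo "$state"
    else
        echo "missing"
    fi
}

# Usage: restart the service only if it is not running
# [ "$(container_state <service-name>)" = "running" ] || docker restart <service-name>
```

Unlike `docker ps | grep`, this matches the exact container name and also distinguishes a stopped container from one that was never created.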

### Database Connection Errors

```bash
# 1. Check PostgreSQL is running
docker ps | grep postgres-main
docker logs postgres-main --tail 100

# 2. Test database connection
docker exec postgres-main psql -U aimpress_admin -c "SELECT version();"

# 3. Check disk space (common issue)
df -h /mnt/psql-data

# 4. Check database credentials
vault kv get aimpress/postgres/<service>

# 5. Verify application database config
cd /opt/<service-path>
grep -i database docker-compose.yml

# 6. Restart PostgreSQL (careful!)
docker restart postgres-main

# 7. Check PostgreSQL connections
docker exec postgres-main psql -U aimpress_admin -c \
    "SELECT * FROM pg_stat_activity;"
```

### Container Keeps Restarting

```bash
# 1. Check logs for error
docker logs <container-name> --tail 200

# 2. Check restart count
docker inspect --format='{{.RestartCount}}' <container-name>

# 3. Check resource usage
docker stats <container-name> --no-stream
free -h
df -h

# 4. Stop auto-restart temporarily
docker update --restart=no <container-name>

# 5. Start manually to see full error
docker start <container-name> && docker logs -f <container-name>

# 6. Check for port conflicts
netstat -tulpn | grep <port>

# 7. Check healthcheck
docker inspect <container-name> | grep -A 20 Health
```

### Out of Disk Space

```bash
# 1. Find what's using space
du -sh /opt/* /mnt/* /var/* | sort -h | tail -20

# 2. Check Docker disk usage
docker system df

# 3. Clean Docker (⚠️ can also delete volumes no container is using)
docker system prune -af --volumes

# 4. Clean logs
journalctl --vacuum-time=7d
find /var/log -name "*.log" -mtime +7 -exec rm {} \;

# 5. Clean old backups (files older than 30 days)
find /mnt/backups -type f -mtime +30 -delete

# 6. Remove stopped containers
docker container prune -f

# 7. Remove unused volumes
docker volume prune -f
```
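
To catch a filling disk before it causes outages, the usage percentage can be read in a form a script can compare. A minimal sketch using POSIX `df -P`; `disk_usage_pct` is a hypothetical helper and the 85% threshold in the usage line is an arbitrary example.

```bash
# Print the used-space percentage (integer, no % sign) of the filesystem
# that holds the given path. df -P prints one line per filesystem.
disk_usage_pct() {
    df -P "$1" | awk 'NR==2 { gsub(/%/, "", $5); print $5 }'
}

# Usage: warn when the database volume crosses a threshold
# [ "$(disk_usage_pct /mnt/psql-data)" -ge 85 ] && echo "psql-data is filling up"
```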

### SSL Certificate Issues

```bash
# 1. Check Traefik logs for certificate errors
docker logs traefik 2>&1 | grep -i "certificate\|acme\|letsencrypt"

# 2. Check certificate status
docker exec traefik cat /letsencrypt/acme.json | jq '.Certificates'

# 3. Force certificate renewal
# ⚠️ This discards every stored certificate; Let's Encrypt rate limits apply
docker exec traefik rm -rf /letsencrypt/acme.json
docker restart traefik

# Wait 2-3 minutes for new certificates

# 4. Check domain DNS
dig <domain.com>
nslookup <domain.com>

# 5. Test certificate (</dev/null makes s_client exit instead of waiting for input)
openssl s_client -connect <domain>:443 -servername <domain> </dev/null
```
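
Step 5 prints the whole handshake; often the only question is how many days the certificate has left. The sketch below converts the `notAfter` date printed by `openssl x509 -enddate` into a day count. It assumes GNU `date` (for `-d`), and `days_until` is a hypothetical helper name.

```bash
# Days between a reference time and a certificate "notAfter" date as printed
# by openssl, e.g. "Jan  2 00:00:00 2030 GMT". Requires GNU date.
# $1 = date string, $2 (optional) = reference time in epoch seconds (default: now)
days_until() {
    local end_epoch now
    end_epoch=$(date -d "$1" +%s) || return 1
    now="${2:-$(date +%s)}"
    echo $(( (end_epoch - now) / 86400 ))
}

# Usage against a live host (needs network access):
# end=$(openssl s_client -connect <domain>:443 -servername <domain> </dev/null 2>/dev/null \
#         | openssl x509 -noout -enddate | cut -d= -f2)
# days_until "$end"
```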

### High Memory Usage

```bash
# 1. Check which containers use most memory
docker stats --no-stream | sort -k4 -h

# 2. Check system memory
free -h
top -bn1 | head -20

# 3. Find memory-heavy processes
ps aux --sort=-%mem | head -10

# 4. Restart heavy containers
docker restart <container-name>

# 5. Check for memory leaks
docker logs <container-name> 2>&1 | grep -i "memory\|oom"

# 6. Add memory limits to docker-compose.yml
# deploy:
#   resources:
#     limits:
#       memory: 1G
```

### Vault Sealed/Inaccessible

```bash
# 1. Check Vault logs
docker logs vault --tail 50

# 2. Check if sealed
docker exec vault vault status

# 3. Unseal Vault
/opt/00-infrastructure/vault/unseal.sh

# 4. If unseal script doesn't work
docker exec vault vault operator unseal <key1>
docker exec vault vault operator unseal <key2>
docker exec vault vault operator unseal <key3>

# 5. Get new token if expired (-it is needed for the interactive prompt)
docker exec -it vault vault login
# Enter root token
```

---

## 1️⃣1️⃣ SYSTEM ARCHITECTURE

### High-Level Overview

```
                  INTERNET
                      ↓
            Cloudflare (DNS + CDN)
                      ↓
           OVH Server (51.89.231.46)
                      ↓
         Traefik (Reverse Proxy + SSL)
                      ↓
       ┌──────────────┼───────────────┐
       ↓              ↓               ↓
   Services       Monitoring        Tools
  ┌────┴─────┐   ┌────┴────┐     ┌────┴───┐
  ↓          ↓   ↓         ↓     ↓        ↓
 Wiki       N8N  Grafana  Loki   Portainer
 Odoo   Supabase Prometheus      Vault
 Authentik  ...
  ↓          ↓
  └────┬─────┘
       ↓
Infrastructure
    ┌─────────┬───────┬────────┐
    ↓         ↓       ↓        ↓
PostgreSQL  Redis   Vault   Storage
(/mnt/psql-data)        (/mnt/backups)
```

### Request Flow

1. **User** visits `https://wiki.ai-impress.com`
2. **Cloudflare** resolves DNS → 51.89.231.46
3. **Traefik** receives HTTPS request (port 443)
4. **Traefik** terminates SSL (Let's Encrypt certificate)
5. **Traefik** routes to Outline container via internal network
6. **Outline** authenticates user (via Authentik if needed)
7. **Outline** queries PostgreSQL for content
8. **Response** flows back: Outline → Traefik → User

**Critical points:** If Traefik or PostgreSQL fails, all services go down.

### Directory Structure

```
/opt/                                 # All applications
├── 00-infrastructure/                # Core infrastructure
│   ├── traefik/                      # Reverse proxy + SSL
│   ├── postgres/                     # PostgreSQL main
│   ├── vault/                        # Secrets storage
│   │   ├── .vault-token              # Root token (keep safe!)
│   │   └── unseal.sh                 # Unseal script
│   └── redis/                        # Cache & queues
│
├── 01-network/                       # Network services
│   └── authentik/                    # SSO authentication
│
├── 02-core/                          # Core business apps
│   ├── outline/                      # Wiki
│   ├── n8n/                          # Automation
│   └── supabase/                     # Backend platform
│
├── 03-business/                      # Business applications
│   ├── odoo/                         # ERP
│   └── documenso/                    # Document signing
│
├── 04-tools/                         # Management tools
│   ├── monitoring/                   # Grafana, Prometheus, Loki
│   │   └── docker-compose.yml
│   ├── portainer/                    # Docker UI
│   └── uptime-kuma/                  # Uptime monitoring
│
└── 05-backups/                       # Backup system
    ├── scripts/                      # 👈 ALL ADMIN SCRIPTS
    │   ├── admin.sh                  # Main admin interface
    │   ├── health-check-alerting.sh  # Health monitoring
    │   ├── backup-full-enhanced.sh   # Full backups
    │   ├── upload-to-outline.sh      # Report uploads
    │   └── ...
    ├── logs/                         # Script logs
    ├── reports/                      # JSON reports
    └── config-versions/              # Config backups

/opt/infrastructure-docs/             # Documentation
├── scripts/
│   ├── server-full-report.sh         # Generate this report
│   └── modules/                      # Report modules
├── reports/                          # Generated MD reports
├── ADMIN_GUIDE.md                    # Administrator guide
└── SUMMARY.md                        # Setup summary

/mnt/                                 # Mounted storage
├── backups/                          # 💾 ALL BACKUPS HERE
│   ├── *.tar.gz                      # System backups
│   ├── *.sql.gz                      # Database backups
│   └── vault-export-*.json           # Vault backups
│
└── psql-data/                        # 🗄️ ALL DATABASES HERE
    ├── pgdata/                       # postgres-main
    ├── supabase-db/                  # Supabase
    └── pgadmin-data/                 # pgAdmin
```

### Service Dependencies

**Level 1: Infrastructure (Cannot delete)**
- Traefik → All websites depend on it
- PostgreSQL → All applications need database
- Redis → N8N, Authentik, others need cache
- Vault → All passwords stored here

**Level 2: Network**
- Authentik → Provides SSO for many services

**Level 3: Applications**
- Outline, N8N, Odoo, etc. → Can delete if backed up

**Level 4: Tools**
- Grafana, Portainer, etc. → Can recreate

### Network Architecture

**Docker Networks:**
- `traefik-public` - Public-facing services
- `postgres-network` - Database connections
- `redis-network` - Redis connections
- `authentik-network` - Auth services
- Individual service networks

**Port Mapping:**
- `80` → Traefik HTTP (redirects to HTTPS)
- `443` → Traefik HTTPS
- `22` → SSH
- Internal services: No direct external access

### Backup Strategy

**Automated backups:**
- **Daily 02:00** - Configuration backup
- **Daily 03:00** - Full system backup
- **Every 4 hours** - Vault secrets export
- **Weekly Saturday** - Database maintenance

**Backup locations:**
- `/mnt/backups/` - Primary backup location
- Vault exports in same location

**What's backed up:**
- All databases (PostgreSQL)
- All Docker configurations
- All Vault secrets
- Application data volumes
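
A schedule only helps if the jobs actually run, so it can be worth checking how old the newest backup is. The sketch below is one way to do that, assuming GNU `find` (for `-printf`) and the file patterns shown under Directory Structure; `newest_backup_age_hours` is a hypothetical helper and the 26-hour tolerance in the usage line is an arbitrary example.

```bash
# Print the age in whole hours of the newest file matching a pattern,
# or return 1 if nothing matches. Uses GNU find's -printf.
# $1 = directory, $2 = filename pattern
newest_backup_age_hours() {
    local dir="$1" pattern="$2" newest now
    # %T@ prints the modification time as epoch seconds with a fraction
    newest=$(find "$dir" -type f -name "$pattern" -printf '%T@\n' | sort -n | tail -1)
    [ -n "$newest" ] || return 1
    now=$(date +%s)
    echo $(( (now - ${newest%.*}) / 3600 ))
}

# Usage: the daily full backup should not be much older than a day
# age=$(newest_backup_age_hours /mnt/backups "*.tar.gz") || echo "no backup found"
# [ "${age:-999}" -le 26 ] || echo "full backup is stale"
```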

### Monitoring Setup

- **Grafana** → Visualizations
- **Prometheus** → Metrics collection
- **Loki** → Log aggregation
- **cAdvisor** → Container metrics
- **Node Exporter** → System metrics
- **Alertmanager** → Alert routing
- **Blackbox Exporter** → Website monitoring

**Alert flow:**
Health check → Detects problem → Sends to Slack + Email

---

## 📞 QUICK REFERENCE GUIDE

### Main Scripts

```bash
# System status
/opt/05-backups/scripts/admin.sh status

# Health check + alerts
/opt/05-backups/scripts/health-check-alerting.sh

# Generate full report
/opt/infrastructure-docs/scripts/server-full-report.sh

# Upload to Outline
/opt/05-backups/scripts/upload-to-outline.sh latest-report

# Full backup
/opt/05-backups/scripts/backup-full-enhanced.sh
```

### Emergency Commands

```bash
# Restart all containers
docker restart $(docker ps -q)

# Stop all containers
docker stop $(docker ps -q)

# Clean up everything (⚠️ can also delete volumes no container is using)
docker system prune -af --volumes

# Show recent logs of every container that is not "Up"
# (--format avoids passing the NAMES header line to docker logs)
docker ps -a --format '{{.Names}} {{.Status}}' | awk '$2 != "Up" {print $1}' | xargs -I {} docker logs --tail 50 {}
```

### Useful Aliases (Add to .bashrc)

```bash
alias dps='docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"'
alias dlogs='docker logs -f --tail 100'
alias dstats='docker stats --no-stream'
alias dclean='docker system prune -af'
# Token is read from the token file at call time instead of being stored in .bashrc
alias vaultlogin='export VAULT_ADDR=http://127.0.0.1:8200 && export VAULT_TOKEN="$(cat /opt/00-infrastructure/vault/.vault-token)"'
```

### Important URLs

- **Wiki:** https://wiki.ai-impress.com
- **N8N:** https://n8n.ai-impress.com
- **Monitoring:** https://grafana.ai-impress.com
- **Uptime:** https://uptime.ai-impress.com
- **Docker UI:** https://portainer.ai-impress.com

### Support Resources

- **Documentation:** https://wiki.ai-impress.com/collection/technical-knowledge-base-vCle1zKdMA
- **Reports:** https://wiki.ai-impress.com/collection/reports-yRgWpcsnJ7
- **Slack:** #alerts channel
- **Email:** admin@ai-impress.com

---

**Report End**
**Generated:** TIMESTAMP_PLACEHOLDER
**Version:** 5.0.0 (Modular)
**Modules:** 7 sections
EOFGUIDES