OVHserver/opt/infrastructure-docs/scripts/modules/generate-guides.sh
SamoilenkoVadym a987d45fbc chore: initial infrastructure setup with Syncthing, Git and documentation
Set up three-tier synchronization: Syncthing (real-time), GitHub (version control), rsync (disaster recovery). Includes complete documentation for future Claude sessions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 16:41:12 +00:00

#!/bin/bash
# Module 8-11: Secrets, How-To, Troubleshooting, Architecture
cat << 'EOFGUIDES'
---
## 8⃣ SECRETS & PASSWORDS
### Vault Access
All secrets are stored in HashiCorp Vault for security.
```bash
# Set Vault environment variables
export VAULT_ADDR="http://127.0.0.1:8200"
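# Read the root token from its protected file (never paste it into commands or shell history)
export VAULT_TOKEN="$(cat /opt/00-infrastructure/vault/.vault-token)"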
# List all available secret paths
vault kv list aimpress/
# Get a specific secret
vault kv get aimpress/<service-name>
# Get secret in JSON format
vault kv get -format=json aimpress/<service> | jq '.data.data'
# Add new secret
vault kv put aimpress/<service> \
username="admin" \
password="secret123" \
api_key="xyz"
# Update existing secret
vault kv patch aimpress/<service> password="new_password"
# Delete secret
vault kv delete aimpress/<service>
# View secret history
vault kv metadata get aimpress/<service>
```
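When a script needs only one value, `vault kv get -field=<name>` prints that field alone, which avoids the `jq` step. A minimal sketch, assuming `VAULT_ADDR` and `VAULT_TOKEN` are already exported; the helper name `vault_field` is illustrative and not one of the existing scripts:

```bash
# Print a single field from a KV secret.
# vault_field is a hypothetical helper name, not an existing script.
vault_field() {
    local path="$1" field="$2"
    if [ -z "$path" ] || [ -z "$field" ]; then
        echo "usage: vault_field <path> <field>" >&2
        return 2
    fi
    # -field prints only the requested value, with no table or JSON wrapper
    vault kv get -field="$field" "$path"
}
```

For example, `DB_PASS="$(vault_field aimpress/postgres/outline password)"` would capture the password without echoing it to the terminal.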
### Vault Secret Structure
```
secret/
├── monitoring/
│   ├── slack_webhook
│   └── alert_email
└── outline/
    └── scripts/
        ├── api_token
        ├── api_url
        ├── reports_collection_id
        └── technical_kb_collection_id
aimpress/
├── postgres/
│   ├── admin       # Main PostgreSQL admin
│   ├── outline     # Outline database user
│   └── odoo        # Odoo database user
├── odoo            # Odoo admin credentials
├── authentik       # Authentik admin credentials
├── grafana         # Grafana admin credentials
└── portainer       # Portainer admin credentials
```
### Vault Unseal (if sealed)
```bash
# Check Vault status
docker exec vault vault status
# Unseal Vault
/opt/00-infrastructure/vault/unseal.sh
# Or manually with keys
docker exec vault vault operator unseal <key1>
docker exec vault vault operator unseal <key2>
docker exec vault vault operator unseal <key3>
```
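`vault status` also reports the seal state through its exit code (0 = unsealed, 2 = sealed, 1 = error), which is handy in scripts. A minimal sketch, assuming the container is named `vault` as in the commands above; `vault_seal_state` is a hypothetical helper name:

```bash
# Translate the exit code of `vault status` into a word.
# vault_seal_state is a hypothetical helper name.
vault_seal_state() {
    local rc=0
    docker exec vault vault status >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "unsealed" ;;
        2) echo "sealed" ;;
        *) echo "error" ;;   # container down, Vault unreachable, etc.
    esac
}
```

A cron job or health check could then run the unseal script only when `vault_seal_state` prints `sealed`.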
---
## 9⃣ HOW TO GUIDES
### Restart a Service
```bash
# Quick restart
docker restart <container-name>
# Graceful restart with docker-compose
cd /opt/<service-directory>
docker-compose restart <service-name>
# Full restart (if config changed)
cd /opt/<service-directory>
docker-compose down
docker-compose up -d
# Restart all services in compose file
cd /opt/<service-directory>
docker-compose restart
```
### View Logs
```bash
# Real-time logs (follow)
docker logs -f --tail 100 <container-name>
# Last 500 lines
docker logs --tail 500 <container-name>
# Logs with timestamps
docker logs -t <container-name>
# Logs from last hour
docker logs --since 1h <container-name>
# Logs from specific time
docker logs --since "2025-11-03T10:00:00" <container-name>
# Save logs to file
docker logs <container-name> > /tmp/container.log
# Search in logs
docker logs <container-name> 2>&1 | grep "error"
```
### Update a Container
```bash
# 1. BACKUP FIRST!
/opt/05-backups/scripts/backup-full-enhanced.sh
# 2. Pull new image
cd /opt/<service-directory>
docker-compose pull <service-name>
# 3. Recreate container
docker-compose up -d <service-name>
# 4. Check logs
docker logs -f <container-name>
# 5. Verify service is working
curl -I https://<service-url>
```
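Step 5 can be automated by polling the URL until the service answers. A minimal sketch; `wait_for_url` is a hypothetical helper, and the defaults (30 tries, 2 seconds apart) are arbitrary assumptions rather than values from the existing scripts:

```bash
# Poll a URL until it returns a non-error HTTP status, or give up.
# wait_for_url is a hypothetical helper name.
wait_for_url() {
    local url="$1" attempts="${2:-30}" delay="${3:-2}" i
    for (( i = 1; i <= attempts; i++ )); do
        # -f makes curl exit non-zero on HTTP 4xx/5xx responses
        if curl -fsS -o /dev/null --max-time 5 "$url"; then
            echo "up after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
    done
    echo "still failing after $attempts attempts" >&2
    return 1
}
```

Usage after recreating a container: `wait_for_url https://<service-url> || docker logs --tail 100 <container-name>`.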
### Delete a Container/Service
```bash
# ⚠️ WARNING: This will delete data! Backup first!
# 1. Create backup
/opt/05-backups/scripts/backup-full-enhanced.sh
# 2. Stop and remove container
docker stop <container-name>
docker rm <container-name>
# 3. Remove volumes (if not needed)
docker volume ls | grep <service>
docker volume rm <volume-name>
# 4. Remove from docker-compose.yml
cd /opt/<service-directory>
nano docker-compose.yml # Delete service section
# 5. Restart remaining services
docker-compose up -d
```
### Change Password
#### For Web Applications
```bash
# 1. Login to web interface
# 2. Go to Settings → Users → Change Password
# 3. Save new password to Vault
export VAULT_ADDR="http://127.0.0.1:8200"
export VAULT_TOKEN="$(cat /opt/00-infrastructure/vault/.vault-token)"
# patch updates only the password; put would replace the whole secret
vault kv patch aimpress/<service> password="<new-password>"
```
#### For PostgreSQL
```bash
# 1. Generate strong password
NEW_PASS=$(openssl rand -base64 32)
echo "New password: $NEW_PASS"
# 2. Change in database
docker exec postgres-main psql -U aimpress_admin -c \
"ALTER USER <username> WITH PASSWORD '$NEW_PASS';"
# 3. Save to Vault (patch keeps the secret's other fields)
vault kv patch aimpress/postgres/<service> password="$NEW_PASS"
# 4. Update application config
cd /opt/<service-path>
nano docker-compose.yml # Update DB_PASSWORD
# 5. Recreate the application so it picks up the new config
docker-compose up -d
```
### Clean Up Disk Space
```bash
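# Clean Docker (removes unused images, stopped containers, and unused volumes)
# ⚠️ Volumes of stopped services are deleted too; make sure everything you need is running
docker system prune -af --volumes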
# Clean old logs
find /opt -name "*.log" -mtime +7 -delete
find /var/log -name "*.log" -mtime +7 -exec truncate -s 0 {} \;
# Clean old backups (keep last 60 days)
find /mnt/backups -name "*.gz" -mtime +60 -delete
# Find large files (lists only; review before removing anything)
find /opt /mnt -type f -size +1G -exec ls -lh {} \;
# Check what's using space
du -sh /opt/* /mnt/* | sort -h
```
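Because `find ... -delete` is irreversible, it helps to preview the matches first with the same criteria. A minimal sketch; `list_old_files` is a hypothetical helper name:

```bash
# List files matching a pattern that are older than N days, without deleting anything.
# list_old_files is a hypothetical helper name.
list_old_files() {
    local dir="$1" pattern="$2" days="$3"
    find "$dir" -type f -name "$pattern" -mtime +"$days" -print
}
# Review the output first, then delete with the same criteria:
#   list_old_files /mnt/backups "*.gz" 60
#   find /mnt/backups -type f -name "*.gz" -mtime +60 -delete
```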
### Add New Service
```bash
# 1. Create directory structure
mkdir -p /opt/03-business/<service-name>
cd /opt/03-business/<service-name>
# 2. Create docker-compose.yml
nano docker-compose.yml
# 3. Start service
docker-compose up -d
# 4. Connect to Traefik network (only if not already declared in docker-compose.yml)
docker network connect traefik-public <container-name>
# 5. Check logs
docker logs -f <container-name>
# 6. Add credentials to Vault
vault kv put aimpress/<service> \
username="admin" \
password="$(openssl rand -base64 32)"
```
---
## 🔟 TROUBLESHOOTING
### Website Not Responding (502/503 Error)
```bash
# 1. Check Traefik (reverse proxy)
docker logs traefik --tail 100 2>&1 | grep -i error
# 2. Check if target container is running
docker ps | grep <service-name>
# 3. Check target container logs
docker logs <service-name> --tail 100
# 4. Test direct connection (bypass Traefik)
docker exec <service-name> curl -I localhost:<port>
# 5. Check if service is in correct network
docker network inspect traefik-public | grep <service-name>
# 6. Restart Traefik
docker restart traefik
# 7. Restart service
docker restart <service-name>
```
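The HTTP status code usually indicates which layer to inspect first. A rough sketch of that triage; the helper names `http_status` and `diagnose_status` are hypothetical, and the mapping is a heuristic rather than a rule:

```bash
# Fetch only the HTTP status code; curl prints 000 when no response was received.
http_status() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}
# Map a status code to the most likely layer at fault (heuristic only).
diagnose_status() {
    case "$1" in
        000)         echo "no response: check DNS, firewall, or whether Traefik is running" ;;
        502|503|504) echo "Traefik answered but the backend did not: check the service container" ;;
        404)         echo "no matching Traefik router, or the app returned 404: check routing labels" ;;
        2??|3??)     echo "service is responding" ;;
        *)           echo "unexpected status $1: check the service logs" ;;
    esac
}
```

Usage: `diagnose_status "$(http_status https://<service-url>)"`.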
### Database Connection Errors
```bash
# 1. Check PostgreSQL is running
docker ps | grep postgres-main
docker logs postgres-main --tail 100
# 2. Test database connection
docker exec postgres-main psql -U aimpress_admin -c "SELECT version();"
# 3. Check disk space (common issue)
df -h /mnt/psql-data
# 4. Check database credentials
vault kv get aimpress/postgres/<service>
# 5. Verify application database config
cd /opt/<service-path>
grep -i database docker-compose.yml
# 6. Restart PostgreSQL (careful!)
docker restart postgres-main
# 7. Check PostgreSQL connections
docker exec postgres-main psql -U aimpress_admin -c \
"SELECT * FROM pg_stat_activity;"
```
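PostgreSQL ships `pg_isready`, which reports server readiness through its exit code without needing a password. A minimal sketch, assuming the `postgres-main` container is running (step 1) and that the image includes `pg_isready`; `pg_ready_state` is a hypothetical helper name:

```bash
# Report PostgreSQL readiness based on the pg_isready exit code.
# pg_ready_state is a hypothetical helper name.
pg_ready_state() {
    local container="${1:-postgres-main}" rc=0
    docker exec "$container" pg_isready -q -U aimpress_admin || rc=$?
    case "$rc" in
        0) echo "accepting connections" ;;
        1) echo "rejecting connections (starting up or shutting down)" ;;
        2) echo "no response" ;;
        *) echo "check could not run (exit code $rc)" ;;
    esac
}
```

If the container itself is stopped, `docker exec` fails with its own exit code, so confirm step 1 before trusting the result.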
### Container Keeps Restarting
```bash
# 1. Check logs for error
docker logs <container-name> --tail 200
# 2. Check restart count
docker inspect --format='{{.RestartCount}}' <container-name>
# 3. Check resource usage
docker stats <container-name> --no-stream
free -h
df -h
# 4. Stop auto-restart temporarily
docker update --restart=no <container-name>
# 5. Start manually to see full error
docker start <container-name> && docker logs -f <container-name>
# 6. Check for port conflicts
netstat -tulpn | grep <port>
# 7. Check healthcheck
docker inspect <container-name> | grep -A 20 Health
```
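The container's last exit code often narrows the cause before reading any logs. A small sketch that interprets common codes; `explain_exit_code` is a hypothetical helper, and the explanations are general conventions (128 + signal number), not output from Docker:

```bash
# Interpret a container exit code using common shell/signal conventions.
# explain_exit_code is a hypothetical helper name.
explain_exit_code() {
    case "$1" in
        0)   echo "exited cleanly" ;;
        126) echo "command found but not executable" ;;
        127) echo "command not found in the image" ;;
        137) echo "killed with SIGKILL (often the OOM killer)" ;;
        139) echo "segmentation fault (SIGSEGV)" ;;
        143) echo "terminated with SIGTERM (normal docker stop)" ;;
        *)   echo "application error (exit code $1): read the container logs" ;;
    esac
}
# Read the code (and whether the kernel OOM-killed the container) with:
#   docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' <container-name>
```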
### Out of Disk Space
```bash
# 1. Find what's using space
du -sh /opt/* /mnt/* /var/* | sort -h | tail -20
# 2. Check Docker disk usage
docker system df
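# 3. Clean Docker (⚠️ also deletes volumes not used by a running container)
docker system prune -af --volumes
# 4. Clean logs (truncate instead of rm so space held by open files is released)
journalctl --vacuum-time=7d
find /var/log -name "*.log" -mtime +7 -exec truncate -s 0 {} \;
# 5. Clean old backups
find /mnt/backups -type f -mtime +30 -delete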
# 6. Remove stopped containers
docker container prune -f
# 7. Remove unused volumes
docker volume prune -f
```
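To catch the problem before it causes outages, a script can compare usage against a threshold. A minimal sketch using POSIX `df -P`; the helper names and the 85% default are assumptions, not values taken from the existing health check:

```bash
# Print the used-space percentage (digits only) for the filesystem holding a path.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
# Return 0 if usage is at or above the limit, 1 if below, 2 if df gave no usable number.
disk_over_threshold() {
    local path="$1" limit="${2:-85}" used
    used="$(disk_usage_pct "$path")"
    case "$used" in ''|*[!0-9]*) return 2 ;; esac
    [ "$used" -ge "$limit" ]
}
```

Usage: `disk_over_threshold /mnt/psql-data 90 && echo "psql-data is almost full"`.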
### SSL Certificate Issues
```bash
# 1. Check Traefik logs for certificate errors
docker logs traefik 2>&1 | grep -i "certificate\|acme\|letsencrypt"
# 2. Check certificate status
docker exec traefik cat /letsencrypt/acme.json | jq '.Certificates'
# 3. Force certificate renewal (last resort: re-issues ALL certificates,
#    which counts against Let's Encrypt rate limits)
docker exec traefik rm -f /letsencrypt/acme.json
docker restart traefik
# Wait 2-3 minutes for new certificates
# 4. Check domain DNS
dig <domain.com>
nslookup <domain.com>
# 5. Test certificate
openssl s_client -connect <domain>:443 -servername <domain>
```
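Expiry can also be checked non-interactively with `openssl x509 -checkend`, which exits 0 only if the certificate is still valid after the given number of seconds. A minimal sketch; `cert_valid_for` is a hypothetical helper and the 14-day default is an arbitrary assumption:

```bash
# Succeed only if the certificate served by <domain> is still valid in N days.
# cert_valid_for is a hypothetical helper name.
cert_valid_for() {
    local domain="$1" days="${2:-14}"
    # </dev/null closes stdin so s_client exits right after the handshake
    openssl s_client -connect "${domain}:443" -servername "$domain" </dev/null 2>/dev/null \
        | openssl x509 -noout -checkend "$(( days * 86400 ))" >/dev/null 2>&1
}
```

Usage: `cert_valid_for wiki.ai-impress.com 14 || echo "certificate expires soon or could not be checked"`. A failed connection also returns non-zero, so treat a failure as "needs attention" rather than "expired".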
### High Memory Usage
```bash
# 1. Check which containers use most memory
docker stats --no-stream | sort -k4 -h
# 2. Check system memory
free -h
top -bn1 | head -20
# 3. Find memory-heavy processes
ps aux --sort=-%mem | head -10
# 4. Restart heavy containers
docker restart <container-name>
# 5. Check for memory leaks
docker logs <container-name> 2>&1 | grep -i "memory\|oom"
# 6. Add memory limits to docker-compose.yml
# deploy:
#   resources:
#     limits:
#       memory: 1G
```
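For a cleaner ranking than sorting the full `docker stats` table, the `--format` option can emit just the memory percentage and name. A minimal sketch; `top_mem_containers` is a hypothetical helper name:

```bash
# Show the N containers using the largest share of memory (default 5).
# top_mem_containers is a hypothetical helper name.
top_mem_containers() {
    local count="${1:-5}"
    docker stats --no-stream --format '{{.MemPerc}} {{.Name}}' \
        | LC_ALL=C sort -rn \
        | head -n "$count"
}
```

Usage: `top_mem_containers 3` prints the three heaviest containers, largest first.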
### Vault Sealed/Inaccessible
```bash
# 1. Check Vault status
docker logs vault --tail 50
# 2. Check if sealed
docker exec vault vault status
# 3. Unseal Vault
/opt/00-infrastructure/vault/unseal.sh
# 4. If unseal script doesn't work
docker exec vault vault operator unseal <key1>
docker exec vault vault operator unseal <key2>
docker exec vault vault operator unseal <key3>
# 5. Get new token if expired (-it is needed for the interactive prompt)
docker exec -it vault vault login
# Enter root token
```
---
## 1⃣1⃣ SYSTEM ARCHITECTURE
### High-Level Overview
```
                            INTERNET
                                ↓
                     Cloudflare (DNS + CDN)
                                ↓
                    OVH Server (51.89.231.46)
                                ↓
                  Traefik (Reverse Proxy + SSL)
                                ↓
        ┌───────────────────────┼───────────────────────┐
        ↓                       ↓                       ↓
    Services               Monitoring                 Tools
  ┌─────┴─────┐           ┌─────┴─────┐           ┌─────┴─────┐
  ↓           ↓           ↓           ↓           ↓           ↓
 Wiki        N8N       Grafana      Loki      Portainer     Vault
 Odoo     Supabase   Prometheus
Authentik    ...
  ↓           ↓
  └───────────┴─────────────────┐
                                ↓
                         Infrastructure
           ┌─────────────┬──────┴──────┬─────────────┐
           ↓             ↓             ↓             ↓
      PostgreSQL       Redis         Vault        Storage
   (/mnt/psql-data)                           (/mnt/backups)
```
### Request Flow
1. **User** visits `https://wiki.ai-impress.com`
2. **Cloudflare** resolves DNS → 51.89.231.46
3. **Traefik** receives HTTPS request (port 443)
4. **Traefik** terminates SSL (Let's Encrypt certificate)
5. **Traefik** routes to Outline container via internal network
6. **Outline** authenticates user (via Authentik if needed)
7. **Outline** queries PostgreSQL for content
8. **Response** flows back: Outline → Traefik → User
**Critical points:** If either Traefik or PostgreSQL fails, all services go down.
### Directory Structure
```
/opt/                                 # All applications
├── 00-infrastructure/                # Core infrastructure
│   ├── traefik/                      # Reverse proxy + SSL
│   ├── postgres/                     # PostgreSQL main
│   ├── vault/                        # Secrets storage
│   │   ├── .vault-token              # Root token (keep safe!)
│   │   └── unseal.sh                 # Unseal script
│   └── redis/                        # Cache & queues
├── 01-network/                       # Network services
│   └── authentik/                    # SSO authentication
├── 02-core/                          # Core business apps
│   ├── outline/                      # Wiki
│   ├── n8n/                          # Automation
│   └── supabase/                     # Backend platform
├── 03-business/                      # Business applications
│   ├── odoo/                         # ERP
│   └── documenso/                    # Document signing
├── 04-tools/                         # Management tools
│   ├── monitoring/                   # Grafana, Prometheus, Loki
│   │   └── docker-compose.yml
│   ├── portainer/                    # Docker UI
│   └── uptime-kuma/                  # Uptime monitoring
└── 05-backups/                       # Backup system
    ├── scripts/                      # 👈 ALL ADMIN SCRIPTS
    │   ├── admin.sh                  # Main admin interface
    │   ├── health-check-alerting.sh  # Health monitoring
    │   ├── backup-full-enhanced.sh   # Full backups
    │   ├── upload-to-outline.sh      # Report uploads
    │   └── ...
    ├── logs/                         # Script logs
    ├── reports/                      # JSON reports
    └── config-versions/              # Config backups
/opt/infrastructure-docs/             # Documentation
├── scripts/
│   ├── server-full-report.sh         # Generate this report
│   └── modules/                      # Report modules
├── reports/                          # Generated MD reports
├── ADMIN_GUIDE.md                    # Administrator guide
└── SUMMARY.md                        # Setup summary
/mnt/                                 # Mounted storage
├── backups/                          # 💾 ALL BACKUPS HERE
│   ├── *.tar.gz                      # System backups
│   ├── *.sql.gz                      # Database backups
│   └── vault-export-*.json           # Vault backups
└── psql-data/                        # 🗄️ ALL DATABASES HERE
    ├── pgdata/                       # postgres-main
    ├── supabase-db/                  # Supabase
    └── pgadmin-data/                 # pgAdmin
```
### Service Dependencies
**Level 1: Infrastructure (Cannot delete)**
- Traefik → All websites depend on it
- PostgreSQL → All applications need database
- Redis → N8N, Authentik, others need cache
- Vault → All passwords stored here
**Level 2: Network**
- Authentik → Provides SSO for many services
**Level 3: Applications**
- Outline, N8N, Odoo, etc → Can delete if backed up
**Level 4: Tools**
- Grafana, Portainer, etc → Can recreate
### Network Architecture
**Docker Networks:**
- `traefik-public` - Public-facing services
- `postgres-network` - Database connections
- `redis-network` - Redis connections
- `authentik-network` - Auth services
- Individual service networks
**Port Mapping:**
- `80` → Traefik HTTP (redirects to HTTPS)
- `443` → Traefik HTTPS
- `22` → SSH
- Internal services: No direct external access
### Backup Strategy
**Automated backups:**
- **Daily 02:00** - Configuration backup
- **Daily 03:00** - Full system backup
- **Every 4 hours** - Vault secrets export
- **Weekly Saturday** - Database maintenance
**Backup locations:**
- `/mnt/backups/` - Primary backup location
- Vault exports in same location
**What's backed up:**
- All databases (PostgreSQL)
- All Docker configurations
- All Vault secrets
- Application data volumes
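
A schedule only helps if the jobs actually run, so it is worth checking that a recent backup file exists. A minimal sketch, assuming archives land directly in `/mnt/backups/` as `*.tar.gz` (per the directory structure above) and allowing 26 hours for a daily job; `backup_is_fresh` is a hypothetical helper and relies on GNU `find`:

```bash
# Succeed if at least one matching file was modified within the last N minutes.
# backup_is_fresh is a hypothetical helper name; 1560 minutes = 26 hours.
backup_is_fresh() {
    local dir="${1:-/mnt/backups}" pattern="${2:-*.tar.gz}" max_age_min="${3:-1560}"
    find "$dir" -maxdepth 1 -type f -name "$pattern" -mmin -"$max_age_min" -print -quit \
        | grep -q .
}
```

Usage: `backup_is_fresh /mnt/backups "*.sql.gz" || echo "no database backup in the last 26 hours"`.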
### Monitoring Setup
- **Grafana** → Visualizations
- **Prometheus** → Metrics collection
- **Loki** → Log aggregation
- **cAdvisor** → Container metrics
- **Node Exporter** → System metrics
- **Alertmanager** → Alert routing
- **Blackbox Exporter** → Website monitoring
**Alert flow:**
Health check → Detects problem → Sends to Slack + Email
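
The Slack leg of that flow amounts to a JSON POST to an incoming-webhook URL. A minimal sketch of how such a step could look; it is not the code in `health-check-alerting.sh`, the helper names are hypothetical, and the webhook URL is passed in as an argument because the exact Vault field holding it is not shown here:

```bash
# Escape backslashes, double quotes, and newlines for use inside a JSON string.
# (Other control characters are not handled in this sketch.)
json_escape() {
    local s=$1
    s=${s//\\/\\\\}
    s=${s//\"/\\\"}
    s=${s//$'\n'/\\n}
    printf '%s' "$s"
}
# Post a plain-text message to a Slack incoming webhook.
send_slack_alert() {
    local webhook_url="$1" message="$2"
    curl -fsS -X POST -H 'Content-Type: application/json' \
        --data "{\"text\": \"$(json_escape "$message")\"}" \
        "$webhook_url"
}
```

Usage: `send_slack_alert "$SLACK_WEBHOOK_URL" "postgres-main is down"`, where `SLACK_WEBHOOK_URL` is an assumed variable holding the webhook read from Vault.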
---
## 📞 QUICK REFERENCE GUIDE
### Main Scripts
```bash
# System status
/opt/05-backups/scripts/admin.sh status
# Health check + alerts
/opt/05-backups/scripts/health-check-alerting.sh
# Generate full report
/opt/infrastructure-docs/scripts/server-full-report.sh
# Upload to Outline
/opt/05-backups/scripts/upload-to-outline.sh latest-report
# Full backup
/opt/05-backups/scripts/backup-full-enhanced.sh
```
### Emergency Commands
```bash
# Restart all containers
docker restart $(docker ps -q)
# Stop all containers
docker stop $(docker ps -q)
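# Clean up everything (⚠️ deletes all stopped containers, unused images, and unused volumes)
docker system prune -af --volumes
# Check logs of containers that are not running
docker ps -a --format '{{.Names}} {{.Status}}' | awk '$2 != "Up" {print $1}' | xargs -r -I {} docker logs --tail 50 {}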
```
### Useful Aliases (Add to .bashrc)
```bash
alias dps='docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"'
alias dlogs='docker logs -f --tail 100'
alias dstats='docker stats --no-stream'
alias dclean='docker system prune -af'
alias vaultlogin='export VAULT_ADDR=http://127.0.0.1:8200 && export VAULT_TOKEN="$(cat /opt/00-infrastructure/vault/.vault-token)"'
```
### Important URLs
- **Wiki:** https://wiki.ai-impress.com
- **N8N:** https://n8n.ai-impress.com
- **Monitoring:** https://grafana.ai-impress.com
- **Uptime:** https://uptime.ai-impress.com
- **Docker UI:** https://portainer.ai-impress.com
### Support Resources
- **Documentation:** https://wiki.ai-impress.com/collection/technical-knowledge-base-vCle1zKdMA
- **Reports:** https://wiki.ai-impress.com/collection/reports-yRgWpcsnJ7
- **Slack:** #alerts channel
- **Email:** admin@ai-impress.com
---
**Report End**
**Generated:** TIMESTAMP_PLACEHOLDER
**Version:** 5.0.0 (Modular)
**Modules:** 7 sections
EOFGUIDES