OVHserver/opt/05-backups/RESTORE-GUIDE.md
SamoilenkoVadym 51969ba447 feat: обеспечена 100% восстановимость инфраструктуры из бэкапов
КРИТИЧНЫЕ ИЗМЕНЕНИЯ:
- Исправлены имена Docker volumes для корректного бэкапа
- Добавлены ВСЕ критичные volumes: n8n, Odoo, Authentik-postgres, Outline, WikiJS
- Добавлены Grafana dashboards в бэкап application data
- Добавлена автоочистка локальных бэкапов (7 дней)
- Изменен retention R2: с 1 дня на 3 дня (безопасность)
- Исправлен путь к Supabase storage

УЛУЧШЕНИЯ:
- backup-full-enhanced.sh v2.2.0
- Добавлена функция cleanup_old_local_backups()
- Создан детальный RESTORE-GUIDE.md с пошаговыми инструкциями
- 100% покрытие для disaster recovery

БЭКАПИРУЕМЫЕ КОМПОНЕНТЫ:
Databases:
  - PostgreSQL (postgres-main + authentik-postgres)
  - MariaDB (mautic-db)
  - MongoDB (если есть)

Volumes (9 критичных):
  - authentik_authentik-postgres-data (SSO БД)
  - authentik_authentik-redis-data (sessions)
  - evolution-api_evolution-data (WhatsApp)
  - n8n-shared_n8n-data (workflows, credentials)
  - odoo_odoo-data + odoo_odoo-addons (ERP)
  - vaultwarden_vaultwarden-data (passwords)
  - outline_outline-data + wikijs_data (wiki)

Application Data:
  - Vault secrets
  - Docker Compose configs + .env
  - Grafana dashboards
  - Supabase storage
  - Documenso documents
  - Evolution instances
  - Mautic data

Cloud Backup:
  - R2 (HOT): последние 3 дня
  - Google Drive (COLD): 7д + 4н + 3м

РЕЗУЛЬТАТ:
Теперь возможно полное восстановление всей инфраструктуры
на новом сервере с 0 за 4-6 часов.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 19:22:22 +00:00

668 lines
15 KiB
Markdown

# AI-Impress Disaster Recovery Guide
**Version:** 2.2.0
**Last Updated:** 2025-11-13
**Purpose:** Complete step-by-step guide to restore full infrastructure from backups
---
## Table of Contents
1. [Overview](#overview)
2. [Prerequisites](#prerequisites)
3. [Recovery Scenarios](#recovery-scenarios)
4. [Full System Restoration](#full-system-restoration)
5. [Partial Recovery](#partial-recovery)
6. [Verification](#verification)
7. [Troubleshooting](#troubleshooting)
---
## Overview
This guide covers full disaster recovery for AI-Impress infrastructure. With backup version 2.2.0, we achieve **100% recovery coverage** of all critical components.
### What's Backed Up
**Databases:**
- PostgreSQL (postgres-main): n8n, Odoo, Vaultwarden, WikiJS, Evolution, Documenso, Supabase
- PostgreSQL (authentik-postgres): Authentik SSO users and configuration
- MariaDB (mautic-db): Mautic marketing automation
- MongoDB (if present)
**Docker Volumes:**
- `authentik_authentik-postgres-data` - Authentik database
- `authentik_authentik-redis-data` - Authentik sessions
- `evolution-api_evolution-data` - WhatsApp sessions and messages
- `n8n-shared_n8n-data` - n8n workflows and credentials
- `odoo_odoo-data` - Odoo file store and attachments
- `odoo_odoo-addons` - Custom Odoo modules
- `vaultwarden_vaultwarden-data` - Password vaults
- `outline_outline-data` - Outline wiki data
- `wikijs_data` - WikiJS data
**Application Data:**
- Vault secrets (`/opt/00-infrastructure/vault/data`)
- Docker Compose files and .env configs
- Supabase storage
- Grafana dashboards
- Documenso signed documents
- Evolution API WhatsApp instances
- Mautic sync data
**Cloud Backups:**
- **HOT (R2):** Last 3 days for quick recovery
- **COLD (Google Drive):** 7 days + 4 weeks + 3 months
---
## Prerequisites
### Required Information
1. **Server Access:**
- New/replacement server IP address
- SSH access (ubuntu user)
- sudo privileges
2. **Backup Credentials:**
- Restic password (from `/opt/05-backups/restic/.env`)
- Cloudflare R2 credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
- Google Drive rclone configuration
- Vault unseal keys (if using Vault)
3. **DNS & Domain:**
- Domain: `*.ai-impress.com`
- Cloudflare API token for SSL
4. **Required Software:**
- Ubuntu 22.04 LTS (or compatible)
- Docker & Docker Compose
- Restic
- rclone (for Google Drive)
---
## Recovery Scenarios
### Scenario 1: Complete Server Loss
**Situation:** Physical server destroyed, migrating to new hardware
**Recovery Time:** 4-6 hours
**Procedure:** [Full System Restoration](#full-system-restoration)
### Scenario 2: Single Service Failure
**Situation:** One service (e.g., n8n) corrupted or lost data
**Recovery Time:** 30 minutes - 2 hours
**Procedure:** [Partial Recovery](#partial-recovery)
### Scenario 3: Database Corruption
**Situation:** PostgreSQL or MariaDB database corrupted
**Recovery Time:** 1-2 hours
**Procedure:** [Database-Only Recovery](#database-only-recovery)
---
## Full System Restoration
### PHASE 1: Prepare New Server (30-60 minutes)
#### 1.1 Install Base System
```bash
# Update system
sudo apt update && sudo apt upgrade -y
# Install required packages
sudo apt install -y \
docker.io \
docker-compose \
git \
curl \
wget \
restic \
rclone \
unzip
```
#### 1.2 Create Directory Structure
```bash
# Create main directories
sudo mkdir -p /opt /mnt /data
sudo chown -R ubuntu:ubuntu /opt /mnt /data
# Create backup directories
sudo mkdir -p /mnt/backups/local-backups
sudo mkdir -p /opt/05-backups/{scripts,logs,reports,restic}
```
#### 1.3 Setup Docker Networks
```bash
# Create external networks
docker network create traefik-public
docker network create database-internal
```
---
### PHASE 2: Restore from Cloud Backup (1-2 hours)
#### 2.1 Configure Restic
```bash
# Create Restic credentials file
cat > /opt/05-backups/restic/.env << 'EOF'
# Cloudflare R2 (HOT Storage)
export RESTIC_REPOSITORY="s3:https://6aff840a680098927b58beb93b59dd03.r2.cloudflarestorage.com/aimpress-backups"
export AWS_ACCESS_KEY_ID="YOUR_R2_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_R2_SECRET_KEY"
export RESTIC_PASSWORD="YOUR_RESTIC_PASSWORD"
# Google Drive (COLD Storage) - alternative
# export RESTIC_REPOSITORY="rclone:gdrive:ai-impress-backups"
EOF
source /opt/05-backups/restic/.env
```
#### 2.2 List Available Snapshots
```bash
# Check R2 snapshots (last 3 days)
restic -r "$RESTIC_REPOSITORY" snapshots
# Or check Google Drive (longer history)
restic -r "rclone:gdrive:ai-impress-backups" snapshots
```
#### 2.3 Restore Latest Snapshot
```bash
# Restore to /mnt/backups
cd /mnt/backups
restic -r "$RESTIC_REPOSITORY" restore latest --target /mnt/backups
# Verify restoration
ls -lah /mnt/backups/local-backups/
```
---
### PHASE 3: Restore Databases (1-2 hours)
#### 3.1 Start Database Containers
```bash
# Start PostgreSQL main
cd /opt/00-infrastructure/postgres
docker compose up -d
# Wait for healthy status
docker ps | grep postgres-main
# Start Authentik PostgreSQL
cd /opt/01-security/authentik
docker compose up -d authentik-postgres
# Start MariaDB for Mautic (if used)
cd /opt/03-business/mautic
docker compose up -d mautic-db
```
#### 3.2 Restore PostgreSQL Databases
```bash
# Find latest PostgreSQL dump
LATEST_PG_DUMP=$(ls -t /mnt/backups/local-backups/postgresql-postgres-main-*.sql.gz | head -1)
# Restore postgres-main
gunzip -c "$LATEST_PG_DUMP" | docker exec -i postgres-main psql -U aimpress_admin postgres
# Find and restore Authentik database
LATEST_AUTHENTIK_DUMP=$(ls -t /mnt/backups/local-backups/postgresql-authentik-postgres-*.sql.gz | head -1)
gunzip -c "$LATEST_AUTHENTIK_DUMP" | docker exec -i authentik-postgres psql -U authentik postgres
```
#### 3.3 Restore MariaDB Database
```bash
# Find latest MariaDB dump
LATEST_MARIADB_DUMP=$(ls -t /mnt/backups/local-backups/mariadb-mautic-db-*.sql.gz | head -1)
# Restore
gunzip -c "$LATEST_MARIADB_DUMP" | docker exec -i mautic-db mariadb
```
---
### PHASE 4: Restore Docker Volumes (1-2 hours)
#### 4.1 Extract Volume Backups
```bash
cd /mnt/backups/local-backups
# Find latest volume backups
ls -t *-volume-*.tar.gz
```
#### 4.2 Restore Critical Volumes
```bash
# Function to restore volume
restore_volume() {
local volume_name=$1
local backup_file=$2
echo "Restoring $volume_name..."
# Create volume if doesn't exist
docker volume create "$volume_name"
# Get volume mount point
local volume_path=$(docker volume inspect "$volume_name" --format '{{.Mountpoint}}')
# Extract backup to volume
sudo tar xzf "$backup_file" -C "$(dirname "$volume_path")" --strip-components=1
echo "✅ $volume_name restored"
}
# Restore Authentik volumes
restore_volume "authentik_authentik-postgres-data" "$(ls -t authentik-postgres-volume-*.tar.gz | head -1)"
restore_volume "authentik_authentik-redis-data" "$(ls -t authentik-redis-volume-*.tar.gz | head -1)"
# Restore Evolution API
restore_volume "evolution-api_evolution-data" "$(ls -t evolution-volume-*.tar.gz | head -1)"
# Restore n8n
restore_volume "n8n-shared_n8n-data" "$(ls -t n8n-volume-*.tar.gz | head -1)"
# Restore Odoo
restore_volume "odoo_odoo-data" "$(ls -t odoo-data-volume-*.tar.gz | head -1)"
restore_volume "odoo_odoo-addons" "$(ls -t odoo-addons-volume-*.tar.gz | head -1)"
# Restore Vaultwarden
restore_volume "vaultwarden_vaultwarden-data" "$(ls -t vaultwarden-volume-*.tar.gz | head -1)"
# Restore Outline & WikiJS
restore_volume "outline_outline-data" "$(ls -t outline-volume-*.tar.gz | head -1)"
restore_volume "wikijs_data" "$(ls -t wikijs-volume-*.tar.gz | head -1)"
```
---
### PHASE 5: Restore Configurations (30-60 minutes)
#### 5.1 Restore Docker Compose Files and .env
```bash
# Find latest configs backup
LATEST_CONFIGS=$(ls -t /mnt/backups/local-backups/docker-configs-*.tar.gz | head -1)
# Extract to /opt
cd /
sudo tar xzf "$LATEST_CONFIGS"
# Verify
ls -la /opt/*/docker-compose.yml
```
#### 5.2 Restore Vault Data
```bash
# Find latest Vault backup
LATEST_VAULT=$(ls -t /mnt/backups/local-backups/vault-data-*.tar.gz | head -1)
# Extract
sudo tar xzf "$LATEST_VAULT" -C /opt/00-infrastructure/vault/
# Verify
ls -la /opt/00-infrastructure/vault/data/
```
#### 5.3 Restore Application Data
```bash
# Find latest app data backup
LATEST_APP_DATA=$(ls -t /mnt/backups/local-backups/app-data-*.tar.gz | head -1)
# Extract
cd /
sudo tar xzf "$LATEST_APP_DATA"
# This restores:
# - Grafana dashboards
# - Supabase storage
# - Documenso documents
# - Evolution instances
# - Mautic data
# - And more
```
---
### PHASE 6: Start Services (1-2 hours)
#### 6.1 Start Infrastructure Services
```bash
# Start in order:
# 1. Traefik (reverse proxy)
cd /opt/00-infrastructure/traefik
docker compose up -d
# 2. PostgreSQL, Redis, RabbitMQ
cd /opt/00-infrastructure/postgres && docker compose up -d
cd /opt/00-infrastructure/redis && docker compose up -d
cd /opt/00-infrastructure/rabbitmq && docker compose up -d
# 3. Vault
cd /opt/00-infrastructure/vault && docker compose up -d
# Wait for services to be healthy
docker ps
```
#### 6.2 Start Security & Authentication
```bash
# Authentik (SSO)
cd /opt/01-security/authentik
docker compose up -d
# Vaultwarden (Password Manager)
cd /opt/01-security/vaultwarden
docker compose up -d
# Wait for Authentik to be ready
curl -I https://auth.ai-impress.com
```
#### 6.3 Start Core Services
```bash
# n8n automation
cd /opt/02-core/n8n-shared
docker compose up -d
# Evolution API (WhatsApp)
cd /opt/02-core/evolution-api
docker compose up -d
# Supabase
cd /opt/02-core/supabase/supabase/docker
docker compose up -d
# BigBlueButton (if used)
cd /opt/02-core/bigbluebutton
docker compose up -d
```
#### 6.4 Start Business Services
```bash
# Odoo ERP
cd /opt/03-business/odoo
docker compose up -d
# Outline wiki
cd /opt/03-business/outline
docker compose up -d
# Documenso (document signing)
cd /opt/03-business/documenso
docker compose up -d
# WikiJS
cd /opt/03-business/wikijs
docker compose up -d
# Mautic (if used)
cd /opt/03-business/mautic
docker compose up -d
```
#### 6.5 Start Monitoring & Tools
```bash
# Grafana
cd /opt/04-tools/monitoring/grafana
docker compose up -d
# Prometheus
cd /opt/04-tools/monitoring/prometheus
docker compose up -d
# Loki
cd /opt/04-tools/monitoring/loki
docker compose up -d
# Uptime Kuma
cd /opt/04-tools/uptime-kuma
docker compose up -d
# Portainer
cd /opt/04-tools/portainer
docker compose up -d
```
---
## Verification
### Check All Services
```bash
# View all running containers
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# Check for any failed containers
docker ps -a | grep -v "Up"
# Check logs for errors
docker compose logs --tail=50 [service-name]
```
### Test Key Services
```bash
# Test Traefik
curl -I https://traefik.ai-impress.com
# Test Authentik (SSO)
curl -I https://auth.ai-impress.com
# Test n8n
curl -I https://n8n.ai-impress.com
# Test Odoo
curl -I https://odoo.ai-impress.com
# Test Grafana
curl -I https://grafana.ai-impress.com
```
### Verify Data Integrity
**PostgreSQL:**
```bash
# Check database sizes
docker exec postgres-main psql -U aimpress_admin -c "\l+"
# Verify n8n database
docker exec postgres-main psql -U aimpress_admin n8n_shared -c "SELECT COUNT(*) FROM workflow_entity;"
# Verify Odoo database
docker exec postgres-main psql -U aimpress_admin odoo -c "SELECT COUNT(*) FROM res_users;"
```
**Authentik:**
```bash
# Check Authentik users
docker exec authentik-postgres psql -U authentik authentik -c "SELECT COUNT(*) FROM authentik_core_user;"
```
**Volumes:**
```bash
# Check volume sizes
docker volume ls -q | xargs docker volume inspect --format '{{ .Name }}: {{ .Mountpoint }}' | while read vol; do
du -sh $(echo $vol | cut -d: -f2)
done
```
---
## Partial Recovery
### Restore Single Service
#### Example: Restore n8n Only
```bash
# 1. Stop n8n
cd /opt/02-core/n8n-shared
docker compose down
# 2. Restore n8n database
LATEST_PG_DUMP=$(ls -t /mnt/backups/local-backups/postgresql-postgres-main-*.sql.gz | head -1)
gunzip -c "$LATEST_PG_DUMP" | docker exec -i postgres-main psql -U aimpress_admin -c "DROP DATABASE n8n_shared; CREATE DATABASE n8n_shared;"
gunzip -c "$LATEST_PG_DUMP" | docker exec -i postgres-main psql -U aimpress_admin n8n_shared
# 3. Restore n8n volume
docker volume rm n8n-shared_n8n-data
docker volume create n8n-shared_n8n-data
LATEST_N8N_VOL=$(ls -t /mnt/backups/local-backups/n8n-volume-*.tar.gz | head -1)
# ... extract volume ...
# 4. Restart n8n
docker compose up -d
```
---
### Database-Only Recovery
```bash
# Stop services using the database
cd /opt/02-core/n8n-shared && docker compose stop
cd /opt/03-business/odoo && docker compose stop
# Restore database
LATEST_PG_DUMP=$(ls -t /mnt/backups/local-backups/postgresql-postgres-main-*.sql.gz | head -1)
gunzip -c "$LATEST_PG_DUMP" | docker exec -i postgres-main psql -U aimpress_admin postgres
# Restart services
cd /opt/02-core/n8n-shared && docker compose start
cd /opt/03-business/odoo && docker compose start
```
---
## Troubleshooting
### Issue: Container Won't Start
**Problem:** Service fails to start after restoration
**Solution:**
```bash
# Check logs
docker compose logs [service-name]
# Check if volume exists
docker volume ls | grep [volume-name]
# Check if database exists
docker exec postgres-main psql -U aimpress_admin -l
```
### Issue: Database Connection Errors
**Problem:** Services can't connect to database
**Solution:**
```bash
# Verify database is running
docker ps | grep postgres
# Check database network
docker network inspect database-internal
# Test connection
docker exec postgres-main psql -U aimpress_admin -c "SELECT 1;"
```
### Issue: SSL Certificate Errors
**Problem:** HTTPS not working
**Solution:**
```bash
# Check Traefik logs
docker compose -f /opt/00-infrastructure/traefik/docker-compose.yml logs
# Verify acme.json exists
ls -la /opt/00-infrastructure/traefik/acme/acme.json
# If missing, Traefik will regenerate (may take 5-10 minutes)
```
### Issue: Authentik Users Missing
**Problem:** Can't log in to any service
**Solution:**
```bash
# Check Authentik PostgreSQL
docker ps | grep authentik-postgres
# Verify database restoration
docker exec authentik-postgres psql -U authentik authentik -c "SELECT email FROM authentik_core_user;"
# If empty, re-restore Authentik database
```
---
## Recovery Time Estimates
| Scenario | Minimum | Typical | Maximum |
|----------|---------|---------|---------|
| Full System | 3 hours | 4-6 hours | 8 hours |
| Single Service | 15 min | 30-60 min | 2 hours |
| Database Only | 30 min | 1 hour | 2 hours |
| Volume Only | 10 min | 20-30 min | 1 hour |
---
## Post-Recovery Checklist
- [ ] All containers running (`docker ps`)
- [ ] All services accessible via HTTPS
- [ ] Authentik SSO working (can log in)
- [ ] n8n workflows executing
- [ ] Odoo accessible with data
- [ ] Evolution API WhatsApp connected
- [ ] Grafana dashboards visible
- [ ] Vaultwarden accessible
- [ ] No errors in logs
- [ ] SSL certificates valid
- [ ] Backup script working (`/opt/05-backups/scripts/backup-full-enhanced.sh`)
---
## Support & Contact
For assistance during recovery:
- **Email:** admin@ai-impress.com
- **Backup Logs:** `/opt/05-backups/logs/`
- **Documentation:** `/opt/CLAUDE.md`
---
**Last Updated:** 2025-11-13
**Script Version:** backup-full-enhanced.sh v2.2.0