ferrero-opentext/Python-Version/MARKDOWN_DOCS/DATABASE_BACKUP_GUIDE.md

792 lines
19 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# PostgreSQL Database Backup & Restore Guide
## Overview
Automated backup system for the Ferrero Asset Tracking database with dual backup strategy:
- **Daily backups (pg_dump):** Logical SQL dumps, 7-day retention
- **Weekly backups (pg_basebackup):** Physical binary backups, keeps latest only
## Quick Start
### Run Manual Backup
```bash
# Daily backup
./database/backup.sh --daily
# Weekly backup
./database/backup.sh --weekly
```
### Check Backup Health
```bash
./database/check_backups.sh
```
### Restore from Backup
```bash
# List available backups
./database/restore.sh --list
# Restore from specific backup
./database/restore.sh backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz
```
## Backup Strategy
### Dual Backup Approach
| Type | Frequency | Method | Retention | Size | Purpose |
|------|-----------|--------|-----------|------|---------|
| **Daily** | Every day 2:00 AM | pg_dump | 7 days | ~2-5 MB | Quick restore, portable |
| **Weekly** | Sundays 3:00 AM | pg_basebackup | Latest only | ~30 MB | Full cluster backup |
### Why Both Methods?
**pg_dump (Daily):**
- ✅ Portable across PostgreSQL versions
- ✅ Human-readable SQL
- ✅ Can restore individual tables
- ✅ Smaller file size
- ⚠️ Slower restore for large databases
**pg_basebackup (Weekly):**
- ✅ Faster restore
- ✅ Exact binary replica
- ✅ Point-in-time recovery capable
- ⚠️ Version-specific
- ⚠️ Larger file size
## Directory Structure
```
/opt/ferrero-automation/Python-Version/
├── backups/
│ ├── dumps/ # Daily SQL dumps (7 files)
│ │ ├── ferrero_tracking_2025-11-11_02-00.sql.gz
│ │ ├── ferrero_tracking_2025-11-10_02-00.sql.gz
│ │ └── ... (up to 7 days)
│ ├── basebackups/ # Weekly binary backups (1 file)
│ │ └── ferrero_tracking_2025-11-10_03-00.tar.gz
│ ├── safety/ # Pre-restore safety backups
│ │ └── pre_restore_ferrero_tracking_2025-11-11_14-30.sql.gz
│ └── README.md
├── database/
│ ├── backup.sh # Backup script
│ ├── restore.sh # Restore script
│ ├── check_backups.sh # Health check script
│ └── init.sql
└── logs/
├── backup.log # Backup execution logs
└── restore.log # Restore execution logs
```
## Automated Backups (Cron)
### Production Cron Setup
Add to server crontab (`crontab -e`):
```cron
# Daily database backup at 2:00 AM
0 2 * * * cd /opt/ferrero-automation/Python-Version && ./database/backup.sh --daily >> logs/backup.log 2>&1
# Weekly full backup on Sundays at 3:00 AM
0 3 * * 0 cd /opt/ferrero-automation/Python-Version && ./database/backup.sh --weekly >> logs/backup.log 2>&1
# Check backup health daily at 8:00 AM (alerts if stale)
0 8 * * * cd /opt/ferrero-automation/Python-Version && ./database/check_backups.sh --quiet >> logs/backup_check.log 2>&1
```
### Schedule Rationale
- **2:00 AM** - Off-peak hours, minimal workflow activity
- **Sundays 3:00 AM** - Weekly full backup after daily completes
- **8:00 AM health check** - Alerts early in business day if backup failed
### Systemd Service Setup (Alternative to Cron)
For a more robust setup, you can use the provided systemd services.
**1. Install Service Files:**
```bash
sudo cp ferrero-backup-daily.service /etc/systemd/system/
sudo cp ferrero-backup-daily.timer /etc/systemd/system/
sudo cp ferrero-backup-weekly.service /etc/systemd/system/
sudo cp ferrero-backup-weekly.timer /etc/systemd/system/
```
**2. Enable Timers:**
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now ferrero-backup-daily.timer
sudo systemctl enable --now ferrero-backup-weekly.timer
```
**3. Verify Timers:**
```bash
systemctl list-timers | grep ferrero
```
## Backup Scripts
### backup.sh
**Daily Mode:**
```bash
./database/backup.sh --daily
```
**What it does:**
1. Checks Docker container is running
2. Runs `pg_dump` via `docker exec`
3. Compresses output with gzip
4. Saves to `backups/dumps/ferrero_tracking_YYYY-MM-DD_HH-MM.sql.gz`
5. Deletes backups older than 7 days
6. Logs to `logs/backup.log`
7. Sends error email if backup fails
**Weekly Mode:**
```bash
./database/backup.sh --weekly
```
**What it does:**
1. Checks Docker container is running
2. Runs `pg_basebackup` via `docker exec`
3. Creates compressed tar archive
4. Saves to `backups/basebackups/`
5. Keeps only the latest weekly backup
6. Logs to `logs/backup.log`
### restore.sh
**List available backups:**
```bash
./database/restore.sh --list
```
**Restore from backup:**
```bash
./database/restore.sh backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz
```
**Safety features:**
1. Creates safety backup before restore
2. Requires explicit "yes" confirmation
3. Validates backup file exists
4. Drops and recreates database
5. Restores from compressed SQL dump
6. Verifies restoration (counts rows)
### check_backups.sh
**Check backup health:**
```bash
./database/check_backups.sh
```
**Output includes:**
- Daily backup count and status
- Latest backup age (warns if > 25 hours)
- Weekly backup status
- Total disk space used
- List of all backups
**Quiet mode (for cron):**
```bash
./database/check_backups.sh --quiet
```
## Restore Procedures
### Scenario 1: Restore from Daily Backup (Most Common)
**When to use:** Data corruption, accidental deletion, need to rollback
```bash
# Step 1: List available backups
./database/restore.sh --list
# Output shows:
# Daily Backups (pg_dump):
# backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz (2.1M, Nov 11 02:00)
# backups/dumps/ferrero_tracking_2025-11-10_02-00.sql.gz (2.0M, Nov 10 02:00)
# ...
# Step 2: Choose backup and restore
./database/restore.sh backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz
# Step 3: Confirm when prompted
# ⚠️ WARNING: This will REPLACE the current database!
# Are you sure you want to continue? (yes/no): yes
# Step 4: Wait for restore to complete
# ✓ Safety backup created
# ✓ Database recreated
# ✓ Data restored
# ✓ Verification: master_assets table has 150 rows
```
### Scenario 2: Disaster Recovery
**When to use:** Complete data loss, server migration
1. **Ensure Docker is running:**
```bash
docker ps | grep ferrero-tracking-db
```
2. **If container not running:**
```bash
cd /opt/ferrero-automation/Python-Version
docker compose up -d
```
3. **Wait for PostgreSQL to be ready:**
```bash
docker logs ferrero-tracking-db
# Wait for "database system is ready to accept connections"
```
4. **Restore from backup:**
```bash
./database/restore.sh backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz
```
5. **Verify data:**
```bash
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT
(SELECT COUNT(*) FROM master_assets) as master_assets,
(SELECT COUNT(*) FROM derivative_assets) as derivative_assets,
(SELECT COUNT(*) FROM creativex_scores) as creativex_scores;
"
```
6. **Resume workflows:**
```bash
# Test connection
python scripts/test_connection.py
# Workflows will auto-resume on next cron run
```
## Monitoring
### Check Backup Logs
```bash
# View recent backup activity
tail -50 logs/backup.log
# Search for errors
grep -i error logs/backup.log
# View today's backups
grep "$(date +%Y-%m-%d)" logs/backup.log
```
### Verify Backups Were Created
```bash
# Check latest daily backup
ls -lh backups/dumps/ | tail -5
# Check latest weekly backup
ls -lh backups/basebackups/
# Run health check
./database/check_backups.sh
```
### Systemd Logs
If using the systemd services, you can also check the system journal:
```bash
# Check daily backup service logs
sudo journalctl -u ferrero-backup-daily.service
# Check weekly backup service logs
sudo journalctl -u ferrero-backup-weekly.service
```
### Email Notifications
Backup failures automatically send emails to `ERROR_EMAIL` configured in `.env`.
**Email includes:**
- Error message
- Timestamp
- Server hostname
## Disk Space Management
### Current Usage
```bash
# Check backup directory size
du -sh backups/
# Check Docker volume size
docker system df -v | grep ferrero-db-data
```
### Expected Growth
**Estimated for 7 days retention:**
- Daily dumps: 7 files × ~2 MB = ~14 MB
- Weekly backup: 1 file × ~30 MB = ~30 MB
- Safety backups: ~5 MB (created only during restores)
- **Total: ~50 MB maximum**
### Cleanup
Automatic cleanup is built into scripts:
- Daily backups older than 7 days are auto-deleted
- Only latest weekly backup is kept
- Safety backups can be manually deleted after successful restore
**Manual cleanup if needed:**
```bash
# Remove old safety backups (optional)
rm backups/safety/*.sql.gz
# Remove specific backup
rm backups/dumps/ferrero_tracking_2025-11-01_02-00.sql.gz
```
## Manual Backup Commands
### Create Ad-Hoc Backup Before Risky Operation
```bash
# Quick backup before making changes
TIMESTAMP=$(date +%Y-%m-%d_%H-%M)
docker exec ferrero-tracking-db pg_dump -U ferrero_user ferrero_tracking | \
gzip > "backups/manual/before_change_${TIMESTAMP}.sql.gz"
```
### Backup Specific Table Only
```bash
# Backup just creativex_scores table
docker exec ferrero-tracking-db pg_dump -U ferrero_user -t creativex_scores ferrero_tracking | \
gzip > "backups/manual/creativex_scores_${TIMESTAMP}.sql.gz"
```
### Copy Backup to Remote Location
```bash
# SCP to another server
scp backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz \
user@backup-server:/backups/ferrero/
# Sync entire backup directory
rsync -avz backups/ user@backup-server:/backups/ferrero/
```
## Testing Backups
**CRITICAL:** Always test your backups! A backup you can't restore is worthless.
### Test Procedure (Monthly)
1. **Create test database:**
```bash
docker exec ferrero-tracking-db createdb -U ferrero_user ferrero_tracking_test
```
2. **Restore backup to test database:**
```bash
gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | \
docker exec -i ferrero-tracking-db psql -U ferrero_user -d ferrero_tracking_test
```
3. **Verify data:**
```bash
docker exec ferrero-tracking-db psql -U ferrero_user -d ferrero_tracking_test -c "
SELECT COUNT(*) FROM master_assets;
SELECT COUNT(*) FROM creativex_scores;
"
```
4. **Cleanup test database:**
```bash
docker exec ferrero-tracking-db dropdb -U ferrero_user ferrero_tracking_test
```
## Troubleshooting
### Backup Failed: "Container not running"
```bash
# Check Docker status
docker ps -a | grep ferrero-tracking-db
# Start container
docker compose up -d
# Verify database is ready
docker logs ferrero-tracking-db | tail -20
```
### Backup Failed: "Permission denied"
```bash
# Fix script permissions
chmod +x database/backup.sh database/restore.sh database/check_backups.sh
# Fix backup directory permissions
chmod 755 backups
chmod 600 backups/dumps/*.sql.gz
```
### Restore Failed: "Database is being accessed"
```bash
# Stop all workflows accessing database
# (or just run restore during off-hours)
# Kill active connections
docker exec ferrero-tracking-db psql -U ferrero_user -d postgres -c "
SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'ferrero_tracking'
AND pid <> pg_backend_pid();
"
# Retry restore
./database/restore.sh <backup-file>
```
### Backup Directory Full
```bash
# Check space
df -h /opt/ferrero-automation
# Remove old safety backups
rm backups/safety/*.sql.gz
# Reduce retention (edit backup.sh):
DAILY_RETENTION_DAYS=3 # Instead of 7
```
### Latest Backup is Stale (> 25 hours)
```bash
# Check backup log for errors
tail -100 logs/backup.log | grep -i error
# Check cron is running
crontab -l | grep backup
# Check cron logs
grep CRON /var/log/syslog | grep backup
# Run manual backup
./database/backup.sh --daily
```
## Backup Verification
### Check Backup File Integrity
```bash
# Test gzip file is not corrupted
gunzip -t backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz
echo $? # Should output 0 if OK
# View first 20 lines of backup (without extracting)
gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | head -20
```
### Verify Backup Contents
```bash
# Count how many INSERT statements
gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | grep -c "INSERT"
# Check for specific table
gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | grep "CREATE TABLE creativex_scores"
```
## Production Deployment
### Initial Setup on Server
```bash
# SSH to server
ssh user@production-server
# Navigate to project
cd /opt/ferrero-automation/Python-Version
# Make scripts executable
chmod +x database/backup.sh
chmod +x database/restore.sh
chmod +x database/check_backups.sh
# Create backup directories
mkdir -p backups/dumps
mkdir -p backups/basebackups
mkdir -p backups/safety
mkdir -p backups/manual
# Set permissions
chmod 700 backups
chmod 755 database/*.sh
# Test backup manually
./database/backup.sh --daily
```
### Add to Cron
```bash
# Edit crontab
crontab -e
# Add these lines:
0 2 * * * cd /opt/ferrero-automation/Python-Version && ./database/backup.sh --daily >> logs/backup.log 2>&1
0 3 * * 0 cd /opt/ferrero-automation/Python-Version && ./database/backup.sh --weekly >> logs/backup.log 2>&1
0 8 * * * cd /opt/ferrero-automation/Python-Version && ./database/check_backups.sh --quiet >> logs/backup_check.log 2>&1
# Save and exit
```
### Verify Cron Setup
```bash
# List cron jobs
crontab -l | grep backup
# Check cron service is running
systemctl status cron # or 'crond' on some systems
# Wait for next scheduled backup and check
tail -f logs/backup.log
```
## Advanced Operations
### Backup Before Major Changes
```bash
# Before schema changes
./database/backup.sh --daily
# Verify backup created
ls -lh backups/dumps/ | tail -1
# Make your changes
# ...
# If something goes wrong, restore
./database/restore.sh backups/dumps/ferrero_tracking_2025-11-11_14-30.sql.gz
```
### Backup to Remote Server (Optional)
Add to cron for offsite backup:
```bash
# Rsync backups to remote server daily at 4:00 AM
0 4 * * * rsync -avz /opt/ferrero-automation/Python-Version/backups/ backup-server:/ferrero-backups/ >> /opt/ferrero-automation/Python-Version/logs/rsync.log 2>&1
```
### Point-in-Time Recovery
For future enhancement, enable WAL archiving in Docker:
```yaml
# docker-compose.yml
services:
postgres:
command: postgres -c wal_level=replica -c archive_mode=on -c archive_command='cp %p /backups/wal/%f'
volumes:
- ./backups/wal:/backups/wal
```
## Maintenance
### Weekly Tasks
- ✅ Verify latest backup exists
- ✅ Check backup file sizes are reasonable
- ✅ Review backup logs for errors
```bash
# Quick weekly check
./database/check_backups.sh
tail -100 logs/backup.log | grep -E "ERROR|✓"
```
### Monthly Tasks
- ✅ Test restore process with test database
- ✅ Verify email notifications work
- ✅ Check disk space usage
```bash
# Monthly test restore
docker exec ferrero-tracking-db createdb -U ferrero_user test_restore
gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | \
docker exec -i ferrero-tracking-db psql -U ferrero_user -d test_restore
docker exec ferrero-tracking-db dropdb -U ferrero_user test_restore
```
## Security
### Backup File Permissions
```bash
# Set restrictive permissions on backup directories
chmod 700 backups
chmod 700 backups/dumps
chmod 700 backups/basebackups
chmod 600 backups/dumps/*.sql.gz
chmod 600 backups/basebackups/*.tar.gz
```
### What's in Backups?
Backup files contain:
- All database tables (master_assets, creativex_scores, etc.)
- All data including metadata (JSONB columns)
- Database schema (tables, indexes, functions, triggers)
- **NO sensitive files** (.env, certificates, Box-config.json)
### Encryption (Future Enhancement)
For offsite backups, consider encryption:
```bash
# Encrypt backup
gpg --encrypt --recipient your-email@domain.com backup.sql.gz
# Decrypt when needed
gpg --decrypt backup.sql.gz.gpg > backup.sql.gz
```
## Backup Failure Scenarios
### What Happens if Backup Fails?
1. **Error logged** to `logs/backup.log`
2. **Email sent** to `ERROR_EMAIL`
3. **Cron continues** - will retry next day
4. **Previous backups** still available
### Email Template for Failures
Recipients receive email with:
- Error message
- Timestamp of failure
- Server hostname
- Instructions to check logs
## Recovery Time Objectives
| Scenario | Data Loss | Recovery Time |
|----------|-----------|---------------|
| Restore from daily (2 AM backup) | Max 24 hours | ~2-5 minutes |
| Restore from weekly (Sunday backup) | Max 7 days | ~1 minute |
| Manual backup before change | No loss | ~2-5 minutes |
## FAQs
### How much data will I lose if I restore?
Any data created **after** the backup timestamp will be lost.
Example:
- Backup at 2:00 AM on Nov 11
- Database corrupted at 3:00 PM on Nov 11
- Restore from 2:00 AM backup
- **Lost:** 13 hours of data (assets processed between 2 AM - 3 PM)
### Can I restore just one table?
Yes, with pg_dump backups:
```bash
# Extract just creativex_scores table
gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | \
grep -A 10000 "CREATE TABLE creativex_scores" | \
docker exec -i ferrero-tracking-db psql -U ferrero_user -d ferrero_tracking
```
### How do I backup before deploying new code?
```bash
# Quick manual backup
./database/backup.sh --daily
# Or ad-hoc backup with custom name
docker exec ferrero-tracking-db pg_dump -U ferrero_user ferrero_tracking | \
gzip > "backups/manual/before_deployment_$(date +%Y-%m-%d).sql.gz"
```
### What if I need backups older than 7 days?
Modify retention in `backup.sh`:
```bash
# Edit backup.sh
DAILY_RETENTION_DAYS=30 # Keep 30 days instead of 7
```
Or manually copy important backups to archive:
```bash
mkdir -p backups/archive
cp backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz backups/archive/
```
## Support Commands
### Database Size
```bash
# Check database size
docker exec ferrero-tracking-db psql -U ferrero_user -d ferrero_tracking -c "
SELECT pg_size_pretty(pg_database_size('ferrero_tracking'));
"
```
### Table Sizes
```bash
# Check individual table sizes
docker exec ferrero-tracking-db psql -U ferrero_user -d ferrero_tracking -c "
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
"
```
### Backup History
```bash
# List all backups with sizes
find backups/ -name "*.sql.gz" -o -name "*.tar.gz" | \
xargs ls -lh | \
awk '{print $9 " - " $5 " - " $6 " " $7 " " $8}'
```
## Summary
**Automated** - Daily and weekly backups via cron
**Reliable** - Dual strategy (logical + physical)
**Space-efficient** - 7-day retention, ~50 MB total
**Safe** - Safety backups before restore
**Monitored** - Health checks and email alerts
**Docker-compatible** - Works with containerized PostgreSQL
**Production-ready** - Standard PostgreSQL tools
**Recommendation:** Set up cron jobs immediately and test restore monthly.