ferrero-opentext/Python-Version/MARKDOWN_DOCS/DATABASE_BACKUP_GUIDE.md

19 KiB
Raw Permalink Blame History

PostgreSQL Database Backup & Restore Guide

Overview

Automated backup system for the Ferrero Asset Tracking database with dual backup strategy:

  • Daily backups (pg_dump): Logical SQL dumps, 7-day retention
  • Weekly backups (pg_basebackup): Physical binary backups, keeps latest only

Quick Start

Run Manual Backup

# Daily backup
./database/backup.sh --daily

# Weekly backup
./database/backup.sh --weekly

Check Backup Health

./database/check_backups.sh

Restore from Backup

# List available backups
./database/restore.sh --list

# Restore from specific backup
./database/restore.sh backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz

Backup Strategy

Dual Backup Approach

Type Frequency Method Retention Size Purpose
Daily Every day 2:00 AM pg_dump 7 days ~2-5 MB Quick restore, portable
Weekly Sundays 3:00 AM pg_basebackup Latest only ~30 MB Full cluster backup

Why Both Methods?

pg_dump (Daily):

  • Portable across PostgreSQL versions
  • Human-readable SQL
  • Can restore individual tables
  • Smaller file size
  • ⚠️ Slower restore for large databases

pg_basebackup (Weekly):

  • Faster restore
  • Exact binary replica
  • Point-in-time recovery capable
  • ⚠️ Version-specific
  • ⚠️ Larger file size

Directory Structure

/opt/ferrero-automation/Python-Version/
├── backups/
│   ├── dumps/                    # Daily SQL dumps (7 files)
│   │   ├── ferrero_tracking_2025-11-11_02-00.sql.gz
│   │   ├── ferrero_tracking_2025-11-10_02-00.sql.gz
│   │   └── ... (up to 7 days)
│   ├── basebackups/             # Weekly binary backups (1 file)
│   │   └── ferrero_tracking_2025-11-10_03-00.tar.gz
│   ├── safety/                  # Pre-restore safety backups
│   │   └── pre_restore_ferrero_tracking_2025-11-11_14-30.sql.gz
│   └── README.md
├── database/
│   ├── backup.sh               # Backup script
│   ├── restore.sh              # Restore script
│   ├── check_backups.sh        # Health check script
│   └── init.sql
└── logs/
    ├── backup.log              # Backup execution logs
    └── restore.log             # Restore execution logs

Automated Backups (Cron)

Production Cron Setup

Add to server crontab (crontab -e):

# Daily database backup at 2:00 AM
0 2 * * * cd /opt/ferrero-automation/Python-Version && ./database/backup.sh --daily >> logs/backup.log 2>&1

# Weekly full backup on Sundays at 3:00 AM
0 3 * * 0 cd /opt/ferrero-automation/Python-Version && ./database/backup.sh --weekly >> logs/backup.log 2>&1

# Check backup health daily at 8:00 AM (alerts if stale)
0 8 * * * cd /opt/ferrero-automation/Python-Version && ./database/check_backups.sh --quiet >> logs/backup_check.log 2>&1

Schedule Rationale

  • 2:00 AM - Off-peak hours, minimal workflow activity
  • Sundays 3:00 AM - Weekly full backup after daily completes
  • 8:00 AM health check - Alerts early in business day if backup failed

Systemd Service Setup (Alternative to Cron)

For a more robust setup, you can use the provided systemd services.

1. Install Service Files:

sudo cp ferrero-backup-daily.service /etc/systemd/system/
sudo cp ferrero-backup-daily.timer /etc/systemd/system/
sudo cp ferrero-backup-weekly.service /etc/systemd/system/
sudo cp ferrero-backup-weekly.timer /etc/systemd/system/

2. Enable Timers:

sudo systemctl daemon-reload
sudo systemctl enable --now ferrero-backup-daily.timer
sudo systemctl enable --now ferrero-backup-weekly.timer

3. Verify Timers:

systemctl list-timers | grep ferrero

Backup Scripts

backup.sh

Daily Mode:

./database/backup.sh --daily

What it does:

  1. Checks Docker container is running
  2. Runs pg_dump via docker exec
  3. Compresses output with gzip
  4. Saves to backups/dumps/ferrero_tracking_YYYY-MM-DD_HH-MM.sql.gz
  5. Deletes backups older than 7 days
  6. Logs to logs/backup.log
  7. Sends error email if backup fails

Weekly Mode:

./database/backup.sh --weekly

What it does:

  1. Checks Docker container is running
  2. Runs pg_basebackup via docker exec
  3. Creates compressed tar archive
  4. Saves to backups/basebackups/
  5. Keeps only the latest weekly backup
  6. Logs to logs/backup.log

restore.sh

List available backups:

./database/restore.sh --list

Restore from backup:

./database/restore.sh backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz

Safety features:

  1. Creates safety backup before restore
  2. Requires explicit "yes" confirmation
  3. Validates backup file exists
  4. Drops and recreates database
  5. Restores from compressed SQL dump
  6. Verifies restoration (counts rows)

check_backups.sh

Check backup health:

./database/check_backups.sh

Output includes:

  • Daily backup count and status
  • Latest backup age (warns if > 25 hours)
  • Weekly backup status
  • Total disk space used
  • List of all backups

Quiet mode (for cron):

./database/check_backups.sh --quiet

Restore Procedures

Scenario 1: Restore from Daily Backup (Most Common)

When to use: Data corruption, accidental deletion, need to rollback

# Step 1: List available backups
./database/restore.sh --list

# Output shows:
# Daily Backups (pg_dump):
#   backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz (2.1M, Nov 11 02:00)
#   backups/dumps/ferrero_tracking_2025-11-10_02-00.sql.gz (2.0M, Nov 10 02:00)
#   ...

# Step 2: Choose backup and restore
./database/restore.sh backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz

# Step 3: Confirm when prompted
# ⚠️  WARNING: This will REPLACE the current database!
# Are you sure you want to continue? (yes/no): yes

# Step 4: Wait for restore to complete
# ✓ Safety backup created
# ✓ Database recreated
# ✓ Data restored
# ✓ Verification: master_assets table has 150 rows

Scenario 2: Disaster Recovery

When to use: Complete data loss, server migration

  1. Ensure Docker is running:

    docker ps | grep ferrero-tracking-db
    
  2. If container not running:

    cd /opt/ferrero-automation/Python-Version
    docker compose up -d
    
  3. Wait for PostgreSQL to be ready:

    docker logs ferrero-tracking-db
    # Wait for "database system is ready to accept connections"
    
  4. Restore from backup:

    ./database/restore.sh backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz
    
  5. Verify data:

    PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
    SELECT
        (SELECT COUNT(*) FROM master_assets) as master_assets,
        (SELECT COUNT(*) FROM derivative_assets) as derivative_assets,
        (SELECT COUNT(*) FROM creativex_scores) as creativex_scores;
    "
    
  6. Resume workflows:

    # Test connection
    python scripts/test_connection.py
    
    # Workflows will auto-resume on next cron run
    

Monitoring

Check Backup Logs

# View recent backup activity
tail -50 logs/backup.log

# Search for errors
grep -i error logs/backup.log

# View today's backups
grep "$(date +%Y-%m-%d)" logs/backup.log

Verify Backups Were Created

# Check latest daily backup
ls -lh backups/dumps/ | tail -5

# Check latest weekly backup
ls -lh backups/basebackups/

# Run health check
./database/check_backups.sh

Systemd Logs

If using the systemd services, you can also check the system journal:

# Check daily backup service logs
sudo journalctl -u ferrero-backup-daily.service

# Check weekly backup service logs
sudo journalctl -u ferrero-backup-weekly.service

Email Notifications

Backup failures automatically send emails to ERROR_EMAIL configured in .env.

Email includes:

  • Error message
  • Timestamp
  • Server hostname

Disk Space Management

Current Usage

# Check backup directory size
du -sh backups/

# Check Docker volume size
docker system df -v | grep ferrero-db-data

Expected Growth

Estimated for 7 days retention:

  • Daily dumps: 7 files × ~2 MB = ~14 MB
  • Weekly backup: 1 file × ~30 MB = ~30 MB
  • Safety backups: ~5 MB (created only during restores)
  • Total: ~50 MB maximum

Cleanup

Automatic cleanup is built into scripts:

  • Daily backups older than 7 days are auto-deleted
  • Only latest weekly backup is kept
  • Safety backups can be manually deleted after successful restore

Manual cleanup if needed:

# Remove old safety backups (optional)
rm backups/safety/*.sql.gz

# Remove specific backup
rm backups/dumps/ferrero_tracking_2025-11-01_02-00.sql.gz

Manual Backup Commands

Create Ad-Hoc Backup Before Risky Operation

# Quick backup before making changes
TIMESTAMP=$(date +%Y-%m-%d_%H-%M)
docker exec ferrero-tracking-db pg_dump -U ferrero_user ferrero_tracking | \
  gzip > "backups/manual/before_change_${TIMESTAMP}.sql.gz"

Backup Specific Table Only

# Backup just creativex_scores table
docker exec ferrero-tracking-db pg_dump -U ferrero_user -t creativex_scores ferrero_tracking | \
  gzip > "backups/manual/creativex_scores_${TIMESTAMP}.sql.gz"

Copy Backup to Remote Location

# SCP to another server
scp backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz \
    user@backup-server:/backups/ferrero/

# Sync entire backup directory
rsync -avz backups/ user@backup-server:/backups/ferrero/

Testing Backups

CRITICAL: Always test your backups! A backup you can't restore is worthless.

Test Procedure (Monthly)

  1. Create test database:

    docker exec ferrero-tracking-db createdb -U ferrero_user ferrero_tracking_test
    
  2. Restore backup to test database:

    gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | \
      docker exec -i ferrero-tracking-db psql -U ferrero_user -d ferrero_tracking_test
    
  3. Verify data:

    docker exec ferrero-tracking-db psql -U ferrero_user -d ferrero_tracking_test -c "
    SELECT COUNT(*) FROM master_assets;
    SELECT COUNT(*) FROM creativex_scores;
    "
    
  4. Cleanup test database:

    docker exec ferrero-tracking-db dropdb -U ferrero_user ferrero_tracking_test
    

Troubleshooting

Backup Failed: "Container not running"

# Check Docker status
docker ps -a | grep ferrero-tracking-db

# Start container
docker compose up -d

# Verify database is ready
docker logs ferrero-tracking-db | tail -20

Backup Failed: "Permission denied"

# Fix script permissions
chmod +x database/backup.sh database/restore.sh database/check_backups.sh

# Fix backup directory permissions
chmod 755 backups
chmod 600 backups/dumps/*.sql.gz

Restore Failed: "Database is being accessed"

# Stop all workflows accessing database
# (or just run restore during off-hours)

# Kill active connections
docker exec ferrero-tracking-db psql -U ferrero_user -d postgres -c "
SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'ferrero_tracking'
  AND pid <> pg_backend_pid();
"

# Retry restore
./database/restore.sh <backup-file>

Backup Directory Full

# Check space
df -h /opt/ferrero-automation

# Remove old safety backups
rm backups/safety/*.sql.gz

# Reduce retention (edit backup.sh):
DAILY_RETENTION_DAYS=3  # Instead of 7

Latest Backup is Stale (> 25 hours)

# Check backup log for errors
tail -100 logs/backup.log | grep -i error

# Check cron is running
crontab -l | grep backup

# Check cron logs
grep CRON /var/log/syslog | grep backup

# Run manual backup
./database/backup.sh --daily

Backup Verification

Check Backup File Integrity

# Test gzip file is not corrupted
gunzip -t backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz
echo $?  # Should output 0 if OK

# View first 20 lines of backup (without extracting)
gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | head -20

Verify Backup Contents

# Count how many INSERT statements
gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | grep -c "INSERT"

# Check for specific table
gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | grep "CREATE TABLE creativex_scores"

Production Deployment

Initial Setup on Server

# SSH to server
ssh user@production-server

# Navigate to project
cd /opt/ferrero-automation/Python-Version

# Make scripts executable
chmod +x database/backup.sh
chmod +x database/restore.sh
chmod +x database/check_backups.sh

# Create backup directories
mkdir -p backups/dumps
mkdir -p backups/basebackups
mkdir -p backups/safety
mkdir -p backups/manual

# Set permissions
chmod 700 backups
chmod 755 database/*.sh

# Test backup manually
./database/backup.sh --daily

Add to Cron

# Edit crontab
crontab -e

# Add these lines:
0 2 * * * cd /opt/ferrero-automation/Python-Version && ./database/backup.sh --daily >> logs/backup.log 2>&1
0 3 * * 0 cd /opt/ferrero-automation/Python-Version && ./database/backup.sh --weekly >> logs/backup.log 2>&1
0 8 * * * cd /opt/ferrero-automation/Python-Version && ./database/check_backups.sh --quiet >> logs/backup_check.log 2>&1

# Save and exit

Verify Cron Setup

# List cron jobs
crontab -l | grep backup

# Check cron service is running
systemctl status cron  # or 'crond' on some systems

# Wait for next scheduled backup and check
tail -f logs/backup.log

Advanced Operations

Backup Before Major Changes

# Before schema changes
./database/backup.sh --daily

# Verify backup created
ls -lh backups/dumps/ | tail -1

# Make your changes
# ...

# If something goes wrong, restore
./database/restore.sh backups/dumps/ferrero_tracking_2025-11-11_14-30.sql.gz

Backup to Remote Server (Optional)

Add to cron for offsite backup:

# Rsync backups to remote server daily at 4:00 AM
0 4 * * * rsync -avz /opt/ferrero-automation/Python-Version/backups/ backup-server:/ferrero-backups/ >> /opt/ferrero-automation/Python-Version/logs/rsync.log 2>&1

Point-in-Time Recovery

For future enhancement, enable WAL archiving in Docker:

# docker-compose.yml
services:
  postgres:
    command: postgres -c wal_level=replica -c archive_mode=on -c archive_command='cp %p /backups/wal/%f'
    volumes:
      - ./backups/wal:/backups/wal

Maintenance

Weekly Tasks

  • Verify latest backup exists
  • Check backup file sizes are reasonable
  • Review backup logs for errors
# Quick weekly check
./database/check_backups.sh
tail -100 logs/backup.log | grep -E "ERROR|✓"

Monthly Tasks

  • Test restore process with test database
  • Verify email notifications work
  • Check disk space usage
# Monthly test restore
docker exec ferrero-tracking-db createdb -U ferrero_user test_restore
gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | \
  docker exec -i ferrero-tracking-db psql -U ferrero_user -d test_restore
docker exec ferrero-tracking-db dropdb -U ferrero_user test_restore

Security

Backup File Permissions

# Set restrictive permissions on backup directories
chmod 700 backups
chmod 700 backups/dumps
chmod 700 backups/basebackups
chmod 600 backups/dumps/*.sql.gz
chmod 600 backups/basebackups/*.tar.gz

What's in Backups?

Backup files contain:

  • All database tables (master_assets, creativex_scores, etc.)
  • All data including metadata (JSONB columns)
  • Database schema (tables, indexes, functions, triggers)
  • NO sensitive files (.env, certificates, Box-config.json)

Encryption (Future Enhancement)

For offsite backups, consider encryption:

# Encrypt backup
gpg --encrypt --recipient your-email@domain.com backup.sql.gz

# Decrypt when needed
gpg --decrypt backup.sql.gz.gpg > backup.sql.gz

Backup Failure Scenarios

What Happens if Backup Fails?

  1. Error logged to logs/backup.log
  2. Email sent to ERROR_EMAIL
  3. Cron continues - will retry next day
  4. Previous backups still available

Email Template for Failures

Recipients receive email with:

  • Error message
  • Timestamp of failure
  • Server hostname
  • Instructions to check logs

Recovery Time Objectives

Scenario Data Loss Recovery Time
Restore from daily (2 AM backup) Max 24 hours ~2-5 minutes
Restore from weekly (Sunday backup) Max 7 days ~1 minute
Manual backup before change No loss ~2-5 minutes

FAQs

How much data will I lose if I restore?

Any data created after the backup timestamp will be lost.

Example:

  • Backup at 2:00 AM on Nov 11
  • Database corrupted at 3:00 PM on Nov 11
  • Restore from 2:00 AM backup
  • Lost: 13 hours of data (assets processed between 2 AM - 3 PM)

Can I restore just one table?

Yes, with pg_dump backups:

# Extract just creativex_scores table
gunzip -c backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz | \
  grep -A 10000 "CREATE TABLE creativex_scores" | \
  docker exec -i ferrero-tracking-db psql -U ferrero_user -d ferrero_tracking

How do I backup before deploying new code?

# Quick manual backup
./database/backup.sh --daily

# Or ad-hoc backup with custom name
docker exec ferrero-tracking-db pg_dump -U ferrero_user ferrero_tracking | \
  gzip > "backups/manual/before_deployment_$(date +%Y-%m-%d).sql.gz"

What if I need backups older than 7 days?

Modify retention in backup.sh:

# Edit backup.sh
DAILY_RETENTION_DAYS=30  # Keep 30 days instead of 7

Or manually copy important backups to archive:

mkdir -p backups/archive
cp backups/dumps/ferrero_tracking_2025-11-11_02-00.sql.gz backups/archive/

Support Commands

Database Size

# Check database size
docker exec ferrero-tracking-db psql -U ferrero_user -d ferrero_tracking -c "
SELECT pg_size_pretty(pg_database_size('ferrero_tracking'));
"

Table Sizes

# Check individual table sizes
docker exec ferrero-tracking-db psql -U ferrero_user -d ferrero_tracking -c "
SELECT
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
"

Backup History

# List all backups with sizes
find backups/ -name "*.sql.gz" -o -name "*.tar.gz" | \
  xargs ls -lh | \
  awk '{print $9 " - " $5 " - " $6 " " $7 " " $8}'

Summary

Automated - Daily and weekly backups via cron Reliable - Dual strategy (logical + physical) Space-efficient - 7-day retention, ~50 MB total Safe - Safety backups before restore Monitored - Health checks and email alerts Docker-compatible - Works with containerized PostgreSQL Production-ready - Standard PostgreSQL tools

Recommendation: Set up cron jobs immediately and test restore monthly.