ferrero-opentext/Python-Version/MARKDOWN_DOCS/DEPLOYMENT_GUIDE.md

622 lines
15 KiB
Markdown

# Ferrero Asset Tracking - Complete Deployment Guide
**Last Updated:** November 5, 2025
**Version:** 1.0 - Production Ready
---
## 📦 What You're Deploying
**Complete automation system with:**
- 4 workflow automation scripts (A1→A2, A2→A3, A5→A6, B1→B2)
- PostgreSQL database with tracking system
- Dual authentication (OAuth2 + mTLS)
- Email notifications
- Daily summary reports
- Recursive folder search
- NOT APPROVED filtering (A5→A6)
---
## 🚀 Server Deployment Steps
### Step 1: Copy Files to Server
**From your local machine:**
```bash
# Copy entire Python-Version folder (excluding temp files)
rsync -av --exclude='venv' --exclude='logs' --exclude='temp' \
--exclude='__pycache__' --exclude='*.pyc' --exclude='.DS_Store' \
--exclude='asset_metadata_debug.json' --exclude='check_asset_metadata.py' \
--exclude='00:*' \
Python-Version/ user@server:/opt/ferrero-automation/
```
**Or use SCP:**
```bash
scp -r Python-Version/ user@server:/opt/ferrero-automation/
```
---
### Step 2: Setup PostgreSQL Database
**Option A: Docker (Recommended)**
```bash
# On server
cd /opt/ferrero-automation
# Create Docker compose file
cat > docker-compose.yml << 'DOCKER'
version: '3.8'
services:
postgres:
image: postgres:15
container_name: ferrero-tracking-db
restart: always
environment:
POSTGRES_DB: ferrero_tracking
POSTGRES_USER: ferrero_user
POSTGRES_PASSWORD: ferrero_pass_2025
ports:
- "5437:5432"
volumes:
- ./database/init.sql:/docker-entrypoint-initdb.d/init.sql
- ferrero-db-data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ferrero_user -d ferrero_tracking"]
interval: 10s
timeout: 5s
retries: 5
volumes:
ferrero-db-data:
driver: local
DOCKER
# Start database
docker-compose up -d
# Wait for it to be healthy
docker-compose ps
```
**Option B: Native PostgreSQL**
```bash
# Install PostgreSQL 15
sudo apt update
sudo apt install postgresql-15
# Create database and user
sudo -u postgres psql << 'SQL'
CREATE DATABASE ferrero_tracking;
CREATE USER ferrero_user WITH PASSWORD 'ferrero_pass_2025';
GRANT ALL PRIVILEGES ON DATABASE ferrero_tracking TO ferrero_user;
\c ferrero_tracking
GRANT ALL ON SCHEMA public TO ferrero_user;
SQL
# Run init script
psql -h localhost -p 5432 -U ferrero_user -d ferrero_tracking -f database/init.sql
```
---
### Step 3: Initialize Database Schema
**The init.sql file includes:**
- ✅ master_assets table (35 columns)
- ✅ derivative_assets table
- ✅ asset_events table (audit log)
- ✅ workflow_state table
- ✅ All indexes
- ✅ All triggers
- ✅ Helper functions
**Key columns added this session:**
- `full_metadata JSONB` - Complete metadata storage
- `global_master_campaign_id VARCHAR(50)` - Global campaign reference
- `global_master_folder_id VARCHAR(255)` - Global folder ID
- `local_campaign_id VARCHAR(50)` - Local campaign ID
---
### Step 4: Setup Python Environment
```bash
cd /opt/ferrero-automation
# Create virtual environment
python3 -m venv venv
# Activate
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
```
**Requirements include:**
- requests
- python-dotenv
- PyYAML
- psycopg2-binary
- boxsdk
- cryptography (for mTLS)
- Jinja2
- And more...
---
### Step 5: Configure Environment
**Update `.env` file for server:**
```bash
# Edit .env
nano .env
# Update these values:
DB_HOST=localhost # Or actual DB server IP
DB_PORT=5437 # 5432 if native PostgreSQL
DB_USER=ferrero_user
DB_PASSWORD=ferrero_pass_2025 # CHANGE IN PRODUCTION!
# Verify other settings:
# - Box credentials
# - SMTP credentials
# - DAM URLs (OAuth2 and mTLS)
```
**CRITICAL: Change passwords in production!**
---
### Step 6: Test Connections
```bash
cd /opt/ferrero-automation
source venv/bin/activate
# Test with OAuth2 (default)
python scripts/test_connection.py
# Should show:
# ✓ DAM connection OK
# ✓ Box connection OK
# ✓ Database connection OK
```
**If testing from whitelisted IP:**
```bash
# Test with mTLS
python scripts/test_mtls_cert.py
python scripts/test_connection.py --auth-pfx
```
---
### Step 7: Setup Systemd Services (Recommended)
We now use systemd services for reliable background execution.
**1. Install Service Files:**
```bash
# Copy all service files
sudo cp ferrero-orchestrator-prod.service /etc/systemd/system/
sudo cp scripts/creativex-service.service /etc/systemd/system/
sudo cp ferrero-backup-daily.service /etc/systemd/system/
sudo cp ferrero-backup-daily.timer /etc/systemd/system/
sudo cp ferrero-backup-weekly.service /etc/systemd/system/
sudo cp ferrero-backup-weekly.timer /etc/systemd/system/
```
**2. Enable and Start:**
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable Orchestrator (Main Workflows)
sudo systemctl enable --now ferrero-orchestrator-prod
# Enable CreativeX Service (Scoring)
sudo systemctl enable --now creativex-service
# Enable Backup Timers
sudo systemctl enable --now ferrero-backup-daily.timer
sudo systemctl enable --now ferrero-backup-weekly.timer
```
**3. Verify Status:**
```bash
# Check services
systemctl status ferrero-orchestrator-prod
systemctl status creativex-service
# Check timers
systemctl list-timers | grep ferrero
```
**4. View Logs:**
```bash
# Orchestrator
tail -f logs/orchestrator_prod.log
# CreativeX
tail -f logs/creativex_service.log
# Backups
sudo journalctl -u ferrero-backup-daily.service
```
---
### Step 8: Verify Workflows
**Manual test runs:**
```bash
cd /opt/ferrero-automation
source venv/bin/activate
# Test each workflow
python scripts/a1_to_a2_download.py
python scripts/a5_to_a6_download.py
python scripts/b1_to_b2_download.py
python scripts/a2_to_a3_upload_polling.py
# Generate daily report
python scripts/daily_report.py
```
**Check logs:**
```bash
tail -f logs/a1_to_a2.log
tail -f logs/a5_to_a6.log
```
---
## 🗄️ Database Schema Reference
### master_assets Table (35 columns)
**Core Fields:**
- `tracking_id` VARCHAR(6) UNIQUE - 6-character tracking ID
- `opentext_id` VARCHAR(255) - DAM asset ID
- `original_filename` VARCHAR(500)
- `file_extension` VARCHAR(20)
- `file_size_bytes` BIGINT
- `mime_type` VARCHAR(100)
**Campaign Relationships:**
- `global_master_campaign_id` VARCHAR(50) - Global campaign reference
- `global_master_folder_id` VARCHAR(255) - Global folder ID
- `local_campaign_id` VARCHAR(50) - Local campaign ID
**Metadata:**
- `full_metadata` JSONB - Complete 200KB+ metadata (no truncation!)
- `upload_directory` VARCHAR(1000)
- `description` TEXT
**Extracted Fields:**
- `brand_code`, `brand_name`
- `country_code`, `country_name`
- `language_code`, `language_name`
- `subject_title`
- `asset_type`, `asset_type_name`
- `duration_seconds`, `aspect_ratio`, `width_px`, `height_px`
- `tags` TEXT[], `categories` TEXT[]
**Status & Tracking:**
- `status` VARCHAR(50) DEFAULT 'active'
- `is_deleted` BOOLEAN DEFAULT false
- `created_at`, `updated_at`, `deleted_at` TIMESTAMP
- `ingested_by`, `ingested_at`
---
## 🔐 Security Checklist
**Before Production:**
- [ ] Change database password from `ferrero_pass_2025`
- [ ] Update SMTP password if needed
- [ ] Verify Box JWT credentials are production keys
- [ ] Verify DAM credentials (OAuth2 or mTLS)
- [ ] Set certificate file permissions: `chmod 600 config/certificates/*.pfx`
- [ ] Verify `.env` is not readable by other users: `chmod 600 .env`
- [ ] Configure firewall to allow only whitelisted IPs
- [ ] Enable database backups
- [ ] Set up log rotation
---
## 📧 Email Configuration
**Current SMTP (Mailgun):**
```
Server: smtp.mailgun.org:587
From: TWIST-UK-SERVER@oliver.agency
To: daveporter@oliver.agency
```
**Email Types:**
- Success notifications (green theme)
- Warning notifications (orange theme)
- Error notifications (red theme)
- Daily summary reports (blue dashboard)
---
## 🔄 Authentication Methods
### OAuth2 (Default - Production Ready)
**When to use:** Current production environment
**Configuration:**
```bash
DAM_BASE_URL=https://ppr.dam.ferrero.com/otmmapi
DAM_AUTH_URL=https://ppr.dam.ferrero.com/otdsws/oauth2/token
DAM_CLIENT_ID=otds-OLV
DAM_CLIENT_SECRET=hs28LZ9ZzQ5I9rlW3P7Wwyw85oOatlC1
```
**Usage:**
```bash
python scripts/a1_to_a2_download.py # No flag needed
```
### mTLS Certificate (Optional - Whitelisted IP Required)
**When to use:** Enhanced security, API gateway environments
**Configuration:**
```bash
DAM_MTLS_BASE_URL=https://dev-auth.app-api.ferrero.com/00003/mm
DAM_MTLS_CERT_PATH=config/certificates/dam-mtls-dev.pfx
DAM_MTLS_CERT_PASSWORD=fnJ8xrnh!54NE&2HR62=2P3YEy+hy9RajZ7v5&=y
```
**Usage:**
```bash
python scripts/a1_to_a2_download.py --auth-pfx # Requires --auth-pfx flag
```
**Requirements:**
- Must run from IP whitelisted in API gateway
- Different base URL than OAuth2
- Certificate file must be present
---
## 📊 Monitoring
### Log Files
**Location:** `logs/`
**Files:**
- `a1_to_a2.log` - Master assets download (A1→A2)
- `a5_to_a6.log` - Rework assets (A5→A6)
- `b1_to_b2.log` - Global masters (B1→B2)
- `a2_to_a3.log` - Upload from Box
- `cron_*.log` - Cron execution logs
- `daily_report.log` - Report generation log
**Log Rotation:**
- Max size: 10MB per file
- Backups: 28 files (~1 month)
- Automatic cleanup
### Database Queries
**Check recent activity:**
```sql
-- Recent master assets
SELECT tracking_id, original_filename, local_campaign_id, created_at
FROM master_assets
ORDER BY created_at DESC
LIMIT 10;
-- Count by status
SELECT status, COUNT(*)
FROM master_assets
GROUP BY status;
-- Recent events
SELECT * FROM asset_events
ORDER BY event_timestamp DESC
LIMIT 20;
```
---
## 🧪 Testing Checklist
**Before going live:**
- [ ] Database connection test passes
- [ ] OAuth2 authentication works
- [ ] mTLS authentication works (from whitelisted IP)
- [ ] A1→A2 workflow completes successfully
- [ ] A5→A6 workflow filters NOT APPROVED correctly
- [ ] B1→B2 workflow completes successfully
- [ ] A2→A3 upload works
- [ ] Email notifications received
- [ ] Daily report generates correctly
- [ ] Recursive folder search finds subfolders
- [ ] Folder structure preserved in Box
- [ ] Tracking IDs generated uniquely
- [ ] Database stores all data correctly
---
## 📁 Complete File Structure
```
/opt/ferrero-automation/
├── .env # Environment variables & credentials
├── config/
│ ├── config.yaml # Main configuration
│ ├── field_mappings.yaml # Field mappings
│ └── certificates/ # mTLS certificates (gitignored)
│ └── dam-mtls-dev.pfx # Certificate file
├── scripts/
│ ├── a1_to_a2_download.py # A1→A2 workflow
│ ├── a5_to_a6_download.py # A5→A6 workflow (rework)
│ ├── b1_to_b2_download.py # B1→B2 workflow (global)
│ ├── a2_to_a3_upload_polling.py # A2→A3 workflow
│ ├── creativex_service.py # CreativeX scoring service
│ ├── creativex-service.service # Systemd service file
│ ├── orchestrator-prod.py # Production orchestrator
│ ├── daily_report.py # Daily summary email
│ ├── test_connection.py # Connection tester
│ ├── test_mtls_cert.py # Certificate tester
│ └── shared/ # Shared modules
│ ├── dam_client.py # DAM API client (dual auth)
│ ├── box_client.py # Box API client
│ ├── database.py # PostgreSQL client
│ ├── notifier.py # Email & webhooks
│ └── config_loader.py # Config loader
├── database/
│ ├── init.sql # Database schema (create this)
│ ├── backup.sh # Backup script
│ └── restore.sh # Restore script
├── ferrero-orchestrator-prod.service # Orchestrator service file
├── ferrero-backup-daily.service # Backup service file
├── ferrero-backup-daily.timer # Backup timer file
├── ferrero-backup-weekly.service # Backup service file
├── ferrero-backup-weekly.timer # Backup timer file
├── venv/ # Virtual environment (recreate)
├── logs/ # Log files (auto-created)
├── temp/ # Temp downloads (auto-created)
├── requirements.txt # Python dependencies
└── MARKDOWN_DOCS/
├── DATABASE_SCHEMA.md # Schema reference
├── DEPLOYMENT_GUIDE.md # This guide
├── CREATIVEX_DEPLOYMENT.md # CreativeX specific guide
└── DATABASE_BACKUP_GUIDE.md # Backup specific guide
```
---
## 🗄️ Database Schema (init.sql)
Create file: `/opt/ferrero-automation/database/init.sql`
**See DATABASE_SCHEMA.md for complete schema!**
**Quick reference - master_assets table has:**
- 35 columns total
- JSONB for full metadata
- Campaign relationship fields
- Unique tracking_id constraint
- 6 indexes for performance
- 2 triggers (updated_at, event logging)
---
## 🔧 Troubleshooting
### Database Connection Issues
```bash
# Check database is running
docker ps | grep ferrero
# Test connection
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;"
# Check tables exist
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "\dt"
```
### OAuth2 Issues
```bash
# Test OAuth2 auth
python scripts/test_connection.py
# Check logs
tail -f logs/a1_to_a2.log
```
### mTLS Issues
```bash
# Verify certificate loads
python scripts/test_mtls_cert.py
# Check you're on whitelisted IP
curl -I https://dev-auth.app-api.ferrero.com/00003/mm/v6
# Test mTLS connection
python scripts/test_connection.py --auth-pfx
```
### Log Files Not Created
```bash
# Create log directories
mkdir -p logs/backup
chmod 755 logs
```
---
## 🎯 Production Checklist
**Before Production:**
1. Database:
- [ ] Change password from default
- [ ] Configure backups
- [ ] Set up monitoring
2. Credentials:
- [ ] Update all passwords in .env
- [ ] Get production mTLS certificate (if using)
- [ ] Verify Box credentials are production keys
3. Testing:
- [ ] Run all test scripts
- [ ] Process test campaign end-to-end
- [ ] Verify emails received
- [ ] Check database records created
4. Monitoring:
- [ ] Set up log monitoring
- [ ] Configure daily report recipients
- [ ] Set up alerts for errors
5. Security:
- [ ] File permissions (600 for .env, certificates)
- [ ] Firewall rules configured
- [ ] Only whitelisted IPs for mTLS
---
## 📞 Support
**If issues occur:**
1. Check logs in `logs/` directory
2. Run `python scripts/test_connection.py`
3. Check database with psql commands
4. Review email notifications for errors
**Key contacts:**
- DAM API support: For mTLS questions
- Database team: For DB issues
- Email: Check SMTP logs
---
## ✅ Summary
**Copy Python-Version folder → Install dependencies → Configure DB → Test → Deploy cron jobs**
**That's it! Everything is self-contained and ready to deploy.** 🚀