# Ferrero Content Scaling - Python Automation System **Complete automated workflow for Ferrero DAM Content Scaling** **Version:** 2.0 **Last Updated:** November 5, 2025 **Status:** ✅ Production Ready & Fully Tested --- ## 📋 Table of Contents 1. [Overview](#overview) 2. [What's Included](#whats-included) 3. [Quick Start](#quick-start) 4. [Server Deployment](#server-deployment) 5. [Workflows](#workflows) 6. [Authentication](#authentication) 7. [Configuration](#configuration) 8. [Monitoring](#monitoring) 9. [Troubleshooting](#troubleshooting) --- ## 🎯 Overview **Automated end-to-end workflow system that:** - Downloads master assets from OpenText DAM - Uploads to Box with tracking IDs - Processes localized assets from Box - Filters rejected assets for rework - Sends email notifications - Generates daily summary reports - Supports OAuth2 and mTLS authentication **All workflows run autonomously via cron (every 5 minutes)** --- ## 📦 What's Included ### 4 Workflow Automation Scripts 1. **A1→A2** - Master Assets Download (Local Adaptation) - Downloads master assets from DAM - Uploads to Box with tracking IDs - Stores in database with full metadata - Updates status A1 → A2 2. **A5→A6** - Rework Assets Download (Rejections) - **Filters for NOT APPROVED assets only** - Extracts rejection comments from 3 reviewers - Uploads to separate Revisions Box folder - Smart tracking ID reuse - Updates status A5 → A6 3. **B1→B2** - Global Master Assets Download - Downloads global master assets - Uses MASTERS_CampaignNumber naming - Separate Box folder for global assets - Updates status B1 → B2 4. **A2→A3** - Upload from Box (Polling) - Monitors Box folder for agency uploads - Matches tracking IDs to master assets - Uploads to DAM with inherited metadata - Updates status A2 → A3 ### Additional Scripts 5. **Daily Report** - Automated summary email (7 PM daily) - Analyzes all workflow logs (last 24 hours) - Campaign and asset statistics - Success rate calculations - Error summaries - Professional dashboard-style email 6. **Test Scripts** - `test_connection.py` - Tests DAM, Box, Database connections - `test_mtls_cert.py` - Tests mTLS certificate loading - `test_mtls_debug.py` - Detailed mTLS debugging --- ## 🚀 Quick Start ### Prerequisites - Python 3.6+ (server) or 3.10+ (local) - PostgreSQL 15+ database - Box account with JWT app - OpenText DAM API credentials - SMTP server for emails (Mailgun recommended) ### Local Setup ```bash cd Python-Version # 1. Create virtual environment python3 -m venv venv source venv/bin/activate # 2. Install dependencies pip install -r requirements.txt # 3. Configure environment cp .env.example .env # If exists, or edit .env directly nano .env # 4. Test connections python scripts/test_connection.py # Should show: # ✓ DAM connection OK # ✓ Box connection OK # ✓ Database connection OK ``` --- ## 🖥️ Server Deployment ### Step 1: Copy Files to Server **Important: Box-config.json location** The Box JWT configuration file must be **one folder up** from Python-Version: ``` /opt/ferrero-automation/ ├── Box-config.json ← Box JWT config HERE └── Python-Version/ ├── .env ├── config/ │ └── config.yaml ← Points to ../Box-config.json └── scripts/ ``` **Copy command:** ```bash # Copy both folders scp Box-config.json user@server:/opt/ferrero-automation/ scp -r Python-Version/ user@server:/opt/ferrero-automation/ ``` **Or use rsync:** ```bash # From Ferrero-Opentext folder rsync -av Box-config.json Python-Version/ user@server:/opt/ferrero-automation/ ``` ### Step 2: Setup Database **Using Docker (Recommended):** ```bash cd /opt/ferrero-automation/Python-Version docker run -d \ --name ferrero-tracking-db \ --restart always \ -e POSTGRES_DB=ferrero_tracking \ -e POSTGRES_USER=ferrero_user \ -e POSTGRES_PASSWORD=ferrero_pass_2025 \ -p 5433:5432 \ -v $(pwd)/database/init.sql:/docker-entrypoint-initdb.d/init.sql \ -v ferrero-db-data:/var/lib/postgresql/data \ postgres:15 # Wait 10 seconds for initialization sleep 10 # Verify database docker exec ferrero-tracking-db psql -U ferrero_user -d ferrero_tracking -c "\dt" ``` ### Step 3: Install Python Dependencies ```bash cd /opt/ferrero-automation/Python-Version python3 -m venv venv source venv/bin/activate pip install -r requirements.txt ``` ### Step 4: Configure for Server **Update .env:** ```bash nano .env # Update database (if different from defaults) DB_HOST=localhost DB_PORT=5433 DB_USER=ferrero_user DB_PASSWORD=ferrero_pass_2025 # CHANGE IN PRODUCTION! # Verify all other credentials are correct ``` ### Step 5: Test Everything ```bash source venv/bin/activate # Test connections python scripts/test_connection.py # Test each workflow (won't process if no campaigns) python scripts/a1_to_a2_download.py python scripts/a5_to_a6_download.py python scripts/b1_to_b2_download.py # Generate test report python scripts/daily_report.py ``` ### Step 6: Setup Cron Jobs ```bash crontab -e ``` **Add:** ```cron # Ferrero Automation (every 5 minutes) */5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a1_to_a2_download.py >> logs/cron_a1_a2.log 2>&1 */5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a5_to_a6_download.py >> logs/cron_a5_a6.log 2>&1 */5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/b1_to_b2_download.py >> logs/cron_b1_b2.log 2>&1 */5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a2_to_a3_upload_polling.py >> logs/cron_a2_a3.log 2>&1 # Daily report (7 PM) 0 19 * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/daily_report.py >> logs/daily_report.log 2>&1 ``` --- ## 🔄 Workflows ### A1→A2: Master Assets Download **Purpose:** Download master assets from DAM for local adaptation **Process:** 1. Search for campaigns with status = A1 2. Process first campaign only (cron runs every 5 min) 3. Recursively search Master Assets folder (including subfolders) 4. Download all assets from DAM 5. Upload to Box folder `348304357505` - Folder name: `C000000078-Campaign_Name` - Preserves DAM subfolder structure 6. Store in database with tracking ID 7. **ONLY update A1→A2 if ALL assets successful** 8. Send webhook + email notification **Box Folder:** 348304357505 (Local Adaptation) **Email:** a1_to_a2_complete, a1_to_a2_partial, a1_to_a2_no_assets --- ### A5→A6: Rework Assets Download (NOT APPROVED Filter) **Purpose:** Download ONLY rejected assets that need rework **Process:** 1. Search for campaigns with status = A5 2. Process first campaign only 3. Recursively search Final Assets folder 4. **Filter for ECOMMERCE STATUS = "NOT APPROVED" only** 5. Skip all approved/other status assets 6. Download NOT APPROVED assets 7. Extract rejection details: - Approver comments, certifier, date - Legal comments, certifier, date - IA&CC comments, certifier, date 8. Upload to Box folder `349441822875` (Revisions) - Folder name: `C000000078-Campaign_Name-Revisions` - Preserves DAM subfolder structure 9. Smart tracking ID lookup (reuse if asset exists from A1→A2) 10. **ONLY update A5→A6 if ALL rejected assets successful** 11. Send detailed email with rejection comments **Box Folder:** 349441822875 (Revisions) **Email:** a5_to_a6_rejections, a5_to_a6_partial, a5_to_a6_no_rejections **Special Features:** - ✅ Filters assets by ECOMMERCE STATUS field - ✅ Extracts 10 rejection-related fields - ✅ Shows detailed comments in email (color-coded by reviewer) - ✅ Reuses tracking IDs from original download --- ### B1→B2: Global Master Assets Download **Purpose:** Download global master assets for global campaigns **Process:** 1. Search for campaigns with status = B1 AND type = "Global comm" 2. Process first campaign only 3. Recursively search Final Assets folder (B1 uses Final, not Master) 4. Download all assets 5. Upload to Box folder `349261192115` - Folder name: `MASTERS_C000000068-Campaign_Name` - Preserves DAM subfolder structure 6. Store in database 7. **ONLY update B1→B2 if ALL assets successful** 8. Send email notification (no webhook) **Box Folder:** 349261192115 (Global Masters) **Email:** b1_to_b2_complete, b1_to_b2_partial, b1_to_b2_no_assets --- ### A2→A3: Upload from Box **Purpose:** Process localized assets uploaded by agency to Box **Process:** 1. Poll Box folder `348526703108` for new files 2. Match tracking ID from filename 3. Load master metadata from database 4. Build asset representation (27 MVP fields) 5. Update Description, Language, State from filename 6. Upload to DAM Final Assets folder 7. Delete file from Box 8. **Update A2→A3 when ALL campaign assets uploaded** **Box Folder:** 348526703108 (Agency Uploads) **Email:** a2_to_a3_file_uploaded, a2_to_a3_complete --- ### Daily Report **Purpose:** Automated daily summary email at 7 PM **Process:** 1. Analyzes all 4 workflow logs (last 24 hours) 2. Calculates statistics: - Campaigns found/processed/completed/partial - Assets successful/failed/skipped - NOT APPROVED count (A5→A6) - Success rate percentage - Error count 3. Per-workflow breakdown 4. Campaign details (collapsible) 5. Error logs (collapsible) 6. Professional dashboard-style email **Runs:** Daily at 7:00 PM via cron **Email:** Blue dashboard theme with statistics --- ## 🔐 Authentication ### OAuth2 (Default - Production Ready) **Current production authentication method** **Configuration:** ```bash # .env DAM_BASE_URL=https://ppr.dam.ferrero.com/otmmapi DAM_AUTH_URL=https://ppr.dam.ferrero.com/otdsws/oauth2/token DAM_CLIENT_ID=otds-OLV DAM_CLIENT_SECRET=hs28LZ9ZzQ5I9rlW3P7Wwyw85oOatlC1 ``` **Usage (default - no flag needed):** ```bash python scripts/a1_to_a2_download.py python scripts/a5_to_a6_download.py python scripts/b1_to_b2_download.py ``` **How it works:** - Requests OAuth2 token from `/otdsws/oauth2/token` - Uses client_credentials grant type - Token added to Authorization header - Token auto-refreshes (1 hour expiry) --- ### mTLS Certificate (Optional - Whitelisted IP Required) **Enhanced security with client certificates** **Configuration:** ```bash # .env DAM_MTLS_BASE_URL=https://dev-auth.app-api.ferrero.com/00003/mm DAM_MTLS_CERT_PATH=config/certificates/dam-mtls-dev.pfx DAM_MTLS_CERT_PASSWORD=fnJ8xrnh!54NE&2HR62=2P3YEy+hy9RajZ7v5&=y ``` **Usage (requires --auth-pfx flag):** ```bash python scripts/a1_to_a2_download.py --auth-pfx python scripts/a5_to_a6_download.py --auth-pfx python scripts/b1_to_b2_download.py --auth-pfx ``` **How it works:** - Uses PFX/P12 client certificate - Certificate converted to PEM temporarily - Presented during TLS handshake - Different base URL than OAuth2 - Requires IP whitelisting **Important:** - ⚠️ Must run from whitelisted IP address - ⚠️ Different API endpoint than OAuth2 - ⚠️ For production, get production certificate **Testing certificate:** ```bash python scripts/test_mtls_cert.py python scripts/test_connection.py --auth-pfx ``` --- ## ⚙️ Configuration ### Environment Variables (.env) **Required variables:** ```bash # Environment ENV=staging # or production # DAM - OAuth2 (default) DAM_BASE_URL=https://ppr.dam.ferrero.com/otmmapi DAM_AUTH_URL=https://ppr.dam.ferrero.com/otdsws/oauth2/token DAM_CLIENT_ID=otds-OLV DAM_CLIENT_SECRET=your_secret_here # DAM - mTLS (optional with --auth-pfx) DAM_MTLS_BASE_URL=https://dev-auth.app-api.ferrero.com/00003/mm DAM_MTLS_CERT_PATH=config/certificates/dam-mtls-dev.pfx DAM_MTLS_CERT_PASSWORD=your_cert_password # Box BOX_CLIENT_ID=your_box_client_id BOX_CLIENT_SECRET=your_box_secret BOX_JWT_KEY_ID=your_jwt_key BOX_PASSPHRASE=your_passphrase BOX_ENTERPRISE_ID=43984435 # Box Folders BOX_ROOT_FOLDER_A1_A2=348304357505 # Local Adaptation downloads BOX_ROOT_FOLDER_A2_A3=348526703108 # Agency uploads BOX_ROOT_FOLDER_B1_B2=349261192115 # Global Masters # Database DB_HOST=localhost DB_PORT=5433 DB_USER=ferrero_user DB_PASSWORD=ferrero_pass_2025 # Email (Mailgun SMTP) SMTP_SERVER=smtp.mailgun.org SMTP_PORT=587 SMTP_USER=your_smtp_user SMTP_PASSWORD=your_smtp_password SENDER_EMAIL=TWIST-UK-SERVER@oliver.agency ERROR_EMAIL=daveporter@oliver.agency REPORT_EMAILS=daveporter@oliver.agency # Webhook CAMPAIGN_STATUS_WEBHOOK_URL=https://hook.us1.make.celonis.com/your_webhook_id ``` ### Box JWT Configuration **File location:** `../Box-config.json` (one folder up from Python-Version) **Structure:** ``` Ferrero-Opentext/ ├── Box-config.json ← Box JWT config └── Python-Version/ └── config/ └── config.yaml ← References ../Box-config.json ``` **For server deployment**, copy both: ```bash scp Box-config.json user@server:/opt/ferrero-automation/ scp -r Python-Version/ user@server:/opt/ferrero-automation/ ``` --- ## 🗄️ Database ### Connection Details **Default (Docker):** - Host: localhost - Port: 5433 - Database: ferrero_tracking - User: ferrero_user - Password: ferrero_pass_2025 (⚠️ CHANGE IN PRODUCTION!) ### Schema **Tables:** 1. **master_assets** (35 columns) - tracking_id, opentext_id, full_metadata (JSONB) - global_master_campaign_id, local_campaign_id - All metadata fields 2. **derivative_assets** - Localized assets 3. **asset_events** - Audit log 4. **workflow_state** - Workflow tracking **Complete schema:** See `DATABASE_SCHEMA.md` or `database/init.sql` ### Setup Database ```bash # Docker method docker run -d \ --name ferrero-tracking-db \ --restart always \ -e POSTGRES_DB=ferrero_tracking \ -e POSTGRES_USER=ferrero_user \ -e POSTGRES_PASSWORD=ferrero_pass_2025 \ -p 5433:5432 \ -v $(pwd)/database/init.sql:/docker-entrypoint-initdb.d/init.sql \ -v ferrero-db-data:/var/lib/postgresql/data \ postgres:15 # Or native PostgreSQL psql -h localhost -U postgres -f database/init.sql ``` --- ## 📊 Monitoring ### Log Files All logs in `logs/` directory: - `a1_to_a2.log` - A1→A2 workflow - `a5_to_a6.log` - A5→A6 workflow (rework) - `b1_to_b2.log` - B1→B2 workflow (global) - `a2_to_a3.log` - A2→A3 workflow (upload) - `cron_*.log` - Cron execution logs - `daily_report.log` - Report generation **Log rotation:** - Max size: 10MB per file - Backups: 28 files (~1 month retention) - Auto-cleanup **View logs:** ```bash tail -f logs/a1_to_a2.log tail -f logs/a5_to_a6.log # Search for errors grep -i error logs/*.log grep -i "NOT APPROVED" logs/a5_to_a6.log ``` ### Database Queries ```sql -- Recent master assets SELECT tracking_id, original_filename, local_campaign_id, created_at FROM master_assets ORDER BY created_at DESC LIMIT 20; -- Count by campaign SELECT local_campaign_id, COUNT(*) as asset_count FROM master_assets GROUP BY local_campaign_id ORDER BY asset_count DESC; -- Check for duplicates SELECT opentext_id, local_campaign_id, COUNT(*) FROM master_assets GROUP BY opentext_id, local_campaign_id HAVING COUNT(*) > 1; -- Recent events SELECT tracking_id, event_type, event_timestamp FROM asset_events ORDER BY event_timestamp DESC LIMIT 20; ``` ### Email Notifications **All workflows send emails:** - ✅ Success (green theme) - ⚠️ Partial/failures (orange theme) - ❌ Errors (red theme) - 📊 Daily report (blue dashboard) **Templates include:** - Campaign details - Asset lists with tracking IDs - Box URLs - Rejection comments (A5→A6) - Error messages - Next steps --- ## 🔧 Troubleshooting ### Connection Test Fails ```bash # Run diagnostic python scripts/test_connection.py # Check each component python scripts/test_connection.py --auth-pfx # Test mTLS ``` **Common issues:** - ❌ DAM: Check credentials in .env - ❌ Box: Verify Box-config.json is one folder up - ❌ Database: Check PostgreSQL is running on port 5433 - ❌ mTLS: Check you're on whitelisted IP ### Box JWT File Not Found **Error:** `Failed to initialize Box client: [Errno 2] No such file or directory: '../Box-config.json'` **Solution:** ```bash # Check Box-config.json location ls -la ../Box-config.json # Should be one folder up from Python-Version # Structure: # /opt/ferrero-automation/ # ├── Box-config.json # └── Python-Version/ ``` ### Database Connection Fails ```bash # Check Docker container running docker ps | grep ferrero # Check PostgreSQL service sudo systemctl status postgresql # Test connection manually PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5433 -U ferrero_user -d ferrero_tracking -c "SELECT 1;" ``` ### No Campaigns Found **Expected behavior** - scripts check every 5 minutes: ``` No A1 campaigns found - exiting ``` This is normal! The script will check again in 5 minutes. ### Email Not Sending ```bash # Check SMTP configuration in .env grep SMTP .env # Test email manually python -c "from scripts.shared.notifier import Notifier; from shared.config_loader import load_config; n=Notifier(load_config('config/config.yaml')); print('OK')" ``` ### mTLS Certificate Issues ```bash # Test certificate loads python scripts/test_mtls_cert.py # Should show: # ✓ Certificate loaded successfully # ✓ Valid PEM format # Check permissions ls -l config/certificates/*.pfx # Should be: -rw------- (600) # Fix if needed chmod 600 config/certificates/*.pfx ``` ### Logs Not Created ```bash # Create log directories mkdir -p logs/backup chmod 755 logs ``` --- ## 📁 Complete File Structure ``` /opt/ferrero-automation/ ├── Box-config.json ← Box JWT (one folder up!) └── Python-Version/ ├── .env ← Environment variables ├── config/ │ ├── config.yaml ← Main configuration │ ├── field_mappings.yaml ← Field definitions │ └── certificates/ ← mTLS certificates (gitignored) │ └── dam-mtls-dev.pfx ├── database/ │ └── init.sql ← Database schema ├── scripts/ │ ├── a1_to_a2_download.py ← A1→A2 workflow │ ├── a5_to_a6_download.py ← A5→A6 workflow (rework) │ ├── b1_to_b2_download.py ← B1→B2 workflow (global) │ ├── a2_to_a3_upload_polling.py ← A2→A3 workflow │ ├── daily_report.py ← Daily summary │ ├── test_connection.py ← Connection tester │ ├── test_mtls_cert.py ← Certificate tester │ ├── test_mtls_debug.py ← mTLS debugger │ └── shared/ │ ├── dam_client.py ← DAM API (dual auth) │ ├── box_client.py ← Box API │ ├── database.py ← PostgreSQL │ ├── notifier.py ← Email + webhooks (12 templates) │ └── config_loader.py ← Config loader ├── logs/ ← Auto-created ├── temp/ ← Auto-created ├── venv/ ← Virtual environment ├── requirements.txt ← Python dependencies ├── DEPLOYMENT_GUIDE.md ← Full deployment guide ├── DATABASE_SCHEMA.md ← Schema documentation └── README.md ← This file ``` --- ## 🔒 Security ### Production Checklist **Before going live:** - [ ] Change database password from default - [ ] Update all .env passwords - [ ] Get production mTLS certificate (if using mTLS) - [ ] Verify Box JWT credentials are production keys - [ ] Set file permissions: ```bash chmod 600 .env chmod 600 config/certificates/*.pfx chmod 600 ../Box-config.json ``` - [ ] Configure firewall rules - [ ] Enable database backups - [ ] Set up log monitoring - [ ] Test all workflows end-to-end ### Gitignored Files (Never Committed) ``` .env # All credentials config/certificates/ # mTLS certificates *.pfx, *.p12 # Certificate files logs/ # Log files temp/ # Temporary downloads __pycache__/ # Python cache *.pyc # Compiled files ``` --- ## 🎯 Key Features ### Recursive Folder Search (All Workflows) - ✅ Searches subfolders within Master/Final Assets folders - ✅ Preserves folder structure in Box - ✅ Logs subfolder paths ### NOT APPROVED Filtering (A5→A6 Only) - ✅ Only downloads assets with `ECOMMERCE STATUS = "NOT APPROVED"` - ✅ Skips approved/other status assets - ✅ Extracts detailed rejection comments from 3 reviewers ### Smart Tracking ID Management - ✅ 6-character unique IDs - ✅ Reuses existing IDs for rework assets (A5→A6) - ✅ Links master assets to derivatives - ✅ Full audit trail in database ### Professional Email Notifications - ✅ Unified modern styling across all templates - ✅ Color-coded by severity (green/orange/red/blue) - ✅ Detailed asset lists with Box URLs - ✅ Rejection comments with reviewer names (A5→A6) - ✅ Collapsible sections in daily report --- ## 📞 Support ### Getting Help 1. **Check logs:** `tail -f logs/a1_to_a2.log` 2. **Test connections:** `python scripts/test_connection.py` 3. **Check database:** `psql -h localhost -p 5433 -U ferrero_user -d ferrero_tracking` 4. **Review config:** `cat config/config.yaml` ### Common Commands ```bash # Activate environment source venv/bin/activate # Test everything python scripts/test_connection.py # Manual workflow run python scripts/a1_to_a2_download.py # Generate report python scripts/daily_report.py # Check cron logs tail -f logs/cron_*.log # Database stats PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5433 -U ferrero_user -d ferrero_tracking -c "SELECT COUNT(*) FROM master_assets;" ``` --- ## 📝 Additional Documentation - **DEPLOYMENT_GUIDE.md** - Complete server deployment guide - **DATABASE_SCHEMA.md** - Full database schema reference - **config/field_mappings.yaml** - Editable field definitions --- ## ✅ Production Ready **Fully tested and working:** - ✅ All 4 workflows automated - ✅ Recursive folder search - ✅ NOT APPROVED filtering - ✅ Email notifications (12 templates) - ✅ Daily summary reports - ✅ Dual authentication (OAuth2 + mTLS) - ✅ Complete database schema - ✅ Deployment documentation **Compatible with:** - Python 3.6+ (server) - Python 3.10+ (local development) - PostgreSQL 15+ - Box SDK 3.x - OpenText DAM API v6 --- **Version:** 2.0 - Production Ready **Last Updated:** November 5, 2025 **Repository:** bitbucket.org:zlalani/ferrero-opentext.git 🚀 **Ready to deploy!**