# CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. ## Project Overview **Ferrero DAM Content Scaling - Automated Python Workflow System** This is a production automation system that orchestrates asset workflows between OpenText DAM, Box, and PostgreSQL. The system processes digital assets through multiple lifecycle stages (A1→A2, A5→A6, B1→B2, A2→A3) with automated email notifications and daily reporting. **Key characteristics:** - Production system with cron automation (every 5 minutes) - Dual authentication support: OAuth2 (default) and mTLS certificate - Python 3.6+ compatible for server deployment - Heavy use of external APIs (OpenText DAM, Box SDK) - PostgreSQL database with JSONB metadata storage ## Common Commands ### Development Setup ```bash # Create and activate virtual environment python3 -m venv venv source venv/bin/activate # Install dependencies pip install -r requirements.txt # Test connections python scripts/test_connection.py ``` ### Running Workflows **OAuth2 (default):** ```bash python scripts/a1_to_a2_download.py python scripts/a5_to_a6_download.py python scripts/b1_to_b2_download.py python scripts/a2_to_a3_upload_polling.py ``` **mTLS authentication (requires --auth-pfx flag and whitelisted IP):** ```bash python scripts/a1_to_a2_download.py --auth-pfx python scripts/a5_to_a6_download.py --auth-pfx python scripts/b1_to_b2_download.py --auth-pfx ``` ### Testing ```bash # Test all connections python scripts/test_connection.py # Test mTLS certificate python scripts/test_mtls_cert.py python scripts/test_connection.py --auth-pfx # Generate daily report manually python scripts/daily_report.py ``` ### Database Operations ```bash # Connect to database PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking # View recent assets psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT tracking_id, original_filename, created_at FROM master_assets ORDER BY created_at DESC LIMIT 10;" # Initialize/reset database psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql ``` ### Monitoring ```bash # Watch workflow logs tail -f logs/a1_to_a2.log tail -f logs/a5_to_a6.log # Search for errors grep -i error logs/*.log grep -i "NOT APPROVED" logs/a5_to_a6.log # View cron execution logs tail -f logs/cron_a1_a2.log ``` ## Architecture ### System Design **Multi-stage workflow orchestration:** 1. **A1→A2**: Downloads master assets from DAM, uploads to Box with tracking IDs 2. **A5→A6**: Filters and downloads ONLY rejected assets (ECOMMERCE STATUS = "NOT APPROVED") 3. **B1→B2**: Handles global master campaigns separately 4. **A2→A3**: Polls Box for agency uploads, matches tracking IDs, uploads back to DAM ### Core Components **scripts/shared/** - Shared library modules used by all workflows: - **dam_client.py**: OpenText DAM API client with dual authentication - OAuth2 token management with auto-refresh - mTLS certificate authentication (PFX to PEM conversion) - Recursive folder search for assets (critical for subfolder support) - Asset download and metadata extraction - **Important**: Uses different base URLs for OAuth2 vs mTLS - **box_client.py**: Box SDK wrapper - JWT authentication via Box-config.json (located one folder up: `../Box-config.json`) - Creates campaign folders with tracking ID naming - Preserves DAM folder structure during upload - Three separate Box folders for different workflows (A1→A2, A2→A3, B1→B2) - **database.py**: PostgreSQL connection pool and operations - Generates unique 6-character tracking IDs - `find_or_create_tracking_id()`: Smart lookup by opentext_id + local_campaign_id (reuses IDs for rework workflow) - Stores full metadata in JSONB column (no truncation) - Checks campaign completion status for A1→A2 updates - **notifier.py**: Email and webhook notifications - 12 email templates with color-coded themes - Mailgun SMTP integration - Detailed asset listings with Box URLs - Rejection comment extraction (A5→A6 specific) - **metadata_extractor_mvp.py**: Field mapping and metadata transformation - Loads MVP fields from environment-specific config (`field_mappings_ppr.yaml` or `field_mappings_prod.yaml`) - **Two tracking modes:** Full inheritance (from master metadata) and folder-only (`-N` suffix, uses `config/asset_representation_template.json` as base) - Handles filename-based updates, forced values, defaults, and asset type overrides - Asset type overrides (e.g., EOL) can set/remove fields with final precedence - Force-sets required values (e.g., STATE = "Local") - Uses DomainValue format for domained fields when setting values on template fields ### Configuration Architecture **Hierarchical config system:** - `.env`: Environment variables (credentials, never committed) - `config/config.yaml`: Main configuration (references .env vars) - `config/field_mappings_ppr.yaml`: PPR field definitions (auto-loaded when DAM URL contains 'ppr') - `config/field_mappings_prod.yaml`: PROD field definitions (auto-loaded otherwise) - `config/asset_type_mappings.yaml`: 3-letter code to DAM code mappings (e.g., EOL -> externallegalopinion) - `config/asset_representation_template.json`: Reference template for folder-only mode (-N flag), contains full field metadata structure - `../Box-config.json`: Box JWT credentials (one directory up from Python-Version) **Important**: Box-config.json MUST be located at `../Box-config.json` (one folder up). This is hardcoded in config.yaml as `rsa_private_key_path: ../Box-config.json`. ### Database Schema **Four main tables:** 1. **master_assets**: 35 columns with JSONB full_metadata - tracking_id (6-char unique), opentext_id, original_filename - Campaign relationships: global_master_campaign_id, local_campaign_id - JSONB storage prevents metadata truncation 2. **derivative_assets**: Localized versions linked by tracking_id 3. **asset_events**: Audit trail of all operations 4. **workflow_state**: Tracks campaign processing state **Key indexes:** - `idx_master_assets_tracking_id` for fast lookups - `idx_master_assets_opentext_local` for rework workflow matching - GIN indexes on JSONB fields for metadata queries ### Workflow-Specific Logic **A1→A2 (Master Download):** - Processes ONE campaign per run (cron handles frequency) - Recursive search in "Master Assets" folder - ONLY updates status A1→A2 if ALL assets succeed - Uses `BOX_ROOT_FOLDER_A1_A2` (348304357505) **A5→A6 (Rejection/Rework):** - **Critical filter**: Only processes assets where `ECOMMERCE STATUS = "NOT APPROVED"` - Extracts 10 rejection-related fields (approver/legal/IA&CC comments) - Reuses tracking IDs via `find_or_create_tracking_id()` - Uses separate Box folder: `BOX_ROOT_FOLDER_A5_A6` (349441822875) - Search is in "Final Assets" folder, not "Master Assets" **B1→B2 (Global Masters):** - Filters for campaigns where type = "Global comm" AND status = B1 - Uses "MASTERS_" prefix for Box folder names - Searches "Final Assets" folder (B1 uses Final, not Master) - Uses `BOX_ROOT_FOLDER_B1_B2` (349261192115) - Does NOT send webhook (only email) **A2→A3 (Upload from Box):** - Polls `BOX_ROOT_FOLDER_A2_A3` (348526703108) for new files - Parses tracking ID from filename (V2 format) - Two tracking modes: - **Full inheritance**: Loads master metadata from database, inherits all fields - **Folder-only** (`-N` suffix): Uses `config/asset_representation_template.json` as base, populates from filename - Updates Description, Language, State fields from filename - Applies asset type overrides (e.g., EOL sets Agency="-", Languages="Global", IPRights="Yes", removes validity dates) - Deletes file from Box after successful upload - Updates campaign status A2→A3 when ALL assets uploaded ### Authentication Modes **Two separate authentication systems:** **OAuth2 (Production Default):** - Base URL: `DAM_BASE_URL` (https://ppr.dam.ferrero.com/otmmapi) - Token endpoint: `DAM_AUTH_URL` - Auto-refreshing access token (1 hour expiry) - No special flags needed **mTLS Certificate (Optional):** - Base URL: `DAM_MTLS_BASE_URL` (different endpoint: https://dev-auth.app-api.ferrero.com/00003/mm) - Requires PFX certificate at `DAM_MTLS_CERT_PATH` - Runtime flag: `--auth-pfx` - Requires IP whitelisting - PFX converted to PEM on-the-fly using cryptography library ## Development Guidelines ### Adding New Workflow Scripts New workflow scripts should follow this pattern: 1. Import from `scripts.shared` modules 2. Setup rotating log handler (10MB, 28 backups) 3. Parse command-line args (include `--auth-pfx` for mTLS support) 4. Load config via `load_config('config/config.yaml')` 5. Initialize clients (DAMClient, BoxClient, Database, Notifier) 6. Process campaigns individually (one per run for cron safety) 7. Send email notifications for all outcomes (success/partial/error) 8. Log extensively with campaign and asset context ### Modifying Field Mappings **To add/remove fields, edit the environment-specific file:** - PPR: `config/field_mappings_ppr.yaml` - PROD: `config/field_mappings_prod.yaml` ```yaml mvp_fields: - FERRERO.FIELD.NEW_FIELD_NAME # Add here - FERRERO.FIELD.ANOTHER_FIELD # Add here ``` **No code changes required** - the system dynamically loads fields at runtime. ### Asset Type Overrides To add field overrides for a specific asset type, add an `asset_type_overrides` section to the field mappings file: ```yaml asset_type_overrides: EOL: # Keyed by 3-letter asset type code FERRERO.MARKETING.FIELD.AGENCY NAME: "-" MAIN_LANGUAGES: "Global" FERRERO.FIELD.ASSET VALIDITY START PERIOD: "" # Empty string removes the field ``` Overrides run after all other field processing (forced values, defaults) and take final precedence. An empty string value removes the field entirely from the payload. ### Database Queries Common patterns used in the codebase: ```python # Generate unique tracking ID tracking_id = db.generate_unique_tracking_id() # Find or reuse existing tracking ID (for rework) result = db.find_or_create_tracking_id(opentext_id, local_campaign_id) # Store master asset with full metadata db.store_master_asset( tracking_id=tracking_id, opentext_id=asset_id, asset_data=asset, # Full dict stored in JSONB box_file_id=box_result['file_id'], box_url=box_result['url'], upload_folder_id=final_folder_id, global_master_campaign_id=global_ref['global_master_campaign_id'], local_campaign_id=campaign_number ) # Check if all campaign assets uploaded all_done = db.check_all_campaign_assets_uploaded(campaign_number) ``` ### Email Notifications **12 template types in notifier.py:** - a1_to_a2_complete, a1_to_a2_partial, a1_to_a2_no_assets - a5_to_a6_rejections, a5_to_a6_partial, a5_to_a6_no_rejections - b1_to_b2_complete, b1_to_b2_partial, b1_to_b2_no_assets - a2_to_a3_file_uploaded, a2_to_a3_complete - daily_summary_report Each template includes campaign context, asset lists, Box URLs, and next steps. ### Error Handling Patterns The codebase uses extensive try/except with logging: ```python try: # Process asset file_path = dam.download_asset(asset_id, output_dir) box_result = box.upload_with_tracking_id(file_path, ...) db.store_master_asset(...) processed_assets.append(asset_info) except Exception as e: logger.error("Failed: {}".format(str(e))) failed_assets.append(asset_info) ``` **Important**: Workflows only update campaign status if ALL assets succeed to prevent partial state issues. ### Python 3.6 Compatibility **Required for server deployment:** - Use `.format()` instead of f-strings - Avoid walrus operator (`:=`) - No dict union operator (`|`) - Type hints are optional - Use `os.path` instead of `Path` where possible ## Deployment Notes ### Server Directory Structure ``` /opt/ferrero-automation/ ├── Box-config.json # Box JWT config HERE (one level up!) └── Python-Version/ ├── .env # Environment variables ├── config/ │ ├── config.yaml │ ├── field_mappings_ppr.yaml │ ├── field_mappings_prod.yaml │ ├── asset_type_mappings.yaml │ ├── asset_representation_template.json │ └── certificates/ │ └── dam-mtls-dev.pfx ├── database/ │ └── init.sql ├── scripts/ │ ├── *.py # Workflow scripts │ └── shared/ # Shared modules ├── logs/ # Auto-created by scripts ├── temp/ # Auto-created for downloads └── venv/ # Virtual environment ``` ### Cron Configuration Production cron runs workflows every 5 minutes: ```cron */5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a1_to_a2_download.py >> logs/cron_a1_a2.log 2>&1 */5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a5_to_a6_download.py >> logs/cron_a5_a6.log 2>&1 */5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/b1_to_b2_download.py >> logs/cron_b1_b2.log 2>&1 */5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a2_to_a3_upload_polling.py >> logs/cron_a2_a3.log 2>&1 0 19 * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/daily_report.py >> logs/daily_report.log 2>&1 ``` ### Security Considerations **Files to protect (chmod 600):** - `.env` (all credentials) - `config/certificates/*.pfx` (mTLS certificates) - `../Box-config.json` (Box JWT private key) **Gitignored secrets:** - `.env` - `config/certificates/` - `*.pfx`, `*.p12` - `logs/`, `temp/` **Production checklist:** - Change default database password (ferrero_pass_2025) - Update SMTP credentials - Get production mTLS certificate (current is dev) - Verify all Box folder IDs are production IDs - Configure firewall rules for IP whitelisting (if using mTLS) ## Troubleshooting ### "Box-config.json not found" **Cause**: Box JWT file must be one directory up from Python-Version **Fix**: Verify file exists at `../Box-config.json` relative to Python-Version directory ### "mTLS connection refused" **Cause**: IP not whitelisted or wrong base URL **Fix**: Check `DAM_MTLS_BASE_URL` uses the correct mTLS endpoint and verify IP is whitelisted ### "No A1 campaigns found" **Expected behavior**: Scripts check every 5 minutes via cron. This message is normal when no campaigns are in A1 status. ### Database connection fails **Fix**: Verify PostgreSQL is running on port 5437 (not default 5432) ```bash docker ps | grep ferrero PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;" ``` ### Emails not sending **Fix**: Check SMTP configuration in .env, verify Mailgun credentials ## Additional Documentation - **README.md**: Complete user documentation with deployment guide - **DEPLOYMENT_GUIDE.md**: Server deployment step-by-step - **DATABASE_SCHEMA.md**: Full schema reference with field descriptions - **database/init.sql**: Complete database schema with indexes