ferrero-opentext/Python-Version/MARKDOWN_DOCS/CLAUDE.md

13 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Ferrero DAM Content Scaling - Automated Python Workflow System

This is a production automation system that orchestrates asset workflows between OpenText DAM, Box, and PostgreSQL. The system processes digital assets through multiple lifecycle stages (A1→A2, A5→A6, B1→B2, A2→A3) with automated email notifications and daily reporting.

Key characteristics:

  • Production system with cron automation (every 5 minutes)
  • Dual authentication support: OAuth2 (default) and mTLS certificate
  • Python 3.6+ compatible for server deployment
  • Heavy use of external APIs (OpenText DAM, Box SDK)
  • PostgreSQL database with JSONB metadata storage

Common Commands

Development Setup

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Test connections
python scripts/test_connection.py

Running Workflows

OAuth2 (default):

python scripts/a1_to_a2_download.py
python scripts/a5_to_a6_download.py
python scripts/b1_to_b2_download.py
python scripts/a2_to_a3_upload_polling.py

mTLS authentication (requires --auth-pfx flag and whitelisted IP):

python scripts/a1_to_a2_download.py --auth-pfx
python scripts/a5_to_a6_download.py --auth-pfx
python scripts/b1_to_b2_download.py --auth-pfx

Testing

# Test all connections
python scripts/test_connection.py

# Test mTLS certificate
python scripts/test_mtls_cert.py
python scripts/test_connection.py --auth-pfx

# Generate daily report manually
python scripts/daily_report.py

Database Operations

# Connect to database
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking

# View recent assets
psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT tracking_id, original_filename, created_at FROM master_assets ORDER BY created_at DESC LIMIT 10;"

# Initialize/reset database
psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql

Monitoring

# Watch workflow logs
tail -f logs/a1_to_a2.log
tail -f logs/a5_to_a6.log

# Search for errors
grep -i error logs/*.log
grep -i "NOT APPROVED" logs/a5_to_a6.log

# View cron execution logs
tail -f logs/cron_a1_a2.log

Architecture

System Design

Multi-stage workflow orchestration:

  1. A1→A2: Downloads master assets from DAM, uploads to Box with tracking IDs
  2. A5→A6: Filters and downloads ONLY rejected assets (ECOMMERCE STATUS = "NOT APPROVED")
  3. B1→B2: Handles global master campaigns separately
  4. A2→A3: Polls Box for agency uploads, matches tracking IDs, uploads back to DAM

Core Components

scripts/shared/ - Shared library modules used by all workflows:

  • dam_client.py: OpenText DAM API client with dual authentication

    • OAuth2 token management with auto-refresh
    • mTLS certificate authentication (PFX to PEM conversion)
    • Recursive folder search for assets (critical for subfolder support)
    • Asset download and metadata extraction
    • Important: Uses different base URLs for OAuth2 vs mTLS
  • box_client.py: Box SDK wrapper

    • JWT authentication via Box-config.json (located one folder up: ../Box-config.json)
    • Creates campaign folders with tracking ID naming
    • Preserves DAM folder structure during upload
    • Three separate Box folders for different workflows (A1→A2, A2→A3, B1→B2)
  • database.py: PostgreSQL connection pool and operations

    • Generates unique 6-character tracking IDs
    • find_or_create_tracking_id(): Smart lookup by opentext_id + local_campaign_id (reuses IDs for rework workflow)
    • Stores full metadata in JSONB column (no truncation)
    • Checks campaign completion status for A1→A2 updates
  • notifier.py: Email and webhook notifications

    • 12 email templates with color-coded themes
    • Mailgun SMTP integration
    • Detailed asset listings with Box URLs
    • Rejection comment extraction (A5→A6 specific)
  • metadata_extractor_mvp.py: Field mapping and metadata transformation

    • Loads 27 MVP fields from config/field_mappings.yaml
    • Handles filename-based updates
    • Force-sets required values (e.g., STATE = "Local")

Configuration Architecture

Hierarchical config system:

  • .env: Environment variables (credentials, never committed)
  • config/config.yaml: Main configuration (references .env vars)
  • config/field_mappings.yaml: Editable field definitions (add/remove fields without code changes)
  • ../Box-config.json: Box JWT credentials (one directory up from Python-Version)

Important: Box-config.json MUST be located at ../Box-config.json (one folder up). This is hardcoded in config.yaml as rsa_private_key_path: ../Box-config.json.

Database Schema

Four main tables:

  1. master_assets: 35 columns with JSONB full_metadata
    • tracking_id (6-char unique), opentext_id, original_filename
    • Campaign relationships: global_master_campaign_id, local_campaign_id
    • JSONB storage prevents metadata truncation
  2. derivative_assets: Localized versions linked by tracking_id
  3. asset_events: Audit trail of all operations
  4. workflow_state: Tracks campaign processing state

Key indexes:

  • idx_master_assets_tracking_id for fast lookups
  • idx_master_assets_opentext_local for rework workflow matching
  • GIN indexes on JSONB fields for metadata queries

Workflow-Specific Logic

A1→A2 (Master Download):

  • Processes ONE campaign per run (cron handles frequency)
  • Recursive search in "Master Assets" folder
  • ONLY updates status A1→A2 if ALL assets succeed
  • Uses BOX_ROOT_FOLDER_A1_A2 (348304357505)

A5→A6 (Rejection/Rework):

  • Critical filter: Only processes assets where ECOMMERCE STATUS = "NOT APPROVED"
  • Extracts 10 rejection-related fields (approver/legal/IA&CC comments)
  • Reuses tracking IDs via find_or_create_tracking_id()
  • Uses separate Box folder: BOX_ROOT_FOLDER_A5_A6 (349441822875)
  • Search is in "Final Assets" folder, not "Master Assets"

B1→B2 (Global Masters):

  • Filters for campaigns where type = "Global comm" AND status = B1
  • Uses "MASTERS_" prefix for Box folder names
  • Searches "Final Assets" folder (B1 uses Final, not Master)
  • Uses BOX_ROOT_FOLDER_B1_B2 (349261192115)
  • Does NOT send webhook (only email)

A2→A3 (Upload from Box):

  • Polls BOX_ROOT_FOLDER_A2_A3 (348526703108) for new files
  • Parses tracking ID from filename (V2 format)
  • Loads master metadata from database
  • Updates Description, Language, State fields from filename
  • Deletes file from Box after successful upload
  • Updates campaign status A2→A3 when ALL assets uploaded

Authentication Modes

Two separate authentication systems:

OAuth2 (Production Default):

mTLS Certificate (Optional):

  • Base URL: DAM_MTLS_BASE_URL (different endpoint: https://dev-auth.app-api.ferrero.com/00003/mm)
  • Requires PFX certificate at DAM_MTLS_CERT_PATH
  • Runtime flag: --auth-pfx
  • Requires IP whitelisting
  • PFX converted to PEM on-the-fly using cryptography library

Development Guidelines

Adding New Workflow Scripts

New workflow scripts should follow this pattern:

  1. Import from scripts.shared modules
  2. Setup rotating log handler (10MB, 28 backups)
  3. Parse command-line args (include --auth-pfx for mTLS support)
  4. Load config via load_config('config/config.yaml')
  5. Initialize clients (DAMClient, BoxClient, Database, Notifier)
  6. Process campaigns individually (one per run for cron safety)
  7. Send email notifications for all outcomes (success/partial/error)
  8. Log extensively with campaign and asset context

Modifying Field Mappings

To add/remove fields, edit config/field_mappings.yaml:

mvp_fields:
  - FERRERO.FIELD.NEW_FIELD_NAME  # Add here
  - FERRERO.FIELD.ANOTHER_FIELD    # Add here

No code changes required - the system dynamically loads fields at runtime.

Database Queries

Common patterns used in the codebase:

# Generate unique tracking ID
tracking_id = db.generate_unique_tracking_id()

# Find or reuse existing tracking ID (for rework)
result = db.find_or_create_tracking_id(opentext_id, local_campaign_id)

# Store master asset with full metadata
db.store_master_asset(
    tracking_id=tracking_id,
    opentext_id=asset_id,
    asset_data=asset,  # Full dict stored in JSONB
    box_file_id=box_result['file_id'],
    box_url=box_result['url'],
    upload_folder_id=final_folder_id,
    global_master_campaign_id=global_ref['global_master_campaign_id'],
    local_campaign_id=campaign_number
)

# Check if all campaign assets uploaded
all_done = db.check_all_campaign_assets_uploaded(campaign_number)

Email Notifications

12 template types in notifier.py:

  • a1_to_a2_complete, a1_to_a2_partial, a1_to_a2_no_assets
  • a5_to_a6_rejections, a5_to_a6_partial, a5_to_a6_no_rejections
  • b1_to_b2_complete, b1_to_b2_partial, b1_to_b2_no_assets
  • a2_to_a3_file_uploaded, a2_to_a3_complete
  • daily_summary_report

Each template includes campaign context, asset lists, Box URLs, and next steps.

Error Handling Patterns

The codebase uses extensive try/except with logging:

try:
    # Process asset
    file_path = dam.download_asset(asset_id, output_dir)
    box_result = box.upload_with_tracking_id(file_path, ...)
    db.store_master_asset(...)
    processed_assets.append(asset_info)
except Exception as e:
    logger.error("Failed: {}".format(str(e)))
    failed_assets.append(asset_info)

Important: Workflows only update campaign status if ALL assets succeed to prevent partial state issues.

Python 3.6 Compatibility

Required for server deployment:

  • Use .format() instead of f-strings
  • Avoid walrus operator (:=)
  • No dict union operator (|)
  • Type hints are optional
  • Use os.path instead of Path where possible

Deployment Notes

Server Directory Structure

/opt/ferrero-automation/
├── Box-config.json          # Box JWT config HERE (one level up!)
└── Python-Version/
    ├── .env                 # Environment variables
    ├── config/
    │   ├── config.yaml
    │   ├── field_mappings.yaml
    │   └── certificates/
    │       └── dam-mtls-dev.pfx
    ├── database/
    │   └── init.sql
    ├── scripts/
    │   ├── *.py             # Workflow scripts
    │   └── shared/          # Shared modules
    ├── logs/                # Auto-created by scripts
    ├── temp/                # Auto-created for downloads
    └── venv/                # Virtual environment

Cron Configuration

Production cron runs workflows every 5 minutes:

*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a1_to_a2_download.py >> logs/cron_a1_a2.log 2>&1
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a5_to_a6_download.py >> logs/cron_a5_a6.log 2>&1
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/b1_to_b2_download.py >> logs/cron_b1_b2.log 2>&1
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a2_to_a3_upload_polling.py >> logs/cron_a2_a3.log 2>&1
0 19 * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/daily_report.py >> logs/daily_report.log 2>&1

Security Considerations

Files to protect (chmod 600):

  • .env (all credentials)
  • config/certificates/*.pfx (mTLS certificates)
  • ../Box-config.json (Box JWT private key)

Gitignored secrets:

  • .env
  • config/certificates/
  • *.pfx, *.p12
  • logs/, temp/

Production checklist:

  • Change default database password (ferrero_pass_2025)
  • Update SMTP credentials
  • Get production mTLS certificate (current is dev)
  • Verify all Box folder IDs are production IDs
  • Configure firewall rules for IP whitelisting (if using mTLS)

Troubleshooting

"Box-config.json not found"

Cause: Box JWT file must be one directory up from Python-Version Fix: Verify file exists at ../Box-config.json relative to Python-Version directory

"mTLS connection refused"

Cause: IP not whitelisted or wrong base URL Fix: Check DAM_MTLS_BASE_URL uses the correct mTLS endpoint and verify IP is whitelisted

"No A1 campaigns found"

Expected behavior: Scripts check every 5 minutes via cron. This message is normal when no campaigns are in A1 status.

Database connection fails

Fix: Verify PostgreSQL is running on port 5437 (not default 5432)

docker ps | grep ferrero
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;"

Emails not sending

Fix: Check SMTP configuration in .env, verify Mailgun credentials

Additional Documentation

  • README.md: Complete user documentation with deployment guide
  • DEPLOYMENT_GUIDE.md: Server deployment step-by-step
  • DATABASE_SCHEMA.md: Full schema reference with field descriptions
  • database/init.sql: Complete database schema with indexes