CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
Ferrero DAM Content Scaling - Automated Python Workflow System
This is a production automation system that orchestrates asset workflows between OpenText DAM, Box, and PostgreSQL. The system processes digital assets through multiple lifecycle stages (A1→A2, A5→A6, B1→B2, A2→A3) with automated email notifications and daily reporting.
Key characteristics:
- Production system with cron automation (every 5 minutes)
- Dual authentication support: OAuth2 (default) and mTLS certificate
- Python 3.6+ compatible for server deployment
- Heavy use of external APIs (OpenText DAM, Box SDK)
- PostgreSQL database with JSONB metadata storage
Common Commands
Development Setup
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Test connections
python scripts/test_connection.py
Running Workflows
OAuth2 (default):
python scripts/a1_to_a2_download.py
python scripts/a5_to_a6_download.py
python scripts/b1_to_b2_download.py
python scripts/a2_to_a3_upload_polling.py
mTLS authentication (requires --auth-pfx flag and whitelisted IP):
python scripts/a1_to_a2_download.py --auth-pfx
python scripts/a5_to_a6_download.py --auth-pfx
python scripts/b1_to_b2_download.py --auth-pfx
Testing
# Test all connections
python scripts/test_connection.py
# Test mTLS certificate
python scripts/test_mtls_cert.py
python scripts/test_connection.py --auth-pfx
# Generate daily report manually
python scripts/daily_report.py
Database Operations
# Connect to database
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking
# View recent assets
psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT tracking_id, original_filename, created_at FROM master_assets ORDER BY created_at DESC LIMIT 10;"
# Initialize/reset database
psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql
Monitoring
# Watch workflow logs
tail -f logs/a1_to_a2.log
tail -f logs/a5_to_a6.log
# Search for errors
grep -i error logs/*.log
grep -i "NOT APPROVED" logs/a5_to_a6.log
# View cron execution logs
tail -f logs/cron_a1_a2.log
Architecture
System Design
Multi-stage workflow orchestration:
- A1→A2: Downloads master assets from DAM, uploads to Box with tracking IDs
- A5→A6: Filters and downloads ONLY rejected assets (ECOMMERCE STATUS = "NOT APPROVED")
- B1→B2: Handles global master campaigns separately
- A2→A3: Polls Box for agency uploads, matches tracking IDs, uploads back to DAM
Core Components
scripts/shared/ - Shared library modules used by all workflows:
- dam_client.py: OpenText DAM API client with dual authentication
  - OAuth2 token management with auto-refresh
  - mTLS certificate authentication (PFX to PEM conversion)
  - Recursive folder search for assets (critical for subfolder support)
  - Asset download and metadata extraction
  - Important: Uses different base URLs for OAuth2 vs mTLS
- box_client.py: Box SDK wrapper
  - JWT authentication via Box-config.json (located one folder up: ../Box-config.json)
  - Creates campaign folders with tracking ID naming
  - Preserves DAM folder structure during upload
  - Three separate Box folders for different workflows (A1→A2, A2→A3, B1→B2)
- database.py: PostgreSQL connection pool and operations
  - Generates unique 6-character tracking IDs
  - find_or_create_tracking_id(): Smart lookup by opentext_id + local_campaign_id (reuses IDs for rework workflow)
  - Stores full metadata in JSONB column (no truncation)
  - Checks campaign completion status for A1→A2 updates
- notifier.py: Email and webhook notifications
  - 12 email templates with color-coded themes
  - Mailgun SMTP integration
  - Detailed asset listings with Box URLs
  - Rejection comment extraction (A5→A6 specific)
- metadata_extractor_mvp.py: Field mapping and metadata transformation
  - Loads 27 MVP fields from config/field_mappings.yaml
  - Handles filename-based updates
  - Force-sets required values (e.g., STATE = "Local")
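The tracking ID behaviour described for database.py (unique 6-character IDs, reused when the same opentext_id and local_campaign_id pair is seen again) can be illustrated with a minimal in-memory sketch. This is not the project's implementation: the alphabet (uppercase letters and digits), the helper names generate_tracking_id and find_or_create_in_memory, and the dict standing in for the PostgreSQL lookup are all assumptions made for illustration.

```python
import secrets
import string

# Assumed alphabet; the real ID format is not specified in this document.
ALPHABET = string.ascii_uppercase + string.digits


def generate_tracking_id(existing_ids, length=6, max_attempts=100):
    """Return a random ID of `length` characters that is not in `existing_ids`."""
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing_ids:
            return candidate
    raise RuntimeError(
        "No free tracking ID found after {} attempts".format(max_attempts)
    )


def find_or_create_in_memory(index, opentext_id, local_campaign_id):
    """Reuse the ID stored for (opentext_id, local_campaign_id), else create one.

    `index` is a plain dict standing in for the database (hypothetical).
    Returns (tracking_id, created), where `created` is True for a new ID.
    """
    key = (opentext_id, local_campaign_id)
    if key in index:
        return index[key], False
    tracking_id = generate_tracking_id(set(index.values()))
    index[key] = tracking_id
    return tracking_id, True
```

Calling find_or_create_in_memory twice with the same pair returns the same ID, which is the property the rework workflow relies on.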
Configuration Architecture
Hierarchical config system:
- .env: Environment variables (credentials, never committed)
- config/config.yaml: Main configuration (references .env vars)
- config/field_mappings.yaml: Editable field definitions (add/remove fields without code changes)
- ../Box-config.json: Box JWT credentials (one directory up from Python-Version)
Important: Box-config.json MUST be located at ../Box-config.json (one folder up). This is hardcoded in config.yaml as rsa_private_key_path: ../Box-config.json.
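To make the path rule concrete, the following sketch resolves ../Box-config.json against the Python-Version directory and fails early if the file is missing. The function names are hypothetical, and how the real loader resolves rsa_private_key_path is an assumption here.

```python
import os


def resolve_box_config_path(project_dir, configured_path="../Box-config.json"):
    """Resolve the configured path relative to project_dir and normalise it."""
    return os.path.normpath(os.path.join(project_dir, configured_path))


def require_box_config(project_dir):
    """Return the resolved path, or raise if the file does not exist there."""
    path = resolve_box_config_path(project_dir)
    if not os.path.isfile(path):
        raise FileNotFoundError("Box-config.json not found at {}".format(path))
    return path
```

With project_dir set to /opt/ferrero-automation/Python-Version, the resolved path is /opt/ferrero-automation/Box-config.json.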
Database Schema
Four main tables:
- master_assets: 35 columns with JSONB full_metadata
  - tracking_id (6-char unique), opentext_id, original_filename
  - Campaign relationships: global_master_campaign_id, local_campaign_id
  - JSONB storage prevents metadata truncation
- derivative_assets: Localized versions linked by tracking_id
- asset_events: Audit trail of all operations
- workflow_state: Tracks campaign processing state
Key indexes:
- idx_master_assets_tracking_id for fast lookups
- idx_master_assets_opentext_local for rework workflow matching
- GIN indexes on JSONB fields for metadata queries
Workflow-Specific Logic
A1→A2 (Master Download):
- Processes ONE campaign per run (cron handles frequency)
- Recursive search in "Master Assets" folder
- ONLY updates status A1→A2 if ALL assets succeed
- Uses BOX_ROOT_FOLDER_A1_A2 (348304357505)
A5→A6 (Rejection/Rework):
- Critical filter: Only processes assets where ECOMMERCE STATUS = "NOT APPROVED"
- Extracts 10 rejection-related fields (approver/legal/IA&CC comments)
- Reuses tracking IDs via find_or_create_tracking_id()
- Uses separate Box folder: BOX_ROOT_FOLDER_A5_A6 (349441822875)
- Search is in "Final Assets" folder, not "Master Assets"
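The rejection filter can be sketched as a simple predicate over asset records. The shape of an asset (a dict with a "metadata" dict keyed by field name) and the case-insensitive comparison are assumptions; the real DAM response structure is not shown in this document.

```python
REJECTED_STATUS = "NOT APPROVED"


def is_rejected(asset, status_field="ECOMMERCE STATUS"):
    """True when the asset's status field equals NOT APPROVED.

    Assumes asset["metadata"] maps field names to string values (hypothetical).
    """
    value = asset.get("metadata", {}).get(status_field)
    return isinstance(value, str) and value.strip().upper() == REJECTED_STATUS


def filter_rejected(assets):
    """Keep only the assets that the A5→A6 workflow should download."""
    return [asset for asset in assets if is_rejected(asset)]
```

Assets with a missing or non-string status are treated as not rejected, so they are skipped rather than downloaded by mistake.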
B1→B2 (Global Masters):
- Filters for campaigns where type = "Global comm" AND status = B1
- Uses "MASTERS_" prefix for Box folder names
- Searches "Final Assets" folder (B1 uses Final, not Master)
- Uses BOX_ROOT_FOLDER_B1_B2 (349261192115)
- Does NOT send webhook (only email)
A2→A3 (Upload from Box):
- Polls BOX_ROOT_FOLDER_A2_A3 (348526703108) for new files
- Parses tracking ID from filename (V2 format)
- Loads master metadata from database
- Updates Description, Language, State fields from filename
- Deletes file from Box after successful upload
- Updates campaign status A2→A3 when ALL assets uploaded
Authentication Modes
Two separate authentication systems:
OAuth2 (Production Default):
- Base URL: DAM_BASE_URL (https://ppr.dam.ferrero.com/otmmapi)
- Token endpoint: DAM_AUTH_URL
- Auto-refreshing access token (1 hour expiry)
- No special flags needed
mTLS Certificate (Optional):
- Base URL: DAM_MTLS_BASE_URL (different endpoint: https://dev-auth.app-api.ferrero.com/00003/mm)
- Requires PFX certificate at DAM_MTLS_CERT_PATH
- Runtime flag: --auth-pfx
- Requires IP whitelisting
- PFX converted to PEM on-the-fly using cryptography library
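How the --auth-pfx flag might switch between the two modes is sketched below. The environment variable names come from this document; the function names, the returned dict layout, and reading the values straight from the environment (rather than through config.yaml) are assumptions.

```python
import argparse
import os


def build_parser():
    """Command-line parser with the --auth-pfx switch used by the workflows."""
    parser = argparse.ArgumentParser(description="Workflow runner (sketch)")
    parser.add_argument(
        "--auth-pfx",
        action="store_true",
        help="Use mTLS certificate authentication instead of OAuth2",
    )
    return parser


def select_auth_settings(args, env=None):
    """Pick the base URL and credentials source for the chosen auth mode."""
    env = os.environ if env is None else env
    if args.auth_pfx:
        return {
            "mode": "mtls",
            "base_url": env["DAM_MTLS_BASE_URL"],
            "cert_path": env["DAM_MTLS_CERT_PATH"],
        }
    return {
        "mode": "oauth2",
        "base_url": env["DAM_BASE_URL"],
        "auth_url": env["DAM_AUTH_URL"],
    }
```

Keeping the two base URLs in separate variables means a run with the wrong flag fails on a missing setting instead of silently calling the other endpoint.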
Development Guidelines
Adding New Workflow Scripts
New workflow scripts should follow this pattern:
- Import from scripts.shared modules
- Set up rotating log handler (10MB, 28 backups)
- Parse command-line args (include --auth-pfx for mTLS support)
- Load config via load_config('config/config.yaml')
- Initialize clients (DAMClient, BoxClient, Database, Notifier)
- Process campaigns individually (one per run for cron safety)
- Send email notifications for all outcomes (success/partial/error)
- Log extensively with campaign and asset context
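The rotating log handler step in the pattern above could look like the sketch below, using only the standard library. The function name setup_logger, the log format string, and reading "10MB" as 10 * 1024 * 1024 bytes are assumptions.

```python
import logging
import os
from logging.handlers import RotatingFileHandler

MAX_LOG_BYTES = 10 * 1024 * 1024  # "10MB" interpreted as 10 MiB (assumption)
BACKUP_COUNT = 28


def setup_logger(name, log_path):
    """Return a logger that writes to log_path and rotates at MAX_LOG_BYTES."""
    log_dir = os.path.dirname(log_path)
    if log_dir:
        os.makedirs(log_dir, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # Avoid attaching a second handler if setup_logger is called twice.
    if not logger.handlers:
        handler = RotatingFileHandler(
            log_path, maxBytes=MAX_LOG_BYTES, backupCount=BACKUP_COUNT
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A workflow script would call something like setup_logger("a1_to_a2", "logs/a1_to_a2.log") once at startup and log through the returned object.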
Modifying Field Mappings
To add/remove fields, edit config/field_mappings.yaml:
mvp_fields:
- FERRERO.FIELD.NEW_FIELD_NAME # Add here
- FERRERO.FIELD.ANOTHER_FIELD # Add here
No code changes required - the system dynamically loads fields at runtime.
Database Queries
Common patterns used in the codebase:
# Generate unique tracking ID
tracking_id = db.generate_unique_tracking_id()
# Find or reuse existing tracking ID (for rework)
result = db.find_or_create_tracking_id(opentext_id, local_campaign_id)
# Store master asset with full metadata
db.store_master_asset(
    tracking_id=tracking_id,
    opentext_id=asset_id,
    asset_data=asset,  # Full dict stored in JSONB
    box_file_id=box_result['file_id'],
    box_url=box_result['url'],
    upload_folder_id=final_folder_id,
    global_master_campaign_id=global_ref['global_master_campaign_id'],
    local_campaign_id=campaign_number
)
# Check if all campaign assets uploaded
all_done = db.check_all_campaign_assets_uploaded(campaign_number)
Email Notifications
12 template types in notifier.py:
- a1_to_a2_complete, a1_to_a2_partial, a1_to_a2_no_assets
- a5_to_a6_rejections, a5_to_a6_partial, a5_to_a6_no_rejections
- b1_to_b2_complete, b1_to_b2_partial, b1_to_b2_no_assets
- a2_to_a3_file_uploaded, a2_to_a3_complete
- daily_summary_report
Each template includes campaign context, asset lists, Box URLs, and next steps.
Error Handling Patterns
The codebase uses extensive try/except with logging:
try:
    # Process asset
    file_path = dam.download_asset(asset_id, output_dir)
    box_result = box.upload_with_tracking_id(file_path, ...)
    db.store_master_asset(...)
    processed_assets.append(asset_info)
except Exception as e:
    logger.error("Failed: {}".format(str(e)))
    failed_assets.append(asset_info)
Important: Workflows only update campaign status if ALL assets succeed to prevent partial state issues.
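The all-or-nothing rule can be sketched as a loop that collects successes and failures and only triggers the status change when nothing failed. The function name process_campaign and its callback parameters are hypothetical, and skipping the update for an empty campaign is an assumption rather than documented behaviour.

```python
def process_campaign(assets, process_asset, update_status, logger=None):
    """Process every asset, then update status only if all of them succeeded.

    process_asset(asset) and update_status() are caller-supplied callables
    (hypothetical stand-ins for the DAM/Box/database steps).
    Returns (processed, failed, status_updated).
    """
    processed = []
    failed = []
    for asset in assets:
        try:
            process_asset(asset)
            processed.append(asset)
        except Exception as exc:
            if logger is not None:
                logger.error("Failed: {}".format(str(exc)))
            failed.append(asset)

    # Assumption: an empty campaign does not count as a success.
    status_updated = bool(processed) and not failed
    if status_updated:
        update_status()
    return processed, failed, status_updated
```

Because failures are collected instead of aborting the loop, one bad asset does not stop the rest from being processed, yet the campaign stays in its current status until a later run succeeds for every asset.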
Python 3.6 Compatibility
Required for server deployment:
- Use .format() instead of f-strings
- Avoid walrus operator (:=)
- No dict union operator (|)
- Type hints are optional
- Use os.path instead of Path where possible
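The guidelines above translate into a few recurring substitutions. The helper names below are made up for illustration; each function shows the form the guidelines ask for in place of the newer syntax.

```python
import os
import re


def describe(workflow, count):
    """String formatting with .format() rather than an f-string."""
    return "Workflow {} processed {} assets".format(workflow, count)


def first_number(text):
    """Assign, then test, instead of `if (match := re.search(...))`."""
    match = re.search(r"\d+", text)
    if match:
        return int(match.group())
    return None


def merge_settings(defaults, overrides):
    """Copy and update instead of `defaults | overrides`."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def log_path(base_dir, workflow):
    """Build paths with os.path.join rather than pathlib.Path."""
    return os.path.join(base_dir, "logs", "{}.log".format(workflow))
```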
Deployment Notes
Server Directory Structure
/opt/ferrero-automation/
├── Box-config.json # Box JWT config HERE (one level up!)
└── Python-Version/
    ├── .env # Environment variables
    ├── config/
    │   ├── config.yaml
    │   ├── field_mappings.yaml
    │   └── certificates/
    │       └── dam-mtls-dev.pfx
    ├── database/
    │   └── init.sql
    ├── scripts/
    │   ├── *.py # Workflow scripts
    │   └── shared/ # Shared modules
    ├── logs/ # Auto-created by scripts
    ├── temp/ # Auto-created for downloads
    └── venv/ # Virtual environment
Cron Configuration
Production cron runs workflows every 5 minutes:
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a1_to_a2_download.py >> logs/cron_a1_a2.log 2>&1
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a5_to_a6_download.py >> logs/cron_a5_a6.log 2>&1
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/b1_to_b2_download.py >> logs/cron_b1_b2.log 2>&1
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a2_to_a3_upload_polling.py >> logs/cron_a2_a3.log 2>&1
0 19 * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/daily_report.py >> logs/daily_report.log 2>&1
Security Considerations
Files to protect (chmod 600):
- .env (all credentials)
- config/certificates/*.pfx (mTLS certificates)
- ../Box-config.json (Box JWT private key)
Gitignored secrets:
- .env
- config/certificates/*.pfx, *.p12
- logs/, temp/
Production checklist:
- Change default database password (ferrero_pass_2025)
- Update SMTP credentials
- Get production mTLS certificate (current is dev)
- Verify all Box folder IDs are production IDs
- Configure firewall rules for IP whitelisting (if using mTLS)
Troubleshooting
"Box-config.json not found"
Cause: Box JWT file must be one directory up from Python-Version
Fix: Verify file exists at ../Box-config.json relative to Python-Version directory
"mTLS connection refused"
Cause: IP not whitelisted or wrong base URL
Fix: Check DAM_MTLS_BASE_URL uses the correct mTLS endpoint and verify IP is whitelisted
"No A1 campaigns found"
Expected behavior: Scripts check every 5 minutes via cron. This message is normal when no campaigns are in A1 status.
Database connection fails
Fix: Verify PostgreSQL is running on port 5437 (not default 5432)
docker ps | grep ferrero
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;"
Emails not sending
Fix: Check SMTP configuration in .env, verify Mailgun credentials
Additional Documentation
- README.md: Complete user documentation with deployment guide
- DEPLOYMENT_GUIDE.md: Server deployment step-by-step
- DATABASE_SCHEMA.md: Full schema reference with field descriptions
- database/init.sql: Complete database schema with indexes