Added documentation for template-based folder-only mode (-N flag), asset type overrides (EOL), environment-specific field mappings, and updated config file references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
15 KiB
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
Ferrero DAM Content Scaling - Automated Python Workflow System
This is a production automation system that orchestrates asset workflows between OpenText DAM, Box, and PostgreSQL. The system processes digital assets through multiple lifecycle stages (A1→A2, A5→A6, B1→B2, A2→A3) with automated email notifications and daily reporting.
Key characteristics:
- Production system with cron automation (every 5 minutes)
- Dual authentication support: OAuth2 (default) and mTLS certificate
- Python 3.6+ compatible for server deployment
- Heavy use of external APIs (OpenText DAM, Box SDK)
- PostgreSQL database with JSONB metadata storage
Common Commands
Development Setup
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Test connections
python scripts/test_connection.py
Running Workflows
OAuth2 (default):
python scripts/a1_to_a2_download.py
python scripts/a5_to_a6_download.py
python scripts/b1_to_b2_download.py
python scripts/a2_to_a3_upload_polling.py
mTLS authentication (requires --auth-pfx flag and whitelisted IP):
python scripts/a1_to_a2_download.py --auth-pfx
python scripts/a5_to_a6_download.py --auth-pfx
python scripts/b1_to_b2_download.py --auth-pfx
Testing
# Test all connections
python scripts/test_connection.py
# Test mTLS certificate
python scripts/test_mtls_cert.py
python scripts/test_connection.py --auth-pfx
# Generate daily report manually
python scripts/daily_report.py
Database Operations
# Connect to database
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking
# View recent assets
psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT tracking_id, original_filename, created_at FROM master_assets ORDER BY created_at DESC LIMIT 10;"
# Initialize/reset database
psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql
Monitoring
# Watch workflow logs
tail -f logs/a1_to_a2.log
tail -f logs/a5_to_a6.log
# Search for errors
grep -i error logs/*.log
grep -i "NOT APPROVED" logs/a5_to_a6.log
# View cron execution logs
tail -f logs/cron_a1_a2.log
Architecture
System Design
Multi-stage workflow orchestration:
- A1→A2: Downloads master assets from DAM, uploads to Box with tracking IDs
- A5→A6: Filters and downloads ONLY rejected assets (ECOMMERCE STATUS = "NOT APPROVED")
- B1→B2: Handles global master campaigns separately
- A2→A3: Polls Box for agency uploads, matches tracking IDs, uploads back to DAM
Core Components
scripts/shared/ - Shared library modules used by all workflows:
-
dam_client.py: OpenText DAM API client with dual authentication
- OAuth2 token management with auto-refresh
- mTLS certificate authentication (PFX to PEM conversion)
- Recursive folder search for assets (critical for subfolder support)
- Asset download and metadata extraction
- Important: Uses different base URLs for OAuth2 vs mTLS
-
box_client.py: Box SDK wrapper
- JWT authentication via Box-config.json (located one folder up:
../Box-config.json) - Creates campaign folders with tracking ID naming
- Preserves DAM folder structure during upload
- Three separate Box folders for different workflows (A1→A2, A2→A3, B1→B2)
- JWT authentication via Box-config.json (located one folder up:
-
database.py: PostgreSQL connection pool and operations
- Generates unique 6-character tracking IDs
find_or_create_tracking_id(): Smart lookup by opentext_id + local_campaign_id (reuses IDs for rework workflow)- Stores full metadata in JSONB column (no truncation)
- Checks campaign completion status for A1→A2 updates
-
notifier.py: Email and webhook notifications
- 12 email templates with color-coded themes
- Mailgun SMTP integration
- Detailed asset listings with Box URLs
- Rejection comment extraction (A5→A6 specific)
-
metadata_extractor_mvp.py: Field mapping and metadata transformation
- Loads MVP fields from environment-specific config (
field_mappings_ppr.yamlorfield_mappings_prod.yaml) - Two tracking modes: Full inheritance (from master metadata) and folder-only (
-Nsuffix, usesconfig/asset_representation_template.jsonas base) - Handles filename-based updates, forced values, defaults, and asset type overrides
- Asset type overrides (e.g., EOL) can set/remove fields with final precedence
- Force-sets required values (e.g., STATE = "Local")
- Uses DomainValue format for domained fields when setting values on template fields
- Loads MVP fields from environment-specific config (
Configuration Architecture
Hierarchical config system:
.env: Environment variables (credentials, never committed)config/config.yaml: Main configuration (references .env vars)config/field_mappings_ppr.yaml: PPR field definitions (auto-loaded when DAM URL contains 'ppr')config/field_mappings_prod.yaml: PROD field definitions (auto-loaded otherwise)config/asset_type_mappings.yaml: 3-letter code to DAM code mappings (e.g., EOL -> externallegalopinion)config/asset_representation_template.json: Reference template for folder-only mode (-N flag), contains full field metadata structure../Box-config.json: Box JWT credentials (one directory up from Python-Version)
Important: Box-config.json MUST be located at ../Box-config.json (one folder up). This is hardcoded in config.yaml as rsa_private_key_path: ../Box-config.json.
Database Schema
Four main tables:
- master_assets: 35 columns with JSONB full_metadata
- tracking_id (6-char unique), opentext_id, original_filename
- Campaign relationships: global_master_campaign_id, local_campaign_id
- JSONB storage prevents metadata truncation
- derivative_assets: Localized versions linked by tracking_id
- asset_events: Audit trail of all operations
- workflow_state: Tracks campaign processing state
Key indexes:
idx_master_assets_tracking_idfor fast lookupsidx_master_assets_opentext_localfor rework workflow matching- GIN indexes on JSONB fields for metadata queries
Workflow-Specific Logic
A1→A2 (Master Download):
- Processes ONE campaign per run (cron handles frequency)
- Recursive search in "Master Assets" folder
- ONLY updates status A1→A2 if ALL assets succeed
- Uses
BOX_ROOT_FOLDER_A1_A2(348304357505)
A5→A6 (Rejection/Rework):
- Critical filter: Only processes assets where
ECOMMERCE STATUS = "NOT APPROVED" - Extracts 10 rejection-related fields (approver/legal/IA&CC comments)
- Reuses tracking IDs via
find_or_create_tracking_id() - Uses separate Box folder:
BOX_ROOT_FOLDER_A5_A6(349441822875) - Search is in "Final Assets" folder, not "Master Assets"
B1→B2 (Global Masters):
- Filters for campaigns where type = "Global comm" AND status = B1
- Uses "MASTERS_" prefix for Box folder names
- Searches "Final Assets" folder (B1 uses Final, not Master)
- Uses
BOX_ROOT_FOLDER_B1_B2(349261192115) - Does NOT send webhook (only email)
A2→A3 (Upload from Box):
- Polls
BOX_ROOT_FOLDER_A2_A3(348526703108) for new files - Parses tracking ID from filename (V2 format)
- Two tracking modes:
- Full inheritance: Loads master metadata from database, inherits all fields
- Folder-only (
-Nsuffix): Usesconfig/asset_representation_template.jsonas base, populates from filename
- Updates Description, Language, State fields from filename
- Applies asset type overrides (e.g., EOL sets Agency="-", Languages="Global", IPRights="Yes", removes validity dates)
- Deletes file from Box after successful upload
- Updates campaign status A2→A3 when ALL assets uploaded
Authentication Modes
Two separate authentication systems:
OAuth2 (Production Default):
- Base URL:
DAM_BASE_URL(https://ppr.dam.ferrero.com/otmmapi) - Token endpoint:
DAM_AUTH_URL - Auto-refreshing access token (1 hour expiry)
- No special flags needed
mTLS Certificate (Optional):
- Base URL:
DAM_MTLS_BASE_URL(different endpoint: https://dev-auth.app-api.ferrero.com/00003/mm) - Requires PFX certificate at
DAM_MTLS_CERT_PATH - Runtime flag:
--auth-pfx - Requires IP whitelisting
- PFX converted to PEM on-the-fly using cryptography library
Development Guidelines
Adding New Workflow Scripts
New workflow scripts should follow this pattern:
- Import from
scripts.sharedmodules - Setup rotating log handler (10MB, 28 backups)
- Parse command-line args (include
--auth-pfxfor mTLS support) - Load config via
load_config('config/config.yaml') - Initialize clients (DAMClient, BoxClient, Database, Notifier)
- Process campaigns individually (one per run for cron safety)
- Send email notifications for all outcomes (success/partial/error)
- Log extensively with campaign and asset context
Modifying Field Mappings
To add/remove fields, edit the environment-specific file:
- PPR:
config/field_mappings_ppr.yaml - PROD:
config/field_mappings_prod.yaml
mvp_fields:
- FERRERO.FIELD.NEW_FIELD_NAME # Add here
- FERRERO.FIELD.ANOTHER_FIELD # Add here
No code changes required - the system dynamically loads fields at runtime.
Asset Type Overrides
To add field overrides for a specific asset type, add an asset_type_overrides section to the field mappings file:
asset_type_overrides:
EOL: # Keyed by 3-letter asset type code
FERRERO.MARKETING.FIELD.AGENCY NAME: "-"
MAIN_LANGUAGES: "Global"
FERRERO.FIELD.ASSET VALIDITY START PERIOD: "" # Empty string removes the field
Overrides run after all other field processing (forced values, defaults) and take final precedence. An empty string value removes the field entirely from the payload.
Database Queries
Common patterns used in the codebase:
# Generate unique tracking ID
tracking_id = db.generate_unique_tracking_id()
# Find or reuse existing tracking ID (for rework)
result = db.find_or_create_tracking_id(opentext_id, local_campaign_id)
# Store master asset with full metadata
db.store_master_asset(
tracking_id=tracking_id,
opentext_id=asset_id,
asset_data=asset, # Full dict stored in JSONB
box_file_id=box_result['file_id'],
box_url=box_result['url'],
upload_folder_id=final_folder_id,
global_master_campaign_id=global_ref['global_master_campaign_id'],
local_campaign_id=campaign_number
)
# Check if all campaign assets uploaded
all_done = db.check_all_campaign_assets_uploaded(campaign_number)
Email Notifications
12 template types in notifier.py:
- a1_to_a2_complete, a1_to_a2_partial, a1_to_a2_no_assets
- a5_to_a6_rejections, a5_to_a6_partial, a5_to_a6_no_rejections
- b1_to_b2_complete, b1_to_b2_partial, b1_to_b2_no_assets
- a2_to_a3_file_uploaded, a2_to_a3_complete
- daily_summary_report
Each template includes campaign context, asset lists, Box URLs, and next steps.
Error Handling Patterns
The codebase uses extensive try/except with logging:
try:
# Process asset
file_path = dam.download_asset(asset_id, output_dir)
box_result = box.upload_with_tracking_id(file_path, ...)
db.store_master_asset(...)
processed_assets.append(asset_info)
except Exception as e:
logger.error("Failed: {}".format(str(e)))
failed_assets.append(asset_info)
Important: Workflows only update campaign status if ALL assets succeed to prevent partial state issues.
Python 3.6 Compatibility
Required for server deployment:
- Use
.format()instead of f-strings - Avoid walrus operator (
:=) - No dict union operator (
|) - Type hints are optional
- Use
os.pathinstead ofPathwhere possible
Deployment Notes
Server Directory Structure
/opt/ferrero-automation/
├── Box-config.json # Box JWT config HERE (one level up!)
└── Python-Version/
├── .env # Environment variables
├── config/
│ ├── config.yaml
│ ├── field_mappings_ppr.yaml
│ ├── field_mappings_prod.yaml
│ ├── asset_type_mappings.yaml
│ ├── asset_representation_template.json
│ └── certificates/
│ └── dam-mtls-dev.pfx
├── database/
│ └── init.sql
├── scripts/
│ ├── *.py # Workflow scripts
│ └── shared/ # Shared modules
├── logs/ # Auto-created by scripts
├── temp/ # Auto-created for downloads
└── venv/ # Virtual environment
Cron Configuration
Production cron runs workflows every 5 minutes:
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a1_to_a2_download.py >> logs/cron_a1_a2.log 2>&1
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a5_to_a6_download.py >> logs/cron_a5_a6.log 2>&1
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/b1_to_b2_download.py >> logs/cron_b1_b2.log 2>&1
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a2_to_a3_upload_polling.py >> logs/cron_a2_a3.log 2>&1
0 19 * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/daily_report.py >> logs/daily_report.log 2>&1
Security Considerations
Files to protect (chmod 600):
.env(all credentials)config/certificates/*.pfx(mTLS certificates)../Box-config.json(Box JWT private key)
Gitignored secrets:
.envconfig/certificates/*.pfx,*.p12logs/,temp/
Production checklist:
- Change default database password (ferrero_pass_2025)
- Update SMTP credentials
- Get production mTLS certificate (current is dev)
- Verify all Box folder IDs are production IDs
- Configure firewall rules for IP whitelisting (if using mTLS)
Troubleshooting
"Box-config.json not found"
Cause: Box JWT file must be one directory up from Python-Version
Fix: Verify file exists at ../Box-config.json relative to Python-Version directory
"mTLS connection refused"
Cause: IP not whitelisted or wrong base URL
Fix: Check DAM_MTLS_BASE_URL uses the correct mTLS endpoint and verify IP is whitelisted
"No A1 campaigns found"
Expected behavior: Scripts check every 5 minutes via cron. This message is normal when no campaigns are in A1 status.
Database connection fails
Fix: Verify PostgreSQL is running on port 5437 (not default 5432)
docker ps | grep ferrero
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;"
Emails not sending
Fix: Check SMTP configuration in .env, verify Mailgun credentials
Additional Documentation
- README.md: Complete user documentation with deployment guide
- DEPLOYMENT_GUIDE.md: Server deployment step-by-step
- DATABASE_SCHEMA.md: Full schema reference with field descriptions
- database/init.sql: Complete database schema with indexes