Added documentation for template-based folder-only mode (-N flag), asset type overrides (EOL), environment-specific field mappings, and updated config file references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
404 lines
15 KiB
Markdown
404 lines
15 KiB
Markdown
# CLAUDE.md
|
|
|
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
|
|
|
## Project Overview
|
|
|
|
**Ferrero DAM Content Scaling - Automated Python Workflow System**
|
|
|
|
This is a production automation system that orchestrates asset workflows between OpenText DAM, Box, and PostgreSQL. The system processes digital assets through multiple lifecycle stages (A1→A2, A5→A6, B1→B2, A2→A3) with automated email notifications and daily reporting.
|
|
|
|
**Key characteristics:**
|
|
- Production system with cron automation (every 5 minutes)
|
|
- Dual authentication support: OAuth2 (default) and mTLS certificate
|
|
- Python 3.6+ compatible for server deployment
|
|
- Heavy use of external APIs (OpenText DAM, Box SDK)
|
|
- PostgreSQL database with JSONB metadata storage
|
|
|
|
## Common Commands
|
|
|
|
### Development Setup
|
|
```bash
|
|
# Create and activate virtual environment
|
|
python3 -m venv venv
|
|
source venv/bin/activate
|
|
|
|
# Install dependencies
|
|
pip install -r requirements.txt
|
|
|
|
# Test connections
|
|
python scripts/test_connection.py
|
|
```
|
|
|
|
### Running Workflows
|
|
|
|
**OAuth2 (default):**
|
|
```bash
|
|
python scripts/a1_to_a2_download.py
|
|
python scripts/a5_to_a6_download.py
|
|
python scripts/b1_to_b2_download.py
|
|
python scripts/a2_to_a3_upload_polling.py
|
|
```
|
|
|
|
**mTLS authentication (requires --auth-pfx flag and whitelisted IP):**
|
|
```bash
|
|
python scripts/a1_to_a2_download.py --auth-pfx
|
|
python scripts/a5_to_a6_download.py --auth-pfx
|
|
python scripts/b1_to_b2_download.py --auth-pfx
|
|
```
|
|
|
|
### Testing
|
|
```bash
|
|
# Test all connections
|
|
python scripts/test_connection.py
|
|
|
|
# Test mTLS certificate
|
|
python scripts/test_mtls_cert.py
|
|
python scripts/test_connection.py --auth-pfx
|
|
|
|
# Generate daily report manually
|
|
python scripts/daily_report.py
|
|
```
|
|
|
|
### Database Operations
|
|
```bash
|
|
# Connect to database
|
|
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking
|
|
|
|
# View recent assets
|
|
psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT tracking_id, original_filename, created_at FROM master_assets ORDER BY created_at DESC LIMIT 10;"
|
|
|
|
# Initialize/reset database
|
|
psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql
|
|
```
|
|
|
|
### Monitoring
|
|
```bash
|
|
# Watch workflow logs
|
|
tail -f logs/a1_to_a2.log
|
|
tail -f logs/a5_to_a6.log
|
|
|
|
# Search for errors
|
|
grep -i error logs/*.log
|
|
grep -i "NOT APPROVED" logs/a5_to_a6.log
|
|
|
|
# View cron execution logs
|
|
tail -f logs/cron_a1_a2.log
|
|
```
|
|
|
|
## Architecture
|
|
|
|
### System Design
|
|
|
|
**Multi-stage workflow orchestration:**
|
|
1. **A1→A2**: Downloads master assets from DAM, uploads to Box with tracking IDs
|
|
2. **A5→A6**: Filters and downloads ONLY rejected assets (ECOMMERCE STATUS = "NOT APPROVED")
|
|
3. **B1→B2**: Handles global master campaigns separately
|
|
4. **A2→A3**: Polls Box for agency uploads, matches tracking IDs, uploads back to DAM
|
|
|
|
### Core Components
|
|
|
|
**scripts/shared/** - Shared library modules used by all workflows:
|
|
|
|
- **dam_client.py**: OpenText DAM API client with dual authentication
|
|
- OAuth2 token management with auto-refresh
|
|
- mTLS certificate authentication (PFX to PEM conversion)
|
|
- Recursive folder search for assets (critical for subfolder support)
|
|
- Asset download and metadata extraction
|
|
- **Important**: Uses different base URLs for OAuth2 vs mTLS
|
|
|
|
- **box_client.py**: Box SDK wrapper
|
|
- JWT authentication via Box-config.json (located one folder up: `../Box-config.json`)
|
|
- Creates campaign folders with tracking ID naming
|
|
- Preserves DAM folder structure during upload
|
|
- Three separate Box folders for different workflows (A1→A2, A2→A3, B1→B2)
|
|
|
|
- **database.py**: PostgreSQL connection pool and operations
|
|
- Generates unique 6-character tracking IDs
|
|
- `find_or_create_tracking_id()`: Smart lookup by opentext_id + local_campaign_id (reuses IDs for rework workflow)
|
|
- Stores full metadata in JSONB column (no truncation)
|
|
- Checks campaign completion status for A1→A2 updates
|
|
|
|
- **notifier.py**: Email and webhook notifications
|
|
- 12 email templates with color-coded themes
|
|
- Mailgun SMTP integration
|
|
- Detailed asset listings with Box URLs
|
|
- Rejection comment extraction (A5→A6 specific)
|
|
|
|
- **metadata_extractor_mvp.py**: Field mapping and metadata transformation
|
|
- Loads MVP fields from environment-specific config (`field_mappings_ppr.yaml` or `field_mappings_prod.yaml`)
|
|
- **Two tracking modes:** Full inheritance (from master metadata) and folder-only (`-N` suffix, uses `config/asset_representation_template.json` as base)
|
|
- Handles filename-based updates, forced values, defaults, and asset type overrides
|
|
- Asset type overrides (e.g., EOL) can set/remove fields with final precedence
|
|
- Force-sets required values (e.g., STATE = "Local")
|
|
- Uses DomainValue format for domained fields when setting values on template fields
|
|
|
|
### Configuration Architecture
|
|
|
|
**Hierarchical config system:**
|
|
- `.env`: Environment variables (credentials, never committed)
|
|
- `config/config.yaml`: Main configuration (references .env vars)
|
|
- `config/field_mappings_ppr.yaml`: PPR field definitions (auto-loaded when DAM URL contains 'ppr')
|
|
- `config/field_mappings_prod.yaml`: PROD field definitions (auto-loaded otherwise)
|
|
- `config/asset_type_mappings.yaml`: 3-letter code to DAM code mappings (e.g., EOL -> externallegalopinion)
|
|
- `config/asset_representation_template.json`: Reference template for folder-only mode (-N flag), contains full field metadata structure
|
|
- `../Box-config.json`: Box JWT credentials (one directory up from Python-Version)
|
|
|
|
**Important**: Box-config.json MUST be located at `../Box-config.json` (one folder up). This is hardcoded in config.yaml as `rsa_private_key_path: ../Box-config.json`.
|
|
|
|
### Database Schema
|
|
|
|
**Four main tables:**
|
|
1. **master_assets**: 35 columns with JSONB full_metadata
|
|
- tracking_id (6-char unique), opentext_id, original_filename
|
|
- Campaign relationships: global_master_campaign_id, local_campaign_id
|
|
- JSONB storage prevents metadata truncation
|
|
2. **derivative_assets**: Localized versions linked by tracking_id
|
|
3. **asset_events**: Audit trail of all operations
|
|
4. **workflow_state**: Tracks campaign processing state
|
|
|
|
**Key indexes:**
|
|
- `idx_master_assets_tracking_id` for fast lookups
|
|
- `idx_master_assets_opentext_local` for rework workflow matching
|
|
- GIN indexes on JSONB fields for metadata queries
|
|
|
|
### Workflow-Specific Logic
|
|
|
|
**A1→A2 (Master Download):**
|
|
- Processes ONE campaign per run (cron handles frequency)
|
|
- Recursive search in "Master Assets" folder
|
|
- ONLY updates status A1→A2 if ALL assets succeed
|
|
- Uses `BOX_ROOT_FOLDER_A1_A2` (348304357505)
|
|
|
|
**A5→A6 (Rejection/Rework):**
|
|
- **Critical filter**: Only processes assets where `ECOMMERCE STATUS = "NOT APPROVED"`
|
|
- Extracts 10 rejection-related fields (approver/legal/IA&CC comments)
|
|
- Reuses tracking IDs via `find_or_create_tracking_id()`
|
|
- Uses separate Box folder: `BOX_ROOT_FOLDER_A5_A6` (349441822875)
|
|
- Search is in "Final Assets" folder, not "Master Assets"
|
|
|
|
**B1→B2 (Global Masters):**
|
|
- Filters for campaigns where type = "Global comm" AND status = B1
|
|
- Uses "MASTERS_" prefix for Box folder names
|
|
- Searches "Final Assets" folder (B1 uses Final, not Master)
|
|
- Uses `BOX_ROOT_FOLDER_B1_B2` (349261192115)
|
|
- Does NOT send webhook (only email)
|
|
|
|
**A2→A3 (Upload from Box):**
|
|
- Polls `BOX_ROOT_FOLDER_A2_A3` (348526703108) for new files
|
|
- Parses tracking ID from filename (V2 format)
|
|
- Two tracking modes:
|
|
- **Full inheritance**: Loads master metadata from database, inherits all fields
|
|
- **Folder-only** (`-N` suffix): Uses `config/asset_representation_template.json` as base, populates from filename
|
|
- Updates Description, Language, State fields from filename
|
|
- Applies asset type overrides (e.g., EOL sets Agency="-", Languages="Global", IPRights="Yes", removes validity dates)
|
|
- Deletes file from Box after successful upload
|
|
- Updates campaign status A2→A3 when ALL assets uploaded
|
|
|
|
### Authentication Modes
|
|
|
|
**Two separate authentication systems:**
|
|
|
|
**OAuth2 (Production Default):**
|
|
- Base URL: `DAM_BASE_URL` (https://ppr.dam.ferrero.com/otmmapi)
|
|
- Token endpoint: `DAM_AUTH_URL`
|
|
- Auto-refreshing access token (1 hour expiry)
|
|
- No special flags needed
|
|
|
|
**mTLS Certificate (Optional):**
|
|
- Base URL: `DAM_MTLS_BASE_URL` (different endpoint: https://dev-auth.app-api.ferrero.com/00003/mm)
|
|
- Requires PFX certificate at `DAM_MTLS_CERT_PATH`
|
|
- Runtime flag: `--auth-pfx`
|
|
- Requires IP whitelisting
|
|
- PFX converted to PEM on-the-fly using cryptography library
|
|
|
|
## Development Guidelines
|
|
|
|
### Adding New Workflow Scripts
|
|
|
|
New workflow scripts should follow this pattern:
|
|
1. Import from `scripts.shared` modules
|
|
2. Setup rotating log handler (10MB, 28 backups)
|
|
3. Parse command-line args (include `--auth-pfx` for mTLS support)
|
|
4. Load config via `load_config('config/config.yaml')`
|
|
5. Initialize clients (DAMClient, BoxClient, Database, Notifier)
|
|
6. Process campaigns individually (one per run for cron safety)
|
|
7. Send email notifications for all outcomes (success/partial/error)
|
|
8. Log extensively with campaign and asset context
|
|
|
|
### Modifying Field Mappings
|
|
|
|
**To add/remove fields, edit the environment-specific file:**
|
|
- PPR: `config/field_mappings_ppr.yaml`
|
|
- PROD: `config/field_mappings_prod.yaml`
|
|
|
|
```yaml
|
|
mvp_fields:
|
|
- FERRERO.FIELD.NEW_FIELD_NAME # Add here
|
|
- FERRERO.FIELD.ANOTHER_FIELD # Add here
|
|
```
|
|
|
|
**No code changes required** - the system dynamically loads fields at runtime.
|
|
|
|
### Asset Type Overrides
|
|
|
|
To add field overrides for a specific asset type, add an `asset_type_overrides` section to the field mappings file:
|
|
```yaml
|
|
asset_type_overrides:
|
|
EOL: # Keyed by 3-letter asset type code
|
|
FERRERO.MARKETING.FIELD.AGENCY NAME: "-"
|
|
MAIN_LANGUAGES: "Global"
|
|
FERRERO.FIELD.ASSET VALIDITY START PERIOD: "" # Empty string removes the field
|
|
```
|
|
|
|
Overrides run after all other field processing (forced values, defaults) and take final precedence. An empty string value removes the field entirely from the payload.
|
|
|
|
### Database Queries
|
|
|
|
Common patterns used in the codebase:
|
|
```python
|
|
# Generate unique tracking ID
|
|
tracking_id = db.generate_unique_tracking_id()
|
|
|
|
# Find or reuse existing tracking ID (for rework)
|
|
result = db.find_or_create_tracking_id(opentext_id, local_campaign_id)
|
|
|
|
# Store master asset with full metadata
|
|
db.store_master_asset(
|
|
tracking_id=tracking_id,
|
|
opentext_id=asset_id,
|
|
asset_data=asset, # Full dict stored in JSONB
|
|
box_file_id=box_result['file_id'],
|
|
box_url=box_result['url'],
|
|
upload_folder_id=final_folder_id,
|
|
global_master_campaign_id=global_ref['global_master_campaign_id'],
|
|
local_campaign_id=campaign_number
|
|
)
|
|
|
|
# Check if all campaign assets uploaded
|
|
all_done = db.check_all_campaign_assets_uploaded(campaign_number)
|
|
```
|
|
|
|
### Email Notifications
|
|
|
|
**12 template types in notifier.py:**
|
|
- a1_to_a2_complete, a1_to_a2_partial, a1_to_a2_no_assets
|
|
- a5_to_a6_rejections, a5_to_a6_partial, a5_to_a6_no_rejections
|
|
- b1_to_b2_complete, b1_to_b2_partial, b1_to_b2_no_assets
|
|
- a2_to_a3_file_uploaded, a2_to_a3_complete
|
|
- daily_summary_report
|
|
|
|
Each template includes campaign context, asset lists, Box URLs, and next steps.
|
|
|
|
### Error Handling Patterns
|
|
|
|
The codebase uses extensive try/except with logging:
|
|
```python
|
|
try:
|
|
# Process asset
|
|
file_path = dam.download_asset(asset_id, output_dir)
|
|
box_result = box.upload_with_tracking_id(file_path, ...)
|
|
db.store_master_asset(...)
|
|
processed_assets.append(asset_info)
|
|
except Exception as e:
|
|
logger.error("Failed: {}".format(str(e)))
|
|
failed_assets.append(asset_info)
|
|
```
|
|
|
|
**Important**: Workflows only update campaign status if ALL assets succeed to prevent partial state issues.
|
|
|
|
### Python 3.6 Compatibility
|
|
|
|
**Required for server deployment:**
|
|
- Use `.format()` instead of f-strings
|
|
- Avoid walrus operator (`:=`)
|
|
- No dict union operator (`|`)
|
|
- Type hints are optional
|
|
- Use `os.path` instead of `Path` where possible
|
|
|
|
## Deployment Notes
|
|
|
|
### Server Directory Structure
|
|
```
|
|
/opt/ferrero-automation/
|
|
├── Box-config.json # Box JWT config HERE (one level up!)
|
|
└── Python-Version/
|
|
├── .env # Environment variables
|
|
├── config/
|
|
│ ├── config.yaml
|
|
│ ├── field_mappings_ppr.yaml
|
|
│ ├── field_mappings_prod.yaml
|
|
│ ├── asset_type_mappings.yaml
|
|
│ ├── asset_representation_template.json
|
|
│ └── certificates/
|
|
│ └── dam-mtls-dev.pfx
|
|
├── database/
|
|
│ └── init.sql
|
|
├── scripts/
|
|
│ ├── *.py # Workflow scripts
|
|
│ └── shared/ # Shared modules
|
|
├── logs/ # Auto-created by scripts
|
|
├── temp/ # Auto-created for downloads
|
|
└── venv/ # Virtual environment
|
|
```
|
|
|
|
### Cron Configuration
|
|
|
|
Production cron runs workflows every 5 minutes:
|
|
```cron
|
|
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a1_to_a2_download.py >> logs/cron_a1_a2.log 2>&1
|
|
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a5_to_a6_download.py >> logs/cron_a5_a6.log 2>&1
|
|
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/b1_to_b2_download.py >> logs/cron_b1_b2.log 2>&1
|
|
*/5 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/a2_to_a3_upload_polling.py >> logs/cron_a2_a3.log 2>&1
|
|
0 19 * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/daily_report.py >> logs/daily_report.log 2>&1
|
|
```
|
|
|
|
### Security Considerations
|
|
|
|
**Files to protect (chmod 600):**
|
|
- `.env` (all credentials)
|
|
- `config/certificates/*.pfx` (mTLS certificates)
|
|
- `../Box-config.json` (Box JWT private key)
|
|
|
|
**Gitignored secrets:**
|
|
- `.env`
|
|
- `config/certificates/`
|
|
- `*.pfx`, `*.p12`
|
|
- `logs/`, `temp/`
|
|
|
|
**Production checklist:**
|
|
- Change default database password (ferrero_pass_2025)
|
|
- Update SMTP credentials
|
|
- Get production mTLS certificate (current is dev)
|
|
- Verify all Box folder IDs are production IDs
|
|
- Configure firewall rules for IP whitelisting (if using mTLS)
|
|
|
|
## Troubleshooting
|
|
|
|
### "Box-config.json not found"
|
|
**Cause**: Box JWT file must be one directory up from Python-Version
|
|
**Fix**: Verify file exists at `../Box-config.json` relative to Python-Version directory
|
|
|
|
### "mTLS connection refused"
|
|
**Cause**: IP not whitelisted or wrong base URL
|
|
**Fix**: Check `DAM_MTLS_BASE_URL` uses the correct mTLS endpoint and verify IP is whitelisted
|
|
|
|
### "No A1 campaigns found"
|
|
**Expected behavior**: Scripts check every 5 minutes via cron. This message is normal when no campaigns are in A1 status.
|
|
|
|
### Database connection fails
|
|
**Fix**: Verify PostgreSQL is running on port 5437 (not default 5432)
|
|
```bash
|
|
docker ps | grep ferrero
|
|
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;"
|
|
```
|
|
|
|
### Emails not sending
|
|
**Fix**: Check SMTP configuration in .env, verify Mailgun credentials
|
|
|
|
## Additional Documentation
|
|
|
|
- **README.md**: Complete user documentation with deployment guide
|
|
- **DEPLOYMENT_GUIDE.md**: Server deployment step-by-step
|
|
- **DATABASE_SCHEMA.md**: Full schema reference with field descriptions
|
|
- **database/init.sql**: Complete database schema with indexes
|