Adds upsert logic that marks old records as 'superseded' while creating new 'active' records, preserving full history for audit/analysis.

Changes:
- Updated store_creativex_score() to check for existing filename
- Old records marked status='superseded' before inserting new 'active' record
- Returns is_update flag to indicate if this was an update vs new insert
- Logs score changes (e.g., "Score: 80.0 -> 85.0")

Documentation updates:
- Added "Understanding Status Field" section with soft delete explanation
- Separated queries into "Latest Scores" vs "History/Audit" sections
- Added A2→A3 integration guide with example code
- Documented query logic and behavior table for future integration
- Added migration notes for existing data

Query patterns for A2→A3:
- status='active' → Latest/current score (use this in workflows)
- status='superseded' → Previous scores (history/audit trail)
- get_creativex_score_by_filename() automatically filters for active

Benefits:
- Easy lookup of latest scores (just filter status='active')
- Full history preserved for tracking score changes over time
- No data loss when files are re-scored
- Clear audit trail of when scores changed

Tested and verified:
- Existing record (80.0) marked as superseded
- New record (85.0) created as active
- Queries correctly return only active record

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
CreativeX Score Extraction - Deployment Guide
Overview
This guide covers deploying the CreativeX score extraction system, which:
- Monitors Box folder 350605024645 for PDF files
- Extracts CreativeX scores using LlamaExtract AI agent "Creativex-Extract"
- Stores results in PostgreSQL database with full JSON
- Removes processed files from Box
- Sends email notifications
Local Development Setup
1. Add Environment Variable
Add to your .env file:
# Box Folder Configuration (add to existing Box section)
BOX_ROOT_FOLDER_CREATIVEX=350605024645
# CreativeX Configuration
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key_here
CREATIVEX_AGENT_NAME=Creativex-Extract
2. Install Python Dependencies
cd Python-Version
source venv/bin/activate
pip install llama-cloud-services
Or install all dependencies:
pip install -r requirements.txt
3. Create Database Table
If starting fresh (full init):
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql
If database already exists (add table only):
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
CREATE TABLE IF NOT EXISTS creativex_scores (
id SERIAL PRIMARY KEY,
filename VARCHAR(500) NOT NULL,
box_file_id VARCHAR(255),
creativex_id VARCHAR(255),
creativex_url TEXT,
quality_score VARCHAR(50),
full_extraction_data JSONB,
extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50) DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
"
4. Verify Table Creation
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "\d creativex_scores"
You should see:
- 10 columns (id, filename, box_file_id, creativex_id, creativex_url, quality_score, full_extraction_data, extracted_at, status, created_at)
- 3 indexes (idx_creativex_filename, idx_creativex_box_file, idx_creativex_status)
5. Test Locally
# Run the script manually
python scripts/creativex_scoring_storing.py
Expected behaviors:
- If no PDFs in Box folder 350605024645: "No PDF files found" email sent
- If PDFs present: Extraction runs, scores stored, files deleted from Box
- If extraction fails: Partial success email with errors
Production Server Deployment
Prerequisites
- Server already running Ferrero automation (A1→A2, A5→A6, etc.)
- Git repository backed up to Bitbucket
- SSH access to production server
Step 1: Update .env on Server
SSH to server and add:
cd /opt/ferrero-automation/Python-Version
nano .env
Add:
# Box Folder Configuration (add to existing Box section)
BOX_ROOT_FOLDER_CREATIVEX=350605024645
# CreativeX Configuration
LLAMA_CLOUD_API_KEY=your_production_llama_cloud_api_key
CREATIVEX_AGENT_NAME=Creativex-Extract
Save and exit (Ctrl+X, Y, Enter).
Step 2: Pull Latest Code
cd /opt/ferrero-automation/Python-Version
git pull origin main
This will include:
scripts/creativex_scoring_storing.py - Updated
database/init.sql - Updated
scripts/shared/database.py - Updated
scripts/shared/notifier.py - Updated
config/config.yaml - Updated
requirements.txt
Step 3: Install Dependencies
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
pip install llama-cloud-services
Or update all:
pip install -r requirements.txt --upgrade
Step 4: Create Database Table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
CREATE TABLE IF NOT EXISTS creativex_scores (
id SERIAL PRIMARY KEY,
filename VARCHAR(500) NOT NULL,
box_file_id VARCHAR(255),
creativex_id VARCHAR(255),
creativex_url TEXT,
quality_score VARCHAR(50),
full_extraction_data JSONB,
extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50) DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
"
Note on Existing Data: If you already have records in the table from testing, they will have status = 'active' by default. This is correct - they are the current versions. When you re-upload the same filename, the system will mark the old record as superseded and create a new active record automatically.
Step 5: Verify Installation
# Check database table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT COUNT(*) FROM creativex_scores;"
# Check script exists
ls -lh scripts/creativex_scoring_storing.py
# Make sure it's executable
chmod +x scripts/creativex_scoring_storing.py
Step 6: Test Run
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
python scripts/creativex_scoring_storing.py
Check logs:
tail -f logs/creativex_scoring.log
Step 7: Add to Cron (Optional - If Automated)
Note: The script is run manually for now, so skip this step initially.
If you want to automate later (e.g., every hour):
crontab -e
Add:
# CreativeX Score Extraction - Every hour
0 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/creativex_scoring_storing.py >> logs/cron_creativex.log 2>&1
Save and exit.
Configuration Details
Environment Variables
All configuration is centralized in .env:
# Box folder for CreativeX PDFs
BOX_ROOT_FOLDER_CREATIVEX=350605024645
# LlamaCloud API credentials
LLAMA_CLOUD_API_KEY=your_api_key_here
# Agent name in LlamaExtract
CREATIVEX_AGENT_NAME=Creativex-Extract
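As a rough illustration of how a script might read these three settings, here is a minimal sketch. It assumes the .env file has already been loaded into the process environment; the `CreativeXConfig` class and `load_creativex_config` function are hypothetical names, not taken from the project, and the fallback values simply mirror the defaults listed in this guide.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CreativeXConfig:
    """Hypothetical container for the three CreativeX settings."""
    box_folder_id: str
    llama_api_key: str
    agent_name: str


def load_creativex_config(env=None):
    """Read the settings from a mapping, failing fast if the API key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("LLAMA_CLOUD_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("LLAMA_CLOUD_API_KEY is not set")
    return CreativeXConfig(
        # Fallbacks mirror the defaults documented in this guide
        box_folder_id=env.get("BOX_ROOT_FOLDER_CREATIVEX", "350605024645"),
        llama_api_key=api_key,
        agent_name=env.get("CREATIVEX_AGENT_NAME", "Creativex-Extract"),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without touching the real environment.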
Box Folder
- Folder ID: Configured via BOX_ROOT_FOLDER_CREATIVEX (default: 350605024645)
- Purpose: Drop PDFs here for CreativeX score extraction
- Behavior: Files are automatically deleted after successful processing
LlamaExtract Agent
- Agent Name: Configured via CREATIVEX_AGENT_NAME (default: Creativex-Extract)
- Expected Fields:
  - filename: Original filename from PDF
  - creativeXId.id: CreativeX identifier
  - creativeXId.url: CreativeX URL
  - ferreroCreativeQuality.percentage: Quality score
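The expected fields are dotted paths into the agent's output, while the table stores them as flat columns. The sketch below shows one way such a mapping could look. It assumes the extraction result is a nested dict keyed exactly as listed above; `get_path` and `to_quick_access_fields` are hypothetical helpers, not functions from the project.

```python
def get_path(data, dotted_path, default=None):
    """Follow a dotted path such as 'creativeXId.id' through nested dicts."""
    current = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def to_quick_access_fields(extraction):
    """Map the nested extraction output onto the flat quick-access columns."""
    score = get_path(extraction, "ferreroCreativeQuality.percentage")
    return {
        "filename": get_path(extraction, "filename"),
        "creativex_id": get_path(extraction, "creativeXId.id"),
        "creativex_url": get_path(extraction, "creativeXId.url"),
        # quality_score is a VARCHAR column, so the score is stored as text
        "quality_score": None if score is None else str(score),
    }
```

Missing fields come back as `None` rather than raising, so a partially extracted PDF can still be inspected before deciding whether to store it.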
Database Storage
- Table: creativex_scores
- Quick Access Fields: filename, creativex_id, creativex_url, quality_score
- Full JSON: Stored in full_extraction_data JSONB column
- Purpose: Future lookups by filename during DAM uploads
Email Notifications
Recipients configured in .env:
- Success: REPORT_EMAILS
- Errors: ERROR_EMAIL
Templates:
- creativex_complete - All files processed successfully
- creativex_partial - Some files failed
- creativex_no_files - No PDFs found (normal if folder empty)
Usage
Manual Execution
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
python scripts/creativex_scoring_storing.py
Workflow
- Upload PDFs to Box folder 350605024645
- Run script (manual or cron)
- Script downloads each PDF
- LlamaExtract processes PDF
- Results stored in database
- PDF deleted from Box
- Email notification sent
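The steps above can be sketched as a small loop. This is an illustration only, not the script's actual code: `extract`, `store` and `delete` are hypothetical callables standing in for the LlamaExtract call, the database write and the Box delete, and the template choice assumes that a run with any failure uses creativex_partial.

```python
def run_batch(pdf_files, extract, store, delete):
    """Process each PDF; delete it from the source folder only on success."""
    succeeded, failed = [], []
    for pdf in pdf_files:
        try:
            store(pdf, extract(pdf))
            delete(pdf)  # reached only if extraction and storage both worked
            succeeded.append(pdf)
        except Exception as exc:  # keep going so one bad PDF does not stop the run
            failed.append((pdf, str(exc)))
    return succeeded, failed


def choose_template(succeeded, failed):
    """Pick one of the notification templates named in this guide."""
    if not succeeded and not failed:
        return "creativex_no_files"
    return "creativex_partial" if failed else "creativex_complete"
```

Because the delete call sits after the store call inside the `try`, a failed extraction leaves the PDF in the folder for manual review or retry, which matches the behavior described under Troubleshooting.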
Checking Results
IMPORTANT: Understanding Status Field
The system uses soft delete to preserve history while keeping latest scores easily accessible:
- status = 'active' → Latest/current score for this filename
- status = 'superseded' → Previous score (history/audit trail)
When you re-upload the same filename with a new score, the old record is marked superseded and a new active record is created.
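To make the supersede-then-insert behavior concrete, here is a minimal sketch. It uses Python's built-in sqlite3 with an in-memory table as a stand-in for PostgreSQL so that it runs anywhere; the `store_score` function, the reduced column set and the sample values are assumptions for illustration and are not the project's store_creativex_score() implementation.

```python
import sqlite3


def store_score(conn, filename, quality_score):
    """Supersede any active row for filename, then insert the new active row.

    Returns True when an earlier active row existed (an update), else False.
    """
    with conn:  # one transaction: both statements commit together or not at all
        cur = conn.execute(
            "UPDATE creativex_scores SET status = 'superseded' "
            "WHERE filename = ? AND status = 'active'",
            (filename,),
        )
        is_update = cur.rowcount > 0
        conn.execute(
            "INSERT INTO creativex_scores (filename, quality_score, status) "
            "VALUES (?, ?, 'active')",
            (filename, quality_score),
        )
    return is_update


# In-memory stand-in table with only the columns this sketch needs
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE creativex_scores ("
    "id INTEGER PRIMARY KEY, filename TEXT NOT NULL, "
    "quality_score TEXT, status TEXT DEFAULT 'active')"
)
first = store_score(conn, "yourfile.mp4", "80.0")   # new insert
second = store_score(conn, "yourfile.mp4", "85.0")  # re-score of the same file
rows = conn.execute(
    "SELECT quality_score, status FROM creativex_scores ORDER BY id"
).fetchall()
```

Running the update and the insert in a single transaction means there is never a moment where a filename has two active rows or none at all.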
Query for Latest Scores (Most Common):
# View recent ACTIVE extractions (latest scores only)
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, creativex_id, quality_score, extracted_at
FROM creativex_scores
WHERE status = 'active'
ORDER BY extracted_at DESC
LIMIT 10;
"
# Count total ACTIVE scores (unique filenames with latest scores)
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT COUNT(*) as active_scores FROM creativex_scores WHERE status = 'active';
"
# Get latest score for specific filename (use this in A2→A3 workflow)
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, creativex_id, creativex_url, quality_score, extracted_at
FROM creativex_scores
WHERE filename = 'yourfile.mp4' AND status = 'active';
"
Query for History/Audit (All Versions):
# View ALL versions of a file (including superseded)
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, quality_score, status, extracted_at
FROM creativex_scores
WHERE filename = 'yourfile.mp4'
ORDER BY extracted_at DESC;
"
# Count total records (including history)
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT
COUNT(*) as total_records,
COUNT(*) FILTER (WHERE status = 'active') as active_records,
COUNT(*) FILTER (WHERE status = 'superseded') as superseded_records
FROM creativex_scores;
"
# See score changes over time for a file
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT
filename,
quality_score,
status,
extracted_at,
CASE
WHEN status = 'active' THEN 'CURRENT'
ELSE 'OLD VERSION'
END as version_label
FROM creativex_scores
WHERE filename LIKE '%Nutella%'
ORDER BY filename, extracted_at DESC;
"
Viewing Full JSON
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, full_extraction_data::jsonb
FROM creativex_scores
WHERE filename = 'example.pdf';
"
Future Integration: A2→A3 Workflow
How to Use in DAM Upload Scripts
The database method db.get_creativex_score_by_filename(filename) is ready for use in other scripts.
IMPORTANT: The method automatically filters for status = 'active' to always return the latest score.
Example usage in a2_to_a3_upload_polling.py:
# In a2_to_a3_upload_polling.py or similar
filename = "Brand_Country_Language_123456.mp4"
# Lookup CreativeX score (returns ONLY active/latest score)
score_data = db.get_creativex_score_by_filename(filename)
if score_data:
    # Add to DAM metadata
    dam_metadata['FERRERO.FIELD.CREATIVEX_SCORE'] = score_data['quality_score']
    dam_metadata['FERRERO.FIELD.CREATIVEX_URL'] = score_data['creativex_url']
    dam_metadata['FERRERO.FIELD.CREATIVEX_ID'] = score_data['creativex_id']
    # Optional: Access full JSON for additional fields
    full_data = score_data['full_extraction_data']
    dam_metadata['FERRERO.FIELD.CREATIVEX_BRAND'] = full_data['data']['brand']
    dam_metadata['FERRERO.FIELD.CREATIVEX_MARKET'] = full_data['data']['market']
    logger.info("Added CreativeX score {} to DAM metadata".format(
        score_data['quality_score']
    ))
else:
    logger.warning("No CreativeX score found for: {}".format(filename))
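If the optional brand or market values are missing from the stored JSON, direct indexing as in the example above would raise a KeyError. A more defensive variant might look like the sketch below. `build_creativex_metadata` is a hypothetical helper, and the nested 'data' layout is taken from the example above rather than confirmed against real extraction output.

```python
def build_creativex_metadata(score_data):
    """Return DAM fields for a score row, skipping values that are absent."""
    if not score_data:
        return {}
    metadata = {
        "FERRERO.FIELD.CREATIVEX_SCORE": score_data.get("quality_score"),
        "FERRERO.FIELD.CREATIVEX_URL": score_data.get("creativex_url"),
        "FERRERO.FIELD.CREATIVEX_ID": score_data.get("creativex_id"),
    }
    # Assumed layout: optional fields live under full_extraction_data['data']
    extra = (score_data.get("full_extraction_data") or {}).get("data") or {}
    metadata["FERRERO.FIELD.CREATIVEX_BRAND"] = extra.get("brand")
    metadata["FERRERO.FIELD.CREATIVEX_MARKET"] = extra.get("market")
    # Drop anything that was not found so the DAM never receives empty fields
    return {key: value for key, value in metadata.items() if value is not None}
```

The result can be merged into an existing metadata dict with `dam_metadata.update(build_creativex_metadata(score_data))`, which is a no-op when no score was found.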
Query Logic in get_creativex_score_by_filename()
The method uses this query internally:
SELECT filename, creativex_id, creativex_url, quality_score,
box_file_id, full_extraction_data, extracted_at
FROM creativex_scores
WHERE filename = %s AND status = 'active'
ORDER BY extracted_at DESC
LIMIT 1
This ensures you always get the latest score, even if multiple versions exist in history.
Behavior Summary for A2→A3 Integration
| Scenario | What Happens |
|---|---|
| Score exists for filename | Returns latest active score |
| Multiple scores exist (history) | Returns only the newest active one |
| No score exists | Returns None |
| File re-scored (same filename) | Old score marked superseded, new score is active |
Key Takeaway: You never need to worry about duplicates or history in A2→A3 workflow. The query automatically handles it.
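The scenarios in the table can be checked with a small stand-in. The sketch below runs a query of the same shape against an in-memory sqlite3 table (with `?` placeholders instead of `%s`); `get_active_score`, the reduced column set and the sample rows are illustrative assumptions, not the project's database layer.

```python
import sqlite3

QUERY = (
    "SELECT filename, quality_score, extracted_at FROM creativex_scores "
    "WHERE filename = ? AND status = 'active' "
    "ORDER BY extracted_at DESC LIMIT 1"
)


def get_active_score(conn, filename):
    """Return the active row for filename as a dict, or None if there is none."""
    row = conn.execute(QUERY, (filename,)).fetchone()
    if row is None:
        return None
    return {"filename": row[0], "quality_score": row[1], "extracted_at": row[2]}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE creativex_scores (filename TEXT, quality_score TEXT, "
    "status TEXT, extracted_at TEXT)"
)
# Sample history: one superseded score and one active re-score for the same file
conn.executemany(
    "INSERT INTO creativex_scores VALUES (?, ?, ?, ?)",
    [
        ("a.mp4", "80.0", "superseded", "2025-01-01 10:00:00"),
        ("a.mp4", "85.0", "active", "2025-01-02 10:00:00"),
    ],
)
latest = get_active_score(conn, "a.mp4")          # active row only
missing = get_active_score(conn, "unknown.mp4")   # no row for this filename
```

The `ORDER BY ... LIMIT 1` clause is a safety net: even if two active rows ever existed for one filename, the caller would still receive only the newest.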
Troubleshooting
"llama-cloud-services not installed"
source venv/bin/activate
pip install llama-cloud-services
"Agent 'Creativex-Extract' not found"
- Verify agent name in LlamaCloud portal
- Check spelling matches exactly: Creativex-Extract
- Verify API key is correct
"No PDF files found"
- This is normal if Box folder 350605024645 is empty
- Upload test PDF to folder and re-run
"Database connection failed"
# Check PostgreSQL is running
docker ps | grep ferrero
# Test connection
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;"
"Email not sending"
- Check SMTP configuration in .env
- Verify Mailgun credentials
- Check logs for detailed error
Files not deleted from Box
- This is expected for failed extractions
- Only successful extractions delete files
- Failed files remain for manual review/retry
Rollback Instructions
If you need to roll back:
Remove Database Table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
DROP TABLE IF EXISTS creativex_scores CASCADE;
"
Remove from Cron
crontab -e
# Delete the CreativeX line, save and exit
Revert Code
cd /opt/ferrero-automation/Python-Version
git revert <commit-hash>
git push origin main
Support
- Logs: logs/creativex_scoring.log
- Database Queries: See "Checking Results" section above
- Email Test: Check SMTP settings and recipients list
- LlamaCloud Issues: Verify API key and agent configuration
Summary Checklist
Local Setup:
- Add LLAMA_CLOUD_API_KEY to .env
- Install llama-cloud-services package
- Create creativex_scores table
- Test script runs successfully
Production Deployment:
- Git pull latest code
- Add LLAMA_CLOUD_API_KEY to server .env
- Install dependencies on server
- Create database table on server
- Test run on server
- Verify email notifications
- (Optional) Add to cron if automating
Post-Deployment:
- Upload test PDF to Box folder 350605024645
- Run script and verify extraction
- Check database record created
- Verify PDF deleted from Box
- Confirm email notification received