CreativeX Score Extraction - Deployment Guide
Overview
This guide covers deploying the CreativeX score extraction system, which:
- Monitors Box folder 350605024645 for PDF files
- Extracts CreativeX scores using LlamaExtract AI agent "Creativex-Extract"
- Stores results in PostgreSQL database with full JSON
- Removes processed files from Box
- Sends email notifications
Local Development Setup
1. Add Environment Variable
Add to your .env file:
# Box Folder Configuration (add to existing Box section)
BOX_ROOT_FOLDER_CREATIVEX=350605024645
# CreativeX Configuration
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key_here
CREATIVEX_AGENT_NAME=Creativex-Extract
2. Install Python Dependencies
cd Python-Version
source venv/bin/activate
pip install llama-cloud-services
Or install all dependencies:
pip install -r requirements.txt
3. Create Database Table
If starting fresh (full init):
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql
If database already exists (add table only):
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
CREATE TABLE IF NOT EXISTS creativex_scores (
id SERIAL PRIMARY KEY,
filename VARCHAR(500) NOT NULL,
box_file_id VARCHAR(255),
creativex_id VARCHAR(255),
creativex_url TEXT,
quality_score VARCHAR(50),
full_extraction_data JSONB,
extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50) DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
"
4. Verify Table Creation
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "\d creativex_scores"
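You should see:
- 10 columns (id, filename, box_file_id, creativex_id, creativex_url, quality_score, full_extraction_data, extracted_at, status, created_at)
- 3 secondary indexes (idx_creativex_filename, idx_creativex_box_file, idx_creativex_status), plus the primary key index creativex_scores_pkey
- If the table was created with the production schema (Step 4 of Production Server Deployment), it also has a tracking_id column and an idx_creativex_tracking_id index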
5. Test Locally
# Run the script manually
python scripts/creativex_scoring_storing.py
Expected behaviors:
- If no PDFs in Box folder 350605024645: "No PDF files found" email sent
- If PDFs present: Extraction runs, scores stored, files deleted from Box
- If extraction fails: Partial success email with errors
Production Server Deployment
Prerequisites
- Server already running Ferrero automation (A1→A2, A5→A6, etc.)
- Git repository backed up to Bitbucket
- SSH access to production server
Step 1: Update .env on Server
SSH to server and add:
cd /opt/ferrero-automation/Python-Version
nano .env
Add:
# Box Folder Configuration (add to existing Box section)
BOX_ROOT_FOLDER_CREATIVEX=350605024645
# CreativeX Configuration
LLAMA_CLOUD_API_KEY=your_production_llama_cloud_api_key
CREATIVEX_AGENT_NAME=Creativex-Extract
Save and exit (Ctrl+X, Y, Enter).
Step 2: Pull Latest Code
cd /opt/ferrero-automation/Python-Version
git pull origin main
This will include:
- scripts/creativex_scoring_storing.py - Updated
- database/init.sql - Updated
- scripts/shared/database.py - Updated
- scripts/shared/notifier.py - Updated
- config/config.yaml - Updated
- requirements.txt
Step 3: Install Dependencies
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
pip install llama-cloud-services
Or update all:
pip install -r requirements.txt --upgrade
Step 4: Create Database Table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
CREATE TABLE IF NOT EXISTS creativex_scores (
id SERIAL PRIMARY KEY,
filename VARCHAR(500) NOT NULL,
box_file_id VARCHAR(255),
creativex_id VARCHAR(255),
creativex_url TEXT,
quality_score VARCHAR(50),
full_extraction_data JSONB,
tracking_id VARCHAR(6),
extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50) DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
CREATE INDEX IF NOT EXISTS idx_creativex_tracking_id ON creativex_scores(tracking_id);
"
If Table Already Exists (Migration):
# Add tracking_id column to existing table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
ALTER TABLE creativex_scores ADD COLUMN IF NOT EXISTS tracking_id VARCHAR(6);
CREATE INDEX IF NOT EXISTS idx_creativex_tracking_id ON creativex_scores(tracking_id);
"
Note on Status Values:
- active - Current derivative score (from PDF extraction)
- superseded - Old derivative score (version history)
- master-cx-score - Master asset score (from A1→A2 DAM metadata, reference only)
Step 5: Verify Installation
# Check database table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT COUNT(*) FROM creativex_scores;"
# Check script exists
ls -lh scripts/creativex_scoring_storing.py
# Make it executable
chmod +x scripts/creativex_scoring_storing.py
Step 6: Test Run
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
python scripts/creativex_scoring_storing.py
Check logs:
tail -f logs/creativex_scoring.log
Step 7: Add to Cron (Optional - If Automated)
Note: The script is run manually for now, so skip this step initially.
If you want to automate later (e.g., every hour):
crontab -e
Add:
# CreativeX Score Extraction - Every hour
0 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/creativex_scoring_storing.py >> logs/cron_creativex.log 2>&1
Save and exit.
Configuration Details
Environment Variables
All configuration is centralized in .env:
# Box folder for CreativeX PDFs
BOX_ROOT_FOLDER_CREATIVEX=350605024645
# LlamaCloud API credentials
LLAMA_CLOUD_API_KEY=your_api_key_here
# Agent name in LlamaExtract
CREATIVEX_AGENT_NAME=Creativex-Extract
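As a rough illustration of how a script might consume these settings, the sketch below reads the three variables with the defaults named in this guide. It assumes the .env values have already been loaded into the process environment (for example by a dotenv loader); the function name load_creativex_config and the returned dictionary shape are hypothetical, not taken from the actual script.

```python
import os

# Defaults as documented in this guide.
DEFAULT_BOX_FOLDER = "350605024645"
DEFAULT_AGENT_NAME = "Creativex-Extract"


def load_creativex_config(env=None):
    """Read CreativeX settings from a mapping (os.environ by default).

    Hypothetical helper: the real script may structure this differently.
    """
    env = os.environ if env is None else env
    api_key = env.get("LLAMA_CLOUD_API_KEY", "").strip()
    if not api_key:
        # Fail early: extraction cannot run without an API key.
        raise RuntimeError("LLAMA_CLOUD_API_KEY is not set")
    return {
        "box_folder_id": env.get("BOX_ROOT_FOLDER_CREATIVEX", DEFAULT_BOX_FOLDER),
        "agent_name": env.get("CREATIVEX_AGENT_NAME", DEFAULT_AGENT_NAME),
        "api_key": api_key,
    }
```

Passing a plain dictionary instead of os.environ makes the helper easy to exercise without touching the real environment.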
Box Folder
- Folder ID: Configured via BOX_ROOT_FOLDER_CREATIVEX (default: 350605024645)
- Purpose: Drop PDFs here for CreativeX score extraction
- Behavior: Files are automatically deleted after successful processing
LlamaExtract Agent
- Agent Name: Configured via CREATIVEX_AGENT_NAME (default: Creativex-Extract)
- Expected Fields:
  - filename: Original filename from PDF
  - creativeXId.id: CreativeX identifier
  - creativeXId.url: CreativeX URL
  - ferreroCreativeQuality.percentage: Quality score
Database Storage
- Table: creativex_scores
- Quick Access Fields: filename, creativex_id, creativex_url, quality_score
- Full JSON: Stored in full_extraction_data JSONB column
- Purpose: Future lookups by filename during DAM uploads
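The mapping from the agent's expected fields to the quick-access columns could look roughly like the sketch below. It assumes the dotted field names denote nested objects and that the function receives the extracted data dictionary itself; flatten_extraction and fallback_filename are hypothetical names used only for illustration.

```python
def flatten_extraction(result, fallback_filename=None):
    """Map nested agent output onto the quick-access columns.

    Assumes `result` is a dict shaped like the expected fields listed above.
    """
    cx = result.get("creativeXId") or {}
    quality = result.get("ferreroCreativeQuality") or {}
    score = quality.get("percentage")
    return {
        "filename": result.get("filename") or fallback_filename,
        "creativex_id": cx.get("id"),
        "creativex_url": cx.get("url"),
        # quality_score is VARCHAR(50) in the table, so keep it as text.
        "quality_score": None if score is None else str(score),
        # The untouched result goes into the JSONB column.
        "full_extraction_data": result,
    }
```

Missing fields become None rather than raising, so a partially extracted PDF can still be stored with its full JSON for later inspection.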
Email Notifications
Recipients configured in .env:
- Success: REPORT_EMAILS
- Errors: ERROR_EMAIL
Templates:
- creativex_complete - All files processed successfully
- creativex_partial - Some files failed
- creativex_no_files - No PDFs found (normal if folder empty)
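The choice between the three templates can be sketched as a small function of the run's counts. This is an illustration only: choose_template is a hypothetical name, and treating a run where every file failed as "partial" is an assumption based on the template descriptions above.

```python
def choose_template(total_files, failed_files):
    """Pick the notification template name for a finished run."""
    if total_files == 0:
        # Empty folder: a normal outcome, not an error.
        return "creativex_no_files"
    if failed_files > 0:
        return "creativex_partial"
    return "creativex_complete"
```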
Usage
Manual Execution
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
python scripts/creativex_scoring_storing.py
Workflow
CreativeX PDF Extraction (Manual):
- Upload PDFs to Box folder 350605024645
- Run script: python scripts/creativex_scoring_storing.py
- Script downloads each PDF
- LlamaExtract processes PDF
- Results stored in database with status='active'
- PDF deleted from Box
- Email notification sent
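The per-file part of this workflow can be sketched as a loop that deletes a PDF only after its result has been stored. The Box, LlamaExtract, and database calls are passed in as plain callables here, so nothing below reflects the real client APIs; process_pdfs and its parameters are hypothetical.

```python
def process_pdfs(files, download, extract, store, delete):
    """Run download -> extract -> store -> delete for each (file_id, name).

    A failure at any step leaves the file in place and records the error,
    so it can be reviewed or retried on the next run.
    """
    processed, failed = [], []
    for file_id, name in files:
        try:
            pdf_bytes = download(file_id)
            result = extract(name, pdf_bytes)
            store(name, file_id, result)
            # Only reached when extraction and storage both succeeded.
            delete(file_id)
        except Exception as exc:
            failed.append((name, str(exc)))
            continue
        processed.append(name)
    return processed, failed
```

The two returned lists are enough to decide which notification to send and which files remain in the folder.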
Master Asset CreativeX (Automatic):
- A1→A2 downloads master asset from DAM
- If master has CreativeX score/URL in metadata:
  - Extracts score and URL
  - Stores in database with status='master-cx-score'
  - Links via tracking_id
  - Used for reference/reporting only (not used in A2→A3 uploads)
- Logs "No CreativeX data" if master not scored (normal)
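A minimal sketch of the master-score branch follows. It assumes the master's DAM metadata uses the same field keys as the upload example in this guide (FERRERO.FIELD.CREATIVEX_SCORE and FERRERO.FIELD.CREATIVEX_URL); that assumption, along with the name master_score_row, is illustrative and may not match the A1→A2 script.

```python
# Assumed metadata keys; the real master metadata may use different names.
SCORE_KEY = "FERRERO.FIELD.CREATIVEX_SCORE"
URL_KEY = "FERRERO.FIELD.CREATIVEX_URL"


def _present(value):
    """True when a metadata value is neither None nor an empty string."""
    return value is not None and value != ""


def master_score_row(tracking_id, filename, dam_metadata):
    """Build a 'master-cx-score' row, or None when the master is not scored."""
    score = dam_metadata.get(SCORE_KEY)
    url = dam_metadata.get(URL_KEY)
    if not _present(score) and not _present(url):
        return None  # caller logs "No CreativeX data" and moves on
    return {
        "filename": filename,
        "tracking_id": tracking_id,
        "quality_score": str(score) if _present(score) else None,
        "creativex_url": url if _present(url) else None,
        "status": "master-cx-score",
    }
```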
Checking Results
IMPORTANT: Understanding Status Field
The system uses soft delete to preserve history while keeping latest scores easily accessible:
- status = 'active' → Latest/current derivative score (from PDF extraction)
- status = 'superseded' → Previous derivative score (history/audit trail)
- status = 'master-cx-score' → Master asset score (from A1→A2, reference only)
Status Usage:
- Derivative scores (PDF extraction): When you re-upload the same filename with a new score, the old record is marked superseded and a new active record is created.
- Master scores (A1→A2): Stored with master-cx-score status and linked via tracking_id. Not used for uploads, only for reference/reporting.
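The supersede-then-insert behavior for derivative scores can be illustrated with a self-contained sketch. To stay runnable without a PostgreSQL server it uses an in-memory SQLite database and a reduced table; store_derivative_score and demo are hypothetical names, and the production code in scripts/shared/database.py may differ.

```python
import sqlite3


def store_derivative_score(conn, filename, quality_score):
    """Mark any active row for this filename superseded, then insert the new one."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "UPDATE creativex_scores SET status = 'superseded' "
            "WHERE filename = ? AND status = 'active'",
            (filename,),
        )
        conn.execute(
            "INSERT INTO creativex_scores (filename, quality_score, status) "
            "VALUES (?, ?, 'active')",
            (filename, quality_score),
        )


def demo():
    """Store two scores for one file and return every version, oldest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE creativex_scores ("
        "id INTEGER PRIMARY KEY, filename TEXT NOT NULL, "
        "quality_score TEXT, status TEXT DEFAULT 'active')"
    )
    store_derivative_score(conn, "yourfile.mp4", "72")
    store_derivative_score(conn, "yourfile.mp4", "85")
    return conn.execute(
        "SELECT quality_score, status FROM creativex_scores ORDER BY id"
    ).fetchall()
```

Because the UPDATE filters on status = 'active', rows with status 'master-cx-score' are never touched by a re-upload.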
Query for Latest Scores (Most Common):
# View recent ACTIVE extractions (latest scores only)
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, creativex_id, quality_score, extracted_at
FROM creativex_scores
WHERE status = 'active'
ORDER BY extracted_at DESC
LIMIT 10;
"
# Count total ACTIVE scores (unique filenames with latest scores)
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT COUNT(*) as active_scores FROM creativex_scores WHERE status = 'active';
"
# Get latest score for specific filename (use this in A2→A3 workflow)
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, creativex_id, creativex_url, quality_score, extracted_at
FROM creativex_scores
WHERE filename = 'yourfile.mp4' AND status = 'active';
"
Query for Master Scores (Reference/Reporting):
# Get master score for specific tracking ID
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, quality_score, tracking_id, creativex_url
FROM creativex_scores
WHERE tracking_id = '7xXgKp' AND status = 'master-cx-score';
"
# View all master scores
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT tracking_id, filename, quality_score, extracted_at
FROM creativex_scores
WHERE status = 'master-cx-score'
ORDER BY extracted_at DESC
LIMIT 10;
"
Query for History/Audit (All Versions):
# View ALL versions of a file (including superseded)
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, quality_score, status, tracking_id, extracted_at
FROM creativex_scores
WHERE filename = 'yourfile.mp4'
ORDER BY extracted_at DESC;
"
# Count total records by status
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT
COUNT(*) as total_records,
COUNT(*) FILTER (WHERE status = 'active') as active_derivative_scores,
COUNT(*) FILTER (WHERE status = 'superseded') as superseded_records,
COUNT(*) FILTER (WHERE status = 'master-cx-score') as master_scores
FROM creativex_scores;
"
# See score changes over time for a file
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT
filename,
quality_score,
status,
extracted_at,
CASE
WHEN status = 'active' THEN 'CURRENT'
WHEN status = 'master-cx-score' THEN 'MASTER (REFERENCE)'
ELSE 'OLD VERSION'
END as version_label
FROM creativex_scores
WHERE filename LIKE '%Nutella%'
ORDER BY filename, extracted_at DESC;
"
Viewing Full JSON
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, jsonb_pretty(full_extraction_data)
FROM creativex_scores
WHERE filename = 'example.pdf';
"
Future Integration: A2→A3 Workflow
How to Use in DAM Upload Scripts
The database method db.get_creativex_score_by_filename(filename) is ready for use in other scripts.
IMPORTANT: The method automatically filters for status = 'active' to always return the latest score.
Example usage in a2_to_a3_upload_polling.py:
# In a2_to_a3_upload_polling.py or similar
filename = "Brand_Country_Language_123456.mp4"
# Lookup CreativeX score (returns ONLY active/latest score)
score_data = db.get_creativex_score_by_filename(filename)
if score_data:
    # Add to DAM metadata
    dam_metadata['FERRERO.FIELD.CREATIVEX_SCORE'] = score_data['quality_score']
    dam_metadata['FERRERO.FIELD.CREATIVEX_URL'] = score_data['creativex_url']
    dam_metadata['FERRERO.FIELD.CREATIVEX_ID'] = score_data['creativex_id']
    # Optional: Access full JSON for additional fields
    full_data = score_data['full_extraction_data']
    dam_metadata['FERRERO.FIELD.CREATIVEX_BRAND'] = full_data['data']['brand']
    dam_metadata['FERRERO.FIELD.CREATIVEX_MARKET'] = full_data['data']['market']
    logger.info("Added CreativeX score {} to DAM metadata".format(
        score_data['quality_score']
    ))
else:
    logger.warning("No CreativeX score found for: {}".format(filename))
Query Logic in get_creativex_score_by_filename()
The method uses this query internally:
SELECT filename, creativex_id, creativex_url, quality_score,
box_file_id, full_extraction_data, extracted_at
FROM creativex_scores
WHERE filename = %s AND status = 'active'
ORDER BY extracted_at DESC
LIMIT 1
This ensures you always get the latest score, even if multiple versions exist in history.
Behavior Summary for A2→A3 Integration
| Scenario | What Happens |
|---|---|
| Score exists for filename | Returns latest active score |
| Multiple scores exist (history) | Returns only the newest active one |
| No score exists | Returns None |
| File re-scored (same filename) | Old score marked superseded, new score is active |
Key Takeaway: The A2→A3 workflow never needs to handle duplicates or history itself; the query filters them out automatically.
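For the "No score exists" row and for optional JSON fields that may be absent, a defensive variant of the metadata mapping could look like the sketch below. It assumes the lookup returns either None or a dict whose full_extraction_data is already parsed into a dict; build_creativex_metadata is a hypothetical helper, not part of the existing scripts.

```python
def build_creativex_metadata(score_data):
    """Return DAM metadata fields for a lookup result, or {} when no score exists."""
    if not score_data:
        return {}
    fields = {
        "FERRERO.FIELD.CREATIVEX_SCORE": score_data.get("quality_score"),
        "FERRERO.FIELD.CREATIVEX_URL": score_data.get("creativex_url"),
        "FERRERO.FIELD.CREATIVEX_ID": score_data.get("creativex_id"),
    }
    # Optional fields: 'data', 'brand' and 'market' may be missing from the JSON.
    extra = (score_data.get("full_extraction_data") or {}).get("data") or {}
    fields["FERRERO.FIELD.CREATIVEX_BRAND"] = extra.get("brand")
    fields["FERRERO.FIELD.CREATIVEX_MARKET"] = extra.get("market")
    # Drop empty values so missing data never overwrites existing metadata.
    return {key: value for key, value in fields.items() if value not in (None, "")}
```

A caller could then merge the result with dam_metadata.update(build_creativex_metadata(score_data)) and log a warning when the returned dictionary is empty.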
Troubleshooting
"llama-cloud-services not installed"
source venv/bin/activate
pip install llama-cloud-services
"Agent 'Creativex-Extract' not found"
- Verify agent name in LlamaCloud portal
- Check spelling matches exactly: Creativex-Extract
- Verify API key is correct
"No PDF files found"
- This is normal if Box folder 350605024645 is empty
- Upload test PDF to folder and re-run
"Database connection failed"
# Check PostgreSQL is running
docker ps | grep ferrero
# Test connection
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;"
"Email not sending"
- Check SMTP configuration in .env
- Verify Mailgun credentials
- Check logs for detailed error
Files not deleted from Box
- This is expected for failed extractions
- Only successful extractions delete files
- Failed files remain for manual review/retry
Rollback Instructions
If you need to roll back:
Remove Database Table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
DROP TABLE IF EXISTS creativex_scores CASCADE;
"
Remove from Cron
crontab -e
# Delete the CreativeX line, save and exit
Revert Code
cd /opt/ferrero-automation/Python-Version
git revert <commit-hash>
git push origin main
Support
- Logs: logs/creativex_scoring.log
- Database Queries: See "Checking Results" section above
- Email Test: Check SMTP settings and recipients list
- LlamaCloud Issues: Verify API key and agent configuration
Summary Checklist
Local Setup:
- Add LLAMA_CLOUD_API_KEY to .env
- Install llama-cloud-services package
- Create creativex_scores table
- Test script runs successfully
Production Deployment:
- Git pull latest code
- Add LLAMA_CLOUD_API_KEY to server .env
- Install dependencies on server
- Create database table on server
- Test run on server
- Verify email notifications
- (Optional) Add to cron if automating
Post-Deployment:
- Upload test PDF to Box folder 350605024645
- Run script and verify extraction
- Check database record created
- Verify PDF deleted from Box
- Confirm email notification received