ferrero-opentext/Python-Version/CREATIVEX_DEPLOYMENT.md
DJP b6b9d7337a Add CreativeX score extraction and storage system
Implements new workflow to extract CreativeX quality scores from PDFs
using LlamaExtract AI and store results in PostgreSQL database.

Components added:
- creativex_scoring_storing.py: Main script to process PDFs from Box
- creativex_scores table: Database table with JSONB for full JSON storage
- Database methods: store_creativex_score() and get_creativex_score_by_filename()
- Email templates: creativex_complete, creativex_partial, creativex_no_files
- Configuration: creativex section in config.yaml
- CREATIVEX_DEPLOYMENT.md: Complete deployment and usage guide

Features:
- Monitors Box folder 350605024645 for PDFs
- Extracts scores using LlamaExtract agent "Creativex-Extract"
- Stores 4 key fields (filename, ID, URL, score) + full JSON
- Deletes processed PDFs from Box after successful extraction
- Sends email notifications for success/partial/no-files scenarios
- Manual execution (python scripts/creativex_scoring_storing.py)

Database schema:
- Table: creativex_scores with 10 columns
- Indexes on filename, box_file_id, status for fast lookups
- JSONB column stores complete extraction for future flexibility

Future integration ready:
db.get_creativex_score_by_filename() available for DAM upload workflows
to attach CreativeX metadata during asset processing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 16:15:45 -05:00

CreativeX Score Extraction - Deployment Guide

Overview

This guide covers deploying the CreativeX score extraction system, which:

  1. Monitors Box folder 350605024645 for PDF files
  2. Extracts CreativeX scores using LlamaExtract AI agent "Creativex-Extract"
  3. Stores results in PostgreSQL database with full JSON
  4. Removes processed files from Box
  5. Sends email notifications

Local Development Setup

1. Add Environment Variable

Add to your .env file:

# CreativeX Configuration
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key_here
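
A script that depends on this variable can fail fast with a clear message when it is missing. The following is a minimal sketch, not the actual check in creativex_scoring_storing.py: require_env is a hypothetical helper, and it assumes the .env file has already been loaded into the process environment.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test without touching the real environment.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Hypothetical usage at the top of a script, before any Box or
# LlamaExtract calls are made:
#   api_key = require_env("LLAMA_CLOUD_API_KEY")
```

Checking the key up front avoids discovering the problem only after PDFs have already been downloaded.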

2. Install Python Dependencies

cd Python-Version
source venv/bin/activate
pip install llama-cloud-services

Or install all dependencies:

pip install -r requirements.txt

3. Create Database Table

If starting fresh (full init):

PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql

If database already exists (add table only):

PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
CREATE TABLE IF NOT EXISTS creativex_scores (
    id SERIAL PRIMARY KEY,
    filename VARCHAR(500) NOT NULL,
    box_file_id VARCHAR(255),
    creativex_id VARCHAR(255),
    creativex_url TEXT,
    quality_score VARCHAR(50),
    full_extraction_data JSONB,
    extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(50) DEFAULT 'active',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
"

4. Verify Table Creation

PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "\d creativex_scores"

You should see:

  • 10 columns (id, filename, box_file_id, creativex_id, creativex_url, quality_score, full_extraction_data, extracted_at, status, created_at)
  • 3 indexes (idx_creativex_filename, idx_creativex_box_file, idx_creativex_status)
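
The same check can be scripted. The sketch below only compares name lists; it assumes the actual column and index names have been fetched separately (for example from information_schema.columns and pg_indexes). The helper name missing_items is hypothetical.

```python
# Names taken from the CREATE TABLE / CREATE INDEX statements in this guide.
EXPECTED_COLUMNS = (
    "id", "filename", "box_file_id", "creativex_id", "creativex_url",
    "quality_score", "full_extraction_data", "extracted_at", "status",
    "created_at",
)
EXPECTED_INDEXES = (
    "idx_creativex_filename", "idx_creativex_box_file", "idx_creativex_status",
)


def missing_items(expected, actual):
    """Return the expected names that are absent from `actual`, in order."""
    actual_set = set(actual)
    return [name for name in expected if name not in actual_set]


# Hypothetical usage, with `columns` and `indexes` read from the database:
#   problems = missing_items(EXPECTED_COLUMNS, columns) + \
#              missing_items(EXPECTED_INDEXES, indexes)
#   if problems:
#       raise SystemExit(f"creativex_scores is incomplete: {problems}")
```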

5. Test Locally

# Run the script manually
python scripts/creativex_scoring_storing.py

Expected behaviors:

  • If no PDFs in Box folder 350605024645: "No PDF files found" email sent
  • If PDFs present: Extraction runs, scores stored, files deleted from Box
  • If extraction fails: Partial success email with errors

Production Server Deployment

Prerequisites

  • Server already running Ferrero automation (A1→A2, A5→A6, etc.)
  • Git repository backed up to Bitbucket
  • SSH access to production server

Step 1: Update .env on Server

SSH to the server and open the .env file:

cd /opt/ferrero-automation/Python-Version
nano .env

Add:

# CreativeX Configuration
LLAMA_CLOUD_API_KEY=your_production_llama_cloud_api_key

Save and exit (Ctrl+X, Y, Enter).

Step 2: Pull Latest Code

cd /opt/ferrero-automation/Python-Version
git pull origin main

This will include:

  • scripts/creativex_scoring_storing.py
  • Updated database/init.sql
  • Updated scripts/shared/database.py
  • Updated scripts/shared/notifier.py
  • Updated config/config.yaml
  • Updated requirements.txt

Step 3: Install Dependencies

cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
pip install llama-cloud-services

Or update all:

pip install -r requirements.txt --upgrade

Step 4: Create Database Table

PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
CREATE TABLE IF NOT EXISTS creativex_scores (
    id SERIAL PRIMARY KEY,
    filename VARCHAR(500) NOT NULL,
    box_file_id VARCHAR(255),
    creativex_id VARCHAR(255),
    creativex_url TEXT,
    quality_score VARCHAR(50),
    full_extraction_data JSONB,
    extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(50) DEFAULT 'active',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
"

Step 5: Verify Installation

# Check database table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT COUNT(*) FROM creativex_scores;"

# Check script exists
ls -lh scripts/creativex_scoring_storing.py

# Make it executable
chmod +x scripts/creativex_scoring_storing.py

Step 6: Test Run

cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
python scripts/creativex_scoring_storing.py

Check logs:

tail -f logs/creativex_scoring.log

Step 7: Add to Cron (Optional - If Automated)

Note: The script is run manually for now, so skip this step initially.

If you want to automate later (e.g., every hour):

crontab -e

Add:

# CreativeX Score Extraction - Every hour
0 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/creativex_scoring_storing.py >> logs/cron_creativex.log 2>&1

Save and exit.

Configuration Details

Box Folder

  • Folder ID: 350605024645
  • Purpose: Drop PDFs here for CreativeX score extraction
  • Behavior: Files are automatically deleted after successful processing

LlamaExtract Agent

  • Agent Name: Creativex-Extract
  • Expected Fields:
    • filename: Original filename from PDF
    • creativeXId.id: CreativeX identifier
    • creativeXId.url: CreativeX URL
    • ferreroCreativeQuality.percentage: Quality score
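
The dotted field paths above can be mapped onto the four quick-access columns. This is an illustrative sketch, not the script's actual code: the helper names are hypothetical, the path-to-column mapping is assumed from the column names in this guide, and the sample values in the usage comment are made up.

```python
# Assumed mapping from database column to dotted path in the extraction JSON.
FIELD_PATHS = {
    "filename": "filename",
    "creativex_id": "creativeXId.id",
    "creativex_url": "creativeXId.url",
    "quality_score": "ferreroCreativeQuality.percentage",
}


def get_path(data, dotted_path):
    """Walk a nested dict along a dotted path; return None if a key is missing."""
    current = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def flatten_extraction(data):
    """Pull the four quick-access fields out of the full extraction result."""
    return {column: get_path(data, path) for column, path in FIELD_PATHS.items()}


# Hypothetical usage with made-up values:
#   flatten_extraction({
#       "filename": "example.mp4",
#       "creativeXId": {"id": "123", "url": "https://example.invalid/123"},
#       "ferreroCreativeQuality": {"percentage": 87},
#   })
```

Returning None for a missing key, rather than raising, lets a partially extracted PDF still be stored with its full JSON.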

Database Storage

  • Table: creativex_scores
  • Quick Access Fields: filename, creativex_id, creativex_url, quality_score
  • Full JSON: Stored in full_extraction_data JSONB column
  • Purpose: Future lookups by filename during DAM uploads
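
For illustration, a row for this table could be prepared as a parameterized INSERT. This is a sketch under stated assumptions and not the implementation of store_creativex_score(): build_insert_params is a hypothetical helper, and the %s placeholders assume a psycopg2-style driver.

```python
import json

# Columns with defaults (id, extracted_at, status, created_at) are omitted.
INSERT_SQL = (
    "INSERT INTO creativex_scores "
    "(filename, box_file_id, creativex_id, creativex_url, "
    "quality_score, full_extraction_data) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)


def build_insert_params(fields, box_file_id, extraction):
    """Build the parameter tuple that matches INSERT_SQL.

    `fields` holds the four quick-access values; `extraction` is the full
    result dict, serialized to JSON text for the JSONB column.
    """
    score = fields.get("quality_score")
    return (
        fields["filename"],                     # NOT NULL in the schema
        box_file_id,
        fields.get("creativex_id"),
        fields.get("creativex_url"),
        None if score is None else str(score),  # quality_score is VARCHAR(50)
        json.dumps(extraction),
    )


# Hypothetical usage with an open database cursor:
#   cursor.execute(INSERT_SQL, build_insert_params(fields, file_id, extraction))
```

Passing values as parameters instead of formatting them into the SQL string keeps filenames with quotes or other special characters from breaking the statement.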

Email Notifications

Recipients configured in .env:

  • Success: REPORT_EMAILS
  • Errors: ERROR_EMAIL

Templates:

  1. creativex_complete - All files processed successfully
  2. creativex_partial - Some files failed
  3. creativex_no_files - No PDFs found (normal if folder empty)
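
The choice between the three templates can be expressed as a small function. This is a sketch only: choose_template is a hypothetical name, and treating a run where every file failed as creativex_partial is an assumption, since this guide defines no separate template for that case.

```python
def choose_template(total_files, failed_files):
    """Pick a notification template name from the run's file counts."""
    if failed_files < 0 or failed_files > total_files:
        raise ValueError("failed_files must be between 0 and total_files")
    if total_files == 0:
        return "creativex_no_files"   # nothing to process
    if failed_files == 0:
        return "creativex_complete"   # every file succeeded
    return "creativex_partial"        # at least one failure (assumed to cover all-failed)
```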

Usage

Manual Execution

cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
python scripts/creativex_scoring_storing.py

Workflow

  1. Upload PDFs to Box folder 350605024645
  2. Run script (manual or cron)
  3. Script downloads each PDF
  4. LlamaExtract processes PDF
  5. Results stored in database
  6. PDF deleted from Box
  7. Email notification sent
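
The steps above can be sketched as a loop in which a PDF is deleted only after it has been extracted and stored. This is not the code in creativex_scoring_storing.py; run_pipeline and its callable parameters (download, extract, store, delete) are hypothetical stand-ins for the Box, LlamaExtract, and database calls.

```python
def run_pipeline(pdf_files, download, extract, store, delete):
    """Process each PDF in order: download, extract, store, then delete.

    A failure at any step records the error and skips the remaining steps
    for that file, so a PDF that was not fully processed stays in place
    for manual review or retry.
    """
    succeeded, failed = [], []
    for pdf in pdf_files:
        try:
            content = download(pdf)
            extraction = extract(content)
            store(pdf, extraction)
            delete(pdf)
        except Exception as exc:
            failed.append((pdf, str(exc)))
            continue
        succeeded.append(pdf)
    return succeeded, failed
```

The caller can use the lengths of the two returned lists to decide which email notification to send.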

Checking Results

# View recent extractions
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, creativex_id, quality_score, extracted_at
FROM creativex_scores
ORDER BY extracted_at DESC
LIMIT 10;
"

# Count total scores
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT COUNT(*) as total_scores FROM creativex_scores WHERE status = 'active';
"

# View specific file
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT * FROM creativex_scores WHERE filename LIKE '%yourfile%';
"

Viewing Full JSON

PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, jsonb_pretty(full_extraction_data)
FROM creativex_scores
WHERE filename = 'example.pdf';
"

Future Integration

The database method db.get_creativex_score_by_filename(filename) is ready for use in other scripts.

Example usage in future DAM upload workflow:

# In a2_to_a3_upload_polling.py or similar.
# Assumes db (the shared database helper) and dam_metadata (a dict) are already defined.
filename = "Brand_Country_Language_123456.mp4"

# Lookup CreativeX score
score_data = db.get_creativex_score_by_filename(filename)

if score_data:
    # Add to DAM metadata
    dam_metadata['FERRERO.FIELD.CREATIVEX_SCORE'] = score_data['quality_score']
    dam_metadata['FERRERO.FIELD.CREATIVEX_URL'] = score_data['creativex_url']
    dam_metadata['FERRERO.FIELD.CREATIVEX_ID'] = score_data['creativex_id']

Troubleshooting

"llama-cloud-services not installed"

source venv/bin/activate
pip install llama-cloud-services

"Agent 'Creativex-Extract' not found"

  • Verify agent name in LlamaCloud portal
  • Check spelling matches exactly: Creativex-Extract
  • Verify API key is correct

"No PDF files found"

  • This is normal if Box folder 350605024645 is empty
  • Upload test PDF to folder and re-run

"Database connection failed"

# Check PostgreSQL is running
docker ps | grep ferrero

# Test connection
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;"
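
Before debugging credentials, it can help to confirm that anything is listening on the database port at all. The sketch below is a plain TCP check using only the standard library; it does not authenticate or speak the PostgreSQL protocol, and port_is_open is a hypothetical helper name.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Hypothetical usage with the host and port used throughout this guide:
#   if not port_is_open("localhost", 5437):
#       print("Nothing is listening on localhost:5437; is the container running?")
```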

"Email not sending"

  • Check SMTP configuration in .env
  • Verify Mailgun credentials
  • Check logs for detailed error

Files not deleted from Box

  • This is expected for failed extractions
  • Only successful extractions delete files
  • Failed files remain for manual review/retry

Rollback Instructions

If you need to roll back:

Remove Database Table

PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
DROP TABLE IF EXISTS creativex_scores CASCADE;
"

Remove from Cron

crontab -e
# Delete the CreativeX line, save and exit

Revert Code

cd /opt/ferrero-automation/Python-Version
git revert <commit-hash>
git push origin main

Support

  • Logs: logs/creativex_scoring.log
  • Database Queries: See "Checking Results" section above
  • Email Test: Check SMTP settings and recipients list
  • LlamaCloud Issues: Verify API key and agent configuration

Summary Checklist

Local Setup:

  • Add LLAMA_CLOUD_API_KEY to .env
  • Install llama-cloud-services package
  • Create creativex_scores table
  • Test script runs successfully

Production Deployment:

  • Git pull latest code
  • Add LLAMA_CLOUD_API_KEY to server .env
  • Install dependencies on server
  • Create database table on server
  • Test run on server
  • Verify email notifications
  • (Optional) Add to cron if automating

Post-Deployment:

  • Upload test PDF to Box folder 350605024645
  • Run script and verify extraction
  • Check database record created
  • Verify PDF deleted from Box
  • Confirm email notification received