Implements new workflow to extract CreativeX quality scores from PDFs using LlamaExtract AI and store results in PostgreSQL database.

Components added:
- creativex_scoring_storing.py: Main script to process PDFs from Box
- creativex_scores table: Database table with JSONB for full JSON storage
- Database methods: store_creativex_score() and get_creativex_score_by_filename()
- Email templates: creativex_complete, creativex_partial, creativex_no_files
- Configuration: creativex section in config.yaml
- CREATIVEX_DEPLOYMENT.md: Complete deployment and usage guide

Features:
- Monitors Box folder 350605024645 for PDFs
- Extracts scores using LlamaExtract agent "Creativex-Extract"
- Stores 4 key fields (filename, ID, URL, score) + full JSON
- Deletes processed PDFs from Box after successful extraction
- Sends email notifications for success/partial/no-files scenarios
- Manual execution (python scripts/creativex_scoring_storing.py)

Database schema:
- Table: creativex_scores with 10 columns
- Indexes on filename, box_file_id, status for fast lookups
- JSONB column stores complete extraction for future flexibility

Future integration ready: db.get_creativex_score_by_filename() available for DAM upload workflows to attach CreativeX metadata during asset processing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
CreativeX Score Extraction - Deployment Guide
Overview
This guide covers deploying the CreativeX score extraction system, which:
- Monitors Box folder 350605024645 for PDF files
- Extracts CreativeX scores using LlamaExtract AI agent "Creativex-Extract"
- Stores results in PostgreSQL database with full JSON
- Removes processed files from Box
- Sends email notifications
Local Development Setup
1. Add Environment Variable
Add to your .env file:
# CreativeX Configuration
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key_here
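A missing or empty key is easier to diagnose if the script fails fast with a clear message. The helper below is a minimal sketch, not code from the repository: `require_env` is a hypothetical name, and it assumes the values from .env have already been loaded into the process environment (how the project loads them is not shown in this guide).

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return a non-empty environment variable or raise a clear error.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

A script could call `require_env("LLAMA_CLOUD_API_KEY")` once at startup, before contacting Box or LlamaCloud.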
2. Install Python Dependencies
cd Python-Version
source venv/bin/activate
pip install llama-cloud-services
Or install all dependencies:
pip install -r requirements.txt
3. Create Database Table
If starting fresh (full init):
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql
If database already exists (add table only):
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
CREATE TABLE IF NOT EXISTS creativex_scores (
id SERIAL PRIMARY KEY,
filename VARCHAR(500) NOT NULL,
box_file_id VARCHAR(255),
creativex_id VARCHAR(255),
creativex_url TEXT,
quality_score VARCHAR(50),
full_extraction_data JSONB,
extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50) DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
"
4. Verify Table Creation
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "\d creativex_scores"
You should see:
- 10 columns (id, filename, box_file_id, creativex_id, creativex_url, quality_score, full_extraction_data, extracted_at, status, created_at)
- 3 indexes (idx_creativex_filename, idx_creativex_box_file, idx_creativex_status)
5. Test Locally
# Run the script manually
python scripts/creativex_scoring_storing.py
Expected behaviors:
- If no PDFs in Box folder 350605024645: "No PDF files found" email sent
- If PDFs present: Extraction runs, scores stored, files deleted from Box
- If extraction fails: Partial success email with errors
Production Server Deployment
Prerequisites
- Server already running Ferrero automation (A1→A2, A5→A6, etc.)
- Git repository backed up to Bitbucket
- SSH access to production server
Step 1: Update .env on Server
SSH to server and add:
cd /opt/ferrero-automation/Python-Version
nano .env
Add:
# CreativeX Configuration
LLAMA_CLOUD_API_KEY=your_production_llama_cloud_api_key
Save and exit (Ctrl+X, Y, Enter).
Step 2: Pull Latest Code
cd /opt/ferrero-automation/Python-Version
git pull origin main
This will include:
- scripts/creativex_scoring_storing.py - Updated
- database/init.sql - Updated
- scripts/shared/database.py - Updated
- scripts/shared/notifier.py - Updated
- config/config.yaml - Updated
- requirements.txt
Step 3: Install Dependencies
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
pip install llama-cloud-services
Or update all:
pip install -r requirements.txt --upgrade
Step 4: Create Database Table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
CREATE TABLE IF NOT EXISTS creativex_scores (
id SERIAL PRIMARY KEY,
filename VARCHAR(500) NOT NULL,
box_file_id VARCHAR(255),
creativex_id VARCHAR(255),
creativex_url TEXT,
quality_score VARCHAR(50),
full_extraction_data JSONB,
extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50) DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
"
Step 5: Verify Installation
# Check database table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT COUNT(*) FROM creativex_scores;"
# Check script exists
ls -lh scripts/creativex_scoring_storing.py
# Make sure it is executable
chmod +x scripts/creativex_scoring_storing.py
Step 6: Test Run
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
python scripts/creativex_scoring_storing.py
Check logs:
tail -f logs/creativex_scoring.log
Step 7: Add to Cron (Optional - If Automated)
Note: This workflow is run manually for now, so skip this step initially.
If you want to automate later (e.g., every hour):
crontab -e
Add:
# CreativeX Score Extraction - Every hour
0 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/creativex_scoring_storing.py >> logs/cron_creativex.log 2>&1
Save and exit.
Configuration Details
Box Folder
- Folder ID: 350605024645
- Purpose: Drop PDFs here for CreativeX score extraction
- Behavior: Files are automatically deleted after successful processing
LlamaExtract Agent
- Agent Name: Creativex-Extract
- Expected Fields:
  - filename: Original filename from PDF
  - creativeXId.id: CreativeX identifier
  - creativeXId.url: CreativeX URL
  - ferreroCreativeQuality.percentage: Quality score
Database Storage
- Table: creativex_scores
- Quick Access Fields: filename, creativex_id, creativex_url, quality_score
- Full JSON: Stored in full_extraction_data JSONB column
- Purpose: Future lookups by filename during DAM uploads
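The mapping from the agent's fields to the quick access columns can be sketched as follows. This is an illustration only: it assumes the extraction result is a nested dictionary whose shape follows the dotted field names listed for the agent (for example `creativeXId.id`), and `to_row` is a hypothetical helper, not the signature of `store_creativex_score()`.

```python
from typing import Any, Dict, Optional


def _get_path(data: Dict[str, Any], *keys: str) -> Optional[Any]:
    """Walk nested dictionaries, returning None if any key is missing."""
    current: Any = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def to_row(extraction: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten an extraction result into creativex_scores column values."""
    score = _get_path(extraction, "ferreroCreativeQuality", "percentage")
    return {
        "filename": extraction.get("filename"),
        "creativex_id": _get_path(extraction, "creativeXId", "id"),
        "creativex_url": _get_path(extraction, "creativeXId", "url"),
        # quality_score is a VARCHAR(50) column, so keep it as text.
        "quality_score": None if score is None else str(score),
        # The untouched result goes into the JSONB column.
        "full_extraction_data": extraction,
    }
```

Missing fields become `None` rather than raising, so a partially filled PDF can still be stored and inspected later through the full JSON.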
Email Notifications
Recipients configured in .env:
- Success: REPORT_EMAILS
- Errors: ERROR_EMAIL
Templates:
- creativex_complete - All files processed successfully
- creativex_partial - Some files failed
- creativex_no_files - No PDFs found (normal if folder empty)
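One way to pick a template from the outcome of a run is sketched below. `choose_template` is a hypothetical helper, and treating a run in which every file failed as `creativex_partial` is an assumption; the guide only says that this template covers runs where some files failed.

```python
def choose_template(succeeded: int, failed: int) -> str:
    """Map per-run counts to one of the three CreativeX email templates."""
    if succeeded < 0 or failed < 0:
        raise ValueError("counts must be non-negative")
    if succeeded == 0 and failed == 0:
        # Nothing was found in the Box folder.
        return "creativex_no_files"
    if failed > 0:
        # At least one extraction failed (assumed to include all-failed runs).
        return "creativex_partial"
    return "creativex_complete"
```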
Usage
Manual Execution
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
python scripts/creativex_scoring_storing.py
Workflow
- Upload PDFs to Box folder 350605024645
- Run script (manual or cron)
- Script downloads each PDF
- LlamaExtract processes PDF
- Results stored in database
- PDF deleted from Box
- Email notification sent
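The steps above can be sketched as a loop in which the Box, LlamaExtract, and database calls are passed in as plain functions. This is not the code in creativex_scoring_storing.py: every name here (`process_pdfs`, `RunSummary`, and the four callables) is hypothetical. It illustrates one property the guide describes, namely that a PDF is deleted from Box only after its result has been stored.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass
class RunSummary:
    succeeded: List[str] = field(default_factory=list)
    failed: List[Tuple[str, str]] = field(default_factory=list)  # (filename, error)


def process_pdfs(
    files: Iterable[Tuple[str, str]],  # (box_file_id, filename) pairs
    download: Callable[[str], bytes],
    extract: Callable[[bytes], Dict[str, Any]],
    store: Callable[[str, str, Dict[str, Any]], None],
    delete: Callable[[str], None],
) -> RunSummary:
    """Download, extract, store, then delete each PDF; collect failures."""
    summary = RunSummary()
    for file_id, filename in files:
        try:
            data = extract(download(file_id))
            store(filename, file_id, data)
            # Last step: only reached once the result has been stored.
            delete(file_id)
        except Exception as exc:
            # The file stays in Box for manual review or retry.
            summary.failed.append((filename, str(exc)))
        else:
            summary.succeeded.append(filename)
    return summary
```

Injecting the four operations keeps the loop testable without Box or LlamaCloud credentials, and the resulting summary carries what an email notification would need to report.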
Checking Results
# View recent extractions
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, creativex_id, quality_score, extracted_at
FROM creativex_scores
ORDER BY extracted_at DESC
LIMIT 10;
"
# Count total scores
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT COUNT(*) as total_scores FROM creativex_scores WHERE status = 'active';
"
# View specific file
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT * FROM creativex_scores WHERE filename LIKE '%yourfile%';
"
Viewing Full JSON
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, full_extraction_data::jsonb
FROM creativex_scores
WHERE filename = 'example.pdf';
"
Future Integration
The database method db.get_creativex_score_by_filename(filename) is ready for use in other scripts.
Example usage in future DAM upload workflow:
# In a2_to_a3_upload_polling.py or similar
filename = "Brand_Country_Language_123456.mp4"

# Lookup CreativeX score
score_data = db.get_creativex_score_by_filename(filename)
if score_data:
    # Add to DAM metadata
    dam_metadata['FERRERO.FIELD.CREATIVEX_SCORE'] = score_data['quality_score']
    dam_metadata['FERRERO.FIELD.CREATIVEX_URL'] = score_data['creativex_url']
    dam_metadata['FERRERO.FIELD.CREATIVEX_ID'] = score_data['creativex_id']
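The same mapping can be kept in one place so that several upload scripts share it. The helper below is a sketch under assumptions: `creativex_metadata` and `FIELD_MAP` are hypothetical names, and the lookup is assumed to return a dict-like row, or a falsy value when no score exists, as the `if score_data:` check in the example suggests.

```python
from typing import Any, Dict, Mapping, Optional

# DAM field name -> creativex_scores column (names taken from the example).
FIELD_MAP = {
    "FERRERO.FIELD.CREATIVEX_SCORE": "quality_score",
    "FERRERO.FIELD.CREATIVEX_URL": "creativex_url",
    "FERRERO.FIELD.CREATIVEX_ID": "creativex_id",
}


def creativex_metadata(score_data: Optional[Mapping[str, Any]]) -> Dict[str, Any]:
    """Build the DAM metadata entries for a score row, skipping empty values."""
    if not score_data:
        return {}
    return {
        dam_field: score_data[column]
        for dam_field, column in FIELD_MAP.items()
        if score_data.get(column) is not None
    }
```

An upload script could then call `dam_metadata.update(creativex_metadata(score_data))`, which leaves `dam_metadata` unchanged when no score was found.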
Troubleshooting
"llama-cloud-services not installed"
source venv/bin/activate
pip install llama-cloud-services
"Agent 'Creativex-Extract' not found"
- Verify agent name in LlamaCloud portal
- Check spelling matches exactly: Creativex-Extract
- Verify API key is correct
"No PDF files found"
- This is normal if Box folder 350605024645 is empty
- Upload test PDF to folder and re-run
"Database connection failed"
# Check PostgreSQL is running
docker ps | grep ferrero
# Test connection
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;"
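If psql itself is unavailable, a plain TCP check can at least show whether anything is listening on the database port. The sketch below uses only the standard library; `port_is_open` is a hypothetical helper, and a successful connection says nothing about credentials or whether the database exists.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 5437, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A `False` result points at the container or port mapping, while `True` followed by a failing psql command points at credentials or the database name.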
"Email not sending"
- Check SMTP configuration in .env
- Verify Mailgun credentials
- Check logs for detailed error
Files not deleted from Box
- This is expected for failed extractions
- Only successful extractions delete files
- Failed files remain for manual review/retry
Rollback Instructions
If you need to roll back:
Remove Database Table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
DROP TABLE IF EXISTS creativex_scores CASCADE;
"
Remove from Cron
crontab -e
# Delete the CreativeX line, save and exit
Revert Code
cd /opt/ferrero-automation/Python-Version
git revert <commit-hash>
git push origin main
Support
- Logs: logs/creativex_scoring.log
- Database Queries: See "Checking Results" section above
- Email Test: Check SMTP settings and recipients list
- LlamaCloud Issues: Verify API key and agent configuration
Summary Checklist
Local Setup:
- Add LLAMA_CLOUD_API_KEY to .env
- Install llama-cloud-services package
- Create creativex_scores table
- Test script runs successfully
Production Deployment:
- Git pull latest code
- Add LLAMA_CLOUD_API_KEY to server .env
- Install dependencies on server
- Create database table on server
- Test run on server
- Verify email notifications
- (Optional) Add to cron if automating
Post-Deployment:
- Upload test PDF to Box folder 350605024645
- Run script and verify extraction
- Check database record created
- Verify PDF deleted from Box
- Confirm email notification received