Add CreativeX score extraction and storage system

Implements new workflow to extract CreativeX quality scores from PDFs
using LlamaExtract AI and store results in PostgreSQL database.

Components added:
- creativex_scoring_storing.py: Main script to process PDFs from Box
- creativex_scores table: Database table with JSONB for full JSON storage
- Database methods: store_creativex_score() and get_creativex_score_by_filename()
- Email templates: creativex_complete, creativex_partial, creativex_no_files
- Configuration: creativex section in config.yaml
- CREATIVEX_DEPLOYMENT.md: Complete deployment and usage guide

Features:
- Monitors Box folder 350605024645 for PDFs
- Extracts scores using LlamaExtract agent "Creativex-Extract"
- Stores 4 key fields (filename, ID, URL, score) + full JSON
- Deletes processed PDFs from Box after successful extraction
- Sends email notifications for success/partial/no-files scenarios
- Manual execution (python scripts/creativex_scoring_storing.py)

Database schema:
- Table: creativex_scores with 10 columns
- Indexes on filename, box_file_id, status for fast lookups
- JSONB column stores complete extraction for future flexibility

Future integration ready:
db.get_creativex_score_by_filename() available for DAM upload workflows
to attach CreativeX metadata during asset processing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
DJP 2025-11-11 16:15:45 -05:00
parent a9c3ff6503
commit b6b9d7337a
7 changed files with 1041 additions and 1 deletions


@ -0,0 +1,398 @@
# CreativeX Score Extraction - Deployment Guide
## Overview
This guide covers deploying the CreativeX score extraction system, which:
1. Monitors Box folder 350605024645 for PDF files
2. Extracts CreativeX scores using LlamaExtract AI agent "Creativex-Extract"
3. Stores results in PostgreSQL database with full JSON
4. Removes processed files from Box
5. Sends email notifications
## Local Development Setup
### 1. Add Environment Variable
Add to your `.env` file:
```bash
# CreativeX Configuration
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key_here
```
### 2. Install Python Dependencies
```bash
cd Python-Version
source venv/bin/activate
pip install llama-cloud-services
```
Or install all dependencies:
```bash
pip install -r requirements.txt
```
### 3. Create Database Table
**If starting fresh (full init):**
```bash
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql
```
**If database already exists (add table only):**
```bash
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
CREATE TABLE IF NOT EXISTS creativex_scores (
id SERIAL PRIMARY KEY,
filename VARCHAR(500) NOT NULL,
box_file_id VARCHAR(255),
creativex_id VARCHAR(255),
creativex_url TEXT,
quality_score VARCHAR(50),
full_extraction_data JSONB,
extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50) DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
"
```
### 4. Verify Table Creation
```bash
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "\d creativex_scores"
```
You should see:
- 10 columns (id, filename, box_file_id, creativex_id, creativex_url, quality_score, full_extraction_data, extracted_at, status, created_at)
- 3 indexes (idx_creativex_filename, idx_creativex_box_file, idx_creativex_status)
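The same check can be scripted. The sketch below is a minimal, hypothetical helper (`missing_columns` is not part of the repository) that compares a list of column names against the 10 columns listed above. The names could come, for example, from the first element of each entry in a DB-API `cursor.description`.

```python
# Hypothetical helper, not part of the repository: verify the expected schema.
EXPECTED_COLUMNS = [
    "id", "filename", "box_file_id", "creativex_id", "creativex_url",
    "quality_score", "full_extraction_data", "extracted_at", "status",
    "created_at",
]


def missing_columns(actual_columns):
    """Return the expected column names that are absent from actual_columns."""
    present = set(actual_columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


if __name__ == "__main__":
    # Example: a table that was created without the JSONB column.
    found = [c for c in EXPECTED_COLUMNS if c != "full_extraction_data"]
    print(missing_columns(found))
```

An empty list means every expected column is present.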
### 5. Test Locally
```bash
# Run the script manually
python scripts/creativex_scoring_storing.py
```
**Expected behaviors:**
- If no PDFs in Box folder 350605024645: "No PDF files found" email sent
- If PDFs present: Extraction runs, scores stored, files deleted from Box
- If extraction fails: Partial success email with errors
## Production Server Deployment
### Prerequisites
- Server already running Ferrero automation (A1→A2, A5→A6, etc.)
- Git repository backed up to Bitbucket
- SSH access to production server
### Step 1: Update .env on Server
SSH to server and add:
```bash
cd /opt/ferrero-automation/Python-Version
nano .env
```
Add:
```bash
# CreativeX Configuration
LLAMA_CLOUD_API_KEY=your_production_llama_cloud_api_key
```
Save and exit (Ctrl+X, Y, Enter).
### Step 2: Pull Latest Code
```bash
cd /opt/ferrero-automation/Python-Version
git pull origin main
```
This will include:
- `scripts/creativex_scoring_storing.py`
- Updated `database/init.sql`
- Updated `scripts/shared/database.py`
- Updated `scripts/shared/notifier.py`
- Updated `config/config.yaml`
- Updated `requirements.txt`
### Step 3: Install Dependencies
```bash
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
pip install llama-cloud-services
```
Or update all:
```bash
pip install -r requirements.txt --upgrade
```
### Step 4: Create Database Table
```bash
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
CREATE TABLE IF NOT EXISTS creativex_scores (
id SERIAL PRIMARY KEY,
filename VARCHAR(500) NOT NULL,
box_file_id VARCHAR(255),
creativex_id VARCHAR(255),
creativex_url TEXT,
quality_score VARCHAR(50),
full_extraction_data JSONB,
extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50) DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
"
```
### Step 5: Verify Installation
```bash
# Check database table
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT COUNT(*) FROM creativex_scores;"
# Check script exists
ls -lh scripts/creativex_scoring_storing.py
# Make the script executable
chmod +x scripts/creativex_scoring_storing.py
```
### Step 6: Test Run
```bash
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
python scripts/creativex_scoring_storing.py
```
Check logs:
```bash
tail -f logs/creativex_scoring.log
```
### Step 7: Add to Cron (Optional - If Automated)
**Note:** The script is run manually for now, so skip this step initially.
If you want to automate later (e.g., every hour):
```bash
crontab -e
```
Add:
```cron
# CreativeX Score Extraction - Every hour
0 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/creativex_scoring_storing.py >> logs/cron_creativex.log 2>&1
```
Save and exit.
## Configuration Details
### Box Folder
- **Folder ID:** 350605024645
- **Purpose:** Drop PDFs here for CreativeX score extraction
- **Behavior:** Files are automatically deleted after successful processing
### LlamaExtract Agent
- **Agent Name:** Creativex-Extract
- **Expected Fields:**
  - `filename`: Original filename from PDF
  - `creativeXId.id`: CreativeX identifier
  - `creativeXId.url`: CreativeX URL
  - `ferreroCreativeQuality.percentage`: Quality score
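As a rough illustration of how these nested fields map to flat values, the sketch below mirrors the parsing done in `creativex_scoring_storing.py`. The sample payload and the function name `flatten_fields` are made up for the example; real responses may carry additional keys.

```python
# Illustrative sketch; the sample payload below is invented for the example.
def flatten_fields(data):
    """Pull the four quick-access fields out of a nested extraction dict."""
    creativex = data.get("creativeXId")
    quality = data.get("ferreroCreativeQuality")
    # Guard against missing or non-dict values so a malformed response
    # yields empty strings instead of raising.
    creativex = creativex if isinstance(creativex, dict) else {}
    quality = quality if isinstance(quality, dict) else {}
    return {
        "filename": data.get("filename", ""),
        "id": creativex.get("id", ""),
        "url": creativex.get("url", ""),
        "score": quality.get("percentage", ""),
    }


sample = {
    "filename": "example.pdf",
    "creativeXId": {"id": "12345", "url": "https://example.com/creative/12345"},
    "ferreroCreativeQuality": {"percentage": "80%"},
}
print(flatten_fields(sample))
```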
### Database Storage
- **Table:** `creativex_scores`
- **Quick Access Fields:** filename, creativex_id, creativex_url, quality_score
- **Full JSON:** Stored in `full_extraction_data` JSONB column
- **Purpose:** Future lookups by filename during DAM uploads
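For the planned lookups, a row from this table can be turned into a dictionary along the lines of the sketch below. It assumes the column order used by `get_creativex_score_by_filename()` and that the driver may return the JSONB value either as a dict or as a JSON string; `row_to_record` is a hypothetical name.

```python
import json

# Column order assumed to match the SELECT in get_creativex_score_by_filename().
COLUMNS = (
    "filename", "creativex_id", "creativex_url", "quality_score",
    "box_file_id", "full_extraction_data", "extracted_at",
)


def row_to_record(row):
    """Map a result tuple to a dict, decoding the JSONB column if needed."""
    record = dict(zip(COLUMNS, row))
    raw = record["full_extraction_data"]
    if isinstance(raw, str):
        # Some drivers return JSONB as text; decode it to a dict.
        record["full_extraction_data"] = json.loads(raw)
    return record


row = ("example.pdf", "12345", None, "80%", "111", '{"data": {}}', None)
print(row_to_record(row)["full_extraction_data"])
```

A NULL JSONB value is passed through as `None` rather than decoded.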
### Email Notifications
**Recipients configured in .env:**
- Success: `REPORT_EMAILS`
- Errors: `ERROR_EMAIL`
**Templates:**
1. `creativex_complete` - All files processed successfully
2. `creativex_partial` - Some files failed
3. `creativex_no_files` - No PDFs found (normal if folder empty)
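Which template is sent follows from the run's counts. A minimal sketch of that decision, based on the logic in `process_pdfs()`: `choose_notification` is a hypothetical helper, and the returned group names are the `success` and `errors` recipient keys the script reads from the config.

```python
def choose_notification(total_files, failed_count):
    """Return (template_name, recipient_group) for a finished run."""
    if total_files == 0:
        # Empty folder: informational mail to the regular report recipients.
        return ("creativex_no_files", "success")
    if failed_count == 0:
        return ("creativex_complete", "success")
    # At least one file failed, including the case where all of them failed.
    return ("creativex_partial", "errors")


print(choose_notification(3, 1))
```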
## Usage
### Manual Execution
```bash
cd /opt/ferrero-automation/Python-Version
source venv/bin/activate
python scripts/creativex_scoring_storing.py
```
### Workflow
1. Upload PDFs to Box folder 350605024645
2. Run script (manual or cron)
3. Script downloads each PDF
4. LlamaExtract processes PDF
5. Results stored in database
6. PDF deleted from Box
7. Email notification sent
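The ordering of steps 3 to 6 matters: the PDF is removed from Box only after the database write succeeds, so a failed file stays in the folder for a retry. The sketch below shows that ordering with plain callables standing in for the Box, LlamaExtract, and database clients. All parameter names are placeholders, not the script's real interfaces.

```python
def process_one(file_id, download, extract, store, delete):
    """Run download -> extract -> store -> delete for one file.

    Returns None on success or an error message on failure. The delete
    callable is only reached when every earlier step succeeded.
    """
    try:
        path = download(file_id)
        data = extract(path)
        if data is None:
            raise ValueError("Extraction returned None")
        if not store(file_id, data):
            raise ValueError("Database storage failed")
    except Exception as exc:
        # Any failure before this point leaves the PDF in Box for retry.
        return str(exc)
    try:
        delete(file_id)
    except Exception:
        # Mirrors the script: a failed delete does not fail the file.
        pass
    return None


if __name__ == "__main__":
    deleted = []
    error = process_one(
        "42",
        download=lambda fid: "/tmp/{}.pdf".format(fid),
        extract=lambda path: None,  # simulate a failed extraction
        store=lambda fid, data: True,
        delete=deleted.append,
    )
    print(error, deleted)
```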
### Checking Results
```bash
# View recent extractions
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, creativex_id, quality_score, extracted_at
FROM creativex_scores
ORDER BY extracted_at DESC
LIMIT 10;
"
# Count total scores
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT COUNT(*) as total_scores FROM creativex_scores WHERE status = 'active';
"
# View specific file
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT * FROM creativex_scores WHERE filename LIKE '%yourfile%';
"
```
### Viewing Full JSON
```bash
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
SELECT filename, jsonb_pretty(full_extraction_data)
FROM creativex_scores
WHERE filename = 'example.pdf';
"
```
## Future Integration
The database method `db.get_creativex_score_by_filename(filename)` is ready for use in other scripts.
**Example usage in future DAM upload workflow:**
```python
# In a2_to_a3_upload_polling.py or similar
filename = "Brand_Country_Language_123456.mp4"
# Lookup CreativeX score
score_data = db.get_creativex_score_by_filename(filename)
if score_data:
    # Add to DAM metadata
    dam_metadata['FERRERO.FIELD.CREATIVEX_SCORE'] = score_data['quality_score']
    dam_metadata['FERRERO.FIELD.CREATIVEX_URL'] = score_data['creativex_url']
    dam_metadata['FERRERO.FIELD.CREATIVEX_ID'] = score_data['creativex_id']
```
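A self-contained variant of the same idea, as a sketch: the helper below builds the three metadata entries from a lookup result and returns an empty dict when no score exists, so callers can merge it unconditionally. `build_creativex_metadata` is a hypothetical name, and the `FERRERO.FIELD.*` keys are taken from the example above rather than from a confirmed DAM schema.

```python
def build_creativex_metadata(score_data):
    """Translate a creativex_scores lookup result into DAM metadata fields."""
    if not score_data:
        # No stored score for this filename: contribute nothing.
        return {}
    return {
        "FERRERO.FIELD.CREATIVEX_SCORE": score_data["quality_score"],
        "FERRERO.FIELD.CREATIVEX_URL": score_data["creativex_url"],
        "FERRERO.FIELD.CREATIVEX_ID": score_data["creativex_id"],
    }


dam_metadata = {"title": "Brand_Country_Language_123456.mp4"}
dam_metadata.update(build_creativex_metadata(None))  # no score: unchanged
print(dam_metadata)
```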
## Troubleshooting
### "llama-cloud-services not installed"
```bash
source venv/bin/activate
pip install llama-cloud-services
```
### "Agent 'Creativex-Extract' not found"
- Verify agent name in LlamaCloud portal
- Check spelling matches exactly: `Creativex-Extract`
- Verify API key is correct
### "No PDF files found"
- This is normal if Box folder 350605024645 is empty
- Upload test PDF to folder and re-run
### "Database connection failed"
```bash
# Check PostgreSQL is running
docker ps | grep ferrero
# Test connection
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;"
```
### "Email not sending"
- Check SMTP configuration in .env
- Verify Mailgun credentials
- Check logs for detailed error
### Files not deleted from Box
- This is expected for failed extractions
- Only successful extractions delete files
- Failed files remain for manual review/retry
## Rollback Instructions
If you need to rollback:
### Remove Database Table
```bash
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
DROP TABLE IF EXISTS creativex_scores CASCADE;
"
```
### Remove from Cron
```bash
crontab -e
# Delete the CreativeX line, save and exit
```
### Revert Code
```bash
cd /opt/ferrero-automation/Python-Version
git revert <commit-hash>
git push origin main
```
## Support
- **Logs:** `logs/creativex_scoring.log`
- **Database Queries:** See "Checking Results" section above
- **Email Test:** Check SMTP settings and recipients list
- **LlamaCloud Issues:** Verify API key and agent configuration
## Summary Checklist
**Local Setup:**
- [ ] Add `LLAMA_CLOUD_API_KEY` to .env
- [ ] Install `llama-cloud-services` package
- [ ] Create `creativex_scores` table
- [ ] Test script runs successfully
**Production Deployment:**
- [ ] Git pull latest code
- [ ] Add `LLAMA_CLOUD_API_KEY` to server .env
- [ ] Install dependencies on server
- [ ] Create database table on server
- [ ] Test run on server
- [ ] Verify email notifications
- [ ] (Optional) Add to cron if automating
**Post-Deployment:**
- [ ] Upload test PDF to Box folder 350605024645
- [ ] Run script and verify extraction
- [ ] Check database record created
- [ ] Verify PDF deleted from Box
- [ ] Confirm email notification received


@ -95,6 +95,12 @@ notifications:
fields:
mappings_file: config/field_mappings.yaml
# CreativeX Configuration
creativex:
  llama_api_key: ${LLAMA_CLOUD_API_KEY}
  agent_name: Creativex-Extract
  box_folder_id: "350605024645"
# Logging Configuration
logging:
level: INFO


@ -172,6 +172,35 @@ CREATE TABLE IF NOT EXISTS campaign_status (
\echo 'Table campaign_status created'
-- ============================================================================
-- Table: creativex_scores
-- Purpose: Stores CreativeX quality scores extracted from PDFs via LlamaExtract
-- ============================================================================
CREATE TABLE IF NOT EXISTS creativex_scores (
-- Primary Key
id SERIAL PRIMARY KEY,
-- File Information
filename VARCHAR(500) NOT NULL,
box_file_id VARCHAR(255),
-- CreativeX Data (parsed fields for quick access)
creativex_id VARCHAR(255),
creativex_url TEXT,
quality_score VARCHAR(50),
-- Full Extraction Data (JSONB - Complete LlamaExtract response for future use)
full_extraction_data JSONB,
-- Timestamps
extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50) DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
\echo 'Table creativex_scores created'
\echo 'Tables created successfully'
-- ============================================================================
@ -211,6 +240,11 @@ CREATE INDEX IF NOT EXISTS idx_campaign_status_status ON campaign_status(status)
CREATE INDEX IF NOT EXISTS idx_campaign_status_live ON campaign_status(live_campaign);
CREATE INDEX IF NOT EXISTS idx_campaign_status_webhook_sent ON campaign_status(webhook_sent);
-- creativex_scores indexes
CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
\echo 'Indexes created successfully'
-- ============================================================================
@ -323,8 +357,10 @@ GRANT USAGE ON SCHEMA public TO ferrero_user;
\echo ' - derivative_assets'
\echo ' - asset_events'
\echo ' - workflow_state'
\echo ' - campaign_status'
\echo ' - creativex_scores'
\echo ''
\echo 'Indexes created: 15'
\echo 'Triggers created: 4'
\echo 'Functions created: 2'
\echo ''


@ -24,6 +24,9 @@ cryptography>=3.4.0
# Email templates
Jinja2>=3.0.0
# LlamaExtract for CreativeX score extraction
llama-cloud-services>=0.1.0
# Retry logic
tenacity>=8.0.0


@ -0,0 +1,396 @@
#!/usr/bin/env python3
"""
CreativeX Score Extractor and Storage
Processes PDFs from Box folder 350605024645, extracts CreativeX scores using LlamaExtract,
stores results in database, and removes processed files from Box.
Compatible with Python 3.6+
"""
import sys
import os
import logging
from datetime import datetime
from pathlib import Path
# Add shared library to path
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
from shared.config_loader import load_config
from shared.box_client import BoxClient
from shared.database import Database
from shared.notifier import Notifier
# Setup logging with rotation
from logging.handlers import RotatingFileHandler
# Create logs directory if it doesn't exist
os.makedirs('logs', exist_ok=True)
# Configure logging with rotation
log_handler = RotatingFileHandler(
'logs/creativex_scoring.log',
maxBytes=10*1024*1024, # 10MB per file
backupCount=28 # Keep 28 rotated files (rotation is size-based, so up to ~280MB of history)
)
log_handler.setLevel(logging.INFO)
log_handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
logging.basicConfig(
level=logging.INFO,
handlers=[log_handler, console_handler]
)
logger = logging.getLogger('CreativeXScoring')
class CreativeXExtractor:
"""Handles extraction of CreativeX data from PDF files using LlamaExtract."""
def __init__(self, api_key, agent_name):
"""
Initialize the Llama Extract client.
Args:
api_key: LlamaCloud API key
agent_name: Agent name in LlamaExtract
"""
try:
from llama_cloud_services import LlamaExtract
self.extractor = LlamaExtract(api_key=api_key)
self.agent_name = agent_name
logger.info("LlamaExtract client initialized with agent: {}".format(agent_name))
except ImportError:
logger.error("llama-cloud-services not installed. Run: pip install llama-cloud-services")
raise
except Exception as e:
logger.error("Failed to initialize LlamaExtract: {}".format(str(e)))
raise
def extract_from_file(self, file_path):
"""
Extract data from a PDF file using Llama Extract.
Args:
file_path: Path to the PDF file
Returns:
Dictionary containing the extraction result, or None if extraction fails
"""
try:
logger.info(" Getting agent: {}".format(self.agent_name))
agent = self.extractor.get_agent(name=self.agent_name)
if agent is None:
raise Exception("Agent '{}' not found".format(self.agent_name))
logger.info(" Running extraction on: {}".format(os.path.basename(file_path)))
result = agent.extract(str(file_path))
# Convert result to dictionary format
extraction_data = {
"run_id": getattr(result, "run_id", None),
"extraction_agent_id": getattr(result, "extraction_agent_id", None),
"data": result.data if hasattr(result, "data") else {},
"extraction_metadata": getattr(result, "extraction_metadata", {})
}
return extraction_data
except Exception as e:
logger.error(" ERROR: Extraction failed - {}".format(str(e)))
return None
def parse_csv_fields(self, extraction_data):
"""
Parse specific fields for database storage.
Expected fields:
- filename
- creativeXId.id
- creativeXId.url
- ferreroCreativeQuality.percentage
Args:
extraction_data: Full extraction result dictionary
Returns:
Dictionary with parsed fields, or None if parsing raises an error (a missing filename only logs a warning)
"""
try:
data = extraction_data.get("data", {})
# Extract filename
filename = data.get("filename", "")
# Extract creativeXId fields
creative_x_id_obj = data.get("creativeXId", {})
creative_x_id = creative_x_id_obj.get("id", "") if isinstance(creative_x_id_obj, dict) else ""
creative_x_url = creative_x_id_obj.get("url", "") if isinstance(creative_x_id_obj, dict) else ""
# Extract ferreroCreativeQuality percentage
ferrero_quality_obj = data.get("ferreroCreativeQuality", {})
quality_score = ferrero_quality_obj.get("percentage", "") if isinstance(ferrero_quality_obj, dict) else ""
# Validate that we have the critical fields
if not filename:
logger.warning(" WARNING: filename field is missing from extraction data")
return {
"filename": filename,
"id": creative_x_id,
"url": creative_x_url,
"score": quality_score
}
except Exception as e:
logger.error(" ERROR: Failed to parse CSV fields - {}".format(str(e)))
return None
def process_pdfs(box_client, db, extractor, notifier, config):
"""
Process all PDFs in the CreativeX Box folder.
Args:
box_client: BoxClient instance
db: Database instance
extractor: CreativeXExtractor instance
notifier: Notifier instance
config: Configuration dict
Returns:
dict with processing results
"""
creativex_folder_id = config['creativex']['box_folder_id']
logger.info("=" * 60)
logger.info("CreativeX Score Extraction")
logger.info("=" * 60)
logger.info("Box Folder ID: {}".format(creativex_folder_id))
logger.info("")
try:
# List all PDF files in Box folder
files = box_client.list_folder_files(creativex_folder_id)
pdf_files = [f for f in files if f['name'].lower().endswith('.pdf')]
if not pdf_files:
logger.info("No PDF files found in Box folder")
# Send email notification
notifier.send_email(
template_name='creativex_no_files',
recipients=config['notifications']['recipients']['success'],
data={
'timestamp': datetime.now().strftime("%Y-%m-%d %H:%M:%S")
}
)
return {'success': True, 'file_count': 0, 'processed': 0, 'failed': 0}
logger.info("Found {} PDF file(s) to process".format(len(pdf_files)))
logger.info("")
# Create temp directory
temp_dir = Path('temp/creativex')
temp_dir.mkdir(parents=True, exist_ok=True)
# Track results
processed_files = []
failed_files = []
# Process each PDF
for idx, file_info in enumerate(pdf_files, 1):
file_id = file_info['id']
filename = file_info['name']
logger.info("[{}/{}] Processing: {}".format(idx, len(pdf_files), filename))
try:
# 1. Download PDF from Box
temp_file_path = temp_dir / filename
box_client.download_file(file_id, str(temp_file_path))
# 2. Extract data using LlamaExtract
extraction_data = extractor.extract_from_file(str(temp_file_path))
if extraction_data is None:
raise Exception("Extraction returned None")
# 3. Parse fields
parsed_fields = extractor.parse_csv_fields(extraction_data)
if not parsed_fields:
raise Exception("Failed to parse extraction fields")
# 4. Store in database with full JSON
db_result = db.store_creativex_score(
filename=parsed_fields['filename'],
creativex_id=parsed_fields['id'],
creativex_url=parsed_fields['url'],
quality_score=parsed_fields['score'],
box_file_id=file_id,
full_extraction_data=extraction_data
)
if not db_result['success']:
raise Exception("Database storage failed: {}".format(db_result.get('error', 'Unknown')))
# 5. Delete file from Box (only after successful storage)
try:
box_file = box_client.client.file(file_id)
box_file.delete()
logger.info(" Deleted from Box: {}".format(filename))
except Exception as e:
logger.warning(" Could not delete file from Box: {}".format(str(e)))
# Don't fail the whole process if delete fails
# 6. Clean up local temp file
try:
os.remove(str(temp_file_path))
except Exception as e:
logger.warning(" Could not delete temp file: {}".format(str(e)))
# Track success
processed_files.append({
'filename': parsed_fields['filename'],
'creativex_id': parsed_fields['id'],
'creativex_url': parsed_fields['url'],
'quality_score': parsed_fields['score'],
'box_file_id': file_id
})
logger.info(" ✓ Success: Score {} extracted and stored".format(parsed_fields['score']))
logger.info("")
except Exception as e:
logger.error(" ✗ Failed: {}".format(str(e)))
logger.info("")
failed_files.append({
'filename': filename,
'box_file_id': file_id,
'error': str(e)
})
# Clean up temp file if it exists
try:
temp_file_path = temp_dir / filename
if temp_file_path.exists():
os.remove(str(temp_file_path))
except Exception:
pass
# Summary
total_files = len(pdf_files)
success_count = len(processed_files)
failed_count = len(failed_files)
logger.info("=" * 60)
logger.info("Processing Complete")
logger.info("=" * 60)
logger.info("Total Files: {}".format(total_files))
logger.info("Successful: {}".format(success_count))
logger.info("Failed: {}".format(failed_count))
logger.info("")
# Send email notification
if failed_count == 0:
# All successful
notifier.send_email(
template_name='creativex_complete',
recipients=config['notifications']['recipients']['success'],
data={
'file_count': total_files,
'success_count': success_count,
'processed_files': processed_files
}
)
else:
# Partial success
notifier.send_email(
template_name='creativex_partial',
recipients=config['notifications']['recipients']['errors'],
data={
'file_count': total_files,
'success_count': success_count,
'failed_count': failed_count,
'processed_files': processed_files,
'failed_files': failed_files
}
)
return {
'success': success_count > 0,
'file_count': total_files,
'processed': success_count,
'failed': failed_count
}
except Exception as e:
logger.error("FATAL ERROR: {}".format(str(e)))
return {'success': False, 'error': str(e)}
def main():
"""Entry point."""
try:
logger.info("Starting CreativeX Score Extraction")
logger.info("")
# Load configuration
config = load_config('config/config.yaml')
# Initialize clients
logger.info("Initializing clients...")
# Box client for CreativeX folder
box = BoxClient(config, root_folder_id=config['creativex']['box_folder_id'])
# Database
db = Database(config)
# Notifier
notifier = Notifier(config)
# CreativeX Extractor
extractor = CreativeXExtractor(
api_key=config['creativex']['llama_api_key'],
agent_name=config['creativex']['agent_name']
)
logger.info("Clients initialized successfully")
logger.info("")
# Process PDFs
result = process_pdfs(box, db, extractor, notifier, config)
if result['success']:
logger.info("✓ CreativeX extraction completed successfully")
sys.exit(0)
else:
logger.error("✗ CreativeX extraction failed")
sys.exit(1)
except KeyboardInterrupt:
logger.info("\n\nProcess interrupted by user.")
sys.exit(1)
except Exception as e:
logger.error("\nFATAL ERROR: {}".format(str(e)))
import traceback
traceback.print_exc()
sys.exit(1)
finally:
# Close database connections
try:
db.close()
except Exception:
pass
if __name__ == "__main__":
main()


@ -536,6 +536,103 @@ class Database:
cursor.close()
self.put_connection(conn)
    def store_creativex_score(self, filename, creativex_id, creativex_url, quality_score, box_file_id, full_extraction_data):
        """
        Store CreativeX score data extracted from PDF
        Args:
            filename: Original filename from extraction
            creativex_id: CreativeX ID from extraction
            creativex_url: CreativeX URL from extraction
            quality_score: Quality score percentage
            box_file_id: Box file ID for tracking
            full_extraction_data: Complete LlamaExtract JSON response
        Returns:
            dict with success boolean (plus an 'error' message on failure)
        """
        conn = self.get_connection()
        cursor = None  # stays None if conn.cursor() itself raises
        try:
            cursor = conn.cursor()
            # Convert full_extraction_data to JSON string if it's a dict
            import json
            full_json = json.dumps(full_extraction_data) if isinstance(full_extraction_data, dict) else full_extraction_data
            cursor.execute("""
                INSERT INTO creativex_scores (
                    filename, creativex_id, creativex_url, quality_score,
                    box_file_id, full_extraction_data
                ) VALUES (%s, %s, %s, %s, %s, %s)
            """, (
                filename,
                creativex_id,
                creativex_url,
                quality_score,
                box_file_id,
                full_json
            ))
            conn.commit()
            logger.info("Stored CreativeX score: {} (Score: {})".format(filename, quality_score))
            return {'success': True}
        except Exception as e:
            conn.rollback()
            logger.error("Failed to store CreativeX score: {}".format(str(e)))
            return {'success': False, 'error': str(e)}
        finally:
            # Only close the cursor if it was actually created
            if cursor is not None:
                cursor.close()
            self.put_connection(conn)
def get_creativex_score_by_filename(self, filename):
"""
Get CreativeX score data by filename
Args:
filename: Filename to search for
Returns:
dict with creativex data or None if not found
"""
conn = self.get_connection()
try:
cursor = conn.cursor()
cursor.execute("""
SELECT filename, creativex_id, creativex_url, quality_score,
box_file_id, full_extraction_data, extracted_at
FROM creativex_scores
WHERE filename = %s AND status = 'active'
ORDER BY extracted_at DESC
LIMIT 1
""", (filename,))
row = cursor.fetchone()
if not row:
return None
# Parse JSONB as dict (the column is nullable, so pass None through)
import json
full_data = row[5] if row[5] is None or isinstance(row[5], dict) else json.loads(row[5])
return {
'filename': row[0],
'creativex_id': row[1],
'creativex_url': row[2],
'quality_score': row[3],
'box_file_id': row[4],
'full_extraction_data': full_data,
'extracted_at': row[6]
}
finally:
cursor.close()
self.put_connection(conn)
def close(self):
"""Close all connections in pool"""
if self.pool:


@ -678,6 +678,110 @@ class Notifier:
</div>
</div>
"""
},
'creativex_complete': {
'subject': "✅ CreativeX Scores Extracted - {file_count} files processed",
'html': """
<div style="font-family: Arial, sans-serif; max-width: 900px; margin: 0 auto;">
<div style="background-color: #9c27b0; color: white; padding: 20px; text-align: center; border-radius: 8px 8px 0 0;">
<h1 style="margin: 0;"> CreativeX Score Extraction Complete</h1>
</div>
<div style="background-color: #f3e5f5; border-left: 4px solid #9c27b0; padding: 15px; margin: 20px 0;">
<p style="margin: 0;"><strong>Files Processed:</strong> {{ file_count }}</p>
<p style="margin: 5px 0 0 0;"><strong>Scores Extracted:</strong> {{ success_count }}</p>
<p style="margin: 5px 0 0 0;"><strong>Source:</strong> Box Folder 350605024645</p>
</div>
<h3 style="margin-top: 30px; color: #333;">Extracted Scores:</h3>
{% for score in processed_files %}
<div style="border: 1px solid #ddd; margin: 15px 0; padding: 15px; background-color: #fafafa; border-radius: 4px;">
<div style="background-color: #9c27b0; color: white; padding: 10px 15px; margin: -15px -15px 15px -15px; border-radius: 4px 4px 0 0;">
<strong>{{ score.filename }}</strong>
</div>
<div style="padding: 10px; background-color: white; border-radius: 4px;">
<p style="margin: 5px 0;"><span style="font-weight: bold;">Quality Score:</span> <span style="font-size: 20px; color: #9c27b0;">{{ score.quality_score }}</span></p>
<p style="margin: 5px 0;"><span style="font-weight: bold;">CreativeX ID:</span> {{ score.creativex_id }}</p>
{% if score.creativex_url %}<p style="margin: 5px 0;"><span style="font-weight: bold;">CreativeX URL:</span> <a href="{{ score.creativex_url }}">{{ score.creativex_url }}</a></p>{% endif %}
<p style="margin: 5px 0;"><span style="font-weight: bold;">Box File ID:</span> {{ score.box_file_id }}</p>
</div>
</div>
{% endfor %}
<div style="background-color: #f3e5f5; border-left: 4px solid #9c27b0; padding: 15px; margin: 20px 0;">
<p style="margin: 0;"><strong> Complete:</strong> All scores extracted and stored in database.</p>
<p style="margin: 5px 0 0 0;"><strong>Files Removed:</strong> Processed PDFs deleted from Box folder.</p>
</div>
<p style="color: #666; font-size: 12px; margin-top: 20px;">CreativeX scores stored with full JSON for future lookups.</p>
</div>
"""
},
'creativex_partial': {
'subject': "⚠️ CreativeX Extraction Partial - {success_count}/{file_count} successful",
'html': """
<div style="font-family: Arial, sans-serif; max-width: 900px; margin: 0 auto;">
<div style="background-color: #ff9800; color: white; padding: 20px; text-align: center; border-radius: 8px 8px 0 0;">
<h1 style="margin: 0;"> CreativeX Extraction Partially Complete</h1>
</div>
<div style="background-color: #fff3e0; border-left: 4px solid #ff9800; padding: 15px; margin: 20px 0;">
<p style="margin: 0;"><strong>Total Files:</strong> {{ file_count }}</p>
<p style="margin: 5px 0 0 0;"><strong> Successful:</strong> <span style="color: #28a745;">{{ success_count }}</span></p>
<p style="margin: 5px 0 0 0;"><strong> Failed:</strong> <span style="color: #d32f2f;">{{ failed_count }}</span></p>
<p style="margin: 5px 0 0 0;"><strong>Source:</strong> Box Folder 350605024645</p>
</div>
{% if processed_files %}
<h3 style="margin-top: 30px; color: #28a745;"> Successful Extractions ({{ success_count }}):</h3>
{% for score in processed_files %}
<div style="border: 1px solid #c8e6c9; margin: 10px 0; padding: 12px; background-color: #f1f8e9; border-radius: 4px;">
<strong>{{ score.filename }}</strong> - Score: {{ score.quality_score }}
</div>
{% endfor %}
{% endif %}
{% if failed_files %}
<h3 style="margin-top: 30px; color: #d32f2f;"> Failed Extractions ({{ failed_count }}):</h3>
{% for file in failed_files %}
<div style="border: 1px solid #ffcdd2; margin: 10px 0; padding: 12px; background-color: #ffebee; border-radius: 4px;">
<strong>{{ file.filename }}</strong>
<br><small style="color: #666;">Error: {{ file.error }}</small>
</div>
{% endfor %}
{% endif %}
<div style="background-color: #fff3e0; border-left: 4px solid #ff9800; padding: 15px; margin: 20px 0;">
<p style="margin: 0;"><strong> Action Required:</strong> Review failed extractions above.</p>
<p style="margin: 5px 0 0 0;">Failed files remain in Box folder for retry.</p>
</div>
<p style="color: #666; font-size: 12px; margin-top: 20px;">Successful scores stored in database. Failed files not deleted from Box.</p>
</div>
"""
},
'creativex_no_files': {
'subject': " CreativeX Extraction - No files found",
'html': """
<div style="font-family: Arial, sans-serif; max-width: 900px; margin: 0 auto;">
<div style="background-color: #607d8b; color: white; padding: 20px; text-align: center; border-radius: 8px 8px 0 0;">
<h1 style="margin: 0;"> CreativeX Extraction - No Files</h1>
</div>
<div style="background-color: #eceff1; border-left: 4px solid #607d8b; padding: 15px; margin: 20px 0;">
<p style="margin: 0;"><strong>Status:</strong> No PDF files found</p>
<p style="margin: 5px 0 0 0;"><strong>Source:</strong> Box Folder 350605024645</p>
<p style="margin: 5px 0 0 0;"><strong>Run Time:</strong> {{ timestamp }}</p>
</div>
<div style="background-color: #e3f2fd; border-left: 4px solid #2196f3; padding: 15px; margin: 20px 0;">
<p style="margin: 0;"><strong> Note:</strong> This is expected behavior when no new PDFs are ready for processing.</p>
<p style="margin: 5px 0 0 0;">Upload PDFs to Box folder 350605024645 to process CreativeX scores.</p>
</div>
<p style="color: #666; font-size: 12px; margin-top: 20px;">Script completed successfully with no errors.</p>
</div>
"""
}
}