Add CreativeX score extraction and storage system

Implements new workflow to extract CreativeX quality scores from PDFs using LlamaExtract AI and store results in PostgreSQL database.

Components added:
- creativex_scoring_storing.py: Main script to process PDFs from Box
- creativex_scores table: Database table with JSONB for full JSON storage
- Database methods: store_creativex_score() and get_creativex_score_by_filename()
- Email templates: creativex_complete, creativex_partial, creativex_no_files
- Configuration: creativex section in config.yaml
- CREATIVEX_DEPLOYMENT.md: Complete deployment and usage guide

Features:
- Monitors Box folder 350605024645 for PDFs
- Extracts scores using LlamaExtract agent "Creativex-Extract"
- Stores 4 key fields (filename, ID, URL, score) + full JSON
- Deletes processed PDFs from Box after successful extraction
- Sends email notifications for success/partial/no-files scenarios
- Manual execution (python scripts/creativex_scoring_storing.py)

Database schema:
- Table: creativex_scores with 10 columns
- Indexes on filename, box_file_id, status for fast lookups
- JSONB column stores complete extraction for future flexibility

Future integration ready: db.get_creativex_score_by_filename() available for DAM upload workflows to attach CreativeX metadata during asset processing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 16:15:45 -05:00 · b6b9d7337a
commit b6b9d7337a
parent a9c3ff6503
7 changed files with 1041 additions and 1 deletions
--- a/Python-Version/CREATIVEX_DEPLOYMENT.md
+++ b/Python-Version/CREATIVEX_DEPLOYMENT.md
@ -0,0 +1,398 @@
+# CreativeX Score Extraction - Deployment Guide
+
+## Overview
+
+This guide covers deploying the CreativeX score extraction system, which:
+1. Monitors Box folder 350605024645 for PDF files
+2. Extracts CreativeX scores using LlamaExtract AI agent "Creativex-Extract"
+3. Stores results in PostgreSQL database with full JSON
+4. Removes processed files from Box
+5. Sends email notifications
+
+## Local Development Setup
+
+### 1. Add Environment Variable
+
+Add to your `.env` file:
+
+```bash
+# CreativeX Configuration
+LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key_here
+```
+
+### 2. Install Python Dependencies
+
+```bash
+cd Python-Version
+source venv/bin/activate
+pip install llama-cloud-services
+```
+
+Or install all dependencies:
+
+```bash
+pip install -r requirements.txt
+```
+
+### 3. Create Database Table
+
+**If starting fresh (full init):**
+```bash
+PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql
+```
+
+**If database already exists (add table only):**
+```bash
+PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
+CREATE TABLE IF NOT EXISTS creativex_scores (
+    id SERIAL PRIMARY KEY,
+    filename VARCHAR(500) NOT NULL,
+    box_file_id VARCHAR(255),
+    creativex_id VARCHAR(255),
+    creativex_url TEXT,
+    quality_score VARCHAR(50),
+    full_extraction_data JSONB,
+    extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    status VARCHAR(50) DEFAULT 'active',
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
+CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
+CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
+"
+```
+
+### 4. Verify Table Creation
+
+```bash
+PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "\d creativex_scores"
+```
+
+You should see:
+- 10 columns (id, filename, box_file_id, creativex_id, creativex_url, quality_score, full_extraction_data, extracted_at, status, created_at)
+- 3 indexes (idx_creativex_filename, idx_creativex_box_file, idx_creativex_status)
+
+### 5. Test Locally
+
+```bash
+# Run the script manually
+python scripts/creativex_scoring_storing.py
+```
+
+**Expected behaviors:**
+- If no PDFs in Box folder 350605024645: "No PDF files found" email sent
+- If PDFs present: Extraction runs, scores stored, files deleted from Box
+- If extraction fails: Partial success email with errors
+
+## Production Server Deployment
+
+### Prerequisites
+- Server already running Ferrero automation (A1→A2, A5→A6, etc.)
+- Git repository backed up to Bitbucket
+- SSH access to production server
+
+### Step 1: Update .env on Server
+
+SSH to server and add:
+
+```bash
+cd /opt/ferrero-automation/Python-Version
+nano .env
+```
+
+Add:
+```bash
+# CreativeX Configuration
+LLAMA_CLOUD_API_KEY=your_production_llama_cloud_api_key
+```
+
+Save and exit (Ctrl+X, Y, Enter).
+
+### Step 2: Pull Latest Code
+
+```bash
+cd /opt/ferrero-automation/Python-Version
+git pull origin main
+```
+
+This will include:
+- `scripts/creativex_scoring_storing.py`
+- Updated `database/init.sql`
+- Updated `scripts/shared/database.py`
+- Updated `scripts/shared/notifier.py`
+- Updated `config/config.yaml`
+- Updated `requirements.txt`
+
+### Step 3: Install Dependencies
+
+```bash
+cd /opt/ferrero-automation/Python-Version
+source venv/bin/activate
+pip install llama-cloud-services
+```
+
+Or update all:
+```bash
+pip install -r requirements.txt --upgrade
+```
+
+### Step 4: Create Database Table
+
+```bash
+PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
+CREATE TABLE IF NOT EXISTS creativex_scores (
+    id SERIAL PRIMARY KEY,
+    filename VARCHAR(500) NOT NULL,
+    box_file_id VARCHAR(255),
+    creativex_id VARCHAR(255),
+    creativex_url TEXT,
+    quality_score VARCHAR(50),
+    full_extraction_data JSONB,
+    extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    status VARCHAR(50) DEFAULT 'active',
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
+CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
+CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
+"
+```
+
+### Step 5: Verify Installation
+
+```bash
+# Check database table
+PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT COUNT(*) FROM creativex_scores;"
+
+# Check script exists
+ls -lh scripts/creativex_scoring_storing.py
+
+# Make it executable
+chmod +x scripts/creativex_scoring_storing.py
+```
+
+### Step 6: Test Run
+
+```bash
+cd /opt/ferrero-automation/Python-Version
+source venv/bin/activate
+python scripts/creativex_scoring_storing.py
+```
+
+Check logs:
+```bash
+tail -f logs/creativex_scoring.log
+```
+
+### Step 7: Add to Cron (Optional - If Automated)
+
+**Note:** This workflow is run manually for now, so skip this step initially.
+
+If you want to automate later (e.g., every hour):
+
+```bash
+crontab -e
+```
+
+Add:
+```cron
+# CreativeX Score Extraction - Every hour
+0 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/creativex_scoring_storing.py >> logs/cron_creativex.log 2>&1
+```
+
+Save and exit.
+
+## Configuration Details
+
+### Box Folder
+- **Folder ID:** 350605024645
+- **Purpose:** Drop PDFs here for CreativeX score extraction
+- **Behavior:** Files are automatically deleted after successful processing
+
+### LlamaExtract Agent
+- **Agent Name:** Creativex-Extract
+- **Expected Fields:**
+  - `filename`: Original filename from PDF
+  - `creativeXId.id`: CreativeX identifier
+  - `creativeXId.url`: CreativeX URL
+  - `ferreroCreativeQuality.percentage`: Quality score
+
+### Database Storage
+- **Table:** `creativex_scores`
+- **Quick Access Fields:** filename, creativex_id, creativex_url, quality_score
+- **Full JSON:** Stored in `full_extraction_data` JSONB column
+- **Purpose:** Future lookups by filename during DAM uploads
+
+### Email Notifications
+
+**Recipients configured in .env:**
+- Success: `REPORT_EMAILS`
+- Errors: `ERROR_EMAIL`
+
+**Templates:**
+1. `creativex_complete` - All files processed successfully
+2. `creativex_partial` - Some files failed
+3. `creativex_no_files` - No PDFs found (normal if folder empty)
+
+## Usage
+
+### Manual Execution
+
+```bash
+cd /opt/ferrero-automation/Python-Version
+source venv/bin/activate
+python scripts/creativex_scoring_storing.py
+```
+
+### Workflow
+
+1. Upload PDFs to Box folder 350605024645
+2. Run script (manual or cron)
+3. Script downloads each PDF
+4. LlamaExtract processes PDF
+5. Results stored in database
+6. PDF deleted from Box
+7. Email notification sent
+
+### Checking Results
+
+```bash
+# View recent extractions
+PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
+SELECT filename, creativex_id, quality_score, extracted_at
+FROM creativex_scores
+ORDER BY extracted_at DESC
+LIMIT 10;
+"
+
+# Count total scores
+PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
+SELECT COUNT(*) as total_scores FROM creativex_scores WHERE status = 'active';
+"
+
+# View specific file
+PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
+SELECT * FROM creativex_scores WHERE filename LIKE '%yourfile%';
+"
+```
+
+### Viewing Full JSON
+
+```bash
+PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
+SELECT filename, jsonb_pretty(full_extraction_data)
+FROM creativex_scores
+WHERE filename = 'example.pdf';
+"
+```
+
+## Future Integration
+
+The database method `db.get_creativex_score_by_filename(filename)` is ready for use in other scripts.
+
+**Example usage in future DAM upload workflow:**
+
+```python
+# In a2_to_a3_upload_polling.py or similar
+filename = "Brand_Country_Language_123456.mp4"
+
+# Lookup CreativeX score
+score_data = db.get_creativex_score_by_filename(filename)
+
+if score_data:
+    # Add to DAM metadata
+    dam_metadata['FERRERO.FIELD.CREATIVEX_SCORE'] = score_data['quality_score']
+    dam_metadata['FERRERO.FIELD.CREATIVEX_URL'] = score_data['creativex_url']
+    dam_metadata['FERRERO.FIELD.CREATIVEX_ID'] = score_data['creativex_id']
+```
+
+## Troubleshooting
+
+### "llama-cloud-services not installed"
+```bash
+source venv/bin/activate
+pip install llama-cloud-services
+```
+
+### "Agent 'Creativex-Extract' not found"
+- Verify agent name in LlamaCloud portal
+- Check spelling matches exactly: `Creativex-Extract`
+- Verify API key is correct
+
+### "No PDF files found"
+- This is normal if Box folder 350605024645 is empty
+- Upload test PDF to folder and re-run
+
+### "Database connection failed"
+```bash
+# Check PostgreSQL is running
+docker ps | grep ferrero
+
+# Test connection
+PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;"
+```
+
+### "Email not sending"
+- Check SMTP configuration in .env
+- Verify Mailgun credentials
+- Check logs for detailed error
+
+### Files not deleted from Box
+- This is expected for failed extractions
+- Only successful extractions delete files
+- Failed files remain for manual review/retry
+
+## Rollback Instructions
+
+If you need to roll back:
+
+### Remove Database Table
+```bash
+PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "
+DROP TABLE IF EXISTS creativex_scores CASCADE;
+"
+```
+
+### Remove from Cron
+```bash
+crontab -e
+# Delete the CreativeX line, save and exit
+```
+
+### Revert Code
+```bash
+cd /opt/ferrero-automation/Python-Version
+git revert <commit-hash>
+git push origin main
+```
+
+## Support
+
+- **Logs:** `logs/creativex_scoring.log`
+- **Database Queries:** See "Checking Results" section above
+- **Email Test:** Check SMTP settings and recipients list
+- **LlamaCloud Issues:** Verify API key and agent configuration
+
+## Summary Checklist
+
+**Local Setup:**
+- [ ] Add `LLAMA_CLOUD_API_KEY` to .env
+- [ ] Install `llama-cloud-services` package
+- [ ] Create `creativex_scores` table
+- [ ] Test script runs successfully
+
+**Production Deployment:**
+- [ ] Git pull latest code
+- [ ] Add `LLAMA_CLOUD_API_KEY` to server .env
+- [ ] Install dependencies on server
+- [ ] Create database table on server
+- [ ] Test run on server
+- [ ] Verify email notifications
+- [ ] (Optional) Add to cron if automating
+
+**Post-Deployment:**
+- [ ] Upload test PDF to Box folder 350605024645
+- [ ] Run script and verify extraction
+- [ ] Check database record created
+- [ ] Verify PDF deleted from Box
+- [ ] Confirm email notification received
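The guide lists three email templates, and the script in this commit picks one per run based on how many PDFs were found and how many failed. The rule can be sketched as a small pure function. `choose_notification` is a hypothetical name used only for illustration; the template names and the `success`/`errors` recipient groups are taken from the diff.

```python
def choose_notification(total_files, failed_count):
    """Return (template_name, recipient_group) for one run.

    Hypothetical helper that mirrors the branching in process_pdfs().
    """
    if total_files == 0:
        # Empty Box folder: informational email to the success recipients.
        return ("creativex_no_files", "success")
    if failed_count == 0:
        # Every PDF was extracted and stored.
        return ("creativex_complete", "success")
    # At least one failure, including the case where every file failed.
    return ("creativex_partial", "errors")


for total, failed in [(0, 0), (3, 0), (3, 1), (3, 3)]:
    print(total, failed, choose_notification(total, failed))
```

Note that a run in which every file fails still uses `creativex_partial`, because the script only distinguishes "no failures" from "some failures".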
--- a/Python-Version/config/config.yaml
+++ b/Python-Version/config/config.yaml
@ -95,6 +95,12 @@ notifications:
 fields:
  mappings_file: config/field_mappings.yaml

+# CreativeX Configuration
+creativex:
+  llama_api_key: ${LLAMA_CLOUD_API_KEY}
+  agent_name: Creativex-Extract
+  box_folder_id: "350605024645"
+
 # Logging Configuration
 logging:
  level: INFO
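The `${LLAMA_CLOUD_API_KEY}` placeholder in the new `creativex` section suggests that the config loader substitutes environment variables after parsing the YAML. `shared/config_loader.py` is not part of this commit, so the following is only a sketch of how such expansion might work. `expand_env` and the `example-key` value are assumptions made for illustration.

```python
import os
import re

# Matches ${VAR_NAME} placeholders such as ${LLAMA_CLOUD_API_KEY}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${VAR} placeholders in a parsed config structure.

    Hypothetical helper: the real load_config() is not shown in this commit.
    Unset variables raise KeyError so a missing API key fails early.
    """
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda match: env[match.group(1)], value)
    return value


# The creativex section from config.yaml, as it would look after YAML parsing.
raw_config = {
    "creativex": {
        "llama_api_key": "${LLAMA_CLOUD_API_KEY}",
        "agent_name": "Creativex-Extract",
        "box_folder_id": "350605024645",
    }
}

config = expand_env(raw_config, env={"LLAMA_CLOUD_API_KEY": "example-key"})
print(config["creativex"]["llama_api_key"])
```

Failing on an unset variable, rather than leaving the literal placeholder in place, is a design choice in this sketch: it surfaces a missing key at startup instead of at the first LlamaExtract call.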
--- a/Python-Version/database/init.sql
+++ b/Python-Version/database/init.sql
@ -172,6 +172,35 @@ CREATE TABLE IF NOT EXISTS campaign_status (

 \echo 'Table campaign_status created'

+-- ============================================================================
+-- Table: creativex_scores
+-- Purpose: Stores CreativeX quality scores extracted from PDFs via LlamaExtract
+-- ============================================================================
+
+CREATE TABLE IF NOT EXISTS creativex_scores (
+    -- Primary Key
+    id SERIAL PRIMARY KEY,
+
+    -- File Information
+    filename VARCHAR(500) NOT NULL,
+    box_file_id VARCHAR(255),
+
+    -- CreativeX Data (parsed fields for quick access)
+    creativex_id VARCHAR(255),
+    creativex_url TEXT,
+    quality_score VARCHAR(50),
+
+    -- Full Extraction Data (JSONB - Complete LlamaExtract response for future use)
+    full_extraction_data JSONB,
+
+    -- Timestamps
+    extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    status VARCHAR(50) DEFAULT 'active',
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+\echo 'Table creativex_scores created'
+
 \echo 'Tables created successfully'

 -- ============================================================================
@ -211,6 +240,11 @@ CREATE INDEX IF NOT EXISTS idx_campaign_status_status ON campaign_status(status)
 CREATE INDEX IF NOT EXISTS idx_campaign_status_live ON campaign_status(live_campaign);
 CREATE INDEX IF NOT EXISTS idx_campaign_status_webhook_sent ON campaign_status(webhook_sent);

+-- creativex_scores indexes
+CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename);
+CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id);
+CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status);
+
 \echo 'Indexes created successfully'

 -- ============================================================================
@ -323,8 +357,10 @@ GRANT USAGE ON SCHEMA public TO ferrero_user;
 \echo '  - derivative_assets'
 \echo '  - asset_events'
 \echo '  - workflow_state'
+\echo '  - campaign_status'
+\echo '  - creativex_scores'
 \echo ''
-\echo 'Indexes created: 12'
+\echo 'Indexes created: 15'
 \echo 'Triggers created: 4'
 \echo 'Functions created: 2'
 \echo ''
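The lookup that the new table is meant to serve (the latest active row for a filename) can be tried without a PostgreSQL instance. The sketch below uses an in-memory SQLite database as a stand-in, so JSONB becomes TEXT and the timestamp columns become TEXT as well. The function names, the sample scores, and the `archived` status value are assumptions for illustration; only `active` appears in the diff.

```python
import json
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL table (JSONB stored as TEXT).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE creativex_scores (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        filename TEXT NOT NULL,
        box_file_id TEXT,
        creativex_id TEXT,
        creativex_url TEXT,
        quality_score TEXT,
        full_extraction_data TEXT,
        extracted_at TEXT DEFAULT CURRENT_TIMESTAMP,
        status TEXT DEFAULT 'active',
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)
conn.execute("CREATE INDEX idx_creativex_filename ON creativex_scores(filename)")


def store_score(filename, quality_score, extracted_at, status="active"):
    """Insert one row; the JSON payload is a minimal placeholder."""
    payload = json.dumps({"data": {"filename": filename}})
    conn.execute(
        "INSERT INTO creativex_scores "
        "(filename, quality_score, full_extraction_data, extracted_at, status) "
        "VALUES (?, ?, ?, ?, ?)",
        (filename, quality_score, payload, extracted_at, status),
    )


def latest_active_score(filename):
    """Return the most recent active row for a filename, or None."""
    row = conn.execute(
        "SELECT filename, quality_score, full_extraction_data, extracted_at "
        "FROM creativex_scores "
        "WHERE filename = ? AND status = 'active' "
        "ORDER BY extracted_at DESC LIMIT 1",
        (filename,),
    ).fetchone()
    if row is None:
        return None
    return {
        "filename": row[0],
        "quality_score": row[1],
        "full_extraction_data": json.loads(row[2]),
        "extracted_at": row[3],
    }


# The same PDF extracted twice, plus one row that is no longer active.
store_score("example.pdf", "72%", "2025-11-10 09:00:00")
store_score("example.pdf", "85%", "2025-11-11 09:00:00")
store_score("example.pdf", "90%", "2025-11-12 09:00:00", status="archived")

print(latest_active_score("example.pdf")["quality_score"])
```

Because the table has no unique constraint on `filename`, re-running an extraction for the same PDF adds a second row, and the `ORDER BY extracted_at DESC LIMIT 1` clause is what makes the lookup return the newest one.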
--- a/Python-Version/requirements.txt
+++ b/Python-Version/requirements.txt
@ -24,6 +24,9 @@ cryptography>=3.4.0
 # Email templates
 Jinja2>=3.0.0

+# LlamaExtract for CreativeX score extraction
+llama-cloud-services>=0.1.0
+
 # Retry logic
 tenacity>=8.0.0
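Since the script only discovers a missing `llama-cloud-services` install when it constructs the extractor, a preflight check could report the problem before any Box or database clients are created. This is a sketch under that assumption; `missing_modules` is a hypothetical helper, and the module name `llama_cloud_services` is the one imported by the script in this commit.

```python
import importlib.util


def missing_modules(module_names):
    """Return the subset of top-level module names that cannot be imported."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Check the import name used by creativex_scoring_storing.py.
missing = missing_modules(["llama_cloud_services"])
if missing:
    print("Missing modules: {}".format(", ".join(missing)))
    print("Run: pip install -r requirements.txt")
else:
    print("All required modules are available")
```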

--- a/Python-Version/scripts/creativex_scoring_storing.py
+++ b/Python-Version/scripts/creativex_scoring_storing.py
@ -0,0 +1,396 @@
+#!/usr/bin/env python3
+"""
+CreativeX Score Extractor and Storage
+Processes PDFs from Box folder 350605024645, extracts CreativeX scores using LlamaExtract,
+stores results in database, and removes processed files from Box.
+Compatible with Python 3.6+
+"""
+
+import sys
+import os
+import logging
+from datetime import datetime
+from pathlib import Path
+
+# Add shared library to path
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
+
+from shared.config_loader import load_config
+from shared.box_client import BoxClient
+from shared.database import Database
+from shared.notifier import Notifier
+
+# Setup logging with rotation
+from logging.handlers import RotatingFileHandler
+
+# Create logs directory if it doesn't exist
+os.makedirs('logs', exist_ok=True)
+
+# Configure logging with rotation
+log_handler = RotatingFileHandler(
+    'logs/creativex_scoring.log',
+    maxBytes=10*1024*1024,  # 10MB per file
+    backupCount=28  # Keep up to 28 rotated backups of 10MB each (rotation is size-based, not time-based)
+)
+log_handler.setLevel(logging.INFO)
+log_handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
+
+console_handler = logging.StreamHandler()
+console_handler.setLevel(logging.INFO)
+console_handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
+
+logging.basicConfig(
+    level=logging.INFO,
+    handlers=[log_handler, console_handler]
+)
+
+logger = logging.getLogger('CreativeXScoring')
+
+
+class CreativeXExtractor:
+    """Handles extraction of CreativeX data from PDF files using LlamaExtract."""
+
+    def __init__(self, api_key, agent_name):
+        """
+        Initialize the Llama Extract client.
+
+        Args:
+            api_key: LlamaCloud API key
+            agent_name: Agent name in LlamaExtract
+        """
+        try:
+            from llama_cloud_services import LlamaExtract
+            self.extractor = LlamaExtract(api_key=api_key)
+            self.agent_name = agent_name
+            logger.info("LlamaExtract client initialized with agent: {}".format(agent_name))
+        except ImportError:
+            logger.error("llama-cloud-services not installed. Run: pip install llama-cloud-services")
+            raise
+        except Exception as e:
+            logger.error("Failed to initialize LlamaExtract: {}".format(str(e)))
+            raise
+
+    def extract_from_file(self, file_path):
+        """
+        Extract data from a PDF file using Llama Extract.
+
+        Args:
+            file_path: Path to the PDF file
+
+        Returns:
+            Dictionary containing the extraction result, or None if extraction fails
+        """
+        try:
+            logger.info("  Getting agent: {}".format(self.agent_name))
+            agent = self.extractor.get_agent(name=self.agent_name)
+
+            if agent is None:
+                raise Exception("Agent '{}' not found".format(self.agent_name))
+
+            logger.info("  Running extraction on: {}".format(os.path.basename(file_path)))
+            result = agent.extract(str(file_path))
+
+            # Convert result to dictionary format
+            extraction_data = {
+                "run_id": getattr(result, "run_id", None),
+                "extraction_agent_id": getattr(result, "extraction_agent_id", None),
+                "data": result.data if hasattr(result, "data") else {},
+                "extraction_metadata": getattr(result, "extraction_metadata", {})
+            }
+
+            return extraction_data
+
+        except Exception as e:
+            logger.error("  ERROR: Extraction failed - {}".format(str(e)))
+            return None
+
+    def parse_csv_fields(self, extraction_data):
+        """
+        Parse specific fields for database storage.
+
+        Expected fields:
+        - filename
+        - creativeXId.id
+        - creativeXId.url
+        - ferreroCreativeQuality.percentage
+
+        Args:
+            extraction_data: Full extraction result dictionary
+
+        Returns:
+            Dictionary with parsed fields (a missing filename only logs a warning), or None if parsing raises an exception
+        """
+        try:
+            data = extraction_data.get("data", {})
+
+            # Extract filename
+            filename = data.get("filename", "")
+
+            # Extract creativeXId fields
+            creative_x_id_obj = data.get("creativeXId", {})
+            creative_x_id = creative_x_id_obj.get("id", "") if isinstance(creative_x_id_obj, dict) else ""
+            creative_x_url = creative_x_id_obj.get("url", "") if isinstance(creative_x_id_obj, dict) else ""
+
+            # Extract ferreroCreativeQuality percentage
+            ferrero_quality_obj = data.get("ferreroCreativeQuality", {})
+            quality_score = ferrero_quality_obj.get("percentage", "") if isinstance(ferrero_quality_obj, dict) else ""
+
+            # Validate that we have the critical fields
+            if not filename:
+                logger.warning("  WARNING: filename field is missing from extraction data")
+
+            return {
+                "filename": filename,
+                "id": creative_x_id,
+                "url": creative_x_url,
+                "score": quality_score
+            }
+
+        except Exception as e:
+            logger.error("  ERROR: Failed to parse CSV fields - {}".format(str(e)))
+            return None
+
+
+def process_pdfs(box_client, db, extractor, notifier, config):
+    """
+    Process all PDFs in the CreativeX Box folder.
+
+    Args:
+        box_client: BoxClient instance
+        db: Database instance
+        extractor: CreativeXExtractor instance
+        notifier: Notifier instance
+        config: Configuration dict
+
+    Returns:
+        dict with processing results
+    """
+    creativex_folder_id = config['creativex']['box_folder_id']
+
+    logger.info("=" * 60)
+    logger.info("CreativeX Score Extraction")
+    logger.info("=" * 60)
+    logger.info("Box Folder ID: {}".format(creativex_folder_id))
+    logger.info("")
+
+    try:
+        # List all PDF files in Box folder
+        files = box_client.list_folder_files(creativex_folder_id)
+        pdf_files = [f for f in files if f['name'].lower().endswith('.pdf')]
+
+        if not pdf_files:
+            logger.info("No PDF files found in Box folder")
+
+            # Send email notification
+            notifier.send_email(
+                template_name='creativex_no_files',
+                recipients=config['notifications']['recipients']['success'],
+                data={
+                    'timestamp': datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+                }
+            )
+
+            return {'success': True, 'file_count': 0, 'processed': 0, 'failed': 0}
+
+        logger.info("Found {} PDF file(s) to process".format(len(pdf_files)))
+        logger.info("")
+
+        # Create temp directory
+        temp_dir = Path('temp/creativex')
+        temp_dir.mkdir(parents=True, exist_ok=True)
+
+        # Track results
+        processed_files = []
+        failed_files = []
+
+        # Process each PDF
+        for idx, file_info in enumerate(pdf_files, 1):
+            file_id = file_info['id']
+            filename = file_info['name']
+
+            logger.info("[{}/{}] Processing: {}".format(idx, len(pdf_files), filename))
+
+            try:
+                # 1. Download PDF from Box
+                temp_file_path = temp_dir / filename
+                box_client.download_file(file_id, str(temp_file_path))
+
+                # 2. Extract data using LlamaExtract
+                extraction_data = extractor.extract_from_file(str(temp_file_path))
+
+                if extraction_data is None:
+                    raise Exception("Extraction returned None")
+
+                # 3. Parse fields
+                parsed_fields = extractor.parse_csv_fields(extraction_data)
+
+                if not parsed_fields:
+                    raise Exception("Failed to parse extraction fields")
+
+                # 4. Store in database with full JSON
+                db_result = db.store_creativex_score(
+                    filename=parsed_fields['filename'],
+                    creativex_id=parsed_fields['id'],
+                    creativex_url=parsed_fields['url'],
+                    quality_score=parsed_fields['score'],
+                    box_file_id=file_id,
+                    full_extraction_data=extraction_data
+                )
+
+                if not db_result['success']:
+                    raise Exception("Database storage failed: {}".format(db_result.get('error', 'Unknown')))
+
+                # 5. Delete file from Box (only after successful storage)
+                try:
+                    box_file = box_client.client.file(file_id)
+                    box_file.delete()
+                    logger.info("  Deleted from Box: {}".format(filename))
+                except Exception as e:
+                    logger.warning("  Could not delete file from Box: {}".format(str(e)))
+                    # Don't fail the whole process if delete fails
+
+                # 6. Clean up local temp file
+                try:
+                    os.remove(str(temp_file_path))
+                except Exception as e:
+                    logger.warning("  Could not delete temp file: {}".format(str(e)))
+
+                # Track success
+                processed_files.append({
+                    'filename': parsed_fields['filename'],
+                    'creativex_id': parsed_fields['id'],
+                    'creativex_url': parsed_fields['url'],
+                    'quality_score': parsed_fields['score'],
+                    'box_file_id': file_id
+                })
+
+                logger.info("  ✓ Success: Score {} extracted and stored".format(parsed_fields['score']))
+                logger.info("")
+
+            except Exception as e:
+                logger.error("  ✗ Failed: {}".format(str(e)))
+                logger.info("")
+
+                failed_files.append({
+                    'filename': filename,
+                    'box_file_id': file_id,
+                    'error': str(e)
+                })
+
+                # Clean up temp file if it exists
+                try:
+                    temp_file_path = temp_dir / filename
+                    if temp_file_path.exists():
+                        os.remove(str(temp_file_path))
+                except Exception:
+                    pass
+
+        # Summary
+        total_files = len(pdf_files)
+        success_count = len(processed_files)
+        failed_count = len(failed_files)
+
+        logger.info("=" * 60)
+        logger.info("Processing Complete")
+        logger.info("=" * 60)
+        logger.info("Total Files: {}".format(total_files))
+        logger.info("Successful: {}".format(success_count))
+        logger.info("Failed: {}".format(failed_count))
+        logger.info("")
+
+        # Send email notification
+        if failed_count == 0:
+            # All successful
+            notifier.send_email(
+                template_name='creativex_complete',
+                recipients=config['notifications']['recipients']['success'],
+                data={
+                    'file_count': total_files,
+                    'success_count': success_count,
+                    'processed_files': processed_files
+                }
+            )
+        else:
+            # Partial success
+            notifier.send_email(
+                template_name='creativex_partial',
+                recipients=config['notifications']['recipients']['errors'],
+                data={
+                    'file_count': total_files,
+                    'success_count': success_count,
+                    'failed_count': failed_count,
+                    'processed_files': processed_files,
+                    'failed_files': failed_files
+                }
+            )
+
+        return {
+            'success': success_count > 0,
+            'file_count': total_files,
+            'processed': success_count,
+            'failed': failed_count
+        }
+
+    except Exception as e:
+        logger.error("FATAL ERROR: {}".format(str(e)))
+        return {'success': False, 'error': str(e)}
+
+
+def main():
+    """Entry point."""
+    try:
+        logger.info("Starting CreativeX Score Extraction")
+        logger.info("")
+
+        # Load configuration
+        config = load_config('config/config.yaml')
+
+        # Initialize clients
+        logger.info("Initializing clients...")
+
+        # Box client for CreativeX folder
+        box = BoxClient(config, root_folder_id=config['creativex']['box_folder_id'])
+
+        # Database
+        db = Database(config)
+
+        # Notifier
+        notifier = Notifier(config)
+
+        # CreativeX Extractor
+        extractor = CreativeXExtractor(
+            api_key=config['creativex']['llama_api_key'],
+            agent_name=config['creativex']['agent_name']
+        )
+
+        logger.info("Clients initialized successfully")
+        logger.info("")
+
+        # Process PDFs
+        result = process_pdfs(box, db, extractor, notifier, config)
+
+        if result['success']:
+            logger.info("✓ CreativeX extraction completed successfully")
+            sys.exit(0)
+        else:
+            logger.error("✗ CreativeX extraction failed")
+            sys.exit(1)
+
+    except KeyboardInterrupt:
+        logger.info("\n\nProcess interrupted by user.")
+        sys.exit(1)
+    except Exception as e:
+        logger.error("\nFATAL ERROR: {}".format(str(e)))
+        import traceback
+        traceback.print_exc()
+        sys.exit(1)
+    finally:
+        # Close database connections
+        try:
+            db.close()
+        except Exception:
+            pass
+
+
+if __name__ == "__main__":
+    main()
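The field mapping performed by `parse_csv_fields()` is easier to follow against a concrete payload. The sketch below re-implements the same nested lookups as a standalone function and runs it on a sample dictionary. `parse_fields` is a hypothetical name, and every value in `sample` is made up for illustration; the real shape of the LlamaExtract response depends on the agent's schema.

```python
def parse_fields(extraction_data):
    """Flatten the nested extraction payload into the four stored fields."""
    data = extraction_data.get("data") or {}

    # Nested objects may be missing or have an unexpected type.
    creativex = data.get("creativeXId")
    creativex = creativex if isinstance(creativex, dict) else {}
    quality = data.get("ferreroCreativeQuality")
    quality = quality if isinstance(quality, dict) else {}

    return {
        "filename": data.get("filename", ""),
        "id": creativex.get("id", ""),
        "url": creativex.get("url", ""),
        "score": quality.get("percentage", ""),
    }


# Hypothetical payload in the shape built by extract_from_file().
sample = {
    "run_id": "run-123",
    "data": {
        "filename": "Brand_Country_Language_123456.mp4",
        "creativeXId": {"id": "cx-42", "url": "https://example.com/cx-42"},
        "ferreroCreativeQuality": {"percentage": "85%"},
    },
}

print(parse_fields(sample))
print(parse_fields({"data": {"creativeXId": None}}))
```

The second call shows the fallback behavior: missing or malformed sections yield empty strings rather than an error, which matches how the script stores a row even when a field could not be extracted.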
--- a/Python-Version/scripts/shared/database.py
+++ b/Python-Version/scripts/shared/database.py
@ -536,6 +536,103 @@ class Database:
            cursor.close()
            self.put_connection(conn)

+    def store_creativex_score(self, filename, creativex_id, creativex_url, quality_score, box_file_id, full_extraction_data):
+        """
+        Store CreativeX score data extracted from PDF
+
+        Args:
+            filename: Original filename from extraction
+            creativex_id: CreativeX ID from extraction
+            creativex_url: CreativeX URL from extraction
+            quality_score: Quality score percentage
+            box_file_id: Box file ID for tracking
+            full_extraction_data: Complete LlamaExtract JSON response
+
+        Returns:
+            dict with success boolean
+        """
+        conn = self.get_connection()
+        try:
+            cursor = conn.cursor()
+
+            # Convert full_extraction_data to JSON string if it's a dict
+            import json
+            full_json = json.dumps(full_extraction_data) if isinstance(full_extraction_data, dict) else full_extraction_data
+
+            cursor.execute("""
+                INSERT INTO creativex_scores (
+                    filename, creativex_id, creativex_url, quality_score,
+                    box_file_id, full_extraction_data
+                ) VALUES (%s, %s, %s, %s, %s, %s)
+            """, (
+                filename,
+                creativex_id,
+                creativex_url,
+                quality_score,
+                box_file_id,
+                full_json
+            ))
+
+            conn.commit()
+            logger.info("Stored CreativeX score: {} (Score: {})".format(filename, quality_score))
+
+            return {'success': True}
+
+        except Exception as e:
+            conn.rollback()
+            logger.error("Failed to store CreativeX score: {}".format(str(e)))
+            return {'success': False, 'error': str(e)}
+
+        finally:
+            cursor.close()
+            self.put_connection(conn)
+
+    def get_creativex_score_by_filename(self, filename):
+        """
+        Get CreativeX score data by filename
+
+        Args:
+            filename: Filename to search for
+
+        Returns:
+            dict with creativex data or None if not found
+        """
+        conn = self.get_connection()
+        cursor = None
+        try:
+            cursor = conn.cursor()
+
+            cursor.execute("""
+                SELECT filename, creativex_id, creativex_url, quality_score,
+                       box_file_id, full_extraction_data, extracted_at
+                FROM creativex_scores
+                WHERE filename = %s AND status = 'active'
+                ORDER BY extracted_at DESC
+                LIMIT 1
+            """, (filename,))
+
+            row = cursor.fetchone()
+
+            if not row:
+                return None
+
+            # The driver may return JSONB already parsed (dict or list) or as a
+            # JSON string; a NULL column comes back as None and is left as-is
+            import json
+            raw = row[5]
+            if raw is None or isinstance(raw, (dict, list)):
+                full_data = raw
+            else:
+                full_data = json.loads(raw)
+
+            return {
+                'filename': row[0],
+                'creativex_id': row[1],
+                'creativex_url': row[2],
+                'quality_score': row[3],
+                'box_file_id': row[4],
+                'full_extraction_data': full_data,
+                'extracted_at': row[6]
+            }
+
+        finally:
+            # cursor is still None if conn.cursor() raised
+            if cursor is not None:
+                cursor.close()
+            self.put_connection(conn)
+
    def close(self):
        """Close all connections in pool"""
        if self.pool:
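
Review note: `store_creativex_score()` and `get_creativex_score_by_filename()` both depend on a JSON round trip through the `full_extraction_data` JSONB column. The sketch below isolates that round trip so it can be checked without a database. It assumes the driver may hand the column back either already parsed or as a string. The helper names `to_jsonb_param` and `from_jsonb_value` and the sample payload are hypothetical and not part of this commit.

```python
import json


def to_jsonb_param(value):
    """Serialize dicts and lists to a JSON string; pass anything else through."""
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return value


def from_jsonb_value(value):
    """Return parsed JSON whether the driver returned a string or a parsed object."""
    if value is None or isinstance(value, (dict, list)):
        return value
    return json.loads(value)


# Made-up payload for illustration only; real LlamaExtract output will differ.
extraction = {"quality_score": "85%", "checks": [{"name": "example", "passed": True}]}

stored = to_jsonb_param(extraction)          # what would be bound to the %s placeholder
restored = from_jsonb_value(stored)          # what a later lookup would hand back

assert isinstance(stored, str)
assert restored == extraction
assert from_jsonb_value(extraction) == extraction  # already-parsed values pass through
```

Keeping both directions tolerant of strings and parsed objects means the lookup does not depend on whether JSONB type adaptation is enabled in the database driver.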
--- a/Python-Version/scripts/shared/notifier.py
+++ b/Python-Version/scripts/shared/notifier.py
@ -678,6 +678,110 @@ class Notifier:
                            </div>
                        </div>
                    """
+                },
+                'creativex_complete': {
+                    'subject': "✅ CreativeX Scores Extracted - {file_count} files processed",
+                    'html': """
+                        <div style="font-family: Arial, sans-serif; max-width: 900px; margin: 0 auto;">
+                            <div style="background-color: #9c27b0; color: white; padding: 20px; text-align: center; border-radius: 8px 8px 0 0;">
+                                <h1 style="margin: 0;">✅ CreativeX Score Extraction Complete</h1>
+                            </div>
+
+                            <div style="background-color: #f3e5f5; border-left: 4px solid #9c27b0; padding: 15px; margin: 20px 0;">
+                                <p style="margin: 0;"><strong>Files Processed:</strong> {{ file_count }}</p>
+                                <p style="margin: 5px 0 0 0;"><strong>Scores Extracted:</strong> {{ success_count }}</p>
+                                <p style="margin: 5px 0 0 0;"><strong>Source:</strong> Box Folder 350605024645</p>
+                            </div>
+
+                            <h3 style="margin-top: 30px; color: #333;">Extracted Scores:</h3>
+                            {% for score in processed_files %}
+                            <div style="border: 1px solid #ddd; margin: 15px 0; padding: 15px; background-color: #fafafa; border-radius: 4px;">
+                                <div style="background-color: #9c27b0; color: white; padding: 10px 15px; margin: -15px -15px 15px -15px; border-radius: 4px 4px 0 0;">
+                                    <strong>{{ score.filename }}</strong>
+                                </div>
+                                <div style="padding: 10px; background-color: white; border-radius: 4px;">
+                                    <p style="margin: 5px 0;"><span style="font-weight: bold;">Quality Score:</span> <span style="font-size: 20px; color: #9c27b0;">{{ score.quality_score }}</span></p>
+                                    <p style="margin: 5px 0;"><span style="font-weight: bold;">CreativeX ID:</span> {{ score.creativex_id }}</p>
+                                    {% if score.creativex_url %}<p style="margin: 5px 0;"><span style="font-weight: bold;">CreativeX URL:</span> <a href="{{ score.creativex_url }}">{{ score.creativex_url }}</a></p>{% endif %}
+                                    <p style="margin: 5px 0;"><span style="font-weight: bold;">Box File ID:</span> {{ score.box_file_id }}</p>
+                                </div>
+                            </div>
+                            {% endfor %}
+
+                            <div style="background-color: #f3e5f5; border-left: 4px solid #9c27b0; padding: 15px; margin: 20px 0;">
+                                <p style="margin: 0;"><strong>✓ Complete:</strong> All scores extracted and stored in database.</p>
+                                <p style="margin: 5px 0 0 0;"><strong>Files Removed:</strong> Processed PDFs deleted from Box folder.</p>
+                            </div>
+
+                            <p style="color: #666; font-size: 12px; margin-top: 20px;">CreativeX scores stored with full JSON for future lookups.</p>
+                        </div>
+                    """
+                },
+                'creativex_partial': {
+                    'subject': "⚠️ CreativeX Extraction Partial - {success_count}/{file_count} successful",
+                    'html': """
+                        <div style="font-family: Arial, sans-serif; max-width: 900px; margin: 0 auto;">
+                            <div style="background-color: #ff9800; color: white; padding: 20px; text-align: center; border-radius: 8px 8px 0 0;">
+                                <h1 style="margin: 0;">⚠️ CreativeX Extraction Partially Complete</h1>
+                            </div>
+
+                            <div style="background-color: #fff3e0; border-left: 4px solid #ff9800; padding: 15px; margin: 20px 0;">
+                                <p style="margin: 0;"><strong>Total Files:</strong> {{ file_count }}</p>
+                                <p style="margin: 5px 0 0 0;"><strong>✓ Successful:</strong> <span style="color: #28a745;">{{ success_count }}</span></p>
+                                <p style="margin: 5px 0 0 0;"><strong>✗ Failed:</strong> <span style="color: #d32f2f;">{{ failed_count }}</span></p>
+                                <p style="margin: 5px 0 0 0;"><strong>Source:</strong> Box Folder 350605024645</p>
+                            </div>
+
+                            {% if processed_files %}
+                            <h3 style="margin-top: 30px; color: #28a745;">✅ Successful Extractions ({{ success_count }}):</h3>
+                            {% for score in processed_files %}
+                            <div style="border: 1px solid #c8e6c9; margin: 10px 0; padding: 12px; background-color: #f1f8e9; border-radius: 4px;">
+                                <strong>{{ score.filename }}</strong> - Score: {{ score.quality_score }}
+                            </div>
+                            {% endfor %}
+                            {% endif %}
+
+                            {% if failed_files %}
+                            <h3 style="margin-top: 30px; color: #d32f2f;">❌ Failed Extractions ({{ failed_count }}):</h3>
+                            {% for file in failed_files %}
+                            <div style="border: 1px solid #ffcdd2; margin: 10px 0; padding: 12px; background-color: #ffebee; border-radius: 4px;">
+                                <strong>{{ file.filename }}</strong>
+                                <br><small style="color: #666;">Error: {{ file.error }}</small>
+                            </div>
+                            {% endfor %}
+                            {% endif %}
+
+                            <div style="background-color: #fff3e0; border-left: 4px solid #ff9800; padding: 15px; margin: 20px 0;">
+                                <p style="margin: 0;"><strong>⚠️ Action Required:</strong> Review failed extractions above.</p>
+                                <p style="margin: 5px 0 0 0;">Failed files remain in Box folder for retry.</p>
+                            </div>
+
+                            <p style="color: #666; font-size: 12px; margin-top: 20px;">Successful scores stored in database. Failed files not deleted from Box.</p>
+                        </div>
+                    """
+                },
+                'creativex_no_files': {
+                    'subject': "ℹ️ CreativeX Extraction - No files found",
+                    'html': """
+                        <div style="font-family: Arial, sans-serif; max-width: 900px; margin: 0 auto;">
+                            <div style="background-color: #607d8b; color: white; padding: 20px; text-align: center; border-radius: 8px 8px 0 0;">
+                                <h1 style="margin: 0;">ℹ️ CreativeX Extraction - No Files</h1>
+                            </div>
+
+                            <div style="background-color: #eceff1; border-left: 4px solid #607d8b; padding: 15px; margin: 20px 0;">
+                                <p style="margin: 0;"><strong>Status:</strong> No PDF files found</p>
+                                <p style="margin: 5px 0 0 0;"><strong>Source:</strong> Box Folder 350605024645</p>
+                                <p style="margin: 5px 0 0 0;"><strong>Run Time:</strong> {{ timestamp }}</p>
+                            </div>
+
+                            <div style="background-color: #e3f2fd; border-left: 4px solid #2196f3; padding: 15px; margin: 20px 0;">
+                                <p style="margin: 0;"><strong>ℹ️ Note:</strong> This is expected behavior when no new PDFs are ready for processing.</p>
+                                <p style="margin: 5px 0 0 0;">Upload PDFs to Box folder 350605024645 to process CreativeX scores.</p>
+                            </div>
+
+                            <p style="color: #666; font-size: 12px; margin-top: 20px;">Script completed successfully with no errors.</p>
+                        </div>
+                    """
                }
            }