diff --git a/Python-Version/CREATIVEX_DEPLOYMENT.md b/Python-Version/CREATIVEX_DEPLOYMENT.md new file mode 100644 index 0000000..cb8d34d --- /dev/null +++ b/Python-Version/CREATIVEX_DEPLOYMENT.md @@ -0,0 +1,398 @@ +# CreativeX Score Extraction - Deployment Guide + +## Overview + +This guide covers deploying the CreativeX score extraction system, which, on each run: +1. Collects the PDF files in Box folder 350605024645 +2. Extracts CreativeX scores using the LlamaExtract AI agent "Creativex-Extract" +3. Stores the results, including the full extraction JSON, in PostgreSQL +4. Removes successfully processed files from Box +5. Sends email notifications + +## Local Development Setup + +### 1. Add Environment Variable + +Add to your `.env` file: + +```bash +# CreativeX Configuration +LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key_here +``` + +### 2. Install Python Dependencies + +```bash +cd Python-Version +source venv/bin/activate +pip install llama-cloud-services +``` + +Or install all dependencies: + +```bash +pip install -r requirements.txt +``` + +### 3.
Create Database Table + +**If starting fresh (full init):** +```bash +PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -f database/init.sql +``` + +**If database already exists (add table only):** +```bash +PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c " +CREATE TABLE IF NOT EXISTS creativex_scores ( + id SERIAL PRIMARY KEY, + filename VARCHAR(500) NOT NULL, + box_file_id VARCHAR(255), + creativex_id VARCHAR(255), + creativex_url TEXT, + quality_score VARCHAR(50), + full_extraction_data JSONB, + extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + status VARCHAR(50) DEFAULT 'active', + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); +CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename); +CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id); +CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status); +" +``` + +### 4. Verify Table Creation + +```bash +PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "\d creativex_scores" +``` + +You should see: +- 10 columns (id, filename, box_file_id, creativex_id, creativex_url, quality_score, full_extraction_data, extracted_at, status, created_at) +- 4 indexes: the primary key index creativex_scores_pkey plus idx_creativex_filename, idx_creativex_box_file, idx_creativex_status + +### 5. Test Locally + +```bash +# Run the script manually +python scripts/creativex_scoring_storing.py +``` + +**Expected behaviors:** +- If no PDFs in Box folder 350605024645: "No PDF files found" email sent +- If PDFs present: Extraction runs, scores stored, files deleted from Box +- If any extraction fails: `creativex_partial` email listing the errors; failed PDFs stay in Box + +## Production Server Deployment + +### Prerequisites +- Server already running Ferrero automation (A1→A2, A5→A6, etc.)
+- Git repository backed up to Bitbucket +- SSH access to production server + +### Step 1: Update .env on Server + +SSH to server and add: + +```bash +cd /opt/ferrero-automation/Python-Version +nano .env +``` + +Add: +```bash +# CreativeX Configuration +LLAMA_CLOUD_API_KEY=your_production_llama_cloud_api_key +``` + +Save and exit (Ctrl+X, Y, Enter). + +### Step 2: Pull Latest Code + +```bash +cd /opt/ferrero-automation/Python-Version +git pull origin main +``` + +This will include: +- `scripts/creativex_scoring_storing.py` +- Updated `database/init.sql` +- Updated `scripts/shared/database.py` +- Updated `scripts/shared/notifier.py` +- Updated `config/config.yaml` +- Updated `requirements.txt` + +### Step 3: Install Dependencies + +```bash +cd /opt/ferrero-automation/Python-Version +source venv/bin/activate +pip install llama-cloud-services +``` + +Or update all: +```bash +pip install -r requirements.txt --upgrade +``` + +### Step 4: Create Database Table + +```bash +PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c " +CREATE TABLE IF NOT EXISTS creativex_scores ( + id SERIAL PRIMARY KEY, + filename VARCHAR(500) NOT NULL, + box_file_id VARCHAR(255), + creativex_id VARCHAR(255), + creativex_url TEXT, + quality_score VARCHAR(50), + full_extraction_data JSONB, + extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + status VARCHAR(50) DEFAULT 'active', + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); +CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename); +CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id); +CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status); +" +``` + +### Step 5: Verify Installation + +```bash +# Check database table +PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT COUNT(*) FROM creativex_scores;" + +# Check script exists +ls -lh scripts/creativex_scoring_storing.py + +# 
Make it executable +chmod +x scripts/creativex_scoring_storing.py +``` + +### Step 6: Test Run + +```bash +cd /opt/ferrero-automation/Python-Version +source venv/bin/activate +python scripts/creativex_scoring_storing.py +``` + +Check logs: +```bash +tail -f logs/creativex_scoring.log +``` + +### Step 7: Add to Cron (Optional - If Automated) + +**Note:** The script is run manually for now, so skip this step initially. + +If you want to automate later (e.g., every hour): + +```bash +crontab -e +``` + +Add: +```cron +# CreativeX Score Extraction - Every hour +0 * * * * cd /opt/ferrero-automation/Python-Version && venv/bin/python scripts/creativex_scoring_storing.py >> logs/cron_creativex.log 2>&1 +``` + +Save and exit. + +## Configuration Details + +### Box Folder +- **Folder ID:** 350605024645 +- **Purpose:** Drop PDFs here for CreativeX score extraction +- **Behavior:** Files are automatically deleted after successful processing + +### LlamaExtract Agent +- **Agent Name:** Creativex-Extract +- **Expected Fields:** + - `filename`: Original filename from PDF + - `creativeXId.id`: CreativeX identifier + - `creativeXId.url`: CreativeX URL + - `ferreroCreativeQuality.percentage`: Quality score + +### Database Storage +- **Table:** `creativex_scores` +- **Quick Access Fields:** filename, creativex_id, creativex_url, quality_score +- **Full JSON:** Stored in `full_extraction_data` JSONB column +- **Purpose:** Future lookups by filename during DAM uploads + +### Email Notifications + +**Recipients configured in .env:** +- Success: `REPORT_EMAILS` +- Errors: `ERROR_EMAIL` + +**Templates:** +1. `creativex_complete` - All files processed successfully +2. `creativex_partial` - One or more files failed (sent to the error recipients) +3. `creativex_no_files` - No PDFs found (normal if folder empty) + +## Usage + +### Manual Execution + +```bash +cd /opt/ferrero-automation/Python-Version +source venv/bin/activate +python scripts/creativex_scoring_storing.py +``` + +### Workflow + +1.
Upload PDFs to Box folder 350605024645 +2. Run script (manual or cron) +3. Script downloads each PDF +4. LlamaExtract processes PDF +5. Results stored in database +6. PDF deleted from Box +7. Email notification sent + +### Checking Results + +```bash +# View recent extractions +PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c " +SELECT filename, creativex_id, quality_score, extracted_at +FROM creativex_scores +ORDER BY extracted_at DESC +LIMIT 10; +" + +# Count total scores +PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c " +SELECT COUNT(*) as total_scores FROM creativex_scores WHERE status = 'active'; +" + +# View specific file +PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c " +SELECT * FROM creativex_scores WHERE filename LIKE '%yourfile%'; +" +``` + +### Viewing Full JSON + +```bash +PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c " +SELECT filename, full_extraction_data::jsonb +FROM creativex_scores +WHERE filename = 'example.pdf'; +" +``` + +## Future Integration + +The database method `db.get_creativex_score_by_filename(filename)` is ready for use in other scripts. 
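
Because `creativex_scores` has no unique constraint on `filename`, processing the same PDF again (for example after a re-upload, or when the Box delete call failed) inserts another row. `get_creativex_score_by_filename` resolves this by returning the newest row whose status is `active`. The sketch below mirrors that selection rule in memory, assuming each row is a plain dict; the helper name `latest_active_score`, the sample scores, and the `archived` status value are hypothetical illustrations, not part of this PR.

```python
from datetime import datetime


def latest_active_score(rows, filename):
    """Return the newest row with this filename and status 'active', or None.

    Hypothetical in-memory mirror of the WHERE / ORDER BY extracted_at DESC /
    LIMIT 1 clauses used by Database.get_creativex_score_by_filename.
    """
    matches = [
        row for row in rows
        if row["filename"] == filename and row["status"] == "active"
    ]
    if not matches:
        return None
    return max(matches, key=lambda row: row["extracted_at"])


# Hypothetical sample rows: the same PDF extracted twice, plus one row
# whose status is no longer 'active'.
rows = [
    {"filename": "spot.pdf", "quality_score": "72%", "status": "active",
     "extracted_at": datetime(2025, 1, 10, 9, 0)},
    {"filename": "spot.pdf", "quality_score": "81%", "status": "active",
     "extracted_at": datetime(2025, 2, 3, 14, 30)},
    {"filename": "spot.pdf", "quality_score": "90%", "status": "archived",
     "extracted_at": datetime(2025, 3, 1, 8, 15)},
]

print(latest_active_score(rows, "spot.pdf")["quality_score"])  # 81%
print(latest_active_score(rows, "other.pdf"))                  # None
```

Older duplicates stay in the table and are simply shadowed by newer ones, so the `status` column is the lever for retiring a score without deleting its history.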
+ +**Example usage in future DAM upload workflow:** + +```python +# In a2_to_a3_upload_polling.py or similar, with db = Database(config) +filename = "Brand_Country_Language_123456.mp4" +dam_metadata = {} # or the metadata dict the upload is already building + +# Look up the latest active CreativeX score for this filename +score_data = db.get_creativex_score_by_filename(filename) + +if score_data: + # Add to DAM metadata + dam_metadata['FERRERO.FIELD.CREATIVEX_SCORE'] = score_data['quality_score'] + dam_metadata['FERRERO.FIELD.CREATIVEX_URL'] = score_data['creativex_url'] + dam_metadata['FERRERO.FIELD.CREATIVEX_ID'] = score_data['creativex_id'] +``` + +## Troubleshooting + +### "llama-cloud-services not installed" +```bash +source venv/bin/activate +pip install llama-cloud-services +``` + +### "Agent 'Creativex-Extract' not found" +- Verify agent name in LlamaCloud portal +- Check spelling matches exactly: `Creativex-Extract` +- Verify API key is correct + +### "No PDF files found" +- This is normal if Box folder 350605024645 is empty +- Upload test PDF to folder and re-run + +### "Database connection failed" +```bash +# Check PostgreSQL is running +docker ps | grep ferrero + +# Test connection +PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT 1;" +``` + +### "Email not sending" +- Check SMTP configuration in .env +- Verify Mailgun credentials +- Check logs for detailed error + +### Files not deleted from Box +- This is expected for failed extractions +- Only successful extractions delete files; if the Box delete call itself fails, a warning is logged and the file stays in the folder +- Failed files remain for manual review/retry + +## Rollback Instructions + +To roll back: + +### Remove Database Table +```bash +PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c " +DROP TABLE IF EXISTS creativex_scores CASCADE; +" +``` + +### Remove from Cron +```bash +crontab -e +# Delete the CreativeX line, save and exit +``` + +### Revert Code +```bash +cd /opt/ferrero-automation/Python-Version +git revert <commit-sha> +git push origin main +``` + +## Support + +- **Logs:** `logs/creativex_scoring.log` +- 
**Database Queries:** See "Checking Results" section above +- **Email Test:** Check SMTP settings and recipients list +- **LlamaCloud Issues:** Verify API key and agent configuration + +## Summary Checklist + +**Local Setup:** +- [ ] Add `LLAMA_CLOUD_API_KEY` to .env +- [ ] Install `llama-cloud-services` package +- [ ] Create `creativex_scores` table +- [ ] Test script runs successfully + +**Production Deployment:** +- [ ] Git pull latest code +- [ ] Add `LLAMA_CLOUD_API_KEY` to server .env +- [ ] Install dependencies on server +- [ ] Create database table on server +- [ ] Test run on server +- [ ] Verify email notifications +- [ ] (Optional) Add to cron if automating + +**Post-Deployment:** +- [ ] Upload test PDF to Box folder 350605024645 +- [ ] Run script and verify extraction +- [ ] Check database record created +- [ ] Verify PDF deleted from Box +- [ ] Confirm email notification received diff --git a/Python-Version/config/config.yaml b/Python-Version/config/config.yaml index 47b69b0..908c97b 100644 --- a/Python-Version/config/config.yaml +++ b/Python-Version/config/config.yaml @@ -95,6 +95,12 @@ notifications: fields: mappings_file: config/field_mappings.yaml +# CreativeX Configuration +creativex: + llama_api_key: ${LLAMA_CLOUD_API_KEY} + agent_name: Creativex-Extract + box_folder_id: "350605024645" + # Logging Configuration logging: level: INFO diff --git a/Python-Version/database/init.sql b/Python-Version/database/init.sql index f3bb1ac..5cc2135 100644 --- a/Python-Version/database/init.sql +++ b/Python-Version/database/init.sql @@ -172,6 +172,35 @@ CREATE TABLE IF NOT EXISTS campaign_status ( \echo 'Table campaign_status created' +-- ============================================================================ +-- Table: creativex_scores +-- Purpose: Stores CreativeX quality scores extracted from PDFs via LlamaExtract +-- ============================================================================ + +CREATE TABLE IF NOT EXISTS creativex_scores ( + -- Primary 
Key + id SERIAL PRIMARY KEY, + + -- File Information + filename VARCHAR(500) NOT NULL, + box_file_id VARCHAR(255), + + -- CreativeX Data (parsed fields for quick access) + creativex_id VARCHAR(255), + creativex_url TEXT, + quality_score VARCHAR(50), + + -- Full Extraction Data (JSONB - Complete LlamaExtract response for future use) + full_extraction_data JSONB, + + -- Timestamps + extracted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + status VARCHAR(50) DEFAULT 'active', + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); + +\echo 'Table creativex_scores created' + \echo 'Tables created successfully' -- ============================================================================ @@ -211,6 +240,11 @@ CREATE INDEX IF NOT EXISTS idx_campaign_status_status ON campaign_status(status) CREATE INDEX IF NOT EXISTS idx_campaign_status_live ON campaign_status(live_campaign); CREATE INDEX IF NOT EXISTS idx_campaign_status_webhook_sent ON campaign_status(webhook_sent); +-- creativex_scores indexes +CREATE INDEX IF NOT EXISTS idx_creativex_filename ON creativex_scores(filename); +CREATE INDEX IF NOT EXISTS idx_creativex_box_file ON creativex_scores(box_file_id); +CREATE INDEX IF NOT EXISTS idx_creativex_status ON creativex_scores(status); + \echo 'Indexes created successfully' -- ============================================================================ @@ -323,8 +357,10 @@ GRANT USAGE ON SCHEMA public TO ferrero_user; \echo ' - derivative_assets' \echo ' - asset_events' \echo ' - workflow_state' +\echo ' - campaign_status' +\echo ' - creativex_scores' \echo '' -\echo 'Indexes created: 12' +\echo 'Indexes created: 15' \echo 'Triggers created: 4' \echo 'Functions created: 2' \echo '' diff --git a/Python-Version/requirements.txt b/Python-Version/requirements.txt index 0cb95af..fab5da0 100644 --- a/Python-Version/requirements.txt +++ b/Python-Version/requirements.txt @@ -24,6 +24,9 @@ cryptography>=3.4.0 # Email templates Jinja2>=3.0.0 +# LlamaExtract for CreativeX score extraction 
+llama-cloud-services>=0.1.0 + # Retry logic tenacity>=8.0.0 diff --git a/Python-Version/scripts/creativex_scoring_storing.py b/Python-Version/scripts/creativex_scoring_storing.py new file mode 100755 index 0000000..2499445 --- /dev/null +++ b/Python-Version/scripts/creativex_scoring_storing.py @@ -0,0 +1,396 @@ +#!/usr/bin/env python3 +""" +CreativeX Score Extractor and Storage +Processes PDFs from Box folder 350605024645, extracts CreativeX scores using LlamaExtract, +stores results in database, and removes processed files from Box. +Compatible with Python 3.6+ +""" + +import sys +import os +import logging +from datetime import datetime +from pathlib import Path + +# Add shared library to path +sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..')) + +from shared.config_loader import load_config +from shared.box_client import BoxClient +from shared.database import Database +from shared.notifier import Notifier + +# Set up logging with rotation +from logging.handlers import RotatingFileHandler + +# Create logs directory if it doesn't exist +os.makedirs('logs', exist_ok=True) + +# Configure logging with rotation +log_handler = RotatingFileHandler( + 'logs/creativex_scoring.log', + maxBytes=10*1024*1024, # 10MB per file + backupCount=28 # Keep up to 28 rotated files (rotation is size-based, not daily) +) +log_handler.setLevel(logging.INFO) +log_handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')) + +console_handler = logging.StreamHandler() +console_handler.setLevel(logging.INFO) +console_handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')) + +logging.basicConfig( + level=logging.INFO, + handlers=[log_handler, console_handler] +) + +logger = logging.getLogger('CreativeXScoring') + + +class CreativeXExtractor: + """Handles extraction of CreativeX data from PDF files using LlamaExtract.""" + + def __init__(self, api_key, agent_name): + """ + Initialize the LlamaExtract client.
+ + Args: + api_key: LlamaCloud API key + agent_name: Agent name in LlamaExtract + """ + try: + from llama_cloud_services import LlamaExtract + self.extractor = LlamaExtract(api_key=api_key) + self.agent_name = agent_name + logger.info("LlamaExtract client initialized with agent: {}".format(agent_name)) + except ImportError: + logger.error("llama-cloud-services not installed. Run: pip install llama-cloud-services") + raise + except Exception as e: + logger.error("Failed to initialize LlamaExtract: {}".format(str(e))) + raise + + def extract_from_file(self, file_path): + """ + Extract data from a PDF file using Llama Extract. + + Args: + file_path: Path to the PDF file + + Returns: + Dictionary containing the extraction result, or None if extraction fails + """ + try: + logger.info(" Getting agent: {}".format(self.agent_name)) + agent = self.extractor.get_agent(name=self.agent_name) + + if agent is None: + raise Exception("Agent '{}' not found".format(self.agent_name)) + + logger.info(" Running extraction on: {}".format(os.path.basename(file_path))) + result = agent.extract(str(file_path)) + + # Convert result to dictionary format + extraction_data = { + "run_id": getattr(result, "run_id", None), + "extraction_agent_id": getattr(result, "extraction_agent_id", None), + "data": result.data if hasattr(result, "data") else {}, + "extraction_metadata": getattr(result, "extraction_metadata", {}) + } + + return extraction_data + + except Exception as e: + logger.error(" ERROR: Extraction failed - {}".format(str(e))) + return None + + def parse_csv_fields(self, extraction_data): + """ + Parse specific fields for database storage. 
+ + Expected fields: + - filename + - creativeXId.id + - creativeXId.url + - ferreroCreativeQuality.percentage + + Args: + extraction_data: Full extraction result dictionary + + Returns: + Dictionary with parsed fields (missing fields become empty strings), + or None if parsing raises an exception + """ + try: + data = extraction_data.get("data", {}) + + # Extract filename + filename = data.get("filename", "") + + # Extract creativeXId fields + creative_x_id_obj = data.get("creativeXId", {}) + creative_x_id = creative_x_id_obj.get("id", "") if isinstance(creative_x_id_obj, dict) else "" + creative_x_url = creative_x_id_obj.get("url", "") if isinstance(creative_x_id_obj, dict) else "" + + # Extract ferreroCreativeQuality percentage + ferrero_quality_obj = data.get("ferreroCreativeQuality", {}) + quality_score = ferrero_quality_obj.get("percentage", "") if isinstance(ferrero_quality_obj, dict) else "" + + # Only filename is checked; a missing value is logged but not treated as fatal + if not filename: + logger.warning(" WARNING: filename field is missing from extraction data") + + return { + "filename": filename, + "id": creative_x_id, + "url": creative_x_url, + "score": quality_score + } + + except Exception as e: + logger.error(" ERROR: Failed to parse CSV fields - {}".format(str(e))) + return None + + +def process_pdfs(box_client, db, extractor, notifier, config): + """ + Process all PDFs in the CreativeX Box folder.
+ + Args: + box_client: BoxClient instance + db: Database instance + extractor: CreativeXExtractor instance + notifier: Notifier instance + config: Configuration dict + + Returns: + dict with processing results + """ + creativex_folder_id = config['creativex']['box_folder_id'] + + logger.info("=" * 60) + logger.info("CreativeX Score Extraction") + logger.info("=" * 60) + logger.info("Box Folder ID: {}".format(creativex_folder_id)) + logger.info("") + + try: + # List all PDF files in Box folder + files = box_client.list_folder_files(creativex_folder_id) + pdf_files = [f for f in files if f['name'].lower().endswith('.pdf')] + + if not pdf_files: + logger.info("No PDF files found in Box folder") + + # Send email notification + notifier.send_email( + template_name='creativex_no_files', + recipients=config['notifications']['recipients']['success'], + data={ + 'timestamp': datetime.now().strftime("%Y-%m-%d %H:%M:%S") + } + ) + + return {'success': True, 'file_count': 0, 'processed': 0, 'failed': 0} + + logger.info("Found {} PDF file(s) to process".format(len(pdf_files))) + logger.info("") + + # Create temp directory + temp_dir = Path('temp/creativex') + temp_dir.mkdir(parents=True, exist_ok=True) + + # Track results + processed_files = [] + failed_files = [] + + # Process each PDF + for idx, file_info in enumerate(pdf_files, 1): + file_id = file_info['id'] + filename = file_info['name'] + + logger.info("[{}/{}] Processing: {}".format(idx, len(pdf_files), filename)) + + try: + # 1. Download PDF from Box + temp_file_path = temp_dir / filename + box_client.download_file(file_id, str(temp_file_path)) + + # 2. Extract data using LlamaExtract + extraction_data = extractor.extract_from_file(str(temp_file_path)) + + if extraction_data is None: + raise Exception("Extraction returned None") + + # 3. Parse fields + parsed_fields = extractor.parse_csv_fields(extraction_data) + + if not parsed_fields: + raise Exception("Failed to parse extraction fields") + + # 4. 
Store in database with full JSON + db_result = db.store_creativex_score( + filename=parsed_fields['filename'], + creativex_id=parsed_fields['id'], + creativex_url=parsed_fields['url'], + quality_score=parsed_fields['score'], + box_file_id=file_id, + full_extraction_data=extraction_data + ) + + if not db_result['success']: + raise Exception("Database storage failed: {}".format(db_result.get('error', 'Unknown'))) + + # 5. Delete file from Box (only after successful storage) + try: + box_file = box_client.client.file(file_id) + box_file.delete() + logger.info(" Deleted from Box: {}".format(filename)) + except Exception as e: + logger.warning(" Could not delete file from Box: {}".format(str(e))) + # Don't fail the whole process if delete fails + + # 6. Clean up local temp file + try: + os.remove(str(temp_file_path)) + except Exception as e: + logger.warning(" Could not delete temp file: {}".format(str(e))) + + # Track success + processed_files.append({ + 'filename': parsed_fields['filename'], + 'creativex_id': parsed_fields['id'], + 'creativex_url': parsed_fields['url'], + 'quality_score': parsed_fields['score'], + 'box_file_id': file_id + }) + + logger.info(" ✓ Success: Score {} extracted and stored".format(parsed_fields['score'])) + logger.info("") + + except Exception as e: + logger.error(" ✗ Failed: {}".format(str(e))) + logger.info("") + + failed_files.append({ + 'filename': filename, + 'box_file_id': file_id, + 'error': str(e) + }) + + # Clean up temp file if it exists + try: + temp_file_path = temp_dir / filename + if temp_file_path.exists(): + os.remove(str(temp_file_path)) + except: + pass + + # Summary + total_files = len(pdf_files) + success_count = len(processed_files) + failed_count = len(failed_files) + + logger.info("=" * 60) + logger.info("Processing Complete") + logger.info("=" * 60) + logger.info("Total Files: {}".format(total_files)) + logger.info("Successful: {}".format(success_count)) + logger.info("Failed: {}".format(failed_count)) + 
logger.info("") + + # Send email notification + if failed_count == 0: + # All successful + notifier.send_email( + template_name='creativex_complete', + recipients=config['notifications']['recipients']['success'], + data={ + 'file_count': total_files, + 'success_count': success_count, + 'processed_files': processed_files + } + ) + else: + # Partial success + notifier.send_email( + template_name='creativex_partial', + recipients=config['notifications']['recipients']['errors'], + data={ + 'file_count': total_files, + 'success_count': success_count, + 'failed_count': failed_count, + 'processed_files': processed_files, + 'failed_files': failed_files + } + ) + + return { + 'success': success_count > 0, + 'file_count': total_files, + 'processed': success_count, + 'failed': failed_count + } + + except Exception as e: + logger.error("FATAL ERROR: {}".format(str(e))) + return {'success': False, 'error': str(e)} + + +def main(): + """Entry point.""" + try: + logger.info("Starting CreativeX Score Extraction") + logger.info("") + + # Load configuration + config = load_config('config/config.yaml') + + # Initialize clients + logger.info("Initializing clients...") + + # Box client for CreativeX folder + box = BoxClient(config, root_folder_id=config['creativex']['box_folder_id']) + + # Database + db = Database(config) + + # Notifier + notifier = Notifier(config) + + # CreativeX Extractor + extractor = CreativeXExtractor( + api_key=config['creativex']['llama_api_key'], + agent_name=config['creativex']['agent_name'] + ) + + logger.info("Clients initialized successfully") + logger.info("") + + # Process PDFs + result = process_pdfs(box, db, extractor, notifier, config) + + if result['success']: + logger.info("✓ CreativeX extraction completed successfully") + sys.exit(0) + else: + logger.error("✗ CreativeX extraction failed") + sys.exit(1) + + except KeyboardInterrupt: + logger.info("\n\nProcess interrupted by user.") + sys.exit(1) + except Exception as e: + logger.error("\nFATAL 
ERROR: {}".format(str(e))) + import traceback + traceback.print_exc() + sys.exit(1) + finally: + # Close database connections + try: + db.close() + except: + pass + + +if __name__ == "__main__": + main() diff --git a/Python-Version/scripts/shared/database.py b/Python-Version/scripts/shared/database.py index 273b4d5..5dc66e6 100644 --- a/Python-Version/scripts/shared/database.py +++ b/Python-Version/scripts/shared/database.py @@ -536,6 +536,103 @@ class Database: cursor.close() self.put_connection(conn) + def store_creativex_score(self, filename, creativex_id, creativex_url, quality_score, box_file_id, full_extraction_data): + """ + Store CreativeX score data extracted from PDF + + Args: + filename: Original filename from extraction + creativex_id: CreativeX ID from extraction + creativex_url: CreativeX URL from extraction + quality_score: Quality score percentage + box_file_id: Box file ID for tracking + full_extraction_data: Complete LlamaExtract JSON response + + Returns: + dict with success boolean + """ + conn = self.get_connection() + try: + cursor = conn.cursor() + + # Convert full_extraction_data to JSON string if it's a dict + import json + full_json = json.dumps(full_extraction_data) if isinstance(full_extraction_data, dict) else full_extraction_data + + cursor.execute(""" + INSERT INTO creativex_scores ( + filename, creativex_id, creativex_url, quality_score, + box_file_id, full_extraction_data + ) VALUES (%s, %s, %s, %s, %s, %s) + """, ( + filename, + creativex_id, + creativex_url, + quality_score, + box_file_id, + full_json + )) + + conn.commit() + logger.info("Stored CreativeX score: {} (Score: {})".format(filename, quality_score)) + + return {'success': True} + + except Exception as e: + conn.rollback() + logger.error("Failed to store CreativeX score: {}".format(str(e))) + return {'success': False, 'error': str(e)} + + finally: + cursor.close() + self.put_connection(conn) + + def get_creativex_score_by_filename(self, filename): + """ + Get CreativeX 
score data by filename + + Args: + filename: Filename to search for + + Returns: + dict with creativex data or None if not found + """ + conn = self.get_connection() + try: + cursor = conn.cursor() + + cursor.execute(""" + SELECT filename, creativex_id, creativex_url, quality_score, + box_file_id, full_extraction_data, extracted_at + FROM creativex_scores + WHERE filename = %s AND status = 'active' + ORDER BY extracted_at DESC + LIMIT 1 + """, (filename,)) + + row = cursor.fetchone() + + if not row: + return None + + # Parse JSONB as dict + import json + full_data = row[5] if isinstance(row[5], dict) else json.loads(row[5]) + + return { + 'filename': row[0], + 'creativex_id': row[1], + 'creativex_url': row[2], + 'quality_score': row[3], + 'box_file_id': row[4], + 'full_extraction_data': full_data, + 'extracted_at': row[6] + } + + finally: + cursor.close() + self.put_connection(conn) + def close(self): """Close all connections in pool""" if self.pool: diff --git a/Python-Version/scripts/shared/notifier.py b/Python-Version/scripts/shared/notifier.py index c15f44a..e17c6f8 100644 --- a/Python-Version/scripts/shared/notifier.py +++ b/Python-Version/scripts/shared/notifier.py @@ -678,6 +678,110 @@ class Notifier: """ + }, + 'creativex_complete': { + 'subject': "✅ CreativeX Scores Extracted - {file_count} files processed", + 'html': """ +
+
+

✅ CreativeX Score Extraction Complete

+
+ +
+

Files Processed: {{ file_count }}

+

Scores Extracted: {{ success_count }}

+

Source: Box Folder 350605024645

+
+ +

Extracted Scores:

+ {% for score in processed_files %} +
+
+ {{ score.filename }} +
+
+

Quality Score: {{ score.quality_score }}

+

CreativeX ID: {{ score.creativex_id }}

+ {% if score.creativex_url %}

CreativeX URL: {{ score.creativex_url }}

{% endif %} +

Box File ID: {{ score.box_file_id }}

+
+
+ {% endfor %} + +
+

✓ Complete: All scores extracted and stored in database.

+

Files Removed: Processed PDFs deleted from Box folder.

+
+ +

CreativeX scores stored with full JSON for future lookups.

+
+ """ + }, + 'creativex_partial': { + 'subject': "⚠️ CreativeX Extraction Partial - {success_count}/{file_count} successful", + 'html': """ +
+
+

⚠️ CreativeX Extraction Partially Complete

+
+ +
+

Total Files: {{ file_count }}

+

✓ Successful: {{ success_count }}

+

✗ Failed: {{ failed_count }}

+

Source: Box Folder 350605024645

+
+ + {% if processed_files %} +

✅ Successful Extractions ({{ success_count }}):

+ {% for score in processed_files %} +
+ {{ score.filename }} - Score: {{ score.quality_score }} +
+ {% endfor %} + {% endif %} + + {% if failed_files %} +

❌ Failed Extractions ({{ failed_count }}):

+ {% for file in failed_files %} +
+ {{ file.filename }} +
Error: {{ file.error }} +
+ {% endfor %} + {% endif %} + +
+

⚠️ Action Required: Review failed extractions above.

+

Failed files remain in Box folder for retry.

+
+ +

Successful scores stored in database. Failed files not deleted from Box.

+
+ """ + }, + 'creativex_no_files': { + 'subject': "ℹ️ CreativeX Extraction - No files found", + 'html': """ +
+
+

ℹ️ CreativeX Extraction - No Files

+
+ +
+

Status: No PDF files found

+

Source: Box Folder 350605024645

+

Run Time: {{ timestamp }}

+
+ +
+

ℹ️ Note: This is expected behavior when no new PDFs are ready for processing.

+

Upload PDFs to Box folder 350605024645 to process CreativeX scores.

+
+ +

Script completed successfully with no errors.

+
+ """ } }
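
For review, the nested-field lookup in `CreativeXExtractor.parse_csv_fields` can be checked in isolation. The sketch below re-implements it as a standalone function; the name `parse_creativex_fields` and the sample payload are hypothetical, while the key names come from the agent's expected fields in the deployment guide. Like the original, a missing or non-dict `creativeXId` / `ferreroCreativeQuality` yields empty strings; unlike it, a `None` `data` payload is treated as empty instead of raising.

```python
def parse_creativex_fields(extraction_data):
    """Flatten a LlamaExtract result dict into the four quick-access columns."""
    # `or {}` also covers an explicit None payload
    data = extraction_data.get("data") or {}

    creativex = data.get("creativeXId")
    quality = data.get("ferreroCreativeQuality")
    # Anything other than a dict (missing key, string, list) counts as empty
    if not isinstance(creativex, dict):
        creativex = {}
    if not isinstance(quality, dict):
        quality = {}

    return {
        "filename": data.get("filename", ""),
        "id": creativex.get("id", ""),
        "url": creativex.get("url", ""),
        "score": quality.get("percentage", ""),
    }


# Hypothetical payload shaped like the dict extract_from_file() returns
sample = {
    "run_id": "run-123",
    "data": {
        "filename": "Brand_Country_Language_123456.mp4",
        "creativeXId": {"id": "cx-42", "url": "https://example.com/cx-42"},
        "ferreroCreativeQuality": {"percentage": "87%"},
    },
}

print(parse_creativex_fields(sample))
print(parse_creativex_fields({"data": None}))
```

Because `filename` is read from the PDF content rather than taken from the Box file name, an empty value is still stored (as `''`, which satisfies `NOT NULL`) and would never match a later filename lookup; falling back to the Box name is one possible option to consider.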