ferrero-opentext/Python-Version/CREATIVEX_VERSION_UPDATES.md
DJP 6fee0cc725 Add version tracking and remove .0 decimals from CreativeX scores
Implements version counter for re-scored files and cleans up numeric formatting.

Decimal Removal:
- Strip .0 suffix from creativex_id (6864255.0 → 6864255)
- Strip .0 suffix from quality_score (80.0 → 80)
- Converts float → int → string before storing
- Cleaner data for display and DAM integration

Version Tracking:
- Counts total versions per filename (active + superseded)
- Returns version_number in database result
- Logs show version: "Score 80 extracted (Version 3)"
- Email templates display version badges for updates

Email Template Updates:
- Complete template: Shows "Version 3 (Updated)" badge in header
- Includes note: "This is version 3 of this file"
- Partial template: Shows "(Version 3)" inline
- Only displays version info if > 1

Database Changes:
- Query counts ALL versions before insert
- Returns version_number in result dict
- Logs include version in success/update messages

Benefits:
- Clean numeric values without unnecessary decimals
- Users can see if file was re-scored
- Version history visible in emails
- Still preserves all history in database
- A2→A3 integration unaffected (always gets latest active)

Example progression:
Upload 1: Score 80 (no version shown - it's the first)
Upload 2: Score 85 (Version 2 badge shown)
Upload 3: Score 90 (Version 3 badge shown)

Documentation: CREATIVEX_VERSION_UPDATES.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 16:55:07 -05:00


CreativeX Score Version Tracking & Decimal Removal

Changes Made

1. Decimal Removal (.0 suffix)

Problem: CreativeX ID and Quality Score were being stored as 6864255.0 and 80.0.

Solution: Strip the .0 suffix before storing.

Code Changes:

  • scripts/creativex_scoring_storing.py - parse_csv_fields() method
  • Converts float → int → string: str(int(float(value)))
  • Applied to both creativex_id and quality_score

Result:

  • Before: ID=6864255.0, Score=80.0
  • After: ID=6864255, Score=80
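
The conversion above can be sketched as a small helper. This is a minimal sketch, not the actual parse_csv_fields() code: the name `strip_trailing_zero_decimal` is hypothetical, and leaving empty, non-numeric, and fractional values untouched is an assumption made for illustration.

```python
import math


def strip_trailing_zero_decimal(value):
    """Return whole numbers as integer strings, e.g. "80.0" -> "80".

    Hypothetical helper. Empty, non-numeric, non-finite, and fractional
    values are returned as their original text instead of being converted.
    """
    if value is None:
        return None
    text = str(value).strip()
    try:
        number = float(text)
    except ValueError:
        return text  # not numeric (e.g. "" or "n/a"): leave unchanged
    if not math.isfinite(number) or number != int(number):
        return text  # NaN, infinity, or a real fractional part: leave unchanged
    return str(int(number))
```

A bare str(int(float(value))) would truncate a value such as 80.5 to 80 and raise on an empty cell, so the sketch checks for those cases first. Whether the real script needs the same guards depends on what the CSV can contain.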

2. Version Number Tracking

Feature: Track how many times each file has been scored.

Implementation: Count the total records for the filename (including superseded ones).

Code Changes:

Database Method (scripts/shared/database.py):

  • Added version counter query before insert
  • Counts ALL versions (active + superseded) for filename
  • Returns version_number in result dict
  • Logs include version: "Score: 80 -> 85, Version: 3"

Script (scripts/creativex_scoring_storing.py):

  • Captures version_number from database result
  • Passes to email template in processed_files list
  • Logs show version: "Success: Score 85 extracted (Version 3)"

Email Templates (scripts/shared/notifier.py):

  • Complete Template:

    • Shows version badge in header: "Version 3 (Updated)"
    • Only displays if version_number > 1
    • Note below: "This is version 3 of this file"
  • Partial Template:

    • Shows inline: "Score: 85 (Version 3)"
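
The display rules above can be sketched as three small formatting functions. The function names are hypothetical and the real templates in scripts/shared/notifier.py may be structured differently. The sketch also assumes that the partial template hides version info for the first version, as the complete template does.

```python
def version_badge(version_number):
    """Header badge for the complete template; empty for the first version."""
    if version_number and version_number > 1:
        return f"Version {version_number} (Updated)"
    return ""


def version_note(version_number):
    """Note shown below the details in the complete template."""
    if version_number and version_number > 1:
        return f"This is version {version_number} of this file"
    return ""


def partial_score_line(quality_score, version_number):
    """Inline score line for the partial template."""
    line = f"Score: {quality_score}"
    if version_number and version_number > 1:
        line += f" (Version {version_number})"
    return line
```

Treating a missing version_number (None) the same as version 1 keeps the templates safe if an older caller does not pass the field.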

Example Email Output

First Upload (New File):

Filename: video.mp4
Quality Score: 80
CreativeX ID: 6864255

Third Upload (Re-scored):

Filename: video.mp4    [Version 3 (Updated)]
Quality Score: 85
CreativeX ID: 6864255

📝 Note: This is version 3 of this file (previous versions preserved in database)

Database Behavior

Version Counter Logic:

  1. Count ALL records with this filename
  2. New version = count + 1
  3. Mark the old active record as superseded
  4. Insert new record as active with incremented version
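
The four steps can be sketched against an in-memory SQLite database. This is an illustration under stated assumptions, not the code in scripts/shared/database.py: the function name `store_score` is hypothetical, the table is reduced to four columns, and sqlite3 stands in for whatever database the project actually uses.

```python
import sqlite3


def store_score(conn, filename, quality_score):
    """Insert a new active score row and return its version number."""
    with conn:  # one transaction: commit on success, roll back on error
        # Step 1: count ALL existing records (active + superseded).
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM creativex_scores WHERE filename = ?",
            (filename,),
        ).fetchone()
        # Step 2: the new version is the existing count plus one.
        version_number = count + 1
        # Step 3: mark the previous active record as superseded.
        conn.execute(
            "UPDATE creativex_scores SET status = 'superseded' "
            "WHERE filename = ? AND status = 'active'",
            (filename,),
        )
        # Step 4: insert the new record as active.
        conn.execute(
            "INSERT INTO creativex_scores (filename, quality_score, status) "
            "VALUES (?, ?, 'active')",
            (filename, quality_score),
        )
    return {"version_number": version_number}


# Assumed minimal schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE creativex_scores ("
    "id INTEGER PRIMARY KEY, filename TEXT, quality_score TEXT, status TEXT)"
)
versions = [
    store_score(conn, "video.mp4", score)["version_number"]
    for score in ("80", "85", "90")
]
```

Because the version is derived from a row count rather than stored, deleting an old record would renumber later versions. Two concurrent writers could also compute the same number unless the count and insert are serialized.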

Example Database State:

ID | Filename   | Score | Status     | Version (implicit)
1  | video.mp4  | 80    | superseded | 1 (first)
2  | video.mp4  | 85    | superseded | 2 (second)
3  | video.mp4  | 90    | active     | 3 (current)

Query Examples

Get Latest Version Only:

SELECT * FROM creativex_scores
WHERE filename = 'video.mp4' AND status = 'active';

Get Version Count for File:

SELECT COUNT(*) as version_count
FROM creativex_scores
WHERE filename = 'video.mp4';

Get All Versions with Numbers:

SELECT
    filename,
    quality_score,
    status,
    ROW_NUMBER() OVER (PARTITION BY filename ORDER BY extracted_at) as version_number,
    extracted_at
FROM creativex_scores
WHERE filename = 'video.mp4'
ORDER BY extracted_at;

Testing Checklist

  • Upload PDF to Box folder 350605024645
  • Run script: python scripts/creativex_scoring_storing.py
  • Check logs show version number
  • Check database: ID and Score have no .0
  • Check email shows version badge (if > 1)
  • Re-upload same PDF with different score
  • Verify version counter increments
  • Verify old record marked superseded

Benefits

  1. Clean Data: No unnecessary .0 decimals in IDs and scores
  2. Version Tracking: Know if file has been re-scored
  3. History Preserved: All previous scores available for audit
  4. Email Clarity: Users see when a file is being updated vs new
  5. A2→A3 Ready: Latest version automatically selected via status='active'

Future Use in A2→A3

The version tracking is informational only. The get_creativex_score_by_filename() method always returns the latest active version, so the A2→A3 workflow does not need to handle version numbers.

# This always returns the latest version
score_data = db.get_creativex_score_by_filename(filename)
# score_data['quality_score'] will be "90" (not "90.0")
# score_data['creativex_id'] will be "6864255" (not "6864255.0")
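
For illustration, a lookup with this behavior could look like the sketch below. It is not the real get_creativex_score_by_filename(): the name `lookup_active_score`, the reduced schema, and the use of sqlite3 are all assumptions.

```python
import sqlite3

COLUMNS = ("filename", "quality_score", "creativex_id")


def lookup_active_score(conn, filename):
    """Return the active row for filename as a dict, or None if none exists."""
    row = conn.execute(
        "SELECT filename, quality_score, creativex_id FROM creativex_scores "
        "WHERE filename = ? AND status = 'active' "
        "ORDER BY id DESC LIMIT 1",  # newest first, in case of duplicates
        (filename,),
    ).fetchone()
    return dict(zip(COLUMNS, row)) if row is not None else None


# Assumed minimal schema and sample data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE creativex_scores (id INTEGER PRIMARY KEY, filename TEXT, "
    "quality_score TEXT, creativex_id TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO creativex_scores (filename, quality_score, creativex_id, status) "
    "VALUES (?, ?, ?, ?)",
    [
        ("video.mp4", "80", "6864255", "superseded"),
        ("video.mp4", "85", "6864255", "superseded"),
        ("video.mp4", "90", "6864255", "active"),
    ],
)
score_data = lookup_active_score(conn, "video.mp4")
```

The caller never sees a version number. Filtering on status = 'active' is enough to pick the current score.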