CreativeX Score Version Tracking & Decimal Removal

Changes Made

1. Decimal Removal (.0 suffix)

Problem: CreativeX ID and Quality Score were being stored as 6864255.0 and 80.0.

Solution: Strip the trailing .0 before storing.

Code Changes:

  • scripts/creativex_scoring_storing.py - parse_csv_fields() method
  • Converts to int then back to string: str(int(float(value)))
  • Applied to both creativex_id and quality_score

Result:

  • Before: ID=6864255.0, Score=80.0
  • After: ID=6864255, Score=80
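The conversion can be sketched as a small helper. This is a minimal illustration, not the actual body of parse_csv_fields(): the name strip_decimal_suffix is hypothetical, and unlike a bare str(int(float(value))) it leaves empty, non-numeric, and genuinely fractional values untouched instead of raising or truncating them.

```python
from typing import Optional


def strip_decimal_suffix(value: Optional[str]) -> Optional[str]:
    """Return '6864255' for '6864255.0'; leave any other input unchanged.

    Hypothetical helper for illustration only.
    """
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return text
    try:
        number = float(text)
    except ValueError:
        return text  # not numeric (e.g. "N/A"), keep as-is
    # Drop the fraction only when it is exactly zero, so 80.5 is not cut to 80.
    if number.is_integer():
        return str(int(number))
    return text


creativex_id = strip_decimal_suffix("6864255.0")
quality_score = strip_decimal_suffix("80.0")
```

Whether fractional scores can occur in the CSV is not stated here; if they cannot, the plain str(int(float(value))) form behaves the same for valid input.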

2. Version Number Tracking

Feature: Track how many times each file has been scored.

Implementation: Count all records for the filename (including superseded ones).

Code Changes:

Database Method (scripts/shared/database.py):

  • Added version counter query before insert
  • Counts ALL versions (active + superseded) for filename
  • Returns version_number in result dict
  • Logs include version: "Score: 80 -> 85, Version: 3"

Script (scripts/creativex_scoring_storing.py):

  • Captures version_number from database result
  • Passes to email template in processed_files list
  • Logs show version: "Success: Score 85 extracted (Version 3)"

Email Templates (scripts/shared/notifier.py):

  • Complete Template:

    • Shows version badge in header: "Version 3 (Updated)"
    • Only displays if version_number > 1
    • Note below: "This is version 3 of this file"
  • Partial Template:

    • Shows inline: "Score: 85 (Version 3)"
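The display rule for both templates can be sketched as two small formatting functions. The names version_badge and partial_score_line are hypothetical and do not come from scripts/shared/notifier.py. Hiding the version on a first upload in the partial template is an assumption carried over from the complete template's rule.

```python
def version_badge(version_number: int) -> str:
    """Header badge for the complete template; empty for a first upload."""
    if version_number > 1:
        return f"[Version {version_number} (Updated)]"
    return ""


def partial_score_line(quality_score: str, version_number: int) -> str:
    """Inline score line for the partial template.

    Assumption: the version is shown only when version_number > 1.
    """
    line = f"Score: {quality_score}"
    if version_number > 1:
        line += f" (Version {version_number})"
    return line


first_upload = version_badge(1)
third_upload = version_badge(3)
```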

Example Email Output

First Upload (New File):

Filename: video.mp4
Quality Score: 80
CreativeX ID: 6864255

Third Upload (Re-scored):

Filename: video.mp4    [Version 3 (Updated)]
Quality Score: 85
CreativeX ID: 6864255

📝 Note: This is version 3 of this file (previous versions preserved in database)

Database Behavior

Version Counter Logic:

  1. Count ALL records with this filename
  2. New version = count + 1
  3. Mark old active → superseded
  4. Insert new record as active with incremented version
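The four steps above can be sketched against an in-memory SQLite database. This is an illustration under stated assumptions, not the code in scripts/shared/database.py: the store_score function, the reduced table schema, and the choice of SQLite are all hypothetical.

```python
import sqlite3


def store_score(conn: sqlite3.Connection, filename: str, quality_score: str) -> int:
    """Supersede the active record for filename, insert a new one, return its version."""
    with conn:  # one transaction: commits on success, rolls back on error
        # Steps 1 and 2: count every record for this filename, then add one.
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM creativex_scores WHERE filename = ?",
            (filename,),
        ).fetchone()
        version_number = count + 1
        # Step 3: mark the current active record as superseded.
        conn.execute(
            "UPDATE creativex_scores SET status = 'superseded' "
            "WHERE filename = ? AND status = 'active'",
            (filename,),
        )
        # Step 4: insert the new record as active.
        conn.execute(
            "INSERT INTO creativex_scores (filename, quality_score, status) "
            "VALUES (?, ?, 'active')",
            (filename, quality_score),
        )
    return version_number


# Assumed minimal schema; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE creativex_scores ("
    "id INTEGER PRIMARY KEY, filename TEXT, quality_score TEXT, status TEXT)"
)
versions = [store_score(conn, "video.mp4", score) for score in ("80", "85", "90")]
rows = conn.execute(
    "SELECT id, quality_score, status FROM creativex_scores ORDER BY id"
).fetchall()
```

Because the version is derived from a count taken just before the insert, two writers processing the same filename at the same moment could compute the same number. Running the steps in one transaction, as in the sketch, is one way to reduce that risk.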

Example Database State:

ID | Filename   | Score | Status     | Version (implicit)
1  | video.mp4  | 80    | superseded | 1 (first)
2  | video.mp4  | 85    | superseded | 2 (second)
3  | video.mp4  | 90    | active     | 3 (current)

Query Examples

Get Latest Version Only:

SELECT * FROM creativex_scores
WHERE filename = 'video.mp4' AND status = 'active';

Get Version Count for File:

SELECT COUNT(*) as version_count
FROM creativex_scores
WHERE filename = 'video.mp4';

Get All Versions with Numbers:

SELECT
    filename,
    quality_score,
    status,
    ROW_NUMBER() OVER (PARTITION BY filename ORDER BY extracted_at) as version_number,
    extracted_at
FROM creativex_scores
WHERE filename = 'video.mp4'
ORDER BY extracted_at;
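To try the version-numbering query without the production database, a throwaway SQLite table is enough. This sketch assumes a reduced schema and invented timestamps, and it needs SQLite 3.25 or newer for window function support. None of the sample rows are real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema for illustration; the real table has more columns.
conn.execute(
    "CREATE TABLE creativex_scores ("
    "filename TEXT, quality_score TEXT, status TEXT, extracted_at TEXT)"
)
conn.executemany(
    "INSERT INTO creativex_scores VALUES (?, ?, ?, ?)",
    [
        # Illustrative rows with made-up timestamps.
        ("video.mp4", "80", "superseded", "2025-01-01T10:00:00"),
        ("video.mp4", "85", "superseded", "2025-01-02T10:00:00"),
        ("video.mp4", "90", "active", "2025-01-03T10:00:00"),
    ],
)

# Number the versions in extraction order, oldest first.
history = conn.execute(
    """
    SELECT quality_score,
           status,
           ROW_NUMBER() OVER (PARTITION BY filename ORDER BY extracted_at) AS version_number
    FROM creativex_scores
    WHERE filename = ?
    ORDER BY extracted_at
    """,
    ("video.mp4",),
).fetchall()

for quality_score, status, version_number in history:
    print(f"v{version_number}: score={quality_score} status={status}")
```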

Testing Checklist

  • Upload PDF to Box folder 350605024645
  • Run script: python scripts/creativex_scoring_storing.py
  • Check logs show version number
  • Check database: ID and Score have no .0
  • Check email shows version badge (if > 1)
  • Re-upload same PDF with different score
  • Verify version counter increments
  • Verify old record marked superseded

Benefits

  1. Clean Data: No unnecessary .0 decimals in IDs and scores
  2. Version Tracking: Know if file has been re-scored
  3. History Preserved: All previous scores available for audit
  4. Email Clarity: Users see when a file is being updated vs new
  5. A2→A3 Ready: Latest version automatically selected via status='active'

Future Use in A2→A3

Version tracking is informational only. The get_creativex_score_by_filename() method automatically returns the latest active version, so the A2→A3 workflow does not need to handle versions itself.

# This always returns the latest version
score_data = db.get_creativex_score_by_filename(filename)
# score_data['quality_score'] will be "90" (not "90.0")
# score_data['creativex_id'] will be "6864255" (not "6864255.0")