ferrero-opentext/Python-Version/CREATIVEX_VERSION_UPDATES.md
DJP 6fee0cc725 Add version tracking and remove .0 decimals from CreativeX scores
Implements version counter for re-scored files and cleans up numeric formatting.

Decimal Removal:
- Strip .0 suffix from creativex_id (6864255.0 → 6864255)
- Strip .0 suffix from quality_score (80.0 → 80)
- Converts float → int → string before storing
- Cleaner data for display and DAM integration

Version Tracking:
- Counts total versions per filename (active + superseded)
- Returns version_number in database result
- Logs show version: "Score 80 extracted (Version 3)"
- Email templates display version badges for updates

Email Template Updates:
- Complete template: Shows "Version 3 (Updated)" badge in header
- Includes note: "This is version 3 of this file"
- Partial template: Shows "(Version 3)" inline
- Only displays version info if > 1

Database Changes:
- Query counts ALL versions before insert
- Returns version_number in result dict
- Logs include version in success/update messages

Benefits:
- Clean numeric values without unnecessary decimals
- Users can see if file was re-scored
- Version history visible in emails
- Still preserves all history in database
- A2→A3 integration unaffected (always gets latest active)

Example progression:
Upload 1: Score 80 (no version shown - it's the first)
Upload 2: Score 85 (Version 2 badge shown)
Upload 3: Score 90 (Version 3 badge shown)

Documentation: CREATIVEX_VERSION_UPDATES.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 16:55:07 -05:00


# CreativeX Score Version Tracking & Decimal Removal
## Changes Made
### 1. Decimal Removal (.0 suffix)
**Problem:** CreativeX ID and Quality Score were being stored as `6864255.0` and `80.0`.

**Solution:** Strip the `.0` suffix before storing.

**Code Changes:**
- `scripts/creativex_scoring_storing.py`: `parse_csv_fields()` method
- Converts to float, then int, then back to string: `str(int(float(value)))`
- Applied to both `creativex_id` and `quality_score`
- Note: `int()` truncates, so a fractional score such as `80.5` would be stored as `80`

**Result:**
- Before: ID=`6864255.0`, Score=`80.0`
- After: ID=`6864255`, Score=`80`
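A minimal sketch of the conversion as a standalone helper, assuming values arrive as CSV strings. The name `strip_trailing_zero` is hypothetical (the project does this inline in `parse_csv_fields()`), and unlike a bare `str(int(float(value)))` it leaves empty, non-numeric, and fractional values untouched instead of raising or truncating.

```python
def strip_trailing_zero(value):
    """Hypothetical helper: turn "6864255.0" into "6864255" and "80.0" into "80".

    Empty cells, non-numeric text and fractional numbers such as "80.5" are
    returned unchanged instead of raising ValueError or being truncated.
    """
    if value is None:
        return None
    text = str(value).strip()
    try:
        number = float(text)
    except ValueError:
        return text  # empty or non-numeric cell: keep as-is
    if number.is_integer():  # False for NaN, infinity and values like 80.5
        return str(int(number))
    return text


print(strip_trailing_zero("6864255.0"))  # 6864255
print(strip_trailing_zero("80.0"))       # 80
print(strip_trailing_zero("80.5"))       # 80.5
```

Because the value passes through `float`, integer IDs larger than 2**53 could lose precision; seven-digit IDs like the one above are well within that range.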
### 2. Version Number Tracking
**Feature:** Track how many times each file has been scored.

**Implementation:** Count the total number of records for the filename (including superseded ones).

**Code Changes:**
#### Database Method (`scripts/shared/database.py`):
- Added version counter query before insert
- Counts ALL versions (active + superseded) for filename
- Returns `version_number` in result dict
- Logs include version: "Score: 80 -> 85, Version: 3"
#### Script (`scripts/creativex_scoring_storing.py`):
- Captures `version_number` from database result
- Passes to email template in `processed_files` list
- Logs show version: "Success: Score 85 extracted (Version 3)"
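On the script side, the flow can be sketched as below. The helper name `record_processed_file`, the logger name, and the dictionary keys other than `version_number` are assumptions for illustration; the actual structure of `processed_files` in `creativex_scoring_storing.py` may differ.

```python
import logging

logger = logging.getLogger("creativex_scoring_storing")  # hypothetical logger name


def record_processed_file(processed_files, filename, quality_score, creativex_id, result):
    """Hypothetical: copy version_number from the DB result into the email payload and log it."""
    # Assumption: a result without version_number is treated as version 1.
    version_number = result.get("version_number", 1)
    entry = {
        "filename": filename,
        "quality_score": quality_score,
        "creativex_id": creativex_id,
        "version_number": version_number,
    }
    processed_files.append(entry)  # later handed to the email template
    message = f"Success: Score {quality_score} extracted (Version {version_number})"
    logger.info(message)
    return message


processed_files = []
print(record_processed_file(processed_files, "video.mp4", "85", "6864255", {"version_number": 3}))
# Success: Score 85 extracted (Version 3)
```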
#### Email Templates (`scripts/shared/notifier.py`):
- **Complete Template:**
  - Shows version badge in header: "Version 3 (Updated)"
  - Only displays if `version_number > 1`
  - Note below: "This is version 3 of this file"
- **Partial Template:**
  - Shows inline: "Score: 85 (Version 3)"
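The display rule can be sketched as plain-text helpers. The function names are hypothetical, and applying the `version_number > 1` rule to the partial template as well follows the commit message ("Only displays version info if > 1") rather than the template code itself.

```python
def complete_header(filename, version_number):
    """Hypothetical: header line for the complete template."""
    if version_number > 1:
        return f"Filename: {filename} [Version {version_number} (Updated)]"
    return f"Filename: {filename}"


def complete_note(version_number):
    """Hypothetical: note shown below the header, only for re-scored files."""
    if version_number > 1:
        return (f"Note: This is version {version_number} of this file "
                "(previous versions preserved in database)")
    return ""


def partial_score_line(quality_score, version_number):
    """Hypothetical: inline score text for the partial template."""
    if version_number > 1:
        return f"Score: {quality_score} (Version {version_number})"
    return f"Score: {quality_score}"


print(complete_header("video.mp4", 3))  # Filename: video.mp4 [Version 3 (Updated)]
print(partial_score_line("85", 3))      # Score: 85 (Version 3)
print(partial_score_line("80", 1))      # Score: 80
```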
## Example Email Output
### First Upload (New File):
```
Filename: video.mp4
Quality Score: 80
CreativeX ID: 6864255
```
### Third Upload (Re-scored):
```
Filename: video.mp4 [Version 3 (Updated)]
Quality Score: 85
CreativeX ID: 6864255
📝 Note: This is version 3 of this file (previous versions preserved in database)
```
## Database Behavior
### Version Counter Logic:
1. Count ALL records with this filename
2. New version = count + 1
3. Mark the existing `active` record as `superseded`
4. Insert new record as `active` with incremented version
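A minimal sketch of these four steps, using Python's built-in `sqlite3` as a stand-in for the project's database. The table columns and the `store_score` name are assumptions; the real method lives in `scripts/shared/database.py` and may differ. This sketch also does not guard against two runs scoring the same file at the same time.

```python
import sqlite3

# Hypothetical schema; the version number is not stored, it is derived from the row count.
SCHEMA = """
CREATE TABLE IF NOT EXISTS creativex_scores (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT NOT NULL,
    creativex_id TEXT,
    quality_score TEXT,
    status TEXT NOT NULL,
    extracted_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def store_score(conn, filename, creativex_id, quality_score):
    """Hypothetical: supersede the active row, insert a new one, return its version."""
    # 1. Count ALL records (active + superseded) for this filename.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM creativex_scores WHERE filename = ?", (filename,)
    ).fetchone()
    # 2. The new record's version is one more than the existing count.
    version_number = count + 1
    with conn:  # commits the UPDATE and INSERT together, or rolls both back on error
        # 3. Mark the previous active record as superseded.
        conn.execute(
            "UPDATE creativex_scores SET status = 'superseded' "
            "WHERE filename = ? AND status = 'active'",
            (filename,),
        )
        # 4. Insert the new record as active.
        conn.execute(
            "INSERT INTO creativex_scores (filename, creativex_id, quality_score, status) "
            "VALUES (?, ?, ?, 'active')",
            (filename, creativex_id, quality_score),
        )
    return {"version_number": version_number}


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
for score in ("80", "85", "90"):
    print(store_score(conn, "video.mp4", "6864255", score)["version_number"])  # 1, then 2, then 3
```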
### Example Database State:
```
ID | Filename | Score | Status | Version (implicit)
1 | video.mp4 | 80 | superseded | 1 (first)
2 | video.mp4 | 85 | superseded | 2 (second)
3 | video.mp4 | 90 | active | 3 (current)
```
## Query Examples
### Get Latest Version Only:
```sql
SELECT * FROM creativex_scores
WHERE filename = 'video.mp4' AND status = 'active';
```
### Get Version Count for File:
```sql
SELECT COUNT(*) as version_count
FROM creativex_scores
WHERE filename = 'video.mp4';
```
### Get All Versions with Numbers:
```sql
SELECT
    filename,
    quality_score,
    status,
    ROW_NUMBER() OVER (PARTITION BY filename ORDER BY extracted_at) AS version_number,
    extracted_at
FROM creativex_scores
WHERE filename = 'video.mp4'
ORDER BY extracted_at;
```
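To try the window-function query locally, here is a self-contained sketch with Python's `sqlite3` (SQLite 3.25 or later is needed for `ROW_NUMBER()`). The table, its rows and the timestamps are made up to mirror the example database state above, and the sketch assumes `extracted_at` reflects insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE creativex_scores ("
    "id INTEGER PRIMARY KEY, filename TEXT, quality_score TEXT, "
    "status TEXT, extracted_at TEXT)"
)
# Illustrative rows mirroring the example database state.
conn.executemany(
    "INSERT INTO creativex_scores (filename, quality_score, status, extracted_at) "
    "VALUES (?, ?, ?, ?)",
    [
        ("video.mp4", "80", "superseded", "2025-11-01 09:00:00"),
        ("video.mp4", "85", "superseded", "2025-11-05 09:00:00"),
        ("video.mp4", "90", "active", "2025-11-10 09:00:00"),
    ],
)
rows = conn.execute(
    """
    SELECT filename, quality_score, status,
           ROW_NUMBER() OVER (PARTITION BY filename ORDER BY extracted_at) AS version_number
    FROM creativex_scores
    WHERE filename = ?
    ORDER BY extracted_at
    """,
    ("video.mp4",),
).fetchall()
for row in rows:
    print(row)  # (filename, quality_score, status, version_number)
```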
## Testing Checklist
- [ ] Upload PDF to Box folder 350605024645
- [ ] Run script: `python scripts/creativex_scoring_storing.py`
- [ ] Check logs show version number
- [ ] Check database: ID and Score have no `.0`
- [ ] Check email shows version badge (if > 1)
- [ ] Re-upload same PDF with different score
- [ ] Verify version counter increments
- [ ] Verify old record marked `superseded`
## Benefits
1. **Clean Data:** No unnecessary `.0` decimals in IDs and scores
2. **Version Tracking:** Know if file has been re-scored
3. **History Preserved:** All previous scores available for audit
4. **Email Clarity:** Users can tell whether a file is new or an update to an earlier score
5. **A2→A3 Ready:** Latest version automatically selected via `status='active'`
## Future Use in A2→A3
The version tracking is informational only. The `get_creativex_score_by_filename()` method always returns the latest `active` version, so the A2→A3 workflow does not need to handle version numbers.
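```python
# `db` is the database helper from scripts/shared/database.py;
# `filename` is the asset's filename, e.g. "video.mp4".
# This call always returns the latest `active` version.
score_data = db.get_creativex_score_by_filename(filename)
# score_data['quality_score'] will be "90" (not "90.0")
# score_data['creativex_id'] will be "6864255" (not "6864255.0")
```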
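For exercising the A2→A3 side without the real database, a standalone stand-in for `get_creativex_score_by_filename()` might look like this. It is a hypothetical sketch against an in-memory `sqlite3` table, not the project's implementation: it takes the connection as an argument, returns the `active` row as a dict, and returns `None` when the file has never been scored (the real method's behavior in that case is not documented here).

```python
import sqlite3


def get_creativex_score_by_filename(conn, filename):
    """Hypothetical stand-in: latest active score for a filename as a dict, or None."""
    cursor = conn.execute(
        "SELECT filename, creativex_id, quality_score, status "
        "FROM creativex_scores WHERE filename = ? AND status = 'active' "
        "ORDER BY id DESC LIMIT 1",  # defensive: pick the newest if several are active
        (filename,),
    )
    row = cursor.fetchone()
    if row is None:
        return None
    # Build the dict from the column names reported by the cursor.
    columns = [description[0] for description in cursor.description]
    return dict(zip(columns, row))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE creativex_scores (id INTEGER PRIMARY KEY, filename TEXT, "
    "creativex_id TEXT, quality_score TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO creativex_scores (filename, creativex_id, quality_score, status) "
    "VALUES (?, ?, ?, ?)",
    [("video.mp4", "6864255", "85", "superseded"), ("video.mp4", "6864255", "90", "active")],
)
print(get_creativex_score_by_filename(conn, "video.mp4"))
```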