# CreativeX Score Version Tracking & Decimal Removal
## Changes Made
### 1. Decimal Removal (.0 suffix)
**Problem:** CreativeX ID and Quality Score were being stored as `6864255.0` and `80.0`.

**Solution:** Strip the trailing `.0` before storing.

**Code Changes:**
- `scripts/creativex_scoring_storing.py`, `parse_csv_fields()` method:
  - Converts to int, then back to string: `str(int(float(value)))`
  - Applied to both `creativex_id` and `quality_score`

**Result:**
- Before: ID=`6864255.0`, Score=`80.0`
- After: ID=`6864255`, Score=`80`
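
The conversion above can be sketched as a small helper. This is a minimal illustration, not the actual body of `parse_csv_fields()`: the function name `strip_trailing_decimal` is hypothetical, and the choice to pass through empty, non-numeric, or genuinely fractional values unchanged is an assumption (a bare `str(int(float(value)))` would raise on the first two and truncate the third).

```python
def strip_trailing_decimal(value):
    """Return value as an integer string, e.g. '80.0' -> '80'.

    Hypothetical helper. Values that are missing, non-numeric, or have a
    real fractional part are returned unchanged instead of being truncated.
    """
    if value is None:
        return None
    text = str(value).strip()
    try:
        number = float(text)
    except ValueError:
        # Not a number (empty string, "N/A", ...): leave it alone.
        return text
    if not number.is_integer():
        # A score such as 80.5 keeps its fraction; inf/nan also land here.
        return text
    return str(int(number))


creativex_id = strip_trailing_decimal("6864255.0")
quality_score = strip_trailing_decimal("80.0")
print(creativex_id, quality_score)
```

Going through `float` is safe for IDs of this size, but very large integers (above 2^53) would lose precision, so a string-based strip may be preferable if IDs can grow that large.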
### 2. Version Number Tracking
**Feature:** Track how many times each file has been scored.

**Implementation:** Count all records for the filename, including superseded ones.

**Code Changes:**
#### Database Method (`scripts/shared/database.py`):
- Added version counter query before insert
- Counts ALL versions (active + superseded) for filename
- Returns `version_number` in result dict
- Logs include version: "Score: 80 -> 85, Version: 3"
#### Script (`scripts/creativex_scoring_storing.py`):
- Captures `version_number` from database result
- Passes to email template in `processed_files` list
- Logs show version: "Success: Score 85 extracted (Version 3)"
#### Email Templates (`scripts/shared/notifier.py`):
- **Complete Template:**
  - Shows version badge in header: "Version 3 (Updated)"
  - Only displays if `version_number > 1`
  - Note below: "This is version 3 of this file"
- **Partial Template:**
  - Shows inline: "Score: 85 (Version 3)"
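
The badge rule for the complete template can be sketched in plain text as follows. This is an illustration only: `render_complete_summary` is a hypothetical name, and the real templates in `scripts/shared/notifier.py` may be HTML and structured differently. The only behavior taken from the notes above is that the badge and note appear when `version_number > 1`.

```python
def render_complete_summary(filename, quality_score, creativex_id, version_number=1):
    """Build a plain-text summary; hypothetical stand-in for the email template."""
    is_update = version_number > 1

    header = f"Filename: {filename}"
    if is_update:
        # Badge is shown only for re-scored files.
        header += f" [Version {version_number} (Updated)]"

    lines = [
        header,
        f"Quality Score: {quality_score}",
        f"CreativeX ID: {creativex_id}",
    ]
    if is_update:
        lines.append(
            f"📝 Note: This is version {version_number} of this file "
            "(previous versions preserved in database)"
        )
    return "\n".join(lines)


print(render_complete_summary("video.mp4", "80", "6864255"))
print(render_complete_summary("video.mp4", "85", "6864255", version_number=3))
```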
## Example Email Output
### First Upload (New File):
```
Filename: video.mp4
Quality Score: 80
CreativeX ID: 6864255
```
### Third Upload (Re-scored):
```
Filename: video.mp4 [Version 3 (Updated)]
Quality Score: 85
CreativeX ID: 6864255
📝 Note: This is version 3 of this file (previous versions preserved in database)
```
## Database Behavior
### Version Counter Logic:
1. Count ALL records with this filename
2. New version = count + 1
3. Mark old `active` → `superseded`
4. Insert new record as `active` with incremented version
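
The four steps above can be sketched with an in-memory SQLite database. This is a sketch under stated assumptions, not the code in `scripts/shared/database.py`: the function name `store_score`, the use of `sqlite3`, and the reduced table schema are all illustrative; only the table name, the `filename`, `quality_score`, `creativex_id`, and `status` columns, and the returned `version_number` key come from this document.

```python
import sqlite3


def store_score(conn, filename, quality_score, creativex_id):
    """Insert a new active row for filename and return its version number."""
    with conn:  # single transaction: commit on success, roll back on error
        # Steps 1-2: count every existing row (active + superseded), then add one.
        count = conn.execute(
            "SELECT COUNT(*) FROM creativex_scores WHERE filename = ?",
            (filename,),
        ).fetchone()[0]
        version_number = count + 1

        # Step 3: retire the currently active row, if any.
        conn.execute(
            "UPDATE creativex_scores SET status = 'superseded' "
            "WHERE filename = ? AND status = 'active'",
            (filename,),
        )

        # Step 4: insert the new row as the active one.
        conn.execute(
            "INSERT INTO creativex_scores "
            "(filename, quality_score, creativex_id, status) "
            "VALUES (?, ?, ?, 'active')",
            (filename, quality_score, creativex_id),
        )
    return {"version_number": version_number}


# Assumed minimal schema for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE creativex_scores ("
    "id INTEGER PRIMARY KEY, filename TEXT, quality_score TEXT, "
    "creativex_id TEXT, status TEXT)"
)
```

Running the count, update, and insert in one transaction keeps the table from being left with zero or two `active` rows for a filename if a step fails partway. Concurrent writers would still need locking or a uniqueness constraint, which this sketch does not cover.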
### Example Database State:
```
ID | Filename  | Score | Status     | Version (implicit)
1  | video.mp4 | 80    | superseded | 1 (first)
2  | video.mp4 | 85    | superseded | 2 (second)
3  | video.mp4 | 90    | active     | 3 (current)
```
## Query Examples
### Get Latest Version Only:
```sql
SELECT * FROM creativex_scores
WHERE filename = 'video.mp4' AND status = 'active';
```
### Get Version Count for File:
```sql
SELECT COUNT(*) as version_count
FROM creativex_scores
WHERE filename = 'video.mp4';
```
### Get All Versions with Numbers:
```sql
SELECT
    filename,
    quality_score,
    status,
    ROW_NUMBER() OVER (PARTITION BY filename ORDER BY created_at) AS version_number,
    extracted_at
FROM creativex_scores
WHERE filename = 'video.mp4'
ORDER BY extracted_at;
```
## Testing Checklist
- [ ] Upload PDF to Box folder 350605024645
- [ ] Run script: `python scripts/creativex_scoring_storing.py`
- [ ] Check logs show version number
- [ ] Check database: ID and Score have no `.0`
- [ ] Check email shows version badge (if > 1)
- [ ] Re-upload same PDF with different score
- [ ] Verify version counter increments
- [ ] Verify old record marked `superseded`
## Benefits
1. **Clean Data:** No unnecessary `.0` decimals in IDs and scores
2. **Version Tracking:** Know whether a file has been re-scored, and how many times
3. **History Preserved:** All previous scores remain available for audit
4. **Email Clarity:** Users can see whether a file is new or an update
5. **A2→A3 Ready:** Latest version automatically selected via `status='active'`
## Future Use in A2→A3
Version tracking is informational only. The `get_creativex_score_by_filename()` method automatically returns the latest `active` version, so the A2→A3 workflow does not need to handle versions itself.
```python
# This always returns the latest version
score_data = db.get_creativex_score_by_filename(filename)
# score_data['quality_score'] will be "90" (not "90.0")
# score_data['creativex_id'] will be "6864255" (not "6864255.0")
```