Merge develop into main for v3.1.0 release
Video QC tuning (Plan 1):
- price_currency: surface matched price + product in card body
- garment_name (new): Gemini text-overlay detection + deterministic
match vs PricingReference.product_name for the file's locale
- title_safe (new, advisory weight=0): flag price/garment text
inside platform UI overlay zones (TikTok / IG Stories / IG Reels
/ generic vertical)
SRT subtitle QC (Plan 2):
- modules/video_qc/utils/srt_pairing.py — pure-function pairing
helpers (canonical_locale, normalise_slug, parse_*_tokens,
score_pair, pair_batch)
- srt_structure (new, deterministic): srt-library parse + UTF-8 /
chardet encoding fallback + cue index + empty-cue checks
- srt_timing (new, deterministic): overlaps, last-cue vs video
duration, broadcast norms (reading speed, line length, cue
duration)
- srt_language (new, text-only Gemini Flash): detect SRT language
vs expected from video locale
- BatchVideoQCExecutor pre-flight pairing; .srt accepted in
upload form; /pairing-preview endpoint; configure-page
pairing summary (XSS-safe DOM rendering)
Deps added: srt==3.5.3, chardet>=5.0
No DB schema changes.
This commit is contained in:
commit
c8aaf3833b
12 changed files with 4176 additions and 13 deletions
1568
docs/superpowers/plans/2026-05-15-srt-qc.md
Normal file
File diff suppressed because it is too large
844
docs/superpowers/plans/2026-05-15-video-qc-tuning.md
Normal file
@@ -0,0 +1,844 @@
# Video QC Tuning Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Surface the existing full-price match more clearly in the Video QC report; add two new Video QC checks (`garment_name` for on-screen product text vs pricing reference, `title_safe` advisory for price/garment text in platform UI overlay zones).

**Architecture:** All work lives in `modules/video_qc/executor.py` and `modules/video_qc/profiles/profiles.yaml`. Two new check methods (`_run_garment_name_check`, `_run_title_safe_check`) follow the existing `_run_price_check` pattern: skip-condition gates → single Gemini direct-video LLM call → deterministic post-processing → result dict matching the existing contract. A small lookup dict (`_PLATFORM_ZONES`) maps filename tokens → platform UI overlay zones for title-safe. The price-card change is purely cosmetic: the matching logic already exists in `_run_price_check`; the report renderer just reads more fields from `details`.

**Tech Stack:** Python 3.13, Flask, existing `LLMConfig.call_video_api` (Gemini 2.5 Flash direct-video), existing `PricingReference` model.

**Source spec:** `docs/superpowers/specs/2026-05-15-video-qc-tuning-design.md`

**Execution prerequisites:** Be on branch `develop`. The repo's only "test infra" is `test_integration.py` (smoke test) and manual exercise via the web launcher. This plan follows the same manual pattern: pure-function helpers get inline Python REPL verification, LLM-integrated code gets manual smoke tests against the asset at `testing_15may/video_tuning/`.

---

## Task 1: Verify how `product_name` is populated in PricingReference

This is a read-only investigation step. It answers the spec's first open verification question and determines whether `garment_name` ships with the planned normalisation rule or falls back to "skip non-English locales".

**Files:** (none; investigation only)

- [ ] **Step 1: Open a Flask shell and inspect a real PricingReference**

Run:
```bash
cd /Users/nickviljoen/Desktop/HM_QC_Bitbucket/hm_ai_qc_report_tool
source venv/bin/activate
flask shell
```

Then in the shell:
```python
from core.models.pricing_reference import PricingReference
refs = PricingReference.query.filter_by(status='ready').all()
print(f"{len(refs)} ready references")
for r in refs[:3]:
    prices = r.get_prices()
    print(f"\n--- {r.name} ({len(prices)} prices) ---")
    # Sample one price per locale present
    seen_locales = set()
    for p in prices:
        loc = f"{p.get('language')}/{p.get('country')}"
        if loc in seen_locales:
            continue
        seen_locales.add(loc)
        print(f"  {loc}: product_name={p.get('product_name')!r}, price={p.get('price')!r}")
        if len(seen_locales) >= 6:
            break
```

Expected outcome: visually confirm whether `product_name` looks localised (different strings for different locales) or English-source (same string across all locales for the same product).

- [ ] **Step 2: Record finding**

In the plan checklist below, tick the path you observed:

  - [ ] **A. Product names ARE localised per row** → Task 4 (`_product_names_match`) ships as designed.
  - [ ] **B. Product names are English-source across all locales** → Task 4 adds a `_LOCALES_WITH_LATIN_PRODUCT_NAMES` guard set (configurable) and **skips** the check for locales outside that set; OR ship as designed and accept higher false-fail rate on non-English locales until v2.

Default: if B, go with the skip-on-non-English-locale variant. Don't add LLM translation in v1.

- [ ] **Step 3: No commit; this task produces no code change**

---

## Task 2: Update `price_currency` report card to surface "Matched" / "Expected" lines

The match data is already in `details.price_match` / `details.detected_prices` / `details.price_match.expected_prices`. We add summary lines to the card body in `_generate_report`.

**Files:**
- Modify: `modules/video_qc/executor.py` (the `_generate_report` method around lines 877–947 and the price-check `message` string around lines 783–786)

- [ ] **Step 1: Update the price-check `message` string to be richer on pass**

In `_run_price_check` in `modules/video_qc/executor.py`, around line 783, replace:

```python
if status == 'passed':
    message = f'Price/currency passed — {currency} correct for {language}'
else:
    message = f'Price/currency issues: {", ".join(issues)}'
```

with:

```python
matched_price = matched[0] if matched else None
matched_product = matched[1].get('product_name') if matched else None
if status == 'passed' and matched_price is not None:
    product_suffix = f" — {matched_product}" if matched_product else ""
    message = (
        f"Matched: {currency}{matched_price}{product_suffix} "
        f"(locale {language})"
    )
elif status == 'passed':
    message = f'Price/currency passed — {currency} correct for {language}'
else:
    exp_list = details['price_match']['expected_prices']
    exp_total = len(expected_entries)
    expected_blurb = ''
    if exp_list:
        shown = exp_list[:5]
        expected_blurb = (
            f' Expected one of: {", ".join(shown)}'
            + (f' ({len(shown)} of {exp_total} shown)' if exp_total > len(shown) else '')
        )
    detected_blurb = ''
    if detected_strings:
        detected_blurb = f' Detected: {", ".join(detected_strings)}.'
    message = (
        f'Price/currency issues: {", ".join(issues)}.'
        + detected_blurb + expected_blurb
    )
```

(The variables `matched`, `currency`, `language`, `details`, `expected_entries`, `detected_strings`, `issues`, and `status` are all in scope at that point in `_run_price_check`.)

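The branching above can also be exercised without a video or an LLM call by copying it into a standalone function. The sketch below does that. The name `build_price_message` and its parameter list are hypothetical, and it assumes `matched` is a `(price, entry_dict)` tuple or `None` and that `expected_prices` is a list of strings, which is how the snippet above appears to use them.

```python
def build_price_message(status, currency, language, matched, issues,
                        detected_strings, expected_prices, expected_total):
    """Standalone copy of the Step 1 message logic, for REPL verification."""
    matched_price = matched[0] if matched else None
    matched_product = matched[1].get('product_name') if matched else None
    if status == 'passed' and matched_price is not None:
        product_suffix = f" — {matched_product}" if matched_product else ""
        return f"Matched: {currency}{matched_price}{product_suffix} (locale {language})"
    if status == 'passed':
        return f'Price/currency passed — {currency} correct for {language}'
    # Failure path: show at most five expected prices, plus what was detected.
    shown = expected_prices[:5]
    expected_blurb = ''
    if shown:
        expected_blurb = f' Expected one of: {", ".join(shown)}'
        if expected_total > len(shown):
            expected_blurb += f' ({len(shown)} of {expected_total} shown)'
    detected_blurb = ''
    if detected_strings:
        detected_blurb = f' Detected: {", ".join(detected_strings)}.'
    return f'Price/currency issues: {", ".join(issues)}.' + detected_blurb + expected_blurb


print(build_price_message('passed', '€', 'es-ES',
                          ('19.99', {'product_name': 'Cotton T-Shirt'}),
                          [], [], [], 0))
print(build_price_message('failed', '€', 'es-ES', None,
                          ['price mismatch'], ['€24.99'], ['19.99', '29.99'], 2))
```

The sample values (`€`, `19.99`, `price mismatch`) are made up for illustration; only the string shapes come from the snippet above.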
- [ ] **Step 2: Manually verify the price-card text change**

Run a video QC pass against the test asset:

```bash
cd /Users/nickviljoen/Desktop/HM_QC_Bitbucket/hm_ai_qc_report_tool
# Start the Flask dev server (or use the launcher CLI)
python -m flask --app app run --port 5050
```

In a browser go to `http://localhost:5050/hm-aiqc/video-qc/`, upload `testing_15may/video_tuning/6898354_1013A_SPRING_W_10A_TT_9x16_TK_Stories_Vogue_ES-es.mp4` with a pricing reference attached (any ready PricingReference will do; if none, the check skips, which is fine for this step).

Expected on pass: card now reads `Matched: €19.99 — Cotton T-Shirt (locale es-ES)` (or similar). On fail: `Detected:` and `Expected one of:` lines appear.

- [ ] **Step 3: Commit**

```bash
git add modules/video_qc/executor.py
git commit -m "Video QC: surface matched price + product in price card

The price_currency check has always done a full numeric match against
the pricing reference but the report card only showed pass/fail by
currency. Pull matched_price, matched_product, detected_prices, and
expected_prices into the message string so QC reviewers can see the
full match at a glance.

No logic changes."
```

---

## Task 3: Add the platform-zones lookup helper for `title_safe`

This is a pure-function helper used by the new `_run_title_safe_check`. We add it at module scope in `executor.py` so it's near its caller and easy to verify in a REPL.

**Files:**
- Modify: `modules/video_qc/executor.py` (add at module scope, after `_language_display` near line 44)

- [ ] **Step 1: Add the `_PLATFORM_ZONES` dict and `_infer_platform_zones` function**

Insert after the existing `_language_display` function (after line 44):

```python
# Platform UI overlay zones for `title_safe` check. Each entry describes the
# regions of the frame that the platform's UI obscures (profile pills, captions,
# action icons). Used to flag when price or garment-name text lands inside.
# Percentages are conservative; tuned against IG/TikTok screenshots circa 2026.
_PLATFORM_ZONES = {
    'tiktok': {
        'description': "TikTok feed: top ~10% (profile/handle bar), bottom ~25% "
                       "(caption + UI action rail).",
        'zones': [
            {'edge': 'top', 'percent': 10},
            {'edge': 'bottom', 'percent': 25},
        ],
    },
    'ig_stories': {
        'description': "Instagram Stories: top ~12% (profile pill + reactions), "
                       "bottom ~18% (swipe-up / reply bar).",
        'zones': [
            {'edge': 'top', 'percent': 12},
            {'edge': 'bottom', 'percent': 18},
        ],
    },
    'ig_reels': {
        'description': "Instagram Reels: bottom ~25% (caption + icons), "
                       "right edge ~10% (action rail).",
        'zones': [
            {'edge': 'bottom', 'percent': 25},
            {'edge': 'right', 'percent': 10},
        ],
    },
    'vertical_generic': {
        'description': "Generic vertical (9x16) social: top ~10%, bottom ~20%.",
        'zones': [
            {'edge': 'top', 'percent': 10},
            {'edge': 'bottom', 'percent': 20},
        ],
    },
}


def _infer_platform_zones(filename: str) -> dict | None:
    """Infer platform UI overlay zones from filename tokens.

    Returns a dict {'platform': str, 'description': str, 'zones': [...]}
    or None when the format has no known overlay zones (feed formats like
    1x1 / 4x5, or unrecognised).
    """
    if not filename:
        return None
    upper = filename.upper()

    # Most specific first: explicit platform + format combos.
    if 'TK_STORIES' in upper or '_TK_' in upper or 'TT_9X16' in upper:
        return {'platform': 'tiktok', **_PLATFORM_ZONES['tiktok']}
    if 'IG_STORIES' in upper or 'STORIES_9X16' in upper:
        return {'platform': 'ig_stories', **_PLATFORM_ZONES['ig_stories']}
    if 'IG_REELS' in upper or 'REELS_9X16' in upper:
        return {'platform': 'ig_reels', **_PLATFORM_ZONES['ig_reels']}

    # Fallback: any 9x16 with no platform hint -> generic vertical.
    if '9X16' in upper:
        return {'platform': 'vertical_generic', **_PLATFORM_ZONES['vertical_generic']}

    # Feed formats (1x1, 4x5) and anything else -> no overlay zones.
    return None
```
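To make the percentages concrete, a zone entry can be converted into a pixel rectangle for a given frame size. The helper below is a sketch only: `zone_to_rect` is a hypothetical name that the plan does not add to `executor.py`, the 1080x1920 frame is an assumed 9x16 resolution, and the `left` branch exists for completeness even though no platform in the lookup uses it.

```python
def zone_to_rect(zone: dict, width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels for one overlay zone."""
    fraction = zone['percent'] / 100
    edge = zone['edge']
    if edge == 'top':
        return (0, 0, width, round(height * fraction))
    if edge == 'bottom':
        return (0, height - round(height * fraction), width, height)
    if edge == 'left':
        return (0, 0, round(width * fraction), height)
    if edge == 'right':
        return (width - round(width * fraction), 0, width, height)
    raise ValueError(f"unknown edge: {edge!r}")


# Same shape as the 'zones' entries in the lookup above.
tiktok_zones = [{'edge': 'top', 'percent': 10}, {'edge': 'bottom', 'percent': 25}]
for z in tiktok_zones:
    print(z['edge'], zone_to_rect(z, 1080, 1920))
```

For the assumed 1080x1920 frame, the TikTok entry covers the top 192 rows and the bottom 480 rows.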

- [ ] **Step 2: Verify the helper in a Python REPL**

Run:
```bash
cd /Users/nickviljoen/Desktop/HM_QC_Bitbucket/hm_ai_qc_report_tool
source venv/bin/activate
python -c "
from modules.video_qc.executor import _infer_platform_zones
cases = [
    '6898354_1013A_SPRING_W_10A_TT_9x16_TK_Stories_Vogue_ES-es.mp4',
    '7147775_AT-de_CFUL262B01_PP_RIO_INTRO_15C_4x5_SoMe_MASTER.mp4',
    '7158980_CH-en_CFUL262B01_PP_RIO_INTRO_6B_1x1_SoMe_MASTER.mp4',
    'something_ig_stories_9x16_foo.mp4',
    'something_9x16_no_platform.mp4',
    'feed_1x1_only.mp4',
]
for c in cases:
    r = _infer_platform_zones(c)
    # Print the full filename; truncating to 60 chars would cut the 61-char names.
    print(f'{c} -> {r[\"platform\"] if r else None}')
"
```

Expected output:
```
6898354_1013A_SPRING_W_10A_TT_9x16_TK_Stories_Vogue_ES-es.mp4 -> tiktok
7147775_AT-de_CFUL262B01_PP_RIO_INTRO_15C_4x5_SoMe_MASTER.mp4 -> None
7158980_CH-en_CFUL262B01_PP_RIO_INTRO_6B_1x1_SoMe_MASTER.mp4 -> None
something_ig_stories_9x16_foo.mp4 -> ig_stories
something_9x16_no_platform.mp4 -> vertical_generic
feed_1x1_only.mp4 -> None
```

If any line doesn't match, fix the substring conditions and re-run.

- [ ] **Step 3: Commit**

```bash
git add modules/video_qc/executor.py
git commit -m "Video QC: add platform-zones lookup helper for title_safe

Adds _PLATFORM_ZONES (TikTok / IG Stories / IG Reels / generic vertical)
and _infer_platform_zones(filename) for use by the new title_safe check.
Pure function, verified at REPL against expected filenames. No new
behaviour exposed yet; it is wired up in a later task."
```

---

## Task 4: Add product-name normalisation helpers for `garment_name`

Pure functions, co-located with the check that will use them.

**Files:**
- Modify: `modules/video_qc/executor.py` (add at module scope below `_infer_platform_zones`)

- [ ] **Step 1: Add `_normalize_product_name` and `_product_names_match`**

Insert after `_infer_platform_zones`:

```python
import re as _re


def _normalize_product_name(s: str) -> str:
    """Lowercase, strip non-alphanumeric (except spaces), collapse whitespace."""
    if not s:
        return ''
    s = s.lower()
    s = _re.sub(r"[^a-z0-9\s]", " ", s)
    s = _re.sub(r"\s+", " ", s).strip()
    return s


def _product_names_match(a: str, b: str) -> bool:
    """True when two product names are 'close enough' after normalisation.

    Match rule (any of):
      • One normalised string is a substring of the other (non-empty).
      • Token-set overlap |A ∩ B| / min(|A|, |B|) >= 0.6.
    Empty strings never match.
    """
    na, nb = _normalize_product_name(a), _normalize_product_name(b)
    if not na or not nb:
        return False
    if na in nb or nb in na:
        return True
    ta, tb = set(na.split()), set(nb.split())
    if not ta or not tb:
        return False
    overlap = len(ta & tb) / min(len(ta), len(tb))
    return overlap >= 0.6
```

- [ ] **Step 2: Verify the helper at the REPL**

```bash
# Quoted heredoc: an interactive shell would otherwise try history
# expansion on the "!" in {a!r} inside a double-quoted python -c string.
python - <<'PY'
from modules.video_qc.executor import _product_names_match as M
cases = [
    ('Oversized Cotton Shirt', 'OVERSIZED COTTON SHIRT.', True),  # equal after normalisation
    ('Cotton Shirt', 'Oversized Cotton Shirt', True),   # substring
    ('Wool Blazer', 'Wool blazer (long)', True),        # substring
    ('Cotton Shirt', 'Linen Trousers', False),          # no overlap
    ('Cotton Shirt', 'Cotton T-Shirt', True),           # token overlap 2/2 = 1.0
    ('', 'anything', False),
    ('anything', '', False),
]
for a, b, expected in cases:
    got = M(a, b)
    print(f'{got==expected!s:5} M({a!r}, {b!r}) = {got} (expected {expected})')
PY
```

Expected: every line starts with `True`, meaning `got == expected` for that case. Note that `Cotton Shirt` vs `Cotton T-Shirt` does not pass via substring containment: normalisation turns the hyphen into a space, giving `cotton t shirt`, which does not contain `cotton shirt`. It passes on token overlap instead, because both tokens of `cotton shirt` appear in `cotton t shirt` (2/2 = 1.0).

If any line starts with `False`, the function disagrees with the listed expectation. Decide whether to:
- Accept the behaviour and update the expected value in `cases` (false-negative-leaning is safer than false-positive).
- Loosen the rule by adding fuzzy edit distance, only if Task 1's pricing-reference inspection shows lots of near-matches that should pass.

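If the fuzzy option is ever needed, one low-dependency way to approximate it is the similarity ratio from the standard library's `difflib.SequenceMatcher`, used here in place of a true edit distance. This is a sketch under assumptions: `fuzzy_names_match` and `_normalise` are hypothetical names, the 0.85 threshold is an arbitrary starting point that would need tuning against real pricing references, and the normalisation simply mirrors `_normalize_product_name`.

```python
import re
from difflib import SequenceMatcher


def _normalise(s: str) -> str:
    """Same normalisation rule as _normalize_product_name in Step 1."""
    s = re.sub(r"[^a-z0-9\s]", " ", (s or "").lower())
    return re.sub(r"\s+", " ", s).strip()


def fuzzy_names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when the normalised names have a similarity ratio >= threshold."""
    na, nb = _normalise(a), _normalise(b)
    if not na or not nb:
        return False
    return SequenceMatcher(None, na, nb).ratio() >= threshold


# A one-letter spelling difference still matches; unrelated names do not.
print(fuzzy_names_match('Oversized Cotton Shirt', 'Oversised Cotton Shirt'))
print(fuzzy_names_match('Cotton Shirt', 'Linen Trousers'))
```

Such a rule would be tried only after the substring and token-overlap rules fail, so it can only turn a non-match into a match.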
- [ ] **Step 3: Commit**

```bash
git add modules/video_qc/executor.py
git commit -m "Video QC: add product-name normalisation helpers for garment_name

Adds _normalize_product_name (lowercase, alphanumeric+space, collapse
whitespace) and _product_names_match (substring or >=60% token-set
overlap on min side). Used by the upcoming garment_name check."
```

---

## Task 5: Implement `_run_garment_name_check`

Single LLM call to detect on-screen garment text, then deterministic match against the pricing reference.

**Files:**
- Modify: `modules/video_qc/executor.py` (add new method to `VideoQCExecutor` after `_run_price_check`)

- [ ] **Step 1: Add the method**

Inside the `VideoQCExecutor` class, after `_run_price_check` (around line 806), insert:

```python
    def _run_garment_name_check(self) -> Dict[str, Any]:
        """Detect on-screen garment/product names and match against the
        pricing reference's product_name for the file's locale."""
        weight = 25
        try:
            language, country_code = self._extract_locale_from_filename()
            if not language:
                return {
                    'check_name': 'garment_name', 'score': 100.0, 'status': 'skipped',
                    'message': 'Skipped — could not extract locale from filename',
                    'details': {'skipped': True, 'reason': 'no_locale'},
                    'weight': weight,
                }
            if language.split('-')[0].upper() in ('GEN', 'CEN'):
                return {
                    'check_name': 'garment_name', 'score': 100.0, 'status': 'skipped',
                    'message': f'Skipped for {language.upper()} (generic/censored market)',
                    'details': {'skipped': True, 'reason': 'gen_cen_file'},
                    'weight': weight,
                }

            pricing_ref = self.pricing_reference or {}
            prices_list = pricing_ref.get('prices') or []
            expected_entries = [
                p for p in prices_list
                if p.get('language') == language or p.get('country') == country_code
            ]
            expected_names = [
                p.get('product_name') for p in expected_entries
                if p.get('product_name')
            ]

            if not expected_names:
                return {
                    'check_name': 'garment_name', 'score': 100.0, 'status': 'skipped',
                    'message': 'Skipped — no product_name in pricing reference for this locale',
                    'details': {'skipped': True, 'reason': 'no_expected_names',
                                'language': language, 'country_code': country_code},
                    'weight': weight,
                }

            # Step 1: detect garment text via LLM
            prompt = (
                "Identify any garment / product names visible as text overlays "
                "in this video (e.g. 'OVERSIZED COTTON SHIRT', 'WOOL BLAZER'). "
                "Ignore prices, CTAs, dates, logos, model names, and campaign "
                "headlines. Return ONLY valid JSON (no markdown fences):\n"
                "{detected_names: [string, ...], any_text_overlay: boolean}"
            )
            usage_context = {
                'module': 'video_qc', 'check_name': 'garment_name',
                'user': self.user, 'session_id': self.session_id,
            }
            if self._use_direct_video:
                response = LLMConfig.call_video_api(
                    prompt=prompt, video_path=self.file_path,
                    provider=self.llm_provider, model=self.llm_model,
                    usage_context=usage_context,
                )
            else:
                # Frame-grid path: shouldn't normally trigger in prod (Gemini default)
                response = LLMConfig.call_vision_api(
                    prompt=prompt, image_asset=None,
                    provider=self.llm_provider, model=self.llm_model,
                    usage_context=usage_context,
                )
            text = response.get('text', '')
            start, end = text.find('{'), text.rfind('}') + 1
            detected = []
            if start != -1 and end > start:
                try:
                    detected = json.loads(text[start:end]).get('detected_names') or []
                except json.JSONDecodeError:
                    detected = []

            if not detected:
                return {
                    'check_name': 'garment_name', 'score': 100.0, 'status': 'skipped',
                    'message': 'No garment-name text detected in video — skipping validation',
                    'details': {'skipped': True, 'reason': 'no_detection',
                                'expected_names_count': len(expected_names)},
                    'weight': weight,
                }

            # Step 2: deterministic match
            matched_pair = None
            for d in detected:
                for e in expected_names:
                    if _product_names_match(d, e):
                        matched_pair = (d, e)
                        break
                if matched_pair:
                    break

            details = {
                'language': language, 'country_code': country_code,
                'detected_names': detected,
                'expected_names_sample': expected_names[:5],
                'expected_names_total': len(expected_names),
                'matched': matched_pair is not None,
                'matched_detected': matched_pair[0] if matched_pair else None,
                'matched_expected': matched_pair[1] if matched_pair else None,
                'llm_provider': self.llm_provider, 'llm_model': self.llm_model,
                'tokens_used': response.get('tokens_used'),
            }

            if matched_pair:
                return {
                    'check_name': 'garment_name', 'score': 100.0, 'status': 'passed',
                    'message': (
                        f'Matched: "{matched_pair[0]}" ≈ "{matched_pair[1]}" '
                        f'(locale {language})'
                    ),
                    'details': details, 'weight': weight,
                }
            return {
                'check_name': 'garment_name', 'score': 0.0, 'status': 'failed',
                'message': (
                    f'No detected name matched any of {len(expected_names)} '
                    f'expected product names for {language}. '
                    f'Detected: {", ".join(detected)}.'
                ),
                'details': details,
                'recommendations': [
                    f"Verify the on-screen product name is correct for {language}.",
                    f"Expected (sample): {', '.join(expected_names[:3])}.",
                ],
                'weight': weight,
            }
        except Exception as e:
            logger.error(f"garment_name check error: {e}", exc_info=True)
            return {
                'check_name': 'garment_name', 'score': 0, 'status': 'error',
                'message': f'Error: {str(e)}',
                'details': {'error': str(e)}, 'weight': weight,
            }
```
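The method above pulls the JSON payload out of the LLM reply by slicing from the first `{` to the last `}`. The same idea is sketched below as two standalone helpers that can be tried in a REPL without any API call. Both names (`extract_json_object`, `detected_names`) are hypothetical and not part of the plan. The extra filtering of non-string or empty entries is an added assumption about what a misbehaving reply might contain, not something the method above does.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the outermost {...} span of an LLM reply; return {} on failure."""
    start, end = text.find('{'), text.rfind('}') + 1
    if start == -1 or end <= start:
        return {}
    try:
        parsed = json.loads(text[start:end])
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


def detected_names(text: str) -> list[str]:
    """Return the non-empty string entries of 'detected_names', if any."""
    names = extract_json_object(text).get('detected_names') or []
    if not isinstance(names, list):
        return []
    return [n.strip() for n in names if isinstance(n, str) and n.strip()]


reply = 'Here you go: {"detected_names": ["WOOL BLAZER", "", 42], "any_text_overlay": true}'
print(detected_names(reply))
print(detected_names('no json in this reply'))
```

Dropping non-string entries before matching avoids passing unexpected types into `_product_names_match`.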

- [ ] **Step 2: Wire the check into `execute()`**

In `VideoQCExecutor.execute()` (around line 153 after the price-check block, before the score aggregation around line 156), insert:

```python
# Step 5b: Garment-name check
self.progress.update(82, "Running garment-name check...")
if self.pricing_reference:
    garment_result = self._run_garment_name_check()
else:
    garment_result = {
        'check_name': 'garment_name', 'score': 100.0, 'status': 'skipped',
        'message': 'Skipped — no pricing reference attached to this run',
        'details': {'skipped': True, 'reason': 'no_pricing_reference'},
        'weight': 25,
    }
self.results['garment_name'] = garment_result
logger.info(f"Garment name: {garment_result['status']} ({garment_result['score']})")
```

Then update the `active_checks` aggregation block (around line 158) to include the new check. Change:
```python
active_checks = [r for r in (quality_result, censorship_result, price_result)
                 if r.get('status') != 'skipped']
```
to:
```python
active_checks = [r for r in (quality_result, censorship_result, price_result,
                             garment_result)
                 if r.get('status') != 'skipped']
```

- [ ] **Step 3: Smoke-test against the tuning asset**

Restart the dev server and run the test video again from the web UI with a pricing reference attached. Expected report card:

- If the test video has no on-screen garment text (likely, since it's a Stories asset focused on lookbook): `garment_name` shows `skipped`, score 100, "No garment-name text detected in video — skipping validation".
- If you attach a pricing reference that has no `product_name` for `es-ES`: shows `skipped`, "no product_name in pricing reference for this locale".
- If both detected and expected names are present, the card shows the matched pair or a `failed` with detected vs expected.

- [ ] **Step 4: Commit**

```bash
git add modules/video_qc/executor.py
git commit -m "Video QC: add garment_name check

Single Gemini direct-video call detects garment/product text overlays;
deterministic match against PricingReference.get_prices() product_name
for the file's locale. Skips when no pricing reference attached, locale
unparseable, GEN/CEN file, no expected product names for locale, or no
on-screen garment text detected. Weight 25 in standard_video profile."
```

---

## Task 6: Implement `_run_title_safe_check` (advisory-only)

Always-100 score, never `failed`. Skip when format has no overlay zones.

**Files:**
- Modify: `modules/video_qc/executor.py` (add method on `VideoQCExecutor` after `_run_garment_name_check`)

- [ ] **Step 1: Add the method**

```python
    def _run_title_safe_check(self) -> Dict[str, Any]:
        """Advisory check: flag (never fail) when price or garment-name text
        falls inside a platform UI overlay zone. Score is always 100; status
        is 'warning' on detected issues, otherwise 'passed' / 'skipped'."""
        weight = 0  # advisory: does not contribute to overall score
        try:
            platform_info = _infer_platform_zones(os.path.basename(self.file_path))
            if not platform_info:
                return {
                    'check_name': 'title_safe', 'score': 100.0, 'status': 'skipped',
                    'message': 'Format has no known platform overlay zones — title-safe not applicable',
                    'details': {'skipped': True, 'reason': 'no_platform_zones'},
                    'weight': weight,
                }

            zones_text = "; ".join(
                f"{z['edge']} ~{z['percent']}%" for z in platform_info['zones']
            )
            prompt = (
                "You are reviewing this video for advisory title-safe issues. "
                f"Platform: {platform_info['platform']}. "
                f"{platform_info['description']}\n\n"
                "Identify frames where the PRICE text or PRODUCT/GARMENT-NAME "
                "text falls INSIDE one of these unsafe zones: "
                f"{zones_text}. Ignore other text. Return ONLY valid JSON "
                "(no markdown fences):\n"
                "{\"issues\": [{\"frame_timestamp\": \"0:12\", "
                "\"element\": \"price\" | \"garment\", "
                "\"zone\": \"top\" | \"bottom\" | \"right\", "
                "\"description\": \"...\"}], \"advisory_only\": true}"
            )
            usage_context = {
                'module': 'video_qc', 'check_name': 'title_safe',
                'user': self.user, 'session_id': self.session_id,
            }
            response = LLMConfig.call_video_api(
                prompt=prompt, video_path=self.file_path,
                provider=self.llm_provider, model=self.llm_model,
                usage_context=usage_context,
            )
            text = response.get('text', '')
            start, end = text.find('{'), text.rfind('}') + 1
            issues = []
            if start != -1 and end > start:
                try:
                    issues = json.loads(text[start:end]).get('issues') or []
                except json.JSONDecodeError:
                    issues = []

            details = {
                'platform': platform_info['platform'],
                'platform_description': platform_info['description'],
                'zones': platform_info['zones'],
                'issues': issues,
                'advisory_only': True,
                'llm_provider': self.llm_provider, 'llm_model': self.llm_model,
                'tokens_used': response.get('tokens_used'),
            }

            if not issues:
                return {
                    'check_name': 'title_safe', 'score': 100.0, 'status': 'passed',
                    'message': (
                        f'Advisory check — no price/garment placement issues on '
                        f'{platform_info["platform"]}.'
                    ),
                    'details': details, 'weight': weight,
                }

            issue_blurb = "; ".join(
                f"{i.get('frame_timestamp','?')} {i.get('element','?')} "
                f"in {i.get('zone','?')}"
                for i in issues[:5]
            )
            return {
                'check_name': 'title_safe', 'score': 100.0, 'status': 'warning',
                'message': (
                    f'Advisory — {len(issues)} placement issue(s) on '
                    f'{platform_info["platform"]}: {issue_blurb}. '
                    f'Does not affect overall score.'
                ),
                'details': details,
                'recommendations': [
                    f"Review price/garment positioning for {platform_info['platform']} "
                    f"unsafe zones."
                ],
                'weight': weight,
            }
        except Exception as e:
            logger.error(f"title_safe check error: {e}", exc_info=True)
            return {
                'check_name': 'title_safe', 'score': 100.0, 'status': 'error',
                'message': f'Error: {str(e)} — does not affect overall score',
                'details': {'error': str(e), 'advisory_only': True},
                'weight': weight,
            }
```
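Because the issue list comes straight from the LLM, it may contain entries for zones the platform does not define (for example `right` on TikTok) or malformed items. One possible hardening step, sketched below, filters the list before it reaches the report. `filter_issues` is a hypothetical helper that the method above does not call, and the allowed element names are taken from the prompt text.

```python
def filter_issues(issues, zones):
    """Keep well-formed issues whose zone is one of the platform's edges."""
    allowed_zones = {z['edge'] for z in zones}
    allowed_elements = {'price', 'garment'}  # element names used in the prompt
    kept = []
    for issue in issues:
        if not isinstance(issue, dict):
            continue  # drop anything that is not a JSON object
        if issue.get('zone') in allowed_zones and issue.get('element') in allowed_elements:
            kept.append(issue)
    return kept


tiktok_zones = [{'edge': 'top', 'percent': 10}, {'edge': 'bottom', 'percent': 25}]
raw = [
    {'frame_timestamp': '0:03', 'element': 'price', 'zone': 'bottom'},
    {'frame_timestamp': '0:05', 'element': 'price', 'zone': 'right'},  # not a TikTok zone
    'unexpected string entry',
]
print(filter_issues(raw, tiktok_zones))
```

Whether to filter or to surface everything the model reports is a judgement call; for an advisory check, showing more may be acceptable.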

- [ ] **Step 2: Wire the check into `execute()`**

In `VideoQCExecutor.execute()`, after the garment-name block added in Task 5:

```python
# Step 5c: Title-safe advisory
self.progress.update(85, "Running title-safe placement check...")
title_safe_result = self._run_title_safe_check()
self.results['title_safe'] = title_safe_result
logger.info(f"Title safe: {title_safe_result['status']} ({title_safe_result['score']})")
```

The `active_checks` aggregation should NOT include `title_safe_result`. Its weight is 0 and its score is always 100, so including it should not change the overall score, but excluding it makes the intent explicit:

```python
active_checks = [r for r in (quality_result, censorship_result, price_result,
                             garment_result)
                 if r.get('status') != 'skipped']
# title_safe_result intentionally excluded: advisory only (weight=0)
```

(Leave the existing `active_checks` change from Task 5 in place; just add the comment.)

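The claim that a weight-0 result cannot move the overall score holds for a weighted mean. The sketch below illustrates that, assuming the aggregation is a weighted mean over non-skipped results. The plan does not show the real aggregation code, so `overall_score`, its formula, and its 100.0 fallback for zero total weight are all assumptions, and the sample scores are made up.

```python
def overall_score(results):
    """Weighted mean of 'score' over non-skipped results (assumed formula)."""
    active = [r for r in results if r.get('status') != 'skipped']
    total_weight = sum(r.get('weight', 0) for r in active)
    if total_weight == 0:
        return 100.0  # assumption: nothing scorable means nothing to penalise
    return sum(r['score'] * r.get('weight', 0) for r in active) / total_weight


scored = [
    {'check_name': 'quality', 'score': 80.0, 'status': 'passed', 'weight': 40},
    {'check_name': 'price_currency', 'score': 100.0, 'status': 'passed', 'weight': 25},
    {'check_name': 'garment_name', 'score': 0.0, 'status': 'failed', 'weight': 25},
]
advisory = {'check_name': 'title_safe', 'score': 100.0, 'status': 'warning', 'weight': 0}

# Adding the weight-0 advisory result leaves the weighted mean unchanged.
print(overall_score(scored) == overall_score(scored + [advisory]))
```

If the real aggregation were an unweighted mean instead, including the advisory result would shift the score, which is one more reason to keep it out of `active_checks`.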
- [ ] **Step 3: Smoke-test against the tuning asset**
|
||||
|
||||
Re-run the test video through the web UI. Expected: a new `title_safe` card appears, status `passed` or `warning` (NEVER `failed`). The test asset filename contains `TK_Stories_9x16`, so `_infer_platform_zones` returns `tiktok` and the LLM gets the TikTok zone description.
|
||||
|
||||
If the LLM finds no issues, the card shows passed. If it finds issues, surface them as warnings. Either way, the overall score should NOT change versus before this task (i.e. compare overall scores from Task 5's run and this task's run for the same asset — they should match).
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
git add modules/video_qc/executor.py
git commit -m "Video QC: add title_safe advisory check

Flags (never fails) when price or garment-name text falls inside known
platform UI overlay zones (TikTok / IG Stories / IG Reels / generic
vertical). Platform inferred from filename tokens via _infer_platform_zones.
Weight 0 in profile — advisory only, never contributes to overall score."
```
|
||||
|
||||
---
|
||||
|
||||
## Task 7: Register new checks in `profiles.yaml`
|
||||
|
||||
Documentation-only — the YAML doesn't drive execution today (executor runs checks unconditionally) but is kept current as profile-metadata reference.
|
||||
|
||||
**Files:**
|
||||
- Modify: `modules/video_qc/profiles/profiles.yaml`
|
||||
|
||||
- [ ] **Step 1: Add the new checks under `standard_video`**
|
||||
|
||||
Open `modules/video_qc/profiles/profiles.yaml` and replace the `standard_video` block (lines 6–33) with:
|
||||
|
||||
```yaml
  standard_video:
    name: "Standard Video QC (BETA)"
    description: "Technical validation for H&M video marketing materials"
    checks:
      - name: "video_filename_parse"
        weight: 15
        enabled: true
        llm_provider: null
        description: "Validate H&M video filename conventions"

      - name: "video_technical_check"
        weight: 40
        enabled: true
        llm_provider: null
        description: "Verify codec, resolution, FPS, bitrate, audio specs"

      - name: "video_duration_check"
        weight: 20
        enabled: true
        llm_provider: null
        description: "Validate duration matches filename"

      - name: "video_censorship_check"
        weight: 25
        enabled: true
        llm_provider: "openai"
        llm_model: "gpt-4o"
        description: "AI-powered censorship check (CEN markets only)"

      - name: "garment_name"
        weight: 25
        enabled: true
        llm_provider: "google"
        llm_model: "gemini-2.5-flash"
        description: "Validate on-screen garment name against pricing reference"

      - name: "title_safe"
        weight: 0
        enabled: true
        llm_provider: "google"
        llm_model: "gemini-2.5-flash"
        description: "Advisory — flag price/garment text inside platform UI overlay zones"
```
|
||||
|
||||
- [ ] **Step 2: Commit**
|
||||
|
||||
```bash
git add modules/video_qc/profiles/profiles.yaml
git commit -m "Video QC: register garment_name and title_safe in standard_video profile

Profile YAML is descriptive metadata (executor runs unconditionally).
Keeping it current so the profile page and any future YAML-driven
selection reflects the live check set."
```
|
||||
|
||||
---
|
||||
|
||||
## Task 8: Final smoke test + visual review of the report
|
||||
|
||||
End-to-end verification that all three changes produced the expected user-visible behaviour.
|
||||
|
||||
**Files:** (none — manual verification)
|
||||
|
||||
- [ ] **Step 1: Restart dev server and run a fresh batch**
|
||||
|
||||
```bash
cd /Users/nickviljoen/Desktop/HM_QC_Bitbucket/hm_ai_qc_report_tool
source venv/bin/activate
python -m flask --app app run --port 5050
```
|
||||
|
||||
Web UI: upload `testing_15may/video_tuning/6898354_1013A_SPRING_W_10A_TT_9x16_TK_Stories_Vogue_ES-es.mp4` with a pricing reference attached. Wait for completion.
|
||||
|
||||
- [ ] **Step 2: Open the generated report and verify each card**
|
||||
|
||||
The report is saved under `storage/reports/video_qc/<job_number>/VideoQC_*.html`. Open in a browser.
|
||||
|
||||
Expected cards (in order):
|
||||
- `visual_quality` — unchanged from before this work (should still flag the English subtitle leak at 0:17–0:19, score ~55).
|
||||
- `censorship` — skipped (no CEN suffix).
|
||||
- `price_currency` — card body now shows `Matched: …` line on pass OR `Detected:` + `Expected one of:` lines on fail.
|
||||
- `garment_name` — new card; status likely `skipped` ("No garment-name text detected") for this Stories asset.
|
||||
- `title_safe` — new card; status `passed` or `warning` with TikTok platform info.
|
||||
|
||||
- [ ] **Step 3: Verify overall score wasn't perturbed by `title_safe`**
|
||||
|
||||
If the previous report (pre-this-work) for the same asset was 55.0, the new overall score should be the same 55.0 ± rounding — because `title_safe` has weight 0 and `garment_name` skipped (drops out of aggregation). If the score changed materially, something in the aggregation is wrong; re-check the `active_checks` filter.
|
||||
|
||||
- [ ] **Step 4: No commit unless fixes needed**
|
||||
|
||||
If you found a bug in steps 1–3, fix it and commit the fix as its own commit. Otherwise this task produces no commit — it's a verification gate.
|
||||
|
||||
---
|
||||
|
||||
## Task 9: (Optional) Push the branch when ready
|
||||
|
||||
The user explicitly said to wait before pushing. Skip unless asked.
|
||||
|
||||
```bash
|
||||
git push origin develop
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Spec coverage check
|
||||
|
||||
| Spec section | Covered by |
|---|---|
| Change 1 — price-card clarity | Task 2 |
| Change 2 — `garment_name` (skip rules, detection, matching, scoring, weight) | Tasks 4, 5 |
| Change 3 — `title_safe` (platform inference, advisory scoring, weight 0) | Tasks 3, 6 |
| Profile changes (`profiles.yaml`) | Task 7 |
| Final weights table | Task 7 |
| Testing approach (smoke against tuning asset) | Tasks 2, 5, 6, 8 |
| Open verification: `product_name` localisation | Task 1 |
| Open verification: `title_safe` LLM sensitivity | Task 6 (manual smoke), Task 8 (final review) |
| Open verification: YAML vs executor reality | Documented inline in Task 7 |
|
||||
330
docs/superpowers/specs/2026-05-15-srt-qc-design.md
Normal file
|
|
@ -0,0 +1,330 @@
|
|||
# SRT subtitle QC — design
|
||||
|
||||
**Date:** 2026-05-15
|
||||
**Author:** Nick Viljoen (with Claude)
|
||||
**Module:** `modules/video_qc/` (extended — no new module)
|
||||
**Status:** Approved for implementation planning
|
||||
|
||||
## Motivation
|
||||
|
||||
A QC user asked what we can check about SRT subtitle files and how. Their feedback specifically flagged the concern that AI might "only register the correct language" rather than verifying "[subtitles] sitting in the right place and time out correctly" — i.e. PMs would still have to eyeball more than language.
|
||||
|
||||
This spec addresses that concern by introducing three deterministic-or-text-only SRT checks that run alongside existing Video QC checks: **structure**, **timing**, and **language**.
|
||||
|
||||
Audio-vs-SRT transcription comparison (Whisper / Gemini audio) is **explicitly v2** — it is the heaviest and most expensive check, and the v1 set already covers the user's stated concerns.
|
||||
|
||||
## Scope
|
||||
|
||||
In scope:
|
||||
- `modules/video_qc/batch_executor.py` — pre-flight SRT pairing step.
|
||||
- `modules/video_qc/executor.py` — accept `srt_path`; three new check methods.
|
||||
- `modules/video_qc/utils/srt_pairing.py` (NEW) — token parsing and pair-scoring helpers.
|
||||
- `modules/video_qc/templates/` — pre-flight pairing summary in the batch UI; three new report cards.
|
||||
- `modules/video_qc/routes.py` — accept `.srt` uploads alongside `.mp4`; surface unpaired SRTs.
|
||||
- `requirements.txt` — add `srt` Python library.
|
||||
|
||||
Out of scope:
|
||||
- Audio-vs-SRT transcription comparison (deferred to v2).
|
||||
- Legal / byline detection (separate concern, on-video overlay, not in SRTs).
|
||||
- A standalone SRT QC module / dashboard.
|
||||
- Database schema changes — SRT data lives in `QCReport.metadata_json`.
|
||||
- Image QC, Video Master, Reporting modules — unchanged.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
Batch upload (videos + SRTs together, via manual upload OR Box pull)
        │
        ▼
SRT pairing (batch-level, in batch_executor.py)
  • For each video: parse filename → {campaign_code, clip_slug, locale}
  • For each SRT:   parse filename → {campaign_code?, clip_slug, locale?}
  • Score every (video, srt) pair; auto-pair highest scoring above threshold
  • Produce: pair_map: dict[video_path, srt_path|None]
             unpaired_srts: list[str]
  • Pre-flight summary shown to user; user clicks Continue to start batch.
        │
        ▼
For each (video, srt|None):
  VideoQCExecutor(file_path=video, srt_path=srt|None, ...).execute()
    ├─ visual_quality      (existing)
    ├─ censorship          (existing)
    ├─ price_currency      (existing)
    ├─ garment_name        (from spec 1)
    ├─ title_safe          (from spec 1)
    ├─ srt_structure       (NEW — skipped if srt_path is None)
    ├─ srt_timing          (NEW — skipped if srt_path is None)
    └─ srt_language        (NEW — skipped if srt_path is None)
```
|
||||
|
||||
Unified report per video. SRT cards appear inline with the other check cards, with `skipped` status if no SRT was paired.
|
||||
|
||||
## Pairing strategy
|
||||
|
||||
### Token parsing
|
||||
|
||||
For both video and SRT filenames, parse case-insensitively:
|
||||
|
||||
| Token | Video example | SRT example | Notes |
|---|---|---|---|
| `campaign_code` | `CFUL262B01` | `CFUL262B01` (sometimes absent) | Matches the canonical campaign-code regex in `core/utils/campaign_code.py`. |
| `clip_slug` | `PP_RIO_INTRO_15C` | `PP_RIO_INTRO_15C` or `RIO_INTRO15C` | The "clip identifier". Normalised by stripping `[_.\-\s]` and lowercasing before comparing. |
| `locale` | `AT-de` | `de-AT` (BCP-47-ish) | Canonicalise to `lang-MARKET` (e.g. `de-AT`). Reuses the existing locale-parser logic in `_extract_locale_from_filename` — generalise it into the new `srt_pairing.canonical_locale()` helper. |
|
||||
|
||||
### Helper module
|
||||
|
||||
New file: `modules/video_qc/utils/srt_pairing.py`:
|
||||
|
||||
```python
# Signatures only; bodies are elided with `...` so the stubs parse as valid Python.
def parse_video_tokens(filename: str) -> dict: ...
def parse_srt_tokens(filename: str) -> dict: ...
def normalise_slug(s: str) -> str: ...  # lowercase, strip [_.\-\s], alphanumeric only
def canonical_locale(s: str) -> str | None: ...  # 'AT-de'/'de-AT'/'de_AT' -> 'de-AT'
def score_pair(video_tokens: dict, srt_tokens: dict) -> float: ...  # 0.0..1.0
def pair_batch(videos: list[str], srts: list[str]) -> tuple[dict, list[str], list[str]]: ...
# pair_batch returns (pair_map, unpaired_srts, unpaired_videos)
```
|
||||
|
||||
### Scoring (`score_pair`)
|
||||
|
||||
Additive, capped at 1.0:
|
||||
|
||||
| Signal | Weight | Notes |
|---|---|---|
| Locale matches after canonicalisation | **0.5** | If BOTH have locales and they differ → score = 0. If SRT has no locale, contributes 0 (no penalty). |
| Campaign code matches | **0.3** | If SRT has no campaign code, contributes 0 (no penalty). |
| Normalised clip slug matches | **0.4** | Required — if the slugs don't normalise-equal, score = 0. |
|
||||
|
||||
**Auto-pair threshold**: `score ≥ 0.7`. Below threshold → SRT goes to `unpaired_srts`.
|
||||
|
||||
Real-world expected scores from `testing_15may/srt/` test data:
|
||||
|
||||
| Video filename | SRT filename | Expected score |
|---|---|---|
| `7147775_AT-de_CFUL262B01_PP_RIO_INTRO_15C_4x5_SoMe_MASTER.mp4` | `CFUL262B01_PP_RIO_INTRO_15C.srt_8852357_de-AT.srt` | locale 0.5 + code 0.3 + slug 0.4 = **1.0 (capped)** |
| `7158980_CH-en_CFUL262B01_PP_RIO_INTRO_6B_1x1_SoMe_MASTER.mp4` | `RIO_INTRO6B_en-CH.srt` | locale 0.5 + slug 0.4 = **0.9** (no campaign code on SRT) |
|
||||
|
||||
Both above threshold.
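
The scoring rules above can be sketched roughly as follows. This is a minimal illustration, not the shipped `srt_pairing.py`: the token dicts are assumed to already be parsed, and the slug comparison is an assumption. The spec says slugs must normalise-equal, yet its own example pairs `PP_RIO_INTRO_6B` with `RIO_INTRO6B`, so this sketch also accepts one normalised slug being a suffix of the other.

```python
from __future__ import annotations

import re

AUTO_PAIR_THRESHOLD = 0.7


def normalise_slug(s: str) -> str:
    """Lowercase and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", s.lower())


def canonical_locale(s: str) -> str | None:
    """Map 'AT-de', 'de-AT' or 'de_AT' to 'de-AT'; None if unparseable."""
    m = re.fullmatch(r"([A-Za-z]{2})[-_]([A-Za-z]{2})", s.strip())
    if not m:
        return None
    a, b = m.groups()
    # Assumption: market is written in upper case, language in lower case.
    if a.isupper() and b.islower():
        return f"{b}-{a}"
    if a.islower() and b.isupper():
        return f"{a}-{b}"
    return None


def score_pair(video_tokens: dict, srt_tokens: dict) -> float:
    v_slug = normalise_slug(video_tokens.get("clip_slug") or "")
    s_slug = normalise_slug(srt_tokens.get("clip_slug") or "")
    # Assumption: suffix match counts, so 'ppriointro6b' pairs with 'riointro6b'.
    if not v_slug or not s_slug:
        return 0.0
    if not (v_slug.endswith(s_slug) or s_slug.endswith(v_slug)):
        return 0.0  # slug match is required
    score = 0.4

    v_loc, s_loc = video_tokens.get("locale"), srt_tokens.get("locale")
    if v_loc and s_loc:
        if v_loc != s_loc:
            return 0.0  # conflicting locales veto the pair
        score += 0.5

    v_code, s_code = video_tokens.get("campaign_code"), srt_tokens.get("campaign_code")
    if v_code and s_code and v_code.upper() == s_code.upper():
        score += 0.3

    return round(min(score, 1.0), 2)  # additive, capped at 1.0
```

Rounding to two decimals keeps the `score >= AUTO_PAIR_THRESHOLD` comparison away from floating-point noise; that detail is a choice of this sketch rather than something the spec prescribes.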
|
||||
|
||||
### Tie-break
|
||||
|
||||
If multiple SRTs tie for a video's highest score:
|
||||
1. Prefer the SRT with the highest raw (pre-cap) score.
|
||||
2. If still tied, prefer the SRT with the earliest sorted filename (deterministic).
|
||||
3. Non-winning SRTs go to `unpaired_srts`.
|
||||
|
||||
If multiple videos tie for an SRT's highest score, the same logic applies symmetrically. Each SRT is paired with at most one video and vice versa.
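
One way to realise the tie-break rules is a greedy assignment over all above-threshold candidates, sorted by raw score and then by filename. The sketch below is only an illustration: `pair_batch_sketch` and its `score_fn` parameter are hypothetical names (the spec's `pair_batch` takes just the two filename lists), and the greedy strategy itself is an assumption about how the symmetric rule could be implemented.

```python
from __future__ import annotations

from typing import Callable


def pair_batch_sketch(
    videos: list[str],
    srts: list[str],
    score_fn: Callable[[str, str], float],  # hypothetical: returns the raw (pre-cap) score
    threshold: float = 0.7,
) -> tuple[dict, list[str], list[str]]:
    candidates = []
    for video in videos:
        for srt in srts:
            raw = score_fn(video, srt)
            if raw >= threshold:
                # Negated score so a plain sort puts the best candidates first;
                # ties fall through to the SRT filename, then the video filename.
                candidates.append((-raw, srt, video))
    candidates.sort()

    pair_map: dict = {video: None for video in videos}
    used_srts: set[str] = set()
    for _neg_raw, srt, video in candidates:
        # Each SRT pairs with at most one video, and vice versa.
        if pair_map[video] is None and srt not in used_srts:
            pair_map[video] = srt
            used_srts.add(srt)

    unpaired_srts = sorted(s for s in srts if s not in used_srts)
    unpaired_videos = sorted(v for v in videos if pair_map[v] is None)
    return pair_map, unpaired_srts, unpaired_videos
```

Because the sort key is fully determined by score and filenames, the same batch always produces the same pairing, which is the property the tie-break section asks for.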
|
||||
|
||||
### Pre-flight UI
|
||||
|
||||
In the batch upload / Box-pull start view, after the user has selected/listed files but before the batch runs:
|
||||
|
||||
```
Pairing summary
──────────────────────────────────────────
✓ 6 videos paired with SRTs
⚠ 0 videos without an SRT — SRT checks will skip
⚠ 0 SRT files unpaired — will be ignored

[Continue] [Back to upload]
```
|
||||
|
||||
The unpaired-SRT row expands to show each filename and its best-attempted match score, so the user can spot near-misses and rename files if they want to.
|
||||
|
||||
The Continue button proceeds with the pair_map as-is. There is no manual override UI in v1 — the user's instruction was "fuzzy match, don't require renaming". A manual override is a candidate v2 feature if low-confidence pairs prove problematic.
|
||||
|
||||
## The three checks
|
||||
|
||||
All three follow the existing check-result contract used elsewhere in `executor.py`:
|
||||
|
||||
```python
{
    'check_name': str,
    'score': float,
    'status': 'passed'|'warning'|'failed'|'skipped'|'error',
    'message': str,
    'details': dict,
    'recommendations': list[str],
    'weight': int
}
```
|
||||
|
||||
All three skip silently (status `skipped`, score 100, weight contributes 0) when `srt_path is None`.
|
||||
|
||||
### Check 1 — `srt_structure`
|
||||
|
||||
**Deterministic, no LLM.** Uses the `srt` Python library (pure-Python, well-maintained, on PyPI).
|
||||
|
||||
**Validations**:
|
||||
1. File parses as valid SRT. A parse failure → score 0, status `failed`. The remaining SRT checks are not meaningful in that case, but they still run and may also fail; that is acceptable.
|
||||
2. Encoding: read as UTF-8 first; on `UnicodeDecodeError` fall back to `chardet` detection and continue, but emit a **warning** flagging the non-UTF-8 encoding (upstream tooling issue).
|
||||
3. No `U+FFFD` replacement chars in cue text — presence → failure (signals encoding loss).
|
||||
4. Each cue's timecode parses as `HH:MM:SS,mmm` — handled by the `srt` library; we surface any parse exceptions cleanly.
|
||||
5. Cue indices ascending. Missing cue numbers (the `RIO_INDIVIDUAL15A_en-CH.srt` test file shows this) → **warning**, not failure — players generally accept it.
|
||||
6. No empty / whitespace-only cue text → warning per occurrence.
|
||||
|
||||
**Scoring**:
|
||||
- Start at 100.
|
||||
- Hard parse failure or `U+FFFD` content → 0/failed.
|
||||
- Each warning → −10, floor 70.
|
||||
- Status: `passed` ≥ 90, `warning` 70–89, `failed` < 70.
|
||||
|
||||
**Weight**: 15.
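
As a rough illustration of the validations and scoring above, the sketch below works on already-parsed cues. It makes several assumptions: the input shape (`(index or None, text)` pairs) and the function name `check_structure` are hypothetical, parsing and encoding detection via the `srt` and `chardet` libraries are left out, and missing or non-ascending indices are each counted as one warning per file.

```python
from __future__ import annotations

from typing import Optional

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD signals encoding loss


def check_structure(cues: list[tuple[Optional[int], str]]) -> dict:
    """cues: (index or None, text) pairs in file order (hypothetical input shape)."""
    warnings: list[str] = []
    hard_failure = any(REPLACEMENT_CHAR in text for _, text in cues)

    indices = [idx for idx, _ in cues]
    if any(idx is None for idx in indices):
        warnings.append("missing cue index")  # assumption: one warning per file
    present = [idx for idx in indices if idx is not None]
    if any(b <= a for a, b in zip(present, present[1:])):
        warnings.append("cue indices not ascending")  # assumption: one warning per file
    for pos, (_, text) in enumerate(cues, start=1):
        if not text.strip():
            warnings.append(f"cue {pos}: empty text")  # one warning per occurrence

    # Start at 100; hard failure -> 0; each warning -10 with a floor of 70.
    score = 0.0 if hard_failure else max(70.0, 100.0 - 10.0 * len(warnings))
    status = "passed" if score >= 90 else "warning" if score >= 70 else "failed"
    return {
        "check_name": "srt_structure",
        "score": score,
        "status": status,
        "details": {"warnings": warnings},
        "weight": 15,
    }
```

Under these thresholds a file with a single warning scores 90 and therefore still reports `passed`, with the warning listed in `details`; the status only drops to `warning` from the second warning onward.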
|
||||
|
||||
### Check 2 — `srt_timing`
|
||||
|
||||
**Deterministic, no LLM. Requires video duration** — already available via `get_video_metadata(self.file_path)['duration']` inside the executor.
|
||||
|
||||
**Validations**:
|
||||
1. Every cue: `start < end`, both ≥ 0 → any violation = fail.
|
||||
2. Cues do not overlap (cue N end ≤ cue N+1 start) → first overlap = warning, ≥ 3 overlaps = failed.
|
||||
3. Last cue `end ≤ video_duration + 0.5s` tolerance → fail otherwise (subtitles running past the video is a clear defect).
|
||||
4. Reading speed per cue: chars-per-second within `[5, 25]` (lenient: the broadcast norm is ~12–21, but punchy marketing copy can exceed it). Outside range → warning per cue, capped to first 5 reported.
|
||||
5. Line length ≤ 42 chars per line → warning per cue, capped to first 5 reported.
|
||||
6. Lines per cue ≤ 2 → warning per cue.
|
||||
7. Cue duration in `[0.7s, 7s]` → warning per cue, capped to first 5 reported.
|
||||
|
||||
**Scoring**:
|
||||
- Start at 100.
|
||||
- Each "failure" rule violated → −30.
|
||||
- Each warning → −5, with cumulative warning loss capped at 50.
|
||||
- Floor at 0.
|
||||
- Status: `passed` ≥ 90, `warning` 70–89, `failed` < 70.
|
||||
|
||||
**Feature flag**: rules 4–7 (the advisory broadcast norms) should be toggle-able via the check config (`enabled_rules: list[str]`) so they can be silenced without a release if they prove noisy in practice. Default: all enabled.
|
||||
|
||||
**Weight**: 10.
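
The timing rules and scoring can be sketched as below, assuming cues are already parsed into start/end seconds. The `Cue` dataclass and `check_timing` name are hypothetical. Two interpretations are assumptions of this sketch: one or two overlaps produce a single warning, and any violated failure rule forces status `failed`, since a lone −30 would otherwise leave the score at 70, which the status bands map to `warning`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def check_timing(cues: list[Cue], video_duration: float) -> dict:
    failures: list[str] = []
    warnings: list[str] = []

    # Rule 1: start < end, both >= 0.
    if any(c.start < 0 or c.start >= c.end for c in cues):
        failures.append("cue with negative time or start >= end")
    # Rule 2: overlaps (assumption: 1-2 overlaps give one warning, >= 3 a failure).
    overlaps = sum(1 for a, b in zip(cues, cues[1:]) if a.end > b.start)
    if overlaps >= 3:
        failures.append(f"{overlaps} overlapping cue pairs")
    elif overlaps:
        warnings.append(f"{overlaps} overlapping cue pair(s)")
    # Rule 3: last cue must end within the video duration plus 0.5 s tolerance.
    if cues and cues[-1].end > video_duration + 0.5:
        failures.append("last cue ends after the video")

    # Rules 4-7: advisory broadcast norms, one warning per offending cue.
    for i, c in enumerate(cues, start=1):
        duration = c.end - c.start
        if duration <= 0:
            continue  # already covered by rule 1
        lines = c.text.splitlines()
        cps = len(c.text.replace("\n", "")) / duration
        if not 5 <= cps <= 25:
            warnings.append(f"cue {i}: reading speed {cps:.1f} cps")
        if any(len(line) > 42 for line in lines):
            warnings.append(f"cue {i}: line longer than 42 chars")
        if len(lines) > 2:
            warnings.append(f"cue {i}: more than 2 lines")
        if not 0.7 <= duration <= 7:
            warnings.append(f"cue {i}: duration {duration:.2f}s")

    # -30 per failure rule, -5 per warning (warning loss capped at 50), floor 0.
    penalty = 30 * len(failures) + min(50, 5 * len(warnings))
    score = float(max(0, 100 - penalty))
    if failures:
        status = "failed"  # assumption: a failure rule overrides the score bands
    else:
        status = "passed" if score >= 90 else "warning" if score >= 70 else "failed"
    return {"check_name": "srt_timing", "score": score, "status": status,
            "details": {"failures": failures, "warnings": warnings}, "weight": 10}
```

The "capped to first 5 reported" limit and the `enabled_rules` toggle are omitted here to keep the example short.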
|
||||
|
||||
### Check 3 — `srt_language`
|
||||
|
||||
**LLM-judged but text-only** (no video upload — much cheaper than `visual_quality` etc.).
|
||||
|
||||
**Flow**:
|
||||
1. Extract `expected_lang` from the canonicalised locale of the paired video filename. Reuse the existing `_LANG_NAMES` mapping in `executor.py`.
|
||||
2. Sample cue text: first 5 cues + 5 evenly-distributed middle cues + last 5, deduplicated, capped at 1500 chars total.
|
||||
3. One LLM call (Gemini Flash, text-only):
|
||||
   ```
   prompt: "What language is this subtitle text written in? Return JSON:
   {detected_language: 'German'|'Spanish'|..., iso_code: 'de'|'es'|...,
   confidence: 0.0-1.0, mixed_language: bool}.
   Be strict — proper nouns and brand names don't count as language indicators."
   ```
|
||||
4. Score:
|
||||
- Detected matches expected → 100, `passed`.
|
||||
- Detected mismatches expected with confidence ≥ 0.8 → 0, `failed`.
|
||||
- Detected mismatches with confidence < 0.8, OR `mixed_language: true` → 50, `warning`.
|
||||
|
||||
**Weight**: 20.
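
The sampling and scoring steps of the flow can be sketched without the LLM call itself. Both function names below are hypothetical, and one ordering choice is an assumption: `mixed_language` is checked first, so a file whose dominant language matches but which contains a stray foreign cue still yields a warning, as the smoke test expects.

```python
from __future__ import annotations


def sample_cue_text(cue_texts: list[str], max_chars: int = 1500) -> str:
    """First 5 cues + 5 evenly spread middle cues + last 5, deduplicated and capped."""
    n = len(cue_texts)
    if n <= 15:
        picked = list(range(n))
    else:
        span = n - 10  # cues strictly between the first 5 and the last 5
        middle = [5 + ((2 * i + 1) * span) // 10 for i in range(5)]
        picked = list(range(5)) + middle + list(range(n - 5, n))
    # dict.fromkeys keeps first-seen order while dropping duplicate texts.
    unique = list(dict.fromkeys(cue_texts[i].strip() for i in picked))
    return "\n".join(unique)[:max_chars]


def score_language(expected_iso: str, detected_iso: str,
                   confidence: float, mixed_language: bool) -> tuple[float, str]:
    if mixed_language:  # assumption: takes precedence over a match
        return 50.0, "warning"
    if detected_iso.lower() == expected_iso.lower():
        return 100.0, "passed"
    if confidence >= 0.8:
        return 0.0, "failed"
    return 50.0, "warning"
```

Keeping both steps as pure functions means they can be exercised without an API key; only the single text-only model call in between needs network access.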
|
||||
|
||||
This is the direct answer to the user's "would it only register the correct language?" question — language IS the check we run here, but `srt_structure` and `srt_timing` run **alongside** it to address the rest of their concern.
|
||||
|
||||
## Final weights
|
||||
|
||||
`modules/video_qc/profiles/profiles.yaml`, `standard_video` profile (combined with spec 1):
|
||||
|
||||
| Check | Weight | Notes |
|---|---|---|
| `visual_quality` | 50 | (Existing) |
| `censorship` | 50 | (Existing — CEN markets only) |
| `price_currency` | 30 | (Existing) |
| `garment_name` | 25 | (Spec 1) |
| `title_safe` | 0 | (Spec 1 — advisory) |
| `srt_structure` | **15** | New |
| `srt_timing` | **10** | New |
| `srt_language` | **20** | New |
|
||||
|
||||
When SRT absent (or CEN absent), the relevant checks register as `skipped` and drop out of the weighted average — existing `active_checks` math in `execute()` handles this without change.
|
||||
|
||||
## Report integration
|
||||
|
||||
Three new cards appear in the existing Video QC report HTML (`_generate_report` in `executor.py`):
|
||||
|
||||
- **`srt_structure`** — parse status, encoding, warnings list (e.g. "Cue 4: empty text", "Cue 7: missing index").
|
||||
- **`srt_timing`** — table of issues: overlap timestamps, cues exceeding video duration, reading-speed outliers, line-length warnings.
|
||||
- **`srt_language`** — detected language, expected language, confidence, mixed-language flag, sample cue used.
|
||||
|
||||
When no SRT is paired, the cards still render with `skipped` status and the message `No SRT paired with this video — SRT checks skipped`, which keeps the batch report consistent.
|
||||
|
||||
## Executor changes
|
||||
|
||||
### `VideoQCExecutor`
|
||||
|
||||
```python
def __init__(self, session_id, file_path, *,
             job_number=None, llm_provider='google', llm_model='gemini-2.5-flash',
             user=None, campaign_id=None, batch_id=None,
             pricing_reference_id=None,
             srt_path: str | None = None):  # NEW
    ...
    self.srt_path = srt_path
```
|
||||
|
||||
Inside `execute()`, after the existing `price_currency` block, add three more blocks following the same pattern as `_run_price_check`:
|
||||
|
||||
```python
if self.srt_path:
    self.results['srt_structure'] = self._run_srt_structure_check()
    self.results['srt_timing'] = self._run_srt_timing_check(duration)
    self.results['srt_language'] = self._run_srt_language_check()
else:
    for name, weight in (('srt_structure', 15), ('srt_timing', 10), ('srt_language', 20)):
        self.results[name] = {
            'check_name': name, 'score': 100.0, 'status': 'skipped',
            'message': 'No SRT paired with this video — SRT checks skipped',
            'details': {'skipped': True, 'reason': 'no_srt_paired'},
            'weight': weight,
        }
```
|
||||
|
||||
### `BatchVideoQCExecutor`
|
||||
|
||||
Add a pre-flight step before iterating videos:
|
||||
|
||||
```python
from modules.video_qc.utils.srt_pairing import pair_batch

pair_map, unpaired_srts, unpaired_videos = pair_batch(self.video_paths, self.srt_paths)
self.pair_map = pair_map
self.unpaired_srts = unpaired_srts
```
|
||||
|
||||
Then thread `srt_path=pair_map.get(video_path)` into each per-video executor instantiation.
|
||||
|
||||
## Persistence
|
||||
|
||||
No schema changes. `QCReport.metadata_json` gains:
|
||||
|
||||
```json
{
  ...,
  "srt_paired": true,
  "srt_filename": "CFUL262B01_PP_RIO_INTRO_15C.srt_..._de-AT.srt",
  "srt_pair_score": 1.0
}
```
|
||||
|
||||
`unpaired_srts` is logged but not persisted in v1 — it's a transient batch-time concern surfaced in the pre-flight UI.
|
||||
|
||||
## Testing approach
|
||||
|
||||
Manual exercise via the web launcher (consistent with existing Video QC pattern; no automated test infra).
|
||||
|
||||
**Smoke test using `testing_15may/srt/`**:
|
||||
|
||||
- 6 videos + 6 SRTs, expect 6/6 pairs above threshold (4 AT-de, 2 CH-en).
|
||||
- All three SRT checks engage for every pair.
|
||||
- `srt_structure`: `RIO_INDIVIDUAL15A_en-CH.srt` (no cue numbers) should produce a single warning, NOT a failure.
|
||||
- `srt_timing`: confirm last-cue check passes on all 6 normally. Then manually edit one SRT cue to extend ~5s past video end → expect fail.
|
||||
- `srt_language`: AT-de SRTs detect as German, CH-en as English. Manually replace one cue with Spanish text → expect mixed-language warning.
|
||||
|
||||
**Pairing-edge tests** (manual):
|
||||
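- Rename an SRT that has no campaign code (e.g. `RIO_INTRO6B_en-CH.srt`) to lose its locale → expect pair score drops to 0.4 (slug only), below threshold → goes to `unpaired_srts`. An SRT that keeps its campaign code would still score 0.7 (slug 0.4 + code 0.3) and pair.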
|
||||
- Add a duplicate SRT for one video (different filename, same locale+slug) → expect tie-break picks alphabetically earlier filename; other goes to `unpaired_srts`.
|
||||
|
||||
## Open verification steps for implementation
|
||||
|
||||
1. **`srt` library encoding behaviour** — confirm how it handles non-UTF-8 input; may need a `chardet` pre-decode step.
|
||||
2. **Box folder traversal returns `.srt`** — verify the existing `box_client` listing logic includes `.srt` files in directory listings. Likely yes (it's filename-driven) but should be confirmed before relying on the Box path.
|
||||
3. **Reading-speed / line-length norms tuning** — once we have a few real batches through, decide whether to keep the broadcast defaults or relax further. Treat thresholds as tunable.
|
||||
4. **`canonical_locale` overlap with existing parser** — the existing `_extract_locale_from_filename` lives on `VideoQCExecutor`. The pairing helper needs the same logic. Refactor: move the locale-parsing logic into `srt_pairing.canonical_locale` (or a sibling util module) and have the executor call into it, removing the duplication.
|
||||
|
||||
These are flagged for the implementation plan, not blockers on the design.
|
||||
226
docs/superpowers/specs/2026-05-15-video-qc-tuning-design.md
Normal file
|
|
@ -0,0 +1,226 @@
|
|||
# Video QC tuning — design
|
||||
|
||||
**Date:** 2026-05-15
|
||||
**Author:** Nick Viljoen (with Claude)
|
||||
**Module:** `modules/video_qc/`
|
||||
**Status:** Approved for implementation planning
|
||||
|
||||
## Motivation
|
||||
|
||||
A QC user asked three questions about what Video QC checks today:
|
||||
|
||||
1. **Price** — "notes the currency symbol but does this include the actual price of the garment for the market? That is key."
|
||||
2. **Garment name** — "like the above, this is variable per market."
|
||||
3. **Placement / title-safe** — "should be cleared when global masters are done, however I would expect MGDs/PMs to check that the price point and garment name are in the right place for the deliverable spec."
|
||||
|
||||
What the code does today:
|
||||
|
||||
| Concern | Today |
|---|---|
| Price (full value) | ✅ `price_currency` already extracts the numeric value and matches it against the pricing reference, but the result is buried in `details` — looks like only currency is checked. |
| Garment name | ⚠️ `product_name` exists on each pricing-reference row but is only surfaced inside the matched-price block. No explicit "is the garment name on screen correct?" check. |
| Title-safe / placement | ❌ Not checked. `visual_quality` covers language and legibility but not whether price/garment text falls inside a platform UI overlay zone. |
|
||||
|
||||
This spec covers all three: surface what's already there for **price**, add a new check for **garment name**, add a new advisory-only check for **title-safe**.
|
||||
|
||||
## Scope
|
||||
|
||||
In scope:
|
||||
- `modules/video_qc/executor.py` — add two new checks, refine price-card rendering.
|
||||
- `modules/video_qc/profiles/profiles.yaml` — register new checks in `standard_video`.
|
||||
|
||||
Out of scope:
|
||||
- Quick Video profile (`quick_video`) — unchanged.
|
||||
- Image QC (`modules/hm_qc/`) — unchanged.
|
||||
- Pricing reference ingest changes — we will *verify* how `product_name` is populated, but no schema changes.
|
||||
|
||||
## Architecture
|
||||
|
||||
The three changes slot into the existing `VideoQCExecutor.execute()` pipeline. No new modules, no new dependencies.
|
||||
|
||||
```
VideoQCExecutor.execute()
  ├─ visual_quality      (existing, unchanged)
  ├─ censorship          (existing, unchanged)
  ├─ price_currency      (existing — report-card tweak only)
  ├─ garment_name        (NEW)
  └─ title_safe          (NEW, advisory-only)
```
|
||||
|
||||
Both new checks live in `executor.py` alongside `_run_price_check`, sharing:
|
||||
- `LLMConfig.call_video_api` (Gemini direct video).
|
||||
- The pricing-reference loaded by `_load_pricing_reference()`.
|
||||
- The locale parser `_extract_locale_from_filename()`.
|
||||
|
||||
The weighted-overall-score math (`active_checks` filter in `execute()`) already drops `skipped` checks and respects per-check weights — no changes needed there.
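
For reference, the aggregation described here might look like the sketch below. The function name `overall_score` and the flat list of result dicts are assumptions made for illustration; the real logic lives inline in `execute()`.

```python
from __future__ import annotations


def overall_score(results: list[dict]) -> float:
    """Weighted mean over non-skipped checks; zero-weight checks contribute nothing."""
    active = [r for r in results if r.get("status") != "skipped"]
    total_weight = sum(r.get("weight", 0) for r in active)
    weighted_sum = sum(r["score"] * r.get("weight", 0) for r in active)
    # Defensive guard for the case where every active check has weight 0.
    return weighted_sum / total_weight if total_weight else 0
```

With illustrative numbers, a `visual_quality` result of 55 (weight 50) and a `price_currency` result of 100 (weight 30) average to 71.875, and adding a zero-weight advisory result to the list leaves that value unchanged.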
|
||||
|
||||
## Change 1 — `price_currency` report-card clarity
|
||||
|
||||
**Problem.** The check already does the right thing (deterministic numeric match against `_prices`), but the report card surfaces only currency and a generic pass/fail line. Detected price, matched price, and matched product are stuffed into `details`.
|
||||
|
||||
**Change.** Two report-rendering tweaks in `_generate_report` (or in the price check's `message` string, depending on what reads cleaner):
|
||||
|
||||
- On **pass**, the card shows a `Matched: <symbol><value> — <product_name>` line at the top of the body, e.g. `Matched: €49.99 — Oversized Cotton Shirt`. Pulls from `details.price_match.matched_price` / `matched_product`.
|
||||
- On **fail**, the card shows `Detected: <values>` and `Expected one of: <up to 5 values> (<n> of <total> shown)`. Pulls from `details.detected_prices` and `details.price_match.expected_prices`.
|
||||
|
||||
No logic changes — the data is already in `result['details']`.
|
||||
|
||||
**Why this matters.** The user's first question ("does this include the actual price of the garment for the market?") is answered "yes" by the existing code. The fix is making that visible.
|
||||
|
||||
## Change 2 — `garment_name` check (NEW)
|
||||
|
||||
### Behaviour
|
||||
|
||||
1. **Skip rules** (matching `price_currency`):
|
||||
- No pricing reference attached → skipped.
|
||||
- Locale unparseable → skipped.
|
||||
- `GEN` or `CEN` market in locale → skipped.
|
||||
- LLM returns no detected names → skipped, message "No garment-name text detected — skipping validation".
|
||||
|
||||
2. **Detection** — one Gemini direct-video call:
|
||||
   ```
   prompt: "Identify any garment / product names visible as text overlays in
   this video (e.g. 'OVERSIZED COTTON SHIRT', 'WOOL BLAZER'). Ignore prices,
   CTAs, dates, logos, model names, and campaign headlines. Return JSON:
   {detected_names: [string, ...], any_text_overlay: bool}"
   ```
|
||||
|
||||
3. **Expected names**: collect `product_name` values from `pricing_reference['prices']`, filtered by the file's locale (the same filtering pattern the price check uses for `expected_entries`).
|
||||
|
||||
4. **Matching** — for each detected name vs each expected name:
|
||||
- Normalise both: lowercase, strip punctuation, collapse whitespace.
|
||||
- Match if one is a substring of the other, OR token-set overlap ≥ 60% (where token-set overlap = |A ∩ B| / min(|A|, |B|)).
|
||||
- Helper: `_normalize_product_name(s)` and `_product_names_match(a, b)` — co-located with the check; not a shared util unless a second consumer appears.
|
||||
|
||||
5. **Scoring**:
|
||||
- Match found → 100, `passed`.
|
||||
- Detected but no match → 0, `failed`, with `detected vs expected` surfaced in the report card.
|
||||
- Skip cases → 100, `skipped`.
|
||||
|
||||
6. **Weight** — `25` in `standard_video` profile.
|
||||
|
||||
### Open verification step (call out in implementation)
|
||||
|
||||
The pricing-reference Excel ingest may store `product_name` only in source language (English) even for non-English locales. If so, the token-overlap match will false-fail on properly-localized assets. Before locking in the matching rule:
|
||||
|
||||
- Inspect a sample `PricingReference` row for a non-English locale (e.g. `de-AT`, `es-ES`) and confirm whether `product_name` is localised.
|
||||
- If **localised** → ship the matching rule as designed.
|
||||
- If **English-only** → either (a) skip the check for non-English locales (treats it as "we don't have ground truth"), or (b) add a small LLM-based translation step on the detected name before comparison. Decision deferred to implementation; the safe default is (a).
|
||||
|
||||
### Report card
|
||||
|
||||
- Status colour matches `score`.
|
||||
- Body lines:
|
||||
- `Detected: <names>` (always, when not skipped).
|
||||
- `Expected for <locale>: <names>` (top 5, with `(N of M shown)` if truncated).
|
||||
- `Matched: <detected> ≈ <expected>` on pass.
|
||||
|
||||
## Change 3 — `title_safe` check (NEW, advisory-only)
|
||||
|
||||
### Behaviour
|
||||
|
||||
1. **Platform inference from filename.** A lookup dict in `executor.py`:
|
||||
|
||||
| Tokens (case-insensitive substring match) | Platform family | Unsafe zones |
|---|---|---|
| `TK_Stories`, lone `TK`, `TT_*9x16*` | TikTok | top ~10%, bottom ~25% |
| `IG_Stories`, `Stories_*9x16*` | IG Stories | top ~12%, bottom ~18% |
| `IG_Reels`, `Reels_*9x16*` | IG Reels | bottom ~25%, right edge ~10% |
| `9x16` w/ no platform hint | Vertical SoMe (generic) | top ~10%, bottom ~20% |
| `1x1`, `4x5` (no vertical hint) | Feed | none |
| Anything else | Unknown | none |
|
||||
|
||||
Helper: `_infer_platform_zones(filename) -> {platform: str, zones: list[dict]} | None`.
|
||||
|
||||
2. **Skip path** — if platform yields no zones (Feed or Unknown), result is `skipped` with score 100, message "Format has no platform overlay zones — title-safe not applicable". No LLM call.
|
||||
|
||||
3. **LLM call** — one Gemini direct-video call with prompt that includes:
|
||||
- The platform name + textual description of unsafe zones (top X%, bottom Y%, etc.).
|
||||
- Instruction to identify ONLY frames where **price text or garment-name text** falls inside an unsafe zone. Ignore other text.
|
||||
- Return JSON `{issues: [{frame_timestamp, element: 'price'|'garment', zone, description}], advisory_only: true}`.
|
||||
|
||||
4. **Scoring** — score is **always 100**. Status is `passed` if `issues` is empty, else `warning`. Never `failed`.
|
||||
|
||||
5. **Weight** — `0` in `standard_video` profile. The overall-score aggregator already handles zero-weight checks: a check with weight 0 contributes `0 * score = 0` to `weighted_sum` and `0` to `total_weight`. **Implementation must verify** the aggregator doesn't divide by zero when *all* active checks have weight 0 (in practice impossible because other checks have non-zero weight, but the code should be defensive). Reusing the existing pattern: `if total_weight else 0`.
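
Steps 2 and 4 can be sketched as one small function. This is an illustrative outline under stated assumptions: `title_safe_result` is a hypothetical name, `llm_text` stands in for whatever raw text the Gemini call returns, and slicing between the first `{` and last `}` is just one way to tolerate stray text around the JSON payload.

```python
import json
from typing import Any, Dict, List, Optional


def title_safe_result(zones: Optional[List[dict]], llm_text: str = "") -> Dict[str, Any]:
    """Derive the advisory result (hypothetical helper, illustration only)."""
    # Score is always 100 and weight is 0, whatever the outcome.
    base = {"check_name": "title_safe", "score": 100.0, "weight": 0}

    # Skip path: Feed / Unknown formats have no zones, so no LLM call is needed.
    if not zones:
        return {**base, "status": "skipped", "issues": []}

    # Pull the JSON object out of the model output; unparseable output
    # is treated as "no issues" rather than as a failure.
    issues: List[dict] = []
    start, end = llm_text.find("{"), llm_text.rfind("}") + 1
    if start != -1 and end > start:
        try:
            issues = json.loads(llm_text[start:end]).get("issues") or []
        except json.JSONDecodeError:
            issues = []

    # Never 'failed': issues only downgrade the status to 'warning'.
    return {**base, "status": "warning" if issues else "passed", "issues": issues}
```

Treating unparseable output as `passed` keeps the check advisory, but it also means a malformed response is indistinguishable from a clean video, which is worth keeping in mind when testing against the known-bad asset.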

### Report card

- Status badge: yellow `warning` when issues exist, grey `passed`/`skipped` otherwise.
- Body: bulleted list of issues, each formatted `<timestamp> — <element> in <zone>: <description>`.
- A leading note: `Advisory — does not affect overall score.`
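
A possible rendering of that card body is sketched below. `title_safe_card_lines` is a hypothetical helper, and the issue dicts are assumed to carry the keys from the JSON shape in the Behaviour section; missing keys fall back to `?` so a partial LLM response still renders.

```python
from typing import Dict, List

DASH = "\u2014"  # the separator character used in the card format described above


def title_safe_card_lines(issues: List[Dict[str, str]]) -> List[str]:
    """Format the title_safe card body (hypothetical helper, illustration only)."""
    # The advisory note always leads the card.
    lines = [f"Advisory {DASH} does not affect overall score."]
    for issue in issues:
        # One bullet per issue: <timestamp> <dash> <element> in <zone>: <description>
        lines.append(
            f"- {issue.get('frame_timestamp', '?')} {DASH} "
            f"{issue.get('element', '?')} in {issue.get('zone', '?')}: "
            f"{issue.get('description', '')}"
        )
    return lines
```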

### Failure mode flagged

If the LLM is generous and never flags any placement, the check is decorative. Mitigation:
- Explicit zone description in the prompt (percentages, not vague terms).
- During implementation, test against at least one known-bad asset (price intentionally placed under the TikTok caption strip) to confirm the check engages.

## Profile changes

`modules/video_qc/profiles/profiles.yaml`, `standard_video` profile:

```yaml
profiles:
  standard_video:
    name: "Standard Video QC (BETA)"
    checks:
      - name: "video_filename_parse"
        weight: 15
      - name: "video_technical_check"
        weight: 40
      - name: "video_duration_check"
        weight: 20
      - name: "video_censorship_check"
        weight: 25
        llm_provider: "openai"
      # NEW
      - name: "garment_name"
        weight: 25
        enabled: true
        llm_provider: "google"
        llm_model: "gemini-2.5-flash"
        description: "Validate on-screen garment name against pricing reference"
      - name: "title_safe"
        weight: 0
        enabled: true
        llm_provider: "google"
        llm_model: "gemini-2.5-flash"
        description: "Advisory — flag price/garment text inside platform UI overlay zones"
```

(Note: the existing YAML doesn't currently list `visual_quality` or `price_currency` either — the executor runs them unconditionally. The new checks follow the same convention; the YAML is descriptive metadata, not the source of truth for what runs. **Implementation must align this** — either register all checks in YAML and read from it, or document that the YAML is purely informational. Today it's the latter; we keep that pattern.)

## Final weights

| Check | Weight | Notes |
|---|---|---|
| `visual_quality` | 50 | Unchanged. |
| `censorship` | 50 | Unchanged. CEN markets only. |
| `price_currency` | 30 | Unchanged. |
| `garment_name` | 25 | New. |
| `title_safe` | 0 | New. Advisory-only — does not contribute to overall score. |
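
To make the effect of these weights concrete, here is a small sketch of a weighted mean over non-skipped checks. It is not the executor's aggregator: `FINAL_WEIGHTS` and `overall_score` are hypothetical names, the weights are copied from the table above, and the guard follows the `if total_weight else 0` pattern mentioned under Change 3.

```python
from typing import Dict, List

# Weights copied from the "Final weights" table (illustration only).
FINAL_WEIGHTS: Dict[str, int] = {
    "visual_quality": 50,
    "censorship": 50,
    "price_currency": 30,
    "garment_name": 25,
    "title_safe": 0,
}


def overall_score(results: List[dict]) -> float:
    """Weighted mean of non-skipped checks (hypothetical helper)."""
    # Skipped checks drop out of both the numerator and the denominator.
    active = [r for r in results if r.get("status") != "skipped"]
    total_weight = sum(FINAL_WEIGHTS.get(r["check_name"], 0) for r in active)
    weighted_sum = sum(
        r["score"] * FINAL_WEIGHTS.get(r["check_name"], 0) for r in active
    )
    # Defensive guard: avoids ZeroDivisionError if every active check has weight 0.
    return weighted_sum / total_weight if total_weight else 0.0
```

For a non-CEN file where `visual_quality` scores 90, `price_currency` scores 100, `garment_name` fails with 0 and `title_safe` warns, the result is `(50*90 + 30*100 + 25*0) / 105`, roughly 71.4; the `title_safe` warning changes nothing.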

## Testing approach

Manual exercise via the web launcher; Video QC has no automated test infrastructure today.

**Smoke-test asset:** `testing_15may/video_tuning/6898354_1013A_SPRING_W_10A_TT_9x16_TK_Stories_Vogue_ES-es.mp4` (ES-es, TikTok Stories, 9x16).

Expected behaviour after implementation:
- `visual_quality` continues to flag the English subtitle leak at 0:17–0:19 (no regression on existing logic).
- `garment_name` either matches against the ES-es product_name or skips silently if no garment text appears.
- `title_safe` engages because filename is `TK_Stories` + `9x16`.
- `price_currency` shows the new "Matched: …" / "Expected one of: …" lines in the report card.

Additional manual checks:
- Run with a pricing reference that has known `product_name`s for ES-es; confirm match path.
- Run with a pricing reference whose `product_name` is empty/null; confirm skip path.
- Run on a `1x1` Feed deliverable; confirm `title_safe` skips with the "no overlay zones" message.
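
Although testing is manual today, the filename-driven part of `title_safe` is deterministic and could be pinned down with a few assertions. The sketch below uses a simplified stand-in, `infer_platform`, that only returns the platform key and mirrors the token table from Change 3; apart from the smoke-test asset, the filenames are made up for illustration.

```python
from typing import Optional


def infer_platform(filename: str) -> Optional[str]:
    """Simplified stand-in for the platform inference (illustration only)."""
    upper = (filename or "").upper()
    # Most specific tokens first, generic 9x16 fallback last.
    if "TK_STORIES" in upper or "_TK_" in upper or "TT_9X16" in upper:
        return "tiktok"
    if "IG_STORIES" in upper or "STORIES_9X16" in upper:
        return "ig_stories"
    if "IG_REELS" in upper or "REELS_9X16" in upper:
        return "ig_reels"
    if "9X16" in upper:
        return "vertical_generic"
    # Feed formats (1x1, 4x5) and unknown names have no overlay zones.
    return None


# Smoke-test asset from this plan: TK_Stories + 9x16, so the check should engage.
assert infer_platform(
    "6898354_1013A_SPRING_W_10A_TT_9x16_TK_Stories_Vogue_ES-es.mp4"
) == "tiktok"
# Hypothetical filenames covering the remaining rows of the table.
assert infer_platform("demo_IG_Reels_9x16_DE-de.mp4") == "ig_reels"
assert infer_platform("demo_9x16_FR-fr.mp4") == "vertical_generic"
assert infer_platform("demo_1x1_Feed_ES-es.mp4") is None
```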

## Open questions / verification steps for implementation

1. **`product_name` localisation** — verify whether the pricing-reference Excel ingest stores localised or source-language product names. Decide skip vs translate-then-match.
2. **`title_safe` LLM sensitivity** — verify on a deliberately-bad asset that the check actually flags issues.
3. **YAML profile vs executor reality** — confirm the executor is the source of truth for which checks run; document or align.

These are flagged for the implementation plan, not blockers on the design.
|
||||
|
|
@@ -36,7 +36,8 @@ class BatchVideoQCExecutor:
|
|||
pricing_reference_id: int = None,
|
||||
batch_id: str = None,
|
||||
batch_size: int = DEFAULT_BATCH_SIZE,
|
||||
app=None
|
||||
app=None,
|
||||
srt_paths: List[str] = None,
|
||||
):
|
||||
self.session_id = session_id
|
||||
self.file_paths = file_paths[:MAX_FILES]
|
||||
|
|
@@ -49,6 +50,10 @@ class BatchVideoQCExecutor:
|
|||
self.batch_id = batch_id
|
||||
self.batch_size = batch_size
|
||||
self.app = app
|
||||
self.srt_paths = list(srt_paths or [])
|
||||
self.pair_map: Dict[str, str] = {}
|
||||
self.unpaired_srts: List[str] = []
|
||||
self.unpaired_videos: List[str] = []
|
||||
self.progress = UnifiedProgressTracker(session_id)
|
||||
self.results = []
|
||||
|
||||
|
|
@@ -58,6 +63,20 @@ class BatchVideoQCExecutor:
|
|||
self.progress.fail("No files to process")
|
||||
return {'error': 'No files to process'}
|
||||
|
||||
# Pre-flight: pair SRTs to videos by filename
|
||||
if self.srt_paths:
|
||||
from modules.video_qc.utils.srt_pairing import pair_batch
|
||||
pair_map, unpaired_srts, unpaired_videos = pair_batch(
|
||||
self.file_paths, self.srt_paths
|
||||
)
|
||||
self.pair_map = pair_map
|
||||
self.unpaired_srts = unpaired_srts
|
||||
self.unpaired_videos = unpaired_videos
|
||||
logger.info(
|
||||
f"SRT pairing: {sum(1 for v in pair_map.values() if v)}/{len(pair_map)} "
|
||||
f"videos paired; {len(unpaired_srts)} SRTs unpaired."
|
||||
)
|
||||
|
||||
try:
|
||||
logger.info(
|
||||
f"Starting batch Video QC for {total_files} files "
|
||||
|
|
@@ -156,7 +175,8 @@ class BatchVideoQCExecutor:
|
|||
user=self.user,
|
||||
campaign_id=self.campaign_id,
|
||||
batch_id=self.batch_id,
|
||||
pricing_reference_id=self.pricing_reference_id
|
||||
pricing_reference_id=self.pricing_reference_id,
|
||||
srt_path=self.pair_map.get(file_path),
|
||||
)
|
||||
|
||||
result = executor.execute()
|
||||
|
|
|
|||
|
|
@@ -44,13 +44,123 @@ def _language_display(lang_code: str) -> str:
|
|||
return _LANG_NAMES.get(lang_code.lower(), f"language code '{lang_code}'")
|
||||
|
||||
|
||||
# Platform UI overlay zones for `title_safe` check. Each entry describes the
# regions of the frame that the platform's UI obscures (profile pills, captions,
# action icons). Used to flag when price or garment-name text lands inside.
# Percentages are conservative — tuned against IG/TikTok screenshots circa 2026.
_PLATFORM_ZONES = {
    'tiktok': {
        'description': "TikTok feed: top ~10% (profile/handle bar), bottom ~25% "
                       "(caption + UI action rail).",
        'zones': [
            {'edge': 'top', 'percent': 10},
            {'edge': 'bottom', 'percent': 25},
        ],
    },
    'ig_stories': {
        'description': "Instagram Stories: top ~12% (profile pill + reactions), "
                       "bottom ~18% (swipe-up / reply bar).",
        'zones': [
            {'edge': 'top', 'percent': 12},
            {'edge': 'bottom', 'percent': 18},
        ],
    },
    'ig_reels': {
        'description': "Instagram Reels: bottom ~25% (caption + icons), "
                       "right edge ~10% (action rail).",
        'zones': [
            {'edge': 'bottom', 'percent': 25},
            {'edge': 'right', 'percent': 10},
        ],
    },
    'vertical_generic': {
        'description': "Generic vertical (9x16) social: top ~10%, bottom ~20%.",
        'zones': [
            {'edge': 'top', 'percent': 10},
            {'edge': 'bottom', 'percent': 20},
        ],
    },
}


def _infer_platform_zones(filename: str) -> dict | None:
    """Infer platform UI overlay zones from filename tokens.

    Returns a dict {'platform': str, 'description': str, 'zones': [...]}
    or None when the format has no known overlay zones (feed formats like
    1x1 / 4x5, or unrecognised).

    The returned dict is freshly built (deep-copied zones list) so callers
    can mutate it without corrupting the module-level _PLATFORM_ZONES.
    """
    if not filename:
        return None
    upper = filename.upper()

    def _build(key: str) -> dict:
        base = _PLATFORM_ZONES[key]
        return {
            'platform': key,
            'description': base['description'],
            'zones': [dict(z) for z in base['zones']],
        }

    # Most specific first: explicit platform + format combos.
    if 'TK_STORIES' in upper or '_TK_' in upper or 'TT_9X16' in upper:
        return _build('tiktok')
    if 'IG_STORIES' in upper or 'STORIES_9X16' in upper:
        return _build('ig_stories')
    if 'IG_REELS' in upper or 'REELS_9X16' in upper:
        return _build('ig_reels')

    # Fallback: any 9x16 with no platform hint -> generic vertical.
    if '9X16' in upper:
        return _build('vertical_generic')

    # Feed formats (1x1, 4x5) and anything else -> no overlay zones.
    return None
|
||||
|
||||
|
||||
import re as _re


def _normalize_product_name(s: str) -> str:
    """Lowercase, strip non-alphanumeric (except spaces), collapse whitespace."""
    if not s:
        return ''
    s = s.lower()
    s = _re.sub(r"[^a-z0-9\s]", " ", s)
    s = _re.sub(r"\s+", " ", s).strip()
    return s


def _product_names_match(a: str, b: str) -> bool:
    """True when two product names are 'close enough' after normalisation.

    Match rule (any of):
      • One normalised string is a substring of the other (non-empty).
      • Token-set overlap |A ∩ B| / min(|A|, |B|) >= 0.6.
    Empty strings never match.
    """
    na, nb = _normalize_product_name(a), _normalize_product_name(b)
    if not na or not nb:
        return False
    if na in nb or nb in na:
        return True
    ta, tb = set(na.split()), set(nb.split())
    if not ta or not tb:
        return False
    overlap = len(ta & tb) / min(len(ta), len(tb))
    return overlap >= 0.6
|
||||
|
||||
|
||||
class VideoQCExecutor:
|
||||
"""Execute video QC checks with frame extraction and AI analysis."""
|
||||
|
||||
def __init__(self, session_id: str, file_path: str, job_number: str = None,
|
||||
llm_provider: str = 'google', llm_model: str = 'gemini-2.5-flash',
|
||||
user: str = None, campaign_id: str = None, batch_id: str = None,
|
||||
pricing_reference_id: int = None):
|
||||
pricing_reference_id: int = None, srt_path: str = None):
|
||||
self.session_id = session_id
|
||||
self.file_path = file_path
|
||||
self.job_number = job_number
|
||||
|
|
@@ -60,6 +170,7 @@ class VideoQCExecutor:
|
|||
self.campaign_id = campaign_id
|
||||
self.batch_id = batch_id
|
||||
self.pricing_reference_id = pricing_reference_id
|
||||
self.srt_path = srt_path
|
||||
self.progress = UnifiedProgressTracker(session_id)
|
||||
self.results = {}
|
||||
self.campaign_context = {}
|
||||
|
|
@@ -153,10 +264,55 @@ class VideoQCExecutor:
|
|||
self.results['price_currency'] = price_result
|
||||
logger.info(f"Price/currency: {price_result['status']} ({price_result['score']})")
|
||||
|
||||
# Step 5b: Garment-name check
|
||||
self.progress.update(82, "Running garment-name check...")
|
||||
if self.pricing_reference:
|
||||
garment_result = self._run_garment_name_check()
|
||||
else:
|
||||
garment_result = {
|
||||
'check_name': 'garment_name', 'score': 100.0, 'status': 'skipped',
|
||||
'message': 'Skipped — no pricing reference attached to this run',
|
||||
'details': {'skipped': True, 'reason': 'no_pricing_reference'},
|
||||
'weight': 25,
|
||||
}
|
||||
self.results['garment_name'] = garment_result
|
||||
logger.info(f"Garment name: {garment_result['status']} ({garment_result['score']})")
|
||||
|
||||
# Step 5c: Title-safe advisory
|
||||
self.progress.update(85, "Running title-safe placement check...")
|
||||
title_safe_result = self._run_title_safe_check()
|
||||
self.results['title_safe'] = title_safe_result
|
||||
logger.info(f"Title safe: {title_safe_result['status']} ({title_safe_result['score']})")
|
||||
|
||||
# Step 5d: SRT checks (only if a paired SRT was provided)
|
||||
srt_results = {}
|
||||
for name, weight in (('srt_structure', 15), ('srt_timing', 10), ('srt_language', 20)):
|
||||
srt_results[name] = {
|
||||
'check_name': name, 'score': 100.0, 'status': 'skipped',
|
||||
'message': 'No SRT paired with this video — SRT checks skipped',
|
||||
'details': {'skipped': True, 'reason': 'no_srt_paired'},
|
||||
'weight': weight,
|
||||
}
|
||||
if self.srt_path:
|
||||
self.progress.update(89, "Running SRT structure check...")
|
||||
srt_results['srt_structure'] = self._run_srt_structure_check()
|
||||
self.progress.update(91, "Running SRT timing check...")
|
||||
srt_results['srt_timing'] = self._run_srt_timing_check(duration)
|
||||
self.progress.update(93, "Running SRT language check...")
|
||||
srt_results['srt_language'] = self._run_srt_language_check()
|
||||
for k, v in srt_results.items():
|
||||
self.results[k] = v
|
||||
logger.info(f"{k}: {v['status']} ({v['score']})")
|
||||
|
||||
# Step 6: Calculate overall score — weighted mean of non-skipped checks
|
||||
self.progress.update(87, "Calculating overall score...")
|
||||
active_checks = [r for r in (quality_result, censorship_result, price_result)
|
||||
active_checks = [r for r in (quality_result, censorship_result, price_result,
|
||||
garment_result,
|
||||
srt_results['srt_structure'],
|
||||
srt_results['srt_timing'],
|
||||
srt_results['srt_language'])
|
||||
if r.get('status') != 'skipped']
|
||||
# title_safe_result intentionally excluded — advisory only (weight=0)
|
||||
if active_checks:
|
||||
total_weight = sum(r.get('weight', 1) for r in active_checks)
|
||||
weighted_sum = sum(r['score'] * r.get('weight', 1) for r in active_checks)
|
||||
|
|
@@ -780,10 +936,33 @@ Respond in JSON:
|
|||
'llm_model': self.llm_model
|
||||
}
|
||||
|
||||
if status == 'passed':
|
||||
matched_price = matched[0] if matched else None
|
||||
matched_product = matched[1].get('product_name') if matched else None
|
||||
if status == 'passed' and matched_price is not None:
|
||||
product_suffix = f" — {matched_product}" if matched_product else ""
|
||||
message = (
|
||||
f"Matched: {currency}{matched_price}{product_suffix} "
|
||||
f"(locale {language})"
|
||||
)
|
||||
elif status == 'passed':
|
||||
message = f'Price/currency passed — {currency} correct for {language}'
|
||||
else:
|
||||
message = f'Price/currency issues: {", ".join(issues)}'
|
||||
exp_list = details['price_match']['expected_prices']
|
||||
exp_total = len(expected_entries)
|
||||
expected_blurb = ''
|
||||
if exp_list:
|
||||
shown = exp_list[:5]
|
||||
expected_blurb = (
|
||||
f' Expected one of: {", ".join(shown)}'
|
||||
+ (f' ({len(shown)} of {exp_total} shown)' if exp_total > len(shown) else '')
|
||||
)
|
||||
detected_blurb = ''
|
||||
if detected_strings:
|
||||
detected_blurb = f' Detected: {", ".join(detected_strings)}.'
|
||||
message = (
|
||||
f'Price/currency issues: {", ".join(issues)}.'
|
||||
+ detected_blurb + expected_blurb
|
||||
)
|
||||
|
||||
return {
|
||||
'check_name': 'price_currency',
|
||||
|
|
@@ -805,6 +984,601 @@ Respond in JSON:
|
|||
'weight': weight
|
||||
}
|
||||
|
||||
    def _run_garment_name_check(self) -> Dict[str, Any]:
        """Detect on-screen garment/product names and match against the
        pricing reference's product_name for the file's locale."""
        weight = 25
        try:
            language, country_code = self._extract_locale_from_filename()
            if not language:
                return {
                    'check_name': 'garment_name', 'score': 100.0, 'status': 'skipped',
                    'message': 'Skipped — could not extract locale from filename',
                    'details': {'skipped': True, 'reason': 'no_locale'},
                    'weight': weight,
                }
            if language.split('-')[0].upper() in ('GEN', 'CEN'):
                return {
                    'check_name': 'garment_name', 'score': 100.0, 'status': 'skipped',
                    'message': f'Skipped for {language.upper()} (generic/censored market)',
                    'details': {'skipped': True, 'reason': 'gen_cen_file'},
                    'weight': weight,
                }

            pricing_ref = self.pricing_reference or {}
            prices_list = pricing_ref.get('prices') or []
            expected_entries = [
                p for p in prices_list
                if p.get('language') == language or p.get('country') == country_code
            ]
            expected_names = [
                p.get('product_name') for p in expected_entries
                if p.get('product_name')
            ]

            if not expected_names:
                return {
                    'check_name': 'garment_name', 'score': 100.0, 'status': 'skipped',
                    'message': 'Skipped — no product_name in pricing reference for this locale',
                    'details': {'skipped': True, 'reason': 'no_expected_names',
                                'language': language, 'country_code': country_code},
                    'weight': weight,
                }

            # Step 1: detect garment text via LLM
            prompt = (
                "Identify any garment / product names visible as text overlays "
                "in this video (e.g. 'OVERSIZED COTTON SHIRT', 'WOOL BLAZER'). "
                "Ignore prices, CTAs, dates, logos, model names, and campaign "
                "headlines. Return ONLY valid JSON (no markdown fences):\n"
                "{detected_names: [string, ...], any_text_overlay: boolean}"
            )
            usage_context = {
                'module': 'video_qc', 'check_name': 'garment_name',
                'user': self.user, 'session_id': self.session_id,
            }
            if self._use_direct_video:
                response = LLMConfig.call_video_api(
                    prompt=prompt, video_path=self.file_path,
                    provider=self.llm_provider, model=self.llm_model,
                    usage_context=usage_context,
                )
            else:
                response = LLMConfig.call_vision_api(
                    prompt=prompt, image_asset=None,
                    provider=self.llm_provider, model=self.llm_model,
                    usage_context=usage_context,
                )
            text = response.get('text', '')
            start, end = text.find('{'), text.rfind('}') + 1
            detected = []
            if start != -1 and end > start:
                try:
                    detected = json.loads(text[start:end]).get('detected_names') or []
                except json.JSONDecodeError:
                    detected = []

            if not detected:
                return {
                    'check_name': 'garment_name', 'score': 100.0, 'status': 'skipped',
                    'message': 'No garment-name text detected in video — skipping validation',
                    'details': {'skipped': True, 'reason': 'no_detection',
                                'expected_names_count': len(expected_names)},
                    'weight': weight,
                }

            # Step 2: deterministic match
            matched_pair = None
            for d in detected:
                for e in expected_names:
                    if _product_names_match(d, e):
                        matched_pair = (d, e)
                        break
                if matched_pair:
                    break

            details = {
                'language': language, 'country_code': country_code,
                'detected_names': detected,
                'expected_names_sample': expected_names[:5],
                'expected_names_total': len(expected_names),
                'matched': matched_pair is not None,
                'matched_detected': matched_pair[0] if matched_pair else None,
                'matched_expected': matched_pair[1] if matched_pair else None,
                'llm_provider': self.llm_provider, 'llm_model': self.llm_model,
                'tokens_used': response.get('tokens_used'),
            }

            if matched_pair:
                return {
                    'check_name': 'garment_name', 'score': 100.0, 'status': 'passed',
                    'message': (
                        f'Matched: "{matched_pair[0]}" ≈ "{matched_pair[1]}" '
                        f'(locale {language})'
                    ),
                    'details': details, 'weight': weight,
                }
            return {
                'check_name': 'garment_name', 'score': 0.0, 'status': 'failed',
                'message': (
                    f'No detected name matched any of {len(expected_names)} '
                    f'expected product names for {language}. '
                    f'Detected: {", ".join(detected)}.'
                ),
                'details': details,
                'recommendations': [
                    f"Verify the on-screen product name is correct for {language}.",
                    f"Expected (sample): {', '.join(expected_names[:3])}.",
                ],
                'weight': weight,
            }
        except Exception as e:
            logger.error(f"garment_name check error: {e}", exc_info=True)
            return {
                'check_name': 'garment_name', 'score': 0, 'status': 'error',
                'message': f'Error: {str(e)}',
                'details': {'error': str(e)}, 'weight': weight,
            }
|
||||
|
||||
    def _run_title_safe_check(self) -> Dict[str, Any]:
        """Advisory check — flag (never fail) when price or garment-name text
        falls inside a platform UI overlay zone. Score is always 100; status
        is 'warning' on detected issues, otherwise 'passed' / 'skipped'."""
        weight = 0  # advisory — does not contribute to overall score
        try:
            platform_info = _infer_platform_zones(os.path.basename(self.file_path))
            if not platform_info:
                return {
                    'check_name': 'title_safe', 'score': 100.0, 'status': 'skipped',
                    'message': 'Format has no known platform overlay zones — title-safe not applicable',
                    'details': {'skipped': True, 'reason': 'no_platform_zones'},
                    'weight': weight,
                }

            zones_text = "; ".join(
                f"{z['edge']} ~{z['percent']}%" for z in platform_info['zones']
            )
            prompt = (
                "You are reviewing this video for advisory title-safe issues. "
                f"Platform: {platform_info['platform']}. "
                f"{platform_info['description']}\n\n"
                "Identify frames where the PRICE text or PRODUCT/GARMENT-NAME "
                "text falls INSIDE one of these unsafe zones: "
                f"{zones_text}. Ignore other text. Return ONLY valid JSON "
                "(no markdown fences):\n"
                "{\"issues\": [{\"frame_timestamp\": \"0:12\", "
                "\"element\": \"price\" | \"garment\", "
                "\"zone\": \"top\" | \"bottom\" | \"right\", "
                "\"description\": \"...\"}], \"advisory_only\": true}"
            )
            usage_context = {
                'module': 'video_qc', 'check_name': 'title_safe',
                'user': self.user, 'session_id': self.session_id,
            }
            response = LLMConfig.call_video_api(
                prompt=prompt, video_path=self.file_path,
                provider=self.llm_provider, model=self.llm_model,
                usage_context=usage_context,
            )
            text = response.get('text', '')
            start, end = text.find('{'), text.rfind('}') + 1
            issues = []
            if start != -1 and end > start:
                try:
                    issues = json.loads(text[start:end]).get('issues') or []
                except json.JSONDecodeError:
                    issues = []

            details = {
                'platform': platform_info['platform'],
                'platform_description': platform_info['description'],
                'zones': platform_info['zones'],
                'issues': issues,
                'advisory_only': True,
                'llm_provider': self.llm_provider, 'llm_model': self.llm_model,
                'tokens_used': response.get('tokens_used'),
            }

            if not issues:
                return {
                    'check_name': 'title_safe', 'score': 100.0, 'status': 'passed',
                    'message': (
                        f'Advisory check — no price/garment placement issues on '
                        f'{platform_info["platform"]}.'
                    ),
                    'details': details, 'weight': weight,
                }

            issue_blurb = "; ".join(
                f"{i.get('frame_timestamp','?')} {i.get('element','?')} "
                f"in {i.get('zone','?')}"
                for i in issues[:5]
            )
            return {
                'check_name': 'title_safe', 'score': 100.0, 'status': 'warning',
                'message': (
                    f'Advisory — {len(issues)} placement issue(s) on '
                    f'{platform_info["platform"]}: {issue_blurb}. '
                    f'Does not affect overall score.'
                ),
                'details': details,
                'recommendations': [
                    f"Review price/garment positioning for {platform_info['platform']} "
                    f"unsafe zones."
                ],
                'weight': weight,
            }
        except Exception as e:
            logger.error(f"title_safe check error: {e}", exc_info=True)
            return {
                'check_name': 'title_safe', 'score': 100.0, 'status': 'error',
                'message': f'Error: {str(e)} — does not affect overall score',
                'details': {'error': str(e), 'advisory_only': True},
                'weight': weight,
            }
|
||||
|
||||
    def _read_srt_text(self) -> tuple[str, dict]:
        """Read the paired SRT file. Returns (text, encoding_info).

        encoding_info: {encoding: str, fallback_used: bool, replacement_chars: int}
        Raises OSError if the file can't be read.
        """
        with open(self.srt_path, 'rb') as f:
            raw = f.read()
        info = {'encoding': 'utf-8', 'fallback_used': False, 'replacement_chars': 0}
        try:
            text = raw.decode('utf-8')
        except UnicodeDecodeError:
            try:
                import chardet
                detected = chardet.detect(raw)
                enc = detected.get('encoding') or 'latin-1'
                text = raw.decode(enc, errors='replace')
                info['encoding'] = enc
                info['fallback_used'] = True
            except ImportError:
                text = raw.decode('latin-1', errors='replace')
                info['encoding'] = 'latin-1'
                info['fallback_used'] = True
        info['replacement_chars'] = text.count('\ufffd')
        return text, info
|
||||
|
||||
    def _run_srt_structure_check(self) -> Dict[str, Any]:
        """Validate the paired SRT parses, is well-encoded, and has sane cues."""
        import srt as srt_lib
        weight = 15
        try:
            text, enc_info = self._read_srt_text()
        except OSError as e:
            return {
                'check_name': 'srt_structure', 'score': 0.0, 'status': 'failed',
                'message': f'Could not read SRT file: {e}',
                'details': {'error': str(e)}, 'weight': weight,
            }

        warnings: list[str] = []
        if enc_info['fallback_used']:
            warnings.append(
                f"SRT decoded as {enc_info['encoding']} (not UTF-8). "
                "Upstream tool may be emitting legacy encoding."
            )

        if enc_info['replacement_chars'] > 0:
            return {
                'check_name': 'srt_structure', 'score': 0.0, 'status': 'failed',
                'message': (
                    f"SRT contains {enc_info['replacement_chars']} Unicode "
                    f"replacement char(s) — encoding loss."
                ),
                'details': {'encoding': enc_info},
                'weight': weight,
            }

        try:
            cues = list(srt_lib.parse(text))
        except (srt_lib.SRTParseError, ValueError) as e:
            return {
                'check_name': 'srt_structure', 'score': 0.0, 'status': 'failed',
                'message': f'SRT parse error: {e}',
                'details': {'error': str(e), 'encoding': enc_info},
                'weight': weight,
            }

        if not cues:
            return {
                'check_name': 'srt_structure', 'score': 0.0, 'status': 'failed',
                'message': 'SRT contains no cues',
                'details': {'encoding': enc_info}, 'weight': weight,
            }

        indices = [c.index for c in cues]
        expected = list(range(1, len(cues) + 1))
        if indices != expected:
            if any(i is None for i in indices):
                warnings.append("One or more cues are missing index numbers (player-tolerant).")
            else:
                warnings.append(f"Cue indices not contiguous 1..{len(cues)}: {indices[:5]}...")

        empty_cues = [c.index for c in cues if not (c.content or '').strip()]
        if empty_cues:
            warnings.append(f"{len(empty_cues)} cue(s) have empty text (indices: {empty_cues[:5]})")

        score = 100.0 - 10.0 * len(warnings)
        score = max(score, 70.0)
        status = 'passed' if score >= 90 else 'warning'
        details = {
            'encoding': enc_info,
            'cue_count': len(cues),
            'warnings': warnings,
            'srt_filename': os.path.basename(self.srt_path),
        }
        message = (
            f'{len(cues)} cues parsed, encoding {enc_info["encoding"]}'
            + (f', {len(warnings)} warning(s)' if warnings else '')
        )
        return {
            'check_name': 'srt_structure',
            'score': score, 'status': status, 'message': message,
            'details': details, 'weight': weight,
        }
|
||||
|
||||
    def _run_srt_timing_check(self, video_duration: float) -> Dict[str, Any]:
        """Validate cue timings against video duration + broadcast norms."""
        import srt as srt_lib
        weight = 10
        try:
            text, _ = self._read_srt_text()
            cues = list(srt_lib.parse(text))
        except Exception as e:
            return {
                'check_name': 'srt_timing', 'score': 0.0, 'status': 'failed',
                'message': f'Could not parse SRT for timing: {e}',
                'details': {'error': str(e)}, 'weight': weight,
            }

        if not cues:
            return {
                'check_name': 'srt_timing', 'score': 0.0, 'status': 'failed',
                'message': 'SRT has no cues',
                'details': {}, 'weight': weight,
            }

        failures: list[str] = []
        warnings: list[str] = []

        # Rule 1: start < end, both >= 0
        for c in cues:
            s = c.start.total_seconds()
            e = c.end.total_seconds()
            if s < 0 or e < 0:
                failures.append(f"Cue {c.index}: negative timestamp ({s}s -> {e}s)")
            elif s >= e:
                failures.append(f"Cue {c.index}: start >= end ({s}s -> {e}s)")

        # Rule 2: overlaps
        overlaps: list[str] = []
        for prev, nxt in zip(cues, cues[1:]):
            if prev.end > nxt.start:
                overlaps.append(
                    f"Cue {prev.index} ends at {prev.end.total_seconds():.2f}s, "
                    f"cue {nxt.index} starts at {nxt.start.total_seconds():.2f}s"
                )
        if overlaps:
            if len(overlaps) >= 3:
                failures.append(f"{len(overlaps)} overlapping cue pairs (first 3): " + "; ".join(overlaps[:3]))
            else:
                warnings.append(f"{len(overlaps)} overlapping cue pair(s): " + "; ".join(overlaps))

        # Rule 3: last cue end <= video_duration + 0.5s
        if video_duration:
            last_end = cues[-1].end.total_seconds()
            if last_end > video_duration + 0.5:
                failures.append(
                    f"Last cue ends at {last_end:.2f}s but video is only "
                    f"{video_duration:.2f}s long"
                )

        # Rules 4-7: warnings only (broadcast norms)
        reading_speed_outliers: list[str] = []
        line_length_outliers: list[str] = []
        line_count_outliers: list[str] = []
        duration_outliers: list[str] = []
        for c in cues:
            content = (c.content or '').strip()
            if not content:
                continue
            duration = (c.end - c.start).total_seconds()
            chars = len(content.replace('\n', ' '))
            cps = chars / duration if duration > 0 else 0
            if cps < 5 or cps > 25:
                reading_speed_outliers.append(f"cue {c.index}: {cps:.1f} cps")
            lines = content.split('\n')
            longest = max((len(line) for line in lines), default=0)
            if longest > 42:
                line_length_outliers.append(f"cue {c.index}: {longest} chars")
            if len(lines) > 2:
                line_count_outliers.append(f"cue {c.index}: {len(lines)} lines")
            if duration < 0.7 or duration > 7.0:
                duration_outliers.append(f"cue {c.index}: {duration:.2f}s")

        for outliers, label in [
            (reading_speed_outliers, "Reading speed outside 5-25 cps"),
            (line_length_outliers, "Line length > 42 chars"),
            (line_count_outliers, "More than 2 lines per cue"),
            (duration_outliers, "Cue duration outside 0.7-7.0s"),
        ]:
            if outliers:
                shown = outliers[:5]
                more = f" (+{len(outliers)-5} more)" if len(outliers) > 5 else ""
                warnings.append(f"{label}: " + ", ".join(shown) + more)

        warning_loss = min(5.0 * len(warnings), 50.0)
        score = 100.0 - 30.0 * len(failures) - warning_loss
        score = max(score, 0.0)
        if failures:
            status = 'failed'
        elif score >= 90:
            status = 'passed'
        elif score >= 70:
            status = 'warning'
        else:
            status = 'failed'

        return {
            'check_name': 'srt_timing', 'score': score, 'status': status,
            'message': (
                f'{len(cues)} cues; {len(failures)} failure(s), {len(warnings)} warning(s)'
            ),
            'details': {
                'cue_count': len(cues),
                'video_duration': video_duration,
                'failures': failures,
                'warnings': warnings,
                'srt_filename': os.path.basename(self.srt_path),
            },
            'weight': weight,
        }
|
||||
|
||||
+    def _run_srt_language_check(self) -> Dict[str, Any]:
+        """Detect SRT language and compare to the locale-derived expected language."""
+        import srt as srt_lib
+        weight = 20
+        try:
+            text, _ = self._read_srt_text()
+            cues = list(srt_lib.parse(text))
+        except Exception as e:
+            return {
+                'check_name': 'srt_language', 'score': 0.0, 'status': 'failed',
+                'message': f'Could not parse SRT for language check: {e}',
+                'details': {'error': str(e)}, 'weight': weight,
+            }
+        if not cues:
+            return {
+                'check_name': 'srt_language', 'score': 0.0, 'status': 'failed',
+                'message': 'SRT has no cues',
+                'details': {}, 'weight': weight,
+            }
+
+        lang_code, market = self._extract_locale_from_filename()
+        expected_lang_code = (lang_code.split('-')[0] if lang_code else '').lower()
+        expected_lang_display = _language_display(expected_lang_code)
+        if not expected_lang_code:
+            return {
+                'check_name': 'srt_language', 'score': 100.0, 'status': 'skipped',
+                'message': 'Skipped — could not extract expected language from video filename',
+                'details': {'skipped': True, 'reason': 'no_expected_language'},
+                'weight': weight,
+            }
+
+        # Sample cues: first 5, last 5, plus evenly-distributed middle; cap 15 / 1500 chars
+        n = len(cues)
+        sample_indices = set()
+        for i in range(min(5, n)):
+            sample_indices.add(i)
+        for i in range(min(5, n)):
+            sample_indices.add(max(0, n - 1 - i))
+        if n > 10:
+            step = max(1, n // 6)
+            for k in range(step, n - step, step):
+                sample_indices.add(k)
+                if len(sample_indices) >= 15:
+                    break
+        sample_texts = []
+        budget = 1500
+        for i in sorted(sample_indices):
+            t = (cues[i].content or '').strip().replace('\n', ' ')
+            if not t:
+                continue
+            if budget - len(t) < 0:
+                break
+            sample_texts.append(t)
+            budget -= len(t)
+        sample_blob = "\n---\n".join(sample_texts)
+
+        prompt = (
+            "What language is this subtitle text written in? Sample cues "
+            "are separated by '---'. Return ONLY valid JSON (no markdown):\n"
+            "{\"detected_language\": \"German\" or similar full English name,\n"
+            " \"iso_code\": \"de\" or similar ISO-639-1 code,\n"
+            " \"confidence\": 0.0-1.0,\n"
+            " \"mixed_language\": boolean}\n\n"
+            "Be strict — proper nouns and brand names don't count as language indicators.\n\n"
+            f"SAMPLE:\n{sample_blob}"
+        )
+
+        # TODO: when LLMConfig grows a `call_text_api` helper, route through it
+        # for unified usage logging + retry. For v1 we call genai directly.
+        try:
+            LLMConfig.validate_configuration('google')
+            import google.generativeai as genai
+            gen_model = genai.GenerativeModel(self.llm_model or 'gemini-2.5-flash')
+            gen_response = gen_model.generate_content([prompt])
+            rtext = gen_response.text or ''
+            um = getattr(gen_response, 'usage_metadata', None)
+            tokens_used = getattr(um, 'total_token_count', None) if um else None
+        except Exception as e:
+            logger.error(f"srt_language LLM call failed: {e}", exc_info=True)
+            return {
+                'check_name': 'srt_language', 'score': 0.0, 'status': 'error',
+                'message': f'LLM error: {e}',
+                'details': {'error': str(e), 'expected_iso': expected_lang_code,
+                            'expected_language': expected_lang_display}, 'weight': weight,
+            }
+
+        start, end = rtext.find('{'), rtext.rfind('}') + 1
+        parsed = {}
+        if start != -1 and end > start:
+            try:
+                parsed = json.loads(rtext[start:end])
+            except json.JSONDecodeError:
+                parsed = {}
+
+        detected_iso = (parsed.get('iso_code') or '').lower()
+        detected_lang = parsed.get('detected_language') or '(unknown)'
+        try:
+            confidence = float(parsed.get('confidence') or 0.0)
+        except (TypeError, ValueError):
+            # The model may return a non-numeric confidence (e.g. "high"); treat it as unknown.
+            confidence = 0.0
+        mixed = bool(parsed.get('mixed_language'))
+
+        common_details = {
+            **parsed,
+            'expected_iso': expected_lang_code,
+            'expected_language': expected_lang_display,
+            'sample_cue_count': len(sample_texts),
+            'tokens_used': tokens_used,
+        }
+
+        if detected_iso == expected_lang_code:
+            return {
+                'check_name': 'srt_language', 'score': 100.0, 'status': 'passed',
+                'message': (
+                    f'Detected {detected_lang} ({detected_iso}) — matches expected '
+                    f'{expected_lang_display} ({expected_lang_code}).'
+                ),
+                'details': common_details, 'weight': weight,
+            }
+        if mixed or confidence < 0.8:
+            return {
+                'check_name': 'srt_language', 'score': 50.0, 'status': 'warning',
+                'message': (
+                    f'Low-confidence or mixed language: detected {detected_lang} '
+                    f'({detected_iso}) at confidence {confidence:.2f}, mixed={mixed}. '
+                    f'Expected {expected_lang_display} ({expected_lang_code}).'
+                ),
+                'details': common_details, 'weight': weight,
+            }
+        return {
+            'check_name': 'srt_language', 'score': 0.0, 'status': 'failed',
+            'message': (
+                f'Wrong language: detected {detected_lang} ({detected_iso}), '
+                f'expected {expected_lang_display} ({expected_lang_code}).'
+            ),
+            'details': common_details,
+            'recommendations': [
+                f"Verify the SRT was translated for {expected_lang_display} "
+                f"({expected_lang_code}-{market})."
+            ],
+            'weight': weight,
+        }
+
     def _detect_prices_in_video(self, grid_path: str, language: str) -> Dict[str, Any]:
         """LLM call to detect prices + currency in the video (direct) or frame
         grid. Returns a dict {currency_found, currency_symbol, price_value,
|
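The `srt_timing` verdict in the hunk above reduces to a small function of two counts. The sketch below restates that arithmetic in isolation so the thresholds are easier to reason about; `timing_verdict` is a hypothetical name used only for illustration, not a function in the codebase.

```python
def timing_verdict(n_failures: int, n_warnings: int) -> tuple[float, str]:
    """Restate the srt_timing score/status rule for given failure and warning counts."""
    # Each warning costs 5 points, but warnings together never cost more than 50.
    warning_loss = min(5.0 * n_warnings, 50.0)
    # Each failure costs 30 points; the score is clamped at 0.
    score = max(100.0 - 30.0 * n_failures - warning_loss, 0.0)
    if n_failures:
        status = "failed"  # any failure fails the check, whatever the score
    elif score >= 90:
        status = "passed"
    elif score >= 70:
        status = "warning"
    else:
        status = "failed"
    return score, status


for counts in [(0, 0), (0, 2), (0, 3), (0, 5), (1, 0), (4, 0)]:
    print(counts, "->", timing_verdict(*counts))
```

Judging only from the rules visible in this hunk, at most five warnings can accumulate (one overlap warning plus four broadcast-norm categories), so warnings alone would bottom out at a score of 75 with status `warning`. Rules earlier in the method, which are not shown here, may add more.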
|
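The cue sampling used by `_run_srt_language_check` (first five, last five, a few evenly spaced middle cues, then a character budget) can be tried out separately. This is a minimal sketch of the same logic; `sample_cue_indices` and `take_within_budget` are hypothetical helper names introduced here, and the real method works on parsed cue objects rather than plain strings.

```python
def sample_cue_indices(n: int, cap: int = 15) -> list[int]:
    """Pick cue indices: first 5, last 5, plus evenly spaced middle cues."""
    picked = set(range(min(5, n)))                              # first five
    picked.update(max(0, n - 1 - i) for i in range(min(5, n)))  # last five
    if n > 10:
        step = max(1, n // 6)
        for k in range(step, n - step, step):
            picked.add(k)
            if len(picked) >= cap:  # the cap is only checked while adding middle cues
                break
    return sorted(picked)


def take_within_budget(texts: list[str], budget: int = 1500) -> list[str]:
    """Keep texts in order until the next one would exceed the character budget."""
    kept = []
    for text in texts:
        if not text:
            continue
        if len(text) > budget:
            break  # stop at the first text that does not fit
        kept.append(text)
        budget -= len(text)
    return kept


print(sample_cue_indices(3))
print(sample_cue_indices(60))
print(len(take_within_budget(["a" * 1000, "b" * 600, "c"])))
```

One consequence of the `break` is that a single long cue ends sampling even if shorter cues after it would still fit in the remaining budget.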
@@ -32,6 +32,39 @@ profiles:
         llm_model: "gpt-4o"
         description: "AI-powered censorship check (CEN markets only)"
 
+      - name: "garment_name"
+        weight: 25
+        enabled: true
+        llm_provider: "google"
+        llm_model: "gemini-2.5-flash"
+        description: "Validate on-screen garment name against pricing reference"
+
+      - name: "title_safe"
+        weight: 0
+        enabled: true
+        llm_provider: "google"
+        llm_model: "gemini-2.5-flash"
+        description: "Advisory — flag price/garment text inside platform UI overlay zones"
+
+      - name: "srt_structure"
+        weight: 15
+        enabled: true
+        llm_provider: null
+        description: "Validate paired SRT parses, encoding, ordering, cue text"
+
+      - name: "srt_timing"
+        weight: 10
+        enabled: true
+        llm_provider: null
+        description: "Validate SRT cue timings vs video duration + broadcast norms"
+
+      - name: "srt_language"
+        weight: 20
+        enabled: true
+        llm_provider: "google"
+        llm_model: "gemini-2.5-flash"
+        description: "Detect SRT language vs expected from video locale"
+
   quick_video:
     name: "Quick Video Check (BETA)"
     description: "Fast validation of essential video requirements"
|
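The profile above gives `title_safe` a weight of 0 and labels it advisory. How the application combines per-check results is not shown in this diff, so the following is only a sketch under the assumption that the overall score is a weight-normalised mean of check scores; `weighted_overall` and the sample scores are hypothetical.

```python
def weighted_overall(results: list[dict]) -> float:
    """Weight-normalised mean of check scores (assumed aggregation, not confirmed by the diff)."""
    total_weight = sum(r["weight"] for r in results)
    if total_weight == 0:
        return 100.0  # nothing contributes, so there is nothing to penalise
    return sum(r["score"] * r["weight"] for r in results) / total_weight


# Weights come from the profile above; the scores are made up for illustration.
results = [
    {"check_name": "garment_name", "weight": 25, "score": 80.0},
    {"check_name": "title_safe", "weight": 0, "score": 0.0},
    {"check_name": "srt_structure", "weight": 15, "score": 100.0},
    {"check_name": "srt_timing", "weight": 10, "score": 70.0},
    {"check_name": "srt_language", "weight": 20, "score": 100.0},
]
print(round(weighted_overall(results), 2))
```

Under that assumption a zero-weight check can still appear in the report with its own status while leaving the overall score unchanged, which is consistent with the "advisory" description.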
@@ -25,7 +25,9 @@ from core.models.database import db
 
 logger = logging.getLogger(__name__)
 
-ALLOWED_EXTENSIONS = {'mp4', 'mov', 'avi', 'mkv'}
+ALLOWED_VIDEO_EXTENSIONS = {'mp4', 'mov', 'avi', 'mkv'}
+ALLOWED_SRT_EXTENSIONS = {'srt'}
+ALLOWED_EXTENSIONS = ALLOWED_VIDEO_EXTENSIONS | ALLOWED_SRT_EXTENSIONS
 MAX_BATCH_FILES = 50
 
 
@@ -33,6 +35,14 @@ def allowed_file(filename):
     return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
 
 
+def is_video(filename):
+    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_VIDEO_EXTENSIONS
+
+
+def is_srt(filename):
+    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_SRT_EXTENSIONS
+
+
 @video_qc_bp.route('/')
 @video_qc_bp.route('/index')
 def index():
@@ -135,6 +145,37 @@ def configure(session_id):
     )
 
 
+@video_qc_bp.route('/pairing-preview/<session_id>')
+def pairing_preview(session_id):
+    """Pre-flight: return pair_map + unpaired counts for the configure UI."""
+    upload_path = os.path.join(
+        current_app.config['VIDEO_QC_UPLOAD_PATH'], session_id
+    )
+    if not os.path.isdir(upload_path):
+        return jsonify({'error': 'Session upload folder not found'}), 404
+    files = os.listdir(upload_path)
+    video_paths = [os.path.join(upload_path, f) for f in files if is_video(f)]
+    srt_paths = [os.path.join(upload_path, f) for f in files if is_srt(f)]
+    if not srt_paths:
+        return jsonify({
+            'pairs': [{'video': os.path.basename(v), 'srt': None} for v in video_paths],
+            'unpaired_srts': [],
+            'unpaired_videos': [os.path.basename(v) for v in video_paths],
+            'srt_count': 0,
+        })
+    from .utils.srt_pairing import pair_batch
+    pair_map, unpaired_srts, unpaired_videos = pair_batch(video_paths, srt_paths)
+    return jsonify({
+        'pairs': [
+            {'video': os.path.basename(v), 'srt': os.path.basename(s) if s else None}
+            for v, s in pair_map.items()
+        ],
+        'unpaired_srts': [os.path.basename(s) for s in unpaired_srts],
+        'unpaired_videos': [os.path.basename(v) for v in unpaired_videos],
+        'srt_count': len(srt_paths),
+    })
+
+
 @video_qc_bp.route('/execute', methods=['POST'])
 def execute():
     """Start single-file video QC execution."""
@@ -217,11 +258,14 @@ def execute_batch():
     upload_path = os.path.join(
         current_app.config['VIDEO_QC_UPLOAD_PATH'], session_id
     )
-    files = [f for f in os.listdir(upload_path) if allowed_file(f)]
-    if not files:
+    files = os.listdir(upload_path)
+    video_files = [f for f in files if is_video(f)]
+    srt_files = [f for f in files if is_srt(f)]
+    if not video_files:
         return jsonify({'error': 'No video files found'}), 404
 
-    file_paths = [os.path.join(upload_path, f) for f in files]
+    file_paths = [os.path.join(upload_path, f) for f in video_files]
+    srt_paths = [os.path.join(upload_path, f) for f in srt_files]
 
     user_email = current_user_email()
 
@@ -234,6 +278,7 @@ def execute_batch():
     batch_executor = BatchVideoQCExecutor(
         session_id=session_id,
         file_paths=file_paths,
+        srt_paths=srt_paths,
         job_number=job_number,
         llm_provider=llm_provider,
         llm_model=llm_model,
|
@@ -29,6 +29,18 @@
             </a>
         </div>
 
+        {# SRT pairing pre-flight summary — hidden until JS confirms at least one SRT was uploaded #}
+        <div id="srtPairingSummary" class="card mt-3" style="display: none;">
+            <div class="card-header"><i class="bi bi-file-earmark-text me-2"></i>SRT pairing summary</div>
+            <div class="card-body">
+                <ul id="srtPairingList" class="mb-2"></ul>
+                <details id="srtUnpairedWrap" style="display: none;">
+                    <summary class="text-muted">Unpaired SRT files (will be ignored)</summary>
+                    <ul id="srtUnpairedList" class="mt-2 mb-0"></ul>
+                </details>
+            </div>
+        </div>
+
         <div class="row mt-4">
             <div class="col-md-8">
                 <div class="card">
@@ -129,6 +141,49 @@
     const isBatch = {{ 'true' if file_count and file_count > 1 else 'false' }};
     const fileCount = {{ file_count or 1 }};
 
+    // SRT pairing pre-flight: hidden unless ≥1 SRT was uploaded for this session.
+    // Renders filenames via textContent (NOT innerHTML) — names come from user
+    // uploads and could contain HTML; building DOM nodes prevents XSS.
+    (async function loadSrtPairing() {
+        try {
+            const url = window.BASE_URL + '/video-qc/pairing-preview/' + encodeURIComponent(sessionId);
+            const resp = await fetch(url);
+            if (!resp.ok) return;
+            const data = await resp.json();
+            if (!data || !data.srt_count || data.srt_count === 0) return;
+
+            const box = document.getElementById('srtPairingSummary');
+            const list = document.getElementById('srtPairingList');
+            box.style.display = '';
+
+            const paired = data.pairs.filter(p => p.srt).length;
+            const unpairedVideos = data.unpaired_videos.length;
+            const lines = [
+                '✓ ' + paired + ' video(s) paired with SRTs',
+                '⚠ ' + unpairedVideos + ' video(s) without an SRT — SRT checks will skip',
+                '⚠ ' + data.unpaired_srts.length + ' SRT file(s) unpaired — will be ignored',
+            ];
+            list.replaceChildren(...lines.map(text => {
+                const li = document.createElement('li');
+                li.textContent = text;
+                return li;
+            }));
+
+            if (data.unpaired_srts.length > 0) {
+                const wrap = document.getElementById('srtUnpairedWrap');
+                const ul = document.getElementById('srtUnpairedList');
+                wrap.style.display = '';
+                ul.replaceChildren(...data.unpaired_srts.map(name => {
+                    const li = document.createElement('li');
+                    li.textContent = name;  // safe — filenames not interpreted as HTML
+                    return li;
+                }));
+            }
+        } catch (err) {
+            console.warn('Could not load SRT pairing preview:', err);
+        }
+    })();
+
     (async function loadCampaigns() {
         try {
             const resp = await fetch(window.BASE_URL + '/campaigns/api/list');
@@ -57,9 +57,9 @@
                 <i class="bi bi-camera-video upload-icon"></i>
                 <h4 class="mt-3">Drag & Drop Videos Here</h4>
                 <p class="text-muted">or click to browse</p>
-                <p><small>Supported: MP4, MOV, AVI, MKV (up to 50 files)</small></p>
+                <p><small>Supported: MP4, MOV, AVI, MKV (up to 50 files). SRT subtitle files (.srt) can be included alongside videos for SRT QC.</small></p>
                 <input type="file" id="fileInput" style="display: none;"
-                       accept=".mp4,.mov,.avi,.mkv" multiple>
+                       accept=".mp4,.mov,.avi,.mkv,.srt" multiple>
             </div>
 
             <!-- Selected Files List -->
@@ -123,7 +123,7 @@
     const validationMessage = document.getElementById('validationMessage');
 
     const MAX_FILES = 50;
-    const ALLOWED_EXTENSIONS = ['mp4', 'mov', 'avi', 'mkv'];
+    const ALLOWED_EXTENSIONS = ['mp4', 'mov', 'avi', 'mkv', 'srt'];
     let selectedFiles = [];
 
     uploadZone.addEventListener('click', () => fileInput.click());
264
modules/video_qc/utils/srt_pairing.py
Normal file
@@ -0,0 +1,264 @@
+"""
+SRT ↔ Video filename pairing helpers.
+
+Pure functions. Used by BatchVideoQCExecutor at pre-flight to pair each
+SRT file to its video, even when filenames diverge in style (e.g.
+`CFUL262B01_PP_RIO_INTRO_15C.srt_8852296_de-AT.srt` should pair with
+`7147775_AT-de_CFUL262B01_PP_RIO_INTRO_15C_4x5_SoMe_MASTER.mp4`).
+
+Token parsing extracts campaign_code, clip_slug, and locale from each
+filename. score_pair() combines them into a 0.0–1.0 score; pair_batch()
+greedily assigns SRTs to videos at or above a 0.7 threshold.
+
+Verified at REPL against the test folder testing_15may/srt/.
+"""
+import os
+import re
+from typing import Iterable
+
+# Campaign-code regex — see core/utils/campaign_code.py for the canonical
+# version. We re-declare here to keep this module dependency-free, but the
+# patterns must stay in sync. Two formats supported:
+#   • Legacy: 4 digits + optional uppercase letter (1013A, 4116)
+#   • New:    4 alpha + 2 digits (year) + 3-5 alphanum (CFUL262B01, CFUL263C01D)
+_CAMPAIGN_RE = re.compile(
+    r"\b("
+    r"\d{4}[A-Z]?"                      # legacy
+    r"|[A-Z]{4}\d{2}[A-Z0-9]{3,5}"      # new (9-11 chars)
+    r")\b"
+)
+
+
+def normalise_slug(s: str) -> str:
+    """Lowercase and strip every non-alphanumeric character.
+
+    `PP_RIO_INTRO_15C` -> `ppriointro15c`
+    `RIO_INTRO15C` -> `riointro15c`
+    `RIO-INTRO 15c` -> `riointro15c`
+
+    Substring containment in score_pair() handles the case where one
+    slug has more tokens than the other (e.g. video's `pp_rio_intro_15c`
+    vs SRT's `rio_intro_15c`).
+    """
+    if not s:
+        return ''
+    return re.sub(r"[^a-z0-9]", "", s.lower())
+
+
+def canonical_locale(s: str) -> str | None:
+    """Canonicalise locale strings to 'lang-MARKET' (lowercase-Uppercase).
+
+    Accepts: 'AT-de', 'de-AT', 'de_AT', 'AT_de', etc.
+    Returns 'de-AT' for any of the above. None when input doesn't look
+    like a 2-2 locale pair.
+
+    Heuristic: of the two halves, the all-uppercase shape is the market.
+    If both/neither are uppercase, treat the first as market (H&M
+    convention for video filenames is Market_lang), so 'de-at' -> 'at-DE'.
+    """
+    if not s:
+        return None
+    m = re.match(r"^([A-Za-z]{2})[-_]([A-Za-z]{2})$", s.strip())
+    if not m:
+        return None
+    a, b = m.group(1), m.group(2)
+    if a.isupper() and b.islower():
+        market, lang = a.upper(), b.lower()
+    elif b.isupper() and a.islower():
+        market, lang = b.upper(), a.lower()
+    else:
+        # Ambiguous case shape — assume Market_lang (H&M video convention).
+        market, lang = a.upper(), b.lower()
+    return f"{lang}-{market}"
+
+
+def parse_video_tokens(filename: str) -> dict:
+    """Extract {campaign_code, clip_slug, locale} from a video filename.
+
+    Video filename forms (H&M conventions):
+      • `7147775_AT-de_CFUL262B01_PP_RIO_INTRO_15C_4x5_SoMe_MASTER.mp4`
+      • `6898354_1013A_SPRING_W_10A_TT_9x16_TK_Stories_Vogue_ES-es.mp4`
+
+    clip_slug = everything between (locale OR campaign_code) and any
+    aspect-ratio token (1x1 / 4x5 / 9x16) or the file extension. We
+    normalise it via normalise_slug() before storage.
+    """
+    base = os.path.splitext(os.path.basename(filename))[0]
+    parts = base.split('_')
+
+    locale = None
+    for p in parts:
+        cl = canonical_locale(p)
+        if cl:
+            locale = cl
+            break
+
+    campaign_code = None
+    for p in parts:
+        if _CAMPAIGN_RE.fullmatch(p):
+            campaign_code = p
+            break
+    if not campaign_code:
+        m = _CAMPAIGN_RE.search(base)
+        if m:
+            campaign_code = m.group(1)
+
+    slug_parts = []
+    aspect_pat = re.compile(r"^\d+x\d+$", re.IGNORECASE)
+    for p in parts:
+        if p.isdigit():
+            continue
+        if canonical_locale(p):
+            continue
+        if _CAMPAIGN_RE.fullmatch(p):
+            continue
+        if aspect_pat.match(p):
+            continue
+        slug_parts.append(p)
+
+    raw_slug = '_'.join(slug_parts)
+    return {
+        'campaign_code': campaign_code,
+        'clip_slug': normalise_slug(raw_slug),
+        'locale': locale,
+        '_raw_slug_parts': slug_parts,
+    }
+
+
+def parse_srt_tokens(filename: str) -> dict:
+    """Extract {campaign_code, clip_slug, locale} from an SRT filename.
+
+    Two known forms (both seen in testing_15may/srt/):
+      A. `CFUL262B01_PP_RIO_INTRO_15C.srt_8852296_de-AT.srt`
+         — campaign code + clip slug + locale all present, with an
+         internal `.srt_<n>_` artifact from the upstream tool.
+      B. `RIO_INTRO6B_en-CH.srt`
+         — no campaign code; clip slug and locale only.
+
+    We strip ALL `.srt` occurrences (case-insensitively) from the base, drop
+    numeric IDs, and run the same campaign/locale/slug extraction as parse_video_tokens.
+    """
+    base = os.path.basename(filename)
+    base = re.sub(r"\.srt", "", base, flags=re.IGNORECASE)  # also handles `.SRT`, which is_srt() accepts
+    parts = base.split('_')
+
+    locale = None
+    for p in parts:
+        cl = canonical_locale(p)
+        if cl:
+            locale = cl
+            break
+
+    campaign_code = None
+    for p in parts:
+        if _CAMPAIGN_RE.fullmatch(p):
+            campaign_code = p
+            break
+    if not campaign_code:
+        m = _CAMPAIGN_RE.search(base)
+        if m:
+            campaign_code = m.group(1)
+
+    slug_parts = []
+    for p in parts:
+        if p.isdigit():
+            continue
+        if canonical_locale(p):
+            continue
+        if _CAMPAIGN_RE.fullmatch(p):
+            continue
+        if not p:
+            continue
+        slug_parts.append(p)
+
+    raw_slug = '_'.join(slug_parts)
+    return {
+        'campaign_code': campaign_code,
+        'clip_slug': normalise_slug(raw_slug),
+        'locale': locale,
+        '_raw_slug_parts': slug_parts,
+    }
+
+
+PAIR_THRESHOLD = 0.7
+
+
+def score_pair(video_tokens: dict, srt_tokens: dict) -> float:
+    """Score how confidently this SRT belongs to this video. 0.0..1.0.
+
+    Weights (additive, capped at 1.0):
+      • Locale match (both present, equal after canonicalisation): 0.5
+      • Campaign code match (both present, equal):                 0.3
+      • Clip slug match (one is substring of the other, after
+        normalise_slug — see parse_*_tokens):                      0.4
+
+    Hard zero rules:
+      • Both locales present AND differ -> 0.0
+      • Both slugs present AND neither is substring of the other -> 0.0
+    """
+    v_loc = video_tokens.get('locale')
+    s_loc = srt_tokens.get('locale')
+    v_code = video_tokens.get('campaign_code')
+    s_code = srt_tokens.get('campaign_code')
+    v_slug = video_tokens.get('clip_slug') or ''
+    s_slug = srt_tokens.get('clip_slug') or ''
+
+    # Hard reject: locales both present and divergent
+    if v_loc and s_loc and v_loc != s_loc:
+        return 0.0
+    # Hard reject: slugs both present and neither contains the other
+    if v_slug and s_slug and v_slug not in s_slug and s_slug not in v_slug:
+        return 0.0
+
+    score = 0.0
+    if v_loc and s_loc and v_loc == s_loc:
+        score += 0.5
+    if v_code and s_code and v_code == s_code:
+        score += 0.3
+    if v_slug and s_slug and (v_slug in s_slug or s_slug in v_slug):
+        score += 0.4
+
+    return min(score, 1.0)
+
+
+def pair_batch(
+    video_paths: Iterable[str],
+    srt_paths: Iterable[str],
+) -> tuple[dict, list[str], list[str]]:
+    """Pair each SRT to at most one video (and vice versa). Greedy on the
+    highest-scoring pairs at or above PAIR_THRESHOLD.
+
+    Returns:
+        pair_map: dict[video_path, srt_path | None] for every video
+                  (None if no SRT paired)
+        unpaired_srts: list[str] of SRTs left over
+        unpaired_videos: list[str] of videos with no SRT
+    """
+    video_paths = list(video_paths)
+    srt_paths = list(srt_paths)
+
+    candidates = []
+    v_tokens = {v: parse_video_tokens(v) for v in video_paths}
+    s_tokens = {s: parse_srt_tokens(s) for s in srt_paths}
+    for v in video_paths:
+        for s in srt_paths:
+            score = score_pair(v_tokens[v], s_tokens[s])
+            if score >= PAIR_THRESHOLD:
+                candidates.append((score, v, s))
+
+    # Greedy: highest score first; tie-break by filename for determinism
+    candidates.sort(key=lambda t: (-t[0], os.path.basename(t[1]), os.path.basename(t[2])))
+
+    used_videos: set = set()
+    used_srts: set = set()
+    pair_map: dict = {v: None for v in video_paths}
+    for score, v, s in candidates:
+        if v in used_videos or s in used_srts:
+            continue
+        pair_map[v] = s
+        used_videos.add(v)
+        used_srts.add(s)
+
+    unpaired_srts = [s for s in srt_paths if s not in used_srts]
+    unpaired_videos = [v for v in video_paths if v not in used_videos]
+    return pair_map, unpaired_srts, unpaired_videos
|
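The casing heuristic in `canonical_locale()` is easiest to see on concrete inputs. The block below is a compact restatement written for illustration (`canonical_locale_sketch` is a hypothetical name, not part of the module); it is intended to behave the same way on two-letter pairs, including the ambiguous all-lowercase and all-uppercase shapes.

```python
import re
from typing import Optional


def canonical_locale_sketch(s: str) -> Optional[str]:
    """Return 'lang-MARKET' for a 2-2 locale pair, or None if the shape does not match."""
    m = re.match(r"^([A-Za-z]{2})[-_]([A-Za-z]{2})$", (s or "").strip())
    if not m:
        return None
    a, b = m.group(1), m.group(2)
    if b.isupper() and a.islower():
        market, lang = b, a   # lang-MARKET order, e.g. de-AT
    else:
        market, lang = a, b   # MARKET-lang order, or ambiguous casing: assume market first
    return f"{lang.lower()}-{market.upper()}"


for raw in ["AT-de", "de-AT", "de_AT", "AT_de", "de-at", "DE-AT", "deu-AT"]:
    print(f"{raw!r:10} -> {canonical_locale_sketch(raw)!r}")
```

The last three inputs are the interesting ones: when casing gives no hint, the first half is taken as the market, so `de-at` and `DE-AT` both come out as `at-DE`, and anything that is not a two-letter pair yields `None`.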
||||
|
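To see how the weights and the 0.7 threshold interact, here is a condensed sketch of the scoring and greedy assignment, run on hand-written token dicts rather than parsed filenames. `sketch_score`, `greedy_pairs`, and the short filenames are hypothetical; the weights, hard-reject rules, and threshold are taken from the module above.

```python
PAIR_THRESHOLD = 0.7  # same value as in srt_pairing.py


def sketch_score(video: dict, srt: dict) -> float:
    """Additive score: locale 0.5, campaign code 0.3, slug containment 0.4, capped at 1.0."""
    v_loc, s_loc = video.get("locale"), srt.get("locale")
    v_code, s_code = video.get("campaign_code"), srt.get("campaign_code")
    v_slug, s_slug = video.get("clip_slug") or "", srt.get("clip_slug") or ""
    slugs_nest = bool(v_slug and s_slug) and (v_slug in s_slug or s_slug in v_slug)

    # Hard rejects: conflicting locales, or two slugs where neither contains the other.
    if v_loc and s_loc and v_loc != s_loc:
        return 0.0
    if v_slug and s_slug and not slugs_nest:
        return 0.0

    score = 0.0
    if v_loc and v_loc == s_loc:
        score += 0.5
    if v_code and v_code == s_code:
        score += 0.3
    if slugs_nest:
        score += 0.4
    return min(score, 1.0)


def greedy_pairs(videos: dict, srts: dict) -> dict:
    """videos/srts map a filename to its token dict; returns {video: srt or None}."""
    candidates = [
        (sketch_score(vt, st), v, s)
        for v, vt in videos.items()
        for s, st in srts.items()
    ]
    candidates = [c for c in candidates if c[0] >= PAIR_THRESHOLD]
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))  # best score first, then by name
    pairs = {v: None for v in videos}
    used_srts = set()
    for _, v, s in candidates:
        if pairs[v] is None and s not in used_srts:
            pairs[v] = s
            used_srts.add(s)
    return pairs


videos = {
    "at.mp4": {"locale": "de-AT", "campaign_code": "CFUL262B01", "clip_slug": "ppriointro15csomemaster"},
    "ch.mp4": {"locale": "en-CH", "campaign_code": "CFUL262B01", "clip_slug": "ppriointro6bsomemaster"},
}
srts = {
    "a.srt": {"locale": "de-AT", "campaign_code": "CFUL262B01", "clip_slug": "ppriointro15c"},
    "b.srt": {"locale": "en-CH", "campaign_code": None, "clip_slug": "riointro6b"},
    "c.srt": {"locale": "fr-FR", "campaign_code": None, "clip_slug": "riointro6b"},
}
pairs = greedy_pairs(videos, srts)
print(pairs)
```

A locale match alone (0.5) stays below the threshold, so a pairing needs the slug to agree as well. Form B filenames, which carry no campaign code, can therefore still pair on locale plus slug, while `c.srt` is rejected outright because its locale conflicts with both videos.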
@@ -44,6 +44,10 @@ nest_asyncio>=1.5.0
 # Excel Parsing (Media Plans / Price Sheets)
 openpyxl>=3.1.0
 
+# SRT subtitle parsing (Video QC — SRT checks)
+srt==3.5.3
+chardet>=5.0
+
 # Configuration
 python-dotenv==1.0.0
 PyYAML==6.0.1