Commit graph

103 commits

Author SHA1 Message Date
nickviljoen
5de8f5fe7b Video QC: fix client-side SRT rejection in upload form
The accept='' attribute and help copy already allowed .srt, but a
separate ALLOWED_EXTENSIONS array in upload.html's JS was filtering
out .srt files as 'unsupported format'. Adding 'srt' to that array
fixes the silent skip seen on Dev (file picker showed .srt as
valid, then the submit handler dropped it).
2026-05-15 21:26:20 +02:00
nickviljoen
70700f4f91 Video QC: register SRT checks in standard_video profile
Profile YAML is descriptive metadata (executor runs unconditionally).
Documenting srt_structure (15), srt_timing (10), srt_language (20)
so the profile page reflects the live check set.
2026-05-15 20:48:14 +02:00
nickviljoen
f361d8e9a1 Video QC UI: SRT upload + pre-flight pairing summary
Upload form accepts .srt alongside .mp4. Configure page shows pair_map
counts and a collapsible list of unpaired SRTs (rendered via DOM
textContent to avoid XSS from user-controlled SRT filenames). Uses the
new /pairing-preview/<session_id> endpoint.
2026-05-15 20:47:38 +02:00
nickviljoen
57dbefe4f2 Video QC routes: accept .srt uploads + pre-flight pairing endpoint
Adds .srt to ALLOWED_EXTENSIONS; introduces is_video() / is_srt() helpers.
New /pairing-preview/<session_id> endpoint returns pair_map + unpaired lists
for the configure UI. Batch execute threads srt_paths into BatchVideoQCExecutor.
2026-05-15 20:45:55 +02:00
nickviljoen
45fe103fd3 Video QC batch: pair SRTs to videos at pre-flight
Adds optional srt_paths constructor parameter. At execute() top, runs
pair_batch() to produce pair_map / unpaired_srts / unpaired_videos.
Threads pair_map[video_path] into each per-video VideoQCExecutor as
srt_path. No-op when srt_paths is empty.
2026-05-15 20:44:33 +02:00
nickviljoen
6faf785d61 Video QC: wire srt_structure/srt_timing/srt_language into execute()
Skipped when self.srt_path is None (one result per check, weight set so
the weighted-average math is unchanged). When set, runs all three
checks sequentially with progress updates. SRT results appear as
additional cards in the existing Video QC report.
2026-05-15 20:43:37 +02:00
nickviljoen
0a1f116338 SRT QC: add srt_language check (inline Gemini text call)
Text-only LLM call samples up to 15 cues (~1500 chars), asks Gemini
Flash to identify language. Pass = ISO matches expected from video
filename's locale; warning = low confidence or mixed_language; fail =
ISO mismatch with high confidence. Weight 20.

Note: uses genai.GenerativeModel directly rather than a unified
LLMConfig.call_text_api (which doesn't exist yet). Marked TODO for
future refactor when that helper is added.
2026-05-15 20:40:21 +02:00
nickviljoen
ae510a7ecd Merge develop (SRT structure+timing checks): resolve srt_pairing conflict, keeping full implementation
2026-05-15 20:39:14 +02:00
nickviljoen
8493bd645c SRT QC: add srt_timing check
Deterministic. Validates start < end, no overlaps (>=3 overlaps fails,
otherwise warns), last cue <= video duration + 0.5s tolerance. Warning-
only rules: reading speed 5-25 cps, line length <= 42 chars, lines per
cue <= 2, cue duration 0.7-7.0s. Fail = -30, warning = -5 (capped at 50
warning-loss). Weight 10.
2026-05-15 20:34:39 +02:00
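The penalty arithmetic in the srt_timing commit above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name is hypothetical, and the starting score of 100 and the floor of 0 are assumptions the commit message does not state.

```python
def timing_score(fail_count: int, warning_count: int) -> int:
    """Hypothetical scorer for the srt_timing penalties.

    Assumed: the score starts at 100 and never drops below 0.
    From the commit: each failure costs 30, each warning costs 5,
    and the total loss from warnings is capped at 50.
    """
    fail_loss = 30 * fail_count
    warning_loss = min(5 * warning_count, 50)  # cap on warning-loss
    return max(0, 100 - fail_loss - warning_loss)
```

Under these assumptions, a file with only warnings can never score below 50, while four failures alone are enough to reach the floor.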
nickviljoen
e8ba567590 SRT QC: add srt_structure check
Deterministic, no LLM. Parses SRT via the srt library, validates UTF-8
encoding (with chardet fallback for non-UTF-8), no replacement chars,
non-empty cue content, ascending cue indices. Fails on parse error /
replacement chars / no cues; warnings otherwise. Weight 15.
2026-05-15 20:30:01 +02:00
nickviljoen
239d39d4eb Video QC: thread srt_path through executor constructor
Defaults to None — existing single-file flow unchanged. Used by the
upcoming SRT checks and by BatchVideoQCExecutor's pair_batch step.
2026-05-15 20:25:24 +02:00
nickviljoen
b1a657d593 SRT QC: add score_pair + pair_batch
score_pair: additive locale (0.5) + campaign code (0.3) + clip-slug
substring (0.4), capped at 1.0, with hard-reject on divergent locales
or non-overlapping slugs. pair_batch: greedy highest-first assignment
above 0.7 threshold; one SRT per video.

Verified pairs all 6 videos in testing_15may/srt/ to their SRTs.
2026-05-15 20:24:24 +02:00
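A rough sketch of the scoring and greedy assignment described in the score_pair + pair_batch commit above. The `Tokens` container, the exact hard-reject conditions, and treating the threshold as inclusive (`>= 0.7`) are assumptions made for illustration; the real module may differ on each point.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Tokens:
    """Hypothetical holder for the tokens parsed from a filename."""
    campaign: Optional[str]
    slug: Optional[str]
    locale: Optional[str]


def score_pair(video: Tokens, srt: Tokens) -> float:
    # Hard reject: both sides name a locale and the locales diverge.
    if video.locale and srt.locale and video.locale != srt.locale:
        return 0.0
    slugs_overlap = bool(
        video.slug and srt.slug
        and (video.slug in srt.slug or srt.slug in video.slug)
    )
    # Hard reject: both sides have a slug and neither contains the other.
    if video.slug and srt.slug and not slugs_overlap:
        return 0.0
    score = 0.0
    if video.locale and video.locale == srt.locale:
        score += 0.5  # locale
    if video.campaign and video.campaign == srt.campaign:
        score += 0.3  # campaign code
    if slugs_overlap:
        score += 0.4  # clip-slug substring
    return min(score, 1.0)  # additive score capped at 1.0


def pair_batch(
    videos: Dict[str, Tokens],
    srts: Dict[str, Tokens],
    threshold: float = 0.7,
) -> Tuple[Dict[str, str], List[str], List[str]]:
    # Score every (video, SRT) combination, best first.
    candidates = sorted(
        ((score_pair(v, s), vp, sp)
         for vp, v in videos.items() for sp, s in srts.items()),
        key=lambda c: c[0],
        reverse=True,
    )
    pair_map: Dict[str, str] = {}
    used_srts = set()
    for score, video_path, srt_path in candidates:
        if score < threshold:
            break  # everything after this is below the threshold too
        if video_path in pair_map or srt_path in used_srts:
            continue  # one SRT per video, each SRT used once
        pair_map[video_path] = srt_path
        used_srts.add(srt_path)
    unpaired_srts = [sp for sp in srts if sp not in used_srts]
    unpaired_videos = [vp for vp in videos if vp not in pair_map]
    return pair_map, unpaired_srts, unpaired_videos
```

Sorting all candidate pairs once and assigning greedily keeps the behaviour easy to reason about, at the cost of not being globally optimal when several SRTs score equally against the same video.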
nickviljoen
b61d20d084 SRT QC: add parse_video_tokens + parse_srt_tokens
Extract campaign code, clip slug, and locale from both video and SRT
filenames. Handles the two SRT styles seen in testing_15may/srt/
(campaign-code-prefixed CFUL... form, and abbreviated RIO_INTRO6B form).
Verified at REPL against test data.
2026-05-15 20:21:08 +02:00
nickviljoen
df212ea158 SRT QC: scaffold srt_pairing module with normalise_slug + canonical_locale
Pure-function helpers, verified at REPL. canonical_locale handles the
de-AT ↔ AT-de order flip between SRT and video filenames; normalise_slug
strips non-alphanumerics so RIO_INTRO_15C ≈ RIO_INTRO15C.
2026-05-15 20:18:45 +02:00
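One way the two helpers named in the commit above could look. This is a sketch under assumptions: the commit does not say how canonical_locale tells language from country, so the version here relies on letter case (lowercase language, uppercase country), and the `language-COUNTRY` output order is likewise assumed.

```python
import re
from typing import Optional


def normalise_slug(text: str) -> str:
    """Strip every non-alphanumeric character, so RIO_INTRO_15C and
    RIO_INTRO15C compare equal."""
    return re.sub(r"[^A-Za-z0-9]", "", text)


def canonical_locale(token: str) -> Optional[str]:
    """Return 'll-CC' for either 'll-CC' or 'CC-ll' input (assumed rule:
    the uppercase half is the country, the lowercase half the language)."""
    parts = re.split(r"[-_]", token)
    if len(parts) != 2 or not all(len(p) == 2 and p.isalpha() for p in parts):
        return None
    first, second = parts
    if first.isupper() and second.islower():
        first, second = second, first  # flip AT-de into de-AT order
    return f"{first.lower()}-{second.upper()}"
```

With this rule, `canonical_locale("AT-de")` and `canonical_locale("de-AT")` both yield `de-AT`.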
nickviljoen
868d8d8208 SRT QC: add srt library to requirements
Pure-Python SRT parser used by the upcoming srt_structure and srt_timing
checks. Pinned to 3.5.3. chardet added for non-UTF-8 fallback.
2026-05-15 20:17:40 +02:00
nickviljoen
2e9f6f43a5 Video QC: register garment_name and title_safe in standard_video profile
Profile YAML is descriptive metadata (executor runs unconditionally).
Keeping it current so the profile page and any future YAML-driven
selection reflects the live check set.
2026-05-15 12:35:11 +02:00
nickviljoen
89a42b0dfa Video QC: add title_safe advisory check
Flags (never fails) when price or garment-name text falls inside known
platform UI overlay zones (TikTok / IG Stories / IG Reels / generic
vertical). Platform inferred from filename tokens via _infer_platform_zones.
Weight 0 in profile — advisory only, never contributes to overall score.
2026-05-15 12:34:24 +02:00
nickviljoen
8d277d2cb3 Video QC: add garment_name check
Single Gemini direct-video call detects garment/product text overlays;
deterministic match against PricingReference.get_prices() product_name
for the file's locale. Skips when no pricing reference attached, locale
unparseable, GEN/CEN file, no expected product names for locale, or no
on-screen garment text detected. Weight 25 in standard_video profile.
2026-05-15 12:13:19 +02:00
nickviljoen
75063c54f9 Video QC: add product-name normalisation helpers for garment_name
Adds _normalize_product_name (lowercase, alphanumeric+space, collapse
whitespace) and _product_names_match (substring or >=60% token-set
overlap on min side). Used by the upcoming garment_name check.
2026-05-15 12:11:02 +02:00
nickviljoen
4ada4c2d59 Video QC: add platform-zones lookup helper for title_safe
Adds _PLATFORM_ZONES (TikTok / IG Stories / IG Reels / generic vertical)
and _infer_platform_zones(filename) for use by the new title_safe check.
Pure function, verified at REPL against expected filenames. No new
behaviour exposed yet — wired up in the next task.
2026-05-15 12:08:42 +02:00
nickviljoen
78f61e0ba2 Video QC: surface matched price + product in price card
The price_currency check has always done a full numeric match against
the pricing reference but the report card only showed pass/fail by
currency. Pull matched_price, matched_product, detected_prices, and
expected_prices into the message string so QC reviewers can see the
full match at a glance.

No logic changes.
2026-05-15 12:01:58 +02:00
nickviljoen
6a41fc727e Add SRT subtitle QC implementation plan
15 tasks: add srt library, build srt_pairing helpers incrementally
(normalise_slug, canonical_locale, parse_*_tokens, score_pair,
pair_batch), thread srt_path through executor, implement the three
SRT checks (structure, timing, language), wire batch pre-flight
pairing, update routes for .srt uploads and pairing-preview endpoint,
add pre-flight UI to configure template (XSS-safe DOM rendering for
user-controlled filenames), register checks in profile, end-to-end
smoke test against testing_15may/srt/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 11:49:23 +02:00
nickviljoen
b8070196a9 Add Video QC tuning implementation plan
9 tasks: investigate product_name localisation, surface matched price
in report card, add platform-zones lookup, add product-name normalisation
helpers, implement garment_name check, implement title_safe advisory
check, update profile YAML, end-to-end smoke test. Each task carries
exact code, exact file paths, and a manual verification step matching
the codebase's existing manual-smoke-test pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 11:19:19 +02:00
nickviljoen
039036bcd9 Add SRT subtitle QC design spec
Extends Video QC with SRT pairing (fuzzy match on campaign code +
clip slug + canonical locale) and three new checks: srt_structure
(deterministic, valid SRT + encoding + cue numbering), srt_timing
(deterministic, against video duration + broadcast norms), and
srt_language (text-only LLM call to detect language vs expected
locale). Audio-vs-SRT transcription comparison deferred to v2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 11:01:11 +02:00
nickviljoen
cba8ac8e5b Add Video QC tuning design spec
Surfaces existing price-match in the report card, adds a new
garment_name check (deterministic match against pricing reference),
and adds an advisory-only title_safe check for price/garment text
falling inside platform UI overlay zones. Flags open verification
steps (product_name localisation, LLM sensitivity) for implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 10:48:24 +02:00
nickviljoen
3f124318f9 Phase 4 prep: add Prod cutover runbook
Mirror of DEV_CUTOVER_RUNBOOK.md adjusted for optical-prod:
* Step 1 covers the merge-to-main + tag v3.0.0 done from the laptop
  before any server-side work begins.
* Reflects the gotchas we hit on Dev (per-repo Bitbucket Access Key,
  docker group membership, SSH alias, .env from .env.prod.example,
  Apache Include pattern).
* Adds post-cutover housekeeping: daily DB backup cron, disk-usage
  warning cron, and the sandbox decommission steps for optical-web-1
  after the 1-week soak.
2026-05-09 20:47:05 +02:00
nickviljoen
4aa74b114a HM QC: thread signed-in user into batch executor
Single-file QC populated executor.context['user'] from current_user_email()
in routes.py, but batch QC routed through BatchQCExecutor — which never
accepted a user kwarg or set context['user'] on its per-file QCExecutor
instances. Result: every LLM call from a batched HM QC run was logged as
anonymous in the Usage dashboard; only single-file and Video QC runs
showed the user's email.

BatchQCExecutor now takes user and stamps it onto each per-file
executor's context just before execute(), matching the Video QC
batch executor pattern.
2026-05-09 20:40:00 +02:00
nickviljoen
a52d50d549 Reporting: show searched Box folder under the search input
Mirrors the hint pattern just added to Video Master so users can see
exactly which Box folder the search is scanning, with a clickable
link to open it in Box for self-diagnosis when a job number doesn't
turn up.
2026-05-09 20:23:06 +02:00
nickviljoen
6b8b8ea5a6 Video Master: revert campaigns folder + lenient name matching
The earlier swap to BOX_CAMPAIGNS_FOLDER_ID=133295752718 was wrong —
Video Master operates on the automation campaigns folder
(156182880490), where subfolders are named by campaign TITLE rather
than the numeric job ID used in Reporting's root.

Reverted the default in config.py and all three .env example files.

Folder naming on Box is inconsistent — '1_CFUL263C01C_Kids drop1' vs
'1_CFUL263C01F-Kids drop 2' vs 'Summer Activation 2026' all coexist.
search_subfolder now strips every non-alphanumeric character from
both the search input and the folder names before substring match,
so:
  "kids drop 1"   →  matches "1_CFUL263C01C_Kids drop1"
  "Spring 2026"   →  matches "4023 Spring 2026"
  "winterfilm"    →  matches "1_WA20263C01 Winter Film"

Form label/placeholder updated to "Campaign Title" with a hint that
spaces/underscores/hyphens/case are all ignored.
2026-05-09 20:19:35 +02:00
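The lenient matching described in the commit above reduces to a few lines. This is a sketch with hypothetical helper names, not the body of search_subfolder itself.

```python
def _squash(text: str) -> str:
    """Lowercase and drop every non-alphanumeric character."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def lenient_match(query: str, folder_name: str) -> bool:
    """True when the squashed query is a substring of the squashed folder name."""
    squashed_query = _squash(query)
    return bool(squashed_query) and squashed_query in _squash(folder_name)
```

Applied to the examples in the commit message, `lenient_match("kids drop 1", "1_CFUL263C01C_Kids drop1")` is true, while the same query does not match `"1_CFUL263C01F-Kids drop 2"`.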
nickviljoen
087224976a Box: search-API-first lookup + 60s enumeration cap
The previous search_subfolder implementation paginated the entire
parent folder before falling back to Box's indexed search API. With
the campaigns folder containing thousands of children, this exceeded
even the new 5-minute background-thread cap and surfaced as 'Search
timed out after 5 minutes' to the user.

Now:
  1. Hit the indexed search API first (~1-2s typical, even on huge
     parents) — returns immediately on a match.
  2. Fall back to a streaming enumeration only for fresh folders Box
     hasn't indexed yet (~10 min latency window). Capped at 60s wall
     clock so we don't loop forever on a missing campaign.

Also improves the not-found error message to mention the indexing
latency caveat — handles the otherwise-confusing case where a freshly-
created campaign folder isn't searchable for a few minutes.
2026-05-09 20:03:53 +02:00
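The search-first, capped-fallback flow from the commit above can be sketched independently of the Box SDK. Everything here is hypothetical scaffolding: `indexed_search`, `enumerate_children`, and `matches` are stand-in callables supplied by the caller, not Box API functions.

```python
import time
from typing import Callable, Iterable, Optional


def find_subfolder(
    name: str,
    indexed_search: Callable[[str], Optional[dict]],
    enumerate_children: Callable[[], Iterable[dict]],
    matches: Callable[[str, str], bool],
    cap_seconds: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[dict]:
    # 1. Indexed search first: fast even on very large parent folders.
    hit = indexed_search(name)
    if hit is not None:
        return hit
    # 2. Fallback: stream the children, for folders not yet indexed.
    deadline = clock() + cap_seconds
    for child in enumerate_children():
        if matches(name, child["name"]):
            return child
        if clock() > deadline:
            return None  # wall-clock cap reached; treat as not found
    return None
```

Passing the clock in as a parameter is a testing convenience: the cap can be exercised without waiting 60 seconds.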
nickviljoen
a3aee0de2e Video Master: async campaign search + correct UI labels
- /api/search-campaign now kicks off a background thread and returns
  immediately. The browser polls /api/progress/<session_id> and fetches
  the cached result via the new /api/search-campaign-result/<session_id>
  endpoint when complete. Box folder enumeration on a not-found campaign
  was taking >30s, exceeding the GCP load balancer's response timeout
  and surfacing as 'stream timeout' (not valid JSON) to the user.
- Result cached for 10 min via the existing reporting result_cache
  (filesystem-backed → safe across gunicorn workers).
- Form label/placeholder/hint updated: tool accepts a campaign NUMBER,
  not a campaign name. Placeholder shows '1993857' instead of
  '1011A Spring SS2025'.
2026-05-09 19:52:49 +02:00
nickviljoen
a500d7b088 Six tooling fixes from Dev test pass
Video QC:
* _extract_locale_from_filename now also handles the suffix form
  ..._XX-yy.ext (case-insensitive both sides), so DOOH/OOH-style
  adapt filenames like ..._ES-es.mp4 unblock the price_currency
  check instead of skipping with "could not extract locale".
* Batch results page expires the SQLAlchemy session at the top of
  the route so the post-completion reload sees committed reports
  even when it lands on a different gunicorn worker than the one
  that wrote them. Reload delay bumped 1s → 2s for margin.
* visual_quality prompt now passes the filename's market+language
  to the LLM and tells it the on-screen copy should be in the
  localized language, not the source-language guideline copy.
  Stops Spanish-market videos being flagged as "language mismatch
  with English campaign guidelines".

Printer Check:
* regions.json rewritten to cover all 10 H&M regions (AME, CEU,
  NEU, GCN, IND, SHE, SEU, EEU, EAS, Franchise) with default-all
  groups. Two judgement calls vs the screenshot: kept TR for
  Turkey (TK is Tokelau in ISO and would break filename matching)
  and BR for Brazil (every other code is 2-letter ISO).

Campaign codes:
* New core/utils/campaign_code.py is the single source of truth.
  Matches both the legacy 4-digits-plus-optional-letter (1013A,
  4116) and the new 11-char alphanumeric with year at positions
  5-6 (CFUL263C01D). All four prior parser sites now import from
  this helper.

Video Master:
* BOX_CAMPAIGNS_FOLDER_ID switched 156182880490 → 133295752718
  (same root the Reporting tool uses). Updated config.py default
  and all three .env example files.
* Match page now shows which Box folder the search runs against
  (with a clickable link), and on a not-found error explains what
  was searched for so missing-campaign cases are self-diagnosable.
2026-05-09 18:32:23 +02:00
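The two campaign-code formats listed under "Campaign codes" in the commit above could be recognised with a pair of regular expressions. This is a guess at the shape of such a helper, not the contents of core/utils/campaign_code.py: the function name, the uppercase-only character classes, and the rule that a code must not touch other letters or digits are all assumptions.

```python
import re
from typing import Optional

# New format: 11 alphanumeric characters with a two-digit year at positions 5-6.
_NEW_CODE = re.compile(r"(?<![A-Za-z0-9])[A-Z0-9]{4}\d{2}[A-Z0-9]{5}(?![A-Za-z0-9])")
# Legacy format: four digits plus an optional letter (1013A, 4116).
_LEGACY_CODE = re.compile(r"(?<![A-Za-z0-9])\d{4}[A-Z]?(?![A-Za-z0-9])")


def find_campaign_code(text: str) -> Optional[str]:
    """Return the first campaign code found in text, preferring the new format."""
    for pattern in (_NEW_CODE, _LEGACY_CODE):
        match = pattern.search(text)
        if match:
            return match.group(0)
    return None
```

Note that the legacy pattern will also accept any isolated four-digit number, such as a year, so callers would need to apply it only to filename positions where a code is expected.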
nickviljoen
6a2945275a Reporting: filesystem-back the search-result cache
The previous in-memory dict only worked with a single gunicorn worker.
With workers=2 in gunicorn_config.py, the async-search worker stored
the result in its own process memory while the dashboard request
landed on the other worker ~50% of the time — cache miss → fell
through to a synchronous Box fetch → exceeded the GCP load
balancer's 30s timeout, returning "stream timeout" to the user even
though the search itself succeeded.

Now stores cache entries as pickled files at storage/cache/<key>.pkl,
shared across workers via the existing volume mount. Atomic writes
via tempfile + os.replace. TTL still 30 minutes. Public API
(cache_set/get/delete/cleanup) is unchanged so call sites in
reporting/routes.py continue to work.
2026-05-09 17:46:42 +02:00
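A minimal sketch of a filesystem-backed cache with atomic writes, in the spirit of the commit above. The names cache_set and cache_get come from the commit message, but their signatures, the entry layout, and the cache directory used here (a temp-dir stand-in for storage/cache) are assumptions; keys are assumed to be filename-safe.

```python
import os
import pickle
import tempfile
import time
from typing import Any, Optional

CACHE_DIR = os.path.join(tempfile.gettempdir(), "cache_sketch")  # stand-in path
TTL_SECONDS = 30 * 60  # 30-minute TTL, as stated in the commit


def _path(key: str) -> str:
    return os.path.join(CACHE_DIR, f"{key}.pkl")


def cache_set(key: str, value: Any) -> None:
    os.makedirs(CACHE_DIR, exist_ok=True)
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_path = tempfile.mkstemp(dir=CACHE_DIR, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump({"stored_at": time.time(), "value": value}, fh)
        os.replace(tmp_path, _path(key))  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def cache_get(key: str) -> Optional[Any]:
    try:
        with open(_path(key), "rb") as fh:
            entry = pickle.load(fh)
    except FileNotFoundError:
        return None  # never stored, or already cleaned up
    if time.time() - entry["stored_at"] > TTL_SECONDS:
        return None  # expired
    return entry["value"]
```

Because `os.replace` swaps the finished file in one step, a reader in another worker process sees either the old entry or the new one, never a half-written pickle.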
nickviljoen
9447f1684a MSAL: ensure redirectUri always ends in trailing slash
Entra registered the URIs with trailing slashes
(https://optical-{dev,prod}.oliver.solutions/hm-aiqc/), but the
JS was producing the URI without a trailing slash because Flask's
request.script_root strips it (X-Script-Name: /hm-aiqc).

Result was AADSTS50011 'Reply address did not match' on every
sign-in attempt. Now always normalise to exactly one trailing
slash, matching what's registered in Entra.
2026-05-09 17:16:02 +02:00
nickviljoen
0d1d8fd2c9 Apache: move ProxyTimeout out of <Location> (not allowed there)
ProxyTimeout cannot be set inside <Location>. Express the per-route
timeout via the timeout= parameter on ProxyPass instead, matching the
ppt-tool pattern in the optical-dev vhost.
2026-05-09 17:12:09 +02:00
nickviljoen
7622b650af Apache: consolidate dev+prod into single Include-style snippet
Match the convention used by every other app on optical-{dev,prod}:
each app ships one /opt/<app>/deploy/<app>.conf, and the per-host
vhost adds a single `Include` line.

Combines apache-dev.conf and apache-prod.conf (which were identical)
into apache-hm-aiqc.conf. Drops X-Forwarded-Proto and ProxyPreserveHost
since the parent vhost already sets them globally. Raises the body
size to 500MB inside /hm-aiqc/ for video uploads.
2026-05-09 17:05:39 +02:00
nickviljoen
aacefbd7df deploy.sh: handle first-deploy and --force re-deploys
The previous version short-circuited with 'nothing to do' when local
HEAD already matched the target — which is wrong on the first deploy
(no container yet) and inconvenient when re-running after a manual
fix. Now:
  * Auto-proceeds when no container is running for the web service.
  * Accepts --force to rebuild + restart at the same revision.
  * Skips the empty changelog section when CURRENT == TARGET.
2026-05-09 16:35:36 +02:00
nickviljoen
458c75311e Phase 3 prep: add Dev cutover runbook
Self-contained SSH-and-paste guide covering clone → .env → deploy → Apache
reload → smoke test. Includes troubleshooting for the most likely failure
modes (MSAL redirect mismatch, missing env vars, /health 503).
2026-05-09 14:42:12 +02:00
nickviljoen
e772095158 Phase 2: deploy machinery for Dev/Prod cutover
- deploy.sh dev|prod with --dry-run, auto-rollback if /health fails
  within 60s; checkpoint saved to .last_deploy_rollback before reset
- deploy/rollback.sh last|<sha> with the same Docker compose dance
- deploy/health-check.sh — curl wrapper for monitoring/oncall
- deploy/apache-{dev,prod}.conf — Location blocks proxying /hm-aiqc/
  to gunicorn on 127.0.0.1:5050 with X-Script-Name set so wsgi.py's
  ReverseProxied middleware emits prefixed URLs
- deploy/.env.{dev,prod}.example — starter envs with Azure SSO config
2026-05-09 14:08:06 +02:00
nickviljoen
84326352b2 Phase 1: replace local username/password auth with Azure AD SSO
Lifted JWT-cookie auth pattern from the AI QC sibling project:
  core/auth/middleware.py validates Azure AD JWTs and stores them in
  an httpOnly cookie (hm_aiqc_auth_token). Tenant membership is
  enforced by JWTValidator's tid check, which is sufficient for the
  tenant-wide access policy chosen for this project.

  templates/login.html now drives an MSAL.js popup that POSTs the
  ID token to /auth/login. base.html exposes Azure config to all
  pages so the logout button can also clear the MSAL session.

  app.py's @before_request now checks the JWT cookie and exposes
  g.user; modules read user identity via core.auth.current_user_email
  so usage logs and created_by columns now record the signed-in
  user's email rather than a session value.

  Legacy username/password code removed: top-level auth_middleware.py,
  jwt_validator.py, deploy/generate_password.py.
2026-05-09 13:59:29 +02:00
nickviljoen
2258fa532b Phase 0: bootstrap Alembic, add /health, prep for Dev/Prod cutover
- core/health blueprint exposes GET /health for deploy smoke tests
- Replace db.create_all() + ensure_schema() ALTER patch with Alembic
- Initial migration captures current schema (5 tables, all indexes)
- docker-entrypoint runs wait_for_db.py + flask db upgrade before gunicorn
2026-05-09 13:47:54 +02:00
nickviljoen
a0a9d0af47 Reporting: show all jobs in Previous Box Reports
Aggregate box_import reports by job_number in SQL instead of fetching
the most recent 100 rows and grouping in Python. The row-level LIMIT
hid older jobs whenever one job's rows filled the window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:15:58 +02:00
nickviljoen
e69f077c79 Add Dev/Prod migration + SSO plan
Captures the four-phase plan to move HM QC from the shared sandbox to
dedicated Dev/Prod servers with Azure AD SSO, mirroring the AI QC sibling
project's pattern. Includes locked-in decisions (URL path, branch strategy,
shared Entra app, fresh-start data), file-by-file lift list from AI QC,
phased checklist, and the IT ticket text. Action deferred to late April.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:09:37 +02:00
nickviljoen
fc11a98a95 v2.5.0: Update README and CHANGELOG
Documents the Video Master 3-pass duration cascade, version-aware folder
discovery, AI Vision swap to Gemini 2.5 Flash, report download endpoint,
and the gunicorn worker-recycle fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 20:20:31 +02:00
nickviljoen
b140ab3860 Gunicorn: raise max_requests and graceful_timeout
The previous max_requests=200 caused workers to recycle every ~5 minutes
under normal progress polling (~40 req/min), killing any in-flight
background matching/QC thread on the worker. Bumping to 5000 means a
worker only recycles after several hours, well past any single job. Also
raise graceful_timeout to 600s so in-flight threads finish on legitimate
shutdowns instead of being SIGKILL'd after 30s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:17:43 +02:00
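The recycle interval in the commit above follows directly from the numbers it quotes. The fragment below shows the two gunicorn settings with the new values plus the arithmetic; the helper function is hypothetical and exists only to make the calculation explicit, and it assumes the ~40 req/min figure applies per worker.

```python
# gunicorn_config.py fragment (values taken from the commit message)
max_requests = 5000      # previously 200
graceful_timeout = 600   # seconds a worker gets to finish during shutdown


def minutes_until_recycle(request_limit: int, requests_per_minute: float) -> float:
    """Minutes before a worker hits its request limit at a steady request rate."""
    return request_limit / requests_per_minute


old_interval = minutes_until_recycle(200, 40)            # 5.0 minutes
new_interval = minutes_until_recycle(max_requests, 40)   # 125.0 minutes
```

At the same polling rate the new limit gives a little over two hours between recycles; lighter polling stretches that further.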
nickviljoen
3dd0420145 Video Master: version grouping, 3-pass duration cascade, report download
- Folder discovery groups files by version (V1, V2, ...); only the highest
  version per master/adapt is matched. Lower versions are reported as
  "superseded" so users can see what was skipped.
- Matching is now an asymmetric 3-pass cascade per adaptation:
    Pass 1: masters of same duration (±0.5s) — pHash + AKAZE
    Pass 2: masters strictly longer than the adapt — pHash + AKAZE
            (shorter masters can't have produced the adapt; never compared)
    Pass 3: AI Vision on same-duration / different-resolution masters,
            triggered only when Passes 1 and 2 find nothing (covers crops).
- AI Vision default switched from gpt-4o to gemini-2.5-flash (~10x cheaper)
  and re-enabled in CampaignMatcher.
- Master temp files now persist for the whole run so Pass 3 can re-read
  frames; cleanup still happens via shutil.rmtree at end of run.
- Report shows a "Resolved at" badge per match (Pass 1/2/3) and a new
  Superseded Files section.
- New /video-master/report/<id>/download endpoint serves the saved HTML
  with attachment headers; Download buttons added to results.html and
  view_report.html.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 12:44:43 +02:00
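The 3-pass cascade in the commit above can be sketched as a candidate-selection step plus a resolver. All names here are hypothetical, `visual_match` and `ai_match` are caller-supplied stand-ins for the pHash + AKAZE and AI Vision comparisons, and reading "strictly longer" as "longer by more than the 0.5s tolerance" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

TOLERANCE_S = 0.5  # same-duration tolerance from the commit


@dataclass(frozen=True)
class Clip:
    name: str
    duration: float
    resolution: Tuple[int, int]


def cascade_candidates(adapt: Clip, masters: List[Clip]) -> Dict[int, List[Clip]]:
    same = [m for m in masters if abs(m.duration - adapt.duration) <= TOLERANCE_S]
    # Shorter masters cannot have produced the adapt, so they appear in no pass.
    longer = [m for m in masters if m.duration > adapt.duration + TOLERANCE_S]
    # Pass 3 covers crops: same duration, different resolution.
    vision = [m for m in same if m.resolution != adapt.resolution]
    return {1: same, 2: longer, 3: vision}


def resolve(
    adapt: Clip,
    masters: List[Clip],
    visual_match: Callable[[Clip, Clip], bool],
    ai_match: Callable[[Clip, Clip], bool],
) -> Tuple[Optional[Clip], Optional[int]]:
    candidates = cascade_candidates(adapt, masters)
    # Later passes only run when every earlier pass found nothing.
    for pass_no, matcher in ((1, visual_match), (2, visual_match), (3, ai_match)):
        for master in candidates[pass_no]:
            if matcher(adapt, master):
                return master, pass_no
    return None, None
```

The returned pass number corresponds to the "Resolved at" badge mentioned in the commit.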
nickviljoen
a2ebc921aa v2.4.0: Update README and CHANGELOG
Bump version to 2.4.0 and document the changes shipped over the last
few commits:
- Pricing references library + Excel mastersheet deterministic parsing
- New lookup shape {_format, _prices}
- Deterministic price matching in HM QC price_currency check
- Video QC multi-file batch processing
- Video QC price/currency check (weight 30) + weighted-mean scoring
- Video QC f-string ValueError fix
- Reporting history dashboard view-details / accordion fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:33:48 +02:00
nickviljoen
39383db95f Pricing refs: Excel support, structured lookup, deterministic price match, video price check
A. Excel upload — /campaigns/pricing/upload now accepts .xlsx/.xls
   alongside .pdf. File picker in the campaigns UI matches.

B. Deterministic Excel parser (openpyxl, no LLM) — looks for H&M-style
   mastersheets:
     - 'MPC Prices' sheet -> flat list of {product_id, language, country,
       price, currency, product_name} entries (this is the gold mine).
     - Regional sheets (AME/CEU/EEU/...) -> formatted prices per locale
       used to derive currency symbol, position, decimal/thousands
       separators. Skips OLD/COPY sheets.
   Verified against the attached 1013A mastersheet: 448 price entries
   across 7 products x 74 locales, 139 locale format entries.

   Parser lives in modules/campaigns/pricing_parser.py alongside the
   existing PDF path (which now also returns the structured form with
   empty _prices).

   New lookup shape stored in PricingReference.parsed_data_json:
     {"_format": {"en-US": {currency_code, symbol, position, ...}, ...},
      "_prices": [{product_id, language, country, price, currency,
                   product_name}, ...]}
   Legacy flat {"<code>": {...}} is still recognised (treated as _format
   only) for backwards compatibility with the legacy global JSON import.

   Model helpers added:
     - PricingReference.get_format_map()
     - PricingReference.get_prices()
   to_dict() now reports price_count alongside entry_count.

C. Upgraded price_currency_check.py — when a pricing reference with
   _prices is attached, the check runs a deterministic comparison:
   detected price(s) -> normalize (_normalize_price handles '$49.99',
   '39,99 €', 'CHF 49.95', '1.234,56', 'Rs. 2,799', '13 995 Ft', '349,-',
   '0.999.000'...) -> compare with tol=0.005 against the expected
   per-locale rows. LLM-based campaign-sheet fallback only runs if no
   _prices are present (legacy PDF reference or has_pricing campaign
   presentation).

D. Video QC price check — new _run_price_check step in the executor.
   Parses filename (Market_lang_CampaignNum_... -> 'lang-Market' locale),
   detects prices across frames via the same Gemini/GPT-4o path the
   other checks use, then deterministic-validates against the attached
   pricing reference. Skipped if no pricing ref, unknown locale, GEN/CEN
   markets, or no price visible in video.

   Overall video score now uses weighted mean of active (non-skipped)
   checks (visual_quality w=50, censorship w=50, price_currency w=30)
   instead of the hardcoded 50/50 split — so skipping any one check
   falls through cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:52:39 +02:00
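The weighted-mean scoring described in section D of the commit above can be illustrated as follows. The weights are the ones quoted in the commit; the function name and the use of `None` to mark a skipped check are assumptions made for this sketch.

```python
from typing import Dict, Optional

WEIGHTS = {"visual_quality": 50, "censorship": 50, "price_currency": 30}


def overall_score(scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Weighted mean over active checks; a score of None marks a skipped check."""
    active = {name: s for name, s in scores.items() if s is not None}
    total_weight = sum(WEIGHTS[name] for name in active)
    if total_weight == 0:
        return None  # every check was skipped
    return sum(WEIGHTS[name] * s for name, s in active.items()) / total_weight
```

When price_currency is skipped, the remaining two checks carry equal weight, so the result is the same as under the earlier 50/50 split.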
nickviljoen
e5d0d468db Pricing references: standalone library (was single global file)
The "Global Pricing Reference" is no longer a single file at
storage/reference/global_pricing.json. Pricing references are now
first-class DB rows (PricingReference model), uploadable as a library
in the Campaigns tab and selectable per-run alongside the campaign
presentation dropdown on the HM QC and Video QC configure pages.

New:
- core/models/pricing_reference.py — PricingReference model: id, name,
  pdf_filename, pdf_path, parsed_content, parsed_data_json, status,
  created_at/by. get_lookup() deserializes parsed_data_json; to_dict()
  powers the dropdown API.
- /campaigns/pricing/upload — creates a PricingReference row, saves PDF
  under storage/pricing_references/<id>/, kicks off background parse.
- /campaigns/pricing/<id> DELETE, /campaigns/api/pricing/list,
  /campaigns/api/pricing/status/<id>.
- Campaigns index: "Pricing References" table card (mirrors the
  presentations card) + upload form with optional name field.

Changed:
- pricing_parser: parse_pricing_pdf_to_dict returns (dict, raw_text);
  new parse_pricing_reference(id) runs the parse against a DB row and
  sets status to ready/error. Legacy file-based path removed.
- QCExecutor and VideoQCExecutor accept pricing_reference_id; load the
  row into context['pricing_reference']={id, name, lookup}.
- BatchQCExecutor and BatchVideoQCExecutor thread pricing_reference_id
  through to per-file executors.
- price_currency_check._validate_currency reads context instead of the
  disk file; returns 'skipped_no_reference' if no ref attached.
- HM QC + Video QC /execute and /execute/batch routes pass
  pricing_reference_id from the JSON payload.
- Configure templates for HM QC and Video QC add a second dropdown
  "Pricing Reference (Optional)" loaded from /campaigns/api/pricing/list.

Backwards compatibility:
- app.py: on startup, if storage/reference/global_pricing.json exists
  and the pricing_references table is empty, import it as a
  "Default (legacy global)" PricingReference row so existing installs
  keep a valid reference attached (user can pick it at configure time).
- config.py: retains GLOBAL_PRICING_{PDF,JSON}_PATH for the legacy
  importer; adds PRICING_REF_STORAGE_PATH for the new per-row storage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:27:09 +02:00
nickviljoen
a0cc96afaf Video QC: multi-file batch upload & processing
Mirrors the existing HM QC batch pattern so Video QC now supports
queueing and processing multiple videos from a single upload.

New:
- batch_executor.py — BatchVideoQCExecutor, sequential processing
  (gc.collect() between videos, cooldown between batches), stamps
  a shared batch_id into each report's metadata_json.
- /video-qc/execute/batch — kicks off a BatchVideoQCExecutor thread.
- /video-qc/results/batch/<session_id> — batch summary card, per-file
  list (filename, score, status, view/download), ZIP download link.
  Reuses results.html with is_batch flag.
- /video-qc/report/<id>/download, /video-qc/report/batch/<id>/download
  (ZIP), /video-qc/report/batch/<id> DELETE.

Changed:
- VideoQCExecutor accepts batch_id; writes it into metadata when set.
- /video-qc/upload accepts multi-file (request.files.getlist('files'))
  with single-file fallback; returns is_batch/filenames/file_count.
- Upload template: drag-and-drop list UI (same pattern as HM QC upload).
- Configure template: shows file count + list, swaps button text and
  POST endpoint based on file_count; redirects to results/batch when
  batch, results when single.
- Video QC index uses QCReport.get_recent_grouped to render "Batch
  Reports" (collapsible per-batch table) + "Individual Reports".

Post-run destinations:
- 1 file -> /video-qc/results/<session_id> (unchanged)
- N files -> /video-qc/results/batch/<session_id> (batch summary +
  list of reports from the run)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:58:46 +02:00