Commit graph

11 commits

nickviljoen
6b8b8ea5a6 Video Master: revert campaigns folder + lenient name matching
The earlier swap to BOX_CAMPAIGNS_FOLDER_ID=133295752718 (a500d7b088) was
wrong: Video Master operates on the automation campaigns folder
(156182880490), whose subfolders are named by campaign TITLE rather
than by the numeric job ID used in Reporting's root.

Reverted the default in config.py and all three .env example files.

Folder naming on Box is inconsistent: '1_CFUL263C01C_Kids drop1',
'1_CFUL263C01F-Kids drop 2' and 'Summer Activation 2026' all coexist.
search_subfolder now lower-cases and strips every non-alphanumeric
character from both the search input and the folder names before a
substring match, so:
  "kids drop 1"   →  matches "1_CFUL263C01C_Kids drop1"
  "Spring 2026"   →  matches "4023 Spring 2026"
  "winterfilm"    →  matches "1_WA20263C01 Winter Film"
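A minimal sketch of the normalise-then-substring match described above. It assumes ASCII folder names and works on a plain list of names instead of the Box client; `normalise` and `find_matching_folders` are hypothetical names, not the project's search_subfolder:

```python
import re

# Anything outside lower-case ASCII letters and digits is dropped
# (assumes ASCII folder names; accented letters would also be removed).
_NON_ALNUM = re.compile(r"[^0-9a-z]")


def normalise(name: str) -> str:
    """Lower-case the name and drop spaces, underscores, hyphens, etc."""
    return _NON_ALNUM.sub("", name.lower())


def find_matching_folders(query: str, folder_names: list[str]) -> list[str]:
    """Return folder names whose normalised form contains the normalised query."""
    needle = normalise(query)
    if not needle:
        return []  # an empty query would otherwise match every folder
    return [name for name in folder_names if needle in normalise(name)]


folders = [
    "1_CFUL263C01C_Kids drop1",
    "1_CFUL263C01F-Kids drop 2",
    "4023 Spring 2026",
    "1_WA20263C01 Winter Film",
]
print(find_matching_folders("kids drop 1", folders))  # ['1_CFUL263C01C_Kids drop1']
```

Because it is a plain substring test, a short query can hit several folders, so a caller still has to decide what to do with more than one match.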

Form label/placeholder updated to "Campaign Title" with a hint that
spaces/underscores/hyphens/case are all ignored.
2026-05-09 20:19:35 +02:00
nickviljoen
087224976a Box: search-API-first lookup + 60s enumeration cap
The previous search_subfolder implementation paginated the entire
parent folder before falling back to Box's indexed search API. With
the campaigns folder containing thousands of children, this exceeded
even the new 5-minute background-thread cap and surfaced as 'Search
timed out after 5 minutes' to the user.

Now:
  1. Hit the indexed search API first (~1-2s typical, even on huge
     parents) — returns immediately on a match.
  2. Fall back to a streaming enumeration only for fresh folders Box
     hasn't indexed yet (~10 min latency window). Capped at 60s wall
     clock so we don't loop forever on a missing campaign.
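The two-step lookup can be sketched as follows. The Box calls are replaced by injected callables (`search_api`, `iter_children` and `matches` are hypothetical parameters, not Box SDK methods), and the 60s budget is checked against a monotonic clock while the children are streamed lazily:

```python
import time
from typing import Callable, Iterable, List, Optional


def find_subfolder(
    query: str,
    search_api: Callable[[str], List[str]],
    iter_children: Callable[[], Iterable[str]],
    matches: Callable[[str, str], bool],
    deadline_s: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Indexed search first, then a wall-clock-capped streaming enumeration."""
    # Step 1: the indexed search is fast even when the parent has thousands
    # of children, so return as soon as it yields a match.
    for name in search_api(query):
        if matches(query, name):
            return name

    # Step 2: stream the parent's children to catch folders the index has
    # not picked up yet, giving up once the time budget is spent.
    start = clock()
    for name in iter_children():
        if clock() - start > deadline_s:
            break
        if matches(query, name):
            return name
    return None
```

Passing `clock` in keeps the cap testable without waiting a real minute; in production the default `time.monotonic` is used so wall-clock adjustments cannot extend or cut the budget.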

Also improves the not-found error message to mention the indexing
latency caveat, which explains the otherwise confusing case where a
freshly created campaign folder isn't searchable for a few minutes.
2026-05-09 20:03:53 +02:00
nickviljoen
a3aee0de2e Video Master: async campaign search + correct UI labels
- /api/search-campaign now kicks off a background thread and returns
  immediately. The browser polls /api/progress/<session_id> and, once the
  search completes, fetches the cached result from the new
  /api/search-campaign-result/<session_id> endpoint. Previously, Box folder
  enumeration on a not-found campaign took >30s, exceeding the GCP load
  balancer's response timeout and surfacing to the user as 'stream timeout'
  (not valid JSON).
- Result cached for 10 min via the existing reporting result_cache
  (filesystem-backed → safe across gunicorn workers).
- Form label/placeholder/hint updated: tool accepts a campaign NUMBER,
  not a campaign name. Placeholder shows '1993857' instead of
  '1011A Spring SS2025'.
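A rough sketch of the background-thread plus filesystem-cache pattern, under stated assumptions: `start_search`, `get_search_result` and the `search-<session_id>.json` file layout are hypothetical, the Flask routes and progress reporting are omitted, and the 600s TTL mirrors the 10-minute cache above. Writing to a temp file and renaming it means a reader on another gunicorn worker never sees half-written JSON:

```python
import json
import os
import re
import tempfile
import threading
import time
from typing import Any, Callable, Optional

_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def _result_path(cache_dir: str, session_id: str) -> str:
    # Reject ids that could escape cache_dir (e.g. "../x").
    if not _SAFE_ID.fullmatch(session_id):
        raise ValueError(f"bad session id: {session_id!r}")
    return os.path.join(cache_dir, f"search-{session_id}.json")


def start_search(session_id: str, query: str,
                 search_fn: Callable[[str], Any], cache_dir: str) -> threading.Thread:
    """Run search_fn(query) off the request thread and persist the outcome."""
    final_path = _result_path(cache_dir, session_id)

    def worker() -> None:
        try:
            payload = {"ok": True, "result": search_fn(query)}
        except Exception as exc:  # report failures to the poller instead of dying silently
            payload = {"ok": False, "error": str(exc)}
        payload["written_at"] = time.time()
        fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh)
        os.replace(tmp_path, final_path)  # atomic on the same filesystem

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def get_search_result(session_id: str, cache_dir: str,
                      ttl_s: float = 600.0) -> Optional[dict]:
    """Return the cached payload, or None if not ready yet or older than ttl_s."""
    try:
        with open(_result_path(cache_dir, session_id)) as fh:
            payload = json.load(fh)
    except FileNotFoundError:
        return None
    if time.time() - payload["written_at"] > ttl_s:
        return None
    return payload
```

The request handler only calls `start_search` and returns, so its response time no longer depends on how long Box takes to answer.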
2026-05-09 19:52:49 +02:00
nickviljoen
a500d7b088 Six tooling fixes from Dev test pass
Video QC:
* _extract_locale_from_filename now also handles the suffix form
  ..._XX-yy.ext (case-insensitive on both sides), so the price_currency
  check no longer skips DOOH/OOH-style adapt filenames such as
  ..._ES-es.mp4 with "could not extract locale".
* Batch results page expires the SQLAlchemy session at the top of
  the route so the post-completion reload sees committed reports
  even when it lands on a different gunicorn worker than the one
  that wrote them. Reload delay bumped 1s → 2s for margin.
* visual_quality prompt now passes the filename's market+language
  to the LLM and tells it the on-screen copy should be in the
  localized language, not the source-language guideline copy.
  Stops Spanish-market videos from being flagged as "language
  mismatch with English campaign guidelines".
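A minimal sketch of parsing the suffix form `..._XX-yy.ext` case-insensitively. The regex and `extract_suffix_locale` are illustrative assumptions rather than the project's _extract_locale_from_filename, and normalising the result to an upper-case market plus a lower-case language is a choice made only for this example:

```python
import re
from typing import Optional, Tuple

# Underscore, two-letter market, hyphen, two-letter language, then the
# file extension at the end of the name; matched case-insensitively.
_SUFFIX_LOCALE = re.compile(r"_([a-z]{2})-([a-z]{2})\.[a-z0-9]+$", re.IGNORECASE)


def extract_suffix_locale(filename: str) -> Optional[Tuple[str, str]]:
    """Return (MARKET, language) from a ..._XX-yy.ext filename, else None."""
    m = _SUFFIX_LOCALE.search(filename)
    if m is None:
        return None
    market, language = m.groups()
    return market.upper(), language.lower()
```

For example, `extract_suffix_locale("DOOH_Summer_ES-es.mp4")` yields `("ES", "es")`, and `"promo_es-ES.MOV"` gives the same pair.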

Printer Check:
* regions.json rewritten to cover all 10 H&M regions (AME, CEU,
  NEU, GCN, IND, SHE, SEU, EEU, EAS, Franchise) with default-all
  groups. Two judgement calls vs the screenshot: kept TR for
  Turkey (TK is Tokelau in ISO and would break filename matching)
  and BR for Brazil (every other code is 2-letter ISO).

Campaign codes:
* New core/utils/campaign_code.py is the single source of truth.
  It matches both the legacy format of four digits plus an optional
  letter (1013A, 4116) and the new 11-character alphanumeric format
  with the year at positions 5-6 (CFUL263C01D). All four prior parser
  sites now import from this helper.
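A sketch of a parser for the two code shapes, inferred only from the examples above: it assumes the new format is any four alphanumerics, a two-digit year at positions 5-6 (1-indexed), then five alphanumerics. `CampaignCode` and `parse_campaign_code` are hypothetical names, not the contents of core/utils/campaign_code.py:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shapes, based on the examples in this commit message:
#   legacy: four digits plus an optional trailing letter, e.g. 1013A, 4116
#   new:    11 alphanumerics with a two-digit year at positions 5-6, e.g. CFUL263C01D
_LEGACY = re.compile(r"\d{4}[A-Z]?")
_NEW = re.compile(r"[A-Z0-9]{4}(\d{2})[A-Z0-9]{5}")


@dataclass(frozen=True)
class CampaignCode:
    code: str
    kind: str            # "legacy" or "new"
    year: Optional[int]  # only the new format encodes a year


def parse_campaign_code(token: str) -> Optional[CampaignCode]:
    """Classify a single token as a legacy or new campaign code, else None."""
    token = token.strip().upper()
    if _LEGACY.fullmatch(token):
        return CampaignCode(token, "legacy", None)
    m = _NEW.fullmatch(token)
    if m:
        return CampaignCode(token, "new", 2000 + int(m.group(1)))
    return None
```

Using `fullmatch` on an isolated token keeps the legacy pattern from firing on the digits inside a new-format code; pulling codes out of longer strings (folder names, filenames) would need word boundaries instead.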

Video Master:
* BOX_CAMPAIGNS_FOLDER_ID switched 156182880490 → 133295752718
  (same root the Reporting tool uses). Updated config.py default
  and all three .env example files.
* Match page now shows which Box folder the search runs against
  (with a clickable link), and on a not-found error explains what
  was searched for so missing-campaign cases are self-diagnosable.
2026-05-09 18:32:23 +02:00
nickviljoen
84326352b2 Phase 1: replace local username/password auth with Azure AD SSO
Lifted the JWT-cookie auth pattern from the AI QC sibling project:
  core/auth/middleware.py validates Azure AD JWTs and stores them in
  an httpOnly cookie (hm_aiqc_auth_token). Tenant membership is
  enforced by JWTValidator's tid check, which is sufficient for the
  tenant-wide access policy chosen for this project.

  templates/login.html now drives an MSAL.js popup that POSTs the
  ID token to /auth/login. base.html exposes Azure config to all
  pages so the logout button can also clear the MSAL session.

  app.py's @before_request now checks the JWT cookie and exposes
  g.user; modules read user identity via core.auth.current_user_email
  so usage logs and created_by columns now record the signed-in
  user's email rather than a session value.

  Legacy username/password code removed: top-level auth_middleware.py,
  jwt_validator.py, deploy/generate_password.py.
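The tenant-wide policy boils down to a few claim checks once a token's signature has been verified. The sketch below assumes that verification against the tenant's signing keys has already happened upstream, and only compares the standard Azure AD ID-token claims `tid`, `aud`, `exp` and `preferred_username`; `check_claims` and `TokenRejected` are hypothetical names, not the project's JWTValidator:

```python
import time
from typing import Any, Mapping, Optional


class TokenRejected(Exception):
    """Raised when a decoded token must not be accepted."""


def check_claims(claims: Mapping[str, Any], tenant_id: str, client_id: str,
                 now: Optional[float] = None, leeway_s: int = 60) -> str:
    """Validate already signature-verified ID-token claims; return the user id."""
    now = time.time() if now is None else now
    if claims.get("tid") != tenant_id:
        raise TokenRejected("token issued for a different tenant")
    if claims.get("aud") != client_id:
        raise TokenRejected("token issued for a different application")
    if now > claims.get("exp", 0) + leeway_s:
        raise TokenRejected("token expired")
    # preferred_username usually carries the UPN/email for work accounts.
    user = claims.get("preferred_username") or claims.get("email")
    if not user:
        raise TokenRejected("token carries no user identifier")
    return user
```

Since access is tenant-wide, there is no per-user allow-list: any account whose token carries the expected `tid` is admitted.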
2026-05-09 13:59:29 +02:00
nickviljoen
3dd0420145 Video Master: version grouping, 3-pass duration cascade, report download
- Folder discovery groups files by version (V1, V2, ...); only the highest
  version per master/adapt is matched. Lower versions are reported as
  "superseded" so users can see what was skipped.
- Matching is now an asymmetric 3-pass cascade per adaptation:
    Pass 1: masters of same duration (±0.5s) — pHash + AKAZE
    Pass 2: masters strictly longer than the adapt — pHash + AKAZE
            (shorter masters can't have produced the adapt; never compared)
    Pass 3: AI Vision on same-duration / different-resolution masters,
            triggered only when Passes 1 and 2 find nothing (covers crops).
- AI Vision default switched from gpt-4o to gemini-2.5-flash (~10x cheaper)
  and re-enabled in CampaignMatcher.
- Master temp files now persist for the whole run so Pass 3 can re-read
  frames; cleanup still happens via shutil.rmtree at end of run.
- Report shows a "Resolved at" badge per match (Pass 1/2/3) and a new
  Superseded Files section.
- New /video-master/report/<id>/download endpoint serves the saved HTML
  with attachment headers; Download buttons added to results.html and
  view_report.html.
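The candidate selection of the 3-pass cascade can be sketched independently of the matchers. `Clip`, `cascade_match` and the injected `visual_match` (standing in for pHash + AKAZE) and `vision_match` (standing in for AI Vision) are hypothetical. One assumption: "strictly longer" is taken to mean longer than the adapt by more than the ±0.5s tolerance, so no master is compared in both Pass 1 and Pass 2:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

TOLERANCE_S = 0.5


@dataclass(frozen=True)
class Clip:
    name: str
    duration_s: float
    width: int
    height: int


Matcher = Callable[[Clip, Clip], bool]


def cascade_match(adapt: Clip, masters: List[Clip], visual_match: Matcher,
                  vision_match: Matcher) -> Optional[Tuple[Clip, int]]:
    """Return (master, pass_number) for the first hit, or None."""
    same = [m for m in masters if abs(m.duration_s - adapt.duration_s) <= TOLERANCE_S]
    # Shorter masters cannot have produced the adapt, so they are never compared.
    longer = [m for m in masters if m.duration_s > adapt.duration_s + TOLERANCE_S]

    for m in same:    # Pass 1: same duration, cheap visual fingerprints
        if visual_match(adapt, m):
            return m, 1
    for m in longer:  # Pass 2: longer masters (cut-downs), same fingerprints
        if visual_match(adapt, m):
            return m, 2
    resolution = (adapt.width, adapt.height)
    for m in same:    # Pass 3: AI Vision only on same-duration, different-resolution masters
        if (m.width, m.height) != resolution and vision_match(adapt, m):
            return m, 3
    return None
```

Ordering the passes this way means the paid AI Vision call only runs for adapts the fingerprint passes could not resolve, such as crops.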

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 12:44:43 +02:00
nickviljoen
6341714899 Split input/output token tracking; refresh provider pricing table
UsageLog now records input_tokens and output_tokens separately and prices
each side at its own rate. The old single 'blended' rate underpriced
input-heavy workloads (vision/QC) and overpriced output-heavy ones.
COST_PER_MILLION_TOKENS rebuilt against the live OpenAI, Gemini and
Anthropic pricing pages (GPT-5.4 family, GPT-4.x, o4-mini; Gemini 2.5
Pro/Flash/Flash-Lite + 1.5 legacy; Claude 4.7/4.6/4.5 + 3.x legacy).
Unknown models now warn instead of silently defaulting to $5/1M.
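Split pricing is one multiply-add per side. The sketch below uses a placeholder model name and placeholder rates (not real provider prices), and `RATES` and `cost_usd` are hypothetical. The message does not say what cost an unknown model ends up with after the warning, so returning 0.0 here is purely an assumption of the sketch:

```python
import logging
from typing import Dict, Tuple

log = logging.getLogger(__name__)

# Placeholder (input, output) USD rates per 1M tokens; NOT real prices.
RATES: Dict[str, Tuple[float, float]] = {
    "example-model": (1.25, 10.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int,
             rates: Dict[str, Tuple[float, float]] = RATES) -> float:
    """Price input and output tokens separately, each at its own rate."""
    if model not in rates:
        # Warn loudly instead of silently assuming a default rate.
        log.warning("no pricing for model %r; recording cost as 0", model)
        return 0.0
    in_rate, out_rate = rates[model]
    return input_tokens / 1_000_000 * in_rate + output_tokens / 1_000_000 * out_rate
```

With these placeholder rates, 2M input tokens plus 0.5M output tokens cost 2.0 × 1.25 + 0.5 × 10.00 = 7.50 USD.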

Adds an idempotent ALTER TABLE migration on startup so existing SQLite DBs
pick up the new columns. Dashboard + API surface the input/output split.
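An idempotent column migration for SQLite can check `PRAGMA table_info` before each `ALTER TABLE ... ADD COLUMN`, so running it on every startup is harmless. A minimal sketch with the standard-library `sqlite3` module; `ensure_columns` and the `usage_log` table name are hypothetical:

```python
import sqlite3
from typing import Dict, List


def ensure_columns(conn: sqlite3.Connection, table: str,
                   columns: Dict[str, str]) -> List[str]:
    """Add any missing columns to `table`; return the names actually added.

    `table` and column names are interpolated into SQL, so they must be
    trusted constants from the code, never user input.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, decl in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {decl}")
            added.append(name)
    conn.commit()
    return added
```

A second call with the same columns finds them already present and returns an empty list.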

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:40:13 +02:00
nickviljoen
5267e590eb Disable AKAZE for campaign matching — temp files deleted before use
The AKAZE tier needs the actual video file to extract frames, but our
temp-download-and-delete approach means the file is gone by that point.
Perceptual hashing (Tier 1) works fine with the saved fingerprint data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 22:55:42 +02:00
nickviljoen
80d305d123 Fix Video Master: use correct Box campaigns folder ID, improve search
- Add BOX_CAMPAIGNS_FOLDER_ID config (156182880490) separate from
  BOX_REPORT_FOLDER_ID which is for QC reports
- Update search_subfolder() to use Box search API first (fast for large
  folders with 1000+ campaigns), fall back to folder listing
- Increase folder listing limit from 200 to 500

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 21:15:59 +02:00
nickviljoen
7feead49d1 Implement Video Master: campaign-based master-to-adaptation matching
Full workflow:
- Enter campaign name → search Box for campaign folder
- Auto-discover Global Masters and Regional Masters subfolders
- Preview: shows master count, countries, adaptation count
- Phase 1: Download each master to temp, fingerprint, delete video
- Phase 2: Download each adaptation to temp, match against masters, delete
- Results: per-master adaptation mapping, unmatched items, match rate
- HTML report with detailed breakdown
- Previous Matching Jobs table with View/Delete

Box client additions:
- search_subfolder() - case-insensitive subfolder search
- list_subfolders() - enumerate child folders
- list_video_files() - list video files in folder
- download_file_to_disk() - streaming download for large files (ProRes)

Storage: only fingerprints (~50KB) and key frames are stored permanently.
Videos deleted immediately after processing.
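The download-fingerprint-delete cycle used in Phases 1 and 2 can be sketched with a throwaway temp directory. `process_via_temp` and its `download`/`fingerprint` callables are hypothetical stand-ins for download_file_to_disk() and the fingerprinting step. The `finally` block removes the video even if fingerprinting fails, which is also why any later stage that needs the file itself can no longer read it:

```python
import os
import shutil
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def process_via_temp(download: Callable[[str], None],
                     fingerprint: Callable[[str], T],
                     suffix: str = ".mov") -> T:
    """Download a video to a temp dir, fingerprint it, and always delete it."""
    tmp_dir = tempfile.mkdtemp(prefix="video-master-")
    path = os.path.join(tmp_dir, "source" + suffix)
    try:
        download(path)           # e.g. a streaming download for large ProRes files
        return fingerprint(path) # only this small result is kept
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Only the returned fingerprint outlives the call, which matches the storage policy of keeping fingerprints and key frames but never the videos.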

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 21:06:37 +02:00
nickviljoen
e6f3e9387e Add modular architecture, core framework, and web UI
New blueprint-based module system (hm_qc, video_qc, video_master,
reporting), core framework (database, config, templates), and
unified web interface with progress tracking and tab navigation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 11:39:04 +02:00