The earlier swap to BOX_CAMPAIGNS_FOLDER_ID=133295752718 was wrong —
Video Master operates on the automation campaigns folder
(156182880490), where subfolders are named by campaign TITLE rather
than the numeric job ID used in Reporting's root.
Reverted the default in config.py and all three .env example files.
Folder naming on Box is inconsistent — '1_CFUL263C01C_Kids drop1' vs
'1_CFUL263C01F-Kids drop 2' vs 'Summer Activation 2026' all coexist.
search_subfolder now strips every non-alphanumeric character from
both the search input and the folder names before substring match,
so:
  "kids drop 1" → matches "1_CFUL263C01C_Kids drop1"
  "Spring 2026" → matches "4023 Spring 2026"
  "winterfilm" → matches "1_WA20263C01 Winter Film"
Form label/placeholder updated to "Campaign Title" with a hint that
spaces/underscores/hyphens/case are all ignored.
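A minimal sketch of the normalized substring match described above. The
helper names (normalize, folder_matches) are hypothetical and are not the
actual search_subfolder code. It also assumes "alphanumeric" means ASCII
letters and digits only.

```python
import re

# Assumption: only ASCII letters and digits survive normalization.
_NON_ALNUM = re.compile(r"[^a-z0-9]")


def normalize(text: str) -> str:
    """Lowercase the text and drop every non-alphanumeric character."""
    return _NON_ALNUM.sub("", text.lower())


def folder_matches(query: str, folder_name: str) -> bool:
    """Return True when the normalized query is a substring of the normalized name."""
    needle = normalize(query)
    # An empty needle is a substring of everything, so treat it as no match.
    return bool(needle) and needle in normalize(folder_name)
```

Rejecting an empty needle keeps a query made only of punctuation from
matching every folder.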
The previous search_subfolder implementation paginated the entire
parent folder before falling back to Box's indexed search API. With
the campaigns folder containing thousands of children, this exceeded
even the new 5-minute background-thread cap and surfaced as 'Search
timed out after 5 minutes' to the user.
Now:
1. Hit the indexed search API first (~1-2s typical, even on huge
parents) — returns immediately on a match.
2. Fall back to a streaming enumeration only for fresh folders Box
hasn't indexed yet (~10 min latency window). Capped at 60s wall
clock so we don't loop forever on a missing campaign.
Also improves the not-found error message to mention the indexing
latency caveat — handles the otherwise-confusing case where a freshly-
created campaign folder isn't searchable for a few minutes.
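The search-first order with a time-boxed fallback could look roughly like
the sketch below. The Box calls are passed in as plain callables because
the real client code is not shown here. find_folder, indexed_search,
iter_children and the injectable clock are all hypothetical names.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def find_folder(
    query: str,
    indexed_search: Callable[[str], Optional[str]],
    iter_children: Callable[[], Iterable[Tuple[str, str]]],
    matches: Callable[[str, str], bool],
    budget_seconds: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Return a folder id, or None when nothing is found within the budget."""
    # Step 1: the indexed search is cheap, so try it first.
    hit = indexed_search(query)
    if hit is not None:
        return hit

    # Step 2: stream the children lazily, giving up once the budget is spent.
    deadline = clock() + budget_seconds
    for folder_id, name in iter_children():
        if clock() > deadline:
            return None
        if matches(query, name):
            return folder_id
    return None
```

Checking the deadline inside the loop is what keeps a missing campaign
from paging through the whole parent folder.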
- /api/search-campaign now kicks off a background thread and returns
immediately. The browser polls /api/progress/<session_id> and fetches
the cached result via the new /api/search-campaign-result/<session_id>
endpoint when complete. Box folder enumeration on a not-found campaign
was taking >30s, exceeding the GCP load balancer's response timeout
and surfacing as 'stream timeout' (not valid JSON) to the user.
- Result cached for 10 min via the existing reporting result_cache
(filesystem-backed → safe across gunicorn workers).
- Form label/placeholder/hint updated: tool accepts a campaign NUMBER,
not a campaign name. Placeholder shows '1993857' instead of
'1011A Spring SS2025'.
Video QC:
* _extract_locale_from_filename now also handles the suffix form
..._XX-yy.ext (case-insensitive both sides), so DOOH/OOH-style
adapt filenames like ..._ES-es.mp4 unblock the price_currency
check instead of skipping with "could not extract locale".
* Batch results page expires the SQLAlchemy session at the top of
the route so the post-completion reload sees committed reports
even when it lands on a different gunicorn worker than the one
that wrote them. Reload delay bumped 1s → 2s for margin.
* visual_quality prompt now passes the filename's market+language
to the LLM and tells it the on-screen copy should be in the
localized language, not the source-language guideline copy.
Stops Spanish-market videos being flagged as "language mismatch
with English campaign guidelines".
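A hedged sketch of the suffix-form locale parsing mentioned in the first
Video QC item. extract_locale_suffix is a hypothetical name, not the real
_extract_locale_from_filename. Reading XX as the market and yy as the
language is an assumption.

```python
import re
from typing import Optional, Tuple

# Matches a trailing "_XX-yy.ext" in any letter case.
_SUFFIX_LOCALE = re.compile(r"_([A-Za-z]{2})-([A-Za-z]{2})\.[A-Za-z0-9]+$")


def extract_locale_suffix(filename: str) -> Optional[Tuple[str, str]]:
    """Return (market, language) from a trailing _XX-yy.ext, or None."""
    match = _SUFFIX_LOCALE.search(filename)
    if match is None:
        return None
    # Assumption: first half is the market, second half is the language.
    return match.group(1).upper(), match.group(2).lower()
```

Normalizing the case on the way out means downstream checks can compare
against a single canonical form.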
Printer Check:
* regions.json rewritten to cover all 10 H&M regions (AME, CEU,
NEU, GCN, IND, SHE, SEU, EEU, EAS, Franchise) with default-all
groups. Two judgement calls vs the screenshot: kept TR for
Turkey (TK is Tokelau in ISO 3166-1 and would break filename
matching) and BR for Brazil (so it stays a 2-letter ISO code
like every other entry).
Campaign codes:
* New core/utils/campaign_code.py is the single source of truth.
Matches both the legacy 4-digits-plus-optional-letter (1013A,
4116) and the new 11-char alphanumeric with year at positions
5-6 (CFUL263C01D). All four prior parser sites now import from
this helper.
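The two code shapes could be recognized with patterns along these lines.
This is a sketch under assumptions, not the contents of campaign_code.py:
the function name is hypothetical, and it assumes uppercase letters and
that the year at positions 5-6 is always two digits.

```python
import re
from typing import Optional

# Legacy: four digits plus an optional letter, e.g. 1013A or 4116.
LEGACY_CODE = re.compile(r"\d{4}[A-Z]?")
# New: 11 alphanumeric characters with a two-digit year at positions 5-6.
MODERN_CODE = re.compile(r"[A-Z0-9]{4}\d{2}[A-Z0-9]{5}")


def classify_campaign_code(text: str) -> Optional[str]:
    """Return "legacy", "modern", or None for anything else."""
    code = text.strip().upper()
    if LEGACY_CODE.fullmatch(code):
        return "legacy"
    if MODERN_CODE.fullmatch(code):
        return "modern"
    return None
```

Using fullmatch avoids accepting longer strings that merely contain a
valid-looking code.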
Video Master:
* BOX_CAMPAIGNS_FOLDER_ID switched 156182880490 → 133295752718
(same root the Reporting tool uses). Updated config.py default
and all three .env example files.
* Match page now shows which Box folder the search runs against
(with a clickable link), and on a not-found error explains what
was searched for so missing-campaign cases are self-diagnosable.
Lifted JWT-cookie auth pattern from the AI QC sibling project:
core/auth/middleware.py validates Azure AD JWTs and stores them in
an httpOnly cookie (hm_aiqc_auth_token). Tenant membership is
enforced by JWTValidator's tid check, which is sufficient for the
tenant-wide access policy chosen for this project.
templates/login.html now drives an MSAL.js popup that POSTs the
ID token to /auth/login. base.html exposes Azure config to all
pages so the logout button can also clear the MSAL session.
app.py's @before_request now checks the JWT cookie and exposes
g.user; modules read user identity via core.auth.current_user_email
so usage logs and created_by columns now record the signed-in
user's email rather than a session value.
Legacy username/password code removed: top-level auth_middleware.py,
jwt_validator.py, deploy/generate_password.py.
- Folder discovery groups files by version (V1, V2, ...); only the highest
version per master/adapt is matched. Lower versions are reported as
"superseded" so users can see what was skipped.
- Matching is now an asymmetric 3-pass cascade per adaptation:
    Pass 1: masters of same duration (±0.5s), compared with pHash + AKAZE
    Pass 2: masters strictly longer than the adapt, compared with pHash + AKAZE
            (shorter masters can't have produced the adapt; never compared)
    Pass 3: AI Vision on same-duration / different-resolution masters,
            triggered only when Passes 1 and 2 find nothing (covers crops).
- AI Vision default switched from gpt-4o to gemini-2.5-flash (~10x cheaper)
and re-enabled in CampaignMatcher.
- Master temp files now persist for the whole run so Pass 3 can re-read
frames; cleanup still happens via shutil.rmtree at end of run.
- Report shows a "Resolved at" badge per match (Pass 1/2/3) and a new
Superseded Files section.
- New /video-master/report/<id>/download endpoint serves the saved HTML
with attachment headers; Download buttons added to results.html and
view_report.html.
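The candidate selection behind the 3-pass cascade can be sketched as
follows. Only the duration and resolution filtering is shown. The pHash,
AKAZE and AI Vision comparisons are out of scope. Clip and
candidates_by_pass are hypothetical names, not the CampaignMatcher API.

```python
from dataclasses import dataclass
from typing import List, Tuple

DURATION_TOLERANCE = 0.5  # seconds, per the ±0.5s rule above


@dataclass(frozen=True)
class Clip:
    name: str
    duration: float
    width: int
    height: int


def candidates_by_pass(
    adapt: Clip, masters: List[Clip]
) -> Tuple[List[Clip], List[Clip], List[Clip]]:
    """Return the master candidates for Pass 1, Pass 2 and Pass 3."""
    # Pass 1: masters whose duration matches within the tolerance.
    same = [m for m in masters if abs(m.duration - adapt.duration) <= DURATION_TOLERANCE]
    # Pass 2: masters strictly longer; shorter ones are never compared.
    longer = [m for m in masters if m.duration - adapt.duration > DURATION_TOLERANCE]
    # Pass 3: same duration but different resolution (possible crops).
    vision = [m for m in same if (m.width, m.height) != (adapt.width, adapt.height)]
    return same, longer, vision
```

A caller would try the Pass 1 list first, then Pass 2, and reach for the
Pass 3 list only when both earlier passes find nothing.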
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UsageLog now records input_tokens and output_tokens separately and costs
each side at its real rate. The old single 'blended' rate underpriced
input-heavy workloads (vision/QC) and overpriced output-heavy ones.
COST_PER_MILLION_TOKENS rebuilt against the live OpenAI, Gemini and
Anthropic pricing pages (GPT-5.4 family, GPT-4.x, o4-mini; Gemini 2.5
Pro/Flash/Flash-Lite + 1.5 legacy; Claude 4.7/4.6/4.5 + 3.x legacy).
Unknown models now warn instead of silently defaulting to $5 per 1M tokens.
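A rough sketch of per-side costing. The rate table holds placeholder
numbers, not real pricing. estimate_cost and RATES are hypothetical names.
Returning 0.0 for an unknown model is an assumption about what happens
after the warning.

```python
import warnings
from typing import Dict, Tuple

# Placeholder (input, output) rates in USD per million tokens. Not real pricing.
RATES: Dict[str, Tuple[float, float]] = {"example-model": (1.0, 4.0)}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost each side at its own rate; warn when the model is not priced."""
    rates = RATES.get(model)
    if rates is None:
        # Assumption: an unpriced model is recorded at zero cost after warning.
        warnings.warn(f"No pricing configured for model {model!r}")
        return 0.0
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

With separate rates, an input-heavy vision call and an output-heavy
generation call with the same token total no longer cost the same.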
Adds idempotent ALTER TABLE migration on startup so existing SQLite DBs
pick up the new columns. Dashboard + API surface the input/output split.
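An idempotent column migration for SQLite can be sketched like this.
add_column_if_missing is a hypothetical helper and the table name in the
usage note is assumed. The identifiers are interpolated into the SQL, so
they must be trusted constants and never user input.

```python
import sqlite3


def add_column_if_missing(
    conn: sqlite3.Connection, table: str, column: str, ddl_type: str
) -> bool:
    """Add the column unless it already exists; return True when it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    conn.commit()
    return True
```

Calling it on every startup for input_tokens and output_tokens on an
assumed usage_log table would be a no-op once the columns exist.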
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AKAZE tier needs the actual video file to extract frames, but our
temp-download-and-delete approach means the file is gone by that point.
Perceptual hash (Tier 1) works fine with saved fingerprint data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add BOX_CAMPAIGNS_FOLDER_ID config (156182880490) separate from
BOX_REPORT_FOLDER_ID which is for QC reports
- Update search_subfolder() to use Box search API first (fast for large
folders with 1000+ campaigns), fall back to folder listing
- Increase folder listing limit from 200 to 500
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full workflow:
- Enter campaign name → search Box for campaign folder
- Auto-discover Global Masters and Regional Masters subfolders
- Preview: shows master count, countries, adaptation count
- Phase 1: Download each master to temp, fingerprint, delete video
- Phase 2: Download each adaptation to temp, match against masters, delete
- Results: per-master adaptation mapping, unmatched items, match rate
- HTML report with detailed breakdown
- Previous Matching Jobs table with View/Delete
Box client additions:
- search_subfolder() - case-insensitive subfolder search
- list_subfolders() - enumerate child folders
- list_video_files() - list video files in folder
- download_file_to_disk() - streaming download for large files (ProRes)
Storage: only fingerprints (~50KB) + key frames stored permanently.
Videos deleted immediately after processing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New blueprint-based module system (hm_qc, video_qc, video_master,
reporting), core framework (database, config, templates), and
unified web interface with progress tracking and tab navigation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>