Lifted JWT-cookie auth pattern from the AI QC sibling project:
core/auth/middleware.py validates Azure AD JWTs and stores them in
an httpOnly cookie (hm_aiqc_auth_token). Tenant membership is
enforced by JWTValidator's tid check, which is sufficient for the
tenant-wide access policy chosen for this project.
templates/login.html now drives an MSAL.js popup that POSTs the
ID token to /auth/login. base.html exposes Azure config to all
pages so the logout button can also clear the MSAL session.
app.py's @before_request now checks the JWT cookie and exposes
g.user; modules read user identity via core.auth.current_user_email
so usage logs and created_by columns now record the signed-in
user's email rather than a session value.
Legacy username/password code removed: top-level auth_middleware.py,
jwt_validator.py, deploy/generate_password.py.
Aggregate box_import reports by job_number in SQL instead of fetching
the most recent 100 rows and grouping in Python. The row-level LIMIT
hid older jobs whenever one job's rows filled the window.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Folder discovery groups files by version (V1, V2, ...); only the highest
version per master/adapt is matched. Lower versions are reported as
"superseded" so users can see what was skipped.
- Matching is now an asymmetric 3-pass cascade per adaptation:
    Pass 1: masters of same duration (±0.5s), via pHash + AKAZE
    Pass 2: masters strictly longer than the adapt, via pHash + AKAZE
            (shorter masters can't have produced the adapt; never compared)
    Pass 3: AI Vision on same-duration / different-resolution masters,
            triggered only when Passes 1 and 2 find nothing (covers crops).
- AI Vision default switched from gpt-4o to gemini-2.5-flash (~10x cheaper)
and re-enabled in CampaignMatcher.
- Master temp files now persist for the whole run so Pass 3 can re-read
frames; cleanup still happens via shutil.rmtree at end of run.
- Report shows a "Resolved at" badge per match (Pass 1/2/3) and a new
Superseded Files section.
- New /video-master/report/<id>/download endpoint serves the saved HTML
with attachment headers; Download buttons added to results.html and
view_report.html.
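The 3-pass cascade described above could be sketched roughly as follows. This is an illustration only: Clip, match_adapt and the two matcher callables are hypothetical stand-ins (the real comparisons are pHash + AKAZE and AI Vision), and reading "strictly longer" as "longer by more than the 0.5s tolerance" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

DURATION_TOL = 0.5  # seconds; the "±0.5s" window from the notes above


@dataclass(frozen=True)
class Clip:
    name: str
    duration: float               # seconds
    resolution: Tuple[int, int]   # (width, height)


Matcher = Callable[[Clip, Clip], bool]  # (master, adapt) -> is it a match?


def same_duration(master: Clip, adapt: Clip) -> bool:
    return abs(master.duration - adapt.duration) <= DURATION_TOL


def match_adapt(
    adapt: Clip,
    masters: Sequence[Clip],
    visual_match: Matcher,   # stand-in for the pHash + AKAZE comparison
    ai_match: Matcher,       # stand-in for the AI Vision comparison
) -> Optional[Tuple[Clip, int]]:
    """Return (master, pass_number) for the first hit, or None."""
    # Pass 1: same-duration masters.
    for m in masters:
        if same_duration(m, adapt) and visual_match(m, adapt):
            return m, 1
    # Pass 2: masters longer than the adapt (assumed: beyond the tolerance).
    for m in masters:
        if m.duration > adapt.duration + DURATION_TOL and visual_match(m, adapt):
            return m, 2
    # Pass 3: only reached when Passes 1 and 2 found nothing.
    for m in masters:
        if (same_duration(m, adapt) and m.resolution != adapt.resolution
                and ai_match(m, adapt)):
            return m, 3
    return None


masters = [
    Clip("master_15s", 15.0, (1920, 1080)),
    Clip("master_30s", 30.0, (1920, 1080)),
    Clip("master_60s", 60.0, (1920, 1080)),
]
adapt = Clip("adapt_30s_square", 30.2, (1080, 1080))

compared = []  # records which masters the visual matcher was asked about


def never_matches(master: Clip, candidate: Clip) -> bool:
    compared.append(master.name)
    return False


result = match_adapt(adapt, masters, never_matches, lambda m, a: True)
```

Here the visual matcher finds nothing, so the square crop is resolved in Pass 3 against master_30s, and the 15s master is never handed to the visual matcher.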
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A. Excel upload — /campaigns/pricing/upload now accepts .xlsx/.xls
alongside .pdf. File picker in the campaigns UI matches.
B. Deterministic Excel parser (openpyxl, no LLM). Looks for H&M-style
mastersheets:
  - 'MPC Prices' sheet -> flat list of {product_id, language, country,
    price, currency, product_name} entries (the primary source of
    actual price values).
  - Regional sheets (AME/CEU/EEU/...) -> formatted prices per locale,
    used to derive currency symbol, position, decimal/thousands
    separators. Skips OLD/COPY sheets.
Verified against the attached 1013A mastersheet: 448 price entries
across 7 products x 74 locales, 139 locale format entries.
Parser lives in modules/campaigns/pricing_parser.py alongside the
existing PDF path (which now also returns the structured form with
empty _prices).
New lookup shape stored in PricingReference.parsed_data_json:
  {"_format": {"en-US": {currency_code, symbol, position, ...}, ...},
   "_prices": [{product_id, language, country, price, currency,
                product_name}, ...]}
Legacy flat {"<code>": {...}} is still recognised (treated as _format
only) for backwards compatibility with the legacy global JSON import.
Model helpers added:
- PricingReference.get_format_map()
- PricingReference.get_prices()
to_dict() now reports price_count alongside entry_count.
C. Upgraded price_currency_check.py — when a pricing reference with
_prices is attached, the check runs a deterministic comparison:
detected price(s) -> normalize (_normalize_price handles '$49.99',
'39,99 €', 'CHF 49.95', '1.234,56', 'Rs. 2,799', '13 995 Ft', '349,-',
'0.999.000'...) -> compare with tol=0.005 against the expected
per-locale rows. LLM-based campaign-sheet fallback only runs if no
_prices are present (legacy PDF reference or has_pricing campaign
presentation).
D. Video QC price check — new _run_price_check step in the executor.
Parses filename (Market_lang_CampaignNum_... -> 'lang-Market' locale),
detects prices across frames via the same Gemini/GPT-4o path the
other checks use, then deterministic-validates against the attached
pricing reference. Skipped if no pricing ref, unknown locale, GEN/CEN
markets, or no price visible in video.
Overall video score now uses the weighted mean of active (non-skipped)
checks (visual_quality w=50, censorship w=50, price_currency w=30)
instead of the hardcoded 50/50 split, so a skipped check simply drops
out of the mean and the remaining weights are renormalised.
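A small sketch of the weighted mean over active checks. The weights are the ones listed above; the function name and the convention of representing a skipped check as None are assumptions for illustration, not the executor's actual code.

```python
WEIGHTS = {"visual_quality": 50, "censorship": 50, "price_currency": 30}


def overall_score(results):
    """results maps check name -> score (0-100), or None when the check was skipped."""
    active = {name: score for name, score in results.items() if score is not None}
    total_weight = sum(WEIGHTS[name] for name in active)
    if total_weight == 0:
        return None  # nothing ran, so there is no meaningful score
    weighted_sum = sum(WEIGHTS[name] * score for name, score in active.items())
    return weighted_sum / total_weight


# All three checks active: (50*80 + 50*100 + 30*60) / 130
all_active = overall_score(
    {"visual_quality": 80, "censorship": 100, "price_currency": 60}
)

# Price check skipped: the remaining weights are 50/50 again.
price_skipped = overall_score(
    {"visual_quality": 80, "censorship": 100, "price_currency": None}
)
```

Dividing by the sum of the active weights is what makes a skipped check neutral: it neither adds to nor drags down the result.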
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Global Pricing Reference" is no longer a single file at
storage/reference/global_pricing.json. Pricing references are now
first-class DB rows (PricingReference model), uploadable as a library
in the Campaigns tab and selectable per-run alongside the campaign
presentation dropdown on the HM QC and Video QC configure pages.
New:
- core/models/pricing_reference.py — PricingReference model: id, name,
pdf_filename, pdf_path, parsed_content, parsed_data_json, status,
created_at/by. get_lookup() deserializes parsed_data_json; to_dict()
powers the dropdown API.
- /campaigns/pricing/upload — creates a PricingReference row, saves PDF
under storage/pricing_references/<id>/, kicks off background parse.
- /campaigns/pricing/<id> DELETE, /campaigns/api/pricing/list,
/campaigns/api/pricing/status/<id>.
- Campaigns index: "Pricing References" table card (mirrors the
presentations card) + upload form with optional name field.
Changed:
- pricing_parser: parse_pricing_pdf_to_dict returns (dict, raw_text);
new parse_pricing_reference(id) runs the parse against a DB row and
sets status to ready/error. Legacy file-based path removed.
- QCExecutor and VideoQCExecutor accept pricing_reference_id; load the
row into context['pricing_reference']={id, name, lookup}.
- BatchQCExecutor and BatchVideoQCExecutor thread pricing_reference_id
through to per-file executors.
- price_currency_check._validate_currency reads context instead of the
disk file; returns 'skipped_no_reference' if no ref attached.
- HM QC + Video QC /execute and /execute/batch routes pass
pricing_reference_id from the JSON payload.
- Configure templates for HM QC and Video QC add a second dropdown
"Pricing Reference (Optional)" loaded from /campaigns/api/pricing/list.
Backwards compatibility:
- app.py: on startup, if storage/reference/global_pricing.json exists
and the pricing_references table is empty, import it as a
"Default (legacy global)" PricingReference row so existing installs
keep a valid reference attached (user can pick it at configure time).
- config.py: retains GLOBAL_PRICING_{PDF,JSON}_PATH for the legacy
importer; adds PRICING_REF_STORAGE_PATH for the new per-row storage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the existing HM QC batch pattern so Video QC now supports
queueing and processing multiple videos from a single upload.
New:
- batch_executor.py — BatchVideoQCExecutor, sequential processing
(gc.collect() between videos, cooldown between batches), stamps
a shared batch_id into each report's metadata_json.
- /video-qc/execute/batch — kicks off a BatchVideoQCExecutor thread.
- /video-qc/results/batch/<session_id> — batch summary card, per-file
list (filename, score, status, view/download), ZIP download link.
Reuses results.html with is_batch flag.
- /video-qc/report/<id>/download, /video-qc/report/batch/<id>/download
(ZIP), /video-qc/report/batch/<id> DELETE.
Changed:
- VideoQCExecutor accepts batch_id; writes it into metadata when set.
- /video-qc/upload accepts multi-file (request.files.getlist('files'))
with single-file fallback; returns is_batch/filenames/file_count.
- Upload template: drag-and-drop list UI (same pattern as HM QC upload).
- Configure template: shows file count + list, swaps button text and
POST endpoint based on file_count; redirects to results/batch when
batch, results when single.
- Video QC index uses QCReport.get_recent_grouped to render "Batch
Reports" (collapsible per-batch table) + "Individual Reports".
Post-run destinations:
- 1 file -> /video-qc/results/<session_id> (unchanged)
- N files -> /video-qc/results/batch/<session_id> (batch summary +
list of reports from the run)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- video_qc/executor.py: escape braces in JSON example blocks inside
f-string prompts (visual_quality, censorship). Unescaped { } made
Python parse the example as format specifiers, raising
"Invalid format specifier ' 85, ..." and failing execution.
- reporting/routes.py: history_dashboard now passes reports=parsed_reports
(matching the live dashboard route) and attaches friendly_checks per
report. Previously passed parsed_reports=friendly_reports, a kwarg
the template does not consume, leaving the Parsed Data View accordion
empty and breaking the "View Details" scroll-to-file links.
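For reference, a minimal reproduction of the f-string brace issue. The prompt text and function names are invented for illustration; only the escaping rule itself (doubled braces are emitted as literal braces) is standard Python behaviour.

```python
import json


def broken_prompt() -> str:
    # Unescaped braces: Python evaluates the expression "score" and treats
    # everything after the colon as a format specifier, which fails at runtime.
    return f'{"score": 85, "issues": []}'


def build_prompt(filename: str) -> str:
    # Doubled braces {{ }} come out as literal { }; single braces stay
    # reserved for real substitutions such as {filename}.
    return f"""Analyse the frames of {filename}.
Respond with JSON in this shape:
{{"score": 85, "issues": ["example issue"]}}"""


try:
    broken_prompt()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

prompt = build_prompt("clip.mp4")
example = json.loads(prompt.splitlines()[-1])
```

The failure only shows up when the prompt is built, not at import time, which is why it surfaced as a failed execution rather than a syntax error.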
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UsageLog now records input_tokens and output_tokens separately and costs
each side at its real rate. The old single 'blended' rate underpriced
input-heavy workloads (vision/QC) and overpriced output-heavy ones.
COST_PER_MILLION_TOKENS rebuilt against the live OpenAI, Gemini and
Anthropic pricing pages (GPT-5.4 family, GPT-4.x, o4-mini; Gemini 2.5
Pro/Flash/Flash-Lite + 1.5 legacy; Claude 4.7/4.6/4.5 + 3.x legacy).
Unknown models now warn instead of silently defaulting to $5/1M.
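A sketch of per-side costing with a warning for unknown models. The rates and the model name below are made-up placeholders, not values from the rebuilt table, and returning 0.0 for an unknown model is an assumption about the fallback.

```python
import warnings

# Hypothetical rates in USD per 1M tokens, for illustration only.
COST_PER_MILLION_TOKENS = {
    "example-model": {"input": 1.00, "output": 4.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    rates = COST_PER_MILLION_TOKENS.get(model)
    if rates is None:
        # Warn instead of silently applying a default rate.
        warnings.warn(f"No pricing entry for model {model!r}; cost recorded as 0")
        return 0.0
    return (
        input_tokens * rates["input"] + output_tokens * rates["output"]
    ) / 1_000_000


# An input-heavy call: 900k input tokens, 100k output tokens.
split_cost = estimate_cost("example-model", 900_000, 100_000)

# The same call priced at a single blended rate (midpoint of the two sides).
blended_rate = (1.00 + 4.00) / 2
blended_cost = (900_000 + 100_000) * blended_rate / 1_000_000
```

With these placeholder rates the split and blended estimates for the same call differ by almost a factor of two, which is the kind of skew the separate columns are meant to remove.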
Adds idempotent ALTER TABLE migration on startup so existing SQLite DBs
pick up the new columns. Dashboard + API surface the input/output split.
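One way such an idempotent migration can look with the standard sqlite3 module. The table name usage_logs and the helper name ensure_columns are assumptions; the actual startup code may differ.

```python
import sqlite3


def ensure_columns(conn, table, columns):
    """Add any missing columns to `table`; safe to call on every startup.

    `columns` maps column name -> SQL type/default clause.
    Returns the list of columns that were actually added.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, ddl in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {ddl}")
            added.append(name)
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage_logs (id INTEGER PRIMARY KEY, total_tokens INTEGER)")

new_columns = {
    "input_tokens": "INTEGER DEFAULT 0",
    "output_tokens": "INTEGER DEFAULT 0",
}
first_run = ensure_columns(conn, "usage_logs", new_columns)
second_run = ensure_columns(conn, "usage_logs", new_columns)
```

The second call is a no-op because the column check runs before each ALTER TABLE. The table and column names are interpolated into the SQL, so they must come from trusted constants, never from user input.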
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Show job number in batch header instead of just "Batch <date>"
- Add delete batch button (trash icon) that removes all reports + files
- New DELETE /hm-qc/report/batch/<batch_id> route
- Unified batch results view: always renders from DB reports (not
ephemeral progress tracker data), so the view is identical whether
you just completed a batch or navigated back from another tab
- Include thumbnails in batch results per-file rows
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Batch QC still ran out of memory after 7 files despite sequential
processing. Root cause: CPython's allocator often does not return freed
memory to the OS, so image buffers accumulate across files until the
OOM killer strikes.
Fixes:
- Reduce LLM image max size from 2000px to 1200px (64% less RAM per
image, still sufficient for vision analysis)
- Always close PIL images immediately (not just when opened locally)
- Replace ThreadPoolExecutor with simple sequential loop + gc.collect()
after each file to force memory reclamation
- Switch gunicorn to gthread (2 workers x 2 threads) for better
request concurrency without extra memory overhead
- Add max_requests=200 to auto-recycle workers and release accumulated
memory
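A rough sketch of two of the points above: the pixel arithmetic behind the 64% figure and the sequential loop with an explicit collection after each file. It assumes memory scales with pixel count and that both dimensions shrink by the same factor; process_sequentially and handle_file are hypothetical names.

```python
import gc


def pixel_saving(old_max: int, new_max: int) -> float:
    """Fraction of pixels saved when the longest side drops from old_max to new_max."""
    scale = new_max / old_max
    return 1 - scale ** 2  # both dimensions shrink by the same factor


def process_sequentially(paths, handle_file):
    """Handle one file at a time and collect garbage before starting the next."""
    results = []
    for path in paths:
        results.append(handle_file(path))
        gc.collect()  # release unreachable buffers before loading the next image
    return results


saving = pixel_saving(2000, 1200)  # (1200 / 2000) ** 2 = 0.36 of the pixels remain
sizes = process_sequentially(["a.jpg", "b.jpg"], lambda path: len(path))
```

gc.collect() frees unreachable Python objects but does not guarantee that the process footprint shrinks, which fits with the worker recycling via max_requests listed above.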
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Worker was SIGKILL'd by OOM killer during batch QC (18 files). Fixes:
- Reduce MAX_CONCURRENT_FILES from 2 to 1 (sequential processing)
- Reduce gunicorn workers from 4 to 2 (less memory contention)
- Explicitly close PIL images after thumbnail generation
- Close BytesIO buffers and PIL images after base64 encoding
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Embed asset thumbnail (base64) in HTML report header
- Add view/download buttons to batch results per-file rows
- Add download ZIP and consolidated report buttons to batch results
- Add view/download buttons to upload page recent reports table
- Add download button to individual reports on index page
- New POST /hm-qc/report/consolidated route: merges selected reports
into a single downloadable HTML with summary table + embedded reports
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ThreadPoolExecutor workers don't inherit the parent thread's Flask app
context, causing "Working outside of application context" errors during
batch QC execution. Pass the app instance into BatchQCExecutor and wrap
each child thread's work with app.app_context(). Also ensure the
progress_sessions table is created on fresh databases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Video QC: Switch to Google Gemini direct video analysis as default (OpenAI frame grid fallback)
- HM QC: Group reports by batch with collapsible sections, ZIP download per batch
- HM QC: Generate asset thumbnails (150px) displayed in report listings
- Speed: Remove artificial delays, add ThreadPoolExecutor(2) for parallel batch processing
- Price detection: Improved prompt with country context, detect all prices, increased text limit
- New Printer Check module: CSV-to-PDF cross-referencing ported from CrossMatch Rust app
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Export/download links in reporting dashboards used hardcoded absolute
paths (e.g. /reporting/export/html/...) which bypassed the reverse
proxy SCRIPT_NAME prefix (/hm-ai-qc-report), causing "No file" errors.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Strip markdown code fences from LLM response before JSON parsing
- Log raw response and parsed result for debugging
- Show warning with provider/model info when detection fails (instead of silent skip)
- Separate "detection failed" (warning, score 70) from "no price found" (skipped, score 100)
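A sketch of the fence-stripping step before JSON parsing, using only the standard library. The helper name parse_llm_json and the exact pattern are assumptions; the real implementation may be more lenient.

```python
import json
import re

# Built at runtime so this example contains no literal fence of its own.
FENCE = "`" * 3

# Optional language tag after the opening fence, then the payload, then the closing fence.
_FENCED = re.compile(
    r"^" + FENCE + r"[\w-]*[ \t]*\n(.*?)\n?" + FENCE + r"$",
    re.DOTALL,
)


def parse_llm_json(raw: str):
    """Parse a model response as JSON, tolerating a surrounding markdown fence."""
    text = raw.strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1).strip()
    # json.JSONDecodeError propagates so the caller can report "detection failed".
    return json.loads(text)


fenced_reply = FENCE + "json\n" + '{"prices": ["49.99"]}' + "\n" + FENCE
parsed = parse_llm_json(fenced_reply)
```

Letting the decode error propagate keeps the "detection failed" case distinct from a valid response that simply contains no prices.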
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Accept .xlsx/.xls uploads alongside PDFs in campaigns module
- New parse_campaign_excel() in services.py using openpyxl
- Converts all sheets to structured text (headers + rows) for LLM use
- Upload form now accepts both PDF and Excel files
- Added openpyxl to requirements.txt
Workflow: upload campaign presentation (PDF) + media plan (Excel with
has_pricing checked) for the same campaign ID. The price check will
use the Excel data to validate actual prices per country.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Global pricing parser now explicitly extracts format only (symbol,
position, separators) — ignores actual price values in the reference doc
- Executors load ALL ready documents for a campaign (not just the latest),
combining their content — supports guidelines + media plan side by side
- Campaign context now separates pricing_content (from has_pricing docs)
from general parsed_content (all docs combined)
- Price check uses pricing_content specifically for actual price validation
- Report header shows document count (e.g., "1022B - AW25 Display (2 docs) + pricing")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Filename check:
- Rewritten to flexibly parse multiple H&M naming conventions
(Display, DOOH, OOH, SOME STATIC, Social, POS, DS)
- Extracts country code, language code, dimensions, campaign number
- Scores based on how much metadata was extracted (not rigid pattern)
- Tested against real filenames: BG_bg, ES_es, NO-no formats
Price/currency check (new):
- Detects prices in images via LLM vision API
- Validates currency against global pricing reference (deterministic)
- Falls back to LLM validation for unknown countries
- Optional campaign pricing sheet validation when has_pricing=True
- Added to profile with weight 30
Profile weights rebalanced: filename 30, quality 40, price 30
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add /report/<id>/download route using send_file instead of broken
static file URL (fixes 404 on Download Report button)
- Add campaign label to HTML report header (Campaign: ID - Name)
- Store campaign_id in report metadata_json for traceability
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix background parsing thread: pass app reference explicitly instead of
trying to access current_app inside the thread (was silently failing)
- Add progress bar with animated stages during upload and parsing
- Add data-id/data-status attributes to table rows for auto-polling
- On page load, automatically poll any pending/parsing rows and update
their status badges in-place (fixes stale "Pending" on tab return)
- Immediately inject new row into table after upload so user sees it
without needing to refresh
- Remove broken _parse_pricing_background function
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces a new Campaigns module for uploading campaign presentation PDFs
that QC checks reference to validate assets against campaign-specific
guidelines (typography, layout, copy, pricing format). Also adds a global
pricing reference system that maps country codes to currency symbols and
formats for deterministic price/currency validation.
- New CampaignPresentation model + campaigns blueprint with CRUD routes
- PDF parsing via LlamaParse (text + multimodal page images)
- Global pricing PDF parsed into structured JSON lookup
- Campaign context injected into both image and video QC executors
- Quality checks enhanced with campaign guidelines in LLM prompts
- Price/currency check uses global pricing lookup (saves an LLM call)
- Campaign dropdown added to HM QC and Video QC configure pages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AKAZE tier needs the actual video file to extract frames, but our
temp-download-and-delete approach means the file is gone by that point.
Perceptual hash (Tier 1) works fine with saved fingerprint data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add BOX_CAMPAIGNS_FOLDER_ID config (156182880490) separate from
BOX_REPORT_FOLDER_ID which is for QC reports
- Update search_subfolder() to use Box search API first (fast for large
folders with 1000+ campaigns), fall back to folder listing
- Increase folder listing limit from 200 to 500
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full workflow:
- Enter campaign name → search Box for campaign folder
- Auto-discover Global Masters and Regional Masters subfolders
- Preview: shows master count, countries, adaptation count
- Phase 1: Download each master to temp, fingerprint, delete video
- Phase 2: Download each adaptation to temp, match against masters, delete
- Results: per-master adaptation mapping, unmatched items, match rate
- HTML report with detailed breakdown
- Previous Matching Jobs table with View/Delete
Box client additions:
- search_subfolder() - case-insensitive subfolder search
- list_subfolders() - enumerate child folders
- list_video_files() - list video files in folder
- download_file_to_disk() - streaming download for large files (ProRes)
Storage: only fingerprints (~50KB) + key frames stored permanently.
Videos deleted immediately after processing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Censorship check now runs only if the filename contains the _CEN suffix
  (matches legacy behavior). Non-CEN files are marked "skipped" with a
  score of 100.
- When censorship is skipped, the visual quality score counts for 100% of
  the overall score.
- Updated language consistency prompt to avoid false positives:
  - Words like "Rock" (German for skirt) and "Mode" (fashion) are valid in
    the primary language and must not be flagged
  - Must verify a word is NOT valid in the primary language before flagging
  - Brand names and international terms are excluded from checks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full video QC workflow:
- Upload → Configure (LLM provider + job number) → Execute → Results
- Extracts 1 frame per second using existing FFmpeg/extract_thumbnails()
- Stitches frames into labeled grid image for efficient AI analysis
- Two separate AI checks:
  1. Visual Quality (50%): language consistency, text legibility, logo clarity
  2. Censorship (50%): body coverage and content appropriateness
- Progress tracking via SSE/polling
- HTML report generation with per-check scores
- Previous Video QC Reports table with View/Delete on index page
- Usage dashboard integration (logs tokens + cost per API call)
- Supports OpenAI GPT-4o and Google Gemini provider choice
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New UsageLog model tracking every LLM API call (provider, model,
tokens, estimated cost, user, module, check name)
- Instrument LLMConfig.call_vision_api() to auto-log each call
- New /usage tab in nav bar with dashboard showing:
  - Summary cards (total calls, tokens, estimated cost)
  - Breakdowns by provider, model, tool, and user
  - Recent API calls table
  - Time filters (All Time, 30 Days, 7 Days, Today)
- Cost estimates based on per-model token pricing
- Pass logged-in user through executor context for tracking
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- HM QC: trash icon per report row, DELETE /hm-qc/report/<id> removes
DB record and file from disk
- Reporting: trash icon per Box job row, DELETE /reporting/history/delete/<job>
removes all saved Box reports for that job number
- Confirmation prompts before deletion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add /hm-qc/report/<id> route to serve saved reports by database ID
- Create view_report.html template with score summary and embedded report iframe
- Add "View" button column to Previous QC Reports table
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace dimension_check with filename_parse in H&M Image Check profile
- Rewrite quality check prompt to be much stricter on text legibility:
  - Text legibility is now the #1 priority (CRITICAL check)
  - Any illegible text forces score below 70 (FAILED)
  - Explicit instructions to check ALL text including small overlays
  - Low contrast text on dark/busy backgrounds flagged as common failure
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove "Previous QC Reports" table from reporting index
- Add "Previous QC Reports" table to HM QC index page
- Update HM QC index route to pass recent reports
- Update feature list to reflect current checks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add /reporting/history/<job_number> route that loads saved reports from
disk/database instead of re-fetching from Box
- Split "Previous Searches" into "Previous Box Reports" and "Previous QC
Reports" sections with separate tables
- "View" buttons link to history_dashboard (reads from saved files)
- Box reports show job-grouped view, QC reports show individual files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- After a successful Box search, save downloaded HTML reports to disk
and record them in qc_reports table (report_type='box_import')
- Skip duplicates by checking box_id in metadata
- Update reporting index to show "Previous Searches" with source badges
- Rename "Recent Reports" to "Previous Searches" for clarity
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update image quality prompt to evaluate text/title legibility
- Add Google Gemini (generativeai) as LLM provider in LLMConfig
- Add AI Provider dropdown on configure page (OpenAI GPT-4o / Google Gemini)
- Pass selected provider through execute routes to override profile defaults
- Add google-generativeai to requirements.txt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace 3 profiles with single "H&M Image Check" (dimension_check + image_quality)
- Remove filename_parse check (pattern didn't match actual filenames)
- Create DimensionCheck class for image dimension validation
- Fix configure page to route multi-file uploads to batch endpoint
- Auto-select single profile, show file list on configure page
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix back navigation on reporting dashboards (the link pointed to / instead of /reporting/index)
- Add "Run Another QC" button on HM QC results page
- Add Recent Reports table on reporting search page (grouped by job number)
- Add Recent QC Reports table on HM QC upload page
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Dockerfile, docker-compose.yml, .dockerignore for containerised deployment
- Add deploy/ scripts (deploy.sh, nginx/apache configs, password generator)
- Replace MSAL/Azure AD auth with local username/password authentication
- Add login.html template
- Simplify app.py, middleware, and auth routes for production use
- Update gunicorn_config.py and wsgi.py for Docker/production
- Update templates to work with new auth and URL prefix handling
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rewrite CHANGELOG.md to cover platform v1.0.0 and auth fix,
with reporting module history preserved as subsection
- Replace stale DOCUMENTATION_SUMMARY.txt with current project
structure and key decisions
- Rewrite MIGRATION_GUIDE.md to document legacy tool consolidation
with complete file mappings for hm_qc and video_qc
- Add legacy context headers to module docs (legacy_README,
legacy_DEV_SETUP, legacy_CLAUDE) pointing to main README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge original CLI check implementations from hm_qc/ and
hm_qc_video/ repos into modules/*/checks/legacy/ directories.
Includes profiles, launchers, utils, orchestrators, and the
standalone video Flask web app. Reference files (test data,
results, cheat sheets) copied to gitignored reference/ directory.
Censorship trainset copied to gitignored data/supporting/.
The legacy/ naming convention separates original run_check()
function-based implementations from the new BaseCheck class
architecture.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New blueprint-based module system (hm_qc, video_qc, video_master,
reporting), core framework (database, config, templates), and
unified web interface with progress tracking and tab navigation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>