Changelog
All notable changes to the HM QC Platform will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[2.5.0] - 2026-04-28
Video Master: Version-Aware Matching, 3-Pass Duration Cascade, Report Download
Folder Discovery — Version Grouping
- Files in Box are now grouped by base name (with the V<n> token stripped). Only the highest-version master/adaptation in each group is matched.
- Lower-version siblings are listed in a new Superseded Files section at the bottom of the report so users can see what was skipped.
- New helper select_latest_versions() in metadata_parser.py plus a _extract_version() regex ([_-]?V<digit>[_-.]?, case-insensitive). Version stored in metadata alongside format/variant/duration.
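The grouping described above can be sketched roughly as follows. This is an illustration only, not the code in metadata_parser.py: the function bodies, the return shape, and the stricter delimiter handling in the regex (a V<n> token must be bounded by `_`, `-`, `.` or the end of the name) are assumptions.

```python
import os
import re
from collections import defaultdict

# Assumed pattern: a "V<digits>" token bounded by separators, case-insensitive.
_VERSION_RE = re.compile(r"(?:^|[_-])V(\d+)(?=[_.-]|$)", re.IGNORECASE)


def extract_version(filename):
    """Return (base_name, version); version is 0 when no V<n> token is found."""
    stem, ext = os.path.splitext(filename)
    match = _VERSION_RE.search(stem)
    if match is None:
        return filename, 0
    base = stem[:match.start()] + stem[match.end():] + ext
    return base, int(match.group(1))


def select_latest_versions(filenames):
    """Split filenames into (latest per base name, superseded siblings)."""
    groups = defaultdict(list)
    for name in filenames:
        base, version = extract_version(name)
        groups[base.lower()].append((version, name))
    latest, superseded = [], []
    for members in groups.values():
        members.sort()  # numeric sort on version, so V10 ranks above V2
        latest.append(members[-1][1])
        superseded.extend(name for _, name in members[:-1])
    return sorted(latest), sorted(superseded)
```

Sorting on the parsed integer rather than the raw filename matters here: a plain string sort would rank V2 above V10.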
Asymmetric 3-Pass Matching Cascade (replaces single Stage 0 filter)
- Pass 1 — masters of the same duration as the adapt (±0.5s): perceptual hash + AKAZE. If matches found, return.
- Pass 2 — masters strictly longer than the adapt: perceptual hash + AKAZE. (Shorter masters can't have produced a longer adapt and are never compared.) If matches found, return.
- Pass 3 — masters of the same duration with different resolution: AI Vision only, fires only when Passes 1 and 2 both yield zero. Targets reframes/crops where pHash fails.
- Each match in the report carries a pass_tier (1/2/3) shown as a coloured "Resolved at" badge, useful for triage and trust calibration.
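A minimal sketch of the cascade's control flow, under stated assumptions: the Video dataclass, cascade_match, and the two injected matcher callables (one standing in for perceptual hash + AKAZE, one for AI Vision) are hypothetical names, and "strictly longer" is read here as longer by more than the ±0.5s tolerance.

```python
from dataclasses import dataclass

DURATION_TOLERANCE_S = 0.5  # Pass 1 treats durations within ±0.5s as equal


@dataclass(frozen=True)
class Video:
    name: str
    duration: float    # seconds
    resolution: tuple  # (width, height)


def cascade_match(adapt, masters, fingerprint_match, vision_match):
    """Return (pass_tier, matches); pass_tier is None when nothing matched."""
    same = [m for m in masters
            if abs(m.duration - adapt.duration) <= DURATION_TOLERANCE_S]
    longer = [m for m in masters
              if m.duration - adapt.duration > DURATION_TOLERANCE_S]

    # Pass 1: same-duration masters, fingerprint comparison.
    hits = fingerprint_match(adapt, same)
    if hits:
        return 1, hits

    # Pass 2: longer masters only; shorter masters are never compared.
    hits = fingerprint_match(adapt, longer)
    if hits:
        return 2, hits

    # Pass 3: same duration but different resolution, AI Vision only.
    reframed = [m for m in same if m.resolution != adapt.resolution]
    hits = vision_match(adapt, reframed)
    if hits:
        return 3, hits
    return None, []
```

Because each pass returns as soon as it finds a match, the more expensive AI Vision call is only reached when both fingerprint passes come back empty.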
AI Vision Provider Swap
- AI Vision default switched from openai/gpt-4o ($0.05/comparison) to google/gemini-2.5-flash ($0.005/comparison), a ~10× cost reduction for the same fallback quality.
- AI Vision re-enabled in CampaignMatcher (was hard-disabled for cost control under the previous default).
Master Temp File Lifecycle
- Master temp files now persist for the entire campaign run (cleaned up via shutil.rmtree(temp_dir) at end). Previously they were deleted right after fingerprinting, which would have starved Pass 3 AI Vision of its source frames.
Report Download
- New endpoint GET /video-master/report/<id>/download serves the saved HTML with Content-Disposition: attachment.
- Download buttons added to view_report.html and results.html.
Infrastructure
- gunicorn_config.py: max_requests raised from 200 to 5000 (jitter 30→500); added graceful_timeout=600. The previous setting recycled the worker every ~5 minutes under normal progress polling, killing any in-flight matching/QC daemon thread mid-run. The new setting lets workers run several hours between recycles, well past any single job. Trade-off: workers no longer recycle frequently, so if any C extension (ffmpeg, OpenCV) leaks memory it will accumulate; monitor docker stats hm-qc-app over time. A Celery refactor is the proper long-term fix and has been scoped but deferred.
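For reference, the changed settings would look roughly like this in gunicorn_config.py. Only the three values come from the entry above; it is an assumption that they are set as plain module-level variables, and the rest of the file is not shown.

```python
# gunicorn_config.py (fragment, sketch): values taken from the 2.5.0 entry.

# Recycle a worker only after this many requests (previously 200).
max_requests = 5000

# Random spread added per worker so workers do not all recycle at once
# (previously 30).
max_requests_jitter = 500

# Seconds a worker gets to finish in-flight work after a restart signal.
graceful_timeout = 600
```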
[2.4.0] - 2026-04-23
Pricing References Library, Deterministic Price Match, Video Batch & Video Price Check
Pricing References — Standalone Library (replaces single global JSON)
- New PricingReference model: independent uploadable library, NOT tied to a campaign_id. Users pick one at QC configure time alongside (or instead of) a campaign presentation.
- Campaigns tab reworked: the old single "Global Pricing Reference" upload section is replaced with a "Pricing References" library card (name + upload, per-row list with delete + status polling).
- New routes: POST /campaigns/pricing/upload, DELETE /campaigns/pricing/<id>, GET /campaigns/api/pricing/list, GET /campaigns/api/pricing/status/<id>.
- Per-row storage: storage/pricing_references/<id>/<filename>.
- Backwards-compat auto-migration: on first startup after upgrade, if storage/reference/global_pricing.json exists and the table is empty, it's imported as a "Default (legacy global)" row.
Excel Mastersheet Support (new)
- Pricing upload now accepts .xlsx/.xls alongside .pdf.
- Deterministic parser (modules/campaigns/pricing_parser.py): uses openpyxl, no LLM:
  - MPC Prices sheet → flat list of {product_id, language, country, price, currency, product_name} entries.
  - Regional sheets (AME/CEU/EEU/NEU/SEU/FRN/SHE/GCN/EAS/IN/BR/AME Latam) → formatted prices per locale column, used to derive currency symbol, position, decimal_separator, thousands_separator.
  - Skips OLD/COPY sheets and PRICE NOT PRESENT IN REPORT cells.
- PDF parsing kept as fallback (LlamaParse + LLM, format metadata only).
New Lookup Shape
Stored in PricingReference.parsed_data_json:
{
"_format": {"en-US": {"currency_code":"USD","symbol":"$","position":"before",...}},
"_prices": [{"product_id":"1334912002","language":"en-US","price":"49.99","currency":"USD",...}]
}
Legacy flat {"<code>": {...}} still recognised (treated as _format only).
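A sketch of how a reader of parsed_data_json might tell the two shapes apart. The function name and return shape are hypothetical; the model's own get_format_map()/get_prices() helpers may work differently.

```python
import json


def split_pricing_lookup(parsed_data_json):
    """Return (format_map, prices) for either the new or the legacy shape."""
    data = json.loads(parsed_data_json) if parsed_data_json else {}
    if "_format" in data or "_prices" in data:
        # New shape: explicit "_format" and "_prices" sections.
        return data.get("_format", {}), data.get("_prices", [])
    # Legacy flat shape: every top-level key is a locale code, format only.
    return data, []
```

With a legacy payload the price list comes back empty, which matches the note above that legacy data is treated as _format only.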
Deterministic Price Matching
- HM QC price/currency check upgraded: when the attached pricing reference has _prices, detected prices are normalized (_normalize_price handles "$49.99", "39,99 €", "1.234,56", "CHF 49.95", "13 995 Ft", "349,-", "0.999.000", etc.) and compared with 0.005 tolerance against per-locale expected rows.
- LLM-based campaign-sheet fallback only runs when _prices is empty.
- Result includes matched_product, matched_currency, and the list of expected prices for the locale for easier debugging.
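One way to picture the normalization and tolerance comparison is the sketch below. It is not the project's _normalize_price: the separator heuristics are assumptions, they cover only some of the formats listed above, and how inputs such as "0.999.000" should be read is left open.

```python
import re


def normalize_price(text):
    """Parse a displayed price into a float, or return None if no digits exist."""
    match = re.search(r"\d[\d.,\s]*", text)
    if match is None:
        return None
    # Drop spaces used as thousands separators and trailing marks such as ",-".
    number = re.sub(r"\s", "", match.group(0)).rstrip(".,")
    last_dot, last_comma = number.rfind("."), number.rfind(",")
    if last_dot != -1 and last_comma != -1:
        # Both separators present: the later one is the decimal separator.
        decimal = "." if last_dot > last_comma else ","
    else:
        sep = "." if last_dot != -1 else ","
        digits_after = len(number) - number.rfind(sep) - 1 if sep in number else 0
        # A single separator followed by exactly three digits is read as thousands.
        decimal = sep if number.count(sep) == 1 and digits_after != 3 else None
    for ch in {".", ","} - {decimal}:
        number = number.replace(ch, "")
    if decimal:
        number = number.replace(decimal, ".")
    return float(number)


def prices_match(detected_text, expected, tolerance=0.005):
    """Compare a detected price string with an expected numeric price."""
    detected = normalize_price(detected_text)
    return detected is not None and abs(detected - expected) <= tolerance
```

For example, prices_match("39,99 €", 39.99) holds, while prices_match("$49.99", 49.95) does not because the difference exceeds the 0.005 tolerance.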
Video QC — Batch Processing (mirrors HM QC batch pattern)
- Multi-file upload (up to 50 videos) with drag-and-drop UI.
- New BatchVideoQCExecutor: sequential processing, gc.collect() between videos, shared batch_id stamped into each report's metadata.
- New routes: /video-qc/execute/batch, /video-qc/results/batch/<session_id>, /video-qc/report/<id>/download, /video-qc/report/batch/<id>/download (ZIP), /video-qc/report/batch/<id> DELETE.
- Batch results page: summary card (Total / Passed / Failed / Warnings / Avg Score) + per-file list with View + Download.
- Video QC index page groups history by batch (collapsible cards) + shows individual runs below.
- After run: single file → single results page; batch → batch results page.
Video QC — Price & Currency Check (new)
- New _run_price_check step in video executor (weight 30).
- Parses filename Market_lang_CampaignNum_... → lang-Market locale.
Market_lang_CampaignNum_...→lang-Marketlocale. - Detects prices across video via Gemini direct video / GPT-4o frame grid, then deterministic-validates currency + price against the attached pricing reference.
- Skipped cleanly if no pricing ref, unparseable locale, GEN/CEN markets, or no price visible.
- Overall score: the hardcoded 50/50 split is replaced by a weighted mean of non-skipped checks (visual_quality 50, censorship 50, price_currency 30).
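The weighted mean over non-skipped checks works out as in the sketch below. The weights come from the entry above; the function name and the convention that a skipped check is represented by None are assumptions.

```python
CHECK_WEIGHTS = {"visual_quality": 50, "censorship": 50, "price_currency": 30}


def overall_score(scores):
    """Weighted mean of 0-100 scores; checks with a None score are skipped."""
    active = {name: s for name, s in scores.items() if s is not None}
    total_weight = sum(CHECK_WEIGHTS[name] for name in active)
    if total_weight == 0:
        return None  # every check was skipped
    weighted = sum(CHECK_WEIGHTS[name] * s for name, s in active.items())
    return weighted / total_weight
```

With visual_quality at 80, censorship skipped and price_currency at 100, the result is (50 × 80 + 30 × 100) / 80 = 87.5: the skipped check's weight drops out of the denominator instead of counting as zero.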
Bug Fixes
- Fixed Video QC ValueError on execution (Invalid format specifier ' 85, ...): visual_quality and censorship prompt f-strings contained JSON examples with unescaped {/} that Python tried to parse as format specifiers; escaped to {{/}}.
- Fixed Reporting history dashboard: "View Details" links were dead and the Parsed Data View accordion was empty when viewing saved reports from the Reporting index. history_dashboard() was passing parsed_reports=friendly_reports (wrong kwarg, wrong shape). Aligned with the live dashboard() route: attaches friendly_checks to each parsed report and passes reports=parsed_reports.
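The f-string fix can be illustrated with a small example. The prompt text and variable names here are made up; only the escaping rule is the point.

```python
import json

threshold = 85

# Inside an f-string, literal braces must be doubled. A single "{" starts a
# replacement field, so an unescaped JSON example is parsed as an expression
# followed by a format specifier.
prompt = f'Respond with JSON like {{"score": {threshold}, "issues": []}}'

# The rendered prompt contains single braces again and is valid JSON
# after the leading instruction text.
example = json.loads(prompt[len("Respond with JSON like "):])
```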
Infrastructure
- core/models/pricing_reference.py: new model with get_format_map()/get_prices() helpers.
- core/models/database.py: registers the new model so db.create_all() creates the table.
- app.py: runs the one-time legacy global-pricing migration on startup.
- config.py: adds PRICING_REF_STORAGE_PATH, keeps GLOBAL_PRICING_*_PATH for the legacy importer.
[2.3.0] - 2026-04-16
Batch QC Improvements, Consolidated Reports & Stability Fixes
Batch QC — Download, Naming & Delete
- Consolidated report: select multiple reports (checkboxes) and download a single combined HTML with summary table + all individual reports embedded (POST /hm-qc/report/consolidated)
- Download All (ZIP): batch-level ZIP download on both results page and index page
- Per-file actions: View and Download buttons on every file in batch results
- Batch naming: batches now display their job number instead of just "Batch {date}"
- Delete batch: trash button removes all reports, HTML files, and thumbnails for a batch (DELETE /hm-qc/report/batch/<batch_id>)
- Consistent results view: batch results page now always renders from database (not ephemeral progress data), so the view is identical whether you just completed a batch or navigated back from another tab
Report Thumbnails
- Thumbnails embedded in HTML reports: asset preview image (base64-encoded) appears in the report header, making reports fully self-contained
- Thumbnails also appear in batch results per-file rows
Download Buttons Across All Pages
- Upload page: "Recent QC Reports" table now has View/Download buttons
- Index page: individual reports table now has Download button (was only View + Delete)
- Batch results page: per-file View/Download buttons + Select All + consolidated report
Memory & Stability Fixes (OOM)
- Fixed OOM crash on large batches: gunicorn worker was SIGKILL'd by OOM killer after ~7-12 files
- Switched batch processing from ThreadPoolExecutor (2 concurrent) to sequential loop; each file gets its own app.app_context() and gc.collect() runs between files
- Reduced LLM image max size from 2000px to 1200px (sufficient for vision analysis, 64% less RAM)
- Explicit PIL.Image.close() and del after all image operations
- Switched gunicorn from sync to gthread workers (2 workers x 2 threads)
- Added max_requests=200 to auto-recycle workers and release accumulated memory
- Successfully processed 18-file batches at ~220MB stable memory
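The "64% less RAM" figure for the image size change follows from pixel count scaling with the square of the edge length. The sketch below assumes the limit applies to the longest edge with aspect ratio preserved, and that decoded image memory is proportional to pixel count; the function name is illustrative.

```python
def pixel_reduction(old_max_px, new_max_px):
    """Fraction of pixels saved when the longest edge is capped lower."""
    scale = new_max_px / old_max_px  # 1200 / 2000 = 0.6
    return 1 - scale ** 2            # 1 - 0.36 = 0.64


saving = round(pixel_reduction(2000, 1200), 2)
```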
Flask App Context Fix
- Fixed "Working outside of application context" error during batch QC execution
- ThreadPoolExecutor child threads did not inherit Flask app context; fixed by passing app to BatchQCExecutor and wrapping each child thread with app.app_context()
- Ensured progress_sessions table is created on fresh databases by importing ProgressSession in init_db()
Infrastructure
- gunicorn_config.py: gthread workers, max_requests=200, timeout 300s
- core/models/database.py: imports ProgressSession before db.create_all()
- core/utils/progress_tracker.py: removed unused flask.session import
- core/services/llm_config.py: image max 1200px, aggressive PIL cleanup
[2.1.0] - 2026-03-26
Campaign Presentations & Pricing Reference
New Module: Campaigns (modules/campaigns/)
- Campaign presentation upload (PDF): creative guidelines with typography specs, layout rules, copy text, ratio-specific mockups
- Media plan upload (Excel .xlsx): product names, prices, currency per country/language — parsed with openpyxl
- Multiple documents per campaign: linked by Campaign ID, loaded together during QC
- LlamaParse multimodal parsing for PDFs (text + page images)
- Auto-polling: status badges update in-place when parsing completes
- View page: shows extracted text and page images
- Global Pricing Reference: single PDF upload parsed into structured JSON (storage/reference/global_pricing.json) for currency format validation (symbol, position, separators); format only, not actual prices
- API endpoints: /campaigns/api/list, /campaigns/api/<campaign_id>, /campaigns/api/status/<id>
HM QC Module — Campaign-Aware Checks
- Filename check rewritten: flexibly extracts country code, language, dimensions, campaign number from multiple H&M naming conventions (Display, DOOH, OOH, SOME STATIC, Social, POS) — no longer requires rigid pattern
- New Price/Currency check (price_currency_check.py): detects prices in images via LLM vision, validates currency against global pricing reference (deterministic), validates actual prices against campaign media plan when available
- Campaign guidelines in quality check: campaign presentation content injected into LLM prompt for typography, layout, copy, and branding validation
- Campaign dropdown on configure page to select which campaign to validate against
- Report download route added (/report/<id>/download); fixes 404 on Download Report button
- Campaign label in report header (e.g., "1022B - AW25 Display (2 docs) + pricing")
- Profile rebalanced: filename_parse 30%, image_quality 40%, price_currency 30%
Video QC Module — Campaign-Aware
- Campaign dropdown on configure page
- Campaign guidelines injected into visual quality check prompt
- Multi-document support: loads all campaign docs (guidelines + media plan)
Database
- New CampaignPresentation model: campaign_id, campaign_name, pdf_filename, pdf_path, parsed_content, page_images_dir, has_pricing, status, error_message
- New table: campaign_presentations
Infrastructure
- openpyxl added to requirements for Excel parsing
- llama-parse and nest_asyncio added to requirements for PDF parsing
- Dockerfile updated with storage/campaigns and storage/reference directories
- config.py: new CAMPAIGN_STORAGE_PATH, GLOBAL_PRICING_PDF_PATH, GLOBAL_PRICING_JSON_PATH
- Background thread context fix: app reference captured before request ends
New Files
- core/models/campaign_presentation.py: CampaignPresentation database model
- modules/campaigns/__init__.py, blueprint.py, routes.py, services.py, pricing_parser.py
- modules/campaigns/templates/campaigns/index.html, view.html
- modules/hm_qc/checks/price_currency_check.py: Price/currency validation check
[2.0.0] - 2026-03-21
Major Release — Full Platform Deployment & AI-Powered QC
Deployment
- Docker deployment with Dockerfile, docker-compose.yml, and .dockerignore
- Apache and nginx reverse proxy configs (deploy/apache-location.conf, deploy/nginx-location.conf)
- Deploy script (deploy/deploy.sh) and password generator (deploy/generate_password.py)
- Deployed at https://ai-sandbox.oliver.solutions/hm-ai-qc-report
Authentication
- Replaced Azure AD/MSAL with local username/password authentication
- Session-based auth with PBKDF2/scrypt password hashing
- before_request hook enforces login on all routes
- Login page template (templates/login.html)
HM QC Module — Full Overhaul
- Simplified to single profile: "H&M Image Check" (filename_parse 50% + image_quality 50%)
- Removed dimension_check, censorship_check, image_parse, and quick_check/standard_pdf profiles
- Strict text legibility prompt: illegible text automatically fails (score < 70)
- Text legibility is #1 priority in AI evaluation
- False-positive prevention for multi-language words (e.g., "Rock" = skirt in German)
- LLM provider choice: OpenAI GPT-4o or Google Gemini 2.5 Flash on configure page
- Previous QC Reports table on index page with View and Delete buttons
- View saved reports: /hm-qc/report/<id> serves saved HTML report with score summary
- Back navigation: "Run Another QC" button on results page
- Fixed multi-file batch routing: configure page now correctly detects file count and routes to batch endpoint
Video QC Module — Built from Scratch
- Full workflow: Upload → Configure → Execute → Results (was previously "coming soon")
- Frame extraction: 1 frame per second using FFmpeg extract_thumbnails()
- Grid stitching: Frames composited into labeled grid image for efficient AI analysis
- Two AI checks:
  - Visual Quality (50%): language consistency + text legibility + logo clarity
  - Censorship (50%): body coverage compliance; auto-detects _CEN market from filename, skips non-CEN files
- Language false-positive prevention: validates words exist in detected primary language before flagging
- LLM provider choice (OpenAI / Google Gemini)
- Previous Video QC Reports with View/Delete
- Progress tracking (SSE + polling)
- HTML report generation
Video Master Adot — Campaign-Based Matching
- Campaign search: enter campaign name → searches Box for folder under CAMPAIGNS
- Auto-discovery: finds Global Masters and Regional Masters subfolders (case-insensitive)
- Preview: shows master count, country list with adaptation counts before starting
- Phase 1 — Fingerprint masters: temp download each master → fingerprint (~50KB) → delete video
- Phase 2 — Match adaptations: temp download each adaptation → match against fingerprints → delete
- Recursive folder search: finds videos inside subfolders (DOOH, DS, OLV, etc.)
- Results report: per-master adaptation mapping, unmatched items, match rate
- Storage efficient: only fingerprints stored (~50KB per master), videos deleted immediately
- New Box client methods: search_subfolder(), list_subfolders(), list_video_files(), download_file_to_disk()
- BOX_CAMPAIGNS_FOLDER_ID config for campaign folder (separate from QC reports folder)
Reporting Module — Enhancements
- Box reports saved locally: after search, HTML reports saved to disk + database for instant re-viewing
- Previous Box Reports section with View button (loads from saved files, no Box re-fetch)
- Previous QC Reports moved to HM QC tab (was incorrectly on Reporting tab)
- History dashboard: /reporting/history/<job_number> serves saved reports without Box API calls
- Delete buttons on Box reports (per job number) and QC reports (per report)
- Fixed back navigation: dashboard "Back" links now go to /reporting/index
- Fixed Box search pagination: capped iterations to prevent runaway loops (was paginating through 35k+ results)
Usage Dashboard — New Module
- New tab in navigation bar
- Tracks every LLM API call: provider, model, tokens, estimated cost, user, module
- UsageLog database model with auto-logging from LLMConfig.call_vision_api()
- Summary cards: total calls, tokens, estimated cost (USD)
- Breakdowns: by provider, model, tool, user
- Recent calls table with full details
- Time filters: All Time, 30 Days, 7 Days, Today
- Cost estimates based on per-model token pricing
LLM Provider Support
- Google Gemini added as LLM provider (via google-generativeai package)
- Default Google model: gemini-2.5-flash
- Provider selector on HM QC and Video QC configure pages
- LLMConfig updated with Google Vision API integration
- Cost tracking for all providers
Infrastructure
- Gunicorn production config (gunicorn_config.py, wsgi.py)
- Database path fix for Docker (sqlite:////app/database/qc_platform.db)
- Dockerfile: Debian Trixie package name fixes (libgl1-mesa-dri, libchromaprint-tools)
- Box SDK: rewrote folder methods to use get_items() instead of .get() (fixes collaborated folder access)
- _get_folder_items() helper with pagination and fallback for Box API compatibility
New Files
- Dockerfile, docker-compose.yml, .dockerignore
- deploy/deploy.sh, deploy/apache-location.conf, deploy/nginx-location.conf, deploy/generate_password.py
- templates/login.html
- modules/video_qc/executor.py: Video QC executor with frame extraction and AI checks
- modules/video_master/campaign_matcher.py: Campaign-based master matching orchestrator
- modules/usage/__init__.py, modules/usage/routes.py: Usage dashboard module
- modules/usage/templates/usage/dashboard.html
- core/models/usage_log.py: Usage tracking model
- modules/hm_qc/checks/dimension_check.py: Dimension validation check
- modules/hm_qc/templates/hm_qc/view_report.html: Saved report viewer
- modules/video_qc/templates/video_qc/configure.html, results.html, view_report.html
- modules/video_master/templates/video_master/match.html, results.html, view_report.html
[1.1.0] - 2026-03-13
Added
Reporting Module
- Async search with progress bar: POST /reporting/search/async starts background search, returns session_id for progress tracking via SSE (/reporting/progress/<id>) or polling (/reporting/api/progress/<id>)
- CSV export: GET /reporting/export/csv/<job_number> and /errors variant; columns: Job Number, Filename, Box Link, Check Name, Status, Issue Description, Action Required
- Multi-job search: Comma-separated job numbers in search input; combined dashboard at /reporting/dashboard/multi with cross-job summary and per-job collapsible sections
- Multi-job CSV export: GET /reporting/export/csv/multi?session_id=<id> for combined export
- Error code cleanup: ERROR_DISPLAY_MAP and ACTION_GUIDANCE in report_parser.py map technical check names to human-readable display names with remediation guidance
- Designer-friendly dashboard: Sanitized check names, error summaries, and action guidance shown by default; "Show Technical Details" toggle reveals full results/config
- Result caching: In-memory cache with 30-min TTL (result_cache.py) stores parsed results between async search and dashboard render
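A thread-safe in-memory cache with a TTL, as described in the result caching item, could look like the sketch below. The class name, method names and the injectable clock are assumptions made for illustration; result_cache.py itself is not shown in this changelog.

```python
import threading
import time


class TTLCache:
    """Minimal thread-safe cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=30 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._items = {}

    def set(self, key, value):
        with self._lock:
            self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        with self._lock:
            entry = self._items.get(key)
            if entry is None:
                return default
            expires_at, value = entry
            if self._clock() >= expires_at:
                del self._items[key]  # lazily evict expired entries on read
                return default
            return value
```

Expired entries are only removed when they are read, which keeps the sketch short; a long-running process with many distinct keys would also want a periodic sweep.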
HM QC Module
- Batch file upload: Upload form accepts multiple files (up to 100) with drag & drop, file list preview, and per-file remove
- Batch execution: POST /hm-qc/execute/batch processes files in configurable batches (default 10) with 2-second cooldown between batches for API rate limiting
- Batch results: GET /hm-qc/results/batch/<session_id> shows summary (total/passed/failed/warnings/average score) and per-file results with scores
- BatchQCExecutor: New class in batch_executor.py; reuses QCExecutor per file, tracks progress at batch level, isolates per-file errors
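The batching behaviour described above (chunks of 10, a 2-second cooldown between chunks, per-file error isolation) can be sketched as follows. The function name, the process_file callable and the result tuple shape are hypothetical, not BatchQCExecutor's actual interface.

```python
import time


def run_in_batches(files, process_file, batch_size=10, cooldown_s=2.0,
                   sleep=time.sleep):
    """Process files in chunks, pausing between chunks to respect rate limits.

    Returns a list of (path, result, error) tuples; a failure in one file is
    recorded and does not stop the rest of the run.
    """
    results = []
    for start in range(0, len(files), batch_size):
        if start > 0:
            sleep(cooldown_s)  # cooldown only between batches, not before the first
        for path in files[start:start + batch_size]:
            try:
                results.append((path, process_file(path), None))
            except Exception as exc:
                results.append((path, None, str(exc)))
    return results
```

With 25 files and the defaults this yields three batches and two cooldown pauses.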
Changed
- modules/reporting/routes.py: Reorganized with async search, CSV export, multi-job, and progress endpoints
- modules/reporting/aggregator.py: Added get_consolidated_reports_with_progress() and get_consolidated_reports_multi() with tracker integration
- core/utils/report_parser.py: Added sanitize_error_for_display(), get_designer_friendly_checks(), ERROR_DISPLAY_MAP, ACTION_GUIDANCE
- modules/reporting/templates/reporting/index.html: Progress bar replaces spinner, multi-job placeholder text
- modules/reporting/templates/reporting/dashboard.html: CSV export buttons, friendly check names, technical details toggle, source badges
- modules/hm_qc/routes.py: POST /hm-qc/upload accepts getlist('files') for multi-file; new /execute/batch and /results/batch/<id> routes
- modules/hm_qc/templates/hm_qc/upload.html: Multi-file support with multiple attribute, file list UI, client-side validation (max 100)
- modules/hm_qc/templates/hm_qc/results.html: Extended for batch results display
New Files
- modules/reporting/result_cache.py: Thread-safe in-memory cache with TTL
- modules/reporting/templates/reporting/dashboard_multi.html: Multi-job dashboard template
- modules/hm_qc/batch_executor.py: Batch processing executor with rate limiting
[1.0.1] - 2026-02-25
Fixed
- Authentication flow: switched from MSAL popup to redirect-based login
- Root route changed from 302 redirect to direct render to preserve URL hash fragment for MSAL
- Logout now uses clearCache() instead of logoutPopup()
[1.0.0] - 2026-02-25
Added
- Unified Platform: Consolidated separate QC tools into single Flask application
- Modular Architecture: Blueprint-based modules (HM QC, Video QC, Video Master, Reporting)
- Core Framework: Shared auth, database, services, and utilities in core/
- Legacy Code Integration: Original check implementations from hm_qc and hm_qc_video repos preserved in modules/*/checks/legacy/
- Azure AD Authentication: MSAL redirect flow with session management
- Database: SQLite with SQLAlchemy for QC report persistence
- 0-100 Scoring System: Configurable weighted scoring for all QC checks
- AI/LLM Integration: Configurable LLM provider (OpenAI/Anthropic) for content analysis
- Web UI: Unified interface with tab navigation across all modules
- Progress Tracking: Real-time progress updates for QC runs
Modules
- Reporting (complete): Consolidated report search from Box.com and local database
- HM QC (complete): PDF/image quality control with 2 sample checks + 20 legacy checks
- Video QC (BETA): Video quality control with 7 legacy checks
- Video Master Adot (BETA): Video fingerprinting and master matching
Security
- No hardcoded API keys (all via environment variables)
- Azure AD JWT validation
- httpOnly session cookies
- CSRF protection via SameSite=Lax
Reporting Module History
The Reporting module was developed as a standalone tool before being integrated into the unified platform. Its version history is preserved below for reference.
Reporting v2.1.0 - 2025-12-17
Added
- Real-time progress indicator with visual progress bar during campaign search
- Support for CAMPAIGNS/{CampaignNumber}/QC/ folder hierarchy
- Automatic pagination for searching through 3500+ campaign folders
Changed
- Updated Box folder structure from flat to hierarchical (CAMPAIGNS/*/QC/)
- Changed BOX_REPORT_FOLDER_ID from 303321539397 to 133295752718
Fixed
- Box SDK minimal object issue by requesting specific fields
- Pagination support for campaigns appearing later in alphabetical order
Reporting v2.0.0 - 2025-11-15
Added
- Azure AD (Microsoft Entra ID) authentication using MSAL
- JWT token validation with RS256 signature verification
- httpOnly cookies for secure session management
Changed
- Updated to use port 7183
- All API endpoints now require authentication
Reporting v1.0.0 - 2025-10-01
Added
- Initial release of QC Report Dashboard
- Box.com API integration with JWT authentication
- HTML report parsing and aggregation
- Job number search functionality