# Changelog

All notable changes to the HM QC Platform will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.5.0] - 2026-04-28

### Video Master: Version-Aware Matching, 3-Pass Duration Cascade, Report Download

#### Folder Discovery — Version Grouping

- Files in Box are now grouped by base name (with the `V` token stripped). Only the **highest-version master/adaptation** in each group is matched.
- Lower-version siblings are listed in a new **Superseded Files** section at the bottom of the report so users can see what was skipped.
- New helper `select_latest_versions()` in `metadata_parser.py` plus a `_extract_version()` regex (`[_-]?V[_-.]?` case-insensitive). Version stored in metadata alongside `format`/`variant`/`duration`.

#### Asymmetric 3-Pass Matching Cascade (replaces single Stage 0 filter)

- **Pass 1** — masters of the **same duration** as the adapt (±0.5s): perceptual hash + AKAZE. If matches found, return.
- **Pass 2** — masters **strictly longer** than the adapt: perceptual hash + AKAZE. (Shorter masters can't have produced a longer adapt and are never compared.) If matches found, return.
- **Pass 3** — masters of the **same duration with different resolution**: AI Vision only, fires only when Passes 1 and 2 both yield zero. Targets reframes/crops where pHash fails.
- Each match in the report carries a `pass_tier` (1/2/3) shown as a coloured "Resolved at" badge — useful for triage and trust calibration.

#### AI Vision Provider Swap

- AI Vision default switched from `openai/gpt-4o` (~$0.05/comparison) to `google/gemini-2.5-flash` (~$0.005/comparison) — ~10× cost reduction for the same fallback quality.
- AI Vision **re-enabled** in `CampaignMatcher` (was hard-disabled for cost control under the previous default).
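
As a rough illustration of the 3-pass cascade described above, the tier selection might look like the sketch below. This is not the code in `CampaignMatcher`: the `Video` dataclass, `run_cascade`, and the two matcher callables are hypothetical names, and reading "strictly longer" as "longer than the ±0.5s window" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

DURATION_TOLERANCE_S = 0.5  # the ±0.5s window mentioned for Pass 1


@dataclass(frozen=True)
class Video:
    """Hypothetical stand-in for a master or adaptation record."""
    name: str
    duration: float               # seconds
    resolution: Tuple[int, int]   # (width, height)


# A matcher receives the adapt plus candidate masters and returns accepted masters.
Matcher = Callable[[Video, List[Video]], List[Video]]


def same_duration(adapt: Video, master: Video) -> bool:
    return abs(master.duration - adapt.duration) <= DURATION_TOLERANCE_S


def run_cascade(
    adapt: Video,
    masters: List[Video],
    hash_matcher: Matcher,     # stands in for perceptual hash + AKAZE
    vision_matcher: Matcher,   # stands in for the AI Vision fallback
) -> Tuple[Optional[int], List[Video]]:
    """Return (pass_tier, matches); pass_tier is None when nothing matched."""
    # Pass 1: masters of the same duration as the adapt.
    tier1 = [m for m in masters if same_duration(adapt, m)]
    matches = hash_matcher(adapt, tier1)
    if matches:
        return 1, matches

    # Pass 2: masters strictly longer than the adapt (assumed: beyond the window).
    # Shorter masters are never compared.
    tier2 = [m for m in masters
             if m.duration > adapt.duration + DURATION_TOLERANCE_S]
    matches = hash_matcher(adapt, tier2)
    if matches:
        return 2, matches

    # Pass 3: same duration but different resolution, AI Vision only.
    tier3 = [m for m in tier1 if m.resolution != adapt.resolution]
    matches = vision_matcher(adapt, tier3)
    if matches:
        return 3, matches
    return None, []
```

The returned tier corresponds to the `pass_tier` value shown in the report badge. The matchers are passed in as callables so the ordering logic can be exercised without real fingerprinting or API calls.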

#### Master Temp File Lifecycle

- Master temp files now persist for the entire campaign run (cleaned up via `shutil.rmtree(temp_dir)` at end). Previously they were deleted right after fingerprinting, which would have starved Pass 3 AI Vision of its source frames.

#### Report Download

- New endpoint `GET /video-master/report//download` serves the saved HTML with `Content-Disposition: attachment`.
- Download buttons added to `view_report.html` and `results.html`.

#### Infrastructure

- **`gunicorn_config.py`** — `max_requests` raised from `200` to `5000` (jitter `30→500`); added `graceful_timeout=600`. The previous setting recycled the worker every ~5 minutes under normal progress polling, killing any in-flight matching/QC daemon thread mid-run. The new setting lets workers run several hours between recycles, well past any single job. **Trade-off**: workers no longer recycle frequently, so if any C extension (ffmpeg, OpenCV) leaks memory it will accumulate — monitor `docker stats hm-qc-app` over time. A Celery refactor is the proper long-term fix and has been scoped but deferred.

---

## [2.4.0] - 2026-04-23

### Pricing References Library, Deterministic Price Match, Video Batch & Video Price Check

#### Pricing References — Standalone Library (replaces single global JSON)

- **New `PricingReference` model** — independent uploadable library, NOT tied to a campaign_id. Users pick one at QC configure time alongside (or instead of) a campaign presentation.
- Campaigns tab reworked: the old single "Global Pricing Reference" upload section is replaced with a "Pricing References" library card (name + upload, per-row list with delete + status polling).
- New routes: `POST /campaigns/pricing/upload`, `DELETE /campaigns/pricing/`, `GET /campaigns/api/pricing/list`, `GET /campaigns/api/pricing/status/`.
- Per-row storage: `storage/pricing_references//`.
- **Backwards-compat auto-migration**: on first startup after upgrade, if `storage/reference/global_pricing.json` exists and the table is empty, it's imported as a "Default (legacy global)" row.

#### Excel Mastersheet Support (new)

- Pricing upload now accepts `.xlsx` / `.xls` alongside `.pdf`.
- **Deterministic parser** (`modules/campaigns/pricing_parser.py`) — uses openpyxl, no LLM:
  - `MPC Prices` sheet → flat list of `{product_id, language, country, price, currency, product_name}` entries.
  - Regional sheets (AME/CEU/EEU/NEU/SEU/FRN/SHE/GCN/EAS/IN/BR/AME Latam) → formatted prices per locale column, used to derive currency symbol, position, decimal_separator, thousands_separator.
  - Skips `OLD` / `COPY` sheets and `PRICE NOT PRESENT IN REPORT` cells.
- PDF parsing kept as fallback (LlamaParse + LLM, format metadata only).

#### New Lookup Shape

Stored in `PricingReference.parsed_data_json`:

```json
{
  "_format": {"en-US": {"currency_code":"USD","symbol":"$","position":"before",...}},
  "_prices": [{"product_id":"1334912002","language":"en-US","price":"49.99","currency":"USD",...}]
}
```

Legacy flat `{"": {...}}` still recognised (treated as `_format` only).

#### Deterministic Price Matching

- **HM QC price/currency check upgraded**: when the attached pricing reference has `_prices`, detected prices are normalized (`_normalize_price` handles `$49.99`, `39,99 €`, `1.234,56`, `CHF 49.95`, `13 995 Ft`, `349,-`, `0.999.000`, etc.) and compared with 0.005 tolerance against per-locale expected rows.
- LLM-based campaign-sheet fallback only runs when `_prices` is empty.
- Result includes `matched_product`, `matched_currency`, and the list of expected prices for the locale for easier debugging.

#### Video QC — Batch Processing (mirrors HM QC batch pattern)

- **Multi-file upload** (up to 50 videos) with drag-and-drop UI.
- New `BatchVideoQCExecutor` — sequential processing, `gc.collect()` between videos, shared `batch_id` stamped into each report's metadata.
- New routes: `/video-qc/execute/batch`, `/video-qc/results/batch/`, `/video-qc/report//download`, `/video-qc/report/batch//download` (ZIP), `/video-qc/report/batch/` DELETE.
- Batch results page: summary card (Total / Passed / Failed / Warnings / Avg Score) + per-file list with View + Download.
- Video QC index page groups history by batch (collapsible cards) + shows individual runs below.
- After run: single file → single results page; batch → batch results page.

#### Video QC — Price & Currency Check (new)

- New `_run_price_check` step in video executor (weight 30).
- Parses filename `Market_lang_CampaignNum_...` → `lang-Market` locale.
- Detects prices across video via Gemini direct video / GPT-4o frame grid, then deterministically validates currency + price against the attached pricing reference.
- Skipped cleanly if no pricing ref, unparseable locale, GEN/CEN markets, or no price visible.
- Overall score: the hardcoded 50/50 split is replaced with a **weighted mean of non-skipped checks** (`visual_quality` 50, `censorship` 50, `price_currency` 30).

#### Bug Fixes

- **Fixed Video QC ValueError on execution** (`Invalid format specifier ' 85, ...`): visual_quality and censorship prompt f-strings contained JSON examples with unescaped `{`/`}` that Python tried to parse as format specifiers — escaped to `{{` / `}}`.
- **Fixed Reporting history dashboard**: "View Details" links were dead and the Parsed Data View accordion was empty when viewing saved reports from the Reporting index. `history_dashboard()` was passing `parsed_reports=friendly_reports` (wrong kwarg, wrong shape). It is now aligned with the live `dashboard()` route: it attaches `friendly_checks` to each parsed report and passes `reports=parsed_reports`.

#### Infrastructure

- `core/models/pricing_reference.py` — new model with `get_format_map()` / `get_prices()` helpers.
- `core/models/database.py` — registers the new model so `db.create_all()` creates the table.
- `app.py` — runs the one-time legacy global-pricing migration on startup.
- `config.py` — adds `PRICING_REF_STORAGE_PATH`, keeps `GLOBAL_PRICING_*_PATH` for the legacy importer.

---

## [2.3.0] - 2026-04-16

### Batch QC Improvements, Consolidated Reports & Stability Fixes

#### Batch QC — Download, Naming & Delete

- **Consolidated report**: select multiple reports (checkboxes) and download a single combined HTML with summary table + all individual reports embedded (`POST /hm-qc/report/consolidated`)
- **Download All (ZIP)**: batch-level ZIP download on both results page and index page
- **Per-file actions**: View and Download buttons on every file in batch results
- **Batch naming**: batches now display their job number instead of just "Batch {date}"
- **Delete batch**: trash button removes all reports, HTML files, and thumbnails for a batch (`DELETE /hm-qc/report/batch/`)
- **Consistent results view**: batch results page now always renders from database (not ephemeral progress data), so the view is identical whether you just completed a batch or navigated back from another tab

#### Report Thumbnails

- **Thumbnails embedded in HTML reports**: asset preview image (base64-encoded) appears in the report header, making reports fully self-contained
- Thumbnails also appear in batch results per-file rows

#### Download Buttons Across All Pages

- Upload page: "Recent QC Reports" table now has View/Download buttons
- Index page: individual reports table now has Download button (was only View + Delete)
- Batch results page: per-file View/Download buttons + Select All + consolidated report

#### Memory & Stability Fixes (OOM)

- **Fixed OOM crash on large batches**: gunicorn worker was SIGKILL'd by OOM killer after ~7-12 files
- Switched batch processing from `ThreadPoolExecutor` (2 concurrent) to sequential loop — each file gets its own `app.app_context()` and `gc.collect()` runs between files
- Reduced LLM image max size from 2000px to 1200px (sufficient for vision analysis, 64% less RAM)
- Explicit `PIL.Image.close()` and `del` after all image operations
- Switched gunicorn from sync to gthread workers (2 workers x 2 threads)
- Added `max_requests=200` to auto-recycle workers and release accumulated memory
- Successfully processed 18-file batches at ~220MB stable memory

#### Flask App Context Fix

- **Fixed "Working outside of application context" error** during batch QC execution
- `ThreadPoolExecutor` child threads did not inherit Flask app context — fixed by passing `app` to `BatchQCExecutor` and wrapping each child thread with `app.app_context()`
- Ensured `progress_sessions` table is created on fresh databases by importing `ProgressSession` in `init_db()`

#### Infrastructure

- `gunicorn_config.py`: gthread workers, `max_requests=200`, timeout 300s
- `core/models/database.py`: imports `ProgressSession` before `db.create_all()`
- `core/utils/progress_tracker.py`: removed unused `flask.session` import
- `core/services/llm_config.py`: image max 1200px, aggressive PIL cleanup

---

## [2.1.0] - 2026-03-26

### Campaign Presentations & Pricing Reference

#### New Module: Campaigns (`modules/campaigns/`)

- **Campaign presentation upload** (PDF): creative guidelines with typography specs, layout rules, copy text, ratio-specific mockups
- **Media plan upload** (Excel .xlsx): product names, prices, currency per country/language — parsed with openpyxl
- **Multiple documents per campaign**: linked by Campaign ID, loaded together during QC
- **LlamaParse** multimodal parsing for PDFs (text + page images)
- **Auto-polling**: status badges update in-place when parsing completes
- **View page**: shows extracted text and page images
- **Global Pricing Reference**: single PDF upload parsed into structured JSON (`storage/reference/global_pricing.json`) for currency format validation (symbol, position, separators) — format only, not actual prices
- **API endpoints**: `/campaigns/api/list`, `/campaigns/api/`, `/campaigns/api/status/`

#### HM QC Module — Campaign-Aware Checks

- **Filename check rewritten**: flexibly extracts country code, language, dimensions, campaign number from multiple H&M naming conventions (Display, DOOH, OOH, SOME STATIC, Social, POS) — no longer requires rigid pattern
- **New Price/Currency check** (`price_currency_check.py`): detects prices in images via LLM vision, validates currency against global pricing reference (deterministic), validates actual prices against campaign media plan when available
- **Campaign guidelines in quality check**: campaign presentation content injected into LLM prompt for typography, layout, copy, and branding validation
- **Campaign dropdown** on configure page to select which campaign to validate against
- **Report download** route added (`/report//download`) — fixes 404 on Download Report button
- **Campaign label** in report header (e.g., "1022B - AW25 Display (2 docs) + pricing")
- **Profile rebalanced**: filename_parse 30%, image_quality 40%, price_currency 30%

#### Video QC Module — Campaign-Aware

- **Campaign dropdown** on configure page
- **Campaign guidelines** injected into visual quality check prompt
- **Multi-document support**: loads all campaign docs (guidelines + media plan)

#### Database

- New `CampaignPresentation` model: campaign_id, campaign_name, pdf_filename, pdf_path, parsed_content, page_images_dir, has_pricing, status, error_message
- New table: `campaign_presentations`

#### Infrastructure

- `openpyxl` added to requirements for Excel parsing
- `llama-parse` and `nest_asyncio` added to requirements for PDF parsing
- Dockerfile updated with `storage/campaigns` and `storage/reference` directories
- `config.py`: new `CAMPAIGN_STORAGE_PATH`, `GLOBAL_PRICING_PDF_PATH`, `GLOBAL_PRICING_JSON_PATH`
- Background thread context fix: app reference captured before request ends

#### New Files

- `core/models/campaign_presentation.py` — CampaignPresentation database model
- `modules/campaigns/__init__.py`, `blueprint.py`, `routes.py`, `services.py`,
`pricing_parser.py`
- `modules/campaigns/templates/campaigns/index.html`, `view.html`
- `modules/hm_qc/checks/price_currency_check.py` — Price/currency validation check

---

## [2.0.0] - 2026-03-21

### Major Release — Full Platform Deployment & AI-Powered QC

#### Deployment

- **Docker deployment** with Dockerfile, docker-compose.yml, and .dockerignore
- **Apache reverse proxy** config (deploy/apache-location.conf, deploy/nginx-location.conf)
- Deploy script (deploy/deploy.sh) and password generator (deploy/generate_password.py)
- Deployed at `https://ai-sandbox.oliver.solutions/hm-ai-qc-report`

#### Authentication

- **Replaced Azure AD/MSAL** with local username/password authentication
- Session-based auth with PBKDF2/scrypt password hashing
- `before_request` hook enforces login on all routes
- Login page template (templates/login.html)

#### HM QC Module — Full Overhaul

- **Simplified to single profile**: "H&M Image Check" (filename_parse 50% + image_quality 50%)
- Removed dimension_check, censorship_check, image_parse, and quick_check/standard_pdf profiles
- **Strict text legibility prompt**: illegible text automatically fails (score < 70)
- Text legibility is #1 priority in AI evaluation
- False-positive prevention for multi-language words (e.g., "Rock" = skirt in German)
- **LLM provider choice**: OpenAI GPT-4o or Google Gemini 2.5 Flash on configure page
- **Previous QC Reports** table on index page with View and Delete buttons
- **View saved reports**: `/hm-qc/report/` serves saved HTML report with score summary
- **Back navigation**: "Run Another QC" button on results page
- **Fixed multi-file batch routing**: configure page now correctly detects file count and routes to batch endpoint

#### Video QC Module — Built from Scratch

- **Full workflow**: Upload → Configure → Execute → Results (was previously "coming soon")
- **Frame extraction**: 1 frame per second using FFmpeg `extract_thumbnails()`
- **Grid stitching**: Frames composited into labeled grid image for efficient AI analysis
- **Two AI checks**:
  - Visual Quality (50%): language consistency + text legibility + logo clarity
  - Censorship (50%): body coverage compliance — auto-detects `_CEN` market from filename, skips non-CEN files
- **Language false-positive prevention**: validates words exist in detected primary language before flagging
- LLM provider choice (OpenAI / Google Gemini)
- Previous Video QC Reports with View/Delete
- Progress tracking (SSE + polling)
- HTML report generation

#### Video Master Adot — Campaign-Based Matching

- **Campaign search**: enter campaign name → searches Box for folder under CAMPAIGNS
- **Auto-discovery**: finds Global Masters and Regional Masters subfolders (case-insensitive)
- **Preview**: shows master count, country list with adaptation counts before starting
- **Phase 1 — Fingerprint masters**: temp download each master → fingerprint (~50KB) → delete video
- **Phase 2 — Match adaptations**: temp download each adaptation → match against fingerprints → delete
- **Recursive folder search**: finds videos inside subfolders (DOOH, DS, OLV, etc.)
- **Results report**: per-master adaptation mapping, unmatched items, match rate
- **Storage efficient**: only fingerprints stored (~50KB per master), videos deleted immediately
- New Box client methods: `search_subfolder()`, `list_subfolders()`, `list_video_files()`, `download_file_to_disk()`
- `BOX_CAMPAIGNS_FOLDER_ID` config for campaign folder (separate from QC reports folder)

#### Reporting Module — Enhancements

- **Box reports saved locally**: after search, HTML reports saved to disk + database for instant re-viewing
- **Previous Box Reports** section with View button (loads from saved files, no Box re-fetch)
- **Previous QC Reports** moved to HM QC tab (was incorrectly on Reporting tab)
- **History dashboard**: `/reporting/history/` serves saved reports without Box API calls
- **Delete buttons** on Box reports (per job number) and QC reports (per report)
- **Fixed back navigation**: dashboard "Back" links now go to `/reporting/index`
- **Fixed Box search pagination**: capped iterations to prevent runaway loops (was paginating through 35k+ results)

#### Usage Dashboard — New Module

- **New tab** in navigation bar
- Tracks every LLM API call: provider, model, tokens, estimated cost, user, module
- `UsageLog` database model with auto-logging from `LLMConfig.call_vision_api()`
- **Summary cards**: total calls, tokens, estimated cost (USD)
- **Breakdowns**: by provider, model, tool, user
- **Recent calls** table with full details
- **Time filters**: All Time, 30 Days, 7 Days, Today
- Cost estimates based on per-model token pricing

#### LLM Provider Support

- **Google Gemini** added as LLM provider (via `google-generativeai` package)
- Default Google model: `gemini-2.5-flash`
- Provider selector on HM QC and Video QC configure pages
- `LLMConfig` updated with Google Vision API integration
- Cost tracking for all providers

#### Infrastructure

- Gunicorn production config (`gunicorn_config.py`, `wsgi.py`)
- Database path fix for Docker (`sqlite:////app/database/qc_platform.db`)
- Dockerfile: Debian Trixie package name fixes (`libgl1-mesa-dri`, `libchromaprint-tools`)
- Box SDK: rewrote folder methods to use `get_items()` instead of `.get()` (fixes collaborated folder access)
- `_get_folder_items()` helper with pagination and fallback for Box API compatibility

#### New Files

- `Dockerfile`, `docker-compose.yml`, `.dockerignore`
- `deploy/deploy.sh`, `deploy/apache-location.conf`, `deploy/nginx-location.conf`, `deploy/generate_password.py`
- `templates/login.html`
- `modules/video_qc/executor.py` — Video QC executor with frame extraction and AI checks
- `modules/video_master/campaign_matcher.py` — Campaign-based master matching orchestrator
- `modules/usage/__init__.py`, `modules/usage/routes.py` — Usage dashboard module
- `modules/usage/templates/usage/dashboard.html`
- `core/models/usage_log.py` — Usage tracking model
- `modules/hm_qc/checks/dimension_check.py` — Dimension validation check
- `modules/hm_qc/templates/hm_qc/view_report.html` — Saved report viewer
- `modules/video_qc/templates/video_qc/configure.html`, `results.html`, `view_report.html`
- `modules/video_master/templates/video_master/match.html`, `results.html`, `view_report.html`

## [1.1.0] - 2026-03-13

### Added

#### Reporting Module

- **Async search with progress bar**: `POST /reporting/search/async` starts background search, returns session_id for progress tracking via SSE (`/reporting/progress/`) or polling (`/reporting/api/progress/`)
- **CSV export**: `GET /reporting/export/csv/` and `/errors` variant — columns: Job Number, Filename, Box Link, Check Name, Status, Issue Description, Action Required
- **Multi-job search**: Comma-separated job numbers in search input; combined dashboard at `/reporting/dashboard/multi` with cross-job summary and per-job collapsible sections
- **Multi-job CSV export**: `GET /reporting/export/csv/multi?session_id=` for combined export
- **Error code cleanup**: `ERROR_DISPLAY_MAP` and
`ACTION_GUIDANCE` in `report_parser.py` map technical check names to human-readable display names with remediation guidance
- **Designer-friendly dashboard**: Sanitized check names, error summaries, and action guidance shown by default; "Show Technical Details" toggle reveals full results/config
- **Result caching**: In-memory cache with 30-min TTL (`result_cache.py`) stores parsed results between async search and dashboard render

#### HM QC Module

- **Batch file upload**: Upload form accepts multiple files (up to 100) with drag & drop, file list preview, and per-file remove
- **Batch execution**: `POST /hm-qc/execute/batch` processes files in configurable batches (default 10) with 2-second cooldown between batches for API rate limiting
- **Batch results**: `GET /hm-qc/results/batch/` shows summary (total/passed/failed/warnings/average score) and per-file results with scores
- **`BatchQCExecutor`**: New class in `batch_executor.py` — reuses `QCExecutor` per file, tracks progress at batch level, isolates per-file errors

### Changed

- `modules/reporting/routes.py` — Reorganized with async search, CSV export, multi-job, and progress endpoints
- `modules/reporting/aggregator.py` — Added `get_consolidated_reports_with_progress()` and `get_consolidated_reports_multi()` with tracker integration
- `core/utils/report_parser.py` — Added `sanitize_error_for_display()`, `get_designer_friendly_checks()`, `ERROR_DISPLAY_MAP`, `ACTION_GUIDANCE`
- `modules/reporting/templates/reporting/index.html` — Progress bar replaces spinner, multi-job placeholder text
- `modules/reporting/templates/reporting/dashboard.html` — CSV export buttons, friendly check names, technical details toggle, source badges
- `modules/hm_qc/routes.py` — `POST /hm-qc/upload` accepts `getlist('files')` for multi-file; new `/execute/batch` and `/results/batch/` routes
- `modules/hm_qc/templates/hm_qc/upload.html` — Multi-file support with `multiple` attribute, file list UI, client-side validation (max 100)
- `modules/hm_qc/templates/hm_qc/results.html` — Extended for batch results display

### New Files

- `modules/reporting/result_cache.py` — Thread-safe in-memory cache with TTL
- `modules/reporting/templates/reporting/dashboard_multi.html` — Multi-job dashboard template
- `modules/hm_qc/batch_executor.py` — Batch processing executor with rate limiting

## [1.0.1] - 2026-02-25

### Fixed

- Authentication flow: switched from MSAL popup to redirect-based login
- Root route changed from 302 redirect to direct render to preserve URL hash fragment for MSAL
- Logout now uses `clearCache()` instead of `logoutPopup()`

## [1.0.0] - 2026-02-25

### Added

- **Unified Platform**: Consolidated separate QC tools into single Flask application
- **Modular Architecture**: Blueprint-based modules (HM QC, Video QC, Video Master, Reporting)
- **Core Framework**: Shared auth, database, services, and utilities in `core/`
- **Legacy Code Integration**: Original check implementations from `hm_qc` and `hm_qc_video` repos preserved in `modules/*/checks/legacy/`
- **Azure AD Authentication**: MSAL redirect flow with session management
- **Database**: SQLite with SQLAlchemy for QC report persistence
- **0-100 Scoring System**: Configurable weighted scoring for all QC checks
- **AI/LLM Integration**: Configurable LLM provider (OpenAI/Anthropic) for content analysis
- **Web UI**: Unified interface with tab navigation across all modules
- **Progress Tracking**: Real-time progress updates for QC runs

### Modules

- **Reporting** (complete): Consolidated report search from Box.com and local database
- **HM QC** (complete): PDF/image quality control with 2 sample checks + 20 legacy checks
- **Video QC** (BETA): Video quality control with 7 legacy checks
- **Video Master Adot** (BETA): Video fingerprinting and master matching

### Security

- No hardcoded API keys (all via environment variables)
- Azure AD JWT validation
- httpOnly session cookies
- CSRF protection via SameSite=Lax

---

## Reporting Module History

The Reporting module was developed as a standalone tool before being integrated into the unified platform. Its version history is preserved below for reference.

### Reporting v2.1.0 - 2025-12-17

#### Added

- Real-time progress indicator with visual progress bar during campaign search
- Support for CAMPAIGNS/{CampaignNumber}/QC/ folder hierarchy
- Automatic pagination for searching through 3500+ campaign folders

#### Changed

- Updated Box folder structure from flat to hierarchical (CAMPAIGNS/*/QC/)
- Changed BOX_REPORT_FOLDER_ID from 303321539397 to 133295752718

#### Fixed

- Box SDK minimal object issue by requesting specific fields
- Pagination support for campaigns appearing later in alphabetical order

### Reporting v2.0.0 - 2025-11-15

#### Added

- Azure AD (Microsoft Entra ID) authentication using MSAL
- JWT token validation with RS256 signature verification
- httpOnly cookies for secure session management

#### Changed

- Updated to use port 7183
- All API endpoints now require authentication

### Reporting v1.0.0 - 2025-10-01

#### Added

- Initial release of QC Report Dashboard
- Box.com API integration with JWT authentication
- HTML report parsing and aggregation
- Job number search functionality