hm_ai_qc_report_tool

Author	SHA1	Message	Date
nickviljoen	a500d7b088	Six tooling fixes from Dev test pass Video QC: * _extract_locale_from_filename now also handles the suffix form ..._XX-yy.ext (case-insensitive both sides), so DOOH/OOH-style adapt filenames like ..._ES-es.mp4 unblock the price_currency check instead of skipping with "could not extract locale". * Batch results page expires the SQLAlchemy session at the top of the route so the post-completion reload sees committed reports even when it lands on a different gunicorn worker than the one that wrote them. Reload delay bumped 1s → 2s for margin. * visual_quality prompt now passes the filename's market+language to the LLM and tells it the on-screen copy should be in the localized language, not the source-language guideline copy. Stops Spanish-market videos being flagged as "language mismatch with English campaign guidelines". Printer Check: * regions.json rewritten to cover all 10 H&M regions (AME, CEU, NEU, GCN, IND, SHE, SEU, EEU, EAS, Franchise) with default-all groups. Two judgement calls vs the screenshot: kept TR for Turkey (TK is Tokelau in ISO and would break filename matching) and BR for Brazil (every other code is 2-letter ISO). Campaign codes: * New core/utils/campaign_code.py is the single source of truth. Matches both the legacy 4-digits-plus-optional-letter (1013A, 4116) and the new 11-char alphanumeric with year at positions 5-6 (CFUL263C01D). All four prior parser sites now import from this helper. Video Master: * BOX_CAMPAIGNS_FOLDER_ID switched 156182880490 → 133295752718 (same root the Reporting tool uses). Updated config.py default and all three .env example files. * Match page now shows which Box folder the search runs against (with a clickable link), and on a not-found error explains what was searched for so missing-campaign cases are self-diagnosable.	2026-05-09 18:32:23 +02:00
nickviljoen	39383db95f	Pricing refs: Excel support, structured lookup, deterministic price match, video price check A. Excel upload — /campaigns/pricing/upload now accepts .xlsx/.xls alongside .pdf. File picker in the campaigns UI matches. B. Deterministic Excel parser (openpyxl, no LLM) — looks for H&M-style mastersheets: - 'MPC Prices' sheet -> flat list of {product_id, language, country, price, currency, product_name} entries (this is the gold mine). - Regional sheets (AME/CEU/EEU/...) -> formatted prices per locale used to derive currency symbol, position, decimal/thousands separators. Skips OLD/COPY sheets. Verified against the attached 1013A mastersheet: 448 price entries across 7 products x 74 locales, 139 locale format entries. Parser lives in modules/campaigns/pricing_parser.py alongside the existing PDF path (which now also returns the structured form with empty _prices). New lookup shape stored in PricingReference.parsed_data_json: {"_format": {"en-US": {currency_code, symbol, position, ...}, ...}, "_prices": [{product_id, language, country, price, currency, product_name}, ...]} Legacy flat {"<code>": {...}} is still recognised (treated as _format only) for backwards compatibility with the legacy global JSON import. Model helpers added: - PricingReference.get_format_map() - PricingReference.get_prices() to_dict() now reports price_count alongside entry_count. C. Upgraded price_currency_check.py — when a pricing reference with _prices is attached, the check runs a deterministic comparison: detected price(s) -> normalize (_normalize_price handles '$49.99', '39,99 €', 'CHF 49.95', '1.234,56', 'Rs. 2,799', '13 995 Ft', '349,-', '0.999.000'...) -> compare with tol=0.005 against the expected per-locale rows. LLM-based campaign-sheet fallback only runs if no _prices are present (legacy PDF reference or has_pricing campaign presentation). D. Video QC price check — new _run_price_check step in the executor. Parses filename (Market_lang_CampaignNum_... 
-> 'lang-Market' locale), detects prices across frames via the same Gemini/GPT-4o path the other checks use, then deterministic-validates against the attached pricing reference. Skipped if no pricing ref, unknown locale, GEN/CEN markets, or no price visible in video. Overall video score now uses weighted mean of active (non-skipped) checks (visual_quality w=50, censorship w=50, price_currency w=30) instead of the hardcoded 50/50 split — so skipping any one check falls through cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:52:39 +02:00
nickviljoen	e5d0d468db	Pricing references: standalone library (was single global file) The "Global Pricing Reference" is no longer a single file at storage/reference/global_pricing.json. Pricing references are now first-class DB rows (PricingReference model), uploadable as a library in the Campaigns tab and selectable per-run alongside the campaign presentation dropdown on the HM QC and Video QC configure pages. New: - core/models/pricing_reference.py — PricingReference model: id, name, pdf_filename, pdf_path, parsed_content, parsed_data_json, status, created_at/by. get_lookup() deserializes parsed_data_json; to_dict() powers the dropdown API. - /campaigns/pricing/upload — creates a PricingReference row, saves PDF under storage/pricing_references/<id>/, kicks off background parse. - /campaigns/pricing/<id> DELETE, /campaigns/api/pricing/list, /campaigns/api/pricing/status/<id>. - Campaigns index: "Pricing References" table card (mirrors the presentations card) + upload form with optional name field. Changed: - pricing_parser: parse_pricing_pdf_to_dict returns (dict, raw_text); new parse_pricing_reference(id) runs the parse against a DB row and sets status to ready/error. Legacy file-based path removed. - QCExecutor and VideoQCExecutor accept pricing_reference_id; load the row into context['pricing_reference']={id, name, lookup}. - BatchQCExecutor and BatchVideoQCExecutor thread pricing_reference_id through to per-file executors. - price_currency_check._validate_currency reads context instead of the disk file; returns 'skipped_no_reference' if no ref attached. - HM QC + Video QC /execute and /execute/batch routes pass pricing_reference_id from the JSON payload. - Configure templates for HM QC and Video QC add a second dropdown "Pricing Reference (Optional)" loaded from /campaigns/api/pricing/list. 
Backwards compatibility: - app.py: on startup, if storage/reference/global_pricing.json exists and the pricing_references table is empty, import it as a "Default (legacy global)" PricingReference row so existing installs keep a valid reference attached (user can pick it at configure time). - config.py: retains GLOBAL_PRICING_{PDF,JSON}_PATH for the legacy importer; adds PRICING_REF_STORAGE_PATH for the new per-row storage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:27:09 +02:00
nickviljoen	d036752d17	v2.2.0: Gemini video, batch grouping, thumbnails, speed, price fix, printer check - Video QC: Switch to Google Gemini direct video analysis as default (OpenAI frame grid fallback) - HM QC: Group reports by batch with collapsible sections, ZIP download per batch - HM QC: Generate asset thumbnails (150px) displayed in report listings - Speed: Remove artificial delays, add ThreadPoolExecutor(2) for parallel batch processing - Price detection: Improved prompt with country context, detect all prices, increased text limit - New Printer Check module: CSV-to-PDF cross-referencing ported from CrossMatch Rust app Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:56:07 +02:00
nickviljoen	7a3272b7c4	Fix price detection: better error handling, strip markdown fences, log responses - Strip markdown code fences from LLM response before JSON parsing - Log raw response and parsed result for debugging - Show warning with provider/model info when detection fails (instead of silent skip) - Separate "detection failed" (warning, 70) from "no price found" (skipped, 100) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 19:05:51 +02:00
nickviljoen	2d5fe43031	Support multiple campaign docs + clarify pricing is format-only - Global pricing parser now explicitly extracts format only (symbol, position, separators) — ignores actual price values in the reference doc - Executors load ALL ready documents for a campaign (not just the latest), combining their content — supports guidelines + media plan side by side - Campaign context now separates pricing_content (from has_pricing docs) from general parsed_content (all docs combined) - Price check uses pricing_content specifically for actual price validation - Report header shows document count (e.g., "1022B - AW25 Display (2 docs) + pricing") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 18:47:46 +02:00
nickviljoen	fc15a2dda3	Rewrite filename check + add price/currency check to image QC Filename check: - Rewritten to flexibly parse multiple H&M naming conventions (Display, DOOH, OOH, SOME STATIC, Social, POS, DS) - Extracts country code, language code, dimensions, campaign number - Scores based on how much metadata was extracted (not rigid pattern) - Tested against real filenames: BG_bg, ES_es, NO-no formats Price/currency check (new): - Detects prices in images via LLM vision API - Validates currency against global pricing reference (deterministic) - Falls back to LLM validation for unknown countries - Optional campaign pricing sheet validation when has_pricing=True - Added to profile with weight 30 Profile weights rebalanced: filename 30, quality 40, price 30 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 18:39:54 +02:00
nickviljoen	9c33858726	Add campaign presentation management and global pricing reference Introduces a new Campaigns module for uploading campaign presentation PDFs that QC checks reference to validate assets against campaign-specific guidelines (typography, layout, copy, pricing format). Also adds a global pricing reference system that maps country codes to currency symbols and formats for deterministic price/currency validation. - New CampaignPresentation model + campaigns blueprint with CRUD routes - PDF parsing via LlamaParse (text + multimodal page images) - Global pricing PDF parsed into structured JSON lookup - Campaign context injected into both image and video QC executors - Quality checks enhanced with campaign guidelines in LLM prompts - Price/currency check uses global pricing lookup (saves an LLM call) - Campaign dropdown added to HM QC and Video QC configure pages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 16:12:22 +02:00
nickviljoen	e910e00edf	Add Usage Dashboard with token tracking, cost estimates, and filters - New UsageLog model tracking every LLM API call (provider, model, tokens, estimated cost, user, module, check name) - Instrument LLMConfig.call_vision_api() to auto-log each call - New /usage tab in nav bar with dashboard showing: - Summary cards (total calls, tokens, estimated cost) - Breakdowns by provider, model, tool, and user - Recent API calls table - Time filters (All Time, 30 Days, 7 Days, Today) - Cost estimates based on per-model token pricing - Pass logged-in user through executor context for tracking Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 18:17:21 +02:00
nickviljoen	5e291723a0	Swap dimension_check back to filename_parse, strengthen text legibility prompt - Replace dimension_check with filename_parse in H&M Image Check profile - Rewrite quality check prompt to be much stricter on text legibility: - Text legibility is now the #1 priority (CRITICAL check) - Any illegible text forces score below 70 (FAILED) - Explicit instructions to check ALL text including small overlays - Low contrast text on dark/busy backgrounds flagged as common failure Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 17:38:01 +02:00
nickviljoen	91dec41e0b	Batch 3: Add title legibility check, Google Gemini support, LLM provider selector - Update image quality prompt to evaluate text/title legibility - Add Google Gemini (generativeai) as LLM provider in LLMConfig - Add AI Provider dropdown on configure page (OpenAI GPT-4o / Google Gemini) - Pass selected provider through execute routes to override profile defaults - Add google-generativeai to requirements.txt Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 16:53:07 +02:00
nickviljoen	1c582ffcf4	Batch 2: Simplify to single profile, fix multi-file batch execution - Replace 3 profiles with single "H&M Image Check" (dimension_check + image_quality) - Remove filename_parse check (pattern didn't match actual filenames) - Create DimensionCheck class for image dimension validation - Fix configure page to route multi-file uploads to batch endpoint - Auto-select single profile, show file list on configure page Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 16:50:35 +02:00
nickviljoen	ffd8b7303c	v1.1.0: Add progress tracking, CSV export, multi-job support, batch processing, and security fixes - Reporting: async search with SSE progress bar, CSV export with Box file links, multi-job support, designer-friendly error display with action guidance - HM QC: batch file upload (up to 100 files), batch execution with rate limiting, batch results summary - Fix: SQLAlchemy stale cache in SSE progress streaming (expire_all + commit) - Fix: Box folder pagination loop (search API instead of iterating 10,300 folders) - Fix: HM QC blank screen (progress.js not loaded, hardcoded wrong URLs) - Security: remove hardcoded API keys from legacy files, read from .env instead Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 09:43:20 +02:00
nickviljoen	677736943a	Consolidate legacy hm_qc and video_qc tools into main project Merge original CLI check implementations from hm_qc/ and hm_qc_video/ repos into modules/*/checks/legacy/ directories. Includes profiles, launchers, utils, orchestrators, and the standalone video Flask web app. Reference files (test data, results, cheat sheets) copied to gitignored reference/ directory. Censorship trainset copied to gitignored data/supporting/. The legacy/ naming convention separates original run_check() function-based implementations from the new BaseCheck class architecture. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 11:40:53 +02:00
nickviljoen	e6f3e9387e	Add modular architecture, core framework, and web UI New blueprint-based module system (hm_qc, video_qc, video_master, reporting), core framework (database, config, templates), and unified web interface with progress tracking and tab navigation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 11:39:04 +02:00
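
Note on a500d7b088: that entry describes two filename-level parsers, a campaign-code matcher (legacy 4-digits-plus-optional-letter such as 1013A or 4116, and the new 11-char alphanumeric form with the year at positions 5-6 such as CFUL263C01D) and a locale extractor for the suffix form ..._XX-yy.ext. The log does not show the code in core/utils/campaign_code.py or _extract_locale_from_filename, so the following is only a minimal sketch of how such matching could look. The names parse_campaign_code and extract_locale_suffix, the exact regexes, and the returned 'lang-MARKET' shape are assumptions for illustration, not the project's actual helpers.

```python
import re
from typing import Optional

# Legacy form: four digits plus an optional letter, e.g. "1013A" or "4116".
LEGACY_CODE = re.compile(r"^[0-9]{4}[A-Z]?$")
# New form: 11 alphanumeric characters with a two-digit year at
# positions 5-6 (1-indexed), e.g. "CFUL263C01D" -> year "26".
NEW_CODE = re.compile(r"^[A-Z0-9]{4}(?P<year>[0-9]{2})[A-Z0-9]{5}$")
# Locale suffix form "..._ES-es.mp4"; letters of either case are accepted.
LOCALE_SUFFIX = re.compile(r"_([A-Za-z]{2})-([A-Za-z]{2})\.[A-Za-z0-9]+$")


def parse_campaign_code(token: str) -> Optional[dict]:
    """Classify a token as a legacy or new-style campaign code (hypothetical helper)."""
    code = token.strip().upper()
    if LEGACY_CODE.match(code):
        return {"code": code, "kind": "legacy", "year": None}
    match = NEW_CODE.match(code)
    if match:
        return {"code": code, "kind": "new", "year": match.group("year")}
    return None


def extract_locale_suffix(filename: str) -> Optional[str]:
    """Return 'lang-MARKET' from a '..._XX-yy.ext' filename, or None (hypothetical helper).

    Assumption: the first two letters are the market and the last two the language.
    """
    match = LOCALE_SUFFIX.search(filename)
    if match is None:
        return None
    market = match.group(1).upper()
    language = match.group(2).lower()
    return f"{language}-{market}"
```

Under these assumptions, parse_campaign_code("CFUL263C01D") reports the new form with year "26", and extract_locale_suffix("HM_DOOH_1080x1920_ES-es.mp4") yields "es-ES". Keeping both patterns in one module mirrors the "single source of truth" idea in the commit message, so every parser site agrees on what counts as a campaign code.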

15 commits
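
Note on 39383db95f: that entry says the overall video score is now a weighted mean of the active (non-skipped) checks, with visual_quality w=50, censorship w=50 and price_currency w=30, replacing the hardcoded 50/50 split. The executor code is not part of this log, so the sketch below only illustrates the arithmetic. The function name overall_score, the use of None to mark a skipped check, and the None result when every check is skipped are assumptions.

```python
from typing import Dict, Optional

# Weights as quoted in the commit message.
CHECK_WEIGHTS = {"visual_quality": 50, "censorship": 50, "price_currency": 30}


def overall_score(scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Weighted mean over checks that actually ran (hypothetical helper).

    A score of None marks a skipped check; its weight is dropped from
    both the numerator and the denominator.
    """
    active = {
        name: score
        for name, score in scores.items()
        if score is not None and name in CHECK_WEIGHTS
    }
    total_weight = sum(CHECK_WEIGHTS[name] for name in active)
    if total_weight == 0:
        return None  # every check was skipped
    weighted_sum = sum(CHECK_WEIGHTS[name] * score for name, score in active.items())
    return weighted_sum / total_weight
```

With price_currency skipped, scores of 80 and 100 on the two remaining checks average to 90.0, the same as the old 50/50 split. With all three active at 80, 100 and 60, the result is 10800 / 130, roughly 83.08.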