hm_ai_qc_report_tool

Author	SHA1	Message	Date
nickviljoen	5de8f5fe7b	Video QC: fix client-side SRT rejection in upload form The accept='' attribute and help copy already allowed .srt, but a separate ALLOWED_EXTENSIONS array in upload.html's JS was filtering out .srt files as 'unsupported format'. Adding 'srt' to that array fixes the silent skip seen on Dev (file picker showed .srt as valid, then the submit handler dropped it).	2026-05-15 21:26:20 +02:00
nickviljoen	70700f4f91	Video QC: register SRT checks in standard_video profile Profile YAML is descriptive metadata (executor runs unconditionally). Documenting srt_structure (15), srt_timing (10), srt_language (20) so the profile page reflects the live check set.	2026-05-15 20:48:14 +02:00
nickviljoen	f361d8e9a1	Video QC UI: SRT upload + pre-flight pairing summary Upload form accepts .srt alongside .mp4. Configure page shows pair_map counts and a collapsible list of unpaired SRTs (rendered via DOM textContent to avoid XSS from user-controlled SRT filenames). Uses the new /pairing-preview/<session_id> endpoint.	2026-05-15 20:47:38 +02:00
nickviljoen	57dbefe4f2	Video QC routes: accept .srt uploads + pre-flight pairing endpoint Adds .srt to ALLOWED_EXTENSIONS; introduces is_video() / is_srt() helpers. New /pairing-preview/<session_id> endpoint returns pair_map + unpaired lists for the configure UI. Batch execute threads srt_paths into BatchVideoQCExecutor.	2026-05-15 20:45:55 +02:00
nickviljoen	45fe103fd3	Video QC batch: pair SRTs to videos at pre-flight Adds optional srt_paths constructor parameter. At execute() top, runs pair_batch() to produce pair_map / unpaired_srts / unpaired_videos. Threads pair_map[video_path] into each per-video VideoQCExecutor as srt_path. No-op when srt_paths is empty.	2026-05-15 20:44:33 +02:00
nickviljoen	6faf785d61	Video QC: wire srt_structure/srt_timing/srt_language into execute() Skipped when self.srt_path is None (one result per check, weight set so the weighted-average math is unchanged). When set, runs all three checks sequentially with progress updates. SRT results appear as additional cards in the existing Video QC report.	2026-05-15 20:43:37 +02:00
nickviljoen	0a1f116338	SRT QC: add srt_language check (inline Gemini text call) Text-only LLM call samples up to 15 cues (~1500 chars), asks Gemini Flash to identify language. Pass = ISO matches expected from video filename's locale; warning = low confidence or mixed_language; fail = ISO mismatch with high confidence. Weight 20. Note: uses genai.GenerativeModel directly rather than a unified LLMConfig.call_text_api (which doesn't exist yet). Marked TODO for future refactor when that helper is added.	2026-05-15 20:40:21 +02:00
nickviljoen	ae510a7ecd	Merge develop (SRT structure+timing checks) — resolve srt_pairing conflict keeping full implementation	2026-05-15 20:39:14 +02:00
nickviljoen	8493bd645c	SRT QC: add srt_timing check Deterministic. Validates start < end, no overlaps (>=3 overlaps fails, otherwise warns), last cue <= video duration + 0.5s tolerance. Warning-only rules: reading speed 5-25 cps, line length <= 42 chars, lines per cue <= 2, cue duration 0.7-7.0s. Fail = -30, warning = -5 (capped at 50 warning-loss). Weight 10.	2026-05-15 20:34:39 +02:00
nickviljoen	e8ba567590	SRT QC: add srt_structure check Deterministic, no LLM. Parses SRT via the srt library, validates UTF-8 encoding (with chardet fallback for non-UTF-8), no replacement chars, non-empty cue content, ascending cue indices. Fails on parse error / replacement chars / no cues; warnings otherwise. Weight 15.	2026-05-15 20:30:01 +02:00
nickviljoen	239d39d4eb	Video QC: thread srt_path through executor constructor Defaults to None — existing single-file flow unchanged. Used by the upcoming SRT checks and by BatchVideoQCExecutor's pair_batch step.	2026-05-15 20:25:24 +02:00
nickviljoen	b1a657d593	SRT QC: add score_pair + pair_batch score_pair: additive locale (0.5) + campaign code (0.3) + clip-slug substring (0.4), capped at 1.0, with hard-reject on divergent locales or non-overlapping slugs. pair_batch: greedy highest-first assignment above 0.7 threshold; one SRT per video. Verified pairs all 6 videos in testing_15may/srt/ to their SRTs.	2026-05-15 20:24:24 +02:00
nickviljoen	b61d20d084	SRT QC: add parse_video_tokens + parse_srt_tokens Extract campaign code, clip slug, and locale from both video and SRT filenames. Handles the two SRT styles seen in testing_15may/srt/ (campaign-code-prefixed CFUL... form, and abbreviated RIO_INTRO6B form). Verified at REPL against test data.	2026-05-15 20:21:08 +02:00
nickviljoen	df212ea158	SRT QC: scaffold srt_pairing module with normalise_slug + canonical_locale Pure-function helpers, verified at REPL. canonical_locale handles the de-AT ↔ AT-de order flip between SRT and video filenames; normalise_slug strips non-alphanumerics so RIO_INTRO_15C ≈ RIO_INTRO15C.	2026-05-15 20:18:45 +02:00
nickviljoen	2e9f6f43a5	Video QC: register garment_name and title_safe in standard_video profile Profile YAML is descriptive metadata (executor runs unconditionally). Keeping it current so the profile page and any future YAML-driven selection reflects the live check set.	2026-05-15 12:35:11 +02:00
nickviljoen	89a42b0dfa	Video QC: add title_safe advisory check Flags (never fails) when price or garment-name text falls inside known platform UI overlay zones (TikTok / IG Stories / IG Reels / generic vertical). Platform inferred from filename tokens via _infer_platform_zones. Weight 0 in profile — advisory only, never contributes to overall score.	2026-05-15 12:34:24 +02:00
nickviljoen	8d277d2cb3	Video QC: add garment_name check Single Gemini direct-video call detects garment/product text overlays; deterministic match against PricingReference.get_prices() product_name for the file's locale. Skips when no pricing reference attached, locale unparseable, GEN/CEN file, no expected product names for locale, or no on-screen garment text detected. Weight 25 in standard_video profile.	2026-05-15 12:13:19 +02:00
nickviljoen	75063c54f9	Video QC: add product-name normalisation helpers for garment_name Adds _normalize_product_name (lowercase, alphanumeric+space, collapse whitespace) and _product_names_match (substring or >=60% token-set overlap on min side). Used by the upcoming garment_name check.	2026-05-15 12:11:02 +02:00
nickviljoen	4ada4c2d59	Video QC: add platform-zones lookup helper for title_safe Adds _PLATFORM_ZONES (TikTok / IG Stories / IG Reels / generic vertical) and _infer_platform_zones(filename) for use by the new title_safe check. Pure function, verified at REPL against expected filenames. No new behaviour exposed yet — wired up in the next task.	2026-05-15 12:08:42 +02:00
nickviljoen	78f61e0ba2	Video QC: surface matched price + product in price card The price_currency check has always done a full numeric match against the pricing reference but the report card only showed pass/fail by currency. Pull matched_price, matched_product, detected_prices, and expected_prices into the message string so QC reviewers can see the full match at a glance. No logic changes.	2026-05-15 12:01:58 +02:00
nickviljoen	4aa74b114a	HM QC: thread signed-in user into batch executor Single-file QC populated executor.context['user'] from current_user_email() in routes.py, but batch QC routed through BatchQCExecutor — which never accepted a user kwarg or set context['user'] on its per-file QCExecutor instances. Result: every LLM call from a batched HM QC run logged as anonymous in the Usage dashboard, only single-file and Video QC runs showed the user's email. BatchQCExecutor now takes user and stamps it onto each per-file executor's context just before execute(), matching the Video QC batch executor pattern.	2026-05-09 20:40:00 +02:00
nickviljoen	a52d50d549	Reporting: show searched Box folder under the search input Mirrors the hint pattern just added to Video Master so users can see exactly which Box folder the search is scanning, with a clickable link to open it in Box for self-diagnosis when a job number doesn't turn up.	2026-05-09 20:23:06 +02:00
nickviljoen	6b8b8ea5a6	Video Master: revert campaigns folder + lenient name matching The earlier swap to BOX_CAMPAIGNS_FOLDER_ID=133295752718 was wrong — Video Master operates on the automation campaigns folder (156182880490), where subfolders are named by campaign TITLE rather than the numeric job ID used in Reporting's root. Reverted the default in config.py and all three .env example files. Folder naming on Box is inconsistent — '1_CFUL263C01C_Kids drop1' vs '1_CFUL263C01F-Kids drop 2' vs 'Summer Activation 2026' all coexist. search_subfolder now strips every non-alphanumeric character from both the search input and the folder names before substring match, so: "kids drop 1" → matches "1_CFUL263C01C_Kids drop1" "Spring 2026" → matches "4023 Spring 2026" "winterfilm" → matches "1_WA20263C01 Winter Film" Form label/placeholder updated to "Campaign Title" with a hint that spaces/underscores/hyphens/case are all ignored.	2026-05-09 20:19:35 +02:00
nickviljoen	087224976a	Box: search-API-first lookup + 60s enumeration cap The previous search_subfolder implementation paginated the entire parent folder before falling back to Box's indexed search API. With the campaigns folder containing thousands of children, this exceeded even the new 5-minute background-thread cap and surfaced as 'Search timed out after 5 minutes' to the user. Now: 1. Hit the indexed search API first (~1-2s typical, even on huge parents) — returns immediately on a match. 2. Fall back to a streaming enumeration only for fresh folders Box hasn't indexed yet (~10 min latency window). Capped at 60s wall clock so we don't loop forever on a missing campaign. Also improves the not-found error message to mention the indexing latency caveat — handles the otherwise-confusing case where a freshly-created campaign folder isn't searchable for a few minutes.	2026-05-09 20:03:53 +02:00
nickviljoen	a3aee0de2e	Video Master: async campaign search + correct UI labels - /api/search-campaign now kicks off a background thread and returns immediately. The browser polls /api/progress/<session_id> and fetches the cached result via the new /api/search-campaign-result/<session_id> endpoint when complete. Box folder enumeration on a not-found campaign was taking >30s, exceeding the GCP load balancer's response timeout and surfacing as 'stream timeout' (not valid JSON) to the user. - Result cached for 10 min via the existing reporting result_cache (filesystem-backed → safe across gunicorn workers). - Form label/placeholder/hint updated: tool accepts a campaign NUMBER, not a campaign name. Placeholder shows '1993857' instead of '1011A Spring SS2025'.	2026-05-09 19:52:49 +02:00
nickviljoen	a500d7b088	Six tooling fixes from Dev test pass Video QC: * _extract_locale_from_filename now also handles the suffix form ..._XX-yy.ext (case-insensitive both sides), so DOOH/OOH-style adapt filenames like ..._ES-es.mp4 unblock the price_currency check instead of skipping with "could not extract locale". * Batch results page expires the SQLAlchemy session at the top of the route so the post-completion reload sees committed reports even when it lands on a different gunicorn worker than the one that wrote them. Reload delay bumped 1s → 2s for margin. * visual_quality prompt now passes the filename's market+language to the LLM and tells it the on-screen copy should be in the localized language, not the source-language guideline copy. Stops Spanish-market videos being flagged as "language mismatch with English campaign guidelines". Printer Check: * regions.json rewritten to cover all 10 H&M regions (AME, CEU, NEU, GCN, IND, SHE, SEU, EEU, EAS, Franchise) with default-all groups. Two judgement calls vs the screenshot: kept TR for Turkey (TK is Tokelau in ISO and would break filename matching) and BR for Brazil (every other code is 2-letter ISO). Campaign codes: * New core/utils/campaign_code.py is the single source of truth. Matches both the legacy 4-digits-plus-optional-letter (1013A, 4116) and the new 11-char alphanumeric with year at positions 5-6 (CFUL263C01D). All four prior parser sites now import from this helper. Video Master: * BOX_CAMPAIGNS_FOLDER_ID switched 156182880490 → 133295752718 (same root the Reporting tool uses). Updated config.py default and all three .env example files. * Match page now shows which Box folder the search runs against (with a clickable link), and on a not-found error explains what was searched for so missing-campaign cases are self-diagnosable.	2026-05-09 18:32:23 +02:00
nickviljoen	6a2945275a	Reporting: filesystem-back the search-result cache The previous in-memory dict only worked with a single gunicorn worker. With workers=2 in gunicorn_config.py, the async-search worker stored the result in its own process memory while the dashboard request landed on the other worker ~50% of the time — cache miss → fell through to a synchronous Box fetch → exceeded the GCP load balancer's 30s timeout, returning "stream timeout" to the user even though the search itself succeeded. Now stores cache entries as pickled files at storage/cache/<key>.pkl, shared across workers via the existing volume mount. Atomic writes via tempfile + os.replace. TTL still 30 minutes. Public API (cache_set/get/delete/cleanup) is unchanged so call sites in reporting/routes.py continue to work.	2026-05-09 17:46:42 +02:00
nickviljoen	84326352b2	Phase 1: replace local username/password auth with Azure AD SSO Lifted JWT-cookie auth pattern from the AI QC sibling project: core/auth/middleware.py validates Azure AD JWTs and stores them in an httpOnly cookie (hm_aiqc_auth_token). Tenant membership is enforced by JWTValidator's tid check, which is sufficient for the tenant-wide access policy chosen for this project. templates/login.html now drives an MSAL.js popup that POSTs the ID token to /auth/login. base.html exposes Azure config to all pages so the logout button can also clear the MSAL session. app.py's @before_request now checks the JWT cookie and exposes g.user; modules read user identity via core.auth.current_user_email so usage logs and created_by columns now record the signed-in user's email rather than a session value. Legacy username/password code removed: top-level auth_middleware.py, jwt_validator.py, deploy/generate_password.py.	2026-05-09 13:59:29 +02:00
nickviljoen	a0a9d0af47	Reporting: show all jobs in Previous Box Reports Aggregate box_import reports by job_number in SQL instead of fetching the most recent 100 rows and grouping in Python. The row-level LIMIT hid older jobs whenever one job's rows filled the window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 15:15:58 +02:00
nickviljoen	3dd0420145	Video Master: version grouping, 3-pass duration cascade, report download - Folder discovery groups files by version (V1, V2, ...); only the highest version per master/adapt is matched. Lower versions are reported as "superseded" so users can see what was skipped. - Matching is now an asymmetric 3-pass cascade per adaptation: Pass 1: masters of same duration (±0.5s) — pHash + AKAZE Pass 2: masters strictly longer than the adapt — pHash + AKAZE (shorter masters can't have produced the adapt; never compared) Pass 3: AI Vision on same-duration / different-resolution masters, triggered only when Passes 1 and 2 find nothing (covers crops). - AI Vision default switched from gpt-4o to gemini-2.5-flash (~10x cheaper) and re-enabled in CampaignMatcher. - Master temp files now persist for the whole run so Pass 3 can re-read frames; cleanup still happens via shutil.rmtree at end of run. - Report shows a "Resolved at" badge per match (Pass 1/2/3) and a new Superseded Files section. - New /video-master/report/<id>/download endpoint serves the saved HTML with attachment headers; Download buttons added to results.html and view_report.html. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 12:44:43 +02:00
nickviljoen	39383db95f	Pricing refs: Excel support, structured lookup, deterministic price match, video price check A. Excel upload — /campaigns/pricing/upload now accepts .xlsx/.xls alongside .pdf. File picker in the campaigns UI matches. B. Deterministic Excel parser (openpyxl, no LLM) — looks for H&M-style mastersheets: - 'MPC Prices' sheet -> flat list of {product_id, language, country, price, currency, product_name} entries (this is the gold mine). - Regional sheets (AME/CEU/EEU/...) -> formatted prices per locale used to derive currency symbol, position, decimal/thousands separators. Skips OLD/COPY sheets. Verified against the attached 1013A mastersheet: 448 price entries across 7 products x 74 locales, 139 locale format entries. Parser lives in modules/campaigns/pricing_parser.py alongside the existing PDF path (which now also returns the structured form with empty _prices). New lookup shape stored in PricingReference.parsed_data_json: {"_format": {"en-US": {currency_code, symbol, position, ...}, ...}, "_prices": [{product_id, language, country, price, currency, product_name}, ...]} Legacy flat {"<code>": {...}} is still recognised (treated as _format only) for backwards compatibility with the legacy global JSON import. Model helpers added: - PricingReference.get_format_map() - PricingReference.get_prices() to_dict() now reports price_count alongside entry_count. C. Upgraded price_currency_check.py — when a pricing reference with _prices is attached, the check runs a deterministic comparison: detected price(s) -> normalize (_normalize_price handles '$49.99', '39,99 €', 'CHF 49.95', '1.234,56', 'Rs. 2,799', '13 995 Ft', '349,-', '0.999.000'...) -> compare with tol=0.005 against the expected per-locale rows. LLM-based campaign-sheet fallback only runs if no _prices are present (legacy PDF reference or has_pricing campaign presentation). D. Video QC price check — new _run_price_check step in the executor. Parses filename (Market_lang_CampaignNum_... -> 'lang-Market' locale), detects prices across frames via the same Gemini/GPT-4o path the other checks use, then deterministic-validates against the attached pricing reference. Skipped if no pricing ref, unknown locale, GEN/CEN markets, or no price visible in video. Overall video score now uses weighted mean of active (non-skipped) checks (visual_quality w=50, censorship w=50, price_currency w=30) instead of the hardcoded 50/50 split — so skipping any one check falls through cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:52:39 +02:00
nickviljoen	e5d0d468db	Pricing references: standalone library (was single global file) The "Global Pricing Reference" is no longer a single file at storage/reference/global_pricing.json. Pricing references are now first-class DB rows (PricingReference model), uploadable as a library in the Campaigns tab and selectable per-run alongside the campaign presentation dropdown on the HM QC and Video QC configure pages. New: - core/models/pricing_reference.py — PricingReference model: id, name, pdf_filename, pdf_path, parsed_content, parsed_data_json, status, created_at/by. get_lookup() deserializes parsed_data_json; to_dict() powers the dropdown API. - /campaigns/pricing/upload — creates a PricingReference row, saves PDF under storage/pricing_references/<id>/, kicks off background parse. - /campaigns/pricing/<id> DELETE, /campaigns/api/pricing/list, /campaigns/api/pricing/status/<id>. - Campaigns index: "Pricing References" table card (mirrors the presentations card) + upload form with optional name field. Changed: - pricing_parser: parse_pricing_pdf_to_dict returns (dict, raw_text); new parse_pricing_reference(id) runs the parse against a DB row and sets status to ready/error. Legacy file-based path removed. - QCExecutor and VideoQCExecutor accept pricing_reference_id; load the row into context['pricing_reference']={id, name, lookup}. - BatchQCExecutor and BatchVideoQCExecutor thread pricing_reference_id through to per-file executors. - price_currency_check._validate_currency reads context instead of the disk file; returns 'skipped_no_reference' if no ref attached. - HM QC + Video QC /execute and /execute/batch routes pass pricing_reference_id from the JSON payload. - Configure templates for HM QC and Video QC add a second dropdown "Pricing Reference (Optional)" loaded from /campaigns/api/pricing/list. Backwards compatibility: - app.py: on startup, if storage/reference/global_pricing.json exists and the pricing_references table is empty, import it as a "Default (legacy global)" PricingReference row so existing installs keep a valid reference attached (user can pick it at configure time). - config.py: retains GLOBAL_PRICING_{PDF,JSON}_PATH for the legacy importer; adds PRICING_REF_STORAGE_PATH for the new per-row storage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:27:09 +02:00
nickviljoen	a0cc96afaf	Video QC: multi-file batch upload & processing Mirrors the existing HM QC batch pattern so Video QC now supports queueing and processing multiple videos from a single upload. New: - batch_executor.py — BatchVideoQCExecutor, sequential processing (gc.collect() between videos, cooldown between batches), stamps a shared batch_id into each report's metadata_json. - /video-qc/execute/batch — kicks off a BatchVideoQCExecutor thread. - /video-qc/results/batch/<session_id> — batch summary card, per-file list (filename, score, status, view/download), ZIP download link. Reuses results.html with is_batch flag. - /video-qc/report/<id>/download, /video-qc/report/batch/<id>/download (ZIP), /video-qc/report/batch/<id> DELETE. Changed: - VideoQCExecutor accepts batch_id; writes it into metadata when set. - /video-qc/upload accepts multi-file (request.files.getlist('files')) with single-file fallback; returns is_batch/filenames/file_count. - Upload template: drag-and-drop list UI (same pattern as HM QC upload). - Configure template: shows file count + list, swaps button text and POST endpoint based on file_count; redirects to results/batch when batch, results when single. - Video QC index uses QCReport.get_recent_grouped to render "Batch Reports" (collapsible per-batch table) + "Individual Reports". Post-run destinations: - 1 file -> /video-qc/results/<session_id> (unchanged) - N files -> /video-qc/results/batch/<session_id> (batch summary + list of reports from the run) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:58:46 +02:00
nickviljoen	42055d9a7a	Fix Video QC crash and Reporting history dashboard - video_qc/executor.py: escape braces in JSON example blocks inside f-string prompts (visual_quality, censorship). Unescaped { } made Python parse the example as format specifiers, raising "Invalid format specifier ' 85, ..." and failing execution. - reporting/routes.py: history_dashboard now passes reports=parsed_reports (matching the live dashboard route) and attaches friendly_checks per report. Previously passed parsed_reports=friendly_reports, a kwarg the template does not consume, leaving the Parsed Data View accordion empty and breaking the "View Details" scroll-to-file links. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:34:22 +02:00
nickviljoen	6341714899	Split input/output token tracking; refresh provider pricing table UsageLog now records input_tokens and output_tokens separately and costs each side at its real rate. The old single 'blended' rate underpriced input-heavy workloads (vision/QC) and overpriced output-heavy ones. COST_PER_MILLION_TOKENS rebuilt against the live OpenAI, Gemini and Anthropic pricing pages (GPT-5.4 family, GPT-4.x, o4-mini; Gemini 2.5 Pro/Flash/Flash-Lite + 1.5 legacy; Claude 4.7/4.6/4.5 + 3.x legacy). Unknown models now warn instead of silently defaulting to $5/1M. Adds idempotent ALTER TABLE migration on startup so existing SQLite DBs pick up the new columns. Dashboard + API surface the input/output split. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 14:40:13 +02:00
nickviljoen	ffb4745d83	Batch naming, delete batch, consistent results view - Show job number in batch header instead of just "Batch <date>" - Add delete batch button (trash icon) that removes all reports + files - New DELETE /hm-qc/report/batch/<batch_id> route - Unified batch results view: always renders from DB reports (not ephemeral progress tracker data), so the view is identical whether you just completed a batch or navigated back from another tab - Include thumbnails in batch results per-file rows Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:38:25 +02:00
nickviljoen	63b8a04c46	Fix persistent OOM: reduce image size, force GC, recycle workers Still OOM after 7 files despite sequential processing. Root cause: Python's allocator doesn't return freed memory to the OS, so image buffers accumulate across files until the OOM killer strikes. Fixes: - Reduce LLM image max size from 2000px to 1200px (64% less RAM per image, still sufficient for vision analysis) - Always close PIL images immediately (not just when opened locally) - Replace ThreadPoolExecutor with simple sequential loop + gc.collect() after each file to force memory reclamation - Switch gunicorn to gthread (2 workers x 2 threads) for better request concurrency without extra memory overhead - Add max_requests=200 to auto-recycle workers and release accumulated memory Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:17:59 +02:00
nickviljoen	5e3f071344	Fix OOM crash on large batches: reduce concurrency and free image memory Worker was SIGKILL'd by OOM killer during batch QC (18 files). Fixes: - Reduce MAX_CONCURRENT_FILES from 2 to 1 (sequential processing) - Reduce gunicorn workers from 4 to 2 (less memory contention) - Explicitly close PIL images after thumbnail generation - Close BytesIO buffers and PIL images after base64 encoding Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:02:30 +02:00
nickviljoen	d04b86ac04	Add thumbnail to reports, download buttons, and consolidated report - Embed asset thumbnail (base64) in HTML report header - Add view/download buttons to batch results per-file rows - Add download ZIP and consolidated report buttons to batch results - Add view/download buttons to upload page recent reports table - Add download button to individual reports on index page - New POST /hm-qc/report/consolidated route: merges selected reports into a single downloadable HTML with summary table + embedded reports Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:33:32 +02:00
nickviljoen	8a7d477c86	Fix batch QC: add Flask app context to ThreadPoolExecutor child threads ThreadPoolExecutor workers don't inherit the parent thread's Flask app context, causing "Working outside of application context" errors during batch QC execution. Pass the app instance into BatchQCExecutor and wrap each child thread's work with app.app_context(). Also ensure the progress_sessions table is created on fresh databases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:20:56 +02:00
nickviljoen	d036752d17	v2.2.0: Gemini video, batch grouping, thumbnails, speed, price fix, printer check - Video QC: Switch to Google Gemini direct video analysis as default (OpenAI frame grid fallback) - HM QC: Group reports by batch with collapsible sections, ZIP download per batch - HM QC: Generate asset thumbnails (150px) displayed in report listings - Speed: Remove artificial delays, add ThreadPoolExecutor(2) for parallel batch processing - Price detection: Improved prompt with country context, detect all prices, increased text limit - New Printer Check module: CSV-to-PDF cross-referencing ported from CrossMatch Rust app Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:56:07 +02:00
nickviljoen	472862329c	Fix report download: use url_for() instead of hardcoded paths Export/download links in reporting dashboards used hardcoded absolute paths (e.g. /reporting/export/html/...) which bypassed the reverse proxy SCRIPT_NAME prefix (/hm-ai-qc-report), causing "No file" errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 12:53:22 +02:00
nickviljoen	7a3272b7c4	Fix price detection: better error handling, strip markdown fences, log responses - Strip markdown code fences from LLM response before JSON parsing - Log raw response and parsed result for debugging - Show warning with provider/model info when detection fails (instead of silent skip) - Separate "detection failed" (warning, 70) from "no price found" (skipped, 100) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 19:05:51 +02:00
nickviljoen	81a1cd94c9	Add Excel (.xlsx) support for campaign media plans / price sheets - Accept .xlsx/.xls uploads alongside PDFs in campaigns module - New parse_campaign_excel() in services.py using openpyxl - Converts all sheets to structured text (headers + rows) for LLM use - Upload form now accepts both PDF and Excel files - Added openpyxl to requirements.txt Workflow: upload campaign presentation (PDF) + media plan (Excel with has_pricing checked) for the same campaign ID. The price check will use the Excel data to validate actual prices per country. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 18:54:59 +02:00
nickviljoen	2d5fe43031	Support multiple campaign docs + clarify pricing is format-only - Global pricing parser now explicitly extracts format only (symbol, position, separators) — ignores actual price values in the reference doc - Executors load ALL ready documents for a campaign (not just the latest), combining their content — supports guidelines + media plan side by side - Campaign context now separates pricing_content (from has_pricing docs) from general parsed_content (all docs combined) - Price check uses pricing_content specifically for actual price validation - Report header shows document count (e.g., "1022B - AW25 Display (2 docs) + pricing") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 18:47:46 +02:00
nickviljoen	fc15a2dda3	Rewrite filename check + add price/currency check to image QC Filename check: - Rewritten to flexibly parse multiple H&M naming conventions (Display, DOOH, OOH, SOME STATIC, Social, POS, DS) - Extracts country code, language code, dimensions, campaign number - Scores based on how much metadata was extracted (not rigid pattern) - Tested against real filenames: BG_bg, ES_es, NO-no formats Price/currency check (new): - Detects prices in images via LLM vision API - Validates currency against global pricing reference (deterministic) - Falls back to LLM validation for unknown countries - Optional campaign pricing sheet validation when has_pricing=True - Added to profile with weight 30 Profile weights rebalanced: filename 30, quality 40, price 30 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 18:39:54 +02:00
nickviljoen	dc73268309	Fix report download 404 and add campaign info to reports - Add /report/<id>/download route using send_file instead of broken static file URL (fixes 404 on Download Report button) - Add campaign label to HTML report header (Campaign: ID - Name) - Store campaign_id in report metadata_json for traceability Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 18:26:18 +02:00
nickviljoen	392e0e5864	Fix campaign upload: threading context, progress bar, auto-refresh table - Fix background parsing thread: pass app reference explicitly instead of trying to access current_app inside the thread (was silently failing) - Add progress bar with animated stages during upload and parsing - Add data-id/data-status attributes to table rows for auto-polling - On page load, automatically poll any pending/parsing rows and update their status badges in-place (fixes stale "Pending" on tab return) - Immediately inject new row into table after upload so user sees it without needing to refresh - Remove broken _parse_pricing_background function Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 18:03:13 +02:00
nickviljoen	9c33858726	Add campaign presentation management and global pricing reference Introduces a new Campaigns module for uploading campaign presentation PDFs that QC checks reference to validate assets against campaign-specific guidelines (typography, layout, copy, pricing format). Also adds a global pricing reference system that maps country codes to currency symbols and formats for deterministic price/currency validation. - New CampaignPresentation model + campaigns blueprint with CRUD routes - PDF parsing via LlamaParse (text + multimodal page images) - Global pricing PDF parsed into structured JSON lookup - Campaign context injected into both image and video QC executors - Quality checks enhanced with campaign guidelines in LLM prompts - Price/currency check uses global pricing lookup (saves an LLM call) - Campaign dropdown added to HM QC and Video QC configure pages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 16:12:22 +02:00
nickviljoen	5267e590eb	Disable AKAZE for campaign matching — temp files deleted before use AKAZE tier needs the actual video file to extract frames, but our temp-download-and-delete approach means the file is gone by that point. Perceptual hash (Tier 1) works fine with saved fingerprint data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 22:55:42 +02:00
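
Note on 6b8b8ea5a6 and df212ea158: the lenient name matching those commits describe (strip every non-alphanumeric character from both sides, then substring-match) can be sketched roughly as below. This is an illustration only. `normalise` and `lenient_match` are hypothetical names, not the project's actual `search_subfolder` or `normalise_slug` code, and the sketch assumes ASCII-only names, so accented letters would be dropped as well.

```python
import re


def normalise(text: str) -> str:
    """Lowercase the text and drop every character that is not a-z or 0-9."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def lenient_match(query: str, folder_name: str) -> bool:
    """Return True when the normalised query is a substring of the normalised name.

    An empty query (after normalisation) never matches, so a search for
    "---" does not match every folder.
    """
    needle = normalise(query)
    return bool(needle) and needle in normalise(folder_name)
```

With the folder names quoted in the commit message, `lenient_match("kids drop 1", "1_CFUL263C01C_Kids drop1")`, `lenient_match("Spring 2026", "4023 Spring 2026")` and `lenient_match("winterfilm", "1_WA20263C01 Winter Film")` all hold, and `RIO_INTRO_15C` normalises to the same string as `RIO_INTRO15C`.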


70 commits
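
Note on 6a2945275a: the filesystem-backed cache with atomic writes via tempfile + os.replace could look something like the sketch below. It is not the project's code. The cache directory, the `_entry_path` helper, the entry layout and the reduced API (only `cache_set` and `cache_get`) are assumptions made for illustration; only the pickled `<key>.pkl` files, the tempfile + os.replace write and the 30-minute TTL come from the commit message. Keys are assumed to be filesystem-safe.

```python
import os
import pickle
import tempfile
import time
from pathlib import Path

# Hypothetical location; the commit stores entries under storage/cache/<key>.pkl.
CACHE_DIR = Path(tempfile.gettempdir()) / "qc_cache_sketch"
TTL_SECONDS = 30 * 60  # 30-minute TTL, as stated in the commit message


def _entry_path(key: str) -> Path:
    return CACHE_DIR / f"{key}.pkl"


def cache_set(key: str, value) -> None:
    """Write the entry to a temp file, then atomically swap it into place."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(CACHE_DIR), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump({"stored_at": time.time(), "value": value}, fh)
        # Readers in other workers see either the old file or the new one, never a partial write.
        os.replace(tmp_name, _entry_path(key))
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def cache_get(key: str, default=None):
    """Return the cached value, or default when the entry is missing, unreadable or expired."""
    try:
        with open(_entry_path(key), "rb") as fh:
            entry = pickle.load(fh)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        return default
    if time.time() - entry["stored_at"] > TTL_SECONDS:
        return default
    return entry["value"]
```

Because every worker reads and writes the same directory, a result stored by one gunicorn worker is visible to the other, which is the failure mode the commit describes for the in-memory dict.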