Commit graph

30 commits

Author SHA1 Message Date
nickviljoen
4aa74b114a HM QC: thread signed-in user into batch executor
Single-file QC populated executor.context['user'] from current_user_email()
in routes.py, but batch QC routed through BatchQCExecutor — which never
accepted a user kwarg or set context['user'] on its per-file QCExecutor
instances. Result: every LLM call from a batched HM QC run was logged
as anonymous in the Usage dashboard; only single-file and Video QC
runs showed the user's email.

BatchQCExecutor now takes user and stamps it onto each per-file
executor's context just before execute(), matching the Video QC
batch executor pattern.
2026-05-09 20:40:00 +02:00
nickviljoen
a500d7b088 Six tooling fixes from Dev test pass
Video QC:
* _extract_locale_from_filename now also handles the suffix form
  ..._XX-yy.ext (case-insensitive both sides), so DOOH/OOH-style
  adapt filenames like ..._ES-es.mp4 unblock the price_currency
  check instead of skipping with "could not extract locale".
* Batch results page expires the SQLAlchemy session at the top of
  the route so the post-completion reload sees committed reports
  even when it lands on a different gunicorn worker than the one
  that wrote them. Reload delay bumped 1s → 2s for margin.
* visual_quality prompt now passes the filename's market+language
  to the LLM and tells it the on-screen copy should be in the
  localized language, not the source-language guideline copy.
  Stops Spanish-market videos being flagged as "language mismatch
  with English campaign guidelines".
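
The suffix form handled by the first bullet above can be sketched roughly as follows. This is a minimal illustration, not the project's actual _extract_locale_from_filename: the function name, the regex, and the example filenames are assumptions, and the 'lang-MARKET' output order is taken from the Video QC price check description elsewhere in this log.

```python
import re
from typing import Optional

# Hypothetical pattern: a trailing "_XX-yy" pair right before the extension.
# Both halves are matched case-insensitively, as the commit describes.
_SUFFIX_LOCALE = re.compile(r"_([A-Za-z]{2})-([A-Za-z]{2})\.[A-Za-z0-9]+$")


def extract_locale_from_suffix(filename: str) -> Optional[str]:
    """Return 'lang-MARKET' (e.g. 'es-ES'), or None if no suffix pair is found."""
    match = _SUFFIX_LOCALE.search(filename)
    if match is None:
        return None
    market = match.group(1).upper()
    lang = match.group(2).lower()
    return f"{lang}-{market}"
```

Returning None rather than raising lets the caller keep the existing "could not extract locale" skip path for filenames that match neither form.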

Printer Check:
* regions.json rewritten to cover all 10 H&M regions (AME, CEU,
  NEU, GCN, IND, SHE, SEU, EEU, EAS, Franchise) with default-all
  groups. Two judgement calls vs the screenshot: kept TR for
  Turkey (TK is Tokelau in ISO and would break filename matching)
  and BR for Brazil (every other code is 2-letter ISO).

Campaign codes:
* New core/utils/campaign_code.py is the single source of truth.
  Matches both the legacy 4-digits-plus-optional-letter (1013A,
  4116) and the new 11-char alphanumeric with year at positions
  5-6 (CFUL263C01D). All four prior parser sites now import from
  this helper.
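
A minimal sketch of the two code shapes described above, assuming the new format is four alphanumerics, a two-digit year, then five alphanumerics (inferred from the single example CFUL263C01D). The function name, return values, and upper-casing step are hypothetical and may differ from core/utils/campaign_code.py.

```python
import re
from typing import Optional

# Legacy: exactly four digits plus an optional trailing letter (1013A, 4116).
_LEGACY = re.compile(r"\d{4}[A-Z]?")
# New: 11 alphanumeric characters with a two-digit year at positions 5-6.
_NEW = re.compile(r"[A-Z0-9]{4}\d{2}[A-Z0-9]{5}")


def classify_campaign_code(code: str) -> Optional[str]:
    """Return 'legacy', 'new', or None for an unrecognised code."""
    candidate = code.strip().upper()  # assumption: codes are case-insensitive
    if _LEGACY.fullmatch(candidate):
        return "legacy"
    if _NEW.fullmatch(candidate):
        return "new"
    return None
```

Using fullmatch keeps a longer token such as a five-digit number from being accepted as a legacy code by accident.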

Video Master:
* BOX_CAMPAIGNS_FOLDER_ID switched 156182880490 → 133295752718
  (same root the Reporting tool uses). Updated config.py default
  and all three .env example files.
* Match page now shows which Box folder the search runs against
  (with a clickable link), and on a not-found error explains what
  was searched for so missing-campaign cases are self-diagnosable.
2026-05-09 18:32:23 +02:00
nickviljoen
84326352b2 Phase 1: replace local username/password auth with Azure AD SSO
Lifted JWT-cookie auth pattern from the AI QC sibling project:
  core/auth/middleware.py validates Azure AD JWTs and stores them in
  an httpOnly cookie (hm_aiqc_auth_token). Tenant membership is
  enforced by JWTValidator's tid check, which is sufficient for the
  tenant-wide access policy chosen for this project.

  templates/login.html now drives an MSAL.js popup that POSTs the
  ID token to /auth/login. base.html exposes Azure config to all
  pages so the logout button can also clear the MSAL session.

  app.py's @before_request now checks the JWT cookie and exposes
  g.user; modules read user identity via core.auth.current_user_email
  so usage logs and created_by columns now record the signed-in
  user's email rather than a session value.

  Legacy username/password code removed: top-level auth_middleware.py,
  jwt_validator.py, deploy/generate_password.py.
2026-05-09 13:59:29 +02:00
nickviljoen
39383db95f Pricing refs: Excel support, structured lookup, deterministic price match, video price check
A. Excel upload — /campaigns/pricing/upload now accepts .xlsx/.xls
   alongside .pdf. File picker in the campaigns UI matches.

B. Deterministic Excel parser (openpyxl, no LLM) — looks for H&M-style
   mastersheets:
     - 'MPC Prices' sheet -> flat list of {product_id, language, country,
       price, currency, product_name} entries (this is the gold mine).
     - Regional sheets (AME/CEU/EEU/...) -> formatted prices per locale
       used to derive currency symbol, position, decimal/thousands
       separators. Skips OLD/COPY sheets.
   Verified against the attached 1013A mastersheet: 448 price entries
   across 7 products and 74 locales, 139 locale format entries.

   Parser lives in modules/campaigns/pricing_parser.py alongside the
   existing PDF path (which now also returns the structured form with
   empty _prices).

   New lookup shape stored in PricingReference.parsed_data_json:
     {"_format": {"en-US": {currency_code, symbol, position, ...}, ...},
      "_prices": [{product_id, language, country, price, currency,
                   product_name}, ...]}
   Legacy flat {"<code>": {...}} is still recognised (treated as _format
   only) for backwards compatibility with the legacy global JSON import.

   Model helpers added:
     - PricingReference.get_format_map()
     - PricingReference.get_prices()
   to_dict() now reports price_count alongside entry_count.
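
As a rough illustration of how the structured and legacy shapes might be told apart, the sketch below splits an already-deserialized parsed_data_json dict into a format map and a price list. It is a standalone helper written for this example (split_pricing_lookup is a hypothetical name), not the code behind get_format_map() or get_prices().

```python
from typing import Any


def split_pricing_lookup(
    parsed: dict[str, Any],
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Return (format_map, prices) for either lookup shape."""
    if "_format" in parsed or "_prices" in parsed:
        # New structured shape: both keys are optional individually.
        return dict(parsed.get("_format", {})), list(parsed.get("_prices", []))
    # Legacy flat shape: every top-level key is a locale/country code,
    # so the whole dict is treated as format data with no prices.
    return dict(parsed), []
```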

C. Upgraded price_currency_check.py — when a pricing reference with
   _prices is attached, the check runs a deterministic comparison:
   detected price(s) -> normalize (_normalize_price handles '$49.99',
   '39,99 €', 'CHF 49.95', '1.234,56', 'Rs. 2,799', '13 995 Ft', '349,-',
   '0.999.000'...) -> compare with tol=0.005 against the expected
   per-locale rows. LLM-based campaign-sheet fallback only runs if no
   _prices are present (legacy PDF reference or has_pricing campaign
   presentation).
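
The normalize-then-compare step can be sketched as below. This is a simplified stand-in for _normalize_price, whose real implementation is not shown here: the heuristic (a lone separator followed by anything other than three digits is a decimal point) and the reading of tol=0.005 as an absolute tolerance are both assumptions.

```python
import re
from typing import Optional

# First run of digits, optionally containing '.', ',' or whitespace separators.
_NUMERIC_RUN = re.compile(r"\d[\d.,\s]*")


def normalize_price(text: str) -> Optional[float]:
    """Turn a displayed price such as '1.234,56' into a float, or None."""
    match = _NUMERIC_RUN.search(text)
    if match is None:
        return None
    # Drop spaces used as thousands separators and trailing ',' / '.' ("349,-").
    token = re.sub(r"\s", "", match.group(0)).rstrip(".,")
    last_sep = max(token.rfind("."), token.rfind(","))
    if last_sep == -1:
        return float(token)
    sep = token[last_sep]
    digits_after = len(token) - last_sep - 1
    # Heuristic: one occurrence, not followed by exactly 3 digits -> decimal.
    if token.count(sep) == 1 and digits_after != 3:
        other = "," if sep == "." else "."
        integer_part = token[:last_sep].replace(other, "")
        return float(f"{integer_part}.{token[last_sep + 1:]}")
    # Otherwise every separator is a thousands separator.
    return float(token.replace(".", "").replace(",", ""))


def prices_match(detected: float, expected: float, tol: float = 0.005) -> bool:
    """Compare two prices with an absolute tolerance (assumed semantics)."""
    return abs(detected - expected) <= tol
```

The three-digit rule is ambiguous for currencies that use three decimal places, so a production version would likely consult the per-locale format data instead of guessing.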

D. Video QC price check — new _run_price_check step in the executor.
   Parses filename (Market_lang_CampaignNum_... -> 'lang-Market' locale),
   detects prices across frames via the same Gemini/GPT-4o path the
   other checks use, then deterministic-validates against the attached
   pricing reference. Skipped if no pricing ref, unknown locale, GEN/CEN
   markets, or no price visible in video.

   Overall video score now uses weighted mean of active (non-skipped)
   checks (visual_quality w=50, censorship w=50, price_currency w=30)
   instead of the hardcoded 50/50 split — so skipping any one check
   falls through cleanly.
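
A minimal sketch of the weighted mean over active checks, using the weights quoted above. Representing a skipped check as None and the function name overall_score are assumptions made for illustration.

```python
from typing import Optional

# Weights as stated in the commit message.
WEIGHTS = {"visual_quality": 50, "censorship": 50, "price_currency": 30}


def overall_score(scores: dict[str, Optional[float]]) -> Optional[float]:
    """Weighted mean of the checks that actually ran (None means skipped)."""
    active = {
        name: score
        for name, score in scores.items()
        if score is not None and name in WEIGHTS
    }
    total_weight = sum(WEIGHTS[name] for name in active)
    if total_weight == 0:
        return None  # every check was skipped
    weighted_sum = sum(WEIGHTS[name] * score for name, score in active.items())
    return weighted_sum / total_weight
```

With price_currency skipped, scores of 80 and 100 average to 90 over a total weight of 100; with all three checks active the divisor becomes 130.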

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:52:39 +02:00
nickviljoen
e5d0d468db Pricing references: standalone library (was single global file)
The "Global Pricing Reference" is no longer a single file at
storage/reference/global_pricing.json. Pricing references are now
first-class DB rows (PricingReference model), uploadable as a library
in the Campaigns tab and selectable per-run alongside the campaign
presentation dropdown on the HM QC and Video QC configure pages.

New:
- core/models/pricing_reference.py — PricingReference model: id, name,
  pdf_filename, pdf_path, parsed_content, parsed_data_json, status,
  created_at/by. get_lookup() deserializes parsed_data_json; to_dict()
  powers the dropdown API.
- /campaigns/pricing/upload — creates a PricingReference row, saves PDF
  under storage/pricing_references/<id>/, kicks off background parse.
- /campaigns/pricing/<id> DELETE, /campaigns/api/pricing/list,
  /campaigns/api/pricing/status/<id>.
- Campaigns index: "Pricing References" table card (mirrors the
  presentations card) + upload form with optional name field.

Changed:
- pricing_parser: parse_pricing_pdf_to_dict returns (dict, raw_text);
  new parse_pricing_reference(id) runs the parse against a DB row and
  sets status to ready/error. Legacy file-based path removed.
- QCExecutor and VideoQCExecutor accept pricing_reference_id; load the
  row into context['pricing_reference']={id, name, lookup}.
- BatchQCExecutor and BatchVideoQCExecutor thread pricing_reference_id
  through to per-file executors.
- price_currency_check._validate_currency reads context instead of the
  disk file; returns 'skipped_no_reference' if no ref attached.
- HM QC + Video QC /execute and /execute/batch routes pass
  pricing_reference_id from the JSON payload.
- Configure templates for HM QC and Video QC add a second dropdown
  "Pricing Reference (Optional)" loaded from /campaigns/api/pricing/list.

Backwards compatibility:
- app.py: on startup, if storage/reference/global_pricing.json exists
  and the pricing_references table is empty, import it as a
  "Default (legacy global)" PricingReference row so existing installs
  keep a valid reference attached (user can pick it at configure time).
- config.py: retains GLOBAL_PRICING_{PDF,JSON}_PATH for the legacy
  importer; adds PRICING_REF_STORAGE_PATH for the new per-row storage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:27:09 +02:00
nickviljoen
ffb4745d83 Batch naming, delete batch, consistent results view
- Show job number in batch header instead of just "Batch <date>"
- Add delete batch button (trash icon) that removes all reports + files
- New DELETE /hm-qc/report/batch/<batch_id> route
- Unified batch results view: always renders from DB reports (not
  ephemeral progress tracker data), so the view is identical whether
  you just completed a batch or navigated back from another tab
- Include thumbnails in batch results per-file rows

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:38:25 +02:00
nickviljoen
63b8a04c46 Fix persistent OOM: reduce image size, force GC, recycle workers
Still OOM after 7 files despite sequential processing. Root cause:
CPython's allocator often holds on to freed memory instead of returning
it to the OS, so image buffers accumulate across files until the OOM
killer strikes.

Fixes:
- Reduce LLM image max size from 2000px to 1200px (64% less RAM per
  image, still sufficient for vision analysis)
- Always close PIL images immediately (not just when opened locally)
- Replace ThreadPoolExecutor with simple sequential loop + gc.collect()
  after each file to force memory reclamation
- Switch gunicorn to gthread (2 workers x 2 threads) for better
  request concurrency without extra memory overhead
- Add max_requests=200 to auto-recycle workers and release accumulated
  memory
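
The 64% figure in the first fix follows from pixel count scaling with the square of the side length. A quick check, assuming memory is proportional to pixel count and the aspect ratio is unchanged (pixel_reduction is a name made up for this example):

```python
def pixel_reduction(old_max_side: int, new_max_side: int) -> float:
    """Fraction of pixels saved when the longest side is scaled down."""
    ratio = new_max_side / old_max_side
    # Both dimensions shrink by the same ratio, so area shrinks by ratio squared.
    return 1.0 - ratio ** 2
```

For 2000px to 1200px the ratio is 0.6, so 0.36 of the pixels remain and 0.64 are saved.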

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:17:59 +02:00
nickviljoen
5e3f071344 Fix OOM crash on large batches: reduce concurrency and free image memory
Worker was SIGKILL'd by OOM killer during batch QC (18 files). Fixes:
- Reduce MAX_CONCURRENT_FILES from 2 to 1 (sequential processing)
- Reduce gunicorn workers from 4 to 2 (less memory contention)
- Explicitly close PIL images after thumbnail generation
- Close BytesIO buffers and PIL images after base64 encoding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:02:30 +02:00
nickviljoen
d04b86ac04 Add thumbnail to reports, download buttons, and consolidated report
- Embed asset thumbnail (base64) in HTML report header
- Add view/download buttons to batch results per-file rows
- Add download ZIP and consolidated report buttons to batch results
- Add view/download buttons to upload page recent reports table
- Add download button to individual reports on index page
- New POST /hm-qc/report/consolidated route: merges selected reports
  into a single downloadable HTML with summary table + embedded reports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:33:32 +02:00
nickviljoen
8a7d477c86 Fix batch QC: add Flask app context to ThreadPoolExecutor child threads
ThreadPoolExecutor workers don't inherit the parent thread's Flask app
context, causing "Working outside of application context" errors during
batch QC execution. Pass the app instance into BatchQCExecutor and wrap
each child thread's work with app.app_context(). Also ensure the
progress_sessions table is created on fresh databases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:20:56 +02:00
nickviljoen
d036752d17 v2.2.0: Gemini video, batch grouping, thumbnails, speed, price fix, printer check
- Video QC: Switch to Google Gemini direct video analysis as default (OpenAI frame grid fallback)
- HM QC: Group reports by batch with collapsible sections, ZIP download per batch
- HM QC: Generate asset thumbnails (150px) displayed in report listings
- Speed: Remove artificial delays, add ThreadPoolExecutor(2) for parallel batch processing
- Price detection: Improved prompt with country context, detect all prices, increased text limit
- New Printer Check module: CSV-to-PDF cross-referencing ported from CrossMatch Rust app

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:56:07 +02:00
nickviljoen
7a3272b7c4 Fix price detection: better error handling, strip markdown fences, log responses
- Strip markdown code fences from LLM response before JSON parsing
- Log raw response and parsed result for debugging
- Show warning with provider/model info when detection fails (instead of silent skip)
- Separate "detection failed" (warning, 70) from "no price found" (skipped, 100)
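
The fence stripping and the failed-versus-empty distinction might look roughly like this. The response shape (a JSON object with a "prices" list), the function names, and the returned dict are hypothetical; only the statuses and scores (warning 70, skipped 100) come from the commit message.

```python
import json
from typing import Any

FENCE = "`" * 3  # a markdown code fence, built here to keep this example readable


def strip_code_fences(text: str) -> str:
    """Remove a surrounding markdown code fence from an LLM response, if any."""
    stripped = text.strip()
    if stripped.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag.
        _, _, stripped = stripped.partition("\n")
        if stripped.rstrip().endswith(FENCE):
            stripped = stripped.rstrip()[: -len(FENCE)]
    return stripped.strip()


def interpret_detection(raw: str) -> dict[str, Any]:
    """Classify a raw price-detection response."""
    try:
        data = json.loads(strip_code_fences(raw))
    except json.JSONDecodeError:
        # Detection failed: surface a warning instead of skipping silently.
        return {"status": "warning", "score": 70, "prices": []}
    prices = data.get("prices", []) if isinstance(data, dict) else []
    if not prices:
        # The model answered cleanly but saw no price in the asset.
        return {"status": "skipped", "score": 100, "prices": []}
    # Scoring of detected prices is left to the later validation step.
    return {"status": "detected", "score": None, "prices": prices}
```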

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:05:51 +02:00
nickviljoen
2d5fe43031 Support multiple campaign docs + clarify pricing is format-only
- Global pricing parser now explicitly extracts format only (symbol,
  position, separators) — ignores actual price values in the reference doc
- Executors load ALL ready documents for a campaign (not just the latest),
  combining their content — supports guidelines + media plan side by side
- Campaign context now separates pricing_content (from has_pricing docs)
  from general parsed_content (all docs combined)
- Price check uses pricing_content specifically for actual price validation
- Report header shows document count (e.g., "1022B - AW25 Display (2 docs) + pricing")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:47:46 +02:00
nickviljoen
fc15a2dda3 Rewrite filename check + add price/currency check to image QC
Filename check:
- Rewritten to flexibly parse multiple H&M naming conventions
  (Display, DOOH, OOH, SOME STATIC, Social, POS, DS)
- Extracts country code, language code, dimensions, campaign number
- Scores based on how much metadata was extracted (not rigid pattern)
- Tested against real filenames: BG_bg, ES_es, NO-no formats

Price/currency check (new):
- Detects prices in images via LLM vision API
- Validates currency against global pricing reference (deterministic)
- Falls back to LLM validation for unknown countries
- Optional campaign pricing sheet validation when has_pricing=True
- Added to profile with weight 30

Profile weights rebalanced: filename 30, quality 40, price 30

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:39:54 +02:00
nickviljoen
dc73268309 Fix report download 404 and add campaign info to reports
- Add /report/<id>/download route using send_file instead of broken
  static file URL (fixes 404 on Download Report button)
- Add campaign label to HTML report header (Campaign: ID - Name)
- Store campaign_id in report metadata_json for traceability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:26:18 +02:00
nickviljoen
9c33858726 Add campaign presentation management and global pricing reference
Introduces a new Campaigns module for uploading campaign presentation PDFs
that QC checks reference to validate assets against campaign-specific
guidelines (typography, layout, copy, pricing format). Also adds a global
pricing reference system that maps country codes to currency symbols and
formats for deterministic price/currency validation.

- New CampaignPresentation model + campaigns blueprint with CRUD routes
- PDF parsing via LlamaParse (text + multimodal page images)
- Global pricing PDF parsed into structured JSON lookup
- Campaign context injected into both image and video QC executors
- Quality checks enhanced with campaign guidelines in LLM prompts
- Price/currency check uses global pricing lookup (saves an LLM call)
- Campaign dropdown added to HM QC and Video QC configure pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 16:12:22 +02:00
nickviljoen
b4e94ad4eb Update default Google model to gemini-2.5-flash
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:59:00 +02:00
nickviljoen
e910e00edf Add Usage Dashboard with token tracking, cost estimates, and filters
- New UsageLog model tracking every LLM API call (provider, model,
  tokens, estimated cost, user, module, check name)
- Instrument LLMConfig.call_vision_api() to auto-log each call
- New /usage tab in nav bar with dashboard showing:
  - Summary cards (total calls, tokens, estimated cost)
  - Breakdowns by provider, model, tool, and user
  - Recent API calls table
  - Time filters (All Time, 30 Days, 7 Days, Today)
- Cost estimates based on per-model token pricing
- Pass logged-in user through executor context for tracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:17:21 +02:00
nickviljoen
b4abbe8d2d Add delete buttons for reports in both HM QC and Reporting sections
- HM QC: trash icon per report row, DELETE /hm-qc/report/<id> removes
  DB record and file from disk
- Reporting: trash icon per Box job row, DELETE /reporting/history/delete/<job>
  removes all saved Box reports for that job number
- Confirmation prompts before deletion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:13:03 +02:00
nickviljoen
71ddf7892f Add View button to previous QC reports to open saved HTML report
- Add /hm-qc/report/<id> route to serve saved reports by database ID
- Create view_report.html template with score summary and embedded report iframe
- Add "View" button column to Previous QC Reports table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:55:15 +02:00
nickviljoen
5e291723a0 Swap dimension_check back to filename_parse, strengthen text legibility prompt
- Replace dimension_check with filename_parse in H&M Image Check profile
- Rewrite quality check prompt to be much stricter on text legibility:
  - Text legibility is now the #1 priority (CRITICAL check)
  - Any illegible text forces score below 70 (FAILED)
  - Explicit instructions to check ALL text including small overlays
  - Low contrast text on dark/busy backgrounds flagged as common failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:38:01 +02:00
nickviljoen
23fda1ec70 Move QC reports section from Reporting tab to HM QC tab
- Remove "Previous QC Reports" table from reporting index
- Add "Previous QC Reports" table to HM QC index page
- Update HM QC index route to pass recent reports
- Update feature list to reflect current checks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:16:41 +02:00
nickviljoen
91dec41e0b Batch 3: Add title legibility check, Google Gemini support, LLM provider selector
- Update image quality prompt to evaluate text/title legibility
- Add Google Gemini (generativeai) as LLM provider in LLMConfig
- Add AI Provider dropdown on configure page (OpenAI GPT-4o / Google Gemini)
- Pass selected provider through execute routes to override profile defaults
- Add google-generativeai to requirements.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 16:53:07 +02:00
nickviljoen
1c582ffcf4 Batch 2: Simplify to single profile, fix multi-file batch execution
- Replace 3 profiles with single "H&M Image Check" (dimension_check + image_quality)
- Remove filename_parse check (pattern didn't match actual filenames)
- Create DimensionCheck class for image dimension validation
- Fix configure page to route multi-file uploads to batch endpoint
- Auto-select single profile, show file list on configure page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 16:50:35 +02:00
nickviljoen
9ce44981eb Batch 1: Fix navigation and add past reports views
- Fix back navigation on reporting dashboards (it linked to / instead of /reporting/index)
- Add "Run Another QC" button on HM QC results page
- Add Recent Reports table on reporting search page (grouped by job number)
- Add Recent QC Reports table on HM QC upload page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 16:48:24 +02:00
nickviljoen
f21e41afc3 v1.2.0: Add Docker deployment, simplify auth to local login, production config
- Add Dockerfile, docker-compose.yml, .dockerignore for containerised deployment
- Add deploy/ scripts (deploy.sh, nginx/apache configs, password generator)
- Replace MSAL/Azure AD auth with local username/password authentication
- Add login.html template
- Simplify app.py, middleware, and auth routes for production use
- Update gunicorn_config.py and wsgi.py for Docker/production
- Update templates to work with new auth and URL prefix handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 14:37:53 +02:00
nickviljoen
ffd8b7303c v1.1.0: Add progress tracking, CSV export, multi-job support, batch processing, and security fixes
- Reporting: async search with SSE progress bar, CSV export with Box file links,
  multi-job support, designer-friendly error display with action guidance
- HM QC: batch file upload (up to 100 files), batch execution with rate limiting,
  batch results summary
- Fix: SQLAlchemy stale cache in SSE progress streaming (expire_all + commit)
- Fix: Box folder pagination loop (use the search API instead of iterating 10,300 folders)
- Fix: HM QC blank screen (progress.js not loaded, hardcoded wrong URLs)
- Security: remove hardcoded API keys from legacy files, read from .env instead

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 09:43:20 +02:00
nickviljoen
35a15bfe09 Update documentation for unified platform consolidation
- Rewrite CHANGELOG.md to cover platform v1.0.0 and auth fix,
  with reporting module history preserved as subsection
- Replace stale DOCUMENTATION_SUMMARY.txt with current project
  structure and key decisions
- Rewrite MIGRATION_GUIDE.md to document legacy tool consolidation
  with complete file mappings for hm_qc and video_qc
- Add legacy context headers to module docs (legacy_README,
  legacy_DEV_SETUP, legacy_CLAUDE) pointing to main README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 13:51:21 +02:00
nickviljoen
677736943a Consolidate legacy hm_qc and video_qc tools into main project
Merge original CLI check implementations from hm_qc/ and
hm_qc_video/ repos into modules/*/checks/legacy/ directories.
Includes profiles, launchers, utils, orchestrators, and the
standalone video Flask web app. Reference files (test data,
results, cheat sheets) copied to gitignored reference/ directory.
Censorship trainset copied to gitignored data/supporting/.

The legacy/ naming convention separates original run_check()
function-based implementations from the new BaseCheck class
architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 11:40:53 +02:00
nickviljoen
e6f3e9387e Add modular architecture, core framework, and web UI
New blueprint-based module system (hm_qc, video_qc, video_master,
reporting), core framework (database, config, templates), and
unified web interface with progress tracking and tab navigation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 11:39:04 +02:00