Commit graph

15 commits

Author SHA1 Message Date
nickviljoen
a500d7b088 Six tooling fixes from Dev test pass
Video QC:
* _extract_locale_from_filename now also handles the suffix form
  ..._XX-yy.ext (case-insensitive both sides), so DOOH/OOH-style
  adapt filenames like ..._ES-es.mp4 unblock the price_currency
  check instead of skipping with "could not extract locale".
* Batch results page expires the SQLAlchemy session at the top of
  the route so the post-completion reload sees committed reports
  even when it lands on a different gunicorn worker than the one
  that wrote them. Reload delay bumped 1s → 2s for margin.
* visual_quality prompt now passes the filename's market+language
  to the LLM and tells it the on-screen copy should be in the
  localized language, not the source-language guideline copy.
  Stops Spanish-market videos being flagged as "language mismatch
  with English campaign guidelines".
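
A minimal sketch of the suffix-form handling from the first bullet,
assuming the locale is the last "_XX-yy" token before the extension.
The function name and regex are illustrative only, not the actual
_extract_locale_from_filename code; the "lang-MARKET" output shape
follows the convention used by the Video QC price check.

```python
import re
from typing import Optional

# Hypothetical pattern: "_XX-yy" directly before the file extension.
# Letters are matched in either case, per "case-insensitive both sides".
_SUFFIX_LOCALE = re.compile(r"_([A-Za-z]{2})-([A-Za-z]{2})\.[A-Za-z0-9]+$")


def extract_locale_suffix(filename: str) -> Optional[str]:
    """Return 'lang-MARKET' (e.g. 'es-ES') or None if no suffix locale is found."""
    match = _SUFFIX_LOCALE.search(filename)
    if match is None:
        return None
    market = match.group(1).upper()  # first half is the market code
    lang = match.group(2).lower()    # second half is the language code
    return f"{lang}-{market}"
```

Returning None rather than raising lets the caller keep the existing
"could not extract locale" skip path for filenames with no locale.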

Printer Check:
* regions.json rewritten to cover all 10 H&M regions (AME, CEU,
  NEU, GCN, IND, SHE, SEU, EEU, EAS, Franchise) with default-all
  groups. Two judgement calls vs the screenshot: kept TR for
  Turkey (TK is Tokelau in ISO 3166-1 and would break filename
  matching) and BR for Brazil (every other code is 2-letter ISO).

Campaign codes:
* New core/utils/campaign_code.py is the single source of truth.
  Matches both the legacy 4-digits-plus-optional-letter (1013A,
  4116) and the new 11-char alphanumeric with year at positions
  5-6 (CFUL263C01D). All four prior parser sites now import from
  this helper.
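
The two accepted shapes could be matched roughly like this. This is a
sketch, not the contents of core/utils/campaign_code.py: the function
name, the regexes, and the assumption that positions 5-6 are always
digits (a two-digit year) are illustrative.

```python
import re
from typing import Optional

# Legacy: four digits plus an optional trailing letter (1013A, 4116).
_LEGACY = re.compile(r"\d{4}[A-Z]?")
# New: 11 alphanumeric characters, assumed to carry a two-digit year
# at 1-indexed positions 5-6 (CFUL263C01D -> "26").
_NEW = re.compile(r"[A-Z0-9]{4}\d{2}[A-Z0-9]{5}")


def classify_campaign_code(code: str) -> Optional[str]:
    """Return 'legacy', 'new', or None for an unrecognised code."""
    candidate = code.strip().upper()
    if _LEGACY.fullmatch(candidate):
        return "legacy"
    if _NEW.fullmatch(candidate):
        return "new"
    return None
```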

Video Master:
* BOX_CAMPAIGNS_FOLDER_ID switched 156182880490 → 133295752718
  (same root the Reporting tool uses). Updated config.py default
  and all three .env example files.
* Match page now shows which Box folder the search runs against
  (with a clickable link), and on a not-found error explains what
  was searched for so missing-campaign cases are self-diagnosable.
2026-05-09 18:32:23 +02:00
nickviljoen
39383db95f Pricing refs: Excel support, structured lookup, deterministic price match, video price check
A. Excel upload — /campaigns/pricing/upload now accepts .xlsx/.xls
   alongside .pdf. File picker in the campaigns UI matches.

B. Deterministic Excel parser (openpyxl, no LLM) — looks for H&M-style
   mastersheets:
     - 'MPC Prices' sheet -> flat list of {product_id, language, country,
       price, currency, product_name} entries (this is the gold mine).
     - Regional sheets (AME/CEU/EEU/...) -> formatted prices per locale
       used to derive currency symbol, position, decimal/thousands
       separators. Skips OLD/COPY sheets.
   Verified against the attached 1013A mastersheet: 448 price entries
   spanning 7 products and 74 locales, plus 139 locale format entries.

   Parser lives in modules/campaigns/pricing_parser.py alongside the
   existing PDF path (which now also returns the structured form with
   empty _prices).

   New lookup shape stored in PricingReference.parsed_data_json:
     {"_format": {"en-US": {currency_code, symbol, position, ...}, ...},
      "_prices": [{product_id, language, country, price, currency,
                   product_name}, ...]}
   Legacy flat {"<code>": {...}} is still recognised (treated as _format
   only) for backwards compatibility with the legacy global JSON import.

   Model helpers added:
     - PricingReference.get_format_map()
     - PricingReference.get_prices()
   to_dict() now reports price_count alongside entry_count.
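
   A sketch of how the structured and legacy shapes described above
   might be told apart. split_lookup is a hypothetical helper, not the
   real get_format_map()/get_prices() implementation; it only assumes
   the "_format"/"_prices" keys named in this message.

```python
from typing import Any, Dict, List, Tuple


def split_lookup(data: Dict[str, Any]) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Return (format_map, prices) for either lookup shape."""
    if "_format" in data or "_prices" in data:
        # Structured shape: both keys are optional, default to empty.
        return dict(data.get("_format") or {}), list(data.get("_prices") or [])
    # Legacy flat shape: the whole dict is format data, with no prices.
    return dict(data), []
```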

C. Upgraded price_currency_check.py — when a pricing reference with
   _prices is attached, the check runs a deterministic comparison:
   detected price(s) -> normalize (_normalize_price handles '$49.99',
   '39,99 €', 'CHF 49.95', '1.234,56', 'Rs. 2,799', '13 995 Ft', '349,-',
   '0.999.000'...) -> compare with tol=0.005 against the expected
   per-locale rows. LLM-based campaign-sheet fallback only runs if no
   _prices are present (legacy PDF reference or has_pricing campaign
   presentation).
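
   The normalize-then-compare step can be illustrated as below. This is
   a simplified sketch, not the real _normalize_price: the separator
   heuristics (a lone separator followed by exactly three digits is
   read as a thousands separator) and the reading of tol as an absolute
   difference are assumptions.

```python
import re
from typing import Optional


def normalize_price(raw: str) -> Optional[float]:
    """Best-effort conversion of a displayed price string to a float."""
    # Keep only digits and separators, then trim stray leading/trailing
    # separators ("Rs. 2,799" -> "2,799", "349,-" -> "349").
    s = re.sub(r"[^0-9.,]", "", raw).strip(".,")
    if not s:
        return None
    if "." in s and "," in s:
        # Whichever separator appears last is taken as the decimal mark.
        dec = "." if s.rfind(".") > s.rfind(",") else ","
        thou = "," if dec == "." else "."
        s = s.replace(thou, "").replace(dec, ".")
    else:
        sep = "." if "." in s else ("," if "," in s else "")
        if sep:
            head, _, tail = s.rpartition(sep)
            if s.count(sep) > 1 or len(tail) == 3:
                s = s.replace(sep, "")  # thousands grouping
            else:
                s = head + "." + tail   # decimal mark
    try:
        return float(s)
    except ValueError:
        return None


def prices_match(detected: float, expected: float, tol: float = 0.005) -> bool:
    """Compare two prices using an absolute tolerance (assumed meaning of tol)."""
    return abs(detected - expected) <= tol
```

   The three-digit heuristic is ambiguous for currencies that use three
   decimal places, so a production parser would likely consult the
   per-locale format entry rather than guess.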

D. Video QC price check — new _run_price_check step in the executor.
   Parses filename (Market_lang_CampaignNum_... -> 'lang-Market' locale),
   detects prices across frames via the same Gemini/GPT-4o path the
   other checks use, then deterministic-validates against the attached
   pricing reference. Skipped if no pricing ref, unknown locale, GEN/CEN
   markets, or no price visible in video.

   Overall video score now uses a weighted mean of the active
   (non-skipped) checks (visual_quality w=50, censorship w=50,
   price_currency w=30) instead of the hardcoded 50/50 split, so a
   skipped check drops out of the average without distorting it.
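
   The scoring rule can be sketched as follows, using the weights
   listed in this message. The function name and the use of None to
   mark a skipped check are assumptions made for illustration.

```python
from typing import Dict, Optional

# Weights as listed in the commit message.
WEIGHTS = {"visual_quality": 50, "censorship": 50, "price_currency": 30}


def overall_score(scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Weighted mean over checks that ran; None marks a skipped check."""
    active = {
        name: score
        for name, score in scores.items()
        if score is not None and name in WEIGHTS
    }
    total_weight = sum(WEIGHTS[name] for name in active)
    if total_weight == 0:
        return None  # nothing ran, so there is no score to report
    weighted_sum = sum(WEIGHTS[name] * score for name, score in active.items())
    return weighted_sum / total_weight
```

   With price_currency skipped, the remaining two weights of 50 reduce
   to the old 50/50 behaviour.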

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:52:39 +02:00
nickviljoen
e5d0d468db Pricing references: standalone library (was single global file)
The "Global Pricing Reference" is no longer a single file at
storage/reference/global_pricing.json. Pricing references are now
first-class DB rows (PricingReference model), uploadable as a library
in the Campaigns tab and selectable per-run alongside the campaign
presentation dropdown on the HM QC and Video QC configure pages.

New:
- core/models/pricing_reference.py — PricingReference model: id, name,
  pdf_filename, pdf_path, parsed_content, parsed_data_json, status,
  created_at/by. get_lookup() deserializes parsed_data_json; to_dict()
  powers the dropdown API.
- /campaigns/pricing/upload — creates a PricingReference row, saves PDF
  under storage/pricing_references/<id>/, kicks off background parse.
- /campaigns/pricing/<id> DELETE, /campaigns/api/pricing/list,
  /campaigns/api/pricing/status/<id>.
- Campaigns index: "Pricing References" table card (mirrors the
  presentations card) + upload form with optional name field.

Changed:
- pricing_parser: parse_pricing_pdf_to_dict returns (dict, raw_text);
  new parse_pricing_reference(id) runs the parse against a DB row and
  sets status to ready/error. Legacy file-based path removed.
- QCExecutor and VideoQCExecutor accept pricing_reference_id; load the
  row into context['pricing_reference']={id, name, lookup}.
- BatchQCExecutor and BatchVideoQCExecutor thread pricing_reference_id
  through to per-file executors.
- price_currency_check._validate_currency reads context instead of the
  disk file; returns 'skipped_no_reference' if no ref attached.
- HM QC + Video QC /execute and /execute/batch routes pass
  pricing_reference_id from the JSON payload.
- Configure templates for HM QC and Video QC add a second dropdown
  "Pricing Reference (Optional)" loaded from /campaigns/api/pricing/list.

Backwards compatibility:
- app.py: on startup, if storage/reference/global_pricing.json exists
  and the pricing_references table is empty, import it as a
  "Default (legacy global)" PricingReference row so existing installs
  keep a valid reference attached (user can pick it at configure time).
- config.py: retains GLOBAL_PRICING_{PDF,JSON}_PATH for the legacy
  importer; adds PRICING_REF_STORAGE_PATH for the new per-row storage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:27:09 +02:00
nickviljoen
d036752d17 v2.2.0: Gemini video, batch grouping, thumbnails, speed, price fix, printer check
- Video QC: Switch to Google Gemini direct video analysis as default (OpenAI frame grid fallback)
- HM QC: Group reports by batch with collapsible sections, ZIP download per batch
- HM QC: Generate asset thumbnails (150px) displayed in report listings
- Speed: Remove artificial delays, add ThreadPoolExecutor(2) for parallel batch processing
- Price detection: Improved prompt with country context, detect all prices, increased text limit
- New Printer Check module: CSV-to-PDF cross-referencing ported from CrossMatch Rust app

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:56:07 +02:00
nickviljoen
7a3272b7c4 Fix price detection: better error handling, strip markdown fences, log responses
- Strip markdown code fences from LLM response before JSON parsing
- Log raw response and parsed result for debugging
- Show warning with provider/model info when detection fails (instead of silent skip)
- Separate "detection failed" (warning, 70) from "no price found" (skipped, 100)
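
A minimal sketch of the fence-stripping step in the first bullet,
assuming the LLM wraps its whole reply in a single fenced block. The
helper name parse_llm_json is hypothetical and not taken from the
codebase.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # built at runtime to avoid a literal fence in this snippet

# Optional language tag after the opening fence, then the payload.
_FENCED = re.compile(
    r"^" + FENCE + r"[A-Za-z0-9_-]*\s*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)


def parse_llm_json(text: str) -> Any:
    """Parse JSON from an LLM reply, tolerating a wrapping code fence."""
    cleaned = text.strip()
    match = _FENCED.match(cleaned)
    if match:
        cleaned = match.group(1).strip()
    # json.JSONDecodeError propagates so the caller can log the raw reply.
    return json.loads(cleaned)
```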

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:05:51 +02:00
nickviljoen
2d5fe43031 Support multiple campaign docs + clarify pricing is format-only
- Global pricing parser now explicitly extracts format only (symbol,
  position, separators) — ignores actual price values in the reference doc
- Executors load ALL ready documents for a campaign (not just the latest),
  combining their content — supports guidelines + media plan side by side
- Campaign context now separates pricing_content (from has_pricing docs)
  from general parsed_content (all docs combined)
- Price check uses pricing_content specifically for actual price validation
- Report header shows document count (e.g., "1022B - AW25 Display (2 docs) + pricing")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:47:46 +02:00
nickviljoen
fc15a2dda3 Rewrite filename check + add price/currency check to image QC
Filename check:
- Rewritten to flexibly parse multiple H&M naming conventions
  (Display, DOOH, OOH, SOME STATIC, Social, POS, DS)
- Extracts country code, language code, dimensions, campaign number
- Scores based on how much metadata was extracted (not rigid pattern)
- Tested against real filenames: BG_bg, ES_es, NO-no formats

Price/currency check (new):
- Detects prices in images via LLM vision API
- Validates currency against global pricing reference (deterministic)
- Falls back to LLM validation for unknown countries
- Optional campaign pricing sheet validation when has_pricing=True
- Added to profile with weight 30

Profile weights rebalanced: filename 30, quality 40, price 30

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:39:54 +02:00
nickviljoen
9c33858726 Add campaign presentation management and global pricing reference
Introduces a new Campaigns module for uploading campaign presentation PDFs
that QC checks reference to validate assets against campaign-specific
guidelines (typography, layout, copy, pricing format). Also adds a global
pricing reference system that maps country codes to currency symbols and
formats for deterministic price/currency validation.

- New CampaignPresentation model + campaigns blueprint with CRUD routes
- PDF parsing via LlamaParse (text + multimodal page images)
- Global pricing PDF parsed into structured JSON lookup
- Campaign context injected into both image and video QC executors
- Quality checks enhanced with campaign guidelines in LLM prompts
- Price/currency check uses global pricing lookup (saves an LLM call)
- Campaign dropdown added to HM QC and Video QC configure pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 16:12:22 +02:00
nickviljoen
e910e00edf Add Usage Dashboard with token tracking, cost estimates, and filters
- New UsageLog model tracking every LLM API call (provider, model,
  tokens, estimated cost, user, module, check name)
- Instrument LLMConfig.call_vision_api() to auto-log each call
- New /usage tab in nav bar with dashboard showing:
  - Summary cards (total calls, tokens, estimated cost)
  - Breakdowns by provider, model, tool, and user
  - Recent API calls table
  - Time filters (All Time, 30 Days, 7 Days, Today)
- Cost estimates based on per-model token pricing
- Pass logged-in user through executor context for tracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:17:21 +02:00
nickviljoen
5e291723a0 Swap dimension_check back to filename_parse, strengthen text legibility prompt
- Replace dimension_check with filename_parse in H&M Image Check profile
- Rewrite quality check prompt to be much stricter on text legibility:
  - Text legibility is now the #1 priority (CRITICAL check)
  - Any illegible text forces score below 70 (FAILED)
  - Explicit instructions to check ALL text including small overlays
  - Low contrast text on dark/busy backgrounds flagged as common failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:38:01 +02:00
nickviljoen
91dec41e0b Batch 3: Add title legibility check, Google Gemini support, LLM provider selector
- Update image quality prompt to evaluate text/title legibility
- Add Google Gemini (generativeai) as LLM provider in LLMConfig
- Add AI Provider dropdown on configure page (OpenAI GPT-4o / Google Gemini)
- Pass selected provider through execute routes to override profile defaults
- Add google-generativeai to requirements.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 16:53:07 +02:00
nickviljoen
1c582ffcf4 Batch 2: Simplify to single profile, fix multi-file batch execution
- Replace 3 profiles with single "H&M Image Check" (dimension_check + image_quality)
- Remove filename_parse check (pattern didn't match actual filenames)
- Create DimensionCheck class for image dimension validation
- Fix configure page to route multi-file uploads to batch endpoint
- Auto-select single profile, show file list on configure page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 16:50:35 +02:00
nickviljoen
ffd8b7303c v1.1.0: Add progress tracking, CSV export, multi-job support, batch processing, and security fixes
- Reporting: async search with SSE progress bar, CSV export with Box file links,
  multi-job support, designer-friendly error display with action guidance
- HM QC: batch file upload (up to 100 files), batch execution with rate limiting,
  batch results summary
- Fix: SQLAlchemy stale cache in SSE progress streaming (expire_all + commit)
- Fix: Box folder pagination loop (search API instead of iterating 10,300 folders)
- Fix: HM QC blank screen (progress.js not loaded, hardcoded wrong URLs)
- Security: remove hardcoded API keys from legacy files, read from .env instead

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 09:43:20 +02:00
nickviljoen
677736943a Consolidate legacy hm_qc and video_qc tools into main project
Merge original CLI check implementations from hm_qc/ and
hm_qc_video/ repos into modules/*/checks/legacy/ directories.
Includes profiles, launchers, utils, orchestrators, and the
standalone video Flask web app. Reference files (test data,
results, cheat sheets) copied to gitignored reference/ directory.
Censorship trainset copied to gitignored data/supporting/.

The legacy/ naming convention separates original run_check()
function-based implementations from the new BaseCheck class
architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 11:40:53 +02:00
nickviljoen
e6f3e9387e Add modular architecture, core framework, and web UI
New blueprint-based module system (hm_qc, video_qc, video_master,
reporting), core framework (database, config, templates), and
unified web interface with progress tracking and tab navigation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 11:39:04 +02:00