Commit graph

38 commits

Author SHA1 Message Date
nickviljoen
fc15a2dda3 Rewrite filename check + add price/currency check to image QC
Filename check:
- Rewritten to flexibly parse multiple H&M naming conventions
  (Display, DOOH, OOH, SOME STATIC, Social, POS, DS)
- Extracts country code, language code, dimensions, campaign number
- Scores based on how much metadata was extracted (not rigid pattern)
- Tested against real filenames: BG_bg, ES_es, NO-no formats

Price/currency check (new):
- Detects prices in images via LLM vision API
- Validates currency against global pricing reference (deterministic)
- Falls back to LLM validation for unknown countries
- Optional campaign pricing sheet validation when has_pricing=True
- Added to profile with weight 30

Profile weights rebalanced: filename 30, quality 40, price 30

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:39:54 +02:00
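
Note on fc15a2dda3: the idea of scoring a filename by how much metadata could be extracted, rather than by one rigid pattern, can be sketched as below. This is a minimal illustration, not the project's code. The regular expressions, the four-digit campaign number, the equal weighting of fields, and the example filenames are all assumptions; the log only confirms the locale formats (BG_bg, ES_es, NO-no) and the list of extracted fields.

```python
import re

# Hypothetical patterns; the real naming conventions are not shown in the log.
LOCALE_RE = re.compile(r"(?<![A-Za-z])([A-Z]{2})[_-]([a-z]{2})(?![A-Za-z])")
DIMENSIONS_RE = re.compile(r"(?<!\d)(\d{2,5})[xX](\d{2,5})(?!\d)")
CAMPAIGN_RE = re.compile(r"(?<![A-Za-z0-9])(\d{4})(?![A-Za-z0-9])")  # assumed 4 digits

FIELDS = ("country", "language", "width", "height", "campaign")


def parse_filename(name: str) -> dict:
    """Extract whatever metadata can be found; missing fields are simply absent."""
    stem = name.rsplit(".", 1)[0]  # drop the extension, if any
    found = {}
    if m := LOCALE_RE.search(stem):
        found["country"], found["language"] = m.group(1), m.group(2)
    if m := DIMENSIONS_RE.search(stem):
        found["width"], found["height"] = int(m.group(1)), int(m.group(2))
    if m := CAMPAIGN_RE.search(stem):
        found["campaign"] = m.group(1)
    return found


def score(found: dict) -> int:
    """Score 0-100 by the share of expected fields that were extracted."""
    return round(100 * sum(f in found for f in FIELDS) / len(FIELDS))


full = parse_filename("1234_Display_BG_bg_300x250.jpg")  # made-up filename
partial = parse_filename("banner_NO-no.png")              # made-up filename
print(score(full), score(partial))
```

With these assumptions the first name yields all five fields and the second only country and language, so partial matches lower the score instead of failing outright.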
nickviljoen
dc73268309 Fix report download 404 and add campaign info to reports
- Add /report/<id>/download route using send_file instead of broken
  static file URL (fixes 404 on Download Report button)
- Add campaign label to HTML report header (Campaign: ID - Name)
- Store campaign_id in report metadata_json for traceability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:26:18 +02:00
nickviljoen
9df6b9e490 Add llama-parse and nest_asyncio to requirements.txt
These are needed for campaign PDF parsing in the Docker image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:11:36 +02:00
nickviljoen
a4b42771b9 Add storage/campaigns and storage/reference dirs to Dockerfile
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:06:57 +02:00
nickviljoen
392e0e5864 Fix campaign upload: threading context, progress bar, auto-refresh table
- Fix background parsing thread: pass app reference explicitly instead of
  trying to access current_app inside the thread (was silently failing)
- Add progress bar with animated stages during upload and parsing
- Add data-id/data-status attributes to table rows for auto-polling
- On page load, automatically poll any pending/parsing rows and update
  their status badges in-place (fixes stale "Pending" on tab return)
- Immediately inject new row into table after upload so user sees it
  without needing to refresh
- Remove broken _parse_pricing_background function

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:03:13 +02:00
nickviljoen
9c33858726 Add campaign presentation management and global pricing reference
Introduces a new Campaigns module for uploading campaign presentation PDFs
that QC checks reference to validate assets against campaign-specific
guidelines (typography, layout, copy, pricing format). Also adds a global
pricing reference system that maps country codes to currency symbols and
formats for deterministic price/currency validation.

- New CampaignPresentation model + campaigns blueprint with CRUD routes
- PDF parsing via LlamaParse (text + multimodal page images)
- Global pricing PDF parsed into structured JSON lookup
- Campaign context injected into both image and video QC executors
- Quality checks enhanced with campaign guidelines in LLM prompts
- Price/currency check uses global pricing lookup (saves an LLM call)
- Campaign dropdown added to HM QC and Video QC configure pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 16:12:22 +02:00
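
Note on 9c33858726: a deterministic country-to-currency lookup of the kind described might look like the sketch below. The table contents, the function name, and the "unknown" return value are illustrative assumptions; in the project the reference is parsed from the global pricing PDF, and fc15a2dda3 describes falling back to LLM validation for unknown countries.

```python
# Illustrative subset only; the real lookup is built from the global pricing PDF.
PRICING_REFERENCE = {
    "ES": {"currency": "EUR", "symbols": ("€", "EUR")},
    "NO": {"currency": "NOK", "symbols": ("kr", "NOK")},
    "GB": {"currency": "GBP", "symbols": ("£", "GBP")},
}


def validate_currency(country_code: str, price_text: str) -> str:
    """Return 'pass', 'fail', or 'unknown'.

    'unknown' means the country is not in the reference, so the caller
    would have to fall back to another validation path (hypothetical).
    """
    entry = PRICING_REFERENCE.get(country_code.upper())
    if entry is None:
        return "unknown"
    text = price_text.casefold()
    # Naive substring match; a real check would also validate the price format.
    matched = any(symbol.casefold() in text for symbol in entry["symbols"])
    return "pass" if matched else "fail"


print(validate_currency("ES", "19,99 €"))
print(validate_currency("NO", "£12.99"))
print(validate_currency("ZZ", "10"))
```

Because the known-country path is a plain dictionary lookup, it needs no LLM call, which matches the saving mentioned in the commit message.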
nickviljoen
5267e590eb Disable AKAZE for campaign matching — temp files deleted before use
AKAZE tier needs the actual video file to extract frames, but our
temp-download-and-delete approach means the file is gone by that point.
Perceptual hash (Tier 1) works fine with saved fingerprint data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 22:55:42 +02:00
nickviljoen
1c35813314 v2.0.0: Update all documentation for major release
- Complete README rewrite reflecting deployed production state
- Comprehensive CHANGELOG entry for v2.0.0 covering all changes:
  Docker deployment, auth overhaul, HM QC improvements, Video QC
  (built from scratch), Video Master campaign matching, Usage Dashboard,
  Google Gemini support, reporting enhancements, Box API fixes
- Updated .env.example with all current config options
- Removed outdated references to Azure AD, local development setup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 22:13:27 +02:00
nickviljoen
6205b1cb18 Rewrite Box folder methods to avoid .get() entirely
Box SDK .get() on folder objects fails with "Item.get() takes 1
positional argument" in the deployed environment. Replaced all
folder.get() calls with a new _get_folder_items() helper that uses
get_items() with pagination, falling back to folder.get() only as
last resort. This fixes list_subfolders, list_video_files, and
search_subfolder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 22:00:16 +02:00
nickviljoen
272b8ea055 Fix list_video_files to search subfolders recursively
Global Masters folder contains subfolders (DOOH, DS, OLV, etc.) with
videos inside them, not videos directly. Added recursive=True option
to search one level of subfolders for video files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 21:50:03 +02:00
nickviljoen
ccfa49cdad Fix Box SDK folder.get() call — remove fields parameter
Box SDK v3 Item.get() doesn't accept fields as positional argument.
Remove fields parameter and let it return full folder info including
item_collection with inline entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 21:44:26 +02:00
nickviljoen
834b9ee3e2 Fix Box API for collaborated folders: use folder.get() with inline items
The CAMPAIGNS folder is owned by a different user and shared via
collaboration. get_items() and search API fail with "not found" for
these folders, but folder.get() works and returns inline items.

- Rewrite search_subfolder() to use folder.get() first, with pagination
  fallback for folders with >100 items
- Rewrite list_subfolders() and list_video_files() with same approach
- Add BOX_CAMPAIGNS_FOLDER_ID config (156182880490) separate from
  the QC reports folder

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 21:36:25 +02:00
nickviljoen
80d305d123 Fix Video Master: use correct Box campaigns folder ID, improve search
- Add BOX_CAMPAIGNS_FOLDER_ID config (156182880490) separate from
  BOX_REPORT_FOLDER_ID which is for QC reports
- Update search_subfolder() to use Box search API first (fast for large
  folders with 1000+ campaigns), fall back to folder listing
- Increase folder listing limit from 200 to 500

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 21:15:59 +02:00
nickviljoen
7feead49d1 Implement Video Master: campaign-based master-to-adaptation matching
Full workflow:
- Enter campaign name → search Box for campaign folder
- Auto-discover Global Masters and Regional Masters subfolders
- Preview: shows master count, countries, adaptation count
- Phase 1: Download each master to temp, fingerprint, delete video
- Phase 2: Download each adaptation to temp, match against masters, delete
- Results: per-master adaptation mapping, unmatched items, match rate
- HTML report with detailed breakdown
- Previous Matching Jobs table with View/Delete

Box client additions:
- search_subfolder() - case-insensitive subfolder search
- list_subfolders() - enumerate child folders
- list_video_files() - list video files in folder
- download_file_to_disk() - streaming download for large files (ProRes)

Storage: only fingerprints (~50KB) + key frames stored permanently.
Videos deleted immediately after processing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 21:06:37 +02:00
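
Note on 7feead49d1: matching an adaptation against stored master fingerprints can be sketched as a nearest-neighbour search over hash distances. This assumes each fingerprint reduces to a 64-bit perceptual hash compared by Hamming distance (5267e590eb mentions a perceptual hash tier, but the actual fingerprint format is not shown). The function names, the threshold of 10 bits, and the hash values are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def best_master(adaptation_hash: int, master_hashes: dict, max_distance: int = 10):
    """Return (master_name, distance) for the closest master, or None if unmatched.

    max_distance is an assumed threshold, not a value taken from the project.
    """
    if not master_hashes:
        return None
    name, distance = min(
        ((n, hamming(adaptation_hash, h)) for n, h in master_hashes.items()),
        key=lambda pair: pair[1],
    )
    return (name, distance) if distance <= max_distance else None


# Made-up fingerprints; only these small values need to persist once the
# downloaded videos have been deleted.
masters = {
    "master_a": 0xFFFF0000FFFF0000,
    "master_b": 0x0123456789ABCDEF,
}
print(best_master(0xFFFF0000FFFF0001, masters))
print(best_master(0x0, masters))
```

Adaptations that return None would land in the "unmatched items" list of the results.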
nickviljoen
b4e94ad4eb Update default Google model to gemini-2.5-flash
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:59:00 +02:00
nickviljoen
5fd5f0fc4f Fix Video QC: skip censorship for non-CEN files, fix language false positives
- Censorship check now only runs if filename contains _CEN suffix
  (matches legacy behavior). Non-CEN files get "skipped" with 100 score
- When censorship is skipped, visual quality score is 100% of overall
- Updated language consistency prompt to avoid false positives:
  - Words like "Rock" (German for skirt), "Mode" (fashion), etc.
  - Must verify a word is NOT valid in the primary language before flagging
  - Brand names and international terms are excluded from checks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:53:46 +02:00
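
Note on 5fd5f0fc4f: the skip rule and the resulting reweighting can be sketched as follows. The 50/50 split comes from 3c7ab234be. Treating "_CEN suffix" as "the filename stem ends with _CEN" (case-sensitive), and the function names and result shape, are assumptions.

```python
from pathlib import Path


def needs_censorship_check(filename: str) -> bool:
    """Assumed reading of the rule: the stem (name without extension) ends in _CEN."""
    return Path(filename).stem.endswith("_CEN")


def overall_score(filename: str, visual_score: float, censorship_score=None) -> dict:
    """Combine check scores; non-CEN files skip censorship entirely."""
    if not needs_censorship_check(filename):
        # Censorship is reported as skipped with a score of 100,
        # and visual quality becomes 100% of the overall score.
        return {"censorship": "skipped", "censorship_score": 100, "overall": visual_score}
    if censorship_score is None:
        raise ValueError("censorship score required for _CEN files")
    overall = 0.5 * visual_score + 0.5 * censorship_score
    return {"censorship": "checked", "censorship_score": censorship_score, "overall": overall}


print(overall_score("spot_15s.mp4", 82))          # made-up filenames
print(overall_score("spot_15s_CEN.mp4", 80, 60))
```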
nickviljoen
3c7ab234be Implement Video QC: AI-powered visual quality and censorship checks
Full video QC workflow:
- Upload → Configure (LLM provider + job number) → Execute → Results
- Extracts 1 frame per second using existing FFmpeg/extract_thumbnails()
- Stitches frames into labeled grid image for efficient AI analysis
- Two separate AI checks:
  1. Visual Quality (50%): language consistency, text legibility, logo clarity
  2. Censorship (50%): body coverage and content appropriateness
- Progress tracking via SSE/polling
- HTML report generation with per-check scores
- Previous Video QC Reports table with View/Delete on index page
- Usage dashboard integration (logs tokens + cost per API call)
- Supports OpenAI GPT-4o and Google Gemini provider choice

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:40:38 +02:00
nickviljoen
e910e00edf Add Usage Dashboard with token tracking, cost estimates, and filters
- New UsageLog model tracking every LLM API call (provider, model,
  tokens, estimated cost, user, module, check name)
- Instrument LLMConfig.call_vision_api() to auto-log each call
- New /usage tab in nav bar with dashboard showing:
  - Summary cards (total calls, tokens, estimated cost)
  - Breakdowns by provider, model, tool, and user
  - Recent API calls table
  - Time filters (All Time, 30 Days, 7 Days, Today)
- Cost estimates based on per-model token pricing
- Pass logged-in user through executor context for tracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:17:21 +02:00
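
Note on e910e00edf: a cost estimate based on per-model token pricing could be computed as in this sketch. The field list follows the commit message; splitting tokens into input and output, the price table, and the model name are assumptions, and the prices are placeholders rather than real vendor pricing.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1,000,000 tokens; NOT real vendor pricing.
PRICE_PER_MILLION = {
    "example-model": {"input": 2.0, "output": 8.0},
}


@dataclass
class UsageLog:
    """In-memory stand-in for the UsageLog model (field names are assumed)."""
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    user: str
    module: str
    check_name: str

    @property
    def estimated_cost(self) -> float:
        prices = PRICE_PER_MILLION.get(self.model)
        if prices is None:
            return 0.0  # unknown model: no estimate available
        return (
            self.input_tokens * prices["input"]
            + self.output_tokens * prices["output"]
        ) / 1_000_000


log = UsageLog("example", "example-model", 1_000_000, 500_000,
               "some.user", "video_qc", "visual_quality")
print(log.estimated_cost)
```

Summary cards and per-provider breakdowns are then sums over such rows, filtered by time range.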
nickviljoen
b4abbe8d2d Add delete buttons for reports in both HM QC and Reporting sections
- HM QC: trash icon per report row, DELETE /hm-qc/report/<id> removes
  DB record and file from disk
- Reporting: trash icon per Box job row, DELETE /reporting/history/delete/<job>
  removes all saved Box reports for that job number
- Confirmation prompts before deletion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:13:03 +02:00
nickviljoen
71ddf7892f Add View button to previous QC reports to open saved HTML report
- Add /hm-qc/report/<id> route to serve saved reports by database ID
- Create view_report.html template with score summary and embedded report iframe
- Add "View" button column to Previous QC Reports table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:55:15 +02:00
nickviljoen
5e291723a0 Swap dimension_check back to filename_parse, strengthen text legibility prompt
- Replace dimension_check with filename_parse in H&M Image Check profile
- Rewrite quality check prompt to be much stricter on text legibility:
  - Text legibility is now the #1 priority (CRITICAL check)
  - Any illegible text forces score below 70 (FAILED)
  - Explicit instructions to check ALL text including small overlays
  - Low contrast text on dark/busy backgrounds flagged as common failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:38:01 +02:00
nickviljoen
23fda1ec70 Move QC reports section from Reporting tab to HM QC tab
- Remove "Previous QC Reports" table from reporting index
- Add "Previous QC Reports" table to HM QC index page
- Update HM QC index route to pass recent reports
- Update feature list to reflect current checks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:16:41 +02:00
nickviljoen
634eb2a634 Split previous reports into Box and QC sections, view from DB not re-search
- Add /reporting/history/<job_number> route that loads saved reports from
  disk/database instead of re-fetching from Box
- Split "Previous Searches" into "Previous Box Reports" and "Previous QC
  Reports" sections with separate tables
- "View" buttons link to history_dashboard (reads from saved files)
- Box reports show job-grouped view, QC reports show individual files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:12:47 +02:00
nickviljoen
e2b9691912 Save Box search results to database for reporting history
- After a successful Box search, save downloaded HTML reports to disk
  and record them in qc_reports table (report_type='box_import')
- Skip duplicates by checking box_id in metadata
- Update reporting index to show "Previous Searches" with source badges
- Rename "Recent Reports" to "Previous Searches" for clarity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:06:33 +02:00
nickviljoen
501db24e05 Fix Box search infinite pagination loop
Box search generator auto-paginates through ALL results (35k+ for broad
queries). Added iteration caps to prevent runaway API calls:
- _search_folder_by_name: cap at 50 results
- _search_files_by_job_number: cap at 100 results
- _scan_folder_by_name: cap at 1000 items

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 17:00:00 +02:00
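
Note on 501db24e05: capping an auto-paginating generator can be done without touching the generator itself. The sketch below uses itertools.islice with a fake endless search in place of the Box SDK; the helper names are hypothetical and the real fix may use explicit counters instead.

```python
from itertools import islice


def capped(results, limit: int) -> list:
    """Consume at most `limit` items, so later pages are never requested."""
    return list(islice(results, limit))


def fake_search(pages_fetched: list):
    """Stand-in for an SDK generator that silently fetches page after page."""
    page = 0
    while True:
        pages_fetched.append(page)  # record each simulated API call
        yield from (f"item-{page * 100 + i}" for i in range(100))
        page += 1


fetched = []
hits = capped(fake_search(fetched), 50)
print(len(hits), fetched)  # prints: 50 [0]
```

Only the first page is ever fetched here, because the generator is lazy and islice stops pulling items once the cap is reached.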
nickviljoen
91dec41e0b Batch 3: Add title legibility check, Google Gemini support, LLM provider selector
- Update image quality prompt to evaluate text/title legibility
- Add Google Gemini (generativeai) as LLM provider in LLMConfig
- Add AI Provider dropdown on configure page (OpenAI GPT-4o / Google Gemini)
- Pass selected provider through execute routes to override profile defaults
- Add google-generativeai to requirements.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 16:53:07 +02:00
nickviljoen
1c582ffcf4 Batch 2: Simplify to single profile, fix multi-file batch execution
- Replace 3 profiles with single "H&M Image Check" (dimension_check + image_quality)
- Remove filename_parse check (pattern didn't match actual filenames)
- Create DimensionCheck class for image dimension validation
- Fix configure page to route multi-file uploads to batch endpoint
- Auto-select single profile, show file list on configure page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 16:50:35 +02:00
nickviljoen
9ce44981eb Batch 1: Fix navigation and add past reports views
- Fix back navigation on reporting dashboards (linked to / instead of /reporting/index)
- Add "Run Another QC" button on HM QC results page
- Add Recent Reports table on reporting search page (grouped by job number)
- Add Recent QC Reports table on HM QC upload page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 16:48:24 +02:00
nickviljoen
6012260f21 Fix: require login for all routes via before_request hook
The require_auth decorator was never applied to routes, leaving
the entire app publicly accessible. Added a before_request hook
that redirects unauthenticated users to the login page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 15:10:28 +02:00
nickviljoen
b670b55432 Fix Dockerfile: update package names for Debian Trixie
libgl1-mesa-glx → libgl1-mesa-dri, chromaprint-tools → libchromaprint-tools

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 14:43:11 +02:00
nickviljoen
f21e41afc3 v1.2.0: Add Docker deployment, simplify auth to local login, production config
- Add Dockerfile, docker-compose.yml, .dockerignore for containerised deployment
- Add deploy/ scripts (deploy.sh, nginx/apache configs, password generator)
- Replace MSAL/Azure AD auth with local username/password authentication
- Add login.html template
- Simplify app.py, middleware, and auth routes for production use
- Update gunicorn_config.py and wsgi.py for Docker/production
- Update templates to work with new auth and URL prefix handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 14:37:53 +02:00
nickviljoen
ffd8b7303c v1.1.0: Add progress tracking, CSV export, multi-job support, batch processing, and security fixes
- Reporting: async search with SSE progress bar, CSV export with Box file links,
  multi-job support, designer-friendly error display with action guidance
- HM QC: batch file upload (up to 100 files), batch execution with rate limiting,
  batch results summary
- Fix: SQLAlchemy stale cache in SSE progress streaming (expire_all + commit)
- Fix: Box folder pagination loop (search API instead of iterating 10,300 folders)
- Fix: HM QC blank screen (progress.js not loaded, hardcoded wrong URLs)
- Security: remove hardcoded API keys from legacy files, read from .env instead

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 09:43:20 +02:00
nickviljoen
35a15bfe09 Update documentation for unified platform consolidation
- Rewrite CHANGELOG.md to cover platform v1.0.0 and auth fix,
  with reporting module history preserved as subsection
- Replace stale DOCUMENTATION_SUMMARY.txt with current project
  structure and key decisions
- Rewrite MIGRATION_GUIDE.md to document legacy tool consolidation
  with complete file mappings for hm_qc and video_qc
- Add legacy context headers to module docs (legacy_README,
  legacy_DEV_SETUP, legacy_CLAUDE) pointing to main README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 13:51:21 +02:00
nickviljoen
1dff8fece5 Fix auth flow: switch from popup to redirect-based MSAL login
The popup login flow was broken because the Flask 302 redirect from
/ to /reporting/index caused MSAL in the popup to consume the auth
code hash before the parent window could detect it, leaving the
parent stuck on "Authenticating..." while the popup rendered the
full app.

- Switch signIn() from loginPopup() to loginRedirect()
- Add handleRedirectPromise() at start of initAuth() to process
  the auth code on page load after returning from Microsoft
- Change root route from 302 redirect to direct template render
  so the #code=... hash fragment is preserved for MSAL
- Switch signOut() from logoutPopup() to clearCache()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 12:22:33 +02:00
nickviljoen
677736943a Consolidate legacy hm_qc and video_qc tools into main project
Merge original CLI check implementations from hm_qc/ and
hm_qc_video/ repos into modules/*/checks/legacy/ directories.
Includes profiles, launchers, utils, orchestrators, and the
standalone video Flask web app. Reference files (test data,
results, cheat sheets) copied to gitignored reference/ directory.
Censorship trainset copied to gitignored data/supporting/.

The legacy/ naming convention separates original run_check()
function-based implementations from the new BaseCheck class
architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 11:40:53 +02:00
nickviljoen
e6f3e9387e Add modular architecture, core framework, and web UI
New blueprint-based module system (hm_qc, video_qc, video_master,
reporting), core framework (database, config, templates), and
unified web interface with progress tracking and tab navigation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 11:39:04 +02:00
nickviljoen
96d0bf95e1 Reporting updated.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-14 09:14:00 +02:00
nickviljoen
42f654f78b Initial Commit
2025-12-30 16:47:56 +02:00