The file input's accept attribute and help copy already allowed .srt, but a
separate ALLOWED_EXTENSIONS array in upload.html's JS was filtering
out .srt files as 'unsupported format'. Adding 'srt' to that array
fixes the silent skip seen on Dev (the file picker showed .srt as
valid, then the submit handler dropped it).
Profile YAML is descriptive metadata (executor runs unconditionally).
Documents srt_structure (weight 15), srt_timing (weight 10) and
srt_language (weight 20) so the profile page reflects the live check set.
Upload form accepts .srt alongside .mp4. Configure page shows pair_map
counts and a collapsible list of unpaired SRTs (rendered via DOM
textContent to avoid XSS from user-controlled SRT filenames). Uses the
new /pairing-preview/<session_id> endpoint.
Adds optional srt_paths constructor parameter. At execute() top, runs
pair_batch() to produce pair_map / unpaired_srts / unpaired_videos.
Threads pair_map[video_path] into each per-video VideoQCExecutor as
srt_path. No-op when srt_paths is empty.
When self.srt_path is None, the SRT checks are skipped: one skipped
result is emitted per check, with its weight set so the weighted-average
math is unchanged. When it is set, all three checks run sequentially
with progress updates. SRT results appear as additional cards in the
existing Video QC report.
Text-only LLM call samples up to 15 cues (~1500 chars) and asks Gemini
Flash to identify the language. Pass = detected ISO code matches the one
expected from the video filename's locale; warning = low confidence or
mixed_language; fail = ISO mismatch with high confidence. Weight 20.
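The cue sampling and the pass/warning/fail mapping above can be sketched as two small pure functions. This is a minimal sketch under stated assumptions: cues arrive as plain strings, "samples" means evenly spaced picks, and "low confidence" means a numeric confidence below a hypothetical 0.8 cut-off. `sample_cue_text` and `classify_language_result` are illustrative names, not the module's actual API.

```python
from typing import List


def sample_cue_text(cues: List[str], max_cues: int = 15, max_chars: int = 1500) -> str:
    """Pick up to max_cues evenly spaced cues and cap the joined text length."""
    if len(cues) > max_cues:
        step = len(cues) / max_cues  # assumption: even spacing across the file
        cues = [cues[int(i * step)] for i in range(max_cues)]
    return "\n".join(cues)[:max_chars]


def classify_language_result(
    detected_iso: str,
    expected_iso: str,
    confidence: float,
    mixed_language: bool,
    high_confidence: float = 0.8,  # hypothetical threshold, not from the source
) -> str:
    """Map the LLM's language verdict onto pass / warning / fail."""
    if mixed_language or confidence < high_confidence:
        return "warning"  # uncertain or mixed: flag for a reviewer, never fail
    if detected_iso.lower() == expected_iso.lower():
        return "pass"
    return "fail"  # confident mismatch
```

The expected ISO code would come from the language half of the filename locale, e.g. `"de-AT".split("-")[0]`.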
Note: uses genai.GenerativeModel directly rather than a unified
LLMConfig.call_text_api (which doesn't exist yet). Marked TODO for
future refactor when that helper is added.
score_pair: additive locale (0.5) + campaign code (0.3) + clip-slug
substring (0.4), capped at 1.0, with hard-reject on divergent locales
or non-overlapping slugs. pair_batch: greedy highest-first assignment
above 0.7 threshold; one SRT per video.
Verified: pairs all 6 videos in testing_15may/srt/ with their SRTs.
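The scoring and greedy assignment described above can be sketched as follows. This is a minimal sketch, assuming the filename fields have already been extracted into a hypothetical `ParsedName` record, that "non-overlapping slugs" means neither slug is a substring of the other, that "above 0.7" is a strict comparison, and that each SRT is also used at most once. It illustrates the approach and is not the module's `score_pair`/`pair_batch` code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ParsedName:
    """Hypothetical container for fields already extracted from a filename."""
    locale: Optional[str] = None         # canonical, e.g. "de-AT"
    campaign_code: Optional[str] = None  # e.g. "CFUL263C01D"
    slug: Optional[str] = None           # normalised, e.g. "RIOINTRO15C"


def score_pair(video: ParsedName, srt: ParsedName) -> float:
    # Hard reject: both sides carry a locale and they disagree.
    if video.locale and srt.locale and video.locale != srt.locale:
        return 0.0
    both_slugs = bool(video.slug and srt.slug)
    slug_hit = both_slugs and (srt.slug in video.slug or video.slug in srt.slug)
    if both_slugs and not slug_hit:
        return 0.0  # hard reject: slugs present but non-overlapping
    score = 0.0
    if video.locale and video.locale == srt.locale:
        score += 0.5
    if video.campaign_code and video.campaign_code == srt.campaign_code:
        score += 0.3
    if slug_hit:
        score += 0.4
    return min(score, 1.0)


def pair_batch(
    videos: Dict[str, ParsedName],
    srts: Dict[str, ParsedName],
    threshold: float = 0.7,
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Greedy one-to-one assignment, highest score first."""
    candidates: List[Tuple[float, str, str]] = []
    for v_path, v in videos.items():
        for s_path, s in srts.items():
            score = score_pair(v, s)
            if score > threshold:
                candidates.append((score, v_path, s_path))
    # Highest score first; ties broken by path so results are deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    pair_map: Dict[str, str] = {}
    used_srts = set()
    for _score, v_path, s_path in candidates:
        if v_path in pair_map or s_path in used_srts:
            continue  # one SRT per video (and, here, one video per SRT)
        pair_map[v_path] = s_path
        used_srts.add(s_path)
    unpaired_srts = sorted(set(srts) - used_srts)
    unpaired_videos = sorted(set(videos) - set(pair_map))
    return pair_map, unpaired_srts, unpaired_videos
```

Greedy highest-first assignment is not globally optimal in general, but with a high threshold and hard rejects the candidate set is usually sparse enough that it behaves well.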
Extract campaign code, clip slug, and locale from both video and SRT
filenames. Handles the two SRT styles seen in testing_15may/srt/
(campaign-code-prefixed CFUL... form, and abbreviated RIO_INTRO6B form).
Verified at REPL against test data.
Pure-function helpers, verified at REPL. canonical_locale handles the
de-AT ↔ AT-de order flip between SRT and video filenames; normalise_slug
strips non-alphanumerics so RIO_INTRO_15C ≈ RIO_INTRO15C.
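A minimal sketch of the two helpers, assuming the order flip can be detected from letter case (market uppercase, language lowercase) and falling back to language-first order when both halves share a case. Upper-casing in the slug helper is an extra assumption of this sketch; the source only says non-alphanumerics are stripped.

```python
import re
from typing import Optional


def canonical_locale(raw: str) -> Optional[str]:
    """Normalise 'de-AT', 'AT-de', 'ES_es' etc. to 'lang-MARKET' form."""
    parts = re.split(r"[-_]", raw.strip())
    if len(parts) != 2 or not all(len(p) == 2 and p.isalpha() for p in parts):
        return None
    first, second = parts
    # Assumption: case tells the halves apart (market upper, language lower).
    # When both halves share a case, language-first order is assumed.
    if first.isupper() and second.islower():
        first, second = second, first
    return f"{first.lower()}-{second.upper()}"


def normalise_slug(raw: str) -> str:
    """Strip non-alphanumerics (and, in this sketch, fold to upper case)."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()
```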
Profile YAML is descriptive metadata (executor runs unconditionally).
Keeping it current so the profile page and any future YAML-driven
selection reflect the live check set.
Flags (never fails) when price or garment-name text falls inside known
platform UI overlay zones (TikTok / IG Stories / IG Reels / generic
vertical). Platform inferred from filename tokens via _infer_platform_zones.
Weight 0 in profile — advisory only, never contributes to overall score.
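The advisory "flag, never fail" behaviour can be sketched as a rectangle-overlap test. This is a minimal sketch, assuming text and zone boxes are normalised `(x0, y0, x1, y1)` fractions of the frame and that "falls inside" means any overlap; how the price and garment-name boxes are obtained is not shown, and `boxes_overlap`/`title_safe_result` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (x0, y0, x1, y1) as fractions of frame width/height, origin top-left.
Box = Tuple[float, float, float, float]


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when two axis-aligned rectangles share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def title_safe_result(text_boxes: Dict[str, Box], zones: Dict[str, Box]) -> dict:
    """Advisory check: warn (never fail) when key text touches a UI overlay zone."""
    hits: List[str] = [
        f"{label} overlaps {zone_name}"
        for label, box in text_boxes.items()
        for zone_name, zone in zones.items()
        if boxes_overlap(box, zone)
    ]
    return {
        "status": "warning" if hits else "pass",
        "weight": 0,  # advisory only: contributes nothing to the overall score
        "details": hits,
    }
```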
A single Gemini direct-video call detects garment/product text overlays;
the detected text is then matched deterministically against the
PricingReference.get_prices() product_name values for the file's locale.
Skips when no pricing reference is attached, the locale is unparseable,
the file is GEN/CEN, there are no expected product names for the locale,
or no on-screen garment text is detected. Weight 25 in standard_video profile.
Adds _normalize_product_name (lowercase, keep alphanumerics and spaces,
collapse whitespace) and _product_names_match (substring match, or >=60%
token-set overlap measured against the smaller token set). Used by the
upcoming garment_name check.
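A minimal sketch of the two helpers as described. It assumes non-alphanumeric characters are dropped rather than replaced by spaces (so "Wool-Blend" becomes "woolblend"), which the description leaves open; the underscore-free names mark these as illustrations rather than the module's functions.

```python
def normalize_product_name(name: str) -> str:
    # Lowercase, keep letters/digits/whitespace (Unicode-aware), collapse spaces.
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def product_names_match(a: str, b: str, min_overlap: float = 0.6) -> bool:
    na, nb = normalize_product_name(a), normalize_product_name(b)
    if not na or not nb:
        return False
    if na in nb or nb in na:
        return True  # substring match in either direction
    ta, tb = set(na.split()), set(nb.split())
    # Overlap measured against the smaller token set ("min side").
    return len(ta & tb) / min(len(ta), len(tb)) >= min_overlap
```

Measuring overlap against the smaller set means a short on-screen name like "Jeans Slim" can still match a longer catalogue name whose tokens it mostly covers.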
Adds _PLATFORM_ZONES (TikTok / IG Stories / IG Reels / generic vertical)
and _infer_platform_zones(filename) for use by the new title_safe check.
Pure function, verified at REPL against expected filenames. No new
behaviour exposed yet — wired up in the next task.
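A sketch of the filename-to-zones inference, for illustration only. The zone rectangles and the filename tokens below are placeholders invented for this example, not the real `_PLATFORM_ZONES` geometry or the tokens `_infer_platform_zones` looks for; the point is the shape of the lookup (tokenise the filename, take the first known platform token, otherwise fall back to generic vertical).

```python
import re
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # normalised (x0, y0, x1, y1)

# PLACEHOLDER rectangles for illustration only; not the real overlay geometry.
PLATFORM_ZONES: Dict[str, List[Box]] = {
    "tiktok": [(0.0, 0.80, 1.0, 1.0), (0.85, 0.35, 1.0, 0.80)],
    "ig_stories": [(0.0, 0.0, 1.0, 0.12), (0.0, 0.85, 1.0, 1.0)],
    "ig_reels": [(0.0, 0.78, 1.0, 1.0), (0.86, 0.45, 1.0, 0.78)],
    "generic_vertical": [(0.0, 0.0, 1.0, 0.10), (0.0, 0.85, 1.0, 1.0)],
}

# Hypothetical filename tokens, checked in order.
PLATFORM_TOKENS = [
    ("TIKTOK", "tiktok"),
    ("STORIES", "ig_stories"),
    ("STORY", "ig_stories"),
    ("REELS", "ig_reels"),
    ("REEL", "ig_reels"),
]


def infer_platform_zones(filename: str) -> Tuple[str, List[Box]]:
    """Pick overlay zones from filename tokens, falling back to generic vertical."""
    tokens = set(re.split(r"[^A-Z0-9]+", filename.upper()))
    for token, platform in PLATFORM_TOKENS:
        if token in tokens:
            return platform, PLATFORM_ZONES[platform]
    return "generic_vertical", PLATFORM_ZONES["generic_vertical"]
```

Matching whole tokens rather than substrings avoids false hits, and very short tokens are best avoided because they can collide with two-letter market codes in the same filenames.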
The price_currency check has always done a full numeric match against
the pricing reference but the report card only showed pass/fail by
currency. Pull matched_price, matched_product, detected_prices, and
expected_prices into the message string so QC reviewers can see the
full match at a glance.
No logic changes.
Video QC:
* _extract_locale_from_filename now also handles the suffix form
..._XX-yy.ext (case-insensitive both sides), so DOOH/OOH-style
adapt filenames like ..._ES-es.mp4 unblock the price_currency
check instead of skipping with "could not extract locale".
* Batch results page expires the SQLAlchemy session at the top of
the route so the post-completion reload sees committed reports
even when it lands on a different gunicorn worker than the one
that wrote them. Reload delay bumped 1s → 2s for margin.
* visual_quality prompt now passes the filename's market+language
to the LLM and tells it the on-screen copy should be in the
localized language, not the source-language guideline copy.
Stops Spanish-market videos being flagged as "language mismatch
with English campaign guidelines".
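The two filename forms the locale extractor now accepts (the original `Market_lang_...` prefix and the new `..._XX-yy.ext` suffix) can be sketched with two regexes. This is a minimal sketch; `extract_locale` is a hypothetical stand-in for `_extract_locale_from_filename`, and GEN/CEN handling is left out.

```python
import re
from typing import Optional

# Market_lang_CampaignNum_... prefix form, e.g. "SE_sv_1013A_clip.mp4"
_PREFIX_RE = re.compile(r"^([A-Za-z]{2})_([A-Za-z]{2})_")
# ..._XX-yy.ext suffix form used by DOOH/OOH adapts, e.g. "..._ES-es.mp4"
_SUFFIX_RE = re.compile(r"_([A-Za-z]{2})-([A-Za-z]{2})\.[A-Za-z0-9]+$")


def extract_locale(filename: str) -> Optional[str]:
    """Return a 'lang-Market' locale such as 'es-ES', or None if unparseable."""
    match = _PREFIX_RE.match(filename) or _SUFFIX_RE.search(filename)
    if match is None:
        return None
    market, lang = match.group(1), match.group(2)
    # Case-insensitive on both sides: normalise before building the locale.
    return f"{lang.lower()}-{market.upper()}"
```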
Printer Check:
* regions.json rewritten to cover all 10 H&M regions (AME, CEU,
NEU, GCN, IND, SHE, SEU, EEU, EAS, Franchise) with default-all
groups. Two judgement calls vs the screenshot: kept TR for
Turkey (TK is Tokelau in ISO and would break filename matching)
and BR for Brazil (every other code is 2-letter ISO).
Campaign codes:
* New core/utils/campaign_code.py is the single source of truth.
Matches both the legacy 4-digits-plus-optional-letter (1013A,
4116) and the new 11-char alphanumeric with year at positions
5-6 (CFUL263C01D). All four prior parser sites now import from
this helper.
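Both code shapes fit in one regex. This is a minimal sketch of what a single-source-of-truth helper could look like, not the contents of core/utils/campaign_code.py: it assumes upper-case matching, uses lookarounds instead of `\b` (underscores count as word characters, so `\b` would miss codes in `SE_sv_1013A_...`), and reads the year digits as 20YY, which is an assumption.

```python
import re
from typing import Optional

_NEW_CODE = r"[A-Z0-9]{4}\d{2}[A-Z0-9]{5}"  # 11 chars, year digits at positions 5-6
_LEGACY_CODE = r"\d{4}[A-Z]?"               # e.g. 1013A, 4116
# Try the 11-char form first so its leading characters aren't taken as legacy.
_CAMPAIGN_RE = re.compile(
    rf"(?<![A-Za-z0-9])({_NEW_CODE}|{_LEGACY_CODE})(?![A-Za-z0-9])"
)


def extract_campaign_code(text: str) -> Optional[str]:
    match = _CAMPAIGN_RE.search(text.upper())
    return match.group(1) if match else None


def campaign_year(code: str) -> Optional[int]:
    """Year for new-style codes (assumed 20YY); None for legacy codes."""
    if re.fullmatch(_NEW_CODE, code):
        return 2000 + int(code[4:6])
    return None
```

One consequence of the legacy pattern is that any standalone 4-digit run (a year, for instance) also matches, so callers still need filename context to disambiguate.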
Video Master:
* BOX_CAMPAIGNS_FOLDER_ID switched 156182880490 → 133295752718
(same root the Reporting tool uses). Updated config.py default
and all three .env example files.
* Match page now shows which Box folder the search runs against
(with a clickable link), and on a not-found error explains what
was searched for so missing-campaign cases are self-diagnosable.
Lifted JWT-cookie auth pattern from the AI QC sibling project:
core/auth/middleware.py validates Azure AD JWTs and stores them in
an httpOnly cookie (hm_aiqc_auth_token). Tenant membership is
enforced by JWTValidator's tid check, which is sufficient for the
tenant-wide access policy chosen for this project.
templates/login.html now drives an MSAL.js popup that POSTs the
ID token to /auth/login. base.html exposes Azure config to all
pages so the logout button can also clear the MSAL session.
app.py's @before_request now checks the JWT cookie and exposes
g.user; modules read user identity via core.auth.current_user_email
so usage logs and created_by columns now record the signed-in
user's email rather than a session value.
Legacy username/password code removed: top-level auth_middleware.py,
jwt_validator.py, deploy/generate_password.py.
A. Excel upload — /campaigns/pricing/upload now accepts .xlsx/.xls
alongside .pdf. File picker in the campaigns UI matches.
B. Deterministic Excel parser (openpyxl, no LLM) — looks for H&M-style
mastersheets:
- 'MPC Prices' sheet -> flat list of {product_id, language, country,
price, currency, product_name} entries (this is the gold mine).
- Regional sheets (AME/CEU/EEU/...) -> formatted prices per locale
used to derive currency symbol, position, decimal/thousands
separators. Skips OLD/COPY sheets.
Verified against the attached 1013A mastersheet: 448 price entries
across 7 products x 74 locales, 139 locale format entries.
Parser lives in modules/campaigns/pricing_parser.py alongside the
existing PDF path (which now also returns the structured form with
empty _prices).
New lookup shape stored in PricingReference.parsed_data_json:
{"_format": {"en-US": {currency_code, symbol, position, ...}, ...},
"_prices": [{product_id, language, country, price, currency,
product_name}, ...]}
Legacy flat {"<code>": {...}} is still recognised (treated as _format
only) for backwards compatibility with the legacy global JSON import.
Model helpers added:
- PricingReference.get_format_map()
- PricingReference.get_prices()
to_dict() now reports price_count alongside entry_count.
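The two model helpers and the legacy fallback can be sketched as free functions over the raw `parsed_data_json` string. This is a minimal sketch, assuming a payload is "structured" whenever it has a `_format` or `_prices` key; the real helpers are methods on PricingReference.

```python
import json
from typing import Any, Dict, List, Optional


def _load(parsed_data_json: Optional[str]) -> Dict[str, Any]:
    return json.loads(parsed_data_json) if parsed_data_json else {}


def _is_structured(data: Dict[str, Any]) -> bool:
    return "_format" in data or "_prices" in data


def get_format_map(parsed_data_json: Optional[str]) -> Dict[str, Any]:
    data = _load(parsed_data_json)
    # Legacy flat {"<code>": {...}} payloads are treated as format-only.
    return data.get("_format", {}) if _is_structured(data) else data


def get_prices(parsed_data_json: Optional[str]) -> List[Dict[str, Any]]:
    data = _load(parsed_data_json)
    # Legacy payloads carry no price rows.
    return data.get("_prices", []) if _is_structured(data) else []
```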
C. Upgraded price_currency_check.py — when a pricing reference with
_prices is attached, the check runs a deterministic comparison:
detected price(s) -> normalize (_normalize_price handles '$49.99',
'39,99 €', 'CHF 49.95', '1.234,56', 'Rs. 2,799', '13 995 Ft', '349,-',
'0.999.000'...) -> compare with tol=0.005 against the expected
per-locale rows. LLM-based campaign-sheet fallback only runs if no
_prices are present (legacy PDF reference or has_pricing campaign
presentation).
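A simplified sketch of the normalize-then-compare step. It assumes one price per input string and uses three separator heuristics: when both `.` and `,` appear, the last one is the decimal mark; a separator repeated more than once is a thousands separator; a single separator followed by exactly three digits is a thousands separator. It does not try to reproduce every case the real `_normalize_price` handles.

```python
import re
from typing import Optional

_NUMBER_RE = re.compile(r"\d[\d.,\s]*")  # first digit run, incl. separators/spaces


def normalize_price(text: str) -> Optional[float]:
    match = _NUMBER_RE.search(text)
    if match is None:
        return None
    # Drop inner spaces ("13 995") and trailing separators ("349,-" -> "349").
    core = re.sub(r"\s+", "", match.group(0)).rstrip(".,")
    if "." in core and "," in core:
        decimal = "." if core.rfind(".") > core.rfind(",") else ","
        thousands = "," if decimal == "." else "."
        core = core.replace(thousands, "").replace(decimal, ".")
    elif "." in core or "," in core:
        sep = "." if "." in core else ","
        tail = core.rpartition(sep)[2]
        if core.count(sep) > 1 or len(tail) == 3:
            core = core.replace(sep, "")   # thousands separator
        else:
            core = core.replace(sep, ".")  # decimal separator
    return float(core)


def prices_match(detected: float, expected: float, tol: float = 0.005) -> bool:
    return abs(detected - expected) <= tol
```

The three-digit rule is a heuristic: it would misread a currency that legitimately shows three decimal places.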
D. Video QC price check — new _run_price_check step in the executor.
Parses filename (Market_lang_CampaignNum_... -> 'lang-Market' locale),
detects prices across frames via the same Gemini/GPT-4o path the
other checks use, then deterministic-validates against the attached
pricing reference. Skipped if no pricing ref, unknown locale, GEN/CEN
markets, or no price visible in video.
Overall video score now uses weighted mean of active (non-skipped)
checks (visual_quality w=50, censorship w=50, price_currency w=30)
instead of the hardcoded 50/50 split — so skipping any one check
falls through cleanly.
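The weighted mean over active checks can be sketched as below. This is a minimal sketch, assuming each result is a mapping with `status`, `score` and `weight` keys and that skipped checks carry the status `"skipped"`. Zero-weight (advisory) checks drop out naturally.

```python
from typing import Iterable, Mapping, Optional


def overall_score(results: Iterable[Mapping]) -> Optional[float]:
    """Weighted mean over non-skipped, positively weighted checks."""
    active = [r for r in results if r["status"] != "skipped" and r["weight"] > 0]
    total_weight = sum(r["weight"] for r in active)
    if total_weight == 0:
        return None  # nothing ran: no meaningful overall score
    return sum(r["score"] * r["weight"] for r in active) / total_weight
```

With censorship skipped, visual_quality at 80 (w=50) and price_currency at 100 (w=30) give (80*50 + 100*30) / 80 = 87.5.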
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Global Pricing Reference" is no longer a single file at
storage/reference/global_pricing.json. Pricing references are now
first-class DB rows (PricingReference model), uploadable as a library
in the Campaigns tab and selectable per-run alongside the campaign
presentation dropdown on the HM QC and Video QC configure pages.
New:
- core/models/pricing_reference.py — PricingReference model: id, name,
pdf_filename, pdf_path, parsed_content, parsed_data_json, status,
created_at/by. get_lookup() deserializes parsed_data_json; to_dict()
powers the dropdown API.
- /campaigns/pricing/upload — creates a PricingReference row, saves PDF
under storage/pricing_references/<id>/, kicks off background parse.
- /campaigns/pricing/<id> DELETE, /campaigns/api/pricing/list,
/campaigns/api/pricing/status/<id>.
- Campaigns index: "Pricing References" table card (mirrors the
presentations card) + upload form with optional name field.
Changed:
- pricing_parser: parse_pricing_pdf_to_dict returns (dict, raw_text);
new parse_pricing_reference(id) runs the parse against a DB row and
sets status to ready/error. Legacy file-based path removed.
- QCExecutor and VideoQCExecutor accept pricing_reference_id; load the
row into context['pricing_reference']={id, name, lookup}.
- BatchQCExecutor and BatchVideoQCExecutor thread pricing_reference_id
through to per-file executors.
- price_currency_check._validate_currency reads context instead of the
disk file; returns 'skipped_no_reference' if no ref attached.
- HM QC + Video QC /execute and /execute/batch routes pass
pricing_reference_id from the JSON payload.
- Configure templates for HM QC and Video QC add a second dropdown
"Pricing Reference (Optional)" loaded from /campaigns/api/pricing/list.
Backwards compatibility:
- app.py: on startup, if storage/reference/global_pricing.json exists
and the pricing_references table is empty, import it as a
"Default (legacy global)" PricingReference row so existing installs
keep a valid reference attached (user can pick it at configure time).
- config.py: retains GLOBAL_PRICING_{PDF,JSON}_PATH for the legacy
importer; adds PRICING_REF_STORAGE_PATH for the new per-row storage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the existing HM QC batch pattern so Video QC now supports
queueing and processing multiple videos from a single upload.
New:
- batch_executor.py — BatchVideoQCExecutor, sequential processing
(gc.collect() between videos, cooldown between batches), stamps
a shared batch_id into each report's metadata_json.
- /video-qc/execute/batch — kicks off a BatchVideoQCExecutor thread.
- /video-qc/results/batch/<session_id> — batch summary card, per-file
list (filename, score, status, view/download), ZIP download link.
Reuses results.html with is_batch flag.
- /video-qc/report/<id>/download, /video-qc/report/batch/<id>/download
(ZIP), /video-qc/report/batch/<id> DELETE.
Changed:
- VideoQCExecutor accepts batch_id; writes it into metadata when set.
- /video-qc/upload accepts multi-file (request.files.getlist('files'))
with single-file fallback; returns is_batch/filenames/file_count.
- Upload template: drag-and-drop list UI (same pattern as HM QC upload).
- Configure template: shows file count + list, swaps button text and
POST endpoint based on file_count; redirects to results/batch when
batch, results when single.
- Video QC index uses QCReport.get_recent_grouped to render "Batch
Reports" (collapsible per-batch table) + "Individual Reports".
Post-run destinations:
- 1 file -> /video-qc/results/<session_id> (unchanged)
- N files -> /video-qc/results/batch/<session_id> (batch summary +
list of reports from the run)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- video_qc/executor.py: escape braces in JSON example blocks inside
f-string prompts (visual_quality, censorship). Unescaped { } made
Python parse the example as format specifiers, raising
"Invalid format specifier ' 85, ..." and failing execution.
- reporting/routes.py: history_dashboard now passes reports=parsed_reports
(matching the live dashboard route) and attaches friendly_checks per
report. Previously passed parsed_reports=friendly_reports, a kwarg
the template does not consume, leaving the Parsed Data View accordion
empty and breaking the "View Details" scroll-to-file links.
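The brace fix amounts to doubling every literal brace in the JSON example. This is a minimal sketch with a hypothetical prompt builder, not the executor's actual prompt text.

```python
import json


def build_visual_quality_prompt(market: str, language: str) -> str:
    # Doubled braces {{ }} are emitted as literal { } by an f-string. With
    # single braces Python treats {"score": 85, ...} as a replacement field
    # whose format spec is ' 85, ...', raising ValueError
    # ("Invalid format specifier ...") when the string is evaluated.
    return f"""You are reviewing a video for the {market} market ({language}).
Respond with JSON only, for example:
{{"score": 85, "issues": []}}
"""
```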
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Video QC: Switch to Google Gemini direct video analysis as default (OpenAI frame grid fallback)
- HM QC: Group reports by batch with collapsible sections, ZIP download per batch
- HM QC: Generate asset thumbnails (150px) displayed in report listings
- Speed: Remove artificial delays, add ThreadPoolExecutor(2) for parallel batch processing
- Price detection: Improved prompt with country context, detect all prices, increased text limit
- New Printer Check module: CSV-to-PDF cross-referencing ported from CrossMatch Rust app
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Global pricing parser now explicitly extracts format only (symbol,
position, separators) — ignores actual price values in the reference doc
- Executors load ALL ready documents for a campaign (not just the latest),
combining their content — supports guidelines + media plan side by side
- Campaign context now separates pricing_content (from has_pricing docs)
from general parsed_content (all docs combined)
- Price check uses pricing_content specifically for actual price validation
- Report header shows document count (e.g., "1022B - AW25 Display (2 docs) + pricing")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces a new Campaigns module for uploading campaign presentation PDFs
that QC checks reference to validate assets against campaign-specific
guidelines (typography, layout, copy, pricing format). Also adds a global
pricing reference system that maps country codes to currency symbols and
formats for deterministic price/currency validation.
- New CampaignPresentation model + campaigns blueprint with CRUD routes
- PDF parsing via LlamaParse (text + multimodal page images)
- Global pricing PDF parsed into structured JSON lookup
- Campaign context injected into both image and video QC executors
- Quality checks enhanced with campaign guidelines in LLM prompts
- Price/currency check uses global pricing lookup (saves an LLM call)
- Campaign dropdown added to HM QC and Video QC configure pages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Censorship check now only runs if the filename contains the _CEN suffix
(matches legacy behavior). Non-CEN files get "skipped" with a score of 100
- When censorship is skipped, the visual quality score is 100% of the overall score
- Updated language consistency prompt to avoid false positives:
  - Words like "Rock" (German for skirt), "Mode" (fashion), etc.
  - Must verify a word is NOT valid in the primary language before flagging
  - Brand names and international terms are excluded from checks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full video QC workflow:
- Upload → Configure (LLM provider + job number) → Execute → Results
- Extracts 1 frame per second using existing FFmpeg/extract_thumbnails()
- Stitches frames into labeled grid image for efficient AI analysis
- Two separate AI checks:
  1. Visual Quality (50%): language consistency, text legibility, logo clarity
  2. Censorship (50%): body coverage and content appropriateness
- Progress tracking via SSE/polling
- HTML report generation with per-check scores
- Previous Video QC Reports table with View/Delete on index page
- Usage dashboard integration (logs tokens + cost per API call)
- Supports OpenAI GPT-4o and Google Gemini provider choice
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Dockerfile, docker-compose.yml, .dockerignore for containerised deployment
- Add deploy/ scripts (deploy.sh, nginx/apache configs, password generator)
- Replace MSAL/Azure AD auth with local username/password authentication
- Add login.html template
- Simplify app.py, middleware, and auth routes for production use
- Update gunicorn_config.py and wsgi.py for Docker/production
- Update templates to work with new auth and URL prefix handling
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rewrite CHANGELOG.md to cover platform v1.0.0 and auth fix,
with reporting module history preserved as subsection
- Replace stale DOCUMENTATION_SUMMARY.txt with current project
structure and key decisions
- Rewrite MIGRATION_GUIDE.md to document legacy tool consolidation
with complete file mappings for hm_qc and video_qc
- Add legacy context headers to module docs (legacy_README,
legacy_DEV_SETUP, legacy_CLAUDE) pointing to main README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge original CLI check implementations from hm_qc/ and
hm_qc_video/ repos into modules/*/checks/legacy/ directories.
Includes profiles, launchers, utils, orchestrators, and the
standalone video Flask web app. Reference files (test data,
results, cheat sheets) copied to gitignored reference/ directory.
Censorship trainset copied to gitignored data/supporting/.
The legacy/ naming convention separates original run_check()
function-based implementations from the new BaseCheck class
architecture.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New blueprint-based module system (hm_qc, video_qc, video_master,
reporting), core framework (database, config, templates), and
unified web interface with progress tracking and tab navigation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>