ai_qc

Author	SHA1	Message	Date
nickviljoen	29ee941037	refactor(formatting_diff): narrow scope to bold + italic only First real-data test against the AXA car-insurance PDFs surfaced a noise problem: the new document is a brand refresh — every page flips font (PublicoBanner-Bold→PublicoHeadline-Bold) and colour (#893f4a→#2e3092). At medium-per-finding that crashed the diff score to 0.0 and drowned the bold-regression signal AXA actually flagged. Drop font, size, colour comparators. Keep bold + italic — the attributes the vision-LLM consistently misses on dense layouts. The LLM already narrates colour-scheme rebrands and font swaps in its Modified / Style-changes blocks; running both layers on the same visual change just double-counts it. Tests inverted from "X change is flagged" to "X change is NOT flagged" to lock the scope decision in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 12:37:19 +02:00
nickviljoen	d327776c70	fix(diff_engine): guard compute_formatting_diff against per-pair failure If the deterministic formatting comparator raises on any single page-pair (e.g. unexpected span shape from a future PyMuPDF version), degrade to zero formatting findings for that pair instead of aborting the whole 52-page diff run. Logged for visibility. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 10:31:16 +02:00
nickviljoen	0fd6a35562	fix(diff_report): _fmt_value labels italic flips correctly Previously every boolean attribute rendered as "Bold → Regular", producing "Italic: Bold → Regular" for italic flips. Now the helper takes the attribute name and emits "Italic → Regular" or "Bold → Regular" depending on which boolean attribute is being shown. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 10:22:39 +02:00
nickviljoen	7eaac85df3	feat(diff_report): render formatting_changes as a per-pair block Adds a "🎨 Formatting changes" block to the per-page diff report when the deterministic formatting layer finds typographic flips. Distinguishes page-wide style shifts from local span flips, lists up to three example quotes per aggregated finding, and HTML-escapes all user-controlled strings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 10:08:47 +02:00
nickviljoen	2b1bb9ccf0	feat(diff_engine): merge formatting_diff findings into pair_diffs run_page_pair_diff now invokes compute_formatting_diff alongside the LLM call for each aligned pair. When the deterministic layer finds typographic flips on a page the LLM saw as identical, the pair is re-classified as having differences with medium severity. Each aggregated finding contributes to the global medium-severity tally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 10:03:54 +02:00
nickviljoen	d21a8a276d	refactor(formatting_diff): harden page_wide threshold + None-key handling Three review-driven hardening tweaks: - page_wide now requires ≥3 matched spans (PAGE_WIDE_MIN_SPANS). Avoids labelling section-break pages with a single flipped heading as page-wide. - _collect_flips normalises bold/italic via bool() and font/color via "or ''" so callers passing dicts without those keys do not produce phantom flips against False/''. - Adds tests for empty span lists and the missing-bold-key case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 10:01:23 +02:00
nickviljoen	98679e7329	feat(document_mode): add deterministic span formatting diff New formatting_diff module compares span-level bold/italic/font/size/ color attributes between aligned page-pairs. Pure-Python; reads PyMuPDF metadata already captured during ingest. Aggregates identical flips into single findings and flags page-wide style shifts. Powers the AXA document_diff fix for missed formatting changes that the vision-LLM does not reliably detect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 09:56:34 +02:00
nickviljoen	f69e181520	feat(ingest): capture span color as #rrggbb string Adds a 'color' field to each span dict extracted by _extract_page_spans. Powers the upcoming deterministic formatting-diff layer for AXA document_diff mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 09:45:21 +02:00
nickviljoen	71bb9a6295	fix(hp_copy_review): correct llm casing + route HP reports to /hp/ folder Two bugs surfaced by the first dev smoke test: 1. Profile JSON declared "llm": "gemini" (lowercase). llm_config's dispatcher compares model_name == "Gemini" case-sensitively (matches the rest of the codebase), so the check fell through to "Invalid model selected" and never reached the API. Every other profile uses "Gemini" with capital G. Spec mistake — fixed. 2. get_client_from_profile() resolves the per-report output folder from the profile_id via hardcoded prefix matches. No 'hp_' branch existed, so hp_copy_review reports landed under output-dev/general/ instead of output-dev/hp/ — the UI then couldn't find them. Added 'hp_' → 'hp' alongside the existing mappings. The check itself works correctly otherwise: profile_source was user_selected, brand resolved to 'hp', and the reference asset was successfully attached. Bug 1 just prevented Gemini from being called. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 22:07:25 +02:00
nickviljoen	68a2360811	feat(report): render hp_copy_review findings as a structured table Both HTML report generators (generate_html_content and generate_comprehensive_html_report) get a small case: when a check result has a 'findings' array in its json_data, render it as a priority-coloured table with quote/issue/suggested-fix/source columns instead of the default response-text block. The summary field (when present) renders above the table. Fallback to text rendering when findings is absent — every existing check is unaffected. All string fields from the LLM are HTML-escaped via html.escape() to neutralise stray <, >, &, or quote characters. Inline CSS for .findings-table / .priority-pill / .priority-high|medium|low / .muted is added to both stylesheets so the two generators stay visually in sync.	2026-05-17 21:37:35 +02:00
nickviljoen	0e833447c0	fix(brand-guidelines): inject xlsx Source Messaging summary into check prompts Task 5 review found that get_reference_asset_content treated all non-localization-matrix .xlsx files as opaque ('reference file uploaded'), never reading the Gemini summary that excel_processor writes. That meant hp_copy_review would see no canonical messaging and fire its score-0 fallback on every real asset. Extend the .xlsx branch to mirror the PDF pattern: when the file record has a summary_path (set by excel_processor after a successful source-messaging summary), read and inject the Markdown into the reference content block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 21:28:32 +02:00
nickviljoen	4c19a0fb9d	feat(hp_copy_review): single-check LLM grader against Source Messaging Single Gemini call per asset. Prompt assembles attached Source Messaging summaries + media-plan language context + the asset image. Returns structured JSON with score, summary, and a findings array (priority, category, quote, issue, suggested fix, source reference). Empty findings = clean asset; missing reference -> score 0 with a clear message rather than running blind. Mirrors the boots_tandc_wording pattern: subclass FlaskAppTemplate, expose a static prompt template, let process_single_check inject reference-asset content and media-plan context at runtime. A standalone build_prompt() helper mirrors that assembly for unit- style smoke tests and ad-hoc prompt inspection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 21:25:30 +02:00
nickviljoen	014a9cb8ff	feat(hp): promote HP client + add hp_copy_review profile HP is no longer a placeholder. The client gets a new hp_copy_review profile (single weighted check, client-specific visibility) as its default, plus the generic static_general and video_general profiles it already had visibility into.	2026-05-17 21:08:18 +02:00
nickviljoen	568465f9be	fix(brand-guidelines): preserve localization-matrix parsing in xlsx dispatch The prior Task 2 commit (`295305e`) over-replaced existing logic that recognised certain .xlsx/.xls uploads as localization matrices and set asset_type='localization_matrix'. That field is load-bearing in two downstream sites (api_server.py:1628 and :1986) that build localization context for QC checks; destroying it would silently break any existing client using localization matrices. Restore the original try-localization-matrix-first path; only fall through to excel_processor (HP Source Messaging summary) when the file isn't a parseable localization matrix. Also restore .xls support and tag Source Messaging uploads as asset_type='source_messaging' so downstream code can distinguish them from localization matrices. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 21:03:56 +02:00
nickviljoen	295305ef2d	feat(brand-guidelines): route .xlsx uploads to excel_processor The /api/brand_guidelines POST handler now dispatches by extension: .pdf → pdf_processor.process_pdf_file (existing), .xlsx → excel_processor.process_excel_file (new). Same DB record shape; cover image is null for Excel since there's no first-page analogue. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 21:02:05 +02:00
nickviljoen	c51e0729ce	fix(excel-processor): wrap extraction in try/except to honour 'never raises' Code review found that _extract_workbook_text was unwrapped — a corrupt/locked .xlsx or InvalidFileException would leak out of process_excel_file despite the docstring promising 'Never raises'. Wrap the extraction call too; on extraction failure, write a degraded summary explaining the failure and return cleanly. Verified by passing a non-existent file: the function returns a degraded summary instead of raising FileNotFoundError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 20:55:54 +02:00
nickviljoen	abd36a9abe	fix(excel-processor): use literal trademark glyphs in summary prompt Spec requires "™, ®, ©" in the Approved Brand and Product Names section instructions; first pass wrote "TM, R, C" out of unfounded caution about encoding. Python 3 source handles UTF-8 fine and pdf_processor.py uses smart punctuation throughout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 20:52:38 +02:00
nickviljoen	ed46504ac6	feat(excel-processor): add openpyxl + Gemini summary pipeline for HP Source Messaging Mirrors pdf_processor.py — public process_excel_file() reads any HP Source Messaging Excel, extracts cells via openpyxl (skipping empty rows, capped at 50K chars), and summarises into structured Markdown via Gemini 2.5 Pro. Output saved as brand_guidelines/files/{file_id}_summary.md. On Gemini failure the processor writes a degraded summary containing the raw extraction so the reference asset stays usable. Test fixtures (real HP Excels) live under backend/tests/fixtures/hp/ and are gitignored.	2026-05-17 20:49:50 +02:00
nickviljoen	1057c5660f	chore(env): untrack legacy env files so deploys stop clobbering them config.env, backend/config.env, config/development.env, and config/production.env still contained real secrets and were getting silently reverted by `git reset --hard` during deploys — manual key-restore was required after both v1.3.0 and v1.3.1 to recover the in-place GOOGLE_API_KEY rotation. Move them to .gitignore alongside the already-untracked backend/config/*.env paths. The next deploy after this lands will delete them from disk one final time (because they were tracked in the prior commit). Same backup/restore dance documented for the previous secrets-untrack is needed for that single deploy; after it, the files are permanently untracked. This does NOT remove historical secrets from git history. Rotation of OPENAI_API_KEY, BOX_CLIENT_SECRET, SECRET_KEY, SMTP_PASSWORD remains a separate open follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 19:00:09 +02:00
nickviljoen	1c5dd980d4	perf(document-mode): parallelize per-page check dispatch in stages 3c/3d A 4-page Boots PPack run (7 page-scoped checks) was taking ~15 min because the dispatcher processed pages sequentially within each check — 28 Gemini calls in a single file. Asset-mode's ThreadPoolExecutor parallelism was bypassed because doc-mode called process_checks_in_batches once per page in a loop. Wrap the per-page dispatch in both Stage 3c (page_sample) and Stage 3d (page_each) with a ThreadPoolExecutor (max_workers=4). Extract the per-page work into a single nested helper used by both stages, which also tags each result with page_type so the existing artwork vs informational aggregation in Stage 3d keeps working. Aggregation logic, scoring, strict-grade override, and report shape are all unchanged. process_checks_in_batches is already reentrant (asset-mode uses it under its own internal ThreadPoolExecutor), so concurrent calls are safe. Progress-tracker writes intentionally tolerate races (visual only). Per-page exceptions are caught inside the helper so one bad page doesn't kill the doc — it just records a score-0 result. Expected: 15 min → ~3-4 min on the same 4-page PDF. Needs wall-time confirmation on dev with a real run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 18:14:27 +02:00
nickviljoen	a3b3f45f01	fix(deploy): use git's own -n limit instead of | head -20 When the deploy batch has more than 20 commits, the `git log ... | head -20` pipeline closes the pipe after 20 lines. git log gets SIGPIPE (exit 141), which `set -o pipefail` propagates, and `set -e` then exits the script silently — no prompt shown, no error message. Only bites for release-sized batches (>20 commits). First seen on the v1.3.0 prod deploy: 20 commits displayed, then the script returned to the shell without prompting. dev deploys never hit this because they typically only have 1-3 commits ahead. Fix: tell git to limit its own output via `-n 20`. Same display, no broken pipe. Also swap the count-by-wc-l for `git rev-list --count` which is more idiomatic and avoids any further pipe shenanigans. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 15:25:38 +02:00
nickviljoen	31b059de79	docs: add Box client onboarding runbook Documents the end-to-end process for adding a new client to the Box-webhook-driven QC pipeline: 1. Box admin: create INCOMING + REPORTS folders, invite service account 2. Code: add box_folder_id / box_reports_folder_id / default_profile to client_config.py, ship via PR 3. Verify service account access with `box_setup.py list-folder` 4. Register webhook via `box_setup.py register-all-clients` (or UI) 5. End-to-end test by uploading a sample asset, watching logs, confirming report appears + source moves to _PROCESSED 6. Optional: tune default_profile from the Settings UI without a code deploy 7. Promote to prod (develop→main PR, tag, deploy.sh prod) Includes a gotchas table for the issues most likely to come up: 403s from missing collaborator invites, signature verification failures, folder ID mismatches, replace-upload behavior, etc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 14:12:48 +02:00
nickviljoen	bf89466d06	feat(settings): default-profile UI per client (admin-only) for Box webhook flow Adds a "Default Profile" sub-tab to the Settings modal. Lists the current client's profiles as radio buttons, shows which is the active default and whether it's a runtime override or the static value from client_config.py. Admins click a different profile + Set to override; clear-override button reverts to the static value. Storage layer: backend/client_defaults.json (gitignored, per-server), following the same pattern as user_access.json. Resolution order in client_config.get_default_profile(): override → static default_profile field → None. The Box webhook handler is the sole consumer that needs profile selection without a logged-in user; it now reads via get_default_profile() so overrides take effect. Why a separate JSON, not rewriting client_config.py: a buggy override write can never break server boot — worst case the override is ignored and the static value applies. Cleaner separation between "static config you check in" and "runtime overrides admins make". Backend: - client_config.get_default_profile(client_id) — resolver - client_config.set_default_profile(client_id, profile_id) — validates + writes (rejects profiles not in client's profile list) - client_config.clear_default_profile_override(client_id) - GET /api/clients/<id>/default_profile (any auth'd user) - PUT /api/clients/<id>/default_profile (admin-only, _require_admin) - DELETE /api/clients/<id>/default_profile (admin-only) - Box webhook handler in api_server.py now uses get_default_profile() Frontend: - New "Default Profile" tab button + tab content in Settings modal - showTab hook loads settings when tab activates - loadDefaultProfileSettings / saveDefaultProfile / clearDefaultProfileOverride functions - DOM-construction (createElement + textContent) used throughout — no innerHTML with interpolated values, so user-controllable strings (client_id, profile_id) can never cause XSS Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 13:50:20 +02:00
nickviljoen	b7e9c483de	feat(box-jwt): move source file to _PROCESSED after successful run Solves two problems at once: 1. Folder cleanliness — INCOMING accumulates indefinitely otherwise. 2. Duplicate-upload re-trigger — Box V2's FILE.UPLOADED trigger doesn't fire when the same filename is "uploaded as new version" of an existing file. By moving the source out of INCOMING after success, re-uploading the same filename becomes a genuinely-new file event again and the webhook fires normally. After report uploads successfully to the REPORTS folder, the worker: 1. find_or_create_subfolder(<INCOMING>, '_PROCESSED') — idempotent 2. move_file(file_id, <_PROCESSED>, new_name=f'{session_id}_{filename}') The session_id prefix gives the archived file a sortable timestamp and ties it back to the matching QC_Report_<session_id>_*.html in REPORTS. Defensive: the move only runs if the report upload to Box succeeded. If Box delivery failed, the source stays in INCOMING so a retry just means re-uploading. Move failures are non-fatal — logged + recorded in result_data['box_source_move_error'], analysis still marked complete. Adds four helpers to box_jwt_client.py: - find_subfolder_by_name(parent, name) → Optional[str] - create_subfolder(parent, name) → str - find_or_create_subfolder(parent, name) → str (idempotent) - move_file(file_id, target_folder, new_name=None) → Dict Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 13:29:45 +02:00
nickviljoen	c75f3a99b9	fix(reports): render check details for status='success' in generate_comprehensive_html_report generate_comprehensive_html_report filtered check rendering with `status == 'completed'`, but the modern check pipeline (process_single_check via /api/start_analysis and the Phase 4 Box webhook flow) returns `status == 'success'`. Only the legacy process_single_check_with_triage returns 'completed'. Result: every report produced by the modern pipeline had an empty "Detailed Analysis Results" section — just the heading with nothing below it. Surfaced when Nick ran a LOREAL Box-webhook test on 2026-05-17: webhook fired correctly, 4 LLM checks ran, scores came back, technical pre-flight rendered, but the per-check accordion was empty. Fix: accept either status value, so both modern and legacy code paths render correctly. Errored checks (status='error') still skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 13:01:21 +02:00
nickviljoen	4a9ddee87f	feat(clients): wire LOREAL Box folders for webhook-driven QC First client to use the Phase 4 unattended-QC pipeline. Adds three optional fields to the loreal entry in client_config.py: - box_folder_id=381501258415 (AI-QC > INCOMING > AI QC LOREAL IN) - box_reports_folder_id=382076841334 (AI-QC > REPORTS > AI QC LOREAL REPORTS) - default_profile=loreal_static When a file lands in the INCOMING folder, /api/box/webhook will pick it up, run loreal_static (strict-grade), and upload the HTML report to the REPORTS folder. Other clients remain unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 09:50:40 +02:00
nickviljoen	a99c8601f0	Merge develop into feature/box-jwt-integration Brings in the 4 commits that landed on develop after this branch was cut: the chore/untrack-env-files PR (#7) and the fix/tech-section-in-html-content PR (#8). Conflict resolution: - .gitignore: both branches added `backend/config/box_jwt_config.json` in slightly different positions. Kept both sets of additions — development.env + production.env (from develop) and box_jwt_config.json (from this branch). - api_server.py: auto-merged cleanly; the Phase 4 webhook endpoint and the Phase 3 technical-section fix touch different regions of the file. Verified after merge: api_server imports cleanly, box_webhook route registered, _render_technical_section_html callable, 60 QC apps and 15 profiles load. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 23:42:00 +02:00
nickviljoen	096eba747d	fix(tech-check): also render Technical section in generate_html_content Phase 3 patched generate_comprehensive_html_report() but missed the older generate_html_content() generator. The /api/start_analysis flow with output_mode='html' (the path the web UI's download button actually triggers) routes through generate_html_content, so the Technical Details section never appeared in user-downloaded reports despite the technical_report data being present in the underlying result_data. Mirrors the Phase 3 treatment exactly: pre-builds technical_html via _render_technical_section_html(), adds the .technical / .technical-grid / .tech-row CSS rules, and injects {technical_html} between the summary block and the Detailed Analysis Results header. generate_comprehensive_html_report() retains the same logic for the /api/process_file path (line 4187) and the new Box webhook flow (_run_box_triggered_analysis on the Phase 4 branch). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 23:28:52 +02:00
nickviljoen	cfb13eb870	chore(secrets): untrack env files + add JWT path to .gitignore backend/config/development.env and backend/config/production.env were committed to the repo with real API keys, SMTP passwords, and Flask SECRET_KEY values. This commit: 1. Adds both files to .gitignore so future edits stop landing in git. 2. git rm --cached's them (local copies preserved on disk, just untracked). 3. Also pre-emptively adds backend/config/box_jwt_config.json to .gitignore — Phase 4 already gitignores it on a separate branch, but listing it here protects the file regardless of merge order. 4. Updates backend/config/.env.template with the new Box JWT-related vars (BOX_JWT_CONFIG_PATH, BOX_WEBHOOK_PRIMARY_KEY, BOX_WEBHOOK_SECONDARY_KEY) so the template is a complete reference for setting up a new environment from scratch. IMPORTANT — secrets still in git history after this commit. Removing them from history requires a destructive rewrite (git filter-repo + force-push every branch). Pragmatic alternative: rotate any secret that was ever in the files. Candidates: OPENAI_API_KEY, BOX_CLIENT_SECRET, SECRET_KEY, SMTP_PASSWORD. AZURE_TENANT_ID and AZURE_CLIENT_ID are public-ish identifiers and don't need rotating. GOOGLE_API_KEY just rotated this session. DEPLOY GOTCHA: deploy.sh does git reset --hard, which will delete the env files from /opt/ai_qc/backend/config/ on the server when this commit lands. Back them up before deploying, restore after: sudo cp /opt/ai_qc/backend/config/development.env /tmp/dev.env.bak # ...deploy... sudo cp /tmp/dev.env.bak /opt/ai_qc/backend/config/development.env sudo systemctl restart ai-qc.service Same dance on prod with production.env when promoting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 23:13:18 +02:00
nickviljoen	65848bcda1	feat(box-jwt): add box_setup.py bootstrap CLI for webhook management One-off script used to register/inspect Box V2 webhooks against the service account. Subcommands: list-webhooks, list-folder, list-clients, create-webhook, delete-webhook, register-all-clients. Typical bootstrap flow on a fresh deploy: 1. Drop box_jwt_config.json on the server (gitignored, scp'd in). 2. Verify the service account can read each client folder: `python backend/scripts/box_setup.py list-folder <folder_id>` 3. Once a client's box_folder_id is set in client_config.py, register its webhook idempotently: `python backend/scripts/box_setup.py register-all-clients \ https://optical-dev.oliver.solutions/ai_qc/api/box/webhook` 4. Copy the signing keys from the Box Developer Console (Custom App → Webhooks) into BOX_WEBHOOK_PRIMARY_KEY / BOX_WEBHOOK_SECONDARY_KEY in the env file, then restart ai-qc.service. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 22:53:03 +02:00
nickviljoen	8f995d557b	feat(box-jwt): JWT service-account client + webhook ingestion endpoint Adds machine-to-machine Box integration alongside the existing per-user OAuth scaffolding. The new JWT client (backend/box_jwt_client.py) is the auth/file/webhook surface used for unattended workflows: load the Custom App JSON config, sign a JWT assertion, exchange for a 60-minute service-account access token (cached + refreshed automatically), and expose file download/upload + V2 webhook CRUD + HMAC signature verification. Wires a new POST /api/box/webhook endpoint (NOT @auth.require_auth — it authenticates each delivery via Box's HMAC signature headers) that: 1. Verifies the signature against env-configured signing keys (BOX_WEBHOOK_PRIMARY_KEY / BOX_WEBHOOK_SECONDARY_KEY). 2. Dedups deliveries by box-delivery-id with a bounded in-memory cache. 3. Maps the source folder to a client via a new get_client_by_box_folder() helper on client_config. 4. Spawns a background thread that downloads the file, runs the same technical pre-flight + LLM check pipeline as the user-uploaded path, writes the HTML report to output/<client>/, uploads the report back to the client's box_reports_folder_id, and logs the run with a synthetic 'box_webhook' user. Webhook runs skip media-plan / localization / OCR context — those are user-UI concepts without a meaningful source in unattended runs. The existing /api/start_analysis path is unchanged. client_config.py gains three optional per-client fields used by the new flow when present: `box_folder_id`, `box_reports_folder_id`, and `default_profile`. Existing client entries keep working without them. .gitignore now excludes backend/config/box_jwt_config.json so the JWT config (with its embedded private key + passphrase) never lands in git. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 22:51:34 +02:00
nickviljoen	377efe30e5	feat(tech-check): show Technical Details section in HTML report Adds a new "Technical Details" card to generate_comprehensive_html_report() between the summary and the per-check detailed results. Renders only the fields present on the technical_report dict (file size, dimensions, DPI, page count, duration, fonts, etc. — vary by file type) and shows a prominent filename-vs-actual match badge when filename hints were parsed. If technical_report is absent or kind==unknown, the section is omitted entirely so reports for assets we can't inspect (e.g. exotic extensions) keep the existing layout unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 22:00:25 +02:00
nickviljoen	2b287f3dbb	feat(tech-check): wire pre-flight into visual + document analysis Runs technical_check.inspect() immediately after file save on both /api/start_analysis (visual flow) and /api/document/start_analysis (document flow). The report is stashed on progress_tracker[session_id] so it survives across the background thread boundary, then surfaces two ways: 1. Each LLM check in the visual flow gets a "Technical metadata" preamble prepended to its prompt via format_for_llm_prompt(), so the model knows the file's actual dimensions, format, page count, etc. without having to infer them visually. 2. result_data['technical_report'] in both flows carries the same dict through to the frontend for UI rendering (next commit). Pre-flight is best-effort: if it fails for any reason, analysis still proceeds without the preamble (silent except for the report.errors list). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:57:11 +02:00
nickviljoen	f4a95914b5	feat(tech-check): add machine-side pre-flight inspection module New backend/technical_check.py extracts technical metadata from uploaded assets via PIL (images), PyMuPDF (PDFs), and ffprobe (videos) — no LLM, runs in milliseconds. Also opportunistically parses dimension hints from the filename and compares them to the actual file, returning a match/mismatch verdict. Output is a JSON-serializable dict; format_for_llm_prompt() renders it as a tight Markdown block that downstream prompts can prepend. Module never raises — inspection errors land in `errors` so partial reports still surface. Standalone for this commit. Wiring into the upload flow and UI lands in subsequent commits on this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:53:06 +02:00
Nick Viljoen	5d1eab493c	Merged in feature/add-demo-clients (pull request #4) Feature/add demo clients	2026-05-14 19:34:38 +00:00
nickviljoen	93dc030e0c	feat(clients): add Google, HP, Ferrero as demo placeholders Three new clients in demo/eval phase. Each uses Honda-style minimal setup (static_general + video_general only) until real scope and test assets arrive. Descriptions are placeholders to be replaced once scope is confirmed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:30:18 +02:00
nickviljoen	d1826d83f1	chore(dow-jones): remove client_config entry Drops the 'dow_jones' block from CLIENT_PROFILES. After this, the client picker no longer renders Dow Jones; the four archived profiles are unreachable from user flows. Nine clients remain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:12:47 +02:00
nickviljoen	b23b7f2e17	chore(dow-jones): archive profiles, checks, and per-client doc Moves the Dow Jones / MarketWatch / WSJ profile JSONs (4), check apps (22), and CLAUDE_DOW_JONES.md into backend/_archive/dow_jones/. All moves use git mv so history follows. Adds a restore-instructions README. No loader changes needed — the archive lives outside the scanned directories. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:11:54 +02:00
nickviljoen	a1cfc75309	Merge remote-tracking branch 'origin/develop' into feature/axa-accessibility-profile-split # Conflicts: # CLAUDE_AXA.md	2026-05-10 11:20:09 +02:00
nickviljoen	a46ba9fc71	Split AXA accessibility check into its own profile Removed axa_pdf_accessibility from axa_policy_document (was 8 checks, now 7) and created a new axa_accessibility profile that contains only that check. Marked the new profile strict_grade: true so a single PDF/UA-1 rule failure forces an unmistakable Fail badge on the report — mirrors how axes4 PAC is used in practice (single-purpose, binary verdict). Lets users run accessibility-only QC without sitting through the rest of the policy-document checks, and removes weight from the policy-document score that the accessibility check wasn't really earning (its 0/10 verdict was dragging the overall grade in a way that obscured the content checks). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 11:15:46 +02:00
nickviljoen	2aeff24136	Wire veraPDF into axa_pdf_accessibility for PAC-equivalent PDF/UA-1 validation AXA's accessibility QC team uses axes4 PAC (PDF/UA-1 / Matterhorn Protocol) as their compliance gate, but our existing 9-criterion deterministic check runs surface-level only and would pass documents PAC fails. Wired up the existing _run_verapdf() stub so veraPDF — the open-source Matterhorn implementation — runs as a subprocess and drives the score when available. Verified locally: veraPDF on EAA_v1.pdf reports the exact same Content (86) and Metadata (1) failure counts as PAC's report on the same document family, confirming protocol parity. Falls back cleanly to the deterministic layer when veraPDF isn't installed, so deploys are safe before the binary lands on dev/prod servers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 10:36:03 +02:00
nickviljoen	59a0b2408c	Restructure CLAUDE.md docs: slim project-wide root, complete per-client coverage Splits the monolithic CLAUDE.md (962 lines) into a slim project-wide root (211 lines) plus per-client files. Auto-loaded context drops ~88% per session. Changes: - CLAUDE.md slimmed to project-wide essentials (architecture, auth, deployment, branch strategy, deploy scripts, prod troubleshooting, pre-session checklist). Adds explicit session-start convention pointing to CLAUDE_<CLIENT>.md for client-specific work. Updates client roster table to all 10 clients with profile counts. - New CLAUDE_AXA.md: document-mode pipeline + axa_policy_document profiles - New CLAUDE_DIAGEO.md: key_visual + packaging profiles, check inventories - New CLAUDE_UNILEVER.md: profiles + zero-score logic for face/new visibility - New CLAUDE_HONDA.md, CLAUDE_RANK.md, CLAUDE_GENERAL.md: stubs (clients use generic profiles only — kept for completeness and future expansion) - backend/CLAUDE.md: stale 932-line duplicate replaced with 18-line redirect to root + backend-specific quick pointers Per-client files (CLAUDE_LOREAL.md, CLAUDE_AMAZON.md, CLAUDE_BOOTS.md, CLAUDE_DOW_JONES.md) unchanged — already had the right content. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 12:29:16 +02:00
nickviljoen	f5aaf8da24	Merge feature/dow-jones-tuning into develop: WSJ Static prompt tuning	2026-05-06 12:03:56 +02:00
nickviljoen	3b76bf2c9c	Tune WSJ Static prompts: cap whitelist, graphic headline, split-layout logo, 30% sizing cap - wsj_capitalization_punctuation: explicit complete-sentence whitelist + soft-flag pattern for Rule 5 price formatting (price_spacing_correct / price_bolded_correct accept needs_manual_check, new price_formatting_caveat field) - wsj_typography_hierarchy: graphic/illustrative headline awareness — large stylised serif price/number graphics are recognised as the display headline; surrounding sans-serif copy is correctly classified as subhead/body. Stylised price headlines exempt from the period rule. - wsj_logo_compliance: horizontal logo placement allows anchoring to the copy block on split/asymmetric layouts; mandatory sizing assessment block with worked examples, score capped at 6/10 for logos exceeding 30% of longest side. Validated on 3 WSJ-NY test assets across 3 iterations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 12:01:59 +02:00
nickviljoen	cec11f1f6a	Tune Boots PPack prompts: superscript guard, ALL CAPS / logotype exceptions, weight/sizing limits Three rounds of prompt tuning against the Remington (4p), Easter Overlay (18p), and Grenade (7p) sample packs. Easter Overlay (the noisiest) climbed 72.38 → 78.97 → 80.04 across iterations, with strict-grade violations dropping 27 → 18 → 14. Remaining violations are now genuine compliance issues — the noise patterns are cleared. boots_caveat_compliance: - Superscript guard: vision LLM was flagging every roundel asterisk as superscript because the * glyph naturally sits high in its line. Strict two-feature rule now required (raised baseline AND visibly shrunk ~50-60% of body). Borderline cases → "needs_manual_check" with new superscript_caveat field. Caveat avg 4.4 → 7.27. - Same vision-LLM caveat applied to weight_matching (Light vs Regular at small sizes is below detection threshold) and sizing_compliant (1-2pt size differences below detection threshold). New weight_caveat and sizing_caveat fields. Reserved 1-2 score band for unambiguous critical violations only. - Explicit scoring principle: "when in doubt, prefer 7-8 with manual_check flags over a lower confident-violation score". boots_brand_name_accuracy: - ALL CAPS retail convention now explicitly acceptable. L'OREAL, ESTEE LAUDER, MAYBELLINE etc. no longer flagged as casing errors — only structural element mismatches (accents, hyphens, apostrophes, special chars) count. - Stylised brand logotype exception: known logomarks like `17` for SEVENTEEN, &SISTERS ampersand styling, e.l.f. dot rendering are Pass — surfaced via new logotype_observations field. - Brand name avg 5.53 → 7.47 → 6.67 (LLM run-to-run variability). Strongest real catch in dataset: Easter Overlay page 14 is labelled for the ROI market in production notes but uses £ instead of € on the artwork. Exactly the pre-press error worth surfacing. Caught consistently across all runs by boots_currency_locale. 
CLAUDE_BOOTS.md updated with three-pack smoke-test table, vision-LLM limitations summary, and the four reusable prompt-tuning patterns that worked on this build. Local-only — feature/boots-ppack remains unmerged until after Boots show-and-tell. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 16:26:11 +02:00
nickviljoen	50d0063b37	Add Boots Production Pack profile (multi-page document mode) New profile boots_ppack for QCing multi-page Boots production packs (PowerPoint-exported PDFs, 4-18 pages each). Built on top of AXA's document-mode infrastructure — branched off feature/axa-document-mode because it reuses the dispatcher, ingest, and result writer. New checks: - boots_logo_compliance — three-path scoring (master wordmark / partner lock-up / no branding) so OLIVER x BOOTS-style footer lock-ups aren't scored against master wordmark rules. Conservative without a formal Boots logo guideline. - boots_colour_palette — verifies CMYK/RGB/Hex spec values on creative- guidance pages against canonical Boots Blue / Health Primary Blue / Offer Red, plus visual sanity-check on artwork pages. Existing checks tuned: - boots_brand_name_accuracy: closed-world list semantics. Brands not on the approved list now go to names_not_on_list (manual review) instead of failing — the list is sourced from the original 7 docs and is known incomplete (Remington, Imodium, Maybelline etc. are legitimate Boots- stocked brands not on it). - boots_tandc_wording: explicit font-weight caveat — Boots Sharp Regular vs Light isn't reliably distinguishable by vision LLM at small sizes. Surfaced via font_weight_caveat field + needs_manual_check value. Page classifier (document_mode/page_classifier.py): Heuristic tags each page as cover / checklist / palette / notes / artwork. Validated on all 10 sample packs. Strict-grade exemption (Profile.strict_grade flag): Only artwork-classified pages count towards Pass/Fail. Cover, checklist, palette, and notes pages are still QC'd and reported as Informational but cannot trigger a Fail. Banner shows exactly which artwork-page checks fell below 6. 
Result writer extended: - Per-page table with score + page_type pill for any page_each-scope check (auto-applied as fallback) - Strict-grade banner (red on violation, green when clean) - Page_type pills throughout the per-page strip Smoke-test result (Remington 4-page pack, 2026-05-05): Overall 70.75/100, strict-grade Fail. After two iterations of prompt tuning, all three remaining strict-grade violations are real catches: orphan asterisk in T&Cs, "they may not be stocked" wording deviation, missing "Charges may apply". brand_name_accuracy 7.0 (was 3.0 before list fix), logo_compliance 9.5 (was 1.5 before lock-up path fix). Local-only — not pushed to dev or merged to develop until after Boots show-and-tell. Same posture as feature/axa-document-mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:47:13 +02:00
nickviljoen	90563b8cf2	Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) Multi-page PDF QC for AXA Ireland policy documents. Runs as a third mode alongside static + video, gated on profile.mode. New code isolated under backend/document_mode/ with new endpoints under /api/document/*. Phase 1 — Spine + 6 deterministic doc-scope checks ($0, runs in seconds): - Scope-aware dispatcher (document/targeted/page_sample/page_pair/page_each) - axa_font_inventory, axa_phone_inventory, axa_bold_words_definitions, axa_page_numbering, axa_print_code, axa_omg_versioning - Bootstrap bold-words dictionary extracted from Example 1 General Definitions Phase 3 — Old-vs-new diff (~$0.50/run, 3-5 min): - Page alignment via difflib SequenceMatcher (windowed fuzzy match) - Vision-LLM page-pair diff via Gemini 2.5 Pro (8 concurrent) - Two-slot upload UX, axa_policy_document_diff profile, mode=document_diff Phase 4 — PDF accessibility (PyMuPDF, $0): - 9 PDF/UA-1 aligned criteria (tagged structure, /MarkInfo, title, /Lang, encryption, font embedding, PDF version, XMP UA-conformance, alt-text) - _run_verapdf() stub for optional Java-based veraPDF integration later Phase 5 — Print preflight (PyMuPDF, $0): - 7 criteria (page geometry, bleed, image colour spaces, image DPI, transparency, PDF/X conformance, spot colours) Profile additions: - axa_policy_document — 8 deterministic checks, $0 cost - axa_policy_document_diff — 1 page-pair LLM check, ~$0.50/run API additions: - POST /api/document/start_analysis (single PDF) - POST /api/document/start_diff (old + new PDFs) Frontend additions: - Third profile.mode value (document_diff) in applyProfileMode() - Two-slot upload UX with PDF-only file pickers - checkFormValidity() branches by mode for the analyse-button gate Smoke-tested locally against Example 1 (Home Insurance V8, 86pp) and Example 2 (Landlord V1 vs V10, 68→74pp) with real findings caught including bold-words gaps, missing PDF/UA flag, transparency on press, V1→V10 bold-formatting 
fixes. Plan + integration map + gotchas in backend/AXA_DOCUMENT_MODE_PLAN.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 18:38:14 +02:00
nickviljoen	67ed7fdd9d	Add wsj podcast profile to Dow Jones client, File naming check added to all profiles	2026-04-29 18:17:36 +02:00
nickviljoen	b32e8f0c8b	Add wsj podcast profile to Dow Jones client, File naming check added to all profiles	2026-04-29 18:09:58 +02:00
nickviljoen	24c716df77	Fix /api/access_request iterating list_access_entries() as a list list_access_entries() returns a dict {default_clients, entries} but the endpoint iterated it directly, which yields the dict keys (strings) and then crashed on .get('is_admin') with "'str' object has no attribute 'get'". Read access_data['entries'] instead so admin recipients are collected correctly and the request email actually sends. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 08:24:22 +02:00
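Note on 24c716df77: the failure mode described above can be reproduced in a few lines. This is a minimal sketch, not the project's code. The `list_access_entries()` body, the entry fields other than `is_admin`, and both helper functions are hypothetical stand-ins; only the `{default_clients, entries}` shape and the error text come from the commit message.

```python
def list_access_entries():
    # Hypothetical stand-in: only the {default_clients, entries} shape is from the commit message.
    return {
        "default_clients": ["general"],
        "entries": [
            {"email": "admin@example.com", "is_admin": True},
            {"email": "user@example.com", "is_admin": False},
        ],
    }


def admin_recipients_buggy(access_data):
    # Iterating a dict yields its keys (strings), so .get() fails on the first key.
    return [entry["email"] for entry in access_data if entry.get("is_admin")]


def admin_recipients(access_data):
    # Fixed: iterate the list stored under 'entries'.
    return [entry["email"] for entry in access_data["entries"] if entry.get("is_admin")]


if __name__ == "__main__":
    data = list_access_entries()
    try:
        admin_recipients_buggy(data)
    except AttributeError as exc:
        print(f"buggy version failed: {exc}")
    print(admin_recipients(data))
```

Iterating `access_data` directly visits the keys `default_clients` and `entries`, which is why the original endpoint crashed before any recipient was collected.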

136 commits