ai_qc

Author	SHA1	Message	Date
nickviljoen	29ee941037	refactor(formatting_diff): narrow scope to bold + italic only First real-data test against the AXA car-insurance PDFs surfaced a noise problem: the new document is a brand refresh — every page flips font (PublicoBanner-Bold→PublicoHeadline-Bold) and colour (#893f4a→#2e3092). At medium-per-finding that crashed the diff score to 0.0 and drowned the bold-regression signal AXA actually flagged. Drop font, size, colour comparators. Keep bold + italic — the attributes the vision-LLM consistently misses on dense layouts. The LLM already narrates colour-scheme rebrands and font swaps in its Modified / Style-changes blocks; running both layers on the same visual change just double-counts it. Tests inverted from "X change is flagged" to "X change is NOT flagged" to lock the scope decision in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 12:37:19 +02:00
nickviljoen	d327776c70	fix(diff_engine): guard compute_formatting_diff against per-pair failure If the deterministic formatting comparator raises on any single page-pair (e.g. unexpected span shape from a future PyMuPDF version), degrade to zero formatting findings for that pair instead of aborting the whole 52-page diff run. Logged for visibility. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 10:31:16 +02:00
nickviljoen	0fd6a35562	fix(diff_report): _fmt_value labels italic flips correctly Previously every boolean attribute rendered as "Bold → Regular", producing "Italic: Bold → Regular" for italic flips. Now the helper takes the attribute name and emits "Italic → Regular" or "Bold → Regular" depending on which boolean attribute is being shown. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 10:22:39 +02:00
nickviljoen	7eaac85df3	feat(diff_report): render formatting_changes as a per-pair block Adds a "🎨 Formatting changes" block to the per-page diff report when the deterministic formatting layer finds typographic flips. Distinguishes page-wide style shifts from local span flips, lists up to three example quotes per aggregated finding, and HTML-escapes all user-controlled strings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 10:08:47 +02:00
nickviljoen	2b1bb9ccf0	feat(diff_engine): merge formatting_diff findings into pair_diffs run_page_pair_diff now invokes compute_formatting_diff alongside the LLM call for each aligned pair. When the deterministic layer finds typographic flips on a page the LLM saw as identical, the pair is re-classified as having differences with medium severity. Each aggregated finding contributes to the global medium-severity tally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 10:03:54 +02:00
nickviljoen	d21a8a276d	refactor(formatting_diff): harden page_wide threshold + None-key handling Three review-driven hardening tweaks: - page_wide now requires ≥3 matched spans (PAGE_WIDE_MIN_SPANS). Avoids labelling section-break pages with a single flipped heading as page-wide. - _collect_flips normalises bold/italic via bool() and font/color via "or ''" so callers passing dicts without those keys do not produce phantom flips against False/''. - Adds tests for empty span lists and the missing-bold-key case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 10:01:23 +02:00
nickviljoen	98679e7329	feat(document_mode): add deterministic span formatting diff New formatting_diff module compares span-level bold/italic/font/size/color attributes between aligned page-pairs. Pure-Python; reads PyMuPDF metadata already captured during ingest. Aggregates identical flips into single findings and flags page-wide style shifts. Powers the AXA document_diff fix for missed formatting changes that the vision-LLM does not reliably detect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 09:56:34 +02:00
nickviljoen	f69e181520	feat(ingest): capture span color as #rrggbb string Adds a 'color' field to each span dict extracted by _extract_page_spans. Powers the upcoming deterministic formatting-diff layer for AXA document_diff mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 09:45:21 +02:00
nickviljoen	1c5dd980d4	perf(document-mode): parallelize per-page check dispatch in stages 3c/3d A 4-page Boots PPack run (7 page-scoped checks) was taking ~15 min because the dispatcher processed pages sequentially within each check — 28 Gemini calls in a single file. Asset-mode's ThreadPoolExecutor parallelism was bypassed because doc-mode called process_checks_in_batches once per page in a loop. Wrap the per-page dispatch in both Stage 3c (page_sample) and Stage 3d (page_each) with a ThreadPoolExecutor (max_workers=4). Extract the per-page work into a single nested helper used by both stages, which also tags each result with page_type so the existing artwork vs informational aggregation in Stage 3d keeps working. Aggregation logic, scoring, strict-grade override, and report shape are all unchanged. process_checks_in_batches is already reentrant (asset-mode uses it under its own internal ThreadPoolExecutor), so concurrent calls are safe. Progress-tracker writes intentionally tolerate races (visual only). Per-page exceptions are caught inside the helper so one bad page doesn't kill the doc — it just records a score-0 result. Expected: 15 min → ~3-4 min on the same 4-page PDF. Needs wall-time confirmation on dev with a real run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 18:14:27 +02:00
nickviljoen	2aeff24136	Wire veraPDF into axa_pdf_accessibility for PAC-equivalent PDF/UA-1 validation AXA's accessibility QC team uses axes4 PAC (PDF/UA-1 / Matterhorn Protocol) as their compliance gate, but our existing 9-criterion deterministic check runs surface-level only and would pass documents PAC fails. Wired up the existing _run_verapdf() stub so veraPDF — the open-source Matterhorn implementation — runs as a subprocess and drives the score when available. Verified locally: veraPDF on EAA_v1.pdf reports the exact same Content (86) and Metadata (1) failure counts as PAC's report on the same document family, confirming protocol parity. Falls back cleanly to the deterministic layer when veraPDF isn't installed, so deploys are safe before the binary lands on dev/prod servers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 10:36:03 +02:00
nickviljoen	50d0063b37	Add Boots Production Pack profile (multi-page document mode) New profile boots_ppack for QCing multi-page Boots production packs (PowerPoint-exported PDFs, 4-18 pages each). Built on top of AXA's document-mode infrastructure — branched off feature/axa-document-mode because it reuses the dispatcher, ingest, and result writer. New checks: - boots_logo_compliance — three-path scoring (master wordmark / partner lock-up / no branding) so OLIVER x BOOTS-style footer lock-ups aren't scored against master wordmark rules. Conservative without a formal Boots logo guideline. - boots_colour_palette — verifies CMYK/RGB/Hex spec values on creative-guidance pages against canonical Boots Blue / Health Primary Blue / Offer Red, plus visual sanity-check on artwork pages. Existing checks tuned: - boots_brand_name_accuracy: closed-world list semantics. Brands not on the approved list now go to names_not_on_list (manual review) instead of failing — the list is sourced from the original 7 docs and is known incomplete (Remington, Imodium, Maybelline etc. are legitimate Boots-stocked brands not on it). - boots_tandc_wording: explicit font-weight caveat — Boots Sharp Regular vs Light isn't reliably distinguishable by vision LLM at small sizes. Surfaced via font_weight_caveat field + needs_manual_check value. Page classifier (document_mode/page_classifier.py): Heuristic tags each page as cover / checklist / palette / notes / artwork. Validated on all 10 sample packs. Strict-grade exemption (Profile.strict_grade flag): Only artwork-classified pages count towards Pass/Fail. Cover, checklist, palette, and notes pages are still QC'd and reported as Informational but cannot trigger a Fail. Banner shows exactly which artwork-page checks fell below 6. Result writer extended: - Per-page table with score + page_type pill for any page_each-scope check (auto-applied as fallback) - Strict-grade banner (red on violation, green when clean) - Page_type pills throughout the per-page strip Smoke-test result (Remington 4-page pack, 2026-05-05): Overall 70.75/100, strict-grade Fail. After two iterations of prompt tuning, all three remaining strict-grade violations are real catches: orphan asterisk in T&Cs, "they may not be stocked" wording deviation, missing "Charges may apply". brand_name_accuracy 7.0 (was 3.0 before list fix), logo_compliance 9.5 (was 1.5 before lock-up path fix). Local-only — not pushed to dev or merged to develop until after Boots show-and-tell. Same posture as feature/axa-document-mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:47:13 +02:00
nickviljoen	90563b8cf2	Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) Multi-page PDF QC for AXA Ireland policy documents. Runs as a third mode alongside static + video, gated on profile.mode. New code isolated under backend/document_mode/ with new endpoints under /api/document/*. Phase 1 — Spine + 6 deterministic doc-scope checks ($0, runs in seconds): - Scope-aware dispatcher (document/targeted/page_sample/page_pair/page_each) - axa_font_inventory, axa_phone_inventory, axa_bold_words_definitions, axa_page_numbering, axa_print_code, axa_omg_versioning - Bootstrap bold-words dictionary extracted from Example 1 General Definitions Phase 3 — Old-vs-new diff (~$0.50/run, 3-5 min): - Page alignment via difflib SequenceMatcher (windowed fuzzy match) - Vision-LLM page-pair diff via Gemini 2.5 Pro (8 concurrent) - Two-slot upload UX, axa_policy_document_diff profile, mode=document_diff Phase 4 — PDF accessibility (PyMuPDF, $0): - 9 PDF/UA-1 aligned criteria (tagged structure, /MarkInfo, title, /Lang, encryption, font embedding, PDF version, XMP UA-conformance, alt-text) - _run_verapdf() stub for optional Java-based veraPDF integration later Phase 5 — Print preflight (PyMuPDF, $0): - 7 criteria (page geometry, bleed, image colour spaces, image DPI, transparency, PDF/X conformance, spot colours) Profile additions: - axa_policy_document — 8 deterministic checks, $0 cost - axa_policy_document_diff — 1 page-pair LLM check, ~$0.50/run API additions: - POST /api/document/start_analysis (single PDF) - POST /api/document/start_diff (old + new PDFs) Frontend additions: - Third profile.mode value (document_diff) in applyProfileMode() - Two-slot upload UX with PDF-only file pickers - checkFormValidity() branches by mode for the analyse-button gate Smoke-tested locally against Example 1 (Home Insurance V8, 86pp) and Example 2 (Landlord V1 vs V10, 68→74pp) with real findings caught including bold-words gaps, missing PDF/UA flag, transparency on press, V1→V10 bold-formatting fixes. Plan + integration map + gotchas in backend/AXA_DOCUMENT_MODE_PLAN.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 18:38:14 +02:00
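
Commit 90563b8cf2 mentions page alignment via difflib SequenceMatcher with a windowed fuzzy match. The repository's actual code is not shown in this log, so the following is only a minimal sketch of how such an alignment might look. The function name `align_pages`, the `window` and `threshold` parameters, and their default values are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def align_pages(old_pages, new_pages, window=3, threshold=0.6):
    """Pair each old page with its most similar new page (hypothetical helper).

    old_pages / new_pages are lists of page text. Only new pages in a
    forward-looking window are considered, so the alignment stays monotone
    and no new page is matched twice.
    Returns a list of (old_index, new_index_or_None, ratio) tuples.
    """
    pairs = []
    cursor = 0  # first new page that has not been matched yet
    for i, old_text in enumerate(old_pages):
        best_j, best_ratio = None, 0.0
        for j in range(cursor, min(len(new_pages), cursor + window + 1)):
            # autojunk=False avoids the popularity heuristic on long pages
            ratio = SequenceMatcher(None, old_text, new_pages[j], autojunk=False).ratio()
            if ratio > best_ratio:
                best_j, best_ratio = j, ratio
        if best_j is not None and best_ratio >= threshold:
            pairs.append((i, best_j, best_ratio))
            cursor = best_j + 1  # skipped new pages count as insertions
        else:
            pairs.append((i, None, best_ratio))  # old page treated as removed
    return pairs
```

Under these assumptions, an old page with no sufficiently similar candidate is reported with `None`, and new pages skipped by the cursor can be treated as insertions by the caller.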

12 commits
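
Commits 98679e7329, d21a8a276d and 29ee941037 describe a deterministic span comparator that is narrowed to bold and italic, normalises missing keys via bool(), aggregates identical flips, and requires at least three matched spans (PAGE_WIDE_MIN_SPANS) before labelling a shift page-wide. The sketch below illustrates that idea only. The span dict shape, matching spans by exact text, the `formatting_diff` name, the finding fields, and the rule that "page-wide" means every matched span flipped the same way are assumptions, not the module's confirmed behaviour.

```python
from collections import defaultdict

PAGE_WIDE_MIN_SPANS = 3  # threshold named in commit d21a8a276d
TRACKED = ("bold", "italic")  # font, size and colour are deliberately ignored


def formatting_diff(old_spans, new_spans):
    """Return aggregated bold/italic flips between two aligned pages.

    Spans are assumed to be dicts with a "text" key and optional boolean
    "bold" / "italic" keys; spans are matched by identical text.
    """
    new_by_text = {}
    for span in new_spans:
        new_by_text.setdefault(span["text"], span)  # first occurrence wins

    flips = defaultdict(list)  # (attribute, old, new) -> list of span texts
    matched = 0
    for old in old_spans:
        new = new_by_text.get(old["text"])
        if new is None:
            continue  # text changes are left to the other diff layer
        matched += 1
        for attr in TRACKED:
            # bool() keeps a missing key from producing a phantom flip
            before, after = bool(old.get(attr)), bool(new.get(attr))
            if before != after:
                flips[(attr, before, after)].append(old["text"])

    findings = []
    for (attr, before, after), texts in flips.items():
        findings.append({
            "attribute": attr,
            "old": before,
            "new": after,
            "count": len(texts),
            "examples": texts[:3],  # up to three example quotes
            "page_wide": matched >= PAGE_WIDE_MIN_SPANS and len(texts) == matched,
        })
    return findings
```

With this shape, a font or colour change on an otherwise unchanged span yields no finding, which mirrors the scope decision described in commit 29ee941037.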