ai_qc/backend/document_mode
nickviljoen 29ee941037 refactor(formatting_diff): narrow scope to bold + italic only
First real-data test against the AXA car-insurance PDFs surfaced a
noise problem: the new document is a brand refresh — every page flips
font (PublicoBanner-Bold→PublicoHeadline-Bold) and colour
(#893f4a→#2e3092). At medium severity per finding, that crashed the
diff score to 0.0 and drowned out the bold-regression signal AXA
actually flagged.

Drop font, size, colour comparators. Keep bold + italic — the
attributes the vision-LLM consistently misses on dense layouts. The
LLM already narrates colour-scheme rebrands and font swaps in its
Modified / Style-changes blocks; running both layers on the same
visual change just double-counts it.
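
A minimal sketch of what the narrowed comparison could look like. The
actual code in formatting_diff.py is not shown here, so the Span
dataclass, the COMPARED_ATTRIBUTES tuple, and diff_span_formatting are
hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical text span; the real span structure may differ."""
    text: str
    font: str
    size: float
    colour: str
    bold: bool
    italic: bool


# Only these attributes are compared. Font, size, and colour are
# deliberately left out so a brand refresh produces no findings.
COMPARED_ATTRIBUTES = ("bold", "italic")


def diff_span_formatting(old: Span, new: Span) -> list[dict]:
    """Return one finding per compared attribute that differs."""
    findings = []
    for attr in COMPARED_ATTRIBUTES:
        before, after = getattr(old, attr), getattr(new, attr)
        if before != after:
            findings.append(
                {"text": new.text, "attribute": attr, "old": before, "new": after}
            )
    return findings
```

Under these assumptions, a span that only changes font and colour
yields an empty list, while a span that loses its bold flag yields a
single finding for "bold".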

Tests inverted from "X change is flagged" to "X change is NOT
flagged" to lock the scope decision in.
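
The inverted tests might look roughly like the sketch below. The
flagged_attributes helper is a hypothetical stand-in for the real
comparator, and the test names and BASE fixture are assumptions, not
the repository's actual test suite:

```python
# Hypothetical stand-in for the narrowed comparator in formatting_diff.py.
def flagged_attributes(old: dict, new: dict) -> set:
    """Return the names of compared attributes that differ."""
    return {attr for attr in ("bold", "italic") if old.get(attr) != new.get(attr)}


# Assumed baseline span, using the font and colour from the old brand.
BASE = {
    "font": "PublicoBanner-Bold",
    "size": 12.0,
    "colour": "#893f4a",
    "bold": True,
    "italic": False,
}


def test_font_change_is_not_flagged():
    assert flagged_attributes(BASE, {**BASE, "font": "PublicoHeadline-Bold"}) == set()


def test_size_change_is_not_flagged():
    assert flagged_attributes(BASE, {**BASE, "size": 14.0}) == set()


def test_colour_change_is_not_flagged():
    assert flagged_attributes(BASE, {**BASE, "colour": "#2e3092"}) == set()


def test_bold_change_is_still_flagged():
    assert flagged_attributes(BASE, {**BASE, "bold": False}) == {"bold"}


def test_italic_change_is_still_flagged():
    assert flagged_attributes(BASE, {**BASE, "italic": True}) == {"italic"}
```

Asserting the absence of a finding means that reintroducing a font,
size, or colour comparator later would fail a test rather than
silently widen the scope again.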

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:37:19 +02:00
Name | Last commit | Date
data | Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) | 2026-05-01 18:38:14 +02:00
__init__.py | Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) | 2026-05-01 18:38:14 +02:00
accessibility_checks.py | Wire veraPDF into axa_pdf_accessibility for PAC-equivalent PDF/UA-1 validation | 2026-05-10 10:36:03 +02:00
checks.py | Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) | 2026-05-01 18:38:14 +02:00
diff_engine.py | fix(diff_engine): guard compute_formatting_diff against per-pair failure | 2026-05-19 10:31:16 +02:00
diff_report_writer.py | fix(diff_report): _fmt_value labels italic flips correctly | 2026-05-19 10:22:39 +02:00
dispatcher.py | perf(document-mode): parallelize per-page check dispatch in stages 3c/3d | 2026-05-17 18:14:27 +02:00
formatting_diff.py | refactor(formatting_diff): narrow scope to bold + italic only | 2026-05-19 12:37:19 +02:00
ingest.py | feat(document_mode): add deterministic span formatting diff | 2026-05-19 09:56:34 +02:00
page_classifier.py | Add Boots Production Pack profile (multi-page document mode) | 2026-05-05 12:47:13 +02:00
print_preflight_checks.py | Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) | 2026-05-01 18:38:14 +02:00
result_writer.py | Wire veraPDF into axa_pdf_accessibility for PAC-equivalent PDF/UA-1 validation | 2026-05-10 10:36:03 +02:00