ai_qc/backend/tests
nickviljoen 29ee941037 refactor(formatting_diff): narrow scope to bold + italic only
First real-data test against the AXA car-insurance PDFs surfaced a
noise problem: the new document is a brand refresh, so every page flips
font (PublicoBanner-Bold→PublicoHeadline-Bold) and colour
(#893f4a→#2e3092). Scored at medium severity per finding, those flips
crashed the diff score to 0.0 and drowned out the bold regression AXA
actually flagged.
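
A minimal sketch of why a per-finding penalty saturates. It assumes the diff score starts at 1.0, loses a fixed penalty per finding, and is clamped at 0.0. The formula, the `SEVERITY_PENALTY` weights, `diff_score`, and the 12-page example are all hypothetical; the commit does not say how the real score is computed.

```python
# Hypothetical severity weights; the real formatting_diff weights are not
# stated in the commit, these are placeholders for illustration only.
SEVERITY_PENALTY = {"low": 0.02, "medium": 0.1, "high": 0.25}


def diff_score(findings: list[str]) -> float:
    """Start at 1.0, subtract one penalty per finding, clamp at 0.0."""
    penalty = sum(SEVERITY_PENALTY[sev] for sev in findings)
    return max(0.0, 1.0 - penalty)


# Imagined 12-page rebrand: each page yields one font and one colour finding.
rebrand_noise = ["medium"] * (12 * 2)
bold_regression = ["medium"]

print(diff_score(bold_regression))                  # 0.9
print(diff_score(rebrand_noise))                    # 0.0
print(diff_score(rebrand_noise + bold_regression))  # 0.0: bold signal invisible
```

Once the clamp is hit, adding or removing the one finding that matters no longer moves the score.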

Drop font, size, colour comparators. Keep bold + italic — the
attributes the vision-LLM consistently misses on dense layouts. The
LLM already narrates colour-scheme rebrands and font swaps in its
Modified / Style-changes blocks; running both layers on the same
visual change just double-counts it.
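
A sketch of the narrowed comparator, assuming spans have already been aligned one-to-one between the old and new document. `Span`, `COMPARED_ATTRS`, and `formatting_findings` are hypothetical names, not the actual formatting_diff API; the span text "Excess" and the "PublicoHeadline-Regular" font are invented. The asserts mirror the inverted tests: a rebrand-only change yields no findings, a bold flip yields exactly one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical text span as extracted from a PDF page."""
    text: str
    font: str
    size: float
    color: str  # "#rrggbb"
    bold: bool
    italic: bool


# Only these attributes are compared; font, size and colour are deliberately
# left to the vision-LLM layer.
COMPARED_ATTRS = ("bold", "italic")


def formatting_findings(old: list[Span], new: list[Span]) -> list[dict]:
    """Return one finding per bold/italic flip between aligned span lists.

    Assumes old[i] and new[i] already refer to the same text; alignment is
    out of scope for this sketch.
    """
    findings = []
    for before, after in zip(old, new):
        for attr in COMPARED_ATTRS:
            was, now = getattr(before, attr), getattr(after, attr)
            if was != now:
                findings.append(
                    {"text": after.text, "attr": attr, "old": was, "new": now}
                )
    return findings


old = [Span("Excess", "PublicoBanner-Bold", 12.0, "#893f4a", True, False)]
# Rebrand only: font and colour change, weight and slant do not.
rebrand = [Span("Excess", "PublicoHeadline-Bold", 12.0, "#2e3092", True, False)]
# Regression: the same span also loses its bold weight.
regressed = [Span("Excess", "PublicoHeadline-Regular", 12.0, "#2e3092", False, False)]

assert formatting_findings(old, rebrand) == []  # rebrand is NOT flagged
assert formatting_findings(old, regressed) == [
    {"text": "Excess", "attr": "bold", "old": True, "new": False}
]
```

Keeping the scope in one tuple makes it a one-line change to revisit if the LLM layer ever stops covering font or colour swaps.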

Tests inverted from "X change is flagged" to "X change is NOT
flagged" to lock the scope decision in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:37:19 +02:00
test_diff_engine_formatting_integration.py feat(diff_engine): merge formatting_diff findings into pair_diffs 2026-05-19 10:03:54 +02:00
test_diff_report_formatting_block.py fix(diff_report): _fmt_value labels italic flips correctly 2026-05-19 10:22:39 +02:00
test_formatting_diff.py refactor(formatting_diff): narrow scope to bold + italic only 2026-05-19 12:37:19 +02:00
test_ingest_color.py feat(ingest): capture span color as #rrggbb string 2026-05-19 09:45:21 +02:00
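
For the colour capture exercised by test_ingest_color.py, a minimal sketch of the #rrggbb conversion. It assumes the extractor reports span colour as a packed 24-bit 0xRRGGBB integer (PyMuPDF's `get_text("dict")` does this for its span `color` field; whether ingest actually uses PyMuPDF is not stated here). `span_color_hex` is a hypothetical helper.

```python
def span_color_hex(srgb: int) -> str:
    """Format a packed 0xRRGGBB integer as a lowercase '#rrggbb' string."""
    if not 0 <= srgb <= 0xFFFFFF:
        raise ValueError(f"not a 24-bit sRGB value: {srgb!r}")
    # 06x zero-pads, so pure blue becomes "#0000ff" rather than "#ff".
    return f"#{srgb:06x}"


print(span_color_hex(0x893F4A))  # #893f4a
print(span_color_hex(0x2E3092))  # #2e3092
```

A fixed-width lowercase string keeps colour values directly comparable as plain strings.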