The first real-data test against the AXA car-insurance PDFs surfaced a
noise problem: the new document is a brand refresh, so every page flips
font (PublicoBanner-Bold→PublicoHeadline-Bold) and colour
(#893f4a→#2e3092). With each finding scored at medium severity, that
crashed the diff score to 0.0 and drowned the bold-regression signal
AXA actually flagged.
Drop the font, size and colour comparators. Keep bold and italic, the
attributes the vision-LLM consistently misses on dense layouts. The
LLM already narrates colour-scheme rebrands and font swaps in its
Modified / Style-changes blocks; running both layers on the same
visual change just double-counts it.
Tests are inverted from "X change is flagged" to "X change is NOT
flagged" to lock in the scope decision.
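
A minimal sketch of the narrowed scope, assuming spans are plain dicts
with bold, italic, font and color keys. COMPARED_ATTRS and flagged_attrs
are hypothetical names for illustration, not the module's actual API:

```python
# Hypothetical sketch: only boolean emphasis attributes are compared.
COMPARED_ATTRS = ("bold", "italic")


def flagged_attrs(old_span: dict, new_span: dict) -> list[str]:
    """Return the attributes that differ, restricted to COMPARED_ATTRS."""
    return [
        attr
        for attr in COMPARED_ATTRS
        if bool(old_span.get(attr)) != bool(new_span.get(attr))
    ]


# A rebrand changes font and colour only; a regression also drops bold.
old = {"bold": True, "italic": False,
       "font": "PublicoBanner-Bold", "color": "#893f4a"}
rebrand = {**old, "font": "PublicoHeadline-Bold", "color": "#2e3092"}
regression = {**rebrand, "bold": False}
```

Under these assumptions the rebrand pair yields no findings, while the
regression pair still reports the bold flip.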
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously every boolean attribute rendered as "Bold → Regular",
producing "Italic: Bold → Regular" for italic flips. Now the helper
takes the attribute name and emits "Italic → Regular" or
"Bold → Regular" depending on which boolean attribute is being shown.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a "🎨 Formatting changes" block to the per-page diff report
when the deterministic formatting layer finds typographic flips.
Distinguishes page-wide style shifts from local span flips, lists up
to three example quotes per aggregated finding, and HTML-escapes all
user-controlled strings.
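
A rough sketch of such a renderer, assuming each aggregated finding is
a dict with label, page_wide and examples keys. The function name, the
dict shape and the HTML layout are illustrative assumptions, not the
report's real markup:

```python
from html import escape

MAX_EXAMPLES = 3  # assumed constant for "up to three example quotes"


def render_formatting_block(findings: list[dict]) -> str:
    """Render aggregated findings as an HTML fragment (hypothetical layout)."""
    if not findings:
        return ""  # no block when the deterministic layer found nothing
    items = []
    for finding in findings:
        scope = "page-wide" if finding.get("page_wide") else "local"
        # Escape every quote: span text comes straight from the PDF.
        quotes = ", ".join(
            f"<q>{escape(text)}</q>"
            for text in finding["examples"][:MAX_EXAMPLES]
        )
        items.append(f"<li>{escape(finding['label'])} ({scope}): {quotes}</li>")
    return "<h4>🎨 Formatting changes</h4><ul>" + "".join(items) + "</ul>"
```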
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
run_page_pair_diff now invokes compute_formatting_diff alongside the
LLM call for each aligned pair. When the deterministic layer finds
typographic flips on a page the LLM saw as identical, the pair is
re-classified as having differences with medium severity. Each
aggregated finding contributes to the global medium-severity tally.
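
The re-classification rule can be sketched as follows. The status and
severity keys and both function names are assumptions made for the
example; the real result structure of run_page_pair_diff is not shown
here:

```python
def merge_formatting_findings(llm_result: dict, findings: list) -> dict:
    """Fold deterministic findings into one page pair's LLM result."""
    merged = dict(llm_result)
    merged["formatting_findings"] = list(findings)
    # Only upgrade pairs the LLM considered identical.
    if findings and merged.get("status") == "identical":
        merged["status"] = "different"
        merged["severity"] = "medium"
    return merged


def formatting_medium_contribution(results: list[dict]) -> int:
    """Each aggregated formatting finding adds one to the medium tally."""
    return sum(len(r.get("formatting_findings", [])) for r in results)
```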
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three review-driven hardening tweaks:
- page_wide now requires ≥3 matched spans (PAGE_WIDE_MIN_SPANS).
  Avoids labelling section-break pages with a single flipped heading
  as page-wide.
- _collect_flips normalises bold/italic via bool() and font/color
  via "or ''" so callers passing dicts without those keys do not
  produce phantom flips against False/''.
- Adds tests for empty span lists and the missing-bold-key case.
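
The first two tweaks might be sketched like this. PAGE_WIDE_MIN_SPANS is
the constant named above; normalise_span, is_page_wide and the "every
matched span flipped" rule are assumptions for illustration:

```python
PAGE_WIDE_MIN_SPANS = 3


def normalise_span(span: dict) -> dict:
    """Coerce missing or None attributes to comparable defaults."""
    return {
        "bold": bool(span.get("bold")),
        "italic": bool(span.get("italic")),
        "font": span.get("font") or "",
        "color": span.get("color") or "",
    }


def is_page_wide(flipped: int, matched: int) -> bool:
    """Assumed rule: all matched spans flipped, and enough spans to judge."""
    return matched >= PAGE_WIDE_MIN_SPANS and flipped == matched
```

With the normalisation in place, a span dict that omits bold compares
equal to one that sets bold to False, so no phantom flip is reported.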
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New formatting_diff module compares span-level bold/italic/font/size/
color attributes between aligned page-pairs. Pure-Python; reads
PyMuPDF metadata already captured during ingest. Aggregates identical
flips into single findings and flags page-wide style shifts.
Powers the AXA document_diff fix for missed formatting changes that
the vision-LLM does not reliably detect.
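
Aggregating identical flips could work along these lines. The tuple
layout (attr, old, new, text) and the output dict shape are assumed for
the example and may differ from the module's own data model:

```python
from collections import defaultdict


def aggregate_flips(flips: list[tuple[str, bool, bool, str]]) -> list[dict]:
    """Group (attr, old, new, text) flips that share attr, old and new."""
    groups: dict[tuple[str, bool, bool], list[str]] = defaultdict(list)
    for attr, old, new, text in flips:
        groups[(attr, old, new)].append(text)
    # One finding per distinct flip, in first-seen order.
    return [
        {"attr": attr, "old": old, "new": new,
         "count": len(texts), "examples": texts}
        for (attr, old, new), texts in groups.items()
    ]
```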
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a 'color' field to each span dict extracted by
_extract_page_spans. Powers the upcoming deterministic
formatting-diff layer for AXA document_diff mode.
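
PyMuPDF reports a span's text colour as an sRGB integer under the color
key of the spans returned by page.get_text("dict"). A small sketch of
turning that value into a hex string follows; span_color_hex is a
hypothetical helper, and whether _extract_page_spans stores the integer
or a hex string is an assumption here:

```python
def span_color_hex(raw_span: dict) -> str:
    """Convert a PyMuPDF-style sRGB integer colour to '#rrggbb'."""
    value = raw_span.get("color")
    if value is None:
        return ""  # no colour recorded for this span
    # Mask to 24 bits so unexpected high bits cannot widen the output.
    return f"#{value & 0xFFFFFF:06x}"
```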
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>