First real-data test against the AXA car-insurance PDFs surfaced a
noise problem: the new document is a brand refresh — every page flips
font (PublicoBanner-Bold→PublicoHeadline-Bold) and colour
(#893f4a→#2e3092). At medium-per-finding that crashed the diff score
to 0.0 and drowned the bold-regression signal AXA actually flagged.
Drop font, size, colour comparators. Keep bold + italic — the
attributes the vision-LLM consistently misses on dense layouts. The
LLM already narrates colour-scheme rebrands and font swaps in its
Modified / Style-changes blocks; running both layers on the same
visual change just double-counts it.
Tests inverted from "X change is flagged" to "X change is NOT
flagged" to lock the scope decision in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the AXA client doc to reflect the 2026-05-10 state:
- Status line now reads 2026-05-10, covers Phase 6 (veraPDF), profile split,
and dev deploy
- New "AI usage across AXA tools" section for client-facing communication
(8 of 9 tools deterministic, only axa_pdf_diff uses AI)
- Open items expanded to include the pending source-PDF request and the
prod-deployment hold
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removed axa_pdf_accessibility from axa_policy_document (was 8 checks, now 7)
and created a new axa_accessibility profile that contains only that check.
Marked the new profile strict_grade: true so a single PDF/UA-1 rule failure
forces an unmistakable Fail badge on the report — mirrors how axes4 PAC is
used in practice (single-purpose, binary verdict).
Lets users run accessibility-only QC without sitting through the rest of
the policy-document checks, and removes weight from the policy-document
score that the accessibility check wasn't really earning (its 0/10 verdict
was dragging the overall grade in a way that obscured the content checks).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AXA's accessibility QC team uses axes4 PAC (PDF/UA-1 / Matterhorn Protocol)
as their compliance gate, but our existing 9-criterion deterministic check
runs surface-level only and would pass documents PAC fails. Wired up the
existing _run_verapdf() stub so veraPDF — the open-source Matterhorn
implementation — runs as a subprocess and drives the score when available.
Verified locally: veraPDF on EAA_v1.pdf reports the exact same Content (86)
and Metadata (1) failure counts as PAC's report on the same document family,
confirming protocol parity.
Falls back cleanly to the deterministic layer when veraPDF isn't installed,
so deploys are safe before the binary lands on dev/prod servers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>