# AXA Client Documentation

> Referenced from main CLAUDE.md. Detailed AXA QC profile descriptions, document-mode pipeline notes, and status.

## Overview

AXA QC is built around **document-mode**: multi-page PDF analysis (policy documents, forms, brochures), not single-asset image checks. The document-mode subsystem (`backend/document_mode/`) was built for AXA and is now reused by Boots Production Pack.

**Status (2026-05-10):** Phases 1, 3, 4, 5, 6 merged to `develop` and live on dev (`https://optical-dev.oliver.solutions/ai_qc/`). Phase 6 wires veraPDF into the accessibility check (PAC-equivalent PDF/UA-1 validation) and splits accessibility into its own dedicated profile. An email to AXA is pending: it covers Adobe vs PAC and the veraPDF parity findings, and requests the original `axa-transaction-charges-100326.pdf` so we can run a true apples-to-apples comparison. Not yet on prod; held for the AXA show-and-tell and the email response.

Full plan in `backend/AXA_DOCUMENT_MODE_PLAN.md`.

## AXA Profiles

### `axa_policy_document`: single-document mode (7 checks)

Multi-page policy document QC. `mode: document`, scopes vary per check. Accessibility validation lives in the dedicated `axa_accessibility` profile, not here.

| Check | What it does | Weight |
|------|--------------|--------|
| `axa_font_inventory` | Per-page font extraction + brand-font compliance against AXA's approved font list | 1.0 |
| `axa_phone_inventory` | Extracts phone numbers across pages, validates format and approved-list membership | 1.0 |
| `axa_bold_words_definitions` | Bold-word inventory + definition cross-check (seed list at `backend/document_mode/data/axa_bold_words_seed.json`) | 2.0 |
| `axa_page_numbering` | Page numbering format and continuity | 1.0 |
| `axa_print_preflight` | Print-preflight checks (color space, embedded fonts, image resolution) | 1.0 |
| `axa_print_code` | Print code presence + format | 1.0 |
| `axa_omg_versioning` | OMG version footer/header presence and consistency | 1.0 |

### `axa_accessibility`: accessibility-only mode (1 check, strict-grade)

`mode: document`, `strict_grade: true`. Standalone PDF/UA-1 validation for users who only need to check accessibility compliance without the full policy-document content suite. Mirrors how axes4 PAC is used: single-purpose, binary verdict.

| Check | What it does | Weight |
|------|--------------|--------|
| `axa_pdf_accessibility` | PDF/UA-1 validation via veraPDF (matches axes4 PAC), with deterministic PyMuPDF fallback if veraPDF is not installed | 1.0 |

### `axa_policy_document_diff`: old-vs-new diff mode (1 check)

`mode: document_diff`: compares two PDFs (old vs new policy version) and reports structured changes.

| Check | What it does | Weight |
|------|--------------|--------|
| `axa_pdf_diff` | Detects added/removed/modified pages, paragraphs, defined terms, phone numbers | 1.0 |

## Document-mode infrastructure

AXA's document-mode subsystem is the foundation for all multi-page PDF QC in this app:

- `document_mode/ingest.py`: PDF ingestion, page rendering, span/font/color extraction via PyMuPDF
- `document_mode/dispatcher.py`: Orchestrates per-check execution against pages, supports scopes: `document` / `targeted` / `page_sample` / `page_pair` / `page_each`
- `document_mode/checks.py`, `print_preflight_checks.py`, `accessibility_checks.py`: AXA check implementations
- `document_mode/diff_engine.py`, `diff_report_writer.py`: Old-vs-new diff handling
- `document_mode/result_writer.py`: HTML report rendering with per-page sections

Boots Production Pack reuses this entire spine, so any infra changes here affect both clients.

## AI usage across AXA tools

For client-facing context: **8 of 9 AXA tools are deterministic** (no LLM, $0 cost, runs in seconds). Only `axa_pdf_diff` uses AI: a Gemini 2.5 Pro vision-LLM page-pair comparison at ~$0.40-0.80 per pair, supplemented by a deterministic PyMuPDF span comparator that catches bold/italic flips the vision-LLM misses. Font/size/colour changes are left to the LLM narrative diff, because flagging them deterministically drowns out the bold/italic regressions on re-branded documents. The accessibility check uses veraPDF, a rule-based open-source PDF/UA-1 validator, not AI. This framing matters when clients conflate "automation" with "AI".

| Tool | Type | Engine |
|---|---|---|
| `axa_font_inventory`, `axa_phone_inventory`, `axa_bold_words_definitions`, `axa_page_numbering`, `axa_print_code`, `axa_omg_versioning` | Deterministic | PyMuPDF (text + font extraction, regex) |
| `axa_print_preflight` | Deterministic | PyMuPDF (page geometry, image colour spaces, DPI, transparency, PDF/X) |
| `axa_pdf_accessibility` | Deterministic (rule-based) | veraPDF subprocess (PDF/UA-1 / Matterhorn Protocol) + PyMuPDF fallback |
| `axa_pdf_diff` | **AI + deterministic** | Gemini 2.5 Pro vision-LLM (content + font/size/colour narrative) + PyMuPDF span comparator (bold/italic flip detection) |

## Open items

- AXA show-and-tell pending; feedback will drive the next round of tuning
- Awaiting `axa-transaction-charges-100326.pdf` from AXA (the file PAC was run against); needed to fully confirm veraPDF↔PAC parity on the Structure Elements rule bucket
- Phase 2 (any further check expansion) deferred until after show-and-tell
- Canonical AXA font list / approved phone list / OMG version reference data may need expansion as test PDFs surface gaps
- Prod deployment of veraPDF + `axa_accessibility` profile: held until AXA confirms findings on dev

## veraPDF deployment

`axa_pdf_accessibility` runs the **veraPDF** PDF/UA-1 validator as a subprocess when the binary is available. veraPDF implements the Matterhorn Protocol (the same rule set axes4 PAC uses), so its verdict is the closest open-source equivalent to PAC.

Binary resolution order (in `accessibility_checks._resolve_verapdf_binary`):

1. `VERAPDF_BIN` env var
2. `verapdf` on PATH
3. `/opt/ai_qc/vendor/verapdf/verapdf` (project-local production install)

If veraPDF isn't installed the check falls back to the 9-criterion deterministic PyMuPDF layer: no breakage, just less depth.

**Production install pattern** is a project-local bundled-JRE tarball under `/opt/ai_qc/vendor/verapdf/` to avoid touching system Java or other projects on shared servers.

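
The binary resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the actual body of `accessibility_checks._resolve_verapdf_binary`: the function name `resolve_verapdf_binary`, its injectable `env` / `which` / `vendored` parameters, and the choice to skip an unusable `VERAPDF_BIN` value instead of raising are all assumptions made for the example.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Mapping, Optional

# Project-local production install path, as given in this document.
VENDORED_VERAPDF = "/opt/ai_qc/vendor/verapdf/verapdf"


def _is_executable(path: str) -> bool:
    """True if `path` is an existing file the current user may execute."""
    return Path(path).is_file() and os.access(path, os.X_OK)


def resolve_verapdf_binary(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
    vendored: str = VENDORED_VERAPDF,
) -> Optional[str]:
    """Return the first usable veraPDF path, or None if none is found.

    Hypothetical stand-in for the real resolver; only the lookup order
    is taken from the documentation.
    """
    env = os.environ if env is None else env

    # 1. Explicit override via the VERAPDF_BIN environment variable.
    #    Assumption: an unusable override is skipped, not treated as an error.
    override = env.get("VERAPDF_BIN", "").strip()
    if override and _is_executable(override):
        return override

    # 2. `verapdf` discovered on PATH.
    on_path = which("verapdf")
    if on_path:
        return on_path

    # 3. Project-local production install.
    if _is_executable(vendored):
        return vendored

    # Nothing found: the caller would use the deterministic PyMuPDF fallback.
    return None
```

A caller would treat a `None` result as the signal to run the PyMuPDF fallback layer. Passing `env` and `which` as parameters is only there to make the lookup order easy to exercise in tests without touching the real environment.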
## Key files

- `backend/AXA_DOCUMENT_MODE_PLAN.md`: full design plan and phase breakdown
- `backend/document_mode/`: pipeline implementation
- `backend/profiles/axa_policy_document.json`, `axa_accessibility.json`, `axa_policy_document_diff.json`
- `backend/document_mode/data/axa_bold_words_seed.json`: bold-word seed list