ai_qc/backend/document_mode
nickviljoen 1c5dd980d4 perf(document-mode): parallelize per-page check dispatch in stages 3c/3d
A 4-page Boots PPack run (7 page-scoped checks) was taking ~15 min
because the dispatcher processed pages sequentially within each
check — 28 Gemini calls in a single file. Asset-mode's
ThreadPoolExecutor parallelism was bypassed because doc-mode called
process_checks_in_batches once per page in a loop.

Wrap the per-page dispatch in both Stage 3c (page_sample) and Stage
3d (page_each) with a ThreadPoolExecutor (max_workers=4). Extract
the per-page work into a single nested helper used by both stages,
which also tags each result with page_type so the existing artwork
vs informational aggregation in Stage 3d keeps working. Aggregation
logic, scoring, strict-grade override, and report shape are all
unchanged.

process_checks_in_batches is already reentrant (asset-mode uses it
under its own internal ThreadPoolExecutor), so concurrent calls are
safe. Progress-tracker writes intentionally tolerate races (visual
only). Per-page exceptions are caught inside the helper so one bad
page doesn't kill the doc — it just records a score-0 result.

Expected: 15 min → ~3-4 min on the same 4-page PDF. Needs wall-time
confirmation on dev with a real run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:14:27 +02:00
..
data Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) 2026-05-01 18:38:14 +02:00
__init__.py Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) 2026-05-01 18:38:14 +02:00
accessibility_checks.py Wire veraPDF into axa_pdf_accessibility for PAC-equivalent PDF/UA-1 validation 2026-05-10 10:36:03 +02:00
checks.py Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) 2026-05-01 18:38:14 +02:00
diff_engine.py Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) 2026-05-01 18:38:14 +02:00
diff_report_writer.py Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) 2026-05-01 18:38:14 +02:00
dispatcher.py perf(document-mode): parallelize per-page check dispatch in stages 3c/3d 2026-05-17 18:14:27 +02:00
ingest.py Add Boots Production Pack profile (multi-page document mode) 2026-05-05 12:47:13 +02:00
page_classifier.py Add Boots Production Pack profile (multi-page document mode) 2026-05-05 12:47:13 +02:00
print_preflight_checks.py Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5) 2026-05-01 18:38:14 +02:00
result_writer.py Wire veraPDF into axa_pdf_accessibility for PAC-equivalent PDF/UA-1 validation 2026-05-10 10:36:03 +02:00