ai_qc

nickviljoen 1c5dd980d4	2026-05-17 18:14:27 +02:00
perf(document-mode): parallelize per-page check dispatch in stages 3c/3d

A 4-page Boots PPack run (7 page-scoped checks) was taking ~15 min because the dispatcher processed pages sequentially within each check: 28 Gemini calls in single file. Asset-mode's ThreadPoolExecutor parallelism was bypassed because doc-mode called process_checks_in_batches once per page in a loop.

Wrap the per-page dispatch in both Stage 3c (page_sample) and Stage 3d (page_each) with a ThreadPoolExecutor (max_workers=4). Extract the per-page work into a single nested helper used by both stages, which also tags each result with page_type so the existing artwork vs informational aggregation in Stage 3d keeps working. Aggregation logic, scoring, strict-grade override, and report shape are all unchanged.

process_checks_in_batches is already reentrant (asset-mode uses it under its own internal ThreadPoolExecutor), so concurrent calls are safe. Progress-tracker writes intentionally tolerate races (visual only). Per-page exceptions are caught inside the helper so one bad page doesn't kill the doc; it just records a score-0 result.

Expected: 15 min → ~3-4 min on the same 4-page PDF. Needs wall-time confirmation on dev with a real run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
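
The per-page dispatch described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code in dispatcher.py: `dispatch_pages`, `process_page`, `run_checks_for_page`, and the dict shapes for pages and results are all hypothetical, and `run_checks_for_page` is only a stand-in for `process_checks_in_batches`, whose real signature is not shown here. Only the `ThreadPoolExecutor` with `max_workers=4`, the `page_type` tagging, and the score-0 fallback on a per-page exception come from the commit message.

```python
from concurrent.futures import ThreadPoolExecutor


def run_checks_for_page(page, checks):
    """Hypothetical stand-in for process_checks_in_batches (real signature unknown)."""
    if page.get("broken"):
        raise ValueError(f"cannot process page {page['number']}")
    return [{"check": name, "score": 100} for name in checks]


def dispatch_pages(pages, checks, run_checks=run_checks_for_page, max_workers=4):
    """Run the page-scoped checks for every page concurrently, keeping page order."""

    def process_page(page):
        # Catch per-page failures so one bad page does not abort the whole document;
        # the failing page simply records score-0 results.
        try:
            results = run_checks(page, checks)
        except Exception as exc:
            results = [
                {"check": name, "score": 0, "error": str(exc)} for name in checks
            ]
        # Tag each result so later aggregation can still separate page types
        # (for example artwork vs informational).
        for result in results:
            result["page"] = page["number"]
            result["page_type"] = page["page_type"]
        return results

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, even if pages finish
        # out of order, so the report shape does not depend on timing.
        per_page = list(pool.map(process_page, pages))

    # Flatten the per-page lists into a single list of results.
    return [result for page_results in per_page for result in page_results]
```

Using `Executor.map` rather than `submit` plus `as_completed` is a choice made for this sketch: it keeps results in page order without extra bookkeeping. The real helper may collect results differently.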
data	Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5)	2026-05-01 18:38:14 +02:00
__init__.py	Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5)	2026-05-01 18:38:14 +02:00
accessibility_checks.py	Wire veraPDF into axa_pdf_accessibility for PAC-equivalent PDF/UA-1 validation	2026-05-10 10:36:03 +02:00
checks.py	Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5)	2026-05-01 18:38:14 +02:00
diff_engine.py	Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5)	2026-05-01 18:38:14 +02:00
diff_report_writer.py	Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5)	2026-05-01 18:38:14 +02:00
dispatcher.py	perf(document-mode): parallelize per-page check dispatch in stages 3c/3d	2026-05-17 18:14:27 +02:00
ingest.py	Add Boots Production Pack profile (multi-page document mode)	2026-05-05 12:47:13 +02:00
page_classifier.py	Add Boots Production Pack profile (multi-page document mode)	2026-05-05 12:47:13 +02:00
print_preflight_checks.py	Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5)	2026-05-01 18:38:14 +02:00
result_writer.py	Wire veraPDF into axa_pdf_accessibility for PAC-equivalent PDF/UA-1 validation	2026-05-10 10:36:03 +02:00