Commit graph

136 commits

Author SHA1 Message Date
nickviljoen
29ee941037 refactor(formatting_diff): narrow scope to bold + italic only
First real-data test against the AXA car-insurance PDFs surfaced a
noise problem: the new document is a brand refresh — every page flips
font (PublicoBanner-Bold→PublicoHeadline-Bold) and colour
(#893f4a→#2e3092). At medium-per-finding that crashed the diff score
to 0.0 and drowned the bold-regression signal AXA actually flagged.

Drop font, size, colour comparators. Keep bold + italic — the
attributes the vision-LLM consistently misses on dense layouts. The
LLM already narrates colour-scheme rebrands and font swaps in its
Modified / Style-changes blocks; running both layers on the same
visual change just double-counts it.

Tests inverted from "X change is flagged" to "X change is NOT
flagged" to lock the scope decision in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:37:19 +02:00
nickviljoen
d327776c70 fix(diff_engine): guard compute_formatting_diff against per-pair failure
If the deterministic formatting comparator raises on any single page-pair
(e.g. unexpected span shape from a future PyMuPDF version), degrade to
zero formatting findings for that pair instead of aborting the whole
52-page diff run. Logged for visibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:31:16 +02:00
nickviljoen
0fd6a35562 fix(diff_report): _fmt_value labels italic flips correctly
Previously every boolean attribute rendered as "Bold → Regular",
producing "Italic: Bold → Regular" for italic flips. Now the helper
takes the attribute name and emits "Italic → Regular" or
"Bold → Regular" depending on which boolean attribute is being shown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:22:39 +02:00
nickviljoen
7eaac85df3 feat(diff_report): render formatting_changes as a per-pair block
Adds a "🎨 Formatting changes" block to the per-page diff report
when the deterministic formatting layer finds typographic flips.
Distinguishes page-wide style shifts from local span flips, lists up
to three example quotes per aggregated finding, and HTML-escapes all
user-controlled strings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:08:47 +02:00
nickviljoen
2b1bb9ccf0 feat(diff_engine): merge formatting_diff findings into pair_diffs
run_page_pair_diff now invokes compute_formatting_diff alongside the
LLM call for each aligned pair. When the deterministic layer finds
typographic flips on a page the LLM saw as identical, the pair is
re-classified as having differences with medium severity. Each
aggregated finding contributes to the global medium-severity tally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:03:54 +02:00
nickviljoen
d21a8a276d refactor(formatting_diff): harden page_wide threshold + None-key handling
Three review-driven hardening tweaks:
- page_wide now requires ≥3 matched spans (PAGE_WIDE_MIN_SPANS).
  Avoids labelling section-break pages with a single flipped heading
  as page-wide.
- _collect_flips normalises bold/italic via bool() and font/color
  via "or ''" so callers passing dicts without those keys do not
  produce phantom flips against False/''.
- Adds tests for empty span lists and the missing-bold-key case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:01:23 +02:00
nickviljoen
98679e7329 feat(document_mode): add deterministic span formatting diff
New formatting_diff module compares span-level bold/italic/font/size/
color attributes between aligned page-pairs. Pure-Python; reads
PyMuPDF metadata already captured during ingest. Aggregates identical
flips into single findings and flags page-wide style shifts.

Powers the AXA document_diff fix for missed formatting changes that
the vision-LLM does not reliably detect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 09:56:34 +02:00
nickviljoen
f69e181520 feat(ingest): capture span color as #rrggbb string
Adds a 'color' field to each span dict extracted by
_extract_page_spans. Powers the upcoming deterministic
formatting-diff layer for AXA document_diff mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 09:45:21 +02:00
nickviljoen
71bb9a6295 fix(hp_copy_review): correct llm casing + route HP reports to /hp/ folder
Two bugs surfaced by the first dev smoke test:

1. Profile JSON declared "llm": "gemini" (lowercase). llm_config's
   dispatcher compares model_name == "Gemini" case-sensitively
   (matches the rest of the codebase), so the check fell through to
   "Invalid model selected" and never reached the API. Every other
   profile uses "Gemini" with capital G. Spec mistake — fixed.

2. get_client_from_profile() resolves the per-report output folder
   from the profile_id via hardcoded prefix matches. No 'hp_' branch
   existed, so hp_copy_review reports landed under output-dev/general/
   instead of output-dev/hp/ — the UI then couldn't find them. Added
   'hp_' → 'hp' alongside the existing mappings.

The check itself works correctly otherwise: profile_source was
user_selected, brand resolved to 'hp', and the reference asset was
successfully attached. Bug 1 just prevented Gemini from being called.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:07:25 +02:00
nickviljoen
68a2360811 feat(report): render hp_copy_review findings as a structured table
Both HTML report generators (generate_html_content and
generate_comprehensive_html_report) get a small case: when a check
result has a 'findings' array in its json_data, render it as a
priority-coloured table with quote/issue/suggested-fix/source
columns instead of the default response-text block. The summary
field (when present) renders above the table. Fallback to text
rendering when findings is absent — every existing check is
unaffected.

All string fields from the LLM are HTML-escaped via html.escape()
to neutralise stray <, >, &, or quote characters. Inline CSS for
.findings-table / .priority-pill / .priority-high|medium|low /
.muted is added to both stylesheets so the two generators stay
visually in sync.
2026-05-17 21:37:35 +02:00
nickviljoen
0e833447c0 fix(brand-guidelines): inject xlsx Source Messaging summary into check prompts
Task 5 review found that get_reference_asset_content treated all
non-localization-matrix .xlsx files as opaque ('reference file
uploaded'), never reading the Gemini summary that excel_processor
writes. That meant hp_copy_review would see no canonical messaging
and fire its score-0 fallback on every real asset.

Extend the .xlsx branch to mirror the PDF pattern: when the file
record has a summary_path (set by excel_processor after a
successful source-messaging summary), read and inject the Markdown
into the reference content block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:28:32 +02:00
nickviljoen
4c19a0fb9d feat(hp_copy_review): single-check LLM grader against Source Messaging
Single Gemini call per asset. Prompt assembles attached Source
Messaging summaries + media-plan language context + the asset image.
Returns structured JSON with score, summary, and a findings array
(priority, category, quote, issue, suggested fix, source reference).
Empty findings = clean asset; missing reference -> score 0 with a
clear message rather than running blind.

Mirrors the boots_tandc_wording pattern: subclass FlaskAppTemplate,
expose a static prompt template, let process_single_check inject
reference-asset content and media-plan context at runtime. A
standalone build_prompt() helper mirrors that assembly for unit-
style smoke tests and ad-hoc prompt inspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:25:30 +02:00
nickviljoen
014a9cb8ff feat(hp): promote HP client + add hp_copy_review profile
HP is no longer a placeholder. The client gets a new hp_copy_review
profile (single weighted check, client-specific visibility) as its
default, plus the generic static_general and video_general profiles
it already had visibility into.
2026-05-17 21:08:18 +02:00
nickviljoen
568465f9be fix(brand-guidelines): preserve localization-matrix parsing in xlsx dispatch
The prior Task 2 commit (295305e) over-replaced existing logic that
recognised certain .xlsx/.xls uploads as localization matrices and
set asset_type='localization_matrix'. That field is load-bearing in
two downstream sites (api_server.py:1628 and :1986) that build
localization context for QC checks; destroying it would silently
break any existing client using localization matrices.

Restore the original try-localization-matrix-first path; only fall
through to excel_processor (HP Source Messaging summary) when the
file isn't a parseable localization matrix. Also restore .xls
support and tag Source Messaging uploads as
asset_type='source_messaging' so downstream code can distinguish
them from localization matrices.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:03:56 +02:00
nickviljoen
295305ef2d feat(brand-guidelines): route .xlsx uploads to excel_processor
The /api/brand_guidelines POST handler now dispatches by extension:
.pdf → pdf_processor.process_pdf_file (existing), .xlsx →
excel_processor.process_excel_file (new). Same DB record shape;
cover image is null for Excel since there's no first-page analogue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:02:05 +02:00
nickviljoen
c51e0729ce fix(excel-processor): wrap extraction in try/except to honour 'never raises'
Code review found that _extract_workbook_text was unwrapped — a
corrupt/locked .xlsx or InvalidFileException would leak out of
process_excel_file despite the docstring promising 'Never raises'.
Wrap the extraction call too; on extraction failure, write a
degraded summary explaining the failure and return cleanly.

Verified by passing a non-existent file: the function returns a
degraded summary instead of raising FileNotFoundError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:55:54 +02:00
nickviljoen
abd36a9abe fix(excel-processor): use literal trademark glyphs in summary prompt
Spec requires "™, ®, ©" in the Approved Brand and Product Names
section instructions; first pass wrote "TM, R, C" out of unfounded
caution about encoding. Python 3 source handles UTF-8 fine and
pdf_processor.py uses smart punctuation throughout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:52:38 +02:00
nickviljoen
ed46504ac6 feat(excel-processor): add openpyxl + Gemini summary pipeline for HP Source Messaging
Mirrors pdf_processor.py — public process_excel_file() reads any HP
Source Messaging Excel, extracts cells via openpyxl (skipping empty
rows, capped at 50K chars), and summarises into structured Markdown
via Gemini 2.5 Pro. Output saved as brand_guidelines/files/{file_id}_summary.md.

On Gemini failure the processor writes a degraded summary containing
the raw extraction so the reference asset stays usable. Test fixtures
(real HP Excels) live under backend/tests/fixtures/hp/ and are gitignored.
2026-05-17 20:49:50 +02:00
nickviljoen
1057c5660f chore(env): untrack legacy env files so deploys stop clobbering them
config.env, backend/config.env, config/development.env, and
config/production.env still contained real secrets and were getting
silently reverted by `git reset --hard` during deploys — manual
key-restore was required after both v1.3.0 and v1.3.1 to recover
the in-place GOOGLE_API_KEY rotation. Move them to .gitignore
alongside the already-untracked backend/config/*.env paths.

The next deploy after this lands will delete them from disk one
final time (because they were tracked in the prior commit). Same
backup/restore dance documented for the previous secrets-untrack
is needed for that single deploy; after it, the files are
permanently untracked.

This does NOT remove historical secrets from git history. Rotation
of OPENAI_API_KEY, BOX_CLIENT_SECRET, SECRET_KEY, SMTP_PASSWORD
remains a separate open follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 19:00:09 +02:00
nickviljoen
1c5dd980d4 perf(document-mode): parallelize per-page check dispatch in stages 3c/3d
A 4-page Boots PPack run (7 page-scoped checks) was taking ~15 min
because the dispatcher processed pages sequentially within each
check — 28 Gemini calls in a single file. Asset-mode's
ThreadPoolExecutor parallelism was bypassed because doc-mode called
process_checks_in_batches once per page in a loop.

Wrap the per-page dispatch in both Stage 3c (page_sample) and Stage
3d (page_each) with a ThreadPoolExecutor (max_workers=4). Extract
the per-page work into a single nested helper used by both stages,
which also tags each result with page_type so the existing artwork
vs informational aggregation in Stage 3d keeps working. Aggregation
logic, scoring, strict-grade override, and report shape are all
unchanged.

process_checks_in_batches is already reentrant (asset-mode uses it
under its own internal ThreadPoolExecutor), so concurrent calls are
safe. Progress-tracker writes intentionally tolerate races (visual
only). Per-page exceptions are caught inside the helper so one bad
page doesn't kill the doc — it just records a score-0 result.

Expected: 15 min → ~3-4 min on the same 4-page PDF. Needs wall-time
confirmation on dev with a real run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:14:27 +02:00
nickviljoen
a3b3f45f01 fix(deploy): use git's own -n limit instead of | head -20
When the deploy batch has more than 20 commits, the `git log ... | head -20`
pipeline closes the pipe after 20 lines. git log gets SIGPIPE (exit 141),
which `set -o pipefail` propagates, and `set -e` then exits the script
silently — no prompt shown, no error message.

Only bites for release-sized batches (>20 commits). First seen on the
v1.3.0 prod deploy: 20 commits displayed, then the script returned to
the shell without prompting. dev deploys never hit this because they
typically only have 1-3 commits ahead.

Fix: tell git to limit its own output via `-n 20`. Same display, no
broken pipe. Also swap the count-by-wc-l for `git rev-list --count`
which is more idiomatic and avoids any further pipe shenanigans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 15:25:38 +02:00
nickviljoen
31b059de79 docs: add Box client onboarding runbook
Documents the end-to-end process for adding a new client to the
Box-webhook-driven QC pipeline:

1. Box admin: create INCOMING + REPORTS folders, invite service account
2. Code: add box_folder_id / box_reports_folder_id / default_profile
   to client_config.py, ship via PR
3. Verify service account access with `box_setup.py list-folder`
4. Register webhook via `box_setup.py register-all-clients` (or UI)
5. End-to-end test by uploading a sample asset, watching logs,
   confirming report appears + source moves to _PROCESSED
6. Optional: tune default_profile from the Settings UI without a code
   deploy
7. Promote to prod (develop→main PR, tag, deploy.sh prod)

Includes a gotchas table for the issues most likely to come up:
403s from missing collaborator invites, signature verification
failures, folder ID mismatches, replace-upload behavior, etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 14:12:48 +02:00
nickviljoen
bf89466d06 feat(settings): default-profile UI per client (admin-only) for Box webhook flow
Adds a "Default Profile" sub-tab to the Settings modal. Lists the
current client's profiles as radio buttons, shows which is the active
default and whether it's a runtime override or the static value from
client_config.py. Admins click a different profile + Set to override;
clear-override button reverts to the static value.

Storage layer: backend/client_defaults.json (gitignored, per-server),
following the same pattern as user_access.json. Resolution order in
client_config.get_default_profile(): override → static
default_profile field → None. The Box webhook handler is the sole
consumer that needs profile selection without a logged-in user; it
now reads via get_default_profile() so overrides take effect.

Why a separate JSON, not rewriting client_config.py: a buggy override
write can never break server boot — worst case the override is
ignored and the static value applies. Cleaner separation between
"static config you check in" and "runtime overrides admins make".

Backend:
- client_config.get_default_profile(client_id) — resolver
- client_config.set_default_profile(client_id, profile_id) — validates
  + writes (rejects profiles not in client's profile list)
- client_config.clear_default_profile_override(client_id)
- GET /api/clients/<id>/default_profile (any auth'd user)
- PUT /api/clients/<id>/default_profile (admin-only, _require_admin)
- DELETE /api/clients/<id>/default_profile (admin-only)
- Box webhook handler in api_server.py now uses get_default_profile()

Frontend:
- New "Default Profile" tab button + tab content in Settings modal
- showTab hook loads settings when tab activates
- loadDefaultProfileSettings / saveDefaultProfile /
  clearDefaultProfileOverride functions
- DOM-construction (createElement + textContent) used throughout —
  no innerHTML with interpolated values, so user-controllable
  strings (client_id, profile_id) can never cause XSS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:50:20 +02:00
nickviljoen
b7e9c483de feat(box-jwt): move source file to _PROCESSED after successful run
Solves two problems at once:

1. Folder cleanliness — INCOMING accumulates indefinitely otherwise.
2. Duplicate-upload re-trigger — Box V2's FILE.UPLOADED trigger doesn't
   fire when the same filename is "uploaded as new version" of an
   existing file. By moving the source out of INCOMING after success,
   re-uploading the same filename becomes a genuinely-new file event
   again and the webhook fires normally.

After report uploads successfully to the REPORTS folder, the worker:
1. find_or_create_subfolder(<INCOMING>, '_PROCESSED') — idempotent
2. move_file(file_id, <_PROCESSED>, new_name=f'{session_id}_{filename}')

The session_id prefix gives the archived file a sortable timestamp and
ties it back to the matching QC_Report_<session_id>_*.html in REPORTS.

Defensive: the move only runs if the report upload to Box succeeded.
If Box delivery failed, the source stays in INCOMING so a retry just
means re-uploading. Move failures are non-fatal — logged + recorded
in result_data['box_source_move_error'], analysis still marked
complete.

Adds four helpers to box_jwt_client.py:
- find_subfolder_by_name(parent, name) → Optional[str]
- create_subfolder(parent, name) → str
- find_or_create_subfolder(parent, name) → str  (idempotent)
- move_file(file_id, target_folder, new_name=None) → Dict

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:29:45 +02:00
nickviljoen
c75f3a99b9 fix(reports): render check details for status='success' in generate_comprehensive_html_report
generate_comprehensive_html_report filtered check rendering with
`status == 'completed'`, but the modern check pipeline
(process_single_check via /api/start_analysis and the Phase 4 Box
webhook flow) returns `status == 'success'`. Only the legacy
process_single_check_with_triage returns 'completed'.

Result: every report produced by the modern pipeline had an empty
"Detailed Analysis Results" section — just the heading with nothing
below it. Surfaced when Nick ran a LOREAL Box-webhook test on
2026-05-17: webhook fired correctly, 4 LLM checks ran, scores came
back, technical pre-flight rendered, but the per-check accordion was
empty.

Fix: accept either status value, so both modern and legacy code paths
render correctly. Errored checks (status='error') still skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:01:21 +02:00
nickviljoen
4a9ddee87f feat(clients): wire LOREAL Box folders for webhook-driven QC
First client to use the Phase 4 unattended-QC pipeline. Adds three
optional fields to the loreal entry in client_config.py:

- box_folder_id=381501258415 (AI-QC > INCOMING > AI QC LOREAL IN)
- box_reports_folder_id=382076841334 (AI-QC > REPORTS > AI QC LOREAL REPORTS)
- default_profile=loreal_static

When a file lands in the INCOMING folder, /api/box/webhook will pick
it up, run loreal_static (strict-grade), and upload the HTML report
to the REPORTS folder. Other clients remain unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 09:50:40 +02:00
nickviljoen
a99c8601f0 Merge develop into feature/box-jwt-integration
Brings in the 4 commits that landed on develop after this branch was
cut: the chore/untrack-env-files PR (#7) and the
fix/tech-section-in-html-content PR (#8).

Conflict resolution:
- .gitignore: both branches added `backend/config/box_jwt_config.json`
  in slightly different positions. Kept both sets of additions —
  development.env + production.env (from develop) and
  box_jwt_config.json (from this branch).
- api_server.py: auto-merged cleanly; the Phase 4 webhook endpoint and
  the Phase 3 technical-section fix touch different regions of the file.

Verified after merge: api_server imports cleanly, box_webhook route
registered, _render_technical_section_html callable, 60 QC apps and
15 profiles load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 23:42:00 +02:00
nickviljoen
096eba747d fix(tech-check): also render Technical section in generate_html_content
Phase 3 patched generate_comprehensive_html_report() but missed the
older generate_html_content() generator. The /api/start_analysis flow
with output_mode='html' (the path the web UI's download button
actually triggers) routes through generate_html_content, so the
Technical Details section never appeared in user-downloaded reports
despite the technical_report data being present in the underlying
result_data.

Mirrors the Phase 3 treatment exactly: pre-builds technical_html via
_render_technical_section_html(), adds the .technical / .technical-grid
/ .tech-row CSS rules, and injects {technical_html} between the
summary block and the Detailed Analysis Results header.

generate_comprehensive_html_report() retains the same logic for the
/api/process_file path (line 4187) and the new Box webhook flow
(_run_box_triggered_analysis on the Phase 4 branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 23:28:52 +02:00
nickviljoen
cfb13eb870 chore(secrets): untrack env files + add JWT path to .gitignore
backend/config/development.env and backend/config/production.env were
committed to the repo with real API keys, SMTP passwords, and Flask
SECRET_KEY values. This commit:

1. Adds both files to .gitignore so future edits stop landing in git.
2. git rm --cached's them (local copies preserved on disk, just
   untracked).
3. Also pre-emptively adds backend/config/box_jwt_config.json to
   .gitignore — Phase 4 already gitignores it on a separate branch, but
   listing it here protects the file regardless of merge order.
4. Updates backend/config/.env.template with the new Box JWT-related
   vars (BOX_JWT_CONFIG_PATH, BOX_WEBHOOK_PRIMARY_KEY,
   BOX_WEBHOOK_SECONDARY_KEY) so the template is a complete reference
   for setting up a new environment from scratch.

IMPORTANT — secrets still in git history after this commit. Removing
them from history requires a destructive rewrite (git filter-repo +
force-push every branch). Pragmatic alternative: rotate any secret
that was ever in the files. Candidates: OPENAI_API_KEY, BOX_CLIENT_SECRET,
SECRET_KEY, SMTP_PASSWORD. AZURE_TENANT_ID and AZURE_CLIENT_ID are
public-ish identifiers and don't need rotating. GOOGLE_API_KEY just
rotated this session.

DEPLOY GOTCHA: deploy.sh does git reset --hard, which will delete the
env files from /opt/ai_qc/backend/config/ on the server when this
commit lands. Back them up before deploying, restore after:

    sudo cp /opt/ai_qc/backend/config/development.env /tmp/dev.env.bak
    # ...deploy...
    sudo cp /tmp/dev.env.bak /opt/ai_qc/backend/config/development.env
    sudo systemctl restart ai-qc.service

Same dance on prod with production.env when promoting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 23:13:18 +02:00
nickviljoen
65848bcda1 feat(box-jwt): add box_setup.py bootstrap CLI for webhook management
One-off script used to register/inspect Box V2 webhooks against the
service account. Subcommands: list-webhooks, list-folder, list-clients,
create-webhook, delete-webhook, register-all-clients.

Typical bootstrap flow on a fresh deploy:
1. Drop box_jwt_config.json on the server (gitignored, scp'd in).
2. Verify the service account can read each client folder:
   `python backend/scripts/box_setup.py list-folder <folder_id>`
3. Once a client's box_folder_id is set in client_config.py, register
   its webhook idempotently:
   `python backend/scripts/box_setup.py register-all-clients \
       https://optical-dev.oliver.solutions/ai_qc/api/box/webhook`
4. Copy the signing keys from the Box Developer Console (Custom App →
   Webhooks) into BOX_WEBHOOK_PRIMARY_KEY / BOX_WEBHOOK_SECONDARY_KEY
   in the env file, then restart ai-qc.service.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 22:53:03 +02:00
nickviljoen
8f995d557b feat(box-jwt): JWT service-account client + webhook ingestion endpoint
Adds machine-to-machine Box integration alongside the existing per-user
OAuth scaffolding. The new JWT client (backend/box_jwt_client.py) is
the auth/file/webhook surface used for unattended workflows: load the
Custom App JSON config, sign a JWT assertion, exchange for a 60-minute
service-account access token (cached + refreshed automatically), and
expose file download/upload + V2 webhook CRUD + HMAC signature
verification.

Wires a new POST /api/box/webhook endpoint (NOT @auth.require_auth — it
authenticates each delivery via Box's HMAC signature headers) that:

1. Verifies the signature against env-configured signing keys
   (BOX_WEBHOOK_PRIMARY_KEY / BOX_WEBHOOK_SECONDARY_KEY).
2. Dedups deliveries by box-delivery-id with a bounded in-memory cache.
3. Maps the source folder to a client via a new
   get_client_by_box_folder() helper on client_config.
4. Spawns a background thread that downloads the file, runs the same
   technical pre-flight + LLM check pipeline as the user-uploaded path,
   writes the HTML report to output/<client>/, uploads the report back
   to the client's box_reports_folder_id, and logs the run with a
   synthetic 'box_webhook' user.

Webhook runs skip media-plan / localization / OCR context — those are
user-UI concepts without a meaningful source in unattended runs. The
existing /api/start_analysis path is unchanged.

client_config.py gains three optional per-client fields used by the new
flow when present: `box_folder_id`, `box_reports_folder_id`, and
`default_profile`. Existing client entries keep working without them.

.gitignore now excludes backend/config/box_jwt_config.json so the JWT
config (with its embedded private key + passphrase) never lands in git.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 22:51:34 +02:00
nickviljoen
377efe30e5 feat(tech-check): show Technical Details section in HTML report
Adds a new "Technical Details" card to generate_comprehensive_html_report()
between the summary and the per-check detailed results. Renders only
the fields present on the technical_report dict (file size, dimensions,
DPI, page count, duration, fonts, etc. — vary by file type) and shows
a prominent filename-vs-actual match badge when filename hints were
parsed.

If technical_report is absent or kind==unknown, the section is omitted
entirely so reports for assets we can't inspect (e.g. exotic
extensions) keep the existing layout unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 22:00:25 +02:00
nickviljoen
2b287f3dbb feat(tech-check): wire pre-flight into visual + document analysis
Runs technical_check.inspect() immediately after file save on both
/api/start_analysis (visual flow) and /api/document/start_analysis
(document flow). The report is stashed on progress_tracker[session_id]
so it survives across the background thread boundary, then surfaces
two ways:

1. Each LLM check in the visual flow gets a "Technical metadata"
   preamble prepended to its prompt via format_for_llm_prompt(), so the
   model knows the file's actual dimensions, format, page count, etc.
   without having to infer them visually.
2. result_data['technical_report'] in both flows carries the same dict
   through to the frontend for UI rendering (next commit).

Pre-flight is best-effort: if it fails for any reason, analysis still
proceeds without the preamble (silent except for the report.errors
list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 21:57:11 +02:00
nickviljoen
f4a95914b5 feat(tech-check): add machine-side pre-flight inspection module
New backend/technical_check.py extracts technical metadata from
uploaded assets via PIL (images), PyMuPDF (PDFs), and ffprobe (videos)
— no LLM, runs in milliseconds. Also opportunistically parses
dimension hints from the filename and compares them to the actual
file, returning a match/mismatch verdict.

Output is a JSON-serializable dict; format_for_llm_prompt() renders it
as a tight Markdown block that downstream prompts can prepend. Module
never raises — inspection errors land in `errors` so partial reports
still surface.

Standalone for this commit. Wiring into the upload flow and UI lands
in subsequent commits on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 21:53:06 +02:00
Nick Viljoen
5d1eab493c Merged in feature/add-demo-clients (pull request #4)
Feature/add demo clients
2026-05-14 19:34:38 +00:00
nickviljoen
93dc030e0c feat(clients): add Google, HP, Ferrero as demo placeholders
Three new clients in demo/eval phase. Each uses Honda-style minimal
setup (static_general + video_general only) until real scope and test
assets arrive. Descriptions are placeholders to be replaced once scope
is confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 21:30:18 +02:00
nickviljoen
d1826d83f1 chore(dow-jones): remove client_config entry
Drops the 'dow_jones' block from CLIENT_PROFILES. After this, the
client picker no longer renders Dow Jones; the four archived profiles
are unreachable from user flows. Nine clients remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 21:12:47 +02:00
nickviljoen
b23b7f2e17 chore(dow-jones): archive profiles, checks, and per-client doc
Moves the Dow Jones / MarketWatch / WSJ profile JSONs (4), check apps
(22), and CLAUDE_DOW_JONES.md into backend/_archive/dow_jones/. All
moves use git mv so history follows. Adds a restore-instructions
README. No loader changes needed — the archive lives outside the
scanned directories.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 21:11:54 +02:00
nickviljoen
a1cfc75309 Merge remote-tracking branch 'origin/develop' into feature/axa-accessibility-profile-split
# Conflicts:
#	CLAUDE_AXA.md
2026-05-10 11:20:09 +02:00
nickviljoen
a46ba9fc71 Split AXA accessibility check into its own profile
Removed axa_pdf_accessibility from axa_policy_document (was 8 checks, now 7)
and created a new axa_accessibility profile that contains only that check.
Marked the new profile strict_grade: true so a single PDF/UA-1 rule failure
forces an unmistakable Fail badge on the report — mirrors how axes4 PAC is
used in practice (single-purpose, binary verdict).

Lets users run accessibility-only QC without sitting through the rest of
the policy-document checks, and removes weight from the policy-document
score that the accessibility check wasn't really earning (its 0/10 verdict
was dragging the overall grade in a way that obscured the content checks).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 11:15:46 +02:00
nickviljoen
2aeff24136 Wire veraPDF into axa_pdf_accessibility for PAC-equivalent PDF/UA-1 validation
AXA's accessibility QC team uses axes4 PAC (PDF/UA-1 / Matterhorn Protocol)
as their compliance gate, but our existing 9-criterion deterministic check
runs surface-level only and would pass documents PAC fails. Wired up the
existing _run_verapdf() stub so veraPDF — the open-source Matterhorn
implementation — runs as a subprocess and drives the score when available.

Verified locally: veraPDF on EAA_v1.pdf reports the exact same Content (86)
and Metadata (1) failure counts as PAC's report on the same document family,
confirming protocol parity.

Falls back cleanly to the deterministic layer when veraPDF isn't installed,
so deploys are safe before the binary lands on dev/prod servers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 10:36:03 +02:00
nickviljoen
59a0b2408c Restructure CLAUDE.md docs: slim project-wide root, complete per-client coverage
Splits the monolithic CLAUDE.md (962 lines) into a slim project-wide root (211 lines)
plus per-client files. Auto-loaded context drops ~88% per session.

Changes:
- CLAUDE.md slimmed to project-wide essentials (architecture, auth, deployment, branch
  strategy, deploy scripts, prod troubleshooting, pre-session checklist). Adds explicit
  session-start convention pointing to CLAUDE_<CLIENT>.md for client-specific work.
  Updates client roster table to all 10 clients with profile counts.
- New CLAUDE_AXA.md: document-mode pipeline + axa_policy_document profiles
- New CLAUDE_DIAGEO.md: key_visual + packaging profiles, check inventories
- New CLAUDE_UNILEVER.md: profiles + zero-score logic for face/new visibility
- New CLAUDE_HONDA.md, CLAUDE_RANK.md, CLAUDE_GENERAL.md: stubs (clients use generic
  profiles only — kept for completeness and future expansion)
- backend/CLAUDE.md: stale 932-line duplicate replaced with 18-line redirect to root
  + backend-specific quick pointers

Per-client files (CLAUDE_LOREAL.md, CLAUDE_AMAZON.md, CLAUDE_BOOTS.md,
CLAUDE_DOW_JONES.md) unchanged — already had the right content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:29:16 +02:00
nickviljoen
f5aaf8da24 Merge feature/dow-jones-tuning into develop: WSJ Static prompt tuning 2026-05-06 12:03:56 +02:00
nickviljoen
3b76bf2c9c Tune WSJ Static prompts: cap whitelist, graphic headline, split-layout logo, 30% sizing cap
- wsj_capitalization_punctuation: explicit complete-sentence whitelist + soft-flag pattern for Rule 5 price formatting (price_spacing_correct / price_bolded_correct accept needs_manual_check, new price_formatting_caveat field)
- wsj_typography_hierarchy: graphic/illustrative headline awareness — large stylised serif price/number graphics are recognised as the display headline; surrounding sans-serif copy is correctly classified as subhead/body. Stylised price headlines exempt from the period rule.
- wsj_logo_compliance: horizontal logo placement allows anchoring to the copy block on split/asymmetric layouts; mandatory sizing assessment block with worked examples, score capped at 6/10 for logos exceeding 30% of longest side.

Validated on 3 WSJ-NY test assets across 3 iterations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:01:59 +02:00
nickviljoen
cec11f1f6a Tune Boots PPack prompts: superscript guard, ALL CAPS / logotype exceptions, weight/sizing limits
Three rounds of prompt tuning against the Remington (4p), Easter Overlay
(18p), and Grenade (7p) sample packs. Easter Overlay (the noisiest)
climbed 72.38 → 78.97 → 80.04 across iterations, with strict-grade
violations dropping 27 → 18 → 14. Remaining violations are now genuine
compliance issues — the noise patterns are cleared.

boots_caveat_compliance:
- Superscript guard: vision LLM was flagging every roundel asterisk as
  superscript because the * glyph naturally sits high in its line.
  Strict two-feature rule now required (raised baseline AND visibly
  shrunk ~50-60% of body). Borderline cases → "needs_manual_check"
  with new superscript_caveat field. Caveat avg 4.4 → 7.27.
- Same vision-LLM caveat applied to weight_matching (Light vs Regular
  at small sizes is below detection threshold) and sizing_compliant
  (1-2pt size differences below detection threshold). New weight_caveat
  and sizing_caveat fields. Reserved 1-2 score band for unambiguous
  critical violations only.
- Explicit scoring principle: "when in doubt, prefer 7-8 with
  manual_check flags over a lower confident-violation score".

boots_brand_name_accuracy:
- ALL CAPS retail convention now explicitly acceptable. L'OREAL,
  ESTEE LAUDER, MAYBELLINE etc. no longer flagged as casing errors —
  only structural element mismatches (accents, hyphens, apostrophes,
  special chars) count.
- Stylised brand logotype exception: known logomarks like `17` for
  SEVENTEEN, &SISTERS ampersand styling, e.l.f. dot rendering are
  Pass — surfaced via new logotype_observations field.
- Brand name avg 5.53 → 7.47 → 6.67 (LLM run-to-run variability).

Strongest real catch in dataset: Easter Overlay page 14 is labelled
for the ROI market in production notes but uses £ instead of € on
the artwork. Exactly the pre-press error worth surfacing. Caught
consistently across all runs by boots_currency_locale.

CLAUDE_BOOTS.md updated with three-pack smoke-test table, vision-LLM
limitations summary, and the four reusable prompt-tuning patterns
that worked on this build.

Local-only — feature/boots-ppack remains unmerged until after Boots
show-and-tell.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 16:26:11 +02:00
nickviljoen
50d0063b37 Add Boots Production Pack profile (multi-page document mode)
New profile boots_ppack for QCing multi-page Boots production packs
(PowerPoint-exported PDFs, 4-18 pages each). Built on top of AXA's
document-mode infrastructure — branched off feature/axa-document-mode
because it reuses the dispatcher, ingest, and result writer.

New checks:
- boots_logo_compliance — three-path scoring (master wordmark / partner
  lock-up / no branding) so OLIVER x BOOTS-style footer lock-ups aren't
  scored against master wordmark rules. Conservative without a formal
  Boots logo guideline.
- boots_colour_palette — verifies CMYK/RGB/Hex spec values on creative-
  guidance pages against canonical Boots Blue / Health Primary Blue /
  Offer Red, plus visual sanity-check on artwork pages.

Existing checks tuned:
- boots_brand_name_accuracy: closed-world list semantics. Brands not on
  the approved list now go to names_not_on_list (manual review) instead
  of failing — the list is sourced from the original 7 docs and is known
  incomplete (Remington, Imodium, Maybelline etc. are legitimate Boots-
  stocked brands not on it).
- boots_tandc_wording: explicit font-weight caveat — Boots Sharp Regular
  vs Light isn't reliably distinguishable by vision LLM at small sizes.
  Surfaced via font_weight_caveat field + needs_manual_check value.

Page classifier (document_mode/page_classifier.py):
Heuristic tags each page as cover / checklist / palette / notes /
artwork. Validated on all 10 sample packs.

Strict-grade exemption (Profile.strict_grade flag):
Only artwork-classified pages count towards Pass/Fail. Cover, checklist,
palette, and notes pages are still QC'd and reported as Informational
but cannot trigger a Fail. Banner shows exactly which artwork-page
checks fell below 6.

Result writer extended:
- Per-page table with score + page_type pill for any page_each-scope
  check (auto-applied as fallback)
- Strict-grade banner (red on violation, green when clean)
- Page_type pills throughout the per-page strip

Smoke-test result (Remington 4-page pack, 2026-05-05):
Overall 70.75/100, strict-grade Fail. After two iterations of prompt
tuning, all three remaining strict-grade violations are real catches:
orphan asterisk in T&Cs, "they may not be stocked" wording deviation,
missing "Charges may apply". brand_name_accuracy 7.0 (was 3.0 before
list fix), logo_compliance 9.5 (was 1.5 before lock-up path fix).

Local-only — not pushed to dev or merged to develop until after Boots
show-and-tell. Same posture as feature/axa-document-mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:47:13 +02:00
nickviljoen
90563b8cf2 Add AXA document-mode QC pipeline (Phases 1, 3, 4, 5)
Multi-page PDF QC for AXA Ireland policy documents. Runs as a third mode
alongside static + video, gated on profile.mode. New code isolated under
backend/document_mode/ with new endpoints under /api/document/*.

Phase 1 — Spine + 6 deterministic doc-scope checks ($0, runs in seconds):
- Scope-aware dispatcher (document/targeted/page_sample/page_pair/page_each)
- axa_font_inventory, axa_phone_inventory, axa_bold_words_definitions,
  axa_page_numbering, axa_print_code, axa_omg_versioning
- Bootstrap bold-words dictionary extracted from Example 1 General Definitions

Phase 3 — Old-vs-new diff (~$0.50/run, 3-5 min):
- Page alignment via difflib SequenceMatcher (windowed fuzzy match)
- Vision-LLM page-pair diff via Gemini 2.5 Pro (8 concurrent)
- Two-slot upload UX, axa_policy_document_diff profile, mode=document_diff

Phase 4 — PDF accessibility (PyMuPDF, $0):
- 9 PDF/UA-1 aligned criteria (tagged structure, /MarkInfo, title, /Lang,
  encryption, font embedding, PDF version, XMP UA-conformance, alt-text)
- _run_verapdf() stub for optional Java-based veraPDF integration later

Phase 5 — Print preflight (PyMuPDF, $0):
- 7 criteria (page geometry, bleed, image colour spaces, image DPI,
  transparency, PDF/X conformance, spot colours)

Profile additions:
- axa_policy_document — 8 deterministic checks, $0 cost
- axa_policy_document_diff — 1 page-pair LLM check, ~$0.50/run

API additions:
- POST /api/document/start_analysis (single PDF)
- POST /api/document/start_diff (old + new PDFs)

Frontend additions:
- Third profile.mode value (document_diff) in applyProfileMode()
- Two-slot upload UX with PDF-only file pickers
- checkFormValidity() branches by mode for the analyse-button gate

Smoke-tested locally against Example 1 (Home Insurance V8, 86pp) and
Example 2 (Landlord V1 vs V10, 68→74pp) with real findings caught
including bold-words gaps, missing PDF/UA flag, transparency on press,
V1→V10 bold-formatting fixes. Plan + integration map + gotchas in
backend/AXA_DOCUMENT_MODE_PLAN.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 18:38:14 +02:00
nickviljoen
67ed7fdd9d Add wsj podcast profile to Dow Jones client, File naming check added to all profiles 2026-04-29 18:17:36 +02:00
nickviljoen
b32e8f0c8b Add wsj podcast profile to Dow Jones client, File naming check added to all profiles 2026-04-29 18:09:58 +02:00
nickviljoen
24c716df77 Fix /api/access_request iterating list_access_entries() as a list
list_access_entries() returns a dict {default_clients, entries} but the
endpoint iterated it directly, which yields the dict keys (strings) and
then crashed on .get('is_admin') with "'str' object has no attribute
'get'". Read access_data['entries'] instead so admin recipients are
collected correctly and the request email actually sends.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 08:24:22 +02:00