ai_qc

Author	SHA1	Message	Date
nickviljoen	4c19a0fb9d	feat(hp_copy_review): single-check LLM grader against Source Messaging Single Gemini call per asset. Prompt assembles attached Source Messaging summaries + media-plan language context + the asset image. Returns structured JSON with score, summary, and a findings array (priority, category, quote, issue, suggested fix, source reference). Empty findings = clean asset; missing reference -> score 0 with a clear message rather than running blind. Mirrors the boots_tandc_wording pattern: subclass FlaskAppTemplate, expose a static prompt template, let process_single_check inject reference-asset content and media-plan context at runtime. A standalone build_prompt() helper mirrors that assembly for unit- style smoke tests and ad-hoc prompt inspection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 21:25:30 +02:00
nickviljoen	014a9cb8ff	feat(hp): promote HP client + add hp_copy_review profile HP is no longer a placeholder. The client gets a new hp_copy_review profile (single weighted check, client-specific visibility) as its default, plus the generic static_general and video_general profiles it already had visibility into.	2026-05-17 21:08:18 +02:00
nickviljoen	568465f9be	fix(brand-guidelines): preserve localization-matrix parsing in xlsx dispatch The prior Task 2 commit (`295305e`) over-replaced existing logic that recognised certain .xlsx/.xls uploads as localization matrices and set asset_type='localization_matrix'. That field is load-bearing in two downstream sites (api_server.py:1628 and :1986) that build localization context for QC checks; destroying it would silently break any existing client using localization matrices. Restore the original try-localization-matrix-first path; only fall through to excel_processor (HP Source Messaging summary) when the file isn't a parseable localization matrix. Also restore .xls support and tag Source Messaging uploads as asset_type='source_messaging' so downstream code can distinguish them from localization matrices. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 21:03:56 +02:00
nickviljoen	295305ef2d	feat(brand-guidelines): route .xlsx uploads to excel_processor The /api/brand_guidelines POST handler now dispatches by extension: .pdf → pdf_processor.process_pdf_file (existing), .xlsx → excel_processor.process_excel_file (new). Same DB record shape; cover image is null for Excel since there's no first-page analogue. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 21:02:05 +02:00
nickviljoen	c51e0729ce	fix(excel-processor): wrap extraction in try/except to honour 'never raises' Code review found that _extract_workbook_text was unwrapped — a corrupt/locked .xlsx or InvalidFileException would leak out of process_excel_file despite the docstring promising 'Never raises'. Wrap the extraction call too; on extraction failure, write a degraded summary explaining the failure and return cleanly. Verified by passing a non-existent file: the function returns a degraded summary instead of raising FileNotFoundError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 20:55:54 +02:00
nickviljoen	abd36a9abe	fix(excel-processor): use literal trademark glyphs in summary prompt Spec requires "™, ®, ©" in the Approved Brand and Product Names section instructions; first pass wrote "TM, R, C" out of unfounded caution about encoding. Python 3 source handles UTF-8 fine and pdf_processor.py uses smart punctuation throughout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 20:52:38 +02:00
nickviljoen	ed46504ac6	feat(excel-processor): add openpyxl + Gemini summary pipeline for HP Source Messaging Mirrors pdf_processor.py — public process_excel_file() reads any HP Source Messaging Excel, extracts cells via openpyxl (skipping empty rows, capped at 50K chars), and summarises into structured Markdown via Gemini 2.5 Pro. Output saved as brand_guidelines/files/{file_id}_summary.md. On Gemini failure the processor writes a degraded summary containing the raw extraction so the reference asset stays usable. Test fixtures (real HP Excels) live under backend/tests/fixtures/hp/ and are gitignored.	2026-05-17 20:49:50 +02:00
nickviljoen	7d178f11ee	docs(plan): HP onboarding cycle 1 implementation plan 7-task plan against 2026-05-17-hp-cycle-1-onboarding-design.md: excel_processor → .xlsx dispatch → media-plan language field → HP client+profile → hp_copy_review check → findings-table renderer → dev smoke + deploy. Lightweight verification posture (py_compile + imports + profile load + python3 -c mini-tests + dev smoke runs) to match the project's existing style — no pytest scaffolding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 20:32:56 +02:00
nickviljoen	53ba67c2c0	docs(spec): HP onboarding cycle 1 — hp_copy_review check Captures the brainstorm outcome for migrating HP off the deprecated hp-copy PHP/Make.com POC onto AI QC. Cycle 1 of 3 in HP onboarding (cycles 2 = Word/PPT processor, 3 = Box picker — both independent and shipped later). Locks the four design decisions reached during the brainstorm: - User selects the canonical Source Messaging reference asset at QC-run time (matches existing brand-guidelines UX) - Single hp_copy_review check, single Gemini call per asset, structured findings JSON output matching the Messi Copy Review document format - Excel processor mirrors pdf_processor.py: openpyxl extracts raw cell content, Gemini summarises into structured Markdown, saved as {file_id}_summary.md alongside the file - Media-plan `language` field is free-form text, included in the check prompt when present, omitted gracefully when absent No code yet — pick up with the writing-plans skill to draft the implementation plan against this spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 20:17:43 +02:00
Nick Viljoen	fc588c626d	Merged in feature/untrack-tracked-env-files (pull request #19) chore(env): untrack legacy env files so deploys stop clobbering them	2026-05-17 17:01:32 +00:00
nickviljoen	1057c5660f	chore(env): untrack legacy env files so deploys stop clobbering them config.env, backend/config.env, config/development.env, and config/production.env still contained real secrets and were getting silently reverted by `git reset --hard` during deploys — manual key-restore was required after both v1.3.0 and v1.3.1 to recover the in-place GOOGLE_API_KEY rotation. Move them to .gitignore alongside the already-untracked backend/config/*.env paths. The next deploy after this lands will delete them from disk one final time (because they were tracked in the prior commit). Same backup/restore dance documented for the previous secrets-untrack is needed for that single deploy; after it, the files are permanently untracked. This does NOT remove historical secrets from git history. Rotation of OPENAI_API_KEY, BOX_CLIENT_SECRET, SECRET_KEY, SMTP_PASSWORD remains a separate open follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 19:00:09 +02:00
Nick Viljoen	8e2d970d61	Merged in feature/boots-ppack-page-parallelism (pull request #17) perf(document-mode): parallelize per-page check dispatch in stages 3c/3d	2026-05-17 16:15:52 +00:00
nickviljoen	1c5dd980d4	perf(document-mode): parallelize per-page check dispatch in stages 3c/3d A 4-page Boots PPack run (7 page-scoped checks) was taking ~15 min because the dispatcher processed pages sequentially within each check — 28 Gemini calls in a single file. Asset-mode's ThreadPoolExecutor parallelism was bypassed because doc-mode called process_checks_in_batches once per page in a loop. Wrap the per-page dispatch in both Stage 3c (page_sample) and Stage 3d (page_each) with a ThreadPoolExecutor (max_workers=4). Extract the per-page work into a single nested helper used by both stages, which also tags each result with page_type so the existing artwork vs informational aggregation in Stage 3d keeps working. Aggregation logic, scoring, strict-grade override, and report shape are all unchanged. process_checks_in_batches is already reentrant (asset-mode uses it under its own internal ThreadPoolExecutor), so concurrent calls are safe. Progress-tracker writes intentionally tolerate races (visual only). Per-page exceptions are caught inside the helper so one bad page doesn't kill the doc — it just records a score-0 result. Expected: 15 min → ~3-4 min on the same 4-page PDF. Needs wall-time confirmation on dev with a real run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 18:14:27 +02:00
nickviljoen	8e50413b53	docs(spec): Phase 5 cycle 1 — Postgres database design Captures the brainstorm outcome for adding a Postgres database alongside the existing JSONL usage logs, ahead of the dashboard work. Decomposes Phase 5 into three independent cycles (DB first, then Docker, then dashboard) and locks the schema, transition strategy (dual-write), hosting (Docker on each VM), backup approach (pg_dump → GCS), and rollback escape hatch. No code changes yet — pick up with the writing-plans skill when returning to Phase 5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 17:02:03 +02:00
Nick Viljoen	86dec44124	Merged in fix/deploy-sigpipe-when-many-commits (pull request #16) fix(deploy): use git's own -n limit instead of | head -20	2026-05-17 13:26:41 +00:00
nickviljoen	a3b3f45f01	fix(deploy): use git's own -n limit instead of | head -20 When the deploy batch has more than 20 commits, the `git log ... | head -20` pipeline closes the pipe after 20 lines. git log gets SIGPIPE (exit 141), which `set -o pipefail` propagates, and `set -e` then exits the script silently — no prompt shown, no error message. Only bites for release-sized batches (>20 commits). First seen on the v1.3.0 prod deploy: 20 commits displayed, then the script returned to the shell without prompting. dev deploys never hit this because they typically only have 1-3 commits ahead. Fix: tell git to limit its own output via `-n 20`. Same display, no broken pipe. Also swap the count-by-wc-l for `git rev-list --count` which is more idiomatic and avoids any further pipe shenanigans. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 15:25:38 +02:00
Nick Viljoen	e25006039f	Merged in docs/box-client-onboarding-runbook (pull request #14) docs: add Box client onboarding runbook	2026-05-17 12:13:53 +00:00
nickviljoen	31b059de79	docs: add Box client onboarding runbook Documents the end-to-end process for adding a new client to the Box-webhook-driven QC pipeline: 1. Box admin: create INCOMING + REPORTS folders, invite service account 2. Code: add box_folder_id / box_reports_folder_id / default_profile to client_config.py, ship via PR 3. Verify service account access with `box_setup.py list-folder` 4. Register webhook via `box_setup.py register-all-clients` (or UI) 5. End-to-end test by uploading a sample asset, watching logs, confirming report appears + source moves to _PROCESSED 6. Optional: tune default_profile from the Settings UI without a code deploy 7. Promote to prod (develop→main PR, tag, deploy.sh prod) Includes a gotchas table for the issues most likely to come up: 403s from missing collaborator invites, signature verification failures, folder ID mismatches, replace-upload behavior, etc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 14:12:48 +02:00
Nick Viljoen	432162f167	Merged in feature/default-profile-ui (pull request #13) feat(settings): default-profile UI per client (admin-only) for Box webhook flow	2026-05-17 11:51:50 +00:00
nickviljoen	bf89466d06	feat(settings): default-profile UI per client (admin-only) for Box webhook flow Adds a "Default Profile" sub-tab to the Settings modal. Lists the current client's profiles as radio buttons, shows which is the active default and whether it's a runtime override or the static value from client_config.py. Admins click a different profile + Set to override; clear-override button reverts to the static value. Storage layer: backend/client_defaults.json (gitignored, per-server), following the same pattern as user_access.json. Resolution order in client_config.get_default_profile(): override → static default_profile field → None. The Box webhook handler is the sole consumer that needs profile selection without a logged-in user; it now reads via get_default_profile() so overrides take effect. Why a separate JSON, not rewriting client_config.py: a buggy override write can never break server boot — worst case the override is ignored and the static value applies. Cleaner separation between "static config you check in" and "runtime overrides admins make". Backend: - client_config.get_default_profile(client_id) — resolver - client_config.set_default_profile(client_id, profile_id) — validates + writes (rejects profiles not in client's profile list) - client_config.clear_default_profile_override(client_id) - GET /api/clients/<id>/default_profile (any auth'd user) - PUT /api/clients/<id>/default_profile (admin-only, _require_admin) - DELETE /api/clients/<id>/default_profile (admin-only) - Box webhook handler in api_server.py now uses get_default_profile() Frontend: - New "Default Profile" tab button + tab content in Settings modal - showTab hook loads settings when tab activates - loadDefaultProfileSettings / saveDefaultProfile / clearDefaultProfileOverride functions - DOM-construction (createElement + textContent) used throughout — no innerHTML with interpolated values, so user-controllable strings (client_id, profile_id) can never cause XSS Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 13:50:20 +02:00
Nick Viljoen	be00f24416	Merged in feature/box-processed-folder-move (pull request #12) feat(box-jwt): move source file to _PROCESSED after successful run	2026-05-17 11:31:01 +00:00
nickviljoen	b7e9c483de	feat(box-jwt): move source file to _PROCESSED after successful run Solves two problems at once: 1. Folder cleanliness — INCOMING accumulates indefinitely otherwise. 2. Duplicate-upload re-trigger — Box V2's FILE.UPLOADED trigger doesn't fire when the same filename is "uploaded as new version" of an existing file. By moving the source out of INCOMING after success, re-uploading the same filename becomes a genuinely-new file event again and the webhook fires normally. After report uploads successfully to the REPORTS folder, the worker: 1. find_or_create_subfolder(<INCOMING>, '_PROCESSED') — idempotent 2. move_file(file_id, <_PROCESSED>, new_name=f'{session_id}_{filename}') The session_id prefix gives the archived file a sortable timestamp and ties it back to the matching QC_Report_<session_id>_*.html in REPORTS. Defensive: the move only runs if the report upload to Box succeeded. If Box delivery failed, the source stays in INCOMING so a retry just means re-uploading. Move failures are non-fatal — logged + recorded in result_data['box_source_move_error'], analysis still marked complete. Adds four helpers to box_jwt_client.py: - find_subfolder_by_name(parent, name) → Optional[str] - create_subfolder(parent, name) → str - find_or_create_subfolder(parent, name) → str (idempotent) - move_file(file_id, target_folder, new_name=None) → Dict Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 13:29:45 +02:00
Nick Viljoen	4d08a23322	Merged in fix/comprehensive-report-status-filter (pull request #11) fix(reports): render check details for status='success' in generate_comprehensive_html_report	2026-05-17 11:05:25 +00:00
nickviljoen	c75f3a99b9	fix(reports): render check details for status='success' in generate_comprehensive_html_report generate_comprehensive_html_report filtered check rendering with `status == 'completed'`, but the modern check pipeline (process_single_check via /api/start_analysis and the Phase 4 Box webhook flow) returns `status == 'success'`. Only the legacy process_single_check_with_triage returns 'completed'. Result: every report produced by the modern pipeline had an empty "Detailed Analysis Results" section — just the heading with nothing below it. Surfaced when Nick ran a LOREAL Box-webhook test on 2026-05-17: webhook fired correctly, 4 LLM checks ran, scores came back, technical pre-flight rendered, but the per-check accordion was empty. Fix: accept either status value, so both modern and legacy code paths render correctly. Errored checks (status='error') still skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 13:01:21 +02:00
Nick Viljoen	57ce396860	Merged in feature/loreal-box-folders (pull request #10) feat(clients): wire LOREAL Box folders for webhook-driven QC	2026-05-15 07:51:39 +00:00
nickviljoen	4a9ddee87f	feat(clients): wire LOREAL Box folders for webhook-driven QC First client to use the Phase 4 unattended-QC pipeline. Adds three optional fields to the loreal entry in client_config.py: - box_folder_id=381501258415 (AI-QC > INCOMING > AI QC LOREAL IN) - box_reports_folder_id=382076841334 (AI-QC > REPORTS > AI QC LOREAL REPORTS) - default_profile=loreal_static When a file lands in the INCOMING folder, /api/box/webhook will pick it up, run loreal_static (strict-grade), and upload the HTML report to the REPORTS folder. Other clients remain unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 09:50:40 +02:00
Nick Viljoen	1c8e1ea1a7	Merged in feature/box-jwt-integration (pull request #9) Feature/box jwt integration	2026-05-14 21:42:43 +00:00
nickviljoen	a99c8601f0	Merge develop into feature/box-jwt-integration Brings in the 4 commits that landed on develop after this branch was cut: the chore/untrack-env-files PR (#7) and the fix/tech-section-in-html-content PR (#8). Conflict resolution: - .gitignore: both branches added `backend/config/box_jwt_config.json` in slightly different positions. Kept both sets of additions — development.env + production.env (from develop) and box_jwt_config.json (from this branch). - api_server.py: auto-merged cleanly; the Phase 4 webhook endpoint and the Phase 3 technical-section fix touch different regions of the file. Verified after merge: api_server imports cleanly, box_webhook route registered, _render_technical_section_html callable, 60 QC apps and 15 profiles load. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 23:42:00 +02:00
Nick Viljoen	c99b8b7770	Merged in fix/tech-section-in-html-content (pull request #8) fix(tech-check): also render Technical section in generate_html_content	2026-05-14 21:29:37 +00:00
nickviljoen	096eba747d	fix(tech-check): also render Technical section in generate_html_content Phase 3 patched generate_comprehensive_html_report() but missed the older generate_html_content() generator. The /api/start_analysis flow with output_mode='html' (the path the web UI's download button actually triggers) routes through generate_html_content, so the Technical Details section never appeared in user-downloaded reports despite the technical_report data being present in the underlying result_data. Mirrors the Phase 3 treatment exactly: pre-builds technical_html via _render_technical_section_html(), adds the .technical / .technical-grid / .tech-row CSS rules, and injects {technical_html} between the summary block and the Detailed Analysis Results header. generate_comprehensive_html_report() retains the same logic for the /api/process_file path (line 4187) and the new Box webhook flow (_run_box_triggered_analysis on the Phase 4 branch). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 23:28:52 +02:00
Nick Viljoen	33278e4f62	Merged in chore/untrack-env-files (pull request #7) chore(secrets): untrack env files + add JWT path to .gitignore	2026-05-14 21:17:47 +00:00
nickviljoen	cfb13eb870	chore(secrets): untrack env files + add JWT path to .gitignore backend/config/development.env and backend/config/production.env were committed to the repo with real API keys, SMTP passwords, and Flask SECRET_KEY values. This commit: 1. Adds both files to .gitignore so future edits stop landing in git. 2. git rm --cached's them (local copies preserved on disk, just untracked). 3. Also pre-emptively adds backend/config/box_jwt_config.json to .gitignore — Phase 4 already gitignores it on a separate branch, but listing it here protects the file regardless of merge order. 4. Updates backend/config/.env.template with the new Box JWT-related vars (BOX_JWT_CONFIG_PATH, BOX_WEBHOOK_PRIMARY_KEY, BOX_WEBHOOK_SECONDARY_KEY) so the template is a complete reference for setting up a new environment from scratch. IMPORTANT — secrets still in git history after this commit. Removing them from history requires a destructive rewrite (git filter-repo + force-push every branch). Pragmatic alternative: rotate any secret that was ever in the files. Candidates: OPENAI_API_KEY, BOX_CLIENT_SECRET, SECRET_KEY, SMTP_PASSWORD. AZURE_TENANT_ID and AZURE_CLIENT_ID are public-ish identifiers and don't need rotating. GOOGLE_API_KEY just rotated this session. DEPLOY GOTCHA: deploy.sh does git reset --hard, which will delete the env files from /opt/ai_qc/backend/config/ on the server when this commit lands. Back them up before deploying, restore after: sudo cp /opt/ai_qc/backend/config/development.env /tmp/dev.env.bak # ...deploy... sudo cp /tmp/dev.env.bak /opt/ai_qc/backend/config/development.env sudo systemctl restart ai-qc.service Same dance on prod with production.env when promoting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 23:13:18 +02:00
nickviljoen	65848bcda1	feat(box-jwt): add box_setup.py bootstrap CLI for webhook management One-off script used to register/inspect Box V2 webhooks against the service account. Subcommands: list-webhooks, list-folder, list-clients, create-webhook, delete-webhook, register-all-clients. Typical bootstrap flow on a fresh deploy: 1. Drop box_jwt_config.json on the server (gitignored, scp'd in). 2. Verify the service account can read each client folder: `python backend/scripts/box_setup.py list-folder <folder_id>` 3. Once a client's box_folder_id is set in client_config.py, register its webhook idempotently: `python backend/scripts/box_setup.py register-all-clients \ https://optical-dev.oliver.solutions/ai_qc/api/box/webhook` 4. Copy the signing keys from the Box Developer Console (Custom App → Webhooks) into BOX_WEBHOOK_PRIMARY_KEY / BOX_WEBHOOK_SECONDARY_KEY in the env file, then restart ai-qc.service. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 22:53:03 +02:00
nickviljoen	8f995d557b	feat(box-jwt): JWT service-account client + webhook ingestion endpoint Adds machine-to-machine Box integration alongside the existing per-user OAuth scaffolding. The new JWT client (backend/box_jwt_client.py) is the auth/file/webhook surface used for unattended workflows: load the Custom App JSON config, sign a JWT assertion, exchange for a 60-minute service-account access token (cached + refreshed automatically), and expose file download/upload + V2 webhook CRUD + HMAC signature verification. Wires a new POST /api/box/webhook endpoint (NOT @auth.require_auth — it authenticates each delivery via Box's HMAC signature headers) that: 1. Verifies the signature against env-configured signing keys (BOX_WEBHOOK_PRIMARY_KEY / BOX_WEBHOOK_SECONDARY_KEY). 2. Dedups deliveries by box-delivery-id with a bounded in-memory cache. 3. Maps the source folder to a client via a new get_client_by_box_folder() helper on client_config. 4. Spawns a background thread that downloads the file, runs the same technical pre-flight + LLM check pipeline as the user-uploaded path, writes the HTML report to output/<client>/, uploads the report back to the client's box_reports_folder_id, and logs the run with a synthetic 'box_webhook' user. Webhook runs skip media-plan / localization / OCR context — those are user-UI concepts without a meaningful source in unattended runs. The existing /api/start_analysis path is unchanged. client_config.py gains three optional per-client fields used by the new flow when present: `box_folder_id`, `box_reports_folder_id`, and `default_profile`. Existing client entries keep working without them. .gitignore now excludes backend/config/box_jwt_config.json so the JWT config (with its embedded private key + passphrase) never lands in git. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 22:51:34 +02:00
Nick Viljoen	95121f2fb9	Merged in feature/technical-preflight (pull request #6) Feature/technical preflight	2026-05-14 20:07:10 +00:00
nickviljoen	377efe30e5	feat(tech-check): show Technical Details section in HTML report Adds a new "Technical Details" card to generate_comprehensive_html_report() between the summary and the per-check detailed results. Renders only the fields present on the technical_report dict (file size, dimensions, DPI, page count, duration, fonts, etc. — vary by file type) and shows a prominent filename-vs-actual match badge when filename hints were parsed. If technical_report is absent or kind==unknown, the section is omitted entirely so reports for assets we can't inspect (e.g. exotic extensions) keep the existing layout unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 22:00:25 +02:00
nickviljoen	2b287f3dbb	feat(tech-check): wire pre-flight into visual + document analysis Runs technical_check.inspect() immediately after file save on both /api/start_analysis (visual flow) and /api/document/start_analysis (document flow). The report is stashed on progress_tracker[session_id] so it survives across the background thread boundary, then surfaces two ways: 1. Each LLM check in the visual flow gets a "Technical metadata" preamble prepended to its prompt via format_for_llm_prompt(), so the model knows the file's actual dimensions, format, page count, etc. without having to infer them visually. 2. result_data['technical_report'] in both flows carries the same dict through to the frontend for UI rendering (next commit). Pre-flight is best-effort: if it fails for any reason, analysis still proceeds without the preamble (silent except for the report.errors list). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:57:11 +02:00
nickviljoen	f4a95914b5	feat(tech-check): add machine-side pre-flight inspection module New backend/technical_check.py extracts technical metadata from uploaded assets via PIL (images), PyMuPDF (PDFs), and ffprobe (videos) — no LLM, runs in milliseconds. Also opportunistically parses dimension hints from the filename and compares them to the actual file, returning a match/mismatch verdict. Output is a JSON-serializable dict; format_for_llm_prompt() renders it as a tight Markdown block that downstream prompts can prepend. Module never raises — inspection errors land in `errors` so partial reports still surface. Standalone for this commit. Wiring into the upload flow and UI lands in subsequent commits on this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:53:06 +02:00
Nick Viljoen	94af442393	Merged in chore/claude-md-after-phases-1-2 (pull request #5) docs: update CLAUDE.md after Phases 1+2 (Dow Jones removed, demos added)	2026-05-14 19:40:44 +00:00
nickviljoen	bcd318a7b1	docs: update CLAUDE.md after Phases 1+2 (Dow Jones removed, demos added) Updates the intro count (9 → 12 clients), adds Google/HP/Ferrero to the client name list, and adds three table rows for the new demo clients (Doc column marked _scope pending_ until per-client docs land). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:39:40 +02:00
Nick Viljoen	5d1eab493c	Merged in feature/add-demo-clients (pull request #4) Feature/add demo clients	2026-05-14 19:34:38 +00:00
Nick Viljoen	02ae248e92	Merged in feature/remove-dow-jones (pull request #3) Feature/remove dow jones	2026-05-14 19:33:56 +00:00
nickviljoen	93dc030e0c	feat(clients): add Google, HP, Ferrero as demo placeholders Three new clients in demo/eval phase. Each uses Honda-style minimal setup (static_general + video_general only) until real scope and test assets arrive. Descriptions are placeholders to be replaced once scope is confirmed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:30:18 +02:00
nickviljoen	5860abf0f9	docs(dow-jones): update CLAUDE.md after offboarding Removes the Dow Jones row from the client/profile table and the four Dow Jones profile names from the pre-session profile-load checklist. Also updates the intro paragraph counts (9 clients, 15 profiles, 60+ checks) and drops Dow Jones from the client name list, so the intro no longer contradicts the table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:14:30 +02:00
nickviljoen	d1826d83f1	chore(dow-jones): remove client_config entry Drops the 'dow_jones' block from CLIENT_PROFILES. After this, the client picker no longer renders Dow Jones; the four archived profiles are unreachable from user flows. Nine clients remain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:12:47 +02:00
nickviljoen	b23b7f2e17	chore(dow-jones): archive profiles, checks, and per-client doc Moves the Dow Jones / MarketWatch / WSJ profile JSONs (4), check apps (22), and CLAUDE_DOW_JONES.md into backend/_archive/dow_jones/. All moves use git mv so history follows. Adds a restore-instructions README. No loader changes needed — the archive lives outside the scanned directories. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:11:54 +02:00
nickviljoen	69f6abca56	docs(dow-jones): add Phase 1 implementation plan Step-by-step plan that turns the spec into 5 tasks: archive moves (one commit), client_config edit (one commit), CLAUDE.md edits (one commit), full verification, then push + PR with explicit user-confirm gates. Defensive guards at each task halt execution if the codebase has drifted from the spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:09:14 +02:00
nickviljoen	8437b63871	docs(dow-jones): add Phase 1 spec for client offboarding Captures the design for removing Dow Jones from Visual AI QC: archive location (backend/_archive/dow_jones/), file moves, code edits, things explicitly not touched, and verification commands. Implementation follows in subsequent commits on this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:06:10 +02:00
nickviljoen	9746ba249b	docs: refresh CLAUDE_AXA.md status + add AI-usage breakdown Updates the AXA client doc to reflect the 2026-05-10 state: - Status line now reads 2026-05-10, covers Phase 6 (veraPDF), profile split, and dev deploy - New "AI usage across AXA tools" section for client-facing communication (8 of 9 tools deterministic, only axa_pdf_diff uses AI) - Open items expanded to include the pending source-PDF request and the prod-deployment hold Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 11:54:24 +02:00
nickviljoen	a80ff6dee4	Merged in feature/axa-accessibility-profile-split (pull request #2)	2026-05-10 11:21:34 +02:00
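
Commit bf89466d06 gives the default-profile resolution order as override → static default_profile field → None, with a bad override file never allowed to break anything. The sketch below is one way that resolver could look, not the code in client_config.py. STATIC_CLIENTS, load_overrides, and the overrides_path parameter are hypothetical names introduced for illustration; the commit only names get_default_profile(client_id) and backend/client_defaults.json.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical stand-in for the static, checked-in entries in client_config.py.
STATIC_CLIENTS = {
    "loreal": {
        "profiles": ["loreal_static", "static_general"],
        "default_profile": "loreal_static",
    },
    # No default_profile field: resolution should end at None.
    "honda": {"profiles": ["static_general", "video_general"]},
}


def load_overrides(path: Path) -> dict:
    """Read the runtime override file; any read or parse problem means 'no overrides'."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad encoding, or invalid JSON
        return {}
    return data if isinstance(data, dict) else {}


def get_default_profile(client_id: str, overrides_path: Path) -> Optional[str]:
    """Resolve override -> static default_profile -> None."""
    client = STATIC_CLIENTS.get(client_id)
    if client is None:
        return None
    override = load_overrides(overrides_path).get(client_id)
    # An override naming a profile the client does not have is ignored.
    if override in client["profiles"]:
        return override
    return client.get("default_profile")
```

Because the override file is only ever read defensively, a corrupt or missing file degrades to the static value, which matches the rationale the commit gives for keeping overrides in a separate JSON.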


195 commits
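
Commit 1c5dd980d4 describes wrapping per-page check dispatch in a ThreadPoolExecutor (max_workers=4), tagging each result with page_type, and catching per-page exceptions so one bad page records a score-0 result instead of failing the document. A minimal sketch of that pattern follows. run_pages_in_parallel, process_page, the run_checks callable, and the page dict shape are assumptions for illustration, not the project's actual helper or data model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_pages_in_parallel(
    pages: List[Dict],
    run_checks: Callable[[Dict], Dict],  # hypothetical per-page check runner
    max_workers: int = 4,
) -> List[Dict]:
    """Run run_checks on every page concurrently; results keep page order."""

    def process_page(page: Dict) -> Dict:
        try:
            result = dict(run_checks(page))
        except Exception as exc:
            # One failing page must not abort the whole document.
            result = {"score": 0, "error": str(exc)}
        # Tag the result so later aggregation can split by page type.
        result["page_number"] = page["number"]
        result["page_type"] = page["type"]
        return result

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(process_page, pages))
```

This only pays off when the per-page work is I/O-bound (for example, waiting on a remote model call) and the callable is safe to run concurrently, which is the reentrancy condition the commit message calls out.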