Commit graph

212 commits

Author SHA1 Message Date
Nick Viljoen
bc6b4e8482 Merged in feature/axa-formatting-diff (pull request #24)
Feature/axa formatting diff
2026-05-19 10:45:59 +00:00
nickviljoen
29ee941037 refactor(formatting_diff): narrow scope to bold + italic only
First real-data test against the AXA car-insurance PDFs surfaced a
noise problem: the new document is a brand refresh — every page flips
font (PublicoBanner-Bold→PublicoHeadline-Bold) and colour
(#893f4a→#2e3092). At medium-per-finding that crashed the diff score
to 0.0 and drowned the bold-regression signal AXA actually flagged.

Drop font, size, colour comparators. Keep bold + italic — the
attributes the vision-LLM consistently misses on dense layouts. The
LLM already narrates colour-scheme rebrands and font swaps in its
Modified / Style-changes blocks; running both layers on the same
visual change just double-counts it.

Tests inverted from "X change is flagged" to "X change is NOT
flagged" to lock the scope decision in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:37:19 +02:00
nickviljoen
d327776c70 fix(diff_engine): guard compute_formatting_diff against per-pair failure
If the deterministic formatting comparator raises on any single page-pair
(e.g. unexpected span shape from a future PyMuPDF version), degrade to
zero formatting findings for that pair instead of aborting the whole
52-page diff run. Logged for visibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:31:16 +02:00
nickviljoen
640bbe4671 docs(axa): note deterministic formatting layer added to axa_pdf_diff
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:23:15 +02:00
nickviljoen
0fd6a35562 fix(diff_report): _fmt_value labels italic flips correctly
Previously every boolean attribute rendered as "Bold → Regular",
producing "Italic: Bold → Regular" for italic flips. Now the helper
takes the attribute name and emits "Italic → Regular" or
"Bold → Regular" depending on which boolean attribute is being shown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:22:39 +02:00
nickviljoen
7eaac85df3 feat(diff_report): render formatting_changes as a per-pair block
Adds a "🎨 Formatting changes" block to the per-page diff report
when the deterministic formatting layer finds typographic flips.
Distinguishes page-wide style shifts from local span flips, lists up
to three example quotes per aggregated finding, and HTML-escapes all
user-controlled strings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:08:47 +02:00
nickviljoen
2b1bb9ccf0 feat(diff_engine): merge formatting_diff findings into pair_diffs
run_page_pair_diff now invokes compute_formatting_diff alongside the
LLM call for each aligned pair. When the deterministic layer finds
typographic flips on a page the LLM saw as identical, the pair is
re-classified as having differences with medium severity. Each
aggregated finding contributes to the global medium-severity tally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:03:54 +02:00
nickviljoen
d21a8a276d refactor(formatting_diff): harden page_wide threshold + None-key handling
Three review-driven hardening tweaks:
- page_wide now requires ≥3 matched spans (PAGE_WIDE_MIN_SPANS).
  Avoids labelling section-break pages with a single flipped heading
  as page-wide.
- _collect_flips normalises bold/italic via bool() and font/color
  via "or ''" so callers passing dicts without those keys do not
  produce phantom flips against False/''.
- Adds tests for empty span lists and the missing-bold-key case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:01:23 +02:00
nickviljoen
98679e7329 feat(document_mode): add deterministic span formatting diff
New formatting_diff module compares span-level bold/italic/font/size/
color attributes between aligned page-pairs. Pure-Python; reads
PyMuPDF metadata already captured during ingest. Aggregates identical
flips into single findings and flags page-wide style shifts.

Powers the AXA document_diff fix for missed formatting changes that
the vision-LLM does not reliably detect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 09:56:34 +02:00
nickviljoen
f69e181520 feat(ingest): capture span color as #rrggbb string
Adds a 'color' field to each span dict extracted by
_extract_page_spans. Powers the upcoming deterministic
formatting-diff layer for AXA document_diff mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 09:45:21 +02:00
nickviljoen
25bb472a53 docs: plan for AXA diff deterministic formatting layer
Six-task TDD plan implementing the spec at
docs/superpowers/specs/2026-05-19-axa-formatting-diff-design.md.
Local execution only this cycle — Nick tests locally before any
push to dev, and the prod bundle ships tonight alongside other
AXA changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 09:39:40 +02:00
nickviljoen
5e1263380e docs: spec for AXA diff deterministic formatting layer
New formatting_diff module compares PyMuPDF span-level (bold, italic,
font, size, color) attributes between aligned page-pairs to catch
formatting changes the vision-LLM misses. Addresses AXA client feedback
that page 18+ un-bolding of blue text was not surfaced in the diff
report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 09:32:00 +02:00
Nick Viljoen
77276f2a7c Merged in feature/hp-cycle-1-onboarding (pull request #22)
fix(hp_copy_review): correct llm casing + route HP reports to /hp/ folder
2026-05-17 20:10:50 +00:00
nickviljoen
71bb9a6295 fix(hp_copy_review): correct llm casing + route HP reports to /hp/ folder
Two bugs surfaced by the first dev smoke test:

1. Profile JSON declared "llm": "gemini" (lowercase). llm_config's
   dispatcher compares model_name == "Gemini" case-sensitively
   (matches the rest of the codebase), so the check fell through to
   "Invalid model selected" and never reached the API. Every other
   profile uses "Gemini" with capital G. Spec mistake — fixed.

2. get_client_from_profile() resolves the per-report output folder
   from the profile_id via hardcoded prefix matches. No 'hp_' branch
   existed, so hp_copy_review reports landed under output-dev/general/
   instead of output-dev/hp/ — the UI then couldn't find them. Added
   'hp_' → 'hp' alongside the existing mappings.

The check itself works correctly otherwise: profile_source was
user_selected, brand resolved to 'hp', and the reference asset was
successfully attached. Bug 1 just prevented Gemini from being called.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:07:25 +02:00
Nick Viljoen
03410cb271 Merged in feature/hp-cycle-1-onboarding (pull request #21)
Feature/hp cycle 1 onboarding
2026-05-17 19:43:10 +00:00
nickviljoen
68a2360811 feat(report): render hp_copy_review findings as a structured table
Both HTML report generators (generate_html_content and
generate_comprehensive_html_report) get a small case: when a check
result has a 'findings' array in its json_data, render it as a
priority-coloured table with quote/issue/suggested-fix/source
columns instead of the default response-text block. The summary
field (when present) renders above the table. Fallback to text
rendering when findings is absent — every existing check is
unaffected.

All string fields from the LLM are HTML-escaped via html.escape()
to neutralise stray <, >, &, or quote characters. Inline CSS for
.findings-table / .priority-pill / .priority-high|medium|low /
.muted is added to both stylesheets so the two generators stay
visually in sync.
2026-05-17 21:37:35 +02:00
nickviljoen
0e833447c0 fix(brand-guidelines): inject xlsx Source Messaging summary into check prompts
Task 5 review found that get_reference_asset_content treated all
non-localization-matrix .xlsx files as opaque ('reference file
uploaded'), never reading the Gemini summary that excel_processor
writes. That meant hp_copy_review would see no canonical messaging
and fire its score-0 fallback on every real asset.

Extend the .xlsx branch to mirror the PDF pattern: when the file
record has a summary_path (set by excel_processor after a
successful source-messaging summary), read and inject the Markdown
into the reference content block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:28:32 +02:00
nickviljoen
4c19a0fb9d feat(hp_copy_review): single-check LLM grader against Source Messaging
Single Gemini call per asset. Prompt assembles attached Source
Messaging summaries + media-plan language context + the asset image.
Returns structured JSON with score, summary, and a findings array
(priority, category, quote, issue, suggested fix, source reference).
Empty findings = clean asset; missing reference -> score 0 with a
clear message rather than running blind.

Mirrors the boots_tandc_wording pattern: subclass FlaskAppTemplate,
expose a static prompt template, let process_single_check inject
reference-asset content and media-plan context at runtime. A
standalone build_prompt() helper mirrors that assembly for unit-
style smoke tests and ad-hoc prompt inspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:25:30 +02:00
nickviljoen
014a9cb8ff feat(hp): promote HP client + add hp_copy_review profile
HP is no longer a placeholder. The client gets a new hp_copy_review
profile (single weighted check, client-specific visibility) as its
default, plus the generic static_general and video_general profiles
it already had visibility into.
2026-05-17 21:08:18 +02:00
nickviljoen
568465f9be fix(brand-guidelines): preserve localization-matrix parsing in xlsx dispatch
The prior Task 2 commit (295305e) over-replaced existing logic that
recognised certain .xlsx/.xls uploads as localization matrices and
set asset_type='localization_matrix'. That field is load-bearing in
two downstream sites (api_server.py:1628 and :1986) that build
localization context for QC checks; destroying it would silently
break any existing client using localization matrices.

Restore the original try-localization-matrix-first path; only fall
through to excel_processor (HP Source Messaging summary) when the
file isn't a parseable localization matrix. Also restore .xls
support and tag Source Messaging uploads as
asset_type='source_messaging' so downstream code can distinguish
them from localization matrices.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:03:56 +02:00
nickviljoen
295305ef2d feat(brand-guidelines): route .xlsx uploads to excel_processor
The /api/brand_guidelines POST handler now dispatches by extension:
.pdf → pdf_processor.process_pdf_file (existing), .xlsx →
excel_processor.process_excel_file (new). Same DB record shape;
cover image is null for Excel since there's no first-page analogue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:02:05 +02:00
nickviljoen
c51e0729ce fix(excel-processor): wrap extraction in try/except to honour 'never raises'
Code review found that _extract_workbook_text was unwrapped — a
corrupt/locked .xlsx or InvalidFileException would leak out of
process_excel_file despite the docstring promising 'Never raises'.
Wrap the extraction call too; on extraction failure, write a
degraded summary explaining the failure and return cleanly.

Verified by passing a non-existent file: the function returns a
degraded summary instead of raising FileNotFoundError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:55:54 +02:00
nickviljoen
abd36a9abe fix(excel-processor): use literal trademark glyphs in summary prompt
Spec requires "™, ®, ©" in the Approved Brand and Product Names
section instructions; first pass wrote "TM, R, C" out of unfounded
caution about encoding. Python 3 source handles UTF-8 fine and
pdf_processor.py uses smart punctuation throughout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:52:38 +02:00
nickviljoen
ed46504ac6 feat(excel-processor): add openpyxl + Gemini summary pipeline for HP Source Messaging
Mirrors pdf_processor.py — public process_excel_file() reads any HP
Source Messaging Excel, extracts cells via openpyxl (skipping empty
rows, capped at 50K chars), and summarises into structured Markdown
via Gemini 2.5 Pro. Output saved as brand_guidelines/files/{file_id}_summary.md.

On Gemini failure the processor writes a degraded summary containing
the raw extraction so the reference asset stays usable. Test fixtures
(real HP Excels) live under backend/tests/fixtures/hp/ and are gitignored.
2026-05-17 20:49:50 +02:00
nickviljoen
7d178f11ee docs(plan): HP onboarding cycle 1 implementation plan
7-task plan against 2026-05-17-hp-cycle-1-onboarding-design.md:
excel_processor → .xlsx dispatch → media-plan language field →
HP client+profile → hp_copy_review check → findings-table renderer
→ dev smoke + deploy. Lightweight verification posture (py_compile +
imports + profile load + python3 -c mini-tests + dev smoke runs)
to match the project's existing style — no pytest scaffolding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:32:56 +02:00
nickviljoen
53ba67c2c0 docs(spec): HP onboarding cycle 1 — hp_copy_review check
Captures the brainstorm outcome for migrating HP off the deprecated
hp-copy PHP/Make.com POC onto AI QC. Cycle 1 of 3 in HP onboarding
(cycles 2 = Word/PPT processor, 3 = Box picker — both independent
and shipped later). Locks the four design decisions reached during
the brainstorm:

- User selects the canonical Source Messaging reference asset at
  QC-run time (matches existing brand-guidelines UX)
- Single hp_copy_review check, single Gemini call per asset,
  structured findings JSON output matching the Messi Copy Review
  document format
- Excel processor mirrors pdf_processor.py: openpyxl extracts raw
  cell content, Gemini summarises into structured Markdown,
  saved as {file_id}_summary.md alongside the file
- Media-plan `language` field is free-form text, included in the
  check prompt when present, omitted gracefully when absent

No code yet — pick up with the writing-plans skill to draft the
implementation plan against this spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:17:43 +02:00
Nick Viljoen
fc588c626d Merged in feature/untrack-tracked-env-files (pull request #19)
chore(env): untrack legacy env files so deploys stop clobbering them
2026-05-17 17:01:32 +00:00
nickviljoen
1057c5660f chore(env): untrack legacy env files so deploys stop clobbering them
config.env, backend/config.env, config/development.env, and
config/production.env still contained real secrets and were getting
silently reverted by `git reset --hard` during deploys — manual
key-restore was required after both v1.3.0 and v1.3.1 to recover
the in-place GOOGLE_API_KEY rotation. Move them to .gitignore
alongside the already-untracked backend/config/*.env paths.

The next deploy after this lands will delete them from disk one
final time (because they were tracked in the prior commit). Same
backup/restore dance documented for the previous secrets-untrack
is needed for that single deploy; after it, the files are
permanently untracked.

This does NOT remove historical secrets from git history. Rotation
of OPENAI_API_KEY, BOX_CLIENT_SECRET, SECRET_KEY, SMTP_PASSWORD
remains a separate open follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 19:00:09 +02:00
Nick Viljoen
8e2d970d61 Merged in feature/boots-ppack-page-parallelism (pull request #17)
perf(document-mode): parallelize per-page check dispatch in stages 3c/3d
2026-05-17 16:15:52 +00:00
nickviljoen
1c5dd980d4 perf(document-mode): parallelize per-page check dispatch in stages 3c/3d
A 4-page Boots PPack run (7 page-scoped checks) was taking ~15 min
because the dispatcher processed pages sequentially within each
check — 28 Gemini calls in a single file. Asset-mode's
ThreadPoolExecutor parallelism was bypassed because doc-mode called
process_checks_in_batches once per page in a loop.

Wrap the per-page dispatch in both Stage 3c (page_sample) and Stage
3d (page_each) with a ThreadPoolExecutor (max_workers=4). Extract
the per-page work into a single nested helper used by both stages,
which also tags each result with page_type so the existing artwork
vs informational aggregation in Stage 3d keeps working. Aggregation
logic, scoring, strict-grade override, and report shape are all
unchanged.

process_checks_in_batches is already reentrant (asset-mode uses it
under its own internal ThreadPoolExecutor), so concurrent calls are
safe. Progress-tracker writes intentionally tolerate races (visual
only). Per-page exceptions are caught inside the helper so one bad
page doesn't kill the doc — it just records a score-0 result.

Expected: 15 min → ~3-4 min on the same 4-page PDF. Needs wall-time
confirmation on dev with a real run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:14:27 +02:00
nickviljoen
8e50413b53 docs(spec): Phase 5 cycle 1 — Postgres database design
Captures the brainstorm outcome for adding a Postgres database
alongside the existing JSONL usage logs, ahead of the dashboard
work. Decomposes Phase 5 into three independent cycles (DB first,
then Docker, then dashboard) and locks the schema, transition
strategy (dual-write), hosting (Docker on each VM), backup
approach (pg_dump → GCS), and rollback escape hatch.

No code changes yet — pick up with the writing-plans skill when
returning to Phase 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:02:03 +02:00
Nick Viljoen
86dec44124 Merged in fix/deploy-sigpipe-when-many-commits (pull request #16)
fix(deploy): use git's own -n limit instead of | head -20
2026-05-17 13:26:41 +00:00
nickviljoen
a3b3f45f01 fix(deploy): use git's own -n limit instead of | head -20
When the deploy batch has more than 20 commits, the `git log ... | head -20`
pipeline closes the pipe after 20 lines. git log gets SIGPIPE (exit 141),
which `set -o pipefail` propagates, and `set -e` then exits the script
silently — no prompt shown, no error message.

Only bites for release-sized batches (>20 commits). First seen on the
v1.3.0 prod deploy: 20 commits displayed, then the script returned to
the shell without prompting. dev deploys never hit this because they
typically only have 1-3 commits ahead.

Fix: tell git to limit its own output via `-n 20`. Same display, no
broken pipe. Also swap the count-by-wc-l for `git rev-list --count`
which is more idiomatic and avoids any further pipe shenanigans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 15:25:38 +02:00
Nick Viljoen
e25006039f Merged in docs/box-client-onboarding-runbook (pull request #14)
docs: add Box client onboarding runbook
2026-05-17 12:13:53 +00:00
nickviljoen
31b059de79 docs: add Box client onboarding runbook
Documents the end-to-end process for adding a new client to the
Box-webhook-driven QC pipeline:

1. Box admin: create INCOMING + REPORTS folders, invite service account
2. Code: add box_folder_id / box_reports_folder_id / default_profile
   to client_config.py, ship via PR
3. Verify service account access with `box_setup.py list-folder`
4. Register webhook via `box_setup.py register-all-clients` (or UI)
5. End-to-end test by uploading a sample asset, watching logs,
   confirming report appears + source moves to _PROCESSED
6. Optional: tune default_profile from the Settings UI without a code
   deploy
7. Promote to prod (develop→main PR, tag, deploy.sh prod)

Includes a gotchas table for the issues most likely to come up:
403s from missing collaborator invites, signature verification
failures, folder ID mismatches, replace-upload behavior, etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 14:12:48 +02:00
Nick Viljoen
432162f167 Merged in feature/default-profile-ui (pull request #13)
feat(settings): default-profile UI per client (admin-only) for Box webhook flow
2026-05-17 11:51:50 +00:00
nickviljoen
bf89466d06 feat(settings): default-profile UI per client (admin-only) for Box webhook flow
Adds a "Default Profile" sub-tab to the Settings modal. Lists the
current client's profiles as radio buttons, shows which is the active
default and whether it's a runtime override or the static value from
client_config.py. Admins click a different profile + Set to override;
clear-override button reverts to the static value.

Storage layer: backend/client_defaults.json (gitignored, per-server),
following the same pattern as user_access.json. Resolution order in
client_config.get_default_profile(): override → static
default_profile field → None. The Box webhook handler is the sole
consumer that needs profile selection without a logged-in user; it
now reads via get_default_profile() so overrides take effect.

Why a separate JSON, not rewriting client_config.py: a buggy override
write can never break server boot — worst case the override is
ignored and the static value applies. Cleaner separation between
"static config you check in" and "runtime overrides admins make".

Backend:
- client_config.get_default_profile(client_id) — resolver
- client_config.set_default_profile(client_id, profile_id) — validates
  + writes (rejects profiles not in client's profile list)
- client_config.clear_default_profile_override(client_id)
- GET /api/clients/<id>/default_profile (any auth'd user)
- PUT /api/clients/<id>/default_profile (admin-only, _require_admin)
- DELETE /api/clients/<id>/default_profile (admin-only)
- Box webhook handler in api_server.py now uses get_default_profile()

Frontend:
- New "Default Profile" tab button + tab content in Settings modal
- showTab hook loads settings when tab activates
- loadDefaultProfileSettings / saveDefaultProfile /
  clearDefaultProfileOverride functions
- DOM-construction (createElement + textContent) used throughout —
  no innerHTML with interpolated values, so user-controllable
  strings (client_id, profile_id) can never cause XSS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:50:20 +02:00
Nick Viljoen
be00f24416 Merged in feature/box-processed-folder-move (pull request #12)
feat(box-jwt): move source file to _PROCESSED after successful run
2026-05-17 11:31:01 +00:00
nickviljoen
b7e9c483de feat(box-jwt): move source file to _PROCESSED after successful run
Solves two problems at once:

1. Folder cleanliness — INCOMING accumulates indefinitely otherwise.
2. Duplicate-upload re-trigger — Box V2's FILE.UPLOADED trigger doesn't
   fire when the same filename is "uploaded as new version" of an
   existing file. By moving the source out of INCOMING after success,
   re-uploading the same filename becomes a genuinely-new file event
   again and the webhook fires normally.

After report uploads successfully to the REPORTS folder, the worker:
1. find_or_create_subfolder(<INCOMING>, '_PROCESSED') — idempotent
2. move_file(file_id, <_PROCESSED>, new_name=f'{session_id}_{filename}')

The session_id prefix gives the archived file a sortable timestamp and
ties it back to the matching QC_Report_<session_id>_*.html in REPORTS.

Defensive: the move only runs if the report upload to Box succeeded.
If Box delivery failed, the source stays in INCOMING so a retry just
means re-uploading. Move failures are non-fatal — logged + recorded
in result_data['box_source_move_error'], analysis still marked
complete.

Adds four helpers to box_jwt_client.py:
- find_subfolder_by_name(parent, name) → Optional[str]
- create_subfolder(parent, name) → str
- find_or_create_subfolder(parent, name) → str  (idempotent)
- move_file(file_id, target_folder, new_name=None) → Dict

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:29:45 +02:00
Nick Viljoen
4d08a23322 Merged in fix/comprehensive-report-status-filter (pull request #11)
fix(reports): render check details for status='success' in generate_comprehensive_html_report
2026-05-17 11:05:25 +00:00
nickviljoen
c75f3a99b9 fix(reports): render check details for status='success' in generate_comprehensive_html_report
generate_comprehensive_html_report filtered check rendering with
`status == 'completed'`, but the modern check pipeline
(process_single_check via /api/start_analysis and the Phase 4 Box
webhook flow) returns `status == 'success'`. Only the legacy
process_single_check_with_triage returns 'completed'.

Result: every report produced by the modern pipeline had an empty
"Detailed Analysis Results" section — just the heading with nothing
below it. Surfaced when Nick ran a LOREAL Box-webhook test on
2026-05-17: webhook fired correctly, 4 LLM checks ran, scores came
back, technical pre-flight rendered, but the per-check accordion was
empty.

Fix: accept either status value, so both modern and legacy code paths
render correctly. Errored checks (status='error') still skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:01:21 +02:00
Nick Viljoen
57ce396860 Merged in feature/loreal-box-folders (pull request #10)
feat(clients): wire LOREAL Box folders for webhook-driven QC
2026-05-15 07:51:39 +00:00
nickviljoen
4a9ddee87f feat(clients): wire LOREAL Box folders for webhook-driven QC
First client to use the Phase 4 unattended-QC pipeline. Adds three
optional fields to the loreal entry in client_config.py:

- box_folder_id=381501258415 (AI-QC > INCOMING > AI QC LOREAL IN)
- box_reports_folder_id=382076841334 (AI-QC > REPORTS > AI QC LOREAL REPORTS)
- default_profile=loreal_static

When a file lands in the INCOMING folder, /api/box/webhook will pick
it up, run loreal_static (strict-grade), and upload the HTML report
to the REPORTS folder. Other clients remain unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 09:50:40 +02:00
Nick Viljoen
1c8e1ea1a7 Merged in feature/box-jwt-integration (pull request #9)
Feature/box jwt integration
2026-05-14 21:42:43 +00:00
nickviljoen
a99c8601f0 Merge develop into feature/box-jwt-integration
Brings in the 4 commits that landed on develop after this branch was
cut: the chore/untrack-env-files PR (#7) and the
fix/tech-section-in-html-content PR (#8).

Conflict resolution:
- .gitignore: both branches added `backend/config/box_jwt_config.json`
  in slightly different positions. Kept both sets of additions —
  development.env + production.env (from develop) and
  box_jwt_config.json (from this branch).
- api_server.py: auto-merged cleanly; the Phase 4 webhook endpoint and
  the Phase 3 technical-section fix touch different regions of the file.

Verified after merge: api_server imports cleanly, box_webhook route
registered, _render_technical_section_html callable, 60 QC apps and
15 profiles load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 23:42:00 +02:00
Nick Viljoen
c99b8b7770 Merged in fix/tech-section-in-html-content (pull request #8)
fix(tech-check): also render Technical section in generate_html_content
2026-05-14 21:29:37 +00:00
nickviljoen
096eba747d fix(tech-check): also render Technical section in generate_html_content
Phase 3 patched generate_comprehensive_html_report() but missed the
older generate_html_content() generator. The /api/start_analysis flow
with output_mode='html' (the path the web UI's download button
actually triggers) routes through generate_html_content, so the
Technical Details section never appeared in user-downloaded reports
despite the technical_report data being present in the underlying
result_data.

Mirrors the Phase 3 treatment exactly: pre-builds technical_html via
_render_technical_section_html(), adds the .technical / .technical-grid
/ .tech-row CSS rules, and injects {technical_html} between the
summary block and the Detailed Analysis Results header.

generate_comprehensive_html_report() retains the same logic for the
/api/process_file path (line 4187) and the new Box webhook flow
(_run_box_triggered_analysis on the Phase 4 branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 23:28:52 +02:00
Nick Viljoen
33278e4f62 Merged in chore/untrack-env-files (pull request #7)
chore(secrets): untrack env files + add JWT path to .gitignore
2026-05-14 21:17:47 +00:00
nickviljoen
cfb13eb870 chore(secrets): untrack env files + add JWT path to .gitignore
backend/config/development.env and backend/config/production.env were
committed to the repo with real API keys, SMTP passwords, and Flask
SECRET_KEY values. This commit:

1. Adds both files to .gitignore so future edits stop landing in git.
2. git rm --cached's them (local copies preserved on disk, just
   untracked).
3. Also pre-emptively adds backend/config/box_jwt_config.json to
   .gitignore — Phase 4 already gitignores it on a separate branch, but
   listing it here protects the file regardless of merge order.
4. Updates backend/config/.env.template with the new Box JWT-related
   vars (BOX_JWT_CONFIG_PATH, BOX_WEBHOOK_PRIMARY_KEY,
   BOX_WEBHOOK_SECONDARY_KEY) so the template is a complete reference
   for setting up a new environment from scratch.

IMPORTANT — secrets still in git history after this commit. Removing
them from history requires a destructive rewrite (git filter-repo +
force-push every branch). Pragmatic alternative: rotate any secret
that was ever in the files. Candidates: OPENAI_API_KEY, BOX_CLIENT_SECRET,
SECRET_KEY, SMTP_PASSWORD. AZURE_TENANT_ID and AZURE_CLIENT_ID are
public-ish identifiers and don't need rotating. GOOGLE_API_KEY just
rotated this session.

DEPLOY GOTCHA: deploy.sh does git reset --hard, which will delete the
env files from /opt/ai_qc/backend/config/ on the server when this
commit lands. Back them up before deploying, restore after:

    sudo cp /opt/ai_qc/backend/config/development.env /tmp/dev.env.bak
    # ...deploy...
    sudo cp /tmp/dev.env.bak /opt/ai_qc/backend/config/development.env
    sudo systemctl restart ai-qc.service

Same dance on prod with production.env when promoting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 23:13:18 +02:00
nickviljoen
65848bcda1 feat(box-jwt): add box_setup.py bootstrap CLI for webhook management
One-off script used to register/inspect Box V2 webhooks against the
service account. Subcommands: list-webhooks, list-folder, list-clients,
create-webhook, delete-webhook, register-all-clients.

Typical bootstrap flow on a fresh deploy:
1. Drop box_jwt_config.json on the server (gitignored, scp'd in).
2. Verify the service account can read each client folder:
   `python backend/scripts/box_setup.py list-folder <folder_id>`
3. Once a client's box_folder_id is set in client_config.py, register
   its webhook idempotently:
   `python backend/scripts/box_setup.py register-all-clients \
       https://optical-dev.oliver.solutions/ai_qc/api/box/webhook`
4. Copy the signing keys from the Box Developer Console (Custom App →
   Webhooks) into BOX_WEBHOOK_PRIMARY_KEY / BOX_WEBHOOK_SECONDARY_KEY
   in the env file, then restart ai-qc.service.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 22:53:03 +02:00