ai_qc/backend/BOX_CLIENT_ONBOARDING.md
nickviljoen 31b059de79 docs: add Box client onboarding runbook
Documents the end-to-end process for adding a new client to the
Box-webhook-driven QC pipeline:

1. Box admin: create INCOMING + REPORTS folders, invite service account
2. Code: add box_folder_id / box_reports_folder_id / default_profile
   to client_config.py, ship via PR
3. Verify service account access with `box_setup.py list-folder`
4. Register webhook via `box_setup.py register-all-clients` (or UI)
5. End-to-end test by uploading a sample asset, watching logs,
   confirming report appears + source moves to _PROCESSED
6. Optional: tune default_profile from the Settings UI without a code
   deploy
7. Promote to prod (develop→main PR, tag, deploy.sh prod)

Includes a gotchas table for the issues most likely to come up:
403s from missing collaborator invites, signature verification
failures, folder ID mismatches, replace-upload behavior, etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 14:12:48 +02:00

# Box Client Onboarding Runbook
Adds a new client to the Box-webhook-driven QC pipeline (Phase 4). Run through this once per client. Most steps take about 5 minutes; allow about 30 minutes in total, including Box admin turnaround for collaborator invites.
Architectural reference: the JWT auth and webhook endpoint live in `backend/box_jwt_client.py` and `backend/api_server.py` (search for `_run_box_triggered_analysis`). The admin CLI is `backend/scripts/box_setup.py`. The JWT auth coexists with an older per-user OAuth flow in `backend/box_client.py`; that flow is a separate, dormant code path and is not used by this pipeline.
---
## What you need before starting
- **Box admin access** (or someone who can act as one) — to create folders and invite the service account.
- **SSH access to the dev server** (`optical-production-dev`) — to run the bootstrap CLI and tail logs.
- **Repo write access** — to land the `client_config.py` change as a PR.
- **The client's profile decisions** — which profile should be the unattended-run default? (Pick from the client's existing `profiles` list.)
Already done at the platform level (don't redo per-client):
- JWT config JSON at `/opt/ai_qc/backend/config/box_jwt_config.json` on each server
- `BOX_WEBHOOK_PRIMARY_KEY` + `BOX_WEBHOOK_SECONDARY_KEY` in each server's env file
- ffmpeg installed (for video pre-flight)
---
## Step 1 — Box-side prep (admin task)
For client `<CLIENT>` (e.g. Diageo):
1. **Create two folders in Box:**
   - `AI-QC > INCOMING > AI QC <CLIENT> IN`: where source assets land
   - `AI-QC > REPORTS > AI QC <CLIENT> REPORTS`: where QC reports land
2. **Invite the JWT service account as a collaborator on BOTH folders.** Role: **Editor** or higher. (Editor lets it read uploads, write reports, and move files into the auto-created `_PROCESSED` subfolder. Co-owner also works.)
3. **Capture the folder IDs.** Box shows them in the URL when you open a folder, or you can list them programmatically once invites are in:
```bash
cd /opt/ai_qc
venv/bin/python backend/scripts/box_setup.py list-folder <parent_AI-QC_folder_id>
```
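
When copying the ID from the browser, it is the trailing number in the folder URL. A minimal sketch of extracting it, assuming the usual `https://app.box.com/folder/<id>` web URL shape; the helper name `folder_id_from_url` is hypothetical and not part of `box_setup.py`. It returns a string because `client_config.py` stores the IDs as strings.

```python
import re
from urllib.parse import urlparse

# Assumption: Box web URLs for folders contain a "/folder/<digits>" path segment.
_FOLDER_PATH = re.compile(r"/folder/(\d+)")


def folder_id_from_url(url: str) -> str:
    """Return the numeric folder ID from a Box web URL, as a string."""
    path = urlparse(url).path  # drops any query string or fragment
    match = _FOLDER_PATH.search(path)
    if match is None:
        raise ValueError(f"no Box folder ID found in {url!r}")
    return match.group(1)


if __name__ == "__main__":
    # Placeholder ID for illustration only.
    print(folder_id_from_url("https://app.box.com/folder/123456789012"))
```
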
---
## Step 2 — Code change
Edit `backend/client_config.py` and add three optional fields (`box_folder_id`, `box_reports_folder_id`, `default_profile`) to the client entry:
```python
'<client_id>': {
    'name': 'Client Display Name',
    'profiles': ['client_specific_profile', 'static_general', 'video_general'],
    'display_name': 'Client Display Name',
    'description': '...',
    # New for the Box pipeline (all three are optional; IDs are strings):
    'box_folder_id': '<INCOMING folder ID>',
    'box_reports_folder_id': '<REPORTS folder ID>',
    'default_profile': '<one of the profiles above>',
},
```
Then:
- Push as a small PR → merge to `develop`
- On the dev server: `cd /opt/ai_qc && ./backend/scripts/deploy.sh dev`
- No env-file backup dance needed (this is a code-only change)
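
Before opening the PR, it can help to sanity-check the new entry. The sketch below is a hypothetical pre-commit check, not something that exists in the repo: `validate_client_entry` is an illustrative name, and the two rules it enforces (folder IDs are digit-only strings, `default_profile` appears in `profiles`) are taken from this runbook rather than from the pipeline code.

```python
def validate_client_entry(client_id: str, entry: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # default_profile is optional, but if set it must be one of the profiles.
    default = entry.get("default_profile")
    if default is not None and default not in entry.get("profiles", []):
        problems.append(
            f"{client_id}: default_profile {default!r} is not in 'profiles'"
        )

    # Folder IDs are optional, but if set they must be digit-only strings,
    # because the webhook lookup compares them as strings.
    for key in ("box_folder_id", "box_reports_folder_id"):
        value = entry.get(key)
        if value is None:
            continue
        if not isinstance(value, str) or not value.isdigit():
            problems.append(f"{client_id}: {key} must be a string of digits")

    return problems


if __name__ == "__main__":
    example = {
        "profiles": ["static_general", "video_general"],
        "box_folder_id": "111",            # placeholder ID
        "box_reports_folder_id": "222",    # placeholder ID
        "default_profile": "static_general",
    }
    print(validate_client_entry("example_client", example))
```
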
---
## Step 3 — Verify the service account got access
Before registering webhooks, sanity-check that the service account can actually read the folders the admin invited it to:
```bash
cd /opt/ai_qc
venv/bin/python backend/scripts/box_setup.py list-folder <INCOMING folder ID>
venv/bin/python backend/scripts/box_setup.py list-folder <REPORTS folder ID>
```
Expected: both commands print `Folder <id> contains N items:`, even if the folder is empty.
**If you get `Access Denied` / HTTP 403**: the service account isn't actually a collaborator yet. Box admin needs to retry the invite. Common causes:
- Invite went to the wrong identity (Box has separate "user" and "app" identities — the JWT app is an app)
- Invite is pending acceptance somewhere
- Folder was created but invite wasn't applied at the right level
Don't proceed until both `list-folder` calls succeed.
---
## Step 4 — Register the V2 webhook
**Option A: CLI (recommended)** — idempotent, batch-able, lives in version control:
```bash
cd /opt/ai_qc
venv/bin/python backend/scripts/box_setup.py register-all-clients \
    https://optical-dev.oliver.solutions/ai_qc/api/box/webhook
```
The script:
- Scans `client_config.py` for every client with `box_folder_id` set
- For each, checks Box for an existing webhook on that folder pointing at the given URL
- Skips ones that already exist
- Creates webhooks for any that are missing
- Prints `<client> (<folder_id>): CREATED webhook id=<id>` or `SKIP — webhook already exists`
Safe to re-run any time; it won't duplicate.
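
The skip-or-create decision described above can be sketched as a pure function. This is an illustration of the idempotency idea only, not the actual `box_setup.py` code: `plan_webhooks` is a hypothetical name, and the `existing` list uses an assumed flattened shape (`folder_id`, `address`) rather than the Box API response format.

```python
def plan_webhooks(clients: dict, existing: list, target_url: str) -> dict:
    """Map client_id -> "CREATE" or "SKIP" for one target URL.

    clients:  client_config-style dict keyed by client ID
    existing: assumed shape, e.g. [{"folder_id": "111", "address": "https://..."}]
    """
    already = {(hook["folder_id"], hook["address"]) for hook in existing}
    plan = {}
    for client_id, entry in clients.items():
        folder_id = entry.get("box_folder_id")
        if not folder_id:
            continue  # client is not part of the Box pipeline
        # A webhook on the same folder but a different URL does not count.
        exists = (folder_id, target_url) in already
        plan[client_id] = "SKIP" if exists else "CREATE"
    return plan


if __name__ == "__main__":
    clients = {"a": {"box_folder_id": "111"}, "b": {"box_folder_id": "222"}}
    existing = [{"folder_id": "111", "address": "https://dev.example/webhook"}]
    print(plan_webhooks(clients, existing, "https://dev.example/webhook"))
```

Because the plan depends only on the current config and the webhooks Box already has, running it twice in a row yields `SKIP` for everything the first run created.
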
**Option B: Box Developer Console UI** — useful for one-off testing:
- Box Developer Console → your Custom App → **Webhooks** tab → **Create Webhook**
- URL: `https://optical-dev.oliver.solutions/ai_qc/api/box/webhook`
- Content Type: **Folder** → search/pick the client's INCOMING folder
- Event Triggers: tick **`FILE.UPLOADED`** only (do not tick others — they'd trigger spurious webhook deliveries)
- Save
No new signing keys to generate — they're app-level, configured once for the whole Custom App.
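
For context on what those two keys are used for: Box signs each V2 webhook delivery with both keys, and the receiver accepts the delivery if either signature matches. The sketch below follows the scheme in Box's public documentation (HMAC-SHA256 over the raw body followed by the delivery timestamp, then base64). It is not the code in `backend/api_server.py`; the function names are hypothetical, and the headers are assumed to arrive as a dict with lower-cased names. Box's documentation also describes rejecting stale timestamps, which is omitted here.

```python
import base64
import hashlib
import hmac


def compute_signature(key: str, body: bytes, delivery_timestamp: str) -> str:
    """HMAC-SHA256 over body + timestamp, base64-encoded."""
    message = body + delivery_timestamp.encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def signature_is_valid(body: bytes, headers: dict,
                       primary_key: str, secondary_key: str) -> bool:
    """Accept the delivery if either the primary or secondary signature matches."""
    timestamp = headers.get("box-delivery-timestamp", "")
    candidates = (
        (primary_key, headers.get("box-signature-primary")),
        (secondary_key, headers.get("box-signature-secondary")),
    )
    for key, received in candidates:
        if not key or not received:
            continue
        expected = compute_signature(key, body, timestamp)
        # Constant-time comparison to avoid leaking match position.
        if hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8")):
            return True
    return False
```

Checking both keys is what allows one key to be rotated at a time without dropping deliveries.
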
---
## Step 5 — End-to-end test
Open one terminal:
```bash
sudo journalctl -u ai-qc.service -f
```
In Box: upload a small test asset (image, PDF, or video) to the client's INCOMING folder.
Within a few seconds you should see (timestamps abbreviated):
```
Box webhook: dispatching session=<ts> client=<client_id> profile=<default_profile> file_id=...
Box webhook: downloaded <file> → uploads-dev/<ts>/<file>
Running check 1/N: <check_name>
...
Box webhook: uploaded report QC_Report_<ts>_<file>.html → folder <REPORTS folder ID>
Box webhook: moved source → _PROCESSED/<ts>_<file>
Box webhook: analysis complete for session <ts>, score <N>
```
Then in Box, verify:
- A new `QC_Report_<ts>_<original-filename>.html` exists in the REPORTS folder
- The source file has been moved into the auto-created `_PROCESSED` subfolder inside INCOMING. Its new name has the session_id prefix, which ties back to the corresponding report.
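
Given the two naming patterns above, a processed file can be matched to its report by name alone. A minimal sketch, assuming `<original-filename>` in the report name is the full source filename including its extension; the function names and the session-ID format used in the example are hypothetical.

```python
from typing import Dict, Iterable, Optional


def report_name_for(processed_name: str) -> str:
    """'<ts>_<file>' in _PROCESSED -> 'QC_Report_<ts>_<file>.html' in REPORTS."""
    return f"QC_Report_{processed_name}.html"


def pair_with_reports(processed_names: Iterable[str],
                      report_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each processed file to its report name, or None if no report exists."""
    available = set(report_names)
    pairs = {}
    for name in processed_names:
        expected = report_name_for(name)
        pairs[name] = expected if expected in available else None
    return pairs


if __name__ == "__main__":
    # The timestamp format here is illustrative, not the pipeline's real format.
    print(report_name_for("20260517_141248_banner.png"))
```
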
---
## Step 6 — (Optional) Tune the default profile from the UI
If the team finds that the static `default_profile` in code doesn't match how they want webhook-triggered runs to behave, an admin can change it without a code deploy:
1. Open the app → pick the client in the picker
2. ⚙️ **Settings** → **Default Profile** tab
3. Click a different profile → **Set as default**
The override is persisted to `backend/client_defaults.json` (gitignored, per-server) and applies from the next webhook run onward. **Revert to static default** clears the override.
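
The precedence between the UI override and the static default can be sketched as follows. This is an assumption-laden illustration: the real layout of `client_defaults.json` is not documented here, so the sketch assumes a flat `{client_id: profile}` mapping, and `resolve_default_profile` is a hypothetical name. Falling back to the static default when the override names an unknown profile is a design choice of the sketch, not confirmed behavior.

```python
import json
from pathlib import Path


def resolve_default_profile(client_id: str, client_entry: dict, overrides_path) -> str:
    """Return the per-server override if valid, else the static default_profile."""
    path = Path(overrides_path)
    overrides = {}
    if path.exists():
        # Assumed file shape: {"<client_id>": "<profile>"}
        overrides = json.loads(path.read_text(encoding="utf-8"))

    override = overrides.get(client_id)
    if override in client_entry.get("profiles", []):
        return override
    # No override, or it names a profile the client does not have.
    return client_entry.get("default_profile")
```
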
---
## Step 7 — Promote to prod
After the dev test passes:
1. PR `develop → main` on Bitbucket. Merge.
2. Tag main: e.g. `v1.2.0`, push the tag.
3. On the prod server (`optical-production`):
```bash
cd /opt/ai_qc
./backend/scripts/deploy.sh prod v1.2.0
```
4. Once-per-environment prod prerequisites (do these only the first time prod gets Phase 4; they need to be in place before the webhooks registered in step 5 start delivering):
   - JWT config JSON at `/opt/ai_qc/backend/config/box_jwt_config.json` (scp from your laptop, `chmod 600`)
   - `BOX_WEBHOOK_PRIMARY_KEY` + `BOX_WEBHOOK_SECONDARY_KEY` in `production.env` (the same app-level keys as dev)
   - `sudo apt install ffmpeg` (for video pre-flight)
5. Register webhooks pointing at the prod URL (different from dev's URL — each webhook is bound to one address):
```bash
cd /opt/ai_qc
venv/bin/python backend/scripts/box_setup.py register-all-clients \
    https://optical-prod.oliver.solutions/ai_qc/api/box/webhook
```
The Box folders themselves are shared — you don't create new prod-only folders. Both dev and prod webhooks fire on the same client folders. If you don't want prod handling uploads yet, just don't register the prod webhooks until you're ready.
---
## Common gotchas
| Symptom | Likely cause | Fix |
|---|---|---|
| 403 from `list-folder` | Service account isn't a collaborator on that folder yet | Box admin re-invites with Editor role |
| `Box webhook: signature verification failed` in logs | Signing keys in env don't match what the Custom App has | Box Developer Console → Manage Signature Keys → regenerate → update env on each server → restart service |
| `Box webhook: no client configured for Box folder <id>` | The folder ID Box sent doesn't match any `box_folder_id` in `client_config.py` | Check `client_config.py` against the actual Box folder ID; they're strings, must match exactly |
| `Box webhook: skipping non-QC extension <ext>` | User uploaded a file type we don't QC (e.g. `.docx`, `.zip`) | Working as intended; document for the client |
| Webhook fires correctly but source file stays in INCOMING | The report-upload step failed earlier; the move is gated on a successful report upload so the source stays in place for a retry | Look upstream in the log for `failed to upload report to Box: <error>` and fix the cause (usually a permissions issue on the REPORTS folder). Before retrying with the same filename, see the next row |
| Re-uploading the same filename doesn't trigger a fresh webhook | This is normal Box V2 behavior — same-name "replace" uploads create new versions of the existing file, which the folder-scoped webhook doesn't fire on | The auto-move-to-`_PROCESSED` step solves this for the happy path. If a file got stuck in INCOMING because of a previous failure, move/delete it manually so the next upload is a genuinely-new file |
| Reports folder fills up indefinitely | No auto-cleanup of old reports — by design | Manual cleanup, or add an age-based pruning script as a follow-up |
| `_PROCESSED` folder not auto-created | Service account doesn't have Editor (Viewer can't create subfolders) | Box admin upgrades the collaborator role to Editor |
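
Two of the rows above (no client configured, non-QC extension) come down to simple lookups. The sketch below illustrates why the folder ID must match exactly as a string; it is not the code in `backend/api_server.py`. The function names are hypothetical, and `QC_EXTENSIONS` is an assumed example set, since the real allow-list is not given in this runbook.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumption: an illustrative allow-list; the pipeline's real list may differ.
QC_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".mp4", ".mov"}


def find_client_for_folder(clients: dict, folder_id) -> Optional[str]:
    """Return the client whose box_folder_id equals folder_id exactly, else None."""
    for client_id, entry in clients.items():
        if entry.get("box_folder_id") == folder_id:
            return client_id
    return None


def should_process(filename: str) -> bool:
    """True if the file extension (case-insensitive) is in the allow-list."""
    return PurePosixPath(filename).suffix.lower() in QC_EXTENSIONS


if __name__ == "__main__":
    clients = {"example_client": {"box_folder_id": "123"}}
    print(find_client_for_folder(clients, "123"))  # string ID matches
    print(find_client_for_folder(clients, 123))    # int ID does not match
```
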
---
## What this onboarding does NOT cover
- **Removing a client from the integration** — to stop processing: delete the webhook in the Box Developer Console (or `box_setup.py delete-webhook <webhook_id>`), then remove the `box_folder_id` field from `client_config.py` in a PR. Existing reports in the REPORTS folder are left alone.
- **Multiple webhook-triggered profiles per client** — current schema is one default profile per client. If a client needs `FILE.UPLOADED` in one folder to run profile A and a different folder to run profile B, that's a schema change (one `client_config.py` entry per folder, or extend the schema to `{folder_id: profile_id}` maps).
- **Webhook health monitoring** — there's no alert if Box stops delivering. If you suspect webhooks are silent, drop a fresh test asset and watch logs; if nothing fires, check Box Developer Console → Webhooks → the webhook's `App Diagnostics` tab.