Solves two problems at once:
1. Folder cleanliness — INCOMING accumulates indefinitely otherwise.
2. Duplicate-upload re-trigger — Box V2's FILE.UPLOADED trigger doesn't
fire when the same filename is "uploaded as new version" of an
existing file. By moving the source out of INCOMING after success,
re-uploading the same filename becomes a genuinely-new file event
again and the webhook fires normally.
After report uploads successfully to the REPORTS folder, the worker:
1. find_or_create_subfolder(<INCOMING>, '_PROCESSED') — idempotent
2. move_file(file_id, <_PROCESSED>, new_name=f'{session_id}_{filename}')
The session_id prefix gives the archived file a sortable timestamp and
ties it back to the matching QC_Report_<session_id>_*.html in REPORTS.
Defensive: the move only runs if the report upload to Box succeeded.
If Box delivery failed, the source stays in INCOMING so a retry just
means re-uploading. Move failures are non-fatal — logged + recorded
in result_data['box_source_move_error'], analysis still marked
complete.
Adds four helpers to box_jwt_client.py:
- find_subfolder_by_name(parent, name) → Optional[str]
- create_subfolder(parent, name) → str
- find_or_create_subfolder(parent, name) → str (idempotent)
- move_file(file_id, target_folder, new_name=None) → Dict
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds machine-to-machine Box integration alongside the existing per-user
OAuth scaffolding. The new JWT client (backend/box_jwt_client.py) is
the auth/file/webhook surface used for unattended workflows: load the
Custom App JSON config, sign a JWT assertion, exchange for a 60-minute
service-account access token (cached + refreshed automatically), and
expose file download/upload + V2 webhook CRUD + HMAC signature
verification.
Wires a new POST /api/box/webhook endpoint (NOT @auth.require_auth — it
authenticates each delivery via Box's HMAC signature headers) that:
1. Verifies the signature against env-configured signing keys
(BOX_WEBHOOK_PRIMARY_KEY / BOX_WEBHOOK_SECONDARY_KEY).
2. Dedups deliveries by box-delivery-id with a bounded in-memory cache.
3. Maps the source folder to a client via a new
get_client_by_box_folder() helper on client_config.
4. Spawns a background thread that downloads the file, runs the same
technical pre-flight + LLM check pipeline as the user-uploaded path,
writes the HTML report to output/<client>/, uploads the report back
to the client's box_reports_folder_id, and logs the run with a
synthetic 'box_webhook' user.
Webhook runs skip media-plan / localization / OCR context — those are
user-UI concepts without a meaningful source in unattended runs. The
existing /api/start_analysis path is unchanged.
client_config.py gains three optional per-client fields used by the new
flow when present: `box_folder_id`, `box_reports_folder_id`, and
`default_profile`. Existing client entries keep working without them.
.gitignore now excludes backend/config/box_jwt_config.json so the JWT
config (with its embedded private key + passphrase) never lands in git.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>