Previously, a nightly APScheduler container ran the tagger against every
file in the configured Box folder. With ~5000 files expected, that means
~5000 Box HTTP calls every night just to ask "is this tagged?". Switch
to manual-only mode and source the skip decision from the local DB.

- `db.is_file_already_tagged(conn, file_id)` — returns True iff the
DB has a row with status IN ('success','backfilled'). Used by both
image and video loops in main.py instead of the previous
`check_existing_metadata(box_client, file_id)` Box round-trip.
- `fetch_existing_metadata(box_client, file_id)` (main.py) — returns
the user-defined template fields as a flat dict, stripping the
Box system attributes (`$id`, `$type`, etc.) from the SDK response.
- `_run_backfill(run_id, db_conn)` (main.py) — walks the Box folder
and inserts a `status='backfilled'` row for every file that already
has marriottUsa metadata in Box. Read-only against Box; safe to re-run.
Use this after first deploy, or to repopulate the DB from Box.
- `POST /api/backfill` mirrors `POST /api/runs` (background thread,
same live-state record).
- SPA: new "Backfill from Box" button next to "Run now" (with a
confirm dialog and a yellow `.status-backfilled` event treatment).
- docker-compose.yml: removed the `tagger` (scheduler) service.
Runs are now triggered manually only, via the SPA or `POST /api/runs`.
scheduler.py stays in the repo for archival purposes and as a way to
opt back in.
- deploy.sh: readiness now checks the `api` container instead of
`tagger`; `--logs` tails api logs.
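
For reference, the local skip check described above could look roughly
like the sketch below. The table name `file_tags`, its columns, and the
use of sqlite3 are assumptions made for illustration; the real schema in
db.py may differ. Only the rule itself (a row with status 'success' or
'backfilled' means "skip") comes from this change.

```python
import sqlite3

# Hypothetical schema for illustration only; one row per tagging attempt.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_tags (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    file_id TEXT NOT NULL,
    status  TEXT NOT NULL
)
"""

# Statuses that count as "already tagged", per the rule in this change.
DONE_STATUSES = ("success", "backfilled")


def is_file_already_tagged(conn: sqlite3.Connection, file_id: str) -> bool:
    """Return True iff some row for file_id has a 'done' status."""
    row = conn.execute(
        "SELECT 1 FROM file_tags"
        " WHERE file_id = ? AND status IN (?, ?) LIMIT 1",
        (file_id, *DONE_STATUSES),
    ).fetchone()
    return row is not None
```

The check is a single indexed-lookup-sized query against the local DB,
so the image and video loops can skip a file without any Box HTTP call.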
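
The flattening done by `fetch_existing_metadata` can be sketched as
follows. This assumes the SDK response can be treated as a plain dict
and that the Box attributes to drop are exactly the `$`-prefixed keys;
the helper name `strip_box_system_fields` and the sample field names
are made up for the example and are not the actual main.py code.

```python
from typing import Any, Dict, Mapping


def strip_box_system_fields(instance: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only user-defined template fields (keys not starting with '$')."""
    return {k: v for k, v in instance.items() if not k.startswith("$")}


# Illustrative input: the '$' keys mimic Box system attributes, the
# remaining keys are invented template fields.
sample = {
    "$id": "abc-123",
    "$type": "exampleTemplate-0000",
    "$parent": "file_1",
    "category": "lobby",
    "keywords": "pool, exterior",
}
flat = strip_box_system_fields(sample)
```

Here `flat` holds only the `category` and `keywords` entries.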

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>