Commit graph

2 commits

Author SHA1 Message Date
DJP
1f2c2ff8e1 Multi-token + fuzzy search; admin-only Run Now / Backfill
Search:
- Previously /api/events did one ILIKE %q% across the columns, so
  "female city" required the literal substring "female city" to
  appear somewhere. Now the query is tokenised on whitespace; every
  token must match somewhere (AND), and each token matches either
  by substring (ILIKE) across the searched columns OR by trigram
  similarity (pg_trgm) against a concatenated text blob with a 0.3
  threshold — handles typos like "femalle" → "female".
- Results ranked by summed similarity score across all tokens, then
  recency. Empty query falls back to "newest 100".
- schema.sql: CREATE EXTENSION IF NOT EXISTS pg_trgm (idempotent;
  applied by ensure_schema on api startup).

Admin gating:
- auth.py: User now carries `is_admin`. Computed from a
  comma-separated ADMIN_EMAILS env var (case-insensitive match
  against `preferred_username`/`upn`/`email` claim). New
  `require_admin` FastAPI dependency 403s non-admins.
- In DEV_AUTH_BYPASS mode the dev user is admin by default; flip
  DEV_AUTH_IS_ADMIN=false to test the read-only UX without enabling
  SSO.
- POST /api/runs and POST /api/backfill now gated by require_admin.
- /api/me carries is_admin so the SPA can hide the destructive
  buttons for non-admins.

Frontend:
- App.tsx fetches /api/me on mount and hides Run Now + Backfill
  unless `is_admin` is true. Non-admins still see search + results +
  recent-runs table.

docker-compose / .env.example: thread ADMIN_EMAILS +
DEV_AUTH_IS_ADMIN into the api container.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 15:51:50 -04:00
DJP
99e978b895 Dockerize, add Postgres request log, FastAPI + React SPA
Run model: long-running scheduler container (APScheduler) replacing the
systemd timer in Docker deployments. Every Gemini-analysed file is also
persisted to a Postgres `tagging_events` table (run_id, prompt, raw
response, validated metadata, Box-write outcomes, status, error, timing)
for search and audit. Box is still updated exactly as before and remains
the source of truth for "already tagged" — `db.log_event` swallows DB
failures so an outage can't stop a tagging pass.

Backend:
- `db.py` + `schema.sql` — append-only `tagging_events` with indexes on
  run_id, file_id, created_at.
- `scheduler.py` — APScheduler BlockingScheduler with `SCHEDULE_CRON`
  (default daily 02:00), `RUN_AT_STARTUP`, SIGTERM handling.
- `api.py` (FastAPI) — `/api/health`, `/api/me`, `/api/events?q=…`
  (single-input search across file_name, folder_path, description,
  status, file_id, validated_metadata::text, raw_response::text,
  scenes::text), `POST /api/runs` (fire-and-forget pass in a background
  thread), `/api/runs`, `/api/runs/{id}/events`. Every event response
  carries a synthesised `box_url`.
- `auth.py` — Azure AD bearer-token validation against the tenant JWKS
  (signature + aud + iss). `DEV_AUTH_BYPASS=true` short-circuits to a
  configurable dev user, mirrored on the frontend by
  `VITE_DEV_AUTH_BYPASS`.

Frontend (Vite + React + TS):
- `frontend/` SPA, Montserrat + black/white/#FFC407 palette.
- @azure/msal-react with the bypass switch (auto-signin when bypass off).
- Search bar across all logged fields, results list with metadata tags,
  status pills, and "Open in Box ↗" links.
- "Run now" button kicks off a tagging pass via `POST /api/runs` and
  polls `/api/runs/{id}/events` every 2 s for live progress.

Docker / compose:
- `docker-compose.yml` pins `name: marriott-tagging`. Three services:
  `db` (postgres:16, named volume, bound to 127.0.0.1 only), `tagger`
  (scheduler.py), `api` (uvicorn). Same image, different `command`.
- `Dockerfile` — python:3.12-slim, non-root user.

Deploy (optical-dev.oliver.solutions):
- `deploy/deploy.sh` — idempotent. Auto-picks free host ports
  (POSTGRES_HOST_PORT 5435-5499, MARRIOTT_API_PORT 8003-8099), renders
  `apache-marriott-tagging.conf` from the .tmpl, builds the SPA in a
  one-shot node:20-alpine container, rsyncs `dist/` to
  `/var/www/html/marriott-tagging/`, polls `/api/health`, and prints the
  shared-vhost Include line.
- `apache-marriott-tagging.conf.tmpl` — proxy `/marriott-tagging/api/`
  to the API container, alias `/marriott-tagging` to the SPA web-root,
  SPA fallback to `index.html`.

systemd unit files left in place for the existing Ubuntu deployment
path; do not run both on the same host (would double-fire the tagger).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 14:56:58 -04:00