marriott-box-image-video-ta.../auth.py
DJP 99e978b895 Dockerize, add Postgres request log, FastAPI + React SPA
Run model: long-running scheduler container (APScheduler) replacing the
systemd timer in Docker deployments. Every Gemini-analysed file is also
persisted to a Postgres `tagging_events` table (run_id, prompt, raw
response, validated metadata, Box-write outcomes, status, error, timing)
for search and audit. Box is still updated exactly as before and remains
the source of truth for "already tagged" — `db.log_event` swallows DB
failures so an outage can't stop a tagging pass.
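
The swallow-on-failure behaviour described above could look roughly like the following. This is a minimal sketch, not the contents of `db.py`: the `log_event` signature shown here, the `insert` callable, and the event fields are assumptions made for illustration.

```python
import logging
from typing import Any, Callable, Mapping

logger = logging.getLogger("tagging.db")


def log_event(insert: Callable[[Mapping[str, Any]], None], event: Mapping[str, Any]) -> bool:
    """Try to persist one tagging event; never raise to the caller.

    `insert` is a hypothetical callable that writes one row to `tagging_events`.
    Returns True if the row was written, False if the write failed.
    """
    try:
        insert(event)
        return True
    except Exception:
        # Log and move on: Box remains the source of truth, so a database
        # outage must not abort the tagging pass.
        logger.exception("tagging_events insert failed for file_id=%s", event.get("file_id"))
        return False


def _failing_insert(event: Mapping[str, Any]) -> None:
    """Stand-in for a database that is down."""
    raise ConnectionError("postgres unavailable")


stored: list = []
ok = log_event(stored.append, {"file_id": "123", "status": "ok"})
failed = log_event(_failing_insert, {"file_id": "456", "status": "ok"})
```

Returning a boolean rather than raising lets the caller record that the audit row is missing without changing the control flow of the pass.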

Backend:
- `db.py` + `schema.sql` — append-only `tagging_events` with indexes on
  run_id, file_id, created_at.
- `scheduler.py` — APScheduler BlockingScheduler with `SCHEDULE_CRON`
  (default daily 02:00), `RUN_AT_STARTUP`, SIGTERM handling.
- `api.py` (FastAPI) — `/api/health`, `/api/me`, `/api/events?q=…`
  (single-input search across file_name, folder_path, description,
  status, file_id, validated_metadata::text, raw_response::text,
  scenes::text), `POST /api/runs` (fire-and-forget pass in a background
  thread), `/api/runs`, `/api/runs/{id}/events`. Every event response
  carries a synthesised `box_url`.
- `auth.py` — Azure AD bearer-token validation against the tenant JWKS
  (signature + aud + iss). `DEV_AUTH_BYPASS=true` short-circuits to a
  configurable dev user, mirrored on the frontend by
  `VITE_DEV_AUTH_BYPASS`.
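
The single-input search behind `/api/events?q=…` can be pictured as one pattern matched against every searchable column. The sketch below builds such a query. The column list comes from the description above, but the use of `ILIKE`, the `%s` placeholder style, the `limit` parameter, and the `build_search_query` name are assumptions, not the actual `api.py` code.

```python
from typing import Tuple

# Columns named in the commit message; JSON columns are cast to text.
SEARCH_COLUMNS = (
    "file_name", "folder_path", "description", "status", "file_id",
    "validated_metadata::text", "raw_response::text", "scenes::text",
)


def build_search_query(q: str, limit: int = 50) -> Tuple[str, tuple]:
    """Return (sql, params) matching `q` as a substring of any searchable column."""
    # Escape LIKE wildcards so user input is matched literally
    # (backslash is the default LIKE escape character in Postgres).
    escaped = q.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    where = " OR ".join(f"{col} ILIKE %s" for col in SEARCH_COLUMNS)
    sql = (
        f"SELECT * FROM tagging_events WHERE {where} "
        "ORDER BY created_at DESC LIMIT %s"
    )
    # One copy of the pattern per column, then the row limit.
    params = (pattern,) * len(SEARCH_COLUMNS) + (limit,)
    return sql, params


sql, params = build_search_query("50%")
```

Passing the pattern as a bound parameter, rather than formatting it into the SQL string, keeps the user's search text out of the query text itself.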

Frontend (Vite + React + TS):
- `frontend/` SPA, Montserrat + black/white/#FFC407 palette.
- @azure/msal-react with the bypass switch (auto-signin when bypass off).
- Search bar across all logged fields, results list with metadata tags,
  status pills, and "Open in Box ↗" links.
- "Run now" button kicks off a tagging pass via `POST /api/runs` and
  polls `/api/runs/{id}/events` every 2 s for live progress.

Docker / compose:
- `docker-compose.yml` pins `name: marriott-tagging`. Three services:
  `db` (postgres:16, named volume, bound to 127.0.0.1 only), `tagger`
  (scheduler.py), `api` (uvicorn). Same image, different `command`.
- `Dockerfile` — python:3.12-slim, non-root user.

Deploy (optical-dev.oliver.solutions):
- `deploy/deploy.sh` — idempotent. Auto-picks free host ports
  (POSTGRES_HOST_PORT 5435-5499, MARRIOTT_API_PORT 8003-8099), renders
  `apache-marriott-tagging.conf` from the .tmpl, builds the SPA in a
  one-shot node:20-alpine container, rsyncs `dist/` to
  `/var/www/html/marriott-tagging/`, polls `/api/health`, and prints the
  shared-vhost Include line.
- `apache-marriott-tagging.conf.tmpl` — proxy `/marriott-tagging/api/`
  to the API container, alias `/marriott-tagging` to the SPA web-root,
  SPA fallback to `index.html`.
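
The port auto-picking step can be illustrated as a scan over the documented range that returns the first port that can be bound. `deploy/deploy.sh` is a shell script, so this Python version is only a sketch of the idea; the `first_free_port` name and the bind-based probe are assumptions about how such a check might work.

```python
import socket
from typing import Optional


def first_free_port(start: int, end: int, host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in [start, end] that can be bound on `host`, or None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                # Port is in use (or not permitted); try the next one.
                continue
            return port
    return None


# Ranges taken from the commit message.
postgres_port = first_free_port(5435, 5499)
api_port = first_free_port(8003, 8099)
```

A probe like this is inherently racy: another process can claim the port between the check and the container start, so it is best treated as a convenience rather than a guarantee.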

systemd unit files left in place for the existing Ubuntu deployment
path; do not run both on the same host (would double-fire the tagger).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 14:56:58 -04:00


"""
Azure AD (Entra ID) bearer-token auth for the FastAPI backend.

- DEV_AUTH_BYPASS=true → skip all validation and return a fixed dev user.
- Otherwise: extract the Bearer token, fetch the tenant's JWKS through a
  module-level PyJWKClient (created once and reused), verify the JWT
  signature, and check that `aud` matches AZURE_CLIENT_ID, that `iss` is one
  of the tenant's issuer URLs, and that the token has not expired.
"""

import os
import time
from typing import Optional

import httpx
import jwt
from fastapi import Depends, Header, HTTPException, status
from jwt import PyJWKClient

AZURE_TENANT_ID = os.getenv("AZURE_TENANT_ID", "").strip()
AZURE_CLIENT_ID = os.getenv("AZURE_CLIENT_ID", "").strip()
DEV_AUTH_BYPASS = os.getenv("DEV_AUTH_BYPASS", "").strip().lower() in ("1", "true", "yes")

JWKS_URL = (
    f"https://login.microsoftonline.com/{AZURE_TENANT_ID}/discovery/v2.0/keys"
    if AZURE_TENANT_ID
    else None
)
ISSUERS = (
    f"https://login.microsoftonline.com/{AZURE_TENANT_ID}/v2.0",
    f"https://sts.windows.net/{AZURE_TENANT_ID}/",
)

_jwks_client: Optional[PyJWKClient] = None


def _get_jwks_client() -> PyJWKClient:
    """Return the shared PyJWKClient, creating it on first use."""
    global _jwks_client
    if _jwks_client is None:
        if not JWKS_URL:
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail="AZURE_TENANT_ID not configured on the server",
            )
        _jwks_client = PyJWKClient(JWKS_URL)
    return _jwks_client


class User:
    def __init__(self, *, oid: str, name: str, email: str, dev: bool = False):
        self.oid = oid
        self.name = name
        self.email = email
        self.dev = dev

    def to_dict(self):
        return {"oid": self.oid, "name": self.name, "email": self.email, "dev": self.dev}


def _bypass_user() -> User:
    return User(
        oid="dev-bypass",
        name=os.getenv("DEV_AUTH_NAME", "Dev User"),
        email=os.getenv("DEV_AUTH_EMAIL", "dev@oliver.agency"),
        dev=True,
    )


def require_auth(authorization: Optional[str] = Header(default=None)) -> User:
    """
    FastAPI dependency. Validates the Bearer token and returns a User, or
    raises 401 for a missing or invalid token, 500 if the server is
    misconfigured, and 503 if the signing key cannot be obtained.
    Honors DEV_AUTH_BYPASS for local/dev use.
    """
    if DEV_AUTH_BYPASS:
        return _bypass_user()

    if not authorization or not authorization.lower().startswith("bearer "):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Missing bearer token")
    token = authorization.split(" ", 1)[1].strip()

    if not AZURE_TENANT_ID or not AZURE_CLIENT_ID:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Server missing AZURE_TENANT_ID / AZURE_CLIENT_ID",
        )

    try:
        signing_key = _get_jwks_client().get_signing_key_from_jwt(token).key
        # Accept either v2.0 or v1.0 issuer URLs.
        claims = jwt.decode(
            token,
            signing_key,
            algorithms=["RS256"],
            audience=AZURE_CLIENT_ID,
            issuer=list(ISSUERS),
            options={"verify_aud": True, "verify_iss": True, "verify_exp": True},
        )
    except jwt.InvalidTokenError as e:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail=f"Invalid token: {e}") from e
    except (jwt.PyJWKClientError, httpx.HTTPError) as e:
        # PyJWKClient fetches keys with urllib and reports failures as
        # PyJWKClientError, which is not an InvalidTokenError. Without this
        # clause a JWKS outage would surface as an unhandled 500. Note that
        # PyJWKClientError also covers a `kid` with no matching key.
        raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail=f"JWKS fetch failed: {e}") from e

    return User(
        oid=claims.get("oid") or claims.get("sub", "unknown"),
        name=claims.get("name", ""),
        email=claims.get("preferred_username") or claims.get("upn") or claims.get("email", ""),
    )


def maybe_auth_info():
    """Diagnostic helper for /api/health: report whether auth is wired."""
    return {
        "dev_bypass": DEV_AUTH_BYPASS,
        "tenant_configured": bool(AZURE_TENANT_ID),
        "client_configured": bool(AZURE_CLIENT_ID),
    }
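
The claim-to-user mapping at the end of `require_auth` can be exercised without a real token. The sketch below repeats that fallback logic in a standalone function; `user_fields_from_claims` is a hypothetical helper name, not part of this module, and the sample claims are invented for illustration.

```python
from typing import Any, Dict, Mapping


def user_fields_from_claims(claims: Mapping[str, Any]) -> Dict[str, str]:
    """Mirror the fallbacks used when building a User from decoded JWT claims."""
    return {
        # Prefer the directory object id; fall back to the token subject.
        "oid": claims.get("oid") or claims.get("sub", "unknown"),
        "name": claims.get("name", ""),
        # v2.0 tokens typically carry `preferred_username`, v1.0 tokens `upn`.
        "email": claims.get("preferred_username") or claims.get("upn") or claims.get("email", ""),
    }


v2_style = user_fields_from_claims(
    {"oid": "abc", "name": "Ada", "preferred_username": "ada@example.com"}
)
v1_style = user_fields_from_claims({"sub": "xyz", "upn": "ada@example.com"})
no_claims = user_fields_from_claims({})
```

Because every field has a default, a token that passes signature, audience, and issuer checks but lacks profile claims still yields a usable, if sparse, user record.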