Run model: long-running scheduler container (APScheduler) replacing the
systemd timer in Docker deployments. Every Gemini-analysed file is also
persisted to a Postgres `tagging_events` table (run_id, prompt, raw
response, validated metadata, Box-write outcomes, status, error, timing)
for search and audit. Box is still updated exactly as before and remains
the source of truth for "already tagged" — `db.log_event` swallows DB
failures so an outage can't stop a tagging pass.
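
The "swallow DB failures" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual `db.log_event`: the `insert_row` callable and the logger name are hypothetical stand-ins for whatever performs the Postgres insert.

```python
import logging
from typing import Callable

logger = logging.getLogger("tagging.db")  # hypothetical logger name


def log_event(insert_row: Callable[[dict], None], event: dict) -> bool:
    """Best-effort audit write: report success, never raise to the caller.

    `insert_row` is a hypothetical callable that writes one row to
    `tagging_events`; any exception it raises is logged and swallowed.
    """
    try:
        insert_row(event)
        return True
    except Exception:
        # A DB outage must not abort the tagging pass, so log and move on.
        logger.exception(
            "tagging_events insert failed for file_id=%s", event.get("file_id")
        )
        return False
```

Returning a boolean rather than raising keeps Box as the only dependency that can fail a pass, while still leaving a trace in the process log when the audit row is lost.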
Backend:
- `db.py` + `schema.sql` — append-only `tagging_events` with indexes on
run_id, file_id, created_at.
- `scheduler.py` — APScheduler BlockingScheduler with `SCHEDULE_CRON`
(default daily 02:00), `RUN_AT_STARTUP`, SIGTERM handling.
- `api.py` (FastAPI) — `/api/health`, `/api/me`, `/api/events?q=…`
(single-input search across file_name, folder_path, description,
status, file_id, validated_metadata::text, raw_response::text,
scenes::text), `POST /api/runs` (fire-and-forget pass in a background
thread), `/api/runs`, `/api/runs/{id}/events`. Every event response
carries a synthesised `box_url`.
- `auth.py` — Azure AD bearer-token validation against the tenant JWKS
(signature + aud + iss). `DEV_AUTH_BYPASS=true` short-circuits to a
configurable dev user, mirrored on the frontend by
`VITE_DEV_AUTH_BYPASS`.
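
One way the single-input search and the synthesised `box_url` could be built is sketched below. It is an assumption-laden illustration rather than the code in `api.py`: the function names, the `limit` parameter, the psycopg-style `%(name)s` placeholders, the assumption that every listed column is text-compatible, and the `https://app.box.com/file/<id>` URL shape are all choices made for this example.

```python
# Expressions searched by /api/events?q=..., taken from the list above.
SEARCH_EXPRESSIONS = (
    "file_name",
    "folder_path",
    "description",
    "status",
    "file_id",
    "validated_metadata::text",
    "raw_response::text",
    "scenes::text",
)


def build_events_query(q: str, limit: int = 50) -> tuple[str, dict]:
    """Return (sql, params) for a case-insensitive substring search."""
    sql = "SELECT * FROM tagging_events"
    params: dict = {"limit": limit}
    term = q.strip()
    if term:
        # Escape LIKE wildcards so user input is matched literally.
        escaped = (
            term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        )
        params["pattern"] = f"%{escaped}%"
        where = " OR ".join(
            f"{expr} ILIKE %(pattern)s" for expr in SEARCH_EXPRESSIONS
        )
        sql += f" WHERE {where}"
    sql += " ORDER BY created_at DESC LIMIT %(limit)s"
    return sql, params


def box_url(file_id: str) -> str:
    """Assumed Box web-app link format for a file id."""
    return f"https://app.box.com/file/{file_id}"
```

The search term is only ever passed as a bound parameter; the column expressions come from a fixed tuple, so nothing user-supplied is interpolated into the SQL text.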
Frontend (Vite + React + TS):
- `frontend/` SPA, Montserrat + black/white/#FFC407 palette.
- @azure/msal-react with the bypass switch (auto-signin when bypass off).
- Search bar across all logged fields, results list with metadata tags,
status pills, and "Open in Box ↗" links.
- "Run now" button kicks off a tagging pass via `POST /api/runs` and
polls `/api/runs/{id}/events` every 2 s for live progress.
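
The "Run now" flow depends on `POST /api/runs` returning a run id immediately while the pass continues in the background. A rough sketch of that fire-and-forget pattern, assuming a hypothetical `run_pass(run_id)` callable and a UUID-based run id (neither is confirmed by the code in this commit):

```python
import threading
import uuid
from typing import Callable


def start_run(run_pass: Callable[[str], None]) -> str:
    """Start a tagging pass in a daemon thread and return its run id.

    `run_pass` is a hypothetical callable that performs the pass and logs
    events under the given run id; the caller does not wait for it.
    """
    run_id = uuid.uuid4().hex
    thread = threading.Thread(
        target=run_pass,
        args=(run_id,),
        name=f"tagging-run-{run_id}",
        daemon=True,  # do not block API shutdown on an in-flight pass
    )
    thread.start()
    return run_id
```

The client can then poll `/api/runs/{id}/events` with the returned id. This sketch does nothing to stop two passes from overlapping; a lock or a "run in progress" check would be needed if that matters.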
Docker / compose:
- `docker-compose.yml` pins `name: marriott-tagging`. Three services:
`db` (postgres:16, named volume, bound to 127.0.0.1 only), `tagger`
(scheduler.py), `api` (uvicorn). Same image, different `command`.
- `Dockerfile` — python:3.12-slim, non-root user.
Deploy (optical-dev.oliver.solutions):
- `deploy/deploy.sh` — idempotent. Auto-picks free host ports
(POSTGRES_HOST_PORT 5435-5499, MARRIOTT_API_PORT 8003-8099), renders
`apache-marriott-tagging.conf` from the .tmpl, builds the SPA in a
one-shot node:20-alpine container, rsyncs `dist/` to
`/var/www/html/marriott-tagging/`, polls `/api/health`, and prints the
shared-vhost Include line.
- `apache-marriott-tagging.conf.tmpl` — proxy `/marriott-tagging/api/`
to the API container, alias `/marriott-tagging` to the SPA web-root,
SPA fallback to `index.html`.
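
The "auto-pick a free host port" step can be illustrated in Python as below. `deploy/deploy.sh` is a shell script and may probe ports differently, so treat this as a sketch of the idea only; the function name and the bind-probe approach are assumptions.

```python
import socket


def first_free_port(start: int, end: int, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, end] that can be bound on `host`."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port already in use (or not permitted); try the next
            return port
    raise RuntimeError(f"no free port in {start}-{end}")


# Ranges quoted in the deploy notes above:
# POSTGRES_HOST_PORT -> first_free_port(5435, 5499)
# MARRIOTT_API_PORT  -> first_free_port(8003, 8099)
```

A bind probe is inherently racy: another process can take the port between the check and `docker compose up`, so the deploy should still fail loudly if the chosen port turns out to be occupied.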
The systemd unit files are left in place for the existing Ubuntu
deployment path; do not run both on the same host, or the tagger will
fire twice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
110 lines
3.7 KiB
Python
"""
Azure AD (Entra ID) bearer-token auth for the FastAPI backend.

- DEV_AUTH_BYPASS=true → skip all validation, return a fixed dev user.
- Otherwise: extract Bearer token, fetch the tenant's JWKS once and cache it,
  verify the JWT signature, and check that `aud` matches AZURE_CLIENT_ID,
  `iss` is one of the tenant's issuer URLs, and the token has not expired.
"""

import os
import time
from typing import Optional

import httpx
import jwt
from fastapi import Depends, Header, HTTPException, status
from jwt import PyJWKClient

AZURE_TENANT_ID = os.getenv("AZURE_TENANT_ID", "").strip()
AZURE_CLIENT_ID = os.getenv("AZURE_CLIENT_ID", "").strip()
DEV_AUTH_BYPASS = os.getenv("DEV_AUTH_BYPASS", "").strip().lower() in ("1", "true", "yes")

JWKS_URL = f"https://login.microsoftonline.com/{AZURE_TENANT_ID}/discovery/v2.0/keys" if AZURE_TENANT_ID else None
ISSUERS = (
    f"https://login.microsoftonline.com/{AZURE_TENANT_ID}/v2.0",
    f"https://sts.windows.net/{AZURE_TENANT_ID}/",
)

_jwks_client: Optional[PyJWKClient] = None


def _get_jwks_client() -> PyJWKClient:
    global _jwks_client
    if _jwks_client is None:
        if not JWKS_URL:
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail="AZURE_TENANT_ID not configured on the server",
            )
        _jwks_client = PyJWKClient(JWKS_URL)
    return _jwks_client


class User:
    def __init__(self, *, oid: str, name: str, email: str, dev: bool = False):
        self.oid = oid
        self.name = name
        self.email = email
        self.dev = dev

    def to_dict(self):
        return {"oid": self.oid, "name": self.name, "email": self.email, "dev": self.dev}


def _bypass_user() -> User:
    return User(
        oid="dev-bypass",
        name=os.getenv("DEV_AUTH_NAME", "Dev User"),
        email=os.getenv("DEV_AUTH_EMAIL", "dev@oliver.agency"),
        dev=True,
    )


def require_auth(authorization: Optional[str] = Header(default=None)) -> User:
    """
    FastAPI dependency. Validates the Bearer token and returns a User, or
    raises 401. Honors DEV_AUTH_BYPASS for local/dev use.
    """
    if DEV_AUTH_BYPASS:
        return _bypass_user()

    if not authorization or not authorization.lower().startswith("bearer "):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Missing bearer token")

    token = authorization.split(" ", 1)[1].strip()
    if not AZURE_TENANT_ID or not AZURE_CLIENT_ID:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Server missing AZURE_TENANT_ID / AZURE_CLIENT_ID",
        )

    try:
        signing_key = _get_jwks_client().get_signing_key_from_jwt(token).key
        # Accept either v2.0 or v1.0 issuer URLs.
        claims = jwt.decode(
            token,
            signing_key,
            algorithms=["RS256"],
            audience=AZURE_CLIENT_ID,
            issuer=list(ISSUERS),
            options={"verify_aud": True, "verify_iss": True, "verify_exp": True},
        )
    except jwt.InvalidTokenError as e:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail=f"Invalid token: {e}")
    except jwt.PyJWKClientError as e:
        # PyJWKClient fetches keys with urllib and reports failures as
        # PyJWKClientError, so an httpx.HTTPError handler would never fire here.
        raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail=f"JWKS fetch failed: {e}")

    return User(
        oid=claims.get("oid") or claims.get("sub", "unknown"),
        name=claims.get("name", ""),
        email=claims.get("preferred_username") or claims.get("upn") or claims.get("email", ""),
    )


def maybe_auth_info():
    """Diagnostic helper for /api/health: report whether auth is wired."""
    return {
        "dev_bypass": DEV_AUTH_BYPASS,
        "tenant_configured": bool(AZURE_TENANT_ID),
        "client_configured": bool(AZURE_CLIENT_ID),
    }