marriott-box-image-video-ta.../db.py
DJP 99e978b895 Dockerize, add Postgres request log, FastAPI + React SPA
Run model: long-running scheduler container (APScheduler) replacing the
systemd timer in Docker deployments. Every Gemini-analysed file is also
persisted to a Postgres `tagging_events` table (run_id, prompt, raw
response, validated metadata, Box-write outcomes, status, error, timing)
for search and audit. Box is still updated exactly as before and remains
the source of truth for "already tagged" — `db.log_event` swallows DB
failures so an outage can't stop a tagging pass.
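A minimal sketch of how `scheduler.py` might read its settings. It assumes `SCHEDULE_CRON` is a standard five-field crontab string, so the documented default of daily 02:00 becomes `0 2 * * *`. It also assumes APScheduler 3.x, whose `CronTrigger.from_crontab` builds a trigger from such a string. `ScheduleConfig`, `read_schedule_config` and `build_scheduler` are hypothetical names, not the module's actual code.

```python
import os
import signal
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Assumed crontab encoding of the documented default (daily at 02:00).
DEFAULT_CRON = "0 2 * * *"
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ScheduleConfig:
    cron: str
    run_at_startup: bool


def read_schedule_config(env: Optional[Mapping[str, str]] = None) -> ScheduleConfig:
    """Read SCHEDULE_CRON and RUN_AT_STARTUP, falling back to the defaults."""
    env = os.environ if env is None else env
    cron = env.get("SCHEDULE_CRON", "").strip() or DEFAULT_CRON
    if len(cron.split()) != 5:
        raise ValueError(f"SCHEDULE_CRON must have 5 fields, got {cron!r}")
    run_at_startup = env.get("RUN_AT_STARTUP", "").strip().lower() in _TRUTHY
    return ScheduleConfig(cron=cron, run_at_startup=run_at_startup)


def build_scheduler(job: Callable[[], None], config: ScheduleConfig):
    """Register `job` on an APScheduler 3.x BlockingScheduler and hook SIGTERM."""
    # Imported here so read_schedule_config() works without APScheduler installed.
    from apscheduler.schedulers.blocking import BlockingScheduler
    from apscheduler.triggers.cron import CronTrigger

    scheduler = BlockingScheduler()
    # max_instances=1 (also APScheduler's default) stops a slow pass overlapping the next one.
    scheduler.add_job(job, CronTrigger.from_crontab(config.cron), max_instances=1)

    def _on_sigterm(signum, frame):
        # `docker stop` sends SIGTERM; stop firing jobs without blocking inside the handler.
        scheduler.shutdown(wait=False)

    signal.signal(signal.SIGTERM, _on_sigterm)
    return scheduler
```

A typical startup calls `read_schedule_config()` and then `build_scheduler(run_pass, config)`. If `config.run_at_startup` is true, it calls `run_pass()` once immediately. It then calls `scheduler.start()`, which blocks. Here `run_pass` stands in for the real tagging entry point.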

Backend:
- `db.py` + `schema.sql` — append-only `tagging_events` with indexes on
  run_id, file_id, created_at.
- `scheduler.py` — APScheduler BlockingScheduler with `SCHEDULE_CRON`
  (default daily 02:00), `RUN_AT_STARTUP`, SIGTERM handling.
- `api.py` (FastAPI) — `/api/health`, `/api/me`, `/api/events?q=…`
  (single-input search across file_name, folder_path, description,
  status, file_id, validated_metadata::text, raw_response::text,
  scenes::text), `POST /api/runs` (fire-and-forget pass in a background
  thread), `/api/runs`, `/api/runs/{id}/events`. Every event response
  carries a synthesised `box_url`.
- `auth.py` — Azure AD bearer-token validation against the tenant JWKS
  (signature + aud + iss). `DEV_AUTH_BYPASS=true` short-circuits to a
  configurable dev user, mirrored on the frontend by
  `VITE_DEV_AUTH_BYPASS`.
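The single-input search on `/api/events` could be a case-insensitive substring match OR-ed across the listed columns. The sketch below builds a parameterised, psycopg-style query and synthesises `box_url`. It assumes two things: `ILIKE` matching, and Box's `https://app.box.com/file/<id>` web-link form. `SEARCH_COLUMNS`, `build_search_query` and `with_box_url` are hypothetical names, not the actual `api.py` code.

```python
from typing import Any, Dict, Tuple

# Columns named in the commit message; JSONB columns are cast to text so ILIKE can scan them.
SEARCH_COLUMNS = (
    "file_name",
    "folder_path",
    "description",
    "status",
    "file_id",
    "validated_metadata::text",
    "raw_response::text",
    "scenes::text",
)


def _escape_like(term: str) -> str:
    """Escape LIKE wildcards so user input matches literally (backslash is Postgres's default escape)."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_search_query(q: str, limit: int = 100) -> Tuple[str, Dict[str, Any]]:
    """Return (sql, params) for a newest-first search; an empty q just lists recent events."""
    params: Dict[str, Any] = {"limit": limit}
    where = ""
    q = q.strip()
    if q:
        params["pattern"] = f"%{_escape_like(q)}%"
        where = "WHERE " + " OR ".join(f"{col} ILIKE %(pattern)s" for col in SEARCH_COLUMNS)
    sql = f"SELECT * FROM tagging_events {where} ORDER BY created_at DESC LIMIT %(limit)s"
    return sql, params


def with_box_url(row: Dict[str, Any]) -> Dict[str, Any]:
    """Attach a Box web link, assuming the app.box.com/file/<id> URL form."""
    file_id = row.get("file_id")
    return {**row, "box_url": f"https://app.box.com/file/{file_id}" if file_id else None}
```

A single shared `%(pattern)s` parameter keeps the query injection-safe. The `created_at` index from `schema.sql` serves the ordering. Note that a leading `%` wildcard means the `ILIKE` filters themselves scan rather than use a B-tree index.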
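A minimal sketch of the `DEV_AUTH_BYPASS` short-circuit in front of Azure AD validation. It assumes PyJWT 2.x (`PyJWKClient`, `jwt.decode`) and v2.0 access tokens, whose issuer is `https://login.microsoftonline.com/<tenant>/v2.0`. The variables `AZURE_TENANT_ID`, `AZURE_API_AUDIENCE` and `DEV_USER_EMAIL`, the `current_user` function, and the returned user shape are all hypothetical.

```python
import os
from functools import lru_cache
from typing import Any, Dict, Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}


@lru_cache(maxsize=None)
def _jwks_client(tenant: str):
    # PyJWT 2.x; one client per tenant so its JWKS cache is reused across requests.
    import jwt

    return jwt.PyJWKClient(f"https://login.microsoftonline.com/{tenant}/discovery/v2.0/keys")


def current_user(authorization: Optional[str], env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Resolve the caller from an 'Authorization: Bearer <jwt>' header value."""
    env = os.environ if env is None else env
    if env.get("DEV_AUTH_BYPASS", "").strip().lower() in _TRUTHY:
        # Development only: skip token checks and return a configurable fake user.
        return {"email": env.get("DEV_USER_EMAIL", "dev@example.com"), "dev_bypass": True}

    if not authorization or not authorization.lower().startswith("bearer "):
        raise PermissionError("missing bearer token")
    token = authorization.split(" ", 1)[1].strip()

    import jwt

    tenant = env["AZURE_TENANT_ID"]
    signing_key = _jwks_client(tenant).get_signing_key_from_jwt(token)
    # jwt.decode verifies the RS256 signature, expiry, audience and issuer.
    claims = jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience=env["AZURE_API_AUDIENCE"],
        issuer=f"https://login.microsoftonline.com/{tenant}/v2.0",
    )
    return {"email": claims.get("preferred_username"), "claims": claims, "dev_bypass": False}
```

In a FastAPI dependency, `PermissionError` (and PyJWT's `InvalidTokenError`) would be mapped to a 401 response, for example via `HTTPException`.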

Frontend (Vite + React + TS):
- `frontend/` SPA, Montserrat + black/white/#FFC407 palette.
- @azure/msal-react with the bypass switch (auto-signin when bypass off).
- Search bar across all logged fields, results list with metadata tags,
  status pills, and "Open in Box ↗" links.
- "Run now" button kicks off a tagging pass via `POST /api/runs` and
  polls `/api/runs/{id}/events` every 2 s for live progress.
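"Run now" pairs a fire-and-forget `POST /api/runs` with polling of `/api/runs/{id}/events`. A minimal sketch of the server side follows. It assumes the tagging pass is a callable that takes the `run_id` it should log under. `RunLauncher` and its methods are hypothetical, and its in-memory state stands in for whatever `api.py` actually tracks.

```python
import threading
import uuid
from typing import Callable, Dict, Optional


class RunLauncher:
    """Fire-and-forget tagging passes, one background thread per run_id."""

    def __init__(self, tagging_pass: Callable[[str], None]) -> None:
        self._tagging_pass = tagging_pass
        self._lock = threading.Lock()
        self._threads: Dict[str, threading.Thread] = {}
        self._errors: Dict[str, str] = {}

    def start(self) -> str:
        """Start a pass and return its run_id without waiting for it to finish."""
        run_id = str(uuid.uuid4())
        # daemon=True: the API process can exit without waiting for an in-flight pass.
        thread = threading.Thread(target=self._run, args=(run_id,), daemon=True)
        with self._lock:
            # Register and start under the lock so status() never sees an unstarted thread.
            self._threads[run_id] = thread
            thread.start()
        return run_id

    def _run(self, run_id: str) -> None:
        try:
            self._tagging_pass(run_id)
        except Exception as e:
            # Record the failure so status() can report it instead of only printing a traceback.
            with self._lock:
                self._errors[run_id] = f"{type(e).__name__}: {e}"

    def status(self, run_id: str) -> str:
        """Return 'unknown', 'running', 'error' or 'finished'."""
        with self._lock:
            thread = self._threads.get(run_id)
        if thread is None:
            return "unknown"
        if thread.is_alive():
            return "running"
        # The thread has exited, so _run() has already recorded any error.
        with self._lock:
            return "error" if run_id in self._errors else "finished"

    def wait(self, run_id: str, timeout: Optional[float] = None) -> None:
        """Block until the run's thread exits (handy in tests)."""
        with self._lock:
            thread = self._threads.get(run_id)
        if thread is not None:
            thread.join(timeout)
```

Under this design, `POST /api/runs` would return `start()`'s `run_id` immediately. The SPA's 2 s poll would then read the `tagging_events` rows logged under that `run_id`, so progress shows up without the request staying open.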

Docker / compose:
- `docker-compose.yml` pins `name: marriott-tagging`. Three services:
  `db` (postgres:16, named volume, bound to 127.0.0.1 only), `tagger`
  (scheduler.py), `api` (uvicorn). Same image, different `command`.
- `Dockerfile` — python:3.12-slim, non-root user.

Deploy (optical-dev.oliver.solutions):
- `deploy/deploy.sh` — idempotent. Auto-picks free host ports
  (POSTGRES_HOST_PORT 5435-5499, MARRIOTT_API_PORT 8003-8099), renders
  `apache-marriott-tagging.conf` from the .tmpl, builds the SPA in a
  one-shot node:20-alpine container, rsyncs `dist/` to
  `/var/www/html/marriott-tagging/`, polls `/api/health`, and prints the
  shared-vhost Include line.
- `apache-marriott-tagging.conf.tmpl` — proxy `/marriott-tagging/api/`
  to the API container, alias `/marriott-tagging` to the SPA web-root,
  SPA fallback to `index.html`.
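`deploy.sh` is a shell script, but its port-picking idea is easy to show in Python. Walk the documented range (5435-5499 for `POSTGRES_HOST_PORT`, 8003-8099 for `MARRIOTT_API_PORT`) and take the first port that can be bound. This sketch rests on three assumptions:

- Probing works by binding a socket on 127.0.0.1.
- A previously recorded port is reused to keep re-runs idempotent.
- The real script may instead inspect `ss` or `docker` output.

A port can still be taken between the check and `docker compose up`. `port_is_free`, `pick_free_port` and `choose_port` are hypothetical names.

```python
import socket
from typing import Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if host:port can be bound right now, i.e. no other socket holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def pick_free_port(first: int, last: int, host: str = "127.0.0.1") -> int:
    """Return the first bindable port in the inclusive range [first, last]."""
    for port in range(first, last + 1):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port in {first}-{last}")


def choose_port(recorded: Optional[str], first: int, last: int) -> int:
    """Keep a port recorded by an earlier deploy so re-runs don't move services."""
    if recorded:
        return int(recorded)
    return pick_free_port(first, last)
```

For example, `choose_port(os.environ.get("POSTGRES_HOST_PORT"), 5435, 5499)` keeps an existing assignment and only probes on a first deploy.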

systemd unit files left in place for the existing Ubuntu deployment
path; do not run both on the same host (would double-fire the tagger).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 14:56:58 -04:00

"""
Postgres logging for the Marriott Box tagger.

One row per file Gemini was called on (success or error). The DB is auxiliary:
log_event() swallows exceptions and prints a warning to stderr, so a Postgres
outage cannot stop the tagging pass. get_conn() and ensure_schema() do raise,
so callers should guard them; log_event() is a no-op when conn is None.
Box remains the source of truth.
"""
import json
import os
import sys
from pathlib import Path

import psycopg
from psycopg.types.json import Jsonb

SCHEMA_PATH = Path(__file__).parent / "schema.sql"

# Named placeholders so the params dict in log_event() maps 1:1 onto columns.
INSERT_SQL = """
INSERT INTO tagging_events (
    run_id, file_id, file_name, folder_path, media_type, gemini_model,
    prompt, raw_response, description, scenes, validated_metadata,
    metadata_write_success, description_write_success, scene_comment_write_success,
    status, error_message, duration_ms
) VALUES (
    %(run_id)s, %(file_id)s, %(file_name)s, %(folder_path)s, %(media_type)s, %(gemini_model)s,
    %(prompt)s, %(raw_response)s, %(description)s, %(scenes)s, %(validated_metadata)s,
    %(metadata_write_success)s, %(description_write_success)s, %(scene_comment_write_success)s,
    %(status)s, %(error_message)s, %(duration_ms)s
)
"""


def _dsn():
    dsn = os.getenv("DATABASE_URL")
    if not dsn:
        raise RuntimeError("DATABASE_URL not set")
    return dsn


def get_conn():
    """Open a Postgres connection. Caller owns close().

    autocommit=True: each INSERT commits on its own, so one failed insert
    cannot leave the connection stuck in an aborted transaction.
    """
    return psycopg.connect(_dsn(), autocommit=True)


def ensure_schema(conn):
    """Apply schema.sql; idempotent only if its DDL is re-runnable (e.g. IF NOT EXISTS)."""
    sql = SCHEMA_PATH.read_text()
    with conn.cursor() as cur:
        # Executed without parameters, so a multi-statement script is accepted.
        cur.execute(sql)


def _jsonable(value):
    # Wrap Python objects for JSONB columns; keep None as SQL NULL.
    if value is None:
        return None
    return Jsonb(value)


def log_event(
    conn,
    *,
    run_id,
    file_id,
    file_name,
    folder_path,
    media_type,
    gemini_model,
    status,
    prompt=None,
    raw_response=None,
    description=None,
    scenes=None,
    validated_metadata=None,
    metadata_write_success=None,
    description_write_success=None,
    scene_comment_write_success=None,
    error_message=None,
    duration_ms=None,
):
    """
    Insert one tagging_events row. Never raises: DB problems are reported to
    stderr and the tagger continues.
    """
    # The caller passes None when no connection could be opened: skip silently.
    if conn is None:
        return
    params = {
        "run_id": str(run_id),
        "file_id": str(file_id) if file_id is not None else None,
        "file_name": file_name,
        "folder_path": folder_path,
        "media_type": media_type,
        "gemini_model": gemini_model,
        "prompt": prompt,
        "raw_response": _jsonable(raw_response),
        "description": description,
        "scenes": _jsonable(scenes),
        "validated_metadata": _jsonable(validated_metadata),
        "metadata_write_success": metadata_write_success,
        "description_write_success": description_write_success,
        "scene_comment_write_success": scene_comment_write_success,
        "status": status,
        "error_message": error_message,
        "duration_ms": duration_ms,
    }
    try:
        # JSONB values are serialised here too, so bad payloads are caught below.
        with conn.cursor() as cur:
            cur.execute(INSERT_SQL, params)
    except Exception as e:
        print(
            f" WARN: DB log_event failed ({type(e).__name__}: {e}); continuing",
            file=sys.stderr,
        )
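Only `log_event` is failure-proof. `get_conn` and `ensure_schema` raise, while `log_event` quietly returns when `conn` is `None`. A minimal sketch of how a tagging loop might combine them follows. It assumes the `"success"`/`"error"` status strings suggested by the module docstring. `open_log_conn` and `analyse_and_log` are hypothetical. The connect, schema, analyse and log callables are injected so the sketch runs without Postgres or Gemini.

```python
import sys
import time
from typing import Any, Callable, Dict, Optional


def open_log_conn(connect: Callable[[], Any], ensure_schema: Callable[[Any], None]) -> Optional[Any]:
    """Call connect() then ensure_schema(conn); on any failure warn and return None."""
    conn = None
    try:
        conn = connect()
        ensure_schema(conn)
        return conn
    except Exception as e:
        print(f" WARN: DB logging disabled ({type(e).__name__}: {e})", file=sys.stderr)
        if conn is not None:
            try:
                conn.close()  # don't leak a connection whose schema setup failed
            except Exception:
                pass
        return None


def analyse_and_log(conn, log_event, analyse, *, run_id: str,
                    file_info: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Time one analysis call and log it as 'success' or 'error' (assumed status values)."""
    started = time.monotonic()
    try:
        result = analyse(file_info)
    except Exception as e:
        elapsed_ms = int((time.monotonic() - started) * 1000)
        log_event(conn, run_id=run_id, status="error", error_message=str(e),
                  duration_ms=elapsed_ms, **file_info)
        return None
    elapsed_ms = int((time.monotonic() - started) * 1000)
    # result carries extra log_event fields, e.g. description and validated_metadata.
    log_event(conn, run_id=run_id, status="success", duration_ms=elapsed_ms, **file_info, **result)
    return result
```

With the real module this would read `conn = open_log_conn(db.get_conn, db.ensure_schema)` followed by `analyse_and_log(conn, db.log_event, ...)`. `file_info` must supply `log_event`'s required keywords (`file_id`, `file_name`, `folder_path`, `media_type`, `gemini_model`). If Postgres is down, `conn` is `None` and the pass carries on unlogged.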