Run model: long-running scheduler container (APScheduler) replacing the
systemd timer in Docker deployments. Every Gemini-analysed file is also
persisted to a Postgres `tagging_events` table (run_id, prompt, raw
response, validated metadata, Box-write outcomes, status, error, timing)
for search and audit. Box is still updated exactly as before and remains
the source of truth for "already tagged" — `db.log_event` swallows DB
failures so an outage can't stop a tagging pass.

Backend:
- `db.py` + `schema.sql` — append-only `tagging_events` with indexes on
run_id, file_id, created_at.
- `scheduler.py` — APScheduler BlockingScheduler with `SCHEDULE_CRON`
(default daily 02:00), `RUN_AT_STARTUP`, SIGTERM handling.
- `api.py` (FastAPI) — `/api/health`, `/api/me`, `/api/events?q=…`
(single-input search across file_name, folder_path, description,
status, file_id, validated_metadata::text, raw_response::text,
scenes::text), `POST /api/runs` (fire-and-forget pass in a background
thread), `/api/runs`, `/api/runs/{id}/events`. Every event response
carries a synthesised `box_url`.
- `auth.py` — Azure AD bearer-token validation against the tenant JWKS
(signature + aud + iss). `DEV_AUTH_BYPASS=true` short-circuits to a
configurable dev user, mirrored on the frontend by
`VITE_DEV_AUTH_BYPASS`.
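
One way the single-input search behind `/api/events?q=…` could be assembled is sketched below. The column list comes from the description above. The use of `ILIKE`, the `build_search_query` name, and the `limit` parameter are assumptions made for illustration, not the actual contents of `api.py`.

```python
# Columns named in the commit message; the ILIKE strategy is an assumption.
SEARCH_COLUMNS = [
    "file_name", "folder_path", "description", "status", "file_id",
    "validated_metadata::text", "raw_response::text", "scenes::text",
]


def build_search_query(q, limit=50):
    """Return (sql, params) for a one-box search over tagging_events.

    Hypothetical helper: an empty or blank q lists the newest events.
    """
    sql = "SELECT * FROM tagging_events"
    params = {"limit": limit}
    term = (q or "").strip()
    if term:
        # The same named placeholder is reused for every column.
        clauses = " OR ".join(
            f"{col} ILIKE %(pattern)s" for col in SEARCH_COLUMNS
        )
        sql += f" WHERE {clauses}"
        params["pattern"] = f"%{term}%"
    sql += " ORDER BY created_at DESC LIMIT %(limit)s"
    return sql, params
```

The search term only ever travels as a bound parameter, so user input is never interpolated into the SQL text. `LIKE` wildcards typed by the user (`%`, `_`) are not escaped in this sketch.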

Frontend (Vite + React + TS):
- `frontend/` SPA, Montserrat + black/white/#FFC407 palette.
- @azure/msal-react sign-in gated by the bypass switch (auto sign-in when
the bypass is off).
- Search bar across all logged fields, results list with metadata tags,
status pills, and "Open in Box ↗" links.
- "Run now" button kicks off a tagging pass via `POST /api/runs` and
polls `/api/runs/{id}/events` every 2 s for live progress.

Docker / compose:
- `docker-compose.yml` pins `name: marriott-tagging`. Three services:
`db` (postgres:16, named volume, bound to 127.0.0.1 only), `tagger`
(scheduler.py), `api` (uvicorn). Same image, different `command`.
- `Dockerfile` — python:3.12-slim, non-root user.

Deploy (optical-dev.oliver.solutions):
- `deploy/deploy.sh` — idempotent. Auto-picks free host ports
(POSTGRES_HOST_PORT 5435-5499, MARRIOTT_API_PORT 8003-8099), renders
`apache-marriott-tagging.conf` from the .tmpl, builds the SPA in a
one-shot node:20-alpine container, rsyncs `dist/` to
`/var/www/html/marriott-tagging/`, polls `/api/health`, and prints the
shared-vhost Include line.
- `apache-marriott-tagging.conf.tmpl` — proxy `/marriott-tagging/api/`
to the API container, alias `/marriott-tagging` to the SPA web-root,
SPA fallback to `index.html`.
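
The "auto-picks free host ports" step can be illustrated with a short Python sketch. `deploy/deploy.sh` is a shell script and its actual probing method is not shown here, so the `first_free_port` helper and the bind-based check below are assumptions. Only the port ranges are taken from the description above.

```python
import socket


def first_free_port(start, end, host="127.0.0.1"):
    """Return the first port in [start, end] that can be bound on host.

    Hypothetical helper: probes by binding, which fails with OSError
    when another process already holds the port.
    """
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is taken, try the next one
            return port  # the probe socket is closed on exit
    raise RuntimeError(f"no free port in {start}-{end}")


if __name__ == "__main__":
    # Ranges from the commit message.
    print("POSTGRES_HOST_PORT:", first_free_port(5435, 5499))
    print("MARRIOTT_API_PORT:", first_free_port(8003, 8099))
```

A probe like this is inherently racy: another process can take the port between the check and the container start, so a re-run of the deploy may be needed if that happens.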

The systemd unit files are left in place for the existing Ubuntu deployment
path; do not run the systemd timer and the scheduler container on the same
host, or the tagger will fire twice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
112 lines
3.1 KiB
Python
"""
Postgres logging for the Marriott Box tagger.

One row per file Gemini was called on (success or error). The DB is auxiliary —
all functions swallow exceptions and print to stderr so a Postgres outage cannot
stop the tagging pass. Box remains the source of truth.
"""

import json
import os
import sys
from pathlib import Path

import psycopg
from psycopg.types.json import Jsonb

SCHEMA_PATH = Path(__file__).parent / "schema.sql"

INSERT_SQL = """
INSERT INTO tagging_events (
    run_id, file_id, file_name, folder_path, media_type, gemini_model,
    prompt, raw_response, description, scenes, validated_metadata,
    metadata_write_success, description_write_success, scene_comment_write_success,
    status, error_message, duration_ms
) VALUES (
    %(run_id)s, %(file_id)s, %(file_name)s, %(folder_path)s, %(media_type)s, %(gemini_model)s,
    %(prompt)s, %(raw_response)s, %(description)s, %(scenes)s, %(validated_metadata)s,
    %(metadata_write_success)s, %(description_write_success)s, %(scene_comment_write_success)s,
    %(status)s, %(error_message)s, %(duration_ms)s
)
"""


def _dsn():
    dsn = os.getenv("DATABASE_URL")
    if not dsn:
        raise RuntimeError("DATABASE_URL not set")
    return dsn


def get_conn():
    """Open a Postgres connection. Caller owns close()."""
    return psycopg.connect(_dsn(), autocommit=True)


def ensure_schema(conn):
    """Apply schema.sql idempotently."""
    sql = SCHEMA_PATH.read_text()
    with conn.cursor() as cur:
        cur.execute(sql)


def _jsonable(value):
    if value is None:
        return None
    return Jsonb(value)


def log_event(
    conn,
    *,
    run_id,
    file_id,
    file_name,
    folder_path,
    media_type,
    gemini_model,
    status,
    prompt=None,
    raw_response=None,
    description=None,
    scenes=None,
    validated_metadata=None,
    metadata_write_success=None,
    description_write_success=None,
    scene_comment_write_success=None,
    error_message=None,
    duration_ms=None,
):
    """
    Insert one tagging_events row. Never raises — DB problems are reported to stderr
    and the tagger continues.
    """
    if conn is None:
        return
    params = {
        "run_id": str(run_id),
        "file_id": str(file_id) if file_id is not None else None,
        "file_name": file_name,
        "folder_path": folder_path,
        "media_type": media_type,
        "gemini_model": gemini_model,
        "prompt": prompt,
        "raw_response": _jsonable(raw_response),
        "description": description,
        "scenes": _jsonable(scenes),
        "validated_metadata": _jsonable(validated_metadata),
        "metadata_write_success": metadata_write_success,
        "description_write_success": description_write_success,
        "scene_comment_write_success": scene_comment_write_success,
        "status": status,
        "error_message": error_message,
        "duration_ms": duration_ms,
    }
    try:
        with conn.cursor() as cur:
            cur.execute(INSERT_SQL, params)
    except Exception as e:
        print(
            f"  WARN: DB log_event failed ({type(e).__name__}: {e}) — continuing",
            file=sys.stderr,
        )
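
Note that `get_conn()` itself can raise (for example when `DATABASE_URL` is unset), while `log_event` silently returns when `conn` is `None`. A caller that wants the "DB outage never stops a pass" guarantee at connect time could therefore wrap the connect step as sketched below. `open_conn_or_none` is a hypothetical helper, not part of `db.py`, and how the tagger actually opens its connection is not shown in this file.

```python
import sys


def open_conn_or_none(connect):
    """Call connect() and return its result, or None if it raises.

    Hypothetical helper: `connect` is any zero-argument callable,
    e.g. db.get_conn. A None result makes log_event a no-op.
    """
    try:
        return connect()
    except Exception as e:
        print(
            f"  WARN: DB unavailable ({type(e).__name__}: {e}); "
            "event logging disabled for this run",
            file=sys.stderr,
        )
        return None
```

Under these assumptions the tagger would call `conn = open_conn_or_none(db.get_conn)` once per run and pass `conn` to every `db.log_event(...)` call, whether or not the connection succeeded.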