marriott-box-image-video-ta.../scheduler.py
DJP 99e978b895 Dockerize, add Postgres request log, FastAPI + React SPA
Run model: long-running scheduler container (APScheduler) replacing the
systemd timer in Docker deployments. Every Gemini-analysed file is also
persisted to a Postgres `tagging_events` table (run_id, prompt, raw
response, validated metadata, Box-write outcomes, status, error, timing)
for search and audit. Box is still updated exactly as before and remains
the source of truth for "already tagged" — `db.log_event` swallows DB
failures so an outage can't stop a tagging pass.
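
The "swallow DB failures" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `db.py`: the `log_event` signature shown here, the `insert_fn` callable, and the keys in the event dict are all assumptions made for the example.

```python
import sys
from typing import Any, Callable, Mapping


def log_event(insert_fn: Callable[[Mapping[str, Any]], None],
              event: Mapping[str, Any]) -> bool:
    """Try to persist one tagging event; never raise.

    `insert_fn` is a hypothetical stand-in for whatever performs the
    INSERT into `tagging_events`. Returns True if the row was written.
    """
    try:
        insert_fn(event)
        return True
    except Exception as exc:  # deliberately broad: logging must not stop a pass
        print(f"[db] WARN: event not logged ({type(exc).__name__}: {exc})",
              file=sys.stderr)
        return False


def failing_insert(event: Mapping[str, Any]) -> None:
    """Simulates a Postgres outage."""
    raise ConnectionError("database unreachable")


rows = []
ok = log_event(rows.append, {"file_id": "123", "status": "tagged"})
failed = log_event(failing_insert, {"file_id": "456", "status": "tagged"})
```

In this sketch the caller only gets a boolean back, so a tagging pass can carry on writing to Box whether or not the audit row was stored.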

Backend:
- `db.py` + `schema.sql` — append-only `tagging_events` with indexes on
  run_id, file_id, created_at.
- `scheduler.py` — APScheduler BlockingScheduler with `SCHEDULE_CRON`
  (default daily 02:00), `RUN_AT_STARTUP`, SIGTERM handling.
- `api.py` (FastAPI) — `/api/health`, `/api/me`, `/api/events?q=…`
  (single-input search across file_name, folder_path, description,
  status, file_id, validated_metadata::text, raw_response::text,
  scenes::text), `POST /api/runs` (fire-and-forget pass in a background
  thread), `/api/runs`, `/api/runs/{id}/events`. Every event response
  carries a synthesised `box_url`.
- `auth.py` — Azure AD bearer-token validation against the tenant JWKS
  (signature + aud + iss). `DEV_AUTH_BYPASS=true` short-circuits to a
  configurable dev user, mirrored on the frontend by
  `VITE_DEV_AUTH_BYPASS`.
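
The single-input search behind `/api/events?q=…` could be built along these lines. This is a sketch under assumptions rather than the query in `api.py`: the function name `build_search_query`, the `limit` parameter, the `ILIKE` matching, the ordering, and the psycopg-style `%(name)s` placeholders are illustrative; only the table and column names come from the description above.

```python
from typing import List, Tuple

# Columns named in the description above; non-text columns are cast to text.
SEARCH_COLUMNS: List[str] = [
    "file_name", "folder_path", "description", "status", "file_id",
    "validated_metadata::text", "raw_response::text", "scenes::text",
]


def build_search_query(q: str, limit: int = 50) -> Tuple[str, dict]:
    """Return (sql, params) for a case-insensitive substring search."""
    # Escape LIKE wildcards so user input is matched literally
    # (backslash is PostgreSQL's default LIKE escape character).
    escaped = q.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    where = " OR ".join(f"{col} ILIKE %(pattern)s" for col in SEARCH_COLUMNS)
    sql = (
        "SELECT * FROM tagging_events "
        f"WHERE {where} "
        "ORDER BY created_at DESC LIMIT %(limit)s"
    )
    return sql, {"pattern": f"%{escaped}%", "limit": limit}


sql, params = build_search_query("50%_off")
```

The user's text travels only in `params`, never in the SQL string, so the one search box can safely fan out across every column.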

Frontend (Vite + React + TS):
- `frontend/` SPA, Montserrat + black/white/#FFC407 palette.
- @azure/msal-react with the bypass switch (auto-signin when bypass off).
- Search bar across all logged fields, results list with metadata tags,
  status pills, and "Open in Box ↗" links.
- "Run now" button kicks off a tagging pass via `POST /api/runs` and
  polls `/api/runs/{id}/events` every 2 s for live progress.

Docker / compose:
- `docker-compose.yml` pins `name: marriott-tagging`. Three services:
  `db` (postgres:16, named volume, bound to 127.0.0.1 only), `tagger`
  (scheduler.py), `api` (uvicorn). Same image, different `command`.
- `Dockerfile` — python:3.12-slim, non-root user.

Deploy (optical-dev.oliver.solutions):
- `deploy/deploy.sh` — idempotent. Auto-picks free host ports
  (POSTGRES_HOST_PORT 5435-5499, MARRIOTT_API_PORT 8003-8099), renders
  `apache-marriott-tagging.conf` from the .tmpl, builds the SPA in a
  one-shot node:20-alpine container, rsyncs `dist/` to
  `/var/www/html/marriott-tagging/`, polls `/api/health`, and prints the
  shared-vhost Include line.
- `apache-marriott-tagging.conf.tmpl` — proxy `/marriott-tagging/api/`
  to the API container, alias `/marriott-tagging` to the SPA web-root,
  SPA fallback to `index.html`.
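
The port auto-pick step can be illustrated with a short sketch. `deploy/deploy.sh` is a shell script and its actual probing method is not shown here; the Python below, including the name `first_free_port` and the bind-based check, is an assumption used only to show the idea of scanning a range for the first usable port.

```python
import socket
from typing import Optional


def first_free_port(start: int, end: int, host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in [start, end] that can be bound on `host`, or None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use (or not permitted); try the next one
            return port
    return None


# Ranges taken from the deploy description above.
postgres_port = first_free_port(5435, 5499)
api_port = first_free_port(8003, 8099)
```

A probe like this is inherently racy: another process may take the port between the check and the container start, so a deploy script would still need to handle a bind failure.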

systemd unit files left in place for the existing Ubuntu deployment
path; do not run both on the same host (would double-fire the tagger).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 14:56:58 -04:00


"""
Long-running scheduler entrypoint for the Marriott Box tagger Docker container.
Replaces the Ubuntu systemd timer when running under Docker. Fires main.main()
on the schedule in $SCHEDULE_CRON (default: daily 02:00). If $RUN_AT_STARTUP=1,
also fires an immediate one-off pass on boot.
DB schema is bootstrapped once at startup if Postgres is reachable; main() also
re-checks per run, so a temporary DB outage during startup self-heals.
"""
import os
import signal
import sys
import time

from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger
from dotenv import load_dotenv

import db
import main as tagger


def _bootstrap_db():
    """Best-effort: open a connection, apply schema. Failures are logged, not fatal."""
    try:
        conn = db.get_conn()
        try:
            db.ensure_schema(conn)
        finally:
            conn.close()  # release the connection even if ensure_schema fails
        print("[scheduler] Postgres schema ensured.")
    except Exception as e:
        print(f"[scheduler] WARN: could not bootstrap Postgres ({type(e).__name__}: {e}).")


def _run_job():
    print(f"\n[scheduler] Firing tagging pass at {time.strftime('%Y-%m-%d %H:%M:%S %Z')}")
    try:
        tagger.main()
    except SystemExit as e:
        # main() calls sys.exit() on missing credentials; let the scheduler keep running.
        print(f"[scheduler] tagging pass exited with code {e.code} — scheduler stays up.")
    except Exception as e:
        print(f"[scheduler] tagging pass raised {type(e).__name__}: {e} — scheduler stays up.")


def main():
    load_dotenv()
    _bootstrap_db()

    cron_expr = os.getenv("SCHEDULE_CRON", "0 2 * * *").strip()
    tz = os.getenv("TZ")  # IANA zone name; if unset, APScheduler uses the system local zone
    print(f"[scheduler] Cron schedule: '{cron_expr}' (TZ={tz or 'system'})")

    scheduler = BlockingScheduler(timezone=tz) if tz else BlockingScheduler()
    trigger = (
        CronTrigger.from_crontab(cron_expr, timezone=tz)
        if tz
        else CronTrigger.from_crontab(cron_expr)
    )
    scheduler.add_job(
        _run_job,
        trigger,
        id="tagging_pass",
        max_instances=1,          # never run two passes concurrently
        coalesce=True,            # collapse a backlog of missed runs into one
        misfire_grace_time=3600,  # still run if up to an hour late
    )

    # SIGTERM/SIGINT → graceful shutdown
    def _shutdown(signum, frame):
        print(f"[scheduler] Received signal {signum} — shutting down.")
        # shutdown() raises SchedulerNotRunningError if start() has not run yet,
        # e.g. when the signal arrives during the RUN_AT_STARTUP pass.
        if scheduler.running:
            scheduler.shutdown(wait=False)
        sys.exit(0)

    signal.signal(signal.SIGTERM, _shutdown)
    signal.signal(signal.SIGINT, _shutdown)

    # Accept "1", "true", "yes" in any letter case.
    if os.getenv("RUN_AT_STARTUP", "").strip().lower() in ("1", "true", "yes"):
        print("[scheduler] RUN_AT_STARTUP set — firing one pass now.")
        _run_job()

    print("[scheduler] Entering scheduler loop. Ctrl-C / SIGTERM to exit.")
    scheduler.start()


if __name__ == "__main__":
    main()
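
The guard in `_run_job` needs a dedicated `except SystemExit` clause because `SystemExit` derives from `BaseException`, not `Exception`. The standard-library-only sketch below shows the same pattern in isolation; the helper name `run_guarded` and its string return values are hypothetical and exist only for this illustration.

```python
import sys
from typing import Callable


def run_guarded(job: Callable[[], object]) -> str:
    """Run `job` and report how it ended instead of letting it kill the caller."""
    try:
        job()
    except SystemExit as exc:
        # sys.exit() raises SystemExit, which a plain `except Exception`
        # would not catch.
        return f"exit:{exc.code}"
    except Exception as exc:
        return f"error:{type(exc).__name__}"
    return "ok"


results = [
    run_guarded(lambda: None),
    run_guarded(lambda: sys.exit(1)),
    run_guarded(lambda: 1 / 0),
]
```

One consequence of this pattern is that any `sys.exit()` raised while the job runs in the main thread, including one raised from a signal handler, is absorbed by the same clause.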