Run model: long-running scheduler container (APScheduler) replacing the
systemd timer in Docker deployments. Every Gemini-analysed file is also
persisted to a Postgres `tagging_events` table (run_id, prompt, raw
response, validated metadata, Box-write outcomes, status, error, timing)
for search and audit. Box is still updated exactly as before and remains
the source of truth for "already tagged" — `db.log_event` swallows DB
failures so an outage can't stop a tagging pass.
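
The swallow-on-failure behaviour can be sketched as below. This is only an illustration: the real `db.log_event` signature is not shown in this commit, so `log_event_best_effort` and the `insert_event` callable (standing in for the Postgres write) are hypothetical names.

```python
from typing import Any, Callable, Mapping


def log_event_best_effort(
    insert_event: Callable[[Mapping[str, Any]], None],
    event: Mapping[str, Any],
) -> bool:
    """Try to persist one event; report failure instead of raising."""
    try:
        insert_event(event)
        return True
    except Exception as exc:  # deliberately broad: logging must never stop a pass
        print(f"[db] WARN: event not logged ({type(exc).__name__}: {exc})")
        return False


def _failing_insert(event: Mapping[str, Any]) -> None:
    # Hypothetical stand-in for a write against an unreachable database.
    raise ConnectionError("postgres unreachable")


stored: list = []
ok = log_event_best_effort(stored.append, {"file_id": "123", "status": "ok"})
failed = log_event_best_effort(_failing_insert, {"file_id": "456", "status": "ok"})
```

Returning a boolean rather than raising lets the caller carry on with the Box write either way, which matches Box staying the source of truth.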

Backend:
- `db.py` + `schema.sql` — append-only `tagging_events` with indexes on
run_id, file_id, created_at.
- `scheduler.py` — APScheduler BlockingScheduler with `SCHEDULE_CRON`
(default daily 02:00), `RUN_AT_STARTUP`, SIGTERM handling.
- `api.py` (FastAPI) — `/api/health`, `/api/me`, `/api/events?q=…`
(single-input search across file_name, folder_path, description,
status, file_id, validated_metadata::text, raw_response::text,
scenes::text), `POST /api/runs` (fire-and-forget pass in a background
thread), `/api/runs`, `/api/runs/{id}/events`. Every event response
carries a synthesised `box_url`.
- `auth.py` — Azure AD bearer-token validation against the tenant JWKS
(signature + aud + iss). `DEV_AUTH_BYPASS=true` short-circuits to a
configurable dev user, mirrored on the frontend by
`VITE_DEV_AUTH_BYPASS`.
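
The single-input search could be assembled roughly as follows. This is a sketch, not the query in `api.py`: the column list is taken from the bullet above, while the function name, the `ILIKE` matching, the ordering, the `LIMIT`, and the psycopg-style `%(q)s` placeholders are assumptions.

```python
# Columns named in the /api/events description; these are fixed constants,
# never user input, so joining them into the SQL text is safe.
SEARCH_COLUMNS = [
    "file_name", "folder_path", "description", "status", "file_id",
    "validated_metadata::text", "raw_response::text", "scenes::text",
]


def build_search_query(q: str, limit: int = 50) -> tuple[str, dict]:
    """Return (sql, params) matching q as a substring of any searchable column."""
    # Escape LIKE wildcards so a literal % or _ in the search box matches itself.
    escaped = q.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    where = " OR ".join(f"{col} ILIKE %(q)s" for col in SEARCH_COLUMNS)
    sql = (
        "SELECT * FROM tagging_events "
        f"WHERE {where} "
        "ORDER BY created_at DESC LIMIT %(limit)s"
    )
    # The user's text travels only as a bound parameter, never inside the SQL string.
    return sql, {"q": f"%{escaped}%", "limit": limit}


sql, params = build_search_query("50%_off")
```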

Frontend (Vite + React + TS):
- `frontend/` SPA, Montserrat + black/white/#FFC407 palette.
- `@azure/msal-react` sign-in behind the bypass switch (auto sign-in when
the bypass is off).
- Search bar across all logged fields, results list with metadata tags,
status pills, and "Open in Box ↗" links.
- "Run now" button kicks off a tagging pass via `POST /api/runs` and
polls `/api/runs/{id}/events` every 2 s for live progress.

Docker / compose:
- `docker-compose.yml` pins `name: marriott-tagging`. Three services:
`db` (postgres:16, named volume, bound to 127.0.0.1 only), `tagger`
(scheduler.py), `api` (uvicorn). Same image, different `command`.
- `Dockerfile` — python:3.12-slim, non-root user.

Deploy (optical-dev.oliver.solutions):
- `deploy/deploy.sh` — idempotent. Auto-picks free host ports
(POSTGRES_HOST_PORT 5435-5499, MARRIOTT_API_PORT 8003-8099), renders
`apache-marriott-tagging.conf` from the .tmpl, builds the SPA in a
one-shot node:20-alpine container, rsyncs `dist/` to
`/var/www/html/marriott-tagging/`, polls `/api/health`, and prints the
shared-vhost Include line.
- `apache-marriott-tagging.conf.tmpl` — proxy `/marriott-tagging/api/`
to the API container, alias `/marriott-tagging` to the SPA web-root,
SPA fallback to `index.html`.
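
The port auto-selection could look like the following. `deploy/deploy.sh` is a shell script whose probing method is not shown here, so this Python sketch only illustrates the idea: try to bind each port in the range on 127.0.0.1 and keep the first one that succeeds. The function name is hypothetical; the ranges in the usage note are the ones listed above.

```python
import socket
from typing import Optional


def first_free_port(start: int, end: int, host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in [start, end] that can be bound on host, else None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # already in use (or not permitted); try the next one
            return port  # the probe socket is closed on the way out
    return None
```

For example, `first_free_port(5435, 5499)` would stand in for the POSTGRES_HOST_PORT pick and `first_free_port(8003, 8099)` for MARRIOTT_API_PORT. A probe like this is inherently racy: another process can take the port between the check and the container start, so a failed start should still be handled.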

systemd unit files are left in place for the existing Ubuntu deployment
path. Do not run both on the same host: the tagger would fire twice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
82 lines
2.7 KiB
Python
"""
Long-running scheduler entrypoint for the Marriott Box tagger Docker container.

Replaces the Ubuntu systemd timer when running under Docker. Fires main.main()
on the schedule in $SCHEDULE_CRON (default: daily 02:00). If $RUN_AT_STARTUP=1,
also fires an immediate one-off pass on boot.

DB schema is bootstrapped once at startup if Postgres is reachable; main() also
re-checks per run, so a temporary DB outage during startup self-heals.
"""

import os
import signal
import sys
import time

from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger
from dotenv import load_dotenv

import db
import main as tagger


def _bootstrap_db():
    """Best-effort: open a connection, apply schema. Failures are logged, not fatal."""
    try:
        conn = db.get_conn()
        db.ensure_schema(conn)
        conn.close()
        print("[scheduler] Postgres schema ensured.")
    except Exception as e:
        print(f"[scheduler] WARN: could not bootstrap Postgres ({type(e).__name__}: {e}).")


def _run_job():
    print(f"\n[scheduler] Firing tagging pass at {time.strftime('%Y-%m-%d %H:%M:%S %Z')}")
    try:
        tagger.main()
    except SystemExit as e:
        # main() calls sys.exit() on missing credentials; let the scheduler keep running.
        print(f"[scheduler] tagging pass exited with code {e.code} — scheduler stays up.")
    except Exception as e:
        print(f"[scheduler] tagging pass raised {type(e).__name__}: {e} — scheduler stays up.")


def main():
    load_dotenv()
    _bootstrap_db()

    cron_expr = os.getenv("SCHEDULE_CRON", "0 2 * * *").strip()
    tz = os.getenv("TZ")  # apscheduler reads tzinfo; if unset uses system local
    print(f"[scheduler] Cron schedule: '{cron_expr}' (TZ={tz or 'system'})")

    scheduler = BlockingScheduler(timezone=tz) if tz else BlockingScheduler()
    scheduler.add_job(
        _run_job,
        CronTrigger.from_crontab(cron_expr, timezone=tz) if tz else CronTrigger.from_crontab(cron_expr),
        id="tagging_pass",
        max_instances=1,
        coalesce=True,
        misfire_grace_time=3600,
    )

    # SIGTERM/SIGINT → graceful shutdown
    def _shutdown(signum, frame):
        print(f"[scheduler] Received signal {signum} — shutting down.")
        scheduler.shutdown(wait=False)
        sys.exit(0)
    signal.signal(signal.SIGTERM, _shutdown)
    signal.signal(signal.SIGINT, _shutdown)

    if os.getenv("RUN_AT_STARTUP", "").strip() in ("1", "true", "yes"):
        print("[scheduler] RUN_AT_STARTUP set — firing one pass now.")
        _run_job()

    print("[scheduler] Entering scheduler loop. Ctrl-C / SIGTERM to exit.")
    scheduler.start()


if __name__ == "__main__":
    main()
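
The two `except` clauses in `_run_job` are both needed because `SystemExit` derives from `BaseException`, not `Exception`. The stand-alone sketch below shows the pattern in isolation; `run_guarded` and the two sample jobs are hypothetical names used only for illustration and are not part of `scheduler.py`.

```python
import sys


def run_guarded(job) -> str:
    """Run job() and describe how it ended without letting it kill the caller."""
    try:
        job()
        return "ok"
    except SystemExit as exc:
        # SystemExit derives from BaseException, so `except Exception` alone would miss it.
        return f"exit:{exc.code}"
    except Exception as exc:
        return f"error:{type(exc).__name__}"


def _missing_credentials():
    # Hypothetical job that bails out the way main() is described as doing.
    sys.exit(1)


def _boom():
    # Hypothetical job that fails with an ordinary exception.
    raise ValueError("bad input")


results = [run_guarded(_missing_credentials), run_guarded(_boom), run_guarded(lambda: None)]
```

One consequence worth keeping in mind: any `sys.exit()` raised while a guarded job is on the stack is absorbed the same way, whatever its origin.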