Marriott Box Asset Tagger
AI-driven metadata tagging for images and videos stored in a Marriott Box folder, with a searchable Postgres audit log and a React SPA on top. Gemini analyses each asset against the marriottUsa metadata template; the resulting structured metadata, description, and (for videos) scene breakdown are written back to Box. Every Gemini call is also persisted to a local Postgres so it can be searched, audited, and re-displayed without round-tripping Box.
What you can do
- Trigger a tagging pass from the SPA's Run now button (admin-only). Walks the configured Box folder, skips files already in the local DB, sends new ones to Gemini, validates against the live Box template schema, writes metadata + description (and scene-breakdown comments for videos) back to Box, and inserts a `tagging_events` row per file Gemini saw.
- Backfill from Box (admin-only). Walks the Box folder and mirrors any existing `marriottUsa` metadata into the local DB (status = `backfilled`). No Gemini calls, no Box writes. Use this after first deploy, after restoring a wiped DB, or to refresh thumbnails. Safe to re-run.
- Search the request log across every text + JSON field (file name, folder path, description, validated metadata, raw Gemini response, scene breakdown, status, file ID, and the consolidated `search_terms` blob). Multi-word queries are AND'd across tokens; each token also fuzzy-matches via `pg_trgm` similarity, so `femalle` still finds "female".
- See thumbnails inline in the search results: Box's pre-generated 160×160 JPG for each file is cached in Postgres (`file_assets.thumbnail_bytes`).
- Click through to Box on every result: the `box_url` is synthesised per row.
- Azure AD SSO for sign-in, with a `DEV_AUTH_BYPASS` switch for local dev and an `ADMIN_EMAILS` allowlist gating the destructive endpoints.
The cron-driven nightly scheduler that used to fire passes automatically has been removed. The tool is manual-only: a human clicks Run now (or POSTs /api/runs). This keeps Box and Gemini API costs predictable as the folder grows. scheduler.py remains in the repo if you want to wire cron back in.
Architecture
Apache (shared vhost on optical-dev.oliver.solutions)
   │
   ├──── /marriott-tagging/api/* ──┐
   │                               ▼
   │                      ┌──────────────────┐
   │                      │ api container    │
   │                      │ (uvicorn,        │
   │                      │  FastAPI)        │
   │                      │                  │
   │                      │ • /api/health    │
   │                      │ • /api/me        │
   │                      │ • /api/events    │
   │                      │ • /api/runs      │ ──┐
   │                      │ • /api/backfill  │ ──┤ background
   │                      │ • /api/files/    │   │ thread runs
   │                      │   {id}/thumb     │   │ main._run_pass
   │                      └──────────────────┘   │ / _run_backfill
   │                               │             │ which call →
   └──── /marriott-tagging/*  ◀────┘             │
         (static SPA from                        ▼
          /var/www/html/                ┌──────────────────┐
          marriott-tagging/)            │ Box API          │
                                        │ Gemini API       │
                                        └──────────────────┘
                                                 │
                                                 ▼
                                        ┌──────────────────┐
                                        │ db container     │
                                        │ (Postgres 16,    │
                                        │  bound to        │
                                        │  127.0.0.1)      │
                                        │                  │
                                        │ • tagging_events │
                                        │ • file_assets    │
                                        └──────────────────┘
Containers: db + api. They share a Docker network and a named volume (marriott-tagging_pgdata).
Outside the container set: Apache (host), built SPA at /var/www/html/marriott-tagging/, the shared vhost include that proxies /marriott-tagging/api/ to the api container.
Repo layout
| Path | Purpose |
|---|---|
| `main.py` | The tagging pipeline: Box client, Gemini calls, validation, Box writes, Postgres logging, thumbnail fetch. `_run_pass(...)` for normal passes; `_run_backfill(...)` for the Box → DB mirror. |
| `api.py` | FastAPI app: search, run-trigger, backfill-trigger, thumbnail-serve. Background threads do the actual tagging/backfill work so the request returns immediately. |
| `auth.py` | Azure AD JWT validation against the tenant JWKS + the `DEV_AUTH_BYPASS` short-circuit. Exposes `require_auth` and `require_admin` FastAPI dependencies. |
| `db.py` | psycopg3 helpers: `get_conn`, `ensure_schema`, `log_event`, `upsert_file_asset`, `get_thumbnail`, `is_file_already_tagged`. Defensive: DB errors never crash a tagging pass. |
| `schema.sql` | `tagging_events`, `file_assets`, indexes, `pg_trgm` extension. Applied idempotently on api startup via the FastAPI lifespan handler. |
| `scheduler.py` | APScheduler entry point, kept for archival / opt-back-in. Not currently used; the compose file no longer wires up a tagger service. |
| `frontend/` | Vite + React + TS SPA. `src/App.tsx` is the main page; `src/auth.tsx` does MSAL with the bypass switch; `src/api.ts` is the client. |
| `Dockerfile` | `python:3.12-slim`, non-root `appuser`. Same image runs the api container (and could run the scheduler if reactivated). |
| `docker-compose.yml` | `name: marriott-tagging` pinned. `db` (postgres:16) + `api` (built from Dockerfile). All host ports bound to 127.0.0.1. |
| `deploy/deploy.sh` | Idempotent server deploy: port auto-pick, git pull, rebuild, SPA build via one-shot `node:20-alpine`, rsync to `/var/www/html/`, `/api/health` poll. |
| `deploy/apache-marriott-tagging.conf.tmpl` | Apache vhost include: proxy `/marriott-tagging/api/`, alias `/marriott-tagging` to the SPA web-root, SPA fallback. `__API_PORT__` rendered by `deploy.sh`. |
| `marriott-tagger.service` / `.timer` | Legacy systemd path. Not used in Docker mode. |
Quick start — local dev (macOS / Linux)
1. Prereqs
- Docker Desktop or Docker Engine with Compose v2
- Node 20+ (for `npm run dev`)
- `box_config.json` in the repo root (JWT config from the Box Developer Console)
- `.env` from `.env.example`
cp .env.example .env
# At minimum: set GEMINI_API_KEY and POSTGRES_PASSWORD
$EDITOR .env
2. Bring up Postgres + API
docker compose up --build -d
This starts:
- `db`: Postgres 16, named volume `pgdata`, host port `127.0.0.1:${POSTGRES_HOST_PORT:-5432}`.
- `api`: `uvicorn api:app`, host port `127.0.0.1:${MARRIOTT_API_PORT:-8004}`.
Check health:
curl -s http://127.0.0.1:8004/api/health | jq
3. Run the SPA
cd frontend
npm install
npm run dev # http://localhost:5173
Vite proxies /api/* to 127.0.0.1:${MARRIOTT_API_PORT:-8004}. With the default VITE_DEV_AUTH_BYPASS=true you're auto-signed-in as the dev user.
4. Try a backfill
In the SPA click Backfill from Box. The active panel polls every 2 s and shows each file as it's processed. Thumbnails appear inline as rows land.
Configuration reference
All variables live in .env (gitignored). .env.example has the full list with comments.
Required to start
| Variable | Purpose |
|---|---|
| `GEMINI_API_KEY` | Google AI Studio key for Gemini calls. |
| `POSTGRES_USER` / `POSTGRES_PASSWORD` / `POSTGRES_DB` | DB creds. The compose file uses these to create the role + database. |
Ports (auto-managed by deploy.sh on the server)
| Variable | Default | Range scanned by deploy.sh |
|---|---|---|
| `POSTGRES_HOST_PORT` | 5432 | 5435 – 5499 |
| `MARRIOTT_API_PORT` | 8004 | 8003 – 8099 |
Both bound to 127.0.0.1 only — Postgres and the FastAPI process are never on the public internet. Apache reverse-proxies to MARRIOTT_API_PORT.
Auth
| Variable | Purpose |
|---|---|
| `DEV_AUTH_BYPASS` | `true` skips MSAL entirely; the api treats every caller as `DEV_AUTH_EMAIL`. Defaults to `true` to keep dev/first-deploy unblocked. |
| `DEV_AUTH_EMAIL` / `DEV_AUTH_NAME` | Identity stamped on requests when bypassed. |
| `DEV_AUTH_IS_ADMIN` | `true` (default) keeps the bypass user as admin; flip to `false` to preview the read-only UX. |
| `AZURE_TENANT_ID` / `AZURE_CLIENT_ID` | Your Azure AD app registration. Backend uses them to validate JWTs (JWKS fetch + aud/iss check). |
| `ADMIN_EMAILS` | Comma-separated allowlist that gates `POST /api/runs` and `POST /api/backfill`. Case-insensitive. Members see the destructive buttons in the SPA; everyone else gets read-only search. |
| `VITE_DEV_AUTH_BYPASS` / `VITE_AZURE_TENANT_ID` / `VITE_AZURE_CLIENT_ID` | Frontend mirrors. Baked into the SPA bundle at build time, so changing them requires a rebuild (`deploy.sh` handles this). |
| `VITE_PUBLIC_BASE` | Used by Vite for the SPA's base (asset prefix) AND by MSAL as the redirect-URI root. In local dev: `http://localhost:5173`. On the server, `deploy.sh` overrides with the prod URL automatically. |
Behavioural
| Variable | Purpose |
|---|---|
| `CORS_ORIGINS` | Comma-separated. Only set in local dev when Vite is on :5173 and FastAPI on host :8004. Empty in prod (Apache makes them same-origin). |
| `TZ` | Container timezone. Defaults to UTC. |
| `SCHEDULE_CRON` / `RUN_AT_STARTUP` | Read by `scheduler.py` only. Unused by default (no scheduler container in compose). |
Pipeline tuning (main.py constants)
Not in .env — edited at the top of main.py:
| Setting | Default | Description |
|---|---|---|
| `BOX_FOLDER_ID` | varies | Root Box folder to scan recursively. |
| `METADATA_TEMPLATE_KEY` | `marriottUsa` | Box metadata template key. |
| `GEMINI_MODEL` | `gemini-2.5-flash` | Model used for both image + video analysis. |
| `EXCLUDED_FOLDER_PREFIXES` | `("z_", "zz_", "zzz_")` | Subfolder names to skip. |
| `GEMINI_DELAY` / `GEMINI_VIDEO_DELAY` | 7 / 10 s | Per-call rate-limit sleep. |
| `MAX_IMAGE_SIZE` | 1000 px | Longest side after resize before sending to Gemini. |
| `VIDEO_SIZE_LIMIT_INLINE` | 20 MB | Below this, Gemini gets the video inline; above, the File API is used. |
| `VIDEO_SOURCE_SIZE_LIMIT` | 5 GB | Skip videos with source file above this. |
| `VIDEO_PROXY_SIZE_LIMIT` | 400 MB | Skip videos with 480p proxy above this. |
| `MAX_FILES_PER_RUN` | 200 | Hard cap on newly-tagged files per pass. |
| `MAX_RUN_DURATION` | 4 h | Hard wall-clock cap per pass. |
| `DESCRIPTION_MAX_LENGTH` | 255 | Box description field char limit. |
| `SKIP_ALREADY_TAGGED` | `True` | Toggles the DB-based skip check. |
| `THUMBNAIL_DIM` | 160 | Pixel dimension for cached thumbnails. |
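
To make the interaction of the two per-pass caps concrete, here is a minimal sketch of a loop that stops at whichever of `MAX_FILES_PER_RUN` or `MAX_RUN_DURATION` is reached first. It is an illustration only, not the code in `main.py`: `run_with_caps`, `tag_one`, the injectable `clock`, and the summary shape are hypothetical, and the duration is assumed to be held in seconds.

```python
import time
from typing import Callable, Iterable

MAX_FILES_PER_RUN = 200          # default listed in the table
MAX_RUN_DURATION = 4 * 60 * 60   # 4 h, assumed here to be stored as seconds


def run_with_caps(
    files: Iterable[str],
    tag_one: Callable[[str], None],
    max_files: int = MAX_FILES_PER_RUN,
    max_seconds: float = MAX_RUN_DURATION,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Tag files until either cap is hit, then return a short summary."""
    started = clock()
    tagged = 0
    stopped_by = "exhausted"  # no cap hit: we simply ran out of files
    for file_id in files:
        if tagged >= max_files:
            stopped_by = "max_files"
            break
        if clock() - started >= max_seconds:
            stopped_by = "max_duration"
            break
        tag_one(file_id)  # hypothetical per-file tagging callback
        tagged += 1
    return {"tagged": tagged, "stopped_by": stopped_by}
```

Files left over when a cap fires are simply not visited, so a later pass can pick them up.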
Operations
Trigger a tagging pass
- From the SPA: click Run now. UI polls live; events stream into the active panel.
- From a shell (works with `DEV_AUTH_BYPASS=true`): `curl -X POST http://127.0.0.1:8004/api/runs`
- From inside the api container (bypasses the API entirely): `docker compose exec api python main.py`
Trigger a backfill
- From the SPA: click Backfill from Box (admin-only; confirms first).
- From a shell: `curl -X POST http://127.0.0.1:8004/api/backfill`
Backfill is idempotent: re-running won't duplicate tagging_events rows, and file_assets rows are upserted (preserving previously-captured thumbnails if today's fetch fails).
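
The "preserve the old thumbnail if today's fetch fails" behaviour can be expressed as an upsert that only overwrites the thumbnail when a new one is supplied. The sketch below shows that idea using Python's built-in `sqlite3` as a stand-in for Postgres so it runs anywhere; the table is cut down to three columns, and `demo_upsert` / `demo_get_thumbnail` are hypothetical names, not the helpers in `db.py`.

```python
import sqlite3

# COALESCE keeps the stored thumbnail when the incoming value is NULL.
UPSERT_SQL = """
INSERT INTO file_assets (file_id, thumbnail_bytes, search_terms)
VALUES (?, ?, ?)
ON CONFLICT (file_id) DO UPDATE SET
    thumbnail_bytes = COALESCE(excluded.thumbnail_bytes, file_assets.thumbnail_bytes),
    search_terms    = excluded.search_terms
"""


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the real Postgres table (simplified columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_assets ("
        "file_id TEXT PRIMARY KEY, thumbnail_bytes BLOB, search_terms TEXT)"
    )
    return conn


def demo_upsert(conn, file_id, thumbnail, search_terms):
    """Insert or update one row; thumbnail=None means the fetch failed."""
    conn.execute(UPSERT_SQL, (file_id, thumbnail, search_terms))
    conn.commit()


def demo_get_thumbnail(conn, file_id):
    row = conn.execute(
        "SELECT thumbnail_bytes FROM file_assets WHERE file_id = ?", (file_id,)
    ).fetchone()
    return row[0] if row else None
```

Running the upsert twice for the same `file_id`, the second time with `thumbnail=None`, leaves one row with the original bytes intact.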
Inspect the DB
docker compose exec db psql -U marriott marriott_tagging
-- Row counts by status
SELECT status, count(*) FROM tagging_events GROUP BY status;
-- Recent events
SELECT created_at, media_type, file_name, status
FROM tagging_events ORDER BY created_at DESC LIMIT 20;
-- Thumbnail coverage
SELECT count(*) AS total,
count(*) FILTER (WHERE thumbnail_bytes IS NOT NULL) AS with_thumb,
avg(octet_length(thumbnail_bytes))::int AS avg_bytes
FROM file_assets;
-- All events for a given run
SELECT file_name, status, error_message
FROM tagging_events
WHERE run_id = '<uuid>'
ORDER BY created_at;
From your laptop (via SSH tunnel — Postgres isn't on the public internet):
ssh -L 55432:127.0.0.1:5435 user@optical-dev.oliver.solutions
psql postgresql://marriott:<password>@127.0.0.1:55432/marriott_tagging
Logs
docker compose logs -f api # API + background tagging/backfill threads
docker compose logs -f db
API reference
All endpoints behind /api. With DEV_AUTH_BYPASS=true no token is needed; with SSO enabled, include Authorization: Bearer <access_token>.
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/api/health` | none | Liveness + DB-reachable check + auth-config summary. |
| GET | `/api/me` | required | `{ oid, name, email, dev, is_admin }`. SPA uses `is_admin` to hide the destructive buttons. |
| GET | `/api/events?q=…&limit=…` | required | Search. Whitespace-tokenises `q`; each token must match (substring OR `pg_trgm` similarity > 0.3) across the searched columns. Results ranked by summed similarity. `limit` 1-500 (default 100). |
| POST | `/api/runs` | admin | Kicks off a tagging pass in a daemon thread. Returns `{ run_id, state: "running", started_by }`. |
| GET | `/api/runs?limit=…` | required | Recent runs from `tagging_events`, grouped by `run_id`, with counts and live state if still running. |
| GET | `/api/runs/{run_id}/events` | required | Per-event detail for a single run. Includes `live_state` (running / completed / failed) and `live_error`. |
| POST | `/api/backfill` | admin | Kicks off a backfill in a daemon thread. Same response shape as `/api/runs`. |
| GET | `/api/files/{file_id}/thumbnail` | required | Streams the cached JPG thumbnail (`Cache-Control: max-age=86400`) or 404. |
Every event in /api/events / /api/runs/{id}/events includes a synthesised box_url (https://app.box.com/file/<file_id>) and a has_thumbnail boolean. The frontend builds the thumbnail URL via thumbnailUrl(file_id) which respects the SPA's base prefix.
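
The search rule for `/api/events` (every token must match, by substring or by trigram similarity above 0.3) can be approximated in plain Python. This is a simplified sketch, not the SQL the API runs: it mimics `pg_trgm`'s padding (two leading spaces, one trailing) on lowercase words, compares each token against individual words of the text rather than whole columns, and the function names are hypothetical.

```python
def trigrams(word: str) -> set[str]:
    """Trigram set of a word, padded the way pg_trgm pads words."""
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams (Jaccard)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def token_matches(token: str, text: str, threshold: float = 0.3) -> bool:
    """Substring hit, or fuzzy hit against any single word of the text."""
    haystack = text.lower()
    needle = token.lower()
    if needle in haystack:
        return True
    return any(similarity(needle, word) > threshold for word in haystack.split())


def matches_query(query: str, text: str) -> bool:
    """AND across whitespace-separated tokens; an empty query matches nothing."""
    tokens = query.split()
    return bool(tokens) and all(token_matches(t, text) for t in tokens)
```

With this definition, `femalle` and `female` share 6 of 9 distinct trigrams, well above the 0.3 threshold, which is why the misspelling still matches.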
Auth setup
Dev / first deploy
Keep DEV_AUTH_BYPASS=true and VITE_DEV_AUTH_BYPASS=true. Every request authenticates as DEV_AUTH_EMAIL, and the dev user is admin by default (toggle DEV_AUTH_IS_ADMIN=false to test the read-only UX).
Enabling Azure AD SSO
- Azure AD app registration (reuse an existing one if you have it).
  - Redirect URIs (Single-page application platform):
    - Local: `http://localhost:5173`
    - Prod: `https://optical-dev.oliver.solutions/marriott-tagging/`
  - Expose an API with scope `access_as_user` whose Application ID URI is `api://<client-id>`.
- Backend `.env` (the api container):

      DEV_AUTH_BYPASS=false
      AZURE_TENANT_ID=<tenant-uuid>
      AZURE_CLIENT_ID=<client-uuid>
      ADMIN_EMAILS=alice@oliver.agency,bob@oliver.agency

- Frontend `.env` (baked into the SPA at build time):

      VITE_DEV_AUTH_BYPASS=false
      VITE_AZURE_TENANT_ID=<tenant-uuid>
      VITE_AZURE_CLIENT_ID=<client-uuid>

- Rebuild + redeploy:

      ./deploy/deploy.sh
      docker compose up -d --force-recreate api
Backend validation: fetches the tenant's JWKS, verifies the RS256 signature, checks aud == AZURE_CLIENT_ID and iss matches one of the tenant issuer URLs. Admin gating: the email claim (preferred_username / upn / email) must match an entry in ADMIN_EMAILS (case-insensitive).
Server deployment — optical-dev.oliver.solutions
Mirrors the OSOP / adeo split-build pattern: backend in Docker, SPA built and served by Apache.
Public URL: https://optical-dev.oliver.solutions/marriott-tagging/
First-time setup
sudo git clone git@bitbucket.org:zlalani/marriott-box-image-video-tagging.git \
/opt/marriott-box-image-video-tagging
sudo chown -R "$USER:$USER" /opt/marriott-box-image-video-tagging
cd /opt/marriott-box-image-video-tagging
cp .env.example .env
$EDITOR .env # fill required values
$EDITOR box_config.json # paste Box JWT config
./deploy/deploy.sh
deploy.sh will:
- Sanity-check `.env`, `box_config.json`, docker, git, compose v2.
- Auto-pick free host ports (`POSTGRES_HOST_PORT` 5435-5499, `MARRIOTT_API_PORT` 8003-8099), persisting choices back to `.env`.
- Render `deploy/apache-marriott-tagging.conf` from the `.tmpl` with the picked api port.
- `git pull --ff-only`, `docker compose build`, `docker compose up -d`.
- Build the Vite SPA in a one-shot `node:20-alpine` container (with `VITE_PUBLIC_BASE=https://optical-dev.oliver.solutions/marriott-tagging`), rsync `frontend/dist/` to `/var/www/html/marriott-tagging/`.
- Poll `/api/health` until ready; verify the api container is running.
- Print the Apache `Include` line to add to the shared vhost.
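
`deploy.sh` itself is a shell script, but the port auto-pick it performs can be pictured as a first-free scan over a range. The Python sketch below illustrates that idea only and does not reproduce the script's logic; `port_is_free` and `pick_port` are hypothetical, and "free" is assumed to mean "bindable on 127.0.0.1 right now".

```python
import socket
from typing import Callable, Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; a failed bind means something already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_port(
    low: int,
    high: int,
    is_free: Callable[[int], bool] = port_is_free,
) -> Optional[int]:
    """Return the first free port in [low, high], or None if all are taken."""
    for port in range(low, high + 1):
        if is_free(port):
            return port
    return None
```

For example, `pick_port(8003, 8099)` would scan the api range from the table in the configuration reference; the predicate is injectable so the scan can be exercised without touching real sockets.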
One-time vhost step (manual)
Add the following line inside the <VirtualHost> block (before the closing </VirtualHost>) of /etc/apache2/sites-enabled/optical-dev.oliver.solutions.conf:
Include /opt/marriott-box-image-video-tagging/deploy/apache-marriott-tagging.conf
Then:
sudo apachectl configtest && sudo systemctl reload apache2
The deploy script intentionally does NOT touch the shared vhost — it's shared across many apps, and a per-app script editing it risks breaking others.
Re-deploying
cd /opt/marriott-box-image-video-tagging
./deploy/deploy.sh
Flags:
| Flag | Effect |
|---|---|
| `--no-pull` | Skip `git pull` (deploy whatever is in the working tree). |
| `--no-build` | Skip `docker compose build` (faster when only env / config changed). |
| `--no-frontend` | Skip Vite build + SPA sync. |
| `--run-now` | Also `POST /api/runs` to fire a tagging pass immediately (only works with `DEV_AUTH_BYPASS=true`). |
| `--logs` | Tail api logs after deploy. |
Common follow-ups
- Code changed but container kept the old image: `docker compose up -d --build --force-recreate api`.
- SPA changed but you don't want to rebuild the Python image: `./deploy/deploy.sh --no-build`.
- Schema added/changed: the api lifespan handler runs `ensure_schema` on startup, so a recreated api container applies it. New tables / indexes / extensions land automatically.
Database schema
tagging_events (append-only log)
One row per file the tagger sent to Gemini OR mirrored from Box. Skipped-as-already-tagged files are not logged.
| Column | Type | Notes |
|---|---|---|
| `id` | bigserial PK | |
| `run_id` | uuid NOT NULL | UUID per tagging/backfill pass; groups rows belonging to one run. |
| `created_at` | timestamptz NOT NULL | |
| `file_id`, `file_name`, `folder_path` | text | Box identifiers + display. |
| `media_type` | text | `image` or `video`. |
| `gemini_model` | text | E.g. `gemini-2.5-flash`. |
| `prompt` | text | Full prompt sent to Gemini (null for backfilled rows). |
| `raw_response` | jsonb | Untouched Gemini response (null for backfilled rows). |
| `description` | text | Description written to Box. |
| `scenes` | jsonb | Video scene breakdown. |
| `validated_metadata` | jsonb | Cleaned dict actually written to Box. |
| `metadata_write_success`, `description_write_success`, `scene_comment_write_success` | boolean | Per Box write. |
| `status` | text | `success`, `backfilled`, `gemini_error`, `validation_error`, `metadata_write_error`. |
| `error_message` | text | Free-form error if status is an error. |
| `duration_ms` | int | Gemini-call elapsed time (null for backfilled rows). |
Indexes: run_id, file_id, created_at DESC.
file_assets (per-file state)
One row per Box file_id, upserted by both the tagging pass and backfill.
| Column | Type | Notes |
|---|---|---|
| `file_id` | text PK | Matches `tagging_events.file_id`. |
| `thumbnail_bytes` | bytea | Box's 160×160 JPG. ~10-20 KB. |
| `thumbnail_content_type` | text | E.g. `image/jpeg`. |
| `thumbnail_size` | int | 160 today. |
| `search_terms` | text | Lowercased, whitespace-normalised text blob: file_name + folder + description + metadata values. |
| `updated_at` | timestamptz | |
Index: updated_at DESC. Extension: pg_trgm (for fuzzy search via similarity()).
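
As an illustration of how the `search_terms` blob might be assembled from the pieces listed in the table, here is a small sketch. The function name, the flattening of list-valued metadata, and the skipping of empty values are assumptions; only the "lowercased, whitespace-normalised" shape comes from the schema notes.

```python
def build_search_terms(
    file_name: str,
    folder_path: str,
    description: str,
    metadata: dict,
) -> str:
    """Join name, folder, description and metadata values into one blob."""
    parts = [file_name, folder_path, description]
    for value in metadata.values():
        if isinstance(value, (list, tuple)):
            # Multi-select fields: include every selected option.
            parts.extend(str(item) for item in value)
        elif value is not None:
            parts.append(str(value))
    joined = " ".join(part for part in parts if part)
    # split()/join collapses runs of whitespace into single spaces.
    return " ".join(joined.lower().split())
```

Keeping one pre-normalised text column per file means a search token only has to be compared against a single string per row instead of several columns.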
Troubleshooting
Blank page at the deployed URL
Asset paths baked with the wrong base. View-source the page; if the <script> tag reads src="/assets/..." instead of src="/marriott-tagging/assets/...", your VITE_PUBLIC_BASE was misset at build time. deploy.sh now overrides this with the prod URL automatically — git pull && ./deploy/deploy.sh --no-build rebuilds the SPA.
404 on a new API endpoint
The api container is running an old image. Force a recreate:
docker compose up -d --build --force-recreate api
500 on search
Usually pg_trgm extension missing. The api lifespan handler installs it on startup, but a stale running container might not have re-applied schema:
docker compose exec db psql -U marriott marriott_tagging \
-c "CREATE EXTENSION IF NOT EXISTS pg_trgm;"
Or just docker compose up -d --force-recreate api.
"Run now" did nothing visible
Probably the background thread crashed during init. Check api logs:
docker compose logs api --tail 60
Common causes:
- `box_config.json` not mounted into the api container: confirm with `docker compose exec api ls -la /app/box_config.json`. The compose file bind-mounts `./box_config.json`; if it didn't exist when compose came up, no mount.
- `GEMINI_API_KEY` empty in the api container: `docker compose exec api printenv GEMINI_API_KEY`.
- Every file already has metadata in Box / the DB: the pass completes silently with `0 tagged`.
Postgres host-port conflict
deploy.sh scans 5435-5499. If the host already has other services listening across that whole range, raise the upper bound in deploy.sh or set POSTGRES_HOST_PORT manually in .env.
Legacy: systemd deployment (Ubuntu)
The marriott-tagger.service / .timer unit files are kept in the repo for a pre-Docker deployment path that runs main.py directly via a systemd timer. Don't run this alongside the Docker deploy on the same host — both will fire passes and double-write to Box.
Setup
sudo apt update
sudo apt install -y git python3 python3-venv python3-pip
sudo mkdir -p /opt/marriott-box-image-video-tagging
sudo chown $USER:$USER /opt/marriott-box-image-video-tagging
git clone git@bitbucket.org:zlalani/marriott-box-image-video-tagging.git /opt/marriott-box-image-video-tagging
cd /opt/marriott-box-image-video-tagging
sudo useradd --system --shell /usr/sbin/nologin --home-dir /opt/marriott-box-image-video-tagging marriott-tagger
sudo chown -R marriott-tagger:marriott-tagger /opt/marriott-box-image-video-tagging
# Drop credentials
sudo -u marriott-tagger tee /opt/marriott-box-image-video-tagging/box_config.json > /dev/null < /path/to/local/box_config.json
sudo -u marriott-tagger tee /opt/marriott-box-image-video-tagging/.env > /dev/null <<'EOF'
GEMINI_API_KEY=your_key_here
EOF
sudo chmod 600 /opt/marriott-box-image-video-tagging/box_config.json /opt/marriott-box-image-video-tagging/.env
# venv
sudo -u marriott-tagger python3 -m venv /opt/marriott-box-image-video-tagging/env
sudo -u marriott-tagger /opt/marriott-box-image-video-tagging/env/bin/pip install -r /opt/marriott-box-image-video-tagging/requirements.txt
# Install + enable
sudo cp marriott-tagger.service marriott-tagger.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now marriott-tagger.timer
In this mode there's no Postgres, no SPA, and no api: just main.py fired by a systemd timer. Logging to tagging_events requires DATABASE_URL to be set in .env; otherwise db.log_event no-ops gracefully and you lose the audit log.
How the tagging pipeline works
- Dynamic prompt: Gemini's prompt is built at runtime from the live Box template definition (`fetch_template_schema`). Field additions / option changes propagate automatically.
- Metadata + description: Each file gets structured metadata (filterable in Box search) and a short description (visible in Box list views, also indexed by Box search).
- Search-keyword tail: Descriptions are formatted as `<summary>. <comma-separated keywords>.` The tail covers synonyms / broader terms (food/dining/eating/meal/restaurant) so a Box search for "Food" still hits assets tagged with enum value `Dining`.
- Video scene breakdown: Videos additionally get a timestamped scene breakdown written as a comment on the Box file, giving a high-level chapter map for finding moments inside long videos.
- DB-based skip: Once a file has a `success` or `backfilled` row, the next pass skips it locally (no Box call, no Gemini call). Run Backfill from Box once to mirror Box's existing metadata into the local DB before relying on this.
- Validation: Gemini output is validated against the template schema. Invalid enum values are dropped, and multi-select arrays are filtered to allowed options only.
- Large-video gating: Videos exceeding the source or proxy size limits are skipped cleanly rather than wasting time / API budget. Skips are reported as `skipped`, not `errored`.
- Per-run limiter: A run tags at most `MAX_FILES_PER_RUN` files and runs for at most `MAX_RUN_DURATION` of wall-clock time. Whichever cap hits first, the run exits cleanly with a summary; the next run picks up the remaining untagged files. This keeps a sudden 1000-file upload from blowing through your Gemini budget in one click.
- Thumbnail cache: After a successful tag (or as part of backfill), the file's 160×160 JPG is fetched from Box and stored in `file_assets.thumbnail_bytes`. The SPA renders it inline in search results; `Cache-Control: private, max-age=86400` means the browser caches it for a day.
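
The validation step in the list above (drop invalid enum values, filter multi-select arrays to allowed options) can be sketched as follows. The schema here is a simplified, hypothetical dict keyed by field name rather than the raw Box template response, `validate_metadata` is not the function in `main.py`, and dropping fields that are absent from the schema is an assumption.

```python
def validate_metadata(raw: dict, schema: dict) -> dict:
    """Return only the values that the template schema allows."""
    clean = {}
    for key, value in raw.items():
        field = schema.get(key)
        if field is None:
            continue  # assumed: fields not in the template are dropped
        if field["type"] == "enum":
            if value in field["options"]:
                clean[key] = value
        elif field["type"] == "multiSelect":
            if isinstance(value, list):
                kept = [item for item in value if item in field["options"]]
                if kept:
                    clean[key] = kept
        else:
            clean[key] = value  # free-form types pass through unchanged
    return clean


# Hypothetical two-field template, for illustration only.
EXAMPLE_SCHEMA = {
    "activity": {"type": "enum", "options": ["Dining", "Swimming"]},
    "people": {"type": "multiSelect", "options": ["Adult", "Child"]},
    "notes": {"type": "string"},
}
```

Because the cleaned dict contains only allowed values, a single hallucinated option from the model costs one field rather than the whole metadata write.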
Credentials & files NOT in git
- `box_config.json`: Box JWT config. Bind-mounted read-only into the api container.
- `.env`: All env vars including `GEMINI_API_KEY`, `POSTGRES_PASSWORD`, `AZURE_CLIENT_ID`, etc.
- `deploy/apache-marriott-tagging.conf`: generated by `deploy.sh` from the `.tmpl`.
- `frontend/node_modules/`, `frontend/dist/`: npm install / Vite build artefacts.
.env.example is checked in; copy it to .env and fill in.