marriott-box-image-video-ta.../schema.sql
DJP 04440d661d Cache Box thumbnails + search blob; render in UI
Search results were text-only — hard to scan visually with thousands
of assets coming. Now every file Gemini-tags or backfill mirrors also
gets its Box-generated 160x160 JPG thumbnail (~10-20 KB) pulled and
stashed in Postgres, plus a consolidated `search_terms` blob
(file_name + folder + description + flattened metadata values).
Search results render the thumbnail inline; rows missing one show a
striped placeholder. Search SQL now LEFT JOINs file_assets and hits
search_terms too, so backfilled rows are properly searchable.

- schema.sql: new `file_assets` table (file_id PK, thumbnail_bytes
  bytea, search_terms text, updated_at). idempotent.
- db.py: `upsert_file_asset` (INSERT … ON CONFLICT preserving
  existing thumbnail bytes if today's fetch failed) and
  `get_thumbnail`. Both swallow exceptions per the established
  defensive pattern.
- main.py: `fetch_thumbnail` (Box SDK get_file_thumbnail_by_id, JPG
  at 160 px, handles BoxAPIError 202/404 as soft misses) and
  `build_search_terms` (lowercase, whitespace-collapsed text blob).
  `_persist_file_asset` wires both into the image+video success
  paths of `_run_pass` and into every iteration of `_run_backfill`.
- Backfill skip logic refined: always upsert file_assets (idempotent
  PK), only skip the tagging_events insert if a good row already
  exists. Re-running Backfill from Box populates thumbnails for
  rows backfilled before this feature shipped.
- api.py: `GET /api/files/{file_id}/thumbnail` streams the bytea
  with Cache-Control max-age=86400. Search SQL gains the LEFT JOIN
  and emits `has_thumbnail` per row. Search also matches against
  fa.search_terms so backfilled rows surface for free-text queries
  that hit their metadata.
- frontend: Event type adds `has_thumbnail`; `thumbnailUrl(fileId)`
  helper builds the prefix-aware URL via Vite's BASE. EventList
  renders the thumbnail (lazy, with onError fallback) or a striped
  placeholder. .thumb styling + .event-head layout in styles.css.

Verified locally: schema applies via lifespan; upsert + get_thumbnail
roundtrip; /api/files/999/thumbnail returns 200 with bytes; /api/events
returns has_thumbnail per row; multi-token "female city" search finds
a row whose validated_metadata contains both tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 16:20:13 -04:00

45 lines
2.1 KiB
SQL

-- Marriott Box Tagger — request log
-- One row per file the tagger sent to Gemini (success or error).
-- Skipped-as-already-tagged files do not produce rows.
-- pg_trgm powers the fuzzy `similarity()` call in /api/events. Idempotent.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE TABLE IF NOT EXISTS tagging_events (
id BIGSERIAL PRIMARY KEY,
run_id UUID NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
file_id TEXT NOT NULL,
file_name TEXT NOT NULL,
folder_path TEXT,
media_type TEXT NOT NULL CHECK (media_type IN ('image','video')),
gemini_model TEXT NOT NULL,
prompt TEXT,
raw_response JSONB,
description TEXT,
scenes JSONB,
validated_metadata JSONB,
metadata_write_success BOOLEAN,
description_write_success BOOLEAN,
scene_comment_write_success BOOLEAN,
status TEXT NOT NULL,
error_message TEXT,
duration_ms INTEGER
);
CREATE INDEX IF NOT EXISTS tagging_events_run_id_idx ON tagging_events (run_id);
CREATE INDEX IF NOT EXISTS tagging_events_file_id_idx ON tagging_events (file_id);
CREATE INDEX IF NOT EXISTS tagging_events_created_idx ON tagging_events (created_at DESC);
-- Per-file state: thumbnails + consolidated search blob. One row per Box file_id,
-- upserted by the tagger and backfill flows. Independent of the append-only
-- tagging_events log (which can have many rows per file_id over time).
CREATE TABLE IF NOT EXISTS file_assets (
file_id TEXT PRIMARY KEY,
thumbnail_bytes BYTEA,
thumbnail_content_type TEXT,
thumbnail_size INTEGER,
search_terms TEXT,
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS file_assets_updated_idx ON file_assets (updated_at DESC);