Search results were text-only — hard to scan visually with thousands
of assets coming. Now every file Gemini-tags or backfill mirrors also
gets its Box-generated 160x160 JPG thumbnail (~10-20 KB) pulled and
stashed in Postgres, plus a consolidated `search_terms` blob
(file_name + folder + description + flattened metadata values).
Search results render the thumbnail inline; rows missing one show a
striped placeholder. Search SQL now LEFT JOINs file_assets and hits
search_terms too, so backfilled rows are properly searchable.
- schema.sql: new `file_assets` table (file_id PK, thumbnail_bytes
bytea, search_terms text, updated_at). idempotent.
- db.py: `upsert_file_asset` (INSERT … ON CONFLICT preserving
existing thumbnail bytes if today's fetch failed) and
`get_thumbnail`. Both swallow exceptions per the established
defensive pattern.
- main.py: `fetch_thumbnail` (Box SDK get_file_thumbnail_by_id, JPG
at 160 px, handles BoxAPIError 202/404 as soft misses) and
`build_search_terms` (lowercase, whitespace-collapsed text blob).
`_persist_file_asset` wires both into the image+video success
paths of `_run_pass` and into every iteration of `_run_backfill`.
- Backfill skip logic refined: always upsert file_assets (idempotent
PK), only skip the tagging_events insert if a good row already
exists. Re-running Backfill from Box populates thumbnails for
rows backfilled before this feature shipped.
- api.py: `GET /api/files/{file_id}/thumbnail` streams the bytea
with Cache-Control max-age=86400. Search SQL gains the LEFT JOIN
and emits `has_thumbnail` per row. Search also matches against
fa.search_terms so backfilled rows surface for free-text queries
that hit their metadata.
- frontend: Event type adds `has_thumbnail`; `thumbnailUrl(fileId)`
helper builds the prefix-aware URL via Vite's BASE. EventList
renders the thumbnail (lazy, with onError fallback) or a striped
placeholder. .thumb styling + .event-head layout in styles.css.
Verified locally: schema applies via lifespan; upsert + get_thumbnail
roundtrip; /api/files/999/thumbnail returns 200 with bytes; /api/events
returns has_thumbnail per row; multi-token "female city" search finds
a row whose validated_metadata contains both tokens.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
45 lines
2.1 KiB
SQL
45 lines
2.1 KiB
SQL
-- Marriott Box Tagger — request log
|
|
-- One row per file the tagger sent to Gemini (success or error).
|
|
-- Skipped-as-already-tagged files do not produce rows.
|
|
|
|
-- pg_trgm powers the fuzzy `similarity()` call in /api/events. Idempotent.
|
|
CREATE EXTENSION IF NOT EXISTS pg_trgm;
|
|
|
|
CREATE TABLE IF NOT EXISTS tagging_events (
|
|
id BIGSERIAL PRIMARY KEY,
|
|
run_id UUID NOT NULL,
|
|
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
|
file_id TEXT NOT NULL,
|
|
file_name TEXT NOT NULL,
|
|
folder_path TEXT,
|
|
media_type TEXT NOT NULL CHECK (media_type IN ('image','video')),
|
|
gemini_model TEXT NOT NULL,
|
|
prompt TEXT,
|
|
raw_response JSONB,
|
|
description TEXT,
|
|
scenes JSONB,
|
|
validated_metadata JSONB,
|
|
metadata_write_success BOOLEAN,
|
|
description_write_success BOOLEAN,
|
|
scene_comment_write_success BOOLEAN,
|
|
status TEXT NOT NULL,
|
|
error_message TEXT,
|
|
duration_ms INTEGER
|
|
);
|
|
|
|
CREATE INDEX IF NOT EXISTS tagging_events_run_id_idx ON tagging_events (run_id);
|
|
CREATE INDEX IF NOT EXISTS tagging_events_file_id_idx ON tagging_events (file_id);
|
|
CREATE INDEX IF NOT EXISTS tagging_events_created_idx ON tagging_events (created_at DESC);
|
|
|
|
-- Per-file state: thumbnails + consolidated search blob. One row per Box file_id,
|
|
-- upserted by the tagger and backfill flows. Independent of the append-only
|
|
-- tagging_events log (which can have many rows per file_id over time).
|
|
CREATE TABLE IF NOT EXISTS file_assets (
|
|
file_id TEXT PRIMARY KEY,
|
|
thumbnail_bytes BYTEA,
|
|
thumbnail_content_type TEXT,
|
|
thumbnail_size INTEGER,
|
|
search_terms TEXT,
|
|
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
|
|
);
|
|
CREATE INDEX IF NOT EXISTS file_assets_updated_idx ON file_assets (updated_at DESC);
|