Simeon.Schecter a04e8c1e37 Add asset tagger pipeline with keyword-tail descriptions and large-video gating

- Box JWT + Gemini integration for image and video metadata tagging
- Description format includes search-keyword tail to address synonym gaps
  (e.g. "Food" search now hits assets tagged "Dining")
- Skip videos exceeding 5GB source or 400MB proxy (~60min runtime, beyond
  Gemini context budget) — counted as skipped, not errored
- Hardened None-response handling in Gemini JSON parser
- Per-run limiter: 200 newly-tagged files / 4 hour wall-clock cap, with
  clean exit and resumable progress on next run
- systemd service + timer for daily 2am tagging passes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-05-06 14:09:28 -04:00

1.2 KiB


CLAUDE.md — Box + Gemini Auto-Tagger (Marriott)

Project Overview

This tool auto-applies AI-generated tags/descriptions to images and videos stored in a Box enterprise instance for Marriott. It uses Google Gemini's vision API to analyze media files and write descriptions back to Box as metadata.

Tech Stack

Python 3.11+
Virtual environment: python -m venv env / source env/bin/activate
Box SDK: box-sdk-gen (NOT the legacy boxsdk)
Image processing: Pillow
Gemini API: google-genai (NOT the deprecated google-generativeai)

Key Package Installs

pip install "box-sdk-gen[jwt]" google-genai Pillow python-dotenv

Auth Files (DO NOT COMMIT)

box_config.json — Downloaded from Box Developer Console (JWT config)
.env — Contains GEMINI_API_KEY
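
A minimal sketch of reading the Gemini key from the environment instead of from source. It assumes python-dotenv is installed as listed above and falls back to already-exported variables if it is not. The helper name require_env is hypothetical, not part of this repo.

```python
import os

try:
    # python-dotenv reads key=value pairs from .env into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # Fall back to whatever the shell or service unit already exported.
    pass


def require_env(name: str) -> str:
    """Return a required secret from the environment, failing loudly if absent."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env (never hardcode it)")
    return value
```

Calling require_env("GEMINI_API_KEY") once at startup surfaces a missing key before any Box or Gemini request is made.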

Box Folder IDs

Source media folder: 3155... (confirm exact ID before running)
The JWT config file is stored in this folder or an adjacent one

Code Style

Use try/except blocks on ALL API calls (Box and Gemini)
Log all errors with the file name and error type
Use .env for all secrets — never hardcode keys
Process files one at a time (not bulk) to respect API rate limits
Always check if a tag already exists before writing to avoid duplicates
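
The rules above can be sketched as a single per-file loop. This is an illustration under stated assumptions, not the project's actual code: tag_files and its three callables are hypothetical stand-ins for the real Box read, Gemini call, and Box write, so the sketch stays independent of either SDK.

```python
import logging
from typing import Callable, Dict, Iterable, Optional

logger = logging.getLogger("tagger")


def tag_files(
    file_names: Iterable[str],
    read_description: Callable[[str], Optional[str]],   # hypothetical Box read
    generate_description: Callable[[str], str],         # hypothetical Gemini call
    write_description: Callable[[str, str], None],      # hypothetical Box write
) -> Dict[str, int]:
    """Process files one at a time, skipping any that already have a description."""
    counts = {"tagged": 0, "skipped": 0, "errored": 0}
    for name in file_names:
        try:
            # Check for an existing tag first to avoid duplicate writes.
            if read_description(name):
                counts["skipped"] += 1
                continue
            description = generate_description(name)
            write_description(name, description)
            counts["tagged"] += 1
        except Exception as exc:
            # Log the file name and the error type, then move on to the next file.
            logger.error("file=%s error_type=%s detail=%s",
                         name, type(exc).__name__, exc)
            counts["errored"] += 1
    return counts
```

A single try/except around each file's work keeps one failing asset from aborting the run. Narrower exception types can replace Exception once the SDK-specific errors are known.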
