Simeon.Schecter a04e8c1e37 Add asset tagger pipeline with keyword-tail descriptions and large-video gating
- Box JWT + Gemini integration for image and video metadata tagging
- Description format includes search-keyword tail to address synonym gaps
  (e.g. "Food" search now hits assets tagged "Dining")
- Skip videos exceeding 5GB source or 400MB proxy (~60min runtime, beyond
  Gemini context budget) — counted as skipped, not errored
- Hardened None-response handling in Gemini JSON parser
- Per-run limiter: 200 newly-tagged files / 4 hour wall-clock cap, with
  clean exit and resumable progress on next run
- systemd service + timer for daily 2am tagging passes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-06 14:09:28 -04:00


CLAUDE.md — Box + Gemini Auto-Tagger (Marriott)

Project Overview

This tool auto-applies AI-generated tags and descriptions to images and videos stored in Marriott's Box enterprise instance. It sends each media file to Google Gemini's vision API for analysis and writes the resulting description back to that file in Box as metadata.
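The commit notes above mention two details of this write-back step: None-safe parsing of Gemini's JSON replies and a search-keyword tail on each description. A minimal sketch of both, assuming Gemini is prompted to return a single JSON object; `parse_gemini_json`, `build_description`, and the `Keywords:` tail format are hypothetical, not the tool's actual code.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # three backticks, built here to keep this listing readable
# Replies may arrive wrapped in a Markdown code fence, optionally tagged "json".
_FENCED = re.compile("^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + "$", re.DOTALL)


def parse_gemini_json(text: Optional[str]) -> Optional[dict[str, Any]]:
    """Parse a reply expected to hold one JSON object; return None if it doesn't.

    None, blank text, invalid JSON, and non-object JSON all yield None so the
    caller can log the file and move on instead of crashing the run.
    """
    if text is None or not text.strip():
        return None
    body = text.strip()
    match = _FENCED.match(body)
    if match:
        body = match.group(1)  # keep only what sits inside the fence
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def build_description(summary: str, keywords: list[str]) -> str:
    """Append a lower-cased, de-duplicated keyword tail to the summary."""
    tail: list[str] = []
    for kw in keywords:
        norm = kw.strip().lower()
        if norm and norm not in tail:
            tail.append(norm)
    summary = summary.strip()
    return f"{summary} Keywords: {', '.join(tail)}" if tail else summary
```

For example, a summary of "Plated dinner." with keywords ["Dining", "food"] would be stored as "Plated dinner. Keywords: dining, food", so a search for "Food" also matches the asset.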

Tech Stack

  • Python 3.11+
  • Virtual environment: create with python -m venv env, activate with source env/bin/activate
  • Box SDK: box-sdk-gen (NOT the legacy boxsdk)
  • Image processing: Pillow
  • Gemini API: google-genai (NOT the deprecated google-generativeai)

Key Package Installs

pip install "box-sdk-gen[jwt]" google-genai Pillow python-dotenv
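A quick preflight check can confirm that the environment matches the Tech Stack above before any API call is made. This is a standard-library-only sketch; `preflight` and `is_installed` are hypothetical helper names, and the module names checked (`box_sdk_gen`, `google.genai`, `PIL`, `dotenv`, plus the legacy `boxsdk` and `google.generativeai`) are the import names those pip packages provide.

```python
import importlib.util
import sys

# pip package name -> import name, for the packages this project requires.
REQUIRED = {
    "box-sdk-gen": "box_sdk_gen",
    "google-genai": "google.genai",
    "Pillow": "PIL",
    "python-dotenv": "dotenv",
}
# Legacy SDKs that must NOT be used.
FORBIDDEN = {
    "boxsdk": "boxsdk",
    "google-generativeai": "google.generativeai",
}


def is_installed(module: str) -> bool:
    """True if `module` is importable; checks for a spec instead of running it."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name (e.g. `google`) is absent.
        return False


def preflight() -> list[str]:
    """Return human-readable problems; an empty list means the setup looks right."""
    problems: list[str] = []
    if sys.version_info < (3, 11):
        problems.append(f"Python 3.11+ required, found {sys.version.split()[0]}")
    if sys.prefix == sys.base_prefix:
        problems.append("No virtual environment active (run: source env/bin/activate)")
    for pip_name, module in REQUIRED.items():
        if not is_installed(module):
            problems.append(f"Missing package: {pip_name}")
    for pip_name, module in FORBIDDEN.items():
        if is_installed(module):
            problems.append(f"Legacy package installed, do not use: {pip_name}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("PREFLIGHT:", problem)
```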

Auth Files (DO NOT COMMIT)

  • box_config.json — Downloaded from Box Developer Console (JWT config)
  • .env — Contains GEMINI_API_KEY
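The two auth files might be wired up roughly as follows. This sketch assumes the standard box-sdk-gen JWT flow (`JWTConfig.from_config_file`, `BoxJWTAuth`, `BoxClient`) and the google-genai `genai.Client(api_key=...)` constructor; `require_env` and `build_clients` are hypothetical helpers, and the default `box_config.json` path assumes the file sits in the working directory.

```python
import os
from pathlib import Path


def require_env(name: str) -> str:
    """Return a required secret from the environment or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env (never hardcode it)")
    return value


def build_clients(config_path: str = "box_config.json"):
    """Create authenticated Box and Gemini clients from the local auth files."""
    # Third-party imports are deferred so this module can be imported
    # (and require_env used) before the packages are installed.
    from box_sdk_gen import BoxClient, BoxJWTAuth, JWTConfig
    from dotenv import load_dotenv
    from google import genai

    load_dotenv()  # reads .env into os.environ; existing variables are not overridden
    if not Path(config_path).is_file():
        raise FileNotFoundError(f"Box JWT config not found: {config_path}")

    jwt_config = JWTConfig.from_config_file(config_file_path=config_path)
    box_client = BoxClient(auth=BoxJWTAuth(config=jwt_config))
    gemini_client = genai.Client(api_key=require_env("GEMINI_API_KEY"))
    return box_client, gemini_client
```

By default a JWT app acts as its Box service account, so the source folder generally has to be shared with that account for these calls to see its files.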

Box Folder IDs

  • Source media folder: 3155... (confirm exact ID before running)
  • The JWT config file (box_config.json) is stored in this folder or in an adjacent one
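Because the folder ID above is only partially recorded, a guard that refuses to start until a full numeric ID is supplied can keep a run from targeting the wrong folder. The sketch assumes the ID is passed in a hypothetical `BOX_SOURCE_FOLDER_ID` environment variable (for example from .env); Box folder IDs are numeric strings, and `0` denotes the root folder.

```python
import os


def source_folder_id(env_var: str = "BOX_SOURCE_FOLDER_ID") -> str:
    """Return the confirmed source folder ID or refuse to start.

    BOX_SOURCE_FOLDER_ID is a hypothetical variable name for this sketch.
    """
    folder_id = os.environ.get(env_var, "").strip()
    # Box folder IDs are all-digit strings, so "" or a truncated "3155..." fails here.
    if not (folder_id.isascii() and folder_id.isdigit()):
        raise ValueError(f"{env_var}={folder_id!r} is not a full numeric folder ID")
    # ID 0 is the root folder; running there would go far beyond the media folder.
    if folder_id == "0":
        raise ValueError("Refusing to run against the Box root folder (ID 0)")
    return folder_id
```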

Code Style

  • Use try/except blocks on ALL API calls (Box and Gemini)
  • Log all errors with the file name and error type
  • Use .env for all secrets — never hardcode keys
  • Process files one at a time (not bulk) to respect API rate limits
  • Always check if a tag already exists before writing to avoid duplicates
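The rules above, together with the run limits and video-size gating described in the commit notes, can be combined into one processing loop. A minimal sketch, assuming the Box and Gemini calls are passed in as callables; `has_description`, `describe`, `write_description`, `MediaFile`, and `run_pass` are hypothetical stand-ins, the thresholds come from the commit message, and reading "5GB"/"400MB" as binary units is an assumption.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

log = logging.getLogger("tagger")

# Limits from the commit notes; treating GB/MB as binary units is an assumption.
GB, MB = 1024**3, 1024**2
MAX_VIDEO_SOURCE_BYTES = 5 * GB
MAX_VIDEO_PROXY_BYTES = 400 * MB
MAX_NEW_TAGS_PER_RUN = 200
MAX_RUN_SECONDS = 4 * 60 * 60


@dataclass
class MediaFile:
    """Hypothetical file record; real values would come from Box file info."""
    file_id: str
    name: str
    is_video: bool
    size_bytes: int
    proxy_bytes: Optional[int] = None


@dataclass
class RunStats:
    tagged: int = 0
    skipped: int = 0
    errored: int = 0


def too_large(f: MediaFile) -> bool:
    """Per the commit notes, such videos exceed Gemini's context budget."""
    if not f.is_video:
        return False
    return (f.size_bytes > MAX_VIDEO_SOURCE_BYTES
            or (f.proxy_bytes or 0) > MAX_VIDEO_PROXY_BYTES)


def run_pass(
    files: Iterable[MediaFile],
    has_description: Callable[[MediaFile], bool],  # hypothetical Box metadata read
    describe: Callable[[MediaFile], str],  # hypothetical Gemini call
    write_description: Callable[[MediaFile, str], None],  # hypothetical Box write
    clock: Callable[[], float] = time.monotonic,
) -> RunStats:
    stats = RunStats()
    deadline = clock() + MAX_RUN_SECONDS
    for f in files:  # strictly one file at a time, no batching
        if stats.tagged >= MAX_NEW_TAGS_PER_RUN or clock() >= deadline:
            log.info("Run limit reached; remaining files wait for the next run")
            break
        if too_large(f):
            stats.skipped += 1  # counted as skipped, not as an error
            continue
        try:
            if has_description(f):  # never write a duplicate tag
                stats.skipped += 1
                continue
            write_description(f, describe(f))
        except Exception as exc:  # every Box/Gemini call runs inside this try
            stats.errored += 1
            log.error("%s: %s: %s", f.name, type(exc).__name__, exc)
            continue
        stats.tagged += 1
    return stats
```

Since files that already carry a description are skipped without counting toward the 200-file cap, rerunning the pass after it stops on either limit simply resumes with the files that are still untagged.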