Simeon.Schecter a04e8c1e37 Add asset tagger pipeline with keyword-tail descriptions and large-video gating
- Box JWT + Gemini integration for image and video metadata tagging
- Description format includes search-keyword tail to address synonym gaps
  (e.g. "Food" search now hits assets tagged "Dining")
- Skip videos exceeding 5GB source or 400MB proxy (~60min runtime, beyond
  Gemini context budget) — counted as skipped, not errored
- Hardened None-response handling in Gemini JSON parser
- Per-run limiter: 200 newly-tagged files / 4 hour wall-clock cap, with
  clean exit and resumable progress on next run
- systemd service + timer for daily 2am tagging passes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-06 14:09:28 -04:00

Marriott Box Asset Tagger

Batch-processes images in a Box folder, analyzes them with Gemini AI, and writes structured metadata back to Box using the marriottUsa metadata template.

Setup

1. Clone the repository and create a virtual environment

cd Marriott_Box_Asset_Tagging
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

2. Box JWT credentials

Download your Box app's JWT config from the Box Developer Console and save it as box_config.json in the project root.

The service account must have:

  • Access to folder 370595013246
  • Permission to read/write metadata using the marriottUsa template
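Before the first run, it can help to confirm that box_config.json contains the fields a Box JWT app config normally carries. The sketch below is a hypothetical preflight check and is not part of main.py. Its key paths (boxAppSettings.clientID, boxAppSettings.clientSecret, boxAppSettings.appAuth.publicKeyID/privateKey/passphrase, enterpriseID) assume the standard JSON layout that the Box Developer Console exports.

```python
import json
from pathlib import Path

# Hypothetical preflight helper (not part of main.py). Key paths assume the
# standard JWT config layout exported by the Box Developer Console.
REQUIRED_PATHS = [
    ("boxAppSettings", "clientID"),
    ("boxAppSettings", "clientSecret"),
    ("boxAppSettings", "appAuth", "publicKeyID"),
    ("boxAppSettings", "appAuth", "privateKey"),
    ("boxAppSettings", "appAuth", "passphrase"),
    ("enterpriseID",),
]


def missing_box_config_keys(config: dict) -> list[str]:
    """Return dotted paths of required keys that are absent or empty."""
    missing = []
    for path in REQUIRED_PATHS:
        node = config
        for key in path:
            # Stop walking as soon as a level is missing, empty, or not a dict.
            if not isinstance(node, dict) or not node.get(key):
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


def check_box_config(path: str = "box_config.json") -> list[str]:
    """Load the JWT config file and report any missing keys."""
    with Path(path).open(encoding="utf-8") as fh:
        return missing_box_config_keys(json.load(fh))
```

Calling check_box_config() from the project root returns an empty list when every expected key is present, or the dotted paths of the ones that are missing.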

3. Gemini API key

Add your key to a .env file:

GEMINI_API_KEY=your_key_here

Get a key at Google AI Studio.
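main.py may well rely on python-dotenv's load_dotenv() to read this file; that is an assumption, as is everything in the stdlib sketch below. The hypothetical helpers parse_env_text and get_gemini_key only illustrate the lookup order: a GEMINI_API_KEY already exported in the shell wins, otherwise the value comes from .env. load_dotenv() likewise leaves already-set variables untouched by default.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. GEMINI_API_KEY="abc123"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_key(env_path: str = ".env") -> str:
    """Return GEMINI_API_KEY, preferring an exported variable over the .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        path = Path(env_path)
        if path.is_file():
            key = parse_env_text(path.read_text(encoding="utf-8")).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY not found in the environment or in .env")
    return key
```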

Usage

source env/bin/activate
python main.py

The script will:

  1. Authenticate with Box and Gemini
  2. Fetch the marriottUsa template schema (fields, types, allowed values)
  3. Build a dynamic Gemini prompt from the schema
  4. List all image files in the target folder, skipping subfolders named in EXCLUDED_FOLDERS
  5. For each image: download, resize, analyze with Gemini, validate metadata, write to Box
  6. Print a summary of results
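Steps 2 and 3 can be pictured with the following sketch, which turns a template definition into prompt text. It is illustrative only: the function name build_prompt and the prompt wording are hypothetical, and it assumes the template JSON returned by Box has a "fields" list whose entries carry "key", "type", optional "displayName" and "hidden", and, for enum and multiSelect fields, an "options" list of {"key": ...} objects.

```python
import json


def build_prompt(template: dict) -> str:
    """Turn a Box metadata template definition into Gemini instructions.

    Hypothetical helper; the template shape described above is an assumption.
    """
    lines = [
        "Analyze this image and return a JSON object with these fields.",
        "Use only the allowed values listed for enum and multiSelect fields.",
        "",
    ]
    for field in template.get("fields", []):
        if field.get("hidden"):
            continue  # skip fields the template marks as hidden
        name = field.get("displayName", field["key"])
        ftype = field["type"]
        options = [opt["key"] for opt in field.get("options", [])]
        # Describe the expected value so the model answers in the right shape.
        if ftype == "enum":
            hint = f"one of {json.dumps(options)}"
        elif ftype == "multiSelect":
            hint = f"a JSON array containing any of {json.dumps(options)}"
        elif ftype == "float":
            hint = "a number"
        elif ftype == "date":
            hint = "a date in YYYY-MM-DD format"
        else:
            hint = "a short string"
        lines.append(f'- "{field["key"]}" ({name}): {hint}')
    return "\n".join(lines)
```

Because the allowed options are read from the template on every run, a new option added in Box Admin shows up in the prompt without editing this function.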

Configuration

Edit the constants at the top of main.py:

Setting                Default                 Description
BOX_FOLDER_ID          370595013246            Box folder to process
METADATA_TEMPLATE_KEY  marriottUsa             Box metadata template key
GEMINI_MODEL           gemini-2.5-flash        Gemini model used for analysis
EXCLUDED_FOLDERS       {"zz_Working Retouch"}  Subfolder names to skip
GEMINI_DELAY           7                       Seconds to wait between Gemini calls
SKIP_ALREADY_TAGGED    True                    Skip files that already have metadata
MAX_IMAGE_SIZE         1000                    Maximum pixel dimension when resizing (px)
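The table describes MAX_IMAGE_SIZE only as a maximum pixel dimension and GEMINI_DELAY as a gap between calls. The sketch below assumes the limit caps the longest side, preserves aspect ratio, and never upscales (which is how Pillow's Image.thumbnail behaves), and that the delay is a minimum spacing between Gemini requests. target_size and Throttle are hypothetical names, not functions from main.py.

```python
import time

MAX_IMAGE_SIZE = 1000  # px, mirrors the default in the table above
GEMINI_DELAY = 7       # seconds, mirrors the default in the table above


def target_size(width: int, height: int, max_side: int = MAX_IMAGE_SIZE) -> tuple[int, int]:
    """Scale (width, height) so neither side exceeds max_side, keeping aspect ratio.

    Images already within the limit are returned unchanged (no upscaling).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


class Throttle:
    """Ensure at least `delay` seconds pass between successive calls to wait()."""

    def __init__(self, delay: float = GEMINI_DELAY, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock  # injectable so the pacing can be tested without sleeping
        self._sleep = sleep
        self._last = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

For example, target_size(4000, 3000) gives (1000, 750), and calling Throttle().wait() before each Gemini request spaces requests at least seven seconds apart.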

How It Works

  • Dynamic prompt: The Gemini prompt is built at runtime from the live Box template definition. If Marriott adds or changes fields or options in Box Admin, the script picks up the changes on its next run without any code edits.
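  • Metadata + description: Each file gets structured metadata (for filtered search) and a short description (visible in Box list views). The description ends with a tail of search keywords to cover synonym gaps, so that, for example, a search for "Food" also finds assets tagged "Dining".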
  • Resumable: Files with existing metadata are skipped by default, so the script can be re-run after interruptions or when new images are added.
  • Validation: Gemini output is validated against the template schema. Invalid enum values are dropped, and multiSelect arrays are filtered down to the allowed options.
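The validation rule can be sketched as follows. The enum and multiSelect handling follows the Validation bullet; the treatment of float and other field types, and the name validate_metadata, are assumptions for illustration. Treating a None response as an empty result mirrors the None-response hardening mentioned in the commit message, but the actual parser in main.py may differ.

```python
from typing import Optional


def validate_metadata(raw: Optional[dict], template: dict) -> dict:
    """Keep only values that fit the Box template schema.

    Assumes template["fields"] entries carry "key", "type" and, for enum and
    multiSelect fields, an "options" list of {"key": ...} objects.
    """
    fields = {f["key"]: f for f in template.get("fields", [])}
    clean = {}
    # Treat a None response from the JSON parser as "nothing to write".
    for key, value in (raw or {}).items():
        field = fields.get(key)
        if field is None or value is None:
            continue  # unknown key or empty value
        allowed = {opt["key"] for opt in field.get("options", [])}
        ftype = field["type"]
        if ftype == "enum":
            # Drop any value that is not one of the allowed options.
            if isinstance(value, str) and value in allowed:
                clean[key] = value
        elif ftype == "multiSelect":
            if isinstance(value, str):
                value = [value]  # tolerate a bare string instead of a list
            if not isinstance(value, list):
                continue
            kept = [v for v in value if isinstance(v, str) and v in allowed]
            if kept:
                clean[key] = list(dict.fromkeys(kept))  # de-duplicate, keep order
        elif ftype == "float":
            try:
                clean[key] = float(value)
            except (TypeError, ValueError):
                pass  # non-numeric output is dropped (assumed behavior)
        else:
            text = str(value).strip()
            if text:
                clean[key] = text
    return clean
```

With a template whose category enum allows only "Dining" and "Spa", a model answer of "Food" for category is dropped rather than written to Box.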