Simeon.Schecter a04e8c1e37, 2026-05-06 14:09:28 -04:00

Add asset tagger pipeline with keyword-tail descriptions and large-video gating

- Box JWT + Gemini integration for image and video metadata tagging
- Description format includes search-keyword tail to address synonym gaps (e.g. "Food" search now hits assets tagged "Dining")
- Skip videos exceeding 5GB source or 400MB proxy (~60min runtime, beyond Gemini context budget); these are counted as skipped, not errored
- Hardened None-response handling in Gemini JSON parser
- Per-run limiter: 200 newly-tagged files / 4 hour wall-clock cap, with clean exit and resumable progress on next run
- systemd service + timer for daily 2am tagging passes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Files (all last changed by "Add asset tagger pipeline with keyword-tail descriptions and large-video gating", 2026-05-06 14:09:28 -04:00):

- .gitignore
- CLAUDE.md
- main.py
- marriott-tagger.service
- marriott-tagger.timer
- README.md
- requirements.txt

README.md

Marriott Box Asset Tagger

Batch-processes images in a Box folder, analyzes them with Gemini AI, and writes structured metadata back to Box using the marriottUsa metadata template.

Setup

1. Clone and create virtual environment

cd Marriott_Box_Asset_Tagging
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

2. Box JWT credentials

Download your Box app's JWT config from the Box Developer Console and save it as box_config.json in the project root.

The service account must have:

- Access to folder 370595013246
- Permission to read/write metadata using the marriottUsa template
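
Before the first run it can help to confirm that box_config.json has the shape a Box JWT config download normally has. The sketch below assumes the standard layout (a `boxAppSettings` object with `clientID`, `clientSecret`, and `appAuth`, plus a top-level `enterpriseID`). The names `missing_jwt_keys` and `check_config_file` are hypothetical helpers for illustration, not functions from main.py.

```python
import json

# Key paths a Box JWT config download normally contains (assumed layout).
REQUIRED_PATHS = [
    ("boxAppSettings", "clientID"),
    ("boxAppSettings", "clientSecret"),
    ("boxAppSettings", "appAuth", "publicKeyID"),
    ("boxAppSettings", "appAuth", "privateKey"),
    ("boxAppSettings", "appAuth", "passphrase"),
    ("enterpriseID",),
]


def missing_jwt_keys(config: dict) -> list[str]:
    """Return dotted key paths that are absent or empty in a parsed config."""
    missing = []
    for path in REQUIRED_PATHS:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if not node:
            missing.append(".".join(path))
    return missing


def check_config_file(path: str = "box_config.json") -> list[str]:
    """Load the JSON file at `path` and report missing key paths."""
    with open(path, encoding="utf-8") as fh:
        return missing_jwt_keys(json.load(fh))
```

An empty list from `check_config_file()` means the file has the expected keys. It does not prove that the service account has access to the folder or the template.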

3. Gemini API key

Add your key to .env:

GEMINI_API_KEY=your_key_here

Get a key at Google AI Studio.
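
For reference, here is a minimal, standard-library-only sketch of how a `KEY=value` line in .env could be read. The project itself may well use a dedicated library for this; `parse_env_lines` and `load_gemini_key` are hypothetical names, and treating the placeholder `your_key_here` as "not set" is an assumption.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_gemini_key(env_path=".env"):
    """Return GEMINI_API_KEY from the environment, else from the .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key
    try:
        with open(env_path, encoding="utf-8") as fh:
            key = parse_env_lines(fh).get("GEMINI_API_KEY")
    except FileNotFoundError:
        key = None
    if not key or key == "your_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env")
    return key
```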

Usage

source env/bin/activate
python main.py

The script will:

1. Authenticate with Box and Gemini
2. Fetch the marriottUsa template schema (fields, types, allowed values)
3. Build a dynamic Gemini prompt from the schema
4. List all image files in the target folder
5. For each image: download, resize, analyze with Gemini, validate metadata, write to Box
6. Print a summary of results
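
The prompt-building step can be pictured roughly as follows. This is a sketch under assumptions, not the code in main.py: it assumes the schema arrives as a list of dicts shaped like Box's template field JSON (`key`, `displayName`, `type`, and `options` as a list of `{"key": ...}`). `build_prompt`, the prompt wording, and the example fields are invented for illustration.

```python
def build_prompt(template_fields):
    """Turn Box-style template field definitions into a text prompt."""
    lines = [
        "Analyze the image and return a JSON object with these keys.",
        "Use only the listed options where options are given.",
        "",
    ]
    for field in template_fields:
        options = [opt["key"] for opt in field.get("options", [])]
        if field["type"] == "enum":
            rule = "one of: " + ", ".join(options)
        elif field["type"] == "multiSelect":
            rule = "a list drawn from: " + ", ".join(options)
        else:
            rule = f"a {field['type']} value"
        lines.append(f'- "{field["key"]}" ({field["displayName"]}): {rule}')
    return "\n".join(lines)


# Hypothetical fields for illustration; the real ones come from Box.
example_fields = [
    {"key": "roomType", "displayName": "Room Type", "type": "enum",
     "options": [{"key": "Guest Room"}, {"key": "Lobby"}]},
    {"key": "features", "displayName": "Features", "type": "multiSelect",
     "options": [{"key": "Pool"}, {"key": "Dining"}]},
    {"key": "caption", "displayName": "Caption", "type": "string"},
]
prompt = build_prompt(example_fields)
```

Because the option lists are read from the schema on every run, a new option added in Box would appear in the prompt without a code change.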

Configuration

Edit the constants at the top of main.py:

| Setting | Default | Description |
| --- | --- | --- |
| `BOX_FOLDER_ID` | `370595013246` | Box folder to process |
| `METADATA_TEMPLATE_KEY` | `marriottUsa` | Box metadata template key |
| `GEMINI_MODEL` | `gemini-2.5-flash` | Gemini model for analysis |
| `EXCLUDED_FOLDERS` | `{"zz_Working Retouch"}` | Subfolder names to skip |
| `GEMINI_DELAY` | `7` | Seconds between Gemini calls |
| `SKIP_ALREADY_TAGGED` | `True` | Skip files with existing metadata |
| `MAX_IMAGE_SIZE` | `1000` | Max pixel dimension for resize |
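
As an illustration of what `MAX_IMAGE_SIZE` might mean in practice, the sketch below computes target dimensions under the assumption that the setting caps the longer side, preserves the aspect ratio, and never upscales. `fit_within` is a hypothetical helper; the actual resize in main.py may differ.

```python
MAX_IMAGE_SIZE = 1000  # default from the configuration table


def fit_within(width, height, max_side=MAX_IMAGE_SIZE):
    """Scale (width, height) so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    # Multiply before dividing to keep the arithmetic exact where possible.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height
```

Under these assumptions a 4000 x 3000 image would be sent to Gemini at 1000 x 750, while an 800 x 600 image would be left unchanged.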

How It Works

- Dynamic prompt: The Gemini prompt is built at runtime from the actual Box template definition. If Marriott adds or changes fields or options in Box Admin, the script adapts automatically.
- Metadata + description: Each file gets structured metadata (for filtered search) and a short description (visible in Box list views).
- Resumable: Files with existing metadata are skipped by default, so the script can be re-run after an interruption or when new images are added.
- Validation: Gemini output is validated against the template schema. Invalid enum values are dropped, and multiSelect arrays are filtered down to the allowed options.
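
The validation rule described above could look roughly like this. It is a sketch, not the code in main.py: it assumes the schema is a list of dicts shaped like Box's template field JSON (`key`, `type`, `options` as a list of `{"key": ...}`). `validate_metadata` is a hypothetical name. Dropping unknown keys, `None` values, and multiSelect lists that end up empty are extra assumptions the README does not state.

```python
def validate_metadata(raw, template_fields):
    """Keep only values the template allows; drop everything else."""
    fields = {f["key"]: f for f in template_fields}
    clean = {}
    for key, value in raw.items():
        field = fields.get(key)
        if field is None or value is None:
            continue  # assumption: ignore unknown keys and empty answers
        allowed = {opt["key"] for opt in field.get("options", [])}
        if field["type"] == "enum":
            # Invalid enum values are dropped entirely.
            if isinstance(value, str) and value in allowed:
                clean[key] = value
        elif field["type"] == "multiSelect":
            # multiSelect arrays are filtered to allowed options only.
            if isinstance(value, list):
                kept = [v for v in value if isinstance(v, str) and v in allowed]
                if kept:
                    clean[key] = kept
        else:
            clean[key] = value  # free-form field types pass through
    return clean
```

Filtering rather than rejecting the whole response means one bad value from the model does not cost the file its other valid fields.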