Simeon.Schecter 010a3955a8 Document Ubuntu systemd deployment and current configuration - Add full Server Deployment section with prereqs, user setup, credential placement, venv setup, unit installation, verification, and update flow - Tailored for Ubuntu 22.04/24.04 (notes python3-venv apt package gotcha) - True up Configuration table with current constants (video size limits, per-run cap, video delay, excluded folder prefixes) - Update How It Works to cover keyword-tail descriptions, scene-breakdown comments, large-video gating, and the per-run limiter - Mention videos in the intro (was previously images-only) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>		2026-05-06 14:35:58 -04:00
.gitignore	Add asset tagger pipeline with keyword-tail descriptions and large-video gating	2026-05-06 14:09:28 -04:00
CLAUDE.md	Add asset tagger pipeline with keyword-tail descriptions and large-video gating	2026-05-06 14:09:28 -04:00
main.py	Add asset tagger pipeline with keyword-tail descriptions and large-video gating	2026-05-06 14:09:28 -04:00
marriott-tagger.service	Add asset tagger pipeline with keyword-tail descriptions and large-video gating	2026-05-06 14:09:28 -04:00
marriott-tagger.timer	Add asset tagger pipeline with keyword-tail descriptions and large-video gating	2026-05-06 14:09:28 -04:00
README.md	Document Ubuntu systemd deployment and current configuration	2026-05-06 14:35:58 -04:00
requirements.txt	Add asset tagger pipeline with keyword-tail descriptions and large-video gating	2026-05-06 14:09:28 -04:00

Marriott Box Asset Tagger

Batch-processes images and videos in a Box folder, analyzes them with Gemini AI, and writes structured metadata back to Box using the marriottUsa metadata template. Videos use Box's 480p MP4 proxy representations to keep bandwidth and Gemini token usage manageable.

Setup

1. Clone and create virtual environment

git clone git@bitbucket.org:zlalani/marriott-box-image-video-tagging.git
cd marriott-box-image-video-tagging
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

2. Box JWT credentials

Download your Box app's JWT config from the Box Developer Console and save it as box_config.json in the project root.

The service account must have:

Access to folder 370595013246
Permission to read/write metadata using the marriottUsa template

3. Gemini API key

Create a .env file in the project root containing:

GEMINI_API_KEY=your_key_here

Get a key at Google AI Studio.

Usage

source env/bin/activate
python main.py

The script will:

Authenticate with Box and Gemini
Fetch the marriottUsa template schema (fields, types, allowed values)
Build a dynamic Gemini prompt from the schema
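Recursively list all image and video files in the target folder, skipping subfolders whose names start with an excluded prefix (z_, zz_, zzz_) and, by default, files that already have metadata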
For each image: download, resize, analyze with Gemini, validate metadata, write metadata + description to Box
For each video: skip it if the source file or its 480p proxy exceeds the size limits; otherwise fetch the 480p MP4 proxy from Box, analyze with Gemini, validate metadata, write metadata + description + a scene-breakdown comment to Box
Print a summary of results
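The schema-fetch and prompt-building steps above can be illustrated with a short sketch. This is not the code in main.py: it assumes the template definition arrives as a dict shaped like Box's metadata template JSON (a fields list whose entries carry key, displayName, type and, for enum and multiSelect, an options list of {"key": ...} objects). The function name build_prompt, the JSON reply shape, and the example field keys are all hypothetical.

```python
import json


def build_prompt(template: dict) -> str:
    """Hypothetical sketch: render a Box metadata template definition as Gemini instructions.

    Assumes template["fields"] follows Box's template JSON shape; not main.py's actual code.
    """
    lines = [
        "Analyze this asset and return one JSON object using only the keys below.",
        "Omit any key you cannot determine from the asset.",
        "",
    ]
    for field in template["fields"]:
        key, kind = field["key"], field["type"]
        label = field.get("displayName", key)
        if kind in ("enum", "multiSelect"):
            # Listing the exact option keys steers Gemini toward values Box will accept.
            allowed = ", ".join(json.dumps(opt["key"]) for opt in field.get("options", []))
            shape = "one of" if kind == "enum" else "a JSON array containing any of"
            lines.append(f'- "{key}" ({label}): {shape} [{allowed}]')
        elif kind == "float":
            lines.append(f'- "{key}" ({label}): a number')
        elif kind == "date":
            lines.append(f'- "{key}" ({label}): an RFC 3339 date-time such as 2024-01-31T00:00:00Z')
        else:
            # "string" fields, plus any field type this sketch does not know about.
            lines.append(f'- "{key}" ({label}): free text')
    lines += [
        "",
        'Also return "description": one summary sentence, then comma-separated search keywords.',
    ]
    return "\n".join(lines)


# Example template; these field keys and options are made up for illustration.
example = {
    "fields": [
        {"key": "assetType", "displayName": "Asset Type", "type": "enum",
         "options": [{"key": "Photo"}, {"key": "Video"}]},
        {"key": "amenities", "displayName": "Amenities", "type": "multiSelect",
         "options": [{"key": "Pool"}, {"key": "Dining"}]},
        {"key": "caption", "displayName": "Caption", "type": "string"},
    ]
}
print(build_prompt(example))
```

Because the option lists are read from the live template on every run, a new option added in Box Admin shows up in the prompt without a code change.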

Server Deployment (systemd, Ubuntu)

The repo includes marriott-tagger.service and marriott-tagger.timer for running the tagger as a scheduled service. These steps are written for Ubuntu 22.04 / 24.04 but should work on any systemd-based distribution with minor tweaks (a different package manager in step 0, and e.g. /sbin/nologin instead of /usr/sbin/nologin on Red Hat-family systems).

0. Prerequisites

sudo apt update
sudo apt install -y git python3 python3-venv python3-pip

python3-venv is a separate apt package on Ubuntu — python3 -m venv will fail without it.

1. Clone the repo on the server

sudo mkdir -p /opt/marriott-box-image-video-tagging
sudo chown $USER:$USER /opt/marriott-box-image-video-tagging
git clone git@bitbucket.org:zlalani/marriott-box-image-video-tagging.git /opt/marriott-box-image-video-tagging
cd /opt/marriott-box-image-video-tagging

2. Create the service user

sudo useradd --system --shell /usr/sbin/nologin --home-dir /opt/marriott-box-image-video-tagging marriott-tagger
sudo chown -R marriott-tagger:marriott-tagger /opt/marriott-box-image-video-tagging

3. Drop credentials in place (NOT in git)

sudo -u marriott-tagger tee /opt/marriott-box-image-video-tagging/box_config.json > /dev/null < /path/to/local/box_config.json
sudo -u marriott-tagger tee /opt/marriott-box-image-video-tagging/.env > /dev/null <<'EOF'
GEMINI_API_KEY=your_key_here
EOF
sudo chmod 600 /opt/marriott-box-image-video-tagging/box_config.json /opt/marriott-box-image-video-tagging/.env

4. Set up the virtualenv

sudo -u marriott-tagger python3 -m venv /opt/marriott-box-image-video-tagging/env
sudo -u marriott-tagger /opt/marriott-box-image-video-tagging/env/bin/pip install -r /opt/marriott-box-image-video-tagging/requirements.txt

5. Install the systemd unit files

sudo cp /opt/marriott-box-image-video-tagging/marriott-tagger.service /etc/systemd/system/
sudo cp /opt/marriott-box-image-video-tagging/marriott-tagger.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now marriott-tagger.timer

6. Verify

# Show the next scheduled run
systemctl list-timers marriott-tagger.timer

# Trigger a one-off run immediately (the timer still fires on schedule).
# --no-block returns at once instead of waiting for the run to finish.
sudo systemctl start --no-block marriott-tagger.service

# Tail the logs (live)
sudo journalctl -u marriott-tagger -f

# Show everything the service logged in the last day (covers the most recent daily run)
sudo journalctl -u marriott-tagger --since "1 day ago"

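Updating the service

The git pull below runs as marriott-tagger, so that user needs its own read access to the Bitbucket repo (for example an SSH deploy key). The SSH key you used for the clone in step 1 belongs to your login user, not the service user.

cd /opt/marriott-box-image-video-tagging
sudo -u marriott-tagger git pull
# If requirements.txt changed:
sudo -u marriott-tagger /opt/marriott-box-image-video-tagging/env/bin/pip install -r requirements.txt
# If unit files changed:
sudo cp marriott-tagger.service marriott-tagger.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl restart marriott-tagger.timer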

Configuration

Edit the constants at the top of main.py:

Setting	Default	Description
`BOX_FOLDER_ID`	(varies)	Box folder to process
`METADATA_TEMPLATE_KEY`	`marriottUsa`	Box metadata template key
`GEMINI_MODEL`	`gemini-2.5-flash`	Gemini model for analysis
`EXCLUDED_FOLDER_PREFIXES`	`("z_", "zz_", "zzz_")`	Subfolder name prefixes to skip
`GEMINI_DELAY`	`7`	Seconds between Gemini image calls
`GEMINI_VIDEO_DELAY`	`10`	Seconds between Gemini video calls
`SKIP_ALREADY_TAGGED`	`True`	Skip files with existing metadata
`MAX_IMAGE_SIZE`	`1000`	Max pixel dimension for image resize
`VIDEO_SIZE_LIMIT_INLINE`	`20 MB`	Below this, send the video inline; above, upload it via the Gemini File API
`VIDEO_SOURCE_SIZE_LIMIT`	`5 GB`	Skip videos whose source file exceeds this
`VIDEO_PROXY_SIZE_LIMIT`	`400 MB`	Skip videos whose 480p proxy exceeds this (~60 min runtime)
`MAX_FILES_PER_RUN`	`200`	Hard cap on newly-tagged files per run
`MAX_RUN_DURATION`	`4h`	Hard wall-clock cap per run
`DESCRIPTION_MAX_LENGTH`	`255`	Box description field char limit
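The three video size settings combine into one routing decision per file. A minimal sketch, assuming the table's MB and GB mean 1024-based units and that both the source and proxy byte sizes are already known; route_video and its return strings are hypothetical, not main.py's API:

```python
MB = 1024 * 1024
GB = 1024 * MB

# Assumed to mirror the defaults in the table above; main.py may define them differently.
VIDEO_SIZE_LIMIT_INLINE = 20 * MB
VIDEO_SOURCE_SIZE_LIMIT = 5 * GB
VIDEO_PROXY_SIZE_LIMIT = 400 * MB


def route_video(source_bytes: int, proxy_bytes: int) -> str:
    """Hypothetical helper: decide how one video should be handled."""
    # The source size is presumably available from the Box file record, so
    # checking it first avoids touching the proxy for oversized originals.
    if source_bytes > VIDEO_SOURCE_SIZE_LIMIT:
        return "skip: source too large"
    # The table puts this limit at roughly 60 minutes of 480p footage.
    if proxy_bytes > VIDEO_PROXY_SIZE_LIMIT:
        return "skip: proxy too large"
    # Small proxies go straight into the request; larger ones are uploaded
    # through the Gemini File API first.
    if proxy_bytes < VIDEO_SIZE_LIMIT_INLINE:
        return "inline"
    return "file_api"


print(route_video(1 * GB, 12 * MB))   # inline
print(route_video(1 * GB, 150 * MB))  # file_api
print(route_video(6 * GB, 150 * MB))  # skip: source too large
```

Both skip branches return before any Gemini call is made, which is what lets skips be counted separately from errors.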

How It Works

Dynamic prompt: The Gemini prompt is built at runtime from the actual Box template definition. If Marriott adds or changes fields or options in Box Admin, the script adapts automatically on the next run.
Metadata + description: Each file gets structured metadata (for filtered search) and a short description (visible in Box list views, also indexed by Box search).
Search-keyword tail: Each description is formatted as <summary sentence>. <comma-separated keywords>. The keyword tail adds synonyms and broader category terms (e.g. food/dining/eating/meal/restaurant), so a search for "Food" also hits assets tagged with the enum value Dining.
Video scene breakdown: Videos additionally get a timestamped scene breakdown written as a comment on the Box file — a high-level chapter map for finding moments inside long videos.
Resumable: Files with existing metadata are skipped by default, so the script can be re-run after interruptions or when new files are added.
Validation: Gemini output is validated against the template schema — invalid enum values are dropped, multiSelect arrays are filtered to allowed options only.
Large-video gating: Videos exceeding the source or proxy size limits are skipped cleanly rather than spending time and API budget on content too long for Gemini's context window. Skips are reported in the summary as skipped, not errored.
Per-run limiter: Each daily run tags at most MAX_FILES_PER_RUN new files and stops after MAX_RUN_DURATION of wall-clock time. Whichever cap is hit first, the run exits cleanly with a summary line, and the next scheduled run picks up the remaining untagged files. This keeps a sudden 1000-file upload from blowing through your Gemini budget in one night.
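The Validation rule above amounts to a filter over Gemini's parsed JSON reply. A minimal sketch, not main.py's code: it assumes template fields shaped like Box's metadata template JSON (key, type, and an options list of {"key": ...} objects for enum and multiSelect), and the name validate_metadata is hypothetical. Keys the template does not define are dropped, and string and date values pass through without format checks.

```python
def validate_metadata(raw: dict, fields: list) -> dict:
    """Hypothetical sketch: keep only values the Box template will accept."""
    clean = {}
    for field in fields:  # iterating the template drops keys Gemini invented
        key, kind = field["key"], field["type"]
        if key not in raw:
            continue
        value = raw[key]
        allowed = {opt["key"] for opt in field.get("options", [])}
        if kind == "enum":
            # An enum value outside the template's options is dropped entirely.
            if isinstance(value, str) and value in allowed:
                clean[key] = value
        elif kind == "multiSelect":
            # Keep only allowed options, preserving order and removing duplicates.
            if isinstance(value, list):
                kept = list(dict.fromkeys(
                    v for v in value if isinstance(v, str) and v in allowed))
                if kept:
                    clean[key] = kept
        elif kind == "float":
            try:
                clean[key] = float(value)
            except (TypeError, ValueError):
                pass  # non-numeric output is dropped
        elif isinstance(value, str) and value.strip():
            # string and date fields: trimmed and passed through; dates are not checked here.
            clean[key] = value.strip()
    return clean


# Hypothetical template fields and Gemini reply, for illustration only.
fields = [
    {"key": "assetType", "type": "enum", "options": [{"key": "Photo"}, {"key": "Video"}]},
    {"key": "amenities", "type": "multiSelect", "options": [{"key": "Pool"}, {"key": "Dining"}]},
    {"key": "caption", "type": "string"},
]
gemini_output = {
    "assetType": "Photograph",  # not an allowed option
    "amenities": ["Pool", "Spa", "Pool", "Dining"],
    "caption": "  Rooftop pool at dusk ",
    "mood": "calm",  # not in the template
}
print(validate_metadata(gemini_output, fields))
# {'amenities': ['Pool', 'Dining'], 'caption': 'Rooftop pool at dusk'}
```

In this sketch a bad value costs only that one field, so a file can still be tagged with whatever fields survive validation.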