video-master-adapt/AI_VISION_GUIDE.md
2025-10-15 16:25:04 +02:00

11 KiB
Raw Blame History

AI Vision Guide

What is AI Vision?

AI Vision is a Tier 2 matching system that uses OpenAI's GPT-4o vision model to detect video matches that perceptual hashing can't find. It's especially powerful for cross-aspect-ratio scenarios.

When is it Used?

AI Vision smartly activates only when truly needed:

  1. No matches found with perceptual hashing (likely cross-aspect), OR
  2. Incomplete coverage (best match has < 100% frame coverage)

AI Vision is skipped when:

  • Perfect match found (100% frame coverage)
  • Same aspect ratio with complete match

Why this matters:

  • In typical batches, only 1-2 out of 39 adaptations need AI Vision
  • Saves ~97% of AI costs! ($0.30 vs $12 for 39 videos)
  • Much faster processing (seconds vs minutes)

You don't need to do anything - it automatically optimizes!

What Problems Does it Solve?

Problem: Cross-Aspect Ratios

Traditional perceptual hashing fails when comparing:

  • 16:9 master → 1:1 square adaptation (Instagram, Facebook)
  • 16:9 master → 9:16 vertical adaptation (TikTok, Stories)
  • 16:9 master → 4:5 portrait adaptation (Instagram feed)

Why? The pixel layouts are completely different after cropping/scaling.

Solution: Semantic Understanding

AI Vision looks at the content, not pixels:

  • Same people? ✓
  • Same products? ✓
  • Same settings? ✓
  • Same framing (even if cropped)? ✓
  • Different text/logos? Ignored!

Setup

1. Get OpenAI API Key

Visit https://platform.openai.com/api-keys and create a new key.

2. Configure Environment

# Copy example file
cp .env.example .env

# Edit .env and add your key
nano .env

Add this line:

OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx

3. Verify

python cli.py status

You should see:

✓ AI Vision enabled (GPT-4o)

Usage

No changes needed! Just run your normal matching commands:

# Single match
python cli.py match /path/to/adaptation.mp4

# Batch match
python cli.py batch-match /path/to/adaptations/

AI Vision will activate automatically when needed.

Understanding Results

Terminal Output

When AI Vision finds a match, you'll see:

Best Match:
  Master: 5368082_1011A_SF_DROP_1_20_D_16x9_BVOD_YT_OLV_MASTER_1
  Duration: 20s
  Video frames matched: 95.0% (28/30 frames)
  Average frame similarity: 95.0%
  Combined confidence: 95.0%

AI Vision Analysis:
  Method: GPT-4o (OpenAI)
  Format: Adaptation is cropped from master

  AI Reasoning:
  Both sets feature the same two people in identical clothing and poses,
  indicating they are the same footage. The settings, such as the plain,
  light-colored backdrop, are consistent across both sets...

Key Fields

Field Meaning
Method Shows "AI Vision" instead of "Hash"
Format Indicates if adaptation is cropped from master
AI Reasoning Human-readable explanation of the match
Combined confidence Match confidence (0-100%)

Cost

Pricing (as of October 2025)

  • Model: GPT-4o
  • Cost per comparison: ~$0.005-0.007
  • 10 images (5 from adaptation + 5 from master)
  • Low detail mode to minimize cost

Examples

Scenario AI Triggered? Cost
1 same-aspect adaptation vs 50 masters No (100% match) $0.00
1 cross-aspect adaptation vs 50 masters Yes (no matches) ~$0.25-0.35
39 adaptations (38 same-aspect, 1 cross) vs 50 masters 1 only ~$0.30
100 same-aspect adaptations vs 50 masters None $0.00
100 cross-aspect adaptations vs 50 masters All 100 ~$25-35

Smart Triggering Benefits:

  • Only pays for what you need
  • Most batches cost < $1 (only cross-aspect videos)
  • Same-aspect matches are always free and fast!

Cost Tracking

The tool shows total cost after each run:

AI Vision total cost: $0.299

What AI Vision Ignores

AI Vision is trained to ignore these differences:

Text Variations:

  • Different languages (English → German → Spanish)
  • Different subtitles or captions
  • Different call-to-action text
  • Price tags or promotional text

Logo/Branding:

  • Logo size or placement changes
  • Different social media platform logos
  • Brand watermarks
  • Different aspect ratio templates

Technical Differences:

  • Different compression/quality
  • Different color grading (minor)
  • Different frame rates

What AI Vision Focuses On

AI Vision looks for semantic content:

🎯 People:

  • Same faces
  • Same clothing
  • Same poses/actions
  • Same movements

🎯 Products:

  • Same items being shown
  • Same product arrangements
  • Same product interactions

🎯 Settings:

  • Same backgrounds
  • Same environments
  • Same locations
  • Same props

🎯 Framing:

  • Same camera angles
  • Same composition (even if cropped)
  • Same shot sequence

Troubleshooting

⚠️ "AI Vision disabled (no API key)"

Solution: Set OPENAI_API_KEY in .env file

cp .env.example .env
# Edit .env and add your key

⚠️ "Error code: 401 - Invalid API key"

Solution: Check your API key is correct

# Verify key format (should start with sk-proj- or sk-)
cat .env | grep OPENAI_API_KEY

⚠️ "Error code: 429 - Rate limit exceeded"

Solution: You've hit OpenAI's rate limit

  • Wait a few minutes and try again
  • Reduce number of comparisons
  • Upgrade your OpenAI plan

⚠️ High costs

Solution: AI Vision is running too often

This usually means you have many cross-aspect adaptations. Options:

  1. Add masters in multiple aspect ratios (perceptual hash will match them)
  2. Pre-filter by aspect ratio (match 1:1 adaptations only against 1:1 masters)
  3. Increase confidence threshold to reduce AI Vision triggering

⚠️ "Model not found" error

Solution: Update to latest code (gpt-4-vision-preview deprecated)

The code should use gpt-4o model (already fixed in v2.0+)

Privacy & Security

What Gets Sent to OpenAI?

  • 5 JPEG frames from adaptation (base64-encoded)
  • 5 JPEG frames from master (base64-encoded)
  • Structured prompt asking for comparison
  • No video files
  • No audio
  • No metadata

Is it Secure?

  • HTTPS encrypted transmission
  • OpenAI doesn't train on your data (API)
  • Frames are deleted after analysis
  • .env file is gitignored (won't be committed)

Should I Use It?

Yes, if:

  • Content is not confidential
  • You're matching marketing/advertising content
  • You need cross-aspect detection
  • Cost is acceptable (~$0.30 per 50 masters)

No, if:

  • Content is highly sensitive/confidential
  • You're working with NDA/private content
  • You want 100% on-premise solution
  • Budget is extremely tight

Alternative: Use perceptual hashing only and ensure masters exist in all aspect ratios.

Optimization Tips

1. Add Multiple Aspect Ratio Masters

If you have masters in all aspect ratios, perceptual hashing will match them for free:

# Add 16:9 master
python cli.py add-master master_16x9.mp4

# Add 1:1 master (same content, cropped)
python cli.py add-master master_1x1.mp4

# Add 9:16 master (same content, cropped)
python cli.py add-master master_9x16.mp4

Now adaptations will match without AI Vision!

2. Pre-Filter by Aspect Ratio

Before matching, check aspect ratios:

from video_matcher.fingerprinter import VideoFingerprinter

fp = VideoFingerprinter()
info = fp.get_video_info("adaptation.mp4")
width, height = info['width'], info['height']
aspect = width / height

if aspect > 1.5:
    print("16:9 video - match against 16:9 masters only")
elif 0.9 < aspect < 1.1:
    print("1:1 video - match against 1:1 masters only")
else:
    print("9:16 video - match against 9:16 masters only")

3. Batch Strategically

AI Vision costs scale with comparisons. For 100 adaptations:

Expensive ($150-250):

# All adaptations against all masters
python cli.py batch-match adaptations/  # 100 × 50 masters = 5000 AI calls

Optimized ($5-10):

# First, quickly check which adaptations need AI Vision
# Then only run AI Vision on those that failed

Disable AI Vision

To completely disable AI Vision:

Option 1: Remove API Key

# In .env file, comment out or delete:
# OPENAI_API_KEY=sk-...

Option 2: Empty Value

# In .env file:
OPENAI_API_KEY=

Option 3: Don't Create .env File

Just don't create .env - AI Vision won't work without it.

The tool works perfectly fine without AI Vision - you just won't get cross-aspect matching.

Examples

Example 1: Instagram 1:1 from 16:9 Master

$ python cli.py match instagram_1x1_post.mp4

Analyzing adaptation: instagram_1x1_post.mp4
Comparing against 47 master(s)...

  No high-confidence matches found.
  Trying AI Vision (GPT-4o) for cross-aspect matching...

  ✓ AI Vision match: master_16x9_campaign_v1 (confidence: 95%, cost: $0.007)

Found 1 master(s) matching this adaptation:

Best Match:
  Master: master_16x9_campaign_v1
  Video frames matched: 95.0%
  Combined confidence: 95.0%

AI Vision Analysis:
  Method: GPT-4o (OpenAI)
  Format: Adaptation is cropped from master

  AI Reasoning:
  The same person appears in both sets wearing identical clothing.
  Set A appears to be a cropped center-portion of Set B, focusing on
  the subject while removing the wider 16:9 framing...

AI Vision total cost: $0.007

Example 2: TikTok 9:16 from 16:9 Master

$ python cli.py match tiktok_vertical.mp4

Analyzing adaptation: tiktok_vertical.mp4
Comparing against 47 master(s)...

  No high-confidence matches found.
  Trying AI Vision (GPT-4o) for cross-aspect matching...

  ✓ AI Vision match: summer_collection_16x9 (confidence: 92%, cost: $0.006)

Best Match:
  Master: summer_collection_16x9
  Video frames matched: 92.0%
  Combined confidence: 92.0%

AI Vision Analysis:
  Method: GPT-4o (OpenAI)
  Format: Adaptation is cropped from master

  AI Reasoning:
  Both videos show the same product photoshoot with identical models,
  clothing, and studio background. The 9:16 version is a vertical crop
  of the 16:9 source, maintaining the center subject while trimming
  horizontal edges...

FAQ

Q: Will AI Vision always be triggered? A: No, only when perceptual hashing fails or confidence < 90%

Q: Can I force AI Vision even for same-aspect videos? A: Not currently, but you could modify the threshold in matcher.py:190

Q: Does AI Vision work offline? A: No, it requires internet connection to OpenAI API

Q: Can I use a different AI model? A: Yes, you could modify ai_vision.py to use Claude, Gemini, etc.

Q: What if I run out of OpenAI credits? A: AI Vision will fail gracefully and return no matches

Q: Can AI Vision detect same-aspect matches too? A: Yes! But it's slower and costs money, so we use perceptual hash first

Q: Is GPT-4o better than GPT-4 Vision? A: Yes! GPT-4o is newer, faster, cheaper, and more accurate

Q: How accurate is AI Vision? A: Very accurate! In testing: 95%+ for clear matches, <5% false positives

Support

For issues with AI Vision:

  1. Check this guide first
  2. Verify API key in .env file
  3. Check OpenAI API status: https://status.openai.com
  4. Review troubleshooting section above
  5. Open GitHub issue if problem persists

Version: 2.0.0 Last Updated: 2025-10-10 Model: GPT-4o