nickviljoen eb31ac1498 Initial Commit

2025-10-15 16:25:04 +02:00

11 KiB

Raw Blame History

AI Vision Guide

What is AI Vision?

AI Vision is a Tier 2 matching system that uses OpenAI's GPT-4o vision model to detect video matches that perceptual hashing can't find. It's especially powerful for cross-aspect-ratio scenarios.

When is it Used?

AI Vision smartly activates only when truly needed:

✅ No matches found with perceptual hashing (likely cross-aspect), OR
✅ Incomplete coverage (best match has < 100% frame coverage)

AI Vision is skipped when:

❌ Perfect match found (100% frame coverage)
❌ Same aspect ratio with complete match

Why this matters:

In typical batches, only 1-2 out of 39 adaptations need AI Vision
Saves ~97% of AI costs! ($0.30 vs $12 for 39 videos)
Much faster processing (seconds vs minutes)

You don't need to do anything - it automatically optimizes!

What Problems Does it Solve?

❌ Problem: Cross-Aspect Ratios

Traditional perceptual hashing fails when comparing:

16:9 master → 1:1 square adaptation (Instagram, Facebook)
16:9 master → 9:16 vertical adaptation (TikTok, Stories)
16:9 master → 4:5 portrait adaptation (Instagram feed)

Why? The pixel layouts are completely different after cropping/scaling.

✅ Solution: Semantic Understanding

AI Vision looks at the content, not pixels:

Same people? ✓
Same products? ✓
Same settings? ✓
Same framing (even if cropped)? ✓
Different text/logos? Ignored!

Setup

1. Get OpenAI API Key

Visit https://platform.openai.com/api-keys and create a new key.

2. Configure Environment

# Copy example file
cp .env.example .env

# Edit .env and add your key
nano .env

Add this line:

OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx

3. Verify

python cli.py status

You should see:

✓ AI Vision enabled (GPT-4o)

Usage

No changes needed! Just run your normal matching commands:

# Single match
python cli.py match /path/to/adaptation.mp4

# Batch match
python cli.py batch-match /path/to/adaptations/

AI Vision will activate automatically when needed.

Understanding Results

Terminal Output

When AI Vision finds a match, you'll see:

Best Match:
  Master: 5368082_1011A_SF_DROP_1_20_D_16x9_BVOD_YT_OLV_MASTER_1
  Duration: 20s
  Video frames matched: 95.0% (28/30 frames)
  Average frame similarity: 95.0%
  Combined confidence: 95.0%

AI Vision Analysis:
  Method: GPT-4o (OpenAI)
  Format: Adaptation is cropped from master

  AI Reasoning:
  Both sets feature the same two people in identical clothing and poses,
  indicating they are the same footage. The settings, such as the plain,
  light-colored backdrop, are consistent across both sets...

Key Fields

Field	Meaning
Method	Shows "AI Vision" instead of "Hash"
Format	Indicates if adaptation is cropped from master
AI Reasoning	Human-readable explanation of the match
Combined confidence	Match confidence (0-100%)

Cost

Pricing (as of October 2025)

Model: GPT-4o
Cost per comparison: ~$0.005-0.007
10 images (5 from adaptation + 5 from master)
Low detail mode to minimize cost

Examples

Scenario	AI Triggered?	Cost
1 same-aspect adaptation vs 50 masters	No (100% match)	$0.00
1 cross-aspect adaptation vs 50 masters	Yes (no matches)	~$0.25-0.35
39 adaptations (38 same-aspect, 1 cross) vs 50 masters	1 only	~$0.30
100 same-aspect adaptations vs 50 masters	None	$0.00
100 cross-aspect adaptations vs 50 masters	All 100	~$25-35

Smart Triggering Benefits:

✅ Only pays for what you need
✅ Most batches cost < $1 (only cross-aspect videos)
✅ Same-aspect matches are always free and fast!

Cost Tracking

The tool shows total cost after each run:

AI Vision total cost: $0.299

What AI Vision Ignores

AI Vision is trained to ignore these differences:

✅ Text Variations:

Different languages (English → German → Spanish)
Different subtitles or captions
Different call-to-action text
Price tags or promotional text

✅ Logo/Branding:

Logo size or placement changes
Different social media platform logos
Brand watermarks
Different aspect ratio templates

✅ Technical Differences:

Different compression/quality
Different color grading (minor)
Different frame rates

What AI Vision Focuses On

AI Vision looks for semantic content:

🎯 People:

Same faces
Same clothing
Same poses/actions
Same movements

🎯 Products:

Same items being shown
Same product arrangements
Same product interactions

🎯 Settings:

Same backgrounds
Same environments
Same locations
Same props

🎯 Framing:

Same camera angles
Same composition (even if cropped)
Same shot sequence

Troubleshooting

⚠️ "AI Vision disabled (no API key)"

Solution: Set OPENAI_API_KEY in .env file

cp .env.example .env
# Edit .env and add your key

⚠️ "Error code: 401 - Invalid API key"

Solution: Check your API key is correct

# Verify key format (should start with sk-proj- or sk-)
cat .env | grep OPENAI_API_KEY

⚠️ "Error code: 429 - Rate limit exceeded"

Solution: You've hit OpenAI's rate limit

Wait a few minutes and try again
Reduce number of comparisons
Upgrade your OpenAI plan

⚠️ High costs

Solution: AI Vision is running too often

This usually means you have many cross-aspect adaptations. Options:

Add masters in multiple aspect ratios (perceptual hash will match them)
Pre-filter by aspect ratio (match 1:1 adaptations only against 1:1 masters)
Increase confidence threshold to reduce AI Vision triggering

⚠️ "Model not found" error

Solution: Update to latest code (gpt-4-vision-preview deprecated)

The code should use gpt-4o model (already fixed in v2.0+)

Privacy & Security

What Gets Sent to OpenAI?

✅ 5 JPEG frames from adaptation (base64-encoded)
✅ 5 JPEG frames from master (base64-encoded)
✅ Structured prompt asking for comparison
❌ No video files
❌ No audio
❌ No metadata

Is it Secure?

✅ HTTPS encrypted transmission
✅ OpenAI doesn't train on your data (API)
✅ Frames are deleted after analysis
✅ .env file is gitignored (won't be committed)

Should I Use It?

Yes, if:

Content is not confidential
You're matching marketing/advertising content
You need cross-aspect detection
Cost is acceptable (~$0.30 per 50 masters)

No, if:

Content is highly sensitive/confidential
You're working with NDA/private content
You want 100% on-premise solution
Budget is extremely tight

Alternative: Use perceptual hashing only and ensure masters exist in all aspect ratios.

Optimization Tips

1. Add Multiple Aspect Ratio Masters

If you have masters in all aspect ratios, perceptual hashing will match them for free:

# Add 16:9 master
python cli.py add-master master_16x9.mp4

# Add 1:1 master (same content, cropped)
python cli.py add-master master_1x1.mp4

# Add 9:16 master (same content, cropped)
python cli.py add-master master_9x16.mp4

Now adaptations will match without AI Vision!

2. Pre-Filter by Aspect Ratio

Before matching, check aspect ratios:

from video_matcher.fingerprinter import VideoFingerprinter

fp = VideoFingerprinter()
info = fp.get_video_info("adaptation.mp4")
width, height = info['width'], info['height']
aspect = width / height

if aspect > 1.5:
    print("16:9 video - match against 16:9 masters only")
elif 0.9 < aspect < 1.1:
    print("1:1 video - match against 1:1 masters only")
else:
    print("9:16 video - match against 9:16 masters only")

3. Batch Strategically

AI Vision costs scale with comparisons. For 100 adaptations:

Expensive ($150-250):

# All adaptations against all masters
python cli.py batch-match adaptations/  # 100 × 50 masters = 5000 AI calls

Optimized ($5-10):

# First, quickly check which adaptations need AI Vision
# Then only run AI Vision on those that failed

Disable AI Vision

To completely disable AI Vision:

Option 1: Remove API Key

# In .env file, comment out or delete:
# OPENAI_API_KEY=sk-...

Option 2: Empty Value

# In .env file:
OPENAI_API_KEY=

Option 3: Don't Create .env File

Just don't create .env - AI Vision won't work without it.

The tool works perfectly fine without AI Vision - you just won't get cross-aspect matching.

Examples

Example 1: Instagram 1:1 from 16:9 Master

$ python cli.py match instagram_1x1_post.mp4

Analyzing adaptation: instagram_1x1_post.mp4
Comparing against 47 master(s)...

  No high-confidence matches found.
  Trying AI Vision (GPT-4o) for cross-aspect matching...

  ✓ AI Vision match: master_16x9_campaign_v1 (confidence: 95%, cost: $0.007)

Found 1 master(s) matching this adaptation:

Best Match:
  Master: master_16x9_campaign_v1
  Video frames matched: 95.0%
  Combined confidence: 95.0%

AI Vision Analysis:
  Method: GPT-4o (OpenAI)
  Format: Adaptation is cropped from master

  AI Reasoning:
  The same person appears in both sets wearing identical clothing.
  Set A appears to be a cropped center-portion of Set B, focusing on
  the subject while removing the wider 16:9 framing...

AI Vision total cost: $0.007

Example 2: TikTok 9:16 from 16:9 Master

$ python cli.py match tiktok_vertical.mp4

Analyzing adaptation: tiktok_vertical.mp4
Comparing against 47 master(s)...

  No high-confidence matches found.
  Trying AI Vision (GPT-4o) for cross-aspect matching...

  ✓ AI Vision match: summer_collection_16x9 (confidence: 92%, cost: $0.006)

Best Match:
  Master: summer_collection_16x9
  Video frames matched: 92.0%
  Combined confidence: 92.0%

AI Vision Analysis:
  Method: GPT-4o (OpenAI)
  Format: Adaptation is cropped from master

  AI Reasoning:
  Both videos show the same product photoshoot with identical models,
  clothing, and studio background. The 9:16 version is a vertical crop
  of the 16:9 source, maintaining the center subject while trimming
  horizontal edges...

FAQ

Q: Will AI Vision always be triggered? A: No, only when perceptual hashing fails or confidence < 90%

Q: Can I force AI Vision even for same-aspect videos? A: Not currently, but you could modify the threshold in matcher.py:190

Q: Does AI Vision work offline? A: No, it requires internet connection to OpenAI API

Q: Can I use a different AI model? A: Yes, you could modify ai_vision.py to use Claude, Gemini, etc.

Q: What if I run out of OpenAI credits? A: AI Vision will fail gracefully and return no matches

Q: Can AI Vision detect same-aspect matches too? A: Yes! But it's slower and costs money, so we use perceptual hash first

Q: Is GPT-4o better than GPT-4 Vision? A: Yes! GPT-4o is newer, faster, cheaper, and more accurate

Q: How accurate is AI Vision? A: Very accurate! In testing: 95%+ for clear matches, <5% false positives

Support

For issues with AI Vision:

Check this guide first
Verify API key in .env file
Check OpenAI API status: https://status.openai.com
Review troubleshooting section above
Open GitHub issue if problem persists

Version: 2.0.0 Last Updated: 2025-10-10 Model: GPT-4o

11 KiB Raw Blame History Unescape Escape

AI Vision Guide

What is AI Vision?

When is it Used?

What Problems Does it Solve?

❌ Problem: Cross-Aspect Ratios

✅ Solution: Semantic Understanding

Setup

1. Get OpenAI API Key

2. Configure Environment

3. Verify

Usage

Understanding Results

Terminal Output

Key Fields

Cost

Pricing (as of October 2025)

Examples

Cost Tracking

What AI Vision Ignores

What AI Vision Focuses On

Troubleshooting

⚠️ "AI Vision disabled (no API key)"

⚠️ "Error code: 401 - Invalid API key"

⚠️ "Error code: 429 - Rate limit exceeded"

⚠️ High costs

⚠️ "Model not found" error

Privacy & Security

What Gets Sent to OpenAI?

Is it Secure?

Should I Use It?

Optimization Tips

1. Add Multiple Aspect Ratio Masters

2. Pre-Filter by Aspect Ratio

3. Batch Strategically

Disable AI Vision

Option 1: Remove API Key

Option 2: Empty Value

Option 3: Don't Create .env File

Examples

Example 1: Instagram 1:1 from 16:9 Master

Example 2: TikTok 9:16 from 16:9 Master

FAQ

Support

11 KiB

Raw Blame History