11 KiB
AI Vision Guide
What is AI Vision?
AI Vision is a Tier 2 matching system that uses OpenAI's GPT-4o vision model to detect video matches that perceptual hashing can't find. It's especially powerful for cross-aspect-ratio scenarios.
When is it Used?
AI Vision smartly activates only when truly needed:
- ✅ No matches found with perceptual hashing (likely cross-aspect), OR
- ✅ Incomplete coverage (best match has < 100% frame coverage)
AI Vision is skipped when:
- ❌ Perfect match found (100% frame coverage)
- ❌ Same aspect ratio with complete match
Why this matters:
- In typical batches, only 1-2 out of 39 adaptations need AI Vision
- Saves ~97% of AI costs! ($0.30 vs $12 for 39 videos)
- Much faster processing (seconds vs minutes)
You don't need to do anything - it automatically optimizes!
What Problems Does it Solve?
❌ Problem: Cross-Aspect Ratios
Traditional perceptual hashing fails when comparing:
- 16:9 master → 1:1 square adaptation (Instagram, Facebook)
- 16:9 master → 9:16 vertical adaptation (TikTok, Stories)
- 16:9 master → 4:5 portrait adaptation (Instagram feed)
Why? The pixel layouts are completely different after cropping/scaling.
✅ Solution: Semantic Understanding
AI Vision looks at the content, not pixels:
- Same people? ✓
- Same products? ✓
- Same settings? ✓
- Same framing (even if cropped)? ✓
- Different text/logos? Ignored!
Setup
1. Get OpenAI API Key
Visit https://platform.openai.com/api-keys and create a new key.
2. Configure Environment
# Copy example file
cp .env.example .env
# Edit .env and add your key
nano .env
Add this line:
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx
3. Verify
python cli.py status
You should see:
✓ AI Vision enabled (GPT-4o)
Usage
No changes needed! Just run your normal matching commands:
# Single match
python cli.py match /path/to/adaptation.mp4
# Batch match
python cli.py batch-match /path/to/adaptations/
AI Vision will activate automatically when needed.
Understanding Results
Terminal Output
When AI Vision finds a match, you'll see:
Best Match:
Master: 5368082_1011A_SF_DROP_1_20_D_16x9_BVOD_YT_OLV_MASTER_1
Duration: 20s
Video frames matched: 95.0% (28/30 frames)
Average frame similarity: 95.0%
Combined confidence: 95.0%
AI Vision Analysis:
Method: GPT-4o (OpenAI)
Format: Adaptation is cropped from master
AI Reasoning:
Both sets feature the same two people in identical clothing and poses,
indicating they are the same footage. The settings, such as the plain,
light-colored backdrop, are consistent across both sets...
Key Fields
| Field | Meaning |
|---|---|
| Method | Shows "AI Vision" instead of "Hash" |
| Format | Indicates if adaptation is cropped from master |
| AI Reasoning | Human-readable explanation of the match |
| Combined confidence | Match confidence (0-100%) |
Cost
Pricing (as of October 2025)
- Model: GPT-4o
- Cost per comparison: ~$0.005-0.007
- 10 images (5 from adaptation + 5 from master)
- Low detail mode to minimize cost
Examples
| Scenario | AI Triggered? | Cost |
|---|---|---|
| 1 same-aspect adaptation vs 50 masters | No (100% match) | $0.00 |
| 1 cross-aspect adaptation vs 50 masters | Yes (no matches) | ~$0.25-0.35 |
| 39 adaptations (38 same-aspect, 1 cross) vs 50 masters | 1 only | ~$0.30 |
| 100 same-aspect adaptations vs 50 masters | None | $0.00 |
| 100 cross-aspect adaptations vs 50 masters | All 100 | ~$25-35 |
Smart Triggering Benefits:
- ✅ Only pays for what you need
- ✅ Most batches cost < $1 (only cross-aspect videos)
- ✅ Same-aspect matches are always free and fast!
Cost Tracking
The tool shows total cost after each run:
AI Vision total cost: $0.299
What AI Vision Ignores
AI Vision is trained to ignore these differences:
✅ Text Variations:
- Different languages (English → German → Spanish)
- Different subtitles or captions
- Different call-to-action text
- Price tags or promotional text
✅ Logo/Branding:
- Logo size or placement changes
- Different social media platform logos
- Brand watermarks
- Different aspect ratio templates
✅ Technical Differences:
- Different compression/quality
- Different color grading (minor)
- Different frame rates
What AI Vision Focuses On
AI Vision looks for semantic content:
🎯 People:
- Same faces
- Same clothing
- Same poses/actions
- Same movements
🎯 Products:
- Same items being shown
- Same product arrangements
- Same product interactions
🎯 Settings:
- Same backgrounds
- Same environments
- Same locations
- Same props
🎯 Framing:
- Same camera angles
- Same composition (even if cropped)
- Same shot sequence
Troubleshooting
⚠️ "AI Vision disabled (no API key)"
Solution: Set OPENAI_API_KEY in .env file
cp .env.example .env
# Edit .env and add your key
⚠️ "Error code: 401 - Invalid API key"
Solution: Check your API key is correct
# Verify key format (should start with sk-proj- or sk-)
cat .env | grep OPENAI_API_KEY
⚠️ "Error code: 429 - Rate limit exceeded"
Solution: You've hit OpenAI's rate limit
- Wait a few minutes and try again
- Reduce number of comparisons
- Upgrade your OpenAI plan
⚠️ High costs
Solution: AI Vision is running too often
This usually means you have many cross-aspect adaptations. Options:
- Add masters in multiple aspect ratios (perceptual hash will match them)
- Pre-filter by aspect ratio (match 1:1 adaptations only against 1:1 masters)
- Increase confidence threshold to reduce AI Vision triggering
⚠️ "Model not found" error
Solution: Update to latest code (gpt-4-vision-preview deprecated)
The code should use gpt-4o model (already fixed in v2.0+)
Privacy & Security
What Gets Sent to OpenAI?
- ✅ 5 JPEG frames from adaptation (base64-encoded)
- ✅ 5 JPEG frames from master (base64-encoded)
- ✅ Structured prompt asking for comparison
- ❌ No video files
- ❌ No audio
- ❌ No metadata
Is it Secure?
- ✅ HTTPS encrypted transmission
- ✅ OpenAI doesn't train on your data (API)
- ✅ Frames are deleted after analysis
- ✅
.envfile is gitignored (won't be committed)
Should I Use It?
Yes, if:
- Content is not confidential
- You're matching marketing/advertising content
- You need cross-aspect detection
- Cost is acceptable (~$0.30 per 50 masters)
No, if:
- Content is highly sensitive/confidential
- You're working with NDA/private content
- You want 100% on-premise solution
- Budget is extremely tight
Alternative: Use perceptual hashing only and ensure masters exist in all aspect ratios.
Optimization Tips
1. Add Multiple Aspect Ratio Masters
If you have masters in all aspect ratios, perceptual hashing will match them for free:
# Add 16:9 master
python cli.py add-master master_16x9.mp4
# Add 1:1 master (same content, cropped)
python cli.py add-master master_1x1.mp4
# Add 9:16 master (same content, cropped)
python cli.py add-master master_9x16.mp4
Now adaptations will match without AI Vision!
2. Pre-Filter by Aspect Ratio
Before matching, check aspect ratios:
from video_matcher.fingerprinter import VideoFingerprinter
fp = VideoFingerprinter()
info = fp.get_video_info("adaptation.mp4")
width, height = info['width'], info['height']
aspect = width / height
if aspect > 1.5:
print("16:9 video - match against 16:9 masters only")
elif 0.9 < aspect < 1.1:
print("1:1 video - match against 1:1 masters only")
else:
print("9:16 video - match against 9:16 masters only")
3. Batch Strategically
AI Vision costs scale with comparisons. For 100 adaptations:
Expensive ($150-250):
# All adaptations against all masters
python cli.py batch-match adaptations/ # 100 × 50 masters = 5000 AI calls
Optimized ($5-10):
# First, quickly check which adaptations need AI Vision
# Then only run AI Vision on those that failed
Disable AI Vision
To completely disable AI Vision:
Option 1: Remove API Key
# In .env file, comment out or delete:
# OPENAI_API_KEY=sk-...
Option 2: Empty Value
# In .env file:
OPENAI_API_KEY=
Option 3: Don't Create .env File
Just don't create .env - AI Vision won't work without it.
The tool works perfectly fine without AI Vision - you just won't get cross-aspect matching.
Examples
Example 1: Instagram 1:1 from 16:9 Master
$ python cli.py match instagram_1x1_post.mp4
Analyzing adaptation: instagram_1x1_post.mp4
Comparing against 47 master(s)...
No high-confidence matches found.
Trying AI Vision (GPT-4o) for cross-aspect matching...
✓ AI Vision match: master_16x9_campaign_v1 (confidence: 95%, cost: $0.007)
Found 1 master(s) matching this adaptation:
Best Match:
Master: master_16x9_campaign_v1
Video frames matched: 95.0%
Combined confidence: 95.0%
AI Vision Analysis:
Method: GPT-4o (OpenAI)
Format: Adaptation is cropped from master
AI Reasoning:
The same person appears in both sets wearing identical clothing.
Set A appears to be a cropped center-portion of Set B, focusing on
the subject while removing the wider 16:9 framing...
AI Vision total cost: $0.007
Example 2: TikTok 9:16 from 16:9 Master
$ python cli.py match tiktok_vertical.mp4
Analyzing adaptation: tiktok_vertical.mp4
Comparing against 47 master(s)...
No high-confidence matches found.
Trying AI Vision (GPT-4o) for cross-aspect matching...
✓ AI Vision match: summer_collection_16x9 (confidence: 92%, cost: $0.006)
Best Match:
Master: summer_collection_16x9
Video frames matched: 92.0%
Combined confidence: 92.0%
AI Vision Analysis:
Method: GPT-4o (OpenAI)
Format: Adaptation is cropped from master
AI Reasoning:
Both videos show the same product photoshoot with identical models,
clothing, and studio background. The 9:16 version is a vertical crop
of the 16:9 source, maintaining the center subject while trimming
horizontal edges...
FAQ
Q: Will AI Vision always be triggered? A: No, only when perceptual hashing fails or confidence < 90%
Q: Can I force AI Vision even for same-aspect videos?
A: Not currently, but you could modify the threshold in matcher.py:190
Q: Does AI Vision work offline? A: No, it requires internet connection to OpenAI API
Q: Can I use a different AI model?
A: Yes, you could modify ai_vision.py to use Claude, Gemini, etc.
Q: What if I run out of OpenAI credits? A: AI Vision will fail gracefully and return no matches
Q: Can AI Vision detect same-aspect matches too? A: Yes! But it's slower and costs money, so we use perceptual hash first
Q: Is GPT-4o better than GPT-4 Vision? A: Yes! GPT-4o is newer, faster, cheaper, and more accurate
Q: How accurate is AI Vision? A: Very accurate! In testing: 95%+ for clear matches, <5% false positives
Support
For issues with AI Vision:
- Check this guide first
- Verify API key in
.envfile - Check OpenAI API status: https://status.openai.com
- Review troubleshooting section above
- Open GitHub issue if problem persists
Version: 2.0.0 Last Updated: 2025-10-10 Model: GPT-4o