Video Master-Adaptation Detection - Technical Documentation
Table of Contents
- Overview
- How It Works
- Architecture
- Matching Algorithm
- CLI Reference
- Batch Matching & HTML Reports
- Advanced Usage
- Understanding Results
- Performance Tuning
- Troubleshooting
- API Reference
Overview
This tool identifies which master video files were used to create adaptation videos (cutdowns, re-edits, speed changes, crops, etc.). It uses spatial-only matching that compares video content regardless of temporal order, making it robust to:
- Speed changes (slow-motion, time-lapse, speed ramping)
- Duration changes (15s adaptation from 20s master)
- Shot reordering (non-linear edits)
- Different aspect ratios (with separate masters per aspect ratio)
- Cropping and transformations
- Re-encoding and compression
Key Features
- ✅ Spatial-only video matching - Ignores timing, focuses on content
- ✅ Audio fingerprinting - Chromaprint-based robust audio matching
- ✅ Multi-master detection - Identifies all masters used in an adaptation
- ✅ Percentage contribution - Shows how much of each master was used
- ✅ Confidence scoring - Weighted scoring combining video + audio
- ✅ Batch processing - Bulk add masters from directories
How It Works
1. Fingerprinting Phase
When you add a master video or match an adaptation, the tool:
- Extracts frames at 2 frames per second (default, configurable)
- Creates perceptual hashes (8×8 DCT-based hashing)
- Extracts audio fingerprint using Chromaprint (if available)
- Stores fingerprints as JSON files for future comparisons
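To make the hashing step concrete, here is a minimal pure-Python sketch of an 8×8 DCT-based perceptual hash. It is an illustration under assumptions, not the code in fingerprinter.py: the function name `dct_hash`, the median thresholding, and the synthetic 16×16 input are all hypothetical, and a real frame would first be decoded and downscaled to a small grayscale grid.

```python
import math
import random

def dct_hash(gray, hash_size=8):
    """Return a hash_size*hash_size-bit perceptual hash of a square grayscale image.

    `gray` is a list of N rows with N brightness values each (N >= hash_size).
    Downscaling a real frame to N x N is assumed to have been done already.
    """
    n = len(gray)
    # Pre-compute the cosine basis: basis[k][i] = cos((2i + 1) * k * pi / (2n)).
    basis = [[math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
             for k in range(hash_size)]
    # Low-frequency corner of the (unnormalized) 2D DCT-II.
    coeffs = []
    for u in range(hash_size):
        for v in range(hash_size):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += gray[x][y] * basis[u][x] * basis[v][y]
            coeffs.append(total)
    # Threshold every coefficient against the median of the non-DC terms.
    ac = sorted(coeffs[1:])
    median = ac[len(ac) // 2]
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits

# Illustrative input: a synthetic 16x16 "frame" instead of a decoded video frame.
rng = random.Random(0)
frame = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
print(f"0x{dct_hash(frame):016x}")
```

Because each bit only records whether a low-frequency coefficient lies above the median, multiplying every pixel by a constant leaves the hash unchanged. This is one reason hashes of this kind tolerate moderate re-encoding.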
2. Matching Phase
When matching an adaptation against masters:
- Generates adaptation fingerprint (same process as masters)
- Spatial comparison: For each adaptation frame, finds the most similar frame in each master (anywhere in the timeline)
- Calculates percentage: (matching adaptation frames / total adaptation frames) × 100
- Combines signals: Weighted combination of video (70%) + audio (30%)
- Ranks results: Sorted by combined confidence score
Key Insight: Spatial-Only Matching
Traditional video matching fails when adaptations are:
- Speed-changed (frames at different timestamps)
- Reordered (shots in different sequence)
- Edited (missing sections, insertions)
Solution: We ask "Does this frame exist ANYWHERE in the master?" instead of "Does this frame exist at timestamp T?"
This makes matching robust to timing changes while still accurately identifying source content.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│  CLI Layer (cli.py)                                             │
│  Commands: add-master, list-masters, match, clear, status       │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│  Matcher Layer (matcher.py)                                     │
│  • Loads fingerprints                                           │
│  • Orchestrates comparison                                      │
│  • Calculates percentages & confidence                          │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│  Fingerprinter Layer (fingerprinter.py)                         │
│  • Video frame extraction (FFmpeg)                              │
│  • Perceptual hashing (8×8 DCT)                                 │
│  • Audio fingerprinting (Chromaprint)                           │
│  • Spatial-only comparison                                      │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│  Storage Layer                                                  │
│  • data/fingerprints/*.json - Fingerprint files                 │
│  • data/masters.json - Master video database                    │
└─────────────────────────────────────────────────────────────────┘
Core Components
1. VideoFingerprinter (fingerprinter.py)
- Extracts video frames and generates perceptual hashes
- Creates audio fingerprints using Chromaprint
- Supports configurable sampling rate (frames per second)
- Stores fingerprints as JSON for reuse
2. VideoMatcher (matcher.py)
- Manages master video database
- Performs spatial-only matching
- Calculates percentage contributions
- Generates confidence scores
3. CLI (cli.py)
- User-facing command-line interface
- Rich terminal output with tables and colors
- Progress bars for batch operations
Matching Algorithm
Spatial-Only Video Matching
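def compare_spatial_only(adaptation_fp, master_fp, threshold=0.70):
    """Simplified: percentage of adaptation frames found anywhere in the master."""
    # Fingerprints follow the JSON structure shown under File Formats
    adaptation_frames = adaptation_fp["video_fp"]["frames"]
    master_frames = master_fp["video_fp"]["frames"]
    if not adaptation_frames or not master_frames:
        return 0.0
    matches = 0
    for adapt_frame in adaptation_frames:
        adapt_hash = int(adapt_frame["hash"], 16)
        best_similarity = 0.0
        # Compare against ALL master frames (ignore time)
        for master_frame in master_frames:
            # Hamming distance = number of differing bits in the 64-bit hashes
            distance = bin(adapt_hash ^ int(master_frame["hash"], 16)).count("1")
            # Convert distance to similarity: 1.0 = identical, 0.0 = all bits differ
            best_similarity = max(best_similarity, 1.0 - distance / 64)
        if best_similarity >= threshold:
            matches += 1
    return (matches / len(adaptation_frames)) * 100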
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| samples_per_second | 2.0 | Frames extracted per second (configurable in code) |
| frame_threshold | 0.70 | Minimum similarity for a frame to count as a match (0-1) |
| threshold | 0.30 | Minimum fraction of adaptation frames that must match for a master to be reported (0-1) |
Confidence Calculation
combined_score = (video_percentage / 100 × 0.7) + (audio_similarity × 0.3)
Confidence Levels:
- Very High: combined_score ≥ 0.90
- High: combined_score ≥ 0.75
- Medium: combined_score ≥ 0.60
- Low: combined_score ≥ 0.50
- Very Low: combined_score < 0.50
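The formula and levels above can be sketched as two small functions. The names `combined_score` and `confidence_level` are hypothetical and chosen for illustration; only the 70/30 weights and the cut-offs come from this document.

```python
# (minimum score, label) pairs, checked from the highest level down.
LEVELS = [
    (0.90, "Very High"),
    (0.75, "High"),
    (0.60, "Medium"),
    (0.50, "Low"),
]

def combined_score(video_percentage, audio_similarity,
                   video_weight=0.7, audio_weight=0.3):
    """Weighted score in [0, 1] from a video match percentage (0-100)
    and an audio similarity (0-1)."""
    return (video_percentage / 100.0) * video_weight + audio_similarity * audio_weight

def confidence_level(score):
    """Map a combined score to its confidence label."""
    for minimum, label in LEVELS:
        if score >= minimum:
            return label
    return "Very Low"

score = combined_score(video_percentage=100.0, audio_similarity=0.5)
print(round(score, 3), confidence_level(score))
```

Note that with the audio similarity fixed at 0.5, the combined score cannot exceed 0.85, so "Very High" is unreachable until audio comparison returns real values.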
CLI Reference
add-master - Add Master Video
Add a master video to the library.
python cli.py add-master <video_path> [--id <custom_id>]
Examples:
# Auto-generate ID from filename
python cli.py add-master /path/to/master.mp4
# Use custom ID
python cli.py add-master /path/to/master.mp4 --id master_v1
list-masters - List All Masters
Display all master videos in the library.
python cli.py list-masters
Output:
- Master ID
- Filename
- Duration
- File path
match - Match Adaptation Video
Match an adaptation against all masters using spatial-only matching.
python cli.py match <video_path> [OPTIONS]
Options:
- --threshold, -t (default: 0.3): Minimum percentage of frames matching (0-1)
- --frame-threshold, -f (default: 0.70): Similarity threshold for individual frames (0-1)
Examples:
# Default matching
python cli.py match /path/to/adaptation.mp4
# Stricter matching (require 50% of frames)
python cli.py match /path/to/adaptation.mp4 -t 0.5
# More sensitive frame matching
python cli.py match /path/to/adaptation.mp4 -f 0.65
# Combined: require 70% match with sensitive frame detection
python cli.py match /path/to/adaptation.mp4 -t 0.7 -f 0.65
status - System Status
Check system dependencies and library statistics.
python cli.py status
Shows:
- FFmpeg availability
- Chromaprint/AcoustID status
- TMK status
- Number of master videos
batch-match - Batch Match Folder
Match all videos in a folder and generate an HTML report.
python cli.py batch-match <folder_path> [OPTIONS]
Options:
- --threshold, -t (default: 0.3): Minimum percentage match (0-1)
- --frame-threshold, -f (default: 0.70): Frame similarity threshold (0-1)
- --output, -o: Output HTML file path (default: auto-generated timestamp)
Examples:
# Process all videos in folder
python cli.py batch-match /path/to/adaptations/
# Custom thresholds
python cli.py batch-match /path/to/adaptations/ -t 0.5 -f 0.75
# Custom output filename
python cli.py batch-match /path/to/adaptations/ -o report.html
Output:
- Generates timestamped HTML report: matching_report_YYYYMMDD_HHMMSS.html
- Shows summary statistics in terminal
- Provides clickable file:// URL to open report
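As an illustration of the naming pattern and the file:// link, the following sketch builds both with the standard library. The helper names `default_report_name` and `report_url` are hypothetical; how cli.py actually produces them is not shown in this document.

```python
from datetime import datetime
from pathlib import Path

def default_report_name(now=None):
    """Build a name of the form matching_report_YYYYMMDD_HHMMSS.html."""
    now = now or datetime.now()
    return f"matching_report_{now:%Y%m%d_%H%M%S}.html"

def report_url(path):
    """Turn a report path into a file:// URL that most terminals make clickable."""
    return Path(path).resolve().as_uri()

name = default_report_name()
print(name)
print(report_url(name))
```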
clear - Clear Library
Remove all master videos from the library.
python cli.py clear
⚠️ Warning: This deletes all fingerprints and master records. Cannot be undone.
Batch Matching & HTML Reports
Overview
The batch matching feature allows you to process an entire folder of adaptation videos and generate a comprehensive HTML report showing which masters were used for each adaptation.
Usage
Command Line:
# Basic usage
python cli.py batch-match /path/to/adaptations/
# With custom thresholds
python cli.py batch-match /path/to/adaptations/ -t 0.5 -f 0.75
# Specify output filename
python cli.py batch-match /path/to/adaptations/ -o my_report.html
Standalone Script:
# You can also use the standalone script
python batch_match.py /path/to/adaptations/
python batch_match.py /path/to/adaptations/ --output reports/batch_results.html
HTML Report Features
The generated HTML report includes:
1. Summary Dashboard
- Total adaptations processed
- Number of matched adaptations
- Number with no matches
- Total master matches across all adaptations
2. Per-Adaptation Cards
Each adaptation is shown in a card with:
- Adaptation filename
- Number of matches badge
- List of all matching masters
- Error message (if processing failed)
3. Per-Master Match Details
For each matching master:
- Master ID and filename
- Color-coded confidence badge:
- 🟢 Green - Very High/High confidence
- 🟡 Yellow - Medium confidence
- 🔴 Red - Low/Very Low confidence
- Master duration
- Video match percentage
- Frames matched (X/Y format)
- Combined confidence score
- Visual progress bar showing match percentage
4. Design Features
- Modern gradient design (purple theme)
- Responsive layout (works on mobile/tablet/desktop)
- Hover effects on cards
- Print-friendly styling
- Clean, professional appearance
Example Workflow
# 1. Add all masters
python bulk_add_masters.py "masters/" -r
# 2. Process all adaptations
python cli.py batch-match "adaptations/"
# 3. Open the generated report
open matching_report_20251010_153045.html
# 4. Review results:
# - Which adaptations matched which masters
# - Confidence levels for each match
# - Any processing errors
Use Cases
Quality Control:
- Verify adaptations were created from correct masters
- Check if all expected masters were used
- Identify adaptations with low confidence matches
Production Tracking:
- Document which masters were used for each delivery
- Generate audit trail of master usage
- Track adaptation creation workflow
Asset Management:
- Identify unused masters
- Find duplicate or similar adaptations
- Organize video library by source masters
Report Customization
The HTML report can be customized by editing batch_match.py:
/* Line 23: Change color scheme */
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
/* Line 80: Adjust card styling */
.adaptation {
    background: white;
    padding: 25px;
    border-radius: 15px;
}
/* Line 150: Modify confidence colors */
.confidence-very-high { background: #51cf66; }
.confidence-high { background: #69db7c; }
Advanced Usage
Bulk Adding Masters
Use the bulk_add_masters.py script to add multiple videos at once:
# Add all .mp4 files from a directory
python bulk_add_masters.py /path/to/masters/
# Recursively add from subdirectories
python bulk_add_masters.py /path/to/masters/ --recursive
# Add specific pattern
python bulk_add_masters.py /path/to/masters/ --pattern "*.mov"
Adjusting Sampling Rate
The default is 2 frames per second, optimized for fast-paced advertising content with quick edits.
Edit src/video_matcher/fingerprinter.py:106:
samples_per_second = 2.0 # Default: good for ads with quick cuts
samples_per_second = 1.0 # Faster: basic matching, may miss quick edits
samples_per_second = 3.0 # Slower: catches sub-second cuts
Trade-offs:
| Rate | 20s Video | Use Case | Pros | Cons |
|---|---|---|---|---|
| 0.5 fps | 10 frames | Long-form content | Fast, small files | May miss cuts |
| 1.0 fps | 20 frames | General purpose | Balanced | Misses quick edits |
| 2.0 fps | 40 frames | Ads/Marketing | Catches quick cuts | 2x storage vs 1 fps |
| 3.0 fps | 60 frames | Sub-second cuts | Very detailed | 3x slower than 1 fps |
Recommendation: Keep 2 fps for advertising/marketing content with fast edits.
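The trade-off can be checked with a small calculation. The sketch below assumes frames are sampled at evenly spaced timestamps starting at 0.0 s, which may differ from how FFmpeg is invoked in fingerprinter.py; `sample_timestamps` and `shot_is_sampled` are hypothetical helpers.

```python
def sample_timestamps(duration, samples_per_second=2.0):
    """Evenly spaced timestamps (in seconds) at which frames would be sampled."""
    count = int(duration * samples_per_second)
    return [i / samples_per_second for i in range(count)]

def shot_is_sampled(start, end, duration, samples_per_second):
    """True if at least one sampled timestamp falls inside the shot [start, end)."""
    return any(start <= t < end
               for t in sample_timestamps(duration, samples_per_second))

# Frame counts for a 20-second video at each rate from the table.
for rate in (0.5, 1.0, 2.0, 3.0):
    print(rate, len(sample_timestamps(20.0, rate)))

# A 0.4-second shot from 3.2 s to 3.6 s falls between the 1 fps samples.
print(shot_is_sampled(3.2, 3.6, 20.0, 1.0), shot_is_sampled(3.2, 3.6, 20.0, 2.0))
```

A shot shorter than the sampling interval can fall entirely between two samples, which is why lower rates may miss quick edits.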
Handling Different Aspect Ratios
Best Practice: Maintain separate masters for each aspect ratio:
masters/
├── 16x9/
│   ├── master_A_16x9.mp4
│   └── master_B_16x9.mp4
├── 9x16/
│   ├── master_A_9x16.mp4
│   └── master_B_9x16.mp4
└── 1x1/
    ├── master_A_1x1.mp4
    └── master_B_1x1.mp4
Add all versions to the library:
python bulk_add_masters.py masters/16x9/ -r
python bulk_add_masters.py masters/9x16/ -r
python bulk_add_masters.py masters/1x1/ -r
The matcher will automatically identify the correct aspect ratio master.
Understanding Results
Sample Output
Found 2 master(s) matching this adaptation:
╭──────┬────────────┬─────────────┬────────┬───────┬──────────┬────────────╮
│ Rank │ Master ID  │ Video Match │ Frames │ Audio │ Combined │ Confidence │
├──────┼────────────┼─────────────┼────────┼───────┼──────────┼────────────┤
│ 1    │ master_C   │ 100.0%      │ 15/15  │ 0.500 │ 0.850    │ High       │
│ 2    │ master_B   │ 73.3%       │ 11/15  │ 0.500 │ 0.663    │ Medium     │
╰──────┴────────────┴─────────────┴────────┴───────┴──────────┴────────────╯
Best Match:
Master: master_C
Video frames matched: 100.0% (15/15 frames)
Average frame similarity: 94.4%
Audio similarity: 0.500
Combined confidence: 85.0%
Interpreting Scores
Video Match Percentage:
- 100%: All adaptation frames found in master
- 75-99%: Most frames match, likely correct master
- 50-74%: Partial match, possibly similar content
- <50%: Unlikely to be source master
Average Frame Similarity:
- >90%: Near-identical frames (same encoding/quality)
- 75-90%: Very similar (different encoding/compression)
- 60-75%: Similar content (crops, color grading)
- <60%: Different content or heavy transformations
Combined Score:
- Weighted combination: 70% video + 30% audio
- Audio helps disambiguate visually similar masters
- Higher combined score = more confident match
When Multiple Masters Match
If an adaptation uses content from multiple masters:
Best Match:
Master: master_A - 60% of frames
Other Potential Matches:
• master_B: 40% of frames
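Each percentage is computed independently per master, so the values only add up to 100% when the masters share no footage. In that case, this indicates the adaptation combined: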
- 60% content from master_A
- 40% content from master_B
Performance Tuning
Speed vs Accuracy
For faster matching (lower accuracy):
# Reduce sampling rate (1.0 = 1 frame per second)
samples_per_second = 1.0
# Increase thresholds (stricter matching)
frame_threshold = 0.75
threshold = 0.5
For better accuracy (slower):
# Increase sampling rate (3.0 = 3 frames per second)
samples_per_second = 3.0
# Lower thresholds (more sensitive)
frame_threshold = 0.65
threshold = 0.3
Default (balanced for ads):
samples_per_second = 2.0 # Catches quick edits
frame_threshold = 0.70
threshold = 0.3
Large Libraries
For libraries with 100+ masters:
- Pre-filter by duration: Skip masters that are too short/long for the adaptation
- Use audio pre-filtering: Match audio first, then only check video for audio matches
- Parallel processing: Compare against multiple masters simultaneously
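Two of these ideas can be sketched with the standard library. Everything here is an assumption for illustration: `prefilter_by_duration`, `compare_in_parallel`, and the ratio bounds are hypothetical, and neither feature is described as implemented in this document. Master records are assumed to carry the `duration` field shown in masters.json.

```python
from concurrent.futures import ThreadPoolExecutor

def prefilter_by_duration(masters, adaptation_duration, min_ratio=0.5, max_ratio=20.0):
    """Keep masters whose duration is plausible for the adaptation.

    min_ratio and max_ratio are hypothetical bounds on
    master_duration / adaptation_duration; tune them for your content.
    """
    if adaptation_duration <= 0:
        return list(masters)
    kept = []
    for master in masters:
        ratio = master["duration"] / adaptation_duration
        if min_ratio <= ratio <= max_ratio:
            kept.append(master)
    return kept

def compare_in_parallel(compare, adaptation_fp, master_fps, max_workers=4):
    """Run compare(adaptation_fp, master_fp) for every master.

    Results are returned in the same order as master_fps.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda master_fp: compare(adaptation_fp, master_fp),
                             master_fps))
```

Threads keep the sketch simple, but a pure-Python comparison loop is CPU-bound and will not speed up under threads in CPython. `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface for that case, provided `compare` is a picklable top-level function. Keep the ratio bounds loose, since speed changes and multi-master edits weaken the link between master and adaptation duration.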
Troubleshooting
Common Issues
❌ No matches found
Cause: Thresholds too strict, or videos unrelated
Solution:
# Try more lenient settings
python cli.py match video.mp4 -t 0.2 -f 0.65
❌ Too many false positives
Cause: Thresholds too lenient, similar-looking content
Solution:
# Stricter matching
python cli.py match video.mp4 -t 0.5 -f 0.75
❌ Speed-changed adaptations not matching
Cause: Not the speed change itself; spatial matching already ignores timing
Check:
- Ensure video content is actually similar
- Lower frame_threshold if heavily processed
❌ Different aspect ratios not matching
Solution: Ensure you have masters in the same aspect ratio
# Add masters for each aspect ratio
python cli.py add-master master_16x9.mp4
python cli.py add-master master_1x1.mp4
❌ Audio similarity always 0.500
Cause: Chromaprint comparison not fully implemented (placeholder)
Note: This is a POC limitation. Video matching still works.
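For reference, one common way to compare Chromaprint fingerprints is a bit error rate over their raw form, a list of 32-bit integers such as those printed by `fpcalc -raw`. The sketch below is not the tool's implementation: `audio_similarity` and `max_offset` are hypothetical, and decoding the stored fingerprint string into integers is assumed to have happened already.

```python
def audio_similarity(fp1, fp2, max_offset=0):
    """Bit-level similarity (0-1) between two raw fingerprints.

    Each fingerprint is a list of 32-bit integers. Alignments shifted by up to
    max_offset items in either direction are tried and the best score is kept.
    """
    best = 0.0
    for offset in range(-max_offset, max_offset + 1):
        a = fp1[offset:] if offset > 0 else fp1
        b = fp2[-offset:] if offset < 0 else fp2
        n = min(len(a), len(b))
        if n == 0:
            continue
        # Count differing bits over the overlapping items (zip stops at the shorter list).
        differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(a, b))
        best = max(best, 1.0 - differing / (32 * n))
    return best

print(audio_similarity([1, 2, 3], [1, 2, 3]))
print(audio_similarity([9, 1, 2, 3], [1, 2, 3], max_offset=1))
```

Two unrelated fingerprints agree on roughly half their bits by chance, so scores near 0.5 from a measure like this carry little information. A cutdown that starts mid-way through the master also needs a larger `max_offset` to find the right alignment.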
API Reference
VideoFingerprinter
from video_matcher.fingerprinter import VideoFingerprinter
fp = VideoFingerprinter(data_dir="data/fingerprints")
# Generate fingerprint
fingerprint = fp.fingerprint_video(
    video_path="/path/to/video.mp4",
    video_id="my_video"
)
# Load existing fingerprint
existing = fp.load_fingerprint("my_video")
# List all fingerprints
all_ids = fp.list_fingerprints()
VideoMatcher
from video_matcher.matcher import VideoMatcher
matcher = VideoMatcher(data_dir="data")
# Add master
matcher.add_master(
    video_path="/path/to/master.mp4",
    master_id="master_1"
)
# List masters
masters = matcher.list_masters()
# Match adaptation
matches = matcher.match_adaptation(
    video_path="/path/to/adaptation.mp4",
    threshold=0.3,
    frame_threshold=0.70
)
# Clear all masters
matcher.clear_masters()
Comparison Functions
from video_matcher.fingerprinter import (
    compare_spatial_only,
    compare_audio_fingerprints
)
# Spatial video comparison
result = compare_spatial_only(
    adaptation_fp=adapt_fp,
    master_fp=master_fp,
    similarity_threshold=0.75
)
# Returns: {
#     'matching_frames': 12,
#     'total_frames': 15,
#     'percentage': 80.0,
#     'average_similarity': 0.87
# }
# Audio comparison
audio_score = compare_audio_fingerprints(
    fp1=adapt_audio,
    fp2=master_audio
)
# Returns: float (0-1)
File Formats
Fingerprint JSON Structure
{
  "video_id": "master_example",
  "path": "/path/to/video.mp4",
  "filename": "video.mp4",
  "info": {
    "duration": 20.0,
    "width": 1920,
    "height": 1080,
    "fps": 25.0,
    "has_audio": true,
    "codec": "h264"
  },
  "audio_fp": {
    "duration": 20.0,
    "fingerprint": "AQAAZEw4Kc9w...",
    "method": "chromaprint"
  },
  "video_fp": {
    "method": "basic_hash",
    "samples_per_second": 1.0,
    "num_frames": 20,
    "frames": [
      {
        "frame_id": 0,
        "timestamp": 0.0,
        "hash": "0xcfcfc7e3c3e3e3e3"
      }
    ]
  }
}
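A fingerprint file in this shape can be read with the standard `json` module. The sketch below parses the hex hash strings into integers so they can be compared bit by bit; `load_frame_hashes` is a hypothetical helper, and the inline document is a trimmed copy of the example above.

```python
import json

def load_frame_hashes(fingerprint_json):
    """Parse a fingerprint document and return [(timestamp, hash_as_int), ...]."""
    doc = json.loads(fingerprint_json)
    frames = doc["video_fp"]["frames"]
    # int(..., 16) accepts the "0x" prefix used in the stored hashes.
    return [(frame["timestamp"], int(frame["hash"], 16)) for frame in frames]

example = """
{
  "video_id": "master_example",
  "video_fp": {
    "method": "basic_hash",
    "samples_per_second": 1.0,
    "num_frames": 20,
    "frames": [
      {"frame_id": 0, "timestamp": 0.0, "hash": "0xcfcfc7e3c3e3e3e3"}
    ]
  }
}
"""

for timestamp, frame_hash in load_frame_hashes(example):
    print(timestamp, f"{frame_hash:064b}")
```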
Masters Database (masters.json)
{
  "masters": [
    {
      "master_id": "master_example",
      "fingerprint_id": "master_master_example",
      "path": "/path/to/video.mp4",
      "filename": "video.mp4",
      "duration": 20.0
    }
  ]
}
Future Enhancements
Production-Ready Improvements
- TMK Integration - Facebook's TMK+PDQF (Temporal Match Kernel) for more robust matching
- Segment Timeline - Show exactly which parts came from which master
- Web UI - Drag-drop interface with side-by-side comparison
- Batch Processing - Process hundreds of adaptations in parallel
- Database Storage - PostgreSQL/MongoDB instead of JSON files
- Vector Search - Milvus/Qdrant for sub-second matching in large libraries
- GPU Acceleration - CUDA-based hash computation
- CLIP Embeddings - Handle heavy crops, overlays, graphics
- Shot Detection - PySceneDetect for segment-level matching
- Audio Refinement - Proper Chromaprint comparison implementation
Suggested Architecture for Scale
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    Web UI    │────▶│ API Gateway  │────▶│  Job Queue   │
│   (React)    │     │  (FastAPI)   │     │   (Celery)   │
└──────────────┘     └──────────────┘     └──────┬───────┘
                                                 │
                    ┌──────────────┐     ┌───────▼───────┐
                    │  Vector DB   │────▶│    Workers    │
                    │   (Qdrant)   │     │  (GPU-based)  │
                    └──────────────┘     └───────────────┘
License
MIT License - See LICENSE file for details.
Support & Contact
For questions, issues, or contributions, please open an issue on the GitHub repository.
Documentation Version: 1.0
Last Updated: 2025-10-05