# Video Master-Adaptation Detection - Technical Documentation

## Table of Contents

1. [Overview](#overview)
2. [How It Works](#how-it-works)
3. [Architecture](#architecture)
4. [Matching Algorithm](#matching-algorithm)
5. [CLI Reference](#cli-reference)
6. [Batch Matching & HTML Reports](#batch-matching--html-reports)
7. [Advanced Usage](#advanced-usage)
8. [Understanding Results](#understanding-results)
9. [Performance Tuning](#performance-tuning)
10. [Troubleshooting](#troubleshooting)
11. [API Reference](#api-reference)
12. [File Formats](#file-formats)
13. [Future Enhancements](#future-enhancements)
14. [License](#license)
15. [Support & Contact](#support--contact)

---

## Overview

This tool identifies which master video files were used to create adaptation videos (cutdowns, re-edits, speed changes, crops, etc.). It uses **spatial-only matching**, which compares video content regardless of temporal order, making it robust to:

- **Speed changes** (slow motion, time-lapse, speed ramping)
- **Duration changes** (e.g., a 15s adaptation from a 20s master)
- **Shot reordering** (non-linear edits)
- **Different aspect ratios** (with separate masters per aspect ratio)
- **Light cropping and transformations** (heavy crops and overlays remain difficult; see [Future Enhancements](#future-enhancements))
- **Re-encoding and compression**

### Key Features

- ✅ **Spatial-only video matching** - Ignores timing, focuses on content
- ✅ **Audio fingerprinting** - Chromaprint-based audio fingerprints (audio comparison is currently a placeholder; see [Troubleshooting](#troubleshooting))
- ✅ **Multi-master detection** - Identifies all masters used in an adaptation
- ✅ **Percentage contribution** - Shows what percentage of the adaptation's frames were found in each master
- ✅ **Confidence scoring** - Weighted scoring combining video + audio
- ✅ **Batch processing** - Bulk-add masters from directories and batch-match whole folders into an HTML report

---

## How It Works

### 1. Fingerprinting Phase

When you add a master video or match an adaptation, the tool:

1. **Extracts frames** at 2 frames per second (default, configurable)
2. **Creates perceptual hashes** (8×8 DCT-based hashing)
3. **Extracts audio fingerprint** using Chromaprint (if available)
4. **Stores fingerprints** as JSON files for future comparisons

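The hashing step is described only as "8×8 DCT-based". Below is a minimal pure-Python sketch of one common scheme of that kind (pHash-style), assuming the frame has already been converted to grayscale and downscaled to 32×32: compute the low-frequency 8×8 block of the 2D DCT-II and set one bit per coefficient that lies above the block's median. The 32×32 input size, the median rule, and the `phash` name are assumptions, not confirmed details of `fingerprinter.py`.

```python
import math
import statistics

HASH_SIZE = 8   # keep an 8x8 block of low frequencies -> 64-bit hash
IMG_SIZE = 32   # assumed downscaled frame size (pHash convention)

# Precompute the DCT-II cosine terms for the low frequencies we keep.
_COS = [[math.cos(math.pi * (2 * x + 1) * u / (2 * IMG_SIZE)) for x in range(IMG_SIZE)]
        for u in range(HASH_SIZE)]


def phash(gray):
    """Return a 64-bit perceptual hash for a 32x32 grayscale frame.

    `gray` is a list of 32 rows, each a list of 32 pixel intensities.
    """
    # Unnormalized 2D DCT-II, computed only for the top-left 8x8 coefficients.
    coeffs = []
    for u in range(HASH_SIZE):
        for v in range(HASH_SIZE):
            total = 0.0
            for x in range(IMG_SIZE):
                row = gray[x]
                row_weight = _COS[u][x]
                for y in range(IMG_SIZE):
                    total += row[y] * row_weight * _COS[v][y]
            coeffs.append(total)

    # One bit per coefficient: 1 if it is above the median of all 64.
    median = statistics.median(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits
```

Two frames can then be compared by counting the differing bits between their hashes. Identical frames give identical hashes, while a horizontally mirrored frame flips the sign of some coefficients and therefore changes the hash.
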
### 2. Matching Phase

When matching an adaptation against masters, the tool:

1. **Generates the adaptation fingerprint** (same process as for masters)
2. **Spatial comparison**: for each adaptation frame, finds the most similar frame in each master (anywhere in the timeline)
3. **Calculates percentage**: (matching adaptation frames / total adaptation frames) × 100%
4. **Combines signals**: weighted combination of video (70%) + audio (30%)
5. **Ranks results**: masters that meet the reporting threshold are sorted by combined confidence score

### Key Insight: Spatial-Only Matching

Timestamp-aligned video matching (comparing the adaptation and master frame by frame at the same time positions) fails when adaptations are:

- Speed-changed (frames at different timestamps)
- Reordered (shots in a different sequence)
- Edited (missing sections, insertions)

**Solution**: We ask "Does this frame exist ANYWHERE in the master?" instead of "Does this frame exist at timestamp T?"

This makes matching robust to timing changes while still accurately identifying source content.

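A toy illustration of the difference, using made-up hash values and exact hash equality in place of the real similarity threshold for brevity: a slowed-down, reordered adaptation fails a position-aligned comparison but is fully found by the "anywhere in the master" check.

```python
def positional_match_rate(adaptation, master):
    """Fraction of adaptation frames equal to the master frame at the same index."""
    hits = sum(1 for i, h in enumerate(adaptation) if i < len(master) and master[i] == h)
    return hits / len(adaptation)


def anywhere_match_rate(adaptation, master):
    """Fraction of adaptation frames that appear anywhere in the master."""
    master_set = set(master)
    hits = sum(1 for h in adaptation if h in master_set)
    return hits / len(adaptation)


# Master: four shots, one sampled frame each (toy hash values).
master = [0xA1, 0xB2, 0xC3, 0xD4]
# Adaptation: shots reordered (C before A), B dropped, and slowed down
# (each shot sampled twice).
adaptation = [0xC3, 0xC3, 0xA1, 0xA1, 0xD4, 0xD4]

print(positional_match_rate(adaptation, master))  # 0.0
print(anywhere_match_rate(adaptation, master))    # 1.0
```
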
---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│ CLI Layer (cli.py)                                              │
│ Commands: add-master, list-masters, match, batch-match,         │
│           clear, status                                         │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│ Matcher Layer (matcher.py)                                      │
│ • Loads fingerprints                                            │
│ • Orchestrates comparison                                       │
│ • Calculates percentages & confidence                           │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│ Fingerprinter Layer (fingerprinter.py)                          │
│ • Video frame extraction (FFmpeg)                               │
│ • Perceptual hashing (8×8 DCT)                                  │
│ • Audio fingerprinting (Chromaprint)                            │
│ • Spatial-only comparison                                       │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│ Storage Layer                                                   │
│ • data/fingerprints/*.json - Fingerprint files                  │
│ • data/masters.json - Master video database                     │
└─────────────────────────────────────────────────────────────────┘
```

### Core Components

#### 1. `VideoFingerprinter` (fingerprinter.py)

- Extracts video frames and generates perceptual hashes
- Creates audio fingerprints using Chromaprint
- Supports configurable sampling rate (frames per second)
- Stores fingerprints as JSON for reuse

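The document states only that frames are extracted with FFmpeg at a configurable rate. A sketch of one way to do it, assuming FFmpeg is on `PATH` and using its `fps`, `scale`, and `format` filters to stream 8-bit grayscale frames to stdout. The helper names and the 32×32 frame size are illustrative, not the actual `VideoFingerprinter` internals:

```python
import subprocess

FRAME_SIZE = 32  # assumed hash input size (32x32 grayscale)


def ffmpeg_sample_command(video_path, samples_per_second=2.0, size=FRAME_SIZE):
    """Build an FFmpeg command that writes sampled grayscale frames to stdout."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", video_path,
        # Sample N frames per second, downscale to size x size, convert to 8-bit gray.
        "-vf", f"fps={samples_per_second},scale={size}:{size},format=gray",
        "-f", "rawvideo", "-",
    ]


def split_raw_frames(raw, size=FRAME_SIZE):
    """Split raw gray bytes into frames, each a list of `size` rows of ints.

    A trailing partial frame, if any, is ignored.
    """
    frame_bytes = size * size
    frames = []
    for start in range(0, len(raw) - frame_bytes + 1, frame_bytes):
        chunk = raw[start:start + frame_bytes]
        frames.append([list(chunk[r * size:(r + 1) * size]) for r in range(size)])
    return frames


def extract_frames(video_path, samples_per_second=2.0):
    """Run FFmpeg and return the sampled grayscale frames."""
    result = subprocess.run(
        ffmpeg_sample_command(video_path, samples_per_second),
        capture_output=True, check=True,
    )
    return split_raw_frames(result.stdout)
```

Squashing every frame to a square discards the aspect ratio; that is a common choice for perceptual hashing, since only coarse structure is kept.
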
#### 2. `VideoMatcher` (matcher.py)

- Manages master video database
- Performs spatial-only matching
- Calculates percentage contributions
- Generates confidence scores

#### 3. `CLI` (cli.py)

- User-facing command-line interface
- Rich terminal output with tables and colors
- Progress bars for batch operations

---

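## Matching Algorithm

### Spatial-Only Video Matching

A simplified version of the comparison follows. It reads the hex frame hashes from the fingerprint JSON (see [File Formats](#file-formats)) and treats similarity as 1 minus the normalized Hamming distance; the library function additionally reports matched-frame counts and average similarity (see [API Reference](#api-reference)).

```python
def hamming_similarity(hash_a: str, hash_b: str, bits: int = 64) -> float:
    """1.0 minus the fraction of differing bits between two hex hashes."""
    distance = bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")
    return 1.0 - distance / bits


def compare_spatial_only(adaptation_fp, master_fp, similarity_threshold=0.70):
    adaptation_frames = adaptation_fp["video_fp"]["frames"]
    master_frames = master_fp["video_fp"]["frames"]
    total_frames = len(adaptation_frames)
    if total_frames == 0 or not master_frames:
        return 0.0

    matches = 0
    for adapt_frame in adaptation_frames:
        # Compare against ALL master frames (ignore time) and keep the best score
        best_similarity = max(
            hamming_similarity(adapt_frame["hash"], master_frame["hash"])
            for master_frame in master_frames
        )
        if best_similarity >= similarity_threshold:
            matches += 1

    percentage = (matches / total_frames) * 100
    return percentage
```
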
### Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `samples_per_second` | 2.0 | Frames extracted per second (set in code, not a CLI option) |
| `frame_threshold` | 0.70 | Minimum similarity for a single frame to count as a match (0-1); CLI `--frame-threshold` / `-f` |
| `threshold` | 0.30 | Minimum fraction of adaptation frames that must match for a master to be reported (0-1); CLI `--threshold` / `-t` |

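To relate `frame_threshold` to the hashes themselves, assume similarity is 1 minus the fraction of differing bits between two 64-bit hashes (an assumption; the exact formula is in `fingerprinter.py`). Each threshold then corresponds to a maximum Hamming distance:

```python
import math


def max_differing_bits(frame_threshold, hash_bits=64):
    """Largest Hamming distance that still counts as a frame match.

    similarity = 1 - distance / hash_bits >= frame_threshold
    => distance <= (1 - frame_threshold) * hash_bits
    """
    # The small epsilon guards against float error at exact boundaries (e.g. 0.75 -> 16.0).
    return math.floor((1 - frame_threshold) * hash_bits + 1e-9)


for t in (0.65, 0.70, 0.75):
    print(t, max_differing_bits(t))
# 0.65 22
# 0.7 19
# 0.75 16
```

Under this assumption, lowering `-f` from 0.70 to 0.65 tolerates three more flipped bits per frame.
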
### Confidence Calculation

```
combined_score = (video_percentage / 100 × 0.7) + (audio_similarity × 0.3)

Confidence Levels:
- Very High: combined_score ≥ 0.90
- High:      combined_score ≥ 0.75
- Medium:    combined_score ≥ 0.60
- Low:       combined_score ≥ 0.50
- Very Low:  combined_score < 0.50
```

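A minimal Python sketch of the same formula and level mapping. The weights and cut-offs come from this section; the function names are illustrative rather than the actual `matcher.py` API:

```python
def combined_score(video_percentage, audio_similarity, video_weight=0.7, audio_weight=0.3):
    """Weighted combination of video match (0-100 %) and audio similarity (0-1)."""
    return (video_percentage / 100) * video_weight + audio_similarity * audio_weight


# Cut-offs from the confidence levels above, checked from highest to lowest.
CONFIDENCE_LEVELS = [(0.90, "Very High"), (0.75, "High"), (0.60, "Medium"), (0.50, "Low")]


def confidence_level(score):
    for minimum, label in CONFIDENCE_LEVELS:
        if score >= minimum:
            return label
    return "Very Low"


# Reproduces the two rows of the sample output in "Understanding Results".
for video_pct in (100.0, 73.3):
    score = combined_score(video_pct, 0.5)
    print(f"{score:.3f} {confidence_level(score)}")
# 0.850 High
# 0.663 Medium
```
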
---

## CLI Reference

### `add-master` - Add Master Video

Add a master video to the library.

```bash
python cli.py add-master <video_path> [--id <custom_id>]
```

**Examples:**

```bash
# Auto-generate ID from filename
python cli.py add-master /path/to/master.mp4

# Use custom ID
python cli.py add-master /path/to/master.mp4 --id master_v1
```

### `list-masters` - List All Masters

Display all master videos in the library.

```bash
python cli.py list-masters
```

**Output:**

- Master ID
- Filename
- Duration
- File path

### `match` - Match Adaptation Video

Match an adaptation against all masters using spatial-only matching.

```bash
python cli.py match <video_path> [OPTIONS]
```

**Options:**

- `--threshold`, `-t` (default: 0.3): Minimum fraction of adaptation frames that must match a master for it to be reported (0-1)
- `--frame-threshold`, `-f` (default: 0.70): Similarity threshold for individual frames (0-1)

**Examples:**

```bash
# Default matching
python cli.py match /path/to/adaptation.mp4

# Stricter matching (require 50% of frames)
python cli.py match /path/to/adaptation.mp4 -t 0.5

# More sensitive frame matching
python cli.py match /path/to/adaptation.mp4 -f 0.65

# Combined: require 70% of frames, with more sensitive frame matching
python cli.py match /path/to/adaptation.mp4 -t 0.7 -f 0.65
```

### `status` - System Status

Check system dependencies and library statistics.

```bash
python cli.py status
```

**Shows:**

- FFmpeg availability
- Chromaprint/AcoustID status
- TMK status
- Number of master videos

### `batch-match` - Batch Match Folder

Match all videos in a folder and generate an HTML report.

```bash
python cli.py batch-match <folder_path> [OPTIONS]
```

**Options:**

- `--threshold`, `-t` (default: 0.3): Minimum fraction of adaptation frames that must match (0-1)
- `--frame-threshold`, `-f` (default: 0.70): Frame similarity threshold (0-1)
- `--output`, `-o`: Output HTML file path (default: auto-generated timestamped filename)

**Examples:**

```bash
# Process all videos in folder
python cli.py batch-match /path/to/adaptations/

# Custom thresholds
python cli.py batch-match /path/to/adaptations/ -t 0.5 -f 0.75

# Custom output filename
python cli.py batch-match /path/to/adaptations/ -o report.html
```

**Output:**

- Generates an HTML report, named `matching_report_YYYYMMDD_HHMMSS.html` unless `--output` is given
- Shows summary statistics in the terminal
- Provides a clickable `file://` URL to open the report

### `clear` - Clear Library

Remove all master videos from the library.

```bash
python cli.py clear
```

⚠️ **Warning:** This deletes all fingerprints and master records and cannot be undone.

---

## Batch Matching & HTML Reports

### Overview

The batch matching feature processes an entire folder of adaptation videos and generates an HTML report showing which masters were used for each adaptation.

### Usage

**Command Line:**

```bash
# Basic usage
python cli.py batch-match /path/to/adaptations/

# With custom thresholds
python cli.py batch-match /path/to/adaptations/ -t 0.5 -f 0.75

# Specify output filename
python cli.py batch-match /path/to/adaptations/ -o my_report.html
```

**Standalone Script:**

```bash
# You can also use the standalone script
python batch_match.py /path/to/adaptations/
python batch_match.py /path/to/adaptations/ --output reports/batch_results.html
```

### HTML Report Features

The generated HTML report includes:

**1. Summary Dashboard**

- Total adaptations processed
- Number of matched adaptations
- Number with no matches
- Total master matches across all adaptations

**2. Per-Adaptation Cards**

Each adaptation is shown in a card with:

- Adaptation filename
- Number of matches badge
- List of all matching masters
- Error message (if processing failed)

**3. Per-Master Match Details**

For each matching master:

- Master ID and filename
- Color-coded confidence badge:
  - 🟢 **Green** - Very High/High confidence
  - 🟡 **Yellow** - Medium confidence
  - 🔴 **Red** - Low/Very Low confidence
- Master duration
- Video match percentage
- Frames matched (X/Y format)
- Combined confidence score
- Visual progress bar showing match percentage

**4. Design Features**

- Modern gradient design (purple theme)
- Responsive layout (works on mobile/tablet/desktop)
- Hover effects on cards
- Print-friendly styling
- Clean, professional appearance

### Example Workflow

```bash
# 1. Add all masters
python bulk_add_masters.py "masters/" -r

# 2. Process all adaptations
python cli.py batch-match "adaptations/"

# 3. Open the generated report (`open` is macOS; use `xdg-open` on Linux)
open matching_report_20251010_153045.html

# 4. Review results:
#    - Which adaptations matched which masters
#    - Confidence levels for each match
#    - Any processing errors
```

### Use Cases

**Quality Control:**

- Verify adaptations were created from correct masters
- Check if all expected masters were used
- Identify adaptations with low confidence matches

**Production Tracking:**

- Document which masters were used for each delivery
- Generate audit trail of master usage
- Track adaptation creation workflow

**Asset Management:**

- Identify unused masters
- Find duplicate or similar adaptations
- Organize video library by source masters

### Report Customization

The HTML report can be customized by editing the CSS embedded in `batch_match.py`:

```css
/* Line 23: change color scheme */
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);

/* Line 80: adjust card styling */
.adaptation {
    background: white;
    padding: 25px;
    border-radius: 15px;
}

/* Line 150: modify confidence colors */
.confidence-very-high { background: #51cf66; }
.confidence-high { background: #69db7c; }
```

---

## Advanced Usage

### Bulk Adding Masters

Use the `bulk_add_masters.py` script to add multiple videos at once:

```bash
# Add all .mp4 files from a directory
python bulk_add_masters.py /path/to/masters/

# Recursively add from subdirectories
python bulk_add_masters.py /path/to/masters/ --recursive

# Add files matching a specific pattern
python bulk_add_masters.py /path/to/masters/ --pattern "*.mov"
```

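The internals of `bulk_add_masters.py` are not shown here. As a rough sketch of the file discovery its flags imply (default `*.mp4`, a `--pattern` glob, `--recursive` into subdirectories), assuming `pathlib` globbing; the function name is hypothetical:

```python
from pathlib import Path


def find_master_files(directory, pattern="*.mp4", recursive=False):
    """Return sorted file paths in `directory` matching the glob `pattern`."""
    root = Path(directory)
    # rglob descends into subdirectories; glob only looks at the top level.
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())
```

Each returned path could then be passed to `VideoMatcher.add_master()` from the API Reference.
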
### Adjusting Sampling Rate

The default is **2 frames per second**, chosen for fast-paced advertising content with quick edits.

Edit `src/video_matcher/fingerprinter.py:106` and keep one of these values:

```python
samples_per_second = 2.0  # Default: good for ads with quick cuts
# samples_per_second = 1.0  # Faster: basic matching, may miss quick edits
# samples_per_second = 3.0  # Slower: catches sub-second cuts
```

**Trade-offs:**

| Rate | Frames per 20s video | Use Case | Pros | Cons |
|------|----------------------|----------|------|------|
| 0.5 fps | 10 | Long-form content | Fast, small files | May miss cuts |
| 1.0 fps | 20 | General purpose | Balanced | Misses quick edits |
| **2.0 fps** | **40** | **Ads/Marketing** | **Catches quick cuts** | **2x storage, ~4x frame comparisons vs. 1 fps** |
| 3.0 fps | 60 | Very fast cutting | Very detailed | 3x storage, ~9x frame comparisons vs. 1 fps |

**Recommendation:** Keep 2 fps for advertising/marketing content with fast edits.

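The trade-off is steeper for matching than for storage. Frames per video grow linearly with the sampling rate, but the spatial-only comparison in Matching Algorithm checks every adaptation frame against every master frame, so comparisons per adaptation/master pair grow with the square of the rate. A quick calculation, assuming a 20 s adaptation matched against one 20 s master:

```python
def frames_sampled(duration_s, samples_per_second):
    # Approximate: FFmpeg's fps filter may emit one frame more or fewer.
    return int(duration_s * samples_per_second)


def pairwise_comparisons(adapt_s, master_s, samples_per_second):
    """Hash comparisons for one adaptation/master pair with all-pairs matching."""
    return frames_sampled(adapt_s, samples_per_second) * frames_sampled(master_s, samples_per_second)


for rate in (0.5, 1.0, 2.0, 3.0):
    print(rate, frames_sampled(20, rate), pairwise_comparisons(20, 20, rate))
# 0.5 10 100
# 1.0 20 400
# 2.0 40 1600
# 3.0 60 3600
```
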
### Handling Different Aspect Ratios

**Best Practice:** Maintain separate masters for each aspect ratio:

```
masters/
├── 16x9/
│   ├── master_A_16x9.mp4
│   └── master_B_16x9.mp4
├── 9x16/
│   ├── master_A_9x16.mp4
│   └── master_B_9x16.mp4
└── 1x1/
    ├── master_A_1x1.mp4
    └── master_B_1x1.mp4
```

Add all versions to the library:

```bash
python bulk_add_masters.py masters/16x9/ -r
python bulk_add_masters.py masters/9x16/ -r
python bulk_add_masters.py masters/1x1/ -r
```

The matcher will then pick out the master in the adaptation's aspect ratio automatically, because its frames are the closest match.

---

## Understanding Results

### Sample Output

```
Found 2 master(s) matching this adaptation:

╭──────┬───────────┬─────────────┬────────┬───────┬──────────┬────────────╮
│ Rank │ Master ID │ Video Match │ Frames │ Audio │ Combined │ Confidence │
├──────┼───────────┼─────────────┼────────┼───────┼──────────┼────────────┤
│ 1    │ master_C  │      100.0% │  15/15 │ 0.500 │    0.850 │ High       │
│ 2    │ master_B  │       73.3% │  11/15 │ 0.500 │    0.663 │ Medium     │
╰──────┴───────────┴─────────────┴────────┴───────┴──────────┴────────────╯

Best Match:
  Master: master_C
  Video frames matched: 100.0% (15/15 frames)
  Average frame similarity: 94.4%
  Audio similarity: 0.500
  Combined confidence: 85.0%
```

### Interpreting Scores

**Video Match Percentage:**

- **100%**: All adaptation frames found in master
- **75-99%**: Most frames match, likely correct master
- **50-74%**: Partial match, possibly similar content
- **<50%**: Unlikely to be source master

**Average Frame Similarity:**

- **>90%**: Near-identical frames (same encoding/quality)
- **75-90%**: Very similar (different encoding/compression)
- **60-75%**: Similar content (crops, color grading)
- **<60%**: Different content or heavy transformations

**Combined Score:**

- Weighted combination: 70% video + 30% audio
- Audio helps disambiguate visually similar masters
- Higher combined score = more confident match

### When Multiple Masters Match

If an adaptation uses content from multiple masters:

```
Best Match:
  Master: master_A - 60% of frames

Other Potential Matches:
  • master_B: 40% of frames
```

This suggests the adaptation combined content from both masters, roughly:

- 60% content from master_A
- 40% content from master_B

Each master's percentage is computed independently, so a frame that exists in both masters counts toward both, and the percentages can add up to more than 100%.

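Per-master percentages are computed independently, so a frame found in two masters (a shared end card, for example) counts toward both. A small sketch with made-up per-frame results for checking that overlap before reading the numbers as a split; the input format (one set of matched master IDs per adaptation frame) is hypothetical:

```python
def master_percentages(frame_matches, master_ids):
    """frame_matches: one set per adaptation frame, holding the masters it matched."""
    total = len(frame_matches)
    return {m: 100 * sum(m in s for s in frame_matches) / total for m in master_ids}


def shared_frame_percentage(frame_matches):
    """Share of adaptation frames that matched more than one master."""
    return 100 * sum(len(s) > 1 for s in frame_matches) / len(frame_matches)


# Ten sampled frames: six only in A, three only in B, one (e.g. an end card) in both.
frames = [{"master_A"}] * 6 + [{"master_B"}] * 3 + [{"master_A", "master_B"}]
print(master_percentages(frames, ["master_A", "master_B"]))
# {'master_A': 70.0, 'master_B': 40.0}
print(shared_frame_percentage(frames))  # 10.0
```

Here the two percentages add up to 110%, so they should not be read as a strict 70/40 split.
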
---

## Performance Tuning

### Speed vs Accuracy

**For faster matching (lower accuracy):**

```python
# Reduce sampling rate (1.0 = 1 frame per second); this is what cuts comparison time
samples_per_second = 1.0

# Stricter thresholds: fewer false positives, but more missed matches
frame_threshold = 0.75
threshold = 0.5
```

**For better accuracy (slower):**

```python
# Increase sampling rate (3.0 = 3 frames per second)
samples_per_second = 3.0

# Lower the frame threshold (more sensitive); keep the default match threshold
frame_threshold = 0.65
threshold = 0.3
```

**Default (balanced for ads):**

```python
samples_per_second = 2.0  # Catches quick edits
frame_threshold = 0.70
threshold = 0.3
```

### Large Libraries

For libraries with 100+ masters, consider:

1. **Pre-filtering by duration:**
   - Skip masters that are implausibly short or long for the adaptation (keep the bounds loose, since cutdowns and speed changes alter duration)
2. **Audio pre-filtering:**
   - Match audio first, then only check video for audio matches (requires a working audio comparison; see [Troubleshooting](#troubleshooting))
3. **Parallel processing:**
   - Compare against multiple masters simultaneously

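A sketch of the duration pre-filter, using the `duration` field stored for each master in `masters.json`. The `min_ratio`/`max_ratio` tolerances are hypothetical: because a master can be much longer than a cutdown, or shorter than a slowed-down adaptation, the bounds need to be loose or genuine sources will be dropped.

```python
def prefilter_by_duration(masters, adaptation_duration, min_ratio=0.5, max_ratio=10.0):
    """Keep masters whose duration is plausible for this adaptation.

    min_ratio/max_ratio are hypothetical tolerances relative to the
    adaptation's duration (in seconds).
    """
    low = adaptation_duration * min_ratio
    high = adaptation_duration * max_ratio
    return [m for m in masters if low <= m["duration"] <= high]


masters = [
    {"master_id": "spot_30s", "duration": 30.0},
    {"master_id": "spot_20s", "duration": 20.0},
    {"master_id": "sting_3s", "duration": 3.0},
    {"master_id": "longform_600s", "duration": 600.0},
]
print([m["master_id"] for m in prefilter_by_duration(masters, 15.0)])
# ['spot_30s', 'spot_20s']
```

Only the masters that pass the filter would then go through the full frame-by-frame comparison.
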
---

## Troubleshooting

### Common Issues

**❌ No matches found**

**Cause:** Thresholds too strict, or the videos are unrelated

**Solution:**

```bash
# Try more lenient settings
python cli.py match video.mp4 -t 0.2 -f 0.65
```

---

**❌ Too many false positives**

**Cause:** Thresholds too lenient, or similar-looking content across masters

**Solution:**

```bash
# Stricter matching
python cli.py match video.mp4 -t 0.5 -f 0.75
```

---

**❌ Speed-changed adaptations not matching**

**Cause:** Speed changes alone should not prevent a match, because spatial matching ignores timing. The frames themselves are likely too different.

**Check:**

- Ensure the video content is actually similar
- Lower `frame_threshold` (`-f`) if the adaptation is heavily processed

---

**❌ Different aspect ratios not matching**

**Solution:** Ensure the library contains a master in the same aspect ratio as the adaptation

```bash
# Add masters for each aspect ratio
python cli.py add-master master_16x9.mp4
python cli.py add-master master_1x1.mp4
```

---

**❌ Audio similarity always 0.500**

**Cause:** Chromaprint comparison is not fully implemented yet (placeholder value)

**Note:** This is a proof-of-concept (POC) limitation. Video matching still works.

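With audio fixed at 0.500, the combined score reduces to 0.7 × (video % / 100) + 0.15, so it tops out at 0.85 and "Very High" cannot be reached. Solving the Confidence Calculation formula for the video percentage gives the minimum each level needs while the placeholder is in place (the names below are illustrative):

```python
AUDIO_PLACEHOLDER = 0.5
LEVELS = {"Very High": 0.90, "High": 0.75, "Medium": 0.60, "Low": 0.50}


def min_video_percentage(level_score, audio_similarity=AUDIO_PLACEHOLDER):
    """Video match % needed so that 0.7 * video / 100 + 0.3 * audio reaches level_score."""
    return (level_score - 0.3 * audio_similarity) / 0.7 * 100


for name, score in LEVELS.items():
    needed = min_video_percentage(score)
    status = "unreachable" if needed > 100 else f"{needed:.1f}%"
    print(f"{name}: {status}")
# Very High: unreachable
# High: 85.7%
# Medium: 64.3%
# Low: 50.0%
```

This matches the sample output, where a 100% video match yields a combined score of exactly 0.850.
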
---

## API Reference

### VideoFingerprinter

```python
from video_matcher.fingerprinter import VideoFingerprinter

fp = VideoFingerprinter(data_dir="data/fingerprints")

# Generate fingerprint
fingerprint = fp.fingerprint_video(
    video_path="/path/to/video.mp4",
    video_id="my_video"
)

# Load existing fingerprint
existing = fp.load_fingerprint("my_video")

# List all fingerprints
all_ids = fp.list_fingerprints()
```

### VideoMatcher

```python
from video_matcher.matcher import VideoMatcher

matcher = VideoMatcher(data_dir="data")

# Add master
matcher.add_master(
    video_path="/path/to/master.mp4",
    master_id="master_1"
)

# List masters
masters = matcher.list_masters()

# Match adaptation
matches = matcher.match_adaptation(
    video_path="/path/to/adaptation.mp4",
    threshold=0.3,
    frame_threshold=0.70
)

# Clear all masters
matcher.clear_masters()
```

### Comparison Functions

```python
from video_matcher.fingerprinter import (
    compare_spatial_only,
    compare_audio_fingerprints
)

# Spatial video comparison
result = compare_spatial_only(
    adaptation_fp=adapt_fp,
    master_fp=master_fp,
    similarity_threshold=0.75
)
# Returns: {
#     'matching_frames': 12,
#     'total_frames': 15,
#     'percentage': 80.0,
#     'average_similarity': 0.87
# }

# Audio comparison
audio_score = compare_audio_fingerprints(
    fp1=adapt_audio,
    fp2=master_audio
)
# Returns: float (0-1)
```

---

## File Formats

### Fingerprint JSON Structure

```json
{
  "video_id": "master_example",
  "path": "/path/to/video.mp4",
  "filename": "video.mp4",
  "info": {
    "duration": 20.0,
    "width": 1920,
    "height": 1080,
    "fps": 25.0,
    "has_audio": true,
    "codec": "h264"
  },
  "audio_fp": {
    "duration": 20.0,
    "fingerprint": "AQAAZEw4Kc9w...",
    "method": "chromaprint"
  },
  "video_fp": {
    "method": "basic_hash",
    "samples_per_second": 1.0,
    "num_frames": 20,
    "frames": [
      {
        "frame_id": 0,
        "timestamp": 0.0,
        "hash": "0xcfcfc7e3c3e3e3e3"
      }
    ]
  }
}
```

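A minimal standard-library sketch for reading such a file and turning the hex `hash` strings into integers, which makes Hamming-distance comparisons a single XOR. Only the field names shown above are assumed; both helpers are hypothetical:

```python
import json


def load_frame_hashes(path):
    """Return (timestamp, integer hash) pairs from a fingerprint JSON file."""
    with open(path, encoding="utf-8") as f:
        fingerprint = json.load(f)
    frames = fingerprint["video_fp"]["frames"]
    # int(..., 16) accepts the "0x" prefix used in the stored hashes.
    return [(frame["timestamp"], int(frame["hash"], 16)) for frame in frames]


def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")
```
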
### Masters Database (masters.json)

```json
{
  "masters": [
    {
      "master_id": "master_example",
      "fingerprint_id": "master_master_example",
      "path": "/path/to/video.mp4",
      "filename": "video.mp4",
      "duration": 20.0
    }
  ]
}
```

---

## Future Enhancements

### Production-Ready Improvements

1. **TMK Integration** - Facebook's TMK+PDQF (Temporal Match Kernel) video hashing for more robust matching
2. **Segment Timeline** - Show exactly which parts came from which master
3. **Web UI** - Drag-and-drop interface with side-by-side comparison
4. **Parallel Batch Processing** - Process hundreds of adaptations in parallel
5. **Database Storage** - PostgreSQL/MongoDB instead of JSON files
6. **Vector Search** - Milvus/Qdrant for sub-second matching in large libraries
7. **GPU Acceleration** - CUDA-based hash computation
8. **CLIP Embeddings** - Handle heavy crops, overlays, and graphics
9. **Shot Detection** - PySceneDetect for segment-level matching
10. **Audio Refinement** - Proper Chromaprint comparison implementation

### Suggested Architecture for Scale

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    Web UI    │────▶│ API Gateway  │────▶│  Job Queue   │
│   (React)    │     │  (FastAPI)   │     │   (Celery)   │
└──────────────┘     └──────────────┘     └──────┬───────┘
                                                 │
                     ┌──────────────┐     ┌──────▼───────┐
                     │  Vector DB   │────▶│   Workers    │
                     │   (Qdrant)   │     │ (GPU-based)  │
                     └──────────────┘     └──────────────┘
```

---

## License

MIT License - See LICENSE file for details.

---

## Support & Contact

For questions, issues, or contributions, please open an issue on the GitHub repository.

**Documentation Version:** 1.0

**Last Updated:** 2025-10-05