# Batch Processing Guide

## Overview

This guide covers how to process entire folders of adaptation videos and generate comprehensive HTML reports.

**Last Updated:** January 2025 (Tested & Verified)

---
## 🚀 Quick Start

### Process a Folder of Videos

```bash
# Fast mode (recommended for same-aspect videos)
python batch_match_fast.py /path/to/adaptations/ report.html

# Full mode (with AKAZE verification)
python cli.py batch-match /path/to/adaptations/ -o report.html
```

---
## 📋 Prerequisites

### 1. Add Master Videos First

Before batch processing, make sure your master videos are registered:

```bash
# Bulk add all masters from folder
python bulk_add_masters.py /path/to/masters/ -r

# Verify masters are loaded
python cli.py list-masters
```

**Expected output:**

```
              Master Videos
╭──────────┬───────────┬──────────┬──────╮
│ ID       │ Filename  │ Duration │ Path │
├──────────┼───────────┼──────────┼──────┤
│ master_1 │ video.mp4 │ 20.0s    │ ...  │
│ ...      │ ...       │ ...      │ ...  │
╰──────────┴───────────┴──────────┴──────╯

✓ 46 masters registered
```

---
## ⚡ Batch Processing Modes

### Mode 1: Fast Batch (Recommended)

**Use when:**

- Adaptations keep the master's aspect ratio (1x1 → 1x1, 9x16 → 9x16, 16x9 → 16x9)
- Quick results are needed
- Perceptual hashing alone can be trusted to identify the right master

**Command:**

```bash
python batch_match_fast.py /path/to/adaptations/ output_report.html
```

**Features:**

- ✅ Perceptual hash matching (fast)
- ✅ Metadata filtering (if filenames follow naming conventions)
- ✅ AI Vision fallback (if no matches)
- ❌ AKAZE verification (skipped for speed)

**Performance:**

- ~8-12 seconds per video
- **Example:** 39 videos in 5-8 minutes
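Fast mode's speed comes from perceptual hashing. Each sampled frame is reduced to a short bit string, and two frames are compared with a single XOR plus a bit count. This guide doesn't say which hash the tool uses, so the sketch below assumes a simple average hash (aHash) over an 8x8 grayscale thumbnail per frame. The function names are hypothetical. One useful property of aHash: a uniform brightness shift moves the mean by the same amount, so the hash does not change. Small local overlays such as a logo tend to flip only the few bits they cover.

```python
from typing import List, Sequence

# Assumption: one 8x8 grayscale thumbnail (values 0-255) per sampled frame.
# Average hash is used for illustration; the tool's actual hash may differ.
Thumbnail = Sequence[Sequence[int]]


def average_hash(thumb: Thumbnail) -> int:
    """64-bit average hash: one bit per pixel, set when brighter than the mean."""
    pixels = [p for row in thumb for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hash_similarity(a: int, b: int, n_bits: int = 64) -> float:
    """Similarity in [0, 1]: 1 minus the normalized Hamming distance."""
    return 1.0 - bin(a ^ b).count("1") / n_bits


def frames_matched(adapt: List[int], master: List[int], frame_threshold: float = 0.80) -> int:
    """Count index-aligned frame pairs (an assumed alignment) that meet the cutoff."""
    return sum(1 for a, m in zip(adapt, master) if hash_similarity(a, m) >= frame_threshold)
```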
---
### Mode 2: Full Batch (Most Accurate)

**Use when:**

- Adaptations change aspect ratio relative to the master (e.g., 16x9 → 1x1 or 9x16)
- Final validation is needed
- An audit trail is required
- Extra verification is desired

**Command:**

```bash
python cli.py batch-match /path/to/adaptations/ -o output_report.html
```

**Features:**

- ✅ Perceptual hash pre-filtering
- ✅ AKAZE verification (top 5 candidates)
- ✅ Metadata filtering
- ✅ AI Vision fallback

**Performance:**

- ~15-25 seconds per video
- **Example:** 39 videos in 10-15 minutes
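Full mode adds AKAZE verification between the hash stage and the AI Vision fallback. The sketch below shows only the control flow, and it rests on several assumptions:

- AKAZE and AI Vision are passed in as plain callables, because their real interfaces aren't documented here.
- The 0.80 pre-filter cutoff is assumed.
- `Candidate` and `match_full` are hypothetical names.

The design point is that the expensive verifier runs on at most five masters per adaptation, however large the master library grows.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    master_id: str
    hash_score: float  # perceptual-hash similarity in [0, 1]


def match_full(
    candidates: List[Candidate],
    verify: Callable[[str], bool],             # stand-in for AKAZE verification
    ai_fallback: Callable[[], Optional[str]],  # stand-in for the AI Vision step
    top_k: int = 5,
    hash_threshold: float = 0.80,              # assumed pre-filter cutoff
) -> List[str]:
    """Return verified master IDs, best hash score first."""
    # Stage 1: cheap hash pre-filter, keeping the top_k most similar masters.
    shortlist = sorted(
        (c for c in candidates if c.hash_score >= hash_threshold),
        key=lambda c: c.hash_score,
        reverse=True,
    )[:top_k]
    # Stage 2: run the expensive verifier only on the shortlist.
    verified = [c.master_id for c in shortlist if verify(c.master_id)]
    if verified:
        return verified
    # Stage 3: nothing survived, so fall back to AI Vision.
    fallback = ai_fallback()
    return [fallback] if fallback else []
```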
---
## 📊 Understanding the Output

### Terminal Output

During processing, you'll see:

```
Found 39 video file(s) to process

Comparing against 46 master(s)...

Processing adaptations...
[████████████████████████] 100%

✓ Report generated successfully!

Summary:
Total adaptations: 39
Matched: 38
No matches: 1
Total master matches: 38

📄 Report saved to: report.html

Open in browser: file:///path/to/report.html
```

### HTML Report Structure

The generated HTML report contains:

#### 1. **Header Section**

- Report title and timestamp
- Source folder path

#### 2. **Summary Dashboard** (6 Statistics Cards)

```
┌─────────────────────────────────────────────────────┐
│ 39 Adaptations  │ 38 Matched      │ 1 No Match      │
│ 38 Total Matches│ 35 HASH         │ 1 AI Vision     │
└─────────────────────────────────────────────────────┘
```

**Cards show:**

- Total adaptations processed
- Number matched
- Number with no matches
- Total master matches found
- AKAZE match count
- AI Vision match count

#### 3. **Individual Adaptation Cards**

Each adaptation shows:

```
┌────────────────────────────────────────────────────┐
│ AT_de_1011A_Spring_Feed_FB_1x1_6_A_5466976.mp4 │
│ [3 Matches] 🟢 │
├────────────────────────────────────────────────────┤
│ #1 5368067_..._MASTER_1 [VERY HIGH] 🟢 │
│ Duration: 20s │ Video: 100.0% │ Method: HASH │
│ Frames: 12/12 │ Score: 85.0% │
│ ████████████████████████████████████████ 100% │
├────────────────────────────────────────────────────┤
│ #2 5368104_..._MASTER_1 [HIGH] 🟢 │
│ Duration: 15s │ Video: 100.0% │ Method: HASH │
│ Frames: 12/12 │ Score: 85.0% │
│ ████████████████████████████████████████ 100% │
└────────────────────────────────────────────────────┘
```

**Details shown:**

- Master ID (ranked by score and duration)
- Confidence badge (color-coded: green/yellow/red)
- Duration of master video
- Video match percentage
- Frame count (matched/total)
- Combined score
- Matching method (HASH/AKAZE/AI VISION)
- Visual progress bar
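Ranking by score first and duration second is consistent with the example card. There, both masters score 85.0%, and the 20s master is listed ahead of the 15s one. Below is a minimal sketch of that ordering as a tuple sort key. It assumes the tie-break is simply "longer master first". `MasterMatch` and `rank_matches` are hypothetical names, not the tool's own schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MasterMatch:
    # Hypothetical record; field names are illustrative only.
    master_id: str
    score: float       # combined score, in percent
    duration_s: float  # master duration in seconds


def rank_matches(matches: List[MasterMatch]) -> List[MasterMatch]:
    """Sort by score, highest first; equal scores put the longer master first."""
    return sorted(matches, key=lambda m: (m.score, m.duration_s), reverse=True)
```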
---
## 🎯 Real-World Example

### Test Case: Austrian Spring Fashion Campaign

**Setup:**

```bash
# Masters: 46 videos (various formats, variants, durations)
python bulk_add_masters.py /path/to/masters/ -r

# Adaptations: 39 videos (German language, Austrian market)
python batch_match_fast.py "/path/to/AT/" AT_report.html
```

**Results:**

```
Processing Time: 6 minutes 42 seconds

Summary:
Total adaptations: 39
Matched: 39
No matches: 0
Total master matches: 39

Method Breakdown:
Perceptual Hash: 39 (100%)
AKAZE: 0 (not run in fast mode)
AI Vision: 0 (not needed)

Average match confidence: 95.2%
```

**Findings:**

- ✅ All 39 adaptations matched successfully
- ✅ 100% match rates (12/12 frames)
- ✅ Different languages handled perfectly
- ✅ Logo/text differences ignored
- ✅ Correct master identification (longest duration ranked #1)

---
## 🔧 Advanced Options

### Custom Thresholds

```bash
# Adjust matching thresholds:
#   -t  match threshold (80%)
#   -f  frame similarity threshold
#   -m  minimum average similarity
python cli.py batch-match /path/to/folder/ \
  -t 0.80 \
  -f 0.80 \
  -m 0.90 \
  -o report.html
```

**When to adjust:**

- `-t` (threshold): Lower for fuzzy matching, higher for strict
- `-f` (frame threshold): Lower for heavily edited videos
- `-m` (min avg similarity): Lower for degraded quality videos
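This guide doesn't define exactly how the three flags interact. One plausible reading is sketched here purely as an assumption:

- `-f` decides whether an individual frame pair counts as matched.
- `-t` is the minimum fraction of frames that must match.
- `-m` is the minimum mean similarity across all frames.

`passes_thresholds` is a hypothetical helper, and the tool's actual logic may differ. It does show why lowering only one flag may not be enough: a candidate must clear every cutoff.

```python
from typing import Sequence


def passes_thresholds(
    frame_sims: Sequence[float],
    threshold: float = 0.80,        # -t (assumed): required fraction of matched frames
    frame_threshold: float = 0.80,  # -f (assumed): per-frame similarity cutoff
    min_avg: float = 0.90,          # -m (assumed): required mean similarity
) -> bool:
    """Return True if a candidate master clears all three cutoffs."""
    if not frame_sims:
        return False
    matched = sum(1 for s in frame_sims if s >= frame_threshold)
    match_ratio = matched / len(frame_sims)
    avg = sum(frame_sims) / len(frame_sims)
    return match_ratio >= threshold and avg >= min_avg
```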
### Process Multiple Folders

```bash
# Process by market
python batch_match_fast.py /path/to/AT/ AT_report.html
python batch_match_fast.py /path/to/DE/ DE_report.html
python batch_match_fast.py /path/to/FR/ FR_report.html
python batch_match_fast.py /path/to/UK/ UK_report.html

# Process by format
python batch_match_fast.py /path/to/1x1/ square_report.html
python batch_match_fast.py /path/to/9x16/ vertical_report.html
python batch_match_fast.py /path/to/16x9/ landscape_report.html
```

---
## 📈 Performance Guidelines

### Processing Time Estimates

| Video Count | Fast Mode | Full Mode |
|-------------|-----------|-----------|
| 10 | 2 min | 3-4 min |
| 25 | 4-5 min | 7-10 min |
| 50 | 8-10 min | 15-20 min |
| 100 | 15-20 min | 30-40 min |
| 500 | 80-100 min | 150-200 min |

**Variables affecting speed:**

- Video duration (longer = more frames)
- Number of masters in library
- CPU speed
- Disk I/O speed
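For batch sizes not in the table, the per-video timings quoted for each mode give a quick rough range: about 8-12 s in fast mode and 15-25 s in full mode. The helper below multiplies them out, assuming time scales linearly with video count. It ignores the variables listed above, so treat the result as a ballpark. `estimate_minutes` is a hypothetical name.

```python
from typing import Tuple

# Per-video seconds (low, high) as quoted in the mode descriptions above.
PER_VIDEO_SECONDS = {
    "fast": (8, 12),
    "full": (15, 25),
}


def estimate_minutes(video_count: int, mode: str = "fast") -> Tuple[float, float]:
    """Return a (low, high) wall-clock estimate in minutes, assuming linear scaling."""
    low_s, high_s = PER_VIDEO_SECONDS[mode]
    return (round(video_count * low_s / 60, 1), round(video_count * high_s / 60, 1))
```

For example, `estimate_minutes(39)` returns `(5.2, 7.8)`, in line with the 5-8 minutes quoted for the 39-video fast batch.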
### Memory Requirements

- **Small batch (<50 videos):** 2-4 GB RAM
- **Medium batch (50-200 videos):** 4-8 GB RAM
- **Large batch (>200 videos):** 8+ GB RAM

### Disk Space

- Fingerprint cache: ~20 KB per video
  - **Example:** 500 videos = ~10 MB cache
- Reports: ~500 KB - 2 MB per report

---
## 🔍 Troubleshooting

### Issue: Processing Hangs

**Symptom:** Processing stops or hangs on a video

**Solution:**

1. Check whether the video file is corrupted (no output means it decoded cleanly):

   ```bash
   ffmpeg -v error -i problem_video.mp4 -f null -
   ```

2. Skip problematic videos:

   ```bash
   # Move to a separate folder and process later
   mkdir -p ../problems
   mv problem_video.mp4 ../problems/
   ```

3. Use fast mode:

   ```bash
   python batch_match_fast.py /path/to/folder/ report.html
   ```
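To find problem files before a long batch rather than in the middle of one, you can run the `ffmpeg` check from step 1 over a whole folder. The sketch below does that. The timeout matters: a file that makes ffmpeg hang would otherwise hang the scan too. The 120 s value and the extension list are assumptions. `scan_folder` and its `run` parameter are hypothetical. `run` defaults to `subprocess.run` and exists only so the function can be tested without real videos. Whatever it reports can then be moved aside as in step 2.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".avi"}  # assumed set of extensions


def scan_folder(
    folder: str,
    timeout_s: float = 120.0,  # assumed per-file limit
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[Path]:
    """Return video files that ffmpeg reports errors for, or that time out."""
    bad: List[Path] = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in VIDEO_EXTS:
            continue
        cmd = ["ffmpeg", "-v", "error", "-i", str(path), "-f", "null", "-"]
        try:
            result = run(cmd, capture_output=True, text=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            bad.append(path)  # decoding hung: treat as a problem file
            continue
        # Any error output or a non-zero exit code means the file needs attention.
        if result.returncode != 0 or result.stderr.strip():
            bad.append(path)
    return bad
```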
---
### Issue: No Matches Found

**Symptom:** All or most videos show "No matches"

**Causes & Solutions:**

1. **Masters not registered:**

   ```bash
   python cli.py list-masters
   # If empty, add masters first
   python bulk_add_masters.py /path/to/masters/ -r
   ```

2. **Thresholds too strict:**

   ```bash
   # Lower thresholds
   python cli.py batch-match /path/to/folder/ -t 0.70 -f 0.75 -m 0.85
   ```

3. **Cross-aspect ratio videos:**

   ```bash
   # Use full mode; AI Vision triggers automatically when no matches are found
   python cli.py batch-match /path/to/folder/ -o report.html
   ```

4. **Different content:** Verify manually that the adaptations actually derive from your masters; you may need a different master library.

---
### Issue: Slow Processing

**Symptom:** Takes much longer than expected

**Solutions:**

1. **Use fast mode:**

   ```bash
   python batch_match_fast.py /path/to/folder/ report.html
   # Roughly 2x faster than full mode
   ```

2. **Check fingerprint cache:**

   ```bash
   ls -lh data/fingerprints/
   # Should have fingerprints for all masters
   # If missing, run: python bulk_add_masters.py /path/to/masters/ -r
   ```

3. **Reduce metadata filtering overhead:** Use fast mode, which already handles this, or adjust the filtering in `matcher.py`.

---
## 💡 Best Practices

### 1. Filename Conventions

For best metadata filtering results, use consistent naming:

**Good:**

```
Product_16x9_A_15s.mp4
Product_1x1_B_10s.mp4
Campaign_9x16_C_6s.mp4
```

**Less Ideal:**

```
video1.mp4
final_cut_v2.mp4
master_backup.mp4
```

**Metadata extraction looks for:**

- Format: `1x1`, `9x16`, `16x9`, `4x3`
- Variant: `A`, `B`, `C`, `D`, `E`, `F`
- Duration: `6s`, `10s`, `15s`, `20s`
### 2. Master Organization

Organize masters by campaign:

```
masters/
├── spring_2024/
│   ├── master_1x1_A_6s.mp4
│   ├── master_1x1_A_10s.mp4
│   └── master_1x1_A_15s.mp4
├── summer_2024/
│   └── ...
└── fall_2024/
    └── ...
```

### 3. Adaptation Organization

Organize adaptations by market:

```
adaptations/
├── AT/    # Austria
├── DE/    # Germany
├── FR/    # France
└── UK/    # United Kingdom
```

Or by format:

```
adaptations/
├── 1x1/   # Square
├── 9x16/  # Vertical
└── 16x9/  # Landscape
```

### 4. Report Naming

Use descriptive report names:

```bash
# Good: market, campaign and run date
python batch_match_fast.py AT/ AT_Spring2024_$(date +%Y%m%d).html
python batch_match_fast.py DE/ DE_Spring2024_$(date +%Y%m%d).html

# Or with a fixed date stamp
python batch_match_fast.py AT/ AT_Spring_20240126.html
```

---
## 📊 Interpreting Results

### Confidence Levels

| Badge | Meaning | Action |
|-------|---------|--------|
| 🟢 **VERY HIGH** | 90-100% confidence | Accept match |
| 🟢 **HIGH** | 75-89% confidence | Accept match |
| 🟡 **MEDIUM** | 60-74% confidence | Review recommended |
| 🔴 **LOW** | 50-59% confidence | Manual review required |
| 🔴 **VERY LOW** | <50% confidence | Likely incorrect |

### Match Percentage

- **100%**: Perfect match, all frames found
- **95-99%**: Excellent match, minor differences
- **80-94%**: Good match, some variations
- **60-79%**: Moderate match, review recommended
- **<60%**: Weak match, likely incorrect
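Both scales are simple threshold lookups, which makes them easy to apply when post-processing results. The minimal sketch below encodes them as descending cutoff lists. It makes three assumptions:

- The lower bound of each range is inclusive, so exactly 90% is VERY HIGH and a value like 89.5% falls to HIGH.
- The match percentage comes from the frame counts shown on each card (e.g., 12/12 frames = 100%).
- The function names are hypothetical.

```python
from typing import List, Tuple

# (inclusive lower bound in %, label), highest band first
CONFIDENCE_BANDS: List[Tuple[float, str]] = [
    (90, "VERY HIGH"),
    (75, "HIGH"),
    (60, "MEDIUM"),
    (50, "LOW"),
]
MATCH_BANDS: List[Tuple[float, str]] = [
    (100, "Perfect"),
    (95, "Excellent"),
    (80, "Good"),
    (60, "Moderate"),
]


def band(value: float, bands: List[Tuple[float, str]], floor_label: str) -> str:
    """Return the label of the first band whose lower bound the value reaches."""
    for lower, label in bands:
        if value >= lower:
            return label
    return floor_label


def confidence_badge(confidence_pct: float) -> str:
    return band(confidence_pct, CONFIDENCE_BANDS, "VERY LOW")


def match_quality(frames_matched: int, frames_total: int) -> str:
    # Assumed: match percentage = matched frames / total frames.
    return band(100.0 * frames_matched / frames_total, MATCH_BANDS, "Weak")
```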
### Method Indicators

- **HASH**: Matched via perceptual hash (fast, reliable)
- **AKAZE**: Verified via AKAZE features (robust, accurate)
- **AI VISION**: Matched via GPT-4V (cross-aspect, semantic)

---
## 🎯 Workflow Examples

### Daily Production Workflow

```bash
# 1. Process overnight batch
python batch_match_fast.py /incoming/daily/ daily_$(date +%Y%m%d).html

# 2. Review report in the morning (`open` is macOS; use `xdg-open` on Linux)
open daily_20240126.html

# 3. Export results if needed
# (Report is self-contained HTML)
```

### Quality Assurance Workflow

```bash
# 1. Fast pass for bulk checking
python batch_match_fast.py /batch1/ quick_check.html

# 2. Full pass for final validation
python cli.py batch-match /batch1/ -o final_validation.html

# 3. Compare results
# Both reports should show the same matches;
# the full pass also shows AKAZE verification
```
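Step 3 is more reliable with a mechanical diff than by eye. A minimal sketch follows. It assumes you have already pulled each report's results into a dict that maps each adaptation filename to its top-ranked master ID, with `None` for no match. The report's internal format isn't specified here, so that extraction step is left to you. `diff_reports` is a hypothetical helper. An adaptation missing from one report is treated the same as "no match".

```python
from typing import Dict, List, Optional, Tuple

Results = Dict[str, Optional[str]]  # adaptation -> top master ID (None = no match)


def diff_reports(fast: Results, full: Results) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """List (adaptation, fast_top, full_top) wherever the two passes disagree."""
    disagreements = []
    for adaptation in sorted(set(fast) | set(full)):
        a, b = fast.get(adaptation), full.get(adaptation)
        if a != b:
            disagreements.append((adaptation, a, b))
    return disagreements
```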
### Multi-Market Workflow

```bash
# Process each market separately
for market in AT DE FR UK ES IT; do
  python batch_match_fast.py "/markets/$market/" "${market}_report.html"
done

# Consolidate results
# Each market gets its own report for review
```

---
## 📝 Summary

**For most use cases, use Fast Mode:**

```bash
python batch_match_fast.py /path/to/adaptations/ report.html
```

**For final validation, use Full Mode:**

```bash
python cli.py batch-match /path/to/adaptations/ -o report.html
```

**Both modes:**

- ✅ Handle text/logo differences
- ✅ Support multiple languages
- ✅ Generate beautiful HTML reports
- ✅ Show confidence levels and methods
- ✅ Rank by best match

**Tested and verified with real-world data! 🎉**

---

**End of Guide**