# Batch Processing Guide

## Overview

This guide covers how to process entire folders of adaptation videos and generate comprehensive HTML reports.

**Last Updated:** January 2025 (Tested & Verified)

---

## 🚀 Quick Start

### Process a Folder of Videos

```bash
# Fast mode (recommended for same-aspect videos)
python batch_match_fast.py /path/to/adaptations/ report.html

# Full mode (with AKAZE verification)
python cli.py batch-match /path/to/adaptations/ -o report.html
```

---

## 📋 Prerequisites

### 1. Add Master Videos First

Before batch processing, make sure your master videos are registered:

```bash
# Bulk add all masters from a folder
python bulk_add_masters.py /path/to/masters/ -r

# Verify that the masters are loaded
python cli.py list-masters
```

**Expected output:**

```
              Master Videos
╭──────────┬───────────┬──────────┬──────╮
│ ID       │ Filename  │ Duration │ Path │
├──────────┼───────────┼──────────┼──────┤
│ master_1 │ video.mp4 │ 20.0s    │ ...  │
│ ...      │ ...       │ ...      │ ...  │
╰──────────┴───────────┴──────────┴──────╯
✓ 46 masters registered
```

---

## ⚡ Batch Processing Modes

### Mode 1: Fast Batch (Recommended)

**Use when:**
- Adaptations keep their master's aspect ratio (1x1 → 1x1, 9x16 → 9x16, 16x9 → 16x9)
- Quick results are needed
- You have high confidence in perceptual hash accuracy

**Command:**

```bash
python batch_match_fast.py /path/to/adaptations/ output_report.html
```

**Features:**
- ✅ Perceptual hash matching (fast)
- ✅ Metadata filtering (if filenames follow conventions)
- ✅ AI Vision fallback (if no matches)
- ❌ AKAZE verification (skipped for speed)

**Performance:**
- ~8-12 seconds per video
- **Example:** 39 videos in 5-8 minutes

---

### Mode 2: Full Batch (Most Accurate)

**Use when:**
- Adaptations change aspect ratio (e.g., 16x9 → 1x1 → 9x16)
- Final validation is needed
- An audit trail is required
- Extra verification is desired

**Command:**

```bash
python cli.py batch-match /path/to/adaptations/ -o output_report.html
```

**Features:**
- ✅ Perceptual hash pre-filtering
- ✅ AKAZE verification (top 5 candidates)
- ✅ Metadata filtering
- ✅ AI Vision fallback

**Performance:**
- ~15-25 seconds per video
- **Example:** 39 videos in 10-15 minutes

---

## 📊 Understanding the Output

### Terminal Output

During processing, you'll see:

```
Found 39 video file(s) to process
Comparing against 46 master(s)...

Processing adaptations... [████████████████████████] 100%

✓ Report generated successfully!

Summary:
  Total adaptations: 39
  Matched: 38
  No matches: 1
  Total master matches: 38

📄 Report saved to: report.html
Open in browser: file:///path/to/report.html
```

### HTML Report Structure

The generated HTML report contains:

#### 1. **Header Section**
- Report title and timestamp
- Source folder path
#### 2. **Summary Dashboard** (6 Statistics Cards)

```
┌───────────────────┬────────────────┬────────────────┐
│ 39 Adaptations    │ 38 Matched     │ 1 No Match     │
│ 38 Total Matches  │ 35 HASH        │ 1 AI Vision    │
└───────────────────┴────────────────┴────────────────┘
```

**Cards show:**
- Total adaptations processed
- Number matched
- Number with no matches
- Total master matches found
- AKAZE match count
- AI Vision match count

#### 3. **Individual Adaptation Cards**

Each adaptation shows:

```
┌────────────────────────────────────────────────────┐
│ AT_de_1011A_Spring_Feed_FB_1x1_6_A_5466976.mp4     │
│ [3 Matches] 🟢                                     │
├────────────────────────────────────────────────────┤
│ #1 5368067_..._MASTER_1  [VERY HIGH] 🟢            │
│ Duration: 20s │ Video: 100.0% │ Method: HASH       │
│ Frames: 12/12 │ Score: 85.0%                       │
│ ████████████████████████████████████████ 100%      │
├────────────────────────────────────────────────────┤
│ #2 5368104_..._MASTER_1  [HIGH] 🟢                 │
│ Duration: 15s │ Video: 100.0% │ Method: HASH       │
│ Frames: 12/12 │ Score: 85.0%                       │
│ ████████████████████████████████████████ 100%      │
└────────────────────────────────────────────────────┘
```

**Details shown:**
- Master ID (ranked by score and duration)
- Confidence badge (color-coded: green/yellow/red)
- Duration of the master video
- Video match percentage
- Frame count (matched/total)
- Combined score
- Matching method (HASH/AKAZE/AI VISION)
- Visual progress bar

---

## 🎯 Real-World Example

### Test Case: Austrian Spring Fashion Campaign

**Setup:**

```bash
# Masters: 46 videos (various formats, variants, durations)
python bulk_add_masters.py /path/to/masters/ -r

# Adaptations: 39 videos (German language, Austrian market)
python batch_match_fast.py "/path/to/AT/" AT_report.html
```

**Results:**

```
Processing Time: 6 minutes 42 seconds

Summary:
  Total adaptations: 39
  Matched: 39
  No matches: 0
  Total master matches: 39

Method Breakdown:
  Perceptual Hash: 39 (100%)
  AKAZE: 0 (not run in fast mode)
  AI Vision: 0 (not needed)

Average match confidence: 95.2%
```

**Findings:**
- ✅ All 39 adaptations matched successfully
- ✅ 100% frame match rates (12/12 frames)
- ✅ Different languages handled perfectly
- ✅ Logo/text differences ignored
- ✅ Correct master identified (longest duration ranked #1)

---

## 🔧 Advanced Options

### Custom Thresholds

```bash
# Adjust matching thresholds:
#   -t  match threshold (0.80 = 80%)
#   -f  frame similarity threshold
#   -m  minimum average similarity
python cli.py batch-match /path/to/folder/ \
  -t 0.80 \
  -f 0.80 \
  -m 0.90 \
  -o report.html
```

**When to adjust:**
- `-t` (threshold): lower for fuzzy matching, higher for strict matching
- `-f` (frame threshold): lower for heavily edited videos
- `-m` (min avg similarity): lower for degraded-quality videos

### Process Multiple Folders

```bash
# Process by market
python batch_match_fast.py /path/to/AT/ AT_report.html
python batch_match_fast.py /path/to/DE/ DE_report.html
python batch_match_fast.py /path/to/FR/ FR_report.html
python batch_match_fast.py /path/to/UK/ UK_report.html

# Process by format
python batch_match_fast.py /path/to/1x1/ square_report.html
python batch_match_fast.py /path/to/9x16/ vertical_report.html
python batch_match_fast.py /path/to/16x9/ landscape_report.html
```

---

## 📈 Performance Guidelines

### Processing Time Estimates

| Video Count | Fast Mode  | Full Mode   |
|-------------|------------|-------------|
| 10          | 2 min      | 3-4 min     |
| 25          | 4-5 min    | 7-10 min    |
| 50          | 8-10 min   | 15-20 min   |
| 100         | 15-20 min  | 30-40 min   |
| 500         | 80-100 min | 150-200 min |

**Variables affecting speed:**
- Video duration (longer = more frames)
- Number of masters in the library
- CPU speed
- Disk I/O speed

### Memory Requirements

- **Small batch (<50 videos):** 2-4 GB RAM
- **Medium batch (50-200 videos):** 4-8 GB RAM
- **Large batch (>200 videos):** 8+ GB RAM

### Disk Space

- Fingerprint cache: ~20 KB per video
  - **Example:** 500 videos = ~10 MB cache
- Reports: ~500 KB to 2 MB per report

---

## 🔍 Troubleshooting

### Issue: Processing Hangs

**Symptom:** Processing stops or hangs on a video

**Solution:**

1. Check whether the video file is corrupted:
   ```bash
   ffmpeg -v error -i problem_video.mp4 -f null -
   ```
2. Skip problematic videos:
   ```bash
   # Move to a separate folder and process later
   mv problem_video.mp4 ../problems/
   ```
3. Use fast mode:
   ```bash
   python batch_match_fast.py /path/to/folder/ report.html
   ```

---

### Issue: No Matches Found

**Symptom:** All or most videos show "No matches"

**Causes & Solutions:**

1. **Masters not registered:**
   ```bash
   python cli.py list-masters
   # If empty, add masters first
   python bulk_add_masters.py /path/to/masters/ -r
   ```
2. **Thresholds too strict:**
   ```bash
   # Lower thresholds
   python cli.py batch-match /path/to/folder/ -t 0.70 -f 0.75 -m 0.85
   ```
3. **Cross-aspect ratio videos:**
   ```bash
   # Use full mode with AI Vision
   python cli.py batch-match /path/to/folder/ -o report.html
   # AI Vision will trigger automatically
   ```
4. **Different content:**
   ```bash
   # Verify manually that the adaptations come from your masters
   # You may need a different master library
   ```

---

### Issue: Slow Processing

**Symptom:** Takes much longer than expected

**Solutions:**

1. **Use fast mode:**
   ```bash
   python batch_match_fast.py /path/to/folder/ report.html
   # Roughly 2x faster than full mode
   ```
2. **Check the fingerprint cache:**
   ```bash
   ls -lh data/fingerprints/
   # Should contain fingerprints for all masters
   # If any are missing, run:
   python bulk_add_masters.py /path/to/masters/ -r
   ```
3. **Reduce metadata filtering overhead:** edit `matcher.py`, or use fast mode, which handles this.

---

## 💡 Best Practices

### 1. Filename Conventions

For the best metadata filtering results, use consistent naming:

**Good:**
```
Product_16x9_A_15s.mp4
Product_1x1_B_10s.mp4
Campaign_9x16_C_6s.mp4
```

**Less Ideal:**
```
video1.mp4
final_cut_v2.mp4
master_backup.mp4
```

**Metadata extraction looks for:**
- Format: `1x1`, `9x16`, `16x9`, `4x3`
- Variant: `A`, `B`, `C`, `D`, `E`, `F`
- Duration: `6s`, `10s`, `15s`, `20s`

### 2. Master Organization

Organize masters by campaign:

```
masters/
├── spring_2024/
│   ├── master_1x1_A_6s.mp4
│   ├── master_1x1_A_10s.mp4
│   └── master_1x1_A_15s.mp4
├── summer_2024/
│   └── ...
└── fall_2024/
    └── ...
```

### 3. Adaptation Organization

Organize adaptations by market:

```
adaptations/
├── AT/    # Austria
├── DE/    # Germany
├── FR/    # France
└── UK/    # United Kingdom
```

Or by format:

```
adaptations/
├── 1x1/    # Square
├── 9x16/   # Vertical
└── 16x9/   # Landscape
```
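A short script can pull the format, variant and duration tokens out of filenames that follow the conventions above. The same format token can then sort a flat delivery folder into the by-format layout. The sketch below rests on assumptions: `parse_metadata`, `organize_by_format`, the underscore-delimited token pattern and the `unsorted/` fallback folder are hypothetical. They illustrate the idea only and may differ from how the matcher actually extracts metadata.

```python
import re
import shutil
from pathlib import Path

# Hypothetical token sets modeled on the naming conventions listed above.
FORMATS = {"1x1", "9x16", "16x9", "4x3"}
VARIANTS = set("ABCDEF")
DURATION_RE = re.compile(r"^(\d+)s$")  # e.g. "15s" -> 15


def parse_metadata(filename: str) -> dict:
    """Return format, variant and duration tokens found in a filename.

    Assumes underscore-separated tokens, e.g. Product_16x9_A_15s.mp4.
    The first match of each kind wins; missing tokens are None.
    """
    meta = {"format": None, "variant": None, "duration_s": None}
    for token in Path(filename).stem.split("_"):
        if token in FORMATS and meta["format"] is None:
            meta["format"] = token
        elif token in VARIANTS and meta["variant"] is None:
            meta["variant"] = token
        else:
            match = DURATION_RE.match(token)
            if match and meta["duration_s"] is None:
                meta["duration_s"] = int(match.group(1))
    return meta


def organize_by_format(src: Path, dest: Path, dry_run: bool = True) -> dict:
    """Plan (and optionally perform) moves of *.mp4 files into dest/<format>/.

    Files without a recognizable format token are planned for dest/unsorted/.
    Returns a mapping of source path -> target path.
    """
    plan = {}
    for video in sorted(src.glob("*.mp4")):
        fmt = parse_metadata(video.name)["format"] or "unsorted"
        target = dest / fmt / video.name
        plan[video] = target
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(video), str(target))
    return plan


if __name__ == "__main__":
    for name in ["Product_16x9_A_15s.mp4", "Campaign_9x16_C_6s.mp4", "video1.mp4"]:
        print(name, parse_metadata(name))
```

Run it with `dry_run=True` first to review the planned moves before any files are touched. Names in the "Less Ideal" style, such as `video1.mp4`, yield no tokens and land in `unsorted/`, which shows quickly which files need renaming.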
### 4. Report Naming

Use descriptive report names:

```bash
# Good: descriptive, with an automatic date stamp
python batch_match_fast.py AT/ AT_Spring2024_$(date +%Y%m%d).html
python batch_match_fast.py DE/ DE_Spring2024_$(date +%Y%m%d).html

# Descriptive, with a fixed timestamp
python batch_match_fast.py AT/ AT_Spring_20240126.html
```

---

## 📊 Interpreting Results

### Confidence Levels

| Badge | Meaning | Action |
|-------|---------|--------|
| 🟢 **VERY HIGH** | 90-100% confidence | Accept match |
| 🟢 **HIGH** | 75-89% confidence | Accept match |
| 🟡 **MEDIUM** | 60-74% confidence | Review recommended |
| 🔴 **LOW** | 50-59% confidence | Manual review required |
| 🔴 **VERY LOW** | <50% confidence | Likely incorrect |

### Match Percentage

- **100%**: Perfect match, all frames found
- **95-99%**: Excellent match, minor differences
- **80-94%**: Good match, some variations
- **60-79%**: Moderate match, review recommended
- **<60%**: Weak match, likely incorrect

### Method Indicators

- **HASH**: Matched via perceptual hash (fast, reliable)
- **AKAZE**: Verified via AKAZE features (robust, accurate)
- **AI VISION**: Matched via GPT-4V (cross-aspect, semantic)

---

## 🎯 Workflow Examples

### Daily Production Workflow

```bash
# 1. Process the overnight batch
python batch_match_fast.py /incoming/daily/ daily_$(date +%Y%m%d).html

# 2. Review the report in the morning (open is macOS; use xdg-open on Linux)
open daily_20240126.html

# 3. Export results if needed
# (The report is self-contained HTML)
```

### Quality Assurance Workflow

```bash
# 1. Fast pass for bulk checking
python batch_match_fast.py /batch1/ quick_check.html

# 2. Full pass for final validation
python cli.py batch-match /batch1/ -o final_validation.html

# 3. Compare results
# Both reports should show the same matches
# The full pass also shows AKAZE verification
```

### Multi-Market Workflow

```bash
# Process each market separately
for market in AT DE FR UK ES IT; do
  python batch_match_fast.py "/markets/$market/" "${market}_report.html"
done

# Consolidate results
# Each market gets its own report for review
```

---

## 📝 Summary

**For most use cases, use Fast Mode:**

```bash
python batch_match_fast.py /path/to/adaptations/ report.html
```

**For final validation, use Full Mode:**

```bash
python cli.py batch-match /path/to/adaptations/ -o report.html
```

**Both modes:**
- ✅ Handle text/logo differences
- ✅ Support multiple languages
- ✅ Generate beautiful HTML reports
- ✅ Show confidence levels and methods
- ✅ Rank by best match

**Tested and verified with real-world data! 🎉**

---

**End of Guide**