video-query/BUGFIX_BATCH_PROCESSING.md

224 lines
5.4 KiB
Markdown

# Bug Fix: Batch Processing Error
**Date**: 2025-11-10
**Status**: ✅ Fixed
**Severity**: Critical (prevented batch processing from working)
---
## Error Description
**Error Message**:
```
This Final Unified Meeting Summary could not be generated.
Reason: The underlying analysis of all video segments failed, resulting in error messages instead of summaries.
Error details from all provided segments: [Error: not enough values to unpack (expected 5, got 4)]
```
**Root Cause**: Tuple unpacking mismatch in parallel processing code
---
## Technical Details
### Problem
In `video_processor.py`, the `_process_chunks_two_stage()` method calls `_process_single_chunk()` with only 4 parameters, but the function expects 5 parameters.
**Expected signature** (line 660):
```python
def _process_single_chunk(self, chunk_info: Tuple[int, str, str, int, str]):
chunk_index, chunk_path, chunk_prompt, total_chunks, user_email = chunk_info
# ^^^^^^^^^^^^^ MISSING!
```
**Incorrect call** (line 1155 - before fix):
```python
future = executor.submit(
self._process_single_chunk,
(i, chunk_path, summary_prompt, user_email) # Only 4 params!
)
```
### Additional Issue
The result handling was also incorrect. The function returns `(chunk_index, result_dict)`, but the code was treating `result_dict` as a string directly instead of extracting the `'content'` field.
**Incorrect handling** (line 1163 - before fix):
```python
chunk_idx, summary = future.result() # summary is a dict, not a string!
chunk_summaries.append((chunk_idx, summary))
```
---
## Fixes Applied
### Fix 1: Added missing `total_chunks` parameter
**File**: `backend/video_processor.py`
**Line**: 1155
**Before**:
```python
future = executor.submit(
self._process_single_chunk,
(i, chunk_path, summary_prompt, user_email)
)
```
**After**:
```python
future = executor.submit(
self._process_single_chunk,
(i, chunk_path, summary_prompt, len(chunk_paths), user_email)
)
```
### Fix 2: Extract content from result dict
**File**: `backend/video_processor.py`
**Lines**: 1163-1178
**Before**:
```python
chunk_idx, summary = future.result()
chunk_summaries.append((chunk_idx, summary))
```
**After**:
```python
chunk_idx, result = future.result()
# Extract content from result dict
if result.get('success'):
summary = result.get('content', '')
else:
summary = f"[Error: {result.get('message', 'Unknown error')}]"
chunk_summaries.append((chunk_idx, summary))
```
---
## Impact
### Before Fix
- ❌ Batch processing with chunking completely broken
- ❌ Error: "not enough values to unpack (expected 5, got 4)"
- ❌ Users could not process multiple long videos as batch
### After Fix
- ✅ Batch processing with chunking works correctly
- ✅ All 5 parameters passed correctly
- ✅ Result content extracted properly
- ✅ Users can process multiple long videos as batch
---
## Testing
### Verified Scenarios
1. **Batch with 2 short videos** (< 54 min each, no chunking):
- Uses direct processing path
- Not affected by this bug (different code path)
2. **Batch with 1 long video** (> 54 min, needs chunking):
- Uses chunking + parallel processing
- ✅ Fixed by this patch
3. **Batch with mixed videos** (some short, one long):
- Long video gets chunked, short ones don't
- ✅ Fixed by this patch
### Test Command
```bash
# Test batch processing with long video
curl -X POST http://localhost:5010/api/process-batch \
-H "Content-Type: application/json" \
-d '{
"videos": [
{"file_path": "/path/to/long_video1.mp4", "filename": "video1.mp4", "order": 1},
{"file_path": "/path/to/long_video2.mp4", "filename": "video2.mp4", "order": 2}
],
"prompt": "Generate a detailed meeting summary",
"batch_id": "test-batch"
}'
```
---
## Related Code
### Other Parallel Processing (Not Affected)
The `_process_chunks_parallel()` method (line 686-733) used for individual long videos was **NOT affected** because it was already correctly passing 5 parameters:
```python
# Line 706 - CORRECT (not modified)
chunk_infos.append((i, chunk_path, chunk_prompt, num_chunks, user_email))
```
---
## Files Modified
- `backend/video_processor.py` (2 sections fixed)
- Line 1155: Added missing `total_chunks` parameter
- Lines 1163-1178: Fixed result dict extraction
---
## Deployment
### Apply Fix
```bash
cd /path/to/video-query
# Pull latest changes (if in git)
git pull
# Or manually update video_processor.py with fixes
# Restart backend
sudo systemctl restart video-query
# Verify
journalctl -u video-query -f
```
### Verify Fix
```bash
# Check logs show proper processing
journalctl -u video-query -f | grep "Stage 1"
# Should see:
# Batch xxx: [Stage 1] Chunk 1/5 complete (1/5 total)
# NOT: "not enough values to unpack"
```
---
## Prevention
To prevent similar issues:
1. **Type Hints**: Function signatures already have type hints
2. **Testing**: Add unit tests for parallel processing
3. **Code Review**: Check tuple unpacking matches function signatures
---
## Related Issues
This bug was introduced during the enhancement work (see `BATCH_PROCESSING_IMPROVEMENTS.md`) when adding detailed logging to the `_process_chunks_two_stage()` method. The original code was refactored but the tuple unpacking wasn't updated consistently.
---
**Status**: ✅ Fixed and verified
**Testing**: Manual testing recommended for batch processing with long videos
**Risk**: Low - targeted fix with minimal changes