Manish Tanwar 005769489e new features with video job cycles and video lenght changes

2025-10-22 14:24:13 +05:30

Parallel Video Chunk Processing

Overview

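The video-query application now processes video chunks in parallel. For long videos, total processing time drops from the sum of all chunk times to roughly the time of the slowest chunk, provided the number of chunks does not exceed the configured worker limit.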

Performance Improvements

Before (Sequential Processing)

2-hour video (3 chunks of up to 50 min each)
- Chunk 1: ~3-5 minutes
- Chunk 2: ~3-5 minutes
- Chunk 3: ~3-5 minutes
- Total: 9-15 minutes

After (Parallel Processing)

2-hour video (3 chunks processed simultaneously)
- All 3 chunks: ~3-5 minutes
- Total: 3-5 minutes

Speed Improvement: about 3x faster in this example, because all 3 chunks fit within the worker limit and run at the same time.
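
The comparison above is wave arithmetic: chunks run in batches of at most max_parallel_chunks, and each batch takes about as long as one chunk. Here is a minimal sketch of that estimate, assuming every chunk takes the same time and ignoring upload contention and rate-limit waits. The helper estimate_wall_clock is hypothetical and not part of VideoProcessor.

```python
import math


def estimate_wall_clock(num_chunks, minutes_per_chunk, max_parallel_chunks):
    """Estimate total minutes when chunks run in waves of max_parallel_chunks."""
    if num_chunks <= 0:
        return 0.0
    workers = max(1, max_parallel_chunks)
    waves = math.ceil(num_chunks / workers)  # number of batches needed
    return waves * minutes_per_chunk


# 3 chunks at ~4 minutes each (midpoint of the 3-5 minute range above)
sequential = estimate_wall_clock(3, 4.0, max_parallel_chunks=1)
parallel = estimate_wall_clock(3, 4.0, max_parallel_chunks=4)
print(f"sequential={sequential} min, parallel={parallel} min, "
      f"speedup={sequential / parallel:.1f}x")
```

With more chunks than workers the gain shrinks: 5 chunks on 4 workers need two waves, so the estimate is 8 minutes rather than 4.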

Configuration

Default Settings

# Default configuration in VideoProcessor
max_parallel_chunks = 4  # Conservative setting for free tier

Custom Configuration

from video_processor import VideoProcessor

# For Free Tier (5 RPM limit)
processor = VideoProcessor(api_key="your_key", max_parallel_chunks=3)

# For Paid Tier (150 RPM limit)
processor = VideoProcessor(api_key="your_key", max_parallel_chunks=10)

API Rate Limits

Gemini API Limits by Tier

Tier	RPM Limit	Recommended Workers	Max Video Chunks
Free	5 RPM	3-4 workers	3-4 simultaneous
Paid Tier 1	150 RPM	10+ workers	10+ simultaneous
Paid Tier 2+	1000+ RPM	20+ workers	20+ simultaneous

Important Notes

- Rate limits are per project, not per API key
- Concurrent requests are allowed as long as you stay within RPM limits
- The system automatically respects your configured max_parallel_chunks setting

Usage

Enable Parallel Processing (Default)

from video_processor import VideoProcessor

processor = VideoProcessor()

# Parallel processing is enabled by default
result = processor.process_long_video(
    video_path="/path/to/long_video.mp4",
    prompt="Generate a meeting summary",
    user_email="user@example.com"
)

print(f"Processing mode: {result['processing_mode']}")  # Prints: Processing mode: parallel

Disable Parallel Processing (Sequential)

# If you prefer sequential processing
result = processor.process_long_video(
    video_path="/path/to/long_video.mp4",
    prompt="Generate a meeting summary",
    user_email="user@example.com",
    use_parallel=False  # Disable parallel processing
)

print(f"Processing mode: {result['processing_mode']}")  # Prints: Processing mode: sequential

Auto-Detection Mode

# Use process_video_auto for automatic detection
result = processor.process_video_auto(
    video_path="/path/to/video.mp4",
    prompt="Generate documentation",
    user_email="user@example.com"
)

# Automatically uses parallel processing for long videos

How It Works

Video Splitting
- Long videos are split into 25-minute chunks (configurable)
- Uses FFmpeg for fast, lossless splitting
Parallel Upload & Processing
- Chunks are uploaded to the Gemini API concurrently
- Multiple API calls run simultaneously (up to max_parallel_chunks)
- Thread-safe execution using ThreadPoolExecutor
Response Combination
- Responses are collected in the correct order
- Responses are combined according to prompt type (meeting, documentation, etc.)
- For meetings, the per-chunk responses can optionally be synthesized into a unified summary
Cleanup
- Temporary chunk files are automatically deleted
- Handles errors gracefully
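
The splitting step can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code. plan_chunks and ffmpeg_split_command are hypothetical helpers, and the output naming mirrors the chunk paths shown in the Logging section. The command assumes stream copy (-c copy), which avoids re-encoding but cuts on keyframes, so chunk boundaries may shift slightly.

```python
def plan_chunks(duration_s, chunk_s=25 * 60):
    """Return (start, length) pairs in seconds that cover the whole video."""
    chunks = []
    start = 0
    while start < duration_s:
        length = min(chunk_s, duration_s - start)  # last chunk may be shorter
        chunks.append((start, length))
        start += chunk_s
    return chunks


def ffmpeg_split_command(src, start_s, length_s, dst):
    """Build (but do not run) an FFmpeg command that copies one chunk."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start_s),   # seek to the chunk start
        "-i", src,
        "-t", str(length_s),   # keep this many seconds
        "-c", "copy",          # stream copy: no re-encoding
        dst,
    ]


# A 2-hour video with the default 25-minute chunk size
for index, (start, length) in enumerate(plan_chunks(2 * 60 * 60), start=1):
    dst = f"/tmp/video_chunk_{index:02d}.mp4"
    print(" ".join(ffmpeg_split_command("/path/to/long_video.mp4", start, length, dst)))
```

Each command could then be passed to subprocess.run(cmd, check=True). Under these assumptions a 2-hour video yields five chunks at 25 minutes per chunk and three at 50 minutes.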

Technical Implementation

Thread Safety

The implementation uses:

- concurrent.futures.ThreadPoolExecutor for parallel execution
- threading.Lock for rate limiting
- Order-preserving result collection
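
One way these three pieces can fit together is sketched below. It is not the project's implementation: MinIntervalLimiter, process_chunk, and process_all are hypothetical names, and spacing request starts evenly at 60 / RPM seconds is an assumed policy chosen for simplicity. The lock guards only the shared timestamp, and the sleep happens outside it so waiting threads do not block each other.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class MinIntervalLimiter:
    """Allow at most one request start per interval, shared across threads."""

    def __init__(self, interval_s):
        self._interval = interval_s
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free start slot while holding the lock.
        with self._lock:
            now = time.monotonic()
            start_at = max(now, self._next_allowed)
            self._next_allowed = start_at + self._interval
        delay = start_at - now
        if delay > 0:
            time.sleep(delay)  # sleep outside the lock


def process_chunk(path, limiter):
    limiter.wait()
    return f"summary of {path}"  # placeholder for the real upload + API call


def process_all(paths, max_parallel_chunks=4, rpm_limit=5):
    limiter = MinIntervalLimiter(60.0 / rpm_limit)
    with ThreadPoolExecutor(max_workers=max_parallel_chunks) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(lambda p: process_chunk(p, limiter), paths))


paths = [f"/tmp/video_chunk_{i:02d}.mp4" for i in range(1, 4)]
print(process_all(paths, max_parallel_chunks=4, rpm_limit=600))
```

The demo uses a high rpm_limit so it finishes quickly. At 5 RPM the same code would start one request every 12 seconds.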

Error Handling

- Each chunk is processed independently
- If one chunk fails, the error is logged with specific details
- Failed chunks return error information without crashing the entire job
- Results maintain the correct order regardless of completion order
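
The pattern described above can be sketched with as_completed and an index map. This is a minimal illustration, not the project's code: run_chunks and flaky_worker are hypothetical, and the result dictionary shape is an assumption.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


def run_chunks(paths, worker, max_parallel_chunks=4):
    """Run worker(path) for every chunk and return results in chunk order."""
    results = [None] * len(paths)
    with ThreadPoolExecutor(max_workers=max_parallel_chunks) as pool:
        # Remember which chunk index each future belongs to.
        futures = {pool.submit(worker, p): i for i, p in enumerate(paths)}
        for future in as_completed(futures):
            i = futures[future]
            try:
                results[i] = {"chunk": i + 1, "ok": True,
                              "response": future.result()}
            except Exception as exc:
                # One failed chunk is recorded and logged, not re-raised.
                logger.error("Chunk %d/%d failed: %s", i + 1, len(paths), exc)
                results[i] = {"chunk": i + 1, "ok": False,
                              "error": f"{type(exc).__name__}: {exc}"}
    return results


def flaky_worker(path):
    """Stand-in worker that fails for the second chunk."""
    if path.endswith("02.mp4"):
        raise RuntimeError("upload failed")
    return f"ok:{path}"


paths = [f"/tmp/video_chunk_{i:02d}.mp4" for i in range(1, 4)]
for entry in run_chunks(paths, flaky_worker):
    print(entry)
```

Writing each result into a pre-sized list by index keeps the output ordered even though as_completed yields futures in completion order.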

Environment Variables

# Optional: Set max workers via environment variable
export VIDEO_QUERY_MAX_WORKERS=5

# API Key (required)
export GOOGLE_API_KEY="your_gemini_api_key"
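
This document does not show how VIDEO_QUERY_MAX_WORKERS is read or whether it takes precedence over the max_parallel_chunks argument. The sketch below shows one defensive way to parse it, assuming invalid or non-positive values fall back to the default of 4. max_workers_from_env is a hypothetical helper.

```python
import os


def max_workers_from_env(default=4, env=None):
    """Read VIDEO_QUERY_MAX_WORKERS, falling back to default if unset or invalid."""
    env = os.environ if env is None else env
    raw = env.get("VIDEO_QUERY_MAX_WORKERS")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # not a number: ignore it
    return value if value >= 1 else default


workers = max_workers_from_env()
print(f"Using {workers} parallel workers")
```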

Logging

Monitor parallel processing with detailed logs:

[INFO] Starting parallel processing of 3 chunks with 4 workers
[INFO] [Parallel] Processing chunk 1/3: /tmp/video_chunk_01.mp4
[INFO] [Parallel] Processing chunk 2/3: /tmp/video_chunk_02.mp4
[INFO] [Parallel] Processing chunk 3/3: /tmp/video_chunk_03.mp4
[INFO] [Parallel] Completed chunk 2/3
[INFO] [Parallel] Progress: 1/3 chunks completed
[INFO] [Parallel] Completed chunk 1/3
[INFO] [Parallel] Progress: 2/3 chunks completed
[INFO] [Parallel] Completed chunk 3/3
[INFO] [Parallel] Progress: 3/3 chunks completed
[INFO] [Parallel] All 3 chunks processed
[INFO] Combining responses from all chunks...
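
If the application logs through Python's standard logging module (an assumption, as this document does not say), a handler with a matching format string reproduces the [INFO] prefix shown above. The logger name video_query_demo and the helper make_logger are hypothetical.

```python
import logging
import sys


def make_logger(stream=None):
    """Return a logger that writes '[LEVEL] message' lines to the given stream."""
    logger = logging.getLogger("video_query_demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False
    handler = logging.StreamHandler(stream or sys.stderr)
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return logger


log = make_logger()
log.info("Starting parallel processing of %d chunks with %d workers", 3, 4)
log.info("[Parallel] Progress: %d/%d chunks completed", 1, 3)
```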

Troubleshooting

Rate Limit Errors

Symptom: 429 Too Many Requests errors

Solution:

# Reduce max_parallel_chunks
processor = VideoProcessor(max_parallel_chunks=2)

Memory Issues

Symptom: Out of memory errors

Solution:

# Process fewer chunks in parallel
processor = VideoProcessor(max_parallel_chunks=2)

Slower Performance

Symptom: Parallel processing is slower than expected

Possible Causes:

- Network bottleneck (upload bandwidth)
- CPU bottleneck (video encoding)
- API rate limiting

Solution:

- Check network speed
- Monitor CPU usage
- Verify API tier and limits

Best Practices

Choose Appropriate Worker Count
- Free tier: 3-4 workers
- Paid tier: 8-10 workers
- Don't exceed your RPM limit
Monitor Resource Usage
- Check server memory
- Monitor network bandwidth
- Track API usage
Handle Errors Gracefully
- Implement retry logic
- Log all errors
- Provide fallback to sequential processing
Optimize Chunk Size
- 25 minutes for most cases
- 50 minutes if the API supports it
- Balance between parallelism and chunk size
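
The retry advice above can be sketched as a small wrapper with exponential backoff. This is an illustration only. RateLimitError is a stand-in for whatever exception your API client raises on HTTP 429, and call_with_backoff is a hypothetical helper with assumed default delays.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for the client's 429 Too Many Requests exception."""


def call_with_backoff(fn, max_attempts=5, base_delay_s=1.0,
                      max_delay_s=60.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with doubling delays plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            delay = min(max_delay_s, base_delay_s * (2 ** attempt))
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


attempts = {"count": 0}


def sometimes_limited():
    """Demo callable that is rate limited twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "chunk response"


# Pass a no-op sleep so the demo finishes instantly.
print(call_with_backoff(sometimes_limited, sleep=lambda seconds: None))
```

If retries are exhausted, the caller can catch the exception and rerun the job with use_parallel=False as the sequential fallback.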

Future Enhancements

Potential improvements:

- Adaptive worker count based on API tier detection
- Exponential backoff for rate limit errors
- Progress callbacks for real-time updates
- Configurable chunk duration
- Support for asyncio (async/await pattern)

Support

For issues or questions:

- Check logs for detailed error information
- Verify API key and rate limits
- Review configuration settings
- Consult the Gemini API documentation

License

Same as the main project license.
