Video Query Tool
A full-stack web application that processes videos using Google's Gemini AI model with dual processing modes (Single Video and Batch). Features intelligent two-stage synthesis, automatic video splitting, Azure AD B2C authentication, chunked file uploads up to 5GB, PDF generation with merged Mermaid diagrams, and comprehensive usage tracking.
Features
Core Functionality
- Dual Processing Modes:
  - Single Video Mode: Process videos individually with per-video control
  - Batch Mode: Combine multiple related videos (up to 10) for unified analysis
- Intelligent AI Synthesis: Two-stage processing ensures seamless results
  - Stage 1: Each video/chunk → concise summary
  - Stage 2: All summaries → unified cohesive result
- Video Processing: Upload and analyze using Google Gemini 2.0 Flash Exp AI model
- Prompt Templates:
  - Meeting Summary
  - Process/Tool Documentation
  - Process Documentation with Mermaid Charts
  - Custom Prompts
- Large File Support: Chunked upload system supporting files up to 5GB per file
- PDF Generation: Convert results to PDF with embedded Mermaid diagrams
- Authentication: Azure AD B2C integration (optional, controlled via .env)
- Parallel Processing: Process up to 2 videos simultaneously (single mode)
- Long Video Support: Automatic splitting and parallel chunk processing for videos > 54 minutes
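The two-stage synthesis described above can be sketched as a summarize-then-merge pipeline. This is a minimal illustration, not the code in video_processor.py: the function name, the prompt wording, and the `ask_model` callable (a stand-in for a Gemini request) are all assumptions, and segments are plain strings here rather than video data.

```python
from typing import Callable, List


def two_stage_synthesis(
    segments: List[str],
    prompt: str,
    ask_model: Callable[[str], str],
) -> str:
    """Summarize each segment, then merge the summaries into one result.

    `ask_model` is a hypothetical stand-in for a call to the AI model.
    """
    # Stage 1: one concise summary per video or chunk, kept in input order.
    summaries = [
        ask_model(f"{prompt}\n\nSummarize this segment concisely:\n{segment}")
        for segment in segments
    ]

    # Stage 2: a single request that sees every summary and unifies them.
    numbered = "\n\n".join(
        f"Summary {index}:\n{summary}"
        for index, summary in enumerate(summaries, start=1)
    )
    return ask_model(
        f"{prompt}\n\nCombine these summaries into one cohesive result:\n{numbered}"
    )
```

With N segments this makes N + 1 model calls: one per segment plus one for the merge.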
Technical Features
- Explicit User Control: No auto-processing - all videos require explicit "Process" button click
- Batch Video Management: Reorder, arrange, and remove videos before processing
- Smart Diagram Merging: Multiple Mermaid diagrams intelligently combined into one
- Persistent Mode Selection: Processing mode and batch queue persist across page refreshes
- Multiple File Queue: Upload multiple videos, manage queue (Stop, Retry, Remove)
- Drag & Drop Upload: Modern file upload interface with progress tracking
- Real-time Status: Live status updates (uploading → uploaded → processing → completed)
- Queue Management: Stop, retry, or remove videos from processing queue anytime
- Automatic Video Splitting: Videos > 54 minutes automatically split into 54-min chunks
- Rate Limiting: Built-in API rate limiting (2-second delay) to prevent quota errors
- Error Handling: Comprehensive error handling with retry capability
- Processing Time Display: Shows processing duration for each completed video/batch
- Usage Analytics: Automated tracking via webhook integration
- Production Ready: Systemd service configuration and deployment scripts
Limitations
- Video Length: No limit - videos longer than 54 minutes are automatically split into 54-minute chunks
- Single Chunk Limit: Individual chunks must be under 55 minutes (handled automatically)
- File Size: Application supports uploads up to 5GB per file
- Supported Formats: MP4, AVI, MOV, WMV, MKV, WEBM
- Parallel Processing: Max 2 videos simultaneously in single mode (rate limit protection)
- Batch Size: Maximum 10 videos per batch processing session
- API Rate Limits: Gemini free tier: 5 RPM (built-in 2s delay between calls)
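To make the 54-minute chunk limit concrete, the sketch below computes the chunk boundaries for a video of a given length. It is an assumption about how the split could be planned, not the logic in video_splitter.py; `plan_chunks` is a hypothetical name, and the actual cutting (done with ffmpeg in this project) is left out.

```python
from typing import List, Tuple

CHUNK_SECONDS = 54 * 60  # 54-minute chunk length stated in this README


def plan_chunks(
    duration_seconds: float,
    chunk_seconds: int = CHUNK_SECONDS,
) -> List[Tuple[float, float]]:
    """Return (start, length) pairs in seconds that cover the whole video."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")

    chunks = []
    start = 0.0
    while start < duration_seconds:
        # The final chunk is shorter when the duration is not a multiple of 54 min.
        length = min(chunk_seconds, duration_seconds - start)
        chunks.append((start, length))
        start += chunk_seconds
    return chunks
```

Under these assumptions a 120-minute video yields three chunks of 54, 54, and 12 minutes, while a video of 54 minutes or less stays in one piece.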
Project Structure
video_query/
├── backend/ # Flask/Hypercorn API server
│   ├── app.py # Main Flask application with PDF generation
│   ├── video_processor.py # Gemini API integration, parallel processing, rate limiting
│   ├── video_splitter.py # Video splitting for long videos (54-min chunks)
│   ├── auth.py # Azure AD B2C authentication handlers
│   ├── chunked_upload.py # Chunked file upload Blueprint
│   ├── run.py # Hypercorn production server
│   ├── requirements.txt # Python dependencies
│   ├── .env # Environment variables (GOOGLE_API_KEY)
│   └── test_*.py # API testing utilities
├── frontend/ # React SPA
│   ├── src/
│   │   ├── components/ # React components
│   │   │   ├── VideoUpload.js # Multi-file drag & drop upload
│   │   │   ├── PromptSelector.js # Mode selection and prompt editing
│   │   │   ├── ResultDisplay.js # Results with PDF generation
│   │   │   ├── AuthenticatedContent.js # Queue management, processed list
│   │   │   └── Login.js # Authentication interface
│   │   ├── auth/ # Authentication utilities
│   │   │   ├── authConfig.js # Azure AD B2C configuration
│   │   │   ├── AuthProvider.js # MSAL React provider
│   │   │   └── authApiClient.js # Authenticated API client
│   │   └── utils/
│   │       ├── chunkedUploader.js # Large file upload handler
│   │       ├── configLoader.js # Dynamic config loading
│   │       └── pathUtils.js # Path utilities
│   ├── public/
│   │   ├── config.js # Production config (committed)
│   │   ├── config.local.js # Local dev config (not committed)
│   │   └── index.html # Loads both configs
│   ├── package.json # Node.js dependencies
│   ├── .env # Frontend environment variables
│   └── build/ # Production build output
├── DEPLOYMENT.md # Production deployment instructions
├── LOG_EXTRACTION_README.md # Usage analytics documentation
├── CLAUDE.md # Development guidelines and build commands
├── restart.sh # Development restart script
├── quick_extract.sh # Log extraction utility
└── extract_user_logs*.sh # Advanced log processing
Dependencies
Backend Dependencies
- Flask 3.1.0: Web framework
- google-genai 1.45.0: Gemini AI SDK (updated API)
- Hypercorn 0.17.3: ASGI production server
- python-jose: JWT token validation for Azure AD
- flask-cors 5.0.1: Cross-origin resource sharing
- pdfkit 1.0.0: PDF generation from HTML
- cairosvg 2.8.0: SVG to PNG conversion for diagrams
- Pillow 11.2.1: Image processing
- python-dotenv 1.1.0: Environment variable management
- ffmpeg-python: Video splitting functionality
Frontend Dependencies
- React 18.2.0: UI framework
- @azure/msal-react 3.0.12: Microsoft Authentication Library
- axios 1.6.0: HTTP client with abort signal support
- bootstrap 5.3.2: UI components and styling
- mermaid 11.6.0: Diagram generation
- react-dropzone 14.2.3: Multi-file upload interface
- showdown 2.1.0: Markdown to HTML conversion
Setup Instructions
Prerequisites
- Python 3.8+
- Node.js 16+
- Google Cloud API key with Gemini access
- Azure AD B2C tenant (optional, for authentication)
- wkhtmltopdf (for PDF generation)
- ffmpeg/ffprobe (for video splitting)
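Before starting setup, it can help to confirm that the prerequisites above are actually on the PATH. The following is a small optional sketch, not a script shipped with the project; `missing_tools` and `python_ok` are hypothetical helpers built on the standard library's `shutil.which`.

```python
import shutil
import sys
from typing import Callable, Iterable, List, Optional, Tuple

# System tools named in the prerequisites list.
REQUIRED_TOOLS = ("wkhtmltopdf", "ffmpeg", "ffprobe")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Check that the running interpreter meets the Python 3.8+ requirement."""
    return sys.version_info >= minimum
```

Calling `missing_tools()` returns an empty list when all three tools are installed; any names it does return point at the package still to be installed.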
Backend Setup
- Create and activate virtual environment:
    cd backend
    python3 -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables (create backend/.env):
    GOOGLE_API_KEY=your_gemini_api_key_here
- Install system dependencies:
    # Ubuntu/Debian:
    sudo apt-get install wkhtmltopdf python3-cairo libcairo2-dev ffmpeg
    # macOS:
    brew install cairo wkhtmltopdf ffmpeg
- Start development server:
    python3 run.py # Server runs on http://0.0.0.0:5010
Frontend Setup
- Install Node.js dependencies:
    cd frontend
    npm install
- Configure authentication (optional):
  - Edit frontend/.env:
      REACT_APP_DISABLE_AUTH=true # Disable auth for local dev
  - For production, update src/auth/authConfig.js with Azure AD B2C details
- Configure backend URL for local development:
  - File frontend/public/config.local.js already configured for localhost:5010
  - This file is not committed (in .gitignore)
- Start development server:
    npm start # Server runs on http://localhost:3000
Production Deployment
System Requirements
- Ubuntu/CentOS server
- Apache/Nginx web server
- Python 3.8+ with virtual environment
- wkhtmltopdf system package
- ffmpeg/ffprobe for video processing
- Node.js for building frontend
Backend Deployment
- Install system packages:
    sudo apt-get update
    sudo apt-get install -y wkhtmltopdf python3-cairo python3-pil libcairo2-dev ffmpeg
- Set up virtual environment and install dependencies:
    cd backend
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
- Create production .env file:
    echo "GOOGLE_API_KEY=your_production_api_key" > .env
- Create systemd service (see backend/video-query.service):
    sudo cp backend/video-query.service /etc/systemd/system/
    sudo systemctl daemon-reload
    sudo systemctl enable video-query
    sudo systemctl start video-query
Frontend Deployment
- Update production config (frontend/public/config.js):
    window.__APP_CONFIG__ = {
      "basePath": "/video-query",
      "domain": "https://your-domain.com",
      "api": {
        "videoProcessingEndpoint": "https://your-domain.com/video_query_back/api/process",
        "chunkedUploadEndpoint": "https://your-domain.com/video_query_back"
      }
    };
- Build for production:
    cd frontend
    npm run build
- Deploy to web server:
    sudo cp -r build/* /var/www/html/video-query/
- Configure web server (Apache example):
    <VirtualHost *:443>
      DocumentRoot /var/www/html
      # Frontend
      Alias /video-query /var/www/html/video-query
      # Backend proxy
      ProxyPass /video_query_back http://localhost:5010
      ProxyPassReverse /video_query_back http://localhost:5010
    </VirtualHost>
API Reference
Video Processing Endpoints
- POST /api/process: Single video processing endpoint
  - Accepts JSON: file_path, filename, prompt (for chunked uploads)
  - Returns: Processing result with content, processing time, chunks info
- POST /api/process-batch: Batch video processing endpoint
  - Accepts JSON: videos (array of {file_path, filename, order}), prompt, batch_id
  - Returns: Unified result for all videos, total chunks processed
  - Maximum 10 videos per batch
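As an illustration of the batch request body, the sketch below builds the JSON for POST /api/process-batch from the field names listed above and enforces the 10-video limit on the client side. The helper name `build_batch_payload` is hypothetical, and numbering `order` from 0 is an assumption; the README does not say whether the backend expects 0-based or 1-based values.

```python
import json
from typing import Dict, List

MAX_BATCH_VIDEOS = 10  # batch limit stated in this README


def build_batch_payload(
    videos: List[Dict[str, str]],
    prompt: str,
    batch_id: str,
) -> str:
    """Serialize a batch request; list position becomes each video's `order`."""
    if not videos:
        raise ValueError("a batch needs at least one video")
    if len(videos) > MAX_BATCH_VIDEOS:
        raise ValueError(f"a batch accepts at most {MAX_BATCH_VIDEOS} videos")

    ordered = [
        # Assumption: `order` is 0-based and follows the list order.
        {"file_path": video["file_path"], "filename": video["filename"], "order": index}
        for index, video in enumerate(videos)
    ]
    return json.dumps({"videos": ordered, "prompt": prompt, "batch_id": batch_id})
```

The returned string can be sent as the request body with a `Content-Type: application/json` header by any HTTP client.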
Chunked Upload Endpoints
- POST /api/init-upload: Initialize chunked upload session
- POST /api/upload-chunk/<upload_id>: Upload file chunk
- POST /api/complete-upload/<upload_id>: Mark upload complete
- POST /api/cancel-upload/<upload_id>: Cancel upload
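The endpoint names suggest a flow of init-upload, then one upload-chunk call per piece, then complete-upload. The request and response fields are not documented here, so the sketch below covers only the part that can be shown without guessing them: slicing a file into indexed chunks. `iter_chunks` is a hypothetical helper and the chunk size is left to the caller.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, data) pairs, reading at most `chunk_size` bytes at a time."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break  # end of file reached
        yield index, data
        index += 1


# Example with an in-memory stream standing in for an opened video file.
example = list(iter_chunks(io.BytesIO(b"abcdefg"), 3))
```

Reading lazily keeps memory use near one chunk at a time, which matters for files approaching the 5GB limit.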
PDF Generation Endpoints
- POST /api/generate-pdf: Generate PDF from HTML with Mermaid diagrams
  - JSON data: html, textDiagrams, diagramPngs, videoFileName
Authentication Endpoints (if enabled)
- GET /api/auth-test: Verify authentication status
Configuration Files
Backend Configuration
- backend/.env: Environment variables
    GOOGLE_API_KEY=your_api_key
Frontend Configuration
- frontend/.env: React environment variables
    REACT_APP_DISABLE_AUTH=true # Optional: disable auth for local dev
- frontend/public/config.js: Production configuration (committed to git)
- frontend/public/config.local.js: Local development override (not committed)
Key Configuration Details
- Parallel Processing: Max 2 concurrent videos (App.js:245)
- Rate Limiting: 2-second delay between API calls (video_processor.py:224)
- File Size Threshold: 10MB for inline vs upload API (video_processor.py:167)
- Video Chunk Duration: 54 minutes (video_splitter.py)
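The values above are spread across several source files. As a sketch of how they relate, the hypothetical `ProcessingConfig` below gathers them in one place and shows the inline-versus-upload decision. Treating 10MB as 10 × 1024 × 1024 bytes and sending files of exactly that size through the upload API are both assumptions, not confirmed behavior of video_processor.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingConfig:
    """Hypothetical container for the settings listed in this README."""

    max_parallel_videos: int = 2                # concurrent videos in single mode
    api_delay_seconds: float = 2.0              # pause between API calls
    inline_limit_bytes: int = 10 * 1024 * 1024  # assumed binary megabytes
    chunk_minutes: int = 54                     # video chunk duration

    def transfer_mode(self, size_bytes: int) -> str:
        """Pick how a file would be handed to the model, based on its size."""
        if size_bytes < 0:
            raise ValueError("size_bytes must not be negative")
        # Assumption: files strictly below the threshold are sent inline.
        return "inline" if size_bytes < self.inline_limit_bytes else "upload_api"
```

For example, `ProcessingConfig().transfer_mode(5 * 1024 * 1024)` selects the inline path, while a 50MB file would go through the upload API.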
Usage
Local Development
- Start backend:
    cd backend && source venv/bin/activate && python3 run.py
- Start frontend:
    cd frontend && npm start
- Open: http://localhost:3000
Processing Videos
The application supports two processing modes, selected via a toggle at the top:
Single Video Mode (Default)
Process videos individually with per-video control:
- Select Mode: Click "Single Video Mode" button at the top
- Upload: Drag & drop videos or click to select files
- Choose Prompt: Select a prompt template or write custom prompt
- Process Each Video: Click individual "Process Video" button for each uploaded video
  - Videos show status: uploading → uploaded → processing → completed
  - Up to 2 videos process in parallel automatically
- Monitor Progress: Watch real-time status updates and processing indicators
- Manage Queue: Use Stop (⏸️), Retry (🔄), or Remove (🗑️) per video
- View Results: Completed videos appear in "Processed Videos" section
- Download: Click "Download PDF" or "Copy Formatted" for any result
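The status sequence in the steps above (uploading → uploaded → processing → completed) behaves like a small state machine. The sketch below models only that forward path; how Stop, Retry, and Remove change a video's status is not described in this README, so those transitions are deliberately left out. The names `Status` and `advance` are hypothetical.

```python
from enum import Enum


class Status(Enum):
    UPLOADING = "uploading"
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    COMPLETED = "completed"


# Forward transitions only, in the order the README lists them.
_NEXT = {
    Status.UPLOADING: Status.UPLOADED,
    Status.UPLOADED: Status.PROCESSING,
    Status.PROCESSING: Status.COMPLETED,
}


def advance(status: Status) -> Status:
    """Return the status that follows `status`, or raise if it is final."""
    try:
        return _NEXT[status]
    except KeyError:
        raise ValueError(f"{status.value} is a final status") from None
```

Keeping the transitions in one table makes it harder for the UI to show an impossible jump, such as uploading straight to completed.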
Batch Mode
Process multiple related videos as one unified analysis:
- Select Mode: Click "Batch Mode" button at the top
- Upload Videos: Add multiple related videos (max 10 per batch)
- Arrange Videos: Use Up/Down arrows to reorder videos in logical sequence
- Remove Unwanted: Click Remove button to exclude videos from batch
- Choose Prompt: Select or customize the analysis prompt
- Process Batch: Click single "Process Batch" button to analyze all videos together
  - Backend automatically handles video splitting and chunking
  - Two-stage synthesis creates unified result across all videos
  - Multiple Mermaid diagrams merged into one comprehensive diagram
- View Results: Single unified result appears for entire batch
- Download: Generate PDF with combined analysis
Key Differences:
- Single Mode: Each video = separate result, manual per-video processing
- Batch Mode: All videos = one unified result, single batch processing
- Explicit Control: No auto-processing - all require button clicks
Processing Long Videos
- Videos > 54 minutes automatically split into chunks
- Each chunk processed in parallel (backend handles this)
- Results intelligently combined
- Processing time displayed for transparency
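Processing chunks in parallel while keeping results in the original order can be sketched with a thread pool. This is an illustration, not the backend's implementation: `process_in_parallel` and `worker` are hypothetical names, and the default of 2 workers borrows the single-mode video limit because the README does not state a chunk-level worker count.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def process_in_parallel(
    chunks: Sequence[str],
    worker: Callable[[str], str],
    max_workers: int = 2,  # assumption: same limit as parallel videos
) -> List[str]:
    """Run `worker` on every chunk concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `chunks`,
        # regardless of which task finishes first.
        return list(pool.map(worker, chunks))
```

Because `map` preserves input order, the per-chunk results can be passed to the combining step without re-sorting.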
Development Utilities
- restart.sh: Quick development environment restart
- backend/test_*.py: API testing and validation scripts
- backend/run.py: Production server with optimized settings for large uploads
- extract_user_logs*.sh: Usage analytics extraction
Security Features
- Azure AD B2C integration with JWT validation (optional)
- CORS protection with specific origin allowlisting
- Secure file upload validation
- Temporary file cleanup
- Token expiration handling
- Rate limiting to prevent API abuse
- Abort signal support for cancellation
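One piece of upload validation is checking the file extension against the supported formats (MP4, AVI, MOV, WMV, MKV, WEBM). The sketch below shows that check only; `is_supported_video` is a hypothetical helper, and whether the backend validates uploads this way is an assumption. An extension check alone does not prove the file content is a real video.

```python
from pathlib import PurePath

# Supported formats as listed under Limitations.
ALLOWED_EXTENSIONS = frozenset({".mp4", ".avi", ".mov", ".wmv", ".mkv", ".webm"})


def is_supported_video(filename: str) -> bool:
    """Return True when the final extension is a supported video format."""
    # Only the last suffix counts, so "clip.mp4.exe" is rejected.
    return PurePath(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

The comparison is case-insensitive, so `Demo.MP4` is accepted while `notes.txt` and extensionless names are rejected.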
Troubleshooting
Backend Issues
- 400 INVALID_ARGUMENT: Usually rate limiting - check logs for details
- File upload errors: Verify ffmpeg installed (which ffprobe)
- PDF generation fails: Ensure wkhtmltopdf installed
Frontend Issues
- CORS errors: Check backend CORS settings in app.py
- Changes not visible: Clear browser cache (Ctrl+Shift+R)
- Config not loading: Verify config.js and config.local.js exist in public/
Rate Limiting
- Backend: 2-second delay between API calls (automatic)
- Frontend: Max 2 parallel videos
- Free tier: 5 RPM limit enforced by Gemini API
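A fixed delay between API calls can be sketched as a minimum-interval limiter. This is one possible shape for such a limiter, not the code in video_processor.py; `MinIntervalLimiter` is a hypothetical name, and the injectable `clock` and `sleep` arguments exist only to make the behavior easy to test.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Ensure at least `interval` seconds pass between consecutive calls."""

    def __init__(
        self,
        interval: float = 2.0,  # 2-second delay stated in this README
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous call was released

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` immediately before each API request spaces the requests out. Note that a 2-second gap on its own still permits up to 30 calls per minute, so staying under a 5 RPM limit by spacing alone would need a 12-second interval; the retry handling mentioned elsewhere in this README presumably covers the difference.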
License
This project is proprietary and confidential.