video-query/README.md
2025-11-06 18:49:55 +05:30

# Video Query Tool
A full-stack web application that analyzes videos with Google's Gemini AI model in two processing modes: Single Video and Batch. It provides two-stage synthesis, automatic splitting of long videos, optional Azure AD B2C authentication, chunked uploads of up to 5GB per file, PDF generation with merged Mermaid diagrams, and usage tracking.
## Features
### Core Functionality
- **Dual Processing Modes**:
  - **Single Video Mode**: Process videos individually with per-video control
  - **Batch Mode**: Combine multiple related videos (up to 10) for unified analysis
- **Intelligent AI Synthesis**: Two-stage processing ensures seamless results
  - Stage 1: Each video/chunk → concise summary
  - Stage 2: All summaries → unified cohesive result
- **Video Processing**: Upload and analyze using Google Gemini 2.0 Flash Exp AI model
- **Prompt Templates**:
  - Meeting Summary
  - Process/Tool Documentation
  - Process Documentation with Mermaid Charts
  - Custom Prompts
- **Large File Support**: Chunked upload system supporting files up to 5GB per file
- **PDF Generation**: Convert results to PDF with embedded Mermaid diagrams
- **Authentication**: Azure AD B2C integration (optional, controlled via .env)
- **Parallel Processing**: Process up to 2 videos simultaneously (single mode)
- **Long Video Support**: Automatic splitting and parallel chunk processing for videos > 54 minutes
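
The two-stage synthesis listed above can be pictured as a small map-then-merge pipeline. The sketch below is only an illustration: `summarize_chunk` and `synthesize` are hypothetical stand-ins for the Gemini calls made in `video_processor.py`, whose real signatures are not shown in this README.

```python
from typing import Callable, List, Sequence


def two_stage_synthesis(
    chunks: Sequence[str],
    summarize_chunk: Callable[[str], str],
    synthesize: Callable[[List[str]], str],
) -> str:
    """Stage 1 summarizes every chunk; stage 2 merges the summaries."""
    if not chunks:
        raise ValueError("at least one video or chunk is required")
    # Stage 1: one concise summary per video/chunk, kept in input order.
    summaries = [summarize_chunk(chunk) for chunk in chunks]
    # Stage 2: a single call that sees all summaries and returns one result.
    return synthesize(summaries)


# Stand-in callables (hypothetical) so the sketch runs without API access.
def fake_summarize(chunk: str) -> str:
    return f"summary({chunk})"


def fake_synthesize(summaries: List[str]) -> str:
    return " + ".join(summaries)


result = two_stage_synthesis(
    ["part1.mp4", "part2.mp4"], fake_summarize, fake_synthesize
)
```

Keeping the summaries in input order matters for batch mode, where the user arranges videos in a logical sequence before processing.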
### Technical Features
- **Explicit User Control**: No auto-processing - all videos require explicit "Process" button click
- **Batch Video Management**: Reorder, arrange, and remove videos before processing
- **Smart Diagram Merging**: Multiple Mermaid diagrams intelligently combined into one
- **Persistent Mode Selection**: Processing mode and batch queue persist across page refreshes
- **Multiple File Queue**: Upload multiple videos, manage queue (Stop, Retry, Remove)
- **Drag & Drop Upload**: Modern file upload interface with progress tracking
- **Real-time Status**: Live status updates (uploading → uploaded → processing → completed)
- **Queue Management**: Stop, retry, or remove videos from processing queue anytime
- **Automatic Video Splitting**: Videos > 54 minutes automatically split into 54-min chunks
- **Rate Limiting**: Built-in API rate limiting (2-second delay) to prevent quota errors
- **Error Handling**: Comprehensive error handling with retry capability
- **Processing Time Display**: Shows processing duration for each completed video/batch
- **Usage Analytics**: Automated tracking via webhook integration
- **Production Ready**: Systemd service configuration and deployment scripts
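
As a rough illustration of the rate limiting described above, a fixed delay between API calls can be enforced with a small helper. This is a sketch under assumptions, not the code in `video_processor.py`: the class name, the injectable `clock` and `sleep` parameters, and `min_interval_for_rpm` are all hypothetical.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Blocks so that successive calls are at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float = 2.0,  # the 2-second delay mentioned above
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None  # None until the first call

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
        if delay > 0:
            self._sleep(delay)
        self._last_call = now + delay
        return delay


def min_interval_for_rpm(requests_per_minute: int) -> float:
    """Spacing in seconds that keeps a steady stream of calls under an RPM cap."""
    return 60.0 / requests_per_minute
```

Calling `limiter.wait()` before each API request spaces requests out. Note the arithmetic: a 5 RPM quota corresponds to one call every 12 seconds, so a 2-second delay on its own would still allow up to 30 short calls per minute.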
## Limitations
- **Video Length**: No limit - videos automatically split into 54-minute chunks
- **Single Chunk Limit**: Individual chunks must be under 55 minutes (handled automatically)
- **File Size**: Application supports uploads up to 5GB per file
- **Supported Formats**: MP4, AVI, MOV, WMV, MKV, WEBM
- **Parallel Processing**: Max 2 videos simultaneously in single mode (rate limit protection)
- **Batch Size**: Maximum 10 videos per batch processing session
- **API Rate Limits**: Gemini free tier: 5 RPM (built-in 2s delay between calls)
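
To make the 54-minute chunking concrete, the following sketch computes the chunk boundaries for a video of a given duration. It is a hedged illustration: `plan_chunks` is a hypothetical helper, and the actual splitting in `video_splitter.py` may choose boundaries differently.

```python
from typing import List, Tuple

CHUNK_SECONDS = 54 * 60  # 54-minute chunks, as stated above


def plan_chunks(
    duration_seconds: float, chunk_seconds: int = CHUNK_SECONDS
) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover the whole video."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    boundaries = []
    start = 0.0
    while start < duration_seconds:
        # The last chunk is simply whatever remains.
        end = min(start + chunk_seconds, duration_seconds)
        boundaries.append((start, end))
        start = end
    return boundaries


two_hour_plan = plan_chunks(120 * 60)
```

For a two-hour video this yields three chunks: two full 54-minute chunks and a final 12-minute chunk. A video of 54 minutes or less stays in a single chunk.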
## Project Structure
```
video_query/
├── backend/                              # Flask/Hypercorn API server
│   ├── app.py                            # Main Flask application with PDF generation
│   ├── video_processor.py                # Gemini API integration, parallel processing, rate limiting
│   ├── video_splitter.py                 # Video splitting for long videos (54-min chunks)
│   ├── auth.py                           # Azure AD B2C authentication handlers
│   ├── chunked_upload.py                 # Chunked file upload Blueprint
│   ├── run.py                            # Hypercorn production server
│   ├── requirements.txt                  # Python dependencies
│   ├── .env                              # Environment variables (GOOGLE_API_KEY)
│   └── test_*.py                         # API testing utilities
├── frontend/                             # React SPA
│   ├── src/
│   │   ├── components/                   # React components
│   │   │   ├── VideoUpload.js            # Multi-file drag & drop upload
│   │   │   ├── PromptSelector.js         # Mode selection and prompt editing
│   │   │   ├── ResultDisplay.js          # Results with PDF generation
│   │   │   ├── AuthenticatedContent.js   # Queue management, processed list
│   │   │   └── Login.js                  # Authentication interface
│   │   ├── auth/                         # Authentication utilities
│   │   │   ├── authConfig.js             # Azure AD B2C configuration
│   │   │   ├── AuthProvider.js           # MSAL React provider
│   │   │   └── authApiClient.js          # Authenticated API client
│   │   └── utils/
│   │       ├── chunkedUploader.js        # Large file upload handler
│   │       ├── configLoader.js           # Dynamic config loading
│   │       └── pathUtils.js              # Path utilities
│   ├── public/
│   │   ├── config.js                     # Production config (committed)
│   │   ├── config.local.js               # Local dev config (not committed)
│   │   └── index.html                    # Loads both configs
│   ├── package.json                      # Node.js dependencies
│   ├── .env                              # Frontend environment variables
│   └── build/                            # Production build output
├── DEPLOYMENT.md                         # Production deployment instructions
├── LOG_EXTRACTION_README.md              # Usage analytics documentation
├── CLAUDE.md                             # Development guidelines and build commands
├── restart.sh                            # Development restart script
├── quick_extract.sh                      # Log extraction utility
└── extract_user_logs*.sh                 # Advanced log processing
```
## Dependencies
### Backend Dependencies
- **Flask 3.1.0**: Web framework
- **google-genai 1.45.0**: Gemini AI SDK (updated API)
- **Hypercorn 0.17.3**: ASGI production server
- **python-jose**: JWT token validation for Azure AD
- **flask-cors 5.0.1**: Cross-origin resource sharing
- **pdfkit 1.0.0**: PDF generation from HTML
- **cairosvg 2.8.0**: SVG to PNG conversion for diagrams
- **Pillow 11.2.1**: Image processing
- **python-dotenv 1.1.0**: Environment variable management
- **ffmpeg-python**: Video splitting functionality
### Frontend Dependencies
- **React 18.2.0**: UI framework
- **@azure/msal-react 3.0.12**: Microsoft Authentication Library
- **axios 1.6.0**: HTTP client with abort signal support
- **bootstrap 5.3.2**: UI components and styling
- **mermaid 11.6.0**: Diagram generation
- **react-dropzone 14.2.3**: Multi-file upload interface
- **showdown 2.1.0**: Markdown to HTML conversion
## Setup Instructions
### Prerequisites
- Python 3.8+
- Node.js 16+
- Google Cloud API key with Gemini access
- Azure AD B2C tenant (optional, for authentication)
- wkhtmltopdf (for PDF generation)
- ffmpeg/ffprobe (for video splitting)
### Backend Setup
1. **Create and activate virtual environment**:
```bash
cd backend
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Set up environment variables** (create `backend/.env`):
```bash
GOOGLE_API_KEY=your_gemini_api_key_here
```
4. **Install system dependencies**:
```bash
# Ubuntu/Debian:
sudo apt-get install wkhtmltopdf python3-cairo libcairo2-dev ffmpeg
# macOS:
brew install cairo wkhtmltopdf ffmpeg
```
5. **Start the backend server** (`run.py` launches Hypercorn):
```bash
python3 run.py
# Server runs on http://0.0.0.0:5010
```
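
Before starting the backend it can help to confirm that the prerequisites listed above are in place. The script below is a minimal sketch, not part of the repository: `missing_prerequisites` is a hypothetical helper, and it only inspects the current process environment (it does not read `backend/.env`).

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

# Command-line tools named in the prerequisites above.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe", "wkhtmltopdf")


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of tools or variables that could not be found."""
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if not env.get("GOOGLE_API_KEY"):
        missing.append("GOOGLE_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All prerequisites found")
```

Run it from an activated virtual environment after exporting `GOOGLE_API_KEY`; an empty result means the tools are on `PATH` and the key is set.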
### Frontend Setup
1. **Install Node.js dependencies**:
```bash
cd frontend
npm install
```
2. **Configure authentication** (optional):
   - Edit `frontend/.env`:
     ```
     REACT_APP_DISABLE_AUTH=true # Disable auth for local dev
     ```
   - For production, update `src/auth/authConfig.js` with Azure AD B2C details
3. **Configure backend URL for local development**:
   - `frontend/public/config.local.js` is already configured for localhost:5010
   - This file is not committed (it is listed in .gitignore)
4. **Start development server**:
```bash
npm start
# Server runs on http://localhost:3000
```
## Production Deployment
### System Requirements
- Ubuntu/CentOS server
- Apache/Nginx web server
- Python 3.8+ with virtual environment
- wkhtmltopdf system package
- ffmpeg/ffprobe for video processing
- Node.js for building frontend
### Backend Deployment
1. **Install system packages**:
```bash
sudo apt-get update
sudo apt-get install -y wkhtmltopdf python3-cairo python3-pil libcairo2-dev ffmpeg
```
2. **Set up virtual environment and install dependencies**:
```bash
cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
3. **Create production .env file**:
```bash
echo "GOOGLE_API_KEY=your_production_api_key" > .env
```
4. **Create systemd service** (see `backend/video-query.service`):
```bash
# Run from the repository root
sudo cp backend/video-query.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable video-query
sudo systemctl start video-query
```
### Frontend Deployment
1. **Update production config** (`frontend/public/config.js`):
```javascript
window.__APP_CONFIG__ = {
"basePath": "/video-query",
"domain": "https://your-domain.com",
"api": {
"videoProcessingEndpoint": "https://your-domain.com/video_query_back/api/process",
"chunkedUploadEndpoint": "https://your-domain.com/video_query_back"
}
};
```
2. **Build for production**:
```bash
cd frontend
npm run build
```
3. **Deploy to web server**:
```bash
sudo cp -r build/* /var/www/html/video-query/
```
4. **Configure web server** (Apache example):
```apache
<VirtualHost *:443>
DocumentRoot /var/www/html
# Frontend
Alias /video-query /var/www/html/video-query
# Backend proxy
ProxyPass /video_query_back http://localhost:5010
ProxyPassReverse /video_query_back http://localhost:5010
</VirtualHost>
```
## API Reference
### Video Processing Endpoints
- **POST /api/process**: Single video processing endpoint
  - Accepts JSON: `file_path`, `filename`, `prompt` (for chunked uploads)
  - Returns: Processing result with content, processing time, chunks info
- **POST /api/process-batch**: Batch video processing endpoint
  - Accepts JSON: `videos` (array of `{file_path, filename, order}`), `prompt`, `batch_id`
  - Returns: Unified result for all videos, total chunks processed
  - Maximum 10 videos per batch
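
As an illustration of the batch request body described above, the sketch below assembles the JSON payload and enforces the 10-video limit on the client side. `build_batch_payload` is a hypothetical helper, and the zero-based `order` values are an assumption, since this README does not state how `order` is numbered.

```python
import json
from typing import Any, Dict, Sequence, Tuple

MAX_BATCH_VIDEOS = 10  # limit stated above


def build_batch_payload(
    videos: Sequence[Tuple[str, str]], prompt: str, batch_id: str
) -> Dict[str, Any]:
    """Build the body for POST /api/process-batch.

    `videos` holds (file_path, filename) pairs in the desired order.
    """
    if not videos:
        raise ValueError("a batch needs at least one video")
    if len(videos) > MAX_BATCH_VIDEOS:
        raise ValueError(f"a batch may contain at most {MAX_BATCH_VIDEOS} videos")
    return {
        "videos": [
            # Assumption: order is the zero-based position in the batch.
            {"file_path": path, "filename": name, "order": index}
            for index, (path, name) in enumerate(videos)
        ],
        "prompt": prompt,
        "batch_id": batch_id,
    }


# Example values are placeholders, not real paths from the application.
payload = build_batch_payload(
    [("/tmp/uploads/a.mp4", "a.mp4"), ("/tmp/uploads/b.mp4", "b.mp4")],
    prompt="Summarize these meetings",
    batch_id="batch-001",
)
body = json.dumps(payload)
```

The resulting `body` would be sent with `Content-Type: application/json`, plus a bearer token when authentication is enabled.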
### Chunked Upload Endpoints
- **POST /api/init-upload**: Initialize chunked upload session
- **POST /api/upload-chunk/<upload_id>**: Upload file chunk
- **POST /api/complete-upload/<upload_id>**: Mark upload complete
- **POST /api/cancel-upload/<upload_id>**: Cancel upload
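
The chunked upload flow (init, upload each chunk, complete or cancel) can be sketched from the client side as follows. The request and response bodies are not documented in this README, so the sketch deliberately stops at slicing the file and building the endpoint URLs; `iter_chunks`, `upload_endpoints`, and the 5 MiB chunk size are assumptions for illustration.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # assumed chunk size; the real value is not documented here


def iter_chunks(
    stream: BinaryIO, chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, data) pairs until the stream is exhausted."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data
        index += 1


def upload_endpoints(base_url: str, upload_id: str) -> Dict[str, str]:
    """Build the four chunked-upload URLs listed above for one upload session."""
    base = base_url.rstrip("/")
    return {
        "init": f"{base}/api/init-upload",
        "chunk": f"{base}/api/upload-chunk/{upload_id}",
        "complete": f"{base}/api/complete-upload/{upload_id}",
        "cancel": f"{base}/api/cancel-upload/{upload_id}",
    }


# Tiny in-memory example: 10 bytes split into 4-byte chunks.
chunks = list(iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4))
urls = upload_endpoints("http://localhost:5010/", "example-id")
```

Reading one chunk at a time keeps memory use flat regardless of file size, which is the point of chunking uploads of up to 5GB.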
### PDF Generation Endpoints
- **POST /api/generate-pdf**: Generate PDF from HTML with Mermaid diagrams
  - JSON data: `html`, `textDiagrams`, `diagramPngs`, `videoFileName`
### Authentication Endpoints (if enabled)
- **GET /api/auth-test**: Verify authentication status
## Configuration Files
### Backend Configuration
- **backend/.env**: Environment variables
```
GOOGLE_API_KEY=your_api_key
```
### Frontend Configuration
- **frontend/.env**: React environment variables
```
REACT_APP_DISABLE_AUTH=true # Optional: disable auth for local dev
```
- **frontend/public/config.js**: Production configuration (committed to git)
- **frontend/public/config.local.js**: Local development override (not committed)
### Key Configuration Details
- **Parallel Processing**: Max 2 concurrent videos (App.js:245)
- **Rate Limiting**: 2-second delay between API calls (video_processor.py:224)
- **File Size Threshold**: 10MB for inline vs upload API (video_processor.py:167)
- **Video Chunk Duration**: 54 minutes (video_splitter.py)
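
The file size threshold above decides whether a video is sent inline with the request or through the upload API first. A minimal sketch of that decision, assuming binary megabytes and an inclusive boundary (neither is stated in this README), with `transfer_mode` as a hypothetical helper:

```python
INLINE_LIMIT_BYTES = 10 * 1024 * 1024  # 10MB threshold; binary megabytes assumed


def transfer_mode(size_bytes: int, inline_limit: int = INLINE_LIMIT_BYTES) -> str:
    """Return "inline" for small files and "upload" for larger ones."""
    if size_bytes < 0:
        raise ValueError("size must not be negative")
    # Assumption: a file exactly at the limit is still sent inline.
    return "inline" if size_bytes <= inline_limit else "upload"


small = transfer_mode(3 * 1024 * 1024)
large = transfer_mode(200 * 1024 * 1024)
```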
## Usage
### Local Development
1. Start backend: `cd backend && source venv/bin/activate && python3 run.py`
2. Start frontend: `cd frontend && npm start`
3. Open: http://localhost:3000
### Processing Videos
The application supports two processing modes, selected via a toggle at the top:
#### Single Video Mode (Default)
Process videos individually with per-video control:
1. **Select Mode**: Click "Single Video Mode" button at the top
2. **Upload**: Drag & drop videos or click to select files
3. **Choose Prompt**: Select a prompt template or write custom prompt
4. **Process Each Video**: Click individual "Process Video" button for each uploaded video
   - Videos show status: uploading → uploaded → processing → completed
   - Up to 2 videos process in parallel automatically
5. **Monitor Progress**: Watch real-time status updates and processing indicators
6. **Manage Queue**: Use Stop (⏸️), Retry (🔄), or Remove (🗑️) per video
7. **View Results**: Completed videos appear in "Processed Videos" section
8. **Download**: Click "Download PDF" or "Copy Formatted" for any result
#### Batch Mode
Process multiple related videos as one unified analysis:
1. **Select Mode**: Click "Batch Mode" button at the top
2. **Upload Videos**: Add multiple related videos (max 10 per batch)
3. **Arrange Videos**: Use Up/Down arrows to reorder videos in logical sequence
4. **Remove Unwanted**: Click Remove button to exclude videos from batch
5. **Choose Prompt**: Select or customize the analysis prompt
6. **Process Batch**: Click single "Process Batch" button to analyze all videos together
   - Backend automatically handles video splitting and chunking
   - Two-stage synthesis creates unified result across all videos
   - Multiple Mermaid diagrams merged into one comprehensive diagram
7. **View Results**: Single unified result appears for entire batch
8. **Download**: Generate PDF with combined analysis
**Key Differences**:
- **Single Mode**: Each video = separate result, manual per-video processing
- **Batch Mode**: All videos = one unified result, single batch processing
- **Explicit Control**: No auto-processing - all require button clicks
### Processing Long Videos
- Videos > 54 minutes automatically split into chunks
- Each chunk processed in parallel (backend handles this)
- Results intelligently combined
- Processing time displayed for transparency
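
Processing chunks in parallel while keeping results in their original order can be sketched with a thread pool. This is an assumption-laden illustration rather than the backend's actual code: `process_chunk` is a hypothetical callable standing in for the per-chunk Gemini request, and the worker count of 2 is only a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def process_chunks_in_parallel(
    chunk_paths: Sequence[str],
    process_chunk: Callable[[str], str],
    max_workers: int = 2,  # placeholder value, not taken from the backend
) -> List[str]:
    """Run `process_chunk` on every chunk concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order even when later
        # chunks finish before earlier ones.
        return list(pool.map(process_chunk, chunk_paths))


# Stand-in worker so the sketch runs without API access.
summaries = process_chunks_in_parallel(
    ["chunk_000.mp4", "chunk_001.mp4", "chunk_002.mp4"],
    lambda path: f"summary of {path}",
)
```

Threads suit this workload because each chunk spends most of its time waiting on network I/O rather than using the CPU.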
## Development Utilities
- **restart.sh**: Quick development environment restart
- **backend/test_*.py**: API testing and validation scripts
- **backend/run.py**: Production server with optimized settings for large uploads
- **extract_user_logs*.sh**: Usage analytics extraction
## Security Features
- Azure AD B2C integration with JWT validation (optional)
- CORS protection with specific origin allowlisting
- Secure file upload validation
- Temporary file cleanup
- Token expiration handling
- Rate limiting to prevent API abuse
- Abort signal support for cancellation
## Troubleshooting
### Backend Issues
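- **400 INVALID_ARGUMENT**: Usually caused by rate limiting; check the backend logs for details. Quota errors may also surface as 429 RESOURCE_EXHAUSTED
- **File upload errors**: Verify that ffmpeg and ffprobe are installed (`which ffmpeg ffprobe`)
- **PDF generation fails**: Ensure wkhtmltopdf is installed (`which wkhtmltopdf`)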
### Frontend Issues
- **CORS errors**: Check backend CORS settings in app.py
- **Changes not visible**: Hard-refresh the page to bypass the browser cache (Ctrl+Shift+R)
- **Config not loading**: Verify config.js and config.local.js exist in public/
### Rate Limiting
- Backend: 2-second delay between API calls (automatic)
- Frontend: Max 2 parallel videos
- Free tier: 5 RPM limit enforced by Gemini API
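
When quota errors still occur despite the delays above, retrying with exponential backoff is a common remedy. The sketch below is one possible approach, not the application's retry logic: `RateLimitError` is a hypothetical exception standing in for whatever the API client raises on a quota error, and `call_with_backoff` is an illustrative helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a quota error raised by the API client."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, doubling the wait after each rate-limit failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 2s, 4s, 8s, ...
    raise RuntimeError("unreachable")
```

With the defaults, a call that keeps failing is attempted four times with waits of 2, 4, and 8 seconds in between before the error is re-raised.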
## License
This project is proprietary and confidential.