# Video Query Tool

A full-stack web application that processes videos with Google's Gemini AI model in two modes: Single Video and Batch. It provides two-stage AI synthesis, automatic splitting of long videos, Azure AD B2C authentication, chunked uploads of up to 5 GB per file, PDF generation with merged Mermaid diagrams, and usage tracking.

## Features

### Core Functionality

- **Dual Processing Modes**:
  - **Single Video Mode**: Process videos individually, with per-video control
  - **Batch Mode**: Combine up to 10 related videos into one unified analysis
- **Two-Stage AI Synthesis**: Results are produced in two passes
  - Stage 1: Each video or chunk → concise summary
  - Stage 2: All summaries → one cohesive result
- **Video Processing**: Upload videos and analyze them with Google's Gemini 2.0 Flash Exp model
- **Prompt Templates**:
  - Meeting Summary
  - Process/Tool Documentation
  - Process Documentation with Mermaid Charts
  - Custom Prompts
- **Large File Support**: Chunked upload system supporting files up to 5 GB each
- **PDF Generation**: Convert results to PDF with embedded Mermaid diagrams
- **Authentication**: Azure AD B2C integration (optional, controlled via `.env`)
- **Parallel Processing**: Up to 2 videos processed simultaneously in Single Video mode
- **Long Video Support**: Videos longer than 54 minutes are split automatically, and their chunks are processed in parallel

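
The two-stage synthesis is a map-reduce style summarization. It makes one call per video or chunk, then one call over all the partial results. The sketch below shows only that control flow. `generate` is a hypothetical stand-in for the Gemini request made in `video_processor.py`. The prompt wording and the `max_workers=2` cap are illustrative assumptions, not the app's actual values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical stand-in for a Gemini request: takes a prompt, returns text.
Generate = Callable[[str], str]


def summarize_chunk(generate: Generate, chunk_label: str, user_prompt: str) -> str:
    """Stage 1: produce a concise summary for one video or chunk."""
    return generate(f"[{chunk_label}] Summarize concisely for this task: {user_prompt}")


def two_stage_synthesis(
    generate: Generate,
    chunk_labels: Sequence[str],
    user_prompt: str,
    max_workers: int = 2,
) -> str:
    """Run Stage 1 per chunk with bounded parallelism, then Stage 2 once over all summaries."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so summaries keep the video/chunk order.
        summaries: List[str] = list(
            pool.map(lambda label: summarize_chunk(generate, label, user_prompt), chunk_labels)
        )
    numbered = "\n\n".join(f"Part {i + 1}:\n{s}" for i, s in enumerate(summaries))
    # Stage 2: a single call that merges all partial summaries into one result.
    return generate(
        f"Combine these parts into one cohesive result for: {user_prompt}\n\n{numbered}"
    )
```

Keeping the model call behind a plain callable lets Stage 1 run concurrently and be tested without network access. Ordering is preserved so the Stage 2 prompt sees the parts in the sequence the user arranged.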
### Technical Features

- **Explicit User Control**: No auto-processing; every video or batch starts only when you click its "Process" button
- **Batch Video Management**: Reorder and remove videos before processing
- **Smart Diagram Merging**: Multiple Mermaid diagrams are combined into one
- **Persistent Mode Selection**: Processing mode and batch queue persist across page refreshes
- **Multiple File Queue**: Upload multiple videos and manage the queue (Stop, Retry, Remove)
- **Drag & Drop Upload**: File upload interface with progress tracking
- **Real-time Status**: Live status updates (uploading → uploaded → processing → completed)
- **Queue Management**: Stop, retry, or remove videos from the processing queue at any time
- **Automatic Video Splitting**: Videos longer than 54 minutes are split into 54-minute chunks
- **Rate Limiting**: Built-in 2-second delay between API calls to reduce quota errors
- **Error Handling**: Error handling with retry capability
- **Processing Time Display**: Shows the processing duration for each completed video or batch
- **Usage Analytics**: Automated tracking via webhook integration
- **Production Ready**: systemd service configuration and deployment scripts

## Limitations

- **Video Length**: No hard limit; videos are automatically split into 54-minute chunks
- **Single Chunk Limit**: Each chunk must be under 55 minutes (handled automatically by the splitting)
- **File Size**: Up to 5 GB per uploaded file
- **Supported Formats**: MP4, AVI, MOV, WMV, MKV, WEBM
- **Parallel Processing**: At most 2 videos simultaneously in Single Video mode (rate-limit protection)
- **Batch Size**: At most 10 videos per batch
- **API Rate Limits**: The Gemini free tier allows 5 requests per minute (RPM); the backend adds a 2-second delay between calls

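
The limits above can be checked before an upload starts. This is a minimal sketch. It assumes format detection by file extension and reads "5GB" as 5 GiB. `validate_batch` and its messages are hypothetical, not the app's own validation code.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov", ".wmv", ".mkv", ".webm"}
MAX_FILE_BYTES = 5 * 1024**3  # assumption: "5GB" read as 5 GiB
MAX_BATCH_SIZE = 10


def validate_batch(files: Sequence[Tuple[str, int]]) -> List[str]:
    """Check (filename, size_in_bytes) pairs; an empty list means everything is acceptable."""
    problems: List[str] = []
    if len(files) > MAX_BATCH_SIZE:
        problems.append(f"batch has {len(files)} videos; the limit is {MAX_BATCH_SIZE}")
    for name, size in files:
        # Extension check is case-insensitive, so "clip.MP4" is accepted.
        if Path(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than 5 GB")
    return problems
```

Returning every problem at once, rather than raising on the first one, lets a UI show all rejected files in a single message.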
## Project Structure

```
video_query/
├── backend/                      # Flask/Hypercorn API server
│   ├── app.py                    # Main Flask application with PDF generation
│   ├── video_processor.py        # Gemini API integration, parallel processing, rate limiting
│   ├── video_splitter.py         # Video splitting for long videos (54-min chunks)
│   ├── auth.py                   # Azure AD B2C authentication handlers
│   ├── chunked_upload.py         # Chunked file upload Blueprint
│   ├── run.py                    # Hypercorn production server
│   ├── requirements.txt          # Python dependencies
│   ├── .env                      # Environment variables (GOOGLE_API_KEY)
│   └── test_*.py                 # API testing utilities
├── frontend/                     # React SPA
│   ├── src/
│   │   ├── components/           # React components
│   │   │   ├── VideoUpload.js           # Multi-file drag & drop upload
│   │   │   ├── PromptSelector.js        # Mode selection and prompt editing
│   │   │   ├── ResultDisplay.js         # Results with PDF generation
│   │   │   ├── AuthenticatedContent.js  # Queue management, processed list
│   │   │   └── Login.js                 # Authentication interface
│   │   ├── auth/                 # Authentication utilities
│   │   │   ├── authConfig.js     # Azure AD B2C configuration
│   │   │   ├── AuthProvider.js   # MSAL React provider
│   │   │   └── authApiClient.js  # Authenticated API client
│   │   └── utils/
│   │       ├── chunkedUploader.js  # Large file upload handler
│   │       ├── configLoader.js     # Dynamic config loading
│   │       └── pathUtils.js        # Path utilities
│   ├── public/
│   │   ├── config.js             # Production config (committed)
│   │   ├── config.local.js       # Local dev config (not committed)
│   │   └── index.html            # Loads both configs
│   ├── package.json              # Node.js dependencies
│   ├── .env                      # Frontend environment variables
│   └── build/                    # Production build output
├── DEPLOYMENT.md                 # Production deployment instructions
├── LOG_EXTRACTION_README.md      # Usage analytics documentation
├── CLAUDE.md                     # Development guidelines and build commands
├── restart.sh                    # Development restart script
├── quick_extract.sh              # Log extraction utility
└── extract_user_logs*.sh         # Advanced log processing
```

## Dependencies

### Backend Dependencies

- **Flask 3.1.0**: Web framework
- **google-genai 1.45.0**: Gemini AI SDK (updated API)
- **Hypercorn 0.17.3**: ASGI production server
- **python-jose**: JWT token validation for Azure AD
- **flask-cors 5.0.1**: Cross-origin resource sharing
- **pdfkit 1.0.0**: PDF generation from HTML
- **cairosvg 2.8.0**: SVG to PNG conversion for diagrams
- **Pillow 11.2.1**: Image processing
- **python-dotenv 1.1.0**: Environment variable management
- **ffmpeg-python**: Video splitting functionality

### Frontend Dependencies

- **React 18.2.0**: UI framework
- **@azure/msal-react 3.0.12**: Microsoft Authentication Library
- **axios 1.6.0**: HTTP client with abort signal support
- **bootstrap 5.3.2**: UI components and styling
- **mermaid 11.6.0**: Diagram generation
- **react-dropzone 14.2.3**: Multi-file upload interface
- **showdown 2.1.0**: Markdown to HTML conversion

## Setup Instructions

### Prerequisites

- Python 3.9+ (Flask 3.1 and Pillow 11 no longer support Python 3.8)
- Node.js 16+
- Google Cloud API key with Gemini access
- Azure AD B2C tenant (optional, for authentication)
- wkhtmltopdf (for PDF generation)
- ffmpeg/ffprobe (for video splitting)

### Backend Setup

1. **Create and activate a virtual environment**:
   ```bash
   cd backend
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables** (create `backend/.env`):
   ```bash
   GOOGLE_API_KEY=your_gemini_api_key_here
   ```

4. **Install system dependencies**:
   ```bash
   # Ubuntu/Debian:
   sudo apt-get install wkhtmltopdf python3-cairo libcairo2-dev ffmpeg

   # macOS:
   brew install cairo wkhtmltopdf ffmpeg
   ```

5. **Start the development server**:
   ```bash
   python3 run.py
   # Listens on port 5010 on all interfaces (0.0.0.0); reachable locally at http://localhost:5010
   ```

### Frontend Setup

1. **Install Node.js dependencies**:
   ```bash
   cd frontend
   npm install
   ```

2. **Configure authentication** (optional):
   - To disable authentication for local development, edit `frontend/.env`:
     ```
     # Disable auth for local dev
     REACT_APP_DISABLE_AUTH=true
     ```
   - For production, update `src/auth/authConfig.js` with your Azure AD B2C details

3. **Configure the backend URL for local development**:
   - `frontend/public/config.local.js` points the frontend at the backend on `localhost:5010`
   - The file is listed in `.gitignore` and is not committed, so create it locally if your checkout does not have it

4. **Start the development server**:
   ```bash
   npm start
   # Server runs on http://localhost:3000
   ```

## Production Deployment

### System Requirements

- Ubuntu or CentOS server
- Apache or Nginx web server
- Python 3.9+ with a virtual environment
- wkhtmltopdf system package
- ffmpeg/ffprobe for video processing
- Node.js for building the frontend

### Backend Deployment

1. **Install system packages**:
   ```bash
   sudo apt-get update
   sudo apt-get install -y wkhtmltopdf python3-cairo python3-pil libcairo2-dev ffmpeg
   ```

2. **Set up the virtual environment and install dependencies**:
   ```bash
   cd backend
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```

3. **Create the production `.env` file** (still inside `backend/`):
   ```bash
   echo "GOOGLE_API_KEY=your_production_api_key" > .env
   ```

4. **Install the systemd service** (see `backend/video-query.service`):
   ```bash
   cd ..  # back to the repository root; steps 2-3 ran inside backend/
   sudo cp backend/video-query.service /etc/systemd/system/
   sudo systemctl daemon-reload
   sudo systemctl enable video-query
   sudo systemctl start video-query
   ```

### Frontend Deployment

1. **Update the production config** (`frontend/public/config.js`):
   ```javascript
   window.__APP_CONFIG__ = {
     "basePath": "/video-query",
     "domain": "https://your-domain.com",
     "api": {
       "videoProcessingEndpoint": "https://your-domain.com/video_query_back/api/process",
       "chunkedUploadEndpoint": "https://your-domain.com/video_query_back"
     }
   };
   ```

2. **Build for production**:
   ```bash
   cd frontend
   npm run build
   ```

3. **Deploy to the web server** (from `frontend/`):
   ```bash
   sudo mkdir -p /var/www/html/video-query
   sudo cp -r build/* /var/www/html/video-query/
   ```

4. **Configure the web server** (Apache example; requires `mod_proxy` and `mod_proxy_http`, SSL directives omitted):
   ```apache
   <VirtualHost *:443>
       DocumentRoot /var/www/html

       # Frontend
       Alias /video-query /var/www/html/video-query

       # Backend proxy
       ProxyPass /video_query_back http://localhost:5010
       ProxyPassReverse /video_query_back http://localhost:5010
   </VirtualHost>
   ```

## API Reference

### Video Processing Endpoints

- **POST /api/process**: Process a single video
  - Accepts JSON with `file_path`, `filename`, and `prompt`, where `file_path` refers to a video already uploaded through the chunked upload endpoints
  - Returns the processing result: content, processing time, and chunk information
- **POST /api/process-batch**: Process several videos as one batch
  - Accepts JSON with `videos` (an array of `{file_path, filename, order}` objects), `prompt`, and `batch_id`
  - Returns one unified result for all videos, plus the total number of chunks processed
  - Accepts at most 10 videos per batch

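
Here is a minimal standard-library client for the two processing endpoints. Several details are assumptions: `BASE_URL`, the bearer-token header (sent only when authentication is enabled), and `order` being a 1-based position. `app.py` remains the reference for how requests are actually handled. Behind the Apache proxy shown earlier, the base URL would be the `/video_query_back` prefix instead of port 5010.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional, Sequence, Tuple

BASE_URL = "http://localhost:5010"  # assumption: local backend from the setup steps
MAX_BATCH_VIDEOS = 10


def single_payload(file_path: str, filename: str, prompt: str) -> Dict[str, Any]:
    """Body for POST /api/process."""
    return {"file_path": file_path, "filename": filename, "prompt": prompt}


def batch_payload(videos: Sequence[Tuple[str, str]], prompt: str, batch_id: str) -> Dict[str, Any]:
    """Body for POST /api/process-batch; `videos` are (file_path, filename) pairs in order."""
    if not 1 <= len(videos) <= MAX_BATCH_VIDEOS:
        raise ValueError(f"a batch needs 1-{MAX_BATCH_VIDEOS} videos, got {len(videos)}")
    items: List[Dict[str, Any]] = [
        # assumption: `order` is the 1-based position the user arranged in the UI
        {"file_path": path, "filename": name, "order": i}
        for i, (path, name) in enumerate(videos, start=1)
    ]
    return {"videos": items, "prompt": prompt, "batch_id": batch_id}


def post_json(path: str, payload: Dict[str, Any], token: Optional[str] = None) -> Dict[str, Any]:
    """POST a JSON body and decode the JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:  # assumption: Azure AD B2C tokens are sent as a bearer header
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `post_json("/api/process-batch", batch_payload([("/tmp/a.mp4", "a.mp4"), ("/tmp/b.mp4", "b.mp4")], "Meeting Summary", "batch-001"))` sends a two-video batch; the paths here are placeholders.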
### Chunked Upload Endpoints

- **POST /api/init-upload**: Initialize a chunked upload session
- **POST /api/upload-chunk/<upload_id>**: Upload one file chunk
- **POST /api/complete-upload/<upload_id>**: Mark the upload as complete
- **POST /api/cancel-upload/<upload_id>**: Cancel the upload

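
The four chunked-upload endpoints are called in sequence: initialize, send each chunk, then complete, or cancel on failure. The sketch below shows that flow with an injected, hypothetical `post(path, json=None, data=None)` function, so it runs without a server. This README does not document the request bodies, so several details are assumptions: the `filename` and `total_size` fields, the `upload_id` response key, the chunk index as a query parameter, raw-bytes chunk bodies, and the 10 MiB chunk size. `chunked_upload.py` and `frontend/src/utils/chunkedUploader.js` define the real contract.

```python
import os
from typing import Any, Callable, Dict, Iterator

CHUNK_SIZE = 10 * 1024 * 1024  # assumption: 10 MiB per chunk

# Hypothetical transport: post(path, json=None, data=None) -> parsed JSON response.
Post = Callable[..., Dict[str, Any]]


def iter_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the file in fixed-size blocks (the last block may be shorter)."""
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            yield block


def chunked_upload(post: Post, path: str, chunk_size: int = CHUNK_SIZE) -> Dict[str, Any]:
    """init -> upload each chunk -> complete; cancel the session if any step fails."""
    init = post("/api/init-upload", json={
        "filename": os.path.basename(path),   # assumed field name
        "total_size": os.path.getsize(path),  # assumed field name
    })
    upload_id = init["upload_id"]  # assumed response key
    try:
        for index, block in enumerate(iter_chunks(path, chunk_size)):
            # assumption: chunk index as a query parameter, chunk bytes as the body
            post(f"/api/upload-chunk/{upload_id}?index={index}", data=block)
        return post(f"/api/complete-upload/{upload_id}")
    except Exception:
        # Let the server discard partial data, then surface the original error.
        post(f"/api/cancel-upload/{upload_id}")
        raise
```

Reading one chunk at a time keeps memory flat even for 5 GB files. The `try`/`except` ensures an interrupted upload does not leave an orphaned session on the server.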
### PDF Generation Endpoints

- **POST /api/generate-pdf**: Generate a PDF from HTML with Mermaid diagrams
  - JSON fields: `html`, `textDiagrams`, `diagramPngs`, `videoFileName`

### Authentication Endpoints (if enabled)

- **GET /api/auth-test**: Verify authentication status

## Configuration Files

### Backend Configuration

- **backend/.env**: Environment variables
  ```
  GOOGLE_API_KEY=your_api_key
  ```

### Frontend Configuration

- **frontend/.env**: React environment variables
  ```
  # Optional: disable auth for local dev
  REACT_APP_DISABLE_AUTH=true
  ```
- **frontend/public/config.js**: Production configuration (committed to git)
- **frontend/public/config.local.js**: Local development override (not committed)

### Key Configuration Details

- **Parallel Processing**: At most 2 concurrent videos (App.js:245)
- **Rate Limiting**: 2-second delay between API calls (video_processor.py:224)
- **File Size Threshold**: 10 MB; smaller videos are sent inline, larger ones go through the upload API (video_processor.py:167)
- **Video Chunk Duration**: 54 minutes (video_splitter.py)

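
The 10 MB threshold decides how a video reaches Gemini. Small files can travel inline with the request; larger ones are uploaded through the Files API first and used once processing finishes. The sketch below shows that decision with the `google-genai` SDK. Reading "10MB" as 10 MiB, the default `video/mp4` MIME type, and the 5-second polling interval are assumptions, and `video_processor.py` remains the reference. The SDK is imported lazily, so the threshold helper runs without it.

```python
import os
import time

INLINE_THRESHOLD_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def should_inline(size_bytes: int, threshold: int = INLINE_THRESHOLD_BYTES) -> bool:
    """Small files travel inside the request; larger ones go through the Files API."""
    return size_bytes < threshold


def video_part(client, path: str, mime_type: str = "video/mp4", poll_seconds: float = 5.0):
    """Return an object usable in generate_content(contents=[...]) for one video."""
    from google.genai import types  # lazy import: only needed when calling Gemini

    if should_inline(os.path.getsize(path)):
        with open(path, "rb") as f:
            return types.Part.from_bytes(data=f.read(), mime_type=mime_type)
    uploaded = client.files.upload(file=path)
    # Uploaded videos are processed asynchronously; poll until processing ends
    # (a FAILED file is returned as-is for the caller to handle).
    while uploaded.state is not None and uploaded.state.name == "PROCESSING":
        time.sleep(poll_seconds)
        uploaded = client.files.get(name=uploaded.name)
    return uploaded


# Example (needs `pip install google-genai` and GOOGLE_API_KEY in the environment):
#   from google import genai
#   client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
#   response = client.models.generate_content(
#       model="gemini-2.0-flash-exp",
#       contents=[video_part(client, "meeting.mp4"), "Summarize this meeting."],
#   )
#   print(response.text)
```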
## Usage

### Local Development

1. Start the backend: `cd backend && source venv/bin/activate && python3 run.py`
2. Start the frontend: `cd frontend && npm start`
3. Open http://localhost:3000

### Processing Videos

The application supports two processing modes, selected with a toggle at the top of the page.

#### Single Video Mode (Default)

Process videos individually, with per-video control:

1. **Select Mode**: Click the "Single Video Mode" button at the top
2. **Upload**: Drag & drop videos or click to select files
3. **Choose Prompt**: Select a prompt template or write a custom prompt
4. **Process Each Video**: Click the "Process Video" button for each uploaded video
   - Each video shows its status: uploading → uploaded → processing → completed
   - Up to 2 videos are processed in parallel
5. **Monitor Progress**: Watch real-time status updates and processing indicators
6. **Manage Queue**: Use Stop (⏸️), Retry (🔄), or Remove (🗑️) on any video
7. **View Results**: Completed videos appear in the "Processed Videos" section
8. **Download**: Click "Download PDF" or "Copy Formatted" for any result

#### Batch Mode

Process multiple related videos as one unified analysis:

1. **Select Mode**: Click the "Batch Mode" button at the top
2. **Upload Videos**: Add multiple related videos (at most 10 per batch)
3. **Arrange Videos**: Use the Up/Down arrows to put the videos in a logical sequence
4. **Remove Unwanted**: Click Remove to exclude a video from the batch
5. **Choose Prompt**: Select or customize the analysis prompt
6. **Process Batch**: Click the single "Process Batch" button to analyze all videos together
   - The backend splits and chunks the videos automatically
   - Two-stage synthesis produces one result across all videos
   - Multiple Mermaid diagrams are merged into one comprehensive diagram
7. **View Results**: A single unified result appears for the entire batch
8. **Download**: Generate a PDF of the combined analysis

**Key Differences**:

- **Single Mode**: Each video produces a separate result and is processed individually
- **Batch Mode**: All videos produce one unified result from a single batch run
- **Explicit Control**: Nothing is processed automatically; every run requires a button click

### Processing Long Videos

- Videos longer than 54 minutes are split into chunks automatically
- The backend processes the chunks in parallel
- Chunk results are combined through the two-stage synthesis
- The processing time is displayed once processing completes

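
The splitting rule cuts videos over 54 minutes into 54-minute pieces. That reduces to simple arithmetic on the duration reported by `ffprobe`. The sketch below plans the segments and builds matching `ffmpeg` command lines. It uses stream copy (`-c copy`), which is fast but cuts on keyframes, so segment boundaries may shift by a few seconds. The function names are hypothetical, and `video_splitter.py` may split differently.

```python
import math
import os
import subprocess
from typing import List, Tuple

CHUNK_SECONDS = 54 * 60  # 54-minute chunks, as described above


def probe_duration(path: str) -> float:
    """Return the container duration in seconds using ffprobe (must be on PATH)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def plan_chunks(duration: float, chunk_seconds: int = CHUNK_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, length) pairs in seconds; videos up to chunk_seconds stay whole."""
    if duration <= chunk_seconds:
        return [(0.0, duration)]
    count = math.ceil(duration / chunk_seconds)
    return [
        (float(i * chunk_seconds), min(chunk_seconds, duration - i * chunk_seconds))
        for i in range(count)
    ]


def split_commands(path: str, plan: List[Tuple[float, float]]) -> List[List[str]]:
    """Build one ffmpeg command per chunk, copying streams without re-encoding."""
    base, ext = os.path.splitext(path)  # keep the original container format
    return [
        ["ffmpeg", "-ss", f"{start:.3f}", "-i", path, "-t", f"{length:.3f}",
         "-c", "copy", f"{base}_part{i + 1}{ext}"]
        for i, (start, length) in enumerate(plan)
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. A two-hour video, for instance, yields two full 54-minute chunks and a final 12-minute chunk.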
## Development Utilities

- **`restart.sh`**: Quick development environment restart
- **`backend/test_*.py`**: API testing and validation scripts
- **`backend/run.py`**: Production server with optimized settings for large uploads
- **`extract_user_logs*.sh`**: Usage analytics extraction

## Security Features

- Azure AD B2C integration with JWT validation (optional)
- CORS protection with specific origin allowlisting
- Secure file upload validation
- Temporary file cleanup
- Token expiration handling
- Rate limiting to prevent API abuse
- Abort signal support for cancellation

## Troubleshooting

### Backend Issues

- **400 INVALID_ARGUMENT**: In this application, usually caused by rate limiting; check the backend logs for the detailed error (quota errors from the Gemini API itself are normally reported as 429 RESOURCE_EXHAUSTED)
- **File upload errors**: Verify that ffmpeg and ffprobe are installed (`which ffprobe`)
- **PDF generation fails**: Ensure wkhtmltopdf is installed

### Frontend Issues

- **CORS errors**: Check the backend CORS settings in `app.py`
- **Changes not visible**: Hard-reload the page (Ctrl+Shift+R) or clear the browser cache
- **Config not loading**: Verify that `config.js` and `config.local.js` exist in `public/`

### Rate Limiting

- Backend: 2-second delay between API calls (automatic)
- Frontend: At most 2 videos in parallel
- Free tier: 5 RPM limit enforced by the Gemini API

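
A fixed delay between Gemini calls can be expressed as a minimum-interval limiter shared by all worker threads. This is a minimal sketch, not the code in `video_processor.py`, and `MinIntervalLimiter` is a hypothetical name. The interval is a parameter because the spacing needed to stay under a requests-per-minute quota is 60 / RPM seconds: 12 s for 5 RPM. A 2-second spacing permits up to 30 calls per minute.

```python
import threading
import time


def interval_for_rpm(rpm: float) -> float:
    """Minimum spacing in seconds that keeps a steady stream of calls under `rpm`."""
    return 60.0 / rpm


class MinIntervalLimiter:
    """Thread-safe limiter: successive wait() calls return at least `interval` seconds apart."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0  # time.monotonic() value of the next free slot

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed - now)
            # Reserve this caller's slot before sleeping so other threads queue behind it.
            self._next_allowed = max(now, self._next_allowed) + self.interval
        if delay > 0:
            time.sleep(delay)
```

Calling `limiter.wait()` immediately before each Gemini request spaces the requests out, even when several chunks are processed on parallel threads. Sleeping outside the lock lets other threads reserve their own slots while one is waiting.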
## License

This project is proprietary and confidential.