2025-11-15 06:07:15 +05:30


# Veo 3.1 Video Generator
A full-stack web application for generating AI videos using Google's Veo 3.1 models. Generate videos from text prompts with advanced features including frame interpolation, reference images, and customizable parameters.
## Quick Start
```bash
# Clone the repository, then navigate to the project root
cd veo3_poc
# Ensure service-account.json is in the root directory
# Run development servers (both frontend and backend)
./run-dev.sh
# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:7394
```
## Architecture
- **Frontend**: React 18 + Vite + Material-UI 5 + Montserrat typography
- **Backend**: Flask 3.0 + Google Gen AI SDK 1.47.0
- **Authentication**: Microsoft Azure AD SSO (MSAL 2.0)
- **Storage**: Google Cloud Storage for temporary video and image files
- **Deployment**: Systemd service + Apache reverse proxy
## Features
### Core Video Generation
- **Text-to-Video Generation**: Create videos from descriptive text prompts
- **Image-to-Video Generation**: Upload first frame images to guide video generation
- **Dual Model Support**: Choose between two models:
  - **Veo 3.1** (Standard): High quality with advanced features - $0.40/sec
  - **Veo 3.1 Fast**: Optimized for speed, supports frame interpolation - $0.15/sec
### Veo 3.1 Advanced Features
- **Frame Interpolation**: Upload both first and last frames to generate smooth transitions between them (8-second videos only)
- **Reference Images**: Guide video content with up to 3 reference images for consistent characters, objects, or styles (16:9 aspect ratio, 8-second videos, Standard model only)
- **Conditional UI**: Advanced features automatically appear/disappear based on selected model capabilities
### Job Management
- **Multi-Video Generation**: Generate 1-4 videos per request with batch processing
- **Unlimited Job Queue**: Submit unlimited video generation jobs with FIFO processing
- **Advanced Job Management**: Cancel, retry, and delete jobs with complete cleanup
- **Real-time Queue Visualization**: Live status updates with three-section queue display
### Customizable Parameters
- Video length (4, 6, or 8 seconds)
- Aspect ratio (16:9 landscape or 9:16 portrait)
- Person generation policy (allow/don't allow)
- Custom seed values for reproducible results
- Audio generation toggle
### Additional Features
- **Intelligent File Management**: Auto-cleanup after download, comprehensive GCS cleanup
- **Usage Tracking**: Webhook integration for monitoring generation requests
- **Development Mode**: Local development with authentication bypass
## Prerequisites
- Python 3.8 or later (3.13 recommended)
- Node.js 16+
- Google Cloud Project with Veo 3.1 API access
- Google Cloud Storage bucket
- Service account JSON key with appropriate permissions
- Microsoft Azure AD application configured (for production SSO)
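
A quick toolchain check before setup can catch version problems early. The sketch below is a hypothetical helper, not a script shipped with this project: `version_ge` and `check_prereqs` are illustrative names, the minimum versions mirror the list above, and it should be run from the project root so the `service-account.json` check is meaningful. It cannot verify GCP API access or the Azure AD app registration.

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite check; not part of this repository.

# version_ge A B: succeed if dotted numeric version A >= B (e.g. 3.13 >= 3.8).
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0} y=${b[i]:-0}   # missing fields count as 0
    ((10#$x > 10#$y)) && return 0
    ((10#$x < 10#$y)) && return 1
  done
  return 0                      # all fields equal
}

# check_prereqs: verify Python, Node.js and the service account key file.
check_prereqs() {
  local py node
  py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null) \
    || { echo "python3 not found" >&2; return 1; }
  node=$(node --version 2>/dev/null) || { echo "node not found" >&2; return 1; }
  node=${node#v}                # node prints e.g. v20.11.0
  version_ge "$py" 3.8 || { echo "Python $py is older than 3.8" >&2; return 1; }
  version_ge "$node" 16 || { echo "Node.js $node is older than 16" >&2; return 1; }
  [[ -f service-account.json ]] \
    || { echo "service-account.json not found in $PWD" >&2; return 1; }
  echo "Prerequisites look OK (Python $py, Node.js $node)"
}
```

Source the file and call `check_prereqs`. `version_ge` compares numeric fields only, so pre-release suffixes such as `rc1` are not handled.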
## Setup Instructions
### Backend Setup
1. Navigate to the backend directory:
```bash
cd backend
```
2. Create and activate virtual environment:
```bash
python -m venv venv
# Linux/macOS
source venv/bin/activate
# Windows
venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Configure environment variables:
```bash
# For development
cp .env.development .env
# For production
cp .env.production .env
# Edit .env with your specific configuration if needed
```
5. Run in development:
```bash
python app.py
```
### Frontend Setup
1. Navigate to the frontend directory:
```bash
cd frontend
```
2. Install dependencies:
```bash
npm install
```
3. Configure environment variables:
```bash
# For development
cp .env.development .env
# For production
cp .env.production .env
# Edit .env with your specific configuration if needed
```
4. Run in development:
```bash
npm run dev
```
5. Build for production:
```bash
npm run build
```
## Production Deployment
### Backend Deployment (systemd service)
1. Copy the backend files to your server
2. Update paths in `veo-video-generator.service`
3. Install, enable, and start the service:
```bash
sudo cp veo-video-generator.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable veo-video-generator
sudo systemctl start veo-video-generator
```
### Frontend Deployment
1. Build the frontend:
```bash
cd frontend
npm run build
```
2. Copy `dist/` contents to your web server directory:
```bash
cp -r dist/* /path/to/your/web/server/veo/
```
### Apache Configuration
Add the Apache configuration to your virtual host. Update paths as needed.
#### Required Apache Modules
Ensure these modules are enabled:
```bash
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod rewrite
sudo a2enmod headers
sudo a2enmod expires
sudo systemctl restart apache2
```
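
To confirm the modules are loaded after the restart, the module list printed by `apache2ctl -M` (one line per module, such as ` rewrite_module (shared)`) can be checked. The helper below is hypothetical and not part of this repository; it prints each required module that is missing and returns non-zero if any are absent.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `apache2ctl -M` output on stdin, print each module
# required by this app that is not loaded, and fail if any are missing.
missing_apache_modules() {
  local loaded mod missing=0
  loaded=$(cat)
  for mod in proxy proxy_http rewrite headers expires; do
    # Match the full "<name>_module" token so "proxy" cannot match proxy_http_module.
    if ! grep -Eq "^[[:space:]]*${mod}_module([[:space:]]|$)" <<<"$loaded"; then
      echo "$mod"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `sudo apache2ctl -M 2>/dev/null | missing_apache_modules && echo "all required modules loaded"`.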
#### Configuration Files
1. **Main Apache Config**: Use `apache.conf` for virtual host configuration
2. **Frontend .htaccess**: Copy `apache-htaccess.txt` to `/path/to/your/web/server/veo/.htaccess`
## Project Structure
```
veo3_poc/
├── backend/ # Flask backend application
│ ├── routes/ # API and health check endpoints
│ │ ├── api.py # Main API routes (generate, status, download, cleanup)
│ │ └── health.py # Health check endpoints
│ ├── utils/ # Utility modules
│ │ ├── auth.py # Google Cloud authentication
│ │ └── storage.py # GCS operations and image processing
│ ├── app.py # Flask app initialization and CORS config
│ ├── config.py # Configuration management
│ ├── video_generator.py # Core Veo 3.1 integration logic
│ ├── requirements.txt # Python dependencies
│ ├── .env.development # Development environment config
│ ├── .env.production # Production environment config
│ └── temp_downloads/ # Temporary video storage
├── frontend/ # React frontend application
│ ├── src/
│ │ ├── components/ # React components
│ │ │ ├── VideoForm.jsx # Main video generation form
│ │ │ ├── VideoGenerator.jsx # Top-level container
│ │ │ ├── ProgressIndicator.jsx # Status display
│ │ │ ├── Layout.jsx # App layout wrapper
│ │ │ ├── AuthGuard.jsx # Authentication wrapper
│ │ │ └── DevAuthWrapper.jsx # Dev mode auth bypass
│ │ ├── config/ # MSAL configuration
│ │ ├── services/ # API service layer
│ │ ├── hooks/ # Custom React hooks
│ │ └── App.jsx # Main app component
│ ├── .env.development # Development environment config
│ ├── .env.production # Production environment config
│ └── package.json # Node.js dependencies
├── service-account.json # Google Cloud service account key
├── run-dev.sh # Development startup script
├── apache.conf # Apache virtual host configuration
├── apache-htaccess.txt # Frontend .htaccess rules
└── veo-video-generator.service # Systemd service definition
```
## Configuration
### Environment File Structure
The application uses environment-specific configuration files:
**Backend:**
- `.env.development` - Debug mode, localhost CORS, development settings
- `.env.production` - Production mode, strict CORS, optimized for deployment
- `.env` - Active environment file (copy from development or production)
**Frontend:**
- `.env.development` - Localhost API, authentication bypass (`VITE_DEV_MODE=true`)
- `.env.production` - Production API, MSAL authentication enabled (`VITE_DEV_MODE=false`)
- `.env` - Active environment file (copy from development or production)
### Backend Environment Variables
| Variable | Description | Default/Example |
|----------|-------------|-----------------|
| `PROJECT_ID` | Google Cloud project ID | `optical-414516` |
| `REGION` | Google Cloud region | `us-central1` |
| `MODEL_ID` | Default Veo model identifier | `veo-3.1-generate-preview` |
| `MODEL_FAST_ID` | Default Veo Fast model identifier | `veo-3.1-fast-generate-preview` |
| `OUTPUT_GCS_BUCKET_NAME` | GCS bucket for temporary storage | `optical-veo3-test` |
| `SERVICE_ACCOUNT_KEY_PATH` | Path to service account JSON | `./service-account.json` |
| `PORT` | Backend server port | `7394` |
| `FLASK_ENV` | Environment mode | `development` or `production` |
| `FLASK_DEBUG` | Debug mode | `True` or `False` |
| `FRONTEND_URL` | Frontend URL for CORS | `http://localhost:3000` or production URL |
| `WEBHOOK_URL` | Usage tracking webhook URL | Optional |
| `WEBHOOK_ENABLED` | Enable usage tracking | `true` or `false` |
**Available Models:**
- `veo-3.1-generate-preview` - Veo 3.1 Standard (with advanced features)
- `veo-3.1-fast-generate-preview` - Veo 3.1 Fast (frame interpolation; no reference images)
### Frontend Environment Variables
| Variable | Description | Example |
|----------|-------------|---------|
| `VITE_API_BASE_URL` | Backend API URL | `http://localhost:7394` |
| `VITE_APP_TITLE` | Application title | `Veo Video Generator (Dev)` |
| `VITE_DEV_MODE` | Development mode flag | `true` or `false` |
| `VITE_MSAL_CLIENT_ID` | Azure AD client ID | `dd434534-...` |
| `VITE_MSAL_AUTHORITY` | Azure AD authority URL | `https://login.microsoftonline.com/...` |
| `VITE_MSAL_REDIRECT_URI` | Authentication redirect URI | `http://localhost:3000` |
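
Missing or empty variables are a frequent cause of the connection and authentication failures listed under Troubleshooting. The hypothetical `check_env` helper below (not part of this repository) looks for `NAME=value` lines in an env file; which variables count as required is an assumption, so adjust the names to your deployment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_env FILE VAR...
# Report variables that are missing or empty in FILE.
# Returns 0 if all are set, 1 if any are missing, 2 if FILE does not exist.
check_env() {
  local file=$1 var missing=0
  shift
  [[ -f $file ]] || { echo "env file not found: $file" >&2; return 2; }
  for var in "$@"; do
    # Require a non-empty value on a line of the form NAME=value.
    if ! grep -Eq "^${var}=.+" "$file"; then
      echo "missing or empty: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_env backend/.env PROJECT_ID REGION OUTPUT_GCS_BUCKET_NAME SERVICE_ACCOUNT_KEY_PATH PORT FRONTEND_URL` and `check_env frontend/.env VITE_API_BASE_URL VITE_DEV_MODE`. The check does not understand `export` prefixes, so a line such as `export PORT=7394` is reported as missing.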
## Key Dependencies
### Backend
- `flask==3.0.0` - Web framework
- `flask-cors==4.0.0` - Cross-origin resource sharing
- `google-genai==1.47.0` - Google Gen AI SDK for Veo 3.1 (with advanced features support)
- `google-cloud-storage==2.12.0` - GCS file operations
- `google-cloud-aiplatform==1.38.0` - Vertex AI platform
- `hypercorn==0.15.0` - ASGI server for production
- `python-dotenv==1.0.0` - Environment configuration
- `Pillow==10.1.0` - Image processing and format conversion
### Frontend
- `react==18.2.0` - UI framework
- `@mui/material==5.15.1` - Material-UI component library
- `@azure/msal-react==2.0.7` - Microsoft authentication
- `axios==1.6.2` - HTTP client
- `vite==5.0.8` - Build tool and dev server
- `@fontsource/montserrat==5.0.16` - Typography
## API Endpoints
### Main API Routes (`/api`)
| Method | Endpoint | Description | Request Body |
|--------|----------|-------------|--------------|
| `POST` | `/api/generate` | Start video generation | `{ prompt, model_name, video_length_sec, aspect_ratio, person_generation, sampleCount, seed, generate_audio, image, lastFrame, referenceImage1, referenceImage2, referenceImage3 }` |
| `GET` | `/api/status/<job_id>` | Check generation status | - |
| `GET` | `/api/download/<job_id>` | Download completed content (auto-deletes job) | - |
| `GET` | `/api/download/<job_id>/video/<index>` | Download individual video | - |
| `GET` | `/api/user-jobs` | Get all jobs for user | Query: `user_email` |
| `GET` | `/api/queue-status` | Get overall queue status | - |
| `POST` | `/api/cancel/<job_id>` | Cancel queued/processing job | - |
| `POST` | `/api/retry/<job_id>` | Retry failed/cancelled job | - |
| `DELETE` | `/api/delete/<job_id>` | Delete job completely | - |
| `DELETE` | `/api/cleanup/<job_id>` | Manual cleanup of temp files | - |
**Veo 3.1 Image Parameters:**
- `image` - First frame image (optional, all models)
- `lastFrame` - Last frame image for interpolation (optional, Veo 3.1 Standard or Fast, requires 8-second duration)
- `referenceImage1`, `referenceImage2`, `referenceImage3` - Reference images for content guidance (optional, Veo 3.1 Standard only, requires 16:9 aspect ratio and 8-second duration)
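
For scripted testing without the UI, a text-only job can be driven with `curl`. This is a sketch under assumptions the table above does not confirm: that `/api/generate` accepts a JSON body when no images are attached (image fields may need a different encoding), that the literal values used here are accepted, and that the response carries the job ID you pass to `get_status`. All function names and the `API_BASE` variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical curl helpers for the /api routes; not part of this repository.
API_BASE="${API_BASE:-http://localhost:7394}"

# Escape backslashes and double quotes so the prompt forms a valid JSON string.
# Newlines and other control characters are not handled by this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/[\\"]/\\&/g'
}

# build_generate_payload PROMPT [MODEL] [LENGTH_SEC] [ASPECT]
# Field names come from the /api/generate table; the values are assumptions.
build_generate_payload() {
  local prompt model=${2:-veo-3.1-fast-generate-preview} length=${3:-8} aspect=${4:-16:9}
  prompt=$(json_escape "$1")
  printf '{"prompt":"%s","model_name":"%s","video_length_sec":%d,"aspect_ratio":"%s","sampleCount":1,"generate_audio":true}\n' \
    "$prompt" "$model" "$length" "$aspect"
}

# submit_job PROMPT [MODEL] [LENGTH_SEC] [ASPECT]: POST a job, print the response.
submit_job() {
  curl -sS --fail -X POST "$API_BASE/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(build_generate_payload "$@")"
}

# get_status JOB_ID: print the raw status response.
get_status() {
  curl -sS --fail "$API_BASE/api/status/$1"
}

# download_video JOB_ID INDEX OUTFILE: save one generated video to OUTFILE.
download_video() {
  curl -sS --fail -o "$3" "$API_BASE/api/download/$1/video/$2"
}
```

For example, `submit_job "A lighthouse at dusk, slow dolly-in"` prints the backend's response; pass the job ID from it to `get_status`. Whether video indices for `download_video` start at 0 or 1 is not documented here.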
### Health Check Routes
| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/health` | Detailed health check with configuration info |
| `GET` | `/ping` | Simple ping response |
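
Deployment scripts sometimes need to block until the backend answers before running further checks. The hypothetical helper below (not part of this repository) relies only on `/ping` returning a 2xx status; the default URL matches the development port, and the retry count and delay are arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait_for_backend [BASE_URL] [ATTEMPTS] [DELAY_SEC]
# Poll BASE_URL/ping until it returns a 2xx response or attempts run out.
wait_for_backend() {
  local base=${1:-http://localhost:7394} attempts=${2:-30} delay=${3:-2} i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS --max-time 5 "$base/ping" >/dev/null 2>&1; then
      echo "backend is up at $base"
      return 0
    fi
    sleep "$delay"
  done
  echo "backend did not respond at $base/ping after $attempts attempts" >&2
  return 1
}
```

For example, after `sudo systemctl restart veo-video-generator`, run `wait_for_backend && curl -s http://localhost:7394/health`.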
## Video Generation Lifecycle
### Job Submission & Queuing
1. **User Input**: User provides prompt, optional images (first frame, last frame, reference images), and generation parameters (1-4 videos)
2. **Job Creation**: Backend creates unique job ID and validates parameters including Veo 3.1 feature constraints
3. **Queue Management**: Job added to global FIFO queue (unlimited per user)
4. **Queue Position**: Job displayed in "In Queue" section with position indicator
### Processing Pipeline
5. **Queue Processing**: A background thread picks the next job when a processing slot is available (max 2 concurrent)
6. **Status Transition**: Job moves from "In Queue" to "Currently Processing" section
7. **Image Processing** (if provided):
- First frame: Validated, converted to JPEG, uploaded to GCS (all models)
- Last frame: Processed for frame interpolation (Veo 3.1 Standard or Fast)
- Reference images: Up to 3 images processed for content guidance (Veo 3.1 Standard only)
8. **API Calls**: Multiple requests sent through the Google Gen AI SDK with the parameters appropriate to the selected model
9. **Backend Polling**: Long-running operations polled every 30 seconds with retry logic
10. **Progress Updates**: Frontend polls status every 2 seconds for real-time updates
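
The same polling pattern can be reproduced from a shell when testing the API directly. This sketch assumes the status response is JSON with a `"status"` field whose terminal values include `completed`, `failed`, and `cancelled`; the README names these states but does not document the response schema. The fetch command is passed in as an argument so the loop can be exercised without a running server, and `poll_until_done` and `fetch_status` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll_until_done FETCH_CMD JOB_ID [INTERVAL_SEC] [MAX_ATTEMPTS]
# Call FETCH_CMD JOB_ID until the reported status is terminal.
# Exit codes: 0 completed, 1 failed/cancelled, 2 fetch error, 3 gave up.
poll_until_done() {
  local fetch=$1 job=$2 interval=${3:-2} max=${4:-900} body i
  for ((i = 1; i <= max; i++)); do
    body=$("$fetch" "$job") || return 2
    # Assumed schema: {"status": "..."}.
    if grep -Eq '"status"[[:space:]]*:[[:space:]]*"completed"' <<<"$body"; then
      printf '%s\n' "$body"
      return 0
    elif grep -Eq '"status"[[:space:]]*:[[:space:]]*"(failed|cancelled)"' <<<"$body"; then
      printf '%s\n' "$body" >&2
      return 1
    fi
    sleep "$interval"
  done
  return 3
}

# Real fetcher against the backend (base URL is an assumption).
fetch_status() {
  curl -sS --fail "${API_BASE:-http://localhost:7394}/api/status/$1"
}
```

For example, `poll_until_done fetch_status <job_id>` polls every 2 seconds, matching the frontend's interval.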
### Completion & Cleanup
11. **Video Download**: Completed videos downloaded from GCS to local temp storage
12. **File Packaging**: Multiple videos and images packaged into downloadable zip
13. **User Download**: Videos served to user with multiple download options
14. **Auto-cleanup**: Job automatically deleted 5 seconds after successful download
### Job Management Actions
- **Cancel**: Remove from queue or stop active processing
- **Retry**: Re-queue failed/cancelled jobs with original parameters
- **Delete**: Complete removal of job data, local files, and GCS resources
- **Download Options**: Individual videos or complete zip package
## Security
- CORS configured for specific frontend domain(s)
- Azure AD SSO authentication in production (bypassed in dev mode)
- Automatic cleanup of temporary files after download
- Service account with minimal required GCS permissions
- Secure headers in Apache configuration
- Backend service runs as non-root user in production
## Monitoring and Logging
### Backend Logs
```bash
# View systemd service logs (production)
sudo journalctl -u veo-video-generator -f
# View Flask app logs (development)
# Logs printed to terminal running app.py
```
### Frontend Logs
- Browser console for React errors
- Network tab for API request/response debugging
- Apache access logs: `/var/log/apache2/access.log`
### Usage Tracking
- Webhook integration sends generation requests to configured endpoint
- Tracks: user email, prompt, model, timestamp
- Can be disabled via `WEBHOOK_ENABLED=false`
## Troubleshooting
### Common Issues
| Issue | Possible Cause | Solution |
|-------|----------------|----------|
| **Authentication fails** | Azure AD misconfiguration | Verify `VITE_MSAL_CLIENT_ID`, `VITE_MSAL_AUTHORITY`, and redirect URIs match Azure AD app |
| **Backend connection error** | Service not running or CORS issue | Check `systemctl status veo-video-generator` and `FRONTEND_URL` in backend `.env` |
| **Video generation fails** | Invalid credentials or API access | Verify service account permissions and Veo 3.1 APIs are enabled in GCP |
| **Image upload rejected** | Invalid format or size | Ensure image is <10MB and meets minimum 720x720 resolution |
| **Download hangs** | GCS permission issue | Check service account has `storage.objects.get` permission on bucket |
| **Model not found** | Wrong region or model ID | Verify Veo 3.1 is available in specified `REGION` |
| **Reference images fail** | Wrong model or constraints | Reference images require Veo 3.1 Standard model, 16:9 aspect ratio, and 8-second duration |
| **Last frame fails** | Wrong constraints | Last frame interpolation requires Veo 3.1 model (Standard or Fast) and 8-second duration |
| **SDK parameter error** | Outdated SDK version | Ensure `google-genai>=1.47.0` is installed for Veo 3.1 features |
### Veo 3.1 Feature Requirements
**Frame Interpolation (Last Frame):**
- ✅ Supported models: `veo-3.1-generate-preview`, `veo-3.1-fast-generate-preview`
- ✅ Required duration: 8 seconds
- ✅ Supported aspect ratios: 16:9, 9:16
**Reference Images:**
- ✅ Supported model: `veo-3.1-generate-preview` (Standard only, NOT Fast)
- ✅ Required duration: 8 seconds
- ✅ Required aspect ratio: 16:9 only
- ✅ Maximum images: 3 reference images
- ❌ Not supported in: Veo 3.1 Fast
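
The rules above, together with the allowed durations and aspect ratios from Customizable Parameters, can be checked client-side before a request is sent, for example in a batch script. The function below is a hypothetical pre-check that encodes only the rules listed in this README; the backend validates these constraints itself and remains the authority.

```bash
#!/usr/bin/env bash
# Hypothetical client-side pre-check; the backend performs its own validation.
# check_veo31_request MODEL DURATION_SEC ASPECT HAS_LAST_FRAME NUM_REFERENCE_IMAGES
# HAS_LAST_FRAME is "yes" or "no". Prints the first violated rule and returns 1.
check_veo31_request() {
  local model=$1 duration=$2 aspect=$3 last_frame=$4 refs=${5:-0}
  local std=veo-3.1-generate-preview fast=veo-3.1-fast-generate-preview

  # General parameters available to every request.
  [[ $model == "$std" || $model == "$fast" ]] || { echo "unknown model: $model"; return 1; }
  case $duration in 4|6|8) ;; *) echo "duration must be 4, 6, or 8 seconds"; return 1 ;; esac
  [[ $aspect == 16:9 || $aspect == 9:16 ]] || { echo "aspect ratio must be 16:9 or 9:16"; return 1; }

  # Frame interpolation (last frame): Standard or Fast, 8 seconds only.
  if [[ $last_frame == yes && $duration != 8 ]]; then
    echo "last frame requires an 8-second video"; return 1
  fi

  # Reference images: Standard only, 8 seconds, 16:9 only, at most 3.
  if (( refs > 0 )); then
    [[ $model == "$std" ]] || { echo "reference images require $std"; return 1; }
    [[ $duration == 8 ]]   || { echo "reference images require an 8-second video"; return 1; }
    [[ $aspect == 16:9 ]]  || { echo "reference images require 16:9"; return 1; }
    (( refs <= 3 ))        || { echo "at most 3 reference images are allowed"; return 1; }
  fi
  return 0
}
```

Whether a last frame and reference images may be combined in one request is not stated here, so the sketch does not check it.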
### Debug Mode
Enable detailed logging in development:
```bash
# Backend: set this in backend/.env
FLASK_DEBUG=True
# Frontend: inspect errors in the browser console (React DevTools helps)
```
## Development
### Local Development Setup
For local testing without authentication:
1. **Quick Start** (runs both backend and frontend):
```bash
./run-dev.sh
```
2. **Manual Start**:
**Backend** (Terminal 1):
```bash
cd backend
cp .env.development .env
python app.py
```
**Frontend** (Terminal 2):
```bash
cd frontend
npm run dev
```
### Development Features
- **Authentication Bypass**: MSAL/SSO automatically bypassed when `VITE_DEV_MODE=true`
- **CORS**: Configured for `localhost:3000` and `127.0.0.1:3000`
- **Hot Reload**: Vite dev server auto-reloads frontend on file changes
- **Debug Mode**: Flask runs with detailed error pages and auto-reload
- **Mock User**: Shows "Dev User" in the interface header
### Development URLs
- Backend API: `http://localhost:7394`
- Frontend: `http://localhost:3000`
- No authentication required in dev mode
## Additional Files
- **`user_docs.md`**: Comprehensive user documentation and feature guide
- **`CLAUDE.md`**: AI assistant guidance for working with this codebase
- **`extract_usage_logs.sh`**: Script for extracting usage data from webhook logs
- **`veo3.zip`**: Archive of production deployment artifacts
- **`.gitignore`**: Git exclusions (includes `.env`, `node_modules`, `temp_downloads`, etc.)
## Video Generation Architecture
### Job Queue System
- **Global Queue**: FIFO processing with unlimited submissions per user
- **Concurrent Processing**: Maximum 2 jobs processing simultaneously
- **Status Tracking**: In-memory job status dictionary (consider Redis for scaling)
- **User Limits**: No queue limits, but 1-4 videos per individual request
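
The scheduling policy (strict FIFO order, at most two jobs in flight) can be illustrated in a few lines of shell. This is an analogy for readers, not the backend's implementation, which uses a Python background thread and an in-memory dictionary; `run_fifo` is a hypothetical name, and `wait -n` requires Bash 4.3 or later (macOS ships Bash 3.2 as `/bin/bash`).

```bash
#!/usr/bin/env bash
# Hypothetical illustration: run_fifo MAX_CONCURRENT CMD [ARGS...]
# Read job IDs from stdin, one per line, and run "CMD ARGS... JOB_ID" for each
# in arrival order, never with more than MAX_CONCURRENT running at once.
run_fifo() {
  local max=$1 job running=0
  shift
  while IFS= read -r job; do
    if (( running >= max )); then
      wait -n || true            # block until any one running job finishes
      running=$((running - 1))
    fi
    "$@" "$job" </dev/null &     # start the next job in FIFO order
    running=$((running + 1))
  done
  wait                           # drain the remaining jobs
}
```

Because a new job starts only after `wait -n` reports a finished one, the number of running jobs never exceeds the limit, and jobs are started strictly in input order.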
### Queue Display Sections
1. **Currently Processing**: Jobs actively generating videos (highlighted in blue)
2. **In Queue**: Jobs waiting for processing slots (highlighted in orange)
3. **History**: Completed, failed, or cancelled jobs (standard styling)
### File Management
- **Local Storage**: `temp_downloads/job_{job_id}/` for each job
- **GCS Integration**: Temporary images uploaded to `temp_images/` bucket path
- **Auto-cleanup**: Jobs deleted 5 seconds after successful download
- **Manual Cleanup**: Complete job deletion via delete button
- **Download Formats**: Individual MP4s or complete ZIP packages
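
Auto-cleanup runs only after a successful download, so job directories from abandoned or interrupted jobs can accumulate in `temp_downloads/`, and the in-memory job table that tracks them is lost when the backend restarts. A periodic sweep, for example from cron, can remove them. The function below is a hypothetical maintenance helper (not part of this repository); it touches local files only, not GCS objects, and the threshold is your choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prune_stale_jobs DIR MAX_AGE_MIN
# Delete job_* directories directly under DIR whose modification time is older
# than MAX_AGE_MIN minutes, printing each path before removing it.
# A directory's mtime changes when entries are added or removed inside it.
prune_stale_jobs() {
  local dir=$1 max_age=$2
  [[ -d $dir ]] || { echo "not a directory: $dir" >&2; return 1; }
  find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'job_*' -mmin +"$max_age" \
    -print -exec rm -rf {} +
}
```

For example, `prune_stale_jobs backend/temp_downloads 1440` removes job directories untouched for more than 24 hours.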
### Job Actions by Status
- **Queued**: Cancel, Delete
- **Processing**: Cancel, Delete
- **Failed/Cancelled**: Retry, Delete
- **Completed**: Download All, Download Individual Videos (auto-deletes after download)
## Notes
- The original `veo.py` standalone script has been replaced by the full-stack application
- **Dual model support**: Veo 3.1 Standard and Veo 3.1 Fast
- **Veo 3.1 advanced features**: Frame interpolation and reference images with conditional UI
- Multi-video generation support (1-4 videos per request)
- Unlimited job submissions with intelligent queue management
- Complete job lifecycle management with cancel/retry/delete functionality
- Generated videos are automatically cleaned up after download
- Image uploads are automatically converted to JPEG format regardless of input format
- The application uses in-memory job status tracking (consider Redis for production scaling)
- SDK upgraded to `google-genai==1.47.0` for Veo 3.1 feature support