veo3/user_docs.md
2025-10-11 00:07:33 +05:30

# Veo 3.0 Video Generator - User Documentation
## Overview
The Veo 3.0 Video Generator is a web application for creating AI-generated videos with Google's Veo 3.0 model. It pairs a React frontend built with Material-UI components with a Flask backend that calls the model through the Google GenAI SDK.
## Features
- **Text-to-Video Generation**: Create videos from text prompts using Google's Veo 3.0 model
- **Image-to-Video Generation**: Upload reference images to guide video generation
- **Multi-Video Generation**: Generate 1-4 videos per request with batch processing
- **Unlimited Job Queue**: Submit unlimited video generation jobs with FIFO processing
- **Advanced Job Management**: Cancel, retry, and delete jobs with complete file cleanup
- **Real-time Queue Visualization**: Three-section display (Processing, Queued, History)
- **Customizable Settings**: Control video length (4, 6, or 8 seconds), aspect ratio, person generation, seed values, and audio
- **Real-time Progress Tracking**: Monitor generation progress with live status updates and queue positions
- **Secure Authentication**: Microsoft Azure AD integration for user authentication
- **Intelligent Downloads**: Multiple download options with automatic cleanup
- **Comprehensive File Management**: Auto-cleanup after download, complete GCS resource management
## System Requirements
### Prerequisites
- Google Cloud Project with Veo 3.0 API access
- Google Cloud Storage bucket for temporary file storage
- Service account with appropriate permissions
- Python 3.8+ and Node.js 16+ for development
- Microsoft Azure AD app registration (for authentication)
### Supported Formats
- **Input Images**: JPEG, PNG, GIF, BMP, TIFF, WebP, ICO (auto-converted to JPEG)
- **Output Videos**: MP4 format (individual files or ZIP packages)
- **Image Size**: Minimum 720x720 pixels, maximum 10MB
- **Video Length**: 4, 6, or 8 seconds
- **Video Count**: 1-4 videos per request
- **Aspect Ratios**: 16:9 (landscape) or 9:16 (portrait)
- **Seeds**: Optional numeric seeds (0-4294967295) for reproducible results
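The limits above can be checked before a job is submitted. The following is a minimal sketch, not the application's actual validation code: the function names are hypothetical, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
from typing import List, Optional

ALLOWED_LENGTHS = {4, 6, 8}               # seconds
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
MAX_SEED = 4294967295                     # 2**32 - 1
MIN_IMAGE_SIDE = 720                      # pixels
MAX_IMAGE_BYTES = 10 * 1024 * 1024        # assumption: "10MB" read as 10 MiB


def validate_settings(video_length_sec: int, video_count: int,
                      aspect_ratio: str, seed: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the settings are valid."""
    problems = []
    if video_length_sec not in ALLOWED_LENGTHS:
        problems.append("video length must be 4, 6, or 8 seconds")
    if not 1 <= video_count <= 4:
        problems.append("video count must be between 1 and 4")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append("aspect ratio must be 16:9 or 9:16")
    if seed is not None and not 0 <= seed <= MAX_SEED:
        problems.append("seed must be between 0 and 4294967295")
    return problems


def validate_image(width: int, height: int, size_bytes: int) -> List[str]:
    """Check the documented image limits (minimum 720x720, maximum 10MB)."""
    problems = []
    if width < MIN_IMAGE_SIDE or height < MIN_IMAGE_SIDE:
        problems.append("image must be at least 720x720 pixels")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("image must not exceed 10MB")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a form show every problem at once.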
## Getting Started
### 1. Environment Setup
Create a `.env` file in the project root with the following configuration:
```env
# Google Cloud Configuration
PROJECT_ID=your-google-cloud-project-id
REGION=us-central1
MODEL_ID=veo-3.0-generate-preview
OUTPUT_GCS_BUCKET_NAME=your-storage-bucket-name
SERVICE_ACCOUNT_KEY_PATH=./service-account.json
# Flask Configuration
SECRET_KEY=your-secret-key-here
FLASK_ENV=development
FLASK_DEBUG=True
PORT=7394
# Frontend Configuration
FRONTEND_URL=http://localhost:3000
# Webhook Configuration (optional)
WEBHOOK_URL=your-webhook-url
WEBHOOK_ENABLED=false
WEBHOOK_TIMEOUT=10
```
### 2. Google Cloud Setup
1. **Enable APIs**:
   - Vertex AI API
   - Cloud Storage API
   - GenerativeAI API
2. **Create Service Account**:
```bash
gcloud iam service-accounts create veo-video-generator \
--display-name="Veo Video Generator"
```
3. **Grant Permissions**:
```bash
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:veo-video-generator@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:veo-video-generator@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/storage.admin"
```
4. **Download Service Account Key**:
```bash
gcloud iam service-accounts keys create service-account.json \
--iam-account=veo-video-generator@YOUR_PROJECT_ID.iam.gserviceaccount.com
```
### 3. Installation
#### Backend Setup
```bash
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
#### Frontend Setup
```bash
cd frontend
npm install
```
### 4. Running the Application
#### Development Mode
Use the provided development script:
```bash
./run-dev.sh
```
This starts both frontend (port 3000) and backend (port 7394) servers.
#### Manual Startup
**Backend**:
```bash
cd backend
source venv/bin/activate
python app.py
```
**Frontend**:
```bash
cd frontend
npm run dev
```
#### Production Mode
```bash
# Backend (set FLASK_ENV=production and FLASK_DEBUG=False in .env first)
cd backend
source venv/bin/activate
python app.py
# Frontend (build and serve)
cd ../frontend
npm run build
# Serve the dist/ folder with your web server
```
## Using the Application
### 1. Authentication
- In development mode, authentication is bypassed with a mock user
- In production, users authenticate via Microsoft Azure AD
- Authenticated users' email addresses are tracked for usage analytics
### 2. Creating Videos
#### Text-to-Video Generation
1. Enter your video prompt in the text area (minimum 10 characters)
2. Adjust settings:
   - **Video Length**: 4, 6, or 8 seconds (default: 8 seconds)
   - **Number of Videos**: 1-4 videos per request (default: 1)
   - **Model**: Veo 3.0 (high-quality) or Veo 3.0 Fast (optimized speed)
   - **Aspect Ratio**: 16:9 or 9:16 (default: 16:9)
   - **Person Generation**: Allow/don't allow people in videos
   - **Seed**: Optional numeric value for reproducible results
   - **Audio**: Toggle audio generation on/off
3. Click "Generate Video" or "Add to Queue"
4. Monitor progress in the queue visualization
5. Download when complete (auto-removes from queue)
#### Image-to-Video Generation
1. Upload a reference image (JPEG, PNG, etc.)
2. The system automatically:
   - Validates image format and size
   - Converts to JPEG format
   - Detects aspect ratio and adjusts settings
   - Crops if necessary to fit aspect ratio
3. Enter your video prompt
4. Adjust additional settings as needed
5. Generate and download
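The aspect-ratio detection and cropping in step 2 can be sketched with plain arithmetic. This is an illustration under assumptions, not the backend's actual image pipeline: the function names are hypothetical, a centered crop is assumed, and square images are assumed to fall back to the 16:9 default.

```python
from typing import Tuple


def pick_aspect_ratio(width: int, height: int) -> str:
    """Choose the supported ratio matching the image orientation.

    Assumption: square images fall back to the documented default, 16:9.
    """
    return "16:9" if width >= height else "9:16"


def center_crop_box(width: int, height: int,
                    aspect_ratio: str) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered crop."""
    ratio_w, ratio_h = (int(part) for part in aspect_ratio.split(":"))
    # Height the crop would need if it used the full image width.
    target_h = width * ratio_h // ratio_w
    if target_h <= height:
        crop_w, crop_h = width, target_h
    else:
        # Image is too short for that: use the full height instead.
        crop_w, crop_h = height * ratio_w // ratio_h, height
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)
```

The returned box uses the (left, top, right, bottom) convention that common imaging libraries accept for cropping.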
#### Queue Management
- **Unlimited Submissions**: No limit on number of jobs per user
- **FIFO Processing**: Jobs processed in order of submission
- **Concurrent Limit**: At most 2 jobs are processed simultaneously
- **Real-time Updates**: Queue status updates every 2 seconds
- **Job Actions**: Cancel, retry, or delete jobs based on status
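The queue behaviour described above (FIFO order, two processing slots, cancellation) can be modelled in a few lines. The class below is a toy in-memory sketch for illustration only; `JobQueue` and its methods are hypothetical names, and the real backend may track jobs quite differently.

```python
from collections import deque


class JobQueue:
    """Toy FIFO queue with a cap on concurrently processing jobs."""

    def __init__(self, concurrent_limit: int = 2):
        self.concurrent_limit = concurrent_limit
        self.waiting = deque()   # job IDs in submission order
        self.processing = []     # job IDs currently being processed

    def submit(self, job_id: str) -> None:
        self.waiting.append(job_id)
        self._fill_slots()

    def finish(self, job_id: str) -> None:
        self.processing.remove(job_id)
        self._fill_slots()

    def cancel(self, job_id: str) -> None:
        if job_id in self.waiting:
            self.waiting.remove(job_id)
        elif job_id in self.processing:
            self.processing.remove(job_id)
            self._fill_slots()

    def position(self, job_id: str) -> int:
        """1-based queue position, or 0 if the job is not waiting."""
        return self.waiting.index(job_id) + 1 if job_id in self.waiting else 0

    def _fill_slots(self) -> None:
        # Promote the oldest waiting jobs while a processing slot is free.
        while self.waiting and len(self.processing) < self.concurrent_limit:
            self.processing.append(self.waiting.popleft())
```

Submitting four jobs leaves the first two processing and the others at queue positions 1 and 2; finishing or cancelling a processing job promotes the next waiting job.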
### 3. Queue Visualization & Progress Monitoring
The application displays jobs in three distinct sections:
#### Currently Processing (Blue Highlight)
- Jobs actively generating videos (max 2 concurrent)
- Real-time progress percentage and status updates
- Available actions: Cancel, Delete
#### In Queue (Orange Highlight)
- Jobs waiting for processing slots
- Queue position indicator (e.g., "Queue Position: 3")
- Available actions: Cancel, Delete
#### History (Standard Styling)
- Completed, failed, or cancelled jobs
- Available actions based on status:
  - **Completed**: Download All, Download Individual Videos
  - **Failed/Cancelled**: Retry, Delete
#### Progress Stages
- **Queued**: Waiting in queue for processing slot
- **Starting** (0%): Initializing request
- **Uploading Image** (5%): Processing reference image
- **Generating** (10%): Submitting to Veo 3.0 API
- **Processing** (20-80%): Video generation in progress
- **Downloading** (90%): Retrieving completed videos
- **Completed** (100%): Ready for download
### 4. Error Handling
Common error scenarios and solutions:
- **Content Filtered**: Prompt violates safety policies - try rewording
- **Image Too Large**: Resize image to under 10MB
- **Invalid Format**: Use supported image formats only
- **Authentication Failed**: Check service account configuration
- **Quota Exceeded**: Check Google Cloud quotas and billing
## API Reference
### Endpoints
#### `POST /api/generate`
Start video generation process.
**Request Body**:
```json
{
"prompt": "A cat playing in a garden",
"video_length_sec": 8,
"aspect_ratio": "16:9",
"person_generation": "dont_allow",
"model_name": "veo-3.0-generate-preview",
"sampleCount": 2,
"seed": 12345,
"generate_audio": true,
"user_email": "user@example.com"
}
```
**With Image** (multipart/form-data):
- `image`: Image file
- Other parameters as form fields
**Response**:
```json
{
"job_id": "uuid-string",
"status": "started"
}
```
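A client call for this endpoint might look like the sketch below, which uses only the Python standard library. It assumes the backend runs on `http://localhost:7394` as in the `.env` example; the helper names are hypothetical and no authentication headers are shown.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:7394"  # assumption: local development backend


def build_generate_request(prompt: str, user_email: str,
                           **settings) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /api/generate."""
    payload = {"prompt": prompt, "user_email": user_email, **settings}
    return urllib.request.Request(
        f"{API_BASE_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_job(request: urllib.request.Request) -> str:
    """Send the request and return the job_id from the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["job_id"]
```

Extra settings are passed as keyword arguments using the field names from the request body, for example `build_generate_request("A cat playing in a garden", "user@example.com", video_length_sec=8, sampleCount=2)`.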
#### Job Management Endpoints
**`GET /api/user-jobs?user_email=user@example.com`**
Get all jobs for user.
**`GET /api/queue-status`**
Get overall queue status.
**`POST /api/cancel/{job_id}`**
Cancel queued or processing job.
**`POST /api/retry/{job_id}`**
Retry failed or cancelled job.
**`DELETE /api/delete/{job_id}`**
Completely delete job and all associated files.
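The job management routes differ only in HTTP method and path, so a small lookup table is enough to build requests for them. This is a sketch with hypothetical helper names, again assuming a local backend on port 7394.

```python
import urllib.request
from urllib.parse import urlencode

API_BASE_URL = "http://localhost:7394"  # assumption: local development backend

# HTTP method and path template for each documented job action.
JOB_ACTIONS = {
    "cancel": ("POST", "/api/cancel/{job_id}"),
    "retry": ("POST", "/api/retry/{job_id}"),
    "delete": ("DELETE", "/api/delete/{job_id}"),
}


def build_job_action_request(action: str, job_id: str) -> urllib.request.Request:
    """Build (but do not send) the request for a cancel, retry, or delete."""
    method, path = JOB_ACTIONS[action]
    return urllib.request.Request(
        API_BASE_URL + path.format(job_id=job_id), method=method
    )


def user_jobs_url(user_email: str) -> str:
    """URL for listing a user's jobs, with the email safely URL-encoded."""
    return f"{API_BASE_URL}/api/user-jobs?{urlencode({'user_email': user_email})}"
```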
#### `GET /api/status/{job_id}`
Check generation status.
**Response**:
```json
{
"status": "processing",
"progress": 45,
"message": "Video generation in progress...",
"video_count": 2,
"videos_requested": 2,
"video_path": null,
"individual_video_paths": [],
"is_zip": true,
"created_at": "2024-01-01T12:00:00Z",
"error": null
}
```
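A client can poll this endpoint until the job reaches a final state. The loop below is a minimal sketch: the terminal status strings (`completed`, `failed`, `cancelled`) are assumptions based on the History section, and `fetch_status` stands in for whatever function performs the HTTP call and returns the parsed JSON.

```python
import time
from typing import Callable

# Assumption: lowercase status strings matching the History section.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(fetch_status: Callable[[], dict],
                 interval_sec: float = 2.0,
                 max_polls: int = 900,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal status and return that response."""
    for _ in range(max_polls):
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        sleep(interval_sec)
    raise TimeoutError("job did not reach a terminal status in time")
```

The 2-second default mirrors the frontend polling interval; `sleep` is injectable so the loop can be exercised without real delays.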
#### Download Endpoints
**`GET /api/download/{job_id}`**
Download the completed content (a ZIP package for multiple videos, an MP4 file for a single video).
The job is automatically deleted 5 seconds after a successful download.
**`GET /api/download/{job_id}/video/{index}`**
Download individual video (index 1-4).
**Response**: MP4 video file, ZIP package, or error message.
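Choosing between the two download routes can be wrapped in a small helper. This is a sketch with hypothetical function names; it only builds paths and guesses the file extension from the number of videos, following the rules stated above.

```python
from typing import Optional


def download_path(job_id: str, index: Optional[int] = None) -> str:
    """Path for the full download, or for one video when index is given."""
    if index is None:
        return f"/api/download/{job_id}"
    if not 1 <= index <= 4:
        raise ValueError("video index must be between 1 and 4")
    return f"/api/download/{job_id}/video/{index}"


def expected_extension(video_count: int) -> str:
    """ZIP for multiple videos, MP4 for a single one."""
    return ".zip" if video_count > 1 else ".mp4"
```

Because the job is deleted shortly after a successful full download, individual videos are best fetched before calling the full download route.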
#### `DELETE /api/cleanup/{job_id}`
Manually clean up job files (local only).
**Response**:
```json
{
"message": "Files cleaned up successfully"
}
```
## Configuration Reference
### Backend Configuration (`config.py`)
| Variable | Description | Default |
|----------|-------------|---------|
| `PROJECT_ID` | Google Cloud project ID | - |
| `REGION` | Google Cloud region | `us-central1` |
| `MODEL_ID` | Veo model identifier | `veo-3.0-generate-preview` |
| `MODEL_FAST_ID` | Veo Fast model identifier | `veo-3.0-fast-generate-preview` |
| `OUTPUT_GCS_BUCKET_NAME` | GCS bucket for storage | - |
| `SERVICE_ACCOUNT_KEY_PATH` | Path to service account JSON | `../service-account.json` |
| `SECRET_KEY` | Flask secret key | - |
| `FLASK_ENV` | Environment mode | `production` |
| `PORT` | Backend server port | `7394` |
| `FRONTEND_URL` | Frontend URL for CORS | - |
| `MAX_IMAGE_SIZE` | Maximum image size (set in bytes) | 10 MB |
| `CONCURRENT_JOB_LIMIT` | Max concurrent processing jobs | `2` |
| `MAX_RETRIES` | Max retry attempts per job | `3` |
| `WEBHOOK_URL` | Usage tracking webhook URL | - |
| `WEBHOOK_ENABLED` | Enable usage tracking | `true` |
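The defaults in the table can be expressed as a loader that reads environment variables. This is a hedged sketch rather than the contents of the real `config.py`: whether every value (for example `CONCURRENT_JOB_LIMIT`) can actually be overridden through the environment is an assumption, and the function names are hypothetical.

```python
import os
from typing import List, Mapping


def _as_bool(value: str) -> bool:
    return value.strip().lower() in ("1", "true", "yes")


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read settings from the environment, applying the documented defaults."""
    return {
        "PROJECT_ID": env.get("PROJECT_ID"),
        "REGION": env.get("REGION", "us-central1"),
        "MODEL_ID": env.get("MODEL_ID", "veo-3.0-generate-preview"),
        "MODEL_FAST_ID": env.get("MODEL_FAST_ID", "veo-3.0-fast-generate-preview"),
        "OUTPUT_GCS_BUCKET_NAME": env.get("OUTPUT_GCS_BUCKET_NAME"),
        "SERVICE_ACCOUNT_KEY_PATH": env.get(
            "SERVICE_ACCOUNT_KEY_PATH", "../service-account.json"
        ),
        "FLASK_ENV": env.get("FLASK_ENV", "production"),
        "PORT": int(env.get("PORT", "7394")),
        "CONCURRENT_JOB_LIMIT": int(env.get("CONCURRENT_JOB_LIMIT", "2")),
        "MAX_RETRIES": int(env.get("MAX_RETRIES", "3")),
        "WEBHOOK_ENABLED": _as_bool(env.get("WEBHOOK_ENABLED", "true")),
    }


def missing_required(config: dict) -> List[str]:
    """Names of settings that have no default and were not provided."""
    required = ("PROJECT_ID", "OUTPUT_GCS_BUCKET_NAME")
    return [name for name in required if not config.get(name)]
```

Checking `missing_required(load_config())` at startup surfaces a missing project ID or bucket name before the first API call fails.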
### Frontend Configuration
Set environment variables in `.env.local`:
```env
VITE_API_BASE_URL=http://localhost:7394
VITE_DEV_MODE=true
VITE_APP_TITLE=Veo Video Generator (Dev)
VITE_MSAL_CLIENT_ID=your-azure-client-id
VITE_MSAL_AUTHORITY=https://login.microsoftonline.com/your-tenant-id
VITE_MSAL_REDIRECT_URI=http://localhost:3000
```
### Queue System Configuration
- **Unlimited Queue**: No per-user job limits
- **Video Limits**: 1-4 videos per individual request
- **Processing Slots**: Maximum 2 concurrent jobs
- **Polling Intervals**: Frontend polls every 2 seconds, backend polls Google API every 30 seconds
- **Auto-cleanup**: Jobs deleted 5 seconds after successful download
- **Retry Logic**: Up to 3 retry attempts with exponential backoff
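The retry rule (up to 3 attempts with exponential backoff) can be illustrated as follows. The 1-second base delay and the doubling factor are assumptions, since the actual values are not documented here, and the function names are hypothetical.

```python
import time
from typing import Callable, List


def backoff_delays(max_retries: int = 3, base_delay_sec: float = 1.0,
                   factor: float = 2.0) -> List[float]:
    """Delay before each retry: base, base*factor, base*factor**2, ..."""
    return [base_delay_sec * factor ** attempt for attempt in range(max_retries)]


def run_with_retries(task: Callable[[], object], max_retries: int = 3,
                     sleep: Callable[[float], None] = time.sleep):
    """Run task once, then retry up to max_retries times on failure."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])
```

With the assumed defaults the waits are 1, 2, and 4 seconds, so a job that keeps failing gives up after roughly 7 seconds of waiting plus the time of four attempts.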
## Troubleshooting
### Common Issues
1. **"Service account not found"**
   - Verify `service-account.json` exists and is valid
   - Check file permissions
   - Ensure service account has required roles
2. **"Quota exceeded"**
   - Check Google Cloud quotas for Vertex AI
   - Verify billing is enabled
   - Request quota increases if needed
3. **"Content filtered"**
   - Modify prompt to avoid restricted content
   - Review Google's content policies
   - Try different wording or concepts
4. **"Image validation failed"**
   - Check image format (use common formats)
   - Ensure image is at least 720x720 pixels
   - Verify file size is under 10MB
5. **CORS errors**
   - Check `FRONTEND_URL` configuration
   - Verify both frontend and backend URLs
   - Ensure proper development/production settings
### Debug Mode
Enable debug logging by setting:
```env
FLASK_DEBUG=True
```
Debug information includes:
- Request/response details
- Image processing steps
- API call parameters
- File paths and operations
### Log Files
Check application logs for detailed error information:
- Backend logs: Console output from Flask application
- Frontend logs: Browser developer console
- Google Cloud logs: Cloud Logging for API errors
## Security Considerations
- Keep service account keys secure and never commit them to version control
- Use environment variables for sensitive configuration
- Implement proper authentication in production
- Rotate service account keys regularly
- Monitor usage and costs through Google Cloud Console
- Webhook URLs should use HTTPS in production
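The HTTPS recommendation for webhooks can be enforced with a startup check. This is a small sketch with a hypothetical function name; it only inspects the URL and does not contact the webhook.

```python
from typing import List
from urllib.parse import urlparse


def webhook_url_problems(url: str, production: bool) -> List[str]:
    """Return a list of problems with a configured webhook URL."""
    parsed = urlparse(url)
    problems = []
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("webhook URL must be an absolute http(s) URL")
    elif production and parsed.scheme != "https":
        problems.append("webhook URL should use HTTPS in production")
    return problems
```

A placeholder such as `your-webhook-url` from the `.env` example fails the first check, which helps catch configurations that were never filled in.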
## Performance Tips
- Use appropriate video lengths (shorter = faster generation)
- Optimize image sizes before upload
- Monitor Google Cloud quotas and billing
- Implement caching for repeated requests
- Use CDN for frontend assets in production
- Clean up temporary files regularly
## Support and Resources
- [Google Vertex AI Documentation](https://cloud.google.com/vertex-ai/docs)
- [Veo 3.0 Model Documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/veo)
- [Google GenAI SDK Reference](https://googleapis.dev/python/google-genai/latest/)
- [Flask Documentation](https://flask.palletsprojects.com/)
- [React Documentation](https://reactjs.org/docs/)
- [Material-UI Documentation](https://mui.com/)
For technical issues, check the debug logs and refer to Google Cloud support resources.