# Accessible Video Processing Platform - Technical Documentation v2.0

**Document Version:** 2.0
**Last Updated:** October 9, 2025
**System Version:** 1.0.0
**Author:** Technical Documentation Team

---

## Table of Contents

1. [Executive Summary](#1-executive-summary)
2. [Platform Capabilities](#2-platform-capabilities)
3. [User Roles & Permissions](#3-user-roles--permissions)
4. [System Architecture](#4-system-architecture)
5. [Process Flows & Workflows](#5-process-flows--workflows)
6. [Database Schema](#6-database-schema)
7. [API Overview](#7-api-overview)
8. [AI Processing Pipeline](#8-ai-processing-pipeline)
9. [Real-time Features](#9-real-time-features)
10. [Deployment Architecture](#10-deployment-architecture)
11. [Security Model](#11-security-model)
12. [Technical Stack](#12-technical-stack)

---

## 1. Executive Summary

The Accessible Video Processing Platform is an enterprise-grade SaaS solution that automatically generates closed captions and audio descriptions for video content. The platform combines Google's Gemini 2.5 Pro model with human quality control workflows to deliver WCAG 2.1 AA/AAA-compliant accessibility content at scale.

### Key Value Propositions

- **Automated AI Processing:** Converts uploaded videos to accessibility-compliant content in minutes
- **Multi-Language Support:** Translates captions and audio descriptions to 40+ languages
- **Professional Quality Control:** Built-in review workflows ensure accuracy and compliance
- **Real-Time Monitoring:** Live status updates and WebSocket-powered dashboards
- **Scalable Architecture:** Docker-based microservices handle concurrent processing
- **Enterprise Security:** Role-based access control, JWT authentication, audit logging

### Primary Use Cases

1. **Corporate Training Videos** - Make internal training accessible to all employees
2. **Marketing Content** - Expand reach with multilingual captions and audio descriptions
3. **Educational Content** - Comply with accessibility requirements for online courses
4. **Broadcast Media** - Prepare content for FCC compliance and international distribution
5. **Legal Compliance** - Meet ADA, Section 508, and WCAG requirements

---

## 2. Platform Capabilities

### 2.1 Core Features

#### Automated Accessibility Generation

**AI-Powered Caption Creation**
- Automatic speech-to-text transcription with 95%+ accuracy
- Speaker identification and dialogue attribution
- Punctuation and formatting (question marks, exclamations)
- Technical term recognition
- Proper noun capitalization
- WebVTT format with precise millisecond timing

**Audio Description Generation**
- Scene setting descriptions (location, time of day, environment)
- Character appearance and actions
- On-screen text narration (signs, graphics, titles)
- Visual storytelling elements (expressions, gestures)
- Timed to fit between dialogue gaps
- WebVTT format synchronized with video
- Optional MP3 audio track generation

**Quality Assurance**
- AI confidence scoring (0-100%)
- Automatic validation of VTT format
- Timing overlap detection
- Minimum content requirements
- Self-healing for malformed AI responses

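
The VTT format validation and timing overlap detection listed under Quality Assurance can be sketched as follows. This is a minimal illustration, not the platform's actual validator: the function names are hypothetical, and it assumes every cue uses the full `HH:MM:SS.mmm` timestamp form and that cues appear in playback order.

```python
import re
from typing import List, Tuple

# Matches "HH:MM:SS.mmm --> HH:MM:SS.mmm" cue timing lines (assumed form).
CUE_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert timestamp parts to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_cue_times(vtt: str) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for every cue timing line found."""
    cues = []
    for line in vtt.splitlines():
        match = CUE_RE.match(line.strip())
        if match:
            parts = match.groups()
            cues.append((_to_ms(*parts[:4]), _to_ms(*parts[4:])))
    return cues


def validate_vtt(vtt: str) -> List[str]:
    """Hypothetical validator: returns a list of human-readable problems."""
    problems = []
    if not vtt.lstrip().startswith("WEBVTT"):
        problems.append("missing WEBVTT header")
    cues = parse_cue_times(vtt)
    if not cues:
        problems.append("no cues found")
    for i, (start, end) in enumerate(cues):
        if end <= start:
            problems.append(f"cue {i + 1}: end is not after start")
        if i > 0 and start < cues[i - 1][1]:
            problems.append(f"cue {i + 1}: overlaps previous cue")
    return problems
```

An empty result means the file passed these checks; a non-empty list could feed the error highlighting in the editor or trigger the fallback path.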
#### Multi-Language Translation

**Supported Languages**
- Spanish, French, German, Italian, Portuguese
- Japanese, Korean, Chinese (Simplified/Traditional)
- Arabic, Hebrew, Russian
- 40+ total languages via Google Cloud Translate

**Translation Methods**
- **Standard Translation:** Direct language conversion preserving meaning
- **Transcreation:** Cultural adaptation maintaining brand voice and local idioms (via Gemini AI)
- **Timing Preservation:** All translations maintain original cue timestamps

**Audio Description Localization**
- Translated audio description scripts (VTT)
- Text-to-speech generation in target languages
- Language-specific voice selection
- Natural-sounding neural voices

#### Professional VTT Editor

**Editing Capabilities**
- Inline text editing with live preview
- Cue-by-cue navigation
- Timestamp display (HH:MM:SS.mmm format)
- Bulk timing adjustments (-30s to +30s)
- Real-time validation with error highlighting
- Cue duration calculations
- Total duration statistics

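
A bulk timing adjustment of the kind listed above can be sketched as a shift applied to every cue timing line. The function name is hypothetical, and clamping at `00:00:00.000` for negative offsets is an assumption; the source only states the -30s to +30s range.

```python
import re

# Timestamps in the HH:MM:SS.mmm form shown by the editor.
TS_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})")
MAX_SHIFT_MS = 30_000  # the documented -30s to +30s range


def _format_ms(total_ms: int) -> str:
    """Format milliseconds as HH:MM:SS.mmm, clamping at zero (assumption)."""
    total_ms = max(total_ms, 0)
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


def shift_vtt(vtt: str, offset_ms: int) -> str:
    """Shift every cue's start and end time by offset_ms."""
    if abs(offset_ms) > MAX_SHIFT_MS:
        raise ValueError("offset must be within +/-30 seconds")

    def _shift(match: "re.Match[str]") -> str:
        h, m, s, ms = (int(part) for part in match.groups())
        return _format_ms(((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)

    shifted = []
    for line in vtt.splitlines():
        # Only touch timing lines so cue text containing digits is left alone.
        shifted.append(TS_RE.sub(_shift, line) if "-->" in line else line)
    return "\n".join(shifted)
```

For example, `shift_vtt(vtt, 1500)` delays every cue by 1.5 seconds while leaving cue text untouched.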
**Editing Controls**
- Add/remove cues (planned feature)
- Split/merge cues (planned feature)
- Undo/redo (browser native)
- Keyboard shortcuts (Ctrl+S save, Ctrl+Enter confirm)
- Auto-save with change detection

#### Video Preview & Playback

**Integrated Video Player**
- HTML5 video player with standard controls
- Real-time caption overlay (synchronized)
- Audio description track player
- Multi-language caption selection
- Click-to-jump timeline navigation
- Cue highlighting (shows active caption)
- Caption on/off toggle

**Preview Modes**
- Side-by-side (video + editors)
- Video only (full preview)
- Editor only (text focus)

### 2.2 Quality Control Workflow

#### English Content Review (Primary QC)

**Reviewer Responsibilities**
- Verify caption accuracy against audio
- Check audio description completeness
- Edit VTT content as needed
- Adjust timing for synchronization issues
- Approve or reject with detailed notes

**Tools Provided**
- Dual VTT editors (captions + audio descriptions)
- Synchronized video preview
- Timing adjustment tool
- Validation feedback
- Keyboard shortcuts for efficiency

**Decision Outcomes**
- **Approve:** Triggers automatic translation/TTS pipeline
- **Reject:** Returns to AI processing with feedback for reprocessing

#### Multi-Language Final Review

**Reviewer Responsibilities**
- Validate translated caption accuracy
- Verify audio description translations
- Test TTS audio quality and pronunciation
- Check all assets present and downloadable
- Final approval for client delivery

**Tools Provided**
- Per-language asset viewers
- MP3 audio players for TTS validation
- Read-only VTT preview
- Asset completeness checklist
- Error reporting

**Decision Outcomes**
- **Approve for Delivery:** Job marked complete, client notified
- **Return for QC:** Send back for corrections with detailed notes

### 2.3 Job Management Features

#### Job Creation & Upload

**Upload Methods**
- Drag-and-drop file upload
- Click-to-browse file selection
- Real-time progress tracking (0-100%)
- Upload cancellation support

**Job Configuration**
- Custom job title
- Source language selection
- Output type selection (captions, AD script, AD audio)
- Target language selection (multiple)
- Transcreation preference (per language)

**Validation**
- File type restrictions (MP4 only)
- File size limits (up to 2GB)
- Required field validation
- Instant feedback on errors

#### Job Monitoring Dashboard

**Client View**
- Total jobs count
- Processing jobs count
- Jobs in review count
- Completed jobs count
- Recent activity feed (last 5 jobs)
- Real-time status updates

**Reviewer/Admin View**
- System-wide job statistics
- QC queue depth with pending counts
- Final review queue depth
- Processing activity across all clients
- Quick navigation to review queues

#### Job Lifecycle Tracking

**Status Indicators**
- Created (gray) - Job queued for processing
- Ingesting (blue) - Downloading and analyzing video
- AI Processing (blue) - Generating accessibility content
- Pending QC (yellow) - Awaiting human review
- Approved (green) - English content approved
- Translating (purple) - Multi-language processing
- TTS Generating (purple) - Audio synthesis in progress
- Pending Final Review (orange) - Awaiting final approval
- Completed (green) - Ready for client download
- Rejected (red) - Requires revision

**Real-Time Updates**
- WebSocket-powered status changes
- Toast notifications for major transitions
- Progress percentage (when available)
- Estimated time remaining (calculated)
- Error messages with context

#### Asset Download System

**Download Experience**
- Organized by language
- Source video included
- 24-hour signed URL generation
- Secure download links (no authentication needed after generation)
- Batch download capability (coming soon)

**Asset Organization**
- Source video (MP4)
- Per language:
  - Closed Captions (VTT file)
  - Audio Description Script (VTT file)
  - Audio Description Audio (MP3 file)

**File Naming Convention**
```
{JobTitle}_source.mp4
{JobTitle}_en_captions.vtt
{JobTitle}_en_ad.vtt
{JobTitle}_en_ad.mp3
{JobTitle}_es_captions.vtt
{JobTitle}_es_ad.vtt
{JobTitle}_es_ad.mp3
```

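
The naming convention above can be generated programmatically. The sketch below is illustrative only: `safe_title` and `asset_filenames` are hypothetical helpers, and the rule for turning a job title into a filename-safe string is an assumption, since the source does not specify how `{JobTitle}` is sanitized.

```python
import re
from typing import List


def safe_title(title: str) -> str:
    """Assumed sanitization: collapse unsafe characters into underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", title.strip())
    return cleaned.strip("_") or "job"


def asset_filenames(title: str, languages: List[str]) -> List[str]:
    """List download filenames following the documented pattern."""
    base = safe_title(title)
    names = [f"{base}_source.mp4"]
    for lang in languages:
        names.append(f"{base}_{lang}_captions.vtt")
        names.append(f"{base}_{lang}_ad.vtt")
        names.append(f"{base}_{lang}_ad.mp3")
    return names
```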
### 2.4 Administrative Capabilities

#### User Management

**User Operations**
- Create new users (client, reviewer, admin)
- Update user profiles (email, name, role)
- Deactivate/reactivate accounts
- Reset passwords (generates secure temporary password)
- View user activity history

**User Listing**
- Filter by role (client, reviewer, admin)
- Filter by active status
- Pagination (20 users per page)
- Sort by creation date, email, role
- Quick search by email or name

#### System Monitoring & Statistics

**Job Statistics**
- Total jobs processed
- Jobs by status breakdown (pie chart data)
- Average processing time
- Completion rate percentage
- Daily job creation trends
- Queue depth monitoring

**Health Monitoring**
- MongoDB connection status
- Redis connection status
- Google Cloud Storage accessibility
- Celery worker count and active tasks
- API response time metrics
- Error rate tracking

**Performance Metrics**
- Min/max/avg processing times by pipeline stage
- Time-range analysis (7, 30, 90 days)
- Jobs created vs completed rates
- Queue wait time statistics
- Worker utilization percentage

#### Audit Trail & Compliance

**Audit Log Features**
- Comprehensive logging of all user actions
- Security event tracking
- Filterable by:
  - Time range (date pickers)
  - Action type (login, create, update, delete, approve)
  - Severity (info, warning, critical)
  - User ID or email
  - Resource type (job, user, file)
  - Success/failure status
- Full-text search across descriptions
- Exportable audit reports (planned)

**Tracked Events**
- All authentication attempts (success/failure)
- Job creation, approval, rejection, completion, deletion
- User account changes (create, update, deactivate, role change)
- Password resets
- Bulk operations
- Security violations (rate limits, unauthorized access)
- Admin maintenance actions

**Retention Policy**
- Configurable retention (default: 365 days)
- Admin-triggered cleanup with confirmation
- Cleanup actions themselves audited (meta-auditing)

#### Maintenance Operations

**Job Reprocessing**
- Emergency function for stuck or failed jobs
- Resets job to "created" status
- Triggers full ingestion pipeline again
- Overwrites existing results
- Use cases:
  - AI generated incorrect content
  - Processing interrupted mid-pipeline
  - Updated AI prompts require regeneration

**Bulk Job Operations**
- Bulk delete with confirmation
- Counts affected assets (videos, captions, audio)
- Itemized deletion summary
- Error handling for partial failures
- Irreversible action warnings

---

## 3. User Roles & Permissions

### 3.1 Role Hierarchy

```mermaid
graph TD
    A[Admin] -->|Inherits| B[Reviewer]
    B -->|Inherits| C[Client]
    A -->|Exclusive| D[User Management]
    A -->|Exclusive| E[System Stats]
    A -->|Exclusive| F[Audit Logs]
    A -->|Exclusive| G[Bulk Operations]
    B -->|Exclusive| H[QC Review]
    B -->|Exclusive| I[Final Approval]
    B -->|Exclusive| J[VTT Editing]
    C -->|Exclusive| K[Job Creation]
    C -->|Exclusive| L[Own Jobs Only]
```

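
One way to read the hierarchy in the diagram is as an ordered set of roles, where each role holds every capability of the roles below it. The sketch below assumes strict inheritance; the action names and helper functions are hypothetical, and in the real backend this check would presumably live in a FastAPI dependency rather than in plain functions.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered so that a higher value inherits the lower ones."""
    CLIENT = 1
    REVIEWER = 2
    ADMIN = 3


# Minimum role per action, taken from the diagram (action names are assumed).
REQUIRED_ROLE = {
    "create_job": Role.CLIENT,
    "qc_review": Role.REVIEWER,
    "final_approval": Role.REVIEWER,
    "edit_vtt": Role.REVIEWER,
    "user_management": Role.ADMIN,
    "system_stats": Role.ADMIN,
    "audit_logs": Role.ADMIN,
    "bulk_operations": Role.ADMIN,
}


def is_allowed(role: Role, action: str) -> bool:
    """True if the role meets or exceeds the action's minimum role."""
    return role >= REQUIRED_ROLE[action]


def can_view_job(role: Role, user_id: str, job_client_id: str) -> bool:
    """Clients see only their own jobs; reviewers and admins see all."""
    if role >= Role.REVIEWER:
        return True
    return user_id == job_client_id
```

Using an ordered enum keeps the inheritance rule in one comparison instead of repeating role lists per endpoint.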
### 3.2 Permission Matrix

| Feature | Client | Reviewer | Admin |
|---------|--------|----------|-------|
| **Job Management** ||||
| Create jobs | | | |
| View own jobs | | | |
| View all jobs | | | |
| Delete own jobs | | | |
| Delete any job | | | |
| Bulk delete jobs | | | |
| Reprocess jobs | | | |
| **Quality Control** ||||
| Access QC queue | | | |
| Edit VTT content | | | |
| Approve English content | | | |
| Reject jobs | | | |
| Adjust VTT timing | | | |
| **Final Review** ||||
| Access final queue | | | |
| Validate assets | | | |
| Approve for delivery | | | |
| Return for QC | | | |
| **Downloads** ||||
| Download own jobs | | | |
| Download any job | | | |
| **Administration** ||||
| User management | | | |
| System statistics | | | |
| Audit log access | | | |
| Health monitoring | | | |

### 3.3 Access Control Implementation

**Authentication Layer**
- JWT token-based authentication
- HttpOnly cookies for refresh tokens
- Token expiration and automatic refresh
- Secure session management

**Authorization Layer**
- Route-level protection (React Router guards)
- API endpoint protection (FastAPI dependencies)
- Database query filtering (client_id restrictions)
- Resource-level access checks

**Security Boundaries**
- Clients cannot access other clients' jobs
- Reviewers have read-only access to job data (except VTT editing)
- Admins have full CRUD access to all resources
- Audit logging for all privileged operations

---

## 4. System Architecture

### 4.1 High-Level Architecture

```mermaid
graph TB
    subgraph "Client Browser"
        FE[React SPA<br/>TypeScript]
    end

    subgraph "Web Server - GCP VM"
        Apache[Apache 2.4<br/>Reverse Proxy]
        Static[Static Files<br/>/var/www/html]
    end

    subgraph "Docker Environment"
        API[FastAPI Backend<br/>Gunicorn + Uvicorn<br/>Port 8003]
        Worker[Celery Worker<br/>Background Processing]
        Redis[Redis 7<br/>Queue & Pub/Sub]
        Mongo[MongoDB 7<br/>Database]
    end

    subgraph "Google Cloud Platform"
        Gemini[Gemini 2.5 Pro<br/>AI Analysis]
        GCS[Cloud Storage<br/>Video & Assets]
        Translate[Cloud Translate<br/>Multi-language]
        TTS[Text-to-Speech<br/>Audio Generation]
    end

    subgraph "External Services"
        Email[SendGrid<br/>Notifications]
        ElevenLabs[ElevenLabs<br/>TTS Fallback]
    end

    FE -->|HTTPS| Apache
    Apache -->|Proxy /video-accessibility-back| API
    Apache -->|Serve /video-accessibility| Static
    Apache -->|WebSocket Upgrade| API

    API -->|Jobs, Users| Mongo
    API -->|Queue Tasks| Redis
    API -->|Upload/Download| GCS
    API -->|AI Requests| Gemini
    API -->|WebSocket Pub/Sub| Redis

    Worker -->|Read Tasks| Redis
    Worker -->|Store Results| Mongo
    Worker -->|Upload Assets| GCS
    Worker -->|AI Requests| Gemini
    Worker -->|Translate| Translate
    Worker -->|Synthesize| TTS
    Worker -->|Synthesize Fallback| ElevenLabs
    Worker -->|Send Emails| Email
    Worker -->|Broadcast Status| Redis

    Redis -->|Subscribe| API

    style FE fill:#e1f5ff
    style API fill:#fff4e6
    style Worker fill:#fff4e6
    style Mongo fill:#e8f5e9
    style Redis fill:#fce4ec
    style GCS fill:#f3e5f5
    style Gemini fill:#f3e5f5
```

### 4.2 Component Architecture

```mermaid
graph LR
    subgraph "Frontend - React SPA"
        Routes[Routes<br/>React Router]
        Components[Components<br/>VTT Editor, Video Player]
        State[State Management<br/>Zustand + React Query]
        WS[WebSocket Client<br/>Real-time Updates]
    end

    subgraph "Backend - FastAPI"
        AuthAPI[Auth API<br/>JWT Tokens]
        JobsAPI[Jobs API<br/>CRUD + Actions]
        AdminAPI[Admin API<br/>Users + Stats]
        WebSocketAPI[WebSocket API<br/>Status Broadcasts]
    end

    subgraph "Background Workers - Celery"
        IngestWorker[Ingest & AI Task<br/>Video → Accessibility]
        TranslateWorker[Translation & TTS Task<br/>Multi-language]
        NotifyWorker[Notification Task<br/>Client Emails]
    end

    subgraph "Core Services"
        GeminiService[Gemini Service<br/>AI Integration]
        GCSService[GCS Service<br/>File Storage]
        TTSService[TTS Service<br/>Audio Synthesis]
        TranslateService[Translation Service]
        WebSocketService[WebSocket Manager<br/>Connection Handling]
    end

    Routes --> Components
    Components --> State
    State -->|HTTP| JobsAPI
    State -->|HTTP| AuthAPI
    WS -->|WSS| WebSocketAPI

    JobsAPI -->|Dispatch| IngestWorker
    JobsAPI -->|Dispatch| TranslateWorker
    JobsAPI -->|Dispatch| NotifyWorker

    IngestWorker --> GeminiService
    IngestWorker --> GCSService
    TranslateWorker --> TranslateService
    TranslateWorker --> TTSService
    TranslateWorker --> GCSService
    NotifyWorker --> GCSService

    WebSocketAPI --> WebSocketService
    WebSocketService -->|Publish| WS

    style Routes fill:#e1f5ff
    style Components fill:#e1f5ff
    style State fill:#e1f5ff
    style WS fill:#e1f5ff
    style AuthAPI fill:#fff4e6
    style JobsAPI fill:#fff4e6
    style AdminAPI fill:#fff4e6
    style WebSocketAPI fill:#fff4e6
```

### 4.3 Request Flow Architecture

```mermaid
sequenceDiagram
    participant Browser
    participant Apache
    participant API
    participant Worker
    participant MongoDB
    participant Redis
    participant GCS
    participant Gemini

    Note over Browser,Gemini: Job Creation Flow

    Browser->>Apache: POST /video-accessibility-back/api/v1/jobs
    Apache->>API: Proxy to localhost:8003
    API->>MongoDB: Create job document (status=created)
    API->>GCS: Upload video file
    API->>Redis: Queue ingest_and_ai_task
    API->>Browser: Return job_id

    Worker->>Redis: Pick up task
    Worker->>GCS: Download video
    Worker->>Gemini: Upload & analyze video
    Gemini->>Worker: Return JSON (captions + AD)
    Worker->>GCS: Upload VTT files
    Worker->>MongoDB: Update job (status=pending_qc)
    Worker->>Redis: Publish status update
    Redis->>API: Broadcast to subscribers
    API->>Browser: WebSocket message
    Browser->>Browser: Show toast notification

    Note over Browser,Gemini: QC Approval Flow

    Browser->>Apache: POST /api/v1/jobs/{id}/actions/approve_english
    Apache->>API: Proxy request
    API->>MongoDB: Update status (approved_english)
    API->>Redis: Queue translate_and_synthesize_task
    API->>Browser: Return success

    Worker->>Redis: Pick up translation task
    Worker->>GCS: Download English VTT
    Worker->>Gemini: Transcreate (if needed)
    Worker->>GCS: Upload translated VTT
    Worker->>GCS: Upload TTS MP3
    Worker->>MongoDB: Update outputs
    Worker->>Redis: Publish status update
    Redis->>API: Broadcast completion
    API->>Browser: WebSocket message
```

---

## 5. Process Flows & Workflows

### 5.1 Complete Job Processing Flow

```mermaid
stateDiagram-v2
    [*] --> created: Client uploads video
    created --> ingesting: Worker picks up task
    ingesting --> ai_processing: Video downloaded & probed
    ai_processing --> pending_qc: AI generates VTT files

    pending_qc --> rejected: Reviewer rejects
    pending_qc --> approved_english: Reviewer approves

    rejected --> ingesting: System retries

    approved_english --> translating: Translation task queued
    translating --> tts_generating: Translations complete
    tts_generating --> pending_final_review: TTS MP3s generated

    pending_final_review --> qc_feedback: Reviewer returns for fixes
    pending_final_review --> completed: Reviewer approves delivery

    qc_feedback --> pending_qc: Routed back to QC queue

    completed --> [*]: Client downloads assets

    note right of created
        Duration: Instant
        Actor: Client
    end note

    note right of ingesting
        Duration: 10-30s
        Actor: System (Worker)
    end note

    note right of ai_processing
        Duration: 30-90s
        Actor: Gemini AI
    end note

    note right of pending_qc
        Duration: Variable
        Actor: Reviewer (Human)
    end note

    note right of translating
        Duration: 10-60s
        Actor: System (Worker)
    end note

    note right of tts_generating
        Duration: 30-120s
        Actor: System (Worker)
    end note

    note right of pending_final_review
        Duration: Variable
        Actor: Reviewer (Human)
    end note
```

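
The transitions in the state diagram can be encoded as a lookup table so that illegal status changes are rejected before they reach the database. This is a sketch rather than the platform's actual code: the status names come from the diagram, while `ALLOWED_TRANSITIONS`, `InvalidTransition`, and `advance` are hypothetical names.

```python
from typing import Dict, Set

# Each status maps to the statuses it may move to, per the state diagram.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "created": {"ingesting"},
    "ingesting": {"ai_processing"},
    "ai_processing": {"pending_qc"},
    "pending_qc": {"rejected", "approved_english"},
    "rejected": {"ingesting"},
    "approved_english": {"translating"},
    "translating": {"tts_generating"},
    "tts_generating": {"pending_final_review"},
    "pending_final_review": {"qc_feedback", "completed"},
    "qc_feedback": {"pending_qc"},
    "completed": set(),  # terminal state
}


class InvalidTransition(ValueError):
    """Raised when a status change is not an edge in the diagram."""


def advance(current: str, new: str) -> str:
    """Return the new status if the transition is allowed, else raise."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {new} is not allowed")
    return new
```

A worker or API handler would call `advance(job_status, "pending_qc")` before persisting the change, so a stray task cannot move a completed job backwards.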
### 5.2 AI Processing Pipeline Detail

```mermaid
flowchart TD
    Start([Video Upload]) --> ValidateFile{File Valid?}
    ValidateFile -->|No| Error1[Return Error]
    ValidateFile -->|Yes| CreateJob[Create Job Record<br/>status=created]
    CreateJob --> UploadGCS[Upload to GCS<br/>gs://bucket/job_id/source.mp4]
    UploadGCS --> QueueTask[Queue Celery Task<br/>ingest_and_ai_task]

    QueueTask --> WorkerPick[Worker Picks Up Task]
    WorkerPick --> DownloadVideo[Download Video<br/>to Temp File]
    DownloadVideo --> ProbeVideo[Probe Metadata<br/>FFmpeg: duration, codec]
    ProbeVideo --> UpdateStatus1[Update Status<br/>ingesting → ai_processing]

    UpdateStatus1 --> UploadGemini[Upload to Gemini<br/>Files API]
    UploadGemini --> WaitActive{File Active?}
    WaitActive -->|No, wait| WaitActive
    WaitActive -->|Yes| SendPrompt[Send AI Prompt<br/>Extract Accessibility]

    SendPrompt --> ReceiveJSON[Receive JSON Response]
    ReceiveJSON --> ParseJSON{Valid JSON?}
    ParseJSON -->|No| SelfHeal[Self-Healing<br/>Fix JSON or Re-prompt]
    SelfHeal --> ParseJSON
    ParseJSON -->|Yes| ValidateVTT{VTT Valid?}

    ValidateVTT -->|No| CreateFallback[Create Fallback<br/>Minimal VTT]
    ValidateVTT -->|Yes| ExtractData[Extract Data<br/>confidence, summary, VTT]
    CreateFallback --> ExtractData

    ExtractData --> UploadVTT[Upload VTT to GCS<br/>en/captions.vtt<br/>en/ad.vtt]
    UploadVTT --> UpdateJob[Update Job Document<br/>outputs, ai.confidence]
    UpdateJob --> UpdateStatus2[Update Status<br/>pending_qc]
    UpdateStatus2 --> BroadcastWS[Broadcast WebSocket<br/>Status Update]
    BroadcastWS --> CleanupGemini[Delete File from Gemini]
    CleanupGemini --> CleanupTemp[Delete Temp Video]
    CleanupTemp --> End([Task Complete])

    Error1 --> End

    style Start fill:#e1f5ff
    style End fill:#c8e6c9
    style Error1 fill:#ffcdd2
    style SendPrompt fill:#f3e5f5
    style UploadGemini fill:#f3e5f5
    style ReceiveJSON fill:#f3e5f5
    style UploadVTT fill:#fff9c4
    style BroadcastWS fill:#fff9c4
```

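
The "Self-Healing" and "Create Fallback" steps in the flowchart could look roughly like the sketch below. It is an assumption about how such recovery might work, not the platform's implementation: the function names are hypothetical, the recovery strategies (strip a Markdown fence, then cut to the outermost braces) are common tactics rather than documented behavior, and the fallback cue text is invented for illustration.

```python
import json
import re

# Built at runtime so this example does not contain a literal code fence.
FENCE = "`" * 3
FENCED_JSON_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_ai_response(raw: str) -> dict:
    """Try progressively looser ways of extracting a JSON object."""
    candidates = [raw]
    fenced = FENCED_JSON_RE.search(raw)
    if fenced:
        candidates.append(fenced.group(1))
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw[start:end + 1])
    for text in candidates:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            return data
    # At this point the caller would re-prompt the model.
    raise ValueError("no JSON object found in AI response")


def ensure_vtt(data: dict, key: str) -> str:
    """Return the VTT under `key`, or a minimal fallback if it looks invalid."""
    vtt = data.get(key) or ""
    if isinstance(vtt, str) and vtt.lstrip().startswith("WEBVTT") and "-->" in vtt:
        return vtt
    # Placeholder cue; the real fallback content is not specified in the source.
    return "WEBVTT\n\n00:00:00.000 --> 00:00:01.000\n[Content unavailable]\n"
```

Trying the cheapest repair first keeps the expensive path, a second model call, for responses that are genuinely unusable.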
### 5.3 Translation & TTS Pipeline

```mermaid
flowchart TD
    Start([English Approved]) --> QueueTranslate[Queue Translation Task]
    QueueTranslate --> WorkerPick[Worker Picks Up Task]
    WorkerPick --> UpdateStatus1[Update Status<br/>approved_english → translating]
    UpdateStatus1 --> DownloadEN[Download English VTT<br/>captions.vtt + ad.vtt]

    DownloadEN --> LoopLang{For Each<br/>Target Language}

    LoopLang -->|Language| CheckMethod{Transcreation<br/>or Translation?}

    CheckMethod -->|Transcreation| Gemini[Gemini AI<br/>Cultural Adaptation]
    CheckMethod -->|Translation| GoogleTranslate[Google Translate<br/>API Call]

    Gemini --> BuildVTT[Build Translated VTT<br/>Preserve Timing]
    GoogleTranslate --> BuildVTT

    BuildVTT --> UploadTransVTT[Upload to GCS<br/>lang/captions.vtt<br/>lang/ad.vtt]

    UploadTransVTT --> CheckMP3{MP3<br/>Requested?}
    CheckMP3 -->|No| NextLang
    CheckMP3 -->|Yes| UpdateStatus2[Update Status<br/>translating → tts_generating]

    UpdateStatus2 --> ParseAD[Parse AD VTT<br/>Extract Cues + Timing]
    ParseAD --> LoopCues{For Each Cue}

    LoopCues --> CalcSilence[Calculate Silence<br/>to Match VTT Time]
    CalcSilence --> SynthCue[Synthesize Cue<br/>Google TTS or ElevenLabs]
    SynthCue --> AppendAudio[Append Audio Segment]
    AppendAudio --> LoopCues

    LoopCues -->|All Done| StitchAudio[Stitch All Segments<br/>Export MP3]
    StitchAudio --> UploadMP3[Upload to GCS<br/>lang/ad.mp3]
    UploadMP3 --> NextLang[Next Language]

    NextLang --> LoopLang
    LoopLang -->|All Done| UpdateFinal[Update Status<br/>pending_final_review]
    UpdateFinal --> BroadcastWS[Broadcast WebSocket<br/>Translation Complete]
    BroadcastWS --> End([Task Complete])

    style Start fill:#c8e6c9
    style End fill:#c8e6c9
    style Gemini fill:#f3e5f5
    style GoogleTranslate fill:#f3e5f5
    style SynthCue fill:#fff9c4
    style BroadcastWS fill:#fff9c4
```

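
The "Calculate Silence to Match VTT Time" step amounts to simple arithmetic: before each cue, pad the track with enough silence that the speech begins at the cue's start time. The sketch below plans the segments without touching audio; `plan_audio_track` is a hypothetical name, and the handling of speech that overruns into the next cue (no silence is inserted, so later cues start late) is an assumption.

```python
from typing import List, Tuple


def plan_audio_track(cues: List[Tuple[int, int]]) -> List[Tuple[str, int]]:
    """Plan silence and speech segments for an audio description track.

    cues: (cue_start_ms, synthesized_speech_ms) pairs in playback order.
    Returns ("silence", ms) and ("speech", ms) segments to stitch in order.
    """
    plan: List[Tuple[str, int]] = []
    cursor_ms = 0  # current length of the stitched track
    for start_ms, speech_ms in cues:
        gap_ms = start_ms - cursor_ms
        if gap_ms > 0:
            plan.append(("silence", gap_ms))
            cursor_ms += gap_ms
        # If gap_ms <= 0 the previous speech overran; append with no padding.
        plan.append(("speech", speech_ms))
        cursor_ms += speech_ms
    return plan
```

For two cues starting at 1.0s and 5.0s with 1.5s and 2.0s of synthesized speech, the plan is 1.0s of silence, 1.5s of speech, 2.5s of silence, then 2.0s of speech.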
### 5.4 Quality Control Decision Flow

```mermaid
flowchart TD
    Start([Job in QC Queue]) --> ReviewerOpen[Reviewer Opens<br/>QC Detail Page]
    ReviewerOpen --> LoadAssets[Load English VTT<br/>from GCS]
    LoadAssets --> VideoReview[Watch Video<br/>with Captions]

    VideoReview --> CheckAccuracy{Captions<br/>Accurate?}
    CheckAccuracy -->|No| EditVTT[Edit VTT Content<br/>Fix Errors]
    EditVTT --> CheckTiming
    CheckAccuracy -->|Yes| CheckTiming{Timing<br/>Synchronized?}

    CheckTiming -->|No| AdjustTiming[Adjust Timing<br/>+/- Offset]
    AdjustTiming --> SaveChanges
    CheckTiming -->|Yes| SaveChanges[Save All Changes]

    SaveChanges --> ReviewAD{Audio Description<br/>Complete?}
    ReviewAD -->|No| EditAD[Edit AD VTT<br/>Add Missing Descriptions]
    EditAD --> FinalDecision
    ReviewAD -->|Yes| FinalDecision{Approve<br/>or Reject?}

    FinalDecision -->|Reject| AddNotes[Add Required Notes<br/>Explain Issues]
    AddNotes --> RejectJob[Submit Rejection]
    RejectJob --> UpdateRejected[Status → rejected]
    UpdateRejected --> NotifyClient[Notify Client<br/>Toast + Email]
    NotifyClient --> RetryAI[System Retriggers<br/>AI Processing]
    RetryAI --> End1([Back to Queue])

    FinalDecision -->|Approve| AddOptionalNotes[Add Optional Notes<br/>QC Comments]
    AddOptionalNotes --> ApproveJob[Submit Approval]
    ApproveJob --> UpdateApproved[Status → approved_english]
    UpdateApproved --> TriggerTranslation[Queue Translation Task<br/>Automatic]
    TriggerTranslation --> NotifyProgress[Broadcast WebSocket<br/>Status Update]
    NotifyProgress --> End2([Translation Begins])

    style Start fill:#fff9c4
    style End1 fill:#ffcdd2
    style End2 fill:#c8e6c9
    style EditVTT fill:#e1f5ff
    style AdjustTiming fill:#e1f5ff
    style ApproveJob fill:#c8e6c9
    style RejectJob fill:#ffcdd2
```

### 5.5 Asset Download Flow

```mermaid
sequenceDiagram
    participant Client
    participant Frontend
    participant API
    participant GCS

    Client->>Frontend: Navigate to /downloads/{job_id}
    Frontend->>API: GET /api/v1/jobs/{job_id}/downloads

    API->>API: Verify user has access
    API->>API: Check job status (must be completed)

    loop For each asset
        API->>GCS: Generate signed URL (24h expiry)
        GCS->>API: Return signed URL
    end

    API->>Frontend: Return download manifest<br/>{source_video: url, en: {captions: url, ad: url, mp3: url}, ...}

    Frontend->>Client: Display organized download page<br/>with signed URLs

    Note over Client,GCS: Client clicks download link

    Client->>GCS: Direct download via signed URL<br/>(bypasses API)
    GCS->>Client: Stream file (MP4/VTT/MP3)

    Note over Client: URL expires after 24 hours
```

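
The manifest-building loop in the diagram can be sketched as below. The signing step is injected as a callable so the example stays self-contained; in practice it would wrap the Cloud Storage client's signed-URL call. `build_manifest` and the `sign` parameter are hypothetical, and the job field names (`source.gcs_uri`, `outputs.<lang>.captions_vtt_gcs`, and so on) are assumed to match the job document described in the database schema section.

```python
from datetime import timedelta
from typing import Callable, Dict

SIGNED_URL_TTL = timedelta(hours=24)  # the documented 24-hour expiry

# Stored field name -> short key used in the manifest shown in the diagram.
ASSET_KEYS = {
    "captions_vtt_gcs": "captions",
    "ad_vtt_gcs": "ad",
    "ad_mp3_gcs": "mp3",
}


def build_manifest(job: dict, sign: Callable[[str, timedelta], str]) -> Dict:
    """Build {source_video: url, <lang>: {captions, ad, mp3}} for a job."""
    if job["status"] != "completed":
        raise PermissionError("downloads require a completed job")
    manifest: Dict = {
        "source_video": sign(job["source"]["gcs_uri"], SIGNED_URL_TTL)
    }
    for lang, assets in job["outputs"].items():
        manifest[lang] = {
            short: sign(assets[field], SIGNED_URL_TTL)
            for field, short in ASSET_KEYS.items()
            if assets.get(field)  # skip outputs that were not requested
        }
    return manifest
```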

---

## 6. Database Schema

### 6.1 Entity Relationship Diagram

```mermaid
erDiagram
    USER ||--o{ JOB : creates
    USER ||--o{ JOB : reviews
    USER ||--o{ AUDIT_LOG : generates
    JOB ||--o{ AUDIT_LOG : "logs actions on"

    USER {
        string _id PK
        string email UK
        string hashed_password
        string full_name
        enum role
        boolean is_active
        datetime created_at
        datetime updated_at
    }

    JOB {
        string _id PK
        string client_id FK
        string title
        enum status
        object source
        object requested_outputs
        object outputs
        object review
        object ai
        object error
        string task_id
        datetime created_at
        datetime updated_at
    }

    AUDIT_LOG {
        string _id PK
        enum action
        enum severity
        string description
        datetime timestamp
        string user_id FK
        string user_email
        string user_role
        string ip_address
        string user_agent
        string request_id
        string resource_type
        string resource_id
        string resource_name
        object details
        boolean success
        string error_message
        string environment
        string service_name
        string api_version
    }
```

### 6.2 Job Document Structure

**Primary Fields**
- `_id` (string) - Unique job identifier
- `client_id` (string) - Foreign key to users collection
- `title` (string) - Job name (user-provided)
- `status` (enum) - Current pipeline stage
- `task_id` (string) - Celery task ID for monitoring

**Source Object**
```json
{
  "filename": "source.mp4",
  "original_filename": "Corporate_Training_Q4.mp4",
  "gcs_uri": "gs://accessible-video/68e7.../source.mp4",
  "duration_s": 525.4,
  "language": "en"
}
```

**Requested Outputs Object**
```json
{
  "captions_vtt": true,
  "audio_description_vtt": true,
  "audio_description_mp3": true,
  "languages": ["en", "es", "fr"],
  "transcreation": ["es"]
}
```

**Outputs Object** (per language)
```json
{
  "en": {
    "captions_vtt_gcs": "gs://accessible-video/68e7.../en/captions.vtt",
    "ad_vtt_gcs": "gs://accessible-video/68e7.../en/ad.vtt",
    "ad_mp3_gcs": "gs://accessible-video/68e7.../en/ad.mp3"
  },
  "es": {
    "captions_vtt_gcs": "gs://accessible-video/68e7.../es/captions.vtt",
    "ad_vtt_gcs": "gs://accessible-video/68e7.../es/ad.vtt",
    "ad_mp3_gcs": "gs://accessible-video/68e7.../es/ad.mp3",
    "origin": "transcreate",
    "qa_notes": ""
  }
}
```

**Review Object**
```json
{
  "notes": "Fixed timing issues. Content accurate.",
  "reviewer_id": "reviewer-001",
  "history": [
    {
      "at": "2025-01-15T14:30:00Z",
      "status": "pending_qc",
      "by": "system",
      "notes": ""
    },
    {
      "at": "2025-01-15T14:45:22Z",
      "status": "approved_english",
      "by": "reviewer-001",
      "notes": "Fixed timing issues. Content accurate."
    }
  ]
}
```

**AI Object**
```json
{
  "confidence": 0.94,
  "ingestion_json": {
    "language": "en",
    "confidence": 0.94,
    "summary": "Corporate training video...",
    "transcript_plaintext": "Welcome to Q4...",
    "captions_vtt": "WEBVTT\n\n00:00:00.000 --> ...",
    "audio_description_vtt": "WEBVTT\n\n00:00:00.000 --> ..."
  }
}
```

### 6.3 Database Indexes

**Users Collection**
- `email` (unique) - Fast user lookup during authentication
- `role` - Filter users by role for admin pages

**Jobs Collection**
- `status` + `created_at` (compound) - QC/review queue queries
- `client_id` - Filter jobs by owner (client view)
- `created_at` (desc) - Recent jobs first

**Audit Logs Collection**
- `timestamp` (desc) - Chronological queries
- `action` + `timestamp` - Filter by action type
- `user_id` + `timestamp` - User activity history
- `severity` + `timestamp` - Security event queries
- `resource_type` + `resource_id` - Resource tracking
- Full-text index on `description`, `details`, `error_message` - Search capability

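
The indexes listed above could be created at startup with calls along these lines. This is a sketch under several assumptions: the collection names `users`, `jobs`, and `audit_logs` are guesses, sort directions are only stated in the source for the two `desc` indexes (the rest default to ascending here, with `timestamp` descending in compounds), and `db` is expected to behave like a PyMongo `Database`, whose collections expose `create_index(keys, **options)`.

```python
# PyMongo accepts these literal values for index direction and type,
# so the sketch does not need to import the driver.
ASC, DESC, TEXT = 1, -1, "text"


def ensure_indexes(db) -> None:
    """Create the documented indexes on a PyMongo-style database object."""
    users = db["users"]
    users.create_index([("email", ASC)], unique=True)
    users.create_index([("role", ASC)])

    jobs = db["jobs"]
    jobs.create_index([("status", ASC), ("created_at", ASC)])
    jobs.create_index([("client_id", ASC)])
    jobs.create_index([("created_at", DESC)])

    audit = db["audit_logs"]
    audit.create_index([("timestamp", DESC)])
    audit.create_index([("action", ASC), ("timestamp", DESC)])
    audit.create_index([("user_id", ASC), ("timestamp", DESC)])
    audit.create_index([("severity", ASC), ("timestamp", DESC)])
    audit.create_index([("resource_type", ASC), ("resource_id", ASC)])
    # Fields as listed in the source; `details` is an object in the schema,
    # and how it is text-indexed is not specified there.
    audit.create_index(
        [("description", TEXT), ("details", TEXT), ("error_message", TEXT)]
    )
```

Index creation in MongoDB is idempotent for identical definitions, so a function like this can run on every deployment.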
### 6.4 File Storage Structure (GCS)

```
gs://accessible-video/
|
||
|