# Video Accessibility Processing Platform - Software Specification

## 1. Executive Summary

The Video Accessibility Processing Platform is a web application that automatically generates closed captions and audio descriptions for video content using artificial intelligence. The platform provides a complete workflow from video upload through AI processing, human quality control, multi-language translation, and final content delivery.

**Core Capabilities:**

- Automated generation of closed captions and audio descriptions using Google Gemini 2.5 Pro
- Multi-language translation and transcreation services
- Professional quality control workflow for reviewers
- Text-to-speech generation for audio descriptions
- Role-based access control for clients, reviewers, and administrators
- Real-time job status updates via WebSocket connections
- Secure file storage and signed URL download system

**Target Users:**

- **Clients**: Organizations needing video accessibility services
- **Reviewers**: Professional accessibility specialists who review and approve content
- **Administrators**: System administrators managing users and system operations

## 2. System Architecture

### 2.1 Technology Stack

**Frontend:**

- React 18 with TypeScript
- Vite for build tooling
- TanStack Query for server-state management and data caching
- React Router for navigation
- Tailwind CSS for styling

**Backend:**

- FastAPI (Python 3.11+) for REST API
- Celery with Redis for background task processing
- MongoDB Atlas for data storage
- JWT authentication with HttpOnly refresh cookies

**External Services:**

- Google Cloud Storage for file storage
- Google Gemini 2.5 Pro for AI processing
- Google Cloud Translate for language translation
- ElevenLabs for text-to-speech synthesis

**Infrastructure:**

- Docker containerization
- Redis for caching and task queues
- WebSocket support for real-time updates

### 2.2 System Components

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   React SPA     │    │    FastAPI      │    │     Celery      │
│   Frontend      │◄──►│    Backend      │◄──►│    Workers      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │    MongoDB      │    │     Redis       │
                       │    Database     │    │   Queue/Cache   │
                       └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │  Google Cloud   │
                       │    Storage      │
                       └─────────────────┘
```

## 3. User Roles and Access Control

### 3.1 Role Definitions

**Client Role:**

- Upload videos and create processing jobs
- View own job status and progress
- Download completed accessibility assets
- Limited to own content only

**Reviewer Role:**

- Access quality control dashboard
- Review AI-generated content for accuracy
- Edit VTT files (captions and audio descriptions)
- Approve or reject English content
- Perform final review of completed jobs
- Access to all jobs in system

**Admin Role:**

- Full system access including all reviewer capabilities
- User management (create, edit, deactivate users)
- System monitoring and health checks
- Bulk operations and maintenance tasks
- Access to audit logs and system statistics

### 3.2 Authentication System

**JWT Token Management:**

- Access tokens stored in memory (15-minute expiry)
- Refresh tokens stored in HttpOnly cookies (7-day expiry)
- Automatic token refresh for active sessions
- Secure logout with cookie clearing

**Security Features:**

- Password hashing using bcrypt
- CORS protection with configurable origins
- Rate limiting on authentication endpoints
- Session-based security with proper token rotation

## 4. Job Processing Workflow

### 4.1 Job Status State Machine

The system implements a state machine for tracking job progress:

```
created → ingesting → ai_processing → pending_qc → approved_english → translating → tts_generating → pending_final_review → completed
                                          ↓
                                      rejected → (manual intervention required)
                                          ↓
                                      qc_feedback → (back to pending_qc after fixes)
```

**Status Definitions:**

- **created**: Job record created, video uploaded to storage
- **ingesting**: Video being processed for metadata extraction
- **ai_processing**: AI analyzing video content and generating captions/audio descriptions
- **pending_qc**: Awaiting human quality control review
- **approved_english**: English content approved, ready for translation
- **rejected**: Content rejected, requires client revision
- **qc_feedback**: Reviewer provided feedback, awaiting fixes
- **translating**: Processing multi-language translations
- **tts_generating**: Generating audio files from text descriptions
- **pending_final_review**: All content ready, awaiting final approval
- **completed**: Job finished, all assets available for download

### 4.2 Processing Pipeline

**Phase 1: Upload and Ingestion**

1. Client uploads MP4 video file through web interface
2. File stored in Google Cloud Storage with unique job ID path
3. Job record created in MongoDB with metadata
4. Background Celery task queued for processing

**Phase 2: AI Content Generation**

1. Video file sent to Google Gemini 2.5 Pro API
2. AI generates:
   - Plain text transcript
   - Closed captions in WebVTT format
   - Audio description script in WebVTT format
   - Confidence score for generated content
3. Generated content stored in GCS and linked to job
4. Job status updated to `pending_qc`

**Phase 3: Quality Control Review**

1. Reviewer accesses job through QC dashboard
2. Side-by-side video player with generated captions/audio descriptions
3. Inline VTT editor for making corrections
4. Timing adjustment tools for synchronization
5. Approve or reject with reviewer notes
6. If approved, job moves to translation phase

**Phase 4: Translation and Localization**

1. Automatic translation of approved English content
2. Support for standard translation and cultural transcreation
3. Available target languages: Spanish, French, German (expandable)
4. Translated VTT files stored per language

**Phase 5: Audio Generation**

1. Text-to-speech synthesis using ElevenLabs API
2. MP3 files generated for each audio description track
3. Language-specific voice selection
4. Audio files stored alongside VTT content

**Phase 6: Final Review and Delivery**

1. Final review by authorized reviewer
2. Asset validation to ensure all requested outputs present
3. Client notification of job completion
4. Signed URL generation for secure downloads

## 5. User Interface and Experience

### 5.1 Client Workflow

**Dashboard:**

- Overview of all jobs with status indicators
- Quick actions for creating new jobs
- Real-time status updates via WebSocket
- Notification system for job completion

**Job Creation Process:**

1. **Video Upload**: Drag-and-drop interface with progress tracking
2. **Job Configuration**:
   - Descriptive title
   - Source language selection
   - Output format selection (captions VTT, audio description VTT, audio MP3)
   - Target languages for translation
3. **Processing Initiation**: Automatic background processing begins
4. **Confirmation**: Success page with job tracking link

**Job Monitoring:**

- Detailed status view with progress indicators
- Processing history timeline
- Real-time updates without page refresh
- Error notifications with context

**Content Download:**

- Secure download links for completed assets
- Organized by language (en/, es/, fr/, de/)
- File format options (VTT, MP3)
- Source video access

### 5.2 Reviewer Workflow

**Quality Control Dashboard:**

- Queue view of jobs pending review
- Priority sorting by creation date
- Job metadata preview
- Quick status filtering

**Review Interface:**

- **Video Player**: HTML5 player with custom controls
- **VTT Editor**: Syntax-highlighted editor with validation
- **Side-by-Side View**: Simultaneous video and text editing
- **Timing Tools**: Bulk timing adjustment with offset controls
- **Review Controls**: Approve/reject with mandatory notes

**Advanced Features:**

- Keyboard shortcuts for efficient workflow (A=Approve, R=Reject, S=Save)
- View mode switching (side-by-side, video-only, editor-only)
- Real-time VTT validation and error highlighting
- Unsaved changes warnings

**Final Review Process:**

- Asset validation before completion
- Final quality checks
- Client notification triggering
- Completion workflow

### 5.3 Administrator Interface

**User Management:**

- Create users with role assignment
- Password reset functionality
- User activation/deactivation
- Role-based permission enforcement

**System Monitoring:**

- Health check dashboard with component status
- Job processing statistics and metrics
- Queue monitoring for background tasks
- Performance analytics

**Audit and Security:**

- Comprehensive audit logging
- Security event monitoring
- User activity tracking
- System maintenance tools

## 6. Data Models and Storage

### 6.1 Job Data Structure

```typescript
interface Job {
  id: string;                         // Unique job identifier
  client_id: string;                  // Owner client ID
  title: string;                      // Human-readable job name
  status: JobStatus;                  // Current processing status
  source: {
    filename: string;                 // Storage path
    original_filename: string;        // User's original filename
    gcs_uri: string;                  // Google Cloud Storage URI
    duration_s: number;               // Video duration in seconds
    language: string;                 // Source language code
  };
  requested_outputs: {
    captions_vtt: boolean;            // Closed captions requested
    audio_description_vtt: boolean;   // Audio description script requested
    audio_description_mp3: boolean;   // Audio voiceover requested
    languages: string[];              // Target languages
    transcreation: string[];          // Languages requiring cultural adaptation
  };
  outputs: {
    [language: string]: {
      captions_vtt_gcs?: string;      // VTT file location
      ad_vtt_gcs?: string;            // Audio description VTT location
      ad_mp3_gcs?: string;            // Audio MP3 file location
      origin: "translate" | "transcreate"; // Processing method
      qa_notes?: string;              // Quality assurance notes
    };
  };
  ai: {
    ingestion_json: object;           // Full AI response data
    confidence: number;               // AI confidence score (0-1)
  };
  review: {
    notes: string;                    // Current reviewer notes
    reviewer_id?: string;             // Last reviewer ID
    history: ReviewHistoryItem[];     // Complete review history
  };
  created_at: Date;
  updated_at: Date;
  error?: ErrorInfo;                  // Processing error details
}
```

### 6.2 User Data Structure

```typescript
interface User {
  id: string;
  email: string;                      // Unique login identifier
  hashed_password: string;            // Bcrypt hashed password
  full_name: string;                  // Display name
  role: "client" | "reviewer" | "admin";
  is_active: boolean;                 // Account status
  created_at: Date;
  updated_at: Date;
}
```

### 6.3 File Storage Organization

**Google Cloud Storage Bucket Structure:**

```
gs://accessible-video/
├── {jobId}/
│   ├── source.mp4              # Original video
│   ├── en/
│   │   ├── captions.vtt        # English captions
│   │   ├── ad.vtt              # English audio description
│   │   └── ad.mp3              # English audio file
│   ├── es/
│   │   ├── captions.vtt        # Spanish captions
│   │   ├── ad.vtt              # Spanish audio description
│   │   └── ad.mp3              # Spanish audio file
│   └── [other languages]/
└── health_check_dummy          # System health verification
```

**Security Features:**

- Signed URLs with 24-hour expiration
- Role-based access control
- Automatic cleanup on job deletion
- Secure upload with content-type validation

## 7. API Design

### 7.1 Authentication Endpoints

```
POST /api/v1/auth/login
POST /api/v1/auth/refresh
POST /api/v1/auth/logout
```

### 7.2 Job Management Endpoints

```
POST   /api/v1/jobs                         # Create new job
GET    /api/v1/jobs                         # List jobs (filtered by role)
GET    /api/v1/jobs/{id}                    # Get job details
DELETE /api/v1/jobs/{id}                    # Delete job
DELETE /api/v1/jobs/bulk                    # Bulk delete (admin only)

# Job Actions
POST   /api/v1/jobs/{id}/actions/approve_english
POST   /api/v1/jobs/{id}/actions/reject
POST   /api/v1/jobs/{id}/actions/complete
POST   /api/v1/jobs/{id}/actions/reject_final

# Content Management
GET    /api/v1/jobs/{id}/vtt                # Get VTT content
PATCH  /api/v1/jobs/{id}/vtt                # Update VTT content
POST   /api/v1/jobs/{id}/vtt/adjust-timing  # Adjust timing
GET    /api/v1/jobs/{id}/downloads          # Get download URLs
GET    /api/v1/jobs/{id}/validate           # Validate assets
```

### 7.3 Administrative Endpoints

```
# User Management
GET    /api/v1/admin/users
POST   /api/v1/admin/users
GET    /api/v1/admin/users/{id}
PATCH  /api/v1/admin/users/{id}
DELETE /api/v1/admin/users/{id}

# System Monitoring
GET    /api/v1/admin/stats
GET    /api/v1/admin/health/detailed
GET    /api/v1/admin/jobs/stats
GET    /api/v1/admin/audit-logs
```

### 7.4 File Management

```
GET  /api/v1/files/signed-url/{path}   # Generate signed download URL
POST /api/v1/files/upload              # Direct file upload endpoint
```

### 7.5 Real-time Updates

**WebSocket Endpoints:**

- `/ws/jobs` - General job status updates
- `/ws/jobs/{job_id}` - Job-specific status updates

**WebSocket Message Format:**

```json
{
  "job_id": "string",
  "status": "string",
  "updated_at": "ISO8601",
  "job_title": "string",
  "message": "string",
  "progress": "number"
}
```

## 8. AI Services Integration

### 8.1 Google Gemini 2.5 Pro Integration

**Content Generation Capabilities:**

- Video content analysis and understanding
- Automatic transcript generation
- Closed caption creation with proper timing
- Audio description generation for visual elements
- Content confidence scoring

**Processing Flow:**

1. Video upload to Gemini Files API
2. Content generation using multimodal prompt
3. Structured JSON response parsing
4. Error handling and self-healing for invalid responses
5. Automatic file cleanup after processing

**Quality Assurance:**

- VTT format validation
- Timestamp accuracy verification
- Content completeness checks
- Fallback content generation for missing elements

### 8.2 Translation Services

**Google Cloud Translate:**

- High-quality machine translation for standard content
- Support for multiple target languages
- VTT format preservation during translation
- Batch processing for efficiency

**Transcreation via Gemini:**

- Cultural adaptation for marketing content
- Context-aware translation with brand guidelines
- Maintained timing synchronization
- Creative adaptation while preserving meaning

### 8.3 Text-to-Speech Integration

**ElevenLabs TTS Service:**

- High-quality voice synthesis
- Language-specific voice selection
- MP3 output format
- Proper pronunciation for accessibility terms

**Audio Processing:**

- Per-cue synthesis for precise timing
- Audio quality optimization
- File format standardization
- Integration with VTT timing

## 9. Quality Control Features

### 9.1 Review Workflow

**Content Review Process:**

1. **Initial Review**: AI-generated content assessment
2. **Content Editing**: Direct VTT file modification
3. **Synchronization Check**: Video timing validation
4. **Quality Verification**: Accessibility standards compliance
5. **Final Approval**: Content ready for translation

**Review Tools:**

- Integrated video player with caption overlay
- Syntax-highlighted VTT editor
- Real-time content validation
- Timing adjustment utilities
- Review history tracking

### 9.2 Quality Metrics

**AI Confidence Scoring:**

- Content generation confidence (0-1 score, equivalent to 0-100%)
- Quality indicators for reviewer guidance
- Threshold-based workflow routing

**Review Analytics:**

- Processing time tracking
- Reviewer performance metrics
- Quality score trending
- Error rate monitoring

## 10. Security and Compliance

### 10.1 Data Security

**Authentication Security:**

- JWT token-based authentication
- HttpOnly cookie refresh tokens
- Automatic token rotation
- Secure password hashing (bcrypt)

**File Security:**

- Signed URL access control
- Time-limited download permissions
- Secure file upload validation
- Automatic cleanup procedures

**API Security:**

- CORS protection
- Rate limiting
- Input validation and sanitization
- NoSQL injection prevention

### 10.2 Privacy Protection

**Data Handling:**

- Client data isolation
- Role-based access enforcement
- Audit trail maintenance
- Secure data deletion

**Content Protection:**

- Temporary file processing
- Secure cloud storage
- Access logging
- Data retention policies

### 10.3 Audit and Compliance

**Audit Logging:**

- User action tracking
- System event logging
- Security event monitoring
- Performance metric collection

**Compliance Features:**

- Data export capabilities
- User consent management
- Access control documentation
- Security incident tracking

## 11. Performance and Scalability

### 11.1 System Performance

**Backend Performance:**

- Async request handling with FastAPI
- Background task processing via Celery
- Database query optimization
- Caching strategy with Redis

**Frontend Performance:**

- TanStack Query (formerly React Query) for data caching
- Lazy loading of components
- Optimized bundle splitting
- Progressive web app features

### 11.2 Scalability Architecture

**Horizontal Scaling:**

- Stateless API servers
- Independent worker processes
- Load balancing ready
- Database connection pooling

**Resource Optimization:**

- File compression and optimization
- CDN integration ready
- Memory-efficient processing
- Garbage collection optimization

### 11.3 Monitoring and Observability

**Health Monitoring:**

- Component health checks
- Service dependency monitoring
- Performance metric collection
- Error rate tracking

**Logging and Debugging:**

- Structured logging with correlation IDs
- Error tracking and alerting
- Performance profiling
- Debug mode capabilities

## 12. Deployment and Infrastructure

### 12.1 Containerization

**Docker Configuration:**

- Multi-stage builds for optimization
- Health check integration
- Environment-based configuration
- Security-hardened images

### 12.2 Environment Configuration

**Development Environment:**

- Local Docker Compose setup
- Hot-reload development servers
- Test database seeding
- Mock external services

**Production Environment:**

- Cloud-native deployment
- SSL/TLS termination
- Environment variable management
- Secret management integration

### 12.3 Database Management

**MongoDB Configuration:**

- Document schema validation
- Index optimization
- Replica set support
- Backup and recovery procedures

**Migration System:**

- Schema version tracking
- Safe migration procedures
- Rollback capabilities
- Data integrity validation

## 13. Testing Strategy

### 13.1 Testing Levels

**Unit Testing:**

- Service layer testing
- Utility function testing
- Component testing
- Mock external dependencies

**Integration Testing:**

- API endpoint testing
- Database integration testing
- File storage integration
- Authentication flow testing

**End-to-End Testing:**

- Complete user workflow testing
- Cross-browser compatibility
- Mobile responsiveness
- Performance testing

### 13.2 Testing Tools

**Backend Testing:**

- PyTest for unit and integration tests
- Factory Boy for test data generation
- Async test support
- Mock external services

**Frontend Testing:**

- Jest for unit testing
- React Testing Library
- Playwright for E2E testing
- Visual regression testing

## 14. Error Handling and Recovery

### 14.1 Error Classification

**User Errors:**

- Invalid file formats
- Insufficient permissions
- Validation failures
- Authentication errors

**System Errors:**

- External service failures
- Database connection issues
- File storage problems
- Processing timeouts

**Recovery Strategies:**

- Automatic retry mechanisms
- Graceful degradation
- User-friendly error messages
- Administrative error resolution

### 14.2 Reliability Features

**Fault Tolerance:**

- Circuit breaker patterns
- Timeout configurations
- Retry logic with exponential backoff
- Fallback procedures

**Data Integrity:**

- Transaction management
- Consistent state handling
- Backup and recovery
- Data validation

## 15. Configuration and Customization

### 15.1 System Configuration

**Application Settings:**

- Environment-specific configurations
- Feature flag support
- Service endpoint configuration
- Security parameter tuning

**Processing Configuration:**

- AI model parameters
- Translation service options
- File size limits
- Processing timeouts

### 15.2 User Customization

**Client Settings:**

- Language preferences
- Notification preferences
- Default job settings
- Download preferences

**Reviewer Settings:**

- Workflow preferences
- Editor configurations
- Keyboard shortcuts
- Quality thresholds

## 16. Future Enhancements

### 16.1 Planned Features

**Enhanced AI Capabilities:**

- Multi-modal content analysis
- Improved accuracy metrics
- Custom model training
- Advanced quality scoring

**Extended Language Support:**

- Additional target languages
- Regional dialect support
- Custom transcreation workflows
- Cultural adaptation tools

**Advanced Workflow Features:**

- Batch processing capabilities
- Template-based job creation
- Advanced approval workflows
- Custom review stages

### 16.2 Integration Opportunities

**Third-Party Integrations:**

- Content management systems
- Video hosting platforms
- Accessibility testing tools
- Quality assurance services

**API Extensions:**

- Webhook support for job events
- Advanced reporting APIs
- Bulk operation endpoints
- Custom integration points

## 17. Conclusion

The Video Accessibility Processing Platform is an end-to-end solution for automated video accessibility content generation. Built with modern web technologies and integrated with leading AI services, the platform covers the full workflow from video upload to final content delivery.

The system's architecture supports scalability, security, and reliability while maintaining a focus on user experience and content quality. The role-based access control ensures appropriate separation of concerns between content creators, quality reviewers, and system administrators.

With its robust API design, real-time updates, and comprehensive error handling, the platform serves as a professional-grade solution for organizations requiring high-quality accessibility content at scale.

---

*This specification document serves as the comprehensive technical and functional guide for the Video Accessibility Processing Platform, detailing all implemented features, workflows, and system capabilities as of the current release.*
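
As an illustration of the job status state machine in Section 4.1, the sketch below encodes the statuses and checks whether a transition is allowed. The status names come from the specification. The transition table, `canTransition`, and `applyTransition` are assumptions reconstructed from the diagram and the status definitions, not the platform's actual implementation. The target state of the `reject_final` action is not specified, so it is left out.

```typescript
// Status names are taken from Section 4.1 of this specification.
type JobStatus =
  | "created" | "ingesting" | "ai_processing" | "pending_qc"
  | "approved_english" | "rejected" | "qc_feedback" | "translating"
  | "tts_generating" | "pending_final_review" | "completed";

// ASSUMPTION: this table is inferred from the state diagram; the real
// service may allow additional transitions (for example after reject_final).
const ALLOWED_TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  created: ["ingesting"],
  ingesting: ["ai_processing"],
  ai_processing: ["pending_qc"],
  pending_qc: ["approved_english", "rejected", "qc_feedback"],
  approved_english: ["translating"],
  translating: ["tts_generating"],
  tts_generating: ["pending_final_review"],
  pending_final_review: ["completed"],
  qc_feedback: ["pending_qc"], // back to QC after fixes
  rejected: [],                // manual intervention required
  completed: [],               // terminal state
};

// Returns true when the table lists `to` as a successor of `from`.
function canTransition(from: JobStatus, to: JobStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new status, or throws if the transition is not listed.
function applyTransition(from: JobStatus, to: JobStatus): JobStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal job status transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table as data rather than scattered `if` statements makes it straightforward to review against the diagram and to extend if new review stages are added.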
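
The bulk timing adjustment described in Section 5.2 (and exposed as `POST /api/v1/jobs/{id}/vtt/adjust-timing`) can be pictured as shifting every cue timestamp by a fixed offset. The following is a minimal sketch under stated assumptions: `shiftVttTimings` and its helpers are hypothetical names, the offset is assumed to be a whole number of milliseconds, and clamping negative results to zero is a choice made here, not documented platform behavior.

```typescript
// Matches WebVTT timestamps in either hh:mm:ss.ttt or mm:ss.ttt form.
const TIMESTAMP_PATTERN = /(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})/g;

function zeroPad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) text = "0" + text;
  return text;
}

function toMilliseconds(h: string | undefined, m: string, s: string, ms: string): number {
  return ((Number(h || "0") * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
}

// Always emits the long hh:mm:ss.ttt form, which is valid WebVTT.
function formatVttTimestamp(totalMs: number): string {
  const totalSeconds = Math.floor(totalMs / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor(totalSeconds / 60) % 60;
  const seconds = totalSeconds % 60;
  return `${zeroPad(hours, 2)}:${zeroPad(minutes, 2)}:${zeroPad(seconds, 2)}.${zeroPad(totalMs % 1000, 3)}`;
}

// Shifts timestamps on cue timing lines (those containing "-->") only,
// so cue text and cue settings are left untouched.
// ASSUMPTION: results below zero are clamped to 00:00:00.000.
function shiftVttTimings(vtt: string, offsetMs: number): string {
  return vtt
    .split("\n")
    .map((line) => {
      if (line.indexOf("-->") === -1) return line;
      return line.replace(
        TIMESTAMP_PATTERN,
        (_match: string, h: string | undefined, m: string, s: string, ms: string) =>
          formatVttTimestamp(Math.max(0, toMilliseconds(h, m, s, ms) + offsetMs)),
      );
    })
    .join("\n");
}
```

For example, `shiftVttTimings(vtt, 1500)` delays every cue by 1.5 seconds. A production tool would likely also validate that each cue's start still precedes its end after clamping.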
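
The bucket layout in Section 6.3 maps each job, language, and asset type to a predictable object path. The sketch below shows one way to build those paths. The bucket name and file names are taken from the specification, while `AssetKind`, `sourceVideoPath`, `assetPath`, and `toGcsUri` are hypothetical helpers introduced for illustration.

```typescript
// Asset kinds mirror the output types listed in Section 6.1.
type AssetKind = "captions_vtt" | "ad_vtt" | "ad_mp3";

// File names as shown in the bucket structure of Section 6.3.
const ASSET_FILENAMES: Record<AssetKind, string> = {
  captions_vtt: "captions.vtt",
  ad_vtt: "ad.vtt",
  ad_mp3: "ad.mp3",
};

// Object path of the original upload: {jobId}/source.mp4
function sourceVideoPath(jobId: string): string {
  return `${jobId}/source.mp4`;
}

// Object path of a per-language asset: {jobId}/{language}/{file}
function assetPath(jobId: string, language: string, kind: AssetKind): string {
  return `${jobId}/${language}/${ASSET_FILENAMES[kind]}`;
}

// Full gs:// URI for an object path inside a bucket.
function toGcsUri(bucket: string, objectPath: string): string {
  return `gs://${bucket}/${objectPath}`;
}
```

With these helpers, `toGcsUri("accessible-video", assetPath("job123", "es", "ad_mp3"))` yields `gs://accessible-video/job123/es/ad.mp3`, where `job123` is a made-up job ID.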
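
A client consuming the WebSocket messages from Section 7.5 would typically validate each payload before updating the UI. The sketch below is one possible approach: the field names follow the documented message format, but treating every field as required, and the names `JobStatusMessage`, `isJobStatusMessage`, and `parseJobStatusMessage`, are assumptions made for this example.

```typescript
// Shape of the message documented in Section 7.5.
interface JobStatusMessage {
  job_id: string;
  status: string;
  updated_at: string; // ISO 8601 timestamp
  job_title: string;
  message: string;
  progress: number;
}

// Type guard. ASSUMPTION: all six fields are required on every message.
function isJobStatusMessage(value: unknown): value is JobStatusMessage {
  if (typeof value !== "object" || value === null) return false;
  const record = value as { [key: string]: unknown };
  const stringFields = ["job_id", "status", "updated_at", "job_title", "message"];
  for (let i = 0; i < stringFields.length; i++) {
    if (typeof record[stringFields[i]] !== "string") return false;
  }
  if (typeof record["progress"] !== "number") return false;
  // Reject timestamps that cannot be parsed as a date.
  return !isNaN(Date.parse(record["updated_at"] as string));
}

// Parses a raw WebSocket frame; returns null for malformed input.
function parseJobStatusMessage(raw: string): JobStatusMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    return null;
  }
  return isJobStatusMessage(parsed) ? parsed : null;
}
```

Returning `null` instead of throwing keeps a single malformed frame from breaking the message handler; callers can log and skip it.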