Video Accessibility Processing Platform - Software Specification
1. Executive Summary
The Video Accessibility Processing Platform is a web application that uses artificial intelligence to generate closed captions and audio descriptions for video content. It covers the full workflow: video upload, AI processing, human quality control, multi-language translation, and final content delivery.
Core Capabilities:
- Automated generation of closed captions and audio descriptions using Google Gemini 2.5 Pro
- Multi-language translation and transcreation services
- Professional quality control workflow for reviewers
- Text-to-speech generation for audio descriptions
- Role-based access control for clients, reviewers, and administrators
- Real-time job status updates via WebSocket connections
- Secure file storage and signed URL download system
Target Users:
- Clients: Organizations needing video accessibility services
- Reviewers: Professional accessibility specialists who review and approve content
- Administrators: System administrators managing users and system operations
2. System Architecture
2.1 Technology Stack
Frontend:
- React 18 with TypeScript
- Vite for build tooling
- TanStack Query for server-state management and caching
- React Router for navigation
- Tailwind CSS for styling
Backend:
- FastAPI (Python 3.11+) for REST API
- Celery with Redis for background task processing
- MongoDB Atlas for data storage
- JWT authentication with HttpOnly refresh cookies
External Services:
- Google Cloud Storage for file storage
- Google Gemini 2.5 Pro for AI processing
- Google Cloud Translate for language translation
- ElevenLabs for text-to-speech synthesis
Infrastructure:
- Docker containerization
- Redis for caching and task queues
- WebSocket support for real-time updates
2.2 System Components
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    React SPA    │    │     FastAPI     │    │     Celery      │
│    Frontend     │◄──►│     Backend     │◄──►│     Workers     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │     MongoDB     │    │      Redis      │
                       │    Database     │    │   Queue/Cache   │
                       └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │  Google Cloud   │
                       │     Storage     │
                       └─────────────────┘
3. User Roles and Access Control
3.1 Role Definitions
Client Role:
- Upload videos and create processing jobs
- View own job status and progress
- Download completed accessibility assets
- Limited to own content only
Reviewer Role:
- Access quality control dashboard
- Review AI-generated content for accuracy
- Edit VTT files (captions and audio descriptions)
- Approve or reject English content
- Perform final review of completed jobs
- Access to all jobs in system
Admin Role:
- Full system access including all reviewer capabilities
- User management (create, edit, deactivate users)
- System monitoring and health checks
- Bulk operations and maintenance tasks
- Access to audit logs and system statistics
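The role rules above can be sketched as a few small permission checks. This is a minimal illustration, not the platform's actual authorization code: the function names and the trimmed `Actor` and `JobRef` shapes are assumptions made for the example.

```typescript
type Role = "client" | "reviewer" | "admin";

// Trimmed-down shapes (hypothetical) carrying only what the checks need.
interface Actor {
  id: string;
  role: Role;
}

interface JobRef {
  id: string;
  client_id: string;
}

// Clients are limited to their own jobs; reviewers and admins see all jobs.
function canViewJob(actor: Actor, job: JobRef): boolean {
  if (actor.role === "client") {
    return job.client_id === actor.id;
  }
  return true;
}

// Admins inherit every reviewer capability (QC, VTT edits, approvals).
function canReview(actor: Actor): boolean {
  return actor.role === "reviewer" || actor.role === "admin";
}

// User management is reserved for admins.
function canManageUsers(actor: Actor): boolean {
  return actor.role === "admin";
}
```

Keeping the checks as pure functions makes them easy to reuse in both API handlers and UI route guards.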
3.2 Authentication System
JWT Token Management:
- Access tokens stored in memory (15-minute expiry)
- Refresh tokens stored in HttpOnly cookies (7-day expiry)
- Automatic token refresh for active sessions
- Secure logout with cookie clearing
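One way to picture the in-memory access token is the sketch below. It assumes the client tracks the 15-minute expiry itself and sends the token with a `Bearer` scheme; the `AccessTokenStore` class and `authHeader` helper are hypothetical names, and the server remains the authority on token validity.

```typescript
// Access tokens live for 15 minutes (from the list above).
const ACCESS_TOKEN_TTL_MS = 15 * 60 * 1000;

// Holds the access token in memory only; nothing is persisted in the browser.
class AccessTokenStore {
  private token: string | null = null;
  private expiresAtMs = 0;

  set(token: string, nowMs: number): void {
    this.token = token;
    this.expiresAtMs = nowMs + ACCESS_TOKEN_TTL_MS;
  }

  // Returns the token while it is still valid, otherwise null.
  // A null result is the caller's cue to POST /api/v1/auth/refresh.
  get(nowMs: number): string | null {
    if (this.token === null || nowMs >= this.expiresAtMs) {
      return null;
    }
    return this.token;
  }

  // Called on logout, alongside the server clearing the refresh cookie.
  clear(): void {
    this.token = null;
    this.expiresAtMs = 0;
  }
}

// Builds the Authorization header, or null when a refresh is needed first.
function authHeader(
  store: AccessTokenStore,
  nowMs: number
): { Authorization: string } | null {
  const token = store.get(nowMs);
  return token === null ? null : { Authorization: `Bearer ${token}` };
}
```

Passing `nowMs` in explicitly, rather than reading the clock inside the class, keeps the expiry logic deterministic under test.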
Security Features:
- Password hashing using bcrypt
- CORS protection with configurable origins
- Rate limiting on authentication endpoints
- Session-based security with proper token rotation
4. Job Processing Workflow
4.1 Job Status State Machine
The system tracks job progress with the following state machine:
Main path:
created → ingesting → ai_processing → pending_qc → approved_english → translating → tts_generating → pending_final_review → completed
Review branches:
pending_qc → rejected (manual intervention required)
pending_qc → qc_feedback → pending_qc (after fixes)
Status Definitions:
- created: Job record created, video uploaded to storage
- ingesting: Video being processed for metadata extraction
- ai_processing: AI analyzing video content and generating captions/audio descriptions
- pending_qc: Awaiting human quality control review
- approved_english: English content approved, ready for translation
- rejected: Content rejected, requires client revision
- qc_feedback: Reviewer provided feedback, awaiting fixes
- translating: Processing multi-language translations
- tts_generating: Generating audio files from text descriptions
- pending_final_review: All content ready, awaiting final approval
- completed: Job finished, all assets available for download
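The state machine can be expressed as a transition table, as in the sketch below. The edges are read from the diagram and status definitions above, with the assumption that both `rejected` and `qc_feedback` branch off `pending_qc`; any transition triggered by a final-review rejection is not specified here and is left out. The `TRANSITIONS`, `canTransition`, and `transition` names are illustrative.

```typescript
type JobStatus =
  | "created"
  | "ingesting"
  | "ai_processing"
  | "pending_qc"
  | "approved_english"
  | "rejected"
  | "qc_feedback"
  | "translating"
  | "tts_generating"
  | "pending_final_review"
  | "completed";

// Allowed next statuses for each status (assumed from the diagram above).
const TRANSITIONS: Record<JobStatus, readonly JobStatus[]> = {
  created: ["ingesting"],
  ingesting: ["ai_processing"],
  ai_processing: ["pending_qc"],
  pending_qc: ["approved_english", "rejected", "qc_feedback"],
  approved_english: ["translating"],
  rejected: [], // manual intervention required
  qc_feedback: ["pending_qc"], // back to QC after fixes
  translating: ["tts_generating"],
  tts_generating: ["pending_final_review"],
  pending_final_review: ["completed"],
  completed: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new status, or throws when the move is not allowed.
function transition(from: JobStatus, to: JobStatus): JobStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid job status transition: ${from} -> ${to}`);
  }
  return to;
}
```

Centralizing the table in one place means API handlers and background workers reject the same invalid moves.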
4.2 Processing Pipeline
Phase 1: Upload and Ingestion
- Client uploads MP4 video file through web interface
- File stored in Google Cloud Storage with unique job ID path
- Job record created in MongoDB with metadata
- Background Celery task queued for processing
Phase 2: AI Content Generation
- Video file sent to Google Gemini 2.5 Pro API
- AI generates:
  - Plain text transcript
  - Closed captions in WebVTT format
  - Audio description script in WebVTT format
  - Confidence score for generated content
- Generated content stored in GCS and linked to job
- Job status updated to pending_qc
Phase 3: Quality Control Review
- Reviewer accesses job through QC dashboard
- Side-by-side video player with generated captions/audio descriptions
- Inline VTT editor for making corrections
- Timing adjustment tools for synchronization
- Approve or reject with reviewer notes
- If approved, job moves to translation phase
Phase 4: Translation and Localization
- Automatic translation of approved English content
- Support for standard translation and cultural transcreation
- Available target languages: Spanish, French, German (expandable)
- Translated VTT files stored per language
Phase 5: Audio Generation
- Text-to-speech synthesis using ElevenLabs API
- MP3 files generated for each audio description track
- Language-specific voice selection
- Audio files stored alongside VTT content
Phase 6: Final Review and Delivery
- Final review by authorized reviewer
- Asset validation to ensure all requested outputs present
- Client notification of job completion
- Signed URL generation for secure downloads
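The asset validation step in Phase 6 could look roughly like the following. It is a sketch that borrows field names from the Job data structure in section 6.1 and assumes the source language (defaulting to "en") is always checked alongside the requested target languages; `findMissingAssets` is a hypothetical helper name.

```typescript
interface RequestedOutputs {
  captions_vtt: boolean;
  audio_description_vtt: boolean;
  audio_description_mp3: boolean;
  languages: string[];
}

interface LanguageOutputs {
  captions_vtt_gcs?: string;
  ad_vtt_gcs?: string;
  ad_mp3_gcs?: string;
}

// Returns a list such as ["es/ad.mp3"] naming every requested asset
// that has no stored location yet. An empty list means the job is complete.
function findMissingAssets(
  requested: RequestedOutputs,
  outputs: Record<string, LanguageOutputs | undefined>,
  sourceLanguage: string = "en"
): string[] {
  const missing: string[] = [];
  const languages = [sourceLanguage].concat(
    requested.languages.filter((lang) => lang !== sourceLanguage)
  );
  for (const lang of languages) {
    const out: LanguageOutputs = outputs[lang] ?? {};
    if (requested.captions_vtt && !out.captions_vtt_gcs) {
      missing.push(`${lang}/captions.vtt`);
    }
    if (requested.audio_description_vtt && !out.ad_vtt_gcs) {
      missing.push(`${lang}/ad.vtt`);
    }
    if (requested.audio_description_mp3 && !out.ad_mp3_gcs) {
      missing.push(`${lang}/ad.mp3`);
    }
  }
  return missing;
}
```

Returning the missing paths, rather than a boolean, gives the reviewer something concrete to act on before completing the job.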
5. User Interface and Experience
5.1 Client Workflow
Dashboard:
- Overview of all jobs with status indicators
- Quick actions for creating new jobs
- Real-time status updates via WebSocket
- Notification system for job completion
Job Creation Process:
- Video Upload: Drag-and-drop interface with progress tracking
- Job Configuration:
  - Descriptive title
  - Source language selection
  - Output format selection (captions VTT, audio description VTT, audio MP3)
  - Target languages for translation
- Processing Initiation: Automatic background processing begins
- Confirmation: Success page with job tracking link
Job Monitoring:
- Detailed status view with progress indicators
- Processing history timeline
- Real-time updates without page refresh
- Error notifications with context
Content Download:
- Secure download links for completed assets
- Organized by language (en/, es/, fr/, de/)
- File format options (VTT, MP3)
- Source video access
5.2 Reviewer Workflow
Quality Control Dashboard:
- Queue view of jobs pending review
- Priority sorting by creation date
- Job metadata preview
- Quick status filtering
Review Interface:
- Video Player: HTML5 player with custom controls
- VTT Editor: Syntax-highlighted editor with validation
- Side-by-Side View: Simultaneous video and text editing
- Timing Tools: Bulk timing adjustment with offset controls
- Review Controls: Approve/reject with mandatory notes
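The bulk timing adjustment can be illustrated with a function that shifts every cue timestamp by a fixed offset. This is a sketch under stated assumptions: `shiftVtt` is a hypothetical name, timestamps are always rewritten in `HH:MM:SS.mmm` form, and times that would become negative are clamped to zero.

```typescript
// Matches WebVTT timestamps in either HH:MM:SS.mmm or MM:SS.mmm form.
const TIMESTAMP = /(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})/g;

function pad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) {
    text = "0" + text;
  }
  return text;
}

function formatTimestamp(totalMs: number): string {
  const clamped = Math.max(0, Math.round(totalMs));
  const ms = clamped % 1000;
  const totalSeconds = Math.floor(clamped / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Shifts the start and end time of every cue by offsetMs (may be negative).
// Only cue timing lines (those containing "-->") are touched.
function shiftVtt(vtt: string, offsetMs: number): string {
  return vtt
    .split("\n")
    .map((line) => {
      if (line.indexOf("-->") === -1) {
        return line;
      }
      return line.replace(
        TIMESTAMP,
        (_match: string, h: string | undefined, m: string, s: string, ms: string) => {
          const hours = h === undefined ? 0 : Number(h);
          const total = ((hours * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
          return formatTimestamp(total + offsetMs);
        }
      );
    })
    .join("\n");
}
```

For example, `shiftVtt(vtt, 1500)` moves every cue 1.5 seconds later, which suits captions that consistently appear too early.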
Advanced Features:
- Keyboard shortcuts for efficient workflow (A=Approve, R=Reject, S=Save)
- View mode switching (side-by-side, video-only, editor-only)
- Real-time VTT validation and error highlighting
- Unsaved changes warnings
Final Review Process:
- Asset validation before completion
- Final quality checks
- Client notification triggering
- Completion workflow
5.3 Administrator Interface
User Management:
- Create users with role assignment
- Password reset functionality
- User activation/deactivation
- Role-based permission enforcement
System Monitoring:
- Health check dashboard with component status
- Job processing statistics and metrics
- Queue monitoring for background tasks
- Performance analytics
Audit and Security:
- Comprehensive audit logging
- Security event monitoring
- User activity tracking
- System maintenance tools
6. Data Models and Storage
6.1 Job Data Structure
interface Job {
  id: string; // Unique job identifier
  client_id: string; // Owner client ID
  title: string; // Human-readable job name
  status: JobStatus; // Current processing status
  source: {
    filename: string; // Storage path
    original_filename: string; // User's original filename
    gcs_uri: string; // Google Cloud Storage URI
    duration_s: number; // Video duration in seconds
    language: string; // Source language code
  };
  requested_outputs: {
    captions_vtt: boolean; // Closed captions requested
    audio_description_vtt: boolean; // Audio description script requested
    audio_description_mp3: boolean; // Audio voiceover requested
    languages: string[]; // Target languages
    transcreation: string[]; // Languages requiring cultural adaptation
  };
  outputs: {
    [language: string]: {
      captions_vtt_gcs?: string; // VTT file location
      ad_vtt_gcs?: string; // Audio description VTT location
      ad_mp3_gcs?: string; // Audio MP3 file location
      origin: "translate" | "transcreate"; // Processing method
      qa_notes?: string; // Quality assurance notes
    };
  };
  ai: {
    ingestion_json: object; // Full AI response data
    confidence: number; // AI confidence score (0-1)
  };
  review: {
    notes: string; // Current reviewer notes
    reviewer_id?: string; // Last reviewer ID
    history: ReviewHistoryItem[]; // Complete review history
  };
  created_at: Date;
  updated_at: Date;
  error?: ErrorInfo; // Processing error details
}
6.2 User Data Structure
interface User {
  id: string;
  email: string; // Unique login identifier
  hashed_password: string; // Bcrypt hashed password
  full_name: string; // Display name
  role: "client" | "reviewer" | "admin";
  is_active: boolean; // Account status
  created_at: Date;
  updated_at: Date;
}
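Because the stored record includes `hashed_password`, API responses would normally expose a reduced view of the user. The sketch below shows one way to do that with an explicit allow-list; the `PublicUser` type and `toPublicUser` function are assumptions for illustration and are not confirmed parts of the platform's API.

```typescript
interface User {
  id: string;
  email: string;
  hashed_password: string;
  full_name: string;
  role: "client" | "reviewer" | "admin";
  is_active: boolean;
  created_at: Date;
  updated_at: Date;
}

// Everything except the password hash (hypothetical response shape).
type PublicUser = Omit<User, "hashed_password">;

// Copies fields one by one so a newly added sensitive field is never
// exposed by accident.
function toPublicUser(user: User): PublicUser {
  return {
    id: user.id,
    email: user.email,
    full_name: user.full_name,
    role: user.role,
    is_active: user.is_active,
    created_at: user.created_at,
    updated_at: user.updated_at,
  };
}
```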
6.3 File Storage Organization
Google Cloud Storage Bucket Structure:
gs://accessible-video/
├── {jobId}/
│ ├── source.mp4 # Original video
│ ├── en/
│ │ ├── captions.vtt # English captions
│ │ ├── ad.vtt # English audio description
│ │ └── ad.mp3 # English audio file
│ ├── es/
│ │ ├── captions.vtt # Spanish captions
│ │ ├── ad.vtt # Spanish audio description
│ │ └── ad.mp3 # Spanish audio file
│ └── [other languages]/
└── health_check_dummy # System health verification
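The bucket layout above maps naturally onto a pair of path helpers. The following is a sketch: the `sourceUri` and `assetUri` names, the `AssetKind` labels, and the language-code check are assumptions added for the example, while the bucket name and file names come from the tree above.

```typescript
const BUCKET = "accessible-video";

type AssetKind = "captions_vtt" | "ad_vtt" | "ad_mp3";

// File names as they appear in the bucket structure above.
const FILE_NAMES: Record<AssetKind, string> = {
  captions_vtt: "captions.vtt",
  ad_vtt: "ad.vtt",
  ad_mp3: "ad.mp3",
};

function sourceUri(jobId: string): string {
  return `gs://${BUCKET}/${jobId}/source.mp4`;
}

function assetUri(jobId: string, language: string, kind: AssetKind): string {
  // Assumed guard: accept only short language codes such as "en" or "pt-BR"
  // so that user input cannot inject extra path segments.
  if (!/^[a-z]{2,3}(-[A-Za-z0-9]+)*$/.test(language)) {
    throw new Error(`Unexpected language code: ${language}`);
  }
  return `gs://${BUCKET}/${jobId}/${language}/${FILE_NAMES[kind]}`;
}
```

Building every path through one helper keeps uploads, downloads, and cleanup on job deletion in agreement about where files live.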
Security Features:
- Signed URLs with 24-hour expiration
- Role-based access control
- Automatic cleanup on job deletion
- Secure upload with content-type validation
7. API Design
7.1 Authentication Endpoints
POST /api/v1/auth/login
POST /api/v1/auth/refresh
POST /api/v1/auth/logout
7.2 Job Management Endpoints
POST /api/v1/jobs # Create new job
GET /api/v1/jobs # List jobs (filtered by role)
GET /api/v1/jobs/{id} # Get job details
DELETE /api/v1/jobs/{id} # Delete job
DELETE /api/v1/jobs/bulk # Bulk delete (admin only)
# Job Actions
POST /api/v1/jobs/{id}/actions/approve_english
POST /api/v1/jobs/{id}/actions/reject
POST /api/v1/jobs/{id}/actions/complete
POST /api/v1/jobs/{id}/actions/reject_final
# Content Management
GET /api/v1/jobs/{id}/vtt # Get VTT content
PATCH /api/v1/jobs/{id}/vtt # Update VTT content
POST /api/v1/jobs/{id}/vtt/adjust-timing # Adjust timing
GET /api/v1/jobs/{id}/downloads # Get download URLs
GET /api/v1/jobs/{id}/validate # Validate assets
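A client might wrap the job endpoints above in small URL builders, as sketched below. The paths come from the endpoint list; the helper names and the `{ notes }` request body are assumptions, since this specification does not define request payloads.

```typescript
const API_BASE = "/api/v1";

type JobAction = "approve_english" | "reject" | "complete" | "reject_final";

interface ActionRequest {
  method: "POST";
  url: string;
  body: string;
}

function jobUrl(jobId: string): string {
  return `${API_BASE}/jobs/${encodeURIComponent(jobId)}`;
}

function jobActionUrl(jobId: string, action: JobAction): string {
  return `${jobUrl(jobId)}/actions/${action}`;
}

// Describes the request without sending it. Reviewer notes are mandatory
// in the review interface, so an empty note is rejected up front.
function buildActionRequest(
  jobId: string,
  action: JobAction,
  notes: string
): ActionRequest {
  if (notes.trim() === "") {
    throw new Error("Reviewer notes are required");
  }
  return {
    method: "POST",
    url: jobActionUrl(jobId, action),
    body: JSON.stringify({ notes }), // assumed payload shape
  };
}
```

Restricting `action` to a union type lets the compiler catch misspelled action names before a request is ever sent.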
7.3 Administrative Endpoints
# User Management
GET /api/v1/admin/users
POST /api/v1/admin/users
GET /api/v1/admin/users/{id}
PATCH /api/v1/admin/users/{id}
DELETE /api/v1/admin/users/{id}
# System Monitoring
GET /api/v1/admin/stats
GET /api/v1/admin/health/detailed
GET /api/v1/admin/jobs/stats
GET /api/v1/admin/audit-logs
7.4 File Management
GET /api/v1/files/signed-url/{path} # Generate signed download URL
POST /api/v1/files/upload # Direct file upload endpoint
7.5 Real-time Updates
WebSocket Endpoints:
- /ws/jobs - General job status updates
- /ws/jobs/{job_id} - Job-specific status updates
WebSocket Message Format:
{
  "job_id": "string",
  "status": "string",
  "updated_at": "ISO8601",
  "job_title": "string",
  "message": "string",
  "progress": "number"
}
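On the client side, incoming WebSocket frames can be validated before they touch application state. The sketch below assumes every field in the message format is required; the `JobStatusMessage` interface and `parseJobStatusMessage` function are illustrative names.

```typescript
interface JobStatusMessage {
  job_id: string;
  status: string;
  updated_at: string; // ISO 8601 timestamp
  job_title: string;
  message: string;
  progress: number;
}

// Returns the parsed message, or null when the frame is not valid JSON
// or does not match the expected shape.
function parseJobStatusMessage(raw: string): JobStatusMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) {
    return null;
  }
  const record = data as Record<string, unknown>;
  const stringFields = ["job_id", "status", "updated_at", "job_title", "message"];
  for (const field of stringFields) {
    if (typeof record[field] !== "string") {
      return null;
    }
  }
  if (typeof record["progress"] !== "number") {
    return null;
  }
  return {
    job_id: record["job_id"] as string,
    status: record["status"] as string,
    updated_at: record["updated_at"] as string,
    job_title: record["job_title"] as string,
    message: record["message"] as string,
    progress: record["progress"] as number,
  };
}
```

A handler would typically call this from the socket's message callback and ignore any frame for which it returns null.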
8. AI Services Integration
8.1 Google Gemini 2.5 Pro Integration
Content Generation Capabilities:
- Video content analysis and understanding
- Automatic transcript generation
- Closed caption creation with proper timing
- Audio description generation for visual elements
- Content confidence scoring
Processing Flow:
- Video upload to Gemini Files API
- Content generation using multimodal prompt
- Structured JSON response parsing
- Error handling and self-healing for invalid responses
- Automatic file cleanup after processing
Quality Assurance:
- VTT format validation
- Timestamp accuracy verification
- Content completeness checks
- Fallback content generation for missing elements
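The VTT format and timestamp checks can be approximated with a small validator. This sketch covers only three rules (the `WEBVTT` header, end time after start time, and non-decreasing cue start times) and is far from a full WebVTT parser; `validateVtt` and `VttIssue` are hypothetical names.

```typescript
interface VttIssue {
  line: number; // 1-based line number
  message: string;
}

const CUE_TIMING =
  /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})[ \t]+-->[ \t]+(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})/;

function toMs(h: string | undefined, m: string, s: string, ms: string): number {
  const hours = h === undefined ? 0 : Number(h);
  return ((hours * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
}

function validateVtt(vtt: string): VttIssue[] {
  const issues: VttIssue[] = [];
  const lines = vtt.split(/\r?\n/);
  // The file must begin with the WEBVTT signature (an optional BOM is ignored).
  if (lines[0].replace(/^\uFEFF/, "").indexOf("WEBVTT") !== 0) {
    issues.push({ line: 1, message: "File must start with WEBVTT" });
  }
  let previousStart = -1;
  lines.forEach((text, index) => {
    if (text.indexOf("-->") === -1) {
      return;
    }
    const match = CUE_TIMING.exec(text);
    if (match === null) {
      issues.push({ line: index + 1, message: "Malformed cue timing" });
      return;
    }
    const start = toMs(match[1], match[2], match[3], match[4]);
    const end = toMs(match[5], match[6], match[7], match[8]);
    if (end <= start) {
      issues.push({ line: index + 1, message: "Cue end must be after its start" });
    }
    if (start < previousStart) {
      issues.push({ line: index + 1, message: "Cue starts before the previous cue" });
    }
    previousStart = start;
  });
  return issues;
}
```

Because each issue carries a line number, the same function could back both the post-generation check and the editor's error highlighting.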
8.2 Translation Services
Google Cloud Translate:
- High-quality machine translation for standard content
- Support for multiple target languages
- VTT format preservation during translation
- Batch processing for efficiency
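VTT format preservation amounts to translating only the cue text while leaving the header, cue identifiers, and timing lines untouched. The sketch below shows the idea with an injected, synchronous `translate` function; a real translation service call would be asynchronous and batched, and `translateVtt` is a hypothetical name.

```typescript
type TranslateFn = (text: string) => string;

// Translates cue payload lines only. A payload line is any non-blank line
// that follows a timing line ("-->") and precedes the next blank line.
function translateVtt(vtt: string, translate: TranslateFn): string {
  let inCue = false;
  return vtt
    .split("\n")
    .map((line) => {
      if (line.indexOf("-->") !== -1) {
        inCue = true; // timing line: keep as-is, payload follows
        return line;
      }
      if (line.trim() === "") {
        inCue = false; // blank line ends the current cue
        return line;
      }
      return inCue ? translate(line) : line;
    })
    .join("\n");
}
```

Translating line by line keeps the example short but can split sentences that span several lines; joining a cue's lines before translation may give better results.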
Transcreation via Gemini:
- Cultural adaptation for marketing content
- Context-aware translation with brand guidelines
- Maintained timing synchronization
- Creative adaptation while preserving meaning
8.3 Text-to-Speech Integration
ElevenLabs TTS Service:
- High-quality voice synthesis
- Language-specific voice selection
- MP3 output format
- Proper pronunciation for accessibility terms
Audio Processing:
- Per-cue synthesis for precise timing
- Audio quality optimization
- File format standardization
- Integration with VTT timing
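Per-cue synthesis implies splitting the audio description VTT into individual cues and checking that each synthesized clip fits its time slot. The following sketch assumes a simplified cue format and uses hypothetical names (`parseCues`, `overrunMs`); the synthesis call itself is out of scope.

```typescript
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

const TIMING =
  /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})[ \t]+-->[ \t]+(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})/;

function toMs(h: string | undefined, m: string, s: string, ms: string): number {
  const hours = h === undefined ? 0 : Number(h);
  return ((hours * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
}

// Splits a VTT document into cues; blocks without a timing line are skipped.
function parseCues(vtt: string): Cue[] {
  const cues: Cue[] = [];
  for (const block of vtt.split(/\r?\n\r?\n/)) {
    const lines = block.split(/\r?\n/);
    for (let i = 0; i < lines.length; i++) {
      const match = TIMING.exec(lines[i]);
      if (match !== null) {
        cues.push({
          startMs: toMs(match[1], match[2], match[3], match[4]),
          endMs: toMs(match[5], match[6], match[7], match[8]),
          text: lines.slice(i + 1).join(" ").trim(),
        });
        break;
      }
    }
  }
  return cues;
}

// Milliseconds by which a synthesized clip overruns its cue (0 if it fits).
function overrunMs(cue: Cue, audioDurationMs: number): number {
  return Math.max(0, audioDurationMs - (cue.endMs - cue.startMs));
}
```

A worker could synthesize `cue.text` for each cue, then use `overrunMs` to flag descriptions that need shortening before they collide with the next cue.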
9. Quality Control Features
9.1 Review Workflow
Content Review Process:
- Initial Review: AI-generated content assessment
- Content Editing: Direct VTT file modification
- Synchronization Check: Video timing validation
- Quality Verification: Accessibility standards compliance
- Final Approval: Content ready for translation
Review Tools:
- Integrated video player with caption overlay
- Syntax-highlighted VTT editor
- Real-time content validation
- Timing adjustment utilities
- Review history tracking
9.2 Quality Metrics
AI Confidence Scoring:
- Content generation confidence (stored as a 0-1 score, equivalent to 0-100%)
- Quality indicators for reviewer guidance
- Threshold-based workflow routing
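This specification does not define the routing rules, so the sketch below is purely illustrative: it assumes a single threshold (0.8, an arbitrary value) that marks low-confidence jobs for closer review, and it converts the stored 0-1 score to a percentage for display. The function names and the `ReviewPriority` labels are hypothetical.

```typescript
type ReviewPriority = "standard" | "needs_close_review";

function assertValidConfidence(confidence: number): void {
  // Also rejects NaN, because every comparison with NaN is false.
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError(`Confidence must be between 0 and 1, got ${confidence}`);
  }
}

// Converts the stored 0-1 score into a whole percentage for display.
function confidenceToPercent(confidence: number): number {
  assertValidConfidence(confidence);
  return Math.round(confidence * 100);
}

// Jobs scoring below the threshold are flagged for closer review.
// The 0.8 default is an assumption, not a documented platform setting.
function reviewPriority(
  confidence: number,
  threshold: number = 0.8
): ReviewPriority {
  assertValidConfidence(confidence);
  return confidence < threshold ? "needs_close_review" : "standard";
}
```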
Review Analytics:
- Processing time tracking
- Reviewer performance metrics
- Quality score trending
- Error rate monitoring
10. Security and Compliance
10.1 Data Security
Authentication Security:
- JWT token-based authentication
- HttpOnly cookie refresh tokens
- Automatic token rotation
- Secure password hashing (bcrypt)
File Security:
- Signed URL access control
- Time-limited download permissions
- Secure file upload validation
- Automatic cleanup procedures
API Security:
- CORS protection
- Rate limiting
- Input validation and sanitization
- NoSQL injection prevention for MongoDB queries
10.2 Privacy Protection
Data Handling:
- Client data isolation
- Role-based access enforcement
- Audit trail maintenance
- Secure data deletion
Content Protection:
- Temporary file processing
- Secure cloud storage
- Access logging
- Data retention policies
10.3 Audit and Compliance
Audit Logging:
- User action tracking
- System event logging
- Security event monitoring
- Performance metric collection
Compliance Features:
- Data export capabilities
- User consent management
- Access control documentation
- Security incident tracking
11. Performance and Scalability
11.1 System Performance
Backend Performance:
- Async request handling with FastAPI
- Background task processing via Celery
- Database query optimization
- Caching strategy with Redis
Frontend Performance:
- TanStack Query (React Query) for data caching
- Lazy loading of components
- Optimized bundle splitting
- Progressive web app features
11.2 Scalability Architecture
Horizontal Scaling:
- Stateless API servers
- Independent worker processes
- Load balancing ready
- Database connection pooling
Resource Optimization:
- File compression and optimization
- CDN integration ready
- Memory-efficient processing
- Garbage collection optimization
11.3 Monitoring and Observability
Health Monitoring:
- Component health checks
- Service dependency monitoring
- Performance metric collection
- Error rate tracking
Logging and Debugging:
- Structured logging with correlation IDs
- Error tracking and alerting
- Performance profiling
- Debug mode capabilities
12. Deployment and Infrastructure
12.1 Containerization
Docker Configuration:
- Multi-stage builds for optimization
- Health check integration
- Environment-based configuration
- Security-hardened images
12.2 Environment Configuration
Development Environment:
- Local Docker Compose setup
- Hot-reload development servers
- Test database seeding
- Mock external services
Production Environment:
- Cloud-native deployment
- SSL/TLS termination
- Environment variable management
- Secret management integration
12.3 Database Management
MongoDB Configuration:
- Document schema validation
- Index optimization
- Replica set support
- Backup and recovery procedures
Migration System:
- Schema version tracking
- Safe migration procedures
- Rollback capabilities
- Data integrity validation
13. Testing Strategy
13.1 Testing Levels
Unit Testing:
- Service layer testing
- Utility function testing
- Component testing
- Mock external dependencies
Integration Testing:
- API endpoint testing
- Database integration testing
- File storage integration
- Authentication flow testing
End-to-End Testing:
- Complete user workflow testing
- Cross-browser compatibility
- Mobile responsiveness
- Performance testing
13.2 Testing Tools
Backend Testing:
- PyTest for unit and integration tests
- Factory Boy for test data generation
- Async test support
- Mock external services
Frontend Testing:
- Jest for unit testing
- React Testing Library
- Playwright for E2E testing
- Visual regression testing
14. Error Handling and Recovery
14.1 Error Classification
User Errors:
- Invalid file formats
- Insufficient permissions
- Validation failures
- Authentication errors
System Errors:
- External service failures
- Database connection issues
- File storage problems
- Processing timeouts
Recovery Strategies:
- Automatic retry mechanisms
- Graceful degradation
- User-friendly error messages
- Administrative error resolution
14.2 Reliability Features
Fault Tolerance:
- Circuit breaker patterns
- Timeout configurations
- Retry logic with exponential backoff
- Fallback procedures
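Retry logic with exponential backoff can be sketched as follows. The base delay, cap, and function names are assumptions, and the retry loop takes an injected `wait` callback so that the example stays synchronous and testable; a production worker would sleep or await between attempts and would usually add random jitter. The platform's backend is Python, so this TypeScript version only illustrates the pattern.

```typescript
// Delay before the next attempt: base * 2^attempt, capped at maxMs.
// attempt is 0-based, so the first retry waits baseMs.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 30000
): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Runs operation up to maxAttempts times, waiting between failures.
// Rethrows the last error once all attempts are exhausted.
function retryWithBackoff<T>(
  operation: () => T,
  maxAttempts: number,
  wait: (ms: number) => void
): T {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return operation();
    } catch (error) {
      lastError = error;
      // No point in waiting after the final attempt.
      if (attempt < maxAttempts - 1) {
        wait(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

With the defaults, successive waits are 500 ms, 1 s, 2 s, and so on, up to the 30 s cap.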
Data Integrity:
- Transaction management
- Consistent state handling
- Backup and recovery
- Data validation
15. Configuration and Customization
15.1 System Configuration
Application Settings:
- Environment-specific configurations
- Feature flag support
- Service endpoint configuration
- Security parameter tuning
Processing Configuration:
- AI model parameters
- Translation service options
- File size limits
- Processing timeouts
15.2 User Customization
Client Settings:
- Language preferences
- Notification preferences
- Default job settings
- Download preferences
Reviewer Settings:
- Workflow preferences
- Editor configurations
- Keyboard shortcuts
- Quality thresholds
16. Future Enhancements
16.1 Planned Features
Enhanced AI Capabilities:
- Multi-modal content analysis
- Improved accuracy metrics
- Custom model training
- Advanced quality scoring
Extended Language Support:
- Additional target languages
- Regional dialect support
- Custom transcreation workflows
- Cultural adaptation tools
Advanced Workflow Features:
- Batch processing capabilities
- Template-based job creation
- Advanced approval workflows
- Custom review stages
16.2 Integration Opportunities
Third-Party Integrations:
- Content management systems
- Video hosting platforms
- Accessibility testing tools
- Quality assurance services
API Extensions:
- Webhook support for job events
- Advanced reporting APIs
- Bulk operation endpoints
- Custom integration points
17. Conclusion
The Video Accessibility Processing Platform represents a comprehensive solution for automated video accessibility content generation. Built with modern web technologies and integrated with leading AI services, the platform provides an end-to-end workflow from video upload to final content delivery.
The system's architecture supports scalability, security, and reliability while maintaining a focus on user experience and content quality. The role-based access control ensures appropriate separation of concerns between content creators, quality reviewers, and system administrators.
With its robust API design, real-time updates, and comprehensive error handling, the platform serves as a professional-grade solution for organizations requiring high-quality accessibility content at scale.
This specification document serves as the comprehensive technical and functional guide for the Video Accessibility Processing Platform, detailing all implemented features, workflows, and system capabilities as of the current release.