Video Accessibility Platform - Technical Documentation

Generated: August 24, 2025

Table of Contents

  1. Application Architecture
  2. Data Models
  3. Detailed Features & Functions
  4. Process Flows
  5. User Journey
  6. Technical Concepts
  7. Security Architecture
  8. API Specifications

Application Architecture

High-Level Architecture

graph TB
    subgraph "Client Layer"
        UI[React SPA<br/>TypeScript + Vite]
        CDN[Google Cloud CDN<br/>Static Asset Delivery]
    end
    
    subgraph "API Layer"
        API[FastAPI Backend<br/>Python 3.11+]
        AUTH[JWT Authentication<br/>RBAC Authorization]
    end
    
    subgraph "Processing Layer" 
        CELERY[Celery Workers<br/>Background Processing]
        REDIS[Redis Queue<br/>Task Management]
    end
    
    subgraph "Data Layer"
        MONGO[MongoDB Atlas<br/>Document Database]
        GCS[Google Cloud Storage<br/>File Storage]
    end
    
    subgraph "AI Services"
        GEMINI[Gemini 2.5 Pro<br/>Video Analysis]
        TRANSLATE[Google Translate<br/>Localization]
        TTS[Text-to-Speech<br/>Google TTS + ElevenLabs]
    end
    
    subgraph "External Services"
        EMAIL[SendGrid<br/>Email Notifications]
        MONITORING[OpenTelemetry<br/>Observability]
    end
    
    UI --> API
    API --> AUTH
    API --> CELERY
    API --> MONGO
    API --> GCS
    CELERY --> REDIS
    CELERY --> GEMINI
    CELERY --> TRANSLATE
    CELERY --> TTS
    CELERY --> EMAIL
    API --> MONITORING
    UI --> CDN

Technology Stack Overview

Backend Stack

  • Framework: FastAPI with async/await support
  • Language: Python 3.11+
  • Database: MongoDB Atlas with Motor async driver
  • Queue System: Redis + Celery for distributed task processing
  • Cloud Storage: Google Cloud Storage with signed URL security
  • AI Services: Gemini 2.5 Pro, Google Translate, Google/ElevenLabs TTS
  • Authentication: JWT with HttpOnly refresh cookies and RBAC
  • Observability: OpenTelemetry tracing, Sentry error tracking, Prometheus metrics
  • Validation: Pydantic models with strict typing

Frontend Stack

  • Framework: React 18 with Vite build system
  • Language: TypeScript for type safety
  • Routing: React Router v6 with role-based route guards
  • State Management: TanStack Query (server state) + Zustand (minimal UI state)
  • Styling: Tailwind CSS with responsive design principles
  • Forms: React Hook Form with Zod validation
  • File Upload: React Dropzone with real-time progress tracking

Infrastructure

  • Containerization: Docker with optimized multi-stage builds
  • Deployment: Google Cloud Run for API and worker services
  • CDN: Google Cloud CDN for frontend asset delivery
  • Database: MongoDB Atlas (fully managed)
  • Caching: Redis (managed service)
  • Infrastructure as Code: Terraform for complete stack management

Monorepo Structure

video_accessibility/
├── backend/                 # FastAPI Python backend
│   ├── app/
│   │   ├── api/            # API route handlers
│   │   ├── models/         # Pydantic data models
│   │   ├── services/       # Business logic services
│   │   ├── tasks/          # Celery background tasks
│   │   └── core/           # Configuration and utilities
│   ├── tests/              # Backend test suites
│   └── requirements.txt    # Python dependencies
├── frontend/               # React TypeScript SPA
│   ├── src/
│   │   ├── components/     # Reusable UI components
│   │   ├── routes/         # Page-level components
│   │   ├── lib/            # Utilities and configurations
│   │   └── types/          # TypeScript type definitions
│   ├── public/             # Static assets
│   └── package.json        # Node.js dependencies
├── infra/                  # Infrastructure as Code
│   ├── cloud-run/          # Google Cloud Run deployment
│   └── cloud-cdn/          # CDN configuration
├── examples/               # Test video files for development
└── docker-compose.yml      # Local development environment

Data Models

Job State Machine

stateDiagram-v2
    [*] --> created: Client uploads video
    created --> ingesting: Worker starts processing
    ingesting --> ai_processing: Video uploaded to Gemini
    ai_processing --> pending_qc: AI generation complete
    pending_qc --> approved_english: QC approves content
    pending_qc --> rejected: QC rejects content
    rejected --> [*]: Job terminated
    approved_english --> translating: Multi-language processing
    translating --> tts_generating: Translation complete
    tts_generating --> pending_final_review: TTS synthesis complete
    pending_final_review --> completed: Final approval
    pending_final_review --> rejected: Final rejection
    completed --> [*]: Assets delivered to client
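
The diagram above can be enforced in code with a transition table. The following is a minimal sketch, not the platform's actual implementation: the names `ALLOWED_TRANSITIONS`, `InvalidTransitionError`, and `assert_transition` are hypothetical, and the edges are transcribed directly from the diagram.

```python
# Allowed status transitions, transcribed from the state diagram.
# All names in this sketch are illustrative assumptions.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "created": {"ingesting"},
    "ingesting": {"ai_processing"},
    "ai_processing": {"pending_qc"},
    "pending_qc": {"approved_english", "rejected"},
    "approved_english": {"translating"},
    "translating": {"tts_generating"},
    "tts_generating": {"pending_final_review"},
    "pending_final_review": {"completed", "rejected"},
    "completed": set(),  # terminal state
    "rejected": set(),   # terminal state
}


class InvalidTransitionError(ValueError):
    """Raised when a status change is not an edge in the diagram."""


def assert_transition(current: str, new: str) -> None:
    """Raise unless `current -> new` is an allowed edge."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransitionError(f"{current} -> {new} is not allowed")
```

A service layer could call `assert_transition(job.status, new_status)` before every status write, so that an out-of-order worker update fails loudly instead of corrupting the job.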

Core Data Models

Job Document (MongoDB)

{
  "_id": ObjectId("..."),
  "client_id": ObjectId("..."),
  "title": "Marketing Video Q3 2025",
  "source": {
    "filename": "marketing_q3.mp4",
    "original_filename": "Marketing Video - Final Cut.mp4",
    "gcs_uri": "gs://accessible-video/job123/source.mp4",
    "duration_s": 180.5,
    "language": "en"
  },
  "requested_outputs": {
    "captions_vtt": true,
    "audio_description_vtt": true,
    "audio_description_mp3": true,
    "languages": ["es", "fr", "de"],
    "transcreation": ["es"] // Cultural adaptation vs direct translation
  },
  "status": "pending_qc", // State machine value
  "review": {
    "notes": "Minor timing adjustments needed",
    "reviewer_id": ObjectId("..."),
    "history": [
      {
        "at": "2025-08-24T10:30:00Z",
        "status": "pending_qc",
        "by": "reviewer@company.com",
        "notes": "Ready for QC review"
      }
    ]
  },
  "outputs": {
    "en": {
      "captions_vtt_gcs": "gs://accessible-video/job123/en/captions.vtt",
      "ad_vtt_gcs": "gs://accessible-video/job123/en/ad.vtt",
      "ad_mp3_gcs": "gs://accessible-video/job123/en/ad.mp3"
    },
    "es": {
      "captions_vtt_gcs": "gs://accessible-video/job123/es/captions.vtt",
      "ad_vtt_gcs": "gs://accessible-video/job123/es/ad.vtt", 
      "ad_mp3_gcs": "gs://accessible-video/job123/es/ad.mp3",
      "origin": "transcreate", // vs "translate"
      "qa_notes": "Cultural references adapted for Spanish market"
    }
  },
  "ai": {
    "ingestion_json": {
      "captions": [...],
      "audio_descriptions": [...],
      "confidence": 0.94
    },
    "confidence": 0.94
  },
  "created_at": "2025-08-24T09:00:00Z",
  "updated_at": "2025-08-24T10:35:00Z"
}

User Document (MongoDB)

{
  "_id": ObjectId("..."),
  "email": "client@company.com",
  "hashed_password": "$2b$12$...",
  "full_name": "Jane Smith",
  "role": "client", // client | reviewer | admin
  "is_active": true,
  "created_at": "2025-01-15T08:00:00Z"
}

File Storage Structure (Google Cloud Storage)

gs://accessible-video/{jobId}/
├── source.mp4                     # Original uploaded video
├── en/                           # English outputs
│   ├── captions.vtt              # English closed captions
│   ├── ad.vtt                    # English audio description script
│   └── ad.mp3                    # English audio description audio
├── es/                           # Spanish outputs (example)
│   ├── captions.vtt              # Spanish captions (translated/transcreated)
│   ├── ad.vtt                    # Spanish audio description script
│   └── ad.mp3                    # Spanish TTS audio
└── {additional_languages}/       # Additional language folders
    └── ...                       # Same structure per language
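
Because every output lives at a predictable path, URIs can be derived rather than stored by hand. A small sketch under stated assumptions: the helper names `source_uri` and `output_uri` and the `OUTPUT_FILES` mapping are hypothetical; only the bucket name and file names come from the layout above.

```python
BUCKET = "accessible-video"  # bucket name as shown in the layout

# Maps an output kind to its file name inside a language folder.
# The kind keys are illustrative, not confirmed identifiers.
OUTPUT_FILES = {
    "captions_vtt": "captions.vtt",
    "ad_vtt": "ad.vtt",
    "ad_mp3": "ad.mp3",
}


def source_uri(job_id: str) -> str:
    """URI of the original uploaded video."""
    return f"gs://{BUCKET}/{job_id}/source.mp4"


def output_uri(job_id: str, language: str, kind: str) -> str:
    """URI of one generated asset for one language."""
    if kind not in OUTPUT_FILES:
        raise ValueError(f"unknown output kind: {kind}")
    return f"gs://{BUCKET}/{job_id}/{language}/{OUTPUT_FILES[kind]}"
```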

API Data Models (Pydantic)

Key backend models ensuring type safety:

  • JobResponse: Complete job data for API responses
  • CreateJobRequest: Video upload with metadata
  • UpdateJobRequest: Partial job updates
  • ReviewAction: QC approval/rejection with notes
  • VttUpdateRequest: Caption editing operations
  • DownloadResponse: Signed URL generation
  • UserResponse: User data without sensitive fields

Detailed Features & Functions

1. Video Upload & Processing System

Upload Features

  • Drag-and-drop interface with React Dropzone
  • Progress tracking with real-time upload percentage
  • File validation (format, size, duration limits)
  • Secure multipart upload to Google Cloud Storage
  • Metadata extraction using ffprobe for video properties

AI Processing Pipeline

  • Gemini 2.5 Pro integration for content analysis
  • Structured JSON output with self-healing parsing
  • Dual output generation: Closed captions + Audio descriptions
  • Confidence scoring for quality assessment
  • Error recovery mechanisms for failed AI calls

2. Quality Control (QC) System

VTT Editor Component

interface VttEditorProps {
  jobId: string;
  language: string;
  type: 'captions' | 'audio_description';
  readonly?: boolean;
}

Features:

  • Rich text editing of VTT caption content
  • Timing preservation during text modifications
  • Bulk timing adjustments for synchronization fixes
  • Preview integration with HTML5 video player
  • Validation of WebVTT format compliance
  • Auto-save functionality with conflict resolution

Review Workflow

  • Assignment system for reviewer workload management
  • Approval/rejection with mandatory review notes
  • History tracking for audit trail and accountability
  • Quality metrics tracking for process improvement

3. Multi-Language Translation System

Translation Options

  • Standard Translation: Google Cloud Translate for direct language conversion
  • Transcreation: Gemini-powered cultural adaptation for marketing content
  • VTT Structure Preservation: Maintains timing and formatting across languages

Process Flow

  1. QC approval triggers translation pipeline
  2. Parallel processing for multiple target languages
  3. VTT timing synchronization across all languages
  4. Quality validation before TTS synthesis
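
Steps 1 and 2 can be sketched as follows. This is an illustration only: it assumes the `requested_outputs` shape from the Job document, the function names are hypothetical, and a thread pool stands in for whatever fan-out mechanism the workers actually use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def plan_language_origins(requested_outputs: dict) -> dict[str, str]:
    """Decide, per target language, between translation and transcreation."""
    transcreate = set(requested_outputs.get("transcreation", []))
    return {
        lang: ("transcreate" if lang in transcreate else "translate")
        for lang in requested_outputs.get("languages", [])
    }


def process_languages(
    plan: dict[str, str],
    handler: Callable[[str, str], str],
) -> dict[str, str]:
    """Run `handler(language, origin)` for every language concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        futures = {
            lang: pool.submit(handler, lang, origin)
            for lang, origin in plan.items()
        }
        # result() re-raises any exception from the handler
        return {lang: future.result() for lang, future in futures.items()}
```

Keeping the planning step separate from execution makes the translate-versus-transcreate decision easy to unit test without calling any external service.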

4. Text-to-Speech (TTS) Integration

Supported Providers

  • Google Cloud TTS: High-quality natural voices
  • ElevenLabs: Premium AI voices for enhanced quality
  • Voice Configuration: Per-language voice selection

Audio Processing

  • Per-cue synthesis: Individual audio clips for each VTT segment
  • Timing synchronization: Audio duration matches VTT timing
  • Cross-fade stitching: Seamless audio transitions
  • Format optimization: MP3 output with appropriate bitrates
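
Per-cue synthesis and timing synchronization amount to placing each clip on a timeline. The sketch below is an assumption about how that bookkeeping might look, not the platform's stitching code: `Placement` and `plan_timeline` are hypothetical names, times are in milliseconds, and cross-fading is left out.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    start_ms: int           # where the clip begins in the final track
    silence_before_ms: int  # silence to insert before the clip
    overrun_ms: int         # how far the clip runs past its cue end (0 if it fits)


def plan_timeline(cues: list[tuple[int, int]], clip_ms: list[int]) -> list[Placement]:
    """Place each synthesized clip at its cue start without overlapping the previous clip."""
    placements = []
    cursor = 0  # end of the audio laid down so far
    for (start, end), duration in zip(cues, clip_ms):
        clip_start = max(start, cursor)  # delay the clip if the previous one ran long
        placements.append(
            Placement(
                start_ms=clip_start,
                silence_before_ms=clip_start - cursor,
                overrun_ms=max(0, clip_start + duration - end),
            )
        )
        cursor = clip_start + duration
    return placements
```

A non-zero `overrun_ms` flags a cue whose narration does not fit its window, which is the kind of case a reviewer would want surfaced during final review.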

5. Role-Based Access Control (RBAC)

User Roles & Permissions

graph TD
    subgraph "Role Hierarchy"
        ADMIN[Admin<br/>Full system access]
        REVIEWER[Reviewer<br/>QC operations]
        CLIENT[Client<br/>Own jobs only]
    end
    
    subgraph "Client Permissions"
        C1[Create jobs]
        C2[View own jobs]
        C3[Download outputs]
        C4[Update job metadata]
    end
    
    subgraph "Reviewer Permissions"
        R1[View all jobs]
        R2[Edit VTT content]
        R3[Approve/reject jobs]
        R4[Access QC dashboard]
    end
    
    subgraph "Admin Permissions"
        A1[User management]
        A2[Bulk operations]
        A3[System configuration]
        A4[Analytics access]
    end
    
    CLIENT --> C1
    CLIENT --> C2
    CLIENT --> C3
    CLIENT --> C4
    
    REVIEWER --> C1
    REVIEWER --> C2
    REVIEWER --> C3
    REVIEWER --> C4
    REVIEWER --> R1
    REVIEWER --> R2
    REVIEWER --> R3
    REVIEWER --> R4
    
    ADMIN --> C1
    ADMIN --> C2
    ADMIN --> C3
    ADMIN --> C4
    ADMIN --> R1
    ADMIN --> R2
    ADMIN --> R3
    ADMIN --> R4
    ADMIN --> A1
    ADMIN --> A2
    ADMIN --> A3
    ADMIN --> A4
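
The hierarchy in the diagram is cumulative: each role inherits everything from the role below it. A minimal sketch of that structure, with hypothetical permission identifiers derived from the diagram labels:

```python
# Permission names are illustrative; only the grouping comes from the diagram.
CLIENT_PERMISSIONS = frozenset(
    {"create_jobs", "view_own_jobs", "download_outputs", "update_job_metadata"}
)
REVIEWER_PERMISSIONS = CLIENT_PERMISSIONS | {
    "view_all_jobs", "edit_vtt_content", "approve_reject_jobs", "access_qc_dashboard"
}
ADMIN_PERMISSIONS = REVIEWER_PERMISSIONS | {
    "user_management", "bulk_operations", "system_configuration", "analytics_access"
}

ROLE_PERMISSIONS = {
    "client": CLIENT_PERMISSIONS,
    "reviewer": REVIEWER_PERMISSIONS,
    "admin": ADMIN_PERMISSIONS,
}


def has_permission(role: str, permission: str) -> bool:
    """Unknown roles get no permissions at all."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```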

6. File Management & Security

Secure File Access

  • Signed URLs with 24-hour expiration for all file downloads
  • CORS configuration allowing cross-origin access for authenticated users
  • Content-type validation ensuring proper file handling
  • Audit logging for all file access operations

Storage Optimization

  • Lifecycle policies for automatic cleanup of expired files
  • Compression for VTT text files
  • CDN integration for fast global file delivery

7. Notification System

Email Integration (SendGrid)

  • HTML email templates with professional styling
  • Signed URL embedding for secure file access
  • Notification triggers: Job completion, approval, rejection
  • Expiration reminders for download links

Real-time Updates

  • WebSocket connections for live status updates (planned)
  • Browser notifications for important events
  • In-app notification center for user alerts

8. Dashboard & Analytics

Client Dashboard

  • Recent jobs overview with status indicators
  • Upload statistics and processing times
  • Download history with expiration tracking
  • Usage analytics for billing and optimization

Admin Analytics

  • System performance metrics (processing times, error rates)
  • User activity tracking and engagement metrics
  • Resource utilization monitoring
  • Quality control statistics and reviewer performance

Process Flows

Complete Video Processing Pipeline

sequenceDiagram
    participant Client
    participant Frontend
    participant API
    participant GCS as Cloud Storage
    participant Celery as Worker Queue
    participant Gemini as AI Service
    participant Reviewer
    participant TTS as Text-to-Speech
    participant Email as Notification
    
    Client->>Frontend: Upload video file
    Frontend->>API: POST /jobs (multipart)
    API->>GCS: Store video file
    API->>Celery: Queue ingestion task
    API-->>Frontend: Job created (status: created)
    
    Celery->>GCS: Download video
    Celery->>Gemini: Process video with AI
    Gemini-->>Celery: Return captions + audio descriptions
    Celery->>GCS: Upload VTT files
    Celery->>API: Update job (status: pending_qc)
    
    Reviewer->>Frontend: Access QC dashboard
    Frontend->>API: GET /admin/qc
    API-->>Frontend: Jobs pending review
    Reviewer->>Frontend: Edit and approve VTT content
    Frontend->>API: POST /jobs/{id}/actions/approve_english
    API->>Celery: Queue translation task
    
    Celery->>Celery: Process translations
    Celery->>TTS: Generate audio descriptions
    TTS-->>Celery: Return MP3 files
    Celery->>GCS: Upload translated assets
    Celery->>API: Update job (status: pending_final_review)
    
    Reviewer->>Frontend: Final review and approval
    Frontend->>API: POST /jobs/{id}/actions/complete
    API->>Email: Send completion notification
    Email-->>Client: Notification with download links
    API->>API: Update job (status: completed)

Quality Control Workflow

flowchart TD
    START([Job AI Processing Complete]) --> PENDING[Status: pending_qc]
    PENDING --> ASSIGN[Assign to Reviewer]
    ASSIGN --> REVIEW[Reviewer Opens VTT Editor]
    
    REVIEW --> EDIT{Edit Required?}
    EDIT -->|Yes| MODIFY[Modify VTT Content]
    MODIFY --> VALIDATE[Validate VTT Format]
    VALIDATE --> SAVE[Auto-save Changes]
    SAVE --> DECISION{Approval Decision}
    
    EDIT -->|No| DECISION
    
    DECISION -->|Approve| APPROVE[POST approve_english]
    DECISION -->|Reject| REJECT[POST reject with notes]
    
    APPROVE --> TRANSLATION[Trigger Translation Pipeline]
    TRANSLATION --> FINAL_REVIEW[Status: pending_final_review]
    
    REJECT --> REJECTED[Status: rejected]
    REJECTED --> NOTIFY_CLIENT[Email Client with Notes]
    NOTIFY_CLIENT --> END([Job Terminated])
    
    FINAL_REVIEW --> FINAL_DECISION{Final Approval?}
    FINAL_DECISION -->|Approve| COMPLETE[Status: completed]
    FINAL_DECISION -->|Reject| FINAL_REJECT[Status: rejected]
    
    COMPLETE --> NOTIFY_COMPLETION[Email with Download Links]
    NOTIFY_COMPLETION --> END
    
    FINAL_REJECT --> NOTIFY_CLIENT

Authentication & Authorization Flow

sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant Auth as Auth Service
    participant DB as Database
    
    User->>Frontend: Enter credentials
    Frontend->>API: POST /auth/login
    API->>DB: Validate user credentials
    DB-->>API: User data + role
    
    API->>Auth: Generate JWT access token
    API->>Auth: Generate refresh token
    Auth-->>API: Tokens created
    
    API->>API: Set refresh token as HttpOnly cookie
    API-->>Frontend: Access token + user data
    Frontend->>Frontend: Store access token in memory
    
    Note over Frontend,API: Subsequent API requests
    
    Frontend->>API: Request with Authorization header
    API->>Auth: Validate access token
    Auth-->>API: Token valid + user claims
    API->>API: Check RBAC permissions
    API-->>Frontend: Authorized response
    
    Note over Frontend,API: Token refresh flow
    
    API-->>Frontend: 401 Unauthorized (expired token)
    Frontend->>API: POST /auth/refresh (with cookie)
    API->>Auth: Validate refresh token
    Auth->>Auth: Generate new access token
    Auth-->>API: New access token
    API-->>Frontend: New access token
    Frontend->>API: Retry original request

User Journey

Client User Journey

journey
    title Video Accessibility Client Journey
    section Video Upload
      Login to Platform: 5: Client
      Navigate to New Job: 4: Client
      Select Video File: 3: Client
      Fill Job Details: 3: Client
      Upload with Progress: 4: Client
      Receive Confirmation: 5: Client
    
    section Processing Wait
      Receive Processing Email: 4: Client
      Check Job Status: 3: Client
      Wait for QC Complete: 2: Client
    
    section Review & Download
      Receive Completion Email: 5: Client
      Access Download Links: 4: Client
      Download VTT Files: 5: Client
      Download Audio Files: 5: Client
      Validate Output Quality: 4: Client

Reviewer User Journey

journey
    title Reviewer Quality Control Journey
    section Daily Workflow
      Login to QC Dashboard: 4: Reviewer
      View Pending Jobs Queue: 4: Reviewer
      Select Job for Review: 3: Reviewer
    
    section Content Review
      Open VTT Editor: 4: Reviewer
      Review Generated Captions: 3: Reviewer
      Edit Text Content: 2: Reviewer
      Adjust Timing if Needed: 2: Reviewer
      Preview with Video Player: 4: Reviewer
    
    section Decision Making
      Assess Content Quality: 3: Reviewer
      Add Review Notes: 3: Reviewer
      Make Approval Decision: 4: Reviewer
      Submit Review: 5: Reviewer
    
    section Final Review
      Review Translated Content: 3: Reviewer
      Validate TTS Output: 4: Reviewer
      Final Quality Check: 3: Reviewer
      Complete Job Approval: 5: Reviewer

Detailed User Flow Scenarios

Scenario 1: Successful Job Completion

  1. Client Upload (5 minutes)

    • User drags video file to upload zone
    • Fills required metadata (title, target languages)
    • Monitors upload progress to completion
    • Receives job ID and initial confirmation
  2. AI Processing (10-30 minutes, automated)

    • Celery worker picks up ingestion task
    • Video uploaded to Gemini 2.5 Pro
    • AI generates captions and audio descriptions
    • VTT files created and stored in GCS
    • Job status updated to "pending_qc"
  3. Quality Control (15-45 minutes)

    • Reviewer receives notification of pending job
    • Opens VTT editor for content review
    • Makes text corrections and timing adjustments
    • Previews content with integrated video player
    • Approves job triggering translation pipeline
  4. Multi-Language Processing (20-60 minutes, automated)

    • Translation worker processes all target languages
    • TTS service generates audio descriptions
    • All output files uploaded to structured GCS paths
    • Job status updated to "pending_final_review"
  5. Final Review & Delivery (10-20 minutes)

    • Reviewer validates all translated outputs
    • Performs final quality check on audio files
    • Approves job for completion
    • Client receives email with signed download URLs

Scenario 2: Quality Control Rejection

  1. Initial Processing (same as Scenario 1, steps 1-2)

  2. QC Rejection (10-15 minutes)

    • Reviewer identifies quality issues
    • Adds detailed notes explaining problems
    • Rejects job with specific feedback
    • Job status changed to "rejected"
  3. Client Notification & Resolution

    • Client receives rejection email with reviewer notes
    • Client can contact support for clarification
    • New job may need to be created with improved source material

Technical Concepts

Asynchronous Processing Architecture

FastAPI Async/Await Pattern

The backend uses Python's async/await so that request handlers do not block on database or network I/O:

@router.post("/jobs/{job_id}/vtt")
async def update_vtt(
    job_id: str,
    vtt_data: VttUpdateRequest,
    current_user: User = Depends(get_current_user)
):
    # Non-blocking database operations
    job = await job_service.get_job(job_id)
    await job_service.update_vtt_content(job, vtt_data)
    return await job_service.validate_and_return(job)

Celery Distributed Task Processing

Background tasks are handled by Celery workers for scalable processing:

@celery_app.task(bind=True, max_retries=3)
def ingest_and_process_with_ai(self, job_id: str):
    try:
        # Long-running AI processing
        result = gemini_service.process_video(job_id)
        return result
    except Exception as exc:
        # Exponential backoff retry
        raise self.retry(exc=exc, countdown=60 * (2 ** self.request.retries))

AI Integration Patterns

Self-Healing JSON Parsing

Gemini AI responses are parsed with fallback mechanisms:

import json
import re


class InvalidAIResponseError(ValueError):
    """Defined here only so the snippet is self-contained"""


def parse_ai_response(raw_response: str) -> dict:
    try:
        return json.loads(raw_response)
    except json.JSONDecodeError:
        # Self-healing: Extract JSON from markdown code blocks
        json_match = re.search(r'```json\s*(\{.*?\})\s*```', raw_response, re.DOTALL)
        if json_match:
            return json.loads(json_match.group(1))
        raise InvalidAIResponseError("Could not parse AI response")

Structured AI Prompts

Gemini prompts are designed for consistent JSON output:

GEMINI_PROMPT_TEMPLATE = """
Analyze this video and generate accessibility content in this exact JSON format:
{
  "captions": [
    {"start": "00:00:00.000", "end": "00:00:05.000", "text": "Caption text here"}
  ],
  "audio_descriptions": [
    {"start": "00:00:00.000", "end": "00:00:05.000", "text": "Visual description here"}
  ],
  "confidence": 0.95
}

Requirements:
- Use WebVTT timestamp format (HH:MM:SS.mmm)
- Ensure no overlapping time ranges
- Maximum 32 characters per caption line
- Audio descriptions should be concise and descriptive
"""

WebVTT Processing & Validation

VTT Format Handling

The platform checks the WEBVTT header and every cue timing line before accepting a file:

def validate_vtt_format(vtt_content: str) -> bool:
    lines = vtt_content.strip().split('\n')
    if not lines[0].startswith('WEBVTT'):
        return False
    
    # Validate the timestamp format on every cue timing line
    for line in lines:
        if '-->' in line:
            start, _, rest = line.partition('-->')
            # Cue settings (e.g. "align:start") may follow the end timestamp
            end_parts = rest.split()
            if not end_parts or not is_valid_timestamp(start.strip()) or \
               not is_valid_timestamp(end_parts[0]):
                return False
    return True
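
`validate_vtt_format` relies on an `is_valid_timestamp` helper that is not shown. One plausible implementation, offered as an assumption rather than the platform's code, follows the WebVTT timestamp grammar: optional hours of two or more digits, then minutes and seconds in the range 00 to 59, then exactly three millisecond digits.

```python
import re

# WebVTT timestamp: [hh+:]mm:ss.ttt
_VTT_TIMESTAMP = re.compile(r"(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}")


def is_valid_timestamp(value: str) -> bool:
    """True if `value` is exactly one WebVTT timestamp."""
    return _VTT_TIMESTAMP.fullmatch(value) is not None
```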

Timing Preservation During Translation

Translation maintains original VTT timing structure:

def translate_vtt_preserving_timing(vtt_content: str, target_language: str) -> str:
    cues = parse_vtt_cues(vtt_content)
    
    # Extract only text content for translation
    text_to_translate = [cue.text for cue in cues]
    translated_text = translate_service.translate_batch(text_to_translate, target_language)
    
    # Reconstruct VTT with original timing
    for i, cue in enumerate(cues):
        cue.text = translated_text[i]
    
    return serialize_to_vtt(cues)
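
The `parse_vtt_cues` and `serialize_to_vtt` helpers are referenced but not shown. The sketch below is one way they could work, under stated assumptions: the `Cue` dataclass is hypothetical, the timing line is kept verbatim so cue settings survive, and header metadata plus NOTE and STYLE blocks are dropped on output.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    timing: str           # full "start --> end [settings]" line, kept verbatim
    text: str             # cue payload, possibly several lines
    identifier: str = ""  # optional cue identifier line


def parse_vtt_cues(vtt_content: str) -> list[Cue]:
    cues = []
    normalized = vtt_content.replace("\r\n", "\n").strip()
    # Blocks are separated by one or more blank lines
    for block in re.split(r"\n{2,}", normalized):
        lines = block.split("\n")
        timing_index = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_index is None:
            continue  # header, NOTE, or STYLE block: not a cue
        cues.append(
            Cue(
                timing=lines[timing_index],
                text="\n".join(lines[timing_index + 1:]),
                identifier="\n".join(lines[:timing_index]),
            )
        )
    return cues


def serialize_to_vtt(cues: list[Cue]) -> str:
    parts = ["WEBVTT"]
    for cue in cues:
        head = [cue.identifier] if cue.identifier else []
        parts.append("\n".join(head + [cue.timing, cue.text]))
    return "\n\n".join(parts) + "\n"
```

Because only `text` is ever reassigned, the timing lines in the translated file are byte-for-byte identical to the source.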

Security Implementation Details

JWT Token Strategy

from datetime import datetime, timedelta, timezone

# Access Token (15-minute lifespan, stored in memory)
access_token_data = {
    "sub": user.email,
    "role": user.role,
    # datetime.utcnow() is deprecated since Python 3.12; use an aware datetime
    "exp": datetime.now(timezone.utc) + timedelta(minutes=15)
}

# Refresh Token (7-day lifespan, HttpOnly cookie)
refresh_token_data = {
    "sub": user.email,
    "type": "refresh",
    "exp": datetime.now(timezone.utc) + timedelta(days=7)
}

Signed URL Generation

def generate_signed_url(blob_name: str, expiration: int = 24*60*60) -> str:
    """Generate a V4 signed URL; `expiration` is in seconds (default 24 hours)"""
    bucket = gcs_client.bucket(GCS_BUCKET_NAME)
    blob = bucket.blob(blob_name)
    
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(seconds=expiration),  # Interpreted relative to now
        method="GET"
    )

Real-time Communication Patterns

TanStack Query for State Management

// Automatic cache management and background refetching
const { data: job, isLoading, error } = useQuery({
  queryKey: ['job', jobId],
  queryFn: () => api.getJob(jobId),
  refetchInterval: 5000, // Poll every 5 seconds for status updates
  staleTime: 1000 * 60, // Consider data fresh for 1 minute
});

// Optimistic updates for better UX
const updateJobMutation = useMutation({
  mutationFn: api.updateJob,
  onMutate: async (newJobData) => {
    // Cancel outgoing refetches so they cannot overwrite the optimistic update
    await queryClient.cancelQueries({ queryKey: ['job', jobId] });
    
    // Snapshot previous value
    const previousJob = queryClient.getQueryData(['job', jobId]);
    
    // Optimistically update cache
    queryClient.setQueryData(['job', jobId], newJobData);
    
    return { previousJob };
  },
  onError: (err, newJobData, context) => {
    // Rollback on error (context is undefined if onMutate itself threw)
    queryClient.setQueryData(['job', jobId], context?.previousJob);
  },
});

Error Handling & Resilience

Exponential Backoff Retry Pattern

import random

from celery import Task


class RetryableTaskMixin(Task):
    """Task base class; Celery's `base=` argument requires a Task subclass"""

    def retry_with_backoff(self, exc, base_delay=60):
        """Exponential backoff with jitter"""
        delay = base_delay * (2 ** self.request.retries)
        jitter = random.uniform(0, 0.1) * delay
        # self.retry() raises celery.exceptions.Retry by default
        return self.retry(exc=exc, countdown=int(delay + jitter))

@celery_app.task(bind=True, base=RetryableTaskMixin, max_retries=5)
def process_with_gemini(self, job_id: str):
    try:
        return gemini_service.process(job_id)
    except (ConnectionError, TimeoutError) as exc:
        raise self.retry_with_backoff(exc)
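
To make the resulting schedule concrete, the delay arithmetic can be pulled out into a pure function. This is an illustrative sketch: `backoff_delay` and its `rng` parameter are hypothetical names, added so the jitter source can be replaced in tests.

```python
import random


def backoff_delay(
    retries: int,
    base_delay: int = 60,
    jitter_fraction: float = 0.1,
    rng=random.uniform,
) -> int:
    """Seconds to wait before retry number `retries` (0-based)."""
    delay = base_delay * (2 ** retries)
    jitter = rng(0, jitter_fraction) * delay  # up to 10% extra by default
    return int(delay + jitter)


# With jitter disabled, retries 0 through 4 wait 60, 120, 240, 480, and 960 seconds
schedule = [backoff_delay(n, rng=lambda low, high: 0.0) for n in range(5)]
```

With `max_retries=5` and a 60-second base, the final retry waits at least 16 minutes, so a task can stay pending for roughly half an hour before it fails for good.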

Circuit Breaker Pattern for External Services

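import time


class CircuitBreakerOpenError(Exception):
    """Raised when a call is refused because the circuit is open"""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN
    
    def call(self, func, *args, **kwargs):
        if self.state == 'OPEN':
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = 'HALF_OPEN'  # Let one trial call through
            else:
                raise CircuitBreakerOpenError("Service unavailable")
        
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.record_failure()
            raise
        self.reset()
        return result
    
    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        # A failed trial call, or too many failures, opens the circuit
        if self.state == 'HALF_OPEN' or self.failure_count >= self.failure_threshold:
            self.state = 'OPEN'
    
    def reset(self):
        self.failure_count = 0
        self.state = 'CLOSED'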

Performance Optimization Strategies

Database Indexing Strategy

// MongoDB indexes for optimal query performance
db.jobs.createIndex({ "status": 1, "created_at": -1 }); // Job list queries
db.jobs.createIndex({ "client_id": 1 }); // Client-specific filtering
db.jobs.createIndex({ "review.reviewer_id": 1 }); // Reviewer workload
db.users.createIndex({ "email": 1 }, { unique: true }); // User authentication

Caching Strategy

from functools import lru_cache
from typing import List

# In-process LRU cache (per worker process, not shared through Redis)
@lru_cache(maxsize=1000)
def get_user_permissions(user_id: str) -> List[str]:
    """Cache user permissions; lru_cache has no time-based expiry, so entries
    persist until they are evicted or the cache is cleared"""
    return permission_service.get_permissions(user_id)

# Cache invalidation on user role changes
def update_user_role(user_id: str, new_role: UserRole):
    user_service.update_role(user_id, new_role)
    get_user_permissions.cache_clear()  # Clears every user's entry, not just this one
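
If a five-minute expiry is the goal, `functools.lru_cache` alone will not provide it, because it has no notion of time. The sketch below shows one way to add expiry in process. It is an assumption, not the platform's cache: `ttl_cache` is a hypothetical decorator, the store is unbounded, and it is still local to one worker rather than shared through Redis.

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds: float, clock=time.monotonic):
    """Cache results per positional-argument tuple for `ttl_seconds`."""
    def decorator(func):
        store = {}  # args -> (stored_at, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]
            value = func(*args)
            store[args] = (now, value)
            return value

        # Drop a single entry, e.g. after a role change for one user
        wrapper.invalidate = lambda *args: store.pop(args, None)
        return wrapper
    return decorator
```

Decorating the permission lookup with `@ttl_cache(300)` would bound staleness to five minutes, and `invalidate(user_id)` removes one user's entry without flushing everyone else's.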

Security Architecture

Multi-Layer Security Model

graph TB
    subgraph "Client Layer Security"
        CSP[Content Security Policy]
        CORS[CORS Configuration] 
        HTTPS[HTTPS Enforcement]
    end
    
    subgraph "API Layer Security"
        JWT[JWT Authentication]
        RBAC[Role-Based Access Control]
        RATE[Rate Limiting]
        INPUT[Input Validation]
    end
    
    subgraph "Data Layer Security"
        ENCRYPT[Data Encryption at Rest]
        SIGNED[Signed URLs]
        IAM[IAM Policies]
    end
    
    subgraph "Infrastructure Security"
        VPC[Private VPC]
        FIREWALL[Cloud Firewall]
        SECRETS[Secret Management]
    end
    
    CSP --> JWT
    CORS --> RBAC
    HTTPS --> RATE
    JWT --> ENCRYPT
    RBAC --> SIGNED
    RATE --> IAM
    INPUT --> VPC
    ENCRYPT --> FIREWALL
    SIGNED --> SECRETS

Authentication Security

Token Storage Strategy

  • Access Tokens: Stored in JavaScript memory (not localStorage) to limit token theft through XSS
  • Refresh Tokens: Stored in HttpOnly cookies to prevent JavaScript access
  • Token Rotation: Automatic rotation on refresh to limit exposure window
  • Secure Transmission: All tokens transmitted over HTTPS only

Password Security

# Strong password hashing with bcrypt
def hash_password(password: str) -> str:
    salt = bcrypt.gensalt(rounds=12)  # Computational cost of 2^12
    return bcrypt.hashpw(password.encode('utf-8'), salt).decode('utf-8')

def verify_password(password: str, hashed: str) -> bool:
    return bcrypt.checkpw(password.encode('utf-8'), hashed.encode('utf-8'))

Authorization Implementation

Role-Based Middleware

def require_roles(*allowed_roles: UserRole):
    """Dependency factory for endpoint authorization"""
    # Returns a dependency, so it can be passed to Depends() as in the usage example
    async def role_checker(current_user: User = Depends(get_current_user)) -> User:
        if current_user.role not in allowed_roles:
            raise HTTPException(
                status_code=403,
                detail=f"Access denied. Required roles: {allowed_roles}"
            )
        return current_user
    return role_checker

# Usage example
@router.delete("/jobs/{job_id}")
async def delete_job(
    job_id: str,
    current_user: User = Depends(require_roles(UserRole.ADMIN, UserRole.CLIENT))
):
    # Additional ownership check for clients
    if current_user.role == UserRole.CLIENT:
        job = await job_service.get_job(job_id)
        if job.client_id != current_user.id:
            raise HTTPException(status_code=404, detail="Job not found")
    
    await job_service.delete_job(job_id)

Data Protection Measures

Input Validation & Sanitization

class VttUpdateRequest(BaseModel):
    content: str = Field(..., max_length=100000)  # Limit size
    language: str = Field(..., regex=r'^[a-z]{2}(-[A-Z]{2})?$')  # ISO language codes
    
    @validator('content')
    def validate_vtt_format(cls, v):
        if not v.startswith('WEBVTT'):
            raise ValueError('Invalid VTT format')
        return bleach.clean(v, tags=[], strip=True)  # Remove any HTML

File Upload Security

ALLOWED_VIDEO_TYPES = {'video/mp4', 'video/quicktime', 'video/x-msvideo'}
MAX_FILE_SIZE = 5 * 1024 * 1024 * 1024  # 5GB

async def validate_upload(file: UploadFile):
    # Check file type
    if file.content_type not in ALLOWED_VIDEO_TYPES:
        raise HTTPException(400, "Invalid file type")
    
    # Check file size (UploadFile.size is None when the size is unknown)
    if file.size is not None and file.size > MAX_FILE_SIZE:
        raise HTTPException(400, "File too large")
    
    # Scan file header for additional validation
    header = await file.read(1024)
    await file.seek(0)  # Reset for actual upload
    
    if not is_valid_video_header(header):
        raise HTTPException(400, "Invalid video file")
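
The `is_valid_video_header` helper is not shown. A plausible version, given as an assumption, checks container signatures for the three allowed types. It confirms only that the file starts like an MP4, QuickTime, or AVI container, not that the streams inside are decodable.

```python
# Top-level atoms that can open an older QuickTime file lacking an 'ftyp' box
_LEGACY_QUICKTIME_ATOMS = {b"moov", b"mdat", b"wide", b"free", b"skip"}


def is_valid_video_header(header: bytes) -> bool:
    """Check the leading bytes for an MP4, QuickTime, or AVI signature."""
    if len(header) < 12:
        return False
    box_type = header[4:8]  # ISO base media: 4-byte size, then 4-byte type
    if box_type == b"ftyp":
        return True  # MP4 and modern QuickTime
    if box_type in _LEGACY_QUICKTIME_ATOMS:
        return True
    # AVI: RIFF container whose form type is 'AVI '
    return header[:4] == b"RIFF" and header[8:12] == b"AVI "
```

Since `content_type` is supplied by the client, this byte-level check is the part of `validate_upload` that does not depend on what the browser claims.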

Infrastructure Security

Google Cloud Security Configuration

# Cloud Run security settings
metadata:
  annotations:
    run.googleapis.com/ingress: all
    run.googleapis.com/ingress-status: all
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: "10"
        run.googleapis.com/execution-environment: gen2
        run.googleapis.com/cpu-throttling: "false"
    spec:
      containerConcurrency: 100
      timeoutSeconds: 300
      serviceAccountName: video-accessibility-runner@project.iam.gserviceaccount.com
      containers:
      - image: gcr.io/project/api:latest
        env:
        - name: ENVIRONMENT
          value: production
        resources:
          limits:
            cpu: 2000m
            memory: 4Gi
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000

Audit Logging

Security Event Logging

import logging
from datetime import datetime, timezone

logger = logging.getLogger(__name__)

class SecurityLogger:
    @staticmethod
    def log_authentication_attempt(email: str, success: bool, ip: str):
        logger.info(
            "Authentication attempt",
            extra={
                "event_type": "auth_attempt",
                "email": email,
                "success": success,
                "ip_address": ip,
                # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
                "timestamp": datetime.now(timezone.utc)
            }
        )

    @staticmethod
    def log_permission_denied(user_id: str, resource: str, action: str):
        logger.warning(
            "Permission denied",
            extra={
                "event_type": "permission_denied",
                "user_id": user_id,
                "resource": resource,
                "action": action,
                "timestamp": datetime.now(timezone.utc)
            }
        )
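
Fields passed through `extra` are attached to the log record but are not printed by the standard library's default formatter. The sketch below shows one way to emit them as JSON. The class name `JsonExtraFormatter` is hypothetical, and the platform may rely on a different formatter or on its OpenTelemetry pipeline instead.

```python
import json
import logging
from datetime import datetime

# Attributes every LogRecord carries; anything else arrived via `extra`.
_STANDARD_ATTRS = set(
    logging.LogRecord("", 0, "", 0, "", (), None).__dict__
) | {"message", "asctime"}


class JsonExtraFormatter(logging.Formatter):
    """Serialise the log message plus any `extra` fields as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for key, value in record.__dict__.items():
            if key in _STANDARD_ATTRS:
                continue
            # Datetimes are written in ISO 8601 form; other values pass through.
            payload[key] = value.isoformat() if isinstance(value, datetime) else value
        return json.dumps(payload, default=str)
```

Attaching it is a matter of calling `handler.setFormatter(JsonExtraFormatter())` on whichever handler receives the security logger's output.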

API Specifications

Authentication Endpoints

POST /api/v1/auth/login

Request Body:

{
  "email": "user@example.com",
  "password": "secure_password"
}

Response:

{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...",
  "user": {
    "id": "user_123",
    "email": "user@example.com", 
    "full_name": "John Doe",
    "role": "client"
  }
}

Response Headers:

Set-Cookie: refresh_token=<secure_token>; HttpOnly; Secure; SameSite=Lax; Path=/api/v1/auth/refresh; Max-Age=604800
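
As an illustration of how the attributes in this header fit together, the following sketch builds an equivalent `Set-Cookie` value with Python's standard library. The function name `build_refresh_cookie` is hypothetical; a FastAPI handler would more likely pass the same attributes to the response object's `set_cookie` method.

```python
from http.cookies import SimpleCookie

REFRESH_COOKIE_MAX_AGE = 7 * 24 * 60 * 60  # 604800 seconds = 7 days


def build_refresh_cookie(token: str) -> str:
    """Return the value of a Set-Cookie header for the refresh token."""
    cookie = SimpleCookie()
    cookie["refresh_token"] = token
    morsel = cookie["refresh_token"]
    morsel["httponly"] = True   # not readable from JavaScript
    morsel["secure"] = True     # sent over HTTPS only
    morsel["samesite"] = "Lax"
    morsel["path"] = "/api/v1/auth/refresh"  # only sent to the refresh endpoint
    morsel["max-age"] = REFRESH_COOKIE_MAX_AGE
    return morsel.OutputString()
```

Scoping `Path` to the refresh endpoint means the browser does not attach the refresh token to any other API call.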

POST /api/v1/auth/refresh

Headers:

Cookie: refresh_token=<refresh_token>

Response:

{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9..."
}

Job Management Endpoints

POST /api/v1/jobs

Content-Type: multipart/form-data

Form Fields:

  • title: string (required) - Job title
  • language: string (required) - Source video language (ISO 639-1)
  • target_languages: array of strings - Target languages for translation
  • captions_vtt: boolean - Generate closed captions
  • audio_description_vtt: boolean - Generate audio description script
  • audio_description_mp3: boolean - Generate audio description audio
  • transcreation_languages: array of strings - Languages requiring cultural adaptation
  • file: file (required) - Video file (MP4, MOV, AVI)

Response:

{
  "id": "job_123",
  "title": "Marketing Video Q3 2025",
  "status": "created",
  "source": {
    "filename": "marketing_q3.mp4",
    "original_filename": "Marketing Video - Final Cut.mp4",
    "duration_s": 180.5,
    "language": "en"
  },
  "requested_outputs": {
    "captions_vtt": true,
    "audio_description_vtt": true,
    "audio_description_mp3": true,
    "languages": ["es", "fr"],
    "transcreation": ["es"]
  },
  "created_at": "2025-08-24T10:00:00Z"
}

GET /api/v1/jobs

Query Parameters:

  • status: string (optional) - Filter by job status
  • mine: boolean (optional) - Show only user's jobs (clients only)
  • page: integer (optional, default: 1) - Page number
  • limit: integer (optional, default: 20) - Items per page

Response:

{
  "jobs": [
    {
      "id": "job_123",
      "title": "Marketing Video Q3 2025",
      "status": "pending_qc",
      "created_at": "2025-08-24T10:00:00Z",
      "client": {
        "id": "user_456",
        "full_name": "Jane Smith",
        "email": "jane@company.com"
      }
    }
  ],
  "total": 1,
  "page": 1,
  "limit": 20,
  "pages": 1
}
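
The `pages` field follows from `total` and `limit`. A minimal sketch of that arithmetic, together with the offset a database query would skip for a given page, is shown below; the function names are illustrative and not taken from the platform's code.

```python
import math


def page_count(total: int, limit: int = 20) -> int:
    """Number of pages needed to show `total` items, `limit` per page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return math.ceil(total / limit)


def skip_for(page: int, limit: int = 20) -> int:
    """Number of items to skip so that results start at 1-based `page`."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be at least 1")
    return (page - 1) * limit
```

With the values in the response above (`total` of 1, `limit` of 20), `page_count` returns 1.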

GET /api/v1/jobs/{job_id}

Response:

{
  "id": "job_123",
  "title": "Marketing Video Q3 2025", 
  "status": "pending_qc",
  "source": {
    "filename": "marketing_q3.mp4",
    "original_filename": "Marketing Video - Final Cut.mp4",
    "gcs_uri": "gs://accessible-video/job_123/source.mp4",
    "duration_s": 180.5,
    "language": "en"
  },
  "requested_outputs": {
    "captions_vtt": true,
    "audio_description_vtt": true,
    "audio_description_mp3": true,
    "languages": ["es", "fr"],
    "transcreation": ["es"]
  },
  "outputs": {
    "en": {
      "captions_vtt_gcs": "gs://accessible-video/job_123/en/captions.vtt",
      "ad_vtt_gcs": "gs://accessible-video/job_123/en/ad.vtt"
    }
  },
  "review": {
    "notes": "Content looks good, minor timing adjustments made",
    "reviewer_id": "reviewer_789",
    "history": [
      {
        "at": "2025-08-24T11:30:00Z",
        "status": "pending_qc",
        "by": "reviewer@company.com",
        "notes": "Assigned for QC review"
      }
    ]
  },
  "ai": {
    "confidence": 0.94,
    "ingestion_json": {
      "captions": [...],
      "audio_descriptions": [...] 
    }
  },
  "created_at": "2025-08-24T10:00:00Z",
  "updated_at": "2025-08-24T11:35:00Z"
}

Job Action Endpoints

POST /api/v1/jobs/{job_id}/actions/approve_english

Authorization: Reviewer or Admin role required

Request Body:

{
  "notes": "Content approved after minor timing adjustments"
}

Response:

{
  "message": "Job approved for translation processing",
  "job": {
    "id": "job_123",
    "status": "translating",
    "review": {
      "notes": "Content approved after minor timing adjustments",
      "reviewer_id": "reviewer_789"
    }
  }
}

POST /api/v1/jobs/{job_id}/actions/reject

Authorization: Reviewer or Admin role required

Request Body:

{
  "notes": "Audio quality is poor, please provide higher quality source video"
}

Response:

{
  "message": "Job rejected",
  "job": {
    "id": "job_123", 
    "status": "rejected",
    "review": {
      "notes": "Audio quality is poor, please provide higher quality source video",
      "reviewer_id": "reviewer_789"
    }
  }
}
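
The two actions above move a job from `pending_qc` to either `translating` or `rejected`. One way to enforce this is a transition table, sketched below. The enum, the table, and the assumption that both actions are valid only while a job is in `pending_qc` are illustrative; the platform's actual status model may define more states and transitions.

```python
from enum import Enum


class JobStatus(str, Enum):
    # Only the statuses that appear in the examples above.
    PENDING_QC = "pending_qc"
    TRANSLATING = "translating"
    REJECTED = "rejected"


# Assumed rule: both actions are only allowed while a job awaits QC.
TRANSITIONS = {
    ("approve_english", JobStatus.PENDING_QC): JobStatus.TRANSLATING,
    ("reject", JobStatus.PENDING_QC): JobStatus.REJECTED,
}


def apply_action(action: str, current: JobStatus) -> JobStatus:
    """Return the status that follows `action`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(action, current)]
    except KeyError:
        raise ValueError(
            f"Action {action!r} is not allowed from status {current.value!r}"
        ) from None
```

An endpoint could translate the `ValueError` into one of the error status codes listed later in this section.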

File Operation Endpoints

GET /api/v1/jobs/{job_id}/downloads

Authorization: Job owner, Reviewer, or Admin

Response:

{
  "expires_at": "2025-08-25T12:00:00Z",
  "downloads": {
    "source": {
      "url": "https://storage.googleapis.com/accessible-video/job_123/source.mp4?X-Goog-Algorithm=...",
      "filename": "marketing_q3.mp4",
      "size_bytes": 52428800
    },
    "outputs": {
      "en": {
        "captions_vtt": {
          "url": "https://storage.googleapis.com/accessible-video/job_123/en/captions.vtt?X-Goog-Algorithm=...",
          "filename": "captions_en.vtt",
          "size_bytes": 8192
        },
        "ad_vtt": {
          "url": "https://storage.googleapis.com/accessible-video/job_123/en/ad.vtt?X-Goog-Algorithm=...",
          "filename": "audio_description_en.vtt",  
          "size_bytes": 6144
        },
        "ad_mp3": {
          "url": "https://storage.googleapis.com/accessible-video/job_123/en/ad.mp3?X-Goog-Algorithm=...",
          "filename": "audio_description_en.mp3",
          "size_bytes": 2097152
        }
      },
      "es": {
        "captions_vtt": {
          "url": "https://storage.googleapis.com/accessible-video/job_123/es/captions.vtt?X-Goog-Algorithm=...",
          "filename": "captions_es.vtt",
          "size_bytes": 9216
        }
      }
    }
  }
}
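
The download URLs above are V4 signed URLs for objects that the job record stores as `gs://` URIs. The sketch below shows how such a URI could be split and signed with the `google-cloud-storage` client. The helper names and the one-hour default lifetime are assumptions, and signing requires credentials that are able to sign.

```python
from datetime import timedelta


def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a GCS URI: {uri}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"Incomplete GCS URI: {uri}")
    return bucket, object_name


def sign_download_url(uri: str, ttl: timedelta = timedelta(hours=1)) -> str:
    """Return a time-limited GET URL for the object behind `uri`."""
    # Imported lazily so split_gcs_uri works without the client library.
    from google.cloud import storage

    bucket_name, object_name = split_gcs_uri(uri)
    blob = storage.Client().bucket(bucket_name).blob(object_name)
    # V4 signing produces the X-Goog-Algorithm query parameters seen above.
    return blob.generate_signed_url(version="v4", expiration=ttl, method="GET")
```

V4 signed URLs can be valid for at most seven days, so `expires_at` in the response has to fall within that window.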

GET /api/v1/jobs/{job_id}/vtt

Authorization: Reviewer or Admin

Query Parameters:

  • language: string (required) - Language code (e.g., "en", "es")
  • type: string (required) - VTT type ("captions" or "audio_description")

Response:

{
  "content": "WEBVTT\n\n00:00:00.000 --> 00:00:05.000\nOpening scene with mountain landscape.\n\n00:00:05.000 --> 00:00:10.000\nNarrator introduces the topic of accessibility.",
  "language": "en",
  "type": "captions",
  "last_modified": "2025-08-24T11:35:00Z"
}

PATCH /api/v1/jobs/{job_id}/vtt

Authorization: Reviewer or Admin

Request Body:

{
  "content": "WEBVTT\n\n00:00:00.000 --> 00:00:05.000\nUpdated caption text here.\n\n00:00:05.000 --> 00:00:10.000\nNarrator introduces the topic of accessibility.",
  "language": "en",
  "type": "captions"
}

Response:

{
  "message": "VTT content updated successfully",
  "validation": {
    "valid": true,
    "cue_count": 2,
    "total_duration": "00:00:10.000"
  }
}
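
The `validation` object can be derived from the cue timing lines. The following sketch assumes that `total_duration` is the end time of the latest cue and that timestamps always use the `hh:mm:ss.ttt` form shown above; the function name `summarize_vtt` is hypothetical.

```python
import re

TIMING_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> (\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def _format_ms(total_ms: int) -> str:
    total_seconds, millis = divmod(total_ms, 1000)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def summarize_vtt(content: str) -> dict:
    """Count cues and report the end time of the last one."""
    cue_count, last_end = 0, 0
    valid = content.startswith("WEBVTT")
    for line in content.splitlines():
        match = TIMING_RE.match(line)
        if not match:
            continue
        groups = match.groups()
        start, end = _to_ms(*groups[:4]), _to_ms(*groups[4:])
        if end <= start:
            valid = False  # a cue must end after it starts
        cue_count += 1
        last_end = max(last_end, end)
    return {
        "valid": valid and cue_count > 0,
        "cue_count": cue_count,
        "total_duration": _format_ms(last_end),
    }
```

WebVTT also permits timestamps without an hours component, which this sketch does not recognise as cues.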

Error Response Format

All API endpoints return consistent error responses:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Request validation failed",
    "details": [
      {
        "field": "email",
        "message": "Invalid email format"
      }
    ]
  },
  "request_id": "req_123456789"
}
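
A small helper can keep this envelope consistent across handlers. The sketch below is illustrative: the function name and the use of a random UUID for `request_id` are assumptions, and the real service may take the request ID from tracing middleware instead.

```python
import uuid
from typing import Optional


def build_error_response(
    code: str,
    message: str,
    details: Optional[list] = None,
) -> dict:
    """Build the error envelope shown above as a plain dictionary."""
    return {
        "error": {
            "code": code,
            "message": message,
            "details": details or [],  # one entry per offending field, if any
        },
        # Assumed format: "req_" followed by a random hex string.
        "request_id": f"req_{uuid.uuid4().hex}",
    }
```

For the example above, the call would be `build_error_response("VALIDATION_ERROR", "Request validation failed", [{"field": "email", "message": "Invalid email format"}])`.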

Common HTTP Status Codes:

  • 200 - Success
  • 201 - Created
  • 400 - Bad Request (validation errors)
  • 401 - Unauthorized (invalid/expired token)
  • 403 - Forbidden (insufficient permissions)
  • 404 - Not Found
  • 409 - Conflict (duplicate resource)
  • 422 - Unprocessable Entity (business logic error)
  • 429 - Too Many Requests (rate limited)
  • 500 - Internal Server Error

Conclusion

This video accessibility platform pairs AI-generated captions, audio descriptions, and translations with a human quality-control workflow. The architecture is designed around scalability, security, and user experience, and it supports accessibility standards and multiple output languages.

Key architectural strengths include:

  • Scalable Processing: Async FastAPI backend with distributed Celery workers
  • Robust AI Integration: Self-healing AI response parsing with confidence scoring
  • Enterprise Security: Multi-layered security with JWT, RBAC, and signed URLs
  • Quality Assurance: Human-in-the-loop workflows with comprehensive audit trails
  • Developer Experience: Full TypeScript integration and comprehensive API documentation
  • Operational Excellence: Complete observability with OpenTelemetry, metrics, and error tracking

Together, these choices automate most of the video accessibility workflow while leaving room for per-client requirements and human review of output quality.