video-accessibility/CLAUDE.md
2025-08-24 16:28:33 -05:00

Accessible Video Processing Platform - Development Guide

Project Overview

This platform uses AI to generate closed captions and audio descriptions for uploaded videos. It routes the English output through human quality-control review, then produces translated captions and audio descriptions in additional languages.

Core Tech Stack:

  • Frontend: React 18 + Vite SPA (TypeScript)
  • Backend: FastAPI + Celery workers (Python 3.11+)
  • Database: MongoDB Atlas
  • Storage: Google Cloud Storage with signed URLs
  • AI: Gemini 2.5 Pro, Google Cloud Translate, ElevenLabs TTS
  • Queue: Redis + Celery
  • Auth: JWT with HttpOnly refresh cookies

Development Instructions

CRITICAL: Always Read the Full Development Plan

Before starting any development work, ALWAYS read the entire video_accessibility_development_plan.txt file. This document contains:

  • Complete technical specifications
  • API contracts and schemas
  • Database models and indexes
  • Worker pipeline details
  • Frontend component specifications
  • Security requirements
  • Testing strategies

The development plan is the authoritative source for all implementation details. Refer to it frequently during development to ensure consistency with the overall architecture.

Key Implementation Phases

Phase 1: Foundation & Setup

  • Monorepo structure (backend/, frontend/, infra/)
  • FastAPI backend initialization
  • React + Vite frontend setup
  • MongoDB and Redis configuration
  • JWT authentication with RBAC

Phase 2: Core Services

  • Google Cloud Storage integration
  • Gemini 2.5 Pro service
  • Job model with state machine
  • Celery worker infrastructure

Phase 3: Ingestion & AI Pipeline

  • Video upload system
  • Ingestion worker task
  • VTT generation
  • Gemini prompt system

Phase 4: Quality Control System

  • VTT editor component
  • QC dashboard for reviewers
  • Approval/rejection workflow
  • Video player with captions

Phase 5: Translation & TTS Pipeline

  • Google Cloud Translate integration
  • Transcreation system
  • Translation worker
  • TTS service integration

Phase 6: Final Review & Delivery

  • Final review interface
  • Job completion workflow
  • Email notifications
  • Client download portal

Phase 7: Production Readiness

  • Comprehensive testing
  • Security hardening
  • Observability setup
  • CI/CD configuration

Job Status State Machine

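created → ingesting → ai_processing → pending_qc → approved_english → translating → tts_generating → pending_final_review → completed

At pending_qc the reviewer's decision branches. Approval moves the job to approved_english and on into translation. Rejection moves it to rejected instead.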
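Below is a minimal sketch of how the job model could enforce these transitions. It assumes statuses are stored as the lowercase strings listed above. The sketch treats pending_qc as the branch point between approved_english and rejected, and only approved_english continues to translating. The guide does not say what happens after a rejection, so rejected has no outgoing transitions here. JobStatus, ALLOWED_TRANSITIONS, InvalidTransition, and transition are hypothetical names, not existing project code.

```python
from enum import Enum


class JobStatus(str, Enum):
    CREATED = "created"
    INGESTING = "ingesting"
    AI_PROCESSING = "ai_processing"
    PENDING_QC = "pending_qc"
    APPROVED_ENGLISH = "approved_english"
    REJECTED = "rejected"
    TRANSLATING = "translating"
    TTS_GENERATING = "tts_generating"
    PENDING_FINAL_REVIEW = "pending_final_review"
    COMPLETED = "completed"


# Allowed next states per status; pending_qc is the only branch point.
# Assumption: rejected is given no outgoing transitions because the guide
# does not specify what follows a rejection.
ALLOWED_TRANSITIONS: dict[JobStatus, frozenset[JobStatus]] = {
    JobStatus.CREATED: frozenset({JobStatus.INGESTING}),
    JobStatus.INGESTING: frozenset({JobStatus.AI_PROCESSING}),
    JobStatus.AI_PROCESSING: frozenset({JobStatus.PENDING_QC}),
    JobStatus.PENDING_QC: frozenset({JobStatus.APPROVED_ENGLISH, JobStatus.REJECTED}),
    JobStatus.APPROVED_ENGLISH: frozenset({JobStatus.TRANSLATING}),
    JobStatus.REJECTED: frozenset(),
    JobStatus.TRANSLATING: frozenset({JobStatus.TTS_GENERATING}),
    JobStatus.TTS_GENERATING: frozenset({JobStatus.PENDING_FINAL_REVIEW}),
    JobStatus.PENDING_FINAL_REVIEW: frozenset({JobStatus.COMPLETED}),
    JobStatus.COMPLETED: frozenset(),
}


class InvalidTransition(ValueError):
    """Raised when a job is asked to move to a state it cannot reach directly."""


def transition(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return target if current -> target is allowed, otherwise raise InvalidTransition."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the table as data, rather than as scattered if-statements, makes it easy to assert in a unit test that every status has an entry. With MongoDB, a worker could also include the expected current status in its update filter, so two workers cannot advance the same job concurrently.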

Key Architecture Decisions

Security

  • Access tokens stored in memory (not localStorage)
  • Refresh tokens in HttpOnly cookies
  • RBAC enforcement server-side
  • Signed URLs for file access (24h expiry)
  • Audit logs for all reviewer actions
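The next block is a framework-agnostic sketch that combines server-side RBAC with the audit requirement. The role names (client, reviewer), the action names, and the in-memory AuditLog are hypothetical stand-ins. In the real service the role would come from the verified JWT claims on the server, never from a client-supplied field, and audit entries would be persisted, for example to MongoDB. Only reviewer actions are recorded, matching the audit bullet above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role -> permitted actions map; the guide only requires that
# RBAC is enforced server-side, not these specific names.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "client": frozenset({"upload_video", "download_assets"}),
    "reviewer": frozenset({"approve_qc", "reject_qc", "edit_vtt"}),
}

# Every reviewer action must leave an audit trail.
AUDITED_ACTIONS = ROLE_PERMISSIONS["reviewer"]


@dataclass
class AuditEntry:
    user_id: str
    action: str
    job_id: str
    at: datetime


@dataclass
class AuditLog:
    """In-memory stand-in for a persistent audit collection."""
    entries: list[AuditEntry] = field(default_factory=list)


class Forbidden(PermissionError):
    """Raised when a role is not permitted to perform an action."""


def authorize(user_id: str, role: str, action: str, job_id: str, audit: AuditLog) -> None:
    """Raise Forbidden unless the role grants the action; record audited actions."""
    if action not in ROLE_PERMISSIONS.get(role, frozenset()):
        raise Forbidden(f"role {role!r} may not perform {action!r}")
    if action in AUDITED_ACTIONS:
        audit.entries.append(AuditEntry(user_id, action, job_id, datetime.now(timezone.utc)))
```

In FastAPI, a check like authorize would typically run inside a dependency attached to every route, so that no endpoint can skip it.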

Data Flow

  1. Client uploads MP4 → GCS + MongoDB record
  2. Celery worker processes video with Gemini 2.5 Pro
  3. Generates English captions.vtt and audio_description.vtt
  4. Reviewer QC approval triggers translation pipeline
  5. Multi-language assets generated (translate/transcreate + TTS)
  6. Final review and client notification with download links

File Structure

gs://accessible-video/{jobId}/
  source.mp4
  en/
    captions.vtt
    ad.vtt
    ad.mp3
  {lang}/
    captions.vtt
    ad.vtt
    ad.mp3
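Building these paths in one place keeps the ingestion worker, the translation worker, and the download portal from assembling object names by hand. This is a minimal sketch that assumes language folders are named with codes such as en; the helper names are hypothetical. The object_name form, without the gs:// prefix, is what the GCS client libraries take as a blob name.

```python
from typing import Literal

BUCKET = "accessible-video"

# The three per-language assets listed in the layout above.
AssetKind = Literal["captions.vtt", "ad.vtt", "ad.mp3"]


def _check_segment(value: str, what: str) -> str:
    """Reject empty values and anything that could escape the job folder."""
    if not value or "/" in value or value in {".", ".."}:
        raise ValueError(f"invalid {what}: {value!r}")
    return value


def object_name(job_id: str, lang: str, kind: AssetKind) -> str:
    """Object name inside the bucket, e.g. 'job123/en/captions.vtt'."""
    return f"{_check_segment(job_id, 'job id')}/{_check_segment(lang, 'language')}/{kind}"


def source_uri(job_id: str) -> str:
    """gs:// URI of the uploaded source video."""
    return f"gs://{BUCKET}/{_check_segment(job_id, 'job id')}/source.mp4"


def asset_uri(job_id: str, lang: str, kind: AssetKind) -> str:
    """gs:// URI of a per-language asset."""
    return f"gs://{BUCKET}/{object_name(job_id, lang, kind)}"
```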

Development Guidelines

Before Each Session

  1. Read the complete video_accessibility_development_plan.txt
  2. Review the current todo list and phase
  3. Check existing code patterns and conventions
  4. Understand the security and accessibility requirements

Code Standards

  • Follow existing patterns in the codebase
  • Handle errors explicitly and retry transient failures (for example, timeouts from external AI and storage APIs)
  • Add OpenTelemetry tracing for observability
  • Ensure RBAC is enforced on all endpoints
  • Validate all VTT outputs for correctness
  • Write unit tests for all services and utilities

Testing Requirements

  • Unit tests with ≥80% coverage for services and utilities
  • Integration tests with mocked AI services
  • E2E tests for complete workflows
  • Performance testing for video processing
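One way to keep integration tests independent of Gemini is to pass the AI client into the task as a parameter and substitute an autospecced mock. The GeminiCaptioner class and ingest_job function below are hypothetical stand-ins for the project's real service and Celery task body. The mocking uses the standard library's unittest.mock.

```python
from unittest import mock


class GeminiCaptioner:
    """Hypothetical service wrapper; the real one would call the Gemini API."""

    def generate_captions(self, video_uri: str) -> str:
        raise RuntimeError("network call not allowed in tests")


def ingest_job(job_id: str, captioner: GeminiCaptioner) -> dict[str, str]:
    """Hypothetical task body: request captions and return the asset map."""
    vtt = captioner.generate_captions(f"gs://accessible-video/{job_id}/source.mp4")
    if not vtt.startswith("WEBVTT"):
        raise ValueError("model output is not WebVTT")
    return {"en/captions.vtt": vtt}


def test_ingest_job_with_mocked_gemini() -> None:
    # create_autospec copies GeminiCaptioner's method signatures onto the mock.
    fake = mock.create_autospec(GeminiCaptioner, instance=True)
    fake.generate_captions.return_value = "WEBVTT\n\n00:00:00.000 --> 00:00:01.000\nHi\n"
    assets = ingest_job("job123", fake)
    fake.generate_captions.assert_called_once_with("gs://accessible-video/job123/source.mp4")
    assert assets["en/captions.vtt"].startswith("WEBVTT")
```

create_autospec makes the mock reject calls that do not match the real method signature. As a result, renaming or re-parameterizing a service method breaks the test instead of letting it pass silently.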

Lint/Type Check Commands

  • Backend: ruff check . and mypy .
  • Frontend: npm run lint and npm run type-check

Important Files to Reference

  • video_accessibility_development_plan.txt - Complete specification
  • Backend schemas in section 17 of the plan
  • API design in section 7 of the plan
  • Frontend component specs in section 10 of the plan
  • Security requirements in section 11 of the plan

Risk Mitigations

  • Invalid JSON from AI models: Pydantic validation + self-heal prompts
  • Timestamp drift: Preserve cue timings in translations
  • TTS alignment: Per-cue synthesis with crossfades
  • Queue backlog: Autoscaling workers with monitoring
  • Security: Secret Manager, least-privilege IAM, no client secrets
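For the timestamp-drift mitigation, one approach is to send only cue text to the translator and copy every header, identifier, and timing line through unchanged. In this sketch, translate is any str -> str callable standing in for Google Cloud Translate or the transcreation step, and translate_vtt is a hypothetical name. It assumes blocks are separated by blank lines, as WebVTT requires.

```python
import re
from collections.abc import Callable


def translate_vtt(vtt: str, translate: Callable[[str], str]) -> str:
    """Translate cue text only; header, identifiers, and timing lines are copied verbatim."""
    blocks = re.split(r"\n{2,}", vtt.replace("\r\n", "\n").strip("\n"))
    out: list[str] = []
    for i, block in enumerate(blocks):
        lines = block.split("\n")
        # A cue's timing line is its first line, or its second if it has an identifier.
        timing_idx = next((j for j, line in enumerate(lines[:2]) if "-->" in line), None)
        if i == 0 or timing_idx is None:
            out.append(block)  # WEBVTT header, NOTE, STYLE, REGION blocks pass through
            continue
        head = lines[: timing_idx + 1]  # optional identifier + timing line, untouched
        cue_text = "\n".join(lines[timing_idx + 1 :])
        out.append("\n".join([*head, translate(cue_text)]))
    return "\n\n".join(out) + "\n"
```

Translating cue by cue keeps timings exact but gives the translator less context per call. A natural refinement is to batch the cue texts into one request and split the results back out.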