# Accessible Video Processing Platform - Development Guide

## Project Overview

This is a comprehensive video accessibility platform that automatically generates closed captions and audio descriptions using AI, with quality control workflows and multi-language support.

**Core Tech Stack:**
- Frontend: React 18 + Vite SPA (TypeScript)
- Backend: FastAPI + Celery workers (Python 3.11+)
- Database: MongoDB Atlas
- Storage: Google Cloud Storage with signed URLs
- AI: Gemini 2.5 Pro, Google Cloud Translate, ElevenLabs TTS
- Queue: Redis + Celery
- Auth: JWT with HttpOnly refresh cookies

## Development Instructions

### CRITICAL: Always Read the Full Development Plan

**Before starting any development work, ALWAYS read the entire `video_accessibility_development_plan.txt` file.**

This document contains:
- Complete technical specifications
- API contracts and schemas
- Database models and indexes
- Worker pipeline details
- Frontend component specifications
- Security requirements
- Testing strategies

The development plan is the authoritative source for all implementation details. Refer to it frequently during development to ensure consistency with the overall architecture.

## Key Implementation Phases

### Phase 1: Foundation & Setup
- Monorepo structure (backend/, frontend/, infra/)
- FastAPI backend initialization
- React + Vite frontend setup
- MongoDB and Redis configuration
- JWT authentication with RBAC

### Phase 2: Core Services
- Google Cloud Storage integration
- Gemini 2.5 Pro service
- Job model with state machine
- Celery worker infrastructure

### Phase 3: Ingestion & AI Pipeline
- Video upload system
- Ingestion worker task
- VTT generation
- Gemini prompt system

### Phase 4: Quality Control System
- VTT editor component
- QC dashboard for reviewers
- Approval/rejection workflow
- Video player with captions

### Phase 5: Translation & TTS Pipeline
- Google Cloud Translate integration
- Transcreation system
- Translation worker
- TTS service integration

### Phase 6: Final Review & Delivery
- Final review interface
- Job completion workflow
- Email notifications
- Client download portal

### Phase 7: Production Readiness
- Comprehensive testing
- Security hardening
- Observability setup
- CI/CD configuration

## Job Status State Machine

```
created → ingesting → ai_processing → translating → tts_generating → rendering_video → pending_qc → pending_final_review → completed ↓ rejected
```

## Key Architecture Decisions

### Security
- Access tokens stored in memory (not localStorage)
- Refresh tokens in HttpOnly cookies
- RBAC enforcement server-side
- Signed URLs for file access (24h expiry)
- Audit logs for all reviewer actions

### Data Flow
1. Client uploads MP4 → GCS + MongoDB record
2. Celery worker processes video with Gemini 2.5 Pro
3. Generates captions.vtt and audio_description.vtt for source language
4. Translation, TTS synthesis, and accessible video rendering run automatically
5. Job enters QC Review for reviewer approval (edits can trigger re-rendering)
6. QC approval moves job directly to Final Review
7. Final review and client notification with download links

### File Structure

```
gs://accessible-video/{jobId}/
  source.mp4
  en/
    captions.vtt
    ad.vtt
    ad.mp3
  {lang}/
    captions.vtt
    ad.vtt
    ad.mp3
```

## Development Guidelines

### Before Each Session
1. Read the complete `video_accessibility_development_plan.txt`
2. Review the current todo list and phase
3. Check existing code patterns and conventions
4. Understand the security and accessibility requirements

### Code Standards
- Follow existing patterns in the codebase
- Implement proper error handling and retries
- Add OpenTelemetry tracing for observability
- Ensure RBAC is enforced on all endpoints
- Validate all VTT outputs for correctness
- Write unit tests for all services and utilities

### Testing Requirements
- Unit tests ≥80% coverage for services/utils
- Integration tests with mocked AI services
- E2E tests for complete workflows
- Performance testing for video processing

### Lint/Type Check Commands
- Backend: `ruff check .` and `mypy .`
- Frontend: `npm run lint` and `npm run type-check`

## Important Files to Reference

- `video_accessibility_development_plan.txt` - Complete specification
- Backend schemas in section 17 of the plan
- API design in section 7 of the plan
- Frontend component specs in section 10 of the plan
- Security requirements in section 11 of the plan

## Risk Mitigations

- Invalid JSON from AI models: Pydantic validation + self-heal prompts
- Timestamp drift: Preserve cue timings in translations
- TTS alignment: Per-cue synthesis with crossfades
- Queue backlog: Autoscaling workers with monitoring
- Security: Secret Manager, least-privilege IAM, no client secrets

## Knowledge Wiki

A cross-project knowledge base is maintained automatically from all Claude Code sessions.

- **Index:** `/Users/aimpress/Library/Mobile Documents/iCloud~md~obsidian/Documents/VadymSamoilenko/wiki/index.md`
- **Query:** `cd ~/.claude/memory-compiler && uv run python scripts/query.py "your question"`
- Every session in this project automatically feeds the knowledge base.
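
## Example: Job Status Transitions

The chain in the Job Status State Machine section could be enforced in the backend roughly as follows. This is a minimal sketch, not the plan's actual Job model: the names `JobStatus`, `PIPELINE`, `REJECTABLE`, `allowed_transitions`, and `transition` are hypothetical. The diagram does not make clear which states can lead to `rejected`, so the sketch assumes rejection is possible only from the two review states. It also does not model QC edits that trigger re-rendering.

```python
from enum import Enum


class JobStatus(str, Enum):
    """Status values taken from the state machine diagram."""

    CREATED = "created"
    INGESTING = "ingesting"
    AI_PROCESSING = "ai_processing"
    TRANSLATING = "translating"
    TTS_GENERATING = "tts_generating"
    RENDERING_VIDEO = "rendering_video"
    PENDING_QC = "pending_qc"
    PENDING_FINAL_REVIEW = "pending_final_review"
    COMPLETED = "completed"
    REJECTED = "rejected"


# Forward path, in the order shown in the diagram.
PIPELINE = [
    JobStatus.CREATED,
    JobStatus.INGESTING,
    JobStatus.AI_PROCESSING,
    JobStatus.TRANSLATING,
    JobStatus.TTS_GENERATING,
    JobStatus.RENDERING_VIDEO,
    JobStatus.PENDING_QC,
    JobStatus.PENDING_FINAL_REVIEW,
    JobStatus.COMPLETED,
]

# ASSUMPTION: only the review states may move to "rejected".
REJECTABLE = {JobStatus.PENDING_QC, JobStatus.PENDING_FINAL_REVIEW}


def allowed_transitions(status: JobStatus) -> set[JobStatus]:
    """Return the statuses that may directly follow `status`."""
    targets: set[JobStatus] = set()
    if status in PIPELINE:
        i = PIPELINE.index(status)
        if i + 1 < len(PIPELINE):
            targets.add(PIPELINE[i + 1])
    if status in REJECTABLE:
        targets.add(JobStatus.REJECTED)
    return targets


def transition(current: JobStatus, new: JobStatus) -> JobStatus:
    """Validate a status change and return the new status."""
    if new not in allowed_transitions(current):
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

A worker or API handler would call `transition(job_status, JobStatus.INGESTING)` before persisting the change, so an out-of-order update raises `ValueError` instead of silently corrupting the job record. Keeping the forward path in a single list means the allowed order is defined in one place and can be checked against the plan.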