michael af2562096a initial commit

2025-08-24 16:28:33 -05:00

4.6 KiB

Accessible Video Processing Platform - Development Guide

Project Overview

This platform uses AI to automatically generate closed captions and audio descriptions for video, with reviewer quality-control workflows and multi-language support.

Core Tech Stack:

Frontend: React 18 + Vite SPA (TypeScript)
Backend: FastAPI + Celery workers (Python 3.11+)
Database: MongoDB Atlas
Storage: Google Cloud Storage with signed URLs
AI: Gemini 2.5 Pro, Google Cloud Translate, ElevenLabs TTS
Queue: Redis + Celery
Auth: JWT with HttpOnly refresh cookies

Development Instructions

CRITICAL: Always Read the Full Development Plan

Before starting any development work, ALWAYS read the entire video_accessibility_development_plan.txt file. This document contains:

Complete technical specifications
API contracts and schemas
Database models and indexes
Worker pipeline details
Frontend component specifications
Security requirements
Testing strategies

The development plan is the authoritative source for all implementation details. Refer to it frequently during development to ensure consistency with the overall architecture.

Key Implementation Phases

Phase 1: Foundation & Setup

Monorepo structure (backend/, frontend/, infra/)
FastAPI backend initialization
React + Vite frontend setup
MongoDB and Redis configuration
JWT authentication with RBAC

Phase 2: Core Services

Google Cloud Storage integration
Gemini 2.5 Pro service
Job model with state machine
Celery worker infrastructure

Phase 3: Ingestion & AI Pipeline

Video upload system
Ingestion worker task
VTT generation
Gemini prompt system

Phase 4: Quality Control System

VTT editor component
QC dashboard for reviewers
Approval/rejection workflow
Video player with captions

Phase 5: Translation & TTS Pipeline

Google Cloud Translate integration
Transcreation system
Translation worker
TTS service integration

Phase 6: Final Review & Delivery

Final review interface
Job completion workflow
Email notifications
Client download portal

Phase 7: Production Readiness

Comprehensive testing
Security hardening
Observability setup
CI/CD configuration

Job Status State Machine

created → ingesting → ai_processing → pending_qc → approved_english → translating → tts_generating → pending_final_review → completed
pending_qc → rejected (QC rejection branches off here; only approved_english continues to translating)
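
The state line above can be enforced with a small transition table. The following is a minimal sketch, not the plan's Job model: the names JobStatus, ALLOWED_TRANSITIONS, InvalidTransition, and transition are hypothetical, and it assumes that rejected is a branch out of pending_qc with no outgoing transition (the development plan may define a rework path).

```python
from enum import Enum


class JobStatus(str, Enum):
    CREATED = "created"
    INGESTING = "ingesting"
    AI_PROCESSING = "ai_processing"
    PENDING_QC = "pending_qc"
    APPROVED_ENGLISH = "approved_english"
    REJECTED = "rejected"
    TRANSLATING = "translating"
    TTS_GENERATING = "tts_generating"
    PENDING_FINAL_REVIEW = "pending_final_review"
    COMPLETED = "completed"


# Allowed direct transitions, read off the state line in this guide.
# Assumption: "rejected" and "completed" have no outgoing transitions.
ALLOWED_TRANSITIONS: dict[JobStatus, frozenset[JobStatus]] = {
    JobStatus.CREATED: frozenset({JobStatus.INGESTING}),
    JobStatus.INGESTING: frozenset({JobStatus.AI_PROCESSING}),
    JobStatus.AI_PROCESSING: frozenset({JobStatus.PENDING_QC}),
    JobStatus.PENDING_QC: frozenset(
        {JobStatus.APPROVED_ENGLISH, JobStatus.REJECTED}
    ),
    JobStatus.APPROVED_ENGLISH: frozenset({JobStatus.TRANSLATING}),
    JobStatus.REJECTED: frozenset(),
    JobStatus.TRANSLATING: frozenset({JobStatus.TTS_GENERATING}),
    JobStatus.TTS_GENERATING: frozenset({JobStatus.PENDING_FINAL_REVIEW}),
    JobStatus.PENDING_FINAL_REVIEW: frozenset({JobStatus.COMPLETED}),
    JobStatus.COMPLETED: frozenset(),
}


class InvalidTransition(ValueError):
    """Raised when a job is asked to move to a state it cannot reach directly."""


def transition(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return the target status if the move is allowed, otherwise raise."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(
            f"{current.value} -> {target.value} is not allowed"
        )
    return target
```

Keeping the table in one place means API handlers and Celery tasks can share the same check instead of each comparing status strings by hand.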

Key Architecture Decisions

Security

Access tokens stored in memory (not localStorage)
Refresh tokens in HttpOnly cookies
RBAC enforcement server-side
Signed URLs for file access (24h expiry)
Audit logs for all reviewer actions
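
As an illustration of server-side RBAC combined with reviewer audit logging, here is a framework-free sketch. Everything in it is an assumption made for the example: the role names, the permission strings, the User type, and the in-memory AUDIT_LOG list (a stand-in for a persistent store) are hypothetical, and the real role matrix should come from the development plan.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "client": frozenset({"job:create", "job:download"}),
    "reviewer": frozenset({"job:review"}),
    "admin": frozenset({"job:create", "job:download", "job:review"}),
}

# Stand-in for a persistent audit collection.
AUDIT_LOG: list[dict] = []


@dataclass(frozen=True)
class User:
    user_id: str
    role: str


class Forbidden(PermissionError):
    """Raised when the caller's role lacks the required permission."""


def require_permission(user: User, permission: str) -> None:
    """Check the permission on the server, never trusting client claims."""
    if permission not in ROLE_PERMISSIONS.get(user.role, frozenset()):
        raise Forbidden(f"role {user.role!r} lacks {permission!r}")


def approve_job(user: User, job_id: str) -> None:
    """Approve a job; the audit entry is written only after the check passes."""
    require_permission(user, "job:review")
    AUDIT_LOG.append(
        {
            "job_id": job_id,
            "actor": user.user_id,
            "action": "approve",
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
```

In a FastAPI application the same check would typically live in a dependency so that every endpoint declares the permission it needs.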

Data Flow

Client uploads MP4 → GCS + MongoDB record
Celery worker processes video with Gemini 2.5 Pro
Generates English captions.vtt and the audio description VTT (called audio_description.vtt here; stored as en/ad.vtt in the File Structure section)
Reviewer QC approval triggers translation pipeline
Multi-language assets generated (translate/transcreate + TTS)
Final review and client notification with download links

File Structure

gs://accessible-video/{jobId}/
  source.mp4
  en/
    captions.vtt
    ad.vtt
    ad.mp3
  {lang}/
    captions.vtt
    ad.vtt
    ad.mp3
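
Object names for this layout can be built in one place so that workers and the API never assemble paths by hand. This is a sketch under assumptions: the helper names (source_path, asset_path, gs_uri), the asset keys, and the character whitelist for job IDs and language codes are all hypothetical; only the bucket name and file names come from the layout above.

```python
import re

BUCKET = "accessible-video"  # bucket name taken from the layout in this guide

# Asset keys are hypothetical; file names match the layout.
ASSET_FILES = {
    "captions": "captions.vtt",
    "ad_text": "ad.vtt",
    "ad_audio": "ad.mp3",
}

# Assumption: job IDs and language codes use only these characters.
_SAFE_SEGMENT = re.compile(r"^[A-Za-z0-9_-]+$")


def _check(segment: str, label: str) -> str:
    """Reject segments that could escape the job's prefix (e.g. '../')."""
    if not _SAFE_SEGMENT.match(segment):
        raise ValueError(f"unsafe {label}: {segment!r}")
    return segment


def source_path(job_id: str) -> str:
    return f"{_check(job_id, 'job_id')}/source.mp4"


def asset_path(job_id: str, lang: str, asset: str) -> str:
    if asset not in ASSET_FILES:
        raise ValueError(f"unknown asset: {asset!r}")
    return f"{_check(job_id, 'job_id')}/{_check(lang, 'lang')}/{ASSET_FILES[asset]}"


def gs_uri(object_name: str) -> str:
    return f"gs://{BUCKET}/{object_name}"
```

For example, asset_path("job123", "es", "captions") yields "job123/es/captions.vtt", which gs_uri turns into the full gs:// location.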

Development Guidelines

Before Each Session

Read the complete video_accessibility_development_plan.txt
Review the current todo list and phase
Check existing code patterns and conventions
Understand the security and accessibility requirements

Code Standards

Follow existing patterns in the codebase
Implement proper error handling and retries
Add OpenTelemetry tracing for observability
Ensure RBAC is enforced on all endpoints
Validate all VTT outputs for correctness
Write unit tests for all services and utilities
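
To make "validate all VTT outputs" concrete, the sketch below runs a few basic structural checks with the standard library only. It is not a full WebVTT parser and not the plan's validator: validate_vtt is a hypothetical name, and the checks (header present, well-formed timing lines, end after start, non-decreasing start times, non-empty cue text) are an assumed minimum.

```python
import re

# Timestamp: optional hours, then MM:SS.mmm
_TS = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
# Timing line: start --> end, optionally followed by cue settings
_TIMING = re.compile(rf"^{_TS}\s+-->\s+{_TS}(?:\s+.*)?$")


def _to_ms(hours, minutes, seconds, millis) -> int:
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def validate_vtt(text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or not (
        lines[0] == "WEBVTT" or lines[0].startswith(("WEBVTT ", "WEBVTT\t"))
    ):
        return ["file must start with the WEBVTT header"]

    errors: list[str] = []
    previous_start = -1
    for number, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue
        match = _TIMING.match(line.strip())
        if match is None:
            errors.append(f"line {number}: malformed timing line")
            continue
        groups = match.groups()
        start, end = _to_ms(*groups[:4]), _to_ms(*groups[4:])
        if end <= start:
            errors.append(f"line {number}: cue must end after it starts")
        if start < previous_start:
            errors.append(f"line {number}: cue starts before the previous cue")
        previous_start = max(previous_start, start)
        # lines[number] is the line right after the timing line (1-based index).
        if number >= len(lines) or not lines[number].strip():
            errors.append(f"line {number}: cue has no text")
    return errors
```

A worker could call validate_vtt on each generated file and refuse to upload it when the returned list is non-empty.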

Testing Requirements

Unit tests ≥80% coverage for services/utils
Integration tests with mocked AI services
E2E tests for complete workflows
Performance testing for video processing

Lint/Type Check Commands

Backend: ruff check . and mypy .
Frontend: npm run lint and npm run type-check

Important Files to Reference

video_accessibility_development_plan.txt - Complete specification
Backend schemas in section 17 of the plan
API design in section 7 of the plan
Frontend component specs in section 10 of the plan
Security requirements in section 11 of the plan

Risk Mitigations

Invalid JSON from AI models: Pydantic validation + self-heal prompts
Timestamp drift: Preserve cue timings in translations
TTS alignment: Per-cue synthesis with crossfades
Queue backlog: Autoscaling workers with monitoring
Security: Secret Manager, least-privilege IAM, no client secrets
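
The first mitigation (validation plus self-heal prompts) can be sketched as a retry loop that feeds the validation error back to the model. This example deliberately swaps Pydantic for a hand-written standard-library check so that it runs without dependencies. The cue schema (start, end, text), the function names, and the generate callable that stands in for the Gemini client are all hypothetical.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"start", "end", "text"}  # hypothetical cue schema


def parse_cues(raw: str) -> list[dict]:
    """Parse and validate model output; raise ValueError on any problem."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of cues")
    for index, cue in enumerate(data):
        if not isinstance(cue, dict):
            raise ValueError(f"cue {index} is not an object")
        missing = REQUIRED_KEYS - cue.keys()
        if missing:
            raise ValueError(f"cue {index} is missing {sorted(missing)}")
    return data


def generate_with_self_heal(
    generate: Callable[[str], str],  # stand-in for the real model call
    prompt: str,
    max_attempts: int = 3,
) -> list[dict]:
    """Call the model, and on invalid output retry with the error attached."""
    current_prompt = prompt
    last_error: Exception | None = None
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            return parse_cues(raw)
        except ValueError as exc:
            last_error = exc
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected: {exc}\n"
                "Return only valid JSON."
            )
    raise RuntimeError(
        f"model output still invalid after {max_attempts} attempts"
    ) from last_error
```

Bounding the number of attempts keeps a persistently failing prompt from occupying a worker indefinitely; the final RuntimeError can then drive the task's normal retry or failure handling.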
