Vadym Samoilenko 6f963ff7c4 feat: DCMP compliance, descriptive transcript, new languages, QA bug fixes
- Rewrote VTT translation to two-step (text-only → Gemini → apply to original timestamps) preventing caption timing desync
- Added polling fallback for all processing states and Safari visibilitychange WebSocket reconnect
- Added 11 new TTS languages (cs, da, fi, hu, no, sk, sv, es-419, pt-BR, fr-CA)
- Updated caption/AD prompts to DCMP Captioning Key & Description Key standards (line splitting, ♪ music notation, italic tags, caption positioning, ethics guidelines)
- Added descriptive transcript generation (WCAG 2.1 §1.2.1) combining captions + AD into plain text
- Fixed amix normalize=0 to prevent audio loss in rendered videos
- Fixed AD re-timing double-count when source_ms is None
- Fixed cue block numbering to be 1-based in VttEditor and Timeline Preview

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 11:50:43 +00:00
.github/workflows initial commit 2025-08-24 16:28:33 -05:00
backend feat: DCMP compliance, descriptive transcript, new languages, QA bug fixes 2026-03-27 11:50:43 +00:00
config suppress WiredTiger checkpoint progress log spam 2025-12-22 13:54:37 -06:00
docs added MSAL microsoft authentication 2025-10-10 09:19:39 -05:00
frontend feat: DCMP compliance, descriptive transcript, new languages, QA bug fixes 2026-03-27 11:50:43 +00:00
infra fix: update Cloud Run service configs for compatibility 2026-01-02 17:34:10 -06:00
nginx removed mongodb change stream monitoring, added global websockets monitoring for notifications, broke symmetry between toasts and persistent notifications (and refined which notifications get sent and how) 2025-08-25 15:48:18 -05:00
scripts chore: update check_job.py to dump full outputs structure 2026-03-03 11:23:46 +00:00
secrets added MSAL microsoft authentication 2025-10-10 09:19:39 -05:00
.DS_Store fixed session refresh and added full deploy script - and added documentation including videos 2025-10-08 22:29:08 -05:00
.env.local fix: add Cloud Run URLs to .env.local for docker-compose 2026-01-02 10:54:27 -06:00
.env.prod.example removed mongodb change stream monitoring, added global websockets monitoring for notifications, broke symmetry between toasts and persistent notifications (and refined which notifications get sent and how) 2025-08-25 15:48:18 -05:00
.env.production .env.production 2026-01-02 22:32:40 +00:00
.gitignore .env and .gitignore 2026-01-02 16:21:41 +00:00
apache-config-snippet.conf wrote docker files and deployment instructions 2025-10-08 16:00:12 -05:00
APACHE_DEPLOYMENT.md removed mongodb change stream monitoring, added global websockets monitoring for notifications, broke symmetry between toasts and persistent notifications (and refined which notifications get sent and how) 2025-08-25 15:48:18 -05:00
CLAUDE.md feat: streamline QC approval to skip translation pipeline 2026-01-12 10:37:37 -06:00
DEPLOYMENT.md adjusted full deploy script 2025-10-08 22:33:46 -05:00
DEPLOYMENT_OPTIONS.md removed mongodb change stream monitoring, added global websockets monitoring for notifications, broke symmetry between toasts and persistent notifications (and refined which notifications get sent and how) 2025-08-25 15:48:18 -05:00
docker-compose.local.yml added MSAL microsoft authentication 2025-10-10 09:19:39 -05:00
docker-compose.prod.yml wrote docker files and deployment instructions 2025-10-08 16:00:12 -05:00
docker-compose.yml feat: add environment-based worker concurrency for Cloud Run mode 2026-01-02 10:27:07 -06:00
docker-compose.yml.old wrote docker files and deployment instructions 2025-10-08 16:00:12 -05:00
Makefile removed mongodb change stream monitoring, added global websockets monitoring for notifications, broke symmetry between toasts and persistent notifications (and refined which notifications get sent and how) 2025-08-25 15:48:18 -05:00
mongo-init.js initial commit 2025-08-24 16:28:33 -05:00
mongo-keyfile initial commit 2025-08-24 16:28:33 -05:00
README.md adjusted login page to emphasize microsoft login, hide local login options behind small link 2025-10-10 11:50:52 -05:00
video_accessibility_development_plan.txt initial commit 2025-08-24 16:28:33 -05:00

Accessible Video Processing Platform

A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. It covers the complete workflow from video upload to final delivery, including quality control.

Current Status: Production-Ready (85% Complete)

Lines of Code: 20,471 total (12,198 backend + 8,273 frontend)

🚀 Key Features Implemented

Core Functionality

  • AI-Powered Processing: Complete Gemini 2.5 Pro integration for intelligent caption and audio description generation
  • Multi-Language Pipeline: Google Translate + cultural transcreation with 50+ language support
  • Quality Control Workflow: Full reviewer approval/rejection system with VTT editing capabilities
  • Audio Description TTS: Google Cloud TTS and ElevenLabs integration with audio synthesis
  • Real-time Updates: WebSocket-powered job status tracking and notifications
  • Advanced Video Player: Multi-language caption support with timeline navigation
  • Role-Based Access Control: Complete CLIENT/REVIEWER/ADMIN role system
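
The latest commit note at the top of this page describes caption translation as a two-step process: only the cue text is sent for translation, and the result is applied back onto the original timestamps so caption timing cannot drift. The sketch below illustrates that idea in plain Python. It is not the project's implementation: `translate_vtt` and the `translate_texts` callable are hypothetical names, and the parser assumes a simple VTT file with `\n` line endings and blank lines between blocks.

```python
import re
from typing import Callable, List, Optional, Tuple

# Cue timing line, e.g. "00:00:01.000 --> 00:00:04.000" (cue settings may follow).
TIMING_RE = re.compile(r"(\d{2,}:)?\d{2}:\d{2}\.\d{3} --> (\d{2,}:)?\d{2}:\d{2}\.\d{3}")


def translate_vtt(vtt: str, translate_texts: Callable[[List[str]], List[str]]) -> str:
    """Translate cue text only; identifiers and timing lines are copied verbatim."""
    parsed: List[Tuple[List[str], Optional[int]]] = []
    texts: List[str] = []

    # Step 1: separate the untouchable structure from the translatable text.
    for block in re.split(r"\n{2,}", vtt.strip()):
        lines = block.split("\n")
        idx = next((i for i, line in enumerate(lines) if TIMING_RE.match(line)), None)
        if idx is None:
            # Header, NOTE or STYLE block: passed through unchanged.
            parsed.append((lines, None))
        else:
            parsed.append((lines[: idx + 1], len(texts)))
            texts.append("\n".join(lines[idx + 1:]))

    # Step 2: translate the text and re-attach it to the original timestamps.
    translated = translate_texts(texts)
    if len(translated) != len(texts):
        raise ValueError("translator must return exactly one string per cue")

    blocks = []
    for prefix, text_index in parsed:
        if text_index is None:
            blocks.append("\n".join(prefix))
        else:
            blocks.append("\n".join(prefix + [translated[text_index]]))
    return "\n\n".join(blocks) + "\n"
```

Because the translator never sees a timestamp, a model that merges, drops, or reorders lines is caught by the length check instead of silently shifting cues. In practice `translate_texts` would wrap whichever translation service is in use.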

Security & Infrastructure

  • JWT Authentication: Secure access/refresh token system with HttpOnly cookies
  • Audit Logging: Comprehensive audit trail for all reviewer actions
  • Signed URLs: Secure Google Cloud Storage file access (24h expiry)
  • Input Validation: Complete request validation and sanitization
  • HTTPS/CORS: Production-ready security configuration

User Experience

  • Responsive Design: Mobile-first Tailwind CSS implementation
  • Real-time Feedback: Live job progress tracking and notifications
  • Advanced File Management: Drag-and-drop uploads with progress indicators
  • VTT Editor: Inline caption editing with live preview
  • Download Portal: Secure asset delivery with organized file structure

🛠 Tech Stack

Backend (FastAPI + Python 3.11)

  • FastAPI 0.115.0 - Modern async web framework with OpenAPI documentation
  • Celery 5.3.4 - Distributed task queue with Redis broker
  • MongoDB 7.0 - Document database with replica set support
  • Redis 7.2 - Caching and message queuing
  • Google Cloud Platform - Storage, AI services, Secret Manager, TTS
  • Pydantic 2.5 - Data validation and serialization
  • OpenTelemetry - Observability and monitoring
  • Sentry - Error tracking and performance monitoring

Frontend (React 19 + TypeScript)

  • React 19.1.1 - Modern UI framework with latest features
  • Vite 7.1.2 - Lightning-fast build tool and dev server
  • TypeScript 5.8 - Full type safety throughout application
  • TanStack Query 5.85 - Advanced server state management with caching
  • React Router 7.8 - Client-side routing with protected routes
  • Tailwind CSS 4.1 - Utility-first CSS framework
  • Zustand 5.0 - Lightweight client state management
  • React Hook Form + Zod - Form handling with schema validation

🏗 Architecture Overview

Complete Job Processing Pipeline

Upload  → Ingestion  → AI Processing  → QC Review → Translation → TTS      → Final Review → Delivery
↓         ↓            ↓                ↓           ↓             ↓          ↓              ↓
GCS       Gemini 2.5   VTT Generation   Human       Google        Text-to-   Reviewer       Email +
Storage   Pro          + Validation     Review      Translate     Speech     Approval       Downloads
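
The pipeline above can be modelled as a small state machine. The following is a minimal sketch under stated assumptions: the state names are derived from the stage labels in the diagram, and the single rejection path (QC review back to AI processing) is hypothetical. The actual states and transitions in the backend may differ.

```python
from enum import Enum


class JobState(str, Enum):
    # Names mirror the diagram stages; they are illustrative, not the real schema.
    UPLOADED = "uploaded"
    INGESTING = "ingesting"
    AI_PROCESSING = "ai_processing"
    QC_REVIEW = "qc_review"
    TRANSLATING = "translating"
    TTS = "tts"
    FINAL_REVIEW = "final_review"
    DELIVERED = "delivered"


# Allowed transitions. The QC_REVIEW -> AI_PROCESSING edge models a reviewer
# rejection and is an assumption.
ALLOWED = {
    JobState.UPLOADED: {JobState.INGESTING},
    JobState.INGESTING: {JobState.AI_PROCESSING},
    JobState.AI_PROCESSING: {JobState.QC_REVIEW},
    JobState.QC_REVIEW: {JobState.TRANSLATING, JobState.AI_PROCESSING},
    JobState.TRANSLATING: {JobState.TTS},
    JobState.TTS: {JobState.FINAL_REVIEW},
    JobState.FINAL_REVIEW: {JobState.DELIVERED},
    JobState.DELIVERED: set(),
}


def advance(current: JobState, target: JobState) -> JobState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the transition table in one place means a worker or API handler cannot move a job into a stage it has not reached, for example delivering a job that has not passed final review.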

System Architecture

  • Monorepo Structure: /backend, /frontend, /infra with clear separation
  • Microservices Ready: Modular FastAPI services with proper dependency injection
  • Event-Driven: WebSocket real-time updates with connection management
  • Scalable Workers: Celery task queue with auto-retry and error recovery
  • Secure by Design: RBAC, signed URLs, audit logging, input validation
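
The "auto-retry and error recovery" behaviour mentioned for the workers can be illustrated with a plain-Python retry decorator using exponential backoff. This is only a sketch of the idea: `auto_retry` and its parameters are hypothetical names, and Celery tasks can declare comparable behaviour themselves (for example through the `autoretry_for` and `retry_backoff` task options), which is presumably what the workers rely on.

```python
import time
from functools import wraps


def auto_retry(max_retries=3, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function, doubling the delay after each failure."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        # Retries exhausted: surface the original error.
                        raise
                    sleep(base_delay * 2 ** attempt)

        return wrapper

    return decorator
```

The `sleep` parameter exists so the delay can be replaced in tests; with the defaults, a task is attempted up to four times with waits of 1, 2, and 4 seconds between attempts.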

🚀 Getting Started

Prerequisites

  • Python 3.11+ (backend development)
  • Node.js 20.19+ or 22.12+ (frontend development; Vite 7 no longer supports Node.js 18)
  • Docker & Docker Compose (required for local development)
  • Google Cloud Project with APIs enabled (for video processing)

The Docker-based setup described below is the recommended approach for local development. Backend services run in Docker containers, while the frontend runs via the Vite dev server for fast hot reload.

Initial Setup

# 1. Clone the repository
git clone <repository>
cd video_accessibility

# 2. Copy and configure environment files
cp .env.prod.example .env.local
# Edit .env.local with your API keys and settings

# 3. Set up frontend environment
cp frontend/.env.example frontend/.env.local
# The defaults should work for local development

# 4. Ensure GCP credentials are in place
# Copy your GCP service account JSON to: ./secrets/gcp-credentials.json

Starting the Development Environment

Step 1: Start Backend Services (Docker)

# Start API, Worker, MongoDB, and Redis in Docker
./scripts/run-local.sh

# Services will be available at:
# - API: http://localhost:8003
# - API Docs: http://localhost:8003/docs
# - MongoDB: mongodb://localhost:27017
# - Redis: redis://localhost:6379

Step 2: Start Frontend (Vite Dev Server)

# In a separate terminal
cd frontend
npm install  # First time only
npm run dev

# Frontend will be available at:
# - Application: http://localhost:6001/video-accessibility

Useful Commands

# View logs
docker compose logs -f api          # API logs
docker compose logs -f worker       # Worker logs
docker compose logs -f              # All logs

# Restart a service
docker compose restart api
docker compose restart worker

# Rebuild and restart (after code changes)
./scripts/run-local.sh --rebuild

# Stop all services
./scripts/run-local.sh --stop
# or
docker compose down

Test User Credentials (Local Development Only)

For testing different user roles locally:

Admin:      admin@example.com      / admin
Production: production@example.com / production
Reviewer:   reviewer@example.com   / reviewer
Client:     client@example.com     / client123

Note: These test users are only for local development. Production uses Microsoft authentication.

Alternative: Native Development (Without Docker)

For development without Docker, you'll need to run each service manually:

# Terminal 1: MongoDB
mongod --dbpath ./data/db

# Terminal 2: Redis
redis-server

# Terminal 3: Backend API
cd backend
poetry install
poetry run uvicorn app.main:app --reload --port 8000

# Terminal 4: Celery Worker
cd backend
poetry run celery -A app.tasks worker --loglevel=info

# Terminal 5: Frontend
cd frontend
npm install
npm run dev

Note: The Docker approach is strongly recommended as it ensures consistency and simplifies setup.

Testing & Quality

# Backend tests + linting
cd backend
poetry run pytest
poetry run ruff check .
poetry run mypy .

# Frontend tests + linting  
cd frontend
npm run test
npm run test:e2e
npm run lint
npm run type-check

📁 Project Structure

video_accessibility/                    # Root monorepo
├── backend/                           # FastAPI Python backend (12,198 LOC)
│   ├── app/
│   │   ├── api/v1/                   # REST API endpoints
│   │   │   ├── auth.py               # JWT authentication
│   │   │   ├── jobs.py               # Job CRUD & workflow
│   │   │   ├── admin.py              # Admin operations
│   │   │   └── files.py              # File management
│   │   ├── core/                     # Core configuration
│   │   ├── models/                   # Database models
│   │   ├── schemas/                  # Pydantic request/response schemas  
│   │   ├── services/                 # External service integrations
│   │   │   ├── gemini.py             # AI processing
│   │   │   ├── gcs.py                # Google Cloud Storage
│   │   │   ├── translation.py        # Multi-language support
│   │   │   └── tts.py                # Text-to-speech
│   │   ├── tasks/                    # Celery background workers
│   │   ├── middleware/               # Request processing
│   │   └── telemetry/                # Observability
│   ├── tests/                        # Comprehensive test suite
│   └── Dockerfile                    # Container configuration
├── frontend/                         # React TypeScript SPA (8,273 LOC)
│   ├── src/
│   │   ├── routes/                   # Page components
│   │   │   ├── auth/                 # Login system
│   │   │   ├── jobs/                 # Job management
│   │   │   ├── qc/                   # Quality control
│   │   │   └── admin/                # Admin interface
│   │   ├── components/               # Reusable UI components
│   │   │   ├── VideoWithCaptions.tsx # Advanced video player
│   │   │   ├── VttEditor.tsx         # Caption editing
│   │   │   └── UploadDropzone.tsx    # File upload
│   │   ├── lib/                      # Utilities and API client
│   │   ├── hooks/                    # Custom React hooks
│   │   └── types/                    # TypeScript definitions
│   ├── tests/                        # Unit + E2E tests
│   ├── .env.local                    # Local development config
│   └── Dockerfile                    # Container configuration
├── scripts/
│   ├── run-local.sh                 # Local development startup
│   ├── deploy.sh                    # Production deployment
│   ├── full-deploy.sh               # Full production rebuild
│   └── build-frontend.sh            # Frontend build script
├── docker-compose.yml               # Base Docker configuration
├── docker-compose.local.yml         # Local development overrides
├── docker-compose.prod.yml          # Production overrides
├── .env.local                       # Local environment variables
├── .env.production                  # Production environment variables
├── CLAUDE.md                        # Development guidelines
└── video_accessibility_development_plan.txt  # Complete specification

⚙️ Configuration

Environment Variables

Backend (backend/.env):

# Database
MONGODB_URL=mongodb://admin:password@localhost:27017/accessible_video
REDIS_URL=redis://localhost:6379/0

# Authentication  
JWT_SECRET_KEY=your-jwt-secret
JWT_REFRESH_SECRET_KEY=your-refresh-secret

# AI Services
GEMINI_API_KEY=your-gemini-key
ELEVENLABS_API_KEY=your-elevenlabs-key

# Google Cloud
GCS_BUCKET_NAME=your-bucket-name
GOOGLE_CLOUD_PROJECT=your-project-id

# Email
SENDGRID_API_KEY=your-sendgrid-key

# Monitoring
SENTRY_DSN=your-sentry-dsn
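
A backend can fail fast at startup when one of these variables is missing. The sketch below shows one way to do that with the standard library only. The variable names come from the list above, but the split into required and optional variables is an assumption, and `load_settings` is a hypothetical helper rather than the project's actual configuration code.

```python
import os
from typing import Mapping, Optional

# Assumed split: the services every run needs vs. integrations that may be off locally.
REQUIRED = (
    "MONGODB_URL",
    "REDIS_URL",
    "JWT_SECRET_KEY",
    "JWT_REFRESH_SECRET_KEY",
    "GEMINI_API_KEY",
    "GCS_BUCKET_NAME",
    "GOOGLE_CLOUD_PROJECT",
)
OPTIONAL = ("ELEVENLABS_API_KEY", "SENDGRID_API_KEY", "SENTRY_DSN")


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect settings from the environment, naming every missing variable at once."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    settings: dict = {name: env[name] for name in REQUIRED}
    for name in OPTIONAL:
        value: Optional[str] = env.get(name)
        settings[name] = value  # None when the integration is not configured
    return settings
```

Reporting all missing names in a single error avoids the fix-one-restart-find-the-next loop when setting up a new environment.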

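Frontend (frontend/.env.local):

# Native backend (uvicorn on port 8000); use http://localhost:8003 when the API runs in Docker
VITE_API_URL=http://localhost:8000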
VITE_SENTRY_DSN=your-sentry-dsn
VITE_ENVIRONMENT=development

Google Cloud Setup

  1. Create GCP Project with billing enabled
  2. Enable APIs:
    • Cloud Storage API
    • Cloud Translation API
    • Cloud Text-to-Speech API
    • Vertex AI API (for Gemini)
    • Secret Manager API
  3. Create Service Account with roles:
    • Storage Admin
    • AI Platform Admin
    • Secret Manager Admin
  4. Download the JSON key and set GOOGLE_APPLICATION_CREDENTIALS to its file path (the local setup expects the key at ./secrets/gcp-credentials.json)

🚢 Deployment Options

Production Architecture (Google Cloud)

  • Frontend: Cloud Storage + Cloud CDN (static hosting)
  • Backend API: Cloud Run (serverless, auto-scaling)
  • Workers: Cloud Run (Celery with Redis)
  • Database: MongoDB Atlas (managed)
  • Queue: Cloud Memorystore (Redis)
  • Storage: Google Cloud Storage
  • Monitoring: Cloud Monitoring + Sentry

Docker Production

# Build production images and start the stack in the background
docker compose -f docker-compose.prod.yml up -d --build

🔒 Security Features

Implemented Security

  • JWT Authentication: Access (15min) + refresh (7 days) token rotation
  • RBAC System: CLIENT/REVIEWER/ADMIN roles with endpoint protection
  • Secure Storage: HttpOnly cookies for refresh tokens
  • File Security: Signed URLs with 24h expiry, no client access to raw files
  • Input Validation: Comprehensive Pydantic validation on all endpoints
  • Audit Logging: Complete trail of all reviewer actions and system events
  • CORS Protection: Configured for production domains
  • Rate Limiting: Request throttling and validation middleware
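
To make the token lifetimes above concrete, here is a minimal HS256 sketch built on the standard library. It illustrates how the 15-minute and 7-day expiries translate into `exp` claims. It is not the project's authentication code: `issue_token`, `verify_token`, and the `role` claim are hypothetical, the header is not validated, and a real deployment should rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time

ACCESS_TTL = 15 * 60            # 15 minutes, in seconds
REFRESH_TTL = 7 * 24 * 60 * 60  # 7 days, in seconds


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(subject: str, role: str, secret: str, ttl: int, now=None) -> str:
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role, "iat": now, "exp": now + ttl}
    signing_input = ".".join(
        _b64(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    return signing_input + "." + _b64(_sign(signing_input, secret))


def verify_token(token: str, secret: str, now=None) -> dict:
    """Return the payload if the signature matches and the token has not expired."""
    now = int(time.time()) if now is None else now
    signing_input, signature = token.rsplit(".", 1)
    if not hmac.compare_digest(_sign(signing_input, secret), _unb64(signature)):
        raise ValueError("bad signature")
    payload = json.loads(_unb64(signing_input.split(".")[1]))
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload
```

Signing access and refresh tokens with different secrets, as the separate JWT_SECRET_KEY and JWT_REFRESH_SECRET_KEY settings suggest, means a leaked access token cannot be replayed as a refresh token.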

🔧 API Documentation

Key Endpoints Implemented

POST /api/v1/auth/login              # Authentication
POST /api/v1/jobs                    # Create job with file upload
GET  /api/v1/jobs                    # List jobs (filtered by role)
GET  /api/v1/jobs/{id}               # Job details with real-time status
POST /api/v1/jobs/{id}/actions/*     # Workflow actions (approve/reject/complete)
GET  /api/v1/jobs/{id}/vtt           # VTT content retrieval
PATCH /api/v1/jobs/{id}/vtt          # VTT editing and updates
GET  /api/v1/jobs/{id}/downloads     # Signed download URLs
WS   /api/v1/ws/jobs/{id}            # Real-time job status updates

OpenAPI Documentation: http://localhost:8003/docs with the Docker setup, or http://localhost:8000/docs when running the API natively
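
As a rough illustration of how a script might call these endpoints, the sketch below builds (but does not send) two requests with the standard library. The paths come from the list above. The JSON field names `email` and `password`, the bearer-token header, and the helper names are assumptions, so check the OpenAPI documentation for the real request shapes.

```python
import json
import urllib.request

# Docker setup from Getting Started; the native uvicorn example listens on port 8000.
BASE_URL = "http://localhost:8003"


def login_request(email: str, password: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Field names are assumed, not taken from the API schema.
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def job_request(job_id: str, access_token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Assumes the access token is sent as a bearer token.
    return urllib.request.Request(
        f"{base_url}/api/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

Passing either request to `urllib.request.urlopen` sends it once the API is running.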

🎯 Development Status

Completed (Production Ready)

  • User Management: Full authentication, RBAC, password management
  • Job Pipeline: Complete video processing workflow with state machine
  • Quality Control: VTT editor, approval workflows, reviewer dashboards
  • Real-time Features: WebSocket updates, live notifications
  • Multi-language: Translation pipeline with cultural transcreation
  • File Management: Secure uploads, downloads, asset validation
  • Admin Features: User management, system monitoring, audit logs

⚠️ Needs Attention (Minor)

  • Integration Tests: Framework exists but needs completion
  • Email Templates: Service implemented, templates may need customization
  • Performance Testing: No load testing implemented yet
  • Documentation: API docs complete, user guides could be enhanced

Next Steps

  1. Complete integration test suite for end-to-end validation
  2. Performance testing with realistic video processing loads
  3. Production deployment configuration and CI/CD pipeline
  4. User documentation and training materials
  5. Monitoring dashboards for production operations

📚 Development Resources

  • Complete Specification: video_accessibility_development_plan.txt
  • Development Guidelines: CLAUDE.md
  • API Documentation: http://localhost:8003/docs with the Docker setup, or http://localhost:8000/docs with the native setup (when running)
  • Test Coverage Reports: backend/htmlcov/ (after running tests)