Accessible Video Processing Platform
A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. Features a complete workflow from video upload to final delivery with quality control processes.
Documentation: See AGENTS.md for full navigation, or docs/README.md for the documentation hub.
✅ Current Status: Production-Ready (85% Complete)
Lines of Code: 20,471 total (12,198 backend + 8,273 frontend)
🚀 Key Features Implemented
Core Functionality ✅
- AI-Powered Processing: Complete Gemini 2.5 Pro integration for intelligent caption and audio description generation
- Multi-Language Pipeline: Google Translate + cultural transcreation with 50+ language support
- Quality Control Workflow: Full reviewer approval/rejection system with VTT editing capabilities
- Audio Description TTS: Google Cloud TTS and ElevenLabs integration with audio synthesis
- Real-time Updates: WebSocket-powered job status tracking and notifications
- Advanced Video Player: Multi-language caption support with timeline navigation
- Role-Based Access Control: Complete CLIENT/REVIEWER/ADMIN role system
Security & Infrastructure ✅
- JWT Authentication: Secure access/refresh token system with HttpOnly cookies
- Audit Logging: Comprehensive audit trail for all reviewer actions
- Signed URLs: Secure Google Cloud Storage file access (24h expiry)
- Input Validation: Complete request validation and sanitization
- HTTPS/CORS: Production-ready security configuration
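As an illustration of the signed-URL bullet above, here is a minimal sketch of issuing a 24-hour download link. The helper name `make_download_url` is hypothetical and not taken from this codebase; it assumes the caller passes a `google.cloud.storage` `Blob`, whose `generate_signed_url` method does the signing.

```python
from datetime import timedelta

# Matches the 24h expiry stated in the feature list.
SIGNED_URL_TTL = timedelta(hours=24)


def make_download_url(blob, ttl: timedelta = SIGNED_URL_TTL) -> str:
    """Return a time-limited GET URL for a google.cloud.storage Blob.

    `blob` is expected to expose generate_signed_url(), as the Blob class in
    the google-cloud-storage package does. V4 signing accepts a timedelta
    for `expiration`.
    """
    return blob.generate_signed_url(version="v4", expiration=ttl, method="GET")
```

Because the helper only depends on the `generate_signed_url` method, it can be exercised in unit tests with a stand-in object instead of real GCP credentials.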
User Experience ✅
- Responsive Design: Mobile-first Tailwind CSS implementation
- Real-time Feedback: Live job progress tracking and notifications
- Advanced File Management: Drag-and-drop uploads with progress indicators
- VTT Editor: Inline caption editing with live preview
- Download Portal: Secure asset delivery with organized file structure
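To give a sense of what caption validation in a VTT editor can involve, the following is a rough sketch of a WebVTT sanity check. It is not the project's validator: the function name `validate_vtt` and the two checks it performs (header present, each cue ends after it starts) are assumptions chosen for illustration.

```python
import re

# WebVTT timestamps: optional hours, then MM:SS.mmm
_TS = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
_CUE = re.compile(rf"^{_TS}\s+-->\s+{_TS}")


def _seconds(hours, minutes, seconds, millis) -> float:
    """Convert captured timestamp parts to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def validate_vtt(text: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        return ["file must start with the WEBVTT header"]
    problems = []
    for number, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue  # only cue timing lines are checked
        match = _CUE.match(line.strip())
        if match is None:
            problems.append(f"line {number}: malformed cue timing")
            continue
        parts = match.groups()
        if _seconds(*parts[4:]) <= _seconds(*parts[:4]):
            problems.append(f"line {number}: cue must end after it starts")
    return problems
```

A fuller validator would also check cue ordering, overlapping cues, and cue settings; this sketch only covers the timing line itself.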
🛠 Tech Stack
Backend (FastAPI + Python 3.11)
- FastAPI 0.115.0 - Modern async web framework with OpenAPI documentation
- Celery 5.3.4 - Distributed task queue with Redis broker
- MongoDB 7.0 - Document database with replica set support
- Redis 7.2 - Caching and message queuing
- Google Cloud Platform - Storage, AI services, Secret Manager, TTS
- Pydantic 2.5 - Data validation and serialization
- OpenTelemetry - Observability and monitoring
- Sentry - Error tracking and performance monitoring
Frontend (React 19 + TypeScript)
- React 19.1.1 - Modern UI framework with latest features
- Vite 7.1.2 - Lightning-fast build tool and dev server
- TypeScript 5.8 - Full type safety throughout application
- TanStack Query 5.85 - Advanced server state management with caching
- React Router 7.8 - Client-side routing with protected routes
- Tailwind CSS 4.1 - Utility-first CSS framework
- Zustand 5.0 - Lightweight client state management
- React Hook Form + Zod - Form handling with schema validation
🏗 Architecture Overview
Complete Job Processing Pipeline ✅
Upload → Ingestion → AI Processing → QC Review → Translation → TTS → Final Review → Delivery
- Upload: GCS Storage
- Ingestion: Gemini 2.5 Pro
- AI Processing: VTT Generation + Validation
- QC Review: Human Review
- Translation: Google Translate
- TTS: Text-to-Speech
- Final Review: Reviewer Approval
- Delivery: Email + Downloads
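The pipeline above can be modelled as a simple ordered state machine. The sketch below is illustrative only: the `Stage` enum and `advance` function are hypothetical names, and the real workflow presumably has additional transitions (for example, what happens after a reviewer rejection) that this README does not specify.

```python
from enum import Enum


class Stage(Enum):
    """Pipeline stages in the order shown in the diagram."""
    UPLOAD = "upload"
    INGESTION = "ingestion"
    AI_PROCESSING = "ai_processing"
    QC_REVIEW = "qc_review"
    TRANSLATION = "translation"
    TTS = "tts"
    FINAL_REVIEW = "final_review"
    DELIVERY = "delivery"


_ORDER = list(Stage)  # Enum iteration preserves definition order


def advance(current: Stage) -> Stage:
    """Return the stage that follows `current` in the linear pipeline."""
    index = _ORDER.index(current)
    if index == len(_ORDER) - 1:
        raise ValueError(f"{current.name} is the final stage")
    return _ORDER[index + 1]
```

Keeping the allowed transitions in one place makes it straightforward to reject out-of-order status updates before they reach the database.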
System Architecture
- Monorepo Structure: /backend, /frontend, /infra with clear separation
- Microservices Ready: Modular FastAPI services with proper dependency injection
- Event-Driven: WebSocket real-time updates with connection management
- Scalable Workers: Celery task queue with auto-retry and error recovery
- Secure by Design: RBAC, signed URLs, audit logging, input validation
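To illustrate what "auto-retry" means for the workers, here is a standard-library sketch of retrying with exponential backoff. It is not Celery code and not this project's task configuration; Celery provides comparable behaviour through task options such as `autoretry_for` and `retry_backoff`. The decorator name `auto_retry` and its parameters are assumptions.

```python
import functools
import time


def auto_retry(max_retries: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Retry the wrapped function on any exception, doubling the delay each time."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # retries exhausted: surface the last error
                    sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator
```

The `sleep` parameter is injectable so tests can record the delays instead of actually waiting.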
🚀 Getting Started
Prerequisites
- Python 3.11+ (backend development)
- Node.js 18+ (frontend development)
- Docker & Docker Compose (required for local development)
- Google Cloud Project with APIs enabled (for video processing)
🐳 Local Development with Docker (Recommended)
This is the recommended approach for local development. Backend services run in Docker containers while the frontend runs via Vite dev server for fast hot-reload.
Initial Setup
# 1. Clone the repository
git clone <repository>
cd video_accessibility
# 2. Copy and configure environment files
cp .env.prod.example .env.local
# Edit .env.local with your API keys and settings
# 3. Set up frontend environment
cp frontend/.env.example frontend/.env.local
# The defaults should work for local development
# 4. Ensure GCP credentials are in place
# Copy your GCP service account JSON to: ./secrets/gcp-credentials.json
Starting the Development Environment
Step 1: Start Backend Services (Docker)
# Start API, Worker, MongoDB, and Redis in Docker
./scripts/run-local.sh
# Services will be available at:
# - API: http://localhost:8003
# - API Docs: http://localhost:8003/docs
# - MongoDB: mongodb://localhost:27017
# - Redis: redis://localhost:6379
Step 2: Start Frontend (Vite Dev Server)
# In a separate terminal
cd frontend
npm install # First time only
npm run dev
# Frontend will be available at:
# - Application: http://localhost:6001/video-accessibility
Useful Commands
# View logs
docker compose logs -f api # API logs
docker compose logs -f worker # Worker logs
docker compose logs -f # All logs
# Restart a service
docker compose restart api
docker compose restart worker
# Rebuild and restart (after code changes)
./scripts/run-local.sh --rebuild
# Stop all services
./scripts/run-local.sh --stop
# or
docker compose down
Test User Credentials (Local Development Only)
For testing different user roles locally:
Admin: admin@example.com / admin
Production: production@example.com / production
Reviewer: reviewer@example.com / reviewer
Client: client@example.com / client123
Note: These test users are only for local development. Production uses Microsoft authentication.
Alternative: Native Development (Without Docker)
For development without Docker, you'll need to run each service manually:
# Terminal 1: MongoDB
mongod --dbpath ./data/db
# Terminal 2: Redis
redis-server
# Terminal 3: Backend API
cd backend
poetry install
poetry run uvicorn app.main:app --reload --port 8000
# Terminal 4: Celery Worker
cd backend
poetry run celery -A app.tasks worker --loglevel=info
# Terminal 5: Frontend
cd frontend
npm install
npm run dev
Note: The Docker approach is strongly recommended as it ensures consistency and simplifies setup.
Testing & Quality
# Backend tests + linting
cd backend
poetry run pytest
poetry run ruff check .
poetry run mypy .
# Frontend tests + linting
cd frontend
npm run test
npm run test:e2e
npm run lint
npm run type-check
📁 Project Structure
video_accessibility/ # Root monorepo
├── backend/ # FastAPI Python backend (12,198 LOC)
│ ├── app/
│ │ ├── api/v1/ # REST API endpoints
│ │ │ ├── auth.py # JWT authentication
│ │ │ ├── jobs.py # Job CRUD & workflow
│ │ │ ├── admin.py # Admin operations
│ │ │ └── files.py # File management
│ │ ├── core/ # Core configuration
│ │ ├── models/ # Database models
│ │ ├── schemas/ # Pydantic request/response schemas
│ │ ├── services/ # External service integrations
│ │ │ ├── gemini.py # AI processing
│ │ │ ├── gcs.py # Google Cloud Storage
│ │ │ ├── translation.py # Multi-language support
│ │ │ └── tts.py # Text-to-speech
│ │ ├── tasks/ # Celery background workers
│ │ ├── middleware/ # Request processing
│ │ └── telemetry/ # Observability
│ ├── tests/ # Comprehensive test suite
│ └── Dockerfile # Container configuration
├── frontend/ # React TypeScript SPA (8,273 LOC)
│ ├── src/
│ │ ├── routes/ # Page components
│ │ │ ├── auth/ # Login system
│ │ │ ├── jobs/ # Job management
│ │ │ ├── qc/ # Quality control
│ │ │ └── admin/ # Admin interface
│ │ ├── components/ # Reusable UI components
│ │ │ ├── VideoWithCaptions.tsx # Advanced video player
│ │ │ ├── VttEditor.tsx # Caption editing
│ │ │ └── UploadDropzone.tsx # File upload
│ │ ├── lib/ # Utilities and API client
│ │ ├── hooks/ # Custom React hooks
│ │ └── types/ # TypeScript definitions
│ ├── tests/ # Unit + E2E tests
│ ├── .env.local # Local development config
│ └── Dockerfile # Container configuration
├── scripts/
│ ├── run-local.sh # Local development startup
│ ├── deploy.sh # Production deployment
│ ├── full-deploy.sh # Full production rebuild
│ └── build-frontend.sh # Frontend build script
├── docker-compose.yml # Base Docker configuration
├── docker-compose.local.yml # Local development overrides
├── docker-compose.prod.yml # Production overrides
├── .env.local # Local environment variables
├── .env.production # Production environment variables
├── CLAUDE.md # Development guidelines
└── video_accessibility_development_plan.txt # Complete specification
⚙️ Configuration
Environment Variables
Backend (backend/.env):
# Database
MONGODB_URL=mongodb://admin:password@localhost:27017/accessible_video
REDIS_URL=redis://localhost:6379/0
# Authentication
JWT_SECRET_KEY=your-jwt-secret
JWT_REFRESH_SECRET_KEY=your-refresh-secret
# AI Services
GEMINI_API_KEY=your-gemini-key
ELEVENLABS_API_KEY=your-elevenlabs-key
# Google Cloud
GCS_BUCKET_NAME=your-bucket-name
GOOGLE_CLOUD_PROJECT=your-project-id
# Email
SENDGRID_API_KEY=your-sendgrid-key
# Monitoring
SENTRY_DSN=your-sentry-dsn
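A backend typically reads these variables once at startup and fails fast if any are missing. The following is a minimal standard-library sketch of that idea; the `Settings` class, `load_settings` function, and the choice of which variables count as required are assumptions, and the actual project may rely on Pydantic for this instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed to be mandatory for illustration; adjust to the real requirements.
REQUIRED = ("MONGODB_URL", "REDIS_URL", "JWT_SECRET_KEY", "JWT_REFRESH_SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    redis_url: str
    jwt_secret_key: str
    jwt_refresh_secret_key: str
    sentry_dsn: Optional[str] = None  # monitoring is treated as optional here


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment, reporting all missing names at once."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        mongodb_url=env["MONGODB_URL"],
        redis_url=env["REDIS_URL"],
        jwt_secret_key=env["JWT_SECRET_KEY"],
        jwt_refresh_secret_key=env["JWT_REFRESH_SECRET_KEY"],
        sentry_dsn=env.get("SENTRY_DSN") or None,
    )
```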
Frontend (frontend/.env):
# API base URL; the Docker setup above serves the API on port 8003 instead
VITE_API_URL=http://localhost:8000
VITE_SENTRY_DSN=your-sentry-dsn
VITE_ENVIRONMENT=development
Google Cloud Setup
- Create GCP Project with billing enabled
- Enable APIs:
- Cloud Storage API
- Cloud Translation API
- Cloud Text-to-Speech API
- Vertex AI API (for Gemini)
- Secret Manager API
- Create Service Account with roles:
- Storage Admin
- AI Platform Admin
- Secret Manager Admin
- Download the JSON key and set GOOGLE_APPLICATION_CREDENTIALS to the key file's path
🚢 Deployment Options
Production Architecture (Google Cloud)
- Frontend: Cloud Storage + Cloud CDN (static hosting)
- Backend API: Cloud Run (serverless, auto-scaling)
- Workers: Cloud Run (Celery with Redis)
- Database: MongoDB Atlas (managed)
- Queue: Cloud Memorystore (Redis)
- Storage: Google Cloud Storage
- Monitoring: Cloud Monitoring + Sentry
Docker Production
# Start the production stack in the background (images are built if missing)
docker compose -f docker-compose.prod.yml up -d
🔒 Security Features
Implemented Security ✅
- JWT Authentication: Access (15min) + refresh (7 days) token rotation
- RBAC System: CLIENT/REVIEWER/ADMIN roles with endpoint protection
- Secure Storage: HttpOnly cookies for refresh tokens
- File Security: Signed URLs with 24h expiry, no client access to raw files
- Input Validation: Comprehensive Pydantic validation on all endpoints
- Audit Logging: Complete trail of all reviewer actions and system events
- CORS Protection: Configured for production domains
- Rate Limiting: Request throttling and validation middleware
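The access and refresh lifetimes listed above can be illustrated with a small HS256 token sketch built only on the standard library. This is a teaching example under stated assumptions, not the project's authentication code: the function names, the `role` claim, and the use of a shared HMAC secret are all hypothetical, and a production service would normally use a maintained JWT library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time

ACCESS_TTL = 15 * 60             # 15 minutes, as listed above
REFRESH_TTL = 7 * 24 * 60 * 60   # 7 days, as listed above


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return _b64url(digest)


def issue_token(subject: str, role: str, secret: str, ttl: int, now=None) -> str:
    """Create a signed token that expires `ttl` seconds after `now`."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role, "iat": issued_at, "exp": issued_at + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_sign(signing_input, secret)}"


def verify_token(token: str, secret: str, now=None) -> dict:
    """Check signature and expiry; return the payload or raise ValueError."""
    try:
        signing_input, signature = token.rsplit(".", 1)
        payload_b64 = signing_input.split(".")[1]
    except (ValueError, IndexError):
        raise ValueError("malformed token")
    expected = _sign(signing_input, secret)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise ValueError("bad signature")
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    if payload["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```

In a rotation scheme, the short-lived access token is sent with each request, while the longer-lived refresh token (kept in an HttpOnly cookie) is exchanged for a new pair when the access token expires. This sketch does not inspect the header's `alg` field, which a real verifier should do.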
🔧 API Documentation
Key Endpoints Implemented
POST /api/v1/auth/login # Authentication
POST /api/v1/jobs # Create job with file upload
GET /api/v1/jobs # List jobs (filtered by role)
GET /api/v1/jobs/{id} # Job details with real-time status
POST /api/v1/jobs/{id}/actions/* # Workflow actions (approve/reject/complete)
GET /api/v1/jobs/{id}/vtt # VTT content retrieval
PATCH /api/v1/jobs/{id}/vtt # VTT editing and updates
GET /api/v1/jobs/{id}/downloads # Signed download URLs
WS /api/v1/ws/jobs/{id} # Real-time job status updates
OpenAPI Documentation: http://localhost:8000/docs (native setup) or http://localhost:8003/docs (Docker setup)
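As a rough illustration of calling the endpoints listed above from Python, the sketch below builds the login and job-listing requests with `urllib`. The request body fields (`email`, `password`) and the `Authorization: Bearer` header are assumptions about the API contract, not confirmed by this README; check the OpenAPI docs for the actual schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # use port 8003 for the Docker setup


def login_request(email: str, password: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /api/v1/auth/login request (body shape is assumed)."""
    body = json.dumps({"email": email, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_jobs_request(access_token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET /api/v1/jobs request using a bearer token (assumed scheme)."""
    return urllib.request.Request(
        f"{base_url}/api/v1/jobs",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

Either request can be sent with `urllib.request.urlopen(request)` once the API is running locally.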
🎯 Development Status
✅ Completed (Production Ready)
- User Management: Full authentication, RBAC, password management
- Job Pipeline: Complete video processing workflow with state machine
- Quality Control: VTT editor, approval workflows, reviewer dashboards
- Real-time Features: WebSocket updates, live notifications
- Multi-language: Translation pipeline with cultural transcreation
- File Management: Secure uploads, downloads, asset validation
- Admin Features: User management, system monitoring, audit logs
⚠️ Needs Attention (Minor)
- Integration Tests: Framework exists but needs completion
- Email Templates: Service implemented, templates may need customization
- Performance Testing: No load testing implemented yet
- Documentation: API docs complete, user guides could be enhanced
🎯 Recommended Next Steps
- Complete integration test suite for end-to-end validation
- Performance testing with realistic video processing loads
- Production deployment configuration and CI/CD pipeline
- User documentation and training materials
- Monitoring dashboards for production operations
📚 Development Resources
- Complete Specification: video_accessibility_development_plan.txt
- Development Guidelines: CLAUDE.md
- API Documentation: http://localhost:8000/docs (when running)
- Test Coverage Reports: backend/htmlcov/ (after running tests)