video-accessibility/README.md
Vadym Samoilenko a3b300b76a docs: add canonical documentation + audit cleanup
- AGENTS.md: canonical project entry point (Quick Nav, pipeline, constraints)
- docs/: complete docs tree — architecture, API spec, DB schema, infra,
  runbook, requirements, tech stack, principles, reference ADRs, guides,
  tasks backlog, testing strategy
- tests/README.md: test commands, structure, known gaps
- README.md / CLAUDE.md / DEPLOYMENT.md: updated with canonical doc links
- .archive/: backup of pre-documentation-pipeline originals
- backend/uv.lock: uv dependency lockfile
- Delete committed __pycache__ .pyc files (should have been gitignored)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:22:51 +01:00


# Accessible Video Processing Platform
An AI-powered platform for generating accessible video content: closed captions, audio descriptions, and multi-language translations. It covers the complete workflow from video upload to final delivery, including quality control review.
**Documentation:** See [AGENTS.md](AGENTS.md) for full navigation, or [docs/README.md](docs/README.md) for the documentation hub.
## ✅ Current Status: **Production-Ready** (85% Complete)
**Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend)
## 🚀 Key Features Implemented
### Core Functionality ✅
- **AI-Powered Processing**: Complete Gemini 2.5 Pro integration for intelligent caption and audio description generation
- **Multi-Language Pipeline**: Google Translate + cultural transcreation with 50+ language support
- **Quality Control Workflow**: Full reviewer approval/rejection system with VTT editing capabilities
- **Audio Description TTS**: Google Cloud TTS and ElevenLabs integration with audio synthesis
- **Real-time Updates**: WebSocket-powered job status tracking and notifications
- **Advanced Video Player**: Multi-language caption support with timeline navigation
- **Role-Based Access Control**: Complete CLIENT/REVIEWER/ADMIN role system
### Security & Infrastructure ✅
- **JWT Authentication**: Secure access/refresh token system with HttpOnly cookies
- **Audit Logging**: Comprehensive audit trail for all reviewer actions
- **Signed URLs**: Secure Google Cloud Storage file access (24h expiry)
- **Input Validation**: Complete request validation and sanitization
- **HTTPS/CORS**: Production-ready security configuration
### User Experience ✅
- **Responsive Design**: Mobile-first Tailwind CSS implementation
- **Real-time Feedback**: Live job progress tracking and notifications
- **Advanced File Management**: Drag-and-drop uploads with progress indicators
- **VTT Editor**: Inline caption editing with live preview
- **Download Portal**: Secure asset delivery with organized file structure
## 🛠 Tech Stack
### Backend (FastAPI + Python 3.11)
- **FastAPI 0.115.0** - Modern async web framework with OpenAPI documentation
- **Celery 5.3.4** - Distributed task queue with Redis broker
- **MongoDB 7.0** - Document database with replica set support
- **Redis 7.2** - Caching and message queuing
- **Google Cloud Platform** - Storage, AI services, Secret Manager, TTS
- **Pydantic 2.5** - Data validation and serialization
- **OpenTelemetry** - Observability and monitoring
- **Sentry** - Error tracking and performance monitoring
### Frontend (React 19 + TypeScript)
- **React 19.1.1** - Modern UI framework with latest features
- **Vite 7.1.2** - Lightning-fast build tool and dev server
- **TypeScript 5.8** - Full type safety throughout application
- **TanStack Query 5.85** - Advanced server state management with caching
- **React Router 7.8** - Client-side routing with protected routes
- **Tailwind CSS 4.1** - Utility-first CSS framework
- **Zustand 5.0** - Lightweight client state management
- **React Hook Form + Zod** - Form handling with schema validation
## 🏗 Architecture Overview
### Complete Job Processing Pipeline ✅
```
Upload  → Ingestion  → AI Processing  → QC Review → Translation → TTS      → Final Review → Delivery
↓         ↓            ↓                ↓           ↓             ↓          ↓              ↓
GCS       Gemini 2.5   VTT Generation   Human       Google        Text-to-   Reviewer       Email +
Storage   Pro          + Validation     Review      Translate     Speech     Approval       Downloads
```
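The stage order in the diagram can be sketched as a small linear state machine. This is an illustration only: the `JobStage` names and the `next_stage` helper are hypothetical and are not taken from the backend code.

```python
from enum import Enum
from typing import Optional


class JobStage(str, Enum):
    """Pipeline stages named after the diagram (identifiers are assumptions)."""

    UPLOAD = "upload"
    INGESTION = "ingestion"
    AI_PROCESSING = "ai_processing"
    QC_REVIEW = "qc_review"
    TRANSLATION = "translation"
    TTS = "tts"
    FINAL_REVIEW = "final_review"
    DELIVERY = "delivery"


# Enum members iterate in definition order, which matches the diagram.
PIPELINE = list(JobStage)

# Stages that the diagram marks as human review steps.
REVIEW_STAGES = {JobStage.QC_REVIEW, JobStage.FINAL_REVIEW}


def next_stage(current: JobStage) -> Optional[JobStage]:
    """Return the stage that follows `current`, or None after delivery."""
    index = PIPELINE.index(current)
    if index + 1 < len(PIPELINE):
        return PIPELINE[index + 1]
    return None


def needs_human_review(stage: JobStage) -> bool:
    """True when a reviewer has to act before the job can move on."""
    return stage in REVIEW_STAGES
```

For example, `next_stage(JobStage.QC_REVIEW)` returns `JobStage.TRANSLATION`. How a rejected review is handled is not described here, so the sketch leaves it out.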
### System Architecture
- **Monorepo Structure**: `/backend`, `/frontend`, `/infra` with clear separation
- **Microservices Ready**: Modular FastAPI services with proper dependency injection
- **Event-Driven**: WebSocket real-time updates with connection management
- **Scalable Workers**: Celery task queue with auto-retry and error recovery
- **Secure by Design**: RBAC, signed URLs, audit logging, input validation
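To illustrate what "auto-retry and error recovery" typically means for a worker, here is a plain-Python sketch of retrying with exponential backoff. It does not use Celery and does not reflect the project's actual retry settings; `backoff_delays`, `run_with_retry`, and all default values are assumptions.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(
    max_retries: int = 3, base_seconds: float = 2.0, cap_seconds: float = 60.0
) -> List[float]:
    """Delays before each retry: base, 2*base, 4*base, ... capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * (2 ** attempt)) for attempt in range(max_retries)]


def run_with_retry(
    task: Callable[[], T],
    max_retries: int = 3,
    base_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `task`, retrying on any exception up to `max_retries` times."""
    delays = backoff_delays(max_retries, base_seconds)
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # Retries exhausted: surface the original error.
            sleep(delays[attempt])
    raise RuntimeError("unreachable")  # Keeps type checkers satisfied.
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.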
## 🚀 Getting Started
### Prerequisites
- **Python 3.11+** (backend development)
- **Node.js 20.19+** (frontend development; Vite 7 no longer supports Node.js 18)
- **Docker & Docker Compose** (required for local development)
- **Google Cloud Project** with APIs enabled (for video processing)
### 🐳 Local Development with Docker (Recommended)
This is the recommended approach for local development. Backend services run in Docker containers while the frontend runs via Vite dev server for fast hot-reload.
#### Initial Setup
```bash
# 1. Clone the repository
git clone <repository>
cd video_accessibility
# 2. Copy and configure environment files
cp .env.prod.example .env.local
# Edit .env.local with your API keys and settings
# 3. Set up frontend environment
cp frontend/.env.example frontend/.env.local
# The defaults should work for local development
# 4. Ensure GCP credentials are in place
# Copy your GCP service account JSON to: ./secrets/gcp-credentials.json
```
#### Starting the Development Environment
**Step 1: Start Backend Services (Docker)**
```bash
# Start API, Worker, MongoDB, and Redis in Docker
./scripts/run-local.sh
# Services will be available at:
# - API: http://localhost:8003
# - API Docs: http://localhost:8003/docs
# - MongoDB: mongodb://localhost:27017
# - Redis: redis://localhost:6379
```
**Step 2: Start Frontend (Vite Dev Server)**
```bash
# In a separate terminal
cd frontend
npm install # First time only
npm run dev
# Frontend will be available at:
# - Application: http://localhost:6001/video-accessibility
```
#### Useful Commands
```bash
# View logs
docker compose logs -f api # API logs
docker compose logs -f worker # Worker logs
docker compose logs -f # All logs
# Restart a service
docker compose restart api
docker compose restart worker
# Rebuild and restart (after code changes)
./scripts/run-local.sh --rebuild
# Stop all services
./scripts/run-local.sh --stop
# or
docker compose down
```
#### Test User Credentials (Local Development Only)
For testing different user roles locally:
```
Admin: admin@example.com / admin
Production: production@example.com / production
Reviewer: reviewer@example.com / reviewer
Client: client@example.com / client123
```
**Note**: These test users are only for local development. Production uses Microsoft authentication.
### Alternative: Native Development (Without Docker)
For development without Docker, you'll need to run each service manually:
```bash
# Terminal 1: MongoDB
mongod --dbpath ./data/db
# Terminal 2: Redis
redis-server
# Terminal 3: Backend API
cd backend
poetry install
poetry run uvicorn app.main:app --reload --port 8000
# Terminal 4: Celery Worker
cd backend
poetry run celery -A app.tasks worker --loglevel=info
# Terminal 5: Frontend
cd frontend
npm install
npm run dev
```
**Note**: The Docker approach is strongly recommended as it ensures consistency and simplifies setup.
### Testing & Quality
```bash
# Backend tests + linting
cd backend
poetry run pytest
poetry run ruff check .
poetry run mypy .
# Frontend tests + linting
cd frontend
npm run test
npm run test:e2e
npm run lint
npm run type-check
```
## 📁 Project Structure
```
video_accessibility/ # Root monorepo
├── backend/ # FastAPI Python backend (12,198 LOC)
│ ├── app/
│ │ ├── api/v1/ # REST API endpoints
│ │ │ ├── auth.py # JWT authentication
│ │ │ ├── jobs.py # Job CRUD & workflow
│ │ │ ├── admin.py # Admin operations
│ │ │ └── files.py # File management
│ │ ├── core/ # Core configuration
│ │ ├── models/ # Database models
│ │ ├── schemas/ # Pydantic request/response schemas
│ │ ├── services/ # External service integrations
│ │ │ ├── gemini.py # AI processing
│ │ │ ├── gcs.py # Google Cloud Storage
│ │ │ ├── translation.py # Multi-language support
│ │ │ └── tts.py # Text-to-speech
│ │ ├── tasks/ # Celery background workers
│ │ ├── middleware/ # Request processing
│ │ └── telemetry/ # Observability
│ ├── tests/ # Comprehensive test suite
│ └── Dockerfile # Container configuration
├── frontend/ # React TypeScript SPA (8,273 LOC)
│ ├── src/
│ │ ├── routes/ # Page components
│ │ │ ├── auth/ # Login system
│ │ │ ├── jobs/ # Job management
│ │ │ ├── qc/ # Quality control
│ │ │ └── admin/ # Admin interface
│ │ ├── components/ # Reusable UI components
│ │ │ ├── VideoWithCaptions.tsx # Advanced video player
│ │ │ ├── VttEditor.tsx # Caption editing
│ │ │ └── UploadDropzone.tsx # File upload
│ │ ├── lib/ # Utilities and API client
│ │ ├── hooks/ # Custom React hooks
│ │ └── types/ # TypeScript definitions
│ ├── tests/ # Unit + E2E tests
│ ├── .env.local # Local development config
│ └── Dockerfile # Container configuration
├── scripts/
│ ├── run-local.sh # Local development startup
│ ├── deploy.sh # Production deployment
│ ├── full-deploy.sh # Full production rebuild
│ └── build-frontend.sh # Frontend build script
├── docker-compose.yml # Base Docker configuration
├── docker-compose.local.yml # Local development overrides
├── docker-compose.prod.yml # Production overrides
├── .env.local # Local environment variables
├── .env.production # Production environment variables
├── CLAUDE.md # Development guidelines
└── video_accessibility_development_plan.txt # Complete specification
```
## ⚙️ Configuration
### Environment Variables
**Backend** (`backend/.env`):
```bash
# Database
MONGODB_URL=mongodb://admin:password@localhost:27017/accessible_video
REDIS_URL=redis://localhost:6379/0
# Authentication
JWT_SECRET_KEY=your-jwt-secret
JWT_REFRESH_SECRET_KEY=your-refresh-secret
# AI Services
GEMINI_API_KEY=your-gemini-key
ELEVENLABS_API_KEY=your-elevenlabs-key
# Google Cloud
GCS_BUCKET_NAME=your-bucket-name
GOOGLE_CLOUD_PROJECT=your-project-id
# Email
SENDGRID_API_KEY=your-sendgrid-key
# Monitoring
SENTRY_DSN=your-sentry-dsn
```
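A startup check along the following lines can catch missing variables early. This is a hedged sketch: the variable names come from the block above, but which ones are required versus optional is an assumption, and `load_settings` is a hypothetical helper rather than the project's configuration code.

```python
import os
from typing import Dict, Mapping, Optional

# Assumption: these must be set for the backend to start.
REQUIRED_VARS = (
    "MONGODB_URL",
    "REDIS_URL",
    "JWT_SECRET_KEY",
    "JWT_REFRESH_SECRET_KEY",
    "GEMINI_API_KEY",
    "GCS_BUCKET_NAME",
    "GOOGLE_CLOUD_PROJECT",
)
# Assumption: these integrations can be left unconfigured locally.
OPTIONAL_VARS = ("ELEVENLABS_API_KEY", "SENDGRID_API_KEY", "SENTRY_DSN")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Read settings from `env` (defaults to os.environ) and fail fast on gaps."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    settings: Dict[str, Optional[str]] = {name: source[name] for name in REQUIRED_VARS}
    for name in OPTIONAL_VARS:
        settings[name] = source.get(name)  # None when not configured.
    return settings
```

Calling `load_settings()` with no arguments reads the real process environment; passing a dictionary makes the check easy to exercise in tests.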
**Frontend** (`frontend/.env`):
```bash
# Use http://localhost:8003 when the API runs via ./scripts/run-local.sh (Docker)
VITE_API_URL=http://localhost:8000
VITE_SENTRY_DSN=your-sentry-dsn
VITE_ENVIRONMENT=development
```
### Google Cloud Setup
1. **Create GCP Project** with billing enabled
2. **Enable APIs**:
   - Cloud Storage API
   - Cloud Translation API
   - Cloud Text-to-Speech API
   - Vertex AI API (for Gemini)
   - Secret Manager API
3. **Create Service Account** with roles:
   - Storage Admin
   - AI Platform Admin
   - Secret Manager Admin
4. **Download JSON key** and set `GOOGLE_APPLICATION_CREDENTIALS`
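
Before starting the services, it can help to confirm that the key from step 4 is readable and looks like a service account key. The sketch below uses only the standard library; `check_service_account_key` is a hypothetical helper, and it inspects the file's fields only, without contacting Google Cloud.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Fields present in a downloaded service account JSON key.
REQUIRED_KEY_FIELDS = ("type", "project_id", "client_email", "private_key")


def check_service_account_key(path: Optional[str] = None) -> str:
    """Return the key's project_id, or raise ValueError describing the problem."""
    key_path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    key_file = Path(key_path)
    if not key_file.is_file():
        raise ValueError(f"Key file not found: {key_file}")
    data = json.loads(key_file.read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_KEY_FIELDS if field not in data]
    if missing:
        raise ValueError("Key file is missing fields: " + ", ".join(missing))
    if data["type"] != "service_account":
        raise ValueError(f"Unexpected key type: {data['type']!r}")
    return data["project_id"]
```

For the Docker setup described earlier, the path would be `./secrets/gcp-credentials.json`.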
## 🚢 Deployment Options
### Production Architecture (Google Cloud)
- **Frontend**: Cloud Storage + Cloud CDN (static hosting)
- **Backend API**: Cloud Run (serverless, auto-scaling)
- **Workers**: Cloud Run (Celery with Redis)
- **Database**: MongoDB Atlas (managed)
- **Queue**: Cloud Memorystore (Redis)
- **Storage**: Google Cloud Storage
- **Monitoring**: Cloud Monitoring + Sentry
### Docker Production
```bash
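# Build production images and start the stack in the background
# (docker-compose.prod.yml holds overrides on top of the base docker-compose.yml)
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d --build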
```
## 🔒 Security Features
### Implemented Security ✅
- **JWT Authentication**: Access (15min) + refresh (7 days) token rotation
- **RBAC System**: CLIENT/REVIEWER/ADMIN roles with endpoint protection
- **Secure Storage**: HttpOnly cookies for refresh tokens
- **File Security**: Signed URLs with 24h expiry, no client access to raw files
- **Input Validation**: Comprehensive Pydantic validation on all endpoints
- **Audit Logging**: Complete trail of all reviewer actions and system events
- **CORS Protection**: Configured for production domains
- **Rate Limiting**: Request throttling and validation middleware
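
The token lifetimes and role model listed above can be sketched with the standard library alone. This is an illustration under stated assumptions: `Role`, `token_expiry`, and `is_allowed` are hypothetical names, and the real backend presumably wires equivalent logic into FastAPI dependencies and a JWT library.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Iterable

# Lifetimes taken from the list above.
ACCESS_TOKEN_TTL = timedelta(minutes=15)
REFRESH_TOKEN_TTL = timedelta(days=7)


class Role(str, Enum):
    CLIENT = "client"
    REVIEWER = "reviewer"
    ADMIN = "admin"


def token_expiry(issued_at: datetime, kind: str) -> datetime:
    """Return the expiry time for an 'access' or 'refresh' token."""
    if kind == "access":
        return issued_at + ACCESS_TOKEN_TTL
    if kind == "refresh":
        return issued_at + REFRESH_TOKEN_TTL
    raise ValueError(f"Unknown token kind: {kind!r}")


def is_expired(expires_at: datetime, now: datetime) -> bool:
    """A token is expired once `now` reaches its expiry time."""
    return now >= expires_at


def is_allowed(user_role: Role, allowed_roles: Iterable[Role]) -> bool:
    """Endpoint protection: the caller's role must be in the allowed set."""
    return user_role in set(allowed_roles)


if __name__ == "__main__":
    issued = datetime.now(timezone.utc)
    print("access token expires at", token_expiry(issued, "access").isoformat())
    print("client may approve:", is_allowed(Role.CLIENT, [Role.REVIEWER, Role.ADMIN]))
```

Which roles may call which endpoint is not specified in this README, so the allowed set in the usage lines is only an example.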
## 🔧 API Documentation
### Key Endpoints Implemented
```
POST /api/v1/auth/login # Authentication
POST /api/v1/jobs # Create job with file upload
GET /api/v1/jobs # List jobs (filtered by role)
GET /api/v1/jobs/{id} # Job details with real-time status
POST /api/v1/jobs/{id}/actions/* # Workflow actions (approve/reject/complete)
GET /api/v1/jobs/{id}/vtt # VTT content retrieval
PATCH /api/v1/jobs/{id}/vtt # VTT editing and updates
GET /api/v1/jobs/{id}/downloads # Signed download URLs
WS /api/v1/ws/jobs/{id} # Real-time job status updates
```
**OpenAPI Documentation**: http://localhost:8000/docs when the API runs natively, or http://localhost:8003/docs with the Docker setup
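
The VTT endpoints exchange WebVTT caption text. As a hedged illustration of the kind of check a client or reviewer tool might run before sending an edit, the sketch below parses a cue timing line and rejects cues whose end time is not after the start time. The function names are hypothetical, and this is not the validation the backend performs.

```python
import re
from typing import Tuple

# WebVTT timestamps: optional hours, then mm:ss.ttt
_TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$")


def parse_timestamp(text: str) -> float:
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    match = _TIMESTAMP.match(text.strip())
    if match is None:
        raise ValueError(f"Invalid WebVTT timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cue_timing(line: str) -> Tuple[float, float]:
    """Parse 'start --> end [settings]' and return (start, end) in seconds."""
    start_text, arrow, rest = line.partition("-->")
    parts = rest.split()
    if not arrow or not parts:
        raise ValueError(f"Not a cue timing line: {line!r}")
    start = parse_timestamp(start_text)
    end = parse_timestamp(parts[0])  # Anything after the end time is cue settings.
    if end <= start:
        raise ValueError("Cue end time must be after its start time")
    return start, end
```

For example, `parse_cue_timing("00:00:01.000 --> 00:00:04.500 align:start")` returns `(1.0, 4.5)`.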
## 🎯 Development Status
### ✅ Completed (Production Ready)
- **User Management**: Full authentication, RBAC, password management
- **Job Pipeline**: Complete video processing workflow with state machine
- **Quality Control**: VTT editor, approval workflows, reviewer dashboards
- **Real-time Features**: WebSocket updates, live notifications
- **Multi-language**: Translation pipeline with cultural transcreation
- **File Management**: Secure uploads, downloads, asset validation
- **Admin Features**: User management, system monitoring, audit logs
### ⚠️ Needs Attention (Minor)
- **Integration Tests**: Framework exists but needs completion
- **Email Templates**: Service implemented, templates may need customization
- **Performance Testing**: No load testing implemented yet
- **Documentation**: API docs complete, user guides could be enhanced
### 🎯 Recommended Next Steps
1. **Complete integration test suite** for end-to-end validation
2. **Performance testing** with realistic video processing loads
3. **Production deployment** configuration and CI/CD pipeline
4. **User documentation** and training materials
5. **Monitoring dashboards** for production operations
## 📚 Development Resources
- **Complete Specification**: `video_accessibility_development_plan.txt`
- **Development Guidelines**: `CLAUDE.md`
- **API Documentation**: http://localhost:8000/docs (native) or http://localhost:8003/docs (Docker), when running
- **Test Coverage Reports**: `backend/htmlcov/` (after running tests)