docs: add canonical documentation + audit cleanup
- AGENTS.md: canonical project entry point (Quick Nav, pipeline, constraints)
- docs/: complete docs tree (architecture, API spec, DB schema, infra, runbook, requirements, tech stack, principles, reference ADRs, guides, tasks backlog, testing strategy)
- tests/README.md: test commands, structure, known gaps
- README.md / CLAUDE.md / DEPLOYMENT.md: updated with canonical doc links
- .archive/: backup of pre-documentation-pipeline originals
- backend/uv.lock: uv dependency lockfile
- Delete committed __pycache__ .pyc files (should have been gitignored)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent
fd154e7799
commit
a3b300b76a
69 changed files with 4245 additions and 0 deletions
25
.archive/source-docs-2026-04-29/README_cleanup.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
# Source Documentation Archive — 2026-04-29
|
||||
|
||||
## What was archived
|
||||
|
||||
Original non-canonical documentation files backed up before canonical structure was created.
|
||||
|
||||
## Files archived
|
||||
|
||||
| File | Migrated to |
|------|------------|
| `README.md` | Updated in place; canonical docs in `docs/` |
| `DEPLOYMENT.md` | `docs/project/runbook.md` + `docs/project/infrastructure.md` |
| `DEPLOYMENT_OPTIONS.md` | `docs/project/infrastructure.md` |
| `APACHE_DEPLOYMENT.md` | `docs/project/runbook.md` (Apache config section) |
|
||||
|
||||
## Rollback
|
||||
|
||||
To restore the original files, copy them from `original/` back to the project root. The relative paths below assume the commands are run from `.archive/source-docs-2026-04-29/`:

```
cp original/README.md ../../README.md
cp original/DEPLOYMENT.md ../../DEPLOYMENT.md
cp original/DEPLOYMENT_OPTIONS.md ../../DEPLOYMENT_OPTIONS.md
cp original/APACHE_DEPLOYMENT.md ../../APACHE_DEPLOYMENT.md
```
|
||||
236
.archive/source-docs-2026-04-29/original/APACHE_DEPLOYMENT.md
Normal file
|
|
@ -0,0 +1,236 @@
|
|||
# Apache Frontend + Docker Backend Deployment Guide
|
||||
|
||||
## 🏗 Architecture Overview
|
||||
|
||||
**Frontend**: Built React app served by your existing Apache webserver
|
||||
**Backend**: Docker containers running FastAPI + workers + database
|
||||
|
||||
```
Apache Webserver (Frontend)  →  Docker Backend Services
└── Built React App             ├── FastAPI API (:8000)
                                ├── Celery Workers
                                ├── Change Stream Service
                                ├── MongoDB
                                └── Redis
```
|
||||
|
||||
## 🚀 Deployment Steps
|
||||
|
||||
### 1. **Deploy Backend Services**
|
||||
|
||||
```bash
|
||||
# 1. Create production environment file
|
||||
cp .env.prod.example .env.prod
|
||||
# Edit .env.prod with your production values
|
||||
|
||||
# 2. Start backend services only
|
||||
docker-compose -f docker-compose.prod.yml up -d
|
||||
|
||||
# 3. Verify services are running
|
||||
docker-compose -f docker-compose.prod.yml ps
|
||||
```
|
||||
|
||||
**Running Services:**
|
||||
- `accessible-video-api-prod` - FastAPI API (port 8000)
|
||||
- `accessible-video-worker-prod` - Celery workers
|
||||
- `accessible-video-mongo-prod` - MongoDB database
|
||||
- `accessible-video-redis-prod` - Redis cache/queue
|
||||
|
||||
### 2. **Build and Deploy Frontend to Apache**
|
||||
|
||||
```bash
|
||||
# 1. Configure frontend environment
|
||||
cd frontend
|
||||
cp .env.example .env.production.local
|
||||
|
||||
# Edit .env.production.local:
|
||||
# VITE_API_URL=https://your-api-domain.com:8000
|
||||
# VITE_SENTRY_DSN=your-sentry-dsn
|
||||
# VITE_ENVIRONMENT=production
|
||||
|
||||
# 2. Build production frontend
|
||||
npm run build
|
||||
|
||||
# 3. Deploy to Apache document root
|
||||
sudo cp -r dist/* /var/www/html/your-app/
|
||||
# OR
|
||||
sudo rsync -av --delete dist/ /var/www/html/your-app/
|
||||
```
|
||||
|
||||
### 3. **Configure Apache Virtual Host**
|
||||
|
||||
Create `/etc/apache2/sites-available/your-app.conf`:
|
||||
|
||||
```apache
|
||||
<VirtualHost *:443>
|
||||
ServerName your-domain.com
|
||||
ServerAlias www.your-domain.com
|
||||
DocumentRoot /var/www/html/your-app
|
||||
|
||||
# SSL Configuration
|
||||
SSLEngine on
|
||||
SSLCertificateFile /path/to/your/certificate.crt
|
||||
SSLCertificateKeyFile /path/to/your/private.key
|
||||
|
||||
# Security Headers
|
||||
Header always set X-Frame-Options "SAMEORIGIN"
|
||||
Header always set X-Content-Type-Options "nosniff"
|
||||
Header always set X-XSS-Protection "1; mode=block"
|
||||
Header always set Referrer-Policy "strict-origin-when-cross-origin"
|
||||
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
|
||||
|
||||
# Compression
|
||||
<IfModule mod_deflate.c>
|
||||
AddOutputFilterByType DEFLATE text/plain
|
||||
AddOutputFilterByType DEFLATE text/html
|
||||
AddOutputFilterByType DEFLATE text/xml
|
||||
AddOutputFilterByType DEFLATE text/css
|
||||
AddOutputFilterByType DEFLATE application/xml
|
||||
AddOutputFilterByType DEFLATE application/xhtml+xml
|
||||
AddOutputFilterByType DEFLATE application/rss+xml
|
||||
AddOutputFilterByType DEFLATE application/javascript
|
||||
AddOutputFilterByType DEFLATE application/x-javascript
|
||||
</IfModule>
|
||||
|
||||
# Caching for static assets
|
||||
<LocationMatch "\.(css|js|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$">
|
||||
ExpiresActive On
|
||||
ExpiresDefault "access plus 1 year"
|
||||
Header set Cache-Control "public, max-age=31536000, immutable"
|
||||
</LocationMatch>
|
||||
|
||||
# Don't cache HTML files
|
||||
<LocationMatch "\.html$">
|
||||
ExpiresActive On
|
||||
ExpiresDefault "access plus 0 seconds"
|
||||
Header set Cache-Control "no-cache, no-store, must-revalidate"
|
||||
</LocationMatch>
|
||||
|
||||
# React Router support (handle client-side routing)
|
||||
<Directory "/var/www/html/your-app">
|
||||
Options -Indexes
|
||||
AllowOverride All
|
||||
Require all granted
|
||||
|
||||
# Fallback to index.html for client-side routing
|
||||
FallbackResource /index.html
|
||||
</Directory>
|
||||
|
||||
# Optional: Proxy API requests (alternative to CORS)
|
||||
# ProxyPreserveHost On
|
||||
# ProxyPass /api/ http://your-docker-host:8000/api/
|
||||
# ProxyPassReverse /api/ http://your-docker-host:8000/api/
|
||||
|
||||
# Logs
|
||||
ErrorLog ${APACHE_LOG_DIR}/your-app_error.log
|
||||
CustomLog ${APACHE_LOG_DIR}/your-app_access.log combined
|
||||
</VirtualHost>
|
||||
|
||||
# HTTP to HTTPS redirect
|
||||
<VirtualHost *:80>
|
||||
ServerName your-domain.com
|
||||
ServerAlias www.your-domain.com
|
||||
Redirect permanent / https://your-domain.com/
|
||||
</VirtualHost>
|
||||
```
|
||||
|
||||
Enable the site:
|
||||
```bash
|
||||
sudo a2ensite your-app.conf
|
||||
sudo systemctl reload apache2
|
||||
```
|
||||
|
||||
## ⚙️ Configuration Files Updated
|
||||
|
||||
### `docker-compose.prod.yml`
|
||||
- ✅ Removed frontend and nginx services
|
||||
- ✅ Added CORS_ORIGINS environment variable
|
||||
- ✅ Backend services only (API, workers, database)
|
||||
|
||||
### `.env.prod.example`
|
||||
- ✅ Production environment template
|
||||
- ✅ CORS configuration for Apache frontend
|
||||
- ✅ All required variables documented
|
||||
|
||||
## 🔧 CORS Configuration
|
||||
|
||||
Since the frontend and backend are served from different origins (a different host or port), configure CORS in your backend:
|
||||
|
||||
**In `.env.prod`:**
|
||||
```bash
|
||||
CORS_ORIGINS=https://your-domain.com,https://www.your-domain.com
|
||||
```
|
||||
|
||||
**Backend automatically handles CORS** based on this environment variable.
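How the backend reads this variable is not shown in this guide. A minimal sketch, assuming the value is a comma-separated list and that the parsed origins are handed to FastAPI's `CORSMiddleware`; the helper name `parse_cors_origins` is hypothetical:

```python
import os


def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated CORS_ORIGINS value into clean origins.

    Hypothetical helper: the real backend may parse the variable differently.
    """
    origins = []
    for item in raw.split(","):
        origin = item.strip().rstrip("/")  # tolerate spaces and trailing slashes
        if origin and origin not in origins:
            origins.append(origin)
    return origins


# Read the variable as defined in .env.prod (empty list if it is unset).
allowed_origins = parse_cors_origins(os.environ.get("CORS_ORIGINS", ""))

# With FastAPI installed, the list could then be passed to the middleware:
# app.add_middleware(CORSMiddleware, allow_origins=allowed_origins,
#                    allow_credentials=True, allow_methods=["*"], allow_headers=["*"])
```

Browsers compare the `Origin` header exactly, so each entry should be scheme plus host (plus port if non-default) with no path or trailing slash.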
|
||||
|
||||
## 📋 Deployment Checklist
|
||||
|
||||
### Backend Services
|
||||
- [ ] Copy `.env.prod.example` to `.env.prod`
|
||||
- [ ] Update all environment variables in `.env.prod`
|
||||
- [ ] Run `docker-compose -f docker-compose.prod.yml up -d`
|
||||
- [ ] Verify API accessible at `http://your-docker-host:8000/docs`
|
||||
- [ ] Check logs: `docker-compose -f docker-compose.prod.yml logs -f`
|
||||
|
||||
### Frontend Deployment
|
||||
- [ ] Update `frontend/.env.production.local` with API URL
|
||||
- [ ] Run `npm run build` in frontend directory
|
||||
- [ ] Copy `dist/*` to Apache document root
|
||||
- [ ] Configure Apache virtual host
|
||||
- [ ] Enable site and reload Apache
|
||||
- [ ] Test frontend loads and connects to API
|
||||
|
||||
### Security & Performance
|
||||
- [ ] SSL certificate configured
|
||||
- [ ] Security headers enabled
|
||||
- [ ] Gzip compression enabled
|
||||
- [ ] Static file caching configured
|
||||
- [ ] CORS origins properly set
|
||||
- [ ] Firewall rules: only expose port 8000 for API
|
||||
|
||||
## 🔍 Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
**CORS Errors:**
|
||||
- Verify `CORS_ORIGINS` in `.env.prod` matches your domain
|
||||
- Check browser dev tools for exact error
|
||||
|
||||
**API Connection Failed:**
|
||||
- Verify `VITE_API_URL` in frontend build
|
||||
- Check backend API is accessible from frontend server
|
||||
- Ensure port 8000 is open and reachable
|
||||
|
||||
**React Router 404s:**
|
||||
- Verify `FallbackResource /index.html` in Apache config
|
||||
- If routing rules live in a `.htaccess` file rather than in the virtual host, ensure `AllowOverride All` is set
|
||||
|
||||
**File Upload Issues:**
|
||||
- Check Apache `LimitRequestBody` directive
|
||||
- Verify backend can write to GCS bucket
|
||||
|
||||
### Monitoring Commands
|
||||
|
||||
```bash
|
||||
# Backend services status
|
||||
docker-compose -f docker-compose.prod.yml ps
|
||||
|
||||
# View logs
|
||||
docker-compose -f docker-compose.prod.yml logs -f api
|
||||
docker-compose -f docker-compose.prod.yml logs -f worker
|
||||
|
||||
# Apache status
|
||||
sudo systemctl status apache2
|
||||
sudo tail -f /var/log/apache2/your-app_error.log
|
||||
```
|
||||
|
||||
## 🎯 Benefits of This Setup
|
||||
|
||||
✅ **Separation of Concerns** - Frontend and backend independently deployable
|
||||
✅ **Existing Infrastructure** - Uses your current Apache setup
|
||||
✅ **Scalability** - Backend can be moved to different hosts easily
|
||||
✅ **Caching** - Apache handles static file caching efficiently
|
||||
✅ **SSL Termination** - Apache handles HTTPS for frontend
|
||||
✅ **Monitoring** - Separate logs and monitoring for each tier
|
||||
|
||||
Your backend services will run in Docker containers while the frontend integrates seamlessly with your existing Apache web server infrastructure.
|
||||
BIN
.archive/source-docs-2026-04-29/original/DEPLOYMENT.md
Normal file
Binary file not shown.
168
.archive/source-docs-2026-04-29/original/DEPLOYMENT_OPTIONS.md
Normal file
|
|
@ -0,0 +1,168 @@
|
|||
# Deployment Options for Video Accessibility Platform
|
||||
|
||||
## 🏗 Current Docker Setup
|
||||
|
||||
Your `docker-compose.yml` serves **both frontend and backend** in **development mode**:
|
||||
|
||||
- **Frontend**: Vite dev server on port 5173 (hot reload)
|
||||
- **Backend**: FastAPI on port 8000 (auto-reload)
|
||||
- **Database**: MongoDB + Redis
|
||||
- **Workers**: Celery + Change Stream service
|
||||
|
||||
## 🚀 Production Deployment Options
|
||||
|
||||
### 1. **All-in-Docker Production** ✅ Recommended
|
||||
|
||||
**What it does:**
|
||||
- Frontend: Built React app served by Nginx (port 80)
|
||||
- Backend: Production FastAPI (port 8000)
|
||||
- Single `docker-compose up` deployment
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
# Production deployment
|
||||
docker-compose -f docker-compose.prod.yml up -d
|
||||
|
||||
# Access:
|
||||
# Frontend: http://localhost:80
|
||||
# Backend API: http://localhost:8000
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- ✅ Single command deployment
|
||||
- ✅ Optimized frontend build
|
||||
- ✅ Production-ready configuration
|
||||
- ✅ Built-in health checks
|
||||
- ✅ Nginx caching and compression
|
||||
|
||||
### 2. **Single Domain with Nginx Proxy** ✅ Best UX
|
||||
|
||||
**What it does:**
|
||||
- Everything served from one domain (port 80)
|
||||
- `/api/*` routes to backend
|
||||
- `/*` routes to frontend
|
||||
- WebSocket support included
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
# Uses nginx/nginx.conf for routing
|
||||
docker-compose -f docker-compose.prod.yml up nginx
|
||||
|
||||
# Access everything at: http://localhost
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- ✅ No CORS issues
|
||||
- ✅ Single domain simplicity
|
||||
- ✅ Better caching control
|
||||
- ✅ Rate limiting built-in
|
||||
- ✅ SSL termination ready
|
||||
|
||||
### 3. **Cloud-Native (Google Cloud)** 🌟 Enterprise
|
||||
|
||||
**Architecture:**
|
||||
```
|
||||
Frontend (Cloud Storage + CDN) → API (Cloud Run) → Database (MongoDB Atlas)
|
||||
↓
|
||||
Workers (Cloud Run)
|
||||
```
|
||||
|
||||
**Components:**
|
||||
- **Frontend**: Build + deploy to Cloud Storage, serve via Cloud CDN
|
||||
- **Backend**: Deploy to Cloud Run (auto-scaling)
|
||||
- **Workers**: Separate Cloud Run service for Celery
|
||||
- **Database**: MongoDB Atlas (managed)
|
||||
- **Files**: Google Cloud Storage (already integrated)
|
||||
|
||||
**Benefits:**
|
||||
- ✅ Auto-scaling
|
||||
- ✅ Global CDN
|
||||
- ✅ Managed services
|
||||
- ✅ Pay-per-use
|
||||
- ✅ High availability
|
||||
|
||||
## 📊 Comparison Matrix
|
||||
|
||||
| Option | Complexity | Cost | Scalability | Maintenance |
|--------|------------|------|-------------|-------------|
| **Dev Docker** | Low | Very Low | Limited | Manual |
| **Prod Docker** | Low | Low | Manual | Medium |
| **Nginx Proxy** | Medium | Low | Manual | Medium |
| **Cloud Native** | High | Variable | Automatic | Low |
|
||||
|
||||
## 🚀 Quick Migration Guide
|
||||
|
||||
### From Development → Production Docker
|
||||
|
||||
1. **Update environment variables:**
|
||||
```bash
|
||||
cp .env.example .env.prod
|
||||
# Edit .env.prod with production values
|
||||
```
|
||||
|
||||
2. **Deploy:**
|
||||
```bash
|
||||
docker-compose -f docker-compose.prod.yml up -d
|
||||
```
|
||||
|
||||
3. **Verify:**
|
||||
```bash
|
||||
# Frontend (optimized build)
|
||||
curl http://localhost:80
|
||||
|
||||
# Backend API
|
||||
curl http://localhost:8000/health
|
||||
```
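For scripted deployments, the same checks can be automated. A minimal sketch using only the Python standard library; the function name `wait_for_http_ok`, the retry count, and the delay are assumptions, and the URLs are the ones from the curl commands above:

```python
import time
import urllib.error
import urllib.request


def wait_for_http_ok(url: str, attempts: int = 5, delay: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 2xx, or give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if 200 <= response.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not up yet, or the request failed; retry below
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    checks = [("frontend", "http://localhost:80"),
              ("backend", "http://localhost:8000/health")]
    for name, url in checks:
        print(name, "ok" if wait_for_http_ok(url) else "unreachable")
```

Retrying matters here because containers started with `up -d` may take a few seconds before they accept connections.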
|
||||
|
||||
### From Docker → Cloud Native
|
||||
|
||||
1. **Build frontend:**
|
||||
```bash
|
||||
cd frontend && npm run build
|
||||
gsutil -m rsync -r -d dist/ gs://your-bucket/
|
||||
```
|
||||
|
||||
2. **Deploy backend:**
|
||||
```bash
|
||||
gcloud run deploy video-api --source=./backend --region=us-central1
|
||||
```
|
||||
|
||||
3. **Deploy workers:**
|
||||
```bash
|
||||
gcloud run deploy video-workers --source=./backend --region=us-central1
|
||||
```
|
||||
|
||||
## 🔧 Configuration Files Created
|
||||
|
||||
### `docker-compose.prod.yml`
|
||||
- Production-ready Docker setup
|
||||
- Nginx serving frontend
|
||||
- Optimized environment variables
|
||||
- Health checks included
|
||||
|
||||
### `nginx/nginx.conf`
|
||||
- Single-domain routing configuration
|
||||
- API proxy with rate limiting
|
||||
- WebSocket support
|
||||
- Static file caching
|
||||
- Security headers
|
||||
|
||||
## 🎯 Recommendations by Use Case
|
||||
|
||||
### **Small Team / MVP**
|
||||
→ Use **Production Docker** (`docker-compose.prod.yml`)
|
||||
|
||||
### **Growing Business**
|
||||
→ Use **Nginx Proxy** setup for better performance
|
||||
|
||||
### **Enterprise / Scale**
|
||||
→ Go **Cloud Native** with Google Cloud Run + CDN
|
||||
|
||||
## 🔍 Current Status
|
||||
|
||||
✅ **Development**: Already working with `docker-compose up`
|
||||
✅ **Production Docker**: Ready with `docker-compose.prod.yml`
|
||||
✅ **Nginx Proxy**: Configured and ready to deploy
|
||||
⚠️ **Cloud Native**: Requires GCP setup and configuration
|
||||
|
||||
Your current Docker setup is **development-optimized**. For production, use the new `docker-compose.prod.yml` which properly builds and serves the React app through Nginx while keeping the backend API separate but coordinated.
|
||||
384
.archive/source-docs-2026-04-29/original/README.md
Normal file
|
|
@ -0,0 +1,384 @@
|
|||
# Accessible Video Processing Platform
|
||||
|
||||
An AI-powered platform for generating accessible video content: closed captions, audio descriptions, and multi-language translations. It covers the complete workflow from video upload to final delivery, including quality control.
|
||||
|
||||
## ✅ Current Status: **Production-Ready** (85% Complete)
|
||||
|
||||
**Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend)
|
||||
|
||||
## 🚀 Key Features Implemented
|
||||
|
||||
### Core Functionality ✅
|
||||
- **AI-Powered Processing**: Complete Gemini 2.5 Pro integration for intelligent caption and audio description generation
|
||||
- **Multi-Language Pipeline**: Google Translate + cultural transcreation with 50+ language support
|
||||
- **Quality Control Workflow**: Full reviewer approval/rejection system with VTT editing capabilities
|
||||
- **Audio Description TTS**: Google Cloud TTS and ElevenLabs integration with audio synthesis
|
||||
- **Real-time Updates**: WebSocket-powered job status tracking and notifications
|
||||
- **Advanced Video Player**: Multi-language caption support with timeline navigation
|
||||
- **Role-Based Access Control**: Complete CLIENT/REVIEWER/ADMIN role system
|
||||
|
||||
### Security & Infrastructure ✅
|
||||
- **JWT Authentication**: Secure access/refresh token system with HttpOnly cookies
|
||||
- **Audit Logging**: Comprehensive audit trail for all reviewer actions
|
||||
- **Signed URLs**: Secure Google Cloud Storage file access (24h expiry)
|
||||
- **Input Validation**: Complete request validation and sanitization
|
||||
- **HTTPS/CORS**: Production-ready security configuration
|
||||
|
||||
### User Experience ✅
|
||||
- **Responsive Design**: Mobile-first Tailwind CSS implementation
|
||||
- **Real-time Feedback**: Live job progress tracking and notifications
|
||||
- **Advanced File Management**: Drag-and-drop uploads with progress indicators
|
||||
- **VTT Editor**: Inline caption editing with live preview
|
||||
- **Download Portal**: Secure asset delivery with organized file structure
|
||||
|
||||
## 🛠 Tech Stack
|
||||
|
||||
### Backend (FastAPI + Python 3.11)
|
||||
- **FastAPI 0.115.0** - Modern async web framework with OpenAPI documentation
|
||||
- **Celery 5.3.4** - Distributed task queue with Redis broker
|
||||
- **MongoDB 7.0** - Document database with replica set support
|
||||
- **Redis 7.2** - Caching and message queuing
|
||||
- **Google Cloud Platform** - Storage, AI services, Secret Manager, TTS
|
||||
- **Pydantic 2.5** - Data validation and serialization
|
||||
- **OpenTelemetry** - Observability and monitoring
|
||||
- **Sentry** - Error tracking and performance monitoring
|
||||
|
||||
### Frontend (React 19 + TypeScript)
|
||||
- **React 19.1.1** - Modern UI framework with latest features
|
||||
- **Vite 7.1.2** - Lightning-fast build tool and dev server
|
||||
- **TypeScript 5.8** - Full type safety throughout application
|
||||
- **TanStack Query 5.85** - Advanced server state management with caching
|
||||
- **React Router 7.8** - Client-side routing with protected routes
|
||||
- **Tailwind CSS 4.1** - Utility-first CSS framework
|
||||
- **Zustand 5.0** - Lightweight client state management
|
||||
- **React Hook Form + Zod** - Form handling with schema validation
|
||||
|
||||
## 🏗 Architecture Overview
|
||||
|
||||
### Complete Job Processing Pipeline ✅
|
||||
```
Upload  → Ingestion  → AI Processing  → QC Review → Translation → TTS      → Final Review → Delivery
↓         ↓            ↓                ↓           ↓             ↓          ↓              ↓
GCS       Gemini 2.5   VTT Generation   Human       Google        Text-to-   Reviewer       Email +
Storage   Pro          + Validation     Review      Translate     Speech     Approval       Downloads
```
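The stage order above can be modeled as a small linear state machine. This is an illustrative sketch only: the `Stage` enum, its member names, and the helper functions are hypothetical and do not come from the backend code, and the reviewer rejection paths mentioned elsewhere in this README are not modeled.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Pipeline stages in the order shown in the diagram (names are illustrative)."""
    UPLOAD = "Upload"
    INGESTION = "Ingestion"
    AI_PROCESSING = "AI Processing"
    QC_REVIEW = "QC Review"
    TRANSLATION = "Translation"
    TTS = "TTS"
    FINAL_REVIEW = "Final Review"
    DELIVERY = "Delivery"


ORDER = list(Stage)  # Enum iteration preserves definition order


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after Delivery."""
    index = ORDER.index(current)
    return ORDER[index + 1] if index + 1 < len(ORDER) else None


def can_transition(current: Stage, target: Stage) -> bool:
    """Only a single forward step is allowed in this simplified model."""
    return next_stage(current) is target


if __name__ == "__main__":
    stage: Optional[Stage] = Stage.UPLOAD
    while stage is not None:
        print(stage.value)
        stage = next_stage(stage)
```

Keeping the allowed transitions in one place makes it harder for a worker or API handler to move a job into a stage it has not reached yet.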
|
||||
|
||||
### System Architecture
|
||||
- **Monorepo Structure**: `/backend`, `/frontend`, `/infra` with clear separation
|
||||
- **Microservices Ready**: Modular FastAPI services with proper dependency injection
|
||||
- **Event-Driven**: WebSocket real-time updates with connection management
|
||||
- **Scalable Workers**: Celery task queue with auto-retry and error recovery
|
||||
- **Secure by Design**: RBAC, signed URLs, audit logging, input validation
|
||||
|
||||
## 🚀 Getting Started
|
||||
|
||||
### Prerequisites
|
||||
- **Python 3.11+** (backend development)
|
||||
- **Node.js 18+** (frontend development)
|
||||
- **Docker & Docker Compose** (required for local development)
|
||||
- **Google Cloud Project** with APIs enabled (for video processing)
|
||||
|
||||
### 🐳 Local Development with Docker (Recommended)
|
||||
|
||||
This is the recommended approach for local development. Backend services run in Docker containers while the frontend runs via Vite dev server for fast hot-reload.
|
||||
|
||||
#### Initial Setup
|
||||
```bash
|
||||
# 1. Clone the repository
|
||||
git clone <repository>
|
||||
cd video_accessibility
|
||||
|
||||
# 2. Copy and configure environment files
|
||||
cp .env.prod.example .env.local
|
||||
# Edit .env.local with your API keys and settings
|
||||
|
||||
# 3. Set up frontend environment
|
||||
cp frontend/.env.example frontend/.env.local
|
||||
# The defaults should work for local development
|
||||
|
||||
# 4. Ensure GCP credentials are in place
|
||||
# Copy your GCP service account JSON to: ./secrets/gcp-credentials.json
|
||||
```
|
||||
|
||||
#### Starting the Development Environment
|
||||
|
||||
**Step 1: Start Backend Services (Docker)**
|
||||
```bash
|
||||
# Start API, Worker, MongoDB, and Redis in Docker
|
||||
./scripts/run-local.sh
|
||||
|
||||
# Services will be available at:
|
||||
# - API: http://localhost:8003
|
||||
# - API Docs: http://localhost:8003/docs
|
||||
# - MongoDB: mongodb://localhost:27017
|
||||
# - Redis: redis://localhost:6379
|
||||
```
|
||||
|
||||
**Step 2: Start Frontend (Vite Dev Server)**
|
||||
```bash
|
||||
# In a separate terminal
|
||||
cd frontend
|
||||
npm install # First time only
|
||||
npm run dev
|
||||
|
||||
# Frontend will be available at:
|
||||
# - Application: http://localhost:6001/video-accessibility
|
||||
```
|
||||
|
||||
#### Useful Commands
|
||||
```bash
|
||||
# View logs
|
||||
docker compose logs -f api # API logs
|
||||
docker compose logs -f worker # Worker logs
|
||||
docker compose logs -f # All logs
|
||||
|
||||
# Restart a service
|
||||
docker compose restart api
|
||||
docker compose restart worker
|
||||
|
||||
# Rebuild and restart (after code changes)
|
||||
./scripts/run-local.sh --rebuild
|
||||
|
||||
# Stop all services
|
||||
./scripts/run-local.sh --stop
|
||||
# or
|
||||
docker compose down
|
||||
```
|
||||
|
||||
#### Test User Credentials (Local Development Only)
|
||||
|
||||
For testing different user roles locally:
|
||||
|
||||
```
|
||||
Admin: admin@example.com / admin
|
||||
Production: production@example.com / production
|
||||
Reviewer: reviewer@example.com / reviewer
|
||||
Client: client@example.com / client123
|
||||
```
|
||||
|
||||
**Note**: These test users are only for local development. Production uses Microsoft authentication.
|
||||
|
||||
### Alternative: Native Development (Without Docker)
|
||||
|
||||
For development without Docker, you'll need to run each service manually:
|
||||
|
||||
```bash
|
||||
# Terminal 1: MongoDB
|
||||
mongod --dbpath ./data/db
|
||||
|
||||
# Terminal 2: Redis
|
||||
redis-server
|
||||
|
||||
# Terminal 3: Backend API
|
||||
cd backend
|
||||
poetry install
|
||||
poetry run uvicorn app.main:app --reload --port 8000
|
||||
|
||||
# Terminal 4: Celery Worker
|
||||
cd backend
|
||||
poetry run celery -A app.tasks worker --loglevel=info
|
||||
|
||||
# Terminal 5: Frontend
|
||||
cd frontend
|
||||
npm install
|
||||
npm run dev
|
||||
```
|
||||
|
||||
**Note**: The Docker approach is strongly recommended as it ensures consistency and simplifies setup.
|
||||
|
||||
### Testing & Quality
|
||||
```bash
|
||||
# Backend tests + linting
|
||||
cd backend
|
||||
poetry run pytest
|
||||
poetry run ruff check .
|
||||
poetry run mypy .
|
||||
|
||||
# Frontend tests + linting
|
||||
cd frontend
|
||||
npm run test
|
||||
npm run test:e2e
|
||||
npm run lint
|
||||
npm run type-check
|
||||
```
|
||||
|
||||
## 📁 Project Structure
|
||||
|
||||
```
|
||||
video_accessibility/ # Root monorepo
|
||||
├── backend/ # FastAPI Python backend (12,198 LOC)
|
||||
│ ├── app/
|
||||
│ │ ├── api/v1/ # REST API endpoints
|
||||
│ │ │ ├── auth.py # JWT authentication
|
||||
│ │ │ ├── jobs.py # Job CRUD & workflow
|
||||
│ │ │ ├── admin.py # Admin operations
|
||||
│ │ │ └── files.py # File management
|
||||
│ │ ├── core/ # Core configuration
|
||||
│ │ ├── models/ # Database models
|
||||
│ │ ├── schemas/ # Pydantic request/response schemas
|
||||
│ │ ├── services/ # External service integrations
|
||||
│ │ │ ├── gemini.py # AI processing
|
||||
│ │ │ ├── gcs.py # Google Cloud Storage
|
||||
│ │ │ ├── translation.py # Multi-language support
|
||||
│ │ │ └── tts.py # Text-to-speech
|
||||
│ │ ├── tasks/ # Celery background workers
|
||||
│ │ ├── middleware/ # Request processing
|
||||
│ │ └── telemetry/ # Observability
|
||||
│ ├── tests/ # Comprehensive test suite
|
||||
│ └── Dockerfile # Container configuration
|
||||
├── frontend/ # React TypeScript SPA (8,273 LOC)
|
||||
│ ├── src/
|
||||
│ │ ├── routes/ # Page components
|
||||
│ │ │ ├── auth/ # Login system
|
||||
│ │ │ ├── jobs/ # Job management
|
||||
│ │ │ ├── qc/ # Quality control
|
||||
│ │ │ └── admin/ # Admin interface
|
||||
│ │ ├── components/ # Reusable UI components
|
||||
│ │ │ ├── VideoWithCaptions.tsx # Advanced video player
|
||||
│ │ │ ├── VttEditor.tsx # Caption editing
|
||||
│ │ │ └── UploadDropzone.tsx # File upload
|
||||
│ │ ├── lib/ # Utilities and API client
|
||||
│ │ ├── hooks/ # Custom React hooks
|
||||
│ │ └── types/ # TypeScript definitions
|
||||
│ ├── tests/ # Unit + E2E tests
|
||||
│ ├── .env.local # Local development config
|
||||
│ └── Dockerfile # Container configuration
|
||||
├── scripts/
|
||||
│ ├── run-local.sh # Local development startup
|
||||
│ ├── deploy.sh # Production deployment
|
||||
│ ├── full-deploy.sh # Full production rebuild
|
||||
│ └── build-frontend.sh # Frontend build script
|
||||
├── docker-compose.yml # Base Docker configuration
|
||||
├── docker-compose.local.yml # Local development overrides
|
||||
├── docker-compose.prod.yml # Production overrides
|
||||
├── .env.local # Local environment variables
|
||||
├── .env.production # Production environment variables
|
||||
├── CLAUDE.md # Development guidelines
|
||||
└── video_accessibility_development_plan.txt # Complete specification
|
||||
```
|
||||
|
||||
## ⚙️ Configuration
|
||||
|
||||
### Environment Variables
|
||||
**Backend** (`backend/.env`):
|
||||
```bash
|
||||
# Database
|
||||
MONGODB_URL=mongodb://admin:password@localhost:27017/accessible_video
|
||||
REDIS_URL=redis://localhost:6379/0
|
||||
|
||||
# Authentication
|
||||
JWT_SECRET_KEY=your-jwt-secret
|
||||
JWT_REFRESH_SECRET_KEY=your-refresh-secret
|
||||
|
||||
# AI Services
|
||||
GEMINI_API_KEY=your-gemini-key
|
||||
ELEVENLABS_API_KEY=your-elevenlabs-key
|
||||
|
||||
# Google Cloud
|
||||
GCS_BUCKET_NAME=your-bucket-name
|
||||
GOOGLE_CLOUD_PROJECT=your-project-id
|
||||
|
||||
# Email
|
||||
SENDGRID_API_KEY=your-sendgrid-key
|
||||
|
||||
# Monitoring
|
||||
SENTRY_DSN=your-sentry-dsn
|
||||
```
|
||||
|
||||
**Frontend** (`frontend/.env`):
|
||||
```bash
|
||||
VITE_API_URL=http://localhost:8000
|
||||
VITE_SENTRY_DSN=your-sentry-dsn
|
||||
VITE_ENVIRONMENT=development
|
||||
```
|
||||
|
||||
### Google Cloud Setup
|
||||
1. **Create GCP Project** with billing enabled
|
||||
2. **Enable APIs**:
|
||||
- Cloud Storage API
|
||||
- Cloud Translation API
|
||||
- Cloud Text-to-Speech API
|
||||
- Vertex AI API (for Gemini)
|
||||
- Secret Manager API
|
||||
3. **Create Service Account** with roles:
|
||||
- Storage Admin
|
||||
- AI Platform Admin
|
||||
- Secret Manager Admin
|
||||
4. **Download JSON key** and set `GOOGLE_APPLICATION_CREDENTIALS`
|
||||
|
||||
## 🚢 Deployment Options
|
||||
|
||||
### Production Architecture (Google Cloud)
|
||||
- **Frontend**: Cloud Storage + Cloud CDN (static hosting)
|
||||
- **Backend API**: Cloud Run (serverless, auto-scaling)
|
||||
- **Workers**: Cloud Run (Celery with Redis)
|
||||
- **Database**: MongoDB Atlas (managed)
|
||||
- **Queue**: Cloud Memorystore (Redis)
|
||||
- **Storage**: Google Cloud Storage
|
||||
- **Monitoring**: Cloud Monitoring + Sentry
|
||||
|
||||
### Docker Production
|
||||
```bash
|
||||
# Start the production stack in the background (add --build to rebuild images first)
|
||||
docker-compose -f docker-compose.prod.yml up -d
|
||||
```
|
||||
|
||||
## 🔒 Security Features
|
||||
|
||||
### Implemented Security ✅
|
||||
- **JWT Authentication**: Access (15min) + refresh (7 days) token rotation
|
||||
- **RBAC System**: CLIENT/REVIEWER/ADMIN roles with endpoint protection
|
||||
- **Secure Storage**: HttpOnly cookies for refresh tokens
|
||||
- **File Security**: Signed URLs with 24h expiry, no client access to raw files
|
||||
- **Input Validation**: Comprehensive Pydantic validation on all endpoints
|
||||
- **Audit Logging**: Complete trail of all reviewer actions and system events
|
||||
- **CORS Protection**: Configured for production domains
|
||||
- **Rate Limiting**: Request throttling and validation middleware
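The access and refresh token lifetimes listed above (15 minutes and 7 days) can be illustrated with a small standard-library sketch. It is not the backend's implementation: the token below is a simplified HMAC-SHA256 signed payload rather than a real JWT, and `issue_token`, `verify_token`, and the constant names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

ACCESS_TTL_SECONDS = 15 * 60            # access token: 15 minutes
REFRESH_TTL_SECONDS = 7 * 24 * 60 * 60  # refresh token: 7 days


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, as used in compact token formats."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def issue_token(subject: str, ttl: int, secret: bytes, now: Optional[float] = None) -> str:
    """Create a signed token carrying `sub` and an `exp` timestamp."""
    issued_at = int(now if now is not None else time.time())
    payload = _b64(json.dumps({"sub": subject, "exp": issued_at + ttl}).encode())
    signature = _b64(hmac.new(secret, payload.encode(), hashlib.sha256).digest())
    return f"{payload}.{signature}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature matches and the token has not expired."""
    try:
        payload, signature = token.split(".")
    except ValueError:
        return None  # not in "payload.signature" form
    expected = _b64(hmac.new(secret, payload.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(signature, expected):
        return None
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    current = now if now is not None else time.time()
    return claims if current < claims["exp"] else None
```

A short-lived access token limits the damage if it leaks, while the longer-lived refresh token (kept in an HttpOnly cookie, per the list above) lets the client obtain a new access token without logging in again.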
|
||||
|
||||
## 🔧 API Documentation
|
||||
|
||||
### Key Endpoints Implemented
|
||||
```
|
||||
POST /api/v1/auth/login # Authentication
|
||||
POST /api/v1/jobs # Create job with file upload
|
||||
GET /api/v1/jobs # List jobs (filtered by role)
|
||||
GET /api/v1/jobs/{id} # Job details with real-time status
|
||||
POST /api/v1/jobs/{id}/actions/* # Workflow actions (approve/reject/complete)
|
||||
GET /api/v1/jobs/{id}/vtt # VTT content retrieval
|
||||
PATCH /api/v1/jobs/{id}/vtt # VTT editing and updates
|
||||
GET /api/v1/jobs/{id}/downloads # Signed download URLs
|
||||
WS /api/v1/ws/jobs/{id} # Real-time job status updates
|
||||
```
|
||||
|
||||
**OpenAPI Documentation**: http://localhost:8000/docs (native setup) or http://localhost:8003/docs (Docker setup via `./scripts/run-local.sh`)
|
||||
|
||||
## 🎯 Development Status
|
||||
|
||||
### ✅ Completed (Production Ready)
|
||||
- **User Management**: Full authentication, RBAC, password management
|
||||
- **Job Pipeline**: Complete video processing workflow with state machine
|
||||
- **Quality Control**: VTT editor, approval workflows, reviewer dashboards
|
||||
- **Real-time Features**: WebSocket updates, live notifications
|
||||
- **Multi-language**: Translation pipeline with cultural transcreation
|
||||
- **File Management**: Secure uploads, downloads, asset validation
|
||||
- **Admin Features**: User management, system monitoring, audit logs
|
||||
|
||||
### ⚠️ Needs Attention (Minor)
|
||||
- **Integration Tests**: Framework exists but needs completion
|
||||
- **Email Templates**: Service implemented, templates may need customization
|
||||
- **Performance Testing**: No load testing implemented yet
|
||||
- **Documentation**: API docs complete, user guides could be enhanced
|
||||
|
||||
### 🎯 Recommended Next Steps
|
||||
1. **Complete integration test suite** for end-to-end validation
|
||||
2. **Performance testing** with realistic video processing loads
|
||||
3. **Production deployment** configuration and CI/CD pipeline
|
||||
4. **User documentation** and training materials
|
||||
5. **Monitoring dashboards** for production operations
|
||||
|
||||
## 📚 Development Resources
|
||||
|
||||
- **Complete Specification**: `video_accessibility_development_plan.txt`
|
||||
- **Development Guidelines**: `CLAUDE.md`
|
||||
- **API Documentation**: http://localhost:8000/docs (native setup) or http://localhost:8003/docs (Docker setup), when running
|
||||
- **Test Coverage Reports**: `backend/htmlcov/` (after running tests)
|
||||
97
AGENTS.md
Normal file
|
|
@ -0,0 +1,97 @@
|
|||
# Accessible Video Processing Platform — Project Entry Point
|
||||
|
||||
<!-- SCOPE: root | owner: ln-111 | generated: 2026-04-29 -->
|
||||
|
||||
## What Is This Project
|
||||
|
||||
AI-powered SaaS platform that generates legally required accessibility assets from video files: closed captions, audio descriptions, SDH captions, and descriptive transcripts. Outputs pass through a human QC workflow before client delivery. Translation into 50+ languages and cultural transcreation are built in.
|
||||
|
||||
**Client:** Oliver Internal
|
||||
**Server:** optical-web-1
|
||||
**Status:** 85% production-ready
|
||||
|
||||
---
|
||||
|
||||
## Quick Navigation
|
||||
|
||||
| Need | Go to |
|
||||
|------|-------|
|
||||
| Architecture, data flow, state machine | [docs/project/architecture.md](docs/project/architecture.md) |
|
||||
| Tech stack versions and config | [docs/project/tech_stack.md](docs/project/tech_stack.md) |
|
||||
| API endpoint reference | [docs/project/api_spec.md](docs/project/api_spec.md) |
|
||||
| Database collections and indexes | [docs/project/database_schema.md](docs/project/database_schema.md) |
|
||||
| Infrastructure inventory | [docs/project/infrastructure.md](docs/project/infrastructure.md) |
|
||||
| Runbook — deploy, restart, rollback | [docs/project/runbook.md](docs/project/runbook.md) |
|
||||
| Functional requirements | [docs/project/requirements.md](docs/project/requirements.md) |
|
||||
| Development principles | [docs/principles.md](docs/principles.md) |
|
||||
| Reference — ADRs, guides, research | [docs/reference/README.md](docs/reference/README.md) |
|
||||
| Task management | [docs/tasks/README.md](docs/tasks/README.md) |
|
||||
| Test strategy and commands | [tests/README.md](tests/README.md) |
|
||||
| Documentation hub | [docs/README.md](docs/README.md) |
|
||||
|
||||
---
|
||||
|
||||
## Entry Points by Audience
|
||||
|
||||
| Audience | Start here |
|
||||
|----------|-----------|
|
||||
| New developer | [docs/project/runbook.md](docs/project/runbook.md) → local setup section |
|
||||
| Reviewer / QC | [docs/project/requirements.md](docs/project/requirements.md) → QC workflow section |
|
||||
| DevOps | [docs/project/infrastructure.md](docs/project/infrastructure.md) + [docs/project/runbook.md](docs/project/runbook.md) |
|
||||
| Security reviewer | [docs/project/architecture.md](docs/project/architecture.md) → security section |
|
||||
| AI agent | Read this file → pick topic → read `_index`-equivalent doc → synthesize |
|
||||
|
||||
---
|
||||
|
||||
## Core Pipeline (one-line summary per stage)
|
||||
|
||||
| Stage | What happens | Key file |
|
||||
|-------|-------------|---------|
|
||||
| Upload | MP4 → GCS + MongoDB job record | `routes_files.py` |
|
||||
| Ingestion | Celery worker transcribes with Gemini 2.5 Pro | `tasks/ingest_and_ai.py` |
|
||||
| AI Processing | VTT generated, validated, stored in GCS | `services/gemini.py` |
|
||||
| QC Review | Reviewer edits VTT, approves or rejects | `services/language_qc.py` |
|
||||
| Translation | Google Translate + transcreation per language | `tasks/translate_and_synthesize.py` |
|
||||
| TTS | Per-cue audio synthesis (Google TTS / ElevenLabs) | `services/tts.py` |
|
||||
| Final Review | PM approves deliverables | `routes_language_qc.py` |
|
||||
| Delivery | Signed GCS URLs emailed to client | `services/emailer.py` |
|
||||
|
||||
See full state machine (16 states) in [docs/project/architecture.md](docs/project/architecture.md#job-state-machine).
|
||||
|
||||
---
|
||||
|
||||
## Development Commands
|
||||
|
||||
| Action | Command |
|
||||
|--------|---------|
|
||||
| Start local (Docker + Vite) | `./scripts/run-local.sh` |
|
||||
| Rebuild after code change | `./scripts/run-local.sh --rebuild` |
|
||||
| Stop all local services | `./scripts/run-local.sh --stop` |
|
||||
| Backend lint | `cd backend && ruff check .` |
|
||||
| Backend type-check | `cd backend && mypy .` (run in Docker container) |
|
||||
| Frontend lint | `cd frontend && npm run lint` |
|
||||
| Frontend type-check | `cd frontend && npm run type-check` |
|
||||
| Backend tests | `cd backend && poetry run pytest` |
|
||||
| Frontend tests | `cd frontend && npm run test` |
|
||||
| E2E tests | `cd frontend && npm run test:e2e` |
|
||||
|
||||
---
|
||||
|
||||
## Key Constraints
|
||||
|
||||
- **NO SSH to optical-web-1** without explicit user instruction — hard rule in CLAUDE.md
|
||||
- **Access tokens in memory only** (not localStorage) — auth architecture constraint
|
||||
- **Refresh tokens in HttpOnly cookies** — security requirement
|
||||
- **Signed GCS URLs** expire in 24h — do not cache or store URLs
|
||||
- **RBAC enforced server-side** — never trust client-supplied role claims
|
||||
- **All reviewer actions emit audit log entries** — compliance requirement
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** New route added, deployment target changes, key dependency version change, new team member onboarded.
|
||||
|
||||
**Verification:** All links in Quick Navigation resolve. The commands under Development Commands match the current contents of `scripts/`.
|
||||
|
||||
<!-- END SCOPE: root -->
|
||||
|
|
@ -1,5 +1,8 @@
|
|||
# Accessible Video Processing Platform - Development Guide
|
||||
|
||||
<!-- Documentation entry point: see @AGENTS.md for full project navigation -->
|
||||
@AGENTS.md
|
||||
|
||||
## Project Overview
|
||||
This is a comprehensive video accessibility platform that automatically generates closed captions and audio descriptions using AI, with quality control workflows and multi-language support.
|
||||
|
||||
|
|
|
|||
BIN
DEPLOYMENT.md
Binary file not shown.
|
|
@ -2,6 +2,8 @@
|
|||
|
||||
A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. Features a complete workflow from video upload to final delivery with quality control processes.
|
||||
|
||||
**Documentation:** See [AGENTS.md](AGENTS.md) for full navigation, or [docs/README.md](docs/README.md) for the documentation hub.
|
||||
|
||||
## ✅ Current Status: **Production-Ready** (85% Complete)
|
||||
|
||||
**Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend)
|
||||
|
|
|
|||
Binary file not shown.
3
backend/uv.lock
generated
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
version = 1
|
||||
revision = 3
|
||||
requires-python = ">=3.14"
|
||||
56
docs/README.md
Normal file
|
|
@ -0,0 +1,56 @@
|
|||
# Documentation Hub — Accessible Video Processing Platform
|
||||
|
||||
<!-- SCOPE: docs-hub | owner: ln-111 | generated: 2026-04-29 -->
|
||||
|
||||
This is the central index for all project documentation. Each section links to a specific domain doc.
|
||||
|
||||
## Project Documentation
|
||||
|
||||
| Document | Contents |
|
||||
|----------|---------|
|
||||
| [requirements.md](project/requirements.md) | Functional requirements, QC workflow, RBAC matrix |
|
||||
| [architecture.md](project/architecture.md) | System design, job state machine, data flow, security model |
|
||||
| [tech_stack.md](project/tech_stack.md) | All dependency versions, configuration anchors |
|
||||
| [api_spec.md](project/api_spec.md) | REST endpoints, WebSocket, auth flows |
|
||||
| [database_schema.md](project/database_schema.md) | MongoDB collections, indexes, data shapes |
|
||||
| [infrastructure.md](project/infrastructure.md) | Server inventory, ports, GCS layout, external services |
|
||||
| [runbook.md](project/runbook.md) | Local setup, deploy, restart, backup, rollback |
|
||||
|
||||
## Reference Documentation
|
||||
|
||||
| Directory | Contains |
|
||||
|-----------|---------|
|
||||
| [reference/README.md](reference/README.md) | Index of ADRs, guides, manuals, research |
|
||||
| [reference/adrs/](reference/adrs/) | Architecture Decision Records |
|
||||
| [reference/guides/](reference/guides/) | Developer how-to guides |
|
||||
| [reference/manuals/](reference/manuals/) | Operator manuals |
|
||||
| [reference/research/](reference/research/) | Technology research notes |
|
||||
|
||||
## Standards and Principles
|
||||
|
||||
| Document | Contents |
|
||||
|----------|---------|
|
||||
| [documentation_standards.md](documentation_standards.md) | 60 universal documentation requirements |
|
||||
| [principles.md](principles.md) | 11 development principles (Standards First, YAGNI, KISS, DRY…) |
|
||||
|
||||
## Task and Test Management
|
||||
|
||||
| Document | Contents |
|
||||
|----------|---------|
|
||||
| [tasks/README.md](tasks/README.md) | Task management rules and conventions |
|
||||
| [tests/README.md](../tests/README.md) | Test strategy, commands, coverage targets |
|
||||
|
||||
---
|
||||
|
||||
## Canonical Entry Point
|
||||
|
||||
All documentation is reachable from [../AGENTS.md](../AGENTS.md). Agents and humans should start there.
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** New doc added, section moved, external URL changes.
|
||||
**Verification:** All links in this file resolve. AGENTS.md Quick Navigation matches this hub.
|
||||
|
||||
<!-- END SCOPE: docs-hub -->
|
||||
61
docs/documentation_standards.md
Normal file
|
|
@ -0,0 +1,61 @@
|
|||
# Documentation Standards — Accessible Video Processing Platform
|
||||
|
||||
<!-- SCOPE: doc-standards | owner: ln-111 | generated: 2026-04-29 -->
|
||||
|
||||
## Core Rules
|
||||
|
||||
| Rule | Requirement |
|
||||
|------|-------------|
|
||||
| NO_CODE | No code block longer than 5 lines. Use tables, ASCII diagrams, or links to source files instead. |
|
||||
| SCOPE tag | Every document must open with `<!-- SCOPE: {name} \| owner: {ln-xxx} \| generated: {date} -->` and close with `<!-- END SCOPE: {name} -->`. |
|
||||
| Maintenance section | Every document must have a `## Maintenance` section at the end with: **Update triggers** (when to update) and **Verification** (how to confirm accuracy). |
|
||||
| Canonical entry | All documentation is reachable from `AGENTS.md`. Never create a document that is an island. |
|
||||
| No placeholder text | No `TODO`, `TBD`, `PLACEHOLDER`, `[describe here]`, or template metadata in committed documents. |
|
||||
| No stale dates | Inline dates must be accurate at time of writing. Use `generated: {date}` in SCOPE tags rather than inline prose dates that rot. |
|
||||
| Official links only | External links must point to official documentation (docs.python.org, fastapi.tiangolo.com, react.dev, MDN). No Stack Overflow, Medium, or blog links in reference docs. |
|
||||
|
||||
## Structure Rules
|
||||
|
||||
| Rule | Requirement |
|
||||
|------|-------------|
|
||||
| Tables over lists | Use markdown tables for: parameters, config values, comparison of alternatives, inventory lists. Use bullet lists only for sequential steps or genuinely unordered enumerations. |
|
||||
| DAG navigation | Documents form a Directed Acyclic Graph. `AGENTS.md` → `docs/README.md` → domain docs. No circular links. |
|
||||
| One canonical source | Every fact has exactly one home. Other documents link to it rather than repeating it. |
|
||||
| File naming | Snake_case for all doc files. No spaces in filenames. |
|
||||
|
||||
## Document Types and Templates
|
||||
|
||||
| Type | Location | Contains |
|
||||
|------|----------|---------|
|
||||
| Root entry | `AGENTS.md` | Quick navigation, pipeline summary, constraints |
|
||||
| Documentation hub | `docs/README.md` | Index of all docs |
|
||||
| Requirements | `docs/project/requirements.md` | Functional requirements only (no implementation) |
|
||||
| Architecture | `docs/project/architecture.md` | System design, state machines, data flow |
|
||||
| Tech stack | `docs/project/tech_stack.md` | Dependency versions |
|
||||
| API spec | `docs/project/api_spec.md` | Endpoints, auth, request/response shapes |
|
||||
| Database schema | `docs/project/database_schema.md` | Collections, indexes, field definitions |
|
||||
| Infrastructure | `docs/project/infrastructure.md` | Servers, ports, external services |
|
||||
| Runbook | `docs/project/runbook.md` | Operational procedures |
|
||||
| Principles | `docs/principles.md` | Engineering principles |
|
||||
| ADR | `docs/reference/adrs/{date}-{slug}.md` | Single architectural decision |
|
||||
| Guide | `docs/reference/guides/{slug}.md` | Developer how-to |
|
||||
| Manual | `docs/reference/manuals/{slug}.md` | Operator procedures |
|
||||
| Research | `docs/reference/research/{slug}.md` | Technology evaluation notes |
|
||||
| Test docs | `tests/README.md` | Test commands, strategy, coverage |
|
||||
|
||||
## ADR Format (Michael Nygard)
|
||||
|
||||
| Section | Contents |
|
||||
|---------|---------|
|
||||
| Title | Short imperative verb phrase |
|
||||
| Status | Proposed / Accepted / Deprecated / Superseded |
|
||||
| Context | The problem or constraint that forced a decision |
|
||||
| Decision | What was decided |
|
||||
| Consequences | Trade-offs and impact |
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** New document type added to the project, naming convention changes.
|
||||
**Verification:** All committed docs have SCOPE tags. `grep -r "TODO\|PLACEHOLDER\|TBD" docs/` returns no results.
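The two checks above could also be scripted. The following is a minimal sketch, not an existing project tool: `check_doc`, the regular expressions, and the problem messages are assumptions made for illustration, and the SCOPE pattern simply mirrors the tag format given under Core Rules.

```python
import re

# Mirrors the SCOPE tag format from the Core Rules table (assumed pattern).
SCOPE_OPEN = re.compile(
    r"<!-- SCOPE: (?P<name>[\w-]+) \| owner: [\w-]+ \| generated: [\d-]+ -->"
)
PLACEHOLDERS = re.compile(r"TODO|PLACEHOLDER|TBD")


def check_doc(text: str) -> list[str]:
    """Return a list of standards problems found in one document."""
    problems = []
    opened = SCOPE_OPEN.search(text)
    if opened is None:
        problems.append("missing SCOPE tag")
    elif f"<!-- END SCOPE: {opened['name']} -->" not in text:
        problems.append("missing END SCOPE tag")
    if PLACEHOLDERS.search(text):
        problems.append("placeholder text found")
    return problems
```

An empty list means the document passes both checks; a caller would read each file under `docs/` and pass its text in.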
|
||||
|
||||
<!-- END SCOPE: doc-standards -->
|
||||
70
docs/principles.md
Normal file
|
|
@ -0,0 +1,70 @@
|
|||
# Development Principles — Accessible Video Processing Platform
|
||||
|
||||
<!-- SCOPE: principles | owner: ln-111 | generated: 2026-04-29 -->
|
||||
|
||||
These 11 principles govern all engineering decisions on this project. They are ordered by priority — earlier principles override later ones when they conflict.
|
||||
|
||||
---
|
||||
|
||||
## P-01: Standards First
|
||||
|
||||
Implement to agreed specifications. The `video_accessibility_development_plan.txt` is the authoritative source for API contracts, schemas, state machine transitions, and worker pipeline behaviour. Read it before implementing a new feature. When the spec and the code diverge, fix the code.
|
||||
|
||||
## P-02: Security by Design
|
||||
|
||||
Security controls are non-negotiable:
|
||||
- Access tokens in JS memory only — never localStorage
|
||||
- Refresh tokens in HttpOnly cookies only
|
||||
- RBAC enforced on every endpoint server-side
|
||||
- Signed GCS URLs with 24h expiry — never store URLs
|
||||
- All reviewer actions must emit audit log entries
|
||||
- Generic error messages — never return internal exception details to clients
|
||||
|
||||
## P-03: Async Correctness
|
||||
|
||||
The backend is async (FastAPI + asyncio). Celery workers run in a separate sync process. Rules:
|
||||
- Never call synchronous blocking I/O (`requests.get`, `time.sleep`) in async FastAPI routes
|
||||
- Never share asyncio connections across Celery task boundaries — create connections per task
|
||||
- Use `httpx.AsyncClient` for HTTP in async routes
|
||||
- Use `asyncio.get_running_loop().run_in_executor()` only as a last resort for unavoidable sync calls
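The last rule can be sketched with the standard library alone. `blocking_checksum` and `handler` are hypothetical names, not code from this repository; the sketch only shows a blocking call being moved to the default thread pool so it does not run on the event loop.

```python
import asyncio
import time


def blocking_checksum(data: bytes) -> int:
    """Stand-in for an unavoidable synchronous call (hypothetical)."""
    time.sleep(0.01)  # simulates blocking I/O
    return sum(data) % 256


async def handler(data: bytes) -> int:
    # Offload the blocking call to the default thread pool so the
    # event loop stays free to serve other requests meanwhile.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_checksum, data)


result = asyncio.run(handler(b"abc"))
```

Calling `blocking_checksum` directly inside `handler` would return the same value but stall every other coroutine for the duration of the sleep.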
|
||||
|
||||
## P-04: Fail Loudly on Configuration
|
||||
|
||||
Missing required secrets must crash startup, not fall back to insecure defaults. Use `os.environ["KEY"]` (raises `KeyError`) instead of `os.environ.get("KEY", "weak_default")`. The `DEFAULT_ADMIN_PASSWORD` fallback is a known violation that must be fixed.
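A minimal sketch of the fail-loud pattern, under stated assumptions: the key names `JWT_SECRET` and `MONGODB_URI` and the helper `load_required` are hypothetical, and the project's actual settings live in `core/config.py` (Pydantic), which this standard-library sketch does not reproduce.

```python
import os
from collections.abc import Mapping

# Hypothetical key names, for illustration only.
REQUIRED_KEYS = ("JWT_SECRET", "MONGODB_URI")


def load_required(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return all required settings, or raise KeyError naming the gaps."""
    missing = [key for key in REQUIRED_KEYS if key not in env]
    if missing:
        # No fallback values: an unset secret stops startup.
        raise KeyError(f"missing required settings: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}
```

Called once at startup, this reports every missing key in a single error rather than failing on the first one.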
|
||||
|
||||
## P-05: YAGNI
|
||||
|
||||
Build only what is specified and currently needed. No speculative abstractions, no helper utilities for hypothetical future use cases. Three similar lines are better than a premature abstraction. If a feature is out of scope, defer it; don't build a "foundation" for it.
|
||||
|
||||
## P-06: KISS
|
||||
|
||||
Simple code is correct code. Prefer flat, readable functions over clever abstractions. A 20-line function that does one thing clearly is better than a 5-line function using three layers of metaprogramming.
|
||||
|
||||
## P-07: DRY — but only after the second time
|
||||
|
||||
Do not abstract on first encounter. Abstract when the same logic appears in a second place. The `broadcast_status_update()` function is copy-pasted in two task files — this is the known violation to fix.
|
||||
|
||||
## P-08: No Comments for What, Only for Why
|
||||
|
||||
Code identifiers describe what the code does. Comments explain non-obvious constraints, hidden invariants, or bug workarounds. `# logger undefined here` is a good comment. `# increment counter` is not.
|
||||
|
||||
## P-09: Validate at System Boundaries Only
|
||||
|
||||
Validate untrusted input (HTTP request bodies, AI model output, file uploads) at the boundary. Trust internal code, framework guarantees, and typed function signatures. Do not add defensive null-checks on values that the framework guarantees are non-null.
|
||||
|
||||
## P-10: No Silent Failures
|
||||
|
||||
Every error has two acceptable outcomes: it is handled with a logged warning, or it propagates up as an exception. Swallowed exceptions that log nothing (`except Exception: pass`) are forbidden. The `cache_key` `NameError` in `authz.py`, which is currently swallowed silently, is a known violation.
|
||||
|
||||
## P-11: Test What Matters
|
||||
|
||||
Follow risk-based testing (Priority = Business Impact × Probability). Tests with Priority ≥15 must exist before a feature is considered production-ready. Current critical gaps: RBAC (`authz.py`), job state machine (`ingest_and_ai.py`), audit logger, glossary retrieval — all Priority ≥20.
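The priority formula can be worked through as a short sketch. The 1 to 5 scales, the `Risk` class, and the example scores are assumptions for illustration; only the formula and the threshold of 15 come from the principle above.

```python
from dataclasses import dataclass

PRIORITY_THRESHOLD = 15  # from P-11


@dataclass(frozen=True)
class Risk:
    name: str
    impact: int       # business impact, assumed 1-5 scale
    probability: int  # likelihood of failure, assumed 1-5 scale

    @property
    def priority(self) -> int:
        # Priority = Business Impact x Probability
        return self.impact * self.probability


def must_have_tests(risks: list[Risk]) -> list[str]:
    """Names of risks that need tests before production readiness."""
    return [r.name for r in risks if r.priority >= PRIORITY_THRESHOLD]


# Illustrative scores only, not measured values.
risks = [Risk("RBAC checks", 5, 4), Risk("toast styling", 2, 2)]
required = must_have_tests(risks)
```

With these illustrative scores, the RBAC risk scores 20 and is required, while the styling risk scores 4 and is not.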
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** New architectural decision that changes how engineers should approach a class of problems.
|
||||
**Verification:** Each principle has a measurable compliance check — run a brief audit against recent commits before each production deploy.
|
||||
|
||||
<!-- END SCOPE: principles -->
|
||||
198
docs/project/api_spec.md
Normal file
|
|
@ -0,0 +1,198 @@
|
|||
# API Specification — Accessible Video Processing Platform
|
||||
|
||||
<!-- SCOPE: api-spec | owner: ln-113 | generated: 2026-04-29 -->
|
||||
|
||||
**Base URL (production):** `https://ai-sandbox.oliver.solutions/video-accessibility-back`
|
||||
**Base URL (local):** `http://localhost:8003`
|
||||
**OpenAPI docs:** `{base_url}/docs` (Swagger UI)
|
||||
|
||||
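All REST endpoints require `Authorization: Bearer <access_token>` except `/auth/login`, `/auth/refresh` (refresh cookie), `/auth/microsoft/*`, `/health`, and `/metrics`. WebSocket endpoints take the access token as a `token` query parameter instead.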
|
||||
|
||||
---
|
||||
|
||||
## Authentication
|
||||
|
||||
| Method | Path | Auth | Description |
|
||||
|--------|------|------|-------------|
|
||||
| POST | `/api/v1/auth/login` | None | Email/password login; returns access token + sets refresh cookie |
|
||||
| POST | `/api/v1/auth/refresh` | Cookie | Exchange refresh cookie for new access token |
|
||||
| POST | `/api/v1/auth/logout` | Bearer | Revoke refresh token, clear cookie |
|
||||
| POST | `/api/v1/auth/microsoft/callback` | None | Microsoft SSO callback; validates OIDC token |
|
||||
| GET | `/api/v1/auth/microsoft/login` | None | Redirect to Microsoft login |
|
||||
| POST | `/api/v1/auth/change-password` | Bearer | Change own password |
|
||||
|
||||
**Login response fields:**
|
||||
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| access_token | string | JWT, 15-minute TTL |
|
||||
| token_type | string | Always "bearer" |
|
||||
| user | object | User profile (id, email, role, org_id) |
|
||||
|
||||
---
|
||||
|
||||
## Jobs
|
||||
|
||||
| Method | Path | Roles | Description |
|
||||
|--------|------|-------|-------------|
|
||||
| GET | `/api/v1/jobs` | ALL | List jobs (role-filtered: client sees own, reviewer/admin see all) |
|
||||
| POST | `/api/v1/jobs` | CLIENT, ADMIN | Create job with MP4 upload |
|
||||
| GET | `/api/v1/jobs/{id}` | ALL | Job detail with current status + outputs |
|
||||
| DELETE | `/api/v1/jobs/{id}` | ADMIN | Delete job and GCS files |
|
||||
| GET | `/api/v1/jobs/{id}/downloads` | ALL | Signed download URLs for deliverables (24h expiry) |
|
||||
| POST | `/api/v1/jobs/{id}/actions/approve` | REVIEWER, ADMIN | Approve job at current QC stage |
|
||||
| POST | `/api/v1/jobs/{id}/actions/reject` | REVIEWER, ADMIN | Reject job with reason |
|
||||
| POST | `/api/v1/jobs/{id}/actions/feedback` | REVIEWER, ADMIN | Send QC feedback without rejection |
|
||||
| POST | `/api/v1/jobs/{id}/actions/retry` | ADMIN | Retry failed task (TTS_FAILED, RENDER_FAILED) |
|
||||
|
||||
**Job object key fields:**
|
||||
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| _id | string | MongoDB ObjectId |
|
||||
| status | string | JobStatus enum — see architecture.md |
|
||||
| org_id | string | Organisation that owns the job |
|
||||
| source_language | string | BCP-47 language code |
|
||||
| requested_outputs | array | Output language codes requested |
|
||||
| outputs | object | Per-language GCS paths |
|
||||
| language_qc | object | Per-language QC state |
|
||||
| created_at | datetime | ISO 8601 |
|
||||
| updated_at | datetime | ISO 8601 |
|
||||
| error | string | Last error message if failed |
|
||||
|
||||
---
|
||||
|
||||
## VTT Management
|
||||
|
||||
| Method | Path | Roles | Description |
|
||||
|--------|------|-------|-------------|
|
||||
| GET | `/api/v1/jobs/{id}/vtt/{lang}` | REVIEWER, ADMIN | Get VTT content for language |
|
||||
| PATCH | `/api/v1/jobs/{id}/vtt/{lang}` | REVIEWER, LINGUIST, ADMIN | Update VTT content (auto-snapshots before save) |
|
||||
| POST | `/api/v1/vtt/adjust-timing` | REVIEWER, ADMIN | Bulk shift all cue timings |
|
||||
|
||||
---
|
||||
|
||||
## VTT Version Control
|
||||
|
||||
| Method | Path | Roles | Description |
|
||||
|--------|------|-------|-------------|
|
||||
| GET | `/api/v1/jobs/{id}/vtt-versions/{lang}` | REVIEWER, ADMIN | List version history |
|
||||
| GET | `/api/v1/jobs/{id}/vtt-versions/{lang}/{version_id}` | REVIEWER, ADMIN | Get specific version content |
|
||||
| POST | `/api/v1/jobs/{id}/vtt-versions/{lang}/{version_id}/restore` | REVIEWER, ADMIN | Restore a previous version (creates new snapshot) |
|
||||
| GET | `/api/v1/jobs/{id}/vtt-versions/{lang}/diff` | REVIEWER, ADMIN | Diff two versions (`?from=v1_id&to=v2_id`) |
|
||||
|
||||
---
|
||||
|
||||
## Language QC
|
||||
|
||||
| Method | Path | Roles | Description |
|
||||
|--------|------|-------|-------------|
|
||||
| GET | `/api/v1/jobs/{id}/language-qc` | REVIEWER, PM, ADMIN | Get per-language QC status for all languages |
|
||||
| POST | `/api/v1/jobs/{id}/language-qc/{lang}/assign` | PM, ADMIN | Assign linguist to language |
|
||||
| POST | `/api/v1/jobs/{id}/language-qc/{lang}/approve` | LINGUIST (assigned), PM, ADMIN | Approve language |
|
||||
| POST | `/api/v1/jobs/{id}/language-qc/{lang}/reject` | LINGUIST (assigned), PM, ADMIN | Reject language with reason |
|
||||
| POST | `/api/v1/jobs/{id}/language-qc/{lang}/feedback` | LINGUIST (assigned), PM, ADMIN | Send feedback without rejection |
|
||||
|
||||
---
|
||||
|
||||
## Glossaries
|
||||
|
||||
| Method | Path | Roles | Description |
|
||||
|--------|------|-------|-------------|
|
||||
| GET | `/api/v1/glossaries` | ALL | List glossaries for current org |
|
||||
| POST | `/api/v1/glossaries` | ADMIN | Create glossary |
|
||||
| GET | `/api/v1/glossaries/{id}` | ALL | Get glossary with terms |
|
||||
| PUT | `/api/v1/glossaries/{id}` | ADMIN | Update glossary metadata |
|
||||
| DELETE | `/api/v1/glossaries/{id}` | ADMIN | Delete glossary |
|
||||
| POST | `/api/v1/glossaries/{id}/terms` | ADMIN | Add term |
|
||||
| DELETE | `/api/v1/glossaries/{id}/terms/{term_id}` | ADMIN | Delete term |
|
||||
|
||||
---
|
||||
|
||||
## Files
|
||||
|
||||
| Method | Path | Roles | Description |
|
||||
|--------|------|-------|-------------|
|
||||
| POST | `/api/v1/files/upload-url` | CLIENT, ADMIN | Get signed GCS upload URL |
|
||||
| GET | `/api/v1/files/{job_id}/{path}` | ALL | Get signed download URL |
|
||||
|
||||
---
|
||||
|
||||
## Users and Organisations
|
||||
|
||||
| Method | Path | Roles | Description |
|
||||
|--------|------|-------|-------------|
|
||||
| GET | `/api/v1/users/me` | ALL | Current user profile |
|
||||
| GET | `/api/v1/organizations` | ADMIN | List organisations |
|
||||
| POST | `/api/v1/organizations` | ADMIN | Create organisation |
|
||||
| GET | `/api/v1/organizations/{id}/members` | PM, ADMIN | List org members |
|
||||
| POST | `/api/v1/organizations/{id}/invite` | PM, ADMIN | Invite member |
|
||||
| DELETE | `/api/v1/organizations/{id}/members/{user_id}` | PM, ADMIN | Remove member |
|
||||
|
||||
---
|
||||
|
||||
## Admin
|
||||
|
||||
| Method | Path | Roles | Description |
|
||||
|--------|------|-------|-------------|
|
||||
| GET | `/api/v1/admin/users` | ADMIN | List all users |
|
||||
| PATCH | `/api/v1/admin/users/{id}` | ADMIN | Update user role or status |
|
||||
| GET | `/api/v1/admin/audit-log` | ADMIN, PM | Query audit log |
|
||||
|
||||
---
|
||||
|
||||
## WebSocket
|
||||
|
||||
| Path | Auth | Description |
|
||||
|------|------|-------------|
|
||||
| `WS /api/v1/ws/jobs/{id}` | Query param `token=<access_token>` | Real-time job status updates |
|
||||
| `WS /api/v1/ws/org/{org_id}` | Query param `token=<access_token>` | Org-scoped event stream |
|
||||
|
||||
**Message format:**
|
||||
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| type | string | `job_status_update`, `notification`, `ping` |
|
||||
| job_id | string | Job ObjectId |
|
||||
| status | string | New JobStatus value |
|
||||
| updated_at | datetime | ISO 8601 |
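A client might decode these messages roughly as follows. This is a sketch under assumptions: the `JobStatusUpdate` class and `parse_status_update` helper are hypothetical, and it assumes that only `job_status_update` messages carry all four fields, which the table above does not state.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

KNOWN_TYPES = {"job_status_update", "notification", "ping"}


@dataclass(frozen=True)
class JobStatusUpdate:
    job_id: str
    status: str
    updated_at: datetime


def parse_status_update(raw: str) -> Optional[JobStatusUpdate]:
    """Decode one WebSocket text frame; None for non-status messages."""
    message = json.loads(raw)
    if message.get("type") not in KNOWN_TYPES:
        raise ValueError(f"unknown message type: {message.get('type')!r}")
    if message["type"] != "job_status_update":
        return None  # ping and notification are ignored in this sketch
    return JobStatusUpdate(
        job_id=message["job_id"],
        status=message["status"],
        updated_at=datetime.fromisoformat(message["updated_at"]),
    )
```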
|
||||
|
||||
---
|
||||
|
||||
## Health
|
||||
|
||||
| Method | Path | Auth | Description |
|
||||
|--------|------|------|-------------|
|
||||
| GET | `/health` | None | Returns `{"status":"healthy","version":"1.0.0"}` |
|
||||
| GET | `/metrics` | None (internal) | Prometheus metrics |
|
||||
|
||||
---
|
||||
|
||||
## Error Response Format
|
||||
|
||||
All errors return:
|
||||
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| detail | string | Human-readable error message (never internal exception text) |
|
||||
|
||||
Common status codes:
|
||||
|
||||
| Code | Meaning |
|
||||
|------|---------|
|
||||
| 400 | Bad request / validation error |
|
||||
| 401 | Unauthenticated or invalid token |
|
||||
| 403 | Forbidden — insufficient role |
|
||||
| 404 | Resource not found |
|
||||
| 422 | Pydantic validation error |
|
||||
| 429 | Rate limit exceeded |
|
||||
| 500 | Internal server error (details logged, not returned) |
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** New endpoint added, request/response schema changed, auth flow change.
|
||||
**Verification:** All endpoints listed here exist in `backend/app/api/v1/routes_*.py`. OpenAPI schema at `/docs` matches this table.
|
||||
|
||||
<!-- END SCOPE: api-spec -->
|
||||
170
docs/project/architecture.md
Normal file
|
|
@ -0,0 +1,170 @@
|
|||
# Architecture — Accessible Video Processing Platform
|
||||
|
||||
<!-- SCOPE: architecture | owner: ln-112 | generated: 2026-04-29 -->
|
||||
|
||||
## System Overview
|
||||
|
||||
Three-tier monorepo: React SPA frontend → FastAPI backend → Celery worker pool. Persistent stores are MongoDB Atlas (documents) and Redis (queue + cache). All AI processing happens asynchronously in Celery tasks. All file I/O is via GCS signed URLs.
|
||||
|
||||
```
|
||||
Browser → Apache → FastAPI (sync surface)
|
||||
→ Celery Workers (async AI pipeline)
|
||||
→ MongoDB Atlas (job state)
|
||||
→ Redis (task queue + rate limit state)
|
||||
→ GCS (video + VTT files)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Job State Machine
|
||||
|
||||
16 states. Transitions are one-directional except for the QC feedback loop.
|
||||
|
||||
| State | Description | Triggered by |
|
||||
|-------|-------------|-------------|
|
||||
| CREATED | Job record created | Upload complete |
|
||||
| INGESTING | Worker has picked up the job | Celery task start |
|
||||
| AI_PROCESSING | Gemini 2.5 Pro generating VTT | Ingestion complete |
|
||||
| PENDING_QC | VTT ready for reviewer | AI processing done |
|
||||
| QC_FEEDBACK | Reviewer sent feedback, not rejected | Reviewer action |
|
||||
| APPROVED_ENGLISH | English content approved | QC approve (EN) |
|
||||
| APPROVED_SOURCE | Source language approved | QC approve (source) |
|
||||
| TRANSLATING | Google Translate + transcreation running | Approval triggers |
|
||||
| TTS_GENERATING | Per-cue audio synthesis in progress | Translation done |
|
||||
| TTS_FAILED | TTS service error — manual retry required | ElevenLabs/Google error |
|
||||
| RENDERING_VIDEO | FFmpeg compositing accessible video | TTS done |
|
||||
| RENDER_FAILED | FFmpeg error — manual retry required | FFmpeg error |
|
||||
| RENDERING_QC | Rendering complete, awaiting final QC | Render done |
|
||||
| PENDING_FINAL_REVIEW | PM reviewing final deliverables | QC approved |
|
||||
| REJECTED | Job permanently rejected | Reviewer action |
|
||||
| COMPLETED | Client notified, signed URLs delivered | PM final approval |
|
||||
|
||||
**Terminal states:** COMPLETED, REJECTED.
|
||||
**Manual-retry states:** TTS_FAILED, RENDER_FAILED.
|
||||
**Feedback loop:** QC_FEEDBACK → (fix) → PENDING_QC.
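The state table could be mirrored in code roughly as follows. This is a sketch, not the contents of `models/job.py`: the string values, the `TERMINAL` and `MANUAL_RETRY` set names, and `is_terminal` are assumptions for illustration.

```python
from enum import Enum


class JobStatus(str, Enum):
    CREATED = "CREATED"
    INGESTING = "INGESTING"
    AI_PROCESSING = "AI_PROCESSING"
    PENDING_QC = "PENDING_QC"
    QC_FEEDBACK = "QC_FEEDBACK"
    APPROVED_ENGLISH = "APPROVED_ENGLISH"
    APPROVED_SOURCE = "APPROVED_SOURCE"
    TRANSLATING = "TRANSLATING"
    TTS_GENERATING = "TTS_GENERATING"
    TTS_FAILED = "TTS_FAILED"
    RENDERING_VIDEO = "RENDERING_VIDEO"
    RENDER_FAILED = "RENDER_FAILED"
    RENDERING_QC = "RENDERING_QC"
    PENDING_FINAL_REVIEW = "PENDING_FINAL_REVIEW"
    REJECTED = "REJECTED"
    COMPLETED = "COMPLETED"


# Groupings taken from the summary lines above.
TERMINAL = frozenset({JobStatus.COMPLETED, JobStatus.REJECTED})
MANUAL_RETRY = frozenset({JobStatus.TTS_FAILED, JobStatus.RENDER_FAILED})


def is_terminal(status: JobStatus) -> bool:
    """True when no further transition is expected for the job."""
    return status in TERMINAL
```

Keeping the two groupings as sets next to the enum gives route handlers one place to check, for example, whether a retry action applies to a job's current state.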
|
||||
|
||||
---
|
||||
|
||||
## Component Map

### Backend (`backend/app/`)

| Layer | Path | Responsibility |
|-------|------|---------------|
| API routes | `api/v1/routes_*.py` | HTTP + WebSocket endpoints, RBAC enforcement |
| Core | `core/security.py` | JWT encode/decode, password hashing |
| Core | `core/authz.py` | RBAC permission checks, `MembershipContext` |
| Core | `core/dependencies.py` | FastAPI DI — `get_current_user`, `get_database` |
| Core | `core/config.py` | Pydantic settings from env vars |
| Models | `models/job.py` | Job document schema + `JobStatus` enum (16 states) |
| Models | `models/user.py` | User document with roles |
| Services | `services/gemini.py` | Gemini 2.5 Pro API wrapper |
| Services | `services/gcs.py` | GCS V4 signed URLs, upload/download |
| Services | `services/language_qc.py` | Per-language QC state machine |
| Services | `services/glossary_service.py` | Hybrid exact + vector glossary retrieval |
| Services | `services/audit_logger.py` | Audit trail — all state-changing actions |
| Services | `services/microsoft_auth.py` | Microsoft SSO JWKS validation |
| Services | `services/websocket.py` | WebSocket connection manager |
| Tasks | `tasks/ingest_and_ai.py` | Main ingestion Celery task |
| Tasks | `tasks/translate_and_synthesize.py` | Translation + TTS pipeline |
| Tasks | `tasks/ffmpeg_operations.py` | Video rendering |
| Middleware | `middleware/rate_limiting.py` | Redis-backed request throttling |
| Middleware | `middleware/validation.py` | MongoDB injection protection |

### Frontend (`frontend/src/`)

| Layer | Path | Responsibility |
|-------|------|---------------|
| Routes | `routes/auth/` | Login, refresh, Microsoft SSO |
| Routes | `routes/jobs/` | Job list, job detail, VTT editor |
| Routes | `routes/admin/` | QC dashboard, audit log, user management |
| Routes | `routes/org/` | Organisation settings, invite members |
| Hooks | `hooks/useJob.tsx` | Job state + API calls |
| Hooks | `hooks/useJobStatusWebSocket.ts` | WS connection with backoff reconnect |
| Contexts | `contexts/GlobalWebSocketContext.tsx` | WS singleton per session |
| Contexts | `contexts/NotificationContext.tsx` | Toast notifications |
| Lib | `lib/auth.ts` | JWT in-memory store, refresh flow |
| Lib | `lib/api.ts` | Axios instance with auth interceptor |
| Components | `components/VttEditor.tsx` | Inline VTT editing with preview |
| Components | `components/VideoWithCaptions.tsx` | Multi-language video player |
| Components | `components/Layout/Sidebar.tsx` | Role-aware navigation |

---

## Auth Architecture

| Token | Storage | TTL | Purpose |
|-------|---------|-----|---------|
| Access token | JS memory (React context) | 15 min | Bearer for all API calls |
| Refresh token | HttpOnly cookie | 7 days | Obtain new access tokens |

**Token flow:** Login → both tokens issued → access token in memory → on expiry, silent refresh via cookie → new access token in memory. On logout, both tokens revoked.

**Critical:** `get_current_user()` in `dependencies.py` must reject refresh tokens used as Bearer tokens (type check on payload).
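
The type check can be sketched as follows. This is a minimal illustration, not the project's code in `dependencies.py`: it assumes the decoded JWT payload carries a `type` claim set to `access` or `refresh`, and that signature and expiry verification have already happened. The claim name, `TokenTypeError`, and `require_access_token` are hypothetical.

```python
class TokenTypeError(Exception):
    """Raised when a token of the wrong type is presented as a Bearer token."""


def require_access_token(payload: dict) -> dict:
    """Return the payload only if it belongs to an access token.

    `payload` is an already-verified, already-decoded JWT payload.
    The `type` claim name is an assumption made for this sketch.
    """
    token_type = payload.get("type")
    if token_type != "access":
        # Covers refresh tokens and tokens that carry no type claim at all.
        raise TokenTypeError(f"expected access token, got {token_type!r}")
    return payload
```

Rejecting anything that is not explicitly `access` (instead of rejecting only `refresh`) keeps the check deny-by-default, so a token with a missing or unexpected claim is refused as well.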

---

## RBAC Matrix

| Resource | CLIENT | REVIEWER | LINGUIST | PM | ADMIN |
|----------|--------|---------|---------|-----|-------|
| Upload video | ✓ | — | — | — | ✓ |
| View own jobs | ✓ | ✓ | — | ✓ | ✓ |
| View all jobs | — | ✓ | — | ✓ | ✓ |
| Edit VTT | — | ✓ | ✓ | — | ✓ |
| QC approve/reject | — | ✓ | — | — | ✓ |
| Assign linguist | — | — | — | ✓ | ✓ |
| Final review | — | — | — | ✓ | ✓ |
| User management | — | — | — | — | ✓ |
| Audit log | — | — | — | ✓ | ✓ |

Implementation: `authz.py` → `MembershipContext`, `require_org_role(role)`, `require_platform_admin()`.
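
For verification purposes, the matrix can be expressed as data and checked with a single lookup. The sketch below is illustrative only: the action keys, `RBAC_MATRIX`, and `is_allowed` are hypothetical names and do not describe how `authz.py` is actually written.

```python
# Roles allowed per action, transcribed from the RBAC matrix in this section.
RBAC_MATRIX: dict[str, frozenset[str]] = {
    "upload_video": frozenset({"CLIENT", "ADMIN"}),
    "view_own_jobs": frozenset({"CLIENT", "REVIEWER", "PM", "ADMIN"}),
    "view_all_jobs": frozenset({"REVIEWER", "PM", "ADMIN"}),
    "edit_vtt": frozenset({"REVIEWER", "LINGUIST", "ADMIN"}),
    "qc_approve_reject": frozenset({"REVIEWER", "ADMIN"}),
    "assign_linguist": frozenset({"PM", "ADMIN"}),
    "final_review": frozenset({"PM", "ADMIN"}),
    "user_management": frozenset({"ADMIN"}),
    "audit_log": frozenset({"PM", "ADMIN"}),
}


def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown actions and unknown roles are rejected."""
    return role.upper() in RBAC_MATRIX.get(action, frozenset())
```

A table like this is convenient in tests: iterating over every role and action pair and comparing against the real permission checks would catch drift between this document and the code.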

---

## Security Model

| Control | Implementation | File |
|---------|---------------|------|
| Rate limiting | Redis-backed, 5 req/5 min on login | `middleware/rate_limiting.py` |
| Input validation | MongoDB operator blocklist + Pydantic | `middleware/validation.py` |
| File access | GCS V4 signed URLs, 24h expiry | `services/gcs.py` |
| Audit trail | Every state-changing action logged | `services/audit_logger.py` |
| Secrets | GCP Secret Manager in production | `core/secrets_config.py` |
| Error messages | Generic HTTP errors — no internal detail | `routes_auth.py` |
| Token type check | Reject refresh tokens as Bearer | `core/dependencies.py` |

**Known gaps (from security audit 2026-04-29):** The login endpoint currently bypasses rate limiting (debugging artifact — must be fixed before launch). Microsoft SSO calls synchronous `requests.get()` from an async context, which blocks the event loop while the request is in flight.
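
The login limit of 5 requests per 5 minutes can be illustrated with a sliding-window counter. This is a sketch under stated assumptions: an in-memory `deque` stands in for Redis, and `SlidingWindowLimiter` is a hypothetical class, not the contents of `middleware/rate_limiting.py`.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` hits per `window_seconds` for each key."""

    def __init__(self, limit=5, window_seconds=300.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the window can be tested without sleeping
        self._hits = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop hits that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

The key would typically be the client IP or the submitted email. In this sketch rejected attempts are not recorded, so a blocked client regains access as soon as its oldest accepted attempt leaves the window.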

---

## Glossary Retrieval (Hybrid)

Two-pass retrieval for translation prompt injection:

| Pass | Method | Threshold | Limit |
|------|--------|-----------|-------|
| 1 — Exact | String match on source term | — | All matches |
| 2 — Vector | Atlas Vector Search on embedding | ≥ 0.75 similarity | Top 20 |

Merged result: exact matches first, then vector matches, deduplicated, truncated to 50 terms. Injected as a block in the Gemini translation prompt.

**Index:** `glossary_embedding_index` in MongoDB Atlas.
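
The merge step described in this section can be sketched in a few lines. The function name, the input shapes, and the case-insensitive deduplication key are assumptions made for illustration; the thresholds and limits are the ones stated in the table. The actual logic lives in `services/glossary_service.py` and may differ.

```python
def merge_glossary_matches(exact, vector, min_score=0.75, vector_limit=20, max_terms=50):
    """Merge exact and vector matches into one ordered, deduplicated list.

    exact:  list of source terms matched by string comparison.
    vector: list of (source_term, similarity) pairs from the vector search.
    """
    # Pass 2 filter: keep the best-scoring vector hits at or above the threshold.
    ranked = sorted(
        (pair for pair in vector if pair[1] >= min_score),
        key=lambda pair: pair[1],
        reverse=True,
    )[:vector_limit]

    merged, seen = [], set()
    # Exact matches go first so they survive truncation.
    for term in list(exact) + [term for term, _ in ranked]:
        key = term.casefold()  # assumption: terms are deduplicated case-insensitively
        if key not in seen:
            seen.add(key)
            merged.append(term)
    return merged[:max_terms]
```

Placing exact matches first matters because truncation happens last: if a prompt would exceed 50 terms, the lower-confidence vector matches are the ones dropped.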

---

## WebSocket Architecture

- Server: `services/websocket.py` — `ConnectionManager` class, org-scoped broadcasts
- Client: `hooks/useJobStatusWebSocket.ts` — exponential backoff reconnect
- Auth: WS upgrade requires valid access token
- Events: `broadcast_to_org(org_id, event)` — no cross-tenant leakage
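
Org-scoped broadcasting can be sketched as a registry keyed by organisation. The class and method names follow the list above, but the internals are an assumption: sockets are assumed to expose an async `send_json` method, and authentication on upgrade is left out.

```python
import asyncio
from collections import defaultdict


class ConnectionManager:
    """Tracks open sockets per organisation and broadcasts within one org only."""

    def __init__(self):
        self._by_org = defaultdict(set)

    def connect(self, org_id, socket):
        self._by_org[org_id].add(socket)

    def disconnect(self, org_id, socket):
        self._by_org[org_id].discard(socket)

    async def broadcast_to_org(self, org_id, event):
        # Only sockets registered under this org_id are ever touched.
        sockets = list(self._by_org.get(org_id, ()))
        await asyncio.gather(*(s.send_json(event) for s in sockets))
        return len(sockets)
```

Because the lookup is by `org_id` and there is no "send to all" path, an event for one organisation cannot reach sockets registered under another.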

---

## Maintenance

**Update triggers:** New job state added, auth flow change, new service integrated, RBAC change.
**Verification:** State machine table matches `JobStatus` enum in `models/job.py`. RBAC matrix matches `authz.py` role checks.

<!-- END SCOPE: architecture -->

219
docs/project/database_schema.md
Normal file
@@ -0,0 +1,219 @@
# Database Schema — Accessible Video Processing Platform

<!-- SCOPE: database-schema | owner: ln-113 | generated: 2026-04-29 -->

**Database:** MongoDB Atlas
**Database name:** configured via `MONGODB_DB` env var (default: `accessible_video`)

---

## Collections

### `jobs`

Central document for each video accessibility job.

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| org_id | ObjectId | Owning organisation |
| client_user_id | ObjectId | User who uploaded the video |
| status | string | JobStatus enum (16 values — see architecture.md) |
| source_language | string | BCP-47 code (e.g., `en-US`) |
| requested_outputs | array[string] | Output language codes |
| source | object | `{ gcs_path, filename, duration_seconds }` |
| outputs | object | Per-language `{ captions_vtt, ad_vtt, ad_mp3, accessible_mp4 }` GCS paths |
| review | object | QC state `{ reviewer_id, approved_at, rejected_at, reason }` |
| language_qc | object | Per-language QC state (see LanguageQCState below) |
| vtt_versions | array | Version snapshot references (see `vtt_versions` collection) |
| glossary_id | ObjectId | Client glossary to use for translation |
| retry_count | int | Number of task retries |
| error | string | Last error message |
| created_at | datetime | ISO 8601 |
| updated_at | datetime | ISO 8601 |
| completed_at | datetime | ISO 8601 |

**LanguageQCState (per-language, nested in `language_qc`):**

| Field | Type | Description |
|-------|------|-------------|
| status | string | `pending`, `assigned`, `approved`, `rejected`, `feedback_requested` |
| linguist_id | ObjectId | Assigned linguist (nullable) |
| assigned_at | datetime | When linguist was assigned |
| reviewed_at | datetime | When approved/rejected |
| reason | string | Rejection or feedback reason |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| Primary | `_id` | Document lookup |
| org_status | `org_id` + `status` | List jobs by org and status |
| client | `client_user_id` | Client's own jobs |
| created | `created_at` (desc) | Time-sorted listing |
| status | `status` | Status-filtered queries |

---

### `users`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| email | string | Unique, lowercase |
| hashed_password | string | bcrypt hash (null for SSO-only users) |
| role | string | `client`, `reviewer`, `linguist`, `pm`, `admin` |
| org_id | ObjectId | Primary organisation |
| is_active | boolean | Account enabled flag |
| microsoft_id | string | Entra ID subject claim (nullable) |
| created_at | datetime | |
| updated_at | datetime | |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| email_unique | `email` (unique) | Login lookup |
| org | `org_id` | Members-of-org query |
| microsoft | `microsoft_id` (sparse) | SSO user lookup |

---

### `organizations`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| name | string | Organisation display name |
| slug | string | URL-safe identifier |
| member_ids | array[ObjectId] | User IDs in this org |
| created_at | datetime | |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| slug_unique | `slug` (unique) | Org lookup by slug |

---

### `glossaries`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| org_id | ObjectId | Owning organisation |
| name | string | Glossary display name |
| terms | array | Array of GlossaryTerm documents |
| created_at | datetime | |
| updated_at | datetime | |

**GlossaryTerm (embedded in `terms`):**

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Term ID |
| source_term | string | Term in source language |
| target_language | string | BCP-47 code |
| preferred_translation | string | Required translation |
| context | string | Usage notes (optional) |
| embedding | array[float] | Vector embedding for similarity search |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| org | `org_id` | List org glossaries |
| vector | `terms.embedding` (Atlas Vector Search) | Similarity retrieval |

**Atlas Vector Search index name:** `glossary_embedding_index`

---

### `vtt_versions`

Immutable version snapshots created before each VTT save.

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| job_id | ObjectId | Parent job |
| language | string | Language code |
| version_number | int | Sequential version number |
| content | string | Full VTT file content at time of snapshot |
| author_id | ObjectId | User who made the change |
| created_at | datetime | Snapshot timestamp |
| diff_from_prev | string | Diff against previous version (optional) |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| job_lang | `job_id` + `language` + `version_number` | Version history listing |
| job_lang_created | `job_id` + `language` + `created_at` (desc) | Time-sorted history |
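
The snapshot-before-save behaviour can be sketched as follows. A plain list stands in for the `vtt_versions` collection, and `snapshot_before_save` is a hypothetical helper; numbering from 1 is also an assumption, since the schema only says the number is sequential.

```python
from datetime import datetime, timezone


def snapshot_before_save(versions, job_id, language, current_content, author_id):
    """Append a snapshot of the current VTT before it is overwritten.

    `versions` is a plain list standing in for the `vtt_versions` collection.
    """
    previous = [
        v["version_number"]
        for v in versions
        if v["job_id"] == job_id and v["language"] == language
    ]
    snapshot = {
        "job_id": job_id,
        "language": language,
        "version_number": max(previous, default=0) + 1,  # sequential per job + language
        "content": current_content,
        "author_id": author_id,
        "created_at": datetime.now(timezone.utc),
    }
    versions.append(snapshot)
    return snapshot
```

This in-memory version is not safe under concurrent writers. Against a real database, two simultaneous saves could compute the same `version_number`, so the numbering would need to be protected, for example by a uniqueness constraint on the `job_lang` fields.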

---

### `audit_logs`

Immutable audit trail for all reviewer, linguist, and PM actions.

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| actor_id | ObjectId | User performing the action |
| actor_email | string | Denormalised for readability |
| action | string | Action type enum (see below) |
| job_id | ObjectId | Affected job (nullable) |
| org_id | ObjectId | Organisation context |
| before_state | string | Job status before action |
| after_state | string | Job status after action |
| metadata | object | Action-specific context (reason, language, etc.) |
| created_at | datetime | Event timestamp |

**Action types:**

| Action | Trigger |
|--------|---------|
| `job_approved` | QC approve |
| `job_rejected` | QC reject |
| `qc_feedback_sent` | QC feedback |
| `language_approved` | Language-level QC approve |
| `language_rejected` | Language-level QC reject |
| `linguist_assigned` | PM assigns linguist |
| `vtt_edited` | VTT content saved |
| `vtt_restored` | Version restore |
| `job_retry` | Admin manual retry |
| `user_invited` | PM/Admin invites member |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| job | `job_id` + `created_at` | Per-job audit trail |
| org_created | `org_id` + `created_at` (desc) | Org-level audit log |
| actor | `actor_id` + `created_at` | Per-user action history |

---

### `invitations`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| email | string | Invitee email |
| org_id | ObjectId | Org being joined |
| role | string | Role to assign on accept |
| token | string | Unique invite token (hashed) |
| expires_at | datetime | 7-day expiry |
| accepted_at | datetime | Nullable — set on accept |
| created_by | ObjectId | User who sent invite |
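
A minimal sketch of how a hashed invite token with a 7-day expiry might be created and checked. The schema does not name the hash algorithm, so SHA-256 is an assumption here, as are the helper names `hash_token`, `new_invitation`, and `is_redeemable`.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

INVITE_TTL = timedelta(days=7)


def hash_token(raw_token: str) -> str:
    # SHA-256 is an assumption; the schema only says the token is stored hashed.
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


def new_invitation(email, org_id, role, created_by, now=None):
    """Return (raw_token, document). Only the hash is stored; the raw token goes in the email."""
    now = now or datetime.now(timezone.utc)
    raw_token = secrets.token_urlsafe(32)
    document = {
        "email": email,
        "org_id": org_id,
        "role": role,
        "token": hash_token(raw_token),
        "expires_at": now + INVITE_TTL,
        "accepted_at": None,
        "created_by": created_by,
    }
    return raw_token, document


def is_redeemable(document, raw_token, now=None):
    """True if the invite is unused, unexpired, and the token matches."""
    now = now or datetime.now(timezone.utc)
    return (
        document["accepted_at"] is None
        and now < document["expires_at"]
        and secrets.compare_digest(document["token"], hash_token(raw_token))
    )
```

Storing only the hash means a leaked database dump does not expose usable invite links, and `secrets.compare_digest` avoids leaking match position through timing.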

---

## Maintenance

**Update triggers:** New collection added, index added or removed, field added to model.
**Verification:** All collections listed here exist in production Atlas. Index names match `backend/app/core/database.py` `create_indexes()` function (currently commented out — indexes were created manually).

<!-- END SCOPE: database-schema -->

146
docs/project/infrastructure.md
Normal file
@@ -0,0 +1,146 @@
# Infrastructure — Accessible Video Processing Platform

<!-- SCOPE: infrastructure | owner: ln-115 | generated: 2026-04-29 -->

## Server Inventory

| Server | Role | Resources | Location |
|--------|------|-----------|---------|
| optical-web-1 | Production host | 32GB RAM, 8 CPU | GCP VM |

**Domain:** ai-sandbox.oliver.solutions
**SSL:** Wildcard certificate covering *.ai-sandbox.oliver.solutions

---

## URL Map

| Endpoint | URL | Served by |
|----------|-----|---------|
| Frontend SPA | `https://ai-sandbox.oliver.solutions/video-accessibility/` | Apache → /var/www/html/video-accessibility |
| Backend API | `https://ai-sandbox.oliver.solutions/video-accessibility-back/` | Apache → localhost:8000 |
| Backend health | `https://ai-sandbox.oliver.solutions/video-accessibility-back/health` | FastAPI |
| Backend docs | `https://ai-sandbox.oliver.solutions/video-accessibility-back/docs` | FastAPI (Swagger) |
| Prometheus metrics | localhost:8001 | Prometheus client (internal only) |
| WebSocket | `wss://ai-sandbox.oliver.solutions/video-accessibility-back/api/v1/ws/` | Apache mod_proxy_wstunnel |

---

## Docker Compose Services

| Service | Image | Port (internal) | Port (host) | Depends on |
|---------|-------|----------------|------------|-----------|
| api | backend/Dockerfile | 8000 | 8000 | mongodb, redis |
| worker | backend/Dockerfile (celery cmd) | — | — | mongodb, redis |
| mongodb | mongo:7.0 | 27017 | 27017 | — |
| redis | redis:7.2 | 6379 | 6379 | — |

**Deploy path:** `/opt/video-accessibility/`

---

## Apache Configuration Requirements

| Module | Required for |
|--------|-------------|
| mod_rewrite | SPA routing (all paths → index.html) |
| mod_proxy | API reverse proxy |
| mod_proxy_http | HTTP proxying |
| mod_proxy_wstunnel | WebSocket proxying |
| mod_headers | CORS + security headers |

Config snippet location: `APACHE_DEPLOYMENT.md` (archived) and `/etc/apache2/sites-available/ai-sandbox.oliver.solutions-ssl.conf` on server.

---

## GCS Layout

**Bucket:** `accessible-video` (GCP project: `optical-414516`)

| Path pattern | Contents |
|-------------|---------|
| `{jobId}/source.mp4` | Original uploaded video |
| `{jobId}/en/captions.vtt` | English closed captions |
| `{jobId}/en/ad.vtt` | English audio description VTT |
| `{jobId}/en/ad.mp3` | English audio description audio |
| `{jobId}/{lang}/captions.vtt` | Translated captions (e.g., `fr/`, `de/`) |
| `{jobId}/{lang}/ad.vtt` | Translated audio description VTT |
| `{jobId}/{lang}/ad.mp3` | Translated audio description audio |
| `{jobId}/accessible.mp4` | Final accessible video (burned-in captions + AD audio) |

**Signed URL expiry:** 24h (V4 signing). URLs must not be cached or stored in the database.
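
The path patterns in the table can be captured in one small helper, which keeps object names consistent between the upload, translation, and rendering code. The function and the artifact keys below are hypothetical and only illustrate the layout; they are not taken from `services/gcs.py`.

```python
from typing import Optional

# Per-language artifacts and their file names, following the table in this section.
ARTIFACTS = {"captions": "captions.vtt", "ad_vtt": "ad.vtt", "ad_mp3": "ad.mp3"}


def object_path(job_id: str, artifact: str, lang: Optional[str] = None) -> str:
    """Build the bucket-relative object name for one job artifact."""
    if artifact == "source":
        return f"{job_id}/source.mp4"
    if artifact == "accessible":
        return f"{job_id}/accessible.mp4"
    if artifact not in ARTIFACTS:
        raise ValueError(f"unknown artifact: {artifact!r}")
    if not lang:
        raise ValueError(f"{artifact} requires a language code")
    return f"{job_id}/{lang}/{ARTIFACTS[artifact]}"
```

Since signed URLs must not be stored, a path like this is what would be persisted on the job document; the signed URL is then generated from it on demand each time a download is requested.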

---

## External Service Dependencies

| Service | Region / Endpoint | Rate limits / Quotas |
|---------|-----------------|-------------------|
| MongoDB Atlas | Cloud (Atlas cluster) | M10+ tier recommended |
| GCS | us-central1 | Standard storage class |
| Gemini 2.5 Pro | `generativelanguage.googleapis.com` | Per project quota |
| Google Cloud TTS | `texttospeech.googleapis.com` | 1M chars/month free tier |
| Google Cloud Translate | `translate.googleapis.com` | 500k chars/month free tier |
| ElevenLabs | `api.elevenlabs.io` | Subscription-dependent |
| SendGrid | `api.sendgrid.com` | 100 emails/day free tier |
| Microsoft Entra ID | `login.microsoftonline.com` | Tenant-configured |
| GCP Secret Manager | `secretmanager.googleapis.com` | 10k ops/month free |
| Sentry | `sentry.io` | Project DSN |

---

## Network Ports

| Port | Service | Exposed to |
|------|---------|-----------|
| 443 | Apache HTTPS | Public |
| 80 | Apache HTTP (→ 443 redirect) | Public |
| 8000 | FastAPI | localhost only |
| 8001 | Prometheus metrics | localhost only |
| 27017 | MongoDB | Docker network only |
| 6379 | Redis | Docker network only |

---

## Secret Management

**Production:** GCP Secret Manager. Secrets fetched at startup via `core/secrets_config.py`.
**Local:** `.env.local` (gitignored).
**Template:** `.env.prod.example` (checked in, no real values).

| Secret | Where used |
|--------|-----------|
| `JWT_SECRET_KEY` | Access token signing |
| `JWT_REFRESH_SECRET_KEY` | Refresh token signing |
| `GEMINI_API_KEY` | Gemini API |
| `ELEVENLABS_API_KEY` | ElevenLabs TTS |
| `SENDGRID_API_KEY` | Email delivery |
| `GCS_BUCKET_NAME` | File storage |
| `GOOGLE_CLOUD_PROJECT` | GCP project ID |
| `MONGODB_URI` | Atlas connection string |
| `REDIS_URL` | Redis connection |
| `SENTRY_DSN` | Error tracking |
| `DEFAULT_ADMIN_PASSWORD` | Seed script (must not have fallback value) |
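
The "must not have fallback value" rule for `DEFAULT_ADMIN_PASSWORD` amounts to failing fast when the variable is missing. A minimal sketch, assuming the seed script reads the value from the environment; `require_env` and `MissingSecretError` are hypothetical names.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_env(name: str, environ=os.environ) -> str:
    """Return the value of a required variable, failing fast when unset or empty."""
    value = environ.get(name, "").strip()
    if not value:
        # No default is substituted: a missing secret stops the script.
        raise MissingSecretError(f"{name} is not set")
    return value
```

The contrast is with `os.environ.get("DEFAULT_ADMIN_PASSWORD", "admin")`, which would silently seed a guessable password whenever the variable is forgotten.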

---

## GCP Service Account IAM Roles

| Role | Purpose |
|------|---------|
| Storage Admin | GCS read/write + signed URL generation |
| AI Platform User | Gemini API access |
| Cloud Translation User | Translate API access |
| Cloud Text-to-Speech User | TTS API access |
| Secret Manager Secret Accessor | Read secrets at runtime |

**Credentials file:** `./secrets/gcp-credentials.json` (mounted into Docker containers, permissions 600).

---

## Maintenance

**Update triggers:** Server migration, new external service, GCS bucket rename, secret rotation.
**Verification:** All URLs in URL Map resolve. Docker service ports match `docker-compose.prod.yml`. GCS bucket name matches `GCS_BUCKET_NAME` env var.

<!-- END SCOPE: infrastructure -->

154
docs/project/requirements.md
Normal file
@@ -0,0 +1,154 @@
# Functional Requirements — Accessible Video Processing Platform

<!-- SCOPE: requirements | owner: ln-112 | generated: 2026-04-29 -->

## Purpose

This document specifies what the system must do from a user perspective. Implementation details belong in [architecture.md](architecture.md). Non-functional requirements (performance, security) belong in [architecture.md](architecture.md#security-model).

---

## User Roles

| Role | Who | Primary action |
|------|-----|---------------|
| CLIENT | Paying customer | Upload videos, download deliverables |
| REVIEWER | Oliver internal | QC approve/reject captions + audio description |
| LINGUIST | Language specialist | Review and approve translated content per language |
| PM | Project manager | Assign linguists, give final approval, monitor all jobs |
| ADMIN | Platform operator | Manage users, view audit log, configure platform |

---

## Core Features

### R-01: Video Upload

| Requirement | Detail |
|-------------|--------|
| R-01.1 | Client can upload an MP4 video file |
| R-01.2 | File is stored in GCS; client receives a job ID |
| R-01.3 | Upload progress is displayed in real time |
| R-01.4 | System validates file type (MP4 only) and size on upload |
| R-01.5 | Upload creates a job record in CREATED state |

### R-02: AI Processing Pipeline

| Requirement | Detail |
|-------------|--------|
| R-02.1 | System automatically generates closed captions in VTT format using Gemini 2.5 Pro |
| R-02.2 | System generates audio description VTT (scene descriptions for blind/low-vision viewers) |
| R-02.3 | System generates SDH captions (includes sound effects and speaker IDs) |
| R-02.4 | System generates a descriptive transcript |
| R-02.5 | All generated VTT files are validated for correct format before advancing to QC |
| R-02.6 | Job status is updated in real time via WebSocket during processing |
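
To make R-02.5 concrete, the sketch below shows what a basic structural VTT check could look like. It is an assumption about the kind of validation meant, not the project's validator: it checks the `WEBVTT` header, the timing line of each cue, that the end time follows the start time, and that the cue has text. It is not a full WebVTT parser, and `validate_vtt` is a hypothetical name.

```python
import re

# Timestamps are [hh:]mm:ss.ttt; optional cue settings may follow the end time.
_TIMING = re.compile(
    r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})(?:\s+.*)?$"
)


def _seconds(h, m, s, ms):
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def validate_vtt(text: str) -> list:
    """Return a list of problems; an empty list means the basic structure looks valid."""
    text = text.replace("\r\n", "\n").lstrip("\ufeff")
    blocks = [b for b in re.split(r"\n{2,}", text.strip("\n")) if b.strip()]
    if not blocks or not blocks[0].startswith("WEBVTT"):
        return ["missing WEBVTT header"]
    problems = []
    for index, block in enumerate(blocks[1:], start=1):
        body = block.split("\n")
        if body[0].startswith(("NOTE", "STYLE", "REGION")):
            continue  # non-cue blocks are skipped, not validated
        # An optional cue identifier may precede the timing line.
        timing_at = 0 if "-->" in body[0] else 1
        match = _TIMING.match(body[timing_at]) if len(body) > timing_at else None
        if not match:
            problems.append(f"block {index}: missing or malformed timing line")
            continue
        groups = match.groups()
        if _seconds(*groups[4:]) <= _seconds(*groups[:4]):
            problems.append(f"block {index}: end time is not after start time")
        if not any(line.strip() for line in body[timing_at + 1:]):
            problems.append(f"block {index}: cue has no text")
    return problems
```

Returning a list of problems instead of raising on the first one lets the pipeline record every defect in the job's `error` field or QC feedback in a single pass.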

### R-03: Quality Control Workflow

| Requirement | Detail |
|-------------|--------|
| R-03.1 | Reviewer can view video with captions side-by-side |
| R-03.2 | Reviewer can edit individual VTT cues (text + timing) inline |
| R-03.3 | Reviewer can approve English content (advances to APPROVED_ENGLISH) |
| R-03.4 | Reviewer can reject a job with a reason (advances to REJECTED) |
| R-03.5 | Reviewer can send QC feedback without full rejection (advances to QC_FEEDBACK) |
| R-03.6 | All QC actions are recorded in the audit log with timestamp and user ID |
| R-03.7 | VTT edits create a version snapshot before overwriting (version history maintained) |

### R-04: Per-Language QC

| Requirement | Detail |
|-------------|--------|
| R-04.1 | PM can assign a specific linguist to each output language |
| R-04.2 | Linguist can approve or reject their assigned language |
| R-04.3 | Language statuses are independent — approving French does not affect German |
| R-04.4 | Linguist cannot approve a language not assigned to them |
| R-04.5 | Job advances to PENDING_FINAL_REVIEW only when all languages are approved |
| R-04.6 | Per-language QC actions are recorded in the audit log |
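
The gate in R-04.5 can be illustrated with a short predicate. This is a sketch only: it assumes the `language_qc` object maps each language code to a state with a `status` field, as described in the database schema, and `ready_for_final_review` is a hypothetical name.

```python
def ready_for_final_review(requested_outputs, language_qc):
    """True only when every requested language has an `approved` QC status.

    requested_outputs: list of output language codes on the job.
    language_qc: mapping of language code to its per-language QC state.
    """
    return all(
        language_qc.get(lang, {}).get("status") == "approved"
        for lang in requested_outputs
    )
```

A language with no QC entry counts as not approved. Note that an empty `requested_outputs` list makes this return `True`, so how jobs with no output languages should behave is a decision the sketch leaves open.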

### R-05: Translation and TTS

| Requirement | Detail |
|-------------|--------|
| R-05.1 | System translates captions and AD into all requested output languages |
| R-05.2 | System applies cultural transcreation (Gemini-assisted) where configured |
| R-05.3 | System uses client-specific glossary terms in translation prompts |
| R-05.4 | System synthesises audio description audio via Google TTS or ElevenLabs |
| R-05.5 | TTS is performed per cue to preserve timing |
| R-05.6 | TTS failures result in TTS_FAILED state; manual retry is supported |

### R-06: Glossary Management

| Requirement | Detail |
|-------------|--------|
| R-06.1 | Admin can create, read, update, and delete glossary terms per client organisation |
| R-06.2 | Glossary terms specify source term, target language, preferred translation |
| R-06.3 | System uses exact match first, then vector similarity (≥0.75) for retrieval |
| R-06.4 | Up to 50 terms are injected per translation prompt |
| R-06.5 | Glossary embeddings are indexed in Atlas Vector Search |

### R-07: VTT Version Control

| Requirement | Detail |
|-------------|--------|
| R-07.1 | System creates a snapshot before each VTT edit save |
| R-07.2 | Reviewer can view version history with author, timestamp, and diff |
| R-07.3 | Reviewer can restore any previous version |
| R-07.4 | Concurrent edit conflict is detected and reported to the later editor |

### R-08: Final Review and Delivery

| Requirement | Detail |
|-------------|--------|
| R-08.1 | PM can view all final deliverable files before approving |
| R-08.2 | PM approval triggers client notification email |
| R-08.3 | Email contains signed GCS download URLs (24h expiry) |
| R-08.4 | Client can download captions, audio descriptions, and accessible video |
| R-08.5 | Job status advances to COMPLETED after PM approval |

### R-09: Authentication and Access Control

| Requirement | Detail |
|-------------|--------|
| R-09.1 | Users authenticate via email/password (local) or Microsoft SSO (enterprise) |
| R-09.2 | Access tokens are valid for 15 minutes; refresh tokens for 7 days |
| R-09.3 | Refresh tokens are stored in HttpOnly cookies only |
| R-09.4 | All API endpoints enforce RBAC — role checked server-side on every request |
| R-09.5 | Login is rate-limited to 5 attempts per 5-minute window |

### R-10: Audit Logging

| Requirement | Detail |
|-------------|--------|
| R-10.1 | Every state-changing action by a reviewer, linguist, or PM creates an audit log entry |
| R-10.2 | Audit log entries contain: actor user ID, action type, job ID, timestamp, before/after state |
| R-10.3 | Admin can view the full audit log filtered by user, job, or date range |
| R-10.4 | Audit log entries are immutable once written |

### R-11: Real-time Notifications

| Requirement | Detail |
|-------------|--------|
| R-11.1 | Job status changes are pushed to connected clients via WebSocket |
| R-11.2 | WebSocket events are org-scoped — users only receive events for their organisation |
| R-11.3 | WebSocket connection recovers automatically after disconnect (exponential backoff) |

---

## Out of Scope (Current Version)

| Feature | Reason |
|---------|--------|
| Automated transcription (Whisper) | Gemini handles transcription; Whisper worker exists but not active |
| CI/CD pipeline | Manual deploy via scripts; CI exists but does not run full test suite |
| Load testing | Not implemented; deferred to Phase 7 |
| Multi-tenant billing | Cost tracking via oliver-cost-tracker SDK (read-only dashboard) |

---

## Maintenance

**Update triggers:** New feature scope confirmed, requirement changed by stakeholder, QC workflow changes.
**Verification:** Every R-XX.X requirement maps to at least one test in [tests/README.md](../../tests/README.md) or [/tmp/audit/test-plan.md](/tmp/audit/test-plan.md).

<!-- END SCOPE: requirements -->

213
docs/project/runbook.md
Normal file
@@ -0,0 +1,213 @@
# Runbook — Accessible Video Processing Platform

<!-- SCOPE: runbook | owner: ln-115 | generated: 2026-04-29 -->

## Local Development Setup

### Prerequisites

| Requirement | Version |
|-------------|---------|
| Docker | 20.10+ |
| Docker Compose | V2 (bundled with Docker Desktop) |
| Node.js | 20+ |
| Python | 3.11+ (for local scripts only; app runs in Docker) |
| GCP credentials file | `./secrets/gcp-credentials.json` |

### First-Time Setup

| Step | Command / Action |
|------|-----------------|
| 1. Copy env template | `cp .env.prod.example .env.local` — fill in all values |
| 2. Copy frontend env | `cp frontend/.env.example frontend/.env.local` |
| 3. Place GCP credentials | Copy service account JSON to `./secrets/gcp-credentials.json` |
| 4. Set permissions | `chmod 600 ./secrets/gcp-credentials.json` |

### Starting the Local Environment

**Step 1 — Backend (Docker):**

`./scripts/run-local.sh`

Services after start:

| Service | URL |
|---------|-----|
| API | http://localhost:8003 |
| API docs (Swagger) | http://localhost:8003/docs |
| MongoDB | mongodb://localhost:27017 |
| Redis | redis://localhost:6379 |

**Step 2 — Frontend (Vite dev server, separate terminal):**

`cd frontend && npm install && npm run dev`

Frontend URL: http://localhost:6001/video-accessibility

### Common Local Commands

| Action | Command |
|--------|---------|
| Rebuild containers after code change | `./scripts/run-local.sh --rebuild` |
| Stop all services | `./scripts/run-local.sh --stop` |
| Tail all logs | `docker compose logs -f` |
| Tail API logs | `docker compose logs -f api` |
| Tail worker logs | `docker compose logs -f worker` |
| Restart a service | `docker compose restart api` |

### Test Credentials (Local Only)

| Role | Email | Password |
|------|-------|---------|
| Admin | admin@example.com | admin |
| Reviewer | reviewer@example.com | reviewer |
| Client | client@example.com | client123 |

Production uses Microsoft SSO — these credentials do not work in production.

---

## Production Deployment

**Server:** optical-web-1
**Deploy path:** `/opt/video-accessibility/`
**URL:** https://ai-sandbox.oliver.solutions/video-accessibility/

### Full Deployment (code + frontend)

Run on server (requires explicit user instruction — NEVER run via SSH without user approval):

`./scripts/full-deploy.sh`

This script:

| Step | Action |
|------|--------|
| 1 | Pull latest code from git |
| 2 | Build Docker images |
| 3 | Restart containers |
| 4 | Build frontend bundle |
| 5 | Copy bundle to Apache webroot |
| 6 | Run DB seed if needed |

### Frontend-Only Deployment

`./scripts/build-frontend.sh`

Builds the React bundle and copies it to `/var/www/html/video-accessibility/`.

### Verification After Deploy

| Check | Command / URL |
|-------|--------------|
| API health | `curl https://ai-sandbox.oliver.solutions/video-accessibility-back/health` |
| Container status | `docker compose ps` |
| Frontend loads | Visit https://ai-sandbox.oliver.solutions/video-accessibility |
| Worker running | `docker compose logs --tail=20 worker` |

---

## Database Operations
|
||||
|
||||
### Backup MongoDB
|
||||
|
||||
| Step | Command |
|------|---------|
| Dump to container | `docker compose exec mongodb mongodump --out=/data/backup` |
| Copy to host | `docker cp accessible-video-mongodb:/data/backup ./mongodb-backup-$(date +%Y%m%d)` |
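
The two backup steps lend themselves to scripting. The following is a minimal sketch, assuming the container name and paths from the table above; `build_backup_commands` is a hypothetical helper, not part of the repository, and it only builds the argument lists rather than running anything.

```python
from datetime import date


def build_backup_commands(
    today: date,
    container: str = "accessible-video-mongodb",
) -> list[list[str]]:
    """Return the two backup commands from the table as argument lists."""
    stamp = today.strftime("%Y%m%d")  # same format as $(date +%Y%m%d)
    return [
        # -T disables TTY allocation, which suits non-interactive scripts
        ["docker", "compose", "exec", "-T", "mongodb", "mongodump", "--out=/data/backup"],
        ["docker", "cp", f"{container}:/data/backup", f"./mongodb-backup-{stamp}"],
    ]


commands = build_backup_commands(date(2026, 4, 29))
```

Each list could then be passed to `subprocess.run(cmd, check=True)` so that a failed dump stops the script before the copy step.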
|
||||
|
||||
### Restore MongoDB
|
||||
|
||||
| Step | Command |
|------|---------|
| Copy dated backup to container | `docker cp ./mongodb-backup-YYYYMMDD accessible-video-mongodb:/data/restore` |
| Restore | `docker compose exec mongodb mongorestore /data/restore` |
|
||||
|
||||
### MongoDB Shell
|
||||
|
||||
`docker compose exec mongodb mongosh`
|
||||
|
||||
---
|
||||
|
||||
## Restarting Services
|
||||
|
||||
| Action | Command |
|--------|---------|
| Restart all | `docker compose restart` |
| Restart API only | `docker compose restart api` |
| Restart worker only | `docker compose restart worker` |
| Rebuild + restart one service | `docker compose up -d --build api` |
|
||||
|
||||
---
|
||||
|
||||
## Updating Application
|
||||
|
||||
| Step | Command |
|------|---------|
| Pull code | `git pull origin main` |
| Full redeploy | `./scripts/full-deploy.sh` |
| Frontend only | `./scripts/build-frontend.sh` |
|
||||
|
||||
---
|
||||
|
||||
## Linting and Type Checking
|
||||
|
||||
| Check | Command | Must pass before deploy |
|-------|---------|-------------------------|
| Backend lint | `cd backend && ruff check .` | Yes |
| Backend type check | `docker compose exec api python -m mypy app/` | Yes |
| Frontend lint | `cd frontend && npm run lint` | Yes |
| Frontend type check | `cd frontend && npm run type-check` | Yes (currently 0 errors) |
|
||||
|
||||
---
|
||||
|
||||
## Monitoring
|
||||
|
||||
| Tool | Access | Purpose |
|------|--------|---------|
| Docker stats | `docker stats` | Container CPU/memory usage |
| API logs | `docker compose logs -f api` | Request errors |
| Worker logs | `docker compose logs -f worker` | Task errors |
| Sentry | sentry.io | Exception capture + stack traces |
| Prometheus | localhost:8001/metrics | Metrics (internal only) |
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Symptom | Check | Fix |
|---------|-------|-----|
| 502 Bad Gateway on API | `docker compose ps api` + logs | Restart: `docker compose restart api` |
| Frontend 404 | `ls /var/www/html/video-accessibility/` | Rebuild: `./scripts/build-frontend.sh` |
| WebSocket fails | `apache2ctl -M \| grep proxy_wstunnel` | `sudo a2enmod proxy_wstunnel && sudo systemctl restart apache2` |
| Worker not processing | `docker compose logs -f worker` | Check Redis URL + GCP credentials mount |
| Upload fails (GCS) | Test credentials in container | Check `./secrets/gcp-credentials.json` exists + permissions |
| MongoDB auth fails | Check `MONGODB_URI` env var | Verify Atlas connection string |
|
||||
|
||||
---
|
||||
|
||||
## Apache Configuration
|
||||
|
||||
Required modules:
|
||||
|
||||
`sudo a2enmod rewrite proxy proxy_http proxy_wstunnel headers && sudo systemctl restart apache2`
|
||||
|
||||
Config file: `/etc/apache2/sites-available/ai-sandbox.oliver.solutions-ssl.conf`
|
||||
|
||||
Key directives needed:
|
||||
|
||||
| Directive | Purpose |
|-----------|---------|
| `Alias /video-accessibility /var/www/html/video-accessibility` | Serve frontend |
| `ProxyPass /video-accessibility-back http://localhost:8000` | Proxy API |
| `RewriteRule ^ /video-accessibility/index.html [L]` | SPA routing |
| `RewriteEngine On` with WebSocket rules | WS proxy |
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** New deploy script, new service port, new server.
|
||||
**Verification:** All commands in this runbook execute without error on a clean checkout. Test credentials are not committed to production env files.
|
||||
|
||||
<!-- END SCOPE: runbook -->
|
||||
94
docs/project/tech_stack.md
Normal file
|
|
@ -0,0 +1,94 @@
|
|||
# Tech Stack — Accessible Video Processing Platform
|
||||
|
||||
<!-- SCOPE: tech-stack | owner: ln-113 | generated: 2026-04-29 -->
|
||||
|
||||
## Backend
|
||||
|
||||
| Component | Package | Version | Role |
|-----------|---------|---------|------|
| Web framework | FastAPI | 0.115.0 | Async REST + WebSocket |
| Task queue | Celery | 5.3.4 | Background AI processing |
| Database driver | Motor (AsyncIOMotor) | latest | MongoDB async driver |
| Cache / broker | Redis | 7.2 | Celery broker + rate limit state |
| Data validation | Pydantic | 2.5 | Request/response schemas |
| Runtime | Python | 3.11 | Language version |
| Package manager | Poetry | latest | Dependency management |
| ASGI server | Uvicorn | latest | HTTP server |
| JWT | python-jose | ^3.3.0 | Token encode/decode |
| Password hash | passlib + bcrypt | latest | Password hashing |
| Observability | OpenTelemetry | latest | Tracing (currently disabled) |
| Error tracking | Sentry SDK | latest | Exception capture |
| Metrics | Prometheus client | latest | `/metrics` endpoint |
|
||||
|
||||
## Frontend
|
||||
|
||||
| Component | Package | Version | Role |
|-----------|---------|---------|------|
| UI framework | React | 19.1.1 | Component model |
| Build tool | Vite | 7.1.2 | Dev server + bundler |
| Language | TypeScript | 5.8 | Type safety |
| Server state | TanStack Query | 5.85 | API caching + invalidation |
| Routing | React Router | 7.8 | Client-side routing |
| Styling | Tailwind CSS | 4.1 | Utility-first CSS |
| Client state | Zustand | 5.0 | Auth token + UI state |
| Forms | React Hook Form + Zod | latest | Form handling + validation |
| HTTP client | Axios | latest | API calls with interceptors |
| E2E testing | Playwright | latest | Browser automation |
| Unit testing | Vitest + RTL | latest | Component tests |
|
||||
|
||||
## External Services
|
||||
|
||||
| Service | Provider | Purpose | Auth |
|---------|----------|---------|------|
| Caption / AD generation | Gemini 2.5 Pro | Core AI processing | API key |
| Translation | Google Cloud Translate | 50+ language translation | Service account |
| Transcreation | Gemini 2.5 Pro | Cultural adaptation | API key |
| TTS (primary) | Google Cloud TTS | Audio description synthesis | Service account |
| TTS (premium voices) | ElevenLabs | High-quality voice synthesis | API key |
| File storage | Google Cloud Storage | Video + VTT files | Service account |
| Email | SendGrid | Client delivery notifications | API key |
| SSO | Microsoft Entra ID | Enterprise login | OAuth2/OIDC |
| Error tracking | Sentry | Exception capture + performance | DSN |
| Secrets | GCP Secret Manager | Production credentials | Service account |
|
||||
|
||||
## Database
|
||||
|
||||
| Store | Technology | Host | Purpose |
|-------|------------|------|---------|
| Primary DB | MongoDB Atlas | Cloud (M10+) | Job documents, users, glossaries |
| Cache / broker | Redis | Docker (local) / Cloud (prod) | Celery tasks, rate limit counters |
| Vector index | Atlas Vector Search | In MongoDB Atlas | Glossary embedding retrieval |
|
||||
|
||||
## Infrastructure
|
||||
|
||||
| Layer | Technology | Notes |
|-------|------------|-------|
| Container runtime | Docker + Docker Compose | All backend services containerized |
| Reverse proxy | Apache 2.4 | Modules: mod_proxy, mod_proxy_wstunnel, mod_rewrite |
| Server OS | Linux | optical-web-1 (GCP VM, 32GB RAM, 8 CPU) |
| Python env | Poetry virtualenv | Inside Docker containers |
| Node env | Node 20+ / npm | Frontend build |
|
||||
|
||||
## Configuration Files
|
||||
|
||||
| File | Purpose |
|------|---------|
| `backend/pyproject.toml` | Python deps + ruff config |
| `backend/mypy.ini` | mypy type-check config |
| `frontend/package.json` | Node deps |
| `frontend/eslint.config.js` | ESLint rules |
| `frontend/tsconfig.json` | TypeScript config |
| `frontend/playwright.config.ts` | E2E test config |
| `docker-compose.yml` | Base services |
| `docker-compose.local.yml` | Local dev overrides |
| `docker-compose.prod.yml` | Production overrides |
| `.env.local` | Local secrets (gitignored) |
| `.env.production` | Production secrets (gitignored) |
| `.env.prod.example` | Template for production env |
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** Dependency version bump, new external service added, Python or Node runtime version change.
|
||||
**Verification:** Versions match `backend/pyproject.toml` and `frontend/package.json`.
|
||||
|
||||
<!-- END SCOPE: tech-stack -->
|
||||
46
docs/reference/README.md
Normal file
|
|
@ -0,0 +1,46 @@
|
|||
# Reference Documentation — Accessible Video Processing Platform
|
||||
|
||||
<!-- SCOPE: reference-hub | owner: ln-120 | generated: 2026-04-29 -->
|
||||
|
||||
This hub indexes all reference documentation: architectural decisions, how-to guides, operator manuals, and research notes.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Decision Records (ADRs)
|
||||
|
||||
ADRs capture significant architectural decisions, their context, and their trade-offs.
|
||||
|
||||
| ADR | Title | Status |
|-----|-------|--------|
| [ADR-001](adrs/2026-04-29-async-celery-bridge.md) | Async Celery bridge via new event loop per task | Accepted |
| [ADR-002](adrs/2026-04-29-jwt-memory-storage.md) | Access tokens stored in JS memory, not localStorage | Accepted |
| [ADR-003](adrs/2026-04-29-hybrid-glossary-retrieval.md) | Hybrid exact + vector glossary retrieval | Accepted |
|
||||
|
||||
---
|
||||
|
||||
## Guides (Developer How-Tos)
|
||||
|
||||
| Guide | Description |
|-------|-------------|
| [testing-strategy.md](guides/testing-strategy.md) | Risk-based testing approach, what to test and how |
|
||||
|
||||
---
|
||||
|
||||
## Manuals (Operator Procedures)
|
||||
|
||||
_No operator manuals yet. Add as operational procedures are formalised._
|
||||
|
||||
---
|
||||
|
||||
## Research
|
||||
|
||||
_No research notes yet. Add technology evaluations as decisions are made._
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** New ADR created, new guide written, research note added.
|
||||
**Verification:** Every ADR in this index has a corresponding file. No files in subdirectories are missing from the index.
|
||||
|
||||
<!-- END SCOPE: reference-hub -->
|
||||
34
docs/reference/adrs/2026-04-29-async-celery-bridge.md
Normal file
|
|
@ -0,0 +1,34 @@
|
|||
# ADR-001: Async Celery Bridge via New Event Loop Per Task
|
||||
|
||||
<!-- SCOPE: adr-001 | owner: ln-120 | generated: 2026-04-29 -->
|
||||
|
||||
**Status:** Accepted
|
||||
**Date:** 2026-04-29
|
||||
|
||||
## Context
|
||||
|
||||
FastAPI routes are async (asyncio). Celery workers run in synchronous Python processes. MongoDB (Motor) and other async clients cannot be shared across asyncio event loop boundaries. Tasks need to call async services (Gemini, GCS, MongoDB, TTS) that only have async APIs.
|
||||
|
||||
## Decision
|
||||
|
||||
Each Celery task creates a new event loop with `asyncio.new_event_loop()` and runs its async implementation with `loop.run_until_complete(task_impl())`. The async implementation can freely use `await` with Motor, httpx, and other async clients. The event loop is closed in a `finally` block when the task completes.
|
||||
|
||||
## Consequences
|
||||
|
||||
**Benefits:**
- Async services work correctly inside Celery tasks
- No shared mutable state between tasks
- Each task is isolated: a failure does not corrupt another task's loop

**Trade-offs:**
- Every task creates a new MongoDB client, so connection pools are not reused across tasks
- This is a known performance limitation; the mitigation is to reuse one pooled client for the lifetime of the task's event loop
- The pattern requires discipline: never `await` outside the task's own loop
|
||||
|
||||
**Known violations:** `ingest_and_ai.py` creates `AsyncIOMotorClient(settings.mongodb_uri)` directly instead of using a shared factory — should be extracted to a `get_task_db()` context manager.
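
One possible shape for the proposed `get_task_db()` context manager is sketched below. The signature is an assumption: the client factory and database name are passed in so the snippet runs without MongoDB, and `FakeClient` is a stand-in for `AsyncIOMotorClient`, not real project code.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Callable


@asynccontextmanager
async def get_task_db(
    client_factory: Callable[[], Any], db_name: str
) -> AsyncIterator[Any]:
    """Open a client for one task and always close it afterwards."""
    # In production the factory might be:
    #   lambda: AsyncIOMotorClient(settings.mongodb_uri)
    client = client_factory()
    try:
        yield client[db_name]
    finally:
        client.close()


class FakeClient:
    """Stand-in for a Motor client so the sketch runs without MongoDB."""

    def __init__(self) -> None:
        self.closed = False

    def __getitem__(self, name: str) -> str:
        return f"db:{name}"

    def close(self) -> None:
        self.closed = True


async def demo() -> tuple[str, bool]:
    client = FakeClient()
    # "video_accessibility" is a placeholder database name.
    async with get_task_db(lambda: client, "video_accessibility") as db:
        name = db
    return name, client.closed


demo_result = asyncio.run(demo())
```

Centralising client creation this way would give tasks a single place to configure pooling and guarantees the client is closed when the task's loop finishes.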
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** If Celery adds native asyncio worker support, this ADR should be marked Deprecated and replaced.
|
||||
|
||||
<!-- END SCOPE: adr-001 -->
|
||||
43
docs/reference/adrs/2026-04-29-hybrid-glossary-retrieval.md
Normal file
|
|
@ -0,0 +1,43 @@
|
|||
# ADR-003: Hybrid Exact + Vector Glossary Retrieval
|
||||
|
||||
<!-- SCOPE: adr-003 | owner: ln-120 | generated: 2026-04-29 -->
|
||||
|
||||
**Status:** Accepted
|
||||
**Date:** 2026-04-29
|
||||
|
||||
## Context
|
||||
|
||||
Client glossaries contain brand-specific terminology that must appear verbatim in translations. Simple string matching works for exact terms but fails for morphological variants, synonyms, and related phrases. Vector similarity alone would miss exact mandatory terms and potentially return unrelated results above the similarity threshold.
|
||||
|
||||
## Decision
|
||||
|
||||
Two-pass retrieval in `services/glossary_service.py`:
|
||||
|
||||
| Pass | Method | Threshold | Limit |
|------|--------|-----------|-------|
| 1 | Exact string match on `source_term` | n/a | All |
| 2 | Atlas Vector Search on `terms.embedding` | ≥ 0.75 cosine similarity | Top 20 |
|
||||
|
||||
Results are merged with exact matches first. Duplicates (same source term in both passes) are deduplicated. Total injected into the Gemini prompt is capped at 50 terms (`_MAX_TERMS_IN_PROMPT`). If the cap is reached, exact matches are kept over vector matches.
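
The merge rules can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `services/glossary_service.py`: `VectorHit`, `merge_glossary_matches`, `SIMILARITY_THRESHOLD`, and `VECTOR_LIMIT` are hypothetical names, and the threshold and limit are applied in Python here even though the real service may apply them inside the Atlas query.

```python
from typing import NamedTuple

SIMILARITY_THRESHOLD = 0.75  # pass 2 cosine-similarity cut-off
VECTOR_LIMIT = 20            # pass 2 "Top 20"
_MAX_TERMS_IN_PROMPT = 50    # cap named in the ADR


class VectorHit(NamedTuple):
    source_term: str
    score: float


def merge_glossary_matches(
    exact_terms: list[str], vector_hits: list[VectorHit]
) -> list[str]:
    """Merge both passes: exact matches first, no duplicates, capped."""
    merged: list[str] = []
    seen: set[str] = set()

    for term in exact_terms:
        if term not in seen:
            seen.add(term)
            merged.append(term)

    # Keep only hits at or above the threshold, best first, top N.
    ranked = sorted(
        (hit for hit in vector_hits if hit.score >= SIMILARITY_THRESHOLD),
        key=lambda hit: hit.score,
        reverse=True,
    )[:VECTOR_LIMIT]
    for hit in ranked:
        if hit.source_term not in seen:
            seen.add(hit.source_term)
            merged.append(hit.source_term)

    # Exact matches sit at the front, so truncation drops vector matches first.
    return merged[:_MAX_TERMS_IN_PROMPT]


example = merge_glossary_matches(
    ["Acme Cloud"],
    [
        VectorHit("Acme Cloud", 0.93),
        VectorHit("Acme Cloud Suite", 0.81),
        VectorHit("weather", 0.40),
    ],
)
```

Placing exact matches at the head of the list means a single slice enforces both the cap and the "exact over vector" preference.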
|
||||
|
||||
Embeddings are generated by a separate Celery task (`tasks/embed_glossary.py`) when a term is created or updated and stored in `terms.embedding` (float array). The Atlas Vector Search index is `glossary_embedding_index`.
|
||||
|
||||
## Consequences
|
||||
|
||||
**Benefits:**
- Mandatory brand terms always appear (exact match guarantees inclusion)
- Related terms are surfaced without requiring exact phrasing (vector match)
- 50-term cap prevents context window bloat in the Gemini prompt

**Trade-offs:**
- Embeddings must be pre-generated; new terms are not searchable by vector until the embed task runs
- Atlas Vector Search requires M10+ Atlas tier (not available on free tier)
- Similarity threshold (0.75) is a tunable parameter; too low = noisy matches, too high = missed variants
|
||||
|
||||
**Known gaps:** No automated tests for exact-before-vector ordering, similarity threshold enforcement, or 50-term truncation. See test plan T-04 in `/tmp/audit/test-plan.md`.
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** Threshold tuned, embedding model changed, Atlas index rebuilt.
|
||||
|
||||
<!-- END SCOPE: adr-003 -->
|
||||
34
docs/reference/adrs/2026-04-29-jwt-memory-storage.md
Normal file
|
|
@ -0,0 +1,34 @@
|
|||
# ADR-002: Access Tokens Stored in JS Memory, Not localStorage
|
||||
|
||||
<!-- SCOPE: adr-002 | owner: ln-120 | generated: 2026-04-29 -->
|
||||
|
||||
**Status:** Accepted
|
||||
**Date:** 2026-04-29
|
||||
|
||||
## Context
|
||||
|
||||
SPAs need to persist access tokens for authenticated API calls. The traditional approach is `localStorage`, but this is vulnerable to XSS attacks — any injected script can read `localStorage` and exfiltrate tokens. The platform handles sensitive client video content, so the threat model warrants stronger protection.
|
||||
|
||||
## Decision
|
||||
|
||||
Access tokens (15-minute JWT) are stored in React context / Zustand in-memory state only. They are lost on page refresh. Refresh tokens (7-day JWT) are stored in HttpOnly cookies — inaccessible to JavaScript. On page load, the app silently calls `/auth/refresh` to obtain a new access token using the cookie. This exchange is transparent to the user.
|
||||
|
||||
## Consequences
|
||||
|
||||
**Benefits:**
- XSS cannot read the access token from browser storage: it is never written to `localStorage` or any other DOM-accessible store
- Refresh tokens are protected by the browser's HttpOnly cookie isolation
- Complies with OWASP token storage guidance

**Trade-offs:**
- Page refresh requires a round-trip to `/auth/refresh` before the first authenticated API call
- `Authorization` header must be set on every request (Axios interceptor handles this)
- If the refresh endpoint is unavailable, the user must log in again
|
||||
|
||||
**Implementation:** `frontend/src/lib/auth.ts` — in-memory store. Axios interceptor in `frontend/src/lib/api.ts` attaches the token on every request and calls refresh on 401.
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** Token TTL changes, new auth provider added.
|
||||
|
||||
<!-- END SCOPE: adr-002 -->
|
||||
113
docs/reference/guides/testing-strategy.md
Normal file
|
|
@ -0,0 +1,113 @@
|
|||
# Testing Strategy Guide
|
||||
|
||||
<!-- SCOPE: testing-strategy | owner: ln-140 | generated: 2026-04-29 -->
|
||||
|
||||
## Philosophy
|
||||
|
||||
Risk-Based Testing: **Priority = Business Impact (1–5) × Probability of failure (1–5)**.
|
||||
|
||||
| Priority | Decision | Example |
|----------|----------|---------|
| ≥ 15 | MUST test | RBAC logic, job state machine, audit logger |
| 9–14 | SHOULD test | Translation pipeline, TTS routing |
| ≤ 8 | SKIP (manual sufficient) | Email template rendering, UI cosmetics |
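
The scoring rule and its thresholds can be written down as a small helper. This is a sketch for illustration only; `risk_priority` and `decide_testing` are hypothetical names and no such module exists in the repository.

```python
def risk_priority(impact: int, probability: int) -> int:
    """Priority = business impact (1-5) x probability of failure (1-5)."""
    for name, value in (("impact", impact), ("probability", probability)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return impact * probability


def decide_testing(priority: int) -> str:
    """Map a priority score to the decision in the table above."""
    if priority >= 15:
        return "MUST test"
    if priority >= 9:
        return "SHOULD test"
    return "SKIP (manual sufficient)"


# Example: high impact (5) and likely to fail (4) scores 20.
example_decision = decide_testing(risk_priority(5, 4))
```

A module scoring impact 4 and probability 3 lands at 12, which falls in the SHOULD band rather than the MUST band.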
|
||||
|
||||
Write tests for business logic, not for framework behaviour. Never test that FastAPI routes a request — test that YOUR business logic in the handler produces the correct outcome.
|
||||
|
||||
---
|
||||
|
||||
## Current Coverage (as of 2026-04-29 audit)
|
||||
|
||||
| Layer | Files | Files with tests | Risk-weighted coverage |
|-------|-------|------------------|------------------------|
| Backend | 118 | 8 (7%) | ~3% |
| Frontend | 98 | ~12 (12%) | n/a |
| E2E | n/a | 3 spec files | Effectively 0 (most tests skipped) |
|
||||
|
||||
**Critical gap:** All Celery tasks (10 files) and 19 service files have zero test coverage. See full audit at `/tmp/audit/test-audit.md`.
|
||||
|
||||
---
|
||||
|
||||
## Test Pyramid
|
||||
|
||||
| Level | Framework | Location | Current count |
|-------|-----------|----------|---------------|
| Unit (backend) | pytest + AsyncMock | `backend/tests/unit/` | 8 files, ~338 assertions |
| Unit (frontend) | Vitest + RTL | `frontend/src/**/__tests__/` | 9 files, ~218 assertions |
| Integration (backend) | pytest + FastAPI TestClient | `backend/tests/integration/` | Does not exist yet |
| E2E | Playwright | `frontend/tests/e2e/` | 3 files, mostly skipped |
|
||||
|
||||
---
|
||||
|
||||
## Test Commands
|
||||
|
||||
| Command | What it runs |
|---------|--------------|
| `cd backend && poetry run pytest` | All backend unit tests |
| `cd backend && poetry run pytest -v tests/unit/test_security.py` | Single test file |
| `cd frontend && npm run test` | All frontend unit tests (Vitest) |
| `cd frontend && npm run test:e2e` | Playwright E2E tests |
| `cd frontend && npm run test:coverage` | Unit tests with coverage report |
| `docker compose exec api python -m pytest` | Tests inside Docker (for integration tests) |
|
||||
|
||||
---
|
||||
|
||||
## What Exists and Is High-Value
|
||||
|
||||
| Test file | Value | What it tests |
|-----------|-------|---------------|
| `backend/tests/unit/test_security.py` | HIGH | JWT encode/decode, expiry, type fields, password hashing |
| `backend/tests/unit/test_vtt.py` | HIGH | VTT parsing (26 tests) |
| `backend/tests/unit/test_vtt_retimer.py` | HIGH | VTT timing logic (27 tests) |
| `frontend/src/lib/__tests__/auth.test.ts` | HIGH | JWT in-memory store, refresh flow |
| `frontend/src/components/Auth/__tests__/RequireAuth.test.tsx` | HIGH | Auth guard redirect |
|
||||
|
||||
---
|
||||
|
||||
## Priority Gaps to Fill
|
||||
|
||||
The following are MUST-fill based on Priority ≥15:
|
||||
|
||||
| Priority | Module | Gap |
|----------|--------|-----|
| 25 | `tasks/ingest_and_ai.py` | Job state machine, zero tests |
| 20 | `core/authz.py` | RBAC permission checks, zero tests |
| 20 | `services/audit_logger.py` | Audit trail correctness, zero tests |
| 20 | `services/glossary_service.py` | Hybrid retrieval, zero tests |
| 16 | `services/language_qc.py` | QC state transitions, zero tests |
| 16 | `tasks/translate_and_synthesize.py` | Translation pipeline, zero tests |
|
||||
|
||||
Full test plan at `/tmp/audit/test-plan.md`.
|
||||
|
||||
---
|
||||
|
||||
## Anti-Patterns to Avoid
|
||||
|
||||
| Anti-pattern | Why | Fix |
|--------------|-----|-----|
| Hardcoded job IDs like `test-job-123` | Non-existent in test DB | Use factories to create real test data |
| `with patch(...) as mock:` in every test method | Setup duplication | Move to `@pytest.fixture(autouse=True)` |
| `MagicMock()` on async functions | Silently returns a mock, not a coroutine | Use `AsyncMock()` |
| Testing that a library function was called | Tests library, not our logic | Test the business outcome |
| E2E tests that are `.skip` | They provide no coverage | Implement auth fixture and un-skip |
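
The `MagicMock` versus `AsyncMock` row is easiest to see in a runnable sketch. `fetch_caption` and the `generate` method are hypothetical stand-ins for an async service call; only `unittest.mock` from the standard library is used.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def fetch_caption(client) -> str:
    # Code under test: awaits an async method on the injected client.
    return await client.generate("video-1")


# Correct: the async method is an AsyncMock, so awaiting it yields the value.
good_client = MagicMock()
good_client.generate = AsyncMock(return_value="caption text")
result = asyncio.run(fetch_caption(good_client))

# Anti-pattern: a plain MagicMock method returns another MagicMock,
# which is not awaitable.
bad_client = MagicMock()
try:
    asyncio.run(fetch_caption(bad_client))
    await_failed = False
except TypeError:
    await_failed = True
```

In this sketch the mistake surfaces as a `TypeError` at the `await`. It stays silent when the code under test never awaits the result, or when a test only asserts that the mock was called.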
|
||||
|
||||
---
|
||||
|
||||
## Infrastructure Required Before Writing Integration/E2E Tests
|
||||
|
||||
| Blocker | What's needed |
|---------|---------------|
| Backend `conftest.py` | Shared `MockSettings`, `mock_db`, `test_user_factory`, `test_job_factory` |
| Celery test mode | `task_always_eager=True` fixture for synchronous task execution |
| Playwright auth fixture | Wire `tests/helpers/auth.ts` into `beforeEach` in all spec files |
| Playwright seed fixture | `tests/fixtures/seed.ts` to create test jobs, glossary, linguists |
| Mock AI responses | `tests/mocks/gemini-responses/*.json` fixtures |
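
As a starting point for the user and job factories, the sketch below generates unique IDs instead of hardcoded ones. Everything in it is an assumption: `make_user`, `make_job`, and all field names and status values are illustrative and do not reflect the real document schema. In `conftest.py` these functions would typically be wrapped in pytest fixtures.

```python
import uuid


def make_user(role: str = "reviewer") -> dict:
    """Build a user document with a unique ID (field names are assumed)."""
    user_id = uuid.uuid4().hex
    return {
        "id": user_id,
        "email": f"{role}-{user_id[:8]}@example.com",
        "role": role,
    }


def make_job(owner: dict, status: str = "pending") -> dict:
    """Build a job document owned by the given user (schema is assumed)."""
    return {
        "id": uuid.uuid4().hex,
        "owner_id": owner["id"],
        "status": status,
    }


reviewer = make_user()
job_a = make_job(reviewer)
job_b = make_job(reviewer)
```

Because each call produces a fresh ID, tests cannot collide on shared records such as `test-job-123`.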
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** New test file added, coverage target changes, new testing tool added.
|
||||
**Verification:** All commands in the Commands table execute without error. Priority gap table matches the current test-audit report.
|
||||
|
||||
<!-- END SCOPE: testing-strategy -->
|
||||
78
docs/tasks/README.md
Normal file
|
|
@ -0,0 +1,78 @@
|
|||
# Task Management — Accessible Video Processing Platform
|
||||
|
||||
<!-- SCOPE: tasks | owner: ln-130 | generated: 2026-04-29 -->
|
||||
|
||||
## Task Tracking
|
||||
|
||||
Tasks are tracked in conversation context and in the plan file at `~/.claude/plans/`. No external task tracker (Linear, Jira) is configured for this project.
|
||||
|
||||
---
|
||||
|
||||
## Task Conventions
|
||||
|
||||
| Convention | Rule |
|------------|------|
| Status | `pending` → `in_progress` → `completed` |
| Naming | Imperative verb phrase: "Fix login rate-limit bypass" |
| Owner | Assigned agent or person |
| Blocking | Security/data-loss tasks block all others |
|
||||
|
||||
---
|
||||
|
||||
## Active Work (as of 2026-04-29)
|
||||
|
||||
### Immediate Priority (Security Blockers)
|
||||
|
||||
| # | Task | File | Effort |
|---|------|------|--------|
| S-01 | Remove login endpoint from rate-limit bypass | `rate_limiting.py:165` | S |
| S-02 | Add refresh token type check in `get_current_user` | `dependencies.py:23` | S |
| S-03 | Generic exception message in refresh endpoint | `routes_auth.py:319` | S |
| S-04 | Replace `requests` with `httpx.AsyncClient` in Microsoft SSO | `microsoft_auth.py:59,91` | M |
| S-04b | Remove default admin password fallback | `seed.py:37` | S |
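
For S-02, the intent is that a refresh token must never be accepted where an access token is expected. A minimal sketch of such a check follows, assuming the decoded JWT payload carries a `type` claim with the values `"access"` and `"refresh"`; the claim name, the values, and the names `InvalidTokenType` and `require_access_token` are assumptions rather than the project's actual code.

```python
class InvalidTokenType(Exception):
    """Raised when a token of the wrong type is presented."""


def require_access_token(payload: dict) -> dict:
    """Reject any decoded token payload that is not an access token."""
    # Assumed claim name and value; a missing claim is rejected as well.
    if payload.get("type") != "access":
        raise InvalidTokenType("access token required")
    return payload


access_payload = require_access_token({"sub": "user-1", "type": "access"})
```

A dependency such as `get_current_user` could call this right after signature verification, so a stolen refresh token cannot be replayed as a bearer token.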
|
||||
|
||||
### Quality / Tech Debt
|
||||
|
||||
| # | Task | File | Effort |
|---|------|------|--------|
| Q-01 | Extract `broadcast_status_update()` to `tasks/utils.py` | `ingest_and_ai.py`, `translate_and_synthesize.py` | S |
| Q-02 | Fix `cache_key` scope bug in `authz.py:71` | `authz.py` | S |
| Q-03 | Replace all `print()` with `logger.debug()` in auth routes | `routes_auth.py` | S |
| Q-04 | Replace `asyncio.get_event_loop()` with `asyncio.get_running_loop()` in `gcs.py` | `services/gcs.py` | S |
| Q-05 | Fix MongoDB connection-per-login in auth routes | `routes_auth.py:44` | M |
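
Q-04 is a small mechanical change. The sketch below shows the target pattern, assuming `gcs.py` offloads blocking SDK calls to an executor; that usage is an assumption, and `blocking_upload` and `upload_async` are hypothetical stand-ins rather than the real functions.

```python
import asyncio


def blocking_upload(name: str) -> str:
    # Stand-in for a synchronous storage SDK call.
    return f"uploaded:{name}"


async def upload_async(name: str) -> str:
    # get_running_loop() always returns the loop driving this coroutine and
    # raises RuntimeError if called outside one, so misuse fails loudly.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_upload, name)


upload_result = asyncio.run(upload_async("captions.vtt"))
```

This matters in Celery tasks that create their own loop: `get_running_loop()` cannot accidentally pick up or create a different loop than the one the task is running on.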
|
||||
|
||||
### Test Coverage (Priority ≥15)
|
||||
|
||||
| # | Task | Target | Effort |
|---|------|--------|--------|
| T-01 | Create `backend/tests/conftest.py` with shared fixtures | All backend tests | M |
| T-02 | Write RBAC unit tests for `authz.py` | `core/authz.py` | M |
| T-03 | Write job state machine unit + integration tests | `tasks/ingest_and_ai.py` | L |
| T-04 | Write audit logger unit tests | `services/audit_logger.py` | M |
| T-05 | Write glossary hybrid retrieval unit tests | `services/glossary_service.py` | M |
| T-06 | Implement Playwright auth fixture, un-skip E2E tests | `tests/helpers/auth.ts` | L |
|
||||
|
||||
---
|
||||
|
||||
## Backlog (Deferred)
|
||||
|
||||
| # | Task | Priority | Notes |
|---|------|----------|-------|
| B-01 | Add `pip-audit` + `npm audit` to CI | LOW | CI exists, no security scan step |
| B-02 | Fix 53 B904 exception chain warnings (ruff) | LOW | `raise X from err` pattern |
| B-03 | Fix 33 ESLint errors (mostly `no-explicit-any`) | LOW | No security impact |
| B-04 | Fix B023 loop closure bug in translate_and_synthesize | MEDIUM | Safe in practice but violates best practices |
| B-05 | Add nonce validation in Microsoft SSO | INFO | Replay protection |
| B-06 | Validate `X-Forwarded-For` against trusted proxy list | MEDIUM | Rate limit bypass risk |
| B-07 | Enable mypy in CI (run in Docker) | MEDIUM | Currently not in CI pipeline |
| B-08 | VTT version control E2E tests | MEDIUM | Playwright spec needed |
| B-09 | WebSocket reconnect unit tests | MEDIUM | `useJobStatusWebSocket.ts` stale closure |
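
B-04 refers to ruff's B023 rule, which flags closures that capture a loop variable. The generic sketch below shows why the pattern is risky and one common fix; it is not the code in `translate_and_synthesize`, and both function names are hypothetical.

```python
def build_callbacks_buggy(languages: list[str]) -> list:
    callbacks = []
    for lang in languages:
        # The lambda looks up `lang` when called, not when defined.
        callbacks.append(lambda: f"translate:{lang}")
    return callbacks


def build_callbacks_fixed(languages: list[str]) -> list:
    callbacks = []
    for lang in languages:
        # Binding `lang` as a default argument captures the current value.
        callbacks.append(lambda lang=lang: f"translate:{lang}")
    return callbacks


buggy = [cb() for cb in build_callbacks_buggy(["fr", "de"])]
fixed = [cb() for cb in build_callbacks_fixed(["fr", "de"])]
```

The buggy version only misbehaves when callbacks run after the loop has advanced, which is consistent with the "safe in practice" note if each closure is invoked within its own iteration.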
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** Task completed, new task identified, priority changed.
|
||||
**Verification:** Security blockers (S-01 through S-04b) are resolved before next production deploy.
|
||||
|
||||
<!-- END SCOPE: tasks -->
|
||||
1476
docs/video_accessibility_user_guide_v3.md
Normal file
File diff suppressed because it is too large
122
tests/README.md
Normal file
|
|
@ -0,0 +1,122 @@
|
|||
# Tests — Accessible Video Processing Platform
|
||||
|
||||
<!-- SCOPE: tests | owner: ln-140 | generated: 2026-04-29 -->
|
||||
|
||||
## Test Commands
|
||||
|
||||
### Backend
|
||||
|
||||
| Command | Description |
|---------|-------------|
| `cd backend && poetry run pytest` | Run all unit tests |
| `cd backend && poetry run pytest -v` | Verbose output |
| `cd backend && poetry run pytest tests/unit/test_security.py` | Single file |
| `cd backend && poetry run pytest -k "test_jwt"` | Keyword filter |
| `docker compose exec api python -m pytest` | Tests inside Docker container |
| `docker compose exec api python -m pytest --cov=app` | With coverage report |
|
||||
|
||||
### Frontend
|
||||
|
||||
| Command | Description |
|---------|-------------|
| `cd frontend && npm run test` | Vitest unit tests (watch mode) |
| `cd frontend && npm run test:run` | Vitest single run |
| `cd frontend && npm run test:coverage` | Coverage report |
| `cd frontend && npm run test:e2e` | Playwright E2E tests |
| `cd frontend && npx playwright test --ui` | Playwright UI mode |
|
||||
|
||||
### Lint and Type Check (must pass before commit)
|
||||
|
||||
| Command | Description |
|---------|-------------|
| `cd backend && ruff check .` | Python linting |
| `cd backend && poetry run mypy app/` | Python type checking |
| `cd frontend && npm run lint` | ESLint |
| `cd frontend && npm run type-check` | TypeScript compile check |
|
||||
|
||||
---
|
||||
|
||||
## Test Structure
|
||||
|
||||
### Backend (`backend/tests/`)
|
||||
|
||||
| Directory | Purpose | Framework |
|-----------|---------|-----------|
| `tests/unit/` | Business logic unit tests | pytest |
| `tests/fixtures/` | VTT and JSON test fixtures | n/a |
| `tests/integration/` | FastAPI TestClient route tests | pytest (does not exist yet) |
| `conftest.py` | Shared fixtures | pytest (does not exist yet) |
|
||||
|
||||
### Frontend (`frontend/src/` and `frontend/tests/`)
|
||||
|
||||
| Directory | Purpose | Framework |
|-----------|---------|-----------|
| `src/**/__tests__/` | Component unit tests | Vitest + RTL |
| `src/hooks/__tests__/` | Hook tests | Vitest + RTL |
| `src/lib/__tests__/` | Utility tests | Vitest |
| `src/test/utils.tsx` | Shared test utilities | RTL |
| `tests/e2e/` | End-to-end specs | Playwright |
| `tests/helpers/auth.ts` | Auth fixture (exists, not yet wired) | Playwright |
|
||||
|
||||
---
|
||||
|
||||
## Coverage Targets
|
||||
|
||||
| Layer | Target | Current |
|-------|--------|---------|
| Backend unit | 80% line coverage on services/ | ~3% (critically low) |
| Frontend unit | 70% branch coverage on hooks/ | ~12% |
| E2E | All happy paths for QC workflow | 0% (tests skipped) |
|
||||
|
||||
---
|
||||
|
||||
## Writing New Tests
|
||||
|
||||
### Backend Unit Test Pattern
|
||||
|
||||
```python
# Use AsyncMock for async service methods
from unittest.mock import AsyncMock, patch

import pytest


@pytest.mark.asyncio  # assumes pytest-asyncio; not needed with asyncio_mode = auto
async def test_something():
    with patch("app.services.gemini.settings") as mock_settings:
        mock_settings.gemini_api_key = "test-key"
        # test body
```
|
||||
|
||||
Prefer **moving the settings mock to a shared fixture in `conftest.py`** over patching it in every test. See the anti-patterns in [testing-strategy.md](../docs/reference/guides/testing-strategy.md).
|
||||
|
||||
### Frontend Hook Test Pattern
|
||||
|
||||
Use `renderHook` from `@testing-library/react` wrapped with `test/utils.tsx` providers.
|
||||
|
||||
### E2E Auth Pattern
|
||||
|
||||
Use `tests/helpers/auth.ts` in `beforeEach`:
|
||||
|
||||
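```typescript
import { test } from '@playwright/test';
// Assumed relative path from tests/e2e/ to tests/helpers/auth.ts
import { loginAs } from '../helpers/auth';

// Wire auth fixture before un-skipping any test
test.beforeEach(async ({ page }) => {
  await loginAs(page, 'reviewer');
});
```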
|
||||
|
||||
---
|
||||
|
||||
## Known Issues
|
||||
|
||||
| Issue | Impact | Fix needed |
|-------|--------|------------|
| `conftest.py` missing | All tests define fixtures inline | Create shared conftest with MockSettings, mock_db |
| E2E tests mostly skipped | Zero E2E coverage | Implement auth + seed fixtures |
| `MagicMock` used for async services | May silently pass on sync mocks | Replace with `AsyncMock` |
| Hardcoded `test-job-123` in E2E | Tests would fail if un-skipped | Use seed fixtures |
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
**Update triggers:** New test file added, test framework version changed, new command available.
|
||||
**Verification:** All commands in the Commands section execute without error on a clean checkout.
|
||||
|
||||
<!-- END SCOPE: tests -->
|
||||