Compare commits

No commits in common. "main" and "dev" have entirely different histories.
main ... dev

384 changed files with 5716 additions and 31139 deletions

@@ -1,25 +0,0 @@
# Source Documentation Archive — 2026-04-29
## What was archived
Original non-canonical documentation files, backed up before the canonical structure was created.
## Files archived
| File | Migrated to |
|------|------------|
| `README.md` | Updated in place; canonical docs in `docs/` |
| `DEPLOYMENT.md` | `docs/project/runbook.md` + `docs/project/infrastructure.md` |
| `DEPLOYMENT_OPTIONS.md` | `docs/project/infrastructure.md` |
| `APACHE_DEPLOYMENT.md` | `docs/project/runbook.md` (Apache config section) |
## Rollback
To restore original files: copy from `original/` back to project root.
```
cp original/README.md ../../README.md
cp original/DEPLOYMENT.md ../../DEPLOYMENT.md
cp original/DEPLOYMENT_OPTIONS.md ../../DEPLOYMENT_OPTIONS.md
cp original/APACHE_DEPLOYMENT.md ../../APACHE_DEPLOYMENT.md
```

@@ -1,236 +0,0 @@
# Apache Frontend + Docker Backend Deployment Guide
## 🏗 Architecture Overview
**Frontend**: Built React app served by your existing Apache webserver
**Backend**: Docker containers running FastAPI + workers + database
```
Apache Webserver (Frontend)  →  Docker Backend Services
└── Built React App             ├── FastAPI API (:8000)
                                ├── Celery Workers
                                ├── Change Stream Service
                                ├── MongoDB
                                └── Redis
```
## 🚀 Deployment Steps
### 1. **Deploy Backend Services**
```bash
# 1. Create production environment file
cp .env.prod.example .env.prod
# Edit .env.prod with your production values
# 2. Start backend services only
docker-compose -f docker-compose.prod.yml up -d
# 3. Verify services are running
docker-compose -f docker-compose.prod.yml ps
```
**Running Services:**
- `accessible-video-api-prod` - FastAPI API (port 8000)
- `accessible-video-worker-prod` - Celery workers
- `accessible-video-mongo-prod` - MongoDB database
- `accessible-video-redis-prod` - Redis cache/queue
### 2. **Build and Deploy Frontend to Apache**
```bash
# 1. Configure frontend environment
cd frontend
cp .env.example .env.production.local
# Edit .env.production.local:
# VITE_API_URL=https://your-api-domain.com:8000
# VITE_SENTRY_DSN=your-sentry-dsn
# VITE_ENVIRONMENT=production
# 2. Build production frontend
npm run build
# 3. Deploy to Apache document root
sudo cp -r dist/* /var/www/html/your-app/
# OR
sudo rsync -av --delete dist/ /var/www/html/your-app/
```
### 3. **Configure Apache Virtual Host**
Create `/etc/apache2/sites-available/your-app.conf`:
```apache
<VirtualHost *:443>
ServerName your-domain.com
ServerAlias www.your-domain.com
DocumentRoot /var/www/html/your-app
# SSL Configuration
SSLEngine on
SSLCertificateFile /path/to/your/certificate.crt
SSLCertificateKeyFile /path/to/your/private.key
# Security Headers
Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff"
Header always set X-XSS-Protection "1; mode=block"
Header always set Referrer-Policy "strict-origin-when-cross-origin"
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
# Compression
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
</IfModule>
# Caching for static assets
<LocationMatch "\.(css|js|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$">
ExpiresActive On
ExpiresDefault "access plus 1 year"
Header set Cache-Control "public, max-age=31536000, immutable"
</LocationMatch>
# Don't cache HTML files
<LocationMatch "\.html$">
ExpiresActive On
ExpiresDefault "access plus 0 seconds"
Header set Cache-Control "no-cache, no-store, must-revalidate"
</LocationMatch>
# React Router support (handle client-side routing)
<Directory "/var/www/html/your-app">
Options -Indexes
AllowOverride All
Require all granted
# Fallback to index.html for client-side routing
FallbackResource /index.html
</Directory>
# Optional: Proxy API requests (alternative to CORS)
# ProxyPreserveHost On
# ProxyPass /api/ http://your-docker-host:8000/api/
# ProxyPassReverse /api/ http://your-docker-host:8000/api/
# Logs
ErrorLog ${APACHE_LOG_DIR}/your-app_error.log
CustomLog ${APACHE_LOG_DIR}/your-app_access.log combined
</VirtualHost>
# HTTP to HTTPS redirect
<VirtualHost *:80>
ServerName your-domain.com
ServerAlias www.your-domain.com
Redirect permanent / https://your-domain.com/
</VirtualHost>
```
Enable the site:
```bash
sudo a2ensite your-app.conf
sudo systemctl reload apache2
```
## ⚙️ Configuration Files Updated
### `docker-compose.prod.yml`
- ✅ Removed frontend and nginx services
- ✅ Added CORS_ORIGINS environment variable
- ✅ Backend services only (API, workers, database)
### `.env.prod.example`
- ✅ Production environment template
- ✅ CORS configuration for Apache frontend
- ✅ All required variables documented
## 🔧 CORS Configuration
Because the frontend and backend are served from different origins, configure CORS in the backend:
**In `.env.prod`:**
```bash
CORS_ORIGINS=https://your-domain.com,https://www.your-domain.com
```
**Backend automatically handles CORS** based on this environment variable.
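As a rough illustration of how a comma-separated `CORS_ORIGINS` value might be turned into a list of allowed origins, here is a minimal sketch. The helper name `parse_cors_origins` is hypothetical and is not taken from the backend code; the real parsing may differ.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origin string into a clean, de-duplicated list.

    Hypothetical helper: the actual backend may parse CORS_ORIGINS differently.
    """
    origins: list[str] = []
    for part in raw.split(","):
        # Browsers send the Origin header without a trailing slash.
        origin = part.strip().rstrip("/")
        if origin and origin not in origins:
            origins.append(origin)
    return origins
```

A list like this is the kind of value FastAPI's `CORSMiddleware` accepts as `allow_origins`. An unset or empty variable yields an empty list, which allows no cross-origin requests.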
## 📋 Deployment Checklist
### Backend Services
- [ ] Copy `.env.prod.example` to `.env.prod`
- [ ] Update all environment variables in `.env.prod`
- [ ] Run `docker-compose -f docker-compose.prod.yml up -d`
- [ ] Verify API accessible at `http://your-docker-host:8000/docs`
- [ ] Check logs: `docker-compose -f docker-compose.prod.yml logs -f`
### Frontend Deployment
- [ ] Update `frontend/.env.production.local` with API URL
- [ ] Run `npm run build` in frontend directory
- [ ] Copy `dist/*` to Apache document root
- [ ] Configure Apache virtual host
- [ ] Enable site and reload Apache
- [ ] Test frontend loads and connects to API
### Security & Performance
- [ ] SSL certificate configured
- [ ] Security headers enabled
- [ ] Gzip compression enabled
- [ ] Static file caching configured
- [ ] CORS origins properly set
- [ ] Firewall rules: on the Docker host, expose only port 8000 (the API)
## 🔍 Troubleshooting
### Common Issues
**CORS Errors:**
- Verify `CORS_ORIGINS` in `.env.prod` matches your domain
- Check browser dev tools for exact error
**API Connection Failed:**
- Verify `VITE_API_URL` in frontend build
- Check backend API is accessible from frontend server
- Ensure port 8000 is open and reachable
**React Router 404s:**
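- Verify `FallbackResource /index.html` is present in the Apache config (the directive comes from `mod_dir`)
- `AllowOverride All` is only needed if routing rules are kept in a `.htaccess` file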
**File Upload Issues:**
- Check Apache `LimitRequestBody` directive
- Verify backend can write to GCS bucket
### Monitoring Commands
```bash
# Backend services status
docker-compose -f docker-compose.prod.yml ps
# View logs
docker-compose -f docker-compose.prod.yml logs -f api
docker-compose -f docker-compose.prod.yml logs -f worker
# Apache status
sudo systemctl status apache2
sudo tail -f /var/log/apache2/your-app_error.log
```
## 🎯 Benefits of This Setup
**Separation of Concerns** - Frontend and backend independently deployable
**Existing Infrastructure** - Uses your current Apache setup
**Scalability** - Backend can be moved to different hosts easily
**Caching** - Apache handles static file caching efficiently
**SSL Termination** - Apache handles HTTPS for frontend
**Monitoring** - Separate logs and monitoring for each tier
Your backend services will run in Docker containers while the frontend integrates seamlessly with your existing Apache web server infrastructure.

@@ -1,168 +0,0 @@
# Deployment Options for Video Accessibility Platform
## 🏗 Current Docker Setup
Your `docker-compose.yml` serves **both frontend and backend** in **development mode**:
- **Frontend**: Vite dev server on port 5173 (hot reload)
- **Backend**: FastAPI on port 8000 (auto-reload)
- **Database**: MongoDB + Redis
- **Workers**: Celery + Change Stream service
## 🚀 Production Deployment Options
### 1. **All-in-Docker Production** ✅ Recommended
**What it does:**
- Frontend: Built React app served by Nginx (port 80)
- Backend: Production FastAPI (port 8000)
- Single `docker-compose up` deployment
**Usage:**
```bash
# Production deployment
docker-compose -f docker-compose.prod.yml up -d
# Access:
# Frontend: http://localhost:80
# Backend API: http://localhost:8000
```
**Benefits:**
- ✅ Single command deployment
- ✅ Optimized frontend build
- ✅ Production-ready configuration
- ✅ Built-in health checks
- ✅ Nginx caching and compression
### 2. **Single Domain with Nginx Proxy** ✅ Best UX
**What it does:**
- Everything served from one domain (port 80)
- `/api/*` routes to backend
- `/*` routes to frontend
- WebSocket support included
**Usage:**
```bash
# Uses nginx/nginx.conf for routing
docker-compose -f docker-compose.prod.yml up nginx
# Access everything at: http://localhost
```
**Benefits:**
- ✅ No CORS issues
- ✅ Single domain simplicity
- ✅ Better caching control
- ✅ Rate limiting built-in
- ✅ SSL termination ready
### 3. **Cloud-Native (Google Cloud)** 🌟 Enterprise
**Architecture:**
```
Frontend (Cloud Storage + CDN) → API (Cloud Run) → Database (MongoDB Atlas)
Workers (Cloud Run)
```
**Components:**
- **Frontend**: Build + deploy to Cloud Storage, serve via Cloud CDN
- **Backend**: Deploy to Cloud Run (auto-scaling)
- **Workers**: Separate Cloud Run service for Celery
- **Database**: MongoDB Atlas (managed)
- **Files**: Google Cloud Storage (already integrated)
**Benefits:**
- ✅ Auto-scaling
- ✅ Global CDN
- ✅ Managed services
- ✅ Pay-per-use
- ✅ High availability
## 📊 Comparison Matrix
| Option | Complexity | Cost | Scalability | Maintenance |
|--------|------------|------|-------------|-------------|
| **Dev Docker** | Low | Very Low | Limited | Manual |
| **Prod Docker** | Low | Low | Manual | Medium |
| **Nginx Proxy** | Medium | Low | Manual | Medium |
| **Cloud Native** | High | Variable | Automatic | Low |
## 🚀 Quick Migration Guide
### From Development → Production Docker
1. **Update environment variables:**
```bash
cp .env.example .env.prod
# Edit .env.prod with production values
```
2. **Deploy:**
```bash
docker-compose -f docker-compose.prod.yml up -d
```
3. **Verify:**
```bash
# Frontend (optimized build)
curl http://localhost:80
# Backend API
curl http://localhost:8000/health
```
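The two `curl` checks above could also be scripted. The following is a minimal sketch using only the Python standard library; the function name `is_healthy` is an assumption made for illustration, and it treats any 2xx response as healthy.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts, and HTTP errors.
        return False
```

For example, `is_healthy("http://localhost:8000/health")` would report whether the backend answers on the endpoint used in the verification step.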
### From Docker → Cloud Native
1. **Build frontend:**
```bash
cd frontend && npm run build
gsutil -m rsync -r -d dist/ gs://your-bucket/
```
2. **Deploy backend:**
```bash
gcloud run deploy video-api --source=./backend --region=us-central1
```
3. **Deploy workers:**
```bash
gcloud run deploy video-workers --source=./backend --region=us-central1
```
## 🔧 Configuration Files Created
### `docker-compose.prod.yml`
- Production-ready Docker setup
- Nginx serving frontend
- Optimized environment variables
- Health checks included
### `nginx/nginx.conf`
- Single-domain routing configuration
- API proxy with rate limiting
- WebSocket support
- Static file caching
- Security headers
## 🎯 Recommendations by Use Case
### **Small Team / MVP**
→ Use **Production Docker** (`docker-compose.prod.yml`)
### **Growing Business**
→ Use **Nginx Proxy** setup for better performance
### **Enterprise / Scale**
→ Go **Cloud Native** with Google Cloud Run + CDN
## 🔍 Current Status
**Development**: Already working with `docker-compose up`
**Production Docker**: Ready with `docker-compose.prod.yml`
**Nginx Proxy**: Configured and ready to deploy
⚠️ **Cloud Native**: Requires GCP setup and configuration
Your current Docker setup is **development-optimized**. For production, use the new `docker-compose.prod.yml` which properly builds and serves the React app through Nginx while keeping the backend API separate but coordinated.

@@ -1,384 +0,0 @@
# Accessible Video Processing Platform
A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. Features a complete workflow from video upload to final delivery with quality control processes.
## ✅ Current Status: **Production-Ready** (85% Complete)
**Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend)
## 🚀 Key Features Implemented
### Core Functionality ✅
- **AI-Powered Processing**: Complete Gemini 2.5 Pro integration for intelligent caption and audio description generation
- **Multi-Language Pipeline**: Google Translate + cultural transcreation with 50+ language support
- **Quality Control Workflow**: Full reviewer approval/rejection system with VTT editing capabilities
- **Audio Description TTS**: Google Cloud TTS and ElevenLabs integration with audio synthesis
- **Real-time Updates**: WebSocket-powered job status tracking and notifications
- **Advanced Video Player**: Multi-language caption support with timeline navigation
- **Role-Based Access Control**: Complete CLIENT/REVIEWER/ADMIN role system
### Security & Infrastructure ✅
- **JWT Authentication**: Secure access/refresh token system with HttpOnly cookies
- **Audit Logging**: Comprehensive audit trail for all reviewer actions
- **Signed URLs**: Secure Google Cloud Storage file access (24h expiry)
- **Input Validation**: Complete request validation and sanitization
- **HTTPS/CORS**: Production-ready security configuration
### User Experience ✅
- **Responsive Design**: Mobile-first Tailwind CSS implementation
- **Real-time Feedback**: Live job progress tracking and notifications
- **Advanced File Management**: Drag-and-drop uploads with progress indicators
- **VTT Editor**: Inline caption editing with live preview
- **Download Portal**: Secure asset delivery with organized file structure
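To give a sense of what validating edited captions in the VTT editor can involve, here is a minimal sketch that checks WebVTT cue timing lines. It is not the platform's validator; the function name `find_timing_errors` is hypothetical, and only the header and the `start --> end` timings are checked.

```python
import re

# WebVTT cue timing: "HH:MM:SS.mmm --> HH:MM:SS.mmm" (the hours part is optional).
_TIMESTAMP = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
_CUE_TIMING = re.compile(rf"^{_TIMESTAMP}\s+-->\s+{_TIMESTAMP}")


def _to_ms(hours, minutes, seconds, millis) -> int:
    """Convert captured timestamp parts to milliseconds."""
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def find_timing_errors(vtt_text: str) -> list[str]:
    """Return human-readable problems found in the cue timings of a VTT file."""
    lines = vtt_text.splitlines()
    errors: list[str] = []
    if not lines or not lines[0].lstrip("\ufeff").startswith("WEBVTT"):
        errors.append("line 1: file must start with WEBVTT")
    for number, line in enumerate(lines, start=1):
        # "-->" may only appear on timing lines, never in cue text or identifiers.
        if "-->" not in line:
            continue
        match = _CUE_TIMING.match(line.strip())
        if match is None:
            errors.append(f"line {number}: malformed cue timing")
            continue
        groups = match.groups()
        if _to_ms(*groups[:4]) >= _to_ms(*groups[4:]):
            errors.append(f"line {number}: cue must end after it starts")
    return errors
```

An empty result means every timing line parsed and each cue ends after it starts. Cue settings after the end timestamp are ignored by this sketch.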
## 🛠 Tech Stack
### Backend (FastAPI + Python 3.11)
- **FastAPI 0.115.0** - Modern async web framework with OpenAPI documentation
- **Celery 5.3.4** - Distributed task queue with Redis broker
- **MongoDB 7.0** - Document database with replica set support
- **Redis 7.2** - Caching and message queuing
- **Google Cloud Platform** - Storage, AI services, Secret Manager, TTS
- **Pydantic 2.5** - Data validation and serialization
- **OpenTelemetry** - Observability and monitoring
- **Sentry** - Error tracking and performance monitoring
### Frontend (React 19 + TypeScript)
- **React 19.1.1** - Modern UI framework with latest features
- **Vite 7.1.2** - Lightning-fast build tool and dev server
- **TypeScript 5.8** - Full type safety throughout application
- **TanStack Query 5.85** - Advanced server state management with caching
- **React Router 7.8** - Client-side routing with protected routes
- **Tailwind CSS 4.1** - Utility-first CSS framework
- **Zustand 5.0** - Lightweight client state management
- **React Hook Form + Zod** - Form handling with schema validation
## 🏗 Architecture Overview
### Complete Job Processing Pipeline ✅
```
Upload  → Ingestion  → AI Processing  → QC Review → Translation → TTS      → Final Review → Delivery
↓         ↓            ↓                ↓           ↓             ↓          ↓              ↓
GCS       Gemini 2.5   VTT Generation   Human       Google        Text-to-   Reviewer       Email +
Storage   Pro          + Validation     Review      Translate     Speech     Approval       Downloads
```
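The pipeline above can be modelled as a small state machine. The sketch below is illustrative only: the `Stage` names mirror the diagram, but the enum values and the rule that a rejected review returns to the preceding stage are assumptions, not behaviour documented here.

```python
from enum import Enum


class Stage(Enum):
    UPLOAD = "upload"
    INGESTION = "ingestion"
    AI_PROCESSING = "ai_processing"
    QC_REVIEW = "qc_review"
    TRANSLATION = "translation"
    TTS = "tts"
    FINAL_REVIEW = "final_review"
    DELIVERY = "delivery"


# Forward order follows the diagram. Where a rejection goes is an assumption.
_ORDER = list(Stage)
_REJECT_TO = {Stage.QC_REVIEW: Stage.AI_PROCESSING, Stage.FINAL_REVIEW: Stage.TTS}


def next_stage(current: Stage, approved: bool = True) -> Stage:
    """Return the stage that follows `current`, or the fallback after a rejection."""
    if not approved:
        if current not in _REJECT_TO:
            raise ValueError(f"{current.name} is not a review stage")
        return _REJECT_TO[current]
    index = _ORDER.index(current)
    if index == len(_ORDER) - 1:
        raise ValueError("DELIVERY is the final stage")
    return _ORDER[index + 1]
```

Keeping the transitions in one table makes it easy to reject illegal jumps, for example from upload straight to delivery.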
### System Architecture
- **Monorepo Structure**: `/backend`, `/frontend`, `/infra` with clear separation
- **Microservices Ready**: Modular FastAPI services with proper dependency injection
- **Event-Driven**: WebSocket real-time updates with connection management
- **Scalable Workers**: Celery task queue with auto-retry and error recovery
- **Secure by Design**: RBAC, signed URLs, audit logging, input validation
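The RBAC mentioned in the list above could be expressed as a deny-by-default permission table. This is a minimal sketch: the `Role` values, the action names, and the `is_allowed` helper are all hypothetical and are not taken from the backend.

```python
from enum import Enum


class Role(Enum):
    CLIENT = "client"
    REVIEWER = "reviewer"
    ADMIN = "admin"


# Hypothetical permission table; the action names are illustrative only.
PERMISSIONS = {
    "create_job": {Role.CLIENT, Role.ADMIN},
    "approve_job": {Role.REVIEWER, Role.ADMIN},
    "manage_users": {Role.ADMIN},
}


def is_allowed(role: Role, action: str) -> bool:
    """Deny by default: unknown actions are not permitted for anyone."""
    return role in PERMISSIONS.get(action, set())
```

With a table like this, adding a role or an action is a data change rather than a change to the checking logic.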
## 🚀 Getting Started
### Prerequisites
- **Python 3.11+** (backend development)
- **Node.js 18+** (frontend development)
- **Docker & Docker Compose** (required for local development)
- **Google Cloud Project** with APIs enabled (for video processing)
### 🐳 Local Development with Docker (Recommended)
This is the recommended approach for local development. Backend services run in Docker containers while the frontend runs via Vite dev server for fast hot-reload.
#### Initial Setup
```bash
# 1. Clone the repository
git clone <repository>
cd video_accessibility
# 2. Copy and configure environment files
cp .env.prod.example .env.local
# Edit .env.local with your API keys and settings
# 3. Set up frontend environment
cp frontend/.env.example frontend/.env.local
# The defaults should work for local development
# 4. Ensure GCP credentials are in place
# Copy your GCP service account JSON to: ./secrets/gcp-credentials.json
```
#### Starting the Development Environment
**Step 1: Start Backend Services (Docker)**
```bash
# Start API, Worker, MongoDB, and Redis in Docker
./scripts/run-local.sh
# Services will be available at:
# - API: http://localhost:8003
# - API Docs: http://localhost:8003/docs
# - MongoDB: mongodb://localhost:27017
# - Redis: redis://localhost:6379
```
**Step 2: Start Frontend (Vite Dev Server)**
```bash
# In a separate terminal
cd frontend
npm install # First time only
npm run dev
# Frontend will be available at:
# - Application: http://localhost:6001/video-accessibility
```
#### Useful Commands
```bash
# View logs
docker compose logs -f api # API logs
docker compose logs -f worker # Worker logs
docker compose logs -f # All logs
# Restart a service
docker compose restart api
docker compose restart worker
# Rebuild and restart (after code changes)
./scripts/run-local.sh --rebuild
# Stop all services
./scripts/run-local.sh --stop
# or
docker compose down
```
#### Test User Credentials (Local Development Only)
For testing different user roles locally:
```
Admin: admin@example.com / admin
Production: production@example.com / production
Reviewer: reviewer@example.com / reviewer
Client: client@example.com / client123
```
**Note**: These test users are only for local development. Production uses Microsoft authentication.
### Alternative: Native Development (Without Docker)
For development without Docker, you'll need to run each service manually:
```bash
# Terminal 1: MongoDB
mongod --dbpath ./data/db
# Terminal 2: Redis
redis-server
# Terminal 3: Backend API
cd backend
poetry install
poetry run uvicorn app.main:app --reload --port 8000
# Terminal 4: Celery Worker
cd backend
poetry run celery -A app.tasks worker --loglevel=info
# Terminal 5: Frontend
cd frontend
npm install
npm run dev
```
**Note**: The Docker approach is strongly recommended as it ensures consistency and simplifies setup.
### Testing & Quality
```bash
# Backend tests + linting
cd backend
poetry run pytest
poetry run ruff check .
poetry run mypy .
# Frontend tests + linting
cd frontend
npm run test
npm run test:e2e
npm run lint
npm run type-check
```
## 📁 Project Structure
```
video_accessibility/ # Root monorepo
├── backend/ # FastAPI Python backend (12,198 LOC)
│ ├── app/
│ │ ├── api/v1/ # REST API endpoints
│ │ │ ├── auth.py # JWT authentication
│ │ │ ├── jobs.py # Job CRUD & workflow
│ │ │ ├── admin.py # Admin operations
│ │ │ └── files.py # File management
│ │ ├── core/ # Core configuration
│ │ ├── models/ # Database models
│ │ ├── schemas/ # Pydantic request/response schemas
│ │ ├── services/ # External service integrations
│ │ │ ├── gemini.py # AI processing
│ │ │ ├── gcs.py # Google Cloud Storage
│ │ │ ├── translation.py # Multi-language support
│ │ │ └── tts.py # Text-to-speech
│ │ ├── tasks/ # Celery background workers
│ │ ├── middleware/ # Request processing
│ │ └── telemetry/ # Observability
│ ├── tests/ # Comprehensive test suite
│ └── Dockerfile # Container configuration
├── frontend/ # React TypeScript SPA (8,273 LOC)
│ ├── src/
│ │ ├── routes/ # Page components
│ │ │ ├── auth/ # Login system
│ │ │ ├── jobs/ # Job management
│ │ │ ├── qc/ # Quality control
│ │ │ └── admin/ # Admin interface
│ │ ├── components/ # Reusable UI components
│ │ │ ├── VideoWithCaptions.tsx # Advanced video player
│ │ │ ├── VttEditor.tsx # Caption editing
│ │ │ └── UploadDropzone.tsx # File upload
│ │ ├── lib/ # Utilities and API client
│ │ ├── hooks/ # Custom React hooks
│ │ └── types/ # TypeScript definitions
│ ├── tests/ # Unit + E2E tests
│ ├── .env.local # Local development config
│ └── Dockerfile # Container configuration
├── scripts/
│ ├── run-local.sh # Local development startup
│ ├── deploy.sh # Production deployment
│ ├── full-deploy.sh # Full production rebuild
│ └── build-frontend.sh # Frontend build script
├── docker-compose.yml # Base Docker configuration
├── docker-compose.local.yml # Local development overrides
├── docker-compose.prod.yml # Production overrides
├── .env.local # Local environment variables
├── .env.production # Production environment variables
├── CLAUDE.md # Development guidelines
└── video_accessibility_development_plan.txt # Complete specification
```
## ⚙️ Configuration
### Environment Variables
**Backend** (`backend/.env`):
```bash
# Database
MONGODB_URL=mongodb://admin:password@localhost:27017/accessible_video
REDIS_URL=redis://localhost:6379/0
# Authentication
JWT_SECRET_KEY=your-jwt-secret
JWT_REFRESH_SECRET_KEY=your-refresh-secret
# AI Services
GEMINI_API_KEY=your-gemini-key
ELEVENLABS_API_KEY=your-elevenlabs-key
# Google Cloud
GCS_BUCKET_NAME=your-bucket-name
GOOGLE_CLOUD_PROJECT=your-project-id
# Email
SENDGRID_API_KEY=your-sendgrid-key
# Monitoring
SENTRY_DSN=your-sentry-dsn
```
**Frontend** (`frontend/.env`):
```bash
VITE_API_URL=http://localhost:8000
VITE_SENTRY_DSN=your-sentry-dsn
VITE_ENVIRONMENT=development
```
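A start-up check can catch a missing variable before any service tries to use it. The sketch below assumes a subset of the backend variables listed above is mandatory; which ones are actually required is an assumption, and `missing_settings` is a hypothetical helper.

```python
import os

# Names copied from the backend example above. Treating exactly these as
# required is an assumption made for this sketch.
REQUIRED = (
    "MONGODB_URL",
    "REDIS_URL",
    "JWT_SECRET_KEY",
    "JWT_REFRESH_SECRET_KEY",
    "GCS_BUCKET_NAME",
    "GOOGLE_CLOUD_PROJECT",
)


def missing_settings(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name, "").strip()]
```

Calling `missing_settings()` with no argument inspects the process environment; passing a dictionary makes the check easy to test.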
### Google Cloud Setup
1. **Create GCP Project** with billing enabled
2. **Enable APIs**:
- Cloud Storage API
- Cloud Translation API
- Cloud Text-to-Speech API
- Vertex AI API (for Gemini)
- Secret Manager API
3. **Create Service Account** with roles:
- Storage Admin
- AI Platform Admin
- Secret Manager Admin
4. **Download JSON key** and set `GOOGLE_APPLICATION_CREDENTIALS`
## 🚢 Deployment Options
### Production Architecture (Google Cloud)
- **Frontend**: Cloud Storage + Cloud CDN (static hosting)
- **Backend API**: Cloud Run (serverless, auto-scaling)
- **Workers**: Cloud Run (Celery with Redis)
- **Database**: MongoDB Atlas (managed)
- **Queue**: Cloud Memorystore (Redis)
- **Storage**: Google Cloud Storage
- **Monitoring**: Cloud Monitoring + Sentry
### Docker Production
```bash
# Start the production stack in detached mode (append --build to rebuild images first)
docker-compose -f docker-compose.prod.yml up -d
```
## 🔒 Security Features
### Implemented Security ✅
- **JWT Authentication**: Access (15min) + refresh (7 days) token rotation
- **RBAC System**: CLIENT/REVIEWER/ADMIN roles with endpoint protection
- **Secure Storage**: HttpOnly cookies for refresh tokens
- **File Security**: Signed URLs with 24h expiry, no client access to raw files
- **Input Validation**: Comprehensive Pydantic validation on all endpoints
- **Audit Logging**: Complete trail of all reviewer actions and system events
- **CORS Protection**: Configured for production domains
- **Rate Limiting**: Request throttling and validation middleware
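To make the access/refresh lifetimes listed above concrete, here is a minimal HS256 token sketch built on the standard library. It is not the platform's implementation: the function names are hypothetical, only the `sub`, `iat`, and `exp` claims are handled, and production code would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time

ACCESS_TTL = 15 * 60             # 15 minutes, as listed above
REFRESH_TTL = 7 * 24 * 60 * 60   # 7 days, as listed above


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return _b64url(digest)


def issue_token(subject: str, secret: str, ttl_seconds: int, now=None) -> str:
    """Create a signed token that expires `ttl_seconds` after `now`."""
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": subject, "iat": now, "exp": now + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}', secret)}"


def verify_token(token: str, secret: str, now=None) -> dict:
    """Return the claims, or raise ValueError if the token is invalid or expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header, payload, signature = parts
    # The signature is always recomputed with HS256, whatever the header claims.
    expected = _sign(f"{header}.{payload}", secret)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    now = int(time.time()) if now is None else now
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

A short-lived access token limits the damage of a leaked token, while the longer-lived refresh token (kept in an HttpOnly cookie, per the list above) is used only to obtain new access tokens.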
## 🔧 API Documentation
### Key Endpoints Implemented
```
POST /api/v1/auth/login # Authentication
POST /api/v1/jobs # Create job with file upload
GET /api/v1/jobs # List jobs (filtered by role)
GET /api/v1/jobs/{id} # Job details with real-time status
POST /api/v1/jobs/{id}/actions/* # Workflow actions (approve/reject/complete)
GET /api/v1/jobs/{id}/vtt # VTT content retrieval
PATCH /api/v1/jobs/{id}/vtt # VTT editing and updates
GET /api/v1/jobs/{id}/downloads # Signed download URLs
WS /api/v1/ws/jobs/{id} # Real-time job status updates
```
**OpenAPI Documentation**: http://localhost:8000/docs (or http://localhost:8003/docs when the backend is started with `./scripts/run-local.sh`)
## 🎯 Development Status
### ✅ Completed (Production Ready)
- **User Management**: Full authentication, RBAC, password management
- **Job Pipeline**: Complete video processing workflow with state machine
- **Quality Control**: VTT editor, approval workflows, reviewer dashboards
- **Real-time Features**: WebSocket updates, live notifications
- **Multi-language**: Translation pipeline with cultural transcreation
- **File Management**: Secure uploads, downloads, asset validation
- **Admin Features**: User management, system monitoring, audit logs
### ⚠️ Needs Attention (Minor)
- **Integration Tests**: Framework exists but needs completion
- **Email Templates**: Service implemented, templates may need customization
- **Performance Testing**: No load testing implemented yet
- **Documentation**: API docs complete, user guides could be enhanced
### 🎯 Recommended Next Steps
1. **Complete integration test suite** for end-to-end validation
2. **Performance testing** with realistic video processing loads
3. **Production deployment** configuration and CI/CD pipeline
4. **User documentation** and training materials
5. **Monitoring dashboards** for production operations
## 📚 Development Resources
- **Complete Specification**: `video_accessibility_development_plan.txt`
- **Development Guidelines**: `CLAUDE.md`
- **API Documentation**: http://localhost:8000/docs (when running)
- **Test Coverage Reports**: `backend/htmlcov/` (after running tests)

@@ -1,94 +0,0 @@
{
"permissions": {
"allow": [
"WebSearch",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && ruff check app/services/elevenlabs_voices.py app/services/tts.py app/api/v1/routes_tts.py app/models/job.py app/tasks/tts_synthesis.py app/core/config.py 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && python -m ruff check app/services/elevenlabs_voices.py app/services/tts.py app/api/v1/routes_tts.py app/models/job.py app/tasks/tts_synthesis.py app/core/config.py 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && pip3 show ruff 2>&1 | head -5; which pip3 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/frontend && npm run type-check 2>&1 | tail -20)",
"Bash(node_modules/.bin/tsc --noEmit 2>&1 | tail -20)",
"Bash(./node_modules/.bin/tsc --noEmit 2>&1 | tail -30)",
"Bash(npm run type-check 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/frontend && npm run type-check 2>&1)",
"Bash(npm run lint 2>&1)",
"WebFetch(domain:dcmp.org)",
"WebFetch(domain:www.w3.org)",
"WebFetch(domain:partnerhelp.netflixstudios.com)",
"WebFetch(domain:m.media-amazon.com)",
"WebFetch(domain:www.acb.org)",
"Bash(./node_modules/.bin/tsc --noEmit)",
"Bash(node_modules/.bin/tsc --noEmit)",
"Bash(pandoc --version)",
"WebFetch(domain:ai-sandbox.oliver.solutions)",
"Bash(gcloud run:*)",
"Bash(gcloud logging:*)",
"Bash(ssh optical:*)",
"Bash(/Volumes/SSD/Projects/Oliver/video-accessibility/backend/.venv/bin/python3.11 -c \"import sys; sys.path.insert\\(0, '.'\\); from app.models.user import UserRole; print\\([r.value for r in UserRole]\\)\")",
"Bash(npm list *)",
"Bash(brew list *)",
"Bash(npx --yes puppeteer --version)",
"Bash(node md_to_pdf.js)",
"Bash(npm root *)",
"Bash(node *)",
"Bash(ssh optical-web-1 *)",
"Bash(git *)",
"WebFetch(domain:docs.anthropic.com)",
"Bash(poetry lock *)",
"Bash(pip show *)",
"Read(//Users/ai_leed/.local/bin/**)",
"Read(//opt/homebrew/bin/**)",
"Bash(pip3 install *)",
"Bash(poetry --version)",
"Bash(docker run *)",
"Read(//Users/ai_leed/.docker/run/**)",
"Bash(docker context *)",
"Bash(DOCKER_HOST=unix:///var/run/docker.sock docker run --rm -v \"$\\(pwd\\):/app\" -w /app python:3.11-slim bash -c \"pip install poetry==1.8.2 -q && poetry lock --no-update\")",
"Bash(brew install *)",
"Bash(npm run *)",
"Bash(scp /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/models/audit_log.py optical:/tmp/audit_log.py)",
"Bash(scp *)",
"Bash(kill %1)",
"Bash(ssh optical-dev *)",
"Skill(fullstack-dev-skills:security-reviewer)",
"Bash(chmod +x *)",
"Bash(gcloud auth *)",
"Bash(gcloud config *)",
"Bash(gcloud artifacts *)",
"Bash(sed -n '190,200p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '1914,1922p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2048,2062p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2490,2502p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2628,2638p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(gcloud builds submit *)",
"Bash(gcloud builds describe 79802b34-e17b-4446-b01d-68d99d569262 *)",
"Bash(gcloud compute instances list *)",
"Bash(gcloud compute networks vpc-access connectors list *)",
"Bash(gcloud builds *)",
"Bash(gcloud projects get-iam-policy optical-414516 *)",
"Bash(gcloud projects *)",
"Bash(npm audit *)",
"Skill(codebase-audit-suite:ln-622-build-auditor)",
"Skill(codebase-audit-suite:ln-624-code-quality-auditor)",
"Skill(codebase-audit-suite:ln-625-dependencies-auditor)",
"Skill(codebase-audit-suite:ln-626-dead-code-auditor)",
"Bash(/opt/homebrew/bin/ruff check *)",
"Bash(npm test *)",
"Bash(sed -n '35,42p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/test/utils.tsx)",
"Bash(sed -n '55,90p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/tests/helpers/auth.ts)",
"Bash(sed -n '48,60p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/components/Layout/Sidebar.tsx)",
"Bash(sed -n '152,170p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/components/Layout/Sidebar.tsx)",
"Bash(poetry env *)",
"Bash(poetry install *)",
"Bash(poetry run *)",
"Bash(docker info *)",
"Bash(sed -n '1,30p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/services/gcs.py)",
"Bash(sed -n '155,165p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/services/gcs.py)",
"Bash(gcloud secrets *)",
"Bash(openssl rand *)",
"Bash(ssh *)",
"Skill(commit-commands:commit-push-pr)",
"Bash(obsidian read *)",
"Bash(obsidian search *)"
]
}
}

@@ -10,8 +10,6 @@ REDIS_URL=redis://redis:6379/0
 # JWT Authentication
 JWT_SECRET_KEY=your-production-jwt-secret-key-min-32-chars
 JWT_REFRESH_SECRET_KEY=your-production-refresh-secret-key-min-32-chars
-# Required: admin account created on first boot. Unset = admin not seeded.
-DEFAULT_ADMIN_PASSWORD=your-secure-admin-password
 # AI Services
 GEMINI_API_KEY=your-gemini-api-key
@@ -21,11 +19,8 @@ ELEVENLABS_API_KEY=your-elevenlabs-api-key
 GCS_BUCKET_NAME=your-production-bucket-name
 GOOGLE_CLOUD_PROJECT=your-gcp-project-id
-# Email Service (Mailgun)
-SENDGRID_API_KEY=
-MAILGUN_API_KEY=your-mailgun-api-key
-MAILGUN_DOMAIN=mg.oliver.solutions
-MAILGUN_FROM=noreply@mg.oliver.solutions
+# Email Service
+SENDGRID_API_KEY=your-sendgrid-api-key
 # Monitoring
 SENTRY_DSN=your-sentry-dsn-url

@@ -9,18 +9,18 @@
 # App Configuration
 # -----------------------------------------------------------------------------
 APP_ENV=prod
-API_BASE_URL=https://optical-dev.oliver.solutions/video-accessibility
+API_BASE_URL=https://ai-sandbox.oliver.solutions/video-accessibility-back
 # -----------------------------------------------------------------------------
 # Authentication & Security
 # -----------------------------------------------------------------------------
 # IMPORTANT: Generate a secure random secret for JWT_SECRET
 # Example: openssl rand -hex 32
-JWT_SECRET=d81fd31798510f53b374951908b6bedd75f7ddaabe9b4e4c4ca5bf81393f48b7
+JWT_SECRET=CHANGE_ME_TO_SECURE_RANDOM_64_CHAR_STRING
 JWT_ALG=HS256
 JWT_ACCESS_TTL_MIN=240
 JWT_REFRESH_TTL_DAYS=7
-COOKIE_DOMAIN=optical-dev.oliver.solutions
+COOKIE_DOMAIN=ai-sandbox.oliver.solutions
 COOKIE_SECURE=true
 COOKIE_SAMESITE=Lax
@ -63,31 +63,29 @@ TRANSLATE_API_KEY=
ELEVENLABS_API_KEY=sk_c17be2768ca784f1807018420b84c7f1ee969946e698f986 ELEVENLABS_API_KEY=sk_c17be2768ca784f1807018420b84c7f1ee969946e698f986
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# Email Configuration (Mailgun) # Email Configuration (SendGrid)
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# IMPORTANT: Get SendGrid API key from https://app.sendgrid.com/settings/api_keys
SENDGRID_API_KEY= SENDGRID_API_KEY=
MAILGUN_API_KEY=1d8c6f38c53f237305353cc2e55f39f2-c6620443-4b9961f5
MAILGUN_DOMAIN=mg.oliver.solutions
MAILGUN_FROM=noreply@mg.oliver.solutions
# Email sender address # Email sender address (must be verified in SendGrid)
EMAIL_FROM=noreply@mg.oliver.solutions EMAIL_FROM=noreply@ai-sandbox.oliver.solutions
# Client-facing URL (used in emails) # Client-facing URL (used in emails)
CLIENT_BASE_URL=https://optical-dev.oliver.solutions/video-accessibility CLIENT_BASE_URL=https://ai-sandbox.oliver.solutions/video-accessibility
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# Microsoft Authentication (Azure AD) # Microsoft Authentication (Azure AD)
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
AZURE_CLIENT_ID=9079054c-9620-4757-a256-23413042f1ef AZURE_CLIENT_ID=9079054c-9620-4757-a256-23413042f1ef
AZURE_AUTHORITY=https://login.microsoftonline.com/e519c2e6-bc6d-4fdf-8d9c-923c2f002385 AZURE_AUTHORITY=https://login.microsoftonline.com/e519c2e6-bc6d-4fdf-8d9c-923c2f002385
AZURE_REDIRECT_URI=https://optical-dev.oliver.solutions/video-accessibility/ AZURE_REDIRECT_URI=https://ai-sandbox.oliver.solutions/video-accessibility/
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# CORS Configuration # CORS Configuration
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# Comma-separated list of allowed origins # Comma-separated list of allowed origins
CORS_ORIGINS=https://optical-dev.oliver.solutions CORS_ORIGINS=https://ai-sandbox.oliver.solutions
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# Observability & Monitoring (Optional) # Observability & Monitoring (Optional)
@ -118,9 +116,6 @@ OTEL_EXPORTER_OTLP_ENDPOINT=
WHISPER_SERVICE_URL=https://whisper-http-service-bcb6ipdqka-uc.a.run.app WHISPER_SERVICE_URL=https://whisper-http-service-bcb6ipdqka-uc.a.run.app
FFMPEG_SERVICE_URL=https://ffmpeg-http-service-bcb6ipdqka-uc.a.run.app FFMPEG_SERVICE_URL=https://ffmpeg-http-service-bcb6ipdqka-uc.a.run.app
# optical-dev uses Celery workers (not Cloud Run Jobs) for pipeline dispatch
USE_CELERY_FALLBACK=true
# Worker Concurrency (higher values for Cloud Run mode since workers just make HTTP calls) # Worker Concurrency (higher values for Cloud Run mode since workers just make HTTP calls)
WHISPER_WORKER_CONCURRENCY=10 WHISPER_WORKER_CONCURRENCY=10
FFMPEG_WORKER_CONCURRENCY=20 FFMPEG_WORKER_CONCURRENCY=20


@ -1,23 +0,0 @@
# Screenshot capture credentials — copy to .env.screenshots and fill in values
# NEVER commit .env.screenshots (it is gitignored)
BASE_URL=https://optical-dev.oliver.solutions/video-accessibility
# Local-password admin seeded by backend/scripts/seed_test_users.py
TEST_ADMIN_EMAIL=test-admin@oliver.agency
TEST_ADMIN_PASSWORD=TestAdmin2026!
TEST_CLIENT_EMAIL=test-client@oliver.agency
TEST_CLIENT_PASSWORD=TestClient2026!
TEST_LINGUIST_EMAIL=test-linguist@oliver.agency
TEST_LINGUIST_PASSWORD=TestLinguist2026!
TEST_REVIEWER_EMAIL=test-reviewer@oliver.agency
TEST_REVIEWER_PASSWORD=TestReviewer2026!
TEST_PRODUCTION_EMAIL=test-production@oliver.agency
TEST_PRODUCTION_PASSWORD=TestProduction2026!
TEST_PM_EMAIL=test-pm@oliver.agency
TEST_PM_PASSWORD=TestPM2026!

13
.gitignore vendored

@ -12,7 +12,6 @@ examples/
.env.local .env.local
.env.production .env.production
.env.*.local .env.*.local
.env.screenshots
secrets/ secrets/
*.pem *.pem
*.key *.key
@ -99,15 +98,3 @@ docs/*.pdf
/var/www/html/video-accessibility.backup.* /var/www/html/video-accessibility.backup.*
backend/.env backend/.env
# Node / npm artifacts at repo root (Playwright MCP installs these)
node_modules/
package.json
package-lock.json
# Playwright MCP session snapshots
.playwright-mcp/
# Test videos
test-video.mp4
.worktrees/


@ -1,118 +0,0 @@
# Build Health Audit — ln-622
**Score: 5.5/10** | Issues: 28 (C:0 H:5 M:18 L:5)
**Date:** 2026-04-30 | **Stack:** Python 3.11 / FastAPI / Celery + React 19 / Vite / TypeScript 5.8
---
## 1. Compiler / Linter Errors
### Backend — ruff: 1314 errors (HIGH)
`ruff check app/` exits non-zero with 1314 violations. The ruff config in `pyproject.toml` uses **deprecated top-level `select`/`ignore`/`per-file-ignores`** instead of `[tool.ruff.lint]` — ruff emits a warning on every run.
Top violation codes:
| Code | Meaning | Volume |
|------|---------|--------|
| I001 | Import block unsorted | ~400 |
| UP | pyupgrade (f-strings, typing aliases) | ~500 |
| B | flake8-bugbear | ~200 |
| F401 | Unused import | 58 |
Most violations are **auto-fixable** (`ruff check --fix`). The unsorted imports and UP rules are cosmetic but make CI noisy and block future enforcement.
**Severity: HIGH** — CI cannot gate on ruff without fixing this first.
### Frontend — ESLint: 36 problems (30 errors, 6 warnings) (MEDIUM)
Key errors:
| File | Rule | Count |
|------|------|-------|
| `contexts/GlobalWebSocketContext.tsx:56` | `react-refresh/only-export-components` | 1 |
| `contexts/NotificationContext.tsx:91` | `react-refresh/only-export-components` | 1 |
| `contexts/ToastContext.tsx:83` | `react-refresh/only-export-components` | 1 |
| `lib/api.ts:539` | `@typescript-eslint/no-explicit-any` | 1 |
| `routes/admin/QCDetail.tsx` | `@typescript-eslint/no-explicit-any` | 6 |
| `routes/AcceptInvite.tsx` | `@typescript-eslint/no-explicit-any` | 1 |
| `routes/jobs/JobDetail.tsx` | `no-unused-vars` (err catch) | 2 |
| `hooks/__tests__/useJob.test.tsx` | `no-unused-vars` | 1 |
| `tests/helpers/auth.ts` | `no-explicit-any` | 3 |
**Severity: MEDIUM** — build succeeds, but `any` types and react-refresh errors degrade DX and HMR.
---
## 2. Type Errors
### Frontend — tsc: CLEAN ✓
`tsc --noEmit` exits 0. No TypeScript compilation errors. The `any` issues above are ESLint-level, not tsc errors.
### Backend — mypy: NOT RUN
Cannot run mypy outside the poetry venv. Needs `poetry run mypy .` inside Docker or an activated venv.
**Severity: LOW** (mypy not blocking, but should be run in CI)
---
## 3. Tests
### Frontend — vitest: 13 failed / 75 total (HIGH)
8 test files affected:
| Test | Failures | Root cause |
|------|----------|-----------|
| `auth.test.ts` | 1 | Mock shape mismatch — response has extra field `organizationId` |
| `StatusBadge.test.tsx` | 1 | Unknown status no longer renders text (component changed) |
| `VttEditor.test.tsx` | 1 | Multiple elements found for `Insert cue before` title — DOM duplication |
| `useJob.test.tsx` | 3 | `useApproveEnglish` — pending state never resolves in test (timeout 1s); `useCreateJob` arg mismatch |
| `UploadDropzone.test.tsx` | 6 | Text broken across elements — test uses exact string match, component renders in `<span>` nodes |
| `useJobStatusWebSocket.test.tsx` | 1 | (see output) |
**Severity: HIGH** — 17% test failure rate. Several are stale tests from component refactors (UploadDropzone, StatusBadge).
### Backend — pytest: CANNOT RUN (CRITICAL)
Running `pytest` outside poetry venv fails with `ModuleNotFoundError` for `fastapi`, `aiohttp`, etc. Tests must be run with `poetry run pytest` inside Docker or an activated poetry environment.
The `backend/.venv` exists but appears to be a plain venv, not the poetry-managed one. **Tests are effectively unrunnable in local dev without explicit poetry activation.**
**Severity: CRITICAL** — Developers with system Python cannot run tests without explicit setup steps.
---
## 4. Build Configuration Issues
### ruff config deprecated (MEDIUM)
`pyproject.toml` uses `[tool.ruff]` top-level `select`, `ignore`, `per-file-ignores`. Current ruff ≥ 0.2 expects `[tool.ruff.lint]`. Fix:
```toml
# Before
[tool.ruff]
select = ["E", "W", ...]
ignore = ["E501", ...]
# After
[tool.ruff]
target-version = "py311"
line-length = 88
[tool.ruff.lint]
select = ["E", "W", ...]
ignore = ["E501", ...]
```
### Backend venv mismatch (MEDIUM)
`backend/.venv` cannot run `ruff`, `pytest`, or `mypy` — they are installed in the poetry-managed venv, not this one. Confusing to new devs.
### AGENTS.md commands incorrect (LOW)
`AGENTS.md` documents `cd backend && poetry run pytest` but the backend has `.venv` and `pyproject.toml` with no Makefile wrapper. The actual working path is `cd backend && .venv/bin/python -m pytest` or requires `poetry shell`.
---
## Summary
| Check | Result | Severity |
|-------|--------|---------|
| ruff backend | 1314 violations (auto-fixable) | HIGH |
| ESLint frontend | 36 problems | MEDIUM |
| tsc frontend | ✓ Clean | OK |
| mypy backend | Not runnable locally | LOW |
| vitest frontend | 13/75 failing | HIGH |
| pytest backend | Not runnable locally | CRITICAL |
| ruff config | Deprecated syntax | MEDIUM |
| venv setup | Confusing / broken | MEDIUM |


@ -1,116 +0,0 @@
# Code Quality Audit — ln-624
**Score: 5.0/10** | Issues: 22 (C:2 H:8 M:9 L:3)
**Date:** 2026-04-30
---
## 1. God Classes / Files (> 500 lines)
| File | Lines | Severity |
|------|-------|---------|
| `backend/app/api/v1/routes_jobs.py` | 2882 | **CRITICAL** |
| `frontend/src/routes/admin/QCDetail.tsx` | 2079 | **CRITICAL** |
| `backend/app/services/video_renderer.py` | 1695 | **HIGH** |
| `frontend/src/routes/jobs/JobsList.tsx` | 1246 | **HIGH** |
| `frontend/src/lib/api.ts` | 1056 | **HIGH** |
| `backend/app/tasks/translate_and_synthesize.py` | 1019 | **HIGH** |
| `frontend/src/routes/jobs/NewJob.tsx` | 1038 | **HIGH** |
| `frontend/src/types/api.ts` | 891 | **MEDIUM** |
| `frontend/src/routes/jobs/JobDetail.tsx` | 732 | **MEDIUM** |
| `frontend/src/routes/admin/UserDetail.tsx` | 523 | **MEDIUM** |
| `frontend/src/hooks/useJobStatusWebSocket.ts` | 443 | **MEDIUM** |
**routes_jobs.py at 2882 lines** is the worst offender — it mixes upload, approval, translation, TTS, VTT editing, download, admin, and websocket concerns in a single router. Splitting by domain (e.g., `routes_upload.py`, `routes_vtt.py`, `routes_review.py`, `routes_tts.py`) would bring each under 500 lines.
**QCDetail.tsx at 2079 lines** handles the entire QC workflow, VTT display, audio preview, language selection, and approval modals in one component. Needs extraction of at minimum: `LanguageQCPanel`, `VttReviewView`, `ApprovalModal`.
---
## 2. Long Methods (> 100 lines)
| File:line | Function | Length | Severity |
|-----------|---------|--------|---------|
| `tasks/translate_and_synthesize.py:109` | `_async_translate_and_synthesize()` | 485 lines | **CRITICAL** |
| `services/video_renderer.py:487` | `_render_pause_insert_method()` | 419 lines | **CRITICAL** |
| `tasks/ingest_and_ai.py:53` | `ingest_and_ai_task_impl()` | 276 lines | **HIGH** |
| `tasks/rerender_accessible_video.py:110` | `_async_rerender_accessible_video()` | 280 lines | **HIGH** |
| `tasks/render_accessible_video.py:56` | `_async_render_accessible_video()` | 287 lines | **HIGH** |
| `api/v1/routes_jobs.py:1552` | `update_job_vtt_content()` | 215 lines | **HIGH** |
| `tasks/notify.py:29` | `run_async()` | 169 lines | **HIGH** |
| `api/v1/routes_jobs.py:2738` | `update_tts_preferences()` | 144 lines | **MEDIUM** |
| `services/whisper_service.py:241` | `_find_sentence_boundaries()` | 120 lines | **MEDIUM** |
| `services/gemini.py:591` | `analyze_accessible_video_placement()` | 132 lines | **MEDIUM** |
The two most critical ones (`_async_translate_and_synthesize` at 485 lines and `_render_pause_insert_method` at 419 lines) are orchestrator-style functions with sequential pipeline steps. They could be split into named pipeline stages, each ~50 lines.
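A minimal sketch of what "named pipeline stages" could look like. The stage names and the `PipelineState` fields are hypothetical and are not taken from `translate_and_synthesize.py`; the stage bodies are placeholders for the real service calls.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class PipelineState:
    """Hypothetical state object handed from one stage to the next."""
    job_id: str
    language: str
    log: list[str] = field(default_factory=list)


Stage = Callable[[PipelineState], Awaitable[None]]


async def translate_vtt(state: PipelineState) -> None:
    # Placeholder: the real stage would call the translation service.
    state.log.append(f"translated:{state.language}")


async def synthesize_audio(state: PipelineState) -> None:
    # Placeholder: the real stage would call the TTS provider.
    state.log.append(f"synthesized:{state.language}")


async def publish_results(state: PipelineState) -> None:
    # Placeholder: the real stage would upload outputs and update the job.
    state.log.append("published")


STAGES: list[Stage] = [translate_vtt, synthesize_audio, publish_results]


async def run_pipeline(state: PipelineState) -> PipelineState:
    """Run each named stage in order; the orchestrator itself stays short."""
    for stage in STAGES:
        await stage(state)
    return state


result = asyncio.run(run_pipeline(PipelineState(job_id="job-1", language="fr")))
print(result.log)
```

Each stage can then be unit-tested in isolation, and the orchestrator reads as a list of steps rather than a 400-line body.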
---
## 3. Deep Nesting
Not systematically scanned with a tool (radon/lizard not installed). The long functions above likely contain 4-5+ nesting levels given their complexity.
---
## 4. Too Many Parameters
| Location | Function | Params | Severity |
|----------|---------|--------|---------|
| `services/gemini.py` | `extract_accessibility_targeted()` | 7+ | **MEDIUM** |
| `tasks/translate_and_synthesize.py` | `_generate_language_tts()` | 8+ | **MEDIUM** |
Pattern: many functions pass `db`, `job`, `language`, `settings`, `gcs_client`, etc. individually instead of grouping into a context dataclass.
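As an illustration of the grouping idea, here is a minimal sketch of a context dataclass. `TaskContext` and `output_path` are hypothetical names, and the field types are left as `Any` because the real client and settings classes are not shown in this audit.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TaskContext:
    """Hypothetical bundle of the collaborators currently passed one by one."""
    db: Any
    job: dict[str, Any]
    language: str
    settings: Any
    gcs_client: Any


def output_path(ctx: TaskContext, filename: str) -> str:
    # One parameter carries the shared state; only call-specific arguments remain.
    return f"jobs/{ctx.job['id']}/{ctx.language}/{filename}"


ctx = TaskContext(db=None, job={"id": "job-1"}, language="de", settings=None, gcs_client=None)
print(output_path(ctx, "captions.vtt"))
```

A frozen dataclass also makes it explicit that helpers should not swap out the shared clients mid-task.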
---
## 5. Magic Numbers
### Backend (MEDIUM)
Scattered timing constants without named definitions:
- TTS retry delays (hardcoded seconds)
- chunk sizes in upload
- Audio padding values in video_renderer.py
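A minimal sketch of how such values could be lifted into named constants. Every number below is an illustrative placeholder, not the project's real setting, and the exponential backoff is an assumption made for the example rather than the retry policy the tasks actually use.

```python
# Illustrative placeholders only; the real values live in the task code.
TTS_RETRY_BASE_DELAY_SECONDS = 2.0
TTS_RETRY_MAX_DELAY_SECONDS = 30.0
UPLOAD_CHUNK_SIZE_BYTES = 8 * 1024 * 1024


def tts_retry_delay(attempt: int) -> float:
    """Backoff delay for retry number `attempt` (0-based), capped at the maximum."""
    delay = TTS_RETRY_BASE_DELAY_SECONDS * (2 ** attempt)
    return min(delay, TTS_RETRY_MAX_DELAY_SECONDS)


print([tts_retry_delay(n) for n in range(6)])
```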
### Frontend (LOW)
Mostly clean. Some inline pixel values in Tailwind (acceptable). No concerning business-logic magic numbers found.
---
## 6. N+1 Query Patterns (MEDIUM)
Potential N+1 patterns found:
- `app/main.py:102`: `async for job_doc in db.jobs.find(...)` — check if this iterates and makes additional queries per document
- `app/core/dependencies.py:185`: `async for m in db.memberships.find(...)` — membership lookup per request in auth middleware (acceptable if cached, but no caching observed)
- `app/core/authz.py:54`: `async for doc in db.memberships.find(...)` — similar pattern in auth check
These are all async iterators over `find()` — not necessarily N+1 if no nested DB calls, but should be reviewed for `.find()` calls inside the loop body.
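To make the review criterion concrete, here is a small self-contained sketch of the difference between a lookup inside the loop body and a batched lookup. `FakeCollection` is an in-memory stand-in that only counts queries, and the `jobs`/`orgs` data is invented for the example; none of it is taken from the application code.

```python
class FakeCollection:
    """In-memory stand-in for a database collection that counts queries."""

    def __init__(self, docs):
        self.docs = docs
        self.query_count = 0

    def find(self, predicate):
        self.query_count += 1
        return [d for d in self.docs if predicate(d)]


jobs = FakeCollection([{"id": 1, "org": "a"}, {"id": 2, "org": "b"}, {"id": 3, "org": "a"}])
orgs = FakeCollection([{"id": "a", "name": "Alpha"}, {"id": "b", "name": "Beta"}])


def org_names_n_plus_one():
    # One query for the jobs, then one extra query per job inside the loop.
    names = {}
    for job in jobs.find(lambda d: True):
        org = orgs.find(lambda d, org_id=job["org"]: d["id"] == org_id)[0]
        names[job["id"]] = org["name"]
    return names


def org_names_batched():
    # One query for the jobs, then a single query for all referenced orgs.
    job_docs = jobs.find(lambda d: True)
    org_ids = {job["org"] for job in job_docs}
    by_id = {o["id"]: o for o in orgs.find(lambda d: d["id"] in org_ids)}
    return {job["id"]: by_id[job["org"]]["name"] for job in job_docs}


slow = org_names_n_plus_one()
slow_queries = orgs.query_count
orgs.query_count = 0
fast = org_names_batched()
fast_queries = orgs.query_count
print(slow == fast, slow_queries, fast_queries)  # prints: True 3 1
```

With MongoDB, the batched variant would typically be a single `find` using an `$in` filter on the collected ids.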
---
## 7. Method Signature Quality
### Boolean flag parameters (MEDIUM)
Several async functions in tasks accept `bool` flags controlling behavior variants (e.g., `skip_tts`, `force_regenerate`). These should be enums or separate functions.
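A minimal sketch of the enum alternative. `TtsMode` and `plan_tts` are hypothetical names; the three modes are one possible reading of how flags like `skip_tts` and `force_regenerate` combine, not the behavior of the existing tasks.

```python
from enum import Enum
from typing import Optional


class TtsMode(Enum):
    """Hypothetical replacement for a pair of boolean flags."""
    SKIP = "skip"
    REUSE_EXISTING = "reuse_existing"
    FORCE_REGENERATE = "force_regenerate"


def plan_tts(mode: TtsMode, cached_audio_path: Optional[str]) -> str:
    # Each variant is named at the call site, so contradictory flag
    # combinations (skip and force at once) cannot be expressed.
    if mode is TtsMode.SKIP:
        return "no audio"
    if mode is TtsMode.REUSE_EXISTING and cached_audio_path is not None:
        return f"reuse {cached_audio_path}"
    return "synthesize audio"


print(plan_tts(TtsMode.REUSE_EXISTING, "audio/fr.mp3"))
```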
### Unclear return types (MEDIUM)
Some routes return `dict` or untyped responses instead of Pydantic response models. `routes_admin_production.py` has a few endpoints returning bare dicts.
---
## 8. Side-Effect Cascade Depth
`_async_translate_and_synthesize()` at 485 lines is the worst case: it writes to GCS, updates MongoDB, dispatches TTS tasks, sends notifications, and updates job status — 5+ distinct side-effect categories from a single function call. This warrants extraction into an orchestrator that delegates to named sink functions.
---
## Summary
| Check | Status | Severity |
|-------|--------|---------|
| God files (>500L) | 11 files | CRITICAL×2, HIGH×5 |
| Long methods (>100L) | 10 functions | CRITICAL×2, HIGH×5 |
| N+1 patterns | 3 potential | MEDIUM |
| Magic numbers | Some in tasks | MEDIUM |
| Method signatures | Boolean flags, unclear returns | MEDIUM |
| Side-effect cascade | translate_and_synthesize | HIGH |
**Primary recommendation:** Split `routes_jobs.py` and `QCDetail.tsx` — these two files account for the majority of the quality debt.


@ -1,94 +0,0 @@
# Dependencies & Reuse Audit — ln-625
**Score: 7.5/10** | Issues: 9 (C:0 H:2 M:5 L:2)
**Date:** 2026-04-30
---
## 1. Vulnerability Scan (CVE/CVSS)
### Frontend — npm audit: ✓ CLEAN
```
Total packages: 479
Vulnerabilities: info:0 low:0 moderate:0 high:0 critical:0 total:0
```
Zero CVEs. Excellent.
### Backend — pip-audit: NOT RUN
`pip-audit` not installed in local env. Recommended to add to CI:
```bash
pip install pip-audit && pip-audit -r requirements.txt
```
Given many heavy deps (Celery 5.3, google-cloud-*, faster-whisper, aiohttp), a CI scan is strongly advised.
---
## 2. Outdated Packages
### Frontend — npm outdated (many minor/major updates pending)
**MAJOR version gaps (HIGH):**
| Package | Installed | Latest | Notes |
|---------|-----------|--------|-------|
| `@azure/msal-browser` | 4.25.0 | **5.9.0** | MSAL v5 has breaking API changes |
| `@azure/msal-react` | 3.0.20 | **5.3.2** | Paired with msal-browser, coordinated upgrade needed |
| `@sentry/react` | 8.55.0 | **10.51.0** | Sentry v10 has breaking changes |
| `typescript` | 5.8.3 | **6.0.3** | TS 6 has strictness changes |
| `vite` | 7.3.2 | **8.0.10** | Vite 8 breaking changes |
| `eslint` | 9.33.0 | **10.2.1** | ESLint 10 config format may change |
| `jsdom` | 26.1.0 | **29.1.1** | Test environment |
**Minor updates (LOW-MEDIUM):** Most other packages have minor/patch updates pending (react 19.1→19.2, tailwindcss 4.1→4.2, etc.)
**Recommendation:** Keep MSAL and Sentry on current major until dedicated upgrade sprint. React, TailwindCSS, react-query minor updates are safe to apply immediately.
### Backend — pip outdated: pip-audit not available
Based on pyproject.toml dates vs ecosystem:
- `ruff ^0.1.6` → installed ruff is `0.15.12` (already updated, good)
- `google-genai ^1.56.0` → recently updated per git log
- `faster-whisper ^1.2.0` → check for 1.x updates
---
## 3. Unused Dependencies
### Backend — `sendgrid` (MEDIUM)
`pyproject.toml` lists `sendgrid = "^6.11.0"`. However:
- The actual emailer (`app/services/emailer.py`) uses **Mailgun** REST API via `httpx`
- `sendgrid` is referenced **only** in `app/core/config.py` as a dead config field `sendgrid_api_key: str = ""` with comment `# Email (Mailgun — primary; sendgrid_api_key kept for backward compat)`
- No `import sendgrid` anywhere in app code
**Action:** Remove `sendgrid` from `pyproject.toml` dependencies and remove the `sendgrid_api_key` config field.
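Before removing the dependency, the "no import anywhere" claim can be re-checked mechanically. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it only detects static `import` statements, so dynamic imports through `importlib` would be missed.

```python
import ast
from pathlib import Path


def imports_package(source: str, package: str) -> bool:
    """Return True if the source text imports `package` or one of its submodules."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from .sendgrid import x".
            names = [node.module] if node.module and node.level == 0 else []
        else:
            continue
        if any(n == package or n.startswith(package + ".") for n in names):
            return True
    return False


def files_importing(root: Path, package: str) -> list[Path]:
    """List .py files under `root` that import `package`."""
    return [
        path
        for path in sorted(root.rglob("*.py"))
        if imports_package(path.read_text(encoding="utf-8"), package)
    ]


print(imports_package("from sendgrid.helpers.mail import Mail\n", "sendgrid"))
```

Assuming the backend sources live under `backend/app`, calling `files_importing(Path("backend/app"), "sendgrid")` should return an empty list if the finding above holds.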
### Frontend — no unused dependencies found
- `axios` → used in `lib/api.ts`
- `@azure/msal-*` → used in `main.tsx`, `routes/Login.tsx`
- `date-fns` → used in 5+ components
- `zustand`, `@tanstack/react-query`, `react-hook-form`, `zod` → all actively used
- `react-dropzone` → used in upload components
---
## 4. Available Native Alternatives
### Frontend — axios vs fetch (LOW)
`axios` is used for all API calls in `lib/api.ts`. The project targets modern browsers and uses Vite. Native `fetch` + `AbortController` could replace axios, reducing bundle by ~14kb gzipped. However, axios provides request/response interceptors that are actively used for auth token refresh — migration effort is medium. **Not urgent.**
---
## 5. Custom Implementations
No custom crypto or hand-rolled validation libraries found. All auth uses `python-jose` + `libpass` (bcrypt). VTT parsing is domain-specific and not replaceable by a library. No concerns.
---
## Summary
| Check | Result | Severity |
|-------|--------|---------|
| Frontend CVEs | ✓ 0 vulnerabilities | OK |
| Backend CVEs | ⚠ Not scanned | MEDIUM |
| Frontend major updates | MSAL×2, Sentry, TS, Vite, ESLint | HIGH |
| Frontend minor updates | Many | LOW |
| Backend unused dep | `sendgrid` in pyproject.toml | MEDIUM |
| Native alternatives | axios → fetch possible | LOW |
| Custom implementations | None found | OK |


@ -1,143 +0,0 @@
# Dead Code Audit — ln-626
**Score: 7.0/10** | Issues: 14 (C:0 H:0 M:6 L:8)
**Date:** 2026-04-30
---
## 1. Unused Imports (Python — F401)
ruff detected **58 unused import violations** across backend. Sample:
| File | Unused import |
|------|--------------|
| `routes_admin.py:9` | `get_current_user` |
| `routes_admin.py:11` | `verify_password` |
| `routes_admin.py:16` | `ChangePasswordRequest` |
| `routes_admin.py:23` | `log_security_event` |
| (+ 54 more across all files) | |
All are auto-fixable with `ruff check --fix --select F401`. The `__init__.py` files are correctly excluded via `per-file-ignores`.
**Severity: MEDIUM** — clutters imports, increases cognitive load when reading files.
---
## 2. Deprecated / Legacy Types (Frontend)
`frontend/src/types/api.ts` contains 3 deprecated exported types with JSDoc markers:
| Line | Type | Marker |
|------|------|--------|
| 96 | `TtsVoicesResponse` | `@deprecated Use ProviderVoicesResponse instead` |
| 137 | `TtsOptionsResponse` | `@deprecated Use ProviderOptionsResponse instead` |
| 555-566 | `Client` / `OrganizationLegacy` | `@deprecated Use Organization instead` + `export { Client as OrganizationLegacy }` |
These types are still exported, meaning consumers could use them by mistake. If no external consumers exist (library not published), they should be deleted.
**Severity: MEDIUM** — active deprecation markers indicate intent to remove. Leaving them causes confusion.
---
## 3. Legacy Status Values (Frontend)
`frontend/src/types/api.ts:12,14`:
```ts
| "tts_failed" // legacy: keep for back-compat
| "render_failed" // legacy: keep for back-compat
```
These job statuses are marked as legacy. If the backend no longer emits them, they are dead type branches. If it still does (for old jobs in MongoDB), they're valid — but should be clearly documented with a removal condition.
**Severity: LOW** — no runtime impact, but requires clarification.
---
## 4. Backward Compatibility Code (Frontend)
### lib/api.ts:239 — Legacy approval method (MEDIUM)
```ts
// Legacy method - calls approve_source for backwards compatibility
```
A backward-compat shim in the API client. If all callers have been updated to the new method, this should be removed.
### VideoWithCaptions.tsx:1643 — Legacy single-language props (MEDIUM)
```ts
// Legacy single-language props (still supported)
sourceLanguage?: string; // Language code for legacy props
// Legacy props
// Combine legacy props with tracks (use useMemo to prevent recreation)
```
The component maintains backward-compat with old single-language prop API. If no callers use these legacy props, they can be removed.
### JobDetail.tsx:41 — Legacy status mapping (LOW)
```ts
// Handle legacy approved_english/approved_source statuses (map to pending_final_review)
```
Status mapping shim for old job records. Should be removed after all existing jobs are migrated.
---
## 5. Commented-Out Code (Backend)
| File | Line | Content |
|------|------|---------|
| `telemetry/tracing.py:5` | `# from opentelemetry.exporter.gcp.trace import CloudTraceSpanExporter # Disabled for local dev` | GCP trace exporter disabled |
| `telemetry/metrics.py:5` | `# from opentelemetry.exporter.prometheus import PrometheusMetricReader # Disabled for local dev` | Prometheus reader disabled |
| `pyproject.toml` | `# opentelemetry-exporter-prometheus = ... # Temporarily disabled - version conflicts` | Dep commented out |
These are intentional (local dev vs prod config), not dead code. However, the conditional should be expressed via environment config, not source comments. **Low priority.**
**Severity: LOW**
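One way to express that conditional through environment config, as a minimal sketch: `load_optional` and the `ENABLE_GCP_TRACE` variable are hypothetical names, and the module path is copied from the commented-out import in `telemetry/tracing.py`.

```python
import importlib
import os


def load_optional(module_name: str, env_flag: str, environ=os.environ):
    """Import `module_name` only when `env_flag` is set to a truthy value.

    Returns the module, or None when the flag is off or the package is missing.
    """
    if environ.get(env_flag, "").lower() not in {"1", "true", "yes"}:
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# ENABLE_GCP_TRACE is a hypothetical variable name chosen for this sketch.
gcp_trace = load_optional("opentelemetry.exporter.gcp.trace", "ENABLE_GCP_TRACE")
print("GCP trace exporter enabled:", gcp_trace is not None)
```

Local development then leaves the variable unset, and production sets it, without anyone editing source comments.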
---
## 6. Leftover .old Files (MEDIUM)
| File | Age | Action |
|------|-----|--------|
| `docker-compose.yml.old` | Created 2026-03-03 (~2 months) | Delete |
| `backend/Dockerfile.old` | Created 2026-03-03 (~2 months) | Delete |
| `backend/.dockerignore.old` | — | Delete |
These files have no build references. Git history preserves them.
---
## 7. Unused Dockerfiles
| File | Referenced in compose? |
|------|----------------------|
| `backend/Dockerfile.ffmpeg-service` | No — ffmpeg is embedded in main worker |
| `backend/Dockerfile.cloudrun` | Yes — referenced for Cloud Run deploys |
| `backend/Dockerfile.whisper-service` | Yes — whisper-worker service in compose |
`Dockerfile.ffmpeg-service` appears to be dead — the main Dockerfile handles ffmpeg. Should be confirmed and deleted if unused.
**Severity: LOW**
---
## 8. Dead Config Field
`backend/app/core/config.py:272`:
```python
sendgrid_api_key: str = "" # Email (Mailgun — primary; sendgrid_api_key kept for backward compat)
```
`sendgrid` package not used. Config field and `secrets_config.py` secret reference both dead.
**Severity: MEDIUM** — misleads ops into configuring a sendgrid secret that has no effect.
---
## Summary
| Check | Issues | Severity |
|-------|--------|---------|
| Unused Python imports | 58 (auto-fixable) | MEDIUM |
| Deprecated TS types | 3 types | MEDIUM |
| Backward-compat shims | 3 in frontend | MEDIUM |
| Commented-out code | 3 telemetry lines | LOW |
| .old files | 3 files | MEDIUM |
| Unused Dockerfile | Dockerfile.ffmpeg-service | LOW |
| Dead config field | sendgrid_api_key | MEDIUM |
| Legacy status values | 2 status strings | LOW |


@ -1,97 +0,0 @@
# Accessible Video Processing Platform — Project Entry Point
<!-- SCOPE: root | owner: ln-111 | generated: 2026-04-29 -->
## What Is This Project
AI-powered SaaS platform that generates legally-required accessibility assets from video files: closed captions, audio descriptions, SDH captions, and descriptive transcripts. Outputs are reviewed through a human QC workflow before client delivery. 50+ language translation and cultural transcreation are built in.
**Client:** Oliver Internal
**Server:** optical-web-1
**Status:** 85% production-ready
---
## Quick Navigation
| Need | Go to |
|------|-------|
| Architecture, data flow, state machine | [docs/project/architecture.md](docs/project/architecture.md) |
| Tech stack versions and config | [docs/project/tech_stack.md](docs/project/tech_stack.md) |
| API endpoint reference | [docs/project/api_spec.md](docs/project/api_spec.md) |
| Database collections and indexes | [docs/project/database_schema.md](docs/project/database_schema.md) |
| Infrastructure inventory | [docs/project/infrastructure.md](docs/project/infrastructure.md) |
| Runbook — deploy, restart, rollback | [docs/project/runbook.md](docs/project/runbook.md) |
| Functional requirements | [docs/project/requirements.md](docs/project/requirements.md) |
| Development principles | [docs/principles.md](docs/principles.md) |
| Reference — ADRs, guides, research | [docs/reference/README.md](docs/reference/README.md) |
| Task management | [docs/tasks/README.md](docs/tasks/README.md) |
| Test strategy and commands | [tests/README.md](tests/README.md) |
| Documentation hub | [docs/README.md](docs/README.md) |
---
## Entry Points by Audience
| Audience | Start here |
|----------|-----------|
| New developer | [docs/project/runbook.md](docs/project/runbook.md) → local setup section |
| Reviewer / QC | [docs/project/requirements.md](docs/project/requirements.md) → QC workflow section |
| DevOps | [docs/project/infrastructure.md](docs/project/infrastructure.md) + [docs/project/runbook.md](docs/project/runbook.md) |
| Security reviewer | [docs/project/architecture.md](docs/project/architecture.md) → security section |
| AI agent | Read this file → pick topic → read `_index`-equivalent doc → synthesize |
---
## Core Pipeline (one-line summary per stage)
| Stage | What happens | Key file |
|-------|-------------|---------|
| Upload | MP4 → GCS + MongoDB job record | `routes_files.py` |
| Ingestion | Celery worker transcribes with Gemini 2.5 Pro | `tasks/ingest_and_ai.py` |
| AI Processing | VTT generated, validated, stored in GCS | `services/gemini.py` |
| QC Review | Reviewer edits VTT, approves or rejects | `services/language_qc.py` |
| Translation | Google Translate + transcreation per language | `tasks/translate_and_synthesize.py` |
| TTS | Per-cue audio synthesis (Google TTS / ElevenLabs) | `services/tts.py` |
| Final Review | PM approves deliverables | `routes_language_qc.py` |
| Delivery | Signed GCS URLs emailed to client | `services/emailer.py` |
See full state machine (16 states) in [docs/project/architecture.md](docs/project/architecture.md#job-state-machine).
---
## Development Commands
| Action | Command |
|--------|---------|
| Start local (Docker + Vite) | `./scripts/run-local.sh` |
| Rebuild after code change | `./scripts/run-local.sh --rebuild` |
| Stop all local services | `./scripts/run-local.sh --stop` |
| Backend lint | `cd backend && ruff check .` |
| Backend type-check | `cd backend && mypy .` (run in Docker container) |
| Frontend lint | `cd frontend && npm run lint` |
| Frontend type-check | `cd frontend && npm run type-check` |
| Backend tests | `cd backend && poetry run pytest` |
| Frontend tests | `cd frontend && npm run test` |
| E2E tests | `cd frontend && npm run test:e2e` |
---
## Key Constraints
- **NO SSH to optical-web-1** without explicit user instruction — hard rule in CLAUDE.md
- **Access tokens in memory only** (not localStorage) — auth architecture constraint
- **Refresh tokens in HttpOnly cookies** — security requirement
- **Signed GCS URLs** expire in 24h — do not cache or store URLs
- **RBAC enforced server-side** — never trust client-supplied role claims
- **All reviewer actions emit audit log entries** — compliance requirement
---
## Maintenance
**Update triggers:** New route added, deployment target changes, key dependency version change, new team member onboarded.
**Verification:** All links in Quick Navigation resolve. Entry commands are correct against current scripts/.
<!-- END SCOPE: root -->


@ -1,8 +1,5 @@
# Accessible Video Processing Platform - Development Guide # Accessible Video Processing Platform - Development Guide
<!-- Documentation entry point: see @AGENTS.md for full project navigation -->
@AGENTS.md
## Project Overview ## Project Overview
This is a comprehensive video accessibility platform that automatically generates closed captions and audio descriptions using AI, with quality control workflows and multi-language support. This is a comprehensive video accessibility platform that automatically generates closed captions and audio descriptions using AI, with quality control workflows and multi-language support.

Binary file not shown.


@ -2,8 +2,6 @@
A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. Features a complete workflow from video upload to final delivery with quality control processes. A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. Features a complete workflow from video upload to final delivery with quality control processes.
**Documentation:** See [AGENTS.md](AGENTS.md) for full navigation, or [docs/README.md](docs/README.md) for the documentation hub.
## ✅ Current Status: **Production-Ready** (85% Complete) ## ✅ Current Status: **Production-Ready** (85% Complete)
**Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend) **Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend)


@ -1,96 +1,172 @@
# ============================================================================= # =============================================================================
# Apache config fragment — Accessible Video Platform # Apache Configuration for Accessible Video Platform
# Inject into: /etc/apache2/sites-available/optical-dev.oliver.solutions-ssl.conf # =============================================================================
# # Add this configuration to your existing VirtualHost for ai-sandbox.oliver.solutions
# Required modules: # Location: /etc/apache2/sites-available/ai-sandbox.oliver.solutions-ssl.conf
# sudo a2enmod proxy proxy_http proxy_wstunnel rewrite headers
#
# Container port map:
# accessible-video-api → 0.0.0.0:8012->8000/tcp
# ============================================================================= # =============================================================================
# ── Timeouts for large video uploads (up to 2 GB, ~10 min) ────────────────── # -----------------------------------------------------------------------------
<IfModule mod_proxy.c> # Frontend - Static React SPA served from subdirectory
ProxyTimeout 600 # -----------------------------------------------------------------------------
</IfModule>
# ── WebSocket proxy (MUST be before /api/ HTTP proxy) ─────────────────────── # Serve frontend from /video-accessibility subdirectory
# disablereuse=on prevents long-lived WS connections from exhausting the pool
ProxyPassMatch ^/video-accessibility/api/v1/ws/(.*)$ ws://127.0.0.1:8012/api/v1/ws/$1 disablereuse=on
ProxyPassReverse /video-accessibility/api/v1/ws/ ws://127.0.0.1:8012/api/v1/ws/
# ── API proxy ────────────────────────────────────────────────────────────────
# Strips /video-accessibility prefix — FastAPI sees /api/v1/...
ProxyPassMatch ^/video-accessibility/api/(.*)$ http://127.0.0.1:8012/api/$1
ProxyPassReverse /video-accessibility/api/ http://127.0.0.1:8012/api/
# Swagger / OpenAPI
ProxyPassMatch ^/video-accessibility/docs(/.*)?$ http://127.0.0.1:8012/docs$1
ProxyPassReverse /video-accessibility/docs http://127.0.0.1:8012/docs
ProxyPassMatch ^/video-accessibility/openapi\.json$ http://127.0.0.1:8012/openapi.json
ProxyPassReverse /video-accessibility/openapi.json http://127.0.0.1:8012/openapi.json
# ── SPA static files ─────────────────────────────────────────────────────────
Alias /video-accessibility /var/www/html/video-accessibility Alias /video-accessibility /var/www/html/video-accessibility
<Directory /var/www/html/video-accessibility> <Directory /var/www/html/video-accessibility>
# Basic options
Options -Indexes +FollowSymLinks Options -Indexes +FollowSymLinks
AllowOverride None AllowOverride All
Require all granted Require all granted
# Allow video uploads up to 2 GB # React SPA routing - rewrite all requests to index.html
LimitRequestBody 2147483648
RewriteEngine On RewriteEngine On
RewriteBase /video-accessibility/ RewriteBase /video-accessibility
# Serve real files/directories directly (JS, CSS, assets, fonts) # Don't rewrite files or directories that exist
RewriteCond %{REQUEST_FILENAME} -f [OR] RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} -d RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ - [L]
# Everything else → index.html (React Router handles client-side nav) # Rewrite everything else to index.html
RewriteRule ^ index.html [L] RewriteRule ^ /video-accessibility/index.html [L]
# Cache-bust hashed assets indefinitely; never cache HTML
<FilesMatch "\.(js|css|woff2?|ttf|eot|png|jpg|jpeg|gif|ico|svg)$">
Header set Cache-Control "public, max-age=31536000, immutable"
</FilesMatch>
<FilesMatch "\.html$">
Header set Cache-Control "no-cache, no-store, must-revalidate"
</FilesMatch>
# Security headers # Security headers
Header always set X-Frame-Options "SAMEORIGIN" Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff" Header always set X-Content-Type-Options "nosniff"
Header always set X-XSS-Protection "1; mode=block"
Header always set Referrer-Policy "strict-origin-when-cross-origin" Header always set Referrer-Policy "strict-origin-when-cross-origin"
# Cache control for static assets
<FilesMatch "\.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$">
Header set Cache-Control "public, max-age=31536000, immutable"
</FilesMatch>
# No cache for HTML files
<FilesMatch "\.(html)$">
Header set Cache-Control "no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
Header set Expires "0"
</FilesMatch>
</Directory> </Directory>
# -----------------------------------------------------------------------------
# Backend API - Reverse proxy to Docker container
# -----------------------------------------------------------------------------
# Proxy backend API to Docker container on port 8000
<Location /video-accessibility-back>
# Preserve original host header
ProxyPreserveHost On
# Proxy HTTP requests
ProxyPass http://localhost:8000
ProxyPassReverse http://localhost:8000
# Proxy timeout settings (important for long-running video processing)
ProxyTimeout 300
# WebSocket support (CRITICAL for real-time job updates)
RewriteEngine On
RewriteCond %{HTTP:Upgrade} =websocket [NC]
RewriteRule /video-accessibility-back/(.*) ws://localhost:8000/$1 [P,L]
RewriteCond %{HTTP:Upgrade} !=websocket [NC]
RewriteRule /video-accessibility-back/(.*) http://localhost:8000/$1 [P,L]
# Security headers
Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff"
# CORS is handled by the backend, don't add headers here
</Location>
# -----------------------------------------------------------------------------
# Required Apache Modules
# -----------------------------------------------------------------------------
# Enable these modules with:
# sudo a2enmod rewrite
# sudo a2enmod proxy
# sudo a2enmod proxy_http
# sudo a2enmod proxy_wstunnel
# sudo a2enmod headers
# sudo systemctl restart apache2
# Verify modules are enabled:
# apache2ctl -M | grep -E '(rewrite|proxy|headers)'
# ============================================================================= # =============================================================================
# Full VirtualHost skeleton (reference — values match optical-web-1) # Full VirtualHost Example
# ============================================================================= # =============================================================================
# Example of complete VirtualHost configuration:
# #
# <VirtualHost *:443> # <VirtualHost *:443>
# ServerName optical-dev.oliver.solutions # ServerName ai-sandbox.oliver.solutions
# ServerAdmin admin@oliver.solutions
#
# DocumentRoot /var/www/html # DocumentRoot /var/www/html
# #
# # SSL Configuration (with wildcard cert)
# SSLEngine on # SSLEngine on
# SSLCertificateFile /path/to/wildcard.crt # SSLCertificateFile /path/to/wildcard-ai-sandbox.oliver.solutions.crt
# SSLCertificateKeyFile /path/to/wildcard.key # SSLCertificateKeyFile /path/to/wildcard-ai-sandbox.oliver.solutions.key
# SSLCertificateChainFile /path/to/chain.crt # If needed
# #
# SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1 # # SSL Protocol and Cipher settings
# SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
# SSLCipherSuite HIGH:!aNULL:!MD5 # SSLCipherSuite HIGH:!aNULL:!MD5
# #
# # — paste the block above here — # # Frontend configuration (from above)
# Alias /video-accessibility /var/www/html/video-accessibility
# <Directory /var/www/html/video-accessibility>
# ...
# </Directory>
# #
# ErrorLog ${APACHE_LOG_DIR}/optical-dev-error.log # # Backend API configuration (from above)
# CustomLog ${APACHE_LOG_DIR}/optical-dev-access.log combined # <Location /video-accessibility-back>
# ...
# </Location>
#
# # Logging
# ErrorLog ${APACHE_LOG_DIR}/ai-sandbox-error.log
# CustomLog ${APACHE_LOG_DIR}/ai-sandbox-access.log combined
# </VirtualHost> # </VirtualHost>
# ============================================================================= # =============================================================================
# Verify # Testing & Verification
# ============================================================================= # =============================================================================
# sudo apache2ctl configtest
# sudo systemctl reload apache2 # Test Apache configuration:
# curl -I https://optical-dev.oliver.solutions/video-accessibility/ # sudo apache2ctl configtest
# curl https://optical-dev.oliver.solutions/video-accessibility/api/v1/health #
# wscat -c wss://optical-dev.oliver.solutions/video-accessibility/api/v1/ws/job-list # Restart Apache:
# sudo systemctl restart apache2
#
# Test frontend:
# curl -I https://ai-sandbox.oliver.solutions/video-accessibility
#
# Test backend:
# curl https://ai-sandbox.oliver.solutions/video-accessibility-back/health
#
# Test WebSocket (requires wscat):
# wscat -c wss://ai-sandbox.oliver.solutions/video-accessibility-back/api/v1/ws/job-list
# =============================================================================
# Troubleshooting
# =============================================================================
# Check Apache logs:
# sudo tail -f /var/log/apache2/ai-sandbox-error.log
# sudo tail -f /var/log/apache2/ai-sandbox-access.log
#
# Check if backend is running:
# curl http://localhost:8000/health
#
# Check Docker containers:
# cd /opt/accessible-video
# docker-compose ps
#
# Common issues:
# - 502 Bad Gateway: Backend container not running
# - 404 Not Found: Frontend not deployed or Apache alias incorrect
# - WebSocket fails: mod_proxy_wstunnel not enabled
# - CORS errors: Check backend CORS configuration, not Apache
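The proxy rules on the `main` side strip the `/video-accessibility` prefix before forwarding, and the WebSocket rule has to be evaluated before the generic `/api/` rule. The following is a minimal Python sketch of that mapping, not part of the repository: `BACKEND`, `RULES`, and `route` are illustrative names, and Python's `re` only approximates how Apache evaluates `ProxyPassMatch`.

```python
import re

# Assumed backend address, taken from the port map in the config comments
# (accessible-video-api published on host port 8012).
BACKEND = "127.0.0.1:8012"

# Rules in the same order as the config: the WebSocket pattern comes first,
# because the generic /api/ pattern would also match WebSocket paths.
RULES = [
    (re.compile(r"^/video-accessibility/api/v1/ws/(.*)$"), "ws://{b}/api/v1/ws/{m}"),
    (re.compile(r"^/video-accessibility/api/(.*)$"), "http://{b}/api/{m}"),
    (re.compile(r"^/video-accessibility/docs(/.*)?$"), "http://{b}/docs{m}"),
    (re.compile(r"^/video-accessibility/openapi\.json$"), "http://{b}/openapi.json"),
]


def route(path):
    """Return the backend URL for a request path, or None for SPA/static paths."""
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            # Patterns without a capture group (openapi.json) contribute nothing;
            # an optional group that did not participate yields None.
            captured = match.group(1) if pattern.groups else ""
            return template.format(b=BACKEND, m=captured or "")
    return None


if __name__ == "__main__":
    for p in ("/video-accessibility/api/v1/ws/job-list",
              "/video-accessibility/api/v1/health",
              "/video-accessibility/jobs/42"):
        print(p, "->", route(p))
```

Paths that return `None` here are the ones the `Alias` and rewrite rules would handle, either as a real static file or as the `index.html` fallback.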

92
backend/.dockerignore.old Normal file

@ -0,0 +1,92 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Poetry (keep poetry.lock for reproducible builds)
# poetry.lock
# Virtual environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Testing
.coverage
.pytest_cache/
.mypy_cache/
.tox/
htmlcov/
coverage.xml
*.cover
.hypothesis/
# Documentation
docs/
*.md
README*
# Logs
*.log
logs/
# Git
.git/
.gitignore
# Docker
Dockerfile*
.dockerignore
docker-compose*
# CI/CD
.github/
# Local development
.env.local
.env.development
.env.test
# Temporary files
tmp/
temp/
*.tmp
*.bak

1
backend/.gitignore vendored

@ -23,7 +23,6 @@ eggs/
.eggs/ .eggs/
lib/ lib/
lib64/ lib64/
!app/lib/
parts/ parts/
sdist/ sdist/
var/ var/


@ -3,8 +3,8 @@
# ============================================================================= # =============================================================================
# Stage 1: Builder - Install dependencies # Stage 1: Builder - Install dependencies
# Stage 2: Base - Common runtime for API and Worker # Stage 2: Base - Common runtime for API and Worker
# Stage 3: API - FastAPI + Gunicorn (no ffmpeg — heavy tasks run on Cloud Run Jobs) # Stage 3: API - FastAPI + Gunicorn (with ffmpeg for TTS audio conversion)
# Stage 4: Worker - Celery worker, lightweight queues only (notify, embed) # Stage 4: Worker - Celery worker (with ffmpeg for video processing)
# ============================================================================= # =============================================================================
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
@ -19,7 +19,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/* && rm -rf /var/lib/apt/lists/*
# Install Poetry # Install Poetry
RUN pip install --no-cache-dir poetry==2.1.4 RUN pip install --no-cache-dir poetry==1.8.2
# Configure Poetry to not create virtual environment (we're in a container) # Configure Poetry to not create virtual environment (we're in a container)
ENV POETRY_NO_INTERACTION=1 \ ENV POETRY_NO_INTERACTION=1 \
@ -33,7 +33,7 @@ COPY pyproject.toml poetry.lock ./
# Install dependencies using Poetry directly (simpler and more reliable) # Install dependencies using Poetry directly (simpler and more reliable)
RUN poetry config virtualenvs.create false \ RUN poetry config virtualenvs.create false \
&& poetry install --only main --no-root --no-interaction --no-ansi \ && poetry install --only main --no-interaction --no-ansi \
&& rm -rf $POETRY_CACHE_DIR && rm -rf $POETRY_CACHE_DIR
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
@ -46,7 +46,6 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
libmagic1 \ libmagic1 \
curl \ curl \
tini \ tini \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \ && rm -rf /var/lib/apt/lists/* \
&& apt-get clean && apt-get clean
@ -73,10 +72,21 @@ USER app
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# Stage 3: API - FastAPI + Gunicorn (Production API Server) # Stage 3: API - FastAPI + Gunicorn (Production API Server)
# Heavy pipeline tasks (ingest/translate/render) run on Cloud Run Jobs
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
FROM base AS api FROM base AS api
# Switch to root to install ffmpeg
USER root
# Install ffmpeg for TTS audio conversion
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Set production environment variables # Set production environment variables
ENV APP_ENV=prod ENV APP_ENV=prod
@ -94,10 +104,22 @@ ENTRYPOINT ["tini", "--"]
CMD ["gunicorn", "-c", "gunicorn_conf.py", "app.main:app"] CMD ["gunicorn", "-c", "gunicorn_conf.py", "app.main:app"]
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# Stage 4: Worker - Celery Worker (lightweight queues: notify, embed) # Stage 4: Worker - Celery Worker (with ffmpeg for video processing)
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
FROM base AS worker FROM base AS worker
# Switch back to root to install ffmpeg
USER root
# Install ffmpeg for video processing
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Set production environment variables # Set production environment variables
# WORKER_CONCURRENCY can be overridden at runtime (default: 8) # WORKER_CONCURRENCY can be overridden at runtime (default: 8)
ENV APP_ENV=prod \ ENV APP_ENV=prod \
@ -126,6 +148,18 @@ CMD celery -A celery_worker worker \
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
FROM base AS whisper-worker FROM base AS whisper-worker
# Switch back to root to install ffmpeg
USER root
# Install ffmpeg for audio extraction
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Pre-download Whisper medium model during build to avoid cold start delays # Pre-download Whisper medium model during build to avoid cold start delays
# Model is cached in ~/.cache/huggingface/hub (~1.5GB) # Model is cached in ~/.cache/huggingface/hub (~1.5GB)
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('medium', device='cpu', compute_type='int8')" RUN python -c "from faster_whisper import WhisperModel; WhisperModel('medium', device='cpu', compute_type='int8')"
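Because the two sides of this diff install `ffmpeg` in different stages, a process that shells out to it may want to fail fast when the binary is missing from its image. A minimal sketch, assuming a startup hook exists somewhere in the application; `require_binary` is a hypothetical helper, not a function from this codebase.

```python
import shutil


def require_binary(name):
    """Return the path of an executable found on PATH, or raise if it is missing."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"required executable not found on PATH: {name}")
    return path


if __name__ == "__main__":
    # Example: call once at worker startup, before accepting any tasks.
    try:
        print("ffmpeg at", require_binary("ffmpeg"))
    except RuntimeError as exc:
        print("startup check failed:", exc)
```

Calling this once at startup would surface a missing package as a clear error message rather than as a failed subprocess call in the middle of a job.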


@ -1,55 +0,0 @@
# =============================================================================
# Cloud Run Job image — va-worker
#
# Reuses the multi-stage base from Dockerfile.
# Entrypoint: python -m app.tasks.runner --task <name> --job-id <id>
#
# Build:
# docker build -f backend/Dockerfile.cloudrun -t va-worker backend/
# =============================================================================
# ── Stage 1: Builder ─────────────────────────────────────────────────────────
FROM python:3.11-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential curl \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir poetry==1.8.3
WORKDIR /app
COPY pyproject.toml poetry.lock ./
RUN poetry config virtualenvs.create false \
&& poetry install --no-interaction --no-ansi --only main
# ── Stage 2: Runtime ─────────────────────────────────────────────────────────
FROM python:3.11-slim AS runtime
# ffmpeg required for video rendering tasks
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
tini \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
WORKDIR /app
COPY . .
# Non-root user for security
RUN groupadd -r worker && useradd -r -g worker worker \
&& chown -R worker:worker /app
USER worker
# Cloud Run Jobs: no persistent HTTP port needed.
# Cloud Run passes CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT env vars.
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONPATH=/app
ENTRYPOINT ["tini", "--", "python", "-m", "app.tasks.runner"]
# Args are injected per-execution via Cloud Run Job overrides:
# --task ingest|translate|render|rerender --job-id <id> [--language <lang>] ...
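The comments above describe the runner's command line only in outline. The sketch below shows how such an entrypoint could parse its arguments with `argparse`. It is an assumption about the shape of `app.tasks.runner`, not its actual code: `build_parser` and `parse_invocation` are hypothetical names, only the flags spelled out in the comment are modelled, and any flags elided there are left out.

```python
import argparse
import os

# Task names as listed in the Dockerfile comment.
TASKS = ("ingest", "translate", "render", "rerender")


def build_parser():
    parser = argparse.ArgumentParser(prog="app.tasks.runner")
    parser.add_argument("--task", required=True, choices=TASKS)
    parser.add_argument("--job-id", required=True)
    parser.add_argument("--language", default=None)
    return parser


def parse_invocation(argv, environ=None):
    """Combine CLI arguments with the task index/count env vars set by Cloud Run Jobs."""
    environ = os.environ if environ is None else environ
    args = build_parser().parse_args(argv)
    return {
        "task": args.task,
        "job_id": args.job_id,
        "language": args.language,
        # Defaults cover local runs, where Cloud Run does not set these variables.
        "task_index": int(environ.get("CLOUD_RUN_TASK_INDEX", "0")),
        "task_count": int(environ.get("CLOUD_RUN_TASK_COUNT", "1")),
    }


if __name__ == "__main__":
    print(parse_invocation(["--task", "render", "--job-id", "abc123"], environ={}))
```

Restricting `--task` with `choices` makes a mistyped task name fail at argument parsing instead of partway through a job.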

127
backend/Dockerfile.old Normal file

@ -0,0 +1,127 @@
# Build stage - Install dependencies and build wheels
FROM python:3.11-slim AS builder
# Install build dependencies
RUN apt-get update && apt-get install -y \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Poetry
RUN pip install poetry==1.8.2
# Set Poetry configuration
ENV POETRY_NO_INTERACTION=1 \
POETRY_VENV_IN_PROJECT=1 \
POETRY_CACHE_DIR=/tmp/poetry_cache
WORKDIR /app
# Copy dependency files
COPY pyproject.toml poetry.lock ./
# Install dependencies into venv
RUN poetry config virtualenvs.in-project true && \
poetry lock --no-update || true && \
poetry install --only=main --no-root && \
rm -rf $POETRY_CACHE_DIR
# Base runtime stage
FROM python:3.11-slim AS base
# Install runtime system dependencies
RUN apt-get update && apt-get install -y \
ffmpeg \
curl \
tini \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Create non-root user
RUN groupadd --gid 1000 app \
&& useradd --uid 1000 --gid app --shell /bin/bash --create-home app
# Set working directory
WORKDIR /app
# Copy virtual environment from builder stage
COPY --from=builder --chown=app:app /app/.venv /app/.venv
# Ensure venv is in PATH
ENV PATH="/app/.venv/bin:$PATH"
# Copy application code
COPY --chown=app:app . .
# Switch to non-root user
USER app
# Production API stage
FROM base AS production
# Set environment variables for production
ENV APP_ENV=prod \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Expose port
EXPOSE 8000
# Use tini as init system for proper signal handling
ENTRYPOINT ["tini", "--"]
# Default command for API server
CMD ["gunicorn", "-c", "gunicorn_conf.py"]
# Worker stage for Celery workers
FROM base AS worker
# Set environment variables for worker
ENV APP_ENV=prod \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
C_FORCE_ROOT=1
# Health check for worker (check if Celery is responding)
HEALTHCHECK --interval=60s --timeout=15s --start-period=10s --retries=3 \
CMD python -c "from celery import Celery; app=Celery('app'); print('Worker healthy')" || exit 1
# Use tini as init system for proper signal handling
ENTRYPOINT ["tini", "--"]
# Default command for Celery worker
CMD ["celery", "-A", "app.tasks", "worker", "--loglevel=info", "--concurrency=1"]
# Development stage with dev dependencies
FROM builder AS development
# Install all dependencies including dev
RUN poetry install --no-root && rm -rf $POETRY_CACHE_DIR
# Install additional dev tools
RUN apt-get update && apt-get install -y \
git \
vim \
&& rm -rf /var/lib/apt/lists/*
# Copy application code
COPY --chown=app:app . .
# Switch to non-root user
USER app
# Set environment for development
ENV APP_ENV=dev \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1
EXPOSE 8000
# Development command with hot reload
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]


@ -22,7 +22,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/* && rm -rf /var/lib/apt/lists/*
# Install Poetry # Install Poetry
RUN pip install --no-cache-dir poetry==2.1.4 RUN pip install --no-cache-dir poetry==1.8.2
# Configure Poetry to not create virtual environment # Configure Poetry to not create virtual environment
ENV POETRY_NO_INTERACTION=1 \ ENV POETRY_NO_INTERACTION=1 \
@ -36,7 +36,7 @@ COPY pyproject.toml poetry.lock ./
# Install dependencies # Install dependencies
RUN poetry config virtualenvs.create false \ RUN poetry config virtualenvs.create false \
&& poetry install --only main --no-root --no-interaction --no-ansi \ && poetry install --only main --no-interaction --no-ansi \
&& rm -rf $POETRY_CACHE_DIR && rm -rf $POETRY_CACHE_DIR
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------

Binary file not shown.

Binary file not shown.


@ -1,28 +1,26 @@
from datetime import datetime, timedelta from datetime import datetime, timedelta
from typing import Optional
from bson import ObjectId from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException, Query, Request, status from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_membership_context
from ...core.database import get_database from ...core.database import get_database
from ...core.dependencies import get_current_user, require_roles from ...core.dependencies import get_current_user, require_roles
from ...core.logging import get_logger from ...core.logging import get_logger
from ...core.security import get_password_hash from ...core.security import get_password_hash, verify_password
from ...models.audit_log import AuditAction, AuditLogQuery, AuditLogResponse
from ...models.user import User, UserRole from ...models.user import User, UserRole
from ...models.audit_log import AuditAction, AuditLogQuery, AuditLogResponse
from ...schemas.auth import ( from ...schemas.auth import (
AdminStatsResponse, AdminStatsResponse,
ChangePasswordRequest,
CreateUserRequest, CreateUserRequest,
ResetPasswordRequest, ResetPasswordRequest,
UpdateUserRequest, UpdateUserRequest,
UserListResponse, UserListResponse,
UserResponse, UserResponse,
) )
from ...services.audit_logger import ( from ...services.audit_logger import audit_logger, log_user_management, log_security_event
audit_logger,
log_user_management,
)
from ...telemetry import app_metrics from ...telemetry import app_metrics
logger = get_logger(__name__) logger = get_logger(__name__)
@ -32,41 +30,21 @@ router = APIRouter(prefix="/admin", tags=["admin"])
@router.get("/users", response_model=UserListResponse) @router.get("/users", response_model=UserListResponse)
async def list_users( async def list_users(
page: int = Query(1, ge=1), page: int = Query(1, ge=1),
size: int = Query(20, ge=1, le=500), size: int = Query(20, ge=1, le=100),
role: str | None = Query(None, description="Single role or comma-separated list, e.g. 'linguist,admin'"), role: Optional[str] = Query(None),
active_only: bool = Query(True), active_only: bool = Query(True),
org_id: str | None = Query(None, description="Filter by org (platform admin only)"),
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""List users with filtering and pagination (admin only)""" """List users with filtering and pagination (admin only)"""
query: dict = {} query = {}
if role: if role:
roles = [r.strip() for r in role.split(",") if r.strip()] query["role"] = role
query["role"] = {"$in": roles} if len(roles) > 1 else roles[0]
if active_only: if active_only:
query["is_active"] = True query["is_active"] = True
if not ctx.is_platform_admin:
# Org-scoped admin: show only users in their org(s) via membership collection
accessible_org_ids = ctx.accessible_org_ids()
if not accessible_org_ids:
return UserListResponse(users=[], total=0, page=page, size=size)
member_ids_cursor = db.memberships.find(
{"organization_id": {"$in": accessible_org_ids}},
{"user_id": 1},
)
member_ids = [doc["user_id"] async for doc in member_ids_cursor]
query["_id"] = {"$in": member_ids}
elif org_id:
# Platform admin filtered to a specific org
member_ids_cursor = db.memberships.find({"organization_id": org_id}, {"user_id": 1})
member_ids = [doc["user_id"] async for doc in member_ids_cursor]
query["_id"] = {"$in": member_ids}
# Get total count # Get total count
total = await db.users.count_documents(query) total = await db.users.count_documents(query)
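The `main` side of this hunk accepts either a single role or a comma-separated list and turns it into a MongoDB filter. A standalone sketch of that logic follows, with `build_role_filter` as a hypothetical helper name. Unlike the inline version, it also handles input such as `","`, which contains no usable role and would otherwise leave an empty list to index into.

```python
def build_role_filter(role):
    """Build the `role` part of a MongoDB query from a single role or a CSV list."""
    if not role:
        return {}
    roles = [r.strip() for r in role.split(",") if r.strip()]
    if not roles:
        # Input like "," or " , " contains no usable role: apply no role filter.
        return {}
    if len(roles) == 1:
        return {"role": roles[0]}
    return {"role": {"$in": roles}}


if __name__ == "__main__":
    print(build_role_filter("linguist,admin"))
    print(build_role_filter("admin"))
    print(build_role_filter(","))
```

The result can be merged into the main query with `query.update(build_role_filter(role))`.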
@ -86,7 +64,6 @@ async def list_users(
is_active=user_doc["is_active"], is_active=user_doc["is_active"],
created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(), created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=user_doc.get("pm_client_ids", []), pm_client_ids=user_doc.get("pm_client_ids", []),
languages=user_doc.get("languages", []),
)) ))
return UserListResponse( return UserListResponse(
@ -97,32 +74,6 @@ async def list_users(
) )
@router.get("/brief-assignees", response_model=list[UserResponse])
async def list_brief_assignees(
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Return users who can be assigned a brief (PM, production, admin). Accessible to all brief-creating roles."""
docs = await db.users.find(
{
"role": {"$in": [UserRole.ADMIN.value, UserRole.PROJECT_MANAGER.value, UserRole.PRODUCTION.value]},
"is_active": True,
},
{"hashed_password": 0},
).sort("full_name", 1).to_list(None)
return [UserResponse(
id=str(d["_id"]),
email=d["email"],
full_name=d["full_name"],
role=d["role"],
auth_provider=d.get("auth_provider", "local"),
is_active=d["is_active"],
created_at=d.get("created_at", datetime.utcnow()).isoformat() if d.get("created_at") else None,
pm_client_ids=d.get("pm_client_ids", []),
languages=d.get("languages", []),
) for d in docs]
@router.get("/users/{user_id}", response_model=UserResponse) @router.get("/users/{user_id}", response_model=UserResponse)
async def get_user( async def get_user(
user_id: str, user_id: str,
@ -146,7 +97,6 @@ async def get_user(
is_active=user_doc["is_active"], is_active=user_doc["is_active"],
created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(), created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=user_doc.get("pm_client_ids", []), pm_client_ids=user_doc.get("pm_client_ids", []),
languages=user_doc.get("languages", []),
) )
@ -200,7 +150,6 @@ async def create_user(
is_active=True, is_active=True,
created_at=user_doc["created_at"].isoformat(), created_at=user_doc["created_at"].isoformat(),
pm_client_ids=[], pm_client_ids=[],
languages=[],
) )
@ -253,7 +202,7 @@ async def update_user(
action = AuditAction.USER_ROLE_CHANGE if user_update.role else AuditAction.USER_UPDATE action = AuditAction.USER_ROLE_CHANGE if user_update.role else AuditAction.USER_UPDATE
await log_user_management( await log_user_management(
action, user_id, current_user, request, action, user_id, current_user, request,
details=dict(user_update.dict(exclude_none=True).items()), details={k: v for k, v in user_update.dict(exclude_none=True).items()},
) )
return UserResponse( return UserResponse(
@ -265,7 +214,6 @@ async def update_user(
is_active=result["is_active"], is_active=result["is_active"],
created_at=result.get("created_at", datetime.utcnow()).isoformat(), created_at=result.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=result.get("pm_client_ids", []), pm_client_ids=result.get("pm_client_ids", []),
languages=result.get("languages", []),
) )
@ -439,7 +387,7 @@ async def detailed_health_check(
try: try:
from ...services.gcs import gcs_service from ...services.gcs import gcs_service
# Simple check to see if bucket is accessible # Simple check to see if bucket is accessible
await gcs_service.file_exists("health_check_dummy") # This will return False but won't error if bucket accessible bucket_exists = await gcs_service.file_exists("health_check_dummy") # This will return False but won't error if bucket accessible
health_status["components"]["gcs"] = {"status": "healthy"} health_status["components"]["gcs"] = {"status": "healthy"}
except Exception as e: except Exception as e:
health_status["components"]["gcs"] = {"status": "unhealthy", "error": str(e)} health_status["components"]["gcs"] = {"status": "unhealthy", "error": str(e)}
@ -596,6 +544,47 @@ async def admin_force_password_reset(
} }
@router.get("/audit-logs")
async def get_audit_logs(
job_id: Optional[str] = Query(None),
action: Optional[str] = Query(None),
days: int = Query(7, ge=1, le=90),
page: int = Query(1, ge=1),
size: int = Query(50, ge=1, le=200),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Get audit logs with filtering (production/admin only)"""
query = {
"when": {"$gte": datetime.utcnow() - timedelta(days=days)}
}
if job_id:
query["job_id"] = job_id
if action:
query["action"] = action
# Get total count
total = await db.audit_logs.count_documents(query)
# Get paginated results
skip = (page - 1) * size
cursor = (
db.audit_logs.find(query)
.sort("when", -1)
.skip(skip)
.limit(size)
)
logs = await cursor.to_list(length=size)
return {
"logs": logs,
"total": total,
"page": page,
"size": size,
"period_days": days
}
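The time-window filter in this handler can be separated from the route for testing. The sketch below mirrors the query construction and adds a small serializer, on the assumption that raw MongoDB documents may carry an `_id` value that is not JSON-serializable by default. `build_audit_query` and `serialize_log` are illustrative names, and the sketch uses timezone-aware timestamps where the handler uses `datetime.utcnow()`.

```python
from datetime import datetime, timedelta, timezone


def build_audit_query(days, job_id=None, action=None, now=None):
    """Build a filter for audit log entries from the last `days` days."""
    now = now or datetime.now(timezone.utc)
    query = {"when": {"$gte": now - timedelta(days=days)}}
    if job_id:
        query["job_id"] = job_id
    if action:
        query["action"] = action
    return query


def serialize_log(doc):
    """Return a copy of a log document with `_id` converted to a string."""
    out = dict(doc)
    if "_id" in out:
        out["_id"] = str(out["_id"])
    return out


if __name__ == "__main__":
    fixed_now = datetime(2026, 1, 8, tzinfo=timezone.utc)
    print(build_audit_query(7, job_id="job-1", now=fixed_now))
    print(serialize_log({"_id": 123, "action": "user.update"}))
```

Passing `now` explicitly keeps the window deterministic in tests.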
@router.post("/maintenance/reprocess-job/{job_id}") @router.post("/maintenance/reprocess-job/{job_id}")
async def reprocess_job( async def reprocess_job(
@ -656,23 +645,23 @@ async def reprocess_job(
@router.get("/audit-logs", response_model=AuditLogResponse) @router.get("/audit-logs", response_model=AuditLogResponse)
async def get_audit_logs_detailed( async def get_audit_logs_detailed(
# Time range # Time range
start_date: datetime | None = Query(None, description="Start date for audit logs"), start_date: Optional[datetime] = Query(None, description="Start date for audit logs"),
end_date: datetime | None = Query(None, description="End date for audit logs"), end_date: Optional[datetime] = Query(None, description="End date for audit logs"),
# Filters # Filters
action: str | None = Query(None, description="Filter by action type"), action: Optional[str] = Query(None, description="Filter by action type"),
severity: str | None = Query(None, description="Filter by severity level"), severity: Optional[str] = Query(None, description="Filter by severity level"),
user_email: str | None = Query(None, description="Filter by user email"), user_email: Optional[str] = Query(None, description="Filter by user email"),
resource_type: str | None = Query(None, description="Filter by resource type"), resource_type: Optional[str] = Query(None, description="Filter by resource type"),
resource_id: str | None = Query(None, description="Filter by resource ID"), resource_id: Optional[str] = Query(None, description="Filter by resource ID"),
success: bool | None = Query(None, description="Filter by success status"), success: Optional[bool] = Query(None, description="Filter by success status"),
# Search # Search
search: str | None = Query(None, description="Search in description and details"), search: Optional[str] = Query(None, description="Search in description and details"),
# Pagination (skip/limit to match frontend AuditLogQuery) # Pagination
skip: int = Query(0, ge=0, description="Number of records to skip"), page: int = Query(1, ge=1, description="Page number"),
limit: int = Query(50, ge=1, le=500, description="Max records to return"), size: int = Query(50, ge=1, le=500, description="Page size"),
# Sorting # Sorting
sort_by: str = Query("timestamp", description="Field to sort by"), sort_by: str = Query("timestamp", description="Field to sort by"),
@ -683,6 +672,25 @@ async def get_audit_logs_detailed(
): ):
"""Get audit logs with filtering and pagination (production/admin only)""" """Get audit logs with filtering and pagination (production/admin only)"""
# Log audit log access
await audit_logger.log_action(
action="admin.audit.access",
description=f"Admin {current_user.email} accessed audit logs",
user=current_user,
request=request,
details={
"filters": {
"start_date": start_date.isoformat() if start_date else None,
"end_date": end_date.isoformat() if end_date else None,
"action": action,
"severity": severity,
"user_email": user_email,
"resource_type": resource_type,
"search": search
}
}
)
# Build query # Build query
query = AuditLogQuery( query = AuditLogQuery(
start_date=start_date, start_date=start_date,
@ -694,8 +702,8 @@ async def get_audit_logs_detailed(
resource_id=resource_id, resource_id=resource_id,
success=success, success=success,
search=search, search=search,
skip=skip, skip=(page - 1) * size,
limit=limit, limit=size,
sort_by=sort_by, sort_by=sort_by,
sort_order=sort_order sort_order=sort_order
) )
@ -708,34 +716,32 @@ async def get_user_audit_logs(
user_id: str, user_id: str,
days: int = Query(30, ge=1, le=365, description="Number of days to look back"), days: int = Query(30, ge=1, le=365, description="Number of days to look back"),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
request: Request = None, request: Request = None,
): ):
"""Get audit logs for a specific user — accepts user ID or email (production/admin only)""" """Get audit logs for a specific user (production/admin only)"""
import re as _re # Validate user_id
try:
# Accept email address: look up user by case-insensitive email match ObjectId(user_id)
resolved_id = user_id except Exception:
if "@" in user_id: raise HTTPException(
user_doc = await db.users.find_one( status_code=status.HTTP_400_BAD_REQUEST,
{"email": _re.compile(f"^{_re.escape(user_id)}$", _re.IGNORECASE)}, detail="Invalid user ID format"
{"_id": 1},
) )
if user_doc:
resolved_id = str(user_doc["_id"])
logs = await audit_logger.get_user_activity(resolved_id, days) # Log access to user audit logs
await audit_logger.log_action(
action="admin.audit.access",
description=f"Admin {current_user.email} accessed user audit logs for {user_id}",
user=current_user,
request=request,
resource_type="user",
resource_id=user_id,
details={"days_requested": days}
)
# Fallback: query by email field in audit logs (case-insensitive via audit_logger) logs = await audit_logger.get_user_activity(user_id, days)
if not logs and "@" in user_id: return {"logs": logs, "user_id": user_id, "days": days}
from ...models.audit_log import AuditLogQuery as ALQ
from ...services.audit_logger import audit_logger as al
q = ALQ(user_email=user_id, limit=1000, sort_by="timestamp", sort_order=-1)
result = await al.query_logs(q)
logs = result.logs
return logs
@router.get("/audit-logs/security") @router.get("/audit-logs/security")
@ -756,7 +762,7 @@ async def get_security_events(
) )
logs = await audit_logger.get_security_events(hours) logs = await audit_logger.get_security_events(hours)
return logs return {"logs": logs, "hours": hours}
@router.delete("/audit-logs/cleanup") @router.delete("/audit-logs/cleanup")
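
The two sides of the `get_audit_logs_detailed` hunk above differ in their pagination contract: one accepts `skip`/`limit` directly, the other accepts a 1-based `page` and a `size` and converts them with `skip=(page - 1) * size`. A minimal sketch of that conversion follows. The helper name `page_to_skip_limit` is hypothetical and does not exist in the codebase; the bounds simply mirror the `ge=1` and `le=500` constraints declared on the query parameters.

```python
def page_to_skip_limit(page: int, size: int) -> tuple[int, int]:
    """Convert 1-based page/size pagination into a (skip, limit) pair.

    Hypothetical helper for illustration; the bounds mirror the Query
    constraints in the diff (page >= 1, 1 <= size <= 500).
    """
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= size <= 500:
        raise ValueError("size must be between 1 and 500")
    # Page 1 starts at offset 0, page 2 at offset `size`, and so on.
    return (page - 1) * size, size
```

A client that already thinks in offsets maps more naturally onto `skip`/`limit`, while `page`/`size` keeps the offset arithmetic on the server; whichever side is kept, frontend and backend need to agree on one contract.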

View file

@ -1,295 +0,0 @@
"""Admin production endpoints: failure dashboard, bulk retry, queue stats, VTT override."""
from datetime import datetime

import redis.asyncio as aioredis
from fastapi import (
    APIRouter,
    Depends,
    File,
    Form,
    HTTPException,
    Query,
    UploadFile,
    status,
)
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel

from ...core.database import get_database
from ...core.dependencies import require_roles
from ...core.logging import get_logger
from ...core.redis import get_redis
from ...models.audit_log import AuditAction
from ...models.job import JobStatus, RequestedOutputs
from ...models.user import User, UserRole
from ...schemas.job import JobResponse
from ...services.audit_logger import audit_logger
from ...services.cloud_run_dispatch import dispatch as _cr_dispatch
from ...services.gcs import upload_vtt_to_gcs

logger = get_logger(__name__)

router = APIRouter(prefix="/admin/production", tags=["admin-production"])

_FAILURE_STATUSES = [
    JobStatus.PROCESSING_FAILED.value,
    JobStatus.TTS_FAILED.value,
    JobStatus.RENDER_FAILED.value,
]
_RETRY_CAP = 50


class BulkRetryRequest(BaseModel):
    job_ids: list[str]
    strategy: str = "auto"  # "auto" | "from_scratch"


class BulkRetryResponse(BaseModel):
    retried: list[str]
    skipped: list[str]
    errors: list[dict]


@router.get("/failures", response_model=list[JobResponse])
async def list_failures(
    step: str | None = Query(None, description="Filter by failure.step"),
    org_id: str | None = Query(None, description="Filter by organization_id"),
    limit: int = Query(50, ge=1, le=200),
    skip: int = Query(0, ge=0),
    current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    """List all jobs in a failed status, optionally filtered by step and org."""
    query: dict = {"status": {"$in": _FAILURE_STATUSES}}
    if step:
        query["failure.step"] = step
    if org_id:
        query["organization_id"] = org_id

    cursor = db.jobs.find(query).sort("updated_at", -1).skip(skip).limit(limit)
    jobs = await cursor.to_list(length=limit)

    return [
        JobResponse(
            id=str(j["_id"]),
            title=j["title"],
            status=j["status"],
            source=j["source"],
            requested_outputs=RequestedOutputs(**j["requested_outputs"]),
            review=j.get("review", {"notes": "", "history": []}),
            outputs=j.get("outputs"),
            created_at=j["created_at"].isoformat(),
            updated_at=j["updated_at"].isoformat(),
        )
        for j in jobs
    ]


@router.post("/bulk-retry", response_model=BulkRetryResponse)
async def bulk_retry(
    payload: BulkRetryRequest,
    current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    """Retry up to 50 failed jobs in one call."""
    if len(payload.job_ids) > _RETRY_CAP:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"Cannot retry more than {_RETRY_CAP} jobs at once",
        )

    retried: list[str] = []
    skipped: list[str] = []
    errors: list[dict] = []
    now = datetime.utcnow()

    for job_id in payload.job_ids:
        try:
            job_doc = await db.jobs.find_one({"_id": job_id})
            if not job_doc:
                skipped.append(job_id)
                continue
            if job_doc["status"] not in _FAILURE_STATUSES:
                skipped.append(job_id)
                continue

            failure = job_doc.get("failure") or {}
            if payload.strategy == "from_scratch":
                step = "ingestion"
            else:
                step = failure.get("step")
                if not step:
                    step = "tts" if job_doc["status"] == JobStatus.TTS_FAILED.value else "render"

            if step in ("ingestion", "ai_processing"):
                reset_status = JobStatus.CREATED.value
            elif step == "translation":
                reset_status = JobStatus.AI_PROCESSING.value
            elif step == "tts":
                src = job_doc["source"].get("language", "en")
                reset_status = (
                    JobStatus.APPROVED_ENGLISH.value if src == "en" else JobStatus.APPROVED_SOURCE.value
                )
            elif step == "render":
                reset_status = JobStatus.PENDING_QC.value
            else:
                skipped.append(job_id)
                continue

            await db.jobs.update_one(
                {"_id": job_id},
                {
                    "$set": {"status": reset_status, "error": None, "updated_at": now},
                    "$inc": {"retry_count": 1},
                    "$push": {
                        "review.history": {
                            "at": now,
                            "status": f"bulk_retry_{step}",
                            "by": str(current_user.id),
                        }
                    },
                },
            )

            if step in ("ingestion", "ai_processing"):
                await _cr_dispatch("ingest", job_id)
            elif step in ("translation", "tts"):
                await _cr_dispatch("translate", job_id)
            elif step == "render":
                lang = job_doc.get("last_render_language", "en")
                await _cr_dispatch("rerender", job_id, language=lang)

            retried.append(job_id)
        except Exception as e:
            logger.error(f"bulk-retry failed for job {job_id}: {e}")
            errors.append({"job_id": job_id, "error": str(e)})

    try:
        await audit_logger.log(
            action=AuditAction.JOB_BULK_RETRY,
            user_id=str(current_user.id),
            user_email=current_user.email,
            user_role=current_user.role.value if current_user.role else None,
            resource_type="job",
            description=f"Bulk retry {len(retried)} jobs (strategy={payload.strategy})",
            details={"retried": retried, "skipped": skipped, "error_count": len(errors)},
        )
    except Exception as e:
        logger.warning(f"Failed to write bulk-retry audit log: {e}")

    return BulkRetryResponse(retried=retried, skipped=skipped, errors=errors)


# ---------------------------------------------------------------------------
# PR-7: Queue depth stats
# ---------------------------------------------------------------------------

_CELERY_QUEUES = ["default", "ingest", "tts", "render", "ffmpeg", "whisper", "notify", "embed"]


class QueueStats(BaseModel):
    queues: dict[str, int]  # queue_name → pending task count
    total_pending: int


@router.get("/queue-stats", response_model=QueueStats)
async def get_queue_stats(
    current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
    redis: aioredis.Redis = Depends(get_redis),
):
    """Return pending task counts per Celery queue (via Redis LLEN)."""
    counts: dict[str, int] = {}
    for q in _CELERY_QUEUES:
        try:
            n = await redis.llen(q)
            counts[q] = n
        except Exception:
            counts[q] = 0
    return QueueStats(queues=counts, total_pending=sum(counts.values()))


# ---------------------------------------------------------------------------
# PR-8: Upload final VTT override — bypass AI, jump to PENDING_QC
# ---------------------------------------------------------------------------

_BYPASSABLE_STATUSES = {
    JobStatus.CREATED.value,
    JobStatus.INGESTING.value,
    JobStatus.AI_PROCESSING.value,
    JobStatus.PROCESSING_FAILED.value,
    JobStatus.TTS_FAILED.value,
    JobStatus.RENDER_FAILED.value,
}


@router.post("/jobs/{job_id}/upload-final-vtt")
async def upload_final_vtt(
    job_id: str,
    language: str = Form(..., description="BCP-47 language code, e.g. 'en' or 'fr'"),
    vtt_file: UploadFile = File(..., description="WebVTT (.vtt) file"),
    vtt_type: str = Form("captions", description="'captions' or 'ad'"),
    current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    """Upload a hand-crafted VTT to override AI output and advance job to PENDING_QC."""
    job_doc = await db.jobs.find_one({"_id": job_id})
    if not job_doc:
        raise HTTPException(status_code=404, detail="Job not found")

    if job_doc["status"] not in _BYPASSABLE_STATUSES:
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail=f"Cannot override VTT when job is in status '{job_doc['status']}'. "
            f"Only allowed in: {sorted(_BYPASSABLE_STATUSES)}",
        )

    if not vtt_file.filename or not vtt_file.filename.endswith(".vtt"):
        raise HTTPException(status_code=400, detail="File must be a .vtt file")

    vtt_content = (await vtt_file.read()).decode("utf-8")
    if not vtt_content.strip().startswith("WEBVTT"):
        raise HTTPException(status_code=400, detail="File does not start with WEBVTT header")

    if vtt_type not in ("captions", "ad"):
        raise HTTPException(status_code=400, detail="vtt_type must be 'captions' or 'ad'")

    lang_key = language.replace("-", "_")
    field = "captions_vtt_gcs" if vtt_type == "captions" else "ad_vtt_gcs"
    gcs_path = f"{job_id}/{lang_key}/{vtt_type}.vtt"
    gcs_uri = await upload_vtt_to_gcs(vtt_content, gcs_path)

    now = datetime.utcnow()
    await db.jobs.update_one(
        {"_id": job_id},
        {
            "$set": {
                f"outputs.{lang_key}.{field}": gcs_uri,
                "status": JobStatus.PENDING_QC.value,
                "updated_at": now,
            },
            "$push": {
                "review.history": {
                    "at": now,
                    "status": "manual_vtt_upload",
                    "by": str(current_user.id),
                    "note": f"Manual {vtt_type} VTT upload for {language} by {current_user.email}",
                }
            },
        },
    )

    try:
        await audit_logger.log(
            action=AuditAction.VTT_EDIT,
            user_id=str(current_user.id),
            user_email=current_user.email,
            user_role=current_user.role.value if current_user.role else None,
            resource_type="job",
            resource_id=job_id,
            description=f"Manual {vtt_type} VTT upload for {language} — job advanced to PENDING_QC",
        )
    except Exception as e:
        logger.warning(f"Failed to write upload-final-vtt audit log: {e}")

    return {"status": "ok", "gcs_uri": gcs_uri, "job_status": JobStatus.PENDING_QC.value}

View file

@ -1,126 +1,112 @@
import re import re
import secrets
from datetime import datetime from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, Request, Response, status from fastapi import APIRouter, Depends, HTTPException, Request, Response, status
from fastapi.security import HTTPBearer from fastapi.security import HTTPBearer
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorClient, AsyncIOMotorDatabase
from ...core.config import settings from ...core.config import settings
from ...core.database import get_database from ...core.database import get_database
from ...core.logging import get_logger
from ...core.security import ( from ...core.security import (
create_access_token, create_access_token,
create_refresh_token, create_refresh_token,
decode_token, decode_token,
verify_password, verify_password,
) )
from ...models.audit_log import AuditAction, AuditLogSeverity from ...models.user import User, AuthProvider, UserRole
from ...models.user import AuthProvider, User, UserRole
from ...schemas.auth import ( from ...schemas.auth import (
LoginRequest, LoginRequest,
LoginResponse, LoginResponse,
LogoutResponse, LogoutResponse,
RefreshResponse,
MicrosoftLoginRequest, MicrosoftLoginRequest,
MicrosoftLoginResponse, MicrosoftLoginResponse,
RefreshResponse,
) )
from ...services.audit_logger import audit_logger, log_auth_failure, log_auth_success
from ...services.microsoft_auth import ( from ...services.microsoft_auth import (
MicrosoftAuthError,
MicrosoftTokenValidationError,
get_microsoft_auth_service, get_microsoft_auth_service,
MicrosoftTokenValidationError,
MicrosoftAuthError,
) )
from ...services.audit_logger import log_auth_success, log_auth_failure, audit_logger
from ...models.audit_log import AuditAction, AuditLogSeverity
logger = get_logger(__name__)
router = APIRouter(prefix="/auth", tags=["auth"]) router = APIRouter(prefix="/auth", tags=["auth"])
security = HTTPBearer() security = HTTPBearer()
async def _get_user_org_ids(user_id: str, db: AsyncIOMotorDatabase) -> list[str]:
"""Return list of org IDs the user belongs to — used as a JWT hint only."""
cursor = db.memberships.find({"user_id": user_id}, {"organization_id": 1})
memberships = await cursor.to_list(length=200)
return [str(m["organization_id"]) for m in memberships if m.get("organization_id")]
def _set_auth_cookies(response: Response, refresh_token: str) -> str:
"""Set httponly refresh_token cookie and readable csrf_token cookie. Returns the csrf token."""
csrf_token = secrets.token_hex(32)
ttl = settings.jwt_refresh_ttl_days * 24 * 60 * 60
domain = settings.cookie_domain if settings.app_env == "prod" else None
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=domain,
max_age=ttl,
)
response.set_cookie(
key="csrf_token",
value=csrf_token,
httponly=False, # JS-readable for Double Submit Cookie pattern
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=domain,
max_age=ttl,
)
return csrf_token
@router.post("/login", response_model=LoginResponse) @router.post("/login", response_model=LoginResponse)
async def login( async def login(
login_data: LoginRequest, login_data: LoginRequest,
request: Request, request: Request,
response: Response, response: Response,
db: AsyncIOMotorDatabase = Depends(get_database),
): ):
user_doc = await db.users.find_one({"email": login_data.email}) print(f"LOGIN: Starting login for {login_data.email}")
if not user_doc: # Create database connection directly (bypass dependency injection issues)
await log_auth_failure(login_data.email, request, "User not found") client = AsyncIOMotorClient(settings.mongodb_uri)
raise HTTPException( db = client[settings.mongodb_db]
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password", try:
print("LOGIN: Database connection created")
# Find user by email
print("LOGIN: Looking up user in database")
user_doc = await db.users.find_one({"email": login_data.email})
print(f"LOGIN: User lookup complete, found: {user_doc is not None}")
if not user_doc:
await log_auth_failure(login_data.email, request, "User not found")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
user = User(**user_doc)
# Check if user uses Microsoft authentication
if user.auth_provider == AuthProvider.MICROSOFT:
await log_auth_failure(login_data.email, request, "Account uses Microsoft SSO")
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="This account uses Microsoft authentication. Please sign in with Microsoft.",
)
# Verify password
if not user.hashed_password or not verify_password(login_data.password, user.hashed_password):
await log_auth_failure(login_data.email, request, "Invalid password")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
if not user.is_active:
await log_auth_failure(login_data.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
# Create tokens
access_token = create_access_token(subject=str(user.id))
refresh_token = create_refresh_token(subject=str(user.id))
# Set refresh token as HttpOnly cookie
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
) )
user = User(**user_doc) await log_auth_success(user, request)
return LoginResponse(
if user.auth_provider == AuthProvider.MICROSOFT: access_token=access_token,
await log_auth_failure(login_data.email, request, "Account uses Microsoft SSO") user_id=str(user.id),
raise HTTPException( role=user.role,
status_code=status.HTTP_400_BAD_REQUEST,
detail="This account uses Microsoft authentication. Please sign in with Microsoft.",
) )
if not user.hashed_password or not verify_password(login_data.password, user.hashed_password): finally:
await log_auth_failure(login_data.email, request, "Invalid password") # Close database connection
raise HTTPException( client.close()
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
if not user.is_active:
await log_auth_failure(login_data.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
org_ids = await _get_user_org_ids(str(user.id), db)
access_token = create_access_token(subject=str(user.id), org_ids=org_ids)
refresh_token = create_refresh_token(subject=str(user.id))
_set_auth_cookies(response, refresh_token)
await log_auth_success(user, request)
return LoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role,
)
@router.post("/microsoft", response_model=MicrosoftLoginResponse) @router.post("/microsoft", response_model=MicrosoftLoginResponse)
@ -128,84 +114,127 @@ async def microsoft_login(
login_data: MicrosoftLoginRequest, login_data: MicrosoftLoginRequest,
request: Request, request: Request,
response: Response, response: Response,
db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""Authenticate user with Microsoft ID token. """Authenticate user with Microsoft ID token.
This endpoint validates the Microsoft ID token, finds or creates the user, This endpoint validates the Microsoft ID token, finds or creates the user,
and returns JWT tokens for API access. and returns JWT tokens for API access.
""" """
microsoft_auth = get_microsoft_auth_service() print(f"MICROSOFT LOGIN: Starting Microsoft authentication")
# Create database connection
client = AsyncIOMotorClient(settings.mongodb_uri)
db = client[settings.mongodb_db]
try: try:
user_info = await microsoft_auth.validate_token(login_data.id_token) # Validate Microsoft token
except MicrosoftTokenValidationError as e: microsoft_auth = get_microsoft_auth_service()
await log_auth_failure(login_data.id_token[:20] + "", request, f"MS token invalid: {e}") try:
raise HTTPException( user_info = microsoft_auth.validate_token(login_data.id_token)
status_code=status.HTTP_401_UNAUTHORIZED, print(f"MICROSOFT LOGIN: Token validated for {user_info.email}")
detail=f"Microsoft authentication failed: {str(e)}", except MicrosoftTokenValidationError as e:
) from None print(f"MICROSOFT LOGIN ERROR: Token validation failed: {e}")
except MicrosoftAuthError as e: await log_auth_failure(login_data.id_token[:20] + "", request, f"MS token invalid: {e}")
await log_auth_failure("microsoft-sso", request, f"MS auth service error: {e}") raise HTTPException(
raise HTTPException( status_code=status.HTTP_401_UNAUTHORIZED,
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=f"Microsoft authentication failed: {str(e)}",
detail="Microsoft authentication service error", )
) from None except MicrosoftAuthError as e:
print(f"MICROSOFT LOGIN ERROR: Authentication error: {e}")
# Look up by Microsoft-derived ID first — handles email casing changes across logins await log_auth_failure("microsoft-sso", request, f"MS auth service error: {e}")
ms_user_id = f"ms-{user_info.sub[:20]}" raise HTTPException(
user_doc = await db.users.find_one({"_id": ms_user_id}) status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
if not user_doc: detail="Microsoft authentication service error",
# Fall back to case-insensitive email lookup (handles local-to-Microsoft migration)
user_doc = await db.users.find_one(
{"email": {"$regex": f"^{re.escape(user_info.email)}$", "$options": "i"}}
)
if user_doc:
user = User(**user_doc)
if user.auth_provider == AuthProvider.LOCAL:
await db.users.update_one(
{"_id": user_doc["_id"]},
{"$set": {"auth_provider": AuthProvider.MICROSOFT.value, "updated_at": datetime.utcnow()}},
) )
user.auth_provider = AuthProvider.MICROSOFT
else:
new_user = {
"_id": ms_user_id,
"email": user_info.email,
"full_name": user_info.name,
"hashed_password": None,
"role": UserRole.CLIENT.value,
"auth_provider": AuthProvider.MICROSOFT.value,
"is_active": True,
"pm_client_ids": [],
"created_at": datetime.utcnow(),
"updated_at": datetime.utcnow(),
}
await db.users.insert_one(new_user)
user = User(**new_user)
if not user.is_active: # Find or create user
await log_auth_failure(user.email, request, "Account disabled") # Look up by Microsoft-derived ID first — handles email casing changes across logins
raise HTTPException( # (Microsoft can return vadymsamoilenko@... vs VadymSamoilenko@... for the same user)
status_code=status.HTTP_401_UNAUTHORIZED, ms_user_id = f"ms-{user_info.sub[:20]}"
detail="User account is disabled", user_doc = await db.users.find_one({"_id": ms_user_id})
if not user_doc:
# Fall back to case-insensitive email lookup (handles local-to-Microsoft migration)
user_doc = await db.users.find_one(
{"email": {"$regex": f"^{re.escape(user_info.email)}$", "$options": "i"}}
)
if user_doc:
# User exists
user = User(**user_doc)
print(f"MICROSOFT LOGIN: Existing user found: {user.id}")
# Update auth_provider if user is switching from local to Microsoft
if user.auth_provider == AuthProvider.LOCAL:
print(f"MICROSOFT LOGIN: Updating user to Microsoft auth provider")
await db.users.update_one(
{"_id": user_doc["_id"]},
{
"$set": {
"auth_provider": AuthProvider.MICROSOFT.value,
"updated_at": datetime.utcnow()
}
}
)
user.auth_provider = AuthProvider.MICROSOFT
else:
# Create new user with zero org memberships (SaaS model).
# They will see a "no access" landing until an admin invites them.
print(f"MICROSOFT LOGIN: Creating new user for {user_info.email}")
new_user = {
"_id": ms_user_id,
"email": user_info.email,
"full_name": user_info.name,
"hashed_password": None,
"role": UserRole.CLIENT.value,
"auth_provider": AuthProvider.MICROSOFT.value,
"is_active": True,
"pm_client_ids": [],
"created_at": datetime.utcnow(),
"updated_at": datetime.utcnow(),
}
await db.users.insert_one(new_user)
user = User(**new_user)
print(f"MICROSOFT LOGIN: New user created (zero memberships): {user.id}")
# Check if user is active
if not user.is_active:
await log_auth_failure(user.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
# Create JWT tokens
access_token = create_access_token(subject=str(user.id))
refresh_token = create_refresh_token(subject=str(user.id))
# Set refresh token as HttpOnly cookie
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
) )
org_ids = await _get_user_org_ids(str(user.id), db) print(f"MICROSOFT LOGIN: Authentication successful for {user.email}")
access_token = create_access_token(subject=str(user.id), org_ids=org_ids) await log_auth_success(user, request)
refresh_token = create_refresh_token(subject=str(user.id)) return MicrosoftLoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role if isinstance(user.role, str) else user.role.value,
email=user.email,
full_name=user.full_name,
auth_provider=user.auth_provider,
)
_set_auth_cookies(response, refresh_token) finally:
# Close database connection
await log_auth_success(user, request) client.close()
return MicrosoftLoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role if isinstance(user.role, str) else user.role.value,
email=user.email,
full_name=user.full_name,
auth_provider=user.auth_provider,
)
@router.post("/refresh", response_model=RefreshResponse) @router.post("/refresh", response_model=RefreshResponse)
@ -215,32 +244,29 @@ async def refresh_token(
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
refresh_token = request.cookies.get("refresh_token") refresh_token = request.cookies.get("refresh_token")
print(f"🔍 REFRESH DEBUG: Cookie exists: {bool(refresh_token)}")
if not refresh_token: if not refresh_token:
print("🚨 REFRESH ERROR: No refresh token in cookies")
raise HTTPException( raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, status_code=status.HTTP_401_UNAUTHORIZED,
detail="Refresh token not found", detail="Refresh token not found",
) )
# CSRF protection: Double Submit Cookie pattern
csrf_cookie = request.cookies.get("csrf_token")
csrf_header = request.headers.get("X-CSRF-Token")
if csrf_cookie and (not csrf_header or csrf_header != csrf_cookie):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="CSRF token mismatch",
)
try: try:
print(f"🔍 REFRESH DEBUG: Attempting to decode token...")
payload = decode_token(refresh_token) payload = decode_token(refresh_token)
print(f"🔍 REFRESH DEBUG: Token decoded successfully, type={payload.get('type')}")
if payload.get("type") != "refresh": if payload.get("type") != "refresh":
print(f"🚨 REFRESH ERROR: Wrong token type: {payload.get('type')}")
raise HTTPException( raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid token type", detail="Invalid token type",
) )
user_id = payload.get("sub") user_id = payload.get("sub")
print(f"🔍 REFRESH DEBUG: User ID from token: {user_id}")
if not user_id: if not user_id:
raise HTTPException( raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, status_code=status.HTTP_401_UNAUTHORIZED,
@ -262,15 +288,22 @@ async def refresh_token(
detail="User account is disabled", detail="User account is disabled",
) )
# Create new tokens (include org_ids claim for prefilter hint) # Create new tokens
_org_ids = await _get_user_org_ids(user_id, db) new_access_token = create_access_token(subject=user_id)
new_access_token = create_access_token(subject=user_id, org_ids=_org_ids)
new_refresh_token = create_refresh_token(subject=user_id) new_refresh_token = create_refresh_token(subject=user_id)
# Rotate both refresh and CSRF cookies # Update refresh token cookie
_set_auth_cookies(response, new_refresh_token) response.set_cookie(
key="refresh_token",
value=new_refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
)
logger.info("Token refresh successful for user %s", user_id) print(f"🔍 REFRESH DEBUG: Refresh successful for user {user_id}")
return RefreshResponse( return RefreshResponse(
access_token=new_access_token, access_token=new_access_token,
user_id=user_id, user_id=user_id,
@ -279,15 +312,14 @@ async def refresh_token(
full_name=user.full_name full_name=user.full_name
) )
except HTTPException:
raise
except Exception as e: except Exception as e:
print(f"🚨 REFRESH ERROR: Exception during refresh: {type(e).__name__}: {e}")
import traceback import traceback
logger.exception("Refresh token error: %s\n%s", type(e).__name__, traceback.format_exc()) print(f"Traceback:\n{traceback.format_exc()}")
raise HTTPException( raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid refresh token", detail=f"Invalid refresh token: {str(e)}",
) from None )
@router.post("/logout", response_model=LogoutResponse) @router.post("/logout", response_model=LogoutResponse)

View file

@ -1,245 +0,0 @@
"""Job Brief CRUD endpoints."""
from datetime import datetime

from fastapi import APIRouter, Depends, HTTPException, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase

from ...core.authz import MembershipContext, assert_user_in_org, get_membership_context
from ...core.database import get_database
from ...core.logging import get_logger
from ...models.audit_log import AuditAction
from ...models.job_brief import (
    BriefStatus,
    JobBriefCreate,
    JobBriefResponse,
    JobBriefUpdate,
)
from ...models.organization import OrgRole
from ...services.audit_logger import audit_logger

logger = get_logger(__name__)

router = APIRouter(prefix="/briefs", tags=["briefs"])


def _doc_to_response(doc: dict) -> JobBriefResponse:
    return JobBriefResponse(
        id=str(doc["_id"]),
        organization_id=doc["organization_id"],
        project_id=doc.get("project_id"),
        title=doc["title"],
        description=doc.get("description"),
        requested_outputs=doc["requested_outputs"],
        languages=doc.get("languages", []),
        deadline=doc.get("deadline"),
        status=doc["status"],
        created_by=doc["created_by"],
        assignee_id=doc.get("assignee_id"),
        job_id=doc.get("job_id"),
        created_at=doc["created_at"].isoformat(),
        updated_at=doc["updated_at"].isoformat(),
        submitted_at=doc["submitted_at"].isoformat() if doc.get("submitted_at") else None,
        approved_by=doc.get("approved_by"),
    )


@router.get("", response_model=list[JobBriefResponse])
async def list_briefs(
    ctx: MembershipContext = Depends(get_membership_context),
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    org_ids = [m.organization_id for m in ctx.memberships] if hasattr(ctx, "memberships") else []
    if ctx.is_platform_admin:
        query: dict = {}
    elif org_ids:
        query = {"organization_id": {"$in": org_ids}}
    else:
        raise HTTPException(status_code=403, detail="No org memberships")

    cursor = db.job_briefs.find(query).sort("created_at", -1).limit(100)
    docs = await cursor.to_list(length=100)
    return [_doc_to_response(d) for d in docs]


@router.post("", response_model=JobBriefResponse, status_code=status.HTTP_201_CREATED)
async def create_brief(
    payload: JobBriefCreate,
    http_request: Request,
    ctx: MembershipContext = Depends(get_membership_context),
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    # Resolve org from project if not directly identifiable
    org_id: str | None = None
    if payload.project_id:
        project = await db.projects.find_one({"_id": payload.project_id}, {"client_id": 1})
        if project:
            org_id = project.get("client_id")

    if not org_id:
        # Use first membership org if user has only one (or admin)
        if ctx.is_platform_admin:
            raise HTTPException(status_code=400, detail="Admin must supply project_id or org_id cannot be inferred")
        memberships = [m for m in (ctx.memberships if hasattr(ctx, "memberships") else [])
                       if ctx.can_access_org(m.organization_id, OrgRole.MANAGER)]
        if len(memberships) == 1:
            org_id = memberships[0].organization_id
        else:
            raise HTTPException(status_code=400, detail="Cannot infer organization; supply project_id")

    assert_user_in_org(ctx, org_id, OrgRole.MANAGER)

    now = datetime.utcnow()
    doc = {
        "_id": f"brief_{now.strftime('%Y%m%d%H%M%S%f')}_{str(ctx.user.id)[-6:]}",
        "organization_id": org_id,
        "project_id": payload.project_id,
        "title": payload.title,
        "description": payload.description,
        "requested_outputs": payload.requested_outputs.model_dump(),
        "languages": payload.languages,
        "deadline": payload.deadline,
        "assignee_id": payload.assignee_id,
        "status": BriefStatus.DRAFT.value,
        "created_by": str(ctx.user.id),
        "job_id": None,
        "created_at": now,
        "updated_at": now,
        "submitted_at": None,
        "approved_by": None,
    }
    await db.job_briefs.insert_one(doc)

    await audit_logger.log_action(
        action=AuditAction.BRIEF_CREATE,
        description=f"Brief '{payload.title}' created",
        user=ctx.user,
        request=http_request,
        resource_type="brief",
        resource_id=str(doc["_id"]),
        details={"title": payload.title, "organization_id": org_id},
    )
    return _doc_to_response(doc)
@router.get("/{brief_id}", response_model=JobBriefResponse)
async def get_brief(
brief_id: str,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.VIEWER)
return _doc_to_response(doc)
@router.patch("/{brief_id}", response_model=JobBriefResponse)
async def update_brief(
brief_id: str,
payload: JobBriefUpdate,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.MANAGER)
if doc["status"] != BriefStatus.DRAFT.value:
raise HTTPException(status_code=400, detail="Only DRAFT briefs can be updated")
updates: dict = {"updated_at": datetime.utcnow()}
if payload.title is not None:
updates["title"] = payload.title
if payload.description is not None:
updates["description"] = payload.description
if payload.requested_outputs is not None:
updates["requested_outputs"] = payload.requested_outputs.model_dump()
if payload.languages is not None:
updates["languages"] = payload.languages
if payload.deadline is not None:
updates["deadline"] = payload.deadline
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{"$set": updates},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_UPDATE,
description=f"Brief '{brief_id}' updated",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"fields_updated": list(updates.keys())},
)
return _doc_to_response(result)
@router.post("/{brief_id}/submit", response_model=JobBriefResponse)
async def submit_brief(
brief_id: str,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.MANAGER)
if doc["status"] != BriefStatus.DRAFT.value:
raise HTTPException(status_code=400, detail="Only DRAFT briefs can be submitted")
now = datetime.utcnow()
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{"$set": {"status": BriefStatus.SUBMITTED.value, "submitted_at": now, "updated_at": now}},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_SUBMIT,
description=f"Brief '{brief_id}' submitted for review",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"organization_id": result.get("organization_id")},
)
return _doc_to_response(result)
@router.post("/{brief_id}/approve", response_model=JobBriefResponse)
async def approve_brief(
brief_id: str,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.ADMIN)
if doc["status"] != BriefStatus.SUBMITTED.value:
raise HTTPException(status_code=400, detail="Only SUBMITTED briefs can be approved")
now = datetime.utcnow()
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{
"$set": {
"status": BriefStatus.APPROVED.value,
"approved_by": str(ctx.user.id),
"updated_at": now,
}
},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_APPROVE,
description=f"Brief '{brief_id}' approved",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"organization_id": result.get("organization_id")},
)
return _doc_to_response(result)
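
The mutating endpoints in this file enforce a small lifecycle: only DRAFT briefs can be updated or submitted, and only SUBMITTED briefs can be approved. A minimal sketch of that rule as a transition table is given below. The lowercase enum values, the `ALLOWED` table, and the `next_status` helper are assumptions made for illustration; the diff shows only the member names of `BriefStatus`, and the module itself checks the status inline in each handler.

```python
from enum import Enum


class BriefStatus(str, Enum):
    # Values are assumed; the diff only shows the member names.
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"


# Hypothetical table: action -> (required current status, resulting status).
ALLOWED = {
    "submit": (BriefStatus.DRAFT, BriefStatus.SUBMITTED),
    "approve": (BriefStatus.SUBMITTED, BriefStatus.APPROVED),
}


def next_status(current: BriefStatus, action: str) -> BriefStatus:
    """Return the status after `action`, or raise ValueError if not allowed."""
    if action not in ALLOWED:
        raise ValueError(f"Unknown action: {action}")
    required, result = ALLOWED[action]
    if current != required:
        raise ValueError(f"Action '{action}' requires status {required.name}")
    return result
```

Keeping the rule in one table would make it harder for the submit and approve handlers to drift apart, though the inline checks in the handlers express the same constraint.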

View file

@ -9,16 +9,15 @@ Access rules:
- List projects (read) Admin, PM, or any team member of the client - List projects (read) Admin, PM, or any team member of the client
""" """
from datetime import UTC, datetime from datetime import datetime, timezone
from bson import ObjectId from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException, Request from fastapi import APIRouter, Depends, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel from pydantic import BaseModel
from ...core.database import get_database from ...core.database import get_database
from ...core.dependencies import get_current_user, require_roles from ...core.dependencies import get_current_user, require_pm_for_client, require_roles
from ...models.audit_log import AuditAction
from ...models.client import ( from ...models.client import (
Client, Client,
ClientCreate, ClientCreate,
@ -31,7 +30,6 @@ from ...models.client import (
TeamUpdate, TeamUpdate,
) )
from ...models.user import User, UserRole from ...models.user import User, UserRole
from ...services.audit_logger import audit_logger
router = APIRouter(prefix="/clients", tags=["clients"]) router = APIRouter(prefix="/clients", tags=["clients"])
@ -41,7 +39,7 @@ router = APIRouter(prefix="/clients", tags=["clients"])
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
def _now() -> datetime: def _now() -> datetime:
return datetime.now(UTC) return datetime.now(timezone.utc)
async def _get_client_or_404(client_id: str, db: AsyncIOMotorDatabase) -> dict: async def _get_client_or_404(client_id: str, db: AsyncIOMotorDatabase) -> dict:
@ -93,9 +91,6 @@ def _project_from_doc(doc: dict) -> Project:
name=doc["name"], name=doc["name"],
client_id=doc["client_id"], client_id=doc["client_id"],
is_active=doc.get("is_active", True), is_active=doc.get("is_active", True),
default_languages=doc.get("default_languages", []),
default_linguist_id=doc.get("default_linguist_id"),
default_reviewer_id=doc.get("default_reviewer_id"),
created_at=doc.get("created_at"), created_at=doc.get("created_at"),
updated_at=doc.get("updated_at"), updated_at=doc.get("updated_at"),
) )
@ -123,7 +118,6 @@ async def list_clients(
@router.post("", response_model=Client) @router.post("", response_model=Client)
async def create_client( async def create_client(
body: ClientCreate, body: ClientCreate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
@ -140,18 +134,7 @@ async def create_client(
"updated_at": now, "updated_at": now,
}) })
doc = await db.clients.find_one({"_id": client_id}) doc = await db.clients.find_one({"_id": client_id})
client = _client_from_doc(doc) return _client_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_CREATE,
description=f"Client '{client.name}' created",
user=current_user,
request=request,
resource_type="client",
resource_id=str(client.id),
resource_name=client.name,
details={"slug": client.slug},
)
return client
@router.get("/{client_id}", response_model=Client) @router.get("/{client_id}", response_model=Client)
@ -172,12 +155,11 @@ async def get_client(
async def update_client( async def update_client(
client_id: str, client_id: str,
body: ClientUpdate, body: ClientUpdate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
update: dict = dict(body.model_dump(exclude_none=True).items()) update: dict = {k: v for k, v in body.model_dump(exclude_none=True).items()}
if not update: if not update:
raise HTTPException(status_code=422, detail="No fields to update") raise HTTPException(status_code=422, detail="No fields to update")
if "slug" in update and await db.clients.find_one({"slug": update["slug"], "_id": {"$ne": client_id}}): if "slug" in update and await db.clients.find_one({"slug": update["slug"], "_id": {"$ne": client_id}}):
@ -185,39 +167,17 @@ async def update_client(
update["updated_at"] = _now() update["updated_at"] = _now()
await db.clients.update_one({"_id": client_id}, {"$set": update}) await db.clients.update_one({"_id": client_id}, {"$set": update})
doc = await db.clients.find_one({"_id": client_id}) doc = await db.clients.find_one({"_id": client_id})
client = _client_from_doc(doc) return _client_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_UPDATE,
description=f"Client '{client.name}' updated",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client.name,
details={"fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return client
@router.delete("/{client_id}", status_code=204) @router.delete("/{client_id}", status_code=204)
async def deactivate_client( async def deactivate_client(
client_id: str, client_id: str,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await db.clients.update_one({"_id": client_id}, {"$set": {"is_active": False, "updated_at": _now()}}) await db.clients.update_one({"_id": client_id}, {"$set": {"is_active": False, "updated_at": _now()}})
await audit_logger.log_action(
action=AuditAction.CLIENT_DEACTIVATE,
description=f"Client '{doc['name']}' deactivated",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=doc["name"],
details={"was_active": doc.get("is_active", True)},
)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@ -232,11 +192,10 @@ class AssignPMRequest(BaseModel):
async def assign_pm( async def assign_pm(
client_id: str, client_id: str,
body: AssignPMRequest, body: AssignPMRequest,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
user_doc = await db.users.find_one({"_id": body.user_id}) user_doc = await db.users.find_one({"_id": body.user_id})
if not user_doc: if not user_doc:
raise HTTPException(status_code=404, detail="User not found") raise HTTPException(status_code=404, detail="User not found")
@ -247,28 +206,16 @@ async def assign_pm(
"$set": {"role": UserRole.PROJECT_MANAGER.value, "updated_at": _now()}, "$set": {"role": UserRole.PROJECT_MANAGER.value, "updated_at": _now()},
}, },
) )
await audit_logger.log_action(
action=AuditAction.CLIENT_PM_ASSIGN,
description=f"PM '{user_doc.get('email', body.user_id)}' assigned to client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"pm_user_id": body.user_id, "pm_email": user_doc.get("email")},
)
@router.delete("/{client_id}/pm/{user_id}", status_code=204) @router.delete("/{client_id}/pm/{user_id}", status_code=204)
async def remove_pm( async def remove_pm(
client_id: str, client_id: str,
user_id: str, user_id: str,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
pm_doc = await db.users.find_one({"_id": user_id})
await db.users.update_one( await db.users.update_one(
{"_id": user_id}, {"_id": user_id},
{"$pull": {"pm_client_ids": client_id}, "$set": {"updated_at": _now()}}, {"$pull": {"pm_client_ids": client_id}, "$set": {"updated_at": _now()}},
@ -280,16 +227,6 @@ async def remove_pm(
{"_id": user_id}, {"_id": user_id},
{"$set": {"role": UserRole.CLIENT.value, "updated_at": _now()}}, {"$set": {"role": UserRole.CLIENT.value, "updated_at": _now()}},
) )
await audit_logger.log_action(
action=AuditAction.CLIENT_PM_REMOVE,
description=f"PM '{pm_doc.get('email', user_id) if pm_doc else user_id}' removed from client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"pm_user_id": user_id, "pm_email": pm_doc.get("email") if pm_doc else None},
)
@router.get("/{client_id}/pm", response_model=list[dict]) @router.get("/{client_id}/pm", response_model=list[dict])
@ -326,11 +263,10 @@ async def list_teams(
async def create_team( async def create_team(
client_id: str, client_id: str,
body: TeamCreate, body: TeamCreate,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
now = _now() now = _now()
team_id = str(ObjectId()) team_id = str(ObjectId())
@ -343,18 +279,7 @@ async def create_team(
"updated_at": now, "updated_at": now,
}) })
doc = await db.teams.find_one({"_id": team_id}) doc = await db.teams.find_one({"_id": team_id})
team = _team_from_doc(doc) return _team_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_CREATE,
description=f"Team '{team.name}' created for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team.name},
)
return team
@router.patch("/{client_id}/teams/{team_id}", response_model=Team) @router.patch("/{client_id}/teams/{team_id}", response_model=Team)
@ -362,55 +287,32 @@ async def update_team(
client_id: str, client_id: str,
team_id: str, team_id: str,
body: TeamUpdate, body: TeamUpdate,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
await _get_team_or_404(team_id, client_id, db) await _get_team_or_404(team_id, client_id, db)
update = dict(body.model_dump(exclude_none=True).items()) update = {k: v for k, v in body.model_dump(exclude_none=True).items()}
if not update: if not update:
raise HTTPException(status_code=422, detail="No fields to update") raise HTTPException(status_code=422, detail="No fields to update")
update["updated_at"] = _now() update["updated_at"] = _now()
await db.teams.update_one({"_id": team_id}, {"$set": update}) await db.teams.update_one({"_id": team_id}, {"$set": update})
doc = await db.teams.find_one({"_id": team_id}) doc = await db.teams.find_one({"_id": team_id})
team = _team_from_doc(doc) return _team_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_UPDATE,
description=f"Team '{team.name}' updated for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team.name, "fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return team
@router.delete("/{client_id}/teams/{team_id}", status_code=204) @router.delete("/{client_id}/teams/{team_id}", status_code=204)
async def delete_team( async def delete_team(
client_id: str, client_id: str,
team_id: str, team_id: str,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
team_doc = await _get_team_or_404(team_id, client_id, db) await _get_team_or_404(team_id, client_id, db)
await db.teams.delete_one({"_id": team_id}) await db.teams.delete_one({"_id": team_id})
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_DELETE,
description=f"Team '{team_doc['name']}' deleted from client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"]},
)
# Team membership # Team membership
@ -424,35 +326,18 @@ async def add_team_member(
client_id: str, client_id: str,
team_id: str, team_id: str,
body: AddMemberRequest, body: AddMemberRequest,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
team_doc = await _get_team_or_404(team_id, client_id, db) await _get_team_or_404(team_id, client_id, db)
member_doc = await db.users.find_one({"_id": body.user_id}) if not await db.users.find_one({"_id": body.user_id}):
if not member_doc:
raise HTTPException(status_code=404, detail="User not found") raise HTTPException(status_code=404, detail="User not found")
# Write to both Team.member_user_ids (legacy) and Membership.team_ids (MT-17)
await db.teams.update_one( await db.teams.update_one(
{"_id": team_id}, {"_id": team_id},
{"$addToSet": {"member_user_ids": body.user_id}, "$set": {"updated_at": _now()}}, {"$addToSet": {"member_user_ids": body.user_id}, "$set": {"updated_at": _now()}},
) )
await db.memberships.update_one(
{"user_id": body.user_id, "organization_id": client_id},
{"$addToSet": {"team_ids": team_id}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_MEMBER_ADD,
description=f"User '{member_doc.get('email', body.user_id)}' added to team '{team_doc['name']}' of client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"], "member_user_id": body.user_id, "member_email": member_doc.get("email")},
)
@router.delete("/{client_id}/teams/{team_id}/members/{user_id}", status_code=204) @router.delete("/{client_id}/teams/{team_id}/members/{user_id}", status_code=204)
@ -460,56 +345,22 @@ async def remove_team_member(
client_id: str, client_id: str,
team_id: str, team_id: str,
user_id: str, user_id: str,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
team_doc = await _get_team_or_404(team_id, client_id, db) await _get_team_or_404(team_id, client_id, db)
member_doc = await db.users.find_one({"_id": user_id})
await db.teams.update_one( await db.teams.update_one(
{"_id": team_id}, {"_id": team_id},
{"$pull": {"member_user_ids": user_id}, "$set": {"updated_at": _now()}}, {"$pull": {"member_user_ids": user_id}, "$set": {"updated_at": _now()}},
) )
await db.memberships.update_one(
{"user_id": user_id, "organization_id": client_id},
{"$pull": {"team_ids": team_id}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_MEMBER_REMOVE,
description=f"User '{member_doc.get('email', user_id) if member_doc else user_id}' removed from team '{team_doc['name']}' of client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"], "member_user_id": user_id, "member_email": member_doc.get("email") if member_doc else None},
)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Project endpoints # Project endpoints
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@router.get("/all-projects", response_model=list[Project])
async def list_all_projects(
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Return all active projects accessible to the current user (across all clients)."""
if current_user.role in (UserRole.ADMIN, UserRole.PRODUCTION, UserRole.PROJECT_MANAGER):
docs = await db.projects.find({"is_active": True}).to_list(None)
else:
accessible_client_ids = await _get_accessible_client_ids(current_user, db)
if not accessible_client_ids:
return []
docs = await db.projects.find(
{"client_id": {"$in": accessible_client_ids}, "is_active": True}
).to_list(None)
return [_project_from_doc(d) for d in docs]
@router.get("/{client_id}/projects", response_model=list[Project]) @router.get("/{client_id}/projects", response_model=list[Project])
async def list_projects( async def list_projects(
client_id: str, client_id: str,
@ -526,12 +377,11 @@ async def list_projects(
async def create_project( async def create_project(
client_id: str, client_id: str,
body: ProjectCreate, body: ProjectCreate,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_client_member(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
now = _now() now = _now()
project_id = str(ObjectId()) project_id = str(ObjectId())
await db.projects.insert_one({ await db.projects.insert_one({
@ -539,25 +389,11 @@ async def create_project(
"name": body.name, "name": body.name,
"client_id": client_id, "client_id": client_id,
"is_active": True, "is_active": True,
"default_languages": body.default_languages,
"default_linguist_id": body.default_linguist_id,
"default_reviewer_id": body.default_reviewer_id,
"created_at": now, "created_at": now,
"updated_at": now, "updated_at": now,
}) })
doc = await db.projects.find_one({"_id": project_id}) doc = await db.projects.find_one({"_id": project_id})
project = _project_from_doc(doc) return _project_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_CREATE,
description=f"Project '{project.name}' created for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project.name, "default_languages": body.default_languages},
)
return project
@router.patch("/{client_id}/projects/{project_id}", response_model=Project) @router.patch("/{client_id}/projects/{project_id}", response_model=Project)
@ -565,58 +401,35 @@ async def update_project(
client_id: str, client_id: str,
project_id: str, project_id: str,
body: ProjectUpdate, body: ProjectUpdate,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
await _get_project_or_404(project_id, client_id, db) await _get_project_or_404(project_id, client_id, db)
update = dict(body.model_dump(exclude_none=True).items()) update = {k: v for k, v in body.model_dump(exclude_none=True).items()}
if not update: if not update:
raise HTTPException(status_code=422, detail="No fields to update") raise HTTPException(status_code=422, detail="No fields to update")
update["updated_at"] = _now() update["updated_at"] = _now()
await db.projects.update_one({"_id": project_id}, {"$set": update}) await db.projects.update_one({"_id": project_id}, {"$set": update})
doc = await db.projects.find_one({"_id": project_id}) doc = await db.projects.find_one({"_id": project_id})
project = _project_from_doc(doc) return _project_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_UPDATE,
description=f"Project '{project.name}' updated for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project.name, "fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return project
@router.delete("/{client_id}/projects/{project_id}", status_code=204) @router.delete("/{client_id}/projects/{project_id}", status_code=204)
async def archive_project( async def archive_project(
client_id: str, client_id: str,
project_id: str, project_id: str,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
project_doc = await _get_project_or_404(project_id, client_id, db) await _get_project_or_404(project_id, client_id, db)
await db.projects.update_one( await db.projects.update_one(
{"_id": project_id}, {"_id": project_id},
{"$set": {"is_active": False, "updated_at": _now()}}, {"$set": {"is_active": False, "updated_at": _now()}},
) )
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_ARCHIVE,
description=f"Project '{project_doc['name']}' archived for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project_doc["name"]},
)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@ -636,37 +449,6 @@ async def _assert_pm_or_admin(user: User, client_id: str, db: AsyncIOMotorDataba
raise HTTPException(status_code=403, detail="Not a manager for this client") raise HTTPException(status_code=403, detail="Not a manager for this client")
async def _assert_pm_or_client_member(user: User, client_id: str, db: AsyncIOMotorDatabase) -> None:
"""Allow PM/ADMIN/PROD or any org member (CLIENT role) with membership in this client's org."""
if user.role in (UserRole.ADMIN, UserRole.PRODUCTION):
return
if user.role == UserRole.PROJECT_MANAGER:
if client_id in (user.pm_client_ids or []):
return
mem = await db.memberships.find_one({"user_id": str(user.id), "organization_id": client_id})
if mem and mem.get("role_in_org") in ("owner", "admin", "manager"):
return
# Allow CLIENT users who are members of the org
if user.role == UserRole.CLIENT:
mem = await db.memberships.find_one({"user_id": str(user.id), "organization_id": client_id})
if mem:
return
raise HTTPException(status_code=403, detail="Not authorized to create projects for this client")
async def _get_accessible_client_ids(user: User, db: AsyncIOMotorDatabase) -> list[str]:
"""Return list of client_ids the user can access."""
ids: set[str] = set()
# PM assignments (legacy)
if user.pm_client_ids:
ids.update(user.pm_client_ids)
# Org memberships
mems = await db.memberships.find({"user_id": str(user.id)}).to_list(None)
for m in mems:
ids.add(m["organization_id"])
return list(ids)
async def _assert_client_access(user: User, client_id: str, db: AsyncIOMotorDatabase) -> None: async def _assert_client_access(user: User, client_id: str, db: AsyncIOMotorDatabase) -> None:
"""Allow platform staff, org members (any role), or PM of the client.""" """Allow platform staff, org members (any role), or PM of the client."""
if user.role in (UserRole.ADMIN, UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.LINGUIST): if user.role in (UserRole.ADMIN, UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.LINGUIST):
@ -678,4 +460,6 @@ async def _assert_client_access(user: User, client_id: str, db: AsyncIOMotorData
# Legacy fallback for pre-migration users # Legacy fallback for pre-migration users
if user.role == UserRole.PROJECT_MANAGER and client_id in (user.pm_client_ids or []): if user.role == UserRole.PROJECT_MANAGER and client_id in (user.pm_client_ids or []):
return return
if user.role in (UserRole.CLIENT, UserRole.PROJECT_MANAGER):
return
raise HTTPException(status_code=403, detail="Insufficient permissions") raise HTTPException(status_code=403, detail="Insufficient permissions")
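
One recurring difference in this file is `datetime.now(UTC)` on one side and `datetime.now(timezone.utc)` on the other. The two spellings are equivalent: `datetime.UTC` was added in Python 3.11 as an alias for `datetime.timezone.utc`. The sketch below shows this; the fallback import and the two helper names are assumptions for illustration and do not appear in either branch.

```python
from datetime import datetime, timezone

try:
    from datetime import UTC  # available on Python 3.11+
except ImportError:  # assumed fallback for older interpreters
    UTC = timezone.utc


def now_with_alias() -> datetime:
    """Timezone-aware current time using the UTC alias."""
    return datetime.now(UTC)


def now_with_timezone() -> datetime:
    """Timezone-aware current time using timezone.utc directly."""
    return datetime.now(timezone.utc)
```

Both helpers return aware datetimes carrying the same `tzinfo` object, so the change between branches affects only which Python versions can import the module.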

View file

@ -3,11 +3,11 @@ from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.database import get_database from ...core.database import get_database
from ...core.dependencies import get_current_user from ...core.dependencies import get_current_user
from ...models.audit_log import AuditAction
from ...models.user import User from ...models.user import User
from ...schemas.file import SignedUploadRequest, SignedUploadResponse from ...schemas.file import SignedUploadRequest, SignedUploadResponse
from ...services.audit_logger import audit_logger
from ...services.gcs import generate_signed_upload_url from ...services.gcs import generate_signed_upload_url
from ...services.audit_logger import audit_logger
from ...models.audit_log import AuditAction
router = APIRouter(prefix="/files", tags=["files"]) router = APIRouter(prefix="/files", tags=["files"])
@ -62,4 +62,4 @@ async def get_signed_upload_url(
raise HTTPException( raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to generate signed upload URL: {str(e)}" detail=f"Failed to generate signed upload URL: {str(e)}"
) from None )


@ -1,326 +0,0 @@
"""
Glossary management endpoints.
Access:
- All glossary mutations (upload, activate, archive): Admin or PM of the client
- Glossary reads (list, detail, terms): Admin, PM, or staff members
Routes are nested under /clients/{client_id}/glossaries to keep ownership clear.
"""
from __future__ import annotations
from fastapi import APIRouter, Depends, File, Form, HTTPException, Query, UploadFile
from ...core.authz import MembershipContext, assert_user_in_org, get_membership_context
from ...core.logging import get_logger
from ...models.audit_log import AuditAction
from ...models.glossary import (
GlossaryDetailResponse,
GlossaryResponse,
GlossaryVersionResponse,
)
from ...models.organization import OrgRole
from ...services import audit_logger as audit_svc
from ...services import glossary_service as svc
logger = get_logger(__name__)
router = APIRouter(
prefix="/clients/{client_id}/glossaries",
tags=["glossaries"],
)
_ALLOWED_CONTENT_TYPES = {
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"application/vnd.ms-excel",
}
_MAX_FILE_SIZE_MB = 50
# ── List glossaries ───────────────────────────────────────────────────────────
@router.get("", response_model=list[GlossaryResponse])
async def list_glossaries(
client_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
"""List all active glossaries for a client."""
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossaries = await svc.get_glossaries_for_client(client_id)
version_map = await svc.get_versions_by_ids([g.current_version_id for g in glossaries if g.current_version_id])
return [_to_response(g, version_map.get(g.current_version_id)) for g in glossaries]
# ── Upload new glossary ───────────────────────────────────────────────────────
@router.post("", response_model=GlossaryDetailResponse, status_code=201)
async def upload_glossary(
client_id: str,
file: UploadFile = File(..., description="xlsx glossary file"),
name: str = Form(...),
source_locale: str = Form(..., description="BCP-47 source locale, e.g. en-GB"),
source_locale_col: str = Form(..., description="xlsx column header for the source language, e.g. en_gb"),
description: str | None = Form(None),
change_note: str | None = Form(None),
ctx: MembershipContext = Depends(get_membership_context),
):
"""Upload a new glossary xlsx file and associate it with a client."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
_validate_xlsx(file)
try:
glossary, version = await svc.ingest_glossary(
client_id=client_id,
name=name,
source_locale=source_locale,
source_locale_col=source_locale_col,
file=file,
user_id=str(ctx.user.id),
description=description,
change_note=change_note,
)
except ValueError as exc:
raise HTTPException(status_code=422, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_UPLOAD,
description=f"Glossary '{name}' uploaded for client {client_id}",
user=ctx.user,
resource_type="glossary",
resource_id=glossary.id,
details={"term_count": version.term_count, "source_locale": source_locale},
)
versions = await svc.get_versions(glossary.id)
return _to_detail_response(glossary, versions)
# ── Get glossary detail ───────────────────────────────────────────────────────
@router.get("/{glossary_id}", response_model=GlossaryDetailResponse)
async def get_glossary(
client_id: str,
glossary_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
versions = await svc.get_versions(glossary_id)
return _to_detail_response(glossary, versions)
# ── Browse terms ──────────────────────────────────────────────────────────────
@router.get("/{glossary_id}/terms")
async def list_terms(
client_id: str,
glossary_id: str,
version_id: str | None = Query(None, description="Specific version; defaults to active"),
search: str | None = Query(None),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
vid = version_id or glossary.current_version_id
if not vid:
return {"terms": [], "total": 0, "page": page, "page_size": page_size}
terms, total = await svc.get_terms_page(vid, search=search, page=page, page_size=page_size)
return {
"terms": [{"source_term": t["source_term"], "translations": t["translations"]} for t in terms],
"total": total,
"page": page,
"page_size": page_size,
}
# ── Upload new version ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/versions", response_model=GlossaryVersionResponse, status_code=201)
async def upload_version(
client_id: str,
glossary_id: str,
file: UploadFile = File(...),
source_locale_col: str = Form(...),
change_note: str | None = Form(None),
ctx: MembershipContext = Depends(get_membership_context),
):
"""Upload a new xlsx file as a new version of an existing glossary."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
_validate_xlsx(file)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
try:
version = await svc.ingest_new_version(
glossary_id=glossary_id,
source_locale_col=source_locale_col,
file=file,
user_id=str(ctx.user.id),
change_note=change_note,
)
except ValueError as exc:
raise HTTPException(status_code=422, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_VERSION_UPLOAD,
description=f"New glossary version uploaded for glossary {glossary_id}",
user=ctx.user,
resource_type="glossary_version",
resource_id=version.id,
details={"term_count": version.term_count, "version_number": version.version_number},
)
return _version_to_response(version)
# ── Activate a version ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/activate")
async def activate_version(
client_id: str,
glossary_id: str,
version_id: str = Form(...),
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
try:
await svc.activate_version(glossary_id, version_id)
except ValueError as exc:
raise HTTPException(status_code=404, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_ACTIVATE,
description=f"Glossary version {version_id} activated",
user=ctx.user,
resource_type="glossary",
resource_id=glossary_id,
details={"version_id": version_id},
)
return {"status": "ok", "active_version_id": version_id}
# ── Re-queue embedding ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/versions/{version_id}/reembed", status_code=202)
async def reembed_version(
client_id: str,
glossary_id: str,
version_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
"""Re-queue the embedding task for a glossary version (resets failed/pending/stuck embeds)."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
versions = await svc.get_versions(glossary_id)
version = next((v for v in versions if str(v.id) == version_id), None)
if not version:
raise HTTPException(status_code=404, detail="Version not found")
try:
import motor.motor_asyncio
from bson import ObjectId
from ...core.config import settings
from ...tasks.embed_glossary import embed_glossary_version_task
client_db = motor.motor_asyncio.AsyncIOMotorClient(settings.mongodb_uri)
db = client_db[settings.mongodb_db]
await db.glossary_versions.update_one(
{"_id": ObjectId(version_id)},
{"$set": {"embedding_status": "pending", "embedded_count": 0}},
)
client_db.close()
embed_glossary_version_task.delay(version_id)
except Exception as exc:
raise HTTPException(status_code=500, detail=f"Failed to queue embedding: {exc}") from exc
return {"status": "queued", "version_id": version_id}
# ── Delete ───────────────────────────────────────────────────────────────────
@router.delete("/{glossary_id}", status_code=204)
async def archive_glossary(
client_id: str,
glossary_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.ADMIN)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
await svc.archive_glossary(glossary_id)
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_ARCHIVE,
description=f"Glossary {glossary_id} archived",
user=ctx.user,
resource_type="glossary",
resource_id=glossary_id,
)
# ── Helpers ───────────────────────────────────────────────────────────────────
def _validate_xlsx(file: UploadFile) -> None:
if file.content_type not in _ALLOWED_CONTENT_TYPES and not (
file.filename and file.filename.endswith(".xlsx")
):
raise HTTPException(
status_code=422,
detail="Only .xlsx files are accepted",
)
def _to_response(g, current_version=None) -> GlossaryResponse:
return GlossaryResponse(
id=str(g.id),
client_id=g.client_id,
name=g.name,
description=g.description,
source_locale=g.source_locale,
source=g.source,
status=g.status,
current_version_id=g.current_version_id,
current_version_embedding_status=current_version.embedding_status if current_version else None,
current_version_embedded_count=current_version.embedded_count if current_version else None,
current_version_term_count=current_version.term_count if current_version else None,
created_at=g.created_at,
created_by=g.created_by,
)
def _version_to_response(v) -> GlossaryVersionResponse:
return GlossaryVersionResponse(
id=str(v.id),
glossary_id=v.glossary_id,
version_number=v.version_number,
term_count=v.term_count,
embedded_count=v.embedded_count,
embedding_status=v.embedding_status,
created_at=v.created_at,
created_by=v.created_by,
change_note=v.change_note,
)
def _to_detail_response(glossary, versions) -> GlossaryDetailResponse:
return GlossaryDetailResponse(
**_to_response(glossary).model_dump(),
versions=[_version_to_response(v) for v in versions],
)
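
The upload check in `_validate_xlsx` rejects a file only when both the content type and the `.xlsx` extension fail to match, and `_MAX_FILE_SIZE_MB` is defined in this module but not referenced in the code shown. A minimal, framework-free sketch of the same rule with a size cap added might look like the following. The function name `check_upload` and the `size_bytes` parameter are hypothetical and are not part of the diff.

```python
from __future__ import annotations

ALLOWED_CONTENT_TYPES = {
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.ms-excel",
}
MAX_FILE_SIZE_MB = 50


def check_upload(filename: str | None, content_type: str | None, size_bytes: int) -> str | None:
    """Return an error message, or None when the upload looks acceptable."""
    type_ok = content_type in ALLOWED_CONTENT_TYPES
    name_ok = bool(filename) and filename.endswith(".xlsx")
    # Same rule as _validate_xlsx: reject only when BOTH checks fail.
    if not type_ok and not name_ok:
        return "Only .xlsx files are accepted"
    # Hypothetical addition: apply the size limit that the module declares.
    if size_bytes > MAX_FILE_SIZE_MB * 1024 * 1024:
        return f"File exceeds {MAX_FILE_SIZE_MB} MB"
    return None
```

Because either check alone is enough to pass, a file named `terms.xlsx` with an arbitrary content type is accepted, as is a file with an Excel content type and no filename.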


@ -14,21 +14,16 @@ Protected endpoints:
import hashlib import hashlib
import re import re
import secrets import secrets
from datetime import UTC, datetime, timedelta from datetime import datetime, timedelta, timezone
from fastapi import APIRouter, Depends, HTTPException, Request from fastapi import APIRouter, Depends, HTTPException, status
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import bump_user_membership_cache
from ...core.database import get_database from ...core.database import get_database
from ...core.dependencies import get_current_user from ...core.dependencies import get_current_user
from ...core.security import ( from ...core.security import create_access_token, create_refresh_token, get_password_hash
create_access_token,
create_refresh_token,
get_password_hash,
)
from ...models.audit_log import AuditAction
from ...models.invitation import ( from ...models.invitation import (
Invitation,
InvitationAcceptRequest, InvitationAcceptRequest,
InvitationCreate, InvitationCreate,
InvitationPreviewResponse, InvitationPreviewResponse,
@ -36,7 +31,7 @@ from ...models.invitation import (
) )
from ...models.organization import OrgRole from ...models.organization import OrgRole
from ...models.user import AuthProvider, User, UserRole from ...models.user import AuthProvider, User, UserRole
from ...services.audit_logger import audit_logger from ...core.authz import bump_user_membership_cache
from ...services.emailer import email_service from ...services.emailer import email_service
from ...services.membership_service import get_membership, upsert_membership from ...services.membership_service import get_membership, upsert_membership
@ -44,7 +39,7 @@ router = APIRouter(tags=["invitations"])
def _now() -> datetime: def _now() -> datetime:
return datetime.now(UTC) return datetime.now(timezone.utc)
def _hash_token(plaintext: str) -> str: def _hash_token(plaintext: str) -> str:
@ -59,7 +54,7 @@ def _make_token() -> tuple[str, str]:
def _inv_from_doc(doc: dict) -> InvitationResponse: def _inv_from_doc(doc: dict) -> InvitationResponse:
now = _now() now = _now()
expires_at = doc["expires_at"].replace(tzinfo=UTC) if doc["expires_at"].tzinfo is None else doc["expires_at"] expires_at = doc["expires_at"].replace(tzinfo=timezone.utc) if doc["expires_at"].tzinfo is None else doc["expires_at"]
return InvitationResponse( return InvitationResponse(
id=str(doc["_id"]), id=str(doc["_id"]),
email=doc["email"], email=doc["email"],
@ -105,7 +100,6 @@ org_router = APIRouter(prefix="/organizations", tags=["invitations"])
async def create_invitation( async def create_invitation(
org_id: str, org_id: str,
body: InvitationCreate, body: InvitationCreate,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
@ -127,18 +121,6 @@ async def create_invitation(
detail="A pending invitation already exists for this email. Revoke it first to re-invite.", detail="A pending invitation already exists for this email. Revoke it first to re-invite.",
) )
# MT-19: ensure all target_team_ids belong to this org (client_id == org_id)
if body.target_team_ids:
valid_teams = await db.teams.count_documents({
"_id": {"$in": body.target_team_ids},
"client_id": org_id,
})
if valid_teams != len(body.target_team_ids):
raise HTTPException(
status_code=400,
detail="One or more target_team_ids do not belong to this organization.",
)
plaintext, token_hash = _make_token() plaintext, token_hash = _make_token()
now = _now() now = _now()
expires_at = now + timedelta(days=body.expires_in_days) expires_at = now + timedelta(days=body.expires_in_days)
@ -172,17 +154,7 @@ async def create_invitation(
expires_at=expires_at, expires_at=expires_at,
) )
inv = _inv_from_doc(doc) return _inv_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.INVITATION_CREATE,
description=f"Invitation created for '{email_lower}' to organization '{org_id}'",
user=current_user,
request=request,
resource_type="invitation",
resource_id=inv.id,
details={"invited_email": email_lower, "org_id": org_id, "role": body.role_in_org},
)
return inv
@org_router.get("/{org_id}/invitations", response_model=list[InvitationResponse]) @org_router.get("/{org_id}/invitations", response_model=list[InvitationResponse])
@ -202,30 +174,16 @@ async def list_invitations(
async def revoke_invitation( async def revoke_invitation(
org_id: str, org_id: str,
invitation_id: str, invitation_id: str,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
await _assert_org_admin(org_id, current_user, db) await _assert_org_admin(org_id, current_user, db)
inv_doc = await db.invitations.find_one({"_id": invitation_id, "organization_id": org_id})
result = await db.invitations.update_one( result = await db.invitations.update_one(
{"_id": invitation_id, "organization_id": org_id, "accepted_at": None, "revoked_at": None}, {"_id": invitation_id, "organization_id": org_id, "accepted_at": None, "revoked_at": None},
{"$set": {"revoked_at": _now()}}, {"$set": {"revoked_at": _now()}},
) )
if result.matched_count == 0: if result.matched_count == 0:
raise HTTPException(status_code=404, detail="Invitation not found or already accepted/revoked") raise HTTPException(status_code=404, detail="Invitation not found or already accepted/revoked")
await audit_logger.log_action(
action=AuditAction.INVITATION_REVOKE,
description=f"Invitation '{invitation_id}' revoked in organization '{org_id}'",
user=current_user,
request=request,
resource_type="invitation",
resource_id=invitation_id,
details={
"invited_email": inv_doc["email"] if inv_doc else None,
"org_id": org_id,
},
)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@ -248,7 +206,7 @@ async def preview_invitation(
raise HTTPException(status_code=410, detail="Invitation not found or has expired") raise HTTPException(status_code=410, detail="Invitation not found or has expired")
now = _now() now = _now()
expires_at = doc["expires_at"].replace(tzinfo=UTC) if doc["expires_at"].tzinfo is None else doc["expires_at"] expires_at = doc["expires_at"].replace(tzinfo=timezone.utc) if doc["expires_at"].tzinfo is None else doc["expires_at"]
if doc.get("revoked_at"): if doc.get("revoked_at"):
raise HTTPException(status_code=410, detail="This invitation has been revoked") raise HTTPException(status_code=410, detail="This invitation has been revoked")
@ -297,7 +255,6 @@ async def preview_invitation(
@router.post("/invitations/accept") @router.post("/invitations/accept")
async def accept_invitation( async def accept_invitation(
body: InvitationAcceptRequest, body: InvitationAcceptRequest,
request: Request,
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""Accept an invitation. Creates user if needed, creates membership, returns tokens.""" """Accept an invitation. Creates user if needed, creates membership, returns tokens."""
@ -360,16 +317,12 @@ async def accept_invitation(
await upsert_membership(user_id, org_id, role_in_org, doc["invited_by_user_id"], db) await upsert_membership(user_id, org_id, role_in_org, doc["invited_by_user_id"], db)
await bump_user_membership_cache(user_id) await bump_user_membership_cache(user_id)
# Auto-add to target teams — write to both Team.member_user_ids (legacy) and Membership.team_ids (MT-17) # Auto-add to target teams
for team_id in doc.get("target_team_ids", []): for team_id in doc.get("target_team_ids", []):
await db.teams.update_one( await db.teams.update_one(
{"_id": team_id, "client_id": org_id}, {"_id": team_id, "client_id": org_id},
{"$addToSet": {"member_user_ids": user_id}}, {"$addToSet": {"member_user_ids": user_id}},
) )
await db.memberships.update_one(
{"user_id": user_id, "organization_id": org_id},
{"$addToSet": {"team_ids": team_id}},
)
# Send welcome email # Send welcome email
if not existing_user.get("_welcomed"): if not existing_user.get("_welcomed"):
@ -380,23 +333,12 @@ async def accept_invitation(
org_name=org_name, org_name=org_name,
) )
# Issue JWT tokens with org_ids claim # Issue JWT tokens
_inv_org_ids = [m["organization_id"] async for m in db.memberships.find({"user_id": user_id}, {"organization_id": 1})] access_token = create_access_token(subject=user_id)
access_token = create_access_token(subject=user_id, org_ids=[str(o) for o in _inv_org_ids if o])
refresh_token = create_refresh_token(subject=user_id) refresh_token = create_refresh_token(subject=user_id)
org_name, org_slug = await _get_org_name(org_id, db) org_name, org_slug = await _get_org_name(org_id, db)
await audit_logger.log_action(
action=AuditAction.INVITATION_ACCEPT,
description=f"Invitation accepted by '{email_lower}' for organization '{org_id}'",
user=None,
request=request,
resource_type="invitation",
resource_id=str(doc["_id"]),
details={"invited_email": email_lower, "org_id": org_id},
)
return { return {
"access_token": access_token, "access_token": access_token,
"refresh_token": refresh_token, "refresh_token": refresh_token,
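
Both sides of this diff normalize `expires_at` the same way and differ only in how they spell UTC: `datetime.UTC` is an alias for `timezone.utc` that was added in Python 3.11, so the `timezone.utc` form also runs on older interpreters. The normalization matters because PyMongo and Motor return naive datetimes unless the client is created with `tz_aware=True`. A minimal sketch of the pattern, with the hypothetical helper names `as_aware_utc` and `is_expired`:

```python
from __future__ import annotations

from datetime import datetime, timezone


def as_aware_utc(value: datetime) -> datetime:
    """Attach UTC to a naive datetime; return an aware one unchanged."""
    return value.replace(tzinfo=timezone.utc) if value.tzinfo is None else value


def is_expired(expires_at: datetime, now: datetime | None = None) -> bool:
    """Compare an expiry that may be naive against an aware 'now'."""
    now = now or datetime.now(timezone.utc)
    return as_aware_utc(expires_at) <= now
```

Without the normalization step, comparing a naive `expires_at` to an aware `now` raises `TypeError`.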

File diff suppressed because it is too large.


@ -1,580 +0,0 @@
"""Per-language QC endpoints — two-stage (linguist + reviewer) assignment, workflow, comments."""
from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, Query, Request
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel, Field
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.job import LanguageQCComment, LanguageQCState
from ...models.user import User, UserRole
from ...services import language_qc as lqc
from ...services.audit_logger import audit_logger
router = APIRouter(tags=["language-qc"])
# ── Request / response schemas ────────────────────────────────────────────────
class AssignRequest(BaseModel):
linguist_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ReassignRequest(BaseModel):
linguist_user_id: str
notes: str | None = None
deadline: datetime | None = None
class AssignReviewerRequest(BaseModel):
reviewer_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ReassignReviewerRequest(BaseModel):
reviewer_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ApproveLanguageRequest(BaseModel):
notes: str | None = None
class RejectLanguageRequest(BaseModel):
notes: str
category: str | None = None # timing | mistranslation | terminology | profanity | length | other
class ReopenLanguageRequest(BaseModel):
notes: str | None = None
class AddCommentRequest(BaseModel):
body: str = Field(..., min_length=1, max_length=4000)
class LanguageQCStateResponse(BaseModel):
lang: str
state: LanguageQCState
class LanguageQCMapResponse(BaseModel):
job_id: str
language_qc: dict[str, LanguageQCState]
class QueueItem(BaseModel):
job_id: str
job_title: str
job_status: str
lang: str
lang_qc_status: str
assigned_at: str | None = None
reviewed_at: str | None = None
class QueueResponse(BaseModel):
items: list[QueueItem]
total: int
class BulkAssignRequest(BaseModel):
linguist_user_id: str
reviewer_user_id: str | None = None
languages: list[str] | None = None # None = all available languages
only_unassigned: bool = False # skip languages that already have an assignment
deadline: datetime | None = None
class BulkAssignResponse(BaseModel):
assigned: list[str]
skipped: list[str]
errors: dict[str, str]
# ── Routes ────────────────────────────────────────────────────────────────────
@router.get("/jobs/{job_id}/language-qc", response_model=LanguageQCMapResponse)
async def get_language_qc(
job_id: str,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION,
UserRole.PROJECT_MANAGER, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
# Lazy auto-assignment: apply project/job defaults on first open in PENDING_QC
await lqc.auto_assign_defaults(db, job_id)
states = await lqc.get_all_states(db, job_id)
return LanguageQCMapResponse(job_id=job_id, language_qc=states)
# ── Linguist assignment ───────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/assign", response_model=LanguageQCStateResponse)
async def assign_language(
job_id: str,
lang: str,
request: AssignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.assign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_ASSIGN,
description=f"Language '{lang}' assigned to linguist '{request.linguist_user_id}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "linguist_user_id": request.linguist_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reassign", response_model=LanguageQCStateResponse)
async def reassign_language(
job_id: str,
lang: str,
request: ReassignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reassign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REASSIGN,
description=f"Language '{lang}' reassigned to linguist '{request.linguist_user_id}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "linguist_user_id": request.linguist_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Reviewer assignment ───────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/assign-reviewer", response_model=LanguageQCStateResponse)
async def assign_reviewer(
job_id: str,
lang: str,
request: AssignReviewerRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.assign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REVIEWER_ASSIGN,
description=f"Reviewer '{request.reviewer_user_id}' assigned to language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "reviewer_user_id": request.reviewer_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reassign-reviewer", response_model=LanguageQCStateResponse)
async def reassign_reviewer(
job_id: str,
lang: str,
request: ReassignReviewerRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reassign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REVIEWER_REASSIGN,
description=f"Reviewer reassigned to '{request.reviewer_user_id}' for language '{lang}', job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "reviewer_user_id": request.reviewer_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Bulk assignment ───────────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/bulk-assign", response_model=BulkAssignResponse)
async def bulk_assign_languages(
job_id: str,
request: BulkAssignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Assign one linguist (and optionally one reviewer) to multiple languages in one call."""
job_doc = await db["jobs"].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
available = list((job_doc.get("outputs") or {}).keys())
target_langs = request.languages if request.languages else available
assigned: list[str] = []
skipped: list[str] = []
errors: dict[str, str] = {}
language_qc = job_doc.get("language_qc") or {}
for lang in target_langs:
if lang not in available:
skipped.append(lang)
continue
lang_state = language_qc.get(lang) or {}
already_assigned = bool(lang_state.get("assigned_linguist_id"))
if request.only_unassigned and already_assigned:
skipped.append(lang)
continue
try:
await lqc.assign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, deadline=request.deadline,
)
except Exception as exc:
errors[lang] = str(exc)
continue
if request.reviewer_user_id:
try:
await lqc.assign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, deadline=request.deadline,
)
except Exception as exc:
errors[f"{lang}:reviewer"] = str(exc)
assigned.append(lang)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_BULK_ASSIGN,
description=f"Bulk assignment for job {job_id}: {len(assigned)} language(s) assigned to linguist '{request.linguist_user_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={
"languages": assigned,
"linguist_user_id": request.linguist_user_id,
"reviewer_user_id": request.reviewer_user_id,
"skipped": skipped,
"errors": errors,
},
)
return BulkAssignResponse(assigned=assigned, skipped=skipped, errors=errors)
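
The loop in `bulk_assign_languages` sorts each requested language into one of three buckets: assigned, skipped, or errored. A minimal sketch of that partitioning as a pure function, with the database call replaced by a hypothetical `assign` callback and the optional reviewer step left out:

```python
from __future__ import annotations

from typing import Callable


def plan_bulk_assign(
    available: list[str],
    requested: list[str] | None,
    already_assigned: set[str],
    only_unassigned: bool,
    assign: Callable[[str], None],
) -> tuple[list[str], list[str], dict[str, str]]:
    """Partition target languages into (assigned, skipped, errors)."""
    targets = requested if requested else available
    assigned: list[str] = []
    skipped: list[str] = []
    errors: dict[str, str] = {}
    for lang in targets:
        # Unknown languages and, optionally, already-assigned ones are skipped.
        if lang not in available or (only_unassigned and lang in already_assigned):
            skipped.append(lang)
            continue
        try:
            assign(lang)
        except Exception as exc:
            # One failure is recorded and does not abort the rest of the batch.
            errors[lang] = str(exc)
            continue
        assigned.append(lang)
    return assigned, skipped, errors
```

In the endpoint itself, a failed reviewer assignment is recorded under the key `"{lang}:reviewer"` while the language still counts as assigned, so `errors` and `assigned` can overlap there.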
# ── Workflow transitions ──────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/start-work", response_model=LanguageQCStateResponse)
async def start_linguist_work(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Linguist opens the language — pending → in_progress."""
state = await lqc.start_linguist_work(db, job_id, lang, current_user)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_START_WORK,
description=f"Linguist started work on language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/submit", response_model=LanguageQCStateResponse)
async def submit_for_review(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Linguist submits — in_progress → pending_review. Notifies reviewer by email."""
state = await lqc.submit_for_review(db, job_id, lang, current_user, http_request=http_request)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_SUBMIT,
description=f"Language '{lang}' submitted for review for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/open-review", response_model=LanguageQCStateResponse)
async def open_review(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Reviewer opens the review — pending_review → in_review."""
state = await lqc.open_review(db, job_id, lang, current_user, http_request=http_request)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_OPEN_REVIEW,
description=f"Reviewer opened review for language '{lang}', job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
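The three handlers above each move a language one step along the workflow named in their docstrings (pending, in_progress, pending_review, in_review). The actual checks live in `lqc`, which is not shown in this diff. As a rough sketch only, a transition guard could be a plain lookup table; `ALLOWED`, `InvalidTransition`, `next_state`, and the action names used as keys are hypothetical and not taken from the source.

```python
# Hypothetical transition table. Only the three transitions named in the
# docstrings above come from the source; the helper names are assumptions.
ALLOWED: dict[tuple[str, str], str] = {
    ("start-work", "pending"): "in_progress",
    ("submit", "in_progress"): "pending_review",
    ("open-review", "pending_review"): "in_review",
}


class InvalidTransition(ValueError):
    """Raised when an action is not permitted from the current state."""


def next_state(action: str, current: str) -> str:
    """Return the state that `action` leads to, or raise InvalidTransition."""
    try:
        return ALLOWED[(action, current)]
    except KeyError:
        raise InvalidTransition(f"cannot {action!r} from {current!r}") from None
```

Keeping the table as data makes it straightforward to list the legal moves in one place and to map `InvalidTransition` to an HTTP 409 in a route handler, if that is the desired behaviour.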
# ── Approve / Reject / Reopen ─────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/approve", response_model=LanguageQCStateResponse)
async def approve_language(
job_id: str,
lang: str,
request: ApproveLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.approve_language(
db, job_id, lang, current_user, http_request=http_request, notes=request.notes,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_APPROVE,
description=f"Language '{lang}' approved for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reject", response_model=LanguageQCStateResponse)
async def reject_language(
job_id: str,
lang: str,
request: RejectLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reject_language(
db, job_id, lang, current_user, request.notes, category=request.category, http_request=http_request,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REJECT,
description=f"Language '{lang}' rejected for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes, "category": request.category},
)
return LanguageQCStateResponse(lang=lang, state=state)
class MarkCueReviewedRequest(BaseModel):
    total_cues: int | None = None  # client sends on first call to set total


@router.post("/jobs/{job_id}/languages/{lang}/mark-cue-reviewed", response_model=LanguageQCStateResponse)
async def mark_cue_reviewed(
    job_id: str,
    lang: str,
    request: MarkCueReviewedRequest,
    http_request: Request,
    current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.ADMIN)),
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    """Increment reviewed_cues counter; optionally set total_cues on first call."""
    job_doc = await db.jobs.find_one({"_id": job_id})
    if not job_doc:
        raise HTTPException(status_code=404, detail="Job not found")

    # Single update: bump the per-language counter and refresh updated_at.
    inc_op: dict = {f"language_qc.{lang}.reviewed_cues": 1}
    set_op: dict = {"updated_at": datetime.utcnow()}
    if request.total_cues is not None:
        set_op[f"language_qc.{lang}.total_cues"] = request.total_cues
    await db.jobs.update_one({"_id": job_id}, {"$inc": inc_op, "$set": set_op})

    # Re-read the job so the response reflects the stored state.
    updated_doc = await db.jobs.find_one({"_id": job_id})
    state_dict = (updated_doc.get("language_qc") or {}).get(lang, {})
    from ...models.job import LanguageQCState
    state = LanguageQCState(**state_dict) if isinstance(state_dict, dict) else LanguageQCState()
    return LanguageQCStateResponse(lang=lang, state=state)
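The handler above sends `$inc` and `$set` together in one `update_one` call. A minimal sketch of that update document as a pure function follows, which allows it to be unit tested without a database. `build_cue_update` is a hypothetical helper that does not exist in the source, and the timestamp is passed in rather than read from the clock.

```python
from datetime import datetime, timezone
from typing import Optional


def build_cue_update(lang: str, total_cues: Optional[int], now: datetime) -> dict:
    """Build the MongoDB update for marking one cue as reviewed.

    Hypothetical helper: mirrors the field paths used by the handler above.
    """
    set_op = {"updated_at": now}
    if total_cues is not None:
        # Only the first call is expected to carry the total.
        set_op[f"language_qc.{lang}.total_cues"] = total_cues
    return {
        "$inc": {f"language_qc.{lang}.reviewed_cues": 1},
        "$set": set_op,
    }
```

Because the field paths are built with dotted notation, a `lang` value containing a dot would address a nested field, so validating `lang` before building the path is worth considering.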
@router.post("/jobs/{job_id}/languages/{lang}/reopen", response_model=LanguageQCStateResponse)
async def reopen_language(
job_id: str,
lang: str,
request: ReopenLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reopen_language(
db, job_id, lang, current_user, http_request=http_request, notes=request.notes,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REOPEN,
description=f"Language '{lang}' reopened for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Comments ──────────────────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/comments", response_model=LanguageQCComment, status_code=201)
async def add_comment(
job_id: str,
lang: str,
request: AddCommentRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PROJECT_MANAGER,
UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
comment = await lqc.add_comment(
db, job_id, lang, current_user, request.body, http_request=http_request,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_COMMENT,
description=f"Comment added to language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "comment_id": str(comment.id) if hasattr(comment, "id") else None},
)
return comment
@router.get("/jobs/{job_id}/languages/{lang}/comments", response_model=list[LanguageQCComment])
async def list_comments(
job_id: str,
lang: str,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PROJECT_MANAGER,
UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.get_state(db, job_id, lang)
if state is None:
return []
return state.comments
# ── Queues ─────────────────────────────────────────────────────────────────────
@router.get("/me/language-qc-queue", response_model=QueueResponse)
async def my_language_qc_queue(
    role: str = Query("linguist", description="'linguist' or 'reviewer'"),
    qc_status: str | None = Query(None, description="Filter by status"),
    skip: int = Query(0, ge=0),
    limit: int = Query(50, ge=1, le=200),
    current_user: User = Depends(require_roles(
        UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
    )),
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    """List jobs and languages assigned to the current user as linguist or reviewer."""
    # ADMIN sees all orgs; staff scoped to their orgs from JWT claim (MT-18)
    org_ids: list[str] | None = (
        None if current_user.role == UserRole.ADMIN else getattr(current_user, "org_ids", None)
    )
    if role == "reviewer":
        jobs = await lqc.list_for_reviewer(
            db, str(current_user.id), accessible_org_ids=org_ids,
            status_filter=qc_status, skip=skip, limit=limit,
        )
    else:
        jobs = await lqc.list_for_linguist(
            db, str(current_user.id), accessible_org_ids=org_ids,
            status_filter=qc_status, skip=skip, limit=limit,
        )

    items: list[QueueItem] = []
    for job in jobs:
        job_id = str(job["_id"])
        for assignment in job.get("_my_assignments", []):
            lang = assignment["lang"]
            state_raw = (job.get("language_qc") or {}).get(lang, {})
            # Timestamps are optional; a non-dict state entry yields None for both.
            state = state_raw if isinstance(state_raw, dict) else {}
            assigned_at = state.get("assigned_at")
            reviewed_at = state.get("reviewed_at")
            items.append(QueueItem(
                job_id=job_id,
                job_title=job.get("title", ""),
                job_status=job.get("status", ""),
                lang=lang,
                lang_qc_status=assignment.get("status", "pending"),
                assigned_at=assigned_at.isoformat() if assigned_at else None,
                reviewed_at=reviewed_at.isoformat() if reviewed_at else None,
            ))
    # total is the number of items on this page, not the overall match count.
    return QueueResponse(items=items, total=len(items))


@ -12,25 +12,19 @@ underlying MongoDB collections used by routes_clients.py so both
endpoints coexist without data duplication. endpoints coexist without data duplication.
""" """
from datetime import UTC, datetime from datetime import datetime, timezone
from fastapi import APIRouter, Depends, HTTPException, Request from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel from pydantic import BaseModel
from ...core.authz import bump_user_membership_cache
from ...core.database import get_database from ...core.database import get_database
from ...core.dependencies import get_current_user, require_roles from ...core.dependencies import get_current_user, require_roles
from ...models.audit_log import AuditAction
from ...models.membership import MemberDetail, MembershipCreate, MembershipUpdate from ...models.membership import MemberDetail, MembershipCreate, MembershipUpdate
from ...models.organization import ( from ...models.organization import OrgRole, Organization, OrganizationCreate, OrganizationUpdate
Organization,
OrganizationCreate,
OrganizationUpdate,
OrgRole,
)
from ...models.user import User, UserRole from ...models.user import User, UserRole
from ...services.audit_logger import audit_logger from ...core.authz import bump_user_membership_cache
from ...services.membership_service import ( from ...services.membership_service import (
get_membership, get_membership,
get_memberships_for_user, get_memberships_for_user,
@ -45,7 +39,7 @@ ADMIN_ROLES = [UserRole.ADMIN]
def _now() -> datetime: def _now() -> datetime:
return datetime.now(UTC) return datetime.now(timezone.utc)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@ -121,7 +115,6 @@ class _OrgCreate(BaseModel):
@router.post("", response_model=Organization, status_code=201) @router.post("", response_model=Organization, status_code=201)
async def create_organization( async def create_organization(
body: OrganizationCreate, body: OrganizationCreate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
@ -140,25 +133,13 @@ async def create_organization(
"updated_at": now, "updated_at": now,
} }
await db.clients.insert_one(doc) await db.clients.insert_one(doc)
org = _org_from_doc(doc) return _org_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.ORG_CREATE,
description=f"Organization '{org.name}' created",
user=current_user,
request=request,
resource_type="organization",
resource_id=str(org.id),
resource_name=org.name,
details={"slug": org.slug},
)
return org
@router.patch("/{org_id}", response_model=Organization) @router.patch("/{org_id}", response_model=Organization)
async def update_organization( async def update_organization(
org_id: str, org_id: str,
body: OrganizationUpdate, body: OrganizationUpdate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
@ -175,18 +156,7 @@ async def update_organization(
await db.clients.update_one({"_id": org_id}, {"$set": updates}) await db.clients.update_one({"_id": org_id}, {"$set": updates})
updated = {**doc, **updates} updated = {**doc, **updates}
org = _org_from_doc(updated) return _org_from_doc(updated)
await audit_logger.log_action(
action=AuditAction.ORG_UPDATE,
description=f"Organization '{org.name}' updated",
user=current_user,
request=request,
resource_type="organization",
resource_id=str(org.id),
resource_name=org.name,
details={k: v for k, v in updates.items() if k != "updated_at"},
)
return org
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@ -208,7 +178,6 @@ async def list_members(
async def add_member( async def add_member(
org_id: str, org_id: str,
body: MembershipCreate, body: MembershipCreate,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
@ -224,15 +193,6 @@ async def add_member(
members = await list_org_members(org_id, db) members = await list_org_members(org_id, db)
for m in members: for m in members:
if m.user_id == body.user_id: if m.user_id == body.user_id:
await audit_logger.log_action(
action=AuditAction.ORG_MEMBER_ADD,
description=f"Member '{body.user_id}' added to organization '{org_id}' with role '{body.role_in_org}'",
user=current_user,
request=request,
resource_type="organization",
resource_id=org_id,
details={"user_id": body.user_id, "role": body.role_in_org},
)
return m return m
raise HTTPException(status_code=500, detail="Membership created but could not be retrieved") raise HTTPException(status_code=500, detail="Membership created but could not be retrieved")
@ -242,7 +202,6 @@ async def update_member(
org_id: str, org_id: str,
user_id: str, user_id: str,
body: MembershipUpdate, body: MembershipUpdate,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
@ -259,15 +218,6 @@ async def update_member(
members = await list_org_members(org_id, db) members = await list_org_members(org_id, db)
for m in members: for m in members:
if m.user_id == user_id: if m.user_id == user_id:
await audit_logger.log_action(
action=AuditAction.ORG_MEMBER_UPDATE,
description=f"Member '{user_id}' role updated in organization '{org_id}' to '{body.role_in_org}'",
user=current_user,
request=request,
resource_type="organization",
resource_id=org_id,
details={"user_id": user_id, "role": body.role_in_org},
)
return m return m
raise HTTPException(status_code=500, detail="Could not retrieve updated membership") raise HTTPException(status_code=500, detail="Could not retrieve updated membership")
@ -276,7 +226,6 @@ async def update_member(
async def remove_member( async def remove_member(
org_id: str, org_id: str,
user_id: str, user_id: str,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
@ -290,15 +239,6 @@ async def remove_member(
await remove_membership(user_id, org_id, db) await remove_membership(user_id, org_id, db)
await bump_user_membership_cache(user_id) await bump_user_membership_cache(user_id)
await audit_logger.log_action(
action=AuditAction.ORG_MEMBER_REMOVE,
description=f"Member '{user_id}' removed from organization '{org_id}'",
user=current_user,
request=request,
resource_type="organization",
resource_id=org_id,
details={"user_id": user_id, "role": existing.role_in_org},
)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------


@ -1,14 +1,14 @@
"""API routes for review notes - timestamped notes on video assets during review.""" """API routes for review notes - timestamped notes on video assets during review."""
from datetime import datetime from datetime import datetime
from typing import Optional
from bson import ObjectId from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException, Query, status from fastapi import APIRouter, Depends, HTTPException, Query, status
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_job_or_403, get_membership_context
from ...core.database import get_database from ...core.database import get_database
from ...core.dependencies import require_roles from ...core.dependencies import get_current_user, require_roles
from ...core.logging import get_logger from ...core.logging import get_logger
from ...models.user import User, UserRole from ...models.user import User, UserRole
from ...schemas.review_note import ( from ...schemas.review_note import (
@ -25,13 +25,18 @@ router = APIRouter(prefix="/jobs/{job_id}/review-notes", tags=["review-notes"])
@router.get("", response_model=ReviewNotesListResponse) @router.get("", response_model=ReviewNotesListResponse)
async def list_review_notes( async def list_review_notes(
job_id: str, job_id: str,
asset_key: str | None = Query(None, description="Filter notes by asset key"), asset_key: Optional[str] = Query(None, description="Filter notes by asset key"),
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""List all review notes for a job, optionally filtered by asset key.""" """List all review notes for a job, optionally filtered by asset key."""
await get_job_or_403(job_id, ctx, db) # org check + existence check # Verify job exists
job = await db.jobs.find_one({"_id": job_id})
if not job:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Job not found"
)
# Build query # Build query
query = {"job_id": job_id} query = {"job_id": job_id}
@ -53,11 +58,16 @@ async def create_review_note(
job_id: str, job_id: str,
request: ReviewNoteCreateRequest, request: ReviewNoteCreateRequest,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""Create a new review note for a video asset.""" """Create a new review note for a video asset."""
await get_job_or_403(job_id, ctx, db) # org check + existence check # Verify job exists
job = await db.jobs.find_one({"_id": job_id})
if not job:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Job not found"
)
# Create note document # Create note document
note_id = str(ObjectId()) note_id = str(ObjectId())
@ -86,11 +96,9 @@ async def get_review_note(
job_id: str, job_id: str,
note_id: str, note_id: str,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""Get a single review note by ID.""" """Get a single review note by ID."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id}) note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note: if not note:
raise HTTPException( raise HTTPException(
@ -107,11 +115,9 @@ async def update_review_note(
note_id: str, note_id: str,
request: ReviewNoteUpdateRequest, request: ReviewNoteUpdateRequest,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""Update a review note. Only the note owner can update.""" """Update a review note. Only the note owner can update."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id}) note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note: if not note:
raise HTTPException( raise HTTPException(
@ -145,11 +151,9 @@ async def delete_review_note(
job_id: str, job_id: str,
note_id: str, note_id: str,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""Delete a review note. Only the note owner can delete.""" """Delete a review note. Only the note owner can delete."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id}) note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note: if not note:
raise HTTPException( raise HTTPException(


@ -1,354 +0,0 @@
"""Share-token endpoints — create/revoke/list tokens + public read-only view + client decision."""
import secrets
from datetime import datetime, timedelta
from typing import Literal
from fastapi import APIRouter, Depends, HTTPException, Request
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel
from ...core.config import settings
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.share_token import ShareTokenResponse
from ...models.user import User, UserRole
from ...services.audit_logger import audit_logger
from ...services.gcs import get_signed_download_url
router = APIRouter(tags=["share"])
_TOKENS = "share_tokens"
_JOBS = "jobs"
def _share_url(token: str) -> str:
return f"{settings.app_url}/share/{token}"
# ── Request schemas ───────────────────────────────────────────────────────────
class CreateShareTokenRequest(BaseModel):
expires_in_days: int | None = 30 # None = no expiry
label: str | None = None
class ShareTokenListResponse(BaseModel):
tokens: list[ShareTokenResponse]
class PublicJobPreviewLanguage(BaseModel):
captions_vtt_url: str | None = None
audio_description_vtt_url: str | None = None
accessible_video_mp4_url: str | None = None
audio_description_mp3_url: str | None = None
class PublicJobPreviewResponse(BaseModel):
job_id: str
job_title: str
job_status: str
source_language: str
languages: list[str]
language_outputs: dict[str, PublicJobPreviewLanguage]
class ClientDecisionRequest(BaseModel):
action: Literal["approve", "reject"]
notes: str | None = None
client_name: str | None = None
class ClientDecisionResponse(BaseModel):
status: str
new_job_status: str
# ── Authenticated routes ──────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/share", response_model=ShareTokenResponse, status_code=201)
async def create_share_token(
job_id: str,
request: CreateShareTokenRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Generate a read-only share link for a job."""
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
token_id = secrets.token_hex(32)
now = datetime.utcnow()
expires_at = (now + timedelta(days=request.expires_in_days)) if request.expires_in_days else None
token_doc = {
"_id": token_id,
"job_id": job_id,
"organization_id": job_doc.get("organization_id", ""),
"created_by_user_id": str(current_user.id),
"created_by_email": current_user.email,
"created_at": now,
"expires_at": expires_at,
"is_active": True,
"label": request.label,
}
await db[_TOKENS].insert_one(token_doc)
await audit_logger.log_action(
action=AuditAction.SHARE_TOKEN_CREATE,
description=f"Share token created for job '{job_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"token_id": token_id, "label": request.label, "expires_in_days": request.expires_in_days},
)
return ShareTokenResponse(
id=token_id,
job_id=job_id,
created_by_email=current_user.email,
created_at=now,
expires_at=expires_at,
is_active=True,
label=request.label,
share_url=_share_url(token_id),
)
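The expiry expression above uses a truthiness test, so `expires_in_days=0` is treated the same as `None` and produces a token with no expiry. Whether that is intended is not stated in the source. The sketch below assumes that 0 should mean "expires immediately" and tests against `None` explicitly; `new_share_token` is a hypothetical helper, and the clock value is passed in so the function stays deterministic.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional


def new_share_token(
    expires_in_days: Optional[int], now: datetime
) -> tuple[str, Optional[datetime]]:
    """Return (token_id, expires_at) for a new share link.

    Hypothetical helper: assumes None means "no expiry" and that 0 is a
    real value, which differs from the truthiness check in the handler.
    """
    token_id = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    if expires_in_days is None:
        return token_id, None
    return token_id, now + timedelta(days=expires_in_days)
```

If 0 should instead be rejected, a `Field(ge=1)` constraint on the request model would make that explicit at validation time.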
@router.get("/jobs/{job_id}/share", response_model=ShareTokenListResponse)
async def list_share_tokens(
job_id: str,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all active share tokens for a job."""
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
cursor = db[_TOKENS].find({"job_id": job_id, "is_active": True})
tokens = []
async for doc in cursor:
tokens.append(ShareTokenResponse(
id=doc["_id"],
job_id=doc["job_id"],
created_by_email=doc["created_by_email"],
created_at=doc["created_at"],
expires_at=doc.get("expires_at"),
is_active=doc["is_active"],
label=doc.get("label"),
share_url=_share_url(doc["_id"]),
))
return ShareTokenListResponse(tokens=tokens)
@router.delete("/jobs/{job_id}/share/{token_id}", status_code=204)
async def revoke_share_token(
job_id: str,
token_id: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Revoke (deactivate) a share token."""
result = await db[_TOKENS].update_one(
{"_id": token_id, "job_id": job_id},
{"$set": {"is_active": False}},
)
if result.matched_count == 0:
raise HTTPException(status_code=404, detail="Token not found")
await audit_logger.log_action(
action=AuditAction.SHARE_TOKEN_REVOKE,
description=f"Share token '{token_id}' revoked for job '{job_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"token_id": token_id},
)
# ── Public route (no auth) ────────────────────────────────────────────────────
@router.get("/public/share/{token}", response_model=PublicJobPreviewResponse)
async def get_public_job_preview(
    token: str,
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    """Return read-only job preview for a valid share token. No authentication required."""
    token_doc = await db[_TOKENS].find_one({"_id": token, "is_active": True})
    if not token_doc:
        raise HTTPException(status_code=404, detail="Share link not found or has been revoked")
    if token_doc.get("expires_at") and token_doc["expires_at"] < datetime.utcnow():
        raise HTTPException(status_code=410, detail="Share link has expired")

    job_doc = await db[_JOBS].find_one({"_id": token_doc["job_id"]})
    if not job_doc:
        raise HTTPException(status_code=404, detail="Job not found")

    # Stored GCS field -> response attribute that receives its signed URL.
    url_fields = {
        "captions_vtt_gcs": "captions_vtt_url",
        "ad_vtt_gcs": "audio_description_vtt_url",
        "ad_mp3_gcs": "audio_description_mp3_url",
        "accessible_video_gcs": "accessible_video_mp4_url",
    }
    bucket_prefix = f"gs://{settings.gcs_bucket}/"

    outputs = job_doc.get("outputs") or {}
    language_outputs: dict[str, PublicJobPreviewLanguage] = {}
    for lang, lang_output in outputs.items():
        if not isinstance(lang_output, dict):
            continue
        lang_data = PublicJobPreviewLanguage()
        for gcs_key, url_attr in url_fields.items():
            if gcs_key not in lang_output:
                continue
            blob_path = lang_output[gcs_key].replace(bucket_prefix, "")
            try:
                setattr(lang_data, url_attr, await get_signed_download_url(blob_path, 6))
            except Exception:
                # Best effort: if signing fails, this URL stays None.
                pass
        language_outputs[lang] = lang_data

    return PublicJobPreviewResponse(
        job_id=str(job_doc["_id"]),
        job_title=job_doc.get("title", "Untitled"),
        job_status=job_doc.get("status", ""),
        source_language=job_doc.get("source", {}).get("language", "en"),
        languages=list(outputs.keys()),
        language_outputs=language_outputs,
    )
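The preview handler derives each blob path by calling `str.replace` with the `gs://<bucket>/` prefix, which removes that text wherever it occurs in the string. A small sketch of a stricter alternative using `str.removeprefix` (Python 3.9+) is shown here; `blob_path_from_gcs_uri` is a hypothetical helper and the bucket name `media` in the usage below is made up for illustration.

```python
def blob_path_from_gcs_uri(uri: str, bucket: str) -> str:
    """Strip a leading gs://<bucket>/ from `uri`.

    Hypothetical helper: strings that do not start with the prefix are
    returned unchanged, including URIs that point at a different bucket.
    """
    return uri.removeprefix(f"gs://{bucket}/")
```

For example, `blob_path_from_gcs_uri("gs://media/jobs/1/captions.vtt", "media")` yields `jobs/1/captions.vtt`, while a URI for another bucket passes through untouched so the caller can decide whether to reject it.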
@router.post("/public/share/{token}/decision", response_model=ClientDecisionResponse)
async def client_decision(
token: str,
request: ClientDecisionRequest,
http_request: Request,
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Submit client approval or rejection via a share link. No authentication required."""
from ...services.validation import asset_validation_service
token_doc = await db[_TOKENS].find_one({"_id": token, "is_active": True})
if not token_doc:
raise HTTPException(status_code=404, detail="Share link not found or has been revoked")
if token_doc.get("expires_at") and token_doc["expires_at"] < datetime.utcnow():
raise HTTPException(status_code=410, detail="Share link has expired")
job_id = token_doc["job_id"]
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
if job_doc.get("status") != "pending_final_review":
raise HTTPException(
status_code=409,
detail="This job is not currently awaiting client review"
)
now = datetime.utcnow()
by_label = f"client:{request.client_name or 'anonymous'} (share/{token[:8]})"
if request.action == "approve":
is_valid, validation_errors = await asset_validation_service.validate_job_assets(job_doc)
if not is_valid:
raise HTTPException(
status_code=400,
detail=f"Asset validation failed: {'; '.join(validation_errors)}"
)
new_status = "completed"
update = {
"$set": {
"status": new_status,
"review.notes": request.notes or "",
"updated_at": now,
},
"$push": {
"review.history": {
"at": now,
"status": new_status,
"by": by_label,
"notes": request.notes or "",
}
},
}
else:
new_status = "qc_feedback"
update = {
"$set": {
"status": new_status,
"review.notes": request.notes or "",
"review.reviewer_id": by_label,
"updated_at": now,
},
"$push": {
"review.history": {
"at": now,
"status": new_status,
"by": by_label,
"notes": request.notes or "",
}
},
}
result = await db[_JOBS].find_one_and_update(
{"_id": job_id, "status": "pending_final_review"},
update,
return_document=True,
)
if not result:
raise HTTPException(
status_code=409,
detail="Decision could not be submitted — the job status may have changed"
)
await audit_logger.log_action(
action=AuditAction.SHARE_CLIENT_DECISION,
description=f"Client '{request.client_name or 'anonymous'}' submitted decision '{request.action}' for job '{job_id}' via share token",
user=None,
request=http_request,
resource_type="job",
resource_id=job_id,
details={
"action": request.action,
"token": token,
"client_name": request.client_name,
"new_status": new_status,
"notes": request.notes,
},
)
if request.action == "approve":
try:
from ...tasks.notify import notify_client_task
notify_client_task.delay(job_id)
except Exception:
pass
return ClientDecisionResponse(status="ok", new_job_status=new_status)
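The approve and reject branches above build nearly identical update documents; the reject branch additionally sets `review.reviewer_id`. As a sketch under that reading, the shared shape could be produced by one pure function. `build_decision_update` is a hypothetical helper not present in the source, and it deliberately leaves asset validation and the conditional `find_one_and_update` to the caller.

```python
from datetime import datetime, timezone
from typing import Optional


def build_decision_update(
    action: str, notes: Optional[str], by_label: str, now: datetime
) -> tuple[str, dict]:
    """Return (new_status, update_document) for a client decision.

    Hypothetical helper: status names and field paths mirror the handler above.
    """
    new_status = "completed" if action == "approve" else "qc_feedback"
    set_op = {
        "status": new_status,
        "review.notes": notes or "",
        "updated_at": now,
    }
    if action != "approve":
        # Only a rejection records who sent the job back.
        set_op["review.reviewer_id"] = by_label
    history_entry = {
        "at": now,
        "status": new_status,
        "by": by_label,
        "notes": notes or "",
    }
    return new_status, {"$set": set_op, "$push": {"review.history": history_entry}}
```

The returned document can be passed to the same status-guarded `find_one_and_update` call, so the optimistic check on `pending_final_review` is unchanged.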


@ -1,18 +1,18 @@
import asyncio import asyncio
import time import time
from typing import Literal from typing import Literal, Optional
from fastapi import APIRouter, Depends, HTTPException, Query from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi.responses import Response from fastapi.responses import Response
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from ...core.config import settings from ...core.config import settings
from ...core.dependencies import get_current_user
from ...core.logging import get_logger from ...core.logging import get_logger
from ...services import cost_tracker
from ...services.elevenlabs_voices import elevenlabs_voice_service
from ...services.gemini_tts import gemini_tts_service from ...services.gemini_tts import gemini_tts_service
from ...services.elevenlabs_voices import elevenlabs_voice_service
from ...services.tts import tts_service from ...services.tts import tts_service
from ...services import cost_tracker
from ...core.dependencies import get_current_user
logger = get_logger(__name__) logger = get_logger(__name__)
@ -30,20 +30,20 @@ class VoicePreviewRequest(BaseModel):
style_preset: Literal[ style_preset: Literal[
"neutral", "calm", "energetic", "professional", "warm", "documentary", "custom" "neutral", "calm", "energetic", "professional", "warm", "documentary", "custom"
] = "neutral" ] = "neutral"
custom_style_prompt: str | None = None custom_style_prompt: Optional[str] = None
# ElevenLabs-specific # ElevenLabs-specific
stability: float | None = Field(default=None, ge=0.0, le=1.0) stability: Optional[float] = Field(default=None, ge=0.0, le=1.0)
similarity_boost: float | None = Field(default=None, ge=0.0, le=1.0) similarity_boost: Optional[float] = Field(default=None, ge=0.0, le=1.0)
class VoiceInfo(BaseModel): class VoiceInfo(BaseModel):
"""Structured voice information for any provider.""" """Structured voice information for any provider."""
id: str id: str
name: str name: str
description: str | None = None description: Optional[str] = None
preview_url: str | None = None preview_url: Optional[str] = None
labels: dict[str, str] | None = None labels: Optional[dict[str, str]] = None
category: str | None = None category: Optional[str] = None
class ProviderVoicesResponse(BaseModel): class ProviderVoicesResponse(BaseModel):
@ -52,7 +52,7 @@ class ProviderVoicesResponse(BaseModel):
voices: list[VoiceInfo] voices: list[VoiceInfo]
default: str default: str
available: bool = True available: bool = True
error: str | None = None error: Optional[str] = None
class LanguagesResponse(BaseModel): class LanguagesResponse(BaseModel):
@ -87,12 +87,12 @@ class ProviderOptionsResponse(BaseModel):
"""Available TTS configuration options for a provider.""" """Available TTS configuration options for a provider."""
provider: str provider: str
# Gemini-specific # Gemini-specific
models: list[TTSOptionItem] | None = None models: Optional[list[TTSOptionItem]] = None
style_presets: list[TTSOptionItem] | None = None style_presets: Optional[list[TTSOptionItem]] = None
speed_range: SpeedRange | None = None speed_range: Optional[SpeedRange] = None
# ElevenLabs-specific # ElevenLabs-specific
stability_range: FloatRange | None = None stability_range: Optional[FloatRange] = None
similarity_boost_range: FloatRange | None = None similarity_boost_range: Optional[FloatRange] = None
@router.get("/voices", response_model=ProviderVoicesResponse) @router.get("/voices", response_model=ProviderVoicesResponse)

@ -1,151 +0,0 @@
"""VTT version control endpoints."""
from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_job_or_403, get_membership_context
from ...core.config import settings
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.user import User, UserRole
from ...models.vtt_version import (
VttDiffResponse,
VttKind,
VttVersionListResponse,
VttVersionSummary,
)
from ...services import vtt_versioning
from ...services.audit_logger import audit_logger
from ...services.gcs import gcs_service
router = APIRouter(prefix="/jobs", tags=["vtt-versions"])
_EDITABLE_ROLES = (UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)
@router.get("/{job_id}/vtt/versions", response_model=VttVersionListResponse)
async def list_vtt_versions(
job_id: str,
lang: str = Query(...),
kind: VttKind = Query(...),
skip: int = Query(0, ge=0),
limit: int = Query(50, ge=1, le=200),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all VTT versions for a job/lang/kind, newest first."""
await get_job_or_403(job_id, ctx, db) # org check
return await vtt_versioning.list_versions(db, job_id, lang, kind, skip, limit)
@router.get("/{job_id}/vtt/versions/{version}", response_model=dict)
async def get_vtt_version(
job_id: str,
version: int,
lang: str = Query(...),
kind: VttKind = Query(...),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Get full VTT content for a specific version."""
await get_job_or_403(job_id, ctx, db) # org check
v = await vtt_versioning.get_version(db, job_id, lang, kind, version)
if not v:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Version not found")
return {
"job_id": v.job_id,
"lang": v.lang,
"kind": v.kind,
"version": v.version,
"content": v.content,
"gcs_uri": v.gcs_uri,
"created_at": v.created_at.isoformat(),
"created_by": v.created_by.dict(),
"note": v.note,
"parent_version": v.parent_version,
"cue_count": v.cue_count,
"byte_size": v.byte_size,
}
@router.get("/{job_id}/vtt/versions/diff", response_model=VttDiffResponse)
async def diff_vtt_versions(
job_id: str,
lang: str = Query(...),
kind: VttKind = Query(...),
from_version: int = Query(..., alias="from"),
to_version: int = Query(..., alias="to"),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Line-level diff between two versions of a VTT file."""
await get_job_or_403(job_id, ctx, db) # org check
v_from = await vtt_versioning.get_version(db, job_id, lang, kind, from_version)
v_to = await vtt_versioning.get_version(db, job_id, lang, kind, to_version)
if not v_from:
raise HTTPException(status_code=404, detail=f"Version {from_version} not found")
if not v_to:
raise HTTPException(status_code=404, detail=f"Version {to_version} not found")
return vtt_versioning.diff_versions(job_id, lang, kind, v_from, v_to)
@router.post(
"/{job_id}/vtt/versions/{version}/restore",
response_model=VttVersionSummary,
status_code=status.HTTP_201_CREATED,
)
async def restore_vtt_version(
job_id: str,
version: int,
lang: str = Query(...),
kind: VttKind = Query(...),
http_request: Request = None,
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""
Restore a previous version as the new live VTT.
Non-destructive: creates a new version entry whose content mirrors the old one,
then overwrites the live GCS file.
"""
await get_job_or_403(job_id, ctx, db) # org check
src = await vtt_versioning.get_version(db, job_id, lang, kind, version)
if not src:
raise HTTPException(status_code=404, detail="Version not found")
# Create new version snapshot (this also bumps the counter)
new_ver = await vtt_versioning.restore_version(db, job_id, lang, kind, version, current_user)
# Overwrite the live file in GCS so the QC editor sees the restored content
live_path = f"{job_id}/{lang}/{'captions' if kind == 'captions' else 'ad'}.vtt"
try:
await gcs_service.upload_text_to_gcs(src.content, live_path, "text/vtt")
except Exception as exc:
raise HTTPException(
status_code=500,
detail=f"Version snapshot created (v{new_ver.version}) but live file update failed: {exc}",
) from None
# Update the GCS URI pointer in the job document
gcs_uri_key = "captions_vtt_gcs" if kind == "captions" else "ad_vtt_gcs"
new_gcs_uri = f"gs://{settings.gcs_bucket}/{live_path}"
await db.jobs.update_one(
{"_id": job_id},
{"$set": {f"outputs.{lang}.{gcs_uri_key}": new_gcs_uri}},
)
await audit_logger.log_action(
action=AuditAction.VTT_EDIT,
description=f"VTT restored to v{version} for job {job_id} lang={lang} kind={kind}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "kind": kind, "restored_from_version": version, "new_version": new_ver.version},
)
return new_ver
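
The restore endpoint's docstring describes a non-destructive restore: the old content is copied into a new version entry instead of rewinding history. A minimal sketch of that append-only idea, assuming an in-memory list in place of the `vtt_versions` collection; the `Version` dataclass, the `restored_from` field and the `restore` helper are hypothetical and are not the project's `vtt_versioning` API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    version: int
    content: str
    restored_from: Optional[int] = None  # hypothetical bookkeeping field


def restore(history: List[Version], target: int) -> Version:
    """Append a new version whose content copies `target`; nothing is deleted."""
    source = next((v for v in history if v.version == target), None)
    if source is None:
        raise KeyError(f"Version {target} not found")
    new_version = Version(
        version=max(v.version for v in history) + 1,  # bump the counter
        content=source.content,
        restored_from=source.version,
    )
    history.append(new_version)
    return new_version


history = [
    Version(1, "WEBVTT\n\nfirst draft"),
    Version(2, "WEBVTT\n\nsecond draft"),
]
restored = restore(history, 1)
```

Because every restore is itself a new entry, an unwanted restore can be undone the same way, by restoring the version that preceded it.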

@ -5,140 +5,101 @@ Provides WebSocket endpoints for:
1. Individual job status updates: /ws/jobs/{job_id} 1. Individual job status updates: /ws/jobs/{job_id}
2. Job list updates: /ws/jobs (all jobs for authenticated user) 2. Job list updates: /ws/jobs (all jobs for authenticated user)
""" """
import asyncio
import logging import logging
from typing import Optional
from fastapi import ( from fastapi import APIRouter, WebSocket, WebSocketDisconnect, HTTPException, Depends, Query
APIRouter,
Depends,
Query,
WebSocket,
WebSocketDisconnect,
)
from fastapi.security import HTTPBearer from fastapi.security import HTTPBearer
from ...core.authz import PLATFORM_ADMIN_ROLES, _cached_memberships
from ...core.database import get_database
from ...models.user import UserRole
from ...services.websocket import ( from ...services.websocket import (
ConnectionManager,
authenticate_websocket,
connection_manager, connection_manager,
authenticate_websocket,
get_connection_manager, get_connection_manager,
ConnectionManager
) )
from ...models.job import Job
from ...core.database import get_database
from ...core.dependencies import get_current_user
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
router = APIRouter(tags=["WebSocket"]) router = APIRouter(tags=["WebSocket"])
security = HTTPBearer() security = HTTPBearer()
# Close codes that indicate a permanent auth/permission failure — frontend must NOT retry
_TERMINAL_CLOSE_CODES = {4001, 4003, 4004, 4403}
# Seconds between server-side keepalive frames.
# Must be < Apache mod_proxy_wstunnel idle timeout.
# Mod Comms incident 2026-03-18: 25s was insufficient; 20s is safe.
_KEEPALIVE_INTERVAL_S = 20
async def _resolve_user_and_org(websocket: WebSocket, user_id: str, db):
"""
Fetch user document and resolve org memberships from cache.
Returns (user_doc, memberships_dict) or closes the socket and returns (None, None).
"""
user = await db["users"].find_one({"_id": user_id})
if not user:
try:
from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass
if not user:
await websocket.close(code=4001, reason="User not found")
return None, None
is_platform_admin = UserRole(user.get("role", "")) in PLATFORM_ADMIN_ROLES
if is_platform_admin:
return user, None # None memberships = unrestricted
memberships = await _cached_memberships(user_id, db)
return user, memberships
def _can_access_org(org_id: str | None, memberships: dict | None) -> bool:
"""Return True if user (with these memberships) may access the given org_id."""
if memberships is None:
return True # platform admin
if not org_id:
return True # legacy job without org: allow (further checks done below if needed)
return org_id in memberships
@router.websocket("/ws/jobs/{job_id}") @router.websocket("/ws/jobs/{job_id}")
async def websocket_job_status( async def websocket_job_status(
websocket: WebSocket, websocket: WebSocket,
job_id: str, job_id: str,
token: str | None = Query(None), token: Optional[str] = Query(None),
manager: ConnectionManager = Depends(get_connection_manager) manager: ConnectionManager = Depends(get_connection_manager)
): ):
""" """
WebSocket endpoint for real-time job status updates. WebSocket endpoint for real-time job status updates
Usage: Usage:
- Connect: ws://localhost:8000/api/v1/ws/jobs/{job_id}?token={jwt_token} - Connect: ws://localhost:8000/api/v1/ws/jobs/{job_id}?token={jwt_token}
- Receives: Real-time status updates for the specific job - Receives: Real-time status updates for the specific job
Close codes: Message format:
4001 user not found {
4003 role-based access denied "type": "job_status_update",
4004 job not found "data": {
4403 org membership access denied (do not retry) "job_id": "...",
"status": "processing",
"updated_at": "2023-...",
"message": "Processing video...",
"progress": 45
}
}
""" """
# Authenticate the WebSocket connection
user_id = await authenticate_websocket(websocket, token) user_id = await authenticate_websocket(websocket, token)
if not user_id: if not user_id:
return return
try: try:
# Verify user has access to this job
db = await get_database() db = await get_database()
jobs_collection = db["jobs"]
job = await db["jobs"].find_one({"_id": job_id}) job = await jobs_collection.find_one({"_id": job_id})
if not job: if not job:
await websocket.close(code=4004, reason="Job not found") await websocket.close(code=4004, reason="Job not found")
return return
user, memberships = await _resolve_user_and_org(websocket, user_id, db) # Check permissions - users can only access their own jobs unless they're admin/reviewer
if user is None: user = await db["users"].find_one({"_id": user_id})
return # socket already closed inside helper if not user:
try:
from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass # Invalid ObjectId format
# Role-based client restriction if not user:
await websocket.close(code=4001, reason="User not found")
return
# Check access permissions
if user["role"] == "client" and job.get("created_by") != user_id: if user["role"] == "client" and job.get("created_by") != user_id:
await websocket.close(code=4003, reason="Access denied") await websocket.close(code=4003, reason="Access denied")
return return
# Org membership check # Connect to job status updates
job_org = job.get("organization_id")
if not _can_access_org(job_org, memberships):
await websocket.close(code=4403, reason="Org access denied")
return
await manager.connect_job_status(websocket, user_id, job_id) await manager.connect_job_status(websocket, user_id, job_id)
# Keep connection alive and handle incoming messages
while True: while True:
try: try:
# Wait up to _KEEPALIVE_INTERVAL_S for a client message. # Wait for incoming WebSocket messages (for heartbeat, etc.)
# On timeout send a keepalive frame so the proxy idle timer resets. message = await websocket.receive_text()
message = await asyncio.wait_for(
websocket.receive_text(),
timeout=_KEEPALIVE_INTERVAL_S,
)
logger.debug(f"Received WebSocket message from user {user_id}: {message}") logger.debug(f"Received WebSocket message from user {user_id}: {message}")
# Handle heartbeat or other client messages if needed
if message == "ping": if message == "ping":
await websocket.send_text("pong") await websocket.send_text("pong")
except TimeoutError:
await websocket.send_text("keepalive")
except WebSocketDisconnect: except WebSocketDisconnect:
break break
except Exception as e: except Exception as e:
@ -156,48 +117,69 @@ async def websocket_job_status(
@router.websocket("/ws/jobs") @router.websocket("/ws/jobs")
async def websocket_job_list( async def websocket_job_list(
websocket: WebSocket, websocket: WebSocket,
token: str | None = Query(None), token: Optional[str] = Query(None),
manager: ConnectionManager = Depends(get_connection_manager) manager: ConnectionManager = Depends(get_connection_manager)
): ):
""" """
WebSocket endpoint for real-time job list updates. WebSocket endpoint for real-time job list updates
Usage: Usage:
- Connect: ws://localhost:8000/api/v1/ws/jobs?token={jwt_token} - Connect: ws://localhost:8000/api/v1/ws/jobs?token={jwt_token}
- Receives: Real-time status updates for all jobs the user can access - Receives: Real-time status updates for all jobs the user can access
Only events for jobs in the user's accessible orgs are delivered. Message format:
{
"type": "job_list_update",
"data": {
"job_id": "...",
"status": "processing",
"updated_at": "2023-...",
"message": "Processing video...",
"progress": 45
}
}
""" """
# Authenticate the WebSocket connection
user_id = await authenticate_websocket(websocket, token) user_id = await authenticate_websocket(websocket, token)
if not user_id: if not user_id:
return return
try: try:
# Verify user exists
logger.info(f"WebSocket: Looking up user {user_id} in database") logger.info(f"WebSocket: Looking up user {user_id} in database")
db = await get_database() db = await get_database()
user, memberships = await _resolve_user_and_org(websocket, user_id, db) # Try looking up user by string ID first, then by ObjectId
if user is None: user = await db["users"].find_one({"_id": user_id})
return # socket already closed inside helper if not user:
try:
from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass # Invalid ObjectId format
if not user:
logger.warning(f"WebSocket: User {user_id} not found in database (tried both string and ObjectId)")
await websocket.close(code=4001, reason="User not found")
return
logger.info(f"WebSocket: User {user_id} found, role: {user.get('role', 'unknown')}") logger.info(f"WebSocket: User {user_id} found, role: {user.get('role', 'unknown')}")
accessible_org_ids = None if memberships is None else list(memberships.keys()) logger.info(f"WebSocket: User {user_id} found, connecting to job list updates")
await manager.connect_job_list(websocket, user_id, accessible_org_ids=accessible_org_ids) # Connect to job list updates
await manager.connect_job_list(websocket, user_id)
# Keep connection alive and handle incoming messages
while True: while True:
try: try:
message = await asyncio.wait_for( # Wait for incoming WebSocket messages
websocket.receive_text(), message = await websocket.receive_text()
timeout=_KEEPALIVE_INTERVAL_S,
)
logger.debug(f"Received WebSocket message from user {user_id}: {message}") logger.debug(f"Received WebSocket message from user {user_id}: {message}")
# Handle heartbeat or other client messages if needed
if message == "ping": if message == "ping":
await websocket.send_text("pong") await websocket.send_text("pong")
except TimeoutError:
await websocket.send_text("keepalive")
except WebSocketDisconnect: except WebSocketDisconnect:
break break
except Exception as e: except Exception as e:
@ -214,7 +196,10 @@ async def websocket_job_list(
@router.get("/ws/status") @router.get("/ws/status")
async def websocket_status(): async def websocket_status():
"""Get WebSocket connection status and statistics (debug/monitoring).""" """
Get WebSocket connection status and statistics
Useful for debugging and monitoring
"""
stats = { stats = {
"active_connections": len(connection_manager.active_connections), "active_connections": len(connection_manager.active_connections),
"job_subscriptions": len(connection_manager.job_subscriptions), "job_subscriptions": len(connection_manager.job_subscriptions),
@ -225,4 +210,5 @@ async def websocket_status():
not connection_manager.subscriber_task.done() not connection_manager.subscriber_task.done()
) )
} }
return stats return stats
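
The keepalive side of this diff wraps `receive_text()` in `asyncio.wait_for` so that a silent client still produces a frame before the proxy's idle timer fires. A minimal sketch of that loop, assuming an `asyncio.Queue` in place of the WebSocket and a list in place of the outgoing stream; `serve`, `demo` and the shortened interval are illustrative only.

```python
import asyncio
from typing import List

KEEPALIVE_INTERVAL_S = 0.05  # shortened for the demo; the diff uses 20


async def serve(inbox: asyncio.Queue, outbox: List[str], max_frames: int) -> None:
    """Answer "ping" with "pong"; emit "keepalive" whenever the client is silent."""
    while len(outbox) < max_frames:
        try:
            message = await asyncio.wait_for(inbox.get(), timeout=KEEPALIVE_INTERVAL_S)
        except asyncio.TimeoutError:
            # No client message within the interval: send a frame anyway.
            outbox.append("keepalive")
            continue
        if message == "ping":
            outbox.append("pong")


async def demo() -> List[str]:
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: List[str] = []
    await inbox.put("ping")  # one client message, then silence
    await serve(inbox, outbox, max_frames=2)
    return outbox


frames = asyncio.run(demo())
```

The sketch catches `asyncio.TimeoutError`, which works on every supported Python version. The builtin `TimeoutError` is only the same class from Python 3.11 onward, so a bare `except TimeoutError:` around `asyncio.wait_for` would miss the timeout on older interpreters.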

Binary file not shown.

@ -11,6 +11,7 @@ Provides:
import json import json
from dataclasses import dataclass from dataclasses import dataclass
from typing import Optional
from fastapi import Depends, HTTPException, status from fastapi import Depends, HTTPException, status
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
@ -63,10 +64,10 @@ async def _cached_memberships(
db: AsyncIOMotorDatabase, db: AsyncIOMotorDatabase,
) -> dict[str, OrgRole]: ) -> dict[str, OrgRole]:
"""Load memberships, with Redis cache (60s TTL).""" """Load memberships, with Redis cache (60s TTL)."""
cache_key = f"mem:user:{user_id}"
try: try:
redis = await get_redis() redis = get_redis()
if redis: if redis:
cache_key = f"mem:user:{user_id}"
cached = await redis.get(cache_key) cached = await redis.get(cache_key)
if cached: if cached:
raw = json.loads(cached) raw = json.loads(cached)
@ -77,7 +78,7 @@ async def _cached_memberships(
memberships = await _load_memberships(user_id, db) memberships = await _load_memberships(user_id, db)
try: try:
redis = await get_redis() redis = get_redis()
if redis: if redis:
await redis.setex( await redis.setex(
cache_key, cache_key,
@ -158,7 +159,7 @@ class OrgScopedQuery:
def filter( def filter(
self, self,
base_query: dict, base_query: dict,
org_id: str | None = None, org_id: Optional[str] = None,
org_field: str = "organization_id", org_field: str = "organization_id",
) -> dict: ) -> dict:
if self.ctx.is_platform_admin: if self.ctx.is_platform_admin:
@ -182,50 +183,6 @@ class OrgScopedQuery:
return {**base_query, org_field: {"$in": accessible}} return {**base_query, org_field: {"$in": accessible}}
def assert_user_in_org(
ctx: "MembershipContext",
org_id: str,
min_role: OrgRole = OrgRole.VIEWER,
) -> None:
"""Raise 403 if ctx user does not have min_role in org_id. Platform admins always pass."""
if not ctx.can_access_org(org_id, min_role):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access to this organization is not permitted",
)
async def get_job_or_403(
job_id: str,
ctx: "MembershipContext",
db: AsyncIOMotorDatabase,
) -> dict:
"""Load job document and verify ctx user can access its organization. Returns 404 for missing jobs."""
job_doc = await db.jobs.find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
org_id = job_doc.get("organization_id")
if not org_id:
# Legacy job without org: try resolving via project
project_id = job_doc.get("project_id")
if project_id:
project = await db.projects.find_one({"_id": project_id}, {"client_id": 1})
if project:
org_id = project.get("client_id")
if org_id:
if not ctx.can_access_org(org_id):
# Return 404 to avoid leaking existence of cross-org jobs
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
else:
# Truly legacy job (no project, no org): only the original uploader or admin can access
if not ctx.is_platform_admin and job_doc.get("client_id") != str(ctx.user.id):
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
return job_doc
async def bump_user_membership_cache(user_id: str) -> None: async def bump_user_membership_cache(user_id: str) -> None:
"""Invalidate the Redis membership cache for a user (call on any membership write).""" """Invalidate the Redis membership cache for a user (call on any membership write)."""
try: try:
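
`_cached_memberships` follows a cache-aside pattern: read from Redis, fall back to the database on a miss, write the result back with a 60 second TTL, and let `bump_user_membership_cache` delete the key on any membership write. A minimal sketch of that flow, assuming a small in-memory class in place of the Redis client; `TTLCache`, `cached_memberships` and `load_from_db` are hypothetical stand-ins, and only the `mem:user:{user_id}` key format and the 60 second TTL come from the diff.

```python
import json
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Tiny in-memory stand-in for a Redis client with SETEX-like expiry."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def setex(self, key: str, ttl_s: float, value: str) -> None:
        self._store[key] = (time.monotonic() + ttl_s, value)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


def cached_memberships(user_id: str, cache: TTLCache, load: Callable[[str], dict]) -> dict:
    """Return memberships from the cache, loading and caching them on a miss."""
    cache_key = f"mem:user:{user_id}"  # built up front so both branches can use it
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    memberships = load(user_id)
    cache.setex(cache_key, 60, json.dumps(memberships))
    return memberships


calls: List[str] = []


def load_from_db(user_id: str) -> dict:
    calls.append(user_id)  # record each simulated database hit
    return {"org-1": "viewer"}


cache = TTLCache()
first = cached_memberships("u1", cache, load_from_db)   # miss: loads and caches
second = cached_memberships("u1", cache, load_from_db)  # hit: served from cache
cache.delete("mem:user:u1")                             # invalidate on membership write
third = cached_memberships("u1", cache, load_from_db)   # miss again after invalidation
```

Building the key before any cache call keeps it defined for the write-back path even when the first lookup fails, which appears to be the motivation for hoisting `cache_key` out of the `try` block on one side of this diff.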

@ -6,7 +6,6 @@ class Settings(BaseSettings):
# App # App
app_env: str = "dev" app_env: str = "dev"
api_base_url: str = "http://localhost:8000" api_base_url: str = "http://localhost:8000"
app_url: str = "https://optical-dev.oliver.solutions/video-accessibility"
# Auth # Auth
jwt_secret: str jwt_secret: str
@ -30,7 +29,6 @@ class Settings(BaseSettings):
# GCP # GCP
gcp_project_id: str gcp_project_id: str
gcp_location: str = "us-central1"
gcs_bucket: str = "accessible-video" gcs_bucket: str = "accessible-video"
google_application_credentials: str = "" google_application_credentials: str = ""
@ -52,7 +50,7 @@ class Settings(BaseSettings):
elevenlabs_voices: dict[str, str] = {} elevenlabs_voices: dict[str, str] = {}
# Gemini TTS Configuration # Gemini TTS Configuration
gemini_tts_model: str = "gemini-3.1-flash-tts-preview" gemini_tts_model: str = "gemini-2.5-flash-preview-tts"
gemini_tts_default_voice: str = "Kore" gemini_tts_default_voice: str = "Kore"
gemini_tts_voices: list[str] = [ gemini_tts_voices: list[str] = [
"Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede", "Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede",
@ -95,24 +93,7 @@ class Settings(BaseSettings):
"sv": "sv-SE", "sv": "sv-SE",
"es-419": "es-US", "es-419": "es-US",
"pt-BR": "pt-BR", "pt-BR": "pt-BR",
"fr-CA": "fr-CA", "fr-CA": "fr-CA"
# Explicit region variants (added for locale-aware glossary support)
"de-DE": "de-DE",
"en-US": "en-US",
"en-GB": "en-GB",
"en-CA": "en-CA",
"es-ES": "es-ES",
"es-MX": "es-US",
"fr-FR": "fr-FR",
"it-IT": "it-IT",
"ja-JP": "ja-JP",
"ko-KR": "ko-KR",
"nl-NL": "nl-NL",
"pl-PL": "pl-PL",
"cs-CZ": "cs-CZ",
"tr-TR": "tr-TR",
"id-ID": "id-ID",
"pt-PT": "pt-PT",
} }
gemini_tts_language_names: dict[str, str] = { gemini_tts_language_names: dict[str, str] = {
"en": "English", "en": "English",
@ -148,24 +129,7 @@ class Settings(BaseSettings):
"sv": "Swedish", "sv": "Swedish",
"es-419": "Spanish (Latin America)", "es-419": "Spanish (Latin America)",
"pt-BR": "Portuguese (Brazil)", "pt-BR": "Portuguese (Brazil)",
"fr-CA": "French (Canada)", "fr-CA": "French (Canada)"
# Explicit region variants
"de-DE": "German (Germany)",
"en-US": "English (US)",
"en-GB": "English (UK)",
"en-CA": "English (Canada)",
"es-ES": "Spanish (Spain)",
"es-MX": "Spanish (Mexico)",
"fr-FR": "French (France)",
"it-IT": "Italian (Italy)",
"ja-JP": "Japanese (Japan)",
"ko-KR": "Korean (Korea)",
"nl-NL": "Dutch (Netherlands)",
"pl-PL": "Polish (Poland)",
"cs-CZ": "Czech (Czech Republic)",
"tr-TR": "Turkish (Turkey)",
"id-ID": "Indonesian (Indonesia)",
"pt-PT": "Portuguese (Portugal)",
} }
gemini_tts_preview_samples: dict[str, str] = { gemini_tts_preview_samples: dict[str, str] = {
"en": "This is a preview of the audio description voice.", "en": "This is a preview of the audio description voice.",
@ -201,30 +165,13 @@ class Settings(BaseSettings):
"sv": "Det här är en förhandsgranskning av ljudbeskrivningsrösten.", "sv": "Det här är en förhandsgranskning av ljudbeskrivningsrösten.",
"es-419": "Esta es una vista previa de la voz de audiodescripción.", "es-419": "Esta es una vista previa de la voz de audiodescripción.",
"pt-BR": "Esta é uma prévia da voz da audiodescrição.", "pt-BR": "Esta é uma prévia da voz da audiodescrição.",
"fr-CA": "Ceci est un aperçu de la voix de l'audiodescription.", "fr-CA": "Ceci est un aperçu de la voix de l'audiodescription."
# Explicit region variants
"de-DE": "Dies ist eine Vorschau der Audiodeskriptionsstimme.",
"en-US": "This is a preview of the audio description voice.",
"en-GB": "This is a preview of the audio description voice.",
"en-CA": "This is a preview of the audio description voice.",
"es-ES": "Esta es una vista previa de la voz de audiodescripción.",
"es-MX": "Esta es una vista previa de la voz de audiodescripción.",
"fr-FR": "Ceci est un aperçu de la voix de l'audiodescription.",
"it-IT": "Questa è un'anteprima della voce dell'audiodescrizione.",
"ja-JP": "これは音声解説の声のプレビューです。",
"ko-KR": "이것은 오디오 설명 음성의 미리보기입니다.",
"nl-NL": "Dit is een voorbeeld van de audiodescriptiestem.",
"pl-PL": "To jest podgląd głosu audiodeskrypcji.",
"cs-CZ": "Toto je náhled hlasu zvukového popisu.",
"tr-TR": "Bu, sesli betimleme sesinin bir önizlemesidir.",
"id-ID": "Ini adalah pratinjau suara deskripsi audio.",
"pt-PT": "Esta é uma pré-visualização da voz da audiodescrição.",
} }
# Gemini TTS Model Options # Gemini TTS Model Options
gemini_tts_models: dict[str, str] = { gemini_tts_models: dict[str, str] = {
"flash": "gemini-3.1-flash-tts-preview", # Fast, cost-efficient (Preview) "flash": "gemini-2.5-flash-preview-tts", # Fast, cost-efficient
"pro": "gemini-2.5-pro-tts", # Higher quality (GA) "pro": "gemini-2.5-pro-preview-tts", # Higher quality
} }
# Gemini TTS Style Presets - prompts prepended to text for style control # Gemini TTS Style Presets - prompts prepended to text for style control
@ -249,14 +196,6 @@ class Settings(BaseSettings):
whisper_sentence_gap_threshold: float = 0.5 # Gap duration to classify as sentence boundary whisper_sentence_gap_threshold: float = 0.5 # Gap duration to classify as sentence boundary
whisper_phrase_gap_threshold: float = 0.3 # Gap duration to classify as phrase boundary whisper_phrase_gap_threshold: float = 0.3 # Gap duration to classify as phrase boundary
whisper_min_gap_threshold: float = 0.15 # Minimum gap duration to consider whisper_min_gap_threshold: float = 0.15 # Minimum gap duration to consider
# Forward-preferred snap windows (A2)
whisper_snap_forward_window: float = 4.0 # Prefer boundary up to N seconds ahead of Gemini point
whisper_snap_backward_window: float = 1.5 # Fall back to boundary up to N seconds behind
# Adaptive silence buffer (A1)
ad_silence_buffer_default: float = 0.5 # Base silence duration (s) before/after AD audio
ad_silence_buffer_min_after: float = 0.1 # Minimum silence after AD audio
# Minimum gap required at the chosen pause point (A3)
ad_min_acceptable_gap: float = 0.2 # Seconds; points with shorter gaps trigger forward search
# Cloud Run Service URLs (empty = use local processing) # Cloud Run Service URLs (empty = use local processing)
# When set, CPU-intensive work is offloaded to Cloud Run with autoscaling # When set, CPU-intensive work is offloaded to Cloud Run with autoscaling
@ -275,10 +214,11 @@ class Settings(BaseSettings):
ffmpeg_worker_concurrency: int = 4 # FFmpeg tasks on main worker ffmpeg_worker_concurrency: int = 4 # FFmpeg tasks on main worker
tts_worker_concurrency: int = 8 # TTS worker tts_worker_concurrency: int = 8 # TTS worker
# Email (Mailgun) # Email (Mailgun — primary; sendgrid_api_key kept for backward compat)
mailgun_api_key: str = "" mailgun_api_key: str = ""
mailgun_domain: str = "mg.oliver.solutions" mailgun_domain: str = "mg.oliver.solutions"
mailgun_from: str = "noreply@mg.oliver.solutions" mailgun_from: str = "noreply@mg.oliver.solutions"
sendgrid_api_key: str = ""
email_from: str = "noreply@mg.oliver.solutions" email_from: str = "noreply@mg.oliver.solutions"
client_base_url: str client_base_url: str
@ -297,10 +237,6 @@ class Settings(BaseSettings):
cost_tracker_source_app: str = "video-accessibility" cost_tracker_source_app: str = "video-accessibility"
cost_tracker_enabled: bool = True cost_tracker_enabled: bool = True
# Upload limits (T-14 — single source of truth)
upload_max_video_bytes: int = 2 * 1024 * 1024 * 1024 # 2GB
upload_signed_url_ttl_hours: int = 24 # signed URL lifetime
# CORS - comma-separated list of allowed origins # CORS - comma-separated list of allowed origins
cors_origins: str = "http://localhost:5173,http://localhost:5174,http://localhost:3000,http://localhost:6001" cors_origins: str = "http://localhost:5173,http://localhost:5174,http://localhost:3000,http://localhost:6001"

@ -64,19 +64,9 @@ async def create_indexes():
("error_message", "text") ("error_message", "text")
]) ])
# Per-language QC assignment index — for linguist queue queries
await db.jobs.create_index([("qc_assignments.linguist_id", 1), ("qc_assignments.status", 1)])
# Review notes collection indexes # Review notes collection indexes
await db.review_notes.create_index([("job_id", 1), ("asset_key", 1)]) await db.review_notes.create_index([("job_id", 1), ("asset_key", 1)])
await db.review_notes.create_index([("job_id", 1), ("asset_key", 1), ("timestamp_seconds", 1)]) await db.review_notes.create_index([("job_id", 1), ("asset_key", 1), ("timestamp_seconds", 1)])
await db.review_notes.create_index([("user_id", 1)]) await db.review_notes.create_index([("user_id", 1)])
# VTT versions collection indexes
await db.vtt_versions.create_index(
[("job_id", 1), ("lang", 1), ("kind", 1), ("version", -1)],
unique=True,
)
await db.vtt_versions.create_index([("job_id", 1), ("created_at", -1)])
logger.info("Database indexes created successfully") logger.info("Database indexes created successfully")

@ -1,16 +1,18 @@
from typing import Optional
from fastapi import Depends, HTTPException, Request, status from fastapi import Depends, HTTPException, Request, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from ..models.user import User, UserRole from ..models.user import User, UserRole
from .config import settings
from .database import get_database from .database import get_database
from .security import decode_token from .security import decode_token
security = HTTPBearer() security = HTTPBearer()
# Only admins bypass tenant isolation; other staff are scoped by team membership # Roles that see all jobs (no tenant isolation)
STAFF_ROLES = {UserRole.ADMIN} STAFF_ROLES = {UserRole.ADMIN, UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION}
async def get_current_user( async def get_current_user(
@ -19,13 +21,6 @@ async def get_current_user(
) -> User: ) -> User:
token = credentials.credentials token = credentials.credentials
payload = decode_token(token) payload = decode_token(token)
if payload.get("type") == "refresh":
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials",
)
user_id: str = payload.get("sub") user_id: str = payload.get("sub")
if user_id is None: if user_id is None:
@ -41,12 +36,7 @@ async def get_current_user(
detail="User not found", detail="User not found",
) )
user = User(**user_doc) return User(**user_doc)
# Attach org_ids hint from token as transient attribute (never used for authz)
token_org_ids = payload.get("org_ids", [])
if token_org_ids:
user.__dict__["org_ids"] = token_org_ids
return user
def require_role(required_role: UserRole): def require_role(required_role: UserRole):
@ -76,7 +66,7 @@ def require_roles(*required_roles: UserRole):
async def get_current_user_optional( async def get_current_user_optional(
request: Request, request: Request,
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
) -> User | None: ) -> Optional[User]:
authorization: str = request.headers.get("Authorization") authorization: str = request.headers.get("Authorization")
if not authorization: if not authorization:
return None return None
@ -87,9 +77,6 @@ async def get_current_user_optional(
return None return None
payload = decode_token(token) payload = decode_token(token)
if payload.get("type") == "refresh":
return None
user_id: str = payload.get("sub") user_id: str = payload.get("sub")
if user_id is None: if user_id is None:
@ -107,28 +94,21 @@ async def get_current_user_optional(
async def get_accessible_project_ids( async def get_accessible_project_ids(
user: User, user: User,
db: AsyncIOMotorDatabase, db: AsyncIOMotorDatabase,
) -> list[str] | None: ) -> Optional[list[str]]:
""" """
Returns project IDs the user may access, or None meaning "see everything". Returns project IDs the user may access, or None meaning "see everything".
- Admin → None (unrestricted) - Staff / Admin → None (unrestricted)
- Staff (REVIEWER/LINGUIST/PRODUCTION) → scoped by team membership; - Otherwise → projects in orgs where the user holds any membership
if not yet assigned to any team, falls back to None (see all) (falls back to legacy pm_client_ids/team lookups if no memberships found)
so existing staff aren't locked out before teams are configured
- PM → projects in accessible orgs/clients (pm_client_ids legacy)
- CLIENT → projects in orgs where the user holds any membership
""" """
if user.role in STAFF_ROLES: if user.role in STAFF_ROLES:
return None return None
# Primary path: use memberships collection (Phase 3 SaaS)
user_id = str(user.id) user_id = str(user.id)
membership_cursor = db.memberships.find({"user_id": user_id}, {"organization_id": 1})
# Primary path: use Redis-cached memberships (60s TTL, same cache as authz.py) org_ids = [doc["organization_id"] async for doc in membership_cursor]
from .authz import (
_cached_memberships, # local import to avoid circular dep at module level
)
memberships_map = await _cached_memberships(user_id, db)
org_ids = list(memberships_map.keys())
if org_ids: if org_ids:
projects = await db.projects.find( projects = await db.projects.find(
@ -137,98 +117,29 @@ async def get_accessible_project_ids(
).to_list(None) ).to_list(None)
return [str(p["_id"]) for p in projects] return [str(p["_id"]) for p in projects]
# Legacy fallback: team membership (used by REVIEWER/LINGUIST/PRODUCTION and legacy CLIENT) # Legacy fallback (pre-backfill) — keeps the app working before migration runs
teams = await db.teams.find( if user.role == UserRole.PROJECT_MANAGER:
{"member_user_ids": user_id}, client_ids = user.pm_client_ids or []
{"client_id": 1}, if not client_ids:
).to_list(None) return []
client_ids = list({t["client_id"] for t in teams})
if client_ids:
projects = await db.projects.find( projects = await db.projects.find(
{"client_id": {"$in": client_ids}, "is_active": True}, {"client_id": {"$in": client_ids}, "is_active": True},
{"_id": 1}, {"_id": 1},
).to_list(None) ).to_list(None)
return [str(p["_id"]) for p in projects] return [str(p["_id"]) for p in projects]
# PM legacy: scoped via pm_client_ids teams = await db.teams.find(
if user.role == UserRole.PROJECT_MANAGER: {"member_user_ids": user_id},
pm_client_ids = user.pm_client_ids or [] {"client_id": 1},
if not pm_client_ids: ).to_list(None)
return [] client_ids = list({t["client_id"] for t in teams})
projects = await db.projects.find( if not client_ids:
{"client_id": {"$in": pm_client_ids}, "is_active": True}, return []
{"_id": 1}, projects = await db.projects.find(
).to_list(None) {"client_id": {"$in": client_ids}, "is_active": True},
return [str(p["_id"]) for p in projects] {"_id": 1},
).to_list(None)
# Staff with no team assignments → unrestricted until teams are configured return [str(p["_id"]) for p in projects]
if user.role in {UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION}:
return None
# CLIENT with no memberships and no teams → show nothing
return []
async def get_user_org_ids(user: User, db: AsyncIOMotorDatabase) -> list[str] | None:
"""Return org IDs the user belongs to, or None meaning unrestricted (ADMIN).
Priority: memberships → pm_client_ids (PM legacy) → team.member_user_ids (staff legacy)
"""
if user.role == UserRole.ADMIN:
return None
user_id = str(user.id)
# Primary: Membership collection
org_ids: list[str] = []
async for m in db.memberships.find({"user_id": user_id}, {"organization_id": 1}):
if m.get("organization_id"):
org_ids.append(str(m["organization_id"]))
if org_ids:
return org_ids
# PM legacy: pm_client_ids
if user.role == UserRole.PROJECT_MANAGER:
return list(user.pm_client_ids or [])
# Staff legacy: team.member_user_ids
teams = await db.teams.find({"member_user_ids": user_id}, {"client_id": 1}).to_list(None)
if teams:
return [str(t["client_id"]) for t in teams if t.get("client_id")]
return []
async def assert_job_in_user_org(job: dict, user: User, db: AsyncIOMotorDatabase) -> None:
"""Raise 404 (not 403) when user cannot access this job — avoids information disclosure."""
if user.role == UserRole.ADMIN:
return
org_ids = await get_user_org_ids(user, db)
if org_ids is None:
return # unrestricted
job_org = job.get("organization_id")
if job_org:
if job_org in org_ids:
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
# No organization_id — try project fallback
project_id = job.get("project_id")
if project_id:
project = await db.projects.find_one({"_id": project_id}, {"client_id": 1})
if project and project.get("client_id") in org_ids:
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
# Legacy: client_id == creator user_id
job_client_id = job.get("client_id")
if job_client_id and job_client_id == str(user.id):
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
def require_pm_for_client(client_id_param: str = "client_id"): def require_pm_for_client(client_id_param: str = "client_id"):
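
Review note: the lookup priority described in the `get_user_org_ids` docstring on the `main` side (memberships first, then `pm_client_ids` for PMs, then team membership) can be sketched without a database. This is a minimal illustration only. The function name `resolve_org_ids`, the plain-string role values, and the pre-fetched list arguments are assumptions made for the example and do not appear in either branch.

```python
from __future__ import annotations

from typing import Optional


def resolve_org_ids(
    role: str,
    membership_org_ids: list[str],
    pm_client_ids: list[str],
    team_client_ids: list[str],
) -> Optional[list[str]]:
    """Return org IDs for a user, or None meaning unrestricted.

    Hypothetical in-memory stand-in for the Mongo lookups in the diff.
    """
    # Admins bypass tenant isolation entirely.
    if role == "admin":
        return None
    # Primary path: explicit memberships win over every legacy source.
    if membership_org_ids:
        return list(membership_org_ids)
    # PM legacy path: scoped by pm_client_ids, even when that list is empty.
    if role == "project_manager":
        return list(pm_client_ids)
    # Staff legacy path: client IDs of the teams the user belongs to.
    return list(team_client_ids)
```

Returning `None` for "unrestricted" and `[]` for "nothing visible" keeps the two cases distinct, so callers should test `is None` rather than truthiness.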

View file

@ -1,6 +1,10 @@
"""Enhanced configuration system with Secret Manager integration.""" """Enhanced configuration system with Secret Manager integration."""
import os
import asyncio
from typing import Dict, Optional, Any
from functools import lru_cache from functools import lru_cache
from pydantic_settings import BaseSettings
from .config import Settings as BaseConfig from .config import Settings as BaseConfig
from .logging import get_logger from .logging import get_logger
@ -17,7 +21,7 @@ class SecretsConfig(BaseConfig):
# Flag to track if secrets have been loaded # Flag to track if secrets have been loaded
self._secrets_loaded = False self._secrets_loaded = False
self._secret_values: dict[str, str] = {} self._secret_values: Dict[str, str] = {}
async def load_secrets(self) -> None: async def load_secrets(self) -> None:
"""Load secrets from Secret Manager asynchronously.""" """Load secrets from Secret Manager asynchronously."""
@ -36,6 +40,7 @@ class SecretsConfig(BaseConfig):
"mongodb_uri": "mongodb-url", "mongodb_uri": "mongodb-url",
"redis_url": "redis-url", "redis_url": "redis-url",
"gemini_api_key": "gemini-api-key", "gemini_api_key": "gemini-api-key",
"sendgrid_api_key": "sendgrid-api-key",
"elevenlabs_api_key": "elevenlabs-api-key", "elevenlabs_api_key": "elevenlabs-api-key",
"sentry_dsn": "sentry-dsn" "sentry_dsn": "sentry-dsn"
} }
@ -62,7 +67,7 @@ class SecretsConfig(BaseConfig):
logger.warning("Falling back to environment variables") logger.warning("Falling back to environment variables")
self._secrets_loaded = True # Mark as loaded to prevent retries self._secrets_loaded = True # Mark as loaded to prevent retries
def get_secret_value(self, field_name: str) -> str | None: def get_secret_value(self, field_name: str) -> Optional[str]:
"""Get a secret value if it was loaded from Secret Manager.""" """Get a secret value if it was loaded from Secret Manager."""
return self._secret_values.get(field_name) return self._secret_values.get(field_name)
@ -104,7 +109,7 @@ class SecretsConfig(BaseConfig):
# Global configuration instance # Global configuration instance
_config_instance: SecretsConfig | None = None _config_instance: Optional[SecretsConfig] = None
async def initialize_config() -> SecretsConfig: async def initialize_config() -> SecretsConfig:
@ -130,7 +135,7 @@ def get_settings() -> SecretsConfig:
return _config_instance return _config_instance
@lru_cache @lru_cache()
def get_settings_cached() -> SecretsConfig: def get_settings_cached() -> SecretsConfig:
"""Get cached settings instance.""" """Get cached settings instance."""
return get_settings() return get_settings()

View file

@ -1,5 +1,5 @@
from datetime import datetime, timedelta from datetime import datetime, timedelta
from typing import Any from typing import Any, Optional, Union
from fastapi import HTTPException, status from fastapi import HTTPException, status
from jose import JWTError, jwt from jose import JWTError, jwt
@ -11,24 +11,20 @@ pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
def create_access_token( def create_access_token(
subject: str | Any, subject: Union[str, Any], expires_delta: Optional[timedelta] = None
expires_delta: timedelta | None = None,
org_ids: list[str] | None = None,
) -> str: ) -> str:
if expires_delta: if expires_delta:
expire = datetime.utcnow() + expires_delta expire = datetime.utcnow() + expires_delta
else: else:
expire = datetime.utcnow() + timedelta(minutes=settings.jwt_access_ttl_min) expire = datetime.utcnow() + timedelta(minutes=settings.jwt_access_ttl_min)
to_encode: dict[str, Any] = {"exp": expire, "sub": str(subject), "v": 2} to_encode = {"exp": expire, "sub": str(subject)}
if org_ids:
to_encode["org_ids"] = org_ids
encoded_jwt = jwt.encode(to_encode, settings.jwt_secret, algorithm=settings.jwt_alg) encoded_jwt = jwt.encode(to_encode, settings.jwt_secret, algorithm=settings.jwt_alg)
return encoded_jwt return encoded_jwt
def create_refresh_token( def create_refresh_token(
subject: str | Any, expires_delta: timedelta | None = None subject: Union[str, Any], expires_delta: Optional[timedelta] = None
) -> str: ) -> str:
if expires_delta: if expires_delta:
expire = datetime.utcnow() + expires_delta expire = datetime.utcnow() + expires_delta
@ -41,8 +37,6 @@ def create_refresh_token(
def verify_password(plain_password: str, hashed_password: str) -> bool: def verify_password(plain_password: str, hashed_password: str) -> bool:
if not hashed_password:
return False
return pwd_context.verify(plain_password, hashed_password) return pwd_context.verify(plain_password, hashed_password)
@ -58,4 +52,4 @@ def decode_token(token: str) -> dict[str, Any]:
raise HTTPException( raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials", detail="Could not validate credentials",
) from None )
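
Review note: the `main` side of the dependency changes rejects any decoded payload whose `type` claim is `"refresh"` before reading `sub`. A rough sketch of that guard on a plain payload dict follows. `InvalidTokenType` and `ensure_access_token` are hypothetical names for this example, and raising a custom exception (instead of `HTTPException`) is an assumption made to keep it free of framework imports.

```python
from typing import Any


class InvalidTokenType(Exception):
    """Raised when a payload cannot be used as an access token (hypothetical)."""


def ensure_access_token(payload: dict[str, Any]) -> str:
    """Return the subject of an access-token payload, rejecting refresh tokens."""
    # A refresh token must never authenticate an API request.
    if payload.get("type") == "refresh":
        raise InvalidTokenType("refresh token presented as access token")
    subject = payload.get("sub")
    # A payload without a subject cannot be mapped to a user.
    if subject is None:
        raise InvalidTokenType("token has no subject")
    return str(subject)
```

In the diff itself the same condition maps to a 401 response in `get_current_user` and to `None` in `get_current_user_optional`.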

View file

@ -34,13 +34,7 @@ async def seed_default_admin(db) -> None:
print(f"✅ Default admin {DEFAULT_ADMIN_EMAIL} already exists") print(f"✅ Default admin {DEFAULT_ADMIN_EMAIL} already exists")
return return
password = os.environ.get("DEFAULT_ADMIN_PASSWORD") password = os.environ.get("DEFAULT_ADMIN_PASSWORD", "ChangeMe123!")
if not password:
print(
"⚠️ DEFAULT_ADMIN_PASSWORD not set — skipping default admin creation. "
"Set this env var and restart to create the admin account."
)
return
user_doc = { user_doc = {
"_id": str(ObjectId()), "_id": str(ObjectId()),
"email": DEFAULT_ADMIN_EMAIL, "email": DEFAULT_ADMIN_EMAIL,

Binary file not shown.

View file

@ -1,245 +0,0 @@
"""
Central locale registry.
Provides a single source of truth for BCP-47 codes, display names,
and Gemini-friendly labels used throughout the translation/TTS pipeline.
Convention: BCP-47 with hyphen separator (fr-FR, en-GB, pt-BR).
xlsx underscore format (fr_fr, en_gb) is normalized at import time.
Bare language-only codes (fr, en) remain valid for legacy compat.
"""
from __future__ import annotations
from dataclasses import dataclass
@dataclass(frozen=True)
class Locale:
code: str # canonical BCP-47 (e.g. "fr-FR")
display_name: str # human-readable (e.g. "French (France)")
gemini_label: str # what to pass to Gemini prompts (e.g. "French (France)")
tts_lang: str # BCP-47 for TTS API (may differ, e.g. es-MX → es-US)
preview_sample: str # sample sentence for TTS preview
# Master locale registry. Bare language codes (legacy) + explicit region variants.
_REGISTRY: dict[str, Locale] = {loc.code: loc for loc in [
# ── English ──────────────────────────────────────────────────────────────
Locale("en", "English", "English", "en-US",
"This is a preview of the audio description voice."),
Locale("en-US", "English (US)", "English (United States)", "en-US",
"This is a preview of the audio description voice."),
Locale("en-GB", "English (UK)", "English (United Kingdom)", "en-GB",
"This is a preview of the audio description voice."),
Locale("en-CA", "English (Canada)", "English (Canada)", "en-CA",
"This is a preview of the audio description voice."),
# ── Spanish ──────────────────────────────────────────────────────────────
Locale("es", "Spanish", "Spanish", "es-US",
"Esta es una vista previa de la voz de audiodescripcion."),
Locale("es-ES", "Spanish (Spain)", "Spanish (Spain)", "es-ES",
"Esta es una vista previa de la voz de audiodescripción."),
Locale("es-MX", "Spanish (Mexico)", "Spanish (Mexico)", "es-US",
"Esta es una vista previa de la voz de audiodescripción."),
Locale("es-419", "Spanish (Latin America)", "Spanish (Latin America)", "es-US",
"Esta es una vista previa de la voz de audiodescripción."),
# ── French ───────────────────────────────────────────────────────────────
Locale("fr", "French", "French", "fr-FR",
"Ceci est un apercu de la voix de l'audiodescription."),
Locale("fr-FR", "French (France)", "French (France)", "fr-FR",
"Ceci est un aperçu de la voix de l'audiodescription."),
Locale("fr-CA", "French (Canada)", "French (Canada)", "fr-CA",
"Ceci est un aperçu de la voix de l'audiodescription."),
# ── German ───────────────────────────────────────────────────────────────
Locale("de", "German", "German", "de-DE",
"Dies ist eine Vorschau der Audiodeskriptionsstimme."),
Locale("de-DE", "German (Germany)", "German (Germany)", "de-DE",
"Dies ist eine Vorschau der Audiodeskriptionsstimme."),
# ── Italian ──────────────────────────────────────────────────────────────
Locale("it", "Italian", "Italian", "it-IT",
"Questa e un'anteprima della voce dell'audiodescrizione."),
Locale("it-IT", "Italian (Italy)", "Italian (Italy)", "it-IT",
"Questa è un'anteprima della voce dell'audiodescrizione."),
# ── Portuguese ───────────────────────────────────────────────────────────
Locale("pt", "Portuguese", "Portuguese", "pt-BR",
"Esta e uma previa da voz da audiodescricao."),
Locale("pt-BR", "Portuguese (Brazil)", "Portuguese (Brazil)", "pt-BR",
"Esta é uma prévia da voz da audiodescrição."),
Locale("pt-PT", "Portuguese (Portugal)", "Portuguese (Portugal)", "pt-PT",
"Esta é uma pré-visualização da voz da audiodescrição."),
# ── Japanese ─────────────────────────────────────────────────────────────
Locale("ja", "Japanese", "Japanese", "ja-JP",
"これは音声解説の声のプレビューです。"),
Locale("ja-JP", "Japanese (Japan)", "Japanese (Japan)", "ja-JP",
"これは音声解説の声のプレビューです。"),
# ── Korean ───────────────────────────────────────────────────────────────
Locale("ko", "Korean", "Korean", "ko-KR",
"이것은 오디오 설명 음성의 미리보기입니다."),
Locale("ko-KR", "Korean (Korea)", "Korean (South Korea)", "ko-KR",
"이것은 오디오 설명 음성의 미리보기입니다."),
# ── Arabic ───────────────────────────────────────────────────────────────
Locale("ar", "Arabic", "Arabic", "ar-EG",
"هذه معاينة لصوت الوصف الصوتي."),
# ── Hindi ────────────────────────────────────────────────────────────────
Locale("hi", "Hindi", "Hindi", "hi-IN",
"यह ऑडियो विवरण आवाज का पूर्वावलोकन है।"),
# ── Indonesian ───────────────────────────────────────────────────────────
Locale("id", "Indonesian", "Indonesian", "id-ID",
"Ini adalah pratinjau suara deskripsi audio."),
Locale("id-ID", "Indonesian (Indonesia)", "Indonesian (Indonesia)", "id-ID",
"Ini adalah pratinjau suara deskripsi audio."),
# ── Dutch ────────────────────────────────────────────────────────────────
Locale("nl", "Dutch", "Dutch", "nl-NL",
"Dit is een voorbeeld van de audiodescriptiestem."),
Locale("nl-NL", "Dutch (Netherlands)", "Dutch (Netherlands)", "nl-NL",
"Dit is een voorbeeld van de audiodescriptiestem."),
# ── Polish ───────────────────────────────────────────────────────────────
Locale("pl", "Polish", "Polish", "pl-PL",
"To jest podglad glosu audiodeskrypcji."),
Locale("pl-PL", "Polish (Poland)", "Polish (Poland)", "pl-PL",
"To jest podgląd głosu audiodeskrypcji."),
# ── Russian ──────────────────────────────────────────────────────────────
Locale("ru", "Russian", "Russian", "ru-RU",
"Это предварительный просмотр голоса аудиоописания."),
# ── Thai ─────────────────────────────────────────────────────────────────
Locale("th", "Thai", "Thai", "th-TH",
"นี่คือตัวอย่างเสียงบรรยายภาพ"),
# ── Turkish ──────────────────────────────────────────────────────────────
Locale("tr", "Turkish", "Turkish", "tr-TR",
"Bu, sesli betimleme sesinin bir onizlemesidir."),
Locale("tr-TR", "Turkish (Turkey)", "Turkish (Turkey)", "tr-TR",
"Bu, sesli betimleme sesinin bir önizlemesidir."),
# ── Vietnamese ───────────────────────────────────────────────────────────
Locale("vi", "Vietnamese", "Vietnamese", "vi-VN",
"Day la ban xem truoc giong mo ta am thanh."),
# ── Romanian ─────────────────────────────────────────────────────────────
Locale("ro", "Romanian", "Romanian", "ro-RO",
"Aceasta este o previzualizare a vocii descrierii audio."),
# ── Ukrainian ────────────────────────────────────────────────────────────
Locale("uk", "Ukrainian", "Ukrainian", "uk-UA",
"Це попередній перегляд голосу аудіоопису."),
# ── Bengali ──────────────────────────────────────────────────────────────
Locale("bn", "Bengali", "Bengali", "bn-BD",
"এটি অডিও বর্ণনা ভয়েসের একটি প্রিভিউ।"),
# ── Marathi ──────────────────────────────────────────────────────────────
Locale("mr", "Marathi", "Marathi", "mr-IN",
"हे ऑडिओ वर्णन आवाजाचे पूर्वावलोकन आहे."),
# ── Tamil ────────────────────────────────────────────────────────────────
Locale("ta", "Tamil", "Tamil", "ta-IN",
"இது ஆடியோ விளக்க குரலின் முன்னோட்டம்."),
# ── Telugu ───────────────────────────────────────────────────────────────
Locale("te", "Telugu", "Telugu", "te-IN",
"ఇది ఆడియో వివరణ స్వరం యొక్క ప్రివ్యూ."),
# ── Chinese ──────────────────────────────────────────────────────────────
Locale("zh", "Chinese", "Chinese (Simplified)", "zh-CN",
"这是音频描述语音的预览。"),
# ── Czech ────────────────────────────────────────────────────────────────
Locale("cs", "Czech", "Czech", "cs-CZ",
"Toto je náhled hlasu zvukového popisu."),
Locale("cs-CZ", "Czech (Czech Republic)", "Czech (Czech Republic)", "cs-CZ",
"Toto je náhled hlasu zvukového popisu."),
# ── Danish ───────────────────────────────────────────────────────────────
Locale("da", "Danish", "Danish", "da-DK",
"Dette er en forhåndsvisning af lydbeskrivelsesstemmen."),
# ── Finnish ──────────────────────────────────────────────────────────────
Locale("fi", "Finnish", "Finnish", "fi-FI",
"Tämä on äänikuvauksen äänen esikatselu."),
# ── Hungarian ────────────────────────────────────────────────────────────
Locale("hu", "Hungarian", "Hungarian", "hu-HU",
"Ez a hangos leírás hangjának előnézete."),
# ── Norwegian ────────────────────────────────────────────────────────────
Locale("no", "Norwegian", "Norwegian", "nb-NO",
"Dette er en forhåndsvisning av lydbeskrivelsesstemmen."),
# ── Slovak ───────────────────────────────────────────────────────────────
Locale("sk", "Slovak", "Slovak", "sk-SK",
"Toto je náhľad hlasu zvukového popisu."),
# ── Swedish ──────────────────────────────────────────────────────────────
Locale("sv", "Swedish", "Swedish", "sv-SE",
"Det här är en förhandsgranskning av ljudbeskrivningsrösten."),
]}
# xlsx uses underscores; normalize to BCP-47 hyphen form
_XLSX_ALIASES: dict[str, str] = {
code.replace("-", "_").lower(): code
for code in _REGISTRY
if "-" in code
}
# a few extra mappings for edge cases
_XLSX_ALIASES.update({
"id": "id", # Indonesian column header is just "id" (no region)
})
def normalize_code(code: str) -> str:
"""
Normalize an arbitrary locale code to the canonical BCP-47 form used in this registry.
Handles:
- xlsx underscore form: "fr_fr" → "fr-FR"
- Bare language code: "fr" → "fr" (passthrough, legacy compat)
- Already canonical: "fr-FR" → "fr-FR"
"""
if not code:
return code
lowered = code.strip().lower()
# e.g. "fr_fr" -> check alias table
if "_" in lowered:
return _XLSX_ALIASES.get(lowered, code.replace("_", "-").upper() if len(lowered) > 3 else code)
# Already hyphen form — canonicalise case
if "-" in code:
parts = code.split("-", 1)
canonical = f"{parts[0].lower()}-{parts[1].upper()}"
if canonical in _REGISTRY:
return canonical
return canonical
# Bare language code — return as-is (legacy)
return lowered
def get(code: str) -> Locale | None:
"""Return Locale for the given code, or None if unknown."""
canonical = normalize_code(code)
return _REGISTRY.get(canonical) or _REGISTRY.get(canonical.split("-")[0])
def get_display_name(code: str) -> str:
"""Human-readable display name, e.g. 'French (Canada)'."""
locale = get(code)
return locale.display_name if locale else code
def get_gemini_label(code: str) -> str:
"""
Label to use inside Gemini prompts, e.g. 'French (Canada)'.
Gemini models respond more reliably to human-readable language names
than to bare BCP-47 codes when used inside instruction prompts.
"""
locale = get(code)
return locale.gemini_label if locale else code
def get_tts_lang(code: str) -> str:
"""BCP-47 code for the TTS API (may differ from canonical, e.g. es-MX → es-US)."""
locale = get(code)
return locale.tts_lang if locale else code
def get_preview_sample(code: str) -> str:
"""Language-appropriate TTS preview sentence."""
locale = get(code)
if locale:
return locale.preview_sample
# fallback: try parent language then English
parent = get(code.split("-")[0]) if "-" in code else None
if parent:
return parent.preview_sample
return "This is a preview of the audio description voice."
def all_codes() -> list[str]:
"""Return all registered locale codes, sorted."""
return sorted(_REGISTRY.keys())
def all_display_map() -> dict[str, str]:
"""Return {code: display_name} for all registered locales."""
return {code: locale.display_name for code, locale in _REGISTRY.items()}
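
Review note: the normalization convention stated in the module docstring (underscore form to hyphen form, lowercase language, uppercase region, bare codes passed through) can be sketched without the alias table. This is a simplified illustration, not the removed `normalize_code`. The function name `normalize_locale` is an assumption, and it does not consult a registry.

```python
def normalize_locale(code: str) -> str:
    """Normalize a locale code to 'language-REGION' form (simplified sketch)."""
    # Accept the xlsx underscore form by converting it to hyphens first.
    cleaned = code.strip().replace("_", "-")
    # Bare language codes are passed through in lowercase.
    if "-" not in cleaned:
        return cleaned.lower()
    # Split once so anything after the first hyphen is treated as the region.
    language, region = cleaned.split("-", 1)
    return f"{language.lower()}-{region.upper()}"
```

Numeric regions such as `419` are unaffected by the uppercase step, so `es_419` still normalizes to `es-419`.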

View file

@ -8,7 +8,6 @@ class VTTCue:
end_time: float # seconds end_time: float # seconds
text: str text: str
identifier: str | None = None identifier: str | None = None
settings: str = ""
class VTTParser: class VTTParser:
@ -38,11 +37,10 @@ class VTTParser:
# Parse timing line # Parse timing line
if " --> " in line: if " --> " in line:
timing_match = re.match(r'([\d:.,]+)\s+-->\s+([\d:.,]+)\s*(.*)', line) timing_match = re.match(r'([\d:.,]+)\s+-->\s+([\d:.,]+)', line)
if timing_match: if timing_match:
start_time = VTTParser._parse_timestamp(timing_match.group(1)) start_time = VTTParser._parse_timestamp(timing_match.group(1))
end_time = VTTParser._parse_timestamp(timing_match.group(2)) end_time = VTTParser._parse_timestamp(timing_match.group(2))
settings = timing_match.group(3).strip()
# Collect text lines until empty line or next cue # Collect text lines until empty line or next cue
i += 1 i += 1
@ -51,13 +49,13 @@ class VTTParser:
text_lines.append(lines[i].strip()) text_lines.append(lines[i].strip())
i += 1 i += 1
cues.append(VTTCue( if text_lines:
start_time=start_time, cues.append(VTTCue(
end_time=end_time, start_time=start_time,
text="\n".join(text_lines), end_time=end_time,
identifier=identifier, text="\n".join(text_lines),
settings=settings, identifier=identifier
)) ))
else: else:
i += 1 i += 1
@ -73,19 +71,16 @@ class VTTParser:
if cue.identifier: if cue.identifier:
lines.append(cue.identifier) lines.append(cue.identifier)
# Add timing line (preserve cue settings like line:0%) # Add timing line
start_timestamp = VTTParser._format_timestamp(cue.start_time) start_timestamp = VTTParser._format_timestamp(cue.start_time)
end_timestamp = VTTParser._format_timestamp(cue.end_time) end_timestamp = VTTParser._format_timestamp(cue.end_time)
timing_line = f"{start_timestamp} --> {end_timestamp}" lines.append(f"{start_timestamp} --> {end_timestamp}")
if cue.settings:
timing_line += f" {cue.settings}"
lines.append(timing_line)
# Add text (can be multi-line) # Add text (can be multi-line)
lines.append(cue.text) lines.append(cue.text)
lines.append("") # Empty line between cues lines.append("") # Empty line between cues
return "\n".join(lines) + "\n" return "\n".join(lines)
@staticmethod @staticmethod
def _parse_timestamp(timestamp: str) -> float: def _parse_timestamp(timestamp: str) -> float:
@ -126,7 +121,7 @@ class VTTParser:
secs = seconds % 60 secs = seconds % 60
whole_secs = int(secs) whole_secs = int(secs)
milliseconds = round((secs - whole_secs) * 1000) milliseconds = int((secs - whole_secs) * 1000)
return f"{hours:02d}:{minutes:02d}:{whole_secs:02d}.{milliseconds:03d}" return f"{hours:02d}:{minutes:02d}:{whole_secs:02d}.{milliseconds:03d}"
@ -153,22 +148,6 @@ class VTTEditor:
return VTTParser.build(cues) return VTTParser.build(cues)
@staticmethod
def assert_cue_alignment(en_vtt: str, target_vtt: str, lang: str) -> None:
"""Raise ValueError if target VTT cue count or timestamps diverge from EN master."""
en_cues = VTTParser.parse(en_vtt)
tgt_cues = VTTParser.parse(target_vtt)
if len(tgt_cues) != len(en_cues):
raise ValueError(
f"Cue count mismatch for {lang}: EN has {len(en_cues)}, target has {len(tgt_cues)}"
)
for i, (en, tgt) in enumerate(zip(en_cues, tgt_cues, strict=True)):
if en.start_time != tgt.start_time or en.end_time != tgt.end_time:
raise ValueError(
f"Timestamp mismatch for {lang} cue {i}: "
f"EN {en.start_time}-->{en.end_time}, target {tgt.start_time}-->{tgt.end_time}"
)
@staticmethod @staticmethod
def update_cue_text(vtt_content: str, cue_index: int, new_text: str) -> str: def update_cue_text(vtt_content: str, cue_index: int, new_text: str) -> str:
"""Update text for a specific cue by index""" """Update text for a specific cue by index"""
@ -207,20 +186,6 @@ class VTTEditor:
return len(errors) == 0, errors return len(errors) == 0, errors
@staticmethod
def fix_overlapping_cues(vtt_content: str) -> str:
"""Trim end_time of each cue so it does not overlap the next cue's start_time."""
cues = VTTParser.parse(vtt_content)
for i in range(1, len(cues)):
if cues[i].start_time < cues[i - 1].end_time:
# Clamp previous cue end to 1ms before next cue start
new_end = cues[i].start_time - 0.001
# Never let end_time go at or below start_time
if new_end <= cues[i - 1].start_time:
new_end = cues[i - 1].start_time + 0.001
cues[i - 1].end_time = new_end
return VTTParser.build(cues)
@staticmethod @staticmethod
def get_cue_count(vtt_content: str) -> int: def get_cue_count(vtt_content: str) -> int:
"""Get the number of cues in VTT content""" """Get the number of cues in VTT content"""
@ -256,7 +221,7 @@ class VTTEditor:
) )
return False, errors return False, errors
for i, (src, tgt) in enumerate(zip(source_cues, translated_cues, strict=False)): for i, (src, tgt) in enumerate(zip(source_cues, translated_cues)):
if abs(src.start_time - tgt.start_time) > 0.001: if abs(src.start_time - tgt.start_time) > 0.001:
errors.append( errors.append(
f"Cue {i + 1}: start time changed " f"Cue {i + 1}: start time changed "
@ -286,33 +251,3 @@ class VTTEditor:
return VTTParser.build(cues) return VTTParser.build(cues)
# DCMP §6.01 filler patterns per language (whole-word, case-insensitive)
_FILLER_PATTERNS: dict[str, str] = {
"en": r'\b(um+|uh+|ah+|er+|hmm+|you know|i mean|sort of|kind of|basically|literally|honestly|actually|right\?|so yeah)\b',
"es": r'\b(eh+|este|o sea|pues|bueno|o sea que|mmm+)\b',
"fr": r'\b(euh+|beh|ben|donc|quoi|enfin|voilà|genre)\b',
"de": r'\b(äh+|ähm+|halt|ne|also|naja|sozusagen|quasi)\b',
"it": r'\b(ehm+|allora|cioè|tipo|praticamente|insomma|ecco)\b',
"nl": r'\b(eh+|nou|zeg|eigenlijk|gewoon|toch|zo van|hè)\b',
"pt": r'\b(ahn+|hã+|né|sabe|tipo|então|assim)\b',
"pl": r'\b(no|że|bo|znaczy|właśnie|jakby|wiesz)\b',
"uk": r'\b(ну+|ем+|типу|знаєш|значить|власне|от)\b',
"ru": r'\b(ну+|эм+|типа|знаешь|значит|вот|собственно)\b',
}
@staticmethod
def clean_disfluencies(vtt_content: str, lang: str) -> str:
"""Remove filler words and hesitations per DCMP §6.01 for supported languages."""
pattern = VTTEditor._FILLER_PATTERNS.get(lang.split("-")[0].lower())
if not pattern:
return vtt_content
cues = VTTParser.parse(vtt_content)
compiled = re.compile(pattern, re.IGNORECASE)
for cue in cues:
cleaned = compiled.sub("", cue.text)
# Collapse multiple spaces and strip leading/trailing punctuation artifacts
cleaned = re.sub(r'[ \t]{2,}', ' ', cleaned).strip().strip(',').strip()
if cleaned:
cue.text = cleaned
return VTTParser.build(cues)
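
Review note: the two sides of `_format_timestamp` differ only in `round(...)` versus `int(...)` on the fractional seconds. Truncation can lose a millisecond to floating-point error (1.001 s may print as `.000`), while rounding the fraction alone can yield 1000 ms when the fraction is 0.9995 or more. One way to avoid both, sketched here as an assumption and not as the code in either branch, is to round once to whole milliseconds and split with integer arithmetic. The name `format_vtt_timestamp` is hypothetical.

```python
def format_vtt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm using integer millisecond arithmetic."""
    # Round once to whole milliseconds so any carry propagates to the seconds.
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    whole_secs, milliseconds = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{whole_secs:02d}.{milliseconds:03d}"
```

Because the carry happens inside `divmod`, a value such as 1.9996 s becomes `00:00:02.000` instead of a four-digit millisecond field.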

View file

@ -1,48 +1,41 @@
from contextlib import asynccontextmanager from contextlib import asynccontextmanager
import sentry_sdk import sentry_sdk
from fastapi import FastAPI, HTTPException, Request from fastapi import FastAPI, Request, HTTPException
from fastapi.exceptions import RequestValidationError from fastapi.exceptions import RequestValidationError
from fastapi.middleware.cors import CORSMiddleware from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse from fastapi.responses import JSONResponse
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.fastapi import FastApiIntegration from sentry_sdk.integrations.fastapi import FastApiIntegration
from sentry_sdk.integrations.pymongo import PyMongoIntegration
from sentry_sdk.integrations.redis import RedisIntegration from sentry_sdk.integrations.redis import RedisIntegration
from sentry_sdk.integrations.pymongo import PyMongoIntegration
from sentry_sdk.integrations.celery import CeleryIntegration
from .api.v1.routes_admin import router as admin_router from .api.v1.routes_admin import router as admin_router
from .api.v1.routes_admin_production import router as admin_production_router
from .api.v1.routes_auth import router as auth_router from .api.v1.routes_auth import router as auth_router
from .api.v1.routes_briefs import router as briefs_router
from .api.v1.routes_clients import router as clients_router from .api.v1.routes_clients import router as clients_router
from .api.v1.routes_files import router as files_router from .api.v1.routes_files import router as files_router
from .api.v1.routes_glossaries import router as glossaries_router from .api.v1.routes_jobs import router as jobs_router
from .api.v1.routes_invitations import org_router as invitations_org_router from .api.v1.routes_invitations import org_router as invitations_org_router
from .api.v1.routes_invitations import router as invitations_router from .api.v1.routes_invitations import router as invitations_router
from .api.v1.routes_jobs import router as jobs_router
from .api.v1.routes_language_qc import router as language_qc_router
from .api.v1.routes_organizations import router as organizations_router from .api.v1.routes_organizations import router as organizations_router
from .api.v1.routes_review_notes import router as review_notes_router from .api.v1.routes_review_notes import router as review_notes_router
from .api.v1.routes_share import router as share_router
from .api.v1.routes_tts import router as tts_router from .api.v1.routes_tts import router as tts_router
from .api.v1.routes_vtt_versions import router as vtt_versions_router
from .api.v1.routes_websockets import router as websockets_router from .api.v1.routes_websockets import router as websockets_router
from .services.websocket import connection_manager
from .core.config import settings from .core.config import settings
from .core.database import ( from .core.secrets_config import initialize_config
close_mongo_connection, from .core.database import close_mongo_connection, connect_to_mongo, create_indexes, get_database
connect_to_mongo,
get_database,
)
from .core.logging import setup_logging from .core.logging import setup_logging
from .core.redis import close_redis_connection, connect_to_redis, get_redis_client from .core.redis import close_redis_connection, connect_to_redis, get_redis_client
from .core.secrets_config import initialize_config
from .core.seed import seed_default_admin from .core.seed import seed_default_admin
from .middleware import create_rate_limit_middleware, create_validation_middleware from .middleware import create_rate_limit_middleware, create_validation_middleware
from .services.language_qc import seed_language_qc_for_job
from .services.websocket import connection_manager
from .telemetry import ( from .telemetry import (
app_metrics, app_metrics,
instrument_dependencies,
instrument_fastapi_app,
setup_tracing
) )
from .services.websocket import connection_manager
@asynccontextmanager @asynccontextmanager
@ -94,20 +87,6 @@ async def lifespan(app: FastAPI):
print(f"⚠️ Could not seed default admin: {e}") print(f"⚠️ Could not seed default admin: {e}")
# await create_indexes() # Temporarily disabled for debugging # await create_indexes() # Temporarily disabled for debugging
# T-16: Seed language_qc only for jobs that still lack it (idempotent, skips on subsequent starts)
try:
db = await get_database()
pending_count = await db.jobs.count_documents({"language_qc": {"$exists": False}})
if pending_count > 0:
async for job_doc in db.jobs.find(
{"language_qc": {"$exists": False}},
{"_id": 1, "status": 1, "outputs": 1, "source": 1, "review": 1, "updated_at": 1, "requested_outputs": 1},
):
await seed_language_qc_for_job(db, job_doc)
print(f"✅ language_qc migration complete ({pending_count} jobs seeded)")
except Exception as e:
print(f"⚠️ language_qc migration failed: {e}")
# Start WebSocket connection manager # Start WebSocket connection manager
await connection_manager.start() await connection_manager.start()
@ -120,9 +99,6 @@ async def lifespan(app: FastAPI):
# Store middleware in app state for access # Store middleware in app state for access
app.state.rate_limit_middleware = rate_limit_middleware app.state.rate_limit_middleware = rate_limit_middleware
app.state.validation_middleware = validation_middleware app.state.validation_middleware = validation_middleware
elif settings.redis_url:
# T-13: REDIS_URL is configured but client unavailable — rate limiting is disabled
print(f"⚠️ Redis configured at {settings.redis_url!r} but connection failed — rate limiting disabled")
yield yield
# Shutdown # Shutdown
@ -155,15 +131,16 @@ async def cors_error_handler(request, call_next):
try: try:
response = await call_next(request) response = await call_next(request)
except Exception as e: except Exception as e:
# LOG THE EXCEPTION BEFORE HANDLING IT
print(f"🚨 EXCEPTION IN CORS MIDDLEWARE: {e}")
import traceback import traceback
print(f"Traceback:\n{traceback.format_exc()}")
from .core.logging import get_logger as _get_logger # Handle any unhandled exceptions and add CORS headers
_get_logger(__name__).exception("🚨 CORS middleware caught: %s\n%s", e, traceback.format_exc())
from fastapi.responses import JSONResponse from fastapi.responses import JSONResponse
response = JSONResponse( response = JSONResponse(
status_code=500, status_code=500,
content={"detail": "Internal server error"}, content={"detail": "Internal server error", "error": str(e)}
) )
# Always add CORS headers for allowed origins # Always add CORS headers for allowed origins
@ -221,18 +198,21 @@ async def validation_exception_handler(request: Request, exc: RequestValidationE
async def general_exception_handler(request: Request, exc: Exception): async def general_exception_handler(request: Request, exc: Exception):
"""Handle all uncaught exceptions with logging""" """Handle all uncaught exceptions with logging"""
import traceback import traceback
from .core.logging import get_logger from .core.logging import get_logger
logger = get_logger(__name__) logger = get_logger(__name__)
logger.exception( logger.error(f"Unhandled exception in {request.method} {request.url.path}: {exc}")
"🚨 Unhandled %s %s: %s\n%s", logger.error(f"Exception type: {type(exc).__name__}")
request.method, request.url.path, exc, traceback.format_exc(), logger.error(f"Traceback: {traceback.format_exc()}")
)
# Also print to stdout for immediate visibility
print(f"🚨 UNHANDLED EXCEPTION: {request.method} {request.url.path}")
print(f"Exception: {exc}")
print(f"Traceback:\n{traceback.format_exc()}")
response = JSONResponse( response = JSONResponse(
status_code=500, status_code=500,
content={"detail": "Internal server error"}, content={"detail": "Internal server error", "error": str(exc)}
) )
# Add CORS headers # Add CORS headers
@ -247,6 +227,9 @@ async def general_exception_handler(request: Request, exc: Exception):
@app.middleware("http") @app.middleware("http")
async def rate_limiting_middleware(request, call_next): async def rate_limiting_middleware(request, call_next):
"""Apply rate limiting middleware.""" """Apply rate limiting middleware."""
# Skip middleware for auth endpoints during debugging
if request.url.path in ["/api/v1/auth/login", "/api/v1/auth/refresh"]:
return await call_next(request)
if hasattr(app.state, 'rate_limit_middleware'): if hasattr(app.state, 'rate_limit_middleware'):
return await app.state.rate_limit_middleware(request, call_next) return await app.state.rate_limit_middleware(request, call_next)
return await call_next(request) return await call_next(request)
@ -254,7 +237,11 @@ async def rate_limiting_middleware(request, call_next):
@app.middleware("http") @app.middleware("http")
async def validation_middleware(request, call_next): async def validation_middleware(request, call_next):
"""Apply request validation middleware.""" """Apply request validation middleware."""
if request.url.path in ["/health", "/metrics", "/api/v1/auth/login", "/api/v1/auth/refresh"]: # TEMPORARILY DISABLED FOR DEBUGGING
return await call_next(request)
# Skip middleware for auth endpoints during debugging
if request.url.path in ["/api/v1/auth/login", "/api/v1/auth/refresh"]:
return await call_next(request) return await call_next(request)
if hasattr(app.state, 'validation_middleware'): if hasattr(app.state, 'validation_middleware'):
return await app.state.validation_middleware(request, call_next) return await app.state.validation_middleware(request, call_next)
@ -272,27 +259,53 @@ app.include_router(invitations_router, prefix="/api/v1")
app.include_router(files_router, prefix="/api/v1") app.include_router(files_router, prefix="/api/v1")
app.include_router(jobs_router, prefix="/api/v1") app.include_router(jobs_router, prefix="/api/v1")
app.include_router(review_notes_router, prefix="/api/v1") app.include_router(review_notes_router, prefix="/api/v1")
app.include_router(vtt_versions_router, prefix="/api/v1")
app.include_router(language_qc_router, prefix="/api/v1")
app.include_router(glossaries_router, prefix="/api/v1")
app.include_router(tts_router, prefix="/api/v1") app.include_router(tts_router, prefix="/api/v1")
app.include_router(admin_router, prefix="/api/v1") app.include_router(admin_router, prefix="/api/v1")
app.include_router(admin_production_router, prefix="/api/v1")
app.include_router(briefs_router, prefix="/api/v1")
app.include_router(share_router, prefix="/api/v1")
app.include_router(websockets_router, prefix="/api/v1") app.include_router(websockets_router, prefix="/api/v1")
@app.on_event("startup")
async def startup_event():
"""Initialize services on startup"""
logger.info("🚀 Starting up FastAPI application...")
# Start WebSocket connection manager
try:
await connection_manager.start()
logger.info("✅ WebSocket connection manager started successfully")
except Exception as e:
logger.error(f"❌ Failed to start WebSocket connection manager: {e}")
raise
@app.on_event("shutdown")
async def shutdown_event():
"""Cleanup services on shutdown"""
logger.info("🛑 Shutting down FastAPI application...")
# Stop WebSocket connection manager
try:
await connection_manager.stop()
logger.info("✅ WebSocket connection manager stopped successfully")
except Exception as e:
logger.error(f"❌ Error stopping WebSocket connection manager: {e}")
@app.get("/health") @app.get("/health")
async def health_check(): async def health_check():
return {"status": "healthy", "version": "1.0.0"} return {"status": "healthy", "version": "1.0.0"}
@app.get("/debug-test")
async def debug_test():
print("🔥🔥🔥 DEBUG TEST ENDPOINT HIT 🔥🔥🔥")
return {"message": "If you see this, routing works"}
@app.get("/metrics") @app.get("/metrics")
async def metrics(): async def metrics():
"""Prometheus metrics endpoint""" """Prometheus metrics endpoint"""
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
from fastapi import Response from fastapi import Response
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
return Response( return Response(
content=generate_latest(), content=generate_latest(),
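
Review note on the two exception handlers earlier in this file: one side returns only `{"detail": "Internal server error"}`, the other also returns `"error": str(e)` to the client. A minimal sketch of the trade-off, using only the standard library. The helper name `error_payload` and the `expose_details` flag are hypothetical and do not appear in the diff.

```python
import logging

logger = logging.getLogger("error_payload_sketch")


def error_payload(exc: Exception, expose_details: bool = False) -> dict:
    """Build a 500 response body; hypothetical helper, not part of the diff."""
    # The full traceback goes to the server-side log in either case.
    logger.error("Unhandled exception: %s", exc, exc_info=exc)

    body = {"detail": "Internal server error"}
    if expose_details:
        # Mirrors the variant that echoes str(exc) back to the caller.
        body["error"] = str(exc)
    return body


if __name__ == "__main__":
    failure = RuntimeError("mongodb://user:secret@db:27017 unreachable")
    print(error_payload(failure))
    print(error_payload(failure, expose_details=True))
```

Exception messages can contain connection strings or file paths, so the generic body is probably the safer default outside local debugging.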

View file

@ -1,10 +1,6 @@
"""Middleware package for FastAPI application.""" """Middleware package for FastAPI application."""
from .rate_limiting import ( from .rate_limiting import RateLimitMiddleware, IPWhitelist, create_rate_limit_middleware
IPWhitelist,
RateLimitMiddleware,
create_rate_limit_middleware,
)
from .validation import ValidationMiddleware, create_validation_middleware from .validation import ValidationMiddleware, create_validation_middleware
__all__ = [ __all__ = [

View file

@ -1,10 +1,14 @@
"""Rate limiting middleware for API endpoints.""" """Rate limiting middleware for API endpoints."""
import time import time
from collections import defaultdict
from typing import Dict, Optional, Tuple
import redis.asyncio as aioredis import redis.asyncio as aioredis
from fastapi import Request, status from fastapi import HTTPException, Request, status
from fastapi.responses import JSONResponse from fastapi.responses import JSONResponse
import json
import asyncio
from datetime import datetime, timedelta
from app.core.config import get_settings from app.core.config import get_settings
from app.telemetry.metrics import track_rate_limit_metrics from app.telemetry.metrics import track_rate_limit_metrics
@ -22,7 +26,7 @@ class RateLimiter:
limit: int, limit: int,
window_seconds: int, window_seconds: int,
identifier: str = "" identifier: str = ""
) -> tuple[bool, dict[str, int]]: ) -> Tuple[bool, Dict[str, int]]:
""" """
Check if request is allowed under rate limit. Check if request is allowed under rate limit.
@ -109,18 +113,15 @@ class RateLimitMiddleware:
def _get_client_identifier(self, request: Request) -> str: def _get_client_identifier(self, request: Request) -> str:
"""Get client identifier for rate limiting.""" """Get client identifier for rate limiting."""
# Try to get user ID from JWT token
user = getattr(request.state, 'user', None) user = getattr(request.state, 'user', None)
if user: if user:
return f"user:{user.id}" return f"user:{user.id}"
# Only trust X-Forwarded-For when the request arrived via HTTPS (i.e. through # Fall back to IP address
# the Apache/nginx reverse proxy). On plain HTTP (direct connections, local forwarded_for = request.headers.get("X-Forwarded-For")
# dev) the header can be forged, so we fall back to the socket IP. if forwarded_for:
if request.headers.get("X-Forwarded-Proto") == "https": return f"ip:{forwarded_for.split(',')[0].strip()}"
forwarded_for = request.headers.get("X-Forwarded-For")
if forwarded_for:
# Take the right-most IP added by the trusted proxy, not client-supplied ones.
return f"ip:{forwarded_for.split(',')[-1].strip()}"
client_ip = request.client.host if request.client else "unknown" client_ip = request.client.host if request.client else "unknown"
return f"ip:{client_ip}" return f"ip:{client_ip}"
@ -137,7 +138,7 @@ class RateLimitMiddleware:
return f"{method}:{path}" return f"{method}:{path}"
def _get_rate_limit(self, request: Request) -> tuple[int, int]: def _get_rate_limit(self, request: Request) -> Tuple[int, int]:
"""Get rate limit for the current request.""" """Get rate limit for the current request."""
endpoint_key = self._get_endpoint_key(request) endpoint_key = self._get_endpoint_key(request)
@ -160,8 +161,8 @@ class RateLimitMiddleware:
async def __call__(self, request: Request, call_next): async def __call__(self, request: Request, call_next):
"""Process rate limiting for the request.""" """Process rate limiting for the request."""
# Skip rate limiting for health checks and metrics only # Skip rate limiting for health checks and login (temporary for debugging)
if request.url.path in ["/health", "/metrics"]: if request.url.path in ["/health", "/metrics", "/api/v1/auth/login"]:
return await call_next(request) return await call_next(request)
client_id = self._get_client_identifier(request) client_id = self._get_client_identifier(request)
@ -237,7 +238,7 @@ class IPWhitelist:
except Exception: except Exception:
return False return False
async def add_ip(self, ip: str, ttl_seconds: int | None = None) -> bool: async def add_ip(self, ip: str, ttl_seconds: Optional[int] = None) -> bool:
"""Add IP to whitelist.""" """Add IP to whitelist."""
try: try:
await self.redis.sadd(self.whitelist_key, ip) await self.redis.sadd(self.whitelist_key, ip)
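
Review note on `_get_client_identifier` above: the two sides pick different entries from `X-Forwarded-For` (first versus last) and only one of them checks `X-Forwarded-Proto`. The sketch below restates the stricter variant as a standalone function so the behaviour can be checked without a running app. It assumes a plain case-sensitive mapping for headers and a proxy that appends the peer address as the last entry; the function name and signature are illustrative, not taken from the diff.

```python
from typing import Mapping, Optional


def client_identifier(
    headers: Mapping[str, str],
    socket_ip: Optional[str],
    user_id: Optional[str] = None,
) -> str:
    """Return a rate-limit key, preferring an authenticated user over an IP."""
    if user_id:
        return f"user:{user_id}"

    # Assumption: the reverse proxy sets X-Forwarded-Proto to "https" and
    # appends the address it saw as the last X-Forwarded-For entry.
    if headers.get("X-Forwarded-Proto") == "https":
        forwarded_for = headers.get("X-Forwarded-For", "")
        hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
        if hops:
            # Earlier entries can be supplied by the client; the last one
            # is the one written by the proxy under the assumption above.
            return f"ip:{hops[-1]}"

    return f"ip:{socket_ip or 'unknown'}"


if __name__ == "__main__":
    spoofed = {
        "X-Forwarded-Proto": "https",
        "X-Forwarded-For": "6.6.6.6, 203.0.113.7",
    }
    print(client_identifier(spoofed, "10.0.0.2"))
    print(client_identifier({"X-Forwarded-For": "6.6.6.6"}, "10.0.0.2"))
```

Taking the first entry instead lets a client choose its own rate-limit bucket by sending a forged header, which is worth keeping in mind when comparing the two sides.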

View file

@ -3,17 +3,15 @@
import json import json
import re import re
import time import time
from typing import Any from typing import Any, Dict, List, Optional, Set
from fastapi import HTTPException, Request, status
from fastapi.responses import JSONResponse
from pydantic import BaseModel, ValidationError as PydanticValidationError
import magic
from urllib.parse import unquote from urllib.parse import unquote
import magic
from fastapi import Request, status
from fastapi.responses import JSONResponse
from app.telemetry.metrics import track_validation_metrics from app.telemetry.metrics import track_validation_metrics
from ..core.config import settings
class ValidationError(Exception): class ValidationError(Exception):
"""Custom validation error.""" """Custom validation error."""
@ -44,9 +42,8 @@ class RequestValidator:
# Security patterns to block # Security patterns to block
self.malicious_patterns = [ self.malicious_patterns = [
# SQL injection patterns # SQL injection patterns
r"\b(union|select|insert|update|delete|drop|create|alter)\b\s+", r"(union|select|insert|update|delete|drop|create|alter)\s+",
r"vbscript:", # vbscript protocol injection r"(script|javascript|vbscript|onload|onerror|onclick)",
r"\b(onload|onerror|onclick)\s*=", # HTML event handler attribute injection
r"<\s*script[^>]*>", r"<\s*script[^>]*>",
r"javascript:", r"javascript:",
r"data:.*base64", r"data:.*base64",
@ -57,21 +54,18 @@ class RequestValidator:
r"%2e%2e%2f", r"%2e%2e%2f",
r"%2e%2e\\", r"%2e%2e\\",
# Command injection (removed $ and ; — semicolons are common in natural language) # Command injection (removed $ to allow MongoDB operators in controlled contexts)
r"[&|`](?!\s*$)", r"[;&|`](?!\s*$)", # Allow $ but not as command separator
r"\b(rm|wget|curl|nc|bash|sh|cmd|powershell)\b\s+", r"(rm|wget|curl|nc|bash|sh|cmd|powershell)\s+",
# MongoDB injection — NoSQL operator abuse # MongoDB injection
r"\$where|\$expr|\$function|\$accumulator" r"\$where|\$ne|\$gt|\$lt|\$regex",
r"|\$ne|\$nin|\$not"
r"|\$gt|\$gte|\$lt|\$lte"
r"|\$regex|\$jsonSchema|\$mod",
] ]
self.compiled_patterns = [re.compile(pattern, re.IGNORECASE) for pattern in self.malicious_patterns] self.compiled_patterns = [re.compile(pattern, re.IGNORECASE) for pattern in self.malicious_patterns]
# Max file sizes (in bytes) — driven by central config (T-14) # Max file sizes (in bytes)
self.max_video_size = settings.upload_max_video_bytes self.max_video_size = 2 * 1024 * 1024 * 1024 # 2GB
self.max_subtitle_size = 10 * 1024 * 1024 # 10MB self.max_subtitle_size = 10 * 1024 * 1024 # 10MB
# Request size limits # Request size limits
@ -125,9 +119,9 @@ class RequestValidator:
subtitle_extensions = {'vtt', 'srt', 'txt'} subtitle_extensions = {'vtt', 'srt', 'txt'}
if expected_type == "video" and ext not in video_extensions: if expected_type == "video" and ext not in video_extensions:
raise ValidationError(f"Invalid video file extension: {ext}") from None raise ValidationError(f"Invalid video file extension: {ext}")
elif expected_type == "subtitle" and ext not in subtitle_extensions: elif expected_type == "subtitle" and ext not in subtitle_extensions:
raise ValidationError(f"Invalid subtitle file extension: {ext}") from None raise ValidationError(f"Invalid subtitle file extension: {ext}")
return return
if expected_type == "video" and detected_type not in self.allowed_video_types: if expected_type == "video" and detected_type not in self.allowed_video_types:
@ -154,7 +148,7 @@ class RequestValidator:
f"Maximum allowed: {self.max_subtitle_size} bytes" f"Maximum allowed: {self.max_subtitle_size} bytes"
) )
async def validate_json_payload(self, request: Request) -> dict[str, Any] | None: async def validate_json_payload(self, request: Request) -> Optional[Dict[str, Any]]:
"""Validate JSON request payload.""" """Validate JSON request payload."""
if not request.headers.get("content-type", "").startswith("application/json"): if not request.headers.get("content-type", "").startswith("application/json"):
return None return None
@ -186,10 +180,7 @@ class RequestValidator:
return payload return payload
except json.JSONDecodeError as e: except json.JSONDecodeError as e:
raise ValidationError(f"Invalid JSON: {e}") from e raise ValidationError(f"Invalid JSON: {e}")
# Fields that contain free-form natural language — skip injection pattern checks
_FREETEXT_FIELDS = {"captions_vtt", "audio_description_vtt", "text", "notes", "change_note", "description"}
def _validate_json_values(self, obj: Any, path: str = "root") -> None: def _validate_json_values(self, obj: Any, path: str = "root") -> None:
"""Recursively validate JSON values.""" """Recursively validate JSON values."""
@ -198,10 +189,9 @@ class RequestValidator:
raise ValidationError(f"Too many fields in object at {path}") raise ValidationError(f"Too many fields in object at {path}")
for key, value in obj.items(): for key, value in obj.items():
self.validate_string_content(key, f"{path}.key") if isinstance(key, str):
# Skip pattern scanning for free-text fields (VTT content, notes, etc.) self.validate_string_content(key, f"{path}.{key}")
if key not in self._FREETEXT_FIELDS: self._validate_json_values(value, f"{path}.{key}")
self._validate_json_values(value, f"{path}.{key}")
elif isinstance(obj, list): elif isinstance(obj, list):
if len(obj) > 1000: # Prevent large arrays if len(obj) > 1000: # Prevent large arrays
@ -278,7 +268,7 @@ class ValidationMiddleware:
return response return response
except SecurityValidationError: except SecurityValidationError as e:
validation_errors.append("security") validation_errors.append("security")
track_validation_metrics( track_validation_metrics(
endpoint=request.url.path, endpoint=request.url.path,
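
Review note on the `malicious_patterns` hunk above: the two sides differ mainly in whether keywords are anchored with `\b` and whether event-handler names require a following `=`. The sketch below copies two pattern pairs from that hunk and runs them over ordinary text to show where the unanchored forms trigger. The variable names and sample strings are made up for illustration.

```python
import re

# Patterns copied from the two sides of the hunk; the names are illustrative.
UNBOUNDED_SQL = re.compile(
    r"(union|select|insert|update|delete|drop|create|alter)\s+", re.IGNORECASE
)
BOUNDED_SQL = re.compile(
    r"\b(union|select|insert|update|delete|drop|create|alter)\b\s+", re.IGNORECASE
)
BROAD_XSS = re.compile(
    r"(script|javascript|vbscript|onload|onerror|onclick)", re.IGNORECASE
)
NARROW_XSS = re.compile(r"\b(onload|onerror|onclick)\s*=", re.IGNORECASE)

samples = [
    "Please reselect the caption track",  # contains "select " inside a word
    "A raindrop fell on the lens",        # contains "drop " inside a word
    "description",                        # contains "script" inside a word
    "select * from users",
    "<img onerror=alert(1)>",
]

if __name__ == "__main__":
    for text in samples:
        flags = {
            "unbounded_sql": bool(UNBOUNDED_SQL.search(text)),
            "bounded_sql": bool(BOUNDED_SQL.search(text)),
            "broad_xss": bool(BROAD_XSS.search(text)),
            "narrow_xss": bool(NARROW_XSS.search(text)),
        }
        print(f"{text!r}: {flags}")
```

Since the validator also scans JSON keys, the broad form would flag a field literally named `description`, which may explain why one side narrows it.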

View file

@ -1,5 +1,5 @@
"""Database migration framework for MongoDB.""" """Database migration framework for MongoDB."""
from .migrator import Migration, MigrationManager from .migrator import MigrationManager, Migration
__all__ = ["MigrationManager", "Migration"] __all__ = ["MigrationManager", "Migration"]

View file

@ -1,10 +1,11 @@
"""MongoDB migration framework.""" """MongoDB migration framework."""
import os
import importlib.util import importlib.util
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from datetime import datetime from datetime import datetime
from pathlib import Path from pathlib import Path
from typing import List, Optional
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from app.core.database import get_database from app.core.database import get_database
@ -17,11 +18,10 @@ logger = get_logger(__name__)
class Migration(ABC): class Migration(ABC):
"""Base class for database migrations.""" """Base class for database migrations."""
version: str = "0000-00-00-000000" # overridden by subclass as class variable
description: str = ""
def __init__(self): def __init__(self):
self.db: AsyncIOMotorDatabase | None = None self.version: str = "0000-00-00-000000" # Format: YYYY-MM-DD-HHMMSS
self.description: str = ""
self.db: Optional[AsyncIOMotorDatabase] = None
@abstractmethod @abstractmethod
async def up(self) -> None: async def up(self) -> None:
@ -51,7 +51,7 @@ class MigrationManager:
"""Manages database migrations.""" """Manages database migrations."""
def __init__(self): def __init__(self):
self.db: AsyncIOMotorDatabase | None = None self.db: Optional[AsyncIOMotorDatabase] = None
self.migrations_dir = Path(__file__).parent / "scripts" self.migrations_dir = Path(__file__).parent / "scripts"
self.collection_name = "migration_history" self.collection_name = "migration_history"
@ -70,7 +70,7 @@ class MigrationManager:
logger.info("Migration history collection initialized") logger.info("Migration history collection initialized")
def discover_migrations(self) -> list[str]: def discover_migrations(self) -> List[str]:
"""Discover all migration files in the migrations directory.""" """Discover all migration files in the migrations directory."""
if not self.migrations_dir.exists(): if not self.migrations_dir.exists():
logger.warning(f"Migrations directory not found: {self.migrations_dir}") logger.warning(f"Migrations directory not found: {self.migrations_dir}")
@ -101,13 +101,13 @@ class MigrationManager:
if not hasattr(module, 'Migration'): if not hasattr(module, 'Migration'):
raise AttributeError(f"Migration class not found in {migration_name}") raise AttributeError(f"Migration class not found in {migration_name}")
migration_class = module.Migration migration_class = getattr(module, 'Migration')
migration = migration_class() migration = migration_class()
await migration.set_database(self.db) await migration.set_database(self.db)
return migration return migration
async def get_applied_migrations(self) -> list[str]: async def get_applied_migrations(self) -> List[str]:
"""Get list of applied migration versions.""" """Get list of applied migration versions."""
collection = self.db[self.collection_name] collection = self.db[self.collection_name]
cursor = collection.find({}, {"version": 1}).sort("version", 1) cursor = collection.find({}, {"version": 1}).sort("version", 1)
@ -138,7 +138,7 @@ class MigrationManager:
logger.info(f"Removed migration record: {version}") logger.info(f"Removed migration record: {version}")
@trace_async_operation("migration_manager.migrate_up") @trace_async_operation("migration_manager.migrate_up")
async def migrate_up(self, target_version: str | None = None) -> list[str]: async def migrate_up(self, target_version: Optional[str] = None) -> List[str]:
""" """
Apply migrations up to the target version. Apply migrations up to the target version.
@ -186,7 +186,7 @@ class MigrationManager:
return applied return applied
@trace_async_operation("migration_manager.migrate_down") @trace_async_operation("migration_manager.migrate_down")
async def migrate_down(self, target_version: str) -> list[str]: async def migrate_down(self, target_version: str) -> List[str]:
""" """
Rollback migrations down to the target version. Rollback migrations down to the target version.
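
Review note on the `Migration` base class hunk above: one side declares `version` and `description` as class variables, the other assigns them in `__init__`. The difference matters for subclasses that set `version` at class level and do not define their own `__init__`. A small sketch with hypothetical stand-in classes (none of these names are in the diff):

```python
class InitAssignedBase:
    """Stand-in for the variant that assigns defaults inside __init__."""

    def __init__(self) -> None:
        self.version = "0000-00-00-000000"


class ClassVarBase:
    """Stand-in for the variant that declares defaults as class variables."""

    version: str = "0000-00-00-000000"


class MigrationA(InitAssignedBase):
    version = "2026-04-29-000000"  # class-level override


class MigrationB(ClassVarBase):
    version = "2026-04-29-000000"  # class-level override


if __name__ == "__main__":
    # The instance attribute set by the base __init__ shadows the override.
    print(MigrationA().version)
    # No instance attribute exists, so lookup falls through to the subclass.
    print(MigrationB().version)
    # Class variables are also readable without instantiating the migration.
    print(MigrationB.version)
```

With the `__init__` variant, each script has to assign `self.version` in its own `__init__` after calling the base one, otherwise the placeholder value wins.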

View file

@ -1,22 +0,0 @@
"""Entry point for running migrations: python -m app.migrations.run"""
import asyncio
from app.core.database import close_mongo_connection, connect_to_mongo
from app.migrations.migrator import MigrationManager
async def main() -> None:
await connect_to_mongo()
try:
mgr = MigrationManager()
applied = await mgr.migrate_up()
if applied:
print(f"Applied {len(applied)} migration(s): {applied}")
else:
print("Already up to date — no pending migrations.")
finally:
await close_mongo_connection()
if __name__ == "__main__":
asyncio.run(main())

View file

@ -1,5 +1,6 @@
"""Initial database schema setup migration.""" """Initial database schema setup migration."""
from datetime import datetime
from app.migrations.migrator import Migration from app.migrations.migrator import Migration

View file

@ -1,7 +1,6 @@
"""Migrate audit log schema from basic to comprehensive format.""" """Migrate audit log schema from basic to comprehensive format."""
from datetime import datetime from datetime import datetime
from app.migrations.migrator import Migration from app.migrations.migrator import Migration

View file

@ -24,7 +24,7 @@ class Migration(Migration):
# Create index on auth_provider for faster queries # Create index on auth_provider for faster queries
await self.db.users.create_index([("auth_provider", 1)]) await self.db.users.create_index([("auth_provider", 1)])
print("✅ Created index on auth_provider field") print(f"✅ Created index on auth_provider field")
print(f"✅ Applied migration {self.version}: {self.description}") print(f"✅ Applied migration {self.version}: {self.description}")
@ -34,7 +34,7 @@ class Migration(Migration):
# Drop the index # Drop the index
try: try:
await self.db.users.drop_index("auth_provider_1") await self.db.users.drop_index("auth_provider_1")
print("✅ Dropped index on auth_provider field") print(f"✅ Dropped index on auth_provider field")
except Exception as e: except Exception as e:
print(f"⚠️ Could not drop index: {e}") print(f"⚠️ Could not drop index: {e}")

View file

@ -75,7 +75,7 @@ class Migration(Migration):
"validationLevel": "moderate", # moderate = only validate on insert/update, not existing docs "validationLevel": "moderate", # moderate = only validate on insert/update, not existing docs
"validationAction": "error" # error = reject invalid documents "validationAction": "error" # error = reject invalid documents
}) })
print("✅ Updated users collection validator") print(f"✅ Updated users collection validator")
except Exception as e: except Exception as e:
print(f"⚠️ Could not update validator: {e}") print(f"⚠️ Could not update validator: {e}")
# Try creating the collection if it doesn't exist # Try creating the collection if it doesn't exist
@ -86,7 +86,7 @@ class Migration(Migration):
validationLevel="moderate", validationLevel="moderate",
validationAction="error" validationAction="error"
) )
print("✅ Created users collection with validator") print(f"✅ Created users collection with validator")
except Exception as e2: except Exception as e2:
print(f"⚠️ Could not create collection: {e2}") print(f"⚠️ Could not create collection: {e2}")
@ -136,4 +136,4 @@ class Migration(Migration):
}) })
print(f"⚠️ Rolled back migration {self.version}: {self.description}") print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: Production role users will fail validation!") print(f"⚠️ WARNING: Production role users will fail validation!")

View file

@ -53,7 +53,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print(" Updated jobs collection validator") print(f" Updated jobs collection validator")
except Exception as e: except Exception as e:
print(f" Could not update validator: {e}") print(f" Could not update validator: {e}")
raise raise
@ -101,4 +101,4 @@ class Migration(Migration):
}) })
print(f" Rolled back migration {self.version}: {self.description}") print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with approved_source or qc_feedback status will fail validation!") print(f" WARNING: Jobs with approved_source or qc_feedback status will fail validation!")

View file

@ -54,7 +54,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print(" Updated jobs collection validator") print(f" Updated jobs collection validator")
except Exception as e: except Exception as e:
print(f" Could not update validator: {e}") print(f" Could not update validator: {e}")
raise raise
@ -104,4 +104,4 @@ class Migration(Migration):
}) })
print(f" Rolled back migration {self.version}: {self.description}") print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with rendering_video status will fail validation!") print(f" WARNING: Jobs with rendering_video status will fail validation!")

View file

@ -60,7 +60,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print(" Updated jobs collection validator") print(f" Updated jobs collection validator")
except Exception as e: except Exception as e:
print(f" Could not update validator: {e}") print(f" Could not update validator: {e}")
raise raise
@ -111,4 +111,4 @@ class Migration(Migration):
}) })
print(f" Rolled back migration {self.version}: {self.description}") print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with tts_failed or render_failed status will fail validation!") print(f" WARNING: Jobs with tts_failed or render_failed status will fail validation!")

View file

@ -61,7 +61,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print(" Updated jobs collection validator") print(f" Updated jobs collection validator")
except Exception as e: except Exception as e:
print(f" Could not update validator: {e}") print(f" Could not update validator: {e}")
raise raise
@ -114,4 +114,4 @@ class Migration(Migration):
}) })
print(f" Rolled back migration {self.version}: {self.description}") print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with rendering_qc status will fail validation!") print(f" WARNING: Jobs with rendering_qc status will fail validation!")

View file

@ -64,7 +64,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print("✅ Updated users collection validator") print(f"✅ Updated users collection validator")
except Exception as e: except Exception as e:
print(f"⚠️ Could not update validator: {e}") print(f"⚠️ Could not update validator: {e}")
try: try:
@ -74,7 +74,7 @@ class Migration(Migration):
validationLevel="moderate", validationLevel="moderate",
validationAction="error" validationAction="error"
) )
print("✅ Created users collection with validator") print(f"✅ Created users collection with validator")
except Exception as e2: except Exception as e2:
print(f"⚠️ Could not create collection: {e2}") print(f"⚠️ Could not create collection: {e2}")
@ -134,4 +134,4 @@ class Migration(Migration):
}) })
print(f"⚠️ Rolled back migration {self.version}: {self.description}") print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: Linguist role users will fail validation!") print(f"⚠️ WARNING: Linguist role users will fail validation!")

View file

@ -69,7 +69,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print("✅ Updated users collection validator") print(f"✅ Updated users collection validator")
except Exception as e: except Exception as e:
print(f"⚠️ Could not update validator: {e}") print(f"⚠️ Could not update validator: {e}")
try: try:
@ -79,7 +79,7 @@ class Migration(Migration):
validationLevel="moderate", validationLevel="moderate",
validationAction="error" validationAction="error"
) )
print("✅ Created users collection with validator") print(f"✅ Created users collection with validator")
except Exception as e2: except Exception as e2:
print(f"⚠️ Could not create collection: {e2}") print(f"⚠️ Could not create collection: {e2}")
@ -139,4 +139,4 @@ class Migration(Migration):
}) })
print(f"⚠️ Rolled back migration {self.version}: {self.description}") print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: project_manager role users will fail validation!") print(f"⚠️ WARNING: project_manager role users will fail validation!")

@ -1,6 +1,6 @@
"""Backfill memberships collection from existing pm_client_ids and team.member_user_ids.""" """Backfill memberships collection from existing pm_client_ids and team.member_user_ids."""
from datetime import UTC, datetime from datetime import datetime, timezone
from app.migrations.migrator import Migration from app.migrations.migrator import Migration
@ -13,7 +13,7 @@ class Migration(Migration):
self.description = "Backfill memberships from pm_client_ids and team member lists" self.description = "Backfill memberships from pm_client_ids and team member lists"
async def up(self) -> None: async def up(self) -> None:
now = datetime.now(UTC) now = datetime.now(timezone.utc)
upserted = 0 upserted = 0
# 1. PROJECT_MANAGER users → MANAGER membership for each pm_client_id # 1. PROJECT_MANAGER users → MANAGER membership for each pm_client_id
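The only change in this hunk is how the UTC timezone is spelled. A minimal sketch of the equivalence, assuming Python 3.11+ for the `datetime.UTC` alias (the fallback branch exists only so the snippet also runs on older interpreters):

```python
from datetime import datetime, timezone

try:
    from datetime import UTC  # alias added in Python 3.11
except ImportError:
    UTC = timezone.utc  # older interpreters: fall back to the long spelling

# Both spellings name the same tzinfo object, so either yields an aware datetime.
now_short = datetime.now(UTC)
now_long = datetime.now(timezone.utc)

print(UTC is timezone.utc)
print(now_short.tzinfo is now_long.tzinfo)
```

Because the two names refer to the same object, the choice between them only affects which Python versions can import the module.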

@ -1,53 +0,0 @@
"""Add PROCESSING_FAILED status to job schema validator and create failure indexes."""
from app.migrations.migrator import Migration
class Migration(Migration):
    version = "2026-04-29-000000"
    description = "Add processing_failed status and failure/status compound indexes on jobs"
    async def up(self) -> None:
        db = self.db
        # Add processing_failed to the schema validator enum (if validator exists)
        try:
            validator_info = await db.command(
                "listCollections", filter={"name": "jobs"}
            )
            collections = [c async for c in validator_info["cursor"]]
            if collections and collections[0].get("options", {}).get("validator"):
                existing_validator = collections[0]["options"]["validator"]
                status_path = (
                    existing_validator.get("$jsonSchema", {})
                    .get("properties", {})
                    .get("status", {})
                    .get("enum", [])
                )
                if status_path and "processing_failed" not in status_path:
                    status_path.append("processing_failed")
                    await db.command(
                        "collMod",
                        "jobs",
                        validator=existing_validator,
                        validationAction="warn",
                    )
        except Exception:
            # No validator or unsupported — skip gracefully
            pass
        # Indexes for failure dashboard queries
        await db.jobs.create_index(
            [("failure.step", 1), ("status", 1)],
            name="idx_jobs_failure_step_status",
            background=True,
        )
        await db.jobs.create_index(
            [("status", 1), ("organization_id", 1), ("created_at", -1)],
            name="idx_jobs_status_org_created",
            background=True,
        )
    async def down(self) -> None:
        db = self.db
        await db.jobs.drop_index("idx_jobs_failure_step_status")
        await db.jobs.drop_index("idx_jobs_status_org_created")
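The validator lookup in `up()` can be separated from the database call. A minimal sketch, assuming the `listCollections` reply is a plain dict whose documents sit under `cursor.firstBatch` (the shape a later migration in this diff reads); `validator_with_status` and the sample reply are hypothetical names for illustration, not part of the repository:

```python
from copy import deepcopy
from typing import Any, Dict, Optional


def validator_with_status(reply: Dict[str, Any], status: str) -> Optional[Dict[str, Any]]:
    """Return a copy of the validator with `status` added to the enum, or None if nothing changes."""
    batch = reply.get("cursor", {}).get("firstBatch", [])
    if not batch:
        return None  # collection not found
    validator = batch[0].get("options", {}).get("validator")
    if not validator:
        return None  # collection has no validator
    validator = deepcopy(validator)  # leave the caller's reply untouched
    enum = (
        validator.get("$jsonSchema", {})
        .get("properties", {})
        .get("status", {})
        .get("enum")
    )
    if not enum or status in enum:
        return None  # no status enum, or the value is already allowed
    enum.append(status)
    return validator


# Hypothetical reply, shaped like a listCollections command result.
reply = {"cursor": {"firstBatch": [{"name": "jobs", "options": {"validator": {
    "$jsonSchema": {"properties": {"status": {"enum": ["created", "completed"]}}}}}}]}}

patched = validator_with_status(reply, "processing_failed")
print(patched["$jsonSchema"]["properties"]["status"]["enum"])
```

Keeping this step free of I/O makes it testable without a database; the returned validator would then be passed to `collMod`, as the migration does.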

@ -1,46 +0,0 @@
"""Create job_briefs collection with indexes."""
from app.migrations.migrator import Migration
class Migration(Migration):
    version = "2026-04-29-000001"
    description = "Create job_briefs collection and indexes"
    async def up(self) -> None:
        db = self.db
        # Ensure collection exists (insert + delete a dummy doc)
        try:
            await db.create_collection("job_briefs")
        except Exception:
            pass  # already exists
        await db.job_briefs.create_index(
            [("organization_id", 1), ("status", 1), ("created_at", -1)],
            name="idx_briefs_org_status_created",
            background=True,
        )
        await db.job_briefs.create_index(
            [("created_by", 1)],
            name="idx_briefs_created_by",
            background=True,
        )
        await db.job_briefs.create_index(
            [("project_id", 1)],
            name="idx_briefs_project_id",
            background=True,
            sparse=True,
        )
        await db.job_briefs.create_index(
            [("job_id", 1)],
            name="idx_briefs_job_id",
            background=True,
            sparse=True,
        )
    async def down(self) -> None:
        db = self.db
        await db.job_briefs.drop_index("idx_briefs_org_status_created")
        await db.job_briefs.drop_index("idx_briefs_created_by")
        await db.job_briefs.drop_index("idx_briefs_project_id")
        await db.job_briefs.drop_index("idx_briefs_job_id")

@ -1,44 +0,0 @@
"""Backfill Membership.team_ids from Team.member_user_ids (MT-17)."""
from app.migrations.migrator import Migration
class Migration(Migration):
    version = "2026-04-30-000000"
    description = "Backfill team_ids on Membership records from Team.member_user_ids"
    async def up(self) -> None:
        db = self.db
        upserted = 0
        # For each team that has member_user_ids, push team_id into the matching Membership
        async for team in db.teams.find(
            {"member_user_ids": {"$exists": True, "$ne": []}},
            {"_id": 1, "client_id": 1, "member_user_ids": 1},
        ):
            team_id = str(team["_id"])
            org_id = str(team.get("client_id", ""))
            for user_id in team.get("member_user_ids", []):
                result = await db.memberships.update_one(
                    {"user_id": str(user_id), "organization_id": org_id},
                    {"$addToSet": {"team_ids": team_id}},
                )
                if result.modified_count:
                    upserted += 1
        # Ensure index for efficient team-based lookups
        await db.memberships.create_index(
            [("team_ids", 1)],
            name="idx_memberships_team_ids",
            background=True,
            sparse=True,
        )
        print(f"✅ Backfilled team_ids on {upserted} Membership records")
    async def down(self) -> None:
        db = self.db
        await db.memberships.update_many({}, {"$unset": {"team_ids": ""}})
        try:
            await db.memberships.drop_index("idx_memberships_team_ids")
        except Exception:
            pass

@ -1,38 +0,0 @@
"""Add cancelled status to job schema validator."""
from app.migrations.migrator import Migration
class Migration(Migration):
    version = "2026-04-30-000001"
    description = "Add cancelled status to jobs collection schema validator"
    async def up(self) -> None:
        db = self.db
        try:
            validator_info = await db.command(
                "listCollections", filter={"name": "jobs"}
            )
            collections = [c async for c in validator_info["cursor"]]
            if collections and collections[0].get("options", {}).get("validator"):
                existing_validator = collections[0]["options"]["validator"]
                status_path = (
                    existing_validator.get("$jsonSchema", {})
                    .get("properties", {})
                    .get("status", {})
                    .get("enum", [])
                )
                if status_path and "cancelled" not in status_path:
                    status_path.append("cancelled")
                    await db.command(
                        "collMod",
                        "jobs",
                        validator=existing_validator,
                        validationAction="warn",
                    )
        except Exception:
            # No validator or unsupported — skip gracefully
            pass
    async def down(self) -> None:
        pass

@ -1,47 +0,0 @@
"""Replace status enum in $jsonSchema validator with the full current list."""
from app.migrations.migrator import Migration
ALL_STATUSES = [
    "created", "ingesting", "ai_processing",
    "pending_qc", "approved_english", "approved_source",
    "rejected", "qc_feedback",
    "translating", "tts_generating", "tts_failed",
    "rendering_video", "render_failed", "rendering_qc",
    "pending_final_review", "completed",
    "processing_failed", "cancelled",
]
class Migration(Migration):
    version = "2026-04-30-000002"
    description = "Fix status enum in jobs $jsonSchema validator (add processing_failed + cancelled)"
    async def up(self) -> None:
        db = self.db
        result = await db.command("listCollections", filter={"name": "jobs"})
        batch = result.get("cursor", {}).get("firstBatch", [])
        if not batch:
            return
        existing_validator = batch[0].get("options", {}).get("validator")
        if not existing_validator:
            return
        schema = existing_validator.get("$jsonSchema", {})
        status_prop = schema.get("properties", {}).get("status")
        if not status_prop:
            return
        status_prop["enum"] = ALL_STATUSES
        await db.command(
            "collMod",
            "jobs",
            validator=existing_validator,
            validationLevel="moderate",
            validationAction="error",
        )
    async def down(self) -> None:
        pass
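Because this migration hard-codes the status list, it can drift from the `JobStatus` enum defined in the job model. A small sketch of a drift check, assuming a trimmed-down stand-in enum; `enum_drift` and the shortened lists are hypothetical and exist only for illustration:

```python
from enum import Enum
from typing import Iterable, List, Tuple, Type


class JobStatus(str, Enum):  # hypothetical, trimmed-down stand-in for the real enum
    CREATED = "created"
    COMPLETED = "completed"
    PROCESSING_FAILED = "processing_failed"
    CANCELLED = "cancelled"


# Deliberately missing "cancelled" to show what the check reports.
ALL_STATUSES = ["created", "completed", "processing_failed"]


def enum_drift(enum_cls: Type[Enum], listed: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (values in the enum but not listed, values listed but not in the enum)."""
    values = {member.value for member in enum_cls}
    listed_set = set(listed)
    return sorted(values - listed_set), sorted(listed_set - values)


missing, unknown = enum_drift(JobStatus, ALL_STATUSES)
print(missing, unknown)
```

A check like this could run in a unit test so that adding an enum member without a matching validator migration is caught early.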

@ -1,26 +0,0 @@
"""Backfill source_has_ad=False on existing jobs and job_briefs."""
from app.migrations.migrator import Migration
class Migration(Migration):
    version = "2026-05-08-000000"
    description = "Add source_has_ad field to jobs.source and job_briefs"
    async def up(self) -> None:
        db = self.db
        jobs_result = await db.jobs.update_many(
            {"source.source_has_ad": {"$exists": False}},
            {"$set": {"source.source_has_ad": False}},
        )
        briefs_result = await db.job_briefs.update_many(
            {"source_has_ad": {"$exists": False}},
            {"$set": {"source_has_ad": False}},
        )
        print(f"✅ Backfilled source_has_ad on {jobs_result.modified_count} jobs, {briefs_result.modified_count} job_briefs")
    async def down(self) -> None:
        db = self.db
        await db.jobs.update_many({}, {"$unset": {"source.source_has_ad": ""}})
        await db.job_briefs.update_many({}, {"$unset": {"source_has_ad": ""}})

Binary file not shown.

Binary file not shown.

@ -1,16 +1,15 @@
"""Audit log model for tracking sensitive operations.""" """Audit log model for tracking sensitive operations."""
from datetime import datetime from datetime import datetime
from enum import StrEnum from enum import Enum
from typing import Any from typing import Any, Dict, Optional
from bson import ObjectId from bson import ObjectId
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from .user import PyObjectId from .user import PyObjectId
class AuditAction(StrEnum): class AuditAction(str, Enum):
"""Enumeration of auditable actions.""" """Enumeration of auditable actions."""
# Authentication actions # Authentication actions
@ -37,9 +36,6 @@ class AuditAction(StrEnum):
JOB_REJECT = "job.reject" JOB_REJECT = "job.reject"
JOB_CANCEL = "job.cancel" JOB_CANCEL = "job.cancel"
JOB_STATUS_CHANGE = "job.status.change" JOB_STATUS_CHANGE = "job.status.change"
JOB_TASK_FAILED = "job.task.failed"
JOB_RETRY = "job.retry"
JOB_BULK_RETRY = "job.bulk_retry"
# File operations # File operations
FILE_UPLOAD = "file.upload" FILE_UPLOAD = "file.upload"
@ -51,19 +47,6 @@ class AuditAction(StrEnum):
VTT_EDIT = "vtt.edit" VTT_EDIT = "vtt.edit"
VTT_APPROVE = "vtt.approve" VTT_APPROVE = "vtt.approve"
VTT_REJECT = "vtt.reject" VTT_REJECT = "vtt.reject"
VTT_RETRANSLATE = "vtt.retranslate"
# Per-language QC actions
LANGUAGE_QC_ASSIGN = "language_qc.assign"
LANGUAGE_QC_REASSIGN = "language_qc.reassign"
LANGUAGE_QC_REVIEWER_ASSIGN = "language_qc.reviewer_assign"
LANGUAGE_QC_REVIEWER_REASSIGN = "language_qc.reviewer_reassign"
LANGUAGE_QC_SUBMIT = "language_qc.submit"
LANGUAGE_QC_OPEN_REVIEW = "language_qc.open_review"
LANGUAGE_QC_APPROVE = "language_qc.approve"
LANGUAGE_QC_REJECT = "language_qc.reject"
LANGUAGE_QC_REOPEN = "language_qc.reopen"
LANGUAGE_QC_COMMENT = "language_qc.comment"
# Admin actions # Admin actions
ADMIN_CONFIG_CHANGE = "admin.config.change" ADMIN_CONFIG_CHANGE = "admin.config.change"
@ -71,55 +54,6 @@ class AuditAction(StrEnum):
ADMIN_DATA_EXPORT = "admin.data.export" ADMIN_DATA_EXPORT = "admin.data.export"
ADMIN_AUDIT_ACCESS = "admin.audit.access" ADMIN_AUDIT_ACCESS = "admin.audit.access"
# Glossary management
GLOSSARY_UPLOAD = "glossary.upload"
GLOSSARY_VERSION_UPLOAD = "glossary.version.upload"
GLOSSARY_ACTIVATE = "glossary.activate"
GLOSSARY_ARCHIVE = "glossary.archive"
# Client management
CLIENT_CREATE = "client.create"
CLIENT_UPDATE = "client.update"
CLIENT_DEACTIVATE = "client.deactivate"
CLIENT_PM_ASSIGN = "client.pm_assign"
CLIENT_PM_REMOVE = "client.pm_remove"
CLIENT_TEAM_CREATE = "client.team_create"
CLIENT_TEAM_UPDATE = "client.team_update"
CLIENT_TEAM_DELETE = "client.team_delete"
CLIENT_TEAM_MEMBER_ADD = "client.team_member_add"
CLIENT_TEAM_MEMBER_REMOVE = "client.team_member_remove"
CLIENT_PROJECT_CREATE = "client.project_create"
CLIENT_PROJECT_UPDATE = "client.project_update"
CLIENT_PROJECT_ARCHIVE = "client.project_archive"
# Organization management
ORG_CREATE = "org.create"
ORG_UPDATE = "org.update"
ORG_MEMBER_ADD = "org.member_add"
ORG_MEMBER_UPDATE = "org.member_update"
ORG_MEMBER_REMOVE = "org.member_remove"
# Invitations
INVITATION_CREATE = "invitation.create"
INVITATION_REVOKE = "invitation.revoke"
INVITATION_ACCEPT = "invitation.accept"
# Language QC (additional)
LANGUAGE_QC_BULK_ASSIGN = "language_qc.bulk_assign"
LANGUAGE_QC_START_WORK = "language_qc.start_work"
LANGUAGE_QC_MARK_CUE_REVIEWED = "language_qc.mark_cue_reviewed"
# Brief management
BRIEF_CREATE = "brief.create"
BRIEF_UPDATE = "brief.update"
BRIEF_SUBMIT = "brief.submit"
BRIEF_APPROVE = "brief.approve"
# Share tokens
SHARE_TOKEN_CREATE = "share.token_create"
SHARE_TOKEN_REVOKE = "share.token_revoke"
SHARE_CLIENT_DECISION = "share.client_decision"
# Security events # Security events
RATE_LIMIT_EXCEEDED = "security.rate_limit.exceeded" RATE_LIMIT_EXCEEDED = "security.rate_limit.exceeded"
VALIDATION_FAILURE = "security.validation.failure" VALIDATION_FAILURE = "security.validation.failure"
@ -127,7 +61,7 @@ class AuditAction(StrEnum):
SUSPICIOUS_ACTIVITY = "security.suspicious.activity" SUSPICIOUS_ACTIVITY = "security.suspicious.activity"
class AuditLogSeverity(StrEnum): class AuditLogSeverity(str, Enum):
"""Severity levels for audit events.""" """Severity levels for audit events."""
INFO = "info" # Normal operations INFO = "info" # Normal operations
@ -139,7 +73,7 @@ class AuditLogSeverity(StrEnum):
class AuditLog(BaseModel): class AuditLog(BaseModel):
"""Audit log entry model.""" """Audit log entry model."""
id: PyObjectId | None = Field(default_factory=lambda: str(ObjectId()), alias="_id") id: Optional[PyObjectId] = Field(default_factory=PyObjectId, alias="_id")
# Core audit fields # Core audit fields
timestamp: datetime = Field(default_factory=datetime.utcnow) timestamp: datetime = Field(default_factory=datetime.utcnow)
@ -147,28 +81,28 @@ class AuditLog(BaseModel):
severity: AuditLogSeverity = AuditLogSeverity.INFO severity: AuditLogSeverity = AuditLogSeverity.INFO
# Actor information # Actor information
user_id: PyObjectId | None = None user_id: Optional[PyObjectId] = None
user_email: str | None = None user_email: Optional[str] = None
user_role: str | None = None user_role: Optional[str] = None
# Request context # Request context
ip_address: str | None = None ip_address: Optional[str] = None
user_agent: str | None = None user_agent: Optional[str] = None
request_id: str | None = None request_id: Optional[str] = None
session_id: str | None = None session_id: Optional[str] = None
# Resource information # Resource information
resource_type: str | None = None # e.g., "job", "user", "file" resource_type: Optional[str] = None # e.g., "job", "user", "file"
resource_id: str | None = None resource_id: Optional[str] = None
resource_name: str | None = None resource_name: Optional[str] = None
# Action details # Action details
description: str description: str
details: dict[str, Any] = Field(default_factory=dict) details: Dict[str, Any] = Field(default_factory=dict)
# Outcome # Outcome
success: bool = True success: bool = True
error_message: str | None = None error_message: Optional[str] = None
# Additional metadata # Additional metadata
environment: str = "prod" environment: str = "prod"
@ -189,38 +123,38 @@ class AuditLogCreate(BaseModel):
description: str description: str
# Optional fields that can be provided # Optional fields that can be provided
user_id: PyObjectId | None = None user_id: Optional[PyObjectId] = None
user_email: str | None = None user_email: Optional[str] = None
user_role: str | None = None user_role: Optional[str] = None
ip_address: str | None = None ip_address: Optional[str] = None
user_agent: str | None = None user_agent: Optional[str] = None
request_id: str | None = None request_id: Optional[str] = None
resource_type: str | None = None resource_type: Optional[str] = None
resource_id: str | None = None resource_id: Optional[str] = None
resource_name: str | None = None resource_name: Optional[str] = None
details: dict[str, Any] = Field(default_factory=dict) details: Dict[str, Any] = Field(default_factory=dict)
success: bool = True success: bool = True
error_message: str | None = None error_message: Optional[str] = None
class AuditLogQuery(BaseModel): class AuditLogQuery(BaseModel):
"""Schema for querying audit logs.""" """Schema for querying audit logs."""
# Time range # Time range
start_date: datetime | None = None start_date: Optional[datetime] = None
end_date: datetime | None = None end_date: Optional[datetime] = None
# Filters # Filters
action: AuditAction | None = None action: Optional[AuditAction] = None
severity: AuditLogSeverity | None = None severity: Optional[AuditLogSeverity] = None
user_id: PyObjectId | None = None user_id: Optional[PyObjectId] = None
user_email: str | None = None user_email: Optional[str] = None
resource_type: str | None = None resource_type: Optional[str] = None
resource_id: str | None = None resource_id: Optional[str] = None
success: bool | None = None success: Optional[bool] = None
# Search # Search
search: str | None = None # Full-text search in description and details search: Optional[str] = None # Full-text search in description and details
# Pagination # Pagination
skip: int = 0 skip: int = 0
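The two sides of this file differ in whether the enums derive from `StrEnum` or from `(str, Enum)`. The sketch below uses a hypothetical stand-in class (`Severity`, not from the repository) to show what the `(str, Enum)` form does; it deliberately avoids importing `StrEnum` so it runs on interpreters older than Python 3.11:

```python
from enum import Enum


class Severity(str, Enum):  # hypothetical stand-in for AuditLogSeverity
    INFO = "info"


# Members compare equal to their plain string value and expose it via .value.
print(Severity.INFO == "info")
print(Severity.INFO.value)

# str() on a (str, Enum) member gives the qualified member name, not the value.
print(str(Severity.INFO))
```

With `StrEnum` (Python 3.11+), `str()` returns the member's value instead, so code that builds log lines or keys with `str(member)` can behave differently between the two sides; using `.value` explicitly avoids the difference.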

@ -1,5 +1,5 @@
from datetime import datetime from datetime import datetime
from typing import Annotated from typing import Optional, Annotated
from bson import ObjectId from bson import ObjectId
from pydantic import BaseModel, BeforeValidator from pydantic import BaseModel, BeforeValidator
@ -17,12 +17,12 @@ PyObjectId = Annotated[str, BeforeValidator(validate_object_id)]
class Client(BaseModel): class Client(BaseModel):
id: str | None = None id: Optional[str] = None
name: str name: str
slug: str slug: str
is_active: bool = True is_active: bool = True
created_at: datetime | None = None created_at: Optional[datetime] = None
updated_at: datetime | None = None updated_at: Optional[datetime] = None
class ClientCreate(BaseModel): class ClientCreate(BaseModel):
@ -31,18 +31,18 @@ class ClientCreate(BaseModel):
class ClientUpdate(BaseModel): class ClientUpdate(BaseModel):
name: str | None = None name: Optional[str] = None
slug: str | None = None slug: Optional[str] = None
is_active: bool | None = None is_active: Optional[bool] = None
class Team(BaseModel): class Team(BaseModel):
id: str | None = None id: Optional[str] = None
name: str name: str
client_id: str client_id: str
member_user_ids: list[str] = [] member_user_ids: list[str] = []
created_at: datetime | None = None created_at: Optional[datetime] = None
updated_at: datetime | None = None updated_at: Optional[datetime] = None
class TeamCreate(BaseModel): class TeamCreate(BaseModel):
@ -50,31 +50,22 @@ class TeamCreate(BaseModel):
class TeamUpdate(BaseModel): class TeamUpdate(BaseModel):
name: str | None = None name: Optional[str] = None
class Project(BaseModel): class Project(BaseModel):
id: str | None = None id: Optional[str] = None
name: str name: str
client_id: str client_id: str
is_active: bool = True is_active: bool = True
default_languages: list[str] = [] created_at: Optional[datetime] = None
default_linguist_id: str | None = None updated_at: Optional[datetime] = None
default_reviewer_id: str | None = None
created_at: datetime | None = None
updated_at: datetime | None = None
class ProjectCreate(BaseModel): class ProjectCreate(BaseModel):
name: str name: str
default_languages: list[str] = []
default_linguist_id: str | None = None
default_reviewer_id: str | None = None
class ProjectUpdate(BaseModel): class ProjectUpdate(BaseModel):
name: str | None = None name: Optional[str] = None
is_active: bool | None = None is_active: Optional[bool] = None
default_languages: list[str] | None = None
default_linguist_id: str | None = None
default_reviewer_id: str | None = None
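Most of the annotation changes in this file are spelling changes between `X | None` and `Optional[X]`. A minimal standard-library sketch of why those two describe the same type; the version guard is there because the `|` form on classes is only evaluable at runtime from Python 3.10:

```python
import sys
from typing import Optional, Union, get_args

# Optional[X] is shorthand for Union[X, None].
print(Optional[str] == Union[str, None])
print(get_args(Optional[str]))

# On Python 3.10+, the PEP 604 spelling compares equal to the typing form.
if sys.version_info >= (3, 10):
    print((str | None) == Optional[str])
```

The fields that differ in substance are the ones present on only one side, such as `default_languages`, `default_linguist_id`, and `default_reviewer_id`.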

@ -1,142 +0,0 @@
from __future__ import annotations
from datetime import datetime
from enum import StrEnum
from pydantic import BaseModel, Field
class GlossarySource(StrEnum):
    XLSX_UPLOAD = "xlsx_upload"
    FRAZE_API = "fraze_api"  # reserved for future FRAZE integration
class GlossaryStatus(StrEnum):
    ACTIVE = "active"
    ARCHIVED = "archived"
class EmbeddingStatus(StrEnum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"
class Glossary(BaseModel):
    id: str | None = Field(None, alias="_id")
    client_id: str
    name: str
    description: str | None = None
    source_locale: str  # BCP-47 source column, e.g. "en-GB"
    source: GlossarySource = GlossarySource.XLSX_UPLOAD
    status: GlossaryStatus = GlossaryStatus.ACTIVE
    current_version_id: str | None = None
    created_at: datetime = Field(default_factory=datetime.utcnow)
    created_by: str  # user_id
    model_config = {"populate_by_name": True, "arbitrary_types_allowed": True}
class GlossaryVersion(BaseModel):
    id: str | None = Field(None, alias="_id")
    glossary_id: str
    version_number: int
    source_xlsx_gcs_path: str | None = None  # GCS path to original file
    term_count: int = 0
    embedded_count: int = 0
    embedding_status: EmbeddingStatus = EmbeddingStatus.PENDING
    created_at: datetime = Field(default_factory=datetime.utcnow)
    created_by: str
    change_note: str | None = None
    model_config = {"populate_by_name": True}
class GlossaryTerm(BaseModel):
    """One source term with its per-locale translations."""
    id: str | None = Field(None, alias="_id")
    glossary_id: str
    version_id: str
    cid: str | None = None  # 3M Content ID from xlsx
    tid: str | None = None  # 3M Term ID from xlsx
    source_term: str  # canonical source text (whitespace-normalised)
    source_term_lower: str  # lowercase for case-insensitive index
    translations: dict[str, str] = {}  # {locale_code: translated_text}
    embedding: list[float] | None = None  # 768-dim Gemini embedding
    model_config = {"populate_by_name": True}
# ── Schema models (API request/response) ──────────────────────────────────────
class GlossaryCreate(BaseModel):
    name: str
    description: str | None = None
    source_locale: str
    change_note: str | None = None
class GlossaryVersionCreate(BaseModel):
    source_locale: str
    change_note: str | None = None
class GlossaryResponse(BaseModel):
    id: str
    client_id: str
    name: str
    description: str | None = None
    source_locale: str
    source: GlossarySource
    status: GlossaryStatus
    current_version_id: str | None = None
    current_version_embedding_status: EmbeddingStatus | None = None
    current_version_embedded_count: int | None = None
    current_version_term_count: int | None = None
    created_at: datetime
    created_by: str
class GlossaryVersionResponse(BaseModel):
    id: str
    glossary_id: str
    version_number: int
    term_count: int
    embedded_count: int
    embedding_status: EmbeddingStatus
    created_at: datetime
    created_by: str
    change_note: str | None = None
class GlossaryDetailResponse(GlossaryResponse):
    versions: list[GlossaryVersionResponse] = []
class GlossaryTermPreview(BaseModel):
    """Subset of GlossaryTerm for UI previews."""
    source_term: str
    translations: dict[str, str]
class MatchedTerm(BaseModel):
    """A term matched against VTT source text, with the target-locale translation."""
    source_term: str
    target_translation: str
    match_kind: str  # "exact" | "vector"
    score: float  # 1.0 for exact, cosine similarity for vector
def glossary_from_doc(doc: dict) -> Glossary:
    doc = dict(doc)
    if "_id" in doc:
        doc["_id"] = str(doc["_id"])
    return Glossary.model_validate(doc)
def glossary_version_from_doc(doc: dict) -> GlossaryVersion:
    doc = dict(doc)
    if "_id" in doc:
        doc["_id"] = str(doc["_id"])
    return GlossaryVersion.model_validate(doc)
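`MatchedTerm.score` is documented as 1.0 for exact matches and cosine similarity for vector matches. The repository's matching code is not shown in this diff, so the following is only a plain-Python sketch of the cosine similarity formula; the function name and the zero-vector convention are assumptions:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

In practice a vector database or NumPy would compute this over the 768-dimensional embeddings; the pure-Python form is only meant to make the score's meaning concrete.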

@ -1,4 +1,5 @@
from datetime import datetime from datetime import datetime
from typing import Optional
from pydantic import BaseModel, EmailStr from pydantic import BaseModel, EmailStr
@ -6,7 +7,7 @@ from .organization import OrgRole
class Invitation(BaseModel): class Invitation(BaseModel):
id: str | None = None id: Optional[str] = None
email: str email: str
organization_id: str organization_id: str
role_in_org: OrgRole role_in_org: OrgRole
@ -14,9 +15,9 @@ class Invitation(BaseModel):
token_hash: str token_hash: str
invited_by_user_id: str invited_by_user_id: str
expires_at: datetime expires_at: datetime
accepted_at: datetime | None = None accepted_at: Optional[datetime] = None
revoked_at: datetime | None = None revoked_at: Optional[datetime] = None
created_at: datetime | None = None created_at: Optional[datetime] = None
class InvitationCreate(BaseModel): class InvitationCreate(BaseModel):
@ -39,9 +40,9 @@ class InvitationPreviewResponse(BaseModel):
class InvitationAcceptRequest(BaseModel): class InvitationAcceptRequest(BaseModel):
token: str token: str
full_name: str | None = None full_name: Optional[str] = None
password: str | None = None password: Optional[str] = None
ms_id_token: str | None = None ms_id_token: Optional[str] = None
class InvitationResponse(BaseModel): class InvitationResponse(BaseModel):
@ -51,9 +52,9 @@ class InvitationResponse(BaseModel):
role_in_org: OrgRole role_in_org: OrgRole
invited_by_user_id: str invited_by_user_id: str
expires_at: datetime expires_at: datetime
accepted_at: datetime | None = None accepted_at: Optional[datetime] = None
revoked_at: datetime | None = None revoked_at: Optional[datetime] = None
created_at: datetime | None = None created_at: Optional[datetime] = None
is_expired: bool = False is_expired: bool = False
is_accepted: bool = False is_accepted: bool = False
is_revoked: bool = False is_revoked: bool = False

@ -1,13 +1,11 @@
from datetime import datetime from datetime import datetime
from enum import StrEnum from enum import Enum
from typing import Any, Literal from typing import Any, Literal, Optional
from pydantic import BaseModel, Field, constr from pydantic import BaseModel, Field, constr
FailureStep = Literal["ingestion", "ai_processing", "translation", "tts", "render"]
class JobStatus(str, Enum):
class JobStatus(StrEnum):
CREATED = "created" CREATED = "created"
INGESTING = "ingesting" INGESTING = "ingesting"
AI_PROCESSING = "ai_processing" AI_PROCESSING = "ai_processing"
@ -18,14 +16,12 @@ class JobStatus(StrEnum):
QC_FEEDBACK = "qc_feedback" QC_FEEDBACK = "qc_feedback"
TRANSLATING = "translating" TRANSLATING = "translating"
TTS_GENERATING = "tts_generating" TTS_GENERATING = "tts_generating"
TTS_FAILED = "tts_failed" # legacy: use PROCESSING_FAILED + failure.step="tts" for new failures TTS_FAILED = "tts_failed" # TTS synthesis failed after retries, requires reprocessing
RENDERING_VIDEO = "rendering_video" # Accessible video rendering in progress RENDERING_VIDEO = "rendering_video" # Accessible video rendering in progress
RENDER_FAILED = "render_failed" # legacy: use PROCESSING_FAILED + failure.step="render" for new failures RENDER_FAILED = "render_failed" # Accessible video rendering failed, requires reprocessing
PROCESSING_FAILED = "processing_failed" # unified failure status; see Job.failure for step details
RENDERING_QC = "rendering_qc" # Re-rendering accessible video during QC review RENDERING_QC = "rendering_qc" # Re-rendering accessible video during QC review
PENDING_FINAL_REVIEW = "pending_final_review" PENDING_FINAL_REVIEW = "pending_final_review"
COMPLETED = "completed" COMPLETED = "completed"
CANCELLED = "cancelled"
@classmethod @classmethod
def is_approved(cls, status: str) -> bool: def is_approved(cls, status: str) -> bool:
@ -33,24 +29,14 @@ class JobStatus(StrEnum):
return status in [cls.APPROVED_ENGLISH.value, cls.APPROVED_SOURCE.value] return status in [cls.APPROVED_ENGLISH.value, cls.APPROVED_SOURCE.value]
class JobFailure(BaseModel):
step: FailureStep
type: str
message: str
retriable: bool = True
occurred_at: datetime
retry_count: int = 0
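The comments on `TTS_FAILED` and `RENDER_FAILED` describe them as legacy statuses superseded by `PROCESSING_FAILED` plus a `failure.step`. A minimal sketch of that mapping, assuming a helper that normalizes a status string; `LEGACY_FAILURE_STEPS` and `normalize_failure` are hypothetical names, not taken from the repository:

```python
from typing import Optional, Tuple

# Legacy status -> failure step, following the comments on the JobStatus members.
LEGACY_FAILURE_STEPS = {"tts_failed": "tts", "render_failed": "render"}


def normalize_failure(status: str) -> Tuple[str, Optional[str]]:
    """Map a legacy failure status to (unified status, step); pass other statuses through."""
    step = LEGACY_FAILURE_STEPS.get(status)
    if step is None:
        return status, None
    return "processing_failed", step


print(normalize_failure("tts_failed"))
print(normalize_failure("completed"))
```

A helper of this kind would let dashboards treat old and new failure records uniformly without rewriting stored documents.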
class Source(BaseModel): class Source(BaseModel):
filename: str filename: str
original_filename: str | None = None original_filename: Optional[str] = None
gcs_uri: str gcs_uri: str
duration_s: float | None = None duration_s: Optional[float] = None
language: constr(min_length=2, max_length=10) = "en" # Final source language (from detection or explicit) language: constr(min_length=2, max_length=10) = "en" # Final source language (from detection or explicit)
language_hint: str | None = None # User-provided hint for non-English videos language_hint: Optional[str] = None # User-provided hint for non-English videos
detected_language: str | None = None # AI-detected language from Gemini detected_language: Optional[str] = None # AI-detected language from Gemini
source_has_ad: bool = False # Source video already contains professional audio descriptions
class TTSPreferences(BaseModel): class TTSPreferences(BaseModel):
@ -64,10 +50,10 @@ class TTSPreferences(BaseModel):
style_preset: Literal[ style_preset: Literal[
"neutral", "calm", "energetic", "professional", "warm", "documentary", "custom" "neutral", "calm", "energetic", "professional", "warm", "documentary", "custom"
] = "neutral" ] = "neutral"
custom_style_prompt: str | None = None # Used when style_preset is "custom" custom_style_prompt: Optional[str] = None # Used when style_preset is "custom"
# ElevenLabs-specific settings # ElevenLabs-specific settings
stability: float | None = None # 0.0-1.0, default 0.5 when used stability: Optional[float] = None # 0.0-1.0, default 0.5 when used
similarity_boost: float | None = None # 0.0-1.0, default 0.5 when used similarity_boost: Optional[float] = None # 0.0-1.0, default 0.5 when used
class RequestedOutputs(BaseModel): class RequestedOutputs(BaseModel):
@ -75,24 +61,22 @@ class RequestedOutputs(BaseModel):
audio_description_vtt: bool = True audio_description_vtt: bool = True
audio_description_mp3: bool = True audio_description_mp3: bool = True
accessible_video_mp4: bool = False # Rendered video with embedded audio descriptions accessible_video_mp4: bool = False # Rendered video with embedded audio descriptions
accessible_video_method: Literal["overlay", "pause_insert"] | None = None # User-selected method accessible_video_method: Optional[Literal["overlay", "pause_insert"]] = None # User-selected method
sdh_vtt: bool = False # SDH (Subtitles for Deaf and Hard of Hearing) captions with speaker labels, sound effects, music notation sdh_vtt: bool = False # SDH (Subtitles for Deaf and Hard of Hearing) captions with speaker labels, sound effects, music notation
descriptive_transcript: bool = False # WCAG-compliant combined speech+description transcript text file
languages: list[str] = [] languages: list[str] = []
transcreation: list[str] = [] transcreation: list[str] = []
tts_preferences: TTSPreferences | None = None tts_preferences: Optional[TTSPreferences] = None
translation_mode: Literal["traditional", "video_native"] = "traditional" translation_mode: Literal["traditional", "video_native"] = "video_native"
class PausePointData(BaseModel): class PausePointData(BaseModel):
"""Pause point timing data for accessible video editing during QC.""" """Pause point timing data for accessible video editing during QC."""
cue_index: int # AD cue index this pause point belongs to cue_index: int # AD cue index this pause point belongs to
original_ms: float # Rendered timeline position (ms) - for UI display original_ms: float # Rendered timeline position (ms) - for UI display
source_ms: float | None = None # Source video cut point (ms) - for re-rendering (None = use original_ms) source_ms: Optional[float] = None # Source video cut point (ms) - for re-rendering (None = use original_ms)
adjusted_ms: float | None = None # User-adjusted timestamp (ms), None = use original adjusted_ms: Optional[float] = None # User-adjusted timestamp (ms), None = use original
min_bound_ms: float # Minimum allowed value (end of previous AD segment) min_bound_ms: float # Minimum allowed value (end of previous AD segment)
max_bound_ms: float # Maximum allowed value (start of next AD segment) max_bound_ms: float # Maximum allowed value (start of next AD segment)
natural_gap_ms: float = 0.0 # Natural silence already present at pause point (ms); used to size silence buffers
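The field comments on `PausePointData` say that `adjusted_ms` falls back to `original_ms` when it is `None` and that `min_bound_ms` and `max_bound_ms` delimit the allowed range. A small sketch of how a consumer might resolve the effective timestamp; clamping out-of-range values is an assumption of this sketch (the real code may reject them instead), and `effective_pause_ms` is a hypothetical helper:

```python
from typing import Optional


def effective_pause_ms(
    original_ms: float,
    adjusted_ms: Optional[float],
    min_bound_ms: float,
    max_bound_ms: float,
) -> float:
    """Pick the user adjustment if present, else the original, kept within the bounds."""
    if min_bound_ms > max_bound_ms:
        raise ValueError("min_bound_ms must not exceed max_bound_ms")
    chosen = original_ms if adjusted_ms is None else adjusted_ms
    # Assumption: out-of-range values are clamped rather than rejected.
    return max(min_bound_ms, min(chosen, max_bound_ms))


print(effective_pause_ms(1000.0, None, 500.0, 1500.0))
print(effective_pause_ms(1000.0, 2000.0, 500.0, 1500.0))
```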
class VideoSegmentMetadata(BaseModel): class VideoSegmentMetadata(BaseModel):
@ -103,16 +87,16 @@ class VideoSegmentMetadata(BaseModel):
gcs_uri: str # GCS path to segment MP4 gcs_uri: str # GCS path to segment MP4
duration_ms: float # Actual segment duration (ms) duration_ms: float # Actual segment duration (ms)
is_freeze_frame: bool = False # True if this is a freeze frame segment with AD audio is_freeze_frame: bool = False # True if this is a freeze frame segment with AD audio
cue_index: int | None = None # AD cue index (only for freeze frame segments) cue_index: Optional[int] = None # AD cue index (only for freeze frame segments)
class TTSRegenerationRequest(BaseModel): class TTSRegenerationRequest(BaseModel):
"""Request to regenerate TTS for a specific cue during QC.""" """Request to regenerate TTS for a specific cue during QC."""
cue_index: int cue_index: int
requested_at: datetime requested_at: datetime
new_text: str | None = None # If provided, use this text instead of current VTT new_text: Optional[str] = None # If provided, use this text instead of current VTT
status: Literal["pending", "processing", "completed", "failed"] = "pending" status: Literal["pending", "processing", "completed", "failed"] = "pending"
error_message: str | None = None error_message: Optional[str] = None
class AccessibleVideoEditState(BaseModel): class AccessibleVideoEditState(BaseModel):
@ -120,156 +104,74 @@ class AccessibleVideoEditState(BaseModel):
pause_points: list[PausePointData] = [] pause_points: list[PausePointData] = []
video_segments: list[VideoSegmentMetadata] = [] video_segments: list[VideoSegmentMetadata] = []
tts_regeneration_queue: list[TTSRegenerationRequest] = [] tts_regeneration_queue: list[TTSRegenerationRequest] = []
last_render_at: datetime | None = None last_render_at: Optional[datetime] = None
whisper_refine_enabled: bool = False # Default: off (user enables if cue positions changed) whisper_refine_enabled: bool = False # Default: off (user enables if cue positions changed)
class LangOutput(BaseModel): class LangOutput(BaseModel):
captions_vtt_gcs: str | None = None captions_vtt_gcs: Optional[str] = None
sdh_captions_vtt_gcs: str | None = None # SDH-format captions (speaker labels, sound effects, music) sdh_captions_vtt_gcs: Optional[str] = None # SDH-format captions (speaker labels, sound effects, music)
ad_vtt_gcs: str | None = None ad_vtt_gcs: Optional[str] = None
ad_mp3_gcs: str | None = None ad_mp3_gcs: Optional[str] = None
# Accessible video outputs # Accessible video outputs
accessible_video_gcs: str | None = None # Rendered accessible MP4 accessible_video_gcs: Optional[str] = None # Rendered accessible MP4
accessible_video_method: Literal["overlay", "pause_insert"] | None = None accessible_video_method: Optional[Literal["overlay", "pause_insert"]] = None
retimed_captions_vtt_gcs: str | None = None # Re-timed captions for pause-insert method retimed_captions_vtt_gcs: Optional[str] = None # Re-timed captions for pause-insert method
ad_cues_gcs_prefix: str | None = None # GCS path prefix for per-cue MP3 segments ad_cues_gcs_prefix: Optional[str] = None # GCS path prefix for per-cue MP3 segments
ad_cue_manifest: list[dict] | None = None # Per-cue manifest: [{cue_index, gcs_uri, text, duration_s}] ad_cue_manifest: Optional[list[dict]] = None # Per-cue manifest: [{cue_index, gcs_uri, text, duration_s}]
# QC editing state for accessible video # QC editing state for accessible video
video_segments_gcs_prefix: str | None = None # GCS prefix for persisted video segments video_segments_gcs_prefix: Optional[str] = None # GCS prefix for persisted video segments
accessible_video_edit_state: AccessibleVideoEditState | None = None accessible_video_edit_state: Optional[AccessibleVideoEditState] = None
origin: Literal["translate", "transcreate", "gemini_translate", "video_native"] | None = None origin: Optional[Literal["translate", "transcreate", "gemini_translate", "video_native"]] = None
qa_notes: str | None = None qa_notes: Optional[str] = None
descriptive_transcript_gcs: str | None = None # WCAG-compliant combined speech+description transcript descriptive_transcript_gcs: Optional[str] = None # WCAG-compliant combined speech+description transcript
class ReviewHistoryItem(BaseModel): class ReviewHistoryItem(BaseModel):
at: datetime at: datetime
status: str status: str
by: str | None = None by: Optional[str] = None
notes: str | None = None notes: Optional[str] = None
class Review(BaseModel): class Review(BaseModel):
notes: str | None = "" notes: Optional[str] = ""
reviewer_id: str | None = None reviewer_id: Optional[str] = None
history: list[ReviewHistoryItem] = [] history: list[ReviewHistoryItem] = []
# ── Per-language QC ───────────────────────────────────────────────────────────
class LanguageQCStatus(StrEnum):
PENDING = "pending"
IN_PROGRESS = "in_progress" # linguist is working
PENDING_REVIEW = "pending_review" # linguist submitted, awaiting reviewer
IN_REVIEW = "in_review" # reviewer has opened it
APPROVED = "approved"
REJECTED = "rejected"
class LanguageQCEvent(BaseModel):
at: datetime
actor_user_id: str
actor_email: str
action: Literal[
"assign", "reassign",
"reviewer_assigned", "reviewer_reassigned",
"start_work", "submit_for_review", "open_review",
"approve", "reject", "reopen",
"comment_added",
]
notes: str | None = None
previous_assignee_id: str | None = None
class LanguageQCComment(BaseModel):
id: str
author_id: str
author_name: str
author_email: str
body: str
created_at: datetime
class LanguageQCState(BaseModel):
status: LanguageQCStatus = LanguageQCStatus.PENDING
# Linguist slot
assigned_linguist_id: str | None = None
assigned_linguist_email: str | None = None
assigned_linguist_name: str | None = None
assigned_at: datetime | None = None
assigned_by_user_id: str | None = None
submitted_for_review_at: datetime | None = None
linguist_deadline: datetime | None = None # when linguist must submit
# Reviewer slot
assigned_reviewer_id: str | None = None
assigned_reviewer_email: str | None = None
assigned_reviewer_name: str | None = None
assigned_reviewer_at: datetime | None = None
review_started_at: datetime | None = None
reviewer_deadline: datetime | None = None # when reviewer must decide
# Reviewer progress
total_cues: int | None = None # set when reviewer opens the job
reviewed_cues: int = 0 # incremented as reviewer marks cues reviewed
# Final outcome
reviewed_at: datetime | None = None
reviewed_by_user_id: str | None = None
reviewed_by_email: str | None = None
notes: str | None = None
reject_category: str | None = None # e.g. timing/mistranslation/terminology/profanity/length
history: list[LanguageQCEvent] = []
comments: list[LanguageQCComment] = []
class QCAssignment(BaseModel):
"""Denormalized for efficient per-linguist queue queries."""
lang: str
linguist_id: str
status: LanguageQCStatus
class AISection(BaseModel): class AISection(BaseModel):
ingestion_json: dict[str, Any] | None = None ingestion_json: Optional[dict[str, Any]] = None
confidence: float | None = None confidence: Optional[float] = None
class AccessibleVideoProgressItem(BaseModel): class AccessibleVideoProgressItem(BaseModel):
"""Progress tracking for accessible video rendering per language.""" """Progress tracking for accessible video rendering per language."""
status: Literal["pending", "rendering", "completed", "failed"] = "pending" status: Literal["pending", "rendering", "completed", "failed"] = "pending"
method: Literal["overlay", "pause_insert"] | None = None method: Optional[Literal["overlay", "pause_insert"]] = None
error_message: str | None = None error_message: Optional[str] = None
started_at: datetime | None = None started_at: Optional[datetime] = None
completed_at: datetime | None = None completed_at: Optional[datetime] = None
class Job(BaseModel): class Job(BaseModel):
id: str | None = Field(None, alias="_id") id: Optional[str] = Field(None, alias="_id")
client_id: str client_id: str
title: str title: str
source: Source source: Source
requested_outputs: RequestedOutputs requested_outputs: RequestedOutputs
status: JobStatus = JobStatus.CREATED status: JobStatus = JobStatus.CREATED
review: Review = Review() review: Review = Review()
outputs: dict[str, LangOutput] | None = None outputs: Optional[dict[str, LangOutput]] = None
accessible_video_progress: dict[str, AccessibleVideoProgressItem] | None = None accessible_video_progress: Optional[dict[str, AccessibleVideoProgressItem]] = None
ai: AISection | None = None ai: Optional[AISection] = None
error: dict[str, Any] | None = None error: Optional[dict[str, Any]] = None
failure: JobFailure | None = None # structured failure info; see failure.step for pipeline stage tts_rewrites: Optional[list[dict[str, Any]]] = None # Track auto-rewritten TTS cues
retry_count: int = 0 # total number of manual retries attempted project_id: Optional[str] = None # Platform project this job belongs to (Client → Project → Job)
tts_rewrites: list[dict[str, Any]] | None = None # Track auto-rewritten TTS cues brand_context: Optional[str] = None # Brand names present in the video for accurate product identification
project_id: str | None = None # Platform project this job belongs to (Client → Project → Job) cost_tracker_project_id: Optional[str] = None # External project ID for AI cost attribution
organization_id: str | None = None # org-tenant ID; backfilled by 2026-04-28-000003 migration created_at: Optional[datetime] = None
brief_id: str | None = None # JobBrief that originated this job (W-12) updated_at: Optional[datetime] = None
gcs_prefix: str | None = None # GCS path prefix; None = legacy flat {job_id}/ layout
initial_linguist_id: str | None = None
initial_reviewer_id: str | None = None
brand_context: str | None = None # Brand names present in the video for accurate product identification
cost_tracker_project_id: str | None = None # External project ID for AI cost attribution
deadline: datetime | None = None # job-level PM deadline (overdue if past and not completed)
language_qc: dict[str, LanguageQCState] = {} # per-language QC state, keyed by lang code
qc_assignments: list[QCAssignment] = [] # denormalized for linguist-queue queries
created_at: datetime | None = None
updated_at: datetime | None = None
class Config: class Config:
populate_by_name = True populate_by_name = True
@ -279,17 +181,15 @@ class Job(BaseModel):
class JobCreate(BaseModel): class JobCreate(BaseModel):
title: str title: str
source_is_english: bool = True # True = English source, False = other language (auto-detect) source_is_english: bool = True # True = English source, False = other language (auto-detect)
language_hint: str | None = None # Optional hint when source_is_english=False language_hint: Optional[str] = None # Optional hint when source_is_english=False
requested_outputs: RequestedOutputs requested_outputs: RequestedOutputs
brand_context: str | None = None # Comma-separated brand names present in the video (e.g. "Sellotape, Coca-Cola") brand_context: Optional[str] = None # Comma-separated brand names present in the video (e.g. "Sellotape, Coca-Cola")
source_has_ad: bool = False # Source video already contains professional audio descriptions
class JobUpdate(BaseModel): class JobUpdate(BaseModel):
title: str | None = None title: Optional[str] = None
status: JobStatus | None = None status: Optional[JobStatus] = None
review: Review | None = None review: Optional[Review] = None
outputs: dict[str, LangOutput] | None = None outputs: Optional[dict[str, LangOutput]] = None
ai: AISection | None = None ai: Optional[AISection] = None
error: dict[str, Any] | None = None error: Optional[dict[str, Any]] = None
deadline: datetime | None = None

Some files were not shown because too many files have changed in this diff.