Compare commits

No commits in common. "main" and "dev" have entirely different histories.
main ... dev

384 changed files with 5716 additions and 31139 deletions

@@ -1,25 +0,0 @@
# Source Documentation Archive — 2026-04-29
## What was archived
The original, non-canonical documentation files were backed up here before the canonical `docs/` structure was created.
## Files archived
| File | Migrated to |
|------|------------|
| `README.md` | Updated in place; canonical docs in `docs/` |
| `DEPLOYMENT.md` | `docs/project/runbook.md` + `docs/project/infrastructure.md` |
| `DEPLOYMENT_OPTIONS.md` | `docs/project/infrastructure.md` |
| `APACHE_DEPLOYMENT.md` | `docs/project/runbook.md` (Apache config section) |
## Rollback
To restore original files: copy from `original/` back to project root.
```
cp original/README.md ../../README.md
cp original/DEPLOYMENT.md ../../DEPLOYMENT.md
cp original/DEPLOYMENT_OPTIONS.md ../../DEPLOYMENT_OPTIONS.md
cp original/APACHE_DEPLOYMENT.md ../../APACHE_DEPLOYMENT.md
```

@@ -1,236 +0,0 @@
# Apache Frontend + Docker Backend Deployment Guide
## 🏗 Architecture Overview
**Frontend**: Built React app served by your existing Apache webserver
**Backend**: Docker containers running FastAPI + workers + database
```
Apache Webserver (Frontend)  →  Docker Backend Services
└── Built React App             ├── FastAPI API (:8000)
                                ├── Celery Workers
                                ├── Change Stream Service
                                ├── MongoDB
                                └── Redis
```
## 🚀 Deployment Steps
### 1. **Deploy Backend Services**
```bash
# 1. Create production environment file
cp .env.prod.example .env.prod
# Edit .env.prod with your production values
# 2. Start backend services only
docker-compose -f docker-compose.prod.yml up -d
# 3. Verify services are running
docker-compose -f docker-compose.prod.yml ps
```
**Running Services:**
- `accessible-video-api-prod` - FastAPI API (port 8000)
- `accessible-video-worker-prod` - Celery workers
- `accessible-video-mongo-prod` - MongoDB database
- `accessible-video-redis-prod` - Redis cache/queue
### 2. **Build and Deploy Frontend to Apache**
```bash
# 1. Configure frontend environment
cd frontend
cp .env.example .env.production.local
# Edit .env.production.local:
# VITE_API_URL=https://your-api-domain.com:8000
# VITE_SENTRY_DSN=your-sentry-dsn
# VITE_ENVIRONMENT=production
# 2. Build production frontend
npm run build
# 3. Deploy to Apache document root
sudo cp -r dist/* /var/www/html/your-app/
# OR
sudo rsync -av --delete dist/ /var/www/html/your-app/
```
### 3. **Configure Apache Virtual Host**
Create `/etc/apache2/sites-available/your-app.conf`:
```apache
<VirtualHost *:443>
ServerName your-domain.com
ServerAlias www.your-domain.com
DocumentRoot /var/www/html/your-app
# SSL Configuration
SSLEngine on
SSLCertificateFile /path/to/your/certificate.crt
SSLCertificateKeyFile /path/to/your/private.key
# Security Headers
Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff"
Header always set X-XSS-Protection "1; mode=block"
Header always set Referrer-Policy "strict-origin-when-cross-origin"
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
# Compression
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
</IfModule>
# Caching for static assets
<LocationMatch "\.(css|js|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$">
ExpiresActive On
ExpiresDefault "access plus 1 year"
Header set Cache-Control "public, max-age=31536000, immutable"
</LocationMatch>
# Don't cache HTML files
<LocationMatch "\.html$">
ExpiresActive On
ExpiresDefault "access plus 0 seconds"
Header set Cache-Control "no-cache, no-store, must-revalidate"
</LocationMatch>
# React Router support (handle client-side routing)
<Directory "/var/www/html/your-app">
Options -Indexes
AllowOverride All
Require all granted
# Fallback to index.html for client-side routing
FallbackResource /index.html
</Directory>
# Optional: Proxy API requests (alternative to CORS)
# ProxyPreserveHost On
# ProxyPass /api/ http://your-docker-host:8000/api/
# ProxyPassReverse /api/ http://your-docker-host:8000/api/
# Logs
ErrorLog ${APACHE_LOG_DIR}/your-app_error.log
CustomLog ${APACHE_LOG_DIR}/your-app_access.log combined
</VirtualHost>
# HTTP to HTTPS redirect
<VirtualHost *:80>
ServerName your-domain.com
ServerAlias www.your-domain.com
Redirect permanent / https://your-domain.com/
</VirtualHost>
```
Enable the site:
```bash
sudo a2ensite your-app.conf
sudo systemctl reload apache2
```
## ⚙️ Configuration Files Updated
### `docker-compose.prod.yml`
- ✅ Removed frontend and nginx services
- ✅ Added CORS_ORIGINS environment variable
- ✅ Backend services only (API, workers, database)
### `.env.prod.example`
- ✅ Production environment template
- ✅ CORS configuration for Apache frontend
- ✅ All required variables documented
## 🔧 CORS Configuration
Because the frontend and backend are served from different origins, configure CORS on the backend:
**In `.env.prod`:**
```bash
CORS_ORIGINS=https://your-domain.com,https://www.your-domain.com
```
**Backend automatically handles CORS** based on this environment variable.
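How the backend reads this variable is not shown here. As a rough sketch, assuming the value is a comma-separated list as in the example above, the parsing might look like the following. `parse_cors_origins` and `cors_origins_from_env` are hypothetical names for illustration, not functions from this codebase.

```python
import os


def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated CORS_ORIGINS value into a clean list of origins."""
    origins: list[str] = []
    for item in raw.split(","):
        # Browsers send the Origin header without a trailing slash.
        origin = item.strip().rstrip("/")
        if origin and origin not in origins:
            origins.append(origin)
    return origins


def cors_origins_from_env(env=os.environ) -> list[str]:
    # Hypothetical helper: an unset variable means no cross-origin access.
    return parse_cors_origins(env.get("CORS_ORIGINS", ""))


# With FastAPI installed, the list would typically be handed to the CORS
# middleware (assumed wiring, not confirmed by this document):
#   from fastapi.middleware.cors import CORSMiddleware
#   app.add_middleware(CORSMiddleware, allow_origins=cors_origins_from_env(),
#                      allow_credentials=True, allow_methods=["*"],
#                      allow_headers=["*"])
```

Listing explicit origins rather than `*` matters here because credentialed requests (cookies) are rejected by browsers when the allowed origin is a wildcard.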
## 📋 Deployment Checklist
### Backend Services
- [ ] Copy `.env.prod.example` to `.env.prod`
- [ ] Update all environment variables in `.env.prod`
- [ ] Run `docker-compose -f docker-compose.prod.yml up -d`
- [ ] Verify API accessible at `http://your-docker-host:8000/docs`
- [ ] Check logs: `docker-compose -f docker-compose.prod.yml logs -f`
### Frontend Deployment
- [ ] Update `frontend/.env.production.local` with API URL
- [ ] Run `npm run build` in frontend directory
- [ ] Copy `dist/*` to Apache document root
- [ ] Configure Apache virtual host
- [ ] Enable site and reload Apache
- [ ] Test frontend loads and connects to API
### Security & Performance
- [ ] SSL certificate configured
- [ ] Security headers enabled
- [ ] Gzip compression enabled
- [ ] Static file caching configured
- [ ] CORS origins properly set
- [ ] Firewall rules: on the Docker host, expose only port 8000 (API); the Apache host still needs ports 80 and 443
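The "Security headers enabled" item can be checked mechanically. The sketch below compares a response's headers against the values set in the virtual host above; `missing_security_headers` is a hypothetical helper, and the expected values are copied from that Apache config.

```python
# Expected values, taken from the "Security Headers" section of the vhost.
REQUIRED_HEADERS = {
    "X-Frame-Options": "SAMEORIGIN",
    "X-Content-Type-Options": "nosniff",
    "Referrer-Policy": "strict-origin-when-cross-origin",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
}


def missing_security_headers(response_headers) -> list[str]:
    """Return the required header names that are absent or have another value."""
    # HTTP header names are case-insensitive, so compare in lower case.
    actual = {name.lower(): value.strip() for name, value in response_headers.items()}
    return [
        name
        for name, expected in REQUIRED_HEADERS.items()
        if actual.get(name.lower()) != expected
    ]
```

In practice the headers could come from `urllib.request.urlopen("https://your-domain.com/").headers`, which supports `.items()`; an empty result means every header matches.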
## 🔍 Troubleshooting
### Common Issues
**CORS Errors:**
- Verify `CORS_ORIGINS` in `.env.prod` matches your domain
- Check browser dev tools for exact error
**API Connection Failed:**
- Verify `VITE_API_URL` in frontend build
- Check backend API is accessible from frontend server
- Ensure port 8000 is open and reachable
**React Router 404s:**
- Verify `FallbackResource /index.html` in Apache config
- If the fallback is configured in `.htaccess` rather than in the virtual host, ensure `AllowOverride All` is set
**File Upload Issues:**
- Check Apache `LimitRequestBody` directive
- Verify backend can write to GCS bucket
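For the "API Connection Failed" checks above, a quick way to confirm that port 8000 is reachable from the frontend server is a plain TCP connection attempt. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `port_is_open("your-docker-host", 8000)` run on the Apache host distinguishes a firewall or network problem from a CORS or `VITE_API_URL` problem.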
### Monitoring Commands
```bash
# Backend services status
docker-compose -f docker-compose.prod.yml ps
# View logs
docker-compose -f docker-compose.prod.yml logs -f api
docker-compose -f docker-compose.prod.yml logs -f worker
# Apache status
sudo systemctl status apache2
sudo tail -f /var/log/apache2/your-app_error.log
```
## 🎯 Benefits of This Setup
- **Separation of Concerns** - Frontend and backend independently deployable
- **Existing Infrastructure** - Uses your current Apache setup
- **Scalability** - Backend can be moved to different hosts easily
- **Caching** - Apache handles static file caching efficiently
- **SSL Termination** - Apache handles HTTPS for frontend
- **Monitoring** - Separate logs and monitoring for each tier
Your backend services will run in Docker containers while the frontend integrates seamlessly with your existing Apache web server infrastructure.

@@ -1,168 +0,0 @@
# Deployment Options for Video Accessibility Platform
## 🏗 Current Docker Setup
Your `docker-compose.yml` serves **both frontend and backend** in **development mode**:
- **Frontend**: Vite dev server on port 5173 (hot reload)
- **Backend**: FastAPI on port 8000 (auto-reload)
- **Database**: MongoDB + Redis
- **Workers**: Celery + Change Stream service
## 🚀 Production Deployment Options
### 1. **All-in-Docker Production** ✅ Recommended
**What it does:**
- Frontend: Built React app served by Nginx (port 80)
- Backend: Production FastAPI (port 8000)
- Single `docker-compose up` deployment
**Usage:**
```bash
# Production deployment
docker-compose -f docker-compose.prod.yml up -d
# Access:
# Frontend: http://localhost:80
# Backend API: http://localhost:8000
```
**Benefits:**
- ✅ Single command deployment
- ✅ Optimized frontend build
- ✅ Production-ready configuration
- ✅ Built-in health checks
- ✅ Nginx caching and compression
### 2. **Single Domain with Nginx Proxy** ✅ Best UX
**What it does:**
- Everything served from one domain (port 80)
- `/api/*` routes to backend
- `/*` routes to frontend
- WebSocket support included
**Usage:**
```bash
# Uses nginx/nginx.conf for routing
docker-compose -f docker-compose.prod.yml up nginx
# Access everything at: http://localhost
```
**Benefits:**
- ✅ No CORS issues
- ✅ Single domain simplicity
- ✅ Better caching control
- ✅ Rate limiting built-in
- ✅ SSL termination ready
### 3. **Cloud-Native (Google Cloud)** 🌟 Enterprise
**Architecture:**
```
Frontend (Cloud Storage + CDN) → API (Cloud Run) → Database (MongoDB Atlas)
                                 Workers (Cloud Run)
```
**Components:**
- **Frontend**: Build + deploy to Cloud Storage, serve via Cloud CDN
- **Backend**: Deploy to Cloud Run (auto-scaling)
- **Workers**: Separate Cloud Run service for Celery
- **Database**: MongoDB Atlas (managed)
- **Files**: Google Cloud Storage (already integrated)
**Benefits:**
- ✅ Auto-scaling
- ✅ Global CDN
- ✅ Managed services
- ✅ Pay-per-use
- ✅ High availability
## 📊 Comparison Matrix
| Option | Complexity | Cost | Scalability | Maintenance |
|--------|------------|------|-------------|-------------|
| **Dev Docker** | Low | Very Low | Limited | Manual |
| **Prod Docker** | Low | Low | Manual | Medium |
| **Nginx Proxy** | Medium | Low | Manual | Medium |
| **Cloud Native** | High | Variable | Automatic | Low |
## 🚀 Quick Migration Guide
### From Development → Production Docker
1. **Update environment variables:**
```bash
cp .env.example .env.prod
# Edit .env.prod with production values
```
2. **Deploy:**
```bash
docker-compose -f docker-compose.prod.yml up -d
```
3. **Verify:**
```bash
# Frontend (optimized build)
curl http://localhost:80
# Backend API
curl http://localhost:8000/health
```
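Containers can take a few seconds to start, so a single `curl` may fail even when the deployment is fine. The sketch below polls the `/health` endpoint used above until it returns 200. `http_status` and `wait_until_healthy` are hypothetical helpers, and the retry count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(url, attempts=10, delay=2.0, fetch=http_status, sleep=time.sleep):
    """Poll url until it returns 200; give up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

A deploy script could call `wait_until_healthy("http://localhost:8000/health")` and exit non-zero when it returns `False`. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a network.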
### From Docker → Cloud Native
1. **Build frontend:**
```bash
cd frontend && npm run build
gsutil -m rsync -r -d dist/ gs://your-bucket/
```
2. **Deploy backend:**
```bash
gcloud run deploy video-api --source=./backend --region=us-central1
```
3. **Deploy workers:**
```bash
gcloud run deploy video-workers --source=./backend --region=us-central1
```
## 🔧 Configuration Files Created
### `docker-compose.prod.yml`
- Production-ready Docker setup
- Nginx serving frontend
- Optimized environment variables
- Health checks included
### `nginx/nginx.conf`
- Single-domain routing configuration
- API proxy with rate limiting
- WebSocket support
- Static file caching
- Security headers
## 🎯 Recommendations by Use Case
### **Small Team / MVP**
→ Use **Production Docker** (`docker-compose.prod.yml`)
### **Growing Business**
→ Use **Nginx Proxy** setup for better performance
### **Enterprise / Scale**
→ Go **Cloud Native** with Google Cloud Run + CDN
## 🔍 Current Status
- **Development**: Already working with `docker-compose up`
- **Production Docker**: Ready with `docker-compose.prod.yml`
- **Nginx Proxy**: Configured and ready to deploy
- ⚠️ **Cloud Native**: Requires GCP setup and configuration
Your current Docker setup is **development-optimized**. For production, use the new `docker-compose.prod.yml` which properly builds and serves the React app through Nginx while keeping the backend API separate but coordinated.

@@ -1,384 +0,0 @@
# Accessible Video Processing Platform
A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. Features a complete workflow from video upload to final delivery with quality control processes.
## ✅ Current Status: **Production-Ready** (85% Complete)
**Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend)
## 🚀 Key Features Implemented
### Core Functionality ✅
- **AI-Powered Processing**: Complete Gemini 2.5 Pro integration for intelligent caption and audio description generation
- **Multi-Language Pipeline**: Google Translate + cultural transcreation with 50+ language support
- **Quality Control Workflow**: Full reviewer approval/rejection system with VTT editing capabilities
- **Audio Description TTS**: Google Cloud TTS and ElevenLabs integration with audio synthesis
- **Real-time Updates**: WebSocket-powered job status tracking and notifications
- **Advanced Video Player**: Multi-language caption support with timeline navigation
- **Role-Based Access Control**: Complete CLIENT/REVIEWER/ADMIN role system
### Security & Infrastructure ✅
- **JWT Authentication**: Secure access/refresh token system with HttpOnly cookies
- **Audit Logging**: Comprehensive audit trail for all reviewer actions
- **Signed URLs**: Secure Google Cloud Storage file access (24h expiry)
- **Input Validation**: Complete request validation and sanitization
- **HTTPS/CORS**: Production-ready security configuration
### User Experience ✅
- **Responsive Design**: Mobile-first Tailwind CSS implementation
- **Real-time Feedback**: Live job progress tracking and notifications
- **Advanced File Management**: Drag-and-drop uploads with progress indicators
- **VTT Editor**: Inline caption editing with live preview
- **Download Portal**: Secure asset delivery with organized file structure
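The caption workflow depends on well-formed WebVTT. As an illustration of the kind of check a VTT editor might run before saving, the sketch below validates the `WEBVTT` header and each cue timing line. `find_vtt_errors` is a hypothetical function and covers only a small part of the WebVTT format; it is not the project's validator.

```python
import re

# One WebVTT timestamp: optional hours, then mm:ss.ttt
_TS = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
# A cue timing line: start --> end, optionally followed by cue settings.
_TIMING = re.compile(rf"^{_TS}[ \t]+-->[ \t]+{_TS}(?:[ \t]+.*)?$")


def _seconds(hours, minutes, seconds, millis) -> float:
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def find_vtt_errors(text: str) -> list[str]:
    """Return human-readable problems found in a WebVTT document."""
    lines = text.lstrip("\ufeff").splitlines()  # tolerate a leading byte order mark
    errors = []
    if not lines or not re.match(r"^WEBVTT(?:[ \t].*)?$", lines[0]):
        errors.append("line 1: file must start with 'WEBVTT'")
    for number, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue  # only timing lines are checked in this sketch
        match = _TIMING.match(line.strip())
        if not match:
            errors.append(f"line {number}: malformed cue timing")
            continue
        groups = match.groups()
        if _seconds(*groups[4:]) <= _seconds(*groups[:4]):
            errors.append(f"line {number}: cue end must be after start")
    return errors
```

Returning a list of messages with line numbers, rather than raising on the first problem, suits an inline editor that wants to highlight every bad cue at once.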
## 🛠 Tech Stack
### Backend (FastAPI + Python 3.11)
- **FastAPI 0.115.0** - Modern async web framework with OpenAPI documentation
- **Celery 5.3.4** - Distributed task queue with Redis broker
- **MongoDB 7.0** - Document database with replica set support
- **Redis 7.2** - Caching and message queuing
- **Google Cloud Platform** - Storage, AI services, Secret Manager, TTS
- **Pydantic 2.5** - Data validation and serialization
- **OpenTelemetry** - Observability and monitoring
- **Sentry** - Error tracking and performance monitoring
### Frontend (React 19 + TypeScript)
- **React 19.1.1** - Modern UI framework with latest features
- **Vite 7.1.2** - Lightning-fast build tool and dev server
- **TypeScript 5.8** - Full type safety throughout application
- **TanStack Query 5.85** - Advanced server state management with caching
- **React Router 7.8** - Client-side routing with protected routes
- **Tailwind CSS 4.1** - Utility-first CSS framework
- **Zustand 5.0** - Lightweight client state management
- **React Hook Form + Zod** - Form handling with schema validation
## 🏗 Architecture Overview
### Complete Job Processing Pipeline ✅
```
Upload  → Ingestion  → AI Processing  → QC Review → Translation → TTS      → Final Review → Delivery
↓         ↓            ↓                ↓           ↓             ↓          ↓              ↓
GCS       Gemini 2.5   VTT Generation   Human       Google        Text-to-   Reviewer       Email +
Storage   Pro          + Validation     Review      Translate     Speech     Approval       Downloads
```
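The stage order in the diagram can be sketched as a small state machine. The `Stage` enum, `next_stage`, and in particular the rejection targets are assumptions for illustration; the project's actual job states and rejection behavior are not shown in this document.

```python
from enum import Enum


class Stage(str, Enum):
    UPLOAD = "upload"
    INGESTION = "ingestion"
    AI_PROCESSING = "ai_processing"
    QC_REVIEW = "qc_review"
    TRANSLATION = "translation"
    TTS = "tts"
    FINAL_REVIEW = "final_review"
    DELIVERY = "delivery"


_ORDER = list(Stage)  # Enum iteration preserves definition order

# Assumption: a rejected review sends the job back to an earlier stage.
_ON_REJECT = {
    Stage.QC_REVIEW: Stage.AI_PROCESSING,
    Stage.FINAL_REVIEW: Stage.TRANSLATION,
}


def next_stage(current: Stage, approved: bool = True) -> Stage:
    """Return the stage that follows `current`, or the rework stage on rejection."""
    if not approved:
        if current not in _ON_REJECT:
            raise ValueError(f"{current.value} is not a review stage")
        return _ON_REJECT[current]
    index = _ORDER.index(current)
    if index == len(_ORDER) - 1:
        raise ValueError("delivery is the final stage")
    return _ORDER[index + 1]
```

Keeping the transitions in one function means a worker or API handler cannot move a job to an arbitrary stage by accident.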
### System Architecture
- **Monorepo Structure**: `/backend`, `/frontend`, `/infra` with clear separation
- **Microservices Ready**: Modular FastAPI services with proper dependency injection
- **Event-Driven**: WebSocket real-time updates with connection management
- **Scalable Workers**: Celery task queue with auto-retry and error recovery
- **Secure by Design**: RBAC, signed URLs, audit logging, input validation
## 🚀 Getting Started
### Prerequisites
- **Python 3.11+** (backend development)
- **Node.js 18+** (frontend development)
- **Docker & Docker Compose** (required for local development)
- **Google Cloud Project** with APIs enabled (for video processing)
### 🐳 Local Development with Docker (Recommended)
This is the recommended approach for local development. Backend services run in Docker containers while the frontend runs via Vite dev server for fast hot-reload.
#### Initial Setup
```bash
# 1. Clone the repository
git clone <repository>
cd video_accessibility
# 2. Copy and configure environment files
cp .env.prod.example .env.local
# Edit .env.local with your API keys and settings
# 3. Set up frontend environment
cp frontend/.env.example frontend/.env.local
# The defaults should work for local development
# 4. Ensure GCP credentials are in place
# Copy your GCP service account JSON to: ./secrets/gcp-credentials.json
```
#### Starting the Development Environment
**Step 1: Start Backend Services (Docker)**
```bash
# Start API, Worker, MongoDB, and Redis in Docker
./scripts/run-local.sh
# Services will be available at:
# - API: http://localhost:8003
# - API Docs: http://localhost:8003/docs
# - MongoDB: mongodb://localhost:27017
# - Redis: redis://localhost:6379
```
**Step 2: Start Frontend (Vite Dev Server)**
```bash
# In a separate terminal
cd frontend
npm install # First time only
npm run dev
# Frontend will be available at:
# - Application: http://localhost:6001/video-accessibility
```
#### Useful Commands
```bash
# View logs
docker compose logs -f api # API logs
docker compose logs -f worker # Worker logs
docker compose logs -f # All logs
# Restart a service
docker compose restart api
docker compose restart worker
# Rebuild and restart (after code changes)
./scripts/run-local.sh --rebuild
# Stop all services
./scripts/run-local.sh --stop
# or
docker compose down
```
#### Test User Credentials (Local Development Only)
For testing different user roles locally:
```
Admin: admin@example.com / admin
Production: production@example.com / production
Reviewer: reviewer@example.com / reviewer
Client: client@example.com / client123
```
**Note**: These test users are only for local development. Production uses Microsoft authentication.
### Alternative: Native Development (Without Docker)
For development without Docker, you'll need to run each service manually:
```bash
# Terminal 1: MongoDB
mongod --dbpath ./data/db
# Terminal 2: Redis
redis-server
# Terminal 3: Backend API
cd backend
poetry install
poetry run uvicorn app.main:app --reload --port 8000
# Terminal 4: Celery Worker
cd backend
poetry run celery -A app.tasks worker --loglevel=info
# Terminal 5: Frontend
cd frontend
npm install
npm run dev
```
**Note**: The Docker approach is strongly recommended as it ensures consistency and simplifies setup.
### Testing & Quality
```bash
# Backend tests + linting
cd backend
poetry run pytest
poetry run ruff check .
poetry run mypy .
# Frontend tests + linting
cd frontend
npm run test
npm run test:e2e
npm run lint
npm run type-check
```
## 📁 Project Structure
```
video_accessibility/ # Root monorepo
├── backend/ # FastAPI Python backend (12,198 LOC)
│ ├── app/
│ │ ├── api/v1/ # REST API endpoints
│ │ │ ├── auth.py # JWT authentication
│ │ │ ├── jobs.py # Job CRUD & workflow
│ │ │ ├── admin.py # Admin operations
│ │ │ └── files.py # File management
│ │ ├── core/ # Core configuration
│ │ ├── models/ # Database models
│ │ ├── schemas/ # Pydantic request/response schemas
│ │ ├── services/ # External service integrations
│ │ │ ├── gemini.py # AI processing
│ │ │ ├── gcs.py # Google Cloud Storage
│ │ │ ├── translation.py # Multi-language support
│ │ │ └── tts.py # Text-to-speech
│ │ ├── tasks/ # Celery background workers
│ │ ├── middleware/ # Request processing
│ │ └── telemetry/ # Observability
│ ├── tests/ # Comprehensive test suite
│ └── Dockerfile # Container configuration
├── frontend/ # React TypeScript SPA (8,273 LOC)
│ ├── src/
│ │ ├── routes/ # Page components
│ │ │ ├── auth/ # Login system
│ │ │ ├── jobs/ # Job management
│ │ │ ├── qc/ # Quality control
│ │ │ └── admin/ # Admin interface
│ │ ├── components/ # Reusable UI components
│ │ │ ├── VideoWithCaptions.tsx # Advanced video player
│ │ │ ├── VttEditor.tsx # Caption editing
│ │ │ └── UploadDropzone.tsx # File upload
│ │ ├── lib/ # Utilities and API client
│ │ ├── hooks/ # Custom React hooks
│ │ └── types/ # TypeScript definitions
│ ├── tests/ # Unit + E2E tests
│ ├── .env.local # Local development config
│ └── Dockerfile # Container configuration
├── scripts/
│ ├── run-local.sh # Local development startup
│ ├── deploy.sh # Production deployment
│ ├── full-deploy.sh # Full production rebuild
│ └── build-frontend.sh # Frontend build script
├── docker-compose.yml # Base Docker configuration
├── docker-compose.local.yml # Local development overrides
├── docker-compose.prod.yml # Production overrides
├── .env.local # Local environment variables
├── .env.production # Production environment variables
├── CLAUDE.md # Development guidelines
└── video_accessibility_development_plan.txt # Complete specification
```
## ⚙️ Configuration
### Environment Variables
**Backend** (`backend/.env`):
```bash
# Database
MONGODB_URL=mongodb://admin:password@localhost:27017/accessible_video
REDIS_URL=redis://localhost:6379/0
# Authentication
JWT_SECRET_KEY=your-jwt-secret
JWT_REFRESH_SECRET_KEY=your-refresh-secret
# AI Services
GEMINI_API_KEY=your-gemini-key
ELEVENLABS_API_KEY=your-elevenlabs-key
# Google Cloud
GCS_BUCKET_NAME=your-bucket-name
GOOGLE_CLOUD_PROJECT=your-project-id
# Email
SENDGRID_API_KEY=your-sendgrid-key
# Monitoring
SENTRY_DSN=your-sentry-dsn
```
**Frontend** (`frontend/.env`):
```bash
VITE_API_URL=http://localhost:8000
VITE_SENTRY_DSN=your-sentry-dsn
VITE_ENVIRONMENT=development
```
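Because the templates above ship with `your-...` placeholders, it can help to fail fast at startup when a value was never filled in. This is a sketch only: `missing_settings` is a hypothetical helper, and which variables count as required is an assumption.

```python
import os

# Assumption: these are mandatory; ElevenLabs, SendGrid and Sentry keys are
# treated as optional in this sketch.
REQUIRED_BACKEND_VARS = (
    "MONGODB_URL",
    "REDIS_URL",
    "JWT_SECRET_KEY",
    "JWT_REFRESH_SECRET_KEY",
    "GEMINI_API_KEY",
    "GCS_BUCKET_NAME",
    "GOOGLE_CLOUD_PROJECT",
)


def missing_settings(env=os.environ, required=REQUIRED_BACKEND_VARS) -> list[str]:
    """Return required variable names that are unset, empty, or still placeholders."""
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value.startswith("your-"):
            missing.append(name)
    return missing
```

A startup hook could call `missing_settings()` and abort with the returned names, which is easier to diagnose than a connection error deep inside a worker.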
### Google Cloud Setup
1. **Create GCP Project** with billing enabled
2. **Enable APIs**:
- Cloud Storage API
- Cloud Translation API
- Cloud Text-to-Speech API
- Vertex AI API (for Gemini)
- Secret Manager API
3. **Create Service Account** with roles:
- Storage Admin
- AI Platform Admin
- Secret Manager Admin
4. **Download JSON key** and set `GOOGLE_APPLICATION_CREDENTIALS`
## 🚢 Deployment Options
### Production Architecture (Google Cloud)
- **Frontend**: Cloud Storage + Cloud CDN (static hosting)
- **Backend API**: Cloud Run (serverless, auto-scaling)
- **Workers**: Cloud Run (Celery with Redis)
- **Database**: MongoDB Atlas (managed)
- **Queue**: Cloud Memorystore (Redis)
- **Storage**: Google Cloud Storage
- **Monitoring**: Cloud Monitoring + Sentry
### Docker Production
```bash
# Build production images
docker-compose -f docker-compose.prod.yml up -d
```
## 🔒 Security Features
### Implemented Security ✅
- **JWT Authentication**: Access (15min) + refresh (7 days) token rotation
- **RBAC System**: CLIENT/REVIEWER/ADMIN roles with endpoint protection
- **Secure Storage**: HttpOnly cookies for refresh tokens
- **File Security**: Signed URLs with 24h expiry, no client access to raw files
- **Input Validation**: Comprehensive Pydantic validation on all endpoints
- **Audit Logging**: Complete trail of all reviewer actions and system events
- **CORS Protection**: Configured for production domains
- **Rate Limiting**: Request throttling and validation middleware
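To make the access/refresh lifetimes above concrete, here is a minimal HS256 token sketch using only the standard library. It is not the project's implementation: the JWT library the backend uses is not shown here, the claim names (`sub`, `role`) are assumptions, and production code should rely on a maintained JWT library instead of hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time

ACCESS_TTL = 15 * 60             # 15 minutes, as listed above
REFRESH_TTL = 7 * 24 * 60 * 60   # 7 days


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(subject: str, role: str, secret: str, ttl: int, now=None) -> str:
    """Create a signed token that expires `ttl` seconds after `now`."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role, "iat": now, "exp": now + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str, now=None) -> dict:
    """Return the payload if the signature is valid and the token is not expired."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(
        secret.encode("utf-8"), f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload
```

The short access lifetime limits the damage of a leaked token, while the longer-lived refresh token stays in an HttpOnly cookie where page scripts cannot read it.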
## 🔧 API Documentation
### Key Endpoints Implemented
```
POST /api/v1/auth/login # Authentication
POST /api/v1/jobs # Create job with file upload
GET /api/v1/jobs # List jobs (filtered by role)
GET /api/v1/jobs/{id} # Job details with real-time status
POST /api/v1/jobs/{id}/actions/* # Workflow actions (approve/reject/complete)
GET /api/v1/jobs/{id}/vtt # VTT content retrieval
PATCH /api/v1/jobs/{id}/vtt # VTT editing and updates
GET /api/v1/jobs/{id}/downloads # Signed download URLs
WS /api/v1/ws/jobs/{id} # Real-time job status updates
```
**OpenAPI Documentation**: http://localhost:8000/docs (native setup) or http://localhost:8003/docs (Docker setup via `./scripts/run-local.sh`)
## 🎯 Development Status
### ✅ Completed (Production Ready)
- **User Management**: Full authentication, RBAC, password management
- **Job Pipeline**: Complete video processing workflow with state machine
- **Quality Control**: VTT editor, approval workflows, reviewer dashboards
- **Real-time Features**: WebSocket updates, live notifications
- **Multi-language**: Translation pipeline with cultural transcreation
- **File Management**: Secure uploads, downloads, asset validation
- **Admin Features**: User management, system monitoring, audit logs
### ⚠️ Needs Attention (Minor)
- **Integration Tests**: Framework exists but needs completion
- **Email Templates**: Service implemented, templates may need customization
- **Performance Testing**: No load testing implemented yet
- **Documentation**: API docs complete, user guides could be enhanced
### 🎯 Recommended Next Steps
1. **Complete integration test suite** for end-to-end validation
2. **Performance testing** with realistic video processing loads
3. **Production deployment** configuration and CI/CD pipeline
4. **User documentation** and training materials
5. **Monitoring dashboards** for production operations
## 📚 Development Resources
- **Complete Specification**: `video_accessibility_development_plan.txt`
- **Development Guidelines**: `CLAUDE.md`
- **API Documentation**: http://localhost:8000/docs when running natively, http://localhost:8003/docs when running via `./scripts/run-local.sh`
- **Test Coverage Reports**: `backend/htmlcov/` (after running tests)

@@ -1,94 +0,0 @@
{
"permissions": {
"allow": [
"WebSearch",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && ruff check app/services/elevenlabs_voices.py app/services/tts.py app/api/v1/routes_tts.py app/models/job.py app/tasks/tts_synthesis.py app/core/config.py 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && python -m ruff check app/services/elevenlabs_voices.py app/services/tts.py app/api/v1/routes_tts.py app/models/job.py app/tasks/tts_synthesis.py app/core/config.py 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && pip3 show ruff 2>&1 | head -5; which pip3 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/frontend && npm run type-check 2>&1 | tail -20)",
"Bash(node_modules/.bin/tsc --noEmit 2>&1 | tail -20)",
"Bash(./node_modules/.bin/tsc --noEmit 2>&1 | tail -30)",
"Bash(npm run type-check 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/frontend && npm run type-check 2>&1)",
"Bash(npm run lint 2>&1)",
"WebFetch(domain:dcmp.org)",
"WebFetch(domain:www.w3.org)",
"WebFetch(domain:partnerhelp.netflixstudios.com)",
"WebFetch(domain:m.media-amazon.com)",
"WebFetch(domain:www.acb.org)",
"Bash(./node_modules/.bin/tsc --noEmit)",
"Bash(node_modules/.bin/tsc --noEmit)",
"Bash(pandoc --version)",
"WebFetch(domain:ai-sandbox.oliver.solutions)",
"Bash(gcloud run:*)",
"Bash(gcloud logging:*)",
"Bash(ssh optical:*)",
"Bash(/Volumes/SSD/Projects/Oliver/video-accessibility/backend/.venv/bin/python3.11 -c \"import sys; sys.path.insert\\(0, '.'\\); from app.models.user import UserRole; print\\([r.value for r in UserRole]\\)\")",
"Bash(npm list *)",
"Bash(brew list *)",
"Bash(npx --yes puppeteer --version)",
"Bash(node md_to_pdf.js)",
"Bash(npm root *)",
"Bash(node *)",
"Bash(ssh optical-web-1 *)",
"Bash(git *)",
"WebFetch(domain:docs.anthropic.com)",
"Bash(poetry lock *)",
"Bash(pip show *)",
"Read(//Users/ai_leed/.local/bin/**)",
"Read(//opt/homebrew/bin/**)",
"Bash(pip3 install *)",
"Bash(poetry --version)",
"Bash(docker run *)",
"Read(//Users/ai_leed/.docker/run/**)",
"Bash(docker context *)",
"Bash(DOCKER_HOST=unix:///var/run/docker.sock docker run --rm -v \"$\\(pwd\\):/app\" -w /app python:3.11-slim bash -c \"pip install poetry==1.8.2 -q && poetry lock --no-update\")",
"Bash(brew install *)",
"Bash(npm run *)",
"Bash(scp /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/models/audit_log.py optical:/tmp/audit_log.py)",
"Bash(scp *)",
"Bash(kill %1)",
"Bash(ssh optical-dev *)",
"Skill(fullstack-dev-skills:security-reviewer)",
"Bash(chmod +x *)",
"Bash(gcloud auth *)",
"Bash(gcloud config *)",
"Bash(gcloud artifacts *)",
"Bash(sed -n '190,200p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '1914,1922p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2048,2062p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2490,2502p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2628,2638p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(gcloud builds submit *)",
"Bash(gcloud builds describe 79802b34-e17b-4446-b01d-68d99d569262 *)",
"Bash(gcloud compute instances list *)",
"Bash(gcloud compute networks vpc-access connectors list *)",
"Bash(gcloud builds *)",
"Bash(gcloud projects get-iam-policy optical-414516 *)",
"Bash(gcloud projects *)",
"Bash(npm audit *)",
"Skill(codebase-audit-suite:ln-622-build-auditor)",
"Skill(codebase-audit-suite:ln-624-code-quality-auditor)",
"Skill(codebase-audit-suite:ln-625-dependencies-auditor)",
"Skill(codebase-audit-suite:ln-626-dead-code-auditor)",
"Bash(/opt/homebrew/bin/ruff check *)",
"Bash(npm test *)",
"Bash(sed -n '35,42p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/test/utils.tsx)",
"Bash(sed -n '55,90p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/tests/helpers/auth.ts)",
"Bash(sed -n '48,60p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/components/Layout/Sidebar.tsx)",
"Bash(sed -n '152,170p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/components/Layout/Sidebar.tsx)",
"Bash(poetry env *)",
"Bash(poetry install *)",
"Bash(poetry run *)",
"Bash(docker info *)",
"Bash(sed -n '1,30p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/services/gcs.py)",
"Bash(sed -n '155,165p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/services/gcs.py)",
"Bash(gcloud secrets *)",
"Bash(openssl rand *)",
"Bash(ssh *)",
"Skill(commit-commands:commit-push-pr)",
"Bash(obsidian read *)",
"Bash(obsidian search *)"
]
}
}

@@ -10,8 +10,6 @@ REDIS_URL=redis://redis:6379/0
 # JWT Authentication
 JWT_SECRET_KEY=your-production-jwt-secret-key-min-32-chars
 JWT_REFRESH_SECRET_KEY=your-production-refresh-secret-key-min-32-chars
-# Required: admin account created on first boot. Unset = admin not seeded.
-DEFAULT_ADMIN_PASSWORD=your-secure-admin-password
 # AI Services
 GEMINI_API_KEY=your-gemini-api-key
@@ -21,11 +19,8 @@ ELEVENLABS_API_KEY=your-elevenlabs-api-key
 GCS_BUCKET_NAME=your-production-bucket-name
 GOOGLE_CLOUD_PROJECT=your-gcp-project-id
-# Email Service (Mailgun)
-SENDGRID_API_KEY=
-MAILGUN_API_KEY=your-mailgun-api-key
-MAILGUN_DOMAIN=mg.oliver.solutions
-MAILGUN_FROM=noreply@mg.oliver.solutions
+# Email Service
+SENDGRID_API_KEY=your-sendgrid-api-key
 # Monitoring
 SENTRY_DSN=your-sentry-dsn-url

@@ -9,18 +9,18 @@
 # App Configuration
 # -----------------------------------------------------------------------------
 APP_ENV=prod
-API_BASE_URL=https://optical-dev.oliver.solutions/video-accessibility
+API_BASE_URL=https://ai-sandbox.oliver.solutions/video-accessibility-back
 # -----------------------------------------------------------------------------
 # Authentication & Security
 # -----------------------------------------------------------------------------
 # IMPORTANT: Generate a secure random secret for JWT_SECRET
 # Example: openssl rand -hex 32
-JWT_SECRET=<redacted>
+JWT_SECRET=CHANGE_ME_TO_SECURE_RANDOM_64_CHAR_STRING
 JWT_ALG=HS256
 JWT_ACCESS_TTL_MIN=240
 JWT_REFRESH_TTL_DAYS=7
-COOKIE_DOMAIN=optical-dev.oliver.solutions
+COOKIE_DOMAIN=ai-sandbox.oliver.solutions
 COOKIE_SECURE=true
 COOKIE_SAMESITE=Lax
@@ -63,31 +63,29 @@ TRANSLATE_API_KEY=
 ELEVENLABS_API_KEY=<redacted>
 # -----------------------------------------------------------------------------
-# Email Configuration (Mailgun)
+# Email Configuration (SendGrid)
 # -----------------------------------------------------------------------------
+# IMPORTANT: Get SendGrid API key from https://app.sendgrid.com/settings/api_keys
 SENDGRID_API_KEY=
-MAILGUN_API_KEY=<redacted>
-MAILGUN_DOMAIN=mg.oliver.solutions
-MAILGUN_FROM=noreply@mg.oliver.solutions
-# Email sender address
-EMAIL_FROM=noreply@mg.oliver.solutions
+# Email sender address (must be verified in SendGrid)
+EMAIL_FROM=noreply@ai-sandbox.oliver.solutions
 # Client-facing URL (used in emails)
-CLIENT_BASE_URL=https://optical-dev.oliver.solutions/video-accessibility
+CLIENT_BASE_URL=https://ai-sandbox.oliver.solutions/video-accessibility
 # -----------------------------------------------------------------------------
 # Microsoft Authentication (Azure AD)
 # -----------------------------------------------------------------------------
 AZURE_CLIENT_ID=9079054c-9620-4757-a256-23413042f1ef
 AZURE_AUTHORITY=https://login.microsoftonline.com/e519c2e6-bc6d-4fdf-8d9c-923c2f002385
AZURE_REDIRECT_URI=https://optical-dev.oliver.solutions/video-accessibility/ AZURE_REDIRECT_URI=https://ai-sandbox.oliver.solutions/video-accessibility/
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# CORS Configuration # CORS Configuration
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# Comma-separated list of allowed origins # Comma-separated list of allowed origins
CORS_ORIGINS=https://optical-dev.oliver.solutions CORS_ORIGINS=https://ai-sandbox.oliver.solutions
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# Observability & Monitoring (Optional) # Observability & Monitoring (Optional)
@ -118,9 +116,6 @@ OTEL_EXPORTER_OTLP_ENDPOINT=
WHISPER_SERVICE_URL=https://whisper-http-service-bcb6ipdqka-uc.a.run.app WHISPER_SERVICE_URL=https://whisper-http-service-bcb6ipdqka-uc.a.run.app
FFMPEG_SERVICE_URL=https://ffmpeg-http-service-bcb6ipdqka-uc.a.run.app FFMPEG_SERVICE_URL=https://ffmpeg-http-service-bcb6ipdqka-uc.a.run.app
# optical-dev uses Celery workers (not Cloud Run Jobs) for pipeline dispatch
USE_CELERY_FALLBACK=true
# Worker Concurrency (higher values for Cloud Run mode since workers just make HTTP calls) # Worker Concurrency (higher values for Cloud Run mode since workers just make HTTP calls)
WHISPER_WORKER_CONCURRENCY=10 WHISPER_WORKER_CONCURRENCY=10
FFMPEG_WORKER_CONCURRENCY=20 FFMPEG_WORKER_CONCURRENCY=20


@ -1,23 +0,0 @@
# Screenshot capture credentials — copy to .env.screenshots and fill in values
# NEVER commit .env.screenshots (it is gitignored)
BASE_URL=https://optical-dev.oliver.solutions/video-accessibility
# Local-password admin seeded by backend/scripts/seed_test_users.py
TEST_ADMIN_EMAIL=test-admin@oliver.agency
TEST_ADMIN_PASSWORD=TestAdmin2026!
TEST_CLIENT_EMAIL=test-client@oliver.agency
TEST_CLIENT_PASSWORD=TestClient2026!
TEST_LINGUIST_EMAIL=test-linguist@oliver.agency
TEST_LINGUIST_PASSWORD=TestLinguist2026!
TEST_REVIEWER_EMAIL=test-reviewer@oliver.agency
TEST_REVIEWER_PASSWORD=TestReviewer2026!
TEST_PRODUCTION_EMAIL=test-production@oliver.agency
TEST_PRODUCTION_PASSWORD=TestProduction2026!
TEST_PM_EMAIL=test-pm@oliver.agency
TEST_PM_PASSWORD=TestPM2026!

13
.gitignore vendored

@ -12,7 +12,6 @@ examples/
.env.local .env.local
.env.production .env.production
.env.*.local .env.*.local
.env.screenshots
secrets/ secrets/
*.pem *.pem
*.key *.key
@ -99,15 +98,3 @@ docs/*.pdf
/var/www/html/video-accessibility.backup.* /var/www/html/video-accessibility.backup.*
backend/.env backend/.env
# Node / npm artifacts at repo root (Playwright MCP installs these)
node_modules/
package.json
package-lock.json
# Playwright MCP session snapshots
.playwright-mcp/
# Test videos
test-video.mp4
.worktrees/


@ -1,118 +0,0 @@
# Build Health Audit — ln-622
**Score: 5.5/10** | Issues: 28 (C:0 H:5 M:18 L:5)
**Date:** 2026-04-30 | **Stack:** Python 3.11 / FastAPI / Celery + React 19 / Vite / TypeScript 5.8
---
## 1. Compiler / Linter Errors
### Backend — ruff: 1314 errors (HIGH)
`ruff check app/` exits non-zero with 1314 violations. The ruff config in `pyproject.toml` uses **deprecated top-level `select`/`ignore`/`per-file-ignores`** instead of `[tool.ruff.lint]` — ruff emits a warning on every run.
Top violation codes:
| Code | Meaning | Volume |
|------|---------|--------|
| I001 | Import block unsorted | ~400 |
| UP | pyupgrade (f-strings, typing aliases) | ~500 |
| B | flake8-bugbear | ~200 |
| F401 | Unused import | 58 |
Most violations are **auto-fixable** (`ruff check --fix`). The unsorted imports and UP rules are cosmetic but make CI noisy and block future enforcement.
**Severity: HIGH** — CI cannot gate on ruff without fixing this first.
### Frontend — ESLint: 36 problems (30 errors, 6 warnings) (MEDIUM)
Key errors:
| File | Rule | Count |
|------|------|-------|
| `contexts/GlobalWebSocketContext.tsx:56` | `react-refresh/only-export-components` | 1 |
| `contexts/NotificationContext.tsx:91` | `react-refresh/only-export-components` | 1 |
| `contexts/ToastContext.tsx:83` | `react-refresh/only-export-components` | 1 |
| `lib/api.ts:539` | `@typescript-eslint/no-explicit-any` | 1 |
| `routes/admin/QCDetail.tsx` | `@typescript-eslint/no-explicit-any` | 6 |
| `routes/AcceptInvite.tsx` | `@typescript-eslint/no-explicit-any` | 1 |
| `routes/jobs/JobDetail.tsx` | `no-unused-vars` (err catch) | 2 |
| `hooks/__tests__/useJob.test.tsx` | `no-unused-vars` | 1 |
| `tests/helpers/auth.ts` | `no-explicit-any` | 3 |
**Severity: MEDIUM** — build succeeds, but `any` types and react-refresh errors degrade DX and HMR.
---
## 2. Type Errors
### Frontend — tsc: CLEAN ✓
`tsc --noEmit` exits 0. No TypeScript compilation errors. The `any` issues above are ESLint-level, not tsc errors.
### Backend — mypy: NOT RUN
Cannot run mypy outside the poetry venv. Needs `poetry run mypy .` inside Docker or an activated venv.
**Severity: LOW** (mypy not blocking, but should be run in CI)
---
## 3. Tests
### Frontend — vitest: 13 failed / 75 total (HIGH)
8 test files affected:
| Test | Failures | Root cause |
|------|----------|-----------|
| `auth.test.ts` | 1 | Mock shape mismatch — response has extra field `organizationId` |
| `StatusBadge.test.tsx` | 1 | Unknown status no longer renders text (component changed) |
| `VttEditor.test.tsx` | 1 | Multiple elements found for `Insert cue before` title — DOM duplication |
| `useJob.test.tsx` | 3 | `useApproveEnglish` — pending state never resolves in test (timeout 1s); `useCreateJob` arg mismatch |
| `UploadDropzone.test.tsx` | 6 | Text broken across elements — test uses exact string match, component renders in `<span>` nodes |
| `useJobStatusWebSocket.test.tsx` | 1 | (see output) |
**Severity: HIGH** — 17% test failure rate. Several are stale tests from component refactors (UploadDropzone, StatusBadge).
### Backend — pytest: CANNOT RUN (CRITICAL)
Running `pytest` outside poetry venv fails with `ModuleNotFoundError` for `fastapi`, `aiohttp`, etc. Tests must be run with `poetry run pytest` inside Docker or an activated poetry environment.
The `backend/.venv` exists but appears to be a plain venv, not the poetry-managed one. **Tests are effectively unrunnable in local dev without explicit poetry activation.**
**Severity: CRITICAL** — Developers with system Python cannot run tests without explicit setup steps.
---
## 4. Build Configuration Issues
### ruff config deprecated (MEDIUM)
`pyproject.toml` uses `[tool.ruff]` top-level `select`, `ignore`, `per-file-ignores`. Current ruff ≥ 0.2 expects `[tool.ruff.lint]`. Fix:
```toml
# Before
[tool.ruff]
select = ["E", "W", ...]
ignore = ["E501", ...]
# After
[tool.ruff]
target-version = "py311"
line-length = 88
[tool.ruff.lint]
select = ["E", "W", ...]
ignore = ["E501", ...]
```
### Backend venv mismatch (MEDIUM)
`backend/.venv` cannot run `ruff`, `pytest`, or `mypy` — they are installed in the poetry-managed venv, not this one. Confusing to new devs.
### AGENTS.md commands incorrect (LOW)
`AGENTS.md` documents `cd backend && poetry run pytest` but the backend has `.venv` and `pyproject.toml` with no Makefile wrapper. The actual working path is `cd backend && .venv/bin/python -m pytest` or requires `poetry shell`.
---
## Summary
| Check | Result | Severity |
|-------|--------|---------|
| ruff backend | 1314 violations (auto-fixable) | HIGH |
| ESLint frontend | 36 problems | MEDIUM |
| tsc frontend | ✓ Clean | OK |
| mypy backend | Not runnable locally | LOW |
| vitest frontend | 13/75 failing | HIGH |
| pytest backend | Not runnable locally | CRITICAL |
| ruff config | Deprecated syntax | MEDIUM |
| venv setup | Confusing / broken | MEDIUM |


@ -1,116 +0,0 @@
# Code Quality Audit — ln-624
**Score: 5.0/10** | Issues: 22 (C:2 H:8 M:9 L:3)
**Date:** 2026-04-30
---
## 1. God Classes / Files (> 500 lines)
| File | Lines | Severity |
|------|-------|---------|
| `backend/app/api/v1/routes_jobs.py` | 2882 | **CRITICAL** |
| `frontend/src/routes/admin/QCDetail.tsx` | 2079 | **CRITICAL** |
| `backend/app/services/video_renderer.py` | 1695 | **HIGH** |
| `frontend/src/routes/jobs/JobsList.tsx` | 1246 | **HIGH** |
| `frontend/src/lib/api.ts` | 1056 | **HIGH** |
| `backend/app/tasks/translate_and_synthesize.py` | 1019 | **HIGH** |
| `frontend/src/routes/jobs/NewJob.tsx` | 1038 | **HIGH** |
| `frontend/src/types/api.ts` | 891 | **MEDIUM** |
| `frontend/src/routes/jobs/JobDetail.tsx` | 732 | **MEDIUM** |
| `frontend/src/routes/admin/UserDetail.tsx` | 523 | **MEDIUM** |
| `frontend/src/hooks/useJobStatusWebSocket.ts` | 443 | **MEDIUM** |
**routes_jobs.py at 2882 lines** is the worst offender — it mixes upload, approval, translation, TTS, VTT editing, download, admin, and websocket concerns in a single router. Splitting by domain (e.g., `routes_upload.py`, `routes_vtt.py`, `routes_review.py`, `routes_tts.py`) would bring each under 500 lines.
**QCDetail.tsx at 2079 lines** handles the entire QC workflow, VTT display, audio preview, language selection, and approval modals in one component. Needs extraction of at minimum: `LanguageQCPanel`, `VttReviewView`, `ApprovalModal`.
---
## 2. Long Methods (> 100 lines)
| File:line | Function | Length | Severity |
|-----------|---------|--------|---------|
| `tasks/translate_and_synthesize.py:109` | `_async_translate_and_synthesize()` | 485 lines | **CRITICAL** |
| `services/video_renderer.py:487` | `_render_pause_insert_method()` | 419 lines | **CRITICAL** |
| `tasks/ingest_and_ai.py:53` | `ingest_and_ai_task_impl()` | 276 lines | **HIGH** |
| `tasks/rerender_accessible_video.py:110` | `_async_rerender_accessible_video()` | 280 lines | **HIGH** |
| `tasks/render_accessible_video.py:56` | `_async_render_accessible_video()` | 287 lines | **HIGH** |
| `api/v1/routes_jobs.py:1552` | `update_job_vtt_content()` | 215 lines | **HIGH** |
| `tasks/notify.py:29` | `run_async()` | 169 lines | **HIGH** |
| `api/v1/routes_jobs.py:2738` | `update_tts_preferences()` | 144 lines | **MEDIUM** |
| `services/whisper_service.py:241` | `_find_sentence_boundaries()` | 120 lines | **MEDIUM** |
| `services/gemini.py:591` | `analyze_accessible_video_placement()` | 132 lines | **MEDIUM** |
The two most critical ones (`_async_translate_and_synthesize` at 485 lines and `_render_pause_insert_method` at 419 lines) are orchestrator-style functions with sequential pipeline steps. They could be split into named pipeline stages, each ~50 lines.
---
## 3. Deep Nesting
Not systematically scanned with a tool (radon/lizard not installed). The long functions above likely contain 4-5+ nesting levels given their complexity.
---
## 4. Too Many Parameters
| Location | Function | Params | Severity |
|----------|---------|--------|---------|
| `services/gemini.py` | `extract_accessibility_targeted()` | 7+ | **MEDIUM** |
| `tasks/translate_and_synthesize.py` | `_generate_language_tts()` | 8+ | **MEDIUM** |
Pattern: many functions pass `db`, `job`, `language`, `settings`, `gcs_client`, etc. individually instead of grouping into a context dataclass.
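A minimal sketch of the context-dataclass idea, offered as an illustration only. The names `PipelineContext` and `generate_language_tts`, the field list, and the returned path format are assumptions; the real signatures in `services/gemini.py` and `tasks/translate_and_synthesize.py` are not shown in this audit.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PipelineContext:
    """Hypothetical bundle of the dependencies currently passed one by one."""

    db: Any
    job: dict
    settings: Any
    gcs_client: Any


def generate_language_tts(ctx: PipelineContext, language: str, voice: str = "default") -> str:
    """Illustrative stand-in: three parameters instead of eight or more."""
    # Builds a made-up object path purely to show that the context is usable.
    return f"{ctx.job['id']}/{language}/{voice}.mp3"


ctx = PipelineContext(db=None, job={"id": "job-123"}, settings=None, gcs_client=None)
print(generate_language_tts(ctx, "fr"))
```

Grouping the shared dependencies this way would keep call sites short and let new dependencies be added without touching every signature.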
---
## 5. Magic Numbers
### Backend (MEDIUM)
Scattered timing constants without named definitions:
- TTS retry delays (hardcoded seconds)
- chunk sizes in upload
- Audio padding values in video_renderer.py
### Frontend (LOW)
Mostly clean. Some inline pixel values in Tailwind (acceptable). No concerning business-logic magic numbers found.
---
## 6. N+1 Query Patterns (MEDIUM)
Potential N+1 patterns found:
- `app/main.py:102`: `async for job_doc in db.jobs.find(...)`; check if this iterates and makes additional queries per document
- `app/core/dependencies.py:185`: `async for m in db.memberships.find(...)`; membership lookup per request in auth middleware (acceptable if cached, but no caching observed)
- `app/core/authz.py:54`: `async for doc in db.memberships.find(...)`; similar pattern in auth check
These are all async iterators over `find()` — not necessarily N+1 if no nested DB calls, but should be reviewed for `.find()` calls inside the loop body.
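To make the review criterion concrete, here is a small self-contained sketch of what an N+1 loop body looks like next to a batched alternative. `FakeCollection` is a hypothetical in-memory stand-in that only counts queries; it is not the MongoDB driver API, and `find_many` merely plays the role of a single `$in` lookup.

```python
import asyncio


class FakeCollection:
    """In-memory stand-in for a collection that counts queries (hypothetical)."""

    def __init__(self, docs):
        self.docs = docs
        self.query_count = 0

    async def find_one(self, filter_):
        self.query_count += 1
        return next((d for d in self.docs if d["_id"] == filter_["_id"]), None)

    async def find_many(self, ids):
        # Plays the role of a single query with an "$in" filter.
        self.query_count += 1
        wanted = set(ids)
        return [d for d in self.docs if d["_id"] in wanted]


async def n_plus_one(memberships, orgs):
    # One extra query per membership: the pattern to look for in loop bodies.
    return [await orgs.find_one({"_id": m["org_id"]}) for m in memberships]


async def batched(memberships, orgs):
    # One query for all organisations, then an in-memory join.
    found = await orgs.find_many([m["org_id"] for m in memberships])
    by_id = {o["_id"]: o for o in found}
    return [by_id.get(m["org_id"]) for m in memberships]


memberships = [{"org_id": i} for i in range(3)]
orgs = FakeCollection([{"_id": i, "name": f"org-{i}"} for i in range(3)])
asyncio.run(n_plus_one(memberships, orgs))
print(orgs.query_count)  # one query per membership
```

If the three flagged loops contain nothing like the `find_one` call above, they are plain cursor iterations and not N+1.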
---
## 7. Method Signature Quality
### Boolean flag parameters (MEDIUM)
Several async functions in tasks accept `bool` flags controlling behavior variants (e.g., `skip_tts`, `force_regenerate`). These should be enums or separate functions.
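One way the enum suggestion could look, as a sketch under stated assumptions: `TtsMode` and `plan_tts` are hypothetical names, and the decision logic is invented for illustration rather than taken from the task code.

```python
from enum import Enum


class TtsMode(Enum):
    """Hypothetical replacement for separate skip_tts / force_regenerate booleans."""

    GENERATE = "generate"
    SKIP = "skip"
    FORCE_REGENERATE = "force_regenerate"


def plan_tts(mode: TtsMode, audio_exists: bool) -> str:
    """Return the action to take; illustrative logic only."""
    if mode is TtsMode.SKIP:
        return "skip"
    if mode is TtsMode.FORCE_REGENERATE or not audio_exists:
        return "synthesize"
    return "reuse"


print(plan_tts(TtsMode.GENERATE, audio_exists=True))
```

A single enum also rules out contradictory combinations such as skipping and force-regenerating at the same time, which two independent booleans allow.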
### Unclear return types (MEDIUM)
Some routes return `dict` or untyped responses instead of Pydantic response models. `routes_admin_production.py` has a few endpoints returning bare dicts.
---
## 8. Side-Effect Cascade Depth
`_async_translate_and_synthesize()` at 485 lines is the worst case: it writes to GCS, updates MongoDB, dispatches TTS tasks, sends notifications, and updates job status — 5+ distinct side-effect categories from a single function call. This warrants extraction into an orchestrator that delegates to named sink functions.
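The orchestrator-plus-sinks shape described above might be sketched as follows. Every stage name, the `state` dictionary, and the `"translated"` status value are assumptions made for illustration; the real function's steps and data structures are not shown in this audit.

```python
import asyncio
from typing import Awaitable, Callable

# Each stage receives the shared state and performs exactly one kind of side effect.
Stage = Callable[[dict], Awaitable[None]]


async def upload_outputs(state: dict) -> None:
    state["effects"].append("gcs_write")  # placeholder for the GCS upload


async def persist_job(state: dict) -> None:
    state["effects"].append("mongo_update")  # placeholder for the MongoDB write


async def dispatch_tts(state: dict) -> None:
    state["effects"].append("tts_dispatch")  # placeholder for task dispatch


async def notify(state: dict) -> None:
    state["effects"].append("notification")  # placeholder for the notification


async def mark_status(state: dict) -> None:
    state["status"] = "translated"  # hypothetical status value


PIPELINE: list[Stage] = [upload_outputs, persist_job, dispatch_tts, notify, mark_status]


async def translate_and_synthesize(job_id: str) -> dict:
    """Orchestrator: only sequences the stages, no side effects of its own."""
    state = {"job_id": job_id, "effects": [], "status": "pending"}
    for stage in PIPELINE:
        await stage(state)
    return state


result = asyncio.run(translate_and_synthesize("job-123"))
print(result["effects"])
```

With this layout each sink can be tested or retried in isolation, and the order of side effects is visible in one list.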
---
## Summary
| Check | Status | Severity |
|-------|--------|---------|
| God files (>500L) | 11 files | CRITICAL×2, HIGH×5 |
| Long methods (>100L) | 10 functions | CRITICAL×2, HIGH×5 |
| N+1 patterns | 3 potential | MEDIUM |
| Magic numbers | Some in tasks | MEDIUM |
| Method signatures | Boolean flags, unclear returns | MEDIUM |
| Side-effect cascade | translate_and_synthesize | HIGH |
**Primary recommendation:** Split `routes_jobs.py` and `QCDetail.tsx` — these two files account for the majority of the quality debt.


@ -1,94 +0,0 @@
# Dependencies & Reuse Audit — ln-625
**Score: 7.5/10** | Issues: 9 (C:0 H:2 M:5 L:2)
**Date:** 2026-04-30
---
## 1. Vulnerability Scan (CVE/CVSS)
### Frontend — npm audit: ✓ CLEAN
```
Total packages: 479
Vulnerabilities: info:0 low:0 moderate:0 high:0 critical:0 total:0
```
Zero CVEs. Excellent.
### Backend — pip-audit: NOT RUN
`pip-audit` not installed in local env. Recommended to add to CI:
```bash
pip install pip-audit && pip-audit -r requirements.txt
```
Given many heavy deps (Celery 5.3, google-cloud-*, faster-whisper, aiohttp), a CI scan is strongly advised.
---
## 2. Outdated Packages
### Frontend — npm outdated (many minor/major updates pending)
**MAJOR version gaps (HIGH):**
| Package | Installed | Latest | Notes |
|---------|-----------|--------|-------|
| `@azure/msal-browser` | 4.25.0 | **5.9.0** | MSAL v5 has breaking API changes |
| `@azure/msal-react` | 3.0.20 | **5.3.2** | Paired with msal-browser, coordinated upgrade needed |
| `@sentry/react` | 8.55.0 | **10.51.0** | Sentry v10 has breaking changes |
| `typescript` | 5.8.3 | **6.0.3** | TS 6 has strictness changes |
| `vite` | 7.3.2 | **8.0.10** | Vite 8 breaking changes |
| `eslint` | 9.33.0 | **10.2.1** | ESLint 10 config format may change |
| `jsdom` | 26.1.0 | **29.1.1** | Test environment |
**Minor updates (LOW-MEDIUM):** Most other packages have minor/patch updates pending (react 19.1→19.2, tailwindcss 4.1→4.2, etc.)
**Recommendation:** Keep MSAL and Sentry on current major until dedicated upgrade sprint. React, TailwindCSS, react-query minor updates are safe to apply immediately.
### Backend — pip outdated: pip-audit not available
Based on pyproject.toml dates vs ecosystem:
- `ruff ^0.1.6` → installed ruff is `0.15.12` (already updated, good)
- `google-genai ^1.56.0` → recently updated per git log
- `faster-whisper ^1.2.0` → check for 1.x updates
---
## 3. Unused Dependencies
### Backend — `sendgrid` (MEDIUM)
`pyproject.toml` lists `sendgrid = "^6.11.0"`. However:
- The actual emailer (`app/services/emailer.py`) uses **Mailgun** REST API via `httpx`
- `sendgrid` is referenced **only** in `app/core/config.py` as a dead config field `sendgrid_api_key: str = ""` with comment `# Email (Mailgun — primary; sendgrid_api_key kept for backward compat)`
- No `import sendgrid` anywhere in app code
**Action:** Remove `sendgrid` from `pyproject.toml` dependencies and remove the `sendgrid_api_key` config field.
### Frontend — no unused dependencies found
- `axios` → used in `lib/api.ts`
- `@azure/msal-*` → used in `main.tsx`, `routes/Login.tsx`
- `date-fns` → used in 5+ components
- `zustand`, `@tanstack/react-query`, `react-hook-form`, `zod` → all actively used
- `react-dropzone` → used in upload components
---
## 4. Available Native Alternatives
### Frontend — axios vs fetch (LOW)
`axios` is used for all API calls in `lib/api.ts`. The project targets modern browsers and uses Vite. Native `fetch` + `AbortController` could replace axios, reducing bundle by ~14kb gzipped. However, axios provides request/response interceptors that are actively used for auth token refresh — migration effort is medium. **Not urgent.**
---
## 5. Custom Implementations
No custom crypto or hand-rolled validation libraries found. All auth uses `python-jose` + `libpass` (bcrypt). VTT parsing is domain-specific and not replaceable by a library. No concerns.
---
## Summary
| Check | Result | Severity |
|-------|--------|---------|
| Frontend CVEs | ✓ 0 vulnerabilities | OK |
| Backend CVEs | ⚠ Not scanned | MEDIUM |
| Frontend major updates | MSAL×2, Sentry, TS, Vite, ESLint | HIGH |
| Frontend minor updates | Many | LOW |
| Backend unused dep | `sendgrid` in pyproject.toml | MEDIUM |
| Native alternatives | axios → fetch possible | LOW |
| Custom implementations | None found | OK |


@ -1,143 +0,0 @@
# Dead Code Audit — ln-626
**Score: 7.0/10** | Issues: 14 (C:0 H:0 M:6 L:8)
**Date:** 2026-04-30
---
## 1. Unused Imports (Python — F401)
ruff detected **58 unused import violations** across backend. Sample:
| File | Unused import |
|------|--------------|
| `routes_admin.py:9` | `get_current_user` |
| `routes_admin.py:11` | `verify_password` |
| `routes_admin.py:16` | `ChangePasswordRequest` |
| `routes_admin.py:23` | `log_security_event` |
| (+ 54 more across all files) | |
All are auto-fixable with `ruff check --fix --select F401`. The `__init__.py` files are correctly excluded via `per-file-ignores`.
**Severity: MEDIUM** — clutters imports, increases cognitive load when reading files.
---
## 2. Deprecated / Legacy Types (Frontend)
`frontend/src/types/api.ts` contains 3 deprecated exported types with JSDoc markers:
| Line | Type | Marker |
|------|------|--------|
| 96 | `TtsVoicesResponse` | `@deprecated Use ProviderVoicesResponse instead` |
| 137 | `TtsOptionsResponse` | `@deprecated Use ProviderOptionsResponse instead` |
| 555-566 | `Client` / `OrganizationLegacy` | `@deprecated Use Organization instead` + `export { Client as OrganizationLegacy }` |
These types are still exported, meaning consumers could use them by mistake. If no external consumers exist (library not published), they should be deleted.
**Severity: MEDIUM** — active deprecation markers indicate intent to remove. Leaving them causes confusion.
---
## 3. Legacy Status Values (Frontend)
`frontend/src/types/api.ts:12,14`:
```ts
| "tts_failed" // legacy: keep for back-compat
| "render_failed" // legacy: keep for back-compat
```
These job statuses are marked as legacy. If the backend no longer emits them, they are dead type branches. If it still does (for old jobs in MongoDB), they're valid — but should be clearly documented with a removal condition.
**Severity: LOW** — no runtime impact, but requires clarification.
---
## 4. Backward Compatibility Code (Frontend)
### lib/api.ts:239 — Legacy approval method (MEDIUM)
```ts
// Legacy method - calls approve_source for backwards compatibility
```
A backward-compat shim in the API client. If all callers have been updated to the new method, this should be removed.
### VideoWithCaptions.tsx:1643 — Legacy single-language props (MEDIUM)
```ts
// Legacy single-language props (still supported)
sourceLanguage?: string; // Language code for legacy props
// Legacy props
// Combine legacy props with tracks (use useMemo to prevent recreation)
```
The component maintains backward-compat with old single-language prop API. If no callers use these legacy props, they can be removed.
### JobDetail.tsx:41 — Legacy status mapping (LOW)
```ts
// Handle legacy approved_english/approved_source statuses (map to pending_final_review)
```
Status mapping shim for old job records. Should be removed after all existing jobs are migrated.
---
## 5. Commented-Out Code (Backend)
| Location | Content | Note |
|------|------|---------|
| `telemetry/tracing.py:5` | `# from opentelemetry.exporter.gcp.trace import CloudTraceSpanExporter # Disabled for local dev` | GCP trace exporter disabled |
| `telemetry/metrics.py:5` | `# from opentelemetry.exporter.prometheus import PrometheusMetricReader # Disabled for local dev` | Prometheus reader disabled |
| `pyproject.toml` | `# opentelemetry-exporter-prometheus = ... # Temporarily disabled - version conflicts` | Dep commented out |
These are intentional (local dev vs prod config), not dead code. However, the conditional should be expressed via environment config, not source comments. **Low priority.**
**Severity: LOW**
---
## 6. Leftover .old Files (MEDIUM)
| File | Age | Action |
|------|-----|--------|
| `docker-compose.yml.old` | Created 2026-03-03 (~2 months) | Delete |
| `backend/Dockerfile.old` | Created 2026-03-03 (~2 months) | Delete |
| `backend/.dockerignore.old` | — | Delete |
These files have no build references. Git history preserves them.
---
## 7. Unused Dockerfiles
| File | Referenced in compose? |
|------|----------------------|
| `backend/Dockerfile.ffmpeg-service` | No — ffmpeg is embedded in main worker |
| `backend/Dockerfile.cloudrun` | Yes — referenced for Cloud Run deploys |
| `backend/Dockerfile.whisper-service` | Yes — whisper-worker service in compose |
`Dockerfile.ffmpeg-service` appears to be dead — the main Dockerfile handles ffmpeg. Should be confirmed and deleted if unused.
**Severity: LOW**
---
## 8. Dead Config Field
`backend/app/core/config.py:272`:
```python
sendgrid_api_key: str = "" # Email (Mailgun — primary; sendgrid_api_key kept for backward compat)
```
`sendgrid` package not used. Config field and `secrets_config.py` secret reference both dead.
**Severity: MEDIUM** — misleads ops into configuring a sendgrid secret that has no effect.
---
## Summary
| Check | Issues | Severity |
|-------|--------|---------|
| Unused Python imports | 58 (auto-fixable) | MEDIUM |
| Deprecated TS types | 3 types | MEDIUM |
| Backward-compat shims | 3 in frontend | MEDIUM |
| Commented-out code | 3 telemetry lines | LOW |
| .old files | 3 files | MEDIUM |
| Unused Dockerfile | Dockerfile.ffmpeg-service | LOW |
| Dead config field | sendgrid_api_key | MEDIUM |
| Legacy status values | 2 status strings | LOW |


@ -1,97 +0,0 @@
# Accessible Video Processing Platform — Project Entry Point
<!-- SCOPE: root | owner: ln-111 | generated: 2026-04-29 -->
## What Is This Project
AI-powered SaaS platform that generates legally-required accessibility assets from video files: closed captions, audio descriptions, SDH captions, and descriptive transcripts. Outputs are reviewed through a human QC workflow before client delivery. 50+ language translation and cultural transcreation are built in.
**Client:** Oliver Internal
**Server:** optical-web-1
**Status:** 85% production-ready
---
## Quick Navigation
| Need | Go to |
|------|-------|
| Architecture, data flow, state machine | [docs/project/architecture.md](docs/project/architecture.md) |
| Tech stack versions and config | [docs/project/tech_stack.md](docs/project/tech_stack.md) |
| API endpoint reference | [docs/project/api_spec.md](docs/project/api_spec.md) |
| Database collections and indexes | [docs/project/database_schema.md](docs/project/database_schema.md) |
| Infrastructure inventory | [docs/project/infrastructure.md](docs/project/infrastructure.md) |
| Runbook — deploy, restart, rollback | [docs/project/runbook.md](docs/project/runbook.md) |
| Functional requirements | [docs/project/requirements.md](docs/project/requirements.md) |
| Development principles | [docs/principles.md](docs/principles.md) |
| Reference — ADRs, guides, research | [docs/reference/README.md](docs/reference/README.md) |
| Task management | [docs/tasks/README.md](docs/tasks/README.md) |
| Test strategy and commands | [tests/README.md](tests/README.md) |
| Documentation hub | [docs/README.md](docs/README.md) |
---
## Entry Points by Audience
| Audience | Start here |
|----------|-----------|
| New developer | [docs/project/runbook.md](docs/project/runbook.md) → local setup section |
| Reviewer / QC | [docs/project/requirements.md](docs/project/requirements.md) → QC workflow section |
| DevOps | [docs/project/infrastructure.md](docs/project/infrastructure.md) + [docs/project/runbook.md](docs/project/runbook.md) |
| Security reviewer | [docs/project/architecture.md](docs/project/architecture.md) → security section |
| AI agent | Read this file → pick topic → read `_index`-equivalent doc → synthesize |
---
## Core Pipeline (one-line summary per stage)
| Stage | What happens | Key file |
|-------|-------------|---------|
| Upload | MP4 → GCS + MongoDB job record | `routes_files.py` |
| Ingestion | Celery worker transcribes with Gemini 2.5 Pro | `tasks/ingest_and_ai.py` |
| AI Processing | VTT generated, validated, stored in GCS | `services/gemini.py` |
| QC Review | Reviewer edits VTT, approves or rejects | `services/language_qc.py` |
| Translation | Google Translate + transcreation per language | `tasks/translate_and_synthesize.py` |
| TTS | Per-cue audio synthesis (Google TTS / ElevenLabs) | `services/tts.py` |
| Final Review | PM approves deliverables | `routes_language_qc.py` |
| Delivery | Signed GCS URLs emailed to client | `services/emailer.py` |
See full state machine (16 states) in [docs/project/architecture.md](docs/project/architecture.md#job-state-machine).
---
## Development Commands
| Action | Command |
|--------|---------|
| Start local (Docker + Vite) | `./scripts/run-local.sh` |
| Rebuild after code change | `./scripts/run-local.sh --rebuild` |
| Stop all local services | `./scripts/run-local.sh --stop` |
| Backend lint | `cd backend && ruff check .` |
| Backend type-check | `cd backend && mypy .` (run in Docker container) |
| Frontend lint | `cd frontend && npm run lint` |
| Frontend type-check | `cd frontend && npm run type-check` |
| Backend tests | `cd backend && poetry run pytest` |
| Frontend tests | `cd frontend && npm run test` |
| E2E tests | `cd frontend && npm run test:e2e` |
---
## Key Constraints
- **NO SSH to optical-web-1** without explicit user instruction — hard rule in CLAUDE.md
- **Access tokens in memory only** (not localStorage) — auth architecture constraint
- **Refresh tokens in HttpOnly cookies** — security requirement
- **Signed GCS URLs** expire in 24h — do not cache or store URLs
- **RBAC enforced server-side** — never trust client-supplied role claims
- **All reviewer actions emit audit log entries** — compliance requirement
---
## Maintenance
**Update triggers:** New route added, deployment target changes, key dependency version change, new team member onboarded.
**Verification:** All links in Quick Navigation resolve. Entry commands are correct against current scripts/.
<!-- END SCOPE: root -->


@ -1,8 +1,5 @@
# Accessible Video Processing Platform - Development Guide # Accessible Video Processing Platform - Development Guide
<!-- Documentation entry point: see @AGENTS.md for full project navigation -->
@AGENTS.md
## Project Overview ## Project Overview
This is a comprehensive video accessibility platform that automatically generates closed captions and audio descriptions using AI, with quality control workflows and multi-language support. This is a comprehensive video accessibility platform that automatically generates closed captions and audio descriptions using AI, with quality control workflows and multi-language support.

Binary file not shown.


@ -2,8 +2,6 @@
A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. Features a complete workflow from video upload to final delivery with quality control processes. A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. Features a complete workflow from video upload to final delivery with quality control processes.
**Documentation:** See [AGENTS.md](AGENTS.md) for full navigation, or [docs/README.md](docs/README.md) for the documentation hub.
## ✅ Current Status: **Production-Ready** (85% Complete) ## ✅ Current Status: **Production-Ready** (85% Complete)
**Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend) **Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend)


@ -1,96 +1,172 @@
# ============================================================================= # =============================================================================
# Apache config fragment — Accessible Video Platform # Apache Configuration for Accessible Video Platform
# Inject into: /etc/apache2/sites-available/optical-dev.oliver.solutions-ssl.conf # =============================================================================
# # Add this configuration to your existing VirtualHost for ai-sandbox.oliver.solutions
# Required modules: # Location: /etc/apache2/sites-available/ai-sandbox.oliver.solutions-ssl.conf
# sudo a2enmod proxy proxy_http proxy_wstunnel rewrite headers
#
# Container port map:
# accessible-video-api → 0.0.0.0:8012->8000/tcp
# ============================================================================= # =============================================================================
# ── Timeouts for large video uploads (up to 2 GB, ~10 min) ────────────────── # -----------------------------------------------------------------------------
<IfModule mod_proxy.c> # Frontend - Static React SPA served from subdirectory
ProxyTimeout 600 # -----------------------------------------------------------------------------
</IfModule>
# ── WebSocket proxy (MUST be before /api/ HTTP proxy) ─────────────────────── # Serve frontend from /video-accessibility subdirectory
# disablereuse=on prevents long-lived WS connections from exhausting the pool
ProxyPassMatch ^/video-accessibility/api/v1/ws/(.*)$ ws://127.0.0.1:8012/api/v1/ws/$1 disablereuse=on
ProxyPassReverse /video-accessibility/api/v1/ws/ ws://127.0.0.1:8012/api/v1/ws/
# ── API proxy ────────────────────────────────────────────────────────────────
# Strips /video-accessibility prefix — FastAPI sees /api/v1/...
ProxyPassMatch ^/video-accessibility/api/(.*)$ http://127.0.0.1:8012/api/$1
ProxyPassReverse /video-accessibility/api/ http://127.0.0.1:8012/api/
# Swagger / OpenAPI
ProxyPassMatch ^/video-accessibility/docs(/.*)?$ http://127.0.0.1:8012/docs$1
ProxyPassReverse /video-accessibility/docs http://127.0.0.1:8012/docs
ProxyPassMatch ^/video-accessibility/openapi\.json$ http://127.0.0.1:8012/openapi.json
ProxyPassReverse /video-accessibility/openapi.json http://127.0.0.1:8012/openapi.json
# ── SPA static files ─────────────────────────────────────────────────────────
Alias /video-accessibility /var/www/html/video-accessibility Alias /video-accessibility /var/www/html/video-accessibility
<Directory /var/www/html/video-accessibility> <Directory /var/www/html/video-accessibility>
# Basic options
Options -Indexes +FollowSymLinks Options -Indexes +FollowSymLinks
AllowOverride None AllowOverride All
Require all granted Require all granted
# Allow video uploads up to 2 GB # React SPA routing - rewrite all requests to index.html
LimitRequestBody 2147483648
RewriteEngine On RewriteEngine On
RewriteBase /video-accessibility/ RewriteBase /video-accessibility
# Serve real files/directories directly (JS, CSS, assets, fonts) # Don't rewrite files or directories that exist
RewriteCond %{REQUEST_FILENAME} -f [OR] RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} -d RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ - [L]
# Everything else → index.html (React Router handles client-side nav) # Rewrite everything else to index.html
RewriteRule ^ index.html [L] RewriteRule ^ /video-accessibility/index.html [L]
# Cache-bust hashed assets indefinitely; never cache HTML
<FilesMatch "\.(js|css|woff2?|ttf|eot|png|jpg|jpeg|gif|ico|svg)$">
Header set Cache-Control "public, max-age=31536000, immutable"
</FilesMatch>
<FilesMatch "\.html$">
Header set Cache-Control "no-cache, no-store, must-revalidate"
</FilesMatch>
# Security headers # Security headers
Header always set X-Frame-Options "SAMEORIGIN" Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff" Header always set X-Content-Type-Options "nosniff"
Header always set X-XSS-Protection "1; mode=block"
Header always set Referrer-Policy "strict-origin-when-cross-origin" Header always set Referrer-Policy "strict-origin-when-cross-origin"
# Cache control for static assets
<FilesMatch "\.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$">
Header set Cache-Control "public, max-age=31536000, immutable"
</FilesMatch>
# No cache for HTML files
<FilesMatch "\.(html)$">
Header set Cache-Control "no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
Header set Expires "0"
</FilesMatch>
</Directory> </Directory>
# -----------------------------------------------------------------------------
# Backend API - Reverse proxy to Docker container
# -----------------------------------------------------------------------------
# Proxy backend API to Docker container on port 8000
<Location /video-accessibility-back>
# Preserve original host header
ProxyPreserveHost On
# Proxy HTTP requests
ProxyPass http://localhost:8000
ProxyPassReverse http://localhost:8000
# Proxy timeout settings (important for long-running video processing)
ProxyTimeout 300
# WebSocket support (CRITICAL for real-time job updates)
RewriteEngine On
RewriteCond %{HTTP:Upgrade} =websocket [NC]
RewriteRule /video-accessibility-back/(.*) ws://localhost:8000/$1 [P,L]
RewriteCond %{HTTP:Upgrade} !=websocket [NC]
RewriteRule /video-accessibility-back/(.*) http://localhost:8000/$1 [P,L]
# Security headers
Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff"
# CORS is handled by the backend, don't add headers here
</Location>
# -----------------------------------------------------------------------------
# Required Apache Modules
# -----------------------------------------------------------------------------
# Enable these modules with:
# sudo a2enmod rewrite
# sudo a2enmod proxy
# sudo a2enmod proxy_http
# sudo a2enmod proxy_wstunnel
# sudo a2enmod headers
# sudo systemctl restart apache2
# Verify modules are enabled:
# apache2ctl -M | grep -E '(rewrite|proxy|headers)'
# ============================================================================= # =============================================================================
# Full VirtualHost skeleton (reference — values match optical-web-1) # Full VirtualHost Example
# ============================================================================= # =============================================================================
# Example of complete VirtualHost configuration:
# #
# <VirtualHost *:443> # <VirtualHost *:443>
# ServerName optical-dev.oliver.solutions # ServerName ai-sandbox.oliver.solutions
# ServerAdmin admin@oliver.solutions
#
# DocumentRoot /var/www/html # DocumentRoot /var/www/html
# #
# # SSL Configuration (with wildcard cert)
# SSLEngine on # SSLEngine on
# SSLCertificateFile /path/to/wildcard.crt # SSLCertificateFile /path/to/wildcard-ai-sandbox.oliver.solutions.crt
# SSLCertificateKeyFile /path/to/wildcard.key # SSLCertificateKeyFile /path/to/wildcard-ai-sandbox.oliver.solutions.key
# SSLCertificateChainFile /path/to/chain.crt # If needed
# #
# SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1 # # SSL Protocol and Cipher settings
# SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
# SSLCipherSuite HIGH:!aNULL:!MD5 # SSLCipherSuite HIGH:!aNULL:!MD5
# #
# # — paste the block above here — # # Frontend configuration (from above)
# Alias /video-accessibility /var/www/html/video-accessibility
# <Directory /var/www/html/video-accessibility>
# ...
# </Directory>
# #
# ErrorLog ${APACHE_LOG_DIR}/optical-dev-error.log # # Backend API configuration (from above)
# CustomLog ${APACHE_LOG_DIR}/optical-dev-access.log combined # <Location /video-accessibility-back>
# ...
# </Location>
#
# # Logging
# ErrorLog ${APACHE_LOG_DIR}/ai-sandbox-error.log
# CustomLog ${APACHE_LOG_DIR}/ai-sandbox-access.log combined
# </VirtualHost> # </VirtualHost>
# ============================================================================= # =============================================================================
# Verify # Testing & Verification
# ============================================================================= # =============================================================================
# sudo apache2ctl configtest
# sudo systemctl reload apache2 # Test Apache configuration:
# curl -I https://optical-dev.oliver.solutions/video-accessibility/ # sudo apache2ctl configtest
# curl https://optical-dev.oliver.solutions/video-accessibility/api/v1/health #
# wscat -c wss://optical-dev.oliver.solutions/video-accessibility/api/v1/ws/job-list # Restart Apache:
# sudo systemctl restart apache2
#
# Test frontend:
# curl -I https://ai-sandbox.oliver.solutions/video-accessibility
#
# Test backend:
# curl https://ai-sandbox.oliver.solutions/video-accessibility-back/health
#
# Test WebSocket (requires wscat):
# wscat -c wss://ai-sandbox.oliver.solutions/video-accessibility-back/api/v1/ws/job-list
# =============================================================================
# Troubleshooting
# =============================================================================
# Check Apache logs:
# sudo tail -f /var/log/apache2/ai-sandbox-error.log
# sudo tail -f /var/log/apache2/ai-sandbox-access.log
#
# Check if backend is running:
# curl http://localhost:8000/health
#
# Check Docker containers:
# cd /opt/accessible-video
# docker-compose ps
#
# Common issues:
# - 502 Bad Gateway: Backend container not running
# - 404 Not Found: Frontend not deployed or Apache alias incorrect
# - WebSocket fails: mod_proxy_wstunnel not enabled
# - CORS errors: Check backend CORS configuration, not Apache
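
The prefix stripping in the left-hand config (the `ProxyPassMatch` lines targeting port 8012) can be sanity-checked with a small model. This is only a sketch under stated assumptions: `RULES` and `route` are hypothetical names, Python's `re` module stands in for Apache's regex engine, and anything that no proxy rule matches is treated as falling through to the `Alias` / SPA block (returned as `None`).

```python
import re
from typing import Optional

# (pattern, backend template) pairs copied from the ProxyPassMatch lines, in file order.
# Order matters: the WebSocket rule has to come before the general /api/ rule.
RULES = [
    (r"^/video-accessibility/api/v1/ws/(.*)$", r"ws://127.0.0.1:8012/api/v1/ws/\1"),
    (r"^/video-accessibility/api/(.*)$", r"http://127.0.0.1:8012/api/\1"),
    (r"^/video-accessibility/docs(/.*)?$", r"http://127.0.0.1:8012/docs\1"),
    (r"^/video-accessibility/openapi\.json$", "http://127.0.0.1:8012/openapi.json"),
]


def route(path: str) -> Optional[str]:
    """Return the backend URL the first matching rule would proxy to, or None."""
    for pattern, template in RULES:
        match = re.match(pattern, path)
        if match:
            # An unmatched optional group expands to an empty string.
            return match.expand(template)
    # No proxy rule matched: the request is left to the static SPA files.
    return None
```

Swapping the first two entries of `RULES` makes WebSocket paths resolve to the `http://` backend, which is the failure mode the "MUST be before /api/ HTTP proxy" comment warns about.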

92
backend/.dockerignore.old Normal file

@ -0,0 +1,92 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Poetry (keep poetry.lock for reproducible builds)
# poetry.lock
# Virtual environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Testing
.coverage
.pytest_cache/
.mypy_cache/
.tox/
htmlcov/
coverage.xml
*.cover
.hypothesis/
# Documentation
docs/
*.md
README*
# Logs
*.log
logs/
# Git
.git/
.gitignore
# Docker
Dockerfile*
.dockerignore
docker-compose*
# CI/CD
.github/
# Local development
.env.local
.env.development
.env.test
# Temporary files
tmp/
temp/
*.tmp
*.bak

1
backend/.gitignore vendored

@ -23,7 +23,6 @@ eggs/
.eggs/ .eggs/
lib/ lib/
lib64/ lib64/
!app/lib/
parts/ parts/
sdist/ sdist/
var/ var/


@ -3,8 +3,8 @@
# ============================================================================= # =============================================================================
# Stage 1: Builder - Install dependencies # Stage 1: Builder - Install dependencies
# Stage 2: Base - Common runtime for API and Worker # Stage 2: Base - Common runtime for API and Worker
# Stage 3: API - FastAPI + Gunicorn (no ffmpeg — heavy tasks run on Cloud Run Jobs) # Stage 3: API - FastAPI + Gunicorn (with ffmpeg for TTS audio conversion)
# Stage 4: Worker - Celery worker, lightweight queues only (notify, embed) # Stage 4: Worker - Celery worker (with ffmpeg for video processing)
# ============================================================================= # =============================================================================
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
@ -19,7 +19,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/* && rm -rf /var/lib/apt/lists/*
# Install Poetry # Install Poetry
RUN pip install --no-cache-dir poetry==2.1.4 RUN pip install --no-cache-dir poetry==1.8.2
# Configure Poetry to not create virtual environment (we're in a container) # Configure Poetry to not create virtual environment (we're in a container)
ENV POETRY_NO_INTERACTION=1 \ ENV POETRY_NO_INTERACTION=1 \
@ -33,7 +33,7 @@ COPY pyproject.toml poetry.lock ./
# Install dependencies using Poetry directly (simpler and more reliable) # Install dependencies using Poetry directly (simpler and more reliable)
RUN poetry config virtualenvs.create false \ RUN poetry config virtualenvs.create false \
&& poetry install --only main --no-root --no-interaction --no-ansi \ && poetry install --only main --no-interaction --no-ansi \
&& rm -rf $POETRY_CACHE_DIR && rm -rf $POETRY_CACHE_DIR
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
@ -46,7 +46,6 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
libmagic1 \ libmagic1 \
curl \ curl \
tini \ tini \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \ && rm -rf /var/lib/apt/lists/* \
&& apt-get clean && apt-get clean
@ -73,10 +72,21 @@ USER app
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# Stage 3: API - FastAPI + Gunicorn (Production API Server) # Stage 3: API - FastAPI + Gunicorn (Production API Server)
# Heavy pipeline tasks (ingest/translate/render) run on Cloud Run Jobs
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
FROM base AS api FROM base AS api
# Switch to root to install ffmpeg
USER root
# Install ffmpeg for TTS audio conversion
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Set production environment variables # Set production environment variables
ENV APP_ENV=prod ENV APP_ENV=prod
@ -94,10 +104,22 @@ ENTRYPOINT ["tini", "--"]
CMD ["gunicorn", "-c", "gunicorn_conf.py", "app.main:app"] CMD ["gunicorn", "-c", "gunicorn_conf.py", "app.main:app"]
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# Stage 4: Worker - Celery Worker (lightweight queues: notify, embed) # Stage 4: Worker - Celery Worker (with ffmpeg for video processing)
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
FROM base AS worker FROM base AS worker
# Switch back to root to install ffmpeg
USER root
# Install ffmpeg for video processing
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Set production environment variables # Set production environment variables
# WORKER_CONCURRENCY can be overridden at runtime (default: 8) # WORKER_CONCURRENCY can be overridden at runtime (default: 8)
ENV APP_ENV=prod \ ENV APP_ENV=prod \
@ -126,6 +148,18 @@ CMD celery -A celery_worker worker \
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
FROM base AS whisper-worker FROM base AS whisper-worker
# Switch back to root to install ffmpeg
USER root
# Install ffmpeg for audio extraction
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Pre-download Whisper medium model during build to avoid cold start delays # Pre-download Whisper medium model during build to avoid cold start delays
# Model is cached in ~/.cache/huggingface/hub (~1.5GB) # Model is cached in ~/.cache/huggingface/hub (~1.5GB)
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('medium', device='cpu', compute_type='int8')" RUN python -c "from faster_whisper import WhisperModel; WhisperModel('medium', device='cpu', compute_type='int8')"


@ -1,55 +0,0 @@
# =============================================================================
# Cloud Run Job image — va-worker
#
# Reuses the multi-stage base from Dockerfile.
# Entrypoint: python -m app.tasks.runner --task <name> --job-id <id>
#
# Build:
# docker build -f backend/Dockerfile.cloudrun -t va-worker backend/
# =============================================================================
# ── Stage 1: Builder ─────────────────────────────────────────────────────────
FROM python:3.11-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential curl \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir poetry==1.8.3
WORKDIR /app
COPY pyproject.toml poetry.lock ./
RUN poetry config virtualenvs.create false \
&& poetry install --no-interaction --no-ansi --only main
# ── Stage 2: Runtime ─────────────────────────────────────────────────────────
FROM python:3.11-slim AS runtime
# ffmpeg required for video rendering tasks
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
tini \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
WORKDIR /app
COPY . .
# Non-root user for security
RUN groupadd -r worker && useradd -r -g worker worker \
&& chown -R worker:worker /app
USER worker
# Cloud Run Jobs: no persistent HTTP port needed.
# Cloud Run passes CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT env vars.
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONPATH=/app
ENTRYPOINT ["tini", "--", "python", "-m", "app.tasks.runner"]
# Args are injected per-execution via Cloud Run Job overrides:
# --task ingest|translate|render|rerender --job-id <id> [--language <lang>] ...
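
The argument shape described in the comment above could be parsed roughly as follows. The real `app.tasks.runner` module is not part of this diff, so this is a hypothetical sketch: `TASKS`, `build_parser`, and `parse_args` are illustrative names, and only the flags spelled out in the comment (`--task`, `--job-id`, `--language`) are modeled; the trailing `...` is left unspecified.

```python
import argparse

# Task names taken from the comment in Dockerfile.cloudrun.
TASKS = ("ingest", "translate", "render", "rerender")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the per-execution arguments Cloud Run Job overrides inject."""
    parser = argparse.ArgumentParser(prog="python -m app.tasks.runner")
    parser.add_argument("--task", required=True, choices=TASKS)
    parser.add_argument("--job-id", required=True)
    # Optional in the comment's notation: [--language <lang>]
    parser.add_argument("--language", default=None)
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse an explicit argument list (handy for testing without touching sys.argv)."""
    return build_parser().parse_args(argv)
```

With `choices=TASKS`, an unknown task name makes `argparse` exit with a usage error before any job code runs.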

127
backend/Dockerfile.old Normal file

@ -0,0 +1,127 @@
# Build stage - Install dependencies and build wheels
FROM python:3.11-slim AS builder
# Install build dependencies
RUN apt-get update && apt-get install -y \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Poetry
RUN pip install poetry==1.8.2
# Set Poetry configuration
ENV POETRY_NO_INTERACTION=1 \
POETRY_VENV_IN_PROJECT=1 \
POETRY_CACHE_DIR=/tmp/poetry_cache
WORKDIR /app
# Copy dependency files
COPY pyproject.toml poetry.lock ./
# Install dependencies into venv
RUN poetry config virtualenvs.in-project true && \
poetry lock --no-update || true && \
poetry install --only=main --no-root && \
rm -rf $POETRY_CACHE_DIR
# Base runtime stage
FROM python:3.11-slim AS base
# Install runtime system dependencies
RUN apt-get update && apt-get install -y \
ffmpeg \
curl \
tini \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Create non-root user
RUN groupadd --gid 1000 app \
&& useradd --uid 1000 --gid app --shell /bin/bash --create-home app
# Set working directory
WORKDIR /app
# Copy virtual environment from builder stage
COPY --from=builder --chown=app:app /app/.venv /app/.venv
# Ensure venv is in PATH
ENV PATH="/app/.venv/bin:$PATH"
# Copy application code
COPY --chown=app:app . .
# Switch to non-root user
USER app
# Production API stage
FROM base AS production
# Set environment variables for production
ENV APP_ENV=prod \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Expose port
EXPOSE 8000
# Use tini as init system for proper signal handling
ENTRYPOINT ["tini", "--"]
# Default command for API server
CMD ["gunicorn", "-c", "gunicorn_conf.py"]
# Worker stage for Celery workers
FROM base AS worker
# Set environment variables for worker
ENV APP_ENV=prod \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
C_FORCE_ROOT=1
# Health check for worker (check if Celery is responding)
HEALTHCHECK --interval=60s --timeout=15s --start-period=10s --retries=3 \
CMD python -c "from celery import Celery; app=Celery('app'); print('Worker healthy')" || exit 1
# Use tini as init system for proper signal handling
ENTRYPOINT ["tini", "--"]
# Default command for Celery worker
CMD ["celery", "-A", "app.tasks", "worker", "--loglevel=info", "--concurrency=1"]
# Development stage with dev dependencies
FROM builder AS development
# Install all dependencies including dev
RUN poetry install --no-root && rm -rf $POETRY_CACHE_DIR
# Install additional dev tools
RUN apt-get update && apt-get install -y \
git \
vim \
&& rm -rf /var/lib/apt/lists/*
# Copy application code
COPY --chown=app:app . .
# Switch to non-root user
USER app
# Set environment for development
ENV APP_ENV=dev \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1
EXPOSE 8000
# Development command with hot reload
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]


@ -22,7 +22,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/* && rm -rf /var/lib/apt/lists/*
# Install Poetry # Install Poetry
RUN pip install --no-cache-dir poetry==2.1.4 RUN pip install --no-cache-dir poetry==1.8.2
# Configure Poetry to not create virtual environment # Configure Poetry to not create virtual environment
ENV POETRY_NO_INTERACTION=1 \ ENV POETRY_NO_INTERACTION=1 \
@ -36,7 +36,7 @@ COPY pyproject.toml poetry.lock ./
# Install dependencies # Install dependencies
RUN poetry config virtualenvs.create false \ RUN poetry config virtualenvs.create false \
&& poetry install --only main --no-root --no-interaction --no-ansi \ && poetry install --only main --no-interaction --no-ansi \
&& rm -rf $POETRY_CACHE_DIR && rm -rf $POETRY_CACHE_DIR
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------

Binary file not shown.

Binary file not shown.


@ -1,28 +1,26 @@
from datetime import datetime, timedelta from datetime import datetime, timedelta
from typing import Optional
from bson import ObjectId from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException, Query, Request, status from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_membership_context
from ...core.database import get_database from ...core.database import get_database
from ...core.dependencies import get_current_user, require_roles from ...core.dependencies import get_current_user, require_roles
from ...core.logging import get_logger from ...core.logging import get_logger
from ...core.security import get_password_hash from ...core.security import get_password_hash, verify_password
from ...models.audit_log import AuditAction, AuditLogQuery, AuditLogResponse
from ...models.user import User, UserRole from ...models.user import User, UserRole
from ...models.audit_log import AuditAction, AuditLogQuery, AuditLogResponse
from ...schemas.auth import ( from ...schemas.auth import (
AdminStatsResponse, AdminStatsResponse,
ChangePasswordRequest,
CreateUserRequest, CreateUserRequest,
ResetPasswordRequest, ResetPasswordRequest,
UpdateUserRequest, UpdateUserRequest,
UserListResponse, UserListResponse,
UserResponse, UserResponse,
) )
from ...services.audit_logger import ( from ...services.audit_logger import audit_logger, log_user_management, log_security_event
audit_logger,
log_user_management,
)
from ...telemetry import app_metrics from ...telemetry import app_metrics
logger = get_logger(__name__) logger = get_logger(__name__)
@ -32,49 +30,29 @@ router = APIRouter(prefix="/admin", tags=["admin"])
@router.get("/users", response_model=UserListResponse) @router.get("/users", response_model=UserListResponse)
async def list_users( async def list_users(
page: int = Query(1, ge=1), page: int = Query(1, ge=1),
size: int = Query(20, ge=1, le=500), size: int = Query(20, ge=1, le=100),
role: str | None = Query(None, description="Single role or comma-separated list, e.g. 'linguist,admin'"), role: Optional[str] = Query(None),
active_only: bool = Query(True), active_only: bool = Query(True),
org_id: str | None = Query(None, description="Filter by org (platform admin only)"),
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""List users with filtering and pagination (admin only)""" """List users with filtering and pagination (admin only)"""
query: dict = {} query = {}
if role: if role:
roles = [r.strip() for r in role.split(",") if r.strip()] query["role"] = role
query["role"] = {"$in": roles} if len(roles) > 1 else roles[0]
if active_only: if active_only:
query["is_active"] = True query["is_active"] = True
if not ctx.is_platform_admin:
# Org-scoped admin: show only users in their org(s) via membership collection
accessible_org_ids = ctx.accessible_org_ids()
if not accessible_org_ids:
return UserListResponse(users=[], total=0, page=page, size=size)
member_ids_cursor = db.memberships.find(
{"organization_id": {"$in": accessible_org_ids}},
{"user_id": 1},
)
member_ids = [doc["user_id"] async for doc in member_ids_cursor]
query["_id"] = {"$in": member_ids}
elif org_id:
# Platform admin filtered to a specific org
member_ids_cursor = db.memberships.find({"organization_id": org_id}, {"user_id": 1})
member_ids = [doc["user_id"] async for doc in member_ids_cursor]
query["_id"] = {"$in": member_ids}
# Get total count # Get total count
total = await db.users.count_documents(query) total = await db.users.count_documents(query)
# Get paginated results # Get paginated results
skip = (page - 1) * size skip = (page - 1) * size
cursor = db.users.find(query, {"hashed_password": 0}).sort("created_at", -1).skip(skip).limit(size) cursor = db.users.find(query, {"hashed_password": 0}).sort("created_at", -1).skip(skip).limit(size)
users = await cursor.to_list(length=size) users = await cursor.to_list(length=size)
user_responses = [] user_responses = []
for user_doc in users: for user_doc in users:
user_responses.append(UserResponse( user_responses.append(UserResponse(
@ -86,9 +64,8 @@ async def list_users(
is_active=user_doc["is_active"], is_active=user_doc["is_active"],
created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(), created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=user_doc.get("pm_client_ids", []), pm_client_ids=user_doc.get("pm_client_ids", []),
languages=user_doc.get("languages", []),
)) ))
return UserListResponse( return UserListResponse(
users=user_responses, users=user_responses,
total=total, total=total,
@ -97,32 +74,6 @@ async def list_users(
) )
@router.get("/brief-assignees", response_model=list[UserResponse])
async def list_brief_assignees(
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Return users who can be assigned a brief (PM, production, admin). Accessible to all brief-creating roles."""
docs = await db.users.find(
{
"role": {"$in": [UserRole.ADMIN.value, UserRole.PROJECT_MANAGER.value, UserRole.PRODUCTION.value]},
"is_active": True,
},
{"hashed_password": 0},
).sort("full_name", 1).to_list(None)
return [UserResponse(
id=str(d["_id"]),
email=d["email"],
full_name=d["full_name"],
role=d["role"],
auth_provider=d.get("auth_provider", "local"),
is_active=d["is_active"],
created_at=d.get("created_at", datetime.utcnow()).isoformat() if d.get("created_at") else None,
pm_client_ids=d.get("pm_client_ids", []),
languages=d.get("languages", []),
) for d in docs]
@router.get("/users/{user_id}", response_model=UserResponse) @router.get("/users/{user_id}", response_model=UserResponse)
async def get_user( async def get_user(
user_id: str, user_id: str,
@ -136,7 +87,7 @@ async def get_user(
status_code=status.HTTP_404_NOT_FOUND, status_code=status.HTTP_404_NOT_FOUND,
detail="User not found" detail="User not found"
) )
return UserResponse( return UserResponse(
id=str(user_doc["_id"]), id=str(user_doc["_id"]),
email=user_doc["email"], email=user_doc["email"],
@ -146,7 +97,6 @@ async def get_user(
is_active=user_doc["is_active"], is_active=user_doc["is_active"],
created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(), created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=user_doc.get("pm_client_ids", []), pm_client_ids=user_doc.get("pm_client_ids", []),
languages=user_doc.get("languages", []),
) )
@ -165,7 +115,7 @@ async def create_user(
status_code=status.HTTP_400_BAD_REQUEST, status_code=status.HTTP_400_BAD_REQUEST,
detail="User with this email already exists" detail="User with this email already exists"
) )
# Create user document # Create user document
user_id = str(ObjectId()) user_id = str(ObjectId())
user_doc = { user_doc = {
@ -179,12 +129,12 @@ async def create_user(
"created_at": datetime.utcnow(), "created_at": datetime.utcnow(),
"updated_at": datetime.utcnow() "updated_at": datetime.utcnow()
} }
await db.users.insert_one(user_doc) await db.users.insert_one(user_doc)
# Record metrics # Record metrics
app_metrics.record_auth_attempt("user_created", user_data.role.value) app_metrics.record_auth_attempt("user_created", user_data.role.value)
logger.info(f"Admin {current_user.id} created user {user_id} with role {user_data.role.value}") logger.info(f"Admin {current_user.id} created user {user_id} with role {user_data.role.value}")
await log_user_management( await log_user_management(
AuditAction.USER_CREATE, user_id, current_user, request, AuditAction.USER_CREATE, user_id, current_user, request,
@ -200,7 +150,6 @@ async def create_user(
is_active=True, is_active=True,
created_at=user_doc["created_at"].isoformat(), created_at=user_doc["created_at"].isoformat(),
pm_client_ids=[], pm_client_ids=[],
languages=[],
) )
@ -220,7 +169,7 @@ async def update_user(
status_code=status.HTTP_404_NOT_FOUND, status_code=status.HTTP_404_NOT_FOUND,
detail="User not found" detail="User not found"
) )
# Check if email is being changed and doesn't conflict # Check if email is being changed and doesn't conflict
if user_update.email and user_update.email != user_doc["email"]: if user_update.email and user_update.email != user_doc["email"]:
existing_user = await db.users.find_one({"email": user_update.email, "_id": {"$ne": user_id}}) existing_user = await db.users.find_one({"email": user_update.email, "_id": {"$ne": user_id}})
@ -229,10 +178,10 @@ async def update_user(
status_code=status.HTTP_400_BAD_REQUEST, status_code=status.HTTP_400_BAD_REQUEST,
detail="Email already in use by another user" detail="Email already in use by another user"
) )
# Build update document # Build update document
update_data = {"updated_at": datetime.utcnow()} update_data = {"updated_at": datetime.utcnow()}
if user_update.email: if user_update.email:
update_data["email"] = user_update.email update_data["email"] = user_update.email
if user_update.full_name: if user_update.full_name:
@ -241,19 +190,19 @@ async def update_user(
update_data["role"] = user_update.role.value update_data["role"] = user_update.role.value
if user_update.is_active is not None: if user_update.is_active is not None:
update_data["is_active"] = user_update.is_active update_data["is_active"] = user_update.is_active
# Update user # Update user
result = await db.users.find_one_and_update( result = await db.users.find_one_and_update(
{"_id": user_id}, {"_id": user_id},
{"$set": update_data}, {"$set": update_data},
return_document=True return_document=True
) )
logger.info(f"Admin {current_user.id} updated user {user_id}") logger.info(f"Admin {current_user.id} updated user {user_id}")
action = AuditAction.USER_ROLE_CHANGE if user_update.role else AuditAction.USER_UPDATE action = AuditAction.USER_ROLE_CHANGE if user_update.role else AuditAction.USER_UPDATE
await log_user_management( await log_user_management(
action, user_id, current_user, request, action, user_id, current_user, request,
details=dict(user_update.dict(exclude_none=True).items()), details={k: v for k, v in user_update.dict(exclude_none=True).items()},
) )
return UserResponse( return UserResponse(
@ -265,7 +214,6 @@ async def update_user(
is_active=result["is_active"], is_active=result["is_active"],
created_at=result.get("created_at", datetime.utcnow()).isoformat(), created_at=result.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=result.get("pm_client_ids", []), pm_client_ids=result.get("pm_client_ids", []),
languages=result.get("languages", []),
) )
@ -282,7 +230,7 @@ async def deactivate_user(
status_code=status.HTTP_400_BAD_REQUEST, status_code=status.HTTP_400_BAD_REQUEST,
detail="Cannot deactivate your own account" detail="Cannot deactivate your own account"
) )
result = await db.users.update_one( result = await db.users.update_one(
{"_id": user_id}, {"_id": user_id},
{ {
@ -292,13 +240,13 @@ async def deactivate_user(
} }
} }
) )
if result.matched_count == 0: if result.matched_count == 0:
raise HTTPException( raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, status_code=status.HTTP_404_NOT_FOUND,
detail="User not found" detail="User not found"
) )
logger.info(f"Admin {current_user.id} deactivated user {user_id}") logger.info(f"Admin {current_user.id} deactivated user {user_id}")
await log_user_management(AuditAction.USER_DEACTIVATE, user_id, current_user, request) await log_user_management(AuditAction.USER_DEACTIVATE, user_id, current_user, request)
@ -316,10 +264,10 @@ async def admin_reset_password(
# Generate temporary password # Generate temporary password
import secrets import secrets
import string import string
temp_password = ''.join(secrets.choice(string.ascii_letters + string.digits) for _ in range(12)) temp_password = ''.join(secrets.choice(string.ascii_letters + string.digits) for _ in range(12))
hashed_password = get_password_hash(temp_password) hashed_password = get_password_hash(temp_password)
result = await db.users.update_one( result = await db.users.update_one(
{"_id": user_id}, {"_id": user_id},
{ {
@ -329,15 +277,15 @@ async def admin_reset_password(
} }
} }
) )
if result.matched_count == 0: if result.matched_count == 0:
raise HTTPException( raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, status_code=status.HTTP_404_NOT_FOUND,
detail="User not found" detail="User not found"
) )
logger.info(f"Admin {current_user.id} reset password for user {user_id}") logger.info(f"Admin {current_user.id} reset password for user {user_id}")
# In production, send email with temp password instead of returning it # In production, send email with temp password instead of returning it
return { return {
"message": "Password reset successfully", "message": "Password reset successfully",
@ -353,23 +301,23 @@ async def get_admin_stats(
"""Get system statistics (production/admin only)""" """Get system statistics (production/admin only)"""
# Get user count # Get user count
total_users = await db.users.count_documents({"is_active": True}) total_users = await db.users.count_documents({"is_active": True})
# Get job counts # Get job counts
total_jobs = await db.jobs.count_documents({}) total_jobs = await db.jobs.count_documents({})
# Get jobs by status # Get jobs by status
pipeline = [ pipeline = [
{"$group": {"_id": "$status", "count": {"$sum": 1}}} {"$group": {"_id": "$status", "count": {"$sum": 1}}}
] ]
status_counts = await db.jobs.aggregate(pipeline).to_list(None) status_counts = await db.jobs.aggregate(pipeline).to_list(None)
jobs_by_status = {item["_id"]: item["count"] for item in status_counts} jobs_by_status = {item["_id"]: item["count"] for item in status_counts}
# Get jobs created today # Get jobs created today
today_start = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0) today_start = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
active_jobs_today = await db.jobs.count_documents({ active_jobs_today = await db.jobs.count_documents({
"created_at": {"$gte": today_start} "created_at": {"$gte": today_start}
}) })
# Calculate average processing time for completed jobs # Calculate average processing time for completed jobs
avg_processing_pipeline = [ avg_processing_pipeline = [
{"$match": {"status": "completed", "created_at": {"$exists": True}, "updated_at": {"$exists": True}}}, {"$match": {"status": "completed", "created_at": {"$exists": True}, "updated_at": {"$exists": True}}},
@ -390,10 +338,10 @@ async def get_admin_stats(
} }
} }
] ]
avg_result = await db.jobs.aggregate(avg_processing_pipeline).to_list(None) avg_result = await db.jobs.aggregate(avg_processing_pipeline).to_list(None)
avg_processing_time = avg_result[0]["avg_processing_time"] if avg_result else 0.0 avg_processing_time = avg_result[0]["avg_processing_time"] if avg_result else 0.0
return AdminStatsResponse( return AdminStatsResponse(
total_users=total_users, total_users=total_users,
total_jobs=total_jobs, total_jobs=total_jobs,
@ -414,7 +362,7 @@ async def detailed_health_check(
"timestamp": datetime.utcnow().isoformat(), "timestamp": datetime.utcnow().isoformat(),
"components": {} "components": {}
} }
# Check MongoDB # Check MongoDB
try: try:
await db.command("ping") await db.command("ping")
@ -422,7 +370,7 @@ async def detailed_health_check(
except Exception as e: except Exception as e:
health_status["components"]["mongodb"] = {"status": "unhealthy", "error": str(e)} health_status["components"]["mongodb"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded" health_status["status"] = "degraded"
# Check Redis (via import to avoid circular dependency) # Check Redis (via import to avoid circular dependency)
try: try:
from ...core.redis import redis_client from ...core.redis import redis_client
@ -434,23 +382,23 @@ async def detailed_health_check(
except Exception as e: except Exception as e:
health_status["components"]["redis"] = {"status": "unhealthy", "error": str(e)} health_status["components"]["redis"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded" health_status["status"] = "degraded"
# Check GCS (basic check) # Check GCS (basic check)
try: try:
from ...services.gcs import gcs_service from ...services.gcs import gcs_service
# Simple check to see if bucket is accessible # Simple check to see if bucket is accessible
await gcs_service.file_exists("health_check_dummy") # This will return False but won't error if bucket accessible bucket_exists = await gcs_service.file_exists("health_check_dummy") # This will return False but won't error if bucket accessible
health_status["components"]["gcs"] = {"status": "healthy"} health_status["components"]["gcs"] = {"status": "healthy"}
except Exception as e: except Exception as e:
health_status["components"]["gcs"] = {"status": "unhealthy", "error": str(e)} health_status["components"]["gcs"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded" health_status["status"] = "degraded"
# Check job queue health # Check job queue health
try: try:
from ...tasks import celery_app from ...tasks import celery_app
inspect = celery_app.control.inspect() inspect = celery_app.control.inspect()
active_tasks = inspect.active() active_tasks = inspect.active()
if active_tasks: if active_tasks:
total_active = sum(len(tasks) for tasks in active_tasks.values()) total_active = sum(len(tasks) for tasks in active_tasks.values())
health_status["components"]["celery"] = { health_status["components"]["celery"] = {
@ -467,7 +415,7 @@ async def detailed_health_check(
except Exception as e: except Exception as e:
health_status["components"]["celery"] = {"status": "unhealthy", "error": str(e)} health_status["components"]["celery"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded" health_status["status"] = "degraded"
return health_status return health_status
@ -479,18 +427,18 @@ async def get_job_statistics(
): ):
"""Get job processing statistics (reviewer/production/admin only)""" """Get job processing statistics (reviewer/production/admin only)"""
since_date = datetime.utcnow() - timedelta(days=days) since_date = datetime.utcnow() - timedelta(days=days)
# Jobs created in period # Jobs created in period
jobs_in_period = await db.jobs.count_documents({ jobs_in_period = await db.jobs.count_documents({
"created_at": {"$gte": since_date} "created_at": {"$gte": since_date}
}) })
# Jobs completed in period # Jobs completed in period
jobs_completed = await db.jobs.count_documents({ jobs_completed = await db.jobs.count_documents({
"status": "completed", "status": "completed",
"updated_at": {"$gte": since_date} "updated_at": {"$gte": since_date}
}) })
# Average processing time for completed jobs # Average processing time for completed jobs
avg_pipeline = [ avg_pipeline = [
{ {
@ -519,12 +467,12 @@ async def get_job_statistics(
} }
} }
] ]
avg_result = await db.jobs.aggregate(avg_pipeline).to_list(None) avg_result = await db.jobs.aggregate(avg_pipeline).to_list(None)
processing_stats = avg_result[0] if avg_result else { processing_stats = avg_result[0] if avg_result else {
"avg_time": 0, "min_time": 0, "max_time": 0 "avg_time": 0, "min_time": 0, "max_time": 0
} }
# Current queue status # Current queue status
current_queue_stats = {} current_queue_stats = {}
pipeline = [ pipeline = [
@ -533,7 +481,7 @@ async def get_job_statistics(
status_counts = await db.jobs.aggregate(pipeline).to_list(None) status_counts = await db.jobs.aggregate(pipeline).to_list(None)
for item in status_counts: for item in status_counts:
current_queue_stats[item["_id"]] = item["count"] current_queue_stats[item["_id"]] = item["count"]
return { return {
"period_days": days, "period_days": days,
"jobs_created": jobs_in_period, "jobs_created": jobs_in_period,
@ -558,7 +506,7 @@ async def admin_force_password_reset(
status_code=status.HTTP_400_BAD_REQUEST, status_code=status.HTTP_400_BAD_REQUEST,
detail="Cannot reset your own password this way" detail="Cannot reset your own password this way"
) )
# Check if user exists # Check if user exists
user_doc = await db.users.find_one({"_id": user_id}) user_doc = await db.users.find_one({"_id": user_id})
if not user_doc: if not user_doc:
@ -566,15 +514,15 @@ async def admin_force_password_reset(
status_code=status.HTTP_404_NOT_FOUND, status_code=status.HTTP_404_NOT_FOUND,
detail="User not found" detail="User not found"
) )
# Generate secure temporary password # Generate secure temporary password
import secrets import secrets
import string import string
temp_password = ''.join(secrets.choice( temp_password = ''.join(secrets.choice(
string.ascii_letters + string.digits + "!@#$%" string.ascii_letters + string.digits + "!@#$%"
) for _ in range(16)) ) for _ in range(16))
# Update password # Update password
await db.users.update_one( await db.users.update_one(
{"_id": user_id}, {"_id": user_id},
@ -585,10 +533,10 @@ async def admin_force_password_reset(
} }
} }
) )
# TODO: In production, send via secure email instead of returning password # TODO: In production, send via secure email instead of returning password
logger.info(f"Admin {current_user.id} reset password for user {user_id}") logger.info(f"Admin {current_user.id} reset password for user {user_id}")
return { return {
"message": "Password reset successfully", "message": "Password reset successfully",
"temporary_password": temp_password, "temporary_password": temp_password,
@ -596,6 +544,47 @@ async def admin_force_password_reset(
} }
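
Review note: the temporary-password step in `admin_force_password_reset` can be tried in isolation. The sketch below is only an illustration, assuming the alphabet visible in the hunk above (letters, digits, and `!@#$%`); the names `ALPHABET` and `generate_temp_password` are hypothetical and do not appear in the repository.

```python
import secrets
import string

# Alphabet copied from the hunk above; ALPHABET itself is a hypothetical name.
ALPHABET = string.ascii_letters + string.digits + "!@#$%"


def generate_temp_password(length: int = 16) -> str:
    """Return a random password drawn from ALPHABET via the secrets module."""
    if length < 1:
        raise ValueError("length must be positive")
    # secrets.choice draws from the OS CSPRNG, unlike random.choice.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Both endpoints in this file still return the generated password in the response body, which the existing comments already flag as something to replace with email delivery.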
@router.get("/audit-logs")
async def get_audit_logs(
job_id: Optional[str] = Query(None),
action: Optional[str] = Query(None),
days: int = Query(7, ge=1, le=90),
page: int = Query(1, ge=1),
size: int = Query(50, ge=1, le=200),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Get audit logs with filtering (production/admin only)"""
query = {
"when": {"$gte": datetime.utcnow() - timedelta(days=days)}
}
if job_id:
query["job_id"] = job_id
if action:
query["action"] = action
# Get total count
total = await db.audit_logs.count_documents(query)
# Get paginated results
skip = (page - 1) * size
cursor = (
db.audit_logs.find(query)
.sort("when", -1)
.skip(skip)
.limit(size)
)
logs = await cursor.to_list(length=size)
return {
"logs": logs,
"total": total,
"page": page,
"size": size,
"period_days": days
}
@router.post("/maintenance/reprocess-job/{job_id}") @router.post("/maintenance/reprocess-job/{job_id}")
async def reprocess_job( async def reprocess_job(
@ -611,7 +600,7 @@ async def reprocess_job(
status_code=status.HTTP_404_NOT_FOUND, status_code=status.HTTP_404_NOT_FOUND,
detail="Job not found" detail="Job not found"
) )
# Reset job to created status for reprocessing # Reset job to created status for reprocessing
await db.jobs.update_one( await db.jobs.update_one(
{"_id": job_id}, {"_id": job_id},
@ -631,7 +620,7 @@ async def reprocess_job(
} }
} }
) )
# Broadcast status update # Broadcast status update
try: try:
from ...services.websocket import connection_manager from ...services.websocket import connection_manager
@ -643,36 +632,36 @@ async def reprocess_job(
) )
except Exception as e: except Exception as e:
logger.warning(f"Failed to broadcast status update for job reset {job_id}: {e}") logger.warning(f"Failed to broadcast status update for job reset {job_id}: {e}")
# Trigger ingestion task # Trigger ingestion task
from ...tasks.ingest_and_ai import ingest_and_ai_task from ...tasks.ingest_and_ai import ingest_and_ai_task
ingest_and_ai_task.delay(job_id) ingest_and_ai_task.delay(job_id)
logger.warning(f"Admin {current_user.id} triggered reprocessing for job {job_id}") logger.warning(f"Admin {current_user.id} triggered reprocessing for job {job_id}")
return {"message": f"Job {job_id} queued for reprocessing"} return {"message": f"Job {job_id} queued for reprocessing"}
@router.get("/audit-logs", response_model=AuditLogResponse) @router.get("/audit-logs", response_model=AuditLogResponse)
async def get_audit_logs_detailed( async def get_audit_logs_detailed(
# Time range # Time range
start_date: datetime | None = Query(None, description="Start date for audit logs"), start_date: Optional[datetime] = Query(None, description="Start date for audit logs"),
end_date: datetime | None = Query(None, description="End date for audit logs"), end_date: Optional[datetime] = Query(None, description="End date for audit logs"),
# Filters # Filters
action: str | None = Query(None, description="Filter by action type"), action: Optional[str] = Query(None, description="Filter by action type"),
severity: str | None = Query(None, description="Filter by severity level"), severity: Optional[str] = Query(None, description="Filter by severity level"),
user_email: str | None = Query(None, description="Filter by user email"), user_email: Optional[str] = Query(None, description="Filter by user email"),
resource_type: str | None = Query(None, description="Filter by resource type"), resource_type: Optional[str] = Query(None, description="Filter by resource type"),
resource_id: str | None = Query(None, description="Filter by resource ID"), resource_id: Optional[str] = Query(None, description="Filter by resource ID"),
success: bool | None = Query(None, description="Filter by success status"), success: Optional[bool] = Query(None, description="Filter by success status"),
# Search # Search
search: str | None = Query(None, description="Search in description and details"), search: Optional[str] = Query(None, description="Search in description and details"),
# Pagination (skip/limit to match frontend AuditLogQuery) # Pagination
skip: int = Query(0, ge=0, description="Number of records to skip"), page: int = Query(1, ge=1, description="Page number"),
limit: int = Query(50, ge=1, le=500, description="Max records to return"), size: int = Query(50, ge=1, le=500, description="Page size"),
# Sorting # Sorting
sort_by: str = Query("timestamp", description="Field to sort by"), sort_by: str = Query("timestamp", description="Field to sort by"),
@ -682,7 +671,26 @@ async def get_audit_logs_detailed(
request: Request = None, request: Request = None,
): ):
"""Get audit logs with filtering and pagination (production/admin only)""" """Get audit logs with filtering and pagination (production/admin only)"""
# Log audit log access
await audit_logger.log_action(
action="admin.audit.access",
description=f"Admin {current_user.email} accessed audit logs",
user=current_user,
request=request,
details={
"filters": {
"start_date": start_date.isoformat() if start_date else None,
"end_date": end_date.isoformat() if end_date else None,
"action": action,
"severity": severity,
"user_email": user_email,
"resource_type": resource_type,
"search": search
}
}
)
# Build query # Build query
query = AuditLogQuery( query = AuditLogQuery(
start_date=start_date, start_date=start_date,
@ -694,12 +702,12 @@ async def get_audit_logs_detailed(
resource_id=resource_id, resource_id=resource_id,
success=success, success=success,
search=search, search=search,
skip=skip, skip=(page - 1) * size,
limit=limit, limit=size,
sort_by=sort_by, sort_by=sort_by,
sort_order=sort_order sort_order=sort_order
) )
return await audit_logger.query_logs(query) return await audit_logger.query_logs(query)
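
Review note: the two sides of this hunk differ in pagination style. One takes `skip`/`limit` directly, the other takes `page`/`size` and converts with `skip=(page - 1) * size`. A minimal sketch of that conversion, assuming 1-based page numbers as in the `Query(1, ge=1)` default; the helper name `page_to_skip_limit` is hypothetical.

```python
def page_to_skip_limit(page: int, size: int) -> tuple[int, int]:
    """Convert a 1-based page number and page size into (skip, limit)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be >= 1")
    # Page 1 skips nothing; each later page skips one full page more.
    return (page - 1) * size, size
```

A client that already sends `skip`/`limit` would need to send `page = skip // size + 1` instead, which is only exact when `skip` is a multiple of `size`.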
@ -708,34 +716,32 @@ async def get_user_audit_logs(
user_id: str, user_id: str,
days: int = Query(30, ge=1, le=365, description="Number of days to look back"), days: int = Query(30, ge=1, le=365, description="Number of days to look back"),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
request: Request = None, request: Request = None,
): ):
"""Get audit logs for a specific user — accepts user ID or email (production/admin only)""" """Get audit logs for a specific user (production/admin only)"""
import re as _re # Validate user_id
try:
# Accept email address: look up user by case-insensitive email match ObjectId(user_id)
resolved_id = user_id except Exception:
if "@" in user_id: raise HTTPException(
user_doc = await db.users.find_one( status_code=status.HTTP_400_BAD_REQUEST,
{"email": _re.compile(f"^{_re.escape(user_id)}$", _re.IGNORECASE)}, detail="Invalid user ID format"
{"_id": 1},
) )
if user_doc:
resolved_id = str(user_doc["_id"]) # Log access to user audit logs
await audit_logger.log_action(
logs = await audit_logger.get_user_activity(resolved_id, days) action="admin.audit.access",
description=f"Admin {current_user.email} accessed user audit logs for {user_id}",
# Fallback: query by email field in audit logs (case-insensitive via audit_logger) user=current_user,
if not logs and "@" in user_id: request=request,
from ...models.audit_log import AuditLogQuery as ALQ resource_type="user",
from ...services.audit_logger import audit_logger as al resource_id=user_id,
q = ALQ(user_email=user_id, limit=1000, sort_by="timestamp", sort_order=-1) details={"days_requested": days}
result = await al.query_logs(q) )
logs = result.logs
logs = await audit_logger.get_user_activity(user_id, days)
return logs return {"logs": logs, "user_id": user_id, "days": days}
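
Review note: one side of this hunk resolves an email address to a user with an anchored, escaped, case-insensitive regex. The sketch below shows that pattern on its own with the standard `re` module; the helper name `exact_email_pattern` is hypothetical.

```python
import re


def exact_email_pattern(email: str) -> "re.Pattern[str]":
    """Build a case-insensitive pattern that matches the whole address only."""
    # re.escape keeps "." and "+" in the address from acting as regex operators.
    return re.compile(f"^{re.escape(email)}$", re.IGNORECASE)
```

Without `re.escape`, the `.` in an address would match any character, so the lookup could return a different user than the one requested.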
@router.get("/audit-logs/security") @router.get("/audit-logs/security")
@ -745,7 +751,7 @@ async def get_security_events(
request: Request = None, request: Request = None,
): ):
"""Get recent security events (production/admin only)""" """Get recent security events (production/admin only)"""
# Log access to security events # Log access to security events
await audit_logger.log_action( await audit_logger.log_action(
action="admin.audit.access", action="admin.audit.access",
@ -754,9 +760,9 @@ async def get_security_events(
request=request, request=request,
details={"hours_requested": hours} details={"hours_requested": hours}
) )
logs = await audit_logger.get_security_events(hours) logs = await audit_logger.get_security_events(hours)
return logs return {"logs": logs, "hours": hours}
@router.delete("/audit-logs/cleanup") @router.delete("/audit-logs/cleanup")
@ -766,7 +772,7 @@ async def cleanup_audit_logs(
request: Request = None, request: Request = None,
): ):
"""Clean up old audit logs (admin only)""" """Clean up old audit logs (admin only)"""
# Log audit cleanup action # Log audit cleanup action
await audit_logger.log_action( await audit_logger.log_action(
action="admin.system.action", action="admin.system.action",
@ -776,9 +782,9 @@ async def cleanup_audit_logs(
details={"retention_days": retention_days}, details={"retention_days": retention_days},
severity="warning" severity="warning"
) )
deleted_count = await audit_logger.cleanup_old_logs(retention_days) deleted_count = await audit_logger.cleanup_old_logs(retention_days)
# Log cleanup completion # Log cleanup completion
await audit_logger.log_action( await audit_logger.log_action(
action="admin.system.action", action="admin.system.action",
@ -790,9 +796,9 @@ async def cleanup_audit_logs(
"deleted_count": deleted_count "deleted_count": deleted_count
} }
) )
return { return {
"message": f"Deleted {deleted_count} audit logs older than {retention_days} days", "message": f"Deleted {deleted_count} audit logs older than {retention_days} days",
"deleted_count": deleted_count, "deleted_count": deleted_count,
"retention_days": retention_days "retention_days": retention_days
} }

View file

@ -1,295 +0,0 @@
"""Admin production endpoints: failure dashboard, bulk retry, queue stats, VTT override."""
from datetime import datetime
import redis.asyncio as aioredis
from fastapi import (
APIRouter,
Depends,
File,
Form,
HTTPException,
Query,
UploadFile,
status,
)
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...core.logging import get_logger
from ...core.redis import get_redis
from ...models.audit_log import AuditAction
from ...models.job import JobStatus, RequestedOutputs
from ...models.user import User, UserRole
from ...schemas.job import JobResponse
from ...services.audit_logger import audit_logger
from ...services.cloud_run_dispatch import dispatch as _cr_dispatch
from ...services.gcs import upload_vtt_to_gcs
logger = get_logger(__name__)
router = APIRouter(prefix="/admin/production", tags=["admin-production"])
_FAILURE_STATUSES = [
JobStatus.PROCESSING_FAILED.value,
JobStatus.TTS_FAILED.value,
JobStatus.RENDER_FAILED.value,
]
_RETRY_CAP = 50
class BulkRetryRequest(BaseModel):
job_ids: list[str]
strategy: str = "auto" # "auto" | "from_scratch"
class BulkRetryResponse(BaseModel):
retried: list[str]
skipped: list[str]
errors: list[dict]
@router.get("/failures", response_model=list[JobResponse])
async def list_failures(
step: str | None = Query(None, description="Filter by failure.step"),
org_id: str | None = Query(None, description="Filter by organization_id"),
limit: int = Query(50, ge=1, le=200),
skip: int = Query(0, ge=0),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all jobs in a failed status, optionally filtered by step and org."""
query: dict = {"status": {"$in": _FAILURE_STATUSES}}
if step:
query["failure.step"] = step
if org_id:
query["organization_id"] = org_id
cursor = db.jobs.find(query).sort("updated_at", -1).skip(skip).limit(limit)
jobs = await cursor.to_list(length=limit)
return [
JobResponse(
id=str(j["_id"]),
title=j["title"],
status=j["status"],
source=j["source"],
requested_outputs=RequestedOutputs(**j["requested_outputs"]),
review=j.get("review", {"notes": "", "history": []}),
outputs=j.get("outputs"),
created_at=j["created_at"].isoformat(),
updated_at=j["updated_at"].isoformat(),
)
for j in jobs
]
@router.post("/bulk-retry", response_model=BulkRetryResponse)
async def bulk_retry(
payload: BulkRetryRequest,
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Retry up to 50 failed jobs in one call."""
if len(payload.job_ids) > _RETRY_CAP:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Cannot retry more than {_RETRY_CAP} jobs at once",
)
retried: list[str] = []
skipped: list[str] = []
errors: list[dict] = []
now = datetime.utcnow()
for job_id in payload.job_ids:
try:
job_doc = await db.jobs.find_one({"_id": job_id})
if not job_doc:
skipped.append(job_id)
continue
if job_doc["status"] not in _FAILURE_STATUSES:
skipped.append(job_id)
continue
failure = job_doc.get("failure") or {}
if payload.strategy == "from_scratch":
step = "ingestion"
else:
step = failure.get("step")
if not step:
step = "tts" if job_doc["status"] == JobStatus.TTS_FAILED.value else "render"
if step in ("ingestion", "ai_processing"):
reset_status = JobStatus.CREATED.value
elif step == "translation":
reset_status = JobStatus.AI_PROCESSING.value
elif step == "tts":
src = job_doc["source"].get("language", "en")
reset_status = (
JobStatus.APPROVED_ENGLISH.value if src == "en" else JobStatus.APPROVED_SOURCE.value
)
elif step == "render":
reset_status = JobStatus.PENDING_QC.value
else:
skipped.append(job_id)
continue
await db.jobs.update_one(
{"_id": job_id},
{
"$set": {"status": reset_status, "error": None, "updated_at": now},
"$inc": {"retry_count": 1},
"$push": {
"review.history": {
"at": now,
"status": f"bulk_retry_{step}",
"by": str(current_user.id),
}
},
},
)
if step in ("ingestion", "ai_processing"):
await _cr_dispatch("ingest", job_id)
elif step in ("translation", "tts"):
await _cr_dispatch("translate", job_id)
elif step == "render":
lang = job_doc.get("last_render_language", "en")
await _cr_dispatch("rerender", job_id, language=lang)
retried.append(job_id)
except Exception as e:
logger.error(f"bulk-retry failed for job {job_id}: {e}")
errors.append({"job_id": job_id, "error": str(e)})
try:
await audit_logger.log(
action=AuditAction.JOB_BULK_RETRY,
user_id=str(current_user.id),
user_email=current_user.email,
user_role=current_user.role.value if current_user.role else None,
resource_type="job",
description=f"Bulk retry {len(retried)} jobs (strategy={payload.strategy})",
details={"retried": retried, "skipped": skipped, "error_count": len(errors)},
)
except Exception as e:
logger.warning(f"Failed to write bulk-retry audit log: {e}")
return BulkRetryResponse(retried=retried, skipped=skipped, errors=errors)
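
Review note: the step-to-status branching in `bulk_retry` can be read as a pure function. The sketch below restates it under assumptions: the status strings are placeholders named after the `JobStatus` members in the code above (the real `.value` strings are not shown in this diff), and `reset_status_for` is a hypothetical helper.

```python
from typing import Optional


def reset_status_for(step: str, source_language: str = "en") -> Optional[str]:
    """Return the status a failed job is reset to, or None if the step is unknown."""
    if step in ("ingestion", "ai_processing"):
        return "CREATED"  # placeholder for JobStatus.CREATED.value
    if step == "translation":
        return "AI_PROCESSING"  # placeholder for JobStatus.AI_PROCESSING.value
    if step == "tts":
        # English sources resume from the English approval state.
        return "APPROVED_ENGLISH" if source_language == "en" else "APPROVED_SOURCE"
    if step == "render":
        return "PENDING_QC"  # placeholder for JobStatus.PENDING_QC.value
    return None  # the endpoint adds such jobs to `skipped`
```

Pulling the mapping out this way would let it be unit-tested without a database or a dispatch call.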
# ---------------------------------------------------------------------------
# PR-7: Queue depth stats
# ---------------------------------------------------------------------------
_CELERY_QUEUES = ["default", "ingest", "tts", "render", "ffmpeg", "whisper", "notify", "embed"]
class QueueStats(BaseModel):
queues: dict[str, int] # queue_name → pending task count
total_pending: int
@router.get("/queue-stats", response_model=QueueStats)
async def get_queue_stats(
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
redis: aioredis.Redis = Depends(get_redis),
):
"""Return pending task counts per Celery queue (via Redis LLEN)."""
counts: dict[str, int] = {}
for q in _CELERY_QUEUES:
try:
n = await redis.llen(q)
counts[q] = n
except Exception:
counts[q] = 0
return QueueStats(queues=counts, total_pending=sum(counts.values()))
# ---------------------------------------------------------------------------
# PR-8: Upload final VTT override — bypass AI, jump to PENDING_QC
# ---------------------------------------------------------------------------
_BYPASSABLE_STATUSES = {
JobStatus.CREATED.value,
JobStatus.INGESTING.value,
JobStatus.AI_PROCESSING.value,
JobStatus.PROCESSING_FAILED.value,
JobStatus.TTS_FAILED.value,
JobStatus.RENDER_FAILED.value,
}
@router.post("/jobs/{job_id}/upload-final-vtt")
async def upload_final_vtt(
job_id: str,
language: str = Form(..., description="BCP-47 language code, e.g. 'en' or 'fr'"),
vtt_file: UploadFile = File(..., description="WebVTT (.vtt) file"),
vtt_type: str = Form("captions", description="'captions' or 'ad'"),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Upload a hand-crafted VTT to override AI output and advance job to PENDING_QC."""
job_doc = await db.jobs.find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
if job_doc["status"] not in _BYPASSABLE_STATUSES:
raise HTTPException(
status_code=status.HTTP_409_CONFLICT,
detail=f"Cannot override VTT when job is in status '{job_doc['status']}'. "
f"Only allowed in: {sorted(_BYPASSABLE_STATUSES)}",
)
if not vtt_file.filename or not vtt_file.filename.endswith(".vtt"):
raise HTTPException(status_code=400, detail="File must be a .vtt file")
vtt_content = (await vtt_file.read()).decode("utf-8")
if not vtt_content.strip().startswith("WEBVTT"):
raise HTTPException(status_code=400, detail="File does not start with WEBVTT header")
if vtt_type not in ("captions", "ad"):
raise HTTPException(status_code=400, detail="vtt_type must be 'captions' or 'ad'")
lang_key = language.replace("-", "_")
field = "captions_vtt_gcs" if vtt_type == "captions" else "ad_vtt_gcs"
gcs_path = f"{job_id}/{lang_key}/{vtt_type}.vtt"
gcs_uri = await upload_vtt_to_gcs(vtt_content, gcs_path)
now = datetime.utcnow()
await db.jobs.update_one(
{"_id": job_id},
{
"$set": {
f"outputs.{lang_key}.{field}": gcs_uri,
"status": JobStatus.PENDING_QC.value,
"updated_at": now,
},
"$push": {
"review.history": {
"at": now,
"status": "manual_vtt_upload",
"by": str(current_user.id),
"note": f"Manual {vtt_type} VTT upload for {language} by {current_user.email}",
}
},
},
)
try:
await audit_logger.log(
action=AuditAction.VTT_EDIT,
user_id=str(current_user.id),
user_email=current_user.email,
user_role=current_user.role.value if current_user.role else None,
resource_type="job",
resource_id=job_id,
description=f"Manual {vtt_type} VTT upload for {language} — job advanced to PENDING_QC",
)
except Exception as e:
logger.warning(f"Failed to write upload-final-vtt audit log: {e}")
return {"status": "ok", "gcs_uri": gcs_uri, "job_status": JobStatus.PENDING_QC.value}
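
Review note: `upload_final_vtt` decodes the upload as `utf-8` and then checks for the `WEBVTT` header. Two edge cases seem worth considering: a file saved with a UTF-8 byte-order mark keeps the BOM after decoding (and `str.strip()` does not remove it), and a non-UTF-8 file raises `UnicodeDecodeError` before any of the 400 responses can be returned. A possible variant is sketched below; `validate_vtt_bytes` is a hypothetical helper, and it raises `ValueError` where the endpoint would presumably raise `HTTPException`.

```python
def validate_vtt_bytes(raw: bytes) -> str:
    """Decode uploaded bytes and confirm they start with the WEBVTT header."""
    try:
        # "utf-8-sig" decodes plain UTF-8 and also drops a leading BOM if present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError("File is not valid UTF-8") from exc
    if not text.strip().startswith("WEBVTT"):
        raise ValueError("File does not start with WEBVTT header")
    return text
```

This checks only the header, as the original does; it does not validate cue timings or the rest of the file.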

View file

@ -1,126 +1,112 @@
import re import re
import secrets
from datetime import datetime from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, Request, Response, status from fastapi import APIRouter, Depends, HTTPException, Request, Response, status
from fastapi.security import HTTPBearer from fastapi.security import HTTPBearer
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorClient, AsyncIOMotorDatabase
from ...core.config import settings from ...core.config import settings
from ...core.database import get_database from ...core.database import get_database
from ...core.logging import get_logger
from ...core.security import ( from ...core.security import (
create_access_token, create_access_token,
create_refresh_token, create_refresh_token,
decode_token, decode_token,
verify_password, verify_password,
) )
from ...models.audit_log import AuditAction, AuditLogSeverity from ...models.user import User, AuthProvider, UserRole
from ...models.user import AuthProvider, User, UserRole
from ...schemas.auth import ( from ...schemas.auth import (
LoginRequest, LoginRequest,
LoginResponse, LoginResponse,
LogoutResponse, LogoutResponse,
RefreshResponse,
MicrosoftLoginRequest, MicrosoftLoginRequest,
MicrosoftLoginResponse, MicrosoftLoginResponse,
RefreshResponse,
) )
from ...services.audit_logger import audit_logger, log_auth_failure, log_auth_success
from ...services.microsoft_auth import ( from ...services.microsoft_auth import (
MicrosoftAuthError,
MicrosoftTokenValidationError,
get_microsoft_auth_service, get_microsoft_auth_service,
MicrosoftTokenValidationError,
MicrosoftAuthError,
) )
from ...services.audit_logger import log_auth_success, log_auth_failure, audit_logger
from ...models.audit_log import AuditAction, AuditLogSeverity
logger = get_logger(__name__)
router = APIRouter(prefix="/auth", tags=["auth"]) router = APIRouter(prefix="/auth", tags=["auth"])
security = HTTPBearer() security = HTTPBearer()
async def _get_user_org_ids(user_id: str, db: AsyncIOMotorDatabase) -> list[str]:
"""Return list of org IDs the user belongs to — used as a JWT hint only."""
cursor = db.memberships.find({"user_id": user_id}, {"organization_id": 1})
memberships = await cursor.to_list(length=200)
return [str(m["organization_id"]) for m in memberships if m.get("organization_id")]
def _set_auth_cookies(response: Response, refresh_token: str) -> str:
"""Set httponly refresh_token cookie and readable csrf_token cookie. Returns the csrf token."""
csrf_token = secrets.token_hex(32)
ttl = settings.jwt_refresh_ttl_days * 24 * 60 * 60
domain = settings.cookie_domain if settings.app_env == "prod" else None
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=domain,
max_age=ttl,
)
response.set_cookie(
key="csrf_token",
value=csrf_token,
httponly=False, # JS-readable for Double Submit Cookie pattern
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=domain,
max_age=ttl,
)
return csrf_token
@router.post("/login", response_model=LoginResponse) @router.post("/login", response_model=LoginResponse)
async def login( async def login(
login_data: LoginRequest, login_data: LoginRequest,
request: Request, request: Request,
response: Response, response: Response,
db: AsyncIOMotorDatabase = Depends(get_database),
): ):
user_doc = await db.users.find_one({"email": login_data.email}) print(f"LOGIN: Starting login for {login_data.email}")
if not user_doc: # Create database connection directly (bypass dependency injection issues)
await log_auth_failure(login_data.email, request, "User not found") client = AsyncIOMotorClient(settings.mongodb_uri)
raise HTTPException( db = client[settings.mongodb_db]
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password", try:
print("LOGIN: Database connection created")
# Find user by email
print("LOGIN: Looking up user in database")
user_doc = await db.users.find_one({"email": login_data.email})
print(f"LOGIN: User lookup complete, found: {user_doc is not None}")
if not user_doc:
await log_auth_failure(login_data.email, request, "User not found")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
user = User(**user_doc)
# Check if user uses Microsoft authentication
if user.auth_provider == AuthProvider.MICROSOFT:
await log_auth_failure(login_data.email, request, "Account uses Microsoft SSO")
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="This account uses Microsoft authentication. Please sign in with Microsoft.",
)
# Verify password
if not user.hashed_password or not verify_password(login_data.password, user.hashed_password):
await log_auth_failure(login_data.email, request, "Invalid password")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
if not user.is_active:
await log_auth_failure(login_data.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
# Create tokens
access_token = create_access_token(subject=str(user.id))
refresh_token = create_refresh_token(subject=str(user.id))
# Set refresh token as HttpOnly cookie
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
) )
user = User(**user_doc) await log_auth_success(user, request)
return LoginResponse(
if user.auth_provider == AuthProvider.MICROSOFT: access_token=access_token,
await log_auth_failure(login_data.email, request, "Account uses Microsoft SSO") user_id=str(user.id),
raise HTTPException( role=user.role,
status_code=status.HTTP_400_BAD_REQUEST,
detail="This account uses Microsoft authentication. Please sign in with Microsoft.",
) )
if not user.hashed_password or not verify_password(login_data.password, user.hashed_password): finally:
await log_auth_failure(login_data.email, request, "Invalid password") # Close database connection
raise HTTPException( client.close()
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
if not user.is_active:
await log_auth_failure(login_data.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
org_ids = await _get_user_org_ids(str(user.id), db)
access_token = create_access_token(subject=str(user.id), org_ids=org_ids)
refresh_token = create_refresh_token(subject=str(user.id))
_set_auth_cookies(response, refresh_token)
await log_auth_success(user, request)
return LoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role,
)
@router.post("/microsoft", response_model=MicrosoftLoginResponse) @router.post("/microsoft", response_model=MicrosoftLoginResponse)
@ -128,84 +114,127 @@ async def microsoft_login(
login_data: MicrosoftLoginRequest, login_data: MicrosoftLoginRequest,
request: Request, request: Request,
response: Response, response: Response,
db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""Authenticate user with Microsoft ID token. """Authenticate user with Microsoft ID token.
This endpoint validates the Microsoft ID token, finds or creates the user, This endpoint validates the Microsoft ID token, finds or creates the user,
and returns JWT tokens for API access. and returns JWT tokens for API access.
""" """
microsoft_auth = get_microsoft_auth_service() print(f"MICROSOFT LOGIN: Starting Microsoft authentication")
# Create database connection
client = AsyncIOMotorClient(settings.mongodb_uri)
db = client[settings.mongodb_db]
try: try:
user_info = await microsoft_auth.validate_token(login_data.id_token) # Validate Microsoft token
except MicrosoftTokenValidationError as e: microsoft_auth = get_microsoft_auth_service()
await log_auth_failure(login_data.id_token[:20] + "", request, f"MS token invalid: {e}") try:
raise HTTPException( user_info = microsoft_auth.validate_token(login_data.id_token)
status_code=status.HTTP_401_UNAUTHORIZED, print(f"MICROSOFT LOGIN: Token validated for {user_info.email}")
detail=f"Microsoft authentication failed: {str(e)}", except MicrosoftTokenValidationError as e:
) from None print(f"MICROSOFT LOGIN ERROR: Token validation failed: {e}")
except MicrosoftAuthError as e: await log_auth_failure(login_data.id_token[:20] + "", request, f"MS token invalid: {e}")
await log_auth_failure("microsoft-sso", request, f"MS auth service error: {e}") raise HTTPException(
raise HTTPException( status_code=status.HTTP_401_UNAUTHORIZED,
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=f"Microsoft authentication failed: {str(e)}",
detail="Microsoft authentication service error", )
) from None except MicrosoftAuthError as e:
print(f"MICROSOFT LOGIN ERROR: Authentication error: {e}")
# Look up by Microsoft-derived ID first — handles email casing changes across logins await log_auth_failure("microsoft-sso", request, f"MS auth service error: {e}")
ms_user_id = f"ms-{user_info.sub[:20]}" raise HTTPException(
user_doc = await db.users.find_one({"_id": ms_user_id}) status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
if not user_doc: detail="Microsoft authentication service error",
# Fall back to case-insensitive email lookup (handles local-to-Microsoft migration)
user_doc = await db.users.find_one(
{"email": {"$regex": f"^{re.escape(user_info.email)}$", "$options": "i"}}
)
if user_doc:
user = User(**user_doc)
if user.auth_provider == AuthProvider.LOCAL:
await db.users.update_one(
{"_id": user_doc["_id"]},
{"$set": {"auth_provider": AuthProvider.MICROSOFT.value, "updated_at": datetime.utcnow()}},
) )
user.auth_provider = AuthProvider.MICROSOFT
else:
new_user = {
"_id": ms_user_id,
"email": user_info.email,
"full_name": user_info.name,
"hashed_password": None,
"role": UserRole.CLIENT.value,
"auth_provider": AuthProvider.MICROSOFT.value,
"is_active": True,
"pm_client_ids": [],
"created_at": datetime.utcnow(),
"updated_at": datetime.utcnow(),
}
await db.users.insert_one(new_user)
user = User(**new_user)
if not user.is_active: # Find or create user
await log_auth_failure(user.email, request, "Account disabled") # Look up by Microsoft-derived ID first — handles email casing changes across logins
raise HTTPException( # (Microsoft can return vadymsamoilenko@... vs VadymSamoilenko@... for the same user)
status_code=status.HTTP_401_UNAUTHORIZED, ms_user_id = f"ms-{user_info.sub[:20]}"
detail="User account is disabled", user_doc = await db.users.find_one({"_id": ms_user_id})
if not user_doc:
# Fall back to case-insensitive email lookup (handles local-to-Microsoft migration)
user_doc = await db.users.find_one(
{"email": {"$regex": f"^{re.escape(user_info.email)}$", "$options": "i"}}
)
if user_doc:
# User exists
user = User(**user_doc)
print(f"MICROSOFT LOGIN: Existing user found: {user.id}")
# Update auth_provider if user is switching from local to Microsoft
if user.auth_provider == AuthProvider.LOCAL:
print(f"MICROSOFT LOGIN: Updating user to Microsoft auth provider")
await db.users.update_one(
{"_id": user_doc["_id"]},
{
"$set": {
"auth_provider": AuthProvider.MICROSOFT.value,
"updated_at": datetime.utcnow()
}
}
)
user.auth_provider = AuthProvider.MICROSOFT
else:
# Create new user with zero org memberships (SaaS model).
# They will see a "no access" landing until an admin invites them.
print(f"MICROSOFT LOGIN: Creating new user for {user_info.email}")
new_user = {
"_id": ms_user_id,
"email": user_info.email,
"full_name": user_info.name,
"hashed_password": None,
"role": UserRole.CLIENT.value,
"auth_provider": AuthProvider.MICROSOFT.value,
"is_active": True,
"pm_client_ids": [],
"created_at": datetime.utcnow(),
"updated_at": datetime.utcnow(),
}
await db.users.insert_one(new_user)
user = User(**new_user)
print(f"MICROSOFT LOGIN: New user created (zero memberships): {user.id}")
# Check if user is active
if not user.is_active:
await log_auth_failure(user.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
# Create JWT tokens
access_token = create_access_token(subject=str(user.id))
refresh_token = create_refresh_token(subject=str(user.id))
# Set refresh token as HttpOnly cookie
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
) )
org_ids = await _get_user_org_ids(str(user.id), db) print(f"MICROSOFT LOGIN: Authentication successful for {user.email}")
access_token = create_access_token(subject=str(user.id), org_ids=org_ids) await log_auth_success(user, request)
refresh_token = create_refresh_token(subject=str(user.id)) return MicrosoftLoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role if isinstance(user.role, str) else user.role.value,
email=user.email,
full_name=user.full_name,
auth_provider=user.auth_provider,
)
_set_auth_cookies(response, refresh_token) finally:
# Close database connection
await log_auth_success(user, request) client.close()
return MicrosoftLoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role if isinstance(user.role, str) else user.role.value,
email=user.email,
full_name=user.full_name,
auth_provider=user.auth_provider,
)
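The fallback email lookup in the Microsoft login hunk above builds an anchored, escaped, case-insensitive pattern. A minimal sketch of that filter follows. `email_filter` and `matches` are hypothetical helper names (the diff inlines the expression in the handler), and Python's `re` module is used only to approximate how the server-side regex would behave.

```python
import re


def email_filter(email: str) -> dict:
    """Build a MongoDB-style filter matching `email` exactly, ignoring case.

    Hypothetical helper: the diff writes this expression inline.
    """
    # re.escape keeps characters such as "." and "+" from acting as regex operators.
    return {"email": {"$regex": f"^{re.escape(email)}$", "$options": "i"}}


def matches(stored_email: str, login_email: str) -> bool:
    """Approximate the database match locally with Python's re module."""
    pattern = email_filter(login_email)["email"]["$regex"]
    return re.fullmatch(pattern, stored_email, flags=re.IGNORECASE) is not None
```

Without the escaping step, a `.` in the address would match any character, so a different address could be picked up as the "same" user.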
@router.post("/refresh", response_model=RefreshResponse) @router.post("/refresh", response_model=RefreshResponse)
@ -215,32 +244,29 @@ async def refresh_token(
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
refresh_token = request.cookies.get("refresh_token") refresh_token = request.cookies.get("refresh_token")
print(f"🔍 REFRESH DEBUG: Cookie exists: {bool(refresh_token)}")
if not refresh_token: if not refresh_token:
print("🚨 REFRESH ERROR: No refresh token in cookies")
raise HTTPException( raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, status_code=status.HTTP_401_UNAUTHORIZED,
detail="Refresh token not found", detail="Refresh token not found",
) )
# CSRF protection: Double Submit Cookie pattern
csrf_cookie = request.cookies.get("csrf_token")
csrf_header = request.headers.get("X-CSRF-Token")
if csrf_cookie and (not csrf_header or csrf_header != csrf_cookie):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="CSRF token mismatch",
)
try: try:
print(f"🔍 REFRESH DEBUG: Attempting to decode token...")
payload = decode_token(refresh_token) payload = decode_token(refresh_token)
print(f"🔍 REFRESH DEBUG: Token decoded successfully, type={payload.get('type')}")
if payload.get("type") != "refresh": if payload.get("type") != "refresh":
print(f"🚨 REFRESH ERROR: Wrong token type: {payload.get('type')}")
raise HTTPException( raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid token type", detail="Invalid token type",
) )
user_id = payload.get("sub") user_id = payload.get("sub")
print(f"🔍 REFRESH DEBUG: User ID from token: {user_id}")
if not user_id: if not user_id:
raise HTTPException( raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, status_code=status.HTTP_401_UNAUTHORIZED,
@ -262,15 +288,22 @@ async def refresh_token(
detail="User account is disabled", detail="User account is disabled",
) )
# Create new tokens (include org_ids claim for prefilter hint) # Create new tokens
_org_ids = await _get_user_org_ids(user_id, db) new_access_token = create_access_token(subject=user_id)
new_access_token = create_access_token(subject=user_id, org_ids=_org_ids)
new_refresh_token = create_refresh_token(subject=user_id) new_refresh_token = create_refresh_token(subject=user_id)
# Rotate both refresh and CSRF cookies # Update refresh token cookie
_set_auth_cookies(response, new_refresh_token) response.set_cookie(
key="refresh_token",
value=new_refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
)
logger.info("Token refresh successful for user %s", user_id) print(f"🔍 REFRESH DEBUG: Refresh successful for user {user_id}")
return RefreshResponse( return RefreshResponse(
access_token=new_access_token, access_token=new_access_token,
user_id=user_id, user_id=user_id,
@ -279,15 +312,14 @@ async def refresh_token(
full_name=user.full_name full_name=user.full_name
) )
except HTTPException:
raise
except Exception as e: except Exception as e:
print(f"🚨 REFRESH ERROR: Exception during refresh: {type(e).__name__}: {e}")
import traceback import traceback
logger.exception("Refresh token error: %s\n%s", type(e).__name__, traceback.format_exc()) print(f"Traceback:\n{traceback.format_exc()}")
raise HTTPException( raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid refresh token", detail=f"Invalid refresh token: {str(e)}",
) from None )
@router.post("/logout", response_model=LogoutResponse) @router.post("/logout", response_model=LogoutResponse)
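The `/refresh` hunk above checks a Double Submit Cookie: the value of the `csrf_token` cookie must be echoed back in the `X-CSRF-Token` header. The sketch below isolates that comparison so it can be read and tested on its own. The function names are hypothetical, and the constant-time comparison is a swapped-in choice; the diff itself uses a plain `!=`.

```python
import hmac
import secrets
from typing import Optional


def issue_csrf_token() -> str:
    """Generate a random token, as _set_auth_cookies does with secrets.token_hex(32)."""
    return secrets.token_hex(32)


def csrf_ok(cookie_value: Optional[str], header_value: Optional[str]) -> bool:
    """Return True when the request passes the double-submit check.

    Mirrors the condition in the diff: the check only applies when the
    csrf_token cookie is present, so a request without the cookie passes.
    """
    if not cookie_value:
        return True
    if not header_value:
        return False
    # Assumption: compare in constant time; encoding to bytes avoids a
    # TypeError if a client sends non-ASCII characters in the header.
    return hmac.compare_digest(cookie_value.encode(), header_value.encode())
```

A handler would translate a `False` result into the 403 response shown in the diff. Whether requests that carry no `csrf_token` cookie at all should keep passing is a policy decision this sketch leaves unchanged.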


@ -1,245 +0,0 @@
"""Job Brief CRUD endpoints."""
from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, assert_user_in_org, get_membership_context
from ...core.database import get_database
from ...core.logging import get_logger
from ...models.audit_log import AuditAction
from ...models.job_brief import (
BriefStatus,
JobBriefCreate,
JobBriefResponse,
JobBriefUpdate,
)
from ...models.organization import OrgRole
from ...services.audit_logger import audit_logger
logger = get_logger(__name__)
router = APIRouter(prefix="/briefs", tags=["briefs"])
def _doc_to_response(doc: dict) -> JobBriefResponse:
return JobBriefResponse(
id=str(doc["_id"]),
organization_id=doc["organization_id"],
project_id=doc.get("project_id"),
title=doc["title"],
description=doc.get("description"),
requested_outputs=doc["requested_outputs"],
languages=doc.get("languages", []),
deadline=doc.get("deadline"),
status=doc["status"],
created_by=doc["created_by"],
assignee_id=doc.get("assignee_id"),
job_id=doc.get("job_id"),
created_at=doc["created_at"].isoformat(),
updated_at=doc["updated_at"].isoformat(),
submitted_at=doc["submitted_at"].isoformat() if doc.get("submitted_at") else None,
approved_by=doc.get("approved_by"),
)
@router.get("", response_model=list[JobBriefResponse])
async def list_briefs(
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
org_ids = [m.organization_id for m in ctx.memberships] if hasattr(ctx, "memberships") else []
if ctx.is_platform_admin:
query: dict = {}
elif org_ids:
query = {"organization_id": {"$in": org_ids}}
else:
raise HTTPException(status_code=403, detail="No org memberships")
cursor = db.job_briefs.find(query).sort("created_at", -1).limit(100)
docs = await cursor.to_list(length=100)
return [_doc_to_response(d) for d in docs]
@router.post("", response_model=JobBriefResponse, status_code=status.HTTP_201_CREATED)
async def create_brief(
payload: JobBriefCreate,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
# Resolve org from project if not directly identifiable
org_id: str | None = None
if payload.project_id:
project = await db.projects.find_one({"_id": payload.project_id}, {"client_id": 1})
if project:
org_id = project.get("client_id")
if not org_id:
# Use first membership org if user has only one (or admin)
if ctx.is_platform_admin:
raise HTTPException(status_code=400, detail="Admin must supply project_id or org_id cannot be inferred")
memberships = [m for m in (ctx.memberships if hasattr(ctx, "memberships") else [])
if ctx.can_access_org(m.organization_id, OrgRole.MANAGER)]
if len(memberships) == 1:
org_id = memberships[0].organization_id
else:
raise HTTPException(status_code=400, detail="Cannot infer organization; supply project_id")
assert_user_in_org(ctx, org_id, OrgRole.MANAGER)
now = datetime.utcnow()
doc = {
"_id": f"brief_{now.strftime('%Y%m%d%H%M%S%f')}_{str(ctx.user.id)[-6:]}",
"organization_id": org_id,
"project_id": payload.project_id,
"title": payload.title,
"description": payload.description,
"requested_outputs": payload.requested_outputs.model_dump(),
"languages": payload.languages,
"deadline": payload.deadline,
"assignee_id": payload.assignee_id,
"status": BriefStatus.DRAFT.value,
"created_by": str(ctx.user.id),
"job_id": None,
"created_at": now,
"updated_at": now,
"submitted_at": None,
"approved_by": None,
}
await db.job_briefs.insert_one(doc)
await audit_logger.log_action(
action=AuditAction.BRIEF_CREATE,
description=f"Brief '{payload.title}' created",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=str(doc["_id"]),
details={"title": payload.title, "organization_id": org_id},
)
return _doc_to_response(doc)
@router.get("/{brief_id}", response_model=JobBriefResponse)
async def get_brief(
brief_id: str,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.VIEWER)
return _doc_to_response(doc)
@router.patch("/{brief_id}", response_model=JobBriefResponse)
async def update_brief(
brief_id: str,
payload: JobBriefUpdate,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.MANAGER)
if doc["status"] != BriefStatus.DRAFT.value:
raise HTTPException(status_code=400, detail="Only DRAFT briefs can be updated")
updates: dict = {"updated_at": datetime.utcnow()}
if payload.title is not None:
updates["title"] = payload.title
if payload.description is not None:
updates["description"] = payload.description
if payload.requested_outputs is not None:
updates["requested_outputs"] = payload.requested_outputs.model_dump()
if payload.languages is not None:
updates["languages"] = payload.languages
if payload.deadline is not None:
updates["deadline"] = payload.deadline
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{"$set": updates},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_UPDATE,
description=f"Brief '{brief_id}' updated",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"fields_updated": list(updates.keys())},
)
return _doc_to_response(result)
@router.post("/{brief_id}/submit", response_model=JobBriefResponse)
async def submit_brief(
brief_id: str,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.MANAGER)
if doc["status"] != BriefStatus.DRAFT.value:
raise HTTPException(status_code=400, detail="Only DRAFT briefs can be submitted")
now = datetime.utcnow()
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{"$set": {"status": BriefStatus.SUBMITTED.value, "submitted_at": now, "updated_at": now}},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_SUBMIT,
description=f"Brief '{brief_id}' submitted for review",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"organization_id": result.get("organization_id")},
)
return _doc_to_response(result)
@router.post("/{brief_id}/approve", response_model=JobBriefResponse)
async def approve_brief(
brief_id: str,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.ADMIN)
if doc["status"] != BriefStatus.SUBMITTED.value:
raise HTTPException(status_code=400, detail="Only SUBMITTED briefs can be approved")
now = datetime.utcnow()
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{
"$set": {
"status": BriefStatus.APPROVED.value,
"approved_by": str(ctx.user.id),
"updated_at": now,
}
},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_APPROVE,
description=f"Brief '{brief_id}' approved",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"organization_id": result.get("organization_id")},
)
return _doc_to_response(result)
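The submit and approve endpoints in the removed briefs module each guard on the current status before writing the next one, which amounts to a small state machine: DRAFT to SUBMITTED to APPROVED. A hedged sketch of that rule as a lookup table follows. The enum member names come from the diff, but the string values, `TRANSITIONS`, and `apply_action` are assumptions made for illustration.

```python
from enum import Enum


class BriefStatus(str, Enum):
    # Member names appear in the diff; the string values are assumed.
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"


# action -> (status required before the action, status set by the action)
TRANSITIONS = {
    "submit": (BriefStatus.DRAFT, BriefStatus.SUBMITTED),
    "approve": (BriefStatus.SUBMITTED, BriefStatus.APPROVED),
}


def apply_action(current: BriefStatus, action: str) -> BriefStatus:
    """Return the status after `action`, or raise if the transition is not allowed."""
    required, result = TRANSITIONS[action]
    if current is not required:
        raise ValueError(
            f"'{action}' requires status {required.name}, got {current.name}"
        )
    return result
```

Keeping the transitions in one table makes the allowed workflow visible at a glance; the endpoints in the diff encode the same rule as separate `if` checks that raise HTTP 400.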


@ -9,16 +9,15 @@ Access rules:
- List projects (read) Admin, PM, or any team member of the client - List projects (read) Admin, PM, or any team member of the client
""" """
from datetime import UTC, datetime from datetime import datetime, timezone
from bson import ObjectId from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException, Request from fastapi import APIRouter, Depends, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel from pydantic import BaseModel
from ...core.database import get_database from ...core.database import get_database
from ...core.dependencies import get_current_user, require_roles from ...core.dependencies import get_current_user, require_pm_for_client, require_roles
from ...models.audit_log import AuditAction
from ...models.client import ( from ...models.client import (
Client, Client,
ClientCreate, ClientCreate,
@ -31,7 +30,6 @@ from ...models.client import (
TeamUpdate, TeamUpdate,
) )
from ...models.user import User, UserRole from ...models.user import User, UserRole
from ...services.audit_logger import audit_logger
router = APIRouter(prefix="/clients", tags=["clients"]) router = APIRouter(prefix="/clients", tags=["clients"])
@ -41,7 +39,7 @@ router = APIRouter(prefix="/clients", tags=["clients"])
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
def _now() -> datetime: def _now() -> datetime:
return datetime.now(UTC) return datetime.now(timezone.utc)
async def _get_client_or_404(client_id: str, db: AsyncIOMotorDatabase) -> dict: async def _get_client_or_404(client_id: str, db: AsyncIOMotorDatabase) -> dict:
@ -93,9 +91,6 @@ def _project_from_doc(doc: dict) -> Project:
name=doc["name"], name=doc["name"],
client_id=doc["client_id"], client_id=doc["client_id"],
is_active=doc.get("is_active", True), is_active=doc.get("is_active", True),
default_languages=doc.get("default_languages", []),
default_linguist_id=doc.get("default_linguist_id"),
default_reviewer_id=doc.get("default_reviewer_id"),
created_at=doc.get("created_at"), created_at=doc.get("created_at"),
updated_at=doc.get("updated_at"), updated_at=doc.get("updated_at"),
) )
@ -123,7 +118,6 @@ async def list_clients(
@router.post("", response_model=Client) @router.post("", response_model=Client)
async def create_client( async def create_client(
body: ClientCreate, body: ClientCreate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
@ -140,18 +134,7 @@ async def create_client(
"updated_at": now, "updated_at": now,
}) })
doc = await db.clients.find_one({"_id": client_id}) doc = await db.clients.find_one({"_id": client_id})
client = _client_from_doc(doc) return _client_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_CREATE,
description=f"Client '{client.name}' created",
user=current_user,
request=request,
resource_type="client",
resource_id=str(client.id),
resource_name=client.name,
details={"slug": client.slug},
)
return client
@router.get("/{client_id}", response_model=Client) @router.get("/{client_id}", response_model=Client)
@ -172,12 +155,11 @@ async def get_client(
async def update_client( async def update_client(
client_id: str, client_id: str,
body: ClientUpdate, body: ClientUpdate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
update: dict = dict(body.model_dump(exclude_none=True).items()) update: dict = {k: v for k, v in body.model_dump(exclude_none=True).items()}
if not update: if not update:
raise HTTPException(status_code=422, detail="No fields to update") raise HTTPException(status_code=422, detail="No fields to update")
if "slug" in update and await db.clients.find_one({"slug": update["slug"], "_id": {"$ne": client_id}}): if "slug" in update and await db.clients.find_one({"slug": update["slug"], "_id": {"$ne": client_id}}):
@ -185,39 +167,17 @@ async def update_client(
update["updated_at"] = _now() update["updated_at"] = _now()
await db.clients.update_one({"_id": client_id}, {"$set": update}) await db.clients.update_one({"_id": client_id}, {"$set": update})
doc = await db.clients.find_one({"_id": client_id}) doc = await db.clients.find_one({"_id": client_id})
client = _client_from_doc(doc) return _client_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_UPDATE,
description=f"Client '{client.name}' updated",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client.name,
details={"fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return client
@router.delete("/{client_id}", status_code=204) @router.delete("/{client_id}", status_code=204)
async def deactivate_client( async def deactivate_client(
client_id: str, client_id: str,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await db.clients.update_one({"_id": client_id}, {"$set": {"is_active": False, "updated_at": _now()}}) await db.clients.update_one({"_id": client_id}, {"$set": {"is_active": False, "updated_at": _now()}})
await audit_logger.log_action(
action=AuditAction.CLIENT_DEACTIVATE,
description=f"Client '{doc['name']}' deactivated",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=doc["name"],
details={"was_active": doc.get("is_active", True)},
)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@ -232,11 +192,10 @@ class AssignPMRequest(BaseModel):
async def assign_pm( async def assign_pm(
client_id: str, client_id: str,
body: AssignPMRequest, body: AssignPMRequest,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
user_doc = await db.users.find_one({"_id": body.user_id}) user_doc = await db.users.find_one({"_id": body.user_id})
if not user_doc: if not user_doc:
raise HTTPException(status_code=404, detail="User not found") raise HTTPException(status_code=404, detail="User not found")
@ -247,28 +206,16 @@ async def assign_pm(
"$set": {"role": UserRole.PROJECT_MANAGER.value, "updated_at": _now()}, "$set": {"role": UserRole.PROJECT_MANAGER.value, "updated_at": _now()},
}, },
) )
await audit_logger.log_action(
action=AuditAction.CLIENT_PM_ASSIGN,
description=f"PM '{user_doc.get('email', body.user_id)}' assigned to client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"pm_user_id": body.user_id, "pm_email": user_doc.get("email")},
)
@router.delete("/{client_id}/pm/{user_id}", status_code=204) @router.delete("/{client_id}/pm/{user_id}", status_code=204)
async def remove_pm( async def remove_pm(
client_id: str, client_id: str,
user_id: str, user_id: str,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
pm_doc = await db.users.find_one({"_id": user_id})
await db.users.update_one( await db.users.update_one(
{"_id": user_id}, {"_id": user_id},
{"$pull": {"pm_client_ids": client_id}, "$set": {"updated_at": _now()}}, {"$pull": {"pm_client_ids": client_id}, "$set": {"updated_at": _now()}},
@ -280,16 +227,6 @@ async def remove_pm(
{"_id": user_id}, {"_id": user_id},
{"$set": {"role": UserRole.CLIENT.value, "updated_at": _now()}}, {"$set": {"role": UserRole.CLIENT.value, "updated_at": _now()}},
) )
await audit_logger.log_action(
action=AuditAction.CLIENT_PM_REMOVE,
description=f"PM '{pm_doc.get('email', user_id) if pm_doc else user_id}' removed from client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"pm_user_id": user_id, "pm_email": pm_doc.get("email") if pm_doc else None},
)
@router.get("/{client_id}/pm", response_model=list[dict]) @router.get("/{client_id}/pm", response_model=list[dict])
@ -326,11 +263,10 @@ async def list_teams(
async def create_team( async def create_team(
client_id: str, client_id: str,
body: TeamCreate, body: TeamCreate,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
now = _now() now = _now()
team_id = str(ObjectId()) team_id = str(ObjectId())
@ -343,18 +279,7 @@ async def create_team(
"updated_at": now, "updated_at": now,
}) })
doc = await db.teams.find_one({"_id": team_id}) doc = await db.teams.find_one({"_id": team_id})
team = _team_from_doc(doc) return _team_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_CREATE,
description=f"Team '{team.name}' created for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team.name},
)
return team
@router.patch("/{client_id}/teams/{team_id}", response_model=Team) @router.patch("/{client_id}/teams/{team_id}", response_model=Team)
@ -362,55 +287,32 @@ async def update_team(
client_id: str, client_id: str,
team_id: str, team_id: str,
body: TeamUpdate, body: TeamUpdate,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
await _get_team_or_404(team_id, client_id, db) await _get_team_or_404(team_id, client_id, db)
update = dict(body.model_dump(exclude_none=True).items()) update = {k: v for k, v in body.model_dump(exclude_none=True).items()}
if not update: if not update:
raise HTTPException(status_code=422, detail="No fields to update") raise HTTPException(status_code=422, detail="No fields to update")
update["updated_at"] = _now() update["updated_at"] = _now()
await db.teams.update_one({"_id": team_id}, {"$set": update}) await db.teams.update_one({"_id": team_id}, {"$set": update})
doc = await db.teams.find_one({"_id": team_id}) doc = await db.teams.find_one({"_id": team_id})
team = _team_from_doc(doc) return _team_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_UPDATE,
description=f"Team '{team.name}' updated for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team.name, "fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return team
@router.delete("/{client_id}/teams/{team_id}", status_code=204) @router.delete("/{client_id}/teams/{team_id}", status_code=204)
async def delete_team( async def delete_team(
client_id: str, client_id: str,
team_id: str, team_id: str,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
team_doc = await _get_team_or_404(team_id, client_id, db) await _get_team_or_404(team_id, client_id, db)
await db.teams.delete_one({"_id": team_id}) await db.teams.delete_one({"_id": team_id})
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_DELETE,
description=f"Team '{team_doc['name']}' deleted from client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"]},
)
# Team membership # Team membership
@ -424,35 +326,18 @@ async def add_team_member(
client_id: str, client_id: str,
team_id: str, team_id: str,
body: AddMemberRequest, body: AddMemberRequest,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
team_doc = await _get_team_or_404(team_id, client_id, db) await _get_team_or_404(team_id, client_id, db)
member_doc = await db.users.find_one({"_id": body.user_id}) if not await db.users.find_one({"_id": body.user_id}):
if not member_doc:
raise HTTPException(status_code=404, detail="User not found") raise HTTPException(status_code=404, detail="User not found")
# Write to both Team.member_user_ids (legacy) and Membership.team_ids (MT-17)
await db.teams.update_one( await db.teams.update_one(
{"_id": team_id}, {"_id": team_id},
{"$addToSet": {"member_user_ids": body.user_id}, "$set": {"updated_at": _now()}}, {"$addToSet": {"member_user_ids": body.user_id}, "$set": {"updated_at": _now()}},
) )
await db.memberships.update_one(
{"user_id": body.user_id, "organization_id": client_id},
{"$addToSet": {"team_ids": team_id}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_MEMBER_ADD,
description=f"User '{member_doc.get('email', body.user_id)}' added to team '{team_doc['name']}' of client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"], "member_user_id": body.user_id, "member_email": member_doc.get("email")},
)
@router.delete("/{client_id}/teams/{team_id}/members/{user_id}", status_code=204) @router.delete("/{client_id}/teams/{team_id}/members/{user_id}", status_code=204)
@ -460,56 +345,22 @@ async def remove_team_member(
client_id: str, client_id: str,
team_id: str, team_id: str,
user_id: str, user_id: str,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
team_doc = await _get_team_or_404(team_id, client_id, db) await _get_team_or_404(team_id, client_id, db)
member_doc = await db.users.find_one({"_id": user_id})
await db.teams.update_one( await db.teams.update_one(
{"_id": team_id}, {"_id": team_id},
{"$pull": {"member_user_ids": user_id}, "$set": {"updated_at": _now()}}, {"$pull": {"member_user_ids": user_id}, "$set": {"updated_at": _now()}},
) )
await db.memberships.update_one(
{"user_id": user_id, "organization_id": client_id},
{"$pull": {"team_ids": team_id}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_MEMBER_REMOVE,
description=f"User '{member_doc.get('email', user_id) if member_doc else user_id}' removed from team '{team_doc['name']}' of client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"], "member_user_id": user_id, "member_email": member_doc.get("email") if member_doc else None},
)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Project endpoints # Project endpoints
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@router.get("/all-projects", response_model=list[Project])
async def list_all_projects(
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Return all active projects accessible to the current user (across all clients)."""
if current_user.role in (UserRole.ADMIN, UserRole.PRODUCTION, UserRole.PROJECT_MANAGER):
docs = await db.projects.find({"is_active": True}).to_list(None)
else:
accessible_client_ids = await _get_accessible_client_ids(current_user, db)
if not accessible_client_ids:
return []
docs = await db.projects.find(
{"client_id": {"$in": accessible_client_ids}, "is_active": True}
).to_list(None)
return [_project_from_doc(d) for d in docs]
@router.get("/{client_id}/projects", response_model=list[Project]) @router.get("/{client_id}/projects", response_model=list[Project])
async def list_projects( async def list_projects(
client_id: str, client_id: str,
@ -526,12 +377,11 @@ async def list_projects(
async def create_project( async def create_project(
client_id: str, client_id: str,
body: ProjectCreate, body: ProjectCreate,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_client_member(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
now = _now() now = _now()
project_id = str(ObjectId()) project_id = str(ObjectId())
await db.projects.insert_one({ await db.projects.insert_one({
@ -539,25 +389,11 @@ async def create_project(
"name": body.name, "name": body.name,
"client_id": client_id, "client_id": client_id,
"is_active": True, "is_active": True,
"default_languages": body.default_languages,
"default_linguist_id": body.default_linguist_id,
"default_reviewer_id": body.default_reviewer_id,
"created_at": now, "created_at": now,
"updated_at": now, "updated_at": now,
}) })
doc = await db.projects.find_one({"_id": project_id}) doc = await db.projects.find_one({"_id": project_id})
project = _project_from_doc(doc) return _project_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_CREATE,
description=f"Project '{project.name}' created for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project.name, "default_languages": body.default_languages},
)
return project
@router.patch("/{client_id}/projects/{project_id}", response_model=Project) @router.patch("/{client_id}/projects/{project_id}", response_model=Project)
@ -565,58 +401,35 @@ async def update_project(
client_id: str, client_id: str,
project_id: str, project_id: str,
body: ProjectUpdate, body: ProjectUpdate,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
await _get_project_or_404(project_id, client_id, db) await _get_project_or_404(project_id, client_id, db)
update = dict(body.model_dump(exclude_none=True).items()) update = {k: v for k, v in body.model_dump(exclude_none=True).items()}
if not update: if not update:
raise HTTPException(status_code=422, detail="No fields to update") raise HTTPException(status_code=422, detail="No fields to update")
update["updated_at"] = _now() update["updated_at"] = _now()
await db.projects.update_one({"_id": project_id}, {"$set": update}) await db.projects.update_one({"_id": project_id}, {"$set": update})
doc = await db.projects.find_one({"_id": project_id}) doc = await db.projects.find_one({"_id": project_id})
project = _project_from_doc(doc) return _project_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_UPDATE,
description=f"Project '{project.name}' updated for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project.name, "fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return project
@router.delete("/{client_id}/projects/{project_id}", status_code=204) @router.delete("/{client_id}/projects/{project_id}", status_code=204)
async def archive_project( async def archive_project(
client_id: str, client_id: str,
project_id: str, project_id: str,
request: Request,
current_user: User = Depends(get_current_user), current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
client_doc = await _get_client_or_404(client_id, db) await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db) await _assert_pm_or_admin(current_user, client_id, db)
project_doc = await _get_project_or_404(project_id, client_id, db) await _get_project_or_404(project_id, client_id, db)
await db.projects.update_one( await db.projects.update_one(
{"_id": project_id}, {"_id": project_id},
{"$set": {"is_active": False, "updated_at": _now()}}, {"$set": {"is_active": False, "updated_at": _now()}},
) )
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_ARCHIVE,
description=f"Project '{project_doc['name']}' archived for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project_doc["name"]},
)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@ -636,37 +449,6 @@ async def _assert_pm_or_admin(user: User, client_id: str, db: AsyncIOMotorDataba
raise HTTPException(status_code=403, detail="Not a manager for this client") raise HTTPException(status_code=403, detail="Not a manager for this client")
async def _assert_pm_or_client_member(user: User, client_id: str, db: AsyncIOMotorDatabase) -> None:
"""Allow PM/ADMIN/PROD or any org member (CLIENT role) with membership in this client's org."""
if user.role in (UserRole.ADMIN, UserRole.PRODUCTION):
return
if user.role == UserRole.PROJECT_MANAGER:
if client_id in (user.pm_client_ids or []):
return
mem = await db.memberships.find_one({"user_id": str(user.id), "organization_id": client_id})
if mem and mem.get("role_in_org") in ("owner", "admin", "manager"):
return
# Allow CLIENT users who are members of the org
if user.role == UserRole.CLIENT:
mem = await db.memberships.find_one({"user_id": str(user.id), "organization_id": client_id})
if mem:
return
raise HTTPException(status_code=403, detail="Not authorized to create projects for this client")
async def _get_accessible_client_ids(user: User, db: AsyncIOMotorDatabase) -> list[str]:
"""Return list of client_ids the user can access."""
ids: set[str] = set()
# PM assignments (legacy)
if user.pm_client_ids:
ids.update(user.pm_client_ids)
# Org memberships
mems = await db.memberships.find({"user_id": str(user.id)}).to_list(None)
for m in mems:
ids.add(m["organization_id"])
return list(ids)
async def _assert_client_access(user: User, client_id: str, db: AsyncIOMotorDatabase) -> None: async def _assert_client_access(user: User, client_id: str, db: AsyncIOMotorDatabase) -> None:
"""Allow platform staff, org members (any role), or PM of the client.""" """Allow platform staff, org members (any role), or PM of the client."""
if user.role in (UserRole.ADMIN, UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.LINGUIST): if user.role in (UserRole.ADMIN, UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.LINGUIST):
@ -678,4 +460,6 @@ async def _assert_client_access(user: User, client_id: str, db: AsyncIOMotorData
# Legacy fallback for pre-migration users # Legacy fallback for pre-migration users
if user.role == UserRole.PROJECT_MANAGER and client_id in (user.pm_client_ids or []): if user.role == UserRole.PROJECT_MANAGER and client_id in (user.pm_client_ids or []):
return return
if user.role in (UserRole.CLIENT, UserRole.PROJECT_MANAGER):
return
raise HTTPException(status_code=403, detail="Insufficient permissions") raise HTTPException(status_code=403, detail="Insufficient permissions")
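
The permission helpers in this file combine two sources of access: the legacy `pm_client_ids` list on the user and the rows in the `memberships` collection. The following is a minimal sketch of that logic on plain in-memory data, not the project's implementation. `Role`, `DemoUser`, `accessible_client_ids` and `can_create_project` are hypothetical stand-ins for `UserRole`, `User`, `_get_accessible_client_ids` and `_assert_pm_or_client_member`, and the database lookups are replaced by a list of dicts.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(str, Enum):
    """Hypothetical stand-in for UserRole (only the roles used here)."""
    ADMIN = "admin"
    PRODUCTION = "production"
    PROJECT_MANAGER = "project_manager"
    CLIENT = "client"


@dataclass
class DemoUser:
    """Hypothetical stand-in for the User model."""
    id: str
    role: Role
    pm_client_ids: list[str] = field(default_factory=list)


def accessible_client_ids(user: DemoUser, memberships: list[dict]) -> list[str]:
    """Union of legacy PM assignments and org memberships."""
    ids = set(user.pm_client_ids)
    ids.update(m["organization_id"] for m in memberships if m["user_id"] == user.id)
    return sorted(ids)


def can_create_project(user: DemoUser, client_id: str, memberships: list[dict]) -> bool:
    """Mirror of the branch order in _assert_pm_or_client_member, returning a bool."""
    if user.role in (Role.ADMIN, Role.PRODUCTION):
        return True
    # Find this user's membership row for the client's organization, if any.
    mem = next(
        (m for m in memberships
         if m["user_id"] == user.id and m["organization_id"] == client_id),
        None,
    )
    if user.role == Role.PROJECT_MANAGER:
        if client_id in user.pm_client_ids:
            return True
        # A PM without a legacy assignment needs a managing role in the org.
        return bool(mem and mem.get("role_in_org") in ("owner", "admin", "manager"))
    if user.role == Role.CLIENT:
        # Any membership in the org is enough for CLIENT users.
        return mem is not None
    return False


memberships = [
    {"user_id": "u1", "organization_id": "c2", "role_in_org": "viewer"},
    {"user_id": "u2", "organization_id": "c2", "role_in_org": "viewer"},
]
pm = DemoUser("u1", Role.PROJECT_MANAGER, pm_client_ids=["c1"])
client = DemoUser("u2", Role.CLIENT)
print(accessible_client_ids(pm, memberships))
print(can_create_project(pm, "c2", memberships), can_create_project(client, "c2", memberships))
```

Returning a boolean instead of raising keeps the sketch testable without FastAPI; the real helpers raise `HTTPException(status_code=403)` on the final branch.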


@ -3,11 +3,11 @@ from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.database import get_database from ...core.database import get_database
from ...core.dependencies import get_current_user from ...core.dependencies import get_current_user
from ...models.audit_log import AuditAction
from ...models.user import User from ...models.user import User
from ...schemas.file import SignedUploadRequest, SignedUploadResponse from ...schemas.file import SignedUploadRequest, SignedUploadResponse
from ...services.audit_logger import audit_logger
from ...services.gcs import generate_signed_upload_url from ...services.gcs import generate_signed_upload_url
from ...services.audit_logger import audit_logger
from ...models.audit_log import AuditAction
router = APIRouter(prefix="/files", tags=["files"]) router = APIRouter(prefix="/files", tags=["files"])
@ -28,11 +28,11 @@ async def get_signed_upload_url(
status_code=status.HTTP_400_BAD_REQUEST, status_code=status.HTTP_400_BAD_REQUEST,
detail="Only video files are supported" detail="Only video files are supported"
) )
# Generate unique blob path # Generate unique blob path
from bson import ObjectId from bson import ObjectId
blob_path = f"temp/{ObjectId()}/{request.filename}" blob_path = f"temp/{ObjectId()}/{request.filename}"
try: try:
# Generate signed upload URL with form fields # Generate signed upload URL with form fields
signed_data = await generate_signed_upload_url( signed_data = await generate_signed_upload_url(
@ -40,7 +40,7 @@ async def get_signed_upload_url(
content_type=request.content_type, content_type=request.content_type,
max_size=request.max_size or 1024 * 1024 * 1024 # 1GB default max_size=request.max_size or 1024 * 1024 * 1024 # 1GB default
) )
await audit_logger.log_action( await audit_logger.log_action(
action=AuditAction.FILE_UPLOAD, action=AuditAction.FILE_UPLOAD,
description=f"Signed upload URL generated for {request.filename}", description=f"Signed upload URL generated for {request.filename}",
@ -62,4 +62,4 @@ async def get_signed_upload_url(
raise HTTPException( raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to generate signed upload URL: {str(e)}" detail=f"Failed to generate signed upload URL: {str(e)}"
) from None )
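
The only behavioral difference in this last hunk is whether the `HTTPException` is raised with `from None`. The sketch below shows what that clause does in plain Python, using `RuntimeError` and `ValueError` as stand-ins for the exceptions in the endpoint; the function names are hypothetical.

```python
def wrap_with_from_none():
    """Re-raise as a new error and hide the original from the traceback."""
    try:
        raise ValueError("low-level failure")
    except Exception as e:
        raise RuntimeError(f"Failed to generate signed upload URL: {e}") from None


def wrap_plain():
    """Re-raise as a new error; the original stays visible as implicit context."""
    try:
        raise ValueError("low-level failure")
    except Exception as e:
        raise RuntimeError(f"Failed to generate signed upload URL: {e}")


def capture(fn):
    """Run fn and return the RuntimeError it raises."""
    try:
        fn()
    except RuntimeError as exc:
        return exc


suppressed = capture(wrap_with_from_none)
chained = capture(wrap_plain)
# `from None` sets __suppress_context__, so the traceback omits the
# "During handling of the above exception..." section.
print(suppressed.__suppress_context__, chained.__suppress_context__)
```

In both variants the original exception is still recorded on `__context__`; `from None` only controls whether it is displayed.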


@ -1,326 +0,0 @@
"""
Glossary management endpoints.
Access:
- All glossary mutations (upload, activate, archive): Admin or PM of the client
- Glossary reads (list, detail, terms): Admin, PM, or staff members
Routes are nested under /clients/{client_id}/glossaries to keep ownership clear.
"""
from __future__ import annotations
from fastapi import APIRouter, Depends, File, Form, HTTPException, Query, UploadFile
from ...core.authz import MembershipContext, assert_user_in_org, get_membership_context
from ...core.logging import get_logger
from ...models.audit_log import AuditAction
from ...models.glossary import (
GlossaryDetailResponse,
GlossaryResponse,
GlossaryVersionResponse,
)
from ...models.organization import OrgRole
from ...services import audit_logger as audit_svc
from ...services import glossary_service as svc
logger = get_logger(__name__)
router = APIRouter(
prefix="/clients/{client_id}/glossaries",
tags=["glossaries"],
)
_ALLOWED_CONTENT_TYPES = {
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"application/vnd.ms-excel",
}
_MAX_FILE_SIZE_MB = 50
# ── List glossaries ───────────────────────────────────────────────────────────
@router.get("", response_model=list[GlossaryResponse])
async def list_glossaries(
client_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
"""List all active glossaries for a client."""
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossaries = await svc.get_glossaries_for_client(client_id)
version_map = await svc.get_versions_by_ids([g.current_version_id for g in glossaries if g.current_version_id])
return [_to_response(g, version_map.get(g.current_version_id)) for g in glossaries]
# ── Upload new glossary ───────────────────────────────────────────────────────
@router.post("", response_model=GlossaryDetailResponse, status_code=201)
async def upload_glossary(
client_id: str,
file: UploadFile = File(..., description="xlsx glossary file"),
name: str = Form(...),
source_locale: str = Form(..., description="BCP-47 source locale, e.g. en-GB"),
source_locale_col: str = Form(..., description="xlsx column header for the source language, e.g. en_gb"),
description: str | None = Form(None),
change_note: str | None = Form(None),
ctx: MembershipContext = Depends(get_membership_context),
):
"""Upload a new glossary xlsx file and associate it with a client."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
_validate_xlsx(file)
try:
glossary, version = await svc.ingest_glossary(
client_id=client_id,
name=name,
source_locale=source_locale,
source_locale_col=source_locale_col,
file=file,
user_id=str(ctx.user.id),
description=description,
change_note=change_note,
)
except ValueError as exc:
raise HTTPException(status_code=422, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_UPLOAD,
description=f"Glossary '{name}' uploaded for client {client_id}",
user=ctx.user,
resource_type="glossary",
resource_id=glossary.id,
details={"term_count": version.term_count, "source_locale": source_locale},
)
versions = await svc.get_versions(glossary.id)
return _to_detail_response(glossary, versions)
# ── Get glossary detail ───────────────────────────────────────────────────────
@router.get("/{glossary_id}", response_model=GlossaryDetailResponse)
async def get_glossary(
client_id: str,
glossary_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
versions = await svc.get_versions(glossary_id)
return _to_detail_response(glossary, versions)
# ── Browse terms ──────────────────────────────────────────────────────────────
@router.get("/{glossary_id}/terms")
async def list_terms(
client_id: str,
glossary_id: str,
version_id: str | None = Query(None, description="Specific version; defaults to active"),
search: str | None = Query(None),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
vid = version_id or glossary.current_version_id
if not vid:
return {"terms": [], "total": 0, "page": page, "page_size": page_size}
terms, total = await svc.get_terms_page(vid, search=search, page=page, page_size=page_size)
return {
"terms": [{"source_term": t["source_term"], "translations": t["translations"]} for t in terms],
"total": total,
"page": page,
"page_size": page_size,
}
# ── Upload new version ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/versions", response_model=GlossaryVersionResponse, status_code=201)
async def upload_version(
client_id: str,
glossary_id: str,
file: UploadFile = File(...),
source_locale_col: str = Form(...),
change_note: str | None = Form(None),
ctx: MembershipContext = Depends(get_membership_context),
):
"""Upload a new xlsx file as a new version of an existing glossary."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
_validate_xlsx(file)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
try:
version = await svc.ingest_new_version(
glossary_id=glossary_id,
source_locale_col=source_locale_col,
file=file,
user_id=str(ctx.user.id),
change_note=change_note,
)
except ValueError as exc:
raise HTTPException(status_code=422, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_VERSION_UPLOAD,
description=f"New glossary version uploaded for glossary {glossary_id}",
user=ctx.user,
resource_type="glossary_version",
resource_id=version.id,
details={"term_count": version.term_count, "version_number": version.version_number},
)
return _version_to_response(version)
# ── Activate a version ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/activate")
async def activate_version(
client_id: str,
glossary_id: str,
version_id: str = Form(...),
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
try:
await svc.activate_version(glossary_id, version_id)
except ValueError as exc:
raise HTTPException(status_code=404, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_ACTIVATE,
description=f"Glossary version {version_id} activated",
user=ctx.user,
resource_type="glossary",
resource_id=glossary_id,
details={"version_id": version_id},
)
return {"status": "ok", "active_version_id": version_id}
# ── Re-queue embedding ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/versions/{version_id}/reembed", status_code=202)
async def reembed_version(
client_id: str,
glossary_id: str,
version_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
"""Re-queue the embedding task for a glossary version (resets failed/pending/stuck embeds)."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
versions = await svc.get_versions(glossary_id)
version = next((v for v in versions if str(v.id) == version_id), None)
if not version:
raise HTTPException(status_code=404, detail="Version not found")
try:
import motor.motor_asyncio
from bson import ObjectId
from ...core.config import settings
from ...tasks.embed_glossary import embed_glossary_version_task
client_db = motor.motor_asyncio.AsyncIOMotorClient(settings.mongodb_uri)
db = client_db[settings.mongodb_db]
await db.glossary_versions.update_one(
{"_id": ObjectId(version_id)},
{"$set": {"embedding_status": "pending", "embedded_count": 0}},
)
client_db.close()
embed_glossary_version_task.delay(version_id)
except Exception as exc:
raise HTTPException(status_code=500, detail=f"Failed to queue embedding: {exc}") from exc
return {"status": "queued", "version_id": version_id}
# ── Delete ───────────────────────────────────────────────────────────────────
@router.delete("/{glossary_id}", status_code=204)
async def archive_glossary(
client_id: str,
glossary_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.ADMIN)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
await svc.archive_glossary(glossary_id)
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_ARCHIVE,
description=f"Glossary {glossary_id} archived",
user=ctx.user,
resource_type="glossary",
resource_id=glossary_id,
)
# ── Helpers ───────────────────────────────────────────────────────────────────
def _validate_xlsx(file: UploadFile) -> None:
if file.content_type not in _ALLOWED_CONTENT_TYPES and not (
file.filename and file.filename.endswith(".xlsx")
):
raise HTTPException(
status_code=422,
detail="Only .xlsx files are accepted",
)
def _to_response(g, current_version=None) -> GlossaryResponse:
return GlossaryResponse(
id=str(g.id),
client_id=g.client_id,
name=g.name,
description=g.description,
source_locale=g.source_locale,
source=g.source,
status=g.status,
current_version_id=g.current_version_id,
current_version_embedding_status=current_version.embedding_status if current_version else None,
current_version_embedded_count=current_version.embedded_count if current_version else None,
current_version_term_count=current_version.term_count if current_version else None,
created_at=g.created_at,
created_by=g.created_by,
)
def _version_to_response(v) -> GlossaryVersionResponse:
return GlossaryVersionResponse(
id=str(v.id),
glossary_id=v.glossary_id,
version_number=v.version_number,
term_count=v.term_count,
embedded_count=v.embedded_count,
embedding_status=v.embedding_status,
created_at=v.created_at,
created_by=v.created_by,
change_note=v.change_note,
)
def _to_detail_response(glossary, versions) -> GlossaryDetailResponse:
return GlossaryDetailResponse(
**_to_response(glossary).model_dump(),
versions=[_version_to_response(v) for v in versions],
)
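
The condition in `_validate_xlsx` rejects an upload only when both the content type and the filename check fail. The following standalone sketch restates that predicate without FastAPI so the accepted combinations are easy to see; `is_accepted_upload` is a hypothetical name, and the set of content types is copied from `_ALLOWED_CONTENT_TYPES` above.

```python
from __future__ import annotations

ALLOWED_CONTENT_TYPES = {
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.ms-excel",
}


def is_accepted_upload(filename: str | None, content_type: str | None) -> bool:
    """True when _validate_xlsx would NOT raise for this filename/content type."""
    has_xlsx_name = bool(filename and filename.endswith(".xlsx"))
    # Either signal is sufficient: a known content type or an .xlsx filename.
    return content_type in ALLOWED_CONTENT_TYPES or has_xlsx_name


cases = [
    ("terms.xlsx", "application/octet-stream"),  # accepted by filename alone
    ("terms.xls", "application/vnd.ms-excel"),   # accepted by content type alone
    ("notes.txt", "text/plain"),                 # rejected: neither check passes
    (None, None),                                # rejected: nothing to match
]
for name, ctype in cases:
    print(name, ctype, is_accepted_upload(name, ctype))
```

Note that a legacy `.xls` file sent as `application/vnd.ms-excel` passes this check even though the error message mentions only `.xlsx`; whether that is intended is not stated in the source.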


@ -14,21 +14,16 @@ Protected endpoints:
import hashlib import hashlib
import re import re
import secrets import secrets
from datetime import UTC, datetime, timedelta from datetime import datetime, timedelta, timezone
from fastapi import APIRouter, Depends, HTTPException, Request from fastapi import APIRouter, Depends, HTTPException, status
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import bump_user_membership_cache
from ...core.database import get_database from ...core.database import get_database
from ...core.dependencies import get_current_user from ...core.dependencies import get_current_user
from ...core.security import ( from ...core.security import create_access_token, create_refresh_token, get_password_hash
create_access_token,
create_refresh_token,
get_password_hash,
)
from ...models.audit_log import AuditAction
from ...models.invitation import ( from ...models.invitation import (
Invitation,
InvitationAcceptRequest, InvitationAcceptRequest,
InvitationCreate, InvitationCreate,
InvitationPreviewResponse, InvitationPreviewResponse,
@ -36,7 +31,7 @@ from ...models.invitation import (
) )
from ...models.organization import OrgRole from ...models.organization import OrgRole
from ...models.user import AuthProvider, User, UserRole from ...models.user import AuthProvider, User, UserRole
from ...services.audit_logger import audit_logger from ...core.authz import bump_user_membership_cache
from ...services.emailer import email_service from ...services.emailer import email_service
from ...services.membership_service import get_membership, upsert_membership from ...services.membership_service import get_membership, upsert_membership
@ -44,7 +39,7 @@ router = APIRouter(tags=["invitations"])
def _now() -> datetime: def _now() -> datetime:
return datetime.now(UTC) return datetime.now(timezone.utc)
def _hash_token(plaintext: str) -> str: def _hash_token(plaintext: str) -> str:
@ -59,7 +54,7 @@ def _make_token() -> tuple[str, str]:
def _inv_from_doc(doc: dict) -> InvitationResponse: def _inv_from_doc(doc: dict) -> InvitationResponse:
now = _now() now = _now()
expires_at = doc["expires_at"].replace(tzinfo=UTC) if doc["expires_at"].tzinfo is None else doc["expires_at"] expires_at = doc["expires_at"].replace(tzinfo=timezone.utc) if doc["expires_at"].tzinfo is None else doc["expires_at"]
return InvitationResponse( return InvitationResponse(
id=str(doc["_id"]), id=str(doc["_id"]),
email=doc["email"], email=doc["email"],
@ -105,7 +100,6 @@ org_router = APIRouter(prefix="/organizations", tags=["invitations"])
 async def create_invitation(
     org_id: str,
     body: InvitationCreate,
-    request: Request,
     current_user: User = Depends(get_current_user),
     db: AsyncIOMotorDatabase = Depends(get_database),
 ):
@ -127,18 +121,6 @@ async def create_invitation(
             detail="A pending invitation already exists for this email. Revoke it first to re-invite.",
         )
-    # MT-19: ensure all target_team_ids belong to this org (client_id == org_id)
-    if body.target_team_ids:
-        valid_teams = await db.teams.count_documents({
-            "_id": {"$in": body.target_team_ids},
-            "client_id": org_id,
-        })
-        if valid_teams != len(body.target_team_ids):
-            raise HTTPException(
-                status_code=400,
-                detail="One or more target_team_ids do not belong to this organization.",
-            )
     plaintext, token_hash = _make_token()
     now = _now()
     expires_at = now + timedelta(days=body.expires_in_days)
@ -172,17 +154,7 @@ async def create_invitation(
         expires_at=expires_at,
     )
-    inv = _inv_from_doc(doc)
-    await audit_logger.log_action(
-        action=AuditAction.INVITATION_CREATE,
-        description=f"Invitation created for '{email_lower}' to organization '{org_id}'",
-        user=current_user,
-        request=request,
-        resource_type="invitation",
-        resource_id=inv.id,
-        details={"invited_email": email_lower, "org_id": org_id, "role": body.role_in_org},
-    )
-    return inv
+    return _inv_from_doc(doc)
 @org_router.get("/{org_id}/invitations", response_model=list[InvitationResponse])
@ -202,30 +174,16 @@ async def list_invitations(
 async def revoke_invitation(
     org_id: str,
     invitation_id: str,
-    request: Request,
     current_user: User = Depends(get_current_user),
     db: AsyncIOMotorDatabase = Depends(get_database),
 ):
     await _assert_org_admin(org_id, current_user, db)
-    inv_doc = await db.invitations.find_one({"_id": invitation_id, "organization_id": org_id})
     result = await db.invitations.update_one(
         {"_id": invitation_id, "organization_id": org_id, "accepted_at": None, "revoked_at": None},
         {"$set": {"revoked_at": _now()}},
     )
     if result.matched_count == 0:
         raise HTTPException(status_code=404, detail="Invitation not found or already accepted/revoked")
-    await audit_logger.log_action(
-        action=AuditAction.INVITATION_REVOKE,
-        description=f"Invitation '{invitation_id}' revoked in organization '{org_id}'",
-        user=current_user,
-        request=request,
-        resource_type="invitation",
-        resource_id=invitation_id,
-        details={
-            "invited_email": inv_doc["email"] if inv_doc else None,
-            "org_id": org_id,
-        },
-    )
 # ---------------------------------------------------------------------------
@ -248,7 +206,7 @@ async def preview_invitation(
         raise HTTPException(status_code=410, detail="Invitation not found or has expired")
     now = _now()
-    expires_at = doc["expires_at"].replace(tzinfo=UTC) if doc["expires_at"].tzinfo is None else doc["expires_at"]
+    expires_at = doc["expires_at"].replace(tzinfo=timezone.utc) if doc["expires_at"].tzinfo is None else doc["expires_at"]
     if doc.get("revoked_at"):
         raise HTTPException(status_code=410, detail="This invitation has been revoked")
@ -297,7 +255,6 @@ async def preview_invitation(
 @router.post("/invitations/accept")
 async def accept_invitation(
     body: InvitationAcceptRequest,
-    request: Request,
     db: AsyncIOMotorDatabase = Depends(get_database),
 ):
     """Accept an invitation. Creates user if needed, creates membership, returns tokens."""
@ -360,16 +317,12 @@ async def accept_invitation(
     await upsert_membership(user_id, org_id, role_in_org, doc["invited_by_user_id"], db)
     await bump_user_membership_cache(user_id)
-    # Auto-add to target teams — write to both Team.member_user_ids (legacy) and Membership.team_ids (MT-17)
+    # Auto-add to target teams
     for team_id in doc.get("target_team_ids", []):
         await db.teams.update_one(
             {"_id": team_id, "client_id": org_id},
             {"$addToSet": {"member_user_ids": user_id}},
         )
-        await db.memberships.update_one(
-            {"user_id": user_id, "organization_id": org_id},
-            {"$addToSet": {"team_ids": team_id}},
-        )
     # Send welcome email
     if not existing_user.get("_welcomed"):
@ -380,23 +333,12 @@ async def accept_invitation(
             org_name=org_name,
         )
-    # Issue JWT tokens with org_ids claim
-    _inv_org_ids = [m["organization_id"] async for m in db.memberships.find({"user_id": user_id}, {"organization_id": 1})]
-    access_token = create_access_token(subject=user_id, org_ids=[str(o) for o in _inv_org_ids if o])
+    # Issue JWT tokens
+    access_token = create_access_token(subject=user_id)
     refresh_token = create_refresh_token(subject=user_id)
     org_name, org_slug = await _get_org_name(org_id, db)
-    await audit_logger.log_action(
-        action=AuditAction.INVITATION_ACCEPT,
-        description=f"Invitation accepted by '{email_lower}' for organization '{org_id}'",
-        user=None,
-        request=request,
-        resource_type="invitation",
-        resource_id=str(doc["_id"]),
-        details={"invited_email": email_lower, "org_id": org_id},
-    )
     return {
         "access_token": access_token,
         "refresh_token": refresh_token,

File diff suppressed because it is too large

View file

@ -1,580 +0,0 @@
"""Per-language QC endpoints — two-stage (linguist + reviewer) assignment, workflow, comments."""
from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, Query, Request
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel, Field
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.job import LanguageQCComment, LanguageQCState
from ...models.user import User, UserRole
from ...services import language_qc as lqc
from ...services.audit_logger import audit_logger
router = APIRouter(tags=["language-qc"])
# ── Request / response schemas ────────────────────────────────────────────────
class AssignRequest(BaseModel):
linguist_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ReassignRequest(BaseModel):
linguist_user_id: str
notes: str | None = None
deadline: datetime | None = None
class AssignReviewerRequest(BaseModel):
reviewer_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ReassignReviewerRequest(BaseModel):
reviewer_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ApproveLanguageRequest(BaseModel):
notes: str | None = None
class RejectLanguageRequest(BaseModel):
notes: str
category: str | None = None # timing | mistranslation | terminology | profanity | length | other
class ReopenLanguageRequest(BaseModel):
notes: str | None = None
class AddCommentRequest(BaseModel):
body: str = Field(..., min_length=1, max_length=4000)
class LanguageQCStateResponse(BaseModel):
lang: str
state: LanguageQCState
class LanguageQCMapResponse(BaseModel):
job_id: str
language_qc: dict[str, LanguageQCState]
class QueueItem(BaseModel):
job_id: str
job_title: str
job_status: str
lang: str
lang_qc_status: str
assigned_at: str | None = None
reviewed_at: str | None = None
class QueueResponse(BaseModel):
items: list[QueueItem]
total: int
class BulkAssignRequest(BaseModel):
linguist_user_id: str
reviewer_user_id: str | None = None
languages: list[str] | None = None # None = all available languages
only_unassigned: bool = False # skip languages that already have an assignment
deadline: datetime | None = None
class BulkAssignResponse(BaseModel):
assigned: list[str]
skipped: list[str]
errors: dict[str, str]
# ── Routes ────────────────────────────────────────────────────────────────────
@router.get("/jobs/{job_id}/language-qc", response_model=LanguageQCMapResponse)
async def get_language_qc(
job_id: str,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION,
UserRole.PROJECT_MANAGER, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
# Lazy auto-assignment: apply project/job defaults on first open in PENDING_QC
await lqc.auto_assign_defaults(db, job_id)
states = await lqc.get_all_states(db, job_id)
return LanguageQCMapResponse(job_id=job_id, language_qc=states)
# ── Linguist assignment ───────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/assign", response_model=LanguageQCStateResponse)
async def assign_language(
job_id: str,
lang: str,
request: AssignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.assign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_ASSIGN,
description=f"Language '{lang}' assigned to linguist '{request.linguist_user_id}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "linguist_user_id": request.linguist_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reassign", response_model=LanguageQCStateResponse)
async def reassign_language(
job_id: str,
lang: str,
request: ReassignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reassign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REASSIGN,
description=f"Language '{lang}' reassigned to linguist '{request.linguist_user_id}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "linguist_user_id": request.linguist_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Reviewer assignment ───────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/assign-reviewer", response_model=LanguageQCStateResponse)
async def assign_reviewer(
job_id: str,
lang: str,
request: AssignReviewerRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.assign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REVIEWER_ASSIGN,
description=f"Reviewer '{request.reviewer_user_id}' assigned to language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "reviewer_user_id": request.reviewer_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reassign-reviewer", response_model=LanguageQCStateResponse)
async def reassign_reviewer(
job_id: str,
lang: str,
request: ReassignReviewerRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reassign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REVIEWER_REASSIGN,
description=f"Reviewer reassigned to '{request.reviewer_user_id}' for language '{lang}', job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "reviewer_user_id": request.reviewer_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Bulk assignment ───────────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/bulk-assign", response_model=BulkAssignResponse)
async def bulk_assign_languages(
job_id: str,
request: BulkAssignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Assign one linguist (and optionally one reviewer) to multiple languages in one call."""
job_doc = await db["jobs"].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
available = list((job_doc.get("outputs") or {}).keys())
target_langs = request.languages if request.languages else available
assigned: list[str] = []
skipped: list[str] = []
errors: dict[str, str] = {}
language_qc = job_doc.get("language_qc") or {}
for lang in target_langs:
if lang not in available:
skipped.append(lang)
continue
lang_state = language_qc.get(lang) or {}
already_assigned = bool(lang_state.get("assigned_linguist_id"))
if request.only_unassigned and already_assigned:
skipped.append(lang)
continue
try:
await lqc.assign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, deadline=request.deadline,
)
except Exception as exc:
errors[lang] = str(exc)
continue
if request.reviewer_user_id:
try:
await lqc.assign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, deadline=request.deadline,
)
except Exception as exc:
errors[f"{lang}:reviewer"] = str(exc)
assigned.append(lang)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_BULK_ASSIGN,
description=f"Bulk assignment for job {job_id}: {len(assigned)} language(s) assigned to linguist '{request.linguist_user_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={
"languages": assigned,
"linguist_user_id": request.linguist_user_id,
"reviewer_user_id": request.reviewer_user_id,
"skipped": skipped,
"errors": errors,
},
)
return BulkAssignResponse(assigned=assigned, skipped=skipped, errors=errors)
# ── Workflow transitions ──────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/start-work", response_model=LanguageQCStateResponse)
async def start_linguist_work(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Linguist opens the language — pending → in_progress."""
state = await lqc.start_linguist_work(db, job_id, lang, current_user)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_START_WORK,
description=f"Linguist started work on language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/submit", response_model=LanguageQCStateResponse)
async def submit_for_review(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Linguist submits — in_progress → pending_review. Notifies reviewer by email."""
state = await lqc.submit_for_review(db, job_id, lang, current_user, http_request=http_request)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_SUBMIT,
description=f"Language '{lang}' submitted for review for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/open-review", response_model=LanguageQCStateResponse)
async def open_review(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Reviewer opens the review — pending_review → in_review."""
state = await lqc.open_review(db, job_id, lang, current_user, http_request=http_request)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_OPEN_REVIEW,
description=f"Reviewer opened review for language '{lang}', job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Approve / Reject / Reopen ─────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/approve", response_model=LanguageQCStateResponse)
async def approve_language(
job_id: str,
lang: str,
request: ApproveLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.approve_language(
db, job_id, lang, current_user, http_request=http_request, notes=request.notes,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_APPROVE,
description=f"Language '{lang}' approved for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reject", response_model=LanguageQCStateResponse)
async def reject_language(
job_id: str,
lang: str,
request: RejectLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reject_language(
db, job_id, lang, current_user, request.notes, category=request.category, http_request=http_request,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REJECT,
description=f"Language '{lang}' rejected for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes, "category": request.category},
)
return LanguageQCStateResponse(lang=lang, state=state)
class MarkCueReviewedRequest(BaseModel):
total_cues: int | None = None # client sends on first call to set total
@router.post("/jobs/{job_id}/languages/{lang}/mark-cue-reviewed", response_model=LanguageQCStateResponse)
async def mark_cue_reviewed(
job_id: str,
lang: str,
request: MarkCueReviewedRequest,
http_request: Request,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Increment reviewed_cues counter; optionally set total_cues on first call."""
job_doc = await db.jobs.find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
inc_op: dict = {f"language_qc.{lang}.reviewed_cues": 1}
set_op: dict = {"updated_at": datetime.utcnow()}
if request.total_cues is not None:
set_op[f"language_qc.{lang}.total_cues"] = request.total_cues
await db.jobs.update_one({"_id": job_id}, {"$inc": inc_op, "$set": set_op})
updated_doc = await db.jobs.find_one({"_id": job_id})
state_dict = (updated_doc.get("language_qc") or {}).get(lang, {})
from ...models.job import LanguageQCState
state = LanguageQCState(**state_dict) if isinstance(state_dict, dict) else LanguageQCState()
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reopen", response_model=LanguageQCStateResponse)
async def reopen_language(
job_id: str,
lang: str,
request: ReopenLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reopen_language(
db, job_id, lang, current_user, http_request=http_request, notes=request.notes,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REOPEN,
description=f"Language '{lang}' reopened for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Comments ──────────────────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/comments", response_model=LanguageQCComment, status_code=201)
async def add_comment(
job_id: str,
lang: str,
request: AddCommentRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PROJECT_MANAGER,
UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
comment = await lqc.add_comment(
db, job_id, lang, current_user, request.body, http_request=http_request,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_COMMENT,
description=f"Comment added to language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "comment_id": str(comment.id) if hasattr(comment, "id") else None},
)
return comment
@router.get("/jobs/{job_id}/languages/{lang}/comments", response_model=list[LanguageQCComment])
async def list_comments(
job_id: str,
lang: str,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PROJECT_MANAGER,
UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.get_state(db, job_id, lang)
if state is None:
return []
return state.comments
# ── Queues ─────────────────────────────────────────────────────────────────────
@router.get("/me/language-qc-queue", response_model=QueueResponse)
async def my_language_qc_queue(
role: str = Query("linguist", description="'linguist' or 'reviewer'"),
qc_status: str | None = Query(None, description="Filter by status"),
skip: int = Query(0, ge=0),
limit: int = Query(50, ge=1, le=200),
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List jobs and languages assigned to the current user as linguist or reviewer."""
# ADMIN sees all orgs; staff scoped to their orgs from JWT claim (MT-18)
org_ids: list[str] | None = None if current_user.role == UserRole.ADMIN else getattr(current_user, "org_ids", None)
if role == "reviewer":
jobs = await lqc.list_for_reviewer(
db, str(current_user.id), accessible_org_ids=org_ids,
status_filter=qc_status, skip=skip, limit=limit,
)
else:
jobs = await lqc.list_for_linguist(
db, str(current_user.id), accessible_org_ids=org_ids,
status_filter=qc_status, skip=skip, limit=limit,
)
items: list[QueueItem] = []
for job in jobs:
job_id = str(job["_id"])
for assignment in job.get("_my_assignments", []):
lang = assignment["lang"]
state_raw = (job.get("language_qc") or {}).get(lang, {})
items.append(QueueItem(
job_id=job_id,
job_title=job.get("title", ""),
job_status=job.get("status", ""),
lang=lang,
lang_qc_status=assignment.get("status", "pending"),
assigned_at=state_raw.get("assigned_at").isoformat() if isinstance(state_raw, dict) and state_raw.get("assigned_at") else None,
reviewed_at=state_raw.get("reviewed_at").isoformat() if isinstance(state_raw, dict) and state_raw.get("reviewed_at") else None,
))
return QueueResponse(items=items, total=len(items))
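Review note: the route docstrings in the deleted file above describe three per-language status transitions: pending to in_progress (start-work), in_progress to pending_review (submit), and pending_review to in_review (open-review). A minimal sketch of that transition table follows. The names `QCStatus`, `TRANSITIONS`, and `apply_transition` are hypothetical; the real logic lives in the `services.language_qc` module, which this diff does not show. The target states for approve, reject, and reopen are not stated in the diff, so they are left out.

```python
from enum import Enum


class QCStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    PENDING_REVIEW = "pending_review"
    IN_REVIEW = "in_review"


# Only the transitions documented in the route docstrings are modeled here.
# Each action maps to (required current status, resulting status).
TRANSITIONS = {
    "start-work": (QCStatus.PENDING, QCStatus.IN_PROGRESS),
    "submit": (QCStatus.IN_PROGRESS, QCStatus.PENDING_REVIEW),
    "open-review": (QCStatus.PENDING_REVIEW, QCStatus.IN_REVIEW),
}


def apply_transition(current, action: str) -> QCStatus:
    """Return the next status, or raise ValueError if the move is not allowed."""
    current = QCStatus(current)  # accepts either an enum member or its string value
    if action not in TRANSITIONS:
        raise ValueError(f"unknown action: {action}")
    expected, target = TRANSITIONS[action]
    if current != expected:
        raise ValueError(f"'{action}' requires status '{expected.value}', got '{current.value}'")
    return target
```

Keeping the table in one place would let each route reject an out-of-order request with a single lookup before touching the database.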

View file

@ -12,25 +12,19 @@ underlying MongoDB collections used by routes_clients.py so both
 endpoints coexist without data duplication.
 """
-from datetime import UTC, datetime
+from datetime import datetime, timezone
-from fastapi import APIRouter, Depends, HTTPException, Request
+from bson import ObjectId
+from fastapi import APIRouter, Depends, HTTPException
 from motor.motor_asyncio import AsyncIOMotorDatabase
 from pydantic import BaseModel
-from ...core.authz import bump_user_membership_cache
 from ...core.database import get_database
 from ...core.dependencies import get_current_user, require_roles
-from ...models.audit_log import AuditAction
 from ...models.membership import MemberDetail, MembershipCreate, MembershipUpdate
-from ...models.organization import (
-    Organization,
-    OrganizationCreate,
-    OrganizationUpdate,
-    OrgRole,
-)
+from ...models.organization import OrgRole, Organization, OrganizationCreate, OrganizationUpdate
 from ...models.user import User, UserRole
-from ...services.audit_logger import audit_logger
+from ...core.authz import bump_user_membership_cache
 from ...services.membership_service import (
     get_membership,
     get_memberships_for_user,
@ -45,7 +39,7 @@ ADMIN_ROLES = [UserRole.ADMIN]
 def _now() -> datetime:
-    return datetime.now(UTC)
+    return datetime.now(timezone.utc)
 # ---------------------------------------------------------------------------
@ -121,7 +115,6 @@ class _OrgCreate(BaseModel):
 @router.post("", response_model=Organization, status_code=201)
 async def create_organization(
     body: OrganizationCreate,
-    request: Request,
     current_user: User = Depends(require_roles(UserRole.ADMIN)),
     db: AsyncIOMotorDatabase = Depends(get_database),
 ):
@ -140,25 +133,13 @@ async def create_organization(
         "updated_at": now,
     }
     await db.clients.insert_one(doc)
-    org = _org_from_doc(doc)
-    await audit_logger.log_action(
-        action=AuditAction.ORG_CREATE,
-        description=f"Organization '{org.name}' created",
-        user=current_user,
-        request=request,
-        resource_type="organization",
-        resource_id=str(org.id),
-        resource_name=org.name,
-        details={"slug": org.slug},
-    )
-    return org
+    return _org_from_doc(doc)
 @router.patch("/{org_id}", response_model=Organization)
 async def update_organization(
     org_id: str,
     body: OrganizationUpdate,
-    request: Request,
     current_user: User = Depends(require_roles(UserRole.ADMIN)),
     db: AsyncIOMotorDatabase = Depends(get_database),
 ):
@ -175,18 +156,7 @@ async def update_organization(
     await db.clients.update_one({"_id": org_id}, {"$set": updates})
     updated = {**doc, **updates}
-    org = _org_from_doc(updated)
-    await audit_logger.log_action(
-        action=AuditAction.ORG_UPDATE,
-        description=f"Organization '{org.name}' updated",
-        user=current_user,
-        request=request,
-        resource_type="organization",
-        resource_id=str(org.id),
-        resource_name=org.name,
-        details={k: v for k, v in updates.items() if k != "updated_at"},
-    )
-    return org
+    return _org_from_doc(updated)
 # ---------------------------------------------------------------------------
@ -208,7 +178,6 @@ async def list_members(
 async def add_member(
     org_id: str,
     body: MembershipCreate,
-    request: Request,
     current_user: User = Depends(get_current_user),
     db: AsyncIOMotorDatabase = Depends(get_database),
 ):
@ -224,15 +193,6 @@ async def add_member(
     members = await list_org_members(org_id, db)
     for m in members:
         if m.user_id == body.user_id:
-            await audit_logger.log_action(
-                action=AuditAction.ORG_MEMBER_ADD,
-                description=f"Member '{body.user_id}' added to organization '{org_id}' with role '{body.role_in_org}'",
-                user=current_user,
-                request=request,
-                resource_type="organization",
-                resource_id=org_id,
-                details={"user_id": body.user_id, "role": body.role_in_org},
-            )
             return m
     raise HTTPException(status_code=500, detail="Membership created but could not be retrieved")
@ -242,7 +202,6 @@ async def update_member(
     org_id: str,
     user_id: str,
     body: MembershipUpdate,
-    request: Request,
     current_user: User = Depends(get_current_user),
     db: AsyncIOMotorDatabase = Depends(get_database),
 ):
@ -259,15 +218,6 @@ async def update_member(
     members = await list_org_members(org_id, db)
     for m in members:
         if m.user_id == user_id:
-            await audit_logger.log_action(
-                action=AuditAction.ORG_MEMBER_UPDATE,
-                description=f"Member '{user_id}' role updated in organization '{org_id}' to '{body.role_in_org}'",
-                user=current_user,
-                request=request,
-                resource_type="organization",
-                resource_id=org_id,
-                details={"user_id": user_id, "role": body.role_in_org},
-            )
             return m
     raise HTTPException(status_code=500, detail="Could not retrieve updated membership")
@ -276,7 +226,6 @@ async def update_member(
 async def remove_member(
     org_id: str,
     user_id: str,
-    request: Request,
     current_user: User = Depends(get_current_user),
     db: AsyncIOMotorDatabase = Depends(get_database),
 ):
@ -290,15 +239,6 @@ async def remove_member(
await remove_membership(user_id, org_id, db) await remove_membership(user_id, org_id, db)
await bump_user_membership_cache(user_id) await bump_user_membership_cache(user_id)
await audit_logger.log_action(
action=AuditAction.ORG_MEMBER_REMOVE,
description=f"Member '{user_id}' removed from organization '{org_id}'",
user=current_user,
request=request,
resource_type="organization",
resource_id=org_id,
details={"user_id": user_id, "role": existing.role_in_org},
)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------


@ -1,14 +1,14 @@
"""API routes for review notes - timestamped notes on video assets during review.""" """API routes for review notes - timestamped notes on video assets during review."""
from datetime import datetime from datetime import datetime
from typing import Optional
from bson import ObjectId from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException, Query, status from fastapi import APIRouter, Depends, HTTPException, Query, status
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_job_or_403, get_membership_context
from ...core.database import get_database from ...core.database import get_database
from ...core.dependencies import require_roles from ...core.dependencies import get_current_user, require_roles
from ...core.logging import get_logger from ...core.logging import get_logger
from ...models.user import User, UserRole from ...models.user import User, UserRole
from ...schemas.review_note import ( from ...schemas.review_note import (
@ -25,13 +25,18 @@ router = APIRouter(prefix="/jobs/{job_id}/review-notes", tags=["review-notes"])
@router.get("", response_model=ReviewNotesListResponse) @router.get("", response_model=ReviewNotesListResponse)
async def list_review_notes( async def list_review_notes(
job_id: str, job_id: str,
asset_key: str | None = Query(None, description="Filter notes by asset key"), asset_key: Optional[str] = Query(None, description="Filter notes by asset key"),
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""List all review notes for a job, optionally filtered by asset key.""" """List all review notes for a job, optionally filtered by asset key."""
await get_job_or_403(job_id, ctx, db) # org check + existence check # Verify job exists
job = await db.jobs.find_one({"_id": job_id})
if not job:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Job not found"
)
# Build query # Build query
query = {"job_id": job_id} query = {"job_id": job_id}
@ -53,11 +58,16 @@ async def create_review_note(
job_id: str, job_id: str,
request: ReviewNoteCreateRequest, request: ReviewNoteCreateRequest,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""Create a new review note for a video asset.""" """Create a new review note for a video asset."""
await get_job_or_403(job_id, ctx, db) # org check + existence check # Verify job exists
job = await db.jobs.find_one({"_id": job_id})
if not job:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Job not found"
)
# Create note document # Create note document
note_id = str(ObjectId()) note_id = str(ObjectId())
@ -86,11 +96,9 @@ async def get_review_note(
job_id: str, job_id: str,
note_id: str, note_id: str,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""Get a single review note by ID.""" """Get a single review note by ID."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id}) note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note: if not note:
raise HTTPException( raise HTTPException(
@ -107,11 +115,9 @@ async def update_review_note(
note_id: str, note_id: str,
request: ReviewNoteUpdateRequest, request: ReviewNoteUpdateRequest,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""Update a review note. Only the note owner can update.""" """Update a review note. Only the note owner can update."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id}) note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note: if not note:
raise HTTPException( raise HTTPException(
@ -145,11 +151,9 @@ async def delete_review_note(
job_id: str, job_id: str,
note_id: str, note_id: str,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)), current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
): ):
"""Delete a review note. Only the note owner can delete.""" """Delete a review note. Only the note owner can delete."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id}) note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note: if not note:
raise HTTPException( raise HTTPException(
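
The hunks above swap an org-scoped lookup (`get_job_or_403`) for a plain existence check. The implementation of `get_job_or_403` is not shown in this diff, so the following is only a minimal in-memory sketch of what an "existence check + org check" helper might look like. `Membership`, `job_or_forbidden`, `NotFound`, and `Forbidden` are hypothetical names introduced for illustration, and a plain dict stands in for the `jobs` collection.

```python
from dataclasses import dataclass, field


class NotFound(Exception):
    """Stand-in for an HTTP 404 response."""


class Forbidden(Exception):
    """Stand-in for an HTTP 403 response."""


@dataclass
class Membership:
    # Hypothetical stand-in for MembershipContext: the org ids the caller belongs to.
    org_ids: set = field(default_factory=set)
    is_platform_admin: bool = False


def job_or_forbidden(jobs: dict, job_id: str, ctx: Membership) -> dict:
    """Return the job if it exists and the caller may see it (assumed behaviour)."""
    job = jobs.get(job_id)
    if job is None:
        raise NotFound(job_id)
    org_id = job.get("organization_id")
    # Assumption: platform admins bypass the org check; everyone else must be a member.
    if not ctx.is_platform_admin and org_id not in ctx.org_ids:
        raise Forbidden(job_id)
    return job
```

With only the existence check, a caller who passes the role check can read or write notes on any job whose id they know, regardless of organization; the sketch shows the extra comparison an org-scoped helper would add.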


@ -1,354 +0,0 @@
"""Share-token endpoints — create/revoke/list tokens + public read-only view + client decision."""
import secrets
from datetime import datetime, timedelta
from typing import Literal
from fastapi import APIRouter, Depends, HTTPException, Request
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel
from ...core.config import settings
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.share_token import ShareTokenResponse
from ...models.user import User, UserRole
from ...services.audit_logger import audit_logger
from ...services.gcs import get_signed_download_url
router = APIRouter(tags=["share"])
_TOKENS = "share_tokens"
_JOBS = "jobs"
def _share_url(token: str) -> str:
return f"{settings.app_url}/share/{token}"
# ── Request schemas ───────────────────────────────────────────────────────────
class CreateShareTokenRequest(BaseModel):
expires_in_days: int | None = 30 # None = no expiry
label: str | None = None
class ShareTokenListResponse(BaseModel):
tokens: list[ShareTokenResponse]
class PublicJobPreviewLanguage(BaseModel):
captions_vtt_url: str | None = None
audio_description_vtt_url: str | None = None
accessible_video_mp4_url: str | None = None
audio_description_mp3_url: str | None = None
class PublicJobPreviewResponse(BaseModel):
job_id: str
job_title: str
job_status: str
source_language: str
languages: list[str]
language_outputs: dict[str, PublicJobPreviewLanguage]
class ClientDecisionRequest(BaseModel):
action: Literal["approve", "reject"]
notes: str | None = None
client_name: str | None = None
class ClientDecisionResponse(BaseModel):
status: str
new_job_status: str
# ── Authenticated routes ──────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/share", response_model=ShareTokenResponse, status_code=201)
async def create_share_token(
job_id: str,
request: CreateShareTokenRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Generate a read-only share link for a job."""
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
token_id = secrets.token_hex(32)
now = datetime.utcnow()
expires_at = (now + timedelta(days=request.expires_in_days)) if request.expires_in_days else None
token_doc = {
"_id": token_id,
"job_id": job_id,
"organization_id": job_doc.get("organization_id", ""),
"created_by_user_id": str(current_user.id),
"created_by_email": current_user.email,
"created_at": now,
"expires_at": expires_at,
"is_active": True,
"label": request.label,
}
await db[_TOKENS].insert_one(token_doc)
await audit_logger.log_action(
action=AuditAction.SHARE_TOKEN_CREATE,
description=f"Share token created for job '{job_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"token_id": token_id, "label": request.label, "expires_in_days": request.expires_in_days},
)
return ShareTokenResponse(
id=token_id,
job_id=job_id,
created_by_email=current_user.email,
created_at=now,
expires_at=expires_at,
is_active=True,
label=request.label,
share_url=_share_url(token_id),
)
@router.get("/jobs/{job_id}/share", response_model=ShareTokenListResponse)
async def list_share_tokens(
job_id: str,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all active share tokens for a job."""
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
cursor = db[_TOKENS].find({"job_id": job_id, "is_active": True})
tokens = []
async for doc in cursor:
tokens.append(ShareTokenResponse(
id=doc["_id"],
job_id=doc["job_id"],
created_by_email=doc["created_by_email"],
created_at=doc["created_at"],
expires_at=doc.get("expires_at"),
is_active=doc["is_active"],
label=doc.get("label"),
share_url=_share_url(doc["_id"]),
))
return ShareTokenListResponse(tokens=tokens)
@router.delete("/jobs/{job_id}/share/{token_id}", status_code=204)
async def revoke_share_token(
job_id: str,
token_id: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Revoke (deactivate) a share token."""
result = await db[_TOKENS].update_one(
{"_id": token_id, "job_id": job_id},
{"$set": {"is_active": False}},
)
if result.matched_count == 0:
raise HTTPException(status_code=404, detail="Token not found")
await audit_logger.log_action(
action=AuditAction.SHARE_TOKEN_REVOKE,
description=f"Share token '{token_id}' revoked for job '{job_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"token_id": token_id},
)
# ── Public route (no auth) ────────────────────────────────────────────────────
@router.get("/public/share/{token}", response_model=PublicJobPreviewResponse)
async def get_public_job_preview(
token: str,
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Return read-only job preview for a valid share token. No authentication required."""
token_doc = await db[_TOKENS].find_one({"_id": token, "is_active": True})
if not token_doc:
raise HTTPException(status_code=404, detail="Share link not found or has been revoked")
if token_doc.get("expires_at") and token_doc["expires_at"] < datetime.utcnow():
raise HTTPException(status_code=410, detail="Share link has expired")
job_doc = await db[_JOBS].find_one({"_id": token_doc["job_id"]})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
outputs = job_doc.get("outputs") or {}
language_outputs: dict[str, PublicJobPreviewLanguage] = {}
for lang, lang_output in outputs.items():
if not isinstance(lang_output, dict):
continue
lang_data = PublicJobPreviewLanguage()
if "captions_vtt_gcs" in lang_output:
blob_path = lang_output["captions_vtt_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.captions_vtt_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
if "ad_vtt_gcs" in lang_output:
blob_path = lang_output["ad_vtt_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.audio_description_vtt_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
if "ad_mp3_gcs" in lang_output:
blob_path = lang_output["ad_mp3_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.audio_description_mp3_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
if "accessible_video_gcs" in lang_output:
blob_path = lang_output["accessible_video_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.accessible_video_mp4_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
language_outputs[lang] = lang_data
return PublicJobPreviewResponse(
job_id=str(job_doc["_id"]),
job_title=job_doc.get("title", "Untitled"),
job_status=job_doc.get("status", ""),
source_language=job_doc.get("source", {}).get("language", "en"),
languages=list(outputs.keys()),
language_outputs=language_outputs,
)
@router.post("/public/share/{token}/decision", response_model=ClientDecisionResponse)
async def client_decision(
token: str,
request: ClientDecisionRequest,
http_request: Request,
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Submit client approval or rejection via a share link. No authentication required."""
from ...services.validation import asset_validation_service
token_doc = await db[_TOKENS].find_one({"_id": token, "is_active": True})
if not token_doc:
raise HTTPException(status_code=404, detail="Share link not found or has been revoked")
if token_doc.get("expires_at") and token_doc["expires_at"] < datetime.utcnow():
raise HTTPException(status_code=410, detail="Share link has expired")
job_id = token_doc["job_id"]
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
if job_doc.get("status") != "pending_final_review":
raise HTTPException(
status_code=409,
detail="This job is not currently awaiting client review"
)
now = datetime.utcnow()
by_label = f"client:{request.client_name or 'anonymous'} (share/{token[:8]})"
if request.action == "approve":
is_valid, validation_errors = await asset_validation_service.validate_job_assets(job_doc)
if not is_valid:
raise HTTPException(
status_code=400,
detail=f"Asset validation failed: {'; '.join(validation_errors)}"
)
new_status = "completed"
update = {
"$set": {
"status": new_status,
"review.notes": request.notes or "",
"updated_at": now,
},
"$push": {
"review.history": {
"at": now,
"status": new_status,
"by": by_label,
"notes": request.notes or "",
}
},
}
else:
new_status = "qc_feedback"
update = {
"$set": {
"status": new_status,
"review.notes": request.notes or "",
"review.reviewer_id": by_label,
"updated_at": now,
},
"$push": {
"review.history": {
"at": now,
"status": new_status,
"by": by_label,
"notes": request.notes or "",
}
},
}
result = await db[_JOBS].find_one_and_update(
{"_id": job_id, "status": "pending_final_review"},
update,
return_document=True,
)
if not result:
raise HTTPException(
status_code=409,
detail="Decision could not be submitted — the job status may have changed"
)
await audit_logger.log_action(
action=AuditAction.SHARE_CLIENT_DECISION,
description=f"Client '{request.client_name or 'anonymous'}' submitted decision '{request.action}' for job '{job_id}' via share token",
user=None,
request=http_request,
resource_type="job",
resource_id=job_id,
details={
"action": request.action,
"token": token,
"client_name": request.client_name,
"new_status": new_status,
"notes": request.notes,
},
)
if request.action == "approve":
try:
from ...tasks.notify import notify_client_task
notify_client_task.delay(job_id)
except Exception:
pass
return ClientDecisionResponse(status="ok", new_job_status=new_status)


@ -1,18 +1,18 @@
import asyncio import asyncio
import time import time
from typing import Literal from typing import Literal, Optional
from fastapi import APIRouter, Depends, HTTPException, Query from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi.responses import Response from fastapi.responses import Response
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from ...core.config import settings from ...core.config import settings
from ...core.dependencies import get_current_user
from ...core.logging import get_logger from ...core.logging import get_logger
from ...services import cost_tracker
from ...services.elevenlabs_voices import elevenlabs_voice_service
from ...services.gemini_tts import gemini_tts_service from ...services.gemini_tts import gemini_tts_service
from ...services.elevenlabs_voices import elevenlabs_voice_service
from ...services.tts import tts_service from ...services.tts import tts_service
from ...services import cost_tracker
from ...core.dependencies import get_current_user
logger = get_logger(__name__) logger = get_logger(__name__)
@ -30,20 +30,20 @@ class VoicePreviewRequest(BaseModel):
style_preset: Literal[ style_preset: Literal[
"neutral", "calm", "energetic", "professional", "warm", "documentary", "custom" "neutral", "calm", "energetic", "professional", "warm", "documentary", "custom"
] = "neutral" ] = "neutral"
custom_style_prompt: str | None = None custom_style_prompt: Optional[str] = None
# ElevenLabs-specific # ElevenLabs-specific
stability: float | None = Field(default=None, ge=0.0, le=1.0) stability: Optional[float] = Field(default=None, ge=0.0, le=1.0)
similarity_boost: float | None = Field(default=None, ge=0.0, le=1.0) similarity_boost: Optional[float] = Field(default=None, ge=0.0, le=1.0)
class VoiceInfo(BaseModel): class VoiceInfo(BaseModel):
"""Structured voice information for any provider.""" """Structured voice information for any provider."""
id: str id: str
name: str name: str
description: str | None = None description: Optional[str] = None
preview_url: str | None = None preview_url: Optional[str] = None
labels: dict[str, str] | None = None labels: Optional[dict[str, str]] = None
category: str | None = None category: Optional[str] = None
class ProviderVoicesResponse(BaseModel): class ProviderVoicesResponse(BaseModel):
@ -52,7 +52,7 @@ class ProviderVoicesResponse(BaseModel):
voices: list[VoiceInfo] voices: list[VoiceInfo]
default: str default: str
available: bool = True available: bool = True
error: str | None = None error: Optional[str] = None
class LanguagesResponse(BaseModel): class LanguagesResponse(BaseModel):
@ -87,12 +87,12 @@ class ProviderOptionsResponse(BaseModel):
"""Available TTS configuration options for a provider.""" """Available TTS configuration options for a provider."""
provider: str provider: str
# Gemini-specific # Gemini-specific
models: list[TTSOptionItem] | None = None models: Optional[list[TTSOptionItem]] = None
style_presets: list[TTSOptionItem] | None = None style_presets: Optional[list[TTSOptionItem]] = None
speed_range: SpeedRange | None = None speed_range: Optional[SpeedRange] = None
# ElevenLabs-specific # ElevenLabs-specific
stability_range: FloatRange | None = None stability_range: Optional[FloatRange] = None
similarity_boost_range: FloatRange | None = None similarity_boost_range: Optional[FloatRange] = None
@router.get("/voices", response_model=ProviderVoicesResponse) @router.get("/voices", response_model=ProviderVoicesResponse)


@ -1,151 +0,0 @@
"""VTT version control endpoints."""
from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_job_or_403, get_membership_context
from ...core.config import settings
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.user import User, UserRole
from ...models.vtt_version import (
VttDiffResponse,
VttKind,
VttVersionListResponse,
VttVersionSummary,
)
from ...services import vtt_versioning
from ...services.audit_logger import audit_logger
from ...services.gcs import gcs_service
router = APIRouter(prefix="/jobs", tags=["vtt-versions"])
_EDITABLE_ROLES = (UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)
@router.get("/{job_id}/vtt/versions", response_model=VttVersionListResponse)
async def list_vtt_versions(
job_id: str,
lang: str = Query(...),
kind: VttKind = Query(...),
skip: int = Query(0, ge=0),
limit: int = Query(50, ge=1, le=200),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all VTT versions for a job/lang/kind, newest first."""
await get_job_or_403(job_id, ctx, db) # org check
return await vtt_versioning.list_versions(db, job_id, lang, kind, skip, limit)
@router.get("/{job_id}/vtt/versions/{version}", response_model=dict)
async def get_vtt_version(
job_id: str,
version: int,
lang: str = Query(...),
kind: VttKind = Query(...),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Get full VTT content for a specific version."""
await get_job_or_403(job_id, ctx, db) # org check
v = await vtt_versioning.get_version(db, job_id, lang, kind, version)
if not v:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Version not found")
return {
"job_id": v.job_id,
"lang": v.lang,
"kind": v.kind,
"version": v.version,
"content": v.content,
"gcs_uri": v.gcs_uri,
"created_at": v.created_at.isoformat(),
"created_by": v.created_by.dict(),
"note": v.note,
"parent_version": v.parent_version,
"cue_count": v.cue_count,
"byte_size": v.byte_size,
}
@router.get("/{job_id}/vtt/versions/diff", response_model=VttDiffResponse)
async def diff_vtt_versions(
job_id: str,
lang: str = Query(...),
kind: VttKind = Query(...),
from_version: int = Query(..., alias="from"),
to_version: int = Query(..., alias="to"),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Line-level diff between two versions of a VTT file."""
await get_job_or_403(job_id, ctx, db) # org check
v_from = await vtt_versioning.get_version(db, job_id, lang, kind, from_version)
v_to = await vtt_versioning.get_version(db, job_id, lang, kind, to_version)
if not v_from:
raise HTTPException(status_code=404, detail=f"Version {from_version} not found")
if not v_to:
raise HTTPException(status_code=404, detail=f"Version {to_version} not found")
return vtt_versioning.diff_versions(job_id, lang, kind, v_from, v_to)
@router.post(
"/{job_id}/vtt/versions/{version}/restore",
response_model=VttVersionSummary,
status_code=status.HTTP_201_CREATED,
)
async def restore_vtt_version(
job_id: str,
version: int,
lang: str = Query(...),
kind: VttKind = Query(...),
http_request: Request = None,
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""
Restore a previous version as the new live VTT.
Non-destructive: creates a new version entry whose content mirrors the old one,
then overwrites the live GCS file.
"""
await get_job_or_403(job_id, ctx, db) # org check
src = await vtt_versioning.get_version(db, job_id, lang, kind, version)
if not src:
raise HTTPException(status_code=404, detail="Version not found")
# Create new version snapshot (this also bumps the counter)
new_ver = await vtt_versioning.restore_version(db, job_id, lang, kind, version, current_user)
# Overwrite the live file in GCS so the QC editor sees the restored content
live_path = f"{job_id}/{lang}/{'captions' if kind == 'captions' else 'ad'}.vtt"
try:
await gcs_service.upload_text_to_gcs(src.content, live_path, "text/vtt")
except Exception as exc:
raise HTTPException(
status_code=500,
detail=f"Version snapshot created (v{new_ver.version}) but live file update failed: {exc}",
) from None
# Update the GCS URI pointer in the job document
gcs_uri_key = "captions_vtt_gcs" if kind == "captions" else "ad_vtt_gcs"
new_gcs_uri = f"gs://{settings.gcs_bucket}/{live_path}"
await db.jobs.update_one(
{"_id": job_id},
{"$set": {f"outputs.{lang}.{gcs_uri_key}": new_gcs_uri}},
)
await audit_logger.log_action(
action=AuditAction.VTT_EDIT,
description=f"VTT restored to v{version} for job {job_id} lang={lang} kind={kind}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "kind": kind, "restored_from_version": version, "new_version": new_ver.version},
)
return new_ver
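
One detail in the removed file is worth noting: `/{job_id}/vtt/versions/{version}` is declared before `/{job_id}/vtt/versions/diff`. FastAPI evaluates path operations in declaration order, so a request for `.../versions/diff` would likely be captured by the `{version}` route and then fail integer validation. The block below is a simplified standard-library model of first-match routing, not Starlette's actual router; `compile_route` and `resolve` are hypothetical helpers.

```python
import re


def compile_route(template: str) -> re.Pattern:
    """Turn '/a/{name}' into a regex where {name} matches any single path segment."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def resolve(routes: list, path: str):
    """Return (template, params) for the first route that matches, else None."""
    for template in routes:
        match = compile_route(template).match(path)
        if match:
            # Type validation (e.g. version: int) would only happen after this point.
            return template, match.groupdict()
    return None


as_declared = ["/{job_id}/vtt/versions/{version}", "/{job_id}/vtt/versions/diff"]
reordered = list(reversed(as_declared))
```

In this model, `resolve(as_declared, "/j1/vtt/versions/diff")` selects the `{version}` template with `version="diff"`, while `resolve(reordered, ...)` reaches the literal `diff` route. Declaring the fixed path first is the usual remedy.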


@ -5,146 +5,107 @@ Provides WebSocket endpoints for:
1. Individual job status updates: /ws/jobs/{job_id} 1. Individual job status updates: /ws/jobs/{job_id}
2. Job list updates: /ws/jobs (all jobs for authenticated user) 2. Job list updates: /ws/jobs (all jobs for authenticated user)
""" """
import asyncio
import logging import logging
from typing import Optional
from fastapi import ( from fastapi import APIRouter, WebSocket, WebSocketDisconnect, HTTPException, Depends, Query
APIRouter,
Depends,
Query,
WebSocket,
WebSocketDisconnect,
)
from fastapi.security import HTTPBearer from fastapi.security import HTTPBearer
from ...core.authz import PLATFORM_ADMIN_ROLES, _cached_memberships
from ...core.database import get_database
from ...models.user import UserRole
from ...services.websocket import ( from ...services.websocket import (
ConnectionManager,
authenticate_websocket,
connection_manager, connection_manager,
authenticate_websocket,
get_connection_manager, get_connection_manager,
ConnectionManager
) )
from ...models.job import Job
from ...core.database import get_database
from ...core.dependencies import get_current_user
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
router = APIRouter(tags=["WebSocket"]) router = APIRouter(tags=["WebSocket"])
security = HTTPBearer() security = HTTPBearer()
# Close codes that indicate a permanent auth/permission failure — frontend must NOT retry
_TERMINAL_CLOSE_CODES = {4001, 4003, 4004, 4403}
# Seconds between server-side keepalive frames.
# Must be < Apache mod_proxy_wstunnel idle timeout.
# Mod Comms incident 2026-03-18: 25s was insufficient; 20s is safe.
_KEEPALIVE_INTERVAL_S = 20
async def _resolve_user_and_org(websocket: WebSocket, user_id: str, db):
"""
Fetch user document and resolve org memberships from cache.
Returns (user_doc, memberships_dict) or closes the socket and returns (None, None).
"""
user = await db["users"].find_one({"_id": user_id})
if not user:
try:
from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass
if not user:
await websocket.close(code=4001, reason="User not found")
return None, None
is_platform_admin = UserRole(user.get("role", "")) in PLATFORM_ADMIN_ROLES
if is_platform_admin:
return user, None # None memberships = unrestricted
memberships = await _cached_memberships(user_id, db)
return user, memberships
def _can_access_org(org_id: str | None, memberships: dict | None) -> bool:
"""Return True if user (with these memberships) may access the given org_id."""
if memberships is None:
return True # platform admin
if not org_id:
return True # legacy job without org: allow (further checks done below if needed)
return org_id in memberships
@router.websocket("/ws/jobs/{job_id}") @router.websocket("/ws/jobs/{job_id}")
async def websocket_job_status( async def websocket_job_status(
websocket: WebSocket, websocket: WebSocket,
job_id: str, job_id: str,
token: str | None = Query(None), token: Optional[str] = Query(None),
manager: ConnectionManager = Depends(get_connection_manager) manager: ConnectionManager = Depends(get_connection_manager)
): ):
""" """
WebSocket endpoint for real-time job status updates. WebSocket endpoint for real-time job status updates
Usage: Usage:
- Connect: ws://localhost:8000/api/v1/ws/jobs/{job_id}?token={jwt_token} - Connect: ws://localhost:8000/api/v1/ws/jobs/{job_id}?token={jwt_token}
- Receives: Real-time status updates for the specific job - Receives: Real-time status updates for the specific job
Close codes: Message format:
4001 user not found {
4003 role-based access denied "type": "job_status_update",
4004 job not found "data": {
4403 org membership access denied (do not retry) "job_id": "...",
"status": "processing",
"updated_at": "2023-...",
"message": "Processing video...",
"progress": 45
}
}
""" """
# Authenticate the WebSocket connection
user_id = await authenticate_websocket(websocket, token) user_id = await authenticate_websocket(websocket, token)
if not user_id: if not user_id:
return return
try: try:
# Verify user has access to this job
db = await get_database() db = await get_database()
jobs_collection = db["jobs"]
job = await db["jobs"].find_one({"_id": job_id})
job = await jobs_collection.find_one({"_id": job_id})
if not job: if not job:
await websocket.close(code=4004, reason="Job not found") await websocket.close(code=4004, reason="Job not found")
return return
user, memberships = await _resolve_user_and_org(websocket, user_id, db) # Check permissions - users can only access their own jobs unless they're admin/reviewer
if user is None: user = await db["users"].find_one({"_id": user_id})
return # socket already closed inside helper if not user:
try:
# Role-based client restriction from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass # Invalid ObjectId format
if not user:
await websocket.close(code=4001, reason="User not found")
return
# Check access permissions
if user["role"] == "client" and job.get("created_by") != user_id: if user["role"] == "client" and job.get("created_by") != user_id:
await websocket.close(code=4003, reason="Access denied") await websocket.close(code=4003, reason="Access denied")
return return
# Org membership check # Connect to job status updates
job_org = job.get("organization_id")
if not _can_access_org(job_org, memberships):
await websocket.close(code=4403, reason="Org access denied")
return
await manager.connect_job_status(websocket, user_id, job_id) await manager.connect_job_status(websocket, user_id, job_id)
# Keep connection alive and handle incoming messages
while True: while True:
try: try:
# Wait up to _KEEPALIVE_INTERVAL_S for a client message. # Wait for incoming WebSocket messages (for heartbeat, etc.)
# On timeout send a keepalive frame so the proxy idle timer resets. message = await websocket.receive_text()
message = await asyncio.wait_for(
websocket.receive_text(),
timeout=_KEEPALIVE_INTERVAL_S,
)
logger.debug(f"Received WebSocket message from user {user_id}: {message}") logger.debug(f"Received WebSocket message from user {user_id}: {message}")
# Handle heartbeat or other client messages if needed
if message == "ping": if message == "ping":
await websocket.send_text("pong") await websocket.send_text("pong")
except TimeoutError:
await websocket.send_text("keepalive")
except WebSocketDisconnect: except WebSocketDisconnect:
break break
except Exception as e: except Exception as e:
logger.error(f"Error in WebSocket message handling: {e}") logger.error(f"Error in WebSocket message handling: {e}")
break break
except WebSocketDisconnect: except WebSocketDisconnect:
pass pass
except Exception as e: except Exception as e:
@ -156,54 +117,75 @@ async def websocket_job_status(
@router.websocket("/ws/jobs") @router.websocket("/ws/jobs")
async def websocket_job_list( async def websocket_job_list(
websocket: WebSocket, websocket: WebSocket,
token: str | None = Query(None), token: Optional[str] = Query(None),
manager: ConnectionManager = Depends(get_connection_manager) manager: ConnectionManager = Depends(get_connection_manager)
): ):
""" """
WebSocket endpoint for real-time job list updates. WebSocket endpoint for real-time job list updates
Usage: Usage:
- Connect: ws://localhost:8000/api/v1/ws/jobs?token={jwt_token} - Connect: ws://localhost:8000/api/v1/ws/jobs?token={jwt_token}
- Receives: Real-time status updates for all jobs the user can access - Receives: Real-time status updates for all jobs the user can access
Only events for jobs in the user's accessible orgs are delivered. Message format:
{
"type": "job_list_update",
"data": {
"job_id": "...",
"status": "processing",
"updated_at": "2023-...",
"message": "Processing video...",
"progress": 45
}
}
""" """
# Authenticate the WebSocket connection
user_id = await authenticate_websocket(websocket, token) user_id = await authenticate_websocket(websocket, token)
if not user_id: if not user_id:
return return
try: try:
# Verify user exists
logger.info(f"WebSocket: Looking up user {user_id} in database") logger.info(f"WebSocket: Looking up user {user_id} in database")
db = await get_database() db = await get_database()
user, memberships = await _resolve_user_and_org(websocket, user_id, db) # Try looking up user by string ID first, then by ObjectId
if user is None: user = await db["users"].find_one({"_id": user_id})
return # socket already closed inside helper if not user:
try:
from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass # Invalid ObjectId format
if not user:
logger.warning(f"WebSocket: User {user_id} not found in database (tried both string and ObjectId)")
await websocket.close(code=4001, reason="User not found")
return
logger.info(f"WebSocket: User {user_id} found, role: {user.get('role', 'unknown')}") logger.info(f"WebSocket: User {user_id} found, role: {user.get('role', 'unknown')}")
accessible_org_ids = None if memberships is None else list(memberships.keys()) logger.info(f"WebSocket: User {user_id} found, connecting to job list updates")
await manager.connect_job_list(websocket, user_id, accessible_org_ids=accessible_org_ids) # Connect to job list updates
await manager.connect_job_list(websocket, user_id)
# Keep connection alive and handle incoming messages
while True: while True:
try: try:
message = await asyncio.wait_for( # Wait for incoming WebSocket messages
websocket.receive_text(), message = await websocket.receive_text()
timeout=_KEEPALIVE_INTERVAL_S,
)
logger.debug(f"Received WebSocket message from user {user_id}: {message}") logger.debug(f"Received WebSocket message from user {user_id}: {message}")
# Handle heartbeat or other client messages if needed
if message == "ping": if message == "ping":
await websocket.send_text("pong") await websocket.send_text("pong")
except TimeoutError:
await websocket.send_text("keepalive")
except WebSocketDisconnect: except WebSocketDisconnect:
break break
except Exception as e: except Exception as e:
logger.error(f"Error in WebSocket message handling: {e}") logger.error(f"Error in WebSocket message handling: {e}")
break break
except WebSocketDisconnect: except WebSocketDisconnect:
pass pass
except Exception as e: except Exception as e:
@ -214,15 +196,19 @@ async def websocket_job_list(
@router.get("/ws/status") @router.get("/ws/status")
async def websocket_status(): async def websocket_status():
"""Get WebSocket connection status and statistics (debug/monitoring).""" """
Get WebSocket connection status and statistics
Useful for debugging and monitoring
"""
stats = { stats = {
"active_connections": len(connection_manager.active_connections), "active_connections": len(connection_manager.active_connections),
"job_subscriptions": len(connection_manager.job_subscriptions), "job_subscriptions": len(connection_manager.job_subscriptions),
"global_subscriptions": len(connection_manager.global_subscriptions), "global_subscriptions": len(connection_manager.global_subscriptions),
"redis_connected": connection_manager.redis_client is not None, "redis_connected": connection_manager.redis_client is not None,
"subscriber_running": ( "subscriber_running": (
connection_manager.subscriber_task is not None and connection_manager.subscriber_task is not None and
not connection_manager.subscriber_task.done() not connection_manager.subscriber_task.done()
) )
} }
return stats
return stats
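The keepalive change in the two WebSocket handlers above can be tried in isolation. The following is a minimal sketch, not the project's code: `serve`, `demo`, the queue-based `receive`, the `None` disconnect sentinel and the 0.05 s interval are all assumptions made for illustration, since the real `_KEEPALIVE_INTERVAL_S` value is not shown in the diff. It catches `asyncio.TimeoutError`, which is the same class as the builtin `TimeoutError` on Python 3.11+ and a separate class on older versions.

```python
import asyncio

KEEPALIVE_INTERVAL_S = 0.05  # assumed demo value; the real constant is not shown in the diff


async def serve(receive, send, interval=KEEPALIVE_INTERVAL_S):
    """Answer "ping" with "pong"; send "keepalive" whenever the client stays silent."""
    while True:
        try:
            # Wait at most `interval` seconds for the next client message.
            message = await asyncio.wait_for(receive(), timeout=interval)
        except asyncio.TimeoutError:
            # Nothing arrived in time: emit a frame so an idle proxy timer resets.
            await send("keepalive")
            continue
        if message is None:  # hypothetical stand-in for WebSocketDisconnect
            break
        if message == "ping":
            await send("pong")


async def demo():
    inbox = asyncio.Queue()  # plays the role of websocket.receive_text()
    sent = []                # records what would go to websocket.send_text()

    async def send(text):
        sent.append(text)

    task = asyncio.create_task(serve(inbox.get, send))
    await inbox.put("ping")
    await asyncio.sleep(0.12)  # stay silent long enough for at least one keepalive
    await inbox.put(None)      # simulate the client disconnecting
    await task
    return sent
```

Running `asyncio.run(demo())` returns the list of frames the handler would have sent: first the reply to the ping, then the keepalive frames produced while the client was idle.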

Binary file not shown.



@ -11,6 +11,7 @@ Provides:
import json import json
from dataclasses import dataclass from dataclasses import dataclass
from typing import Optional
from fastapi import Depends, HTTPException, status from fastapi import Depends, HTTPException, status
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
@ -63,10 +64,10 @@ async def _cached_memberships(
db: AsyncIOMotorDatabase, db: AsyncIOMotorDatabase,
) -> dict[str, OrgRole]: ) -> dict[str, OrgRole]:
"""Load memberships, with Redis cache (60s TTL).""" """Load memberships, with Redis cache (60s TTL)."""
cache_key = f"mem:user:{user_id}"
try: try:
redis = await get_redis() redis = get_redis()
if redis: if redis:
cache_key = f"mem:user:{user_id}"
cached = await redis.get(cache_key) cached = await redis.get(cache_key)
if cached: if cached:
raw = json.loads(cached) raw = json.loads(cached)
@ -77,7 +78,7 @@ async def _cached_memberships(
memberships = await _load_memberships(user_id, db) memberships = await _load_memberships(user_id, db)
try: try:
redis = await get_redis() redis = get_redis()
if redis: if redis:
await redis.setex( await redis.setex(
cache_key, cache_key,
@ -158,7 +159,7 @@ class OrgScopedQuery:
def filter( def filter(
self, self,
base_query: dict, base_query: dict,
org_id: str | None = None, org_id: Optional[str] = None,
org_field: str = "organization_id", org_field: str = "organization_id",
) -> dict: ) -> dict:
if self.ctx.is_platform_admin: if self.ctx.is_platform_admin:
@ -182,50 +183,6 @@ class OrgScopedQuery:
return {**base_query, org_field: {"$in": accessible}} return {**base_query, org_field: {"$in": accessible}}
def assert_user_in_org(
ctx: "MembershipContext",
org_id: str,
min_role: OrgRole = OrgRole.VIEWER,
) -> None:
"""Raise 403 if ctx user does not have min_role in org_id. Platform admins always pass."""
if not ctx.can_access_org(org_id, min_role):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access to this organization is not permitted",
)
async def get_job_or_403(
job_id: str,
ctx: "MembershipContext",
db: AsyncIOMotorDatabase,
) -> dict:
"""Load job document and verify ctx user can access its organization. Returns 404 for missing jobs."""
job_doc = await db.jobs.find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
org_id = job_doc.get("organization_id")
if not org_id:
# Legacy job without org: try resolving via project
project_id = job_doc.get("project_id")
if project_id:
project = await db.projects.find_one({"_id": project_id}, {"client_id": 1})
if project:
org_id = project.get("client_id")
if org_id:
if not ctx.can_access_org(org_id):
# Return 404 to avoid leaking existence of cross-org jobs
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
else:
# Truly legacy job (no project, no org): only the original uploader or admin can access
if not ctx.is_platform_admin and job_doc.get("client_id") != str(ctx.user.id):
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
return job_doc
async def bump_user_membership_cache(user_id: str) -> None: async def bump_user_membership_cache(user_id: str) -> None:
"""Invalidate the Redis membership cache for a user (call on any membership write).""" """Invalidate the Redis membership cache for a user (call on any membership write)."""
try: try:
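The `OrgScopedQuery.filter` hunk above only shows the platform-admin check and the final `$in` return. As a rough sketch of how such an org-scoped filter could behave, here is a standalone function. The name `scoped_filter`, its flat argument list, and the handling of an explicit `org_id` are assumptions for illustration; the diff does not show those branches.

```python
from typing import Optional


def scoped_filter(
    base_query: dict,
    accessible_org_ids: list,
    is_platform_admin: bool = False,
    org_id: Optional[str] = None,
    org_field: str = "organization_id",
) -> dict:
    """Return a copy of base_query restricted to the orgs the caller may see."""
    if is_platform_admin:
        # Assumed behaviour: admins are unrestricted unless they ask for one org.
        return {**base_query, org_field: org_id} if org_id else dict(base_query)
    if org_id is not None:
        if org_id not in accessible_org_ids:
            # Assumed behaviour: an empty $in list matches no documents.
            return {**base_query, org_field: {"$in": []}}
        return {**base_query, org_field: org_id}
    # This branch mirrors the return statement visible in the diff.
    return {**base_query, org_field: {"$in": list(accessible_org_ids)}}
```

Building the restriction into the query, rather than filtering results afterwards, means a forgotten check fails closed: a caller without access gets an empty result instead of another tenant's documents.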


@ -6,7 +6,6 @@ class Settings(BaseSettings):
# App # App
app_env: str = "dev" app_env: str = "dev"
api_base_url: str = "http://localhost:8000" api_base_url: str = "http://localhost:8000"
app_url: str = "https://optical-dev.oliver.solutions/video-accessibility"
# Auth # Auth
jwt_secret: str jwt_secret: str
@ -23,14 +22,13 @@ class Settings(BaseSettings):
# Redis # Redis
redis_url: str redis_url: str
# Celery # Celery
celery_broker_url: str = "" celery_broker_url: str = ""
celery_result_backend: str = "" celery_result_backend: str = ""
# GCP # GCP
gcp_project_id: str gcp_project_id: str
gcp_location: str = "us-central1"
gcs_bucket: str = "accessible-video" gcs_bucket: str = "accessible-video"
google_application_credentials: str = "" google_application_credentials: str = ""
@ -38,7 +36,7 @@ class Settings(BaseSettings):
gemini_api_key: str gemini_api_key: str
elevenlabs_api_key: str = "" elevenlabs_api_key: str = ""
google_tts_credentials: str = "" google_tts_credentials: str = ""
# TTS Voice Configuration # TTS Voice Configuration
tts_provider: str = "gemini" # "gemini", "google", or "elevenlabs" tts_provider: str = "gemini" # "gemini", "google", or "elevenlabs"
google_tts_voices: dict[str, str] = { google_tts_voices: dict[str, str] = {
@ -52,7 +50,7 @@ class Settings(BaseSettings):
elevenlabs_voices: dict[str, str] = {} elevenlabs_voices: dict[str, str] = {}
# Gemini TTS Configuration # Gemini TTS Configuration
gemini_tts_model: str = "gemini-3.1-flash-tts-preview" gemini_tts_model: str = "gemini-2.5-flash-preview-tts"
gemini_tts_default_voice: str = "Kore" gemini_tts_default_voice: str = "Kore"
gemini_tts_voices: list[str] = [ gemini_tts_voices: list[str] = [
"Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede", "Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede",
@ -95,24 +93,7 @@ class Settings(BaseSettings):
"sv": "sv-SE", "sv": "sv-SE",
"es-419": "es-US", "es-419": "es-US",
"pt-BR": "pt-BR", "pt-BR": "pt-BR",
"fr-CA": "fr-CA", "fr-CA": "fr-CA"
# Explicit region variants (added for locale-aware glossary support)
"de-DE": "de-DE",
"en-US": "en-US",
"en-GB": "en-GB",
"en-CA": "en-CA",
"es-ES": "es-ES",
"es-MX": "es-US",
"fr-FR": "fr-FR",
"it-IT": "it-IT",
"ja-JP": "ja-JP",
"ko-KR": "ko-KR",
"nl-NL": "nl-NL",
"pl-PL": "pl-PL",
"cs-CZ": "cs-CZ",
"tr-TR": "tr-TR",
"id-ID": "id-ID",
"pt-PT": "pt-PT",
} }
gemini_tts_language_names: dict[str, str] = { gemini_tts_language_names: dict[str, str] = {
"en": "English", "en": "English",
@ -148,24 +129,7 @@ class Settings(BaseSettings):
"sv": "Swedish", "sv": "Swedish",
"es-419": "Spanish (Latin America)", "es-419": "Spanish (Latin America)",
"pt-BR": "Portuguese (Brazil)", "pt-BR": "Portuguese (Brazil)",
"fr-CA": "French (Canada)", "fr-CA": "French (Canada)"
# Explicit region variants
"de-DE": "German (Germany)",
"en-US": "English (US)",
"en-GB": "English (UK)",
"en-CA": "English (Canada)",
"es-ES": "Spanish (Spain)",
"es-MX": "Spanish (Mexico)",
"fr-FR": "French (France)",
"it-IT": "Italian (Italy)",
"ja-JP": "Japanese (Japan)",
"ko-KR": "Korean (Korea)",
"nl-NL": "Dutch (Netherlands)",
"pl-PL": "Polish (Poland)",
"cs-CZ": "Czech (Czech Republic)",
"tr-TR": "Turkish (Turkey)",
"id-ID": "Indonesian (Indonesia)",
"pt-PT": "Portuguese (Portugal)",
} }
gemini_tts_preview_samples: dict[str, str] = { gemini_tts_preview_samples: dict[str, str] = {
"en": "This is a preview of the audio description voice.", "en": "This is a preview of the audio description voice.",
@ -201,30 +165,13 @@ class Settings(BaseSettings):
"sv": "Det här är en förhandsgranskning av ljudbeskrivningsrösten.", "sv": "Det här är en förhandsgranskning av ljudbeskrivningsrösten.",
"es-419": "Esta es una vista previa de la voz de audiodescripción.", "es-419": "Esta es una vista previa de la voz de audiodescripción.",
"pt-BR": "Esta é uma prévia da voz da audiodescrição.", "pt-BR": "Esta é uma prévia da voz da audiodescrição.",
"fr-CA": "Ceci est un aperçu de la voix de l'audiodescription.", "fr-CA": "Ceci est un aperçu de la voix de l'audiodescription."
# Explicit region variants
"de-DE": "Dies ist eine Vorschau der Audiodeskriptionsstimme.",
"en-US": "This is a preview of the audio description voice.",
"en-GB": "This is a preview of the audio description voice.",
"en-CA": "This is a preview of the audio description voice.",
"es-ES": "Esta es una vista previa de la voz de audiodescripción.",
"es-MX": "Esta es una vista previa de la voz de audiodescripción.",
"fr-FR": "Ceci est un aperçu de la voix de l'audiodescription.",
"it-IT": "Questa è un'anteprima della voce dell'audiodescrizione.",
"ja-JP": "これは音声解説の声のプレビューです。",
"ko-KR": "이것은 오디오 설명 음성의 미리보기입니다.",
"nl-NL": "Dit is een voorbeeld van de audiodescriptiestem.",
"pl-PL": "To jest podgląd głosu audiodeskrypcji.",
"cs-CZ": "Toto je náhled hlasu zvukového popisu.",
"tr-TR": "Bu, sesli betimleme sesinin bir önizlemesidir.",
"id-ID": "Ini adalah pratinjau suara deskripsi audio.",
"pt-PT": "Esta é uma pré-visualização da voz da audiodescrição.",
} }
# Gemini TTS Model Options # Gemini TTS Model Options
gemini_tts_models: dict[str, str] = { gemini_tts_models: dict[str, str] = {
"flash": "gemini-3.1-flash-tts-preview", # Fast, cost-efficient (Preview) "flash": "gemini-2.5-flash-preview-tts", # Fast, cost-efficient
"pro": "gemini-2.5-pro-tts", # Higher quality (GA) "pro": "gemini-2.5-pro-preview-tts", # Higher quality
} }
# Gemini TTS Style Presets - prompts prepended to text for style control # Gemini TTS Style Presets - prompts prepended to text for style control
@ -249,14 +196,6 @@ class Settings(BaseSettings):
whisper_sentence_gap_threshold: float = 0.5 # Gap duration to classify as sentence boundary whisper_sentence_gap_threshold: float = 0.5 # Gap duration to classify as sentence boundary
whisper_phrase_gap_threshold: float = 0.3 # Gap duration to classify as phrase boundary whisper_phrase_gap_threshold: float = 0.3 # Gap duration to classify as phrase boundary
whisper_min_gap_threshold: float = 0.15 # Minimum gap duration to consider whisper_min_gap_threshold: float = 0.15 # Minimum gap duration to consider
# Forward-preferred snap windows (A2)
whisper_snap_forward_window: float = 4.0 # Prefer boundary up to N seconds ahead of Gemini point
whisper_snap_backward_window: float = 1.5 # Fall back to boundary up to N seconds behind
# Adaptive silence buffer (A1)
ad_silence_buffer_default: float = 0.5 # Base silence duration (s) before/after AD audio
ad_silence_buffer_min_after: float = 0.1 # Minimum silence after AD audio
# Minimum gap required at the chosen pause point (A3)
ad_min_acceptable_gap: float = 0.2 # Seconds; points with shorter gaps trigger forward search
# Cloud Run Service URLs (empty = use local processing) # Cloud Run Service URLs (empty = use local processing)
# When set, CPU-intensive work is offloaded to Cloud Run with autoscaling # When set, CPU-intensive work is offloaded to Cloud Run with autoscaling
@ -275,10 +214,11 @@ class Settings(BaseSettings):
ffmpeg_worker_concurrency: int = 4 # FFmpeg tasks on main worker ffmpeg_worker_concurrency: int = 4 # FFmpeg tasks on main worker
tts_worker_concurrency: int = 8 # TTS worker tts_worker_concurrency: int = 8 # TTS worker
# Email (Mailgun) # Email (Mailgun — primary; sendgrid_api_key kept for backward compat)
mailgun_api_key: str = "" mailgun_api_key: str = ""
mailgun_domain: str = "mg.oliver.solutions" mailgun_domain: str = "mg.oliver.solutions"
mailgun_from: str = "noreply@mg.oliver.solutions" mailgun_from: str = "noreply@mg.oliver.solutions"
sendgrid_api_key: str = ""
email_from: str = "noreply@mg.oliver.solutions" email_from: str = "noreply@mg.oliver.solutions"
client_base_url: str client_base_url: str
@ -297,10 +237,6 @@ class Settings(BaseSettings):
cost_tracker_source_app: str = "video-accessibility" cost_tracker_source_app: str = "video-accessibility"
cost_tracker_enabled: bool = True cost_tracker_enabled: bool = True
# Upload limits (T-14 — single source of truth)
upload_max_video_bytes: int = 2 * 1024 * 1024 * 1024 # 2GB
upload_signed_url_ttl_hours: int = 24 # signed URL lifetime
# CORS - comma-separated list of allowed origins # CORS - comma-separated list of allowed origins
cors_origins: str = "http://localhost:5173,http://localhost:5174,http://localhost:3000,http://localhost:6001" cors_origins: str = "http://localhost:5173,http://localhost:5174,http://localhost:3000,http://localhost:6001"
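Because `cors_origins` is stored as one comma-separated string, it has to be split before it can be handed to a CORS middleware. A minimal sketch of that step follows; the helper name `parse_cors_origins` is hypothetical and does not appear in the diff.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origins string, dropping blanks and stray whitespace."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


# The default value from the settings class above.
DEFAULT_CORS_ORIGINS = (
    "http://localhost:5173,http://localhost:5174,"
    "http://localhost:3000,http://localhost:6001"
)

origins = parse_cors_origins(DEFAULT_CORS_ORIGINS)
```

Stripping each entry keeps a value such as `"a.example, b.example"` from producing an origin with a leading space, which would never match a browser's `Origin` header.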


@ -56,7 +56,7 @@ async def create_indexes():
await db.audit_logs.create_index([("resource_type", 1), ("resource_id", 1)]) # Resource tracking await db.audit_logs.create_index([("resource_type", 1), ("resource_id", 1)]) # Resource tracking
await db.audit_logs.create_index([("ip_address", 1), ("timestamp", -1)]) # IP-based analysis await db.audit_logs.create_index([("ip_address", 1), ("timestamp", -1)]) # IP-based analysis
await db.audit_logs.create_index([("success", 1), ("timestamp", -1)]) # Failed operations await db.audit_logs.create_index([("success", 1), ("timestamp", -1)]) # Failed operations
# Text search index for description and details # Text search index for description and details
await db.audit_logs.create_index([ await db.audit_logs.create_index([
("description", "text"), ("description", "text"),
@ -64,19 +64,9 @@ async def create_indexes():
("error_message", "text") ("error_message", "text")
]) ])
# Per-language QC assignment index — for linguist queue queries
await db.jobs.create_index([("qc_assignments.linguist_id", 1), ("qc_assignments.status", 1)])
# Review notes collection indexes # Review notes collection indexes
await db.review_notes.create_index([("job_id", 1), ("asset_key", 1)]) await db.review_notes.create_index([("job_id", 1), ("asset_key", 1)])
await db.review_notes.create_index([("job_id", 1), ("asset_key", 1), ("timestamp_seconds", 1)]) await db.review_notes.create_index([("job_id", 1), ("asset_key", 1), ("timestamp_seconds", 1)])
await db.review_notes.create_index([("user_id", 1)]) await db.review_notes.create_index([("user_id", 1)])
# VTT versions collection indexes
await db.vtt_versions.create_index(
[("job_id", 1), ("lang", 1), ("kind", 1), ("version", -1)],
unique=True,
)
await db.vtt_versions.create_index([("job_id", 1), ("created_at", -1)])
logger.info("Database indexes created successfully") logger.info("Database indexes created successfully")


@ -1,16 +1,18 @@
from typing import Optional
from fastapi import Depends, HTTPException, Request, status from fastapi import Depends, HTTPException, Request, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from ..models.user import User, UserRole from ..models.user import User, UserRole
from .config import settings
from .database import get_database from .database import get_database
from .security import decode_token from .security import decode_token
security = HTTPBearer() security = HTTPBearer()
# Only admins bypass tenant isolation; other staff are scoped by team membership # Roles that see all jobs (no tenant isolation)
STAFF_ROLES = {UserRole.ADMIN} STAFF_ROLES = {UserRole.ADMIN, UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION}
async def get_current_user( async def get_current_user(
@ -19,13 +21,6 @@ async def get_current_user(
) -> User: ) -> User:
token = credentials.credentials token = credentials.credentials
payload = decode_token(token) payload = decode_token(token)
if payload.get("type") == "refresh":
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials",
)
user_id: str = payload.get("sub") user_id: str = payload.get("sub")
if user_id is None: if user_id is None:
@ -41,12 +36,7 @@ async def get_current_user(
detail="User not found", detail="User not found",
) )
user = User(**user_doc) return User(**user_doc)
# Attach org_ids hint from token as transient attribute (never used for authz)
token_org_ids = payload.get("org_ids", [])
if token_org_ids:
user.__dict__["org_ids"] = token_org_ids
return user
def require_role(required_role: UserRole): def require_role(required_role: UserRole):
@ -76,7 +66,7 @@ def require_roles(*required_roles: UserRole):
async def get_current_user_optional( async def get_current_user_optional(
request: Request, request: Request,
db: AsyncIOMotorDatabase = Depends(get_database), db: AsyncIOMotorDatabase = Depends(get_database),
) -> User | None: ) -> Optional[User]:
authorization: str = request.headers.get("Authorization") authorization: str = request.headers.get("Authorization")
if not authorization: if not authorization:
return None return None
@ -87,9 +77,6 @@ async def get_current_user_optional(
return None return None
payload = decode_token(token) payload = decode_token(token)
if payload.get("type") == "refresh":
return None
user_id: str = payload.get("sub") user_id: str = payload.get("sub")
if user_id is None: if user_id is None:
@ -107,28 +94,21 @@ async def get_current_user_optional(
async def get_accessible_project_ids( async def get_accessible_project_ids(
user: User, user: User,
db: AsyncIOMotorDatabase, db: AsyncIOMotorDatabase,
) -> list[str] | None: ) -> Optional[list[str]]:
""" """
Returns project IDs the user may access, or None meaning "see everything". Returns project IDs the user may access, or None meaning "see everything".
- Admin None (unrestricted) - Staff / Admin None (unrestricted)
- Staff (REVIEWER/LINGUIST/PRODUCTION) scoped by team membership; - Otherwise projects in orgs where the user holds any membership
if not yet assigned to any team, falls back to None (see all) (falls back to legacy pm_client_ids/team lookups if no memberships found)
so existing staff aren't locked out before teams are configured
- PM projects in accessible orgs/clients (pm_client_ids legacy)
- CLIENT projects in orgs where the user holds any membership
""" """
if user.role in STAFF_ROLES: if user.role in STAFF_ROLES:
return None return None
# Primary path: use memberships collection (Phase 3 SaaS)
user_id = str(user.id) user_id = str(user.id)
membership_cursor = db.memberships.find({"user_id": user_id}, {"organization_id": 1})
# Primary path: use Redis-cached memberships (60s TTL, same cache as authz.py) org_ids = [doc["organization_id"] async for doc in membership_cursor]
from .authz import (
_cached_memberships, # local import to avoid circular dep at module level
)
memberships_map = await _cached_memberships(user_id, db)
org_ids = list(memberships_map.keys())
if org_ids: if org_ids:
projects = await db.projects.find( projects = await db.projects.find(
@ -137,98 +117,29 @@ async def get_accessible_project_ids(
).to_list(None) ).to_list(None)
return [str(p["_id"]) for p in projects] return [str(p["_id"]) for p in projects]
# Legacy fallback: team membership (used by REVIEWER/LINGUIST/PRODUCTION and legacy CLIENT) # Legacy fallback (pre-backfill) — keeps the app working before migration runs
teams = await db.teams.find( if user.role == UserRole.PROJECT_MANAGER:
{"member_user_ids": user_id}, client_ids = user.pm_client_ids or []
{"client_id": 1}, if not client_ids:
).to_list(None) return []
client_ids = list({t["client_id"] for t in teams})
if client_ids:
projects = await db.projects.find( projects = await db.projects.find(
{"client_id": {"$in": client_ids}, "is_active": True}, {"client_id": {"$in": client_ids}, "is_active": True},
{"_id": 1}, {"_id": 1},
).to_list(None) ).to_list(None)
return [str(p["_id"]) for p in projects] return [str(p["_id"]) for p in projects]
# PM legacy: scoped via pm_client_ids teams = await db.teams.find(
if user.role == UserRole.PROJECT_MANAGER: {"member_user_ids": user_id},
pm_client_ids = user.pm_client_ids or [] {"client_id": 1},
if not pm_client_ids: ).to_list(None)
return [] client_ids = list({t["client_id"] for t in teams})
projects = await db.projects.find( if not client_ids:
{"client_id": {"$in": pm_client_ids}, "is_active": True}, return []
{"_id": 1}, projects = await db.projects.find(
).to_list(None) {"client_id": {"$in": client_ids}, "is_active": True},
return [str(p["_id"]) for p in projects] {"_id": 1},
).to_list(None)
# Staff with no team assignments → unrestricted until teams are configured return [str(p["_id"]) for p in projects]
if user.role in {UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION}:
return None
# CLIENT with no memberships and no teams → show nothing
return []
async def get_user_org_ids(user: User, db: AsyncIOMotorDatabase) -> list[str] | None:
"""Return org IDs the user belongs to, or None meaning unrestricted (ADMIN).
Priority: memberships pm_client_ids (PM legacy) team.member_user_ids (staff legacy)
"""
if user.role == UserRole.ADMIN:
return None
user_id = str(user.id)
# Primary: Membership collection
org_ids: list[str] = []
async for m in db.memberships.find({"user_id": user_id}, {"organization_id": 1}):
if m.get("organization_id"):
org_ids.append(str(m["organization_id"]))
if org_ids:
return org_ids
# PM legacy: pm_client_ids
if user.role == UserRole.PROJECT_MANAGER:
return list(user.pm_client_ids or [])
# Staff legacy: team.member_user_ids
teams = await db.teams.find({"member_user_ids": user_id}, {"client_id": 1}).to_list(None)
if teams:
return [str(t["client_id"]) for t in teams if t.get("client_id")]
return []
async def assert_job_in_user_org(job: dict, user: User, db: AsyncIOMotorDatabase) -> None:
"""Raise 404 (not 403) when user cannot access this job — avoids information disclosure."""
if user.role == UserRole.ADMIN:
return
org_ids = await get_user_org_ids(user, db)
if org_ids is None:
return # unrestricted
job_org = job.get("organization_id")
if job_org:
if job_org in org_ids:
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
# No organization_id — try project fallback
project_id = job.get("project_id")
if project_id:
project = await db.projects.find_one({"_id": project_id}, {"client_id": 1})
if project and project.get("client_id") in org_ids:
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
# Legacy: client_id == creator user_id
job_client_id = job.get("client_id")
if job_client_id and job_client_id == str(user.id):
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
def require_pm_for_client(client_id_param: str = "client_id"): def require_pm_for_client(client_id_param: str = "client_id"):
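`get_accessible_project_ids` distinguishes two results that are easy to confuse: `None` means "see everything", while an empty list means "see nothing". A short sketch of how a caller might turn that result into a MongoDB filter is shown below. The helper name `job_query_for` is hypothetical; the `project_id` field name is taken from the job documents referenced in the diff.

```python
from typing import Optional


def job_query_for(project_ids: Optional[list]) -> dict:
    """Translate the accessible-project result into a MongoDB filter on jobs."""
    if project_ids is None:
        # Unrestricted caller: no project constraint at all.
        return {}
    # A restricted caller, including the empty-list case: $in with an
    # empty array matches no documents, so "no access" stays "no results".
    return {"project_id": {"$in": list(project_ids)}}
```

Testing with `is None` rather than truthiness matters here: `if not project_ids` would treat the empty list like `None` and silently grant a user with no projects access to every job.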


@ -1,6 +1,10 @@
"""Enhanced configuration system with Secret Manager integration.""" """Enhanced configuration system with Secret Manager integration."""
import os
import asyncio
from typing import Dict, Optional, Any
from functools import lru_cache from functools import lru_cache
from pydantic_settings import BaseSettings
from .config import Settings as BaseConfig from .config import Settings as BaseConfig
from .logging import get_logger from .logging import get_logger
@ -10,40 +14,41 @@ logger = get_logger(__name__)
class SecretsConfig(BaseConfig): class SecretsConfig(BaseConfig):
"""Enhanced configuration that loads secrets from GCP Secret Manager.""" """Enhanced configuration that loads secrets from GCP Secret Manager."""
def __init__(self, **kwargs): def __init__(self, **kwargs):
# Initialize with base configuration first # Initialize with base configuration first
super().__init__(**kwargs) super().__init__(**kwargs)
# Flag to track if secrets have been loaded # Flag to track if secrets have been loaded
self._secrets_loaded = False self._secrets_loaded = False
self._secret_values: dict[str, str] = {} self._secret_values: Dict[str, str] = {}
async def load_secrets(self) -> None: async def load_secrets(self) -> None:
"""Load secrets from Secret Manager asynchronously.""" """Load secrets from Secret Manager asynchronously."""
if self._secrets_loaded: if self._secrets_loaded:
return return
try: try:
# Only import here to avoid circular imports # Only import here to avoid circular imports
from app.services.secrets_manager import secrets_manager from app.services.secrets_manager import secrets_manager
# Define which config fields should be loaded from secrets # Define which config fields should be loaded from secrets
secret_mappings = { secret_mappings = {
# Config field -> Secret Manager name # Config field -> Secret Manager name
"jwt_secret": "jwt-secret", "jwt_secret": "jwt-secret",
"jwt_refresh_secret": "jwt-refresh-secret", "jwt_refresh_secret": "jwt-refresh-secret",
"mongodb_uri": "mongodb-url", "mongodb_uri": "mongodb-url",
"redis_url": "redis-url", "redis_url": "redis-url",
"gemini_api_key": "gemini-api-key", "gemini_api_key": "gemini-api-key",
"sendgrid_api_key": "sendgrid-api-key",
"elevenlabs_api_key": "elevenlabs-api-key", "elevenlabs_api_key": "elevenlabs-api-key",
"sentry_dsn": "sentry-dsn" "sentry_dsn": "sentry-dsn"
} }
# Get all secrets in batch # Get all secrets in batch
secret_names = list(secret_mappings.values()) secret_names = list(secret_mappings.values())
retrieved_secrets = await secrets_manager.get_secrets_batch(secret_names) retrieved_secrets = await secrets_manager.get_secrets_batch(secret_names)
# Map secrets back to config fields # Map secrets back to config fields
for config_field, secret_name in secret_mappings.items(): for config_field, secret_name in secret_mappings.items():
if secret_name in retrieved_secrets: if secret_name in retrieved_secrets:
@ -53,50 +58,50 @@ class SecretsConfig(BaseConfig):
logger.debug(f"Loaded secret for {config_field}") logger.debug(f"Loaded secret for {config_field}")
else: else:
logger.warning(f"Secret {secret_name} not available, using environment/default") logger.warning(f"Secret {secret_name} not available, using environment/default")
self._secrets_loaded = True self._secrets_loaded = True
logger.info(f"Successfully loaded {len(retrieved_secrets)} secrets from Secret Manager") logger.info(f"Successfully loaded {len(retrieved_secrets)} secrets from Secret Manager")
except Exception as e: except Exception as e:
logger.warning(f"Failed to load secrets from Secret Manager: {e}") logger.warning(f"Failed to load secrets from Secret Manager: {e}")
logger.warning("Falling back to environment variables") logger.warning("Falling back to environment variables")
self._secrets_loaded = True # Mark as loaded to prevent retries self._secrets_loaded = True # Mark as loaded to prevent retries
def get_secret_value(self, field_name: str) -> str | None: def get_secret_value(self, field_name: str) -> Optional[str]:
"""Get a secret value if it was loaded from Secret Manager.""" """Get a secret value if it was loaded from Secret Manager."""
return self._secret_values.get(field_name) return self._secret_values.get(field_name)
async def refresh_secrets(self) -> None: async def refresh_secrets(self) -> None:
"""Force refresh secrets from Secret Manager.""" """Force refresh secrets from Secret Manager."""
self._secrets_loaded = False self._secrets_loaded = False
self._secret_values.clear() self._secret_values.clear()
# Clear the secrets manager cache # Clear the secrets manager cache
from app.services.secrets_manager import secrets_manager from app.services.secrets_manager import secrets_manager
secrets_manager.clear_cache() secrets_manager.clear_cache()
await self.load_secrets() await self.load_secrets()
@property @property
def is_production(self) -> bool: def is_production(self) -> bool:
"""Check if running in production environment.""" """Check if running in production environment."""
return self.app_env == "prod" return self.app_env == "prod"
@property @property
def is_development(self) -> bool: def is_development(self) -> bool:
"""Check if running in development environment.""" """Check if running in development environment."""
return self.app_env == "dev" return self.app_env == "dev"
@property @property
def google_cloud_project(self) -> str: def google_cloud_project(self) -> str:
"""Get Google Cloud Project ID.""" """Get Google Cloud Project ID."""
return self.gcp_project_id return self.gcp_project_id
@property @property
def jwt_refresh_secret(self) -> str: def jwt_refresh_secret(self) -> str:
"""Get JWT refresh secret (fallback to main secret if not set).""" """Get JWT refresh secret (fallback to main secret if not set)."""
return getattr(self, '_jwt_refresh_secret', self.jwt_secret) return getattr(self, '_jwt_refresh_secret', self.jwt_secret)
@jwt_refresh_secret.setter @jwt_refresh_secret.setter
def jwt_refresh_secret(self, value: str) -> None: def jwt_refresh_secret(self, value: str) -> None:
"""Set JWT refresh secret.""" """Set JWT refresh secret."""
@ -104,37 +109,37 @@ class SecretsConfig(BaseConfig):
# Global configuration instance # Global configuration instance
_config_instance: SecretsConfig | None = None _config_instance: Optional[SecretsConfig] = None
async def initialize_config() -> SecretsConfig: async def initialize_config() -> SecretsConfig:
"""Initialize configuration with secrets loading.""" """Initialize configuration with secrets loading."""
global _config_instance global _config_instance
if _config_instance is None: if _config_instance is None:
_config_instance = SecretsConfig() _config_instance = SecretsConfig()
await _config_instance.load_secrets() await _config_instance.load_secrets()
return _config_instance return _config_instance
def get_settings() -> SecretsConfig: def get_settings() -> SecretsConfig:
"""Get settings instance (synchronous).""" """Get settings instance (synchronous)."""
global _config_instance global _config_instance
if _config_instance is None: if _config_instance is None:
# Initialize without secrets for backwards compatibility # Initialize without secrets for backwards compatibility
_config_instance = SecretsConfig() _config_instance = SecretsConfig()
logger.warning("Settings accessed before async initialization - secrets not loaded") logger.warning("Settings accessed before async initialization - secrets not loaded")
return _config_instance return _config_instance
@lru_cache @lru_cache()
def get_settings_cached() -> SecretsConfig: def get_settings_cached() -> SecretsConfig:
"""Get cached settings instance.""" """Get cached settings instance."""
return get_settings() return get_settings()
# Backwards compatibility # Backwards compatibility
settings = get_settings() settings = get_settings()
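
The pattern in this file is a module-level singleton that can be created synchronously (without secrets) and later upgraded by an async initializer. The following is a minimal sketch of that pattern, not the project's code: `DemoConfig` is a hypothetical stand-in for `SecretsConfig`, and `asyncio.sleep(0)` stands in for the Secret Manager call. Indentation is lost in this view, so it is not possible to tell whether `load_secrets()` sits inside the `is None` branch; the sketch assumes secrets should be loaded whenever they have not been loaded yet, even if the instance already exists.

```python
import asyncio
from typing import Optional


class DemoConfig:  # hypothetical stand-in for SecretsConfig
    def __init__(self) -> None:
        self.secrets_loaded = False

    async def load_secrets(self) -> None:
        await asyncio.sleep(0)  # placeholder for the Secret Manager round trip
        self.secrets_loaded = True


_instance: Optional[DemoConfig] = None


async def initialize_config() -> DemoConfig:
    """Create the singleton if needed, then load secrets exactly once."""
    global _instance
    if _instance is None:
        _instance = DemoConfig()
    if not _instance.secrets_loaded:
        await _instance.load_secrets()
    return _instance


def get_settings() -> DemoConfig:
    """Synchronous accessor; may return an instance without secrets."""
    global _instance
    if _instance is None:
        _instance = DemoConfig()
    return _instance
```

With this shape, an early import-time call to `get_settings()` does not prevent a later `initialize_config()` from loading secrets into the same object.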

View file

@ -1,5 +1,5 @@
from datetime import datetime, timedelta from datetime import datetime, timedelta
from typing import Any from typing import Any, Optional, Union
from fastapi import HTTPException, status from fastapi import HTTPException, status
from jose import JWTError, jwt from jose import JWTError, jwt
@ -11,24 +11,20 @@ pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
def create_access_token( def create_access_token(
subject: str | Any, subject: Union[str, Any], expires_delta: Optional[timedelta] = None
expires_delta: timedelta | None = None,
org_ids: list[str] | None = None,
) -> str: ) -> str:
if expires_delta: if expires_delta:
expire = datetime.utcnow() + expires_delta expire = datetime.utcnow() + expires_delta
else: else:
expire = datetime.utcnow() + timedelta(minutes=settings.jwt_access_ttl_min) expire = datetime.utcnow() + timedelta(minutes=settings.jwt_access_ttl_min)
to_encode: dict[str, Any] = {"exp": expire, "sub": str(subject), "v": 2} to_encode = {"exp": expire, "sub": str(subject)}
if org_ids:
to_encode["org_ids"] = org_ids
encoded_jwt = jwt.encode(to_encode, settings.jwt_secret, algorithm=settings.jwt_alg) encoded_jwt = jwt.encode(to_encode, settings.jwt_secret, algorithm=settings.jwt_alg)
return encoded_jwt return encoded_jwt
def create_refresh_token( def create_refresh_token(
subject: str | Any, expires_delta: timedelta | None = None subject: Union[str, Any], expires_delta: Optional[timedelta] = None
) -> str: ) -> str:
if expires_delta: if expires_delta:
expire = datetime.utcnow() + expires_delta expire = datetime.utcnow() + expires_delta
@ -41,8 +37,6 @@ def create_refresh_token(
def verify_password(plain_password: str, hashed_password: str) -> bool: def verify_password(plain_password: str, hashed_password: str) -> bool:
if not hashed_password:
return False
return pwd_context.verify(plain_password, hashed_password) return pwd_context.verify(plain_password, hashed_password)
@ -58,4 +52,4 @@ def decode_token(token: str) -> dict[str, Any]:
raise HTTPException( raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials", detail="Could not validate credentials",
) from None )
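
One side of this hunk adds an optional `org_ids` claim and a `"v": 2` marker to the access-token payload. A minimal sketch of just the payload construction follows, using only the standard library and leaving signing to `jwt.encode` as in the diff. The helper name `build_access_claims` and the `default_ttl_min` parameter (standing in for `settings.jwt_access_ttl_min`) are assumptions for illustration. The sketch also uses a timezone-aware `datetime.now(timezone.utc)` instead of `datetime.utcnow()`, which is a substitution and not what the diff does.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def build_access_claims(
    subject: Any,
    expires_delta: Optional[timedelta] = None,
    org_ids: Optional[List[str]] = None,
    default_ttl_min: int = 15,  # assumption: stands in for settings.jwt_access_ttl_min
) -> Dict[str, Any]:
    """Build the claims dict that would be handed to the JWT encoder."""
    ttl = expires_delta if expires_delta else timedelta(minutes=default_ttl_min)
    expire = datetime.now(timezone.utc) + ttl
    claims: Dict[str, Any] = {"exp": expire, "sub": str(subject), "v": 2}
    if org_ids:
        # Only emit the claim when there is at least one organization.
        claims["org_ids"] = org_ids
    return claims
```

Omitting `org_ids` when the list is empty keeps tokens for users without organizations the same shape as before, apart from the version marker.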

View file

@ -34,13 +34,7 @@ async def seed_default_admin(db) -> None:
print(f"✅ Default admin {DEFAULT_ADMIN_EMAIL} already exists") print(f"✅ Default admin {DEFAULT_ADMIN_EMAIL} already exists")
return return
password = os.environ.get("DEFAULT_ADMIN_PASSWORD") password = os.environ.get("DEFAULT_ADMIN_PASSWORD", "ChangeMe123!")
if not password:
print(
"⚠️ DEFAULT_ADMIN_PASSWORD not set — skipping default admin creation. "
"Set this env var and restart to create the admin account."
)
return
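
The two sides of this hunk treat a missing `DEFAULT_ADMIN_PASSWORD` differently: one skips admin creation, the other falls back to a hard-coded password. A minimal sketch of the skip-when-unset behavior follows; `resolve_admin_password` is a hypothetical helper, not a function from the project.

```python
import os
from typing import Mapping, Optional


def resolve_admin_password(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured admin password, or None when unset or empty."""
    password = env.get("DEFAULT_ADMIN_PASSWORD")
    if not password:
        # No hard-coded fallback: the caller should skip admin creation.
        return None
    return password
```

Passing the environment as a parameter makes the guard easy to exercise without touching the real process environment.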
user_doc = { user_doc = {
"_id": str(ObjectId()), "_id": str(ObjectId()),
"email": DEFAULT_ADMIN_EMAIL, "email": DEFAULT_ADMIN_EMAIL,

Binary file not shown.

View file

@ -1,245 +0,0 @@
"""
Central locale registry.
Provides a single source of truth for BCP-47 codes, display names,
and Gemini-friendly labels used throughout the translation/TTS pipeline.
Convention: BCP-47 with hyphen separator (fr-FR, en-GB, pt-BR).
xlsx underscore format (fr_fr, en_gb) is normalized at import time.
Bare language-only codes (fr, en) remain valid for legacy compat.
"""
from __future__ import annotations
from dataclasses import dataclass
@dataclass(frozen=True)
class Locale:
code: str # canonical BCP-47 (e.g. "fr-FR")
display_name: str # human-readable (e.g. "French (France)")
gemini_label: str # what to pass to Gemini prompts (e.g. "French (France)")
tts_lang: str # BCP-47 for TTS API (may differ, e.g. es-MX → es-US)
preview_sample: str # sample sentence for TTS preview
# Master locale registry. Bare language codes (legacy) + explicit region variants.
_REGISTRY: dict[str, Locale] = {loc.code: loc for loc in [
# ── English ──────────────────────────────────────────────────────────────
Locale("en", "English", "English", "en-US",
"This is a preview of the audio description voice."),
Locale("en-US", "English (US)", "English (United States)", "en-US",
"This is a preview of the audio description voice."),
Locale("en-GB", "English (UK)", "English (United Kingdom)", "en-GB",
"This is a preview of the audio description voice."),
Locale("en-CA", "English (Canada)", "English (Canada)", "en-CA",
"This is a preview of the audio description voice."),
# ── Spanish ──────────────────────────────────────────────────────────────
Locale("es", "Spanish", "Spanish", "es-US",
"Esta es una vista previa de la voz de audiodescripcion."),
Locale("es-ES", "Spanish (Spain)", "Spanish (Spain)", "es-ES",
"Esta es una vista previa de la voz de audiodescripción."),
Locale("es-MX", "Spanish (Mexico)", "Spanish (Mexico)", "es-US",
"Esta es una vista previa de la voz de audiodescripción."),
Locale("es-419", "Spanish (Latin America)", "Spanish (Latin America)", "es-US",
"Esta es una vista previa de la voz de audiodescripción."),
# ── French ───────────────────────────────────────────────────────────────
Locale("fr", "French", "French", "fr-FR",
"Ceci est un apercu de la voix de l'audiodescription."),
Locale("fr-FR", "French (France)", "French (France)", "fr-FR",
"Ceci est un aperçu de la voix de l'audiodescription."),
Locale("fr-CA", "French (Canada)", "French (Canada)", "fr-CA",
"Ceci est un aperçu de la voix de l'audiodescription."),
# ── German ───────────────────────────────────────────────────────────────
Locale("de", "German", "German", "de-DE",
"Dies ist eine Vorschau der Audiodeskriptionsstimme."),
Locale("de-DE", "German (Germany)", "German (Germany)", "de-DE",
"Dies ist eine Vorschau der Audiodeskriptionsstimme."),
# ── Italian ──────────────────────────────────────────────────────────────
Locale("it", "Italian", "Italian", "it-IT",
"Questa e un'anteprima della voce dell'audiodescrizione."),
Locale("it-IT", "Italian (Italy)", "Italian (Italy)", "it-IT",
"Questa è un'anteprima della voce dell'audiodescrizione."),
# ── Portuguese ───────────────────────────────────────────────────────────
Locale("pt", "Portuguese", "Portuguese", "pt-BR",
"Esta e uma previa da voz da audiodescricao."),
Locale("pt-BR", "Portuguese (Brazil)", "Portuguese (Brazil)", "pt-BR",
"Esta é uma prévia da voz da audiodescrição."),
Locale("pt-PT", "Portuguese (Portugal)", "Portuguese (Portugal)", "pt-PT",
"Esta é uma pré-visualização da voz da audiodescrição."),
# ── Japanese ─────────────────────────────────────────────────────────────
Locale("ja", "Japanese", "Japanese", "ja-JP",
"これは音声解説の声のプレビューです。"),
Locale("ja-JP", "Japanese (Japan)", "Japanese (Japan)", "ja-JP",
"これは音声解説の声のプレビューです。"),
# ── Korean ───────────────────────────────────────────────────────────────
Locale("ko", "Korean", "Korean", "ko-KR",
"이것은 오디오 설명 음성의 미리보기입니다."),
Locale("ko-KR", "Korean (Korea)", "Korean (South Korea)", "ko-KR",
"이것은 오디오 설명 음성의 미리보기입니다."),
# ── Arabic ───────────────────────────────────────────────────────────────
Locale("ar", "Arabic", "Arabic", "ar-EG",
"هذه معاينة لصوت الوصف الصوتي."),
# ── Hindi ────────────────────────────────────────────────────────────────
Locale("hi", "Hindi", "Hindi", "hi-IN",
"यह ऑडियो विवरण आवाज का पूर्वावलोकन है।"),
# ── Indonesian ───────────────────────────────────────────────────────────
Locale("id", "Indonesian", "Indonesian", "id-ID",
"Ini adalah pratinjau suara deskripsi audio."),
Locale("id-ID", "Indonesian (Indonesia)", "Indonesian (Indonesia)", "id-ID",
"Ini adalah pratinjau suara deskripsi audio."),
# ── Dutch ────────────────────────────────────────────────────────────────
Locale("nl", "Dutch", "Dutch", "nl-NL",
"Dit is een voorbeeld van de audiodescriptiestem."),
Locale("nl-NL", "Dutch (Netherlands)", "Dutch (Netherlands)", "nl-NL",
"Dit is een voorbeeld van de audiodescriptiestem."),
# ── Polish ───────────────────────────────────────────────────────────────
Locale("pl", "Polish", "Polish", "pl-PL",
"To jest podglad glosu audiodeskrypcji."),
Locale("pl-PL", "Polish (Poland)", "Polish (Poland)", "pl-PL",
"To jest podgląd głosu audiodeskrypcji."),
# ── Russian ──────────────────────────────────────────────────────────────
Locale("ru", "Russian", "Russian", "ru-RU",
"Это предварительный просмотр голоса аудиоописания."),
# ── Thai ─────────────────────────────────────────────────────────────────
Locale("th", "Thai", "Thai", "th-TH",
"นี่คือตัวอย่างเสียงบรรยายภาพ"),
# ── Turkish ──────────────────────────────────────────────────────────────
Locale("tr", "Turkish", "Turkish", "tr-TR",
"Bu, sesli betimleme sesinin bir onizlemesidir."),
Locale("tr-TR", "Turkish (Turkey)", "Turkish (Turkey)", "tr-TR",
"Bu, sesli betimleme sesinin bir önizlemesidir."),
# ── Vietnamese ───────────────────────────────────────────────────────────
Locale("vi", "Vietnamese", "Vietnamese", "vi-VN",
"Day la ban xem truoc giong mo ta am thanh."),
# ── Romanian ─────────────────────────────────────────────────────────────
Locale("ro", "Romanian", "Romanian", "ro-RO",
"Aceasta este o previzualizare a vocii descrierii audio."),
# ── Ukrainian ────────────────────────────────────────────────────────────
Locale("uk", "Ukrainian", "Ukrainian", "uk-UA",
"Це попередній перегляд голосу аудіоопису."),
# ── Bengali ──────────────────────────────────────────────────────────────
Locale("bn", "Bengali", "Bengali", "bn-BD",
"এটি অডিও বর্ণনা ভয়েসের একটি প্রিভিউ।"),
# ── Marathi ──────────────────────────────────────────────────────────────
Locale("mr", "Marathi", "Marathi", "mr-IN",
"हे ऑडिओ वर्णन आवाजाचे पूर्वावलोकन आहे."),
# ── Tamil ────────────────────────────────────────────────────────────────
Locale("ta", "Tamil", "Tamil", "ta-IN",
"இது ஆடியோ விளக்க குரலின் முன்னோட்டம்."),
# ── Telugu ───────────────────────────────────────────────────────────────
Locale("te", "Telugu", "Telugu", "te-IN",
"ఇది ఆడియో వివరణ స్వరం యొక్క ప్రివ్యూ."),
# ── Chinese ──────────────────────────────────────────────────────────────
Locale("zh", "Chinese", "Chinese (Simplified)", "zh-CN",
"这是音频描述语音的预览。"),
# ── Czech ────────────────────────────────────────────────────────────────
Locale("cs", "Czech", "Czech", "cs-CZ",
"Toto je náhled hlasu zvukového popisu."),
Locale("cs-CZ", "Czech (Czech Republic)", "Czech (Czech Republic)", "cs-CZ",
"Toto je náhled hlasu zvukového popisu."),
# ── Danish ───────────────────────────────────────────────────────────────
Locale("da", "Danish", "Danish", "da-DK",
"Dette er en forhåndsvisning af lydbeskrivelsesstemmen."),
# ── Finnish ──────────────────────────────────────────────────────────────
Locale("fi", "Finnish", "Finnish", "fi-FI",
"Tämä on äänikuvauksen äänen esikatselu."),
# ── Hungarian ────────────────────────────────────────────────────────────
Locale("hu", "Hungarian", "Hungarian", "hu-HU",
"Ez a hangos leírás hangjának előnézete."),
# ── Norwegian ────────────────────────────────────────────────────────────
Locale("no", "Norwegian", "Norwegian", "nb-NO",
"Dette er en forhåndsvisning av lydbeskrivelsesstemmen."),
# ── Slovak ───────────────────────────────────────────────────────────────
Locale("sk", "Slovak", "Slovak", "sk-SK",
"Toto je náhľad hlasu zvukového popisu."),
# ── Swedish ──────────────────────────────────────────────────────────────
Locale("sv", "Swedish", "Swedish", "sv-SE",
"Det här är en förhandsgranskning av ljudbeskrivningsrösten."),
]}
# xlsx uses underscores; normalize to BCP-47 hyphen form
_XLSX_ALIASES: dict[str, str] = {
code.replace("-", "_").lower(): code
for code in _REGISTRY
if "-" in code
}
# a few extra mappings for edge cases
_XLSX_ALIASES.update({
"id": "id", # Indonesian column header is just "id" (no region)
})
def normalize_code(code: str) -> str:
"""
Normalize an arbitrary locale code to the canonical BCP-47 form used in this registry.
Handles:
- xlsx underscore form: "fr_fr" → "fr-FR"
- Bare language code: "fr" → "fr" (passthrough, legacy compat)
- Already canonical: "fr-FR" → "fr-FR"
"""
if not code:
return code
lowered = code.strip().lower()
# e.g. "fr_fr" -> check alias table
if "_" in lowered:
return _XLSX_ALIASES.get(lowered, code.replace("_", "-").upper() if len(lowered) > 3 else code)
# Already hyphen form — canonicalise case
if "-" in code:
parts = code.split("-", 1)
canonical = f"{parts[0].lower()}-{parts[1].upper()}"
if canonical in _REGISTRY:
return canonical
return canonical
# Bare language code — return as-is (legacy)
return lowered
def get(code: str) -> Locale | None:
"""Return Locale for the given code, or None if unknown."""
canonical = normalize_code(code)
return _REGISTRY.get(canonical) or _REGISTRY.get(canonical.split("-")[0])
def get_display_name(code: str) -> str:
"""Human-readable display name, e.g. 'French (Canada)'."""
locale = get(code)
return locale.display_name if locale else code
def get_gemini_label(code: str) -> str:
"""
Label to use inside Gemini prompts, e.g. 'French (Canada)'.
Gemini models respond more reliably to human-readable language names
than to bare BCP-47 codes when used inside instruction prompts.
"""
locale = get(code)
return locale.gemini_label if locale else code
def get_tts_lang(code: str) -> str:
"""BCP-47 code for the TTS API (may differ from canonical, e.g. es-MX → es-US)."""
locale = get(code)
return locale.tts_lang if locale else code
def get_preview_sample(code: str) -> str:
"""Language-appropriate TTS preview sentence."""
locale = get(code)
if locale:
return locale.preview_sample
# fallback: try parent language then English
parent = get(code.split("-")[0]) if "-" in code else None
if parent:
return parent.preview_sample
return "This is a preview of the audio description voice."
def all_codes() -> list[str]:
"""Return all registered locale codes, sorted."""
return sorted(_REGISTRY.keys())
def all_display_map() -> dict[str, str]:
"""Return {code: display_name} for all registered locales."""
return {code: locale.display_name for code, locale in _REGISTRY.items()}
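
The normalization and parent-language fallback described in this file can be sketched as follows. This is an illustration under assumptions, not the registry's implementation: `KNOWN_CODES` is a hypothetical trimmed-down set, and the sketch does not use an alias table. Instead it treats underscore and hyphen forms identically, whereas the deleted `normalize_code` appears to uppercase an unregistered underscore code wholesale (for example `"sv_se"` would become `"SV-SE"`).

```python
from typing import Optional

# Hypothetical, trimmed-down registry: only the codes needed for the demo.
KNOWN_CODES = {"fr", "fr-FR", "fr-CA", "es-419", "pt-BR"}


def normalize_locale(code: str) -> str:
    """Canonicalise 'fr_fr' or 'FR-fr' to 'fr-FR'; bare codes are lowercased."""
    if not code:
        return code
    cleaned = code.strip().replace("_", "-")
    if "-" not in cleaned:
        return cleaned.lower()  # bare language code, kept for legacy callers
    language, region = cleaned.split("-", 1)
    return f"{language.lower()}-{region.upper()}"


def resolve(code: str) -> Optional[str]:
    """Return the registered code, falling back to the parent language."""
    canonical = normalize_locale(code)
    if canonical in KNOWN_CODES:
        return canonical
    parent = canonical.split("-")[0]
    return parent if parent in KNOWN_CODES else None
```

Numeric regions such as `es-419` pass through unchanged because `str.upper()` leaves digits alone.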

View file

@ -8,7 +8,6 @@ class VTTCue:
end_time: float # seconds end_time: float # seconds
text: str text: str
identifier: str | None = None identifier: str | None = None
settings: str = ""
class VTTParser: class VTTParser:
@ -38,11 +37,10 @@ class VTTParser:
# Parse timing line # Parse timing line
if " --> " in line: if " --> " in line:
timing_match = re.match(r'([\d:.,]+)\s+-->\s+([\d:.,]+)\s*(.*)', line) timing_match = re.match(r'([\d:.,]+)\s+-->\s+([\d:.,]+)', line)
if timing_match: if timing_match:
start_time = VTTParser._parse_timestamp(timing_match.group(1)) start_time = VTTParser._parse_timestamp(timing_match.group(1))
end_time = VTTParser._parse_timestamp(timing_match.group(2)) end_time = VTTParser._parse_timestamp(timing_match.group(2))
settings = timing_match.group(3).strip()
# Collect text lines until empty line or next cue # Collect text lines until empty line or next cue
i += 1 i += 1
@ -51,13 +49,13 @@ class VTTParser:
text_lines.append(lines[i].strip()) text_lines.append(lines[i].strip())
i += 1 i += 1
cues.append(VTTCue( if text_lines:
start_time=start_time, cues.append(VTTCue(
end_time=end_time, start_time=start_time,
text="\n".join(text_lines), end_time=end_time,
identifier=identifier, text="\n".join(text_lines),
settings=settings, identifier=identifier
)) ))
else: else:
i += 1 i += 1
@ -73,19 +71,16 @@ class VTTParser:
if cue.identifier: if cue.identifier:
lines.append(cue.identifier) lines.append(cue.identifier)
# Add timing line (preserve cue settings like line:0%) # Add timing line
start_timestamp = VTTParser._format_timestamp(cue.start_time) start_timestamp = VTTParser._format_timestamp(cue.start_time)
end_timestamp = VTTParser._format_timestamp(cue.end_time) end_timestamp = VTTParser._format_timestamp(cue.end_time)
timing_line = f"{start_timestamp} --> {end_timestamp}" lines.append(f"{start_timestamp} --> {end_timestamp}")
if cue.settings:
timing_line += f" {cue.settings}"
lines.append(timing_line)
# Add text (can be multi-line) # Add text (can be multi-line)
lines.append(cue.text) lines.append(cue.text)
lines.append("") # Empty line between cues lines.append("") # Empty line between cues
return "\n".join(lines) + "\n" return "\n".join(lines)
@staticmethod @staticmethod
def _parse_timestamp(timestamp: str) -> float: def _parse_timestamp(timestamp: str) -> float:
@ -126,7 +121,7 @@ class VTTParser:
secs = seconds % 60 secs = seconds % 60
whole_secs = int(secs) whole_secs = int(secs)
milliseconds = round((secs - whole_secs) * 1000) milliseconds = int((secs - whole_secs) * 1000)
return f"{hours:02d}:{minutes:02d}:{whole_secs:02d}.{milliseconds:03d}" return f"{hours:02d}:{minutes:02d}:{whole_secs:02d}.{milliseconds:03d}"
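
This hunk switches the millisecond computation between `round(...)` and `int(...)` on the fractional part of the seconds. Both can surprise: rounding the fraction alone can yield 1000 (for example 1.9996 s would render with a four-digit `.1000`), while truncation can lose a millisecond to floating-point error (1.001 - 1 is slightly below 0.001). One way around both, sketched here as an alternative and not taken from the diff, is to round once to whole milliseconds and derive every field from that integer. The function name `format_vtt_timestamp` is hypothetical, and negative inputs are not handled.

```python
def format_vtt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, rounding once to whole milliseconds."""
    total_ms = round(seconds * 1000)
    total_secs, milliseconds = divmod(total_ms, 1000)
    total_mins, secs = divmod(total_secs, 60)
    hours, minutes = divmod(total_mins, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{milliseconds:03d}"
```

Because the carry happens in integer arithmetic, a value that rounds up to the next second rolls over cleanly into the seconds field.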
@ -153,22 +148,6 @@ class VTTEditor:
return VTTParser.build(cues) return VTTParser.build(cues)
@staticmethod
def assert_cue_alignment(en_vtt: str, target_vtt: str, lang: str) -> None:
"""Raise ValueError if target VTT cue count or timestamps diverge from EN master."""
en_cues = VTTParser.parse(en_vtt)
tgt_cues = VTTParser.parse(target_vtt)
if len(tgt_cues) != len(en_cues):
raise ValueError(
f"Cue count mismatch for {lang}: EN has {len(en_cues)}, target has {len(tgt_cues)}"
)
for i, (en, tgt) in enumerate(zip(en_cues, tgt_cues, strict=True)):
if en.start_time != tgt.start_time or en.end_time != tgt.end_time:
raise ValueError(
f"Timestamp mismatch for {lang} cue {i}: "
f"EN {en.start_time}-->{en.end_time}, target {tgt.start_time}-->{tgt.end_time}"
)
@staticmethod @staticmethod
def update_cue_text(vtt_content: str, cue_index: int, new_text: str) -> str: def update_cue_text(vtt_content: str, cue_index: int, new_text: str) -> str:
"""Update text for a specific cue by index""" """Update text for a specific cue by index"""
@ -207,20 +186,6 @@ class VTTEditor:
return len(errors) == 0, errors return len(errors) == 0, errors
@staticmethod
def fix_overlapping_cues(vtt_content: str) -> str:
"""Trim end_time of each cue so it does not overlap the next cue's start_time."""
cues = VTTParser.parse(vtt_content)
for i in range(1, len(cues)):
if cues[i].start_time < cues[i - 1].end_time:
# Clamp previous cue end to 1ms before next cue start
new_end = cues[i].start_time - 0.001
# Never let end_time go at or below start_time
if new_end <= cues[i - 1].start_time:
new_end = cues[i - 1].start_time + 0.001
cues[i - 1].end_time = new_end
return VTTParser.build(cues)
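
The overlap trimming in `fix_overlapping_cues` can be shown in isolation, without the parser. The sketch below mirrors the logic visible in the diff on a hypothetical `Cue` dataclass (a stand-in for `VTTCue`); the `gap` parameter is an assumption added so the 1 ms margin is configurable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:  # hypothetical stand-in for VTTCue
    start_time: float
    end_time: float
    text: str


def trim_overlaps(cues: List[Cue], gap: float = 0.001) -> List[Cue]:
    """Shorten each cue that runs past the start of the next one (in place)."""
    for prev, nxt in zip(cues, cues[1:]):
        if nxt.start_time < prev.end_time:
            new_end = nxt.start_time - gap
            # Keep a positive duration even when both cues start together.
            if new_end <= prev.start_time:
                new_end = prev.start_time + gap
            prev.end_time = new_end
    return cues
```

When two cues share a start time, the earlier cue keeps a minimal positive duration and therefore still overlaps the next one by `gap`; callers that need strict non-overlap would have to handle that case separately.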
@staticmethod @staticmethod
def get_cue_count(vtt_content: str) -> int: def get_cue_count(vtt_content: str) -> int:
"""Get the number of cues in VTT content""" """Get the number of cues in VTT content"""
@ -256,7 +221,7 @@ class VTTEditor:
) )
return False, errors return False, errors
for i, (src, tgt) in enumerate(zip(source_cues, translated_cues, strict=False)): for i, (src, tgt) in enumerate(zip(source_cues, translated_cues)):
if abs(src.start_time - tgt.start_time) > 0.001: if abs(src.start_time - tgt.start_time) > 0.001:
errors.append( errors.append(
f"Cue {i + 1}: start time changed " f"Cue {i + 1}: start time changed "
@ -286,33 +251,3 @@ class VTTEditor:
return VTTParser.build(cues) return VTTParser.build(cues)
# DCMP §6.01 filler patterns per language (whole-word, case-insensitive)
_FILLER_PATTERNS: dict[str, str] = {
"en": r'\b(um+|uh+|ah+|er+|hmm+|you know|i mean|sort of|kind of|basically|literally|honestly|actually|right\?|so yeah)\b',
"es": r'\b(eh+|este|o sea|pues|bueno|o sea que|mmm+)\b',
"fr": r'\b(euh+|beh|ben|donc|quoi|enfin|voilà|genre)\b',
"de": r'\b(äh+|ähm+|halt|ne|also|naja|sozusagen|quasi)\b',
"it": r'\b(ehm+|allora|cioè|tipo|praticamente|insomma|ecco)\b',
"nl": r'\b(eh+|nou|zeg|eigenlijk|gewoon|toch|zo van|hè)\b',
"pt": r'\b(ahn+|hã+|né|sabe|tipo|então|assim)\b',
"pl": r'\b(no|że|bo|znaczy|właśnie|jakby|wiesz)\b',
"uk": r'\b(ну+|ем+|типу|знаєш|значить|власне|от)\b',
"ru": r'\b(ну+|эм+|типа|знаешь|значит|вот|собственно)\b',
}
@staticmethod
def clean_disfluencies(vtt_content: str, lang: str) -> str:
"""Remove filler words and hesitations per DCMP §6.01 for supported languages."""
pattern = VTTEditor._FILLER_PATTERNS.get(lang.split("-")[0].lower())
if not pattern:
return vtt_content
cues = VTTParser.parse(vtt_content)
compiled = re.compile(pattern, re.IGNORECASE)
for cue in cues:
cleaned = compiled.sub("", cue.text)
# Collapse multiple spaces and strip leading/trailing punctuation artifacts
cleaned = re.sub(r'[ \t]{2,}', ' ', cleaned).strip().strip(',').strip()
if cleaned:
cue.text = cleaned
return VTTParser.build(cues)
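
The filler removal in `clean_disfluencies` amounts to a regex substitution followed by punctuation cleanup. A minimal sketch on plain strings follows. The short English-only `FILLERS` pattern and the extra comma handling are assumptions for illustration and go slightly beyond the cleanup visible in the diff; like the original, the sketch keeps the text unchanged if stripping would leave it empty.

```python
import re

# Assumption: a deliberately short English-only filler list for illustration.
FILLERS = re.compile(r"\b(um+|uh+|you know|i mean)\b", re.IGNORECASE)


def strip_fillers(text: str) -> str:
    """Remove filler words, then tidy the whitespace and commas left behind."""
    cleaned = FILLERS.sub("", text)
    cleaned = re.sub(r"\s+,", ",", cleaned)          # "we , open" -> "we, open"
    cleaned = re.sub(r"(,\s*){2,}", ", ", cleaned)   # collapse doubled commas
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)     # collapse runs of spaces
    cleaned = cleaned.strip(" ,")
    return cleaned or text  # never blank a cue entirely
```

One detail that may be worth checking in the original pattern table: an alternative ending in a non-word character, such as `right\?`, is followed by `\b`, which only matches if a word character comes next, so `"right?"` at the end of a sentence would likely not be removed.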

View file

@ -1,55 +1,48 @@
from contextlib import asynccontextmanager from contextlib import asynccontextmanager
import sentry_sdk import sentry_sdk
from fastapi import FastAPI, HTTPException, Request from fastapi import FastAPI, Request, HTTPException
from fastapi.exceptions import RequestValidationError from fastapi.exceptions import RequestValidationError
from fastapi.middleware.cors import CORSMiddleware from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse from fastapi.responses import JSONResponse
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.fastapi import FastApiIntegration from sentry_sdk.integrations.fastapi import FastApiIntegration
from sentry_sdk.integrations.pymongo import PyMongoIntegration
from sentry_sdk.integrations.redis import RedisIntegration from sentry_sdk.integrations.redis import RedisIntegration
from sentry_sdk.integrations.pymongo import PyMongoIntegration
from sentry_sdk.integrations.celery import CeleryIntegration
from .api.v1.routes_admin import router as admin_router from .api.v1.routes_admin import router as admin_router
from .api.v1.routes_admin_production import router as admin_production_router
from .api.v1.routes_auth import router as auth_router from .api.v1.routes_auth import router as auth_router
from .api.v1.routes_briefs import router as briefs_router
from .api.v1.routes_clients import router as clients_router from .api.v1.routes_clients import router as clients_router
from .api.v1.routes_files import router as files_router from .api.v1.routes_files import router as files_router
from .api.v1.routes_glossaries import router as glossaries_router from .api.v1.routes_jobs import router as jobs_router
from .api.v1.routes_invitations import org_router as invitations_org_router from .api.v1.routes_invitations import org_router as invitations_org_router
from .api.v1.routes_invitations import router as invitations_router from .api.v1.routes_invitations import router as invitations_router
from .api.v1.routes_jobs import router as jobs_router
from .api.v1.routes_language_qc import router as language_qc_router
from .api.v1.routes_organizations import router as organizations_router from .api.v1.routes_organizations import router as organizations_router
from .api.v1.routes_review_notes import router as review_notes_router from .api.v1.routes_review_notes import router as review_notes_router
from .api.v1.routes_share import router as share_router
from .api.v1.routes_tts import router as tts_router from .api.v1.routes_tts import router as tts_router
from .api.v1.routes_vtt_versions import router as vtt_versions_router
from .api.v1.routes_websockets import router as websockets_router from .api.v1.routes_websockets import router as websockets_router
from .services.websocket import connection_manager
from .core.config import settings from .core.config import settings
from .core.database import ( from .core.secrets_config import initialize_config
close_mongo_connection, from .core.database import close_mongo_connection, connect_to_mongo, create_indexes, get_database
connect_to_mongo,
get_database,
)
from .core.logging import setup_logging from .core.logging import setup_logging
from .core.redis import close_redis_connection, connect_to_redis, get_redis_client from .core.redis import close_redis_connection, connect_to_redis, get_redis_client
from .core.secrets_config import initialize_config
from .core.seed import seed_default_admin from .core.seed import seed_default_admin
from .middleware import create_rate_limit_middleware, create_validation_middleware from .middleware import create_rate_limit_middleware, create_validation_middleware
from .services.language_qc import seed_language_qc_for_job
from .services.websocket import connection_manager
from .telemetry import ( from .telemetry import (
app_metrics, app_metrics,
instrument_dependencies,
instrument_fastapi_app,
setup_tracing
) )
from .services.websocket import connection_manager
@asynccontextmanager @asynccontextmanager
async def lifespan(app: FastAPI): async def lifespan(app: FastAPI):
# Startup # Startup
setup_logging() setup_logging()
# Initialize configuration with secrets # Initialize configuration with secrets
if settings.app_env == "prod": if settings.app_env == "prod":
try: try:
@ -58,7 +51,7 @@ async def lifespan(app: FastAPI):
except Exception as e: except Exception as e:
print(f"⚠️ Failed to load secrets from Secret Manager: {e}") print(f"⚠️ Failed to load secrets from Secret Manager: {e}")
print("⚠️ Falling back to environment variables") print("⚠️ Falling back to environment variables")
# Initialize Sentry error tracking # Initialize Sentry error tracking
if settings.sentry_dsn and settings.sentry_dsn.startswith(('http', 'https')): if settings.sentry_dsn and settings.sentry_dsn.startswith(('http', 'https')):
sentry_sdk.init( sentry_sdk.init(
@ -75,15 +68,15 @@ async def lifespan(app: FastAPI):
attach_stacktrace=True, attach_stacktrace=True,
send_default_pii=False, # Don't send PII for privacy send_default_pii=False, # Don't send PII for privacy
) )
# Initialize telemetry (disabled for local development) # Initialize telemetry (disabled for local development)
# setup_tracing("accessible-video-api", "1.0.0") # setup_tracing("accessible-video-api", "1.0.0")
# instrument_dependencies() # instrument_dependencies()
# Start Prometheus metrics server in production # Start Prometheus metrics server in production
if settings.app_env == "prod": if settings.app_env == "prod":
app_metrics.start_prometheus_server(port=8001) app_metrics.start_prometheus_server(port=8001)
await connect_to_mongo() await connect_to_mongo()
await connect_to_redis() await connect_to_redis()
@ -93,37 +86,20 @@ async def lifespan(app: FastAPI):
except Exception as e: except Exception as e:
print(f"⚠️ Could not seed default admin: {e}") print(f"⚠️ Could not seed default admin: {e}")
# await create_indexes() # Temporarily disabled for debugging # await create_indexes() # Temporarily disabled for debugging
# T-16: Seed language_qc only for jobs that still lack it (idempotent, skips on subsequent starts)
try:
db = await get_database()
pending_count = await db.jobs.count_documents({"language_qc": {"$exists": False}})
if pending_count > 0:
async for job_doc in db.jobs.find(
{"language_qc": {"$exists": False}},
{"_id": 1, "status": 1, "outputs": 1, "source": 1, "review": 1, "updated_at": 1, "requested_outputs": 1},
):
await seed_language_qc_for_job(db, job_doc)
print(f"✅ language_qc migration complete ({pending_count} jobs seeded)")
except Exception as e:
print(f"⚠️ language_qc migration failed: {e}")
# Start WebSocket connection manager # Start WebSocket connection manager
await connection_manager.start() await connection_manager.start()
# Initialize middleware with Redis client # Initialize middleware with Redis client
redis_client = get_redis_client() redis_client = get_redis_client()
if redis_client: if redis_client:
rate_limit_middleware = await create_rate_limit_middleware(redis_client) rate_limit_middleware = await create_rate_limit_middleware(redis_client)
validation_middleware = await create_validation_middleware() validation_middleware = await create_validation_middleware()
# Store middleware in app state for access # Store middleware in app state for access
app.state.rate_limit_middleware = rate_limit_middleware app.state.rate_limit_middleware = rate_limit_middleware
app.state.validation_middleware = validation_middleware app.state.validation_middleware = validation_middleware
elif settings.redis_url:
# T-13: REDIS_URL is configured but client unavailable — rate limiting is disabled
print(f"⚠️ Redis configured at {settings.redis_url!r} but connection failed — rate limiting disabled")
yield yield
# Shutdown # Shutdown
await connection_manager.stop() await connection_manager.stop()
@ -155,17 +131,18 @@ async def cors_error_handler(request, call_next):
try: try:
response = await call_next(request) response = await call_next(request)
except Exception as e: except Exception as e:
# LOG THE EXCEPTION BEFORE HANDLING IT
print(f"🚨 EXCEPTION IN CORS MIDDLEWARE: {e}")
import traceback import traceback
print(f"Traceback:\n{traceback.format_exc()}")
from .core.logging import get_logger as _get_logger # Handle any unhandled exceptions and add CORS headers
_get_logger(__name__).exception("🚨 CORS middleware caught: %s\n%s", e, traceback.format_exc())
from fastapi.responses import JSONResponse from fastapi.responses import JSONResponse
response = JSONResponse( response = JSONResponse(
status_code=500, status_code=500,
content={"detail": "Internal server error"}, content={"detail": "Internal server error", "error": str(e)}
) )
# Always add CORS headers for allowed origins # Always add CORS headers for allowed origins
origin = request.headers.get("origin") origin = request.headers.get("origin")
if origin and origin in settings.cors_origins_list: if origin and origin in settings.cors_origins_list:
@ -186,7 +163,7 @@ async def http_exception_handler(request: Request, exc: HTTPException):
status_code=exc.status_code, status_code=exc.status_code,
content={"detail": exc.detail} content={"detail": exc.detail}
) )
# Add CORS headers # Add CORS headers
origin = request.headers.get("origin") origin = request.headers.get("origin")
if origin and origin in settings.cors_origins_list: if origin and origin in settings.cors_origins_list:
@ -221,18 +198,21 @@ async def validation_exception_handler(request: Request, exc: RequestValidationE
async def general_exception_handler(request: Request, exc: Exception): async def general_exception_handler(request: Request, exc: Exception):
"""Handle all uncaught exceptions with logging""" """Handle all uncaught exceptions with logging"""
import traceback import traceback
from .core.logging import get_logger from .core.logging import get_logger
logger = get_logger(__name__) logger = get_logger(__name__)
logger.exception( logger.error(f"Unhandled exception in {request.method} {request.url.path}: {exc}")
"🚨 Unhandled %s %s: %s\n%s", logger.error(f"Exception type: {type(exc).__name__}")
request.method, request.url.path, exc, traceback.format_exc(), logger.error(f"Traceback: {traceback.format_exc()}")
)
# Also print to stdout for immediate visibility
print(f"🚨 UNHANDLED EXCEPTION: {request.method} {request.url.path}")
print(f"Exception: {exc}")
print(f"Traceback:\n{traceback.format_exc()}")
response = JSONResponse( response = JSONResponse(
status_code=500, status_code=500,
content={"detail": "Internal server error"}, content={"detail": "Internal server error", "error": str(exc)}
) )
# Add CORS headers # Add CORS headers
@ -247,6 +227,9 @@ async def general_exception_handler(request: Request, exc: Exception):
@app.middleware("http") @app.middleware("http")
async def rate_limiting_middleware(request, call_next): async def rate_limiting_middleware(request, call_next):
"""Apply rate limiting middleware.""" """Apply rate limiting middleware."""
# Skip middleware for auth endpoints during debugging
if request.url.path in ["/api/v1/auth/login", "/api/v1/auth/refresh"]:
return await call_next(request)
if hasattr(app.state, 'rate_limit_middleware'): if hasattr(app.state, 'rate_limit_middleware'):
return await app.state.rate_limit_middleware(request, call_next) return await app.state.rate_limit_middleware(request, call_next)
return await call_next(request) return await call_next(request)
@ -254,7 +237,11 @@ async def rate_limiting_middleware(request, call_next):
@app.middleware("http") @app.middleware("http")
async def validation_middleware(request, call_next): async def validation_middleware(request, call_next):
"""Apply request validation middleware.""" """Apply request validation middleware."""
if request.url.path in ["/health", "/metrics", "/api/v1/auth/login", "/api/v1/auth/refresh"]: # TEMPORARILY DISABLED FOR DEBUGGING
return await call_next(request)
# Skip middleware for auth endpoints during debugging
if request.url.path in ["/api/v1/auth/login", "/api/v1/auth/refresh"]:
return await call_next(request) return await call_next(request)
if hasattr(app.state, 'validation_middleware'): if hasattr(app.state, 'validation_middleware'):
return await app.state.validation_middleware(request, call_next) return await app.state.validation_middleware(request, call_next)
@ -272,28 +259,54 @@ app.include_router(invitations_router, prefix="/api/v1")
app.include_router(files_router, prefix="/api/v1") app.include_router(files_router, prefix="/api/v1")
app.include_router(jobs_router, prefix="/api/v1") app.include_router(jobs_router, prefix="/api/v1")
app.include_router(review_notes_router, prefix="/api/v1") app.include_router(review_notes_router, prefix="/api/v1")
app.include_router(vtt_versions_router, prefix="/api/v1")
app.include_router(language_qc_router, prefix="/api/v1")
app.include_router(glossaries_router, prefix="/api/v1")
app.include_router(tts_router, prefix="/api/v1") app.include_router(tts_router, prefix="/api/v1")
app.include_router(admin_router, prefix="/api/v1") app.include_router(admin_router, prefix="/api/v1")
app.include_router(admin_production_router, prefix="/api/v1")
app.include_router(briefs_router, prefix="/api/v1")
app.include_router(share_router, prefix="/api/v1")
app.include_router(websockets_router, prefix="/api/v1") app.include_router(websockets_router, prefix="/api/v1")
@app.on_event("startup")
async def startup_event():
"""Initialize services on startup"""
logger.info("🚀 Starting up FastAPI application...")
# Start WebSocket connection manager
try:
await connection_manager.start()
logger.info("✅ WebSocket connection manager started successfully")
except Exception as e:
logger.error(f"❌ Failed to start WebSocket connection manager: {e}")
raise
@app.on_event("shutdown")
async def shutdown_event():
"""Cleanup services on shutdown"""
logger.info("🛑 Shutting down FastAPI application...")
# Stop WebSocket connection manager
try:
await connection_manager.stop()
logger.info("✅ WebSocket connection manager stopped successfully")
except Exception as e:
logger.error(f"❌ Error stopping WebSocket connection manager: {e}")
@app.get("/health") @app.get("/health")
async def health_check(): async def health_check():
return {"status": "healthy", "version": "1.0.0"} return {"status": "healthy", "version": "1.0.0"}
@app.get("/debug-test")
async def debug_test():
print("🔥🔥🔥 DEBUG TEST ENDPOINT HIT 🔥🔥🔥")
return {"message": "If you see this, routing works"}
@app.get("/metrics") @app.get("/metrics")
async def metrics(): async def metrics():
"""Prometheus metrics endpoint""" """Prometheus metrics endpoint"""
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
from fastapi import Response from fastapi import Response
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
return Response( return Response(
content=generate_latest(), content=generate_latest(),
media_type=CONTENT_TYPE_LATEST media_type=CONTENT_TYPE_LATEST


@ -1,16 +1,12 @@
"""Middleware package for FastAPI application.""" """Middleware package for FastAPI application."""
from .rate_limiting import ( from .rate_limiting import RateLimitMiddleware, IPWhitelist, create_rate_limit_middleware
IPWhitelist,
RateLimitMiddleware,
create_rate_limit_middleware,
)
from .validation import ValidationMiddleware, create_validation_middleware from .validation import ValidationMiddleware, create_validation_middleware
__all__ = [ __all__ = [
"RateLimitMiddleware", "RateLimitMiddleware",
"IPWhitelist", "IPWhitelist",
"create_rate_limit_middleware", "create_rate_limit_middleware",
"ValidationMiddleware", "ValidationMiddleware",
"create_validation_middleware" "create_validation_middleware"
] ]


@ -1,10 +1,14 @@
"""Rate limiting middleware for API endpoints.""" """Rate limiting middleware for API endpoints."""
import time import time
from collections import defaultdict
from typing import Dict, Optional, Tuple
import redis.asyncio as aioredis import redis.asyncio as aioredis
from fastapi import Request, status from fastapi import HTTPException, Request, status
from fastapi.responses import JSONResponse from fastapi.responses import JSONResponse
import json
import asyncio
from datetime import datetime, timedelta
from app.core.config import get_settings from app.core.config import get_settings
from app.telemetry.metrics import track_rate_limit_metrics from app.telemetry.metrics import track_rate_limit_metrics
@ -12,50 +16,50 @@ from app.telemetry.metrics import track_rate_limit_metrics
class RateLimiter: class RateLimiter:
"""Redis-based rate limiter with sliding window algorithm.""" """Redis-based rate limiter with sliding window algorithm."""
def __init__(self, redis_client: aioredis.Redis): def __init__(self, redis_client: aioredis.Redis):
self.redis = redis_client self.redis = redis_client
async def is_allowed( async def is_allowed(
self, self,
key: str, key: str,
limit: int, limit: int,
window_seconds: int, window_seconds: int,
identifier: str = "" identifier: str = ""
) -> tuple[bool, dict[str, int]]: ) -> Tuple[bool, Dict[str, int]]:
""" """
Check if request is allowed under rate limit. Check if request is allowed under rate limit.
Returns: Returns:
Tuple of (is_allowed, rate_limit_info) Tuple of (is_allowed, rate_limit_info)
""" """
now = time.time() now = time.time()
pipeline = self.redis.pipeline() pipeline = self.redis.pipeline()
# Remove expired entries # Remove expired entries
pipeline.zremrangebyscore(key, 0, now - window_seconds) pipeline.zremrangebyscore(key, 0, now - window_seconds)
# Count current requests in window # Count current requests in window
pipeline.zcard(key) pipeline.zcard(key)
# Add current request # Add current request
pipeline.zadd(key, {str(now): now}) pipeline.zadd(key, {str(now): now})
# Set expiry # Set expiry
pipeline.expire(key, window_seconds) pipeline.expire(key, window_seconds)
results = await pipeline.execute() results = await pipeline.execute()
current_requests = results[1] current_requests = results[1]
rate_limit_info = { rate_limit_info = {
"limit": limit, "limit": limit,
"remaining": max(0, limit - current_requests), "remaining": max(0, limit - current_requests),
"reset_time": int(now + window_seconds), "reset_time": int(now + window_seconds),
"retry_after": window_seconds if current_requests >= limit else 0 "retry_after": window_seconds if current_requests >= limit else 0
} }
is_allowed = current_requests <= limit is_allowed = current_requests <= limit
# Track metrics # Track metrics
track_rate_limit_metrics( track_rate_limit_metrics(
identifier=identifier, identifier=identifier,
@ -63,17 +67,17 @@ class RateLimiter:
current_requests=current_requests, current_requests=current_requests,
limit=limit limit=limit
) )
return is_allowed, rate_limit_info return is_allowed, rate_limit_info
class RateLimitMiddleware: class RateLimitMiddleware:
"""FastAPI middleware for rate limiting.""" """FastAPI middleware for rate limiting."""
def __init__(self, redis_client: aioredis.Redis): def __init__(self, redis_client: aioredis.Redis):
self.limiter = RateLimiter(redis_client) self.limiter = RateLimiter(redis_client)
self.settings = get_settings() self.settings = get_settings()
# Rate limit configurations by endpoint pattern # Rate limit configurations by endpoint pattern
self.rate_limits = { self.rate_limits = {
# Authentication endpoints # Authentication endpoints
@ -81,96 +85,93 @@ class RateLimitMiddleware:
"POST:/api/v1/auth/register": (3, 3600), # 3 requests per hour "POST:/api/v1/auth/register": (3, 3600), # 3 requests per hour
"POST:/api/v1/auth/refresh": (10, 300), # 10 requests per 5 minutes "POST:/api/v1/auth/refresh": (10, 300), # 10 requests per 5 minutes
"POST:/api/v1/auth/forgot-password": (3, 3600), # 3 requests per hour "POST:/api/v1/auth/forgot-password": (3, 3600), # 3 requests per hour
# File upload endpoints # File upload endpoints
"POST:/api/v1/files/upload": (10, 3600), # 10 uploads per hour "POST:/api/v1/files/upload": (10, 3600), # 10 uploads per hour
"POST:/api/v1/jobs": (20, 3600), # 20 job creations per hour "POST:/api/v1/jobs": (20, 3600), # 20 job creations per hour
# Job management endpoints # Job management endpoints
"GET:/api/v1/jobs": (100, 300), # 100 requests per 5 minutes "GET:/api/v1/jobs": (100, 300), # 100 requests per 5 minutes
"PATCH:/api/v1/jobs/*/approve": (50, 3600), # 50 approvals per hour "PATCH:/api/v1/jobs/*/approve": (50, 3600), # 50 approvals per hour
"PATCH:/api/v1/jobs/*/reject": (50, 3600), # 50 rejections per hour "PATCH:/api/v1/jobs/*/reject": (50, 3600), # 50 rejections per hour
# VTT editing endpoints # VTT editing endpoints
"PATCH:/api/v1/jobs/*/vtt": (100, 3600), # 100 VTT edits per hour "PATCH:/api/v1/jobs/*/vtt": (100, 3600), # 100 VTT edits per hour
# Admin endpoints (more restrictive) # Admin endpoints (more restrictive)
"GET:/api/v1/admin/*": (50, 300), # 50 requests per 5 minutes "GET:/api/v1/admin/*": (50, 300), # 50 requests per 5 minutes
"POST:/api/v1/admin/*": (20, 3600), # 20 admin actions per hour "POST:/api/v1/admin/*": (20, 3600), # 20 admin actions per hour
"PATCH:/api/v1/admin/*": (20, 3600), # 20 admin updates per hour "PATCH:/api/v1/admin/*": (20, 3600), # 20 admin updates per hour
"DELETE:/api/v1/admin/*": (10, 3600), # 10 admin deletions per hour "DELETE:/api/v1/admin/*": (10, 3600), # 10 admin deletions per hour
} }
# Default rate limits # Default rate limits
self.default_limits = { self.default_limits = {
"authenticated": (1000, 3600), # 1000 requests per hour for authenticated users "authenticated": (1000, 3600), # 1000 requests per hour for authenticated users
"anonymous": (100, 3600), # 100 requests per hour for anonymous users "anonymous": (100, 3600), # 100 requests per hour for anonymous users
} }
def _get_client_identifier(self, request: Request) -> str: def _get_client_identifier(self, request: Request) -> str:
"""Get client identifier for rate limiting.""" """Get client identifier for rate limiting."""
# Try to get user ID from JWT token
user = getattr(request.state, 'user', None) user = getattr(request.state, 'user', None)
if user: if user:
return f"user:{user.id}" return f"user:{user.id}"
# Only trust X-Forwarded-For when the request arrived via HTTPS (i.e. through # Fall back to IP address
# the Apache/nginx reverse proxy). On plain HTTP (direct connections, local forwarded_for = request.headers.get("X-Forwarded-For")
# dev) the header can be forged, so we fall back to the socket IP. if forwarded_for:
if request.headers.get("X-Forwarded-Proto") == "https": return f"ip:{forwarded_for.split(',')[0].strip()}"
forwarded_for = request.headers.get("X-Forwarded-For")
if forwarded_for:
# Take the right-most IP added by the trusted proxy, not client-supplied ones.
return f"ip:{forwarded_for.split(',')[-1].strip()}"
client_ip = request.client.host if request.client else "unknown" client_ip = request.client.host if request.client else "unknown"
return f"ip:{client_ip}" return f"ip:{client_ip}"
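Review note: the two sides of this hunk differ in which `X-Forwarded-For` entry they key the limiter on (left-most versus right-most). A minimal sketch of the difference, assuming a single trusted reverse proxy that appends the peer address it saw. `pick_client_ip` and its `trust_proxy` flag are hypothetical names for illustration, not part of this codebase.

```python
def pick_client_ip(headers: dict[str, str], socket_ip: str, trust_proxy: bool) -> str:
    """Return the address to rate-limit on (hypothetical helper, illustration only)."""
    forwarded_for = headers.get("X-Forwarded-For", "")
    if trust_proxy and forwarded_for:
        # Assumption: exactly one trusted proxy, which appends the peer IP it saw.
        # The right-most entry is then proxy-written; earlier entries are client-supplied.
        return forwarded_for.split(",")[-1].strip()
    # Without a trusted proxy the header is ignored and the socket address is used.
    return socket_ip


# A client that forges a leading entry before the proxy appends the real peer.
spoofed = {"X-Forwarded-For": "1.2.3.4, 203.0.113.7"}
leftmost = spoofed["X-Forwarded-For"].split(",")[0].strip()
rightmost = pick_client_ip(spoofed, "10.0.0.5", trust_proxy=True)
direct = pick_client_ip(spoofed, "198.51.100.9", trust_proxy=False)
```

Taking the left-most entry would let the forged `1.2.3.4` pick the rate-limit bucket. With more than one proxy hop the right-most entry is another proxy rather than the client, so the index to trust depends on the deployment.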
def _get_endpoint_key(self, request: Request) -> str: def _get_endpoint_key(self, request: Request) -> str:
"""Get endpoint pattern for rate limiting.""" """Get endpoint pattern for rate limiting."""
method = request.method method = request.method
path = request.url.path path = request.url.path
# Replace job IDs with wildcard for pattern matching # Replace job IDs with wildcard for pattern matching
import re import re
path = re.sub(r'/jobs/[a-f0-9-]+/', '/jobs/*/', path) path = re.sub(r'/jobs/[a-f0-9-]+/', '/jobs/*/', path)
path = re.sub(r'/admin/users/[a-f0-9-]+', '/admin/users/*', path) path = re.sub(r'/admin/users/[a-f0-9-]+', '/admin/users/*', path)
return f"{method}:{path}" return f"{method}:{path}"
def _get_rate_limit(self, request: Request) -> tuple[int, int]: def _get_rate_limit(self, request: Request) -> Tuple[int, int]:
"""Get rate limit for the current request.""" """Get rate limit for the current request."""
endpoint_key = self._get_endpoint_key(request) endpoint_key = self._get_endpoint_key(request)
# Check for specific endpoint limits # Check for specific endpoint limits
if endpoint_key in self.rate_limits: if endpoint_key in self.rate_limits:
return self.rate_limits[endpoint_key] return self.rate_limits[endpoint_key]
# Check for wildcard matches # Check for wildcard matches
for pattern, limits in self.rate_limits.items(): for pattern, limits in self.rate_limits.items():
if pattern.endswith("*") and endpoint_key.startswith(pattern[:-1]): if pattern.endswith("*") and endpoint_key.startswith(pattern[:-1]):
return limits return limits
# Use default limits based on authentication # Use default limits based on authentication
user = getattr(request.state, 'user', None) user = getattr(request.state, 'user', None)
if user: if user:
return self.default_limits["authenticated"] return self.default_limits["authenticated"]
else: else:
return self.default_limits["anonymous"] return self.default_limits["anonymous"]
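Review note: the lookup above resolves limits in two stages, an exact match on the normalised key and then a prefix match for patterns ending in `*`. The sketch below reproduces that order with a trimmed table, assuming the same ID-collapsing regex as `_get_endpoint_key`. `RATE_LIMITS`, `endpoint_key`, and `lookup` are stand-in names for illustration.

```python
import re
from typing import Optional

# Trimmed copy of two entries from the table in the diff.
RATE_LIMITS = {
    "PATCH:/api/v1/jobs/*/approve": (50, 3600),
    "GET:/api/v1/admin/*": (50, 300),
}


def endpoint_key(method: str, path: str) -> str:
    # Collapse hex/UUID-style job IDs so they hit the literal "*" keys above.
    path = re.sub(r"/jobs/[a-f0-9-]+/", "/jobs/*/", path)
    return f"{method}:{path}"


def lookup(method: str, path: str) -> Optional[tuple[int, int]]:
    key = endpoint_key(method, path)
    # Stage 1: exact match on the normalised key.
    if key in RATE_LIMITS:
        return RATE_LIMITS[key]
    # Stage 2: only a trailing "*" acts as a prefix wildcard.
    for pattern, limits in RATE_LIMITS.items():
        if pattern.endswith("*") and key.startswith(pattern[:-1]):
            return limits
    # The real method falls back to the authenticated/anonymous defaults here.
    return None


approve = lookup("PATCH", "/api/v1/jobs/3f2a9c1e-0b7d/approve")
admin = lookup("GET", "/api/v1/admin/users/42")
non_hex = lookup("PATCH", "/api/v1/jobs/job_17/approve")
```

An ID containing characters outside `[a-f0-9-]` is not collapsed, so that request misses the per-endpoint entry and would receive the default limit instead.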
async def __call__(self, request: Request, call_next): async def __call__(self, request: Request, call_next):
"""Process rate limiting for the request.""" """Process rate limiting for the request."""
# Skip rate limiting for health checks and metrics only # Skip rate limiting for health checks and login (temporary for debugging)
if request.url.path in ["/health", "/metrics"]: if request.url.path in ["/health", "/metrics", "/api/v1/auth/login"]:
return await call_next(request) return await call_next(request)
client_id = self._get_client_identifier(request) client_id = self._get_client_identifier(request)
endpoint_key = self._get_endpoint_key(request) endpoint_key = self._get_endpoint_key(request)
limit, window = self._get_rate_limit(request) limit, window = self._get_rate_limit(request)
# Create rate limit key # Create rate limit key
rate_limit_key = f"rate_limit:{client_id}:{endpoint_key}" rate_limit_key = f"rate_limit:{client_id}:{endpoint_key}"
try: try:
is_allowed, rate_info = await self.limiter.is_allowed( is_allowed, rate_info = await self.limiter.is_allowed(
key=rate_limit_key, key=rate_limit_key,
@ -178,7 +179,7 @@ class RateLimitMiddleware:
window_seconds=window, window_seconds=window,
identifier=client_id identifier=client_id
) )
if not is_allowed: if not is_allowed:
# Return rate limit exceeded response # Return rate limit exceeded response
return JSONResponse( return JSONResponse(
@ -195,17 +196,17 @@ class RateLimitMiddleware:
"Retry-After": str(rate_info["retry_after"]) "Retry-After": str(rate_info["retry_after"])
} }
) )
# Process the request # Process the request
response = await call_next(request) response = await call_next(request)
# Add rate limit headers to response # Add rate limit headers to response
response.headers["X-RateLimit-Limit"] = str(rate_info["limit"]) response.headers["X-RateLimit-Limit"] = str(rate_info["limit"])
response.headers["X-RateLimit-Remaining"] = str(rate_info["remaining"]) response.headers["X-RateLimit-Remaining"] = str(rate_info["remaining"])
response.headers["X-RateLimit-Reset"] = str(rate_info["reset_time"]) response.headers["X-RateLimit-Reset"] = str(rate_info["reset_time"])
return response return response
except Exception as e: except Exception as e:
# Log error but don't block request if rate limiting fails # Log error but don't block request if rate limiting fails
print(f"Rate limiting error: {e}") print(f"Rate limiting error: {e}")
@ -214,30 +215,30 @@ class RateLimitMiddleware:
class IPWhitelist: class IPWhitelist:
"""IP whitelist for bypassing rate limits.""" """IP whitelist for bypassing rate limits."""
def __init__(self, redis_client: aioredis.Redis): def __init__(self, redis_client: aioredis.Redis):
self.redis = redis_client self.redis = redis_client
self.whitelist_key = "ip_whitelist" self.whitelist_key = "ip_whitelist"
# Default whitelisted IPs (health checks, monitoring) # Default whitelisted IPs (health checks, monitoring)
self.default_whitelist = { self.default_whitelist = {
"127.0.0.1", "127.0.0.1",
"::1", "::1",
"169.254.169.254", # GCP metadata server "169.254.169.254", # GCP metadata server
} }
async def is_whitelisted(self, ip: str) -> bool: async def is_whitelisted(self, ip: str) -> bool:
"""Check if IP is whitelisted.""" """Check if IP is whitelisted."""
if ip in self.default_whitelist: if ip in self.default_whitelist:
return True return True
try: try:
is_member = await self.redis.sismember(self.whitelist_key, ip) is_member = await self.redis.sismember(self.whitelist_key, ip)
return bool(is_member) return bool(is_member)
except Exception: except Exception:
return False return False
async def add_ip(self, ip: str, ttl_seconds: int | None = None) -> bool: async def add_ip(self, ip: str, ttl_seconds: Optional[int] = None) -> bool:
"""Add IP to whitelist.""" """Add IP to whitelist."""
try: try:
await self.redis.sadd(self.whitelist_key, ip) await self.redis.sadd(self.whitelist_key, ip)
@ -248,7 +249,7 @@ class IPWhitelist:
return True return True
except Exception: except Exception:
return False return False
async def remove_ip(self, ip: str) -> bool: async def remove_ip(self, ip: str) -> bool:
"""Remove IP from whitelist.""" """Remove IP from whitelist."""
try: try:
@ -260,4 +261,4 @@ class IPWhitelist:
async def create_rate_limit_middleware(redis_client: aioredis.Redis) -> RateLimitMiddleware: async def create_rate_limit_middleware(redis_client: aioredis.Redis) -> RateLimitMiddleware:
"""Factory function to create rate limit middleware.""" """Factory function to create rate limit middleware."""
return RateLimitMiddleware(redis_client) return RateLimitMiddleware(redis_client)
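Review note: `RateLimiter.is_allowed` prunes old entries, counts, and then adds the current request, so the count excludes the request being checked. Compared with `<=` as in the diff, that appears to admit one request more than `limit`. The sketch below mirrors the same three steps in memory with a strict comparison. `InMemorySlidingWindow` is a hypothetical single-process stand-in, not a replacement for the Redis pipeline.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class InMemorySlidingWindow:
    """Single-process stand-in for the Redis sorted-set limiter (illustration only)."""

    def __init__(self) -> None:
        self._hits = defaultdict(deque)  # key -> timestamps, oldest first

    def is_allowed(self, key: str, limit: int, window_seconds: int,
                   now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        hits = self._hits[key]
        # Drop timestamps at or before the window start (the ZREMRANGEBYSCORE step).
        while hits and hits[0] <= now - window_seconds:
            hits.popleft()
        # Count before recording (the ZCARD step); strict < admits exactly `limit`.
        allowed = len(hits) < limit
        # Record the request whether or not it was allowed (the ZADD step).
        hits.append(now)
        return allowed


limiter = InMemorySlidingWindow()
decisions = [
    limiter.is_allowed("ip:203.0.113.7", limit=3, window_seconds=60, now=100.0 + i)
    for i in range(5)
]
later = limiter.is_allowed("ip:203.0.113.7", limit=3, window_seconds=60, now=400.0)
```

Like the pipeline in the diff, the sketch records rejected requests too, so a client that keeps retrying keeps its own window full.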


@ -3,17 +3,15 @@
import json import json
import re import re
import time import time
from typing import Any from typing import Any, Dict, List, Optional, Set
from fastapi import HTTPException, Request, status
from fastapi.responses import JSONResponse
from pydantic import BaseModel, ValidationError as PydanticValidationError
import magic
from urllib.parse import unquote from urllib.parse import unquote
import magic
from fastapi import Request, status
from fastapi.responses import JSONResponse
from app.telemetry.metrics import track_validation_metrics from app.telemetry.metrics import track_validation_metrics
from ..core.config import settings
class ValidationError(Exception): class ValidationError(Exception):
"""Custom validation error.""" """Custom validation error."""
@ -27,93 +25,89 @@ class SecurityValidationError(Exception):
class RequestValidator: class RequestValidator:
"""Enhanced request validation with security checks.""" """Enhanced request validation with security checks."""
def __init__(self): def __init__(self):
# File type restrictions # File type restrictions
self.allowed_video_types = { self.allowed_video_types = {
"video/mp4", "video/mp4",
"video/quicktime", "video/quicktime",
"video/x-msvideo" # AVI "video/x-msvideo" # AVI
} }
self.allowed_subtitle_types = { self.allowed_subtitle_types = {
"text/vtt", "text/vtt",
"text/plain" "text/plain"
} }
# Security patterns to block # Security patterns to block
self.malicious_patterns = [ self.malicious_patterns = [
# SQL injection patterns # SQL injection patterns
r"\b(union|select|insert|update|delete|drop|create|alter)\b\s+", r"(union|select|insert|update|delete|drop|create|alter)\s+",
r"vbscript:", # vbscript protocol injection r"(script|javascript|vbscript|onload|onerror|onclick)",
r"\b(onload|onerror|onclick)\s*=", # HTML event handler attribute injection
r"<\s*script[^>]*>", r"<\s*script[^>]*>",
r"javascript:", r"javascript:",
r"data:.*base64", r"data:.*base64",
# Path traversal # Path traversal
r"\.\./", r"\.\./",
r"\.\.\\", r"\.\.\\",
r"%2e%2e%2f", r"%2e%2e%2f",
r"%2e%2e\\", r"%2e%2e\\",
# Command injection (removed $ and ; — semicolons are common in natural language) # Command injection (removed $ to allow MongoDB operators in controlled contexts)
r"[&|`](?!\s*$)", r"[;&|`](?!\s*$)", # Allow $ but not as command separator
r"\b(rm|wget|curl|nc|bash|sh|cmd|powershell)\b\s+", r"(rm|wget|curl|nc|bash|sh|cmd|powershell)\s+",
# MongoDB injection — NoSQL operator abuse # MongoDB injection
r"\$where|\$expr|\$function|\$accumulator" r"\$where|\$ne|\$gt|\$lt|\$regex",
r"|\$ne|\$nin|\$not"
r"|\$gt|\$gte|\$lt|\$lte"
r"|\$regex|\$jsonSchema|\$mod",
] ]
self.compiled_patterns = [re.compile(pattern, re.IGNORECASE) for pattern in self.malicious_patterns] self.compiled_patterns = [re.compile(pattern, re.IGNORECASE) for pattern in self.malicious_patterns]
# Max file sizes (in bytes) — driven by central config (T-14) # Max file sizes (in bytes)
self.max_video_size = settings.upload_max_video_bytes self.max_video_size = 2 * 1024 * 1024 * 1024 # 2GB
self.max_subtitle_size = 10 * 1024 * 1024 # 10MB self.max_subtitle_size = 10 * 1024 * 1024 # 10MB
# Request size limits # Request size limits
self.max_json_size = 1024 * 1024 # 1MB self.max_json_size = 1024 * 1024 # 1MB
self.max_form_fields = 50 self.max_form_fields = 50
def validate_string_content(self, content: str, field_name: str = "input") -> None: def validate_string_content(self, content: str, field_name: str = "input") -> None:
"""Validate string content for malicious patterns.""" """Validate string content for malicious patterns."""
if not isinstance(content, str): if not isinstance(content, str):
return return
for pattern in self.compiled_patterns: for pattern in self.compiled_patterns:
if pattern.search(content): if pattern.search(content):
raise SecurityValidationError( raise SecurityValidationError(
f"Potentially malicious content detected in {field_name}" f"Potentially malicious content detected in {field_name}"
) )
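Review note: the two variants of the SQL keyword pattern in this hunk differ only in the `\b` word boundaries. A small check with the standard `re` module shows what the boundaries change. The sample strings are invented for illustration.

```python
import re

KEYWORDS = r"(union|select|insert|update|delete|drop|create|alter)"

# The pattern without word boundaries, and the pattern with them.
unanchored = re.compile(KEYWORDS + r"\s+", re.IGNORECASE)
anchored = re.compile(r"\b" + KEYWORDS + r"\b\s+", re.IGNORECASE)

benign = "Please reselect the backdrop colour"
hostile = "1 UNION SELECT password FROM users"

results = {
    # "reselect " and "backdrop " contain keywords as substrings.
    "unanchored_benign": bool(unanchored.search(benign)),
    "anchored_benign": bool(anchored.search(benign)),
    "unanchored_hostile": bool(unanchored.search(hostile)),
    "anchored_hostile": bool(anchored.search(hostile)),
}
```

Only the unanchored pattern flags the benign sentence. The anchored one still matches ordinary English such as "select a file", which may be part of the reason one side of the diff exempts free-text fields from scanning.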
def validate_filename(self, filename: str) -> str: def validate_filename(self, filename: str) -> str:
"""Validate and sanitize filename.""" """Validate and sanitize filename."""
if not filename: if not filename:
raise ValidationError("Filename cannot be empty") raise ValidationError("Filename cannot be empty")
# Decode URL encoding # Decode URL encoding
filename = unquote(filename) filename = unquote(filename)
# Check for malicious patterns # Check for malicious patterns
self.validate_string_content(filename, "filename") self.validate_string_content(filename, "filename")
# Remove dangerous characters # Remove dangerous characters
safe_filename = re.sub(r'[^\w\-_\.]', '_', filename) safe_filename = re.sub(r'[^\w\-_\.]', '_', filename)
# Prevent hidden files # Prevent hidden files
if safe_filename.startswith('.'): if safe_filename.startswith('.'):
safe_filename = 'file_' + safe_filename[1:] safe_filename = 'file_' + safe_filename[1:]
# Limit length # Limit length
if len(safe_filename) > 255: if len(safe_filename) > 255:
name, ext = safe_filename.rsplit('.', 1) if '.' in safe_filename else (safe_filename, '') name, ext = safe_filename.rsplit('.', 1) if '.' in safe_filename else (safe_filename, '')
safe_filename = name[:250] + ('.' + ext if ext else '') safe_filename = name[:250] + ('.' + ext if ext else '')
return safe_filename return safe_filename
def validate_file_type(self, content: bytes, expected_type: str, filename: str) -> None: def validate_file_type(self, content: bytes, expected_type: str, filename: str) -> None:
"""Validate file type using magic numbers.""" """Validate file type using magic numbers."""
try: try:
@ -123,13 +117,13 @@ class RequestValidator:
ext = filename.lower().split('.')[-1] if '.' in filename else '' ext = filename.lower().split('.')[-1] if '.' in filename else ''
video_extensions = {'mp4', 'mov', 'avi', 'mkv'} video_extensions = {'mp4', 'mov', 'avi', 'mkv'}
subtitle_extensions = {'vtt', 'srt', 'txt'} subtitle_extensions = {'vtt', 'srt', 'txt'}
if expected_type == "video" and ext not in video_extensions: if expected_type == "video" and ext not in video_extensions:
raise ValidationError(f"Invalid video file extension: {ext}") from None raise ValidationError(f"Invalid video file extension: {ext}")
elif expected_type == "subtitle" and ext not in subtitle_extensions: elif expected_type == "subtitle" and ext not in subtitle_extensions:
raise ValidationError(f"Invalid subtitle file extension: {ext}") from None raise ValidationError(f"Invalid subtitle file extension: {ext}")
return return
if expected_type == "video" and detected_type not in self.allowed_video_types: if expected_type == "video" and detected_type not in self.allowed_video_types:
raise ValidationError( raise ValidationError(
f"Invalid video file type: {detected_type}. " f"Invalid video file type: {detected_type}. "
@ -140,7 +134,7 @@ class RequestValidator:
f"Invalid subtitle file type: {detected_type}. " f"Invalid subtitle file type: {detected_type}. "
f"Allowed types: {', '.join(self.allowed_subtitle_types)}" f"Allowed types: {', '.join(self.allowed_subtitle_types)}"
) )
def validate_file_size(self, size: int, file_type: str) -> None: def validate_file_size(self, size: int, file_type: str) -> None:
"""Validate file size limits.""" """Validate file size limits."""
if file_type == "video" and size > self.max_video_size: if file_type == "video" and size > self.max_video_size:
@ -153,16 +147,16 @@ class RequestValidator:
f"Subtitle file too large: {size} bytes. " f"Subtitle file too large: {size} bytes. "
f"Maximum allowed: {self.max_subtitle_size} bytes" f"Maximum allowed: {self.max_subtitle_size} bytes"
) )
async def validate_json_payload(self, request: Request) -> dict[str, Any] | None: async def validate_json_payload(self, request: Request) -> Optional[Dict[str, Any]]:
"""Validate JSON request payload.""" """Validate JSON request payload."""
if not request.headers.get("content-type", "").startswith("application/json"): if not request.headers.get("content-type", "").startswith("application/json"):
return None return None
content_length = request.headers.get("content-length") content_length = request.headers.get("content-length")
if content_length and int(content_length) > self.max_json_size: if content_length and int(content_length) > self.max_json_size:
raise ValidationError(f"JSON payload too large: {content_length} bytes") raise ValidationError(f"JSON payload too large: {content_length} bytes")
try: try:
# Check if body has already been read # Check if body has already been read
if hasattr(request, '_cached_body'): if hasattr(request, '_cached_body'):
@ -171,67 +165,63 @@ class RequestValidator:
body = await request.body() body = await request.body()
# Cache the body so FastAPI can read it later # Cache the body so FastAPI can read it later
request._cached_body = body request._cached_body = body
if len(body) > self.max_json_size: if len(body) > self.max_json_size:
raise ValidationError(f"JSON payload too large: {len(body)} bytes") raise ValidationError(f"JSON payload too large: {len(body)} bytes")
if not body: if not body:
return {} return {}
payload = json.loads(body) payload = json.loads(body)
# Recursively validate all string values # Recursively validate all string values
self._validate_json_values(payload) self._validate_json_values(payload)
return payload return payload
except json.JSONDecodeError as e: except json.JSONDecodeError as e:
raise ValidationError(f"Invalid JSON: {e}") from e raise ValidationError(f"Invalid JSON: {e}")
# Fields that contain free-form natural language — skip injection pattern checks
_FREETEXT_FIELDS = {"captions_vtt", "audio_description_vtt", "text", "notes", "change_note", "description"}
def _validate_json_values(self, obj: Any, path: str = "root") -> None: def _validate_json_values(self, obj: Any, path: str = "root") -> None:
"""Recursively validate JSON values.""" """Recursively validate JSON values."""
if isinstance(obj, dict): if isinstance(obj, dict):
if len(obj) > self.max_form_fields: if len(obj) > self.max_form_fields:
raise ValidationError(f"Too many fields in object at {path}") raise ValidationError(f"Too many fields in object at {path}")
for key, value in obj.items(): for key, value in obj.items():
self.validate_string_content(key, f"{path}.key") if isinstance(key, str):
# Skip pattern scanning for free-text fields (VTT content, notes, etc.) self.validate_string_content(key, f"{path}.{key}")
if key not in self._FREETEXT_FIELDS: self._validate_json_values(value, f"{path}.{key}")
self._validate_json_values(value, f"{path}.{key}")
elif isinstance(obj, list): elif isinstance(obj, list):
if len(obj) > 1000: # Prevent large arrays if len(obj) > 1000: # Prevent large arrays
raise ValidationError(f"Array too large at {path}") raise ValidationError(f"Array too large at {path}")
for i, item in enumerate(obj): for i, item in enumerate(obj):
self._validate_json_values(item, f"{path}[{i}]") self._validate_json_values(item, f"{path}[{i}]")
elif isinstance(obj, str): elif isinstance(obj, str):
self.validate_string_content(obj, path) self.validate_string_content(obj, path)
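Review note: one side of this hunk skips pattern scanning for values stored under free-text keys while still checking every key. The sketch below shows that traversal on a small payload. `PATTERNS`, `FREETEXT_FIELDS`, and `find_violations` are trimmed-down, hypothetical stand-ins for the validator's own members, and the sketch collects paths instead of raising.

```python
import re
from typing import Any

# Hypothetical, shortened stand-ins for the validator's pattern list and skip set.
PATTERNS = [re.compile(r"\$where|\$ne|\$gt", re.IGNORECASE)]
FREETEXT_FIELDS = {"notes", "description"}


def find_violations(obj: Any, path: str = "root") -> list[str]:
    """Collect the paths of strings that match a blocked pattern."""
    found: list[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if any(p.search(key) for p in PATTERNS):
                found.append(f"{path}.key")
            # Keys are always checked; values under free-text keys are not descended into.
            if key not in FREETEXT_FIELDS:
                found.extend(find_violations(value, f"{path}.{key}"))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            found.extend(find_violations(item, f"{path}[{i}]"))
    elif isinstance(obj, str):
        if any(p.search(obj) for p in PATTERNS):
            found.append(path)
    return found


payload = {
    "notes": "cost is $gt than expected",
    "filter": {"status": "$ne"},
    "tags": ["ok", "$where 1"],
}
violations = find_violations(payload)
```

Because the skip is keyed on the field name alone, it applies at any nesting depth and exempts the whole subtree under that key, not just a single string.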
def validate_query_params(self, request: Request) -> None: def validate_query_params(self, request: Request) -> None:
"""Validate query parameters.""" """Validate query parameters."""
for key, value in request.query_params.items(): for key, value in request.query_params.items():
self.validate_string_content(key, f"query.{key}") self.validate_string_content(key, f"query.{key}")
self.validate_string_content(str(value), f"query.{key}") self.validate_string_content(str(value), f"query.{key}")
def validate_headers(self, request: Request) -> None: def validate_headers(self, request: Request) -> None:
"""Validate request headers.""" """Validate request headers."""
suspicious_headers = { suspicious_headers = {
"x-forwarded-host", "x-forwarded-host",
"x-original-host", "x-original-host",
"x-rewrite-url" "x-rewrite-url"
} }
for header_name, header_value in request.headers.items(): for header_name, header_value in request.headers.items():
# Check for suspicious headers # Check for suspicious headers
if header_name.lower() in suspicious_headers: if header_name.lower() in suspicious_headers:
self.validate_string_content(header_value, f"header.{header_name}") self.validate_string_content(header_value, f"header.{header_name}")
# Validate user-agent length # Validate user-agent length
if header_name.lower() == "user-agent" and len(header_value) > 500: if header_name.lower() == "user-agent" and len(header_value) > 500:
raise SecurityValidationError("User-Agent header too long") raise SecurityValidationError("User-Agent header too long")
@ -239,34 +229,34 @@ class RequestValidator:
class ValidationMiddleware: class ValidationMiddleware:
"""FastAPI middleware for enhanced request validation.""" """FastAPI middleware for enhanced request validation."""
def __init__(self): def __init__(self):
self.validator = RequestValidator() self.validator = RequestValidator()
async def __call__(self, request: Request, call_next): async def __call__(self, request: Request, call_next):
"""Process validation for the request.""" """Process validation for the request."""
start_time = time.time() start_time = time.time()
validation_errors = [] validation_errors = []
# Skip validation for timing adjustment endpoint temporarily # Skip validation for timing adjustment endpoint temporarily
if "/vtt/adjust-timing" in request.url.path: if "/vtt/adjust-timing" in request.url.path:
return await call_next(request) return await call_next(request)
try: try:
# Validate headers # Validate headers
self.validator.validate_headers(request) self.validator.validate_headers(request)
# Validate query parameters # Validate query parameters
self.validator.validate_query_params(request) self.validator.validate_query_params(request)
# Validate JSON payload if present # Validate JSON payload if present
if request.method in ["POST", "PUT", "PATCH"]: if request.method in ["POST", "PUT", "PATCH"]:
await self.validator.validate_json_payload(request) await self.validator.validate_json_payload(request)
# Process the request # Process the request
response = await call_next(request) response = await call_next(request)
# Track successful validation # Track successful validation
track_validation_metrics( track_validation_metrics(
endpoint=request.url.path, endpoint=request.url.path,
@ -275,10 +265,10 @@ class ValidationMiddleware:
validation_time=time.time() - start_time, validation_time=time.time() - start_time,
error_types=[] error_types=[]
) )
return response return response
except SecurityValidationError: except SecurityValidationError as e:
validation_errors.append("security") validation_errors.append("security")
track_validation_metrics( track_validation_metrics(
endpoint=request.url.path, endpoint=request.url.path,
@ -287,7 +277,7 @@ class ValidationMiddleware:
validation_time=time.time() - start_time, validation_time=time.time() - start_time,
error_types=validation_errors error_types=validation_errors
) )
return JSONResponse( return JSONResponse(
status_code=status.HTTP_400_BAD_REQUEST, status_code=status.HTTP_400_BAD_REQUEST,
content={ content={
@ -295,7 +285,7 @@ class ValidationMiddleware:
"error_code": "SECURITY_VALIDATION_ERROR" "error_code": "SECURITY_VALIDATION_ERROR"
} }
) )
except ValidationError as e: except ValidationError as e:
validation_errors.append("format") validation_errors.append("format")
track_validation_metrics( track_validation_metrics(
@ -305,7 +295,7 @@ class ValidationMiddleware:
validation_time=time.time() - start_time, validation_time=time.time() - start_time,
error_types=validation_errors error_types=validation_errors
) )
return JSONResponse( return JSONResponse(
status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
content={ content={
@ -313,7 +303,7 @@ class ValidationMiddleware:
"error_code": "VALIDATION_ERROR" "error_code": "VALIDATION_ERROR"
} }
) )
except Exception as e: except Exception as e:
validation_errors.append("unknown") validation_errors.append("unknown")
track_validation_metrics( track_validation_metrics(
@ -323,7 +313,7 @@ class ValidationMiddleware:
validation_time=time.time() - start_time, validation_time=time.time() - start_time,
error_types=validation_errors error_types=validation_errors
) )
# Log unexpected error but continue processing # Log unexpected error but continue processing
print(f"Validation middleware error: {e}") print(f"Validation middleware error: {e}")
return await call_next(request) return await call_next(request)
@ -331,4 +321,4 @@ class ValidationMiddleware:
async def create_validation_middleware() -> ValidationMiddleware: async def create_validation_middleware() -> ValidationMiddleware:
"""Factory function to create validation middleware.""" """Factory function to create validation middleware."""
return ValidationMiddleware() return ValidationMiddleware()
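One side of the `_validate_json_values` diff above skips pattern scanning for the values of free-text fields, while still checking the keys themselves. The following is a minimal sketch of that traversal, not the project's implementation. `walk_json`, `collect_checked` and the `check` callback are hypothetical names used only for illustration; the field set is copied from `_FREETEXT_FIELDS` in the diff, and the size limits are omitted.

```python
from typing import Any, Callable

# Field names treated as free-form text; copied from _FREETEXT_FIELDS in the diff.
FREETEXT_FIELDS = {
    "captions_vtt", "audio_description_vtt", "text",
    "notes", "change_note", "description",
}


def walk_json(obj: Any, check: Callable[[str, str], None], path: str = "root") -> None:
    """Call check(value, path) on every string, skipping free-text values."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Keys are always checked, even for free-text fields.
            check(key, f"{path}.{key}")
            # Values under free-text keys are not descended into.
            if key not in FREETEXT_FIELDS:
                walk_json(value, check, f"{path}.{key}")
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            walk_json(item, check, f"{path}[{i}]")
    elif isinstance(obj, str):
        check(obj, path)


def collect_checked(payload: Any) -> list[tuple[str, str]]:
    """Return (path, value) pairs for every string that would be checked."""
    seen: list[tuple[str, str]] = []
    walk_json(payload, lambda value, path: seen.append((path, value)))
    return seen


if __name__ == "__main__":
    payload = {
        "title": "a",
        "notes": "<script>",
        "items": ["x", {"text": "y", "lang": "en"}],
    }
    for path, value in collect_checked(payload):
        print(path, repr(value))
```

Note that skipping a free-text key skips its whole subtree, so a nested object stored under `notes` would not be scanned either. Whether that is intended is not stated in the diff.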

View file

@ -1,5 +1,5 @@
"""Database migration framework for MongoDB.""" """Database migration framework for MongoDB."""
from .migrator import Migration, MigrationManager from .migrator import MigrationManager, Migration
__all__ = ["MigrationManager", "Migration"] __all__ = ["MigrationManager", "Migration"]

View file

@ -1,10 +1,11 @@
"""MongoDB migration framework.""" """MongoDB migration framework."""
import os
import importlib.util import importlib.util
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from datetime import datetime from datetime import datetime
from pathlib import Path from pathlib import Path
from typing import List, Optional
from motor.motor_asyncio import AsyncIOMotorDatabase from motor.motor_asyncio import AsyncIOMotorDatabase
from app.core.database import get_database from app.core.database import get_database
@ -16,23 +17,22 @@ logger = get_logger(__name__)
class Migration(ABC): class Migration(ABC):
"""Base class for database migrations.""" """Base class for database migrations."""
version: str = "0000-00-00-000000" # overridden by subclass as class variable
description: str = ""
def __init__(self): def __init__(self):
self.db: AsyncIOMotorDatabase | None = None self.version: str = "0000-00-00-000000" # Format: YYYY-MM-DD-HHMMSS
self.description: str = ""
self.db: Optional[AsyncIOMotorDatabase] = None
@abstractmethod @abstractmethod
async def up(self) -> None: async def up(self) -> None:
"""Apply the migration.""" """Apply the migration."""
pass pass
@abstractmethod @abstractmethod
async def down(self) -> None: async def down(self) -> None:
"""Rollback the migration.""" """Rollback the migration."""
pass pass
async def set_database(self, db: AsyncIOMotorDatabase) -> None: async def set_database(self, db: AsyncIOMotorDatabase) -> None:
"""Set the database instance.""" """Set the database instance."""
self.db = db self.db = db
@ -40,7 +40,7 @@ class Migration(ABC):
class MigrationRecord: class MigrationRecord:
"""Represents a migration record in the database.""" """Represents a migration record in the database."""
def __init__(self, version: str, description: str, applied_at: datetime): def __init__(self, version: str, description: str, applied_at: datetime):
self.version = version self.version = version
self.description = description self.description = description
@ -49,163 +49,163 @@ class MigrationRecord:
class MigrationManager: class MigrationManager:
"""Manages database migrations.""" """Manages database migrations."""
def __init__(self): def __init__(self):
self.db: AsyncIOMotorDatabase | None = None self.db: Optional[AsyncIOMotorDatabase] = None
self.migrations_dir = Path(__file__).parent / "scripts" self.migrations_dir = Path(__file__).parent / "scripts"
self.collection_name = "migration_history" self.collection_name = "migration_history"
async def initialize(self) -> None: async def initialize(self) -> None:
"""Initialize the migration manager.""" """Initialize the migration manager."""
self.db = await get_database() self.db = await get_database()
await self._ensure_migration_collection() await self._ensure_migration_collection()
async def _ensure_migration_collection(self) -> None: async def _ensure_migration_collection(self) -> None:
"""Ensure the migration history collection exists with proper indexes.""" """Ensure the migration history collection exists with proper indexes."""
collection = self.db[self.collection_name] collection = self.db[self.collection_name]
# Create indexes for migration history # Create indexes for migration history
await collection.create_index([("version", 1)], unique=True) await collection.create_index([("version", 1)], unique=True)
await collection.create_index([("applied_at", -1)]) await collection.create_index([("applied_at", -1)])
logger.info("Migration history collection initialized") logger.info("Migration history collection initialized")
def discover_migrations(self) -> list[str]: def discover_migrations(self) -> List[str]:
"""Discover all migration files in the migrations directory.""" """Discover all migration files in the migrations directory."""
if not self.migrations_dir.exists(): if not self.migrations_dir.exists():
logger.warning(f"Migrations directory not found: {self.migrations_dir}") logger.warning(f"Migrations directory not found: {self.migrations_dir}")
return [] return []
migration_files = [] migration_files = []
for file_path in self.migrations_dir.glob("*.py"): for file_path in self.migrations_dir.glob("*.py"):
if file_path.name.startswith("migration_") and not file_path.name.startswith("__"): if file_path.name.startswith("migration_") and not file_path.name.startswith("__"):
migration_files.append(file_path.stem) migration_files.append(file_path.stem)
# Sort by version (filename should start with version) # Sort by version (filename should start with version)
migration_files.sort() migration_files.sort()
return migration_files return migration_files
async def load_migration(self, migration_name: str) -> Migration: async def load_migration(self, migration_name: str) -> Migration:
"""Dynamically load a migration class.""" """Dynamically load a migration class."""
migration_path = self.migrations_dir / f"{migration_name}.py" migration_path = self.migrations_dir / f"{migration_name}.py"
if not migration_path.exists(): if not migration_path.exists():
raise FileNotFoundError(f"Migration file not found: {migration_path}") raise FileNotFoundError(f"Migration file not found: {migration_path}")
# Load the module # Load the module
spec = importlib.util.spec_from_file_location(migration_name, migration_path) spec = importlib.util.spec_from_file_location(migration_name, migration_path)
module = importlib.util.module_from_spec(spec) module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module) spec.loader.exec_module(module)
# Get the migration class (assume it's named Migration) # Get the migration class (assume it's named Migration)
if not hasattr(module, 'Migration'): if not hasattr(module, 'Migration'):
raise AttributeError(f"Migration class not found in {migration_name}") raise AttributeError(f"Migration class not found in {migration_name}")
migration_class = module.Migration migration_class = getattr(module, 'Migration')
migration = migration_class() migration = migration_class()
await migration.set_database(self.db) await migration.set_database(self.db)
return migration return migration
async def get_applied_migrations(self) -> list[str]: async def get_applied_migrations(self) -> List[str]:
"""Get list of applied migration versions.""" """Get list of applied migration versions."""
collection = self.db[self.collection_name] collection = self.db[self.collection_name]
cursor = collection.find({}, {"version": 1}).sort("version", 1) cursor = collection.find({}, {"version": 1}).sort("version", 1)
applied = [] applied = []
async for doc in cursor: async for doc in cursor:
applied.append(doc["version"]) applied.append(doc["version"])
return applied return applied
async def record_migration(self, migration: Migration) -> None: async def record_migration(self, migration: Migration) -> None:
"""Record a successful migration in the database.""" """Record a successful migration in the database."""
collection = self.db[self.collection_name] collection = self.db[self.collection_name]
record = { record = {
"version": migration.version, "version": migration.version,
"description": migration.description, "description": migration.description,
"applied_at": datetime.utcnow() "applied_at": datetime.utcnow()
} }
await collection.insert_one(record) await collection.insert_one(record)
logger.info(f"Recorded migration: {migration.version} - {migration.description}") logger.info(f"Recorded migration: {migration.version} - {migration.description}")
async def remove_migration_record(self, version: str) -> None: async def remove_migration_record(self, version: str) -> None:
"""Remove a migration record (for rollback).""" """Remove a migration record (for rollback)."""
collection = self.db[self.collection_name] collection = self.db[self.collection_name]
await collection.delete_one({"version": version}) await collection.delete_one({"version": version})
logger.info(f"Removed migration record: {version}") logger.info(f"Removed migration record: {version}")
@trace_async_operation("migration_manager.migrate_up") @trace_async_operation("migration_manager.migrate_up")
async def migrate_up(self, target_version: str | None = None) -> list[str]: async def migrate_up(self, target_version: Optional[str] = None) -> List[str]:
""" """
Apply migrations up to the target version. Apply migrations up to the target version.
Args: Args:
target_version: Version to migrate to. If None, applies all pending migrations. target_version: Version to migrate to. If None, applies all pending migrations.
Returns: Returns:
List of applied migration versions. List of applied migration versions.
""" """
await self.initialize() await self.initialize()
# Discover all migrations # Discover all migrations
all_migrations = self.discover_migrations() all_migrations = self.discover_migrations()
applied_migrations = await self.get_applied_migrations() applied_migrations = await self.get_applied_migrations()
# Find pending migrations # Find pending migrations
pending_migrations = [] pending_migrations = []
for migration_name in all_migrations: for migration_name in all_migrations:
# Extract version from filename (assumes format: migration_YYYY-MM-DD-HHMMSS_description.py) # Extract version from filename (assumes format: migration_YYYY-MM-DD-HHMMSS_description.py)
version = migration_name.replace("migration_", "").split("_")[0] version = migration_name.replace("migration_", "").split("_")[0]
if version not in applied_migrations: if version not in applied_migrations:
if target_version is None or version <= target_version: if target_version is None or version <= target_version:
pending_migrations.append((migration_name, version)) pending_migrations.append((migration_name, version))
# Sort by version # Sort by version
pending_migrations.sort(key=lambda x: x[1]) pending_migrations.sort(key=lambda x: x[1])
applied = [] applied = []
for migration_name, version in pending_migrations: for migration_name, version in pending_migrations:
try: try:
logger.info(f"Applying migration: {migration_name}") logger.info(f"Applying migration: {migration_name}")
migration = await self.load_migration(migration_name) migration = await self.load_migration(migration_name)
await migration.up() await migration.up()
await self.record_migration(migration) await self.record_migration(migration)
applied.append(version) applied.append(version)
logger.info(f"Successfully applied migration: {version}") logger.info(f"Successfully applied migration: {version}")
except Exception as e: except Exception as e:
logger.error(f"Failed to apply migration {migration_name}: {e}") logger.error(f"Failed to apply migration {migration_name}: {e}")
raise raise
return applied return applied
@trace_async_operation("migration_manager.migrate_down") @trace_async_operation("migration_manager.migrate_down")
async def migrate_down(self, target_version: str) -> list[str]: async def migrate_down(self, target_version: str) -> List[str]:
""" """
Rollback migrations down to the target version. Rollback migrations down to the target version.
Args: Args:
target_version: Version to rollback to. target_version: Version to rollback to.
Returns: Returns:
List of rolled back migration versions. List of rolled back migration versions.
""" """
await self.initialize() await self.initialize()
applied_migrations = await self.get_applied_migrations() applied_migrations = await self.get_applied_migrations()
# Find migrations to rollback (newer than target) # Find migrations to rollback (newer than target)
to_rollback = [] to_rollback = []
for version in reversed(applied_migrations): for version in reversed(applied_migrations):
if version > target_version: if version > target_version:
to_rollback.append(version) to_rollback.append(version)
rolled_back = [] rolled_back = []
for version in to_rollback: for version in to_rollback:
try: try:
@ -215,39 +215,39 @@ class MigrationManager:
if version in migration_file: if version in migration_file:
migration_name = migration_file migration_name = migration_file
break break
if not migration_name: if not migration_name:
logger.warning(f"Migration file not found for version {version}") logger.warning(f"Migration file not found for version {version}")
continue continue
logger.info(f"Rolling back migration: {migration_name}") logger.info(f"Rolling back migration: {migration_name}")
migration = await self.load_migration(migration_name) migration = await self.load_migration(migration_name)
await migration.down() await migration.down()
await self.remove_migration_record(version) await self.remove_migration_record(version)
rolled_back.append(version) rolled_back.append(version)
logger.info(f"Successfully rolled back migration: {version}") logger.info(f"Successfully rolled back migration: {version}")
except Exception as e: except Exception as e:
logger.error(f"Failed to rollback migration {version}: {e}") logger.error(f"Failed to rollback migration {version}: {e}")
raise raise
return rolled_back return rolled_back
async def get_migration_status(self) -> dict: async def get_migration_status(self) -> dict:
"""Get current migration status.""" """Get current migration status."""
await self.initialize() await self.initialize()
all_migrations = self.discover_migrations() all_migrations = self.discover_migrations()
applied_migrations = await self.get_applied_migrations() applied_migrations = await self.get_applied_migrations()
pending_count = len(all_migrations) - len(applied_migrations) pending_count = len(all_migrations) - len(applied_migrations)
return { return {
"total_migrations": len(all_migrations), "total_migrations": len(all_migrations),
"applied_migrations": len(applied_migrations), "applied_migrations": len(applied_migrations),
"pending_migrations": pending_count, "pending_migrations": pending_count,
"latest_applied": applied_migrations[-1] if applied_migrations else None, "latest_applied": applied_migrations[-1] if applied_migrations else None,
"all_applied": applied_migrations "all_applied": applied_migrations
} }
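The pending-migration selection in `migrate_up` relies on the version string embedded in each file name and on plain string comparison, which orders correctly because the `YYYY-MM-DD-HHMMSS` format is fixed-width. A minimal sketch of that selection as a pure function follows. `version_from_name` and `pending_versions` are hypothetical helper names, and the file names in the usage section are assumed for illustration since the diff does not show them.

```python
from typing import Iterable, Optional


def version_from_name(migration_name: str) -> str:
    """Extract the version from 'migration_<version>_<description>'."""
    # Same expression as in migrate_up in the diff above.
    return migration_name.replace("migration_", "").split("_")[0]


def pending_versions(
    migration_names: Iterable[str],
    applied_versions: Iterable[str],
    target_version: Optional[str] = None,
) -> list[str]:
    """Return unapplied versions, oldest first, up to target_version."""
    applied = set(applied_versions)
    pending = []
    for name in migration_names:
        version = version_from_name(name)
        if version in applied:
            continue
        # Lexicographic comparison is chronological for fixed-width versions.
        if target_version is None or version <= target_version:
            pending.append(version)
    return sorted(pending)


if __name__ == "__main__":
    # Hypothetical file stems; only the version prefixes come from the diff.
    names = [
        "migration_2025-08-17-120001_index_optimization",
        "migration_2025-08-17-120000_initial_schema",
        "migration_2025-08-17-120002_audit_log_schema",
    ]
    print(pending_versions(names, ["2025-08-17-120000"]))
    print(pending_versions(names, ["2025-08-17-120000"], "2025-08-17-120001"))
```

One design point this makes visible: `get_migration_status` computes `pending_count` as `len(all_migrations) - len(applied_migrations)`. That equals the number of pending versions only if every applied version still has a file on disk. Counting the result of a set-based selection like the one above would not depend on that assumption.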

View file

@ -1,22 +0,0 @@
"""Entry point for running migrations: python -m app.migrations.run"""
import asyncio
from app.core.database import close_mongo_connection, connect_to_mongo
from app.migrations.migrator import MigrationManager
async def main() -> None:
await connect_to_mongo()
try:
mgr = MigrationManager()
applied = await mgr.migrate_up()
if applied:
print(f"Applied {len(applied)} migration(s): {applied}")
else:
print("Already up to date — no pending migrations.")
finally:
await close_mongo_connection()
if __name__ == "__main__":
asyncio.run(main())

View file

@ -1,38 +1,39 @@
"""Initial database schema setup migration.""" """Initial database schema setup migration."""
from datetime import datetime
from app.migrations.migrator import Migration from app.migrations.migrator import Migration
class Migration(Migration): class Migration(Migration):
"""Initial schema setup with all collections and indexes.""" """Initial schema setup with all collections and indexes."""
def __init__(self): def __init__(self):
super().__init__() super().__init__()
self.version = "2025-08-17-120000" self.version = "2025-08-17-120000"
self.description = "Initial database schema with users, jobs, and audit_logs collections" self.description = "Initial database schema with users, jobs, and audit_logs collections"
async def up(self) -> None: async def up(self) -> None:
"""Create initial collections and indexes.""" """Create initial collections and indexes."""
# Users collection setup # Users collection setup
await self.db.users.create_index([("email", 1)], unique=True) await self.db.users.create_index([("email", 1)], unique=True)
await self.db.users.create_index([("role", 1)]) await self.db.users.create_index([("role", 1)])
await self.db.users.create_index([("is_active", 1)]) await self.db.users.create_index([("is_active", 1)])
await self.db.users.create_index([("created_at", -1)]) await self.db.users.create_index([("created_at", -1)])
# Jobs collection setup # Jobs collection setup
await self.db.jobs.create_index([("status", 1), ("created_at", -1)]) await self.db.jobs.create_index([("status", 1), ("created_at", -1)])
await self.db.jobs.create_index([("client_id", 1)]) await self.db.jobs.create_index([("client_id", 1)])
await self.db.jobs.create_index([("updated_at", -1)]) await self.db.jobs.create_index([("updated_at", -1)])
await self.db.jobs.create_index([("languages", 1)]) await self.db.jobs.create_index([("languages", 1)])
# Create compound index for job queries # Create compound index for job queries
await self.db.jobs.create_index([ await self.db.jobs.create_index([
("status", 1), ("status", 1),
("client_id", 1), ("client_id", 1),
("created_at", -1) ("created_at", -1)
]) ])
# Audit logs collection setup # Audit logs collection setup
await self.db.audit_logs.create_index([("timestamp", -1)]) await self.db.audit_logs.create_index([("timestamp", -1)])
await self.db.audit_logs.create_index([("action", 1), ("timestamp", -1)]) await self.db.audit_logs.create_index([("action", 1), ("timestamp", -1)])
@ -41,23 +42,23 @@ class Migration(Migration):
await self.db.audit_logs.create_index([("resource_type", 1), ("resource_id", 1)]) await self.db.audit_logs.create_index([("resource_type", 1), ("resource_id", 1)])
await self.db.audit_logs.create_index([("ip_address", 1), ("timestamp", -1)]) await self.db.audit_logs.create_index([("ip_address", 1), ("timestamp", -1)])
await self.db.audit_logs.create_index([("success", 1), ("timestamp", -1)]) await self.db.audit_logs.create_index([("success", 1), ("timestamp", -1)])
# Text search index for audit logs # Text search index for audit logs
await self.db.audit_logs.create_index([ await self.db.audit_logs.create_index([
("description", "text"), ("description", "text"),
("details", "text"), ("details", "text"),
("error_message", "text") ("error_message", "text")
]) ])
print(f"✅ Applied migration {self.version}: {self.description}") print(f"✅ Applied migration {self.version}: {self.description}")
async def down(self) -> None: async def down(self) -> None:
"""Drop all collections (destructive - use with caution).""" """Drop all collections (destructive - use with caution)."""
# This is a destructive operation - in production, you might want to backup first # This is a destructive operation - in production, you might want to backup first
await self.db.users.drop() await self.db.users.drop()
await self.db.jobs.drop() await self.db.jobs.drop()
await self.db.audit_logs.drop() await self.db.audit_logs.drop()
print(f"⚠️ Rolled back migration {self.version}: {self.description}") print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: All data has been deleted!") print("⚠️ WARNING: All data has been deleted!")
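The migration scripts set `self.version` and `self.description` in `__init__`, while one side of the `migrator.py` diff declares them as class-level defaults on the base class. The sketch below shows how the two styles interact. It is an illustration only: `BaseMigration`, `InstanceStyle` and `ClassStyle` are hypothetical stand-ins, not classes from the repository.

```python
class BaseMigration:
    """Stand-in for the base class with class-level defaults."""

    version: str = "0000-00-00-000000"
    description: str = ""


class InstanceStyle(BaseMigration):
    """Sets metadata per instance, as the migration scripts in the diff do."""

    def __init__(self) -> None:
        super().__init__()
        self.version = "2025-08-17-120000"
        self.description = "Initial database schema"


class ClassStyle(BaseMigration):
    """Overrides the defaults as class variables instead."""

    version = "2025-08-17-120001"
    description = "Index optimization"


if __name__ == "__main__":
    # Instance attributes are only visible after instantiation.
    print(InstanceStyle.version)    # placeholder inherited from the base class
    print(InstanceStyle().version)  # value assigned in __init__
    # Class variables can be read without creating an instance.
    print(ClassStyle.version)
```

Both styles give the same result once the migration is instantiated, which is how `load_migration` uses it. The class-variable style additionally lets a tool read the version without running `__init__`.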

View file

@ -5,75 +5,75 @@ from app.migrations.migrator import Migration
class Migration(Migration): class Migration(Migration):
"""Optimize indexes for better query performance.""" """Optimize indexes for better query performance."""
def __init__(self): def __init__(self):
super().__init__() super().__init__()
self.version = "2025-08-17-120001" self.version = "2025-08-17-120001"
self.description = "Index optimization for query performance improvements" self.description = "Index optimization for query performance improvements"
async def up(self) -> None: async def up(self) -> None:
"""Add optimized indexes for common query patterns.""" """Add optimized indexes for common query patterns."""
# Jobs collection optimizations # Jobs collection optimizations
# Index for job status transitions and monitoring # Index for job status transitions and monitoring
await self.db.jobs.create_index([ await self.db.jobs.create_index([
("status", 1), ("status", 1),
("updated_at", -1), ("updated_at", -1),
("client_id", 1) ("client_id", 1)
], name="jobs_status_updated_client_idx") ], name="jobs_status_updated_client_idx")
# Index for queue management (pending jobs) # Index for queue management (pending jobs)
await self.db.jobs.create_index([ await self.db.jobs.create_index([
("status", 1), ("status", 1),
("created_at", 1) ("created_at", 1)
], name="jobs_queue_processing_idx") ], name="jobs_queue_processing_idx")
# Index for client job history # Index for client job history
await self.db.jobs.create_index([ await self.db.jobs.create_index([
("client_id", 1), ("client_id", 1),
("created_at", -1), ("created_at", -1),
("status", 1) ("status", 1)
], name="jobs_client_history_idx") ], name="jobs_client_history_idx")
# Sparse index for error tracking # Sparse index for error tracking
await self.db.jobs.create_index([ await self.db.jobs.create_index([
("status", 1), ("status", 1),
("error", 1) ("error", 1)
], sparse=True, name="jobs_error_tracking_idx") ], sparse=True, name="jobs_error_tracking_idx")
# Users collection optimizations # Users collection optimizations
# Index for active user queries # Index for active user queries
await self.db.users.create_index([ await self.db.users.create_index([
("is_active", 1), ("is_active", 1),
("role", 1), ("role", 1),
("last_login_at", -1) ("last_login_at", -1)
], name="users_active_role_login_idx") ], name="users_active_role_login_idx")
# Index for user search by email pattern # Index for user search by email pattern
await self.db.users.create_index([ await self.db.users.create_index([
("email", "text"), ("email", "text"),
("first_name", "text"), ("first_name", "text"),
("last_name", "text") ("last_name", "text")
], name="users_search_idx") ], name="users_search_idx")
# Audit logs collection optimizations # Audit logs collection optimizations
# Compound index for security monitoring # Compound index for security monitoring
await self.db.audit_logs.create_index([ await self.db.audit_logs.create_index([
("severity", 1), ("severity", 1),
("action", 1), ("action", 1),
("timestamp", -1) ("timestamp", -1)
], name="audit_security_monitoring_idx") ], name="audit_security_monitoring_idx")
# Index for user activity analysis # Index for user activity analysis
await self.db.audit_logs.create_index([ await self.db.audit_logs.create_index([
("user_id", 1), ("user_id", 1),
("action", 1), ("action", 1),
("timestamp", -1) ("timestamp", -1)
], name="audit_user_activity_idx") ], name="audit_user_activity_idx")
# Index for resource access tracking # Index for resource access tracking
await self.db.audit_logs.create_index([ await self.db.audit_logs.create_index([
("resource_type", 1), ("resource_type", 1),
@ -81,30 +81,30 @@ class Migration(Migration):
("action", 1), ("action", 1),
("timestamp", -1) ("timestamp", -1)
], name="audit_resource_access_idx") ], name="audit_resource_access_idx")
# Sparse index for failed operations # Sparse index for failed operations
await self.db.audit_logs.create_index([ await self.db.audit_logs.create_index([
("success", 1), ("success", 1),
("timestamp", -1) ("timestamp", -1)
], sparse=True, name="audit_failures_idx") ], sparse=True, name="audit_failures_idx")
# Add TTL index for automatic audit log cleanup (optional) # Add TTL index for automatic audit log cleanup (optional)
# Uncomment if you want automatic cleanup after 2 years # Uncomment if you want automatic cleanup after 2 years
# await self.db.audit_logs.create_index( # await self.db.audit_logs.create_index(
# [("timestamp", 1)], # [("timestamp", 1)],
# expireAfterSeconds=63072000, # 2 years # expireAfterSeconds=63072000, # 2 years
# name="audit_ttl_idx" # name="audit_ttl_idx"
# ) # )
print(f"✅ Applied migration {self.version}: {self.description}") print(f"✅ Applied migration {self.version}: {self.description}")
async def down(self) -> None: async def down(self) -> None:
"""Remove the optimized indexes.""" """Remove the optimized indexes."""
# Drop the indexes we created # Drop the indexes we created
indexes_to_drop = [ indexes_to_drop = [
"jobs_status_updated_client_idx", "jobs_status_updated_client_idx",
"jobs_queue_processing_idx", "jobs_queue_processing_idx",
"jobs_client_history_idx", "jobs_client_history_idx",
"jobs_error_tracking_idx", "jobs_error_tracking_idx",
"users_active_role_login_idx", "users_active_role_login_idx",
@ -114,21 +114,21 @@ class Migration(Migration):
"audit_resource_access_idx", "audit_resource_access_idx",
"audit_failures_idx" "audit_failures_idx"
] ]
for index_name in indexes_to_drop: for index_name in indexes_to_drop:
try: try:
await self.db.jobs.drop_index(index_name) await self.db.jobs.drop_index(index_name)
except Exception: except Exception:
pass # Index might not exist on this collection pass # Index might not exist on this collection
try: try:
await self.db.users.drop_index(index_name) await self.db.users.drop_index(index_name)
except Exception: except Exception:
pass pass
try: try:
await self.db.audit_logs.drop_index(index_name) await self.db.audit_logs.drop_index(index_name)
except Exception: except Exception:
pass pass
print(f"⚠️ Rolled back migration {self.version}: {self.description}") print(f"⚠️ Rolled back migration {self.version}: {self.description}")
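The `down()` method above attempts to drop every index name on all three collections and ignores any failure, which also hides real errors. A possible alternative, sketched here under the assumption that the name prefix (`jobs_`, `users_`, `audit_`) reliably identifies the owning collection, is to group the names first and drop each index only where it was created. `PREFIX_TO_COLLECTION` and `group_indexes_by_collection` are hypothetical names; the diff does not define them.

```python
# Assumed convention: the index name prefix identifies the collection.
PREFIX_TO_COLLECTION = {
    "jobs_": "jobs",
    "users_": "users",
    "audit_": "audit_logs",
}


def group_indexes_by_collection(index_names: list[str]) -> dict[str, list[str]]:
    """Map each collection name to the index names that belong to it."""
    grouped: dict[str, list[str]] = {}
    for name in index_names:
        for prefix, collection in PREFIX_TO_COLLECTION.items():
            if name.startswith(prefix):
                grouped.setdefault(collection, []).append(name)
                break
        else:
            # Fail loudly instead of silently skipping an unknown name.
            raise ValueError(f"No collection known for index {name!r}")
    return grouped


if __name__ == "__main__":
    names = [
        "jobs_status_updated_client_idx",
        "jobs_queue_processing_idx",
        "users_active_role_login_idx",
        "audit_resource_access_idx",
        "audit_failures_idx",
    ]
    for collection, indexes in group_indexes_by_collection(names).items():
        print(collection, indexes)
```

Inside a migration, the grouped result could then drive one `drop_index` call per index on the matching collection, so that an unexpected failure surfaces instead of being swallowed by a bare `except`.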

View file

@ -1,21 +1,20 @@
"""Migrate audit log schema from basic to comprehensive format.""" """Migrate audit log schema from basic to comprehensive format."""
from datetime import datetime from datetime import datetime
from app.migrations.migrator import Migration from app.migrations.migrator import Migration
class Migration(Migration): class Migration(Migration):
"""Update audit log schema to comprehensive format.""" """Update audit log schema to comprehensive format."""
def __init__(self): def __init__(self):
super().__init__() super().__init__()
self.version = "2025-08-17-120002" self.version = "2025-08-17-120002"
self.description = "Update audit log schema from basic to comprehensive format" self.description = "Update audit log schema from basic to comprehensive format"
async def up(self) -> None: async def up(self) -> None:
"""Migrate existing audit logs to new schema format.""" """Migrate existing audit logs to new schema format."""
# Find all existing audit logs with old schema # Find all existing audit logs with old schema
old_logs_cursor = self.db.audit_logs.find({ old_logs_cursor = self.db.audit_logs.find({
# Look for logs that have the old schema structure # Look for logs that have the old schema structure
@ -25,9 +24,9 @@ class Migration(Migration):
{"timestamp": {"$exists": False}} # Missing new timestamp field {"timestamp": {"$exists": False}} # Missing new timestamp field
] ]
}) })
migration_count = 0 migration_count = 0
async for old_log in old_logs_cursor: async for old_log in old_logs_cursor:
try: try:
# Map old fields to new schema # Map old fields to new schema
@ -39,82 +38,82 @@ class Migration(Migration):
"description": old_log.get("action", "Legacy action"), "description": old_log.get("action", "Legacy action"),
"success": True, "success": True,
"environment": "prod", "environment": "prod",
"service_name": "accessible-video-api", "service_name": "accessible-video-api",
"api_version": "v1" "api_version": "v1"
} }
# Map optional fields if they exist # Map optional fields if they exist
if "user_id" in old_log: if "user_id" in old_log:
new_log["user_id"] = old_log["user_id"] new_log["user_id"] = old_log["user_id"]
if "job_id" in old_log: if "job_id" in old_log:
new_log["resource_type"] = "job" new_log["resource_type"] = "job"
new_log["resource_id"] = old_log["job_id"] new_log["resource_id"] = old_log["job_id"]
if "ip_address" in old_log: if "ip_address" in old_log:
new_log["ip_address"] = old_log["ip_address"] new_log["ip_address"] = old_log["ip_address"]
if "user_agent" in old_log: if "user_agent" in old_log:
new_log["user_agent"] = old_log["user_agent"] new_log["user_agent"] = old_log["user_agent"]
if "details" in old_log: if "details" in old_log:
new_log["details"] = old_log["details"] new_log["details"] = old_log["details"]
# Replace the old document with the new schema # Replace the old document with the new schema
await self.db.audit_logs.replace_one( await self.db.audit_logs.replace_one(
{"_id": old_log["_id"]}, {"_id": old_log["_id"]},
new_log new_log
) )
migration_count += 1 migration_count += 1
except Exception as e: except Exception as e:
print(f"Error migrating audit log {old_log.get('_id')}: {e}") print(f"Error migrating audit log {old_log.get('_id')}: {e}")
continue continue
print(f"✅ Applied migration {self.version}: Migrated {migration_count} audit log records") print(f"✅ Applied migration {self.version}: Migrated {migration_count} audit log records")
def _map_old_action(self, old_action: str) -> str: def _map_old_action(self, old_action: str) -> str:
"""Map old action strings to new AuditAction enum values.""" """Map old action strings to new AuditAction enum values."""
action_mapping = { action_mapping = {
# Job actions # Job actions
"job_created": "job.create", "job_created": "job.create",
"job_approved": "job.approve", "job_approved": "job.approve",
"job_rejected": "job.reject", "job_rejected": "job.reject",
"job_updated": "job.update", "job_updated": "job.update",
"job_cancelled": "job.cancel", "job_cancelled": "job.cancel",
# Auth actions # Auth actions
"login": "auth.login.success", "login": "auth.login.success",
"logout": "auth.logout", "logout": "auth.logout",
"login_failed": "auth.login.failure", "login_failed": "auth.login.failure",
# File actions # File actions
"file_uploaded": "file.upload", "file_uploaded": "file.upload",
"file_downloaded": "file.download", "file_downloaded": "file.download",
# VTT actions # VTT actions
"vtt_edited": "vtt.edit", "vtt_edited": "vtt.edit",
# Admin actions # Admin actions
"user_created": "user.create", "user_created": "user.create",
"user_updated": "user.update", "user_updated": "user.update",
"user_deleted": "user.delete", "user_deleted": "user.delete",
} }
return action_mapping.get(old_action, old_action) return action_mapping.get(old_action, old_action)
async def down(self) -> None: async def down(self) -> None:
"""Rollback to old audit log schema format (limited).""" """Rollback to old audit log schema format (limited)."""
# Find all audit logs with new schema # Find all audit logs with new schema
new_logs_cursor = self.db.audit_logs.find({ new_logs_cursor = self.db.audit_logs.find({
"timestamp": {"$exists": True}, "timestamp": {"$exists": True},
"action": {"$exists": True} "action": {"$exists": True}
}) })
rollback_count = 0 rollback_count = 0
async for new_log in new_logs_cursor: async for new_log in new_logs_cursor:
try: try:
# Map new fields back to old schema (lossy conversion) # Map new fields back to old schema (lossy conversion)
@ -123,34 +122,34 @@ class Migration(Migration):
"when": new_log["timestamp"], "when": new_log["timestamp"],
"action": new_log["action"] "action": new_log["action"]
} }
# Map back optional fields # Map back optional fields
if "user_id" in new_log: if "user_id" in new_log:
old_log["user_id"] = new_log["user_id"] old_log["user_id"] = new_log["user_id"]
if "resource_type" in new_log and new_log["resource_type"] == "job": if "resource_type" in new_log and new_log["resource_type"] == "job":
old_log["job_id"] = new_log.get("resource_id") old_log["job_id"] = new_log.get("resource_id")
if "ip_address" in new_log: if "ip_address" in new_log:
old_log["ip_address"] = new_log["ip_address"] old_log["ip_address"] = new_log["ip_address"]
if "user_agent" in new_log: if "user_agent" in new_log:
old_log["user_agent"] = new_log["user_agent"] old_log["user_agent"] = new_log["user_agent"]
if "details" in new_log: if "details" in new_log:
old_log["details"] = new_log["details"] old_log["details"] = new_log["details"]
# Replace with old schema # Replace with old schema
await self.db.audit_logs.replace_one( await self.db.audit_logs.replace_one(
{"_id": new_log["_id"]}, {"_id": new_log["_id"]},
old_log old_log
) )
rollback_count += 1 rollback_count += 1
except Exception as e: except Exception as e:
print(f"Error rolling back audit log {new_log.get('_id')}: {e}") print(f"Error rolling back audit log {new_log.get('_id')}: {e}")
continue continue
print(f"⚠️ Rolled back migration {self.version}: Reverted {rollback_count} audit log records") print(f"⚠️ Rolled back migration {self.version}: Reverted {rollback_count} audit log records")
print("⚠️ WARNING: Some audit log data may have been lost due to schema differences") print("⚠️ WARNING: Some audit log data may have been lost due to schema differences")
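The `_map_old_action` helper in this file returns the input string unchanged when no mapping exists. A minimal standalone sketch of that lookup, using a subset of the mapping shown above (the module-level name `map_old_action` is illustrative and not part of the codebase):

```python
# Subset of the legacy -> dotted action mapping from the migration above.
ACTION_MAPPING = {
    "job_created": "job.create",
    "login": "auth.login.success",
    "login_failed": "auth.login.failure",
    "file_uploaded": "file.upload",
}


def map_old_action(old_action: str) -> str:
    """Return the new-style action name, or the input unchanged if unmapped."""
    return ACTION_MAPPING.get(old_action, old_action)


mapped = map_old_action("login")
passthrough = map_old_action("custom_legacy_action")
```

Because unmapped values pass through as-is, legacy action strings outside the mapping may not match any `AuditAction` member when the document is later loaded into the model.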

View file

@ -24,7 +24,7 @@ class Migration(Migration):
# Create index on auth_provider for faster queries # Create index on auth_provider for faster queries
await self.db.users.create_index([("auth_provider", 1)]) await self.db.users.create_index([("auth_provider", 1)])
print("✅ Created index on auth_provider field") print(f"✅ Created index on auth_provider field")
print(f"✅ Applied migration {self.version}: {self.description}") print(f"✅ Applied migration {self.version}: {self.description}")
@ -34,7 +34,7 @@ class Migration(Migration):
# Drop the index # Drop the index
try: try:
await self.db.users.drop_index("auth_provider_1") await self.db.users.drop_index("auth_provider_1")
print("✅ Dropped index on auth_provider field") print(f"✅ Dropped index on auth_provider field")
except Exception as e: except Exception as e:
print(f"⚠️ Could not drop index: {e}") print(f"⚠️ Could not drop index: {e}")

View file

@ -75,7 +75,7 @@ class Migration(Migration):
"validationLevel": "moderate", # moderate = only validate on insert/update, not existing docs "validationLevel": "moderate", # moderate = only validate on insert/update, not existing docs
"validationAction": "error" # error = reject invalid documents "validationAction": "error" # error = reject invalid documents
}) })
print("✅ Updated users collection validator") print(f"✅ Updated users collection validator")
except Exception as e: except Exception as e:
print(f"⚠️ Could not update validator: {e}") print(f"⚠️ Could not update validator: {e}")
# Try creating the collection if it doesn't exist # Try creating the collection if it doesn't exist
@ -86,7 +86,7 @@ class Migration(Migration):
validationLevel="moderate", validationLevel="moderate",
validationAction="error" validationAction="error"
) )
print("✅ Created users collection with validator") print(f"✅ Created users collection with validator")
except Exception as e2: except Exception as e2:
print(f"⚠️ Could not create collection: {e2}") print(f"⚠️ Could not create collection: {e2}")
@ -136,4 +136,4 @@ class Migration(Migration):
}) })
print(f"⚠️ Rolled back migration {self.version}: {self.description}") print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: Production role users will fail validation!") print(f"⚠️ WARNING: Production role users will fail validation!")

View file

@ -53,7 +53,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print(" Updated jobs collection validator") print(f" Updated jobs collection validator")
except Exception as e: except Exception as e:
print(f" Could not update validator: {e}") print(f" Could not update validator: {e}")
raise raise
@ -101,4 +101,4 @@ class Migration(Migration):
}) })
print(f" Rolled back migration {self.version}: {self.description}") print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with approved_source or qc_feedback status will fail validation!") print(f" WARNING: Jobs with approved_source or qc_feedback status will fail validation!")

View file

@ -54,7 +54,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print(" Updated jobs collection validator") print(f" Updated jobs collection validator")
except Exception as e: except Exception as e:
print(f" Could not update validator: {e}") print(f" Could not update validator: {e}")
raise raise
@ -104,4 +104,4 @@ class Migration(Migration):
}) })
print(f" Rolled back migration {self.version}: {self.description}") print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with rendering_video status will fail validation!") print(f" WARNING: Jobs with rendering_video status will fail validation!")

View file

@ -60,7 +60,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print(" Updated jobs collection validator") print(f" Updated jobs collection validator")
except Exception as e: except Exception as e:
print(f" Could not update validator: {e}") print(f" Could not update validator: {e}")
raise raise
@ -111,4 +111,4 @@ class Migration(Migration):
}) })
print(f" Rolled back migration {self.version}: {self.description}") print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with tts_failed or render_failed status will fail validation!") print(f" WARNING: Jobs with tts_failed or render_failed status will fail validation!")

View file

@ -61,7 +61,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print(" Updated jobs collection validator") print(f" Updated jobs collection validator")
except Exception as e: except Exception as e:
print(f" Could not update validator: {e}") print(f" Could not update validator: {e}")
raise raise
@ -114,4 +114,4 @@ class Migration(Migration):
}) })
print(f" Rolled back migration {self.version}: {self.description}") print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with rendering_qc status will fail validation!") print(f" WARNING: Jobs with rendering_qc status will fail validation!")

View file

@ -64,7 +64,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print("✅ Updated users collection validator") print(f"✅ Updated users collection validator")
except Exception as e: except Exception as e:
print(f"⚠️ Could not update validator: {e}") print(f"⚠️ Could not update validator: {e}")
try: try:
@ -74,7 +74,7 @@ class Migration(Migration):
validationLevel="moderate", validationLevel="moderate",
validationAction="error" validationAction="error"
) )
print("✅ Created users collection with validator") print(f"✅ Created users collection with validator")
except Exception as e2: except Exception as e2:
print(f"⚠️ Could not create collection: {e2}") print(f"⚠️ Could not create collection: {e2}")
@ -134,4 +134,4 @@ class Migration(Migration):
}) })
print(f"⚠️ Rolled back migration {self.version}: {self.description}") print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: Linguist role users will fail validation!") print(f"⚠️ WARNING: Linguist role users will fail validation!")

View file

@ -69,7 +69,7 @@ class Migration(Migration):
"validationLevel": "moderate", "validationLevel": "moderate",
"validationAction": "error" "validationAction": "error"
}) })
print("✅ Updated users collection validator") print(f"✅ Updated users collection validator")
except Exception as e: except Exception as e:
print(f"⚠️ Could not update validator: {e}") print(f"⚠️ Could not update validator: {e}")
try: try:
@ -79,7 +79,7 @@ class Migration(Migration):
validationLevel="moderate", validationLevel="moderate",
validationAction="error" validationAction="error"
) )
print("✅ Created users collection with validator") print(f"✅ Created users collection with validator")
except Exception as e2: except Exception as e2:
print(f"⚠️ Could not create collection: {e2}") print(f"⚠️ Could not create collection: {e2}")
@ -139,4 +139,4 @@ class Migration(Migration):
}) })
print(f"⚠️ Rolled back migration {self.version}: {self.description}") print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: project_manager role users will fail validation!") print(f"⚠️ WARNING: project_manager role users will fail validation!")

View file

@ -1,6 +1,6 @@
"""Backfill memberships collection from existing pm_client_ids and team.member_user_ids.""" """Backfill memberships collection from existing pm_client_ids and team.member_user_ids."""
from datetime import UTC, datetime from datetime import datetime, timezone
from app.migrations.migrator import Migration from app.migrations.migrator import Migration
@ -13,7 +13,7 @@ class Migration(Migration):
self.description = "Backfill memberships from pm_client_ids and team member lists" self.description = "Backfill memberships from pm_client_ids and team member lists"
async def up(self) -> None: async def up(self) -> None:
now = datetime.now(UTC) now = datetime.now(timezone.utc)
upserted = 0 upserted = 0
# 1. PROJECT_MANAGER users → MANAGER membership for each pm_client_id # 1. PROJECT_MANAGER users → MANAGER membership for each pm_client_id
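The two sides of this hunk differ only in how UTC is spelled. `datetime.UTC` was added in Python 3.11 as an alias for `timezone.utc`, so both produce the same timezone-aware timestamp; the `timezone.utc` form also runs on older interpreters. A quick check, as a sketch:

```python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)

# The result is timezone-aware and carries a zero UTC offset.
is_utc = now.tzinfo is timezone.utc
offset_seconds = now.utcoffset().total_seconds()
```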

View file

@ -1,53 +0,0 @@
"""Add PROCESSING_FAILED status to job schema validator and create failure indexes."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-04-29-000000"
description = "Add processing_failed status and failure/status compound indexes on jobs"
async def up(self) -> None:
db = self.db
# Add processing_failed to the schema validator enum (if validator exists)
try:
validator_info = await db.command(
"listCollections", filter={"name": "jobs"}
)
collections = validator_info.get("cursor", {}).get("firstBatch", [])
if collections and collections[0].get("options", {}).get("validator"):
existing_validator = collections[0]["options"]["validator"]
status_path = (
existing_validator.get("$jsonSchema", {})
.get("properties", {})
.get("status", {})
.get("enum", [])
)
if status_path and "processing_failed" not in status_path:
status_path.append("processing_failed")
await db.command(
"collMod",
"jobs",
validator=existing_validator,
validationAction="warn",
)
except Exception:
# No validator or unsupported — skip gracefully
pass
# Indexes for failure dashboard queries
await db.jobs.create_index(
[("failure.step", 1), ("status", 1)],
name="idx_jobs_failure_step_status",
background=True,
)
await db.jobs.create_index(
[("status", 1), ("organization_id", 1), ("created_at", -1)],
name="idx_jobs_status_org_created",
background=True,
)
async def down(self) -> None:
db = self.db
await db.jobs.drop_index("idx_jobs_failure_step_status")
await db.jobs.drop_index("idx_jobs_status_org_created")
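The validator lookup in this migration depends on the shape of the `listCollections` reply. A minimal sketch, assuming the reply is a plain dict with the collection entries under `cursor.firstBatch` (the shape a later migration in this diff reads). `extract_status_enum` and `sample_reply` are hypothetical names, and the sample reply is hand-built rather than captured from a server:

```python
from typing import Optional


def extract_status_enum(reply: dict) -> Optional[list]:
    """Return the status enum list from a listCollections reply, or None."""
    batch = reply.get("cursor", {}).get("firstBatch", [])
    if not batch:
        return None
    validator = batch[0].get("options", {}).get("validator") or {}
    status = validator.get("$jsonSchema", {}).get("properties", {}).get("status", {})
    return status.get("enum")


# Hand-built reply for illustration only.
sample_reply = {
    "cursor": {
        "firstBatch": [
            {
                "name": "jobs",
                "options": {
                    "validator": {
                        "$jsonSchema": {
                            "properties": {"status": {"enum": ["created", "completed"]}}
                        }
                    }
                },
            }
        ]
    }
}

statuses = extract_status_enum(sample_reply)
if statuses is not None and "processing_failed" not in statuses:
    # The list is the one inside the validator, so this edits it in place.
    statuses.append("processing_failed")
```

Since the returned list is the same object that lives inside the validator dict, appending to it is enough before passing the validator back to `collMod`.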

View file

@ -1,46 +0,0 @@
"""Create job_briefs collection with indexes."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-04-29-000001"
description = "Create job_briefs collection and indexes"
async def up(self) -> None:
db = self.db
# Ensure collection exists (create_collection raises if it already does)
try:
await db.create_collection("job_briefs")
except Exception:
pass # already exists
await db.job_briefs.create_index(
[("organization_id", 1), ("status", 1), ("created_at", -1)],
name="idx_briefs_org_status_created",
background=True,
)
await db.job_briefs.create_index(
[("created_by", 1)],
name="idx_briefs_created_by",
background=True,
)
await db.job_briefs.create_index(
[("project_id", 1)],
name="idx_briefs_project_id",
background=True,
sparse=True,
)
await db.job_briefs.create_index(
[("job_id", 1)],
name="idx_briefs_job_id",
background=True,
sparse=True,
)
async def down(self) -> None:
db = self.db
await db.job_briefs.drop_index("idx_briefs_org_status_created")
await db.job_briefs.drop_index("idx_briefs_created_by")
await db.job_briefs.drop_index("idx_briefs_project_id")
await db.job_briefs.drop_index("idx_briefs_job_id")

View file

@ -1,44 +0,0 @@
"""Backfill Membership.team_ids from Team.member_user_ids (MT-17)."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-04-30-000000"
description = "Backfill team_ids on Membership records from Team.member_user_ids"
async def up(self) -> None:
db = self.db
upserted = 0
# For each team that has member_user_ids, push team_id into the matching Membership
async for team in db.teams.find(
{"member_user_ids": {"$exists": True, "$ne": []}},
{"_id": 1, "client_id": 1, "member_user_ids": 1},
):
team_id = str(team["_id"])
org_id = str(team.get("client_id", ""))
for user_id in team.get("member_user_ids", []):
result = await db.memberships.update_one(
{"user_id": str(user_id), "organization_id": org_id},
{"$addToSet": {"team_ids": team_id}},
)
if result.modified_count:
upserted += 1
# Ensure index for efficient team-based lookups
await db.memberships.create_index(
[("team_ids", 1)],
name="idx_memberships_team_ids",
background=True,
sparse=True,
)
print(f"✅ Backfilled team_ids on {upserted} Membership records")
async def down(self) -> None:
db = self.db
await db.memberships.update_many({}, {"$unset": {"team_ids": ""}})
try:
await db.memberships.drop_index("idx_memberships_team_ids")
except Exception:
pass
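The backfill relies on `$addToSet` being idempotent: running the migration twice does not duplicate a team id. A pure-Python sketch of the same bookkeeping, with in-memory structures standing in for the `teams` and `memberships` collections (all sample data is made up, and `backfill` is an illustrative name):

```python
teams = [
    {"_id": "t1", "client_id": "org1", "member_user_ids": ["u1", "u2"]},
    {"_id": "t2", "client_id": "org1", "member_user_ids": ["u1", "u3"]},
]
# Keyed by (user_id, organization_id), mirroring the update_one filter above.
memberships = {("u1", "org1"): {}, ("u2", "org1"): {}}


def backfill(teams, memberships):
    """Add each team id to matching memberships; return how many changed."""
    modified = 0
    for team in teams:
        team_id = str(team["_id"])
        org_id = str(team.get("client_id", ""))
        for user_id in team.get("member_user_ids", []):
            membership = memberships.get((str(user_id), org_id))
            if membership is None:
                continue  # no matching membership, so nothing is updated
            team_ids = membership.setdefault("team_ids", [])
            if team_id not in team_ids:  # set semantics, like $addToSet
                team_ids.append(team_id)
                modified += 1
    return modified


first_run = backfill(teams, memberships)
second_run = backfill(teams, memberships)
```

In this sketch, user `u3` has no membership and is skipped, which matches an `update_one` call without `upsert`.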

View file

@ -1,38 +0,0 @@
"""Add cancelled status to job schema validator."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-04-30-000001"
description = "Add cancelled status to jobs collection schema validator"
async def up(self) -> None:
db = self.db
try:
validator_info = await db.command(
"listCollections", filter={"name": "jobs"}
)
collections = validator_info.get("cursor", {}).get("firstBatch", [])
if collections and collections[0].get("options", {}).get("validator"):
existing_validator = collections[0]["options"]["validator"]
status_path = (
existing_validator.get("$jsonSchema", {})
.get("properties", {})
.get("status", {})
.get("enum", [])
)
if status_path and "cancelled" not in status_path:
status_path.append("cancelled")
await db.command(
"collMod",
"jobs",
validator=existing_validator,
validationAction="warn",
)
except Exception:
# No validator or unsupported — skip gracefully
pass
async def down(self) -> None:
pass

View file

@ -1,47 +0,0 @@
"""Replace status enum in $jsonSchema validator with the full current list."""
from app.migrations.migrator import Migration
ALL_STATUSES = [
"created", "ingesting", "ai_processing",
"pending_qc", "approved_english", "approved_source",
"rejected", "qc_feedback",
"translating", "tts_generating", "tts_failed",
"rendering_video", "render_failed", "rendering_qc",
"pending_final_review", "completed",
"processing_failed", "cancelled",
]
class Migration(Migration):
version = "2026-04-30-000002"
description = "Fix status enum in jobs $jsonSchema validator (add processing_failed + cancelled)"
async def up(self) -> None:
db = self.db
result = await db.command("listCollections", filter={"name": "jobs"})
batch = result.get("cursor", {}).get("firstBatch", [])
if not batch:
return
existing_validator = batch[0].get("options", {}).get("validator")
if not existing_validator:
return
schema = existing_validator.get("$jsonSchema", {})
status_prop = schema.get("properties", {}).get("status")
if not status_prop:
return
status_prop["enum"] = ALL_STATUSES
await db.command(
"collMod",
"jobs",
validator=existing_validator,
validationLevel="moderate",
validationAction="error",
)
async def down(self) -> None:
pass

View file

@ -1,26 +0,0 @@
"""Backfill source_has_ad=False on existing jobs and job_briefs."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-05-08-000000"
description = "Add source_has_ad field to jobs.source and job_briefs"
async def up(self) -> None:
db = self.db
jobs_result = await db.jobs.update_many(
{"source.source_has_ad": {"$exists": False}},
{"$set": {"source.source_has_ad": False}},
)
briefs_result = await db.job_briefs.update_many(
{"source_has_ad": {"$exists": False}},
{"$set": {"source_has_ad": False}},
)
print(f"✅ Backfilled source_has_ad on {jobs_result.modified_count} jobs, {briefs_result.modified_count} job_briefs")
async def down(self) -> None:
db = self.db
await db.jobs.update_many({}, {"$unset": {"source.source_has_ad": ""}})
await db.job_briefs.update_many({}, {"$unset": {"source_has_ad": ""}})
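The `up` step only touches documents where the field is missing, while `down` removes it from every document. A pure-Python sketch of that asymmetry, using plain dicts in place of the `jobs` collection (sample documents and function names are illustrative):

```python
jobs = [
    {"_id": 1, "source": {"url": "a"}},
    {"_id": 2, "source": {"url": "b", "source_has_ad": True}},
    {"_id": 3},  # no source sub-document at all
]


def backfill_source_has_ad(docs):
    """Set source.source_has_ad=False where absent; return modified count."""
    modified = 0
    for doc in docs:
        # A dotted $set path creates the embedded document when needed.
        source = doc.setdefault("source", {})
        if "source_has_ad" not in source:
            source["source_has_ad"] = False
            modified += 1
    return modified


def rollback_source_has_ad(docs):
    """Remove the field everywhere, including explicitly set values."""
    for doc in docs:
        doc.get("source", {}).pop("source_has_ad", None)


modified_count = backfill_source_has_ad(jobs)
explicit_value_kept = jobs[1]["source"]["source_has_ad"]
rollback_source_has_ad(jobs)
```

After the rollback, job 2 has lost its explicit `True`, so a later re-run of `up` would set it to `False`.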

Binary file not shown.

Binary file not shown.

View file

@ -1,18 +1,17 @@
"""Audit log model for tracking sensitive operations.""" """Audit log model for tracking sensitive operations."""
from datetime import datetime from datetime import datetime
from enum import StrEnum from enum import Enum
from typing import Any from typing import Any, Dict, Optional
from bson import ObjectId from bson import ObjectId
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from .user import PyObjectId from .user import PyObjectId
class AuditAction(StrEnum): class AuditAction(str, Enum):
"""Enumeration of auditable actions.""" """Enumeration of auditable actions."""
# Authentication actions # Authentication actions
LOGIN_SUCCESS = "auth.login.success" LOGIN_SUCCESS = "auth.login.success"
LOGIN_FAILURE = "auth.login.failure" LOGIN_FAILURE = "auth.login.failure"
@ -20,7 +19,7 @@ class AuditAction(StrEnum):
TOKEN_REFRESH = "auth.token.refresh" TOKEN_REFRESH = "auth.token.refresh"
PASSWORD_CHANGE = "auth.password.change" PASSWORD_CHANGE = "auth.password.change"
PASSWORD_RESET = "auth.password.reset" PASSWORD_RESET = "auth.password.reset"
# User management actions # User management actions
USER_CREATE = "user.create" USER_CREATE = "user.create"
USER_UPDATE = "user.update" USER_UPDATE = "user.update"
@ -28,7 +27,7 @@ class AuditAction(StrEnum):
USER_ROLE_CHANGE = "user.role.change" USER_ROLE_CHANGE = "user.role.change"
USER_ACTIVATE = "user.activate" USER_ACTIVATE = "user.activate"
USER_DEACTIVATE = "user.deactivate" USER_DEACTIVATE = "user.deactivate"
# Job management actions # Job management actions
JOB_CREATE = "job.create" JOB_CREATE = "job.create"
JOB_UPDATE = "job.update" JOB_UPDATE = "job.update"
@ -37,89 +36,24 @@ class AuditAction(StrEnum):
JOB_REJECT = "job.reject" JOB_REJECT = "job.reject"
JOB_CANCEL = "job.cancel" JOB_CANCEL = "job.cancel"
JOB_STATUS_CHANGE = "job.status.change" JOB_STATUS_CHANGE = "job.status.change"
JOB_TASK_FAILED = "job.task.failed"
JOB_RETRY = "job.retry"
JOB_BULK_RETRY = "job.bulk_retry"
# File operations # File operations
FILE_UPLOAD = "file.upload" FILE_UPLOAD = "file.upload"
FILE_DOWNLOAD = "file.download" FILE_DOWNLOAD = "file.download"
FILE_DELETE = "file.delete" FILE_DELETE = "file.delete"
FILE_ACCESS = "file.access" FILE_ACCESS = "file.access"
# VTT editing actions # VTT editing actions
VTT_EDIT = "vtt.edit" VTT_EDIT = "vtt.edit"
VTT_APPROVE = "vtt.approve" VTT_APPROVE = "vtt.approve"
VTT_REJECT = "vtt.reject" VTT_REJECT = "vtt.reject"
VTT_RETRANSLATE = "vtt.retranslate"
# Per-language QC actions
LANGUAGE_QC_ASSIGN = "language_qc.assign"
LANGUAGE_QC_REASSIGN = "language_qc.reassign"
LANGUAGE_QC_REVIEWER_ASSIGN = "language_qc.reviewer_assign"
LANGUAGE_QC_REVIEWER_REASSIGN = "language_qc.reviewer_reassign"
LANGUAGE_QC_SUBMIT = "language_qc.submit"
LANGUAGE_QC_OPEN_REVIEW = "language_qc.open_review"
LANGUAGE_QC_APPROVE = "language_qc.approve"
LANGUAGE_QC_REJECT = "language_qc.reject"
LANGUAGE_QC_REOPEN = "language_qc.reopen"
LANGUAGE_QC_COMMENT = "language_qc.comment"
# Admin actions # Admin actions
ADMIN_CONFIG_CHANGE = "admin.config.change" ADMIN_CONFIG_CHANGE = "admin.config.change"
ADMIN_SYSTEM_ACTION = "admin.system.action" ADMIN_SYSTEM_ACTION = "admin.system.action"
ADMIN_DATA_EXPORT = "admin.data.export" ADMIN_DATA_EXPORT = "admin.data.export"
ADMIN_AUDIT_ACCESS = "admin.audit.access" ADMIN_AUDIT_ACCESS = "admin.audit.access"
# Glossary management
GLOSSARY_UPLOAD = "glossary.upload"
GLOSSARY_VERSION_UPLOAD = "glossary.version.upload"
GLOSSARY_ACTIVATE = "glossary.activate"
GLOSSARY_ARCHIVE = "glossary.archive"
# Client management
CLIENT_CREATE = "client.create"
CLIENT_UPDATE = "client.update"
CLIENT_DEACTIVATE = "client.deactivate"
CLIENT_PM_ASSIGN = "client.pm_assign"
CLIENT_PM_REMOVE = "client.pm_remove"
CLIENT_TEAM_CREATE = "client.team_create"
CLIENT_TEAM_UPDATE = "client.team_update"
CLIENT_TEAM_DELETE = "client.team_delete"
CLIENT_TEAM_MEMBER_ADD = "client.team_member_add"
CLIENT_TEAM_MEMBER_REMOVE = "client.team_member_remove"
CLIENT_PROJECT_CREATE = "client.project_create"
CLIENT_PROJECT_UPDATE = "client.project_update"
CLIENT_PROJECT_ARCHIVE = "client.project_archive"
# Organization management
ORG_CREATE = "org.create"
ORG_UPDATE = "org.update"
ORG_MEMBER_ADD = "org.member_add"
ORG_MEMBER_UPDATE = "org.member_update"
ORG_MEMBER_REMOVE = "org.member_remove"
# Invitations
INVITATION_CREATE = "invitation.create"
INVITATION_REVOKE = "invitation.revoke"
INVITATION_ACCEPT = "invitation.accept"
# Language QC (additional)
LANGUAGE_QC_BULK_ASSIGN = "language_qc.bulk_assign"
LANGUAGE_QC_START_WORK = "language_qc.start_work"
LANGUAGE_QC_MARK_CUE_REVIEWED = "language_qc.mark_cue_reviewed"
# Brief management
BRIEF_CREATE = "brief.create"
BRIEF_UPDATE = "brief.update"
BRIEF_SUBMIT = "brief.submit"
BRIEF_APPROVE = "brief.approve"
# Share tokens
SHARE_TOKEN_CREATE = "share.token_create"
SHARE_TOKEN_REVOKE = "share.token_revoke"
SHARE_CLIENT_DECISION = "share.client_decision"
# Security events # Security events
RATE_LIMIT_EXCEEDED = "security.rate_limit.exceeded" RATE_LIMIT_EXCEEDED = "security.rate_limit.exceeded"
VALIDATION_FAILURE = "security.validation.failure" VALIDATION_FAILURE = "security.validation.failure"
@ -127,9 +61,9 @@ class AuditAction(StrEnum):
SUSPICIOUS_ACTIVITY = "security.suspicious.activity" SUSPICIOUS_ACTIVITY = "security.suspicious.activity"
class AuditLogSeverity(StrEnum): class AuditLogSeverity(str, Enum):
"""Severity levels for audit events.""" """Severity levels for audit events."""
INFO = "info" # Normal operations INFO = "info" # Normal operations
WARNING = "warning" # Suspicious but not critical WARNING = "warning" # Suspicious but not critical
ERROR = "error" # Failed operations ERROR = "error" # Failed operations
@ -138,43 +72,43 @@ class AuditLogSeverity(StrEnum):
class AuditLog(BaseModel): class AuditLog(BaseModel):
"""Audit log entry model.""" """Audit log entry model."""
id: PyObjectId | None = Field(default_factory=lambda: str(ObjectId()), alias="_id") id: Optional[PyObjectId] = Field(default_factory=PyObjectId, alias="_id")
# Core audit fields # Core audit fields
timestamp: datetime = Field(default_factory=datetime.utcnow) timestamp: datetime = Field(default_factory=datetime.utcnow)
action: AuditAction action: AuditAction
severity: AuditLogSeverity = AuditLogSeverity.INFO severity: AuditLogSeverity = AuditLogSeverity.INFO
# Actor information # Actor information
user_id: PyObjectId | None = None user_id: Optional[PyObjectId] = None
user_email: str | None = None user_email: Optional[str] = None
user_role: str | None = None user_role: Optional[str] = None
# Request context # Request context
ip_address: str | None = None ip_address: Optional[str] = None
user_agent: str | None = None user_agent: Optional[str] = None
request_id: str | None = None request_id: Optional[str] = None
session_id: str | None = None session_id: Optional[str] = None
# Resource information # Resource information
resource_type: str | None = None # e.g., "job", "user", "file" resource_type: Optional[str] = None # e.g., "job", "user", "file"
resource_id: str | None = None resource_id: Optional[str] = None
resource_name: str | None = None resource_name: Optional[str] = None
# Action details # Action details
description: str description: str
details: dict[str, Any] = Field(default_factory=dict) details: Dict[str, Any] = Field(default_factory=dict)
# Outcome # Outcome
success: bool = True success: bool = True
error_message: str | None = None error_message: Optional[str] = None
# Additional metadata # Additional metadata
environment: str = "prod" environment: str = "prod"
service_name: str = "accessible-video-api" service_name: str = "accessible-video-api"
api_version: str = "v1" api_version: str = "v1"
class Config: class Config:
populate_by_name = True populate_by_name = True
arbitrary_types_allowed = True arbitrary_types_allowed = True
@ -183,49 +117,49 @@ class AuditLog(BaseModel):
class AuditLogCreate(BaseModel): class AuditLogCreate(BaseModel):
"""Schema for creating audit log entries.""" """Schema for creating audit log entries."""
action: AuditAction action: AuditAction
     severity: AuditLogSeverity = AuditLogSeverity.INFO
     description: str
     # Optional fields that can be provided
-    user_id: PyObjectId | None = None
-    user_email: str | None = None
-    user_role: str | None = None
-    ip_address: str | None = None
-    user_agent: str | None = None
-    request_id: str | None = None
-    resource_type: str | None = None
-    resource_id: str | None = None
-    resource_name: str | None = None
-    details: dict[str, Any] = Field(default_factory=dict)
+    user_id: Optional[PyObjectId] = None
+    user_email: Optional[str] = None
+    user_role: Optional[str] = None
+    ip_address: Optional[str] = None
+    user_agent: Optional[str] = None
+    request_id: Optional[str] = None
+    resource_type: Optional[str] = None
+    resource_id: Optional[str] = None
+    resource_name: Optional[str] = None
+    details: Dict[str, Any] = Field(default_factory=dict)
     success: bool = True
-    error_message: str | None = None
+    error_message: Optional[str] = None
 class AuditLogQuery(BaseModel):
     """Schema for querying audit logs."""
     # Time range
-    start_date: datetime | None = None
-    end_date: datetime | None = None
+    start_date: Optional[datetime] = None
+    end_date: Optional[datetime] = None
     # Filters
-    action: AuditAction | None = None
-    severity: AuditLogSeverity | None = None
-    user_id: PyObjectId | None = None
-    user_email: str | None = None
-    resource_type: str | None = None
-    resource_id: str | None = None
-    success: bool | None = None
+    action: Optional[AuditAction] = None
+    severity: Optional[AuditLogSeverity] = None
+    user_id: Optional[PyObjectId] = None
+    user_email: Optional[str] = None
+    resource_type: Optional[str] = None
+    resource_id: Optional[str] = None
+    success: Optional[bool] = None
     # Search
-    search: str | None = None  # Full-text search in description and details
+    search: Optional[str] = None  # Full-text search in description and details
     # Pagination
     skip: int = 0
     limit: int = 100
     # Sorting
     sort_by: str = "timestamp"
     sort_order: int = -1  # -1 for descending, 1 for ascending
@ -233,7 +167,7 @@ class AuditLogQuery(BaseModel):
 class AuditLogResponse(BaseModel):
     """Response schema for audit log queries."""
     logs: list[AuditLog]
     total_count: int
     page: int

View file

@ -1,5 +1,5 @@
 from datetime import datetime
-from typing import Annotated
+from typing import Optional, Annotated
 from bson import ObjectId
 from pydantic import BaseModel, BeforeValidator
@ -17,12 +17,12 @@ PyObjectId = Annotated[str, BeforeValidator(validate_object_id)]
 class Client(BaseModel):
-    id: str | None = None
+    id: Optional[str] = None
     name: str
     slug: str
     is_active: bool = True
-    created_at: datetime | None = None
-    updated_at: datetime | None = None
+    created_at: Optional[datetime] = None
+    updated_at: Optional[datetime] = None
 class ClientCreate(BaseModel):
@ -31,18 +31,18 @@ class ClientCreate(BaseModel):
 class ClientUpdate(BaseModel):
-    name: str | None = None
-    slug: str | None = None
-    is_active: bool | None = None
+    name: Optional[str] = None
+    slug: Optional[str] = None
+    is_active: Optional[bool] = None
 class Team(BaseModel):
-    id: str | None = None
+    id: Optional[str] = None
     name: str
     client_id: str
     member_user_ids: list[str] = []
-    created_at: datetime | None = None
-    updated_at: datetime | None = None
+    created_at: Optional[datetime] = None
+    updated_at: Optional[datetime] = None
 class TeamCreate(BaseModel):
@ -50,31 +50,22 @@ class TeamCreate(BaseModel):
 class TeamUpdate(BaseModel):
-    name: str | None = None
+    name: Optional[str] = None
 class Project(BaseModel):
-    id: str | None = None
+    id: Optional[str] = None
     name: str
     client_id: str
     is_active: bool = True
-    default_languages: list[str] = []
-    default_linguist_id: str | None = None
-    default_reviewer_id: str | None = None
-    created_at: datetime | None = None
-    updated_at: datetime | None = None
+    created_at: Optional[datetime] = None
+    updated_at: Optional[datetime] = None
 class ProjectCreate(BaseModel):
     name: str
-    default_languages: list[str] = []
-    default_linguist_id: str | None = None
-    default_reviewer_id: str | None = None
 class ProjectUpdate(BaseModel):
-    name: str | None = None
-    is_active: bool | None = None
-    default_languages: list[str] | None = None
-    default_linguist_id: str | None = None
-    default_reviewer_id: str | None = None
+    name: Optional[str] = None
+    is_active: Optional[bool] = None

View file

@ -1,142 +0,0 @@
from __future__ import annotations
from datetime import datetime
from enum import StrEnum
from pydantic import BaseModel, Field
class GlossarySource(StrEnum):
    XLSX_UPLOAD = "xlsx_upload"
    FRAZE_API = "fraze_api"  # reserved for future FRAZE integration
class GlossaryStatus(StrEnum):
    ACTIVE = "active"
    ARCHIVED = "archived"
class EmbeddingStatus(StrEnum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"
class Glossary(BaseModel):
    id: str | None = Field(None, alias="_id")
    client_id: str
    name: str
    description: str | None = None
    source_locale: str  # BCP-47 source column, e.g. "en-GB"
    source: GlossarySource = GlossarySource.XLSX_UPLOAD
    status: GlossaryStatus = GlossaryStatus.ACTIVE
    current_version_id: str | None = None
    created_at: datetime = Field(default_factory=datetime.utcnow)
    created_by: str  # user_id
    model_config = {"populate_by_name": True, "arbitrary_types_allowed": True}
class GlossaryVersion(BaseModel):
    id: str | None = Field(None, alias="_id")
    glossary_id: str
    version_number: int
    source_xlsx_gcs_path: str | None = None  # GCS path to original file
    term_count: int = 0
    embedded_count: int = 0
    embedding_status: EmbeddingStatus = EmbeddingStatus.PENDING
    created_at: datetime = Field(default_factory=datetime.utcnow)
    created_by: str
    change_note: str | None = None
    model_config = {"populate_by_name": True}
class GlossaryTerm(BaseModel):
    """One source term with its per-locale translations."""
    id: str | None = Field(None, alias="_id")
    glossary_id: str
    version_id: str
    cid: str | None = None  # 3M Content ID from xlsx
    tid: str | None = None  # 3M Term ID from xlsx
    source_term: str  # canonical source text (whitespace-normalised)
    source_term_lower: str  # lowercase for case-insensitive index
    translations: dict[str, str] = {}  # {locale_code: translated_text}
    embedding: list[float] | None = None  # 768-dim Gemini embedding
    model_config = {"populate_by_name": True}
# ── Schema models (API request/response) ──────────────────────────────────────
class GlossaryCreate(BaseModel):
    name: str
    description: str | None = None
    source_locale: str
    change_note: str | None = None
class GlossaryVersionCreate(BaseModel):
    source_locale: str
    change_note: str | None = None
class GlossaryResponse(BaseModel):
    id: str
    client_id: str
    name: str
    description: str | None = None
    source_locale: str
    source: GlossarySource
    status: GlossaryStatus
    current_version_id: str | None = None
    current_version_embedding_status: EmbeddingStatus | None = None
    current_version_embedded_count: int | None = None
    current_version_term_count: int | None = None
    created_at: datetime
    created_by: str
class GlossaryVersionResponse(BaseModel):
    id: str
    glossary_id: str
    version_number: int
    term_count: int
    embedded_count: int
    embedding_status: EmbeddingStatus
    created_at: datetime
    created_by: str
    change_note: str | None = None
class GlossaryDetailResponse(GlossaryResponse):
    versions: list[GlossaryVersionResponse] = []
class GlossaryTermPreview(BaseModel):
    """Subset of GlossaryTerm for UI previews."""
    source_term: str
    translations: dict[str, str]
class MatchedTerm(BaseModel):
    """A term matched against VTT source text, with the target-locale translation."""
    source_term: str
    target_translation: str
    match_kind: str  # "exact" | "vector"
    score: float  # 1.0 for exact, cosine similarity for vector
def glossary_from_doc(doc: dict) -> Glossary:
    doc = dict(doc)
    if "_id" in doc:
        doc["_id"] = str(doc["_id"])
    return Glossary.model_validate(doc)
def glossary_version_from_doc(doc: dict) -> GlossaryVersion:
    doc = dict(doc)
    if "_id" in doc:
        doc["_id"] = str(doc["_id"])
    return GlossaryVersion.model_validate(doc)
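
Reviewer note: the two `*_from_doc` helpers in the removed file share one normalisation step: copy the document, then turn `_id` into a string before validation. The sketch below isolates that step using only the standard library. `FakeObjectId` and `stringify_id` are hypothetical names introduced for illustration; the real code presumably receives a `bson.ObjectId` and passes the result to `model_validate`.

```python
from typing import Any


class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId so the sketch runs without pymongo."""

    def __init__(self, hex_value: str) -> None:
        self._hex = hex_value

    def __str__(self) -> str:
        return self._hex


def stringify_id(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of doc with `_id` converted to str, if present."""
    doc = dict(doc)  # copy first so the caller's document is not mutated
    if "_id" in doc:
        doc["_id"] = str(doc["_id"])
    return doc


raw = {"_id": FakeObjectId("65f0c0ffee0000000000abcd"), "name": "Product terms"}
normalised = stringify_id(raw)
```

Copying first means the document returned by the database driver keeps its original `_id` object, while the model only ever sees a plain string.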

View file

@ -1,4 +1,5 @@
 from datetime import datetime
+from typing import Optional
 from pydantic import BaseModel, EmailStr
@ -6,7 +7,7 @@ from .organization import OrgRole
 class Invitation(BaseModel):
-    id: str | None = None
+    id: Optional[str] = None
     email: str
     organization_id: str
     role_in_org: OrgRole
@ -14,9 +15,9 @@ class Invitation(BaseModel):
     token_hash: str
     invited_by_user_id: str
     expires_at: datetime
-    accepted_at: datetime | None = None
-    revoked_at: datetime | None = None
-    created_at: datetime | None = None
+    accepted_at: Optional[datetime] = None
+    revoked_at: Optional[datetime] = None
+    created_at: Optional[datetime] = None
 class InvitationCreate(BaseModel):
@ -39,9 +40,9 @@ class InvitationPreviewResponse(BaseModel):
 class InvitationAcceptRequest(BaseModel):
     token: str
-    full_name: str | None = None
-    password: str | None = None
-    ms_id_token: str | None = None
+    full_name: Optional[str] = None
+    password: Optional[str] = None
+    ms_id_token: Optional[str] = None
 class InvitationResponse(BaseModel):
@ -51,9 +52,9 @@ class InvitationResponse(BaseModel):
     role_in_org: OrgRole
     invited_by_user_id: str
     expires_at: datetime
-    accepted_at: datetime | None = None
-    revoked_at: datetime | None = None
-    created_at: datetime | None = None
+    accepted_at: Optional[datetime] = None
+    revoked_at: Optional[datetime] = None
+    created_at: Optional[datetime] = None
     is_expired: bool = False
     is_accepted: bool = False
     is_revoked: bool = False

Some files were not shown because too many files have changed in this diff.