Compare commits

No commits in common. "main" and "dev" have entirely different histories.
main ... dev

384 changed files with 5716 additions and 31139 deletions

@@ -1,25 +0,0 @@
# Source Documentation Archive — 2026-04-29
## What was archived
The original, non-canonical documentation files were backed up here before the canonical structure was created.
## Files archived
| File | Migrated to |
|------|------------|
| `README.md` | Updated in place; canonical docs in `docs/` |
| `DEPLOYMENT.md` | `docs/project/runbook.md` + `docs/project/infrastructure.md` |
| `DEPLOYMENT_OPTIONS.md` | `docs/project/infrastructure.md` |
| `APACHE_DEPLOYMENT.md` | `docs/project/runbook.md` (Apache config section) |
## Rollback
To restore original files: copy from `original/` back to project root.
```
cp original/README.md ../../README.md
cp original/DEPLOYMENT.md ../../DEPLOYMENT.md
cp original/DEPLOYMENT_OPTIONS.md ../../DEPLOYMENT_OPTIONS.md
cp original/APACHE_DEPLOYMENT.md ../../APACHE_DEPLOYMENT.md
```

@@ -1,236 +0,0 @@
# Apache Frontend + Docker Backend Deployment Guide
## 🏗 Architecture Overview
**Frontend**: Built React app served by your existing Apache webserver
**Backend**: Docker containers running FastAPI + workers + database
```
Apache Webserver (Frontend)  →  Docker Backend Services
└── Built React App             ├── FastAPI API (:8000)
                                ├── Celery Workers
                                ├── Change Stream Service
                                ├── MongoDB
                                └── Redis
```
## 🚀 Deployment Steps
### 1. **Deploy Backend Services**
```bash
# 1. Create production environment file
cp .env.prod.example .env.prod
# Edit .env.prod with your production values
# 2. Start backend services only
docker-compose -f docker-compose.prod.yml up -d
# 3. Verify services are running
docker-compose -f docker-compose.prod.yml ps
```
**Running Services:**
- `accessible-video-api-prod` - FastAPI API (port 8000)
- `accessible-video-worker-prod` - Celery workers
- `accessible-video-mongo-prod` - MongoDB database
- `accessible-video-redis-prod` - Redis cache/queue
### 2. **Build and Deploy Frontend to Apache**
```bash
# 1. Configure frontend environment
cd frontend
cp .env.example .env.production.local
# Edit .env.production.local:
# VITE_API_URL=https://your-api-domain.com:8000
# VITE_SENTRY_DSN=your-sentry-dsn
# VITE_ENVIRONMENT=production
# 2. Build production frontend
npm run build
# 3. Deploy to Apache document root
sudo cp -r dist/* /var/www/html/your-app/
# OR
sudo rsync -av --delete dist/ /var/www/html/your-app/
```
### 3. **Configure Apache Virtual Host**
Create `/etc/apache2/sites-available/your-app.conf`:
```apache
<VirtualHost *:443>
ServerName your-domain.com
ServerAlias www.your-domain.com
DocumentRoot /var/www/html/your-app
# SSL Configuration
SSLEngine on
SSLCertificateFile /path/to/your/certificate.crt
SSLCertificateKeyFile /path/to/your/private.key
# Security Headers
Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff"
Header always set X-XSS-Protection "1; mode=block"
Header always set Referrer-Policy "strict-origin-when-cross-origin"
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
# Compression
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
</IfModule>
# Caching for static assets
<LocationMatch "\.(css|js|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$">
ExpiresActive On
ExpiresDefault "access plus 1 year"
Header set Cache-Control "public, immutable"
</LocationMatch>
# Don't cache HTML files
<LocationMatch "\.html$">
ExpiresActive On
ExpiresDefault "access plus 0 seconds"
Header set Cache-Control "no-cache, no-store, must-revalidate"
</LocationMatch>
# React Router support (handle client-side routing)
<Directory "/var/www/html/your-app">
Options -Indexes
AllowOverride All
Require all granted
# Fallback to index.html for client-side routing
FallbackResource /index.html
</Directory>
# Optional: Proxy API requests (alternative to CORS)
# ProxyPreserveHost On
# ProxyPass /api/ http://your-docker-host:8000/api/
# ProxyPassReverse /api/ http://your-docker-host:8000/api/
# Logs
ErrorLog ${APACHE_LOG_DIR}/your-app_error.log
CustomLog ${APACHE_LOG_DIR}/your-app_access.log combined
</VirtualHost>
# HTTP to HTTPS redirect
<VirtualHost *:80>
ServerName your-domain.com
ServerAlias www.your-domain.com
Redirect permanent / https://your-domain.com/
</VirtualHost>
```
Enable the site:
```bash
sudo a2ensite your-app.conf
sudo systemctl reload apache2
```
## ⚙️ Configuration Files Updated
### `docker-compose.prod.yml`
- ✅ Removed frontend and nginx services
- ✅ Added CORS_ORIGINS environment variable
- ✅ Backend services only (API, workers, database)
### `.env.prod.example`
- ✅ Production environment template
- ✅ CORS configuration for Apache frontend
- ✅ All required variables documented
## 🔧 CORS Configuration
Because the frontend and backend are served from different origins (the API listens on port 8000), configure the allowed origins in your backend:
**In `.env.prod`:**
```bash
CORS_ORIGINS=https://your-domain.com,https://www.your-domain.com
```
**Backend automatically handles CORS** based on this environment variable.
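The following is a minimal sketch of how a backend could turn `CORS_ORIGINS` into an allow-list. The helper names `parse_cors_origins` and `is_origin_allowed` are hypothetical and are not taken from this project's code.

```python
import os


def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list, dropping blanks, duplicates, and trailing slashes."""
    origins: list[str] = []
    for item in raw.split(","):
        origin = item.strip().rstrip("/")
        if origin and origin not in origins:
            origins.append(origin)
    return origins


def is_origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Origins are compared exactly: scheme, host, and port must all match."""
    return origin.rstrip("/") in allowed


if __name__ == "__main__":
    # Assumes CORS_ORIGINS is exported as in the .env.prod example above.
    print(parse_cors_origins(os.environ.get("CORS_ORIGINS", "")))
```

In a FastAPI application, a list like this would typically be passed as `allow_origins` to `CORSMiddleware`; whether this project wires it up that way is not stated here.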
## 📋 Deployment Checklist
### Backend Services
- [ ] Copy `.env.prod.example` to `.env.prod`
- [ ] Update all environment variables in `.env.prod`
- [ ] Run `docker-compose -f docker-compose.prod.yml up -d`
- [ ] Verify API accessible at `http://your-docker-host:8000/docs`
- [ ] Check logs: `docker-compose -f docker-compose.prod.yml logs -f`
### Frontend Deployment
- [ ] Update `frontend/.env.production.local` with API URL
- [ ] Run `npm run build` in frontend directory
- [ ] Copy `dist/*` to Apache document root
- [ ] Configure Apache virtual host
- [ ] Enable site and reload Apache
- [ ] Test frontend loads and connects to API
### Security & Performance
- [ ] SSL certificate configured
- [ ] Security headers enabled
- [ ] Gzip compression enabled
- [ ] Static file caching configured
- [ ] CORS origins properly set
- [ ] Firewall rules: only expose port 8000 for API
## 🔍 Troubleshooting
### Common Issues
**CORS Errors:**
- Verify `CORS_ORIGINS` in `.env.prod` matches your domain
- Check browser dev tools for exact error
**API Connection Failed:**
- Verify `VITE_API_URL` in frontend build
- Check backend API is accessible from frontend server
- Ensure port 8000 is open and reachable
**React Router 404s:**
- Verify `FallbackResource /index.html` in Apache config
- If the fallback rule lives in `.htaccess` rather than in the virtual host, ensure `AllowOverride All` is set
**File Upload Issues:**
- Check Apache `LimitRequestBody` directive
- Verify backend can write to GCS bucket
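For the "API Connection Failed" checks, a small reachability probe can be run from the frontend server. This is a sketch only: the `is_port_open` helper is hypothetical, and `your-docker-host` is the same placeholder used in the checklist.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Placeholder host; replace with the machine running the Docker backend.
    print(is_port_open("your-docker-host", 8000))
```

A `False` result only shows that the TCP connection failed; it does not distinguish a firewall rule from a stopped container, so the `docker-compose ... ps` output is still worth checking.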
### Monitoring Commands
```bash
# Backend services status
docker-compose -f docker-compose.prod.yml ps
# View logs
docker-compose -f docker-compose.prod.yml logs -f api
docker-compose -f docker-compose.prod.yml logs -f worker
# Apache status
sudo systemctl status apache2
sudo tail -f /var/log/apache2/your-app_error.log
```
## 🎯 Benefits of This Setup
**Separation of Concerns** - Frontend and backend independently deployable
**Existing Infrastructure** - Uses your current Apache setup
**Scalability** - Backend can be moved to different hosts easily
**Caching** - Apache handles static file caching efficiently
**SSL Termination** - Apache handles HTTPS for frontend
**Monitoring** - Separate logs and monitoring for each tier
Your backend services will run in Docker containers while the frontend integrates seamlessly with your existing Apache web server infrastructure.

@@ -1,168 +0,0 @@
# Deployment Options for Video Accessibility Platform
## 🏗 Current Docker Setup
Your `docker-compose.yml` serves **both frontend and backend** in **development mode**:
- **Frontend**: Vite dev server on port 5173 (hot reload)
- **Backend**: FastAPI on port 8000 (auto-reload)
- **Database**: MongoDB + Redis
- **Workers**: Celery + Change Stream service
## 🚀 Production Deployment Options
### 1. **All-in-Docker Production** ✅ Recommended
**What it does:**
- Frontend: Built React app served by Nginx (port 80)
- Backend: Production FastAPI (port 8000)
- Single `docker-compose up` deployment
**Usage:**
```bash
# Production deployment
docker-compose -f docker-compose.prod.yml up -d
# Access:
# Frontend: http://localhost:80
# Backend API: http://localhost:8000
```
**Benefits:**
- ✅ Single command deployment
- ✅ Optimized frontend build
- ✅ Production-ready configuration
- ✅ Built-in health checks
- ✅ Nginx caching and compression
### 2. **Single Domain with Nginx Proxy** ✅ Best UX
**What it does:**
- Everything served from one domain (port 80)
- `/api/*` routes to backend
- `/*` routes to frontend
- WebSocket support included
**Usage:**
```bash
# Uses nginx/nginx.conf for routing
docker-compose -f docker-compose.prod.yml up nginx
# Access everything at: http://localhost
```
**Benefits:**
- ✅ No CORS issues
- ✅ Single domain simplicity
- ✅ Better caching control
- ✅ Rate limiting built-in
- ✅ SSL termination ready
### 3. **Cloud-Native (Google Cloud)** 🌟 Enterprise
**Architecture:**
```
Frontend (Cloud Storage + CDN) → API (Cloud Run) → Database (MongoDB Atlas)
Workers (Cloud Run)
```
**Components:**
- **Frontend**: Build + deploy to Cloud Storage, serve via Cloud CDN
- **Backend**: Deploy to Cloud Run (auto-scaling)
- **Workers**: Separate Cloud Run service for Celery
- **Database**: MongoDB Atlas (managed)
- **Files**: Google Cloud Storage (already integrated)
**Benefits:**
- ✅ Auto-scaling
- ✅ Global CDN
- ✅ Managed services
- ✅ Pay-per-use
- ✅ High availability
## 📊 Comparison Matrix
| Option | Complexity | Cost | Scalability | Maintenance |
|--------|------------|------|-------------|-------------|
| **Dev Docker** | Low | Very Low | Limited | Manual |
| **Prod Docker** | Low | Low | Manual | Medium |
| **Nginx Proxy** | Medium | Low | Manual | Medium |
| **Cloud Native** | High | Variable | Automatic | Low |
## 🚀 Quick Migration Guide
### From Development → Production Docker
1. **Update environment variables:**
```bash
cp .env.example .env.prod
# Edit .env.prod with production values
```
2. **Deploy:**
```bash
docker-compose -f docker-compose.prod.yml up -d
```
3. **Verify:**
```bash
# Frontend (optimized build)
curl http://localhost:80
# Backend API
curl http://localhost:8000/health
```
### From Docker → Cloud Native
1. **Build frontend:**
```bash
cd frontend && npm run build
gsutil -m rsync -r -d dist/ gs://your-bucket/
```
2. **Deploy backend:**
```bash
gcloud run deploy video-api --source=./backend --region=us-central1
```
3. **Deploy workers:**
```bash
gcloud run deploy video-workers --source=./backend --region=us-central1
```
## 🔧 Configuration Files Created
### `docker-compose.prod.yml`
- Production-ready Docker setup
- Nginx serving frontend
- Optimized environment variables
- Health checks included
### `nginx/nginx.conf`
- Single-domain routing configuration
- API proxy with rate limiting
- WebSocket support
- Static file caching
- Security headers
## 🎯 Recommendations by Use Case
### **Small Team / MVP**
→ Use **Production Docker** (`docker-compose.prod.yml`)
### **Growing Business**
→ Use **Nginx Proxy** setup for better performance
### **Enterprise / Scale**
→ Go **Cloud Native** with Google Cloud Run + CDN
## 🔍 Current Status
**Development**: Already working with `docker-compose up`
**Production Docker**: Ready with `docker-compose.prod.yml`
**Nginx Proxy**: Configured and ready to deploy
⚠️ **Cloud Native**: Requires GCP setup and configuration
Your current Docker setup is **development-optimized**. For production, use the new `docker-compose.prod.yml` which properly builds and serves the React app through Nginx while keeping the backend API separate but coordinated.

@@ -1,384 +0,0 @@
# Accessible Video Processing Platform
An AI-powered platform for generating accessible video content: closed captions, audio descriptions, and multi-language translations. It covers the full workflow from video upload to final delivery, including quality control.
## ✅ Current Status: **Production-Ready** (85% Complete)
**Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend)
## 🚀 Key Features Implemented
### Core Functionality ✅
- **AI-Powered Processing**: Complete Gemini 2.5 Pro integration for intelligent caption and audio description generation
- **Multi-Language Pipeline**: Google Translate + cultural transcreation with 50+ language support
- **Quality Control Workflow**: Full reviewer approval/rejection system with VTT editing capabilities
- **Audio Description TTS**: Google Cloud TTS and ElevenLabs integration with audio synthesis
- **Real-time Updates**: WebSocket-powered job status tracking and notifications
- **Advanced Video Player**: Multi-language caption support with timeline navigation
- **Role-Based Access Control**: Complete CLIENT/REVIEWER/ADMIN role system
### Security & Infrastructure ✅
- **JWT Authentication**: Secure access/refresh token system with HttpOnly cookies
- **Audit Logging**: Comprehensive audit trail for all reviewer actions
- **Signed URLs**: Secure Google Cloud Storage file access (24h expiry)
- **Input Validation**: Complete request validation and sanitization
- **HTTPS/CORS**: Production-ready security configuration
### User Experience ✅
- **Responsive Design**: Mobile-first Tailwind CSS implementation
- **Real-time Feedback**: Live job progress tracking and notifications
- **Advanced File Management**: Drag-and-drop uploads with progress indicators
- **VTT Editor**: Inline caption editing with live preview
- **Download Portal**: Secure asset delivery with organized file structure
## 🛠 Tech Stack
### Backend (FastAPI + Python 3.11)
- **FastAPI 0.115.0** - Modern async web framework with OpenAPI documentation
- **Celery 5.3.4** - Distributed task queue with Redis broker
- **MongoDB 7.0** - Document database with replica set support
- **Redis 7.2** - Caching and message queuing
- **Google Cloud Platform** - Storage, AI services, Secret Manager, TTS
- **Pydantic 2.5** - Data validation and serialization
- **OpenTelemetry** - Observability and monitoring
- **Sentry** - Error tracking and performance monitoring
### Frontend (React 19 + TypeScript)
- **React 19.1.1** - Modern UI framework with latest features
- **Vite 7.1.2** - Lightning-fast build tool and dev server
- **TypeScript 5.8** - Full type safety throughout application
- **TanStack Query 5.85** - Advanced server state management with caching
- **React Router 7.8** - Client-side routing with protected routes
- **Tailwind CSS 4.1** - Utility-first CSS framework
- **Zustand 5.0** - Lightweight client state management
- **React Hook Form + Zod** - Form handling with schema validation
## 🏗 Architecture Overview
### Complete Job Processing Pipeline ✅
```
Upload  → Ingestion  → AI Processing  → QC Review → Translation → TTS      → Final Review → Delivery
↓         ↓            ↓                ↓           ↓             ↓          ↓              ↓
GCS       Gemini 2.5   VTT Generation   Human       Google        Text-to-   Reviewer       Email +
Storage   Pro          + Validation     Review      Translate     Speech     Approval       Downloads
```
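The stage order in the diagram can be sketched as a simple linear state machine. This is an illustration under assumptions: the `Stage` enum and `next_stage` helper are hypothetical names, and the sketch does not model rejection or retry paths, which the diagram does not show.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Pipeline stages in the order shown in the diagram."""
    UPLOAD = "upload"
    INGESTION = "ingestion"
    AI_PROCESSING = "ai_processing"
    QC_REVIEW = "qc_review"
    TRANSLATION = "translation"
    TTS = "tts"
    FINAL_REVIEW = "final_review"
    DELIVERY = "delivery"


# Iterating an Enum yields members in definition order.
PIPELINE = list(Stage)


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None once delivery is reached."""
    index = PIPELINE.index(current)
    return PIPELINE[index + 1] if index + 1 < len(PIPELINE) else None


if __name__ == "__main__":
    stage: Optional[Stage] = Stage.UPLOAD
    while stage is not None:
        print(stage.value)
        stage = next_stage(stage)
```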
### System Architecture
- **Monorepo Structure**: `/backend`, `/frontend`, `/infra` with clear separation
- **Microservices Ready**: Modular FastAPI services with proper dependency injection
- **Event-Driven**: WebSocket real-time updates with connection management
- **Scalable Workers**: Celery task queue with auto-retry and error recovery
- **Secure by Design**: RBAC, signed URLs, audit logging, input validation
## 🚀 Getting Started
### Prerequisites
- **Python 3.11+** (backend development)
- **Node.js 18+** (frontend development)
- **Docker & Docker Compose** (required for local development)
- **Google Cloud Project** with APIs enabled (for video processing)
### 🐳 Local Development with Docker (Recommended)
This is the recommended approach for local development. Backend services run in Docker containers while the frontend runs via Vite dev server for fast hot-reload.
#### Initial Setup
```bash
# 1. Clone the repository
git clone <repository>
cd video_accessibility
# 2. Copy and configure environment files
cp .env.prod.example .env.local
# Edit .env.local with your API keys and settings
# 3. Set up frontend environment
cp frontend/.env.example frontend/.env.local
# The defaults should work for local development
# 4. Ensure GCP credentials are in place
# Copy your GCP service account JSON to: ./secrets/gcp-credentials.json
```
#### Starting the Development Environment
**Step 1: Start Backend Services (Docker)**
```bash
# Start API, Worker, MongoDB, and Redis in Docker
./scripts/run-local.sh
# Services will be available at:
# - API: http://localhost:8003
# - API Docs: http://localhost:8003/docs
# - MongoDB: mongodb://localhost:27017
# - Redis: redis://localhost:6379
```
**Step 2: Start Frontend (Vite Dev Server)**
```bash
# In a separate terminal
cd frontend
npm install # First time only
npm run dev
# Frontend will be available at:
# - Application: http://localhost:6001/video-accessibility
```
#### Useful Commands
```bash
# View logs
docker compose logs -f api # API logs
docker compose logs -f worker # Worker logs
docker compose logs -f # All logs
# Restart a service
docker compose restart api
docker compose restart worker
# Rebuild and restart (after code changes)
./scripts/run-local.sh --rebuild
# Stop all services
./scripts/run-local.sh --stop
# or
docker compose down
```
#### Test User Credentials (Local Development Only)
For testing different user roles locally:
```
Admin: admin@example.com / admin
Production: production@example.com / production
Reviewer: reviewer@example.com / reviewer
Client: client@example.com / client123
```
**Note**: These test users are only for local development. Production uses Microsoft authentication.
### Alternative: Native Development (Without Docker)
For development without Docker, you'll need to run each service manually:
```bash
# Terminal 1: MongoDB
mongod --dbpath ./data/db
# Terminal 2: Redis
redis-server
# Terminal 3: Backend API
cd backend
poetry install
poetry run uvicorn app.main:app --reload --port 8000
# Terminal 4: Celery Worker
cd backend
poetry run celery -A app.tasks worker --loglevel=info
# Terminal 5: Frontend
cd frontend
npm install
npm run dev
```
**Note**: The Docker approach is strongly recommended as it ensures consistency and simplifies setup.
### Testing & Quality
```bash
# Backend tests + linting
cd backend
poetry run pytest
poetry run ruff check .
poetry run mypy .
# Frontend tests + linting
cd frontend
npm run test
npm run test:e2e
npm run lint
npm run type-check
```
## 📁 Project Structure
```
video_accessibility/ # Root monorepo
├── backend/ # FastAPI Python backend (12,198 LOC)
│ ├── app/
│ │ ├── api/v1/ # REST API endpoints
│ │ │ ├── auth.py # JWT authentication
│ │ │ ├── jobs.py # Job CRUD & workflow
│ │ │ ├── admin.py # Admin operations
│ │ │ └── files.py # File management
│ │ ├── core/ # Core configuration
│ │ ├── models/ # Database models
│ │ ├── schemas/ # Pydantic request/response schemas
│ │ ├── services/ # External service integrations
│ │ │ ├── gemini.py # AI processing
│ │ │ ├── gcs.py # Google Cloud Storage
│ │ │ ├── translation.py # Multi-language support
│ │ │ └── tts.py # Text-to-speech
│ │ ├── tasks/ # Celery background workers
│ │ ├── middleware/ # Request processing
│ │ └── telemetry/ # Observability
│ ├── tests/ # Comprehensive test suite
│ └── Dockerfile # Container configuration
├── frontend/ # React TypeScript SPA (8,273 LOC)
│ ├── src/
│ │ ├── routes/ # Page components
│ │ │ ├── auth/ # Login system
│ │ │ ├── jobs/ # Job management
│ │ │ ├── qc/ # Quality control
│ │ │ └── admin/ # Admin interface
│ │ ├── components/ # Reusable UI components
│ │ │ ├── VideoWithCaptions.tsx # Advanced video player
│ │ │ ├── VttEditor.tsx # Caption editing
│ │ │ └── UploadDropzone.tsx # File upload
│ │ ├── lib/ # Utilities and API client
│ │ ├── hooks/ # Custom React hooks
│ │ └── types/ # TypeScript definitions
│ ├── tests/ # Unit + E2E tests
│ ├── .env.local # Local development config
│ └── Dockerfile # Container configuration
├── scripts/
│ ├── run-local.sh # Local development startup
│ ├── deploy.sh # Production deployment
│ ├── full-deploy.sh # Full production rebuild
│ └── build-frontend.sh # Frontend build script
├── docker-compose.yml # Base Docker configuration
├── docker-compose.local.yml # Local development overrides
├── docker-compose.prod.yml # Production overrides
├── .env.local # Local environment variables
├── .env.production # Production environment variables
├── CLAUDE.md # Development guidelines
└── video_accessibility_development_plan.txt # Complete specification
```
## ⚙️ Configuration
### Environment Variables
**Backend** (`backend/.env`):
```bash
# Database
MONGODB_URL=mongodb://admin:password@localhost:27017/accessible_video
REDIS_URL=redis://localhost:6379/0
# Authentication
JWT_SECRET_KEY=your-jwt-secret
JWT_REFRESH_SECRET_KEY=your-refresh-secret
# AI Services
GEMINI_API_KEY=your-gemini-key
ELEVENLABS_API_KEY=your-elevenlabs-key
# Google Cloud
GCS_BUCKET_NAME=your-bucket-name
GOOGLE_CLOUD_PROJECT=your-project-id
# Email
SENDGRID_API_KEY=your-sendgrid-key
# Monitoring
SENTRY_DSN=your-sentry-dsn
```
**Frontend** (`frontend/.env`):
```bash
VITE_API_URL=http://localhost:8000
VITE_SENTRY_DSN=your-sentry-dsn
VITE_ENVIRONMENT=development
```
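A startup check along these lines can catch a missing variable before the services boot. It is a sketch under assumptions: which variables are truly required is a guess based on the list above, and `missing_vars` is a hypothetical helper.

```python
import os
from typing import Mapping, Sequence

# Assumption: these are treated as mandatory; the optional integrations
# (ElevenLabs, SendGrid, Sentry) are deliberately left out.
REQUIRED_BACKEND_VARS = (
    "MONGODB_URL",
    "REDIS_URL",
    "JWT_SECRET_KEY",
    "JWT_REFRESH_SECRET_KEY",
    "GEMINI_API_KEY",
    "GCS_BUCKET_NAME",
    "GOOGLE_CLOUD_PROJECT",
)


def missing_vars(
    env: Mapping[str, str],
    required: Sequence[str] = REQUIRED_BACKEND_VARS,
) -> list[str]:
    """Return the required names that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required variables are set.")
```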
### Google Cloud Setup
1. **Create GCP Project** with billing enabled
2. **Enable APIs**:
- Cloud Storage API
- Cloud Translation API
- Cloud Text-to-Speech API
- Vertex AI API (for Gemini)
- Secret Manager API
3. **Create Service Account** with roles:
- Storage Admin
- AI Platform Admin
- Secret Manager Admin
4. **Download JSON key** and set `GOOGLE_APPLICATION_CREDENTIALS`
## 🚢 Deployment Options
### Production Architecture (Google Cloud)
- **Frontend**: Cloud Storage + Cloud CDN (static hosting)
- **Backend API**: Cloud Run (serverless, auto-scaling)
- **Workers**: Cloud Run (Celery with Redis)
- **Database**: MongoDB Atlas (managed)
- **Queue**: Cloud Memorystore (Redis)
- **Storage**: Google Cloud Storage
- **Monitoring**: Cloud Monitoring + Sentry
### Docker Production
```bash
# Start production services in the background (add --build to rebuild images)
docker-compose -f docker-compose.prod.yml up -d
```
## 🔒 Security Features
### Implemented Security ✅
- **JWT Authentication**: Access (15min) + refresh (7 days) token rotation
- **RBAC System**: CLIENT/REVIEWER/ADMIN roles with endpoint protection
- **Secure Storage**: HttpOnly cookies for refresh tokens
- **File Security**: Signed URLs with 24h expiry, no client access to raw files
- **Input Validation**: Comprehensive Pydantic validation on all endpoints
- **Audit Logging**: Complete trail of all reviewer actions and system events
- **CORS Protection**: Configured for production domains
- **Rate Limiting**: Request throttling and validation middleware
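The token lifetimes listed above (15 minutes for access, 7 days for refresh) can be illustrated with a short expiry calculation. This sketch only computes timestamps; it does not sign or verify tokens, and the names `token_expiries` and `is_expired` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Lifetimes as stated in the list above; real deployments may override them.
ACCESS_TTL = timedelta(minutes=15)
REFRESH_TTL = timedelta(days=7)


def token_expiries(issued_at: datetime) -> dict[str, datetime]:
    """Return the expiry time of each token type for a given issue time."""
    return {
        "access": issued_at + ACCESS_TTL,
        "refresh": issued_at + REFRESH_TTL,
    }


def is_expired(expires_at: datetime, now: datetime) -> bool:
    """A token is treated as expired at or after its expiry time."""
    return now >= expires_at


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for kind, expires_at in token_expiries(now).items():
        print(kind, expires_at.isoformat())
```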
## 🔧 API Documentation
### Key Endpoints Implemented
```
POST /api/v1/auth/login # Authentication
POST /api/v1/jobs # Create job with file upload
GET /api/v1/jobs # List jobs (filtered by role)
GET /api/v1/jobs/{id} # Job details with real-time status
POST /api/v1/jobs/{id}/actions/* # Workflow actions (approve/reject/complete)
GET /api/v1/jobs/{id}/vtt # VTT content retrieval
PATCH /api/v1/jobs/{id}/vtt # VTT editing and updates
GET /api/v1/jobs/{id}/downloads # Signed download URLs
WS /api/v1/ws/jobs/{id} # Real-time job status updates
```
**OpenAPI Documentation**: http://localhost:8000/docs
## 🎯 Development Status
### ✅ Completed (Production Ready)
- **User Management**: Full authentication, RBAC, password management
- **Job Pipeline**: Complete video processing workflow with state machine
- **Quality Control**: VTT editor, approval workflows, reviewer dashboards
- **Real-time Features**: WebSocket updates, live notifications
- **Multi-language**: Translation pipeline with cultural transcreation
- **File Management**: Secure uploads, downloads, asset validation
- **Admin Features**: User management, system monitoring, audit logs
### ⚠️ Needs Attention (Minor)
- **Integration Tests**: Framework exists but needs completion
- **Email Templates**: Service implemented, templates may need customization
- **Performance Testing**: No load testing implemented yet
- **Documentation**: API docs complete, user guides could be enhanced
### 🎯 Recommended Next Steps
1. **Complete integration test suite** for end-to-end validation
2. **Performance testing** with realistic video processing loads
3. **Production deployment** configuration and CI/CD pipeline
4. **User documentation** and training materials
5. **Monitoring dashboards** for production operations
## 📚 Development Resources
- **Complete Specification**: `video_accessibility_development_plan.txt`
- **Development Guidelines**: `CLAUDE.md`
- **API Documentation**: http://localhost:8000/docs (when running)
- **Test Coverage Reports**: `backend/htmlcov/` (after running tests)

@@ -1,94 +0,0 @@
{
"permissions": {
"allow": [
"WebSearch",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && ruff check app/services/elevenlabs_voices.py app/services/tts.py app/api/v1/routes_tts.py app/models/job.py app/tasks/tts_synthesis.py app/core/config.py 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && python -m ruff check app/services/elevenlabs_voices.py app/services/tts.py app/api/v1/routes_tts.py app/models/job.py app/tasks/tts_synthesis.py app/core/config.py 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && pip3 show ruff 2>&1 | head -5; which pip3 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/frontend && npm run type-check 2>&1 | tail -20)",
"Bash(node_modules/.bin/tsc --noEmit 2>&1 | tail -20)",
"Bash(./node_modules/.bin/tsc --noEmit 2>&1 | tail -30)",
"Bash(npm run type-check 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/frontend && npm run type-check 2>&1)",
"Bash(npm run lint 2>&1)",
"WebFetch(domain:dcmp.org)",
"WebFetch(domain:www.w3.org)",
"WebFetch(domain:partnerhelp.netflixstudios.com)",
"WebFetch(domain:m.media-amazon.com)",
"WebFetch(domain:www.acb.org)",
"Bash(./node_modules/.bin/tsc --noEmit)",
"Bash(node_modules/.bin/tsc --noEmit)",
"Bash(pandoc --version)",
"WebFetch(domain:ai-sandbox.oliver.solutions)",
"Bash(gcloud run:*)",
"Bash(gcloud logging:*)",
"Bash(ssh optical:*)",
"Bash(/Volumes/SSD/Projects/Oliver/video-accessibility/backend/.venv/bin/python3.11 -c \"import sys; sys.path.insert\\(0, '.'\\); from app.models.user import UserRole; print\\([r.value for r in UserRole]\\)\")",
"Bash(npm list *)",
"Bash(brew list *)",
"Bash(npx --yes puppeteer --version)",
"Bash(node md_to_pdf.js)",
"Bash(npm root *)",
"Bash(node *)",
"Bash(ssh optical-web-1 *)",
"Bash(git *)",
"WebFetch(domain:docs.anthropic.com)",
"Bash(poetry lock *)",
"Bash(pip show *)",
"Read(//Users/ai_leed/.local/bin/**)",
"Read(//opt/homebrew/bin/**)",
"Bash(pip3 install *)",
"Bash(poetry --version)",
"Bash(docker run *)",
"Read(//Users/ai_leed/.docker/run/**)",
"Bash(docker context *)",
"Bash(DOCKER_HOST=unix:///var/run/docker.sock docker run --rm -v \"$\\(pwd\\):/app\" -w /app python:3.11-slim bash -c \"pip install poetry==1.8.2 -q && poetry lock --no-update\")",
"Bash(brew install *)",
"Bash(npm run *)",
"Bash(scp /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/models/audit_log.py optical:/tmp/audit_log.py)",
"Bash(scp *)",
"Bash(kill %1)",
"Bash(ssh optical-dev *)",
"Skill(fullstack-dev-skills:security-reviewer)",
"Bash(chmod +x *)",
"Bash(gcloud auth *)",
"Bash(gcloud config *)",
"Bash(gcloud artifacts *)",
"Bash(sed -n '190,200p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '1914,1922p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2048,2062p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2490,2502p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2628,2638p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(gcloud builds submit *)",
"Bash(gcloud builds describe 79802b34-e17b-4446-b01d-68d99d569262 *)",
"Bash(gcloud compute instances list *)",
"Bash(gcloud compute networks vpc-access connectors list *)",
"Bash(gcloud builds *)",
"Bash(gcloud projects get-iam-policy optical-414516 *)",
"Bash(gcloud projects *)",
"Bash(npm audit *)",
"Skill(codebase-audit-suite:ln-622-build-auditor)",
"Skill(codebase-audit-suite:ln-624-code-quality-auditor)",
"Skill(codebase-audit-suite:ln-625-dependencies-auditor)",
"Skill(codebase-audit-suite:ln-626-dead-code-auditor)",
"Bash(/opt/homebrew/bin/ruff check *)",
"Bash(npm test *)",
"Bash(sed -n '35,42p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/test/utils.tsx)",
"Bash(sed -n '55,90p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/tests/helpers/auth.ts)",
"Bash(sed -n '48,60p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/components/Layout/Sidebar.tsx)",
"Bash(sed -n '152,170p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/components/Layout/Sidebar.tsx)",
"Bash(poetry env *)",
"Bash(poetry install *)",
"Bash(poetry run *)",
"Bash(docker info *)",
"Bash(sed -n '1,30p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/services/gcs.py)",
"Bash(sed -n '155,165p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/services/gcs.py)",
"Bash(gcloud secrets *)",
"Bash(openssl rand *)",
"Bash(ssh *)",
"Skill(commit-commands:commit-push-pr)",
"Bash(obsidian read *)",
"Bash(obsidian search *)"
]
}
}

@@ -10,8 +10,6 @@ REDIS_URL=redis://redis:6379/0
 # JWT Authentication
 JWT_SECRET_KEY=your-production-jwt-secret-key-min-32-chars
 JWT_REFRESH_SECRET_KEY=your-production-refresh-secret-key-min-32-chars
-# Required: admin account created on first boot. Unset = admin not seeded.
-DEFAULT_ADMIN_PASSWORD=your-secure-admin-password
 # AI Services
 GEMINI_API_KEY=your-gemini-api-key
@@ -21,11 +19,8 @@ ELEVENLABS_API_KEY=your-elevenlabs-api-key
 GCS_BUCKET_NAME=your-production-bucket-name
 GOOGLE_CLOUD_PROJECT=your-gcp-project-id
-# Email Service (Mailgun)
-SENDGRID_API_KEY=
-MAILGUN_API_KEY=your-mailgun-api-key
-MAILGUN_DOMAIN=mg.oliver.solutions
-MAILGUN_FROM=noreply@mg.oliver.solutions
+# Email Service
+SENDGRID_API_KEY=your-sendgrid-api-key
 # Monitoring
 SENTRY_DSN=your-sentry-dsn-url

@@ -9,18 +9,18 @@
# App Configuration
# -----------------------------------------------------------------------------
APP_ENV=prod
API_BASE_URL=https://optical-dev.oliver.solutions/video-accessibility
API_BASE_URL=https://ai-sandbox.oliver.solutions/video-accessibility-back
# -----------------------------------------------------------------------------
# Authentication & Security
# -----------------------------------------------------------------------------
# IMPORTANT: Generate a secure random secret for JWT_SECRET
# Example: openssl rand -hex 32
JWT_SECRET=d81fd31798510f53b374951908b6bedd75f7ddaabe9b4e4c4ca5bf81393f48b7
JWT_SECRET=CHANGE_ME_TO_SECURE_RANDOM_64_CHAR_STRING
JWT_ALG=HS256
JWT_ACCESS_TTL_MIN=240
JWT_REFRESH_TTL_DAYS=7
COOKIE_DOMAIN=optical-dev.oliver.solutions
COOKIE_DOMAIN=ai-sandbox.oliver.solutions
COOKIE_SECURE=true
COOKIE_SAMESITE=Lax
@@ -63,31 +63,29 @@ TRANSLATE_API_KEY=
ELEVENLABS_API_KEY=sk_c17be2768ca784f1807018420b84c7f1ee969946e698f986
# -----------------------------------------------------------------------------
# Email Configuration (Mailgun)
# Email Configuration (SendGrid)
# -----------------------------------------------------------------------------
# IMPORTANT: Get SendGrid API key from https://app.sendgrid.com/settings/api_keys
SENDGRID_API_KEY=
MAILGUN_API_KEY=1d8c6f38c53f237305353cc2e55f39f2-c6620443-4b9961f5
MAILGUN_DOMAIN=mg.oliver.solutions
MAILGUN_FROM=noreply@mg.oliver.solutions
# Email sender address
EMAIL_FROM=noreply@mg.oliver.solutions
# Email sender address (must be verified in SendGrid)
EMAIL_FROM=noreply@ai-sandbox.oliver.solutions
# Client-facing URL (used in emails)
CLIENT_BASE_URL=https://optical-dev.oliver.solutions/video-accessibility
CLIENT_BASE_URL=https://ai-sandbox.oliver.solutions/video-accessibility
# -----------------------------------------------------------------------------
# Microsoft Authentication (Azure AD)
# -----------------------------------------------------------------------------
AZURE_CLIENT_ID=9079054c-9620-4757-a256-23413042f1ef
AZURE_AUTHORITY=https://login.microsoftonline.com/e519c2e6-bc6d-4fdf-8d9c-923c2f002385
AZURE_REDIRECT_URI=https://optical-dev.oliver.solutions/video-accessibility/
AZURE_REDIRECT_URI=https://ai-sandbox.oliver.solutions/video-accessibility/
# -----------------------------------------------------------------------------
# CORS Configuration
# -----------------------------------------------------------------------------
# Comma-separated list of allowed origins
CORS_ORIGINS=https://optical-dev.oliver.solutions
CORS_ORIGINS=https://ai-sandbox.oliver.solutions
# -----------------------------------------------------------------------------
# Observability & Monitoring (Optional)
@ -118,9 +116,6 @@ OTEL_EXPORTER_OTLP_ENDPOINT=
WHISPER_SERVICE_URL=https://whisper-http-service-bcb6ipdqka-uc.a.run.app
FFMPEG_SERVICE_URL=https://ffmpeg-http-service-bcb6ipdqka-uc.a.run.app
# optical-dev uses Celery workers (not Cloud Run Jobs) for pipeline dispatch
USE_CELERY_FALLBACK=true
# Worker Concurrency (higher values for Cloud Run mode since workers just make HTTP calls)
WHISPER_WORKER_CONCURRENCY=10
FFMPEG_WORKER_CONCURRENCY=20

View file

@ -1,23 +0,0 @@
# Screenshot capture credentials — copy to .env.screenshots and fill in values
# NEVER commit .env.screenshots (it is gitignored)
BASE_URL=https://optical-dev.oliver.solutions/video-accessibility
# Local-password admin seeded by backend/scripts/seed_test_users.py
TEST_ADMIN_EMAIL=test-admin@oliver.agency
TEST_ADMIN_PASSWORD=TestAdmin2026!
TEST_CLIENT_EMAIL=test-client@oliver.agency
TEST_CLIENT_PASSWORD=TestClient2026!
TEST_LINGUIST_EMAIL=test-linguist@oliver.agency
TEST_LINGUIST_PASSWORD=TestLinguist2026!
TEST_REVIEWER_EMAIL=test-reviewer@oliver.agency
TEST_REVIEWER_PASSWORD=TestReviewer2026!
TEST_PRODUCTION_EMAIL=test-production@oliver.agency
TEST_PRODUCTION_PASSWORD=TestProduction2026!
TEST_PM_EMAIL=test-pm@oliver.agency
TEST_PM_PASSWORD=TestPM2026!

13
.gitignore vendored
View file

@ -12,7 +12,6 @@ examples/
.env.local
.env.production
.env.*.local
.env.screenshots
secrets/
*.pem
*.key
@ -99,15 +98,3 @@ docs/*.pdf
/var/www/html/video-accessibility.backup.*
backend/.env
# Node / npm artifacts at repo root (Playwright MCP installs these)
node_modules/
package.json
package-lock.json
# Playwright MCP session snapshots
.playwright-mcp/
# Test videos
test-video.mp4
.worktrees/

View file

@ -1,118 +0,0 @@
# Build Health Audit — ln-622
**Score: 5.5/10** | Issues: 28 (C:0 H:5 M:18 L:5)
**Date:** 2026-04-30 | **Stack:** Python 3.11 / FastAPI / Celery + React 19 / Vite / TypeScript 5.8
---
## 1. Compiler / Linter Errors
### Backend — ruff: 1314 errors (HIGH)
`ruff check app/` exits non-zero with 1314 violations. The ruff config in `pyproject.toml` uses **deprecated top-level `select`/`ignore`/`per-file-ignores`** instead of `[tool.ruff.lint]` — ruff emits a warning on every run.
Top violation codes:
| Code | Meaning | Volume |
|------|---------|--------|
| I001 | Import block unsorted | ~400 |
| UP | pyupgrade (f-strings, typing aliases) | ~500 |
| B | flake8-bugbear | ~200 |
| F401 | Unused import | 58 |
Most violations are **auto-fixable** (`ruff check --fix`). The unsorted imports and UP rules are cosmetic but make CI noisy and block future enforcement.
**Severity: HIGH** — CI cannot gate on ruff without fixing this first.
### Frontend — ESLint: 36 problems (30 errors, 6 warnings) (MEDIUM)
Key errors:
| File | Rule | Count |
|------|------|-------|
| `contexts/GlobalWebSocketContext.tsx:56` | `react-refresh/only-export-components` | 1 |
| `contexts/NotificationContext.tsx:91` | `react-refresh/only-export-components` | 1 |
| `contexts/ToastContext.tsx:83` | `react-refresh/only-export-components` | 1 |
| `lib/api.ts:539` | `@typescript-eslint/no-explicit-any` | 1 |
| `routes/admin/QCDetail.tsx` | `@typescript-eslint/no-explicit-any` | 6 |
| `routes/AcceptInvite.tsx` | `@typescript-eslint/no-explicit-any` | 1 |
| `routes/jobs/JobDetail.tsx` | `no-unused-vars` (err catch) | 2 |
| `hooks/__tests__/useJob.test.tsx` | `no-unused-vars` | 1 |
| `tests/helpers/auth.ts` | `no-explicit-any` | 3 |
**Severity: MEDIUM** — build succeeds, but `any` types and react-refresh errors degrade DX and HMR.
---
## 2. Type Errors
### Frontend — tsc: CLEAN ✓
`tsc --noEmit` exits 0. No TypeScript compilation errors. The `any` issues above are ESLint-level, not tsc errors.
### Backend — mypy: NOT RUN
Cannot run mypy outside the poetry venv. Needs `poetry run mypy .` inside Docker or an activated venv.
**Severity: LOW** (mypy not blocking, but should be run in CI)
---
## 3. Tests
### Frontend — vitest: 13 failed / 75 total (HIGH)
8 test files affected:
| Test | Failures | Root cause |
|------|----------|-----------|
| `auth.test.ts` | 1 | Mock shape mismatch — response has extra field `organizationId` |
| `StatusBadge.test.tsx` | 1 | Unknown status no longer renders text (component changed) |
| `VttEditor.test.tsx` | 1 | Multiple elements found for `Insert cue before` title — DOM duplication |
| `useJob.test.tsx` | 3 | `useApproveEnglish` — pending state never resolves in test (timeout 1s); `useCreateJob` arg mismatch |
| `UploadDropzone.test.tsx` | 6 | Text broken across elements — test uses exact string match, component renders in `<span>` nodes |
| `useJobStatusWebSocket.test.tsx` | 1 | (see output) |
**Severity: HIGH** — 17% test failure rate. Several are stale tests from component refactors (UploadDropzone, StatusBadge).
### Backend — pytest: CANNOT RUN (CRITICAL)
Running `pytest` outside poetry venv fails with `ModuleNotFoundError` for `fastapi`, `aiohttp`, etc. Tests must be run with `poetry run pytest` inside Docker or an activated poetry environment.
The `backend/.venv` exists but appears to be a plain venv, not the poetry-managed one. **Tests are effectively unrunnable in local dev without explicit poetry activation.**
**Severity: CRITICAL** — Developers with system Python cannot run tests without explicit setup steps.
---
## 4. Build Configuration Issues
### ruff config deprecated (MEDIUM)
`pyproject.toml` uses `[tool.ruff]` top-level `select`, `ignore`, `per-file-ignores`. Current ruff ≥ 0.2 expects `[tool.ruff.lint]`. Fix:
```toml
# Before
[tool.ruff]
select = ["E", "W", ...]
ignore = ["E501", ...]
# After
[tool.ruff]
target-version = "py311"
line-length = 88
[tool.ruff.lint]
select = ["E", "W", ...]
ignore = ["E501", ...]
```
### Backend venv mismatch (MEDIUM)
`backend/.venv` cannot run `ruff`, `pytest`, or `mypy` — they are installed in the poetry-managed venv, not this one. Confusing to new devs.
### AGENTS.md commands incorrect (LOW)
`AGENTS.md` documents `cd backend && poetry run pytest` but the backend has `.venv` and `pyproject.toml` with no Makefile wrapper. The actual working path is `cd backend && .venv/bin/python -m pytest` or requires `poetry shell`.
---
## Summary
| Check | Result | Severity |
|-------|--------|---------|
| ruff backend | 1314 violations (auto-fixable) | HIGH |
| ESLint frontend | 36 problems | MEDIUM |
| tsc frontend | ✓ Clean | OK |
| mypy backend | Not runnable locally | LOW |
| vitest frontend | 13/75 failing | HIGH |
| pytest backend | Not runnable locally | CRITICAL |
| ruff config | Deprecated syntax | MEDIUM |
| venv setup | Confusing / broken | MEDIUM |

View file

@ -1,116 +0,0 @@
# Code Quality Audit — ln-624
**Score: 5.0/10** | Issues: 22 (C:2 H:8 M:9 L:3)
**Date:** 2026-04-30
---
## 1. God Classes / Files (> 500 lines)
| File | Lines | Severity |
|------|-------|---------|
| `backend/app/api/v1/routes_jobs.py` | 2882 | **CRITICAL** |
| `frontend/src/routes/admin/QCDetail.tsx` | 2079 | **CRITICAL** |
| `backend/app/services/video_renderer.py` | 1695 | **HIGH** |
| `frontend/src/routes/jobs/JobsList.tsx` | 1246 | **HIGH** |
| `frontend/src/lib/api.ts` | 1056 | **HIGH** |
| `backend/app/tasks/translate_and_synthesize.py` | 1019 | **HIGH** |
| `frontend/src/routes/jobs/NewJob.tsx` | 1038 | **HIGH** |
| `frontend/src/types/api.ts` | 891 | **MEDIUM** |
| `frontend/src/routes/jobs/JobDetail.tsx` | 732 | **MEDIUM** |
| `frontend/src/routes/admin/UserDetail.tsx` | 523 | **MEDIUM** |
| `frontend/src/hooks/useJobStatusWebSocket.ts` | 443 | **MEDIUM** |
**routes_jobs.py at 2882 lines** is the worst offender — it mixes upload, approval, translation, TTS, VTT editing, download, admin, and websocket concerns in a single router. Splitting by domain (e.g., `routes_upload.py`, `routes_vtt.py`, `routes_review.py`, `routes_tts.py`) would bring each under 500 lines.
**QCDetail.tsx at 2079 lines** handles the entire QC workflow, VTT display, audio preview, language selection, and approval modals in one component. Needs extraction of at minimum: `LanguageQCPanel`, `VttReviewView`, `ApprovalModal`.
---
## 2. Long Methods (> 100 lines)
| File:line | Function | Length | Severity |
|-----------|---------|--------|---------|
| `tasks/translate_and_synthesize.py:109` | `_async_translate_and_synthesize()` | 485 lines | **CRITICAL** |
| `services/video_renderer.py:487` | `_render_pause_insert_method()` | 419 lines | **CRITICAL** |
| `tasks/ingest_and_ai.py:53` | `ingest_and_ai_task_impl()` | 276 lines | **HIGH** |
| `tasks/rerender_accessible_video.py:110` | `_async_rerender_accessible_video()` | 280 lines | **HIGH** |
| `tasks/render_accessible_video.py:56` | `_async_render_accessible_video()` | 287 lines | **HIGH** |
| `api/v1/routes_jobs.py:1552` | `update_job_vtt_content()` | 215 lines | **HIGH** |
| `tasks/notify.py:29` | `run_async()` | 169 lines | **HIGH** |
| `api/v1/routes_jobs.py:2738` | `update_tts_preferences()` | 144 lines | **MEDIUM** |
| `services/whisper_service.py:241` | `_find_sentence_boundaries()` | 120 lines | **MEDIUM** |
| `services/gemini.py:591` | `analyze_accessible_video_placement()` | 132 lines | **MEDIUM** |
The two most critical ones (`_async_translate_and_synthesize` at 485 lines and `_render_pause_insert_method` at 419 lines) are orchestrator-style functions with sequential pipeline steps. They could be split into named pipeline stages, each ~50 lines.
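
The stage split suggested above can be sketched as follows. This is a minimal illustration, not the project's code: the stage names (`translate_cues`, `synthesize_audio`, `publish_results`) and the plain-dict state are assumptions made for the example.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical shared state passed between stages; a real task would carry
# job, language, and storage handles rather than a plain dict.
State = dict[str, object]
Stage = Callable[[State], Awaitable[State]]


async def translate_cues(state: State) -> State:
    # Placeholder for the translation step.
    return {**state, "translated": True}


async def synthesize_audio(state: State) -> State:
    # Placeholder for the TTS step.
    return {**state, "audio": True}


async def publish_results(state: State) -> State:
    # Placeholder for the upload and status-update step.
    return {**state, "published": True}


async def run_pipeline(state: State, stages: list[Stage]) -> State:
    # The orchestrator only sequences stages; each stage stays short and
    # can be unit-tested on its own.
    for stage in stages:
        state = await stage(state)
    return state


# Hypothetical stage order for illustration.
STAGES: list[Stage] = [translate_cues, synthesize_audio, publish_results]
```

Calling `asyncio.run(run_pipeline({"job_id": "j1"}, STAGES))` runs the stages in order, so the long function body reduces to a list of named steps.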
---
## 3. Deep Nesting
Not systematically scanned with a tool (radon/lizard not installed). The long functions above likely contain 4-5+ nesting levels given their complexity.
---
## 4. Too Many Parameters
| Location | Function | Params | Severity |
|----------|---------|--------|---------|
| `services/gemini.py` | `extract_accessibility_targeted()` | 7+ | **MEDIUM** |
| `tasks/translate_and_synthesize.py` | `_generate_language_tts()` | 8+ | **MEDIUM** |
Pattern: many functions pass `db`, `job`, `language`, `settings`, `gcs_client`, etc. individually instead of grouping into a context dataclass.
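
One way to group those shared arguments is a small context dataclass. The sketch below is hedged: `TaskContext` and `describe_tts_request` are hypothetical names, and the field types are left as `Any` because the real types are not shown in this audit.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TaskContext:
    """Hypothetical bundle of the collaborators most task helpers share."""

    db: Any
    job: dict[str, Any]
    language: str
    settings: Any
    gcs_client: Any


def describe_tts_request(ctx: TaskContext, voice: str) -> str:
    # Illustrative helper: two parameters instead of six or more.
    return f"{ctx.job['id']}:{ctx.language}:{voice}"
```

Helpers then take `ctx` plus the one or two arguments that actually vary per call, which keeps signatures stable when a new collaborator is added.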
---
## 5. Magic Numbers
### Backend (MEDIUM)
Scattered timing constants without named definitions:
- TTS retry delays (hardcoded seconds)
- chunk sizes in upload
- Audio padding values in video_renderer.py
### Frontend (LOW)
Mostly clean. Some inline pixel values in Tailwind (acceptable). No concerning business-logic magic numbers found.
---
## 6. N+1 Query Patterns (MEDIUM)
Potential N+1 patterns found:
- `app/main.py:102` → `async for job_doc in db.jobs.find(...)`: check whether the loop body makes additional queries per document
- `app/core/dependencies.py:185` → `async for m in db.memberships.find(...)`: membership lookup per request in auth middleware (acceptable if cached, but no caching observed)
- `app/core/authz.py:54` → `async for doc in db.memberships.find(...)`: similar pattern in auth check
These are all async iterators over `find()` — not necessarily N+1 if no nested DB calls, but should be reviewed for `.find()` calls inside the loop body.
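
To make the review criterion concrete, the sketch below contrasts a per-document lookup with a single batched lookup. It uses an in-memory stand-in (`CountingCollection`, a hypothetical class) instead of a real MongoDB driver, so the query counts are observable; `find_in` stands in for one `find({"user_id": {"$in": [...]}})` call.

```python
import asyncio


class CountingCollection:
    """In-memory stand-in for a MongoDB collection that counts queries."""

    def __init__(self, docs):
        self.docs = docs
        self.queries = 0

    async def find_one(self, user_id):
        self.queries += 1
        return next((d for d in self.docs if d["user_id"] == user_id), None)

    async def find_in(self, user_ids):
        # Stands in for a single query with an $in filter.
        self.queries += 1
        wanted = set(user_ids)
        return [d for d in self.docs if d["user_id"] in wanted]


async def memberships_n_plus_one(jobs, memberships):
    # One query per job: the pattern the audit asks reviewers to look for.
    return [await memberships.find_one(job["user_id"]) for job in jobs]


async def memberships_batched(jobs, memberships):
    # One query for all jobs, then an in-memory join.
    found = await memberships.find_in([job["user_id"] for job in jobs])
    by_user = {doc["user_id"]: doc for doc in found}
    return [by_user.get(job["user_id"]) for job in jobs]
```

Both functions return the same list; the first issues one query per job, the second issues one query in total.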
---
## 7. Method Signature Quality
### Boolean flag parameters (MEDIUM)
Several async functions in tasks accept `bool` flags controlling behavior variants (e.g., `skip_tts`, `force_regenerate`). These should be enums or separate functions.
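
A minimal sketch of the enum option, assuming the two flags named above (`skip_tts`, `force_regenerate`) are mutually exclusive. `TtsMode` and `should_synthesize` are hypothetical names, not taken from the codebase.

```python
from enum import Enum


class TtsMode(Enum):
    """Hypothetical replacement for a skip_tts / force_regenerate flag pair."""

    SKIP = "skip"
    REUSE_EXISTING = "reuse_existing"
    FORCE_REGENERATE = "force_regenerate"


def should_synthesize(mode: TtsMode, audio_exists: bool) -> bool:
    # Each behavior variant is named at the call site instead of being
    # encoded as a combination of positional booleans.
    if mode is TtsMode.SKIP:
        return False
    if mode is TtsMode.FORCE_REGENERATE:
        return True
    return not audio_exists
```

A call such as `should_synthesize(TtsMode.FORCE_REGENERATE, audio_exists=True)` reads unambiguously, and the invalid combination "skip and force" can no longer be expressed.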
### Unclear return types (MEDIUM)
Some routes return `dict` or untyped responses instead of Pydantic response models. `routes_admin_production.py` has a few endpoints returning bare dicts.
---
## 8. Side-Effect Cascade Depth
`_async_translate_and_synthesize()` at 485 lines is the worst case: it writes to GCS, updates MongoDB, dispatches TTS tasks, sends notifications, and updates job status — 5+ distinct side-effect categories from a single function call. This warrants extraction into an orchestrator that delegates to named sink functions.
---
## Summary
| Check | Status | Severity |
|-------|--------|---------|
| God files (>500L) | 11 files | CRITICAL×2, HIGH×5 |
| Long methods (>100L) | 10 functions | CRITICAL×2, HIGH×5 |
| N+1 patterns | 3 potential | MEDIUM |
| Magic numbers | Some in tasks | MEDIUM |
| Method signatures | Boolean flags, unclear returns | MEDIUM |
| Side-effect cascade | translate_and_synthesize | HIGH |
**Primary recommendation:** Split `routes_jobs.py` and `QCDetail.tsx` — these two files account for the majority of the quality debt.

View file

@ -1,94 +0,0 @@
# Dependencies & Reuse Audit — ln-625
**Score: 7.5/10** | Issues: 9 (C:0 H:2 M:5 L:2)
**Date:** 2026-04-30
---
## 1. Vulnerability Scan (CVE/CVSS)
### Frontend — npm audit: ✓ CLEAN
```
Total packages: 479
Vulnerabilities: info:0 low:0 moderate:0 high:0 critical:0 total:0
```
Zero CVEs. Excellent.
### Backend — pip-audit: NOT RUN
`pip-audit` not installed in local env. Recommended to add to CI:
```bash
pip install pip-audit && pip-audit -r requirements.txt
```
Given many heavy deps (Celery 5.3, google-cloud-*, faster-whisper, aiohttp), a CI scan is strongly advised.
---
## 2. Outdated Packages
### Frontend — npm outdated (many minor/major updates pending)
**MAJOR version gaps (HIGH):**
| Package | Installed | Latest | Notes |
|---------|-----------|--------|-------|
| `@azure/msal-browser` | 4.25.0 | **5.9.0** | MSAL v5 has breaking API changes |
| `@azure/msal-react` | 3.0.20 | **5.3.2** | Paired with msal-browser, coordinated upgrade needed |
| `@sentry/react` | 8.55.0 | **10.51.0** | Sentry v10 has breaking changes |
| `typescript` | 5.8.3 | **6.0.3** | TS 6 has strictness changes |
| `vite` | 7.3.2 | **8.0.10** | Vite 8 breaking changes |
| `eslint` | 9.33.0 | **10.2.1** | ESLint 10 config format may change |
| `jsdom` | 26.1.0 | **29.1.1** | Test environment |
**Minor updates (LOW-MEDIUM):** Most other packages have minor/patch updates pending (react 19.1→19.2, tailwindcss 4.1→4.2, etc.)
**Recommendation:** Keep MSAL and Sentry on current major until dedicated upgrade sprint. React, TailwindCSS, react-query minor updates are safe to apply immediately.
### Backend — pip outdated: pip-audit not available
Based on pyproject.toml dates vs ecosystem:
- `ruff ^0.1.6` → installed ruff is `0.15.12` (already updated, good)
- `google-genai ^1.56.0` → recently updated per git log
- `faster-whisper ^1.2.0` → check for 1.x updates
---
## 3. Unused Dependencies
### Backend — `sendgrid` (MEDIUM)
`pyproject.toml` lists `sendgrid = "^6.11.0"`. However:
- The actual emailer (`app/services/emailer.py`) uses **Mailgun** REST API via `httpx`
- `sendgrid` is referenced **only** in `app/core/config.py` as a dead config field `sendgrid_api_key: str = ""` with comment `# Email (Mailgun — primary; sendgrid_api_key kept for backward compat)`
- No `import sendgrid` anywhere in app code
**Action:** Remove `sendgrid` from `pyproject.toml` dependencies and remove the `sendgrid_api_key` config field.
### Frontend — no unused dependencies found
- `axios` → used in `lib/api.ts`
- `@azure/msal-*` → used in `main.tsx`, `routes/Login.tsx`
- `date-fns` → used in 5+ components
- `zustand`, `@tanstack/react-query`, `react-hook-form`, `zod` → all actively used
- `react-dropzone` → used in upload components
---
## 4. Available Native Alternatives
### Frontend — axios vs fetch (LOW)
`axios` is used for all API calls in `lib/api.ts`. The project targets modern browsers and uses Vite. Native `fetch` + `AbortController` could replace axios, reducing bundle by ~14kb gzipped. However, axios provides request/response interceptors that are actively used for auth token refresh — migration effort is medium. **Not urgent.**
---
## 5. Custom Implementations
No custom crypto or hand-rolled validation libraries found. All auth uses `python-jose` + `libpass` (bcrypt). VTT parsing is domain-specific and not replaceable by a library. No concerns.
---
## Summary
| Check | Result | Severity |
|-------|--------|---------|
| Frontend CVEs | ✓ 0 vulnerabilities | OK |
| Backend CVEs | ⚠ Not scanned | MEDIUM |
| Frontend major updates | MSAL×2, Sentry, TS, Vite, ESLint | HIGH |
| Frontend minor updates | Many | LOW |
| Backend unused dep | `sendgrid` in pyproject.toml | MEDIUM |
| Native alternatives | axios → fetch possible | LOW |
| Custom implementations | None found | OK |

View file

@ -1,143 +0,0 @@
# Dead Code Audit — ln-626
**Score: 7.0/10** | Issues: 14 (C:0 H:0 M:6 L:8)
**Date:** 2026-04-30
---
## 1. Unused Imports (Python — F401)
ruff detected **58 unused import violations** across backend. Sample:
| File | Unused import |
|------|--------------|
| `routes_admin.py:9` | `get_current_user` |
| `routes_admin.py:11` | `verify_password` |
| `routes_admin.py:16` | `ChangePasswordRequest` |
| `routes_admin.py:23` | `log_security_event` |
| (+ 54 more across all files) | |
All are auto-fixable with `ruff check --fix --select F401`. The `__init__.py` files are correctly excluded via `per-file-ignores`.
**Severity: MEDIUM** — clutters imports, increases cognitive load when reading files.
---
## 2. Deprecated / Legacy Types (Frontend)
`frontend/src/types/api.ts` contains 3 deprecated exported types with JSDoc markers:
| Line | Type | Marker |
|------|------|--------|
| 96 | `TtsVoicesResponse` | `@deprecated Use ProviderVoicesResponse instead` |
| 137 | `TtsOptionsResponse` | `@deprecated Use ProviderOptionsResponse instead` |
| 555-566 | `Client` / `OrganizationLegacy` | `@deprecated Use Organization instead` + `export { Client as OrganizationLegacy }` |
These types are still exported, meaning consumers could use them by mistake. If no external consumers exist (library not published), they should be deleted.
**Severity: MEDIUM** — active deprecation markers indicate intent to remove. Leaving them causes confusion.
---
## 3. Legacy Status Values (Frontend)
`frontend/src/types/api.ts:12,14`:
```ts
| "tts_failed" // legacy: keep for back-compat
| "render_failed" // legacy: keep for back-compat
```
These job statuses are marked as legacy. If the backend no longer emits them, they are dead type branches. If it still does (for old jobs in MongoDB), they're valid — but should be clearly documented with a removal condition.
**Severity: LOW** — no runtime impact, but requires clarification.
---
## 4. Backward Compatibility Code (Frontend)
### lib/api.ts:239 — Legacy approval method (MEDIUM)
```ts
// Legacy method - calls approve_source for backwards compatibility
```
A backward-compat shim in the API client. If all callers have been updated to the new method, this should be removed.
### VideoWithCaptions.tsx:1643 — Legacy single-language props (MEDIUM)
```ts
// Legacy single-language props (still supported)
sourceLanguage?: string; // Language code for legacy props
// Legacy props
// Combine legacy props with tracks (use useMemo to prevent recreation)
```
The component maintains backward-compat with old single-language prop API. If no callers use these legacy props, they can be removed.
### JobDetail.tsx:41 — Legacy status mapping (LOW)
```ts
// Handle legacy approved_english/approved_source statuses (map to pending_final_review)
```
Status mapping shim for old job records. Should be removed after all existing jobs are migrated.
---
## 5. Commented-Out Code (Backend)
| File | Line | Content |
|------|------|---------|
| `telemetry/tracing.py:5` | `# from opentelemetry.exporter.gcp.trace import CloudTraceSpanExporter # Disabled for local dev` | GCP trace exporter disabled |
| `telemetry/metrics.py:5` | `# from opentelemetry.exporter.prometheus import PrometheusMetricReader # Disabled for local dev` | Prometheus reader disabled |
| `pyproject.toml` | `# opentelemetry-exporter-prometheus = ... # Temporarily disabled - version conflicts` | Dep commented out |
These are intentional (local dev vs prod config), not dead code. However, the conditional should be expressed via environment config, not source comments. **Low priority.**
**Severity: LOW**
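
One way to move that switch out of source comments is an environment-gated import. The sketch below is an assumption-laden illustration: `load_optional` and any flag name passed to it are hypothetical, and nothing in it is specific to OpenTelemetry.

```python
import importlib
import os


def load_optional(env_flag: str, module_path: str, attr: str):
    """Return module_path.attr when env_flag is truthy, else None."""
    if os.environ.get(env_flag, "").lower() not in {"1", "true", "yes"}:
        return None
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # The optional package is not installed in this environment.
        return None
    return getattr(module, attr, None)
```

With a flag of the team's choosing, the module path and class name from the commented-out import in `telemetry/tracing.py` would be passed as `module_path` and `attr`; local dev then leaves the flag unset and gets `None` instead of an edited source file.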
---
## 6. Leftover .old Files (MEDIUM)
| File | Age | Action |
|------|-----|--------|
| `docker-compose.yml.old` | Created 2026-03-03 (~2 months) | Delete |
| `backend/Dockerfile.old` | Created 2026-03-03 (~2 months) | Delete |
| `backend/.dockerignore.old` | — | Delete |
These files have no build references. Git history preserves them.
---
## 7. Unused Dockerfiles
| File | Referenced in compose? |
|------|----------------------|
| `backend/Dockerfile.ffmpeg-service` | No — ffmpeg is embedded in main worker |
| `backend/Dockerfile.cloudrun` | Yes — referenced for Cloud Run deploys |
| `backend/Dockerfile.whisper-service` | Yes — whisper-worker service in compose |
`Dockerfile.ffmpeg-service` appears to be dead — the main Dockerfile handles ffmpeg. Should be confirmed and deleted if unused.
**Severity: LOW**
---
## 8. Dead Config Field
`backend/app/core/config.py:272`:
```python
sendgrid_api_key: str = "" # Email (Mailgun — primary; sendgrid_api_key kept for backward compat)
```
`sendgrid` package not used. Config field and `secrets_config.py` secret reference both dead.
**Severity: MEDIUM** — misleads ops into configuring a sendgrid secret that has no effect.
---
## Summary
| Check | Issues | Severity |
|-------|--------|---------|
| Unused Python imports | 58 (auto-fixable) | MEDIUM |
| Deprecated TS types | 3 types | MEDIUM |
| Backward-compat shims | 3 in frontend | MEDIUM |
| Commented-out code | 3 telemetry lines | LOW |
| .old files | 3 files | MEDIUM |
| Unused Dockerfile | Dockerfile.ffmpeg-service | LOW |
| Dead config field | sendgrid_api_key | MEDIUM |
| Legacy status values | 2 status strings | LOW |

View file

@ -1,97 +0,0 @@
# Accessible Video Processing Platform — Project Entry Point
<!-- SCOPE: root | owner: ln-111 | generated: 2026-04-29 -->
## What Is This Project
AI-powered SaaS platform that generates legally-required accessibility assets from video files: closed captions, audio descriptions, SDH captions, and descriptive transcripts. Outputs are reviewed through a human QC workflow before client delivery. 50+ language translation and cultural transcreation are built in.
**Client:** Oliver Internal
**Server:** optical-web-1
**Status:** 85% production-ready
---
## Quick Navigation
| Need | Go to |
|------|-------|
| Architecture, data flow, state machine | [docs/project/architecture.md](docs/project/architecture.md) |
| Tech stack versions and config | [docs/project/tech_stack.md](docs/project/tech_stack.md) |
| API endpoint reference | [docs/project/api_spec.md](docs/project/api_spec.md) |
| Database collections and indexes | [docs/project/database_schema.md](docs/project/database_schema.md) |
| Infrastructure inventory | [docs/project/infrastructure.md](docs/project/infrastructure.md) |
| Runbook — deploy, restart, rollback | [docs/project/runbook.md](docs/project/runbook.md) |
| Functional requirements | [docs/project/requirements.md](docs/project/requirements.md) |
| Development principles | [docs/principles.md](docs/principles.md) |
| Reference — ADRs, guides, research | [docs/reference/README.md](docs/reference/README.md) |
| Task management | [docs/tasks/README.md](docs/tasks/README.md) |
| Test strategy and commands | [tests/README.md](tests/README.md) |
| Documentation hub | [docs/README.md](docs/README.md) |
---
## Entry Points by Audience
| Audience | Start here |
|----------|-----------|
| New developer | [docs/project/runbook.md](docs/project/runbook.md) → local setup section |
| Reviewer / QC | [docs/project/requirements.md](docs/project/requirements.md) → QC workflow section |
| DevOps | [docs/project/infrastructure.md](docs/project/infrastructure.md) + [docs/project/runbook.md](docs/project/runbook.md) |
| Security reviewer | [docs/project/architecture.md](docs/project/architecture.md) → security section |
| AI agent | Read this file → pick topic → read `_index`-equivalent doc → synthesize |
---
## Core Pipeline (one-line summary per stage)
| Stage | What happens | Key file |
|-------|-------------|---------|
| Upload | MP4 → GCS + MongoDB job record | `routes_files.py` |
| Ingestion | Celery worker transcribes with Gemini 2.5 Pro | `tasks/ingest_and_ai.py` |
| AI Processing | VTT generated, validated, stored in GCS | `services/gemini.py` |
| QC Review | Reviewer edits VTT, approves or rejects | `services/language_qc.py` |
| Translation | Google Translate + transcreation per language | `tasks/translate_and_synthesize.py` |
| TTS | Per-cue audio synthesis (Google TTS / ElevenLabs) | `services/tts.py` |
| Final Review | PM approves deliverables | `routes_language_qc.py` |
| Delivery | Signed GCS URLs emailed to client | `services/emailer.py` |
See full state machine (16 states) in [docs/project/architecture.md](docs/project/architecture.md#job-state-machine).
---
## Development Commands
| Action | Command |
|--------|---------|
| Start local (Docker + Vite) | `./scripts/run-local.sh` |
| Rebuild after code change | `./scripts/run-local.sh --rebuild` |
| Stop all local services | `./scripts/run-local.sh --stop` |
| Backend lint | `cd backend && ruff check .` |
| Backend type-check | `cd backend && mypy .` (run in Docker container) |
| Frontend lint | `cd frontend && npm run lint` |
| Frontend type-check | `cd frontend && npm run type-check` |
| Backend tests | `cd backend && poetry run pytest` |
| Frontend tests | `cd frontend && npm run test` |
| E2E tests | `cd frontend && npm run test:e2e` |
---
## Key Constraints
- **NO SSH to optical-web-1** without explicit user instruction — hard rule in CLAUDE.md
- **Access tokens in memory only** (not localStorage) — auth architecture constraint
- **Refresh tokens in HttpOnly cookies** — security requirement
- **Signed GCS URLs** expire in 24h — do not cache or store URLs
- **RBAC enforced server-side** — never trust client-supplied role claims
- **All reviewer actions emit audit log entries** — compliance requirement
---
## Maintenance
**Update triggers:** New route added, deployment target changes, key dependency version change, new team member onboarded.
**Verification:** All links in Quick Navigation resolve. Entry commands are correct against current scripts/.
<!-- END SCOPE: root -->

View file

@ -1,8 +1,5 @@
# Accessible Video Processing Platform - Development Guide
<!-- Documentation entry point: see @AGENTS.md for full project navigation -->
@AGENTS.md
## Project Overview
This is a comprehensive video accessibility platform that automatically generates closed captions and audio descriptions using AI, with quality control workflows and multi-language support.

Binary file not shown.

View file

@ -2,8 +2,6 @@
A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. Features a complete workflow from video upload to final delivery with quality control processes.
**Documentation:** See [AGENTS.md](AGENTS.md) for full navigation, or [docs/README.md](docs/README.md) for the documentation hub.
## ✅ Current Status: **Production-Ready** (85% Complete)
**Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend)

View file

@ -1,96 +1,172 @@
# =============================================================================
# Apache config fragment — Accessible Video Platform
# Inject into: /etc/apache2/sites-available/optical-dev.oliver.solutions-ssl.conf
#
# Required modules:
# sudo a2enmod proxy proxy_http proxy_wstunnel rewrite headers
#
# Container port map:
# accessible-video-api → 0.0.0.0:8012->8000/tcp
# Apache Configuration for Accessible Video Platform
# =============================================================================
# Add this configuration to your existing VirtualHost for ai-sandbox.oliver.solutions
# Location: /etc/apache2/sites-available/ai-sandbox.oliver.solutions-ssl.conf
# =============================================================================
# ── Timeouts for large video uploads (up to 2 GB, ~10 min) ──────────────────
<IfModule mod_proxy.c>
ProxyTimeout 600
</IfModule>
# -----------------------------------------------------------------------------
# Frontend - Static React SPA served from subdirectory
# -----------------------------------------------------------------------------
# ── WebSocket proxy (MUST be before /api/ HTTP proxy) ───────────────────────
# disablereuse=on prevents long-lived WS connections from exhausting the pool
ProxyPassMatch ^/video-accessibility/api/v1/ws/(.*)$ ws://127.0.0.1:8012/api/v1/ws/$1 disablereuse=on
ProxyPassReverse /video-accessibility/api/v1/ws/ ws://127.0.0.1:8012/api/v1/ws/
# ── API proxy ────────────────────────────────────────────────────────────────
# Strips /video-accessibility prefix — FastAPI sees /api/v1/...
ProxyPassMatch ^/video-accessibility/api/(.*)$ http://127.0.0.1:8012/api/$1
ProxyPassReverse /video-accessibility/api/ http://127.0.0.1:8012/api/
# Swagger / OpenAPI
ProxyPassMatch ^/video-accessibility/docs(/.*)?$ http://127.0.0.1:8012/docs$1
ProxyPassReverse /video-accessibility/docs http://127.0.0.1:8012/docs
ProxyPassMatch ^/video-accessibility/openapi\.json$ http://127.0.0.1:8012/openapi.json
ProxyPassReverse /video-accessibility/openapi.json http://127.0.0.1:8012/openapi.json
# ── SPA static files ─────────────────────────────────────────────────────────
# Serve frontend from /video-accessibility subdirectory
Alias /video-accessibility /var/www/html/video-accessibility
<Directory /var/www/html/video-accessibility>
# Basic options
Options -Indexes +FollowSymLinks
AllowOverride None
AllowOverride All
Require all granted
# Allow video uploads up to 2 GB
LimitRequestBody 2147483648
# React SPA routing - rewrite all requests to index.html
RewriteEngine On
RewriteBase /video-accessibility/
RewriteBase /video-accessibility
# Serve real files/directories directly (JS, CSS, assets, fonts)
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
# Don't rewrite files or directories that exist
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else → index.html (React Router handles client-side nav)
RewriteRule ^ index.html [L]
# Cache-bust hashed assets indefinitely; never cache HTML
<FilesMatch "\.(js|css|woff2?|ttf|eot|png|jpg|jpeg|gif|ico|svg)$">
Header set Cache-Control "public, max-age=31536000, immutable"
</FilesMatch>
<FilesMatch "\.html$">
Header set Cache-Control "no-cache, no-store, must-revalidate"
</FilesMatch>
# Rewrite everything else to index.html
RewriteRule ^ /video-accessibility/index.html [L]
# Security headers
Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff"
Header always set X-XSS-Protection "1; mode=block"
Header always set Referrer-Policy "strict-origin-when-cross-origin"
# Cache control for static assets
<FilesMatch "\.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$">
Header set Cache-Control "public, max-age=31536000, immutable"
</FilesMatch>
# No cache for HTML files
<FilesMatch "\.(html)$">
Header set Cache-Control "no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
Header set Expires "0"
</FilesMatch>
</Directory>
# -----------------------------------------------------------------------------
# Backend API - Reverse proxy to Docker container
# -----------------------------------------------------------------------------
# Proxy backend API to Docker container on port 8000
<Location /video-accessibility-back>
# Preserve original host header
ProxyPreserveHost On
# Proxy HTTP requests
ProxyPass http://localhost:8000
ProxyPassReverse http://localhost:8000
# Proxy timeout settings (important for long-running video processing)
ProxyTimeout 300
# WebSocket support (CRITICAL for real-time job updates)
RewriteEngine On
RewriteCond %{HTTP:Upgrade} =websocket [NC]
RewriteRule /video-accessibility-back/(.*) ws://localhost:8000/$1 [P,L]
RewriteCond %{HTTP:Upgrade} !=websocket [NC]
RewriteRule /video-accessibility-back/(.*) http://localhost:8000/$1 [P,L]
# Security headers
Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff"
# CORS is handled by the backend, don't add headers here
</Location>
# -----------------------------------------------------------------------------
# Required Apache Modules
# -----------------------------------------------------------------------------
# Enable these modules with:
# sudo a2enmod rewrite
# sudo a2enmod proxy
# sudo a2enmod proxy_http
# sudo a2enmod proxy_wstunnel
# sudo a2enmod headers
# sudo systemctl restart apache2
# Verify modules are enabled:
# apache2ctl -M | grep -E '(rewrite|proxy|headers)'
# =============================================================================
# Full VirtualHost skeleton (reference — values match optical-web-1)
# Full VirtualHost Example
# =============================================================================
# Example of complete VirtualHost configuration:
#
# <VirtualHost *:443>
# ServerName optical-dev.oliver.solutions
# ServerName ai-sandbox.oliver.solutions
# ServerAdmin admin@oliver.solutions
#
# DocumentRoot /var/www/html
#
# # SSL Configuration (with wildcard cert)
# SSLEngine on
# SSLCertificateFile /path/to/wildcard.crt
# SSLCertificateKeyFile /path/to/wildcard.key
# SSLCertificateFile /path/to/wildcard-ai-sandbox.oliver.solutions.crt
# SSLCertificateKeyFile /path/to/wildcard-ai-sandbox.oliver.solutions.key
# SSLCertificateChainFile /path/to/chain.crt # If needed
#
# SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
# # SSL Protocol and Cipher settings
# SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
# SSLCipherSuite HIGH:!aNULL:!MD5
#
# # — paste the block above here —
# # Frontend configuration (from above)
# Alias /video-accessibility /var/www/html/video-accessibility
# <Directory /var/www/html/video-accessibility>
# ...
# </Directory>
#
# ErrorLog ${APACHE_LOG_DIR}/optical-dev-error.log
# CustomLog ${APACHE_LOG_DIR}/optical-dev-access.log combined
# # Backend API configuration (from above)
# <Location /video-accessibility-back>
# ...
# </Location>
#
# # Logging
# ErrorLog ${APACHE_LOG_DIR}/ai-sandbox-error.log
# CustomLog ${APACHE_LOG_DIR}/ai-sandbox-access.log combined
# </VirtualHost>
# =============================================================================
# Verify
# Testing & Verification
# =============================================================================
# sudo apache2ctl configtest
# sudo systemctl reload apache2
# curl -I https://optical-dev.oliver.solutions/video-accessibility/
# curl https://optical-dev.oliver.solutions/video-accessibility/api/v1/health
# wscat -c wss://optical-dev.oliver.solutions/video-accessibility/api/v1/ws/job-list
# Test Apache configuration:
# sudo apache2ctl configtest
#
# Restart Apache:
# sudo systemctl restart apache2
#
# Test frontend:
# curl -I https://ai-sandbox.oliver.solutions/video-accessibility
#
# Test backend:
# curl https://ai-sandbox.oliver.solutions/video-accessibility-back/health
#
# Test WebSocket (requires wscat):
# wscat -c wss://ai-sandbox.oliver.solutions/video-accessibility-back/api/v1/ws/job-list
# =============================================================================
# Troubleshooting
# =============================================================================
# Check Apache logs:
# sudo tail -f /var/log/apache2/ai-sandbox-error.log
# sudo tail -f /var/log/apache2/ai-sandbox-access.log
#
# Check if backend is running:
# curl http://localhost:8000/health
#
# Check Docker containers:
# cd /opt/accessible-video
# docker-compose ps
#
# Common issues:
# - 502 Bad Gateway: Backend container not running
# - 404 Not Found: Frontend not deployed or Apache alias incorrect
# - WebSocket fails: mod_proxy_wstunnel not enabled
# - CORS errors: Check backend CORS configuration, not Apache
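The "Common issues" list above can be turned into a small lookup when scripting the verification steps. This is a minimal sketch, not part of the repository: `LIKELY_CAUSES` and `diagnose` are hypothetical names, and only the 502 and 404 entries come from the list above.

```python
# Hypothetical helper: maps an HTTP status returned by the curl checks above
# to the likely cause named in the "Common issues" list.
LIKELY_CAUSES = {
    502: "Backend container not running",
    404: "Frontend not deployed or Apache alias incorrect",
}


def diagnose(status_code: int) -> str:
    """Return a short hint for a status code seen while testing the deployment."""
    if 200 <= status_code < 400:
        return "OK"
    # Anything not in the list is left to the Apache logs rather than guessed at.
    return LIKELY_CAUSES.get(status_code, "Unlisted status; check the Apache error log")
```

Calling `diagnose(502)` would point at the backend container, which matches the first troubleshooting entry.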

92
backend/.dockerignore.old Normal file
View file

@ -0,0 +1,92 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Poetry (keep poetry.lock for reproducible builds)
# poetry.lock
# Virtual environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Testing
.coverage
.pytest_cache/
.mypy_cache/
.tox/
htmlcov/
coverage.xml
*.cover
.hypothesis/
# Documentation
docs/
*.md
README*
# Logs
*.log
logs/
# Git
.git/
.gitignore
# Docker
Dockerfile*
.dockerignore
docker-compose*
# CI/CD
.github/
# Local development
.env.local
.env.development
.env.test
# Temporary files
tmp/
temp/
*.tmp
*.bak

1
backend/.gitignore vendored
View file

@ -23,7 +23,6 @@ eggs/
.eggs/
lib/
lib64/
!app/lib/
parts/
sdist/
var/

View file

@ -3,8 +3,8 @@
# =============================================================================
# Stage 1: Builder - Install dependencies
# Stage 2: Base - Common runtime for API and Worker
# Stage 3: API - FastAPI + Gunicorn (no ffmpeg — heavy tasks run on Cloud Run Jobs)
# Stage 4: Worker - Celery worker, lightweight queues only (notify, embed)
# Stage 3: API - FastAPI + Gunicorn (with ffmpeg for TTS audio conversion)
# Stage 4: Worker - Celery worker (with ffmpeg for video processing)
# =============================================================================
# -----------------------------------------------------------------------------
@ -19,7 +19,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Install Poetry
RUN pip install --no-cache-dir poetry==2.1.4
RUN pip install --no-cache-dir poetry==1.8.2
# Configure Poetry to not create virtual environment (we're in a container)
ENV POETRY_NO_INTERACTION=1 \
@ -33,7 +33,7 @@ COPY pyproject.toml poetry.lock ./
# Install dependencies using Poetry directly (simpler and more reliable)
RUN poetry config virtualenvs.create false \
&& poetry install --only main --no-root --no-interaction --no-ansi \
&& poetry install --only main --no-interaction --no-ansi \
&& rm -rf $POETRY_CACHE_DIR
# -----------------------------------------------------------------------------
@ -46,7 +46,6 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
libmagic1 \
curl \
tini \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
@ -73,10 +72,21 @@ USER app
# -----------------------------------------------------------------------------
# Stage 3: API - FastAPI + Gunicorn (Production API Server)
# Heavy pipeline tasks (ingest/translate/render) run on Cloud Run Jobs
# -----------------------------------------------------------------------------
FROM base AS api
# Switch to root to install ffmpeg
USER root
# Install ffmpeg for TTS audio conversion
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Set production environment variables
ENV APP_ENV=prod
@ -94,10 +104,22 @@ ENTRYPOINT ["tini", "--"]
CMD ["gunicorn", "-c", "gunicorn_conf.py", "app.main:app"]
# -----------------------------------------------------------------------------
# Stage 4: Worker - Celery Worker (lightweight queues: notify, embed)
# Stage 4: Worker - Celery Worker (with ffmpeg for video processing)
# -----------------------------------------------------------------------------
FROM base AS worker
# Switch back to root to install ffmpeg
USER root
# Install ffmpeg for video processing
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Set production environment variables
# WORKER_CONCURRENCY can be overridden at runtime (default: 8)
ENV APP_ENV=prod \
@ -126,6 +148,18 @@ CMD celery -A celery_worker worker \
# -----------------------------------------------------------------------------
FROM base AS whisper-worker
# Switch back to root to install ffmpeg
USER root
# Install ffmpeg for audio extraction
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Pre-download Whisper medium model during build to avoid cold start delays
# Model is cached in ~/.cache/huggingface/hub (~1.5GB)
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('medium', device='cpu', compute_type='int8')"

View file

@ -1,55 +0,0 @@
# =============================================================================
# Cloud Run Job image — va-worker
#
# Reuses the multi-stage base from Dockerfile.
# Entrypoint: python -m app.tasks.runner --task <name> --job-id <id>
#
# Build:
# docker build -f backend/Dockerfile.cloudrun -t va-worker backend/
# =============================================================================
# ── Stage 1: Builder ─────────────────────────────────────────────────────────
FROM python:3.11-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential curl \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir poetry==1.8.3
WORKDIR /app
COPY pyproject.toml poetry.lock ./
RUN poetry config virtualenvs.create false \
&& poetry install --no-interaction --no-ansi --only main
# ── Stage 2: Runtime ─────────────────────────────────────────────────────────
FROM python:3.11-slim AS runtime
# ffmpeg required for video rendering tasks
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
tini \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
WORKDIR /app
COPY . .
# Non-root user for security
RUN groupadd -r worker && useradd -r -g worker worker \
&& chown -R worker:worker /app
USER worker
# Cloud Run Jobs: no persistent HTTP port needed.
# Cloud Run passes CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT env vars.
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONPATH=/app
ENTRYPOINT ["tini", "--", "python", "-m", "app.tasks.runner"]
# Args are injected per-execution via Cloud Run Job overrides:
# --task ingest|translate|render|rerender --job-id <id> [--language <lang>] ...

127
backend/Dockerfile.old Normal file
View file

@ -0,0 +1,127 @@
# Build stage - Install dependencies and build wheels
FROM python:3.11-slim AS builder
# Install build dependencies
RUN apt-get update && apt-get install -y \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Poetry
RUN pip install poetry==1.8.2
# Set Poetry configuration
ENV POETRY_NO_INTERACTION=1 \
POETRY_VENV_IN_PROJECT=1 \
POETRY_CACHE_DIR=/tmp/poetry_cache
WORKDIR /app
# Copy dependency files
COPY pyproject.toml poetry.lock ./
# Install dependencies into venv
RUN poetry config virtualenvs.in-project true && \
poetry lock --no-update || true && \
poetry install --only=main --no-root && \
rm -rf $POETRY_CACHE_DIR
# Base runtime stage
FROM python:3.11-slim AS base
# Install runtime system dependencies
RUN apt-get update && apt-get install -y \
ffmpeg \
curl \
tini \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Create non-root user
RUN groupadd --gid 1000 app \
&& useradd --uid 1000 --gid app --shell /bin/bash --create-home app
# Set working directory
WORKDIR /app
# Copy virtual environment from builder stage
COPY --from=builder --chown=app:app /app/.venv /app/.venv
# Ensure venv is in PATH
ENV PATH="/app/.venv/bin:$PATH"
# Copy application code
COPY --chown=app:app . .
# Switch to non-root user
USER app
# Production API stage
FROM base AS production
# Set environment variables for production
ENV APP_ENV=prod \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Expose port
EXPOSE 8000
# Use tini as init system for proper signal handling
ENTRYPOINT ["tini", "--"]
# Default command for API server
CMD ["gunicorn", "-c", "gunicorn_conf.py"]
# Worker stage for Celery workers
FROM base AS worker
# Set environment variables for worker
ENV APP_ENV=prod \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
C_FORCE_ROOT=1
# Health check for worker (check if Celery is responding)
HEALTHCHECK --interval=60s --timeout=15s --start-period=10s --retries=3 \
CMD python -c "from celery import Celery; app=Celery('app'); print('Worker healthy')" || exit 1
# Use tini as init system for proper signal handling
ENTRYPOINT ["tini", "--"]
# Default command for Celery worker
CMD ["celery", "-A", "app.tasks", "worker", "--loglevel=info", "--concurrency=1"]
# Development stage with dev dependencies
FROM builder AS development
# Install all dependencies including dev
RUN poetry install --no-root && rm -rf $POETRY_CACHE_DIR
# Install additional dev tools
RUN apt-get update && apt-get install -y \
git \
vim \
&& rm -rf /var/lib/apt/lists/*
# Copy application code
COPY --chown=app:app . .
# Switch to non-root user
USER app
# Set environment for development
ENV APP_ENV=dev \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1
EXPOSE 8000
# Development command with hot reload
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]

View file

@ -22,7 +22,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Install Poetry
RUN pip install --no-cache-dir poetry==2.1.4
RUN pip install --no-cache-dir poetry==1.8.2
# Configure Poetry to not create virtual environment
ENV POETRY_NO_INTERACTION=1 \
@ -36,7 +36,7 @@ COPY pyproject.toml poetry.lock ./
# Install dependencies
RUN poetry config virtualenvs.create false \
&& poetry install --only main --no-root --no-interaction --no-ansi \
&& poetry install --only main --no-interaction --no-ansi \
&& rm -rf $POETRY_CACHE_DIR
# -----------------------------------------------------------------------------

Binary file not shown.

Binary file not shown.

View file

@ -1,28 +1,26 @@
from datetime import datetime, timedelta
from typing import Optional
from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_membership_context
from ...core.database import get_database
from ...core.dependencies import get_current_user, require_roles
from ...core.logging import get_logger
from ...core.security import get_password_hash
from ...models.audit_log import AuditAction, AuditLogQuery, AuditLogResponse
from ...core.security import get_password_hash, verify_password
from ...models.user import User, UserRole
from ...models.audit_log import AuditAction, AuditLogQuery, AuditLogResponse
from ...schemas.auth import (
AdminStatsResponse,
ChangePasswordRequest,
CreateUserRequest,
ResetPasswordRequest,
UpdateUserRequest,
UserListResponse,
UserResponse,
)
from ...services.audit_logger import (
audit_logger,
log_user_management,
)
from ...services.audit_logger import audit_logger, log_user_management, log_security_event
from ...telemetry import app_metrics
logger = get_logger(__name__)
@ -32,49 +30,29 @@ router = APIRouter(prefix="/admin", tags=["admin"])
@router.get("/users", response_model=UserListResponse)
async def list_users(
page: int = Query(1, ge=1),
size: int = Query(20, ge=1, le=500),
role: str | None = Query(None, description="Single role or comma-separated list, e.g. 'linguist,admin'"),
size: int = Query(20, ge=1, le=100),
role: Optional[str] = Query(None),
active_only: bool = Query(True),
org_id: str | None = Query(None, description="Filter by org (platform admin only)"),
current_user: User = Depends(require_roles(UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List users with filtering and pagination (admin only)"""
query: dict = {}
query = {}
if role:
roles = [r.strip() for r in role.split(",") if r.strip()]
query["role"] = {"$in": roles} if len(roles) > 1 else roles[0]
query["role"] = role
if active_only:
query["is_active"] = True
if not ctx.is_platform_admin:
# Org-scoped admin: show only users in their org(s) via membership collection
accessible_org_ids = ctx.accessible_org_ids()
if not accessible_org_ids:
return UserListResponse(users=[], total=0, page=page, size=size)
member_ids_cursor = db.memberships.find(
{"organization_id": {"$in": accessible_org_ids}},
{"user_id": 1},
)
member_ids = [doc["user_id"] async for doc in member_ids_cursor]
query["_id"] = {"$in": member_ids}
elif org_id:
# Platform admin filtered to a specific org
member_ids_cursor = db.memberships.find({"organization_id": org_id}, {"user_id": 1})
member_ids = [doc["user_id"] async for doc in member_ids_cursor]
query["_id"] = {"$in": member_ids}
# Get total count
total = await db.users.count_documents(query)
# Get paginated results
skip = (page - 1) * size
cursor = db.users.find(query, {"hashed_password": 0}).sort("created_at", -1).skip(skip).limit(size)
users = await cursor.to_list(length=size)
user_responses = []
for user_doc in users:
user_responses.append(UserResponse(
@ -86,9 +64,8 @@ async def list_users(
is_active=user_doc["is_active"],
created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=user_doc.get("pm_client_ids", []),
languages=user_doc.get("languages", []),
))
return UserListResponse(
users=user_responses,
total=total,
@ -97,32 +74,6 @@ async def list_users(
)
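One side of this hunk accepts a comma-separated `role` parameter and turns it into a MongoDB `$in` filter. The sketch below isolates that parsing step so it can be read and tested on its own. `build_role_filter` is a hypothetical name, and the guard against an input made only of commas is an addition that is not in the diff.

```python
from typing import Optional


def build_role_filter(role: Optional[str]) -> dict:
    """Build the `role` part of a MongoDB query from a single role or a comma-separated list."""
    query: dict = {}
    if role:
        # Split on commas and drop empty fragments such as the one in "admin,".
        roles = [r.strip() for r in role.split(",") if r.strip()]
        if len(roles) > 1:
            query["role"] = {"$in": roles}
        elif roles:
            # Assumption: input like "," yields no filter instead of an IndexError.
            query["role"] = roles[0]
    return query
```

For example, `build_role_filter("linguist,admin")` produces an `$in` filter over both roles, while a single role stays a plain equality match.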
@router.get("/brief-assignees", response_model=list[UserResponse])
async def list_brief_assignees(
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Return users who can be assigned a brief (PM, production, admin). Accessible to all brief-creating roles."""
docs = await db.users.find(
{
"role": {"$in": [UserRole.ADMIN.value, UserRole.PROJECT_MANAGER.value, UserRole.PRODUCTION.value]},
"is_active": True,
},
{"hashed_password": 0},
).sort("full_name", 1).to_list(None)
return [UserResponse(
id=str(d["_id"]),
email=d["email"],
full_name=d["full_name"],
role=d["role"],
auth_provider=d.get("auth_provider", "local"),
is_active=d["is_active"],
created_at=d.get("created_at", datetime.utcnow()).isoformat() if d.get("created_at") else None,
pm_client_ids=d.get("pm_client_ids", []),
languages=d.get("languages", []),
) for d in docs]
@router.get("/users/{user_id}", response_model=UserResponse)
async def get_user(
user_id: str,
@ -136,7 +87,7 @@ async def get_user(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found"
)
return UserResponse(
id=str(user_doc["_id"]),
email=user_doc["email"],
@ -146,7 +97,6 @@ async def get_user(
is_active=user_doc["is_active"],
created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=user_doc.get("pm_client_ids", []),
languages=user_doc.get("languages", []),
)
@ -165,7 +115,7 @@ async def create_user(
status_code=status.HTTP_400_BAD_REQUEST,
detail="User with this email already exists"
)
# Create user document
user_id = str(ObjectId())
user_doc = {
@ -179,12 +129,12 @@ async def create_user(
"created_at": datetime.utcnow(),
"updated_at": datetime.utcnow()
}
await db.users.insert_one(user_doc)
# Record metrics
app_metrics.record_auth_attempt("user_created", user_data.role.value)
logger.info(f"Admin {current_user.id} created user {user_id} with role {user_data.role.value}")
await log_user_management(
AuditAction.USER_CREATE, user_id, current_user, request,
@ -200,7 +150,6 @@ async def create_user(
is_active=True,
created_at=user_doc["created_at"].isoformat(),
pm_client_ids=[],
languages=[],
)
@ -220,7 +169,7 @@ async def update_user(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found"
)
# Check if email is being changed and doesn't conflict
if user_update.email and user_update.email != user_doc["email"]:
existing_user = await db.users.find_one({"email": user_update.email, "_id": {"$ne": user_id}})
@ -229,10 +178,10 @@ async def update_user(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Email already in use by another user"
)
# Build update document
update_data = {"updated_at": datetime.utcnow()}
if user_update.email:
update_data["email"] = user_update.email
if user_update.full_name:
@ -241,19 +190,19 @@ async def update_user(
update_data["role"] = user_update.role.value
if user_update.is_active is not None:
update_data["is_active"] = user_update.is_active
# Update user
result = await db.users.find_one_and_update(
{"_id": user_id},
{"$set": update_data},
return_document=True
)
logger.info(f"Admin {current_user.id} updated user {user_id}")
action = AuditAction.USER_ROLE_CHANGE if user_update.role else AuditAction.USER_UPDATE
await log_user_management(
action, user_id, current_user, request,
details=dict(user_update.dict(exclude_none=True).items()),
details={k: v for k, v in user_update.dict(exclude_none=True).items()},
)
return UserResponse(
@ -265,7 +214,6 @@ async def update_user(
is_active=result["is_active"],
created_at=result.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=result.get("pm_client_ids", []),
languages=result.get("languages", []),
)
@ -282,7 +230,7 @@ async def deactivate_user(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Cannot deactivate your own account"
)
result = await db.users.update_one(
{"_id": user_id},
{
@ -292,13 +240,13 @@ async def deactivate_user(
}
}
)
if result.matched_count == 0:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found"
)
logger.info(f"Admin {current_user.id} deactivated user {user_id}")
await log_user_management(AuditAction.USER_DEACTIVATE, user_id, current_user, request)
@ -316,10 +264,10 @@ async def admin_reset_password(
# Generate temporary password
import secrets
import string
temp_password = ''.join(secrets.choice(string.ascii_letters + string.digits) for _ in range(12))
hashed_password = get_password_hash(temp_password)
result = await db.users.update_one(
{"_id": user_id},
{
@ -329,15 +277,15 @@ async def admin_reset_password(
}
}
)
if result.matched_count == 0:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found"
)
logger.info(f"Admin {current_user.id} reset password for user {user_id}")
# In production, send email with temp password instead of returning it
return {
"message": "Password reset successfully",
@ -353,23 +301,23 @@ async def get_admin_stats(
"""Get system statistics (production/admin only)"""
# Get user count
total_users = await db.users.count_documents({"is_active": True})
# Get job counts
total_jobs = await db.jobs.count_documents({})
# Get jobs by status
pipeline = [
{"$group": {"_id": "$status", "count": {"$sum": 1}}}
]
status_counts = await db.jobs.aggregate(pipeline).to_list(None)
jobs_by_status = {item["_id"]: item["count"] for item in status_counts}
# Get jobs created today
today_start = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
active_jobs_today = await db.jobs.count_documents({
"created_at": {"$gte": today_start}
})
# Calculate average processing time for completed jobs
avg_processing_pipeline = [
{"$match": {"status": "completed", "created_at": {"$exists": True}, "updated_at": {"$exists": True}}},
@ -390,10 +338,10 @@ async def get_admin_stats(
}
}
]
avg_result = await db.jobs.aggregate(avg_processing_pipeline).to_list(None)
avg_processing_time = avg_result[0]["avg_processing_time"] if avg_result else 0.0
return AdminStatsResponse(
total_users=total_users,
total_jobs=total_jobs,
@ -414,7 +362,7 @@ async def detailed_health_check(
"timestamp": datetime.utcnow().isoformat(),
"components": {}
}
# Check MongoDB
try:
await db.command("ping")
@ -422,7 +370,7 @@ async def detailed_health_check(
except Exception as e:
health_status["components"]["mongodb"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded"
# Check Redis (via import to avoid circular dependency)
try:
from ...core.redis import redis_client
@ -434,23 +382,23 @@ async def detailed_health_check(
except Exception as e:
health_status["components"]["redis"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded"
# Check GCS (basic check)
try:
from ...services.gcs import gcs_service
# Simple check to see if bucket is accessible
await gcs_service.file_exists("health_check_dummy") # This will return False but won't error if bucket accessible
bucket_exists = await gcs_service.file_exists("health_check_dummy") # This will return False but won't error if bucket accessible
health_status["components"]["gcs"] = {"status": "healthy"}
except Exception as e:
health_status["components"]["gcs"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded"
# Check job queue health
try:
from ...tasks import celery_app
inspect = celery_app.control.inspect()
active_tasks = inspect.active()
if active_tasks:
total_active = sum(len(tasks) for tasks in active_tasks.values())
health_status["components"]["celery"] = {
@ -467,7 +415,7 @@ async def detailed_health_check(
except Exception as e:
health_status["components"]["celery"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded"
return health_status
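The health check above follows one pattern for every component: run a probe, record `healthy` or `unhealthy` with the error text, and downgrade the overall status to `degraded` on any failure. A minimal synchronous sketch of that pattern follows. `aggregate_health` is a hypothetical name, the probes are plain callables rather than the async calls used in the endpoint, and the initial overall status of `"healthy"` is an assumption because that line is not shown in the hunk.

```python
from datetime import datetime, timezone
from typing import Callable, Dict


def aggregate_health(checks: Dict[str, Callable[[], None]]) -> dict:
    """Run each probe and fold the results into one report."""
    report = {
        "status": "healthy",  # assumed starting value
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "components": {},
    }
    for name, check in checks.items():
        try:
            check()  # a probe signals failure by raising
            report["components"][name] = {"status": "healthy"}
        except Exception as exc:
            report["components"][name] = {"status": "unhealthy", "error": str(exc)}
            report["status"] = "degraded"
    return report
```

Keeping the loop in one place avoids repeating the same `try`/`except` block per component, at the cost of component-specific details such as the Celery active-task count.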
@ -479,18 +427,18 @@ async def get_job_statistics(
):
"""Get job processing statistics (reviewer/production/admin only)"""
since_date = datetime.utcnow() - timedelta(days=days)
# Jobs created in period
jobs_in_period = await db.jobs.count_documents({
"created_at": {"$gte": since_date}
})
# Jobs completed in period
jobs_completed = await db.jobs.count_documents({
"status": "completed",
"updated_at": {"$gte": since_date}
})
# Average processing time for completed jobs
avg_pipeline = [
{
@ -519,12 +467,12 @@ async def get_job_statistics(
}
}
]
avg_result = await db.jobs.aggregate(avg_pipeline).to_list(None)
processing_stats = avg_result[0] if avg_result else {
"avg_time": 0, "min_time": 0, "max_time": 0
}
# Current queue status
current_queue_stats = {}
pipeline = [
@ -533,7 +481,7 @@ async def get_job_statistics(
status_counts = await db.jobs.aggregate(pipeline).to_list(None)
for item in status_counts:
current_queue_stats[item["_id"]] = item["count"]
return {
"period_days": days,
"jobs_created": jobs_in_period,
@ -558,7 +506,7 @@ async def admin_force_password_reset(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Cannot reset your own password this way"
)
# Check if user exists
user_doc = await db.users.find_one({"_id": user_id})
if not user_doc:
@ -566,15 +514,15 @@ async def admin_force_password_reset(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found"
)
# Generate secure temporary password
import secrets
import string
temp_password = ''.join(secrets.choice(
string.ascii_letters + string.digits + "!@#$%"
) for _ in range(16))
# Update password
await db.users.update_one(
{"_id": user_id},
@ -585,10 +533,10 @@ async def admin_force_password_reset(
}
}
)
# TODO: In production, send via secure email instead of returning password
logger.info(f"Admin {current_user.id} reset password for user {user_id}")
return {
"message": "Password reset successfully",
"temporary_password": temp_password,
@ -596,6 +544,47 @@ async def admin_force_password_reset(
}
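Both reset endpoints build the temporary password inline with `secrets.choice`. The sketch below pulls that into one function so the length and alphabet are stated in a single place. `generate_temp_password` is a hypothetical name; the defaults mirror the 16-character variant with `!@#$%` shown above.

```python
import secrets
import string


def generate_temp_password(length: int = 16, extra_symbols: str = "!@#$%") -> str:
    """Return a random password drawn from letters, digits and the given symbols."""
    alphabet = string.ascii_letters + string.digits + extra_symbols
    # secrets.choice uses the OS CSPRNG, unlike random.choice.
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Calling `generate_temp_password(12, "")` would reproduce the shorter letters-and-digits form used by `admin_reset_password`.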
@router.get("/audit-logs")
async def get_audit_logs(
job_id: Optional[str] = Query(None),
action: Optional[str] = Query(None),
days: int = Query(7, ge=1, le=90),
page: int = Query(1, ge=1),
size: int = Query(50, ge=1, le=200),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Get audit logs with filtering (production/admin only)"""
query = {
"when": {"$gte": datetime.utcnow() - timedelta(days=days)}
}
if job_id:
query["job_id"] = job_id
if action:
query["action"] = action
# Get total count
total = await db.audit_logs.count_documents(query)
# Get paginated results
skip = (page - 1) * size
cursor = (
db.audit_logs.find(query)
.sort("when", -1)
.skip(skip)
.limit(size)
)
logs = await cursor.to_list(length=size)
return {
"logs": logs,
"total": total,
"page": page,
"size": size,
"period_days": days
}
@router.post("/maintenance/reprocess-job/{job_id}")
async def reprocess_job(
@ -611,7 +600,7 @@ async def reprocess_job(
status_code=status.HTTP_404_NOT_FOUND,
detail="Job not found"
)
# Reset job to created status for reprocessing
await db.jobs.update_one(
{"_id": job_id},
@ -631,7 +620,7 @@ async def reprocess_job(
}
}
)
# Broadcast status update
try:
from ...services.websocket import connection_manager
@ -643,36 +632,36 @@ async def reprocess_job(
)
except Exception as e:
logger.warning(f"Failed to broadcast status update for job reset {job_id}: {e}")
# Trigger ingestion task
from ...tasks.ingest_and_ai import ingest_and_ai_task
ingest_and_ai_task.delay(job_id)
logger.warning(f"Admin {current_user.id} triggered reprocessing for job {job_id}")
return {"message": f"Job {job_id} queued for reprocessing"}
@router.get("/audit-logs", response_model=AuditLogResponse)
async def get_audit_logs_detailed(
# Time range
start_date: datetime | None = Query(None, description="Start date for audit logs"),
end_date: datetime | None = Query(None, description="End date for audit logs"),
start_date: Optional[datetime] = Query(None, description="Start date for audit logs"),
end_date: Optional[datetime] = Query(None, description="End date for audit logs"),
# Filters
action: str | None = Query(None, description="Filter by action type"),
severity: str | None = Query(None, description="Filter by severity level"),
user_email: str | None = Query(None, description="Filter by user email"),
resource_type: str | None = Query(None, description="Filter by resource type"),
resource_id: str | None = Query(None, description="Filter by resource ID"),
success: bool | None = Query(None, description="Filter by success status"),
action: Optional[str] = Query(None, description="Filter by action type"),
severity: Optional[str] = Query(None, description="Filter by severity level"),
user_email: Optional[str] = Query(None, description="Filter by user email"),
resource_type: Optional[str] = Query(None, description="Filter by resource type"),
resource_id: Optional[str] = Query(None, description="Filter by resource ID"),
success: Optional[bool] = Query(None, description="Filter by success status"),
# Search
search: str | None = Query(None, description="Search in description and details"),
search: Optional[str] = Query(None, description="Search in description and details"),
# Pagination (skip/limit to match frontend AuditLogQuery)
skip: int = Query(0, ge=0, description="Number of records to skip"),
limit: int = Query(50, ge=1, le=500, description="Max records to return"),
# Pagination
page: int = Query(1, ge=1, description="Page number"),
size: int = Query(50, ge=1, le=500, description="Page size"),
# Sorting
sort_by: str = Query("timestamp", description="Field to sort by"),
@ -682,7 +671,26 @@ async def get_audit_logs_detailed(
request: Request = None,
):
"""Get audit logs with filtering and pagination (production/admin only)"""
# Log audit log access
await audit_logger.log_action(
action="admin.audit.access",
description=f"Admin {current_user.email} accessed audit logs",
user=current_user,
request=request,
details={
"filters": {
"start_date": start_date.isoformat() if start_date else None,
"end_date": end_date.isoformat() if end_date else None,
"action": action,
"severity": severity,
"user_email": user_email,
"resource_type": resource_type,
"search": search
}
}
)
# Build query
query = AuditLogQuery(
start_date=start_date,
@@ -694,12 +702,12 @@ async def get_audit_logs_detailed(
resource_id=resource_id,
success=success,
search=search,
skip=skip,
limit=limit,
skip=(page - 1) * size,
limit=size,
sort_by=sort_by,
sort_order=sort_order
)
return await audit_logger.query_logs(query)
@@ -708,34 +716,32 @@ async def get_user_audit_logs(
user_id: str,
days: int = Query(30, ge=1, le=365, description="Number of days to look back"),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
request: Request = None,
):
"""Get audit logs for a specific user — accepts user ID or email (production/admin only)"""
import re as _re
# Accept email address: look up user by case-insensitive email match
resolved_id = user_id
if "@" in user_id:
user_doc = await db.users.find_one(
{"email": _re.compile(f"^{_re.escape(user_id)}$", _re.IGNORECASE)},
{"_id": 1},
"""Get audit logs for a specific user (production/admin only)"""
# Validate user_id
try:
ObjectId(user_id)
except Exception:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Invalid user ID format"
)
if user_doc:
resolved_id = str(user_doc["_id"])
logs = await audit_logger.get_user_activity(resolved_id, days)
# Fallback: query by email field in audit logs (case-insensitive via audit_logger)
if not logs and "@" in user_id:
from ...models.audit_log import AuditLogQuery as ALQ
from ...services.audit_logger import audit_logger as al
q = ALQ(user_email=user_id, limit=1000, sort_by="timestamp", sort_order=-1)
result = await al.query_logs(q)
logs = result.logs
return logs
# Log access to user audit logs
await audit_logger.log_action(
action="admin.audit.access",
description=f"Admin {current_user.email} accessed user audit logs for {user_id}",
user=current_user,
request=request,
resource_type="user",
resource_id=user_id,
details={"days_requested": days}
)
logs = await audit_logger.get_user_activity(user_id, days)
return {"logs": logs, "user_id": user_id, "days": days}
@router.get("/audit-logs/security")
@@ -745,7 +751,7 @@ async def get_security_events(
request: Request = None,
):
"""Get recent security events (production/admin only)"""
# Log access to security events
await audit_logger.log_action(
action="admin.audit.access",
@@ -754,9 +760,9 @@ async def get_security_events(
request=request,
details={"hours_requested": hours}
)
logs = await audit_logger.get_security_events(hours)
return logs
return {"logs": logs, "hours": hours}
@router.delete("/audit-logs/cleanup")
@@ -766,7 +772,7 @@ async def cleanup_audit_logs(
request: Request = None,
):
"""Clean up old audit logs (admin only)"""
# Log audit cleanup action
await audit_logger.log_action(
action="admin.system.action",
@@ -776,9 +782,9 @@ async def cleanup_audit_logs(
details={"retention_days": retention_days},
severity="warning"
)
deleted_count = await audit_logger.cleanup_old_logs(retention_days)
# Log cleanup completion
await audit_logger.log_action(
action="admin.system.action",
@@ -790,9 +796,9 @@ async def cleanup_audit_logs(
"deleted_count": deleted_count
}
)
return {
"message": f"Deleted {deleted_count} audit logs older than {retention_days} days",
"deleted_count": deleted_count,
"retention_days": retention_days
}
}
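The user audit-log lookup in this file matches an email address case-insensitively by escaping it and anchoring the pattern at both ends. A minimal sketch of that idea outside MongoDB follows; `email_matcher` is a hypothetical helper name used only for illustration and does not appear in the codebase.

```python
import re


def email_matcher(email: str) -> re.Pattern:
    """Build an anchored, case-insensitive pattern for one exact email."""
    # re.escape keeps characters such as "." and "+" from acting as regex operators.
    return re.compile(f"^{re.escape(email)}$", re.IGNORECASE)


pattern = email_matcher("Jane.Doe+test@example.com")
print(bool(pattern.match("jane.doe+TEST@example.com")))  # True
print(bool(pattern.match("janexdoe+test@example.com")))  # False
```

Without the escape step, the `.` in the address would match any character and the `+` would act as a quantifier, so lookalike addresses could match.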

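One side of the audit-log diff accepts `skip`/`limit` directly, while the other accepts `page`/`size` and converts with `skip=(page - 1) * size`. The conversion can be sketched as below; `page_to_skip_limit` is a hypothetical name, and the bounds check stands in for the `ge=1` constraints that the `Query(...)` declarations enforce in the endpoint.

```python
from typing import Tuple


def page_to_skip_limit(page: int, size: int) -> Tuple[int, int]:
    """Convert 1-based page/size pagination into skip/limit offsets."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be >= 1")
    return (page - 1) * size, size


print(page_to_skip_limit(1, 50))  # (0, 50)
print(page_to_skip_limit(3, 50))  # (100, 50)
```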

@@ -1,295 +0,0 @@
"""Admin production endpoints: failure dashboard, bulk retry, queue stats, VTT override."""
from datetime import datetime
import redis.asyncio as aioredis
from fastapi import (
APIRouter,
Depends,
File,
Form,
HTTPException,
Query,
UploadFile,
status,
)
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...core.logging import get_logger
from ...core.redis import get_redis
from ...models.audit_log import AuditAction
from ...models.job import JobStatus, RequestedOutputs
from ...models.user import User, UserRole
from ...schemas.job import JobResponse
from ...services.audit_logger import audit_logger
from ...services.cloud_run_dispatch import dispatch as _cr_dispatch
from ...services.gcs import upload_vtt_to_gcs
logger = get_logger(__name__)
router = APIRouter(prefix="/admin/production", tags=["admin-production"])
_FAILURE_STATUSES = [
JobStatus.PROCESSING_FAILED.value,
JobStatus.TTS_FAILED.value,
JobStatus.RENDER_FAILED.value,
]
_RETRY_CAP = 50
class BulkRetryRequest(BaseModel):
job_ids: list[str]
strategy: str = "auto" # "auto" | "from_scratch"
class BulkRetryResponse(BaseModel):
retried: list[str]
skipped: list[str]
errors: list[dict]
@router.get("/failures", response_model=list[JobResponse])
async def list_failures(
step: str | None = Query(None, description="Filter by failure.step"),
org_id: str | None = Query(None, description="Filter by organization_id"),
limit: int = Query(50, ge=1, le=200),
skip: int = Query(0, ge=0),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all jobs in a failed status, optionally filtered by step and org."""
query: dict = {"status": {"$in": _FAILURE_STATUSES}}
if step:
query["failure.step"] = step
if org_id:
query["organization_id"] = org_id
cursor = db.jobs.find(query).sort("updated_at", -1).skip(skip).limit(limit)
jobs = await cursor.to_list(length=limit)
return [
JobResponse(
id=str(j["_id"]),
title=j["title"],
status=j["status"],
source=j["source"],
requested_outputs=RequestedOutputs(**j["requested_outputs"]),
review=j.get("review", {"notes": "", "history": []}),
outputs=j.get("outputs"),
created_at=j["created_at"].isoformat(),
updated_at=j["updated_at"].isoformat(),
)
for j in jobs
]
@router.post("/bulk-retry", response_model=BulkRetryResponse)
async def bulk_retry(
payload: BulkRetryRequest,
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Retry up to 50 failed jobs in one call."""
if len(payload.job_ids) > _RETRY_CAP:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Cannot retry more than {_RETRY_CAP} jobs at once",
)
retried: list[str] = []
skipped: list[str] = []
errors: list[dict] = []
now = datetime.utcnow()
for job_id in payload.job_ids:
try:
job_doc = await db.jobs.find_one({"_id": job_id})
if not job_doc:
skipped.append(job_id)
continue
if job_doc["status"] not in _FAILURE_STATUSES:
skipped.append(job_id)
continue
failure = job_doc.get("failure") or {}
if payload.strategy == "from_scratch":
step = "ingestion"
else:
step = failure.get("step")
if not step:
step = "tts" if job_doc["status"] == JobStatus.TTS_FAILED.value else "render"
if step in ("ingestion", "ai_processing"):
reset_status = JobStatus.CREATED.value
elif step == "translation":
reset_status = JobStatus.AI_PROCESSING.value
elif step == "tts":
src = job_doc["source"].get("language", "en")
reset_status = (
JobStatus.APPROVED_ENGLISH.value if src == "en" else JobStatus.APPROVED_SOURCE.value
)
elif step == "render":
reset_status = JobStatus.PENDING_QC.value
else:
skipped.append(job_id)
continue
await db.jobs.update_one(
{"_id": job_id},
{
"$set": {"status": reset_status, "error": None, "updated_at": now},
"$inc": {"retry_count": 1},
"$push": {
"review.history": {
"at": now,
"status": f"bulk_retry_{step}",
"by": str(current_user.id),
}
},
},
)
if step in ("ingestion", "ai_processing"):
await _cr_dispatch("ingest", job_id)
elif step in ("translation", "tts"):
await _cr_dispatch("translate", job_id)
elif step == "render":
lang = job_doc.get("last_render_language", "en")
await _cr_dispatch("rerender", job_id, language=lang)
retried.append(job_id)
except Exception as e:
logger.error(f"bulk-retry failed for job {job_id}: {e}")
errors.append({"job_id": job_id, "error": str(e)})
try:
await audit_logger.log(
action=AuditAction.JOB_BULK_RETRY,
user_id=str(current_user.id),
user_email=current_user.email,
user_role=current_user.role.value if current_user.role else None,
resource_type="job",
description=f"Bulk retry {len(retried)} jobs (strategy={payload.strategy})",
details={"retried": retried, "skipped": skipped, "error_count": len(errors)},
)
except Exception as e:
logger.warning(f"Failed to write bulk-retry audit log: {e}")
return BulkRetryResponse(retried=retried, skipped=skipped, errors=errors)
# ---------------------------------------------------------------------------
# PR-7: Queue depth stats
# ---------------------------------------------------------------------------
_CELERY_QUEUES = ["default", "ingest", "tts", "render", "ffmpeg", "whisper", "notify", "embed"]
class QueueStats(BaseModel):
queues: dict[str, int] # queue_name → pending task count
total_pending: int
@router.get("/queue-stats", response_model=QueueStats)
async def get_queue_stats(
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
redis: aioredis.Redis = Depends(get_redis),
):
"""Return pending task counts per Celery queue (via Redis LLEN)."""
counts: dict[str, int] = {}
for q in _CELERY_QUEUES:
try:
n = await redis.llen(q)
counts[q] = n
except Exception:
counts[q] = 0
return QueueStats(queues=counts, total_pending=sum(counts.values()))
# ---------------------------------------------------------------------------
# PR-8: Upload final VTT override — bypass AI, jump to PENDING_QC
# ---------------------------------------------------------------------------
_BYPASSABLE_STATUSES = {
JobStatus.CREATED.value,
JobStatus.INGESTING.value,
JobStatus.AI_PROCESSING.value,
JobStatus.PROCESSING_FAILED.value,
JobStatus.TTS_FAILED.value,
JobStatus.RENDER_FAILED.value,
}
@router.post("/jobs/{job_id}/upload-final-vtt")
async def upload_final_vtt(
job_id: str,
language: str = Form(..., description="BCP-47 language code, e.g. 'en' or 'fr'"),
vtt_file: UploadFile = File(..., description="WebVTT (.vtt) file"),
vtt_type: str = Form("captions", description="'captions' or 'ad'"),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Upload a hand-crafted VTT to override AI output and advance job to PENDING_QC."""
job_doc = await db.jobs.find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
if job_doc["status"] not in _BYPASSABLE_STATUSES:
raise HTTPException(
status_code=status.HTTP_409_CONFLICT,
detail=f"Cannot override VTT when job is in status '{job_doc['status']}'. "
f"Only allowed in: {sorted(_BYPASSABLE_STATUSES)}",
)
if not vtt_file.filename or not vtt_file.filename.endswith(".vtt"):
raise HTTPException(status_code=400, detail="File must be a .vtt file")
vtt_content = (await vtt_file.read()).decode("utf-8")
if not vtt_content.strip().startswith("WEBVTT"):
raise HTTPException(status_code=400, detail="File does not start with WEBVTT header")
if vtt_type not in ("captions", "ad"):
raise HTTPException(status_code=400, detail="vtt_type must be 'captions' or 'ad'")
lang_key = language.replace("-", "_")
field = "captions_vtt_gcs" if vtt_type == "captions" else "ad_vtt_gcs"
gcs_path = f"{job_id}/{lang_key}/{vtt_type}.vtt"
gcs_uri = await upload_vtt_to_gcs(vtt_content, gcs_path)
now = datetime.utcnow()
await db.jobs.update_one(
{"_id": job_id},
{
"$set": {
f"outputs.{lang_key}.{field}": gcs_uri,
"status": JobStatus.PENDING_QC.value,
"updated_at": now,
},
"$push": {
"review.history": {
"at": now,
"status": "manual_vtt_upload",
"by": str(current_user.id),
"note": f"Manual {vtt_type} VTT upload for {language} by {current_user.email}",
}
},
},
)
try:
await audit_logger.log(
action=AuditAction.VTT_EDIT,
user_id=str(current_user.id),
user_email=current_user.email,
user_role=current_user.role.value if current_user.role else None,
resource_type="job",
resource_id=job_id,
description=f"Manual {vtt_type} VTT upload for {language} — job advanced to PENDING_QC",
)
except Exception as e:
logger.warning(f"Failed to write upload-final-vtt audit log: {e}")
return {"status": "ok", "gcs_uri": gcs_uri, "job_status": JobStatus.PENDING_QC.value}
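The `bulk_retry` endpoint in this file picks a reset status from the failed step before dispatching the retry. A rough sketch of that mapping as a pure function is shown below. The `JobStatus` enum here is a stand-in: the member names follow the diff, but the string values are assumptions, and `reset_status_for` is a hypothetical helper name.

```python
from enum import Enum
from typing import Optional


class JobStatus(str, Enum):
    # Stand-in enum: member names follow the diff, string values are assumed.
    CREATED = "created"
    AI_PROCESSING = "ai_processing"
    APPROVED_ENGLISH = "approved_english"
    APPROVED_SOURCE = "approved_source"
    PENDING_QC = "pending_qc"


def reset_status_for(step: str, source_language: str = "en") -> Optional[str]:
    """Return the status a job is reset to before retrying `step`, or None to skip."""
    if step in ("ingestion", "ai_processing"):
        return JobStatus.CREATED.value
    if step == "translation":
        return JobStatus.AI_PROCESSING.value
    if step == "tts":
        if source_language == "en":
            return JobStatus.APPROVED_ENGLISH.value
        return JobStatus.APPROVED_SOURCE.value
    if step == "render":
        return JobStatus.PENDING_QC.value
    return None  # unknown step: the endpoint appends the job to `skipped`


print(reset_status_for("tts", "fr"))
print(reset_status_for("unknown"))
```

Keeping the mapping separate from the database update would make it testable without a MongoDB connection, though the diff itself inlines it in the request loop.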

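The `upload_final_vtt` endpoint runs three checks on the upload: file extension, `WEBVTT` header, and `vtt_type`. They can be sketched without FastAPI as follows. `validate_vtt_upload` is a hypothetical helper, and it raises `ValueError` where the endpoint raises `HTTPException` with status 400.

```python
from typing import Optional


def validate_vtt_upload(filename: Optional[str], raw: bytes, vtt_type: str) -> str:
    """Run the same three checks as the endpoint and return the decoded text."""
    if not filename or not filename.endswith(".vtt"):
        raise ValueError("File must be a .vtt file")
    text = raw.decode("utf-8")
    if not text.strip().startswith("WEBVTT"):
        raise ValueError("File does not start with WEBVTT header")
    if vtt_type not in ("captions", "ad"):
        raise ValueError("vtt_type must be 'captions' or 'ad'")
    return text


sample = b"WEBVTT\n\n00:00:00.000 --> 00:00:02.000\nHello\n"
print(validate_vtt_upload("final.vtt", sample, "captions").splitlines()[0])  # WEBVTT
```

A file saved with a UTF-8 byte order mark would fail the header check as written, because `str.strip()` does not remove the mark. Decoding with `utf-8-sig` is one way to tolerate it, if such files are expected.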

@@ -1,126 +1,112 @@
import re
import secrets
from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, Request, Response, status
from fastapi.security import HTTPBearer
from motor.motor_asyncio import AsyncIOMotorDatabase
from motor.motor_asyncio import AsyncIOMotorClient, AsyncIOMotorDatabase
from ...core.config import settings
from ...core.database import get_database
from ...core.logging import get_logger
from ...core.security import (
create_access_token,
create_refresh_token,
decode_token,
verify_password,
)
from ...models.audit_log import AuditAction, AuditLogSeverity
from ...models.user import AuthProvider, User, UserRole
from ...models.user import User, AuthProvider, UserRole
from ...schemas.auth import (
LoginRequest,
LoginResponse,
LogoutResponse,
RefreshResponse,
MicrosoftLoginRequest,
MicrosoftLoginResponse,
RefreshResponse,
)
from ...services.audit_logger import audit_logger, log_auth_failure, log_auth_success
from ...services.microsoft_auth import (
MicrosoftAuthError,
MicrosoftTokenValidationError,
get_microsoft_auth_service,
MicrosoftTokenValidationError,
MicrosoftAuthError,
)
from ...services.audit_logger import log_auth_success, log_auth_failure, audit_logger
from ...models.audit_log import AuditAction, AuditLogSeverity
logger = get_logger(__name__)
router = APIRouter(prefix="/auth", tags=["auth"])
security = HTTPBearer()
async def _get_user_org_ids(user_id: str, db: AsyncIOMotorDatabase) -> list[str]:
"""Return list of org IDs the user belongs to — used as a JWT hint only."""
cursor = db.memberships.find({"user_id": user_id}, {"organization_id": 1})
memberships = await cursor.to_list(length=200)
return [str(m["organization_id"]) for m in memberships if m.get("organization_id")]
def _set_auth_cookies(response: Response, refresh_token: str) -> str:
"""Set httponly refresh_token cookie and readable csrf_token cookie. Returns the csrf token."""
csrf_token = secrets.token_hex(32)
ttl = settings.jwt_refresh_ttl_days * 24 * 60 * 60
domain = settings.cookie_domain if settings.app_env == "prod" else None
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=domain,
max_age=ttl,
)
response.set_cookie(
key="csrf_token",
value=csrf_token,
httponly=False, # JS-readable for Double Submit Cookie pattern
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=domain,
max_age=ttl,
)
return csrf_token
@router.post("/login", response_model=LoginResponse)
async def login(
login_data: LoginRequest,
request: Request,
response: Response,
db: AsyncIOMotorDatabase = Depends(get_database),
):
user_doc = await db.users.find_one({"email": login_data.email})
if not user_doc:
await log_auth_failure(login_data.email, request, "User not found")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
print(f"LOGIN: Starting login for {login_data.email}")
# Create database connection directly (bypass dependency injection issues)
client = AsyncIOMotorClient(settings.mongodb_uri)
db = client[settings.mongodb_db]
try:
print("LOGIN: Database connection created")
# Find user by email
print("LOGIN: Looking up user in database")
user_doc = await db.users.find_one({"email": login_data.email})
print(f"LOGIN: User lookup complete, found: {user_doc is not None}")
if not user_doc:
await log_auth_failure(login_data.email, request, "User not found")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
user = User(**user_doc)
# Check if user uses Microsoft authentication
if user.auth_provider == AuthProvider.MICROSOFT:
await log_auth_failure(login_data.email, request, "Account uses Microsoft SSO")
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="This account uses Microsoft authentication. Please sign in with Microsoft.",
)
# Verify password
if not user.hashed_password or not verify_password(login_data.password, user.hashed_password):
await log_auth_failure(login_data.email, request, "Invalid password")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
if not user.is_active:
await log_auth_failure(login_data.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
# Create tokens
access_token = create_access_token(subject=str(user.id))
refresh_token = create_refresh_token(subject=str(user.id))
# Set refresh token as HttpOnly cookie
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
)
user = User(**user_doc)
if user.auth_provider == AuthProvider.MICROSOFT:
await log_auth_failure(login_data.email, request, "Account uses Microsoft SSO")
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="This account uses Microsoft authentication. Please sign in with Microsoft.",
await log_auth_success(user, request)
return LoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role,
)
if not user.hashed_password or not verify_password(login_data.password, user.hashed_password):
await log_auth_failure(login_data.email, request, "Invalid password")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
if not user.is_active:
await log_auth_failure(login_data.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
org_ids = await _get_user_org_ids(str(user.id), db)
access_token = create_access_token(subject=str(user.id), org_ids=org_ids)
refresh_token = create_refresh_token(subject=str(user.id))
_set_auth_cookies(response, refresh_token)
await log_auth_success(user, request)
return LoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role,
)
finally:
# Close database connection
client.close()
@router.post("/microsoft", response_model=MicrosoftLoginResponse)
@@ -128,84 +114,127 @@ async def microsoft_login(
login_data: MicrosoftLoginRequest,
request: Request,
response: Response,
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Authenticate user with Microsoft ID token.
This endpoint validates the Microsoft ID token, finds or creates the user,
and returns JWT tokens for API access.
"""
microsoft_auth = get_microsoft_auth_service()
print(f"MICROSOFT LOGIN: Starting Microsoft authentication")
# Create database connection
client = AsyncIOMotorClient(settings.mongodb_uri)
db = client[settings.mongodb_db]
try:
user_info = await microsoft_auth.validate_token(login_data.id_token)
except MicrosoftTokenValidationError as e:
await log_auth_failure(login_data.id_token[:20] + "", request, f"MS token invalid: {e}")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail=f"Microsoft authentication failed: {str(e)}",
) from None
except MicrosoftAuthError as e:
await log_auth_failure("microsoft-sso", request, f"MS auth service error: {e}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="Microsoft authentication service error",
) from None
# Look up by Microsoft-derived ID first — handles email casing changes across logins
ms_user_id = f"ms-{user_info.sub[:20]}"
user_doc = await db.users.find_one({"_id": ms_user_id})
if not user_doc:
# Fall back to case-insensitive email lookup (handles local-to-Microsoft migration)
user_doc = await db.users.find_one(
{"email": {"$regex": f"^{re.escape(user_info.email)}$", "$options": "i"}}
)
if user_doc:
user = User(**user_doc)
if user.auth_provider == AuthProvider.LOCAL:
await db.users.update_one(
{"_id": user_doc["_id"]},
{"$set": {"auth_provider": AuthProvider.MICROSOFT.value, "updated_at": datetime.utcnow()}},
# Validate Microsoft token
microsoft_auth = get_microsoft_auth_service()
try:
user_info = microsoft_auth.validate_token(login_data.id_token)
print(f"MICROSOFT LOGIN: Token validated for {user_info.email}")
except MicrosoftTokenValidationError as e:
print(f"MICROSOFT LOGIN ERROR: Token validation failed: {e}")
await log_auth_failure(login_data.id_token[:20] + "", request, f"MS token invalid: {e}")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail=f"Microsoft authentication failed: {str(e)}",
)
except MicrosoftAuthError as e:
print(f"MICROSOFT LOGIN ERROR: Authentication error: {e}")
await log_auth_failure("microsoft-sso", request, f"MS auth service error: {e}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="Microsoft authentication service error",
)
user.auth_provider = AuthProvider.MICROSOFT
else:
new_user = {
"_id": ms_user_id,
"email": user_info.email,
"full_name": user_info.name,
"hashed_password": None,
"role": UserRole.CLIENT.value,
"auth_provider": AuthProvider.MICROSOFT.value,
"is_active": True,
"pm_client_ids": [],
"created_at": datetime.utcnow(),
"updated_at": datetime.utcnow(),
}
await db.users.insert_one(new_user)
user = User(**new_user)
if not user.is_active:
await log_auth_failure(user.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
# Find or create user
# Look up by Microsoft-derived ID first — handles email casing changes across logins
# (Microsoft can return vadymsamoilenko@... vs VadymSamoilenko@... for the same user)
ms_user_id = f"ms-{user_info.sub[:20]}"
user_doc = await db.users.find_one({"_id": ms_user_id})
if not user_doc:
# Fall back to case-insensitive email lookup (handles local-to-Microsoft migration)
user_doc = await db.users.find_one(
{"email": {"$regex": f"^{re.escape(user_info.email)}$", "$options": "i"}}
)
if user_doc:
# User exists
user = User(**user_doc)
print(f"MICROSOFT LOGIN: Existing user found: {user.id}")
# Update auth_provider if user is switching from local to Microsoft
if user.auth_provider == AuthProvider.LOCAL:
print(f"MICROSOFT LOGIN: Updating user to Microsoft auth provider")
await db.users.update_one(
{"_id": user_doc["_id"]},
{
"$set": {
"auth_provider": AuthProvider.MICROSOFT.value,
"updated_at": datetime.utcnow()
}
}
)
user.auth_provider = AuthProvider.MICROSOFT
else:
# Create new user with zero org memberships (SaaS model).
# They will see a "no access" landing until an admin invites them.
print(f"MICROSOFT LOGIN: Creating new user for {user_info.email}")
new_user = {
"_id": ms_user_id,
"email": user_info.email,
"full_name": user_info.name,
"hashed_password": None,
"role": UserRole.CLIENT.value,
"auth_provider": AuthProvider.MICROSOFT.value,
"is_active": True,
"pm_client_ids": [],
"created_at": datetime.utcnow(),
"updated_at": datetime.utcnow(),
}
await db.users.insert_one(new_user)
user = User(**new_user)
print(f"MICROSOFT LOGIN: New user created (zero memberships): {user.id}")
# Check if user is active
if not user.is_active:
await log_auth_failure(user.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
# Create JWT tokens
access_token = create_access_token(subject=str(user.id))
refresh_token = create_refresh_token(subject=str(user.id))
# Set refresh token as HttpOnly cookie
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
)
org_ids = await _get_user_org_ids(str(user.id), db)
access_token = create_access_token(subject=str(user.id), org_ids=org_ids)
refresh_token = create_refresh_token(subject=str(user.id))
print(f"MICROSOFT LOGIN: Authentication successful for {user.email}")
await log_auth_success(user, request)
return MicrosoftLoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role if isinstance(user.role, str) else user.role.value,
email=user.email,
full_name=user.full_name,
auth_provider=user.auth_provider,
)
_set_auth_cookies(response, refresh_token)
await log_auth_success(user, request)
return MicrosoftLoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role if isinstance(user.role, str) else user.role.value,
email=user.email,
full_name=user.full_name,
auth_provider=user.auth_provider,
)
finally:
# Close database connection
client.close()
@router.post("/refresh", response_model=RefreshResponse)
@@ -215,32 +244,29 @@ async def refresh_token(
db: AsyncIOMotorDatabase = Depends(get_database),
):
refresh_token = request.cookies.get("refresh_token")
print(f"🔍 REFRESH DEBUG: Cookie exists: {bool(refresh_token)}")
if not refresh_token:
print("🚨 REFRESH ERROR: No refresh token in cookies")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Refresh token not found",
)
# CSRF protection: Double Submit Cookie pattern
csrf_cookie = request.cookies.get("csrf_token")
csrf_header = request.headers.get("X-CSRF-Token")
if csrf_cookie and (not csrf_header or csrf_header != csrf_cookie):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="CSRF token mismatch",
)
try:
print(f"🔍 REFRESH DEBUG: Attempting to decode token...")
payload = decode_token(refresh_token)
print(f"🔍 REFRESH DEBUG: Token decoded successfully, type={payload.get('type')}")
if payload.get("type") != "refresh":
print(f"🚨 REFRESH ERROR: Wrong token type: {payload.get('type')}")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid token type",
)
user_id = payload.get("sub")
print(f"🔍 REFRESH DEBUG: User ID from token: {user_id}")
if not user_id:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
@@ -262,15 +288,22 @@ async def refresh_token(
detail="User account is disabled",
)
# Create new tokens (include org_ids claim for prefilter hint)
_org_ids = await _get_user_org_ids(user_id, db)
new_access_token = create_access_token(subject=user_id, org_ids=_org_ids)
# Create new tokens
new_access_token = create_access_token(subject=user_id)
new_refresh_token = create_refresh_token(subject=user_id)
# Rotate both refresh and CSRF cookies
_set_auth_cookies(response, new_refresh_token)
# Update refresh token cookie
response.set_cookie(
key="refresh_token",
value=new_refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
)
logger.info("Token refresh successful for user %s", user_id)
print(f"🔍 REFRESH DEBUG: Refresh successful for user {user_id}")
return RefreshResponse(
access_token=new_access_token,
user_id=user_id,
@@ -279,15 +312,14 @@ async def refresh_token(
full_name=user.full_name
)
except HTTPException:
raise
except Exception as e:
print(f"🚨 REFRESH ERROR: Exception during refresh: {type(e).__name__}: {e}")
import traceback
logger.exception("Refresh token error: %s\n%s", type(e).__name__, traceback.format_exc())
print(f"Traceback:\n{traceback.format_exc()}")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid refresh token",
) from None
detail=f"Invalid refresh token: {str(e)}",
)
@router.post("/logout", response_model=LogoutResponse)

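The `/refresh` handler on one side of this diff applies the Double Submit Cookie pattern: when a `csrf_token` cookie is present, the `X-CSRF-Token` header must carry the same value. A minimal sketch of that rule follows. `csrf_ok` is a hypothetical helper, and the constant-time comparison is a substitution made here; the diff itself compares with plain `!=`.

```python
import secrets
from typing import Optional


def csrf_ok(csrf_cookie: Optional[str], csrf_header: Optional[str]) -> bool:
    """Mirror the refresh endpoint's rule: enforce only when a CSRF cookie exists."""
    if not csrf_cookie:
        return True  # no cookie present, so the diff's check is skipped
    if not csrf_header:
        return False
    # Constant-time comparison on bytes; the diff uses a plain `!=` on strings.
    return secrets.compare_digest(csrf_cookie.encode(), csrf_header.encode())


token = secrets.token_hex(32)  # same generator as _set_auth_cookies
print(csrf_ok(token, token))    # True
print(csrf_ok(token, "wrong"))  # False
print(csrf_ok(None, None))      # True
```

Because the check is skipped when the cookie is absent, a request that carries only the `refresh_token` cookie still passes. That matches the condition in the diff and appears to keep sessions issued before the CSRF cookie existed working, though the diff does not state the reason.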

@@ -1,245 +0,0 @@
"""Job Brief CRUD endpoints."""
from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, assert_user_in_org, get_membership_context
from ...core.database import get_database
from ...core.logging import get_logger
from ...models.audit_log import AuditAction
from ...models.job_brief import (
BriefStatus,
JobBriefCreate,
JobBriefResponse,
JobBriefUpdate,
)
from ...models.organization import OrgRole
from ...services.audit_logger import audit_logger
logger = get_logger(__name__)
router = APIRouter(prefix="/briefs", tags=["briefs"])
def _doc_to_response(doc: dict) -> JobBriefResponse:
return JobBriefResponse(
id=str(doc["_id"]),
organization_id=doc["organization_id"],
project_id=doc.get("project_id"),
title=doc["title"],
description=doc.get("description"),
requested_outputs=doc["requested_outputs"],
languages=doc.get("languages", []),
deadline=doc.get("deadline"),
status=doc["status"],
created_by=doc["created_by"],
assignee_id=doc.get("assignee_id"),
job_id=doc.get("job_id"),
created_at=doc["created_at"].isoformat(),
updated_at=doc["updated_at"].isoformat(),
submitted_at=doc["submitted_at"].isoformat() if doc.get("submitted_at") else None,
approved_by=doc.get("approved_by"),
)
@router.get("", response_model=list[JobBriefResponse])
async def list_briefs(
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
org_ids = [m.organization_id for m in ctx.memberships] if hasattr(ctx, "memberships") else []
if ctx.is_platform_admin:
query: dict = {}
elif org_ids:
query = {"organization_id": {"$in": org_ids}}
else:
raise HTTPException(status_code=403, detail="No org memberships")
cursor = db.job_briefs.find(query).sort("created_at", -1).limit(100)
docs = await cursor.to_list(length=100)
return [_doc_to_response(d) for d in docs]
@router.post("", response_model=JobBriefResponse, status_code=status.HTTP_201_CREATED)
async def create_brief(
payload: JobBriefCreate,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
# Resolve org from project if not directly identifiable
org_id: str | None = None
if payload.project_id:
project = await db.projects.find_one({"_id": payload.project_id}, {"client_id": 1})
if project:
org_id = project.get("client_id")
if not org_id:
# Use first membership org if user has only one (or admin)
if ctx.is_platform_admin:
raise HTTPException(status_code=400, detail="Admin must supply project_id or org_id cannot be inferred")
memberships = [m for m in (ctx.memberships if hasattr(ctx, "memberships") else [])
if ctx.can_access_org(m.organization_id, OrgRole.MANAGER)]
if len(memberships) == 1:
org_id = memberships[0].organization_id
else:
raise HTTPException(status_code=400, detail="Cannot infer organization; supply project_id")
assert_user_in_org(ctx, org_id, OrgRole.MANAGER)
now = datetime.utcnow()
doc = {
"_id": f"brief_{now.strftime('%Y%m%d%H%M%S%f')}_{str(ctx.user.id)[-6:]}",
"organization_id": org_id,
"project_id": payload.project_id,
"title": payload.title,
"description": payload.description,
"requested_outputs": payload.requested_outputs.model_dump(),
"languages": payload.languages,
"deadline": payload.deadline,
"assignee_id": payload.assignee_id,
"status": BriefStatus.DRAFT.value,
"created_by": str(ctx.user.id),
"job_id": None,
"created_at": now,
"updated_at": now,
"submitted_at": None,
"approved_by": None,
}
await db.job_briefs.insert_one(doc)
await audit_logger.log_action(
action=AuditAction.BRIEF_CREATE,
description=f"Brief '{payload.title}' created",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=str(doc["_id"]),
details={"title": payload.title, "organization_id": org_id},
)
return _doc_to_response(doc)
@router.get("/{brief_id}", response_model=JobBriefResponse)
async def get_brief(
brief_id: str,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.VIEWER)
return _doc_to_response(doc)
@router.patch("/{brief_id}", response_model=JobBriefResponse)
async def update_brief(
brief_id: str,
payload: JobBriefUpdate,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.MANAGER)
if doc["status"] != BriefStatus.DRAFT.value:
raise HTTPException(status_code=400, detail="Only DRAFT briefs can be updated")
updates: dict = {"updated_at": datetime.utcnow()}
if payload.title is not None:
updates["title"] = payload.title
if payload.description is not None:
updates["description"] = payload.description
if payload.requested_outputs is not None:
updates["requested_outputs"] = payload.requested_outputs.model_dump()
if payload.languages is not None:
updates["languages"] = payload.languages
if payload.deadline is not None:
updates["deadline"] = payload.deadline
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{"$set": updates},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_UPDATE,
description=f"Brief '{brief_id}' updated",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"fields_updated": list(updates.keys())},
)
return _doc_to_response(result)
@router.post("/{brief_id}/submit", response_model=JobBriefResponse)
async def submit_brief(
brief_id: str,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.MANAGER)
if doc["status"] != BriefStatus.DRAFT.value:
raise HTTPException(status_code=400, detail="Only DRAFT briefs can be submitted")
now = datetime.utcnow()
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{"$set": {"status": BriefStatus.SUBMITTED.value, "submitted_at": now, "updated_at": now}},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_SUBMIT,
description=f"Brief '{brief_id}' submitted for review",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"organization_id": result.get("organization_id")},
)
return _doc_to_response(result)
@router.post("/{brief_id}/approve", response_model=JobBriefResponse)
async def approve_brief(
brief_id: str,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.ADMIN)
if doc["status"] != BriefStatus.SUBMITTED.value:
raise HTTPException(status_code=400, detail="Only SUBMITTED briefs can be approved")
now = datetime.utcnow()
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{
"$set": {
"status": BriefStatus.APPROVED.value,
"approved_by": str(ctx.user.id),
"updated_at": now,
}
},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_APPROVE,
description=f"Brief '{brief_id}' approved",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"organization_id": result.get("organization_id")},
)
return _doc_to_response(result)
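Each transition handler above reads the brief, checks its status, and then updates by `_id` alone, so two concurrent requests could both pass the check. One possible tightening, sketched here under assumptions, is to keep the allowed transitions in a table and put the expected status into the update filter. The enum string values and the helper names `can_transition` and `guarded_filter` are hypothetical; only the member names DRAFT, SUBMITTED and APPROVED come from the code.

```python
from enum import Enum


class BriefStatus(str, Enum):
    # Member names mirror the handlers; the string values are assumed.
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"


# Transitions implied by the submit and approve handlers.
ALLOWED_TRANSITIONS = {
    BriefStatus.DRAFT: {BriefStatus.SUBMITTED},
    BriefStatus.SUBMITTED: {BriefStatus.APPROVED},
    BriefStatus.APPROVED: set(),
}


def can_transition(current: BriefStatus, target: BriefStatus) -> bool:
    """Return True when the lifecycle allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def guarded_filter(brief_id: str, expected: BriefStatus) -> dict:
    """Filter that matches the brief only while it is still in `expected`."""
    return {"_id": brief_id, "status": expected.value}
```

With a filter like this, `find_one_and_update` would return `None` when another request changed the status first, and the handler would need to treat that as a conflict rather than pass `None` on to `_doc_to_response`.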

View file

@ -9,16 +9,15 @@ Access rules:
- List projects (read): Admin, PM, or any team member of the client
"""
from datetime import UTC, datetime
from datetime import datetime, timezone
from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException, Request
from fastapi import APIRouter, Depends, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel
from ...core.database import get_database
from ...core.dependencies import get_current_user, require_roles
from ...models.audit_log import AuditAction
from ...core.dependencies import get_current_user, require_pm_for_client, require_roles
from ...models.client import (
Client,
ClientCreate,
@ -31,7 +30,6 @@ from ...models.client import (
TeamUpdate,
)
from ...models.user import User, UserRole
from ...services.audit_logger import audit_logger
router = APIRouter(prefix="/clients", tags=["clients"])
@ -41,7 +39,7 @@ router = APIRouter(prefix="/clients", tags=["clients"])
# ---------------------------------------------------------------------------
def _now() -> datetime:
return datetime.now(UTC)
return datetime.now(timezone.utc)
async def _get_client_or_404(client_id: str, db: AsyncIOMotorDatabase) -> dict:
@ -93,9 +91,6 @@ def _project_from_doc(doc: dict) -> Project:
name=doc["name"],
client_id=doc["client_id"],
is_active=doc.get("is_active", True),
default_languages=doc.get("default_languages", []),
default_linguist_id=doc.get("default_linguist_id"),
default_reviewer_id=doc.get("default_reviewer_id"),
created_at=doc.get("created_at"),
updated_at=doc.get("updated_at"),
)
@ -123,7 +118,6 @@ async def list_clients(
@router.post("", response_model=Client)
async def create_client(
body: ClientCreate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -140,18 +134,7 @@ async def create_client(
"updated_at": now,
})
doc = await db.clients.find_one({"_id": client_id})
client = _client_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_CREATE,
description=f"Client '{client.name}' created",
user=current_user,
request=request,
resource_type="client",
resource_id=str(client.id),
resource_name=client.name,
details={"slug": client.slug},
)
return client
return _client_from_doc(doc)
@router.get("/{client_id}", response_model=Client)
@ -172,12 +155,11 @@ async def get_client(
async def update_client(
client_id: str,
body: ClientUpdate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
update: dict = dict(body.model_dump(exclude_none=True).items())
update: dict = {k: v for k, v in body.model_dump(exclude_none=True).items()}
if not update:
raise HTTPException(status_code=422, detail="No fields to update")
if "slug" in update and await db.clients.find_one({"slug": update["slug"], "_id": {"$ne": client_id}}):
@ -185,39 +167,17 @@ async def update_client(
update["updated_at"] = _now()
await db.clients.update_one({"_id": client_id}, {"$set": update})
doc = await db.clients.find_one({"_id": client_id})
client = _client_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_UPDATE,
description=f"Client '{client.name}' updated",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client.name,
details={"fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return client
return _client_from_doc(doc)
@router.delete("/{client_id}", status_code=204)
async def deactivate_client(
client_id: str,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await _get_client_or_404(client_id, db)
await _get_client_or_404(client_id, db)
await db.clients.update_one({"_id": client_id}, {"$set": {"is_active": False, "updated_at": _now()}})
await audit_logger.log_action(
action=AuditAction.CLIENT_DEACTIVATE,
description=f"Client '{doc['name']}' deactivated",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=doc["name"],
details={"was_active": doc.get("is_active", True)},
)
# ---------------------------------------------------------------------------
@ -232,11 +192,10 @@ class AssignPMRequest(BaseModel):
async def assign_pm(
client_id: str,
body: AssignPMRequest,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
client_doc = await _get_client_or_404(client_id, db)
await _get_client_or_404(client_id, db)
user_doc = await db.users.find_one({"_id": body.user_id})
if not user_doc:
raise HTTPException(status_code=404, detail="User not found")
@ -247,28 +206,16 @@ async def assign_pm(
"$set": {"role": UserRole.PROJECT_MANAGER.value, "updated_at": _now()},
},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_PM_ASSIGN,
description=f"PM '{user_doc.get('email', body.user_id)}' assigned to client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"pm_user_id": body.user_id, "pm_email": user_doc.get("email")},
)
@router.delete("/{client_id}/pm/{user_id}", status_code=204)
async def remove_pm(
client_id: str,
user_id: str,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
client_doc = await _get_client_or_404(client_id, db)
pm_doc = await db.users.find_one({"_id": user_id})
await _get_client_or_404(client_id, db)
await db.users.update_one(
{"_id": user_id},
{"$pull": {"pm_client_ids": client_id}, "$set": {"updated_at": _now()}},
@ -280,16 +227,6 @@ async def remove_pm(
{"_id": user_id},
{"$set": {"role": UserRole.CLIENT.value, "updated_at": _now()}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_PM_REMOVE,
description=f"PM '{pm_doc.get('email', user_id) if pm_doc else user_id}' removed from client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"pm_user_id": user_id, "pm_email": pm_doc.get("email") if pm_doc else None},
)
@router.get("/{client_id}/pm", response_model=list[dict])
@ -326,11 +263,10 @@ async def list_teams(
async def create_team(
client_id: str,
body: TeamCreate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
client_doc = await _get_client_or_404(client_id, db)
await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
now = _now()
team_id = str(ObjectId())
@ -343,18 +279,7 @@ async def create_team(
"updated_at": now,
})
doc = await db.teams.find_one({"_id": team_id})
team = _team_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_CREATE,
description=f"Team '{team.name}' created for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team.name},
)
return team
return _team_from_doc(doc)
@router.patch("/{client_id}/teams/{team_id}", response_model=Team)
@ -362,55 +287,32 @@ async def update_team(
client_id: str,
team_id: str,
body: TeamUpdate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
client_doc = await _get_client_or_404(client_id, db)
await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
await _get_team_or_404(team_id, client_id, db)
update = dict(body.model_dump(exclude_none=True).items())
update = {k: v for k, v in body.model_dump(exclude_none=True).items()}
if not update:
raise HTTPException(status_code=422, detail="No fields to update")
update["updated_at"] = _now()
await db.teams.update_one({"_id": team_id}, {"$set": update})
doc = await db.teams.find_one({"_id": team_id})
team = _team_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_UPDATE,
description=f"Team '{team.name}' updated for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team.name, "fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return team
return _team_from_doc(doc)
@router.delete("/{client_id}/teams/{team_id}", status_code=204)
async def delete_team(
client_id: str,
team_id: str,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
client_doc = await _get_client_or_404(client_id, db)
await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
team_doc = await _get_team_or_404(team_id, client_id, db)
await _get_team_or_404(team_id, client_id, db)
await db.teams.delete_one({"_id": team_id})
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_DELETE,
description=f"Team '{team_doc['name']}' deleted from client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"]},
)
# Team membership
@ -424,35 +326,18 @@ async def add_team_member(
client_id: str,
team_id: str,
body: AddMemberRequest,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
client_doc = await _get_client_or_404(client_id, db)
await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
team_doc = await _get_team_or_404(team_id, client_id, db)
member_doc = await db.users.find_one({"_id": body.user_id})
if not member_doc:
await _get_team_or_404(team_id, client_id, db)
if not await db.users.find_one({"_id": body.user_id}):
raise HTTPException(status_code=404, detail="User not found")
# Write to both Team.member_user_ids (legacy) and Membership.team_ids (MT-17)
await db.teams.update_one(
{"_id": team_id},
{"$addToSet": {"member_user_ids": body.user_id}, "$set": {"updated_at": _now()}},
)
await db.memberships.update_one(
{"user_id": body.user_id, "organization_id": client_id},
{"$addToSet": {"team_ids": team_id}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_MEMBER_ADD,
description=f"User '{member_doc.get('email', body.user_id)}' added to team '{team_doc['name']}' of client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"], "member_user_id": body.user_id, "member_email": member_doc.get("email")},
)
@router.delete("/{client_id}/teams/{team_id}/members/{user_id}", status_code=204)
@ -460,56 +345,22 @@ async def remove_team_member(
client_id: str,
team_id: str,
user_id: str,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
client_doc = await _get_client_or_404(client_id, db)
await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
team_doc = await _get_team_or_404(team_id, client_id, db)
member_doc = await db.users.find_one({"_id": user_id})
await _get_team_or_404(team_id, client_id, db)
await db.teams.update_one(
{"_id": team_id},
{"$pull": {"member_user_ids": user_id}, "$set": {"updated_at": _now()}},
)
await db.memberships.update_one(
{"user_id": user_id, "organization_id": client_id},
{"$pull": {"team_ids": team_id}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_MEMBER_REMOVE,
description=f"User '{member_doc.get('email', user_id) if member_doc else user_id}' removed from team '{team_doc['name']}' of client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"], "member_user_id": user_id, "member_email": member_doc.get("email") if member_doc else None},
)
# ---------------------------------------------------------------------------
# Project endpoints
# ---------------------------------------------------------------------------
@router.get("/all-projects", response_model=list[Project])
async def list_all_projects(
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Return all active projects accessible to the current user (across all clients)."""
if current_user.role in (UserRole.ADMIN, UserRole.PRODUCTION, UserRole.PROJECT_MANAGER):
docs = await db.projects.find({"is_active": True}).to_list(None)
else:
accessible_client_ids = await _get_accessible_client_ids(current_user, db)
if not accessible_client_ids:
return []
docs = await db.projects.find(
{"client_id": {"$in": accessible_client_ids}, "is_active": True}
).to_list(None)
return [_project_from_doc(d) for d in docs]
@router.get("/{client_id}/projects", response_model=list[Project])
async def list_projects(
client_id: str,
@ -526,12 +377,11 @@ async def list_projects(
async def create_project(
client_id: str,
body: ProjectCreate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
client_doc = await _get_client_or_404(client_id, db)
await _assert_pm_or_client_member(current_user, client_id, db)
await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
now = _now()
project_id = str(ObjectId())
await db.projects.insert_one({
@ -539,25 +389,11 @@ async def create_project(
"name": body.name,
"client_id": client_id,
"is_active": True,
"default_languages": body.default_languages,
"default_linguist_id": body.default_linguist_id,
"default_reviewer_id": body.default_reviewer_id,
"created_at": now,
"updated_at": now,
})
doc = await db.projects.find_one({"_id": project_id})
project = _project_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_CREATE,
description=f"Project '{project.name}' created for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project.name, "default_languages": body.default_languages},
)
return project
return _project_from_doc(doc)
@router.patch("/{client_id}/projects/{project_id}", response_model=Project)
@ -565,58 +401,35 @@ async def update_project(
client_id: str,
project_id: str,
body: ProjectUpdate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
client_doc = await _get_client_or_404(client_id, db)
await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
await _get_project_or_404(project_id, client_id, db)
update = dict(body.model_dump(exclude_none=True).items())
update = {k: v for k, v in body.model_dump(exclude_none=True).items()}
if not update:
raise HTTPException(status_code=422, detail="No fields to update")
update["updated_at"] = _now()
await db.projects.update_one({"_id": project_id}, {"$set": update})
doc = await db.projects.find_one({"_id": project_id})
project = _project_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_UPDATE,
description=f"Project '{project.name}' updated for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project.name, "fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return project
return _project_from_doc(doc)
@router.delete("/{client_id}/projects/{project_id}", status_code=204)
async def archive_project(
client_id: str,
project_id: str,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
client_doc = await _get_client_or_404(client_id, db)
await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
project_doc = await _get_project_or_404(project_id, client_id, db)
await _get_project_or_404(project_id, client_id, db)
await db.projects.update_one(
{"_id": project_id},
{"$set": {"is_active": False, "updated_at": _now()}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_ARCHIVE,
description=f"Project '{project_doc['name']}' archived for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project_doc["name"]},
)
# ---------------------------------------------------------------------------
@ -636,37 +449,6 @@ async def _assert_pm_or_admin(user: User, client_id: str, db: AsyncIOMotorDataba
raise HTTPException(status_code=403, detail="Not a manager for this client")
async def _assert_pm_or_client_member(user: User, client_id: str, db: AsyncIOMotorDatabase) -> None:
"""Allow PM/ADMIN/PROD or any org member (CLIENT role) with membership in this client's org."""
if user.role in (UserRole.ADMIN, UserRole.PRODUCTION):
return
if user.role == UserRole.PROJECT_MANAGER:
if client_id in (user.pm_client_ids or []):
return
mem = await db.memberships.find_one({"user_id": str(user.id), "organization_id": client_id})
if mem and mem.get("role_in_org") in ("owner", "admin", "manager"):
return
# Allow CLIENT users who are members of the org
if user.role == UserRole.CLIENT:
mem = await db.memberships.find_one({"user_id": str(user.id), "organization_id": client_id})
if mem:
return
raise HTTPException(status_code=403, detail="Not authorized to create projects for this client")
async def _get_accessible_client_ids(user: User, db: AsyncIOMotorDatabase) -> list[str]:
"""Return list of client_ids the user can access."""
ids: set[str] = set()
# PM assignments (legacy)
if user.pm_client_ids:
ids.update(user.pm_client_ids)
# Org memberships
mems = await db.memberships.find({"user_id": str(user.id)}).to_list(None)
for m in mems:
ids.add(m["organization_id"])
return list(ids)
async def _assert_client_access(user: User, client_id: str, db: AsyncIOMotorDatabase) -> None:
"""Allow platform staff, org members (any role), or PM of the client."""
if user.role in (UserRole.ADMIN, UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.LINGUIST):
@ -678,4 +460,6 @@ async def _assert_client_access(user: User, client_id: str, db: AsyncIOMotorData
# Legacy fallback for pre-migration users
if user.role == UserRole.PROJECT_MANAGER and client_id in (user.pm_client_ids or []):
return
if user.role in (UserRole.CLIENT, UserRole.PROJECT_MANAGER):
return
raise HTTPException(status_code=403, detail="Insufficient permissions")
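The last hunk adds a fallback to `_assert_client_access` that returns for any CLIENT or PROJECT_MANAGER user without looking at `client_id`. For comparison, here is a minimal sketch of a check that stays scoped to the requested client, following the rule stated in the function's docstring. It is not the project's implementation: `has_client_access` is a hypothetical helper, and the lowercase role strings are assumed values for the `UserRole` members named in the diff.

```python
from typing import Iterable

# Assumed string values for UserRole.ADMIN, REVIEWER, PRODUCTION and LINGUIST.
PLATFORM_STAFF = {"admin", "reviewer", "production", "linguist"}


def has_client_access(
    role: str,
    client_id: str,
    pm_client_ids: Iterable[str] = (),
    member_org_ids: Iterable[str] = (),
) -> bool:
    """Return True only for platform staff or users tied to this client."""
    if role in PLATFORM_STAFF:
        return True
    # Org members of any role may access their own client.
    if client_id in set(member_org_ids):
        return True
    # Legacy path: project managers assigned directly to the client.
    return role == "project_manager" and client_id in set(pm_client_ids)
```

Keeping the decision in a pure function like this makes the cross-client cases easy to cover in unit tests before wiring it to an `HTTPException`.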

View file

@ -3,11 +3,11 @@ from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.database import get_database
from ...core.dependencies import get_current_user
from ...models.audit_log import AuditAction
from ...models.user import User
from ...schemas.file import SignedUploadRequest, SignedUploadResponse
from ...services.audit_logger import audit_logger
from ...services.gcs import generate_signed_upload_url
from ...services.audit_logger import audit_logger
from ...models.audit_log import AuditAction
router = APIRouter(prefix="/files", tags=["files"])
@ -28,11 +28,11 @@ async def get_signed_upload_url(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Only video files are supported"
)
# Generate unique blob path
from bson import ObjectId
blob_path = f"temp/{ObjectId()}/{request.filename}"
try:
# Generate signed upload URL with form fields
signed_data = await generate_signed_upload_url(
@ -40,7 +40,7 @@ async def get_signed_upload_url(
content_type=request.content_type,
max_size=request.max_size or 1024 * 1024 * 1024 # 1GB default
)
await audit_logger.log_action(
action=AuditAction.FILE_UPLOAD,
description=f"Signed upload URL generated for {request.filename}",
@ -62,4 +62,4 @@ async def get_signed_upload_url(
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to generate signed upload URL: {str(e)}"
) from None
)

View file

@ -1,326 +0,0 @@
"""
Glossary management endpoints.
Access:
- All glossary mutations (upload, activate, archive): Admin or PM of the client
- Glossary reads (list, detail, terms): Admin, PM, or staff members
Routes are nested under /clients/{client_id}/glossaries to keep ownership clear.
"""
from __future__ import annotations
from fastapi import APIRouter, Depends, File, Form, HTTPException, Query, UploadFile
from ...core.authz import MembershipContext, assert_user_in_org, get_membership_context
from ...core.logging import get_logger
from ...models.audit_log import AuditAction
from ...models.glossary import (
GlossaryDetailResponse,
GlossaryResponse,
GlossaryVersionResponse,
)
from ...models.organization import OrgRole
from ...services import audit_logger as audit_svc
from ...services import glossary_service as svc
logger = get_logger(__name__)
router = APIRouter(
prefix="/clients/{client_id}/glossaries",
tags=["glossaries"],
)
_ALLOWED_CONTENT_TYPES = {
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"application/vnd.ms-excel",
}
_MAX_FILE_SIZE_MB = 50
# ── List glossaries ───────────────────────────────────────────────────────────
@router.get("", response_model=list[GlossaryResponse])
async def list_glossaries(
client_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
"""List all active glossaries for a client."""
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossaries = await svc.get_glossaries_for_client(client_id)
version_map = await svc.get_versions_by_ids([g.current_version_id for g in glossaries if g.current_version_id])
return [_to_response(g, version_map.get(g.current_version_id)) for g in glossaries]
# ── Upload new glossary ───────────────────────────────────────────────────────
@router.post("", response_model=GlossaryDetailResponse, status_code=201)
async def upload_glossary(
client_id: str,
file: UploadFile = File(..., description="xlsx glossary file"),
name: str = Form(...),
source_locale: str = Form(..., description="BCP-47 source locale, e.g. en-GB"),
source_locale_col: str = Form(..., description="xlsx column header for the source language, e.g. en_gb"),
description: str | None = Form(None),
change_note: str | None = Form(None),
ctx: MembershipContext = Depends(get_membership_context),
):
"""Upload a new glossary xlsx file and associate it with a client."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
_validate_xlsx(file)
try:
glossary, version = await svc.ingest_glossary(
client_id=client_id,
name=name,
source_locale=source_locale,
source_locale_col=source_locale_col,
file=file,
user_id=str(ctx.user.id),
description=description,
change_note=change_note,
)
except ValueError as exc:
raise HTTPException(status_code=422, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_UPLOAD,
description=f"Glossary '{name}' uploaded for client {client_id}",
user=ctx.user,
resource_type="glossary",
resource_id=glossary.id,
details={"term_count": version.term_count, "source_locale": source_locale},
)
versions = await svc.get_versions(glossary.id)
return _to_detail_response(glossary, versions)
# ── Get glossary detail ───────────────────────────────────────────────────────
@router.get("/{glossary_id}", response_model=GlossaryDetailResponse)
async def get_glossary(
client_id: str,
glossary_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
versions = await svc.get_versions(glossary_id)
return _to_detail_response(glossary, versions)
# ── Browse terms ──────────────────────────────────────────────────────────────
@router.get("/{glossary_id}/terms")
async def list_terms(
client_id: str,
glossary_id: str,
version_id: str | None = Query(None, description="Specific version; defaults to active"),
search: str | None = Query(None),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
vid = version_id or glossary.current_version_id
if not vid:
return {"terms": [], "total": 0, "page": page, "page_size": page_size}
terms, total = await svc.get_terms_page(vid, search=search, page=page, page_size=page_size)
return {
"terms": [{"source_term": t["source_term"], "translations": t["translations"]} for t in terms],
"total": total,
"page": page,
"page_size": page_size,
}
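`list_terms` accepts a 1-based `page` and a `page_size` capped at 200, and hands both to `svc.get_terms_page`, whose body is not shown here. A minimal sketch of how such parameters usually map to skip and limit values, using an in-memory list in place of the database; `page_bounds` and `paginate` are hypothetical helpers.

```python
def page_bounds(page: int, page_size: int) -> tuple[int, int]:
    """Translate a 1-based page number into (skip, limit)."""
    # Same limits as the Query(...) constraints on the endpoint.
    if page < 1 or not 1 <= page_size <= 200:
        raise ValueError("page must be >= 1 and page_size between 1 and 200")
    return (page - 1) * page_size, page_size


def paginate(items: list, page: int, page_size: int) -> dict:
    """Return one page in the same envelope shape that list_terms returns."""
    skip, limit = page_bounds(page, page_size)
    return {
        "terms": items[skip:skip + limit],
        "total": len(items),
        "page": page,
        "page_size": page_size,
    }
```

A page past the end yields an empty `terms` list while `total` still reports the full count, which lets a client stop paging without a separate request.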
# ── Upload new version ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/versions", response_model=GlossaryVersionResponse, status_code=201)
async def upload_version(
client_id: str,
glossary_id: str,
file: UploadFile = File(...),
source_locale_col: str = Form(...),
change_note: str | None = Form(None),
ctx: MembershipContext = Depends(get_membership_context),
):
"""Upload a new xlsx file as a new version of an existing glossary."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
_validate_xlsx(file)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
try:
version = await svc.ingest_new_version(
glossary_id=glossary_id,
source_locale_col=source_locale_col,
file=file,
user_id=str(ctx.user.id),
change_note=change_note,
)
except ValueError as exc:
raise HTTPException(status_code=422, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_VERSION_UPLOAD,
description=f"New glossary version uploaded for glossary {glossary_id}",
user=ctx.user,
resource_type="glossary_version",
resource_id=version.id,
details={"term_count": version.term_count, "version_number": version.version_number},
)
return _version_to_response(version)
# ── Activate a version ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/activate")
async def activate_version(
client_id: str,
glossary_id: str,
version_id: str = Form(...),
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
try:
await svc.activate_version(glossary_id, version_id)
except ValueError as exc:
raise HTTPException(status_code=404, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_ACTIVATE,
description=f"Glossary version {version_id} activated",
user=ctx.user,
resource_type="glossary",
resource_id=glossary_id,
details={"version_id": version_id},
)
return {"status": "ok", "active_version_id": version_id}
# ── Re-queue embedding ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/versions/{version_id}/reembed", status_code=202)
async def reembed_version(
client_id: str,
glossary_id: str,
version_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
"""Re-queue the embedding task for a glossary version (resets failed/pending/stuck embeds)."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
versions = await svc.get_versions(glossary_id)
version = next((v for v in versions if str(v.id) == version_id), None)
if not version:
raise HTTPException(status_code=404, detail="Version not found")
try:
import motor.motor_asyncio
from bson import ObjectId
from ...core.config import settings
from ...tasks.embed_glossary import embed_glossary_version_task
client_db = motor.motor_asyncio.AsyncIOMotorClient(settings.mongodb_uri)
db = client_db[settings.mongodb_db]
await db.glossary_versions.update_one(
{"_id": ObjectId(version_id)},
{"$set": {"embedding_status": "pending", "embedded_count": 0}},
)
client_db.close()
embed_glossary_version_task.delay(version_id)
except Exception as exc:
raise HTTPException(status_code=500, detail=f"Failed to queue embedding: {exc}") from exc
return {"status": "queued", "version_id": version_id}
# ── Delete ───────────────────────────────────────────────────────────────────
@router.delete("/{glossary_id}", status_code=204)
async def archive_glossary(
client_id: str,
glossary_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.ADMIN)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
await svc.archive_glossary(glossary_id)
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_ARCHIVE,
description=f"Glossary {glossary_id} archived",
user=ctx.user,
resource_type="glossary",
resource_id=glossary_id,
)
# ── Helpers ───────────────────────────────────────────────────────────────────
def _validate_xlsx(file: UploadFile) -> None:
if file.content_type not in _ALLOWED_CONTENT_TYPES and not (
file.filename and file.filename.endswith(".xlsx")
):
raise HTTPException(
status_code=422,
detail="Only .xlsx files are accepted",
)
def _to_response(g, current_version=None) -> GlossaryResponse:
return GlossaryResponse(
id=str(g.id),
client_id=g.client_id,
name=g.name,
description=g.description,
source_locale=g.source_locale,
source=g.source,
status=g.status,
current_version_id=g.current_version_id,
current_version_embedding_status=current_version.embedding_status if current_version else None,
current_version_embedded_count=current_version.embedded_count if current_version else None,
current_version_term_count=current_version.term_count if current_version else None,
created_at=g.created_at,
created_by=g.created_by,
)
def _version_to_response(v) -> GlossaryVersionResponse:
return GlossaryVersionResponse(
id=str(v.id),
glossary_id=v.glossary_id,
version_number=v.version_number,
term_count=v.term_count,
embedded_count=v.embedded_count,
embedding_status=v.embedding_status,
created_at=v.created_at,
created_by=v.created_by,
change_note=v.change_note,
)
def _to_detail_response(glossary, versions) -> GlossaryDetailResponse:
return GlossaryDetailResponse(
**_to_response(glossary).model_dump(),
versions=[_version_to_response(v) for v in versions],
)


@ -14,21 +14,16 @@ Protected endpoints:
import hashlib
import re
import secrets
from datetime import UTC, datetime, timedelta
from datetime import datetime, timedelta, timezone
from fastapi import APIRouter, Depends, HTTPException, Request
from fastapi import APIRouter, Depends, HTTPException, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import bump_user_membership_cache
from ...core.database import get_database
from ...core.dependencies import get_current_user
from ...core.security import (
create_access_token,
create_refresh_token,
get_password_hash,
)
from ...models.audit_log import AuditAction
from ...core.security import create_access_token, create_refresh_token, get_password_hash
from ...models.invitation import (
Invitation,
InvitationAcceptRequest,
InvitationCreate,
InvitationPreviewResponse,
@ -36,7 +31,7 @@ from ...models.invitation import (
)
from ...models.organization import OrgRole
from ...models.user import AuthProvider, User, UserRole
from ...services.audit_logger import audit_logger
from ...core.authz import bump_user_membership_cache
from ...services.emailer import email_service
from ...services.membership_service import get_membership, upsert_membership
@ -44,7 +39,7 @@ router = APIRouter(tags=["invitations"])
def _now() -> datetime:
return datetime.now(UTC)
return datetime.now(timezone.utc)
def _hash_token(plaintext: str) -> str:
@ -59,7 +54,7 @@ def _make_token() -> tuple[str, str]:
def _inv_from_doc(doc: dict) -> InvitationResponse:
now = _now()
expires_at = doc["expires_at"].replace(tzinfo=UTC) if doc["expires_at"].tzinfo is None else doc["expires_at"]
expires_at = doc["expires_at"].replace(tzinfo=timezone.utc) if doc["expires_at"].tzinfo is None else doc["expires_at"]
return InvitationResponse(
id=str(doc["_id"]),
email=doc["email"],
@ -105,7 +100,6 @@ org_router = APIRouter(prefix="/organizations", tags=["invitations"])
async def create_invitation(
org_id: str,
body: InvitationCreate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -127,18 +121,6 @@ async def create_invitation(
detail="A pending invitation already exists for this email. Revoke it first to re-invite.",
)
# MT-19: ensure all target_team_ids belong to this org (client_id == org_id)
if body.target_team_ids:
valid_teams = await db.teams.count_documents({
"_id": {"$in": body.target_team_ids},
"client_id": org_id,
})
if valid_teams != len(body.target_team_ids):
raise HTTPException(
status_code=400,
detail="One or more target_team_ids do not belong to this organization.",
)
plaintext, token_hash = _make_token()
now = _now()
expires_at = now + timedelta(days=body.expires_in_days)
@ -172,17 +154,7 @@ async def create_invitation(
expires_at=expires_at,
)
inv = _inv_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.INVITATION_CREATE,
description=f"Invitation created for '{email_lower}' to organization '{org_id}'",
user=current_user,
request=request,
resource_type="invitation",
resource_id=inv.id,
details={"invited_email": email_lower, "org_id": org_id, "role": body.role_in_org},
)
return inv
return _inv_from_doc(doc)
@org_router.get("/{org_id}/invitations", response_model=list[InvitationResponse])
@ -202,30 +174,16 @@ async def list_invitations(
async def revoke_invitation(
org_id: str,
invitation_id: str,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _assert_org_admin(org_id, current_user, db)
inv_doc = await db.invitations.find_one({"_id": invitation_id, "organization_id": org_id})
result = await db.invitations.update_one(
{"_id": invitation_id, "organization_id": org_id, "accepted_at": None, "revoked_at": None},
{"$set": {"revoked_at": _now()}},
)
if result.matched_count == 0:
raise HTTPException(status_code=404, detail="Invitation not found or already accepted/revoked")
await audit_logger.log_action(
action=AuditAction.INVITATION_REVOKE,
description=f"Invitation '{invitation_id}' revoked in organization '{org_id}'",
user=current_user,
request=request,
resource_type="invitation",
resource_id=invitation_id,
details={
"invited_email": inv_doc["email"] if inv_doc else None,
"org_id": org_id,
},
)
# ---------------------------------------------------------------------------
@ -248,7 +206,7 @@ async def preview_invitation(
raise HTTPException(status_code=410, detail="Invitation not found or has expired")
now = _now()
expires_at = doc["expires_at"].replace(tzinfo=UTC) if doc["expires_at"].tzinfo is None else doc["expires_at"]
expires_at = doc["expires_at"].replace(tzinfo=timezone.utc) if doc["expires_at"].tzinfo is None else doc["expires_at"]
if doc.get("revoked_at"):
raise HTTPException(status_code=410, detail="This invitation has been revoked")
@ -297,7 +255,6 @@ async def preview_invitation(
@router.post("/invitations/accept")
async def accept_invitation(
body: InvitationAcceptRequest,
request: Request,
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Accept an invitation. Creates user if needed, creates membership, returns tokens."""
@ -360,16 +317,12 @@ async def accept_invitation(
await upsert_membership(user_id, org_id, role_in_org, doc["invited_by_user_id"], db)
await bump_user_membership_cache(user_id)
# Auto-add to target teams — write to both Team.member_user_ids (legacy) and Membership.team_ids (MT-17)
# Auto-add to target teams
for team_id in doc.get("target_team_ids", []):
await db.teams.update_one(
{"_id": team_id, "client_id": org_id},
{"$addToSet": {"member_user_ids": user_id}},
)
await db.memberships.update_one(
{"user_id": user_id, "organization_id": org_id},
{"$addToSet": {"team_ids": team_id}},
)
# Send welcome email
if not existing_user.get("_welcomed"):
@ -380,23 +333,12 @@ async def accept_invitation(
org_name=org_name,
)
# Issue JWT tokens with org_ids claim
_inv_org_ids = [m["organization_id"] async for m in db.memberships.find({"user_id": user_id}, {"organization_id": 1})]
access_token = create_access_token(subject=user_id, org_ids=[str(o) for o in _inv_org_ids if o])
# Issue JWT tokens
access_token = create_access_token(subject=user_id)
refresh_token = create_refresh_token(subject=user_id)
org_name, org_slug = await _get_org_name(org_id, db)
await audit_logger.log_action(
action=AuditAction.INVITATION_ACCEPT,
description=f"Invitation accepted by '{email_lower}' for organization '{org_id}'",
user=None,
request=request,
resource_type="invitation",
resource_id=str(doc["_id"]),
details={"invited_email": email_lower, "org_id": org_id},
)
return {
"access_token": access_token,
"refresh_token": refresh_token,

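Reviewer note: the invitation routes rely on `_hash_token(plaintext)` and `_make_token() -> tuple[str, str]`, and the module imports `hashlib` and `secrets`, but neither function body appears in these hunks. The sketch below shows one common way such a pair could work, assuming SHA-256 and `secrets.token_urlsafe`. The names `hash_token`, `make_token`, and `token_matches` are hypothetical, and the actual algorithm used by the project may differ.

```python
import hashlib
import hmac
import secrets


def hash_token(plaintext: str) -> str:
    """Return the hex SHA-256 digest of a token (assumed algorithm)."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def make_token() -> tuple[str, str]:
    """Return (plaintext, digest); only the digest would be stored."""
    plaintext = secrets.token_urlsafe(32)
    return plaintext, hash_token(plaintext)


def token_matches(candidate: str, stored_hash: str) -> bool:
    """Compare digests in constant time."""
    return hmac.compare_digest(hash_token(candidate), stored_hash)
```

With this shape, the plaintext only ever travels in the invitation email, and a leaked database row does not reveal a usable token.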
File diff suppressed because it is too large.


@ -1,580 +0,0 @@
"""Per-language QC endpoints — two-stage (linguist + reviewer) assignment, workflow, comments."""
from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, Query, Request
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel, Field
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.job import LanguageQCComment, LanguageQCState
from ...models.user import User, UserRole
from ...services import language_qc as lqc
from ...services.audit_logger import audit_logger
router = APIRouter(tags=["language-qc"])
# ── Request / response schemas ────────────────────────────────────────────────
class AssignRequest(BaseModel):
linguist_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ReassignRequest(BaseModel):
linguist_user_id: str
notes: str | None = None
deadline: datetime | None = None
class AssignReviewerRequest(BaseModel):
reviewer_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ReassignReviewerRequest(BaseModel):
reviewer_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ApproveLanguageRequest(BaseModel):
notes: str | None = None
class RejectLanguageRequest(BaseModel):
notes: str
category: str | None = None # timing | mistranslation | terminology | profanity | length | other
class ReopenLanguageRequest(BaseModel):
notes: str | None = None
class AddCommentRequest(BaseModel):
body: str = Field(..., min_length=1, max_length=4000)
class LanguageQCStateResponse(BaseModel):
lang: str
state: LanguageQCState
class LanguageQCMapResponse(BaseModel):
job_id: str
language_qc: dict[str, LanguageQCState]
class QueueItem(BaseModel):
job_id: str
job_title: str
job_status: str
lang: str
lang_qc_status: str
assigned_at: str | None = None
reviewed_at: str | None = None
class QueueResponse(BaseModel):
items: list[QueueItem]
total: int
class BulkAssignRequest(BaseModel):
linguist_user_id: str
reviewer_user_id: str | None = None
languages: list[str] | None = None # None = all available languages
only_unassigned: bool = False # skip languages that already have an assignment
deadline: datetime | None = None
class BulkAssignResponse(BaseModel):
assigned: list[str]
skipped: list[str]
errors: dict[str, str]
# ── Routes ────────────────────────────────────────────────────────────────────
@router.get("/jobs/{job_id}/language-qc", response_model=LanguageQCMapResponse)
async def get_language_qc(
job_id: str,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION,
UserRole.PROJECT_MANAGER, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
# Lazy auto-assignment: apply project/job defaults on first open in PENDING_QC
await lqc.auto_assign_defaults(db, job_id)
states = await lqc.get_all_states(db, job_id)
return LanguageQCMapResponse(job_id=job_id, language_qc=states)
# ── Linguist assignment ───────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/assign", response_model=LanguageQCStateResponse)
async def assign_language(
job_id: str,
lang: str,
request: AssignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.assign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_ASSIGN,
description=f"Language '{lang}' assigned to linguist '{request.linguist_user_id}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "linguist_user_id": request.linguist_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reassign", response_model=LanguageQCStateResponse)
async def reassign_language(
job_id: str,
lang: str,
request: ReassignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reassign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REASSIGN,
description=f"Language '{lang}' reassigned to linguist '{request.linguist_user_id}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "linguist_user_id": request.linguist_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Reviewer assignment ───────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/assign-reviewer", response_model=LanguageQCStateResponse)
async def assign_reviewer(
job_id: str,
lang: str,
request: AssignReviewerRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.assign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REVIEWER_ASSIGN,
description=f"Reviewer '{request.reviewer_user_id}' assigned to language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "reviewer_user_id": request.reviewer_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reassign-reviewer", response_model=LanguageQCStateResponse)
async def reassign_reviewer(
job_id: str,
lang: str,
request: ReassignReviewerRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reassign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REVIEWER_REASSIGN,
description=f"Reviewer reassigned to '{request.reviewer_user_id}' for language '{lang}', job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "reviewer_user_id": request.reviewer_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Bulk assignment ───────────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/bulk-assign", response_model=BulkAssignResponse)
async def bulk_assign_languages(
job_id: str,
request: BulkAssignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Assign one linguist (and optionally one reviewer) to multiple languages in one call."""
job_doc = await db["jobs"].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
available = list((job_doc.get("outputs") or {}).keys())
target_langs = request.languages if request.languages else available
assigned: list[str] = []
skipped: list[str] = []
errors: dict[str, str] = {}
language_qc = job_doc.get("language_qc") or {}
for lang in target_langs:
if lang not in available:
skipped.append(lang)
continue
lang_state = language_qc.get(lang) or {}
already_assigned = bool(lang_state.get("assigned_linguist_id"))
if request.only_unassigned and already_assigned:
skipped.append(lang)
continue
try:
await lqc.assign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, deadline=request.deadline,
)
except Exception as exc:
errors[lang] = str(exc)
continue
if request.reviewer_user_id:
try:
await lqc.assign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, deadline=request.deadline,
)
except Exception as exc:
errors[f"{lang}:reviewer"] = str(exc)
assigned.append(lang)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_BULK_ASSIGN,
description=f"Bulk assignment for job {job_id}: {len(assigned)} language(s) assigned to linguist '{request.linguist_user_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={
"languages": assigned,
"linguist_user_id": request.linguist_user_id,
"reviewer_user_id": request.reviewer_user_id,
"skipped": skipped,
"errors": errors,
},
)
return BulkAssignResponse(assigned=assigned, skipped=skipped, errors=errors)
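
Reviewer note: the loop in `bulk_assign_languages` sorts every requested language into one of three buckets (assigned, skipped, or errored). A minimal sketch of that partitioning, without the database or the reviewer step, is shown here. `partition_bulk_assign` and its `assign` callback are hypothetical stand-ins for the route body and `lqc.assign_linguist`.

```python
from collections.abc import Callable


def partition_bulk_assign(
    target_langs: list[str],
    available: list[str],
    already_assigned: set[str],
    only_unassigned: bool,
    assign: Callable[[str], None],
) -> tuple[list[str], list[str], dict[str, str]]:
    """Return (assigned, skipped, errors) in the same order the route builds them."""
    assigned: list[str] = []
    skipped: list[str] = []
    errors: dict[str, str] = {}
    for lang in target_langs:
        # Unknown languages and, optionally, already-assigned ones are skipped.
        if lang not in available or (only_unassigned and lang in already_assigned):
            skipped.append(lang)
            continue
        try:
            assign(lang)
        except Exception as exc:  # one failure must not abort the whole batch
            errors[lang] = str(exc)
            continue
        assigned.append(lang)
    return assigned, skipped, errors
```

In the route itself, a failed reviewer assignment is recorded under the key `"{lang}:reviewer"` while the language still lands in `assigned`; the sketch leaves that step out.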
# ── Workflow transitions ──────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/start-work", response_model=LanguageQCStateResponse)
async def start_linguist_work(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Linguist opens the language — pending → in_progress."""
state = await lqc.start_linguist_work(db, job_id, lang, current_user)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_START_WORK,
description=f"Linguist started work on language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/submit", response_model=LanguageQCStateResponse)
async def submit_for_review(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Linguist submits — in_progress → pending_review. Notifies reviewer by email."""
state = await lqc.submit_for_review(db, job_id, lang, current_user, http_request=http_request)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_SUBMIT,
description=f"Language '{lang}' submitted for review for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/open-review", response_model=LanguageQCStateResponse)
async def open_review(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Reviewer opens the review — pending_review → in_review."""
state = await lqc.open_review(db, job_id, lang, current_user, http_request=http_request)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_OPEN_REVIEW,
description=f"Reviewer opened review for language '{lang}', job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Approve / Reject / Reopen ─────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/approve", response_model=LanguageQCStateResponse)
async def approve_language(
job_id: str,
lang: str,
request: ApproveLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.approve_language(
db, job_id, lang, current_user, http_request=http_request, notes=request.notes,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_APPROVE,
description=f"Language '{lang}' approved for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reject", response_model=LanguageQCStateResponse)
async def reject_language(
job_id: str,
lang: str,
request: RejectLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reject_language(
db, job_id, lang, current_user, request.notes, category=request.category, http_request=http_request,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REJECT,
description=f"Language '{lang}' rejected for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes, "category": request.category},
)
return LanguageQCStateResponse(lang=lang, state=state)
class MarkCueReviewedRequest(BaseModel):
total_cues: int | None = None # client sends on first call to set total
@router.post("/jobs/{job_id}/languages/{lang}/mark-cue-reviewed", response_model=LanguageQCStateResponse)
async def mark_cue_reviewed(
job_id: str,
lang: str,
request: MarkCueReviewedRequest,
http_request: Request,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Increment reviewed_cues counter; optionally set total_cues on first call."""
job_doc = await db.jobs.find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
inc_op: dict = {f"language_qc.{lang}.reviewed_cues": 1}
set_op: dict = {"updated_at": datetime.utcnow()}
if request.total_cues is not None:
set_op[f"language_qc.{lang}.total_cues"] = request.total_cues
await db.jobs.update_one({"_id": job_id}, {"$inc": inc_op, "$set": set_op})
updated_doc = await db.jobs.find_one({"_id": job_id})
state_dict = (updated_doc.get("language_qc") or {}).get(lang, {})
from ...models.job import LanguageQCState
state = LanguageQCState(**state_dict) if isinstance(state_dict, dict) else LanguageQCState()
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reopen", response_model=LanguageQCStateResponse)
async def reopen_language(
job_id: str,
lang: str,
request: ReopenLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reopen_language(
db, job_id, lang, current_user, http_request=http_request, notes=request.notes,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REOPEN,
description=f"Language '{lang}' reopened for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Comments ──────────────────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/comments", response_model=LanguageQCComment, status_code=201)
async def add_comment(
job_id: str,
lang: str,
request: AddCommentRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PROJECT_MANAGER,
UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
comment = await lqc.add_comment(
db, job_id, lang, current_user, request.body, http_request=http_request,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_COMMENT,
description=f"Comment added to language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "comment_id": str(comment.id) if hasattr(comment, "id") else None},
)
return comment
@router.get("/jobs/{job_id}/languages/{lang}/comments", response_model=list[LanguageQCComment])
async def list_comments(
job_id: str,
lang: str,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PROJECT_MANAGER,
UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.get_state(db, job_id, lang)
if state is None:
return []
return state.comments
# ── Queues ─────────────────────────────────────────────────────────────────────
@router.get("/me/language-qc-queue", response_model=QueueResponse)
async def my_language_qc_queue(
role: str = Query("linguist", description="'linguist' or 'reviewer'"),
qc_status: str | None = Query(None, description="Filter by status"),
skip: int = Query(0, ge=0),
limit: int = Query(50, ge=1, le=200),
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List jobs and languages assigned to the current user as linguist or reviewer."""
# ADMIN sees all orgs; staff scoped to their orgs from JWT claim (MT-18)
org_ids: list[str] | None = None if current_user.role == UserRole.ADMIN else getattr(current_user, "org_ids", None)
if role == "reviewer":
jobs = await lqc.list_for_reviewer(
db, str(current_user.id), accessible_org_ids=org_ids,
status_filter=qc_status, skip=skip, limit=limit,
)
else:
jobs = await lqc.list_for_linguist(
db, str(current_user.id), accessible_org_ids=org_ids,
status_filter=qc_status, skip=skip, limit=limit,
)
items: list[QueueItem] = []
for job in jobs:
job_id = str(job["_id"])
for assignment in job.get("_my_assignments", []):
lang = assignment["lang"]
state_raw = (job.get("language_qc") or {}).get(lang, {})
items.append(QueueItem(
job_id=job_id,
job_title=job.get("title", ""),
job_status=job.get("status", ""),
lang=lang,
lang_qc_status=assignment.get("status", "pending"),
assigned_at=state_raw.get("assigned_at").isoformat() if isinstance(state_raw, dict) and state_raw.get("assigned_at") else None,
reviewed_at=state_raw.get("reviewed_at").isoformat() if isinstance(state_raw, dict) and state_raw.get("reviewed_at") else None,
))
return QueueResponse(items=items, total=len(items))
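
Reviewer note: the docstrings of `start_linguist_work`, `submit_for_review`, and `open_review` spell out three status transitions. The sketch below collects only those three into a lookup table. The approve, reject, and reopen targets are not stated in this file, so they are deliberately left out, and `TRANSITIONS` and `next_status` are hypothetical names.

```python
# Transitions taken from the route docstrings; later steps are not specified there.
TRANSITIONS: dict[str, tuple[str, str]] = {
    "start-work": ("pending", "in_progress"),
    "submit": ("in_progress", "pending_review"),
    "open-review": ("pending_review", "in_review"),
}


def next_status(action: str, current: str) -> str:
    """Return the status an action leads to, or raise if it is not allowed."""
    if action not in TRANSITIONS:
        raise KeyError(f"unknown action: {action}")
    expected, target = TRANSITIONS[action]
    if current != expected:
        raise ValueError(f"{action} requires status {expected!r}, got {current!r}")
    return target
```

Keeping the table in one place makes it easier to check that each endpoint's role list matches the stage it is meant to drive.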


@ -12,25 +12,19 @@ underlying MongoDB collections used by routes_clients.py so both
endpoints coexist without data duplication.
"""
from datetime import UTC, datetime
from datetime import datetime, timezone
from fastapi import APIRouter, Depends, HTTPException, Request
from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel
from ...core.authz import bump_user_membership_cache
from ...core.database import get_database
from ...core.dependencies import get_current_user, require_roles
from ...models.audit_log import AuditAction
from ...models.membership import MemberDetail, MembershipCreate, MembershipUpdate
from ...models.organization import (
Organization,
OrganizationCreate,
OrganizationUpdate,
OrgRole,
)
from ...models.organization import OrgRole, Organization, OrganizationCreate, OrganizationUpdate
from ...models.user import User, UserRole
from ...services.audit_logger import audit_logger
from ...core.authz import bump_user_membership_cache
from ...services.membership_service import (
get_membership,
get_memberships_for_user,
@ -45,7 +39,7 @@ ADMIN_ROLES = [UserRole.ADMIN]
def _now() -> datetime:
return datetime.now(UTC)
return datetime.now(timezone.utc)
# ---------------------------------------------------------------------------
@ -121,7 +115,6 @@ class _OrgCreate(BaseModel):
@router.post("", response_model=Organization, status_code=201)
async def create_organization(
body: OrganizationCreate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -140,25 +133,13 @@ async def create_organization(
"updated_at": now,
}
await db.clients.insert_one(doc)
org = _org_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.ORG_CREATE,
description=f"Organization '{org.name}' created",
user=current_user,
request=request,
resource_type="organization",
resource_id=str(org.id),
resource_name=org.name,
details={"slug": org.slug},
)
return org
return _org_from_doc(doc)
@router.patch("/{org_id}", response_model=Organization)
async def update_organization(
org_id: str,
body: OrganizationUpdate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -175,18 +156,7 @@ async def update_organization(
await db.clients.update_one({"_id": org_id}, {"$set": updates})
updated = {**doc, **updates}
org = _org_from_doc(updated)
await audit_logger.log_action(
action=AuditAction.ORG_UPDATE,
description=f"Organization '{org.name}' updated",
user=current_user,
request=request,
resource_type="organization",
resource_id=str(org.id),
resource_name=org.name,
details={k: v for k, v in updates.items() if k != "updated_at"},
)
return org
return _org_from_doc(updated)
# ---------------------------------------------------------------------------
@ -208,7 +178,6 @@ async def list_members(
async def add_member(
org_id: str,
body: MembershipCreate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -224,15 +193,6 @@ async def add_member(
members = await list_org_members(org_id, db)
for m in members:
if m.user_id == body.user_id:
await audit_logger.log_action(
action=AuditAction.ORG_MEMBER_ADD,
description=f"Member '{body.user_id}' added to organization '{org_id}' with role '{body.role_in_org}'",
user=current_user,
request=request,
resource_type="organization",
resource_id=org_id,
details={"user_id": body.user_id, "role": body.role_in_org},
)
return m
raise HTTPException(status_code=500, detail="Membership created but could not be retrieved")
@ -242,7 +202,6 @@ async def update_member(
org_id: str,
user_id: str,
body: MembershipUpdate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -259,15 +218,6 @@ async def update_member(
members = await list_org_members(org_id, db)
for m in members:
if m.user_id == user_id:
await audit_logger.log_action(
action=AuditAction.ORG_MEMBER_UPDATE,
description=f"Member '{user_id}' role updated in organization '{org_id}' to '{body.role_in_org}'",
user=current_user,
request=request,
resource_type="organization",
resource_id=org_id,
details={"user_id": user_id, "role": body.role_in_org},
)
return m
raise HTTPException(status_code=500, detail="Could not retrieve updated membership")
@ -276,7 +226,6 @@ async def update_member(
async def remove_member(
org_id: str,
user_id: str,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -290,15 +239,6 @@ async def remove_member(
await remove_membership(user_id, org_id, db)
await bump_user_membership_cache(user_id)
await audit_logger.log_action(
action=AuditAction.ORG_MEMBER_REMOVE,
description=f"Member '{user_id}' removed from organization '{org_id}'",
user=current_user,
request=request,
resource_type="organization",
resource_id=org_id,
details={"user_id": user_id, "role": existing.role_in_org},
)
# ---------------------------------------------------------------------------

View file

@ -1,14 +1,14 @@
"""API routes for review notes - timestamped notes on video assets during review."""
from datetime import datetime
from typing import Optional
from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException, Query, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_job_or_403, get_membership_context
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...core.dependencies import get_current_user, require_roles
from ...core.logging import get_logger
from ...models.user import User, UserRole
from ...schemas.review_note import (
@ -25,13 +25,18 @@ router = APIRouter(prefix="/jobs/{job_id}/review-notes", tags=["review-notes"])
@router.get("", response_model=ReviewNotesListResponse)
async def list_review_notes(
job_id: str,
asset_key: str | None = Query(None, description="Filter notes by asset key"),
asset_key: Optional[str] = Query(None, description="Filter notes by asset key"),
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all review notes for a job, optionally filtered by asset key."""
await get_job_or_403(job_id, ctx, db) # org check + existence check
# Verify job exists
job = await db.jobs.find_one({"_id": job_id})
if not job:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Job not found"
)
# Build query
query = {"job_id": job_id}
@ -53,11 +58,16 @@ async def create_review_note(
job_id: str,
request: ReviewNoteCreateRequest,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Create a new review note for a video asset."""
await get_job_or_403(job_id, ctx, db) # org check + existence check
# Verify job exists
job = await db.jobs.find_one({"_id": job_id})
if not job:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Job not found"
)
# Create note document
note_id = str(ObjectId())
@ -86,11 +96,9 @@ async def get_review_note(
job_id: str,
note_id: str,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Get a single review note by ID."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note:
raise HTTPException(
@ -107,11 +115,9 @@ async def update_review_note(
note_id: str,
request: ReviewNoteUpdateRequest,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Update a review note. Only the note owner can update."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note:
raise HTTPException(
@ -145,11 +151,9 @@ async def delete_review_note(
job_id: str,
note_id: str,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Delete a review note. Only the note owner can delete."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note:
raise HTTPException(
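Reviewer note: the hunks above differ in whether a review-note route only verifies that the job exists or also verifies org membership through `get_job_or_403`. A minimal sketch of the difference, assuming the membership rule matches `_can_access_org` from the WebSocket module elsewhere in this comparison (the exact legacy-job handling inside `get_job_or_403` is not fully visible here, so `job_access_status` is a hypothetical helper, not the project's code):

```python
from typing import Optional


def can_access_org(org_id: Optional[str], memberships: Optional[dict]) -> bool:
    """Mirror of the rule shown in `_can_access_org`."""
    if memberships is None:
        return True  # platform admin: unrestricted
    if not org_id:
        return True  # legacy job without an org (assumed permissive here)
    return org_id in memberships


def job_access_status(job: Optional[dict], memberships: Optional[dict]) -> int:
    """Hypothetical helper: HTTP status an org-aware check would produce."""
    if job is None:
        return 404  # existence check (both variants do this)
    if not can_access_org(job.get("organization_id"), memberships):
        return 403  # org check (only the get_job_or_403 variant does this)
    return 200
```

With the existence-only variant, a user who belongs to org `b` can still read notes on a job owned by org `a`; the org-aware variant would answer 403 in that case.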

View file

@ -1,354 +0,0 @@
"""Share-token endpoints — create/revoke/list tokens + public read-only view + client decision."""
import secrets
from datetime import datetime, timedelta
from typing import Literal
from fastapi import APIRouter, Depends, HTTPException, Request
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel
from ...core.config import settings
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.share_token import ShareTokenResponse
from ...models.user import User, UserRole
from ...services.audit_logger import audit_logger
from ...services.gcs import get_signed_download_url
router = APIRouter(tags=["share"])
_TOKENS = "share_tokens"
_JOBS = "jobs"
def _share_url(token: str) -> str:
return f"{settings.app_url}/share/{token}"
# ── Request schemas ───────────────────────────────────────────────────────────
class CreateShareTokenRequest(BaseModel):
expires_in_days: int | None = 30 # None = no expiry
label: str | None = None
class ShareTokenListResponse(BaseModel):
tokens: list[ShareTokenResponse]
class PublicJobPreviewLanguage(BaseModel):
captions_vtt_url: str | None = None
audio_description_vtt_url: str | None = None
accessible_video_mp4_url: str | None = None
audio_description_mp3_url: str | None = None
class PublicJobPreviewResponse(BaseModel):
job_id: str
job_title: str
job_status: str
source_language: str
languages: list[str]
language_outputs: dict[str, PublicJobPreviewLanguage]
class ClientDecisionRequest(BaseModel):
action: Literal["approve", "reject"]
notes: str | None = None
client_name: str | None = None
class ClientDecisionResponse(BaseModel):
status: str
new_job_status: str
# ── Authenticated routes ──────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/share", response_model=ShareTokenResponse, status_code=201)
async def create_share_token(
job_id: str,
request: CreateShareTokenRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Generate a read-only share link for a job."""
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
token_id = secrets.token_hex(32)
now = datetime.utcnow()
expires_at = (now + timedelta(days=request.expires_in_days)) if request.expires_in_days else None
token_doc = {
"_id": token_id,
"job_id": job_id,
"organization_id": job_doc.get("organization_id", ""),
"created_by_user_id": str(current_user.id),
"created_by_email": current_user.email,
"created_at": now,
"expires_at": expires_at,
"is_active": True,
"label": request.label,
}
await db[_TOKENS].insert_one(token_doc)
await audit_logger.log_action(
action=AuditAction.SHARE_TOKEN_CREATE,
description=f"Share token created for job '{job_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"token_id": token_id, "label": request.label, "expires_in_days": request.expires_in_days},
)
return ShareTokenResponse(
id=token_id,
job_id=job_id,
created_by_email=current_user.email,
created_at=now,
expires_at=expires_at,
is_active=True,
label=request.label,
share_url=_share_url(token_id),
)
@router.get("/jobs/{job_id}/share", response_model=ShareTokenListResponse)
async def list_share_tokens(
job_id: str,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all active share tokens for a job."""
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
cursor = db[_TOKENS].find({"job_id": job_id, "is_active": True})
tokens = []
async for doc in cursor:
tokens.append(ShareTokenResponse(
id=doc["_id"],
job_id=doc["job_id"],
created_by_email=doc["created_by_email"],
created_at=doc["created_at"],
expires_at=doc.get("expires_at"),
is_active=doc["is_active"],
label=doc.get("label"),
share_url=_share_url(doc["_id"]),
))
return ShareTokenListResponse(tokens=tokens)
@router.delete("/jobs/{job_id}/share/{token_id}", status_code=204)
async def revoke_share_token(
job_id: str,
token_id: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Revoke (deactivate) a share token."""
result = await db[_TOKENS].update_one(
{"_id": token_id, "job_id": job_id},
{"$set": {"is_active": False}},
)
if result.matched_count == 0:
raise HTTPException(status_code=404, detail="Token not found")
await audit_logger.log_action(
action=AuditAction.SHARE_TOKEN_REVOKE,
description=f"Share token '{token_id}' revoked for job '{job_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"token_id": token_id},
)
# ── Public route (no auth) ────────────────────────────────────────────────────
@router.get("/public/share/{token}", response_model=PublicJobPreviewResponse)
async def get_public_job_preview(
token: str,
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Return read-only job preview for a valid share token. No authentication required."""
token_doc = await db[_TOKENS].find_one({"_id": token, "is_active": True})
if not token_doc:
raise HTTPException(status_code=404, detail="Share link not found or has been revoked")
if token_doc.get("expires_at") and token_doc["expires_at"] < datetime.utcnow():
raise HTTPException(status_code=410, detail="Share link has expired")
job_doc = await db[_JOBS].find_one({"_id": token_doc["job_id"]})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
outputs = job_doc.get("outputs") or {}
language_outputs: dict[str, PublicJobPreviewLanguage] = {}
for lang, lang_output in outputs.items():
if not isinstance(lang_output, dict):
continue
lang_data = PublicJobPreviewLanguage()
if "captions_vtt_gcs" in lang_output:
blob_path = lang_output["captions_vtt_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.captions_vtt_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
if "ad_vtt_gcs" in lang_output:
blob_path = lang_output["ad_vtt_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.audio_description_vtt_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
if "ad_mp3_gcs" in lang_output:
blob_path = lang_output["ad_mp3_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.audio_description_mp3_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
if "accessible_video_gcs" in lang_output:
blob_path = lang_output["accessible_video_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.accessible_video_mp4_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
language_outputs[lang] = lang_data
return PublicJobPreviewResponse(
job_id=str(job_doc["_id"]),
job_title=job_doc.get("title", "Untitled"),
job_status=job_doc.get("status", ""),
source_language=job_doc.get("source", {}).get("language", "en"),
languages=list(outputs.keys()),
language_outputs=language_outputs,
)
@router.post("/public/share/{token}/decision", response_model=ClientDecisionResponse)
async def client_decision(
token: str,
request: ClientDecisionRequest,
http_request: Request,
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Submit client approval or rejection via a share link. No authentication required."""
from ...services.validation import asset_validation_service
token_doc = await db[_TOKENS].find_one({"_id": token, "is_active": True})
if not token_doc:
raise HTTPException(status_code=404, detail="Share link not found or has been revoked")
if token_doc.get("expires_at") and token_doc["expires_at"] < datetime.utcnow():
raise HTTPException(status_code=410, detail="Share link has expired")
job_id = token_doc["job_id"]
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
if job_doc.get("status") != "pending_final_review":
raise HTTPException(
status_code=409,
detail="This job is not currently awaiting client review"
)
now = datetime.utcnow()
by_label = f"client:{request.client_name or 'anonymous'} (share/{token[:8]})"
if request.action == "approve":
is_valid, validation_errors = await asset_validation_service.validate_job_assets(job_doc)
if not is_valid:
raise HTTPException(
status_code=400,
detail=f"Asset validation failed: {'; '.join(validation_errors)}"
)
new_status = "completed"
update = {
"$set": {
"status": new_status,
"review.notes": request.notes or "",
"updated_at": now,
},
"$push": {
"review.history": {
"at": now,
"status": new_status,
"by": by_label,
"notes": request.notes or "",
}
},
}
else:
new_status = "qc_feedback"
update = {
"$set": {
"status": new_status,
"review.notes": request.notes or "",
"review.reviewer_id": by_label,
"updated_at": now,
},
"$push": {
"review.history": {
"at": now,
"status": new_status,
"by": by_label,
"notes": request.notes or "",
}
},
}
result = await db[_JOBS].find_one_and_update(
{"_id": job_id, "status": "pending_final_review"},
update,
return_document=True,
)
if not result:
raise HTTPException(
status_code=409,
detail="Decision could not be submitted — the job status may have changed"
)
await audit_logger.log_action(
action=AuditAction.SHARE_CLIENT_DECISION,
description=f"Client '{request.client_name or 'anonymous'}' submitted decision '{request.action}' for job '{job_id}' via share token",
user=None,
request=http_request,
resource_type="job",
resource_id=job_id,
details={
"action": request.action,
"token": token,
"client_name": request.client_name,
"new_status": new_status,
"notes": request.notes,
},
)
if request.action == "approve":
try:
from ...tasks.notify import notify_client_task
notify_client_task.delay(job_id)
except Exception:
pass
return ClientDecisionResponse(status="ok", new_job_status=new_status)
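Reviewer note: both public routes in this file apply the same gate to a share token: a missing or revoked token yields 404, an expired one yields 410. The sketch below restates that logic as pure functions so it can be checked in isolation. The names `compute_expiry` and `share_link_status` are illustrative and do not appear in the source.

```python
import secrets
from datetime import datetime, timedelta
from typing import Optional


def new_token_id() -> str:
    # Same call the route uses: 32 random bytes, hex-encoded (64 characters).
    return secrets.token_hex(32)


def compute_expiry(now: datetime, expires_in_days: Optional[int]) -> Optional[datetime]:
    # Mirrors the truthiness test in create_share_token: None and 0 both mean "no expiry".
    return (now + timedelta(days=expires_in_days)) if expires_in_days else None


def share_link_status(token_doc: Optional[dict], now: datetime) -> int:
    # The route queries with is_active=True, so a revoked token looks like a missing one.
    if token_doc is None or not token_doc.get("is_active"):
        return 404
    expires_at = token_doc.get("expires_at")
    if expires_at and expires_at < now:
        return 410
    return 200
```

One consequence worth checking against intent: because the expiry expression relies on truthiness, `expires_in_days=0` produces a token that never expires rather than one that expires immediately.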

View file

@ -1,18 +1,18 @@
import asyncio
import time
from typing import Literal
from typing import Literal, Optional
from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi.responses import Response
from pydantic import BaseModel, Field
from ...core.config import settings
from ...core.dependencies import get_current_user
from ...core.logging import get_logger
from ...services import cost_tracker
from ...services.elevenlabs_voices import elevenlabs_voice_service
from ...services.gemini_tts import gemini_tts_service
from ...services.elevenlabs_voices import elevenlabs_voice_service
from ...services.tts import tts_service
from ...services import cost_tracker
from ...core.dependencies import get_current_user
logger = get_logger(__name__)
@ -30,20 +30,20 @@ class VoicePreviewRequest(BaseModel):
style_preset: Literal[
"neutral", "calm", "energetic", "professional", "warm", "documentary", "custom"
] = "neutral"
custom_style_prompt: str | None = None
custom_style_prompt: Optional[str] = None
# ElevenLabs-specific
stability: float | None = Field(default=None, ge=0.0, le=1.0)
similarity_boost: float | None = Field(default=None, ge=0.0, le=1.0)
stability: Optional[float] = Field(default=None, ge=0.0, le=1.0)
similarity_boost: Optional[float] = Field(default=None, ge=0.0, le=1.0)
class VoiceInfo(BaseModel):
"""Structured voice information for any provider."""
id: str
name: str
description: str | None = None
preview_url: str | None = None
labels: dict[str, str] | None = None
category: str | None = None
description: Optional[str] = None
preview_url: Optional[str] = None
labels: Optional[dict[str, str]] = None
category: Optional[str] = None
class ProviderVoicesResponse(BaseModel):
@ -52,7 +52,7 @@ class ProviderVoicesResponse(BaseModel):
voices: list[VoiceInfo]
default: str
available: bool = True
error: str | None = None
error: Optional[str] = None
class LanguagesResponse(BaseModel):
@ -87,12 +87,12 @@ class ProviderOptionsResponse(BaseModel):
"""Available TTS configuration options for a provider."""
provider: str
# Gemini-specific
models: list[TTSOptionItem] | None = None
style_presets: list[TTSOptionItem] | None = None
speed_range: SpeedRange | None = None
models: Optional[list[TTSOptionItem]] = None
style_presets: Optional[list[TTSOptionItem]] = None
speed_range: Optional[SpeedRange] = None
# ElevenLabs-specific
stability_range: FloatRange | None = None
similarity_boost_range: FloatRange | None = None
stability_range: Optional[FloatRange] = None
similarity_boost_range: Optional[FloatRange] = None
@router.get("/voices", response_model=ProviderVoicesResponse)
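Reviewer note: the hunks in this file only switch annotations between the `X | None` and `Optional[X]` spellings. As far as typing is concerned, the two describe the same union; the practical difference is that `X | None` is evaluated at runtime only on Python 3.10 or newer. A small check, guarded by version so it also runs on older interpreters:

```python
import sys
from typing import Optional, get_args


def union_members(annotation) -> set:
    """Return the member types of a union annotation as a set."""
    return set(get_args(annotation))


optional_members = union_members(Optional[str])

if sys.version_info >= (3, 10):
    # PEP 604 syntax; raises TypeError at runtime on Python 3.9 and earlier.
    pipe_members = union_members(str | None)
else:
    pipe_members = optional_members
```

If the target runtime is 3.10+, either spelling should behave the same in the Pydantic models above, so the change is stylistic.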

View file

@ -1,151 +0,0 @@
"""VTT version control endpoints."""
from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_job_or_403, get_membership_context
from ...core.config import settings
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.user import User, UserRole
from ...models.vtt_version import (
VttDiffResponse,
VttKind,
VttVersionListResponse,
VttVersionSummary,
)
from ...services import vtt_versioning
from ...services.audit_logger import audit_logger
from ...services.gcs import gcs_service
router = APIRouter(prefix="/jobs", tags=["vtt-versions"])
_EDITABLE_ROLES = (UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)
@router.get("/{job_id}/vtt/versions", response_model=VttVersionListResponse)
async def list_vtt_versions(
job_id: str,
lang: str = Query(...),
kind: VttKind = Query(...),
skip: int = Query(0, ge=0),
limit: int = Query(50, ge=1, le=200),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all VTT versions for a job/lang/kind, newest first."""
await get_job_or_403(job_id, ctx, db) # org check
return await vtt_versioning.list_versions(db, job_id, lang, kind, skip, limit)
@router.get("/{job_id}/vtt/versions/{version}", response_model=dict)
async def get_vtt_version(
job_id: str,
version: int,
lang: str = Query(...),
kind: VttKind = Query(...),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Get full VTT content for a specific version."""
await get_job_or_403(job_id, ctx, db) # org check
v = await vtt_versioning.get_version(db, job_id, lang, kind, version)
if not v:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Version not found")
return {
"job_id": v.job_id,
"lang": v.lang,
"kind": v.kind,
"version": v.version,
"content": v.content,
"gcs_uri": v.gcs_uri,
"created_at": v.created_at.isoformat(),
"created_by": v.created_by.dict(),
"note": v.note,
"parent_version": v.parent_version,
"cue_count": v.cue_count,
"byte_size": v.byte_size,
}
@router.get("/{job_id}/vtt/versions/diff", response_model=VttDiffResponse)
async def diff_vtt_versions(
job_id: str,
lang: str = Query(...),
kind: VttKind = Query(...),
from_version: int = Query(..., alias="from"),
to_version: int = Query(..., alias="to"),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Line-level diff between two versions of a VTT file."""
await get_job_or_403(job_id, ctx, db) # org check
v_from = await vtt_versioning.get_version(db, job_id, lang, kind, from_version)
v_to = await vtt_versioning.get_version(db, job_id, lang, kind, to_version)
if not v_from:
raise HTTPException(status_code=404, detail=f"Version {from_version} not found")
if not v_to:
raise HTTPException(status_code=404, detail=f"Version {to_version} not found")
return vtt_versioning.diff_versions(job_id, lang, kind, v_from, v_to)
@router.post(
"/{job_id}/vtt/versions/{version}/restore",
response_model=VttVersionSummary,
status_code=status.HTTP_201_CREATED,
)
async def restore_vtt_version(
job_id: str,
version: int,
lang: str = Query(...),
kind: VttKind = Query(...),
http_request: Request = None,
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""
Restore a previous version as the new live VTT.
Non-destructive: creates a new version entry whose content mirrors the old one,
then overwrites the live GCS file.
"""
await get_job_or_403(job_id, ctx, db) # org check
src = await vtt_versioning.get_version(db, job_id, lang, kind, version)
if not src:
raise HTTPException(status_code=404, detail="Version not found")
# Create new version snapshot (this also bumps the counter)
new_ver = await vtt_versioning.restore_version(db, job_id, lang, kind, version, current_user)
# Overwrite the live file in GCS so the QC editor sees the restored content
live_path = f"{job_id}/{lang}/{'captions' if kind == 'captions' else 'ad'}.vtt"
try:
await gcs_service.upload_text_to_gcs(src.content, live_path, "text/vtt")
except Exception as exc:
raise HTTPException(
status_code=500,
detail=f"Version snapshot created (v{new_ver.version}) but live file update failed: {exc}",
) from None
# Update the GCS URI pointer in the job document
gcs_uri_key = "captions_vtt_gcs" if kind == "captions" else "ad_vtt_gcs"
new_gcs_uri = f"gs://{settings.gcs_bucket}/{live_path}"
await db.jobs.update_one(
{"_id": job_id},
{"$set": {f"outputs.{lang}.{gcs_uri_key}": new_gcs_uri}},
)
await audit_logger.log_action(
action=AuditAction.VTT_EDIT,
description=f"VTT restored to v{version} for job {job_id} lang={lang} kind={kind}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "kind": kind, "restored_from_version": version, "new_version": new_ver.version},
)
return new_ver
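Reviewer note: the restore docstring describes a non-destructive restore, where restoring version N appends a new version whose content mirrors N instead of rewinding history. The sketch below shows that idea with an in-memory store. `Version` and `VersionStore` are hypothetical stand-ins, and recording the source version in `parent_version` is an assumption, since the body of `vtt_versioning.restore_version` is not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    version: int
    content: str
    parent_version: Optional[int] = None


class VersionStore:
    """Append-only version history (illustrative, in-memory)."""

    def __init__(self) -> None:
        self._versions: List[Version] = []

    def save(self, content: str, parent_version: Optional[int] = None) -> Version:
        v = Version(len(self._versions) + 1, content, parent_version)
        self._versions.append(v)
        return v

    def get(self, version: int) -> Optional[Version]:
        for v in self._versions:
            if v.version == version:
                return v
        return None

    def restore(self, version: int) -> Version:
        src = self.get(version)
        if src is None:
            raise KeyError(version)
        # Nothing is deleted: the old content is re-saved as the newest version.
        return self.save(src.content, parent_version=version)

    def history(self) -> List[int]:
        return [v.version for v in self._versions]
```

The route above performs the snapshot first and the live-file overwrite second, so a failed upload leaves a new version entry without a matching live file; the 500 response text in the source reports exactly that state.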

View file

@ -5,146 +5,107 @@ Provides WebSocket endpoints for:
1. Individual job status updates: /ws/jobs/{job_id}
2. Job list updates: /ws/jobs (all jobs for authenticated user)
"""
import asyncio
import logging
from typing import Optional
from fastapi import (
APIRouter,
Depends,
Query,
WebSocket,
WebSocketDisconnect,
)
from fastapi import APIRouter, WebSocket, WebSocketDisconnect, HTTPException, Depends, Query
from fastapi.security import HTTPBearer
from ...core.authz import PLATFORM_ADMIN_ROLES, _cached_memberships
from ...core.database import get_database
from ...models.user import UserRole
from ...services.websocket import (
ConnectionManager,
authenticate_websocket,
connection_manager,
authenticate_websocket,
get_connection_manager,
ConnectionManager
)
from ...models.job import Job
from ...core.database import get_database
from ...core.dependencies import get_current_user
logger = logging.getLogger(__name__)
router = APIRouter(tags=["WebSocket"])
security = HTTPBearer()
# Close codes that indicate a permanent auth/permission failure — frontend must NOT retry
_TERMINAL_CLOSE_CODES = {4001, 4003, 4004, 4403}
# Seconds between server-side keepalive frames.
# Must be < Apache mod_proxy_wstunnel idle timeout.
# Mod Comms incident 2026-03-18: 25s was insufficient; 20s is safe.
_KEEPALIVE_INTERVAL_S = 20
async def _resolve_user_and_org(websocket: WebSocket, user_id: str, db):
"""
Fetch user document and resolve org memberships from cache.
Returns (user_doc, memberships_dict) or closes the socket and returns (None, None).
"""
user = await db["users"].find_one({"_id": user_id})
if not user:
try:
from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass
if not user:
await websocket.close(code=4001, reason="User not found")
return None, None
is_platform_admin = UserRole(user.get("role", "")) in PLATFORM_ADMIN_ROLES
if is_platform_admin:
return user, None # None memberships = unrestricted
memberships = await _cached_memberships(user_id, db)
return user, memberships
def _can_access_org(org_id: str | None, memberships: dict | None) -> bool:
"""Return True if user (with these memberships) may access the given org_id."""
if memberships is None:
return True # platform admin
if not org_id:
return True # legacy job without org: allow (further checks done below if needed)
return org_id in memberships
@router.websocket("/ws/jobs/{job_id}")
async def websocket_job_status(
websocket: WebSocket,
job_id: str,
token: str | None = Query(None),
token: Optional[str] = Query(None),
manager: ConnectionManager = Depends(get_connection_manager)
):
"""
WebSocket endpoint for real-time job status updates.
WebSocket endpoint for real-time job status updates
Usage:
- Connect: ws://localhost:8000/api/v1/ws/jobs/{job_id}?token={jwt_token}
- Receives: Real-time status updates for the specific job
Close codes:
4001 user not found
4003 role-based access denied
4004 job not found
4403 org membership access denied (do not retry)
Message format:
{
"type": "job_status_update",
"data": {
"job_id": "...",
"status": "processing",
"updated_at": "2023-...",
"message": "Processing video...",
"progress": 45
}
}
"""
# Authenticate the WebSocket connection
user_id = await authenticate_websocket(websocket, token)
if not user_id:
return
try:
# Verify user has access to this job
db = await get_database()
job = await db["jobs"].find_one({"_id": job_id})
jobs_collection = db["jobs"]
job = await jobs_collection.find_one({"_id": job_id})
if not job:
await websocket.close(code=4004, reason="Job not found")
return
user, memberships = await _resolve_user_and_org(websocket, user_id, db)
if user is None:
return # socket already closed inside helper
# Role-based client restriction
# Check permissions - users can only access their own jobs unless they're admin/reviewer
user = await db["users"].find_one({"_id": user_id})
if not user:
try:
from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass # Invalid ObjectId format
if not user:
await websocket.close(code=4001, reason="User not found")
return
# Check access permissions
if user["role"] == "client" and job.get("created_by") != user_id:
await websocket.close(code=4003, reason="Access denied")
return
# Org membership check
job_org = job.get("organization_id")
if not _can_access_org(job_org, memberships):
await websocket.close(code=4403, reason="Org access denied")
return
# Connect to job status updates
await manager.connect_job_status(websocket, user_id, job_id)
# Keep connection alive and handle incoming messages
while True:
try:
# Wait up to _KEEPALIVE_INTERVAL_S for a client message.
# On timeout send a keepalive frame so the proxy idle timer resets.
message = await asyncio.wait_for(
websocket.receive_text(),
timeout=_KEEPALIVE_INTERVAL_S,
)
# Wait for incoming WebSocket messages (for heartbeat, etc.)
message = await websocket.receive_text()
logger.debug(f"Received WebSocket message from user {user_id}: {message}")
# Handle heartbeat or other client messages if needed
if message == "ping":
await websocket.send_text("pong")
except TimeoutError:
await websocket.send_text("keepalive")
except WebSocketDisconnect:
break
except Exception as e:
logger.error(f"Error in WebSocket message handling: {e}")
break
except WebSocketDisconnect:
pass
except Exception as e:
@ -156,54 +117,75 @@ async def websocket_job_status(
@router.websocket("/ws/jobs")
async def websocket_job_list(
websocket: WebSocket,
token: str | None = Query(None),
token: Optional[str] = Query(None),
manager: ConnectionManager = Depends(get_connection_manager)
):
"""
WebSocket endpoint for real-time job list updates.
WebSocket endpoint for real-time job list updates
Usage:
- Connect: ws://localhost:8000/api/v1/ws/jobs?token={jwt_token}
- Receives: Real-time status updates for all jobs the user can access
Only events for jobs in the user's accessible orgs are delivered.
Message format:
{
"type": "job_list_update",
"data": {
"job_id": "...",
"status": "processing",
"updated_at": "2023-...",
"message": "Processing video...",
"progress": 45
}
}
"""
# Authenticate the WebSocket connection
user_id = await authenticate_websocket(websocket, token)
if not user_id:
return
try:
# Verify user exists
logger.info(f"WebSocket: Looking up user {user_id} in database")
db = await get_database()
user, memberships = await _resolve_user_and_org(websocket, user_id, db)
if user is None:
return # socket already closed inside helper
# Try looking up user by string ID first, then by ObjectId
user = await db["users"].find_one({"_id": user_id})
if not user:
try:
from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass # Invalid ObjectId format
if not user:
logger.warning(f"WebSocket: User {user_id} not found in database (tried both string and ObjectId)")
await websocket.close(code=4001, reason="User not found")
return
logger.info(f"WebSocket: User {user_id} found, role: {user.get('role', 'unknown')}")
accessible_org_ids = None if memberships is None else list(memberships.keys())
await manager.connect_job_list(websocket, user_id, accessible_org_ids=accessible_org_ids)
logger.info(f"WebSocket: User {user_id} found, connecting to job list updates")
# Connect to job list updates
await manager.connect_job_list(websocket, user_id)
# Keep connection alive and handle incoming messages
while True:
try:
message = await asyncio.wait_for(
websocket.receive_text(),
timeout=_KEEPALIVE_INTERVAL_S,
)
# Wait for incoming WebSocket messages
message = await websocket.receive_text()
logger.debug(f"Received WebSocket message from user {user_id}: {message}")
# Handle heartbeat or other client messages if needed
if message == "ping":
await websocket.send_text("pong")
except TimeoutError:
await websocket.send_text("keepalive")
except WebSocketDisconnect:
break
except Exception as e:
logger.error(f"Error in WebSocket message handling: {e}")
break
except WebSocketDisconnect:
pass
except Exception as e:
@ -214,15 +196,19 @@ async def websocket_job_list(
@router.get("/ws/status")
async def websocket_status():
"""Get WebSocket connection status and statistics (debug/monitoring)."""
"""
Get WebSocket connection status and statistics
Useful for debugging and monitoring
"""
stats = {
"active_connections": len(connection_manager.active_connections),
"job_subscriptions": len(connection_manager.job_subscriptions),
"global_subscriptions": len(connection_manager.global_subscriptions),
"redis_connected": connection_manager.redis_client is not None,
"subscriber_running": (
connection_manager.subscriber_task is not None and
connection_manager.subscriber_task is not None and
not connection_manager.subscriber_task.done()
)
}
return stats
return stats
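Reviewer note: one side of this comparison wraps `receive_text()` in `asyncio.wait_for` so the server emits a `keepalive` frame whenever the client stays silent for `_KEEPALIVE_INTERVAL_S` seconds; the other side blocks on `receive_text()` indefinitely. The sketch below reproduces the keepalive loop against plain callables instead of a real WebSocket. `serve` and `demo` are illustrative names, and `max_frames` exists only so the example terminates.

```python
import asyncio
from typing import Awaitable, Callable, List


async def serve(
    receive: Callable[[], Awaitable[str]],
    send: Callable[[str], Awaitable[None]],
    keepalive_interval_s: float,
    max_frames: int,
) -> None:
    sent = 0
    while sent < max_frames:
        try:
            message = await asyncio.wait_for(receive(), timeout=keepalive_interval_s)
        except asyncio.TimeoutError:
            # Client was idle: send a frame so a proxy idle timer resets.
            await send("keepalive")
            sent += 1
            continue
        if message == "ping":
            await send("pong")
            sent += 1


async def demo() -> List[str]:
    inbox: asyncio.Queue = asyncio.Queue()
    inbox.put_nowait("ping")  # one client message, then silence
    frames: List[str] = []

    async def send(frame: str) -> None:
        frames.append(frame)

    await serve(inbox.get, send, keepalive_interval_s=0.01, max_frames=2)
    return frames
```

The sketch catches `asyncio.TimeoutError` rather than the builtin `TimeoutError` used in the hunk above. The two are the same class only from Python 3.11 onward, so on an older runtime the builtin form would fall through to the generic `except Exception` branch and end the loop.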

Binary file not shown (5 files).

View file

@ -11,6 +11,7 @@ Provides:
import json
from dataclasses import dataclass
from typing import Optional
from fastapi import Depends, HTTPException, status
from motor.motor_asyncio import AsyncIOMotorDatabase
@ -63,10 +64,10 @@ async def _cached_memberships(
db: AsyncIOMotorDatabase,
) -> dict[str, OrgRole]:
"""Load memberships, with Redis cache (60s TTL)."""
cache_key = f"mem:user:{user_id}"
try:
redis = await get_redis()
redis = get_redis()
if redis:
cache_key = f"mem:user:{user_id}"
cached = await redis.get(cache_key)
if cached:
raw = json.loads(cached)
@ -77,7 +78,7 @@ async def _cached_memberships(
memberships = await _load_memberships(user_id, db)
try:
redis = await get_redis()
redis = get_redis()
if redis:
await redis.setex(
cache_key,
@ -158,7 +159,7 @@ class OrgScopedQuery:
def filter(
self,
base_query: dict,
org_id: str | None = None,
org_id: Optional[str] = None,
org_field: str = "organization_id",
) -> dict:
if self.ctx.is_platform_admin:
@ -182,50 +183,6 @@ class OrgScopedQuery:
return {**base_query, org_field: {"$in": accessible}}
def assert_user_in_org(
ctx: "MembershipContext",
org_id: str,
min_role: OrgRole = OrgRole.VIEWER,
) -> None:
"""Raise 403 if ctx user does not have min_role in org_id. Platform admins always pass."""
if not ctx.can_access_org(org_id, min_role):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access to this organization is not permitted",
)
async def get_job_or_403(
job_id: str,
ctx: "MembershipContext",
db: AsyncIOMotorDatabase,
) -> dict:
"""Load job document and verify ctx user can access its organization. Returns 404 for missing jobs."""
job_doc = await db.jobs.find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
org_id = job_doc.get("organization_id")
if not org_id:
# Legacy job without org: try resolving via project
project_id = job_doc.get("project_id")
if project_id:
project = await db.projects.find_one({"_id": project_id}, {"client_id": 1})
if project:
org_id = project.get("client_id")
if org_id:
if not ctx.can_access_org(org_id):
# Return 404 to avoid leaking existence of cross-org jobs
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
else:
# Truly legacy job (no project, no org): only the original uploader or admin can access
if not ctx.is_platform_admin and job_doc.get("client_id") != str(ctx.user.id):
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
return job_doc
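Note on the membership cache: `_cached_memberships` reads through Redis with a 60s TTL, and `bump_user_membership_cache` invalidates the entry on writes. A minimal cache-aside sketch under stated assumptions follows: `FakeTTLCache` is a hypothetical in-memory stand-in for the Redis client, and `cached_memberships` and `loader` are illustrative names, not the project's functions. The cache key is computed before any cache access so it is available for both the read and the write-back.

```python
import asyncio
import json
import time
from typing import Optional


class FakeTTLCache:
    """In-memory stand-in for a Redis client (get/setex/delete only)."""

    def __init__(self) -> None:
        self._data: dict = {}

    async def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    async def setex(self, key: str, ttl_s: int, value: str) -> None:
        self._data[key] = (time.monotonic() + ttl_s, value)

    async def delete(self, key: str) -> None:
        self._data.pop(key, None)


async def cached_memberships(user_id: str, cache, loader, ttl_s: int = 60) -> dict:
    key = f"mem:user:{user_id}"  # defined once, used for read and write-back
    cached = await cache.get(key)
    if cached is not None:
        return json.loads(cached)
    memberships = await loader(user_id)
    await cache.setex(key, ttl_s, json.dumps(memberships))
    return memberships


async def demo():
    calls = 0

    async def loader(user_id: str) -> dict:
        nonlocal calls
        calls += 1
        return {"org-1": "viewer"}

    cache = FakeTTLCache()
    first = await cached_memberships("u1", cache, loader)
    second = await cached_memberships("u1", cache, loader)  # served from cache
    await cache.delete("mem:user:u1")                       # invalidate on write
    await cached_memberships("u1", cache, loader)           # loader runs again
    return first, second, calls
```

The trade-off is the usual one for TTL caches: a membership change can stay invisible for up to the TTL unless every write path also invalidates the key.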
async def bump_user_membership_cache(user_id: str) -> None:
"""Invalidate the Redis membership cache for a user (call on any membership write)."""
try:

View file

@ -6,7 +6,6 @@ class Settings(BaseSettings):
# App
app_env: str = "dev"
api_base_url: str = "http://localhost:8000"
app_url: str = "https://optical-dev.oliver.solutions/video-accessibility"
# Auth
jwt_secret: str
@ -23,14 +22,13 @@ class Settings(BaseSettings):
# Redis
redis_url: str
# Celery
celery_broker_url: str = ""
celery_result_backend: str = ""
# GCP
gcp_project_id: str
gcp_location: str = "us-central1"
gcs_bucket: str = "accessible-video"
google_application_credentials: str = ""
@ -38,7 +36,7 @@ class Settings(BaseSettings):
gemini_api_key: str
elevenlabs_api_key: str = ""
google_tts_credentials: str = ""
# TTS Voice Configuration
tts_provider: str = "gemini" # "gemini", "google", or "elevenlabs"
google_tts_voices: dict[str, str] = {
@ -52,7 +50,7 @@ class Settings(BaseSettings):
elevenlabs_voices: dict[str, str] = {}
# Gemini TTS Configuration
gemini_tts_model: str = "gemini-3.1-flash-tts-preview"
gemini_tts_model: str = "gemini-2.5-flash-preview-tts"
gemini_tts_default_voice: str = "Kore"
gemini_tts_voices: list[str] = [
"Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede",
@ -95,24 +93,7 @@ class Settings(BaseSettings):
"sv": "sv-SE",
"es-419": "es-US",
"pt-BR": "pt-BR",
"fr-CA": "fr-CA",
# Explicit region variants (added for locale-aware glossary support)
"de-DE": "de-DE",
"en-US": "en-US",
"en-GB": "en-GB",
"en-CA": "en-CA",
"es-ES": "es-ES",
"es-MX": "es-US",
"fr-FR": "fr-FR",
"it-IT": "it-IT",
"ja-JP": "ja-JP",
"ko-KR": "ko-KR",
"nl-NL": "nl-NL",
"pl-PL": "pl-PL",
"cs-CZ": "cs-CZ",
"tr-TR": "tr-TR",
"id-ID": "id-ID",
"pt-PT": "pt-PT",
"fr-CA": "fr-CA"
}
gemini_tts_language_names: dict[str, str] = {
"en": "English",
@ -148,24 +129,7 @@ class Settings(BaseSettings):
"sv": "Swedish",
"es-419": "Spanish (Latin America)",
"pt-BR": "Portuguese (Brazil)",
"fr-CA": "French (Canada)",
# Explicit region variants
"de-DE": "German (Germany)",
"en-US": "English (US)",
"en-GB": "English (UK)",
"en-CA": "English (Canada)",
"es-ES": "Spanish (Spain)",
"es-MX": "Spanish (Mexico)",
"fr-FR": "French (France)",
"it-IT": "Italian (Italy)",
"ja-JP": "Japanese (Japan)",
"ko-KR": "Korean (Korea)",
"nl-NL": "Dutch (Netherlands)",
"pl-PL": "Polish (Poland)",
"cs-CZ": "Czech (Czech Republic)",
"tr-TR": "Turkish (Turkey)",
"id-ID": "Indonesian (Indonesia)",
"pt-PT": "Portuguese (Portugal)",
"fr-CA": "French (Canada)"
}
gemini_tts_preview_samples: dict[str, str] = {
"en": "This is a preview of the audio description voice.",
@ -201,30 +165,13 @@ class Settings(BaseSettings):
"sv": "Det här är en förhandsgranskning av ljudbeskrivningsrösten.",
"es-419": "Esta es una vista previa de la voz de audiodescripción.",
"pt-BR": "Esta é uma prévia da voz da audiodescrição.",
"fr-CA": "Ceci est un aperçu de la voix de l'audiodescription.",
# Explicit region variants
"de-DE": "Dies ist eine Vorschau der Audiodeskriptionsstimme.",
"en-US": "This is a preview of the audio description voice.",
"en-GB": "This is a preview of the audio description voice.",
"en-CA": "This is a preview of the audio description voice.",
"es-ES": "Esta es una vista previa de la voz de audiodescripción.",
"es-MX": "Esta es una vista previa de la voz de audiodescripción.",
"fr-FR": "Ceci est un aperçu de la voix de l'audiodescription.",
"it-IT": "Questa è un'anteprima della voce dell'audiodescrizione.",
"ja-JP": "これは音声解説の声のプレビューです。",
"ko-KR": "이것은 오디오 설명 음성의 미리보기입니다.",
"nl-NL": "Dit is een voorbeeld van de audiodescriptiestem.",
"pl-PL": "To jest podgląd głosu audiodeskrypcji.",
"cs-CZ": "Toto je náhled hlasu zvukového popisu.",
"tr-TR": "Bu, sesli betimleme sesinin bir önizlemesidir.",
"id-ID": "Ini adalah pratinjau suara deskripsi audio.",
"pt-PT": "Esta é uma pré-visualização da voz da audiodescrição.",
"fr-CA": "Ceci est un aperçu de la voix de l'audiodescription."
}
# Gemini TTS Model Options
gemini_tts_models: dict[str, str] = {
"flash": "gemini-3.1-flash-tts-preview", # Fast, cost-efficient (Preview)
"pro": "gemini-2.5-pro-tts", # Higher quality (GA)
"flash": "gemini-2.5-flash-preview-tts", # Fast, cost-efficient
"pro": "gemini-2.5-pro-preview-tts", # Higher quality
}
# Gemini TTS Style Presets - prompts prepended to text for style control
@ -249,14 +196,6 @@ class Settings(BaseSettings):
whisper_sentence_gap_threshold: float = 0.5 # Gap duration to classify as sentence boundary
whisper_phrase_gap_threshold: float = 0.3 # Gap duration to classify as phrase boundary
whisper_min_gap_threshold: float = 0.15 # Minimum gap duration to consider
# Forward-preferred snap windows (A2)
whisper_snap_forward_window: float = 4.0 # Prefer boundary up to N seconds ahead of Gemini point
whisper_snap_backward_window: float = 1.5 # Fall back to boundary up to N seconds behind
# Adaptive silence buffer (A1)
ad_silence_buffer_default: float = 0.5 # Base silence duration (s) before/after AD audio
ad_silence_buffer_min_after: float = 0.1 # Minimum silence after AD audio
# Minimum gap required at the chosen pause point (A3)
ad_min_acceptable_gap: float = 0.2 # Seconds; points with shorter gaps trigger forward search
# Cloud Run Service URLs (empty = use local processing)
# When set, CPU-intensive work is offloaded to Cloud Run with autoscaling
@ -275,10 +214,11 @@ class Settings(BaseSettings):
ffmpeg_worker_concurrency: int = 4 # FFmpeg tasks on main worker
tts_worker_concurrency: int = 8 # TTS worker
# Email (Mailgun)
# Email (Mailgun — primary; sendgrid_api_key kept for backward compat)
mailgun_api_key: str = ""
mailgun_domain: str = "mg.oliver.solutions"
mailgun_from: str = "noreply@mg.oliver.solutions"
sendgrid_api_key: str = ""
email_from: str = "noreply@mg.oliver.solutions"
client_base_url: str
@ -297,10 +237,6 @@ class Settings(BaseSettings):
cost_tracker_source_app: str = "video-accessibility"
cost_tracker_enabled: bool = True
# Upload limits (T-14 — single source of truth)
upload_max_video_bytes: int = 2 * 1024 * 1024 * 1024 # 2GB
upload_signed_url_ttl_hours: int = 24 # signed URL lifetime
# CORS - comma-separated list of allowed origins
cors_origins: str = "http://localhost:5173,http://localhost:5174,http://localhost:3000,http://localhost:6001"
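Note on `cors_origins`: the setting is a single comma-separated string, so it has to be split before it can be handed to CORS middleware. A small sketch of that split follows; `parse_cors_origins` is a hypothetical helper, not something shown in this diff.

```python
def parse_cors_origins(raw: str) -> list:
    """Split a comma-separated origins string, dropping blanks and whitespace."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


if __name__ == "__main__":
    print(parse_cors_origins("http://localhost:5173, http://localhost:3000,"))
```

Stripping whitespace and empty entries keeps a trailing comma or a space after a comma in the environment value from producing an origin that never matches.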

View file

@ -56,7 +56,7 @@ async def create_indexes():
await db.audit_logs.create_index([("resource_type", 1), ("resource_id", 1)]) # Resource tracking
await db.audit_logs.create_index([("ip_address", 1), ("timestamp", -1)]) # IP-based analysis
await db.audit_logs.create_index([("success", 1), ("timestamp", -1)]) # Failed operations
# Text search index for description and details
await db.audit_logs.create_index([
("description", "text"),
@ -64,19 +64,9 @@ async def create_indexes():
("error_message", "text")
])
# Per-language QC assignment index — for linguist queue queries
await db.jobs.create_index([("qc_assignments.linguist_id", 1), ("qc_assignments.status", 1)])
# Review notes collection indexes
await db.review_notes.create_index([("job_id", 1), ("asset_key", 1)])
await db.review_notes.create_index([("job_id", 1), ("asset_key", 1), ("timestamp_seconds", 1)])
await db.review_notes.create_index([("user_id", 1)])
# VTT versions collection indexes
await db.vtt_versions.create_index(
[("job_id", 1), ("lang", 1), ("kind", 1), ("version", -1)],
unique=True,
)
await db.vtt_versions.create_index([("job_id", 1), ("created_at", -1)])
logger.info("Database indexes created successfully")

View file

@ -1,16 +1,18 @@
from typing import Optional
from fastapi import Depends, HTTPException, Request, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from motor.motor_asyncio import AsyncIOMotorDatabase
from ..models.user import User, UserRole
from .config import settings
from .database import get_database
from .security import decode_token
security = HTTPBearer()
# Only admins bypass tenant isolation; other staff are scoped by team membership
STAFF_ROLES = {UserRole.ADMIN}
# Roles that see all jobs (no tenant isolation)
STAFF_ROLES = {UserRole.ADMIN, UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION}
async def get_current_user(
@ -19,13 +21,6 @@ async def get_current_user(
) -> User:
token = credentials.credentials
payload = decode_token(token)
if payload.get("type") == "refresh":
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials",
)
user_id: str = payload.get("sub")
if user_id is None:
@ -41,12 +36,7 @@ async def get_current_user(
detail="User not found",
)
user = User(**user_doc)
# Attach org_ids hint from token as transient attribute (never used for authz)
token_org_ids = payload.get("org_ids", [])
if token_org_ids:
user.__dict__["org_ids"] = token_org_ids
return user
return User(**user_doc)
def require_role(required_role: UserRole):
@ -76,7 +66,7 @@ def require_roles(*required_roles: UserRole):
async def get_current_user_optional(
request: Request,
db: AsyncIOMotorDatabase = Depends(get_database),
) -> User | None:
) -> Optional[User]:
authorization: str = request.headers.get("Authorization")
if not authorization:
return None
@ -87,9 +77,6 @@ async def get_current_user_optional(
return None
payload = decode_token(token)
if payload.get("type") == "refresh":
return None
user_id: str = payload.get("sub")
if user_id is None:
@ -107,28 +94,21 @@ async def get_current_user_optional(
async def get_accessible_project_ids(
user: User,
db: AsyncIOMotorDatabase,
) -> list[str] | None:
) -> Optional[list[str]]:
"""
Returns project IDs the user may access, or None meaning "see everything".
- Admin → None (unrestricted)
- Staff (REVIEWER/LINGUIST/PRODUCTION) → scoped by team membership;
if not yet assigned to any team, falls back to None (see all)
so existing staff aren't locked out before teams are configured
- PM → projects in accessible orgs/clients (pm_client_ids legacy)
- CLIENT → projects in orgs where the user holds any membership
- Staff / Admin → None (unrestricted)
- Otherwise → projects in orgs where the user holds any membership
(falls back to legacy pm_client_ids/team lookups if no memberships found)
"""
if user.role in STAFF_ROLES:
return None
# Primary path: use memberships collection (Phase 3 SaaS)
user_id = str(user.id)
# Primary path: use Redis-cached memberships (60s TTL, same cache as authz.py)
from .authz import (
_cached_memberships, # local import to avoid circular dep at module level
)
memberships_map = await _cached_memberships(user_id, db)
org_ids = list(memberships_map.keys())
membership_cursor = db.memberships.find({"user_id": user_id}, {"organization_id": 1})
org_ids = [doc["organization_id"] async for doc in membership_cursor]
if org_ids:
projects = await db.projects.find(
@ -137,98 +117,29 @@ async def get_accessible_project_ids(
).to_list(None)
return [str(p["_id"]) for p in projects]
# Legacy fallback: team membership (used by REVIEWER/LINGUIST/PRODUCTION and legacy CLIENT)
teams = await db.teams.find(
{"member_user_ids": user_id},
{"client_id": 1},
).to_list(None)
client_ids = list({t["client_id"] for t in teams})
if client_ids:
# Legacy fallback (pre-backfill) — keeps the app working before migration runs
if user.role == UserRole.PROJECT_MANAGER:
client_ids = user.pm_client_ids or []
if not client_ids:
return []
projects = await db.projects.find(
{"client_id": {"$in": client_ids}, "is_active": True},
{"_id": 1},
).to_list(None)
return [str(p["_id"]) for p in projects]
# PM legacy: scoped via pm_client_ids
if user.role == UserRole.PROJECT_MANAGER:
pm_client_ids = user.pm_client_ids or []
if not pm_client_ids:
return []
projects = await db.projects.find(
{"client_id": {"$in": pm_client_ids}, "is_active": True},
{"_id": 1},
).to_list(None)
return [str(p["_id"]) for p in projects]
# Staff with no team assignments → unrestricted until teams are configured
if user.role in {UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION}:
return None
# CLIENT with no memberships and no teams → show nothing
return []
async def get_user_org_ids(user: User, db: AsyncIOMotorDatabase) -> list[str] | None:
"""Return org IDs the user belongs to, or None meaning unrestricted (ADMIN).
Priority: memberships → pm_client_ids (PM legacy) → team.member_user_ids (staff legacy)
"""
if user.role == UserRole.ADMIN:
return None
user_id = str(user.id)
# Primary: Membership collection
org_ids: list[str] = []
async for m in db.memberships.find({"user_id": user_id}, {"organization_id": 1}):
if m.get("organization_id"):
org_ids.append(str(m["organization_id"]))
if org_ids:
return org_ids
# PM legacy: pm_client_ids
if user.role == UserRole.PROJECT_MANAGER:
return list(user.pm_client_ids or [])
# Staff legacy: team.member_user_ids
teams = await db.teams.find({"member_user_ids": user_id}, {"client_id": 1}).to_list(None)
if teams:
return [str(t["client_id"]) for t in teams if t.get("client_id")]
return []
async def assert_job_in_user_org(job: dict, user: User, db: AsyncIOMotorDatabase) -> None:
"""Raise 404 (not 403) when user cannot access this job — avoids information disclosure."""
if user.role == UserRole.ADMIN:
return
org_ids = await get_user_org_ids(user, db)
if org_ids is None:
return # unrestricted
job_org = job.get("organization_id")
if job_org:
if job_org in org_ids:
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
# No organization_id — try project fallback
project_id = job.get("project_id")
if project_id:
project = await db.projects.find_one({"_id": project_id}, {"client_id": 1})
if project and project.get("client_id") in org_ids:
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
# Legacy: client_id == creator user_id
job_client_id = job.get("client_id")
if job_client_id and job_client_id == str(user.id):
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
teams = await db.teams.find(
{"member_user_ids": user_id},
{"client_id": 1},
).to_list(None)
client_ids = list({t["client_id"] for t in teams})
if not client_ids:
return []
projects = await db.projects.find(
{"client_id": {"$in": client_ids}, "is_active": True},
{"_id": 1},
).to_list(None)
return [str(p["_id"]) for p in projects]
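Note on the return contract of `get_accessible_project_ids`: `None` means unrestricted, while an empty list means the user sees nothing. Callers need to keep those two cases apart when building a query. A minimal sketch, assuming a hypothetical helper `scope_filter` and a `project_id` field name chosen for illustration:

```python
from typing import Optional


def scope_filter(base: dict, accessible_ids: Optional[list], field: str = "project_id") -> dict:
    """Narrow a MongoDB-style filter to the IDs a user may access.

    None  -> unrestricted, the base filter is returned unchanged.
    []    -> {"$in": []}, which matches no documents.
    """
    if accessible_ids is None:
        return dict(base)
    return {**base, field: {"$in": accessible_ids}}


if __name__ == "__main__":
    print(scope_filter({"status": "done"}, None))
    print(scope_filter({"status": "done"}, []))
    print(scope_filter({"status": "done"}, ["p1", "p2"]))
```

Testing `accessible_ids is None` rather than truthiness is the important detail: a plain `if not accessible_ids` would treat an empty list as unrestricted.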
def require_pm_for_client(client_id_param: str = "client_id"):

View file

@ -1,6 +1,10 @@
"""Enhanced configuration system with Secret Manager integration."""
import os
import asyncio
from typing import Dict, Optional, Any
from functools import lru_cache
from pydantic_settings import BaseSettings
from .config import Settings as BaseConfig
from .logging import get_logger
@ -10,40 +14,41 @@ logger = get_logger(__name__)
class SecretsConfig(BaseConfig):
"""Enhanced configuration that loads secrets from GCP Secret Manager."""
def __init__(self, **kwargs):
# Initialize with base configuration first
super().__init__(**kwargs)
# Flag to track if secrets have been loaded
self._secrets_loaded = False
self._secret_values: dict[str, str] = {}
self._secret_values: Dict[str, str] = {}
async def load_secrets(self) -> None:
"""Load secrets from Secret Manager asynchronously."""
if self._secrets_loaded:
return
try:
# Only import here to avoid circular imports
from app.services.secrets_manager import secrets_manager
# Define which config fields should be loaded from secrets
secret_mappings = {
# Config field -> Secret Manager name
"jwt_secret": "jwt-secret",
"jwt_refresh_secret": "jwt-refresh-secret",
"jwt_refresh_secret": "jwt-refresh-secret",
"mongodb_uri": "mongodb-url",
"redis_url": "redis-url",
"gemini_api_key": "gemini-api-key",
"sendgrid_api_key": "sendgrid-api-key",
"elevenlabs_api_key": "elevenlabs-api-key",
"sentry_dsn": "sentry-dsn"
}
# Get all secrets in batch
secret_names = list(secret_mappings.values())
retrieved_secrets = await secrets_manager.get_secrets_batch(secret_names)
# Map secrets back to config fields
for config_field, secret_name in secret_mappings.items():
if secret_name in retrieved_secrets:
@ -53,50 +58,50 @@ class SecretsConfig(BaseConfig):
logger.debug(f"Loaded secret for {config_field}")
else:
logger.warning(f"Secret {secret_name} not available, using environment/default")
self._secrets_loaded = True
logger.info(f"Successfully loaded {len(retrieved_secrets)} secrets from Secret Manager")
except Exception as e:
logger.warning(f"Failed to load secrets from Secret Manager: {e}")
logger.warning("Falling back to environment variables")
self._secrets_loaded = True # Mark as loaded to prevent retries
def get_secret_value(self, field_name: str) -> str | None:
def get_secret_value(self, field_name: str) -> Optional[str]:
"""Get a secret value if it was loaded from Secret Manager."""
return self._secret_values.get(field_name)
async def refresh_secrets(self) -> None:
"""Force refresh secrets from Secret Manager."""
self._secrets_loaded = False
self._secret_values.clear()
# Clear the secrets manager cache
from app.services.secrets_manager import secrets_manager
secrets_manager.clear_cache()
await self.load_secrets()
@property
def is_production(self) -> bool:
"""Check if running in production environment."""
return self.app_env == "prod"
@property
def is_development(self) -> bool:
"""Check if running in development environment."""
return self.app_env == "dev"
@property
def google_cloud_project(self) -> str:
"""Get Google Cloud Project ID."""
return self.gcp_project_id
@property
def jwt_refresh_secret(self) -> str:
"""Get JWT refresh secret (fallback to main secret if not set)."""
return getattr(self, '_jwt_refresh_secret', self.jwt_secret)
@jwt_refresh_secret.setter
def jwt_refresh_secret(self, value: str) -> None:
"""Set JWT refresh secret."""
@ -104,37 +109,37 @@ class SecretsConfig(BaseConfig):
# Global configuration instance
_config_instance: SecretsConfig | None = None
_config_instance: Optional[SecretsConfig] = None
async def initialize_config() -> SecretsConfig:
"""Initialize configuration with secrets loading."""
global _config_instance
if _config_instance is None:
_config_instance = SecretsConfig()
await _config_instance.load_secrets()
return _config_instance
def get_settings() -> SecretsConfig:
"""Get settings instance (synchronous)."""
global _config_instance
if _config_instance is None:
# Initialize without secrets for backwards compatibility
_config_instance = SecretsConfig()
logger.warning("Settings accessed before async initialization - secrets not loaded")
return _config_instance
@lru_cache
@lru_cache()
def get_settings_cached() -> SecretsConfig:
"""Get cached settings instance."""
return get_settings()
# Backwards compatibility
settings = get_settings()
settings = get_settings()
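Note on `load_secrets`: the method maps config fields to Secret Manager names, fetches them in a batch, and leaves a field on its environment/default value when the secret is missing. The mapping step can be sketched on its own, without any GCP client. `apply_secrets` is a hypothetical name and the secret values below are placeholders.

```python
def apply_secrets(mappings: dict, retrieved: dict) -> tuple:
    """Return (overrides, missing) for a field -> secret-name mapping.

    overrides: config field -> secret value, for secrets that were retrieved.
    missing:   secret names that were not retrieved (caller keeps env/default).
    """
    overrides = {}
    missing = []
    for field, secret_name in mappings.items():
        if secret_name in retrieved:
            overrides[field] = retrieved[secret_name]
        else:
            missing.append(secret_name)
    return overrides, missing


if __name__ == "__main__":
    mappings = {"jwt_secret": "jwt-secret", "redis_url": "redis-url"}
    print(apply_secrets(mappings, {"jwt-secret": "placeholder"}))
```

Keeping this step pure makes the fallback behaviour easy to test separately from the network call.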

View file

@ -1,5 +1,5 @@
from datetime import datetime, timedelta
from typing import Any
from typing import Any, Optional, Union
from fastapi import HTTPException, status
from jose import JWTError, jwt
@ -11,24 +11,20 @@ pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
def create_access_token(
subject: str | Any,
expires_delta: timedelta | None = None,
org_ids: list[str] | None = None,
subject: Union[str, Any], expires_delta: Optional[timedelta] = None
) -> str:
if expires_delta:
expire = datetime.utcnow() + expires_delta
else:
expire = datetime.utcnow() + timedelta(minutes=settings.jwt_access_ttl_min)
to_encode: dict[str, Any] = {"exp": expire, "sub": str(subject), "v": 2}
if org_ids:
to_encode["org_ids"] = org_ids
to_encode = {"exp": expire, "sub": str(subject)}
encoded_jwt = jwt.encode(to_encode, settings.jwt_secret, algorithm=settings.jwt_alg)
return encoded_jwt
def create_refresh_token(
subject: str | Any, expires_delta: timedelta | None = None
subject: Union[str, Any], expires_delta: Optional[timedelta] = None
) -> str:
if expires_delta:
expire = datetime.utcnow() + expires_delta
@ -41,8 +37,6 @@ def create_refresh_token(
def verify_password(plain_password: str, hashed_password: str) -> bool:
if not hashed_password:
return False
return pwd_context.verify(plain_password, hashed_password)
@ -58,4 +52,4 @@ def decode_token(token: str) -> dict[str, Any]:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials",
) from None
)
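Note on token claims: both sides of this diff compute expiry with `datetime.utcnow()`, which is deprecated since Python 3.12, and one side rejects payloads whose `type` is `"refresh"` on access-token endpoints. A stdlib-only sketch of those two ideas follows. `build_claims` and `is_access_token` are hypothetical helpers, and signing the claims is deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def build_claims(subject: Any, ttl: timedelta, token_type: str = "access") -> dict:
    """Build JWT claims with a timezone-aware expiry (no signing here)."""
    expire = datetime.now(timezone.utc) + ttl
    return {"exp": expire, "sub": str(subject), "type": token_type}


def is_access_token(payload: dict) -> bool:
    """Treat anything not explicitly marked as a refresh token as an access token."""
    return payload.get("type") != "refresh"


if __name__ == "__main__":
    claims = build_claims(42, timedelta(minutes=15))
    print(claims["sub"], is_access_token(claims))
```

Tokens issued without a `type` claim pass `is_access_token`, which matches the check shown in `get_current_user`, where only `"refresh"` is rejected.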

View file

@ -34,13 +34,7 @@ async def seed_default_admin(db) -> None:
print(f"✅ Default admin {DEFAULT_ADMIN_EMAIL} already exists")
return
password = os.environ.get("DEFAULT_ADMIN_PASSWORD")
if not password:
print(
"⚠️ DEFAULT_ADMIN_PASSWORD not set — skipping default admin creation. "
"Set this env var and restart to create the admin account."
)
return
password = os.environ.get("DEFAULT_ADMIN_PASSWORD", "ChangeMe123!")
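Note on the two variants above: one skips admin creation when `DEFAULT_ADMIN_PASSWORD` is unset, the other falls back to a hard-coded default. The fail-closed read can be sketched as below. `admin_password_from_env` is a hypothetical helper; the environment is passed in as a mapping so the behaviour can be checked without touching the real process environment.

```python
import os
from typing import Mapping, Optional


def admin_password_from_env(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured admin password, or None if unset or blank.

    A None result means the caller should skip creating the default admin
    rather than fall back to a well-known password.
    """
    password = env.get("DEFAULT_ADMIN_PASSWORD", "").strip()
    return password or None


if __name__ == "__main__":
    print(admin_password_from_env({}))
```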
user_doc = {
"_id": str(ObjectId()),
"email": DEFAULT_ADMIN_EMAIL,

Binary file not shown.

View file

@ -1,245 +0,0 @@
"""
Central locale registry.
Provides a single source of truth for BCP-47 codes, display names,
and Gemini-friendly labels used throughout the translation/TTS pipeline.
Convention: BCP-47 with hyphen separator (fr-FR, en-GB, pt-BR).
xlsx underscore format (fr_fr, en_gb) is normalized at import time.
Bare language-only codes (fr, en) remain valid for legacy compat.
"""
from __future__ import annotations
from dataclasses import dataclass
@dataclass(frozen=True)
class Locale:
code: str # canonical BCP-47 (e.g. "fr-FR")
display_name: str # human-readable (e.g. "French (France)")
gemini_label: str # what to pass to Gemini prompts (e.g. "French (France)")
tts_lang: str # BCP-47 for TTS API (may differ, e.g. es-MX → es-US)
preview_sample: str # sample sentence for TTS preview
# Master locale registry. Bare language codes (legacy) + explicit region variants.
_REGISTRY: dict[str, Locale] = {loc.code: loc for loc in [
# ── English ──────────────────────────────────────────────────────────────
Locale("en", "English", "English", "en-US",
"This is a preview of the audio description voice."),
Locale("en-US", "English (US)", "English (United States)", "en-US",
"This is a preview of the audio description voice."),
Locale("en-GB", "English (UK)", "English (United Kingdom)", "en-GB",
"This is a preview of the audio description voice."),
Locale("en-CA", "English (Canada)", "English (Canada)", "en-CA",
"This is a preview of the audio description voice."),
# ── Spanish ──────────────────────────────────────────────────────────────
Locale("es", "Spanish", "Spanish", "es-US",
"Esta es una vista previa de la voz de audiodescripcion."),
Locale("es-ES", "Spanish (Spain)", "Spanish (Spain)", "es-ES",
"Esta es una vista previa de la voz de audiodescripción."),
Locale("es-MX", "Spanish (Mexico)", "Spanish (Mexico)", "es-US",
"Esta es una vista previa de la voz de audiodescripción."),
Locale("es-419", "Spanish (Latin America)", "Spanish (Latin America)", "es-US",
"Esta es una vista previa de la voz de audiodescripción."),
# ── French ───────────────────────────────────────────────────────────────
Locale("fr", "French", "French", "fr-FR",
"Ceci est un apercu de la voix de l'audiodescription."),
Locale("fr-FR", "French (France)", "French (France)", "fr-FR",
"Ceci est un aperçu de la voix de l'audiodescription."),
Locale("fr-CA", "French (Canada)", "French (Canada)", "fr-CA",
"Ceci est un aperçu de la voix de l'audiodescription."),
# ── German ───────────────────────────────────────────────────────────────
Locale("de", "German", "German", "de-DE",
"Dies ist eine Vorschau der Audiodeskriptionsstimme."),
Locale("de-DE", "German (Germany)", "German (Germany)", "de-DE",
"Dies ist eine Vorschau der Audiodeskriptionsstimme."),
# ── Italian ──────────────────────────────────────────────────────────────
Locale("it", "Italian", "Italian", "it-IT",
"Questa e un'anteprima della voce dell'audiodescrizione."),
Locale("it-IT", "Italian (Italy)", "Italian (Italy)", "it-IT",
"Questa è un'anteprima della voce dell'audiodescrizione."),
# ── Portuguese ───────────────────────────────────────────────────────────
Locale("pt", "Portuguese", "Portuguese", "pt-BR",
"Esta e uma previa da voz da audiodescricao."),
Locale("pt-BR", "Portuguese (Brazil)", "Portuguese (Brazil)", "pt-BR",
"Esta é uma prévia da voz da audiodescrição."),
Locale("pt-PT", "Portuguese (Portugal)", "Portuguese (Portugal)", "pt-PT",
"Esta é uma pré-visualização da voz da audiodescrição."),
# ── Japanese ─────────────────────────────────────────────────────────────
Locale("ja", "Japanese", "Japanese", "ja-JP",
"これは音声解説の声のプレビューです。"),
Locale("ja-JP", "Japanese (Japan)", "Japanese (Japan)", "ja-JP",
"これは音声解説の声のプレビューです。"),
# ── Korean ───────────────────────────────────────────────────────────────
Locale("ko", "Korean", "Korean", "ko-KR",
"이것은 오디오 설명 음성의 미리보기입니다."),
Locale("ko-KR", "Korean (Korea)", "Korean (South Korea)", "ko-KR",
"이것은 오디오 설명 음성의 미리보기입니다."),
# ── Arabic ───────────────────────────────────────────────────────────────
Locale("ar", "Arabic", "Arabic", "ar-EG",
"هذه معاينة لصوت الوصف الصوتي."),
# ── Hindi ────────────────────────────────────────────────────────────────
Locale("hi", "Hindi", "Hindi", "hi-IN",
"यह ऑडियो विवरण आवाज का पूर्वावलोकन है।"),
# ── Indonesian ───────────────────────────────────────────────────────────
Locale("id", "Indonesian", "Indonesian", "id-ID",
"Ini adalah pratinjau suara deskripsi audio."),
Locale("id-ID", "Indonesian (Indonesia)", "Indonesian (Indonesia)", "id-ID",
"Ini adalah pratinjau suara deskripsi audio."),
# ── Dutch ────────────────────────────────────────────────────────────────
Locale("nl", "Dutch", "Dutch", "nl-NL",
"Dit is een voorbeeld van de audiodescriptiestem."),
Locale("nl-NL", "Dutch (Netherlands)", "Dutch (Netherlands)", "nl-NL",
"Dit is een voorbeeld van de audiodescriptiestem."),
# ── Polish ───────────────────────────────────────────────────────────────
Locale("pl", "Polish", "Polish", "pl-PL",
"To jest podglad glosu audiodeskrypcji."),
Locale("pl-PL", "Polish (Poland)", "Polish (Poland)", "pl-PL",
"To jest podgląd głosu audiodeskrypcji."),
# ── Russian ──────────────────────────────────────────────────────────────
Locale("ru", "Russian", "Russian", "ru-RU",
"Это предварительный просмотр голоса аудиоописания."),
# ── Thai ─────────────────────────────────────────────────────────────────
Locale("th", "Thai", "Thai", "th-TH",
"นี่คือตัวอย่างเสียงบรรยายภาพ"),
# ── Turkish ──────────────────────────────────────────────────────────────
Locale("tr", "Turkish", "Turkish", "tr-TR",
"Bu, sesli betimleme sesinin bir onizlemesidir."),
Locale("tr-TR", "Turkish (Turkey)", "Turkish (Turkey)", "tr-TR",
"Bu, sesli betimleme sesinin bir önizlemesidir."),
# ── Vietnamese ───────────────────────────────────────────────────────────
Locale("vi", "Vietnamese", "Vietnamese", "vi-VN",
"Day la ban xem truoc giong mo ta am thanh."),
# ── Romanian ─────────────────────────────────────────────────────────────
Locale("ro", "Romanian", "Romanian", "ro-RO",
"Aceasta este o previzualizare a vocii descrierii audio."),
# ── Ukrainian ────────────────────────────────────────────────────────────
Locale("uk", "Ukrainian", "Ukrainian", "uk-UA",
"Це попередній перегляд голосу аудіоопису."),
# ── Bengali ──────────────────────────────────────────────────────────────
Locale("bn", "Bengali", "Bengali", "bn-BD",
"এটি অডিও বর্ণনা ভয়েসের একটি প্রিভিউ।"),
# ── Marathi ──────────────────────────────────────────────────────────────
Locale("mr", "Marathi", "Marathi", "mr-IN",
"हे ऑडिओ वर्णन आवाजाचे पूर्वावलोकन आहे."),
# ── Tamil ────────────────────────────────────────────────────────────────
Locale("ta", "Tamil", "Tamil", "ta-IN",
"இது ஆடியோ விளக்க குரலின் முன்னோட்டம்."),
# ── Telugu ───────────────────────────────────────────────────────────────
Locale("te", "Telugu", "Telugu", "te-IN",
"ఇది ఆడియో వివరణ స్వరం యొక్క ప్రివ్యూ."),
# ── Chinese ──────────────────────────────────────────────────────────────
Locale("zh", "Chinese", "Chinese (Simplified)", "zh-CN",
"这是音频描述语音的预览。"),
# ── Czech ────────────────────────────────────────────────────────────────
Locale("cs", "Czech", "Czech", "cs-CZ",
"Toto je náhled hlasu zvukového popisu."),
Locale("cs-CZ", "Czech (Czech Republic)", "Czech (Czech Republic)", "cs-CZ",
"Toto je náhled hlasu zvukového popisu."),
# ── Danish ───────────────────────────────────────────────────────────────
Locale("da", "Danish", "Danish", "da-DK",
"Dette er en forhåndsvisning af lydbeskrivelsesstemmen."),
# ── Finnish ──────────────────────────────────────────────────────────────
Locale("fi", "Finnish", "Finnish", "fi-FI",
"Tämä on äänikuvauksen äänen esikatselu."),
# ── Hungarian ────────────────────────────────────────────────────────────
Locale("hu", "Hungarian", "Hungarian", "hu-HU",
"Ez a hangos leírás hangjának előnézete."),
# ── Norwegian ────────────────────────────────────────────────────────────
Locale("no", "Norwegian", "Norwegian", "nb-NO",
"Dette er en forhåndsvisning av lydbeskrivelsesstemmen."),
# ── Slovak ───────────────────────────────────────────────────────────────
Locale("sk", "Slovak", "Slovak", "sk-SK",
"Toto je náhľad hlasu zvukového popisu."),
# ── Swedish ──────────────────────────────────────────────────────────────
Locale("sv", "Swedish", "Swedish", "sv-SE",
"Det här är en förhandsgranskning av ljudbeskrivningsrösten."),
]}
# xlsx uses underscores; normalize to BCP-47 hyphen form
_XLSX_ALIASES: dict[str, str] = {
code.replace("-", "_").lower(): code
for code in _REGISTRY
if "-" in code
}
# a few extra mappings for edge cases
_XLSX_ALIASES.update({
"id": "id", # Indonesian column header is just "id" (no region)
})
def normalize_code(code: str) -> str:
"""
Normalize an arbitrary locale code to the canonical BCP-47 form used in this registry.
Handles:
- xlsx underscore form: "fr_fr" → "fr-FR"
- Bare language code: "fr" → "fr" (passthrough, legacy compat)
- Already canonical: "fr-FR" → "fr-FR"
"""
if not code:
return code
lowered = code.strip().lower()
# e.g. "fr_fr" -> check alias table
if "_" in lowered:
return _XLSX_ALIASES.get(lowered, code.replace("_", "-").upper() if len(lowered) > 3 else code)
# Already hyphen form — canonicalise case
if "-" in code:
parts = code.split("-", 1)
canonical = f"{parts[0].lower()}-{parts[1].upper()}"
if canonical in _REGISTRY:
return canonical
return canonical
# Bare language code — return as-is (legacy)
return lowered
def get(code: str) -> Locale | None:
"""Return Locale for the given code, or None if unknown."""
canonical = normalize_code(code)
return _REGISTRY.get(canonical) or _REGISTRY.get(canonical.split("-")[0])
def get_display_name(code: str) -> str:
"""Human-readable display name, e.g. 'French (Canada)'."""
locale = get(code)
return locale.display_name if locale else code
def get_gemini_label(code: str) -> str:
"""
Label to use inside Gemini prompts, e.g. 'French (Canada)'.
Gemini models respond more reliably to human-readable language names
than to bare BCP-47 codes when used inside instruction prompts.
"""
locale = get(code)
return locale.gemini_label if locale else code
def get_tts_lang(code: str) -> str:
"""BCP-47 code for the TTS API (may differ from canonical, e.g. es-MX → es-US)."""
locale = get(code)
return locale.tts_lang if locale else code
def get_preview_sample(code: str) -> str:
"""Language-appropriate TTS preview sentence."""
locale = get(code)
if locale:
return locale.preview_sample
# fallback: try parent language then English
parent = get(code.split("-")[0]) if "-" in code else None
if parent:
return parent.preview_sample
return "This is a preview of the audio description voice."
def all_codes() -> list[str]:
"""Return all registered locale codes, sorted."""
return sorted(_REGISTRY.keys())
def all_display_map() -> dict[str, str]:
"""Return {code: display_name} for all registered locales."""
return {code: locale.display_name for code, locale in _REGISTRY.items()}
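
The normalization and lookup fallback above can be sketched in isolation. This is a minimal illustration, not the registry module itself: `KNOWN_CODES`, `normalize_locale`, and `lookup` are hypothetical names, and the set of codes is a made-up stand-in for the keys of `_REGISTRY`. Unlike the underscore fallback in `normalize_code`, which upper-cases the whole string (an unregistered "pt_ao" would come back as "PT-AO"), the sketch lower-cases the language part and upper-cases only the region.

```python
from __future__ import annotations

# Hypothetical stand-in for the registry keys; the real _REGISTRY maps codes to Locale objects.
KNOWN_CODES = {"fr", "fr-FR", "fr-CA", "cs", "cs-CZ", "zh"}


def normalize_locale(code: str) -> str:
    """Return 'll' or 'll-RR' from underscore, hyphen, or mixed-case input."""
    if not code:
        return code
    cleaned = code.strip().replace("_", "-")
    language, _, region = cleaned.partition("-")
    language = language.lower()
    # Only the region part is upper-cased; a bare language code stays lower-case.
    return f"{language}-{region.upper()}" if region else language


def lookup(code: str) -> str | None:
    """Resolve to a known code, falling back from 'll-RR' to the bare language."""
    canonical = normalize_locale(code)
    if canonical in KNOWN_CODES:
        return canonical
    parent = canonical.split("-")[0]
    return parent if parent in KNOWN_CODES else None


print(normalize_locale("fr_fr"))  # fr-FR
print(lookup("zh-TW"))            # zh (parent-language fallback)
```

The sketch treats everything after the first separator as a region, so script subtags such as "zh-Hans" would need separate handling.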

View file

@ -8,7 +8,6 @@ class VTTCue:
end_time: float # seconds
text: str
identifier: str | None = None
settings: str = ""
class VTTParser:
@ -38,11 +37,10 @@ class VTTParser:
# Parse timing line
if " --> " in line:
timing_match = re.match(r'([\d:.,]+)\s+-->\s+([\d:.,]+)\s*(.*)', line)
timing_match = re.match(r'([\d:.,]+)\s+-->\s+([\d:.,]+)', line)
if timing_match:
start_time = VTTParser._parse_timestamp(timing_match.group(1))
end_time = VTTParser._parse_timestamp(timing_match.group(2))
settings = timing_match.group(3).strip()
# Collect text lines until empty line or next cue
i += 1
@ -51,13 +49,13 @@ class VTTParser:
text_lines.append(lines[i].strip())
i += 1
cues.append(VTTCue(
start_time=start_time,
end_time=end_time,
text="\n".join(text_lines),
identifier=identifier,
settings=settings,
))
if text_lines:
cues.append(VTTCue(
start_time=start_time,
end_time=end_time,
text="\n".join(text_lines),
identifier=identifier
))
else:
i += 1
@ -73,19 +71,16 @@ class VTTParser:
if cue.identifier:
lines.append(cue.identifier)
# Add timing line (preserve cue settings like line:0%)
# Add timing line
start_timestamp = VTTParser._format_timestamp(cue.start_time)
end_timestamp = VTTParser._format_timestamp(cue.end_time)
timing_line = f"{start_timestamp} --> {end_timestamp}"
if cue.settings:
timing_line += f" {cue.settings}"
lines.append(timing_line)
lines.append(f"{start_timestamp} --> {end_timestamp}")
# Add text (can be multi-line)
lines.append(cue.text)
lines.append("") # Empty line between cues
return "\n".join(lines) + "\n"
return "\n".join(lines)
@staticmethod
def _parse_timestamp(timestamp: str) -> float:
@ -126,7 +121,7 @@ class VTTParser:
secs = seconds % 60
whole_secs = int(secs)
milliseconds = round((secs - whole_secs) * 1000)
milliseconds = int((secs - whole_secs) * 1000)
return f"{hours:02d}:{minutes:02d}:{whole_secs:02d}.{milliseconds:03d}"
@ -153,22 +148,6 @@ class VTTEditor:
return VTTParser.build(cues)
@staticmethod
def assert_cue_alignment(en_vtt: str, target_vtt: str, lang: str) -> None:
"""Raise ValueError if target VTT cue count or timestamps diverge from EN master."""
en_cues = VTTParser.parse(en_vtt)
tgt_cues = VTTParser.parse(target_vtt)
if len(tgt_cues) != len(en_cues):
raise ValueError(
f"Cue count mismatch for {lang}: EN has {len(en_cues)}, target has {len(tgt_cues)}"
)
for i, (en, tgt) in enumerate(zip(en_cues, tgt_cues, strict=True)):
if en.start_time != tgt.start_time or en.end_time != tgt.end_time:
raise ValueError(
f"Timestamp mismatch for {lang} cue {i}: "
f"EN {en.start_time}-->{en.end_time}, target {tgt.start_time}-->{tgt.end_time}"
)
@staticmethod
def update_cue_text(vtt_content: str, cue_index: int, new_text: str) -> str:
"""Update text for a specific cue by index"""
@ -207,20 +186,6 @@ class VTTEditor:
return len(errors) == 0, errors
@staticmethod
def fix_overlapping_cues(vtt_content: str) -> str:
"""Trim end_time of each cue so it does not overlap the next cue's start_time."""
cues = VTTParser.parse(vtt_content)
for i in range(1, len(cues)):
if cues[i].start_time < cues[i - 1].end_time:
# Clamp previous cue end to 1ms before next cue start
new_end = cues[i].start_time - 0.001
# Never let end_time go at or below start_time
if new_end <= cues[i - 1].start_time:
new_end = cues[i - 1].start_time + 0.001
cues[i - 1].end_time = new_end
return VTTParser.build(cues)
@staticmethod
def get_cue_count(vtt_content: str) -> int:
"""Get the number of cues in VTT content"""
@ -256,7 +221,7 @@ class VTTEditor:
)
return False, errors
for i, (src, tgt) in enumerate(zip(source_cues, translated_cues, strict=False)):
for i, (src, tgt) in enumerate(zip(source_cues, translated_cues)):
if abs(src.start_time - tgt.start_time) > 0.001:
errors.append(
f"Cue {i + 1}: start time changed "
@ -286,33 +251,3 @@ class VTTEditor:
return VTTParser.build(cues)
# DCMP §6.01 filler patterns per language (whole-word, case-insensitive)
_FILLER_PATTERNS: dict[str, str] = {
"en": r'\b(um+|uh+|ah+|er+|hmm+|you know|i mean|sort of|kind of|basically|literally|honestly|actually|right\?|so yeah)\b',
"es": r'\b(eh+|este|o sea|pues|bueno|o sea que|mmm+)\b',
"fr": r'\b(euh+|beh|ben|donc|quoi|enfin|voilà|genre)\b',
"de": r'\b(äh+|ähm+|halt|ne|also|naja|sozusagen|quasi)\b',
"it": r'\b(ehm+|allora|cioè|tipo|praticamente|insomma|ecco)\b',
"nl": r'\b(eh+|nou|zeg|eigenlijk|gewoon|toch|zo van|hè)\b',
"pt": r'\b(ahn+|hã+|né|sabe|tipo|então|assim)\b',
"pl": r'\b(no|że|bo|znaczy|właśnie|jakby|wiesz)\b',
"uk": r'\b(ну+|ем+|типу|знаєш|значить|власне|от)\b',
"ru": r'\b(ну+|эм+|типа|знаешь|значит|вот|собственно)\b',
}
@staticmethod
def clean_disfluencies(vtt_content: str, lang: str) -> str:
"""Remove filler words and hesitations per DCMP §6.01 for supported languages."""
pattern = VTTEditor._FILLER_PATTERNS.get(lang.split("-")[0].lower())
if not pattern:
return vtt_content
cues = VTTParser.parse(vtt_content)
compiled = re.compile(pattern, re.IGNORECASE)
for cue in cues:
cleaned = compiled.sub("", cue.text)
# Collapse multiple spaces and strip leading/trailing punctuation artifacts
cleaned = re.sub(r'[ \t]{2,}', ' ', cleaned).strip().strip(',').strip()
if cleaned:
cue.text = cleaned
return VTTParser.build(cues)
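
Both millisecond variants in the `_format_timestamp` hunk above work on the fractional part alone, and each has an edge case: `round((secs - whole_secs) * 1000)` can return 1000 (1.9996 s would format as `00:00:01.1000`), while `int(...)` truncates float noise (1.001 s would format as `00:00:01.000`). A minimal sketch of one way around both, assuming non-negative input, is to round once to whole milliseconds and split with integer arithmetic. `format_timestamp` is a hypothetical standalone function, not the project's method.

```python
def format_timestamp(seconds: float) -> str:
    """Format non-negative seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    # Round once, on the total, so a carry propagates into seconds/minutes/hours.
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


print(format_timestamp(1.9996))  # 00:00:02.000
print(format_timestamp(1.001))   # 00:00:01.001
print(format_timestamp(3661.5))  # 01:01:01.500
```

Because every field comes from the same integer, the millisecond part is always three digits.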

View file

@ -1,55 +1,48 @@
from contextlib import asynccontextmanager
import sentry_sdk
from fastapi import FastAPI, HTTPException, Request
from fastapi import FastAPI, Request, HTTPException
from fastapi.exceptions import RequestValidationError
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.fastapi import FastApiIntegration
from sentry_sdk.integrations.pymongo import PyMongoIntegration
from sentry_sdk.integrations.redis import RedisIntegration
from sentry_sdk.integrations.pymongo import PyMongoIntegration
from sentry_sdk.integrations.celery import CeleryIntegration
from .api.v1.routes_admin import router as admin_router
from .api.v1.routes_admin_production import router as admin_production_router
from .api.v1.routes_auth import router as auth_router
from .api.v1.routes_briefs import router as briefs_router
from .api.v1.routes_clients import router as clients_router
from .api.v1.routes_files import router as files_router
from .api.v1.routes_glossaries import router as glossaries_router
from .api.v1.routes_jobs import router as jobs_router
from .api.v1.routes_invitations import org_router as invitations_org_router
from .api.v1.routes_invitations import router as invitations_router
from .api.v1.routes_jobs import router as jobs_router
from .api.v1.routes_language_qc import router as language_qc_router
from .api.v1.routes_organizations import router as organizations_router
from .api.v1.routes_review_notes import router as review_notes_router
from .api.v1.routes_share import router as share_router
from .api.v1.routes_tts import router as tts_router
from .api.v1.routes_vtt_versions import router as vtt_versions_router
from .api.v1.routes_websockets import router as websockets_router
from .services.websocket import connection_manager
from .core.config import settings
from .core.database import (
close_mongo_connection,
connect_to_mongo,
get_database,
)
from .core.secrets_config import initialize_config
from .core.database import close_mongo_connection, connect_to_mongo, create_indexes, get_database
from .core.logging import setup_logging
from .core.redis import close_redis_connection, connect_to_redis, get_redis_client
from .core.secrets_config import initialize_config
from .core.seed import seed_default_admin
from .middleware import create_rate_limit_middleware, create_validation_middleware
from .services.language_qc import seed_language_qc_for_job
from .services.websocket import connection_manager
from .telemetry import (
app_metrics,
instrument_dependencies,
instrument_fastapi_app,
setup_tracing
)
from .services.websocket import connection_manager
@asynccontextmanager
async def lifespan(app: FastAPI):
# Startup
setup_logging()
# Initialize configuration with secrets
if settings.app_env == "prod":
try:
@ -58,7 +51,7 @@ async def lifespan(app: FastAPI):
except Exception as e:
print(f"⚠️ Failed to load secrets from Secret Manager: {e}")
print("⚠️ Falling back to environment variables")
# Initialize Sentry error tracking
if settings.sentry_dsn and settings.sentry_dsn.startswith(('http', 'https')):
sentry_sdk.init(
@ -75,15 +68,15 @@ async def lifespan(app: FastAPI):
attach_stacktrace=True,
send_default_pii=False, # Don't send PII for privacy
)
# Initialize telemetry (disabled for local development)
# setup_tracing("accessible-video-api", "1.0.0")
# instrument_dependencies()
# Start Prometheus metrics server in production
if settings.app_env == "prod":
app_metrics.start_prometheus_server(port=8001)
await connect_to_mongo()
await connect_to_redis()
@ -93,37 +86,20 @@ async def lifespan(app: FastAPI):
except Exception as e:
print(f"⚠️ Could not seed default admin: {e}")
# await create_indexes() # Temporarily disabled for debugging
# T-16: Seed language_qc only for jobs that still lack it (idempotent, skips on subsequent starts)
try:
db = await get_database()
pending_count = await db.jobs.count_documents({"language_qc": {"$exists": False}})
if pending_count > 0:
async for job_doc in db.jobs.find(
{"language_qc": {"$exists": False}},
{"_id": 1, "status": 1, "outputs": 1, "source": 1, "review": 1, "updated_at": 1, "requested_outputs": 1},
):
await seed_language_qc_for_job(db, job_doc)
print(f"✅ language_qc migration complete ({pending_count} jobs seeded)")
except Exception as e:
print(f"⚠️ language_qc migration failed: {e}")
# Start WebSocket connection manager
await connection_manager.start()
# Initialize middleware with Redis client
redis_client = get_redis_client()
if redis_client:
rate_limit_middleware = await create_rate_limit_middleware(redis_client)
validation_middleware = await create_validation_middleware()
# Store middleware in app state for access
app.state.rate_limit_middleware = rate_limit_middleware
app.state.validation_middleware = validation_middleware
elif settings.redis_url:
# T-13: REDIS_URL is configured but client unavailable — rate limiting is disabled
print(f"⚠️ Redis configured at {settings.redis_url!r} but connection failed — rate limiting disabled")
yield
# Shutdown
await connection_manager.stop()
@ -155,17 +131,18 @@ async def cors_error_handler(request, call_next):
try:
response = await call_next(request)
except Exception as e:
# LOG THE EXCEPTION BEFORE HANDLING IT
print(f"🚨 EXCEPTION IN CORS MIDDLEWARE: {e}")
import traceback
print(f"Traceback:\n{traceback.format_exc()}")
from .core.logging import get_logger as _get_logger
_get_logger(__name__).exception("🚨 CORS middleware caught: %s\n%s", e, traceback.format_exc())
# Handle any unhandled exceptions and add CORS headers
from fastapi.responses import JSONResponse
response = JSONResponse(
status_code=500,
content={"detail": "Internal server error"},
content={"detail": "Internal server error", "error": str(e)}
)
# Always add CORS headers for allowed origins
origin = request.headers.get("origin")
if origin and origin in settings.cors_origins_list:
@ -186,7 +163,7 @@ async def http_exception_handler(request: Request, exc: HTTPException):
status_code=exc.status_code,
content={"detail": exc.detail}
)
# Add CORS headers
origin = request.headers.get("origin")
if origin and origin in settings.cors_origins_list:
@ -221,18 +198,21 @@ async def validation_exception_handler(request: Request, exc: RequestValidationE
async def general_exception_handler(request: Request, exc: Exception):
"""Handle all uncaught exceptions with logging"""
import traceback
from .core.logging import get_logger
logger = get_logger(__name__)
logger.exception(
"🚨 Unhandled %s %s: %s\n%s",
request.method, request.url.path, exc, traceback.format_exc(),
)
logger.error(f"Unhandled exception in {request.method} {request.url.path}: {exc}")
logger.error(f"Exception type: {type(exc).__name__}")
logger.error(f"Traceback: {traceback.format_exc()}")
# Also print to stdout for immediate visibility
print(f"🚨 UNHANDLED EXCEPTION: {request.method} {request.url.path}")
print(f"Exception: {exc}")
print(f"Traceback:\n{traceback.format_exc()}")
response = JSONResponse(
status_code=500,
content={"detail": "Internal server error"},
content={"detail": "Internal server error", "error": str(exc)}
)
# Add CORS headers
@ -247,6 +227,9 @@ async def general_exception_handler(request: Request, exc: Exception):
@app.middleware("http")
async def rate_limiting_middleware(request, call_next):
"""Apply rate limiting middleware."""
# Skip middleware for auth endpoints during debugging
if request.url.path in ["/api/v1/auth/login", "/api/v1/auth/refresh"]:
return await call_next(request)
if hasattr(app.state, 'rate_limit_middleware'):
return await app.state.rate_limit_middleware(request, call_next)
return await call_next(request)
@ -254,7 +237,11 @@ async def rate_limiting_middleware(request, call_next):
@app.middleware("http")
async def validation_middleware(request, call_next):
"""Apply request validation middleware."""
if request.url.path in ["/health", "/metrics", "/api/v1/auth/login", "/api/v1/auth/refresh"]:
# TEMPORARILY DISABLED FOR DEBUGGING
return await call_next(request)
# Skip middleware for auth endpoints during debugging
if request.url.path in ["/api/v1/auth/login", "/api/v1/auth/refresh"]:
return await call_next(request)
if hasattr(app.state, 'validation_middleware'):
return await app.state.validation_middleware(request, call_next)
@ -272,28 +259,54 @@ app.include_router(invitations_router, prefix="/api/v1")
app.include_router(files_router, prefix="/api/v1")
app.include_router(jobs_router, prefix="/api/v1")
app.include_router(review_notes_router, prefix="/api/v1")
app.include_router(vtt_versions_router, prefix="/api/v1")
app.include_router(language_qc_router, prefix="/api/v1")
app.include_router(glossaries_router, prefix="/api/v1")
app.include_router(tts_router, prefix="/api/v1")
app.include_router(admin_router, prefix="/api/v1")
app.include_router(admin_production_router, prefix="/api/v1")
app.include_router(briefs_router, prefix="/api/v1")
app.include_router(share_router, prefix="/api/v1")
app.include_router(websockets_router, prefix="/api/v1")
@app.on_event("startup")
async def startup_event():
"""Initialize services on startup"""
logger.info("🚀 Starting up FastAPI application...")
# Start WebSocket connection manager
try:
await connection_manager.start()
logger.info("✅ WebSocket connection manager started successfully")
except Exception as e:
logger.error(f"❌ Failed to start WebSocket connection manager: {e}")
raise
@app.on_event("shutdown")
async def shutdown_event():
"""Cleanup services on shutdown"""
logger.info("🛑 Shutting down FastAPI application...")
# Stop WebSocket connection manager
try:
await connection_manager.stop()
logger.info("✅ WebSocket connection manager stopped successfully")
except Exception as e:
logger.error(f"❌ Error stopping WebSocket connection manager: {e}")
@app.get("/health")
async def health_check():
return {"status": "healthy", "version": "1.0.0"}
@app.get("/debug-test")
async def debug_test():
print("🔥🔥🔥 DEBUG TEST ENDPOINT HIT 🔥🔥🔥")
return {"message": "If you see this, routing works"}
@app.get("/metrics")
async def metrics():
"""Prometheus metrics endpoint"""
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
from fastapi import Response
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
return Response(
content=generate_latest(),
media_type=CONTENT_TYPE_LATEST
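
The exception-handler hunks above differ in whether the 500 response body includes `str(e)`. The sketch below illustrates the variant that keeps exception text out of the response while still making the failure traceable in logs. It uses only the standard library and is not the application's handler: `build_error_payload`, the `error_id` field, and the `Vary` header are assumptions added for illustration.

```python
from __future__ import annotations

import logging
import uuid

logger = logging.getLogger("api.errors")


def build_error_payload(
    exc: Exception, origin: str | None, allowed_origins: list[str]
) -> tuple[int, dict[str, str], dict[str, str]]:
    """Return (status, body, headers) for an unhandled exception."""
    # Hypothetical correlation id: lets a client report an error without seeing its details.
    error_id = uuid.uuid4().hex
    # The full exception and traceback go to the log only.
    logger.error("Unhandled exception [%s]", error_id, exc_info=exc)

    body = {"detail": "Internal server error", "error_id": error_id}
    headers: dict[str, str] = {}
    # Echo the origin back only when it is on the allow-list.
    if origin and origin in allowed_origins:
        headers["Access-Control-Allow-Origin"] = origin
        headers["Vary"] = "Origin"
    return 500, body, headers


status, body, headers = build_error_payload(
    ValueError("db password=hunter2"), "https://app.example.com", ["https://app.example.com"]
)
print(status, sorted(body), headers)
```

In a FastAPI handler the tuple would be unpacked into `JSONResponse(status_code=..., content=..., headers=...)`.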

View file

@ -1,16 +1,12 @@
"""Middleware package for FastAPI application."""
from .rate_limiting import (
IPWhitelist,
RateLimitMiddleware,
create_rate_limit_middleware,
)
from .rate_limiting import RateLimitMiddleware, IPWhitelist, create_rate_limit_middleware
from .validation import ValidationMiddleware, create_validation_middleware
__all__ = [
"RateLimitMiddleware",
"IPWhitelist",
"IPWhitelist",
"create_rate_limit_middleware",
"ValidationMiddleware",
"create_validation_middleware"
]
]

View file

@ -1,10 +1,14 @@
"""Rate limiting middleware for API endpoints."""
import time
from collections import defaultdict
from typing import Dict, Optional, Tuple
import redis.asyncio as aioredis
from fastapi import Request, status
from fastapi import HTTPException, Request, status
from fastapi.responses import JSONResponse
import json
import asyncio
from datetime import datetime, timedelta
from app.core.config import get_settings
from app.telemetry.metrics import track_rate_limit_metrics
@ -12,50 +16,50 @@ from app.telemetry.metrics import track_rate_limit_metrics
class RateLimiter:
"""Redis-based rate limiter with sliding window algorithm."""
def __init__(self, redis_client: aioredis.Redis):
self.redis = redis_client
async def is_allowed(
self,
key: str,
limit: int,
self,
key: str,
limit: int,
window_seconds: int,
identifier: str = ""
) -> tuple[bool, dict[str, int]]:
) -> Tuple[bool, Dict[str, int]]:
"""
Check if request is allowed under rate limit.
Returns:
Tuple of (is_allowed, rate_limit_info)
"""
now = time.time()
pipeline = self.redis.pipeline()
# Remove expired entries
pipeline.zremrangebyscore(key, 0, now - window_seconds)
# Count current requests in window
pipeline.zcard(key)
# Add current request
pipeline.zadd(key, {str(now): now})
# Set expiry
pipeline.expire(key, window_seconds)
results = await pipeline.execute()
current_requests = results[1]
rate_limit_info = {
"limit": limit,
"remaining": max(0, limit - current_requests),
"reset_time": int(now + window_seconds),
"retry_after": window_seconds if current_requests >= limit else 0
}
is_allowed = current_requests <= limit
# Track metrics
track_rate_limit_metrics(
identifier=identifier,
@ -63,17 +67,17 @@ class RateLimiter:
current_requests=current_requests,
limit=limit
)
return is_allowed, rate_limit_info
class RateLimitMiddleware:
"""FastAPI middleware for rate limiting."""
def __init__(self, redis_client: aioredis.Redis):
self.limiter = RateLimiter(redis_client)
self.settings = get_settings()
# Rate limit configurations by endpoint pattern
self.rate_limits = {
# Authentication endpoints
@ -81,96 +85,93 @@ class RateLimitMiddleware:
"POST:/api/v1/auth/register": (3, 3600), # 3 requests per hour
"POST:/api/v1/auth/refresh": (10, 300), # 10 requests per 5 minutes
"POST:/api/v1/auth/forgot-password": (3, 3600), # 3 requests per hour
# File upload endpoints
"POST:/api/v1/files/upload": (10, 3600), # 10 uploads per hour
"POST:/api/v1/jobs": (20, 3600), # 20 job creations per hour
# Job management endpoints
"GET:/api/v1/jobs": (100, 300), # 100 requests per 5 minutes
"PATCH:/api/v1/jobs/*/approve": (50, 3600), # 50 approvals per hour
"PATCH:/api/v1/jobs/*/reject": (50, 3600), # 50 rejections per hour
# VTT editing endpoints
"PATCH:/api/v1/jobs/*/vtt": (100, 3600), # 100 VTT edits per hour
# Admin endpoints (more restrictive)
"GET:/api/v1/admin/*": (50, 300), # 50 requests per 5 minutes
"POST:/api/v1/admin/*": (20, 3600), # 20 admin actions per hour
"PATCH:/api/v1/admin/*": (20, 3600), # 20 admin updates per hour
"DELETE:/api/v1/admin/*": (10, 3600), # 10 admin deletions per hour
}
# Default rate limits
self.default_limits = {
"authenticated": (1000, 3600), # 1000 requests per hour for authenticated users
"anonymous": (100, 3600), # 100 requests per hour for anonymous users
}
def _get_client_identifier(self, request: Request) -> str:
"""Get client identifier for rate limiting."""
# Try to get user ID from JWT token
user = getattr(request.state, 'user', None)
if user:
return f"user:{user.id}"
# Only trust X-Forwarded-For when the request arrived via HTTPS (i.e. through
# the Apache/nginx reverse proxy). On plain HTTP (direct connections, local
# dev) the header can be forged, so we fall back to the socket IP.
if request.headers.get("X-Forwarded-Proto") == "https":
forwarded_for = request.headers.get("X-Forwarded-For")
if forwarded_for:
# Take the right-most IP added by the trusted proxy, not client-supplied ones.
return f"ip:{forwarded_for.split(',')[-1].strip()}"
# Fall back to IP address
forwarded_for = request.headers.get("X-Forwarded-For")
if forwarded_for:
return f"ip:{forwarded_for.split(',')[0].strip()}"
client_ip = request.client.host if request.client else "unknown"
return f"ip:{client_ip}"
def _get_endpoint_key(self, request: Request) -> str:
"""Get endpoint pattern for rate limiting."""
method = request.method
path = request.url.path
# Replace job IDs with wildcard for pattern matching
import re
path = re.sub(r'/jobs/[a-f0-9-]+/', '/jobs/*/', path)
path = re.sub(r'/admin/users/[a-f0-9-]+', '/admin/users/*', path)
return f"{method}:{path}"
def _get_rate_limit(self, request: Request) -> tuple[int, int]:
def _get_rate_limit(self, request: Request) -> Tuple[int, int]:
"""Get rate limit for the current request."""
endpoint_key = self._get_endpoint_key(request)
# Check for specific endpoint limits
if endpoint_key in self.rate_limits:
return self.rate_limits[endpoint_key]
# Check for wildcard matches
for pattern, limits in self.rate_limits.items():
if pattern.endswith("*") and endpoint_key.startswith(pattern[:-1]):
return limits
# Use default limits based on authentication
user = getattr(request.state, 'user', None)
if user:
return self.default_limits["authenticated"]
else:
return self.default_limits["anonymous"]
async def __call__(self, request: Request, call_next):
"""Process rate limiting for the request."""
# Skip rate limiting for health checks and metrics only
if request.url.path in ["/health", "/metrics"]:
# Skip rate limiting for health checks and login (temporary for debugging)
if request.url.path in ["/health", "/metrics", "/api/v1/auth/login"]:
return await call_next(request)
client_id = self._get_client_identifier(request)
endpoint_key = self._get_endpoint_key(request)
limit, window = self._get_rate_limit(request)
# Create rate limit key
rate_limit_key = f"rate_limit:{client_id}:{endpoint_key}"
try:
is_allowed, rate_info = await self.limiter.is_allowed(
key=rate_limit_key,
@ -178,7 +179,7 @@ class RateLimitMiddleware:
window_seconds=window,
identifier=client_id
)
if not is_allowed:
# Return rate limit exceeded response
return JSONResponse(
@ -195,17 +196,17 @@ class RateLimitMiddleware:
"Retry-After": str(rate_info["retry_after"])
}
)
# Process the request
response = await call_next(request)
# Add rate limit headers to response
response.headers["X-RateLimit-Limit"] = str(rate_info["limit"])
response.headers["X-RateLimit-Remaining"] = str(rate_info["remaining"])
response.headers["X-RateLimit-Reset"] = str(rate_info["reset_time"])
return response
except Exception as e:
# Log error but don't block request if rate limiting fails
print(f"Rate limiting error: {e}")
@ -214,30 +215,30 @@ class RateLimitMiddleware:
class IPWhitelist:
"""IP whitelist for bypassing rate limits."""
def __init__(self, redis_client: aioredis.Redis):
self.redis = redis_client
self.whitelist_key = "ip_whitelist"
# Default whitelisted IPs (health checks, monitoring)
self.default_whitelist = {
"127.0.0.1",
"::1",
"169.254.169.254", # GCP metadata server
}
async def is_whitelisted(self, ip: str) -> bool:
"""Check if IP is whitelisted."""
if ip in self.default_whitelist:
return True
try:
is_member = await self.redis.sismember(self.whitelist_key, ip)
return bool(is_member)
except Exception:
return False
async def add_ip(self, ip: str, ttl_seconds: int | None = None) -> bool:
async def add_ip(self, ip: str, ttl_seconds: Optional[int] = None) -> bool:
"""Add IP to whitelist."""
try:
await self.redis.sadd(self.whitelist_key, ip)
@ -248,7 +249,7 @@ class IPWhitelist:
return True
except Exception:
return False
async def remove_ip(self, ip: str) -> bool:
"""Remove IP from whitelist."""
try:
@ -260,4 +261,4 @@ class IPWhitelist:
async def create_rate_limit_middleware(redis_client: aioredis.Redis) -> RateLimitMiddleware:
"""Factory function to create rate limit middleware."""
return RateLimitMiddleware(redis_client)
return RateLimitMiddleware(redis_client)
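
The sliding-window check in `RateLimiter.is_allowed` can be illustrated without Redis. In the pipeline shown above, `ZCARD` runs before `ZADD`, so the count excludes the current request, and `current_requests <= limit` would then appear to admit one request beyond the limit. The in-memory sketch below counts prior hits and rejects once `limit` of them are inside the window. It is a simplified illustration, not a drop-in replacement: `SlidingWindowLimiter` is a hypothetical class, it is per-process only, and it does not record rejected requests, whereas the Redis version adds every request to the sorted set.

```python
from __future__ import annotations

from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory sliding-window limiter keyed by client/endpoint string."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> tuple[bool, dict[str, float]]:
        hits = self._hits[key]
        # Drop timestamps that have aged out (same inclusive bound as ZREMRANGEBYSCORE).
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # The oldest hit leaving the window frees the next slot.
            retry_after = max(0.0, hits[0] + self.window - now)
            return False, {"limit": self.limit, "remaining": 0, "retry_after": retry_after}
        hits.append(now)
        return True, {"limit": self.limit, "remaining": self.limit - len(hits), "retry_after": 0.0}


limiter = SlidingWindowLimiter(limit=3, window_seconds=10)
for t in (0, 1, 2, 3, 10):
    print(t, limiter.allow("ip:203.0.113.7:GET:/api/v1/jobs", now=t))
```

Passing `now` explicitly keeps the sketch deterministic; a caller would normally supply `time.time()`.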

View file

@ -3,17 +3,15 @@
import json
import re
import time
from typing import Any
from typing import Any, Dict, List, Optional, Set
from fastapi import HTTPException, Request, status
from fastapi.responses import JSONResponse
from pydantic import BaseModel, ValidationError as PydanticValidationError
import magic
from urllib.parse import unquote
import magic
from fastapi import Request, status
from fastapi.responses import JSONResponse
from app.telemetry.metrics import track_validation_metrics
from ..core.config import settings
class ValidationError(Exception):
"""Custom validation error."""
@ -27,93 +25,89 @@ class SecurityValidationError(Exception):
class RequestValidator:
"""Enhanced request validation with security checks."""
def __init__(self):
# File type restrictions
self.allowed_video_types = {
"video/mp4",
"video/quicktime",
"video/quicktime",
"video/x-msvideo" # AVI
}
self.allowed_subtitle_types = {
"text/vtt",
"text/plain"
}
# Security patterns to block
self.malicious_patterns = [
# SQL injection patterns
r"\b(union|select|insert|update|delete|drop|create|alter)\b\s+",
r"vbscript:", # vbscript protocol injection
r"\b(onload|onerror|onclick)\s*=", # HTML event handler attribute injection
r"(union|select|insert|update|delete|drop|create|alter)\s+",
r"(script|javascript|vbscript|onload|onerror|onclick)",
r"<\s*script[^>]*>",
r"javascript:",
r"data:.*base64",
# Path traversal
r"\.\./",
r"\.\.\\",
r"%2e%2e%2f",
r"%2e%2e\\",
# Command injection (removed $ and ; — semicolons are common in natural language)
r"[&|`](?!\s*$)",
r"\b(rm|wget|curl|nc|bash|sh|cmd|powershell)\b\s+",
# MongoDB injection — NoSQL operator abuse
r"\$where|\$expr|\$function|\$accumulator"
r"|\$ne|\$nin|\$not"
r"|\$gt|\$gte|\$lt|\$lte"
r"|\$regex|\$jsonSchema|\$mod",
# Command injection (removed $ to allow MongoDB operators in controlled contexts)
r"[;&|`](?!\s*$)", # Allow $ but not as command separator
r"(rm|wget|curl|nc|bash|sh|cmd|powershell)\s+",
# MongoDB injection
r"\$where|\$ne|\$gt|\$lt|\$regex",
]
self.compiled_patterns = [re.compile(pattern, re.IGNORECASE) for pattern in self.malicious_patterns]
# Max file sizes (in bytes) — driven by central config (T-14)
self.max_video_size = settings.upload_max_video_bytes
# Max file sizes (in bytes)
self.max_video_size = 2 * 1024 * 1024 * 1024 # 2GB
self.max_subtitle_size = 10 * 1024 * 1024 # 10MB
# Request size limits
self.max_json_size = 1024 * 1024 # 1MB
self.max_form_fields = 50
def validate_string_content(self, content: str, field_name: str = "input") -> None:
"""Validate string content for malicious patterns."""
if not isinstance(content, str):
return
for pattern in self.compiled_patterns:
if pattern.search(content):
raise SecurityValidationError(
f"Potentially malicious content detected in {field_name}"
)
def validate_filename(self, filename: str) -> str:
"""Validate and sanitize filename."""
if not filename:
raise ValidationError("Filename cannot be empty")
# Decode URL encoding
filename = unquote(filename)
# Check for malicious patterns
self.validate_string_content(filename, "filename")
# Remove dangerous characters
safe_filename = re.sub(r'[^\w\-_\.]', '_', filename)
# Prevent hidden files
if safe_filename.startswith('.'):
safe_filename = 'file_' + safe_filename[1:]
# Limit length
if len(safe_filename) > 255:
name, ext = safe_filename.rsplit('.', 1) if '.' in safe_filename else (safe_filename, '')
safe_filename = name[:250] + ('.' + ext if ext else '')
return safe_filename
def validate_file_type(self, content: bytes, expected_type: str, filename: str) -> None:
"""Validate file type using magic numbers."""
try:
@ -123,13 +117,13 @@ class RequestValidator:
ext = filename.lower().split('.')[-1] if '.' in filename else ''
video_extensions = {'mp4', 'mov', 'avi', 'mkv'}
subtitle_extensions = {'vtt', 'srt', 'txt'}
if expected_type == "video" and ext not in video_extensions:
raise ValidationError(f"Invalid video file extension: {ext}") from None
raise ValidationError(f"Invalid video file extension: {ext}")
elif expected_type == "subtitle" and ext not in subtitle_extensions:
raise ValidationError(f"Invalid subtitle file extension: {ext}") from None
raise ValidationError(f"Invalid subtitle file extension: {ext}")
return
if expected_type == "video" and detected_type not in self.allowed_video_types:
raise ValidationError(
f"Invalid video file type: {detected_type}. "
@ -140,7 +134,7 @@ class RequestValidator:
f"Invalid subtitle file type: {detected_type}. "
f"Allowed types: {', '.join(self.allowed_subtitle_types)}"
)
def validate_file_size(self, size: int, file_type: str) -> None:
"""Validate file size limits."""
if file_type == "video" and size > self.max_video_size:
@ -153,16 +147,16 @@ class RequestValidator:
f"Subtitle file too large: {size} bytes. "
f"Maximum allowed: {self.max_subtitle_size} bytes"
)
async def validate_json_payload(self, request: Request) -> dict[str, Any] | None:
async def validate_json_payload(self, request: Request) -> Optional[Dict[str, Any]]:
"""Validate JSON request payload."""
if not request.headers.get("content-type", "").startswith("application/json"):
return None
content_length = request.headers.get("content-length")
if content_length and int(content_length) > self.max_json_size:
raise ValidationError(f"JSON payload too large: {content_length} bytes")
try:
# Check if body has already been read
if hasattr(request, '_cached_body'):
@ -171,67 +165,63 @@ class RequestValidator:
body = await request.body()
# Cache the body so FastAPI can read it later
request._cached_body = body
if len(body) > self.max_json_size:
raise ValidationError(f"JSON payload too large: {len(body)} bytes")
if not body:
return {}
payload = json.loads(body)
# Recursively validate all string values
self._validate_json_values(payload)
return payload
except json.JSONDecodeError as e:
raise ValidationError(f"Invalid JSON: {e}") from e
# Fields that contain free-form natural language — skip injection pattern checks
_FREETEXT_FIELDS = {"captions_vtt", "audio_description_vtt", "text", "notes", "change_note", "description"}
raise ValidationError(f"Invalid JSON: {e}")
def _validate_json_values(self, obj: Any, path: str = "root") -> None:
"""Recursively validate JSON values."""
if isinstance(obj, dict):
if len(obj) > self.max_form_fields:
raise ValidationError(f"Too many fields in object at {path}")
for key, value in obj.items():
self.validate_string_content(key, f"{path}.key")
# Skip pattern scanning for free-text fields (VTT content, notes, etc.)
if key not in self._FREETEXT_FIELDS:
self._validate_json_values(value, f"{path}.{key}")
if isinstance(key, str):
self.validate_string_content(key, f"{path}.{key}")
self._validate_json_values(value, f"{path}.{key}")
elif isinstance(obj, list):
if len(obj) > 1000: # Prevent large arrays
raise ValidationError(f"Array too large at {path}")
for i, item in enumerate(obj):
self._validate_json_values(item, f"{path}[{i}]")
elif isinstance(obj, str):
self.validate_string_content(obj, path)
def validate_query_params(self, request: Request) -> None:
"""Validate query parameters."""
for key, value in request.query_params.items():
self.validate_string_content(key, f"query.{key}")
self.validate_string_content(str(value), f"query.{key}")
def validate_headers(self, request: Request) -> None:
"""Validate request headers."""
suspicious_headers = {
"x-forwarded-host",
"x-original-host",
"x-original-host",
"x-rewrite-url"
}
for header_name, header_value in request.headers.items():
# Check for suspicious headers
if header_name.lower() in suspicious_headers:
self.validate_string_content(header_value, f"header.{header_name}")
# Validate user-agent length
if header_name.lower() == "user-agent" and len(header_value) > 500:
raise SecurityValidationError("User-Agent header too long")
@ -239,34 +229,34 @@ class RequestValidator:
class ValidationMiddleware:
"""FastAPI middleware for enhanced request validation."""
def __init__(self):
self.validator = RequestValidator()
async def __call__(self, request: Request, call_next):
"""Process validation for the request."""
start_time = time.time()
validation_errors = []
# Skip validation for timing adjustment endpoint temporarily
if "/vtt/adjust-timing" in request.url.path:
return await call_next(request)
try:
# Validate headers
self.validator.validate_headers(request)
# Validate query parameters
self.validator.validate_query_params(request)
# Validate JSON payload if present
if request.method in ["POST", "PUT", "PATCH"]:
await self.validator.validate_json_payload(request)
# Process the request
response = await call_next(request)
# Track successful validation
track_validation_metrics(
endpoint=request.url.path,
@ -275,10 +265,10 @@ class ValidationMiddleware:
validation_time=time.time() - start_time,
error_types=[]
)
return response
except SecurityValidationError:
except SecurityValidationError as e:
validation_errors.append("security")
track_validation_metrics(
endpoint=request.url.path,
@ -287,7 +277,7 @@ class ValidationMiddleware:
validation_time=time.time() - start_time,
error_types=validation_errors
)
return JSONResponse(
status_code=status.HTTP_400_BAD_REQUEST,
content={
@ -295,7 +285,7 @@ class ValidationMiddleware:
"error_code": "SECURITY_VALIDATION_ERROR"
}
)
except ValidationError as e:
validation_errors.append("format")
track_validation_metrics(
@ -305,7 +295,7 @@ class ValidationMiddleware:
validation_time=time.time() - start_time,
error_types=validation_errors
)
return JSONResponse(
status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
content={
@ -313,7 +303,7 @@ class ValidationMiddleware:
"error_code": "VALIDATION_ERROR"
}
)
except Exception as e:
validation_errors.append("unknown")
track_validation_metrics(
@ -323,7 +313,7 @@ class ValidationMiddleware:
validation_time=time.time() - start_time,
error_types=validation_errors
)
# Log unexpected error but continue processing
print(f"Validation middleware error: {e}")
return await call_next(request)
@ -331,4 +321,4 @@ class ValidationMiddleware:
async def create_validation_middleware() -> ValidationMiddleware:
"""Factory function to create validation middleware."""
return ValidationMiddleware()
return ValidationMiddleware()
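
The recursive walk in `_validate_json_values`, including the `_FREETEXT_FIELDS` skip that appears on one side of this hunk, can be sketched in isolation. This is a minimal illustration, not the project's validator: `collect_checked_paths` and the reduced `FREETEXT_FIELDS` set are hypothetical names, and the sketch only reports which string values would be scanned instead of raising `ValidationError`.

```python
from typing import Any

# Hypothetical subset of the free-text field names listed in the diff.
FREETEXT_FIELDS = {"captions_vtt", "notes"}


def collect_checked_paths(obj: Any, path: str = "root") -> list[str]:
    """Return the paths of string values that would be pattern-checked."""
    checked: list[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Free-text fields (VTT content, notes) are skipped entirely,
            # including anything nested beneath them.
            if key in FREETEXT_FIELDS:
                continue
            checked.extend(collect_checked_paths(value, f"{path}.{key}"))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            checked.extend(collect_checked_paths(item, f"{path}[{i}]"))
    elif isinstance(obj, str):
        checked.append(path)
    return checked
```

Skipping by key name keeps natural-language content from tripping injection patterns, at the cost that a value stored under one of those keys is never scanned.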

View file

@ -1,5 +1,5 @@
"""Database migration framework for MongoDB."""
from .migrator import Migration, MigrationManager
from .migrator import MigrationManager, Migration
__all__ = ["MigrationManager", "Migration"]
__all__ = ["MigrationManager", "Migration"]

View file

@ -1,10 +1,11 @@
"""MongoDB migration framework."""
import os
import importlib.util
from abc import ABC, abstractmethod
from datetime import datetime
from pathlib import Path
from typing import List, Optional
from motor.motor_asyncio import AsyncIOMotorDatabase
from app.core.database import get_database
@ -16,23 +17,22 @@ logger = get_logger(__name__)
class Migration(ABC):
"""Base class for database migrations."""
version: str = "0000-00-00-000000" # overridden by subclass as class variable
description: str = ""
def __init__(self):
self.db: AsyncIOMotorDatabase | None = None
self.version: str = "0000-00-00-000000" # Format: YYYY-MM-DD-HHMMSS
self.description: str = ""
self.db: Optional[AsyncIOMotorDatabase] = None
@abstractmethod
async def up(self) -> None:
"""Apply the migration."""
pass
@abstractmethod
async def down(self) -> None:
"""Roll back the migration."""
pass
async def set_database(self, db: AsyncIOMotorDatabase) -> None:
"""Set the database instance."""
self.db = db
@ -40,7 +40,7 @@ class Migration(ABC):
class MigrationRecord:
"""Represents a migration record in the database."""
def __init__(self, version: str, description: str, applied_at: datetime):
self.version = version
self.description = description
@ -49,163 +49,163 @@ class MigrationRecord:
class MigrationManager:
"""Manages database migrations."""
def __init__(self):
self.db: AsyncIOMotorDatabase | None = None
self.db: Optional[AsyncIOMotorDatabase] = None
self.migrations_dir = Path(__file__).parent / "scripts"
self.collection_name = "migration_history"
async def initialize(self) -> None:
"""Initialize the migration manager."""
self.db = await get_database()
await self._ensure_migration_collection()
async def _ensure_migration_collection(self) -> None:
"""Ensure the migration history collection exists with proper indexes."""
collection = self.db[self.collection_name]
# Create indexes for migration history
await collection.create_index([("version", 1)], unique=True)
await collection.create_index([("applied_at", -1)])
logger.info("Migration history collection initialized")
def discover_migrations(self) -> list[str]:
def discover_migrations(self) -> List[str]:
"""Discover all migration files in the migrations directory."""
if not self.migrations_dir.exists():
logger.warning(f"Migrations directory not found: {self.migrations_dir}")
return []
migration_files = []
for file_path in self.migrations_dir.glob("*.py"):
if file_path.name.startswith("migration_") and not file_path.name.startswith("__"):
migration_files.append(file_path.stem)
# Sort by version (filename should start with version)
migration_files.sort()
return migration_files
async def load_migration(self, migration_name: str) -> Migration:
"""Dynamically load a migration class."""
migration_path = self.migrations_dir / f"{migration_name}.py"
if not migration_path.exists():
raise FileNotFoundError(f"Migration file not found: {migration_path}")
# Load the module
spec = importlib.util.spec_from_file_location(migration_name, migration_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
# Get the migration class (assume it's named Migration)
if not hasattr(module, 'Migration'):
raise AttributeError(f"Migration class not found in {migration_name}")
migration_class = module.Migration
migration_class = getattr(module, 'Migration')
migration = migration_class()
await migration.set_database(self.db)
return migration
async def get_applied_migrations(self) -> list[str]:
async def get_applied_migrations(self) -> List[str]:
"""Get list of applied migration versions."""
collection = self.db[self.collection_name]
cursor = collection.find({}, {"version": 1}).sort("version", 1)
applied = []
async for doc in cursor:
applied.append(doc["version"])
return applied
async def record_migration(self, migration: Migration) -> None:
"""Record a successful migration in the database."""
collection = self.db[self.collection_name]
record = {
"version": migration.version,
"description": migration.description,
"applied_at": datetime.utcnow()
}
await collection.insert_one(record)
logger.info(f"Recorded migration: {migration.version} - {migration.description}")
async def remove_migration_record(self, version: str) -> None:
"""Remove a migration record (for rollback)."""
collection = self.db[self.collection_name]
await collection.delete_one({"version": version})
logger.info(f"Removed migration record: {version}")
@trace_async_operation("migration_manager.migrate_up")
async def migrate_up(self, target_version: str | None = None) -> list[str]:
async def migrate_up(self, target_version: Optional[str] = None) -> List[str]:
"""
Apply migrations up to the target version.
Args:
target_version: Version to migrate to. If None, applies all pending migrations.
Returns:
List of applied migration versions.
"""
await self.initialize()
# Discover all migrations
all_migrations = self.discover_migrations()
applied_migrations = await self.get_applied_migrations()
# Find pending migrations
pending_migrations = []
for migration_name in all_migrations:
# Extract version from filename (assumes format: migration_YYYY-MM-DD-HHMMSS_description.py)
version = migration_name.replace("migration_", "").split("_")[0]
if version not in applied_migrations:
if target_version is None or version <= target_version:
pending_migrations.append((migration_name, version))
# Sort by version
pending_migrations.sort(key=lambda x: x[1])
applied = []
for migration_name, version in pending_migrations:
try:
logger.info(f"Applying migration: {migration_name}")
migration = await self.load_migration(migration_name)
await migration.up()
await self.record_migration(migration)
applied.append(version)
logger.info(f"Successfully applied migration: {version}")
except Exception as e:
logger.error(f"Failed to apply migration {migration_name}: {e}")
raise
return applied
@trace_async_operation("migration_manager.migrate_down")
async def migrate_down(self, target_version: str) -> list[str]:
async def migrate_down(self, target_version: str) -> List[str]:
"""
Rollback migrations down to the target version.
Args:
target_version: Version to rollback to.
Returns:
List of rolled back migration versions.
"""
await self.initialize()
applied_migrations = await self.get_applied_migrations()
# Find migrations to rollback (newer than target)
to_rollback = []
for version in reversed(applied_migrations):
if version > target_version:
to_rollback.append(version)
rolled_back = []
for version in to_rollback:
try:
@ -215,39 +215,39 @@ class MigrationManager:
if version in migration_file:
migration_name = migration_file
break
if not migration_name:
logger.warning(f"Migration file not found for version {version}")
continue
logger.info(f"Rolling back migration: {migration_name}")
migration = await self.load_migration(migration_name)
await migration.down()
await self.remove_migration_record(version)
rolled_back.append(version)
logger.info(f"Successfully rolled back migration: {version}")
except Exception as e:
logger.error(f"Failed to rollback migration {version}: {e}")
raise
return rolled_back
async def get_migration_status(self) -> dict:
"""Get current migration status."""
await self.initialize()
all_migrations = self.discover_migrations()
applied_migrations = await self.get_applied_migrations()
pending_count = len(all_migrations) - len(applied_migrations)
return {
"total_migrations": len(all_migrations),
"applied_migrations": len(applied_migrations),
"pending_migrations": pending_count,
"latest_applied": applied_migrations[-1] if applied_migrations else None,
"all_applied": applied_migrations
}
}
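
The pending-migration selection in `migrate_up` depends on two details: the version is the first underscore-delimited token after the `migration_` prefix, and versions are compared as plain strings, which works because the `YYYY-MM-DD-HHMMSS` format sorts lexicographically. A minimal sketch of that logic without a database follows; `extract_version`, `select_pending`, and the file names in the usage note are hypothetical.

```python
from typing import Iterable, Optional


def extract_version(migration_name: str) -> str:
    """Pull the version token out of a migration module name."""
    return migration_name.replace("migration_", "").split("_")[0]


def select_pending(
    all_names: Iterable[str],
    applied_versions: Iterable[str],
    target_version: Optional[str] = None,
) -> list[tuple[str, str]]:
    """Return (module name, version) pairs still to apply, oldest first."""
    applied = set(applied_versions)
    pending = []
    for name in all_names:
        version = extract_version(name)
        if version in applied:
            continue
        # String comparison is enough because the version format is fixed-width.
        if target_version is None or version <= target_version:
            pending.append((name, version))
    pending.sort(key=lambda pair: pair[1])
    return pending
```

For example, with modules named `migration_2025-08-17-120000_initial_schema` and `migration_2025-08-17-120001_index_optimization`, and only the first version recorded as applied, `select_pending` returns just the second module.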

View file

@ -1,22 +0,0 @@
"""Entry point for running migrations: python -m app.migrations.run"""
import asyncio
from app.core.database import close_mongo_connection, connect_to_mongo
from app.migrations.migrator import MigrationManager
async def main() -> None:
await connect_to_mongo()
try:
mgr = MigrationManager()
applied = await mgr.migrate_up()
if applied:
print(f"Applied {len(applied)} migration(s): {applied}")
else:
print("Already up to date — no pending migrations.")
finally:
await close_mongo_connection()
if __name__ == "__main__":
asyncio.run(main())

View file

@ -1,38 +1,39 @@
"""Initial database schema setup migration."""
from datetime import datetime
from app.migrations.migrator import Migration
class Migration(Migration):
"""Initial schema setup with all collections and indexes."""
def __init__(self):
super().__init__()
self.version = "2025-08-17-120000"
self.description = "Initial database schema with users, jobs, and audit_logs collections"
async def up(self) -> None:
"""Create initial collections and indexes."""
# Users collection setup
await self.db.users.create_index([("email", 1)], unique=True)
await self.db.users.create_index([("role", 1)])
await self.db.users.create_index([("is_active", 1)])
await self.db.users.create_index([("created_at", -1)])
# Jobs collection setup
await self.db.jobs.create_index([("status", 1), ("created_at", -1)])
await self.db.jobs.create_index([("client_id", 1)])
await self.db.jobs.create_index([("updated_at", -1)])
await self.db.jobs.create_index([("languages", 1)])
# Create compound index for job queries
await self.db.jobs.create_index([
("status", 1),
("client_id", 1),
("created_at", -1)
])
# Audit logs collection setup
await self.db.audit_logs.create_index([("timestamp", -1)])
await self.db.audit_logs.create_index([("action", 1), ("timestamp", -1)])
@ -41,23 +42,23 @@ class Migration(Migration):
await self.db.audit_logs.create_index([("resource_type", 1), ("resource_id", 1)])
await self.db.audit_logs.create_index([("ip_address", 1), ("timestamp", -1)])
await self.db.audit_logs.create_index([("success", 1), ("timestamp", -1)])
# Text search index for audit logs
await self.db.audit_logs.create_index([
("description", "text"),
("details", "text"),
("error_message", "text")
])
print(f"✅ Applied migration {self.version}: {self.description}")
async def down(self) -> None:
"""Drop all collections (destructive - use with caution)."""
# This is a destructive operation; in production, back up the data first
await self.db.users.drop()
await self.db.jobs.drop()
await self.db.audit_logs.drop()
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: All data has been deleted!")
print("⚠️ WARNING: All data has been deleted!")

View file

@ -5,75 +5,75 @@ from app.migrations.migrator import Migration
class Migration(Migration):
"""Optimize indexes for better query performance."""
def __init__(self):
super().__init__()
self.version = "2025-08-17-120001"
self.description = "Index optimization for query performance improvements"
async def up(self) -> None:
"""Add optimized indexes for common query patterns."""
# Jobs collection optimizations
# Index for job status transitions and monitoring
await self.db.jobs.create_index([
("status", 1),
("updated_at", -1),
("client_id", 1)
], name="jobs_status_updated_client_idx")
# Index for queue management (pending jobs)
await self.db.jobs.create_index([
("status", 1),
("created_at", 1)
], name="jobs_queue_processing_idx")
# Index for client job history
await self.db.jobs.create_index([
("client_id", 1),
("created_at", -1),
("status", 1)
], name="jobs_client_history_idx")
# Sparse index for error tracking
await self.db.jobs.create_index([
("status", 1),
("error", 1)
], sparse=True, name="jobs_error_tracking_idx")
# Users collection optimizations
# Index for active user queries
await self.db.users.create_index([
("is_active", 1),
("role", 1),
("last_login_at", -1)
], name="users_active_role_login_idx")
# Text index for user search across email and name fields
await self.db.users.create_index([
("email", "text"),
("first_name", "text"),
("last_name", "text")
], name="users_search_idx")
# Audit logs collection optimizations
# Compound index for security monitoring
await self.db.audit_logs.create_index([
("severity", 1),
("action", 1),
("timestamp", -1)
], name="audit_security_monitoring_idx")
# Index for user activity analysis
await self.db.audit_logs.create_index([
("user_id", 1),
("action", 1),
("timestamp", -1)
], name="audit_user_activity_idx")
# Index for resource access tracking
await self.db.audit_logs.create_index([
("resource_type", 1),
@ -81,30 +81,30 @@ class Migration(Migration):
("action", 1),
("timestamp", -1)
], name="audit_resource_access_idx")
# Sparse index for failed operations
await self.db.audit_logs.create_index([
("success", 1),
("timestamp", -1)
], sparse=True, name="audit_failures_idx")
# Add TTL index for automatic audit log cleanup (optional)
# Uncomment if you want automatic cleanup after 2 years
# await self.db.audit_logs.create_index(
# [("timestamp", 1)],
# [("timestamp", 1)],
# expireAfterSeconds=63072000, # 2 years
# name="audit_ttl_idx"
# )
print(f"✅ Applied migration {self.version}: {self.description}")
async def down(self) -> None:
"""Remove the optimized indexes."""
# Drop the indexes we created
indexes_to_drop = [
"jobs_status_updated_client_idx",
"jobs_queue_processing_idx",
"jobs_queue_processing_idx",
"jobs_client_history_idx",
"jobs_error_tracking_idx",
"users_active_role_login_idx",
@ -114,21 +114,21 @@ class Migration(Migration):
"audit_resource_access_idx",
"audit_failures_idx"
]
for index_name in indexes_to_drop:
try:
await self.db.jobs.drop_index(index_name)
except Exception:
pass # Index might not exist on this collection
try:
await self.db.users.drop_index(index_name)
except Exception:
pass
try:
await self.db.audit_logs.drop_index(index_name)
except Exception:
pass
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
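
The `down()` above tries every index name against all three collections and ignores the failures. One alternative is to route each name to its collection by prefix, so only real errors surface. The sketch below is an assumption-laden illustration, not the project's code: `PREFIX_TO_COLLECTION` and `group_indexes_by_collection` are hypothetical, and the mapping relies on the `jobs_`, `users_`, and `audit_` naming used in this migration.

```python
from collections import defaultdict

# Hypothetical mapping from index-name prefix to collection name.
PREFIX_TO_COLLECTION = {"jobs": "jobs", "users": "users", "audit": "audit_logs"}


def group_indexes_by_collection(index_names: list[str]) -> dict[str, list[str]]:
    """Group index names under the collection their prefix points to."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for name in index_names:
        prefix = name.split("_", 1)[0]
        # A KeyError here means the name does not follow the prefix convention.
        grouped[PREFIX_TO_COLLECTION[prefix]].append(name)
    return dict(grouped)
```

With the grouping in hand, a rollback could loop over each collection and call `drop_index` only for the names that belong to it.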

View file

@ -1,21 +1,20 @@
"""Migrate audit log schema from basic to comprehensive format."""
from datetime import datetime
from app.migrations.migrator import Migration
class Migration(Migration):
"""Update audit log schema to comprehensive format."""
def __init__(self):
super().__init__()
self.version = "2025-08-17-120002"
self.description = "Update audit log schema from basic to comprehensive format"
async def up(self) -> None:
"""Migrate existing audit logs to new schema format."""
# Find all existing audit logs with old schema
old_logs_cursor = self.db.audit_logs.find({
# Look for logs that have the old schema structure
@ -25,9 +24,9 @@ class Migration(Migration):
{"timestamp": {"$exists": False}} # Missing new timestamp field
]
})
migration_count = 0
async for old_log in old_logs_cursor:
try:
# Map old fields to new schema
@ -39,82 +38,82 @@ class Migration(Migration):
"description": old_log.get("action", "Legacy action"),
"success": True,
"environment": "prod",
"service_name": "accessible-video-api",
"service_name": "accessible-video-api",
"api_version": "v1"
}
# Map optional fields if they exist
if "user_id" in old_log:
new_log["user_id"] = old_log["user_id"]
if "job_id" in old_log:
new_log["resource_type"] = "job"
new_log["resource_id"] = old_log["job_id"]
if "ip_address" in old_log:
new_log["ip_address"] = old_log["ip_address"]
if "user_agent" in old_log:
new_log["user_agent"] = old_log["user_agent"]
if "details" in old_log:
new_log["details"] = old_log["details"]
# Replace the old document with the new schema
await self.db.audit_logs.replace_one(
{"_id": old_log["_id"]},
new_log
)
migration_count += 1
except Exception as e:
print(f"Error migrating audit log {old_log.get('_id')}: {e}")
continue
print(f"✅ Applied migration {self.version}: Migrated {migration_count} audit log records")
def _map_old_action(self, old_action: str) -> str:
"""Map old action strings to new AuditAction enum values."""
action_mapping = {
# Job actions
"job_created": "job.create",
"job_approved": "job.approve",
"job_approved": "job.approve",
"job_rejected": "job.reject",
"job_updated": "job.update",
"job_cancelled": "job.cancel",
# Auth actions
"login": "auth.login.success",
"logout": "auth.logout",
"login_failed": "auth.login.failure",
# File actions
"file_uploaded": "file.upload",
"file_downloaded": "file.download",
# VTT actions
"vtt_edited": "vtt.edit",
# Admin actions
"user_created": "user.create",
"user_updated": "user.update",
"user_deleted": "user.delete",
}
return action_mapping.get(old_action, old_action)
async def down(self) -> None:
"""Rollback to old audit log schema format (limited)."""
# Find all audit logs with new schema
new_logs_cursor = self.db.audit_logs.find({
"timestamp": {"$exists": True},
"action": {"$exists": True}
})
rollback_count = 0
async for new_log in new_logs_cursor:
try:
# Map new fields back to old schema (lossy conversion)
@ -123,34 +122,34 @@ class Migration(Migration):
"when": new_log["timestamp"],
"action": new_log["action"]
}
# Map back optional fields
if "user_id" in new_log:
old_log["user_id"] = new_log["user_id"]
if "resource_type" in new_log and new_log["resource_type"] == "job":
old_log["job_id"] = new_log.get("resource_id")
if "ip_address" in new_log:
old_log["ip_address"] = new_log["ip_address"]
if "user_agent" in new_log:
old_log["user_agent"] = new_log["user_agent"]
if "details" in new_log:
old_log["details"] = new_log["details"]
# Replace with old schema
await self.db.audit_logs.replace_one(
{"_id": new_log["_id"]},
old_log
)
rollback_count += 1
except Exception as e:
print(f"Error rolling back audit log {new_log.get('_id')}: {e}")
continue
print(f"⚠️ Rolled back migration {self.version}: Reverted {rollback_count} audit log records")
print("⚠️ WARNING: Some audit log data may have been lost due to schema differences")
print("⚠️ WARNING: Some audit log data may have been lost due to schema differences")

View file

@ -24,7 +24,7 @@ class Migration(Migration):
# Create index on auth_provider for faster queries
await self.db.users.create_index([("auth_provider", 1)])
print("✅ Created index on auth_provider field")
print(f"✅ Created index on auth_provider field")
print(f"✅ Applied migration {self.version}: {self.description}")
@ -34,7 +34,7 @@ class Migration(Migration):
# Drop the index
try:
await self.db.users.drop_index("auth_provider_1")
print("✅ Dropped index on auth_provider field")
print(f"✅ Dropped index on auth_provider field")
except Exception as e:
print(f"⚠️ Could not drop index: {e}")

View file

@ -75,7 +75,7 @@ class Migration(Migration):
"validationLevel": "moderate", # moderate = validate inserts, and updates to documents that already pass validation
"validationAction": "error" # error = reject invalid documents
})
print("✅ Updated users collection validator")
print(f"✅ Updated users collection validator")
except Exception as e:
print(f"⚠️ Could not update validator: {e}")
# Try creating the collection if it doesn't exist
@ -86,7 +86,7 @@ class Migration(Migration):
validationLevel="moderate",
validationAction="error"
)
print("✅ Created users collection with validator")
print(f"✅ Created users collection with validator")
except Exception as e2:
print(f"⚠️ Could not create collection: {e2}")
@ -136,4 +136,4 @@ class Migration(Migration):
})
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: Production role users will fail validation!")
print(f"⚠️ WARNING: Production role users will fail validation!")

View file

@ -53,7 +53,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print(" Updated jobs collection validator")
print(f" Updated jobs collection validator")
except Exception as e:
print(f" Could not update validator: {e}")
raise
@ -101,4 +101,4 @@ class Migration(Migration):
})
print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with approved_source or qc_feedback status will fail validation!")
print(f" WARNING: Jobs with approved_source or qc_feedback status will fail validation!")

View file

@ -54,7 +54,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print(" Updated jobs collection validator")
print(f" Updated jobs collection validator")
except Exception as e:
print(f" Could not update validator: {e}")
raise
@ -104,4 +104,4 @@ class Migration(Migration):
})
print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with rendering_video status will fail validation!")
print(f" WARNING: Jobs with rendering_video status will fail validation!")

View file

@ -60,7 +60,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print(" Updated jobs collection validator")
print(f" Updated jobs collection validator")
except Exception as e:
print(f" Could not update validator: {e}")
raise
@ -111,4 +111,4 @@ class Migration(Migration):
})
print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with tts_failed or render_failed status will fail validation!")
print(f" WARNING: Jobs with tts_failed or render_failed status will fail validation!")

View file

@ -61,7 +61,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print(" Updated jobs collection validator")
print(f" Updated jobs collection validator")
except Exception as e:
print(f" Could not update validator: {e}")
raise
@ -114,4 +114,4 @@ class Migration(Migration):
})
print(f" Rolled back migration {self.version}: {self.description}")
print(" WARNING: Jobs with rendering_qc status will fail validation!")
print(f" WARNING: Jobs with rendering_qc status will fail validation!")

View file

@ -64,7 +64,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print("✅ Updated users collection validator")
print(f"✅ Updated users collection validator")
except Exception as e:
print(f"⚠️ Could not update validator: {e}")
try:
@ -74,7 +74,7 @@ class Migration(Migration):
validationLevel="moderate",
validationAction="error"
)
print("✅ Created users collection with validator")
print(f"✅ Created users collection with validator")
except Exception as e2:
print(f"⚠️ Could not create collection: {e2}")
@ -134,4 +134,4 @@ class Migration(Migration):
})
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: Linguist role users will fail validation!")
print(f"⚠️ WARNING: Linguist role users will fail validation!")

View file

@ -69,7 +69,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print("✅ Updated users collection validator")
print(f"✅ Updated users collection validator")
except Exception as e:
print(f"⚠️ Could not update validator: {e}")
try:
@ -79,7 +79,7 @@ class Migration(Migration):
validationLevel="moderate",
validationAction="error"
)
print("✅ Created users collection with validator")
print(f"✅ Created users collection with validator")
except Exception as e2:
print(f"⚠️ Could not create collection: {e2}")
@ -139,4 +139,4 @@ class Migration(Migration):
})
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: project_manager role users will fail validation!")
print(f"⚠️ WARNING: project_manager role users will fail validation!")

View file

@ -1,6 +1,6 @@
"""Backfill memberships collection from existing pm_client_ids and team.member_user_ids."""
from datetime import UTC, datetime
from datetime import datetime, timezone
from app.migrations.migrator import Migration
@ -13,7 +13,7 @@ class Migration(Migration):
self.description = "Backfill memberships from pm_client_ids and team member lists"
async def up(self) -> None:
now = datetime.now(UTC)
now = datetime.now(timezone.utc)
upserted = 0
# 1. PROJECT_MANAGER users → MANAGER membership for each pm_client_id
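
The two import styles in this hunk produce the same timestamp: `datetime.UTC` is an alias for `timezone.utc` that was added in Python 3.11, so `timezone.utc` is the spelling that also runs on older interpreters. A small sketch of the equivalence, with a fallback import so it runs on either version:

```python
from datetime import datetime, timezone

try:
    # Available on Python 3.11 and later.
    from datetime import UTC
except ImportError:
    # Older interpreters: fall back to the object the alias refers to.
    UTC = timezone.utc

now = datetime.now(timezone.utc)

# Both spellings name the same tzinfo object, and the result is timezone-aware.
same_object = UTC is timezone.utc
is_aware = now.tzinfo is not None and now.utcoffset().total_seconds() == 0
```

Either form yields an aware datetime, unlike `datetime.utcnow()`, which returns a naive one.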

View file

@ -1,53 +0,0 @@
"""Add PROCESSING_FAILED status to job schema validator and create failure indexes."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-04-29-000000"
description = "Add processing_failed status and failure/status compound indexes on jobs"
async def up(self) -> None:
db = self.db
# Add processing_failed to the schema validator enum (if validator exists)
try:
validator_info = await db.command(
"listCollections", filter={"name": "jobs"}
)
collections = [c async for c in validator_info["cursor"]]
if collections and collections[0].get("options", {}).get("validator"):
existing_validator = collections[0]["options"]["validator"]
status_path = (
existing_validator.get("$jsonSchema", {})
.get("properties", {})
.get("status", {})
.get("enum", [])
)
if status_path and "processing_failed" not in status_path:
status_path.append("processing_failed")
await db.command(
"collMod",
"jobs",
validator=existing_validator,
validationAction="warn",
)
except Exception:
# No validator or unsupported — skip gracefully
pass
# Indexes for failure dashboard queries
await db.jobs.create_index(
[("failure.step", 1), ("status", 1)],
name="idx_jobs_failure_step_status",
background=True,
)
await db.jobs.create_index(
[("status", 1), ("organization_id", 1), ("created_at", -1)],
name="idx_jobs_status_org_created",
background=True,
)
async def down(self) -> None:
db = self.db
await db.jobs.drop_index("idx_jobs_failure_step_status")
await db.jobs.drop_index("idx_jobs_status_org_created")

View file

@ -1,46 +0,0 @@
"""Create job_briefs collection with indexes."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-04-29-000001"
description = "Create job_briefs collection and indexes"
async def up(self) -> None:
db = self.db
# Ensure collection exists (create it explicitly; ignore the error if it already exists)
try:
await db.create_collection("job_briefs")
except Exception:
pass # already exists
await db.job_briefs.create_index(
[("organization_id", 1), ("status", 1), ("created_at", -1)],
name="idx_briefs_org_status_created",
background=True,
)
await db.job_briefs.create_index(
[("created_by", 1)],
name="idx_briefs_created_by",
background=True,
)
await db.job_briefs.create_index(
[("project_id", 1)],
name="idx_briefs_project_id",
background=True,
sparse=True,
)
await db.job_briefs.create_index(
[("job_id", 1)],
name="idx_briefs_job_id",
background=True,
sparse=True,
)
async def down(self) -> None:
db = self.db
await db.job_briefs.drop_index("idx_briefs_org_status_created")
await db.job_briefs.drop_index("idx_briefs_created_by")
await db.job_briefs.drop_index("idx_briefs_project_id")
await db.job_briefs.drop_index("idx_briefs_job_id")

View file

@ -1,44 +0,0 @@
"""Backfill Membership.team_ids from Team.member_user_ids (MT-17)."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-04-30-000000"
description = "Backfill team_ids on Membership records from Team.member_user_ids"
async def up(self) -> None:
db = self.db
upserted = 0
# For each team that has member_user_ids, push team_id into the matching Membership
async for team in db.teams.find(
{"member_user_ids": {"$exists": True, "$ne": []}},
{"_id": 1, "client_id": 1, "member_user_ids": 1},
):
team_id = str(team["_id"])
org_id = str(team.get("client_id", ""))
for user_id in team.get("member_user_ids", []):
result = await db.memberships.update_one(
{"user_id": str(user_id), "organization_id": org_id},
{"$addToSet": {"team_ids": team_id}},
)
if result.modified_count:
upserted += 1
# Ensure index for efficient team-based lookups
await db.memberships.create_index(
[("team_ids", 1)],
name="idx_memberships_team_ids",
background=True,
sparse=True,
)
print(f"✅ Backfilled team_ids on {upserted} Membership records")
async def down(self) -> None:
db = self.db
await db.memberships.update_many({}, {"$unset": {"team_ids": ""}})
try:
await db.memberships.drop_index("idx_memberships_team_ids")
except Exception:
pass

View file

@ -1,38 +0,0 @@
"""Add cancelled status to job schema validator."""
from app.migrations.migrator import Migration


class Migration(Migration):
    version = "2026-04-30-000001"
    description = "Add cancelled status to jobs collection schema validator"

    async def up(self) -> None:
        db = self.db
        try:
            validator_info = await db.command(
                "listCollections", filter={"name": "jobs"}
            )
            # db.command returns a plain dict, so the matching collections are
            # read from cursor.firstBatch rather than iterated as a cursor.
            collections = validator_info.get("cursor", {}).get("firstBatch", [])
            if collections and collections[0].get("options", {}).get("validator"):
                existing_validator = collections[0]["options"]["validator"]
                status_path = (
                    existing_validator.get("$jsonSchema", {})
                    .get("properties", {})
                    .get("status", {})
                    .get("enum", [])
                )
                if status_path and "cancelled" not in status_path:
                    status_path.append("cancelled")
                    await db.command(
                        "collMod",
                        "jobs",
                        validator=existing_validator,
                        validationAction="warn",
                    )
        except Exception:
            # No validator or unsupported: skip gracefully
            pass

    async def down(self) -> None:
        pass


@ -1,47 +0,0 @@
"""Replace status enum in $jsonSchema validator with the full current list."""
from app.migrations.migrator import Migration

ALL_STATUSES = [
    "created", "ingesting", "ai_processing",
    "pending_qc", "approved_english", "approved_source",
    "rejected", "qc_feedback",
    "translating", "tts_generating", "tts_failed",
    "rendering_video", "render_failed", "rendering_qc",
    "pending_final_review", "completed",
    "processing_failed", "cancelled",
]


class Migration(Migration):
    version = "2026-04-30-000002"
    description = "Fix status enum in jobs $jsonSchema validator (add processing_failed + cancelled)"

    async def up(self) -> None:
        db = self.db
        result = await db.command("listCollections", filter={"name": "jobs"})
        batch = result.get("cursor", {}).get("firstBatch", [])
        if not batch:
            return
        existing_validator = batch[0].get("options", {}).get("validator")
        if not existing_validator:
            return
        schema = existing_validator.get("$jsonSchema", {})
        status_prop = schema.get("properties", {}).get("status")
        if not status_prop:
            return
        status_prop["enum"] = ALL_STATUSES
        await db.command(
            "collMod",
            "jobs",
            validator=existing_validator,
            validationLevel="moderate",
            validationAction="error",
        )

    async def down(self) -> None:
        pass
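
The enum rewrite in this migration is a pure dictionary transformation, so it can be exercised on its own. The sketch below is illustrative only: `with_status_enum` is a hypothetical helper, not a function in the repository, and the sample validator is invented. It returns a modified copy instead of mutating in place, which makes the before and after states easy to compare.

```python
import copy


def with_status_enum(validator, statuses):
    """Return a copy of `validator` whose status enum is `statuses`.

    Returns None when there is no validator or no `status` property,
    mirroring the early returns in the migration's up().
    """
    if not validator:
        return None
    updated = copy.deepcopy(validator)
    status_prop = updated.get("$jsonSchema", {}).get("properties", {}).get("status")
    if not status_prop:
        return None
    status_prop["enum"] = list(statuses)
    return updated


# Invented sample validator with an outdated status list.
old = {"$jsonSchema": {"properties": {"status": {"enum": ["created", "completed"]}}}}
new = with_status_enum(old, ["created", "completed", "cancelled"])
```

The returned document is what would be passed as `validator=` to `collMod`; the original `old` dictionary is left unchanged.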


@ -1,26 +0,0 @@
"""Backfill source_has_ad=False on existing jobs and job_briefs."""
from app.migrations.migrator import Migration


class Migration(Migration):
    version = "2026-05-08-000000"
    description = "Add source_has_ad field to jobs.source and job_briefs"

    async def up(self) -> None:
        db = self.db
        jobs_result = await db.jobs.update_many(
            {"source.source_has_ad": {"$exists": False}},
            {"$set": {"source.source_has_ad": False}},
        )
        briefs_result = await db.job_briefs.update_many(
            {"source_has_ad": {"$exists": False}},
            {"$set": {"source_has_ad": False}},
        )
        print(f"✅ Backfilled source_has_ad on {jobs_result.modified_count} jobs, {briefs_result.modified_count} job_briefs")

    async def down(self) -> None:
        db = self.db
        await db.jobs.update_many({}, {"$unset": {"source.source_has_ad": ""}})
        await db.job_briefs.update_many({}, {"$unset": {"source_has_ad": ""}})

Binary file not shown.

Binary file not shown.


@ -1,18 +1,17 @@
"""Audit log model for tracking sensitive operations."""
from datetime import datetime
from enum import StrEnum
from typing import Any
from enum import Enum
from typing import Any, Dict, Optional
from bson import ObjectId
from pydantic import BaseModel, Field
from .user import PyObjectId
class AuditAction(StrEnum):
class AuditAction(str, Enum):
"""Enumeration of auditable actions."""
# Authentication actions
LOGIN_SUCCESS = "auth.login.success"
LOGIN_FAILURE = "auth.login.failure"
@ -20,7 +19,7 @@ class AuditAction(StrEnum):
TOKEN_REFRESH = "auth.token.refresh"
PASSWORD_CHANGE = "auth.password.change"
PASSWORD_RESET = "auth.password.reset"
# User management actions
USER_CREATE = "user.create"
USER_UPDATE = "user.update"
@ -28,7 +27,7 @@ class AuditAction(StrEnum):
USER_ROLE_CHANGE = "user.role.change"
USER_ACTIVATE = "user.activate"
USER_DEACTIVATE = "user.deactivate"
# Job management actions
JOB_CREATE = "job.create"
JOB_UPDATE = "job.update"
@ -37,89 +36,24 @@ class AuditAction(StrEnum):
JOB_REJECT = "job.reject"
JOB_CANCEL = "job.cancel"
JOB_STATUS_CHANGE = "job.status.change"
JOB_TASK_FAILED = "job.task.failed"
JOB_RETRY = "job.retry"
JOB_BULK_RETRY = "job.bulk_retry"
# File operations
FILE_UPLOAD = "file.upload"
FILE_DOWNLOAD = "file.download"
FILE_DELETE = "file.delete"
FILE_ACCESS = "file.access"
# VTT editing actions
VTT_EDIT = "vtt.edit"
VTT_APPROVE = "vtt.approve"
VTT_REJECT = "vtt.reject"
VTT_RETRANSLATE = "vtt.retranslate"
# Per-language QC actions
LANGUAGE_QC_ASSIGN = "language_qc.assign"
LANGUAGE_QC_REASSIGN = "language_qc.reassign"
LANGUAGE_QC_REVIEWER_ASSIGN = "language_qc.reviewer_assign"
LANGUAGE_QC_REVIEWER_REASSIGN = "language_qc.reviewer_reassign"
LANGUAGE_QC_SUBMIT = "language_qc.submit"
LANGUAGE_QC_OPEN_REVIEW = "language_qc.open_review"
LANGUAGE_QC_APPROVE = "language_qc.approve"
LANGUAGE_QC_REJECT = "language_qc.reject"
LANGUAGE_QC_REOPEN = "language_qc.reopen"
LANGUAGE_QC_COMMENT = "language_qc.comment"
# Admin actions
ADMIN_CONFIG_CHANGE = "admin.config.change"
ADMIN_SYSTEM_ACTION = "admin.system.action"
ADMIN_DATA_EXPORT = "admin.data.export"
ADMIN_AUDIT_ACCESS = "admin.audit.access"
# Glossary management
GLOSSARY_UPLOAD = "glossary.upload"
GLOSSARY_VERSION_UPLOAD = "glossary.version.upload"
GLOSSARY_ACTIVATE = "glossary.activate"
GLOSSARY_ARCHIVE = "glossary.archive"
# Client management
CLIENT_CREATE = "client.create"
CLIENT_UPDATE = "client.update"
CLIENT_DEACTIVATE = "client.deactivate"
CLIENT_PM_ASSIGN = "client.pm_assign"
CLIENT_PM_REMOVE = "client.pm_remove"
CLIENT_TEAM_CREATE = "client.team_create"
CLIENT_TEAM_UPDATE = "client.team_update"
CLIENT_TEAM_DELETE = "client.team_delete"
CLIENT_TEAM_MEMBER_ADD = "client.team_member_add"
CLIENT_TEAM_MEMBER_REMOVE = "client.team_member_remove"
CLIENT_PROJECT_CREATE = "client.project_create"
CLIENT_PROJECT_UPDATE = "client.project_update"
CLIENT_PROJECT_ARCHIVE = "client.project_archive"
# Organization management
ORG_CREATE = "org.create"
ORG_UPDATE = "org.update"
ORG_MEMBER_ADD = "org.member_add"
ORG_MEMBER_UPDATE = "org.member_update"
ORG_MEMBER_REMOVE = "org.member_remove"
# Invitations
INVITATION_CREATE = "invitation.create"
INVITATION_REVOKE = "invitation.revoke"
INVITATION_ACCEPT = "invitation.accept"
# Language QC (additional)
LANGUAGE_QC_BULK_ASSIGN = "language_qc.bulk_assign"
LANGUAGE_QC_START_WORK = "language_qc.start_work"
LANGUAGE_QC_MARK_CUE_REVIEWED = "language_qc.mark_cue_reviewed"
# Brief management
BRIEF_CREATE = "brief.create"
BRIEF_UPDATE = "brief.update"
BRIEF_SUBMIT = "brief.submit"
BRIEF_APPROVE = "brief.approve"
# Share tokens
SHARE_TOKEN_CREATE = "share.token_create"
SHARE_TOKEN_REVOKE = "share.token_revoke"
SHARE_CLIENT_DECISION = "share.client_decision"
# Security events
RATE_LIMIT_EXCEEDED = "security.rate_limit.exceeded"
VALIDATION_FAILURE = "security.validation.failure"
@ -127,9 +61,9 @@ class AuditAction(StrEnum):
SUSPICIOUS_ACTIVITY = "security.suspicious.activity"
class AuditLogSeverity(StrEnum):
class AuditLogSeverity(str, Enum):
"""Severity levels for audit events."""
INFO = "info" # Normal operations
WARNING = "warning" # Suspicious but not critical
ERROR = "error" # Failed operations
@ -138,43 +72,43 @@ class AuditLogSeverity(StrEnum):
class AuditLog(BaseModel):
"""Audit log entry model."""
id: PyObjectId | None = Field(default_factory=lambda: str(ObjectId()), alias="_id")
id: Optional[PyObjectId] = Field(default_factory=PyObjectId, alias="_id")
# Core audit fields
timestamp: datetime = Field(default_factory=datetime.utcnow)
action: AuditAction
severity: AuditLogSeverity = AuditLogSeverity.INFO
# Actor information
user_id: PyObjectId | None = None
user_email: str | None = None
user_role: str | None = None
user_id: Optional[PyObjectId] = None
user_email: Optional[str] = None
user_role: Optional[str] = None
# Request context
ip_address: str | None = None
user_agent: str | None = None
request_id: str | None = None
session_id: str | None = None
ip_address: Optional[str] = None
user_agent: Optional[str] = None
request_id: Optional[str] = None
session_id: Optional[str] = None
# Resource information
resource_type: str | None = None # e.g., "job", "user", "file"
resource_id: str | None = None
resource_name: str | None = None
resource_type: Optional[str] = None # e.g., "job", "user", "file"
resource_id: Optional[str] = None
resource_name: Optional[str] = None
# Action details
description: str
details: dict[str, Any] = Field(default_factory=dict)
details: Dict[str, Any] = Field(default_factory=dict)
# Outcome
success: bool = True
error_message: str | None = None
error_message: Optional[str] = None
# Additional metadata
environment: str = "prod"
service_name: str = "accessible-video-api"
api_version: str = "v1"
class Config:
populate_by_name = True
arbitrary_types_allowed = True
@ -183,49 +117,49 @@ class AuditLog(BaseModel):
class AuditLogCreate(BaseModel):
"""Schema for creating audit log entries."""
action: AuditAction
severity: AuditLogSeverity = AuditLogSeverity.INFO
description: str
# Optional fields that can be provided
user_id: PyObjectId | None = None
user_email: str | None = None
user_role: str | None = None
ip_address: str | None = None
user_agent: str | None = None
request_id: str | None = None
resource_type: str | None = None
resource_id: str | None = None
resource_name: str | None = None
details: dict[str, Any] = Field(default_factory=dict)
user_id: Optional[PyObjectId] = None
user_email: Optional[str] = None
user_role: Optional[str] = None
ip_address: Optional[str] = None
user_agent: Optional[str] = None
request_id: Optional[str] = None
resource_type: Optional[str] = None
resource_id: Optional[str] = None
resource_name: Optional[str] = None
details: Dict[str, Any] = Field(default_factory=dict)
success: bool = True
error_message: str | None = None
error_message: Optional[str] = None
class AuditLogQuery(BaseModel):
"""Schema for querying audit logs."""
# Time range
start_date: datetime | None = None
end_date: datetime | None = None
start_date: Optional[datetime] = None
end_date: Optional[datetime] = None
# Filters
action: AuditAction | None = None
severity: AuditLogSeverity | None = None
user_id: PyObjectId | None = None
user_email: str | None = None
resource_type: str | None = None
resource_id: str | None = None
success: bool | None = None
action: Optional[AuditAction] = None
severity: Optional[AuditLogSeverity] = None
user_id: Optional[PyObjectId] = None
user_email: Optional[str] = None
resource_type: Optional[str] = None
resource_id: Optional[str] = None
success: Optional[bool] = None
# Search
search: str | None = None # Full-text search in description and details
search: Optional[str] = None # Full-text search in description and details
# Pagination
skip: int = 0
limit: int = 100
# Sorting
sort_by: str = "timestamp"
sort_order: int = -1 # -1 for descending, 1 for ascending
@ -233,7 +167,7 @@ class AuditLogQuery(BaseModel):
class AuditLogResponse(BaseModel):
"""Response schema for audit log queries."""
logs: list[AuditLog]
total_count: int
page: int
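
This file's diff swaps `StrEnum` for the older `(str, Enum)` mixin. Both compare equal to their string values, but they differ in how `str()` renders a member, which matters wherever an action is written to a log or serialized by hand. The sketch below uses invented class names (`LegacyAction`, `ModernAction`) and assumes nothing about how the project serializes audit entries; the `StrEnum` branch only runs on Python 3.11 or later.

```python
import sys
from enum import Enum


class LegacyAction(str, Enum):
    """Mixin style, as used on one side of this diff."""
    LOGIN_SUCCESS = "auth.login.success"


action = LegacyAction.LOGIN_SUCCESS
same_as_str = action == "auth.login.success"  # members compare equal to their value
rendered = str(action)                        # Enum.__str__: class and member name
raw = action.value                            # the underlying string

if sys.version_info >= (3, 11):
    from enum import StrEnum

    class ModernAction(StrEnum):
        """StrEnum style, as used on the other side of this diff."""
        LOGIN_SUCCESS = "auth.login.success"

    # StrEnum uses str.__str__, so str() yields the raw value.
    modern_rendered = str(ModernAction.LOGIN_SUCCESS)
```

Code that needs the wire value regardless of which base class is in use can read `.value` explicitly rather than relying on `str()`.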


@ -1,5 +1,5 @@
from datetime import datetime
from typing import Annotated
from typing import Optional, Annotated
from bson import ObjectId
from pydantic import BaseModel, BeforeValidator
@ -17,12 +17,12 @@ PyObjectId = Annotated[str, BeforeValidator(validate_object_id)]
class Client(BaseModel):
id: str | None = None
id: Optional[str] = None
name: str
slug: str
is_active: bool = True
created_at: datetime | None = None
updated_at: datetime | None = None
created_at: Optional[datetime] = None
updated_at: Optional[datetime] = None
class ClientCreate(BaseModel):
@ -31,18 +31,18 @@ class ClientCreate(BaseModel):
class ClientUpdate(BaseModel):
name: str | None = None
slug: str | None = None
is_active: bool | None = None
name: Optional[str] = None
slug: Optional[str] = None
is_active: Optional[bool] = None
class Team(BaseModel):
id: str | None = None
id: Optional[str] = None
name: str
client_id: str
member_user_ids: list[str] = []
created_at: datetime | None = None
updated_at: datetime | None = None
created_at: Optional[datetime] = None
updated_at: Optional[datetime] = None
class TeamCreate(BaseModel):
@ -50,31 +50,22 @@ class TeamCreate(BaseModel):
class TeamUpdate(BaseModel):
name: str | None = None
name: Optional[str] = None
class Project(BaseModel):
id: str | None = None
id: Optional[str] = None
name: str
client_id: str
is_active: bool = True
default_languages: list[str] = []
default_linguist_id: str | None = None
default_reviewer_id: str | None = None
created_at: datetime | None = None
updated_at: datetime | None = None
created_at: Optional[datetime] = None
updated_at: Optional[datetime] = None
class ProjectCreate(BaseModel):
name: str
default_languages: list[str] = []
default_linguist_id: str | None = None
default_reviewer_id: str | None = None
class ProjectUpdate(BaseModel):
name: str | None = None
is_active: bool | None = None
default_languages: list[str] | None = None
default_linguist_id: str | None = None
default_reviewer_id: str | None = None
name: Optional[str] = None
is_active: Optional[bool] = None


@ -1,142 +0,0 @@
from __future__ import annotations

from datetime import datetime
from enum import StrEnum

from pydantic import BaseModel, Field


class GlossarySource(StrEnum):
    XLSX_UPLOAD = "xlsx_upload"
    FRAZE_API = "fraze_api"  # reserved for future FRAZE integration


class GlossaryStatus(StrEnum):
    ACTIVE = "active"
    ARCHIVED = "archived"


class EmbeddingStatus(StrEnum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"


class Glossary(BaseModel):
    id: str | None = Field(None, alias="_id")
    client_id: str
    name: str
    description: str | None = None
    source_locale: str  # BCP-47 source column, e.g. "en-GB"
    source: GlossarySource = GlossarySource.XLSX_UPLOAD
    status: GlossaryStatus = GlossaryStatus.ACTIVE
    current_version_id: str | None = None
    created_at: datetime = Field(default_factory=datetime.utcnow)
    created_by: str  # user_id

    model_config = {"populate_by_name": True, "arbitrary_types_allowed": True}


class GlossaryVersion(BaseModel):
    id: str | None = Field(None, alias="_id")
    glossary_id: str
    version_number: int
    source_xlsx_gcs_path: str | None = None  # GCS path to original file
    term_count: int = 0
    embedded_count: int = 0
    embedding_status: EmbeddingStatus = EmbeddingStatus.PENDING
    created_at: datetime = Field(default_factory=datetime.utcnow)
    created_by: str
    change_note: str | None = None

    model_config = {"populate_by_name": True}


class GlossaryTerm(BaseModel):
    """One source term with its per-locale translations."""

    id: str | None = Field(None, alias="_id")
    glossary_id: str
    version_id: str
    cid: str | None = None  # 3M Content ID from xlsx
    tid: str | None = None  # 3M Term ID from xlsx
    source_term: str  # canonical source text (whitespace-normalised)
    source_term_lower: str  # lowercase for case-insensitive index
    translations: dict[str, str] = {}  # {locale_code: translated_text}
    embedding: list[float] | None = None  # 768-dim Gemini embedding

    model_config = {"populate_by_name": True}


# ── Schema models (API request/response) ──────────────────────────────────────


class GlossaryCreate(BaseModel):
    name: str
    description: str | None = None
    source_locale: str
    change_note: str | None = None


class GlossaryVersionCreate(BaseModel):
    source_locale: str
    change_note: str | None = None


class GlossaryResponse(BaseModel):
    id: str
    client_id: str
    name: str
    description: str | None = None
    source_locale: str
    source: GlossarySource
    status: GlossaryStatus
    current_version_id: str | None = None
    current_version_embedding_status: EmbeddingStatus | None = None
    current_version_embedded_count: int | None = None
    current_version_term_count: int | None = None
    created_at: datetime
    created_by: str


class GlossaryVersionResponse(BaseModel):
    id: str
    glossary_id: str
    version_number: int
    term_count: int
    embedded_count: int
    embedding_status: EmbeddingStatus
    created_at: datetime
    created_by: str
    change_note: str | None = None


class GlossaryDetailResponse(GlossaryResponse):
    versions: list[GlossaryVersionResponse] = []


class GlossaryTermPreview(BaseModel):
    """Subset of GlossaryTerm for UI previews."""

    source_term: str
    translations: dict[str, str]


class MatchedTerm(BaseModel):
    """A term matched against VTT source text, with the target-locale translation."""

    source_term: str
    target_translation: str
    match_kind: str  # "exact" | "vector"
    score: float  # 1.0 for exact, cosine similarity for vector


def glossary_from_doc(doc: dict) -> Glossary:
    doc = dict(doc)
    if "_id" in doc:
        doc["_id"] = str(doc["_id"])
    return Glossary.model_validate(doc)


def glossary_version_from_doc(doc: dict) -> GlossaryVersion:
    doc = dict(doc)
    if "_id" in doc:
        doc["_id"] = str(doc["_id"])
    return GlossaryVersion.model_validate(doc)


@ -1,4 +1,5 @@
from datetime import datetime
from typing import Optional
from pydantic import BaseModel, EmailStr
@ -6,7 +7,7 @@ from .organization import OrgRole
class Invitation(BaseModel):
id: str | None = None
id: Optional[str] = None
email: str
organization_id: str
role_in_org: OrgRole
@ -14,9 +15,9 @@ class Invitation(BaseModel):
token_hash: str
invited_by_user_id: str
expires_at: datetime
accepted_at: datetime | None = None
revoked_at: datetime | None = None
created_at: datetime | None = None
accepted_at: Optional[datetime] = None
revoked_at: Optional[datetime] = None
created_at: Optional[datetime] = None
class InvitationCreate(BaseModel):
@ -39,9 +40,9 @@ class InvitationPreviewResponse(BaseModel):
class InvitationAcceptRequest(BaseModel):
token: str
full_name: str | None = None
password: str | None = None
ms_id_token: str | None = None
full_name: Optional[str] = None
password: Optional[str] = None
ms_id_token: Optional[str] = None
class InvitationResponse(BaseModel):
@ -51,9 +52,9 @@ class InvitationResponse(BaseModel):
role_in_org: OrgRole
invited_by_user_id: str
expires_at: datetime
accepted_at: datetime | None = None
revoked_at: datetime | None = None
created_at: datetime | None = None
accepted_at: Optional[datetime] = None
revoked_at: Optional[datetime] = None
created_at: Optional[datetime] = None
is_expired: bool = False
is_accepted: bool = False
is_revoked: bool = False

Some files were not shown because too many files have changed in this diff.