2025-08-24 16:28:33 -05:00

# Development Plan for the Accessible Video App
**(Gemini 2.5 Pro • Python/FastAPI backend • React + Vite SPA • MongoDB • GCS)**
This is a full, hand-off-ready plan tailored for a **React + Vite single-page app (SPA)** that talks to a **FastAPI** backend. It retains the flow you defined (Ingestion → QC → Translation/MP3 → Final Delivery) and details architecture, schemas, prompts, API contracts, pipelines, UI specs, CI/CD, and acceptance criteria—rewritten for a Vite SPA rather than Next.js.
---
## 1) Executive Summary
- **Goal:** Generate accessible assets from customer MP4 videos, with optional **translations** and **transcreation** per language:
  1) **Closed Captions (VTT)**
  2) **Audio Description text (VTT)**
  3) **Audio Description voiceover (MP3)**
- **Foundation Models & Services**
  - **Gemini 2.5 Pro** for structured extraction, AD/CC generation, and transcreation
  - **Google Cloud Translate** for standard translation
  - **ElevenLabs or Google Cloud TTS** for voiceover MP3
- **Tech Stack**
  - **Frontend:** React 18 + **Vite** (TypeScript), React Router v6+, TanStack Query, Zod, Axios/Fetch
  - **Backend:** FastAPI (Python 3.11+), Celery workers, Redis, MongoDB Atlas, GCS, SendGrid
  - **Observability:** OpenTelemetry, Sentry, Prometheus metrics
  - **Auth:** JWT (access in memory; refresh in HttpOnly cookie), RBAC (client/reviewer/admin)
  - **Infra:** Docker, GitHub Actions, Cloud Run (API & workers), Cloud Storage + Cloud CDN (SPA)
- **Key State Machine:**
  `created → ingesting → ai_processing → pending_qc → approved_english → translating → tts_generating → pending_final_review → completed`, with `rejected` as the alternative outcome of `pending_qc` (the reviewer rejects with notes).
---
## 2) Flowchart → System Mapping
| Flow Node | SPA / Backend Mapping |
|---|---|
| A. Client uploads MP4 via Frontend | SPA `/jobs/new` → `POST /jobs` (multipart or signed-upload) → GCS + Mongo |
| B. Backend API Endpoint | FastAPI `/jobs` |
| C. Worker 1: Ingestion & AI | Celery `tasks.ingest_and_ai(job_id)` |
| D. MongoDB jobs collection | `jobs` collection (schema below) |
| E. GCS Bucket | `gs://accessible-video/{jobId}/...` |
| F. Build optimized prompt | `prompts/gemini_ingestion.md` |
| G. Call Gemini 2.5 Pro | `services.gemini.extract_accessibility()` |
| H/I. Parse JSON → Persist | Pydantic validation → write VTT files + DB fields |
| D_status_1 Pending_QC | `jobs.status="pending_qc"` |
| J. Admin UI Pending QC | SPA `/admin/qc` |
| K-N. Approve/Reject | SPA calls `POST /jobs/:id/actions/approve_english` or `POST /jobs/:id/actions/reject` → DB status updates |
| O. DB change stream Listener | Triggers worker fan-out, translation/tts |
| P. Worker 2: Translation & MP3 | `tasks.translate_and_synthesize(job_id)` |
| Q. For each Language | Per-language subtasks |
| R/S/T. Translate/Transcreate/TTS | Translate API / Gemini / TTS provider |
| D_status_2 Pending_Final_Review | `jobs.status="pending_final_review"` |
| U-V. Final Review & Complete | SPA `/admin/final` → `POST /jobs/:id/actions/complete` |
| W-X. Notify Client | `tasks.notify_client(job_id)` + SendGrid |
| Y. Client Receives Files | Email with signed URLs + SPA downloads page |
---
## 3) Monorepo Structure
```
accessible-video/
  backend/
    app/
      api/
        v1/
          routes_jobs.py
          routes_auth.py
          routes_files.py
          routes_admin.py
      core/
        config.py
        security.py
        logging.py
        dependencies.py
      lib/
        vtt.py
      models/
        job.py
        user.py
        file.py
        audit_log.py
      schemas/
        job.py
        user.py
        file.py
        auth.py
      services/
        gcs.py
        gemini.py
        translate.py
        tts.py
        emailer.py
        signed_urls.py
      tasks/
        __init__.py
        ingest_and_ai.py
        translate_and_synthesize.py
        notify.py
        watchers.py
      prompts/
        gemini_ingestion.md
        gemini_transcreation.md
      telemetry/
        tracing.py
        metrics.py
    tests/
      unit/
      integration/
      e2e/
    Dockerfile
    pyproject.toml
    poetry.lock
    celery_worker.py
    gunicorn_conf.py
  frontend/
    src/
      App.tsx
      main.tsx
      routes/
        index.tsx
        jobs/
          NewJob.tsx
          JobDetail.tsx
        admin/
          QCList.tsx
          QCDetail.tsx
          FinalList.tsx
          FinalDetail.tsx
      components/
        UploadDropzone.tsx
        VttEditor/
          VttEditor.tsx
          utils.ts
        VideoWithCaptions.tsx
        StatusBadge.tsx
        OutputsTable.tsx
        ReviewerNotes.tsx
        Auth/
          LoginForm.tsx
          RequireAuth.tsx
          RoleGate.tsx
      lib/
        api.ts
        auth.ts
        queryClient.ts
        store.ts
        vtt.ts
      styles/
        index.css
      types/
        api.ts
      hooks/
        useAuth.ts
        useJob.ts
    public/
      favicon.svg
    index.html
    tsconfig.json
    vite.config.ts
    package.json
    .env.example
  infra/
    cloud-cdn/
      spa-rewrite-config.md
    k8s/ or cloud-run/
      service.yaml
  .github/
    workflows/
      ci.yml
      cd-api.yml
      cd-frontend.yml
  Makefile
  README.md
```
---
## 4) Environment & Secrets
**Backend `.env.example`**
```
APP_ENV=dev
API_BASE_URL=https://api.yourdomain.com
# Auth
JWT_SECRET=change_me
JWT_ALG=HS256
JWT_ACCESS_TTL_MIN=15
JWT_REFRESH_TTL_DAYS=7
COOKIE_DOMAIN=yourdomain.com
COOKIE_SECURE=true
COOKIE_SAMESITE=Lax
# MongoDB
MONGODB_URI=mongodb+srv://...
MONGODB_DB=accessible_video
# Redis
REDIS_URL=redis://...
# GCP
GCP_PROJECT_ID=...
GCS_BUCKET=accessible-video
GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcp.json
# AI
GEMINI_API_KEY=...
TRANSLATE_API_KEY=...
ELEVENLABS_API_KEY=...
GOOGLE_TTS_CREDENTIALS=/secrets/gcp_tts.json
# Email
SENDGRID_API_KEY=...
EMAIL_FROM=support@yourdomain.com
CLIENT_BASE_URL=https://app.yourdomain.com
```
**Frontend `.env.example`**
```
VITE_API_BASE_URL=https://api.yourdomain.com
VITE_SENTRY_DSN=
VITE_APP_ENV=dev
```
**CORS & Cookies**
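- Allow the SPA origin (`CLIENT_BASE_URL`) explicitly in the API's CORS config, with credentials enabled. Browsers reject a wildcard origin on credentialed requests, and the refresh cookie is only sent on credentialed requests.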
- **Refresh token**: set by API via **HttpOnly, SameSite=Lax** cookie.
- **Access token**: returned in body; store **in memory** (not localStorage) to minimize XSS risk.
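The cookie rules above can be made concrete with a small sketch. In FastAPI the cookie would normally be set with `Response.set_cookie(..., httponly=True, secure=True, samesite="lax")`. The version below uses the standard library's `http.cookies` so that it runs on its own and shows exactly which attributes the refresh cookie carries. The cookie name `refresh_token`, the `/auth` path scope, and the helper name are illustrative assumptions, not part of the plan.

```python
from http.cookies import SimpleCookie


def build_refresh_cookie(token: str, domain: str, ttl_days: int = 7,
                         secure: bool = True) -> str:
    """Return a Set-Cookie header value for the refresh token (hypothetical helper).

    Attributes mirror the backend .env settings: COOKIE_DOMAIN, COOKIE_SECURE,
    COOKIE_SAMESITE=Lax and JWT_REFRESH_TTL_DAYS.
    """
    cookie = SimpleCookie()
    cookie["refresh_token"] = token              # hypothetical cookie name
    morsel = cookie["refresh_token"]
    morsel["domain"] = domain
    morsel["path"] = "/auth"                     # assumption: only sent to /auth/* endpoints
    morsel["max-age"] = ttl_days * 24 * 60 * 60  # seconds
    morsel["httponly"] = True                    # not readable from JavaScript
    morsel["secure"] = secure                    # HTTPS only; disable for local dev
    morsel["samesite"] = "Lax"
    return morsel.OutputString()
```

Scoping the path to `/auth` is optional. It keeps the refresh token off every other API request, which already authenticates with the in-memory bearer token.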
---
## 5) Domain Model (MongoDB)
### 5.1 `jobs` collection
```json
{
  "_id": {"$oid": "..."},
  "client_id": {"$oid": "..."},
  "title": "Acme Explainer",
  "source": {
    "filename": "acme.mp4",
    "gcs_uri": "gs://accessible-video/64f.../source.mp4",
    "duration_s": 123.4,
    "language": "en"
  },
  "requested_outputs": {
    "captions_vtt": true,
    "audio_description_vtt": true,
    "audio_description_mp3": true,
    "languages": ["es", "fr"],
    "transcreation": ["es"]
  },
  "status": "pending_qc",
  "review": {
    "notes": "",
    "reviewer_id": {"$oid": "..."},
    "history": [
      {"at": {"$date": "..."}, "status": "pending_qc", "by": "system"},
      {"at": {"$date": "..."}, "status": "approved_english", "by": {"$oid": "..."}, "notes": "Looks good"}
    ]
  },
  "outputs": {
    "en": {
      "captions_vtt_gcs": "gs://.../en/captions.vtt",
      "ad_vtt_gcs": "gs://.../en/ad.vtt",
      "ad_mp3_gcs": "gs://.../en/ad.mp3"
    },
    "es": {
      "captions_vtt_gcs": "gs://.../es/captions.vtt",
      "ad_vtt_gcs": "gs://.../es/ad.vtt",
      "ad_mp3_gcs": "gs://.../es/ad.mp3",
      "origin": "translate|transcreate",
      "qa_notes": ""
    }
  },
  "ai": {
    "ingestion_json": {},
    "confidence": 0.92
  },
  "error": null,
  "created_at": {"$date": "..."},
  "updated_at": {"$date": "..."}
}
```
**Status enum:**
`created → ingesting → ai_processing → pending_qc → approved_english → translating → tts_generating → pending_final_review → completed`, with `rejected` as the alternative outcome of `pending_qc` (the reviewer rejects with notes).
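The status enum can be enforced as an explicit transition table, so that API actions and workers cannot skip or reverse steps. The sketch below assumes `rejected` is terminal, because the plan does not yet define a resubmission path. In MongoDB the move itself would be a conditional update such as `update_one({"_id": job_id, "status": current}, {"$set": {"status": new}})`, checking `modified_count == 1`, so that two reviewers cannot both act on the same job.

```python
from typing import Dict, FrozenSet

# Allowed next states for jobs.status, following the status enum above.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "created": frozenset({"ingesting"}),
    "ingesting": frozenset({"ai_processing"}),
    "ai_processing": frozenset({"pending_qc"}),
    "pending_qc": frozenset({"approved_english", "rejected"}),
    "approved_english": frozenset({"translating"}),
    "rejected": frozenset(),  # assumption: terminal until a resubmission flow exists
    "translating": frozenset({"tts_generating"}),
    "tts_generating": frozenset({"pending_final_review"}),
    "pending_final_review": frozenset({"completed"}),
    "completed": frozenset(),
}


class InvalidTransition(ValueError):
    """Raised when an action would move a job to a state it cannot reach."""


def check_transition(current: str, new: str) -> None:
    """Raise InvalidTransition unless `new` is a legal next state of `current`."""
    if new not in TRANSITIONS.get(current, frozenset()):
        raise InvalidTransition(f"cannot move job from {current!r} to {new!r}")
```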
**Indexes**
- `jobs`: `{ status: 1, created_at: -1 }`, `{ client_id: 1 }`
- `users`: `{ email: 1 } unique`
- `audit_logs`: `{ job_id: 1, when: -1 }`
---
## 6) Storage Layout (GCS)
```
gs://accessible-video/{jobId}/
source.mp4
en/
captions.vtt
ad.vtt
ad.mp3
{lang}/
captions.vtt
ad.vtt
ad.mp3
```
- Serve downloads via **signed URLs** (24h).
- Content-Types: `text/vtt`, `audio/mpeg`, `video/mp4`.
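A small helper keeps object names and content types consistent between the workers that write files and the code that signs download URLs. This is a sketch assuming hypothetical asset keys (`captions`, `ad_vtt`, `ad_mp3`). With the `google-cloud-storage` client, the content type would be passed on upload, e.g. `blob.upload_from_string(data, content_type=...)`.

```python
from typing import Dict, Literal, Optional

# Content types listed in the storage section.
CONTENT_TYPES: Dict[str, str] = {
    ".mp4": "video/mp4",
    ".vtt": "text/vtt",
    ".mp3": "audio/mpeg",
}

# Hypothetical asset keys mapped to the per-language file names in the layout.
Asset = Literal["source", "captions", "ad_vtt", "ad_mp3"]
_LANG_FILES: Dict[str, str] = {
    "captions": "captions.vtt",
    "ad_vtt": "ad.vtt",
    "ad_mp3": "ad.mp3",
}


def object_name(job_id: str, asset: Asset, lang: Optional[str] = None) -> str:
    """Object path inside the bucket, e.g. '{jobId}/es/ad.vtt'."""
    if asset == "source":
        return f"{job_id}/source.mp4"
    if not lang:
        raise ValueError(f"asset {asset!r} needs a language code")
    return f"{job_id}/{lang}/{_LANG_FILES[asset]}"


def gcs_uri(bucket: str, name: str) -> str:
    """Full gs:// URI as stored in the jobs document."""
    return f"gs://{bucket}/{name}"


def content_type_for(name: str) -> str:
    """Content-Type to set on upload so browsers and <track> elements handle the file."""
    for ext, ctype in CONTENT_TYPES.items():
        if name.endswith(ext):
            return ctype
    raise ValueError(f"unexpected object type: {name}")
```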
---
## 7) API Design (FastAPI)
### 7.1 Auth & RBAC
- `POST /auth/login` → returns `{ access }` and sets **refresh** cookie (HttpOnly).
- `POST /auth/refresh` → rotates the refresh cookie (issues a new refresh token) and returns a new `{ access }`.
- `POST /auth/logout` → clears refresh cookie.
- Roles: `client | reviewer | admin`.
### 7.2 Jobs (main)
- `POST /jobs` (multipart OR signed-upload flow)
  - Body: `file`, `title`, `language`, `requested_outputs`, `languages[]`, `transcreation[]`
  - Creates DB record (`status=created`), stores file in GCS, enqueues `ingest_and_ai`.
- `GET /jobs?status=&mine=`
- `GET /jobs/{id}`
- `PATCH /jobs/{id}` (reviewer/admin): update `status`, `review.notes`, VTT text fields
- `POST /jobs/{id}/actions/approve_english`
- `POST /jobs/{id}/actions/reject` (notes required)
- `POST /jobs/{id}/actions/complete`
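Before `POST /jobs` stores anything, the request can be checked for combinations the pipelines cannot handle. The sketch below uses a plain dataclass to stay dependency-free; in the backend this logic would sit in the Pydantic schema. Two rules are assumptions, not stated in the plan: every transcreation language must also appear in `languages[]`, and target languages must differ from the source language. The language-code check is a loose BCP-47 shape test, not a full validator.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Loose BCP-47 shape check ("en", "es", "pt-BR", "zh-Hant"); not a full validator.
_LANG_RE = re.compile(r"^[a-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


@dataclass
class CreateJobRequest:
    title: str
    language: str = "en"
    captions_vtt: bool = True
    audio_description_vtt: bool = True
    audio_description_mp3: bool = True
    languages: List[str] = field(default_factory=list)
    transcreation: List[str] = field(default_factory=list)


def validate_create_job(req: CreateJobRequest) -> List[str]:
    """Return human-readable problems; an empty list means the request is valid."""
    errors: List[str] = []
    if not req.title.strip():
        errors.append("title is required")
    if not (req.captions_vtt or req.audio_description_vtt or req.audio_description_mp3):
        errors.append("select at least one output")
    for code in [req.language, *req.languages]:
        if not _LANG_RE.match(code):
            errors.append(f"invalid language code: {code!r}")
    if req.language in req.languages:
        errors.append("target languages must differ from the source language")
    # Assumption: transcreation upgrades a language that is already requested.
    extra = sorted(set(req.transcreation) - set(req.languages))
    if extra:
        errors.append(f"transcreation languages not in languages[]: {extra}")
    return errors
```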
### 7.3 Files (optional signed upload optimization)
- `POST /files/signed-upload` → `{ url, fields }` (for direct browser → GCS)
- `POST /jobs/{id}/files` → additional assets
- `GET /jobs/{id}/downloads` → map of signed URLs
### 7.4 OpenAPI seed
```yaml
openapi: 3.0.3
info:
  title: Accessible Video API
  version: 1.0.0
paths:
  /auth/login:
    post:
      summary: Login
      responses:
        '200': { description: "Access token in body; refresh token set as HttpOnly cookie" }
  /auth/refresh:
    post:
      summary: Refresh access token
      responses:
        '200': { description: "New access token" }
  /jobs:
    post:
      summary: Create a job from an MP4
      requestBody:
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file: { type: string, format: binary }
                title: { type: string }
                language: { type: string, example: "en" }
                requested_outputs:
                  type: object
                  properties:
                    captions_vtt: { type: boolean }
                    audio_description_vtt: { type: boolean }
                    audio_description_mp3: { type: boolean }
                languages:
                  type: array
                  items: { type: string }
                transcreation:
                  type: array
                  items: { type: string }
      responses:
        '201': { description: Created }
```
---
## 8) Background Workers & Pipelines (Celery)
**Broker/Backend:** Redis (broker), Mongo or Redis for results
**Queues:** `ingest`, `translate`, `tts`, `notify`
**Idempotency:** `tasks_run[job_id][task_name]=hash(inputs)` to avoid duplication
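The idempotency rule above can be sketched as follows. The sketch hashes a canonical JSON encoding of the task inputs and records it per job and task. The in-memory `TaskLedger` is a stand-in for the real `tasks_run` record. In production the check-and-record step should be atomic, for example a single MongoDB `update_one` that filters on the previous hash, or Redis `SET key value NX`. The hash should be recorded only after the task succeeds, so that failed runs can retry.

```python
import hashlib
import json
from typing import Any, Dict


def inputs_hash(payload: Dict[str, Any]) -> str:
    """Stable SHA-256 over task inputs; keys are sorted so dict order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TaskLedger:
    """In-memory stand-in for the tasks_run[job_id][task_name] record (hypothetical)."""

    def __init__(self) -> None:
        self._runs: Dict[str, Dict[str, str]] = {}

    def should_run(self, job_id: str, task_name: str, payload: Dict[str, Any]) -> bool:
        """True unless this exact input set already completed for this job and task."""
        return self._runs.get(job_id, {}).get(task_name) != inputs_hash(payload)

    def mark_done(self, job_id: str, task_name: str, payload: Dict[str, Any]) -> None:
        """Record a successful run; call only after the task's outputs are persisted."""
        self._runs.setdefault(job_id, {})[task_name] = inputs_hash(payload)
```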
### 8.1 Pipeline 1 — Ingestion & AI (`tasks.ingest_and_ai(job_id)`)
1. `status="ingesting"`
2. Probe video (ffprobe): duration, codec; update `source.duration_s`
3. `status="ai_processing"`; call **Gemini 2.5 Pro**:
   - Preferred: pass audio/video to Gemini; ask for JSON containing:
     - `language`, `confidence`, `summary`, `transcript_plaintext`, `captions_vtt`, `audio_description_vtt`
   - Fallback: STT → Gemini transforms the transcript into captions VTT + AD VTT
4. Validate with Pydantic; if invalid → **self-heal** prompt; retry/backoff
5. Write `en/captions.vtt` & `en/ad.vtt` to GCS
6. Update `jobs.outputs.en.*` and `ai.*`
7. `status="pending_qc"`
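Step 4 (validate, then self-heal) can be sketched as a parse-and-check function plus a bounded re-ask loop. The plan uses Pydantic for validation; this version uses `json` and explicit checks so that it runs standalone. It enforces the contract from the ingestion prompt: required keys, confidence in [0, 1], and VTT strings starting with `WEBVTT`. `call_model` is a hypothetical stand-in for `services.gemini.extract_accessibility()`. Backoff between calls is left to the task's retry policy.

```python
import json
from typing import Any, Callable, Dict

REQUIRED_KEYS = ("language", "confidence", "summary", "transcript_plaintext",
                 "captions_vtt", "audio_description_vtt")


class IngestionOutputError(ValueError):
    """The model reply does not satisfy the ingestion prompt's JSON contract."""


def parse_ingestion(raw: str) -> Dict[str, Any]:
    """Parse the model reply and check it against the ingestion prompt's contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise IngestionOutputError(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise IngestionOutputError("top level must be a JSON object")
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise IngestionOutputError(f"missing keys: {missing}")
    conf = data["confidence"]
    if not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        raise IngestionOutputError("confidence must be a number in [0, 1]")
    for key in ("captions_vtt", "audio_description_vtt"):
        if not str(data[key]).startswith("WEBVTT"):
            raise IngestionOutputError(f"{key} must start with WEBVTT")
    return data


def extract_with_self_heal(call_model: Callable[[str], str], prompt: str,
                           reask_prompt: str, max_reasks: int = 2) -> Dict[str, Any]:
    """Call the model once, then re-ask up to `max_reasks` times while output is invalid."""
    raw = call_model(prompt)
    for attempt in range(max_reasks + 1):
        try:
            return parse_ingestion(raw)
        except IngestionOutputError:
            if attempt == max_reasks:
                raise  # terminal: caller persists job.error and captures the bad reply
            raw = call_model(reask_prompt)
    raise AssertionError("unreachable")
```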
### 8.2 Pipeline 2 — Translation & MP3 (`tasks.translate_and_synthesize(job_id)`)
Triggered by `approved_english`.
1. `status="translating"`
2. For each language `L`:
   - **Standard**: Google Translate → rebuild VTT with the same timestamps
   - **Transcreation** (languages listed in `transcreation[]`): Gemini 2.5 Pro using the brand/audience brief; preserve timings
   - Save `{L}/captions.vtt`, `{L}/ad.vtt`
3. `status="tts_generating"`
4. For each `L` where an MP3 was requested:
   - Synthesize TTS per AD cue; if the provider has no timed SSML, place each clip at its cue start and stitch with pydub/ffmpeg (short crossfades)
   - Save `{L}/ad.mp3`
5. `status="pending_final_review"`
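"Rebuild VTT with the same timestamps" amounts to: parse cues, send only cue text to the translator, and write the cues back with their original ids, timings, and settings. The sketch below assumes a hypothetical `translate` callable that takes and returns a list of strings, matching how the Translate API accepts several strings per request. The parser is deliberately minimal: it expects `HH:MM:SS.mmm` timings, as the ingestion prompt requires, and skips NOTE/STYLE/REGION blocks.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

TIMING_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})(.*)$")


@dataclass
class Cue:
    identifier: str  # optional cue id line ("" if absent)
    start: str
    end: str
    settings: str    # anything after the end timestamp, e.g. " line:0"
    text: str


def parse_vtt(vtt: str) -> List[Cue]:
    """Minimal WebVTT cue parser (not a full implementation of the spec)."""
    cues: List[Cue] = []
    blocks = re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip())
    for block in blocks[1:]:  # blocks[0] is the WEBVTT header
        lines = block.split("\n")
        ident = ""
        if not TIMING_RE.match(lines[0]):
            if len(lines) < 2 or not TIMING_RE.match(lines[1]):
                continue  # NOTE, STYLE, REGION or junk
            ident, lines = lines[0], lines[1:]
        m = TIMING_RE.match(lines[0])
        cues.append(Cue(ident, m.group(1), m.group(2), m.group(3), "\n".join(lines[1:])))
    return cues


def build_vtt(cues: List[Cue]) -> str:
    """Serialize cues back to WebVTT with a blank line after each cue."""
    out = ["WEBVTT", ""]
    for c in cues:
        if c.identifier:
            out.append(c.identifier)
        out.append(f"{c.start} --> {c.end}{c.settings}")
        out.extend([c.text, ""])
    return "\n".join(out)


def translate_vtt(vtt: str, translate: Callable[[List[str]], List[str]]) -> str:
    """Replace only cue text; ids, timestamps and settings are copied unchanged."""
    cues = parse_vtt(vtt)
    translated = translate([c.text for c in cues])
    if len(translated) != len(cues):
        raise ValueError("translator must return one string per cue")
    for cue, text in zip(cues, translated):
        cue.text = text
    return build_vtt(cues)
```

Translating cue by cue keeps timings exact, but it can split a sentence across cues. That is one reason the plan routes higher-touch languages through transcreation.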
### 8.3 Pipeline 3 — Notification (`tasks.notify_client(job_id)`)
Triggered by `completed`.
1. Compile signed URLs for each output
2. Send email via SendGrid (HTML template)
3. Append to `audit_logs`
### 8.4 DB Change Watcher
- Mongo change streams on `jobs.status`
- On `approved_english` → enqueue translation/tts
- On `completed` → enqueue notify
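The watcher's routing can be kept as a pure function over change-stream events, which makes it easy to unit-test without a database. The sketch assumes status changes arrive as `update` events with the new value under `updateDescription.updatedFields.status`. The aggregation pipeline would be passed to PyMongo's `Collection.watch(pipeline)`, and the returned task name to Celery's `send_task(name, args=[job_id])`. Persisting the stream's resume token and passing it back via `resume_after=` lets the watcher pick up after a restart.

```python
from typing import Any, Dict, List, Optional, Tuple

# Status transitions that fan out to background work, per the watcher rules above.
STATUS_TO_TASK: Dict[str, str] = {
    "approved_english": "tasks.translate_and_synthesize",
    "completed": "tasks.notify_client",
}

# Server-side filter so the watcher only receives the updates it acts on.
CHANGE_STREAM_PIPELINE: List[Dict[str, Any]] = [
    {"$match": {
        "operationType": "update",
        "updateDescription.updatedFields.status": {"$in": list(STATUS_TO_TASK)},
    }}
]


def task_for_change(event: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Return (task name, job id) for a change event on `jobs`, or None to ignore it."""
    if event.get("operationType") != "update":
        return None
    fields = event.get("updateDescription", {}).get("updatedFields", {})
    task = STATUS_TO_TASK.get(fields.get("status"))
    if task is None:
        return None
    return task, str(event["documentKey"]["_id"])
```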
### 8.5 Retries & Error Handling
- Exponential backoff; max 5 attempts
- Persist `job.error` on terminal failure
- Sentry capture + alerting
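The retry policy reduces to a capped exponential schedule. Celery can apply this through task options (`autoretry_for=(...)`, `retry_backoff=True`, `retry_backoff_max=...`, `retry_jitter=True`, `max_retries=...`). Celery's `max_retries` counts retries after the first run, so "max 5 attempts" corresponds to `max_retries=4`. The standalone sketch below computes the same kind of schedule. The 2-second base and 600-second cap are assumptions.

```python
import random
from typing import List, Optional

MAX_ATTEMPTS = 5  # total attempts, including the first run


def backoff_delay(retry_number: int, base: float = 2.0, cap: float = 600.0,
                  jitter: Optional[random.Random] = None) -> float:
    """Delay in seconds before retry `retry_number` (1-based): base * 2**(n-1), capped.

    With `jitter`, a uniform random delay in [0, computed] is returned ("full jitter"),
    which spreads out retries from many workers hitting the same rate-limited API.
    """
    delay = min(cap, base * 2 ** (retry_number - 1))
    return jitter.uniform(0, delay) if jitter else delay


def retry_schedule(max_attempts: int = MAX_ATTEMPTS, **kw) -> List[float]:
    """Delays between consecutive attempts (max_attempts - 1 of them)."""
    return [backoff_delay(n, **kw) for n in range(1, max_attempts)]
```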
---
## 9) Prompts (Gemini 2.5 Pro)
**`backend/app/prompts/gemini_ingestion.md`**
```
SYSTEM:
You are an expert accessibility writer for film/TV and e-learning. Produce STRICT JSON only.
USER:
You are given a video. Return a JSON object with:
- language: BCP-47 code (e.g., "en")
- confidence: 0..1
- summary: 1-2 sentence synopsis
- transcript_plaintext: full spoken words, punctuated
- captions_vtt: a valid WebVTT file as a single string, with accurate timings and no styling
- audio_description_vtt: a valid WebVTT file as a single string, describing key visual elements (no spoilers), synchronized with the program
Constraints:
- Output MUST be valid JSON. Do not include markdown fences.
- Use short, clear AD phrases. Do not duplicate spoken dialogue.
- WebVTT must start with "WEBVTT" and use HH:MM:SS.mmm timestamps.
Return ONLY the JSON.
```
**Self-heal re-ask**
```
SYSTEM: Return STRICT JSON. If you cannot, say "REASK" as plain text.
USER:
The previous output was not valid JSON. Return the same object again, ensuring it parses.
```
**`backend/app/prompts/gemini_transcreation.md`**
```
SYSTEM:
You are a culturally-savvy accessibility writer.
USER:
Rewrite the following English captions and audio descriptions into {TARGET_LANGUAGE}, preserving:
- meaning, tone, and accessibility intent,
- timing boundaries (same cue timestamps),
- line lengths friendly for readability (~32-40 chars per line).
Input:
- captions_vtt_en: <VTT text>
- ad_vtt_en: <VTT text>
- brief: <brand + audience notes>
Output:
JSON:
{
"captions_vtt": "<VTT in {TARGET_LANGUAGE}>",
"audio_description_vtt": "<VTT in {TARGET_LANGUAGE}>"
}
```
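Prompt files are loaded from `prompts/*.md` and filled per request. The transcreation template contains literal JSON braces, so `str.format` would treat them as fields and fail. The sketch below substitutes only `{UPPER_CASE}` placeholders such as `{TARGET_LANGUAGE}` and raises if one is left unfilled. The assumption is that the VTT inputs and brief are attached to the request separately rather than templated in. The default prompts directory is also an assumption.

```python
import re
from pathlib import Path
from typing import Mapping

# Matches only ALL-CAPS placeholders like {TARGET_LANGUAGE}; JSON braces are left alone.
_PLACEHOLDER = re.compile(r"\{([A-Z][A-Z0-9_]*)\}")


def render_prompt(template: str, values: Mapping[str, str]) -> str:
    """Fill {UPPER_CASE} placeholders; raise KeyError if one has no value."""
    def sub(m: re.Match) -> str:
        key = m.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {{{key}}}")
        return values[key]
    return _PLACEHOLDER.sub(sub, template)


def load_prompt(name: str, prompts_dir: Path = Path("backend/app/prompts")) -> str:
    """Read a prompt template such as 'gemini_transcreation' (path is an assumption)."""
    return (prompts_dir / f"{name}.md").read_text(encoding="utf-8")
```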
---
## 10) Frontend (React + Vite SPA, TypeScript)
### 10.1 Tech choices
- **Routing:** React Router v6+ (nested routes; protected routes)
- **Data fetching:** TanStack Query (cache, retries, mutation)
- **Validation:** Zod (forms validated through React Hook Form's `zodResolver` from `@hookform/resolvers`)
- **HTTP:** Axios or Fetch with interceptors for auth
- **State:** Minimal global UI state via Zustand (optional); server-state via React Query
- **Styling:** Tailwind or CSS Modules; Radix UI or headless components (optional)
- **Accessibility:** ARIA, keyboard navigation; caption preview
### 10.2 Auth strategy (SPA-friendly)
- **Login flow:**
  - SPA `POST /auth/login` (email, password)
  - API sets **refresh token cookie** (HttpOnly, SameSite=Lax); response body returns **access token**
  - SPA stores access token **in memory** (React state or module variable)
- **Auto-refresh:**
  - Axios response interceptor catches a 401, calls `POST /auth/refresh` with `withCredentials: true` so the refresh cookie is sent, then retries the original request once
  - On failure, redirect to `/login`
- **Route protection:**
  - `RequireAuth` wrapper checks auth; because the in-memory token is lost on page reload, it first attempts a silent `/auth/refresh` and shows a loader meanwhile
  - `RoleGate` component enforces roles for admin/reviewer routes
### 10.3 Uploads
- **Option A (simple):** `POST /jobs` multipart → API streams to GCS
- **Option B (optimized):** SPA requests `POST /files/signed-upload`, then uploads directly to GCS with form-data (tus optional), finally calls `/jobs` with file metadata
- Display progress (XHR progress or tus events)
### 10.4 SPA Routes
```
/  (landing/dashboard with recent jobs)
/login
/jobs/new
/jobs/:id
/admin/qc
/admin/qc/:id
/admin/final
/admin/final/:id
```
### 10.5 Key Components & Pages
- **UploadDropzone.tsx**: drag-and-drop, validation, progress, cancel
- **JobForm.tsx**: title, base language, checkboxes for outputs, target languages, transcreation selection
- **JobDetail.tsx**: status timeline, video player w/ captions, audio player, downloads
- **VttEditor/**: cue list view, timestamp validation, diff mode, save/undo
- **QCList/QCDetail**: reviewer worklist and editor with Approve/Reject actions
- **FinalList/FinalDetail**: final check and Complete action
- **StatusBadge/OutputsTable**: present status and files per language
- **VideoWithCaptions.tsx**: HTML5 video + track element; swap captions by language
- **ReviewerNotes.tsx**: markdown textarea with autosave
### 10.6 Data Layer
- `lib/api.ts`: Axios instance (baseURL = `VITE_API_BASE_URL`), auth interceptors
- `lib/queryClient.ts`: create QueryClient with sensible defaults
- React Query hooks:
  - `useJobs(filters)`
  - `useJob(jobId)`
  - `useCreateJob()`, `useUpdateJob()`
  - `useApproveEnglish(jobId)`, `useRejectJob(jobId)`, `useCompleteJob(jobId)`
- Re-fetch policies: on window focus and network reconnect
### 10.7 Validation & Types
- `types/api.ts` TypeScript interfaces matching backend schemas
- Zod schemas for forms; narrow types on submit
### 10.8 Error Handling & UX
- Global error boundary for unexpected errors
- Toasts/snackbars for mutations; inline field errors
- Loading skeletons for lists and details
- Empty states and retries
### 10.9 Accessibility
- Keyboard focus indicators, skip links
- Ensure `track kind="captions"` usage; language switching accessible
- Semantic headings and labels
### 10.10 Frontend Security
- Access token only in memory; never localStorage
- Refresh token cookie: SameSite=Lax, Secure in prod
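- CSRF:
  - API calls authenticate with the bearer access token, which a cross-site page cannot attach, so only `/auth/refresh` and `/auth/logout` depend on the cookie. SameSite=Lax already withholds the cookie on cross-site POSTs. If stricter guarantees are needed, add a double-submit token header on those two endpoints.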
- CSP headers set at CDN (script-src 'self' plus Sentry/Vendor hosts)
---
## 11) Security & Compliance (end-to-end)
- **RBAC** on server; do not trust client
- **Audit logs** for reviewer/admin actions
- **PII-minimal** user model
- **TLS** everywhere, strict HSTS
- **Secret management** via GCP Secret Manager
- **Signed URLs** expire quickly (24h) and are scoped to a single object
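"RBAC on server" can be expressed as a deny-by-default permission table checked on every endpoint, plus an ownership rule for clients. The action names below are hypothetical. The role assignments follow the API section, where reviewers and admins act on jobs and a reviewer can mark a job completed. In FastAPI this would typically be wrapped in a dependency that reads the role from the access token.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(str, Enum):
    CLIENT = "client"
    REVIEWER = "reviewer"
    ADMIN = "admin"


# Hypothetical action names; anything not listed is denied.
PERMISSIONS: Dict[str, FrozenSet[Role]] = {
    "jobs:create": frozenset({Role.CLIENT, Role.ADMIN}),
    "jobs:read_all": frozenset({Role.REVIEWER, Role.ADMIN}),
    "jobs:update": frozenset({Role.REVIEWER, Role.ADMIN}),
    "jobs:approve_english": frozenset({Role.REVIEWER, Role.ADMIN}),
    "jobs:reject": frozenset({Role.REVIEWER, Role.ADMIN}),
    "jobs:complete": frozenset({Role.REVIEWER, Role.ADMIN}),
}


def is_allowed(role: Role, action: str) -> bool:
    """Deny by default: unknown actions are not allowed for anyone."""
    return role in PERMISSIONS.get(action, frozenset())


def can_read_job(role: Role, user_id: str, job_client_id: str) -> bool:
    """Clients see only their own jobs; reviewers and admins see all jobs."""
    if is_allowed(role, "jobs:read_all"):
        return True
    return role is Role.CLIENT and user_id == job_client_id
```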
---
## 12) Observability
- **Backend:** Structured logs, tracing (OpenTelemetry), Prometheus metrics
- **Frontend:** Sentry (release + sourcemaps), console log suppression in prod
- **KPIs:** Job throughput, task latency, error rates, queue depth, time-to-completion
---
## 13) CI/CD & Deployment
### 13.1 Frontend (SPA)
- **Build:** `vite build` → static assets in `dist/`
- **Host:** Upload `dist/` to **GCS bucket** with **Cloud CDN** in front
- **SPA rewrite:** Set CDN rewrite for all non-asset paths → `/index.html`
- **Cache:** Long cache for hashed assets; no-cache for `index.html`
- **Environment:** Inject `VITE_*` at build; use `.env.production` in CI
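The rewrite and caching rules can be stated as one function from request path to (object, Cache-Control). Such a function is useful for testing the CDN/URL-map configuration or for a small fallback server. This is a sketch under two assumptions. First, Vite's default output, which places content-hashed files under `assets/` (`build.assetsDir`). Second, client-side routes never contain a dot, which holds for ObjectId-based job URLs. The actual Cloud CDN configuration is not shown.

```python
import posixpath
from typing import Tuple

# Vite emits content-hashed files under /assets/ by default (build.assetsDir = "assets").
HASHED_PREFIX = "/assets/"
LONG_CACHE = "public, max-age=31536000, immutable"
NO_CACHE = "no-cache"


def resolve(path: str) -> Tuple[str, str]:
    """Return (object to serve, Cache-Control header) for a request path."""
    if path.startswith(HASHED_PREFIX):
        return path.lstrip("/"), LONG_CACHE
    if posixpath.splitext(path)[1] and path != "/index.html":
        # Other real files (e.g. favicon.svg) are not hashed, so revalidate them.
        return path.lstrip("/"), NO_CACHE
    # Client-side routes such as /jobs/123 or /admin/qc fall back to the SPA shell.
    return "index.html", NO_CACHE
```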
### 13.2 Backend & Workers
- Docker images built in CI, pushed to Artifact Registry
- Deploy to Cloud Run (API and workers separately)
- Concurrency & autoscaling rules per queue load
- Migrate secrets via versions; run e2e smoke checks post-deploy
### 13.3 GitHub Actions (high level)
- `ci.yml`: lint (eslint, tsc, ruff, mypy), unit tests (vitest/pytest), build artifacts
- `cd-frontend.yml`: build SPA, upload to GCS, purge CDN
- `cd-api.yml`: build/push images, deploy to Cloud Run, run migrations (if any), smoke tests
---
## 14) Testing Strategy
### 14.1 Frontend
- **Unit:** Vitest + React Testing Library for components (VttEditor utils, UploadDropzone)
- **Integration:** Mock API (MSW) for query/mutation flows
- **E2E:** Playwright (auth, upload, QC approve/reject, final complete, downloads present)
### 14.2 Backend
- **Unit:** VTT parser/builder, prompt builders, signed URL helpers, RBAC
- **Integration:** Mock Gemini/Translate/TTS; assert outputs to GCS and DB mutations
- **E2E:** Full flow from `/jobs` to completion using small test MP4
### 14.3 Performance & Load
- k6/Locust to stress uploads and the job queue; measure against SLOs
---
## 15) Acceptance Criteria (Phase-wise)
### Phase 1: Ingestion & AI
- [ ] SPA allows MP4 upload and job creation
- [ ] `en/captions.vtt` & `en/ad.vtt` exist in GCS; job → `pending_qc`
- [ ] Job Detail shows English previews (video + captions toggle)
### Phase 2: QC Loop
- [ ] Reviewer edits VTT in SPA; changes persist
- [ ] Approve → `approved_english` and pipeline fires
- [ ] Reject → `rejected` with required notes → client sees reason
### Phase 3: Translation & MP3
- [ ] VTTs generated per language with preserved timings
- [ ] Transcreation performed for selected languages (spot-check)
- [ ] MP3 AD voiceovers present for requested languages
- [ ] Job → `pending_final_review`
### Phase 4: Final Review & Delivery
- [ ] Reviewer can mark `completed`
- [ ] Client receives email with signed links (expire in 24h)
- [ ] SPA shows “Completed” and enables downloads
---
## 16) Concrete Build Tasks for Claude Code (Step-by-step Prompts)
### 16.1 Backend scaffolding (FastAPI)
> Create a FastAPI project per structure in section 3. Implement `/auth/login`, `/auth/refresh`, `/auth/logout` with refresh cookie and access token response. Add RBAC decorators, Pydantic schemas mirroring section 5. Enable CORS for the SPA origin. Implement `/jobs` endpoints from section 7.2. Configure Gunicorn+Uvicorn workers.
### 16.2 Storage & Signed URLs
> Implement `services/gcs.py` for uploads, signed URLs, and text/binary writers with correct content-types. Unit tests for content-types and signed URL expiry.
### 16.3 Gemini/Translate/TTS Services
> Implement `services/gemini.py` (`extract_accessibility`, `transcreate`), `services/translate.py`, `services/tts.py` with retries and typed exceptions. Load prompts from `prompts/*.md`.
### 16.4 Celery & Pipelines
> Configure Celery (Redis broker). Implement tasks in `tasks/ingest_and_ai.py`, `tasks/translate_and_synthesize.py`, `tasks/notify.py`. Add Mongo change streams watcher to enqueue on status transitions.
### 16.5 VTT Utilities
> Implement `app/lib/vtt.py` to parse/build VTT, preserve timestamps, reconstruct translated/transcreated text. Unit tests with fixtures.
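A possible shape for the "VTT validation on write" part of these utilities is sketched below. It checks the `WEBVTT` header, well-formed timings, start < end, and non-decreasing cue start times (the WebVTT spec requires cues to be ordered by start time). One check is deliberately stricter than the spec: `MM:SS.mmm` timings are legal WebVTT but are rejected here, because the ingestion prompt mandates `HH:MM:SS.mmm`. The function name and the format of the messages are assumptions.

```python
import re
from typing import List

TS = r"(\d{2,}):([0-5]\d):([0-5]\d)\.(\d{3})"
TIMING = re.compile(rf"^{TS} --> {TS}(?:\s.*)?$")


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert timestamp parts to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def validate_vtt(vtt: str) -> List[str]:
    """Return problems found in a WebVTT string; an empty list passes these checks."""
    lines = vtt.replace("\r\n", "\n").split("\n")
    problems: List[str] = []
    if not lines[0].startswith("WEBVTT"):
        problems.append("line 1: missing WEBVTT header")
    prev_start = -1
    for n, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue  # cue text, ids, blank lines
        m = TIMING.match(line)
        if not m:
            problems.append(f"line {n}: malformed timing {line!r}")
            continue
        groups = m.groups()
        start, end = to_ms(*groups[:4]), to_ms(*groups[4:])
        if start >= end:
            problems.append(f"line {n}: cue ends before it starts")
        if start < prev_start:
            problems.append(f"line {n}: cue starts earlier than the previous cue")
        prev_start = start
    return problems
```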
### 16.6 SPA scaffolding (Vite + React + TS)
> Initialize Vite (react-ts). Add React Router, TanStack Query, Axios, Zod, Tailwind. Create route layout per section 10.4. Implement `lib/api.ts` with interceptors for access token refresh. Add `RequireAuth` and `RoleGate`.
### 16.7 SPA Features
> Build Upload page with Dropzone and progress. Implement Job Detail with video+captions preview and download links. Create Admin QC List & Detail with `VttEditor` (cue edit, timestamp validation). Final Review pages with “Complete” action.
### 16.8 Email Templates
> Implement `services/emailer.py` and Jinja template for delivery email listing signed links by language.
### 16.9 Observability & CI/CD
> Add OpenTelemetry (server & workers), Sentry DSN support in SPA, GitHub Actions workflows for lint/test/build/deploy. Add Cloud CDN SPA rewrite and caching config.
---
## 17) Example Pydantic Schemas (Backend)
```python
# backend/app/schemas/job.py
# Pydantic v2 models (the v1-only `__root__` field is replaced by RootModel).
from typing import Dict, List, Literal, Optional

from pydantic import BaseModel, ConfigDict, Field, RootModel, constr

Status = Literal[
    "created", "ingesting", "ai_processing", "pending_qc",
    "approved_english", "rejected",
    "translating", "tts_generating", "pending_final_review", "completed",
]


class Source(BaseModel):
    filename: str
    gcs_uri: str
    duration_s: Optional[float] = None
    language: constr(min_length=2, max_length=10) = "en"  # BCP-47 code


class RequestedOutputs(BaseModel):
    captions_vtt: bool = True
    audio_description_vtt: bool = True
    audio_description_mp3: bool = True
    languages: List[str] = Field(default_factory=list)
    transcreation: List[str] = Field(default_factory=list)


class LangOutput(BaseModel):
    captions_vtt_gcs: Optional[str] = None
    ad_vtt_gcs: Optional[str] = None
    ad_mp3_gcs: Optional[str] = None
    origin: Optional[Literal["translate", "transcreate"]] = None
    qa_notes: Optional[str] = None


class Outputs(RootModel[Dict[str, LangOutput]]):
    """Per-language outputs keyed by language code, e.g. {"en": {...}, "es": {...}}."""


class ReviewHistoryItem(BaseModel):
    at: str
    status: str
    by: Optional[str] = None  # "system" or a user id
    notes: Optional[str] = None


class Review(BaseModel):
    notes: Optional[str] = ""
    reviewer_id: Optional[str] = None
    history: List[ReviewHistoryItem] = Field(default_factory=list)


class AISection(BaseModel):
    ingestion_json: Optional[dict] = None
    confidence: Optional[float] = None


class Job(BaseModel):
    # Accept either `id` or Mongo's `_id` when constructing the model.
    model_config = ConfigDict(populate_by_name=True)

    id: Optional[str] = Field(None, alias="_id")
    client_id: Optional[str] = None
    title: str
    source: Source
    requested_outputs: RequestedOutputs
    status: Status = "created"
    review: Review = Field(default_factory=Review)
    outputs: Optional[Outputs] = None
    ai: Optional[AISection] = None
    error: Optional[dict] = None
    created_at: Optional[str] = None
    updated_at: Optional[str] = None
```
---
## 18) Sample Test Fixtures
- `tests/fixtures/sample_ingestion.json` (valid Gemini output)
- `tests/fixtures/sample_en_captions.vtt`
- `tests/fixtures/sample_en_ad.vtt`
- `tests/fixtures/sample_es_captions.vtt`
- `tests/fixtures/sample_es_ad.vtt`
- `tests/fixtures/source_5s.mp4` (tiny clip)
---
## 19) Risk Matrix & Mitigations
- **Invalid JSON from model:** Pydantic validation + self-heal prompt + retries; capture bad response
- **Timestamp drift:** Preserve cue timings; only replace text
- **TTS alignment:** Per-cue synthesis; stitch with small crossfades
- **Large videos:** Chunk STT; parallelize; concatenate cues
- **Queue backlog:** Autoscale workers; alert on queue depth
- **Secrets exposure:** Secret Manager; least-privilege IAM; no keys in client
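The "TTS alignment" mitigation depends on knowing where each synthesized clip lands relative to its AD cue. The sketch below plans that placement. Each clip starts at its cue start, or later if the previous clip is still playing, and any spill past the cue's end is reported so reviewers can shorten the AD text. The 100 ms minimum gap is an assumption. The plan could then drive stitching, for example with pydub's `AudioSegment.silent(duration=...)` and `overlay(clip, position=start_ms)`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Placement:
    cue_index: int
    start_ms: int    # where the clip is placed on the AD track
    end_ms: int
    overrun_ms: int  # how far the clip spills past its cue's end (0 if it fits)


def plan_ad_track(cues: List[Tuple[int, int]], clip_ms: List[int],
                  min_gap_ms: int = 100) -> List[Placement]:
    """Place one synthesized clip per AD cue without overlapping the previous clip.

    cues: (start_ms, end_ms) per cue; clip_ms: synthesized duration per cue.
    """
    if len(cues) != len(clip_ms):
        raise ValueError("need one clip duration per cue")
    plan: List[Placement] = []
    cursor = 0  # earliest time the next clip may start
    for i, ((cue_start, cue_end), dur) in enumerate(zip(cues, clip_ms)):
        start = max(cue_start, cursor)
        end = start + dur
        plan.append(Placement(i, start, end, max(0, end - cue_end)))
        cursor = end + min_gap_ms
    return plan
```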
---
## 20) Future Enhancements
- Client-facing caption editor (limited rights)
- Translator role per language
- Brand glossaries & terminology management
- Watermarked preview player
- Webhooks for customer systems
---
## 21) Developer Definition of Done (per PR)
- [ ] Unit-test coverage ≥80% for services/utils
- [ ] OpenAPI up-to-date
- [ ] RBAC enforced server-side
- [ ] Tasks idempotent with retry/backoff
- [ ] VTT validation on write
- [ ] Traces/metrics/logs in place
- [ ] Security scan clean
- [ ] Docs updated (README, ENV, runbooks)