# Development Plan for the Accessible Video App

**(Gemini 2.5 Pro • Python/FastAPI backend • React + Vite SPA • MongoDB • GCS)**

This is a full, hand-off-ready plan tailored for a **React + Vite single-page app (SPA)** that talks to a **FastAPI** backend. It retains the flow you defined (Ingestion → QC → Translation/MP3 → Final Delivery) and details architecture, schemas, prompts, API contracts, pipelines, UI specs, CI/CD, and acceptance criteria, rewritten for a Vite SPA rather than Next.js.

---

## 1) Executive Summary

- **Goal:** Generate accessible assets from customer MP4 videos:
  1) **Closed Captions (VTT)**
  2) **Audio Description text (VTT)**
  3) **Audio Description voiceover (MP3)**

  with optional **translations** and **transcreation** per language.
- **Foundation Models & Services**
  - **Gemini 2.5 Pro** for structured extraction, AD/CC generation, and transcreation
  - **Google Cloud Translate** for standard translation
  - **ElevenLabs or Google Cloud TTS** for voiceover MP3
- **Tech Stack**
  - **Frontend:** React 18 + **Vite** (TypeScript), React Router v6+, TanStack Query, Zod, Axios/Fetch
  - **Backend:** FastAPI (Python 3.11+), Celery workers, Redis, MongoDB Atlas, GCS, SendGrid
  - **Observability:** OpenTelemetry, Sentry, Prometheus metrics
  - **Auth:** JWT (access in memory; refresh in HttpOnly cookie), RBAC (client/reviewer/admin)
  - **Infra:** Docker, GitHub Actions, Cloud Run (API & workers), Cloud Storage + Cloud CDN (SPA)
- **Key State Machine:** `created → ingesting → ai_processing → pending_qc → approved_english | rejected → translating → tts_generating → pending_final_review → completed`

---

## 2) Flowchart → System Mapping

| Flow Node | SPA / Backend Mapping |
|---|---|
| A. Client uploads MP4 via Frontend | SPA `/jobs/new` → `POST /jobs` (multipart or signed-upload) → GCS + Mongo |
| B. Backend API Endpoint | FastAPI `/jobs` |
| C. Worker 1: Ingestion & AI | Celery `tasks.ingest_and_ai(job_id)` |
| D. MongoDB ‘jobs’ collection | `jobs` collection (schema below) |
| E. GCS Bucket | `gs://accessible-video/{jobId}/...` |
| F. Build optimized prompt | `prompts/gemini_ingestion.md` |
| G. Call Gemini 2.5 Pro | `services.gemini.extract_accessibility()` |
| H/I. Parse JSON → Persist | Pydantic validation → write VTT files + DB fields |
| D_status_1 Pending_QC | `jobs.status="pending_qc"` |
| J. Admin UI ‘Pending QC’ | SPA `/admin/qc` |
| K–N. Approve/Reject | SPA calls `PATCH /jobs/:id` → DB status updates |
| O. DB change stream Listener | Triggers worker fan-out, translation/tts |
| P. Worker 2: Translation & MP3 | `tasks.translate_and_synthesize(job_id)` |
| Q. For each Language | Per-language subtasks |
| R/S/T. Translate/Transcreate/TTS | Translate API / Gemini / TTS provider |
| D_status_2 Pending_Final_Review | `jobs.status="pending_final_review"` |
| U–V. Final Review & Complete | SPA `/admin/final` → `POST /jobs/:id/actions/complete` |
| W–X. Notify Client | `tasks.notify_client(job_id)` + SendGrid |
| Y. Client Receives Files | Email with signed URLs + SPA downloads page |

---

## 3) Monorepo Structure

```
accessible-video/
  backend/
    app/
      api/
        v1/
          routes_jobs.py
          routes_auth.py
          routes_files.py
          routes_admin.py
      core/
        config.py
        security.py
        logging.py
        dependencies.py
      models/
        job.py
        user.py
        file.py
        audit_log.py
      schemas/
        job.py
        user.py
        file.py
        auth.py
      services/
        gcs.py
        gemini.py
        translate.py
        tts.py
        emailer.py
        signed_urls.py
      tasks/
        __init__.py
        ingest_and_ai.py
        translate_and_synthesize.py
        notify.py
        watchers.py
      prompts/
        gemini_ingestion.md
        gemini_transcreation.md
      telemetry/
        tracing.py
        metrics.py
    tests/
      unit/
      integration/
      e2e/
    Dockerfile
    pyproject.toml
    poetry.lock
    celery_worker.py
    gunicorn_conf.py
  frontend/
    src/
      App.tsx
      main.tsx
      routes/
        index.tsx
        jobs/
          NewJob.tsx
          JobDetail.tsx
        admin/
          QCList.tsx
          QCDetail.tsx
          FinalList.tsx
          FinalDetail.tsx
      components/
        UploadDropzone.tsx
        VttEditor/
          VttEditor.tsx
          utils.ts
        VideoWithCaptions.tsx
        StatusBadge.tsx
        OutputsTable.tsx
        ReviewerNotes.tsx
        Auth/
          LoginForm.tsx
          RequireAuth.tsx
          RoleGate.tsx
      lib/
        api.ts
        auth.ts
        queryClient.ts
        store.ts
        vtt.ts
      styles/
        index.css
      types/
        api.ts
      hooks/
        useAuth.ts
        useJob.ts
    public/
      favicon.svg
    index.html
    tsconfig.json
    vite.config.ts
    package.json
    .env.example
  infra/
    cloud-cdn/
      spa-rewrite-config.md
    k8s/ or cloud-run/
      service.yaml
  .github/
    workflows/
      ci.yml
      cd-api.yml
      cd-frontend.yml
  Makefile
  README.md
```

---

## 4) Environment & Secrets

**Backend `.env.example`**

```
APP_ENV=dev
API_BASE_URL=https://api.yourdomain.com

# Auth
JWT_SECRET=change_me
JWT_ALG=HS256
JWT_ACCESS_TTL_MIN=15
JWT_REFRESH_TTL_DAYS=7
COOKIE_DOMAIN=yourdomain.com
COOKIE_SECURE=true
COOKIE_SAMESITE=Lax

# MongoDB
MONGODB_URI=mongodb+srv://...
MONGODB_DB=accessible_video

# Redis
REDIS_URL=redis://...

# GCP
GCP_PROJECT_ID=...
GCS_BUCKET=accessible-video
GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcp.json

# AI
GEMINI_API_KEY=...
TRANSLATE_API_KEY=...
ELEVENLABS_API_KEY=...
GOOGLE_TTS_CREDENTIALS=/secrets/gcp_tts.json

# Email
SENDGRID_API_KEY=...
EMAIL_FROM=support@yourdomain.com
CLIENT_BASE_URL=https://app.yourdomain.com
```

**Frontend `.env.example`**

```
VITE_API_BASE_URL=https://api.yourdomain.com
VITE_SENTRY_DSN=
VITE_APP_ENV=dev
```

**CORS & Cookies**

- Allow SPA origin on API CORS.
- **Refresh token**: set by API via **HttpOnly, SameSite=Lax** cookie.
- **Access token**: returned in body; store **in memory** (not localStorage) to minimize XSS risk.

---

## 5) Domain Model (MongoDB)

### 5.1 `jobs` collection

```json
{
  "_id": {"$oid": "..."},
  "client_id": {"$oid": "..."},
  "title": "Acme Explainer",
  "source": {
    "filename": "acme.mp4",
    "gcs_uri": "gs://accessible-video/64f.../source.mp4",
    "duration_s": 123.4,
    "language": "en"
  },
  "requested_outputs": {
    "captions_vtt": true,
    "audio_description_vtt": true,
    "audio_description_mp3": true,
    "languages": ["es","fr"],
    "transcreation": ["es"]
  },
  "status": "pending_qc",
  "review": {
    "notes": "",
    "reviewer_id": {"$oid": "..."},
    "history": [
      {"at": {"$date": "..."}, "status": "pending_qc", "by": "system"},
      {"at": {"$date": "..."}, "status": "approved_english", "by": {"$oid": "..."}, "notes": "Looks good"}
    ]
  },
  "outputs": {
    "en": {
      "captions_vtt_gcs": "gs://.../en/captions.vtt",
      "ad_vtt_gcs": "gs://.../en/ad.vtt",
      "ad_mp3_gcs": "gs://.../en/ad.mp3"
    },
    "es": {
      "captions_vtt_gcs": "gs://.../es/captions.vtt",
      "ad_vtt_gcs": "gs://.../es/ad.vtt",
      "ad_mp3_gcs": "gs://.../es/ad.mp3",
      "origin": "translate|transcreate",
      "qa_notes": ""
    }
  },
  "ai": {
    "ingestion_json": {},
    "confidence": 0.92
  },
  "error": null,
  "created_at": {"$date": "..."},
  "updated_at": {"$date": "..."}
}
```

**Status enum:** `created → ingesting → ai_processing → pending_qc → approved_english | rejected → translating → tts_generating → pending_final_review → completed`

**Indexes**

- `jobs`: `{ status: 1, created_at: -1 }`, `{ client_id: 1 }`
- `users`: `{ email: 1 } unique`
- `audit_logs`: `{ job_id: 1, when: -1 }`

---

## 6) Storage Layout (GCS)

```
gs://accessible-video/{jobId}/
  source.mp4
  en/
    captions.vtt
    ad.vtt
    ad.mp3
  {lang}/
    captions.vtt
    ad.vtt
    ad.mp3
```

- Serve downloads via **signed URLs** (24h).
- Content-Types: `text/vtt`, `audio/mpeg`, `video/mp4`.

---

## 7) API Design (FastAPI)

### 7.1 Auth & RBAC

- `POST /auth/login` → returns `{ access }` and sets **refresh** cookie (HttpOnly).
- `POST /auth/refresh` → rotates and returns `{ access }`.
- `POST /auth/logout` → clears refresh cookie.
- Roles: `client | reviewer | admin`.

### 7.2 Jobs (main)

- `POST /jobs` (multipart OR signed-upload flow)
  - Body: `file`, `title`, `language`, `requested_outputs`, `languages[]`, `transcreation[]`
  - Creates DB record (`status=created`), stores file in GCS, enqueues `ingest_and_ai`.
- `GET /jobs?status=&mine=`
- `GET /jobs/{id}`
- `PATCH /jobs/{id}` (reviewer/admin): update `status`, `review.notes`, VTT text fields
- `POST /jobs/{id}/actions/approve_english`
- `POST /jobs/{id}/actions/reject` (notes required)
- `POST /jobs/{id}/actions/complete`

### 7.3 Files (optional signed upload optimization)

- `POST /files/signed-upload` → `{ url, fields }` (for direct browser → GCS)
- `POST /jobs/{id}/files` → additional assets
- `GET /jobs/{id}/downloads` → map of signed URLs

### 7.4 OpenAPI seed

```yaml
openapi: 3.0.3
info:
  title: Accessible Video API
  version: 1.0.0
paths:
  /auth/login:
    post: { summary: Login }
  /auth/refresh:
    post: { summary: Refresh access token }
  /jobs:
    post:
      summary: Create a job from an MP4
      requestBody:
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file: { type: string, format: binary }
                title: { type: string }
                language: { type: string, example: "en" }
                requested_outputs:
                  type: object
                  properties:
                    captions_vtt: { type: boolean }
                    audio_description_vtt: { type: boolean }
                    audio_description_mp3: { type: boolean }
                languages:
                  type: array
                  items: { type: string }
                transcreation:
                  type: array
                  items: { type: string }
      responses:
        '201': { description: Created }
```

---

## 8) Background Workers & Pipelines (Celery)

**Broker/Backend:** Redis
(broker), Mongo or Redis for results

**Queues:** `ingest`, `translate`, `tts`, `notify`

**Idempotency:** `tasks_run[job_id][task_name]=hash(inputs)` to avoid duplication

### 8.1 Pipeline 1: Ingestion & AI (`tasks.ingest_and_ai(job_id)`)

1. `status="ingesting"`
2. Probe video (ffprobe): duration, codec; update `source.duration_s`
3. `status="ai_processing"`, then call **Gemini 2.5 Pro**:
   - Preferred: pass audio/video to Gemini; ask for JSON containing:
     - `transcript_plaintext`, `captions_vtt`, `audio_description_vtt`, `summary`, `confidence`
   - Fallback: STT → Gemini transforms to VTT + AD VTT
4. Validate with Pydantic; if invalid → **self-heal** prompt; retry/backoff
5. Write `en/captions.vtt` & `en/ad.vtt` to GCS
6. Update `jobs.outputs.en.*` and `ai.*`
7. `status="pending_qc"`

### 8.2 Pipeline 2: Translation & MP3 (`tasks.translate_and_synthesize(job_id)`)

Triggered by `approved_english`.

1. `status="translating"`
2. For each language `L`:
   - **Standard**: Google Translate → rebuild VTT with same timestamps
   - **Transcreation**: Gemini 2.5 Pro using brand/audience brief; preserve timings
   - Save `{L}/captions.vtt`, `{L}/ad.vtt`
3. `status="tts_generating"`
4. For each `L` where MP3 requested:
   - TTS synth per cue; if no timed SSML, stitch with pydub/ffmpeg (tiny crossfades)
   - Save `{L}/ad.mp3`
5. `status="pending_final_review"`

### 8.3 Pipeline 3: Notification (`tasks.notify_client(job_id)`)

Triggered by `completed`.

1. Compile signed URLs for each output
2. Send email via SendGrid (HTML template)
3. Append to `audit_logs`

### 8.4 DB Change Watcher

- Mongo change streams on `jobs.status`
- On `approved_english` → enqueue translation/tts
- On `completed` → enqueue notify

### 8.5 Retries & Error Handling

- Exponential backoff; max 5 attempts
- Persist `job.error` on terminal failure
- Sentry capture + alerting

---

## 9) Prompts (Gemini 2.5 Pro)

**`backend/app/prompts/gemini_ingestion.md`**

```
SYSTEM:
You are an expert accessibility writer for film/TV and e-learning. Produce STRICT JSON only.

USER:
You are given a video. Return a JSON object with:
- language: BCP-47 code (e.g., "en")
- confidence: 0..1
- summary: 1–2 sentence synopsis
- transcript_plaintext: full spoken words, punctuated
- captions_vtt: a valid WebVTT file as a single string, with accurate timings and no styling
- audio_description_vtt: a valid WebVTT file as a single string, describing key visual elements (no spoilers), synchronized with the program

Constraints:
- Output MUST be valid JSON. Do not include markdown fences.
- Use short, clear AD phrases. Do not duplicate spoken dialogue.
- WebVTT must start with "WEBVTT" and use HH:MM:SS.mmm timestamps.

Return ONLY the JSON.
```

**Self-heal re-ask**

```
SYSTEM:
Return STRICT JSON. If you cannot, say "REASK" as plain text.

USER:
The previous output was not valid JSON. Return the same object again, ensuring it parses.
```

**`backend/app/prompts/gemini_transcreation.md`**

```
SYSTEM:
You are a culturally-savvy accessibility writer.

USER:
Rewrite the following English captions and audio descriptions into {TARGET_LANGUAGE}, preserving:
- meaning, tone, and accessibility intent,
- timing boundaries (same cue timestamps),
- line lengths friendly for readability (~32–40 chars).

Input:
- captions_vtt_en:
- ad_vtt_en:
- brief:

Output:
JSON: { "captions_vtt": "", "audio_description_vtt": "" }
```

---

## 10) Frontend (React + Vite SPA, TypeScript)

### 10.1 Tech choices

- **Routing:** React Router v6+ (nested routes; protected routes)
- **Data fetching:** TanStack Query (cache, retries, mutation)
- **Validation:** Zod (forms with zodResolver)
- **HTTP:** Axios or Fetch with interceptors for auth
- **State:** Minimal global UI state via Zustand (optional); server-state via React Query
- **Styling:** Tailwind or CSS Modules; Radix UI or headless components (optional)
- **Accessibility:** ARIA, keyboard navigation; caption preview

### 10.2 Auth strategy (SPA-friendly)

- **Login flow:**
  - SPA `POST /auth/login` (email, password)
  - API sets **refresh token cookie** (HttpOnly, SameSite=Lax); response body returns **access token**
  - SPA stores access token **in memory** (React state or module variable)
- **Auto-refresh:**
  - Axios interceptor checks 401; calls `/auth/refresh` (cookie present) to get new access token
  - On failure, redirect to `/login`
- **Route protection:**
  - `RequireAuth` wrapper checks auth; shows loader while refreshing
  - `RoleGate` component enforces roles for admin/reviewer routes

### 10.3 Uploads

- **Option A (simple):** `POST /jobs` multipart → API streams to GCS
- **Option B (optimized):** SPA requests `POST /files/signed-upload`, then uploads directly to GCS with form-data (tus optional), finally calls `/jobs` with file metadata
- Display progress (XHR progress or tus events)

### 10.4 SPA Routes

```
/                  - landing/dashboard (recent jobs)
/login
/jobs/new
/jobs/:id
/admin/qc
/admin/qc/:id
/admin/final
/admin/final/:id
```

### 10.5 Key Components & Pages

- **UploadDropzone.tsx**: drag-and-drop, validation, progress, cancel
- **JobForm.tsx**: title, base language, checkboxes for outputs, target languages, transcreation selection
- **JobDetail.tsx**: status timeline, video player w/ captions, audio player, downloads
- **VttEditor/**: cue list view, timestamp validation, diff mode, save/undo
- **QCList/QCDetail**: reviewer worklist and editor with Approve/Reject actions
- **FinalList/FinalDetail**: final check and Complete action
- **StatusBadge/OutputsTable**: present status and files per language
- **VideoWithCaptions.tsx**: HTML5 video + track element; swap captions by language
- **ReviewerNotes.tsx**: markdown textarea with autosave

### 10.6 Data Layer

- `lib/api.ts`: Axios instance (baseURL = `VITE_API_BASE_URL`), auth interceptors
- `lib/queryClient.ts`: create QueryClient with sensible defaults
- React Query hooks:
  - `useJobs(filters)`
  - `useJob(jobId)`
  - `useCreateJob()`, `useUpdateJob()`
  - `useApproveEnglish(jobId)`, `useRejectJob(jobId)`, `useCompleteJob(jobId)`
- Re-fetch policies: on window focus and network reconnect

### 10.7 Validation & Types

- `types/api.ts` TypeScript interfaces matching backend schemas
- Zod schemas for forms; narrow types on submit

### 10.8 Error Handling & UX

- Global error boundary for unexpected errors
- Toasts/snackbars for mutations; inline field errors
- Loading skeletons for lists and details
- Empty states and retries

### 10.9 Accessibility

- Keyboard focus indicators, skip links
- Ensure `track kind="captions"` usage; language switching accessible
- Semantic headings and labels

### 10.10 Frontend Security

- Access token only in memory; never localStorage
- Refresh token cookie: SameSite=Lax, Secure in prod
- CSRF:
  - For same-site cookie strategy and pure JSON APIs, CSRF risk is minimized; if needed, implement double-submit token header
- CSP headers set at CDN (script-src 'self' plus Sentry/Vendor hosts)

---

## 11) Security & Compliance (end-to-end)

- **RBAC** on server; do not trust client
- **Audit logs** for reviewer/admin actions
- **PII-minimal** user model
- **TLS** everywhere, strict HSTS
- **Secret management** via GCP Secret Manager
- **Signed URLs** expire quickly (24h) and scoped to object

---

## 12) Observability

- **Backend:** Structured logs, tracing (OpenTelemetry), Prometheus metrics
- **Frontend:** Sentry (release + sourcemaps), console log suppression in prod
- **KPIs:** Job throughput, task latency, error rates, queue depth, time-to-completion

---

## 13) CI/CD & Deployment

### 13.1 Frontend (SPA)

- **Build:** `vite build` → static assets in `dist/`
- **Host:** Upload `dist/` to **GCS bucket** with **Cloud CDN** in front
- **SPA rewrite:** Set CDN rewrite for all non-asset paths → `/index.html`
- **Cache:** Long cache for hashed assets; no-cache for `index.html`
- **Environment:** Inject `VITE_*` at build; use `.env.production` in CI

### 13.2 Backend & Workers

- Docker images built in CI, pushed to Artifact Registry
- Deploy to Cloud Run (API and workers separately)
- Concurrency & autoscaling rules per queue load
- Migrate secrets via versions; run e2e smoke checks post-deploy

### 13.3 GitHub Actions (high level)

- `ci.yml`: lint (eslint, tsc, ruff, mypy), unit tests (vitest/pytest), build artifacts
- `cd-frontend.yml`: build SPA, upload to GCS, purge CDN
- `cd-api.yml`: build/push images, deploy to Cloud Run, run migrations (if any), smoke tests

---

## 14) Testing Strategy

### 14.1 Frontend

- **Unit:** Vitest + React Testing Library for components (VttEditor utils, UploadDropzone)
- **Integration:** Mock API (MSW) for query/mutation flows
- **E2E:** Playwright (auth, upload, QC approve/reject, final complete, downloads present)

### 14.2 Backend

- **Unit:** VTT parser/builder, prompt builders, signed URL helpers, RBAC
- **Integration:** Mock Gemini/Translate/TTS; assert outputs to GCS and DB mutations
- **E2E:** Full flow from `/jobs` to completion using small test MP4

### 14.3 Performance & Load

- K6/Locust to stress uploads and job queue; measure SLOs

---

## 15) Acceptance Criteria (Phase-wise)

### Phase 1: Ingestion & AI

- [ ] SPA allows MP4 upload and job creation
- [ ] `en/captions.vtt` & `en/ad.vtt` exist in GCS; job → `pending_qc`
- [ ] Job Detail shows English
previews (video + captions toggle)

### Phase 2: QC Loop

- [ ] Reviewer edits VTT in SPA; changes persist
- [ ] Approve → `approved_english` and pipeline fires
- [ ] Reject → `rejected` with required notes → client sees reason

### Phase 3: Translation & MP3

- [ ] VTTs generated per language with preserved timings
- [ ] Transcreation performed for selected languages (spot-check)
- [ ] MP3 AD voiceovers present for requested languages
- [ ] Job → `pending_final_review`

### Phase 4: Final Review & Delivery

- [ ] Reviewer can mark `completed`
- [ ] Client receives email with signed links (expire in 24h)
- [ ] SPA shows “Completed” and enables downloads

---

## 16) Concrete Build Tasks for Claude Code (Step-by-step Prompts)

### 16.1 Backend scaffolding (FastAPI)

> Create a FastAPI project per structure in section 3. Implement `/auth/login`, `/auth/refresh`, `/auth/logout` with refresh cookie and access token response. Add RBAC decorators, Pydantic schemas mirroring section 5. Enable CORS for the SPA origin. Implement `/jobs` endpoints from section 7.2. Configure Gunicorn+Uvicorn workers.

### 16.2 Storage & Signed URLs

> Implement `services/gcs.py` for uploads, signed URLs, and text/binary writers with correct content-types. Unit tests for content-types and signed URL expiry.

### 16.3 Gemini/Translate/TTS Services

> Implement `services/gemini.py` (`extract_accessibility`, `transcreate`), `services/translate.py`, `services/tts.py` with retries and typed exceptions. Load prompts from `prompts/*.md`.

### 16.4 Celery & Pipelines

> Configure Celery (Redis broker). Implement tasks in `tasks/ingest_and_ai.py`, `tasks/translate_and_synthesize.py`, `tasks/notify.py`. Add Mongo change streams watcher to enqueue on status transitions.

### 16.5 VTT Utilities

> Implement `app/lib/vtt.py` to parse/build VTT, preserve timestamps, reconstruct translated/transcreated text. Unit tests with fixtures.

### 16.6 SPA scaffolding (Vite + React + TS)

> Initialize Vite (react-ts).
Add React Router, TanStack Query, Axios, Zod, Tailwind. Create route layout per section 10.4. Implement `lib/api.ts` with interceptors for access token refresh. Add `RequireAuth` and `RoleGate`.

### 16.7 SPA Features

> Build Upload page with Dropzone and progress. Implement Job Detail with video+captions preview and download links. Create Admin QC List & Detail with `VttEditor` (cue edit, timestamp validation). Final Review pages with “Complete” action.

### 16.8 Email Templates

> Implement `services/emailer.py` and Jinja template for delivery email listing signed links by language.

### 16.9 Observability & CI/CD

> Add OpenTelemetry (server & workers), Sentry DSN support in SPA, GitHub Actions workflows for lint/test/build/deploy. Add Cloud CDN SPA rewrite and caching config.

---

## 17) Example Pydantic Schemas (Backend)

```python
# backend/app/schemas/job.py
from pydantic import BaseModel, Field, constr
from typing import List, Dict, Optional, Literal

# Mirrors the status enum in section 5.
Status = Literal[
    "created", "ingesting", "ai_processing", "pending_qc",
    "approved_english", "rejected",
    "translating", "tts_generating", "pending_final_review", "completed",
]


class Source(BaseModel):
    filename: str
    gcs_uri: str
    duration_s: Optional[float] = None
    language: constr(min_length=2, max_length=10) = "en"


class RequestedOutputs(BaseModel):
    captions_vtt: bool = True
    audio_description_vtt: bool = True
    audio_description_mp3: bool = True
    languages: List[str] = []
    transcreation: List[str] = []


class LangOutput(BaseModel):
    captions_vtt_gcs: Optional[str] = None
    ad_vtt_gcs: Optional[str] = None
    ad_mp3_gcs: Optional[str] = None
    origin: Optional[Literal["translate", "transcreate"]] = None
    qa_notes: Optional[str] = None


class Outputs(BaseModel):
    # Pydantic v1 custom root type; under Pydantic v2 this would be a RootModel instead.
    __root__: Dict[str, LangOutput]  # keyed by language code


class ReviewHistoryItem(BaseModel):
    at: str
    status: str
    by: Optional[str] = None
    notes: Optional[str] = None


class Review(BaseModel):
    notes: Optional[str] = ""
    reviewer_id: Optional[str] = None
    history: List[ReviewHistoryItem] = []


class AISection(BaseModel):
    ingestion_json: Optional[dict] = None
    confidence: Optional[float] = None


class Job(BaseModel):
    id: Optional[str] = Field(None, alias="_id")  # MongoDB document id
    client_id: Optional[str] = None
    title: str
    source: Source
    requested_outputs: RequestedOutputs
    status: Status = "created"
    review: Review = Review()
    outputs: Optional[Outputs] = None
    ai: Optional[AISection] = None
    error: Optional[dict] = None
    created_at: Optional[str] = None
    updated_at: Optional[str] = None
```

---

## 18) Sample Test Fixtures

- `tests/fixtures/sample_ingestion.json` (valid Gemini output)
- `tests/fixtures/sample_en_captions.vtt`
- `tests/fixtures/sample_en_ad.vtt`
- `tests/fixtures/sample_es_captions.vtt`
- `tests/fixtures/sample_es_ad.vtt`
- `tests/fixtures/source_5s.mp4` (tiny clip)

---

## 19) Risk Matrix & Mitigations

- **Invalid JSON from model:** Pydantic validation + self-heal prompt + retries; capture bad response
- **Timestamp drift:** Preserve cue timings; only replace text
- **TTS alignment:** Per-cue synthesis; stitch with small crossfades
- **Large videos:** Chunk STT; parallelize; concatenate cues
- **Queue backlog:** Autoscale workers; alert on queue depth
- **Secrets exposure:** Secret Manager; least-privilege IAM; no keys in client

---

## 20) Future Enhancements

- Client-facing caption editor (limited rights)
- Translator role per language
- Brand glossaries & terminology management
- Watermarked preview player
- Webhooks for customer systems

---

## 21) Developer Definition of Done (per PR)

- [ ] Unit tests ≥80% for services/utils
- [ ] OpenAPI up-to-date
- [ ] RBAC enforced server-side
- [ ] Tasks idempotent with retry/backoff
- [ ] VTT validation on write
- [ ] Traces/metrics/logs in place
- [ ] Security scan clean
- [ ] Docs updated (README, ENV, runbooks)
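
As a rough illustration of the "VTT validation on write" item and the "preserve cue timings; only replace text" mitigation from sections 8.2 and 19, the sketch below parses a WebVTT string, checks the header and the `HH:MM:SS.mmm` timestamp format required by the ingestion prompt, and rebuilds the file with translated cue text on the original timings. The names `Cue`, `parse_vtt`, and `replace_cue_text` are hypothetical and not part of the plan; the sketch also assumes one translated string per cue and drops cue identifiers and NOTE blocks.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches "HH:MM:SS.mmm --> HH:MM:SS.mmm" plus optional cue settings.
TIMING_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})(.*)$"
)


@dataclass
class Cue:
    start: str
    end: str
    settings: str  # e.g. " line:0"; empty when no settings are present
    text: str


def parse_vtt(vtt: str) -> List[Cue]:
    """Parse cues from a WebVTT string; raise ValueError if the header is missing."""
    body = vtt.replace("\r\n", "\n").strip()
    if not body.startswith("WEBVTT"):
        raise ValueError('WebVTT must start with "WEBVTT"')
    cues: List[Cue] = []
    # Blocks are separated by blank lines; the first block is the header.
    for block in re.split(r"\n\s*\n", body)[1:]:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING_RE.match(line.strip())
            if match:
                text = "\n".join(lines[i + 1:])
                cues.append(Cue(match.group(1), match.group(2), match.group(3), text))
                break  # blocks without a timing line (NOTE, STYLE) are skipped
    return cues


def replace_cue_text(vtt_source: str, translated: List[str]) -> str:
    """Rebuild a VTT file with new cue text while keeping the original timings."""
    cues = parse_vtt(vtt_source)
    if len(cues) != len(translated):
        raise ValueError(
            f"expected {len(cues)} translated cues, got {len(translated)}"
        )
    out = ["WEBVTT", ""]
    for cue, new_text in zip(cues, translated):
        out.append(f"{cue.start} --> {cue.end}{cue.settings}")
        out.append(new_text.strip())
        out.append("")
    return "\n".join(out)
```

A worker could call `replace_cue_text(english_vtt, translated_lines)` before writing `{L}/captions.vtt`, and run `parse_vtt` on the result as a cheap validation step; rejecting a cue-count mismatch early keeps a bad translation batch from silently shifting text onto the wrong timestamps.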