# Development Plan for the Accessible Video App

**(Gemini 2.5 Pro • Python/FastAPI backend • React + Vite SPA • MongoDB • GCS)**

This is a full, hand-off-ready plan tailored for a **React + Vite single-page app (SPA)** that talks to a **FastAPI** backend. It retains the flow you defined (Ingestion → QC → Translation/MP3 → Final Delivery) and details the architecture, schemas, prompts, API contracts, pipelines, UI specs, CI/CD, and acceptance criteria, rewritten for a Vite SPA rather than Next.js.

---

## 1) Executive Summary

- **Goal:** Generate accessible assets from customer MP4 videos:
  1) **Closed Captions (VTT)**
  2) **Audio Description text (VTT)**
  3) **Audio Description voiceover (MP3)**

  with optional **translations** and **transcreation** per language.
- **Foundation Models & Services**
  - **Gemini 2.5 Pro** for structured extraction, AD/CC generation, and transcreation
  - **Google Cloud Translate** for standard translation
  - **ElevenLabs or Google Cloud TTS** for the voiceover MP3
- **Tech Stack**
  - **Frontend:** React 18 + **Vite** (TypeScript), React Router v6+, TanStack Query, Zod, Axios/Fetch
  - **Backend:** FastAPI (Python 3.11+), Celery workers, Redis, MongoDB Atlas, GCS, SendGrid
  - **Observability:** OpenTelemetry, Sentry, Prometheus metrics
  - **Auth:** JWT (access token in memory; refresh token in an HttpOnly cookie), RBAC (client/reviewer/admin)
  - **Infra:** Docker, GitHub Actions, Cloud Run (API & workers), Cloud Storage + Cloud CDN (SPA)
- **Key State Machine:**

  `created → ingesting → ai_processing → pending_qc → approved_english → translating → tts_generating → pending_final_review → completed`

  At QC, a reviewer can instead move a job from `pending_qc` to `rejected`. This stops the pipeline, and the client sees the reviewer's notes.

---

## 2) Flowchart → System Mapping

| Flow Node | SPA / Backend Mapping |
|---|---|
| A. Client uploads MP4 via Frontend | SPA `/jobs/new` → `POST /jobs` (multipart or signed upload) → GCS + Mongo |
| B. Backend API Endpoint | FastAPI `/jobs` |
| C. Worker 1: Ingestion & AI | Celery `tasks.ingest_and_ai(job_id)` |
| D. MongoDB ‘jobs’ collection | `jobs` collection (schema below) |
| E. GCS Bucket | `gs://accessible-video/{jobId}/...` |
| F. Build optimized prompt | `prompts/gemini_ingestion.md` |
| G. Call Gemini 2.5 Pro | `services.gemini.extract_accessibility()` |
| H/I. Parse JSON → Persist | Pydantic validation → write VTT files + DB fields |
| D_status_1 Pending_QC | `jobs.status="pending_qc"` |
| J. Admin UI ‘Pending QC’ | SPA `/admin/qc` |
| K–N. Approve/Reject | SPA calls `POST /jobs/{id}/actions/approve_english` or `POST /jobs/{id}/actions/reject` → DB status update |
| O. DB change stream listener | `tasks/watchers.py` enqueues translation/TTS (and notification on `completed`) |
| P. Worker 2: Translation & MP3 | `tasks.translate_and_synthesize(job_id)` |
| Q. For each Language | Per-language subtasks |
| R/S/T. Translate/Transcreate/TTS | Translate API / Gemini / TTS provider |
| D_status_2 Pending_Final_Review | `jobs.status="pending_final_review"` |
| U–V. Final Review & Complete | SPA `/admin/final` → `POST /jobs/{id}/actions/complete` |
| W–X. Notify Client | `tasks.notify_client(job_id)` + SendGrid |
| Y. Client Receives Files | Email with signed URLs + SPA downloads page |

---

## 3) Monorepo Structure

```
accessible-video/
  backend/
    app/
      api/
        v1/
          routes_jobs.py
          routes_auth.py
          routes_files.py
          routes_admin.py
      core/
        config.py
        security.py
        logging.py
        dependencies.py
      models/
        job.py
        user.py
        file.py
        audit_log.py
      schemas/
        job.py
        user.py
        file.py
        auth.py
      services/
        gcs.py
        gemini.py
        translate.py
        tts.py
        emailer.py
        signed_urls.py
      lib/
        vtt.py
      tasks/
        __init__.py
        ingest_and_ai.py
        translate_and_synthesize.py
        notify.py
        watchers.py
      prompts/
        gemini_ingestion.md
        gemini_transcreation.md
      telemetry/
        tracing.py
        metrics.py
    tests/
      fixtures/
      unit/
      integration/
      e2e/
    Dockerfile
    pyproject.toml
    poetry.lock
    celery_worker.py
    gunicorn_conf.py
    .env.example

  frontend/
    src/
      App.tsx
      main.tsx
      routes/
        index.tsx
        jobs/
          NewJob.tsx
          JobDetail.tsx
        admin/
          QCList.tsx
          QCDetail.tsx
          FinalList.tsx
          FinalDetail.tsx
      components/
        UploadDropzone.tsx
        JobForm.tsx
        VttEditor/
          VttEditor.tsx
          utils.ts
        VideoWithCaptions.tsx
        StatusBadge.tsx
        OutputsTable.tsx
        ReviewerNotes.tsx
        Auth/
          LoginForm.tsx
          RequireAuth.tsx
          RoleGate.tsx
      lib/
        api.ts
        auth.ts
        queryClient.ts
        store.ts
        vtt.ts
      styles/
        index.css
      types/
        api.ts
      hooks/
        useAuth.ts
        useJob.ts
    public/
      favicon.svg
    index.html
    tsconfig.json
    vite.config.ts
    package.json
    .env.example

  infra/
    cloud-cdn/
      spa-rewrite-config.md
    k8s/ or cloud-run/
      service.yaml

  .github/
    workflows/
      ci.yml
      cd-api.yml
      cd-frontend.yml

  Makefile
  README.md
```

---

## 4) Environment & Secrets

**Backend `.env.example`**

```
APP_ENV=dev
API_BASE_URL=https://api.yourdomain.com

# Auth
JWT_SECRET=change_me
JWT_ALG=HS256
JWT_ACCESS_TTL_MIN=15
JWT_REFRESH_TTL_DAYS=7
COOKIE_DOMAIN=yourdomain.com
COOKIE_SECURE=true
COOKIE_SAMESITE=Lax

# MongoDB
MONGODB_URI=mongodb+srv://...
MONGODB_DB=accessible_video

# Redis
REDIS_URL=redis://...

# GCP
GCP_PROJECT_ID=...
GCS_BUCKET=accessible-video
GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcp.json

# AI
GEMINI_API_KEY=...
TRANSLATE_API_KEY=...
ELEVENLABS_API_KEY=...
GOOGLE_TTS_CREDENTIALS=/secrets/gcp_tts.json

# Email
SENDGRID_API_KEY=...
EMAIL_FROM=support@yourdomain.com
CLIENT_BASE_URL=https://app.yourdomain.com
```

**Frontend `.env.example`**

```
VITE_API_BASE_URL=https://api.yourdomain.com
VITE_SENTRY_DSN=
VITE_APP_ENV=dev
```

**CORS & Cookies**

- Allow the SPA origin (e.g. `https://app.yourdomain.com`) in the API's CORS config as an explicit origin (not `*`) with credentials enabled, so the browser sends the refresh cookie to `/auth/refresh`. The SPA must opt in as well (Axios `withCredentials: true`).
- **Refresh token**: set by the API as an **HttpOnly, SameSite=Lax** cookie (Secure in prod).
- **Access token**: returned in the response body and kept **in memory** (not localStorage) to reduce XSS exposure.
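
A hypothetical way to load and sanity-check the backend variables at startup, using only the standard library. `Settings` and `load_settings` are illustrative names, and only a subset of the variables is read. The fail-fast rules are assumptions layered on the plan: the placeholder `JWT_SECRET` is rejected outside `dev`, and `SameSite=None` requires `Secure`. A real service might use `pydantic-settings` instead.

```python
# Hypothetical startup settings loader (stdlib only) for a subset of the
# backend variables above. Validation rules are assumptions, not plan requirements.
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SAMESITE_VALUES = {"Lax", "Strict", "None"}


@dataclass(frozen=True)
class Settings:
    app_env: str
    api_base_url: str
    jwt_secret: str
    jwt_alg: str
    jwt_access_ttl_min: int
    jwt_refresh_ttl_days: int
    cookie_domain: str
    cookie_secure: bool
    cookie_samesite: str
    mongodb_uri: str
    mongodb_db: str
    gcs_bucket: str


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ).

    Missing required keys raise KeyError; inconsistent values raise ValueError.
    """
    env = os.environ if env is None else env
    settings = Settings(
        app_env=env.get("APP_ENV", "dev"),
        api_base_url=env["API_BASE_URL"],
        jwt_secret=env["JWT_SECRET"],
        jwt_alg=env.get("JWT_ALG", "HS256"),
        jwt_access_ttl_min=int(env.get("JWT_ACCESS_TTL_MIN", "15")),
        jwt_refresh_ttl_days=int(env.get("JWT_REFRESH_TTL_DAYS", "7")),
        cookie_domain=env["COOKIE_DOMAIN"],
        cookie_secure=_as_bool(env.get("COOKIE_SECURE", "true")),
        cookie_samesite=env.get("COOKIE_SAMESITE", "Lax"),
        mongodb_uri=env["MONGODB_URI"],
        mongodb_db=env.get("MONGODB_DB", "accessible_video"),
        gcs_bucket=env["GCS_BUCKET"],
    )
    if settings.cookie_samesite not in SAMESITE_VALUES:
        raise ValueError(f"COOKIE_SAMESITE must be one of {sorted(SAMESITE_VALUES)}")
    # Browsers reject SameSite=None cookies that are not also Secure.
    if settings.cookie_samesite == "None" and not settings.cookie_secure:
        raise ValueError("COOKIE_SAMESITE=None requires COOKIE_SECURE=true")
    # Fail fast if the placeholder secret leaks beyond local development.
    if settings.app_env != "dev" and settings.jwt_secret == "change_me":
        raise ValueError("JWT_SECRET must be set outside dev")
    return settings
```

Passing a plain dict instead of `os.environ` keeps the loader unit-testable without mutating the process environment.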

---

## 5) Domain Model (MongoDB)

### 5.1 `jobs` collection

```json
{
  "_id": {"$oid": "..."},
  "client_id": {"$oid": "..."},
  "title": "Acme Explainer",
  "source": {
    "filename": "acme.mp4",
    "gcs_uri": "gs://accessible-video/64f.../source.mp4",
    "duration_s": 123.4,
    "language": "en"
  },
  "requested_outputs": {
    "captions_vtt": true,
    "audio_description_vtt": true,
    "audio_description_mp3": true,
    "languages": ["es", "fr"],
    "transcreation": ["es"]
  },
  "status": "pending_final_review",
  "review": {
    "notes": "",
    "reviewer_id": {"$oid": "..."},
    "history": [
      {"at": {"$date": "..."}, "status": "pending_qc", "by": "system"},
      {"at": {"$date": "..."}, "status": "approved_english", "by": {"$oid": "..."}, "notes": "Looks good"}
    ]
  },
  "outputs": {
    "en": {
      "captions_vtt_gcs": "gs://.../en/captions.vtt",
      "ad_vtt_gcs": "gs://.../en/ad.vtt",
      "ad_mp3_gcs": "gs://.../en/ad.mp3"
    },
    "es": {
      "captions_vtt_gcs": "gs://.../es/captions.vtt",
      "ad_vtt_gcs": "gs://.../es/ad.vtt",
      "ad_mp3_gcs": "gs://.../es/ad.mp3",
      "origin": "transcreate",
      "qa_notes": ""
    }
  },
  "ai": {
    "ingestion_json": {},
    "confidence": 0.92
  },
  "error": null,
  "created_at": {"$date": "..."},
  "updated_at": {"$date": "..."}
}
```

`origin` is either `"translate"` or `"transcreate"`.

**Status enum:**

`created → ingesting → ai_processing → pending_qc → approved_english → translating → tts_generating → pending_final_review → completed`

Reject branch: `pending_qc → rejected`.
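
Encoding the enum as explicit edges keeps stray `PATCH` calls and replayed tasks from skipping states. The table below is a hypothetical sketch. `ALLOWED_TRANSITIONS`, `InvalidTransition`, and `check_transition` are illustrative names. Treating `rejected` as terminal is an assumption, because the plan defines no rework loop.

```python
# Hypothetical transition table for the `jobs.status` enum.
# Assumption: `rejected` is terminal, since the plan defines no rework loop.
from typing import Dict, FrozenSet

ALLOWED_TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "created": frozenset({"ingesting"}),
    "ingesting": frozenset({"ai_processing"}),
    "ai_processing": frozenset({"pending_qc"}),
    "pending_qc": frozenset({"approved_english", "rejected"}),
    "approved_english": frozenset({"translating"}),
    "translating": frozenset({"tts_generating"}),
    "tts_generating": frozenset({"pending_final_review"}),
    "pending_final_review": frozenset({"completed"}),
    "rejected": frozenset(),
    "completed": frozenset(),
}


class InvalidTransition(ValueError):
    """Raised when a status change skips or reverses the state machine."""


def check_transition(current: str, new: str) -> None:
    """Raise InvalidTransition unless `current -> new` is an allowed edge."""
    if new not in ALLOWED_TRANSITIONS.get(current, frozenset()):
        raise InvalidTransition(f"{current} -> {new} is not allowed")
```

To make the check race-free in MongoDB, the write can filter on the expected current status. In PyMongo that is `update_one({"_id": job_id, "status": current}, {"$set": {"status": new}})`, treating a result with `matched_count == 0` as a lost race.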

**Indexes**

- `jobs`: `{ status: 1, created_at: -1 }`, `{ client_id: 1 }`
- `users`: `{ email: 1 }` (unique)
- `audit_logs`: `{ job_id: 1, when: -1 }`

---

## 6) Storage Layout (GCS)

```
gs://accessible-video/{jobId}/
  source.mp4
  en/
    captions.vtt
    ad.vtt
    ad.mp3
  {lang}/
    captions.vtt
    ad.vtt
    ad.mp3
```

- Serve downloads via **signed URLs** that expire after 24h.
- Content-Types: `text/vtt`, `audio/mpeg`, `video/mp4`.
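
A small, hypothetical helper can keep object names and Content-Types in one place, so the workers, the downloads endpoint, and the delivery email never disagree about paths. `ASSETS`, `object_name`, `gcs_uri`, and `content_type` are illustrative names. The bucket name is taken from `GCS_BUCKET=accessible-video`.

```python
# Hypothetical helper mapping (job, language, asset) to the GCS layout above.
from typing import Optional

BUCKET = "accessible-video"

# Object file name and Content-Type per asset kind.
ASSETS = {
    "source": ("source.mp4", "video/mp4"),
    "captions_vtt": ("captions.vtt", "text/vtt"),
    "ad_vtt": ("ad.vtt", "text/vtt"),
    "ad_mp3": ("ad.mp3", "audio/mpeg"),
}


def object_name(job_id: str, kind: str, lang: Optional[str] = None) -> str:
    """Object path inside the bucket; every asset except the source is per-language."""
    filename, _ = ASSETS[kind]
    if kind == "source":
        return f"{job_id}/{filename}"
    if not lang:
        raise ValueError(f"{kind} requires a language code")
    return f"{job_id}/{lang}/{filename}"


def gcs_uri(job_id: str, kind: str, lang: Optional[str] = None) -> str:
    return f"gs://{BUCKET}/{object_name(job_id, kind, lang)}"


def content_type(kind: str) -> str:
    return ASSETS[kind][1]
```

With `google-cloud-storage`, the type can be passed on upload as `blob.upload_from_string(data, content_type=content_type(kind))`. A 24h link can come from `blob.generate_signed_url(version="v4", expiration=timedelta(hours=24), method="GET")`.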

---

## 7) API Design (FastAPI)

### 7.1 Auth & RBAC

- `POST /auth/login` → returns `{ access }` and sets the **refresh** cookie (HttpOnly).
- `POST /auth/refresh` → rotates the refresh token and returns a new `{ access }`.
- `POST /auth/logout` → clears the refresh cookie.
- Roles: `client | reviewer | admin`.
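
To make the access token's contents concrete, here is a stdlib-only HS256 sketch. It assumes the claims `sub`, `role`, `iat`, and `exp`, which the plan does not specify, and `issue_access_token` and `verify_access_token` are hypothetical names. In production a maintained library such as PyJWT is the safer choice.

```python
# Stdlib-only HS256 JWT sketch showing what the access token could carry.
# Assumed claims: sub, role, iat, exp. Prefer a maintained library (e.g. PyJWT)
# in production; this exists only to make the token contents concrete.
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional

ROLES = {"client", "reviewer", "admin"}


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: bytes, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()


def issue_access_token(user_id: str, role: str, secret: str,
                       ttl_min: int = 15, now: Optional[int] = None) -> str:
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": user_id, "role": role, "iat": now, "exp": now + ttl_min * 60}
    segments = [_b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
                for part in (header, claims)]
    signing_input = ".".join(segments).encode("ascii")
    return ".".join(segments + [_b64url(_sign(signing_input, secret))])


def verify_access_token(token: str, secret: str, now: Optional[int] = None) -> Dict[str, Any]:
    """Return the claims, or raise ValueError on a bad signature, alg, or expiry."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    expected = _sign(f"{header_b64}.{claims_b64}".encode("ascii"), secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected alg")
    claims = json.loads(_b64url_decode(claims_b64))
    now = int(time.time()) if now is None else now
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims
```

A FastAPI dependency could verify the bearer token from the `Authorization` header and compare `claims["role"]` against the roles a route allows. That keeps RBAC on the server regardless of what the SPA's `RoleGate` shows.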

### 7.2 Jobs (main)

- `POST /jobs` (multipart OR signed-upload flow)
  - Body: `file`, `title`, `language`, `requested_outputs`, `languages[]`, `transcreation[]`
  - Creates the DB record (`status=created`), stores the file in GCS, and enqueues `ingest_and_ai`.
- `GET /jobs?status=&mine=`
- `GET /jobs/{id}`
- `PATCH /jobs/{id}` (reviewer/admin): update `status`, `review.notes`, VTT text fields
- `POST /jobs/{id}/actions/approve_english`
- `POST /jobs/{id}/actions/reject` (notes required)
- `POST /jobs/{id}/actions/complete`
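
The `POST /jobs` body benefits from a few cross-field checks before anything is written to GCS. This is a hypothetical sketch. `validate_job_request` is an illustrative name, and two rules are assumptions not stated in the plan: transcreation languages must also appear in `languages`, and the source language cannot also be a translation target.

```python
# Hypothetical server-side checks for the POST /jobs form fields.
# Assumptions beyond the plan: transcreation languages must also appear in
# `languages`, and the source language is not also a translation target.
import re
from typing import List

# Simplified BCP-47 shape check ("en", "pt-BR", "zh-Hant"); not a full validator.
LANG_RE = re.compile(r"^[a-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def validate_job_request(filename: str, language: str,
                         languages: List[str], transcreation: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the request is valid."""
    errors: List[str] = []
    if not filename.lower().endswith(".mp4"):
        errors.append("file must be an .mp4")
    for code in [language, *languages, *transcreation]:
        if not LANG_RE.match(code):
            errors.append(f"invalid language code: {code!r}")
    if len(set(languages)) != len(languages):
        errors.append("languages contains duplicates")
    if language in languages:
        errors.append(f"source language {language!r} cannot also be a target")
    missing = sorted(set(transcreation) - set(languages))
    if missing:
        errors.append(f"transcreation languages not in languages: {missing}")
    return errors
```

The extension check is only a first filter. The ffprobe step in the ingestion pipeline is where a mislabeled or corrupt file would actually surface.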

### 7.3 Files (optional signed-upload optimization)

- `POST /files/signed-upload` → `{ url, fields }` (for direct browser → GCS uploads)
- `POST /jobs/{id}/files` → additional assets
- `GET /jobs/{id}/downloads` → map of signed URLs

### 7.4 OpenAPI seed

```yaml
openapi: 3.0.3
info:
  title: Accessible Video API
  version: 1.0.0
paths:
  /auth/login:
    post:
      summary: Login
      responses:
        '200': { description: OK }
  /auth/refresh:
    post:
      summary: Refresh access token
      responses:
        '200': { description: OK }
  /jobs:
    post:
      summary: Create a job from an MP4
      requestBody:
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file: { type: string, format: binary }
                title: { type: string }
                language: { type: string, example: "en" }
                requested_outputs:
                  type: object
                  properties:
                    captions_vtt: { type: boolean }
                    audio_description_vtt: { type: boolean }
                    audio_description_mp3: { type: boolean }
                languages:
                  type: array
                  items: { type: string }
                transcreation:
                  type: array
                  items: { type: string }
      responses:
        '201': { description: Created }
```

---

## 8) Background Workers & Pipelines (Celery)

- **Broker/Backend:** Redis as the broker; Mongo or Redis for results
- **Queues:** `ingest`, `translate`, `tts`, `notify`
- **Idempotency:** record `tasks_run[job_id][task_name] = hash(inputs)` and skip work whose inputs were already processed
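
The idempotency record can be sketched in a few lines. Here an in-memory dict stands in for the shared store that real workers would need, such as a Mongo field or a Redis key. `inputs_hash`, `already_done`, and `mark_done` are hypothetical names.

```python
# Sketch of the `tasks_run[job_id][task_name] = hash(inputs)` guard.
# Assumption: an in-memory dict stands in for the shared store (for example a
# Mongo field updated with $set, or a Redis key) that real workers would need.
import hashlib
import json
from typing import Any, Dict

tasks_run: Dict[str, Dict[str, str]] = {}


def inputs_hash(inputs: Dict[str, Any]) -> str:
    """Canonical JSON (sorted keys, compact separators) hashed with SHA-256."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def already_done(job_id: str, task_name: str, inputs: Dict[str, Any]) -> bool:
    return tasks_run.get(job_id, {}).get(task_name) == inputs_hash(inputs)


def mark_done(job_id: str, task_name: str, inputs: Dict[str, Any]) -> None:
    """Call only after the task's outputs are durably written."""
    tasks_run.setdefault(job_id, {})[task_name] = inputs_hash(inputs)
```

The hash is recorded after success rather than before. A worker that crashes mid-task therefore simply reruns, and because outputs land at deterministic GCS paths, the rerun overwrites rather than duplicates. Two workers racing on the same job would still need an atomic claim, such as a Redis `SET ... NX` or a conditional Mongo update, which this sketch does not show.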

### 8.1 Pipeline 1: Ingestion & AI (`tasks.ingest_and_ai(job_id)`)

1. `status="ingesting"`
2. Probe the video with ffprobe (duration, codec); update `source.duration_s`
3. `status="ai_processing"`, then call **Gemini 2.5 Pro**:
   - Preferred: pass the audio/video to Gemini and ask for JSON containing:
     - `language`, `transcript_plaintext`, `captions_vtt`, `audio_description_vtt`, `summary`, `confidence`
   - Fallback: run STT first, then have Gemini produce the captions VTT + AD VTT
4. Validate with Pydantic; if invalid → send the **self-heal** re-ask prompt; retry with backoff
5. Write `en/captions.vtt` & `en/ad.vtt` to GCS
6. Update `jobs.outputs.en.*` and `ai.*`
7. `status="pending_qc"`
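
Step 2 can be a thin wrapper around ffprobe's JSON output. This sketch assumes `ffprobe` is on the worker image's `PATH`. `parse_probe` and `probe` are illustrative names, and only the duration and the first video and audio codec are extracted.

```python
# Sketch of 8.1 step 2: probe duration and codecs via ffprobe's JSON output.
# Assumes ffprobe (from FFmpeg) is installed in the worker image.
import json
import subprocess
from typing import Any, Dict

FFPROBE_ARGS = [
    "ffprobe", "-v", "error",
    "-show_entries", "format=duration:stream=codec_type,codec_name",
    "-of", "json",
]


def parse_probe(output: str) -> Dict[str, Any]:
    """Extract duration in seconds plus the first video and audio codec names."""
    data = json.loads(output)
    duration = data.get("format", {}).get("duration")
    # Iterate in reverse so the first stream of each type wins the dict slot.
    codecs = {s.get("codec_type"): s.get("codec_name")
              for s in reversed(data.get("streams", []))}
    return {
        "duration_s": round(float(duration), 3) if duration is not None else None,
        "video_codec": codecs.get("video"),
        "audio_codec": codecs.get("audio"),
    }


def probe(path: str) -> Dict[str, Any]:
    """Run ffprobe on a local file; raises CalledProcessError on failure."""
    result = subprocess.run([*FFPROBE_ARGS, path], capture_output=True, text=True, check=True)
    return parse_probe(result.stdout)
```

Keeping the parsing separate from the subprocess call lets unit tests feed canned JSON without FFmpeg installed.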

### 8.2 Pipeline 2: Translation & MP3 (`tasks.translate_and_synthesize(job_id)`)

Triggered by `approved_english`.

1. `status="translating"`
2. For each target language `L`:
   - **Standard**: Google Cloud Translate → rebuild the VTT with the same timestamps
   - **Transcreation** (languages listed in `requested_outputs.transcreation`): Gemini 2.5 Pro with the brand/audience brief; preserve timings
   - Save `{L}/captions.vtt`, `{L}/ad.vtt`
3. `status="tts_generating"`
4. For each language (including `en`) when `audio_description_mp3` is requested:
   - Synthesize TTS per cue; if the provider cannot place audio via timed SSML, stitch the clips with pydub/ffmpeg (tiny crossfades)
   - Save `{L}/ad.mp3`
5. `status="pending_final_review"`
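
Per-cue synthesis only works if each clip fits its cue. A hypothetical placement pass can put each clip at its cue start, push it later if the previous clip is still playing (overlapping only by the crossfade), and flag clips that run past their cue end. `Placement` and `place_clips` are illustrative names. The 20 ms crossfade is an assumed value, and the clip durations would come from the TTS output.

```python
# Sketch of 8.2 step 4's alignment check: place each synthesized AD clip at its
# cue start and flag clips that overrun their cue. Clip durations would come
# from the TTS output; here they are plain inputs (hypothetical helper).
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Placement:
    start_s: float
    end_s: float
    overruns_cue: bool


def place_clips(cues: List[Tuple[float, float]], clip_durations: List[float],
                crossfade_s: float = 0.02) -> List[Placement]:
    if len(cues) != len(clip_durations):
        raise ValueError("exactly one clip per cue is required")
    placements: List[Placement] = []
    prev_end = 0.0
    for (cue_start, cue_end), duration in zip(cues, clip_durations):
        # Start at the cue, unless the previous clip is still playing; then start
        # when it ends, overlapping only by the crossfade.
        start = max(cue_start, prev_end - crossfade_s)
        end = start + duration
        placements.append(Placement(round(start, 3), round(end, 3), end > cue_end + 1e-9))
        prev_end = end
    return placements
```

Flagged clips are candidates for a shorter AD rewrite or a faster speaking rate before final review. The mix itself could then overlay each clip onto a silent track with pydub, using `AudioSegment.silent(duration=...)` and `overlay(clip, position=...)` (both in milliseconds).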

### 8.3 Pipeline 3: Notification (`tasks.notify_client(job_id)`)

Triggered by `completed`.

1. Compile signed URLs for each output
2. Send the email via SendGrid (HTML template)
3. Append to `audit_logs`

### 8.4 DB Change Watcher

- MongoDB change streams on `jobs.status` (`tasks/watchers.py`)
- On `approved_english` → enqueue translation/TTS
- On `completed` → enqueue notify
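
The watcher's filtering can stay a pure function, which keeps it testable without a replica set. This is a sketch under stated assumptions. `TRIGGERS`, `PIPELINE`, and `task_for_change` are hypothetical names, and the Celery task names are assumed to match the `tasks.*` names in section 8. Status changes are also assumed to be written with `$set`, because a full-document replace would arrive as a `replace` event instead.

```python
# Sketch of the 8.4 watcher's filter and dispatch. PIPELINE is a change-stream
# $match on updated `status` fields; task names mirror section 8 (assumed).
from typing import Any, Dict, List, Optional

TRIGGERS: Dict[str, str] = {
    "approved_english": "tasks.translate_and_synthesize",
    "completed": "tasks.notify_client",
}

PIPELINE: List[Dict[str, Any]] = [
    {"$match": {
        "operationType": "update",
        "updateDescription.updatedFields.status": {"$in": list(TRIGGERS)},
    }}
]


def task_for_change(change: Dict[str, Any]) -> Optional[str]:
    """Return the task name to enqueue for a change event, or None to ignore it."""
    if change.get("operationType") != "update":
        return None
    fields = change.get("updateDescription", {}).get("updatedFields", {})
    return TRIGGERS.get(fields.get("status"))
```

In the watcher process, PyMongo's `db.jobs.watch(PIPELINE)` yields these events. Each match could be enqueued with Celery's `app.send_task(name, args=[str(change["documentKey"]["_id"])])`. Persisting the stream's `resume_token` and passing it back via `watch(PIPELINE, resume_after=token)` lets a restarted watcher continue where it stopped. Because resuming can replay events, the tasks' idempotency guard still matters.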

### 8.5 Retries & Error Handling

- Exponential backoff; max 5 attempts
- Persist `job.error` on terminal failure
- Sentry capture + alerting
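
"Exponential backoff; max 5 attempts" leaves the base delay, the cap, and the jitter open. The schedule below is a sketch that assumes a 2 s base, a 60 s cap, and "full jitter", where each delay is uniform between zero and the current ceiling. `backoff_delays` is a hypothetical name.

```python
# Sketch of "exponential backoff; max 5 attempts" as an explicit schedule.
# Assumed parameters: 2 s base, 60 s cap, "full jitter" (uniform in [0, ceiling]).
import random
from typing import List, Optional

MAX_ATTEMPTS = 5


def backoff_delays(base_s: float = 2.0, cap_s: float = 60.0,
                   max_attempts: int = MAX_ATTEMPTS,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delays before each retry; the first attempt runs immediately."""
    rng = random.Random() if rng is None else rng
    delays = []
    for retry in range(max_attempts - 1):  # 5 attempts -> 4 retries
        ceiling = min(cap_s, base_s * 2 ** retry)
        delays.append(rng.uniform(0.0, ceiling))
    return delays
```

Celery's task options `autoretry_for`, `retry_backoff`, `retry_backoff_max`, `retry_jitter`, and `max_retries` express a similar policy declaratively. There, `max_retries=4` gives five attempts in total.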

---

## 9) Prompts (Gemini 2.5 Pro)

**`backend/app/prompts/gemini_ingestion.md`**

```
SYSTEM:
You are an expert accessibility writer for film/TV and e-learning. Produce STRICT JSON only.

USER:
You are given a video. Return a JSON object with:
- language: BCP-47 code (e.g., "en")
- confidence: number from 0 to 1
- summary: 1–2 sentence synopsis
- transcript_plaintext: full spoken words, punctuated
- captions_vtt: a valid WebVTT file as a single string, with accurate timings and no styling
- audio_description_vtt: a valid WebVTT file as a single string, describing key visual elements (no spoilers), synchronized with the program

Constraints:
- Output MUST be valid JSON. Do not include markdown fences.
- Use short, clear AD phrases. Do not duplicate spoken dialogue.
- WebVTT must start with "WEBVTT" and use HH:MM:SS.mmm timestamps.

Return ONLY the JSON.
```

**Self-heal re-ask**

```
SYSTEM: Return STRICT JSON. If you cannot, say "REASK" as plain text.

USER:
The previous output (below) was not valid JSON or failed validation.
Error: {VALIDATION_ERROR}

Previous output:
{PREVIOUS_OUTPUT}

Return the same object again, ensuring it parses and includes every required key.
```
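
The validate-then-re-ask loop from the ingestion pipeline can be sketched with the standard library alone. `call_model` is a hypothetical callable wrapping the Gemini request. The re-ask template is assumed to expose `{VALIDATION_ERROR}` and `{PREVIOUS_OUTPUT}` placeholders (hypothetical names). The checks shown are a light subset of what the Pydantic models would enforce.

```python
# Sketch of 8.1 step 4 (validate, then self-heal re-ask), stdlib only.
# `call_model` is a hypothetical callable wrapping the Gemini request, and the
# re-ask template is assumed to expose {VALIDATION_ERROR}/{PREVIOUS_OUTPUT}.
import json
from typing import Any, Callable, Dict

REQUIRED_KEYS = {"language", "confidence", "summary", "transcript_plaintext",
                 "captions_vtt", "audio_description_vtt"}


class IngestionOutputError(ValueError):
    pass


def parse_ingestion(raw: str) -> Dict[str, Any]:
    """Parse and lightly validate the model's JSON (Pydantic would go further)."""
    text = raw.strip()
    if text.startswith("```"):  # tolerate fences even though the prompt forbids them
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit("```", 1)[0]
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise IngestionOutputError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise IngestionOutputError("top-level value must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise IngestionOutputError(f"missing keys: {sorted(missing)}")
    for key in ("captions_vtt", "audio_description_vtt"):
        if not str(data[key]).lstrip("\ufeff").startswith("WEBVTT"):
            raise IngestionOutputError(f"{key} must start with WEBVTT")
    if not isinstance(data["confidence"], (int, float)) or not 0 <= data["confidence"] <= 1:
        raise IngestionOutputError("confidence must be a number in [0, 1]")
    return data


def extract_with_self_heal(call_model: Callable[[str], str], prompt: str,
                           reask_template: str, max_reasks: int = 2) -> Dict[str, Any]:
    raw = call_model(prompt)
    for attempt in range(max_reasks + 1):
        try:
            return parse_ingestion(raw)
        except IngestionOutputError as exc:
            if attempt == max_reasks:
                raise
            # str.replace rather than str.format: the template may contain JSON braces.
            reask = (reask_template.replace("{VALIDATION_ERROR}", str(exc))
                     .replace("{PREVIOUS_OUTPUT}", raw))
            raw = call_model(reask)
    raise AssertionError("unreachable")
```

A final `IngestionOutputError` is the point at which the task would persist `job.error`, capture the bad response, and fall back to the Celery retry policy.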

**`backend/app/prompts/gemini_transcreation.md`**

```
SYSTEM:
You are a culturally savvy accessibility writer. Produce STRICT JSON only.

USER:
Rewrite the following English captions and audio descriptions into {TARGET_LANGUAGE}, preserving:
- meaning, tone, and accessibility intent,
- timing boundaries (the same cues, in the same order, with identical timestamps),
- line lengths friendly for readability (~32–40 chars).

Input:
- captions_vtt_en: <VTT text>
- ad_vtt_en: <VTT text>
- brief: <brand + audience notes>

Output (valid JSON only, no markdown fences):
{
  "captions_vtt": "<VTT in {TARGET_LANGUAGE}>",
  "audio_description_vtt": "<VTT in {TARGET_LANGUAGE}>"
}
```

---

## 10) Frontend (React + Vite SPA, TypeScript)

### 10.1 Tech choices

- **Routing:** React Router v6+ (nested routes; protected routes)
- **Data fetching:** TanStack Query (caching, retries, mutations)
- **Validation:** Zod (form schemas via React Hook Form's `zodResolver`)
- **HTTP:** Axios or Fetch with interceptors for auth
- **State:** Minimal global UI state via Zustand (optional); server state via TanStack Query
- **Styling:** Tailwind or CSS Modules; Radix UI or other headless components (optional)
- **Accessibility:** ARIA, keyboard navigation; caption preview

### 10.2 Auth strategy (SPA-friendly)

- **Login flow:**
  - SPA calls `POST /auth/login` (email, password)
  - API sets the **refresh token cookie** (HttpOnly, SameSite=Lax); the response body returns the **access token**
  - SPA keeps the access token **in memory** (React state or a module variable)
- **Auto-refresh:**
  - An Axios response interceptor catches 401s, calls `/auth/refresh` (the cookie is sent automatically) to get a new access token, then retries the original request once
  - If the refresh fails, redirect to `/login`
- **Route protection:**
  - `RequireAuth` wrapper checks auth and shows a loader while refreshing
  - `RoleGate` component enforces roles for admin/reviewer routes (the server still enforces RBAC)

### 10.3 Uploads

- **Option A (simple):** `POST /jobs` multipart → API streams the file to GCS
- **Option B (optimized):** SPA requests `POST /files/signed-upload`, uploads directly to GCS with the returned form fields, then calls `POST /jobs` with the file metadata. GCS resumable uploads are an option for very large files; GCS does not implement the tus protocol natively, so tus would require a separate tus server.
- Display progress via XHR upload progress events (e.g. Axios `onUploadProgress`)

### 10.4 SPA Routes

```
/                   landing/dashboard (recent jobs)
/login
/jobs/new
/jobs/:id
/admin/qc
/admin/qc/:id
/admin/final
/admin/final/:id
```

### 10.5 Key Components & Pages

- **UploadDropzone.tsx**: drag-and-drop, validation, progress, cancel
- **JobForm.tsx**: title, base language, checkboxes for outputs, target languages, transcreation selection
- **JobDetail.tsx**: status timeline, video player with captions, audio player, downloads
- **VttEditor/**: cue list view, timestamp validation, diff mode, save/undo
- **QCList/QCDetail**: reviewer worklist and editor with Approve/Reject actions
- **FinalList/FinalDetail**: final check and Complete action
- **StatusBadge/OutputsTable**: show status and files per language
- **VideoWithCaptions.tsx**: HTML5 `<video>` + `<track>` element; swap caption tracks by language
- **ReviewerNotes.tsx**: markdown textarea with autosave
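
The editor's timestamp validation is easier to trust if the API enforces the same rules when VTT text arrives via `PATCH /jobs/{id}`. Below is a hypothetical server-side sketch; `validate_vtt` is an illustrative name. The checked rules are HH:MM:SS.mmm stamps (as the ingestion prompt requires), start before end, and non-decreasing start times, which WebVTT itself expects. Overlapping cues are allowed, because WebVTT permits them.

```python
# Hypothetical server-side mirror of the VttEditor's timestamp validation.
# Rules: HH:MM:SS.mmm stamps, start < end, non-decreasing cue start times.
import re
from typing import List

TS = r"(\d{2,}):([0-5]\d):([0-5]\d)\.(\d{3})"
TIMING_RE = re.compile(rf"^{TS} --> {TS}(?:\s.*)?$")  # optional cue settings after


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def validate_vtt(text: str) -> List[str]:
    """Return a list of problems; an empty list means the file passed."""
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        return ["file must start with WEBVTT"]
    problems: List[str] = []
    prev_start = -1
    for n, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue  # header, identifier, text, or blank line
        m = TIMING_RE.match(line.strip())
        if not m:
            problems.append(f"line {n}: malformed timing {line!r}")
            continue
        start, end = to_ms(*m.groups()[:4]), to_ms(*m.groups()[4:])
        if start >= end:
            problems.append(f"line {n}: start must be before end")
        if start < prev_start:
            problems.append(f"line {n}: cue starts before the previous cue")
        prev_start = start
    return problems
```

Returning a list, rather than raising on the first issue, lets the API send every problem back to the editor at once.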

### 10.6 Data Layer

- `lib/api.ts`: Axios instance (`baseURL` = `VITE_API_BASE_URL`, `withCredentials: true`), auth interceptors
- `lib/queryClient.ts`: create a `QueryClient` with sensible defaults
- React Query hooks:
  - `useJobs(filters)`
  - `useJob(jobId)`
  - `useCreateJob()`, `useUpdateJob()`
  - `useApproveEnglish(jobId)`, `useRejectJob(jobId)`, `useCompleteJob(jobId)`
- Refetch policies: on window focus and network reconnect. Because pipeline progress happens server-side, also poll `useJob` (TanStack Query `refetchInterval`) while the job is in a non-terminal status.

### 10.7 Validation & Types

- `types/api.ts`: TypeScript interfaces matching the backend schemas
- Zod schemas for forms; narrow types on submit

### 10.8 Error Handling & UX

- Global error boundary for unexpected errors
- Toasts/snackbars for mutations; inline field errors
- Loading skeletons for lists and details
- Empty states and retries

### 10.9 Accessibility

- Keyboard focus indicators, skip links
- Use `<track kind="captions">` for caption tracks; make language switching keyboard- and screen-reader-accessible
- Semantic headings and labels

### 10.10 Frontend Security

- Access token only in memory; never localStorage
- Refresh token cookie: SameSite=Lax, Secure in prod
- CSRF:
  - With a SameSite=Lax refresh cookie and JSON-only APIs, CSRF risk is low; if needed, implement a double-submit token header
- CSP headers set at the CDN (`script-src 'self'` plus Sentry/vendor hosts)
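
If the double-submit option is ever needed, the server-side check is small. This hypothetical sketch assumes a readable (non-HttpOnly) `csrf_token` cookie set on `COOKIE_DOMAIN`, so the SPA origin can read it and echo it in an `X-CSRF-Token` header. Both names are illustrative. It also assumes the access token travels in an `Authorization` header, which browsers never attach automatically. Under that assumption, only cookie-authenticated endpoints such as `/auth/refresh` and `/auth/logout` would need the check.

```python
# Sketch of the optional double-submit CSRF check. Assumed names: a readable
# (non-HttpOnly) `csrf_token` cookie that the SPA echoes in an `X-CSRF-Token` header.
import hmac
import secrets
from typing import Mapping

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def new_csrf_token() -> str:
    return secrets.token_urlsafe(32)


def csrf_ok(method: str, cookies: Mapping[str, str], headers: Mapping[str, str]) -> bool:
    """Headers are expected keyed in lowercase, as ASGI servers deliver them."""
    if method.upper() in SAFE_METHODS:
        return True
    cookie = cookies.get("csrf_token", "")
    header = headers.get("x-csrf-token", "")
    # compare_digest on bytes avoids timing leaks and non-ASCII TypeErrors.
    return bool(cookie) and hmac.compare_digest(cookie.encode(), header.encode())
```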

---

## 11) Security & Compliance (end-to-end)

- **RBAC** enforced on the server; never trust the client
- **Audit logs** for reviewer/admin actions
- **PII-minimal** user model
- **TLS** everywhere, strict HSTS
- **Secret management** via GCP Secret Manager
- **Signed URLs** scoped to a single object and expiring after 24h

---

## 12) Observability

- **Backend:** Structured logs, tracing (OpenTelemetry), Prometheus metrics
- **Frontend:** Sentry (releases + source maps); console logging suppressed in prod
- **KPIs:** Job throughput, task latency, error rates, queue depth, time-to-completion

---

## 13) CI/CD & Deployment

### 13.1 Frontend (SPA)

- **Build:** `vite build` → static assets in `dist/`
- **Host:** Upload `dist/` to a **GCS bucket** fronted by **Cloud CDN**
- **SPA rewrite:** Configure the CDN so every non-asset path serves `/index.html`
- **Cache:** Long-lived caching for hashed assets; `no-cache` for `index.html`
- **Environment:** `VITE_*` variables are inlined at build time, so build once per environment (e.g. `.env.production` in CI)
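
The caching and rewrite rules above can be written as a single pure function. That makes them easy to unit-test and reuse when setting per-object `Cache-Control` metadata at upload time (`blob.cache_control` in google-cloud-storage). This sketch assumes Vite's default output, where hashed files live under `/assets/` as `[name]-[hash].[ext]`. The one-hour cache for unhashed public files is also an assumption. `resolve` is an illustrative name.

```python
# Sketch of the 13.1 cache and SPA-rewrite rules as one pure function.
# Assumptions: Vite's default output puts hashed files under /assets/
# (e.g. /assets/index-4f9a2c1b.js), and unhashed public files get a 1-hour cache.
import re
from typing import Tuple

HASHED_ASSET = re.compile(r"^/assets/.+-[A-Za-z0-9_-]{8,}\.[A-Za-z0-9]+$")
HAS_EXTENSION = re.compile(r"\.[A-Za-z0-9]+$")

IMMUTABLE = "public, max-age=31536000, immutable"
NO_CACHE = "no-cache"
SHORT_CACHE = "public, max-age=3600"


def resolve(path: str) -> Tuple[str, str]:
    """Map a request path to (object to serve, Cache-Control header)."""
    if HASHED_ASSET.match(path):
        return path, IMMUTABLE  # content-addressed: safe to cache for a year
    if path in ("/", "/index.html") or not HAS_EXTENSION.search(path):
        return "/index.html", NO_CACHE  # client-side routes get the SPA shell
    return path, SHORT_CACHE  # e.g. /favicon.svg
```

The rewrite half documents what the CDN rule in `infra/cloud-cdn/spa-rewrite-config.md` has to achieve, whatever mechanism ends up implementing it.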

### 13.2 Backend & Workers

- Docker images built in CI and pushed to Artifact Registry
- Deploy to Cloud Run (API and workers as separate services)
- Concurrency and autoscaling rules tuned per queue load
- Roll secrets forward as new Secret Manager versions; run e2e smoke checks after each deploy

### 13.3 GitHub Actions (high level)

- `ci.yml`: lint (eslint, tsc, ruff, mypy), unit tests (vitest/pytest), build artifacts
- `cd-frontend.yml`: build the SPA, upload to GCS, purge the CDN cache
- `cd-api.yml`: build/push images, deploy to Cloud Run, run migrations (if any), smoke tests

---

## 14) Testing Strategy

### 14.1 Frontend

- **Unit:** Vitest + React Testing Library for components (VttEditor utils, UploadDropzone)
- **Integration:** Mock API (MSW) for query/mutation flows
- **E2E:** Playwright (auth, upload, QC approve/reject, final complete, downloads present)

### 14.2 Backend

- **Unit:** VTT parser/builder, prompt builders, signed URL helpers, RBAC
- **Integration:** Mock Gemini/Translate/TTS; assert outputs in GCS and DB mutations
- **E2E:** Full flow from `/jobs` to completion using a small test MP4

### 14.3 Performance & Load

- k6/Locust to stress uploads and the job queue; measure against SLOs

---

## 15) Acceptance Criteria (Phase-wise)

### Phase 1: Ingestion & AI

- [ ] SPA allows MP4 upload and job creation
- [ ] `en/captions.vtt` & `en/ad.vtt` exist in GCS; job → `pending_qc`
- [ ] Job Detail shows English previews (video + captions toggle)

### Phase 2: QC Loop

- [ ] Reviewer edits VTT in the SPA; changes persist
- [ ] Approve → `approved_english` and the pipeline fires
- [ ] Reject → `rejected` with required notes → client sees the reason

### Phase 3: Translation & MP3

- [ ] VTTs generated per language with preserved timings
- [ ] Transcreation performed for selected languages (spot-check)
- [ ] MP3 AD voiceovers present for requested languages
- [ ] Job → `pending_final_review`

### Phase 4: Final Review & Delivery

- [ ] Reviewer can mark `completed`
- [ ] Client receives an email with signed links (expiring after 24h)
- [ ] SPA shows “Completed” and enables downloads

---

## 16) Concrete Build Tasks for Claude Code (Step-by-step Prompts)

### 16.1 Backend scaffolding (FastAPI)

> Create a FastAPI project following the structure in section 3. Implement `/auth/login`, `/auth/refresh`, and `/auth/logout` with a refresh cookie and an access-token response. Add RBAC dependencies and Pydantic schemas mirroring section 5. Enable CORS (with credentials) for the SPA origin. Implement the `/jobs` endpoints from section 7.2. Configure Gunicorn with Uvicorn workers.

### 16.2 Storage & Signed URLs

> Implement `services/gcs.py` for uploads, signed URLs, and text/binary writers with correct content types. Add unit tests for content types and signed URL expiry.

### 16.3 Gemini/Translate/TTS Services

> Implement `services/gemini.py` (`extract_accessibility`, `transcreate`), `services/translate.py`, and `services/tts.py` with retries and typed exceptions. Load prompts from `prompts/*.md`.

### 16.4 Celery & Pipelines

> Configure Celery (Redis broker). Implement the tasks in `tasks/ingest_and_ai.py`, `tasks/translate_and_synthesize.py`, and `tasks/notify.py`. Add a MongoDB change stream watcher that enqueues tasks on status transitions.

### 16.5 VTT Utilities

> Implement `app/lib/vtt.py` to parse/build VTT, preserve timestamps, and reconstruct translated/transcreated text. Add unit tests with fixtures.
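
The core of `app/lib/vtt.py` can be small: split the file into cues, keep each timing line verbatim, and swap only the text. This is a simplified sketch, not a full WebVTT parser. `Cue`, `parse_vtt`, `build_vtt`, and `replace_text` are illustrative names. Header metadata and NOTE/STYLE/REGION blocks are dropped, and translation is assumed to return exactly one text per cue.

```python
# Simplified sketch of app/lib/vtt.py: parse cues, swap text, rebuild with the
# original timing lines untouched. Drops header metadata and NOTE/STYLE/REGION.
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Cue:
    identifier: Optional[str]
    timing: str  # e.g. "00:00:01.000 --> 00:00:03.500 line:0", kept verbatim
    text: str


def parse_vtt(text: str) -> List[Cue]:
    blocks = text.lstrip("\ufeff").replace("\r\n", "\n").strip().split("\n\n")
    if not blocks[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT header")
    cues: List[Cue] = []
    for block in blocks[1:]:
        if "-->" not in block:
            continue  # NOTE/STYLE/REGION or empty block: not a cue
        lines = block.strip("\n").split("\n")
        identifier = None
        if "-->" not in lines[0]:
            identifier, lines = lines[0], lines[1:]
        cues.append(Cue(identifier, lines[0].strip(), "\n".join(lines[1:])))
    return cues


def build_vtt(cues: List[Cue]) -> str:
    parts = ["WEBVTT"]
    for cue in cues:
        head = f"{cue.identifier}\n" if cue.identifier else ""
        parts.append(f"{head}{cue.timing}\n{cue.text}")
    return "\n\n".join(parts) + "\n"


def replace_text(cues: List[Cue], new_texts: List[str]) -> List[Cue]:
    """Swap in translated text 1:1; identifiers and timing lines are preserved."""
    if len(cues) != len(new_texts):
        raise ValueError("translation must return exactly one text per cue")
    for t in new_texts:
        # A blank line would end the cue early; "-->" is not allowed in cue text.
        if "\n\n" in t or "-->" in t:
            raise ValueError(f"cue text would break the VTT structure: {t!r}")
    return [replace(cue, text=text) for cue, text in zip(cues, new_texts)]
```

For a standard translation, the worker would send `[c.text for c in cues]` to the translation API as one batch, then call `build_vtt(replace_text(cues, translated))`, so timestamp drift cannot occur by construction.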

### 16.6 SPA scaffolding (Vite + React + TS)

> Initialize Vite (react-ts template). Add React Router, TanStack Query, Axios, Zod, and Tailwind. Create the route layout per section 10.4. Implement `lib/api.ts` with interceptors for access-token refresh. Add `RequireAuth` and `RoleGate`.

### 16.7 SPA Features

> Build the Upload page with the dropzone and progress. Implement Job Detail with a video + captions preview and download links. Create the Admin QC List & Detail pages with `VttEditor` (cue editing, timestamp validation). Build the Final Review pages with a “Complete” action.

### 16.8 Email Templates

> Implement `services/emailer.py` and a Jinja template for the delivery email, listing signed links by language.

### 16.9 Observability & CI/CD

> Add OpenTelemetry (server & workers), Sentry DSN support in the SPA, and GitHub Actions workflows for lint/test/build/deploy. Add the Cloud CDN SPA rewrite and caching config.

---

## 17) Example Pydantic Schemas (Backend)

```python
# backend/app/schemas/job.py
# Pydantic v2 models mirroring the `jobs` document in section 5.
# ObjectIds are assumed to be converted to str before validation.
from datetime import datetime
from typing import Dict, List, Literal, Optional

from pydantic import BaseModel, ConfigDict, Field, RootModel, constr

Status = Literal[
    "created", "ingesting", "ai_processing", "pending_qc",
    "approved_english", "rejected",
    "translating", "tts_generating", "pending_final_review", "completed",
]

LangCode = constr(min_length=2, max_length=10)  # BCP-47 tag, e.g. "en", "pt-BR"


class Source(BaseModel):
    filename: str
    gcs_uri: str
    duration_s: Optional[float] = None
    language: LangCode = "en"


class RequestedOutputs(BaseModel):
    captions_vtt: bool = True
    audio_description_vtt: bool = True
    audio_description_mp3: bool = True
    languages: List[LangCode] = Field(default_factory=list)
    transcreation: List[LangCode] = Field(default_factory=list)


class LangOutput(BaseModel):
    captions_vtt_gcs: Optional[str] = None
    ad_vtt_gcs: Optional[str] = None
    ad_mp3_gcs: Optional[str] = None
    origin: Optional[Literal["translate", "transcreate"]] = None
    qa_notes: Optional[str] = None


class Outputs(RootModel[Dict[str, LangOutput]]):
    """Per-language outputs keyed by language code (v2 replacement for `__root__`)."""


class ReviewHistoryItem(BaseModel):
    at: datetime
    status: Status
    by: Optional[str] = None  # user id, or "system" for automated transitions
    notes: Optional[str] = None


class Review(BaseModel):
    notes: str = ""
    reviewer_id: Optional[str] = None
    history: List[ReviewHistoryItem] = Field(default_factory=list)


class AISection(BaseModel):
    ingestion_json: Optional[dict] = None
    confidence: Optional[float] = None


class Job(BaseModel):
    # Accept both "_id" (MongoDB documents) and "id" (API payloads).
    model_config = ConfigDict(populate_by_name=True)

    id: Optional[str] = Field(default=None, alias="_id")
    client_id: Optional[str] = None
    title: str
    source: Source
    requested_outputs: RequestedOutputs
    status: Status = "created"
    review: Review = Field(default_factory=Review)
    outputs: Optional[Outputs] = None
    ai: Optional[AISection] = None
    error: Optional[dict] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
```

---

## 18) Sample Test Fixtures

- `tests/fixtures/sample_ingestion.json` (valid Gemini output)
- `tests/fixtures/sample_en_captions.vtt`
- `tests/fixtures/sample_en_ad.vtt`
- `tests/fixtures/sample_es_captions.vtt`
- `tests/fixtures/sample_es_ad.vtt`
- `tests/fixtures/source_5s.mp4` (tiny clip)

---

## 19) Risk Matrix & Mitigations

- **Invalid JSON from model:** Pydantic validation + self-heal prompt + retries; capture the bad response
- **Timestamp drift:** Preserve cue timings; replace only the text
- **TTS alignment:** Per-cue synthesis; stitch with small crossfades
- **Large videos:** Chunk STT; parallelize; concatenate cues
- **Queue backlog:** Autoscale workers; alert on queue depth
- **Secrets exposure:** Secret Manager; least-privilege IAM; no keys in the client

---

## 20) Future Enhancements

- Client-facing caption editor (limited rights)
- Translator role per language
- Brand glossaries & terminology management
- Watermarked preview player
- Webhooks for customer systems

---

## 21) Developer Definition of Done (per PR)

- [ ] Unit test coverage ≥80% for services/utils
- [ ] OpenAPI up to date
- [ ] RBAC enforced server-side
- [ ] Tasks idempotent with retry/backoff
- [ ] VTT validation on write
- [ ] Traces/metrics/logs in place
- [ ] Security scan clean
- [ ] Docs updated (README, ENV, runbooks)