Compare commits

180 commits
dev ... main

Author SHA1 Message Date
Vadym Samoilenko
fb99a5e8c7 feat(vtt): add note field to VttUpdateRequest and wire through create_version calls
2026-05-14 11:44:07 +01:00
Vadym Samoilenko
07d2112e53 fix(cost): use new_event_loop pattern for Whisper cost tracking (matches ingest_and_ai.py) 2026-05-14 11:43:20 +01:00
Vadym Samoilenko
922cb9318e feat(cost): add Whisper transcription cost tracking
Records audio_duration (as chars) + latency_ms to cost tracker after each
successful transcription; wrapped in try/except so it never fails the task.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 11:42:17 +01:00
Vadym Samoilenko
cff62c51ff fix(audit): add details to submit_brief and approve_brief audit calls 2026-05-14 11:41:22 +01:00
Vadym Samoilenko
b24f7a9a0f feat(audit): add audit logging to brief and share routes
Adds BRIEF_CREATE/UPDATE/SUBMIT/APPROVE audit calls to routes_briefs.py
and SHARE_TOKEN_CREATE/REVOKE/SHARE_CLIENT_DECISION to routes_share.py;
public client_decision endpoint passes user=None per convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 11:40:19 +01:00
Vadym Samoilenko
11bf08a29d feat(audit): add audit logging to org and invitation routes
Adds audit log entries for all write endpoints in routes_organizations.py
(ORG_CREATE, ORG_UPDATE, ORG_MEMBER_ADD, ORG_MEMBER_UPDATE, ORG_MEMBER_REMOVE)
and routes_invitations.py (INVITATION_CREATE, INVITATION_REVOKE, INVITATION_ACCEPT).
The public accept endpoint passes user=None per the no-auth contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 11:37:43 +01:00
Vadym Samoilenko
42a0c8acb1 fix(audit): deactivate_client details + non-raising audit insert in service 2026-05-14 11:35:40 +01:00
Vadym Samoilenko
bd1dd69467 feat(audit): add audit logging to client management routes
All 13 write endpoints in routes_clients.py now emit audit log entries
(CLIENT_CREATE, CLIENT_UPDATE, CLIENT_DEACTIVATE, CLIENT_PM_ASSIGN/REMOVE,
CLIENT_TEAM_CREATE/UPDATE/DELETE, CLIENT_TEAM_MEMBER_ADD/REMOVE,
CLIENT_PROJECT_CREATE/UPDATE/ARCHIVE). request: Request added to each
endpoint signature; resource_name and relevant details included in every call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 11:33:58 +01:00
Vadym Samoilenko
82d438df7c fix(audit): remove per-cue audit noise from mark_cue_reviewed endpoint 2026-05-14 11:31:37 +01:00
Vadym Samoilenko
7bba8256ce feat(audit): add audit logging to language QC routes
Adds audit_logger.log_action calls to all 13 write endpoints in
routes_language_qc.py using existing AuditAction enum values. Also
adds missing http_request: Request parameter to mark_cue_reviewed.
2026-05-14 11:30:28 +01:00
Vadym Samoilenko
000e99c2d0 feat(audit): add missing AuditAction enum values for clients, orgs, invitations, QC, briefs, share 2026-05-14 11:28:30 +01:00
Vadym Samoilenko
700347857a chore: ignore .worktrees directory 2026-05-14 11:27:13 +01:00
Vadym Samoilenko
3b31012901 fix(vtt): strip cue settings from end timestamp in parse_ad_cues
tts_synthesis.parse_ad_cues() was passing "00:00:02.500 line:0%" to
_parse_timestamp() — cue settings were not stripped from the end-time part
of the timing line. Split on whitespace and take first token only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 19:18:02 +01:00
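The fix described in 3b31012901 can be sketched roughly as follows. This is a minimal illustration, not the project's code: `split_timing_line` and `parse_timestamp` are hypothetical stand-ins for the logic inside `parse_ad_cues()` and `_parse_timestamp()`, whose real signatures are not shown in the log.

```python
def split_timing_line(line: str) -> tuple[str, str]:
    """Return (start, end) timestamp strings from a WebVTT timing line."""
    start_part, _, rest = line.partition("-->")
    # Cue settings such as "line:0%" follow the end timestamp after
    # whitespace, so only the first whitespace-separated token is the time.
    end_part = rest.split()[0]
    return start_part.strip(), end_part


def parse_timestamp(ts: str) -> float:
    """Convert "HH:MM:SS.mmm" or "MM:SS.mmm" to seconds (hypothetical helper)."""
    parts = ts.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


start, end = split_timing_line("00:00:01.000 --> 00:00:02.500 line:0%")
end_seconds = parse_timestamp(end)
```

Taking the first token means any number of trailing settings (`line:0%`, `align:end`, and so on) is ignored by the timestamp parser.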
Vadym Samoilenko
f22d568fc5 fix(security): fix false-positive injection blocks on French/multilingual VTT content
- Remove ';' from command-injection pattern — semicolons are common in French
  and other European languages, not a shell injection risk in JSON context
- Skip security pattern scanning for free-text fields (captions_vtt,
  audio_description_vtt, notes, etc.) — natural language always generates
  false positives against injection regexes
- Add GET/HEAD to GCS CORS config so browsers can load signed VTT URLs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 19:11:01 +01:00
Vadym Samoilenko
4645e67611 fix(glossary-list): show real embedding progress in glossary list view
- Add current_version_embedding_status/embedded_count/term_count to GlossaryResponse
- Batch-fetch current versions in list endpoint (single extra query, not N queries)
- Add get_versions_by_ids() helper to glossary_service
- Fix GlossaryList.tsx: embeddingBadge('') → embeddingBadge(g) with real status + pct

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 19:00:56 +01:00
Vadym Samoilenko
e70a67718e fix(glossary): hard-delete glossary with cascade on archive
archive_glossary() now deletes terms, versions, and the glossary document
instead of soft-deleting. Prevents orphaned 34k-term datasets from consuming
embedding quota and storage after a glossary is removed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 18:44:51 +01:00
Vadym Samoilenko
6bf88474ee feat(embed): switch embeddings to Vertex AI text-multilingual-embedding-002
Replace AI Studio gemini-embedding-001 with Vertex AI text-multilingual-embedding-002
via google-genai SDK (vertexai=True). Vertex AI uses ADC (already configured) and
has significantly higher per-project quotas than AI Studio per-user limits.
Same 768-dim output; multilingual model better suited for 50+ language glossaries.
Add gcp_location config field (default us-central1).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 18:41:32 +01:00
Vadym Samoilenko
7a7b6c1c12 fix(embed): respect Gemini 429 retryDelay and reduce concurrency
- Parse retryDelay from 429 error body and sleep for server_delay+1s
  instead of our own 2s/4s backoff (which was shorter than API requires)
- Reduce embed concurrency 5→2 to halve burst when multiple glossary
  versions embed simultaneously
- Increase max_retries 3→5 and initial backoff 2s→8s for headroom

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 18:07:22 +01:00
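A rough sketch of the retry-delay handling described in 7a7b6c1c12, assuming the 429 error text carries a `retryDelay` value such as `"37s"`. The function name, the regular expression, and the fallback argument are illustrative assumptions; the project's actual parsing code is not shown in the log.

```python
import re

# Matches retryDelay values like "retryDelay": "37s" or 'retryDelay': '2.5s'.
_RETRY_DELAY_RE = re.compile(r"retryDelay['\"]?\s*[:=]\s*['\"]?(\d+(?:\.\d+)?)s")


def retry_delay_seconds(error_text: str, fallback: float, buffer: float = 1.0) -> float:
    """Return how long to sleep before retrying a rate-limited call.

    Uses the server-provided delay plus a small buffer when present,
    otherwise the caller's own backoff value.
    """
    match = _RETRY_DELAY_RE.search(error_text)
    if match is None:
        return fallback
    return float(match.group(1)) + buffer


delay = retry_delay_seconds('{"error": {"code": 429, "details": [{"retryDelay": "37s"}]}}', fallback=8.0)
```

Preferring the server's value avoids retrying before the quota window has reset, which is the failure mode the commit message describes for the earlier 2s/4s backoff.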
Vadym Samoilenko
ca312d48fa chore(lint): fix all ruff errors — 0 warnings remaining
- B904 (55): add `from err` / `from None` to raise-in-except across 13 files
- F821 (1): add missing HTTPException import in routes_language_qc.py
- F841 (7): remove unused variable assignments (current_user, job_title, tts_provider, etc.)
- W293 (13): strip trailing whitespace from blank lines
- C416 (4): rewrite unnecessary dict comprehensions as dict()
- C401 (1): rewrite unnecessary generator as set comprehension
- E701 (4): split multi-statement lines in cost_tracker.py
- E741 (1): rename ambiguous `l` to `lang` in cloud_run_dispatch.py
- B007 (4): prefix unused loop variables with _ in tts.py, video_renderer.py
- I001 (1): sort imports in tasks/__init__.py (move stdlib to top)
- E402 (3): move threading/time/signals imports to top of tasks/__init__.py
- UP042 (9): replace (str, Enum) with StrEnum in all model/schema enums

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 17:13:08 +01:00
Vadym Samoilenko
16000a8bd9 fix(glossary,vtt): 4 bugs — locale fallback, ingestion source, cue settings, overlap on save
- glossary_service: _get_translation now handles bare→specific fallback (fr→fr-FR);
  previously only specific→bare worked, causing zero term matches when job uses
  bare locale codes ("fr") but XLSX has region columns ("fr_fr" → "fr-FR")
- ingest_and_ai: use title + brand_context as glossary source text; previously
  empty brand_context caused glossary to be skipped entirely during AI ingestion
- routes_jobs.py: apply fix_overlapping_cues before validating PATCH /vtt;
  mirrors what AI generation already does — prevents save errors for minor overlaps
- frontend/vtt.ts: preserve raw cue settings (line:0%, align:end, etc.) through
  parse→build round-trip; previously settings were parsed into positionTop flag
  only and dropped on serialization, losing caption positioning after edit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 16:58:13 +01:00
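The two-way locale fallback from 16000a8bd9 might look something like the sketch below. It is an assumption-laden illustration: `get_translation` here is a hypothetical free function rather than the real `_get_translation`, and picking the alphabetically first region when several match is a choice made only for this example.

```python
from typing import Optional


def _normalize(locale: str) -> str:
    """Lower-case and unify separators so "fr_fr" and "fr-FR" compare equal."""
    return locale.replace("_", "-").lower()


def get_translation(translations: dict[str, str], locale: str) -> Optional[str]:
    by_locale = {_normalize(key): value for key, value in translations.items()}
    wanted = _normalize(locale)
    if wanted in by_locale:
        return by_locale[wanted]
    base = wanted.split("-")[0]
    # Specific -> bare: "fr-FR" requested, only "fr" available.
    if base in by_locale:
        return by_locale[base]
    # Bare -> specific: "fr" requested, only "fr-FR" available.
    for key in sorted(by_locale):
        if key.split("-")[0] == base:
            return by_locale[key]
    return None
```

With this shape, a job using the bare code `"fr"` still matches a glossary column imported as `"fr_fr"`.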
Vadym Samoilenko
69eff9ca9d chore(deps): regenerate poetry.lock after google-cloud-texttospeech upgrade
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 18:38:36 +01:00
Vadym Samoilenko
76bee82119 fix(pipeline): fix 5 QA tickets — caption alignment, glossary, source_has_ad render, filler words, NL error surfacing
- caption_aligner: lower match ratio 0.5→0.35, widen search window 60→150, add time-based cursor fallback on miss
- gemini.py: explicit 'MUST use glossary terms' requirement in translate_vtt prompt; source_has_ad prompt now instructs not to include AD narration in captions
- ingest_and_ai: load glossary for source language and pass to extract_accessibility
- render_accessible_video: handle source_has_ad=True via caption-embed path (ffmpeg subtitle inject, no AD pipeline)
- translate_and_synthesize: track failed languages, write translation_errors to DB, add exc_info to error log
- vtt.py: expand _FILLER_PATTERNS to nl/pt/pl/uk/ru, widen EN/ES/FR/DE/IT lists
- gemini_ingestion.md: strengthen line:0% placement rule, expand disfluency examples per language

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 18:36:59 +01:00
Vadym Samoilenko
f7708f0214 chore(deps): upgrade google-cloud-texttospeech to ^2.36.0
2.27.0 (previously locked) lacks VoiceSelectionParams.model_name field
required for Gemini TTS model selection via Cloud TTS API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 17:26:30 +01:00
Vadym Samoilenko
c380a96c72 refactor(tts): switch Gemini TTS from AI Studio API to Cloud TTS API
The AI Studio API (generativelanguage.googleapis.com) enforces a hard 10 RPM
quota on preview models regardless of billing tier. Switching to Cloud TTS API
(texttospeech.googleapis.com) with the same Gemini models uses a separate,
production-grade quota that scales on paid plans.

Changes:
- Replace genai.Client + generate_content(AUDIO) with texttospeech.TextToSpeechClient
- Style prompt now goes to SynthesisInput.prompt (dedicated field, not prepended text)
- Speed goes to AudioConfig.speaking_rate (no longer encoded in prompt text)
- Cloud TTS returns MP3 directly — remove PCM→MP3 lameenc conversion
- config: update pro model from gemini-2.5-pro-preview-tts → gemini-2.5-pro-tts (GA)
- Service account already has roles/aiplatform.user (granted today)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 17:16:32 +01:00
Vadym Samoilenko
95dbed03bd fix(tts): respect API retryDelay on 429 instead of short exponential backoff
Gemini TTS allows 10 RPM; with concurrency=8 the rate limit is hit quickly.
The previous backoff (1-3s) was far too short — the API returns retryDelay ~37s.
Both synthesize_cue_task (Celery retry countdown) and GeminiTTSService
(_synthesize_cue_with_retry sleep) now parse the retryDelay from the 429
error message and use it (+ 5s buffer) instead of the exponential guess.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 17:04:45 +01:00
Vadym Samoilenko
39a9d62b06 fix(qc): dispatch TTS+render for source-only jobs when accessible_video_mp4 is requested
When EN is approved on a source-only job (no target languages), the translation
branch was skipped entirely, so the accessible video render pipeline was never
dispatched. Added elif branch: if accessible_video_mp4 is requested and there
are no target languages to translate, dispatch translate_and_synthesize_task
(which will skip translation, run TTS for source language, and dispatch the
render task).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 16:58:39 +01:00
Vadym Samoilenko
36b3b3e47c fix(ui): correct sdh field name to sdh_vtt in job detail outputs
Used wrong field name sdh_captions instead of sdh_vtt, causing
TypeScript build failure on optical-dev.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 14:16:53 +01:00
Vadym Samoilenko
8598852da1 fix(ui): show all 5 requested output types in job detail
accessible_video_mp4 and sdh_captions were missing from the
Outputs section render — only 3 of 5 fields were displayed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 14:14:50 +01:00
Vadym Samoilenko
77a4eb10e0 fix(auth): await get_redis() coroutine in membership cache
get_redis() is an async function but was called without await in
_cached_memberships(), causing RuntimeWarning and silently bypassing
the Redis membership cache on every request — all membership lookups
were hitting MongoDB instead of cache.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:57:57 +01:00
Vadym Samoilenko
5a93bdc1b6 fix(tts): run TTS pipeline when accessible_video_mp4=True even if audio_description_mp3=False
Per-cue MP3s (ad_cue_manifest) are required by render_accessible_video_task regardless
of whether the assembled ad.mp3 is requested as a client download. Previously, jobs with
accessible_video_mp4=True but audio_description_mp3=False would silently skip TTS, so
render tasks were never dispatched and jobs stayed stuck in tts_generating indefinitely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:53:49 +01:00
Vadym Samoilenko
c8a610b3f7 fix(vtt): auto-fix overlapping cues from AI-generated output
Gemini occasionally produces captions where a cue's start_time is
earlier than the previous cue's end_time. Add VTTEditor.fix_overlapping_cues()
that trims each cue's end_time to 1ms before the next cue's start, applied
to both captions and AD VTT immediately after AI generation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:23:08 +01:00
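A minimal sketch of the trimming rule described in c8a610b3f7. The `Cue` dataclass with millisecond fields is a hypothetical simplification of whatever `VTTEditor` stores, and the guard that keeps a cue from ending before it starts is an added assumption, not something the commit states.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cue:
    start_ms: int
    end_ms: int
    text: str


def fix_overlapping_cues(cues: list[Cue]) -> list[Cue]:
    """Trim each cue's end to 1 ms before the next cue's start when they overlap."""
    fixed: list[Cue] = []
    for index, cue in enumerate(cues):
        nxt = cues[index + 1] if index + 1 < len(cues) else None
        if nxt is not None and cue.end_ms > nxt.start_ms:
            # Assumption: never trim past the cue's own start time.
            cue = replace(cue, end_ms=max(cue.start_ms, nxt.start_ms - 1))
        fixed.append(cue)
    return fixed


cues = [Cue(0, 2000, "Hello"), Cue(1500, 3000, "world")]
repaired = fix_overlapping_cues(cues)
```

Only end times move, so cue count and start times stay aligned with the source.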
Vadym Samoilenko
3371466e10 fix(ui): hide error banner for non-failed job statuses
job.error persists after retry succeeds — only show it when status is
tts_failed / render_failed / processing_failed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:12:31 +01:00
Vadym Samoilenko
cff1b35aa0 fix(gemini): fallback on empty response (response.text is None)
Gemini occasionally returns response.text=None under load or safety filters.
Treat it as a retriable error so the fallback chain is used.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 12:54:10 +01:00
Vadym Samoilenko
796cd85a1d fix(gemini): include 503 UNAVAILABLE in fallback retry condition
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 12:38:26 +01:00
Vadym Samoilenko
e2391e2603 fix(gemini): correct fallback model ID + graceful downloads for failed jobs
- gemini-3.1-flash-preview doesn't exist; replace with gemini-3-flash-preview
- GET /jobs/{id}/downloads: return empty {} instead of 400 when job has no
  outputs (e.g. processing_failed before AI stage completes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 12:32:39 +01:00
Vadym Samoilenko
56a3a62368 feat(gemini): add model fallback chain on 429 quota errors
Routes all generate_content calls through _generate() which retries
gemini-3.1-flash-preview then gemini-2.5-pro when primary model hits
RESOURCE_EXHAUSTED. Cost tracker records actual model used.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 12:02:59 +01:00
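The fallback chain from 56a3a62368 follows a common pattern that can be sketched without any SDK. Everything named here is hypothetical: `QuotaExhausted` stands in for whatever error signals `RESOURCE_EXHAUSTED`, `call` stands in for the real `generate_content` invocation, and the model names in the usage are placeholders.

```python
from typing import Callable, Optional, Sequence


class QuotaExhausted(Exception):
    """Hypothetical marker for a 429 / RESOURCE_EXHAUSTED response."""


def generate_with_fallback(
    models: Sequence[str], call: Callable[[str], str]
) -> tuple[str, str]:
    """Try each model in order; return (model_used, response_text)."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, call(model)
        except QuotaExhausted as err:
            last_error = err  # remember the failure and try the next model
    raise RuntimeError("all models in the fallback chain are exhausted") from last_error


def fake_call(model: str) -> str:
    # Simulated backend: the primary model is out of quota.
    if model == "primary-model":
        raise QuotaExhausted(model)
    return f"response from {model}"


used, text = generate_with_fallback(["primary-model", "fallback-a", "fallback-b"], fake_call)
```

Returning the model that actually answered is what would let a cost tracker record the real model, as the commit message notes.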
Vadym Samoilenko
f38325b461 fix(tts): scope retranslation TTS to target language only
When retranslate=True, _generate_tts_for_languages was receiving
the full outputs dict (all 9 languages) and regenerating TTS + render
for every language on every single-language retranslation task.
That multiplied API calls by 8x and triggered unnecessary renders.

Now passes only the target language outputs when retranslate=True.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 16:57:20 +01:00
Vadym Samoilenko
b873f0af6d fix(translation): use per-language dot-notation to prevent race condition
concurrent retranslation tasks (concurrency:2) were each replacing the
entire outputs doc, so the last writer silently overwrote the others.
Now each task only writes outputs.<lang> for the languages it processed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 16:45:28 +01:00
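The race fixed in b873f0af6d comes down to the shape of the update document. The sketch below builds a per-language `$set` using MongoDB's dotted-path syntax; `apply_set` is a tiny local simulation of how such paths are applied, written only so the example runs without a database, and both function names are hypothetical.

```python
def build_language_update(processed: dict[str, dict]) -> dict:
    """Build a $set that touches only outputs.<lang> for the given languages."""
    return {"$set": {f"outputs.{lang}": data for lang, data in processed.items()}}


def apply_set(doc: dict, update: dict) -> None:
    """Local stand-in for dotted-path $set semantics (not a MongoDB API)."""
    for path, value in update["$set"].items():
        *parents, leaf = path.split(".")
        target = doc
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = value


job = {"outputs": {"en": {"captions": "en.vtt"}}}
# Two retranslation tasks finishing in either order no longer clobber each other.
apply_set(job, build_language_update({"fr": {"captions": "fr.vtt"}}))
apply_set(job, build_language_update({"de": {"captions": "de.vtt"}}))
```

By contrast, an update of the form `{"$set": {"outputs": {...}}}` replaces the whole sub-document, so the last writer wins.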
Vadym Samoilenko
865473937f feat(qc): bulk retranslate broken languages button
Adds "↺ Retranslate broken (N)" button in the Languages panel header.
Visible to production/admin when EN is approved and there are languages
with video_native origin or missing captions_vtt_gcs.
Confirm modal shows each broken language with its failure reason,
then queues individual retranslation tasks sequentially.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 16:41:55 +01:00
Vadym Samoilenko
290d5e32e6 fix: 7 caption/AD quality bugs + retranslation error handling
Bug fixes:
- Bug 1a: source_has_ad flag prevents AI generating AD over existing professional AD;
  JobBrief/Job models, gemini service prompt conditional, NewBrief UI checkbox
- Bug 1b: disable native textTracks on video element to prevent double captions
- Bug 2: caption ALL audible speech including off-screen narrators (prompt fix)
- Bug 3: DCMP §6.01 disfluency removal for EN/ES/FR/DE/IT (prompt + post-pass)
- Bug 4: VTT cue settings (line:0%, position:) preserved through parser round-trip
- Bug 5: Whisper word-level timestamp alignment via new caption_aligner service
- Bug 6: assert_cue_alignment used .start/.end; renamed to .start_time/.end_time
- New migration: backfill source_has_ad=False on existing jobs and job_briefs

Also fix retranslation error handling to preserve existing GCS URIs on failure
so video_native captions remain accessible if retranslation fails.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 15:38:20 +01:00
Vadym Samoilenko
00dd1643f5 docs(help): add screenshot to PM EN-first pipeline section
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 12:47:50 +01:00
Vadym Samoilenko
c3835843db docs(help): document EN-first translation pipeline for all roles
Add §6 EN-First Translation Pipeline to Production, Linguist, Admin, and
Project Manager guides explaining the new flow: translations are generated
only after English QC is approved, preserving 1:1 cue structure.

Documents origin badges (⚠ video-native), the amber TranslationGateBanner
on target-language cards, the ↺ Re-translate from EN button, and the
blue info note on the New Job form. Adds 5 new screenshots captured from
the deployed optical-dev environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 12:42:58 +01:00
Vadym Samoilenko
4ba489eaaa docs(help): document EN-first translation pipeline for all roles
Add §6 EN-First Translation Pipeline to Production, Linguist, Admin, and
Project Manager guides explaining the new flow: translations are generated
only after English QC is approved, preserving 1:1 cue structure.

Documents origin badges (⚠ video-native), the amber TranslationGateBanner
on target-language cards, the ↺ Re-translate from EN button, and the
blue info note on the New Job form. Adds 5 new screenshots captured from
the deployed optical-dev environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 12:42:03 +01:00
Vadym Samoilenko
f99be62256 fix(briefs): replace video_native with traditional in NewBrief default
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 12:21:29 +01:00
Vadym Samoilenko
c74fde4f40 fix(translation): approve_source triggers translation for any role
approve_source is callable by any qualified role (reviewer, linguist,
production, admin) — not just linguists. Now correctly dispatches the
translation pipeline when target languages are untranslated, regardless
of who approves the source. Without this fix, only language_qc.approve_language
(EN path) would trigger translation, leaving other roles stuck.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 12:13:39 +01:00
Vadym Samoilenko
fddf803b74 feat(translation): enforce EN-first pipeline with cue-preserving translations
All translations now derive strictly from the approved English master VTT,
eliminating the cue-count and timestamp drift reported by linguists
(e.g. PL AD = 11 cues vs EN AD = 17 cues).

Key changes:
- Remove video_native translation mode entirely; all languages go through
  translate_vtt() which guarantees 1:1 cue alignment with EN master
- Transcreation languages now use translate_vtt(style="transcreate") —
  same cue-preserving contract, culturally-adapted instructions
- Post-translation cue alignment validator added (VTTEditor.assert_cue_alignment)
- After ingestion, job moves to PENDING_QC (EN-only) instead of TRANSLATING;
  translation pipeline dispatches automatically when EN QC is approved
- New POST /jobs/{id}/retranslate-language endpoint for PM/admin to fix
  legacy video_native jobs on demand
- Frontend: origin badge (EN-aligned / transcreated / video-native warning),
  EN-first gate banner on target-language cards, Re-translate from EN button
  with confirm modal, removed translation mode selector from NewJob

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 12:11:35 +01:00
Vadym Samoilenko
e2af5c0f2f fix(share): correct decisionState type narrowing in ShareView
Wrap the render condition in parentheses and include 'submitting' so
TypeScript does not narrow decisionState to 'idle'|'error' inside the
review form block, eliminating four TS2367 comparison errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 09:57:35 +01:00
Vadym Samoilenko
d70b5acaf9 feat(share): client review form on share link; hide client role from UI
- POST /public/share/{token}/decision — unauthenticated approve/reject via share token
  - approve: validates assets, sets status completed, triggers notification
  - reject: sets status qc_feedback, stores client name + notes in review history
- ShareView: review form (name, comments, Approve / Return for Corrections)
  - shows only when job is pending_final_review
  - confirmation screen after decision
- api.ts: submitShareDecision()
- Hide 'client' role from UserList/UserDetail dropdowns
- Hide 'Client' guide tab from Help

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 09:51:58 +01:00
Vadym Samoilenko
f91cb16005 fix(middleware): add word boundaries to injection patterns; default role to admin
- Add \b word boundaries to SQL injection and command injection regex patterns
  to prevent false positives on names like "Josh Smith" (sh\s+), "Norm " (rm\s+)
- Change default role in CreateUserModal from 'client' to 'admin'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 09:45:28 +01:00
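The false positives mentioned in f91cb16005 are easy to reproduce. The two patterns below are simplified stand-ins, not the middleware's actual regexes; they only show the effect of adding `\b` in front of short command names.

```python
import re

# Simplified stand-ins for the command-injection pattern before and after the fix.
LOOSE = re.compile(r"(sh|rm)\s+", re.IGNORECASE)
BOUNDED = re.compile(r"\b(sh|rm)\s+", re.IGNORECASE)

samples = ["Josh Smith", "Norm Peterson", "rm -rf /tmp/x", "run sh deploy.sh"]

loose_hits = [s for s in samples if LOOSE.search(s)]
bounded_hits = [s for s in samples if BOUNDED.search(s)]
```

Here `loose_hits` contains all four samples, while `bounded_hits` keeps only the two that contain `rm` or `sh` as standalone words, because `\b` requires a word boundary before the command name.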
Vadym Samoilenko
3a2bbc9ca0 fix(membership): correct $unwind option preserveNullAndEmpty → preserveNullAndEmptyArrays
MongoDB 7.0 rejects the invalid key with code 28811, causing 500 on
GET /organizations/{id}/members.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 18:58:07 +01:00
Vadym Samoilenko
5f084f359f feat(help): add FAQ tab + full-content search
- New FAQ guide (faq.md) covering: re-render accessible video, stuck jobs,
  linguist/reviewer assignment, downloads, TTS voice, briefs, status colours
- extractSections() parses markdown body text per section; search now
  matches against section body text, not just heading text
- FAQ tab added between Overview and Client in the sidebar

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 18:46:06 +01:00
Vadym Samoilenko
6588feedc7 fix(membership): use 'or ""' to guard against null email/full_name
u.get("key", "") returns None when key exists with null value in MongoDB,
causing Pydantic ValidationError on MemberDetail.email/full_name: str → 500.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 18:41:31 +01:00
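The `dict.get` behaviour behind 6588feedc7 can be shown in a few lines. `text_field` is a hypothetical helper name used only for illustration.

```python
def text_field(doc: dict, key: str) -> str:
    """Return a string for `key`, treating both a missing key and a stored null as ""."""
    return doc.get(key) or ""


user = {"email": None}  # key present, value null (as it may come back from MongoDB)

with_default = user.get("email", "")  # default is NOT used: the key exists
guarded = text_field(user, "email")
missing = text_field(user, "full_name")
```

The default passed to `get` applies only when the key is absent, so `with_default` is still `None`; the `or ""` guard covers both cases. One side effect worth noting is that any other falsy value (such as `0`) would also become `""`, which is acceptable for string-only fields.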
Vadym Samoilenko
e52abca74b fix(qc): move accessibleCaptionCues memos after state declarations
Resolves TS2448/TS2454 — useMemo blocks referenced captionsVtt and
retimedCaptionsVtt before their useState declarations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 18:31:58 +01:00
Vadym Samoilenko
90867e9824 fix(qc): show captions on accessible video + allow admin/PM as linguist/reviewer
- Add retimed captions overlay to accessible video player in QCDetail;
  falls back to original captions if retimed VTT not yet generated
- Extend listUsers to accept comma-separated roles (e.g. linguist,admin)
  so admin/production users appear in linguist/reviewer assignment dropdowns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 18:07:55 +01:00
Vadym Samoilenko
68ac65ac05 fix(briefs): fix Project/Assign-To dropdowns and expand Requested Outputs
Projects:
- PM now sees all active projects (same as admin/production) — was filtering
  to empty when pm_client_ids and org memberships were both unset

Assign To:
- Replaced useOrganizations()+useOrgMembers() with a new GET /admin/brief-assignees
  endpoint accessible to all authenticated users — returns active admin/PM/production
  users sorted by name; shows role next to name in dropdown

Requested Outputs:
- Added SDH Captions (VTT), Descriptive Transcript, Accessible Video (MP4)
- Accessible Video shows Pause Insert / Voice Overlay radio selector
- Added descriptive_transcript field to RequestedOutputs model (backend + frontend)

Access:
- Brief routes now open to 'client' role in addition to admin/PM/production

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 17:54:21 +01:00
Vadym Samoilenko
a14444d61c Revert "feat(briefs): remove Briefs feature from frontend"
This reverts commit 98ece9faac.
2026-05-01 17:49:12 +01:00
Vadym Samoilenko
98ece9faac feat(briefs): remove Briefs feature from frontend
Deleted route files, App.tsx routes, Sidebar nav item, and Dashboard
"Awaiting Upload" card. The feature wasn't ready (Project/Assign To dropdowns
were empty for non-admin users) and isn't needed at this stage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 17:48:02 +01:00
Vadym Samoilenko
27286e23db fix(sidebar): show org settings link for platform admins without memberships
Platform admins query GET /organizations (not memberships) so currentOrgId
was always null — hiding the Settings nav link. Now falls back to the first
org from useOrganizations() for admins, gated with enabled:isPlatformAdmin
to avoid 403 for non-admin roles.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 17:45:49 +01:00
Vadym Samoilenko
e60e7c96e7 fix(settings): use organization_id (not slug) in /org/:orgSlug/settings URL
The backend /organizations/{org_id}/members endpoint queries memberships
by organization_id (_id hex string), but the sidebar was building the URL
from organization_slug (e.g. "3m-test"), causing 403 on every Settings page
load ("Failed to load members.").

- Sidebar: derive currentOrgId from organization_id; option values = org ID
- OrgSettingsLayout: alias orgSlug param as orgId for clarity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 17:34:20 +01:00
Vadym Samoilenko
a3cfe2ff8c fix(renderer): skip ffprobe Phase 3.5 — use pre-computed freeze duration
Cloud Run-generated freeze segments caused FFprobe to return code 1 with
empty stderr when dispatched to the Celery ffmpeg queue, crashing the
render for every language. The freeze segments are created to an exact
pre-computed duration (ad_duration + silence_before + silence_after),
so probing is unnecessary — assign that value directly instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 17:29:33 +01:00
Vadym Samoilenko
9733700874 fix(captions): remove duplicate native track display, fix position to bottom
- Remove `default` from <track> element so browser doesn't render native
  captions on top of the custom overlay (was causing double display)
- Remove positionTop logic — always render overlay at bottom-14 (above controls)
  regardless of VTT line hints; applies to both VideoWithCaptions and VideoReviewPlayer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 16:33:33 +01:00
Vadym Samoilenko
df7fec701d fix(ui): connection dot in navbar, profile page, render error visibility + audit log
- Navbar: add WebSocket connection dot (green/yellow/red/gray) from GlobalWebSocketContext
- Profile page: /profile route shows email, full_name, role, auth_provider, languages
- JobResponse: expose failure and error fields (were stored in MongoDB but not returned)
  so frontend now shows actual render error message instead of generic fallback
- render_accessible_video: write JOB_TASK_FAILED audit log entry on render failure
  with language, error detail, step=render
- rerender_accessible_video: same audit log on re-render failure, step=rerender

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 16:19:12 +01:00
Vadym Samoilenko
2f4925353a feat(pause-insert): adaptive buffer, forward-snap, timeline drag + share link fix
Backend (Phase A):
- A1: Adaptive silence buffer — natural_gap_ms persisted per cue; renderer computes
  per-cue silence_before/silence_after instead of fixed 500ms; per-cue silence files
- A2: Forward-preferred snap — snap_pause_point prefers boundaries up to 4s ahead
  over boundaries within 1.5s behind, reducing mid-scene cuts
- A3: Min-gap validation — pause points with < 200ms gap trigger forward search
  to the next acceptable gap
- natural_gap_ms added to PausePointData model and api.ts type
- New config fields: whisper_snap_forward_window, whisper_snap_backward_window,
  ad_silence_buffer_default, ad_silence_buffer_min_after, ad_min_acceptable_gap
- Tests: test_whisper_snap.py (13 tests), test_video_renderer_buffers.py

Frontend (Phase B):
- B1: Drag pause-point markers — pointer state machine with 3px move threshold,
  clamp to min/max bounds, click-without-move still opens PausePointEditor
- B2: Drag freeze blocks — orange blocks translate with linked pause point
- B3: Time tooltip visible during drag, hidden on release
- Tests: TimelinePreview.drag.test.tsx (10 tests)

Fixes:
- Share link pointed to ai-sandbox.oliver.solutions — added app_url to Settings
  with correct optical-dev.oliver.solutions default; share_url now configurable
  via APP_URL env var
- Removed all ai-sandbox.oliver.solutions references from docker-compose,
  apache config, docs, and scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 16:09:09 +01:00
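The forward-preferred snap (item A2 in 2f4925353a) might be sketched like this. The 4 s and 1.5 s windows come from the commit message; the function signature, the choice of the nearest boundary within each window, and returning the original time when nothing qualifies are assumptions made for the example.

```python
def snap_pause_point(
    t: float,
    boundaries: list[float],
    forward_window: float = 4.0,
    backward_window: float = 1.5,
) -> float:
    """Snap time `t` (seconds) to a boundary, preferring ones ahead of it."""
    ahead = [b for b in boundaries if t <= b <= t + forward_window]
    if ahead:
        return min(ahead)  # nearest boundary at or after t
    behind = [b for b in boundaries if t - backward_window <= b < t]
    if behind:
        return max(behind)  # nearest boundary before t
    return t  # nothing in range: leave the pause point where it is


# A boundary 3 s ahead wins over one 0.5 s behind.
snapped = snap_pause_point(10.0, [9.5, 13.0])
```

Preferring the later boundary trades a slightly longer wait for fewer cuts in the middle of a scene, which is the motivation the commit gives.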
Vadym Samoilenko
31d631f70d fix(qc-queue): correct sidebar badge and Refresh button loading state
The sidebar My QC Queue badge was showing the org-wide pending_qc job count
instead of the current user's personally assigned tasks. It now uses
useMyQCQueueCount which sums the linguist and reviewer queue totals
from the same me/language-qc-queue API the queue page uses.

Refresh button now shows a spinner and Refreshing label while the
refetch is in progress so users can see the action took effect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 14:41:37 +01:00
Vadym Samoilenko
8dee0b6ff5 fix(membership): guard against missing/invalid role_in_org in membership docs
list_org_members and _membership_from_doc used bracket access on role_in_org
which raises KeyError if the field is absent (old docs or direct DB inserts).
Also handles ValueError if the stored value doesn't match a valid OrgRole.
Falls back to OrgRole.MEMBER in both cases.

Fixes 500 on GET /organizations/{org_id}/members.
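
The guard can be sketched as follows. `OrgRole.MEMBER` and the `role_in_org` field are from the commit; the other enum member, the string values, and the helper name `role_from_doc` are hypothetical.

```python
from enum import Enum


class OrgRole(str, Enum):
    # Hypothetical subset of roles; only MEMBER is named in the commit.
    ADMIN = "admin"
    MEMBER = "member"


def role_from_doc(doc: dict) -> OrgRole:
    """Read role_in_org defensively, falling back to OrgRole.MEMBER."""
    try:
        return OrgRole(doc["role_in_org"])
    except (KeyError, ValueError):
        # KeyError: field absent; ValueError: value is not a valid OrgRole.
        return OrgRole.MEMBER
```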

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 14:35:09 +01:00
Vadym Samoilenko
997c1f622b fix(rbac): allow reviewer role to assign linguists and reviewers
assign, assign-reviewer, reassign-reviewer, and bulk-assign endpoints
were gated to project_manager/production/admin only, but the Reviewer
QC Detail page exposes Assign buttons to reviewer users.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 14:29:15 +01:00
Vadym Samoilenko
d4cb31e5d9 feat(help): add real screenshots for all 7 role guides (77 images)
Captures admin, client, linguist, reviewer, production, project-manager,
and global help screenshots from optical-dev using Playwright MCP.
All markdown-referenced filenames now have corresponding PNG files.
Placeholders used where live data or role permissions prevent full capture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 14:02:54 +01:00
Vadym Samoilenko
2c816a5e69 docs(help): add Timeline Preview & Rendering section to linguist/reviewer/production/admin guides
All 4 roles that access QCDetail now have a section explaining:
- Timeline bar colour legend (Video/AD Audio/Queued/Pause Point/Adjusted)
- Render Accessible Video Changes panel triggers and behaviour
- Whisper pause refinement checkbox guidance
- Step-by-step render workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 13:20:48 +01:00
Vadym Samoilenko
ce048a2196 fix(help): resolve screenshot paths under Vite subpath deploy
Markdown guides use /help-screenshots/... (root-relative). With Vite
base=/video-accessibility/, images were requested at the wrong URL.
Custom img renderer now prepends import.meta.env.BASE_URL so paths
resolve correctly on both /video-accessibility/ and local dev.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 13:14:47 +01:00
Vadym Samoilenko
67219797b6 feat(help): add captured screenshots for all 7 role guides (25 images)
Screenshots captured via Playwright against optical-dev. Covers:
global (login + interface), client, linguist, reviewer, production,
project-manager, admin — all 25 PNGs under frontend/public/help-screenshots/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 13:11:40 +01:00
Vadym Samoilenko
6559ccc1f9 feat(help): in-app role-based help guides + screenshot capture pipeline
- Help.tsx: role tabs, TOC scroll-spy, search, lightbox, react-markdown renderer
- 7 markdown guides (global, client, linguist, reviewer, production, PM, admin)
  with explicit click/drag/keyboard annotations throughout
- Sidebar: Help button added at bottom of nav (all roles)
- App.tsx: /help route, no RoleGate
- frontend/public/help-screenshots/{role}/: directories ready for screenshots
- tools/capture-help-screenshots.ts: Playwright screenshot script
  - Clicks "Local login" toggle before filling credentials
  - Uses test-admin local account (not SSO)
- backend/scripts/seed_test_users.py: idempotent MongoDB seed script
  creates 6 local-auth users (admin + 5 roles) for capture + local dev
- .env.screenshots.example: template with test-admin credentials
- Removes docs/video_accessibility_user_guide_v3.md (superseded by in-app guides)
- Deps: react-markdown, remark-gfm, rehype-raw added to frontend

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 13:08:13 +01:00
Vadym Samoilenko
d2adfbc3b4 fix(dashboard): briefsData is array, not {briefs:[]} — remove stale .briefs accessor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 12:13:58 +01:00
Vadym Samoilenko
c3a42cb5fe Merge fix/multi-tenancy-and-english-first into main 2026-05-01 12:07:37 +01:00
Vadym Samoilenko
9e6ce657bf fix(schema): empty string → None for captions/AD VTT fields (Bug 2B)
Frontend sends audio_description_vtt: "" for CC-only jobs.
Pydantic validator converts "" to None before validation,
so the backend skips VTT format validation and returns 200
instead of 400.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 12:06:09 +01:00
Vadym Samoilenko
f2968a2989 fix(vtt): regenerate descriptive_transcript.txt after PATCH /vtt saves
Bug 1: Editing any AD cue never updated descriptive_transcript.txt in GCS.
Bug 2A: Uploading replacement CC or AD .vtt had the same root cause.

After saving captions or AD VTT, read the other stream from GCS if not
provided in the request, merge both via generate_descriptive_transcript(),
upload the result to {job_id}/{lang}/descriptive_transcript.txt, and
update lang_output["descriptive_transcript_gcs"] before the DB write.

Bug 2B (CC-only job → 400 on empty audio_description_vtt): already fixed
by the existing `if request.audio_description_vtt:` guard (empty string
and None are both falsy) and frontend `adVtt || undefined` sending no
field rather than an empty string.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 12:03:35 +01:00
Vadym Samoilenko
32b12ff0a6 feat(ux): P2 role UX — reviewer queue, dashboard widgets, org filter, WS toast
Phase 2.3: VttEditor sticky banner + Re-translate wired into QCDetail
Phase 3.1: RoleGate on /briefs/* (PM/admin/production only)
Phase 3.2: LinguistQueue — sortable Assigned column, defaultRole prop
Phase 3.3: ReviewerQueue component + /qc/reviewer-queue route + sidebar entry
Phase 3.4: PM dashboard — Overdue and Stuck >24h widgets
Phase 3.5: Production dashboard — Awaiting Upload and Pending QC Handoff widgets
Phase 3.6: Admin UserList — org_id filter dropdown (uses listOrganizations)
WebSocket: onTerminalClose callback + error toast in GlobalWebSocketContext
Runbook: Apache ProxyTimeout ≥60s recommendation for WebSocket keepalives
Backend: fix F841 unused variable in test_cross_tenant_isolation.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 11:58:29 +01:00
Vadym Samoilenko
b427ee9f49 fix(authz): MT-3/6/7/8 org isolation + P1 English-first QC enforcement
Multi-tenancy isolation (P0):
- MT-3: Add get_job_or_403 (org membership check) to all 19+ job action endpoints
- MT-6: Same gate added to all review_notes (5) and vtt_versions (4) handlers
- MT-7: WebSocket /ws/jobs/{job_id} closes with 4403 on org mismatch;
  /ws/jobs passes accessible_org_ids to ConnectionManager; server-side
  keepalive at 20 s (asyncio.wait_for timeout) prevents proxy idle drops
- MT-8: list_users scoped to org memberships for non-platform-admins
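
A rough sketch of the server-side keepalive from MT-7, assuming a WebSocket object that exposes async `receive_text()`, `send_text()` and `close()` methods. The 20 s interval and the use of `asyncio.wait_for` are from the commit; the function name, the JSON frame shape, and the `"close"` sentinel are assumptions made for illustration.

```python
import asyncio
import json

KEEPALIVE_SECONDS = 20  # from the commit: server-side keepalive at 20 s


async def serve_socket(ws, timeout: float = KEEPALIVE_SECONDS) -> None:
    """Receive loop that sends a keepalive frame whenever the client is idle.

    `ws` is a hypothetical object with async receive_text(), send_text() and
    close() methods.
    """
    while True:
        try:
            message = await asyncio.wait_for(ws.receive_text(), timeout=timeout)
        except asyncio.TimeoutError:
            # No client traffic within the window: send a frame so an idle
            # proxy does not drop the connection.
            await ws.send_text(json.dumps({"type": "keepalive"}))
            continue
        if message == "close":  # assumption: stand-in for a real disconnect
            await ws.close()
            return
```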

WebSocket fixes (Mod Comms 2026-03-18 incident):
- Frontend heartbeat lowered 30 000 → 20 000 ms (was at Apache timeout edge)
- Terminal close codes 4001/4003/4004/4403 no longer trigger reconnect loop
- Silently discard server "keepalive" frames alongside existing "pong"

English-first QC (P1):
- _assert_can_approve blocks target language approval until source is APPROVED
- PRODUCTION/ADMIN roles bypass the gate
- Source VTT edits reset stale APPROVED/PENDING_REVIEW/IN_REVIEW target states
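
The approval gate might look roughly like this. The bypass roles and the APPROVED requirement are from the commit; the status enum values, the argument list, and the use of `PermissionError` are assumptions, since the real `_assert_can_approve` signature is not shown here.

```python
from enum import Enum


class QCStatus(str, Enum):
    PENDING_REVIEW = "pending_review"
    IN_REVIEW = "in_review"
    APPROVED = "approved"


BYPASS_ROLES = {"production", "admin"}  # from the commit: these roles skip the gate


def assert_can_approve(role: str, language: str, source_language: str,
                       statuses: dict) -> None:
    """Raise PermissionError if a target language is approved before the source."""
    if role in BYPASS_ROLES or language == source_language:
        return
    if statuses.get(source_language) != QCStatus.APPROVED:
        raise PermissionError(
            f"{source_language} must be approved before {language}"
        )
```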

Tests (all passing):
- backend/tests/unit/test_language_qc_english_first.py (15 cases)
- backend/tests/unit/test_routes_jobs_org_isolation.py (12 cases)
- backend/tests/unit/test_review_notes_org_isolation.py (16 parametrized cases)
- backend/tests/unit/test_vtt_versions_org_isolation.py (16 parametrized cases)
- backend/tests/unit/test_websocket_org_isolation.py (11 cases)
- backend/tests/unit/test_admin_users_org_filter.py (7 cases)
- frontend: useJobStatusWebSocket.terminal.test.ts (9 cases)
- frontend: useJobStatusWebSocket.heartbeat.test.ts (9 cases)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 11:43:10 +01:00
Vadym Samoilenko
98764f5065 fix(tts-worker): make concurrency configurable via TTS_WORKER_CONCURRENCY env var
Hardcoded --concurrency=8 with 512MB memory limit caused 1162+ OOM restarts.
Default is 2; set TTS_WORKER_CONCURRENCY in .env.production to override.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 10:22:06 +01:00
Vadym Samoilenko
5d8d992e5a feat(briefs+notify+downloads): fix projects dropdown, add assignee, expand languages, fix PM email, add Download All
- NewBrief: use useAllProjects() (was useProjects('') which never fired)
- NewBrief: expand languages from 12 to 52 options with region variants
- NewBrief: add Assign To dropdown from org members
- Backend: add GET /clients/all-projects endpoint for cross-client project listing
- Backend: add assignee_id to JobBriefCreate/JobBriefResponse models + routes
- notify.py: send completion email to PMs (pm_client_ids) not client user — fixes email never arriving (was looking up users._id by client entity ID)
- Downloads: add Download All button that fetches all files sequentially

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 21:47:28 +01:00
Vadym Samoilenko
3bed598025 fix(glossary+jobs): add debug logging for glossary failures and fix AllJobs filter stale state
- glossary_service: add step-by-step debug/warning logs at each early-return point so
  the exact failure reason is visible in worker logs (project not found, no active version, etc.)
- glossary_service: guard against source_term_lower=None in ahocorasick automaton build
- glossary_service: guard against target_locale=None in _get_translation
- glossary_service: add full traceback to the outer exception catch for easier debugging
- JobsList: fix statusFilter stale state — useEffect now always syncs with URL params,
  clearing the filter when no ?status= param is present (previously the filter was never
  cleared, so navigating from /jobs?status=X to /jobs kept the old filter)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 21:25:41 +01:00
Vadym Samoilenko
713ae46d4a fix(tts): revert pro TTS to gemini-2.5-pro-preview-tts (3.1 pro TTS doesn't exist yet)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 21:01:22 +01:00
Vadym Samoilenko
3fb8dce3ee feat(ai): upgrade Gemini models to 3.1-pro-preview and 3.1-pro-tts-preview
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 21:00:32 +01:00
Vadym Samoilenko
12fe4ebcbb feat(tts): upgrade Gemini TTS model to gemini-3.1-flash-tts-preview
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 20:57:37 +01:00
Vadym Samoilenko
43ef3a6cd8 fix(migrations): correct listCollections cursor parsing, add processing_failed+cancelled to status enum
Previous migrations used async-for on a dict (Atlas returns firstBatch, not
async cursor) — silently failed. New migration reads firstBatch correctly and
sets the complete status list.
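
For illustration, reading the reply of a raw `listCollections` command could be sketched like this. The `cursor.firstBatch` shape is the standard command reply; the helper name is hypothetical and the sketch ignores any further batches.

```python
from typing import Any


def collection_infos(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Return collection entries from a raw listCollections command reply."""
    # The reply is a plain dict shaped like {"cursor": {"firstBatch": [...]}},
    # so it must be indexed, not iterated with `async for`.
    return list(result.get("cursor", {}).get("firstBatch", []))
```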

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 20:47:21 +01:00
Vadym Samoilenko
8a1440201e fix(migrations): connect to mongo before running migrations in run.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 20:43:48 +01:00
Vadym Samoilenko
99554173e6 feat(migrations): add run.py entry point for python -m app.migrations.run
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 20:41:52 +01:00
Vadym Samoilenko
2e8cf8269e fix(tts): fetch job_doc before gcs_path call in _generate_language_tts; add cancelled migration
- translate_and_synthesize.py: fetch job_doc from DB right before the combined
  MP3 upload so gcs_path() has the gcs_prefix needed for newer jobs; removes the
  duplicate fetch that existed later in the same function
- migration_2026-04-30-000001: add 'cancelled' to MongoDB $jsonSchema validator
  enum so cancel_job writes no longer fail Document validation
- Dashboard.tsx: include all active processing statuses in the Processing counter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 20:36:03 +01:00
Vadym Samoilenko
f681bd4f53 feat: add Stop Process button to cancel in-progress jobs
Adds POST /jobs/{id}/cancel endpoint that revokes the Celery task and
sets status to 'cancelled'. Shows a confirmation widget in the job
detail sidebar for admin/production roles when the job is in an active
processing state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 19:50:39 +01:00
Vadym Samoilenko
08a8a0d636 fix(tts): convert lameenc bytearray to bytes before GCS upload
lameenc.encode() returns bytearray, but google-cloud-storage's
_to_bytes() only accepts bytes/str — causing TypeError on every
upload_from_string() call. Cast to bytes() before returning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 19:35:28 +01:00
Vadym Samoilenko
77a9d3b255 fix(docker): add ffmpeg to base image — fixes pydub AudioSegment in worker
ffmpeg was missing from the base image, causing all pydub operations
(AudioSegment.from_file, export) to fail in worker and tts-worker containers.
Moved ffmpeg install from whisper-worker stage to the shared base stage so
all container variants (api, worker, tts-worker, whisper-worker) have it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 19:12:57 +01:00
Vadym Samoilenko
7c15acc18a chore: update poetry.lock after adding lameenc dependency
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 18:34:04 +01:00
Vadym Samoilenko
a53cf960ae fix(tts): replace pydub MP3 export with lameenc (pure Python, no system ffmpeg)
Gemini TTS _pcm_to_mp3 used pydub.AudioSegment.export(format='mp3') which
requires a system ffmpeg binary. Worker containers don't have ffmpeg installed
(video ops run on Cloud Run). Switch to lameenc which is pure Python and
encodes PCM→MP3 without any system binary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 18:24:15 +01:00
Vadym Samoilenko
b0a90777ed fix(ts): cast job.error to string before rendering in failure banner
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 18:03:04 +01:00
Vadym Samoilenko
efa2395527 feat: inline title rename in JobDetail and QCDetail
Click the pencil icon next to the job title to rename it inline.
Enter saves, Escape or blur cancels. Available for admin/production/PM.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 17:52:43 +01:00
Vadym Samoilenko
0badae9e5d feat(jobs-list): add per-row Edit (rename) and Delete buttons
- Edit button opens inline rename modal with Enter/Escape support
- Delete button shows confirmation modal with clear warning about
  permanent removal from storage and database
- Both actions available for admin/production/project_manager roles
- Delete uses existing single-job DELETE endpoint (GCS + MongoDB)
- Rename uses existing PATCH endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 17:49:51 +01:00
Vadym Samoilenko
5db01248b6 fix: pass USE_CELERY_FALLBACK to containers and show real error in failure UI
- docker-compose.yml: add USE_CELERY_FALLBACK env var to api and worker
  services so cloud_run_dispatch uses Celery on optical-dev
- JobDetail.tsx: show actual error message instead of generic
  "Processing failed at ." when failure step is unknown; also show
  job.error string when no structured failure object exists

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 17:48:02 +01:00
Vadym Samoilenko
37873c433d fix(deploy): set USE_CELERY_FALLBACK=true on optical-dev — no Cloud Run Jobs here
google.cloud.run_v2 is not installed; optical-dev dispatches pipeline tasks
via local Celery workers, not Cloud Run Jobs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 17:14:45 +01:00
Vadym Samoilenko
105895dd14 feat: apply EN source VTT changes to all target languages
When a reviewer saves the source language VTT during QC and confirms
the re-translate dialog, all target languages are re-translated via
Celery. Job transitions to `translating` and returns to `pending_qc`
when done. Existing polling in useJob covers progress display.

- schemas/job.py: add `retranslate_languages: bool` to VttUpdateRequest
- audit_log.py: add VTT_RETRANSLATE audit action
- translate_and_synthesize_task: accept languages/retranslate params,
  filter to specified languages, skip video render, return to PENDING_QC
- routes_jobs.py: add _trigger_retranslation helper, call after VTT save
- types/api.ts: add retranslate_languages to VttUpdateRequest
- useJob.ts: invalidate all lang VTTs on retranslate
- QCDetail.tsx: confirmation dialog when saving source VTT with targets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 17:13:06 +01:00
Vadym Samoilenko
ce4b3b0d95 fix(frontend): prevent premature downloads fetch before job has outputs
- Guard useJobDownloads with !!jobStatus so it never fires when job is
  still loading (status undefined on first render)
- Expand EARLY_STATUSES to cover translating/tts_generating/rendering_*
  which also have no outputs yet
- Remove Downloads.tsx hack that locked downloads to completed-only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 16:54:01 +01:00
Vadym Samoilenko
46477b7b32 fix(deploy): target sites-enabled instead of sites-available for Apache Include injection
On optical-dev the Apache vhost is a standalone file in sites-enabled (not
a symlink to sites-available), so injecting the Include into sites-available
had no effect and the ProxyPassMatch rules were never loaded by Apache.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 16:32:23 +01:00
Vadym Samoilenko
31199f8705 chore: push all session changes — backend hardening, tests, apache config, deploy scripts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 15:52:14 +01:00
Vadym Samoilenko
24d93277de fix(deploy): restore original memory limits on ffmpeg/whisper workers
faster_whisper loads its model into RAM at startup regardless of whether
tasks are routed to Cloud Run — reducing the limit to 512M caused OOM kill
on container start. Restored original limits (ffmpeg: 1G, whisper: 2G).

Cloud Run URLs (FFMPEG_SERVICE_URL / WHISPER_SERVICE_URL) remain set so CPU
offload is still active.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 14:32:24 +01:00
Vadym Samoilenko
ec1ce5c13a feat(deploy): offload ffmpeg+whisper to Cloud Run HTTP services on optical-dev
Sets FFMPEG_SERVICE_URL and WHISPER_SERVICE_URL so video_renderer.py and
whisper_transcribe.py route CPU-heavy work to Cloud Run instead of running
ffmpeg/Whisper locally. Both Cloud Run services and IAM (roles/run.invoker
for accessible-video-worker@ and video-accessibility@ SAs) are already
provisioned — only the env vars were missing.

ffmpeg-worker container: 1G/0.5CPU → 256M/0.25CPU (HTTP dispatcher only)
whisper-worker container: 2G/0.5CPU → 512M/0.25CPU (HTTP dispatcher only)

Expected outcome: ffmpeg-worker drops from 51% CPU / 97% RAM to < 5% CPU.
Server load avg should fall from ~2.2 to ~1.0-1.3.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 14:28:58 +01:00
Vadym Samoilenko
5fd370c093 test: fix all unit tests — 168 passing, 0 failures
- conftest.py: set required env vars before app import to prevent Settings() crash
- gcs.py: lazy bucket init checks _bucket instead of _client; add @bucket.setter
- vtt.py: fix float precision in _format_timestamp; include empty-text cues in parser
- security.py: guard verify_password against empty hash (passlib UnknownHashError)
- tts.py: _parse_timestamp raises ValueError("Invalid timestamp format: …")
- emailer.py: HTML-escape job_title in _render_completion_template (XSS fix)
- test_emailer.py: rewrite for Mailgun-based service (replaced SendGrid)
- test_gcs.py: fix UploadFile constructor, MIME type, remove executor.submit mock
- test_gemini.py: patch module-level client instead of non-existent genai.upload_file;
  translate_vtt tests use numbered-list mock responses matching new implementation
- test_tts.py: fix aiohttp async CM mock pattern; fix error message match
- test_models.py: update JobCreate to use source_is_english instead of language
- test_security.py: set jwt_access_ttl_min in token test
- test_cross_tenant_isolation.py: add patch to imports
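
The emailer XSS fix listed above comes down to escaping user-supplied text before interpolating it into HTML. A minimal sketch, with a hypothetical helper name and markup (the real `_render_completion_template` is not shown in this log):

```python
import html


def render_job_title(job_title: str) -> str:
    """Return an HTML heading with the job title safely escaped."""
    # html.escape converts &, <, > and quotes to entities.
    return f"<h1>{html.escape(job_title)}</h1>"
```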

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 14:02:04 +01:00
Vadym Samoilenko
90cbf23f0d chore: remove obsolete deploy scripts (ai-sandbox era)
deploy.sh and full-deploy.sh predate the optical-dev setup and reference
old URLs/compose files. deploy-dev.sh is the single source of truth.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 12:10:57 +01:00
Vadym Samoilenko
5e55d9f27a fix(deploy): add reservations to workers in optical-dev — prevent limit < reservation OOM error
whisper-worker base has reservation 4G, optical-dev limit 2G causes Docker error.
Added explicit reservations to all three pipeline workers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 12:07:52 +01:00
Vadym Samoilenko
d5e63129dd feat(upload): PR-3 GCS resumable chunked upload for large videos
Files >100 MB bypass the load balancer via browser→GCS direct upload:
- POST /jobs/upload/init — creates GCS resumable session, returns job_id + session URI
- POST /jobs/upload/complete — verifies GCS object, creates job, dispatches ingestion
- Frontend sends 8 MB chunks with Content-Range directly to GCS session URI
- infra/gcs-cors.json + deploy-dev.sh ensure_gcs_cors() enable browser CORS on bucket
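
The chunking arithmetic behind the `Content-Range` headers can be sketched as below. The frontend does this in the browser; this sketch only shows the offsets. The 8 MB chunk size is from the commit and is taken here to mean 8 MiB, which is an assumption, as is the helper name.

```python
from typing import Iterator, Tuple

CHUNK_SIZE = 8 * 1024 * 1024  # from the commit: 8 MB chunks (assumed to be 8 MiB)


def content_ranges(total_size: int,
                   chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, Content-Range header value) for each upload chunk."""
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1  # end offset is inclusive
        yield start, end, f"bytes {start}-{end}/{total_size}"
        start = end + 1
```

Each yielded header value would accompany one PUT of the corresponding byte slice to the session URI returned by `/jobs/upload/init`.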

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 11:35:13 +01:00
Vadym Samoilenko
4edd4da0b2 fix(deploy): optical-dev deploy script and Apache config ready for production
deploy-dev.sh:
- BUILD_SERVICES now includes tts-worker, ffmpeg-worker, whisper-worker (enabled
  in docker-compose.optical-dev.yml via USE_CELERY_FALLBACK=true)
- ensure_apache_modules(): enables proxy, proxy_http, proxy_wstunnel, rewrite
- Apache fragment: WS proxy (ws://) placed BEFORE HTTP /api/ proxy (required
  for correct longest-match precedence in Apache)
- Added ProxyTimeout 600 (10 min) and LimitRequestBody 2147483648 (2 GB) for
  large video uploads; disablereuse=on for WS pool correctness
- Fragment always regenerated on deploy (picks up PORT/WEBROOT changes)
- Logs command uses full $COMPOSE variable instead of hardcoded partial flags

deploy/apache-video-accessibility.conf:
- Static reference copy of the Apache fragment with inline comments explaining
  each directive

.env.production:
- Updated remaining ai-sandbox.oliver.solutions URLs to optical-dev.oliver.solutions
  (API_BASE_URL, COOKIE_DOMAIN, CLIENT_BASE_URL, AZURE_REDIRECT_URI, CORS_ORIGINS)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 11:24:40 +01:00
Vadym Samoilenko
c1948ea198 feat(ux): T-2/PR-7/PR-8 — status color helper, queue stats widget, upload-final-VTT override
T-2: Extract getJobStatusColor() into utils/jobStatusMessages.ts; StatusBadge now uses the
     shared helper (single source of truth for badge colors).

PR-7: GET /admin/production/queue-stats — returns Celery queue depths via Redis LLEN.
      Production dashboard shows a live panel (10s refresh) with per-queue task counts.

PR-8: POST /admin/production/jobs/{id}/upload-final-vtt — Production/Admin can upload a
      hand-crafted VTT to bypass AI, writing to GCS and advancing the job to PENDING_QC.
      Upload modal added to FailuresList with language + type (captions/ad) selectors.

docker-compose.optical-dev.yml: enable USE_CELERY_FALLBACK=true, set worker replicas=1
      for all pipeline workers (ffmpeg/tts/whisper) with WORKER_CONCURRENCY=2 so the full
      pipeline runs on the 2-CPU optical-dev server until Cloud Run VPC Connector is ready.

Fix: remove unused effectiveMs variable in TimelinePreview (TS6133).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 11:12:36 +01:00
Vadym Samoilenko
e4b350cd7d feat(ux): R-8 linguist language warn, PM CC editing, timeline right-click + CC insert
R-8 — Linguist language competence:
- Add User.languages[] BCP-47 field to backend model + UserResponse schema
- Frontend: show amber warning in assign modal when selected linguist has no
  competence listed for the target language

PM VTT editing (FinalDetail):
- PM and ADMIN can now edit captions/AD in the final review stage
- VttEditor becomes read-write with onCueSave wired to updateVttMutation
- Other roles remain read-only

Timeline right-click + add pause:
- Right-click anywhere on the timeline opens a context menu showing the timestamp
- If near a pause point marker: "Edit timing" + "Regenerate TTS" options
- If on empty space: "Add AD cue at Xs" → inserts a new AD cue in the editor
- Pause point markers widened from 1px → 2px (3px on hover) for easier clicking
- Right-click on a pause point marker directly opens the editor

VttEditor insertAtTimeMs prop:
- New prop triggers programmatic insert at a specific video timestamp
- Used by the timeline right-click "Add AD cue here" action

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 10:51:31 +01:00
Vadym Samoilenko
518796c852 fix(vtt-editor): always-visible insert buttons + gap insert rows for silent sections
- Remove hover gate on insert/delete action buttons — all 3 buttons now permanently
  visible when !readOnly so the insert affordance is clear on touch and small screens
- Add GapInsertRow: a clickable dashed bar shown before the first cue (when gap > 0.5s)
  and between any two cues with a gap > 0.5s — directly addresses the case where music
  or silence precedes the first caption (e.g. 0:00–24.5s gap in the Command Strip video)
- Fix: insertCue now calls saveCue immediately so the placeholder cue persists even if
  the user navigates away before typing text

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 10:43:24 +01:00
Vadym Samoilenko
3f557724d3 feat(api): L-18 blocked-on-source, PR-10 promote-to-qc, R-12 reviewed_cues reset
- POST /{job_id}/actions/blocked_on_source (L-18): linguist/reviewer flags a source
  video issue; moves job to QC_FEEDBACK and records blocked_on_source_reason/at/by
- POST /{job_id}/actions/promote_to_qc (PR-10): production/admin manually bypasses
  AI processing for edge-case failures; adds audit history entry
- Reset reviewed_cues to 0 on submit_for_review (R-12) so reviewer must re-acknowledge
  all cues after each linguist resubmit
- Add assert_job_in_user_org + get_user_org_ids to core/dependencies.py (used by
  the new endpoints and the cross-tenant isolation test suite)
- Remove unused ingest_and_ai_task / translate_and_synthesize_task imports

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 10:38:39 +01:00
Vadym Samoilenko
ff372c7322 fix(security): close MT-17/18/19, restore cross-tenant tests, quick wins
Blocks 1–5 of stabilization plan:

SECURITY
- validation.py: restore settings.upload_max_video_bytes (T-14 regression fix)
  and JSON object key validation that was incorrectly removed
- MT-18: add accessible_org_ids filter to list_for_reviewer/list_for_linguist
  so reviewers/linguists only see jobs from their own org in QC queue
- MT-17: add Membership.team_ids[], write to it on invitation acceptance and
  direct team add/remove; migration backfills from Team.member_user_ids
- MT-19: validate all target_team_ids belong to invitation's org_id at creation

TESTS
- Restore test_cross_tenant_isolation.py (was deleted, only .pyc remained)
- Extend with MT-18 reviewer org isolation tests

QUICK WINS
- W-8: remove time.sleep(1) + dead debug block from POST /jobs (task was
  undefined — would have caused NameError → HTTP 500 on every job creation)
- T-13: warn at startup when REDIS_URL configured but connection failed
- T-16: skip language_qc lifespan migration when count=0 (no DB scan on startup)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 10:32:23 +01:00
Vadym Samoilenko
812a2bffce fix(frontend): remove /api suffix from VITE_API_BASE_URL (api.ts appends /api/v1 itself)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:32:15 +01:00
Vadym Samoilenko
9413200681 fix(login): replace placeholder support email with actual contact
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:29:26 +01:00
Vadym Samoilenko
8e33b413a3 fix(frontend): update .env.production URLs to optical-dev.oliver.solutions
API base URL and MSAL redirect URI were pointing to old ai-sandbox host,
causing Microsoft auth popup to redirect back to the wrong domain.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:28:57 +01:00
Vadym Samoilenko
2ab5a6f681 fix(frontend): remove unused useRetryTts; npm audit fix — 0 vulnerabilities
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:25:18 +01:00
Vadym Samoilenko
5679a38f1e fix(ts): resolve 5 TypeScript errors blocking frontend build
- QCDetail: remove unused commentsQuery variable
- BriefDetail: remove unused navigate import and assignment
- JobDetail: import type JobFailure, remove unused handleRetryTts
- NewJob: sdh_vtt fallback to false (boolean | undefined → boolean)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:22:55 +01:00
Vadym Samoilenko
ea30425a63 fix(migrations): version/description as class vars, not instance vars in Migration base
__init__ was setting self.version = "0000-00-00-000000" on every instantiation,
overriding the subclass class variable. All migrations were recorded in DB
with the default version instead of their own, causing duplicate key errors.
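
The shadowing problem is easy to reproduce in isolation. In this sketch the class names other than `Migration` are hypothetical; only the default version string and the bug pattern are taken from the commit.

```python
class Migration:
    # Class-level defaults: subclasses override these by redeclaring them.
    version = "0000-00-00-000000"
    description = ""


class BrokenMigration:
    version = "0000-00-00-000000"

    def __init__(self) -> None:
        # Bug pattern from the commit: the instance attribute shadows
        # whatever the subclass declared at class level.
        self.version = "0000-00-00-000000"


class AddCancelledStatus(Migration):  # hypothetical migration name
    version = "2026-04-30-000001"


class AddCancelledStatusBroken(BrokenMigration):
    version = "2026-04-30-000001"
```

Instantiating `AddCancelledStatusBroken` yields the default version because attribute lookup finds the instance attribute first; `AddCancelledStatus` keeps its own version.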

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:16:12 +01:00
Vadym Samoilenko
89fa87ba8a refactor(docker): remove ffmpeg from api/worker images — runs on Cloud Run Jobs
Heavy pipeline tasks (ingest, translate, render, tts) now dispatch to
va-worker Cloud Run Job which has its own Dockerfile.cloudrun with ffmpeg.
API and lightweight Celery worker (notify/embed) don't need it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:08:25 +01:00
Vadym Samoilenko
f4a82dcf76 fix(migrations): replace relative imports with absolute in PR-7 migrations
Migration runner executes scripts outside package context — relative
imports fail. Pattern matches all other migration files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:05:32 +01:00
Vadym Samoilenko
1e5a07b06e fix(deploy): change API host port to 8012 (8010 also occupied)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:02:44 +01:00
Vadym Samoilenko
582f8ad2e8 fix(deploy): change API host port 8003→8010, move image to video-accessibility repo
Port 8003 is occupied by infra-api-1 on optical-dev server.
Artifact Registry repo renamed from nexus to video-accessibility.
cloudbuild.yaml defaults _TAG to 'latest' for manual runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:02:14 +01:00
Vadym Samoilenko
b3ace22009 feat(infra): move heavy workers to Cloud Run Jobs
Heavy pipeline tasks (ingest, translate, render, rerender) now dispatch
to a Cloud Run Job (va-worker) instead of local Celery workers. optical-dev
runs only api + lightweight worker (notify/embed) within its 2-CPU budget.

- backend/app/tasks/runner.py — Cloud Run Job entrypoint
- backend/app/services/cloud_run_dispatch.py — replaces .delay() for heavy tasks
- backend/Dockerfile.cloudrun — Cloud Run worker image (ffmpeg included)
- docker-compose.optical-dev.yml — 2-CPU safe overrides, disables heavy workers
- cloudbuild.yaml — builds va-worker image and updates Cloud Run Job
- deploy-dev.sh — uses 3-file compose, builds only api+worker locally
- routes_jobs, routes_admin_production, ingest_and_ai, translate_and_synthesize
  — all dispatch sites updated to use cloud_run_dispatch.dispatch()

USE_CELERY_FALLBACK=true in .env.local to use Celery locally during dev.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 21:47:10 +01:00
Vadym Samoilenko
f723e3f0bc chore(deploy): add whisper-worker, --redeploy flag, usage hints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 21:36:45 +01:00
Vadym Samoilenko
c7eaa7a952 chore: add deploy-dev.sh for optical-dev deployment
Sequential image builds (one at a time to avoid OOM), auto Apache
fragment, migrations, frontend rsync, smoke test. Flags:
  --skip-build / --skip-frontend / --skip-migrations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 21:35:19 +01:00
Vadym Samoilenko
49835f9b0c feat(pr7): final hardening — MT-11..MT-16, W-12..W-14, GCS org-prefix
Closes all remaining multi-tenant security gaps and adds production UX:

Security (MT-11/12/13/15/16):
- Cross-org assignment guard in language_qc for linguist/reviewer slots
- Remove PM/CLIENT bypass from _assert_client_access
- Bind all 8 glossary handlers to MembershipContext + OrgRole check
- Consolidate authz: get_job_or_403, assert_user_in_org, OrgScopedQuery in list_jobs
- JWT access tokens now carry org_ids hint claim (transient, not authoritative)

GCS org-prefix (MT-14):
- gcs_prefix field on Job: orgs/{org_id}/jobs/{job_id} for new jobs
- gcs_path() helper — falls back to legacy {job_id}/ for old jobs
- Rewrote 30+ hardcoded GCS path sites across tasks and routes
- Operator script tools/migrate_gcs_org_prefix.py (copy-verify-delete, resumable)

Failure recovery (W-13/14):
- Unified JobFailure schema: step/type/message/retriable/occurred_at/retry_count
- PROCESSING_FAILED status; legacy TTS_FAILED/RENDER_FAILED kept for back-compat
- Fix: translation-phase exceptions now record step="translation" not "tts"
- Generic POST /jobs/{id}/retry dispatches by failure.step
- GET /admin/production/failures + POST /admin/production/bulk-retry (cap 50)
- FailureBanner in JobDetail, failures badge in Sidebar

Job Brief workflow (W-12):
- JobBrief model + 6 CRUD endpoints (list/create/get/patch/submit/approve)
- create_job accepts brief_id Form param; copies org/deadline/project; marks FULFILLED
- BriefsList, NewBrief, BriefDetail UI; NewJob pre-fills from ?brief_id=
- Briefs badge in Sidebar for submitted briefs

Migrations: 2026-04-29-000000 (failure indexes) + 2026-04-29-000001 (job_briefs)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 20:55:50 +01:00
Vadym Samoilenko
4623b89aeb feat(mt-16): JWT org_ids claim + transient user.org_ids in deps
- create_access_token gains optional org_ids: list[str] param; encodes
  {exp, sub, org_ids, v:2} — org_ids is a prefilter hint only, never
  used as authorization source of truth (Redis cache is authoritative)
- Login, MS login, refresh endpoints: fetch memberships and include
  org_ids in issued access tokens via _get_user_org_ids() helper
- routes_invitations.py accept flow: same org_ids population on token
- get_current_user: reads org_ids from payload, attaches as transient
  user.__dict__["org_ids"] — available to OrgScopedQuery for prefilter
- Force logout: rotate JWT_SECRET env var at deployment time (no code
  change needed; all existing tokens immediately invalidated)
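
A rough sketch of an access token that carries org_ids as a hint, assuming HS256 and using only the standard library rather than the JWT library the project actually uses. The function names mirror the commit message, but the signatures, the TTL default, and read_org_ids_hint are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import List, Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def create_access_token(sub: str, secret: str, org_ids: Optional[List[str]] = None,
                        ttl_seconds: int = 900) -> str:
    """Encode {exp, sub, org_ids, v: 2}; org_ids is a prefilter hint only."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": sub, "exp": int(time.time()) + ttl_seconds, "v": 2}
    if org_ids is not None:
        payload["org_ids"] = list(org_ids)
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_sign(signing_input, secret)}"


def read_org_ids_hint(token: str, secret: str) -> List[str]:
    """Verify the signature and return the hint (expiry check omitted for brevity)."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _sign(signing_input, secret)):
        raise ValueError("bad signature")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded)).get("org_ids", [])
```

Because the signature depends on the secret, a token issued under the old secret fails verification once the secret is rotated, which is the force-logout behaviour the last bullet relies on.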

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:46:39 +01:00
Vadym Samoilenko
54fcf47887 feat(mt-14): gcs_prefix on Job, gcs_path helper, rewrite path sites
- gcs_path(job, *parts) helper in gcs.py: uses job.gcs_prefix if set,
  falls back to job._id (legacy) — backward-compatible for all old jobs
- create_job: sets gcs_prefix=orgs/{org_id}/jobs/{job_id} when
  organization_id is known; legacy jobs without org get null prefix
- Rewrote hardcoded f"{job_id}/{lang}/..." paths in:
  - ingest_and_ai.py (4 upload sites)
  - translate_and_synthesize.py (9 sites via bulk regex)
  - render_accessible_video.py (3 sites: segments, video, captions)
  - rerender_accessible_video.py (3 sites)
- tools/migrate_gcs_org_prefix.py: idempotent operator script —
  preflight checks, copy→verify(count+md5)→mongo update→delete,
  ThreadPoolExecutor(4), resume file, dry-run + rollback modes
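
One way the gcs_path fallback could look, as a sketch only. The Job stand-in below is an assumption (the real model uses _id and has many more fields), and the join logic is a guess at the behaviour described, not the code in gcs.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    # Simplified stand-in for the real Job model (assumption).
    id: str
    gcs_prefix: Optional[str] = None


def gcs_path(job: Job, *parts: str) -> str:
    """Join parts under the org-scoped prefix, or under the legacy job-id root."""
    root = job.gcs_prefix or job.id
    return "/".join([root.rstrip("/"), *parts])


new_job = Job(id="j1", gcs_prefix="orgs/o1/jobs/j1")
legacy_job = Job(id="j1")
```

Call sites then stop building f"{job_id}/{lang}/..." strings by hand and pass the path components instead, so old and new jobs resolve through the same helper.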

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:45:12 +01:00
Vadym Samoilenko
fe608401be feat(w-12): brief workflow UI — list, create, detail, NewJob pre-fill
- BriefsList.tsx: table with status badge, submitted badge count
- NewBrief.tsx: form with title, description, outputs, language picker,
  deadline, project selector; calls POST /briefs
- BriefDetail.tsx: status actions — Submit (DRAFT), Approve (SUBMITTED,
  admin/PM), Create Job link (?brief_id=) for APPROVED briefs
- NewJob.tsx: reads ?brief_id, fetches brief via useBrief, pre-fills
  languages/outputs/deadline/project_id; sends brief_id in FormData
- Sidebar: Briefs link (client/production/admin/PM) with submitted-count
  badge from useBriefs()
- JobCreateRequest type: brief_id optional field
- briefs API methods: listBriefs, createBrief, getBrief, submitBrief,
  approveBrief; hooks: useBriefs, useBrief, useCreateBrief,
  useSubmitBrief, useApproveBrief

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:41:49 +01:00
Vadym Samoilenko
595897e61a feat(w-12): JobBrief model, endpoints, migration + brief→job linkage
- JobBrief model (DRAFT→SUBMITTED→APPROVED→FULFILLED) with 6 CRUD
  endpoints: list, create, get, patch (DRAFT only), submit, approve
- All endpoints use MembershipContext; read=VIEWER, mutate=MANAGER,
  approve=ADMIN for org-scoped access
- create_job accepts brief_id Form field; validates APPROVED brief,
  copies organization_id/project_id/deadline from brief, marks brief
  FULFILLED after job insert
- organization_id now populated from project client_id on job create
  (fixes missing multi-tenant field on new jobs)
- migration_2026-04-29-000001: job_briefs collection + 4 indexes
- Wired briefs router into main.py
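
The brief lifecycle above can be sketched as a small transition table. The status names come from the commit message; the helper names and the error type are hypothetical.

```python
# Allowed forward transitions: DRAFT -> SUBMITTED -> APPROVED -> FULFILLED.
ALLOWED_TRANSITIONS = {
    "DRAFT": {"SUBMITTED"},
    "SUBMITTED": {"APPROVED"},
    "APPROVED": {"FULFILLED"},
    "FULFILLED": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move brief from {current} to {target}")
    return target


def can_patch(status: str) -> bool:
    """Per the message, only DRAFT briefs may be patched."""
    return status == "DRAFT"
```

Under this reading, create_job would call something like transition("APPROVED", "FULFILLED") after the job insert and refuse any brief that is not yet APPROVED.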

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:38:08 +01:00
Vadym Samoilenko
a945653e73 feat(w-14): bulk failures dashboard + sidebar badge
- GET /admin/production/failures: list failed jobs filtered by step/org
- POST /admin/production/bulk-retry: dispatch retry for up to 50 jobs
  with "auto" (from failure.step) or "from_scratch" strategies
- FailuresList.tsx: accordion-grouped by error type, multi-select,
  bulk retry action, step label, retry count (red >3), updated date
- Sidebar: "Failures" item with live badge for production/admin roles
  (polls useJobs with processing_failed,tts_failed,render_failed)
- New useFailures / useBulkRetry hooks

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:36:30 +01:00
Vadym Samoilenko
264561895e feat(w-13): generic /jobs/{id}/retry endpoint + unified failure UI
- POST /jobs/{job_id}/retry dispatches correct pipeline task based on
  failure.step: ingestion/ai_processing → ingest_and_ai_task,
  translation/tts → translate_and_synthesize_task, render → rerender
- Increments retry_count, writes JOB_RETRY audit log entry
- Adds processing_failed to JobStatus type; JobFailure interface on Job
- Replaces TTS-only retry block with FailureBanner showing step/message/
  retry_count for all failed statuses (processing_failed, tts_failed,
  render_failed); Escalate mailto link for high-retry-count cases
- useRetryJob hook + apiClient.retryJob() call new endpoint
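
A sketch of dispatching a retry by failure.step, assuming the failure is available as a plain dict. The task names are copied from the message as written; the lookup function and its error handling are assumptions.

```python
# Step -> pipeline task, as listed in the commit message.
STEP_TO_TASK = {
    "ingestion": "ingest_and_ai_task",
    "ai_processing": "ingest_and_ai_task",
    "translation": "translate_and_synthesize_task",
    "tts": "translate_and_synthesize_task",
    "render": "rerender",
}


def task_for_failure(failure: dict) -> str:
    """Pick the task to re-dispatch for a recorded failure."""
    step = failure.get("step")
    if step not in STEP_TO_TASK:
        raise ValueError(f"no retry task for failure step {step!r}")
    return STEP_TO_TASK[step]


def next_retry_count(failure: dict) -> int:
    """Retry count after this attempt (a missing count is treated as 0)."""
    return int(failure.get("retry_count") or 0) + 1
```

Keeping the mapping in one table means a bulk retry with the "auto" strategy could reuse the same lookup per job.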

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:33:50 +01:00
Vadym Samoilenko
dca1ca9c8c feat(w-13): structured failure handlers in tasks; fix translation→TTS_FAILED bug
ingest_and_ai: exception handler now sets status=PROCESSING_FAILED and
writes Job.failure{step, type, message, retriable, occurred_at} instead
of leaving status unchanged (was silent failure).

translate_and_synthesize: replace the blanket TTS_FAILED status (even for
translation failures) with PROCESSING_FAILED + failure.step="translation"|
"tts" based on current job status at failure time.

render_accessible_video: add failure{step="render"} alongside existing
RENDER_FAILED status for UI consumption.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:28:37 +01:00
Vadym Samoilenko
3e3be935c6 feat(w-13): structured Job.failure schema, PROCESSING_FAILED status, audit actions
Add JobFailure model (step, type, message, retriable, occurred_at,
retry_count) to job.py. Add PROCESSING_FAILED to JobStatus (legacy
TTS_FAILED/RENDER_FAILED preserved for back-compat).

Add missing Job fields that existed in DB but not the Pydantic model:
organization_id, brief_id, gcs_prefix, initial_linguist_id,
initial_reviewer_id, failure, retry_count.

Add JOB_TASK_FAILED, JOB_RETRY, JOB_BULK_RETRY to AuditAction enum.

Add migration 2026-04-29-000000: processing_failed in schema validator +
compound indexes (failure.step/status) and (status/org_id/created_at).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:27:28 +01:00
Vadym Samoilenko
38038862c9 refactor(mt-15): consolidate authz in routes_jobs and dependencies
list_jobs now uses MembershipContext (Redis-cached, 60s TTL) to build
org-scoped queries instead of per-request memberships.find(). Falls back
to legacy get_accessible_project_ids for users with no memberships.

get_job replaces the role-specific CLIENT/PM access check with
get_job_or_403() which uniformly checks organization_id membership for
all roles (returns 404 not 403 to avoid leaking cross-org job existence).

get_accessible_project_ids in dependencies.py now uses _cached_memberships
from authz.py, eliminating the duplicate uncached DB query.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:26:07 +01:00
Vadym Samoilenko
5209f04318 feat(mt-13): bind glossary handlers to client_id via org membership check
All 8 glossary route handlers now verify the requesting user has org
membership in the target client_id using assert_user_in_org() from
core/authz.py. Read endpoints require VIEWER, mutations require MANAGER,
archive requires ADMIN (org-level). Removed dead _assert_can_read() and
_require_client_staff() helpers. Removed unused require_roles/User/UserRole
imports. Also added get_job_or_403() to authz.py for MT-15.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:24:41 +01:00
Vadym Samoilenko
b2d524e702 fix(mt-12): remove PM/CLIENT legacy bypass in _assert_client_access
The unconditional `if user.role in (CLIENT, PROJECT_MANAGER): return`
allowed any PM to access any client regardless of membership. Removed;
kept pm_client_ids legacy fallback for pre-migration users.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:22:56 +01:00
Vadym Samoilenko
312af2d7fb feat(mt-11): cross-org assignment guard in language_qc
Prevent PM in org A from assigning linguist/reviewer from org B.

Added _assert_user_in_job_org() helper that resolves job org_id (with
project fallback) and checks db.memberships for the assignee. Also added
assert_user_in_org() and get_job_or_403() to core/authz.py for use in
upcoming MT-13 and MT-15 commits.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:22:46 +01:00
Vadym Samoilenko
08fcb4daa4 feat(pr6): WS real-time updates, per-cue AD playback, upload guard
W-4: team assignment (linguist/reviewer) stored on job at creation,
     auto-assigned to all language QC states on first GET /language-qc
     (lazy init via auto_assign_defaults)

L-3 WS: broadcast_to_job when reviewer opens VTT for editing;
        QCDetail shows "User X is editing [lang]" banner (auto-clears 5s)

R-5: comment broadcast via broadcast_to_job on add_comment();
     QCDetail invalidates comments query on language_qc_comment WS event

L-15: QCDetail subscribes to language_qc_assigned WS event →
      refetches lang-qc data and shows toast

R-7: VttEditor gets onCuePlay prop; AD editor in QCDetail wires
     handleAdCuePlay → switches to accessible video mode, seeks & plays

T-15: beforeunload warning in NewJob while upload is in progress

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 19:42:57 +01:00
Vadym Samoilenko
bdfa0f82ab fix(lint): restore baseline lint count — no new errors introduced
QCDetail.tsx: 4 new `any` types replaced with `unknown` + type casts.
backend: ruff auto-fix sorted imports, removed unused imports, updated Optional[X] → X | None in routes_share + share_token model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 19:16:35 +01:00
Vadym Samoilenko
1317ee7ca4 feat(t6+t7+t11): native captions track, AD audio sync, CSRF protection
T-6: Add Blob URL native <track> in VideoWithCaptions so browser CC button works in fullscreen.
T-7: Sync hidden <audio> AD playback with video play/pause/seeked events.
T-11: Double Submit Cookie CSRF — _set_auth_cookies issues httponly refresh_token + readable csrf_token; /refresh validates X-CSRF-Token header; frontend reads csrf_token cookie and sends header on all refresh calls.
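
A minimal sketch of the Double Submit Cookie check described in T-11, with no web framework involved. Both function names are hypothetical, and the token length is an assumption.

```python
import hmac
import secrets
from typing import Optional


def issue_csrf_token() -> str:
    """Random value to place in the readable csrf_token cookie."""
    return secrets.token_urlsafe(32)


def csrf_ok(cookie_value: Optional[str], header_value: Optional[str]) -> bool:
    """True only when the X-CSRF-Token header echoes the csrf_token cookie."""
    if not cookie_value or not header_value:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(cookie_value.encode("utf-8"), header_value.encode("utf-8"))
```

The scheme works because a cross-site page can make the browser send the cookie but cannot read it, so it cannot copy the value into the header.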

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 19:08:27 +01:00
Vadym Samoilenko
aba43a67d7 feat(l7): diff AI baseline vs current VTT in QCDetail
VttDiffView component (frontend/src/components/VttEditor/VttDiffView.tsx):
- Lazy-loads VTT version list (newest-first) and diffs version 1 (AI baseline)
  against the latest version
- Renders unified diff: green lines = added, red lines = removed (unchanged hidden)
- Collapsed by default; expand with "↔ Diff vs AI baseline" button
- Shows +N/-N change summary in header

QCDetail integration:
- VttDiffView added below both Captions and Audio Description VttEditors
  (only appears for the selected language)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 19:03:25 +01:00
Vadym Samoilenko
dc1cfd01dc feat(l3): optimistic locking for VTT edits (ETag / 409 Conflict)
Backend:
- VttContentResponse gets etag field (SHA1 of captions+AD content)
- VttUpdateRequest gets if_match field (optional)
- GET /jobs/{id}/vtt: computes and returns etag
- PATCH /jobs/{id}/vtt: if if_match present, fetches current content, recomputes
  hash, returns 409 Conflict if mismatch

Frontend:
- VttContentResponse type + VttUpdateRequest type updated
- QCDetail stores vttEtag from GET response
- All updateVttMutation calls pass if_match: vttEtag
- 409 responses show specific "Conflict: another user has modified" message
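
The ETag check can be sketched as below, assuming the tag is a SHA-1 over the captions and AD text. The separator byte and both function names are assumptions; the commit only states that the hash covers captions+AD content.

```python
import hashlib
from typing import Optional


def compute_etag(captions_vtt: str, ad_vtt: str) -> str:
    """SHA-1 over both documents, separated so ("ab", "c") != ("a", "bc")."""
    digest = hashlib.sha1()
    digest.update(captions_vtt.encode("utf-8"))
    digest.update(b"\x00")  # separator byte is an assumption
    digest.update(ad_vtt.encode("utf-8"))
    return digest.hexdigest()


def if_match_ok(current_captions: str, current_ad: str, if_match: Optional[str]) -> bool:
    """False means the caller should answer 409 Conflict."""
    if if_match is None:
        return True  # if_match is optional, so an absent value skips the check
    return compute_etag(current_captions, current_ad) == if_match
```

A client that read the content before someone else saved sends a stale if_match, the recomputed hash differs, and the PATCH is refused instead of silently overwriting the other edit.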

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 19:01:57 +01:00
Vadym Samoilenko
bb751033c0 feat(l1-l6): glossary inline highlights + CPS warning in VttEditor
VttEditor:
- New props: glossaryTerms and language
- Glossary: source_term occurrences underlined (amber) with preferred translation
  tooltip on hover; only terms that have a translation for the current language
- CPS badge: "N CPS" shown in amber when characters per second exceeds 20

QCDetail:
- Fetches active glossary for job's client (getGlossaries → find one with
  current_version_id → getGlossaryTerms up to 500 terms)
- Passes glossaryTerms + language to both Captions and AD VttEditor instances
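
A sketch of the characters-per-second check behind the CPS badge, written in Python for illustration even though the editor itself is a frontend component. How the real code counts characters (whitespace, markup) is not stated, so counting every character is an assumption, as are the names.

```python
CPS_LIMIT = 20.0  # threshold from the commit message


def chars_per_second(text: str, start_s: float, end_s: float) -> float:
    """Characters divided by cue duration in seconds."""
    duration = end_s - start_s
    if duration <= 0:
        # Zero-length or inverted cue: treat any text as unreadably fast.
        return float("inf") if text else 0.0
    return len(text) / duration


def cps_warning(text: str, start_s: float, end_s: float) -> bool:
    """True when the cue should show the amber badge."""
    return chars_per_second(text, start_s, end_s) > CPS_LIMIT
```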

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 18:58:59 +01:00
Vadym Samoilenko
abf81515a4 feat(pm15): share read-only link for client preview
Backend:
- ShareToken model (share_tokens collection)
- POST /jobs/{id}/share — create token (PM/PROD/ADMIN)
- GET /jobs/{id}/share — list active tokens
- DELETE /jobs/{id}/share/{token_id} — revoke token
- GET /public/share/{token} — unauthenticated preview with signed GCS URLs (6h TTL)
  Returns video, captions, AD for all languages

Frontend:
- ShareView.tsx — public page at /share/:token with language switcher, video player, download tiles
- App.tsx — /share/:token route (no auth wrapper)
- QCDetail.tsx — "↗ Share link" button in header → modal to generate + copy link

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 18:56:44 +01:00
Vadym Samoilenko
f1a9e6ee46 feat(pm7): bulk assign linguist/reviewer to all languages in one click
- POST /jobs/{job_id}/languages/bulk-assign — assigns linguist (required) and
  reviewer (optional) across all or selected languages; supports only_unassigned
  flag and optional deadline
- bulkAssignLanguages() added to API client
- QCDetail: "Assign all languages" button in Languages header; opens modal with
  linguist/reviewer dropdowns, deadline, and skip-already-assigned checkbox

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 18:53:14 +01:00
Vadym Samoilenko
1bf0fb9eed feat(pr4+pr5): hotkeys, unified status labels, upload size constant
PR-4 hotkeys (L-9):
- QCDetail: Cmd/Ctrl+S saves current VTT (handleSaveFullVtt)
- QCDetail: Escape closes both reject forms (final review + language reject modal)

PR-5 T-1 (unified status labels):
- Add JOB_STATUS_LABELS and getJobStatusLabel to utils/jobStatusMessages.ts
- JobsList.tsx: remove local STATUS_LABELS duplicate, import from shared util
- StatusBadge.tsx: remove 30-line switch duplicate, use getJobStatusLabel

PR-5 T-14 (unified upload size constant):
- config.py: upload_max_video_bytes = 2GB, upload_signed_url_ttl_hours = 24
- validation.py: use settings.upload_max_video_bytes instead of magic number
- notify.py: use settings.upload_signed_url_ttl_hours for signed URL TTL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 18:42:03 +01:00
Vadym Samoilenko
13db347d65 feat(pr3+pr4): deadline field, job clone, reject categories, reviewed-cues gate
PM-1 (deadline):
- Job model: add deadline field (job-level PM deadline)
- POST /jobs: accept deadline as ISO date form param
- JobsList: deadline column with overdue highlight (red + warning icon)
- NewJob: date picker for deadline field
- useMultiUpload: pass deadline to batch job creation

PM-2 (clone job):
- POST /jobs/{id}/clone: creates config copy in 'created' state, no reupload
- useCloneJob hook, Clone button in JobsList actions
- navigate to cloned job on success

R-4 (reject categories):
- LanguageQCState: add reject_category field
- reject_language service: accept optional category (timing/mistranslation/terminology/profanity/length/other)
- RejectLanguageRequest: add category field
- QCDetail reject modal: category pill-selector before free-text notes

R-2 (reviewed-cues tracking):
- LanguageQCState: add reviewed_cues (int) + total_cues (nullable)
- POST /jobs/{id}/languages/{lang}/mark-cue-reviewed endpoint
- QCDetail: progress bar + approve gated at 80% for reviewer (admin bypasses)
- markCueReviewed API client method

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 18:39:05 +01:00
Vadym Samoilenko
460c6ce091 feat(pr3): PM productivity — server pagination, quick filters, PM access
- JobsList: switch from size:10000 to server-side pagination (PAGE_SIZE=50)
  with page state and numbered pagination controls
- JobsList: move status filter server-side; search/user/date remain client-side
- JobsList: add PM quick-filter presets (Final Review / In QC / Failed)
  shown for project_manager and admin roles
- JobsList: extend canManageJobs, New Job button, and Final Review action
  link to include project_manager role
- NewJob (W-5): autofill job languages from project.default_languages
  when selecting an existing project from the dropdown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 18:28:45 +01:00
Vadym Samoilenko
c7a6f13b10 feat(workflow): PR-2 workflow blockers — PM/Production dashboards, two-stage QC, role routing
Changes:
- Dashboard: add project_manager case (final review / QC counts / new job widgets)
  and production case (AI pipeline / failures widgets)
- Sidebar: add project_manager to Final Review and Audit Log nav items;
  live badge counts for QC Queue (pending_qc) and Final Review (pending_final_review)
- App.tsx: add project_manager to Final Review and Audit Log RoleGates (W-10, PM-18)
- Login: role-based redirect after login — linguist/reviewer → /qc/queue, others → /
- language_qc._assert_can_approve: enforce two-stage QC; remove linguist self-approve
  fallback; require reviewer assignment + submitted_for_review_at (W-6)
- routes_jobs.complete_job: allow project_manager to complete jobs (W-9)
- notify.py: re-enable email notifications (W-7)
- Fix 400 on cue save: treat empty-string audio_description_vtt/captions_vtt as absent
  both in backend (truthy check) and frontend (|| undefined) — root cause was adVtt
  initialising to '' when job has no AD track

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 18:18:24 +01:00
Vadym Samoilenko
a168af1aa7 feat: two-stage QC (linguist→reviewer), project picker, comments, email notifications, deadlines
- Two-stage QC workflow: linguist edits + submits → reviewer approves/rejects per language.
  New statuses: in_progress, pending_review, in_review. New service functions: submit_for_review,
  open_review, assign_reviewer, reassign_reviewer, add_comment. Linguist and reviewer deadlines.
- Reject now resets language to in_progress so linguist can iterate without full re-assignment.
- QC comment threads per language (append-only), visible to all assignees.
- Email notifications via Mailgun on: assignment, submit-for-review, comment, approve, reject.
  Best-effort (failures do not roll back QC actions). asyncio.gather for parallel fan-out.
- New audit actions: LANGUAGE_QC_REVIEWER_ASSIGN/REASSIGN, LANGUAGE_QC_SUBMIT,
  LANGUAGE_QC_OPEN_REVIEW, LANGUAGE_QC_COMMENT.
- Inline project picker in NewJob: "+ Create new project…" option with name, default
  languages, default linguist, default reviewer. Pre-fills languages on the new job.
- Project model extended with default_languages, default_linguist_id, default_reviewer_id.
- RBAC: CLIENT org-members can now create projects (backend guard relaxed).
- LinguistQueue: role toggle "As linguist / As reviewer" + new status tabs.
- QCDetail: two-slot assignment cards (linguist + reviewer), deadline display, role-aware
  action buttons, comments panel with optimistic insert and 15s refetch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 16:59:40 +01:00
Vadym Samoilenko
bfb3a18d65 fix: switch embedding model to gemini-embedding-001
text-embedding-004 and text-multilingual-embedding-002 are not available
through this API key. gemini-embedding-001 (768-dim, multilingual) is.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 16:02:12 +01:00
Vadym Samoilenko
be0bffe459 fix: get_terms_page avoids GlossaryTerm validation on partial projection
Projected docs only have _id/source_term/translations; validating against
GlossaryTerm (which requires glossary_id, version_id, source_term_lower)
caused 500 on the terms endpoint. Return plain dicts instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:57:12 +01:00
Vadym Samoilenko
125c69fb1d fix: audit log user/security endpoints return correct shapes
- /audit-logs/user/{id}: now accepts email OR ObjectId, returns bare array
- /audit-logs/security: returns bare array instead of {logs, hours} wrapper
  Both match AuditLogEntry[] that the frontend expects.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:48:00 +01:00
Vadym Samoilenko
0444e88178 feat: make client and project required when creating a job
- Both fields now show a validation error on submit if not selected
- Labels updated to show required asterisk
- Section always visible regardless of client list length

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:43:50 +01:00
Vadym Samoilenko
ad67089b09 fix: remove duplicate /audit-logs route and align pagination params with frontend
The legacy GET /audit-logs (returning wrong shape) shadowed the proper one.
Removed the duplicate and changed page/size params to skip/limit to match
the AuditLogQuery the frontend sends.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:39:22 +01:00
Vadym Samoilenko
8dc693db54 revert: restore linguist-only filter in assignment dropdown
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:36:08 +01:00
Vadym Samoilenko
bf303586f1 fix: assignment dropdown shows all active internal staff, not just linguists
Querying only role=linguist left the dropdown empty since no active linguist
users exist. Now fetches all active users and filters out clients on the
frontend, so any staff member (PM, reviewer, admin, linguist) can be assigned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:34:38 +01:00
Vadym Samoilenko
dfc9bbe37b fix: guard total_count in AuditLog against undefined before toLocaleString
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:32:07 +01:00
Vadym Samoilenko
e7917cde10 fix: use 'is None' check for Motor collection to avoid NotImplementedError
PyMongo Collection raises NotImplementedError on bool(), so 'if not self.collection'
crashes on every audit log write. Changed to 'if self.collection is None'.
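
The difference between the two checks can be shown without a database. FakeCollection below is a stand-in that only mimics the refusal to be truth-tested; it is not the PyMongo or Motor class.

```python
class FakeCollection:
    """Stand-in that, like a PyMongo Collection, cannot be truth-tested."""

    def __bool__(self):
        raise NotImplementedError("compare with None instead")


def is_ready_buggy(collection) -> bool:
    # Equivalent to `if not self.collection`: calls __bool__ and blows up.
    return bool(collection)


def is_ready(collection) -> bool:
    # Identity check never calls __bool__.
    return collection is not None
```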

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:23:44 +01:00
Vadym Samoilenko
dee4d69b40 fix: raise user list size limit to 500 and guard toLocaleString calls
- routes_admin.py: size query param max raised from 100 → 500 so
  ClientDetail.tsx (size=200) no longer returns 422
- GlossaryDetail.tsx: three .toLocaleString() calls guarded with ?? 0
  to prevent TypeError when term_count is undefined on first render

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:20:38 +01:00
Vadym Samoilenko
e48d63bdbd fix: generate valid ObjectId for audit log entries
default_factory=PyObjectId produced "" (empty string) since
Annotated[str, ...] is a type annotation, not a callable factory.
Replace with lambda: str(ObjectId()) to generate a real unique ID.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:06:15 +01:00
Vadym Samoilenko
0d46c1440c fix: remove unused TypeScript imports and variables
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:43:36 +01:00
Vadym Samoilenko
85e1e852ed fix: add --no-root to poetry install in Dockerfiles (Poetry 2.x)
2026-04-29 14:35:28 +01:00
Vadym Samoilenko
5cd2fb2743 fix: regenerate poetry.lock + align whisper Dockerfile poetry version
poetry.lock was out of sync with pyproject.toml (cost-tracker and
glossary deps added since last lock). Regenerated with Poetry 2.1.4.
Also updated Dockerfile.whisper-service from poetry==1.8.2 to 2.1.4
to match the main Dockerfile and avoid format incompatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:32:41 +01:00
Vadym Samoilenko
a3b300b76a docs: add canonical documentation + audit cleanup
- AGENTS.md: canonical project entry point (Quick Nav, pipeline, constraints)
- docs/: complete docs tree — architecture, API spec, DB schema, infra,
  runbook, requirements, tech stack, principles, reference ADRs, guides,
  tasks backlog, testing strategy
- tests/README.md: test commands, structure, known gaps
- README.md / CLAUDE.md / DEPLOYMENT.md: updated with canonical doc links
- .archive/: backup of pre-documentation-pipeline originals
- backend/uv.lock: uv dependency lockfile
- Delete committed __pycache__ .pyc files (should have been gitignored)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:22:51 +01:00
Vadym Samoilenko
fd154e7799 fix: upgrade poetry in Dockerfile from 1.8.2 to 2.1.4
poetry.lock was generated with 2.1.4 — using 1.8.2 caused
incompatible lock file error and failed Docker builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:21:42 +01:00
Vadym Samoilenko
743a8597c2 fix: auto-sync poetry.lock during Docker build
Prevents build failures when pyproject.toml changes without a lock regen.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:21:01 +01:00
Vadym Samoilenko
4c6624c3d4 fix: code health sweep — M-01 through M-07
M-01 authz.py: move cache_key above try block to avoid NameError when
     first Redis call returns None
M-02 main.py: re-enable validation middleware (was TEMPORARILY DISABLED)
M-03 routes_auth.py / main.py: replace print() debug lines with
     structured logger calls; logger now module-level in routes_auth.py
M-04 gcs.py: asyncio.get_event_loop() → get_running_loop() (deprecation)
M-05 translate_and_synthesize.py: bind loop vars in closure defaults
     to fix B023 ruff warnings (transcreate/translate_captions/etc.)
M-06 rate_limiting.py: only trust X-Forwarded-For when X-Forwarded-Proto
     is https; use rightmost entry (proxy-appended) not leftmost
M-07 validation.py: extend MongoDB operator blocklist to cover $expr,
     $function, $accumulator, $nin, $gte, $lte, $jsonSchema, $mod
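
For M-06, the client-IP rule could look roughly like this. It assumes header names are already lower-cased and that exactly one trusted proxy appends to X-Forwarded-For; the function name and signature are hypothetical.

```python
from typing import Mapping


def client_ip(headers: Mapping[str, str], peer_ip: str) -> str:
    """Pick the address to rate-limit on.

    X-Forwarded-For is trusted only when X-Forwarded-Proto is https, and
    then the rightmost entry (appended by the proxy) is used, because
    entries further left can be supplied by the client.
    """
    proto = headers.get("x-forwarded-proto", "").strip().lower()
    forwarded = headers.get("x-forwarded-for", "")
    if proto != "https" or not forwarded:
        return peer_ip
    entries = [entry.strip() for entry in forwarded.split(",") if entry.strip()]
    return entries[-1] if entries else peer_ip
```

Taking the leftmost entry would let a caller rotate a forged address on every request and sidestep the login limit, which is presumably why the fix switched sides.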

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:18:02 +01:00
Vadym Samoilenko
86ef5a86fb refactor: extract broadcast_status_update to shared _websocket_bridge (H-08)
The function was copy-pasted identically in ingest_and_ai.py and
translate_and_synthesize.py. Extracted to tasks/_websocket_bridge.py
as the single definition; all four task modules now import from there.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:14:57 +01:00
Vadym Samoilenko
87ae6571fe perf: use DI connection pool for auth routes, async httpx for MS SSO (H-01, H-02)
- login and microsoft_login routes now use Depends(get_database) instead
  of creating a per-request MongoClient — removes connection-pool churn
  under load
- MicrosoftAuthService._get_openid_config/_get_jwks/validate_token are
  now async, using httpx.AsyncClient instead of blocking requests.get —
  removes ~400ms event-loop block per Microsoft login
- Removed unused AsyncIOMotorClient import from routes_auth.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:13:50 +01:00
Vadym Samoilenko
c6b19d01f2 security: remove default admin password fallback (C-04)
seed_default_admin now skips creation and logs a warning when
DEFAULT_ADMIN_PASSWORD is unset instead of falling back to the
hardcoded ChangeMe123! value. Existing-admin promotion path is
unaffected. Added DEFAULT_ADMIN_PASSWORD to .env.prod.example.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:12:24 +01:00
Vadym Samoilenko
e81acebc45 security: remove exception detail from /auth/refresh response (C-03)
Replaced the bare except that leaked str(e) (JWT library internals,
claim validation messages) with a generic "Invalid refresh token" detail.
Full traceback is now logged server-side via the structured logger.
Re-raises HTTPException before the generic handler so valid 401s from
inner checks are not double-wrapped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:11:59 +01:00
Vadym Samoilenko
70f6c6befb security: reject refresh tokens used as access tokens (C-02)
get_current_user and get_current_user_optional now reject any token
whose payload carries type="refresh". Access tokens carry no type field
so the check is asymmetric and safe. Prevents a refresh-cookie value
from being replayed as a Bearer access token.
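
A sketch of the asymmetric check, operating on an already decoded payload dict. The function name and the exception type are assumptions; the real dependency raises an HTTP error instead.

```python
def assert_access_token(payload: dict) -> dict:
    """Reject payloads marked type="refresh"; access tokens carry no type."""
    if payload.get("type") == "refresh":
        raise PermissionError("refresh token cannot be used as an access token")
    return payload
```

Because only refresh tokens are tagged, access tokens issued before the change still pass and nothing needs to be reissued.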

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:11:50 +01:00
Vadym Samoilenko
93cb7527ab security: enforce rate limit on /auth/login (C-01)
Removed /api/v1/auth/login from the rate-limit bypass list in both
rate_limiting.py and main.py. The existing 5-req/5-min limit for the
login endpoint was already configured but never applied.
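
For illustration, a 5-requests-per-5-minutes limit could be enforced with a sliding window like the one below. This is an in-memory sketch under assumed names (`SlidingWindowLimiter`, `allow`); the actual middleware in rate_limiting.py may work differently, for example by keeping counters in Redis.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per key within window_seconds."""

    def __init__(self, max_requests=5, window_seconds=300.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._hits = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A caller would typically key the limiter by client IP or account identifier and answer with HTTP 429 when `allow` returns `False`.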

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:11:36 +01:00
Vadym Samoilenko
103b409f78 fix: handle role as str or Enum in audit_logger
user.role stored as plain string in MongoDB — calling .value on it
caused AttributeError on every login, blocking all auth.
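
A minimal sketch of the kind of guard this fix describes. The `UserRole` enum shown here and the helper name `role_to_str` are assumptions for illustration, not the actual audit_logger code.

```python
from enum import Enum


class UserRole(str, Enum):  # hypothetical enum, for illustration only
    ADMIN = "admin"
    CLIENT = "client"


def role_to_str(role) -> str:
    """Return the role as a plain string for both Enum members and raw strings."""
    if isinstance(role, Enum):
        return str(role.value)
    return str(role)
```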

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:09:46 +01:00
Vadym Samoilenko
fa351e4d25 feat: per-client glossary — hybrid exact/vector retrieval + AI injection
Adds full glossary system so Gemini uses client-approved terminology
when generating subtitles and translations (critical for 3M brand names
and product codes across 16 target locales).

Backend:
- lib/locales.py: BCP-47 locale registry, normalises xlsx fr_fr → fr-FR
- models/glossary.py: Glossary / GlossaryVersion / GlossaryTerm + enums
- services/glossary_service.py: xlsx parse (openpyxl), ingest to Mongo,
  hybrid retrieval (Aho-Corasick exact + Atlas Vector Search), prompt block
- services/embedding_service.py: Gemini text-embedding-004, batch 100, retry
- tasks/embed_glossary.py: Celery background task for async embedding
- api/v1/routes_glossaries.py: CRUD endpoints under /clients/{id}/glossaries
- gemini.py: _build_glossary_block(), {GLOSSARY} injection in all 4 call sites
- tts.py / gemini_tts.py: pass full locale codes (no split("-")[0] truncation)
- tasks/translate_and_synthesize.py: glossary lookup + injection per language
- prompts: {GLOSSARY} placeholder in ingestion, targeted, transcreation prompts
- pyproject.toml: +openpyxl, +pyahocorasick
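
The locale normalisation mentioned for lib/locales.py (xlsx `fr_fr` to `fr-FR`) might look roughly like this. The function name `normalize_locale` is hypothetical, and the sketch only handles language and language-region codes, not script subtags.

```python
def normalize_locale(raw: str) -> str:
    """Normalise codes such as 'fr_fr' to BCP-47 style 'fr-FR'."""
    parts = raw.strip().replace("_", "-").split("-")
    language = parts[0].lower()
    if len(parts) == 1:
        return language
    # Assumption: the second subtag is a region; script subtags are not handled.
    region = parts[1].upper()
    return f"{language}-{region}"
```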

Frontend:
- routes/admin/glossaries/: GlossaryList, GlossaryUpload, GlossaryDetail
- App.tsx: 3 new routes under /admin/clients/:clientId/glossaries
- ClientDetail.tsx: Glossaries card with count + quick links
- types/api.ts: Glossary, GlossaryVersion, GlossaryDetail, GlossaryTerm types
- lib/api.ts: 7 new API methods (upload, list, detail, terms, versions, activate, archive)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 13:03:38 +01:00
Vadym Samoilenko
05f25a1141 feat: per-language QC workflow with linguist assignment
- Job.language_qc dict tracks per-language status (pending/in_review/approved/rejected)
  with full event history; qc_assignments denormalized array enables efficient queue queries
- language_qc service handles assign/reassign/approve/reject/reopen with atomic DB updates,
  audit logging, and auto-advancement to pending_final_review when all languages approved
- Linguists can only edit VTT and trigger re-renders for their assigned language (403 guard)
- return_to_qc resets all language statuses while preserving assignments
- routes_language_qc.py: 7 new endpoints; /me/language-qc-queue for linguist queue
- Startup migration idempotently seeds language_qc for all existing jobs
- Frontend: LanguageQCState types, API methods, LinguistQueue page, QCDetail redesigned
  with per-language status badges, assignment dropdown, inline approve/reject buttons,
  progress bar, and reject modal; My QC Queue sidebar link
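
The auto-advancement rule can be sketched as below, assuming `language_qc` maps each language code to a dict with a `status` key. That shape and the function name `next_job_status` are assumptions, not the service's actual data model.

```python
def next_job_status(language_qc: dict, current_status: str) -> str:
    """Advance to pending_final_review only when every language is approved."""
    statuses = [entry.get("status") for entry in language_qc.values()]
    if statuses and all(status == "approved" for status in statuses):
        return "pending_final_review"
    return current_status
```

The `statuses and ...` guard keeps a job with no languages from advancing by accident, since `all()` over an empty list is true.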

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:09:40 +01:00
Vadym Samoilenko
bab30e1508 feat: VTT version control — snapshots, diff, restore
Backend:
- VttVersion model (vtt_version.py): immutable snapshot per job/lang/kind/version
- vtt_versioning service: create_version (atomic counter + GCS snapshot),
  list_versions, get_version, restore_version, diff_versions (difflib line-level)
- routes_vtt_versions.py: GET /versions, GET /versions/{v}, GET /versions/diff,
  POST /versions/{v}/restore (PRODUCTION/ADMIN only, overwrites live file + audit log)
- Hook create_version into update_job_vtt_content before each live-file overwrite
- Mongo indexes: unique (job_id, lang, kind, version) + (job_id, created_at)
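
A line-level diff of two VTT snapshots with `difflib` could look like the sketch below. The function name `diff_vtt` and the `fromfile`/`tofile` labels are illustrative; the real `diff_versions` may return a different structure.

```python
import difflib


def diff_vtt(old: str, new: str) -> list:
    """Return unified-diff lines between two VTT documents."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="version_a",
            tofile="version_b",
            lineterm="",
        )
    )
```

Lines starting with `-` or `+` (other than the two file header lines) are the removals and additions a viewer would colour-code.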

Frontend:
- VttVersionSummary / VttVersionFull / VttDiffResponse types
- api.ts: listVttVersions, getVttVersion, diffVttVersions, restoreVttVersion
- VersionsTab.tsx: lang/kind switcher, version list with A/B compare buttons,
  inline diff viewer (color-coded +/−), content viewer, restore with confirm dialog
- JobDetail.tsx: new "VTT Versions" tab wired to VersionsTab

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 11:46:21 +01:00
384 changed files with 31149 additions and 5726 deletions

@ -0,0 +1,25 @@
# Source Documentation Archive — 2026-04-29
## What was archived
Original non-canonical documentation files backed up before canonical structure was created.
## Files archived
| File | Migrated to |
|------|------------|
| `README.md` | Updated in place; canonical docs in `docs/` |
| `DEPLOYMENT.md` | `docs/project/runbook.md` + `docs/project/infrastructure.md` |
| `DEPLOYMENT_OPTIONS.md` | `docs/project/infrastructure.md` |
| `APACHE_DEPLOYMENT.md` | `docs/project/runbook.md` (Apache config section) |
## Rollback
To restore original files: copy from `original/` back to project root.
```bash
cp original/README.md ../../README.md
cp original/DEPLOYMENT.md ../../DEPLOYMENT.md
cp original/DEPLOYMENT_OPTIONS.md ../../DEPLOYMENT_OPTIONS.md
cp original/APACHE_DEPLOYMENT.md ../../APACHE_DEPLOYMENT.md
```

@ -0,0 +1,236 @@
# Apache Frontend + Docker Backend Deployment Guide
## 🏗 Architecture Overview
**Frontend**: Built React app served by your existing Apache webserver
**Backend**: Docker containers running FastAPI + workers + database
```
Apache Webserver (Frontend)  →  Docker Backend Services
└── Built React App             ├── FastAPI API (:8000)
                                ├── Celery Workers
                                ├── Change Stream Service
                                ├── MongoDB
                                └── Redis
```
## 🚀 Deployment Steps
### 1. **Deploy Backend Services**
```bash
# 1. Create production environment file
cp .env.prod.example .env.prod
# Edit .env.prod with your production values
# 2. Start backend services only
docker-compose -f docker-compose.prod.yml up -d
# 3. Verify services are running
docker-compose -f docker-compose.prod.yml ps
```
**Running Services:**
- `accessible-video-api-prod` - FastAPI API (port 8000)
- `accessible-video-worker-prod` - Celery workers
- `accessible-video-mongo-prod` - MongoDB database
- `accessible-video-redis-prod` - Redis cache/queue
### 2. **Build and Deploy Frontend to Apache**
```bash
# 1. Configure frontend environment
cd frontend
cp .env.example .env.production.local
# Edit .env.production.local:
# VITE_API_URL=https://your-api-domain.com:8000
# VITE_SENTRY_DSN=your-sentry-dsn
# VITE_ENVIRONMENT=production
# 2. Build production frontend
npm run build
# 3. Deploy to Apache document root
sudo cp -r dist/* /var/www/html/your-app/
# OR
sudo rsync -av --delete dist/ /var/www/html/your-app/
```
### 3. **Configure Apache Virtual Host**
Create `/etc/apache2/sites-available/your-app.conf`:
```apache
<VirtualHost *:443>
ServerName your-domain.com
ServerAlias www.your-domain.com
DocumentRoot /var/www/html/your-app
# SSL Configuration
SSLEngine on
SSLCertificateFile /path/to/your/certificate.crt
SSLCertificateKeyFile /path/to/your/private.key
# Security Headers
Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff"
Header always set X-XSS-Protection "1; mode=block"
Header always set Referrer-Policy "strict-origin-when-cross-origin"
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
# Compression
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
</IfModule>
# Caching for static assets
<LocationMatch "\.(css|js|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$">
ExpiresActive On
ExpiresDefault "access plus 1 year"
Header set Cache-Control "public, immutable"
</LocationMatch>
# Don't cache HTML files
<LocationMatch "\.html$">
ExpiresActive On
ExpiresDefault "access plus 0 seconds"
Header set Cache-Control "no-cache, no-store, must-revalidate"
</LocationMatch>
# React Router support (handle client-side routing)
<Directory "/var/www/html/your-app">
Options -Indexes
AllowOverride All
Require all granted
# Fallback to index.html for client-side routing
FallbackResource /index.html
</Directory>
# Optional: Proxy API requests (alternative to CORS)
# ProxyPreserveHost On
# ProxyPass /api/ http://your-docker-host:8000/api/
# ProxyPassReverse /api/ http://your-docker-host:8000/api/
# Logs
ErrorLog ${APACHE_LOG_DIR}/your-app_error.log
CustomLog ${APACHE_LOG_DIR}/your-app_access.log combined
</VirtualHost>
# HTTP to HTTPS redirect
<VirtualHost *:80>
ServerName your-domain.com
ServerAlias www.your-domain.com
Redirect permanent / https://your-domain.com/
</VirtualHost>
```
Enable the site:
```bash
sudo a2ensite your-app.conf
sudo systemctl reload apache2
```
## ⚙️ Configuration Files Updated
### `docker-compose.prod.yml`
- ✅ Removed frontend and nginx services
- ✅ Added CORS_ORIGINS environment variable
- ✅ Backend services only (API, workers, database)
### `.env.prod.example`
- ✅ Production environment template
- ✅ CORS configuration for Apache frontend
- ✅ All required variables documented
## 🔧 CORS Configuration
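Because the frontend and backend are served from different origins (different host or port), the browser blocks API calls unless the backend allows the frontend's origin. Configure CORS in your backend: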
**In `.env.prod`:**
```bash
CORS_ORIGINS=https://your-domain.com,https://www.your-domain.com
```
**Backend automatically handles CORS** based on this environment variable.
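
As a rough illustration of how a comma-separated `CORS_ORIGINS` value can be turned into an origin list, consider the sketch below. The helper name `parse_cors_origins` is hypothetical and the backend's actual parsing may differ.

```python
def parse_cors_origins(raw: str) -> list:
    """Split a comma-separated CORS_ORIGINS value into clean origins."""
    origins = []
    for item in raw.split(","):
        origin = item.strip().rstrip("/")  # tolerate spaces and trailing slashes
        if origin:
            origins.append(origin)
    return origins
```

In a FastAPI application, a list like this is typically passed as `allow_origins` to `CORSMiddleware`. Origins must match exactly, including the scheme, so `https://your-domain.com` and `https://www.your-domain.com` are listed separately.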
## 📋 Deployment Checklist
### Backend Services
- [ ] Copy `.env.prod.example` to `.env.prod`
- [ ] Update all environment variables in `.env.prod`
- [ ] Run `docker-compose -f docker-compose.prod.yml up -d`
- [ ] Verify API accessible at `http://your-docker-host:8000/docs`
- [ ] Check logs: `docker-compose -f docker-compose.prod.yml logs -f`
### Frontend Deployment
- [ ] Update `frontend/.env.production.local` with API URL
- [ ] Run `npm run build` in frontend directory
- [ ] Copy `dist/*` to Apache document root
- [ ] Configure Apache virtual host
- [ ] Enable site and reload Apache
- [ ] Test frontend loads and connects to API
### Security & Performance
- [ ] SSL certificate configured
- [ ] Security headers enabled
- [ ] Gzip compression enabled
- [ ] Static file caching configured
- [ ] CORS origins properly set
- [ ] Firewall rules: only expose port 8000 for API
## 🔍 Troubleshooting
### Common Issues
**CORS Errors:**
- Verify `CORS_ORIGINS` in `.env.prod` matches your domain
- Check browser dev tools for exact error
**API Connection Failed:**
- Verify `VITE_API_URL` in frontend build
- Check backend API is accessible from frontend server
- Ensure port 8000 is open and reachable
**React Router 404s:**
- Verify `FallbackResource /index.html` in Apache config
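- If the fallback rule lives in a `.htaccess` file instead of the virtual host, ensure `AllowOverride All` is set (the `FallbackResource` directive in the virtual host above does not depend on it)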
**File Upload Issues:**
- Check Apache `LimitRequestBody` directive
- Verify backend can write to GCS bucket
### Monitoring Commands
```bash
# Backend services status
docker-compose -f docker-compose.prod.yml ps
# View logs
docker-compose -f docker-compose.prod.yml logs -f api
docker-compose -f docker-compose.prod.yml logs -f worker
# Apache status
sudo systemctl status apache2
sudo tail -f /var/log/apache2/your-app_error.log
```
## 🎯 Benefits of This Setup
- **Separation of Concerns** - Frontend and backend independently deployable
- **Existing Infrastructure** - Uses your current Apache setup
- **Scalability** - Backend can be moved to different hosts easily
- **Caching** - Apache handles static file caching efficiently
- **SSL Termination** - Apache handles HTTPS for frontend
- **Monitoring** - Separate logs and monitoring for each tier

Your backend services will run in Docker containers while the frontend integrates seamlessly with your existing Apache web server infrastructure.

Binary file not shown.

@ -0,0 +1,168 @@
# Deployment Options for Video Accessibility Platform
## 🏗 Current Docker Setup
Your `docker-compose.yml` serves **both frontend and backend** in **development mode**:
- **Frontend**: Vite dev server on port 5173 (hot reload)
- **Backend**: FastAPI on port 8000 (auto-reload)
- **Database**: MongoDB + Redis
- **Workers**: Celery + Change Stream service
## 🚀 Production Deployment Options
### 1. **All-in-Docker Production** ✅ Recommended
**What it does:**
- Frontend: Built React app served by Nginx (port 80)
- Backend: Production FastAPI (port 8000)
- Single `docker-compose up` deployment
**Usage:**
```bash
# Production deployment
docker-compose -f docker-compose.prod.yml up -d
# Access:
# Frontend: http://localhost:80
# Backend API: http://localhost:8000
```
**Benefits:**
- ✅ Single command deployment
- ✅ Optimized frontend build
- ✅ Production-ready configuration
- ✅ Built-in health checks
- ✅ Nginx caching and compression
### 2. **Single Domain with Nginx Proxy** ✅ Best UX
**What it does:**
- Everything served from one domain (port 80)
- `/api/*` routes to backend
- `/*` routes to frontend
- WebSocket support included
**Usage:**
```bash
# Uses nginx/nginx.conf for routing
docker-compose -f docker-compose.prod.yml up nginx
# Access everything at: http://localhost
```
**Benefits:**
- ✅ No CORS issues
- ✅ Single domain simplicity
- ✅ Better caching control
- ✅ Rate limiting built-in
- ✅ SSL termination ready
### 3. **Cloud-Native (Google Cloud)** 🌟 Enterprise
**Architecture:**
```
Frontend (Cloud Storage + CDN) → API (Cloud Run) → Database (MongoDB Atlas)
Workers (Cloud Run)
```
**Components:**
- **Frontend**: Build + deploy to Cloud Storage, serve via Cloud CDN
- **Backend**: Deploy to Cloud Run (auto-scaling)
- **Workers**: Separate Cloud Run service for Celery
- **Database**: MongoDB Atlas (managed)
- **Files**: Google Cloud Storage (already integrated)
**Benefits:**
- ✅ Auto-scaling
- ✅ Global CDN
- ✅ Managed services
- ✅ Pay-per-use
- ✅ High availability
## 📊 Comparison Matrix
| Option | Complexity | Cost | Scalability | Maintenance |
|--------|------------|------|-------------|-------------|
| **Dev Docker** | Low | Very Low | Limited | Manual |
| **Prod Docker** | Low | Low | Manual | Medium |
| **Nginx Proxy** | Medium | Low | Manual | Medium |
| **Cloud Native** | High | Variable | Automatic | Low |
## 🚀 Quick Migration Guide
### From Development → Production Docker
1. **Update environment variables:**
```bash
cp .env.example .env.prod
# Edit .env.prod with production values
```
2. **Deploy:**
```bash
docker-compose -f docker-compose.prod.yml up -d
```
3. **Verify:**
```bash
# Frontend (optimized build)
curl http://localhost:80
# Backend API
curl http://localhost:8000/health
```
### From Docker → Cloud Native
1. **Build frontend:**
```bash
cd frontend && npm run build
gsutil -m rsync -r -d dist/ gs://your-bucket/
```
2. **Deploy backend:**
```bash
gcloud run deploy video-api --source=./backend --region=us-central1
```
3. **Deploy workers:**
```bash
gcloud run deploy video-workers --source=./backend --region=us-central1
```
## 🔧 Configuration Files Created
### `docker-compose.prod.yml`
- Production-ready Docker setup
- Nginx serving frontend
- Optimized environment variables
- Health checks included
### `nginx/nginx.conf`
- Single-domain routing configuration
- API proxy with rate limiting
- WebSocket support
- Static file caching
- Security headers
## 🎯 Recommendations by Use Case
### **Small Team / MVP**
→ Use **Production Docker** (`docker-compose.prod.yml`)
### **Growing Business**
→ Use **Nginx Proxy** setup for better performance
### **Enterprise / Scale**
→ Go **Cloud Native** with Google Cloud Run + CDN
## 🔍 Current Status
- **Development**: Already working with `docker-compose up`
- **Production Docker**: Ready with `docker-compose.prod.yml`
- **Nginx Proxy**: Configured and ready to deploy
- ⚠️ **Cloud Native**: Requires GCP setup and configuration

Your current Docker setup is **development-optimized**. For production, use the new `docker-compose.prod.yml` which properly builds and serves the React app through Nginx while keeping the backend API separate but coordinated.

@ -0,0 +1,384 @@
# Accessible Video Processing Platform
A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. Features a complete workflow from video upload to final delivery with quality control processes.
## ✅ Current Status: **Production-Ready** (85% Complete)
**Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend)
## 🚀 Key Features Implemented
### Core Functionality ✅
- **AI-Powered Processing**: Complete Gemini 2.5 Pro integration for intelligent caption and audio description generation
- **Multi-Language Pipeline**: Google Translate + cultural transcreation with 50+ language support
- **Quality Control Workflow**: Full reviewer approval/rejection system with VTT editing capabilities
- **Audio Description TTS**: Google Cloud TTS and ElevenLabs integration with audio synthesis
- **Real-time Updates**: WebSocket-powered job status tracking and notifications
- **Advanced Video Player**: Multi-language caption support with timeline navigation
- **Role-Based Access Control**: Complete CLIENT/REVIEWER/ADMIN role system
### Security & Infrastructure ✅
- **JWT Authentication**: Secure access/refresh token system with HttpOnly cookies
- **Audit Logging**: Comprehensive audit trail for all reviewer actions
- **Signed URLs**: Secure Google Cloud Storage file access (24h expiry)
- **Input Validation**: Complete request validation and sanitization
- **HTTPS/CORS**: Production-ready security configuration
### User Experience ✅
- **Responsive Design**: Mobile-first Tailwind CSS implementation
- **Real-time Feedback**: Live job progress tracking and notifications
- **Advanced File Management**: Drag-and-drop uploads with progress indicators
- **VTT Editor**: Inline caption editing with live preview
- **Download Portal**: Secure asset delivery with organized file structure
## 🛠 Tech Stack
### Backend (FastAPI + Python 3.11)
- **FastAPI 0.115.0** - Modern async web framework with OpenAPI documentation
- **Celery 5.3.4** - Distributed task queue with Redis broker
- **MongoDB 7.0** - Document database with replica set support
- **Redis 7.2** - Caching and message queuing
- **Google Cloud Platform** - Storage, AI services, Secret Manager, TTS
- **Pydantic 2.5** - Data validation and serialization
- **OpenTelemetry** - Observability and monitoring
- **Sentry** - Error tracking and performance monitoring
### Frontend (React 19 + TypeScript)
- **React 19.1.1** - Modern UI framework with latest features
- **Vite 7.1.2** - Lightning-fast build tool and dev server
- **TypeScript 5.8** - Full type safety throughout application
- **TanStack Query 5.85** - Advanced server state management with caching
- **React Router 7.8** - Client-side routing with protected routes
- **Tailwind CSS 4.1** - Utility-first CSS framework
- **Zustand 5.0** - Lightweight client state management
- **React Hook Form + Zod** - Form handling with schema validation
## 🏗 Architecture Overview
### Complete Job Processing Pipeline ✅
```
Upload  → Ingestion  → AI Processing  → QC Review → Translation → TTS      → Final Review → Delivery
↓         ↓            ↓                ↓           ↓             ↓          ↓              ↓
GCS       Gemini 2.5   VTT Generation   Human       Google        Text-to-   Reviewer       Email +
Storage   Pro          + Validation     Review      Translate     Speech     Approval       Downloads
```
### System Architecture
- **Monorepo Structure**: `/backend`, `/frontend`, `/infra` with clear separation
- **Microservices Ready**: Modular FastAPI services with proper dependency injection
- **Event-Driven**: WebSocket real-time updates with connection management
- **Scalable Workers**: Celery task queue with auto-retry and error recovery
- **Secure by Design**: RBAC, signed URLs, audit logging, input validation
## 🚀 Getting Started
### Prerequisites
- **Python 3.11+** (backend development)
- **Node.js 20.19+** (frontend development; Vite 7 no longer supports Node.js 18)
- **Docker & Docker Compose** (required for the recommended Docker-based local setup)
- **Google Cloud Project** with APIs enabled (for video processing)
### 🐳 Local Development with Docker (Recommended)
This is the recommended approach for local development. Backend services run in Docker containers while the frontend runs via Vite dev server for fast hot-reload.
#### Initial Setup
```bash
# 1. Clone the repository
git clone <repository>
cd video_accessibility
# 2. Copy and configure environment files
cp .env.prod.example .env.local
# Edit .env.local with your API keys and settings
# 3. Set up frontend environment
cp frontend/.env.example frontend/.env.local
# The defaults should work for local development
# 4. Ensure GCP credentials are in place
# Copy your GCP service account JSON to: ./secrets/gcp-credentials.json
```
#### Starting the Development Environment
**Step 1: Start Backend Services (Docker)**
```bash
# Start API, Worker, MongoDB, and Redis in Docker
./scripts/run-local.sh
# Services will be available at:
# - API: http://localhost:8003
# - API Docs: http://localhost:8003/docs
# - MongoDB: mongodb://localhost:27017
# - Redis: redis://localhost:6379
```
**Step 2: Start Frontend (Vite Dev Server)**
```bash
# In a separate terminal
cd frontend
npm install # First time only
npm run dev
# Frontend will be available at:
# - Application: http://localhost:6001/video-accessibility
```
#### Useful Commands
```bash
# View logs
docker compose logs -f api # API logs
docker compose logs -f worker # Worker logs
docker compose logs -f # All logs
# Restart a service
docker compose restart api
docker compose restart worker
# Rebuild and restart (after code changes)
./scripts/run-local.sh --rebuild
# Stop all services
./scripts/run-local.sh --stop
# or
docker compose down
```
#### Test User Credentials (Local Development Only)
For testing different user roles locally:
```
Admin: admin@example.com / admin
Production: production@example.com / production
Reviewer: reviewer@example.com / reviewer
Client: client@example.com / client123
```
**Note**: These test users are only for local development. Production uses Microsoft authentication.
### Alternative: Native Development (Without Docker)
For development without Docker, you'll need to run each service manually:
```bash
# Terminal 1: MongoDB
mongod --dbpath ./data/db
# Terminal 2: Redis
redis-server
# Terminal 3: Backend API
cd backend
poetry install
poetry run uvicorn app.main:app --reload --port 8000
# Terminal 4: Celery Worker
cd backend
poetry run celery -A app.tasks worker --loglevel=info
# Terminal 5: Frontend
cd frontend
npm install
npm run dev
```
**Note**: The Docker approach is strongly recommended as it ensures consistency and simplifies setup.
### Testing & Quality
```bash
# Backend tests + linting
cd backend
poetry run pytest
poetry run ruff check .
poetry run mypy .
# Frontend tests + linting
cd frontend
npm run test
npm run test:e2e
npm run lint
npm run type-check
```
## 📁 Project Structure
```
video_accessibility/  # Root monorepo
├── backend/  # FastAPI Python backend (12,198 LOC)
│   ├── app/
│   │   ├── api/v1/  # REST API endpoints
│   │   │   ├── auth.py  # JWT authentication
│   │   │   ├── jobs.py  # Job CRUD & workflow
│   │   │   ├── admin.py  # Admin operations
│   │   │   └── files.py  # File management
│   │   ├── core/  # Core configuration
│   │   ├── models/  # Database models
│   │   ├── schemas/  # Pydantic request/response schemas
│   │   ├── services/  # External service integrations
│   │   │   ├── gemini.py  # AI processing
│   │   │   ├── gcs.py  # Google Cloud Storage
│   │   │   ├── translation.py  # Multi-language support
│   │   │   └── tts.py  # Text-to-speech
│   │   ├── tasks/  # Celery background workers
│   │   ├── middleware/  # Request processing
│   │   └── telemetry/  # Observability
│   ├── tests/  # Comprehensive test suite
│   └── Dockerfile  # Container configuration
├── frontend/  # React TypeScript SPA (8,273 LOC)
│   ├── src/
│   │   ├── routes/  # Page components
│   │   │   ├── auth/  # Login system
│   │   │   ├── jobs/  # Job management
│   │   │   ├── qc/  # Quality control
│   │   │   └── admin/  # Admin interface
│   │   ├── components/  # Reusable UI components
│   │   │   ├── VideoWithCaptions.tsx  # Advanced video player
│   │   │   ├── VttEditor.tsx  # Caption editing
│   │   │   └── UploadDropzone.tsx  # File upload
│   │   ├── lib/  # Utilities and API client
│   │   ├── hooks/  # Custom React hooks
│   │   └── types/  # TypeScript definitions
│   ├── tests/  # Unit + E2E tests
│   ├── .env.local  # Local development config
│   └── Dockerfile  # Container configuration
├── scripts/
│   ├── run-local.sh  # Local development startup
│   ├── deploy.sh  # Production deployment
│   ├── full-deploy.sh  # Full production rebuild
│   └── build-frontend.sh  # Frontend build script
├── docker-compose.yml  # Base Docker configuration
├── docker-compose.local.yml  # Local development overrides
├── docker-compose.prod.yml  # Production overrides
├── .env.local  # Local environment variables
├── .env.production  # Production environment variables
├── CLAUDE.md  # Development guidelines
└── video_accessibility_development_plan.txt  # Complete specification
```
## ⚙️ Configuration
### Environment Variables
**Backend** (`backend/.env`):
```bash
# Database
MONGODB_URL=mongodb://admin:password@localhost:27017/accessible_video
REDIS_URL=redis://localhost:6379/0
# Authentication
JWT_SECRET_KEY=your-jwt-secret
JWT_REFRESH_SECRET_KEY=your-refresh-secret
# AI Services
GEMINI_API_KEY=your-gemini-key
ELEVENLABS_API_KEY=your-elevenlabs-key
# Google Cloud
GCS_BUCKET_NAME=your-bucket-name
GOOGLE_CLOUD_PROJECT=your-project-id
# Email
SENDGRID_API_KEY=your-sendgrid-key
# Monitoring
SENTRY_DSN=your-sentry-dsn
```
**Frontend** (`frontend/.env`):
```bash
VITE_API_URL=http://localhost:8000
VITE_SENTRY_DSN=your-sentry-dsn
VITE_ENVIRONMENT=development
```
### Google Cloud Setup
1. **Create GCP Project** with billing enabled
2. **Enable APIs**:
   - Cloud Storage API
   - Cloud Translation API
   - Cloud Text-to-Speech API
   - Vertex AI API (for Gemini)
   - Secret Manager API
3. **Create Service Account** with roles:
   - Storage Admin
   - AI Platform Admin
   - Secret Manager Admin
4. **Download JSON key** and set `GOOGLE_APPLICATION_CREDENTIALS`
## 🚢 Deployment Options
### Production Architecture (Google Cloud)
- **Frontend**: Cloud Storage + Cloud CDN (static hosting)
- **Backend API**: Cloud Run (serverless, auto-scaling)
- **Workers**: Cloud Run (Celery with Redis)
- **Database**: MongoDB Atlas (managed)
- **Queue**: Cloud Memorystore (Redis)
- **Storage**: Google Cloud Storage
- **Monitoring**: Cloud Monitoring + Sentry
### Docker Production
```bash
# Start the production stack in the background (images are built first if they do not exist yet)
docker-compose -f docker-compose.prod.yml up -d
```
## 🔒 Security Features
### Implemented Security ✅
- **JWT Authentication**: Access (15min) + refresh (7 days) token rotation
- **RBAC System**: CLIENT/REVIEWER/ADMIN roles with endpoint protection
- **Secure Storage**: HttpOnly cookies for refresh tokens
- **File Security**: Signed URLs with 24h expiry, no client access to raw files
- **Input Validation**: Comprehensive Pydantic validation on all endpoints
- **Audit Logging**: Complete trail of all reviewer actions and system events
- **CORS Protection**: Configured for production domains
- **Rate Limiting**: Request throttling and validation middleware
## 🔧 API Documentation
### Key Endpoints Implemented
```
POST  /api/v1/auth/login           # Authentication
POST  /api/v1/jobs                 # Create job with file upload
GET   /api/v1/jobs                 # List jobs (filtered by role)
GET   /api/v1/jobs/{id}            # Job details with real-time status
POST  /api/v1/jobs/{id}/actions/*  # Workflow actions (approve/reject/complete)
GET   /api/v1/jobs/{id}/vtt        # VTT content retrieval
PATCH /api/v1/jobs/{id}/vtt        # VTT editing and updates
GET   /api/v1/jobs/{id}/downloads  # Signed download URLs
WS    /api/v1/ws/jobs/{id}         # Real-time job status updates
```
**OpenAPI Documentation**: http://localhost:8000/docs (native setup; the Docker local setup serves it at http://localhost:8003/docs)
## 🎯 Development Status
### ✅ Completed (Production Ready)
- **User Management**: Full authentication, RBAC, password management
- **Job Pipeline**: Complete video processing workflow with state machine
- **Quality Control**: VTT editor, approval workflows, reviewer dashboards
- **Real-time Features**: WebSocket updates, live notifications
- **Multi-language**: Translation pipeline with cultural transcreation
- **File Management**: Secure uploads, downloads, asset validation
- **Admin Features**: User management, system monitoring, audit logs
### ⚠️ Needs Attention (Minor)
- **Integration Tests**: Framework exists but needs completion
- **Email Templates**: Service implemented, templates may need customization
- **Performance Testing**: No load testing implemented yet
- **Documentation**: API docs complete, user guides could be enhanced
### 🎯 Recommended Next Steps
1. **Complete integration test suite** for end-to-end validation
2. **Performance testing** with realistic video processing loads
3. **Production deployment** configuration and CI/CD pipeline
4. **User documentation** and training materials
5. **Monitoring dashboards** for production operations
## 📚 Development Resources
- **Complete Specification**: `video_accessibility_development_plan.txt`
- **Development Guidelines**: `CLAUDE.md`
- **API Documentation**: http://localhost:8000/docs when running natively, or http://localhost:8003/docs with the Docker local setup
- **Test Coverage Reports**: `backend/htmlcov/` (after running tests)

@ -0,0 +1,94 @@
{
"permissions": {
"allow": [
"WebSearch",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && ruff check app/services/elevenlabs_voices.py app/services/tts.py app/api/v1/routes_tts.py app/models/job.py app/tasks/tts_synthesis.py app/core/config.py 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && python -m ruff check app/services/elevenlabs_voices.py app/services/tts.py app/api/v1/routes_tts.py app/models/job.py app/tasks/tts_synthesis.py app/core/config.py 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/backend && pip3 show ruff 2>&1 | head -5; which pip3 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/frontend && npm run type-check 2>&1 | tail -20)",
"Bash(node_modules/.bin/tsc --noEmit 2>&1 | tail -20)",
"Bash(./node_modules/.bin/tsc --noEmit 2>&1 | tail -30)",
"Bash(npm run type-check 2>&1)",
"Bash(cd /Volumes/SSD/Projects/Oliver/video-accessibility/frontend && npm run type-check 2>&1)",
"Bash(npm run lint 2>&1)",
"WebFetch(domain:dcmp.org)",
"WebFetch(domain:www.w3.org)",
"WebFetch(domain:partnerhelp.netflixstudios.com)",
"WebFetch(domain:m.media-amazon.com)",
"WebFetch(domain:www.acb.org)",
"Bash(./node_modules/.bin/tsc --noEmit)",
"Bash(node_modules/.bin/tsc --noEmit)",
"Bash(pandoc --version)",
"WebFetch(domain:ai-sandbox.oliver.solutions)",
"Bash(gcloud run:*)",
"Bash(gcloud logging:*)",
"Bash(ssh optical:*)",
"Bash(/Volumes/SSD/Projects/Oliver/video-accessibility/backend/.venv/bin/python3.11 -c \"import sys; sys.path.insert\\(0, '.'\\); from app.models.user import UserRole; print\\([r.value for r in UserRole]\\)\")",
"Bash(npm list *)",
"Bash(brew list *)",
"Bash(npx --yes puppeteer --version)",
"Bash(node md_to_pdf.js)",
"Bash(npm root *)",
"Bash(node *)",
"Bash(ssh optical-web-1 *)",
"Bash(git *)",
"WebFetch(domain:docs.anthropic.com)",
"Bash(poetry lock *)",
"Bash(pip show *)",
"Read(//Users/ai_leed/.local/bin/**)",
"Read(//opt/homebrew/bin/**)",
"Bash(pip3 install *)",
"Bash(poetry --version)",
"Bash(docker run *)",
"Read(//Users/ai_leed/.docker/run/**)",
"Bash(docker context *)",
"Bash(DOCKER_HOST=unix:///var/run/docker.sock docker run --rm -v \"$\\(pwd\\):/app\" -w /app python:3.11-slim bash -c \"pip install poetry==1.8.2 -q && poetry lock --no-update\")",
"Bash(brew install *)",
"Bash(npm run *)",
"Bash(scp /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/models/audit_log.py optical:/tmp/audit_log.py)",
"Bash(scp *)",
"Bash(kill %1)",
"Bash(ssh optical-dev *)",
"Skill(fullstack-dev-skills:security-reviewer)",
"Bash(chmod +x *)",
"Bash(gcloud auth *)",
"Bash(gcloud config *)",
"Bash(gcloud artifacts *)",
"Bash(sed -n '190,200p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '1914,1922p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2048,2062p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2490,2502p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(sed -n '2628,2638p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/api/v1/routes_jobs.py)",
"Bash(gcloud builds submit *)",
"Bash(gcloud builds describe 79802b34-e17b-4446-b01d-68d99d569262 *)",
"Bash(gcloud compute instances list *)",
"Bash(gcloud compute networks vpc-access connectors list *)",
"Bash(gcloud builds *)",
"Bash(gcloud projects get-iam-policy optical-414516 *)",
"Bash(gcloud projects *)",
"Bash(npm audit *)",
"Skill(codebase-audit-suite:ln-622-build-auditor)",
"Skill(codebase-audit-suite:ln-624-code-quality-auditor)",
"Skill(codebase-audit-suite:ln-625-dependencies-auditor)",
"Skill(codebase-audit-suite:ln-626-dead-code-auditor)",
"Bash(/opt/homebrew/bin/ruff check *)",
"Bash(npm test *)",
"Bash(sed -n '35,42p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/test/utils.tsx)",
"Bash(sed -n '55,90p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/tests/helpers/auth.ts)",
"Bash(sed -n '48,60p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/components/Layout/Sidebar.tsx)",
"Bash(sed -n '152,170p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/frontend/src/components/Layout/Sidebar.tsx)",
"Bash(poetry env *)",
"Bash(poetry install *)",
"Bash(poetry run *)",
"Bash(docker info *)",
"Bash(sed -n '1,30p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/services/gcs.py)",
"Bash(sed -n '155,165p' /Users/ai_leed/Documents/Projects/Oliver/video-accessibility/backend/app/services/gcs.py)",
"Bash(gcloud secrets *)",
"Bash(openssl rand *)",
"Bash(ssh *)",
"Skill(commit-commands:commit-push-pr)",
"Bash(obsidian read *)",
"Bash(obsidian search *)"
]
}
}

View file

@ -10,6 +10,8 @@ REDIS_URL=redis://redis:6379/0
# JWT Authentication
JWT_SECRET_KEY=your-production-jwt-secret-key-min-32-chars
JWT_REFRESH_SECRET_KEY=your-production-refresh-secret-key-min-32-chars
# Required: admin account created on first boot. Unset = admin not seeded.
DEFAULT_ADMIN_PASSWORD=your-secure-admin-password
# AI Services
GEMINI_API_KEY=your-gemini-api-key
@ -19,8 +21,11 @@ ELEVENLABS_API_KEY=your-elevenlabs-api-key
GCS_BUCKET_NAME=your-production-bucket-name
GOOGLE_CLOUD_PROJECT=your-gcp-project-id
# Email Service
SENDGRID_API_KEY=your-sendgrid-api-key
# Email Service (Mailgun)
SENDGRID_API_KEY=
MAILGUN_API_KEY=your-mailgun-api-key
MAILGUN_DOMAIN=mg.oliver.solutions
MAILGUN_FROM=noreply@mg.oliver.solutions
# Monitoring
SENTRY_DSN=your-sentry-dsn-url

View file

@ -9,18 +9,18 @@
# App Configuration
# -----------------------------------------------------------------------------
APP_ENV=prod
API_BASE_URL=https://ai-sandbox.oliver.solutions/video-accessibility-back
API_BASE_URL=https://optical-dev.oliver.solutions/video-accessibility
# -----------------------------------------------------------------------------
# Authentication & Security
# -----------------------------------------------------------------------------
# IMPORTANT: Generate a secure random secret for JWT_SECRET
# Example: openssl rand -hex 32
JWT_SECRET=CHANGE_ME_TO_SECURE_RANDOM_64_CHAR_STRING
JWT_SECRET=d81fd31798510f53b374951908b6bedd75f7ddaabe9b4e4c4ca5bf81393f48b7
JWT_ALG=HS256
JWT_ACCESS_TTL_MIN=240
JWT_REFRESH_TTL_DAYS=7
COOKIE_DOMAIN=ai-sandbox.oliver.solutions
COOKIE_DOMAIN=optical-dev.oliver.solutions
COOKIE_SECURE=true
COOKIE_SAMESITE=Lax
@ -63,29 +63,31 @@ TRANSLATE_API_KEY=
ELEVENLABS_API_KEY=sk_c17be2768ca784f1807018420b84c7f1ee969946e698f986
# -----------------------------------------------------------------------------
# Email Configuration (SendGrid)
# Email Configuration (Mailgun)
# -----------------------------------------------------------------------------
# IMPORTANT: Get SendGrid API key from https://app.sendgrid.com/settings/api_keys
SENDGRID_API_KEY=
MAILGUN_API_KEY=1d8c6f38c53f237305353cc2e55f39f2-c6620443-4b9961f5
MAILGUN_DOMAIN=mg.oliver.solutions
MAILGUN_FROM=noreply@mg.oliver.solutions
# Email sender address (must be verified in SendGrid)
EMAIL_FROM=noreply@ai-sandbox.oliver.solutions
# Email sender address
EMAIL_FROM=noreply@mg.oliver.solutions
# Client-facing URL (used in emails)
CLIENT_BASE_URL=https://ai-sandbox.oliver.solutions/video-accessibility
CLIENT_BASE_URL=https://optical-dev.oliver.solutions/video-accessibility
# -----------------------------------------------------------------------------
# Microsoft Authentication (Azure AD)
# -----------------------------------------------------------------------------
AZURE_CLIENT_ID=9079054c-9620-4757-a256-23413042f1ef
AZURE_AUTHORITY=https://login.microsoftonline.com/e519c2e6-bc6d-4fdf-8d9c-923c2f002385
AZURE_REDIRECT_URI=https://ai-sandbox.oliver.solutions/video-accessibility/
AZURE_REDIRECT_URI=https://optical-dev.oliver.solutions/video-accessibility/
# -----------------------------------------------------------------------------
# CORS Configuration
# -----------------------------------------------------------------------------
# Comma-separated list of allowed origins
CORS_ORIGINS=https://ai-sandbox.oliver.solutions
CORS_ORIGINS=https://optical-dev.oliver.solutions
# -----------------------------------------------------------------------------
# Observability & Monitoring (Optional)
@ -116,6 +118,9 @@ OTEL_EXPORTER_OTLP_ENDPOINT=
WHISPER_SERVICE_URL=https://whisper-http-service-bcb6ipdqka-uc.a.run.app
FFMPEG_SERVICE_URL=https://ffmpeg-http-service-bcb6ipdqka-uc.a.run.app
# optical-dev uses Celery workers (not Cloud Run Jobs) for pipeline dispatch
USE_CELERY_FALLBACK=true
# Worker Concurrency (higher values for Cloud Run mode since workers just make HTTP calls)
WHISPER_WORKER_CONCURRENCY=10
FFMPEG_WORKER_CONCURRENCY=20

23
.env.screenshots.example Normal file
View file

@ -0,0 +1,23 @@
# Screenshot capture credentials — copy to .env.screenshots and fill in values
# NEVER commit .env.screenshots (it is gitignored)
BASE_URL=https://optical-dev.oliver.solutions/video-accessibility
# Local-password admin seeded by backend/scripts/seed_test_users.py
TEST_ADMIN_EMAIL=test-admin@oliver.agency
TEST_ADMIN_PASSWORD=TestAdmin2026!
TEST_CLIENT_EMAIL=test-client@oliver.agency
TEST_CLIENT_PASSWORD=TestClient2026!
TEST_LINGUIST_EMAIL=test-linguist@oliver.agency
TEST_LINGUIST_PASSWORD=TestLinguist2026!
TEST_REVIEWER_EMAIL=test-reviewer@oliver.agency
TEST_REVIEWER_PASSWORD=TestReviewer2026!
TEST_PRODUCTION_EMAIL=test-production@oliver.agency
TEST_PRODUCTION_PASSWORD=TestProduction2026!
TEST_PM_EMAIL=test-pm@oliver.agency
TEST_PM_PASSWORD=TestPM2026!

13
.gitignore vendored
View file

@ -12,6 +12,7 @@ examples/
.env.local
.env.production
.env.*.local
.env.screenshots
secrets/
*.pem
*.key
@ -98,3 +99,15 @@ docs/*.pdf
/var/www/html/video-accessibility.backup.*
backend/.env
# Node / npm artifacts at repo root (Playwright MCP installs these)
node_modules/
package.json
package-lock.json
# Playwright MCP session snapshots
.playwright-mcp/
# Test videos
test-video.mp4
.worktrees/

View file

@ -0,0 +1,118 @@
# Build Health Audit — ln-622
**Score: 5.5/10** | Issues: 28 (C:0 H:5 M:18 L:5)
**Date:** 2026-04-30 | **Stack:** Python 3.11 / FastAPI / Celery + React 19 / Vite / TypeScript 5.8
---
## 1. Compiler / Linter Errors
### Backend — ruff: 1314 errors (HIGH)
`ruff check app/` exits non-zero with 1314 violations. The ruff config in `pyproject.toml` uses **deprecated top-level `select`/`ignore`/`per-file-ignores`** instead of `[tool.ruff.lint]` — ruff emits a warning on every run.
Top violation codes:
| Code | Meaning | Volume |
|------|---------|--------|
| I001 | Import block unsorted | ~400 |
| UP | pyupgrade (f-strings, typing aliases) | ~500 |
| B | flake8-bugbear | ~200 |
| F401 | Unused import | 58 |
Most violations are **auto-fixable** (`ruff check --fix`). The unsorted imports and UP rules are cosmetic but make CI noisy and block future enforcement.
**Severity: HIGH** — CI cannot gate on ruff without fixing this first.
### Frontend — ESLint: 36 problems (30 errors, 6 warnings) (MEDIUM)
Key errors:
| File | Rule | Count |
|------|------|-------|
| `contexts/GlobalWebSocketContext.tsx:56` | `react-refresh/only-export-components` | 1 |
| `contexts/NotificationContext.tsx:91` | `react-refresh/only-export-components` | 1 |
| `contexts/ToastContext.tsx:83` | `react-refresh/only-export-components` | 1 |
| `lib/api.ts:539` | `@typescript-eslint/no-explicit-any` | 1 |
| `routes/admin/QCDetail.tsx` | `@typescript-eslint/no-explicit-any` | 6 |
| `routes/AcceptInvite.tsx` | `@typescript-eslint/no-explicit-any` | 1 |
| `routes/jobs/JobDetail.tsx` | `no-unused-vars` (err catch) | 2 |
| `hooks/__tests__/useJob.test.tsx` | `no-unused-vars` | 1 |
| `tests/helpers/auth.ts` | `no-explicit-any` | 3 |
**Severity: MEDIUM** — build succeeds, but `any` types and react-refresh errors degrade DX and HMR.
---
## 2. Type Errors
### Frontend — tsc: CLEAN ✓
`tsc --noEmit` exits 0. No TypeScript compilation errors. The `any` issues above are ESLint-level, not tsc errors.
### Backend — mypy: NOT RUN
Cannot run mypy outside the poetry venv. Needs `poetry run mypy .` inside Docker or an activated venv.
**Severity: LOW** (mypy not blocking, but should be run in CI)
---
## 3. Tests
### Frontend — vitest: 13 failed / 75 total (HIGH)
6 test files affected:
| Test | Failures | Root cause |
|------|----------|-----------|
| `auth.test.ts` | 1 | Mock shape mismatch — response has extra field `organizationId` |
| `StatusBadge.test.tsx` | 1 | Unknown status no longer renders text (component changed) |
| `VttEditor.test.tsx` | 1 | Multiple elements found for `Insert cue before` title — DOM duplication |
| `useJob.test.tsx` | 3 | `useApproveEnglish` — pending state never resolves in test (timeout 1s); `useCreateJob` arg mismatch |
| `UploadDropzone.test.tsx` | 6 | Text broken across elements — test uses exact string match, component renders in `<span>` nodes |
| `useJobStatusWebSocket.test.tsx` | 1 | (see output) |
**Severity: HIGH** — 17% test failure rate. Several are stale tests from component refactors (UploadDropzone, StatusBadge).
### Backend — pytest: CANNOT RUN (CRITICAL)
Running `pytest` outside poetry venv fails with `ModuleNotFoundError` for `fastapi`, `aiohttp`, etc. Tests must be run with `poetry run pytest` inside Docker or an activated poetry environment.
The `backend/.venv` exists but appears to be a plain venv, not the poetry-managed one. **Tests are effectively unrunnable in local dev without explicit poetry activation.**
**Severity: CRITICAL** — Developers with system Python cannot run tests without explicit setup steps.
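A small preflight check can make this failure mode explicit before `pytest` is invoked. The sketch below is illustrative only: the helper names are hypothetical, and the module list is taken from the `ModuleNotFoundError` examples above rather than from the project's real dependency set.

```python
import importlib.util
import sys

# Modules named in the ModuleNotFoundError above; extend as needed (assumption).
REQUIRED_MODULES = ("fastapi", "aiohttp")


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def missing_modules(names) -> list:
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print(f"venv active: {in_virtualenv()}; missing: {', '.join(missing)}")
        print("Hint: run the tests via `poetry run pytest`.")
        sys.exit(1)
```

Running a script like this first turns a confusing import traceback into a one-line hint about which environment is active.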
---
## 4. Build Configuration Issues
### ruff config deprecated (MEDIUM)
`pyproject.toml` uses `[tool.ruff]` top-level `select`, `ignore`, `per-file-ignores`. Current ruff ≥ 0.2 expects `[tool.ruff.lint]`. Fix:
```toml
# Before
[tool.ruff]
select = ["E", "W", ...]
ignore = ["E501", ...]
# After
[tool.ruff]
target-version = "py311"
line-length = 88
[tool.ruff.lint]
select = ["E", "W", ...]
ignore = ["E501", ...]
```
### Backend venv mismatch (MEDIUM)
`backend/.venv` cannot run `ruff`, `pytest`, or `mypy` — they are installed in the poetry-managed venv, not this one. Confusing to new devs.
### AGENTS.md commands incorrect (LOW)
`AGENTS.md` documents `cd backend && poetry run pytest` but the backend has `.venv` and `pyproject.toml` with no Makefile wrapper. The actual working path is `cd backend && .venv/bin/python -m pytest` or requires `poetry shell`.
---
## Summary
| Check | Result | Severity |
|-------|--------|---------|
| ruff backend | 1314 violations (auto-fixable) | HIGH |
| ESLint frontend | 36 problems | MEDIUM |
| tsc frontend | ✓ Clean | OK |
| mypy backend | Not runnable locally | LOW |
| vitest frontend | 13/75 failing | HIGH |
| pytest backend | Not runnable locally | CRITICAL |
| ruff config | Deprecated syntax | MEDIUM |
| venv setup | Confusing / broken | MEDIUM |

View file

@ -0,0 +1,116 @@
# Code Quality Audit — ln-624
**Score: 5.0/10** | Issues: 22 (C:2 H:8 M:9 L:3)
**Date:** 2026-04-30
---
## 1. God Classes / Files (> 500 lines)
| File | Lines | Severity |
|------|-------|---------|
| `backend/app/api/v1/routes_jobs.py` | 2882 | **CRITICAL** |
| `frontend/src/routes/admin/QCDetail.tsx` | 2079 | **CRITICAL** |
| `backend/app/services/video_renderer.py` | 1695 | **HIGH** |
| `frontend/src/routes/jobs/JobsList.tsx` | 1246 | **HIGH** |
| `frontend/src/lib/api.ts` | 1056 | **HIGH** |
| `backend/app/tasks/translate_and_synthesize.py` | 1019 | **HIGH** |
| `frontend/src/routes/jobs/NewJob.tsx` | 1038 | **HIGH** |
| `frontend/src/types/api.ts` | 891 | **MEDIUM** |
| `frontend/src/routes/jobs/JobDetail.tsx` | 732 | **MEDIUM** |
| `frontend/src/routes/admin/UserDetail.tsx` | 523 | **MEDIUM** |
| `frontend/src/hooks/useJobStatusWebSocket.ts` | 443 | **MEDIUM** |
**routes_jobs.py at 2882 lines** is the worst offender — it mixes upload, approval, translation, TTS, VTT editing, download, admin, and websocket concerns in a single router. Splitting by domain (e.g., `routes_upload.py`, `routes_vtt.py`, `routes_review.py`, `routes_tts.py`) would bring each under 500 lines.
**QCDetail.tsx at 2079 lines** handles the entire QC workflow, VTT display, audio preview, language selection, and approval modals in one component. Needs extraction of at minimum: `LanguageQCPanel`, `VttReviewView`, `ApprovalModal`.
---
## 2. Long Methods (> 100 lines)
| File:line | Function | Length | Severity |
|-----------|---------|--------|---------|
| `tasks/translate_and_synthesize.py:109` | `_async_translate_and_synthesize()` | 485 lines | **CRITICAL** |
| `services/video_renderer.py:487` | `_render_pause_insert_method()` | 419 lines | **CRITICAL** |
| `tasks/ingest_and_ai.py:53` | `ingest_and_ai_task_impl()` | 276 lines | **HIGH** |
| `tasks/rerender_accessible_video.py:110` | `_async_rerender_accessible_video()` | 280 lines | **HIGH** |
| `tasks/render_accessible_video.py:56` | `_async_render_accessible_video()` | 287 lines | **HIGH** |
| `api/v1/routes_jobs.py:1552` | `update_job_vtt_content()` | 215 lines | **HIGH** |
| `tasks/notify.py:29` | `run_async()` | 169 lines | **HIGH** |
| `api/v1/routes_jobs.py:2738` | `update_tts_preferences()` | 144 lines | **MEDIUM** |
| `services/whisper_service.py:241` | `_find_sentence_boundaries()` | 120 lines | **MEDIUM** |
| `services/gemini.py:591` | `analyze_accessible_video_placement()` | 132 lines | **MEDIUM** |
The two most critical ones (`_async_translate_and_synthesize` at 485 lines and `_render_pause_insert_method` at 419 lines) are orchestrator-style functions with sequential pipeline steps. They could be split into named pipeline stages, each ~50 lines.
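One possible shape for the split described above is a list of small named stages driven by a thin orchestrator. This is a minimal sketch, not the project's code: `PipelineState` and the stage names are hypothetical, and the stage bodies are placeholders for the real GCS, MongoDB, and TTS work.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    """Hypothetical state object handed from stage to stage."""
    job_id: str
    language: str
    completed: list = field(default_factory=list)


async def translate_cues(state: PipelineState) -> PipelineState:
    state.completed.append("translate")  # placeholder for the real translation step
    return state


async def synthesize_audio(state: PipelineState) -> PipelineState:
    state.completed.append("synthesize")  # placeholder for the real TTS step
    return state


async def upload_outputs(state: PipelineState) -> PipelineState:
    state.completed.append("upload")  # placeholder for the real upload step
    return state


# The orchestrator only sequences stages; each stage stays short and testable.
STAGES = (translate_cues, synthesize_audio, upload_outputs)


async def run_pipeline(state: PipelineState) -> PipelineState:
    for stage in STAGES:
        state = await stage(state)
    return state


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(PipelineState("job-1", "fr"))).completed)
```

Each stage can then be unit-tested alone, and the order of side effects is visible in one tuple.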
---
## 3. Deep Nesting
Not systematically scanned with a tool (radon/lizard not installed). The long functions above likely contain 4-5+ nesting levels given their complexity.
---
## 4. Too Many Parameters
| Location | Function | Params | Severity |
|----------|---------|--------|---------|
| `services/gemini.py` | `extract_accessibility_targeted()` | 7+ | **MEDIUM** |
| `tasks/translate_and_synthesize.py` | `_generate_language_tts()` | 8+ | **MEDIUM** |
Pattern: many functions pass `db`, `job`, `language`, `settings`, `gcs_client`, etc. individually instead of grouping into a context dataclass.
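A context dataclass along these lines might look as follows. The class name, the field types, and the `describe_tts_request` helper are assumptions for illustration; only the field names `db`, `job`, `language`, `settings`, and `gcs_client` come from the audit text.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TaskContext:
    """Hypothetical bundle of the values currently passed one by one."""
    db: Any
    job: dict
    language: str
    settings: Any
    gcs_client: Any


def describe_tts_request(ctx: TaskContext, voice: str) -> str:
    """Example callee: takes one context argument instead of five loose ones."""
    return f"{ctx.job['id']}:{ctx.language}:{voice}"


if __name__ == "__main__":
    ctx = TaskContext(db=None, job={"id": "j1"}, language="de",
                      settings=None, gcs_client=None)
    print(describe_tts_request(ctx, "alto"))
```

Marking the dataclass `frozen=True` keeps a stage from silently rebinding a field that later stages rely on.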
---
## 5. Magic Numbers
### Backend (MEDIUM)
Scattered timing constants without named definitions:
- TTS retry delays (hardcoded seconds)
- chunk sizes in upload
- Audio padding values in video_renderer.py
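As a sketch of what named definitions could look like for the retry delays, the constants below replace inline numbers with documented names. The values and names are invented for illustration; the audit does not state the real delays.

```python
# Hypothetical values: the actual delays live inline in the task code.
TTS_RETRY_BASE_DELAY_S = 2.0   # delay before the first retry
TTS_RETRY_MAX_DELAY_S = 30.0   # upper bound for any single delay


def tts_retry_delay(attempt: int) -> float:
    """Exponential backoff for a zero-based attempt index, capped at the maximum."""
    return min(TTS_RETRY_BASE_DELAY_S * (2 ** attempt), TTS_RETRY_MAX_DELAY_S)


if __name__ == "__main__":
    print([tts_retry_delay(n) for n in range(5)])
```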
### Frontend (LOW)
Mostly clean. Some inline pixel values in Tailwind (acceptable). No concerning business-logic magic numbers found.
---
## 6. N+1 Query Patterns (MEDIUM)
Potential N+1 patterns found:
- `app/main.py:102`: `async for job_doc in db.jobs.find(...)` — check if this iterates and makes additional queries per document
- `app/core/dependencies.py:185`: `async for m in db.memberships.find(...)` — membership lookup per request in auth middleware (acceptable if cached, but no caching observed)
- `app/core/authz.py:54`: `async for doc in db.memberships.find(...)` — similar pattern in auth check
These are all async iterators over `find()` — not necessarily N+1 if no nested DB calls, but should be reviewed for `.find()` calls inside the loop body.
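To make the review criterion concrete, the sketch below contrasts a per-item lookup with a single batched `$in` query. `CountingCollection` is an in-memory stand-in written for this example, not the real driver, and the `org_id`/`role` fields are hypothetical. The real code is async, but the query-count reasoning is the same.

```python
class CountingCollection:
    """In-memory stand-in for a Mongo collection that counts find() calls."""

    def __init__(self, docs):
        self.docs = docs
        self.find_calls = 0

    def find(self, query):
        self.find_calls += 1
        (key, cond), = query.items()  # this sketch supports one field per query
        if isinstance(cond, dict) and "$in" in cond:
            wanted = set(cond["$in"])
            return [d for d in self.docs if d[key] in wanted]
        return [d for d in self.docs if d[key] == cond]


def roles_n_plus_one(jobs, memberships):
    # One find() per job: the pattern to look for inside loop bodies.
    return {
        job["id"]: [m["role"] for m in memberships.find({"org_id": job["org_id"]})]
        for job in jobs
    }


def roles_batched(jobs, memberships):
    # One find() for all jobs, then group the results in memory.
    org_ids = list({job["org_id"] for job in jobs})
    by_org = {}
    for m in memberships.find({"org_id": {"$in": org_ids}}):
        by_org.setdefault(m["org_id"], []).append(m["role"])
    return {job["id"]: by_org.get(job["org_id"], []) for job in jobs}
```

Both functions return the same mapping; the difference is that the first issues one query per job while the second issues one query in total.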
---
## 7. Method Signature Quality
### Boolean flag parameters (MEDIUM)
Several async functions in tasks accept `bool` flags controlling behavior variants (e.g., `skip_tts`, `force_regenerate`). These should be enums or separate functions.
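A minimal sketch of the enum alternative, assuming the two flags `skip_tts` and `force_regenerate` are mutually exclusive variants of one decision. `TtsMode`, `plan_tts`, and the `has_cached_audio` input are hypothetical names introduced for this example.

```python
from enum import Enum


class TtsMode(Enum):
    """Hypothetical replacement for the skip_tts / force_regenerate flag pair."""
    SKIP = "skip"
    REUSE_EXISTING = "reuse_existing"
    FORCE_REGENERATE = "force_regenerate"


def plan_tts(mode: TtsMode, has_cached_audio: bool) -> str:
    """Decide what the task should do for one language."""
    if mode is TtsMode.SKIP:
        return "skip"
    if mode is TtsMode.FORCE_REGENERATE or not has_cached_audio:
        return "synthesize"
    return "reuse"


if __name__ == "__main__":
    print(plan_tts(TtsMode.REUSE_EXISTING, has_cached_audio=True))
```

With a single enum argument, the contradictory combination `skip_tts=True, force_regenerate=True` cannot be expressed at the call site.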
### Unclear return types (MEDIUM)
Some routes return `dict` or untyped responses instead of Pydantic response models. `routes_admin_production.py` has a few endpoints returning bare dicts.
---
## 8. Side-Effect Cascade Depth
`_async_translate_and_synthesize()` at 485 lines is the worst case: it writes to GCS, updates MongoDB, dispatches TTS tasks, sends notifications, and updates job status — 5+ distinct side-effect categories from a single function call. This warrants extraction into an orchestrator that delegates to named sink functions.
---
## Summary
| Check | Status | Severity |
|-------|--------|---------|
| God files (>500L) | 11 files | CRITICAL×2, HIGH×5 |
| Long methods (>100L) | 10 functions | CRITICAL×2, HIGH×5 |
| N+1 patterns | 3 potential | MEDIUM |
| Magic numbers | Some in tasks | MEDIUM |
| Method signatures | Boolean flags, unclear returns | MEDIUM |
| Side-effect cascade | translate_and_synthesize | HIGH |
**Primary recommendation:** Split `routes_jobs.py` and `QCDetail.tsx` — these two files account for the majority of the quality debt.

View file

@ -0,0 +1,94 @@
# Dependencies & Reuse Audit — ln-625
**Score: 7.5/10** | Issues: 9 (C:0 H:2 M:5 L:2)
**Date:** 2026-04-30
---
## 1. Vulnerability Scan (CVE/CVSS)
### Frontend — npm audit: ✓ CLEAN
```
Total packages: 479
Vulnerabilities: info:0 low:0 moderate:0 high:0 critical:0 total:0
```
Zero CVEs. Excellent.
### Backend — pip-audit: NOT RUN
`pip-audit` not installed in local env. Recommended to add to CI:
```bash
pip install pip-audit && pip-audit -r requirements.txt
```
Given many heavy deps (Celery 5.3, google-cloud-*, faster-whisper, aiohttp), a CI scan is strongly advised.
---
## 2. Outdated Packages
### Frontend — npm outdated (many minor/major updates pending)
**MAJOR version gaps (HIGH):**
| Package | Installed | Latest | Notes |
|---------|-----------|--------|-------|
| `@azure/msal-browser` | 4.25.0 | **5.9.0** | MSAL v5 has breaking API changes |
| `@azure/msal-react` | 3.0.20 | **5.3.2** | Paired with msal-browser, coordinated upgrade needed |
| `@sentry/react` | 8.55.0 | **10.51.0** | Sentry v10 has breaking changes |
| `typescript` | 5.8.3 | **6.0.3** | TS 6 has strictness changes |
| `vite` | 7.3.2 | **8.0.10** | Vite 8 breaking changes |
| `eslint` | 9.33.0 | **10.2.1** | ESLint 10 config format may change |
| `jsdom` | 26.1.0 | **29.1.1** | Test environment |
**Minor updates (LOW-MEDIUM):** Most other packages have minor/patch updates pending (react 19.1→19.2, tailwindcss 4.1→4.2, etc.)
**Recommendation:** Keep MSAL and Sentry on current major until dedicated upgrade sprint. React, TailwindCSS, react-query minor updates are safe to apply immediately.
### Backend — pip outdated: pip-audit not available
Based on pyproject.toml dates vs ecosystem:
- `ruff ^0.1.6` → installed ruff is `0.15.12` (already updated, good)
- `google-genai ^1.56.0` → recently updated per git log
- `faster-whisper ^1.2.0` → check for 1.x updates
---
## 3. Unused Dependencies
### Backend — `sendgrid` (MEDIUM)
`pyproject.toml` lists `sendgrid = "^6.11.0"`. However:
- The actual emailer (`app/services/emailer.py`) uses **Mailgun** REST API via `httpx`
- `sendgrid` is referenced **only** in `app/core/config.py` as a dead config field `sendgrid_api_key: str = ""` with comment `# Email (Mailgun — primary; sendgrid_api_key kept for backward compat)`
- No `import sendgrid` anywhere in app code
**Action:** Remove `sendgrid` from `pyproject.toml` dependencies and remove the `sendgrid_api_key` config field.
### Frontend — no unused dependencies found
- `axios` → used in `lib/api.ts`
- `@azure/msal-*` → used in `main.tsx`, `routes/Login.tsx`
- `date-fns` → used in 5+ components
- `zustand`, `@tanstack/react-query`, `react-hook-form`, `zod` → all actively used
- `react-dropzone` → used in upload components
---
## 4. Available Native Alternatives
### Frontend — axios vs fetch (LOW)
`axios` is used for all API calls in `lib/api.ts`. The project targets modern browsers and uses Vite. Native `fetch` + `AbortController` could replace axios, reducing bundle by ~14kb gzipped. However, axios provides request/response interceptors that are actively used for auth token refresh — migration effort is medium. **Not urgent.**
---
## 5. Custom Implementations
No custom crypto or hand-rolled validation libraries found. All auth uses `python-jose` + `libpass` (bcrypt). VTT parsing is domain-specific and not replaceable by a library. No concerns.
---
## Summary
| Check | Result | Severity |
|-------|--------|---------|
| Frontend CVEs | ✓ 0 vulnerabilities | OK |
| Backend CVEs | ⚠ Not scanned | MEDIUM |
| Frontend major updates | MSAL×2, Sentry, TS, Vite, ESLint | HIGH |
| Frontend minor updates | Many | LOW |
| Backend unused dep | `sendgrid` in pyproject.toml | MEDIUM |
| Native alternatives | axios → fetch possible | LOW |
| Custom implementations | None found | OK |

View file

@ -0,0 +1,143 @@
# Dead Code Audit — ln-626
**Score: 7.0/10** | Issues: 14 (C:0 H:0 M:6 L:8)
**Date:** 2026-04-30
---
## 1. Unused Imports (Python — F401)
ruff detected **58 unused import violations** across backend. Sample:
| File | Unused import |
|------|--------------|
| `routes_admin.py:9` | `get_current_user` |
| `routes_admin.py:11` | `verify_password` |
| `routes_admin.py:16` | `ChangePasswordRequest` |
| `routes_admin.py:23` | `log_security_event` |
| (+ 54 more across all files) | |
All are auto-fixable with `ruff check --fix --select F401`. The `__init__.py` files are correctly excluded via `per-file-ignores`.
**Severity: MEDIUM** — clutters imports, increases cognitive load when reading files.
---
## 2. Deprecated / Legacy Types (Frontend)
`frontend/src/types/api.ts` contains 3 deprecated exported types with JSDoc markers:
| Line | Type | Marker |
|------|------|--------|
| 96 | `TtsVoicesResponse` | `@deprecated Use ProviderVoicesResponse instead` |
| 137 | `TtsOptionsResponse` | `@deprecated Use ProviderOptionsResponse instead` |
| 555-566 | `Client` / `OrganizationLegacy` | `@deprecated Use Organization instead` + `export { Client as OrganizationLegacy }` |
These types are still exported, meaning consumers could use them by mistake. If no external consumers exist (library not published), they should be deleted.
**Severity: MEDIUM** — active deprecation markers indicate intent to remove. Leaving them causes confusion.
---
## 3. Legacy Status Values (Frontend)
`frontend/src/types/api.ts:12,14`:
```ts
| "tts_failed" // legacy: keep for back-compat
| "render_failed" // legacy: keep for back-compat
```
These job statuses are marked as legacy. If the backend no longer emits them, they are dead type branches. If it still does (for old jobs in MongoDB), they're valid — but should be clearly documented with a removal condition.
**Severity: LOW** — no runtime impact, but requires clarification.
---
## 4. Backward Compatibility Code (Frontend)
### lib/api.ts:239 — Legacy approval method (MEDIUM)
```ts
// Legacy method - calls approve_source for backwards compatibility
```
A backward-compat shim in the API client. If all callers have been updated to the new method, this should be removed.
### VideoWithCaptions.tsx:1643 — Legacy single-language props (MEDIUM)
```ts
// Legacy single-language props (still supported)
sourceLanguage?: string; // Language code for legacy props
// Legacy props
// Combine legacy props with tracks (use useMemo to prevent recreation)
```
The component maintains backward-compat with old single-language prop API. If no callers use these legacy props, they can be removed.
### JobDetail.tsx:41 — Legacy status mapping (LOW)
```ts
// Handle legacy approved_english/approved_source statuses (map to pending_final_review)
```
Status mapping shim for old job records. Should be removed after all existing jobs are migrated.
---
## 5. Commented-Out Code (Backend)
| File | Line | Content |
|------|------|---------|
| `telemetry/tracing.py:5` | `# from opentelemetry.exporter.gcp.trace import CloudTraceSpanExporter # Disabled for local dev` | GCP trace exporter disabled |
| `telemetry/metrics.py:5` | `# from opentelemetry.exporter.prometheus import PrometheusMetricReader # Disabled for local dev` | Prometheus reader disabled |
| `pyproject.toml` | `# opentelemetry-exporter-prometheus = ... # Temporarily disabled - version conflicts` | Dep commented out |
These are intentional (local dev vs prod config), not dead code. However, the conditional should be expressed via environment config, not source comments. **Low priority.**
**Severity: LOW**
---
## 6. Leftover .old Files (MEDIUM)
| File | Age | Action |
|------|-----|--------|
| `docker-compose.yml.old` | Created 2026-03-03 (~2 months) | Delete |
| `backend/Dockerfile.old` | Created 2026-03-03 (~2 months) | Delete |
| `backend/.dockerignore.old` | — | Delete |
These files have no build references. Git history preserves them.
---
## 7. Unused Dockerfiles
| File | Referenced in compose? |
|------|----------------------|
| `backend/Dockerfile.ffmpeg-service` | No — ffmpeg is embedded in main worker |
| `backend/Dockerfile.cloudrun` | Yes — referenced for Cloud Run deploys |
| `backend/Dockerfile.whisper-service` | Yes — whisper-worker service in compose |
`Dockerfile.ffmpeg-service` appears to be dead — the main Dockerfile handles ffmpeg. Should be confirmed and deleted if unused.
**Severity: LOW**
---
## 8. Dead Config Field
`backend/app/core/config.py:272`:
```python
sendgrid_api_key: str = "" # Email (Mailgun — primary; sendgrid_api_key kept for backward compat)
```
The `sendgrid` package is not used. The config field and the `secrets_config.py` secret reference are both dead.
**Severity: MEDIUM** — misleads ops into configuring a sendgrid secret that has no effect.
---
## Summary
| Check | Issues | Severity |
|-------|--------|---------|
| Unused Python imports | 58 (auto-fixable) | MEDIUM |
| Deprecated TS types | 3 types | MEDIUM |
| Backward-compat shims | 3 in frontend | MEDIUM |
| Commented-out code | 3 telemetry lines | LOW |
| .old files | 3 files | MEDIUM |
| Unused Dockerfile | Dockerfile.ffmpeg-service | LOW |
| Dead config field | sendgrid_api_key | MEDIUM |
| Legacy status values | 2 status strings | LOW |

97
AGENTS.md Normal file
View file

@ -0,0 +1,97 @@
# Accessible Video Processing Platform — Project Entry Point
<!-- SCOPE: root | owner: ln-111 | generated: 2026-04-29 -->
## What Is This Project
AI-powered SaaS platform that generates legally-required accessibility assets from video files: closed captions, audio descriptions, SDH captions, and descriptive transcripts. Outputs are reviewed through a human QC workflow before client delivery. 50+ language translation and cultural transcreation are built in.
**Client:** Oliver Internal
**Server:** optical-web-1
**Status:** 85% production-ready
---
## Quick Navigation
| Need | Go to |
|------|-------|
| Architecture, data flow, state machine | [docs/project/architecture.md](docs/project/architecture.md) |
| Tech stack versions and config | [docs/project/tech_stack.md](docs/project/tech_stack.md) |
| API endpoint reference | [docs/project/api_spec.md](docs/project/api_spec.md) |
| Database collections and indexes | [docs/project/database_schema.md](docs/project/database_schema.md) |
| Infrastructure inventory | [docs/project/infrastructure.md](docs/project/infrastructure.md) |
| Runbook — deploy, restart, rollback | [docs/project/runbook.md](docs/project/runbook.md) |
| Functional requirements | [docs/project/requirements.md](docs/project/requirements.md) |
| Development principles | [docs/principles.md](docs/principles.md) |
| Reference — ADRs, guides, research | [docs/reference/README.md](docs/reference/README.md) |
| Task management | [docs/tasks/README.md](docs/tasks/README.md) |
| Test strategy and commands | [tests/README.md](tests/README.md) |
| Documentation hub | [docs/README.md](docs/README.md) |
---
## Entry Points by Audience
| Audience | Start here |
|----------|-----------|
| New developer | [docs/project/runbook.md](docs/project/runbook.md) → local setup section |
| Reviewer / QC | [docs/project/requirements.md](docs/project/requirements.md) → QC workflow section |
| DevOps | [docs/project/infrastructure.md](docs/project/infrastructure.md) + [docs/project/runbook.md](docs/project/runbook.md) |
| Security reviewer | [docs/project/architecture.md](docs/project/architecture.md) → security section |
| AI agent | Read this file → pick topic → read `_index`-equivalent doc → synthesize |
---
## Core Pipeline (one-line summary per stage)
| Stage | What happens | Key file |
|-------|-------------|---------|
| Upload | MP4 → GCS + MongoDB job record | `routes_files.py` |
| Ingestion | Celery worker transcribes with Gemini 2.5 Pro | `tasks/ingest_and_ai.py` |
| AI Processing | VTT generated, validated, stored in GCS | `services/gemini.py` |
| QC Review | Reviewer edits VTT, approves or rejects | `services/language_qc.py` |
| Translation | Google Translate + transcreation per language | `tasks/translate_and_synthesize.py` |
| TTS | Per-cue audio synthesis (Google TTS / ElevenLabs) | `services/tts.py` |
| Final Review | PM approves deliverables | `routes_language_qc.py` |
| Delivery | Signed GCS URLs emailed to client | `services/emailer.py` |
See full state machine (16 states) in [docs/project/architecture.md](docs/project/architecture.md#job-state-machine).
---
## Development Commands
| Action | Command |
|--------|---------|
| Start local (Docker + Vite) | `./scripts/run-local.sh` |
| Rebuild after code change | `./scripts/run-local.sh --rebuild` |
| Stop all local services | `./scripts/run-local.sh --stop` |
| Backend lint | `cd backend && ruff check .` |
| Backend type-check | `cd backend && mypy .` (run in Docker container) |
| Frontend lint | `cd frontend && npm run lint` |
| Frontend type-check | `cd frontend && npm run type-check` |
| Backend tests | `cd backend && poetry run pytest` |
| Frontend tests | `cd frontend && npm run test` |
| E2E tests | `cd frontend && npm run test:e2e` |
---
## Key Constraints
- **NO SSH to optical-web-1** without explicit user instruction — hard rule in CLAUDE.md
- **Access tokens in memory only** (not localStorage) — auth architecture constraint
- **Refresh tokens in HttpOnly cookies** — security requirement
- **Signed GCS URLs** expire in 24h — do not cache or store URLs
- **RBAC enforced server-side** — never trust client-supplied role claims
- **All reviewer actions emit audit log entries** — compliance requirement
---
## Maintenance
**Update triggers:** New route added, deployment target changes, key dependency version change, new team member onboarded.
**Verification:** All links in Quick Navigation resolve. Entry commands are correct against current scripts/.
<!-- END SCOPE: root -->

View file

@ -1,5 +1,8 @@
# Accessible Video Processing Platform - Development Guide
<!-- Documentation entry point: see @AGENTS.md for full project navigation -->
@AGENTS.md
## Project Overview
This is a comprehensive video accessibility platform that automatically generates closed captions and audio descriptions using AI, with quality control workflows and multi-language support.

Binary file not shown.

View file

@ -2,6 +2,8 @@
A comprehensive AI-powered platform for generating accessible video content with closed captions, audio descriptions, and multi-language translations. Features a complete workflow from video upload to final delivery with quality control processes.
**Documentation:** See [AGENTS.md](AGENTS.md) for full navigation, or [docs/README.md](docs/README.md) for the documentation hub.
## ✅ Current Status: **Production-Ready** (85% Complete)
**Lines of Code:** 20,471 total (12,198 backend + 8,273 frontend)

View file

@ -1,172 +1,96 @@
# =============================================================================
# Apache Configuration for Accessible Video Platform
# =============================================================================
# Add this configuration to your existing VirtualHost for ai-sandbox.oliver.solutions
# Location: /etc/apache2/sites-available/ai-sandbox.oliver.solutions-ssl.conf
# Apache config fragment — Accessible Video Platform
# Inject into: /etc/apache2/sites-available/optical-dev.oliver.solutions-ssl.conf
#
# Required modules:
# sudo a2enmod proxy proxy_http proxy_wstunnel rewrite headers
#
# Container port map:
# accessible-video-api → 0.0.0.0:8012->8000/tcp
# =============================================================================
# -----------------------------------------------------------------------------
# Frontend - Static React SPA served from subdirectory
# -----------------------------------------------------------------------------
# ── Timeouts for large video uploads (up to 2 GB, ~10 min) ──────────────────
<IfModule mod_proxy.c>
ProxyTimeout 600
</IfModule>
# Serve frontend from /video-accessibility subdirectory
# ── WebSocket proxy (MUST be before /api/ HTTP proxy) ───────────────────────
# disablereuse=on prevents long-lived WS connections from exhausting the pool
ProxyPassMatch ^/video-accessibility/api/v1/ws/(.*)$ ws://127.0.0.1:8012/api/v1/ws/$1 disablereuse=on
ProxyPassReverse /video-accessibility/api/v1/ws/ ws://127.0.0.1:8012/api/v1/ws/
# ── API proxy ────────────────────────────────────────────────────────────────
# Strips /video-accessibility prefix — FastAPI sees /api/v1/...
ProxyPassMatch ^/video-accessibility/api/(.*)$ http://127.0.0.1:8012/api/$1
ProxyPassReverse /video-accessibility/api/ http://127.0.0.1:8012/api/
# Swagger / OpenAPI
ProxyPassMatch ^/video-accessibility/docs(/.*)?$ http://127.0.0.1:8012/docs$1
ProxyPassReverse /video-accessibility/docs http://127.0.0.1:8012/docs
ProxyPassMatch ^/video-accessibility/openapi\.json$ http://127.0.0.1:8012/openapi.json
ProxyPassReverse /video-accessibility/openapi.json http://127.0.0.1:8012/openapi.json
# ── SPA static files ─────────────────────────────────────────────────────────
Alias /video-accessibility /var/www/html/video-accessibility
<Directory /var/www/html/video-accessibility>
# Basic options
Options -Indexes +FollowSymLinks
AllowOverride All
AllowOverride None
Require all granted
# React SPA routing - rewrite all requests to index.html
# Allow video uploads up to 2 GB
LimitRequestBody 2147483648
RewriteEngine On
RewriteBase /video-accessibility
RewriteBase /video-accessibility/
# Don't rewrite files or directories that exist
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Serve real files/directories directly (JS, CSS, assets, fonts)
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
# Rewrite everything else to index.html
RewriteRule ^ /video-accessibility/index.html [L]
# Everything else → index.html (React Router handles client-side nav)
RewriteRule ^ index.html [L]
# Security headers
Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff"
Header always set X-XSS-Protection "1; mode=block"
Header always set Referrer-Policy "strict-origin-when-cross-origin"
# Cache control for static assets
<FilesMatch "\.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$">
# Cache hashed static assets for one year (immutable); never cache HTML
<FilesMatch "\.(js|css|woff2?|ttf|eot|png|jpg|jpeg|gif|ico|svg)$">
Header set Cache-Control "public, max-age=31536000, immutable"
</FilesMatch>
# No cache for HTML files
<FilesMatch "\.(html)$">
<FilesMatch "\.html$">
Header set Cache-Control "no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
Header set Expires "0"
</FilesMatch>
</Directory>
# -----------------------------------------------------------------------------
# Backend API - Reverse proxy to Docker container
# -----------------------------------------------------------------------------
# Proxy backend API to Docker container on port 8000
<Location /video-accessibility-back>
# Preserve original host header
ProxyPreserveHost On
# Proxy HTTP requests
ProxyPass http://localhost:8000
ProxyPassReverse http://localhost:8000
# Proxy timeout settings (important for long-running video processing)
ProxyTimeout 300
# WebSocket support (CRITICAL for real-time job updates)
RewriteEngine On
RewriteCond %{HTTP:Upgrade} =websocket [NC]
RewriteRule /video-accessibility-back/(.*) ws://localhost:8000/$1 [P,L]
RewriteCond %{HTTP:Upgrade} !=websocket [NC]
RewriteRule /video-accessibility-back/(.*) http://localhost:8000/$1 [P,L]
# Security headers
Header always set X-Frame-Options "SAMEORIGIN"
Header always set X-Content-Type-Options "nosniff"
# CORS is handled by the backend, don't add headers here
</Location>
# -----------------------------------------------------------------------------
# Required Apache Modules
# -----------------------------------------------------------------------------
# Enable these modules with:
# sudo a2enmod rewrite
# sudo a2enmod proxy
# sudo a2enmod proxy_http
# sudo a2enmod proxy_wstunnel
# sudo a2enmod headers
# sudo systemctl restart apache2
# Verify modules are enabled:
# apache2ctl -M | grep -E '(rewrite|proxy|headers)'
Header always set Referrer-Policy "strict-origin-when-cross-origin"
</Directory>
# =============================================================================
# Full VirtualHost Example
# Full VirtualHost skeleton (reference — values match optical-web-1)
# =============================================================================
# Example of complete VirtualHost configuration:
#
# <VirtualHost *:443>
# ServerName ai-sandbox.oliver.solutions
# ServerAdmin admin@oliver.solutions
#
# ServerName optical-dev.oliver.solutions
# DocumentRoot /var/www/html
#
# # SSL Configuration (with wildcard cert)
# SSLEngine on
# SSLCertificateFile /path/to/wildcard-ai-sandbox.oliver.solutions.crt
# SSLCertificateKeyFile /path/to/wildcard-ai-sandbox.oliver.solutions.key
# SSLCertificateChainFile /path/to/chain.crt # If needed
# SSLCertificateFile /path/to/wildcard.crt
# SSLCertificateKeyFile /path/to/wildcard.key
#
# # SSL Protocol and Cipher settings
# SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
# SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
# SSLCipherSuite HIGH:!aNULL:!MD5
#
# # Frontend configuration (from above)
# Alias /video-accessibility /var/www/html/video-accessibility
# <Directory /var/www/html/video-accessibility>
# ...
# </Directory>
# # — paste the block above here —
#
# # Backend API configuration (from above)
# <Location /video-accessibility-back>
# ...
# </Location>
#
# # Logging
# ErrorLog ${APACHE_LOG_DIR}/ai-sandbox-error.log
# CustomLog ${APACHE_LOG_DIR}/ai-sandbox-access.log combined
# ErrorLog ${APACHE_LOG_DIR}/optical-dev-error.log
# CustomLog ${APACHE_LOG_DIR}/optical-dev-access.log combined
# </VirtualHost>
# =============================================================================
# Testing & Verification
# Verify
# =============================================================================
# Test Apache configuration:
# sudo apache2ctl configtest
#
# Restart Apache:
# sudo systemctl restart apache2
#
# Test frontend:
# curl -I https://ai-sandbox.oliver.solutions/video-accessibility
#
# Test backend:
# curl https://ai-sandbox.oliver.solutions/video-accessibility-back/health
#
# Test WebSocket (requires wscat):
# wscat -c wss://ai-sandbox.oliver.solutions/video-accessibility-back/api/v1/ws/job-list
# =============================================================================
# Troubleshooting
# =============================================================================
# Check Apache logs:
# sudo tail -f /var/log/apache2/ai-sandbox-error.log
# sudo tail -f /var/log/apache2/ai-sandbox-access.log
#
# Check if backend is running:
# curl http://localhost:8000/health
#
# Check Docker containers:
# cd /opt/accessible-video
# docker-compose ps
#
# Common issues:
# - 502 Bad Gateway: Backend container not running
# - 404 Not Found: Frontend not deployed or Apache alias incorrect
# - WebSocket fails: mod_proxy_wstunnel not enabled
# - CORS errors: Check backend CORS configuration, not Apache
# sudo apache2ctl configtest
# sudo systemctl reload apache2
# curl -I https://optical-dev.oliver.solutions/video-accessibility/
# curl https://optical-dev.oliver.solutions/video-accessibility/api/v1/health
# wscat -c wss://optical-dev.oliver.solutions/video-accessibility/api/v1/ws/job-list
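
The reason the WebSocket rule has to precede the general API rule can be sketched in Python. Apache checks `ProxyPassMatch` rules in configuration order and the first match wins, so the more specific `/ws/` pattern must come first. The sketch below only models that ordering with the patterns from the fragment above; `RULES` and `resolve_backend` are hypothetical names, and `{0}` stands in for Apache's `$1`.

```python
from __future__ import annotations

import re

# Ordered (pattern, target) pairs mirroring three ProxyPassMatch lines from the config.
RULES = [
    (re.compile(r"^/video-accessibility/api/v1/ws/(.*)$"), "ws://127.0.0.1:8012/api/v1/ws/{0}"),
    (re.compile(r"^/video-accessibility/api/(.*)$"), "http://127.0.0.1:8012/api/{0}"),
    (re.compile(r"^/video-accessibility/docs(/.*)?$"), "http://127.0.0.1:8012/docs{0}"),
]


def resolve_backend(path: str) -> str | None:
    """Return the backend URL for the first matching rule, or None for SPA paths."""
    for pattern, target in RULES:
        match = pattern.match(path)
        if match:
            # An optional group that did not participate yields None; treat it as empty.
            return target.format(match.group(1) or "")
    return None
```

If the two API entries were swapped, a path such as `/video-accessibility/api/v1/ws/job-list` would match the plain HTTP rule first and never reach the `ws://` target.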

View file

@ -1,92 +0,0 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Poetry (keep poetry.lock for reproducible builds)
# poetry.lock
# Virtual environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Testing
.coverage
.pytest_cache/
.mypy_cache/
.tox/
htmlcov/
coverage.xml
*.cover
.hypothesis/
# Documentation
docs/
*.md
README*
# Logs
*.log
logs/
# Git
.git/
.gitignore
# Docker
Dockerfile*
.dockerignore
docker-compose*
# CI/CD
.github/
# Local development
.env.local
.env.development
.env.test
# Temporary files
tmp/
temp/
*.tmp
*.bak

1
backend/.gitignore vendored
View file

@ -23,6 +23,7 @@ eggs/
.eggs/
lib/
lib64/
!app/lib/
parts/
sdist/
var/

View file

@ -3,8 +3,8 @@
# =============================================================================
# Stage 1: Builder - Install dependencies
# Stage 2: Base - Common runtime for API and Worker
# Stage 3: API - FastAPI + Gunicorn (with ffmpeg for TTS audio conversion)
# Stage 4: Worker - Celery worker (with ffmpeg for video processing)
# Stage 3: API - FastAPI + Gunicorn (no ffmpeg — heavy tasks run on Cloud Run Jobs)
# Stage 4: Worker - Celery worker, lightweight queues only (notify, embed)
# =============================================================================
# -----------------------------------------------------------------------------
@ -19,7 +19,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Install Poetry
RUN pip install --no-cache-dir poetry==1.8.2
RUN pip install --no-cache-dir poetry==2.1.4
# Configure Poetry to not create virtual environment (we're in a container)
ENV POETRY_NO_INTERACTION=1 \
@ -33,7 +33,7 @@ COPY pyproject.toml poetry.lock ./
# Install dependencies using Poetry directly (simpler and more reliable)
RUN poetry config virtualenvs.create false \
&& poetry install --only main --no-interaction --no-ansi \
&& poetry install --only main --no-root --no-interaction --no-ansi \
&& rm -rf $POETRY_CACHE_DIR
# -----------------------------------------------------------------------------
@ -46,6 +46,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
libmagic1 \
curl \
tini \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
@ -72,21 +73,10 @@ USER app
# -----------------------------------------------------------------------------
# Stage 3: API - FastAPI + Gunicorn (Production API Server)
# Heavy pipeline tasks (ingest/translate/render) run on Cloud Run Jobs
# -----------------------------------------------------------------------------
FROM base AS api
# Switch to root to install ffmpeg
USER root
# Install ffmpeg for TTS audio conversion
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Set production environment variables
ENV APP_ENV=prod
@ -104,22 +94,10 @@ ENTRYPOINT ["tini", "--"]
CMD ["gunicorn", "-c", "gunicorn_conf.py", "app.main:app"]
# -----------------------------------------------------------------------------
# Stage 4: Worker - Celery Worker (with ffmpeg for video processing)
# Stage 4: Worker - Celery Worker (lightweight queues: notify, embed)
# -----------------------------------------------------------------------------
FROM base AS worker
# Switch back to root to install ffmpeg
USER root
# Install ffmpeg for video processing
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Set production environment variables
# WORKER_CONCURRENCY can be overridden at runtime (default: 8)
ENV APP_ENV=prod \
@ -148,18 +126,6 @@ CMD celery -A celery_worker worker \
# -----------------------------------------------------------------------------
FROM base AS whisper-worker
# Switch back to root to install ffmpeg
USER root
# Install ffmpeg for audio extraction
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Switch back to non-root user
USER app
# Pre-download Whisper medium model during build to avoid cold start delays
# Model is cached in ~/.cache/huggingface/hub (~1.5GB)
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('medium', device='cpu', compute_type='int8')"

View file

@ -0,0 +1,55 @@
# =============================================================================
# Cloud Run Job image — va-worker
#
# Self-contained two-stage build that mirrors the builder/runtime layout of Dockerfile.
# Entrypoint: python -m app.tasks.runner --task <name> --job-id <id>
#
# Build:
# docker build -f backend/Dockerfile.cloudrun -t va-worker backend/
# =============================================================================
# ── Stage 1: Builder ─────────────────────────────────────────────────────────
FROM python:3.11-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential curl \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir poetry==1.8.3
WORKDIR /app
COPY pyproject.toml poetry.lock ./
RUN poetry config virtualenvs.create false \
&& poetry install --no-interaction --no-ansi --only main
# ── Stage 2: Runtime ─────────────────────────────────────────────────────────
FROM python:3.11-slim AS runtime
# ffmpeg required for video rendering tasks
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
tini \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
WORKDIR /app
COPY . .
# Non-root user for security
RUN groupadd -r worker && useradd -r -g worker worker \
&& chown -R worker:worker /app
USER worker
# Cloud Run Jobs: no persistent HTTP port needed.
# Cloud Run passes CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT env vars.
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONPATH=/app
ENTRYPOINT ["tini", "--", "python", "-m", "app.tasks.runner"]
# Args are injected per-execution via Cloud Run Job overrides:
# --task ingest|translate|render|rerender --job-id <id> [--language <lang>] ...
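
For orientation, the argument shape described in the comments above could be parsed roughly as follows. This is a sketch only: the real `app.tasks.runner` module is not part of this diff, so `TASKS`, `build_parser`, and `parse_args` are hypothetical names, and the treatment of `--language` as an optional string is an assumption.

```python
import argparse

# Task names taken from the Dockerfile comment; everything else here is assumed.
TASKS = ("ingest", "translate", "render", "rerender")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: --task <name> --job-id <id> [--language <lang>]."""
    parser = argparse.ArgumentParser(prog="app.tasks.runner")
    parser.add_argument("--task", required=True, choices=TASKS)
    parser.add_argument("--job-id", required=True)
    parser.add_argument("--language", default=None)
    return parser


def parse_args(argv):
    """Parse an explicit argument list, as Cloud Run Job overrides would supply."""
    return build_parser().parse_args(argv)
```

Because the image's `ENTRYPOINT` already ends in `python -m app.tasks.runner`, a per-execution override only needs to supply the argument list, for example `["--task", "render", "--job-id", "<id>"]`.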

View file

@ -1,127 +0,0 @@
# Build stage - Install dependencies and build wheels
FROM python:3.11-slim AS builder
# Install build dependencies
RUN apt-get update && apt-get install -y \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Poetry
RUN pip install poetry==1.8.2
# Set Poetry configuration
ENV POETRY_NO_INTERACTION=1 \
POETRY_VENV_IN_PROJECT=1 \
POETRY_CACHE_DIR=/tmp/poetry_cache
WORKDIR /app
# Copy dependency files
COPY pyproject.toml poetry.lock ./
# Install dependencies into venv
RUN poetry config virtualenvs.in-project true && \
poetry lock --no-update || true && \
poetry install --only=main --no-root && \
rm -rf $POETRY_CACHE_DIR
# Base runtime stage
FROM python:3.11-slim AS base
# Install runtime system dependencies
RUN apt-get update && apt-get install -y \
ffmpeg \
curl \
tini \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Create non-root user
RUN groupadd --gid 1000 app \
&& useradd --uid 1000 --gid app --shell /bin/bash --create-home app
# Set working directory
WORKDIR /app
# Copy virtual environment from builder stage
COPY --from=builder --chown=app:app /app/.venv /app/.venv
# Ensure venv is in PATH
ENV PATH="/app/.venv/bin:$PATH"
# Copy application code
COPY --chown=app:app . .
# Switch to non-root user
USER app
# Production API stage
FROM base AS production
# Set environment variables for production
ENV APP_ENV=prod \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Expose port
EXPOSE 8000
# Use tini as init system for proper signal handling
ENTRYPOINT ["tini", "--"]
# Default command for API server
CMD ["gunicorn", "-c", "gunicorn_conf.py"]
# Worker stage for Celery workers
FROM base AS worker
# Set environment variables for worker
ENV APP_ENV=prod \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
C_FORCE_ROOT=1
# Health check for worker (check if Celery is responding)
HEALTHCHECK --interval=60s --timeout=15s --start-period=10s --retries=3 \
CMD python -c "from celery import Celery; app=Celery('app'); print('Worker healthy')" || exit 1
# Use tini as init system for proper signal handling
ENTRYPOINT ["tini", "--"]
# Default command for Celery worker
CMD ["celery", "-A", "app.tasks", "worker", "--loglevel=info", "--concurrency=1"]
# Development stage with dev dependencies
FROM builder AS development
# Install all dependencies including dev
RUN poetry install --no-root && rm -rf $POETRY_CACHE_DIR
# Install additional dev tools
RUN apt-get update && apt-get install -y \
git \
vim \
&& rm -rf /var/lib/apt/lists/*
# Copy application code
COPY --chown=app:app . .
# Switch to non-root user
USER app
# Set environment for development
ENV APP_ENV=dev \
PYTHONPATH=/app \
PYTHONUNBUFFERED=1
EXPOSE 8000
# Development command with hot reload
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]

View file

@ -22,7 +22,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Install Poetry
RUN pip install --no-cache-dir poetry==1.8.2
RUN pip install --no-cache-dir poetry==2.1.4
# Configure Poetry to not create virtual environment
ENV POETRY_NO_INTERACTION=1 \
@ -36,7 +36,7 @@ COPY pyproject.toml poetry.lock ./
# Install dependencies
RUN poetry config virtualenvs.create false \
&& poetry install --only main --no-interaction --no-ansi \
&& poetry install --only main --no-root --no-interaction --no-ansi \
&& rm -rf $POETRY_CACHE_DIR
# -----------------------------------------------------------------------------

View file

@ -1,26 +1,28 @@
from datetime import datetime, timedelta
from typing import Optional
from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_membership_context
from ...core.database import get_database
from ...core.dependencies import get_current_user, require_roles
from ...core.logging import get_logger
from ...core.security import get_password_hash, verify_password
from ...models.user import User, UserRole
from ...core.security import get_password_hash
from ...models.audit_log import AuditAction, AuditLogQuery, AuditLogResponse
from ...models.user import User, UserRole
from ...schemas.auth import (
AdminStatsResponse,
ChangePasswordRequest,
CreateUserRequest,
ResetPasswordRequest,
UpdateUserRequest,
UserListResponse,
UserResponse,
)
from ...services.audit_logger import audit_logger, log_user_management, log_security_event
from ...services.audit_logger import (
audit_logger,
log_user_management,
)
from ...telemetry import app_metrics
logger = get_logger(__name__)
@ -30,29 +32,49 @@ router = APIRouter(prefix="/admin", tags=["admin"])
@router.get("/users", response_model=UserListResponse)
async def list_users(
page: int = Query(1, ge=1),
size: int = Query(20, ge=1, le=100),
role: Optional[str] = Query(None),
size: int = Query(20, ge=1, le=500),
role: str | None = Query(None, description="Single role or comma-separated list, e.g. 'linguist,admin'"),
active_only: bool = Query(True),
org_id: str | None = Query(None, description="Filter by org (platform admin only)"),
current_user: User = Depends(require_roles(UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List users with filtering and pagination (admin only)"""
query = {}
query: dict = {}
if role:
query["role"] = role
roles = [r.strip() for r in role.split(",") if r.strip()]
query["role"] = {"$in": roles} if len(roles) > 1 else roles[0]
if active_only:
query["is_active"] = True
if not ctx.is_platform_admin:
# Org-scoped admin: show only users in their org(s) via membership collection
accessible_org_ids = ctx.accessible_org_ids()
if not accessible_org_ids:
return UserListResponse(users=[], total=0, page=page, size=size)
member_ids_cursor = db.memberships.find(
{"organization_id": {"$in": accessible_org_ids}},
{"user_id": 1},
)
member_ids = [doc["user_id"] async for doc in member_ids_cursor]
query["_id"] = {"$in": member_ids}
elif org_id:
# Platform admin filtered to a specific org
member_ids_cursor = db.memberships.find({"organization_id": org_id}, {"user_id": 1})
member_ids = [doc["user_id"] async for doc in member_ids_cursor]
query["_id"] = {"$in": member_ids}
# Get total count
total = await db.users.count_documents(query)
# Get paginated results
skip = (page - 1) * size
cursor = db.users.find(query, {"hashed_password": 0}).sort("created_at", -1).skip(skip).limit(size)
users = await cursor.to_list(length=size)
user_responses = []
for user_doc in users:
user_responses.append(UserResponse(
@ -64,8 +86,9 @@ async def list_users(
is_active=user_doc["is_active"],
created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=user_doc.get("pm_client_ids", []),
languages=user_doc.get("languages", []),
))
return UserListResponse(
users=user_responses,
total=total,
@ -74,6 +97,32 @@ async def list_users(
)
@router.get("/brief-assignees", response_model=list[UserResponse])
async def list_brief_assignees(
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Return users who can be assigned a brief (PM, production, admin). Accessible to all brief-creating roles."""
docs = await db.users.find(
{
"role": {"$in": [UserRole.ADMIN.value, UserRole.PROJECT_MANAGER.value, UserRole.PRODUCTION.value]},
"is_active": True,
},
{"hashed_password": 0},
).sort("full_name", 1).to_list(None)
return [UserResponse(
id=str(d["_id"]),
email=d["email"],
full_name=d["full_name"],
role=d["role"],
auth_provider=d.get("auth_provider", "local"),
is_active=d["is_active"],
created_at=d["created_at"].isoformat() if d.get("created_at") else None,
pm_client_ids=d.get("pm_client_ids", []),
languages=d.get("languages", []),
) for d in docs]
@router.get("/users/{user_id}", response_model=UserResponse)
async def get_user(
user_id: str,
@ -87,7 +136,7 @@ async def get_user(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found"
)
return UserResponse(
id=str(user_doc["_id"]),
email=user_doc["email"],
@ -97,6 +146,7 @@ async def get_user(
is_active=user_doc["is_active"],
created_at=user_doc.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=user_doc.get("pm_client_ids", []),
languages=user_doc.get("languages", []),
)
@ -115,7 +165,7 @@ async def create_user(
status_code=status.HTTP_400_BAD_REQUEST,
detail="User with this email already exists"
)
# Create user document
user_id = str(ObjectId())
user_doc = {
@ -129,12 +179,12 @@ async def create_user(
"created_at": datetime.utcnow(),
"updated_at": datetime.utcnow()
}
await db.users.insert_one(user_doc)
# Record metrics
app_metrics.record_auth_attempt("user_created", user_data.role.value)
logger.info(f"Admin {current_user.id} created user {user_id} with role {user_data.role.value}")
await log_user_management(
AuditAction.USER_CREATE, user_id, current_user, request,
@ -150,6 +200,7 @@ async def create_user(
is_active=True,
created_at=user_doc["created_at"].isoformat(),
pm_client_ids=[],
languages=[],
)
@ -169,7 +220,7 @@ async def update_user(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found"
)
# Check if email is being changed and doesn't conflict
if user_update.email and user_update.email != user_doc["email"]:
existing_user = await db.users.find_one({"email": user_update.email, "_id": {"$ne": user_id}})
@ -178,10 +229,10 @@ async def update_user(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Email already in use by another user"
)
# Build update document
update_data = {"updated_at": datetime.utcnow()}
if user_update.email:
update_data["email"] = user_update.email
if user_update.full_name:
@ -190,19 +241,19 @@ async def update_user(
update_data["role"] = user_update.role.value
if user_update.is_active is not None:
update_data["is_active"] = user_update.is_active
# Update user
result = await db.users.find_one_and_update(
{"_id": user_id},
{"$set": update_data},
return_document=True
)
logger.info(f"Admin {current_user.id} updated user {user_id}")
action = AuditAction.USER_ROLE_CHANGE if user_update.role else AuditAction.USER_UPDATE
await log_user_management(
action, user_id, current_user, request,
details={k: v for k, v in user_update.dict(exclude_none=True).items()},
details=dict(user_update.dict(exclude_none=True).items()),
)
return UserResponse(
@ -214,6 +265,7 @@ async def update_user(
is_active=result["is_active"],
created_at=result.get("created_at", datetime.utcnow()).isoformat(),
pm_client_ids=result.get("pm_client_ids", []),
languages=result.get("languages", []),
)
@ -230,7 +282,7 @@ async def deactivate_user(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Cannot deactivate your own account"
)
result = await db.users.update_one(
{"_id": user_id},
{
@ -240,13 +292,13 @@ async def deactivate_user(
}
}
)
if result.matched_count == 0:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found"
)
logger.info(f"Admin {current_user.id} deactivated user {user_id}")
await log_user_management(AuditAction.USER_DEACTIVATE, user_id, current_user, request)
@ -264,10 +316,10 @@ async def admin_reset_password(
# Generate temporary password
import secrets
import string
temp_password = ''.join(secrets.choice(string.ascii_letters + string.digits) for _ in range(12))
hashed_password = get_password_hash(temp_password)
result = await db.users.update_one(
{"_id": user_id},
{
@ -277,15 +329,15 @@ async def admin_reset_password(
}
}
)
if result.matched_count == 0:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found"
)
logger.info(f"Admin {current_user.id} reset password for user {user_id}")
# In production, send email with temp password instead of returning it
return {
"message": "Password reset successfully",
@ -301,23 +353,23 @@ async def get_admin_stats(
"""Get system statistics (production/admin only)"""
# Get user count
total_users = await db.users.count_documents({"is_active": True})
# Get job counts
total_jobs = await db.jobs.count_documents({})
# Get jobs by status
pipeline = [
{"$group": {"_id": "$status", "count": {"$sum": 1}}}
]
status_counts = await db.jobs.aggregate(pipeline).to_list(None)
jobs_by_status = {item["_id"]: item["count"] for item in status_counts}
# Get jobs created today
today_start = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
active_jobs_today = await db.jobs.count_documents({
"created_at": {"$gte": today_start}
})
# Calculate average processing time for completed jobs
avg_processing_pipeline = [
{"$match": {"status": "completed", "created_at": {"$exists": True}, "updated_at": {"$exists": True}}},
@ -338,10 +390,10 @@ async def get_admin_stats(
}
}
]
avg_result = await db.jobs.aggregate(avg_processing_pipeline).to_list(None)
avg_processing_time = avg_result[0]["avg_processing_time"] if avg_result else 0.0
return AdminStatsResponse(
total_users=total_users,
total_jobs=total_jobs,
@ -362,7 +414,7 @@ async def detailed_health_check(
"timestamp": datetime.utcnow().isoformat(),
"components": {}
}
# Check MongoDB
try:
await db.command("ping")
@ -370,7 +422,7 @@ async def detailed_health_check(
except Exception as e:
health_status["components"]["mongodb"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded"
# Check Redis (via import to avoid circular dependency)
try:
from ...core.redis import redis_client
@ -382,23 +434,23 @@ async def detailed_health_check(
except Exception as e:
health_status["components"]["redis"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded"
# Check GCS (basic check)
try:
from ...services.gcs import gcs_service
# Simple check to see if bucket is accessible
bucket_exists = await gcs_service.file_exists("health_check_dummy") # This will return False but won't error if bucket accessible
await gcs_service.file_exists("health_check_dummy") # This will return False but won't error if bucket accessible
health_status["components"]["gcs"] = {"status": "healthy"}
except Exception as e:
health_status["components"]["gcs"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded"
# Check job queue health
try:
from ...tasks import celery_app
inspect = celery_app.control.inspect()
active_tasks = inspect.active()
if active_tasks:
total_active = sum(len(tasks) for tasks in active_tasks.values())
health_status["components"]["celery"] = {
@ -415,7 +467,7 @@ async def detailed_health_check(
except Exception as e:
health_status["components"]["celery"] = {"status": "unhealthy", "error": str(e)}
health_status["status"] = "degraded"
return health_status
@ -427,18 +479,18 @@ async def get_job_statistics(
):
"""Get job processing statistics (reviewer/production/admin only)"""
since_date = datetime.utcnow() - timedelta(days=days)
# Jobs created in period
jobs_in_period = await db.jobs.count_documents({
"created_at": {"$gte": since_date}
})
# Jobs completed in period
jobs_completed = await db.jobs.count_documents({
"status": "completed",
"updated_at": {"$gte": since_date}
})
# Average processing time for completed jobs
avg_pipeline = [
{
@ -467,12 +519,12 @@ async def get_job_statistics(
}
}
]
avg_result = await db.jobs.aggregate(avg_pipeline).to_list(None)
processing_stats = avg_result[0] if avg_result else {
"avg_time": 0, "min_time": 0, "max_time": 0
}
# Current queue status
current_queue_stats = {}
pipeline = [
@ -481,7 +533,7 @@ async def get_job_statistics(
status_counts = await db.jobs.aggregate(pipeline).to_list(None)
for item in status_counts:
current_queue_stats[item["_id"]] = item["count"]
return {
"period_days": days,
"jobs_created": jobs_in_period,
@ -506,7 +558,7 @@ async def admin_force_password_reset(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Cannot reset your own password this way"
)
# Check if user exists
user_doc = await db.users.find_one({"_id": user_id})
if not user_doc:
@ -514,15 +566,15 @@ async def admin_force_password_reset(
status_code=status.HTTP_404_NOT_FOUND,
detail="User not found"
)
# Generate secure temporary password
import secrets
import string
temp_password = ''.join(secrets.choice(
string.ascii_letters + string.digits + "!@#$%"
) for _ in range(16))
# Update password
await db.users.update_one(
{"_id": user_id},
@ -533,10 +585,10 @@ async def admin_force_password_reset(
}
}
)
# TODO: In production, send via secure email instead of returning password
logger.info(f"Admin {current_user.id} reset password for user {user_id}")
return {
"message": "Password reset successfully",
"temporary_password": temp_password,
@ -544,47 +596,6 @@ async def admin_force_password_reset(
}
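Review note: the temporary password in this handler is built by drawing 16 characters with `secrets.choice`. The following is a minimal standalone sketch of that step, not the project's code. The function name `generate_temp_password`, the `ALPHABET` constant, and the `length` parameter are illustrative assumptions; only the alphabet and the default length of 16 come from the diff.

```python
import secrets
import string

# Same alphabet as the handler in the diff: letters, digits and a few symbols.
ALPHABET = string.ascii_letters + string.digits + "!@#$%"


def generate_temp_password(length: int = 16) -> str:
    """Return a random password drawn from ALPHABET via the secrets module."""
    # secrets.choice uses the OS CSPRNG, unlike random.choice.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_temp_password())
```

The TODO in the handler still applies: a sketch like this only covers generation, and returning the value in the HTTP response is a separate concern.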
@router.get("/audit-logs")
async def get_audit_logs(
job_id: Optional[str] = Query(None),
action: Optional[str] = Query(None),
days: int = Query(7, ge=1, le=90),
page: int = Query(1, ge=1),
size: int = Query(50, ge=1, le=200),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Get audit logs with filtering (production/admin only)"""
query = {
"when": {"$gte": datetime.utcnow() - timedelta(days=days)}
}
if job_id:
query["job_id"] = job_id
if action:
query["action"] = action
# Get total count
total = await db.audit_logs.count_documents(query)
# Get paginated results
skip = (page - 1) * size
cursor = (
db.audit_logs.find(query)
.sort("when", -1)
.skip(skip)
.limit(size)
)
logs = await cursor.to_list(length=size)
return {
"logs": logs,
"total": total,
"page": page,
"size": size,
"period_days": days
}
@router.post("/maintenance/reprocess-job/{job_id}")
async def reprocess_job(
@ -600,7 +611,7 @@ async def reprocess_job(
status_code=status.HTTP_404_NOT_FOUND,
detail="Job not found"
)
# Reset job to created status for reprocessing
await db.jobs.update_one(
{"_id": job_id},
@ -620,7 +631,7 @@ async def reprocess_job(
}
}
)
# Broadcast status update
try:
from ...services.websocket import connection_manager
@ -632,36 +643,36 @@ async def reprocess_job(
)
except Exception as e:
logger.warning(f"Failed to broadcast status update for job reset {job_id}: {e}")
# Trigger ingestion task
from ...tasks.ingest_and_ai import ingest_and_ai_task
ingest_and_ai_task.delay(job_id)
logger.warning(f"Admin {current_user.id} triggered reprocessing for job {job_id}")
return {"message": f"Job {job_id} queued for reprocessing"}
@router.get("/audit-logs", response_model=AuditLogResponse)
async def get_audit_logs_detailed(
# Time range
start_date: Optional[datetime] = Query(None, description="Start date for audit logs"),
end_date: Optional[datetime] = Query(None, description="End date for audit logs"),
start_date: datetime | None = Query(None, description="Start date for audit logs"),
end_date: datetime | None = Query(None, description="End date for audit logs"),
# Filters
action: Optional[str] = Query(None, description="Filter by action type"),
severity: Optional[str] = Query(None, description="Filter by severity level"),
user_email: Optional[str] = Query(None, description="Filter by user email"),
resource_type: Optional[str] = Query(None, description="Filter by resource type"),
resource_id: Optional[str] = Query(None, description="Filter by resource ID"),
success: Optional[bool] = Query(None, description="Filter by success status"),
action: str | None = Query(None, description="Filter by action type"),
severity: str | None = Query(None, description="Filter by severity level"),
user_email: str | None = Query(None, description="Filter by user email"),
resource_type: str | None = Query(None, description="Filter by resource type"),
resource_id: str | None = Query(None, description="Filter by resource ID"),
success: bool | None = Query(None, description="Filter by success status"),
# Search
search: Optional[str] = Query(None, description="Search in description and details"),
search: str | None = Query(None, description="Search in description and details"),
# Pagination
page: int = Query(1, ge=1, description="Page number"),
size: int = Query(50, ge=1, le=500, description="Page size"),
# Pagination (skip/limit to match frontend AuditLogQuery)
skip: int = Query(0, ge=0, description="Number of records to skip"),
limit: int = Query(50, ge=1, le=500, description="Max records to return"),
# Sorting
sort_by: str = Query("timestamp", description="Field to sort by"),
@ -671,26 +682,7 @@ async def get_audit_logs_detailed(
request: Request = None,
):
"""Get audit logs with filtering and pagination (production/admin only)"""
# Log audit log access
await audit_logger.log_action(
action="admin.audit.access",
description=f"Admin {current_user.email} accessed audit logs",
user=current_user,
request=request,
details={
"filters": {
"start_date": start_date.isoformat() if start_date else None,
"end_date": end_date.isoformat() if end_date else None,
"action": action,
"severity": severity,
"user_email": user_email,
"resource_type": resource_type,
"search": search
}
}
)
# Build query
query = AuditLogQuery(
start_date=start_date,
@ -702,12 +694,12 @@ async def get_audit_logs_detailed(
resource_id=resource_id,
success=success,
search=search,
skip=(page - 1) * size,
limit=size,
skip=skip,
limit=limit,
sort_by=sort_by,
sort_order=sort_order
)
return await audit_logger.query_logs(query)
@ -716,32 +708,34 @@ async def get_user_audit_logs(
user_id: str,
days: int = Query(30, ge=1, le=365, description="Number of days to look back"),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
request: Request = None,
):
"""Get audit logs for a specific user (production/admin only)"""
# Validate user_id
try:
ObjectId(user_id)
except Exception:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Invalid user ID format"
"""Get audit logs for a specific user — accepts user ID or email (production/admin only)"""
import re as _re
# Accept email address: look up user by case-insensitive email match
resolved_id = user_id
if "@" in user_id:
user_doc = await db.users.find_one(
{"email": _re.compile(f"^{_re.escape(user_id)}$", _re.IGNORECASE)},
{"_id": 1},
)
# Log access to user audit logs
await audit_logger.log_action(
action="admin.audit.access",
description=f"Admin {current_user.email} accessed user audit logs for {user_id}",
user=current_user,
request=request,
resource_type="user",
resource_id=user_id,
details={"days_requested": days}
)
logs = await audit_logger.get_user_activity(user_id, days)
return {"logs": logs, "user_id": user_id, "days": days}
if user_doc:
resolved_id = str(user_doc["_id"])
logs = await audit_logger.get_user_activity(resolved_id, days)
# Fallback: query by email field in audit logs (case-insensitive via audit_logger)
if not logs and "@" in user_id:
from ...models.audit_log import AuditLogQuery as ALQ
from ...services.audit_logger import audit_logger as al
q = ALQ(user_email=user_id, limit=1000, sort_by="timestamp", sort_order=-1)
result = await al.query_logs(q)
logs = result.logs
return logs
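Review note: the email lookup above relies on an anchored, escaped, case-insensitive regular expression. A small sketch of why `re.escape` matters here is given below; the helper name `exact_email_pattern` and the sample addresses are hypothetical and exist only for illustration.

```python
import re


def exact_email_pattern(email: str) -> "re.Pattern[str]":
    """Compile a pattern that matches `email` as a whole string, ignoring case."""
    # re.escape neutralises regex metacharacters such as "." and "+",
    # so user input cannot widen the match.
    return re.compile(f"^{re.escape(email)}$", re.IGNORECASE)


if __name__ == "__main__":
    pattern = exact_email_pattern("Jane.Doe+test@example.com")
    print(bool(pattern.match("jane.doe+TEST@EXAMPLE.com")))
    print(bool(pattern.match("janeXdoe+test@example.com")))
```

Without the escape, the `.` in the address would match any character and `+` would act as a quantifier, so the lookup could return a different user's document.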
@router.get("/audit-logs/security")
@ -751,7 +745,7 @@ async def get_security_events(
request: Request = None,
):
"""Get recent security events (production/admin only)"""
# Log access to security events
await audit_logger.log_action(
action="admin.audit.access",
@ -760,9 +754,9 @@ async def get_security_events(
request=request,
details={"hours_requested": hours}
)
logs = await audit_logger.get_security_events(hours)
return {"logs": logs, "hours": hours}
return logs
@router.delete("/audit-logs/cleanup")
@ -772,7 +766,7 @@ async def cleanup_audit_logs(
request: Request = None,
):
"""Clean up old audit logs (admin only)"""
# Log audit cleanup action
await audit_logger.log_action(
action="admin.system.action",
@ -782,9 +776,9 @@ async def cleanup_audit_logs(
details={"retention_days": retention_days},
severity="warning"
)
deleted_count = await audit_logger.cleanup_old_logs(retention_days)
# Log cleanup completion
await audit_logger.log_action(
action="admin.system.action",
@ -796,9 +790,9 @@ async def cleanup_audit_logs(
"deleted_count": deleted_count
}
)
return {
"message": f"Deleted {deleted_count} audit logs older than {retention_days} days",
"deleted_count": deleted_count,
"retention_days": retention_days
}
}


@ -0,0 +1,295 @@
"""Admin production endpoints: failure dashboard, bulk retry, queue stats, VTT override."""
from datetime import datetime

import redis.asyncio as aioredis
from fastapi import (
    APIRouter,
    Depends,
    File,
    Form,
    HTTPException,
    Query,
    UploadFile,
    status,
)
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel

from ...core.database import get_database
from ...core.dependencies import require_roles
from ...core.logging import get_logger
from ...core.redis import get_redis
from ...models.audit_log import AuditAction
from ...models.job import JobStatus, RequestedOutputs
from ...models.user import User, UserRole
from ...schemas.job import JobResponse
from ...services.audit_logger import audit_logger
from ...services.cloud_run_dispatch import dispatch as _cr_dispatch
from ...services.gcs import upload_vtt_to_gcs

logger = get_logger(__name__)
router = APIRouter(prefix="/admin/production", tags=["admin-production"])

_FAILURE_STATUSES = [
    JobStatus.PROCESSING_FAILED.value,
    JobStatus.TTS_FAILED.value,
    JobStatus.RENDER_FAILED.value,
]
_RETRY_CAP = 50


class BulkRetryRequest(BaseModel):
    job_ids: list[str]
    strategy: str = "auto"  # "auto" | "from_scratch"


class BulkRetryResponse(BaseModel):
    retried: list[str]
    skipped: list[str]
    errors: list[dict]


@router.get("/failures", response_model=list[JobResponse])
async def list_failures(
    step: str | None = Query(None, description="Filter by failure.step"),
    org_id: str | None = Query(None, description="Filter by organization_id"),
    limit: int = Query(50, ge=1, le=200),
    skip: int = Query(0, ge=0),
    current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    """List all jobs in a failed status, optionally filtered by step and org."""
    query: dict = {"status": {"$in": _FAILURE_STATUSES}}
    if step:
        query["failure.step"] = step
    if org_id:
        query["organization_id"] = org_id

    cursor = db.jobs.find(query).sort("updated_at", -1).skip(skip).limit(limit)
    jobs = await cursor.to_list(length=limit)
    return [
        JobResponse(
            id=str(j["_id"]),
            title=j["title"],
            status=j["status"],
            source=j["source"],
            requested_outputs=RequestedOutputs(**j["requested_outputs"]),
            review=j.get("review", {"notes": "", "history": []}),
            outputs=j.get("outputs"),
            created_at=j["created_at"].isoformat(),
            updated_at=j["updated_at"].isoformat(),
        )
        for j in jobs
    ]


@router.post("/bulk-retry", response_model=BulkRetryResponse)
async def bulk_retry(
    payload: BulkRetryRequest,
    current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    """Retry up to 50 failed jobs in one call."""
    if len(payload.job_ids) > _RETRY_CAP:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"Cannot retry more than {_RETRY_CAP} jobs at once",
        )

    retried: list[str] = []
    skipped: list[str] = []
    errors: list[dict] = []
    now = datetime.utcnow()

    for job_id in payload.job_ids:
        try:
            job_doc = await db.jobs.find_one({"_id": job_id})
            if not job_doc:
                skipped.append(job_id)
                continue
            if job_doc["status"] not in _FAILURE_STATUSES:
                skipped.append(job_id)
                continue

            failure = job_doc.get("failure") or {}
            if payload.strategy == "from_scratch":
                step = "ingestion"
            else:
                step = failure.get("step")
                if not step:
                    step = "tts" if job_doc["status"] == JobStatus.TTS_FAILED.value else "render"

            if step in ("ingestion", "ai_processing"):
                reset_status = JobStatus.CREATED.value
            elif step == "translation":
                reset_status = JobStatus.AI_PROCESSING.value
            elif step == "tts":
                src = job_doc["source"].get("language", "en")
                reset_status = (
                    JobStatus.APPROVED_ENGLISH.value if src == "en" else JobStatus.APPROVED_SOURCE.value
                )
            elif step == "render":
                reset_status = JobStatus.PENDING_QC.value
            else:
                skipped.append(job_id)
                continue

            await db.jobs.update_one(
                {"_id": job_id},
                {
                    "$set": {"status": reset_status, "error": None, "updated_at": now},
                    "$inc": {"retry_count": 1},
                    "$push": {
                        "review.history": {
                            "at": now,
                            "status": f"bulk_retry_{step}",
                            "by": str(current_user.id),
                        }
                    },
                },
            )

            if step in ("ingestion", "ai_processing"):
                await _cr_dispatch("ingest", job_id)
            elif step in ("translation", "tts"):
                await _cr_dispatch("translate", job_id)
            elif step == "render":
                lang = job_doc.get("last_render_language", "en")
                await _cr_dispatch("rerender", job_id, language=lang)
            retried.append(job_id)
        except Exception as e:
            logger.error(f"bulk-retry failed for job {job_id}: {e}")
            errors.append({"job_id": job_id, "error": str(e)})

    try:
        await audit_logger.log(
            action=AuditAction.JOB_BULK_RETRY,
            user_id=str(current_user.id),
            user_email=current_user.email,
            user_role=current_user.role.value if current_user.role else None,
            resource_type="job",
            description=f"Bulk retry {len(retried)} jobs (strategy={payload.strategy})",
            details={"retried": retried, "skipped": skipped, "error_count": len(errors)},
        )
    except Exception as e:
        logger.warning(f"Failed to write bulk-retry audit log: {e}")

    return BulkRetryResponse(retried=retried, skipped=skipped, errors=errors)
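Review note: the core of `bulk_retry` is the mapping from the failed step to the status a job is reset to before redispatch. The sketch below isolates that mapping as a pure function so it can be unit-tested without MongoDB. The function name `reset_status_for` is hypothetical, and it returns `JobStatus` member names as plain strings because the enum's actual values are not shown in this diff.

```python
from typing import Optional


def reset_status_for(step: str, source_language: str = "en") -> Optional[str]:
    """Return the JobStatus member name a job is reset to before retrying `step`.

    Returns None for an unknown step, which the endpoint treats as "skipped".
    """
    if step in ("ingestion", "ai_processing"):
        return "CREATED"
    if step == "translation":
        return "AI_PROCESSING"
    if step == "tts":
        # English sources resume from APPROVED_ENGLISH, others from APPROVED_SOURCE.
        return "APPROVED_ENGLISH" if source_language == "en" else "APPROVED_SOURCE"
    if step == "render":
        return "PENDING_QC"
    return None


if __name__ == "__main__":
    for s in ("ingestion", "translation", "tts", "render", "unknown"):
        print(s, reset_status_for(s))
```

Extracting the mapping this way would also make the fallback visible in tests: when `failure.step` is missing, the endpoint assumes `tts` for `TTS_FAILED` and `render` for every other failure status.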
# ---------------------------------------------------------------------------
# PR-7: Queue depth stats
# ---------------------------------------------------------------------------

_CELERY_QUEUES = ["default", "ingest", "tts", "render", "ffmpeg", "whisper", "notify", "embed"]


class QueueStats(BaseModel):
    queues: dict[str, int]  # queue_name → pending task count
    total_pending: int


@router.get("/queue-stats", response_model=QueueStats)
async def get_queue_stats(
    current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
    redis: aioredis.Redis = Depends(get_redis),
):
    """Return pending task counts per Celery queue (via Redis LLEN)."""
    counts: dict[str, int] = {}
    for q in _CELERY_QUEUES:
        try:
            n = await redis.llen(q)
            counts[q] = n
        except Exception:
            counts[q] = 0
    return QueueStats(queues=counts, total_pending=sum(counts.values()))
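Review note: the endpoint counts pending tasks by calling `LLEN` on a list named after each queue. The sketch below reproduces that loop against an in-memory stand-in so the behaviour can be checked without a Redis server. `FakeRedis` and `queue_depths` are hypothetical names; the fake implements only the single async `llen` call used here and is not part of any real client library.

```python
import asyncio


class FakeRedis:
    """In-memory stand-in exposing only an async `llen` (illustrative, not a real client)."""

    def __init__(self, lists: dict) -> None:
        self._lists = lists

    async def llen(self, key: str) -> int:
        # Like Redis LLEN, a missing key is reported as length 0.
        return len(self._lists.get(key, []))


async def queue_depths(redis, queues: list) -> dict:
    """Return {queue_name: pending_count}, recording 0 when a lookup fails."""
    counts = {}
    for name in queues:
        try:
            counts[name] = await redis.llen(name)
        except Exception:
            counts[name] = 0
    return counts


if __name__ == "__main__":
    fake = FakeRedis({"ingest": ["a", "b"], "tts": ["c"]})
    print(asyncio.run(queue_depths(fake, ["default", "ingest", "tts"])))
```

One design point worth raising in review: because errors are recorded as 0, a Redis outage is indistinguishable from an empty queue in the response.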
# ---------------------------------------------------------------------------
# PR-8: Upload final VTT override — bypass AI, jump to PENDING_QC
# ---------------------------------------------------------------------------

_BYPASSABLE_STATUSES = {
    JobStatus.CREATED.value,
    JobStatus.INGESTING.value,
    JobStatus.AI_PROCESSING.value,
    JobStatus.PROCESSING_FAILED.value,
    JobStatus.TTS_FAILED.value,
    JobStatus.RENDER_FAILED.value,
}


@router.post("/jobs/{job_id}/upload-final-vtt")
async def upload_final_vtt(
    job_id: str,
    language: str = Form(..., description="BCP-47 language code, e.g. 'en' or 'fr'"),
    vtt_file: UploadFile = File(..., description="WebVTT (.vtt) file"),
    vtt_type: str = Form("captions", description="'captions' or 'ad'"),
    current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    """Upload a hand-crafted VTT to override AI output and advance job to PENDING_QC."""
    job_doc = await db.jobs.find_one({"_id": job_id})
    if not job_doc:
        raise HTTPException(status_code=404, detail="Job not found")

    if job_doc["status"] not in _BYPASSABLE_STATUSES:
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail=f"Cannot override VTT when job is in status '{job_doc['status']}'. "
            f"Only allowed in: {sorted(_BYPASSABLE_STATUSES)}",
        )

    if not vtt_file.filename or not vtt_file.filename.endswith(".vtt"):
        raise HTTPException(status_code=400, detail="File must be a .vtt file")

    vtt_content = (await vtt_file.read()).decode("utf-8")
    if not vtt_content.strip().startswith("WEBVTT"):
        raise HTTPException(status_code=400, detail="File does not start with WEBVTT header")

    if vtt_type not in ("captions", "ad"):
        raise HTTPException(status_code=400, detail="vtt_type must be 'captions' or 'ad'")

    lang_key = language.replace("-", "_")
    field = "captions_vtt_gcs" if vtt_type == "captions" else "ad_vtt_gcs"
    gcs_path = f"{job_id}/{lang_key}/{vtt_type}.vtt"
    gcs_uri = await upload_vtt_to_gcs(vtt_content, gcs_path)

    now = datetime.utcnow()
    await db.jobs.update_one(
        {"_id": job_id},
        {
            "$set": {
                f"outputs.{lang_key}.{field}": gcs_uri,
                "status": JobStatus.PENDING_QC.value,
                "updated_at": now,
            },
            "$push": {
                "review.history": {
                    "at": now,
                    "status": "manual_vtt_upload",
                    "by": str(current_user.id),
                    "note": f"Manual {vtt_type} VTT upload for {language} by {current_user.email}",
                }
            },
        },
    )

    try:
        await audit_logger.log(
            action=AuditAction.VTT_EDIT,
            user_id=str(current_user.id),
            user_email=current_user.email,
            user_role=current_user.role.value if current_user.role else None,
            resource_type="job",
            resource_id=job_id,
            description=f"Manual {vtt_type} VTT upload for {language} — job advanced to PENDING_QC",
        )
    except Exception as e:
        logger.warning(f"Failed to write upload-final-vtt audit log: {e}")

    return {"status": "ok", "gcs_uri": gcs_uri, "job_status": JobStatus.PENDING_QC.value}
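Review note: the three input checks in `upload_final_vtt` (file extension, `WEBVTT` header, `vtt_type`) can be expressed as one pure function, which makes the edge cases easier to test. This is a sketch under that assumption; the name `vtt_upload_error` is hypothetical, and the messages are copied from the handler.

```python
from typing import Optional


def vtt_upload_error(filename: Optional[str], content: str, vtt_type: str) -> Optional[str]:
    """Return the first validation error message, or None if the upload looks valid."""
    if not filename or not filename.endswith(".vtt"):
        return "File must be a .vtt file"
    # str.strip() removes whitespace only; a leading byte order mark survives it.
    if not content.strip().startswith("WEBVTT"):
        return "File does not start with WEBVTT header"
    if vtt_type not in ("captions", "ad"):
        return "vtt_type must be 'captions' or 'ad'"
    return None


if __name__ == "__main__":
    print(vtt_upload_error("final.vtt", "WEBVTT\n\n00:00.000 --> 00:01.000\nHi\n", "captions"))
    print(vtt_upload_error("final.srt", "WEBVTT\n", "captions"))
```

Two things this makes easy to check: a file saved with a UTF-8 byte order mark decodes to a string starting with `\ufeff`, which fails the header check as written, and bytes that are not valid UTF-8 would raise in `.decode("utf-8")` before any of these checks run.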


@ -1,112 +1,126 @@
import re
import secrets
from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, Request, Response, status
from fastapi.security import HTTPBearer
from motor.motor_asyncio import AsyncIOMotorClient, AsyncIOMotorDatabase
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.config import settings
from ...core.database import get_database
from ...core.logging import get_logger
from ...core.security import (
create_access_token,
create_refresh_token,
decode_token,
verify_password,
)
from ...models.user import User, AuthProvider, UserRole
from ...models.audit_log import AuditAction, AuditLogSeverity
from ...models.user import AuthProvider, User, UserRole
from ...schemas.auth import (
LoginRequest,
LoginResponse,
LogoutResponse,
RefreshResponse,
MicrosoftLoginRequest,
MicrosoftLoginResponse,
RefreshResponse,
)
from ...services.audit_logger import audit_logger, log_auth_failure, log_auth_success
from ...services.microsoft_auth import (
get_microsoft_auth_service,
MicrosoftTokenValidationError,
MicrosoftAuthError,
MicrosoftTokenValidationError,
get_microsoft_auth_service,
)
from ...services.audit_logger import log_auth_success, log_auth_failure, audit_logger
from ...models.audit_log import AuditAction, AuditLogSeverity
logger = get_logger(__name__)
router = APIRouter(prefix="/auth", tags=["auth"])
security = HTTPBearer()
async def _get_user_org_ids(user_id: str, db: AsyncIOMotorDatabase) -> list[str]:
    """Return list of org IDs the user belongs to — used as a JWT hint only."""
    cursor = db.memberships.find({"user_id": user_id}, {"organization_id": 1})
    memberships = await cursor.to_list(length=200)
    return [str(m["organization_id"]) for m in memberships if m.get("organization_id")]


def _set_auth_cookies(response: Response, refresh_token: str) -> str:
    """Set httponly refresh_token cookie and readable csrf_token cookie. Returns the csrf token."""
    csrf_token = secrets.token_hex(32)
    ttl = settings.jwt_refresh_ttl_days * 24 * 60 * 60
    domain = settings.cookie_domain if settings.app_env == "prod" else None
    response.set_cookie(
        key="refresh_token",
        value=refresh_token,
        httponly=True,
        secure=settings.cookie_secure,
        samesite=settings.cookie_samesite,
        domain=domain,
        max_age=ttl,
    )
    response.set_cookie(
        key="csrf_token",
        value=csrf_token,
        httponly=False,  # JS-readable for Double Submit Cookie pattern
        secure=settings.cookie_secure,
        samesite=settings.cookie_samesite,
        domain=domain,
        max_age=ttl,
    )
    return csrf_token
@router.post("/login", response_model=LoginResponse)
async def login(
login_data: LoginRequest,
request: Request,
response: Response,
db: AsyncIOMotorDatabase = Depends(get_database),
):
print(f"LOGIN: Starting login for {login_data.email}")
# Create database connection directly (bypass dependency injection issues)
client = AsyncIOMotorClient(settings.mongodb_uri)
db = client[settings.mongodb_db]
try:
print("LOGIN: Database connection created")
# Find user by email
print("LOGIN: Looking up user in database")
user_doc = await db.users.find_one({"email": login_data.email})
print(f"LOGIN: User lookup complete, found: {user_doc is not None}")
if not user_doc:
await log_auth_failure(login_data.email, request, "User not found")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
user = User(**user_doc)
# Check if user uses Microsoft authentication
if user.auth_provider == AuthProvider.MICROSOFT:
await log_auth_failure(login_data.email, request, "Account uses Microsoft SSO")
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="This account uses Microsoft authentication. Please sign in with Microsoft.",
)
# Verify password
if not user.hashed_password or not verify_password(login_data.password, user.hashed_password):
await log_auth_failure(login_data.email, request, "Invalid password")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
if not user.is_active:
await log_auth_failure(login_data.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
# Create tokens
access_token = create_access_token(subject=str(user.id))
refresh_token = create_refresh_token(subject=str(user.id))
# Set refresh token as HttpOnly cookie
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
user_doc = await db.users.find_one({"email": login_data.email})
if not user_doc:
await log_auth_failure(login_data.email, request, "User not found")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
await log_auth_success(user, request)
return LoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role,
user = User(**user_doc)
if user.auth_provider == AuthProvider.MICROSOFT:
await log_auth_failure(login_data.email, request, "Account uses Microsoft SSO")
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="This account uses Microsoft authentication. Please sign in with Microsoft.",
)
finally:
# Close database connection
client.close()
if not user.hashed_password or not verify_password(login_data.password, user.hashed_password):
await log_auth_failure(login_data.email, request, "Invalid password")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
)
if not user.is_active:
await log_auth_failure(login_data.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
org_ids = await _get_user_org_ids(str(user.id), db)
access_token = create_access_token(subject=str(user.id), org_ids=org_ids)
refresh_token = create_refresh_token(subject=str(user.id))
_set_auth_cookies(response, refresh_token)
await log_auth_success(user, request)
return LoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role,
)
@router.post("/microsoft", response_model=MicrosoftLoginResponse)
@ -114,127 +128,84 @@ async def microsoft_login(
login_data: MicrosoftLoginRequest,
request: Request,
response: Response,
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Authenticate user with Microsoft ID token.
This endpoint validates the Microsoft ID token, finds or creates the user,
and returns JWT tokens for API access.
"""
print(f"MICROSOFT LOGIN: Starting Microsoft authentication")
# Create database connection
client = AsyncIOMotorClient(settings.mongodb_uri)
db = client[settings.mongodb_db]
microsoft_auth = get_microsoft_auth_service()
try:
# Validate Microsoft token
microsoft_auth = get_microsoft_auth_service()
try:
user_info = microsoft_auth.validate_token(login_data.id_token)
print(f"MICROSOFT LOGIN: Token validated for {user_info.email}")
except MicrosoftTokenValidationError as e:
print(f"MICROSOFT LOGIN ERROR: Token validation failed: {e}")
await log_auth_failure(login_data.id_token[:20] + "", request, f"MS token invalid: {e}")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail=f"Microsoft authentication failed: {str(e)}",
)
except MicrosoftAuthError as e:
print(f"MICROSOFT LOGIN ERROR: Authentication error: {e}")
await log_auth_failure("microsoft-sso", request, f"MS auth service error: {e}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="Microsoft authentication service error",
)
user_info = await microsoft_auth.validate_token(login_data.id_token)
except MicrosoftTokenValidationError as e:
await log_auth_failure(login_data.id_token[:20] + "", request, f"MS token invalid: {e}")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail=f"Microsoft authentication failed: {str(e)}",
) from None
except MicrosoftAuthError as e:
await log_auth_failure("microsoft-sso", request, f"MS auth service error: {e}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="Microsoft authentication service error",
) from None
# Find or create user
# Look up by Microsoft-derived ID first — handles email casing changes across logins
# (Microsoft can return vadymsamoilenko@... vs VadymSamoilenko@... for the same user)
ms_user_id = f"ms-{user_info.sub[:20]}"
user_doc = await db.users.find_one({"_id": ms_user_id})
if not user_doc:
# Fall back to case-insensitive email lookup (handles local-to-Microsoft migration)
user_doc = await db.users.find_one(
{"email": {"$regex": f"^{re.escape(user_info.email)}$", "$options": "i"}}
)
if user_doc:
# User exists
user = User(**user_doc)
print(f"MICROSOFT LOGIN: Existing user found: {user.id}")
# Update auth_provider if user is switching from local to Microsoft
if user.auth_provider == AuthProvider.LOCAL:
print(f"MICROSOFT LOGIN: Updating user to Microsoft auth provider")
await db.users.update_one(
{"_id": user_doc["_id"]},
{
"$set": {
"auth_provider": AuthProvider.MICROSOFT.value,
"updated_at": datetime.utcnow()
}
}
)
user.auth_provider = AuthProvider.MICROSOFT
else:
# Create new user with zero org memberships (SaaS model).
# They will see a "no access" landing until an admin invites them.
print(f"MICROSOFT LOGIN: Creating new user for {user_info.email}")
new_user = {
"_id": ms_user_id,
"email": user_info.email,
"full_name": user_info.name,
"hashed_password": None,
"role": UserRole.CLIENT.value,
"auth_provider": AuthProvider.MICROSOFT.value,
"is_active": True,
"pm_client_ids": [],
"created_at": datetime.utcnow(),
"updated_at": datetime.utcnow(),
}
await db.users.insert_one(new_user)
user = User(**new_user)
print(f"MICROSOFT LOGIN: New user created (zero memberships): {user.id}")
# Check if user is active
if not user.is_active:
await log_auth_failure(user.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
# Create JWT tokens
access_token = create_access_token(subject=str(user.id))
refresh_token = create_refresh_token(subject=str(user.id))
# Set refresh token as HttpOnly cookie
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
# Look up by Microsoft-derived ID first — handles email casing changes across logins
ms_user_id = f"ms-{user_info.sub[:20]}"
user_doc = await db.users.find_one({"_id": ms_user_id})
if not user_doc:
# Fall back to case-insensitive email lookup (handles local-to-Microsoft migration)
user_doc = await db.users.find_one(
{"email": {"$regex": f"^{re.escape(user_info.email)}$", "$options": "i"}}
)
print(f"MICROSOFT LOGIN: Authentication successful for {user.email}")
await log_auth_success(user, request)
return MicrosoftLoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role if isinstance(user.role, str) else user.role.value,
email=user.email,
full_name=user.full_name,
auth_provider=user.auth_provider,
if user_doc:
user = User(**user_doc)
if user.auth_provider == AuthProvider.LOCAL:
await db.users.update_one(
{"_id": user_doc["_id"]},
{"$set": {"auth_provider": AuthProvider.MICROSOFT.value, "updated_at": datetime.utcnow()}},
)
user.auth_provider = AuthProvider.MICROSOFT
else:
new_user = {
"_id": ms_user_id,
"email": user_info.email,
"full_name": user_info.name,
"hashed_password": None,
"role": UserRole.CLIENT.value,
"auth_provider": AuthProvider.MICROSOFT.value,
"is_active": True,
"pm_client_ids": [],
"created_at": datetime.utcnow(),
"updated_at": datetime.utcnow(),
}
await db.users.insert_one(new_user)
user = User(**new_user)
if not user.is_active:
await log_auth_failure(user.email, request, "Account disabled")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User account is disabled",
)
finally:
# Close database connection
client.close()
org_ids = await _get_user_org_ids(str(user.id), db)
access_token = create_access_token(subject=str(user.id), org_ids=org_ids)
refresh_token = create_refresh_token(subject=str(user.id))
_set_auth_cookies(response, refresh_token)
await log_auth_success(user, request)
return MicrosoftLoginResponse(
access_token=access_token,
user_id=str(user.id),
role=user.role if isinstance(user.role, str) else user.role.value,
email=user.email,
full_name=user.full_name,
auth_provider=user.auth_provider,
)
@router.post("/refresh", response_model=RefreshResponse)
@ -244,29 +215,32 @@ async def refresh_token(
db: AsyncIOMotorDatabase = Depends(get_database),
):
refresh_token = request.cookies.get("refresh_token")
print(f"🔍 REFRESH DEBUG: Cookie exists: {bool(refresh_token)}")
if not refresh_token:
print("🚨 REFRESH ERROR: No refresh token in cookies")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Refresh token not found",
)
# CSRF protection: Double Submit Cookie pattern
csrf_cookie = request.cookies.get("csrf_token")
csrf_header = request.headers.get("X-CSRF-Token")
if csrf_cookie and (not csrf_header or csrf_header != csrf_cookie):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="CSRF token mismatch",
)
try:
print(f"🔍 REFRESH DEBUG: Attempting to decode token...")
payload = decode_token(refresh_token)
print(f"🔍 REFRESH DEBUG: Token decoded successfully, type={payload.get('type')}")
if payload.get("type") != "refresh":
print(f"🚨 REFRESH ERROR: Wrong token type: {payload.get('type')}")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid token type",
)
user_id = payload.get("sub")
print(f"🔍 REFRESH DEBUG: User ID from token: {user_id}")
if not user_id:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
@ -288,22 +262,15 @@ async def refresh_token(
detail="User account is disabled",
)
# Create new tokens
new_access_token = create_access_token(subject=user_id)
# Create new tokens (include org_ids claim for prefilter hint)
_org_ids = await _get_user_org_ids(user_id, db)
new_access_token = create_access_token(subject=user_id, org_ids=_org_ids)
new_refresh_token = create_refresh_token(subject=user_id)
# Update refresh token cookie
response.set_cookie(
key="refresh_token",
value=new_refresh_token,
httponly=True,
secure=settings.cookie_secure,
samesite=settings.cookie_samesite,
domain=settings.cookie_domain if settings.app_env == "prod" else None,
max_age=settings.jwt_refresh_ttl_days * 24 * 60 * 60,
)
# Rotate both refresh and CSRF cookies
_set_auth_cookies(response, new_refresh_token)
print(f"🔍 REFRESH DEBUG: Refresh successful for user {user_id}")
logger.info("Token refresh successful for user %s", user_id)
return RefreshResponse(
access_token=new_access_token,
user_id=user_id,
@ -312,14 +279,15 @@ async def refresh_token(
full_name=user.full_name
)
except HTTPException:
raise
except Exception as e:
print(f"🚨 REFRESH ERROR: Exception during refresh: {type(e).__name__}: {e}")
import traceback
print(f"Traceback:\n{traceback.format_exc()}")
logger.exception("Refresh token error: %s\n%s", type(e).__name__, traceback.format_exc())
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail=f"Invalid refresh token: {str(e)}",
)
detail="Invalid refresh token",
) from None
@router.post("/logout", response_model=LogoutResponse)
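
Review note on the refresh hunk above: the handler now returns a fixed detail string and raises `from None`, so the client no longer sees the text of the underlying exception. A minimal sketch of that pattern, using only the standard library. `ApiError` and `decode_or_fail` are hypothetical stand-ins for `HTTPException` and `decode_token`, not the project's code.

```python
class ApiError(Exception):
    """Hypothetical stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def decode_or_fail(token: str) -> dict:
    """Hypothetical decoder that fails with an internal, sensitive message."""
    raise ValueError(f"signature mismatch on token {token!r}")


def refresh(token: str) -> dict:
    try:
        return decode_or_fail(token)
    except ApiError:
        # Deliberate API errors pass through unchanged.
        raise
    except Exception:
        # Generic detail only; `from None` suppresses the chained context
        # so the internal message is not displayed with the new error.
        raise ApiError(401, "Invalid refresh token") from None
```

With `from None`, the new exception has `__cause__` set to `None` and `__suppress_context__` set to `True`, so the original `ValueError` is not shown when the error is rendered.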


@ -0,0 +1,245 @@
"""Job Brief CRUD endpoints."""
from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, assert_user_in_org, get_membership_context
from ...core.database import get_database
from ...core.logging import get_logger
from ...models.audit_log import AuditAction
from ...models.job_brief import (
BriefStatus,
JobBriefCreate,
JobBriefResponse,
JobBriefUpdate,
)
from ...models.organization import OrgRole
from ...services.audit_logger import audit_logger
logger = get_logger(__name__)
router = APIRouter(prefix="/briefs", tags=["briefs"])
def _doc_to_response(doc: dict) -> JobBriefResponse:
return JobBriefResponse(
id=str(doc["_id"]),
organization_id=doc["organization_id"],
project_id=doc.get("project_id"),
title=doc["title"],
description=doc.get("description"),
requested_outputs=doc["requested_outputs"],
languages=doc.get("languages", []),
deadline=doc.get("deadline"),
status=doc["status"],
created_by=doc["created_by"],
assignee_id=doc.get("assignee_id"),
job_id=doc.get("job_id"),
created_at=doc["created_at"].isoformat(),
updated_at=doc["updated_at"].isoformat(),
submitted_at=doc["submitted_at"].isoformat() if doc.get("submitted_at") else None,
approved_by=doc.get("approved_by"),
)
@router.get("", response_model=list[JobBriefResponse])
async def list_briefs(
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
org_ids = [m.organization_id for m in ctx.memberships] if hasattr(ctx, "memberships") else []
if ctx.is_platform_admin:
query: dict = {}
elif org_ids:
query = {"organization_id": {"$in": org_ids}}
else:
raise HTTPException(status_code=403, detail="No org memberships")
cursor = db.job_briefs.find(query).sort("created_at", -1).limit(100)
docs = await cursor.to_list(length=100)
return [_doc_to_response(d) for d in docs]
@router.post("", response_model=JobBriefResponse, status_code=status.HTTP_201_CREATED)
async def create_brief(
payload: JobBriefCreate,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
# Resolve org from project if not directly identifiable
org_id: str | None = None
if payload.project_id:
project = await db.projects.find_one({"_id": payload.project_id}, {"client_id": 1})
if project:
org_id = project.get("client_id")
if not org_id:
# Fall back to the caller's only manageable org; platform admins must name a project
if ctx.is_platform_admin:
raise HTTPException(status_code=400, detail="Admin must supply project_id; organization cannot be inferred")
memberships = [m for m in (ctx.memberships if hasattr(ctx, "memberships") else [])
if ctx.can_access_org(m.organization_id, OrgRole.MANAGER)]
if len(memberships) == 1:
org_id = memberships[0].organization_id
else:
raise HTTPException(status_code=400, detail="Cannot infer organization; supply project_id")
assert_user_in_org(ctx, org_id, OrgRole.MANAGER)
now = datetime.utcnow()
doc = {
"_id": f"brief_{now.strftime('%Y%m%d%H%M%S%f')}_{str(ctx.user.id)[-6:]}",
"organization_id": org_id,
"project_id": payload.project_id,
"title": payload.title,
"description": payload.description,
"requested_outputs": payload.requested_outputs.model_dump(),
"languages": payload.languages,
"deadline": payload.deadline,
"assignee_id": payload.assignee_id,
"status": BriefStatus.DRAFT.value,
"created_by": str(ctx.user.id),
"job_id": None,
"created_at": now,
"updated_at": now,
"submitted_at": None,
"approved_by": None,
}
await db.job_briefs.insert_one(doc)
await audit_logger.log_action(
action=AuditAction.BRIEF_CREATE,
description=f"Brief '{payload.title}' created",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=str(doc["_id"]),
details={"title": payload.title, "organization_id": org_id},
)
return _doc_to_response(doc)
@router.get("/{brief_id}", response_model=JobBriefResponse)
async def get_brief(
brief_id: str,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.VIEWER)
return _doc_to_response(doc)
@router.patch("/{brief_id}", response_model=JobBriefResponse)
async def update_brief(
brief_id: str,
payload: JobBriefUpdate,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.MANAGER)
if doc["status"] != BriefStatus.DRAFT.value:
raise HTTPException(status_code=400, detail="Only DRAFT briefs can be updated")
updates: dict = {"updated_at": datetime.utcnow()}
if payload.title is not None:
updates["title"] = payload.title
if payload.description is not None:
updates["description"] = payload.description
if payload.requested_outputs is not None:
updates["requested_outputs"] = payload.requested_outputs.model_dump()
if payload.languages is not None:
updates["languages"] = payload.languages
if payload.deadline is not None:
updates["deadline"] = payload.deadline
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{"$set": updates},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_UPDATE,
description=f"Brief '{brief_id}' updated",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"fields_updated": list(updates.keys())},
)
return _doc_to_response(result)
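
Review note on `update_brief`: the `$set` document is built field by field, skipping any payload field that is `None`. The sketch below shows the same construction in isolation. `BriefPatch` is a hypothetical stand-in for `JobBriefUpdate` with only three of its fields, and `build_set_document` is an illustrative name, not a function in the diff.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import Optional


@dataclass
class BriefPatch:
    """Hypothetical stand-in for JobBriefUpdate (subset of fields)."""

    title: Optional[str] = None
    description: Optional[str] = None
    languages: Optional[list] = None


def build_set_document(patch: BriefPatch, now: datetime) -> dict:
    """Collect only the fields the caller actually supplied."""
    updates: dict = {"updated_at": now}
    for f in fields(patch):
        value = getattr(patch, f.name)
        if value is not None:  # None means "not provided", as in the handler
            updates[f.name] = value
    return updates


now = datetime(2026, 5, 14, tzinfo=timezone.utc)
print(build_set_document(BriefPatch(title="New title"), now))
```

One consequence of the `is not None` test is that a client cannot clear a field (for example `deadline`) by sending `null`; an empty list, by contrast, is still written.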
@router.post("/{brief_id}/submit", response_model=JobBriefResponse)
async def submit_brief(
brief_id: str,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.MANAGER)
if doc["status"] != BriefStatus.DRAFT.value:
raise HTTPException(status_code=400, detail="Only DRAFT briefs can be submitted")
now = datetime.utcnow()
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{"$set": {"status": BriefStatus.SUBMITTED.value, "submitted_at": now, "updated_at": now}},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_SUBMIT,
description=f"Brief '{brief_id}' submitted for review",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"organization_id": result.get("organization_id")},
)
return _doc_to_response(result)
@router.post("/{brief_id}/approve", response_model=JobBriefResponse)
async def approve_brief(
brief_id: str,
http_request: Request,
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
doc = await db.job_briefs.find_one({"_id": brief_id})
if not doc:
raise HTTPException(status_code=404, detail="Brief not found")
assert_user_in_org(ctx, doc["organization_id"], OrgRole.ADMIN)
if doc["status"] != BriefStatus.SUBMITTED.value:
raise HTTPException(status_code=400, detail="Only SUBMITTED briefs can be approved")
now = datetime.utcnow()
result = await db.job_briefs.find_one_and_update(
{"_id": brief_id},
{
"$set": {
"status": BriefStatus.APPROVED.value,
"approved_by": str(ctx.user.id),
"updated_at": now,
}
},
return_document=True,
)
await audit_logger.log_action(
action=AuditAction.BRIEF_APPROVE,
description=f"Brief '{brief_id}' approved",
user=ctx.user,
request=http_request,
resource_type="brief",
resource_id=brief_id,
details={"organization_id": result.get("organization_id")},
)
return _doc_to_response(result)
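
Review note on the brief lifecycle: `submit_brief` and `approve_brief` together imply a DRAFT → SUBMITTED → APPROVED flow, checked by reading the document and then updating by `_id` alone. The sketch below states the transitions explicitly and shows one possible way to make the update conditional on the expected status, so two concurrent requests cannot both pass the check. The enum values and both helper names are assumptions; the diff shows only the member names.

```python
from enum import Enum


class BriefStatus(str, Enum):
    # Values are assumptions; the diff only shows the member names.
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"


# Transitions implied by submit_brief and approve_brief.
_TRANSITIONS = {
    BriefStatus.DRAFT: {BriefStatus.SUBMITTED},
    BriefStatus.SUBMITTED: {BriefStatus.APPROVED},
    BriefStatus.APPROVED: set(),
}


def can_transition(current: str, target: BriefStatus) -> bool:
    """True if a brief in `current` status may move to `target`."""
    return target in _TRANSITIONS[BriefStatus(current)]


def guarded_filter(brief_id: str, expected: BriefStatus) -> dict:
    """Query filter that matches only if the brief is still in `expected` status."""
    return {"_id": brief_id, "status": expected.value}
```

Passing `guarded_filter(brief_id, BriefStatus.DRAFT)` as the filter to `find_one_and_update` would make the call return `None` when another request has already changed the status, which the handler could report as a conflict.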


@ -9,15 +9,16 @@ Access rules:
- List projects (read) Admin, PM, or any team member of the client
"""
from datetime import datetime, timezone
from datetime import UTC, datetime
from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException
from fastapi import APIRouter, Depends, HTTPException, Request
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel
from ...core.database import get_database
from ...core.dependencies import get_current_user, require_pm_for_client, require_roles
from ...core.dependencies import get_current_user, require_roles
from ...models.audit_log import AuditAction
from ...models.client import (
Client,
ClientCreate,
@ -30,6 +31,7 @@ from ...models.client import (
TeamUpdate,
)
from ...models.user import User, UserRole
from ...services.audit_logger import audit_logger
router = APIRouter(prefix="/clients", tags=["clients"])
@ -39,7 +41,7 @@ router = APIRouter(prefix="/clients", tags=["clients"])
# ---------------------------------------------------------------------------
def _now() -> datetime:
return datetime.now(timezone.utc)
return datetime.now(UTC)
async def _get_client_or_404(client_id: str, db: AsyncIOMotorDatabase) -> dict:
@ -91,6 +93,9 @@ def _project_from_doc(doc: dict) -> Project:
name=doc["name"],
client_id=doc["client_id"],
is_active=doc.get("is_active", True),
default_languages=doc.get("default_languages", []),
default_linguist_id=doc.get("default_linguist_id"),
default_reviewer_id=doc.get("default_reviewer_id"),
created_at=doc.get("created_at"),
updated_at=doc.get("updated_at"),
)
@ -118,6 +123,7 @@ async def list_clients(
@router.post("", response_model=Client)
async def create_client(
body: ClientCreate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -134,7 +140,18 @@ async def create_client(
"updated_at": now,
})
doc = await db.clients.find_one({"_id": client_id})
return _client_from_doc(doc)
client = _client_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_CREATE,
description=f"Client '{client.name}' created",
user=current_user,
request=request,
resource_type="client",
resource_id=str(client.id),
resource_name=client.name,
details={"slug": client.slug},
)
return client
@router.get("/{client_id}", response_model=Client)
@ -155,11 +172,12 @@ async def get_client(
async def update_client(
client_id: str,
body: ClientUpdate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
update: dict = {k: v for k, v in body.model_dump(exclude_none=True).items()}
update: dict = dict(body.model_dump(exclude_none=True).items())
if not update:
raise HTTPException(status_code=422, detail="No fields to update")
if "slug" in update and await db.clients.find_one({"slug": update["slug"], "_id": {"$ne": client_id}}):
@ -167,17 +185,39 @@ async def update_client(
update["updated_at"] = _now()
await db.clients.update_one({"_id": client_id}, {"$set": update})
doc = await db.clients.find_one({"_id": client_id})
return _client_from_doc(doc)
client = _client_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_UPDATE,
description=f"Client '{client.name}' updated",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client.name,
details={"fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return client
@router.delete("/{client_id}", status_code=204)
async def deactivate_client(
client_id: str,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
doc = await _get_client_or_404(client_id, db)
await db.clients.update_one({"_id": client_id}, {"$set": {"is_active": False, "updated_at": _now()}})
await audit_logger.log_action(
action=AuditAction.CLIENT_DEACTIVATE,
description=f"Client '{doc['name']}' deactivated",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=doc["name"],
details={"was_active": doc.get("is_active", True)},
)
# ---------------------------------------------------------------------------
@ -192,10 +232,11 @@ class AssignPMRequest(BaseModel):
async def assign_pm(
client_id: str,
body: AssignPMRequest,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
client_doc = await _get_client_or_404(client_id, db)
user_doc = await db.users.find_one({"_id": body.user_id})
if not user_doc:
raise HTTPException(status_code=404, detail="User not found")
@ -206,16 +247,28 @@ async def assign_pm(
"$set": {"role": UserRole.PROJECT_MANAGER.value, "updated_at": _now()},
},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_PM_ASSIGN,
description=f"PM '{user_doc.get('email', body.user_id)}' assigned to client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"pm_user_id": body.user_id, "pm_email": user_doc.get("email")},
)
@router.delete("/{client_id}/pm/{user_id}", status_code=204)
async def remove_pm(
client_id: str,
user_id: str,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
client_doc = await _get_client_or_404(client_id, db)
pm_doc = await db.users.find_one({"_id": user_id})
await db.users.update_one(
{"_id": user_id},
{"$pull": {"pm_client_ids": client_id}, "$set": {"updated_at": _now()}},
@ -227,6 +280,16 @@ async def remove_pm(
{"_id": user_id},
{"$set": {"role": UserRole.CLIENT.value, "updated_at": _now()}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_PM_REMOVE,
description=f"PM '{pm_doc.get('email', user_id) if pm_doc else user_id}' removed from client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"pm_user_id": user_id, "pm_email": pm_doc.get("email") if pm_doc else None},
)
@router.get("/{client_id}/pm", response_model=list[dict])
@ -263,10 +326,11 @@ async def list_teams(
async def create_team(
client_id: str,
body: TeamCreate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
client_doc = await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
now = _now()
team_id = str(ObjectId())
@ -279,7 +343,18 @@ async def create_team(
"updated_at": now,
})
doc = await db.teams.find_one({"_id": team_id})
return _team_from_doc(doc)
team = _team_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_CREATE,
description=f"Team '{team.name}' created for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team.name},
)
return team
@router.patch("/{client_id}/teams/{team_id}", response_model=Team)
@ -287,32 +362,55 @@ async def update_team(
client_id: str,
team_id: str,
body: TeamUpdate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
client_doc = await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
await _get_team_or_404(team_id, client_id, db)
update = {k: v for k, v in body.model_dump(exclude_none=True).items()}
update = dict(body.model_dump(exclude_none=True).items())
if not update:
raise HTTPException(status_code=422, detail="No fields to update")
update["updated_at"] = _now()
await db.teams.update_one({"_id": team_id}, {"$set": update})
doc = await db.teams.find_one({"_id": team_id})
return _team_from_doc(doc)
team = _team_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_UPDATE,
description=f"Team '{team.name}' updated for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team.name, "fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return team
@router.delete("/{client_id}/teams/{team_id}", status_code=204)
async def delete_team(
client_id: str,
team_id: str,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
client_doc = await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
await _get_team_or_404(team_id, client_id, db)
team_doc = await _get_team_or_404(team_id, client_id, db)
await db.teams.delete_one({"_id": team_id})
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_DELETE,
description=f"Team '{team_doc['name']}' deleted from client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"]},
)
# Team membership
@ -326,18 +424,35 @@ async def add_team_member(
client_id: str,
team_id: str,
body: AddMemberRequest,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
client_doc = await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
await _get_team_or_404(team_id, client_id, db)
if not await db.users.find_one({"_id": body.user_id}):
team_doc = await _get_team_or_404(team_id, client_id, db)
member_doc = await db.users.find_one({"_id": body.user_id})
if not member_doc:
raise HTTPException(status_code=404, detail="User not found")
# Write to both Team.member_user_ids (legacy) and Membership.team_ids (MT-17)
await db.teams.update_one(
{"_id": team_id},
{"$addToSet": {"member_user_ids": body.user_id}, "$set": {"updated_at": _now()}},
)
await db.memberships.update_one(
{"user_id": body.user_id, "organization_id": client_id},
{"$addToSet": {"team_ids": team_id}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_MEMBER_ADD,
description=f"User '{member_doc.get('email', body.user_id)}' added to team '{team_doc['name']}' of client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"], "member_user_id": body.user_id, "member_email": member_doc.get("email")},
)
@router.delete("/{client_id}/teams/{team_id}/members/{user_id}", status_code=204)
@ -345,22 +460,56 @@ async def remove_team_member(
client_id: str,
team_id: str,
user_id: str,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
client_doc = await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
await _get_team_or_404(team_id, client_id, db)
team_doc = await _get_team_or_404(team_id, client_id, db)
member_doc = await db.users.find_one({"_id": user_id})
await db.teams.update_one(
{"_id": team_id},
{"$pull": {"member_user_ids": user_id}, "$set": {"updated_at": _now()}},
)
await db.memberships.update_one(
{"user_id": user_id, "organization_id": client_id},
{"$pull": {"team_ids": team_id}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_TEAM_MEMBER_REMOVE,
description=f"User '{member_doc.get('email', user_id) if member_doc else user_id}' removed from team '{team_doc['name']}' of client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"team_id": team_id, "team_name": team_doc["name"], "member_user_id": user_id, "member_email": member_doc.get("email") if member_doc else None},
)
# ---------------------------------------------------------------------------
# Project endpoints
# ---------------------------------------------------------------------------
@router.get("/all-projects", response_model=list[Project])
async def list_all_projects(
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Return all active projects accessible to the current user (across all clients)."""
if current_user.role in (UserRole.ADMIN, UserRole.PRODUCTION, UserRole.PROJECT_MANAGER):
docs = await db.projects.find({"is_active": True}).to_list(None)
else:
accessible_client_ids = await _get_accessible_client_ids(current_user, db)
if not accessible_client_ids:
return []
docs = await db.projects.find(
{"client_id": {"$in": accessible_client_ids}, "is_active": True}
).to_list(None)
return [_project_from_doc(d) for d in docs]
@router.get("/{client_id}/projects", response_model=list[Project])
async def list_projects(
client_id: str,
@ -377,11 +526,12 @@ async def list_projects(
async def create_project(
client_id: str,
body: ProjectCreate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
client_doc = await _get_client_or_404(client_id, db)
await _assert_pm_or_client_member(current_user, client_id, db)
now = _now()
project_id = str(ObjectId())
await db.projects.insert_one({
@ -389,11 +539,25 @@ async def create_project(
"name": body.name,
"client_id": client_id,
"is_active": True,
"default_languages": body.default_languages,
"default_linguist_id": body.default_linguist_id,
"default_reviewer_id": body.default_reviewer_id,
"created_at": now,
"updated_at": now,
})
doc = await db.projects.find_one({"_id": project_id})
return _project_from_doc(doc)
project = _project_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_CREATE,
description=f"Project '{project.name}' created for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project.name, "default_languages": body.default_languages},
)
return project
@router.patch("/{client_id}/projects/{project_id}", response_model=Project)
@ -401,35 +565,58 @@ async def update_project(
client_id: str,
project_id: str,
body: ProjectUpdate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
client_doc = await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
await _get_project_or_404(project_id, client_id, db)
update = {k: v for k, v in body.model_dump(exclude_none=True).items()}
update = dict(body.model_dump(exclude_none=True).items())
if not update:
raise HTTPException(status_code=422, detail="No fields to update")
update["updated_at"] = _now()
await db.projects.update_one({"_id": project_id}, {"$set": update})
doc = await db.projects.find_one({"_id": project_id})
return _project_from_doc(doc)
project = _project_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_UPDATE,
description=f"Project '{project.name}' updated for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project.name, "fields_updated": list(body.model_dump(exclude_none=True).keys())},
)
return project
@router.delete("/{client_id}/projects/{project_id}", status_code=204)
async def archive_project(
client_id: str,
project_id: str,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _get_client_or_404(client_id, db)
client_doc = await _get_client_or_404(client_id, db)
await _assert_pm_or_admin(current_user, client_id, db)
await _get_project_or_404(project_id, client_id, db)
project_doc = await _get_project_or_404(project_id, client_id, db)
await db.projects.update_one(
{"_id": project_id},
{"$set": {"is_active": False, "updated_at": _now()}},
)
await audit_logger.log_action(
action=AuditAction.CLIENT_PROJECT_ARCHIVE,
description=f"Project '{project_doc['name']}' archived for client '{client_doc['name']}'",
user=current_user,
request=request,
resource_type="client",
resource_id=client_id,
resource_name=client_doc["name"],
details={"project_id": project_id, "project_name": project_doc["name"]},
)
# ---------------------------------------------------------------------------
@ -449,6 +636,37 @@ async def _assert_pm_or_admin(user: User, client_id: str, db: AsyncIOMotorDataba
raise HTTPException(status_code=403, detail="Not a manager for this client")
async def _assert_pm_or_client_member(user: User, client_id: str, db: AsyncIOMotorDatabase) -> None:
"""Allow PM/ADMIN/PROD or any org member (CLIENT role) with membership in this client's org."""
if user.role in (UserRole.ADMIN, UserRole.PRODUCTION):
return
if user.role == UserRole.PROJECT_MANAGER:
if client_id in (user.pm_client_ids or []):
return
mem = await db.memberships.find_one({"user_id": str(user.id), "organization_id": client_id})
if mem and mem.get("role_in_org") in ("owner", "admin", "manager"):
return
# Allow CLIENT users who are members of the org
if user.role == UserRole.CLIENT:
mem = await db.memberships.find_one({"user_id": str(user.id), "organization_id": client_id})
if mem:
return
raise HTTPException(status_code=403, detail="Not authorized to create projects for this client")
async def _get_accessible_client_ids(user: User, db: AsyncIOMotorDatabase) -> list[str]:
"""Return list of client_ids the user can access."""
ids: set[str] = set()
# PM assignments (legacy)
if user.pm_client_ids:
ids.update(user.pm_client_ids)
# Org memberships
mems = await db.memberships.find({"user_id": str(user.id)}).to_list(None)
for m in mems:
ids.add(m["organization_id"])
return list(ids)
async def _assert_client_access(user: User, client_id: str, db: AsyncIOMotorDatabase) -> None:
"""Allow platform staff, org members (any role), or PM of the client."""
if user.role in (UserRole.ADMIN, UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.LINGUIST):
@ -460,6 +678,4 @@ async def _assert_client_access(user: User, client_id: str, db: AsyncIOMotorData
# Legacy fallback for pre-migration users
if user.role == UserRole.PROJECT_MANAGER and client_id in (user.pm_client_ids or []):
return
if user.role in (UserRole.CLIENT, UserRole.PROJECT_MANAGER):
return
raise HTTPException(status_code=403, detail="Insufficient permissions")
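
Review note on `_assert_pm_or_client_member`: the rule it encodes can be read as a pure decision function, which makes the cases easier to check. The sketch below mirrors the branches in the hunk above without the database call. `may_create_project` is an illustrative name, the enum values are assumptions, and `membership` stands for the document the handler would load from `db.memberships` (or `None`).

```python
from enum import Enum
from typing import Optional


class UserRole(str, Enum):
    # Values are assumptions; only the member names appear in the diff.
    ADMIN = "admin"
    PRODUCTION = "production"
    PROJECT_MANAGER = "project_manager"
    CLIENT = "client"
    LINGUIST = "linguist"


def may_create_project(
    role: UserRole,
    client_id: str,
    pm_client_ids: list,
    membership: Optional[dict],
) -> bool:
    """Return True where the handler would return without raising 403."""
    if role in (UserRole.ADMIN, UserRole.PRODUCTION):
        return True
    if role == UserRole.PROJECT_MANAGER:
        if client_id in pm_client_ids:
            return True  # legacy PM assignment
        if membership and membership.get("role_in_org") in ("owner", "admin", "manager"):
            return True
    if role == UserRole.CLIENT and membership:
        return True  # any membership in the client's org is enough
    return False
```

Note the asymmetry this makes visible: a CLIENT user needs only some membership, while a PROJECT_MANAGER without a legacy assignment needs a membership with an owner, admin, or manager role.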


@ -3,11 +3,11 @@ from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.database import get_database
from ...core.dependencies import get_current_user
from ...models.audit_log import AuditAction
from ...models.user import User
from ...schemas.file import SignedUploadRequest, SignedUploadResponse
from ...services.gcs import generate_signed_upload_url
from ...services.audit_logger import audit_logger
from ...models.audit_log import AuditAction
from ...services.gcs import generate_signed_upload_url
router = APIRouter(prefix="/files", tags=["files"])
@ -28,11 +28,11 @@ async def get_signed_upload_url(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Only video files are supported"
)
# Generate unique blob path
from bson import ObjectId
blob_path = f"temp/{ObjectId()}/{request.filename}"
try:
# Generate signed upload URL with form fields
signed_data = await generate_signed_upload_url(
@ -40,7 +40,7 @@ async def get_signed_upload_url(
content_type=request.content_type,
max_size=request.max_size or 1024 * 1024 * 1024 # 1GB default
)
await audit_logger.log_action(
action=AuditAction.FILE_UPLOAD,
description=f"Signed upload URL generated for {request.filename}",
@ -62,4 +62,4 @@ async def get_signed_upload_url(
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to generate signed upload URL: {str(e)}"
)
) from None


@ -0,0 +1,326 @@
"""
Glossary management endpoints.
Access:
- All glossary mutations (upload, activate, archive): Admin or PM of the client
- Glossary reads (list, detail, terms): Admin, PM, or staff members
Routes are nested under /clients/{client_id}/glossaries to keep ownership clear.
"""
from __future__ import annotations
from fastapi import APIRouter, Depends, File, Form, HTTPException, Query, UploadFile
from ...core.authz import MembershipContext, assert_user_in_org, get_membership_context
from ...core.logging import get_logger
from ...models.audit_log import AuditAction
from ...models.glossary import (
GlossaryDetailResponse,
GlossaryResponse,
GlossaryVersionResponse,
)
from ...models.organization import OrgRole
from ...services import audit_logger as audit_svc
from ...services import glossary_service as svc
logger = get_logger(__name__)
router = APIRouter(
prefix="/clients/{client_id}/glossaries",
tags=["glossaries"],
)
_ALLOWED_CONTENT_TYPES = {
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"application/vnd.ms-excel",
}
_MAX_FILE_SIZE_MB = 50
# ── List glossaries ───────────────────────────────────────────────────────────
@router.get("", response_model=list[GlossaryResponse])
async def list_glossaries(
client_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
"""List all active glossaries for a client."""
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossaries = await svc.get_glossaries_for_client(client_id)
version_map = await svc.get_versions_by_ids([g.current_version_id for g in glossaries if g.current_version_id])
return [_to_response(g, version_map.get(g.current_version_id)) for g in glossaries]
# ── Upload new glossary ───────────────────────────────────────────────────────
@router.post("", response_model=GlossaryDetailResponse, status_code=201)
async def upload_glossary(
client_id: str,
file: UploadFile = File(..., description="xlsx glossary file"),
name: str = Form(...),
source_locale: str = Form(..., description="BCP-47 source locale, e.g. en-GB"),
source_locale_col: str = Form(..., description="xlsx column header for the source language, e.g. en_gb"),
description: str | None = Form(None),
change_note: str | None = Form(None),
ctx: MembershipContext = Depends(get_membership_context),
):
"""Upload a new glossary xlsx file and associate it with a client."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
_validate_xlsx(file)
try:
glossary, version = await svc.ingest_glossary(
client_id=client_id,
name=name,
source_locale=source_locale,
source_locale_col=source_locale_col,
file=file,
user_id=str(ctx.user.id),
description=description,
change_note=change_note,
)
except ValueError as exc:
raise HTTPException(status_code=422, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_UPLOAD,
description=f"Glossary '{name}' uploaded for client {client_id}",
user=ctx.user,
resource_type="glossary",
resource_id=glossary.id,
details={"term_count": version.term_count, "source_locale": source_locale},
)
versions = await svc.get_versions(glossary.id)
return _to_detail_response(glossary, versions)
# ── Get glossary detail ───────────────────────────────────────────────────────
@router.get("/{glossary_id}", response_model=GlossaryDetailResponse)
async def get_glossary(
client_id: str,
glossary_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
versions = await svc.get_versions(glossary_id)
return _to_detail_response(glossary, versions)
# ── Browse terms ──────────────────────────────────────────────────────────────
@router.get("/{glossary_id}/terms")
async def list_terms(
client_id: str,
glossary_id: str,
version_id: str | None = Query(None, description="Specific version; defaults to active"),
search: str | None = Query(None),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.VIEWER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
vid = version_id or glossary.current_version_id
if not vid:
return {"terms": [], "total": 0, "page": page, "page_size": page_size}
terms, total = await svc.get_terms_page(vid, search=search, page=page, page_size=page_size)
return {
"terms": [{"source_term": t["source_term"], "translations": t["translations"]} for t in terms],
"total": total,
"page": page,
"page_size": page_size,
}
# ── Upload new version ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/versions", response_model=GlossaryVersionResponse, status_code=201)
async def upload_version(
client_id: str,
glossary_id: str,
file: UploadFile = File(...),
source_locale_col: str = Form(...),
change_note: str | None = Form(None),
ctx: MembershipContext = Depends(get_membership_context),
):
"""Upload a new xlsx file as a new version of an existing glossary."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
_validate_xlsx(file)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
try:
version = await svc.ingest_new_version(
glossary_id=glossary_id,
source_locale_col=source_locale_col,
file=file,
user_id=str(ctx.user.id),
change_note=change_note,
)
except ValueError as exc:
raise HTTPException(status_code=422, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_VERSION_UPLOAD,
description=f"New glossary version uploaded for glossary {glossary_id}",
user=ctx.user,
resource_type="glossary_version",
resource_id=version.id,
details={"term_count": version.term_count, "version_number": version.version_number},
)
return _version_to_response(version)
# ── Activate a version ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/activate")
async def activate_version(
client_id: str,
glossary_id: str,
version_id: str = Form(...),
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
try:
await svc.activate_version(glossary_id, version_id)
except ValueError as exc:
raise HTTPException(status_code=404, detail=str(exc)) from exc
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_ACTIVATE,
description=f"Glossary version {version_id} activated",
user=ctx.user,
resource_type="glossary",
resource_id=glossary_id,
details={"version_id": version_id},
)
return {"status": "ok", "active_version_id": version_id}
# ── Re-queue embedding ────────────────────────────────────────────────────────
@router.post("/{glossary_id}/versions/{version_id}/reembed", status_code=202)
async def reembed_version(
client_id: str,
glossary_id: str,
version_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
"""Re-queue the embedding task for a glossary version (resets failed/pending/stuck embeds)."""
assert_user_in_org(ctx, client_id, OrgRole.MANAGER)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
versions = await svc.get_versions(glossary_id)
version = next((v for v in versions if str(v.id) == version_id), None)
if not version:
raise HTTPException(status_code=404, detail="Version not found")
try:
import motor.motor_asyncio
from bson import ObjectId
from ...core.config import settings
from ...tasks.embed_glossary import embed_glossary_version_task
client_db = motor.motor_asyncio.AsyncIOMotorClient(settings.mongodb_uri)
db = client_db[settings.mongodb_db]
await db.glossary_versions.update_one(
{"_id": ObjectId(version_id)},
{"$set": {"embedding_status": "pending", "embedded_count": 0}},
)
client_db.close()
embed_glossary_version_task.delay(version_id)
except Exception as exc:
raise HTTPException(status_code=500, detail=f"Failed to queue embedding: {exc}") from exc
return {"status": "queued", "version_id": version_id}
# ── Delete ───────────────────────────────────────────────────────────────────
@router.delete("/{glossary_id}", status_code=204)
async def archive_glossary(
client_id: str,
glossary_id: str,
ctx: MembershipContext = Depends(get_membership_context),
):
assert_user_in_org(ctx, client_id, OrgRole.ADMIN)
glossary = await svc.get_glossary(glossary_id)
if not glossary or glossary.client_id != client_id:
raise HTTPException(status_code=404, detail="Glossary not found")
await svc.archive_glossary(glossary_id)
await audit_svc.audit_logger.log_action(
action=AuditAction.GLOSSARY_ARCHIVE,
description=f"Glossary {glossary_id} archived",
user=ctx.user,
resource_type="glossary",
resource_id=glossary_id,
)
# ── Helpers ───────────────────────────────────────────────────────────────────
def _validate_xlsx(file: UploadFile) -> None:
    if file.content_type not in _ALLOWED_CONTENT_TYPES and not (
        file.filename and file.filename.endswith(".xlsx")
    ):
        raise HTTPException(
            status_code=422,
            detail="Only .xlsx files are accepted",
        )
def _to_response(g, current_version=None) -> GlossaryResponse:
return GlossaryResponse(
id=str(g.id),
client_id=g.client_id,
name=g.name,
description=g.description,
source_locale=g.source_locale,
source=g.source,
status=g.status,
current_version_id=g.current_version_id,
current_version_embedding_status=current_version.embedding_status if current_version else None,
current_version_embedded_count=current_version.embedded_count if current_version else None,
current_version_term_count=current_version.term_count if current_version else None,
created_at=g.created_at,
created_by=g.created_by,
)
def _version_to_response(v) -> GlossaryVersionResponse:
return GlossaryVersionResponse(
id=str(v.id),
glossary_id=v.glossary_id,
version_number=v.version_number,
term_count=v.term_count,
embedded_count=v.embedded_count,
embedding_status=v.embedding_status,
created_at=v.created_at,
created_by=v.created_by,
change_note=v.change_note,
)
def _to_detail_response(glossary, versions) -> GlossaryDetailResponse:
return GlossaryDetailResponse(
**_to_response(glossary).model_dump(),
versions=[_version_to_response(v) for v in versions],
)


@ -14,16 +14,21 @@ Protected endpoints:
import hashlib
import re
import secrets
from datetime import datetime, timedelta, timezone
from datetime import UTC, datetime, timedelta
from fastapi import APIRouter, Depends, HTTPException, status
from fastapi import APIRouter, Depends, HTTPException, Request
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import bump_user_membership_cache
from ...core.database import get_database
from ...core.dependencies import get_current_user
from ...core.security import create_access_token, create_refresh_token, get_password_hash
from ...core.security import (
create_access_token,
create_refresh_token,
get_password_hash,
)
from ...models.audit_log import AuditAction
from ...models.invitation import (
Invitation,
InvitationAcceptRequest,
InvitationCreate,
InvitationPreviewResponse,
@ -31,7 +36,7 @@ from ...models.invitation import (
)
from ...models.organization import OrgRole
from ...models.user import AuthProvider, User, UserRole
from ...core.authz import bump_user_membership_cache
from ...services.audit_logger import audit_logger
from ...services.emailer import email_service
from ...services.membership_service import get_membership, upsert_membership
@ -39,7 +44,7 @@ router = APIRouter(tags=["invitations"])
def _now() -> datetime:
return datetime.now(timezone.utc)
return datetime.now(UTC)
def _hash_token(plaintext: str) -> str:
@ -54,7 +59,7 @@ def _make_token() -> tuple[str, str]:
def _inv_from_doc(doc: dict) -> InvitationResponse:
now = _now()
expires_at = doc["expires_at"].replace(tzinfo=timezone.utc) if doc["expires_at"].tzinfo is None else doc["expires_at"]
expires_at = doc["expires_at"].replace(tzinfo=UTC) if doc["expires_at"].tzinfo is None else doc["expires_at"]
return InvitationResponse(
id=str(doc["_id"]),
email=doc["email"],
@ -100,6 +105,7 @@ org_router = APIRouter(prefix="/organizations", tags=["invitations"])
async def create_invitation(
org_id: str,
body: InvitationCreate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -121,6 +127,18 @@ async def create_invitation(
detail="A pending invitation already exists for this email. Revoke it first to re-invite.",
)
# MT-19: ensure all target_team_ids belong to this org (client_id == org_id)
if body.target_team_ids:
valid_teams = await db.teams.count_documents({
"_id": {"$in": body.target_team_ids},
"client_id": org_id,
})
if valid_teams != len(body.target_team_ids):
raise HTTPException(
status_code=400,
detail="One or more target_team_ids do not belong to this organization.",
)
plaintext, token_hash = _make_token()
now = _now()
expires_at = now + timedelta(days=body.expires_in_days)
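
The MT-19 check in this hunk compares the `count_documents` result against `len(body.target_team_ids)`. If a request repeats a team id, the list length exceeds the number of matching documents, so the request would be rejected even though every team is valid. A small sketch of a duplicate-tolerant comparison, where `all_teams_valid` is a hypothetical helper and the matched count is passed in rather than queried:

```python
def all_teams_valid(requested_ids: list[str], matched_count: int) -> bool:
    """Compare the DB match count against the number of distinct requested ids."""
    # A count query with $in counts each matching team once, so duplicates
    # in the request are collapsed before comparing.
    return matched_count == len(set(requested_ids))
```

Whether duplicate ids can reach this route at all depends on how `InvitationCreate` validates `target_team_ids`, which the diff does not show.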
@ -154,7 +172,17 @@ async def create_invitation(
expires_at=expires_at,
)
return _inv_from_doc(doc)
inv = _inv_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.INVITATION_CREATE,
description=f"Invitation created for '{email_lower}' to organization '{org_id}'",
user=current_user,
request=request,
resource_type="invitation",
resource_id=inv.id,
details={"invited_email": email_lower, "org_id": org_id, "role": body.role_in_org},
)
return inv
@org_router.get("/{org_id}/invitations", response_model=list[InvitationResponse])
@ -174,16 +202,30 @@ async def list_invitations(
async def revoke_invitation(
org_id: str,
invitation_id: str,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
await _assert_org_admin(org_id, current_user, db)
inv_doc = await db.invitations.find_one({"_id": invitation_id, "organization_id": org_id})
result = await db.invitations.update_one(
{"_id": invitation_id, "organization_id": org_id, "accepted_at": None, "revoked_at": None},
{"$set": {"revoked_at": _now()}},
)
if result.matched_count == 0:
raise HTTPException(status_code=404, detail="Invitation not found or already accepted/revoked")
await audit_logger.log_action(
action=AuditAction.INVITATION_REVOKE,
description=f"Invitation '{invitation_id}' revoked in organization '{org_id}'",
user=current_user,
request=request,
resource_type="invitation",
resource_id=invitation_id,
details={
"invited_email": inv_doc["email"] if inv_doc else None,
"org_id": org_id,
},
)
# ---------------------------------------------------------------------------
@ -206,7 +248,7 @@ async def preview_invitation(
raise HTTPException(status_code=410, detail="Invitation not found or has expired")
now = _now()
expires_at = doc["expires_at"].replace(tzinfo=timezone.utc) if doc["expires_at"].tzinfo is None else doc["expires_at"]
expires_at = doc["expires_at"].replace(tzinfo=UTC) if doc["expires_at"].tzinfo is None else doc["expires_at"]
if doc.get("revoked_at"):
raise HTTPException(status_code=410, detail="This invitation has been revoked")
@ -255,6 +297,7 @@ async def preview_invitation(
@router.post("/invitations/accept")
async def accept_invitation(
body: InvitationAcceptRequest,
request: Request,
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Accept an invitation. Creates user if needed, creates membership, returns tokens."""
@ -317,12 +360,16 @@ async def accept_invitation(
await upsert_membership(user_id, org_id, role_in_org, doc["invited_by_user_id"], db)
await bump_user_membership_cache(user_id)
# Auto-add to target teams
# Auto-add to target teams — write to both Team.member_user_ids (legacy) and Membership.team_ids (MT-17)
for team_id in doc.get("target_team_ids", []):
await db.teams.update_one(
{"_id": team_id, "client_id": org_id},
{"$addToSet": {"member_user_ids": user_id}},
)
await db.memberships.update_one(
{"user_id": user_id, "organization_id": org_id},
{"$addToSet": {"team_ids": team_id}},
)
# Send welcome email
if not existing_user.get("_welcomed"):
@ -333,12 +380,23 @@ async def accept_invitation(
org_name=org_name,
)
# Issue JWT tokens
access_token = create_access_token(subject=user_id)
# Issue JWT tokens with org_ids claim
_inv_org_ids = [m["organization_id"] async for m in db.memberships.find({"user_id": user_id}, {"organization_id": 1})]
access_token = create_access_token(subject=user_id, org_ids=[str(o) for o in _inv_org_ids if o])
refresh_token = create_refresh_token(subject=user_id)
org_name, org_slug = await _get_org_name(org_id, db)
await audit_logger.log_action(
action=AuditAction.INVITATION_ACCEPT,
description=f"Invitation accepted by '{email_lower}' for organization '{org_id}'",
user=None,
request=request,
resource_type="invitation",
resource_id=str(doc["_id"]),
details={"invited_email": email_lower, "org_id": org_id},
)
return {
"access_token": access_token,
"refresh_token": refresh_token,

File diff suppressed because it is too large


@ -0,0 +1,580 @@
"""Per-language QC endpoints — two-stage (linguist + reviewer) assignment, workflow, comments."""
from datetime import datetime
from fastapi import APIRouter, Depends, HTTPException, Query, Request
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel, Field
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.job import LanguageQCComment, LanguageQCState
from ...models.user import User, UserRole
from ...services import language_qc as lqc
from ...services.audit_logger import audit_logger
router = APIRouter(tags=["language-qc"])
# ── Request / response schemas ────────────────────────────────────────────────
class AssignRequest(BaseModel):
linguist_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ReassignRequest(BaseModel):
linguist_user_id: str
notes: str | None = None
deadline: datetime | None = None
class AssignReviewerRequest(BaseModel):
reviewer_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ReassignReviewerRequest(BaseModel):
reviewer_user_id: str
notes: str | None = None
deadline: datetime | None = None
class ApproveLanguageRequest(BaseModel):
notes: str | None = None
class RejectLanguageRequest(BaseModel):
notes: str
category: str | None = None # timing | mistranslation | terminology | profanity | length | other
class ReopenLanguageRequest(BaseModel):
notes: str | None = None
class AddCommentRequest(BaseModel):
body: str = Field(..., min_length=1, max_length=4000)
class LanguageQCStateResponse(BaseModel):
lang: str
state: LanguageQCState
class LanguageQCMapResponse(BaseModel):
job_id: str
language_qc: dict[str, LanguageQCState]
class QueueItem(BaseModel):
job_id: str
job_title: str
job_status: str
lang: str
lang_qc_status: str
assigned_at: str | None = None
reviewed_at: str | None = None
class QueueResponse(BaseModel):
items: list[QueueItem]
total: int
class BulkAssignRequest(BaseModel):
linguist_user_id: str
reviewer_user_id: str | None = None
languages: list[str] | None = None # None = all available languages
only_unassigned: bool = False # skip languages that already have an assignment
deadline: datetime | None = None
class BulkAssignResponse(BaseModel):
assigned: list[str]
skipped: list[str]
errors: dict[str, str]
# ── Routes ────────────────────────────────────────────────────────────────────
@router.get("/jobs/{job_id}/language-qc", response_model=LanguageQCMapResponse)
async def get_language_qc(
job_id: str,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION,
UserRole.PROJECT_MANAGER, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
# Lazy auto-assignment: apply project/job defaults on first open in PENDING_QC
await lqc.auto_assign_defaults(db, job_id)
states = await lqc.get_all_states(db, job_id)
return LanguageQCMapResponse(job_id=job_id, language_qc=states)
# ── Linguist assignment ───────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/assign", response_model=LanguageQCStateResponse)
async def assign_language(
job_id: str,
lang: str,
request: AssignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.assign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_ASSIGN,
description=f"Language '{lang}' assigned to linguist '{request.linguist_user_id}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "linguist_user_id": request.linguist_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reassign", response_model=LanguageQCStateResponse)
async def reassign_language(
job_id: str,
lang: str,
request: ReassignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reassign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REASSIGN,
description=f"Language '{lang}' reassigned to linguist '{request.linguist_user_id}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "linguist_user_id": request.linguist_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Reviewer assignment ───────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/assign-reviewer", response_model=LanguageQCStateResponse)
async def assign_reviewer(
job_id: str,
lang: str,
request: AssignReviewerRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.assign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REVIEWER_ASSIGN,
description=f"Reviewer '{request.reviewer_user_id}' assigned to language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "reviewer_user_id": request.reviewer_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reassign-reviewer", response_model=LanguageQCStateResponse)
async def reassign_reviewer(
job_id: str,
lang: str,
request: ReassignReviewerRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reassign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, notes=request.notes, deadline=request.deadline,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REVIEWER_REASSIGN,
description=f"Reviewer reassigned to '{request.reviewer_user_id}' for language '{lang}', job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "reviewer_user_id": request.reviewer_user_id},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Bulk assignment ───────────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/bulk-assign", response_model=BulkAssignResponse)
async def bulk_assign_languages(
job_id: str,
request: BulkAssignRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Assign one linguist (and optionally one reviewer) to multiple languages in one call."""
job_doc = await db["jobs"].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
available = list((job_doc.get("outputs") or {}).keys())
target_langs = request.languages if request.languages else available
assigned: list[str] = []
skipped: list[str] = []
errors: dict[str, str] = {}
language_qc = job_doc.get("language_qc") or {}
for lang in target_langs:
if lang not in available:
skipped.append(lang)
continue
lang_state = language_qc.get(lang) or {}
already_assigned = bool(lang_state.get("assigned_linguist_id"))
if request.only_unassigned and already_assigned:
skipped.append(lang)
continue
try:
await lqc.assign_linguist(
db, job_id, lang, request.linguist_user_id, current_user,
http_request=http_request, deadline=request.deadline,
)
except Exception as exc:
errors[lang] = str(exc)
continue
if request.reviewer_user_id:
try:
await lqc.assign_reviewer(
db, job_id, lang, request.reviewer_user_id, current_user,
http_request=http_request, deadline=request.deadline,
)
except Exception as exc:
errors[f"{lang}:reviewer"] = str(exc)
assigned.append(lang)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_BULK_ASSIGN,
description=f"Bulk assignment for job {job_id}: {len(assigned)} language(s) assigned to linguist '{request.linguist_user_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={
"languages": assigned,
"linguist_user_id": request.linguist_user_id,
"reviewer_user_id": request.reviewer_user_id,
"skipped": skipped,
"errors": errors,
},
)
return BulkAssignResponse(assigned=assigned, skipped=skipped, errors=errors)
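
The skip rules in `bulk_assign_languages` can be read in isolation from the database calls. The sketch below reproduces only the partitioning step as a pure function; `partition_languages` is a hypothetical name, and the error handling around `lqc.assign_linguist` is intentionally left out.

```python
def partition_languages(
    target_langs: list[str],
    available: list[str],
    language_qc: dict[str, dict],
    only_unassigned: bool,
) -> tuple[list[str], list[str]]:
    """Split target languages into (to_assign, skipped) without touching the DB."""
    to_assign: list[str] = []
    skipped: list[str] = []
    for lang in target_langs:
        state = language_qc.get(lang) or {}
        already_assigned = bool(state.get("assigned_linguist_id"))
        # Skip languages with no output, and optionally those already assigned.
        if lang not in available or (only_unassigned and already_assigned):
            skipped.append(lang)
        else:
            to_assign.append(lang)
    return to_assign, skipped
```

Keeping this step pure would make the skip behaviour easy to unit test without a MongoDB fixture.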
# ── Workflow transitions ──────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/start-work", response_model=LanguageQCStateResponse)
async def start_linguist_work(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Linguist opens the language — pending → in_progress."""
state = await lqc.start_linguist_work(db, job_id, lang, current_user)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_START_WORK,
description=f"Linguist started work on language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/submit", response_model=LanguageQCStateResponse)
async def submit_for_review(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Linguist submits — in_progress → pending_review. Notifies reviewer by email."""
state = await lqc.submit_for_review(db, job_id, lang, current_user, http_request=http_request)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_SUBMIT,
description=f"Language '{lang}' submitted for review for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/open-review", response_model=LanguageQCStateResponse)
async def open_review(
job_id: str,
lang: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Reviewer opens the review — pending_review → in_review."""
state = await lqc.open_review(db, job_id, lang, current_user, http_request=http_request)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_OPEN_REVIEW,
description=f"Reviewer opened review for language '{lang}', job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang},
)
return LanguageQCStateResponse(lang=lang, state=state)
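
The three workflow endpoints above name their transitions in their docstrings. A minimal sketch of those transitions as a lookup table follows. The actual enforcement lives in `services.language_qc`, which this diff does not show, so `TRANSITIONS`, `next_status`, and the strict rejection of out-of-order moves are assumptions made for illustration.

```python
# Transitions named in the endpoint docstrings; the action keys are illustrative.
TRANSITIONS: dict[str, tuple[str, str]] = {
    "start-work": ("pending", "in_progress"),
    "submit": ("in_progress", "pending_review"),
    "open-review": ("pending_review", "in_review"),
}


def next_status(action: str, current: str) -> str:
    """Return the status after `action`, or raise if the move is not allowed."""
    expected, target = TRANSITIONS[action]
    if current != expected:
        raise ValueError(f"cannot {action} from {current!r}; expected {expected!r}")
    return target
```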
# ── Approve / Reject / Reopen ─────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/approve", response_model=LanguageQCStateResponse)
async def approve_language(
job_id: str,
lang: str,
request: ApproveLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.approve_language(
db, job_id, lang, current_user, http_request=http_request, notes=request.notes,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_APPROVE,
description=f"Language '{lang}' approved for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes},
)
return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reject", response_model=LanguageQCStateResponse)
async def reject_language(
job_id: str,
lang: str,
request: RejectLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reject_language(
db, job_id, lang, current_user, request.notes, category=request.category, http_request=http_request,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REJECT,
description=f"Language '{lang}' rejected for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes, "category": request.category},
)
return LanguageQCStateResponse(lang=lang, state=state)
class MarkCueReviewedRequest(BaseModel):
    total_cues: int | None = None  # client sends on first call to set total
@router.post("/jobs/{job_id}/languages/{lang}/mark-cue-reviewed", response_model=LanguageQCStateResponse)
async def mark_cue_reviewed(
    job_id: str,
    lang: str,
    request: MarkCueReviewedRequest,
    http_request: Request,
    current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.ADMIN)),
    db: AsyncIOMotorDatabase = Depends(get_database),
):
    """Increment reviewed_cues counter; optionally set total_cues on first call."""
    job_doc = await db.jobs.find_one({"_id": job_id})
    if not job_doc:
        raise HTTPException(status_code=404, detail="Job not found")
    inc_op: dict = {f"language_qc.{lang}.reviewed_cues": 1}
    set_op: dict = {"updated_at": datetime.utcnow()}
    if request.total_cues is not None:
        set_op[f"language_qc.{lang}.total_cues"] = request.total_cues
    await db.jobs.update_one({"_id": job_id}, {"$inc": inc_op, "$set": set_op})
    updated_doc = await db.jobs.find_one({"_id": job_id})
    state_dict = (updated_doc.get("language_qc") or {}).get(lang, {})
    state = LanguageQCState(**state_dict) if isinstance(state_dict, dict) else LanguageQCState()
    return LanguageQCStateResponse(lang=lang, state=state)
@router.post("/jobs/{job_id}/languages/{lang}/reopen", response_model=LanguageQCStateResponse)
async def reopen_language(
job_id: str,
lang: str,
request: ReopenLanguageRequest,
http_request: Request,
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.reopen_language(
db, job_id, lang, current_user, http_request=http_request, notes=request.notes,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_REOPEN,
description=f"Language '{lang}' reopened for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "notes": request.notes},
)
return LanguageQCStateResponse(lang=lang, state=state)
# ── Comments ──────────────────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/languages/{lang}/comments", response_model=LanguageQCComment, status_code=201)
async def add_comment(
job_id: str,
lang: str,
request: AddCommentRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PROJECT_MANAGER,
UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
comment = await lqc.add_comment(
db, job_id, lang, current_user, request.body, http_request=http_request,
)
await audit_logger.log_action(
action=AuditAction.LANGUAGE_QC_COMMENT,
description=f"Comment added to language '{lang}' for job {job_id}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "comment_id": str(comment.id) if hasattr(comment, "id") else None},
)
return comment
@router.get("/jobs/{job_id}/languages/{lang}/comments", response_model=list[LanguageQCComment])
async def list_comments(
job_id: str,
lang: str,
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PROJECT_MANAGER,
UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
state = await lqc.get_state(db, job_id, lang)
if state is None:
return []
return state.comments
# ── Queues ─────────────────────────────────────────────────────────────────────
@router.get("/me/language-qc-queue", response_model=QueueResponse)
async def my_language_qc_queue(
role: str = Query("linguist", description="'linguist' or 'reviewer'"),
qc_status: str | None = Query(None, description="Filter by status"),
skip: int = Query(0, ge=0),
limit: int = Query(50, ge=1, le=200),
current_user: User = Depends(require_roles(
UserRole.LINGUIST, UserRole.REVIEWER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List jobs and languages assigned to the current user as linguist or reviewer."""
# ADMIN sees all orgs; staff scoped to their orgs from JWT claim (MT-18)
org_ids: list[str] | None = None if current_user.role == UserRole.ADMIN else getattr(current_user, "org_ids", None)
if role == "reviewer":
jobs = await lqc.list_for_reviewer(
db, str(current_user.id), accessible_org_ids=org_ids,
status_filter=qc_status, skip=skip, limit=limit,
)
else:
jobs = await lqc.list_for_linguist(
db, str(current_user.id), accessible_org_ids=org_ids,
status_filter=qc_status, skip=skip, limit=limit,
)
items: list[QueueItem] = []
for job in jobs:
job_id = str(job["_id"])
for assignment in job.get("_my_assignments", []):
lang = assignment["lang"]
state_raw = (job.get("language_qc") or {}).get(lang, {})
items.append(QueueItem(
job_id=job_id,
job_title=job.get("title", ""),
job_status=job.get("status", ""),
lang=lang,
lang_qc_status=assignment.get("status", "pending"),
assigned_at=state_raw.get("assigned_at").isoformat() if isinstance(state_raw, dict) and state_raw.get("assigned_at") else None,
reviewed_at=state_raw.get("reviewed_at").isoformat() if isinstance(state_raw, dict) and state_raw.get("reviewed_at") else None,
))
return QueueResponse(items=items, total=len(items))


@ -12,19 +12,25 @@ underlying MongoDB collections used by routes_clients.py so both
endpoints coexist without data duplication.
"""
from datetime import datetime, timezone
from datetime import UTC, datetime
from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException
from fastapi import APIRouter, Depends, HTTPException, Request
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel
from ...core.authz import bump_user_membership_cache
from ...core.database import get_database
from ...core.dependencies import get_current_user, require_roles
from ...models.audit_log import AuditAction
from ...models.membership import MemberDetail, MembershipCreate, MembershipUpdate
from ...models.organization import OrgRole, Organization, OrganizationCreate, OrganizationUpdate
from ...models.organization import (
Organization,
OrganizationCreate,
OrganizationUpdate,
OrgRole,
)
from ...models.user import User, UserRole
from ...core.authz import bump_user_membership_cache
from ...services.audit_logger import audit_logger
from ...services.membership_service import (
get_membership,
get_memberships_for_user,
@ -39,7 +45,7 @@ ADMIN_ROLES = [UserRole.ADMIN]
def _now() -> datetime:
return datetime.now(timezone.utc)
return datetime.now(UTC)
# ---------------------------------------------------------------------------
@ -115,6 +121,7 @@ class _OrgCreate(BaseModel):
@router.post("", response_model=Organization, status_code=201)
async def create_organization(
body: OrganizationCreate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -133,13 +140,25 @@ async def create_organization(
"updated_at": now,
}
await db.clients.insert_one(doc)
return _org_from_doc(doc)
org = _org_from_doc(doc)
await audit_logger.log_action(
action=AuditAction.ORG_CREATE,
description=f"Organization '{org.name}' created",
user=current_user,
request=request,
resource_type="organization",
resource_id=str(org.id),
resource_name=org.name,
details={"slug": org.slug},
)
return org
@router.patch("/{org_id}", response_model=Organization)
async def update_organization(
org_id: str,
body: OrganizationUpdate,
request: Request,
current_user: User = Depends(require_roles(UserRole.ADMIN)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -156,7 +175,18 @@ async def update_organization(
await db.clients.update_one({"_id": org_id}, {"$set": updates})
updated = {**doc, **updates}
return _org_from_doc(updated)
org = _org_from_doc(updated)
await audit_logger.log_action(
action=AuditAction.ORG_UPDATE,
description=f"Organization '{org.name}' updated",
user=current_user,
request=request,
resource_type="organization",
resource_id=str(org.id),
resource_name=org.name,
details={k: v for k, v in updates.items() if k != "updated_at"},
)
return org
# ---------------------------------------------------------------------------
@ -178,6 +208,7 @@ async def list_members(
async def add_member(
org_id: str,
body: MembershipCreate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -193,6 +224,15 @@ async def add_member(
members = await list_org_members(org_id, db)
for m in members:
if m.user_id == body.user_id:
await audit_logger.log_action(
action=AuditAction.ORG_MEMBER_ADD,
description=f"Member '{body.user_id}' added to organization '{org_id}' with role '{body.role_in_org}'",
user=current_user,
request=request,
resource_type="organization",
resource_id=org_id,
details={"user_id": body.user_id, "role": body.role_in_org},
)
return m
raise HTTPException(status_code=500, detail="Membership created but could not be retrieved")
@ -202,6 +242,7 @@ async def update_member(
org_id: str,
user_id: str,
body: MembershipUpdate,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -218,6 +259,15 @@ async def update_member(
members = await list_org_members(org_id, db)
for m in members:
if m.user_id == user_id:
await audit_logger.log_action(
action=AuditAction.ORG_MEMBER_UPDATE,
description=f"Member '{user_id}' role updated in organization '{org_id}' to '{body.role_in_org}'",
user=current_user,
request=request,
resource_type="organization",
resource_id=org_id,
details={"user_id": user_id, "role": body.role_in_org},
)
return m
raise HTTPException(status_code=500, detail="Could not retrieve updated membership")
@ -226,6 +276,7 @@ async def update_member(
async def remove_member(
org_id: str,
user_id: str,
request: Request,
current_user: User = Depends(get_current_user),
db: AsyncIOMotorDatabase = Depends(get_database),
):
@ -239,6 +290,15 @@ async def remove_member(
await remove_membership(user_id, org_id, db)
await bump_user_membership_cache(user_id)
await audit_logger.log_action(
action=AuditAction.ORG_MEMBER_REMOVE,
description=f"Member '{user_id}' removed from organization '{org_id}'",
user=current_user,
request=request,
resource_type="organization",
resource_id=org_id,
details={"user_id": user_id, "role": existing.role_in_org},
)
# ---------------------------------------------------------------------------


@ -1,14 +1,14 @@
"""API routes for review notes - timestamped notes on video assets during review."""
from datetime import datetime
from typing import Optional
from bson import ObjectId
from fastapi import APIRouter, Depends, HTTPException, Query, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_job_or_403, get_membership_context
from ...core.database import get_database
from ...core.dependencies import get_current_user, require_roles
from ...core.dependencies import require_roles
from ...core.logging import get_logger
from ...models.user import User, UserRole
from ...schemas.review_note import (
@ -25,18 +25,13 @@ router = APIRouter(prefix="/jobs/{job_id}/review-notes", tags=["review-notes"])
@router.get("", response_model=ReviewNotesListResponse)
async def list_review_notes(
job_id: str,
asset_key: Optional[str] = Query(None, description="Filter notes by asset key"),
asset_key: str | None = Query(None, description="Filter notes by asset key"),
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all review notes for a job, optionally filtered by asset key."""
# Verify job exists
job = await db.jobs.find_one({"_id": job_id})
if not job:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Job not found"
)
await get_job_or_403(job_id, ctx, db) # org check + existence check
# Build query
query = {"job_id": job_id}
@ -58,16 +53,11 @@ async def create_review_note(
job_id: str,
request: ReviewNoteCreateRequest,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Create a new review note for a video asset."""
# Verify job exists
job = await db.jobs.find_one({"_id": job_id})
if not job:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Job not found"
)
await get_job_or_403(job_id, ctx, db) # org check + existence check
# Create note document
note_id = str(ObjectId())
@ -96,9 +86,11 @@ async def get_review_note(
job_id: str,
note_id: str,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Get a single review note by ID."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note:
raise HTTPException(
@ -115,9 +107,11 @@ async def update_review_note(
note_id: str,
request: ReviewNoteUpdateRequest,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Update a review note. Only the note owner can update."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note:
raise HTTPException(
@ -151,9 +145,11 @@ async def delete_review_note(
job_id: str,
note_id: str,
current_user: User = Depends(require_roles(UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Delete a review note. Only the note owner can delete."""
await get_job_or_403(job_id, ctx, db) # org check
note = await db.review_notes.find_one({"_id": note_id, "job_id": job_id})
if not note:
raise HTTPException(


@ -0,0 +1,354 @@
"""Share-token endpoints — create/revoke/list tokens + public read-only view + client decision."""
import secrets
from datetime import datetime, timedelta
from typing import Literal
from fastapi import APIRouter, Depends, HTTPException, Request
from motor.motor_asyncio import AsyncIOMotorDatabase
from pydantic import BaseModel
from ...core.config import settings
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.share_token import ShareTokenResponse
from ...models.user import User, UserRole
from ...services.audit_logger import audit_logger
from ...services.gcs import get_signed_download_url
router = APIRouter(tags=["share"])
_TOKENS = "share_tokens"
_JOBS = "jobs"
def _share_url(token: str) -> str:
return f"{settings.app_url}/share/{token}"
# ── Request schemas ───────────────────────────────────────────────────────────
class CreateShareTokenRequest(BaseModel):
expires_in_days: int | None = 30 # None = no expiry
label: str | None = None
class ShareTokenListResponse(BaseModel):
tokens: list[ShareTokenResponse]
class PublicJobPreviewLanguage(BaseModel):
captions_vtt_url: str | None = None
audio_description_vtt_url: str | None = None
accessible_video_mp4_url: str | None = None
audio_description_mp3_url: str | None = None
class PublicJobPreviewResponse(BaseModel):
job_id: str
job_title: str
job_status: str
source_language: str
languages: list[str]
language_outputs: dict[str, PublicJobPreviewLanguage]
class ClientDecisionRequest(BaseModel):
action: Literal["approve", "reject"]
notes: str | None = None
client_name: str | None = None
class ClientDecisionResponse(BaseModel):
status: str
new_job_status: str
# ── Authenticated routes ──────────────────────────────────────────────────────
@router.post("/jobs/{job_id}/share", response_model=ShareTokenResponse, status_code=201)
async def create_share_token(
job_id: str,
request: CreateShareTokenRequest,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Generate a read-only share link for a job."""
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
token_id = secrets.token_hex(32)
now = datetime.utcnow()
expires_at = (now + timedelta(days=request.expires_in_days)) if request.expires_in_days is not None else None
token_doc = {
"_id": token_id,
"job_id": job_id,
"organization_id": job_doc.get("organization_id", ""),
"created_by_user_id": str(current_user.id),
"created_by_email": current_user.email,
"created_at": now,
"expires_at": expires_at,
"is_active": True,
"label": request.label,
}
await db[_TOKENS].insert_one(token_doc)
await audit_logger.log_action(
action=AuditAction.SHARE_TOKEN_CREATE,
description=f"Share token created for job '{job_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"token_id": token_id, "label": request.label, "expires_in_days": request.expires_in_days},
)
return ShareTokenResponse(
id=token_id,
job_id=job_id,
created_by_email=current_user.email,
created_at=now,
expires_at=expires_at,
is_active=True,
label=request.label,
share_url=_share_url(token_id),
)
@router.get("/jobs/{job_id}/share", response_model=ShareTokenListResponse)
async def list_share_tokens(
job_id: str,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all active share tokens for a job."""
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
cursor = db[_TOKENS].find({"job_id": job_id, "is_active": True})
tokens = []
async for doc in cursor:
tokens.append(ShareTokenResponse(
id=doc["_id"],
job_id=doc["job_id"],
created_by_email=doc["created_by_email"],
created_at=doc["created_at"],
expires_at=doc.get("expires_at"),
is_active=doc["is_active"],
label=doc.get("label"),
share_url=_share_url(doc["_id"]),
))
return ShareTokenListResponse(tokens=tokens)
@router.delete("/jobs/{job_id}/share/{token_id}", status_code=204)
async def revoke_share_token(
job_id: str,
token_id: str,
http_request: Request,
current_user: User = Depends(require_roles(
UserRole.PROJECT_MANAGER, UserRole.PRODUCTION, UserRole.ADMIN,
)),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Revoke (deactivate) a share token."""
result = await db[_TOKENS].update_one(
{"_id": token_id, "job_id": job_id},
{"$set": {"is_active": False}},
)
if result.matched_count == 0:
raise HTTPException(status_code=404, detail="Token not found")
await audit_logger.log_action(
action=AuditAction.SHARE_TOKEN_REVOKE,
description=f"Share token '{token_id}' revoked for job '{job_id}'",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"token_id": token_id},
)
# ── Public route (no auth) ────────────────────────────────────────────────────
@router.get("/public/share/{token}", response_model=PublicJobPreviewResponse)
async def get_public_job_preview(
token: str,
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Return read-only job preview for a valid share token. No authentication required."""
token_doc = await db[_TOKENS].find_one({"_id": token, "is_active": True})
if not token_doc:
raise HTTPException(status_code=404, detail="Share link not found or has been revoked")
if token_doc.get("expires_at") and token_doc["expires_at"] < datetime.utcnow():
raise HTTPException(status_code=410, detail="Share link has expired")
job_doc = await db[_JOBS].find_one({"_id": token_doc["job_id"]})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
outputs = job_doc.get("outputs") or {}
language_outputs: dict[str, PublicJobPreviewLanguage] = {}
for lang, lang_output in outputs.items():
if not isinstance(lang_output, dict):
continue
lang_data = PublicJobPreviewLanguage()
if "captions_vtt_gcs" in lang_output:
blob_path = lang_output["captions_vtt_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.captions_vtt_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
if "ad_vtt_gcs" in lang_output:
blob_path = lang_output["ad_vtt_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.audio_description_vtt_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
if "ad_mp3_gcs" in lang_output:
blob_path = lang_output["ad_mp3_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.audio_description_mp3_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
if "accessible_video_gcs" in lang_output:
blob_path = lang_output["accessible_video_gcs"].replace(f"gs://{settings.gcs_bucket}/", "")
try:
lang_data.accessible_video_mp4_url = await get_signed_download_url(blob_path, 6)
except Exception:
pass
language_outputs[lang] = lang_data
return PublicJobPreviewResponse(
job_id=str(job_doc["_id"]),
job_title=job_doc.get("title", "Untitled"),
job_status=job_doc.get("status", ""),
source_language=job_doc.get("source", {}).get("language", "en"),
languages=list(outputs.keys()),
language_outputs=language_outputs,
)
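The public preview above repeats the same three steps for each asset: strip the `gs://bucket/` prefix, request a signed URL, and skip the asset if signing fails. A table-driven sketch of that pattern is shown below. It is synchronous for brevity (the route awaits `get_signed_download_url`), and `BUCKET`, `blob_path`, `sign_outputs`, and `fake_sign` are hypothetical names, with `fake_sign` standing in for the real signer.

```python
from collections.abc import Callable

BUCKET = "example-bucket"  # hypothetical bucket name

# Output key in the job document -> field name in the public response.
OUTPUT_FIELDS = {
    "captions_vtt_gcs": "captions_vtt_url",
    "ad_vtt_gcs": "audio_description_vtt_url",
    "ad_mp3_gcs": "audio_description_mp3_url",
    "accessible_video_gcs": "accessible_video_mp4_url",
}


def blob_path(gcs_uri: str, bucket: str = BUCKET) -> str:
    """Strip the gs://bucket/ prefix, leaving the object path."""
    return gcs_uri.removeprefix(f"gs://{bucket}/")


def sign_outputs(lang_output: dict, sign: Callable[[str], str]) -> dict:
    """Return {response_field: url}, skipping any asset whose signing fails."""
    urls = {}
    for source_key, response_field in OUTPUT_FIELDS.items():
        if source_key not in lang_output:
            continue
        try:
            urls[response_field] = sign(blob_path(lang_output[source_key]))
        except Exception:
            continue  # best effort: one failing asset must not hide the rest
    return urls


def fake_sign(path: str) -> str:
    """Stand-in for a real signer; fails on .mp3 to exercise the error path."""
    if path.endswith(".mp3"):
        raise RuntimeError("signing failed")
    return f"https://signed.invalid/{path}"


signed = sign_outputs(
    {
        "captions_vtt_gcs": f"gs://{BUCKET}/job-1/fr/captions.vtt",
        "ad_mp3_gcs": f"gs://{BUCKET}/job-1/fr/ad.mp3",
    },
    fake_sign,
)
```

Unlike `str.replace`, `removeprefix` only touches the start of the string, so a bucket prefix that happens to recur later in a path is left alone.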
@router.post("/public/share/{token}/decision", response_model=ClientDecisionResponse)
async def client_decision(
token: str,
request: ClientDecisionRequest,
http_request: Request,
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Submit client approval or rejection via a share link. No authentication required."""
from ...services.validation import asset_validation_service
token_doc = await db[_TOKENS].find_one({"_id": token, "is_active": True})
if not token_doc:
raise HTTPException(status_code=404, detail="Share link not found or has been revoked")
if token_doc.get("expires_at") and token_doc["expires_at"] < datetime.utcnow():
raise HTTPException(status_code=410, detail="Share link has expired")
job_id = token_doc["job_id"]
job_doc = await db[_JOBS].find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=404, detail="Job not found")
if job_doc.get("status") != "pending_final_review":
raise HTTPException(
status_code=409,
detail="This job is not currently awaiting client review"
)
now = datetime.utcnow()
by_label = f"client:{request.client_name or 'anonymous'} (share/{token[:8]})"
if request.action == "approve":
is_valid, validation_errors = await asset_validation_service.validate_job_assets(job_doc)
if not is_valid:
raise HTTPException(
status_code=400,
detail=f"Asset validation failed: {'; '.join(validation_errors)}"
)
new_status = "completed"
update = {
"$set": {
"status": new_status,
"review.notes": request.notes or "",
"updated_at": now,
},
"$push": {
"review.history": {
"at": now,
"status": new_status,
"by": by_label,
"notes": request.notes or "",
}
},
}
else:
new_status = "qc_feedback"
update = {
"$set": {
"status": new_status,
"review.notes": request.notes or "",
"review.reviewer_id": by_label,
"updated_at": now,
},
"$push": {
"review.history": {
"at": now,
"status": new_status,
"by": by_label,
"notes": request.notes or "",
}
},
}
result = await db[_JOBS].find_one_and_update(
{"_id": job_id, "status": "pending_final_review"},
update,
return_document=True,
)
if not result:
raise HTTPException(
status_code=409,
detail="Decision could not be submitted — the job status may have changed"
)
await audit_logger.log_action(
action=AuditAction.SHARE_CLIENT_DECISION,
description=f"Client '{request.client_name or 'anonymous'}' submitted decision '{request.action}' for job '{job_id}' via share token",
user=None,
request=http_request,
resource_type="job",
resource_id=job_id,
details={
"action": request.action,
"token": token,
"client_name": request.client_name,
"new_status": new_status,
"notes": request.notes,
},
)
if request.action == "approve":
try:
from ...tasks.notify import notify_client_task
notify_client_task.delay(job_id)
except Exception:
pass
return ClientDecisionResponse(status="ok", new_job_status=new_status)
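The decision endpoint guards its write by repeating the expected status in the update filter, so a second decision arriving after the first finds nothing to update and gets a 409. The sketch below models only that compare-and-set logic with an in-memory dict; `transition`, `decide`, and `store` are hypothetical, and the atomicity that `find_one_and_update` provides in MongoDB is not reproduced here.

```python
def transition(jobs: dict[str, dict], job_id: str, expected: str, new_status: str) -> bool:
    """Move a job to new_status only if it is still in the expected status."""
    job = jobs.get(job_id)
    if job is None or job["status"] != expected:
        return False  # someone else changed the job first; the caller reports 409
    job["status"] = new_status
    return True


def decide(jobs: dict[str, dict], job_id: str, action: str) -> str:
    """Map a client action to a status and apply it with the guard above."""
    new_status = "completed" if action == "approve" else "qc_feedback"
    if not transition(jobs, job_id, "pending_final_review", new_status):
        raise RuntimeError("conflict: job is no longer awaiting client review")
    return new_status


store = {"job-1": {"status": "pending_final_review"}}
first = decide(store, "job-1", "approve")
try:
    decide(store, "job-1", "reject")
    second = "applied"
except RuntimeError:
    second = "conflict"
```

The earlier status check in the route gives a friendly error in the common case; the filter on the update is what actually prevents two decisions from both being applied.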


@ -1,18 +1,18 @@
import asyncio
import time
from typing import Literal, Optional
from typing import Literal
from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi.responses import Response
from pydantic import BaseModel, Field
from ...core.config import settings
from ...core.logging import get_logger
from ...services.gemini_tts import gemini_tts_service
from ...services.elevenlabs_voices import elevenlabs_voice_service
from ...services.tts import tts_service
from ...services import cost_tracker
from ...core.dependencies import get_current_user
from ...core.logging import get_logger
from ...services import cost_tracker
from ...services.elevenlabs_voices import elevenlabs_voice_service
from ...services.gemini_tts import gemini_tts_service
from ...services.tts import tts_service
logger = get_logger(__name__)
@ -30,20 +30,20 @@ class VoicePreviewRequest(BaseModel):
style_preset: Literal[
"neutral", "calm", "energetic", "professional", "warm", "documentary", "custom"
] = "neutral"
custom_style_prompt: Optional[str] = None
custom_style_prompt: str | None = None
# ElevenLabs-specific
stability: Optional[float] = Field(default=None, ge=0.0, le=1.0)
similarity_boost: Optional[float] = Field(default=None, ge=0.0, le=1.0)
stability: float | None = Field(default=None, ge=0.0, le=1.0)
similarity_boost: float | None = Field(default=None, ge=0.0, le=1.0)
class VoiceInfo(BaseModel):
"""Structured voice information for any provider."""
id: str
name: str
description: Optional[str] = None
preview_url: Optional[str] = None
labels: Optional[dict[str, str]] = None
category: Optional[str] = None
description: str | None = None
preview_url: str | None = None
labels: dict[str, str] | None = None
category: str | None = None
class ProviderVoicesResponse(BaseModel):
@ -52,7 +52,7 @@ class ProviderVoicesResponse(BaseModel):
voices: list[VoiceInfo]
default: str
available: bool = True
error: Optional[str] = None
error: str | None = None
class LanguagesResponse(BaseModel):
@ -87,12 +87,12 @@ class ProviderOptionsResponse(BaseModel):
"""Available TTS configuration options for a provider."""
provider: str
# Gemini-specific
models: Optional[list[TTSOptionItem]] = None
style_presets: Optional[list[TTSOptionItem]] = None
speed_range: Optional[SpeedRange] = None
models: list[TTSOptionItem] | None = None
style_presets: list[TTSOptionItem] | None = None
speed_range: SpeedRange | None = None
# ElevenLabs-specific
stability_range: Optional[FloatRange] = None
similarity_boost_range: Optional[FloatRange] = None
stability_range: FloatRange | None = None
similarity_boost_range: FloatRange | None = None
@router.get("/voices", response_model=ProviderVoicesResponse)


@ -0,0 +1,151 @@
"""VTT version control endpoints."""
from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from motor.motor_asyncio import AsyncIOMotorDatabase
from ...core.authz import MembershipContext, get_job_or_403, get_membership_context
from ...core.config import settings
from ...core.database import get_database
from ...core.dependencies import require_roles
from ...models.audit_log import AuditAction
from ...models.user import User, UserRole
from ...models.vtt_version import (
VttDiffResponse,
VttKind,
VttVersionListResponse,
VttVersionSummary,
)
from ...services import vtt_versioning
from ...services.audit_logger import audit_logger
from ...services.gcs import gcs_service
router = APIRouter(prefix="/jobs", tags=["vtt-versions"])
_EDITABLE_ROLES = (UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION, UserRole.ADMIN)
@router.get("/{job_id}/vtt/versions", response_model=VttVersionListResponse)
async def list_vtt_versions(
job_id: str,
lang: str = Query(...),
kind: VttKind = Query(...),
skip: int = Query(0, ge=0),
limit: int = Query(50, ge=1, le=200),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""List all VTT versions for a job/lang/kind, newest first."""
await get_job_or_403(job_id, ctx, db) # org check
return await vtt_versioning.list_versions(db, job_id, lang, kind, skip, limit)
# Declared before /versions/{version}: FastAPI matches routes in declaration order,
# so "diff" would otherwise be parsed as an integer version and rejected with 422.
@router.get("/{job_id}/vtt/versions/diff", response_model=VttDiffResponse)
async def diff_vtt_versions(
job_id: str,
lang: str = Query(...),
kind: VttKind = Query(...),
from_version: int = Query(..., alias="from"),
to_version: int = Query(..., alias="to"),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Line-level diff between two versions of a VTT file."""
await get_job_or_403(job_id, ctx, db) # org check
v_from = await vtt_versioning.get_version(db, job_id, lang, kind, from_version)
v_to = await vtt_versioning.get_version(db, job_id, lang, kind, to_version)
if not v_from:
raise HTTPException(status_code=404, detail=f"Version {from_version} not found")
if not v_to:
raise HTTPException(status_code=404, detail=f"Version {to_version} not found")
return vtt_versioning.diff_versions(job_id, lang, kind, v_from, v_to)
@router.get("/{job_id}/vtt/versions/{version}", response_model=dict)
async def get_vtt_version(
job_id: str,
version: int,
lang: str = Query(...),
kind: VttKind = Query(...),
current_user: User = Depends(require_roles(*_EDITABLE_ROLES)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""Get full VTT content for a specific version."""
await get_job_or_403(job_id, ctx, db) # org check
v = await vtt_versioning.get_version(db, job_id, lang, kind, version)
if not v:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Version not found")
return {
"job_id": v.job_id,
"lang": v.lang,
"kind": v.kind,
"version": v.version,
"content": v.content,
"gcs_uri": v.gcs_uri,
"created_at": v.created_at.isoformat(),
"created_by": v.created_by.dict(),
"note": v.note,
"parent_version": v.parent_version,
"cue_count": v.cue_count,
"byte_size": v.byte_size,
}
@router.post(
"/{job_id}/vtt/versions/{version}/restore",
response_model=VttVersionSummary,
status_code=status.HTTP_201_CREATED,
)
async def restore_vtt_version(
job_id: str,
version: int,
http_request: Request,
lang: str = Query(...),
kind: VttKind = Query(...),
current_user: User = Depends(require_roles(UserRole.PRODUCTION, UserRole.ADMIN)),
ctx: MembershipContext = Depends(get_membership_context),
db: AsyncIOMotorDatabase = Depends(get_database),
):
"""
Restore a previous version as the new live VTT.
Non-destructive: creates a new version entry whose content mirrors the old one,
then overwrites the live GCS file.
"""
await get_job_or_403(job_id, ctx, db) # org check
src = await vtt_versioning.get_version(db, job_id, lang, kind, version)
if not src:
raise HTTPException(status_code=404, detail="Version not found")
# Create new version snapshot (this also bumps the counter)
new_ver = await vtt_versioning.restore_version(db, job_id, lang, kind, version, current_user)
# Overwrite the live file in GCS so the QC editor sees the restored content
live_path = f"{job_id}/{lang}/{'captions' if kind == 'captions' else 'ad'}.vtt"
try:
await gcs_service.upload_text_to_gcs(src.content, live_path, "text/vtt")
except Exception as exc:
raise HTTPException(
status_code=500,
detail=f"Version snapshot created (v{new_ver.version}) but live file update failed: {exc}",
) from exc
# Update the GCS URI pointer in the job document
gcs_uri_key = "captions_vtt_gcs" if kind == "captions" else "ad_vtt_gcs"
new_gcs_uri = f"gs://{settings.gcs_bucket}/{live_path}"
await db.jobs.update_one(
{"_id": job_id},
{"$set": {f"outputs.{lang}.{gcs_uri_key}": new_gcs_uri}},
)
await audit_logger.log_action(
action=AuditAction.VTT_EDIT,
description=f"VTT restored to v{version} for job {job_id} lang={lang} kind={kind}",
user=current_user,
request=http_request,
resource_type="job",
resource_id=job_id,
details={"lang": lang, "kind": kind, "restored_from_version": version, "new_version": new_ver.version},
)
return new_ver


@ -5,107 +5,146 @@ Provides WebSocket endpoints for:
1. Individual job status updates: /ws/jobs/{job_id}
2. Job list updates: /ws/jobs (all jobs for authenticated user)
"""
import asyncio
import logging
from typing import Optional
from fastapi import APIRouter, WebSocket, WebSocketDisconnect, HTTPException, Depends, Query
from fastapi import (
APIRouter,
Depends,
Query,
WebSocket,
WebSocketDisconnect,
)
from fastapi.security import HTTPBearer
from ...services.websocket import (
connection_manager,
authenticate_websocket,
get_connection_manager,
ConnectionManager
)
from ...models.job import Job
from ...core.authz import PLATFORM_ADMIN_ROLES, _cached_memberships
from ...core.database import get_database
from ...core.dependencies import get_current_user
from ...models.user import UserRole
from ...services.websocket import (
ConnectionManager,
authenticate_websocket,
connection_manager,
get_connection_manager,
)
logger = logging.getLogger(__name__)
router = APIRouter(tags=["WebSocket"])
security = HTTPBearer()
# Close codes that indicate a permanent auth/permission failure — frontend must NOT retry
_TERMINAL_CLOSE_CODES = {4001, 4003, 4004, 4403}
# Seconds between server-side keepalive frames.
# Must be < Apache mod_proxy_wstunnel idle timeout.
# Mod Comms incident 2026-03-18: 25s was insufficient; 20s is safe.
_KEEPALIVE_INTERVAL_S = 20
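The keepalive interval above is used by wrapping the receive call in `asyncio.wait_for`: if the client stays silent for the whole interval, the server sends a frame so the proxy's idle timer resets. A stdlib-only sketch of that loop is below, with an `asyncio.Queue` standing in for the WebSocket; `pump`, `demo`, and the `"close"` sentinel are hypothetical, and the interval is shortened so the example runs quickly.

```python
import asyncio


async def pump(inbox: asyncio.Queue, outbox: list[str], interval_s: float) -> None:
    """Answer pings; emit a keepalive whenever the client is silent for interval_s."""
    while True:
        try:
            message = await asyncio.wait_for(inbox.get(), timeout=interval_s)
        except asyncio.TimeoutError:
            outbox.append("keepalive")  # idle: send a frame to reset the proxy timer
            continue
        if message == "close":
            return
        if message == "ping":
            outbox.append("pong")


async def demo() -> list[str]:
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: list[str] = []
    task = asyncio.create_task(pump(inbox, outbox, interval_s=0.05))
    await inbox.put("ping")
    await asyncio.sleep(0.12)  # long enough for at least one idle timeout
    await inbox.put("close")
    await task
    return outbox


frames = asyncio.run(demo())
```

A timeout here is the normal idle path rather than an error, which is why it is handled separately from disconnects.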
async def _resolve_user_and_org(websocket: WebSocket, user_id: str, db):
"""
Fetch user document and resolve org memberships from cache.
Returns (user_doc, memberships_dict) or closes the socket and returns (None, None).
"""
user = await db["users"].find_one({"_id": user_id})
if not user:
try:
from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass
if not user:
await websocket.close(code=4001, reason="User not found")
return None, None
is_platform_admin = UserRole(user.get("role", "")) in PLATFORM_ADMIN_ROLES
if is_platform_admin:
return user, None # None memberships = unrestricted
memberships = await _cached_memberships(user_id, db)
return user, memberships
def _can_access_org(org_id: str | None, memberships: dict | None) -> bool:
"""Return True if user (with these memberships) may access the given org_id."""
if memberships is None:
return True # platform admin
if not org_id:
return True # legacy job without org: allow (further checks done below if needed)
return org_id in memberships
@router.websocket("/ws/jobs/{job_id}")
async def websocket_job_status(
websocket: WebSocket,
job_id: str,
token: Optional[str] = Query(None),
token: str | None = Query(None),
manager: ConnectionManager = Depends(get_connection_manager)
):
"""
WebSocket endpoint for real-time job status updates
WebSocket endpoint for real-time job status updates.
Usage:
- Connect: ws://localhost:8000/api/v1/ws/jobs/{job_id}?token={jwt_token}
- Receives: Real-time status updates for the specific job
Message format:
{
"type": "job_status_update",
"data": {
"job_id": "...",
"status": "processing",
"updated_at": "2023-...",
"message": "Processing video...",
"progress": 45
}
}
Close codes:
4001 user not found
4003 role-based access denied
4004 job not found
4403 org membership access denied (do not retry)
"""
# Authenticate the WebSocket connection
user_id = await authenticate_websocket(websocket, token)
if not user_id:
return
try:
# Verify user has access to this job
db = await get_database()
jobs_collection = db["jobs"]
job = await jobs_collection.find_one({"_id": job_id})
job = await db["jobs"].find_one({"_id": job_id})
if not job:
await websocket.close(code=4004, reason="Job not found")
return
# Check permissions - users can only access their own jobs unless they're admin/reviewer
user = await db["users"].find_one({"_id": user_id})
if not user:
try:
from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass # Invalid ObjectId format
if not user:
await websocket.close(code=4001, reason="User not found")
return
# Check access permissions
user, memberships = await _resolve_user_and_org(websocket, user_id, db)
if user is None:
return # socket already closed inside helper
# Role-based client restriction
if user["role"] == "client" and job.get("created_by") != user_id:
await websocket.close(code=4003, reason="Access denied")
return
# Connect to job status updates
# Org membership check
job_org = job.get("organization_id")
if not _can_access_org(job_org, memberships):
await websocket.close(code=4403, reason="Org access denied")
return
await manager.connect_job_status(websocket, user_id, job_id)
# Keep connection alive and handle incoming messages
while True:
try:
# Wait for incoming WebSocket messages (for heartbeat, etc.)
message = await websocket.receive_text()
# Wait up to _KEEPALIVE_INTERVAL_S for a client message.
# On timeout send a keepalive frame so the proxy idle timer resets.
message = await asyncio.wait_for(
websocket.receive_text(),
timeout=_KEEPALIVE_INTERVAL_S,
)
logger.debug(f"Received WebSocket message from user {user_id}: {message}")
# Handle heartbeat or other client messages if needed
if message == "ping":
await websocket.send_text("pong")
except TimeoutError:
await websocket.send_text("keepalive")
except WebSocketDisconnect:
break
except Exception as e:
logger.error(f"Error in WebSocket message handling: {e}")
break
except WebSocketDisconnect:
pass
except Exception as e:
@ -117,75 +156,54 @@ async def websocket_job_status(
@router.websocket("/ws/jobs")
async def websocket_job_list(
websocket: WebSocket,
token: Optional[str] = Query(None),
token: str | None = Query(None),
manager: ConnectionManager = Depends(get_connection_manager)
):
"""
WebSocket endpoint for real-time job list updates
WebSocket endpoint for real-time job list updates.
Usage:
- Connect: ws://localhost:8000/api/v1/ws/jobs?token={jwt_token}
- Receives: Real-time status updates for all jobs the user can access
Message format:
{
"type": "job_list_update",
"data": {
"job_id": "...",
"status": "processing",
"updated_at": "2023-...",
"message": "Processing video...",
"progress": 45
}
}
Only events for jobs in the user's accessible orgs are delivered.
"""
# Authenticate the WebSocket connection
user_id = await authenticate_websocket(websocket, token)
if not user_id:
return
try:
# Verify user exists
logger.info(f"WebSocket: Looking up user {user_id} in database")
db = await get_database()
# Try looking up user by string ID first, then by ObjectId
user = await db["users"].find_one({"_id": user_id})
if not user:
try:
from bson import ObjectId
user = await db["users"].find_one({"_id": ObjectId(user_id)})
except Exception:
pass # Invalid ObjectId format
if not user:
logger.warning(f"WebSocket: User {user_id} not found in database (tried both string and ObjectId)")
await websocket.close(code=4001, reason="User not found")
return
user, memberships = await _resolve_user_and_org(websocket, user_id, db)
if user is None:
return # socket already closed inside helper
logger.info(f"WebSocket: User {user_id} found, role: {user.get('role', 'unknown')}")
logger.info(f"WebSocket: User {user_id} found, connecting to job list updates")
# Connect to job list updates
await manager.connect_job_list(websocket, user_id)
# Keep connection alive and handle incoming messages
accessible_org_ids = None if memberships is None else list(memberships.keys())
await manager.connect_job_list(websocket, user_id, accessible_org_ids=accessible_org_ids)
while True:
try:
# Wait for incoming WebSocket messages
message = await websocket.receive_text()
message = await asyncio.wait_for(
websocket.receive_text(),
timeout=_KEEPALIVE_INTERVAL_S,
)
logger.debug(f"Received WebSocket message from user {user_id}: {message}")
# Handle heartbeat or other client messages if needed
if message == "ping":
await websocket.send_text("pong")
except TimeoutError:
await websocket.send_text("keepalive")
except WebSocketDisconnect:
break
except Exception as e:
logger.error(f"Error in WebSocket message handling: {e}")
break
except WebSocketDisconnect:
pass
except Exception as e:
@ -196,19 +214,15 @@ async def websocket_job_list(
@router.get("/ws/status")
async def websocket_status():
"""
Get WebSocket connection status and statistics
Useful for debugging and monitoring
"""
"""Get WebSocket connection status and statistics (debug/monitoring)."""
stats = {
"active_connections": len(connection_manager.active_connections),
"job_subscriptions": len(connection_manager.job_subscriptions),
"global_subscriptions": len(connection_manager.global_subscriptions),
"redis_connected": connection_manager.redis_client is not None,
"subscriber_running": (
connection_manager.subscriber_task is not None and
connection_manager.subscriber_task is not None and
not connection_manager.subscriber_task.done()
)
}
return stats
return stats
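The keepalive loop that both WebSocket handlers adopt in this diff can be sketched in isolation. This is a minimal illustration, not the project's code: `FakeSocket`, `serve`, and `max_idle_rounds` are hypothetical names introduced here so the loop can run without FastAPI. It catches `asyncio.TimeoutError`, which is the same class as the built-in `TimeoutError` only from Python 3.11 onward, so the bare `except TimeoutError:` in the handlers assumes that runtime.

```python
import asyncio


class FakeSocket:
    """Hypothetical stand-in for a WebSocket: queued inbound text, recorded outbound text."""

    def __init__(self) -> None:
        self.inbox: asyncio.Queue = asyncio.Queue()
        self.sent: list = []

    async def receive_text(self) -> str:
        return await self.inbox.get()

    async def send_text(self, text: str) -> None:
        self.sent.append(text)


async def serve(sock: FakeSocket, keepalive_s: float, max_idle_rounds: int) -> None:
    """Answer pings; send a keepalive frame whenever the client stays silent.

    max_idle_rounds exists only so this demo terminates; the real handlers
    loop until the client disconnects.
    """
    idle_rounds = 0
    while idle_rounds < max_idle_rounds:
        try:
            # Wait for a client message, but no longer than the keepalive interval.
            message = await asyncio.wait_for(sock.receive_text(), timeout=keepalive_s)
        except asyncio.TimeoutError:
            # Silence: emit a frame so an idle-timeout proxy sees traffic.
            idle_rounds += 1
            await sock.send_text("keepalive")
            continue
        if message == "ping":
            await sock.send_text("pong")


async def demo() -> list:
    sock = FakeSocket()
    await sock.inbox.put("ping")
    await serve(sock, keepalive_s=0.01, max_idle_rounds=2)
    return sock.sent
```

Running `asyncio.run(demo())` answers the queued ping first and then emits one keepalive frame per silent interval.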

View file

@ -11,7 +11,6 @@ Provides:
import json
from dataclasses import dataclass
from typing import Optional
from fastapi import Depends, HTTPException, status
from motor.motor_asyncio import AsyncIOMotorDatabase
@ -64,10 +63,10 @@ async def _cached_memberships(
db: AsyncIOMotorDatabase,
) -> dict[str, OrgRole]:
"""Load memberships, with Redis cache (60s TTL)."""
cache_key = f"mem:user:{user_id}"
try:
redis = get_redis()
redis = await get_redis()
if redis:
cache_key = f"mem:user:{user_id}"
cached = await redis.get(cache_key)
if cached:
raw = json.loads(cached)
@ -78,7 +77,7 @@ async def _cached_memberships(
memberships = await _load_memberships(user_id, db)
try:
redis = get_redis()
redis = await get_redis()
if redis:
await redis.setex(
cache_key,
@ -159,7 +158,7 @@ class OrgScopedQuery:
def filter(
self,
base_query: dict,
org_id: Optional[str] = None,
org_id: str | None = None,
org_field: str = "organization_id",
) -> dict:
if self.ctx.is_platform_admin:
@ -183,6 +182,50 @@ class OrgScopedQuery:
return {**base_query, org_field: {"$in": accessible}}
def assert_user_in_org(
ctx: "MembershipContext",
org_id: str,
min_role: OrgRole = OrgRole.VIEWER,
) -> None:
"""Raise 403 if ctx user does not have min_role in org_id. Platform admins always pass."""
if not ctx.can_access_org(org_id, min_role):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access to this organization is not permitted",
)
async def get_job_or_403(
job_id: str,
ctx: "MembershipContext",
db: AsyncIOMotorDatabase,
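) -> dict:
"""Load the job document and verify that the ctx user can access its organization. Raises 404 both for missing jobs and for jobs the user cannot access, so cross-org job IDs are not disclosed; despite the name, this function never raises 403."""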
job_doc = await db.jobs.find_one({"_id": job_id})
if not job_doc:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
org_id = job_doc.get("organization_id")
if not org_id:
# Legacy job without org: try resolving via project
project_id = job_doc.get("project_id")
if project_id:
project = await db.projects.find_one({"_id": project_id}, {"client_id": 1})
if project:
org_id = project.get("client_id")
if org_id:
if not ctx.can_access_org(org_id):
# Return 404 to avoid leaking existence of cross-org jobs
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
else:
# Truly legacy job (no project, no org): only the original uploader or admin can access
if not ctx.is_platform_admin and job_doc.get("client_id") != str(ctx.user.id):
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
return job_doc
async def bump_user_membership_cache(user_id: str) -> None:
"""Invalidate the Redis membership cache for a user (call on any membership write)."""
try:

View file

@ -6,6 +6,7 @@ class Settings(BaseSettings):
# App
app_env: str = "dev"
api_base_url: str = "http://localhost:8000"
app_url: str = "https://optical-dev.oliver.solutions/video-accessibility"
# Auth
jwt_secret: str
@ -22,13 +23,14 @@ class Settings(BaseSettings):
# Redis
redis_url: str
# Celery
celery_broker_url: str = ""
celery_result_backend: str = ""
# GCP
gcp_project_id: str
gcp_location: str = "us-central1"
gcs_bucket: str = "accessible-video"
google_application_credentials: str = ""
@ -36,7 +38,7 @@ class Settings(BaseSettings):
gemini_api_key: str
elevenlabs_api_key: str = ""
google_tts_credentials: str = ""
# TTS Voice Configuration
tts_provider: str = "gemini" # "gemini", "google", or "elevenlabs"
google_tts_voices: dict[str, str] = {
@ -50,7 +52,7 @@ class Settings(BaseSettings):
elevenlabs_voices: dict[str, str] = {}
# Gemini TTS Configuration
gemini_tts_model: str = "gemini-2.5-flash-preview-tts"
gemini_tts_model: str = "gemini-3.1-flash-tts-preview"
gemini_tts_default_voice: str = "Kore"
gemini_tts_voices: list[str] = [
"Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede",
@ -93,7 +95,24 @@ class Settings(BaseSettings):
"sv": "sv-SE",
"es-419": "es-US",
"pt-BR": "pt-BR",
"fr-CA": "fr-CA"
"fr-CA": "fr-CA",
# Explicit region variants (added for locale-aware glossary support)
"de-DE": "de-DE",
"en-US": "en-US",
"en-GB": "en-GB",
"en-CA": "en-CA",
"es-ES": "es-ES",
"es-MX": "es-US",
"fr-FR": "fr-FR",
"it-IT": "it-IT",
"ja-JP": "ja-JP",
"ko-KR": "ko-KR",
"nl-NL": "nl-NL",
"pl-PL": "pl-PL",
"cs-CZ": "cs-CZ",
"tr-TR": "tr-TR",
"id-ID": "id-ID",
"pt-PT": "pt-PT",
}
gemini_tts_language_names: dict[str, str] = {
"en": "English",
@ -129,7 +148,24 @@ class Settings(BaseSettings):
"sv": "Swedish",
"es-419": "Spanish (Latin America)",
"pt-BR": "Portuguese (Brazil)",
"fr-CA": "French (Canada)"
"fr-CA": "French (Canada)",
# Explicit region variants
"de-DE": "German (Germany)",
"en-US": "English (US)",
"en-GB": "English (UK)",
"en-CA": "English (Canada)",
"es-ES": "Spanish (Spain)",
"es-MX": "Spanish (Mexico)",
"fr-FR": "French (France)",
"it-IT": "Italian (Italy)",
"ja-JP": "Japanese (Japan)",
"ko-KR": "Korean (Korea)",
"nl-NL": "Dutch (Netherlands)",
"pl-PL": "Polish (Poland)",
"cs-CZ": "Czech (Czech Republic)",
"tr-TR": "Turkish (Turkey)",
"id-ID": "Indonesian (Indonesia)",
"pt-PT": "Portuguese (Portugal)",
}
gemini_tts_preview_samples: dict[str, str] = {
"en": "This is a preview of the audio description voice.",
@ -165,13 +201,30 @@ class Settings(BaseSettings):
"sv": "Det här är en förhandsgranskning av ljudbeskrivningsrösten.",
"es-419": "Esta es una vista previa de la voz de audiodescripción.",
"pt-BR": "Esta é uma prévia da voz da audiodescrição.",
"fr-CA": "Ceci est un aperçu de la voix de l'audiodescription."
"fr-CA": "Ceci est un aperçu de la voix de l'audiodescription.",
# Explicit region variants
"de-DE": "Dies ist eine Vorschau der Audiodeskriptionsstimme.",
"en-US": "This is a preview of the audio description voice.",
"en-GB": "This is a preview of the audio description voice.",
"en-CA": "This is a preview of the audio description voice.",
"es-ES": "Esta es una vista previa de la voz de audiodescripción.",
"es-MX": "Esta es una vista previa de la voz de audiodescripción.",
"fr-FR": "Ceci est un aperçu de la voix de l'audiodescription.",
"it-IT": "Questa è un'anteprima della voce dell'audiodescrizione.",
"ja-JP": "これは音声解説の声のプレビューです。",
"ko-KR": "이것은 오디오 설명 음성의 미리보기입니다.",
"nl-NL": "Dit is een voorbeeld van de audiodescriptiestem.",
"pl-PL": "To jest podgląd głosu audiodeskrypcji.",
"cs-CZ": "Toto je náhled hlasu zvukového popisu.",
"tr-TR": "Bu, sesli betimleme sesinin bir önizlemesidir.",
"id-ID": "Ini adalah pratinjau suara deskripsi audio.",
"pt-PT": "Esta é uma pré-visualização da voz da audiodescrição.",
}
# Gemini TTS Model Options
gemini_tts_models: dict[str, str] = {
"flash": "gemini-2.5-flash-preview-tts", # Fast, cost-efficient
"pro": "gemini-2.5-pro-preview-tts", # Higher quality
"flash": "gemini-3.1-flash-tts-preview", # Fast, cost-efficient (Preview)
"pro": "gemini-2.5-pro-tts", # Higher quality (GA)
}
# Gemini TTS Style Presets - prompts prepended to text for style control
@ -196,6 +249,14 @@ class Settings(BaseSettings):
whisper_sentence_gap_threshold: float = 0.5 # Gap duration to classify as sentence boundary
whisper_phrase_gap_threshold: float = 0.3 # Gap duration to classify as phrase boundary
whisper_min_gap_threshold: float = 0.15 # Minimum gap duration to consider
# Forward-preferred snap windows (A2)
whisper_snap_forward_window: float = 4.0 # Prefer boundary up to N seconds ahead of Gemini point
whisper_snap_backward_window: float = 1.5 # Fall back to boundary up to N seconds behind
# Adaptive silence buffer (A1)
ad_silence_buffer_default: float = 0.5 # Base silence duration (s) before/after AD audio
ad_silence_buffer_min_after: float = 0.1 # Minimum silence after AD audio
# Minimum gap required at the chosen pause point (A3)
ad_min_acceptable_gap: float = 0.2 # Seconds; points with shorter gaps trigger forward search
# Cloud Run Service URLs (empty = use local processing)
# When set, CPU-intensive work is offloaded to Cloud Run with autoscaling
@ -214,11 +275,10 @@ class Settings(BaseSettings):
ffmpeg_worker_concurrency: int = 4 # FFmpeg tasks on main worker
tts_worker_concurrency: int = 8 # TTS worker
# Email (Mailgun — primary; sendgrid_api_key kept for backward compat)
# Email (Mailgun)
mailgun_api_key: str = ""
mailgun_domain: str = "mg.oliver.solutions"
mailgun_from: str = "noreply@mg.oliver.solutions"
sendgrid_api_key: str = ""
email_from: str = "noreply@mg.oliver.solutions"
client_base_url: str
@ -237,6 +297,10 @@ class Settings(BaseSettings):
cost_tracker_source_app: str = "video-accessibility"
cost_tracker_enabled: bool = True
# Upload limits (T-14 — single source of truth)
upload_max_video_bytes: int = 2 * 1024 * 1024 * 1024 # 2 GiB
upload_signed_url_ttl_hours: int = 24 # signed URL lifetime
# CORS - comma-separated list of allowed origins
cors_origins: str = "http://localhost:5173,http://localhost:5174,http://localhost:3000,http://localhost:6001"
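The diff adds `whisper_snap_forward_window` and `whisper_snap_backward_window` but does not show the code that consumes them. One plausible reading of the setting comments is sketched below; `snap_to_boundary` and its parameters are hypothetical names, and the tie-breaking rule (nearest boundary within each window) is an assumption rather than the project's confirmed behaviour.

```python
def snap_to_boundary(
    target: float,
    boundaries: list,
    forward_window: float = 4.0,
    backward_window: float = 1.5,
) -> float:
    """Pick a speech boundary for `target`, preferring boundaries ahead of it.

    Assumed policy: take the nearest boundary within `forward_window` seconds
    after the target; otherwise the nearest within `backward_window` seconds
    before it; otherwise leave the target unchanged.
    """
    ahead = [b for b in boundaries if target <= b <= target + forward_window]
    if ahead:
        return min(ahead)  # closest boundary at or after the target
    behind = [b for b in boundaries if target - backward_window <= b < target]
    if behind:
        return max(behind)  # closest boundary before the target
    return target  # nothing in range: keep the original point
```

With the default windows, a target at 10.0 s and boundaries at 9.5 s and 12.0 s snaps forward to 12.0 s even though 9.5 s is closer, which is what "forward-preferred" suggests.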

View file

@ -56,7 +56,7 @@ async def create_indexes():
await db.audit_logs.create_index([("resource_type", 1), ("resource_id", 1)]) # Resource tracking
await db.audit_logs.create_index([("ip_address", 1), ("timestamp", -1)]) # IP-based analysis
await db.audit_logs.create_index([("success", 1), ("timestamp", -1)]) # Failed operations
# Text search index for description and details
await db.audit_logs.create_index([
("description", "text"),
@ -64,9 +64,19 @@ async def create_indexes():
("error_message", "text")
])
# Per-language QC assignment index — for linguist queue queries
await db.jobs.create_index([("qc_assignments.linguist_id", 1), ("qc_assignments.status", 1)])
# Review notes collection indexes
await db.review_notes.create_index([("job_id", 1), ("asset_key", 1)])
await db.review_notes.create_index([("job_id", 1), ("asset_key", 1), ("timestamp_seconds", 1)])
await db.review_notes.create_index([("user_id", 1)])
# VTT versions collection indexes
await db.vtt_versions.create_index(
[("job_id", 1), ("lang", 1), ("kind", 1), ("version", -1)],
unique=True,
)
await db.vtt_versions.create_index([("job_id", 1), ("created_at", -1)])
logger.info("Database indexes created successfully")

View file

@ -1,18 +1,16 @@
from typing import Optional
from fastapi import Depends, HTTPException, Request, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from motor.motor_asyncio import AsyncIOMotorDatabase
from ..models.user import User, UserRole
from .config import settings
from .database import get_database
from .security import decode_token
security = HTTPBearer()
# Roles that see all jobs (no tenant isolation)
STAFF_ROLES = {UserRole.ADMIN, UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION}
# Only admins bypass tenant isolation; other staff are scoped by team membership
STAFF_ROLES = {UserRole.ADMIN}
async def get_current_user(
@ -21,6 +19,13 @@ async def get_current_user(
) -> User:
token = credentials.credentials
payload = decode_token(token)
if payload.get("type") == "refresh":
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials",
)
user_id: str = payload.get("sub")
if user_id is None:
@ -36,7 +41,12 @@ async def get_current_user(
detail="User not found",
)
return User(**user_doc)
user = User(**user_doc)
# Attach org_ids hint from token as transient attribute (never used for authz)
token_org_ids = payload.get("org_ids", [])
if token_org_ids:
user.__dict__["org_ids"] = token_org_ids
return user
def require_role(required_role: UserRole):
@ -66,7 +76,7 @@ def require_roles(*required_roles: UserRole):
async def get_current_user_optional(
request: Request,
db: AsyncIOMotorDatabase = Depends(get_database),
) -> Optional[User]:
) -> User | None:
authorization: str = request.headers.get("Authorization")
if not authorization:
return None
@ -77,6 +87,9 @@ async def get_current_user_optional(
return None
payload = decode_token(token)
if payload.get("type") == "refresh":
return None
user_id: str = payload.get("sub")
if user_id is None:
@ -94,21 +107,28 @@ async def get_current_user_optional(
async def get_accessible_project_ids(
user: User,
db: AsyncIOMotorDatabase,
) -> Optional[list[str]]:
) -> list[str] | None:
"""
Returns project IDs the user may access, or None meaning "see everything".
- Staff / Admin → None (unrestricted)
- Otherwise → projects in orgs where the user holds any membership
(falls back to legacy pm_client_ids/team lookups if no memberships found)
- Admin → None (unrestricted)
- Staff (REVIEWER/LINGUIST/PRODUCTION) → scoped by team membership;
if not yet assigned to any team, falls back to None (see all)
so existing staff aren't locked out before teams are configured
- PM → projects in accessible orgs/clients (pm_client_ids legacy)
- CLIENT → projects in orgs where the user holds any membership
"""
if user.role in STAFF_ROLES:
return None
# Primary path: use memberships collection (Phase 3 SaaS)
user_id = str(user.id)
membership_cursor = db.memberships.find({"user_id": user_id}, {"organization_id": 1})
org_ids = [doc["organization_id"] async for doc in membership_cursor]
# Primary path: use Redis-cached memberships (60s TTL, same cache as authz.py)
from .authz import (
_cached_memberships, # local import to avoid circular dep at module level
)
memberships_map = await _cached_memberships(user_id, db)
org_ids = list(memberships_map.keys())
if org_ids:
projects = await db.projects.find(
@ -117,29 +137,98 @@ async def get_accessible_project_ids(
).to_list(None)
return [str(p["_id"]) for p in projects]
# Legacy fallback (pre-backfill) — keeps the app working before migration runs
if user.role == UserRole.PROJECT_MANAGER:
client_ids = user.pm_client_ids or []
if not client_ids:
return []
# Legacy fallback: team membership (used by REVIEWER/LINGUIST/PRODUCTION and legacy CLIENT)
teams = await db.teams.find(
{"member_user_ids": user_id},
{"client_id": 1},
).to_list(None)
client_ids = list({t["client_id"] for t in teams})
if client_ids:
projects = await db.projects.find(
{"client_id": {"$in": client_ids}, "is_active": True},
{"_id": 1},
).to_list(None)
return [str(p["_id"]) for p in projects]
teams = await db.teams.find(
{"member_user_ids": user_id},
{"client_id": 1},
).to_list(None)
client_ids = list({t["client_id"] for t in teams})
if not client_ids:
return []
projects = await db.projects.find(
{"client_id": {"$in": client_ids}, "is_active": True},
{"_id": 1},
).to_list(None)
return [str(p["_id"]) for p in projects]
# PM legacy: scoped via pm_client_ids
if user.role == UserRole.PROJECT_MANAGER:
pm_client_ids = user.pm_client_ids or []
if not pm_client_ids:
return []
projects = await db.projects.find(
{"client_id": {"$in": pm_client_ids}, "is_active": True},
{"_id": 1},
).to_list(None)
return [str(p["_id"]) for p in projects]
# Staff with no team assignments → unrestricted until teams are configured
if user.role in {UserRole.REVIEWER, UserRole.LINGUIST, UserRole.PRODUCTION}:
return None
# CLIENT with no memberships and no teams → show nothing
return []
async def get_user_org_ids(user: User, db: AsyncIOMotorDatabase) -> list[str] | None:
"""Return org IDs the user belongs to, or None meaning unrestricted (ADMIN).
Priority: memberships → pm_client_ids (PM legacy) → team.member_user_ids (staff legacy)
"""
if user.role == UserRole.ADMIN:
return None
user_id = str(user.id)
# Primary: Membership collection
org_ids: list[str] = []
async for m in db.memberships.find({"user_id": user_id}, {"organization_id": 1}):
if m.get("organization_id"):
org_ids.append(str(m["organization_id"]))
if org_ids:
return org_ids
# PM legacy: pm_client_ids
if user.role == UserRole.PROJECT_MANAGER:
return list(user.pm_client_ids or [])
# Staff legacy: team.member_user_ids
teams = await db.teams.find({"member_user_ids": user_id}, {"client_id": 1}).to_list(None)
if teams:
return [str(t["client_id"]) for t in teams if t.get("client_id")]
return []
async def assert_job_in_user_org(job: dict, user: User, db: AsyncIOMotorDatabase) -> None:
"""Raise 404 (not 403) when user cannot access this job — avoids information disclosure."""
if user.role == UserRole.ADMIN:
return
org_ids = await get_user_org_ids(user, db)
if org_ids is None:
return # unrestricted
job_org = job.get("organization_id")
if job_org:
if job_org in org_ids:
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
# No organization_id — try project fallback
project_id = job.get("project_id")
if project_id:
project = await db.projects.find_one({"_id": project_id}, {"client_id": 1})
if project and project.get("client_id") in org_ids:
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
# Legacy: client_id == creator user_id
job_client_id = job.get("client_id")
if job_client_id and job_client_id == str(user.id):
return
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Job not found")
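The decision order in `assert_job_in_user_org` can be restated as a pure function, which makes the fallbacks easier to test without a database. This is a sketch under assumptions: `job_visible` and the `project_org` lookup callable are hypothetical, and the caller is expected to turn a `False` result into the same 404 the route raises.

```python
from typing import Callable, Optional


def job_visible(
    job: dict,
    user_id: str,
    org_ids: Optional[list],
    project_org: Callable[[str], Optional[str]],
) -> bool:
    """Mirror the check order: org id, then project's client, then legacy creator."""
    if org_ids is None:
        return True  # unrestricted (admin)
    job_org = job.get("organization_id")
    if job_org:
        return job_org in org_ids
    project_id = job.get("project_id")
    if project_id:
        # A missing project yields None, which is never in org_ids.
        return project_org(project_id) in org_ids
    # Legacy job with neither org nor project: only the creator may see it.
    return job.get("client_id") == user_id
```

Note that a job with a `project_id` never reaches the legacy creator check, matching the early `raise` in the function above.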
def require_pm_for_client(client_id_param: str = "client_id"):

View file

@ -1,10 +1,6 @@
"""Enhanced configuration system with Secret Manager integration."""
import os
import asyncio
from typing import Dict, Optional, Any
from functools import lru_cache
from pydantic_settings import BaseSettings
from .config import Settings as BaseConfig
from .logging import get_logger
@ -14,41 +10,40 @@ logger = get_logger(__name__)
class SecretsConfig(BaseConfig):
"""Enhanced configuration that loads secrets from GCP Secret Manager."""
def __init__(self, **kwargs):
# Initialize with base configuration first
super().__init__(**kwargs)
# Flag to track if secrets have been loaded
self._secrets_loaded = False
self._secret_values: Dict[str, str] = {}
self._secret_values: dict[str, str] = {}
async def load_secrets(self) -> None:
"""Load secrets from Secret Manager asynchronously."""
if self._secrets_loaded:
return
try:
# Only import here to avoid circular imports
from app.services.secrets_manager import secrets_manager
# Define which config fields should be loaded from secrets
secret_mappings = {
# Config field -> Secret Manager name
"jwt_secret": "jwt-secret",
"jwt_refresh_secret": "jwt-refresh-secret",
"jwt_refresh_secret": "jwt-refresh-secret",
"mongodb_uri": "mongodb-url",
"redis_url": "redis-url",
"gemini_api_key": "gemini-api-key",
"sendgrid_api_key": "sendgrid-api-key",
"elevenlabs_api_key": "elevenlabs-api-key",
"sentry_dsn": "sentry-dsn"
}
# Get all secrets in batch
secret_names = list(secret_mappings.values())
retrieved_secrets = await secrets_manager.get_secrets_batch(secret_names)
# Map secrets back to config fields
for config_field, secret_name in secret_mappings.items():
if secret_name in retrieved_secrets:
@ -58,50 +53,50 @@ class SecretsConfig(BaseConfig):
logger.debug(f"Loaded secret for {config_field}")
else:
logger.warning(f"Secret {secret_name} not available, using environment/default")
self._secrets_loaded = True
logger.info(f"Successfully loaded {len(retrieved_secrets)} secrets from Secret Manager")
except Exception as e:
logger.warning(f"Failed to load secrets from Secret Manager: {e}")
logger.warning("Falling back to environment variables")
self._secrets_loaded = True # Mark as loaded to prevent retries
def get_secret_value(self, field_name: str) -> Optional[str]:
def get_secret_value(self, field_name: str) -> str | None:
"""Get a secret value if it was loaded from Secret Manager."""
return self._secret_values.get(field_name)
async def refresh_secrets(self) -> None:
"""Force refresh secrets from Secret Manager."""
self._secrets_loaded = False
self._secret_values.clear()
# Clear the secrets manager cache
from app.services.secrets_manager import secrets_manager
secrets_manager.clear_cache()
await self.load_secrets()
@property
def is_production(self) -> bool:
"""Check if running in production environment."""
return self.app_env == "prod"
@property
def is_development(self) -> bool:
"""Check if running in development environment."""
return self.app_env == "dev"
@property
def google_cloud_project(self) -> str:
"""Get Google Cloud Project ID."""
return self.gcp_project_id
@property
def jwt_refresh_secret(self) -> str:
"""Get JWT refresh secret (fallback to main secret if not set)."""
return getattr(self, '_jwt_refresh_secret', self.jwt_secret)
@jwt_refresh_secret.setter
def jwt_refresh_secret(self, value: str) -> None:
"""Set JWT refresh secret."""
@ -109,37 +104,37 @@ class SecretsConfig(BaseConfig):
# Global configuration instance
_config_instance: Optional[SecretsConfig] = None
_config_instance: SecretsConfig | None = None
async def initialize_config() -> SecretsConfig:
"""Initialize configuration with secrets loading."""
global _config_instance
if _config_instance is None:
_config_instance = SecretsConfig()
await _config_instance.load_secrets()
return _config_instance
def get_settings() -> SecretsConfig:
"""Get settings instance (synchronous)."""
global _config_instance
if _config_instance is None:
# Initialize without secrets for backwards compatibility
_config_instance = SecretsConfig()
logger.warning("Settings accessed before async initialization - secrets not loaded")
return _config_instance
@lru_cache()
@lru_cache
def get_settings_cached() -> SecretsConfig:
"""Get cached settings instance."""
return get_settings()
# Backwards compatibility
settings = get_settings()
settings = get_settings()

View file

@ -1,5 +1,5 @@
from datetime import datetime, timedelta
from typing import Any, Optional, Union
from typing import Any
from fastapi import HTTPException, status
from jose import JWTError, jwt
@ -11,20 +11,24 @@ pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
def create_access_token(
subject: Union[str, Any], expires_delta: Optional[timedelta] = None
subject: str | Any,
expires_delta: timedelta | None = None,
org_ids: list[str] | None = None,
) -> str:
if expires_delta:
expire = datetime.utcnow() + expires_delta
else:
expire = datetime.utcnow() + timedelta(minutes=settings.jwt_access_ttl_min)
to_encode = {"exp": expire, "sub": str(subject)}
to_encode: dict[str, Any] = {"exp": expire, "sub": str(subject), "v": 2}
if org_ids:
to_encode["org_ids"] = org_ids
encoded_jwt = jwt.encode(to_encode, settings.jwt_secret, algorithm=settings.jwt_alg)
return encoded_jwt
def create_refresh_token(
subject: Union[str, Any], expires_delta: Optional[timedelta] = None
subject: str | Any, expires_delta: timedelta | None = None
) -> str:
if expires_delta:
expire = datetime.utcnow() + expires_delta
@ -37,6 +41,8 @@ def create_refresh_token(
def verify_password(plain_password: str, hashed_password: str) -> bool:
if not hashed_password:
return False
return pwd_context.verify(plain_password, hashed_password)
@ -52,4 +58,4 @@ def decode_token(token: str) -> dict[str, Any]:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials",
)
) from None
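The `from None` added to `decode_token` changes how the raised exception is chained. The sketch below shows the effect with standard-library exceptions only; `CredentialsError`, `decode`, and `try_decode` are hypothetical stand-ins and do not reproduce the JWT handling.

```python
class CredentialsError(Exception):
    """Hypothetical stand-in for the HTTP 401 raised in decode_token."""


def decode(token: str) -> dict:
    """Toy parser: expects 'prefix:subject'; anything else is rejected."""
    try:
        return {"sub": token.split(":", 1)[1]}  # IndexError when ':' is missing
    except IndexError:
        # `from None` hides the low-level error from the traceback, so the
        # report does not print "During handling of the above exception...".
        raise CredentialsError("Could not validate credentials") from None


def try_decode(token: str):
    """Return the raised CredentialsError, or None when decoding succeeds."""
    try:
        decode(token)
    except CredentialsError as exc:
        return exc
    return None
```

The original `IndexError` is still recorded on `__context__`, but `__suppress_context__` is set, so default traceback printing omits it.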

View file

@ -34,7 +34,13 @@ async def seed_default_admin(db) -> None:
print(f"✅ Default admin {DEFAULT_ADMIN_EMAIL} already exists")
return
password = os.environ.get("DEFAULT_ADMIN_PASSWORD", "ChangeMe123!")
password = os.environ.get("DEFAULT_ADMIN_PASSWORD")
if not password:
print(
"⚠️ DEFAULT_ADMIN_PASSWORD not set — skipping default admin creation. "
"Set this env var and restart to create the admin account."
)
return
user_doc = {
"_id": str(ObjectId()),
"email": DEFAULT_ADMIN_EMAIL,

backend/app/lib/locales.py (new file, 245 lines)

@ -0,0 +1,245 @@
"""
Central locale registry.
Provides a single source of truth for BCP-47 codes, display names,
and Gemini-friendly labels used throughout the translation/TTS pipeline.
Convention: BCP-47 with hyphen separator (fr-FR, en-GB, pt-BR).
xlsx underscore format (fr_fr, en_gb) is normalized at import time.
Bare language-only codes (fr, en) remain valid for legacy compat.
"""
from __future__ import annotations
from dataclasses import dataclass
@dataclass(frozen=True)
class Locale:
code: str # canonical BCP-47 (e.g. "fr-FR")
display_name: str # human-readable (e.g. "French (France)")
gemini_label: str # what to pass to Gemini prompts (e.g. "French (France)")
tts_lang: str # BCP-47 for TTS API (may differ, e.g. es-MX → es-US)
preview_sample: str # sample sentence for TTS preview
# Master locale registry. Bare language codes (legacy) + explicit region variants.
_REGISTRY: dict[str, Locale] = {loc.code: loc for loc in [
# ── English ──────────────────────────────────────────────────────────────
Locale("en", "English", "English", "en-US",
"This is a preview of the audio description voice."),
Locale("en-US", "English (US)", "English (United States)", "en-US",
"This is a preview of the audio description voice."),
Locale("en-GB", "English (UK)", "English (United Kingdom)", "en-GB",
"This is a preview of the audio description voice."),
Locale("en-CA", "English (Canada)", "English (Canada)", "en-CA",
"This is a preview of the audio description voice."),
# ── Spanish ──────────────────────────────────────────────────────────────
Locale("es", "Spanish", "Spanish", "es-US",
"Esta es una vista previa de la voz de audiodescripción."),
Locale("es-ES", "Spanish (Spain)", "Spanish (Spain)", "es-ES",
"Esta es una vista previa de la voz de audiodescripción."),
Locale("es-MX", "Spanish (Mexico)", "Spanish (Mexico)", "es-US",
"Esta es una vista previa de la voz de audiodescripción."),
Locale("es-419", "Spanish (Latin America)", "Spanish (Latin America)", "es-US",
"Esta es una vista previa de la voz de audiodescripción."),
# ── French ───────────────────────────────────────────────────────────────
Locale("fr", "French", "French", "fr-FR",
"Ceci est un aperçu de la voix de l'audiodescription."),
Locale("fr-FR", "French (France)", "French (France)", "fr-FR",
"Ceci est un aperçu de la voix de l'audiodescription."),
Locale("fr-CA", "French (Canada)", "French (Canada)", "fr-CA",
"Ceci est un aperçu de la voix de l'audiodescription."),
# ── German ───────────────────────────────────────────────────────────────
Locale("de", "German", "German", "de-DE",
"Dies ist eine Vorschau der Audiodeskriptionsstimme."),
Locale("de-DE", "German (Germany)", "German (Germany)", "de-DE",
"Dies ist eine Vorschau der Audiodeskriptionsstimme."),
# ── Italian ──────────────────────────────────────────────────────────────
Locale("it", "Italian", "Italian", "it-IT",
"Questa è un'anteprima della voce dell'audiodescrizione."),
Locale("it-IT", "Italian (Italy)", "Italian (Italy)", "it-IT",
"Questa è un'anteprima della voce dell'audiodescrizione."),
# ── Portuguese ───────────────────────────────────────────────────────────
Locale("pt", "Portuguese", "Portuguese", "pt-BR",
"Esta é uma prévia da voz da audiodescrição."),
Locale("pt-BR", "Portuguese (Brazil)", "Portuguese (Brazil)", "pt-BR",
"Esta é uma prévia da voz da audiodescrição."),
Locale("pt-PT", "Portuguese (Portugal)", "Portuguese (Portugal)", "pt-PT",
"Esta é uma pré-visualização da voz da audiodescrição."),
# ── Japanese ─────────────────────────────────────────────────────────────
Locale("ja", "Japanese", "Japanese", "ja-JP",
"これは音声解説の声のプレビューです。"),
Locale("ja-JP", "Japanese (Japan)", "Japanese (Japan)", "ja-JP",
"これは音声解説の声のプレビューです。"),
# ── Korean ───────────────────────────────────────────────────────────────
Locale("ko", "Korean", "Korean", "ko-KR",
"이것은 오디오 설명 음성의 미리보기입니다."),
Locale("ko-KR", "Korean (Korea)", "Korean (South Korea)", "ko-KR",
"이것은 오디오 설명 음성의 미리보기입니다."),
# ── Arabic ───────────────────────────────────────────────────────────────
Locale("ar", "Arabic", "Arabic", "ar-EG",
"هذه معاينة لصوت الوصف الصوتي."),
# ── Hindi ────────────────────────────────────────────────────────────────
Locale("hi", "Hindi", "Hindi", "hi-IN",
"यह ऑडियो विवरण आवाज का पूर्वावलोकन है।"),
# ── Indonesian ───────────────────────────────────────────────────────────
Locale("id", "Indonesian", "Indonesian", "id-ID",
"Ini adalah pratinjau suara deskripsi audio."),
Locale("id-ID", "Indonesian (Indonesia)", "Indonesian (Indonesia)", "id-ID",
"Ini adalah pratinjau suara deskripsi audio."),
# ── Dutch ────────────────────────────────────────────────────────────────
Locale("nl", "Dutch", "Dutch", "nl-NL",
"Dit is een voorbeeld van de audiodescriptiestem."),
Locale("nl-NL", "Dutch (Netherlands)", "Dutch (Netherlands)", "nl-NL",
"Dit is een voorbeeld van de audiodescriptiestem."),
# ── Polish ───────────────────────────────────────────────────────────────
Locale("pl", "Polish", "Polish", "pl-PL",
"To jest podgląd głosu audiodeskrypcji."),
Locale("pl-PL", "Polish (Poland)", "Polish (Poland)", "pl-PL",
"To jest podgląd głosu audiodeskrypcji."),
# ── Russian ──────────────────────────────────────────────────────────────
Locale("ru", "Russian", "Russian", "ru-RU",
"Это предварительный просмотр голоса аудиоописания."),
# ── Thai ─────────────────────────────────────────────────────────────────
Locale("th", "Thai", "Thai", "th-TH",
"นี่คือตัวอย่างเสียงบรรยายภาพ"),
# ── Turkish ──────────────────────────────────────────────────────────────
Locale("tr", "Turkish", "Turkish", "tr-TR",
"Bu, sesli betimleme sesinin bir önizlemesidir."),
Locale("tr-TR", "Turkish (Turkey)", "Turkish (Turkey)", "tr-TR",
"Bu, sesli betimleme sesinin bir önizlemesidir."),
# ── Vietnamese ───────────────────────────────────────────────────────────
Locale("vi", "Vietnamese", "Vietnamese", "vi-VN",
"Đây là bản xem trước giọng mô tả âm thanh."),
# ── Romanian ─────────────────────────────────────────────────────────────
Locale("ro", "Romanian", "Romanian", "ro-RO",
"Aceasta este o previzualizare a vocii descrierii audio."),
# ── Ukrainian ────────────────────────────────────────────────────────────
Locale("uk", "Ukrainian", "Ukrainian", "uk-UA",
"Це попередній перегляд голосу аудіоопису."),
# ── Bengali ──────────────────────────────────────────────────────────────
Locale("bn", "Bengali", "Bengali", "bn-BD",
"এটি অডিও বর্ণনা ভয়েসের একটি প্রিভিউ।"),
# ── Marathi ──────────────────────────────────────────────────────────────
Locale("mr", "Marathi", "Marathi", "mr-IN",
"हे ऑडिओ वर्णन आवाजाचे पूर्वावलोकन आहे."),
# ── Tamil ────────────────────────────────────────────────────────────────
Locale("ta", "Tamil", "Tamil", "ta-IN",
"இது ஆடியோ விளக்க குரலின் முன்னோட்டம்."),
# ── Telugu ───────────────────────────────────────────────────────────────
Locale("te", "Telugu", "Telugu", "te-IN",
"ఇది ఆడియో వివరణ స్వరం యొక్క ప్రివ్యూ."),
# ── Chinese ──────────────────────────────────────────────────────────────
Locale("zh", "Chinese", "Chinese (Simplified)", "zh-CN",
"这是音频描述语音的预览。"),
# ── Czech ────────────────────────────────────────────────────────────────
Locale("cs", "Czech", "Czech", "cs-CZ",
"Toto je náhled hlasu zvukového popisu."),
Locale("cs-CZ", "Czech (Czech Republic)", "Czech (Czech Republic)", "cs-CZ",
"Toto je náhled hlasu zvukového popisu."),
# ── Danish ───────────────────────────────────────────────────────────────
Locale("da", "Danish", "Danish", "da-DK",
"Dette er en forhåndsvisning af lydbeskrivelsesstemmen."),
# ── Finnish ──────────────────────────────────────────────────────────────
Locale("fi", "Finnish", "Finnish", "fi-FI",
"Tämä on äänikuvauksen äänen esikatselu."),
# ── Hungarian ────────────────────────────────────────────────────────────
Locale("hu", "Hungarian", "Hungarian", "hu-HU",
"Ez a hangos leírás hangjának előnézete."),
# ── Norwegian ────────────────────────────────────────────────────────────
Locale("no", "Norwegian", "Norwegian", "nb-NO",
"Dette er en forhåndsvisning av lydbeskrivelsesstemmen."),
# ── Slovak ───────────────────────────────────────────────────────────────
Locale("sk", "Slovak", "Slovak", "sk-SK",
"Toto je náhľad hlasu zvukového popisu."),
# ── Swedish ──────────────────────────────────────────────────────────────
Locale("sv", "Swedish", "Swedish", "sv-SE",
"Det här är en förhandsgranskning av ljudbeskrivningsrösten."),
]}
# xlsx uses underscores; normalize to BCP-47 hyphen form
_XLSX_ALIASES: dict[str, str] = {
code.replace("-", "_").lower(): code
for code in _REGISTRY
if "-" in code
}
# a few extra mappings for edge cases
_XLSX_ALIASES.update({
"id": "id", # Indonesian column header is just "id" (no region)
})
def normalize_code(code: str) -> str:
"""
Normalize an arbitrary locale code to the canonical BCP-47 form used in this registry.
Handles:
- xlsx underscore form: "fr_fr" → "fr-FR"
- Bare language code: "fr" → "fr" (passthrough, legacy compat)
- Already canonical: "fr-FR" → "fr-FR"
"""
if not code:
return code
lowered = code.strip().lower()
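# e.g. "fr_fr" -> check alias table; unknown codes fall through to the hyphen branch
# (the previous fallback upper-cased the whole code, yielding "FR-FR")
if "_" in lowered:
if lowered in _XLSX_ALIASES:
return _XLSX_ALIASES[lowered]
code = lowered.replace("_", "-")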
# Already hyphen form — canonicalise case
if "-" in code:
parts = code.split("-", 1)
return f"{parts[0].lower()}-{parts[1].upper()}"
# Bare language code — return as-is (legacy)
return lowered
def get(code: str) -> Locale | None:
"""Return Locale for the given code, or None if unknown."""
canonical = normalize_code(code)
return _REGISTRY.get(canonical) or _REGISTRY.get(canonical.split("-")[0])
def get_display_name(code: str) -> str:
"""Human-readable display name, e.g. 'French (Canada)'."""
locale = get(code)
return locale.display_name if locale else code
def get_gemini_label(code: str) -> str:
"""
Label to use inside Gemini prompts, e.g. 'French (Canada)'.
Gemini models respond more reliably to human-readable language names
than to bare BCP-47 codes when used inside instruction prompts.
"""
locale = get(code)
return locale.gemini_label if locale else code
def get_tts_lang(code: str) -> str:
"""BCP-47 code for the TTS API (may differ from canonical, e.g. es-MX → es-US)."""
locale = get(code)
return locale.tts_lang if locale else code
def get_preview_sample(code: str) -> str:
"""Language-appropriate TTS preview sentence."""
locale = get(code)
if locale:
return locale.preview_sample
# fallback: try parent language then English
parent = get(code.split("-")[0]) if "-" in code else None
if parent:
return parent.preview_sample
return "This is a preview of the audio description voice."
def all_codes() -> list[str]:
"""Return all registered locale codes, sorted."""
return sorted(_REGISTRY.keys())
def all_display_map() -> dict[str, str]:
"""Return {code: display_name} for all registered locales."""
return {code: locale.display_name for code, locale in _REGISTRY.items()}
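
The lookup helpers above resolve a code in two steps: canonicalise it, then fall back to the parent language when the exact locale is not registered. A minimal sketch of that chain, assuming a two-entry stand-in registry (`_DEMO_REGISTRY`, the trimmed-down `Locale` tuple, and the `normalize`/`lookup` names are hypothetical, not the real module):

```python
from __future__ import annotations

from typing import NamedTuple


class Locale(NamedTuple):
    """Simplified stand-in for the registry's Locale record (assumption)."""
    code: str
    display_name: str
    tts_lang: str


# Hypothetical two-entry registry, just enough to show the fallback chain.
_DEMO_REGISTRY = {
    "fr": Locale("fr", "French", "fr-FR"),
    "fr-CA": Locale("fr-CA", "French (Canada)", "fr-CA"),
}


def normalize(code: str) -> str:
    """Canonicalise 'fr_ca' / 'FR-ca' to 'fr-CA'; bare codes are lowercased."""
    cleaned = code.strip().replace("_", "-")
    lang, sep, region = cleaned.partition("-")
    return f"{lang.lower()}-{region.upper()}" if sep else lang.lower()


def lookup(code: str) -> Locale | None:
    """Exact match first, then fall back to the parent language."""
    canonical = normalize(code)
    return _DEMO_REGISTRY.get(canonical) or _DEMO_REGISTRY.get(canonical.split("-")[0])


print(lookup("fr_ca").display_name)   # French (Canada)
print(lookup("fr-BE").display_name)   # French (parent-language fallback)
print(lookup("xx"))                   # None
```

Because the parent fallback lives in the lookup itself, callers such as a display-name or TTS-language getter only need to handle the `None` case.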


@ -8,6 +8,7 @@ class VTTCue:
end_time: float # seconds
text: str
identifier: str | None = None
settings: str = ""
class VTTParser:
@ -37,10 +38,11 @@ class VTTParser:
# Parse timing line
if " --> " in line:
timing_match = re.match(r'([\d:.,]+)\s+-->\s+([\d:.,]+)', line)
timing_match = re.match(r'([\d:.,]+)\s+-->\s+([\d:.,]+)\s*(.*)', line)
if timing_match:
start_time = VTTParser._parse_timestamp(timing_match.group(1))
end_time = VTTParser._parse_timestamp(timing_match.group(2))
settings = timing_match.group(3).strip()
# Collect text lines until empty line or next cue
i += 1
@ -49,13 +51,13 @@ class VTTParser:
text_lines.append(lines[i].strip())
i += 1
if text_lines:
cues.append(VTTCue(
start_time=start_time,
end_time=end_time,
text="\n".join(text_lines),
identifier=identifier
))
cues.append(VTTCue(
start_time=start_time,
end_time=end_time,
text="\n".join(text_lines),
identifier=identifier,
settings=settings,
))
else:
i += 1
@ -71,16 +73,19 @@ class VTTParser:
if cue.identifier:
lines.append(cue.identifier)
# Add timing line
# Add timing line (preserve cue settings like line:0%)
start_timestamp = VTTParser._format_timestamp(cue.start_time)
end_timestamp = VTTParser._format_timestamp(cue.end_time)
lines.append(f"{start_timestamp} --> {end_timestamp}")
timing_line = f"{start_timestamp} --> {end_timestamp}"
if cue.settings:
timing_line += f" {cue.settings}"
lines.append(timing_line)
# Add text (can be multi-line)
lines.append(cue.text)
lines.append("") # Empty line between cues
return "\n".join(lines)
return "\n".join(lines) + "\n"
@staticmethod
def _parse_timestamp(timestamp: str) -> float:
@ -121,7 +126,7 @@ class VTTParser:
secs = seconds % 60
whole_secs = int(secs)
milliseconds = int((secs - whole_secs) * 1000)
milliseconds = round((secs - whole_secs) * 1000)
return f"{hours:02d}:{minutes:02d}:{whole_secs:02d}.{milliseconds:03d}"
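
One edge worth checking after the switch to `round`: a fractional part of 0.9995 s or more rounds to 1000 ms, which no longer fits the three-digit field. A minimal sketch of one way around that, rounding the total millisecond count once and carrying upward (`format_timestamp` here is a hypothetical standalone function, not the `VTTParser._format_timestamp` in this diff):

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, carrying rounded milliseconds upward."""
    total_ms = round(seconds * 1000)          # round once, on the whole value
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


print(format_timestamp(1.9996))   # 00:00:02.000
print(format_timestamp(3661.5))   # 01:01:01.500
```

Working in integer milliseconds keeps the carry logic in `divmod` rather than in float arithmetic.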
@ -148,6 +153,22 @@ class VTTEditor:
return VTTParser.build(cues)
@staticmethod
def assert_cue_alignment(en_vtt: str, target_vtt: str, lang: str) -> None:
"""Raise ValueError if target VTT cue count or timestamps diverge from EN master."""
en_cues = VTTParser.parse(en_vtt)
tgt_cues = VTTParser.parse(target_vtt)
if len(tgt_cues) != len(en_cues):
raise ValueError(
f"Cue count mismatch for {lang}: EN has {len(en_cues)}, target has {len(tgt_cues)}"
)
for i, (en, tgt) in enumerate(zip(en_cues, tgt_cues, strict=True)):
if en.start_time != tgt.start_time or en.end_time != tgt.end_time:
raise ValueError(
f"Timestamp mismatch for {lang} cue {i}: "
f"EN {en.start_time}-->{en.end_time}, target {tgt.start_time}-->{tgt.end_time}"
)
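
`assert_cue_alignment` compares the parsed floats with `!=`, while the translated-cue validation elsewhere in this file tolerates a 0.001 s difference. A sketch of the tolerance-based variant on plain `(start, end)` tuples, assuming the cues are already parsed (the function name and tuple shape are illustrative, not part of `VTTEditor`):

```python
def check_alignment(
    master: list[tuple[float, float]],
    target: list[tuple[float, float]],
    tolerance: float = 0.001,
) -> list[str]:
    """Return human-readable mismatches; an empty list means the cues align."""
    if len(master) != len(target):
        return [f"cue count mismatch: master {len(master)}, target {len(target)}"]
    errors = []
    for i, ((m_start, m_end), (t_start, t_end)) in enumerate(zip(master, target)):
        if abs(m_start - t_start) > tolerance or abs(m_end - t_end) > tolerance:
            errors.append(f"cue {i}: {m_start}-->{m_end} vs {t_start}-->{t_end}")
    return errors


master = [(0.0, 1.5), (2.0, 3.25)]
print(check_alignment(master, [(0.0, 1.5), (2.0, 3.2504)]))  # [] (within tolerance)
print(check_alignment(master, [(0.0, 1.5), (2.1, 3.25)]))    # one entry, for cue 1
```

Whether strict equality or a tolerance is right depends on whether the target VTT has been through a format/parse round trip; the sketch only shows what the tolerant check would look like.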
@staticmethod
def update_cue_text(vtt_content: str, cue_index: int, new_text: str) -> str:
"""Update text for a specific cue by index"""
@ -186,6 +207,20 @@ class VTTEditor:
return len(errors) == 0, errors
@staticmethod
def fix_overlapping_cues(vtt_content: str) -> str:
"""Trim end_time of each cue so it does not overlap the next cue's start_time."""
cues = VTTParser.parse(vtt_content)
for i in range(1, len(cues)):
if cues[i].start_time < cues[i - 1].end_time:
# Clamp previous cue end to 1ms before next cue start
new_end = cues[i].start_time - 0.001
# Never let end_time go at or below start_time
if new_end <= cues[i - 1].start_time:
new_end = cues[i - 1].start_time + 0.001
cues[i - 1].end_time = new_end
return VTTParser.build(cues)
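
A worked example of the clamping rule in `fix_overlapping_cues`, reduced to `(start, end)` tuples so it runs without the parser (`trim_overlaps` is an illustrative name, not a method from this diff):

```python
def trim_overlaps(cues: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Clamp each cue's end to 1 ms before the next cue's start when they overlap."""
    fixed = list(cues)
    for i in range(1, len(fixed)):
        prev_start, prev_end = fixed[i - 1]
        next_start = fixed[i][0]
        if next_start < prev_end:
            new_end = next_start - 0.001
            if new_end <= prev_start:          # never collapse the previous cue
                new_end = prev_start + 0.001
            fixed[i - 1] = (prev_start, new_end)
    return fixed


print(trim_overlaps([(0.0, 2.5), (2.0, 4.0)]))   # first cue now ends just before 2.0
print(trim_overlaps([(1.0, 2.0), (1.0, 3.0)]))   # identical starts: guard keeps a 1 ms cue
```

In the second call the guard wins over the clamp, so the first cue ends 1 ms after the second one starts and a small overlap remains. That may be acceptable, but it is worth knowing the function does not guarantee an overlap-free result in that case.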
@staticmethod
def get_cue_count(vtt_content: str) -> int:
"""Get the number of cues in VTT content"""
@ -221,7 +256,7 @@ class VTTEditor:
)
return False, errors
for i, (src, tgt) in enumerate(zip(source_cues, translated_cues)):
for i, (src, tgt) in enumerate(zip(source_cues, translated_cues, strict=False)):
if abs(src.start_time - tgt.start_time) > 0.001:
errors.append(
f"Cue {i + 1}: start time changed "
@ -251,3 +286,33 @@ class VTTEditor:
return VTTParser.build(cues)
# DCMP §6.01 filler patterns per language (whole-word, case-insensitive)
_FILLER_PATTERNS: dict[str, str] = {
"en": r'\b(um+|uh+|ah+|er+|hmm+|you know|i mean|sort of|kind of|basically|literally|honestly|actually|right\?|so yeah)\b',
"es": r'\b(eh+|este|o sea|pues|bueno|o sea que|mmm+)\b',
"fr": r'\b(euh+|beh|ben|donc|quoi|enfin|voilà|genre)\b',
"de": r'\b(äh+|ähm+|halt|ne|also|naja|sozusagen|quasi)\b',
"it": r'\b(ehm+|allora|cioè|tipo|praticamente|insomma|ecco)\b',
"nl": r'\b(eh+|nou|zeg|eigenlijk|gewoon|toch|zo van|hè)\b',
"pt": r'\b(ahn+|hã+|né|sabe|tipo|então|assim)\b',
"pl": r'\b(no|że|bo|znaczy|właśnie|jakby|wiesz)\b',
"uk": r'\b(ну+|ем+|типу|знаєш|значить|власне|от)\b',
"ru": r'\b(ну+|эм+|типа|знаешь|значит|вот|собственно)\b',
}
@staticmethod
def clean_disfluencies(vtt_content: str, lang: str) -> str:
"""Remove filler words and hesitations per DCMP §6.01 for supported languages."""
pattern = VTTEditor._FILLER_PATTERNS.get(lang.split("-")[0].lower())
if not pattern:
return vtt_content
cues = VTTParser.parse(vtt_content)
compiled = re.compile(pattern, re.IGNORECASE)
for cue in cues:
cleaned = compiled.sub("", cue.text)
# Collapse multiple spaces and strip leading/trailing punctuation artifacts
cleaned = re.sub(r'[ \t]{2,}', ' ', cleaned).strip().strip(',').strip()
if cleaned:
cue.text = cleaned
return VTTParser.build(cues)
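
The cleanup in `clean_disfluencies` can be tried on a bare string. The sketch below assumes a shortened English pattern (`FILLERS` and `strip_fillers` are hypothetical names) and mirrors the same substitute, collapse, strip sequence:

```python
import re

# Hypothetical, shortened English pattern for illustration only.
FILLERS = re.compile(r"\b(um+|uh+|you know|i mean)\b", re.IGNORECASE)


def strip_fillers(text: str) -> str:
    """Remove filler words, then tidy the whitespace and commas left behind."""
    cleaned = FILLERS.sub("", text)
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned).strip().strip(",").strip()
    return cleaned or text      # keep the original if nothing would remain


print(strip_fillers("Um, she opens the door"))   # she opens the door
print(strip_fillers("He, you know, waits"))      # He, , waits
```

The second call shows a limit of this approach: only leading and trailing commas are stripped, so a filler removed from the middle of a sentence leaves its surrounding punctuation behind.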


@ -1,48 +1,55 @@
from contextlib import asynccontextmanager
import sentry_sdk
from fastapi import FastAPI, Request, HTTPException
from fastapi import FastAPI, HTTPException, Request
from fastapi.exceptions import RequestValidationError
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from sentry_sdk.integrations.fastapi import FastApiIntegration
from sentry_sdk.integrations.redis import RedisIntegration
from sentry_sdk.integrations.pymongo import PyMongoIntegration
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.fastapi import FastApiIntegration
from sentry_sdk.integrations.pymongo import PyMongoIntegration
from sentry_sdk.integrations.redis import RedisIntegration
from .api.v1.routes_admin import router as admin_router
from .api.v1.routes_admin_production import router as admin_production_router
from .api.v1.routes_auth import router as auth_router
from .api.v1.routes_briefs import router as briefs_router
from .api.v1.routes_clients import router as clients_router
from .api.v1.routes_files import router as files_router
from .api.v1.routes_jobs import router as jobs_router
from .api.v1.routes_glossaries import router as glossaries_router
from .api.v1.routes_invitations import org_router as invitations_org_router
from .api.v1.routes_invitations import router as invitations_router
from .api.v1.routes_jobs import router as jobs_router
from .api.v1.routes_language_qc import router as language_qc_router
from .api.v1.routes_organizations import router as organizations_router
from .api.v1.routes_review_notes import router as review_notes_router
from .api.v1.routes_share import router as share_router
from .api.v1.routes_tts import router as tts_router
from .api.v1.routes_vtt_versions import router as vtt_versions_router
from .api.v1.routes_websockets import router as websockets_router
from .services.websocket import connection_manager
from .core.config import settings
from .core.secrets_config import initialize_config
from .core.database import close_mongo_connection, connect_to_mongo, create_indexes, get_database
from .core.database import (
close_mongo_connection,
connect_to_mongo,
get_database,
)
from .core.logging import setup_logging
from .core.redis import close_redis_connection, connect_to_redis, get_redis_client
from .core.secrets_config import initialize_config
from .core.seed import seed_default_admin
from .middleware import create_rate_limit_middleware, create_validation_middleware
from .services.language_qc import seed_language_qc_for_job
from .services.websocket import connection_manager
from .telemetry import (
app_metrics,
instrument_dependencies,
instrument_fastapi_app,
setup_tracing
)
from .services.websocket import connection_manager
@asynccontextmanager
async def lifespan(app: FastAPI):
# Startup
setup_logging()
# Initialize configuration with secrets
if settings.app_env == "prod":
try:
@ -51,7 +58,7 @@ async def lifespan(app: FastAPI):
except Exception as e:
print(f"⚠️ Failed to load secrets from Secret Manager: {e}")
print("⚠️ Falling back to environment variables")
# Initialize Sentry error tracking
if settings.sentry_dsn and settings.sentry_dsn.startswith(('http', 'https')):
sentry_sdk.init(
@ -68,15 +75,15 @@ async def lifespan(app: FastAPI):
attach_stacktrace=True,
send_default_pii=False, # Don't send PII for privacy
)
# Initialize telemetry (disabled for local development)
# setup_tracing("accessible-video-api", "1.0.0")
# instrument_dependencies()
# Start Prometheus metrics server in production
if settings.app_env == "prod":
app_metrics.start_prometheus_server(port=8001)
await connect_to_mongo()
await connect_to_redis()
@ -86,20 +93,37 @@ async def lifespan(app: FastAPI):
except Exception as e:
print(f"⚠️ Could not seed default admin: {e}")
# await create_indexes() # Temporarily disabled for debugging
# T-16: Seed language_qc only for jobs that still lack it (idempotent, skips on subsequent starts)
try:
db = await get_database()
pending_count = await db.jobs.count_documents({"language_qc": {"$exists": False}})
if pending_count > 0:
async for job_doc in db.jobs.find(
{"language_qc": {"$exists": False}},
{"_id": 1, "status": 1, "outputs": 1, "source": 1, "review": 1, "updated_at": 1, "requested_outputs": 1},
):
await seed_language_qc_for_job(db, job_doc)
print(f"✅ language_qc migration complete ({pending_count} jobs seeded)")
except Exception as e:
print(f"⚠️ language_qc migration failed: {e}")
# Start WebSocket connection manager
await connection_manager.start()
# Initialize middleware with Redis client
redis_client = get_redis_client()
if redis_client:
rate_limit_middleware = await create_rate_limit_middleware(redis_client)
validation_middleware = await create_validation_middleware()
# Store middleware in app state for access
app.state.rate_limit_middleware = rate_limit_middleware
app.state.validation_middleware = validation_middleware
elif settings.redis_url:
# T-13: REDIS_URL is configured but client unavailable — rate limiting is disabled
print(f"⚠️ Redis configured at {settings.redis_url!r} but connection failed — rate limiting disabled")
yield
# Shutdown
await connection_manager.stop()
@ -131,18 +155,17 @@ async def cors_error_handler(request, call_next):
try:
response = await call_next(request)
except Exception as e:
# LOG THE EXCEPTION BEFORE HANDLING IT
print(f"🚨 EXCEPTION IN CORS MIDDLEWARE: {e}")
import traceback
print(f"Traceback:\n{traceback.format_exc()}")
# Handle any unhandled exceptions and add CORS headers
from .core.logging import get_logger as _get_logger
_get_logger(__name__).exception("🚨 CORS middleware caught: %s\n%s", e, traceback.format_exc())
from fastapi.responses import JSONResponse
response = JSONResponse(
status_code=500,
content={"detail": "Internal server error", "error": str(e)}
content={"detail": "Internal server error"},
)
# Always add CORS headers for allowed origins
origin = request.headers.get("origin")
if origin and origin in settings.cors_origins_list:
@ -163,7 +186,7 @@ async def http_exception_handler(request: Request, exc: HTTPException):
status_code=exc.status_code,
content={"detail": exc.detail}
)
# Add CORS headers
origin = request.headers.get("origin")
if origin and origin in settings.cors_origins_list:
@ -198,21 +221,18 @@ async def validation_exception_handler(request: Request, exc: RequestValidationE
async def general_exception_handler(request: Request, exc: Exception):
"""Handle all uncaught exceptions with logging"""
import traceback
from .core.logging import get_logger
logger = get_logger(__name__)
logger.error(f"Unhandled exception in {request.method} {request.url.path}: {exc}")
logger.error(f"Exception type: {type(exc).__name__}")
logger.error(f"Traceback: {traceback.format_exc()}")
# Also print to stdout for immediate visibility
print(f"🚨 UNHANDLED EXCEPTION: {request.method} {request.url.path}")
print(f"Exception: {exc}")
print(f"Traceback:\n{traceback.format_exc()}")
logger.exception(
"🚨 Unhandled %s %s: %s\n%s",
request.method, request.url.path, exc, traceback.format_exc(),
)
response = JSONResponse(
status_code=500,
content={"detail": "Internal server error", "error": str(exc)}
content={"detail": "Internal server error"},
)
# Add CORS headers
@ -227,9 +247,6 @@ async def general_exception_handler(request: Request, exc: Exception):
@app.middleware("http")
async def rate_limiting_middleware(request, call_next):
"""Apply rate limiting middleware."""
# Skip middleware for auth endpoints during debugging
if request.url.path in ["/api/v1/auth/login", "/api/v1/auth/refresh"]:
return await call_next(request)
if hasattr(app.state, 'rate_limit_middleware'):
return await app.state.rate_limit_middleware(request, call_next)
return await call_next(request)
@ -237,11 +254,7 @@ async def rate_limiting_middleware(request, call_next):
@app.middleware("http")
async def validation_middleware(request, call_next):
"""Apply request validation middleware."""
# TEMPORARILY DISABLED FOR DEBUGGING
return await call_next(request)
# Skip middleware for auth endpoints during debugging
if request.url.path in ["/api/v1/auth/login", "/api/v1/auth/refresh"]:
if request.url.path in ["/health", "/metrics", "/api/v1/auth/login", "/api/v1/auth/refresh"]:
return await call_next(request)
if hasattr(app.state, 'validation_middleware'):
return await app.state.validation_middleware(request, call_next)
@ -259,54 +272,28 @@ app.include_router(invitations_router, prefix="/api/v1")
app.include_router(files_router, prefix="/api/v1")
app.include_router(jobs_router, prefix="/api/v1")
app.include_router(review_notes_router, prefix="/api/v1")
app.include_router(vtt_versions_router, prefix="/api/v1")
app.include_router(language_qc_router, prefix="/api/v1")
app.include_router(glossaries_router, prefix="/api/v1")
app.include_router(tts_router, prefix="/api/v1")
app.include_router(admin_router, prefix="/api/v1")
app.include_router(admin_production_router, prefix="/api/v1")
app.include_router(briefs_router, prefix="/api/v1")
app.include_router(share_router, prefix="/api/v1")
app.include_router(websockets_router, prefix="/api/v1")
@app.on_event("startup")
async def startup_event():
"""Initialize services on startup"""
logger.info("🚀 Starting up FastAPI application...")
# Start WebSocket connection manager
try:
await connection_manager.start()
logger.info("✅ WebSocket connection manager started successfully")
except Exception as e:
logger.error(f"❌ Failed to start WebSocket connection manager: {e}")
raise
@app.on_event("shutdown")
async def shutdown_event():
"""Cleanup services on shutdown"""
logger.info("🛑 Shutting down FastAPI application...")
# Stop WebSocket connection manager
try:
await connection_manager.stop()
logger.info("✅ WebSocket connection manager stopped successfully")
except Exception as e:
logger.error(f"❌ Error stopping WebSocket connection manager: {e}")
@app.get("/health")
async def health_check():
return {"status": "healthy", "version": "1.0.0"}
@app.get("/debug-test")
async def debug_test():
print("🔥🔥🔥 DEBUG TEST ENDPOINT HIT 🔥🔥🔥")
return {"message": "If you see this, routing works"}
@app.get("/metrics")
async def metrics():
"""Prometheus metrics endpoint"""
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
from fastapi import Response
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
return Response(
content=generate_latest(),
media_type=CONTENT_TYPE_LATEST


@ -1,12 +1,16 @@
"""Middleware package for FastAPI application."""
from .rate_limiting import RateLimitMiddleware, IPWhitelist, create_rate_limit_middleware
from .rate_limiting import (
IPWhitelist,
RateLimitMiddleware,
create_rate_limit_middleware,
)
from .validation import ValidationMiddleware, create_validation_middleware
__all__ = [
"RateLimitMiddleware",
"IPWhitelist",
"IPWhitelist",
"create_rate_limit_middleware",
"ValidationMiddleware",
"create_validation_middleware"
]
]


@ -1,14 +1,10 @@
"""Rate limiting middleware for API endpoints."""
import time
from collections import defaultdict
from typing import Dict, Optional, Tuple
import redis.asyncio as aioredis
from fastapi import HTTPException, Request, status
from fastapi import Request, status
from fastapi.responses import JSONResponse
import json
import asyncio
from datetime import datetime, timedelta
from app.core.config import get_settings
from app.telemetry.metrics import track_rate_limit_metrics
@ -16,50 +12,50 @@ from app.telemetry.metrics import track_rate_limit_metrics
class RateLimiter:
"""Redis-based rate limiter with sliding window algorithm."""
def __init__(self, redis_client: aioredis.Redis):
self.redis = redis_client
async def is_allowed(
self,
key: str,
limit: int,
self,
key: str,
limit: int,
window_seconds: int,
identifier: str = ""
) -> Tuple[bool, Dict[str, int]]:
) -> tuple[bool, dict[str, int]]:
"""
Check if request is allowed under rate limit.
Returns:
Tuple of (is_allowed, rate_limit_info)
"""
now = time.time()
pipeline = self.redis.pipeline()
# Remove expired entries
pipeline.zremrangebyscore(key, 0, now - window_seconds)
# Count current requests in window
pipeline.zcard(key)
# Add current request
pipeline.zadd(key, {str(now): now})
# Set expiry
pipeline.expire(key, window_seconds)
results = await pipeline.execute()
current_requests = results[1]
rate_limit_info = {
"limit": limit,
"remaining": max(0, limit - current_requests),
"reset_time": int(now + window_seconds),
"retry_after": window_seconds if current_requests >= limit else 0
}
# ZCARD runs before ZADD, so current_requests excludes the request being checked
is_allowed = current_requests < limit
# Track metrics
track_rate_limit_metrics(
identifier=identifier,
@ -67,17 +63,17 @@ class RateLimiter:
current_requests=current_requests,
limit=limit
)
return is_allowed, rate_limit_info
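
The sliding-window idea behind `RateLimiter.is_allowed` can be sketched without Redis. The class below is an in-memory stand-in (all names are illustrative); the comments point at the sorted-set command each step corresponds to:

```python
from collections import deque


class SlidingWindowLimiter:
    """In-memory stand-in for the Redis sorted-set limiter (illustrative only)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits: dict[str, deque[float]] = {}

    def is_allowed(self, key: str, now: float) -> bool:
        hits = self.hits.setdefault(key, deque())
        # Drop timestamps that have slid out of the window (ZREMRANGEBYSCORE).
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:       # count taken before recording (ZCARD)
            return False
        hits.append(now)                  # record the accepted request (ZADD)
        return True


limiter = SlidingWindowLimiter(limit=2, window_seconds=10)
print([limiter.is_allowed("ip:203.0.113.7", t) for t in (0, 1, 2, 11)])
# [True, True, False, True]
```

This sketch rejects once `limit` hits are already in the window and does not record rejected requests. Whether rejected requests should count toward the window is a design choice; recording them penalises clients that keep retrying while throttled.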
class RateLimitMiddleware:
"""FastAPI middleware for rate limiting."""
def __init__(self, redis_client: aioredis.Redis):
self.limiter = RateLimiter(redis_client)
self.settings = get_settings()
# Rate limit configurations by endpoint pattern
self.rate_limits = {
# Authentication endpoints
@ -85,93 +81,96 @@ class RateLimitMiddleware:
"POST:/api/v1/auth/register": (3, 3600), # 3 requests per hour
"POST:/api/v1/auth/refresh": (10, 300), # 10 requests per 5 minutes
"POST:/api/v1/auth/forgot-password": (3, 3600), # 3 requests per hour
# File upload endpoints
"POST:/api/v1/files/upload": (10, 3600), # 10 uploads per hour
"POST:/api/v1/jobs": (20, 3600), # 20 job creations per hour
# Job management endpoints
"GET:/api/v1/jobs": (100, 300), # 100 requests per 5 minutes
"PATCH:/api/v1/jobs/*/approve": (50, 3600), # 50 approvals per hour
"PATCH:/api/v1/jobs/*/reject": (50, 3600), # 50 rejections per hour
# VTT editing endpoints
"PATCH:/api/v1/jobs/*/vtt": (100, 3600), # 100 VTT edits per hour
# Admin endpoints (more restrictive)
"GET:/api/v1/admin/*": (50, 300), # 50 requests per 5 minutes
"POST:/api/v1/admin/*": (20, 3600), # 20 admin actions per hour
"PATCH:/api/v1/admin/*": (20, 3600), # 20 admin updates per hour
"DELETE:/api/v1/admin/*": (10, 3600), # 10 admin deletions per hour
}
# Default rate limits
self.default_limits = {
"authenticated": (1000, 3600), # 1000 requests per hour for authenticated users
"anonymous": (100, 3600), # 100 requests per hour for anonymous users
}
def _get_client_identifier(self, request: Request) -> str:
"""Get client identifier for rate limiting."""
# Try to get user ID from JWT token
user = getattr(request.state, 'user', None)
if user:
return f"user:{user.id}"
# Fall back to IP address
forwarded_for = request.headers.get("X-Forwarded-For")
if forwarded_for:
return f"ip:{forwarded_for.split(',')[0].strip()}"
# Only trust X-Forwarded-For when the request arrived via HTTPS (i.e. through
# the Apache/nginx reverse proxy). On plain HTTP (direct connections, local
# dev) the header can be forged, so we fall back to the socket IP.
if request.headers.get("X-Forwarded-Proto") == "https":
forwarded_for = request.headers.get("X-Forwarded-For")
if forwarded_for:
# Take the right-most IP added by the trusted proxy, not client-supplied ones.
return f"ip:{forwarded_for.split(',')[-1].strip()}"
client_ip = request.client.host if request.client else "unknown"
return f"ip:{client_ip}"
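
A small sketch of the IP selection rule described in the comments above, using a plain dict for headers (an assumption: the real `request.headers` is case-insensitive, a dict is not; `client_ip` is an illustrative name):

```python
def client_ip(headers: dict[str, str], socket_ip: str) -> str:
    """Right-most X-Forwarded-For entry when the proxy marks HTTPS, else the socket IP."""
    if headers.get("X-Forwarded-Proto") == "https":
        forwarded = headers.get("X-Forwarded-For", "")
        if forwarded:
            # The last entry is the one appended by the nearest proxy.
            return forwarded.split(",")[-1].strip()
    return socket_ip


via_proxy = {
    "X-Forwarded-Proto": "https",
    "X-Forwarded-For": "198.51.100.9, 203.0.113.7",
}
print(client_ip(via_proxy, "10.0.0.2"))                               # 203.0.113.7
print(client_ip({"X-Forwarded-For": "198.51.100.9"}, "192.0.2.44"))   # 192.0.2.44
```

Note that `X-Forwarded-Proto` is itself a request header, so this check only helps if the reverse proxy overwrites it on every request and the app is not reachable directly; that deployment detail is assumed here, not shown in the diff.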
def _get_endpoint_key(self, request: Request) -> str:
"""Get endpoint pattern for rate limiting."""
method = request.method
path = request.url.path
# Replace job IDs with wildcard for pattern matching
import re
path = re.sub(r'/jobs/[a-f0-9-]+/', '/jobs/*/', path)
path = re.sub(r'/admin/users/[a-f0-9-]+', '/admin/users/*', path)
return f"{method}:{path}"
def _get_rate_limit(self, request: Request) -> Tuple[int, int]:
def _get_rate_limit(self, request: Request) -> tuple[int, int]:
"""Get rate limit for the current request."""
endpoint_key = self._get_endpoint_key(request)
# Check for specific endpoint limits
if endpoint_key in self.rate_limits:
return self.rate_limits[endpoint_key]
# Check for wildcard matches
for pattern, limits in self.rate_limits.items():
if pattern.endswith("*") and endpoint_key.startswith(pattern[:-1]):
return limits
# Use default limits based on authentication
user = getattr(request.state, 'user', None)
if user:
return self.default_limits["authenticated"]
else:
return self.default_limits["anonymous"]
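
How the endpoint key and the wildcard lookup interact is easier to see on a cut-down table. The sketch assumes two entries copied from the configuration above and a single default (`RATE_LIMITS`, `DEFAULT`, and the function names are illustrative):

```python
import re

RATE_LIMITS = {
    "PATCH:/api/v1/jobs/*/approve": (50, 3600),
    "GET:/api/v1/admin/*": (50, 300),
}
DEFAULT = (100, 3600)


def endpoint_key(method: str, path: str) -> str:
    """Replace a hex/dash job ID followed by a slash with '*'."""
    path = re.sub(r"/jobs/[a-f0-9-]+/", "/jobs/*/", path)
    return f"{method}:{path}"


def rate_limit_for(method: str, path: str) -> tuple[int, int]:
    key = endpoint_key(method, path)
    if key in RATE_LIMITS:                       # exact match after normalisation
        return RATE_LIMITS[key]
    for pattern, limits in RATE_LIMITS.items():  # trailing-'*' prefix match
        if pattern.endswith("*") and key.startswith(pattern[:-1]):
            return limits
    return DEFAULT


print(rate_limit_for("PATCH", "/api/v1/jobs/3f2a9c/approve"))  # (50, 3600)
print(rate_limit_for("GET", "/api/v1/admin/users"))            # (50, 300)
print(rate_limit_for("GET", "/api/v1/jobs/3f2a9c"))            # (100, 3600)
```

Patterns with a `*` in the middle only match because the path has been normalised first; the loop itself handles trailing wildcards only. A job URL without a trailing segment, as in the third call, is not normalised and falls through to the default.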
async def __call__(self, request: Request, call_next):
"""Process rate limiting for the request."""
# Skip rate limiting for health checks and login (temporary for debugging)
if request.url.path in ["/health", "/metrics", "/api/v1/auth/login"]:
# Skip rate limiting for health checks and metrics only
if request.url.path in ["/health", "/metrics"]:
return await call_next(request)
client_id = self._get_client_identifier(request)
endpoint_key = self._get_endpoint_key(request)
limit, window = self._get_rate_limit(request)
# Create rate limit key
rate_limit_key = f"rate_limit:{client_id}:{endpoint_key}"
try:
is_allowed, rate_info = await self.limiter.is_allowed(
key=rate_limit_key,
@ -179,7 +178,7 @@ class RateLimitMiddleware:
window_seconds=window,
identifier=client_id
)
if not is_allowed:
# Return rate limit exceeded response
return JSONResponse(
@ -196,17 +195,17 @@ class RateLimitMiddleware:
"Retry-After": str(rate_info["retry_after"])
}
)
# Process the request
response = await call_next(request)
# Add rate limit headers to response
response.headers["X-RateLimit-Limit"] = str(rate_info["limit"])
response.headers["X-RateLimit-Remaining"] = str(rate_info["remaining"])
response.headers["X-RateLimit-Reset"] = str(rate_info["reset_time"])
return response
except Exception as e:
# Log error but don't block request if rate limiting fails
print(f"Rate limiting error: {e}")
@ -215,30 +214,30 @@ class RateLimitMiddleware:
class IPWhitelist:
"""IP whitelist for bypassing rate limits."""
def __init__(self, redis_client: aioredis.Redis):
self.redis = redis_client
self.whitelist_key = "ip_whitelist"
# Default whitelisted IPs (health checks, monitoring)
self.default_whitelist = {
"127.0.0.1",
"::1",
"169.254.169.254", # GCP metadata server
}
async def is_whitelisted(self, ip: str) -> bool:
"""Check if IP is whitelisted."""
if ip in self.default_whitelist:
return True
try:
is_member = await self.redis.sismember(self.whitelist_key, ip)
return bool(is_member)
except Exception:
return False
async def add_ip(self, ip: str, ttl_seconds: Optional[int] = None) -> bool:
async def add_ip(self, ip: str, ttl_seconds: int | None = None) -> bool:
"""Add IP to whitelist."""
try:
await self.redis.sadd(self.whitelist_key, ip)
@ -249,7 +248,7 @@ class IPWhitelist:
return True
except Exception:
return False
async def remove_ip(self, ip: str) -> bool:
"""Remove IP from whitelist."""
try:
@ -261,4 +260,4 @@ class IPWhitelist:
async def create_rate_limit_middleware(redis_client: aioredis.Redis) -> RateLimitMiddleware:
"""Factory function to create rate limit middleware."""
return RateLimitMiddleware(redis_client)
return RateLimitMiddleware(redis_client)

View file

@ -3,15 +3,17 @@
import json
import re
import time
from typing import Any, Dict, List, Optional, Set
from fastapi import HTTPException, Request, status
from fastapi.responses import JSONResponse
from pydantic import BaseModel, ValidationError as PydanticValidationError
import magic
from typing import Any
from urllib.parse import unquote
import magic
from fastapi import Request, status
from fastapi.responses import JSONResponse
from app.telemetry.metrics import track_validation_metrics
from ..core.config import settings
class ValidationError(Exception):
"""Custom validation error."""
@ -25,89 +27,93 @@ class SecurityValidationError(Exception):
class RequestValidator:
"""Enhanced request validation with security checks."""
def __init__(self):
# File type restrictions
self.allowed_video_types = {
"video/mp4",
"video/quicktime",
"video/quicktime",
"video/x-msvideo" # AVI
}
self.allowed_subtitle_types = {
"text/vtt",
"text/plain"
}
# Security patterns to block
self.malicious_patterns = [
# SQL injection patterns
r"(union|select|insert|update|delete|drop|create|alter)\s+",
r"(script|javascript|vbscript|onload|onerror|onclick)",
r"\b(union|select|insert|update|delete|drop|create|alter)\b\s+",
r"vbscript:", # vbscript protocol injection
r"\b(onload|onerror|onclick)\s*=", # HTML event handler attribute injection
r"<\s*script[^>]*>",
r"javascript:",
r"data:.*base64",
# Path traversal
r"\.\./",
r"\.\.\\",
r"%2e%2e%2f",
r"%2e%2e\\",
# Command injection (removed $ to allow MongoDB operators in controlled contexts)
r"[;&|`](?!\s*$)", # Allow $ but not as command separator
r"(rm|wget|curl|nc|bash|sh|cmd|powershell)\s+",
# MongoDB injection
r"\$where|\$ne|\$gt|\$lt|\$regex",
# Command injection (removed $ and ; — semicolons are common in natural language)
r"[&|`](?!\s*$)",
r"\b(rm|wget|curl|nc|bash|sh|cmd|powershell)\b\s+",
# MongoDB injection — NoSQL operator abuse
r"\$where|\$expr|\$function|\$accumulator"
r"|\$ne|\$nin|\$not"
r"|\$gt|\$gte|\$lt|\$lte"
r"|\$regex|\$jsonSchema|\$mod",
]
self.compiled_patterns = [re.compile(pattern, re.IGNORECASE) for pattern in self.malicious_patterns]
# Max file sizes (in bytes)
self.max_video_size = 2 * 1024 * 1024 * 1024 # 2GB
# Max file sizes (in bytes) — driven by central config (T-14)
self.max_video_size = settings.upload_max_video_bytes
self.max_subtitle_size = 10 * 1024 * 1024 # 10MB
# Request size limits
self.max_json_size = 1024 * 1024 # 1MB
self.max_form_fields = 50
def validate_string_content(self, content: str, field_name: str = "input") -> None:
"""Validate string content for malicious patterns."""
if not isinstance(content, str):
return
for pattern in self.compiled_patterns:
if pattern.search(content):
raise SecurityValidationError(
f"Potentially malicious content detected in {field_name}"
)
def validate_filename(self, filename: str) -> str:
"""Validate and sanitize filename."""
if not filename:
raise ValidationError("Filename cannot be empty")
# Decode URL encoding
filename = unquote(filename)
# Check for malicious patterns
self.validate_string_content(filename, "filename")
# Remove dangerous characters
safe_filename = re.sub(r'[^\w\-_\.]', '_', filename)
# Prevent hidden files
if safe_filename.startswith('.'):
safe_filename = 'file_' + safe_filename[1:]
# Limit length
if len(safe_filename) > 255:
name, ext = safe_filename.rsplit('.', 1) if '.' in safe_filename else (safe_filename, '')
safe_filename = name[:250] + ('.' + ext if ext else '')
return safe_filename
def validate_file_type(self, content: bytes, expected_type: str, filename: str) -> None:
"""Validate file type using magic numbers."""
try:
@ -117,13 +123,13 @@ class RequestValidator:
ext = filename.lower().split('.')[-1] if '.' in filename else ''
video_extensions = {'mp4', 'mov', 'avi', 'mkv'}
subtitle_extensions = {'vtt', 'srt', 'txt'}
if expected_type == "video" and ext not in video_extensions:
raise ValidationError(f"Invalid video file extension: {ext}")
raise ValidationError(f"Invalid video file extension: {ext}") from None
elif expected_type == "subtitle" and ext not in subtitle_extensions:
raise ValidationError(f"Invalid subtitle file extension: {ext}")
raise ValidationError(f"Invalid subtitle file extension: {ext}") from None
return
if expected_type == "video" and detected_type not in self.allowed_video_types:
raise ValidationError(
f"Invalid video file type: {detected_type}. "
@ -134,7 +140,7 @@ class RequestValidator:
f"Invalid subtitle file type: {detected_type}. "
f"Allowed types: {', '.join(self.allowed_subtitle_types)}"
)
def validate_file_size(self, size: int, file_type: str) -> None:
"""Validate file size limits."""
if file_type == "video" and size > self.max_video_size:
@ -147,16 +153,16 @@ class RequestValidator:
f"Subtitle file too large: {size} bytes. "
f"Maximum allowed: {self.max_subtitle_size} bytes"
)
async def validate_json_payload(self, request: Request) -> Optional[Dict[str, Any]]:
async def validate_json_payload(self, request: Request) -> dict[str, Any] | None:
"""Validate JSON request payload."""
if not request.headers.get("content-type", "").startswith("application/json"):
return None
content_length = request.headers.get("content-length")
if content_length and int(content_length) > self.max_json_size:
raise ValidationError(f"JSON payload too large: {content_length} bytes")
try:
# Check if body has already been read
if hasattr(request, '_cached_body'):
@ -165,63 +171,67 @@ class RequestValidator:
body = await request.body()
# Cache the body so FastAPI can read it later
request._cached_body = body
if len(body) > self.max_json_size:
raise ValidationError(f"JSON payload too large: {len(body)} bytes")
if not body:
return {}
payload = json.loads(body)
# Recursively validate all string values
self._validate_json_values(payload)
return payload
except json.JSONDecodeError as e:
raise ValidationError(f"Invalid JSON: {e}")
raise ValidationError(f"Invalid JSON: {e}") from e
# Fields that contain free-form natural language — skip injection pattern checks
_FREETEXT_FIELDS = {"captions_vtt", "audio_description_vtt", "text", "notes", "change_note", "description"}
def _validate_json_values(self, obj: Any, path: str = "root") -> None:
"""Recursively validate JSON values."""
if isinstance(obj, dict):
if len(obj) > self.max_form_fields:
raise ValidationError(f"Too many fields in object at {path}")
for key, value in obj.items():
if isinstance(key, str):
self.validate_string_content(key, f"{path}.{key}")
self._validate_json_values(value, f"{path}.{key}")
self.validate_string_content(key, f"{path}.key")
# Skip pattern scanning for free-text fields (VTT content, notes, etc.)
if key not in self._FREETEXT_FIELDS:
self._validate_json_values(value, f"{path}.{key}")
elif isinstance(obj, list):
if len(obj) > 1000: # Prevent large arrays
raise ValidationError(f"Array too large at {path}")
for i, item in enumerate(obj):
self._validate_json_values(item, f"{path}[{i}]")
elif isinstance(obj, str):
self.validate_string_content(obj, path)
def validate_query_params(self, request: Request) -> None:
"""Validate query parameters."""
for key, value in request.query_params.items():
self.validate_string_content(key, f"query.{key}")
self.validate_string_content(str(value), f"query.{key}")
def validate_headers(self, request: Request) -> None:
"""Validate request headers."""
suspicious_headers = {
"x-forwarded-host",
"x-original-host",
"x-original-host",
"x-rewrite-url"
}
for header_name, header_value in request.headers.items():
# Check for suspicious headers
if header_name.lower() in suspicious_headers:
self.validate_string_content(header_value, f"header.{header_name}")
# Validate user-agent length
if header_name.lower() == "user-agent" and len(header_value) > 500:
raise SecurityValidationError("User-Agent header too long")
@ -229,34 +239,34 @@ class RequestValidator:
class ValidationMiddleware:
"""FastAPI middleware for enhanced request validation."""
def __init__(self):
self.validator = RequestValidator()
async def __call__(self, request: Request, call_next):
"""Process validation for the request."""
start_time = time.time()
validation_errors = []
# Skip validation for timing adjustment endpoint temporarily
if "/vtt/adjust-timing" in request.url.path:
return await call_next(request)
try:
# Validate headers
self.validator.validate_headers(request)
# Validate query parameters
self.validator.validate_query_params(request)
# Validate JSON payload if present
if request.method in ["POST", "PUT", "PATCH"]:
await self.validator.validate_json_payload(request)
# Process the request
response = await call_next(request)
# Track successful validation
track_validation_metrics(
endpoint=request.url.path,
@ -265,10 +275,10 @@ class ValidationMiddleware:
validation_time=time.time() - start_time,
error_types=[]
)
return response
except SecurityValidationError as e:
except SecurityValidationError:
validation_errors.append("security")
track_validation_metrics(
endpoint=request.url.path,
@ -277,7 +287,7 @@ class ValidationMiddleware:
validation_time=time.time() - start_time,
error_types=validation_errors
)
return JSONResponse(
status_code=status.HTTP_400_BAD_REQUEST,
content={
@ -285,7 +295,7 @@ class ValidationMiddleware:
"error_code": "SECURITY_VALIDATION_ERROR"
}
)
except ValidationError as e:
validation_errors.append("format")
track_validation_metrics(
@ -295,7 +305,7 @@ class ValidationMiddleware:
validation_time=time.time() - start_time,
error_types=validation_errors
)
return JSONResponse(
status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
content={
@ -303,7 +313,7 @@ class ValidationMiddleware:
"error_code": "VALIDATION_ERROR"
}
)
except Exception as e:
validation_errors.append("unknown")
track_validation_metrics(
@ -313,7 +323,7 @@ class ValidationMiddleware:
validation_time=time.time() - start_time,
error_types=validation_errors
)
# Log unexpected error but continue processing
print(f"Validation middleware error: {e}")
return await call_next(request)
@ -321,4 +331,4 @@ class ValidationMiddleware:
async def create_validation_middleware() -> ValidationMiddleware:
"""Factory function to create validation middleware."""
return ValidationMiddleware()
return ValidationMiddleware()
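
The SQL-keyword rule in `malicious_patterns` gained `\b` word boundaries in this change. A minimal sketch of what that buys, using the two patterns as they appear in the diff; the sample strings are hypothetical and chosen only for illustration:

```python
import re

# Both patterns are copied from the diff: the earlier rule and its word-bounded replacement.
OLD_SQL = re.compile(r"(union|select|insert|update|delete|drop|create|alter)\s+", re.IGNORECASE)
NEW_SQL = re.compile(r"\b(union|select|insert|update|delete|drop|create|alter)\b\s+", re.IGNORECASE)

# Hypothetical inputs, not taken from the repository.
SAMPLES = [
    "backdrop image for the intro",  # "drop " sits inside a longer word
    "preselect the default voice",   # "select " sits inside a longer word
    "select name from users",        # a standalone SQL keyword
]


def flagged(pattern: re.Pattern, text: str) -> bool:
    """Return True when the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


# Map each sample to (flagged by old rule, flagged by new rule).
results = {s: (flagged(OLD_SQL, s), flagged(NEW_SQL, s)) for s in SAMPLES}
for sample, (old_hit, new_hit) in results.items():
    print(f"{sample!r}: old={old_hit} new={new_hit}")
```

The word-bounded rule still fires on any standalone keyword followed by whitespace, so ordinary sentences such as "please select a file" are flagged unless the field is listed in `_FREETEXT_FIELDS`.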

@ -1,5 +1,5 @@
"""Database migration framework for MongoDB."""
from .migrator import MigrationManager, Migration
from .migrator import Migration, MigrationManager
__all__ = ["MigrationManager", "Migration"]
__all__ = ["MigrationManager", "Migration"]

@ -1,11 +1,10 @@
"""MongoDB migration framework."""
import os
import importlib.util
from abc import ABC, abstractmethod
from datetime import datetime
from pathlib import Path
from typing import List, Optional
from motor.motor_asyncio import AsyncIOMotorDatabase
from app.core.database import get_database
@ -17,22 +16,23 @@ logger = get_logger(__name__)
class Migration(ABC):
"""Base class for database migrations."""
version: str = "0000-00-00-000000" # overridden by subclass as class variable
description: str = ""
def __init__(self):
self.version: str = "0000-00-00-000000" # Format: YYYY-MM-DD-HHMMSS
self.description: str = ""
self.db: Optional[AsyncIOMotorDatabase] = None
self.db: AsyncIOMotorDatabase | None = None
@abstractmethod
async def up(self) -> None:
"""Apply the migration."""
pass
@abstractmethod
async def down(self) -> None:
"""Rollback the migration."""
pass
async def set_database(self, db: AsyncIOMotorDatabase) -> None:
"""Set the database instance."""
self.db = db
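
With `version` and `description` moved to class attributes, both declaration styles seen in this compare should keep working: older scripts assign them in `__init__`, newer ones declare them on the class. A minimal sketch of that behaviour, assuming a trimmed-down stand-in for the base class (`ClassLevel` and `InitLevel` are hypothetical names, and `db` is a plain placeholder rather than a Motor handle):

```python
from abc import ABC, abstractmethod


class Migration(ABC):
    """Trimmed stand-in for the base class in the diff."""

    version: str = "0000-00-00-000000"  # class-level default
    description: str = ""

    def __init__(self) -> None:
        self.db = None  # placeholder for the database handle

    @abstractmethod
    async def up(self) -> None: ...

    @abstractmethod
    async def down(self) -> None: ...


class ClassLevel(Migration):
    """Newer style: metadata declared on the class."""

    version = "2026-04-29-000000"
    description = "declared on the class"

    async def up(self) -> None:
        pass

    async def down(self) -> None:
        pass


class InitLevel(Migration):
    """Older style: metadata assigned per instance in __init__."""

    def __init__(self) -> None:
        super().__init__()
        self.version = "2025-08-17-120000"
        self.description = "assigned in __init__"

    async def up(self) -> None:
        pass

    async def down(self) -> None:
        pass


print(ClassLevel.version)   # readable without creating an instance
print(InitLevel().version)  # instance attribute shadows the class default
print(InitLevel.version)    # the class itself still reports the default
```

One consequence of the class-level style is that tooling can read a migration's version without instantiating it, which the `__init__` style does not allow.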
@ -40,7 +40,7 @@ class Migration(ABC):
class MigrationRecord:
"""Represents a migration record in the database."""
def __init__(self, version: str, description: str, applied_at: datetime):
self.version = version
self.description = description
@ -49,163 +49,163 @@ class MigrationRecord:
class MigrationManager:
"""Manages database migrations."""
def __init__(self):
self.db: Optional[AsyncIOMotorDatabase] = None
self.db: AsyncIOMotorDatabase | None = None
self.migrations_dir = Path(__file__).parent / "scripts"
self.collection_name = "migration_history"
async def initialize(self) -> None:
"""Initialize the migration manager."""
self.db = await get_database()
await self._ensure_migration_collection()
async def _ensure_migration_collection(self) -> None:
"""Ensure the migration history collection exists with proper indexes."""
collection = self.db[self.collection_name]
# Create indexes for migration history
await collection.create_index([("version", 1)], unique=True)
await collection.create_index([("applied_at", -1)])
logger.info("Migration history collection initialized")
def discover_migrations(self) -> List[str]:
def discover_migrations(self) -> list[str]:
"""Discover all migration files in the migrations directory."""
if not self.migrations_dir.exists():
logger.warning(f"Migrations directory not found: {self.migrations_dir}")
return []
migration_files = []
for file_path in self.migrations_dir.glob("*.py"):
if file_path.name.startswith("migration_") and not file_path.name.startswith("__"):
migration_files.append(file_path.stem)
# Sort by version (filenames start with "migration_" followed by the version)
migration_files.sort()
return migration_files
async def load_migration(self, migration_name: str) -> Migration:
"""Dynamically load a migration class."""
migration_path = self.migrations_dir / f"{migration_name}.py"
if not migration_path.exists():
raise FileNotFoundError(f"Migration file not found: {migration_path}")
# Load the module
spec = importlib.util.spec_from_file_location(migration_name, migration_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
# Get the migration class (assume it's named Migration)
if not hasattr(module, 'Migration'):
raise AttributeError(f"Migration class not found in {migration_name}")
migration_class = getattr(module, 'Migration')
migration_class = module.Migration
migration = migration_class()
await migration.set_database(self.db)
return migration
async def get_applied_migrations(self) -> List[str]:
async def get_applied_migrations(self) -> list[str]:
"""Get list of applied migration versions."""
collection = self.db[self.collection_name]
cursor = collection.find({}, {"version": 1}).sort("version", 1)
applied = []
async for doc in cursor:
applied.append(doc["version"])
return applied
async def record_migration(self, migration: Migration) -> None:
"""Record a successful migration in the database."""
collection = self.db[self.collection_name]
record = {
"version": migration.version,
"description": migration.description,
"applied_at": datetime.utcnow()
}
await collection.insert_one(record)
logger.info(f"Recorded migration: {migration.version} - {migration.description}")
async def remove_migration_record(self, version: str) -> None:
"""Remove a migration record (for rollback)."""
collection = self.db[self.collection_name]
await collection.delete_one({"version": version})
logger.info(f"Removed migration record: {version}")
@trace_async_operation("migration_manager.migrate_up")
async def migrate_up(self, target_version: Optional[str] = None) -> List[str]:
async def migrate_up(self, target_version: str | None = None) -> list[str]:
"""
Apply migrations up to the target version.
Args:
target_version: Version to migrate to. If None, applies all pending migrations.
Returns:
List of applied migration versions.
"""
await self.initialize()
# Discover all migrations
all_migrations = self.discover_migrations()
applied_migrations = await self.get_applied_migrations()
# Find pending migrations
pending_migrations = []
for migration_name in all_migrations:
# Extract version from filename (assumes format: migration_YYYY-MM-DD-HHMMSS_description.py)
version = migration_name.replace("migration_", "").split("_")[0]
if version not in applied_migrations:
if target_version is None or version <= target_version:
pending_migrations.append((migration_name, version))
# Sort by version
pending_migrations.sort(key=lambda x: x[1])
applied = []
for migration_name, version in pending_migrations:
try:
logger.info(f"Applying migration: {migration_name}")
migration = await self.load_migration(migration_name)
await migration.up()
await self.record_migration(migration)
applied.append(version)
logger.info(f"Successfully applied migration: {version}")
except Exception as e:
logger.error(f"Failed to apply migration {migration_name}: {e}")
raise
return applied
@trace_async_operation("migration_manager.migrate_down")
async def migrate_down(self, target_version: str) -> List[str]:
async def migrate_down(self, target_version: str) -> list[str]:
"""
Rollback migrations down to the target version.
Args:
target_version: Version to rollback to.
Returns:
List of rolled back migration versions.
"""
await self.initialize()
applied_migrations = await self.get_applied_migrations()
# Find migrations to rollback (newer than target)
to_rollback = []
for version in reversed(applied_migrations):
if version > target_version:
to_rollback.append(version)
rolled_back = []
for version in to_rollback:
try:
@ -215,39 +215,39 @@ class MigrationManager:
if version in migration_file:
migration_name = migration_file
break
if not migration_name:
logger.warning(f"Migration file not found for version {version}")
continue
logger.info(f"Rolling back migration: {migration_name}")
migration = await self.load_migration(migration_name)
await migration.down()
await self.remove_migration_record(version)
rolled_back.append(version)
logger.info(f"Successfully rolled back migration: {version}")
except Exception as e:
logger.error(f"Failed to rollback migration {version}: {e}")
raise
return rolled_back
async def get_migration_status(self) -> dict:
"""Get current migration status."""
await self.initialize()
all_migrations = self.discover_migrations()
applied_migrations = await self.get_applied_migrations()
pending_count = len(all_migrations) - len(applied_migrations)
return {
"total_migrations": len(all_migrations),
"applied_migrations": len(applied_migrations),
"pending_migrations": pending_count,
"latest_applied": applied_migrations[-1] if applied_migrations else None,
"all_applied": applied_migrations
}
}
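
`get_migration_status` derives `pending_migrations` by subtracting two list lengths, while `migrate_up` compares versions one by one. A minimal sketch of a count based on the same version comparison, assuming a scenario (hypothetical, not shown in the diff) where the history holds a record whose script no longer exists on disk; `version_from_name` and `pending_versions` are illustrative helpers, not part of the codebase:

```python
def version_from_name(migration_name: str) -> str:
    """Same expression migrate_up uses: drop the prefix, keep the text before the first underscore."""
    return migration_name.replace("migration_", "").split("_")[0]


def pending_versions(all_migrations: list[str], applied: list[str]) -> list[str]:
    """Versions that have a script on disk but no record in the history."""
    applied_set = set(applied)
    return sorted(
        version
        for version in map(version_from_name, all_migrations)
        if version not in applied_set
    )


# Hypothetical file stems and history records.
files = [
    "migration_2025-08-17-120000_initial_schema",
    "migration_2025-08-17-120001_index_optimization",
]
history = ["2024-01-01-000000", "2025-08-17-120000"]  # first record has no script on disk

pending = pending_versions(files, history)
by_subtraction = len(files) - len(history)
print(pending, by_subtraction)
```

In this scenario the subtraction reports zero pending migrations while one script is still unapplied; the set-based count agrees with what `migrate_up` would actually run.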

@ -0,0 +1,22 @@
"""Entry point for running migrations: python -m app.migrations.run"""
import asyncio
from app.core.database import close_mongo_connection, connect_to_mongo
from app.migrations.migrator import MigrationManager
async def main() -> None:
await connect_to_mongo()
try:
mgr = MigrationManager()
applied = await mgr.migrate_up()
if applied:
print(f"Applied {len(applied)} migration(s): {applied}")
else:
print("Already up to date — no pending migrations.")
finally:
await close_mongo_connection()
if __name__ == "__main__":
asyncio.run(main())

@ -1,39 +1,38 @@
"""Initial database schema setup migration."""
from datetime import datetime
from app.migrations.migrator import Migration
class Migration(Migration):
"""Initial schema setup with all collections and indexes."""
def __init__(self):
super().__init__()
self.version = "2025-08-17-120000"
self.description = "Initial database schema with users, jobs, and audit_logs collections"
async def up(self) -> None:
"""Create initial collections and indexes."""
# Users collection setup
await self.db.users.create_index([("email", 1)], unique=True)
await self.db.users.create_index([("role", 1)])
await self.db.users.create_index([("is_active", 1)])
await self.db.users.create_index([("created_at", -1)])
# Jobs collection setup
await self.db.jobs.create_index([("status", 1), ("created_at", -1)])
await self.db.jobs.create_index([("client_id", 1)])
await self.db.jobs.create_index([("updated_at", -1)])
await self.db.jobs.create_index([("languages", 1)])
# Create compound index for job queries
await self.db.jobs.create_index([
("status", 1),
("client_id", 1),
("created_at", -1)
])
# Audit logs collection setup
await self.db.audit_logs.create_index([("timestamp", -1)])
await self.db.audit_logs.create_index([("action", 1), ("timestamp", -1)])
@ -42,23 +41,23 @@ class Migration(Migration):
await self.db.audit_logs.create_index([("resource_type", 1), ("resource_id", 1)])
await self.db.audit_logs.create_index([("ip_address", 1), ("timestamp", -1)])
await self.db.audit_logs.create_index([("success", 1), ("timestamp", -1)])
# Text search index for audit logs
await self.db.audit_logs.create_index([
("description", "text"),
("details", "text"),
("error_message", "text")
])
print(f"✅ Applied migration {self.version}: {self.description}")
async def down(self) -> None:
"""Drop all collections (destructive - use with caution)."""
# This is a destructive operation - in production, you might want to backup first
await self.db.users.drop()
await self.db.jobs.drop()
await self.db.audit_logs.drop()
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print("⚠️ WARNING: All data has been deleted!")
print("⚠️ WARNING: All data has been deleted!")

@ -5,75 +5,75 @@ from app.migrations.migrator import Migration
class Migration(Migration):
"""Optimize indexes for better query performance."""
def __init__(self):
super().__init__()
self.version = "2025-08-17-120001"
self.description = "Index optimization for query performance improvements"
async def up(self) -> None:
"""Add optimized indexes for common query patterns."""
# Jobs collection optimizations
# Index for job status transitions and monitoring
await self.db.jobs.create_index([
("status", 1),
("updated_at", -1),
("client_id", 1)
], name="jobs_status_updated_client_idx")
# Index for queue management (pending jobs)
await self.db.jobs.create_index([
("status", 1),
("created_at", 1)
], name="jobs_queue_processing_idx")
# Index for client job history
await self.db.jobs.create_index([
("client_id", 1),
("created_at", -1),
("status", 1)
], name="jobs_client_history_idx")
# Sparse index for error tracking
await self.db.jobs.create_index([
("status", 1),
("error", 1)
], sparse=True, name="jobs_error_tracking_idx")
# Users collection optimizations
# Index for active user queries
await self.db.users.create_index([
("is_active", 1),
("role", 1),
("last_login_at", -1)
], name="users_active_role_login_idx")
# Text index for user search across email, first name, and last name
await self.db.users.create_index([
("email", "text"),
("first_name", "text"),
("last_name", "text")
], name="users_search_idx")
# Audit logs collection optimizations
# Compound index for security monitoring
await self.db.audit_logs.create_index([
("severity", 1),
("action", 1),
("timestamp", -1)
], name="audit_security_monitoring_idx")
# Index for user activity analysis
await self.db.audit_logs.create_index([
("user_id", 1),
("action", 1),
("timestamp", -1)
], name="audit_user_activity_idx")
# Index for resource access tracking
await self.db.audit_logs.create_index([
("resource_type", 1),
@ -81,30 +81,30 @@ class Migration(Migration):
("action", 1),
("timestamp", -1)
], name="audit_resource_access_idx")
# Sparse index for failed operations
await self.db.audit_logs.create_index([
("success", 1),
("timestamp", -1)
], sparse=True, name="audit_failures_idx")
# Add TTL index for automatic audit log cleanup (optional)
# Uncomment if you want automatic cleanup after 2 years
# await self.db.audit_logs.create_index(
# [("timestamp", 1)],
# [("timestamp", 1)],
# expireAfterSeconds=63072000, # 2 years
# name="audit_ttl_idx"
# )
print(f"✅ Applied migration {self.version}: {self.description}")
async def down(self) -> None:
"""Remove the optimized indexes."""
# Drop the indexes we created
indexes_to_drop = [
"jobs_status_updated_client_idx",
"jobs_queue_processing_idx",
"jobs_queue_processing_idx",
"jobs_client_history_idx",
"jobs_error_tracking_idx",
"users_active_role_login_idx",
@ -114,21 +114,21 @@ class Migration(Migration):
"audit_resource_access_idx",
"audit_failures_idx"
]
for index_name in indexes_to_drop:
try:
await self.db.jobs.drop_index(index_name)
except Exception:
pass # Index might not exist on this collection
try:
await self.db.users.drop_index(index_name)
except Exception:
pass
try:
await self.db.audit_logs.drop_index(index_name)
except Exception:
pass
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
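
The `down()` above tries every index name against all three collections and swallows the failures. Since the index names in this migration carry a collection prefix, a rollback could instead group them up front. A minimal sketch, assuming the `jobs_` / `users_` / `audit_` prefix convention holds for every name (`PREFIX_TO_COLLECTION` and `group_indexes_by_collection` are hypothetical helpers, not part of the codebase):

```python
from collections import defaultdict

# Assumed naming convention, inferred from the index names in this migration.
PREFIX_TO_COLLECTION = {
    "jobs_": "jobs",
    "users_": "users",
    "audit_": "audit_logs",
}


def group_indexes_by_collection(index_names: list[str]) -> dict[str, list[str]]:
    """Group index names under the collection their prefix points to."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for name in index_names:
        for prefix, collection in PREFIX_TO_COLLECTION.items():
            if name.startswith(prefix):
                grouped[collection].append(name)
                break
        else:
            # No prefix matched: fail loudly rather than guess a collection.
            raise ValueError(f"Cannot infer collection for index {name!r}")
    return dict(grouped)


# A subset of the names created in up().
indexes_to_drop = [
    "jobs_status_updated_client_idx",
    "jobs_queue_processing_idx",
    "users_search_idx",
    "audit_failures_idx",
]
plan = group_indexes_by_collection(indexes_to_drop)
print(plan)
```

Each `drop_index` call would then target one collection, so an unexpected failure is a real error rather than something to ignore.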

@ -1,20 +1,21 @@
"""Migrate audit log schema from basic to comprehensive format."""
from datetime import datetime
from app.migrations.migrator import Migration
class Migration(Migration):
"""Update audit log schema to comprehensive format."""
def __init__(self):
super().__init__()
self.version = "2025-08-17-120002"
self.description = "Update audit log schema from basic to comprehensive format"
async def up(self) -> None:
"""Migrate existing audit logs to new schema format."""
# Find all existing audit logs with old schema
old_logs_cursor = self.db.audit_logs.find({
# Look for logs that have the old schema structure
@ -24,9 +25,9 @@ class Migration(Migration):
{"timestamp": {"$exists": False}} # Missing new timestamp field
]
})
migration_count = 0
async for old_log in old_logs_cursor:
try:
# Map old fields to new schema
@ -38,82 +39,82 @@ class Migration(Migration):
"description": old_log.get("action", "Legacy action"),
"success": True,
"environment": "prod",
"service_name": "accessible-video-api",
"service_name": "accessible-video-api",
"api_version": "v1"
}
# Map optional fields if they exist
if "user_id" in old_log:
new_log["user_id"] = old_log["user_id"]
if "job_id" in old_log:
new_log["resource_type"] = "job"
new_log["resource_id"] = old_log["job_id"]
if "ip_address" in old_log:
new_log["ip_address"] = old_log["ip_address"]
if "user_agent" in old_log:
new_log["user_agent"] = old_log["user_agent"]
if "details" in old_log:
new_log["details"] = old_log["details"]
# Replace the old document with the new schema
await self.db.audit_logs.replace_one(
{"_id": old_log["_id"]},
new_log
)
migration_count += 1
except Exception as e:
print(f"Error migrating audit log {old_log.get('_id')}: {e}")
continue
print(f"✅ Applied migration {self.version}: Migrated {migration_count} audit log records")
def _map_old_action(self, old_action: str) -> str:
"""Map old action strings to new AuditAction enum values."""
action_mapping = {
# Job actions
"job_created": "job.create",
"job_approved": "job.approve",
"job_approved": "job.approve",
"job_rejected": "job.reject",
"job_updated": "job.update",
"job_cancelled": "job.cancel",
# Auth actions
"login": "auth.login.success",
"logout": "auth.logout",
"login_failed": "auth.login.failure",
# File actions
"file_uploaded": "file.upload",
"file_downloaded": "file.download",
# VTT actions
"vtt_edited": "vtt.edit",
# Admin actions
"user_created": "user.create",
"user_updated": "user.update",
"user_deleted": "user.delete",
}
return action_mapping.get(old_action, old_action)
async def down(self) -> None:
"""Rollback to old audit log schema format (limited)."""
# Find all audit logs with new schema
new_logs_cursor = self.db.audit_logs.find({
"timestamp": {"$exists": True},
"action": {"$exists": True}
})
rollback_count = 0
async for new_log in new_logs_cursor:
try:
# Map new fields back to old schema (lossy conversion)
@ -122,34 +123,34 @@ class Migration(Migration):
"when": new_log["timestamp"],
"action": new_log["action"]
}
# Map back optional fields
if "user_id" in new_log:
old_log["user_id"] = new_log["user_id"]
if "resource_type" in new_log and new_log["resource_type"] == "job":
old_log["job_id"] = new_log.get("resource_id")
if "ip_address" in new_log:
old_log["ip_address"] = new_log["ip_address"]
if "user_agent" in new_log:
old_log["user_agent"] = new_log["user_agent"]
if "details" in new_log:
old_log["details"] = new_log["details"]
# Replace with old schema
await self.db.audit_logs.replace_one(
{"_id": new_log["_id"]},
old_log
)
rollback_count += 1
except Exception as e:
print(f"Error rolling back audit log {new_log.get('_id')}: {e}")
continue
print(f"⚠️ Rolled back migration {self.version}: Reverted {rollback_count} audit log records")
print("⚠️ WARNING: Some audit log data may have been lost due to schema differences")
print("⚠️ WARNING: Some audit log data may have been lost due to schema differences")

@ -24,7 +24,7 @@ class Migration(Migration):
# Create index on auth_provider for faster queries
await self.db.users.create_index([("auth_provider", 1)])
print(f"✅ Created index on auth_provider field")
print("✅ Created index on auth_provider field")
print(f"✅ Applied migration {self.version}: {self.description}")
@ -34,7 +34,7 @@ class Migration(Migration):
# Drop the index
try:
await self.db.users.drop_index("auth_provider_1")
print(f"✅ Dropped index on auth_provider field")
print("✅ Dropped index on auth_provider field")
except Exception as e:
print(f"⚠️ Could not drop index: {e}")

@ -75,7 +75,7 @@ class Migration(Migration):
"validationLevel": "moderate", # moderate = validate inserts and updates to already-valid docs; existing invalid docs are not checked
"validationAction": "error" # error = reject invalid documents
})
print(f"✅ Updated users collection validator")
print("✅ Updated users collection validator")
except Exception as e:
print(f"⚠️ Could not update validator: {e}")
# Try creating the collection if it doesn't exist
@ -86,7 +86,7 @@ class Migration(Migration):
validationLevel="moderate",
validationAction="error"
)
print(f"✅ Created users collection with validator")
print("✅ Created users collection with validator")
except Exception as e2:
print(f"⚠️ Could not create collection: {e2}")
@ -136,4 +136,4 @@ class Migration(Migration):
})
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print(f"⚠️ WARNING: Production role users will fail validation!")
print("⚠️ WARNING: Production role users will fail validation!")

@ -53,7 +53,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print(f" Updated jobs collection validator")
print(" Updated jobs collection validator")
except Exception as e:
print(f" Could not update validator: {e}")
raise
@ -101,4 +101,4 @@ class Migration(Migration):
})
print(f" Rolled back migration {self.version}: {self.description}")
print(f" WARNING: Jobs with approved_source or qc_feedback status will fail validation!")
print(" WARNING: Jobs with approved_source or qc_feedback status will fail validation!")

@ -54,7 +54,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print(f" Updated jobs collection validator")
print(" Updated jobs collection validator")
except Exception as e:
print(f" Could not update validator: {e}")
raise
@ -104,4 +104,4 @@ class Migration(Migration):
})
print(f" Rolled back migration {self.version}: {self.description}")
print(f" WARNING: Jobs with rendering_video status will fail validation!")
print(" WARNING: Jobs with rendering_video status will fail validation!")

@ -60,7 +60,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print(f" Updated jobs collection validator")
print(" Updated jobs collection validator")
except Exception as e:
print(f" Could not update validator: {e}")
raise
@ -111,4 +111,4 @@ class Migration(Migration):
})
print(f" Rolled back migration {self.version}: {self.description}")
print(f" WARNING: Jobs with tts_failed or render_failed status will fail validation!")
print(" WARNING: Jobs with tts_failed or render_failed status will fail validation!")

@ -61,7 +61,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print(f" Updated jobs collection validator")
print(" Updated jobs collection validator")
except Exception as e:
print(f" Could not update validator: {e}")
raise
@ -114,4 +114,4 @@ class Migration(Migration):
})
print(f" Rolled back migration {self.version}: {self.description}")
print(f" WARNING: Jobs with rendering_qc status will fail validation!")
print(" WARNING: Jobs with rendering_qc status will fail validation!")

@ -64,7 +64,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print(f"✅ Updated users collection validator")
print("✅ Updated users collection validator")
except Exception as e:
print(f"⚠️ Could not update validator: {e}")
try:
@ -74,7 +74,7 @@ class Migration(Migration):
validationLevel="moderate",
validationAction="error"
)
print(f"✅ Created users collection with validator")
print("✅ Created users collection with validator")
except Exception as e2:
print(f"⚠️ Could not create collection: {e2}")
@ -134,4 +134,4 @@ class Migration(Migration):
})
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print(f"⚠️ WARNING: Linguist role users will fail validation!")
print("⚠️ WARNING: Linguist role users will fail validation!")

@ -69,7 +69,7 @@ class Migration(Migration):
"validationLevel": "moderate",
"validationAction": "error"
})
print(f"✅ Updated users collection validator")
print("✅ Updated users collection validator")
except Exception as e:
print(f"⚠️ Could not update validator: {e}")
try:
@ -79,7 +79,7 @@ class Migration(Migration):
validationLevel="moderate",
validationAction="error"
)
print(f"✅ Created users collection with validator")
print("✅ Created users collection with validator")
except Exception as e2:
print(f"⚠️ Could not create collection: {e2}")
@ -139,4 +139,4 @@ class Migration(Migration):
})
print(f"⚠️ Rolled back migration {self.version}: {self.description}")
print(f"⚠️ WARNING: project_manager role users will fail validation!")
print("⚠️ WARNING: project_manager role users will fail validation!")

@ -1,6 +1,6 @@
"""Backfill memberships collection from existing pm_client_ids and team.member_user_ids."""
from datetime import datetime, timezone
from datetime import UTC, datetime
from app.migrations.migrator import Migration
@ -13,7 +13,7 @@ class Migration(Migration):
self.description = "Backfill memberships from pm_client_ids and team member lists"
async def up(self) -> None:
now = datetime.now(timezone.utc)
now = datetime.now(UTC)
upserted = 0
# 1. PROJECT_MANAGER users → MANAGER membership for each pm_client_id


@ -0,0 +1,53 @@
"""Add PROCESSING_FAILED status to job schema validator and create failure indexes."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-04-29-000000"
description = "Add processing_failed status and failure/status compound indexes on jobs"
async def up(self) -> None:
db = self.db
# Add processing_failed to the schema validator enum (if validator exists)
try:
validator_info = await db.command(
"listCollections", filter={"name": "jobs"}
)
collections = [c async for c in validator_info["cursor"]]
if collections and collections[0].get("options", {}).get("validator"):
existing_validator = collections[0]["options"]["validator"]
status_path = (
existing_validator.get("$jsonSchema", {})
.get("properties", {})
.get("status", {})
.get("enum", [])
)
if status_path and "processing_failed" not in status_path:
status_path.append("processing_failed")
await db.command(
"collMod",
"jobs",
validator=existing_validator,
validationAction="warn",
)
except Exception:
# No validator or unsupported — skip gracefully
pass
# Indexes for failure dashboard queries
await db.jobs.create_index(
[("failure.step", 1), ("status", 1)],
name="idx_jobs_failure_step_status",
background=True,
)
await db.jobs.create_index(
[("status", 1), ("organization_id", 1), ("created_at", -1)],
name="idx_jobs_status_org_created",
background=True,
)
async def down(self) -> None:
db = self.db
await db.jobs.drop_index("idx_jobs_failure_step_status")
await db.jobs.drop_index("idx_jobs_status_org_created")
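One thing worth checking in `up` above: if `db.command("listCollections", ...)` returns the raw command reply as a plain dict (which is how the later migration `2026-04-30-000002` reads it, via `cursor.firstBatch`), then `validator_info["cursor"]` is a dict, `async for` over it raises `TypeError`, and the broad `except Exception` hides that, so the enum is never updated. The sketch below uses a hypothetical hand-written reply and no database; `read_with_async_for` and `read_first_batch` are illustrative names, not part of the codebase.

```python
import asyncio

# Hypothetical reply shaped like a raw listCollections command result.
FAKE_REPLY = {
    "cursor": {
        "id": 0,
        "firstBatch": [
            {
                "name": "jobs",
                "options": {
                    "validator": {
                        "$jsonSchema": {
                            "properties": {"status": {"enum": ["created", "completed"]}}
                        }
                    }
                },
            }
        ],
    },
    "ok": 1.0,
}


async def read_with_async_for(reply: dict) -> str:
    """Mimic the migration's read path and report what happens."""
    try:
        collections = [c async for c in reply["cursor"]]
        return f"read {len(collections)} collections"
    except Exception as exc:  # same broad catch as the migration
        return type(exc).__name__


def read_first_batch(reply: dict) -> list[dict]:
    """Read the collection infos directly from the reply's first batch."""
    return reply.get("cursor", {}).get("firstBatch", [])


outcome = asyncio.run(read_with_async_for(FAKE_REPLY))
batch = read_first_batch(FAKE_REPLY)
```

If the driver in use returns an async cursor here instead of a dict, the original loop works as written and this concern does not apply.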


@ -0,0 +1,46 @@
"""Create job_briefs collection with indexes."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-04-29-000001"
description = "Create job_briefs collection and indexes"
async def up(self) -> None:
db = self.db
# Ensure collection exists (create_collection raises if it already does)
try:
await db.create_collection("job_briefs")
except Exception:
pass # already exists
await db.job_briefs.create_index(
[("organization_id", 1), ("status", 1), ("created_at", -1)],
name="idx_briefs_org_status_created",
background=True,
)
await db.job_briefs.create_index(
[("created_by", 1)],
name="idx_briefs_created_by",
background=True,
)
await db.job_briefs.create_index(
[("project_id", 1)],
name="idx_briefs_project_id",
background=True,
sparse=True,
)
await db.job_briefs.create_index(
[("job_id", 1)],
name="idx_briefs_job_id",
background=True,
sparse=True,
)
async def down(self) -> None:
db = self.db
await db.job_briefs.drop_index("idx_briefs_org_status_created")
await db.job_briefs.drop_index("idx_briefs_created_by")
await db.job_briefs.drop_index("idx_briefs_project_id")
await db.job_briefs.drop_index("idx_briefs_job_id")


@ -0,0 +1,44 @@
"""Backfill Membership.team_ids from Team.member_user_ids (MT-17)."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-04-30-000000"
description = "Backfill team_ids on Membership records from Team.member_user_ids"
async def up(self) -> None:
db = self.db
upserted = 0
# For each team that has member_user_ids, push team_id into the matching Membership
async for team in db.teams.find(
{"member_user_ids": {"$exists": True, "$ne": []}},
{"_id": 1, "client_id": 1, "member_user_ids": 1},
):
team_id = str(team["_id"])
org_id = str(team.get("client_id", ""))
for user_id in team.get("member_user_ids", []):
result = await db.memberships.update_one(
{"user_id": str(user_id), "organization_id": org_id},
{"$addToSet": {"team_ids": team_id}},
)
if result.modified_count:
upserted += 1
# Ensure index for efficient team-based lookups
await db.memberships.create_index(
[("team_ids", 1)],
name="idx_memberships_team_ids",
background=True,
sparse=True,
)
print(f"✅ Backfilled team_ids on {upserted} Membership records")
async def down(self) -> None:
db = self.db
await db.memberships.update_many({}, {"$unset": {"team_ids": ""}})
try:
await db.memberships.drop_index("idx_memberships_team_ids")
except Exception:
pass
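The backfill loop above relies on `$addToSet` being idempotent. Note also that `upserted` is incremented from `modified_count`, so it counts memberships that actually changed; nothing is inserted, and a team member without a matching membership is skipped. An in-memory sketch under those assumptions, where `backfill_team_ids` and the sample documents are hypothetical stand-ins for the MongoDB collections:

```python
def backfill_team_ids(teams: list[dict], memberships: list[dict]) -> int:
    """Simulate the migration: add each team id to matching memberships."""
    modified = 0
    for team in teams:
        team_id = str(team["_id"])
        org_id = str(team.get("client_id", ""))
        for user_id in team.get("member_user_ids", []):
            for membership in memberships:
                if (
                    membership["user_id"] == str(user_id)
                    and membership["organization_id"] == org_id
                ):
                    team_ids = membership.setdefault("team_ids", [])
                    if team_id not in team_ids:  # $addToSet: no duplicates
                        team_ids.append(team_id)
                        modified += 1
                    break  # update_one touches at most one document
    return modified


teams = [{"_id": "t1", "client_id": "org1", "member_user_ids": ["u1", "u2"]}]
memberships = [{"user_id": "u1", "organization_id": "org1"}]

first_run = backfill_team_ids(teams, memberships)   # u1 updated, u2 has no membership
second_run = backfill_team_ids(teams, memberships)  # nothing left to change
```

Running the simulated backfill twice leaves the data unchanged on the second pass, which is the property that makes re-running the migration safe.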


@ -0,0 +1,38 @@
"""Add cancelled status to job schema validator."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-04-30-000001"
description = "Add cancelled status to jobs collection schema validator"
async def up(self) -> None:
db = self.db
try:
validator_info = await db.command(
"listCollections", filter={"name": "jobs"}
)
collections = [c async for c in validator_info["cursor"]]
if collections and collections[0].get("options", {}).get("validator"):
existing_validator = collections[0]["options"]["validator"]
status_path = (
existing_validator.get("$jsonSchema", {})
.get("properties", {})
.get("status", {})
.get("enum", [])
)
if status_path and "cancelled" not in status_path:
status_path.append("cancelled")
await db.command(
"collMod",
"jobs",
validator=existing_validator,
validationAction="warn",
)
except Exception:
# No validator or unsupported — skip gracefully
pass
async def down(self) -> None:
pass


@ -0,0 +1,47 @@
"""Replace status enum in $jsonSchema validator with the full current list."""
from app.migrations.migrator import Migration
ALL_STATUSES = [
"created", "ingesting", "ai_processing",
"pending_qc", "approved_english", "approved_source",
"rejected", "qc_feedback",
"translating", "tts_generating", "tts_failed",
"rendering_video", "render_failed", "rendering_qc",
"pending_final_review", "completed",
"processing_failed", "cancelled",
]
class Migration(Migration):
version = "2026-04-30-000002"
description = "Fix status enum in jobs $jsonSchema validator (add processing_failed + cancelled)"
async def up(self) -> None:
db = self.db
result = await db.command("listCollections", filter={"name": "jobs"})
batch = result.get("cursor", {}).get("firstBatch", [])
if not batch:
return
existing_validator = batch[0].get("options", {}).get("validator")
if not existing_validator:
return
schema = existing_validator.get("$jsonSchema", {})
status_prop = schema.get("properties", {}).get("status")
if not status_prop:
return
status_prop["enum"] = ALL_STATUSES
await db.command(
"collMod",
"jobs",
validator=existing_validator,
validationLevel="moderate",
validationAction="error",
)
async def down(self) -> None:
pass


@ -0,0 +1,26 @@
"""Backfill source_has_ad=False on existing jobs and job_briefs."""
from app.migrations.migrator import Migration
class Migration(Migration):
version = "2026-05-08-000000"
description = "Add source_has_ad field to jobs.source and job_briefs"
async def up(self) -> None:
db = self.db
jobs_result = await db.jobs.update_many(
{"source.source_has_ad": {"$exists": False}},
{"$set": {"source.source_has_ad": False}},
)
briefs_result = await db.job_briefs.update_many(
{"source_has_ad": {"$exists": False}},
{"$set": {"source_has_ad": False}},
)
print(f"✅ Backfilled source_has_ad on {jobs_result.modified_count} jobs, {briefs_result.modified_count} job_briefs")
async def down(self) -> None:
db = self.db
await db.jobs.update_many({}, {"$unset": {"source.source_has_ad": ""}})
await db.job_briefs.update_many({}, {"$unset": {"source_has_ad": ""}})


@ -1,17 +1,18 @@
"""Audit log model for tracking sensitive operations."""
from datetime import datetime
from enum import Enum
from typing import Any, Dict, Optional
from enum import StrEnum
from typing import Any
from bson import ObjectId
from pydantic import BaseModel, Field
from .user import PyObjectId
class AuditAction(str, Enum):
class AuditAction(StrEnum):
"""Enumeration of auditable actions."""
# Authentication actions
LOGIN_SUCCESS = "auth.login.success"
LOGIN_FAILURE = "auth.login.failure"
@ -19,7 +20,7 @@ class AuditAction(str, Enum):
TOKEN_REFRESH = "auth.token.refresh"
PASSWORD_CHANGE = "auth.password.change"
PASSWORD_RESET = "auth.password.reset"
# User management actions
USER_CREATE = "user.create"
USER_UPDATE = "user.update"
@ -27,7 +28,7 @@ class AuditAction(str, Enum):
USER_ROLE_CHANGE = "user.role.change"
USER_ACTIVATE = "user.activate"
USER_DEACTIVATE = "user.deactivate"
# Job management actions
JOB_CREATE = "job.create"
JOB_UPDATE = "job.update"
@ -36,24 +37,89 @@ class AuditAction(str, Enum):
JOB_REJECT = "job.reject"
JOB_CANCEL = "job.cancel"
JOB_STATUS_CHANGE = "job.status.change"
JOB_TASK_FAILED = "job.task.failed"
JOB_RETRY = "job.retry"
JOB_BULK_RETRY = "job.bulk_retry"
# File operations
FILE_UPLOAD = "file.upload"
FILE_DOWNLOAD = "file.download"
FILE_DELETE = "file.delete"
FILE_ACCESS = "file.access"
# VTT editing actions
VTT_EDIT = "vtt.edit"
VTT_APPROVE = "vtt.approve"
VTT_REJECT = "vtt.reject"
VTT_RETRANSLATE = "vtt.retranslate"
# Per-language QC actions
LANGUAGE_QC_ASSIGN = "language_qc.assign"
LANGUAGE_QC_REASSIGN = "language_qc.reassign"
LANGUAGE_QC_REVIEWER_ASSIGN = "language_qc.reviewer_assign"
LANGUAGE_QC_REVIEWER_REASSIGN = "language_qc.reviewer_reassign"
LANGUAGE_QC_SUBMIT = "language_qc.submit"
LANGUAGE_QC_OPEN_REVIEW = "language_qc.open_review"
LANGUAGE_QC_APPROVE = "language_qc.approve"
LANGUAGE_QC_REJECT = "language_qc.reject"
LANGUAGE_QC_REOPEN = "language_qc.reopen"
LANGUAGE_QC_COMMENT = "language_qc.comment"
# Admin actions
ADMIN_CONFIG_CHANGE = "admin.config.change"
ADMIN_SYSTEM_ACTION = "admin.system.action"
ADMIN_DATA_EXPORT = "admin.data.export"
ADMIN_AUDIT_ACCESS = "admin.audit.access"
# Glossary management
GLOSSARY_UPLOAD = "glossary.upload"
GLOSSARY_VERSION_UPLOAD = "glossary.version.upload"
GLOSSARY_ACTIVATE = "glossary.activate"
GLOSSARY_ARCHIVE = "glossary.archive"
# Client management
CLIENT_CREATE = "client.create"
CLIENT_UPDATE = "client.update"
CLIENT_DEACTIVATE = "client.deactivate"
CLIENT_PM_ASSIGN = "client.pm_assign"
CLIENT_PM_REMOVE = "client.pm_remove"
CLIENT_TEAM_CREATE = "client.team_create"
CLIENT_TEAM_UPDATE = "client.team_update"
CLIENT_TEAM_DELETE = "client.team_delete"
CLIENT_TEAM_MEMBER_ADD = "client.team_member_add"
CLIENT_TEAM_MEMBER_REMOVE = "client.team_member_remove"
CLIENT_PROJECT_CREATE = "client.project_create"
CLIENT_PROJECT_UPDATE = "client.project_update"
CLIENT_PROJECT_ARCHIVE = "client.project_archive"
# Organization management
ORG_CREATE = "org.create"
ORG_UPDATE = "org.update"
ORG_MEMBER_ADD = "org.member_add"
ORG_MEMBER_UPDATE = "org.member_update"
ORG_MEMBER_REMOVE = "org.member_remove"
# Invitations
INVITATION_CREATE = "invitation.create"
INVITATION_REVOKE = "invitation.revoke"
INVITATION_ACCEPT = "invitation.accept"
# Language QC (additional)
LANGUAGE_QC_BULK_ASSIGN = "language_qc.bulk_assign"
LANGUAGE_QC_START_WORK = "language_qc.start_work"
LANGUAGE_QC_MARK_CUE_REVIEWED = "language_qc.mark_cue_reviewed"
# Brief management
BRIEF_CREATE = "brief.create"
BRIEF_UPDATE = "brief.update"
BRIEF_SUBMIT = "brief.submit"
BRIEF_APPROVE = "brief.approve"
# Share tokens
SHARE_TOKEN_CREATE = "share.token_create"
SHARE_TOKEN_REVOKE = "share.token_revoke"
SHARE_CLIENT_DECISION = "share.client_decision"
# Security events
RATE_LIMIT_EXCEEDED = "security.rate_limit.exceeded"
VALIDATION_FAILURE = "security.validation.failure"
@ -61,9 +127,9 @@ class AuditAction(str, Enum):
SUSPICIOUS_ACTIVITY = "security.suspicious.activity"
class AuditLogSeverity(str, Enum):
class AuditLogSeverity(StrEnum):
"""Severity levels for audit events."""
INFO = "info" # Normal operations
WARNING = "warning" # Suspicious but not critical
ERROR = "error" # Failed operations
@ -72,43 +138,43 @@ class AuditLogSeverity(str, Enum):
class AuditLog(BaseModel):
"""Audit log entry model."""
id: Optional[PyObjectId] = Field(default_factory=PyObjectId, alias="_id")
id: PyObjectId | None = Field(default_factory=lambda: str(ObjectId()), alias="_id")
# Core audit fields
timestamp: datetime = Field(default_factory=datetime.utcnow)
action: AuditAction
severity: AuditLogSeverity = AuditLogSeverity.INFO
# Actor information
user_id: Optional[PyObjectId] = None
user_email: Optional[str] = None
user_role: Optional[str] = None
user_id: PyObjectId | None = None
user_email: str | None = None
user_role: str | None = None
# Request context
ip_address: Optional[str] = None
user_agent: Optional[str] = None
request_id: Optional[str] = None
session_id: Optional[str] = None
ip_address: str | None = None
user_agent: str | None = None
request_id: str | None = None
session_id: str | None = None
# Resource information
resource_type: Optional[str] = None # e.g., "job", "user", "file"
resource_id: Optional[str] = None
resource_name: Optional[str] = None
resource_type: str | None = None # e.g., "job", "user", "file"
resource_id: str | None = None
resource_name: str | None = None
# Action details
description: str
details: Dict[str, Any] = Field(default_factory=dict)
details: dict[str, Any] = Field(default_factory=dict)
# Outcome
success: bool = True
error_message: Optional[str] = None
error_message: str | None = None
# Additional metadata
environment: str = "prod"
service_name: str = "accessible-video-api"
api_version: str = "v1"
class Config:
populate_by_name = True
arbitrary_types_allowed = True
@ -117,49 +183,49 @@ class AuditLog(BaseModel):
class AuditLogCreate(BaseModel):
"""Schema for creating audit log entries."""
action: AuditAction
severity: AuditLogSeverity = AuditLogSeverity.INFO
description: str
# Optional fields that can be provided
user_id: Optional[PyObjectId] = None
user_email: Optional[str] = None
user_role: Optional[str] = None
ip_address: Optional[str] = None
user_agent: Optional[str] = None
request_id: Optional[str] = None
resource_type: Optional[str] = None
resource_id: Optional[str] = None
resource_name: Optional[str] = None
details: Dict[str, Any] = Field(default_factory=dict)
user_id: PyObjectId | None = None
user_email: str | None = None
user_role: str | None = None
ip_address: str | None = None
user_agent: str | None = None
request_id: str | None = None
resource_type: str | None = None
resource_id: str | None = None
resource_name: str | None = None
details: dict[str, Any] = Field(default_factory=dict)
success: bool = True
error_message: Optional[str] = None
error_message: str | None = None
class AuditLogQuery(BaseModel):
"""Schema for querying audit logs."""
# Time range
start_date: Optional[datetime] = None
end_date: Optional[datetime] = None
start_date: datetime | None = None
end_date: datetime | None = None
# Filters
action: Optional[AuditAction] = None
severity: Optional[AuditLogSeverity] = None
user_id: Optional[PyObjectId] = None
user_email: Optional[str] = None
resource_type: Optional[str] = None
resource_id: Optional[str] = None
success: Optional[bool] = None
action: AuditAction | None = None
severity: AuditLogSeverity | None = None
user_id: PyObjectId | None = None
user_email: str | None = None
resource_type: str | None = None
resource_id: str | None = None
success: bool | None = None
# Search
search: Optional[str] = None # Full-text search in description and details
search: str | None = None # Full-text search in description and details
# Pagination
skip: int = 0
limit: int = 100
# Sorting
sort_by: str = "timestamp"
sort_order: int = -1 # -1 for descending, 1 for ascending
@ -167,7 +233,7 @@ class AuditLogQuery(BaseModel):
class AuditLogResponse(BaseModel):
"""Response schema for audit log queries."""
logs: list[AuditLog]
total_count: int
page: int


@ -1,5 +1,5 @@
from datetime import datetime
from typing import Optional, Annotated
from typing import Annotated
from bson import ObjectId
from pydantic import BaseModel, BeforeValidator
@ -17,12 +17,12 @@ PyObjectId = Annotated[str, BeforeValidator(validate_object_id)]
class Client(BaseModel):
id: Optional[str] = None
id: str | None = None
name: str
slug: str
is_active: bool = True
created_at: Optional[datetime] = None
updated_at: Optional[datetime] = None
created_at: datetime | None = None
updated_at: datetime | None = None
class ClientCreate(BaseModel):
@ -31,18 +31,18 @@ class ClientCreate(BaseModel):
class ClientUpdate(BaseModel):
name: Optional[str] = None
slug: Optional[str] = None
is_active: Optional[bool] = None
name: str | None = None
slug: str | None = None
is_active: bool | None = None
class Team(BaseModel):
id: Optional[str] = None
id: str | None = None
name: str
client_id: str
member_user_ids: list[str] = []
created_at: Optional[datetime] = None
updated_at: Optional[datetime] = None
created_at: datetime | None = None
updated_at: datetime | None = None
class TeamCreate(BaseModel):
@ -50,22 +50,31 @@ class TeamCreate(BaseModel):
class TeamUpdate(BaseModel):
name: Optional[str] = None
name: str | None = None
class Project(BaseModel):
id: Optional[str] = None
id: str | None = None
name: str
client_id: str
is_active: bool = True
created_at: Optional[datetime] = None
updated_at: Optional[datetime] = None
default_languages: list[str] = []
default_linguist_id: str | None = None
default_reviewer_id: str | None = None
created_at: datetime | None = None
updated_at: datetime | None = None
class ProjectCreate(BaseModel):
name: str
default_languages: list[str] = []
default_linguist_id: str | None = None
default_reviewer_id: str | None = None
class ProjectUpdate(BaseModel):
name: Optional[str] = None
is_active: Optional[bool] = None
name: str | None = None
is_active: bool | None = None
default_languages: list[str] | None = None
default_linguist_id: str | None = None
default_reviewer_id: str | None = None


@ -0,0 +1,142 @@
from __future__ import annotations
from datetime import datetime
from enum import StrEnum
from pydantic import BaseModel, Field
class GlossarySource(StrEnum):
XLSX_UPLOAD = "xlsx_upload"
FRAZE_API = "fraze_api" # reserved for future FRAZE integration
class GlossaryStatus(StrEnum):
ACTIVE = "active"
ARCHIVED = "archived"
class EmbeddingStatus(StrEnum):
PENDING = "pending"
IN_PROGRESS = "in_progress"
DONE = "done"
FAILED = "failed"
class Glossary(BaseModel):
id: str | None = Field(None, alias="_id")
client_id: str
name: str
description: str | None = None
source_locale: str # BCP-47 source column, e.g. "en-GB"
source: GlossarySource = GlossarySource.XLSX_UPLOAD
status: GlossaryStatus = GlossaryStatus.ACTIVE
current_version_id: str | None = None
created_at: datetime = Field(default_factory=datetime.utcnow)
created_by: str # user_id
model_config = {"populate_by_name": True, "arbitrary_types_allowed": True}
class GlossaryVersion(BaseModel):
id: str | None = Field(None, alias="_id")
glossary_id: str
version_number: int
source_xlsx_gcs_path: str | None = None # GCS path to original file
term_count: int = 0
embedded_count: int = 0
embedding_status: EmbeddingStatus = EmbeddingStatus.PENDING
created_at: datetime = Field(default_factory=datetime.utcnow)
created_by: str
change_note: str | None = None
model_config = {"populate_by_name": True}
class GlossaryTerm(BaseModel):
"""One source term with its per-locale translations."""
id: str | None = Field(None, alias="_id")
glossary_id: str
version_id: str
cid: str | None = None # 3M Content ID from xlsx
tid: str | None = None # 3M Term ID from xlsx
source_term: str # canonical source text (whitespace-normalised)
source_term_lower: str # lowercase for case-insensitive index
translations: dict[str, str] = {} # {locale_code: translated_text}
embedding: list[float] | None = None # 768-dim Gemini embedding
model_config = {"populate_by_name": True}
# ── Schema models (API request/response) ──────────────────────────────────────
class GlossaryCreate(BaseModel):
name: str
description: str | None = None
source_locale: str
change_note: str | None = None
class GlossaryVersionCreate(BaseModel):
source_locale: str
change_note: str | None = None
class GlossaryResponse(BaseModel):
id: str
client_id: str
name: str
description: str | None = None
source_locale: str
source: GlossarySource
status: GlossaryStatus
current_version_id: str | None = None
current_version_embedding_status: EmbeddingStatus | None = None
current_version_embedded_count: int | None = None
current_version_term_count: int | None = None
created_at: datetime
created_by: str
class GlossaryVersionResponse(BaseModel):
id: str
glossary_id: str
version_number: int
term_count: int
embedded_count: int
embedding_status: EmbeddingStatus
created_at: datetime
created_by: str
change_note: str | None = None
class GlossaryDetailResponse(GlossaryResponse):
versions: list[GlossaryVersionResponse] = []
class GlossaryTermPreview(BaseModel):
"""Subset of GlossaryTerm for UI previews."""
source_term: str
translations: dict[str, str]
class MatchedTerm(BaseModel):
"""A term matched against VTT source text, with the target-locale translation."""
source_term: str
target_translation: str
match_kind: str # "exact" | "vector"
score: float # 1.0 for exact, cosine similarity for vector
def glossary_from_doc(doc: dict) -> Glossary:
doc = dict(doc)
if "_id" in doc:
doc["_id"] = str(doc["_id"])
return Glossary.model_validate(doc)
def glossary_version_from_doc(doc: dict) -> GlossaryVersion:
doc = dict(doc)
if "_id" in doc:
doc["_id"] = str(doc["_id"])
return GlossaryVersion.model_validate(doc)
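`MatchedTerm.score` is documented above as 1.0 for exact matches and cosine similarity for vector matches. The matching code itself is not part of this diff, so the following is only a plain-Python sketch of the metric: `cosine_similarity` is a hypothetical helper, and the short vectors stand in for the 768-dimensional embeddings mentioned on `GlossaryTerm.embedding`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch, not taken from the source
    return dot / (norm_a * norm_b)


identical = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Identical directions score close to 1.0 and unrelated ones close to 0.0, which is consistent with treating an exact match as a score of 1.0.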


@ -1,5 +1,4 @@
from datetime import datetime
from typing import Optional
from pydantic import BaseModel, EmailStr
@ -7,7 +6,7 @@ from .organization import OrgRole
class Invitation(BaseModel):
id: Optional[str] = None
id: str | None = None
email: str
organization_id: str
role_in_org: OrgRole
@ -15,9 +14,9 @@ class Invitation(BaseModel):
token_hash: str
invited_by_user_id: str
expires_at: datetime
accepted_at: Optional[datetime] = None
revoked_at: Optional[datetime] = None
created_at: Optional[datetime] = None
accepted_at: datetime | None = None
revoked_at: datetime | None = None
created_at: datetime | None = None
class InvitationCreate(BaseModel):
@ -40,9 +39,9 @@ class InvitationPreviewResponse(BaseModel):
class InvitationAcceptRequest(BaseModel):
token: str
full_name: Optional[str] = None
password: Optional[str] = None
ms_id_token: Optional[str] = None
full_name: str | None = None
password: str | None = None
ms_id_token: str | None = None
class InvitationResponse(BaseModel):
@ -52,9 +51,9 @@ class InvitationResponse(BaseModel):
role_in_org: OrgRole
invited_by_user_id: str
expires_at: datetime
accepted_at: Optional[datetime] = None
revoked_at: Optional[datetime] = None
created_at: Optional[datetime] = None
accepted_at: datetime | None = None
revoked_at: datetime | None = None
created_at: datetime | None = None
is_expired: bool = False
is_accepted: bool = False
is_revoked: bool = False

Some files were not shown because too many files have changed in this diff