Commit graph

228 commits

Author SHA1 Message Date
Vadym Samoilenko
f91cb16005 fix(middleware): add word boundaries to injection patterns; default role to admin
- Add \b word boundaries to SQL injection and command injection regex patterns
  to prevent false positives on names like "Josh Smith" (sh\s+), "Norm " (rm\s+)
- Change default role in CreateUserModal from 'client' to 'admin'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 09:45:28 +01:00
Vadym Samoilenko
3a2bbc9ca0 fix(membership): correct \$unwind option preserveNullAndEmpty → preserveNullAndEmptyArrays
MongoDB 7.0 rejects the invalid key with code 28811, causing 500 on
GET /organizations/{id}/members.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 18:58:07 +01:00
Vadym Samoilenko
6588feedc7 fix(membership): use 'or ""' to guard against null email/full_name
u.get("key", "") returns None when key exists with null value in MongoDB,
causing Pydantic ValidationError on MemberDetail.email/full_name: str → 500.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 18:41:31 +01:00
Vadym Samoilenko
90867e9824 fix(qc): show captions on accessible video + allow admin/PM as linguist/reviewer
- Add retimed captions overlay to accessible video player in QCDetail;
  falls back to original captions if retimed VTT not yet generated
- Extend listUsers to accept comma-separated roles (e.g. linguist,admin)
  so admin/production users appear in linguist/reviewer assignment dropdowns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 18:07:55 +01:00
Vadym Samoilenko
68ac65ac05 fix(briefs): fix Project/Assign-To dropdowns and expand Requested Outputs
Projects:
- PM now sees all active projects (same as admin/production) — was filtering
  to empty when pm_client_ids and org memberships were both unset

Assign To:
- Replaced useOrganizations()+useOrgMembers() with a new GET /admin/brief-assignees
  endpoint accessible to all authenticated users — returns active admin/PM/production
  users sorted by name; shows role next to name in dropdown

Requested Outputs:
- Added SDH Captions (VTT), Descriptive Transcript, Accessible Video (MP4)
- Accessible Video shows Pause Insert / Voice Overlay radio selector
- Added descriptive_transcript field to RequestedOutputs model (backend + frontend)

Access:
- Brief routes now open to 'client' role in addition to admin/PM/production

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 17:54:21 +01:00
Vadym Samoilenko
a3cfe2ff8c fix(renderer): skip ffprobe Phase 3.5 — use pre-computed freeze duration
Cloud Run-generated freeze segments caused FFprobe to return code 1 with
empty stderr when dispatched to the Celery ffmpeg queue, crashing the
render for every language. The freeze segments are created to an exact
pre-computed duration (ad_duration + silence_before + silence_after),
so probing is unnecessary — assign that value directly instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 17:29:33 +01:00
Vadym Samoilenko
df7fec701d fix(ui): connection dot in navbar, profile page, render error visibility + audit log
- Navbar: add WebSocket connection dot (green/yellow/red/gray) from GlobalWebSocketContext
- Profile page: /profile route shows email, full_name, role, auth_provider, languages
- JobResponse: expose failure and error fields (were stored in MongoDB but not returned)
  so frontend now shows actual render error message instead of generic fallback
- render_accessible_video: write JOB_TASK_FAILED audit log entry on render failure
  with language, error detail, step=render
- rerender_accessible_video: same audit log on re-render failure, step=rerender

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 16:19:12 +01:00
Vadym Samoilenko
2f4925353a feat(pause-insert): adaptive buffer, forward-snap, timeline drag + share link fix
Backend (Phase A):
- A1: Adaptive silence buffer — natural_gap_ms persisted per cue; renderer computes
  per-cue silence_before/silence_after instead of fixed 500ms; per-cue silence files
- A2: Forward-preferred snap — snap_pause_point prefers boundaries up to 4s ahead
  over boundaries within 1.5s behind, reducing mid-scene cuts
- A3: Min-gap validation — pause points with < 200ms gap trigger forward search
  to the next acceptable gap
- natural_gap_ms added to PausePointData model and api.ts type
- New config fields: whisper_snap_forward_window, whisper_snap_backward_window,
  ad_silence_buffer_default, ad_silence_buffer_min_after, ad_min_acceptable_gap
- Tests: test_whisper_snap.py (13 tests), test_video_renderer_buffers.py

Frontend (Phase B):
- B1: Drag pause-point markers — pointer state machine with 3px move threshold,
  clamp to min/max bounds, click-without-move still opens PausePointEditor
- B2: Drag freeze blocks — orange blocks translate with linked pause point
- B3: Time tooltip visible during drag, hidden on release
- Tests: TimelinePreview.drag.test.tsx (10 tests)

Fixes:
- Share link pointed to ai-sandbox.oliver.solutions — added app_url to Settings
  with correct optical-dev.oliver.solutions default; share_url now configurable
  via APP_URL env var
- Removed all ai-sandbox.oliver.solutions references from docker-compose,
  apache config, docs, and scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 16:09:09 +01:00
Vadym Samoilenko
8dee0b6ff5 fix(membership): guard against missing/invalid role_in_org in membership docs
list_org_members and _membership_from_doc used bracket access on role_in_org
which raises KeyError if the field is absent (old docs or direct DB inserts).
Also handles ValueError if the stored value doesn't match a valid OrgRole.
Falls back to OrgRole.MEMBER in both cases.

Fixes 500 on GET /organizations/{org_id}/members.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 14:35:09 +01:00
Vadym Samoilenko
997c1f622b fix(rbac): allow reviewer role to assign linguists and reviewers
assign, assign-reviewer, reassign-reviewer, and bulk-assign endpoints
were gated to project_manager/production/admin only, but the Reviewer
QC Detail page exposes Assign buttons to reviewer users.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 14:29:15 +01:00
Vadym Samoilenko
6559ccc1f9 feat(help): in-app role-based help guides + screenshot capture pipeline
- Help.tsx: role tabs, TOC scroll-spy, search, lightbox, react-markdown renderer
- 7 markdown guides (global, client, linguist, reviewer, production, PM, admin)
  with explicit click/drag/keyboard annotations throughout
- Sidebar: Help button added at bottom of nav (all roles)
- App.tsx: /help route, no RoleGate
- frontend/public/help-screenshots/{role}/: directories ready for screenshots
- tools/capture-help-screenshots.ts: Playwright screenshot script
  - Clicks "Local login" toggle before filling credentials
  - Uses test-admin local account (not SSO)
- backend/scripts/seed_test_users.py: idempotent MongoDB seed script
  creates 6 local-auth users (admin + 5 roles) for capture + local dev
- .env.screenshots.example: template with test-admin credentials
- Removes docs/video_accessibility_user_guide_v3.md (superseded by in-app guides)
- Deps: react-markdown, remark-gfm, rehype-raw added to frontend

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 13:08:13 +01:00
Vadym Samoilenko
9e6ce657bf fix(schema): empty string → None for captions/AD VTT fields (Bug 2B)
Frontend sends audio_description_vtt: "" for CC-only jobs.
Pydantic validator converts "" to None before validation,
so the backend skips VTT format validation and returns 200
instead of 400.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 12:06:09 +01:00
Vadym Samoilenko
f2968a2989 fix(vtt): regenerate descriptive_transcript.txt after PATCH /vtt saves
Bug 1: Editing any AD cue never updated descriptive_transcript.txt in GCS.
Bug 2A: Uploading replacement CC or AD .vtt had the same root cause.

After saving captions or AD VTT, read the other stream from GCS if not
provided in the request, merge both via generate_descriptive_transcript(),
upload the result to {job_id}/{lang}/descriptive_transcript.txt, and
update lang_output["descriptive_transcript_gcs"] before the DB write.

Bug 2B (CC-only job → 400 on empty audio_description_vtt): already fixed
by the existing `if request.audio_description_vtt:` guard (empty string
and None are both falsy) and frontend `adVtt || undefined` sending no
field rather than an empty string.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 12:03:35 +01:00
Vadym Samoilenko
32b12ff0a6 feat(ux): P2 role UX — reviewer queue, dashboard widgets, org filter, WS toast
Phase 2.3: VttEditor sticky banner + Re-translate wired into QCDetail
Phase 3.1: RoleGate on /briefs/* (PM/admin/production only)
Phase 3.2: LinguistQueue — sortable Assigned column, defaultRole prop
Phase 3.3: ReviewerQueue component + /qc/reviewer-queue route + sidebar entry
Phase 3.4: PM dashboard — Overdue and Stuck >24h widgets
Phase 3.5: Production dashboard — Awaiting Upload and Pending QC Handoff widgets
Phase 3.6: Admin UserList — org_id filter dropdown (uses listOrganizations)
WebSocket: onTerminalClose callback + error toast in GlobalWebSocketContext
Runbook: Apache ProxyTimeout ≥60s recommendation for WebSocket keepalives
Backend: fix F841 unused variable in test_cross_tenant_isolation.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 11:58:29 +01:00
Vadym Samoilenko
b427ee9f49 fix(authz): MT-3/6/7/8 org isolation + P1 English-first QC enforcement
Multi-tenancy isolation (P0):
- MT-3: Add get_job_or_403 (org membership check) to all 19+ job action endpoints
- MT-6: Same gate added to all review_notes (5) and vtt_versions (4) handlers
- MT-7: WebSocket /ws/jobs/{job_id} closes with 4403 on org mismatch;
  /ws/jobs passes accessible_org_ids to ConnectionManager; server-side
  keepalive at 20 s (asyncio.wait_for timeout) prevents proxy idle drops
- MT-8: list_users scoped to org memberships for non-platform-admins

WebSocket fixes (Mod Comms 2026-03-18 incident):
- Frontend heartbeat lowered 30 000 → 20 000 ms (was at Apache timeout edge)
- Terminal close codes 4001/4003/4004/4403 no longer trigger reconnect loop
- Silently discard server "keepalive" frames alongside existing "pong"

English-first QC (P1):
- _assert_can_approve blocks target language approval until source is APPROVED
- PRODUCTION/ADMIN roles bypass the gate
- Source VTT edits reset stale APPROVED/PENDING_REVIEW/IN_REVIEW target states

Tests (all passing):
- backend/tests/unit/test_language_qc_english_first.py (15 cases)
- backend/tests/unit/test_routes_jobs_org_isolation.py (12 cases)
- backend/tests/unit/test_review_notes_org_isolation.py (16 parametrized cases)
- backend/tests/unit/test_vtt_versions_org_isolation.py (16 parametrized cases)
- backend/tests/unit/test_websocket_org_isolation.py (11 cases)
- backend/tests/unit/test_admin_users_org_filter.py (7 cases)
- frontend: useJobStatusWebSocket.terminal.test.ts (9 cases)
- frontend: useJobStatusWebSocket.heartbeat.test.ts (9 cases)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 11:43:10 +01:00
Vadym Samoilenko
5d8d992e5a feat(briefs+notify+downloads): fix projects dropdown, add assignee, expand languages, fix PM email, add Download All
- NewBrief: use useAllProjects() (was useProjects('') which never fired)
- NewBrief: expand languages from 12 to 52 options with region variants
- NewBrief: add Assign To dropdown from org members
- Backend: add GET /clients/all-projects endpoint for cross-client project listing
- Backend: add assignee_id to JobBriefCreate/JobBriefResponse models + routes
- notify.py: send completion email to PMs (pm_client_ids) not client user — fixes email never arriving (was looking up users._id by client entity ID)
- Downloads: add Download All button that fetches all files sequentially

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 21:47:28 +01:00
Vadym Samoilenko
3bed598025 fix(glossary+jobs): add debug logging for glossary failures and fix AllJobs filter stale state
- glossary_service: add step-by-step debug/warning logs at each early-return point so
  the exact failure reason is visible in worker logs (project not found, no active version, etc.)
- glossary_service: guard against source_term_lower=None in ahocorasick automaton build
- glossary_service: guard against target_locale=None in _get_translation
- glossary_service: add full traceback to the outer exception catch for easier debugging
- JobsList: fix statusFilter stale state — useEffect now always syncs with URL params,
  clearing the filter when no ?status= param is present (previously the filter was never
  cleared, so navigating from /jobs?status=X to /jobs kept the old filter)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 21:25:41 +01:00
Vadym Samoilenko
713ae46d4a fix(tts): revert pro TTS to gemini-2.5-pro-preview-tts (3.1 pro TTS doesn't exist yet)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 21:01:22 +01:00
Vadym Samoilenko
3fb8dce3ee feat(ai): upgrade Gemini models to 3.1-pro-preview and 3.1-pro-tts-preview
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 21:00:32 +01:00
Vadym Samoilenko
12fe4ebcbb feat(tts): upgrade Gemini TTS model to gemini-3.1-flash-tts-preview
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 20:57:37 +01:00
Vadym Samoilenko
43ef3a6cd8 fix(migrations): correct listCollections cursor parsing, add processing_failed+cancelled to status enum
Previous migrations used async-for on a dict (Atlas returns firstBatch, not
async cursor) — silently failed. New migration reads firstBatch correctly and
sets the complete status list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 20:47:21 +01:00
Vadym Samoilenko
8a1440201e fix(migrations): connect to mongo before running migrations in run.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 20:43:48 +01:00
Vadym Samoilenko
99554173e6 feat(migrations): add run.py entry point for python -m app.migrations.run
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 20:41:52 +01:00
Vadym Samoilenko
2e8cf8269e fix(tts): fetch job_doc before gcs_path call in _generate_language_tts; add cancelled migration
- translate_and_synthesize.py: fetch job_doc from DB right before the combined
  MP3 upload so gcs_path() has the gcs_prefix needed for newer jobs; removes the
  duplicate fetch that existed later in the same function
- migration_2026-04-30-000001: add 'cancelled' to MongoDB $jsonSchema validator
  enum so cancel_job writes no longer fail Document validation
- Dashboard.tsx: include all active processing statuses in the Processing counter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 20:36:03 +01:00
Vadym Samoilenko
f681bd4f53 feat: add Stop Process button to cancel in-progress jobs
Adds POST /jobs/{id}/cancel endpoint that revokes the Celery task and
sets status to 'cancelled'. Shows a confirmation widget in the job
detail sidebar for admin/production roles when the job is in an active
processing state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 19:50:39 +01:00
Vadym Samoilenko
08a8a0d636 fix(tts): convert lameenc bytearray to bytes before GCS upload
lameenc.encode() returns bytearray, but google-cloud-storage's
_to_bytes() only accepts bytes/str — causing TypeError on every
upload_from_string() call. Cast to bytes() before returning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 19:35:28 +01:00
Vadym Samoilenko
77a9d3b255 fix(docker): add ffmpeg to base image — fixes pydub AudioSegment in worker
ffmpeg was missing from the base image, causing all pydub operations
(AudioSegment.from_file, export) to fail in worker and tts-worker containers.
Moved ffmpeg install from whisper-worker stage to the shared base stage so
all container variants (api, worker, tts-worker, whisper-worker) have it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 19:12:57 +01:00
Vadym Samoilenko
7c15acc18a chore: update poetry.lock after adding lameenc dependency
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 18:34:04 +01:00
Vadym Samoilenko
a53cf960ae fix(tts): replace pydub MP3 export with lameenc (pure Python, no system ffmpeg)
Gemini TTS _pcm_to_mp3 used pydub.AudioSegment.export(format='mp3') which
requires a system ffmpeg binary. Worker containers don't have ffmpeg installed
(video ops run on Cloud Run). Switch to lameenc which is pure Python and
encodes PCM→MP3 without any system binary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 18:24:15 +01:00
Vadym Samoilenko
105895dd14 feat: apply EN source VTT changes to all target languages
When a reviewer saves the source language VTT during QC and confirms
the re-translate dialog, all target languages are re-translated via
Celery. Job transitions to `translating` and returns to `pending_qc`
when done. Existing polling in useJob covers progress display.

- schemas/job.py: add `retranslate_languages: bool` to VttUpdateRequest
- audit_log.py: add VTT_RETRANSLATE audit action
- translate_and_synthesize_task: accept languages/retranslate params,
  filter to specified languages, skip video render, return to PENDING_QC
- routes_jobs.py: add _trigger_retranslation helper, call after VTT save
- types/api.ts: add retranslate_languages to VttUpdateRequest
- useJob.ts: invalidate all lang VTTs on retranslate
- QCDetail.tsx: confirmation dialog when saving source VTT with targets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 17:13:06 +01:00
Vadym Samoilenko
31199f8705 chore: push all session changes — backend hardening, tests, apache config, deploy scripts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 15:52:14 +01:00
Vadym Samoilenko
5fd370c093 test: fix all unit tests — 168 passing, 0 failures
- conftest.py: set required env vars before app import to prevent Settings() crash
- gcs.py: lazy bucket init checks _bucket instead of _client; add @bucket.setter
- vtt.py: fix float precision in _format_timestamp; include empty-text cues in parser
- security.py: guard verify_password against empty hash (passlib UnknownHashError)
- tts.py: _parse_timestamp raises ValueError("Invalid timestamp format: …")
- emailer.py: HTML-escape job_title in _render_completion_template (XSS fix)
- test_emailer.py: rewrite for Mailgun-based service (replaced SendGrid)
- test_gcs.py: fix UploadFile constructor, MIME type, remove executor.submit mock
- test_gemini.py: patch module-level client instead of non-existent genai.upload_file;
  translate_vtt tests use numbered-list mock responses matching new implementation
- test_tts.py: fix aiohttp async CM mock pattern; fix error message match
- test_models.py: update JobCreate to use source_is_english instead of language
- test_security.py: set jwt_access_ttl_min in token test
- test_cross_tenant_isolation.py: add patch to imports

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 14:02:04 +01:00
Vadym Samoilenko
d5e63129dd feat(upload): PR-3 GCS resumable chunked upload for large videos
Files >100 MB bypass the load balancer via browser→GCS direct upload:
- POST /jobs/upload/init — creates GCS resumable session, returns job_id + session URI
- POST /jobs/upload/complete — verifies GCS object, creates job, dispatches ingestion
- Frontend sends 8 MB chunks with Content-Range directly to GCS session URI
- infra/gcs-cors.json + deploy-dev.sh ensure_gcs_cors() enable browser CORS on bucket

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 11:35:13 +01:00
Vadym Samoilenko
c1948ea198 feat(ux): T-2/PR-7/PR-8 — status color helper, queue stats widget, upload-final-VTT override
T-2: Extract getJobStatusColor() into utils/jobStatusMessages.ts; StatusBadge now uses the
     shared helper (single source of truth for badge colors).

PR-7: GET /admin/production/queue-stats — returns Celery queue depths via Redis LLEN.
      Production dashboard shows a live panel (10s refresh) with per-queue task counts.

PR-8: POST /admin/production/jobs/{id}/upload-final-vtt — Production/Admin can upload a
      hand-crafted VTT to bypass AI, writing to GCS and advancing the job to PENDING_QC.
      Upload modal added to FailuresList with language + type (captions/ad) selectors.

docker-compose.optical-dev.yml: enable USE_CELERY_FALLBACK=true, set worker replicas=1
      for all pipeline workers (ffmpeg/tts/whisper) with WORKER_CONCURRENCY=2 so the full
      pipeline runs on the 2-CPU optical-dev server until Cloud Run VPC Connector is ready.

Fix: remove unused effectiveMs variable in TimelinePreview (TS6133).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 11:12:36 +01:00
Vadym Samoilenko
e4b350cd7d feat(ux): R-8 linguist language warn, PM CC editing, timeline right-click + CC insert
R-8 — Linguist language competence:
- Add User.languages[] BCP-47 field to backend model + UserResponse schema
- Frontend: show amber warning in assign modal when selected linguist has no
  competence listed for the target language

PM VTT editing (FinalDetail):
- PM and ADMIN can now edit captions/AD in the final review stage
- VttEditor becomes read-write with onCueSave wired to updateVttMutation
- Other roles remain read-only

Timeline right-click + add pause:
- Right-click anywhere on the timeline opens a context menu showing the timestamp
- If near a pause point marker: "Edit timing" + "Regenerate TTS" options
- If on empty space: "Add AD cue at Xs" → inserts a new AD cue in the editor
- Pause point markers widened from 1px → 2px (3px on hover) for easier clicking
- Right-click on a pause point marker directly opens the editor

VttEditor insertAtTimeMs prop:
- New prop triggers programmatic insert at a specific video timestamp
- Used by the timeline right-click "Add AD cue here" action

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 10:51:31 +01:00
Vadym Samoilenko
3f557724d3 feat(api): L-18 blocked-on-source, PR-10 promote-to-qc, R-12 reviewed_cues reset
- POST /{job_id}/actions/blocked_on_source (L-18): linguist/reviewer flags a source
  video issue; moves job to QC_FEEDBACK and records blocked_on_source_reason/at/by
- POST /{job_id}/actions/promote_to_qc (PR-10): production/admin manually bypasses
  AI processing for edge-case failures; adds audit history entry
- Reset reviewed_cues to 0 on submit_for_review (R-12) so reviewer must re-acknowledge
  all cues after each linguist resubmit
- Add assert_job_in_user_org + get_user_org_ids to core/dependencies.py (used by
  the new endpoints and the cross-tenant isolation test suite)
- Remove unused ingest_and_ai_task / translate_and_synthesize_task imports

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 10:38:39 +01:00
Vadym Samoilenko
ff372c7322 fix(security): close MT-17/18/19, restore cross-tenant tests, quick wins
Blocks 1–5 of stabilization plan:

SECURITY
- validation.py: restore settings.upload_max_video_bytes (T-14 regression fix)
  and JSON object key validation that was incorrectly removed
- MT-18: add accessible_org_ids filter to list_for_reviewer/list_for_linguist
  so reviewers/linguists only see jobs from their own org in QC queue
- MT-17: add Membership.team_ids[], write to it on invitation acceptance and
  direct team add/remove; migration backfills from Team.member_user_ids
- MT-19: validate all target_team_ids belong to invitation's org_id at creation

TESTS
- Restore test_cross_tenant_isolation.py (was deleted, only .pyc remained)
- Extend with MT-18 reviewer org isolation tests

QUICK WINS
- W-8: remove time.sleep(1) + dead debug block from POST /jobs (task was
  undefined — would have caused NameError → HTTP 500 on every job creation)
- T-13: warn at startup when REDIS_URL configured but connection failed
- T-16: skip language_qc lifespan migration when count=0 (no DB scan on startup)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 10:32:23 +01:00
Vadym Samoilenko
ea30425a63 fix(migrations): version/description as class vars, not instance vars in Migration base
__init__ was setting self.version = "0000-00-00-000000" on every instantiation,
overriding the subclass class variable. All migrations were recorded in DB
with the default version instead of their own, causing duplicate key errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:16:12 +01:00
Vadym Samoilenko
89fa87ba8a refactor(docker): remove ffmpeg from api/worker images — runs on Cloud Run Jobs
Heavy pipeline tasks (ingest, translate, render, tts) now dispatch to
va-worker Cloud Run Job which has its own Dockerfile.cloudrun with ffmpeg.
API and lightweight Celery worker (notify/embed) don't need it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:08:25 +01:00
Vadym Samoilenko
f4a82dcf76 fix(migrations): replace relative imports with absolute in PR-7 migrations
Migration runner executes scripts outside package context — relative
imports fail. Pattern matches all other migration files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 22:05:32 +01:00
Vadym Samoilenko
b3ace22009 feat(infra): move heavy workers to Cloud Run Jobs
Heavy pipeline tasks (ingest, translate, render, rerender) now dispatch
to a Cloud Run Job (va-worker) instead of local Celery workers. optical-dev
runs only api + lightweight worker (notify/embed) within its 2-CPU budget.

- backend/app/tasks/runner.py — Cloud Run Job entrypoint
- backend/app/services/cloud_run_dispatch.py — replaces .delay() for heavy tasks
- backend/Dockerfile.cloudrun — Cloud Run worker image (ffmpeg included)
- docker-compose.optical-dev.yml — 2-CPU safe overrides, disables heavy workers
- cloudbuild.yaml — builds va-worker image and updates Cloud Run Job
- deploy-dev.sh — uses 3-file compose, builds only api+worker locally
- routes_jobs, routes_admin_production, ingest_and_ai, translate_and_synthesize
  — all dispatch sites updated to use cloud_run_dispatch.dispatch()

USE_CELERY_FALLBACK=true in .env.local to use Celery locally during dev.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 21:47:10 +01:00
Vadym Samoilenko
4623b89aeb feat(mt-16): JWT org_ids claim + transient user.org_ids in deps
- create_access_token gains optional org_ids: list[str] param; encodes
  {exp, sub, org_ids, v:2} — org_ids is a prefilter hint only, never
  used as authorization source of truth (Redis cache is authoritative)
- Login, MS login, refresh endpoints: fetch memberships and include
  org_ids in issued access tokens via _get_user_org_ids() helper
- routes_invitations.py accept flow: same org_ids population on token
- get_current_user: reads org_ids from payload, attaches as transient
  user.__dict__["org_ids"] — available to OrgScopedQuery for prefilter
- Force logout: rotate JWT_SECRET env var at deployment time (no code
  change needed; all existing tokens immediately invalidated)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:46:39 +01:00
Vadym Samoilenko
54fcf47887 feat(mt-14): gcs_prefix on Job, gcs_path helper, rewrite path sites
- gcs_path(job, *parts) helper in gcs.py: uses job.gcs_prefix if set,
  falls back to job._id (legacy) — backward-compatible for all old jobs
- create_job: sets gcs_prefix=orgs/{org_id}/jobs/{job_id} when
  organization_id is known; legacy jobs without org get null prefix
- Rewrote hardcoded f"{job_id}/{lang}/..." paths in:
  - ingest_and_ai.py (4 upload sites)
  - translate_and_synthesize.py (9 sites via bulk regex)
  - render_accessible_video.py (3 sites: segments, video, captions)
  - rerender_accessible_video.py (3 sites)
- tools/migrate_gcs_org_prefix.py: idempotent operator script —
  preflight checks, copy→verify(count+md5)→mongo update→delete,
  ThreadPoolExecutor(4), resume file, dry-run + rollback modes

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:45:12 +01:00
Vadym Samoilenko
595897e61a feat(w-12): JobBrief model, endpoints, migration + brief→job linkage
- JobBrief model (DRAFT→SUBMITTED→APPROVED→FULFILLED) with 6 CRUD
  endpoints: list, create, get, patch (DRAFT only), submit, approve
- All endpoints use MembershipContext; read=VIEWER, mutate=MANAGER,
  approve=ADMIN for org-scoped access
- create_job accepts brief_id Form field; validates APPROVED brief,
  copies organization_id/project_id/deadline from brief, marks brief
  FULFILLED after job insert
- organization_id now populated from project client_id on job create
  (fixes missing multi-tenant field on new jobs)
- migration_2026-04-29-000001: job_briefs collection + 4 indexes
- Wired briefs router into main.py

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:38:08 +01:00
Vadym Samoilenko
a945653e73 feat(w-14): bulk failures dashboard + sidebar badge
- GET /admin/production/failures: list failed jobs filtered by step/org
- POST /admin/production/bulk-retry: dispatch retry for up to 50 jobs
  with "auto" (from failure.step) or "from_scratch" strategies
- FailuresList.tsx: accordion-grouped by error type, multi-select,
  bulk retry action, step label, retry count (red >3), updated date
- Sidebar: "Failures" item with live badge for production/admin roles
  (polls useJobs with processing_failed,tts_failed,render_failed)
- New useFailures / useBulkRetry hooks

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:36:30 +01:00
Vadym Samoilenko
264561895e feat(w-13): generic /jobs/{id}/retry endpoint + unified failure UI
- POST /jobs/{job_id}/retry dispatches correct pipeline task based on
  failure.step: ingestion/ai_processing → ingest_and_ai_task,
  translation/tts → translate_and_synthesize_task, render → rerender
- Increments retry_count, writes JOB_RETRY audit log entry
- Adds processing_failed to JobStatus type; JobFailure interface on Job
- Replaces TTS-only retry block with FailureBanner showing step/message/
  retry_count for all failed statuses (processing_failed, tts_failed,
  render_failed); Escalate mailto link for high-retry-count cases
- useRetryJob hook + apiClient.retryJob() call new endpoint

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:33:50 +01:00
Vadym Samoilenko
dca1ca9c8c feat(w-13): structured failure handlers in tasks; fix translation→TTS_FAILED bug
ingest_and_ai: exception handler now sets status=PROCESSING_FAILED and
writes Job.failure{step, type, message, retriable, occurred_at} instead
of leaving status unchanged (was silent failure).

translate_and_synthesize: replace the blanket TTS_FAILED status (even for
translation failures) with PROCESSING_FAILED + failure.step="translation"|
"tts" based on current job status at failure time.

render_accessible_video: add failure{step="render"} alongside existing
RENDER_FAILED status for UI consumption.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:28:37 +01:00
Vadym Samoilenko
3e3be935c6 feat(w-13): structured Job.failure schema, PROCESSING_FAILED status, audit actions
Add JobFailure model (step, type, message, retriable, occurred_at,
retry_count) to job.py. Add PROCESSING_FAILED to JobStatus (legacy
TTS_FAILED/RENDER_FAILED preserved for back-compat).

Add missing Job fields that existed in DB but not the Pydantic model:
organization_id, brief_id, gcs_prefix, initial_linguist_id,
initial_reviewer_id, failure, retry_count.

Add JOB_TASK_FAILED, JOB_RETRY, JOB_BULK_RETRY to AuditAction enum.

Add migration 2026-04-29-000000: processing_failed in schema validator +
compound indexes (failure.step/status) and (status/org_id/created_at).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:27:28 +01:00
Vadym Samoilenko
38038862c9 refactor(mt-15): consolidate authz in routes_jobs and dependencies
list_jobs now uses MembershipContext (Redis-cached, 60s TTL) to build
org-scoped queries instead of per-request memberships.find(). Falls back
to legacy get_accessible_project_ids for users with no memberships.

get_job replaces the role-specific CLIENT/PM access check with
get_job_or_403() which uniformly checks organization_id membership for
all roles (returns 404 not 403 to avoid leaking cross-org job existence).

get_accessible_project_ids in dependencies.py now uses _cached_memberships
from authz.py, eliminating the duplicate uncached DB query.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:26:07 +01:00
Vadym Samoilenko
5209f04318 feat(mt-13): bind glossary handlers to client_id via org membership check
All 8 glossary route handlers now verify the requesting user has org
membership in the target client_id using assert_user_in_org() from
core/authz.py. Read endpoints require VIEWER, mutations require MANAGER,
archive requires ADMIN (org-level). Removed dead _assert_can_read() and
_require_client_staff() helpers. Removed unused require_roles/User/UserRole
imports. Also added get_job_or_403() to authz.py for MT-15.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 20:24:41 +01:00