- docker-compose.yml: add USE_CELERY_FALLBACK env var to api and worker
services so cloud_run_dispatch uses Celery on optical-dev
- JobDetail.tsx: show actual error message instead of generic
"Processing failed at ." when failure step is unknown; also show
job.error string when no structured failure object exists
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
google.cloud.run_v2 is not installed; optical-dev dispatches pipeline tasks
via local Celery workers, not Cloud Run Jobs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a reviewer saves the source language VTT during QC and confirms
the re-translate dialog, all target languages are re-translated via
Celery. Job transitions to `translating` and returns to `pending_qc`
when done. Existing polling in useJob covers progress display.
- schemas/job.py: add `retranslate_languages: bool` to VttUpdateRequest
- audit_log.py: add VTT_RETRANSLATE audit action
- translate_and_synthesize_task: accept languages/retranslate params,
filter to specified languages, skip video render, return to PENDING_QC
- routes_jobs.py: add _trigger_retranslation helper, call after VTT save
- types/api.ts: add retranslate_languages to VttUpdateRequest
- useJob.ts: invalidate all lang VTTs on retranslate
- QCDetail.tsx: confirmation dialog when saving source VTT with targets
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Guard useJobDownloads with !!jobStatus so it never fires when job is
still loading (status undefined on first render)
- Expand EARLY_STATUSES to cover translating/tts_generating/rendering_*
which also have no outputs yet
- Remove Downloads.tsx hack that locked downloads to completed-only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On optical-dev the Apache vhost is a standalone file in sites-enabled (not
a symlink to sites-available), so injecting the Include into sites-available
had no effect and the ProxyPassMatch rules were never loaded by Apache.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
faster_whisper loads its model into RAM at startup regardless of whether
tasks are routed to Cloud Run — reducing the limit to 512M caused OOM kill
on container start. Restored original limits (ffmpeg: 1G, whisper: 2G).
Cloud Run URLs (FFMPEG_SERVICE_URL / WHISPER_SERVICE_URL) remain set so CPU
offload is still active.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sets FFMPEG_SERVICE_URL and WHISPER_SERVICE_URL so video_renderer.py and
whisper_transcribe.py route CPU-heavy work to Cloud Run instead of running
ffmpeg/Whisper locally. Both Cloud Run services and IAM (roles/run.invoker
for accessible-video-worker@ and video-accessibility@ SAs) are already
provisioned — only the env vars were missing.
ffmpeg-worker container: 1G/0.5CPU → 256M/0.25CPU (HTTP dispatcher only)
whisper-worker container: 2G/0.5CPU → 512M/0.25CPU (HTTP dispatcher only)
Expected outcome: ffmpeg-worker drops from 51% CPU / 97% RAM to < 5% CPU.
Server load avg should fall from ~2.2 to ~1.0-1.3.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
deploy.sh and full-deploy.sh predate the optical-dev setup and reference
old URLs/compose files. deploy-dev.sh is the single source of truth.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
whisper-worker base has reservation 4G, optical-dev limit 2G causes Docker error.
Added explicit reservations to all three pipeline workers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
deploy-dev.sh:
- BUILD_SERVICES now includes tts-worker, ffmpeg-worker, whisper-worker (enabled
in docker-compose.optical-dev.yml via USE_CELERY_FALLBACK=true)
- ensure_apache_modules(): enables proxy, proxy_http, proxy_wstunnel, rewrite
- Apache fragment: WS proxy (ws://) placed BEFORE HTTP /api/ proxy (required
for correct longest-match precedence in Apache)
- Added ProxyTimeout 600 (10 min) and LimitRequestBody 2147483648 (2 GB) for
large video uploads; disablereuse=on for WS pool correctness
- Fragment always regenerated on deploy (picks up PORT/WEBROOT changes)
- Logs command uses full $COMPOSE variable instead of hardcoded partial flags
deploy/apache-video-accessibility.conf:
- Static reference copy of the Apache fragment with inline comments explaining
each directive
.env.production:
- Updated remaining ai-sandbox.oliver.solutions URLs to optical-dev.oliver.solutions
(API_BASE_URL, COOKIE_DOMAIN, CLIENT_BASE_URL, AZURE_REDIRECT_URI, CORS_ORIGINS)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
T-2: Extract getJobStatusColor() into utils/jobStatusMessages.ts; StatusBadge now uses the
shared helper (single source of truth for badge colors).
PR-7: GET /admin/production/queue-stats — returns Celery queue depths via Redis LLEN.
Production dashboard shows a live panel (10s refresh) with per-queue task counts.
PR-8: POST /admin/production/jobs/{id}/upload-final-vtt — Production/Admin can upload a
hand-crafted VTT to bypass AI, writing to GCS and advancing the job to PENDING_QC.
Upload modal added to FailuresList with language + type (captions/ad) selectors.
docker-compose.optical-dev.yml: enable USE_CELERY_FALLBACK=true, set worker replicas=1
for all pipeline workers (ffmpeg/tts/whisper) with WORKER_CONCURRENCY=2 so the full
pipeline runs on the 2-CPU optical-dev server until Cloud Run VPC Connector is ready.
Fix: remove unused effectiveMs variable in TimelinePreview (TS6133).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
R-8 — Linguist language competence:
- Add User.languages[] BCP-47 field to backend model + UserResponse schema
- Frontend: show amber warning in assign modal when selected linguist has no
competence listed for the target language
PM VTT editing (FinalDetail):
- PM and ADMIN can now edit captions/AD in the final review stage
- VttEditor becomes read-write with onCueSave wired to updateVttMutation
- Other roles remain read-only
Timeline right-click + add pause:
- Right-click anywhere on the timeline opens a context menu showing the timestamp
- If near a pause point marker: "Edit timing" + "Regenerate TTS" options
- If on empty space: "Add AD cue at Xs" → inserts a new AD cue in the editor
- Pause point markers widened from 1px → 2px (3px on hover) for easier clicking
- Right-click on a pause point marker directly opens the editor
VttEditor insertAtTimeMs prop:
- New prop triggers programmatic insert at a specific video timestamp
- Used by the timeline right-click "Add AD cue here" action
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove hover gate on insert/delete action buttons — all 3 buttons now permanently
visible when !readOnly so the insert affordance is clear on touch and small screens
- Add GapInsertRow: a clickable dashed bar shown before the first cue (when gap > 0.5s)
and between any two cues with a gap > 0.5s — directly addresses the case where music
or silence precedes the first caption (e.g. 0:00–24.5s gap in the Command Strip video)
- Fix: insertCue now calls saveCue immediately so the placeholder cue persists even if
the user navigates away before typing text
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- POST /{job_id}/actions/blocked_on_source (L-18): linguist/reviewer flags a source
video issue; moves job to QC_FEEDBACK and records blocked_on_source_reason/at/by
- POST /{job_id}/actions/promote_to_qc (PR-10): production/admin manually bypasses
AI processing for edge-case failures; adds audit history entry
- Reset reviewed_cues to 0 on submit_for_review (R-12) so reviewer must re-acknowledge
all cues after each linguist resubmit
- Add assert_job_in_user_org + get_user_org_ids to core/dependencies.py (used by
the new endpoints and the cross-tenant isolation test suite)
- Remove unused ingest_and_ai_task / translate_and_synthesize_task imports
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Blocks 1–5 of stabilization plan:
SECURITY
- validation.py: restore settings.upload_max_video_bytes (T-14 regression fix)
and JSON object key validation that was incorrectly removed
- MT-18: add accessible_org_ids filter to list_for_reviewer/list_for_linguist
so reviewers/linguists only see jobs from their own org in QC queue
- MT-17: add Membership.team_ids[], write to it on invitation acceptance and
direct team add/remove; migration backfills from Team.member_user_ids
- MT-19: validate all target_team_ids belong to invitation's org_id at creation
TESTS
- Restore test_cross_tenant_isolation.py (was deleted, only .pyc remained)
- Extend with MT-18 reviewer org isolation tests
QUICK WINS
- W-8: remove time.sleep(1) + dead debug block from POST /jobs (task was
undefined — would have caused NameError → HTTP 500 on every job creation)
- T-13: warn at startup when REDIS_URL configured but connection failed
- T-16: skip language_qc lifespan migration when count=0 (no DB scan on startup)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
API base URL and MSAL redirect URI were pointing to old ai-sandbox host,
causing Microsoft auth popup to redirect back to the wrong domain.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
__init__ was setting self.version = "0000-00-00-000000" on every instantiation,
overriding the subclass class variable. All migrations were recorded in DB
with the default version instead of their own, causing duplicate key errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Heavy pipeline tasks (ingest, translate, render, tts) now dispatch to
va-worker Cloud Run Job which has its own Dockerfile.cloudrun with ffmpeg.
API and lightweight Celery worker (notify/embed) don't need it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port 8003 is occupied by infra-api-1 on optical-dev server.
Artifact Registry repo renamed from nexus to video-accessibility.
cloudbuild.yaml defaults _TAG to 'latest' for manual runs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Heavy pipeline tasks (ingest, translate, render, rerender) now dispatch
to a Cloud Run Job (va-worker) instead of local Celery workers. optical-dev
runs only api + lightweight worker (notify/embed) within its 2-CPU budget.
- backend/app/tasks/runner.py — Cloud Run Job entrypoint
- backend/app/services/cloud_run_dispatch.py — replaces .delay() for heavy tasks
- backend/Dockerfile.cloudrun — Cloud Run worker image (ffmpeg included)
- docker-compose.optical-dev.yml — 2-CPU safe overrides, disables heavy workers
- cloudbuild.yaml — builds va-worker image and updates Cloud Run Job
- deploy-dev.sh — uses 3-file compose, builds only api+worker locally
- routes_jobs, routes_admin_production, ingest_and_ai, translate_and_synthesize
— all dispatch sites updated to use cloud_run_dispatch.dispatch()
USE_CELERY_FALLBACK=true in .env.local to use Celery locally during dev.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sequential image builds (one at a time to avoid OOM), auto Apache
fragment, migrations, frontend rsync, smoke test. Flags:
--skip-build / --skip-frontend / --skip-migrations
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes all remaining multi-tenant security gaps and adds production UX:
Security (MT-11/12/13/15/16):
- Cross-org assignment guard in language_qc for linguist/reviewer slots
- Remove PM/CLIENT bypass from _assert_client_access
- Bind all 8 glossary handlers to MembershipContext + OrgRole check
- Consolidate authz: get_job_or_403, assert_user_in_org, OrgScopedQuery in list_jobs
- JWT access tokens now carry org_ids hint claim (transient, not authoritative)
GCS org-prefix (MT-14):
- gcs_prefix field on Job: orgs/{org_id}/jobs/{job_id} for new jobs
- gcs_path() helper — falls back to legacy {job_id}/ for old jobs
- Rewrote 30+ hardcoded GCS path sites across tasks and routes
- Operator script tools/migrate_gcs_org_prefix.py (copy-verify-delete, resumable)
Failure recovery (W-13/14):
- Unified JobFailure schema: step/type/message/retriable/occurred_at/retry_count
- PROCESSING_FAILED status; legacy TTS_FAILED/RENDER_FAILED kept for back-compat
- Fix: translation-phase exceptions now record step="translation" not "tts"
- Generic POST /jobs/{id}/retry dispatches by failure.step
- GET /admin/production/failures + POST /admin/production/bulk-retry (cap 50)
- FailureBanner in JobDetail, failures badge in Sidebar
Job Brief workflow (W-12):
- JobBrief model + 6 CRUD endpoints (list/create/get/patch/submit/approve)
- create_job accepts brief_id Form param; copies org/deadline/project; marks FULFILLED
- BriefsList, NewBrief, BriefDetail UI; NewJob pre-fills from ?brief_id=
- Briefs badge in Sidebar for submitted briefs
Migrations: 2026-04-29-000000 (failure indexes) + 2026-04-29-000001 (job_briefs)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- create_access_token gains optional org_ids: list[str] param; encodes
{exp, sub, org_ids, v:2} — org_ids is a prefilter hint only, never
used as authorization source of truth (Redis cache is authoritative)
- Login, MS login, refresh endpoints: fetch memberships and include
org_ids in issued access tokens via _get_user_org_ids() helper
- routes_invitations.py accept flow: same org_ids population on token
- get_current_user: reads org_ids from payload, attaches as transient
user.__dict__["org_ids"] — available to OrgScopedQuery for prefilter
- Force logout: rotate JWT_SECRET env var at deployment time (no code
change needed; all existing tokens immediately invalidated)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- JobBrief model (DRAFT→SUBMITTED→APPROVED→FULFILLED) with 6 CRUD
endpoints: list, create, get, patch (DRAFT only), submit, approve
- All endpoints use MembershipContext; read=VIEWER, mutate=MANAGER,
approve=ADMIN for org-scoped access
- create_job accepts brief_id Form field; validates APPROVED brief,
copies organization_id/project_id/deadline from brief, marks brief
FULFILLED after job insert
- organization_id now populated from project client_id on job create
(fixes missing multi-tenant field on new jobs)
- migration_2026-04-29-000001: job_briefs collection + 4 indexes
- Wired briefs router into main.py
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- GET /admin/production/failures: list failed jobs filtered by step/org
- POST /admin/production/bulk-retry: dispatch retry for up to 50 jobs
with "auto" (from failure.step) or "from_scratch" strategies
- FailuresList.tsx: accordion-grouped by error type, multi-select,
bulk retry action, step label, retry count (red >3), updated date
- Sidebar: "Failures" item with live badge for production/admin roles
(polls useJobs with processing_failed,tts_failed,render_failed)
- New useFailures / useBulkRetry hooks
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ingest_and_ai: exception handler now sets status=PROCESSING_FAILED and
writes Job.failure{step, type, message, retriable, occurred_at} instead
of leaving status unchanged (was silent failure).
translate_and_synthesize: replace the blanket TTS_FAILED status (even for
translation failures) with PROCESSING_FAILED + failure.step="translation"|
"tts" based on current job status at failure time.
render_accessible_video: add failure{step="render"} alongside existing
RENDER_FAILED status for UI consumption.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add JobFailure model (step, type, message, retriable, occurred_at,
retry_count) to job.py. Add PROCESSING_FAILED to JobStatus (legacy
TTS_FAILED/RENDER_FAILED preserved for back-compat).
Add missing Job fields that existed in DB but not the Pydantic model:
organization_id, brief_id, gcs_prefix, initial_linguist_id,
initial_reviewer_id, failure, retry_count.
Add JOB_TASK_FAILED, JOB_RETRY, JOB_BULK_RETRY to AuditAction enum.
Add migration 2026-04-29-000000: processing_failed in schema validator +
compound indexes (failure.step/status) and (status/org_id/created_at).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
list_jobs now uses MembershipContext (Redis-cached, 60s TTL) to build
org-scoped queries instead of per-request memberships.find(). Falls back
to legacy get_accessible_project_ids for users with no memberships.
get_job replaces the role-specific CLIENT/PM access check with
get_job_or_403() which uniformly checks organization_id membership for
all roles (returns 404 not 403 to avoid leaking cross-org job existence).
get_accessible_project_ids in dependencies.py now uses _cached_memberships
from authz.py, eliminating the duplicate uncached DB query.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
All 8 glossary route handlers now verify the requesting user has org
membership in the target client_id using assert_user_in_org() from
core/authz.py. Read endpoints require VIEWER, mutations require MANAGER,
archive requires ADMIN (org-level). Removed dead _assert_can_read() and
_require_client_staff() helpers. Removed unused require_roles/User/UserRole
imports. Also added get_job_or_403() to authz.py for MT-15.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The unconditional `if user.role in (CLIENT, PROJECT_MANAGER): return`
allowed any PM to access any client regardless of membership. Removed;
kept pm_client_ids legacy fallback for pre-migration users.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Prevent PM in org A from assigning linguist/reviewer from org B.
Added _assert_user_in_job_org() helper that resolves job org_id (with
project fallback) and checks db.memberships for the assignee. Also added
assert_user_in_org() and get_job_or_403() to core/authz.py for use in
upcoming MT-13 and MT-15 commits.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
W-4: team assignment (linguist/reviewer) stored on job at creation,
auto-assigned to all language QC states on first GET /language-qc
(lazy init via auto_assign_defaults)
L-3 WS: broadcast_to_job when reviewer opens VTT for editing;
QCDetail shows "User X is editing [lang]" banner (auto-clears 5s)
R-5: comment broadcast via broadcast_to_job on add_comment();
QCDetail invalidates comments query on language_qc_comment WS event
L-15: QCDetail subscribes to language_qc_assigned WS event →
refetches lang-qc data and shows toast
R-7: VttEditor gets onCuePlay prop; AD editor in QCDetail wires
handleAdCuePlay → switches to accessible video mode, seeks & plays
T-15: beforeunload warning in NewJob while upload is in progress
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
T-6: Add Blob URL native <track> in VideoWithCaptions so browser CC button works in fullscreen.
T-7: Sync hidden <audio> AD playback with video play/pause/seeked events.
T-11: Double Submit Cookie CSRF — _set_auth_cookies issues httponly refresh_token + readable csrf_token; /refresh validates X-CSRF-Token header; frontend reads csrf_token cookie and sends header on all refresh calls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
VttDiffView component (frontend/src/components/VttEditor/VttDiffView.tsx):
- Lazy-loads VTT version list (newest-first) and diffs version 1 (AI baseline)
against the latest version
- Renders unified diff: green lines = added, red lines = removed (unchanged hidden)
- Collapsed by default; expand with "↔ Diff vs AI baseline" button
- Shows +N/-N change summary in header
QCDetail integration:
- VttDiffView added below both Captions and Audio Description VttEditors
(only appears for the selected language)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backend:
- VttContentResponse gets etag field (SHA1 of captions+AD content)
- VttUpdateRequest gets if_match field (optional)
- GET /jobs/{id}/vtt: computes and returns etag
- PATCH /jobs/{id}/vtt: if if_match present, fetches current content, recomputes
hash, returns 409 Conflict if mismatch
Frontend:
- VttContentResponse type + VttUpdateRequest type updated
- QCDetail stores vttEtag from GET response
- All updateVttMutation calls pass if_match: vttEtag
- 409 responses show specific "Conflict: another user has modified" message
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>