video-accessibility

Author	SHA1	Message	Date
Vadym Samoilenko	1563714454	feat(saas): Phase 3 — membership-based authz + Mailgun + job.organization_id authz.py (new): - MembershipContext — per-request membership dict for the current user - get_membership_context FastAPI dependency - require_org_role(min_role) — dependency factory keyed off org_id path param - require_platform_admin() - OrgScopedQuery — adds organization_id filter; platform admin passes through - bump_user_membership_cache — invalidates Redis key on membership writes dependencies.py: - get_accessible_project_ids now queries memberships collection first; legacy pm_client_ids / team.member_user_ids fallback retained until migration runs (four job-route access checks at lines 608/1054/1181/1538 are fixed via this function) routes_clients.py: - _assert_pm_or_admin and _assert_client_access are now async and query memberships - All 10 call sites updated with await + db arg emailer.py: - Switched from SendGrid to Mailgun REST API via httpx (already in requirements) - _send() is now fully async; same public method signatures preserved - send_completion_email uses _send() config.py: - Added mailgun_api_key, mailgun_domain, mailgun_from settings - sendgrid_api_key kept with empty default for backward compat migration_2026-04-28-000003: - Backfills job.organization_id from project.client_id - Creates (organization_id, status, created_at) sparse index on jobs routes_organizations.py / routes_invitations.py: - Call bump_user_membership_cache after every membership write Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 16:56:42 +01:00
Vadym Samoilenko	2b721d182b	feat: Client → Team → Project isolation system with Project Manager role Backend: - New UserRole.PROJECT_MANAGER with pm_client_ids[] on User model - New models: Client (slug-based), Team (member_user_ids[]), Project (client-scoped) - Job model gains project_id field - New GET/POST/PATCH/DELETE /clients, /clients/{id}/teams, /clients/{id}/projects, /clients/{id}/pm routes (admin-only client CRUD; PM or admin for teams/projects) - get_accessible_project_ids() helper: staff→all, PM→their clients' projects, CLIENT→projects from teams they belong to (with legacy owner fallback) - list_jobs, get_job, bulk_download, get_vtt_content, delete_job all use new isolation Frontend: - UserRole type gains 'project_manager' - Job, JobCreateRequest gain project_id field - Client, Team, Project, PMUser types added - ApiClient: full client/team/project/PM CRUD methods - useClients hook with all query/mutation hooks - Admin pages: ClientList + ClientDetail (teams, members, projects, PM assignment) - NewJob form: client + project picker (shown when clients exist) - Sidebar: Clients nav item for admin and project_manager roles - Routes: /admin/clients and /admin/clients/:clientId behind RoleGate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 15:11:13 +01:00
Vadym Samoilenko	ea21cace96	feat: replace SDK with direct HTTP integration to centralized cost tracker - New services/cost_tracker.py: sync httpx preflight()/record() + async wrappers; BudgetExceeded exception; no-op when COST_TRACKER_BASE_URL is empty - Preflight budget check added before ingestion (Gemini), per-language translation (video-native + traditional), and per-language TTS dispatch - _record_gemini_usage and _record_tts_cost now call cost_tracker directly; removes broken asyncio.get_event_loop() hack from sync Celery worker - Fix: _cost_ctx now threaded into extract_accessibility_targeted (video-native path) - Fix: user_id/cost_project_id now propagated through dispatch_language_tts → synthesize_cue_task.s() and the rerender_accessible_video.py re-render path - Remove oliver-cost-tracker SDK dependency (was commented-out/never installed) - Drop cost_tracker_outbox_path setting and get_cost_tracker() factory - Update COST_TRACKER_BASE_URL default to optical-dev.oliver.solutions in .env.prod.example, docker-compose.yml, and all Cloud Run service yamls - Cloud Run yamls use Secret Manager ref (cost-tracker-api-key) for the API key Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 13:36:15 +01:00
Vadym Samoilenko	ae2c474061	feat: integrate oliver-cost-tracker SDK into video-accessibility Add AI cost tracking to all Gemini and TTS call sites: - config.py: add COST_TRACKER_* env vars (base_url, api_key, source_app, outbox_path, enabled) - dependencies.py: add get_cost_tracker() factory (lru_cache, graceful degradation if SDK not installed) - models/job.py: add cost_tracker_project_id field for cost attribution - services/gemini.py: - add import time, _record_gemini_usage() helper (reads usage_metadata) - add _cost_ctx kwarg to extract_accessibility, extract_accessibility_targeted, transcreate_content, translate_vtt, rewrite_tts_cue - record usage after every generate_content call via asyncio.create_task() - tasks/ingest_and_ai.py: pass _cost_ctx (user_id, job_id, project_id) to extract_accessibility - tasks/translate_and_synthesize.py: build _cost_ctx from job_doc and pass to transcreate_content + translate_vtt calls - tasks/tts_synthesis.py: add user_id + cost_project_id kwargs, add _record_tts_cost() helper (records len(text) chars to cost tracker) - pyproject.toml: document SDK install instructions (comment) - .env.prod.example: add COST_TRACKER_* vars Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 11:30:46 +01:00
Vadym Samoilenko	cf761c4bb6	feat: add linguist role and user management navigation - Add LINGUIST role to UserRole enum (backend + frontend) - Grant linguists access to QC Review, Final Review, review notes, and VTT editing - Add MongoDB migration to update schema validator with linguist role - Add admin seed: vadymsamoilenko@oliver.agency is promoted to admin on startup - Add User Management sidebar link for admin users - Fix Login.tsx role type cast to use UserRole instead of hardcoded union Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 11:46:33 +01:00
Vadym Samoilenko	6f963ff7c4	feat: DCMP compliance, descriptive transcript, new languages, QA bug fixes - Rewrote VTT translation to two-step (text-only → Gemini → apply to original timestamps) preventing caption timing desync - Added polling fallback for all processing states and Safari visibilitychange WebSocket reconnect - Added 11 new TTS languages (cs, da, fi, hu, no, sk, sv, es-419, pt-BR, fr-CA) - Updated caption/AD prompts to DCMP Captioning Key & Description Key standards (line splitting, ♪ music notation, italic tags, caption positioning, ethics guidelines) - Added descriptive transcript generation (WCAG 2.1 §1.2.1) combining captions + AD into plain text - Fixed amix normalize=0 to prevent audio loss in rendered videos - Fixed AD re-timing double-count when source_ms is None - Fixed cue block numbering to be 1-based in VttEditor and Timeline Preview Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 11:50:43 +00:00
Vadym Samoilenko	1e177a6d5c	feat: add ElevenLabs voice selection to frontend and backend Add dynamic ElevenLabs voice catalog with provider toggle in the UI, allowing users to browse ElevenLabs voices, configure stability and similarity boost settings, and preview/synthesize with ElevenLabs TTS. Backend: - New elevenlabs_voices.py service with 1-hour cached API fetching - TTS routes support ?provider= query param for voices and options - Preview endpoint routes to ElevenLabs or Gemini based on provider - stability/similarity_boost params flow through TTS synthesis pipeline - TTSPreferences model extended with ElevenLabs-specific fields - Deprecated hardcoded elevenlabs_voices config (now fetched dynamically) Frontend: - Provider toggle (Gemini/ElevenLabs) in VoiceSelector - ElevenLabsSettingsPanel with stability and similarity boost sliders - VoicePreviewButton supports provider-specific preview parameters - API client passes provider param to voices, options, and preview endpoints - New VoiceInfo, ProviderVoicesResponse, ProviderOptionsResponse types Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:58:56 +00:00
michael	9580979ac8	feat: add environment-based worker concurrency for Cloud Run mode Allow configuring Celery worker concurrency via environment variables to take advantage of Cloud Run autoscaling: - Add WORKER_CONCURRENCY, WHISPER_WORKER_CONCURRENCY, FFMPEG_WORKER_CONCURRENCY settings to config.py with recommended values documented - Update Dockerfile to use ${WORKER_CONCURRENCY} and ${WHISPER_WORKER_CONCURRENCY} environment variables instead of hardcoded values - Update docker-compose.yml to pass concurrency env vars to worker commands - Add WHISPER_SERVICE_URL and FFMPEG_SERVICE_URL to relevant workers Recommended settings: Local mode: WHISPER=1, FFMPEG=1 (CPU/RAM constrained) Cloud Run mode: WHISPER=10, FFMPEG=20 (match autoscaling limits) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-02 10:27:07 -06:00
michael	79440929f4	feat: add Cloud Run HTTP services for Whisper and FFmpeg Migrate CPU-intensive workloads to Cloud Run for autoscaling: - Add Whisper HTTP service (FastAPI) with /transcribe endpoint - Add FFmpeg HTTP service (FastAPI) with /encode, /probe, /extract-frame, etc. - Add Dockerfiles for both services (8 vCPU, 32GB RAM, Gen2) - Add Cloud Build config for CI/CD deployment - Add Cloud Run service YAML configs with scale-to-zero - Update whisper_transcribe.py to call Cloud Run when WHISPER_SERVICE_URL set - Update video_renderer.py to call Cloud Run when FFMPEG_SERVICE_URL set - Update whisper_service.py for Cloud Run compatibility (no settings dependency) - Add ffmpeg_service_url and whisper_service_url to config.py Services scale 0→N based on request load, falling back to local execution when service URLs are not configured (hybrid mode). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-02 10:12:50 -06:00
michael	d2d8e32819	feat: add video-native translation mode for multi-language content Add a new "Video Native Mode" translation option that re-processes the video through Gemini for each target language, generating captions and audio descriptions directly from visual context. This produces more natural and culturally appropriate content compared to traditional VTT text translation. Changes: - Add translation_mode field to RequestedOutputs (video_native \| traditional) - Create gemini_ingestion_targeted.md prompt for target language generation - Add extract_accessibility_targeted() method to Gemini service - Modify translate_and_synthesize task to handle both translation modes - Add Translation Mode UI selector in NewJob screen (video_native is default) - Remove transcreation UI (replaced by video_native mode) - Remove Google Translate service (replaced by Gemini translation) - Add LanguageSelector component with searchable dropdown 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-31 13:50:05 -06:00
michael	54638d1065	feat: switch Whisper model from large-v3 to medium Medium model is faster and uses less memory (~1.5GB vs ~3GB) while still providing good multilingual transcription quality. Updated in: - config.py - docker-compose.yml - whisper-worker-service.yaml - cloudbuild.yaml - Dockerfile (pre-download) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 22:35:47 -06:00
michael	3538dea47f	fix: update whisper_max_search_window to 30s in config.py The setting in config.py (5.0) was overriding the default in whisper_service.py (30.0). Now both are consistent at 30s. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 21:53:57 -06:00
michael	614ff841fe	feat: upgrade Whisper model from base to large-v3 Uses the multilingual large model for more accurate transcription and sentence boundary detection. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 21:20:03 -06:00
michael	05bde8326d	feat: add Whisper-based pause point refinement for audio descriptions Implements word-level speech analysis using faster-whisper to refine AD pause points. Gemini's timestamps are snapped to natural speech gaps (sentence/phrase boundaries) to prevent pauses mid-word. Key changes: - Add WhisperService for transcription and gap detection - Add dedicated Celery task routed to 'whisper' queue - Integrate refinement into render_accessible_video task - Cache Whisper transcripts in MongoDB for reuse across languages - Add dedicated whisper-worker with concurrency=1 to prevent OOM Configuration: - Uses faster-whisper 'base' model (multilingual, ~145MB) - 5-second search window after Gemini's recommended point - Falls back to original timestamp if no gap found Infrastructure: - New Docker stage: whisper-worker - New Cloud Run service: accessible-video-whisper-worker - Updated docker-compose.yml with whisper-worker service 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 08:27:48 -06:00
michael	6effe58dc9	feat: add video review with timestamped notes to Final Review Add a comprehensive video review feature to the Final Review page that allows reviewers to watch videos with caption overlays and add timestamped notes. Backend: - New ReviewNote model for MongoDB with job_id, asset_key, timestamp, content - CRUD API endpoints at /jobs/{job_id}/review-notes - Owner-only edit/delete permissions (admins can bypass) - Database indexes for efficient querying Frontend: - VideoReviewPlayer component with video player and caption overlay - NotesSidebar for viewing/adding notes with auto-highlight when video reaches timestamp - SyncedCaptionList with auto-scroll and click-to-seek - AssetTabs for switching between languages and accessible videos - React Query hooks with 30s polling for collaborative updates Features: - Notes persist to database and are shared across all reviewers - Notes highlight for 5 seconds when video playback reaches their timestamp - Click note to seek video to that position - Pause video to add note at current timestamp - Accessible videos use retimed captions when available 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 15:30:00 -06:00
michael	865fcdc246	feat: add TTS settings panel with model, speed, and style options - Add model selection (flash vs pro) for quality control - Add speed slider (0.5x - 2.0x) for pacing adjustment - Add style presets (neutral, calm, energetic, professional, warm, documentary) - Add custom style prompt option for advanced customization - New /tts/options endpoint returns available TTS options - Voice preview now tests all settings so users hear exact output - Backward compatible: all new fields have sensible defaults 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 15:22:14 -06:00
michael	29643f6683	upgrade TTS to Gemini TTS with voice selection and preview - Add Gemini TTS service with 30 voices and 24 languages - Add TTS API endpoints for voice listing and preview - Add per-language voice selection in job creation form - Add voice override at QC approval stage - Add VoiceSelector and VoicePreviewButton components - Update TTSPreferences model with provider and voice mapping 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 14:41:57 -06:00
michael	665b49c3f1	added MSAL microsoft authentication	2025-10-10 09:19:39 -05:00
michael	7ea23b9858	fixed objectID/stringID mismatch	2025-10-08 18:23:05 -05:00
michael	1a1ed3048d	wrote docker files and deployment instructions	2025-10-08 16:00:12 -05:00
michael	af2562096a	initial commit	2025-08-24 16:28:33 -05:00
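
Note on commit 05bde8326d (Whisper-based pause point refinement): the message describes snapping Gemini's suggested pause timestamp to a natural speech gap found within a 5-second search window after that point, and falling back to the original timestamp when no gap is found. The sketch below illustrates only that rule. It is not the repository's WhisperService code: the function name, the (start, end) word-tuple format, and the min_gap_s threshold are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical input shape: one (start_s, end_s) pair per transcribed word,
# sorted by start time. The real WhisperService data model may differ.
Word = Tuple[float, float]


def snap_to_speech_gap(
    suggested_s: float,
    words: List[Word],
    search_window_s: float = 5.0,  # window size stated in the commit message
    min_gap_s: float = 0.3,        # assumed threshold, not taken from the commit
) -> float:
    """Return the first silence gap at or after suggested_s, else suggested_s."""
    limit = suggested_s + search_window_s
    for (_, prev_end), (next_start, _) in zip(words, words[1:]):
        if next_start - prev_end < min_gap_s:
            continue  # too short to count as a sentence or phrase boundary
        if next_start <= suggested_s:
            continue  # this gap closed before the suggested point
        # If the suggested point already lies inside the gap, keep it.
        candidate = max(prev_end, suggested_s)
        if candidate > limit:
            break  # later gaps are even further away
        return candidate
    return suggested_s  # fallback: no usable gap inside the window


words = [(0.0, 1.0), (1.05, 2.0), (2.8, 3.5), (9.0, 9.5)]
print(snap_to_speech_gap(1.5, words))  # 2.0
```

Commit 3538dea47f later sets whisper_max_search_window to 30s, which in this sketch would correspond to passing search_window_s=30.0.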

21 commits
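
Note on commit 6f963ff7c4 (two-step VTT translation): the message describes sending only the cue text to Gemini and applying the translated text back onto the original timestamps, so the model cannot alter caption timing. The sketch below shows one way that idea could look. It is an illustration under stated assumptions, not the repository's translate_vtt implementation: translate_vtt_two_step and the translate_texts callback are hypothetical names, and the callback stands in for the Gemini call.

```python
from typing import Callable, List, Optional, Tuple


def translate_vtt_two_step(
    vtt: str,
    translate_texts: Callable[[List[str]], List[str]],  # hypothetical stand-in for the model call
) -> str:
    """Translate cue text only; identifiers and timing lines are copied verbatim."""
    parsed: List[Tuple[List[str], Optional[int]]] = []
    texts: List[str] = []

    # Step 1: split into blocks and pull out the text payload of each cue.
    for block in vtt.strip().split("\n\n"):
        lines = block.split("\n")
        timing_idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_idx is None:
            parsed.append((lines, None))  # WEBVTT header or NOTE block: pass through
        else:
            parsed.append((lines[: timing_idx + 1], len(texts)))
            texts.append("\n".join(lines[timing_idx + 1:]))

    translated = translate_texts(texts)
    if len(translated) != len(texts):
        raise ValueError("translator must return exactly one string per cue")

    # Step 2: reattach the translated text to the untouched timing lines.
    out = []
    for head, idx in parsed:
        out.append("\n".join(head if idx is None else head + [translated[idx]]))
    return "\n\n".join(out) + "\n"


sample = (
    "WEBVTT\n\n"
    "1\n00:00:01.000 --> 00:00:02.000\nHello\n\n"
    "2\n00:00:03.000 --> 00:00:04.500\nGood\nbye\n"
)
# Upper-casing stands in for a real translation call.
print(translate_vtt_two_step(sample, lambda ts: [t.upper() for t in ts]))
```

Because the timing lines never pass through the translator, a malformed or reordered model response can at worst fail the length check; it cannot shift cue timestamps.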