video-accessibility

Author	SHA1	Message	Date
Vadym Samoilenko	ea21cace96	feat: replace SDK with direct HTTP integration to centralized cost tracker - New services/cost_tracker.py: sync httpx preflight()/record() + async wrappers; BudgetExceeded exception; no-op when COST_TRACKER_BASE_URL is empty - Preflight budget check added before ingestion (Gemini), per-language translation (video-native + traditional), and per-language TTS dispatch - _record_gemini_usage and _record_tts_cost now call cost_tracker directly; removes broken asyncio.get_event_loop() hack from sync Celery worker - Fix: _cost_ctx now threaded into extract_accessibility_targeted (video-native path) - Fix: user_id/cost_project_id now propagated through dispatch_language_tts → synthesize_cue_task.s() and the rerender_accessible_video.py re-render path - Remove oliver-cost-tracker SDK dependency (was commented-out/never installed) - Drop cost_tracker_outbox_path setting and get_cost_tracker() factory - Update COST_TRACKER_BASE_URL default to optical-dev.oliver.solutions in .env.prod.example, docker-compose.yml, and all Cloud Run service yamls - Cloud Run yamls use Secret Manager ref (cost-tracker-api-key) for the API key Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 13:36:15 +01:00
Vadym Samoilenko	ae2c474061	feat: integrate oliver-cost-tracker SDK into video-accessibility Add AI cost tracking to all Gemini and TTS call sites: - config.py: add COST_TRACKER_* env vars (base_url, api_key, source_app, outbox_path, enabled) - dependencies.py: add get_cost_tracker() factory (lru_cache, graceful degradation if SDK not installed) - models/job.py: add cost_tracker_project_id field for cost attribution - services/gemini.py: - add import time, _record_gemini_usage() helper (reads usage_metadata) - add _cost_ctx kwarg to extract_accessibility, extract_accessibility_targeted, transcreate_content, translate_vtt, rewrite_tts_cue - record usage after every generate_content call via asyncio.create_task() - tasks/ingest_and_ai.py: pass _cost_ctx (user_id, job_id, project_id) to extract_accessibility - tasks/translate_and_synthesize.py: build _cost_ctx from job_doc and pass to transcreate_content + translate_vtt calls - tasks/tts_synthesis.py: add user_id + cost_project_id kwargs, add _record_tts_cost() helper (records len(text) chars to cost tracker) - pyproject.toml: document SDK install instructions (comment) - .env.prod.example: add COST_TRACKER_* vars Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 11:30:46 +01:00
Vadym Samoilenko	6f963ff7c4	feat: DCMP compliance, descriptive transcript, new languages, QA bug fixes - Rewrote VTT translation to two-step (text-only → Gemini → apply to original timestamps) preventing caption timing desync - Added polling fallback for all processing states and Safari visibilitychange WebSocket reconnect - Added 11 new TTS languages (cs, da, fi, hu, no, sk, sv, es-419, pt-BR, fr-CA) - Updated caption/AD prompts to DCMP Captioning Key & Description Key standards (line splitting, ♪ music notation, italic tags, caption positioning, ethics guidelines) - Added descriptive transcript generation (WCAG 2.1 §1.2.1) combining captions + AD into plain text - Fixed amix normalize=0 to prevent audio loss in rendered videos - Fixed AD re-timing double-count when source_ms is None - Fixed cue block numbering to be 1-based in VttEditor and Timeline Preview Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 11:50:43 +00:00
Vadym Samoilenko	f4ddcce066	fix: resolve QA-reported bugs — MP3/VTT desync, crashes, notifications, and more BUG-1 & BUG-2 — Wrong audio plays after re-render / MP3 doesn't match text Root cause: audio files were named by index (cue_0.mp3, cue_1.mp3). When a cue was inserted or deleted, all following indices shifted but old MP3 files kept their original names, so re-render would play the wrong audio for the wrong cue. Fix: renamed files to cue_N_CONTENTHASH.mp3 and introduced an ad_cue_manifest stored in the job document that maps each cue index to its correct GCS URI. Re-render now reads from the manifest instead of guessing by filename. Also: editing AD cue text in the VTT editor now automatically queues TTS regeneration for changed cues — no more silent mismatches. BUG-3 — App crash / state desync when uploading VTT or clearing TTS queue Fixed handleVttFileUpload to only update local editor state after the server confirms the save — previously local state was updated first, so a network error left the UI showing content that wasn't actually saved. Fixed handleClearRegenerationQueue to only remove items from local state if the server removal succeeded — previously all items were cleared regardless. BUG-4 — AI generates different audio descriptions every time Added GenerateContentConfig(temperature=0.2, top_p=0.8, top_k=40) to the Gemini API call so output is more consistent across runs. BUG-5 — On-screen text inconsistently described Strengthened the AI prompt rule from a vague suggestion to a mandatory requirement with an explicit format: "Text on screen reads: [exact text]". Applied to both gemini_ingestion.md and gemini_ingestion_targeted.md. BUG-6 — No notification when re-render finishes Added rendering_qc toast notification and a dismissible green banner that appears in QCDetail when re-render transitions to pending_qc. The banner auto-dismisses after 10 seconds. Also increased WebSocket reconnect attempts from 5 to 15 and capped backoff at 60s to prevent falling back to manual refresh. BUG-7 — Timeline preview looks accurate but isn't after edits Added isStale prop to TimelinePreview. The timeline now shows an amber tint and "Preview may be outdated" label whenever there are unsaved pause point changes, pending TTS regenerations, or a new VTT has been uploaded. BUG-8 — ElevenLabs API errors break TTS with no fallback Added try/except fallback chain in _synthesize_single_cue: if the configured provider fails, it automatically retries with google, then gemini. BUG-9 — Concurrent re-render requests cause race conditions Made the PENDING_QC → RENDERING_QC status transition conditional (only succeeds if the job is still in PENDING_QC). Returns HTTP 409 if a re-render is already in progress. The completion transition back to PENDING_QC is also conditional so a cancelled/overridden render doesn't corrupt job state. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-24 13:23:55 +00:00
Vadym Samoilenko	c413fcb747	feat: add SDH (Subtitles for Deaf and Hard of Hearing) caption output SDH captions extend standard VTT with speaker identification labels, sound effects [PHONE RINGS], music notation ♪, and off-screen indicators. - Add sdh_vtt flag to RequestedOutputs model and frontend form - Add sdh_captions_vtt_gcs field to LangOutput model - Inject SDH generation instructions into both Gemini prompts via {SDH_FIELD} and {SDH_GUIDELINES} placeholders when requested - Upload sdh_captions.vtt to GCS in ingest task - Pass SDH through video_native translation (Gemini generates it directly) and traditional translation (translate source SDH VTT via Gemini) - Expose sdh_captions_vtt in downloads endpoint and bulk zip export Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-18 15:02:18 +00:00
Vadym Samoilenko	2e8a8dc287	feat: add brand context, ethics guidelines, and improved AD prompt rules - Add brand_context field (job model, API, frontend form) so clients can list brand names present in their video; Gemini uses these names instead of generic descriptors (e.g. "Sellotape" not "sticky tape") - Add ethical guidelines section to both Gemini prompts covering person-first language, consistent race/gender description only when plot-relevant, no guessing at unconfirmed identity - Revamp audio description rules: priority ordering (essential → high-priority → time-permitting), pre-teaching placement, no cinematic jargon, succinct style replacing the former "20% longer" instruction - Thread brand_context through full stack: routes → job doc → ingest task → translate task → both Gemini prompt templates Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-18 14:46:09 +00:00
Vadym Samoilenko	c6c7ff51c7	fix: clear stale pause points when AD VTT is re-uploaded Old pause_points in edit_state always overrode new VTT cue timings during re-render, making AD VTT upload for timing adjustments non-functional. Clear pause_points and video_segments on AD VTT upload so re-render falls back to the new cue start times. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 17:07:55 +00:00
Vadym Samoilenko	1e177a6d5c	feat: add ElevenLabs voice selection to frontend and backend Add dynamic ElevenLabs voice catalog with provider toggle in the UI, allowing users to browse ElevenLabs voices, configure stability and similarity boost settings, and preview/synthesize with ElevenLabs TTS. Backend: - New elevenlabs_voices.py service with 1-hour cached API fetching - TTS routes support ?provider= query param for voices and options - Preview endpoint routes to ElevenLabs or Gemini based on provider - stability/similarity_boost params flow through TTS synthesis pipeline - TTSPreferences model extended with ElevenLabs-specific fields - Deprecated hardcoded elevenlabs_voices config (now fetched dynamically) Frontend: - Provider toggle (Gemini/ElevenLabs) in VoiceSelector - ElevenLabsSettingsPanel with stability and similarity boost sliders - VoicePreviewButton supports provider-specific preview parameters - API client passes provider param to voices, options, and preview endpoints - New VoiceInfo, ProviderVoicesResponse, ProviderOptionsResponse types Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:58:56 +00:00
michael	030f1b67ee	fix: enforce AD cue pause_point monotonicity to preserve cue order Whisper's snap_pause_point() finds the nearest sentence boundary independently per cue, which can move a later cue's pause_point before an earlier cue's. The renderer then sorts by pause_point, producing non-sequential cue indices in the timeline. Add a forward monotonicity pass (clamp each pause_point >= previous) at three layers for defense-in-depth: - whisper_service: Phase 3 after consolidation - video_renderer: before temporal sort in _render_pause_insert_method - rerender_accessible_video: in _build_placements_with_adjustments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 08:15:06 -06:00
michael	577ed44dab	fix: queue TTS regeneration for shifted cues when inserting AD cue When a new AD cue is inserted in the middle of existing cues, the system now automatically queues TTS regeneration for the new cue AND all cues that shifted positions. This ensures MP3 file indices stay synchronized with VTT cue indices, preventing cues from being silently dropped during re-render. Changes: - VttEditor: Add onCueInserted callback to notify parent of insertions - QCDetail: Track insertion context and queue TTS for all shifted cues - rerender_accessible_video: Add warning log when cue/MP3 count mismatch Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-12 14:24:36 -06:00
michael	a6cd4cde07	fix: store source video coordinates in pause points for correct re-rendering The re-render task was using pause point coordinates from the accessible video timeline (which includes freeze frame durations) instead of the original source video coordinates. This caused pause points to exceed the source video duration and get clamped incorrectly. Changes: - Add source_ms field to PausePointData model to store source video cut point - Update video_renderer.py to populate source_ms when building pause points - Update rerender_accessible_video.py to use source_ms for placement calculations - Apply user adjustments as relative offsets (delta-based adjustment) - Update API responses and TypeScript types to include source_ms - Add backward compatibility fallback for jobs without source_ms Note: Existing jobs need to be re-processed from initial render to populate the new source_ms field. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-11 10:48:41 -06:00
michael	a59dbb60ac	fix: register rerender_accessible_video task with Celery worker The task was created but not imported in the Celery task registry, causing "Received unregistered task" error when triggering re-render. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-11 10:12:50 -06:00
michael	aa6777d2c2	feat: add QC accessible video review and editing capabilities - Reorder workflow: translations now happen BEFORE QC Review step - Add language tabs to switch between translated languages in QC - Add video mode tabs (Original Video / Accessible Video) - Add interactive timeline preview showing video segments and AD cues - Enable pause point adjustment with millisecond precision - Add TTS regeneration queue for selective cue re-synthesis - Add re-render controls with optional Whisper refinement - Persist video segments and TTS MP3s to GCS for editability - Add new RENDERING_QC job status for re-render operations - Create 5 new API endpoints for accessible video editing - Add rerender_accessible_video.py Celery task Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-11 08:32:27 -06:00
michael	add958008a	fix: use actual freeze segment durations for VTT subtitle retiming Subtitles were appearing progressively out of sync (~1.0s early per AD) because the VTT retimer calculated freeze durations theoretically rather than using actual rendered segment durations. Changes: - video_renderer: Measure actual freeze segment duration after creation - video_renderer: Return updated placements with actual_freeze_duration - vtt_retimer: Prefer actual_freeze_duration over calculated values - render_task: Pass actual durations to VTT retimer This ensures subtitle timing matches the real video timeline regardless of any FFmpeg encoding variations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 15:52:57 -06:00
michael	e44210ea64	feat: auto-rewrite TTS cues that fail synthesis When TTS synthesis fails after 3 retries, the system now: - Sends problematic cue text to Gemini for TTS-safe rewriting - Updates the VTT file in GCS with rewritten text - Retries TTS synthesis with the new text - Records successful rewrites in job.tts_rewrites field UI changes: - JobDetail shows amber caution box with original/rewritten text - JobsList shows warning icon next to jobs with rewrites - Error display clarifies text shown is "after rewrite attempt" Files changed: - backend/app/models/job.py: Add tts_rewrites field - backend/app/prompts/gemini_tts_rewrite.md: New prompt template - backend/app/services/gemini.py: Add rewrite_tts_cue method - backend/app/tasks/tts_synthesis.py: Add VTT update utilities - backend/app/tasks/translate_and_synthesize.py: Rewrite+retry logic - frontend/src/types/api.ts: Add TTSRewriteItem type - frontend/src/routes/jobs/JobDetail.tsx: Caution display - frontend/src/routes/jobs/JobsList.tsx: Warning indicator 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 14:42:50 -06:00
michael	8606877d01	fix: properly set tts_failed status when TTS synthesis fails The TTS error handling had a bug where failed jobs stayed in 'tts_generating' status instead of being set to 'tts_failed'. Root cause: synthesize_cue_task used autoretry_for=(Exception,) which raises the original exception after max retries, not MaxRetriesExceededError. The exception handler never fired. Changes: - tts_synthesis.py: Replace autoretry_for with manual retry logic that returns a failure dict on final failure instead of raising - translate_and_synthesize.py: Add propagate=False to group.get() to safely retrieve all results including failures - translate_and_synthesize.py: Update outer exception handler to set job status to tts_failed, store error details, and broadcast status update via WebSocket Now TTS failures will: 1. Set job status to 'tts_failed' 2. Store detailed error info (cue index, text, message) 3. Show error in UI with retry button 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 10:45:33 -06:00
michael	c512bdc184	feat: use AD VTT pause points instead of Gemini video analysis Optimize the accessible video workflow by eliminating the dedicated Gemini video analysis call for pause point estimation. Instead: - Use AD VTT cue start times as initial pause points for Whisper refinement - Add user-selectable accessible video method (pause_insert/overlay) at QC approval - Add bulk approval API endpoint with method selection - Add method selector UI to QCDetail page - Add bulk approval modal to QCList for jobs with accessible video Benefits: - Eliminates expensive Gemini API call with video upload - Faster workflow (~5-15 seconds saved per job) - Cost savings on Gemini video analysis - User control over accessible video integration method Backend changes: - Add accessible_video_method to RequestedOutputs and ApproveSourceRequest - Add POST /jobs/bulk/approve endpoint - Replace Gemini call with _build_placements_from_ad_vtt() helper - Mark analyze_accessible_video_placement() as deprecated Frontend changes: - Add method selector radio buttons to QCDetail - Add bulk approval modal with method selection to QCList - Update API client and React Query hooks 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 19:05:45 -06:00
michael	7d2366d0f4	fix: add authentication for Cloud Run service calls Cloud Run services are deployed with --no-allow-unauthenticated, requiring an ID token in the Authorization header. - Add _get_cloud_run_id_token() helper using google-auth library - Update whisper_transcribe.py to include Bearer token in Cloud Run calls - Update video_renderer.py to include Bearer token in FFmpeg Cloud Run calls The ID token is fetched using the service account credentials (GOOGLE_APPLICATION_CREDENTIALS) and targets the Cloud Run service URL. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-02 11:41:07 -06:00
michael	79440929f4	feat: add Cloud Run HTTP services for Whisper and FFmpeg Migrate CPU-intensive workloads to Cloud Run for autoscaling: - Add Whisper HTTP service (FastAPI) with /transcribe endpoint - Add FFmpeg HTTP service (FastAPI) with /encode, /probe, /extract-frame, etc. - Add Dockerfiles for both services (8 vCPU, 32GB RAM, Gen2) - Add Cloud Build config for CI/CD deployment - Add Cloud Run service YAML configs with scale-to-zero - Update whisper_transcribe.py to call Cloud Run when WHISPER_SERVICE_URL set - Update video_renderer.py to call Cloud Run when FFMPEG_SERVICE_URL set - Update whisper_service.py for Cloud Run compatibility (no settings dependency) - Add ffmpeg_service_url and whisper_service_url to config.py Services scale 0→N based on request load, falling back to local execution when service URLs are not configured (hybrid mode). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-02 10:12:50 -06:00
michael	c1c0b876fc	feat: add RENDER_FAILED status with error propagation to GUI - Add RENDER_FAILED job status for when video rendering fails - Fix _check_accessible_video_completion to detect failures and transition job status accordingly (was stuck in RENDERING_VIDEO forever) - Store detailed error info in job.error including failed_languages array - Call completion check after failures to properly update job status - Broadcast WebSocket notification on render failures Frontend: - Add render_failed to JobStatus type and StatusBadge (red styling) - Add tts_failed and render_failed to JobsList STATUS_LABELS - Enhance JobDetail error display with: - Warning icon and prominent styling - Error type and message - Failed languages list with per-language errors - Timestamp of when error occurred - Update ProgressIndicator to handle failed states with red dot 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-01 10:18:27 -06:00
michael	77be93b526	perf: parallelize video-native translations with asyncio.gather Video-native translation mode now processes all target languages in parallel using asyncio.gather() with a semaphore (max 3 concurrent) for rate limiting. This significantly reduces total translation time when multiple languages are selected. - Add MAX_CONCURRENT_VIDEO_NATIVE constant for rate limiting - Refactor video-native path to use parallel coroutines - Keep traditional VTT translation mode sequential - Handle per-language errors without stopping other translations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-01 09:21:07 -06:00
michael	d2d8e32819	feat: add video-native translation mode for multi-language content Add a new "Video Native Mode" translation option that re-processes the video through Gemini for each target language, generating captions and audio descriptions directly from visual context. This produces more natural and culturally appropriate content compared to traditional VTT text translation. Changes: - Add translation_mode field to RequestedOutputs (video_native | traditional) - Create gemini_ingestion_targeted.md prompt for target language generation - Add extract_accessibility_targeted() method to Gemini service - Modify translate_and_synthesize task to handle both translation modes - Add Translation Mode UI selector in NewJob screen (video_native is default) - Remove transcreation UI (replaced by video_native mode) - Remove Google Translate service (replaced by Gemini translation) - Add LanguageSelector component with searchable dropdown 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-31 13:50:05 -06:00
michael	6689778be7	feat: add dedicated TTS worker with parallel per-cue synthesis Break out TTS synthesis into a dedicated Celery worker (tts queue) with concurrency=8 for parallel processing. Each AD cue is now synthesized as a separate task, enabling up to 8 cues to be processed simultaneously. Key changes: - Add tts_synthesis.py with synthesize_cue_task for per-cue synthesis - Refactor translate_and_synthesize.py to dispatch cue tasks in parallel - Add tts-worker service to docker-compose.yml (concurrency=8) - Add Cloud Run service config for production deployment Benefits: - Parallel synthesis even for single jobs (e.g., 50 cues → 8 concurrent) - Natural rate limiting across multiple concurrent jobs - Fault tolerance with per-cue retries and GCS persistence 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-30 14:23:11 -06:00
michael	3588d3fa14	refactor: rewrite pause point refinement algorithm with ordered logic Completely rewrites the Whisper-based pause point refinement to use a two-phase approach with explicit ordering: Phase 1 - Individual refinement: 1. Check if pause point is "during speaking" (words within ±2s) - If NOT during speaking → use Gemini's exact point, no overlap 2. If during speaking, find nearest sentence boundary 3. Apply appropriate buffering based on context: - Case A: First sentence → pause 500ms before sentence starts - Case B: Last sentence → pause 500ms after sentence ends - Case C: Between sentences → full double buffer (overlap) Phase 2 - Consolidation (after all refinements): - Consolidate cues within 5s of each other to play back-to-back Key changes: - Add SentenceBoundary dataclass for tracking boundaries with context - Add _is_during_speaking() helper to detect speech proximity - Add _find_sentence_boundaries() with longest-gap fallback - Rewrite snap_pause_point() with new ordered algorithm - Update refine_all_pause_points() to pass words and use two phases 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-29 08:19:03 -06:00
michael	ee6a30e7a7	feat: always generate fresh Whisper transcripts (disable caching) Remove the cached transcript lookup - always run a fresh Whisper transcription for each accessible video render. This ensures we get accurate word timestamps for the current video file. The transcript is still saved to the job document for debugging and auditing purposes, but it will never be read back for reuse. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 23:25:35 -06:00
michael	1c22872e69	fix: use dedicated whisper worker with FFmpeg dispatch pattern Changed the Whisper transcription to run on dedicated whisper-worker using the same dispatch pattern as FFmpeg: 1. apply_async() to dispatch to the whisper queue 2. Poll with ready() using async sleep to avoid blocking 3. Use allow_join_result() context manager 4. Get result only after task is ready This ensures Whisper runs with concurrency=1 on a dedicated worker to prevent memory overload while still allowing the render task to wait for results without deadlocking. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 08:53:53 -06:00
michael	7b0ebb357c	fix: run Whisper transcription inline instead of as subtask Celery does not allow calling result.get() within a task as it causes deadlocks. Changed the implementation to run Whisper transcription directly using asyncio.to_thread() instead of dispatching to a separate Celery queue. The Whisper transcript is still cached in MongoDB for reuse across language variants. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 08:48:41 -06:00
michael	05bde8326d	feat: add Whisper-based pause point refinement for audio descriptions Implements word-level speech analysis using faster-whisper to refine AD pause points. Gemini's timestamps are snapped to natural speech gaps (sentence/phrase boundaries) to prevent pauses mid-word. Key changes: - Add WhisperService for transcription and gap detection - Add dedicated Celery task routed to 'whisper' queue - Integrate refinement into render_accessible_video task - Cache Whisper transcripts in MongoDB for reuse across languages - Add dedicated whisper-worker with concurrency=1 to prevent OOM Configuration: - Uses faster-whisper 'base' model (multilingual, ~145MB) - 5-second search window after Gemini's recommended point - Falls back to original timestamp if no gap found Infrastructure: - New Docker stage: whisper-worker - New Cloud Run service: accessible-video-whisper-worker - Updated docker-compose.yml with whisper-worker service 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 08:27:48 -06:00
michael	54799f4662	fix: broadcast WebSocket updates for ingesting and ai_processing status Previously only the final pending_qc status was broadcast via WebSocket. Now all intermediate status changes (ingesting, ai_processing) are also broadcast so the frontend can update in real-time during reprocessing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 07:38:25 -06:00
michael	396e4e74e0	feat: add rendering_video status for accessible video processing When jobs with accessible video option enabled enter video rendering phase, the status now transitions to 'rendering_video' so users can see why processing is taking longer. This provides better visibility into the video rendering pipeline. Changes: - Added RENDERING_VIDEO status to JobStatus enum - Updated render_accessible_video task to set new status - Added status display to StatusBadge, jobStatusMessages - Included new status in JobsList Translation filter group 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 06:49:46 -06:00
michael	901083b426	fix: ensure temp files use shared volume with correct permissions - Modified render_accessible_video.py to explicitly pass TMPDIR to tempfile.TemporaryDirectory() so files are created in shared volume - Updated docker-compose.yml to run containers as root initially, chown /shared-tmp to app:app, then switch to app user for celery - This ensures both worker containers can access the same temp files 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 06:15:45 -06:00
michael	bf1c321088	feat: add dedicated ffmpeg queue to prevent server overload Add a dedicated Celery queue (ffmpeg) with concurrency=1 to serialize all FFmpeg operations. This prevents CPU spikes when multiple render tasks run in parallel with multiple languages. Changes: - Add ffmpeg_operations.py with run_ffmpeg_command and run_ffprobe_command tasks - Update VideoRendererService to dispatch ffmpeg commands via the queue - Add ffmpeg-worker service to docker-compose with --concurrency=1 - Configure main worker to exclude the ffmpeg queue 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 17:56:23 -06:00
michael	80d3866d32	feat: add accessible video (MP4 with embedded audio descriptions) Add new deliverable type that renders video with audio descriptions embedded. Supports two AI-determined methods: - Direct Overlay: ducks original audio and overlays AD TTS (for minimal dialogue) - Pause-Insert: freeze-frame video, insert AD, re-time subtitles (for significant dialogue) Backend: - Add Pydantic schemas for Gemini analysis response - Add Gemini prompt and analyze_accessible_video_placement() method - Add video_renderer.py service using FFmpeg for both rendering methods - Add vtt_retimer.py service for pause-insert subtitle adjustment - Add render_accessible_video.py Celery task - Modify TTS service to return individual per-cue segments - Update translate_and_synthesize.py to save segments and trigger rendering - Update download endpoint to include accessible video outputs Frontend: - Add accessible_video_mp4 checkbox to NewJob form - Update TypeScript types for new deliverable 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 11:06:41 -06:00
michael	865fcdc246	feat: add TTS settings panel with model, speed, and style options - Add model selection (flash vs pro) for quality control - Add speed slider (0.5x - 2.0x) for pacing adjustment - Add style presets (neutral, calm, energetic, professional, warm, documentary) - Add custom style prompt option for advanced customization - New /tts/options endpoint returns available TTS options - Voice preview now tests all settings so users hear exact output - Backward compatible: all new fields have sensible defaults 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 15:22:14 -06:00
michael	29643f6683	upgrade TTS to Gemini TTS with voice selection and preview - Add Gemini TTS service with 30 voices and 24 languages - Add TTS API endpoints for voice listing and preview - Add per-language voice selection in job creation form - Add voice override at QC approval stage - Add VoiceSelector and VoicePreviewButton components - Update TTSPreferences model with provider and voice mapping 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 14:41:57 -06:00
michael	58a4f1f627	add support for non-English original video uploads - Upload form now has "English / Different language" radio with optional language hint - Gemini auto-detects language and saves outputs to outputs.{detected_language} - QC review dynamically loads/saves VTT for source language - New APPROVED_SOURCE status for non-English videos (APPROVED_ENGLISH kept for backwards compat) - Translation pipeline reads from source language and passes source_language to Google Translate - All existing English jobs continue to work unchanged 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 10:33:58 -06:00
michael	762d7bcb38	fixed websockets live messaging for updates	2025-10-16 11:46:37 -05:00
michael	de61d0bd39	removed mongodb change stream monitoring, added global websockets monitoring for notifications, broke symmetry between toasts and persistent notifications (and refined which notifications get sent and how)	2025-08-25 15:48:18 -05:00
michael	0c54dd4f29	added websockets for live job status updates with toast notifications on job list page	2025-08-24 19:41:23 -05:00
michael	af2562096a	initial commit	2025-08-24 16:28:33 -05:00

40 commits
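
Commit 030f1b67ee in the log above describes a forward monotonicity pass that clamps each AD cue pause_point to be no earlier than the previous one, so that sorting by pause_point cannot reorder cues. The following is a minimal Python sketch of that clamp, assuming pause points are held as a list of millisecond integers in cue order. The function name `enforce_monotonic_pause_points` and the sample values are hypothetical and are not taken from the repository.

```python
from typing import List


def enforce_monotonic_pause_points(pause_points_ms: List[int]) -> List[int]:
    """Return a copy in which no pause point precedes the one before it.

    Hypothetical helper: the repository's actual implementation may differ.
    """
    clamped: List[int] = []
    for point in pause_points_ms:
        # Clamp a point that moved backwards up to the previous cue's value.
        if clamped and point < clamped[-1]:
            point = clamped[-1]
        clamped.append(point)
    return clamped


# Illustrative input: the third cue was snapped to before the second one.
snapped = [1200, 4800, 4300, 9000]
print(enforce_monotonic_pause_points(snapped))
```

Because the pass only ever raises a value to match its predecessor, an already ordered list is returned unchanged, which is what makes it safe to repeat at several layers as the commit message describes.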