video-accessibility

Author	SHA1	Message	Date
michael	d965d1467a	fix: use rendered video coordinates for pause point positions Pause points were being stored with source video timestamps instead of rendered video timeline coordinates. This caused misalignment between the pause point markers and freeze frame segments in the timeline UI. Now pause points are calculated from the freeze frame segment start positions in the rendered timeline, ensuring they align correctly with the AD audio segments. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-11 09:37:31 -06:00
michael	aa6777d2c2	feat: add QC accessible video review and editing capabilities - Reorder workflow: translations now happen BEFORE QC Review step - Add language tabs to switch between translated languages in QC - Add video mode tabs (Original Video / Accessible Video) - Add interactive timeline preview showing video segments and AD cues - Enable pause point adjustment with millisecond precision - Add TTS regeneration queue for selective cue re-synthesis - Add re-render controls with optional Whisper refinement - Persist video segments and TTS MP3s to GCS for editability - Add new RENDERING_QC job status for re-render operations - Create 5 new API endpoints for accessible video editing - Add rerender_accessible_video.py Celery task Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-11 08:32:27 -06:00
michael	c5f59b1079	fix: use local ffprobe for freeze segment duration measurement The previous implementation incorrectly used _get_video_duration which in Cloud Run mode uses the cached source video URI instead of actually measuring the freeze segment files. This caused all freeze segments to report the source video duration (~78s) instead of their actual duration. Changed to use _get_video_duration_local directly since freeze segments are local files and need to be measured directly via ffprobe. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 16:11:03 -06:00
michael	add958008a	fix: use actual freeze segment durations for VTT subtitle retiming Subtitles were appearing progressively out of sync (~1.0s early per AD) because the VTT retimer calculated freeze durations theoretically rather than using actual rendered segment durations. Changes: - video_renderer: Measure actual freeze segment duration after creation - video_renderer: Return updated placements with actual_freeze_duration - vtt_retimer: Prefer actual_freeze_duration over calculated values - render_task: Pass actual durations to VTT retimer This ensures subtitle timing matches the real video timeline regardless of any FFmpeg encoding variations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 15:52:57 -06:00
michael	e44210ea64	feat: auto-rewrite TTS cues that fail synthesis When TTS synthesis fails after 3 retries, the system now: - Sends problematic cue text to Gemini for TTS-safe rewriting - Updates the VTT file in GCS with rewritten text - Retries TTS synthesis with the new text - Records successful rewrites in job.tts_rewrites field UI changes: - JobDetail shows amber caution box with original/rewritten text - JobsList shows warning icon next to jobs with rewrites - Error display clarifies text shown is "after rewrite attempt" Files changed: - backend/app/models/job.py: Add tts_rewrites field - backend/app/prompts/gemini_tts_rewrite.md: New prompt template - backend/app/services/gemini.py: Add rewrite_tts_cue method - backend/app/tasks/tts_synthesis.py: Add VTT update utilities - backend/app/tasks/translate_and_synthesize.py: Rewrite+retry logic - frontend/src/types/api.ts: Add TTSRewriteItem type - frontend/src/routes/jobs/JobDetail.tsx: Caution display - frontend/src/routes/jobs/JobsList.tsx: Warning indicator 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 14:42:50 -06:00
michael	83e4752327	feat: add server-side zip download for bulk job downloads Replace sequential browser-based bulk download with server-side zip generation. When users select "Download All Files" from bulk actions, the system now creates a single organized .zip file containing all job assets. Changes: - Add POST /jobs/bulk/download endpoint that streams zip to client - Add BulkDownloadRequest schema for the new endpoint - Create zip_download.py service with streaming zip generation - Update frontend to call new endpoint and download single zip file - Organize files in zip by job title and language subdirectories Zip structure: accessible_video_YYYYMMDD_HHMMSS.zip └── {job_title}/ ├── source.mp4 └── {lang}/ ├── captions.vtt ├── ad.vtt ├── ad.mp3 ├── accessible_video.mp4 └── accessible_captions.vtt 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 15:57:57 -06:00
michael	c512bdc184	feat: use AD VTT pause points instead of Gemini video analysis Optimize the accessible video workflow by eliminating the dedicated Gemini video analysis call for pause point estimation. Instead: - Use AD VTT cue start times as initial pause points for Whisper refinement - Add user-selectable accessible video method (pause_insert/overlay) at QC approval - Add bulk approval API endpoint with method selection - Add method selector UI to QCDetail page - Add bulk approval modal to QCList for jobs with accessible video Benefits: - Eliminates expensive Gemini API call with video upload - Faster workflow (~5-15 seconds saved per job) - Cost savings on Gemini video analysis - User control over accessible video integration method Backend changes: - Add accessible_video_method to RequestedOutputs and ApproveSourceRequest - Add POST /jobs/bulk/approve endpoint - Replace Gemini call with _build_placements_from_ad_vtt() helper - Mark analyze_accessible_video_placement() as deprecated Frontend changes: - Add method selector radio buttons to QCDetail - Add bulk approval modal with method selection to QCList - Update API client and React Query hooks 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 19:05:45 -06:00
michael	5342ab1a28	fix: prevent event loop closed error in video renderer Cloud Run calls Use context manager for AsyncClient instead of caching on singleton. Each asyncio.run() creates a new event loop, so cached clients bound to previous event loops fail when reused across jobs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 08:44:27 -06:00
michael	3e2099515a	fix: use async httpx client for true parallel Cloud Run calls Changed from httpx.Client (sync) to httpx.AsyncClient so that asyncio.gather() actually executes HTTP calls in parallel instead of blocking the event loop sequentially. Before: ~5 min for 18 segments (serial HTTP calls despite gather) After: ~30 sec for 18 segments (truly parallel HTTP calls) Changes: - _http_client: httpx.Client -> httpx.AsyncClient - _call_cloud_run_probe: sync -> async - _call_cloud_run_endpoint: sync -> async - Added await to all Cloud Run HTTP calls 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 08:11:46 -06:00
michael	e2302d497d	perf: parallelize FFmpeg Cloud Run calls using asyncio.gather() Refactored _render_pause_insert to execute FFmpeg operations in parallel phases instead of sequentially: Phase 1: Parallel extraction - Generate shared silence (once, reused by all) - Extract ALL video segments simultaneously - Extract ALL freeze frames simultaneously - Extract final segment Phase 2: Parallel audio concatenation - Concatenate ALL audio tracks (silence + AD + silence) simultaneously Phase 3: Parallel freeze segment creation - Create ALL freeze segments simultaneously Phase 4: Assemble segments in correct order for final concatenation This reduces render time from ~3 minutes (serial) to ~30 seconds (parallel) for an 8-cue video when using Cloud Run FFmpeg service. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-02 17:18:23 -06:00
michael	87a4b1ab77	fix: use command_template instead of ffmpeg_args in _generate_silence_cloud_run The /run-ffmpeg Cloud Run endpoint expects command_template field with ffmpeg command placeholders, not ffmpeg_args. This fixes 422 validation errors when generating silence audio via Cloud Run. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-02 16:57:17 -06:00
michael	e68bac2f60	fix: correct FFmpeg probe request parameter name The /probe endpoint expects 'gcs_uri' but we were sending 'source_gcs_uri'. Fixed to match the ProbeRequest model in ffmpeg_http_service.py. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-02 16:43:01 -06:00
michael	7d2366d0f4	fix: add authentication for Cloud Run service calls Cloud Run services are deployed with --no-allow-unauthenticated, requiring an ID token in the Authorization header. - Add _get_cloud_run_id_token() helper using google-auth library - Update whisper_transcribe.py to include Bearer token in Cloud Run calls - Update video_renderer.py to include Bearer token in FFmpeg Cloud Run calls The ID token is fetched using the service account credentials (GOOGLE_APPLICATION_CREDENTIALS) and targets the Cloud Run service URL. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-02 11:41:07 -06:00
michael	79440929f4	feat: add Cloud Run HTTP services for Whisper and FFmpeg Migrate CPU-intensive workloads to Cloud Run for autoscaling: - Add Whisper HTTP service (FastAPI) with /transcribe endpoint - Add FFmpeg HTTP service (FastAPI) with /encode, /probe, /extract-frame, etc. - Add Dockerfiles for both services (8 vCPU, 32GB RAM, Gen2) - Add Cloud Build config for CI/CD deployment - Add Cloud Run service YAML configs with scale-to-zero - Update whisper_transcribe.py to call Cloud Run when WHISPER_SERVICE_URL set - Update video_renderer.py to call Cloud Run when FFMPEG_SERVICE_URL set - Update whisper_service.py for Cloud Run compatibility (no settings dependency) - Add ffmpeg_service_url and whisper_service_url to config.py Services scale 0→N based on request load, falling back to local execution when service URLs are not configured (hybrid mode). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-02 10:12:50 -06:00
michael	d2d8e32819	feat: add video-native translation mode for multi-language content Add a new "Video Native Mode" translation option that re-processes the video through Gemini for each target language, generating captions and audio descriptions directly from visual context. This produces more natural and culturally appropriate content compared to traditional VTT text translation. Changes: - Add translation_mode field to RequestedOutputs (video_native | traditional) - Create gemini_ingestion_targeted.md prompt for target language generation - Add extract_accessibility_targeted() method to Gemini service - Modify translate_and_synthesize task to handle both translation modes - Add Translation Mode UI selector in NewJob screen (video_native is default) - Remove transcreation UI (replaced by video_native mode) - Remove Google Translate service (replaced by Gemini translation) - Add LanguageSelector component with searchable dropdown 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-31 13:50:05 -06:00
michael	e8b940aee8	feat: add TTS_FAILED status and robust error handling for TTS synthesis Add comprehensive error handling for TTS synthesis failures: Backend: - Add TTS_FAILED status to JobStatus enum for failed synthesis jobs - Add TTSSynthesisError exception with cue index and context tracking - Improve null-safe error handling in Gemini TTS response parsing - Add _synthesize_cue_with_retry() with exponential backoff (3 attempts) - Enhanced error logging with text preview and model context Frontend: - Add TTS_FAILED status styling (red badge) in StatusBadge component - Add tts_failed to JobStatus TypeScript type 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-30 14:26:07 -06:00
michael	b11c3d0d4f	fix: rewrite VTT retiming algorithm to prevent captions during AD segments The VTT retimer had two bugs causing subtitles to display during freeze periods and become out of sync: 1. Same offset applied to both start and end times (should differ when pause falls between them) 2. Cues spanning pause points weren't split (causing captions during freeze) Changes: - Add _offset_at() for timestamps AT or AFTER pause points - Add _offset_before() for timestamps STRICTLY BEFORE pause points - Add _retime_cue() to split cues at pause points into multiple segments - Add _filter_short_segments() to remove <100ms segments after splitting - Rewrite retime_for_pause_insert() to use new helper methods Example fix for cue 8s-12s with pause at 10s (4s freeze): - Before: 8s-12s (displayed during freeze!) - After: 8s-10s + 14s-16s (gap during AD) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-30 09:01:03 -06:00
michael	37593dd4bc	refactor: simplify pause point algorithm with midpoint snapping and silence buffers Replace complex overlap/catch-up logic with simpler approach: - Snap pause points to midpoint between sentences (not sentence boundaries) - Add 500ms silence before AND after AD audio during freeze frame - Resume playback from same midpoint (no overlap, no visual jump-back) This eliminates audio/visual anomalies caused by the previous algorithm's complexity around sentence boundary snapping and audio catch-up. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-29 09:55:40 -06:00
michael	37f5e8d1b0	fix: validate pause points and frame extraction - Get video duration BEFORE the render loop (not after) - Clamp pause_point to 100ms before video end if it exceeds duration - Add validation in _extract_frame() to verify frame was created - Add debug logging for frame extraction timestamps This prevents "Frame file not found" errors when pause points calculated by Whisper refinement exceed the source video duration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-29 09:32:13 -06:00
michael	40ece78652	feat: implement audio catch-up to eliminate visual jump-back artifact When a pause point falls between two sentences, the previous algorithm created a visual jump-back where the video rewound to resume_from after the AD played. This was distracting to viewers. New behavior: - Video plays normally to pause_point - Freeze frame shows + AD audio plays - Freeze frame CONTINUES while source audio from [resume_from, pause_point] plays (the "catch-up" audio) - Video resumes smoothly from pause_point (no visual jump) The audio from the overlap region plays twice (once during video, once during freeze extension) but this is acceptable as it's typically <1s and provides natural audio context around the AD. Implementation: - Add _extract_audio_segment() to extract catch-up audio from source - Add _concatenate_audio() to join AD + catch-up audio - Modify render loop to create extended freeze segments with combined audio - Resume video from pause_point instead of resume_from 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-29 09:02:46 -06:00
michael	ce7a1b182f	fix: improve FFmpeg error reporting and add input validation - Show last 1500 chars of stderr instead of first 500 to capture actual error messages (FFmpeg writes version banner first, errors at end) - Add validation for freeze segment creation: - Check duration > 0 - Verify frame and audio files exist - Add debug logging for parameters This helps diagnose FFmpeg failures that were previously showing only version/configuration info instead of the actual error. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-29 08:41:32 -06:00
michael	3588d3fa14	refactor: rewrite pause point refinement algorithm with ordered logic Completely rewrites the Whisper-based pause point refinement to use a two-phase approach with explicit ordering: Phase 1 - Individual refinement: 1. Check if pause point is "during speaking" (words within ±2s) - If NOT during speaking → use Gemini's exact point, no overlap 2. If during speaking, find nearest sentence boundary 3. Apply appropriate buffering based on context: - Case A: First sentence → pause 500ms before sentence starts - Case B: Last sentence → pause 500ms after sentence ends - Case C: Between sentences → full double buffer (overlap) Phase 2 - Consolidation (after all refinements): - Consolidate cues within 5s of each other to play back-to-back Key changes: - Add SentenceBoundary dataclass for tracking boundaries with context - Add _is_during_speaking() helper to detect speech proximity - Add _find_sentence_boundaries() with longest-gap fallback - Rewrite snap_pause_point() with new ordered algorithm - Update refine_all_pause_points() to pass words and use two phases 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-29 08:19:03 -06:00
michael	d092800676	fix: treat consolidated AD cues as single segment for buffering Previously, all consolidated cues shared the same pause_point AND resume_from, which caused the overlap video segment to play between each AD cue in a consolidated group. Now consolidated cues are treated as a single AD segment: - All cues in a group share the same pause_point (front buffer once) - Only the LAST cue keeps resume_from (back buffer once) - Other cues have resume_from = pause_point (no video between ADs) This ensures consolidated ADs play seamlessly back-to-back: - Video plays up to pause_point (front buffer) - AD_1 plays - AD_2 plays immediately (no video) - AD_n plays immediately (no video) - Video resumes from resume_from (back buffer) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 23:33:15 -06:00
michael	a3b4db104a	feat: use Gemini's exact pause point for non-dialogue sections When the Whisper analysis detects no speech near a Gemini-recommended pause point, skip the full-gap-overlap algorithm and use the exact pause point with no overlap (pause_point == resume_from). This handles cases where Gemini chose a pause point in a silent or music-only section of the video - there's no dialogue to buffer around, so we simply pause and resume at the exact same point. Three outcomes now in snap_pause_point(): 1. No speech nearby → exact pause point, no overlap, no warning 2. Speech but no sentence break → warning (existing behavior) 3. Sentence break found → full-gap-overlap (existing behavior) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 23:13:12 -06:00
michael	12cae0919a	feat: implement full-gap-overlap algorithm for AD pause insertion Changes pause point calculation to use the entire gap between sentences as a buffer on BOTH sides of the audio description: - pause_point: Just BEFORE next sentence starts (gap.end - 50ms) - resume_from: Just AFTER previous sentence ends (gap.start + 50ms) This means a small portion of video plays twice (the gap duration), but creates a much more natural listening experience by maximizing the breathing room around audio descriptions. Changes: - whisper_service.py: snap_pause_point() now returns (pause_point, resume_from) - video_renderer.py: Uses resume_from for current_time after freeze segment - vtt_retimer.py: Calculates effective_offset including overlap duration - accessible_video.py: Added resume_from field to ADPlacementCue schema 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 22:51:49 -06:00
michael	dd7ac2e15c	debug: add logging for pause-insert video rendering Logs pause point placements, segment creation, and final segment calculation to help diagnose the 30s black footage issue. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 22:39:33 -06:00
michael	504e525a1f	feat: dynamic pause point buffer based on gap duration Instead of a fixed 175ms buffer, the pause point is now placed halfway between the end of the sentence and the start of the next word. If the half-gap exceeds 2 seconds, uses 500ms instead. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 22:33:07 -06:00
michael	01c96da95c	feat: use all available CPU cores for Whisper transcription Dynamically detects CPU count with os.cpu_count() instead of hardcoded 4 threads. Falls back to 4 if detection fails. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 21:59:31 -06:00
michael	dc78dc6fb5	feat: add detailed logging for Whisper model and processing time - Log model name explicitly when loading and transcribing - Log model load time - Log transcription processing time - Helps verify correct model is being used 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 21:58:41 -06:00
michael	582f9f066e	feat: expand Whisper search window to ±30s for sentence boundaries 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 17:34:30 -06:00
michael	c605cd1a88	feat: consolidate AD cues with pause points within 5s of each other If consecutive AD cues have pause points within 5 seconds, they now play back-to-back at the same pause point. This prevents AD from being inserted mid-sentence when cues are close together. Adds _consolidate_close_cues() method and consolidation_threshold parameter to refine_all_pause_points(). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 16:15:52 -06:00
michael	0647c9c112	feat: expand Whisper search window to ±20s for sentence boundaries Increases the search window from ±10s to ±20s to maximize the chance of finding a proper sentence ending and avoid falling back to Gemini's potentially imprecise pause points. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 15:51:42 -06:00
michael	407cc662e8	fix: insert first AD cue at video start if no sentence break found When the first AD cue (index 0) cannot find a sentence boundary within the ±10s search window, insert the AD at T=0:00 instead of using the potentially mid-sentence Gemini pause point. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 15:27:04 -06:00
michael	8806289eca	feat: improve pause point precision with sentence boundary detection - Update Gemini prompt to require transcription with precise timestamps - Add sentence_boundaries output field for validation - Add pause_point_rationale field to explain each pause point choice - Emphasize terminal punctuation only (., ?, !) - never commas - Expand Whisper search window from ±5s to ±10s - Increase post-pause buffer from 50ms to 175ms 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 14:41:12 -06:00
michael	e25a0d6ad0	feat: search +/-5s for sentence breaks only (no phrase breaks) Updated pause point algorithm: - Search range: 5 seconds BEFORE to 5 seconds AFTER Gemini pause point - ONLY considers sentence breaks (after periods, !, ?) - not phrase breaks - Chooses the closest sentence break to the Gemini pause point This ensures audio descriptions are inserted at natural sentence boundaries, not in the middle of sentences after commas. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 09:28:02 -06:00
michael	7760da1e1c	fix: snap pause point to start of gap instead of end The pause point algorithm was snapping to gap.end (start of next word), which caused the first word after the pause to be cut off. Changed to snap to gap.start (end of previous word) instead. Now the video pauses right after a word finishes, the AD plays during the silence gap, and the next word plays in full when video resumes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 09:07:03 -06:00
michael	05bde8326d	feat: add Whisper-based pause point refinement for audio descriptions Implements word-level speech analysis using faster-whisper to refine AD pause points. Gemini's timestamps are snapped to natural speech gaps (sentence/phrase boundaries) to prevent pauses mid-word. Key changes: - Add WhisperService for transcription and gap detection - Add dedicated Celery task routed to 'whisper' queue - Integrate refinement into render_accessible_video task - Cache Whisper transcripts in MongoDB for reuse across languages - Add dedicated whisper-worker with concurrency=1 to prevent OOM Configuration: - Uses faster-whisper 'base' model (multilingual, ~145MB) - 5-second search window after Gemini's recommended point - Falls back to original timestamp if no gap found Infrastructure: - New Docker stage: whisper-worker - New Cloud Run service: accessible-video-whisper-worker - Updated docker-compose.yml with whisper-worker service 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 08:27:48 -06:00
michael	e5ff124140	fix: use allow_join_result for celery subtask result retrieval Celery doesn't allow calling result.get() within a task by default to prevent deadlocks. Use allow_join_result() context manager since we've already confirmed the task is complete via ready() polling. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 18:09:37 -06:00
michael	bf1c321088	feat: add dedicated ffmpeg queue to prevent server overload Add a dedicated Celery queue (ffmpeg) with concurrency=1 to serialize all FFmpeg operations. This prevents CPU spikes when multiple render tasks run in parallel with multiple languages. Changes: - Add ffmpeg_operations.py with run_ffmpeg_command and run_ffprobe_command tasks - Update VideoRendererService to dispatch ffmpeg commands via the queue - Add ffmpeg-worker service to docker-compose with --concurrency=1 - Configure main worker to exclude the ffmpeg queue 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 17:56:23 -06:00
michael	fd68d1ef54	feat: add accessible video validation, remove AI confidence check - Add validation for accessible_video_gcs (file exists, size 0.1MB-5GB) - Add validation for retimed_captions_vtt_gcs when accessible video exists - Add AD Videos count to asset validation panel - Include retimed captions in VTT file count - Remove AI confidence from validation panel and backend checks 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 16:41:57 -06:00
michael	54667fbcb8	fix: resolve audio/video sync issues in accessible video renderer - Update _get_video_properties() to extract audio sample_rate, channels, and pix_fmt in addition to video properties - Add _extract_segment_reencoded() for frame-accurate cuts using re-encoding instead of stream copy (fixes keyframe-only cut limitation) - Add _create_freeze_segment_matched() to enforce source audio property matching (fixes silent pauses caused by sample rate mismatch) - Update _render_pause_insert_method() to use new methods with uniform encoding parameters - Add -video_track_timescale 90000 for consistent timebase across segments Root causes fixed: 1. -c copy could only cut at keyframes, causing audio dropouts 2. Sample rate mismatch (48kHz source vs 44.1kHz MP3) caused silent freeze-frame segments when concatenated 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 12:05:32 -06:00
michael	80d3866d32	feat: add accessible video (MP4 with embedded audio descriptions) Add new deliverable type that renders video with audio descriptions embedded. Supports two AI-determined methods: - Direct Overlay: ducks original audio and overlays AD TTS (for minimal dialogue) - Pause-Insert: freeze-frame video, insert AD, re-time subtitles (for significant dialogue) Backend: - Add Pydantic schemas for Gemini analysis response - Add Gemini prompt and analyze_accessible_video_placement() method - Add video_renderer.py service using FFmpeg for both rendering methods - Add vtt_retimer.py service for pause-insert subtitle adjustment - Add render_accessible_video.py Celery task - Modify TTS service to return individual per-cue segments - Update translate_and_synthesize.py to save segments and trigger rendering - Update download endpoint to include accessible video outputs Frontend: - Add accessible_video_mp4 checkbox to NewJob form - Update TypeScript types for new deliverable 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 11:06:41 -06:00
michael	865fcdc246	feat: add TTS settings panel with model, speed, and style options - Add model selection (flash vs pro) for quality control - Add speed slider (0.5x - 2.0x) for pacing adjustment - Add style presets (neutral, calm, energetic, professional, warm, documentary) - Add custom style prompt option for advanced customization - New /tts/options endpoint returns available TTS options - Voice preview now tests all settings so users hear exact output - Backward compatible: all new fields have sensible defaults 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 15:22:14 -06:00
michael	29643f6683	upgrade TTS to Gemini TTS with voice selection and preview - Add Gemini TTS service with 30 voices and 24 languages - Add TTS API endpoints for voice listing and preview - Add per-language voice selection in job creation form - Add voice override at QC approval stage - Add VoiceSelector and VoicePreviewButton components - Update TTSPreferences model with provider and voice mapping 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 14:41:57 -06:00
michael	46b6f25fd0	upgrade to Gemini 3 Pro preview model - Change model from gemini-2.5-pro to gemini-3-pro-preview - Upgrade google-genai package from ^1.31.0 to ^1.56.0 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 14:02:02 -06:00
michael	58a4f1f627	add support for non-English original video uploads - Upload form now has "English / Different language" radio with optional language hint - Gemini auto-detects language and saves outputs to outputs.{detected_language} - QC review dynamically loads/saves VTT for source language - New APPROVED_SOURCE status for non-English videos (APPROVED_ENGLISH kept for backwards compat) - Translation pipeline reads from source language and passes source_language to Google Translate - All existing English jobs continue to work unchanged 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 10:33:58 -06:00
michael	665b49c3f1	added MSAL microsoft authentication	2025-10-10 09:19:39 -05:00
michael	c2ed1429c9	better tts config for worker	2025-10-08 18:47:28 -05:00
michael	1a1ed3048d	wrote docker files and deployment instructions	2025-10-08 16:00:12 -05:00
michael	de61d0bd39	removed mongodb change stream monitoring, added global websockets monitoring for notifications, broke symmetry between toasts and persistent notifications (and refined which notifications get sent and how)	2025-08-25 15:48:18 -05:00
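
Commit c605cd1a88 describes consolidating AD cues whose pause points fall within 5 seconds of each other so that they play back-to-back at the same pause point. The repository code is not shown on this page, so the following is only a minimal sketch of that idea. The function name, the plain list-of-floats input, and the choice to compare each cue with the cue immediately before it (using "less than or equal to" for the threshold) are assumptions, not the actual `_consolidate_close_cues()` implementation.

```python
from typing import List


def consolidate_close_cues(
    pause_points: List[float], threshold: float = 5.0
) -> List[float]:
    """Return one pause point per cue, merging cues that sit close together.

    Assumes pause_points is sorted ascending (seconds in the source video).
    A cue whose original pause point is within `threshold` seconds of the
    previous cue's original pause point reuses the previous cue's
    (possibly already consolidated) pause point.
    """
    consolidated: List[float] = []
    for i, point in enumerate(pause_points):
        if i > 0 and point - pause_points[i - 1] <= threshold:
            # Close to the previous cue: play back-to-back at the same point.
            consolidated.append(consolidated[-1])
        else:
            # Far enough away: this cue starts a new group.
            consolidated.append(point)
    return consolidated


# Cues at 10s, 13s and 17s chain into one group; the cue at 30s stands alone.
print(consolidate_close_cues([10.0, 13.0, 17.0, 30.0]))
```

Comparing against the previous cue lets a chain of closely spaced cues collapse into a single group. Comparing against the first cue of the group instead would cap how long a group can grow; the commit message does not say which rule the project uses.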

52 commits
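
Commit b11c3d0d4f describes the rewritten VTT retiming rule: timestamps at or after a pause point are shifted by the freeze duration, timestamps strictly before it are not, and cues that span a pause point are split so no caption shows during the freeze. The sketch below illustrates that rule under stated assumptions. The function names mirror the helpers named in the commit message, but the signatures, the `(pause_point, freeze_duration)` tuple layout, and folding the under-100ms filter into the loop are hypothetical and not taken from `vtt_retimer.py`.

```python
from typing import List, Tuple

# (pause point in source-video seconds, freeze duration in seconds)
Pause = Tuple[float, float]

MIN_SEGMENT_SECONDS = 0.1  # segments shorter than 100ms are dropped


def offset_at(t: float, pauses: List[Pause]) -> float:
    """Total freeze time inserted AT or BEFORE source time t."""
    return sum(duration for point, duration in pauses if point <= t)


def offset_before(t: float, pauses: List[Pause]) -> float:
    """Total freeze time inserted STRICTLY BEFORE source time t."""
    return sum(duration for point, duration in pauses if point < t)


def retime_cue(
    start: float, end: float, pauses: List[Pause]
) -> List[Tuple[float, float]]:
    """Map one source cue onto the rendered timeline, splitting at pauses."""
    # Pause points strictly inside the cue force a split.
    cuts = sorted(point for point, _ in pauses if start < point < end)
    bounds = [start, *cuts, end]
    segments: List[Tuple[float, float]] = []
    for seg_start, seg_end in zip(bounds, bounds[1:]):
        # A segment starting on a pause point begins after the freeze;
        # a segment ending on a pause point stops before the freeze.
        new_start = seg_start + offset_at(seg_start, pauses)
        new_end = seg_end + offset_before(seg_end, pauses)
        if new_end - new_start >= MIN_SEGMENT_SECONDS:
            segments.append((new_start, new_end))
    return segments


# The example from the commit message: cue 8s-12s, pause at 10s, 4s freeze.
print(retime_cue(8.0, 12.0, [(10.0, 4.0)]))  # [(8.0, 10.0), (14.0, 16.0)]
```

Using two different offsets for the start and end of each segment is what opens the caption gap during the AD freeze. Later commits (add958008a, c5f59b1079) note that the freeze durations fed into the retimer should be the measured durations of the rendered segments rather than calculated ones.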