video-accessibility

Author	SHA1	Message	Date
michael	dd7ac2e15c	debug: add logging for pause-insert video rendering Logs pause point placements, segment creation, and final segment calculation to help diagnose the 30s black footage issue. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 22:39:33 -06:00
michael	54638d1065	feat: switch Whisper model from large-v3 to medium Medium model is faster and uses less memory (~1.5GB vs ~3GB) while still providing good multilingual transcription quality. Updated in: - config.py - docker-compose.yml - whisper-worker-service.yaml - cloudbuild.yaml - Dockerfile (pre-download) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 22:35:47 -06:00
michael	504e525a1f	feat: dynamic pause point buffer based on gap duration Instead of a fixed 175ms buffer, the pause point is now placed halfway between the end of the sentence and the start of the next word. If the half-gap exceeds 2 seconds, uses 500ms instead. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 22:33:07 -06:00
michael	01c96da95c	feat: use all available CPU cores for Whisper transcription Dynamically detects CPU count with os.cpu_count() instead of hardcoded 4 threads. Falls back to 4 if detection fails. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 21:59:31 -06:00
michael	dc78dc6fb5	feat: add detailed logging for Whisper model and processing time - Log model name explicitly when loading and transcribing - Log model load time - Log transcription processing time - Helps verify correct model is being used 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 21:58:41 -06:00
michael	3538dea47f	fix: update whisper_max_search_window to 30s in config.py The setting in config.py (5.0) was overriding the default in whisper_service.py (30.0). Now both are consistent at 30s. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 21:53:57 -06:00
michael	4f82fad5dd	feat: pre-download Whisper large-v3 model during Docker build Downloads the model (~3GB) at build time to avoid cold start delays. Also updated comment to reflect large-v3 memory usage (~4-6GB). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 21:25:44 -06:00
michael	614ff841fe	feat: upgrade Whisper model from base to large-v3 Uses the multilingual large model for more accurate transcription and sentence boundary detection. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 21:20:03 -06:00
michael	582f9f066e	feat: expand Whisper search window to ±30s for sentence boundaries 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 17:34:30 -06:00
michael	c605cd1a88	feat: consolidate AD cues with pause points within 5s of each other If consecutive AD cues have pause points within 5 seconds, they now play back-to-back at the same pause point. This prevents AD from being inserted mid-sentence when cues are close together. Adds _consolidate_close_cues() method and consolidation_threshold parameter to refine_all_pause_points(). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 16:15:52 -06:00
michael	0647c9c112	feat: expand Whisper search window to ±20s for sentence boundaries Increases the search window from ±10s to ±20s to maximize the chance of finding a proper sentence ending and avoid falling back to Gemini's potentially imprecise pause points. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 15:51:42 -06:00
michael	407cc662e8	fix: insert first AD cue at video start if no sentence break found When the first AD cue (index 0) cannot find a sentence boundary within the ±10s search window, insert the AD at T=0:00 instead of using the potentially mid-sentence Gemini pause point. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 15:27:04 -06:00
michael	8806289eca	feat: improve pause point precision with sentence boundary detection - Update Gemini prompt to require transcription with precise timestamps - Add sentence_boundaries output field for validation - Add pause_point_rationale field to explain each pause point choice - Emphasize terminal punctuation only (., ?, !) - never commas - Expand Whisper search window from ±5s to ±10s - Increase post-pause buffer from 50ms to 175ms 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 14:41:12 -06:00
michael	3df163fd13	refactor: simplify GCS job deletion to use prefix-based cleanup Replace 3-stage redundant deletion with single prefix-based approach. All job files are under {job_id}/ prefix, so listing and deleting by prefix is simpler and catches all files including new types like accessible_video.mp4 and ad_cues/*.mp3. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 10:07:43 -06:00
michael	e25a0d6ad0	feat: search +/-5s for sentence breaks only (no phrase breaks) Updated pause point algorithm: - Search range: 5 seconds BEFORE to 5 seconds AFTER Gemini pause point - ONLY considers sentence breaks (after periods, !, ?) - not phrase breaks - Chooses the closest sentence break to the Gemini pause point This ensures audio descriptions are inserted at natural sentence boundaries, not in the middle of sentences after commas. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 09:28:02 -06:00
michael	523ac85a35	fix: pause at start of gap + add explicit whisper_transcribe import Three fixes: 1. Snap pause point to gap.start (end of previous word) to prevent cutting off the first word after the pause 2. Add explicit whisper_transcribe import in celery_worker.py 3. Fix misleading queue log message 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 09:11:29 -06:00
michael	7760da1e1c	fix: snap pause point to start of gap instead of end The pause point algorithm was snapping to gap.end (start of next word), which caused the first word after the pause to be cut off. Changed to snap to gap.start (end of previous word) instead. Now the video pauses right after a word finishes, the AD plays during the silence gap, and the next word plays in full when video resumes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 09:07:03 -06:00
michael	1c22872e69	fix: use dedicated whisper worker with FFmpeg dispatch pattern Changed the Whisper transcription to run on dedicated whisper-worker using the same dispatch pattern as FFmpeg: 1. apply_async() to dispatch to the whisper queue 2. Poll with ready() using async sleep to avoid blocking 3. Use allow_join_result() context manager 4. Get result only after task is ready This ensures Whisper runs with concurrency=1 on a dedicated worker to prevent memory overload while still allowing the render task to wait for results without deadlocking. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 08:53:53 -06:00
michael	7b0ebb357c	fix: run Whisper transcription inline instead of as subtask Celery does not allow calling result.get() within a task as it causes deadlocks. Changed the implementation to run Whisper transcription directly using asyncio.to_thread() instead of dispatching to a separate Celery queue. The Whisper transcript is still cached in MongoDB for reuse across language variants. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 08:48:41 -06:00
michael	3ca70a7c03	fix: add rendering_video status to MongoDB schema validator The rendering_video status was defined in job.py and frontend types but was missing from the MongoDB schema validator, causing document update failures when jobs transitioned to the rendering_video state. Changes: - Add migration script to update existing databases - Update mongodb-init.js for new database setups 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 08:40:23 -06:00
michael	4d5dceea65	chore: update poetry.lock with faster-whisper dependency 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 08:29:11 -06:00
michael	05bde8326d	feat: add Whisper-based pause point refinement for audio descriptions Implements word-level speech analysis using faster-whisper to refine AD pause points. Gemini's timestamps are snapped to natural speech gaps (sentence/phrase boundaries) to prevent pauses mid-word. Key changes: - Add WhisperService for transcription and gap detection - Add dedicated Celery task routed to 'whisper' queue - Integrate refinement into render_accessible_video task - Cache Whisper transcripts in MongoDB for reuse across languages - Add dedicated whisper-worker with concurrency=1 to prevent OOM Configuration: - Uses faster-whisper 'base' model (multilingual, ~145MB) - 5-second search window after Gemini's recommended point - Falls back to original timestamp if no gap found Infrastructure: - New Docker stage: whisper-worker - New Cloud Run service: accessible-video-whisper-worker - Updated docker-compose.yml with whisper-worker service 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 08:27:48 -06:00
michael	54799f4662	fix: broadcast WebSocket updates for ingesting and ai_processing status Previously only the final pending_qc status was broadcast via WebSocket. Now all intermediate status changes (ingesting, ai_processing) are also broadcast so the frontend can update in real-time during reprocessing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 07:38:25 -06:00
michael	150a3e27bd	fix: include client_id in JobResponse for user filter The Created By filter dropdown was empty because client_id was not being returned by the API. Added client_id to JobResponse schema and included it in the list_jobs response. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 07:28:05 -06:00
michael	46b0f2c092	feat: add filtering, sorting, and table view to All Jobs tab - Add created_by_name field to JobResponse schema and API - Batch-fetch user names in list_jobs endpoint for efficiency - Convert JobsList from card layout to sortable data table - Add search box (job name, filename, created by user) - Add user filter dropdown (populated from current jobs) - Add status filter dropdown (individual statuses from current jobs) - Add date range filter (All Time, Last 7 Days, Last 30 Days) - Add sortable columns: Job Name, Created By, Date Created, Status - Fetch all jobs for full client-side filtering capability - Add responsive horizontal scroll for mobile 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 07:16:21 -06:00
michael	396e4e74e0	feat: add rendering_video status for accessible video processing When jobs with accessible video option enabled enter video rendering phase, the status now transitions to 'rendering_video' so users can see why processing is taking longer. This provides better visibility into the video rendering pipeline. Changes: - Added RENDERING_VIDEO status to JobStatus enum - Updated render_accessible_video task to set new status - Added status display to StatusBadge, jobStatusMessages - Included new status in JobsList Translation filter group 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 06:49:46 -06:00
michael	901083b426	fix: ensure temp files use shared volume with correct permissions - Modified render_accessible_video.py to explicitly pass TMPDIR to tempfile.TemporaryDirectory() so files are created in shared volume - Updated docker-compose.yml to run containers as root initially, chown /shared-tmp to app:app, then switch to app user for celery - This ensures both worker containers can access the same temp files 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 06:15:45 -06:00
michael	e5ff124140	fix: use allow_join_result for celery subtask result retrieval Celery doesn't allow calling result.get() within a task by default to prevent deadlocks. Use allow_join_result() context manager since we've already confirmed the task is complete via ready() polling. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 18:09:37 -06:00
michael	bf1c321088	feat: add dedicated ffmpeg queue to prevent server overload Add a dedicated Celery queue (ffmpeg) with concurrency=1 to serialize all FFmpeg operations. This prevents CPU spikes when multiple render tasks run in parallel with multiple languages. Changes: - Add ffmpeg_operations.py with run_ffmpeg_command and run_ffprobe_command tasks - Update VideoRendererService to dispatch ffmpeg commands via the queue - Add ffmpeg-worker service to docker-compose with --concurrency=1 - Configure main worker to exclude the ffmpeg queue 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 17:56:23 -06:00
michael	fd68d1ef54	feat: add accessible video validation, remove AI confidence check - Add validation for accessible_video_gcs (file exists, size 0.1MB-5GB) - Add validation for retimed_captions_vtt_gcs when accessible video exists - Add AD Videos count to asset validation panel - Include retimed captions in VTT file count - Remove AI confidence from validation panel and backend checks 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 16:41:57 -06:00
michael	3cdea9dfec	fix: video review caption sync and event listener issues - Fix video event listeners not re-attaching when video element remounts (add activeTab?.videoUrl to useEffect dependency array) - Add retimed_captions_vtt to VTT API response for accessible videos - Use retimed captions for accessible video tab in VideoReviewPlayer 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 16:23:48 -06:00
michael	6effe58dc9	feat: add video review with timestamped notes to Final Review Add a comprehensive video review feature to the Final Review page that allows reviewers to watch videos with caption overlays and add timestamped notes. Backend: - New ReviewNote model for MongoDB with job_id, asset_key, timestamp, content - CRUD API endpoints at /jobs/{job_id}/review-notes - Owner-only edit/delete permissions (admins can bypass) - Database indexes for efficient querying Frontend: - VideoReviewPlayer component with video player and caption overlay - NotesSidebar for viewing/adding notes with auto-highlight when video reaches timestamp - SyncedCaptionList with auto-scroll and click-to-seek - AssetTabs for switching between languages and accessible videos - React Query hooks with 30s polling for collaborative updates Features: - Notes persist to database and are shared across all reviewers - Notes highlight for 5 seconds when video playback reaches their timestamp - Click note to seek video to that position - Pause video to add note at current timestamp - Accessible videos use retimed captions when available 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 15:30:00 -06:00
michael	81872987cc	fix: remove accessible_video_method from downloads response The method field (overlay/pause_insert) is metadata, not a downloadable file. Including it in the downloads dict caused the frontend to render a broken download link. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 12:31:30 -06:00
michael	54667fbcb8	fix: resolve audio/video sync issues in accessible video renderer - Update _get_video_properties() to extract audio sample_rate, channels, and pix_fmt in addition to video properties - Add _extract_segment_reencoded() for frame-accurate cuts using re-encoding instead of stream copy (fixes keyframe-only cut limitation) - Add _create_freeze_segment_matched() to enforce source audio property matching (fixes silent pauses caused by sample rate mismatch) - Update _render_pause_insert_method() to use new methods with uniform encoding parameters - Add -video_track_timescale 90000 for consistent timebase across segments Root causes fixed: 1. -c copy could only cut at keyframes, causing audio dropouts 2. Sample rate mismatch (48kHz source vs 44.1kHz MP3) caused silent freeze-frame segments when concatenated 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 12:05:32 -06:00
michael	6acb452cfa	fix: add render queue to Celery worker The accessible video render task was being dispatched to the 'render' queue but no worker was listening to it. Added 'render' to: - Dockerfile CMD args for worker queue list - celery_worker.py import and log message 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 11:39:34 -06:00
michael	80d3866d32	feat: add accessible video (MP4 with embedded audio descriptions) Add new deliverable type that renders video with audio descriptions embedded. Supports two AI-determined methods: - Direct Overlay: ducks original audio and overlays AD TTS (for minimal dialogue) - Pause-Insert: freeze-frame video, insert AD, re-time subtitles (for significant dialogue) Backend: - Add Pydantic schemas for Gemini analysis response - Add Gemini prompt and analyze_accessible_video_placement() method - Add video_renderer.py service using FFmpeg for both rendering methods - Add vtt_retimer.py service for pause-insert subtitle adjustment - Add render_accessible_video.py Celery task - Modify TTS service to return individual per-cue segments - Update translate_and_synthesize.py to save segments and trigger rendering - Update download endpoint to include accessible video outputs Frontend: - Add accessible_video_mp4 checkbox to NewJob form - Update TypeScript types for new deliverable 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 11:06:41 -06:00
michael	dad7ea09df	fix: generate audio descriptions in the video's detected language Updated Gemini ingestion prompt to explicitly require: - Detect the spoken language first - Write ALL outputs (summary, transcript, captions, audio_description) in that language - Do NOT translate to English - keep everything in the original language This fixes the issue where German videos would get English audio descriptions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 19:01:14 -06:00
michael	865fcdc246	feat: add TTS settings panel with model, speed, and style options - Add model selection (flash vs pro) for quality control - Add speed slider (0.5x - 2.0x) for pacing adjustment - Add style presets (neutral, calm, energetic, professional, warm, documentary) - Add custom style prompt option for advanced customization - New /tts/options endpoint returns available TTS options - Voice preview now tests all settings so users hear exact output - Backward compatible: all new fields have sensible defaults 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 15:22:14 -06:00
michael	093b55c473	fix: add ffmpeg to API container for TTS audio conversion The Gemini TTS service uses pydub which requires ffmpeg to convert audio formats. Previously only the Worker container had ffmpeg. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 14:55:14 -06:00
michael	3804692092	fix: correct import path for get_current_user in routes_tts The import was using a non-existent module path `..deps` instead of `...core.dependencies`, causing the API container to fail on startup. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 14:49:34 -06:00
michael	29643f6683	upgrade TTS to Gemini TTS with voice selection and preview - Add Gemini TTS service with 30 voices and 24 languages - Add TTS API endpoints for voice listing and preview - Add per-language voice selection in job creation form - Add voice override at QC approval stage - Add VoiceSelector and VoicePreviewButton components - Update TTSPreferences model with provider and voice mapping 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 14:41:57 -06:00
michael	46b6f25fd0	upgrade to Gemini 3 Pro preview model - Change model from gemini-2.5-pro to gemini-3-pro-preview - Upgrade google-genai package from ^1.31.0 to ^1.56.0 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 14:02:02 -06:00
michael	e6578e0ccf	add approved_source and qc_feedback job statuses to MongoDB schema - Add migration to update jobs collection validator with new statuses - Update mongodb-init.js for fresh deployments - Fix deploy.sh to properly run migrations with 'python migrate.py up' 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 13:12:14 -06:00
michael	58a4f1f627	add support for non-English original video uploads - Upload form now has "English / Different language" radio with optional language hint - Gemini auto-detects language and saves outputs to outputs.{detected_language} - QC review dynamically loads/saves VTT for source language - New APPROVED_SOURCE status for non-English videos (APPROVED_ENGLISH kept for backwards compat) - Translation pipeline reads from source language and passes source_language to Google Translate - All existing English jobs continue to work unchanged 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 10:33:58 -06:00
michael	762d7bcb38	fixed websockets live messaging for updates	2025-10-16 11:46:37 -05:00
michael	d25fb921a1	fixed dates on schema validator migration	2025-10-10 10:59:20 -05:00
michael	92169d047b	added schema validator	2025-10-10 10:55:54 -05:00
michael	f59f5cf93b	fixed front end build errors	2025-10-10 10:26:57 -05:00
michael	aefd559e68	added production user role and made it default for new MSAL users - production can access everything EXCEPT user management - that's only for admin	2025-10-10 10:07:30 -05:00
michael	665b49c3f1	added MSAL microsoft authentication	2025-10-10 09:19:39 -05:00
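
Note on the pause-point commits (e25a0d6ad0, 582f9f066e, 504e525a1f): the log describes the rule only in prose, so the following is a minimal sketch of how it might look, not the repository's actual code. The function names `pause_point` and `closest_boundary`, their signatures, and the constant names are hypothetical; only the numbers (half-gap placement, the 2 s cap, the 500 ms fallback, the ±30 s window) come from the commit messages.

```python
from typing import Optional, Sequence

# Values taken from the commit messages; the names are illustrative only.
MAX_HALF_GAP_S = 2.0      # 504e525a1f: cap on the half-gap
FALLBACK_BUFFER_S = 0.5   # 504e525a1f: buffer used when the cap is exceeded
SEARCH_WINDOW_S = 30.0    # 582f9f066e: search +/-30 s around Gemini's point


def pause_point(sentence_end: float, next_word_start: float) -> float:
    """Place the pause halfway into the silence that follows a sentence.

    If half of the gap is longer than MAX_HALF_GAP_S, fall back to a
    fixed FALLBACK_BUFFER_S after the sentence end instead.
    """
    half_gap = max(0.0, next_word_start - sentence_end) / 2.0
    buffer = FALLBACK_BUFFER_S if half_gap > MAX_HALF_GAP_S else half_gap
    return sentence_end + buffer


def closest_boundary(
    target: float,
    boundaries: Sequence[float],
    window: float = SEARCH_WINDOW_S,
) -> Optional[float]:
    """Return the sentence boundary nearest to `target` within +/-window.

    Returns None when no boundary falls inside the window, so the caller
    can apply a fallback (for example the original Gemini timestamp).
    """
    in_window = [b for b in boundaries if abs(b - target) <= window]
    if not in_window:
        return None
    return min(in_window, key=lambda b: abs(b - target))
```

A caller would first pick a boundary with `closest_boundary(gemini_time, sentence_ends)` and, if one is found, pass it with the start time of the following word to `pause_point`. What happens when `None` is returned depends on the cue: per 407cc662e8 the first cue is inserted at T=0:00, while other cues keep the original timestamp.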


69 commits