- Rewrote VTT translation as a two-step process (extract text only → translate with Gemini → reapply to the original timestamps) to prevent caption timing desync
- Added a polling fallback for all processing states and a WebSocket reconnect on Safari's visibilitychange event
- Added new TTS languages (cs, da, fi, hu, no, sk, sv, es-419, pt-BR, fr-CA)
- Updated caption/AD prompts to DCMP Captioning Key & Description Key standards (line splitting, ♪ music notation, italic tags, caption positioning, ethics guidelines)
- Added descriptive transcript generation (WCAG 2.1 §1.2.1) combining captions + AD into plain text
- Set normalize=0 on the FFmpeg amix filter to prevent audio loss in rendered videos
- Fixed AD re-timing double-count when source_ms is None
- Fixed cue block numbering to be 1-based in VttEditor and Timeline Preview
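
A minimal sketch of the two-step VTT translation idea from the first bullet, assuming cues are separated by blank lines. The function name translate_vtt and the translate_texts callback (standing in for the Gemini call) are hypothetical, not the project's actual API. Only cue text is sent to the translator; identifier and timing lines are copied back untouched.

```python
from typing import Callable, List, Tuple


def translate_vtt(vtt: str, translate_texts: Callable[[List[str]], List[str]]) -> str:
    """Translate cue text while keeping every original timing line verbatim."""
    blocks = vtt.strip().split("\n\n")
    cue_slots: List[Tuple[int, List[str]]] = []  # (block index, lines up to the timing line)
    texts: List[str] = []

    # Step 1: pull out the text of each cue; timestamps never leave this function.
    for i, block in enumerate(blocks):
        lines = block.split("\n")
        for j, line in enumerate(lines):
            if "-->" in line:
                cue_slots.append((i, lines[: j + 1]))
                texts.append("\n".join(lines[j + 1:]))
                break

    translated = translate_texts(texts)
    if len(translated) != len(texts):
        raise ValueError("translator returned a different number of cues")

    # Step 2: reattach each translated text to its original identifier and timing.
    for (i, head), text in zip(cue_slots, translated):
        blocks[i] = "\n".join(head + [text])
    return "\n\n".join(blocks) + "\n"
```

Because the translator never sees a timestamp, it cannot alter one; a mismatched cue count is rejected instead of being silently misaligned.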
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Whisper's snap_pause_point() finds the nearest sentence boundary
independently per cue, which can move a later cue's pause_point before
an earlier cue's. The renderer then sorts by pause_point, producing
non-sequential cue indices in the timeline.
Add a forward monotonicity pass (clamp each pause_point >= previous) at
three layers for defense-in-depth:
- whisper_service: Phase 3 after consolidation
- video_renderer: before temporal sort in _render_pause_insert_method
- rerender_accessible_video: in _build_placements_with_adjustments
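
A rough sketch of the forward monotonicity pass, assuming pause points are integer milliseconds in cue order. The helper name enforce_monotonic is illustrative only.

```python
from typing import List, Optional


def enforce_monotonic(pause_points_ms: List[int]) -> List[int]:
    """Clamp each pause point so it is never earlier than the one before it."""
    clamped: List[int] = []
    previous: Optional[int] = None
    for point in pause_points_ms:
        if previous is not None and point < previous:
            point = previous  # pull the later cue forward to the earlier cue's pause point
        clamped.append(point)
        previous = point
    return clamped
```

For example, [1000, 3000, 2500, 4000] becomes [1000, 3000, 3000, 4000], so a later sort by pause_point can no longer reorder the cues.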
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ad_cue_index as secondary sort key when sorting placements, ensuring
that consolidated cues maintain their original VTT order (cue 0 before cue 1).
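
As an illustration, the tie-break can be expressed as a tuple sort key. The dict layout and the sort_placements name below are assumptions for the sketch.

```python
from typing import Dict, List


def sort_placements(placements: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Sort by pause_point first, then by ad_cue_index to break ties."""
    return sorted(placements, key=lambda p: (p["pause_point"], p["ad_cue_index"]))


placements = [
    {"ad_cue_index": 1, "pause_point": 5000},
    {"ad_cue_index": 0, "pause_point": 5000},  # same pause point as cue 1
    {"ad_cue_index": 2, "pause_point": 9000},
]
ordered_indices = [p["ad_cue_index"] for p in sort_placements(placements)]
```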
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The re-render task was using pause point coordinates from the accessible
video timeline (which includes freeze frame durations) instead of the
original source video coordinates. This caused pause points to exceed
the source video duration and get clamped incorrectly.
Changes:
- Add source_ms field to PausePointData model to store source video cut point
- Update video_renderer.py to populate source_ms when building pause points
- Update rerender_accessible_video.py to use source_ms for placement calculations
- Apply user adjustments as relative offsets (delta-based adjustment)
- Update API responses and TypeScript types to include source_ms
- Add backward compatibility fallback for jobs without source_ms
Note: Existing jobs need to be re-processed from the initial render to populate
the new source_ms field.
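
A hedged sketch of the delta-based adjustment. The function name, parameters, and especially the fallback (subtracting the freeze time inserted before the cue) are assumptions made for illustration; the actual fallback in rerender_accessible_video.py is not described in this message.

```python
from typing import Optional


def resolve_source_cut_ms(
    stored_pause_ms: int,
    adjusted_pause_ms: int,
    source_ms: Optional[int],
    preceding_freeze_ms: int,
) -> int:
    """Map a user-adjusted pause point back to source-video coordinates."""
    # The user edits a value on the accessible timeline; only the change matters.
    delta_ms = adjusted_pause_ms - stored_pause_ms
    if source_ms is not None:
        base_ms = source_ms
    else:
        # Assumed fallback for older jobs: remove the freeze time inserted before this cue.
        base_ms = stored_pause_ms - preceding_freeze_ms
    return max(0, base_ms + delta_ms)
```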
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pause points were being stored with source video timestamps instead of
rendered video timeline coordinates. This caused misalignment between
the pause point markers and freeze frame segments in the timeline UI.
Now pause points are calculated from the freeze frame segment start
positions in the rendered timeline, ensuring they align correctly
with the AD audio segments.
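
One way to picture this calculation, assuming the rendered timeline is an ordered list of segments with a kind and a duration (both field names are hypothetical):

```python
from typing import Dict, List, Union


def freeze_start_positions_ms(segments: List[Dict[str, Union[str, int]]]) -> List[int]:
    """Return the rendered-timeline start of every freeze segment."""
    positions: List[int] = []
    cursor_ms = 0
    for segment in segments:
        if segment["kind"] == "freeze":
            positions.append(cursor_ms)  # a pause point sits where its freeze begins
        cursor_ms += int(segment["duration_ms"])
    return positions
```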
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Reorder workflow: translations now happen BEFORE QC Review step
- Add language tabs to switch between translated languages in QC
- Add video mode tabs (Original Video / Accessible Video)
- Add interactive timeline preview showing video segments and AD cues
- Enable pause point adjustment with millisecond precision
- Add TTS regeneration queue for selective cue re-synthesis
- Add re-render controls with optional Whisper refinement
- Persist video segments and TTS MP3s to GCS for editability
- Add new RENDERING_QC job status for re-render operations
- Create 5 new API endpoints for accessible video editing
- Add rerender_accessible_video.py Celery task
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous implementation incorrectly used _get_video_duration, which
in Cloud Run mode uses the cached source video URI instead of actually
measuring the freeze segment files. This caused all freeze segments to
report the source video duration (~78s) instead of their actual duration.
Changed to use _get_video_duration_local directly since freeze segments
are local files and need to be measured directly via ffprobe.
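
For reference, a minimal sketch of measuring a local file with ffprobe. The helper names are hypothetical and this is not the body of _get_video_duration_local; it assumes ffprobe is on PATH.

```python
import subprocess
from typing import List


def ffprobe_duration_command(path: str) -> List[str]:
    """Build an ffprobe command that prints only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_duration(stdout: str) -> float:
    """Convert ffprobe's plain-text output (e.g. '3.504000') to a float."""
    return float(stdout.strip())


def measure_local_duration(path: str) -> float:
    """Run ffprobe against the file itself rather than trusting any cached value."""
    result = subprocess.run(
        ffprobe_duration_command(path), capture_output=True, text=True, check=True
    )
    return parse_duration(result.stdout)
```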
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Subtitles were appearing progressively out of sync (~1.0s early per AD)
because the VTT retimer calculated freeze durations theoretically
rather than using actual rendered segment durations.
Changes:
- video_renderer: Measure actual freeze segment duration after creation
- video_renderer: Return updated placements with actual_freeze_duration
- vtt_retimer: Prefer actual_freeze_duration over calculated values
- render_task: Pass actual durations to VTT retimer
This ensures subtitle timing matches the real video timeline regardless
of any FFmpeg encoding variations.
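
A simplified sketch of the preference order in the retimer. Key names with the _ms suffix are assumptions, and pause points are taken to be in source-video milliseconds.

```python
from typing import Dict, List, Optional


def retime_cue_ms(cue_start_ms: int, placements: List[Dict[str, Optional[int]]]) -> int:
    """Shift a subtitle time by the total freeze time inserted at or before it."""
    offset_ms = 0
    for placement in placements:
        if placement["pause_point_ms"] <= cue_start_ms:
            actual = placement.get("actual_freeze_duration_ms")
            # Prefer the measured duration; fall back to the calculated one.
            if actual is not None:
                offset_ms += actual
            else:
                offset_ms += placement["calculated_freeze_duration_ms"]
    return cue_start_ms + offset_ms
```

Using measured durations keeps the error from accumulating: a per-freeze discrepancy otherwise compounds with each AD that precedes the cue.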
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use context manager for AsyncClient instead of caching on singleton.
Each asyncio.run() creates a new event loop, so cached clients bound
to previous event loops fail when reused across jobs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changed from httpx.Client (sync) to httpx.AsyncClient so that
asyncio.gather() actually executes HTTP calls in parallel instead
of blocking the event loop sequentially.
Before: ~5 min for 18 segments (serial HTTP calls despite gather)
After: ~30 sec for 18 segments (truly parallel HTTP calls)
Changes:
- _http_client: httpx.Client -> httpx.AsyncClient
- _call_cloud_run_probe: sync -> async
- _call_cloud_run_endpoint: sync -> async
- Added await to all Cloud Run HTTP calls
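
The effect can be reproduced with the standard library alone. In this sketch time.sleep stands in for a synchronous httpx.Client request and asyncio.sleep for an awaited AsyncClient request; the function names are illustrative.

```python
import asyncio
import time
from typing import Awaitable, Callable, Tuple


async def blocking_call(delay: float) -> None:
    time.sleep(delay)  # blocks the event loop, so gather() cannot overlap calls


async def async_call(delay: float) -> None:
    await asyncio.sleep(delay)  # yields to the loop, so calls overlap


async def timed_gather(factory: Callable[[float], Awaitable[None]], n: int, delay: float) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(factory(delay) for _ in range(n)))
    return time.perf_counter() - start


def compare(n: int = 5, delay: float = 0.05) -> Tuple[float, float]:
    """Return (elapsed with blocking calls, elapsed with awaited calls)."""
    serial = asyncio.run(timed_gather(blocking_call, n, delay))
    parallel = asyncio.run(timed_gather(async_call, n, delay))
    return serial, parallel
```

With blocking calls the elapsed time grows roughly as n times the delay; with awaited calls it stays close to a single delay.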
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Refactored _render_pause_insert to execute FFmpeg operations in parallel
phases instead of sequentially:
Phase 1: Parallel extraction
- Generate shared silence (once, reused by all)
- Extract ALL video segments simultaneously
- Extract ALL freeze frames simultaneously
- Extract final segment
Phase 2: Parallel audio concatenation
- Concatenate ALL audio tracks (silence + AD + silence) simultaneously
Phase 3: Parallel freeze segment creation
- Create ALL freeze segments simultaneously
Phase 4: Assemble segments in correct order for final concatenation
This reduces render time from ~3 minutes (serial) to ~30 seconds (parallel)
for an 8-cue video when using Cloud Run FFmpeg service.
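
The phase structure can be sketched as follows, with a stub coroutine standing in for each FFmpeg call. All names are hypothetical and no real rendering happens.

```python
import asyncio
from typing import List, Tuple


async def run_step(name: str, log: List[str]) -> str:
    await asyncio.sleep(0)  # placeholder for an awaited FFmpeg request
    log.append(name)
    return name


async def render_phases(cue_count: int) -> Tuple[List[str], List[str]]:
    log: List[str] = []
    # Phase 1: everything that depends only on the source video.
    await asyncio.gather(
        run_step("silence", log),
        *(run_step(f"segment-{i}", log) for i in range(cue_count)),
        *(run_step(f"frame-{i}", log) for i in range(cue_count)),
        run_step("final-segment", log),
    )
    # Phase 2: audio tracks need the shared silence from phase 1.
    await asyncio.gather(*(run_step(f"audio-{i}", log) for i in range(cue_count)))
    # Phase 3: freeze segments need a frame (phase 1) and an audio track (phase 2).
    await asyncio.gather(*(run_step(f"freeze-{i}", log) for i in range(cue_count)))
    # Phase 4: interleave in playback order, independent of completion order.
    order: List[str] = []
    for i in range(cue_count):
        order += [f"segment-{i}", f"freeze-{i}"]
    order.append("final-segment")
    return order, log
```

Each gather() acts as a barrier, so work inside a phase runs concurrently while dependencies between phases are still respected.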
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The /run-ffmpeg Cloud Run endpoint expects command_template field with
ffmpeg command placeholders, not ffmpeg_args. This fixes 422 validation
errors when generating silence audio via Cloud Run.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The /probe endpoint expects 'gcs_uri' but we were sending 'source_gcs_uri'.
Fixed to match the ProbeRequest model in ffmpeg_http_service.py.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cloud Run services are deployed with --no-allow-unauthenticated,
requiring an ID token in the Authorization header.
- Add _get_cloud_run_id_token() helper using google-auth library
- Update whisper_transcribe.py to include Bearer token in Cloud Run calls
- Update video_renderer.py to include Bearer token in FFmpeg Cloud Run calls
The ID token is fetched using the service account credentials
(GOOGLE_APPLICATION_CREDENTIALS) and targets the Cloud Run service URL.
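
A sketch of what such a helper might look like using google-auth's fetch_id_token. The function names here are illustrative and this is not the actual body of _get_cloud_run_id_token(); the google-auth imports are deferred so the module loads even where the library is absent.

```python
from typing import Dict


def fetch_cloud_run_id_token(service_url: str) -> str:
    """Fetch an ID token whose audience is the Cloud Run service URL."""
    # Imported lazily; requires the google-auth and requests packages.
    import google.auth.transport.requests
    import google.oauth2.id_token

    request = google.auth.transport.requests.Request()
    return google.oauth2.id_token.fetch_id_token(request, service_url)


def auth_headers(token: str) -> Dict[str, str]:
    """Build the Authorization header expected by an authenticated service."""
    return {"Authorization": f"Bearer {token}"}
```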
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Migrate CPU-intensive workloads to Cloud Run for autoscaling:
- Add Whisper HTTP service (FastAPI) with /transcribe endpoint
- Add FFmpeg HTTP service (FastAPI) with /encode, /probe, /extract-frame, etc.
- Add Dockerfiles for both services (8 vCPU, 32GB RAM, Gen2)
- Add Cloud Build config for CI/CD deployment
- Add Cloud Run service YAML configs with scale-to-zero
- Update whisper_transcribe.py to call Cloud Run when WHISPER_SERVICE_URL set
- Update video_renderer.py to call Cloud Run when FFMPEG_SERVICE_URL set
- Update whisper_service.py for Cloud Run compatibility (no settings dependency)
- Add ffmpeg_service_url and whisper_service_url to config.py
Services scale 0→N based on request load, falling back to local
execution when service URLs are not configured (hybrid mode).
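
The hybrid-mode switch could look roughly like this. The choose_backend helper and its return shape are assumptions; only the environment variable names come from this change.

```python
import os
from typing import Mapping, Optional, Tuple


def choose_backend(env_var: str, environ: Optional[Mapping[str, str]] = None) -> Tuple[str, Optional[str]]:
    """Return ('cloud_run', url) when the service URL is set, else ('local', None)."""
    env = os.environ if environ is None else environ
    url = env.get(env_var, "").strip()
    if url:
        return "cloud_run", url
    return "local", None
```

For example, choose_backend("FFMPEG_SERVICE_URL") falls back to local execution on a developer machine where the variable is unset.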
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace complex overlap/catch-up logic with simpler approach:
- Snap pause points to midpoint between sentences (not sentence boundaries)
- Add 500ms silence before AND after AD audio during freeze frame
- Resume playback from same midpoint (no overlap, no visual jump-back)
This eliminates audio/visual anomalies caused by the previous algorithm's
complexity around sentence boundary snapping and audio catch-up.
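
In arithmetic terms, the simplified approach amounts to something like the sketch below. The helper names are hypothetical and times are assumed to be integer milliseconds.

```python
SILENCE_PAD_MS = 500  # silence added before and after the AD audio


def snap_to_midpoint(prev_sentence_end_ms: int, next_sentence_start_ms: int) -> int:
    """Place the pause point halfway through the gap between two sentences."""
    return (prev_sentence_end_ms + next_sentence_start_ms) // 2


def freeze_duration_ms(ad_audio_ms: int) -> int:
    """Total freeze-frame length: leading silence + AD audio + trailing silence."""
    return SILENCE_PAD_MS + ad_audio_ms + SILENCE_PAD_MS
```

Playback pauses and resumes at the same midpoint, so no source audio or video is repeated.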
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Get video duration BEFORE the render loop (not after)
- Clamp pause_point to 100ms before video end if it exceeds duration
- Add validation in _extract_frame() to verify frame was created
- Add debug logging for frame extraction timestamps
This prevents "Frame file not found" errors when pause points
calculated by Whisper refinement exceed the source video duration.
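
The clamp itself is small; a sketch under the assumption that both values are in milliseconds (the function name is illustrative):

```python
END_MARGIN_MS = 100


def clamp_pause_point(pause_point_ms: int, video_duration_ms: int) -> int:
    """Pull a pause point that overshoots the video back to 100 ms before its end."""
    if pause_point_ms > video_duration_ms:
        return max(0, video_duration_ms - END_MARGIN_MS)
    return pause_point_ms
```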
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a pause point falls between two sentences, the previous algorithm
created a visual jump-back where the video rewound to resume_from after
the AD played. This was distracting to viewers.
New behavior:
- Video plays normally to pause_point
- Freeze frame shows + AD audio plays
- Freeze frame CONTINUES while source audio from [resume_from, pause_point]
plays (the "catch-up" audio)
- Video resumes smoothly from pause_point (no visual jump)
The audio from the overlap region plays twice (once during video, once
during freeze extension) but this is acceptable as it's typically <1s
and provides natural audio context around the AD.
Implementation:
- Add _extract_audio_segment() to extract catch-up audio from source
- Add _concatenate_audio() to join AD + catch-up audio
- Modify render loop to create extended freeze segments with combined audio
- Resume video from pause_point instead of resume_from
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Show last 1500 chars of stderr instead of first 500 to capture actual
error messages (FFmpeg writes version banner first, errors at end)
- Add validation for freeze segment creation:
  - Check duration > 0
  - Verify frame and audio files exist
- Add debug logging for parameters
This helps diagnose FFmpeg failures that were previously showing only
version/configuration info instead of the actual error.
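
The truncation change boils down to slicing from the end rather than the start. A minimal sketch with a hypothetical helper name:

```python
STDERR_TAIL_CHARS = 1500


def stderr_tail(stderr: str, limit: int = STDERR_TAIL_CHARS) -> str:
    """Keep the end of FFmpeg's stderr, where the actual error is printed."""
    return stderr[-limit:]
```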
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes pause point calculation to use the entire gap between sentences
as a buffer on BOTH sides of the audio description:
- pause_point: Just BEFORE next sentence starts (gap.end - 50ms)
- resume_from: Just AFTER previous sentence ends (gap.start + 50ms)
This means a small portion of video plays twice (the gap duration), but
creates a much more natural listening experience by maximizing the
breathing room around audio descriptions.
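
A worked sketch of the two values, assuming the gap is given in milliseconds. snap_to_gap is a hypothetical stand-in, not the real snap_pause_point() signature.

```python
from typing import Tuple

GAP_MARGIN_MS = 50


def snap_to_gap(gap_start_ms: int, gap_end_ms: int) -> Tuple[int, int]:
    """Return (pause_point, resume_from) for a silence gap between two sentences."""
    pause_point = gap_end_ms - GAP_MARGIN_MS    # just before the next sentence starts
    resume_from = gap_start_ms + GAP_MARGIN_MS  # just after the previous sentence ends
    return pause_point, resume_from


pause_point, resume_from = snap_to_gap(4000, 5000)
overlap_ms = pause_point - resume_from  # portion of the gap that plays twice
```

For a gap from 4000 ms to 5000 ms this gives a pause point of 4950 ms, a resume point of 4050 ms, and 900 ms of replayed video.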
Changes:
- whisper_service.py: snap_pause_point() now returns (pause_point, resume_from)
- video_renderer.py: Uses resume_from for current_time after freeze segment
- vtt_retimer.py: Calculates effective_offset including overlap duration
- accessible_video.py: Added resume_from field to ADPlacementCue schema
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Log pause point placements, segment creation, and the final segment
calculation to help diagnose the 30s black footage issue.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Celery doesn't allow calling result.get() within a task by default to
prevent deadlocks. Use allow_join_result() context manager since we've
already confirmed the task is complete via ready() polling.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a dedicated Celery queue (ffmpeg) with concurrency=1 to serialize
all FFmpeg operations. This prevents CPU spikes when multiple render
tasks run in parallel with multiple languages.
Changes:
- Add ffmpeg_operations.py with run_ffmpeg_command and run_ffprobe_command tasks
- Update VideoRendererService to dispatch ffmpeg commands via the queue
- Add ffmpeg-worker service to docker-compose with --concurrency=1
- Configure main worker to exclude the ffmpeg queue
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update _get_video_properties() to extract audio sample_rate, channels,
and pix_fmt in addition to video properties
- Add _extract_segment_reencoded() for frame-accurate cuts using
re-encoding instead of stream copy (fixes keyframe-only cut limitation)
- Add _create_freeze_segment_matched() to enforce source audio property
matching (fixes silent pauses caused by sample rate mismatch)
- Update _render_pause_insert_method() to use new methods with uniform
encoding parameters
- Add -video_track_timescale 90000 for consistent timebase across segments
Root causes fixed:
1. -c copy could only cut at keyframes, causing audio dropouts
2. Sample rate mismatch (48kHz source vs 44.1kHz MP3) caused silent
freeze-frame segments when concatenated
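
For illustration, the arguments for a re-encoded, property-matched cut might be assembled as below. The function name and the libx264/aac encoder choices are assumptions; the commit only states that segments are re-encoded with uniform parameters and a 90000 timescale.

```python
from typing import List


def reencode_segment_args(
    src: str, dst: str, start_s: float, end_s: float,
    sample_rate: int, channels: int, pix_fmt: str,
) -> List[str]:
    """Build FFmpeg arguments for a frame-accurate cut that matches source properties."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", f"{start_s:.3f}",           # output seeking: decode, then cut at this time
        "-t", f"{end_s - start_s:.3f}",    # segment length in seconds
        "-c:v", "libx264", "-pix_fmt", pix_fmt,
        "-c:a", "aac", "-ar", str(sample_rate), "-ac", str(channels),
        "-video_track_timescale", "90000",  # same timebase for every segment
        dst,
    ]
```

Re-encoding instead of stream copy removes the keyframe restriction, and forcing one sample rate and channel count on every segment avoids mismatches at concatenation time.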
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add new deliverable type that renders video with audio descriptions embedded.
Supports two AI-determined methods:
- Direct Overlay: ducks original audio and overlays AD TTS (for minimal dialogue)
- Pause-Insert: freeze-frame video, insert AD, re-time subtitles (for significant dialogue)
Backend:
- Add Pydantic schemas for Gemini analysis response
- Add Gemini prompt and analyze_accessible_video_placement() method
- Add video_renderer.py service using FFmpeg for both rendering methods
- Add vtt_retimer.py service for pause-insert subtitle adjustment
- Add render_accessible_video.py Celery task
- Modify TTS service to return individual per-cue segments
- Update translate_and_synthesize.py to save segments and trigger rendering
- Update download endpoint to include accessible video outputs
Frontend:
- Add accessible_video_mp4 checkbox to NewJob form
- Update TypeScript types for new deliverable
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>