Commit graph

13 commits

Author SHA1 Message Date
Vadym Samoilenko
f4ddcce066 fix: resolve QA-reported bugs — MP3/VTT desync, crashes, notifications, and more
**BUG-1 & BUG-2 — Wrong audio plays after re-render / MP3 doesn't match text**
Root cause: audio files were named by index (cue_0.mp3, cue_1.mp3). When a cue
was inserted or deleted, all following indices shifted but old MP3 files kept
their original names, so re-render would play the wrong audio for the wrong cue.
Fix: renamed files to cue_N_CONTENTHASH.mp3 and introduced an ad_cue_manifest
stored in the job document that maps each cue index to its correct GCS URI.
Re-render now reads from the manifest instead of guessing by filename.
Also: editing AD cue text in the VTT editor now automatically queues TTS
regeneration for changed cues — no more silent mismatches.

**BUG-3 — App crash / state desync when uploading VTT or clearing TTS queue**
Fixed handleVttFileUpload to only update local editor state after the server
confirms the save — previously local state was updated first, so a network
error left the UI showing content that wasn't actually saved.
Fixed handleClearRegenerationQueue to only remove items from local state if
the server removal succeeded — previously all items were cleared regardless.

**BUG-4 — AI generates different audio descriptions every time**
Added GenerateContentConfig(temperature=0.2, top_p=0.8, top_k=40) to the
Gemini API call so output is more consistent across runs.

**BUG-5 — On-screen text inconsistently described**
Strengthened the AI prompt rule from a vague suggestion to a mandatory
requirement with an explicit format: "Text on screen reads: [exact text]".
Applied to both gemini_ingestion.md and gemini_ingestion_targeted.md.

**BUG-6 — No notification when re-render finishes**
Added rendering_qc toast notification and a dismissible green banner that
appears in QCDetail when re-render transitions to pending_qc. The banner
auto-dismisses after 10 seconds. Also increased WebSocket reconnect attempts
from 5 to 15 and capped backoff at 60s to prevent falling back to manual refresh.

**BUG-7 — Timeline preview looks accurate but isn't after edits**
Added isStale prop to TimelinePreview. The timeline now shows an amber tint
and "Preview may be outdated" label whenever there are unsaved pause point
changes, pending TTS regenerations, or a new VTT has been uploaded.

**BUG-8 — ElevenLabs API errors break TTS with no fallback**
Added try/except fallback chain in _synthesize_single_cue: if the configured
provider fails, it automatically retries with google, then gemini.

**BUG-9 — Concurrent re-render requests cause race conditions**
Made the PENDING_QC → RENDERING_QC status transition conditional (only
succeeds if the job is still in PENDING_QC). Returns HTTP 409 if a re-render
is already in progress. The completion transition back to PENDING_QC is also
conditional so a cancelled/overridden render doesn't corrupt job state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 13:23:55 +00:00
michael
aa6777d2c2 feat: add QC accessible video review and editing capabilities
- Reorder workflow: translations now happen BEFORE QC Review step
- Add language tabs to switch between translated languages in QC
- Add video mode tabs (Original Video / Accessible Video)
- Add interactive timeline preview showing video segments and AD cues
- Enable pause point adjustment with millisecond precision
- Add TTS regeneration queue for selective cue re-synthesis
- Add re-render controls with optional Whisper refinement
- Persist video segments and TTS MP3s to GCS for editability
- Add new RENDERING_QC job status for re-render operations
- Create 5 new API endpoints for accessible video editing
- Add rerender_accessible_video.py Celery task

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 08:32:27 -06:00
michael
add958008a fix: use actual freeze segment durations for VTT subtitle retiming
Subtitles were appearing progressively out of sync (~1.0s early per AD)
because the VTT retimer calculated freeze durations theoretically
rather than using actual rendered segment durations.

Changes:
- video_renderer: Measure actual freeze segment duration after creation
- video_renderer: Return updated placements with actual_freeze_duration
- vtt_retimer: Prefer actual_freeze_duration over calculated values
- render_task: Pass actual durations to VTT retimer

This ensures subtitle timing matches the real video timeline regardless
of any FFmpeg encoding variations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 15:52:57 -06:00
michael
c512bdc184 feat: use AD VTT pause points instead of Gemini video analysis
Optimize the accessible video workflow by eliminating the dedicated
Gemini video analysis call for pause point estimation. Instead:

- Use AD VTT cue start times as initial pause points for Whisper refinement
- Add user-selectable accessible video method (pause_insert/overlay) at QC approval
- Add bulk approval API endpoint with method selection
- Add method selector UI to QCDetail page
- Add bulk approval modal to QCList for jobs with accessible video

Benefits:
- Eliminates expensive Gemini API call with video upload
- Faster workflow (~5-15 seconds saved per job)
- Cost savings on Gemini video analysis
- User control over accessible video integration method

Backend changes:
- Add accessible_video_method to RequestedOutputs and ApproveSourceRequest
- Add POST /jobs/bulk/approve endpoint
- Replace Gemini call with _build_placements_from_ad_vtt() helper
- Mark analyze_accessible_video_placement() as deprecated

Frontend changes:
- Add method selector radio buttons to QCDetail
- Add bulk approval modal with method selection to QCList
- Update API client and React Query hooks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 19:05:45 -06:00
michael
c1c0b876fc feat: add RENDER_FAILED status with error propagation to GUI
- Add RENDER_FAILED job status for when video rendering fails
- Fix _check_accessible_video_completion to detect failures and transition
  job status accordingly (was stuck in RENDERING_VIDEO forever)
- Store detailed error info in job.error including failed_languages array
- Call completion check after failures to properly update job status
- Broadcast WebSocket notification on render failures

Frontend:
- Add render_failed to JobStatus type and StatusBadge (red styling)
- Add tts_failed and render_failed to JobsList STATUS_LABELS
- Enhance JobDetail error display with:
  - Warning icon and prominent styling
  - Error type and message
  - Failed languages list with per-language errors
  - Timestamp of when error occurred
- Update ProgressIndicator to handle failed states with red dot

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 10:18:27 -06:00
michael
3588d3fa14 refactor: rewrite pause point refinement algorithm with ordered logic
Completely rewrites the Whisper-based pause point refinement to use
a two-phase approach with explicit ordering:

Phase 1 - Individual refinement:
1. Check if pause point is "during speaking" (words within ±2s)
   - If NOT during speaking → use Gemini's exact point, no overlap
2. If during speaking, find nearest sentence boundary
3. Apply appropriate buffering based on context:
   - Case A: First sentence → pause 500ms before sentence starts
   - Case B: Last sentence → pause 500ms after sentence ends
   - Case C: Between sentences → full double buffer (overlap)

Phase 2 - Consolidation (after all refinements):
- Consolidate cues within 5s of each other to play back-to-back

Key changes:
- Add SentenceBoundary dataclass for tracking boundaries with context
- Add _is_during_speaking() helper to detect speech proximity
- Add _find_sentence_boundaries() with longest-gap fallback
- Rewrite snap_pause_point() with new ordered algorithm
- Update refine_all_pause_points() to pass words and use two phases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 08:19:03 -06:00
michael
ee6a30e7a7 feat: always generate fresh Whisper transcripts (disable caching)
Remove the cached transcript lookup - always run a fresh Whisper
transcription for each accessible video render. This ensures we get
accurate word timestamps for the current video file.

The transcript is still saved to the job document for debugging and
auditing purposes, but it will never be read back for reuse.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 23:25:35 -06:00
michael
1c22872e69 fix: use dedicated whisper worker with FFmpeg dispatch pattern
Changed the Whisper transcription to run on dedicated whisper-worker
using the same dispatch pattern as FFmpeg:
1. apply_async() to dispatch to the whisper queue
2. Poll with ready() using async sleep to avoid blocking
3. Use allow_join_result() context manager
4. Get result only after task is ready

This ensures Whisper runs with concurrency=1 on a dedicated worker
to prevent memory overload while still allowing the render task
to wait for results without deadlocking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:53:53 -06:00
michael
7b0ebb357c fix: run Whisper transcription inline instead of as subtask
Celery does not allow calling result.get() within a task as it causes
deadlocks. Changed the implementation to run Whisper transcription
directly using asyncio.to_thread() instead of dispatching to a separate
Celery queue.

The Whisper transcript is still cached in MongoDB for reuse across
language variants.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:48:41 -06:00
michael
05bde8326d feat: add Whisper-based pause point refinement for audio descriptions
Implements word-level speech analysis using faster-whisper to refine
AD pause points. Gemini's timestamps are snapped to natural speech gaps
(sentence/phrase boundaries) to prevent pauses mid-word.

Key changes:
- Add WhisperService for transcription and gap detection
- Add dedicated Celery task routed to 'whisper' queue
- Integrate refinement into render_accessible_video task
- Cache Whisper transcripts in MongoDB for reuse across languages
- Add dedicated whisper-worker with concurrency=1 to prevent OOM

Configuration:
- Uses faster-whisper 'base' model (multilingual, ~145MB)
- 5-second search window after Gemini's recommended point
- Falls back to original timestamp if no gap found

Infrastructure:
- New Docker stage: whisper-worker
- New Cloud Run service: accessible-video-whisper-worker
- Updated docker-compose.yml with whisper-worker service

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:27:48 -06:00
michael
396e4e74e0 feat: add rendering_video status for accessible video processing
When jobs with accessible video option enabled enter video rendering
phase, the status now transitions to 'rendering_video' so users can
see why processing is taking longer. This provides better visibility
into the video rendering pipeline.

Changes:
- Added RENDERING_VIDEO status to JobStatus enum
- Updated render_accessible_video task to set new status
- Added status display to StatusBadge, jobStatusMessages
- Included new status in JobsList Translation filter group

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 06:49:46 -06:00
michael
901083b426 fix: ensure temp files use shared volume with correct permissions
- Modified render_accessible_video.py to explicitly pass TMPDIR to
  tempfile.TemporaryDirectory() so files are created in shared volume
- Updated docker-compose.yml to run containers as root initially,
  chown /shared-tmp to app:app, then switch to app user for celery
- This ensures both worker containers can access the same temp files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 06:15:45 -06:00
michael
80d3866d32 feat: add accessible video (MP4 with embedded audio descriptions)
Add new deliverable type that renders video with audio descriptions embedded.
Supports two AI-determined methods:
- Direct Overlay: ducks original audio and overlays AD TTS (for minimal dialogue)
- Pause-Insert: freeze-frame video, insert AD, re-time subtitles (for significant dialogue)

Backend:
- Add Pydantic schemas for Gemini analysis response
- Add Gemini prompt and analyze_accessible_video_placement() method
- Add video_renderer.py service using FFmpeg for both rendering methods
- Add vtt_retimer.py service for pause-insert subtitle adjustment
- Add render_accessible_video.py Celery task
- Modify TTS service to return individual per-cue segments
- Update translate_and_synthesize.py to save segments and trigger rendering
- Update download endpoint to include accessible video outputs

Frontend:
- Add accessible_video_mp4 checkbox to NewJob form
- Update TypeScript types for new deliverable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 11:06:41 -06:00