Commit graph

69 commits

Author SHA1 Message Date
michael
dd7ac2e15c debug: add logging for pause-insert video rendering
Logs pause point placements, segment creation, and final segment
calculation to help diagnose the 30s black footage issue.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 22:39:33 -06:00
michael
54638d1065 feat: switch Whisper model from large-v3 to medium
Medium model is faster and uses less memory (~1.5GB vs ~3GB)
while still providing good multilingual transcription quality.

Updated in:
- config.py
- docker-compose.yml
- whisper-worker-service.yaml
- cloudbuild.yaml
- Dockerfile (pre-download)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 22:35:47 -06:00
michael
504e525a1f feat: dynamic pause point buffer based on gap duration
Instead of a fixed 175ms buffer, the pause point is now placed
halfway between the end of the sentence and the start of the
next word. If the half-gap exceeds 2 seconds, a 500ms buffer is used instead.
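
A minimal sketch of the rule described above, assuming timestamps are in seconds. The function and parameter names are hypothetical and not taken from the repository.

```python
def dynamic_pause_point(
    sentence_end: float,
    next_word_start: float,
    max_half_gap: float = 2.0,      # threshold stated in the commit message
    fallback_buffer: float = 0.5,   # 500ms fallback stated in the commit message
) -> float:
    """Return the pause timestamp (seconds) placed after a sentence ends."""
    # Guard against overlapping word timestamps producing a negative gap.
    gap = max(next_word_start - sentence_end, 0.0)
    half_gap = gap / 2.0
    # For long silences, stay close to the sentence end instead of the midpoint.
    buffer = fallback_buffer if half_gap > max_half_gap else half_gap
    return sentence_end + buffer
```

With a 0.4s gap the pause lands 0.2s after the sentence; with a 6s gap it lands 0.5s after it.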

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 22:33:07 -06:00
michael
01c96da95c feat: use all available CPU cores for Whisper transcription
Dynamically detects the CPU count with os.cpu_count() instead of using a
hardcoded 4 threads. Falls back to 4 if detection fails.
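
The fallback can be sketched as follows. The helper name is hypothetical; how the value is passed to the Whisper model is not shown in the commit message, so that step is left out.

```python
import os


def whisper_thread_count(default: int = 4) -> int:
    """Return the number of CPU threads to use for transcription."""
    # os.cpu_count() returns None when the count cannot be determined,
    # in which case the previous hardcoded value is used.
    return os.cpu_count() or default
```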

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:59:31 -06:00
michael
dc78dc6fb5 feat: add detailed logging for Whisper model and processing time
- Log model name explicitly when loading and transcribing
- Log model load time
- Log transcription processing time
- Helps verify correct model is being used

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:58:41 -06:00
michael
3538dea47f fix: update whisper_max_search_window to 30s in config.py
The setting in config.py (5.0) was overriding the default in
whisper_service.py (30.0). Now both are consistent at 30s.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:53:57 -06:00
michael
4f82fad5dd feat: pre-download Whisper large-v3 model during Docker build
Downloads the model (~3GB) at build time to avoid cold start delays.
Also updated comment to reflect large-v3 memory usage (~4-6GB).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:25:44 -06:00
michael
614ff841fe feat: upgrade Whisper model from base to large-v3
Uses the multilingual large model for more accurate transcription
and sentence boundary detection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:20:03 -06:00
michael
582f9f066e feat: expand Whisper search window to ±30s for sentence boundaries
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 17:34:30 -06:00
michael
c605cd1a88 feat: consolidate AD cues with pause points within 5s of each other
If consecutive AD cues have pause points within 5 seconds, they now
play back-to-back at the same pause point. This prevents AD from being
inserted mid-sentence when cues are close together.

Adds _consolidate_close_cues() method and consolidation_threshold
parameter to refine_all_pause_points().
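
A rough sketch of the consolidation idea, not the actual _consolidate_close_cues() implementation. It assumes each later cue is compared against the pause point of the group it would join; the Cue type and its fields are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Cue:
    index: int
    pause_point: float  # seconds


def consolidate_close_cues(cues: List[Cue], threshold: float = 5.0) -> List[Cue]:
    """Move cues whose pause points are within `threshold` onto a shared point."""
    result: List[Cue] = []
    for cue in sorted(cues, key=lambda c: c.pause_point):
        # Compare against the (possibly already consolidated) previous cue.
        if result and cue.pause_point - result[-1].pause_point <= threshold:
            cue = replace(cue, pause_point=result[-1].pause_point)
        result.append(cue)
    return result
```

Cues at 10s, 13s, 17s, and 30s would end up at 10s, 10s, 17s, and 30s under this assumption.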

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 16:15:52 -06:00
michael
0647c9c112 feat: expand Whisper search window to ±20s for sentence boundaries
Increases the search window from ±10s to ±20s to maximize the chance
of finding a proper sentence ending and avoid falling back to Gemini's
potentially imprecise pause points.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 15:51:42 -06:00
michael
407cc662e8 fix: insert first AD cue at video start if no sentence break found
When the first AD cue (index 0) cannot find a sentence boundary within
the ±10s search window, insert the AD at T=0:00 instead of using the
potentially mid-sentence Gemini pause point.
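
The fallback order described above might look like this. It is a sketch with hypothetical names; `sentence_break` stands for whatever the ±10s search returned, or None if nothing was found.

```python
from typing import Optional


def resolve_pause_point(
    cue_index: int,
    gemini_point: float,
    sentence_break: Optional[float],
) -> float:
    """Pick the final pause timestamp (seconds) for one AD cue."""
    if sentence_break is not None:
        return sentence_break   # a sentence boundary was found in the window
    if cue_index == 0:
        return 0.0              # first cue: insert the AD at the video start
    return gemini_point         # other cues: keep Gemini's pause point
```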

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 15:27:04 -06:00
michael
8806289eca feat: improve pause point precision with sentence boundary detection
- Update Gemini prompt to require transcription with precise timestamps
- Add sentence_boundaries output field for validation
- Add pause_point_rationale field to explain each pause point choice
- Emphasize terminal punctuation only (., ?, !) - never commas
- Expand Whisper search window from ±5s to ±10s
- Increase post-pause buffer from 50ms to 175ms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 14:41:12 -06:00
michael
3df163fd13 refactor: simplify GCS job deletion to use prefix-based cleanup
Replace 3-stage redundant deletion with single prefix-based approach.
All job files are under {job_id}/ prefix, so listing and deleting by
prefix is simpler and catches all files including new types like
accessible_video.mp4 and ad_cues/*.mp3.
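
A minimal sketch of prefix-based cleanup, assuming `bucket` is a google-cloud-storage Bucket (or any object exposing `list_blobs(prefix=...)` whose items have `delete()`). The function name is hypothetical.

```python
def delete_job_files(bucket, job_id: str) -> int:
    """Delete every object under '{job_id}/' and return how many were removed."""
    # The trailing slash keeps job "abc" from matching objects of job "abcd".
    prefix = f"{job_id}/"
    deleted = 0
    for blob in bucket.list_blobs(prefix=prefix):
        blob.delete()
        deleted += 1
    return deleted
```

Because the listing is driven by the prefix, new file types stored under the job folder are covered without code changes.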

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 10:07:43 -06:00
michael
e25a0d6ad0 feat: search +/-5s for sentence breaks only (no phrase breaks)
Updated pause point algorithm:
- Search range: 5 seconds BEFORE to 5 seconds AFTER Gemini pause point
- ONLY considers sentence breaks (after periods, !, ?) - not phrase breaks
- Chooses the closest sentence break to the Gemini pause point

This ensures audio descriptions are inserted at natural sentence
boundaries, not in the middle of sentences after commas.
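
The selection rule can be sketched as follows, assuming sentence-end timestamps (in seconds) have already been extracted from the transcript. The names are hypothetical.

```python
from typing import Optional, Sequence


def closest_sentence_break(
    gemini_point: float,
    sentence_ends: Sequence[float],
    window: float = 5.0,
) -> Optional[float]:
    """Return the sentence end nearest to gemini_point within ±window seconds."""
    candidates = [t for t in sentence_ends if abs(t - gemini_point) <= window]
    if not candidates:
        return None  # caller decides how to fall back
    return min(candidates, key=lambda t: abs(t - gemini_point))
```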

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 09:28:02 -06:00
michael
523ac85a35 fix: pause at start of gap + add explicit whisper_transcribe import
Three fixes:
1. Snap pause point to gap.start (end of previous word) to prevent
   cutting off the first word after the pause
2. Add explicit whisper_transcribe import in celery_worker.py
3. Fix misleading queue log message

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 09:11:29 -06:00
michael
7760da1e1c fix: snap pause point to start of gap instead of end
The pause point algorithm was snapping to gap.end (start of next word),
which caused the first word after the pause to be cut off. Changed to
snap to gap.start (end of previous word) instead.

Now the video pauses right after a word finishes, the AD plays during
the silence gap, and the next word plays in full when video resumes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 09:07:03 -06:00
michael
1c22872e69 fix: use dedicated whisper worker with FFmpeg dispatch pattern
Changed the Whisper transcription to run on dedicated whisper-worker
using the same dispatch pattern as FFmpeg:
1. apply_async() to dispatch to the whisper queue
2. Poll with ready() using async sleep to avoid blocking
3. Use allow_join_result() context manager
4. Get result only after task is ready

This ensures Whisper runs with concurrency=1 on a dedicated worker
to prevent memory overload while still allowing the render task
to wait for results without deadlocking.
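
The four steps above can be sketched without importing Celery. In real use, `async_result` would presumably be the value returned by `task.apply_async(queue="whisper")` and `join_guard` would be `celery.result.allow_join_result`. The helper name, polling interval, and timeout are assumptions.

```python
import asyncio
import contextlib
import time


async def wait_for_subtask(
    async_result,
    poll_interval: float = 0.5,
    timeout: float = 600.0,
    join_guard=contextlib.nullcontext,  # pass celery.result.allow_join_result in a worker
):
    """Poll a dispatched task without blocking the event loop, then fetch its result."""
    deadline = time.monotonic() + timeout
    while not async_result.ready():
        if time.monotonic() >= deadline:
            raise TimeoutError("subtask did not finish in time")
        await asyncio.sleep(poll_interval)  # yields control instead of blocking
    # The task is already complete here, so get() returns without waiting.
    with join_guard():
        return async_result.get()
```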

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:53:53 -06:00
michael
7b0ebb357c fix: run Whisper transcription inline instead of as subtask
By default, Celery refuses to run result.get() inside a task because it can
cause deadlocks. Changed the implementation to run Whisper transcription
directly using asyncio.to_thread() instead of dispatching to a separate
Celery queue.

The Whisper transcript is still cached in MongoDB for reuse across
language variants.
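
A small illustration of the inline approach. `transcribe_blocking` is a hypothetical stand-in for the real, CPU-heavy Whisper call.

```python
import asyncio
import time


def transcribe_blocking(audio_path: str) -> str:
    """Hypothetical stand-in for a blocking transcription call."""
    time.sleep(0.05)  # simulate work that would otherwise block the event loop
    return f"transcript for {audio_path}"


async def transcribe(audio_path: str) -> str:
    # asyncio.to_thread (Python 3.9+) runs the blocking function in a worker
    # thread so the surrounding coroutine can keep awaiting other work.
    return await asyncio.to_thread(transcribe_blocking, audio_path)
```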

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:48:41 -06:00
michael
3ca70a7c03 fix: add rendering_video status to MongoDB schema validator
The rendering_video status was defined in job.py and frontend types but
was missing from the MongoDB schema validator, causing document update
failures when jobs transitioned to the rendering_video state.

Changes:
- Add migration script to update existing databases
- Update mongodb-init.js for new database setups

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:40:23 -06:00
michael
4d5dceea65 chore: update poetry.lock with faster-whisper dependency
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:29:11 -06:00
michael
05bde8326d feat: add Whisper-based pause point refinement for audio descriptions
Implements word-level speech analysis using faster-whisper to refine
AD pause points. Gemini's timestamps are snapped to natural speech gaps
(sentence/phrase boundaries) to prevent pauses mid-word.

Key changes:
- Add WhisperService for transcription and gap detection
- Add dedicated Celery task routed to 'whisper' queue
- Integrate refinement into render_accessible_video task
- Cache Whisper transcripts in MongoDB for reuse across languages
- Add dedicated whisper-worker with concurrency=1 to prevent OOM

Configuration:
- Uses faster-whisper 'base' model (multilingual, ~145MB)
- 5-second search window after Gemini's recommended point
- Falls back to original timestamp if no gap found
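
Gap detection over word-level timestamps could be sketched like this. The Word and Gap types, the 0.2s minimum gap, and the function names are assumptions, not the WhisperService API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass(frozen=True)
class Gap:
    start: float  # end of the previous word
    end: float    # start of the next word


def find_gaps(words: List[Word], min_gap: float = 0.2) -> List[Gap]:
    """Return silences of at least min_gap seconds between consecutive words."""
    gaps = []
    for prev, nxt in zip(words, words[1:]):
        if nxt.start - prev.end >= min_gap:
            gaps.append(Gap(prev.end, nxt.start))
    return gaps


def first_gap_after(point: float, gaps: List[Gap], window: float = 5.0) -> Optional[Gap]:
    """Return the first gap starting within `window` seconds after `point`."""
    for gap in gaps:
        if point <= gap.start <= point + window:
            return gap
    return None  # caller falls back to the original timestamp
```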

Infrastructure:
- New Docker stage: whisper-worker
- New Cloud Run service: accessible-video-whisper-worker
- Updated docker-compose.yml with whisper-worker service

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:27:48 -06:00
michael
54799f4662 fix: broadcast WebSocket updates for ingesting and ai_processing status
Previously only the final pending_qc status was broadcast via WebSocket.
Now all intermediate status changes (ingesting, ai_processing) are also
broadcast so the frontend can update in real-time during reprocessing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 07:38:25 -06:00
michael
150a3e27bd fix: include client_id in JobResponse for user filter
The Created By filter dropdown was empty because client_id was not
being returned by the API. Added client_id to JobResponse schema
and included it in the list_jobs response.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 07:28:05 -06:00
michael
46b0f2c092 feat: add filtering, sorting, and table view to All Jobs tab
- Add created_by_name field to JobResponse schema and API
- Batch-fetch user names in list_jobs endpoint for efficiency
- Convert JobsList from card layout to sortable data table
- Add search box (job name, filename, created by user)
- Add user filter dropdown (populated from current jobs)
- Add status filter dropdown (individual statuses from current jobs)
- Add date range filter (All Time, Last 7 Days, Last 30 Days)
- Add sortable columns: Job Name, Created By, Date Created, Status
- Fetch all jobs for full client-side filtering capability
- Add responsive horizontal scroll for mobile

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 07:16:21 -06:00
michael
396e4e74e0 feat: add rendering_video status for accessible video processing
When jobs with accessible video option enabled enter video rendering
phase, the status now transitions to 'rendering_video' so users can
see why processing is taking longer. This provides better visibility
into the video rendering pipeline.

Changes:
- Added RENDERING_VIDEO status to JobStatus enum
- Updated render_accessible_video task to set new status
- Added status display to StatusBadge, jobStatusMessages
- Included new status in JobsList Translation filter group

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 06:49:46 -06:00
michael
901083b426 fix: ensure temp files use shared volume with correct permissions
- Modified render_accessible_video.py to explicitly pass TMPDIR to
  tempfile.TemporaryDirectory() so files are created in shared volume
- Updated docker-compose.yml to run containers as root initially,
  chown /shared-tmp to app:app, then switch to app user for celery
- This ensures both worker containers can access the same temp files
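
Passing the directory explicitly might look like the sketch below. The helper name and prefix are hypothetical; the commit message states only that TMPDIR is passed to tempfile.TemporaryDirectory().

```python
import os
import tempfile


def shared_temp_dir(prefix: str = "render_") -> tempfile.TemporaryDirectory:
    """Create a temporary directory inside the shared volume named by TMPDIR."""
    # Read TMPDIR at call time and pass it explicitly, rather than relying on
    # tempfile's cached default location.
    shared_root = os.environ.get("TMPDIR") or tempfile.gettempdir()
    return tempfile.TemporaryDirectory(dir=shared_root, prefix=prefix)
```

Used as `with shared_temp_dir() as path: ...`, the directory is removed on exit.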

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 06:15:45 -06:00
michael
e5ff124140 fix: use allow_join_result for celery subtask result retrieval
Celery doesn't allow calling result.get() within a task by default to
prevent deadlocks. Use allow_join_result() context manager since we've
already confirmed the task is complete via ready() polling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 18:09:37 -06:00
michael
bf1c321088 feat: add dedicated ffmpeg queue to prevent server overload
Add a dedicated Celery queue (ffmpeg) with concurrency=1 to serialize
all FFmpeg operations. This prevents CPU spikes when multiple render
tasks run in parallel with multiple languages.

Changes:
- Add ffmpeg_operations.py with run_ffmpeg_command and run_ffprobe_command tasks
- Update VideoRendererService to dispatch ffmpeg commands via the queue
- Add ffmpeg-worker service to docker-compose with --concurrency=1
- Configure main worker to exclude the ffmpeg queue
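
One way this routing could be expressed is sketched below. The dotted task paths are hypothetical (the commit names only the file and the task functions), and `worker_args` is an illustrative helper rather than project code. `task_routes`, `-Q`, `-X`, and `--concurrency` are standard Celery settings and worker options.

```python
from typing import List, Optional

FFMPEG_QUEUE = "ffmpeg"

# Hypothetical dotted paths; adjust to the real module layout.
task_routes = {
    "tasks.ffmpeg_operations.run_ffmpeg_command": {"queue": FFMPEG_QUEUE},
    "tasks.ffmpeg_operations.run_ffprobe_command": {"queue": FFMPEG_QUEUE},
}


def worker_args(
    queues: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
    concurrency: Optional[int] = None,
) -> List[str]:
    """Build the argument list that follows `celery -A <app>`."""
    args = ["worker"]
    if queues:
        args += ["-Q", ",".join(queues)]    # consume only these queues
    if exclude:
        args += ["-X", ",".join(exclude)]   # consume everything except these
    if concurrency is not None:
        args.append(f"--concurrency={concurrency}")
    return args
```

Under these assumptions the ffmpeg worker would use `worker_args(queues=["ffmpeg"], concurrency=1)` and the main worker `worker_args(exclude=["ffmpeg"])`.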

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 17:56:23 -06:00
michael
fd68d1ef54 feat: add accessible video validation, remove AI confidence check
- Add validation for accessible_video_gcs (file exists, size 0.1MB-5GB)
- Add validation for retimed_captions_vtt_gcs when accessible video exists
- Add AD Videos count to asset validation panel
- Include retimed captions in VTT file count
- Remove AI confidence from validation panel and backend checks
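
The size bounds could be checked as in this sketch. It assumes binary units (1MB = 1024 * 1024 bytes), which the commit message does not specify, and the names are hypothetical.

```python
MIN_VIDEO_BYTES = int(0.1 * 1024 * 1024)  # 0.1MB lower bound (binary units assumed)
MAX_VIDEO_BYTES = 5 * 1024 ** 3           # 5GB upper bound


def is_valid_video_size(size_bytes: int) -> bool:
    """Return True when the file size falls inside the accepted range."""
    return MIN_VIDEO_BYTES <= size_bytes <= MAX_VIDEO_BYTES
```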

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 16:41:57 -06:00
michael
3cdea9dfec fix: video review caption sync and event listener issues
- Fix video event listeners not re-attaching when video element remounts
  (add activeTab?.videoUrl to useEffect dependency array)
- Add retimed_captions_vtt to VTT API response for accessible videos
- Use retimed captions for accessible video tab in VideoReviewPlayer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 16:23:48 -06:00
michael
6effe58dc9 feat: add video review with timestamped notes to Final Review
Add a comprehensive video review feature to the Final Review page that allows
reviewers to watch videos with caption overlays and add timestamped notes.

Backend:
- New ReviewNote model for MongoDB with job_id, asset_key, timestamp, content
- CRUD API endpoints at /jobs/{job_id}/review-notes
- Owner-only edit/delete permissions (admins can bypass)
- Database indexes for efficient querying

Frontend:
- VideoReviewPlayer component with video player and caption overlay
- NotesSidebar for viewing/adding notes with auto-highlight when video reaches timestamp
- SyncedCaptionList with auto-scroll and click-to-seek
- AssetTabs for switching between languages and accessible videos
- React Query hooks with 30s polling for collaborative updates

Features:
- Notes persist to database and are shared across all reviewers
- Notes highlight for 5 seconds when video playback reaches their timestamp
- Click note to seek video to that position
- Pause video to add note at current timestamp
- Accessible videos use retimed captions when available

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 15:30:00 -06:00
michael
81872987cc fix: remove accessible_video_method from downloads response
The method field (overlay/pause_insert) is metadata, not a downloadable
file. Including it in the downloads dict caused the frontend to render
a broken download link.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 12:31:30 -06:00
michael
54667fbcb8 fix: resolve audio/video sync issues in accessible video renderer
- Update _get_video_properties() to extract audio sample_rate, channels,
  and pix_fmt in addition to video properties
- Add _extract_segment_reencoded() for frame-accurate cuts using
  re-encoding instead of stream copy (fixes keyframe-only cut limitation)
- Add _create_freeze_segment_matched() to enforce source audio property
  matching (fixes silent pauses caused by sample rate mismatch)
- Update _render_pause_insert_method() to use new methods with uniform
  encoding parameters
- Add -video_track_timescale 90000 for consistent timebase across segments

Root causes fixed:
1. -c copy could only cut at keyframes, causing audio dropouts
2. Sample rate mismatch (48kHz source vs 44.1kHz MP3) caused silent
   freeze-frame segments when concatenated
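
A sketch of how a re-encoded segment command might be assembled so that every segment shares the same audio and video parameters. The function name and the libx264/aac codec choices are assumptions. The flags themselves are standard FFmpeg options.

```python
from typing import List


def build_segment_command(
    src: str,
    dst: str,
    start: float,
    duration: float,
    sample_rate: int,
    channels: int,
    pix_fmt: str,
) -> List[str]:
    """Build an FFmpeg command that cuts one segment by re-encoding."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", f"{start:.3f}",              # output seeking: not limited to keyframes
        "-t", f"{duration:.3f}",
        "-c:v", "libx264",                  # assumed video codec
        "-pix_fmt", pix_fmt,                # match the source pixel format
        "-c:a", "aac",                      # assumed audio codec
        "-ar", str(sample_rate),            # match the source sample rate
        "-ac", str(channels),               # match the source channel count
        "-video_track_timescale", "90000",  # same timebase for every segment
        dst,
    ]
```

Applying the same sample rate, channel count, and timescale to the freeze-frame segments is what lets the pieces concatenate cleanly.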

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 12:05:32 -06:00
michael
6acb452cfa fix: add render queue to Celery worker
The accessible video render task was being dispatched to the 'render' queue
but no worker was listening to it. Added 'render' to:
- Dockerfile CMD args for worker queue list
- celery_worker.py import and log message

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 11:39:34 -06:00
michael
80d3866d32 feat: add accessible video (MP4 with embedded audio descriptions)
Add new deliverable type that renders video with audio descriptions embedded.
Supports two AI-determined methods:
- Direct Overlay: ducks original audio and overlays AD TTS (for minimal dialogue)
- Pause-Insert: freeze-frame video, insert AD, re-time subtitles (for significant dialogue)

Backend:
- Add Pydantic schemas for Gemini analysis response
- Add Gemini prompt and analyze_accessible_video_placement() method
- Add video_renderer.py service using FFmpeg for both rendering methods
- Add vtt_retimer.py service for pause-insert subtitle adjustment
- Add render_accessible_video.py Celery task
- Modify TTS service to return individual per-cue segments
- Update translate_and_synthesize.py to save segments and trigger rendering
- Update download endpoint to include accessible video outputs
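
The subtitle re-timing for the pause-insert method can be illustrated with a sketch. It assumes each pause is given as (pause_point, duration) on the original timeline and that a caption spanning a pause stays on screen during it. This is not the vtt_retimer.py implementation, and the names are hypothetical.

```python
from typing import List, Tuple

Pause = Tuple[float, float]  # (pause point in original video, inserted duration), seconds


def retime_cue(start: float, end: float, pauses: List[Pause]) -> Tuple[float, float]:
    """Shift one caption cue to account for freeze-frame pauses inserted before it."""
    # A cue that starts at or after a pause point is pushed back by that pause.
    start_shift = sum(d for p, d in pauses if p <= start)
    # A cue that ends exactly at a pause point is not stretched over the pause.
    end_shift = sum(d for p, d in pauses if p < end)
    return start + start_shift, end + end_shift
```

With pauses of 4s at 10s and 3s at 20s, a cue at 21s to 23s moves to 28s to 30s.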

Frontend:
- Add accessible_video_mp4 checkbox to NewJob form
- Update TypeScript types for new deliverable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 11:06:41 -06:00
michael
dad7ea09df fix: generate audio descriptions in the video's detected language
Updated Gemini ingestion prompt to explicitly require:
- Detect the spoken language first
- Write ALL outputs (summary, transcript, captions, audio_description) in that language
- Do NOT translate to English - keep everything in the original language

This fixes the issue where German videos would get English audio descriptions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 19:01:14 -06:00
michael
865fcdc246 feat: add TTS settings panel with model, speed, and style options
- Add model selection (flash vs pro) for quality control
- Add speed slider (0.5x - 2.0x) for pacing adjustment
- Add style presets (neutral, calm, energetic, professional, warm, documentary)
- Add custom style prompt option for advanced customization
- New /tts/options endpoint returns available TTS options
- Voice preview now tests all settings so users hear exact output
- Backward compatible: all new fields have sensible defaults

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 15:22:14 -06:00
michael
093b55c473 fix: add ffmpeg to API container for TTS audio conversion
The Gemini TTS service uses pydub which requires ffmpeg to convert
audio formats. Previously only the Worker container had ffmpeg.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 14:55:14 -06:00
michael
3804692092 fix: correct import path for get_current_user in routes_tts
The import was using a non-existent module path `..deps` instead of
`...core.dependencies`, causing the API container to fail on startup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 14:49:34 -06:00
michael
29643f6683 upgrade TTS to Gemini TTS with voice selection and preview
- Add Gemini TTS service with 30 voices and 24 languages
- Add TTS API endpoints for voice listing and preview
- Add per-language voice selection in job creation form
- Add voice override at QC approval stage
- Add VoiceSelector and VoicePreviewButton components
- Update TTSPreferences model with provider and voice mapping

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 14:41:57 -06:00
michael
46b6f25fd0 upgrade to Gemini 3 Pro preview model
- Change model from gemini-2.5-pro to gemini-3-pro-preview
- Upgrade google-genai package from ^1.31.0 to ^1.56.0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 14:02:02 -06:00
michael
e6578e0ccf add approved_source and qc_feedback job statuses to MongoDB schema
- Add migration to update jobs collection validator with new statuses
- Update mongodb-init.js for fresh deployments
- Fix deploy.sh to properly run migrations with 'python migrate.py up'

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 13:12:14 -06:00
michael
58a4f1f627 add support for non-English original video uploads
- Upload form now has "English / Different language" radio with optional language hint
- Gemini auto-detects language and saves outputs to outputs.{detected_language}
- QC review dynamically loads/saves VTT for source language
- New APPROVED_SOURCE status for non-English videos (APPROVED_ENGLISH kept for backward compatibility)
- Translation pipeline reads from source language and passes source_language to Google Translate
- All existing English jobs continue to work unchanged

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 10:33:58 -06:00
michael
762d7bcb38 fixed WebSocket live messaging for updates
2025-10-16 11:46:37 -05:00
michael
d25fb921a1 fixed dates on schema validator migration
2025-10-10 10:59:20 -05:00
michael
92169d047b added schema validator
2025-10-10 10:55:54 -05:00
michael
f59f5cf93b fixed frontend build errors
2025-10-10 10:26:57 -05:00
michael
aefd559e68 added production user role and made it the default for new MSAL users; production can access everything except user management, which is admin-only
2025-10-10 10:07:30 -05:00
michael
665b49c3f1 added MSAL Microsoft authentication
2025-10-10 09:19:39 -05:00