Commit graph

146 commits

Author SHA1 Message Date
Michael Clervi
6881565d75 .env and .gitignore 2026-01-02 16:21:41 +00:00
michael
79440929f4 feat: add Cloud Run HTTP services for Whisper and FFmpeg
Migrate CPU-intensive workloads to Cloud Run for autoscaling:

- Add Whisper HTTP service (FastAPI) with /transcribe endpoint
- Add FFmpeg HTTP service (FastAPI) with /encode, /probe, /extract-frame, etc.
- Add Dockerfiles for both services (8 vCPU, 32GB RAM, Gen2)
- Add Cloud Build config for CI/CD deployment
- Add Cloud Run service YAML configs with scale-to-zero
- Update whisper_transcribe.py to call Cloud Run when WHISPER_SERVICE_URL set
- Update video_renderer.py to call Cloud Run when FFMPEG_SERVICE_URL set
- Update whisper_service.py for Cloud Run compatibility (no settings dependency)
- Add ffmpeg_service_url and whisper_service_url to config.py

Services scale 0→N based on request load, falling back to local
execution when service URLs are not configured (hybrid mode).
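
The hybrid fallback described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `transcribe_locally` and `transcribe_remotely` are hypothetical stand-ins, and only the `WHISPER_SERVICE_URL` variable name and the `/transcribe` path come from the commit message.

```python
import os
from typing import Mapping, Optional


def transcribe_locally(audio_path: str) -> str:
    """Hypothetical stand-in for in-process Whisper execution."""
    return f"local:{audio_path}"


def transcribe_remotely(service_url: str, audio_path: str) -> str:
    """Hypothetical stand-in for an HTTP call to the /transcribe endpoint."""
    return f"remote:{service_url}/transcribe:{audio_path}"


def transcribe(audio_path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Use the remote service when its URL is configured, else run locally."""
    env = os.environ if env is None else env
    service_url = env.get("WHISPER_SERVICE_URL", "").strip()
    if service_url:
        return transcribe_remotely(service_url.rstrip("/"), audio_path)
    return transcribe_locally(audio_path)
```

Passing `env` explicitly keeps the selection logic testable without touching the real process environment.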

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 10:12:50 -06:00
michael
a8cc0b65a4 chore: update page title to Video Accessibility Platform
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 10:28:00 -06:00
michael
5e22f15e76 fix: TypeScript errors in JobDetail error display
Use explicit null returns and String() casts for unknown types
from job.error Record to satisfy TypeScript strict mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 10:22:20 -06:00
michael
c1c0b876fc feat: add RENDER_FAILED status with error propagation to GUI
- Add RENDER_FAILED job status for when video rendering fails
- Fix _check_accessible_video_completion to detect failures and transition
  job status accordingly (was stuck in RENDERING_VIDEO forever)
- Store detailed error info in job.error including failed_languages array
- Call completion check after failures to properly update job status
- Broadcast WebSocket notification on render failures

Frontend:
- Add render_failed to JobStatus type and StatusBadge (red styling)
- Add tts_failed and render_failed to JobsList STATUS_LABELS
- Enhance JobDetail error display with:
  - Warning icon and prominent styling
  - Error type and message
  - Failed languages list with per-language errors
  - Timestamp of when error occurred
- Update ProgressIndicator to handle failed states with red dot

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 10:18:27 -06:00
michael
0c1c115a8f fix: clarify language selector label
Changed "Target Languages" to "Target Languages for Translation"
for clarity in the new job form.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 09:38:29 -06:00
michael
e0850ca307 fix: include source language in jobs list language count
The Languages column now shows total languages (source + translations)
instead of just translation count, matching other parts of the UI.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 09:33:30 -06:00
michael
77be93b526 perf: parallelize video-native translations with asyncio.gather
Video-native translation mode now processes all target languages in parallel
using asyncio.gather() with a semaphore (max 3 concurrent) for rate limiting.
This significantly reduces total translation time when multiple languages
are selected.

- Add MAX_CONCURRENT_VIDEO_NATIVE constant for rate limiting
- Refactor video-native path to use parallel coroutines
- Keep traditional VTT translation mode sequential
- Handle per-language errors without stopping other translations
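
A minimal sketch of the pattern named above: `asyncio.gather()` bounded by a semaphore, with per-language errors collected rather than raised. `translate_one` is a hypothetical placeholder for the per-language call; only the `MAX_CONCURRENT_VIDEO_NATIVE` name and the limit of 3 come from the commit message.

```python
import asyncio
from typing import Dict, List, Tuple

MAX_CONCURRENT_VIDEO_NATIVE = 3  # limit stated in the commit message


async def translate_one(language: str) -> str:
    """Hypothetical stand-in for one per-language translation call."""
    await asyncio.sleep(0.01)
    if language == "xx":
        raise ValueError(f"unsupported language: {language}")
    return f"captions-{language}"


async def translate_all(languages: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_VIDEO_NATIVE)

    async def guarded(language: str) -> str:
        # At most MAX_CONCURRENT_VIDEO_NATIVE coroutines pass this point at once.
        async with semaphore:
            return await translate_one(language)

    # return_exceptions=True keeps one failure from cancelling the others.
    results = await asyncio.gather(
        *(guarded(lang) for lang in languages), return_exceptions=True
    )
    succeeded: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for language, result in zip(languages, results):
        if isinstance(result, Exception):
            failed[language] = str(result)
        else:
            succeeded[language] = result
    return succeeded, failed
```

Calling `asyncio.run(translate_all(["es", "fr", "xx", "de"]))` returns the three successful languages plus an error entry for `"xx"`.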

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 09:21:07 -06:00
michael
d08c20914e fix: use delete_blob() to avoid read-only generation property error
The google-cloud-storage library made blob.generation read-only,
causing job deletion to fail silently (0 GCS files deleted).
Using bucket.delete_blob(name) instead avoids generation checking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:57:59 -06:00
michael
d2d8e32819 feat: add video-native translation mode for multi-language content
Add a new "Video Native Mode" translation option that re-processes the
video through Gemini for each target language, generating captions and
audio descriptions directly from visual context. This produces more
natural and culturally appropriate content compared to traditional VTT
text translation.

Changes:
- Add translation_mode field to RequestedOutputs (video_native | traditional)
- Create gemini_ingestion_targeted.md prompt for target language generation
- Add extract_accessibility_targeted() method to Gemini service
- Modify translate_and_synthesize task to handle both translation modes
- Add Translation Mode UI selector in NewJob screen (video_native is default)
- Remove transcreation UI (replaced by video_native mode)
- Remove Google Translate service (replaced by Gemini translation)
- Add LanguageSelector component with searchable dropdown

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:50:05 -06:00
michael
81079c2d17 fix: handle race conditions and 404 errors in bulk job deletion
- Deduplicate job IDs to prevent processing same job twice
- Convert GCS blob iterator to list upfront to avoid stale generations
- Clear blob.generation before delete to handle concurrent deletions
- Catch NotFound errors gracefully for already-deleted blobs
- Don't re-raise GCS errors - cleanup failures shouldn't block deletion
- Treat already-deleted jobs as successful (idempotent delete)
- Disable action dropdown during bulk operations in UI
- Show spinner with "Please wait" message during deletion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 15:46:02 -06:00
michael
e8b940aee8 feat: add TTS_FAILED status and robust error handling for TTS synthesis
Add comprehensive error handling for TTS synthesis failures:

Backend:
- Add TTS_FAILED status to JobStatus enum for failed synthesis jobs
- Add TTSSynthesisError exception with cue index and context tracking
- Improve null-safe error handling in Gemini TTS response parsing
- Add _synthesize_cue_with_retry() with exponential backoff (3 attempts)
- Enhanced error logging with text preview and model context
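
Retry with exponential backoff could look roughly like the sketch below. The three-attempt limit and the `TTSSynthesisError` name come from the commit message; the function signature, the 1-second base delay, and the injectable `sleep` parameter are assumptions made for illustration.

```python
import time
from typing import Callable


class TTSSynthesisError(Exception):
    """Raised when a cue still fails after all retry attempts."""

    def __init__(self, message: str, cue_index: int) -> None:
        super().__init__(message)
        self.cue_index = cue_index


def synthesize_cue_with_retry(
    synthesize: Callable[[str], bytes],
    text: str,
    cue_index: int,
    max_attempts: int = 3,
    base_delay: float = 1.0,  # assumed value, not taken from the source
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(max_attempts):
        try:
            return synthesize(text)
        except Exception as exc:  # broad on purpose in this sketch
            last_error = exc
            if attempt < max_attempts - 1:
                # Delay doubles each time: base, 2*base, 4*base, ...
                sleep(base_delay * (2 ** attempt))
    raise TTSSynthesisError(
        f"cue {cue_index} failed after {max_attempts} attempts: {last_error}",
        cue_index,
    ) from last_error
```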

Frontend:
- Add TTS_FAILED status styling (red badge) in StatusBadge component
- Add tts_failed to JobStatus TypeScript type

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 14:26:07 -06:00
michael
6689778be7 feat: add dedicated TTS worker with parallel per-cue synthesis
Break out TTS synthesis into a dedicated Celery worker (tts queue) with
concurrency=8 for parallel processing. Each AD cue is now synthesized as
a separate task, enabling up to 8 cues to be processed simultaneously.

Key changes:
- Add tts_synthesis.py with synthesize_cue_task for per-cue synthesis
- Refactor translate_and_synthesize.py to dispatch cue tasks in parallel
- Add tts-worker service to docker-compose.yml (concurrency=8)
- Add Cloud Run service config for production deployment

Benefits:
- Parallel synthesis even for single jobs (e.g., 50 cues → 8 concurrent)
- Natural rate limiting across multiple concurrent jobs
- Fault tolerance with per-cue retries and GCS persistence

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 14:23:11 -06:00
michael
b11c3d0d4f fix: rewrite VTT retiming algorithm to prevent captions during AD segments
The VTT retimer had two bugs causing subtitles to display during freeze
periods and become out of sync:

1. Same offset applied to both start and end times (should differ when
   pause falls between them)
2. Cues spanning pause points weren't split (causing captions during freeze)

Changes:
- Add _offset_at() for timestamps AT or AFTER pause points
- Add _offset_before() for timestamps STRICTLY BEFORE pause points
- Add _retime_cue() to split cues at pause points into multiple segments
- Add _filter_short_segments() to remove <100ms segments after splitting
- Rewrite retime_for_pause_insert() to use new helper methods

Example fix for cue 8s-12s with pause at 10s (4s freeze):
- Before: 8s-12s (displayed during freeze!)
- After: 8s-10s + 14s-16s (gap during AD)
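
The splitting behaviour in the example above can be sketched as a single function. This is an illustrative reconstruction, not the repository's `_retime_cue()`: the signature is an assumption, and pauses are modelled as `(pause_point, freeze_duration)` pairs in seconds.

```python
from typing import List, Tuple

Segment = Tuple[float, float]


def retime_cue(
    start: float,
    end: float,
    pauses: List[Tuple[float, float]],
    min_segment: float = 0.1,  # drop segments shorter than 100ms
) -> List[Segment]:
    """Shift a cue past inserted freezes, splitting it at pauses inside it."""
    segments: List[Segment] = []
    offset = 0.0
    seg_start = start
    for point, freeze in sorted(pauses):
        if point <= start:
            # Pause at or before the cue start shifts the whole cue.
            offset += freeze
            continue
        if point >= end:
            # Pause at or after the cue end does not affect this cue.
            break
        # Pause strictly inside the cue: close the segment before the freeze,
        # then reopen after it so nothing is displayed during the freeze.
        segments.append((seg_start + offset, point + offset))
        offset += freeze
        seg_start = point
    segments.append((seg_start + offset, end + offset))
    return [(s, e) for s, e in segments if e - s >= min_segment]
```

With the values from the commit message, `retime_cue(8, 12, [(10, 4)])` yields the two segments 8-10 and 14-16.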

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 09:01:03 -06:00
michael
eea1c25ab2 feat: enhance VTT editor with timing editing, cue insert/delete
Add comprehensive editing capabilities to the QC Review VTT editor:
- Inline editable timestamp inputs for start/end times
- Insert cue before/after with midpoint timestamp calculation
- Delete cue with confirmation modal
- Real-time per-cue validation warnings for overlaps
- Hover action buttons (insert before, insert after, delete)

Also fixes VTT parser to properly handle empty/invalid timestamps.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 11:46:24 -06:00
michael
0a79a82b04 fix: resolve race condition preventing captions in batch uploads
The video event listener effect had an empty dependency array, causing it
to run only once on mount. For batch uploads where downloads load slower,
the video element wasn't rendered yet when the effect ran, so listeners
were never attached. Adding currentVideoUrl as a dependency ensures the
effect re-runs when the video becomes available.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 10:59:30 -06:00
michael
37593dd4bc refactor: simplify pause point algorithm with midpoint snapping and silence buffers
Replace complex overlap/catch-up logic with simpler approach:
- Snap pause points to midpoint between sentences (not sentence boundaries)
- Add 500ms silence before AND after AD audio during freeze frame
- Resume playback from same midpoint (no overlap, no visual jump-back)

This eliminates audio/visual anomalies caused by the previous algorithm's
complexity around sentence boundary snapping and audio catch-up.
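
As a rough sketch of the simplified approach: the pause point is the midpoint of the inter-sentence gap, and the freeze lasts for the AD audio plus 500ms of silence on each side. The function name and return shape are hypothetical.

```python
from typing import Dict

SILENCE_BUFFER = 0.5  # seconds of silence before AND after the AD audio


def plan_freeze(prev_sentence_end: float, next_sentence_start: float,
                ad_duration: float) -> Dict[str, float]:
    """Pause at the gap midpoint and resume from the same timestamp."""
    pause_point = (prev_sentence_end + next_sentence_start) / 2
    return {
        "pause_point": pause_point,
        "resume_from": pause_point,  # same point: no overlap, no jump-back
        "freeze_duration": SILENCE_BUFFER + ad_duration + SILENCE_BUFFER,
    }
```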

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 09:55:40 -06:00
michael
37f5e8d1b0 fix: validate pause points and frame extraction
- Get video duration BEFORE the render loop (not after)
- Clamp pause_point to 100ms before video end if it exceeds duration
- Add validation in _extract_frame() to verify frame was created
- Add debug logging for frame extraction timestamps

This prevents "Frame file not found" errors when pause points
calculated by Whisper refinement exceed the source video duration.
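
The clamping rule can be expressed in a few lines. This is a sketch under the assumption that timestamps are floats in seconds; the function name is hypothetical, and the guard against negative results for very short videos is an addition not mentioned in the commit.

```python
END_MARGIN = 0.1  # 100ms before the end of the video


def clamp_pause_point(pause_point: float, video_duration: float) -> float:
    """Pull a pause point that overshoots the video back inside it."""
    if pause_point > video_duration:
        # max() guards against a negative result for very short videos.
        return max(0.0, video_duration - END_MARGIN)
    return pause_point
```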

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 09:32:13 -06:00
michael
40ece78652 feat: implement audio catch-up to eliminate visual jump-back artifact
When a pause point falls between two sentences, the previous algorithm
created a visual jump-back where the video rewound to resume_from after
the AD played. This was distracting to viewers.

New behavior:
- Video plays normally to pause_point
- Freeze frame shows + AD audio plays
- Freeze frame CONTINUES while source audio from [resume_from, pause_point]
  plays (the "catch-up" audio)
- Video resumes smoothly from pause_point (no visual jump)

The audio from the overlap region plays twice (once during video, once
during freeze extension) but this is acceptable as it's typically <1s
and provides natural audio context around the AD.

Implementation:
- Add _extract_audio_segment() to extract catch-up audio from source
- Add _concatenate_audio() to join AD + catch-up audio
- Modify render loop to create extended freeze segments with combined audio
- Resume video from pause_point instead of resume_from

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 09:02:46 -06:00
michael
ce7a1b182f fix: improve FFmpeg error reporting and add input validation
- Show last 1500 chars of stderr instead of first 500 to capture actual
  error messages (FFmpeg writes version banner first, errors at end)
- Add validation for freeze segment creation:
  - Check duration > 0
  - Verify frame and audio files exist
  - Add debug logging for parameters

This helps diagnose FFmpeg failures that were previously showing only
version/configuration info instead of the actual error.
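
The stderr change amounts to slicing from the end instead of the start. A minimal sketch, with a hypothetical helper name:

```python
def tail_of_stderr(stderr: str, limit: int = 1500) -> str:
    """Return the last `limit` characters, where FFmpeg reports real errors."""
    if limit <= 0:
        return ""
    return stderr[-limit:]
```

Because FFmpeg prints its version and build configuration first, a head slice such as `stderr[:500]` often contains no error text at all.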

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 08:41:32 -06:00
michael
3588d3fa14 refactor: rewrite pause point refinement algorithm with ordered logic
Completely rewrites the Whisper-based pause point refinement to use
a two-phase approach with explicit ordering:

Phase 1 - Individual refinement:
1. Check if pause point is "during speaking" (words within ±2s)
   - If NOT during speaking → use Gemini's exact point, no overlap
2. If during speaking, find nearest sentence boundary
3. Apply appropriate buffering based on context:
   - Case A: First sentence → pause 500ms before sentence starts
   - Case B: Last sentence → pause 500ms after sentence ends
   - Case C: Between sentences → full double buffer (overlap)

Phase 2 - Consolidation (after all refinements):
- Consolidate cues within 5s of each other to play back-to-back

Key changes:
- Add SentenceBoundary dataclass for tracking boundaries with context
- Add _is_during_speaking() helper to detect speech proximity
- Add _find_sentence_boundaries() with longest-gap fallback
- Rewrite snap_pause_point() with new ordered algorithm
- Update refine_all_pause_points() to pass words and use two phases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 08:19:03 -06:00
michael
d092800676 fix: treat consolidated AD cues as single segment for buffering
Previously, all consolidated cues shared the same pause_point AND
resume_from, which caused the overlap video segment to play between
each AD cue in a consolidated group.

Now consolidated cues are treated as a single AD segment:
- All cues in a group share the same pause_point (front buffer once)
- Only the LAST cue keeps resume_from (back buffer once)
- Other cues have resume_from = pause_point (no video between ADs)

This ensures consolidated ADs play seamlessly back-to-back:
- Video plays up to pause_point (front buffer)
- AD_1 plays
- AD_2 plays immediately (no video)
- AD_n plays immediately (no video)
- Video resumes from resume_from (back buffer)
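
The "buffer once per group" rule might be sketched like this. `ADCue` here is a simplified, hypothetical dataclass with only the two fields discussed above, not the project's actual schema.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ADCue:
    pause_point: float
    resume_from: float


def buffer_group_once(group: List[ADCue]) -> List[ADCue]:
    """Only the last cue keeps resume_from; earlier cues resume where they paused."""
    if not group:
        return []
    *leading, last = group
    # resume_from == pause_point means no video plays between consecutive ADs.
    return [replace(cue, resume_from=cue.pause_point) for cue in leading] + [last]
```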

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 23:33:15 -06:00
michael
ee6a30e7a7 feat: always generate fresh Whisper transcripts (disable caching)
Remove the cached transcript lookup - always run a fresh Whisper
transcription for each accessible video render. This ensures we get
accurate word timestamps for the current video file.

The transcript is still saved to the job document for debugging and
auditing purposes, but it will never be read back for reuse.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 23:25:35 -06:00
michael
e8fde7962f chore: increase accessible-video-worker concurrency from 4 to 8
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 23:20:58 -06:00
michael
a3b4db104a feat: use Gemini's exact pause point for non-dialogue sections
When the Whisper analysis detects no speech near a Gemini-recommended
pause point, skip the full-gap-overlap algorithm and use the exact
pause point with no overlap (pause_point == resume_from).

This handles cases where Gemini chose a pause point in a silent or
music-only section of the video - there's no dialogue to buffer
around, so we simply pause and resume at the exact same point.

Three outcomes now in snap_pause_point():
1. No speech nearby → exact pause point, no overlap, no warning
2. Speech but no sentence break → warning (existing behavior)
3. Sentence break found → full-gap-overlap (existing behavior)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 23:13:12 -06:00
michael
12cae0919a feat: implement full-gap-overlap algorithm for AD pause insertion
Changes pause point calculation to use the entire gap between sentences
as a buffer on BOTH sides of the audio description:

- pause_point: Just BEFORE next sentence starts (gap.end - 50ms)
- resume_from: Just AFTER previous sentence ends (gap.start + 50ms)

This means a small portion of video plays twice (the gap duration), but
creates a much more natural listening experience by maximizing the
breathing room around audio descriptions.
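
In code, the two timestamps fall out of the gap boundaries directly. A sketch assuming the gap is given as start and end times in seconds; the function name is hypothetical, and gaps shorter than 100ms (where the two points would cross) are not handled here.

```python
from typing import Tuple

EDGE_MARGIN = 0.05  # the 50ms margin from the commit message


def full_gap_overlap(gap_start: float, gap_end: float) -> Tuple[float, float]:
    """Return (pause_point, resume_from) for a silence gap between sentences."""
    pause_point = gap_end - EDGE_MARGIN    # just before the next sentence
    resume_from = gap_start + EDGE_MARGIN  # just after the previous sentence
    return pause_point, resume_from
```

The replayed portion of video is `pause_point - resume_from`, which is the gap duration minus the two margins.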

Changes:
- whisper_service.py: snap_pause_point() now returns (pause_point, resume_from)
- video_renderer.py: Uses resume_from for current_time after freeze segment
- vtt_retimer.py: Calculates effective_offset including overlap duration
- accessible_video.py: Added resume_from field to ADPlacementCue schema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 22:51:49 -06:00
michael
dd7ac2e15c debug: add logging for pause-insert video rendering
Logs pause point placements, segment creation, and final segment
calculation to help diagnose the 30s black footage issue.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 22:39:33 -06:00
michael
54638d1065 feat: switch Whisper model from large-v3 to medium
Medium model is faster and uses less memory (~1.5GB vs ~3GB)
while still providing good multilingual transcription quality.

Updated in:
- config.py
- docker-compose.yml
- whisper-worker-service.yaml
- cloudbuild.yaml
- Dockerfile (pre-download)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 22:35:47 -06:00
michael
504e525a1f feat: dynamic pause point buffer based on gap duration
Instead of a fixed 175ms buffer, the pause point is now placed
halfway between the end of the sentence and the start of the
next word. If the half-gap exceeds 2 seconds, uses 500ms instead.
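
The buffer rule can be written out as a small function. Names and the signature are hypothetical; the 2-second cap and 500ms fallback come from the commit message.

```python
def dynamic_pause_point(
    sentence_end: float,
    next_word_start: float,
    max_half_gap: float = 2.0,
    fallback_buffer: float = 0.5,
) -> float:
    """Pause halfway into the gap, or 500ms in when the gap is very long."""
    half_gap = (next_word_start - sentence_end) / 2
    buffer = fallback_buffer if half_gap > max_half_gap else half_gap
    return sentence_end + buffer
```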

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 22:33:07 -06:00
michael
70a07f3732 debug: add console logging for caption display troubleshooting
Logs VTT content fetching, parsing, and current caption state
to help diagnose subtitle overlay issues in final review mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 22:29:56 -06:00
michael
01c96da95c feat: use all available CPU cores for Whisper transcription
Dynamically detects CPU count with os.cpu_count() instead of
hardcoded 4 threads. Falls back to 4 if detection fails.
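
A minimal sketch of the detection with fallback. `os.cpu_count()` returns `None` when the count cannot be determined, which is what the `or` handles; the function name and the injectable `detect` parameter are illustrative assumptions.

```python
import os
from typing import Callable, Optional

FALLBACK_THREADS = 4  # the previously hardcoded value


def whisper_thread_count(
    detect: Callable[[], Optional[int]] = os.cpu_count,
) -> int:
    """Use every detected core, or fall back when detection returns None."""
    return detect() or FALLBACK_THREADS
```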

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:59:31 -06:00
michael
dc78dc6fb5 feat: add detailed logging for Whisper model and processing time
- Log model name explicitly when loading and transcribing
- Log model load time
- Log transcription processing time
- Helps verify correct model is being used

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:58:41 -06:00
michael
6baf6cc254 fix: set WHISPER_MODEL env var default to large-v3
Environment variables were overriding config.py with 'base' model.
Updated defaults in:
- docker-compose.yml
- whisper-worker-service.yaml

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:56:41 -06:00
michael
3538dea47f fix: update whisper_max_search_window to 30s in config.py
The setting in config.py (5.0) was overriding the default in
whisper_service.py (30.0). Now both are consistent at 30s.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:53:57 -06:00
michael
4f82fad5dd feat: pre-download Whisper large-v3 model during Docker build
Downloads the model (~3GB) at build time to avoid cold start delays.
Also updated comment to reflect large-v3 memory usage (~4-6GB).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:25:44 -06:00
michael
1395807b23 feat: increase whisper worker memory to 8GB for large-v3 model
- docker-compose.yml: 4G -> 8G limit, 2G -> 4G reservation
- whisper-worker-service.yaml: 4Gi -> 8Gi limit, 2Gi -> 4Gi request
- cloudbuild.yaml: 4Gi -> 8Gi, WHISPER_MODEL=base -> large-v3

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:23:29 -06:00
michael
614ff841fe feat: upgrade Whisper model from base to large-v3
Uses the multilingual large model for more accurate transcription
and sentence boundary detection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:20:03 -06:00
michael
582f9f066e feat: expand Whisper search window to ±30s for sentence boundaries
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 17:34:30 -06:00
michael
c605cd1a88 feat: consolidate AD cues with pause points within 5s of each other
If consecutive AD cues have pause points within 5 seconds, they now
play back-to-back at the same pause point. This prevents AD from being
inserted mid-sentence when cues are close together.

Adds _consolidate_close_cues() method and consolidation_threshold
parameter to refine_all_pause_points().
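
One way the consolidation could work is sketched below. It is not the repository's `_consolidate_close_cues()`: the commit does not say which pause point a group shares, so this sketch assumes the first cue's point, and it measures the 5-second threshold from that point.

```python
from typing import List


def consolidate_close_cues(
    pause_points: List[float], consolidation_threshold: float = 5.0
) -> List[float]:
    """Give cues close to the current group's pause point that same point."""
    consolidated: List[float] = []
    for point in pause_points:
        if consolidated and point - consolidated[-1] <= consolidation_threshold:
            # Close enough: play back-to-back at the group's pause point.
            consolidated.append(consolidated[-1])
        else:
            consolidated.append(point)
    return consolidated
```

Input is expected in ascending order, matching the order in which cues appear in the video.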

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 16:15:52 -06:00
michael
0647c9c112 feat: expand Whisper search window to ±20s for sentence boundaries
Increases the search window from ±10s to ±20s to maximize the chance
of finding a proper sentence ending and avoid falling back to Gemini's
potentially imprecise pause points.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 15:51:42 -06:00
michael
407cc662e8 fix: insert first AD cue at video start if no sentence break found
When the first AD cue (index 0) cannot find a sentence boundary within
the ±10s search window, insert the AD at T=0:00 instead of using the
potentially mid-sentence Gemini pause point.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 15:27:04 -06:00
michael
8806289eca feat: improve pause point precision with sentence boundary detection
- Update Gemini prompt to require transcription with precise timestamps
- Add sentence_boundaries output field for validation
- Add pause_point_rationale field to explain each pause point choice
- Emphasize terminal punctuation only (., ?, !) - never commas
- Expand Whisper search window from ±5s to ±10s
- Increase post-pause buffer from 50ms to 175ms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 14:41:12 -06:00
michael
f68bcab667 fix: include accessible video and re-timed captions in bulk download
The "Download All Files" function was missing accessible_video_mp4 and
accessible_captions_vtt files that the backend provides. Updated both
the bulk download in JobsList and the individual Downloads page to
include all available file types.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 11:08:12 -06:00
michael
3df163fd13 refactor: simplify GCS job deletion to use prefix-based cleanup
Replace 3-stage redundant deletion with single prefix-based approach.
All job files are under {job_id}/ prefix, so listing and deleting by
prefix is simpler and catches all files including new types like
accessible_video.mp4 and ad_cues/*.mp3.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 10:07:43 -06:00
michael
e25a0d6ad0 feat: search +/-5s for sentence breaks only (no phrase breaks)
Updated pause point algorithm:
- Search range: 5 seconds BEFORE to 5 seconds AFTER Gemini pause point
- ONLY considers sentence breaks (after periods, !, ?) - not phrase breaks
- Chooses the closest sentence break to the Gemini pause point

This ensures audio descriptions are inserted at natural sentence
boundaries, not in the middle of sentences after commas.
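
The search described above can be sketched as follows, assuming word-level timestamps are available as `(text, start, end)` tuples. The function name and the word format are hypothetical.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)
SENTENCE_ENDINGS = (".", "!", "?")


def closest_sentence_break(
    words: List[Word], gemini_pause_point: float, window: float = 5.0
) -> Optional[float]:
    """Return the sentence end nearest the suggested point, within the window."""
    candidates = [
        end
        for text, _start, end in words
        # Commas and other phrase breaks are deliberately ignored.
        if text.rstrip().endswith(SENTENCE_ENDINGS)
        and abs(end - gemini_pause_point) <= window
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda t: abs(t - gemini_pause_point))
```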

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 09:28:02 -06:00
michael
523ac85a35 fix: pause at start of gap + add explicit whisper_transcribe import
Three fixes:
1. Snap pause point to gap.start (end of previous word) to prevent
   cutting off the first word after the pause
2. Add explicit whisper_transcribe import in celery_worker.py
3. Fix misleading queue log message

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 09:11:29 -06:00
michael
7760da1e1c fix: snap pause point to start of gap instead of end
The pause point algorithm was snapping to gap.end (start of next word),
which caused the first word after the pause to be cut off. Changed to
snap to gap.start (end of previous word) instead.

Now the video pauses right after a word finishes, the AD plays during
the silence gap, and the next word plays in full when video resumes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 09:07:03 -06:00
michael
1c22872e69 fix: use dedicated whisper worker with FFmpeg dispatch pattern
Changed the Whisper transcription to run on dedicated whisper-worker
using the same dispatch pattern as FFmpeg:
1. apply_async() to dispatch to the whisper queue
2. Poll with ready() using async sleep to avoid blocking
3. Use allow_join_result() context manager
4. Get result only after task is ready

This ensures Whisper runs with concurrency=1 on a dedicated worker
to prevent memory overload while still allowing the render task
to wait for results without deadlocking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:53:53 -06:00
michael
7b0ebb357c fix: run Whisper transcription inline instead of as subtask
Celery does not allow calling result.get() within a task as it causes
deadlocks. Changed the implementation to run Whisper transcription
directly using asyncio.to_thread() instead of dispatching to a separate
Celery queue.
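
Running a blocking call from async code with `asyncio.to_thread()` (Python 3.9+) looks roughly like this. `blocking_transcribe` is a hypothetical stand-in for the synchronous Whisper call.

```python
import asyncio
import time


def blocking_transcribe(path: str) -> str:
    """Hypothetical stand-in for a CPU-bound, synchronous transcription."""
    time.sleep(0.01)
    return f"transcript of {path}"


async def render(path: str) -> str:
    # The blocking function runs in a worker thread, so the event loop
    # stays responsive and no nested Celery result.get() is needed.
    return await asyncio.to_thread(blocking_transcribe, path)
```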

The Whisper transcript is still cached in MongoDB for reuse across
language variants.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:48:41 -06:00
michael
3ca70a7c03 fix: add rendering_video status to MongoDB schema validator
The rendering_video status was defined in job.py and frontend types but
was missing from the MongoDB schema validator, causing document update
failures when jobs transitioned to the rendering_video state.

Changes:
- Add migration script to update existing databases
- Update mongodb-init.js for new database setups

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 08:40:23 -06:00