Commit graph

68 commits

Author SHA1 Message Date
Vadym Samoilenko
1563714454 feat(saas): Phase 3 — membership-based authz + Mailgun + job.organization_id
authz.py (new):
- MembershipContext — per-request membership dict for the current user
- get_membership_context FastAPI dependency
- require_org_role(min_role) — dependency factory keyed off org_id path param
- require_platform_admin()
- OrgScopedQuery — adds organization_id filter; platform admin passes through
- bump_user_membership_cache — invalidates Redis key on membership writes

dependencies.py:
- get_accessible_project_ids now queries memberships collection first;
  legacy pm_client_ids / team.member_user_ids fallback retained until migration runs
  (four job-route access checks at lines 608/1054/1181/1538 are fixed via this function)

routes_clients.py:
- _assert_pm_or_admin and _assert_client_access are now async and query memberships
- All 10 call sites updated with await + db arg

emailer.py:
- Switched from SendGrid to Mailgun REST API via httpx (already in requirements)
- _send() is now fully async; same public method signatures preserved
- send_completion_email uses _send()

config.py:
- Added mailgun_api_key, mailgun_domain, mailgun_from settings
- sendgrid_api_key kept with empty default for backward compat

migration_2026-04-28-000003:
- Backfills job.organization_id from project.client_id
- Creates (organization_id, status, created_at) sparse index on jobs

routes_organizations.py / routes_invitations.py:
- Call bump_user_membership_cache after every membership write

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 16:56:42 +01:00
Vadym Samoilenko
00fb1aacc6 feat(saas): Phase 2 — invitation flow, email templates, MS SSO zero-membership
Backend:
- models/invitation.py — Invitation model + create/accept/preview schemas
- routes_invitations.py — org-scoped POST/GET/DELETE + public preview/accept endpoints
  Single-use token via find_one_and_update; sha256(token) stored in DB, plaintext in email URL
- emailer.py — _send() helper; send_invitation_email, send_welcome_email, send_password_reset_email
  send_completion_email refactored to use _send()
- migration_2026-04-28-000002 — creates invitations collection with TTL index (30d audit trail)
- routes_auth.py — new MS SSO users provisioned with zero memberships instead of role=PRODUCTION;
  they land on "no access" page until an admin invites them
- main.py — registers invitations_org_router and invitations_router

Frontend:
- routes/AcceptInvite.tsx — public page at /accept-invite?token=...
  Four states: new user (name+password), existing user (confirm), MS user, already-member
- App.tsx — /accept-invite route outside RequireAuth
- types/api.ts — Invitation, InvitationCreate, InvitationPreview, InvitationAcceptRequest/Response
- lib/api.ts — listInvitations, createInvitation, revokeInvitation, previewInvitation, acceptInvitation
- hooks/useClients.ts — useInvitations, useCreateInvitation, useRevokeInvitation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 16:52:08 +01:00
Vadym Samoilenko
6f1be645ce feat(saas): Phase 0+1 — Organization/Membership entities and dev branch
Introduces the multi-tenant SaaS foundation alongside the existing
client/team/project model (zero-downtime shim period):

Backend:
- app/models/organization.py — Organization + OrgRole enum (OWNER/ADMIN/MANAGER/MEMBER/VIEWER)
- app/models/membership.py — Membership model with MemberDetail for enriched responses
- app/services/membership_service.py — upsert/remove/list/has_org_role helpers
- app/api/v1/routes_organizations.py — /organizations CRUD + /members sub-resource + /me/memberships
- main.py — registers organizations router
- migrations: create memberships collection (unique index) + backfill from pm_client_ids/team members

Frontend:
- types/api.ts — Organization, OrgRole, Membership, OrganizationCreateRequest types; Client marked @deprecated
- hooks/useClients.ts — useOrganizations, useOrganization, useOrgMembers, useAddOrgMember,
  useUpdateOrgMember, useRemoveOrgMember, useMyMemberships
- lib/api.ts — listOrganizations, getOrganization, createOrganization, updateOrganization,
  listOrgMembers, addOrgMember, updateOrgMember, removeOrgMember, getMyMemberships

Reads fall back to the clients collection during transition; all writes go to organizations.
Existing /clients endpoints and hooks are untouched.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 16:46:24 +01:00
Michael Clervi
f7d4624fc7 fix: correct cost tracker API field names and endpoint path
- preflight: use allow (not allowed), deny_reason, project_external_id
- record: units dict {token_input/token_output/char} instead of flat fields
- record: use /usage/record endpoint path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:42:29 +00:00
Vadym Samoilenko
ea21cace96 feat: replace SDK with direct HTTP integration to centralized cost tracker
- New services/cost_tracker.py: sync httpx preflight()/record() + async wrappers;
  BudgetExceeded exception; no-op when COST_TRACKER_BASE_URL is empty
- Preflight budget check added before ingestion (Gemini), per-language translation
  (video-native + traditional), and per-language TTS dispatch
- _record_gemini_usage and _record_tts_cost now call cost_tracker directly;
  removes broken asyncio.get_event_loop() hack from sync Celery worker
- Fix: _cost_ctx now threaded into extract_accessibility_targeted (video-native path)
- Fix: user_id/cost_project_id now propagated through dispatch_language_tts →
  synthesize_cue_task.s() and the rerender_accessible_video.py re-render path
- Remove oliver-cost-tracker SDK dependency (was commented-out/never installed)
- Drop cost_tracker_outbox_path setting and get_cost_tracker() factory
- Update COST_TRACKER_BASE_URL default to optical-dev.oliver.solutions in
  .env.prod.example, docker-compose.yml, and all Cloud Run service yamls
- Cloud Run yamls use Secret Manager ref (cost-tracker-api-key) for the API key

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:36:15 +01:00
Vadym Samoilenko
ae2c474061 feat: integrate oliver-cost-tracker SDK into video-accessibility
Add AI cost tracking to all Gemini and TTS call sites:

- config.py: add COST_TRACKER_* env vars (base_url, api_key, source_app,
  outbox_path, enabled)
- dependencies.py: add get_cost_tracker() factory (lru_cache, graceful
  degradation if SDK not installed)
- models/job.py: add cost_tracker_project_id field for cost attribution
- services/gemini.py:
  - add import time, _record_gemini_usage() helper (reads usage_metadata)
  - add _cost_ctx kwarg to extract_accessibility, extract_accessibility_targeted,
    transcreate_content, translate_vtt, rewrite_tts_cue
  - record usage after every generate_content call via asyncio.create_task()
- tasks/ingest_and_ai.py: pass _cost_ctx (user_id, job_id, project_id) to
  extract_accessibility
- tasks/translate_and_synthesize.py: build _cost_ctx from job_doc and pass
  to transcreate_content + translate_vtt calls
- tasks/tts_synthesis.py: add user_id + cost_project_id kwargs, add
  _record_tts_cost() helper (records len(text) chars to cost tracker)
- pyproject.toml: document SDK install instructions (comment)
- .env.prod.example: add COST_TRACKER_* vars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 11:30:46 +01:00
Vadym Samoilenko
8356dbdbfe fix: add charset=utf-8 to VTT content-type to prevent ♪ encoding issues
Without charset specification, browsers/tools interpret text/vtt as
Latin-1, causing UTF-8 multi-byte characters like ♪ (U+266A) to render
as garbled text (♪).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 14:17:16 +00:00
Vadym Samoilenko
6f963ff7c4 feat: DCMP compliance, descriptive transcript, new languages, QA bug fixes
- Rewrote VTT translation to two-step (text-only → Gemini → apply to original timestamps) preventing caption timing desync
- Added polling fallback for all processing states and Safari visibilitychange WebSocket reconnect
- Added 11 new TTS languages (cs, da, fi, hu, no, sk, sv, es-419, pt-BR, fr-CA)
- Updated caption/AD prompts to DCMP Captioning Key & Description Key standards (line splitting, ♪ music notation, italic tags, caption positioning, ethics guidelines)
- Added descriptive transcript generation (WCAG 2.1 §1.2.1) combining captions + AD into plain text
- Fixed amix normalize=0 to prevent audio loss in rendered videos
- Fixed AD re-timing double-count when source_ms is None
- Fixed cue block numbering to be 1-based in VttEditor and Timeline Preview

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 11:50:43 +00:00
Vadym Samoilenko
f4ddcce066 fix: resolve QA-reported bugs — MP3/VTT desync, crashes, notifications, and more
**BUG-1 & BUG-2 — Wrong audio plays after re-render / MP3 doesn't match text**
Root cause: audio files were named by index (cue_0.mp3, cue_1.mp3). When a cue
was inserted or deleted, all following indices shifted but old MP3 files kept
their original names, so re-render would play the wrong audio for the wrong cue.
Fix: renamed files to cue_N_CONTENTHASH.mp3 and introduced an ad_cue_manifest
stored in the job document that maps each cue index to its correct GCS URI.
Re-render now reads from the manifest instead of guessing by filename.
Also: editing AD cue text in the VTT editor now automatically queues TTS
regeneration for changed cues — no more silent mismatches.

**BUG-3 — App crash / state desync when uploading VTT or clearing TTS queue**
Fixed handleVttFileUpload to only update local editor state after the server
confirms the save — previously local state was updated first, so a network
error left the UI showing content that wasn't actually saved.
Fixed handleClearRegenerationQueue to only remove items from local state if
the server removal succeeded — previously all items were cleared regardless.

**BUG-4 — AI generates different audio descriptions every time**
Added GenerateContentConfig(temperature=0.2, top_p=0.8, top_k=40) to the
Gemini API call so output is more consistent across runs.

**BUG-5 — On-screen text inconsistently described**
Strengthened the AI prompt rule from a vague suggestion to a mandatory
requirement with an explicit format: "Text on screen reads: [exact text]".
Applied to both gemini_ingestion.md and gemini_ingestion_targeted.md.

**BUG-6 — No notification when re-render finishes**
Added rendering_qc toast notification and a dismissible green banner that
appears in QCDetail when re-render transitions to pending_qc. The banner
auto-dismisses after 10 seconds. Also increased WebSocket reconnect attempts
from 5 to 15 and capped backoff at 60s to prevent falling back to manual refresh.

**BUG-7 — Timeline preview looks accurate but isn't after edits**
Added isStale prop to TimelinePreview. The timeline now shows an amber tint
and "Preview may be outdated" label whenever there are unsaved pause point
changes, pending TTS regenerations, or a new VTT has been uploaded.

**BUG-8 — ElevenLabs API errors break TTS with no fallback**
Added try/except fallback chain in _synthesize_single_cue: if the configured
provider fails, it automatically retries with google, then gemini.

**BUG-9 — Concurrent re-render requests cause race conditions**
Made the PENDING_QC → RENDERING_QC status transition conditional (only
succeeds if the job is still in PENDING_QC). Returns HTTP 409 if a re-render
is already in progress. The completion transition back to PENDING_QC is also
conditional so a cancelled/overridden render doesn't corrupt job state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 13:23:55 +00:00
Vadym Samoilenko
c413fcb747 feat: add SDH (Subtitles for Deaf and Hard of Hearing) caption output
SDH captions extend standard VTT with speaker identification labels,
sound effects [PHONE RINGS], music notation ♪, and off-screen indicators.

- Add sdh_vtt flag to RequestedOutputs model and frontend form
- Add sdh_captions_vtt_gcs field to LangOutput model
- Inject SDH generation instructions into both Gemini prompts via
  {SDH_FIELD} and {SDH_GUIDELINES} placeholders when requested
- Upload sdh_captions.vtt to GCS in ingest task
- Pass SDH through video_native translation (Gemini generates it directly)
  and traditional translation (translate source SDH VTT via Gemini)
- Expose sdh_captions_vtt in downloads endpoint and bulk zip export

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 15:02:18 +00:00
Vadym Samoilenko
2e8a8dc287 feat: add brand context, ethics guidelines, and improved AD prompt rules
- Add brand_context field (job model, API, frontend form) so clients can
  list brand names present in their video; Gemini uses these names instead
  of generic descriptors (e.g. "Sellotape" not "sticky tape")
- Add ethical guidelines section to both Gemini prompts covering
  person-first language, consistent race/gender description only when
  plot-relevant, no guessing at unconfirmed identity
- Revamp audio description rules: priority ordering (essential →
  high-priority → time-permitting), pre-teaching placement, no cinematic
  jargon, succinct style replacing the former "20% longer" instruction
- Thread brand_context through full stack: routes → job doc → ingest
  task → translate task → both Gemini prompt templates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 14:46:09 +00:00
Vadym Samoilenko
222826baa7 fix: propagate ElevenLabs voice fetch errors to frontend
- elevenlabs_voices.py: re-raise exception on first fetch failure
  (empty cache) instead of silently returning empty list
- routes_tts.py: catch get_voices() exception and return available=False
  with the error detail; add optional error field to ProviderVoicesResponse
- VoiceSelector: show actual API error message when available=false

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 14:27:45 +00:00
Vadym Samoilenko
1e177a6d5c feat: add ElevenLabs voice selection to frontend and backend
Add dynamic ElevenLabs voice catalog with provider toggle in the UI,
allowing users to browse ElevenLabs voices, configure stability and
similarity boost settings, and preview/synthesize with ElevenLabs TTS.

Backend:
- New elevenlabs_voices.py service with 1-hour cached API fetching
- TTS routes support ?provider= query param for voices and options
- Preview endpoint routes to ElevenLabs or Gemini based on provider
- stability/similarity_boost params flow through TTS synthesis pipeline
- TTSPreferences model extended with ElevenLabs-specific fields
- Deprecated hardcoded elevenlabs_voices config (now fetched dynamically)

Frontend:
- Provider toggle (Gemini/ElevenLabs) in VoiceSelector
- ElevenLabsSettingsPanel with stability and similarity boost sliders
- VoicePreviewButton supports provider-specific preview parameters
- API client passes provider param to voices, options, and preview endpoints
- New VoiceInfo, ProviderVoicesResponse, ProviderOptionsResponse types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:58:56 +00:00
michael
030f1b67ee fix: enforce AD cue pause_point monotonicity to preserve cue order
Whisper's snap_pause_point() finds the nearest sentence boundary
independently per cue, which can move a later cue's pause_point before
an earlier cue's. The renderer then sorts by pause_point, producing
non-sequential cue indices in the timeline.

Add a forward monotonicity pass (clamp each pause_point >= previous) at
three layers for defense-in-depth:
- whisper_service: Phase 3 after consolidation
- video_renderer: before temporal sort in _render_pause_insert_method
- rerender_accessible_video: in _build_placements_with_adjustments

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 08:15:06 -06:00
michael
30483b3ec1 fix: preserve cue order when consolidated AD cues share same pause point
Add ad_cue_index as secondary sort key when sorting placements, ensuring
that consolidated cues maintain their original VTT order (cue 0 before cue 1).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 10:15:12 -06:00
michael
a6cd4cde07 fix: store source video coordinates in pause points for correct re-rendering
The re-render task was using pause point coordinates from the accessible
video timeline (which includes freeze frame durations) instead of the
original source video coordinates. This caused pause points to exceed
the source video duration and get clamped incorrectly.

Changes:
- Add source_ms field to PausePointData model to store source video cut point
- Update video_renderer.py to populate source_ms when building pause points
- Update rerender_accessible_video.py to use source_ms for placement calculations
- Apply user adjustments as relative offsets (delta-based adjustment)
- Update API responses and TypeScript types to include source_ms
- Add backward compatibility fallback for jobs without source_ms

Note: Existing jobs need to be re-processed from initial render to populate
the new source_ms field.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 10:48:41 -06:00
michael
d965d1467a fix: use rendered video coordinates for pause point positions
Pause points were being stored with source video timestamps instead of
rendered video timeline coordinates. This caused misalignment between
the pause point markers and freeze frame segments in the timeline UI.

Now pause points are calculated from the freeze frame segment start
positions in the rendered timeline, ensuring they align correctly
with the AD audio segments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 09:37:31 -06:00
michael
aa6777d2c2 feat: add QC accessible video review and editing capabilities
- Reorder workflow: translations now happen BEFORE QC Review step
- Add language tabs to switch between translated languages in QC
- Add video mode tabs (Original Video / Accessible Video)
- Add interactive timeline preview showing video segments and AD cues
- Enable pause point adjustment with millisecond precision
- Add TTS regeneration queue for selective cue re-synthesis
- Add re-render controls with optional Whisper refinement
- Persist video segments and TTS MP3s to GCS for editability
- Add new RENDERING_QC job status for re-render operations
- Create 5 new API endpoints for accessible video editing
- Add rerender_accessible_video.py Celery task

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 08:32:27 -06:00
michael
c5f59b1079 fix: use local ffprobe for freeze segment duration measurement
The previous implementation incorrectly used _get_video_duration which
in Cloud Run mode uses the cached source video URI instead of actually
measuring the freeze segment files. This caused all freeze segments to
report the source video duration (~78s) instead of their actual duration.

Changed to use _get_video_duration_local directly since freeze segments
are local files and need to be measured directly via ffprobe.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 16:11:03 -06:00
michael
add958008a fix: use actual freeze segment durations for VTT subtitle retiming
Subtitles were appearing progressively out of sync (~1.0s early per AD)
because the VTT retimer calculated freeze durations theoretically
rather than using actual rendered segment durations.

Changes:
- video_renderer: Measure actual freeze segment duration after creation
- video_renderer: Return updated placements with actual_freeze_duration
- vtt_retimer: Prefer actual_freeze_duration over calculated values
- render_task: Pass actual durations to VTT retimer

This ensures subtitle timing matches the real video timeline regardless
of any FFmpeg encoding variations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 15:52:57 -06:00
michael
e44210ea64 feat: auto-rewrite TTS cues that fail synthesis
When TTS synthesis fails after 3 retries, the system now:
- Sends problematic cue text to Gemini for TTS-safe rewriting
- Updates the VTT file in GCS with rewritten text
- Retries TTS synthesis with the new text
- Records successful rewrites in job.tts_rewrites field

UI changes:
- JobDetail shows amber caution box with original/rewritten text
- JobsList shows warning icon next to jobs with rewrites
- Error display clarifies text shown is "after rewrite attempt"

Files changed:
- backend/app/models/job.py: Add tts_rewrites field
- backend/app/prompts/gemini_tts_rewrite.md: New prompt template
- backend/app/services/gemini.py: Add rewrite_tts_cue method
- backend/app/tasks/tts_synthesis.py: Add VTT update utilities
- backend/app/tasks/translate_and_synthesize.py: Rewrite+retry logic
- frontend/src/types/api.ts: Add TTSRewriteItem type
- frontend/src/routes/jobs/JobDetail.tsx: Caution display
- frontend/src/routes/jobs/JobsList.tsx: Warning indicator

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 14:42:50 -06:00
michael
83e4752327 feat: add server-side zip download for bulk job downloads
Replace sequential browser-based bulk download with server-side zip
generation. When users select "Download All Files" from bulk actions,
the system now creates a single organized .zip file containing all
job assets.

Changes:
- Add POST /jobs/bulk/download endpoint that streams zip to client
- Add BulkDownloadRequest schema for the new endpoint
- Create zip_download.py service with streaming zip generation
- Update frontend to call new endpoint and download single zip file
- Organize files in zip by job title and language subdirectories

Zip structure:
accessible_video_YYYYMMDD_HHMMSS.zip
└── {job_title}/
    ├── source.mp4
    └── {lang}/
        ├── captions.vtt
        ├── ad.vtt
        ├── ad.mp3
        ├── accessible_video.mp4
        └── accessible_captions.vtt

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 15:57:57 -06:00
michael
c512bdc184 feat: use AD VTT pause points instead of Gemini video analysis
Optimize the accessible video workflow by eliminating the dedicated
Gemini video analysis call for pause point estimation. Instead:

- Use AD VTT cue start times as initial pause points for Whisper refinement
- Add user-selectable accessible video method (pause_insert/overlay) at QC approval
- Add bulk approval API endpoint with method selection
- Add method selector UI to QCDetail page
- Add bulk approval modal to QCList for jobs with accessible video

Benefits:
- Eliminates expensive Gemini API call with video upload
- Faster workflow (~5-15 seconds saved per job)
- Cost savings on Gemini video analysis
- User control over accessible video integration method

Backend changes:
- Add accessible_video_method to RequestedOutputs and ApproveSourceRequest
- Add POST /jobs/bulk/approve endpoint
- Replace Gemini call with _build_placements_from_ad_vtt() helper
- Mark analyze_accessible_video_placement() as deprecated

Frontend changes:
- Add method selector radio buttons to QCDetail
- Add bulk approval modal with method selection to QCList
- Update API client and React Query hooks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 19:05:45 -06:00
michael
5342ab1a28 fix: prevent event loop closed error in video renderer Cloud Run calls
Use context manager for AsyncClient instead of caching on singleton.
Each asyncio.run() creates a new event loop, so cached clients bound
to previous event loops fail when reused across jobs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 08:44:27 -06:00
michael
3e2099515a fix: use async httpx client for true parallel Cloud Run calls
Changed from httpx.Client (sync) to httpx.AsyncClient so that
asyncio.gather() actually executes HTTP calls in parallel instead
of blocking the event loop sequentially.

Before: ~5 min for 18 segments (serial HTTP calls despite gather)
After: ~30 sec for 18 segments (truly parallel HTTP calls)

Changes:
- _http_client: httpx.Client -> httpx.AsyncClient
- _call_cloud_run_probe: sync -> async
- _call_cloud_run_endpoint: sync -> async
- Added await to all Cloud Run HTTP calls

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 08:11:46 -06:00
michael
e2302d497d perf: parallelize FFmpeg Cloud Run calls using asyncio.gather()
Refactored _render_pause_insert to execute FFmpeg operations in parallel
phases instead of sequentially:

Phase 1: Parallel extraction
- Generate shared silence (once, reused by all)
- Extract ALL video segments simultaneously
- Extract ALL freeze frames simultaneously
- Extract final segment

Phase 2: Parallel audio concatenation
- Concatenate ALL audio tracks (silence + AD + silence) simultaneously

Phase 3: Parallel freeze segment creation
- Create ALL freeze segments simultaneously

Phase 4: Assemble segments in correct order for final concatenation

This reduces render time from ~3 minutes (serial) to ~30 seconds (parallel)
for an 8-cue video when using Cloud Run FFmpeg service.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 17:18:23 -06:00
michael
87a4b1ab77 fix: use command_template instead of ffmpeg_args in _generate_silence_cloud_run
The /run-ffmpeg Cloud Run endpoint expects command_template field with
ffmpeg command placeholders, not ffmpeg_args. This fixes 422 validation
errors when generating silence audio via Cloud Run.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 16:57:17 -06:00
michael
e68bac2f60 fix: correct FFmpeg probe request parameter name
The /probe endpoint expects 'gcs_uri' but we were sending 'source_gcs_uri'.
Fixed to match the ProbeRequest model in ffmpeg_http_service.py.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 16:43:01 -06:00
michael
7d2366d0f4 fix: add authentication for Cloud Run service calls
Cloud Run services are deployed with --no-allow-unauthenticated,
requiring an ID token in the Authorization header.

- Add _get_cloud_run_id_token() helper using google-auth library
- Update whisper_transcribe.py to include Bearer token in Cloud Run calls
- Update video_renderer.py to include Bearer token in FFmpeg Cloud Run calls

The ID token is fetched using the service account credentials
(GOOGLE_APPLICATION_CREDENTIALS) and targets the Cloud Run service URL.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 11:41:07 -06:00
michael
79440929f4 feat: add Cloud Run HTTP services for Whisper and FFmpeg
Migrate CPU-intensive workloads to Cloud Run for autoscaling:

- Add Whisper HTTP service (FastAPI) with /transcribe endpoint
- Add FFmpeg HTTP service (FastAPI) with /encode, /probe, /extract-frame, etc.
- Add Dockerfiles for both services (8 vCPU, 32GB RAM, Gen2)
- Add Cloud Build config for CI/CD deployment
- Add Cloud Run service YAML configs with scale-to-zero
- Update whisper_transcribe.py to call Cloud Run when WHISPER_SERVICE_URL set
- Update video_renderer.py to call Cloud Run when FFMPEG_SERVICE_URL set
- Update whisper_service.py for Cloud Run compatibility (no settings dependency)
- Add ffmpeg_service_url and whisper_service_url to config.py

Services scale 0→N based on request load, falling back to local
execution when service URLs are not configured (hybrid mode).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 10:12:50 -06:00
michael
d2d8e32819 feat: add video-native translation mode for multi-language content
Add a new "Video Native Mode" translation option that re-processes the
video through Gemini for each target language, generating captions and
audio descriptions directly from visual context. This produces more
natural and culturally appropriate content compared to traditional VTT
text translation.

Changes:
- Add translation_mode field to RequestedOutputs (video_native | traditional)
- Create gemini_ingestion_targeted.md prompt for target language generation
- Add extract_accessibility_targeted() method to Gemini service
- Modify translate_and_synthesize task to handle both translation modes
- Add Translation Mode UI selector in NewJob screen (video_native is default)
- Remove transcreation UI (replaced by video_native mode)
- Remove Google Translate service (replaced by Gemini translation)
- Add LanguageSelector component with searchable dropdown

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:50:05 -06:00
michael
e8b940aee8 feat: add TTS_FAILED status and robust error handling for TTS synthesis
Add comprehensive error handling for TTS synthesis failures:

Backend:
- Add TTS_FAILED status to JobStatus enum for failed synthesis jobs
- Add TTSSynthesisError exception with cue index and context tracking
- Improve null-safe error handling in Gemini TTS response parsing
- Add _synthesize_cue_with_retry() with exponential backoff (3 attempts)
- Enhanced error logging with text preview and model context

Frontend:
- Add TTS_FAILED status styling (red badge) in StatusBadge component
- Add tts_failed to JobStatus TypeScript type

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 14:26:07 -06:00
michael
b11c3d0d4f fix: rewrite VTT retiming algorithm to prevent captions during AD segments
The VTT retimer had two bugs causing subtitles to display during freeze
periods and become out of sync:

1. Same offset applied to both start and end times (should differ when
   pause falls between them)
2. Cues spanning pause points weren't split (causing captions during freeze)

Changes:
- Add _offset_at() for timestamps AT or AFTER pause points
- Add _offset_before() for timestamps STRICTLY BEFORE pause points
- Add _retime_cue() to split cues at pause points into multiple segments
- Add _filter_short_segments() to remove <100ms segments after splitting
- Rewrite retime_for_pause_insert() to use new helper methods

Example fix for cue 8s-12s with pause at 10s (4s freeze):
- Before: 8s-12s (displayed during freeze!)
- After: 8s-10s + 14s-16s (gap during AD)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 09:01:03 -06:00
michael
37593dd4bc refactor: simplify pause point algorithm with midpoint snapping and silence buffers
Replace complex overlap/catch-up logic with simpler approach:
- Snap pause points to midpoint between sentences (not sentence boundaries)
- Add 500ms silence before AND after AD audio during freeze frame
- Resume playback from same midpoint (no overlap, no visual jump-back)

This eliminates audio/visual anomalies caused by the previous algorithm's
complexity around sentence boundary snapping and audio catch-up.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 09:55:40 -06:00
michael
37f5e8d1b0 fix: validate pause points and frame extraction
- Get video duration BEFORE the render loop (not after)
- Clamp pause_point to 100ms before video end if it exceeds duration
- Add validation in _extract_frame() to verify frame was created
- Add debug logging for frame extraction timestamps

This prevents "Frame file not found" errors when pause points
calculated by Whisper refinement exceed the source video duration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 09:32:13 -06:00
michael
40ece78652 feat: implement audio catch-up to eliminate visual jump-back artifact
When a pause point falls between two sentences, the previous algorithm
created a visual jump-back where the video rewound to resume_from after
the AD played. This was distracting to viewers.

New behavior:
- Video plays normally to pause_point
- Freeze frame shows + AD audio plays
- Freeze frame CONTINUES while source audio from [resume_from, pause_point]
  plays (the "catch-up" audio)
- Video resumes smoothly from pause_point (no visual jump)

The audio from the overlap region plays twice (once during video, once
during freeze extension) but this is acceptable as it's typically <1s
and provides natural audio context around the AD.

Implementation:
- Add _extract_audio_segment() to extract catch-up audio from source
- Add _concatenate_audio() to join AD + catch-up audio
- Modify render loop to create extended freeze segments with combined audio
- Resume video from pause_point instead of resume_from

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 09:02:46 -06:00
michael
ce7a1b182f fix: improve FFmpeg error reporting and add input validation
- Show last 1500 chars of stderr instead of first 500 to capture actual
  error messages (FFmpeg writes version banner first, errors at end)
- Add validation for freeze segment creation:
  - Check duration > 0
  - Verify frame and audio files exist
  - Add debug logging for parameters

This helps diagnose FFmpeg failures that were previously showing only
version/configuration info instead of the actual error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 08:41:32 -06:00
michael
3588d3fa14 refactor: rewrite pause point refinement algorithm with ordered logic
Completely rewrites the Whisper-based pause point refinement to use
a two-phase approach with explicit ordering:

Phase 1 - Individual refinement:
1. Check if pause point is "during speaking" (words within ±2s)
   - If NOT during speaking → use Gemini's exact point, no overlap
2. If during speaking, find nearest sentence boundary
3. Apply appropriate buffering based on context:
   - Case A: First sentence → pause 500ms before sentence starts
   - Case B: Last sentence → pause 500ms after sentence ends
   - Case C: Between sentences → full double buffer (overlap)

Phase 2 - Consolidation (after all refinements):
- Consolidate cues within 5s of each other to play back-to-back

Key changes:
- Add SentenceBoundary dataclass for tracking boundaries with context
- Add _is_during_speaking() helper to detect speech proximity
- Add _find_sentence_boundaries() with longest-gap fallback
- Rewrite snap_pause_point() with new ordered algorithm
- Update refine_all_pause_points() to pass words and use two phases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 08:19:03 -06:00
michael
d092800676 fix: treat consolidated AD cues as single segment for buffering
Previously, all consolidated cues shared the same pause_point AND
resume_from, which caused the overlap video segment to play between
each AD cue in a consolidated group.

Now consolidated cues are treated as a single AD segment:
- All cues in a group share the same pause_point (front buffer once)
- Only the LAST cue keeps resume_from (back buffer once)
- Other cues have resume_from = pause_point (no video between ADs)

This ensures consolidated ADs play seamlessly back-to-back:
- Video plays up to pause_point (front buffer)
- AD_1 plays
- AD_2 plays immediately (no video)
- AD_n plays immediately (no video)
- Video resumes from resume_from (back buffer)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 23:33:15 -06:00
michael
a3b4db104a feat: use Gemini's exact pause point for non-dialogue sections
When the Whisper analysis detects no speech near a Gemini-recommended
pause point, skip the full-gap-overlap algorithm and use the exact
pause point with no overlap (pause_point == resume_from).

This handles cases where Gemini chose a pause point in a silent or
music-only section of the video - there's no dialogue to buffer
around, so we simply pause and resume at the exact same point.

Three outcomes now in snap_pause_point():
1. No speech nearby → exact pause point, no overlap, no warning
2. Speech but no sentence break → warning (existing behavior)
3. Sentence break found → full-gap-overlap (existing behavior)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 23:13:12 -06:00
michael
12cae0919a feat: implement full-gap-overlap algorithm for AD pause insertion
Changes pause point calculation to use the entire gap between sentences
as a buffer on BOTH sides of the audio description:

- pause_point: Just BEFORE next sentence starts (gap.end - 50ms)
- resume_from: Just AFTER previous sentence ends (gap.start + 50ms)

This means a small portion of video plays twice (the gap duration), but
creates a much more natural listening experience by maximizing the
breathing room around audio descriptions.

Changes:
- whisper_service.py: snap_pause_point() now returns (pause_point, resume_from)
- video_renderer.py: Uses resume_from for current_time after freeze segment
- vtt_retimer.py: Calculates effective_offset including overlap duration
- accessible_video.py: Added resume_from field to ADPlacementCue schema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 22:51:49 -06:00
michael
dd7ac2e15c debug: add logging for pause-insert video rendering
Logs pause point placements, segment creation, and final segment
calculation to help diagnose the 30s black footage issue.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 22:39:33 -06:00
michael
504e525a1f feat: dynamic pause point buffer based on gap duration
Instead of a fixed 175ms buffer, the pause point is now placed
halfway between the end of the sentence and the start of the
next word. If the half-gap exceeds 2 seconds, uses 500ms instead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 22:33:07 -06:00
michael
01c96da95c feat: use all available CPU cores for Whisper transcription
Dynamically detects CPU count with os.cpu_count() instead of
hardcoded 4 threads. Falls back to 4 if detection fails.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:59:31 -06:00
michael
dc78dc6fb5 feat: add detailed logging for Whisper model and processing time
- Log model name explicitly when loading and transcribing
- Log model load time
- Log transcription processing time
- Helps verify correct model is being used

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 21:58:41 -06:00
michael
582f9f066e feat: expand Whisper search window to ±30s for sentence boundaries
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 17:34:30 -06:00
michael
c605cd1a88 feat: consolidate AD cues with pause points within 5s of each other
If consecutive AD cues have pause points within 5 seconds, they now
play back-to-back at the same pause point. This prevents AD from being
inserted mid-sentence when cues are close together.

Adds _consolidate_close_cues() method and consolidation_threshold
parameter to refine_all_pause_points().

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 16:15:52 -06:00
michael
0647c9c112 feat: expand Whisper search window to ±20s for sentence boundaries
Increases the search window from ±10s to ±20s to maximize the chance
of finding a proper sentence ending and avoid falling back to Gemini's
potentially imprecise pause points.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 15:51:42 -06:00
michael
407cc662e8 fix: insert first AD cue at video start if no sentence break found
When the first AD cue (index 0) cannot find a sentence boundary within
the ±10s search window, insert the AD at T=0:00 instead of using the
potentially mid-sentence Gemini pause point.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 15:27:04 -06:00
michael
8806289eca feat: improve pause point precision with sentence boundary detection
- Update Gemini prompt to require transcription with precise timestamps
- Add sentence_boundaries output field for validation
- Add pause_point_rationale field to explain each pause point choice
- Emphasize terminal punctuation only (., ?, !) - never commas
- Expand Whisper search window from ±5s to ±10s
- Increase post-pause buffer from 50ms to 175ms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 14:41:12 -06:00