video-accessibility

Author	SHA1	Message	Date
Vadym Samoilenko	6f963ff7c4	feat: DCMP compliance, descriptive transcript, new languages, QA bug fixes - Rewrote VTT translation to two-step (text-only → Gemini → apply to original timestamps) preventing caption timing desync - Added polling fallback for all processing states and Safari visibilitychange WebSocket reconnect - Added 11 new TTS languages (cs, da, fi, hu, no, sk, sv, es-419, pt-BR, fr-CA) - Updated caption/AD prompts to DCMP Captioning Key & Description Key standards (line splitting, ♪ music notation, italic tags, caption positioning, ethics guidelines) - Added descriptive transcript generation (WCAG 2.1 §1.2.1) combining captions + AD into plain text - Fixed amix normalize=0 to prevent audio loss in rendered videos - Fixed AD re-timing double-count when source_ms is None - Fixed cue block numbering to be 1-based in VttEditor and Timeline Preview Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 11:50:43 +00:00
Vadym Samoilenko	f4ddcce066	fix: resolve QA-reported bugs — MP3/VTT desync, crashes, notifications, and more BUG-1 & BUG-2 — Wrong audio plays after re-render / MP3 doesn't match text Root cause: audio files were named by index (cue_0.mp3, cue_1.mp3). When a cue was inserted or deleted, all following indices shifted but old MP3 files kept their original names, so re-render would play the wrong audio for the wrong cue. Fix: renamed files to cue_N_CONTENTHASH.mp3 and introduced an ad_cue_manifest stored in the job document that maps each cue index to its correct GCS URI. Re-render now reads from the manifest instead of guessing by filename. Also: editing AD cue text in the VTT editor now automatically queues TTS regeneration for changed cues — no more silent mismatches. BUG-3 — App crash / state desync when uploading VTT or clearing TTS queue Fixed handleVttFileUpload to only update local editor state after the server confirms the save — previously local state was updated first, so a network error left the UI showing content that wasn't actually saved. Fixed handleClearRegenerationQueue to only remove items from local state if the server removal succeeded — previously all items were cleared regardless. BUG-4 — AI generates different audio descriptions every time Added GenerateContentConfig(temperature=0.2, top_p=0.8, top_k=40) to the Gemini API call so output is more consistent across runs. BUG-5 — On-screen text inconsistently described Strengthened the AI prompt rule from a vague suggestion to a mandatory requirement with an explicit format: "Text on screen reads: [exact text]". Applied to both gemini_ingestion.md and gemini_ingestion_targeted.md. BUG-6 — No notification when re-render finishes Added rendering_qc toast notification and a dismissible green banner that appears in QCDetail when re-render transitions to pending_qc. The banner auto-dismisses after 10 seconds. Also increased WebSocket reconnect attempts from 5 to 15 and capped backoff at 60s to prevent falling back to manual refresh. BUG-7 — Timeline preview looks accurate but isn't after edits Added isStale prop to TimelinePreview. The timeline now shows an amber tint and "Preview may be outdated" label whenever there are unsaved pause point changes, pending TTS regenerations, or a new VTT has been uploaded. BUG-8 — ElevenLabs API errors break TTS with no fallback Added try/except fallback chain in _synthesize_single_cue: if the configured provider fails, it automatically retries with google, then gemini. BUG-9 — Concurrent re-render requests cause race conditions Made the PENDING_QC → RENDERING_QC status transition conditional (only succeeds if the job is still in PENDING_QC). Returns HTTP 409 if a re-render is already in progress. The completion transition back to PENDING_QC is also conditional so a cancelled/overridden render doesn't corrupt job state. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-24 13:23:55 +00:00
Vadym Samoilenko	c413fcb747	feat: add SDH (Subtitles for Deaf and Hard of Hearing) caption output SDH captions extend standard VTT with speaker identification labels, sound effects [PHONE RINGS], music notation ♪, and off-screen indicators. - Add sdh_vtt flag to RequestedOutputs model and frontend form - Add sdh_captions_vtt_gcs field to LangOutput model - Inject SDH generation instructions into both Gemini prompts via {SDH_FIELD} and {SDH_GUIDELINES} placeholders when requested - Upload sdh_captions.vtt to GCS in ingest task - Pass SDH through video_native translation (Gemini generates it directly) and traditional translation (translate source SDH VTT via Gemini) - Expose sdh_captions_vtt in downloads endpoint and bulk zip export Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-18 15:02:18 +00:00
Vadym Samoilenko	2e8a8dc287	feat: add brand context, ethics guidelines, and improved AD prompt rules - Add brand_context field (job model, API, frontend form) so clients can list brand names present in their video; Gemini uses these names instead of generic descriptors (e.g. "Sellotape" not "sticky tape") - Add ethical guidelines section to both Gemini prompts covering person-first language, consistent race/gender description only when plot-relevant, no guessing at unconfirmed identity - Revamp audio description rules: priority ordering (essential → high-priority → time-permitting), pre-teaching placement, no cinematic jargon, succinct style replacing the former "20% longer" instruction - Thread brand_context through full stack: routes → job doc → ingest task → translate task → both Gemini prompt templates Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-18 14:46:09 +00:00
michael	e44210ea64	feat: auto-rewrite TTS cues that fail synthesis When TTS synthesis fails after 3 retries, the system now: - Sends problematic cue text to Gemini for TTS-safe rewriting - Updates the VTT file in GCS with rewritten text - Retries TTS synthesis with the new text - Records successful rewrites in job.tts_rewrites field UI changes: - JobDetail shows amber caution box with original/rewritten text - JobsList shows warning icon next to jobs with rewrites - Error display clarifies text shown is "after rewrite attempt" Files changed: - backend/app/models/job.py: Add tts_rewrites field - backend/app/prompts/gemini_tts_rewrite.md: New prompt template - backend/app/services/gemini.py: Add rewrite_tts_cue method - backend/app/tasks/tts_synthesis.py: Add VTT update utilities - backend/app/tasks/translate_and_synthesize.py: Rewrite+retry logic - frontend/src/types/api.ts: Add TTSRewriteItem type - frontend/src/routes/jobs/JobDetail.tsx: Caution display - frontend/src/routes/jobs/JobsList.tsx: Warning indicator 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 14:42:50 -06:00
michael	d2d8e32819	feat: add video-native translation mode for multi-language content Add a new "Video Native Mode" translation option that re-processes the video through Gemini for each target language, generating captions and audio descriptions directly from visual context. This produces more natural and culturally appropriate content compared to traditional VTT text translation. Changes: - Add translation_mode field to RequestedOutputs (video_native \| traditional) - Create gemini_ingestion_targeted.md prompt for target language generation - Add extract_accessibility_targeted() method to Gemini service - Modify translate_and_synthesize task to handle both translation modes - Add Translation Mode UI selector in NewJob screen (video_native is default) - Remove transcreation UI (replaced by video_native mode) - Remove Google Translate service (replaced by Gemini translation) - Add LanguageSelector component with searchable dropdown 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-31 13:50:05 -06:00
michael	8806289eca	feat: improve pause point precision with sentence boundary detection - Update Gemini prompt to require transcription with precise timestamps - Add sentence_boundaries output field for validation - Add pause_point_rationale field to explain each pause point choice - Emphasize terminal punctuation only (., ?, !) - never commas - Expand Whisper search window from ±5s to ±10s - Increase post-pause buffer from 50ms to 175ms 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 14:41:12 -06:00
michael	80d3866d32	feat: add accessible video (MP4 with embedded audio descriptions) Add new deliverable type that renders video with audio descriptions embedded. Supports two AI-determined methods: - Direct Overlay: ducks original audio and overlays AD TTS (for minimal dialogue) - Pause-Insert: freeze-frame video, insert AD, re-time subtitles (for significant dialogue) Backend: - Add Pydantic schemas for Gemini analysis response - Add Gemini prompt and analyze_accessible_video_placement() method - Add video_renderer.py service using FFmpeg for both rendering methods - Add vtt_retimer.py service for pause-insert subtitle adjustment - Add render_accessible_video.py Celery task - Modify TTS service to return individual per-cue segments - Update translate_and_synthesize.py to save segments and trigger rendering - Update download endpoint to include accessible video outputs Frontend: - Add accessible_video_mp4 checkbox to NewJob form - Update TypeScript types for new deliverable 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 11:06:41 -06:00
michael	dad7ea09df	fix: generate audio descriptions in the video's detected language Updated Gemini ingestion prompt to explicitly require: - Detect the spoken language first - Write ALL outputs (summary, transcript, captions, audio_description) in that language - Do NOT translate to English - keep everything in the original language This fixes the issue where German videos would get English audio descriptions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 19:01:14 -06:00
michael	af2562096a	initial commit	2025-08-24 16:28:33 -05:00

10 commits