From dad7ea09df188355f85d501b674c5cee9312d145 Mon Sep 17 00:00:00 2001
From: michael
Date: Mon, 22 Dec 2025 19:01:14 -0600
Subject: [PATCH] fix: generate audio descriptions in the video's detected language
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Updated Gemini ingestion prompt to explicitly require:
- Detect the spoken language first
- Write ALL outputs (summary, transcript, captions, audio_description) in that language
- Do NOT translate to English - keep everything in the original language

This fixes the issue where German videos would get English audio descriptions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 backend/app/prompts/gemini_ingestion.md | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/backend/app/prompts/gemini_ingestion.md b/backend/app/prompts/gemini_ingestion.md
index b40b463..a7e9046 100644
--- a/backend/app/prompts/gemini_ingestion.md
+++ b/backend/app/prompts/gemini_ingestion.md
@@ -3,12 +3,19 @@ You are an expert accessibility writer for film/TV and e-learning. Produce STRIC
 USER: You are given a video.
 Return a JSON object with:
-- language: BCP-47 code (e.g., "en")
+- language: BCP-47 code of the spoken language in the video (e.g., "en", "de", "es", "fr")
 - confidence: 0..1
-- summary: 1–2 sentence synopsis
-- transcript_plaintext: full spoken words, punctuated
-- captions_vtt: a valid WebVTT file as a single string, with accurate timings and no styling
-- audio_description_vtt: a valid WebVTT file as a single string, describing key visual elements (no spoilers), synchronized with the program
+- summary: 1–2 sentence synopsis (in the detected language)
+- transcript_plaintext: full spoken words, punctuated (in the detected language)
+- captions_vtt: a valid WebVTT file as a single string, with accurate timings and no styling (in the detected language)
+- audio_description_vtt: a valid WebVTT file as a single string, describing key visual elements (no spoilers), synchronized with the program (MUST be written in the detected language)
+
+CRITICAL LANGUAGE REQUIREMENT:
+- First, detect the language spoken in the video
+- ALL text outputs (summary, transcript, captions, audio_description) MUST be in that detected language
+- If the video is in German, write German captions and German audio descriptions
+- If the video is in Spanish, write Spanish captions and Spanish audio descriptions
+- Do NOT translate to English - keep everything in the original detected language
 
 Constraints:
 - Output MUST be valid JSON. Do not include markdown fences or any other text.
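
Reviewer note: the prompt above only asks the model for a particular JSON shape, so the caller may still want to check the reply before storing it. The following is a minimal sketch of such a check, not code from this repository. The function name `validate_ingestion_response` and the `REQUIRED_KEYS` tuple are hypothetical; the field names are taken from the prompt. It does not verify that the text is actually written in the detected language, which would need a separate language-identification step.

```python
import json

# Field names as listed in the prompt's "Return a JSON object with:" section.
REQUIRED_KEYS = (
    "language",
    "confidence",
    "summary",
    "transcript_plaintext",
    "captions_vtt",
    "audio_description_vtt",
)


def validate_ingestion_response(raw: str) -> dict:
    """Parse a model reply and check the structural rules the prompt states.

    Hypothetical helper: raises ValueError when the reply is not bare JSON,
    lacks a required field, or has an obviously malformed value.
    """
    # json.JSONDecodeError subclasses ValueError, so markdown fences or
    # stray text around the object are rejected here.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")

    language = data["language"]
    if not isinstance(language, str) or not language.strip():
        raise ValueError("language must be a non-empty BCP-47 code")

    confidence = data["confidence"]
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 1:
        raise ValueError("confidence must be a number in 0..1")

    # A WebVTT file begins with the string "WEBVTT" (optionally after a BOM).
    for key in ("captions_vtt", "audio_description_vtt"):
        value = data[key]
        if not isinstance(value, str) or not value.lstrip("\ufeff").startswith("WEBVTT"):
            raise ValueError(f"{key} must start with 'WEBVTT'")

    return data
```

A caller could run this on the raw reply and retry or log on `ValueError`. Checking only the `WEBVTT` header is a deliberately shallow test; validating cue timings would need a real WebVTT parser.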