From dad7ea09df188355f85d501b674c5cee9312d145 Mon Sep 17 00:00:00 2001
From: michael <michael@modernfreedom.com>
Date: Mon, 22 Dec 2025 19:01:14 -0600
Subject: [PATCH] fix: generate audio descriptions in the video's detected
 language
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Update the Gemini ingestion prompt to give the model explicit language instructions:
- Detect the spoken language first
- Write ALL outputs (summary, transcript, captions, audio_description) in that language
- Do NOT translate to English; keep everything in the original language

This fixes an issue where German videos received English audio descriptions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 backend/app/prompts/gemini_ingestion.md | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
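
Reviewer note (not part of the commit): a prompt change like this one is hard to verify by reading the diff alone, so a small response check may help. The sketch below is not code from this repository. It assumes a Python backend (a guess based on the `backend/app` path), and `check_ingestion_response` and its `detect_language` argument are hypothetical names. The field names come from the prompt text in this patch. The language check only runs when the caller supplies a detector function.

```python
import json
from typing import Callable, Optional

# Field names taken from the prompt in this patch.
REQUIRED_FIELDS = (
    "language", "confidence", "summary",
    "transcript_plaintext", "captions_vtt", "audio_description_vtt",
)
TEXT_FIELDS = REQUIRED_FIELDS[2:]
VTT_FIELDS = ("captions_vtt", "audio_description_vtt")


def check_ingestion_response(
    raw: str,
    detect_language: Optional[Callable[[str], str]] = None,  # hypothetical hook
) -> list[str]:
    """Return a list of problems found in a raw model response (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]

    if "confidence" in data:
        value = data["confidence"]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not (is_number and 0 <= value <= 1):
            problems.append("confidence is not a number in 0..1")

    # A WebVTT file starts with the string "WEBVTT" (optionally after a BOM).
    for name in VTT_FIELDS:
        value = data.get(name)
        if isinstance(value, str) and not value.lstrip("\ufeff").startswith("WEBVTT"):
            problems.append(f"{name} does not start with WEBVTT")

    # Compare only the primary subtag, so "de-AT" and "de" count as a match.
    declared = data.get("language")
    if detect_language is not None and isinstance(declared, str):
        expected = declared.split("-")[0].lower()
        for name in TEXT_FIELDS:
            value = data.get(name)
            if isinstance(value, str) and value.strip():
                found = detect_language(value).split("-")[0].lower()
                if found != expected:
                    problems.append(f"{name} looks like {found!r}, expected {expected!r}")
    return problems
```

Passing a real language detector as `detect_language` would flag the regression this patch targets, for example a response that declares `"de"` but carries an English `audio_description_vtt`. Which detector to use is left open here.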

diff --git a/backend/app/prompts/gemini_ingestion.md b/backend/app/prompts/gemini_ingestion.md
index b40b463..a7e9046 100644
--- a/backend/app/prompts/gemini_ingestion.md
+++ b/backend/app/prompts/gemini_ingestion.md
@@ -3,12 +3,19 @@ You are an expert accessibility writer for film/TV and e-learning. Produce STRIC
 
 USER:
 You are given a video. Return a JSON object with:
-- language: BCP-47 code (e.g., "en")
+- language: BCP-47 code of the spoken language in the video (e.g., "en", "de", "es", "fr")
 - confidence: 0..1
-- summary: 1–2 sentence synopsis
-- transcript_plaintext: full spoken words, punctuated
-- captions_vtt: a valid WebVTT file as a single string, with accurate timings and no styling
-- audio_description_vtt: a valid WebVTT file as a single string, describing key visual elements (no spoilers), synchronized with the program
+- summary: 1–2 sentence synopsis (in the detected language)
+- transcript_plaintext: full spoken words, punctuated (in the detected language)
+- captions_vtt: a valid WebVTT file as a single string, with accurate timings and no styling (in the detected language)
+- audio_description_vtt: a valid WebVTT file as a single string, describing key visual elements (no spoilers), synchronized with the program (MUST be written in the detected language)
+
+CRITICAL LANGUAGE REQUIREMENT:
+- First, detect the language spoken in the video
+- ALL text outputs (summary, transcript, captions, audio_description) MUST be in that detected language
+- If the video is in German, write German captions and German audio descriptions
+- If the video is in Spanish, write Spanish captions and Spanish audio descriptions
+- Do NOT translate to English - keep everything in the original detected language
 
 Constraints:
 - Output MUST be valid JSON. Do not include markdown fences or any other text.