diff --git a/backend/app/prompts/gemini_ingestion.md b/backend/app/prompts/gemini_ingestion.md
index b40b463..a7e9046 100644
--- a/backend/app/prompts/gemini_ingestion.md
+++ b/backend/app/prompts/gemini_ingestion.md
@@ -3,12 +3,19 @@ You are an expert accessibility writer for film/TV and e-learning. Produce STRIC
 
 USER:
 You are given a video. Return a JSON object with:
-- language: BCP-47 code (e.g., "en")
+- language: BCP-47 code of the spoken language in the video (e.g., "en", "de", "es", "fr")
 - confidence: 0..1
-- summary: 1–2 sentence synopsis
-- transcript_plaintext: full spoken words, punctuated
-- captions_vtt: a valid WebVTT file as a single string, with accurate timings and no styling
-- audio_description_vtt: a valid WebVTT file as a single string, describing key visual elements (no spoilers), synchronized with the program
+- summary: 1–2 sentence synopsis (in the detected language)
+- transcript_plaintext: full spoken words, punctuated (in the detected language)
+- captions_vtt: a valid WebVTT file as a single string, with accurate timings and no styling (in the detected language)
+- audio_description_vtt: a valid WebVTT file as a single string, describing key visual elements (no spoilers), synchronized with the program (MUST be written in the detected language)
+
+CRITICAL LANGUAGE REQUIREMENT:
+- First, detect the language spoken in the video
+- ALL text outputs (summary, transcript_plaintext, captions_vtt, audio_description_vtt) MUST be in that detected language
+- If the video is in German, write German captions and German audio descriptions
+- If the video is in Spanish, write Spanish captions and Spanish audio descriptions
+- Do NOT translate to English; keep everything in the original detected language
 
 Constraints:
 - Output MUST be valid JSON. Do not include markdown fences or any other text.