diff --git a/backend/app/prompts/gemini_ingestion.md b/backend/app/prompts/gemini_ingestion.md
index b40b463..a7e9046 100644
--- a/backend/app/prompts/gemini_ingestion.md
+++ b/backend/app/prompts/gemini_ingestion.md
@@ -3,12 +3,19 @@ You are an expert accessibility writer for film/TV and e-learning. Produce STRIC
 
 USER:
 You are given a video. Return a JSON object with:
-- language: BCP-47 code (e.g., "en")
+- language: BCP-47 code of the spoken language in the video (e.g., "en", "de", "es", "fr")
 - confidence: 0..1
-- summary: 1–2 sentence synopsis
-- transcript_plaintext: full spoken words, punctuated
-- captions_vtt: a valid WebVTT file as a single string, with accurate timings and no styling
-- audio_description_vtt: a valid WebVTT file as a single string, describing key visual elements (no spoilers), synchronized with the program
+- summary: 1–2 sentence synopsis (in the detected language)
+- transcript_plaintext: full spoken words, punctuated (in the detected language)
+- captions_vtt: a valid WebVTT file as a single string, with accurate timings and no styling (in the detected language)
+- audio_description_vtt: a valid WebVTT file as a single string, describing key visual elements (no spoilers), synchronized with the program (MUST be written in the detected language)
+
+CRITICAL LANGUAGE REQUIREMENT:
+- First, detect the language spoken in the video
+- ALL text outputs (summary, transcript, captions, audio_description) MUST be in that detected language
+- If the video is in German, write German captions and German audio descriptions
+- If the video is in Spanish, write Spanish captions and Spanish audio descriptions
+- Do NOT translate to English - keep everything in the original detected language
 
 Constraints:
 - Output MUST be valid JSON. Do not include markdown fences or any other text.
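
Since the prompt requires a bare JSON object with a fixed set of fields, a caller could check the reply before using it. The following is a minimal sketch in Python, not code from this repository: the function name `validate_ingestion_response` and the `REQUIRED_KEYS` set are hypothetical, and the only format facts assumed are the field list from the prompt above and that a WebVTT file begins with the string `WEBVTT`.

```python
import json

# Field names taken from the prompt's "Return a JSON object with:" list.
REQUIRED_KEYS = {
    "language",
    "confidence",
    "summary",
    "transcript_plaintext",
    "captions_vtt",
    "audio_description_vtt",
}


def validate_ingestion_response(raw: str) -> dict:
    """Parse a model reply and check it against the fields the prompt asks for.

    Hypothetical helper for illustration only.
    """
    # json.loads raises json.JSONDecodeError (a ValueError subclass) if the
    # reply contains markdown fences or any text around the JSON object.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    confidence = data["confidence"]
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 1:
        raise ValueError("confidence must be a number in 0..1")

    # A WebVTT file starts with "WEBVTT", optionally preceded by a BOM.
    for key in ("captions_vtt", "audio_description_vtt"):
        value = data[key]
        if not isinstance(value, str) or not value.lstrip("\ufeff").startswith("WEBVTT"):
            raise ValueError(f"{key} must be a string starting with 'WEBVTT'")

    return data
```

This only checks structure. Confirming that the summary, captions, and audio description are actually written in the language reported in `language` would need a separate language-identification step, which is not sketched here.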