---
title: "VTT Edit → Descriptive Transcript Regeneration"
description: "Pattern for keeping descriptive_transcript.txt in sync when captions or audio description VTTs are edited via PATCH /vtt"
tags: [fastapi, gcs, vtt, accessibility, celery]
created: 2026-05-01
updated: 2026-05-01
projects: [video-accessibility]
---

# VTT Edit → Descriptive Transcript Regeneration

## Problem

When a reviewer edits either `captions.vtt` or `ad.vtt` via `PATCH /jobs/{id}/vtt`, the descriptive transcript (`descriptive_transcript.txt`) becomes stale: it still reflects the pre-edit VTT content. The PATCH handler does not regenerate the transcript, so the mismatch goes unnoticed.

## Pattern

In the PATCH handler, after writing the edited VTT(s) to GCS but before the MongoDB update:

1. Determine which stream was edited (from the request body) and which was not
2. Read the unchanged stream from GCS
3. Merge both streams via `generate_descriptive_transcript(captions_text, ad_text)`
4. Upload the new transcript to GCS
5. Update `lang_output["descriptive_transcript_gcs"]` so the MongoDB doc points to the fresh file

Wrap the whole step in a broad `except` so a transcript failure never blocks the VTT save.

```python
# After GCS uploads for captions/AD.
# Assumes asyncio, logger, settings, gcs_service, and upload_vtt_to_gcs
# are already available in the handler's module.
if request.captions_vtt or request.audio_description_vtt:
    try:
        # Local import: avoids a circular import, and an ImportError is
        # caught below like any other transcript failure.
        from ...services.descriptive_transcript import (
            generate_descriptive_transcript as _gen_transcript,
        )

        loop = asyncio.get_running_loop()
        bucket_prefix = f"gs://{settings.gcs_bucket}/"

        # Captions: use the edited text, else read the stored VTT from GCS.
        captions_text = request.captions_vtt
        if not captions_text:
            cc_gcs = lang_output.get("captions_vtt_gcs")
            if cc_gcs:
                _blob = gcs_service.bucket.blob(cc_gcs.replace(bucket_prefix, ""))
                captions_text = await loop.run_in_executor(
                    gcs_service.executor, _blob.download_as_text
                )

        # Audio description: same fallback as captions.
        ad_text = request.audio_description_vtt
        if not ad_text:
            ad_gcs = lang_output.get("ad_vtt_gcs")
            if ad_gcs:
                _blob = gcs_service.bucket.blob(ad_gcs.replace(bucket_prefix, ""))
                ad_text = await loop.run_in_executor(
                    gcs_service.executor, _blob.download_as_text
                )

        # Merge both streams and point the job document at the new file.
        transcript_text = _gen_transcript(captions_text or "", ad_text or "")
        if transcript_text:
            transcript_uri = await upload_vtt_to_gcs(
                transcript_text,
                f"{job_id}/{target_language}/descriptive_transcript.txt",
            )
            lang_output["descriptive_transcript_gcs"] = transcript_uri
    except Exception as _tr_err:
        # Never let a transcript failure block the VTT save.
        logger.warning(
            "Failed to regenerate descriptive transcript for job %s: %s",
            job_id,
            _tr_err,
        )
```

## Notes

- `asyncio.get_running_loop().run_in_executor(gcs_service.executor, blob.download_as_text)`: run blocking GCS SDK calls on the GCS service's thread pool executor so they stay off the event loop
- The local import sidesteps circular imports. Because it sits inside the `try`, an `ImportError` (for example, if the module is only conditionally present) is logged instead of failing the save
- Always update the GCS pointer in `lang_output` before the MongoDB update. The write is atomic at the document level, so the VTT path and the transcript path update together
- This pattern applies to any derived artifact that depends on two source VTT files
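
The pattern can be written once and reused for other derived artifacts. The following is a minimal sketch under stated assumptions, not the project's code: `resolve_stream`, `regenerate_derived`, and the `load_a`/`load_b`/`merge` callables are hypothetical names standing in for the GCS download and for `generate_descriptive_transcript`.

```python
import asyncio
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

logger = logging.getLogger(__name__)


async def resolve_stream(
    edited: Optional[str],
    load_stored: Optional[Callable[[], str]],
    executor: ThreadPoolExecutor,
) -> str:
    """Return the edited text if given, else load the stored copy off-loop."""
    if edited:
        return edited
    if load_stored is None:
        return ""
    # load_stored is a blocking call (hypothetical stand-in for a GCS read),
    # so it runs on the thread pool instead of the event loop.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, load_stored)


async def regenerate_derived(
    edited_a: Optional[str],
    load_a: Optional[Callable[[], str]],
    edited_b: Optional[str],
    load_b: Optional[Callable[[], str]],
    merge: Callable[[str, str], str],
    executor: ThreadPoolExecutor,
) -> Optional[str]:
    """Merge two source streams; return None instead of raising on failure."""
    try:
        text_a = await resolve_stream(edited_a, load_a, executor)
        text_b = await resolve_stream(edited_b, load_b, executor)
        return merge(text_a, text_b) or None
    except Exception as err:
        # Same fail-safe as the handler: log and let the caller carry on.
        logger.warning("Failed to regenerate derived artifact: %s", err)
        return None
```

A caller would upload the returned text and update the pointer only when the result is not `None`, which leaves the previous pointer untouched if regeneration fails.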