video-accessibility/backend/app/tasks
Vadym Samoilenko fa351e4d25 feat: per-client glossary — hybrid exact/vector retrieval + AI injection
Adds full glossary system so Gemini uses client-approved terminology
when generating subtitles and translations (critical for 3M brand names
and product codes across 16 target locales).

Backend:
- lib/locales.py: BCP-47 locale registry, normalises xlsx fr_fr → fr-FR
- models/glossary.py: Glossary / GlossaryVersion / GlossaryTerm + enums
- services/glossary_service.py: xlsx parse (openpyxl), ingest to Mongo,
  hybrid retrieval (Aho-Corasick exact + Atlas Vector Search), prompt block
- services/embedding_service.py: Gemini text-embedding-004, batch 100, retry
- tasks/embed_glossary.py: Celery background task for async embedding
- api/v1/routes_glossaries.py: CRUD endpoints under /clients/{id}/glossaries
- gemini.py: _build_glossary_block(), {GLOSSARY} injection in all 4 call sites
- tts.py / gemini_tts.py: pass full locale codes (no split("-")[0] truncation)
- tasks/translate_and_synthesize.py: glossary lookup + injection per language
- prompts: {GLOSSARY} placeholder in ingestion, targeted, transcreation prompts
- pyproject.toml: +openpyxl, +pyahocorasick

Frontend:
- routes/admin/glossaries/: GlossaryList, GlossaryUpload, GlossaryDetail
- App.tsx: 3 new routes under /admin/clients/:clientId/glossaries
- ClientDetail.tsx: Glossaries card with count + quick links
- types/api.ts: Glossary, GlossaryVersion, GlossaryDetail, GlossaryTerm types
- lib/api.ts: 7 new API methods (upload, list, detail, terms, versions, activate, archive)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 13:03:38 +01:00
..
__pycache__ removed mongodb change stream monitoring, added global websockets monitoring for notifications, broke symmetry between toasts and persistent notifications (and refined which notifications get sent and how) 2025-08-25 15:48:18 -05:00
__init__.py feat: per-client glossary — hybrid exact/vector retrieval + AI injection 2026-04-29 13:03:38 +01:00
embed_glossary.py feat: per-client glossary — hybrid exact/vector retrieval + AI injection 2026-04-29 13:03:38 +01:00
ffmpeg_operations.py feat: add dedicated ffmpeg queue to prevent server overload 2025-12-26 17:56:23 -06:00
ingest_and_ai.py feat: replace SDK with direct HTTP integration to centralized cost tracker 2026-04-27 13:36:15 +01:00
notify.py added websockets for live job status updates with toast notifications on job list page 2025-08-24 19:41:23 -05:00
render_accessible_video.py fix: resolve QA-reported bugs — MP3/VTT desync, crashes, notifications, and more 2026-03-24 13:23:55 +00:00
rerender_accessible_video.py feat: replace SDK with direct HTTP integration to centralized cost tracker 2026-04-27 13:36:15 +01:00
translate_and_synthesize.py feat: per-client glossary — hybrid exact/vector retrieval + AI injection 2026-04-29 13:03:38 +01:00
tts_synthesis.py feat: replace SDK with direct HTTP integration to centralized cost tracker 2026-04-27 13:36:15 +01:00
whisper_transcribe.py fix: add authentication for Cloud Run service calls 2026-01-02 11:41:07 -06:00