video-accessibility/backend/app/services
Vadym Samoilenko fa351e4d25 feat: per-client glossary — hybrid exact/vector retrieval + AI injection
Adds full glossary system so Gemini uses client-approved terminology
when generating subtitles and translations (critical for 3M brand names
and product codes across 16 target locales).

Backend:
- lib/locales.py: BCP-47 locale registry, normalises xlsx fr_fr → fr-FR
- models/glossary.py: Glossary / GlossaryVersion / GlossaryTerm + enums
- services/glossary_service.py: xlsx parse (openpyxl), ingest to Mongo,
  hybrid retrieval (Aho-Corasick exact + Atlas Vector Search), prompt block
- services/embedding_service.py: Gemini text-embedding-004, batch 100, retry
- tasks/embed_glossary.py: Celery background task for async embedding
- api/v1/routes_glossaries.py: CRUD endpoints under /clients/{id}/glossaries
- gemini.py: _build_glossary_block(), {GLOSSARY} injection in all 4 call sites
- tts.py / gemini_tts.py: pass full locale codes (no split("-")[0] truncation)
- tasks/translate_and_synthesize.py: glossary lookup + injection per language
- prompts: {GLOSSARY} placeholder in ingestion, targeted, transcreation prompts
- pyproject.toml: +openpyxl, +pyahocorasick
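
Illustrative sketch (not the committed code) of two backend items above: normalising xlsx locale headers such as fr_fr to fr-FR, and filling a {GLOSSARY} placeholder in a prompt. The function names normalise_locale and inject_glossary, the regex, and the wording of the glossary block are assumptions made for illustration; only _build_glossary_block(), the {GLOSSARY} placeholder, and the fr_fr → fr-FR mapping are named in this commit.

```python
import re

# Assumed shape: 2-3 letter language, optional 2-letter or 3-digit region,
# separated by "-" or "_" (covers xlsx headers such as "fr_fr" or "pt-BR").
_LOCALE_RE = re.compile(r"^([A-Za-z]{2,3})(?:[-_]([A-Za-z]{2}|\d{3}))?$")


def normalise_locale(raw: str) -> str:
    """Return a BCP-47 style code: lowercase language, uppercase region."""
    match = _LOCALE_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognised locale code: {raw!r}")
    language, region = match.groups()
    language = language.lower()
    # Keep the region instead of truncating with split("-")[0].
    return f"{language}-{region.upper()}" if region else language


def build_glossary_block(terms: dict[str, str]) -> str:
    """Render source -> target term pairs as a plain-text prompt block."""
    if not terms:
        return ""
    lines = ["Use the following client-approved terminology:"]
    lines += [f"- {source} -> {target}" for source, target in terms.items()]
    return "\n".join(lines)


def inject_glossary(prompt_template: str, terms: dict[str, str]) -> str:
    """Fill the {GLOSSARY} placeholder.

    str.replace is used rather than str.format so that other braces in the
    prompt (for example JSON samples) are left untouched.
    """
    return prompt_template.replace("{GLOSSARY}", build_glossary_block(terms))
```

With these hypothetical helpers, normalise_locale("fr_fr") yields "fr-FR", and a template with no matching terms has its {GLOSSARY} placeholder replaced by an empty string.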

Frontend:
- routes/admin/glossaries/: GlossaryList, GlossaryUpload, GlossaryDetail
- App.tsx: 3 new routes under /admin/clients/:clientId/glossaries
- ClientDetail.tsx: Glossaries card with count + quick links
- types/api.ts: Glossary, GlossaryVersion, GlossaryDetail, GlossaryTerm types
- lib/api.ts: 7 new API methods (upload, list, detail, terms, versions, activate, archive)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 13:03:38 +01:00
Name | Last commit | Date
__pycache__ | better tts config for worker | 2025-10-08 18:47:28 -05:00
audit_logger.py | initial commit | 2025-08-24 16:28:33 -05:00
cost_tracker.py | fix: correct cost tracker API field names and endpoint path | 2026-04-27 13:42:29 +00:00
descriptive_transcript.py | feat: DCMP compliance, descriptive transcript, new languages, QA bug fixes | 2026-03-27 11:50:43 +00:00
elevenlabs_voices.py | fix: propagate ElevenLabs voice fetch errors to frontend | 2026-03-03 14:27:45 +00:00
emailer.py | feat(saas): Phase 3 — membership-based authz + Mailgun + job.organization_id | 2026-04-27 16:56:42 +01:00
embedding_service.py | feat: per-client glossary — hybrid exact/vector retrieval + AI injection | 2026-04-29 13:03:38 +01:00
ffmpeg_http_service.py | feat: add Cloud Run HTTP services for Whisper and FFmpeg | 2026-01-02 10:12:50 -06:00
gcs.py | fix: add charset=utf-8 to VTT content-type to prevent ♪ encoding issues | 2026-03-27 14:17:16 +00:00
gemini.py | feat: per-client glossary — hybrid exact/vector retrieval + AI injection | 2026-04-29 13:03:38 +01:00
gemini_tts.py | feat: per-client glossary — hybrid exact/vector retrieval + AI injection | 2026-04-29 13:03:38 +01:00
glossary_service.py | feat: per-client glossary — hybrid exact/vector retrieval + AI injection | 2026-04-29 13:03:38 +01:00
language_qc.py | feat: per-language QC workflow with linguist assignment | 2026-04-29 12:09:40 +01:00
membership_service.py | feat(saas): Phase 0+1 — Organization/Membership entities and dev branch | 2026-04-27 16:46:24 +01:00
microsoft_auth.py | added MSAL microsoft authentication | 2025-10-10 09:19:39 -05:00
secrets_manager.py | initial commit | 2025-08-24 16:28:33 -05:00
tts.py | feat: per-client glossary — hybrid exact/vector retrieval + AI injection | 2026-04-29 13:03:38 +01:00
validation.py | feat: add accessible video validation, remove AI confidence check | 2025-12-26 16:41:57 -06:00
video_renderer.py | feat: DCMP compliance, descriptive transcript, new languages, QA bug fixes | 2026-03-27 11:50:43 +00:00
vtt_retimer.py | fix: use actual freeze segment durations for VTT subtitle retiming | 2026-01-05 15:52:57 -06:00
vtt_versioning.py | feat: VTT version control — snapshots, diff, restore | 2026-04-29 11:46:21 +01:00
websocket.py | wrote docker files and deployment instructions | 2025-10-08 16:00:12 -05:00
websocket_publisher.py | wrote docker files and deployment instructions | 2025-10-08 16:00:12 -05:00
whisper_http_service.py | feat: add Cloud Run HTTP services for Whisper and FFmpeg | 2026-01-02 10:12:50 -06:00
whisper_service.py | fix: enforce AD cue pause_point monotonicity to preserve cue order | 2026-02-26 08:15:06 -06:00
zip_download.py | feat: DCMP compliance, descriptive transcript, new languages, QA bug fixes | 2026-03-27 11:50:43 +00:00