video-accessibility

Author	SHA1	Message	Date
Vadym Samoilenko	6bf88474ee	feat(embed): switch embeddings to Vertex AI text-multilingual-embedding-002 Replace AI Studio gemini-embedding-001 with Vertex AI text-multilingual-embedding-002 via google-genai SDK (vertexai=True). Vertex AI uses ADC (already configured) and has significantly higher per-project quotas than AI Studio per-user limits. Same 768-dim output; multilingual model better suited for 50+ language glossaries. Add gcp_location config field (default us-central1). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-13 18:41:32 +01:00
Vadym Samoilenko	ca312d48fa	chore(lint): fix all ruff errors — 0 warnings remaining - B904 (55): add `from err` / `from None` to raise-in-except across 13 files - F821 (1): add missing HTTPException import in routes_language_qc.py - F841 (7): remove unused variable assignments (current_user, job_title, tts_provider, etc.) - W293 (13): strip trailing whitespace from blank lines - C416 (4): rewrite unnecessary dict comprehensions as dict() - C401 (1): rewrite unnecessary generator as set comprehension - E701 (4): split multi-statement lines in cost_tracker.py - E741 (1): rename ambiguous `l` to `lang` in cloud_run_dispatch.py - B007 (4): prefix unused loop variables with _ in tts.py, video_renderer.py - I001 (1): sort imports in tasks/__init__.py (move stdlib to top) - E402 (3): move threading/time/signals imports to top of tasks/__init__.py - UP042 (9): replace (str, Enum) with StrEnum in all model/schema enums Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-13 17:13:08 +01:00
Vadym Samoilenko	c380a96c72	refactor(tts): switch Gemini TTS from AI Studio API to Cloud TTS API The AI Studio API (generativelanguage.googleapis.com) enforces a hard 10 RPM quota on preview models regardless of billing tier. Switching to Cloud TTS API (texttospeech.googleapis.com) with the same Gemini models uses a separate, production-grade quota that scales on paid plans. Changes: - Replace genai.Client + generate_content(AUDIO) with texttospeech.TextToSpeechClient - Style prompt now goes to SynthesisInput.prompt (dedicated field, not prepended text) - Speed goes to AudioConfig.speaking_rate (no longer encoded in prompt text) - Cloud TTS returns MP3 directly — remove PCM→MP3 lameenc conversion - config: update pro model from gemini-2.5-pro-preview-tts → gemini-2.5-pro-tts (GA) - Service account already has roles/aiplatform.user (granted today) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-08 17:16:32 +01:00
Vadym Samoilenko	77a4eb10e0	fix(auth): await get_redis() coroutine in membership cache get_redis() is an async function but was called without await in _cached_memberships(), causing RuntimeWarning and silently bypassing the Redis membership cache on every request — all membership lookups were hitting MongoDB instead of cache. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-08 13:57:57 +01:00
Vadym Samoilenko	2f4925353a	feat(pause-insert): adaptive buffer, forward-snap, timeline drag + share link fix Backend (Phase A): - A1: Adaptive silence buffer — natural_gap_ms persisted per cue; renderer computes per-cue silence_before/silence_after instead of fixed 500ms; per-cue silence files - A2: Forward-preferred snap — snap_pause_point prefers boundaries up to 4s ahead over boundaries within 1.5s behind, reducing mid-scene cuts - A3: Min-gap validation — pause points with < 200ms gap trigger forward search to the next acceptable gap - natural_gap_ms added to PausePointData model and api.ts type - New config fields: whisper_snap_forward_window, whisper_snap_backward_window, ad_silence_buffer_default, ad_silence_buffer_min_after, ad_min_acceptable_gap - Tests: test_whisper_snap.py (13 tests), test_video_renderer_buffers.py Frontend (Phase B): - B1: Drag pause-point markers — pointer state machine with 3px move threshold, clamp to min/max bounds, click-without-move still opens PausePointEditor - B2: Drag freeze blocks — orange blocks translate with linked pause point - B3: Time tooltip visible during drag, hidden on release - Tests: TimelinePreview.drag.test.tsx (10 tests) Fixes: - Share link pointed to ai-sandbox.oliver.solutions — added app_url to Settings with correct optical-dev.oliver.solutions default; share_url now configurable via APP_URL env var - Removed all ai-sandbox.oliver.solutions references from docker-compose, apache config, docs, and scripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-01 16:09:09 +01:00
Vadym Samoilenko	713ae46d4a	fix(tts): revert pro TTS to gemini-2.5-pro-preview-tts (3.1 pro TTS doesn't exist yet) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 21:01:22 +01:00
Vadym Samoilenko	3fb8dce3ee	feat(ai): upgrade Gemini models to 3.1-pro-preview and 3.1-pro-tts-preview Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 21:00:32 +01:00
Vadym Samoilenko	12fe4ebcbb	feat(tts): upgrade Gemini TTS model to gemini-3.1-flash-tts-preview Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 20:57:37 +01:00
Vadym Samoilenko	31199f8705	chore: push all session changes — backend hardening, tests, apache config, deploy scripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 15:52:14 +01:00
Vadym Samoilenko	5fd370c093	test: fix all unit tests — 168 passing, 0 failures - conftest.py: set required env vars before app import to prevent Settings() crash - gcs.py: lazy bucket init checks _bucket instead of _client; add @bucket.setter - vtt.py: fix float precision in _format_timestamp; include empty-text cues in parser - security.py: guard verify_password against empty hash (passlib UnknownHashError) - tts.py: _parse_timestamp raises ValueError("Invalid timestamp format: …") - emailer.py: HTML-escape job_title in _render_completion_template (XSS fix) - test_emailer.py: rewrite for Mailgun-based service (replaced SendGrid) - test_gcs.py: fix UploadFile constructor, MIME type, remove executor.submit mock - test_gemini.py: patch module-level client instead of non-existent genai.upload_file; translate_vtt tests use numbered-list mock responses matching new implementation - test_tts.py: fix aiohttp async CM mock pattern; fix error message match - test_models.py: update JobCreate to use source_is_english instead of language - test_security.py: set jwt_access_ttl_min in token test - test_cross_tenant_isolation.py: add patch to imports Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 14:02:04 +01:00
Vadym Samoilenko	3f557724d3	feat(api): L-18 blocked-on-source, PR-10 promote-to-qc, R-12 reviewed_cues reset - POST /{job_id}/actions/blocked_on_source (L-18): linguist/reviewer flags a source video issue; moves job to QC_FEEDBACK and records blocked_on_source_reason/at/by - POST /{job_id}/actions/promote_to_qc (PR-10): production/admin manually bypasses AI processing for edge-case failures; adds audit history entry - Reset reviewed_cues to 0 on submit_for_review (R-12) so reviewer must re-acknowledge all cues after each linguist resubmit - Add assert_job_in_user_org + get_user_org_ids to core/dependencies.py (used by the new endpoints and the cross-tenant isolation test suite) - Remove unused ingest_and_ai_task / translate_and_synthesize_task imports Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 10:38:39 +01:00
Vadym Samoilenko	4623b89aeb	feat(mt-16): JWT org_ids claim + transient user.org_ids in deps - create_access_token gains optional org_ids: list[str] param; encodes {exp, sub, org_ids, v:2} — org_ids is a prefilter hint only, never used as authorization source of truth (Redis cache is authoritative) - Login, MS login, refresh endpoints: fetch memberships and include org_ids in issued access tokens via _get_user_org_ids() helper - routes_invitations.py accept flow: same org_ids population on token - get_current_user: reads org_ids from payload, attaches as transient user.__dict__["org_ids"] — available to OrgScopedQuery for prefilter - Force logout: rotate JWT_SECRET env var at deployment time (no code change needed; all existing tokens immediately invalidated) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 20:46:39 +01:00
Vadym Samoilenko	38038862c9	refactor(mt-15): consolidate authz in routes_jobs and dependencies list_jobs now uses MembershipContext (Redis-cached, 60s TTL) to build org-scoped queries instead of per-request memberships.find(). Falls back to legacy get_accessible_project_ids for users with no memberships. get_job replaces the role-specific CLIENT/PM access check with get_job_or_403() which uniformly checks organization_id membership for all roles (returns 404 not 403 to avoid leaking cross-org job existence). get_accessible_project_ids in dependencies.py now uses _cached_memberships from authz.py, eliminating the duplicate uncached DB query. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 20:26:07 +01:00
Vadym Samoilenko	312af2d7fb	feat(mt-11): cross-org assignment guard in language_qc Prevent PM in org A from assigning linguist/reviewer from org B. Added _assert_user_in_job_org() helper that resolves job org_id (with project fallback) and checks db.memberships for the assignee. Also added assert_user_in_org() and get_job_or_403() to core/authz.py for use in upcoming MT-13 and MT-15 commits. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 20:22:46 +01:00
Vadym Samoilenko	1bf0fb9eed	feat(pr4+pr5): hotkeys, unified status labels, upload size constant PR-4 hotkeys (L-9): - QCDetail: Cmd/Ctrl+S saves current VTT (handleSaveFullVtt) - QCDetail: Escape closes both reject forms (final review + language reject modal) PR-5 T-1 (unified status labels): - Add JOB_STATUS_LABELS and getJobStatusLabel to utils/jobStatusMessages.ts - JobsList.tsx: remove local STATUS_LABELS duplicate, import from shared util - StatusBadge.tsx: remove 30-line switch duplicate, use getJobStatusLabel PR-5 T-14 (unified upload size constant): - config.py: upload_max_video_bytes = 2GB, upload_signed_url_ttl_hours = 24 - validation.py: use settings.upload_max_video_bytes instead of magic number - notify.py: use settings.upload_signed_url_ttl_hours for signed URL TTL Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 18:42:03 +01:00
Vadym Samoilenko	a168af1aa7	feat: two-stage QC (linguist→reviewer), project picker, comments, email notifications, deadlines - Two-stage QC workflow: linguist edits + submits → reviewer approves/rejects per language. New statuses: in_progress, pending_review, in_review. New service functions: submit_for_review, open_review, assign_reviewer, reassign_reviewer, add_comment. Linguist and reviewer deadlines. - Reject now resets language to in_progress so linguist can iterate without full re-assignment. - QC comment threads per language (append-only), visible to all assignees. - Email notifications via Mailgun on: assignment, submit-for-review, comment, approve, reject. Best-effort (failures do not roll back QC actions). asyncio.gather for parallel fan-out. - New audit actions: LANGUAGE_QC_REVIEWER_ASSIGN/REASSIGN, LANGUAGE_QC_SUBMIT, LANGUAGE_QC_OPEN_REVIEW, LANGUAGE_QC_COMMENT. - Inline project picker in NewJob: "＋ Create new project…" option with name, default languages, default linguist, default reviewer. Pre-fills languages on the new job. - Project model extended with default_languages, default_linguist_id, default_reviewer_id. - RBAC: CLIENT org-members can now create projects (backend guard relaxed). - LinguistQueue: role toggle "As linguist / As reviewer" + new status tabs. - QCDetail: two-slot assignment cards (linguist + reviewer), deadline display, role-aware action buttons, comments panel with optimistic insert and 15s refetch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 16:59:40 +01:00
Vadym Samoilenko	a3b300b76a	docs: add canonical documentation + audit cleanup - AGENTS.md: canonical project entry point (Quick Nav, pipeline, constraints) - docs/: complete docs tree — architecture, API spec, DB schema, infra, runbook, requirements, tech stack, principles, reference ADRs, guides, tasks backlog, testing strategy - tests/README.md: test commands, structure, known gaps - README.md / CLAUDE.md / DEPLOYMENT.md: updated with canonical doc links - .archive/: backup of pre-documentation-pipeline originals - backend/uv.lock: uv dependency lockfile - Delete committed __pycache__ .pyc files (should have been gitignored) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:22:51 +01:00
Vadym Samoilenko	4c6624c3d4	fix: code health sweep — M-01 through M-07 M-01 authz.py: move cache_key above try block to avoid NameError when first Redis call returns None M-02 main.py: re-enable validation middleware (was TEMPORARILY DISABLED) M-03 routes_auth.py / main.py: replace print() debug lines with structured logger calls; logger now module-level in routes_auth.py M-04 gcs.py: asyncio.get_event_loop() → get_running_loop() (deprecation) M-05 translate_and_synthesize.py: bind loop vars in closure defaults to fix B023 ruff warnings (transcreate/translate_captions/etc.) M-06 rate_limiting.py: only trust X-Forwarded-For when X-Forwarded-Proto is https; use rightmost entry (proxy-appended) not leftmost M-07 validation.py: extend MongoDB operator blocklist to cover $expr, $function, $accumulator, $nin, $gte, $lte, $jsonSchema, $mod Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:18:02 +01:00
Vadym Samoilenko	c6b19d01f2	security: remove default admin password fallback (C-04) seed_default_admin now skips creation and logs a warning when DEFAULT_ADMIN_PASSWORD is unset instead of falling back to the hardcoded ChangeMe123! value. Existing-admin promotion path is unaffected. Added DEFAULT_ADMIN_PASSWORD to .env.prod.example. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:12:24 +01:00
Vadym Samoilenko	70f6c6befb	security: reject refresh tokens used as access tokens (C-02) get_current_user and get_current_user_optional now reject any token whose payload carries type="refresh". Access tokens carry no type field so the check is asymmetric and safe. Prevents a refresh-cookie value from being replayed as a Bearer access token. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:11:50 +01:00
Vadym Samoilenko	fa351e4d25	feat: per-client glossary — hybrid exact/vector retrieval + AI injection Adds full glossary system so Gemini uses client-approved terminology when generating subtitles and translations (critical for 3M brand names and product codes across 16 target locales). Backend: - lib/locales.py: BCP-47 locale registry, normalises xlsx fr_fr → fr-FR - models/glossary.py: Glossary / GlossaryVersion / GlossaryTerm + enums - services/glossary_service.py: xlsx parse (openpyxl), ingest to Mongo, hybrid retrieval (Aho-Corasick exact + Atlas Vector Search), prompt block - services/embedding_service.py: Gemini text-embedding-004, batch 100, retry - tasks/embed_glossary.py: Celery background task for async embedding - api/v1/routes_glossaries.py: CRUD endpoints under /clients/{id}/glossaries - gemini.py: _build_glossary_block(), {GLOSSARY} injection in all 4 call sites - tts.py / gemini_tts.py: pass full locale codes (no split("-")[0] truncation) - tasks/translate_and_synthesize.py: glossary lookup + injection per language - prompts: {GLOSSARY} placeholder in ingestion, targeted, transcreation prompts - pyproject.toml: +openpyxl, +pyahocorasick Frontend: - routes/admin/glossaries/: GlossaryList, GlossaryUpload, GlossaryDetail - App.tsx: 3 new routes under /admin/clients/:clientId/glossaries - ClientDetail.tsx: Glossaries card with count + quick links - types/api.ts: Glossary, GlossaryVersion, GlossaryDetail, GlossaryTerm types - lib/api.ts: 7 new API methods (upload, list, detail, terms, versions, activate, archive) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 13:03:38 +01:00
Vadym Samoilenko	05f25a1141	feat: per-language QC workflow with linguist assignment - Job.language_qc dict tracks per-language status (pending/in_review/approved/rejected) with full event history; qc_assignments denormalized array enables efficient queue queries - language_qc service handles assign/reassign/approve/reject/reopen with atomic DB updates, audit logging, and auto-advancement to pending_final_review when all languages approved - Linguists can only edit VTT and trigger re-renders for their assigned language (403 guard) - return_to_qc resets all language statuses while preserving assignments - routes_language_qc.py: 7 new endpoints; /me/language-qc-queue for linguist queue - Startup migration idempotently seeds language_qc for all existing jobs - Frontend: LanguageQCState types, API methods, LinguistQueue page, QCDetail redesigned with per-language status badges, assignment dropdown, inline approve/reject buttons, progress bar, and reject modal; My QC Queue sidebar link Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 12:09:40 +01:00
Vadym Samoilenko	bab30e1508	feat: VTT version control — snapshots, diff, restore Backend: - VttVersion model (vtt_version.py): immutable snapshot per job/lang/kind/version - vtt_versioning service: create_version (atomic counter + GCS snapshot), list_versions, get_version, restore_version, diff_versions (difflib line-level) - routes_vtt_versions.py: GET /versions, GET /versions/{v}, GET /versions/diff, POST /versions/{v}/restore (PRODUCTION/ADMIN only, overwrites live file + audit log) - Hook create_version into update_job_vtt_content before each live-file overwrite - Mongo indexes: unique (job_id, lang, kind, version) + (job_id, created_at) Frontend: - VttVersionSummary / VttVersionFull / VttDiffResponse types - api.ts: listVttVersions, getVttVersion, diffVttVersions, restoreVttVersion - VersionsTab.tsx: lang/kind switcher, version list with A/B compare buttons, inline diff viewer (color-coded +/−), content viewer, restore with confirm dialog - JobDetail.tsx: new "VTT Versions" tab wired to VersionsTab Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 11:46:21 +01:00
Vadym Samoilenko	1563714454	feat(saas): Phase 3 — membership-based authz + Mailgun + job.organization_id authz.py (new): - MembershipContext — per-request membership dict for the current user - get_membership_context FastAPI dependency - require_org_role(min_role) — dependency factory keyed off org_id path param - require_platform_admin() - OrgScopedQuery — adds organization_id filter; platform admin passes through - bump_user_membership_cache — invalidates Redis key on membership writes dependencies.py: - get_accessible_project_ids now queries memberships collection first; legacy pm_client_ids / team.member_user_ids fallback retained until migration runs (four job-route access checks at lines 608/1054/1181/1538 are fixed via this function) routes_clients.py: - _assert_pm_or_admin and _assert_client_access are now async and query memberships - All 10 call sites updated with await + db arg emailer.py: - Switched from SendGrid to Mailgun REST API via httpx (already in requirements) - _send() is now fully async; same public method signatures preserved - send_completion_email uses _send() config.py: - Added mailgun_api_key, mailgun_domain, mailgun_from settings - sendgrid_api_key kept with empty default for backward compat migration_2026-04-28-000003: - Backfills job.organization_id from project.client_id - Creates (organization_id, status, created_at) sparse index on jobs routes_organizations.py / routes_invitations.py: - Call bump_user_membership_cache after every membership write Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 16:56:42 +01:00
Vadym Samoilenko	2b721d182b	feat: Client → Team → Project isolation system with Project Manager role Backend: - New UserRole.PROJECT_MANAGER with pm_client_ids[] on User model - New models: Client (slug-based), Team (member_user_ids[]), Project (client-scoped) - Job model gains project_id field - New GET/POST/PATCH/DELETE /clients, /clients/{id}/teams, /clients/{id}/projects, /clients/{id}/pm routes (admin-only client CRUD; PM or admin for teams/projects) - get_accessible_project_ids() helper: staff→all, PM→their clients' projects, CLIENT→projects from teams they belong to (with legacy owner fallback) - list_jobs, get_job, bulk_download, get_vtt_content, delete_job all use new isolation Frontend: - UserRole type gains 'project_manager' - Job, JobCreateRequest gain project_id field - Client, Team, Project, PMUser types added - ApiClient: full client/team/project/PM CRUD methods - useClients hook with all query/mutation hooks - Admin pages: ClientList + ClientDetail (teams, members, projects, PM assignment) - NewJob form: client + project picker (shown when clients exist) - Sidebar: Clients nav item for admin and project_manager roles - Routes: /admin/clients and /admin/clients/:clientId behind RoleGate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 15:11:13 +01:00
Vadym Samoilenko	ea21cace96	feat: replace SDK with direct HTTP integration to centralized cost tracker - New services/cost_tracker.py: sync httpx preflight()/record() + async wrappers; BudgetExceeded exception; no-op when COST_TRACKER_BASE_URL is empty - Preflight budget check added before ingestion (Gemini), per-language translation (video-native + traditional), and per-language TTS dispatch - _record_gemini_usage and _record_tts_cost now call cost_tracker directly; removes broken asyncio.get_event_loop() hack from sync Celery worker - Fix: _cost_ctx now threaded into extract_accessibility_targeted (video-native path) - Fix: user_id/cost_project_id now propagated through dispatch_language_tts → synthesize_cue_task.s() and the rerender_accessible_video.py re-render path - Remove oliver-cost-tracker SDK dependency (was commented-out/never installed) - Drop cost_tracker_outbox_path setting and get_cost_tracker() factory - Update COST_TRACKER_BASE_URL default to optical-dev.oliver.solutions in .env.prod.example, docker-compose.yml, and all Cloud Run service yamls - Cloud Run yamls use Secret Manager ref (cost-tracker-api-key) for the API key Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 13:36:15 +01:00
Vadym Samoilenko	ae2c474061	feat: integrate oliver-cost-tracker SDK into video-accessibility Add AI cost tracking to all Gemini and TTS call sites: - config.py: add COST_TRACKER_* env vars (base_url, api_key, source_app, outbox_path, enabled) - dependencies.py: add get_cost_tracker() factory (lru_cache, graceful degradation if SDK not installed) - models/job.py: add cost_tracker_project_id field for cost attribution - services/gemini.py: - add import time, _record_gemini_usage() helper (reads usage_metadata) - add _cost_ctx kwarg to extract_accessibility, extract_accessibility_targeted, transcreate_content, translate_vtt, rewrite_tts_cue - record usage after every generate_content call via asyncio.create_task() - tasks/ingest_and_ai.py: pass _cost_ctx (user_id, job_id, project_id) to extract_accessibility - tasks/translate_and_synthesize.py: build _cost_ctx from job_doc and pass to transcreate_content + translate_vtt calls - tasks/tts_synthesis.py: add user_id + cost_project_id kwargs, add _record_tts_cost() helper (records len(text) chars to cost tracker) - pyproject.toml: document SDK install instructions (comment) - .env.prod.example: add COST_TRACKER_* vars Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 11:30:46 +01:00
Vadym Samoilenko	cf761c4bb6	feat: add linguist role and user management navigation - Add LINGUIST role to UserRole enum (backend + frontend) - Grant linguists access to QC Review, Final Review, review notes, and VTT editing - Add MongoDB migration to update schema validator with linguist role - Add admin seed: vadymsamoilenko@oliver.agency is promoted to admin on startup - Add User Management sidebar link for admin users - Fix Login.tsx role type cast to use UserRole instead of hardcoded union Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 11:46:33 +01:00
Vadym Samoilenko	6f963ff7c4	feat: DCMP compliance, descriptive transcript, new languages, QA bug fixes - Rewrote VTT translation to two-step (text-only → Gemini → apply to original timestamps) preventing caption timing desync - Added polling fallback for all processing states and Safari visibilitychange WebSocket reconnect - Added 11 new TTS languages (cs, da, fi, hu, no, sk, sv, es-419, pt-BR, fr-CA) - Updated caption/AD prompts to DCMP Captioning Key & Description Key standards (line splitting, ♪ music notation, italic tags, caption positioning, ethics guidelines) - Added descriptive transcript generation (WCAG 2.1 §1.2.1) combining captions + AD into plain text - Fixed amix normalize=0 to prevent audio loss in rendered videos - Fixed AD re-timing double-count when source_ms is None - Fixed cue block numbering to be 1-based in VttEditor and Timeline Preview Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 11:50:43 +00:00
Vadym Samoilenko	1e177a6d5c	feat: add ElevenLabs voice selection to frontend and backend Add dynamic ElevenLabs voice catalog with provider toggle in the UI, allowing users to browse ElevenLabs voices, configure stability and similarity boost settings, and preview/synthesize with ElevenLabs TTS. Backend: - New elevenlabs_voices.py service with 1-hour cached API fetching - TTS routes support ?provider= query param for voices and options - Preview endpoint routes to ElevenLabs or Gemini based on provider - stability/similarity_boost params flow through TTS synthesis pipeline - TTSPreferences model extended with ElevenLabs-specific fields - Deprecated hardcoded elevenlabs_voices config (now fetched dynamically) Frontend: - Provider toggle (Gemini/ElevenLabs) in VoiceSelector - ElevenLabsSettingsPanel with stability and similarity boost sliders - VoicePreviewButton supports provider-specific preview parameters - API client passes provider param to voices, options, and preview endpoints - New VoiceInfo, ProviderVoicesResponse, ProviderOptionsResponse types Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:58:56 +00:00
michael	9580979ac8	feat: add environment-based worker concurrency for Cloud Run mode Allow configuring Celery worker concurrency via environment variables to take advantage of Cloud Run autoscaling: - Add WORKER_CONCURRENCY, WHISPER_WORKER_CONCURRENCY, FFMPEG_WORKER_CONCURRENCY settings to config.py with recommended values documented - Update Dockerfile to use ${WORKER_CONCURRENCY} and ${WHISPER_WORKER_CONCURRENCY} environment variables instead of hardcoded values - Update docker-compose.yml to pass concurrency env vars to worker commands - Add WHISPER_SERVICE_URL and FFMPEG_SERVICE_URL to relevant workers Recommended settings: Local mode: WHISPER=1, FFMPEG=1 (CPU/RAM constrained) Cloud Run mode: WHISPER=10, FFMPEG=20 (match autoscaling limits) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-02 10:27:07 -06:00
michael	79440929f4	feat: add Cloud Run HTTP services for Whisper and FFmpeg Migrate CPU-intensive workloads to Cloud Run for autoscaling: - Add Whisper HTTP service (FastAPI) with /transcribe endpoint - Add FFmpeg HTTP service (FastAPI) with /encode, /probe, /extract-frame, etc. - Add Dockerfiles for both services (8 vCPU, 32GB RAM, Gen2) - Add Cloud Build config for CI/CD deployment - Add Cloud Run service YAML configs with scale-to-zero - Update whisper_transcribe.py to call Cloud Run when WHISPER_SERVICE_URL set - Update video_renderer.py to call Cloud Run when FFMPEG_SERVICE_URL set - Update whisper_service.py for Cloud Run compatibility (no settings dependency) - Add ffmpeg_service_url and whisper_service_url to config.py Services scale 0→N based on request load, falling back to local execution when service URLs are not configured (hybrid mode). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-02 10:12:50 -06:00
michael	d2d8e32819	feat: add video-native translation mode for multi-language content Add a new "Video Native Mode" translation option that re-processes the video through Gemini for each target language, generating captions and audio descriptions directly from visual context. This produces more natural and culturally appropriate content compared to traditional VTT text translation. Changes: - Add translation_mode field to RequestedOutputs (video_native \| traditional) - Create gemini_ingestion_targeted.md prompt for target language generation - Add extract_accessibility_targeted() method to Gemini service - Modify translate_and_synthesize task to handle both translation modes - Add Translation Mode UI selector in NewJob screen (video_native is default) - Remove transcreation UI (replaced by video_native mode) - Remove Google Translate service (replaced by Gemini translation) - Add LanguageSelector component with searchable dropdown 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-31 13:50:05 -06:00
michael	54638d1065	feat: switch Whisper model from large-v3 to medium Medium model is faster and uses less memory (~1.5GB vs ~3GB) while still providing good multilingual transcription quality. Updated in: - config.py - docker-compose.yml - whisper-worker-service.yaml - cloudbuild.yaml - Dockerfile (pre-download) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 22:35:47 -06:00
michael	3538dea47f	fix: update whisper_max_search_window to 30s in config.py The setting in config.py (5.0) was overriding the default in whisper_service.py (30.0). Now both are consistent at 30s. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 21:53:57 -06:00
michael	614ff841fe	feat: upgrade Whisper model from base to large-v3 Uses the multilingual large model for more accurate transcription and sentence boundary detection. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-28 21:20:03 -06:00
michael	05bde8326d	feat: add Whisper-based pause point refinement for audio descriptions Implements word-level speech analysis using faster-whisper to refine AD pause points. Gemini's timestamps are snapped to natural speech gaps (sentence/phrase boundaries) to prevent pauses mid-word. Key changes: - Add WhisperService for transcription and gap detection - Add dedicated Celery task routed to 'whisper' queue - Integrate refinement into render_accessible_video task - Cache Whisper transcripts in MongoDB for reuse across languages - Add dedicated whisper-worker with concurrency=1 to prevent OOM Configuration: - Uses faster-whisper 'base' model (multilingual, ~145MB) - 5-second search window after Gemini's recommended point - Falls back to original timestamp if no gap found Infrastructure: - New Docker stage: whisper-worker - New Cloud Run service: accessible-video-whisper-worker - Updated docker-compose.yml with whisper-worker service 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-27 08:27:48 -06:00
michael	6effe58dc9	feat: add video review with timestamped notes to Final Review Add a comprehensive video review feature to the Final Review page that allows reviewers to watch videos with caption overlays and add timestamped notes. Backend: - New ReviewNote model for MongoDB with job_id, asset_key, timestamp, content - CRUD API endpoints at /jobs/{job_id}/review-notes - Owner-only edit/delete permissions (admins can bypass) - Database indexes for efficient querying Frontend: - VideoReviewPlayer component with video player and caption overlay - NotesSidebar for viewing/adding notes with auto-highlight when video reaches timestamp - SyncedCaptionList with auto-scroll and click-to-seek - AssetTabs for switching between languages and accessible videos - React Query hooks with 30s polling for collaborative updates Features: - Notes persist to database and are shared across all reviewers - Notes highlight for 5 seconds when video playback reaches their timestamp - Click note to seek video to that position - Pause video to add note at current timestamp - Accessible videos use retimed captions when available 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 15:30:00 -06:00
michael	865fcdc246	feat: add TTS settings panel with model, speed, and style options - Add model selection (flash vs pro) for quality control - Add speed slider (0.5x - 2.0x) for pacing adjustment - Add style presets (neutral, calm, energetic, professional, warm, documentary) - Add custom style prompt option for advanced customization - New /tts/options endpoint returns available TTS options - Voice preview now tests all settings so users hear exact output - Backward compatible: all new fields have sensible defaults 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 15:22:14 -06:00
michael	29643f6683	upgrade TTS to Gemini TTS with voice selection and preview - Add Gemini TTS service with 30 voices and 24 languages - Add TTS API endpoints for voice listing and preview - Add per-language voice selection in job creation form - Add voice override at QC approval stage - Add VoiceSelector and VoicePreviewButton components - Update TTSPreferences model with provider and voice mapping 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 14:41:57 -06:00
michael	665b49c3f1	added MSAL Microsoft authentication	2025-10-10 09:19:39 -05:00
michael	7ea23b9858	fixed objectID/stringID mismatch	2025-10-08 18:23:05 -05:00
michael	1a1ed3048d	wrote docker files and deployment instructions	2025-10-08 16:00:12 -05:00
michael	af2562096a	initial commit	2025-08-24 16:28:33 -05:00

44 commits
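
Commit 2f4925353a describes a forward-preferred snap: `snap_pause_point` prefers boundaries up to 4s ahead over boundaries within 1.5s behind. The sketch below is one possible reading of that rule, not the repository's code. The function name `snap_pause_point_sketch`, its signature, the choice of the nearest boundary inside each window, and the fallback to the unchanged timestamp when neither window contains a boundary are all assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Sequence


def snap_pause_point_sketch(
    target_s: float,
    boundaries_s: Sequence[float],
    forward_window_s: float = 4.0,   # from the commit message: up to 4s ahead
    backward_window_s: float = 1.5,  # from the commit message: within 1.5s behind
) -> float:
    """Snap target_s to a speech boundary, preferring boundaries ahead of it.

    Hypothetical helper: boundaries_s is assumed to be sorted ascending
    and expressed in seconds, like target_s.
    """
    # Index of the first boundary at or after the target.
    i = bisect_left(boundaries_s, target_s)

    # Forward candidate wins whenever it lies inside the forward window,
    # even if a backward boundary is closer.
    if i < len(boundaries_s) and boundaries_s[i] - target_s <= forward_window_s:
        return boundaries_s[i]

    # Otherwise try the closest boundary strictly behind the target.
    if i > 0 and target_s - boundaries_s[i - 1] <= backward_window_s:
        return boundaries_s[i - 1]

    # Assumed fallback: no boundary in either window, keep the timestamp.
    return target_s
```

With a target of 10.0 and boundaries at 9.5 and 13.0, the sketch returns 13.0 even though 9.5 is closer. This is the forward preference that the commit message credits with reducing mid-scene cuts.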