video-accessibility/CLAUDE.md
Vadym Samoilenko ae2c474061 feat: integrate oliver-cost-tracker SDK into video-accessibility
Add AI cost tracking to all Gemini and TTS call sites:

- config.py: add COST_TRACKER_* env vars (base_url, api_key, source_app,
  outbox_path, enabled)
- dependencies.py: add get_cost_tracker() factory (lru_cache, graceful
  degradation if SDK not installed)
- models/job.py: add cost_tracker_project_id field for cost attribution
- services/gemini.py:
  - add import time, _record_gemini_usage() helper (reads usage_metadata)
  - add _cost_ctx kwarg to extract_accessibility, extract_accessibility_targeted,
    transcreate_content, translate_vtt, rewrite_tts_cue
  - record usage after every generate_content call via asyncio.create_task()
- tasks/ingest_and_ai.py: pass _cost_ctx (user_id, job_id, project_id) to
  extract_accessibility
- tasks/translate_and_synthesize.py: build _cost_ctx from job_doc and pass
  to transcreate_content + translate_vtt calls
- tasks/tts_synthesis.py: add user_id + cost_project_id kwargs, add
  _record_tts_cost() helper (records len(text) chars to cost tracker)
- pyproject.toml: document SDK install instructions (comment)
- .env.prod.example: add COST_TRACKER_* vars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 11:30:46 +01:00

Accessible Video Processing Platform - Development Guide

Project Overview

This platform uses AI to automatically generate closed captions and audio descriptions for video, with quality control workflows and multi-language support.

Core Tech Stack:

  • Frontend: React 18 + Vite SPA (TypeScript)
  • Backend: FastAPI + Celery workers (Python 3.11+)
  • Database: MongoDB Atlas
  • Storage: Google Cloud Storage with signed URLs
  • AI: Gemini 2.5 Pro, Google Cloud Translate, ElevenLabs TTS
  • Queue: Redis + Celery
  • Auth: JWT with HttpOnly refresh cookies

Development Instructions

CRITICAL: Always Read the Full Development Plan

Before starting any development work, ALWAYS read the entire video_accessibility_development_plan.txt file. This document contains:

  • Complete technical specifications
  • API contracts and schemas
  • Database models and indexes
  • Worker pipeline details
  • Frontend component specifications
  • Security requirements
  • Testing strategies

The development plan is the authoritative source for all implementation details. Refer to it frequently during development to ensure consistency with the overall architecture.

Key Implementation Phases

Phase 1: Foundation & Setup

  • Monorepo structure (backend/, frontend/, infra/)
  • FastAPI backend initialization
  • React + Vite frontend setup
  • MongoDB and Redis configuration
  • JWT authentication with RBAC

Phase 2: Core Services

  • Google Cloud Storage integration
  • Gemini 2.5 Pro service
  • Job model with state machine
  • Celery worker infrastructure

Phase 3: Ingestion & AI Pipeline

  • Video upload system
  • Ingestion worker task
  • VTT generation
  • Gemini prompt system

Phase 4: Quality Control System

  • VTT editor component
  • QC dashboard for reviewers
  • Approval/rejection workflow
  • Video player with captions

Phase 5: Translation & TTS Pipeline

  • Google Cloud Translate integration
  • Transcreation system
  • Translation worker
  • TTS service integration

Phase 6: Final Review & Delivery

  • Final review interface
  • Job completion workflow
  • Email notifications
  • Client download portal

Phase 7: Production Readiness

  • Comprehensive testing
  • Security hardening
  • Observability setup
  • CI/CD configuration

Job Status State Machine

created → ingesting → ai_processing → translating → tts_generating → rendering_video → pending_qc → pending_final_review → completed
                                                                                           ↓
                                                                                       rejected
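
The diagram above can be sketched as an explicit transition table. This is a minimal illustration, not the project's actual job model: the names `JobStatus`, `ALLOWED_TRANSITIONS`, `InvalidTransition`, and `transition` are hypothetical, and it assumes that `rejected` branches only from `pending_qc` (as the arrow placement suggests) and that `completed` and `rejected` are terminal.

```python
from enum import Enum


class JobStatus(str, Enum):
    CREATED = "created"
    INGESTING = "ingesting"
    AI_PROCESSING = "ai_processing"
    TRANSLATING = "translating"
    TTS_GENERATING = "tts_generating"
    RENDERING_VIDEO = "rendering_video"
    PENDING_QC = "pending_qc"
    PENDING_FINAL_REVIEW = "pending_final_review"
    COMPLETED = "completed"
    REJECTED = "rejected"


# Transitions read off the diagram. Assumption: "rejected" is reachable
# only from pending_qc, and completed/rejected have no outgoing edges.
ALLOWED_TRANSITIONS: dict[JobStatus, frozenset[JobStatus]] = {
    JobStatus.CREATED: frozenset({JobStatus.INGESTING}),
    JobStatus.INGESTING: frozenset({JobStatus.AI_PROCESSING}),
    JobStatus.AI_PROCESSING: frozenset({JobStatus.TRANSLATING}),
    JobStatus.TRANSLATING: frozenset({JobStatus.TTS_GENERATING}),
    JobStatus.TTS_GENERATING: frozenset({JobStatus.RENDERING_VIDEO}),
    JobStatus.RENDERING_VIDEO: frozenset({JobStatus.PENDING_QC}),
    JobStatus.PENDING_QC: frozenset(
        {JobStatus.PENDING_FINAL_REVIEW, JobStatus.REJECTED}
    ),
    JobStatus.PENDING_FINAL_REVIEW: frozenset({JobStatus.COMPLETED}),
    JobStatus.COMPLETED: frozenset(),
    JobStatus.REJECTED: frozenset(),
}


class InvalidTransition(ValueError):
    """Raised when a job is asked to skip or reverse a pipeline step."""


def transition(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return the new status, or raise if the diagram does not allow the move."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the table in one place means workers and API handlers can share a single check instead of each encoding its own idea of the pipeline order.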

Key Architecture Decisions

Security

  • Access tokens stored in memory (not localStorage)
  • Refresh tokens in HttpOnly cookies
  • RBAC enforcement server-side
  • Signed URLs for file access (24h expiry)
  • Audit logs for all reviewer actions
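
As an illustration of the refresh-token rule above, the following sketch builds a `Set-Cookie` header with the attributes that keep the token away from page scripts. It uses only the standard library so it can run on its own; in the FastAPI backend the same attributes would normally be set through the framework's response object. The cookie name, path, and lifetime are hypothetical values, not taken from the plan.

```python
from http.cookies import SimpleCookie

# Hypothetical values for illustration; the real ones belong in the plan/config.
REFRESH_COOKIE_NAME = "refresh_token"
REFRESH_COOKIE_PATH = "/auth/refresh"
REFRESH_MAX_AGE_SECONDS = 14 * 24 * 3600


def build_refresh_cookie_header(token: str) -> str:
    """Return the value of a Set-Cookie header for the refresh token."""
    cookie = SimpleCookie()
    cookie[REFRESH_COOKIE_NAME] = token
    morsel = cookie[REFRESH_COOKIE_NAME]
    morsel["httponly"] = True      # not readable from JavaScript
    morsel["secure"] = True        # only sent over HTTPS
    morsel["samesite"] = "Strict"  # not sent on cross-site requests
    morsel["path"] = REFRESH_COOKIE_PATH  # only sent to the refresh endpoint
    morsel["max-age"] = REFRESH_MAX_AGE_SECONDS
    return morsel.OutputString()
```

The access token, by contrast, is returned in the response body and held in memory by the SPA, so it never appears in a cookie or in localStorage.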

Data Flow

  1. Client uploads MP4 → GCS + MongoDB record
  2. Celery worker processes video with Gemini 2.5 Pro
  3. Generates captions.vtt and audio_description.vtt for source language
  4. Translation, TTS synthesis, and accessible video rendering run automatically
  5. Job enters QC Review for reviewer approval (edits can trigger re-rendering)
  6. QC approval moves job directly to Final Review
  7. After final review, the client is notified with download links

File Structure

gs://accessible-video/{jobId}/
  source.mp4
  en/
    captions.vtt
    ad.vtt
    ad.mp3
  {lang}/
    captions.vtt
    ad.vtt
    ad.mp3
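
A small path helper can keep every worker writing to the same layout. The sketch below mirrors the listing above; the function names (`source_path`, `artifact_path`, `gs_uri`) are hypothetical, and the set of per-language artifacts is assumed to be exactly the three files shown.

```python
from pathlib import PurePosixPath

BUCKET = "accessible-video"

# Per-language artifacts, as they appear in the listing above.
ARTIFACTS = ("captions.vtt", "ad.vtt", "ad.mp3")


def source_path(job_id: str) -> str:
    """Object path of the uploaded source video for a job."""
    return str(PurePosixPath(job_id) / "source.mp4")


def artifact_path(job_id: str, lang: str, artifact: str) -> str:
    """Object path of a per-language artifact, e.g. 'job123/en/captions.vtt'."""
    if artifact not in ARTIFACTS:
        raise ValueError(f"unknown artifact: {artifact!r}")
    return str(PurePosixPath(job_id) / lang / artifact)


def gs_uri(object_path: str) -> str:
    """Full gs:// URI for an object path inside the bucket."""
    return f"gs://{BUCKET}/{object_path}"
```

For example, `gs_uri(artifact_path("job123", "fr", "ad.mp3"))` yields `gs://accessible-video/job123/fr/ad.mp3`.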

Development Guidelines

Before Each Session

  1. Read the complete video_accessibility_development_plan.txt
  2. Review the current todo list and phase
  3. Check existing code patterns and conventions
  4. Understand the security and accessibility requirements

Code Standards

  • Follow existing patterns in the codebase
  • Implement proper error handling and retries
  • Add OpenTelemetry tracing for observability
  • Ensure RBAC is enforced on all endpoints
  • Validate all VTT outputs for correctness
  • Write unit tests for all services and utilities
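
For the VTT validation point above, a minimal check might look like the following. It is a sketch rather than a full WebVTT parser: it verifies only the `WEBVTT` header, the shape of each timing line, that each cue ends after it starts, and that a text line follows. The function name `validate_vtt` and the error message format are hypothetical.

```python
import re

# One timestamp: optional hours, then MM:SS.mmm
_TS = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
# Timing line: start --> end, optionally followed by cue settings
_TIMING = re.compile(rf"^{_TS}\s+-->\s+{_TS}(?:\s+.*)?$")


def _to_ms(hours, minutes, seconds, millis) -> int:
    """Convert captured timestamp groups to milliseconds."""
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def validate_vtt(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    errors: list[str] = []
    lines = text.replace("\r\n", "\n").split("\n")
    if not lines[0].lstrip("\ufeff").startswith("WEBVTT"):
        errors.append("missing WEBVTT header")
    for number, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue
        match = _TIMING.match(line.strip())
        if match is None:
            errors.append(f"line {number}: malformed timing line")
            continue
        groups = match.groups()
        start, end = _to_ms(*groups[:4]), _to_ms(*groups[4:])
        if end <= start:
            errors.append(f"line {number}: cue ends at or before its start")
        # lines[number] is the line right after the timing line (0-based index)
        if number >= len(lines) or not lines[number].strip():
            errors.append(f"line {number}: cue has no text")
    return errors
```

Running a check like this on every generated or translated file before upload lets a worker fail the task early instead of handing a broken track to reviewers.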

Testing Requirements

  • Unit tests ≥80% coverage for services/utils
  • Integration tests with mocked AI services
  • E2E tests for complete workflows
  • Performance testing for video processing

Lint/Type Check Commands

  • Backend: ruff check . and mypy .
  • Frontend: npm run lint and npm run type-check

Important Files to Reference

  • video_accessibility_development_plan.txt - Complete specification
  • Backend schemas in section 17 of the plan
  • API design in section 7 of the plan
  • Frontend component specs in section 10 of the plan
  • Security requirements in section 11 of the plan

Risk Mitigations

  • Invalid JSON from AI models: Pydantic validation + self-heal prompts
  • Timestamp drift: Preserve cue timings in translations
  • TTS alignment: Per-cue synthesis with crossfades
  • Queue backlog: Autoscaling workers with monitoring
  • Security: Secret Manager, least-privilege IAM, no secrets shipped to the client
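
The first mitigation above (validate, then re-prompt on failure) can be sketched as a small retry loop. To stay dependency-free, this example swaps Pydantic for `json.loads` plus a manual key check; the project itself names Pydantic for the validation step. The required keys, the `generate` callable, and the wording of the repair prompt are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical schema: the top-level keys the model is expected to return.
REQUIRED_KEYS = ("captions", "audio_description")


def validate_payload(raw: str) -> dict:
    """Parse and check model output; raises ValueError on any problem."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


def generate_with_self_heal(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model, feeding validation errors back until output is valid."""
    current_prompt = prompt
    last_error: Exception | None = None
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            return validate_payload(raw)
        except ValueError as exc:
            last_error = exc
            # Self-heal: tell the model what was wrong and ask again.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected: {exc}\n"
                "Return only valid JSON."
            )
    raise RuntimeError(
        f"model output still invalid after {max_attempts} attempts"
    ) from last_error
```

Passing the model call in as a plain callable keeps the loop easy to exercise in integration tests with a mocked AI service.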

Knowledge Wiki

A cross-project knowledge base is maintained automatically from all Claude Code sessions.

  • Index: /Users/aimpress/Library/Mobile Documents/iCloud~md~obsidian/Documents/VadymSamoilenko/wiki/index.md
  • Query: cd ~/.claude/memory-compiler && uv run python scripts/query.py "your question"
  • Every session in this project automatically feeds the knowledge base.