Vadym Samoilenko ae2c474061 feat: integrate oliver-cost-tracker SDK into video-accessibility

Add AI cost tracking to all Gemini and TTS call sites:

- config.py: add COST_TRACKER_* env vars (base_url, api_key, source_app,
  outbox_path, enabled)
- dependencies.py: add get_cost_tracker() factory (lru_cache, graceful
  degradation if SDK not installed)
- models/job.py: add cost_tracker_project_id field for cost attribution
- services/gemini.py:
  - add import time, _record_gemini_usage() helper (reads usage_metadata)
  - add _cost_ctx kwarg to extract_accessibility, extract_accessibility_targeted,
    transcreate_content, translate_vtt, rewrite_tts_cue
  - record usage after every generate_content call via asyncio.create_task()
- tasks/ingest_and_ai.py: pass _cost_ctx (user_id, job_id, project_id) to
  extract_accessibility
- tasks/translate_and_synthesize.py: build _cost_ctx from job_doc and pass
  to transcreate_content + translate_vtt calls
- tasks/tts_synthesis.py: add user_id + cost_project_id kwargs, add
  _record_tts_cost() helper (records len(text) chars to cost tracker)
- pyproject.toml: document SDK install instructions (comment)
- .env.prod.example: add COST_TRACKER_* vars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-27 11:30:46 +01:00


Accessible Video Processing Platform - Development Guide

Project Overview

This is a comprehensive video accessibility platform that automatically generates closed captions and audio descriptions using AI, with quality control workflows and multi-language support.

Core Tech Stack:

Frontend: React 18 + Vite SPA (TypeScript)
Backend: FastAPI + Celery workers (Python 3.11+)
Database: MongoDB Atlas
Storage: Google Cloud Storage with signed URLs
AI: Gemini 2.5 Pro, Google Cloud Translate, ElevenLabs TTS
Queue: Redis + Celery
Auth: JWT with HttpOnly refresh cookies

Development Instructions

CRITICAL: Always Read the Full Development Plan

Before starting any development work, ALWAYS read the entire video_accessibility_development_plan.txt file. This document contains:

Complete technical specifications
API contracts and schemas
Database models and indexes
Worker pipeline details
Frontend component specifications
Security requirements
Testing strategies

The development plan is the authoritative source for all implementation details. Refer to it frequently during development to ensure consistency with the overall architecture.

Key Implementation Phases

Phase 1: Foundation & Setup

Monorepo structure (backend/, frontend/, infra/)
FastAPI backend initialization
React + Vite frontend setup
MongoDB and Redis configuration
JWT authentication with RBAC

Phase 2: Core Services

Google Cloud Storage integration
Gemini 2.5 Pro service
Job model with state machine
Celery worker infrastructure

Phase 3: Ingestion & AI Pipeline

Video upload system
Ingestion worker task
VTT generation
Gemini prompt system

Phase 4: Quality Control System

VTT editor component
QC dashboard for reviewers
Approval/rejection workflow
Video player with captions

Phase 5: Translation & TTS Pipeline

Google Cloud Translate integration
Transcreation system
Translation worker
TTS service integration

Phase 6: Final Review & Delivery

Final review interface
Job completion workflow
Email notifications
Client download portal

Phase 7: Production Readiness

Comprehensive testing
Security hardening
Observability setup
CI/CD configuration

Job Status State Machine

created → ingesting → ai_processing → translating → tts_generating → rendering_video → pending_qc → pending_final_review → completed
                                                                                           ↓
                                                                                       rejected
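One way to enforce the diagram in code is an explicit transition table that every worker consults before writing a new status. This is a sketch, not the project's model: JobStatus, ALLOWED_TRANSITIONS and transition() are hypothetical names, only the transitions drawn above are allowed, and the arrow into rejected is read as leaving pending_qc. The QC-triggered re-rendering described under Data Flow is not drawn in the diagram, so it is not modeled here.

```python
from enum import Enum


class JobStatus(str, Enum):
    CREATED = "created"
    INGESTING = "ingesting"
    AI_PROCESSING = "ai_processing"
    TRANSLATING = "translating"
    TTS_GENERATING = "tts_generating"
    RENDERING_VIDEO = "rendering_video"
    PENDING_QC = "pending_qc"
    PENDING_FINAL_REVIEW = "pending_final_review"
    COMPLETED = "completed"
    REJECTED = "rejected"


# The happy path, in the order shown in the diagram.
_FORWARD = [
    JobStatus.CREATED, JobStatus.INGESTING, JobStatus.AI_PROCESSING,
    JobStatus.TRANSLATING, JobStatus.TTS_GENERATING, JobStatus.RENDERING_VIDEO,
    JobStatus.PENDING_QC, JobStatus.PENDING_FINAL_REVIEW, JobStatus.COMPLETED,
]

# Every status starts with no exits; terminal states (completed, rejected) keep none.
ALLOWED_TRANSITIONS: dict[JobStatus, set[JobStatus]] = {s: set() for s in JobStatus}
for _src, _dst in zip(_FORWARD, _FORWARD[1:]):
    ALLOWED_TRANSITIONS[_src].add(_dst)
# Assumption: rejection branches off pending_qc, as the diagram's arrow suggests.
ALLOWED_TRANSITIONS[JobStatus.PENDING_QC].add(JobStatus.REJECTED)


class InvalidTransition(ValueError):
    """Raised when a job is asked to move to a status the state machine forbids."""


def transition(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return target if current -> target is allowed, otherwise raise InvalidTransition."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

When persisting to MongoDB, pairing this check with a conditional update (for example pymongo's find_one_and_update filtered on both the job id and the expected current status) helps keep two workers from advancing the same job concurrently.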

Key Architecture Decisions

Security

Access tokens stored in memory (not localStorage)
Refresh tokens in HttpOnly cookies
RBAC enforcement server-side
Signed URLs for file access (24h expiry)
Audit logs for all reviewer actions
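The server-side RBAC and audit-log rules can be illustrated with a small permission check that runs before a handler and records the action after it succeeds. The roles, permission strings, User type, requires() decorator and in-memory AUDIT_LOG below are hypothetical; the real role matrix belongs to the security section of the development plan, and in FastAPI the same check would more naturally be written as a dependency injected with Depends().

```python
import functools
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable

# Hypothetical role -> permission matrix.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "client": frozenset({"job:create", "job:download"}),
    "reviewer": frozenset({"qc:review", "qc:approve", "qc:reject"}),
    "admin": frozenset({"job:create", "job:download", "qc:review", "qc:approve", "qc:reject"}),
}

# Stand-in for an append-only audit collection.
AUDIT_LOG: list[dict[str, Any]] = []


@dataclass(frozen=True)
class User:
    id: str
    role: str


class PermissionDenied(Exception):
    """Raised when the caller's role lacks the required permission."""


def requires(permission: str, audit: bool = False) -> Callable:
    """Decorator: enforce the permission server-side; optionally audit successful calls."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(user: User, *args: Any, **kwargs: Any) -> Any:
            # Unknown roles get no permissions (deny by default).
            if permission not in ROLE_PERMISSIONS.get(user.role, frozenset()):
                raise PermissionDenied(f"role {user.role!r} lacks {permission!r}")
            result = func(user, *args, **kwargs)
            if audit:
                AUDIT_LOG.append({
                    "user_id": user.id,
                    "action": permission,
                    "args": args,
                    "at": datetime.now(timezone.utc).isoformat(),
                })
            return result
        return wrapper
    return decorator


@requires("qc:approve", audit=True)
def approve_qc(user: User, job_id: str) -> str:
    # Hypothetical handler body: a real one would move the job to pending_final_review.
    return f"{job_id} approved by {user.id}"
```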

Data Flow

Client uploads MP4 → GCS + MongoDB record
Celery worker processes video with Gemini 2.5 Pro
Generates captions.vtt and audio_description.vtt for the source language
Translation, TTS synthesis, and accessible video rendering run automatically
Job enters QC Review for reviewer approval (edits can trigger re-rendering)
QC approval moves job directly to Final Review
Final review and client notification with download links

File Structure

gs://accessible-video/{jobId}/
  source.mp4
  en/
    captions.vtt
    ad.vtt
    ad.mp3
  {lang}/
    captions.vtt
    ad.vtt
    ad.mp3
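A small helper keeps object names consistent with the layout above. This sketch takes the bucket name accessible-video from the tree, treats the per-language files as a fixed set, and validates language codes against a loose BCP 47-style pattern (an assumption, not something the plan specifies) so that a malformed code cannot produce an unexpected path. All function names are hypothetical.

```python
import re

BUCKET = "accessible-video"
PER_LANGUAGE_FILES = ("captions.vtt", "ad.vtt", "ad.mp3")
# Assumed format: "en", "fra", "pt-BR", ... (loose BCP 47-style check).
_LANG_CODE = re.compile(r"[a-z]{2,3}(?:-[A-Za-z0-9]{2,8})*")


def _check_job_id(job_id: str) -> None:
    if not job_id or "/" in job_id or job_id in {".", ".."}:
        raise ValueError(f"invalid job id: {job_id!r}")


def source_object(job_id: str) -> str:
    """Object name of the uploaded source video, relative to the bucket."""
    _check_job_id(job_id)
    return f"{job_id}/source.mp4"


def language_object(job_id: str, lang: str, filename: str) -> str:
    """Object name of one per-language artifact, e.g. '<jobId>/en/captions.vtt'."""
    _check_job_id(job_id)
    if not _LANG_CODE.fullmatch(lang):
        raise ValueError(f"invalid language code: {lang!r}")
    if filename not in PER_LANGUAGE_FILES:
        raise ValueError(f"unexpected artifact: {filename!r}")
    return f"{job_id}/{lang}/{filename}"


def language_objects(job_id: str, lang: str) -> list[str]:
    """All artifacts expected for one language, in a stable order."""
    return [language_object(job_id, lang, name) for name in PER_LANGUAGE_FILES]


def gs_uri(object_name: str) -> str:
    """Full gs:// URI for logs or for APIs that accept gs:// URIs."""
    return f"gs://{BUCKET}/{object_name}"
```

The object names are relative to the bucket, which is the form google-cloud-storage's bucket.blob(name) expects; gs_uri() is only needed where a full URI is required.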

Development Guidelines

Before Each Session

Read the complete video_accessibility_development_plan.txt
Review the current todo list and phase
Check existing code patterns and conventions
Understand the security and accessibility requirements

Code Standards

Follow existing patterns in the codebase
Implement proper error handling and retries
Add OpenTelemetry tracing for observability
Ensure RBAC is enforced on all endpoints
Validate all VTT outputs for correctness
Write unit tests for all services and utilities
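For "Validate all VTT outputs for correctness", a first line of defence could be a structural check before any VTT is written to GCS. This sketch covers only a few rules from the WebVTT specification (the WEBVTT header, timestamp syntax, end time after start time, and cue start times that never go backwards); it is not a full parser, and validate_vtt is a hypothetical name.

```python
import re

# mm:ss.ttt or hh:mm:ss.ttt (hours, when present, have two or more digits).
_TS = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
_TIMING = re.compile(rf"{_TS}[ \t]+-->[ \t]+{_TS}(?:[ \t].*)?")
_HEADER = re.compile(r"WEBVTT(?:[ \t].*)?")


def _to_ms(hours, minutes, seconds, millis) -> int:
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def validate_vtt(text: str) -> list[str]:
    """Return human-readable problems; an empty list means these basic checks passed."""
    lines = text.replace("\r\n", "\n").split("\n")
    errors: list[str] = []
    # The header may be preceded by a byte-order mark.
    if not _HEADER.fullmatch(lines[0].lstrip("\ufeff")):
        errors.append("line 1: missing WEBVTT header")
    prev_start = -1
    cues = 0
    for lineno, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue  # identifiers, cue text and blank lines
        match = _TIMING.fullmatch(line.strip())
        if match is None:
            errors.append(f"line {lineno}: malformed cue timing")
            continue
        groups = match.groups()
        start, end = _to_ms(*groups[:4]), _to_ms(*groups[4:])
        if end <= start:
            errors.append(f"line {lineno}: cue end is not after its start")
        if start < prev_start:
            errors.append(f"line {lineno}: cue starts before the previous cue")
        prev_start = start
        cues += 1
    if cues == 0:
        errors.append("no cues found")
    return errors
```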

Testing Requirements

Unit tests with ≥80% coverage for services/utils
Integration tests with mocked AI services
E2E tests for complete workflows
Performance testing for video processing
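Integration tests can replace the Gemini service with unittest.mock.AsyncMock so that no network calls or API costs are incurred. The run_ai_step function below is a hypothetical stand-in for the AI stage in tasks/ingest_and_ai.py that receives the service as a parameter, which keeps the mock easy to inject. The extract_accessibility name and the _cost_ctx keyword come from the commit message above; the argument order and the shape of the returned dict are assumptions.

```python
import unittest
from unittest.mock import AsyncMock, Mock


async def run_ai_step(gemini, video_uri: str, cost_ctx: dict) -> dict:
    """Hypothetical AI stage: call the model service and check both tracks came back."""
    result = await gemini.extract_accessibility(video_uri, _cost_ctx=cost_ctx)
    missing = {"captions", "audio_description"}.difference(result)
    if missing:
        raise ValueError(f"AI response missing tracks: {sorted(missing)}")
    return result


class RunAiStepTest(unittest.IsolatedAsyncioTestCase):
    def make_gemini(self, payload: dict) -> Mock:
        gemini = Mock()
        # AsyncMock makes the method awaitable and records how it was awaited.
        gemini.extract_accessibility = AsyncMock(return_value=payload)
        return gemini

    async def test_returns_tracks_and_forwards_cost_context(self):
        gemini = self.make_gemini({"captions": "WEBVTT\n", "audio_description": "WEBVTT\n"})
        ctx = {"user_id": "u1", "job_id": "j1", "project_id": "p1"}
        uri = "gs://accessible-video/j1/source.mp4"

        result = await run_ai_step(gemini, uri, ctx)

        self.assertEqual(result["captions"], "WEBVTT\n")
        gemini.extract_accessibility.assert_awaited_once_with(uri, _cost_ctx=ctx)

    async def test_rejects_incomplete_response(self):
        gemini = self.make_gemini({"captions": "WEBVTT\n"})
        with self.assertRaises(ValueError):
            await run_ai_step(gemini, "gs://accessible-video/j1/source.mp4", {})
```

Run it with python -m unittest like any other test module. Injecting the service rather than patching a module path is a choice made for this sketch; unittest.mock.patch on the real import path works as well.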

Lint/Type Check Commands

Backend: ruff check . and mypy .
Frontend: npm run lint and npm run type-check

Important Files to Reference

video_accessibility_development_plan.txt - Complete specification
Backend schemas in section 17 of the plan
API design in section 7 of the plan
Frontend component specs in section 10 of the plan
Security requirements in section 11 of the plan

Risk Mitigations

Invalid JSON from AI models: Pydantic validation + self-heal prompts
Timestamp drift: Preserve cue timings in translations
TTS alignment: Per-cue synthesis with crossfades
Queue backlog: Autoscaling workers with monitoring
Security: Secret Manager, least-privilege IAM, no client secrets
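The "Pydantic validation + self-heal prompts" mitigation can be sketched as a bounded retry loop: parse the model output, validate it, and on failure re-prompt with the error message. To keep the example dependency-free, a hand-written check stands in for the Pydantic model the plan calls for (with Pydantic v2, BaseModel.model_validate_json() would take the place of parse_and_validate()). REQUIRED_FIELDS, call_model and generate_with_self_heal are hypothetical names.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Optional

# Hypothetical schema: field name -> expected JSON type.
REQUIRED_FIELDS: dict[str, type] = {"captions": list, "audio_description": list}


class SchemaError(ValueError):
    """Model output parsed as JSON but did not match the expected shape."""


def parse_and_validate(raw: str) -> dict[str, Any]:
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise SchemaError("top-level value must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise SchemaError(f"field {name!r} must be a {expected.__name__}")
    return data


async def generate_with_self_heal(
    call_model: Callable[[str], Awaitable[str]],
    prompt: str,
    max_attempts: int = 3,
    timeout_s: float = 120.0,
) -> dict[str, Any]:
    """Call the model until its output validates, feeding each error back into the prompt."""
    current_prompt = prompt
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        # Bound each call so a hung request cannot stall the worker indefinitely.
        raw = await asyncio.wait_for(call_model(current_prompt), timeout=timeout_s)
        try:
            return parse_and_validate(raw)
        except ValueError as exc:  # JSONDecodeError or SchemaError
            last_error = exc
            current_prompt = (
                f"{prompt}\n\nYour previous reply was invalid: {exc}. "
                "Reply again with only JSON that matches the requested schema."
            )
    raise RuntimeError(f"model output still invalid after {max_attempts} attempts") from last_error
```

Rebuilding the retry prompt from the original prompt plus only the latest error, rather than accumulating every failed reply, keeps it from growing with each attempt.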

Knowledge Wiki

A cross-project knowledge base is built automatically from all Claude Code sessions.

Index: /Users/aimpress/Library/Mobile Documents/iCloud~md~obsidian/Documents/VadymSamoilenko/wiki/index.md
Query: cd ~/.claude/memory-compiler && uv run python scripts/query.py "your question"
Every session in this project automatically feeds the knowledge base.
