Add AI cost tracking to all Gemini and TTS call sites:
- config.py: add COST_TRACKER_* env vars (base_url, api_key, source_app,
  outbox_path, enabled)
- dependencies.py: add get_cost_tracker() factory (lru_cache, graceful
  degradation if SDK not installed)
- models/job.py: add cost_tracker_project_id field for cost attribution
- services/gemini.py:
  - add import time, _record_gemini_usage() helper (reads usage_metadata)
  - add _cost_ctx kwarg to extract_accessibility, extract_accessibility_targeted,
    transcreate_content, translate_vtt, rewrite_tts_cue
  - record usage after every generate_content call via asyncio.create_task()
- tasks/ingest_and_ai.py: pass _cost_ctx (user_id, job_id, project_id) to
  extract_accessibility
- tasks/translate_and_synthesize.py: build _cost_ctx from job_doc and pass
  to transcreate_content + translate_vtt calls
- tasks/tts_synthesis.py: add user_id + cost_project_id kwargs, add
  _record_tts_cost() helper (records len(text) chars to cost tracker)
- pyproject.toml: document SDK install instructions (comment)
- .env.prod.example: add COST_TRACKER_* vars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Accessible Video Processing Platform - Development Guide

## Project Overview
This is a comprehensive video accessibility platform that automatically generates closed captions and audio descriptions using AI, with quality control workflows and multi-language support.

**Core Tech Stack:**
- Frontend: React 18 + Vite SPA (TypeScript)
- Backend: FastAPI + Celery workers (Python 3.11+)
- Database: MongoDB Atlas
- Storage: Google Cloud Storage with signed URLs
- AI: Gemini 2.5 Pro, Google Cloud Translate, ElevenLabs TTS
- Queue: Redis + Celery
- Auth: JWT with HttpOnly refresh cookies

## Development Instructions

### CRITICAL: Always Read the Full Development Plan
**Before starting any development work, ALWAYS read the entire `video_accessibility_development_plan.txt` file.** This document contains:
- Complete technical specifications
- API contracts and schemas
- Database models and indexes
- Worker pipeline details
- Frontend component specifications
- Security requirements
- Testing strategies

The development plan is the authoritative source for all implementation details. Refer to it frequently during development to ensure consistency with the overall architecture.

## Key Implementation Phases

### Phase 1: Foundation & Setup
- Monorepo structure (backend/, frontend/, infra/)
- FastAPI backend initialization
- React + Vite frontend setup
- MongoDB and Redis configuration
- JWT authentication with RBAC

### Phase 2: Core Services
- Google Cloud Storage integration
- Gemini 2.5 Pro service
- Job model with state machine
- Celery worker infrastructure

### Phase 3: Ingestion & AI Pipeline
- Video upload system
- Ingestion worker task
- VTT generation
- Gemini prompt system

### Phase 4: Quality Control System
- VTT editor component
- QC dashboard for reviewers
- Approval/rejection workflow
- Video player with captions

### Phase 5: Translation & TTS Pipeline
- Google Cloud Translate integration
- Transcreation system
- Translation worker
- TTS service integration

### Phase 6: Final Review & Delivery
- Final review interface
- Job completion workflow
- Email notifications
- Client download portal

### Phase 7: Production Readiness
- Comprehensive testing
- Security hardening
- Observability setup
- CI/CD configuration

## Job Status State Machine
```
created → ingesting → ai_processing → translating → tts_generating → rendering_video → pending_qc → pending_final_review → completed
↓
rejected
```

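The diagram above can be encoded as a transition table that the job model checks before changing status. The following is a minimal sketch, not the project's actual model: the names `JobStatus`, `ALLOWED`, `can_transition`, and `transition` are hypothetical, and because the diagram does not show which state the `rejected` branch leaves from, the sketch assumes it is reachable only from the two review states.

```python
from enum import Enum


class JobStatus(str, Enum):
    CREATED = "created"
    INGESTING = "ingesting"
    AI_PROCESSING = "ai_processing"
    TRANSLATING = "translating"
    TTS_GENERATING = "tts_generating"
    RENDERING_VIDEO = "rendering_video"
    PENDING_QC = "pending_qc"
    PENDING_FINAL_REVIEW = "pending_final_review"
    COMPLETED = "completed"
    REJECTED = "rejected"


# Happy path, in the order shown in the diagram.
_PIPELINE = [
    JobStatus.CREATED,
    JobStatus.INGESTING,
    JobStatus.AI_PROCESSING,
    JobStatus.TRANSLATING,
    JobStatus.TTS_GENERATING,
    JobStatus.RENDERING_VIDEO,
    JobStatus.PENDING_QC,
    JobStatus.PENDING_FINAL_REVIEW,
    JobStatus.COMPLETED,
]

# Assumption: only the review states can move to "rejected".
_REJECTABLE = {JobStatus.PENDING_QC, JobStatus.PENDING_FINAL_REVIEW}

# Map each state to the set of states it may move to next.
ALLOWED = {current: {nxt} for current, nxt in zip(_PIPELINE, _PIPELINE[1:])}
for status in _REJECTABLE:
    ALLOWED[status].add(JobStatus.REJECTED)


def can_transition(current: JobStatus, target: JobStatus) -> bool:
    """Return True when the table allows moving from current to target."""
    return target in ALLOWED.get(current, set())


def transition(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return the new status, or raise if the move is not in the table."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table in one place means workers and API handlers can share a single check instead of comparing status strings ad hoc.
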
## Key Architecture Decisions

### Security
- Access tokens stored in memory (not localStorage)
- Refresh tokens in HttpOnly cookies
- RBAC enforcement server-side
- Signed URLs for file access (24h expiry)
- Audit logs for all reviewer actions

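The 24h signed-URL expiry listed above could be centralized in one helper so every download link uses the same lifetime. This is a sketch under assumptions: `SIGNED_URL_TTL`, `SignableBlob`, and `download_url` are hypothetical names, and the helper accepts any object exposing a `generate_signed_url` method with these keyword arguments, which is the shape of `Blob.generate_signed_url` in the `google-cloud-storage` client.

```python
from datetime import timedelta
from typing import Protocol

# Expiry taken from the Security list: signed URLs live for 24 hours.
SIGNED_URL_TTL = timedelta(hours=24)


class SignableBlob(Protocol):
    """Structural type for objects that can sign a URL for themselves."""

    def generate_signed_url(
        self, *, version: str, expiration: timedelta, method: str
    ) -> str: ...


def download_url(blob: SignableBlob) -> str:
    """Return a read-only signed URL that expires after SIGNED_URL_TTL."""
    return blob.generate_signed_url(
        version="v4",
        expiration=SIGNED_URL_TTL,
        method="GET",
    )
```

Because the helper depends only on the method's shape, tests can pass a small fake object instead of a real GCS blob.
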
### Data Flow
1. Client uploads MP4 → GCS + MongoDB record
2. Celery worker processes video with Gemini 2.5 Pro
3. Generates captions.vtt and audio_description.vtt for source language
4. Translation, TTS synthesis, and accessible video rendering run automatically
5. Job enters QC Review for reviewer approval (edits can trigger re-rendering)
6. QC approval moves job directly to Final Review
7. Final review and client notification with download links

### File Structure
```
gs://accessible-video/{jobId}/
  source.mp4
  en/
    captions.vtt
    ad.vtt
    ad.mp3
  {lang}/
    captions.vtt
    ad.vtt
    ad.mp3
```

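Building object paths in one place keeps workers and API handlers consistent with the layout above. The sketch below is illustrative only: `BUCKET` and the file names come from the layout, while `ARTIFACTS`, `source_uri`, and `artifact_uri` are hypothetical helper names.

```python
BUCKET = "accessible-video"

# Per-language artifacts from the layout above, keyed by a short
# hypothetical label.
ARTIFACTS = {
    "captions": "captions.vtt",
    "ad_text": "ad.vtt",
    "ad_audio": "ad.mp3",
}


def source_uri(job_id: str) -> str:
    """Return the URI of the uploaded source video for a job."""
    return f"gs://{BUCKET}/{job_id}/source.mp4"


def artifact_uri(job_id: str, lang: str, kind: str) -> str:
    """Return the URI of one per-language artifact for a job."""
    if kind not in ARTIFACTS:
        raise ValueError(f"unknown artifact kind: {kind!r}")
    return f"gs://{BUCKET}/{job_id}/{lang}/{ARTIFACTS[kind]}"
```

For example, `artifact_uri("job123", "es", "ad_audio")` yields the Spanish audio-description MP3 path for job `job123`.
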
## Development Guidelines

### Before Each Session
1. Read the complete `video_accessibility_development_plan.txt`
2. Review the current todo list and phase
3. Check existing code patterns and conventions
4. Understand the security and accessibility requirements

### Code Standards
- Follow existing patterns in the codebase
- Implement proper error handling and retries
- Add OpenTelemetry tracing for observability
- Ensure RBAC is enforced on all endpoints
- Validate all VTT outputs for correctness
- Write unit tests for all services and utilities

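As an illustration of the VTT validation standard above, a lightweight check can catch the most common structural problems before a file is stored. This is a minimal sketch covering only a subset of the WebVTT rules (header, timing-line format, end after start, non-decreasing start times); `validate_vtt` is a hypothetical name and is not the project's validator.

```python
import re

# One timestamp: optional hours, then mm:ss.ttt.
_TS = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
# A cue timing line: start --> end, optionally followed by cue settings.
_TIMING = re.compile(rf"^{_TS}\s+-->\s+{_TS}(?:\s+.*)?$")


def _seconds(hours, minutes, seconds, millis) -> float:
    """Convert captured timestamp parts to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def validate_vtt(text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    errors: list[str] = []
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        errors.append("missing WEBVTT header")

    previous_start = 0.0
    for number, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue  # header, cue text, identifiers, or blank lines
        match = _TIMING.match(line.strip())
        if match is None:
            errors.append(f"line {number}: malformed timing")
            continue
        parts = match.groups()
        start, end = _seconds(*parts[:4]), _seconds(*parts[4:])
        if end <= start:
            errors.append(f"line {number}: end must be after start")
        if start < previous_start:
            errors.append(f"line {number}: cue starts before previous cue")
        previous_start = start
    return errors
```

Returning a list of messages rather than raising on the first problem lets a worker log every issue in one pass.
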
### Testing Requirements
- Unit tests ≥80% coverage for services/utils
- Integration tests with mocked AI services
- E2E tests for complete workflows
- Performance testing for video processing

### Lint/Type Check Commands
- Backend: `ruff check .` and `mypy .`
- Frontend: `npm run lint` and `npm run type-check`

## Important Files to Reference
- `video_accessibility_development_plan.txt` - Complete specification
- Backend schemas in section 17 of the plan
- API design in section 7 of the plan
- Frontend component specs in section 10 of the plan
- Security requirements in section 11 of the plan

## Risk Mitigations
- Invalid JSON from AI models: Pydantic validation + self-heal prompts
- Timestamp drift: Preserve cue timings in translations
- TTS alignment: Per-cue synthesis with crossfades
- Queue backlog: Autoscaling workers with monitoring
- Security: Secret Manager, least-privilege IAM, no client secrets

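The "validation + self-heal prompts" mitigation above can be pictured as a retry loop that feeds the validation error back to the model. The sketch below is an assumption-laden illustration: it uses the standard-library `json` module and a required-keys check as a stand-in for Pydantic validation, and `generate_validated`, `call_model`, and the re-prompt wording are all hypothetical.

```python
import json
from typing import Callable


def generate_validated(
    call_model: Callable[[str], str],
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> dict:
    """Call the model until its reply parses and contains the required keys."""
    current_prompt = prompt
    last_error = ""
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            data = json.loads(raw)
            if not isinstance(data, dict):
                raise ValueError("top-level value must be a JSON object")
            missing = required_keys - data.keys()
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
            return data
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = str(exc)
            # Self-heal: repeat the original prompt plus the rejection reason.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected: {last_error}\n"
                "Return only valid JSON."
            )
    raise RuntimeError(f"no valid response after {max_attempts} attempts: {last_error}")
```

In a codebase that uses Pydantic, the `json.loads` and key check would typically be replaced by the model class's own validation call, with its validation error driving the re-prompt in the same way.
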
## Knowledge Wiki
A cross-project knowledge base is maintained automatically from all Claude Code sessions.
- **Index:** `/Users/aimpress/Library/Mobile Documents/iCloud~md~obsidian/Documents/VadymSamoilenko/wiki/index.md`
- **Query:** `cd ~/.claude/memory-compiler && uv run python scripts/query.py "your question"`
- Every session in this project automatically feeds the knowledge base.