Accessible Video Processing Platform - Development Guide
@AGENTS.md
Project Overview
This is a comprehensive video accessibility platform that automatically generates closed captions and audio descriptions using AI, with quality control workflows and multi-language support.
Core Tech Stack:
- Frontend: React 18 + Vite SPA (TypeScript)
- Backend: FastAPI + Celery workers (Python 3.11+)
- Database: MongoDB Atlas
- Storage: Google Cloud Storage with signed URLs
- AI: Gemini 2.5 Pro, Google Cloud Translate, ElevenLabs TTS
- Queue: Redis + Celery
- Auth: JWT with HttpOnly refresh cookies
Development Instructions
CRITICAL: Always Read the Full Development Plan
Before starting any development work, ALWAYS read the entire video_accessibility_development_plan.txt file. This document contains:
- Complete technical specifications
- API contracts and schemas
- Database models and indexes
- Worker pipeline details
- Frontend component specifications
- Security requirements
- Testing strategies
The development plan is the authoritative source for all implementation details. Refer to it frequently during development to ensure consistency with the overall architecture.
Key Implementation Phases
Phase 1: Foundation & Setup
- Monorepo structure (backend/, frontend/, infra/)
- FastAPI backend initialization
- React + Vite frontend setup
- MongoDB and Redis configuration
- JWT authentication with RBAC
Phase 2: Core Services
- Google Cloud Storage integration
- Gemini 2.5 Pro service
- Job model with state machine
- Celery worker infrastructure
Phase 3: Ingestion & AI Pipeline
- Video upload system
- Ingestion worker task
- VTT generation
- Gemini prompt system
Phase 4: Quality Control System
- VTT editor component
- QC dashboard for reviewers
- Approval/rejection workflow
- Video player with captions
Phase 5: Translation & TTS Pipeline
- Google Cloud Translate integration
- Transcreation system
- Translation worker
- TTS service integration
Phase 6: Final Review & Delivery
- Final review interface
- Job completion workflow
- Email notifications
- Client download portal
Phase 7: Production Readiness
- Comprehensive testing
- Security hardening
- Observability setup
- CI/CD configuration
Job Status State Machine
created → ingesting → ai_processing → translating → tts_generating → rendering_video → pending_qc → pending_final_review → completed
↓
rejected
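The state machine above can be sketched as a small transition table. This is a minimal illustration, not the project's Job model: the names `JobStatus`, `allowed_transitions`, and `transition` are hypothetical, and because the diagram does not show which state the rejected branch leaves from, the sketch assumes rejection is only possible from the two review states.

```python
from enum import Enum


class JobStatus(str, Enum):
    CREATED = "created"
    INGESTING = "ingesting"
    AI_PROCESSING = "ai_processing"
    TRANSLATING = "translating"
    TTS_GENERATING = "tts_generating"
    RENDERING_VIDEO = "rendering_video"
    PENDING_QC = "pending_qc"
    PENDING_FINAL_REVIEW = "pending_final_review"
    COMPLETED = "completed"
    REJECTED = "rejected"


# Happy path, in the order given by the diagram.
PIPELINE = [
    JobStatus.CREATED,
    JobStatus.INGESTING,
    JobStatus.AI_PROCESSING,
    JobStatus.TRANSLATING,
    JobStatus.TTS_GENERATING,
    JobStatus.RENDERING_VIDEO,
    JobStatus.PENDING_QC,
    JobStatus.PENDING_FINAL_REVIEW,
    JobStatus.COMPLETED,
]

# Assumption (not stated in the diagram): only review states can reject.
REJECTABLE = {JobStatus.PENDING_QC, JobStatus.PENDING_FINAL_REVIEW}


def allowed_transitions(status: JobStatus) -> set[JobStatus]:
    """Return the statuses a job may move to from `status`."""
    targets: set[JobStatus] = set()
    if status in PIPELINE:
        idx = PIPELINE.index(status)
        if idx + 1 < len(PIPELINE):
            targets.add(PIPELINE[idx + 1])
    if status in REJECTABLE:
        targets.add(JobStatus.REJECTED)
    return targets


def transition(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return `target` if the move is legal, otherwise raise ValueError."""
    if target not in allowed_transitions(current):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the happy path as an ordered list means a new pipeline stage only needs to be added in one place; terminal states (completed, rejected) naturally have no outgoing transitions.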
Key Architecture Decisions
Security
- Access tokens stored in memory (not localStorage)
- Refresh tokens in HttpOnly cookies
- RBAC enforcement server-side
- Signed URLs for file access (24h expiry)
- Audit logs for all reviewer actions
Data Flow
- Client uploads MP4 → GCS + MongoDB record
- Celery worker processes video with Gemini 2.5 Pro
- Generates captions.vtt and audio_description.vtt for source language
- Translation, TTS synthesis, and accessible video rendering run automatically
- Job enters QC Review for reviewer approval (edits can trigger re-rendering)
- QC approval moves job directly to Final Review
- Final review and client notification with download links
File Structure
gs://accessible-video/{jobId}/
  source.mp4
  en/
    captions.vtt
    ad.vtt
    ad.mp3
  {lang}/
    captions.vtt
    ad.vtt
    ad.mp3
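A small helper can keep object paths consistent with this layout. The sketch below only builds path strings and does not call Google Cloud Storage; the names `source_path`, `language_artifacts`, and `LanguageArtifacts` are hypothetical, and the bucket name is taken from the layout above.

```python
from dataclasses import dataclass

BUCKET = "accessible-video"  # bucket name as shown in the layout above


@dataclass(frozen=True)
class LanguageArtifacts:
    """Paths of the per-language outputs for one job."""

    captions: str
    ad_vtt: str
    ad_audio: str


def source_path(job_id: str) -> str:
    """Path of the uploaded source video for a job."""
    return f"gs://{BUCKET}/{job_id}/source.mp4"


def language_artifacts(job_id: str, lang: str) -> LanguageArtifacts:
    """Paths of captions, audio-description VTT, and audio for one language."""
    base = f"gs://{BUCKET}/{job_id}/{lang}"
    return LanguageArtifacts(
        captions=f"{base}/captions.vtt",
        ad_vtt=f"{base}/ad.vtt",
        ad_audio=f"{base}/ad.mp3",
    )
```

For example, `language_artifacts("job123", "es").captions` yields `gs://accessible-video/job123/es/captions.vtt`.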
Development Guidelines
Before Each Session
- Read the complete video_accessibility_development_plan.txt
- Review the current todo list and phase
- Check existing code patterns and conventions
- Understand the security and accessibility requirements
Code Standards
- Follow existing patterns in the codebase
- Implement proper error handling and retries
- Add OpenTelemetry tracing for observability
- Ensure RBAC is enforced on all endpoints
- Validate all VTT outputs for correctness
- Write unit tests for all services and utilities
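As an illustration of the VTT validation guideline above, here is a minimal sketch using only the standard library. It checks a few basics (header present, well-formed timing lines, end after start, non-empty cue text) and is not a full WebVTT parser; the function name `validate_vtt` and the exact set of checks are assumptions, not the project's validator.

```python
import re

# Timing line such as "00:00:01.000 --> 00:00:03.500" (hours are optional).
TIMING = re.compile(
    r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3}) --> "
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(h, m, s, ms) -> float:
    """Convert captured timestamp parts to seconds."""
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def validate_vtt(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    errors: list[str] = []
    lines = text.replace("\r\n", "\n").split("\n")
    if not lines[0].startswith("WEBVTT"):
        errors.append("missing WEBVTT header")
    cue_count = 0
    for i, line in enumerate(lines):
        if "-->" not in line:
            continue
        cue_count += 1
        match = TIMING.match(line.strip())
        if not match:
            errors.append(f"line {i + 1}: malformed timing")
            continue
        parts = match.groups()
        if _seconds(*parts[4:]) <= _seconds(*parts[:4]):
            errors.append(f"line {i + 1}: end time must be after start time")
        if i + 1 >= len(lines) or not lines[i + 1].strip():
            errors.append(f"line {i + 1}: cue has no text")
    if cue_count == 0:
        errors.append("no cues found")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a worker log every issue in one pass before deciding whether to retry or fail the job.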
Testing Requirements
- Unit tests ≥80% coverage for services/utils
- Integration tests with mocked AI services
- E2E tests for complete workflows
- Performance testing for video processing
Lint/Type Check Commands
- Backend: ruff check . and mypy .
- Frontend: npm run lint and npm run type-check
Important Files to Reference
- video_accessibility_development_plan.txt: Complete specification
- Backend schemas in section 17 of the plan
- API design in section 7 of the plan
- Frontend component specs in section 10 of the plan
- Security requirements in section 11 of the plan
Risk Mitigations
- Invalid JSON from AI models: Pydantic validation + self-heal prompts
- Timestamp drift: Preserve cue timings in translations
- TTS alignment: Per-cue synthesis with crossfades
- Queue backlog: Autoscaling workers with monitoring
- Security: Secret Manager, least-privilege IAM, no client secrets
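The timestamp-drift mitigation above (preserve cue timings in translations) can be sketched as follows: only the cue text is passed to the translator, while the header, cue identifiers, and timing lines are copied through unchanged. The function name `translate_vtt` is hypothetical, and the `translate` callable is a stand-in for whatever translation service is used; this is an illustration under those assumptions, not the project's translation worker.

```python
from typing import Callable


def translate_vtt(vtt: str, translate: Callable[[str], str]) -> str:
    """Translate cue text only; header, cue ids, and timings stay untouched."""
    out = []
    for block in vtt.strip().split("\n\n"):
        lines = block.split("\n")
        timing_idx = next(
            (i for i, ln in enumerate(lines) if "-->" in ln), None
        )
        if timing_idx is None:
            # Header or NOTE block: copy through unchanged.
            out.append(block)
            continue
        head = lines[: timing_idx + 1]  # optional cue id + timing line
        text = "\n".join(lines[timing_idx + 1 :])
        out.append("\n".join(head + [translate(text)]))
    return "\n\n".join(out) + "\n"
```

Because the timing lines never pass through the translator, the translated file has exactly the same cue boundaries as the source, which also keeps per-cue TTS synthesis aligned with the original.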
Knowledge Wiki
A cross-project knowledge base is maintained automatically from all Claude Code sessions.
- Index: /Users/aimpress/Library/Mobile Documents/iCloud~md~obsidian/Documents/VadymSamoilenko/wiki/index.md
- Query: cd ~/.claude/memory-compiler && uv run python scripts/query.py "your question"
- Every session in this project automatically feeds the knowledge base.