
Accessible Video Processing Platform

An AI-powered platform for generating accessible video content including closed captions, audio descriptions, and multi-language translations.

Features

  • AI-Powered Processing: Uses Gemini 2.5 Pro for intelligent caption and audio description generation
  • Multi-Language Support: Automatic translation and cultural transcreation
  • Quality Control Workflow: Built-in review and approval process
  • Audio Description: Text-to-speech generation for voiceovers
  • Secure File Handling: Google Cloud Storage with signed URLs
  • Role-Based Access: Client, reviewer, and admin roles with appropriate permissions

Tech Stack

Backend

  • FastAPI - Modern Python web framework
  • Celery - Distributed task queue for video processing
  • MongoDB - Document database for job and user data
  • Redis - Task queue broker and caching
  • Google Cloud Services - Storage, AI, and TTS

Frontend

  • React 18 - UI framework
  • Vite - Fast build tool and dev server
  • TypeScript - Type safety
  • TanStack Query - Data fetching and caching
  • Tailwind CSS - Utility-first styling

Getting Started

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Poetry (for Python dependency management)
  • MongoDB (Atlas recommended)
  • Redis
  • Google Cloud Project with required APIs enabled

Installation

  1. Clone the repository and set up the environment:

    git clone <repository>
    cd accessible-video
    make setup-env
    
  2. Install dependencies:

    make install
    
  3. Configure environment variables:

    • Update backend/.env with your database, API keys, and service credentials
    • Update frontend/.env with your API base URL

Development

Start all services (requires tmux):

make dev

Or start services individually:

# Terminal 1 - Backend API
make dev-backend

# Terminal 2 - Frontend SPA  
make dev-frontend

# Terminal 3 - Celery Worker
make dev-worker

The application will be available at:

Testing

# Run all tests
make test-backend
make test-frontend

# Lint code
make lint

Architecture

Job Processing Pipeline

  1. Upload: Client uploads MP4 video
  2. Ingestion: Video is processed and analyzed by Gemini 2.5 Pro
  3. QC Review: Human reviewer approves/rejects English captions and audio descriptions
  4. Translation: Approved content is translated to target languages
  5. TTS Generation: Audio descriptions are converted to speech
  6. Final Review: Reviewer approves final multi-language assets
  7. Delivery: Client receives email with download links
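
The seven steps above can be read as a small state machine. The following is a minimal sketch of that idea, not the project's actual job model: the status names, the `TRANSITIONS` table, the `advance` helper, and the choice to treat a rejection as a terminal state are all assumptions made for illustration.

```python
from enum import Enum


class JobStatus(Enum):
    # Hypothetical status names; the real job model may use different ones.
    UPLOADED = "uploaded"
    INGESTING = "ingesting"
    QC_REVIEW = "qc_review"
    TRANSLATING = "translating"
    TTS_GENERATION = "tts_generation"
    FINAL_REVIEW = "final_review"
    DELIVERED = "delivered"
    REJECTED = "rejected"


# Allowed transitions, mirroring steps 1-7 of the pipeline.
# Assumption: a rejection at either review step ends the job.
TRANSITIONS = {
    JobStatus.UPLOADED: {JobStatus.INGESTING},
    JobStatus.INGESTING: {JobStatus.QC_REVIEW},
    JobStatus.QC_REVIEW: {JobStatus.TRANSLATING, JobStatus.REJECTED},
    JobStatus.TRANSLATING: {JobStatus.TTS_GENERATION},
    JobStatus.TTS_GENERATION: {JobStatus.FINAL_REVIEW},
    JobStatus.FINAL_REVIEW: {JobStatus.DELIVERED, JobStatus.REJECTED},
    JobStatus.DELIVERED: set(),
    JobStatus.REJECTED: set(),
}


def advance(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return the target status if the transition is allowed, else raise."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the allowed transitions in one table makes it harder for a background task to skip a human review step by accident.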

File Structure

backend/          # FastAPI application
├── app/
│   ├── api/      # REST API routes
│   ├── core/     # Configuration and shared utilities
│   ├── models/   # Pydantic data models
│   ├── services/ # External service integrations
│   ├── tasks/    # Celery background tasks
│   └── prompts/  # AI prompt templates
└── tests/        # Test suite

frontend/         # React SPA
├── src/
│   ├── components/ # Reusable UI components
│   ├── routes/     # Page components
│   ├── lib/        # Utilities and API client
│   ├── hooks/      # Custom React hooks
│   └── types/      # TypeScript definitions
└── public/       # Static assets

Configuration

Required Environment Variables

Backend (.env):

  • MONGODB_URI - MongoDB connection string
  • REDIS_URL - Redis connection string
  • JWT_SECRET - Secret for JWT token signing
  • GEMINI_API_KEY - Google Gemini API key
  • GCS_BUCKET - Google Cloud Storage bucket name
  • SENDGRID_API_KEY - SendGrid API key for email notifications
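
One way to catch a missing backend variable early is to validate all of them at startup. The sketch below uses only the standard library; the `Settings` dataclass and `load_settings` function are hypothetical names, and the project may well load its configuration differently (for example through a settings library).

```python
import os
from dataclasses import dataclass
from typing import Mapping

# The variable names come from the list above.
REQUIRED_VARS = (
    "MONGODB_URI",
    "REDIS_URL",
    "JWT_SECRET",
    "GEMINI_API_KEY",
    "GCS_BUCKET",
    "SENDGRID_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; field names are the lowercased variable names.
    mongodb_uri: str
    redis_url: str
    jwt_secret: str
    gemini_api_key: str
    gcs_bucket: str
    sendgrid_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from env, reporting every missing variable at once."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return Settings(**{name.lower(): env[name] for name in REQUIRED_VARS})
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.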

Frontend (.env):

  • VITE_API_BASE_URL - Backend API URL

Google Cloud Setup

  1. Create a GCP project
  2. Enable required APIs:
    • Cloud Storage API
    • Cloud Translation API
    • Cloud Text-to-Speech API
    • Vertex AI API (for Gemini)
  3. Create a service account with appropriate permissions
  4. Download the service account key and set GOOGLE_APPLICATION_CREDENTIALS to the path of the key file
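
Before starting the backend, it can help to confirm that the credentials variable points at a usable key file. The following is a minimal sketch using only the standard library; `check_credentials` is a hypothetical helper, not part of the project or of any Google client library. It relies on the fact that a downloaded service account key is a JSON file whose `type` field is `service_account`.

```python
import json
import os
from pathlib import Path
from typing import Mapping


def check_credentials(env: Mapping[str, str] = os.environ) -> str:
    """Validate the key file and return the service account email it names."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    key_file = Path(path)
    if not key_file.is_file():
        raise RuntimeError(f"key file not found: {key_file}")
    data = json.loads(key_file.read_text(encoding="utf-8"))
    if data.get("type") != "service_account":
        raise RuntimeError("key file is not a service account key")
    # client_email identifies the account; empty string if the field is absent.
    return data.get("client_email", "")
```

The check only inspects the file locally. It does not verify that the account has the permissions from step 3 or that the APIs from step 2 are enabled.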

Deployment

The application is designed for deployment on Google Cloud:

  • Backend: Cloud Run with auto-scaling
  • Workers: Cloud Run with Celery
  • Frontend: Cloud Storage + Cloud CDN
  • Database: MongoDB Atlas
  • Queue: Cloud Memorystore (Redis)

See the infra/ directory for deployment configurations.

Security

  • JWT authentication with refresh token rotation
  • Role-based access control (RBAC)
  • Signed URLs for secure file access
  • Audit logging for all reviewer actions
  • HTTPS enforcement in production
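
Role-based access control can be pictured as a lookup from role to permitted actions. The sketch below is illustrative only: the three roles come from the feature list, but the permission strings, the `PERMISSIONS` table, and the `require` helper are assumptions rather than the project's actual authorization code.

```python
from enum import Enum


class Role(Enum):
    CLIENT = "client"
    REVIEWER = "reviewer"
    ADMIN = "admin"


# Hypothetical permission names, chosen only to illustrate the lookup.
PERMISSIONS = {
    Role.CLIENT: {"job:create", "job:read_own"},
    Role.REVIEWER: {"job:read_any", "job:review"},
    Role.ADMIN: {
        "job:create",
        "job:read_own",
        "job:read_any",
        "job:review",
        "user:manage",
    },
}


def require(role: Role, permission: str) -> None:
    """Raise PermissionError unless the role grants the permission."""
    if permission not in PERMISSIONS[role]:
        raise PermissionError(f"role '{role.value}' lacks '{permission}'")
```

In a web framework such a check would typically run in a request dependency or decorator after the JWT has been verified, so route handlers never see an unauthorized request.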

Development Guide

Always refer to the complete development plan in video_accessibility_development_plan.txt for detailed specifications and requirements. The CLAUDE.md file contains additional development guidelines and phase-by-phase implementation details.