video-accessibility-old/docs/project/requirements.md
Vadym Samoilenko a3b300b76a docs: add canonical documentation + audit cleanup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:22:51 +01:00


Functional Requirements — Accessible Video Processing Platform

Purpose

This document specifies what the system must do from a user perspective. Implementation details and non-functional requirements (performance, security) belong in architecture.md.


User Roles

| Role | Who | Primary action |
| --- | --- | --- |
| CLIENT | Paying customer | Upload videos, download deliverables |
| REVIEWER | Oliver internal | QC approve/reject captions + audio description |
| LINGUIST | Language specialist | Review and approve translated content per language |
| PM | Project manager | Assign linguists, give final approval, monitor all jobs |
| ADMIN | Platform operator | Manage users, view audit log, configure platform |

Core Features

R-01: Video Upload

| Requirement | Detail |
| --- | --- |
| R-01.1 | Client can upload an MP4 video file |
| R-01.2 | File is stored in GCS; client receives a job ID |
| R-01.3 | Upload progress is displayed in real time |
| R-01.4 | System validates file type (MP4 only) and size on upload |
| R-01.5 | Upload creates a job record in CREATED state |
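
The validation in R-01.4 could look roughly like the sketch below. It is illustrative only: `validate_upload`, `UploadCheck`, and `MAX_UPLOAD_BYTES` are hypothetical names, and the 2 GiB limit is an assumption, since the requirements do not state a maximum size.

```python
from dataclasses import dataclass

# Hypothetical limit: the requirements do not state a maximum upload size.
MAX_UPLOAD_BYTES = 2 * 1024**3


@dataclass
class UploadCheck:
    ok: bool
    reason: str = ""


def validate_upload(filename: str, content_type: str, size_bytes: int,
                    max_bytes: int = MAX_UPLOAD_BYTES) -> UploadCheck:
    """Check file type (MP4 only) and size before a job record is created."""
    if not filename.lower().endswith(".mp4") or content_type != "video/mp4":
        return UploadCheck(False, "only MP4 files are accepted")
    if size_bytes <= 0:
        return UploadCheck(False, "file is empty")
    if size_bytes > max_bytes:
        return UploadCheck(False, "file exceeds the size limit")
    return UploadCheck(True)
```

Checking the extension and declared content type is only a first filter; a real implementation would probably also inspect the file contents.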

R-02: AI Processing Pipeline

| Requirement | Detail |
| --- | --- |
| R-02.1 | System automatically generates closed captions in VTT format using Gemini 2.5 Pro |
| R-02.2 | System generates audio description VTT (scene descriptions for blind/low-vision viewers) |
| R-02.3 | System generates SDH captions (includes sound effects and speaker IDs) |
| R-02.4 | System generates a descriptive transcript |
| R-02.5 | All generated VTT files are validated for correct format before advancing to QC |
| R-02.6 | Job status is updated in real time via WebSocket during processing |
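
As an illustration of the format check in R-02.5, the sketch below verifies the `WEBVTT` header, the shape of each cue timing line, and that each cue ends after it starts. `validate_vtt` is a hypothetical helper, not the platform's validator, and it covers only these basic rules.

```python
import re

# HH:MM:SS.mmm or MM:SS.mmm (WebVTT allows the hours field to be omitted).
_TS = r"(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}"
_TIMING = re.compile(rf"^({_TS})\s+-->\s+({_TS})(?:\s+.*)?$")


def _seconds(ts: str) -> float:
    parts = ts.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def validate_vtt(text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        return ["file must start with the WEBVTT header"]
    errors = []
    cues = 0
    for number, line in enumerate(lines[1:], start=2):
        if "-->" not in line:
            continue  # cue text, identifiers, and blank lines are not checked
        match = _TIMING.match(line.strip())
        if not match:
            errors.append(f"line {number}: malformed cue timing")
            continue
        cues += 1
        if _seconds(match.group(2)) <= _seconds(match.group(1)):
            errors.append(f"line {number}: cue end must be after start")
    if cues == 0 and not errors:
        errors.append("no cues found")
    return errors
```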

R-03: Quality Control Workflow

| Requirement | Detail |
| --- | --- |
| R-03.1 | Reviewer can view video with captions side-by-side |
| R-03.2 | Reviewer can edit individual VTT cues (text + timing) inline |
| R-03.3 | Reviewer can approve English content (advances to APPROVED_ENGLISH) |
| R-03.4 | Reviewer can reject a job with a reason (advances to REJECTED) |
| R-03.5 | Reviewer can send QC feedback without full rejection (advances to QC_FEEDBACK) |
| R-03.6 | All QC actions are recorded in the audit log with timestamp and user ID |
| R-03.7 | VTT edits create a version snapshot before overwriting (version history maintained) |
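
The three reviewer actions in R-03.3 to R-03.5 and the audit rule in R-03.6 can be sketched as a small transition function. The target states come from the table above; the function name, the dict-shaped job, and the action keys are assumptions made for illustration.

```python
from datetime import datetime, timezone

# Target states named in R-03.3 to R-03.5; the action keys are hypothetical.
QC_TARGET_STATE = {
    "approve": "APPROVED_ENGLISH",
    "reject": "REJECTED",
    "feedback": "QC_FEEDBACK",
}


def apply_qc_action(job: dict, action: str, user_id: str,
                    audit_log: list, reason: str = "") -> dict:
    """Return the job in its new state and append one audit entry."""
    if action not in QC_TARGET_STATE:
        raise ValueError(f"unknown QC action: {action}")
    if action == "reject" and not reason.strip():
        raise ValueError("a rejection needs a reason")
    updated = {**job, "state": QC_TARGET_STATE[action]}
    audit_log.append({
        "user_id": user_id,
        "action": action,
        "job_id": job["id"],
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "before": job["state"],
        "after": updated["state"],
        "reason": reason,
    })
    return updated
```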

R-04: Per-Language QC

| Requirement | Detail |
| --- | --- |
| R-04.1 | PM can assign a specific linguist to each output language |
| R-04.2 | Linguist can approve or reject their assigned language |
| R-04.3 | Language statuses are independent: approving French does not affect German |
| R-04.4 | Linguist cannot approve a language not assigned to them |
| R-04.5 | Job advances to PENDING_FINAL_REVIEW only when all languages are approved |
| R-04.6 | Per-language QC actions are recorded in the audit log |
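
A minimal sketch of how R-04.3 to R-04.5 fit together, assuming each job holds a per-language map of assigned linguist and status. The `APPROVED` status string and the data layout are hypothetical; only `PENDING_FINAL_REVIEW` comes from the requirements.

```python
def approve_language(job: dict, language: str, linguist_id: str) -> dict:
    """Approve one language, provided it is assigned to this linguist."""
    entry = job["languages"].get(language)
    if entry is None:
        raise KeyError(f"language not requested for this job: {language}")
    if entry["linguist_id"] != linguist_id:
        # R-04.4: only the assigned linguist may approve.
        raise PermissionError("language is not assigned to this linguist")
    # R-04.3: only this language's status changes.
    entry["status"] = "APPROVED"
    # R-04.5: advance only when every language is approved.
    if all(e["status"] == "APPROVED" for e in job["languages"].values()):
        job["state"] = "PENDING_FINAL_REVIEW"
    return job
```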

R-05: Translation and TTS

| Requirement | Detail |
| --- | --- |
| R-05.1 | System translates captions and AD into all requested output languages |
| R-05.2 | System applies cultural transcreation (Gemini-assisted) where configured |
| R-05.3 | System uses client-specific glossary terms in translation prompts |
| R-05.4 | System synthesises audio description audio via Google TTS or ElevenLabs |
| R-05.5 | TTS is performed per cue to preserve timing |
| R-05.6 | TTS failures result in TTS_FAILED state; manual retry is supported |
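
Per-cue synthesis (R-05.5) and the failure state (R-05.6) might be sketched as follows. The `synthesise` callable stands in for whichever provider is configured; it and the cue and job layouts are assumptions, and no real Google TTS or ElevenLabs API is called here.

```python
from typing import Callable


def synthesise_cues(job: dict, cues: list[dict],
                    synthesise: Callable[[str], bytes]) -> dict:
    """Synthesise each cue separately so each clip keeps its cue's timing."""
    clips = []
    for cue in cues:
        try:
            audio = synthesise(cue["text"])
        except Exception as exc:
            # R-05.6: mark the job as failed; a manual retry can call this again.
            return {**job, "state": "TTS_FAILED", "error": str(exc), "clips": []}
        clips.append({"start": cue["start"], "end": cue["end"], "audio": audio})
    # The success state is not specified in this section, so it is left unchanged.
    return {**job, "clips": clips, "error": None}
```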

R-06: Glossary Management

| Requirement | Detail |
| --- | --- |
| R-06.1 | Admin can create, read, update, and delete glossary terms per client organisation |
| R-06.2 | Glossary terms specify source term, target language, preferred translation |
| R-06.3 | System uses exact match first, then vector similarity (≥0.75) for retrieval |
| R-06.4 | Up to 50 terms are injected per translation prompt |
| R-06.5 | Glossary embeddings are indexed in Atlas Vector Search |
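
The retrieval order in R-06.3 and the cap in R-06.4 can be illustrated in memory. This sketch assumes "exact match" means the source term appears in the text being translated and that the 0.75 threshold applies to cosine similarity; neither is confirmed by the requirements. In the platform the similarity step is served by Atlas Vector Search, not by a Python loop.

```python
import math

SIMILARITY_THRESHOLD = 0.75  # R-06.3
MAX_TERMS_PER_PROMPT = 50    # R-06.4


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_glossary_terms(source_text: str, query_embedding: list[float],
                          terms: list[dict]) -> list[dict]:
    """terms: dicts with 'source', 'translation', and 'embedding' keys."""
    text = source_text.lower()
    # Step 1: exact matches come first.
    exact = [t for t in terms if t["source"].lower() in text]
    exact_ids = {id(t) for t in exact}
    # Step 2: remaining terms ranked by similarity, keeping those at or above 0.75.
    scored = [
        (cosine(query_embedding, t["embedding"]), t)
        for t in terms if id(t) not in exact_ids
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    similar = [t for score, t in scored if score >= SIMILARITY_THRESHOLD]
    return (exact + similar)[:MAX_TERMS_PER_PROMPT]
```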

R-07: VTT Version Control

| Requirement | Detail |
| --- | --- |
| R-07.1 | System creates a snapshot before each VTT edit save |
| R-07.2 | Reviewer can view version history with author, timestamp, and diff |
| R-07.3 | Reviewer can restore any previous version |
| R-07.4 | Concurrent edit conflict is detected and reported to the later editor |
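
One common way to satisfy R-07.1 and R-07.4 together is optimistic concurrency: every save states the version it was based on, and a stale base is rejected. The sketch below assumes that approach and a dict-shaped document; the requirements do not say how conflicts are actually detected.

```python
from datetime import datetime, timezone


class EditConflict(Exception):
    """Raised when the document changed after the editor loaded it."""


def save_vtt(doc: dict, new_text: str, author: str, base_version: int) -> dict:
    # R-07.4: the later editor's save is rejected if its base version is stale.
    if base_version != doc["version"]:
        raise EditConflict(
            f"edit was based on version {base_version}, current is {doc['version']}"
        )
    # R-07.1: snapshot the current content before overwriting it.
    doc["history"].append({
        "version": doc["version"],
        "text": doc["text"],
        "author": doc["author"],
        "saved_at": doc["saved_at"],
    })
    doc.update(text=new_text, author=author, version=doc["version"] + 1,
               saved_at=datetime.now(timezone.utc).isoformat())
    return doc


def restore_version(doc: dict, version: int, author: str) -> dict:
    # R-07.3: a restore is itself a save, so the current text is snapshotted too.
    snapshot = next(s for s in doc["history"] if s["version"] == version)
    return save_vtt(doc, snapshot["text"], author, doc["version"])
```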

R-08: Final Review and Delivery

| Requirement | Detail |
| --- | --- |
| R-08.1 | PM can view all final deliverable files before approving |
| R-08.2 | PM approval triggers client notification email |
| R-08.3 | Email contains signed GCS download URLs (24h expiry) |
| R-08.4 | Client can download captions, audio descriptions, and accessible video |
| R-08.5 | Job status advances to COMPLETED after PM approval |

R-09: Authentication and Access Control

| Requirement | Detail |
| --- | --- |
| R-09.1 | Users authenticate via email/password (local) or Microsoft SSO (enterprise) |
| R-09.2 | Access tokens are valid for 15 minutes; refresh tokens for 7 days |
| R-09.3 | Refresh tokens are stored in HttpOnly cookies only |
| R-09.4 | All API endpoints enforce RBAC: role checked server-side on every request |
| R-09.5 | Login is rate-limited to 5 attempts per 5-minute window |
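
The limit in R-09.5 could be enforced with a sliding window, sketched below. Whether the platform uses a sliding or a fixed window, and whether it keys on email, IP address, or both, is not stated; `LoginRateLimiter` is a hypothetical in-memory illustration, and a multi-instance deployment would need shared storage.

```python
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    def __init__(self, max_attempts: int = 5, window_seconds: float = 300.0,
                 clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the window can be tested without sleeping
        self._attempts = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record an attempt for key; return False once the limit is reached."""
        now = self._clock()
        attempts = self._attempts[key]
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True
```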

R-10: Audit Logging

| Requirement | Detail |
| --- | --- |
| R-10.1 | Every state-changing action by a reviewer, linguist, or PM creates an audit log entry |
| R-10.2 | Audit log entries contain: actor user ID, action type, job ID, timestamp, before/after state |
| R-10.3 | Admin can view the full audit log filtered by user, job, or date range |
| R-10.4 | Audit log entries are immutable once written |
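
The entry shape in R-10.2 and the filters in R-10.3 map naturally onto a frozen record type. The sketch below uses hypothetical field and class names; a frozen dataclass only prevents in-process mutation, so immutability in storage (R-10.4) would still need to be enforced by the database layer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    actor_user_id: str
    action_type: str
    job_id: str
    timestamp: datetime
    before_state: str
    after_state: str


class AuditLog:
    def __init__(self):
        self._entries = []

    def record(self, actor_user_id, action_type, job_id, before_state, after_state):
        entry = AuditEntry(actor_user_id, action_type, job_id,
                           datetime.now(timezone.utc), before_state, after_state)
        self._entries.append(entry)
        return entry

    def query(self, user_id=None, job_id=None, start=None, end=None):
        """Filter by user, job, or date range; returns a new list."""
        return [
            e for e in self._entries
            if (user_id is None or e.actor_user_id == user_id)
            and (job_id is None or e.job_id == job_id)
            and (start is None or e.timestamp >= start)
            and (end is None or e.timestamp <= end)
        ]
```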

R-11: Real-time Notifications

| Requirement | Detail |
| --- | --- |
| R-11.1 | Job status changes are pushed to connected clients via WebSocket |
| R-11.2 | WebSocket events are org-scoped: users only receive events for their organisation |
| R-11.3 | WebSocket connection recovers automatically after disconnect (exponential backoff) |
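
The backoff in R-11.3 and the scoping rule in R-11.2 reduce to two small functions. The base delay, the 30-second cap, and the `org_id` field are assumptions for illustration; the requirements give no values, and the real client may well live in the frontend rather than in Python.

```python
import random


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                    jitter: float = 0.0) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based)."""
    delay = min(cap, base * (2 ** attempt))
    # Optional jitter spreads reconnects out; 0.0 keeps the result deterministic.
    return delay + random.uniform(0, jitter * delay)


def visible_to(event: dict, user: dict) -> bool:
    """Deliver an event only to users in the same organisation."""
    return event["org_id"] == user["org_id"]
```

With the defaults, successive attempts wait 1, 2, 4, 8, and 16 seconds, then stay at the 30-second cap.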

Out of Scope (Current Version)

| Feature | Reason |
| --- | --- |
| Automated transcription (Whisper) | Gemini handles transcription; Whisper worker exists but not active |
| CI/CD pipeline | Manual deploy via scripts; CI exists but does not run full test suite |
| Load testing | Not implemented; deferred to Phase 7 |
| Multi-tenant billing | Cost tracking via oliver-cost-tracker SDK (read-only dashboard) |

Maintenance

Update triggers: New feature scope confirmed, requirement changed by stakeholder, QC workflow changes.

Verification: Every R-XX.X requirement maps to at least one test in tests/README.md or /tmp/audit/test-plan.md.
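
The verification rule could be checked mechanically, assuming the test documents mention requirement IDs in the same `R-XX.X` form used here. `unmapped_requirements` is a hypothetical helper that takes both documents as strings and reports IDs with no mention on the test side.

```python
import re

# Matches IDs such as R-01.4 or R-11.3.
REQ_ID = re.compile(r"\bR-\d{2}\.\d+\b")


def unmapped_requirements(requirements_text: str, test_docs_text: str) -> list[str]:
    """Return requirement IDs that never appear in the test documentation."""
    required = set(REQ_ID.findall(requirements_text))
    covered = set(REQ_ID.findall(test_docs_text))
    return sorted(required - covered)
```

A mention is weaker evidence than a passing test, so an empty result shows traceability only, not coverage.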