video-accessibility/docs/project/requirements.md
Vadym Samoilenko a3b300b76a docs: add canonical documentation + audit cleanup
- AGENTS.md: canonical project entry point (Quick Nav, pipeline, constraints)
- docs/: complete docs tree — architecture, API spec, DB schema, infra,
  runbook, requirements, tech stack, principles, reference ADRs, guides,
  tasks backlog, testing strategy
- tests/README.md: test commands, structure, known gaps
- README.md / CLAUDE.md / DEPLOYMENT.md: updated with canonical doc links
- .archive/: backup of pre-documentation-pipeline originals
- backend/uv.lock: uv dependency lockfile
- Delete committed __pycache__ .pyc files (should have been gitignored)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:22:51 +01:00

# Functional Requirements — Accessible Video Processing Platform
<!-- SCOPE: requirements | owner: ln-112 | generated: 2026-04-29 -->
## Purpose
This document specifies what the system must do from the user's perspective. Implementation details belong in [architecture.md](architecture.md). Non-functional requirements (performance, security) belong in [architecture.md](architecture.md#security-model).
---
## User Roles
| Role | Who | Primary action |
|------|-----|---------------|
| CLIENT | Paying customer | Upload videos, download deliverables |
| REVIEWER | Internal Oliver staff | Approve or reject captions and audio description during QC |
| LINGUIST | Language specialist | Review and approve translated content per language |
| PM | Project manager | Assign linguists, give final approval, monitor all jobs |
| ADMIN | Platform operator | Manage users, view audit log, configure platform |
---
## Core Features
### R-01: Video Upload
| Requirement | Detail |
|-------------|--------|
| R-01.1 | Client can upload an MP4 video file |
| R-01.2 | File is stored in GCS; client receives a job ID |
| R-01.3 | Upload progress is displayed in real time |
| R-01.4 | System validates file type (MP4 only) and size on upload |
| R-01.5 | Upload creates a job record in CREATED state |
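
The validation in R-01.4 could be sketched as follows. This is a minimal illustration, not the platform's implementation: `validate_upload` and `MAX_UPLOAD_BYTES` are hypothetical names, and the size limit is a placeholder because this document does not state one. The container check relies on the `ftyp` box that MP4 files typically carry at bytes 4 to 8.

```python
from pathlib import Path

# Hypothetical limit; the requirements do not specify a maximum size.
MAX_UPLOAD_BYTES = 2 * 1024**3


def validate_upload(filename: str, header: bytes, size_bytes: int) -> list[str]:
    """Return a list of validation errors (empty when the upload is acceptable)."""
    errors = []
    if Path(filename).suffix.lower() != ".mp4":
        errors.append("file extension must be .mp4")
    # MP4 (ISO base media) files typically start with a box whose type is "ftyp".
    if len(header) < 8 or header[4:8] != b"ftyp":
        errors.append("file does not look like an MP4 container")
    if size_bytes <= 0:
        errors.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        errors.append("file exceeds the maximum upload size")
    return errors
```

Returning a list rather than raising on the first problem lets the upload UI report every issue at once.
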
### R-02: AI Processing Pipeline
| Requirement | Detail |
|-------------|--------|
| R-02.1 | System automatically generates closed captions in VTT format using Gemini 2.5 Pro |
| R-02.2 | System generates audio description VTT (scene descriptions for blind/low-vision viewers) |
| R-02.3 | System generates SDH captions (includes sound effects and speaker IDs) |
| R-02.4 | System generates a descriptive transcript |
| R-02.5 | All generated VTT files are validated for correct format before advancing to QC |
| R-02.6 | Job status is updated in real time via WebSocket during processing |
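
As an illustration of the format check in R-02.5, the sketch below verifies only the `WEBVTT` header, the cue timing syntax, and that each cue starts before it ends. The function name `validate_vtt` is hypothetical, and a production validator would likely cover more of the WebVTT specification (cue settings, NOTE blocks, overlapping cues).

```python
import re

# Matches "hh:mm:ss.ttt --> hh:mm:ss.ttt"; the hours component is optional.
TIMING = re.compile(
    r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})[ \t]+-->[ \t]+"
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(hours, minutes, seconds, millis) -> float:
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def validate_vtt(text: str) -> list[str]:
    """Return a list of format errors (empty when the basic checks pass)."""
    lines = text.lstrip("\ufeff").splitlines()
    errors = []
    if not lines or not lines[0].startswith("WEBVTT"):
        errors.append("missing WEBVTT header")
    cues = 0
    for number, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue
        match = TIMING.match(line.strip())
        if match is None:
            errors.append(f"line {number}: malformed cue timing")
            continue
        cues += 1
        groups = match.groups()
        if _seconds(*groups[:4]) >= _seconds(*groups[4:]):
            errors.append(f"line {number}: cue start is not before end")
    if cues == 0:
        errors.append("no valid cues found")
    return errors
```
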
### R-03: Quality Control Workflow
| Requirement | Detail |
|-------------|--------|
| R-03.1 | Reviewer can view video with captions side by side |
| R-03.2 | Reviewer can edit individual VTT cues (text + timing) inline |
| R-03.3 | Reviewer can approve English content (advances to APPROVED_ENGLISH) |
| R-03.4 | Reviewer can reject a job with a reason (advances to REJECTED) |
| R-03.5 | Reviewer can send QC feedback without full rejection (advances to QC_FEEDBACK) |
| R-03.6 | All QC actions are recorded in the audit log with timestamp and user ID |
| R-03.7 | VTT edits create a version snapshot before overwriting (version history maintained) |
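
The reviewer actions in R-03.3 to R-03.6 can be pictured as a small mapping from action to target state. The sketch below is an assumption-laden illustration: the target state names come from this document, while `apply_qc_action`, the action keys, the dictionary-shaped job, and the returned record layout are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Target states are taken from R-03.3 to R-03.5; the action keys are hypothetical.
QC_TARGET_STATES = {
    "approve": "APPROVED_ENGLISH",
    "reject": "REJECTED",
    "feedback": "QC_FEEDBACK",
}


def apply_qc_action(job: dict, action: str, user_id: str, reason: Optional[str] = None) -> dict:
    """Move the job to the action's target state and return an audit record."""
    if action not in QC_TARGET_STATES:
        raise ValueError(f"unknown QC action: {action}")
    if action == "reject" and not reason:
        # R-03.4: a rejection must carry a reason.
        raise ValueError("rejection requires a reason")
    before = job["state"]
    job["state"] = QC_TARGET_STATES[action]
    # R-03.6: every QC action is recorded with timestamp and user ID.
    return {
        "user_id": user_id,
        "action": action,
        "job_id": job["id"],
        "before": before,
        "after": job["state"],
        "reason": reason,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```
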
### R-04: Per-Language QC
| Requirement | Detail |
|-------------|--------|
| R-04.1 | PM can assign a specific linguist to each output language |
| R-04.2 | Linguist can approve or reject their assigned language |
| R-04.3 | Language statuses are independent — approving French does not affect German |
| R-04.4 | Linguist cannot approve a language not assigned to them |
| R-04.5 | Job advances to PENDING_FINAL_REVIEW only when all languages are approved |
| R-04.6 | Per-language QC actions are recorded in the audit log |
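
A minimal sketch of the rules in R-04.3 to R-04.5, assuming an in-memory mapping of language code to review record. `LanguageReview`, `approve_language`, and the `PENDING`/`APPROVED` status names are hypothetical; only `PENDING_FINAL_REVIEW` is named in this document.

```python
from dataclasses import dataclass


@dataclass
class LanguageReview:
    linguist_id: str          # linguist assigned by the PM (R-04.1)
    status: str = "PENDING"   # hypothetical per-language status names


def approve_language(reviews: dict[str, LanguageReview], language: str, user_id: str) -> bool:
    """Approve one language; return True when the job may advance to PENDING_FINAL_REVIEW."""
    review = reviews[language]
    if review.linguist_id != user_id:
        # R-04.4: only the assigned linguist may approve this language.
        raise PermissionError(f"user {user_id} is not assigned to {language}")
    # R-04.3: only this language's status changes.
    review.status = "APPROVED"
    # R-04.5: the job advances only when every language is approved.
    return all(r.status == "APPROVED" for r in reviews.values())
```
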
### R-05: Translation and TTS
| Requirement | Detail |
|-------------|--------|
| R-05.1 | System translates captions and AD into all requested output languages |
| R-05.2 | System applies cultural transcreation (Gemini-assisted) where configured |
| R-05.3 | System uses client-specific glossary terms in translation prompts |
| R-05.4 | System synthesises audio description audio via Google TTS or ElevenLabs |
| R-05.5 | TTS is performed per cue to preserve timing |
| R-05.6 | TTS failures result in TTS_FAILED state; manual retry is supported |
### R-06: Glossary Management
| Requirement | Detail |
|-------------|--------|
| R-06.1 | Admin can create, read, update, and delete glossary terms per client organisation |
| R-06.2 | Each glossary term specifies a source term, a target language, and a preferred translation |
| R-06.3 | System uses exact match first, then vector similarity (≥0.75) for retrieval |
| R-06.4 | Up to 50 terms are injected per translation prompt |
| R-06.5 | Glossary embeddings are indexed in Atlas Vector Search |
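
The retrieval order in R-06.3 and the cap in R-06.4 could look roughly like the sketch below. It swaps Atlas Vector Search for a plain in-memory cosine similarity so that it runs standalone, and it assumes "exact match" means the source term appears verbatim in the text being translated. `select_terms` and the dictionary keys `source` and `embedding` are hypothetical.

```python
import math

SIMILARITY_THRESHOLD = 0.75  # R-06.3
MAX_TERMS = 50               # R-06.4


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_terms(segment: str, segment_vec: list[float], glossary: list[dict]) -> list[dict]:
    """Pick glossary terms for a prompt: exact matches first, then similar terms."""
    text = segment.lower()
    exact = [t for t in glossary if t["source"].lower() in text]
    exact_ids = {id(t) for t in exact}
    # Score the remaining terms and keep those at or above the threshold.
    scored = [
        (cosine(segment_vec, t["embedding"]), t)
        for t in glossary
        if id(t) not in exact_ids
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    similar = [t for score, t in scored if score >= SIMILARITY_THRESHOLD]
    return (exact + similar)[:MAX_TERMS]
```

Placing exact matches ahead of similarity hits means the cap in R-06.4 trims the weakest vector matches first.
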
### R-07: VTT Version Control
| Requirement | Detail |
|-------------|--------|
| R-07.1 | System creates a snapshot before each VTT edit save |
| R-07.2 | Reviewer can view version history with author, timestamp, and diff |
| R-07.3 | Reviewer can restore any previous version |
| R-07.4 | Concurrent edit conflict is detected and reported to the later editor |
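
One common way to satisfy R-07.1, R-07.3, and R-07.4 together is optimistic versioning: each save states which version it was based on, and a stale base is rejected. The sketch below illustrates that approach under the assumption of a single in-memory document; `VttDocument` and `EditConflict` are hypothetical names, and this document does not say which mechanism the platform actually uses.

```python
from dataclasses import dataclass, field


class EditConflict(Exception):
    """Raised when a save is based on a version that is no longer current."""


@dataclass
class VttDocument:
    content: str
    version: int = 1
    author: str = "system"
    # Snapshots of earlier versions as (version, author, content).
    history: list[tuple[int, str, str]] = field(default_factory=list)

    def save(self, new_content: str, author: str, base_version: int) -> None:
        if base_version != self.version:
            # R-07.4: the later editor is told about the conflict.
            raise EditConflict(f"based on v{base_version}, current is v{self.version}")
        # R-07.1: snapshot the current state before overwriting it.
        self.history.append((self.version, self.author, self.content))
        self.content = new_content
        self.author = author
        self.version += 1

    def restore(self, version: int, author: str) -> None:
        # R-07.3: restoring is itself a new save, so history is never lost.
        for snap_version, _, snap_content in self.history:
            if snap_version == version:
                self.save(snap_content, author, self.version)
                return
        raise KeyError(version)
```
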
### R-08: Final Review and Delivery
| Requirement | Detail |
|-------------|--------|
| R-08.1 | PM can view all final deliverable files before approving |
| R-08.2 | PM approval triggers client notification email |
| R-08.3 | Email contains signed GCS download URLs (24h expiry) |
| R-08.4 | Client can download captions, audio descriptions, and accessible video |
| R-08.5 | Job status advances to COMPLETED after PM approval |
### R-09: Authentication and Access Control
| Requirement | Detail |
|-------------|--------|
| R-09.1 | Users authenticate via email/password (local) or Microsoft SSO (enterprise) |
| R-09.2 | Access tokens are valid for 15 minutes; refresh tokens for 7 days |
| R-09.3 | Refresh tokens are stored in HttpOnly cookies only |
| R-09.4 | All API endpoints enforce RBAC — role checked server-side on every request |
| R-09.5 | Login is rate-limited to 5 attempts per 5-minute window |
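
The limit in R-09.5 can be illustrated with a sliding-window counter. This sketch assumes a sliding window (the requirement does not say whether the window is fixed or sliding), keeps state in process memory, and keys attempts by an arbitrary string such as an email address or IP; `LoginRateLimiter` is a hypothetical name.

```python
from collections import defaultdict, deque

MAX_ATTEMPTS = 5          # R-09.5
WINDOW_SECONDS = 5 * 60   # R-09.5


class LoginRateLimiter:
    def __init__(self) -> None:
        # Maps a key (for example an email address) to recent attempt times.
        self._attempts: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        """Record an attempt at time `now` (seconds) and report whether it is allowed."""
        attempts = self._attempts[key]
        # Drop attempts that have fallen out of the window.
        while attempts and now - attempts[0] >= WINDOW_SECONDS:
            attempts.popleft()
        if len(attempts) >= MAX_ATTEMPTS:
            return False
        attempts.append(now)
        return True
```

Passing `now` explicitly keeps the limiter deterministic in tests; a caller would typically supply `time.monotonic()`.
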
### R-10: Audit Logging
| Requirement | Detail |
|-------------|--------|
| R-10.1 | Every state-changing action by a reviewer, linguist, or PM creates an audit log entry |
| R-10.2 | Audit log entries contain: actor user ID, action type, job ID, timestamp, before/after state |
| R-10.3 | Admin can view the full audit log filtered by user, job, or date range |
| R-10.4 | Audit log entries are immutable once written |
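
The entry shape in R-10.2 and the filters in R-10.3 could be modelled as below. This is a sketch only: `AuditEntry` and `filter_entries` are hypothetical names, and a frozen dataclass guards against in-process mutation only, so the immutability required by R-10.4 would still need to be enforced at the storage layer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class AuditEntry:
    actor_user_id: str
    action_type: str
    job_id: str
    before_state: str
    after_state: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def filter_entries(
    entries: Iterable[AuditEntry],
    user_id: Optional[str] = None,
    job_id: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> list[AuditEntry]:
    """Filter by user, job, or inclusive date range; omitted filters match everything."""
    return [
        e
        for e in entries
        if (user_id is None or e.actor_user_id == user_id)
        and (job_id is None or e.job_id == job_id)
        and (start is None or e.timestamp >= start)
        and (end is None or e.timestamp <= end)
    ]
```
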
### R-11: Real-time Notifications
| Requirement | Detail |
|-------------|--------|
| R-11.1 | Job status changes are pushed to connected clients via WebSocket |
| R-11.2 | WebSocket events are org-scoped — users only receive events for their organisation |
| R-11.3 | WebSocket connection recovers automatically after disconnect (exponential backoff) |
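
The reconnect schedule in R-11.3 might be generated as in the sketch below. The base delay, cap, attempt count, and jitter fraction are assumptions chosen for illustration; this document specifies only that backoff is exponential. `backoff_delays` is a hypothetical helper.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    base: float = 1.0,
    cap: float = 30.0,
    attempts: int = 6,
    jitter: float = 0.0,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield reconnect delays in seconds: base, 2*base, 4*base, ... limited to `cap`.

    `jitter` is the maximum fraction by which each delay is randomly shortened,
    which helps spread out reconnects when many clients drop at once.
    """
    for attempt in range(attempts):
        delay = min(cap, base * 2**attempt)
        yield delay * (1 - jitter * rng())
```

A client would sleep for each yielded delay before retrying the connection and restart the sequence after a successful reconnect.
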
---
## Out of Scope (Current Version)
| Feature | Reason |
|---------|--------|
| Automated transcription (Whisper) | Gemini handles transcription; a Whisper worker exists but is not active |
| CI/CD pipeline | Deployment is manual via scripts; CI exists but does not run the full test suite |
| Load testing | Not implemented; deferred to Phase 7 |
| Multi-tenant billing | Cost tracking only, via the oliver-cost-tracker SDK (read-only dashboard) |
---
## Maintenance
**Update triggers:** New feature scope confirmed, requirement changed by stakeholder, QC workflow changes.
**Verification:** Every R-XX.X requirement maps to at least one test in [tests/README.md](../../tests/README.md) or [/tmp/audit/test-plan.md](/tmp/audit/test-plan.md).
<!-- END SCOPE: requirements -->