# Database Schema: Accessible Video Processing Platform

<!-- SCOPE: database-schema | owner: ln-113 | generated: 2026-04-29 -->

**Database:** MongoDB Atlas

**Database name:** configured via `MONGODB_DB` env var (default: `accessible_video`)

---

## Collections

### `jobs`

Central document for each video accessibility job.

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| org_id | ObjectId | Owning organisation |
| client_user_id | ObjectId | User who uploaded the video |
| status | string | JobStatus enum (16 values; see architecture.md) |
| source_language | string | BCP-47 code (e.g., `en-US`) |
| requested_outputs | array[string] | Output language codes |
| source | object | `{ gcs_path, filename, duration_seconds }` |
| outputs | object | Per-language `{ captions_vtt, ad_vtt, ad_mp3, accessible_mp4 }` GCS paths |
| review | object | QC state `{ reviewer_id, approved_at, rejected_at, reason }` |
| language_qc | object | Per-language QC state (see LanguageQCState below) |
| vtt_versions | array | Version snapshot references (see `vtt_versions` collection) |
| glossary_id | ObjectId | Client glossary to use for translation |
| retry_count | int | Number of task retries |
| error | string | Last error message |
| created_at | datetime | ISO 8601 |
| updated_at | datetime | ISO 8601 |
| completed_at | datetime | ISO 8601 |

**LanguageQCState (per-language, nested in `language_qc`):**

| Field | Type | Description |
|-------|------|-------------|
| status | string | `pending`, `assigned`, `approved`, `rejected`, `feedback_requested` |
| linguist_id | ObjectId | Assigned linguist (nullable) |
| assigned_at | datetime | When linguist was assigned |
| reviewed_at | datetime | When approved/rejected |
| reason | string | Rejection or feedback reason |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| Primary | `_id` | Document lookup |
| org_status | `org_id` + `status` | List jobs by org and status |
| client | `client_user_id` | Client's own jobs |
| created | `created_at` (desc) | Time-sorted listing |
| status | `status` | Status-filtered queries |

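
The nested `language_qc` object can be easier to picture with a small sketch. The example below assumes the object is keyed by language code, that new entries start as `pending`, and that IDs are plain strings standing in for ObjectIds; the helper names `new_language_qc` and `assign_linguist` are hypothetical and do not come from the codebase.

```python
from datetime import datetime, timezone

# Status values copied from the LanguageQCState table.
LANGUAGE_QC_STATUSES = {
    "pending", "assigned", "approved", "rejected", "feedback_requested",
}


def new_language_qc(requested_outputs):
    """Build an initial language_qc object: one pending entry per language (assumed default)."""
    return {
        lang: {
            "status": "pending",
            "linguist_id": None,
            "assigned_at": None,
            "reviewed_at": None,
            "reason": None,
        }
        for lang in requested_outputs
    }


def assign_linguist(language_qc, lang, linguist_id, now=None):
    """Return a copy of language_qc with `lang` marked as assigned to a linguist."""
    if lang not in language_qc:
        raise KeyError(f"language {lang!r} is not part of this job")
    now = now or datetime.now(timezone.utc)
    # Copy each per-language state so the caller's object is left untouched.
    updated = {code: dict(state) for code, state in language_qc.items()}
    updated[lang].update(status="assigned", linguist_id=linguist_id, assigned_at=now)
    return updated


qc = new_language_qc(["fr-FR", "de-DE"])
qc = assign_linguist(qc, "fr-FR", linguist_id="hypothetical-linguist-id")
```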
---

### `users`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| email | string | Unique, lowercase |
| hashed_password | string | bcrypt hash (null for SSO-only users) |
| role | string | `client`, `reviewer`, `linguist`, `pm`, `admin` |
| org_id | ObjectId | Primary organisation |
| is_active | boolean | Account enabled flag |
| microsoft_id | string | Entra ID subject claim (nullable) |
| created_at | datetime | |
| updated_at | datetime | |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| email_unique | `email` (unique) | Login lookup |
| org | `org_id` | Members-of-org query |
| microsoft | `microsoft_id` (sparse) | SSO user lookup |

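
As a rough illustration of the `users` index table, the sketch below expresses the three indexes as data and applies them through PyMongo's `Collection.create_index(keys, **options)`. The ascending sort direction and the helper names `normalize_email` and `ensure_user_indexes` are assumptions; the project's real index code may look different.

```python
# (index name, key list, extra options); 1 means ascending, which is assumed here.
USER_INDEXES = [
    ("email_unique", [("email", 1)], {"unique": True}),
    ("org", [("org_id", 1)], {}),
    ("microsoft", [("microsoft_id", 1)], {"sparse": True}),
]


def normalize_email(raw):
    """Trim and lowercase an email so the unique index sees one canonical form."""
    return raw.strip().lower()


def ensure_user_indexes(users):
    """Create each index on `users`, any object exposing PyMongo's create_index."""
    return [
        users.create_index(keys, name=name, **options)
        for name, keys, options in USER_INDEXES
    ]
```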
---

### `organizations`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| name | string | Organisation display name |
| slug | string | URL-safe identifier |
| member_ids | array[ObjectId] | User IDs in this org |
| created_at | datetime | |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| slug_unique | `slug` (unique) | Org lookup by slug |

---

### `glossaries`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| org_id | ObjectId | Owning organisation |
| name | string | Glossary display name |
| terms | array | Array of GlossaryTerm documents |
| created_at | datetime | |
| updated_at | datetime | |

**GlossaryTerm (embedded in `terms`):**

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Term ID |
| source_term | string | Term in source language |
| target_language | string | BCP-47 code |
| preferred_translation | string | Required translation |
| context | string | Usage notes (optional) |
| embedding | array[float] | Vector embedding for similarity search |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| org | `org_id` | List org glossaries |
| vector | `terms.embedding` (Atlas Vector Search) | Similarity retrieval |

**Atlas Vector Search index name:** `glossary_embedding_index`

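
A similarity lookup against this index might be built roughly as follows. The sketch only constructs the aggregation pipeline as plain Python data using the `$vectorSearch` stage. The `org_id` filter (which would need to be declared as a filter field in the index definition), the oversampling factor, and the function name `glossary_search_pipeline` are all assumptions. Whether a vector nested inside the `terms` array can be queried this way depends on the Atlas index definition, which this document does not show.

```python
def glossary_search_pipeline(org_id, query_vector, limit=5):
    """Build an aggregation pipeline for nearest-neighbour glossary retrieval."""
    return [
        {
            "$vectorSearch": {
                "index": "glossary_embedding_index",  # index name from this document
                "path": "terms.embedding",            # indexed field from this document
                "queryVector": list(query_vector),
                "numCandidates": limit * 20,          # assumed oversampling factor
                "limit": limit,
                "filter": {"org_id": {"$eq": org_id}},  # assumes org_id is a filter field
            }
        },
        {
            # Keep the glossary name and terms, plus the similarity score.
            "$project": {
                "name": 1,
                "terms": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```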
---

### `vtt_versions`

Immutable version snapshots created before each VTT save.

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| job_id | ObjectId | Parent job |
| language | string | Language code |
| version_number | int | Sequential version number |
| content | string | Full VTT file content at time of snapshot |
| author_id | ObjectId | User who made the change |
| created_at | datetime | Snapshot timestamp |
| diff_from_prev | string | Diff against previous version (optional) |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| job_lang | `job_id` + `language` + `version_number` | Version history listing |
| job_lang_created | `job_id` + `language` + `created_at` (desc) | Time-sorted history |

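
The snapshot fields can be illustrated with a minimal builder. This sketch assumes version numbers start at 1 and that `diff_from_prev` holds a unified diff produced with the standard library's `difflib`; the actual numbering and diff format are not stated in this document, and `build_snapshot` is a hypothetical name.

```python
import difflib
from datetime import datetime, timezone


def build_snapshot(job_id, language, content, author_id, previous=None):
    """Build a vtt_versions document; `previous` is the latest snapshot or None."""
    version_number = 1 if previous is None else previous["version_number"] + 1
    diff = None
    if previous is not None:
        # Line-based unified diff between the previous and the new content.
        diff = "".join(
            difflib.unified_diff(
                previous["content"].splitlines(keepends=True),
                content.splitlines(keepends=True),
                fromfile=f"v{previous['version_number']}",
                tofile=f"v{version_number}",
            )
        )
    return {
        "job_id": job_id,
        "language": language,
        "version_number": version_number,
        "content": content,
        "author_id": author_id,
        "created_at": datetime.now(timezone.utc),
        "diff_from_prev": diff,
    }
```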
---

### `audit_logs`

Immutable audit trail for all reviewer, linguist, and PM actions.

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| actor_id | ObjectId | User performing the action |
| actor_email | string | Denormalised for readability |
| action | string | Action type enum (see below) |
| job_id | ObjectId | Affected job (nullable) |
| org_id | ObjectId | Organisation context |
| before_state | string | Job status before action |
| after_state | string | Job status after action |
| metadata | object | Action-specific context (reason, language, etc.) |
| created_at | datetime | Event timestamp |

**Action types:**

| Action | Trigger |
|--------|---------|
| `job_approved` | QC approve |
| `job_rejected` | QC reject |
| `qc_feedback_sent` | QC feedback |
| `language_approved` | Language-level QC approve |
| `language_rejected` | Language-level QC reject |
| `linguist_assigned` | PM assigns linguist |
| `vtt_edited` | VTT content saved |
| `vtt_restored` | Version restore |
| `job_retry` | Admin manual retry |
| `user_invited` | PM/Admin invites member |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| job | `job_id` + `created_at` | Per-job audit trail |
| org_created | `org_id` + `created_at` (desc) | Org-level audit log |
| actor | `actor_id` + `created_at` | Per-user action history |

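
One way to keep `action` values in line with the table is to validate them against an enum when the entry is built. The sketch below is illustrative only: `AuditAction` and `build_audit_entry` are hypothetical names, and IDs are passed through as opaque values.

```python
from datetime import datetime, timezone
from enum import Enum


class AuditAction(str, Enum):
    """Action types copied from the table in this section."""
    JOB_APPROVED = "job_approved"
    JOB_REJECTED = "job_rejected"
    QC_FEEDBACK_SENT = "qc_feedback_sent"
    LANGUAGE_APPROVED = "language_approved"
    LANGUAGE_REJECTED = "language_rejected"
    LINGUIST_ASSIGNED = "linguist_assigned"
    VTT_EDITED = "vtt_edited"
    VTT_RESTORED = "vtt_restored"
    JOB_RETRY = "job_retry"
    USER_INVITED = "user_invited"


def build_audit_entry(actor_id, actor_email, action, org_id, job_id=None,
                      before_state=None, after_state=None, metadata=None):
    """Build an audit_logs document; unknown actions raise ValueError."""
    action = AuditAction(action)
    return {
        "actor_id": actor_id,
        "actor_email": actor_email,
        "action": action.value,
        "job_id": job_id,
        "org_id": org_id,
        "before_state": before_state,
        "after_state": after_state,
        "metadata": metadata or {},
        "created_at": datetime.now(timezone.utc),
    }
```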
---

### `invitations`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| email | string | Invitee email |
| org_id | ObjectId | Org being joined |
| role | string | Role to assign on accept |
| token | string | Unique invite token (hashed) |
| expires_at | datetime | 7-day expiry |
| accepted_at | datetime | Nullable; set on accept |
| created_by | ObjectId | User who sent invite |

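
The hashed token and 7-day expiry could work along these lines. The hash algorithm is not specified in this document, so SHA-256 is an assumption here, as are the helper names `hash_token`, `new_invitation`, and `is_redeemable`.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

INVITE_TTL = timedelta(days=7)  # 7-day expiry from the table


def hash_token(raw_token):
    """Hash a raw invite token (SHA-256 is an assumption, not the confirmed scheme)."""
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


def new_invitation(email, org_id, role, created_by, now=None):
    """Return (raw_token, document); only the hash is kept in the document."""
    now = now or datetime.now(timezone.utc)
    raw_token = secrets.token_urlsafe(32)
    doc = {
        "email": email,
        "org_id": org_id,
        "role": role,
        "token": hash_token(raw_token),
        "expires_at": now + INVITE_TTL,
        "accepted_at": None,
        "created_by": created_by,
    }
    return raw_token, doc


def is_redeemable(doc, raw_token, now=None):
    """True if the invite is unused, unexpired, and the token matches."""
    now = now or datetime.now(timezone.utc)
    return (
        doc["accepted_at"] is None
        and now < doc["expires_at"]
        and secrets.compare_digest(doc["token"], hash_token(raw_token))
    )
```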
---

## Maintenance

**Update triggers:** A collection is added, an index is added or removed, or a field is added to a model.

**Verification:** All collections listed here exist in production Atlas. Index names match the `create_indexes()` function in `backend/app/core/database.py` (currently commented out; the indexes were created manually).

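
Because the indexes were created manually, a small check can help confirm that the names in this document still exist. The sketch below assumes the "Index" column holds the actual MongoDB index names and relies on PyMongo's `Collection.index_information()`, which returns a dict keyed by index name. The Atlas Vector Search index is left out because search indexes are not reported by that call. `EXPECTED_INDEXES` and `missing_indexes` are hypothetical names.

```python
# Index names taken from the tables in this document (regular indexes only).
EXPECTED_INDEXES = {
    "jobs": {"org_status", "client", "created", "status"},
    "users": {"email_unique", "org", "microsoft"},
    "organizations": {"slug_unique"},
    "glossaries": {"org"},
    "vtt_versions": {"job_lang", "job_lang_created"},
    "audit_logs": {"job", "org_created", "actor"},
}


def missing_indexes(db, expected=EXPECTED_INDEXES):
    """Map each collection to the expected index names the server does not report."""
    report = {}
    for collection, names in expected.items():
        existing = set(db[collection].index_information())
        missing = names - existing
        if missing:
            report[collection] = sorted(missing)
    return report
```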
<!-- END SCOPE: database-schema -->