# Database Schema: Accessible Video Processing Platform

**Database:** MongoDB Atlas

**Database name:** configured via `MONGODB_DB` env var (default: `accessible_video`)

---

## Collections

### `jobs`

Central document for each video accessibility job.

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| org_id | ObjectId | Owning organisation |
| client_user_id | ObjectId | User who uploaded the video |
| status | string | JobStatus enum (16 values; see architecture.md) |
| source_language | string | BCP-47 code (e.g., `en-US`) |
| requested_outputs | array[string] | Output language codes |
| source | object | `{ gcs_path, filename, duration_seconds }` |
| outputs | object | Per-language `{ captions_vtt, ad_vtt, ad_mp3, accessible_mp4 }` GCS paths |
| review | object | QC state `{ reviewer_id, approved_at, rejected_at, reason }` |
| language_qc | object | Per-language QC state (see LanguageQCState below) |
| vtt_versions | array | Version snapshot references (see `vtt_versions` collection) |
| glossary_id | ObjectId | Client glossary to use for translation |
| retry_count | int | Number of task retries |
| error | string | Last error message |
| created_at | datetime | ISO 8601 |
| updated_at | datetime | ISO 8601 |
| completed_at | datetime | ISO 8601 |

**LanguageQCState (per-language, nested in `language_qc`):**

| Field | Type | Description |
|-------|------|-------------|
| status | string | `pending`, `assigned`, `approved`, `rejected`, `feedback_requested` |
| linguist_id | ObjectId | Assigned linguist (nullable) |
| assigned_at | datetime | When linguist was assigned |
| reviewed_at | datetime | When approved/rejected |
| reason | string | Rejection or feedback reason |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| Primary | `_id` | Document lookup |
| org_status | `org_id` + `status` | List jobs by org and status |
| client | `client_user_id` | Client's own jobs |
| created | `created_at` (desc) | Time-sorted listing |
| status | `status` | Status-filtered queries |

---

### `users`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| email | string | Unique, lowercase |
| hashed_password | string | bcrypt hash (null for SSO-only users) |
| role | string | `client`, `reviewer`, `linguist`, `pm`, `admin` |
| org_id | ObjectId | Primary organisation |
| is_active | boolean | Account enabled flag |
| microsoft_id | string | Entra ID subject claim (nullable) |
| created_at | datetime | |
| updated_at | datetime | |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| email_unique | `email` (unique) | Login lookup |
| org | `org_id` | Members-of-org query |
| microsoft | `microsoft_id` (sparse) | SSO user lookup |

---

### `organizations`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| name | string | Organisation display name |
| slug | string | URL-safe identifier |
| member_ids | array[ObjectId] | User IDs in this org |
| created_at | datetime | |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| slug_unique | `slug` (unique) | Org lookup by slug |

---

### `glossaries`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| org_id | ObjectId | Owning organisation |
| name | string | Glossary display name |
| terms | array | Array of GlossaryTerm documents |
| created_at | datetime | |
| updated_at | datetime | |

**GlossaryTerm (embedded in `terms`):**

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Term ID |
| source_term | string | Term in source language |
| target_language | string | BCP-47 code |
| preferred_translation | string | Required translation |
| context | string | Usage notes (optional) |
| embedding | array[float] | Vector embedding for similarity search |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| org | `org_id` | List org glossaries |
| vector | `terms.embedding` (Atlas Vector Search) | Similarity retrieval |

**Atlas Vector Search index name:** `glossary_embedding_index`

---

### `vtt_versions`

Immutable version snapshots created before each VTT save.

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| job_id | ObjectId | Parent job |
| language | string | Language code |
| version_number | int | Sequential version number |
| content | string | Full VTT file content at time of snapshot |
| author_id | ObjectId | User who made the change |
| created_at | datetime | Snapshot timestamp |
| diff_from_prev | string | Diff against previous version (optional) |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| job_lang | `job_id` + `language` + `version_number` | Version history listing |
| job_lang_created | `job_id` + `language` + `created_at` (desc) | Time-sorted history |

---

### `audit_logs`

Immutable audit trail for all reviewer, linguist, and PM actions.

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| actor_id | ObjectId | User performing the action |
| actor_email | string | Denormalised for readability |
| action | string | Action type enum (see below) |
| job_id | ObjectId | Affected job (nullable) |
| org_id | ObjectId | Organisation context |
| before_state | string | Job status before action |
| after_state | string | Job status after action |
| metadata | object | Action-specific context (reason, language, etc.) |
| created_at | datetime | Event timestamp |

**Action types:**

| Action | Trigger |
|--------|---------|
| `job_approved` | QC approve |
| `job_rejected` | QC reject |
| `qc_feedback_sent` | QC feedback |
| `language_approved` | Language-level QC approve |
| `language_rejected` | Language-level QC reject |
| `linguist_assigned` | PM assigns linguist |
| `vtt_edited` | VTT content saved |
| `vtt_restored` | Version restore |
| `job_retry` | Admin manual retry |
| `user_invited` | PM/Admin invites member |

**Indexes:**

| Index | Fields | Purpose |
|-------|--------|---------|
| job | `job_id` + `created_at` | Per-job audit trail |
| org_created | `org_id` + `created_at` (desc) | Org-level audit log |
| actor | `actor_id` + `created_at` | Per-user action history |

---

### `invitations`

| Field | Type | Description |
|-------|------|-------------|
| _id | ObjectId | Primary key |
| email | string | Invitee email |
| org_id | ObjectId | Org being joined |
| role | string | Role to assign on accept |
| token | string | Unique invite token (hashed) |
| expires_at | datetime | 7-day expiry |
| accepted_at | datetime | Nullable; set on accept |
| created_by | ObjectId | User who sent invite |

---

## Maintenance

**Update triggers:** New collection added, index added or removed, field added to model.

**Verification:** All collections listed here exist in production Atlas. Index names match `backend/app/core/database.py` `create_indexes()` function (currently commented out; indexes were created manually).
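
Because `create_indexes()` is currently commented out, the index tables in this document are the only written record of what production should contain. The sketch below shows one way those tables could be expressed as data and applied with PyMongo-style `create_index` calls. It is not the contents of `database.py`: the `INDEX_SPECS` name and layout are hypothetical, the sort direction is assumed ascending wherever the tables do not say `(desc)`, and the names in the "Index" column are assumed to be the actual MongoDB index names. The automatic `_id` index and the Atlas Vector Search index (`glossary_embedding_index`) are left out, since neither is created through `create_index`.

```python
# Same integer values as pymongo.ASCENDING / pymongo.DESCENDING, so this
# module has no import-time dependency on PyMongo.
ASCENDING, DESCENDING = 1, -1

# Hypothetical layout: collection -> [(index name, key spec, extra options)].
# Directions not marked "(desc)" in the schema tables are ASSUMED ascending.
INDEX_SPECS = {
    "jobs": [
        ("org_status", [("org_id", ASCENDING), ("status", ASCENDING)], {}),
        ("client", [("client_user_id", ASCENDING)], {}),
        ("created", [("created_at", DESCENDING)], {}),
        ("status", [("status", ASCENDING)], {}),
    ],
    "users": [
        ("email_unique", [("email", ASCENDING)], {"unique": True}),
        ("org", [("org_id", ASCENDING)], {}),
        ("microsoft", [("microsoft_id", ASCENDING)], {"sparse": True}),
    ],
    "organizations": [
        ("slug_unique", [("slug", ASCENDING)], {"unique": True}),
    ],
    "glossaries": [
        ("org", [("org_id", ASCENDING)], {}),
    ],
    "vtt_versions": [
        ("job_lang",
         [("job_id", ASCENDING), ("language", ASCENDING),
          ("version_number", ASCENDING)], {}),
        ("job_lang_created",
         [("job_id", ASCENDING), ("language", ASCENDING),
          ("created_at", DESCENDING)], {}),
    ],
    "audit_logs": [
        ("job", [("job_id", ASCENDING), ("created_at", ASCENDING)], {}),
        ("org_created", [("org_id", ASCENDING), ("created_at", DESCENDING)], {}),
        ("actor", [("actor_id", ASCENDING), ("created_at", ASCENDING)], {}),
    ],
}


def create_indexes(db):
    """Apply every spec to `db` and return the names as 'collection.index'.

    `db` is expected to behave like a pymongo Database: `db[name]` yields a
    collection exposing `create_index(keys, **options)`.
    """
    created = []
    for collection, specs in INDEX_SPECS.items():
        for name, keys, options in specs:
            db[collection].create_index(keys, name=name, **options)
            created.append(f"{collection}.{name}")
    return created
```

With PyMongo installed, this could be called as `create_indexes(MongoClient(uri)[db_name])`, where `uri` and `db_name` come from the deployment's configuration. One caution before running anything like it against production: the indexes were created manually, and MongoDB rejects a `create_index` call whose keys match an existing index under a different name, so the existing names should be compared with these tables first.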