Database Schema — Accessible Video Processing Platform
Database: MongoDB Atlas
Database name: configured via MONGODB_DB env var (default: accessible_video)
Collections
jobs
Central document for each video accessibility job.

| Field | Type | Description |
|---|---|---|
| _id | ObjectId | Primary key |
| org_id | ObjectId | Owning organisation |
| client_user_id | ObjectId | User who uploaded the video |
| status | string | JobStatus enum (16 values; see architecture.md) |
| source_language | string | BCP-47 code (e.g., en-US) |
| requested_outputs | array[string] | Output language codes |
| source | object | { gcs_path, filename, duration_seconds } |
| outputs | object | Per-language { captions_vtt, ad_vtt, ad_mp3, accessible_mp4 } GCS paths |
| review | object | QC state { reviewer_id, approved_at, rejected_at, reason } |
| language_qc | object | Per-language QC state (see LanguageQCState below) |
| vtt_versions | array | Version snapshot references (see vtt_versions collection) |
| glossary_id | ObjectId | Client glossary to use for translation |
| retry_count | int | Number of task retries |
| error | string | Last error message |
| created_at | datetime | ISO 8601 |
| updated_at | datetime | ISO 8601 |
| completed_at | datetime | ISO 8601 |

LanguageQCState (per-language, nested in language_qc):

| Field | Type | Description |
|---|---|---|
| status | string | pending, assigned, approved, rejected, feedback_requested |
| linguist_id | ObjectId | Assigned linguist (nullable) |
| assigned_at | datetime | When linguist was assigned |
| reviewed_at | datetime | When approved/rejected |
| reason | string | Rejection or feedback reason |

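
As an illustration of the nested structure, here is a minimal Python sketch of LanguageQCState. It assumes that language_qc is keyed by output language code and that every requested language starts in the pending state; neither detail is stated in this document. The helper name new_language_qc is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

# Status values copied from the LanguageQCState table.
LANGUAGE_QC_STATUSES = {
    "pending", "assigned", "approved", "rejected", "feedback_requested",
}


@dataclass
class LanguageQCState:
    status: str = "pending"
    linguist_id: Optional[str] = None      # ObjectId in MongoDB; str keeps the sketch stdlib-only
    assigned_at: Optional[datetime] = None
    reviewed_at: Optional[datetime] = None
    reason: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject any status that is not listed in the table.
        if self.status not in LANGUAGE_QC_STATUSES:
            raise ValueError(f"unknown language QC status: {self.status!r}")


def new_language_qc(requested_outputs: List[str]) -> Dict[str, LanguageQCState]:
    """Hypothetical helper: one pending QC state per requested output language."""
    return {lang: LanguageQCState() for lang in requested_outputs}
```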
Indexes:

| Index | Fields | Purpose |
|---|---|---|
| Primary | _id | Document lookup |
| org_status | org_id + status | List jobs by org and status |
| client | client_user_id | Client's own jobs |
| created | created_at (desc) | Time-sorted listing |
| status | status | Status-filtered queries |

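
The jobs indexes above could be declared roughly as follows. This is a sketch, not the contents of create_indexes(): it assumes the Index column holds the actual index names, and it accepts any object that exposes pymongo's Collection.create_index(keys, name=...) method. The Primary index on _id is created by MongoDB automatically, so it is not listed.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING

# (index name, key specification) pairs taken from the jobs index table.
JOB_INDEXES = [
    ("org_status", [("org_id", ASCENDING), ("status", ASCENDING)]),
    ("client", [("client_user_id", ASCENDING)]),
    ("created", [("created_at", DESCENDING)]),
    ("status", [("status", ASCENDING)]),
]


def create_job_indexes(jobs_collection):
    """Create each index and return the names reported by create_index."""
    return [
        jobs_collection.create_index(keys, name=name)
        for name, keys in JOB_INDEXES
    ]
```

With a real connection, jobs_collection would be something like db["jobs"] on a pymongo Database.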
users

| Field | Type | Description |
|---|---|---|
| _id | ObjectId | Primary key |
| email | string | Unique, lowercase |
| hashed_password | string | bcrypt hash (null for SSO-only users) |
| role | string | client, reviewer, linguist, pm, admin |
| org_id | ObjectId | Primary organisation |
| is_active | boolean | Account enabled flag |
| microsoft_id | string | Entra ID subject claim (nullable) |
| created_at | datetime | |
| updated_at | datetime | |

Indexes:

| Index | Fields | Purpose |
|---|---|---|
| email_unique | email (unique) | Login lookup |
| org | org_id | Members-of-org query |
| microsoft | microsoft_id (sparse) | SSO user lookup |

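
A small sketch of how a user document might be prepared so that it plays well with these indexes. The function names and the is_active default are assumptions. One MongoDB detail worth noting: a sparse index skips documents where the field is absent, but it still indexes documents where the field is stored as an explicit null, so this sketch omits microsoft_id entirely for password-only users.

```python
from datetime import datetime, timezone

# Role values copied from the users table.
USER_ROLES = {"client", "reviewer", "linguist", "pm", "admin"}


def normalise_email(email: str) -> str:
    """Lowercase and trim so the unique email index treats case variants as one address."""
    return email.strip().lower()


def build_user_document(email, role, org_id, hashed_password=None, microsoft_id=None):
    """Hypothetical helper that assembles a users document."""
    if role not in USER_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    now = datetime.now(timezone.utc)
    doc = {
        "email": normalise_email(email),
        "hashed_password": hashed_password,  # None for SSO-only users
        "role": role,
        "org_id": org_id,
        "is_active": True,                   # assumed default
        "created_at": now,
        "updated_at": now,
    }
    # Leave the key out instead of storing null, so the sparse index skips this user.
    if microsoft_id is not None:
        doc["microsoft_id"] = microsoft_id
    return doc
```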
organizations

| Field | Type | Description |
|---|---|---|
| _id | ObjectId | Primary key |
| name | string | Organisation display name |
| slug | string | URL-safe identifier |
| member_ids | array[ObjectId] | User IDs in this org |
| created_at | datetime | |

Indexes:

| Index | Fields | Purpose |
|---|---|---|
| slug_unique | slug (unique) | Org lookup by slug |

glossaries

| Field | Type | Description |
|---|---|---|
| _id | ObjectId | Primary key |
| org_id | ObjectId | Owning organisation |
| name | string | Glossary display name |
| terms | array | Array of GlossaryTerm documents |
| created_at | datetime | |
| updated_at | datetime | |

GlossaryTerm (embedded in terms):

| Field | Type | Description |
|---|---|---|
| _id | ObjectId | Term ID |
| source_term | string | Term in source language |
| target_language | string | BCP-47 code |
| preferred_translation | string | Required translation |
| context | string | Usage notes (optional) |
| embedding | array[float] | Vector embedding for similarity search |

Indexes:

| Index | Fields | Purpose |
|---|---|---|
| org | org_id | List org glossaries |
| vector | terms.embedding (Atlas Vector Search) | Similarity retrieval |

Atlas Vector Search index name: glossary_embedding_index
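
For orientation, a similarity query against this index would typically be an aggregation pipeline that starts with a $vectorSearch stage. The sketch below only builds the pipeline; the function name, the default limits, and the post-search org filter are assumptions. Whether a vector path inside an embedded array such as terms.embedding can be queried this way depends on how the Atlas index is defined, which this document does not describe.

```python
def glossary_search_pipeline(query_vector, org_id, limit=5, num_candidates=100):
    """Build an aggregation pipeline for glossary similarity retrieval (hypothetical helper)."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least as large as limit")
    return [
        {
            "$vectorSearch": {
                "index": "glossary_embedding_index",  # index name from this document
                "path": "terms.embedding",            # field from the vector index row
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Filtering after the search can return fewer than `limit` results.
        {"$match": {"org_id": org_id}},
        {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

The returned list would be passed to the glossaries collection's aggregate() method.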
vtt_versions
Immutable version snapshots created before each VTT save.

| Field | Type | Description |
|---|---|---|
| _id | ObjectId | Primary key |
| job_id | ObjectId | Parent job |
| language | string | Language code |
| version_number | int | Sequential version number |
| content | string | Full VTT file content at time of snapshot |
| author_id | ObjectId | User who made the change |
| created_at | datetime | Snapshot timestamp |
| diff_from_prev | string | Diff against previous version (optional) |

Indexes:

| Index | Fields | Purpose |
|---|---|---|
| job_lang | job_id + language + version_number | Version history listing |
| job_lang_created | job_id + language + created_at (desc) | Time-sorted history |

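
To make the snapshot fields concrete, here is a sketch of how a vtt_versions document might be assembled before a save. It works on plain dicts in memory. The numbering scheme (starting at 1, per job and language) and the unified-diff format for diff_from_prev are assumptions; the document only says the number is sequential and the diff is optional.

```python
import difflib
from datetime import datetime, timezone


def build_snapshot(job_id, language, content, previous_snapshots, author_id):
    """Build the next vtt_versions document for one job/language track."""
    same_track = [
        s for s in previous_snapshots
        if s["job_id"] == job_id and s["language"] == language
    ]
    latest = max(same_track, key=lambda s: s["version_number"], default=None)
    prev_content = latest["content"] if latest else ""
    # Line-based unified diff; empty when the content is unchanged.
    diff = "".join(
        difflib.unified_diff(
            prev_content.splitlines(keepends=True),
            content.splitlines(keepends=True),
            fromfile="previous",
            tofile="current",
        )
    )
    return {
        "job_id": job_id,
        "language": language,
        "version_number": latest["version_number"] + 1 if latest else 1,
        "content": content,
        "author_id": author_id,
        "created_at": datetime.now(timezone.utc),
        "diff_from_prev": diff or None,  # optional field: None when nothing changed
    }
```

Because snapshots are immutable, a caller would insert the returned document and never update it afterwards.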
audit_logs
Immutable audit trail for all reviewer, linguist, and PM actions.

| Field | Type | Description |
|---|---|---|
| _id | ObjectId | Primary key |
| actor_id | ObjectId | User performing the action |
| actor_email | string | Denormalised for readability |
| action | string | Action type enum (see below) |
| job_id | ObjectId | Affected job (nullable) |
| org_id | ObjectId | Organisation context |
| before_state | string | Job status before action |
| after_state | string | Job status after action |
| metadata | object | Action-specific context (reason, language, etc.) |
| created_at | datetime | Event timestamp |

Action types:

| Action | Trigger |
|---|---|
| job_approved | QC approve |
| job_rejected | QC reject |
| qc_feedback_sent | QC feedback |
| language_approved | Language-level QC approve |
| language_rejected | Language-level QC reject |
| linguist_assigned | PM assigns linguist |
| vtt_edited | VTT content saved |
| vtt_restored | Version restore |
| job_retry | Admin manual retry |
| user_invited | PM/Admin invites member |

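
The action types and audit_logs fields can be sketched together in Python. The enum values come from the table above; the class name AuditAction and the helper build_audit_entry are hypothetical, as is the choice to reject unknown actions with a ValueError.

```python
from datetime import datetime, timezone
from enum import Enum


class AuditAction(str, Enum):
    JOB_APPROVED = "job_approved"
    JOB_REJECTED = "job_rejected"
    QC_FEEDBACK_SENT = "qc_feedback_sent"
    LANGUAGE_APPROVED = "language_approved"
    LANGUAGE_REJECTED = "language_rejected"
    LINGUIST_ASSIGNED = "linguist_assigned"
    VTT_EDITED = "vtt_edited"
    VTT_RESTORED = "vtt_restored"
    JOB_RETRY = "job_retry"
    USER_INVITED = "user_invited"


def build_audit_entry(actor_id, actor_email, action, org_id, job_id=None,
                      before_state=None, after_state=None, metadata=None):
    """Assemble an audit_logs document; raises ValueError for an unknown action."""
    return {
        "actor_id": actor_id,
        "actor_email": actor_email,
        "action": AuditAction(action).value,  # validates against the enum
        "job_id": job_id,                     # nullable, e.g. for user_invited
        "org_id": org_id,
        "before_state": before_state,
        "after_state": after_state,
        "metadata": metadata or {},
        "created_at": datetime.now(timezone.utc),
    }
```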
Indexes:

| Index | Fields | Purpose |
|---|---|---|
| job | job_id + created_at | Per-job audit trail |
| org_created | org_id + created_at (desc) | Org-level audit log |
| actor | actor_id + created_at | Per-user action history |

invitations

| Field | Type | Description |
|---|---|---|
| _id | ObjectId | Primary key |
| email | string | Invitee email |
| org_id | ObjectId | Org being joined |
| role | string | Role to assign on accept |
| token | string | Unique invite token (hashed) |
| expires_at | datetime | 7-day expiry |
| accepted_at | datetime | Nullable; set on accept |
| created_by | ObjectId | User who sent invite |

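
The token and expiry fields suggest a flow along the following lines. This is a sketch under assumptions: the document does not say which hash is used, so SHA-256 stands in here, and the function names are hypothetical. Only the hash is stored; the raw token would go into the invite link.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

INVITE_TTL = timedelta(days=7)  # 7-day expiry from the table


def hash_token(raw_token: str) -> str:
    # Assumed hashing scheme; the real backend may use something else.
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


def new_invitation(email, org_id, role, created_by, now=None):
    """Return (raw_token, document). The raw token is never stored."""
    now = now or datetime.now(timezone.utc)
    raw_token = secrets.token_urlsafe(32)
    doc = {
        "email": email,
        "org_id": org_id,
        "role": role,
        "token": hash_token(raw_token),
        "expires_at": now + INVITE_TTL,
        "accepted_at": None,
        "created_by": created_by,
    }
    return raw_token, doc


def is_redeemable(doc, presented_token, now=None):
    """True if the invite is unused, unexpired, and the token matches."""
    now = now or datetime.now(timezone.utc)
    return (
        doc["accepted_at"] is None
        and now < doc["expires_at"]
        and secrets.compare_digest(doc["token"], hash_token(presented_token))
    )
```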
Maintenance
Update triggers: a collection is added, an index is added or removed, or a field is added to a model.
Verification: All collections listed here exist in production Atlas. Index names match those in the create_indexes() function in backend/app/core/database.py. That function is currently commented out; the indexes were created manually.
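
Since the indexes were created manually, a periodic check against this document can catch drift. The sketch below compares the index names listed here with what a database reports. It assumes db behaves like a pymongo Database (db[name].index_information() returns a dict keyed by index name); the function name missing_indexes is hypothetical. The Atlas Vector Search index is left out because search indexes are not reported by index_information().

```python
# Index names per collection, copied from the index tables in this document.
EXPECTED_INDEXES = {
    "jobs": {"org_status", "client", "created", "status"},
    "users": {"email_unique", "org", "microsoft"},
    "organizations": {"slug_unique"},
    "glossaries": {"org"},
    "vtt_versions": {"job_lang", "job_lang_created"},
    "audit_logs": {"job", "org_created", "actor"},
}


def missing_indexes(db):
    """Return {collection: [missing index names]} for anything not found."""
    report = {}
    for collection_name, expected in EXPECTED_INDEXES.items():
        actual = set(db[collection_name].index_information())
        missing = expected - actual
        if missing:
            report[collection_name] = sorted(missing)
    return report
```

An empty result means every documented index name exists; it does not verify key order, uniqueness, or sparseness.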