Database Schema — Accessible Video Processing Platform
Generated: 2026-05-01
Quick Navigation
Agent Entry
| Signal |
Value |
| Purpose |
Authoritative reference for all MongoDB collections, fields, and indexes |
| Read When |
You need collection structure, field types, or index strategy |
| Skip When |
You need API endpoints → api_spec.md; infrastructure → infrastructure.md |
| Canonical |
Yes |
| Next Docs |
Architecture, API Spec |
| Primary Sources |
backend/app/models/*.py, backend/app/migrations/scripts/ |
Overview
| Detail |
Value |
| Database |
MongoDB Atlas |
| DB name |
accessible_video (env: MONGODB_DB) |
| Driver |
Motor (async) via PyMongo |
| Schema style |
Schema-on-read; Pydantic models enforce shape in application layer |
ER Diagram
erDiagram
users {
ObjectId _id
string email
string hashed_password
string full_name
string role
string auth_provider
bool is_active
array pm_client_ids
array languages
datetime created_at
datetime updated_at
}
jobs {
ObjectId _id
string org_id
string client_id
string project_id
string created_by_user_id
string status
object source
object requested_outputs
object tts_preferences
map language_outputs
object review
map language_qc
object failure
datetime created_at
datetime updated_at
}
organizations {
ObjectId _id
string name
string slug
bool is_active
string plan
datetime created_at
datetime updated_at
}
memberships {
ObjectId _id
string user_id
string organization_id
string role_in_org
array team_ids
datetime created_at
string created_by
}
clients {
ObjectId _id
string name
string slug
bool is_active
datetime created_at
datetime updated_at
}
teams {
ObjectId _id
string name
string client_id
array member_user_ids
datetime created_at
}
projects {
ObjectId _id
string name
string client_id
bool is_active
array default_languages
string default_linguist_id
string default_reviewer_id
datetime created_at
}
glossaries {
ObjectId _id
string client_id
string name
string source_locale
string status
string current_version_id
datetime created_at
string created_by
}
glossary_versions {
ObjectId _id
string glossary_id
int version_number
int term_count
string embedding_status
datetime created_at
}
glossary_terms {
ObjectId _id
string glossary_id
string version_id
string source_term
map translations
}
job_briefs {
ObjectId _id
string organization_id
string project_id
string title
object requested_outputs
string status
string created_by
string job_id
datetime created_at
datetime updated_at
}
vtt_versions {
ObjectId _id
string job_id
string lang
string kind
int version
string content
string gcs_uri
datetime created_at
object created_by
}
share_tokens {
string _id
string job_id
string organization_id
string created_by_user_id
datetime expires_at
bool is_active
}
invitations {
ObjectId _id
string email
string organization_id
string role_in_org
string token_hash
datetime expires_at
datetime accepted_at
}
review_notes {
ObjectId _id
string job_id
string asset_key
float timestamp_seconds
string content
string user_id
datetime created_at
}
audit_logs {
ObjectId _id
string action
string user_id
string target_type
string target_id
object details
datetime timestamp
}
users ||--o{ memberships : "belongs to org via"
organizations ||--o{ memberships : "has"
users ||--o{ jobs : "creates"
clients ||--o{ jobs : "owns"
projects ||--o{ jobs : "groups"
clients ||--o{ teams : "has"
clients ||--o{ projects : "has"
clients ||--o{ glossaries : "owns"
glossaries ||--o{ glossary_versions : "has"
glossary_versions ||--o{ glossary_terms : "contains"
jobs ||--o{ vtt_versions : "has"
jobs ||--o{ review_notes : "has"
jobs ||--o{ share_tokens : "has"
organizations ||--o{ job_briefs : "submits"
Collections
users
| Field |
Type |
Required |
Notes |
_id |
ObjectId |
Yes |
Primary key |
email |
string |
Yes |
Unique; login identity |
hashed_password |
string |
No |
Null for Microsoft SSO users |
full_name |
string |
Yes |
Display name |
role |
string |
Yes |
Enum: client, linguist, reviewer, production, project_manager, admin |
auth_provider |
string |
Yes |
Enum: local, microsoft |
is_active |
bool |
Yes |
Soft-disable without deleting |
pm_client_ids |
string[] |
No |
Client IDs where user is Project Manager |
languages |
string[] |
No |
BCP-47 codes (linguist/reviewer competency) |
created_at |
datetime |
Yes |
|
updated_at |
datetime |
No |
|
Indexes: email (unique), role, is_active, created_at (desc), auth_provider
jobs
| Field |
Type |
Required |
Notes |
_id |
ObjectId |
Yes |
Primary key |
org_id |
string |
No |
Owning organisation |
client_id |
string |
No |
Owning client |
project_id |
string |
No |
Owning project |
created_by_user_id |
string |
Yes |
Uploader |
status |
string |
Yes |
See Job Status Enum below |
source |
object |
Yes |
{filename, gcs_uri, duration_s, language, detected_language} |
requested_outputs |
object |
Yes |
{captions_vtt, audio_description_vtt, audio_description_mp3, accessible_video_mp4, languages[], tts_preferences} |
tts_preferences |
object |
No |
{provider, default_voice, voices_per_language, model, speed, style_preset} |
language_outputs |
map |
No |
Key: BCP-47 lang code → LangOutput object |
review |
object |
No |
{notes, reviewer_id, history[]} |
language_qc |
map |
No |
Key: BCP-47 lang code → LanguageQCState |
failure |
object |
No |
{step, type, message, retriable, occurred_at, retry_count} |
created_at |
datetime |
Yes |
|
updated_at |
datetime |
No |
|
LangOutput sub-document:
| Field |
Type |
Notes |
captions_vtt_gcs |
string |
GCS URI |
sdh_captions_vtt_gcs |
string |
SDH captions (speaker labels, sound effects, music) |
ad_vtt_gcs |
string |
Audio description VTT |
ad_mp3_gcs |
string |
Audio description MP3 |
accessible_video_gcs |
string |
Rendered accessible MP4 |
accessible_video_method |
string |
overlay or pause_insert |
retimed_captions_vtt_gcs |
string |
Re-timed captions for pause-insert |
ad_cue_manifest |
array |
Per-cue: {cue_index, gcs_uri, text, duration_s} |
accessible_video_edit_state |
object |
Pause points, video segments, TTS regen queue |
descriptive_transcript_gcs |
string |
WCAG combined transcript |
origin |
string |
translate, transcreate, gemini_translate, video_native |
qa_notes |
string |
QA reviewer notes |
LanguageQCState sub-document:
| Field |
Type |
Notes |
status |
string |
pending, in_progress, pending_review, in_review, approved, rejected |
assigned_linguist_id |
string |
|
assigned_linguist_email |
string |
|
assigned_reviewer_id |
string |
|
assigned_reviewer_email |
string |
|
linguist_deadline |
datetime |
|
reviewer_deadline |
datetime |
|
total_cues |
int |
Set when reviewer opens |
reviewed_cues |
int |
Incremented as cues are reviewed |
reject_category |
string |
timing, mistranslation, terminology, profanity, length |
history |
array |
LanguageQCEvent[] |
comments |
array |
LanguageQCComment[] |
Job Status Enum:
| Status |
Meaning |
created |
Job record created, upload pending |
ingesting |
Celery worker downloading/preparing video |
ai_processing |
Gemini 2.5 Pro generating VTT |
pending_qc |
Ready for QC review |
approved_english |
English source VTT approved |
approved_source |
Non-English source VTT approved |
rejected |
Rejected at QC |
qc_feedback |
Returned with QC notes |
translating |
Google Translate running |
tts_generating |
TTS synthesis in progress |
tts_failed |
Legacy; use processing_failed + failure.step="tts" for new failures |
rendering_video |
Accessible video render in progress |
render_failed |
Legacy failure status |
processing_failed |
Unified failure — see failure.step |
rendering_qc |
Re-rendering during QC |
pending_final_review |
PM final sign-off required |
completed |
Fully delivered |
cancelled |
Manually cancelled |
Indexes: (status, created_at desc), org_id, client_id, project_id, created_by_user_id
organizations
| Field |
Type |
Required |
Notes |
_id |
ObjectId |
Yes |
|
name |
string |
Yes |
|
slug |
string |
Yes |
URL-safe identifier |
is_active |
bool |
Yes |
|
plan |
string |
Yes |
Default: standard |
created_at |
datetime |
Yes |
|
updated_at |
datetime |
No |
|
memberships
| Field |
Type |
Required |
Notes |
_id |
ObjectId |
Yes |
|
user_id |
string |
Yes |
|
organization_id |
string |
Yes |
|
role_in_org |
string |
Yes |
owner, admin, manager, member, viewer |
team_ids |
string[] |
No |
Teams within the org |
created_at |
datetime |
No |
|
created_by |
string |
No |
|
Indexes: (user_id, organization_id) unique, organization_id, user_id
clients
| Field |
Type |
Notes |
_id |
ObjectId |
|
name |
string |
|
slug |
string |
|
is_active |
bool |
|
created_at |
datetime |
|
updated_at |
datetime |
|
teams
| Field |
Type |
Notes |
_id |
ObjectId |
|
name |
string |
|
client_id |
string |
FK → clients |
member_user_ids |
string[] |
User IDs |
created_at |
datetime |
|
projects
| Field |
Type |
Notes |
_id |
ObjectId |
|
name |
string |
|
client_id |
string |
FK → clients |
is_active |
bool |
|
default_languages |
string[] |
BCP-47 language codes |
default_linguist_id |
string |
|
default_reviewer_id |
string |
|
created_at |
datetime |
|
updated_at |
datetime |
|
glossaries
| Field |
Type |
Notes |
_id |
ObjectId |
|
client_id |
string |
FK → clients |
name |
string |
|
description |
string |
Optional |
source_locale |
string |
BCP-47, e.g. en-GB |
source |
string |
xlsx_upload or fraze_api |
status |
string |
active, archived |
current_version_id |
string |
FK → glossary_versions |
created_at |
datetime |
|
created_by |
string |
user_id |
glossary_versions
| Field |
Type |
Notes |
_id |
ObjectId |
|
glossary_id |
string |
FK → glossaries |
version_number |
int |
Monotonically increasing |
source_xlsx_gcs_path |
string |
Original XLSX on GCS |
term_count |
int |
|
embedded_count |
int |
Terms with embeddings |
embedding_status |
string |
pending, in_progress, done, failed |
created_at |
datetime |
|
created_by |
string |
user_id |
change_note |
string |
Optional version note |
glossary_terms
| Field |
Type |
Notes |
_id |
ObjectId |
|
glossary_id |
string |
FK → glossaries |
version_id |
string |
FK → glossary_versions |
| source term |
string |
Source language term |
| translations |
map |
locale → translated string |
job_briefs
| Field |
Type |
Notes |
_id |
ObjectId |
|
organization_id |
string |
FK → organizations |
project_id |
string |
Optional FK → projects |
title |
string |
|
description |
string |
Optional |
requested_outputs |
object |
Same shape as jobs.requested_outputs |
languages |
string[] |
Requested target languages |
deadline |
datetime |
Optional |
status |
string |
draft, submitted, approved, rejected, fulfilled |
created_by |
string |
user_id |
job_id |
string |
FK → jobs (set when brief is fulfilled) |
created_at |
datetime |
|
updated_at |
datetime |
|
submitted_at |
datetime |
|
approved_by |
string |
PM user_id |
reject_reason |
string |
|
vtt_versions
| Field |
Type |
Notes |
_id |
ObjectId |
|
job_id |
string |
FK → jobs |
lang |
string |
BCP-47 language code |
kind |
string |
captions or ad |
version |
int |
Monotonically increasing per (job_id, lang, kind) |
content |
string |
Full VTT file content |
gcs_uri |
string |
Backup copy on GCS |
created_at |
datetime |
|
created_by |
object |
{user_id, user_email} |
note |
string |
Editor note |
parent_version |
int |
Version this was derived from |
cue_count |
int |
|
byte_size |
int |
|
share_tokens
| Field |
Type |
Notes |
_id |
string |
Token itself (32 hex chars) — used as PK |
job_id |
string |
FK → jobs |
organization_id |
string |
FK → organizations |
created_by_user_id |
string |
|
created_by_email |
string |
Denormalised |
created_at |
datetime |
|
expires_at |
datetime |
Null = never expires |
is_active |
bool |
|
label |
string |
Human note |
invitations
| Field |
Type |
Notes |
_id |
ObjectId |
|
email |
string |
Invitee email |
organization_id |
string |
FK → organizations |
role_in_org |
string |
OrgRole enum |
target_team_ids |
string[] |
Teams to join on accept |
token_hash |
string |
SHA-256 hash of invitation token |
invited_by_user_id |
string |
|
expires_at |
datetime |
|
accepted_at |
datetime |
Null if pending |
revoked_at |
datetime |
Null if active |
created_at |
datetime |
|
Indexes: token_hash, organization_id, email, TTL on expires_at
review_notes
| Field |
Type |
Notes |
_id |
ObjectId |
|
job_id |
string |
FK → jobs |
asset_key |
string |
e.g. en, es, en_accessible |
timestamp_seconds |
float |
Video timestamp when note was placed |
content |
string |
Note text |
user_id |
string |
Author |
user_name |
string |
Denormalised display name |
created_at |
datetime |
|
updated_at |
datetime |
|
audit_logs
| Field |
Type |
Notes |
_id |
ObjectId |
|
action |
string |
Dotted enum, e.g. auth.login.success, job.approve |
user_id |
string |
Actor |
target_type |
string |
e.g. job, user |
target_id |
string |
Target resource ID |
details |
object |
Freeform context |
timestamp |
datetime |
|
Indexes: timestamp (desc), user_id, action, (target_type, target_id)
migrations
Internal collection managed by backend/app/migrations/migrator.py.
| Field |
Type |
Notes |
_id |
ObjectId |
|
version |
string |
Migration script name/timestamp |
applied_at |
datetime |
|
Indexes: version (unique), applied_at (desc)
GCS File Layout
All binary assets are stored in GCS; MongoDB stores only GCS URIs.
gs://{GCS_BUCKET}/jobs/{job_id}/
source.mp4
{lang}/
captions.vtt
ad.vtt
ad.mp3
accessible.mp4
sdh_captions.vtt
descriptive_transcript.txt
cues/
cue_{n}.mp3
Maintenance
Last Updated: 2026-05-01
Update Triggers:
- New model file added in
backend/app/models/
- New migration script that adds fields or collections
- Field added/removed from existing Pydantic model
- New index created in a migration
Verification: