vault backup: 2026-04-27 11:11:54

2026-04-27 11:11:54 +01:00 · 2026-04-27 11:11:54 +01:00 · 41e0ee3ea1
commit 41e0ee3ea1
parent e128d9e58a
13 changed files with 890 additions and 1 deletions
--- a/wiki/_master-index.md
+++ b/wiki/_master-index.md
@ -36,5 +36,10 @@ This 3-hop pattern works for hundreds of articles without vector search.
 | [[wiki/reports/_index\|reports/]] | Weekly and monthly knowledge base summaries | 0 |
 | [[wiki/infrastructure/_index\|infrastructure/]] | Server inventory: all 10 SSH hosts — optical, optical-dev, baic, librechat, modocmms, box-cli, aimpress, pve | 9 |

+| [[wiki/architecture/ai-cost-tracker\|architecture: ai-cost-tracker]] | Shared AI cost tracking service — architecture, Workspace→Team→Project, preflight+record SDK, LiteLLM pricing | 1 |
+| [[wiki/tech-patterns/cost-tracker-integration\|tech-patterns: cost-tracker-integration]] | Integration playbook for any Oliver project connecting to ai-cost-tracker (9-step guide + troubleshooting) | 1 |
+| [[wiki/tech-patterns/cost-tracker-pricing-sources\|tech-patterns: cost-tracker-pricing-sources]] | Three-layer pricing pipeline: LiteLLM, YAML, admin override (override has highest priority); historical effective_from/to | 1 |
+| [[wiki/tech-patterns/cost-tracker-providers\|tech-patterns: cost-tracker-providers]] | Billing units per provider: Gemini usage_metadata, ElevenLabs/GCP TTS len(text), future OpenAI/Anthropic | 1 |
+
 <!-- New topic folders added here automatically as they are created -->
 <!-- Format: | [[wiki/topic/_index\|topic/]] | One-line description | N articles | -->
--- a/wiki/architecture/_index.md
+++ b/wiki/architecture/_index.md
@ -29,4 +29,5 @@ Cross-cutting architectural decisions that appear in multiple Oliver projects.
 5. **DEV_AUTH_BYPASS** — skip Azure AD in local dev, always use real auth in production


-| [[wiki/architecture/optical-dev-server-deploy\|optical-dev-server-deploy]] | optical-dev GCP server: single-vhost Apache, Include pattern, port table, deploy script cache | Barclays Banner Builder, all Oliver projects |
+| [[wiki/architecture/optical-dev-server-deploy\|optical-dev-server-deploy]] | optical-dev GCP server: single-vhost Apache, Include pattern, port table, deploy script cache | Barclays Banner Builder, all Oliver projects |
+| [[wiki/architecture/ai-cost-tracker\|ai-cost-tracker]] | Shared AI cost tracking service: Workspace→Team→Project, LiteLLM pricing, preflight+record SDK, hard limits | All Oliver projects |
--- a/wiki/architecture/ai-cost-tracker.md
+++ b/wiki/architecture/ai-cost-tracker.md
@ -0,0 +1,78 @@
+---
+title: AI Cost Tracker — Architecture
+tags: [architecture, ai, cost-tracking, oliver-platform]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# AI Cost Tracker
+
+Centralised **shared service** that tracks AI API spend across all Oliver projects. Every project that calls Gemini, ElevenLabs, Google Cloud TTS, or other paid AI APIs sends usage events here.
+
+## Why it exists
+
+Oliver runs multiple projects (video-accessibility, One2Edit, Box-pipelines …) all consuming paid AI APIs. Without centralised tracking: no visibility into total spend, no per-client cost attribution, no budget enforcement.
+
+## Architecture diagram
+
+```
+┌──────────────────────────────────┐         ┌────────────────────────────────────┐
+│ Any Oliver Project               │         │ ai-cost-tracker                    │
+│ (e.g. video-accessibility)       │ POST    │ git@bitbucket.org:zlalani/...      │
+│                                  │ /v1/    │                                    │
+│  AI call sites                   │ preflt  │ FastAPI + MongoDB + Redis + React  │
+│  (Gemini, ElevenLabs, GCP TTS)   │────────►│  POST /v1/preflight                │
+│              │                   │         │  POST /v1/usage/record             │
+│  oliver_cost_tracker SDK         │         │  POST /v1/users/upsert             │
+│  - preflight(estimate)           │◄────────│  POST /v1/projects/upsert          │
+│  - record(actual)                │         │                                    │
+│  - SQLite outbox + retry         │         │ Admin UI (Microsoft SSO)           │
+└──────────────────────────────────┘         │  Workspaces / Teams / Projects     │
+                                             │  Pricing + LiteLLM auto-sync       │
+                                             │  Budgets + email alerts            │
+                                             │  Dashboard + Pivot analytics       │
+                                             └────────────────────────────────────┘
+```
+
+## Key design decisions
+
+| Decision | Choice | Why |
+|---|---|---|
+| Deployment | Separate repo + separate server | Clean isolation, independent scaling |
+| Org hierarchy | Workspace → Team → Project | Matches Oliver agency structure |
+| User ownership | Each project owns users; lazy mirror in shared | No SSO migration needed |
+| Pricing | LiteLLM auto-sync + YAML (non-LLM) + admin override | Auto-updated for LLMs, manual for chars |
+| Transport | Sync HTTP + SQLite outbox fallback | Never breaks the AI pipeline |
+| Budget enforcement | Hard limits via preflight check | `allow=false` before call is made |
+| Auth (projects) | API key per project | Simple, revocable, auditable |
+| Auth (admins) | Microsoft SSO | Consistent with all Oliver projects |
+
+## Org hierarchy
+
+```
+Workspace  (e.g. "Ford", "H&M", "Oliver Internal")
+  └── Team  (e.g. "Video", "Localization", "QC")
+       └── Project  (e.g. "Ford Q3 Campaign", "H&M Spring 2026")
+            └── Job / event  (individual AI call)
+```
+
+Users live in each project (video-accessibility, etc.) and are **lazily mirrored** into the cost-tracker when their first usage event arrives.
+
+## Tech stack
+
+Mirrors video-accessibility for team familiarity:
+- Backend: **FastAPI + MongoDB Atlas + Redis + Celery**
+- Frontend: **React 18 + Vite (TypeScript)**
+- Auth admin: **Microsoft AAD (MSAL)**
+- Charts: **recharts**
+- Tables/pivot: **@tanstack/react-table**
+
+## Related articles
+
+- [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — how to connect a new project
+- [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]] — how pricing is maintained
+- [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] — billing units per AI provider
+- [[wiki/concepts/preflight-record-pattern|preflight-record-pattern]] — the core usage-tracking pattern
+- [[wiki/concepts/lazy-user-mirror|lazy-user-mirror]] — how user sync works
+- [[wiki/concepts/sync-with-outbox|sync-with-outbox]] — resilient HTTP calls with SQLite fallback
+- [[wiki/projects-overview/ai-cost-tracker|ai-cost-tracker (project card)]] — project registry card
--- a/wiki/concepts/_index.md
+++ b/wiki/concepts/_index.md
@ -50,5 +50,10 @@
 | [[wiki/concepts/fish-fisher-conf-d-conflict]] | Fisher plugin manager conflict when conf.d/ files are manually copied — delete and run fisher update to reinstall cleanly | daily/2026-04-22.md | 2026-04-22 |
 | [[wiki/concepts/macos-python-version-hooks]] | macOS system /usr/bin/python3 = 3.9; Path\|None syntax requires 3.10+; Claude Code hooks must use /opt/homebrew/bin/python3 | daily/2026-04-22.md | 2026-04-22 |

+| [[wiki/concepts/preflight-record-pattern]] | preflight(estimate) → AI call → record(actual) — the 3-step AI cost-tracking pattern with budget enforcement | ai-cost-tracker | 2026-04-27 |
+| [[wiki/concepts/lazy-user-mirror]] | User mirror created on first AI event, not on user creation — minimal integration surface, source project stays owner | ai-cost-tracker | 2026-04-27 |
+| [[wiki/concepts/sync-with-outbox]] | Sync HTTP + SQLite outbox: record() never blocks the AI pipeline; background flusher retries up to 10x | ai-cost-tracker | 2026-04-27 |
+| [[wiki/concepts/litellm-pricing-source]] | LiteLLM model_prices JSON as auto-updating LLM price source — why scraping provider sites is fragile | ai-cost-tracker | 2026-04-27 |
+
 <!-- Articles added automatically by compile.py -->
 <!-- Format: | [[concepts/slug]] | One-line summary | daily/YYYY-MM-DD.md | date | -->
--- a/wiki/concepts/lazy-user-mirror.md
+++ b/wiki/concepts/lazy-user-mirror.md
@ -0,0 +1,58 @@
+---
+title: Lazy User Mirror
+tags: [concept, architecture, cost-tracking, users]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# Lazy User Mirror
+
+The cost-tracker maintains a read-only mirror of users from each connected project. "Lazy" means the mirror record is created **on first use**, not when the user is created in the source project.
+
+## Why lazy (not eager)?
+
+- Source projects own their users; we don't want to add mandatory webhooks to every user CRUD operation
+- A user who never triggers an AI call doesn't need to exist in the mirror
+- Keeps integration surface minimal — just send `user_external_id` with every event
+
+## How it works
+
+1. When `POST /v1/preflight` or `POST /v1/usage/record` arrives with a `user_external_id` not yet in `users_mirror` → cost-tracker **auto-creates** a mirror record using `email`, `full_name`, and `role` from the request payload.
+
+2. The SDK sends these on every call:
+   ```python
+   await ct.preflight(
+       user_external_id=user.id,
+       user_email=user.email,          # for mirror creation
+       user_full_name=user.full_name,  # for analytics display
+       user_role=user.role,            # for RBAC in analytics
+       ...
+   )
+   ```
+
+3. `users_mirror` record fields: `source_app`, `external_id`, `email`, `full_name`, `role`, `workspace_id` (from project lookup), `team_id`, `first_seen_at`, `last_seen_at`.
+
+4. Subsequent calls with the same `user_external_id` update `last_seen_at` only — no overwrite of profile fields. Profile changes are pushed via explicit `POST /v1/users/upsert`.
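+
+The create-once, touch-afterwards behaviour in steps 1 and 4 can be sketched as follows. This is a minimal in-memory illustration, not the service's code: `MirrorUser`, `touch_user`, and the dict standing in for `users_mirror` are hypothetical names, and only a subset of the record fields is modelled.
+
+```python
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Optional
+
+
+@dataclass
+class MirrorUser:
+    source_app: str
+    external_id: str
+    email: Optional[str]
+    full_name: Optional[str]
+    role: Optional[str]
+    first_seen_at: datetime
+    last_seen_at: datetime
+
+
+# Hypothetical stand-in for the users_mirror collection, keyed by (source_app, external_id).
+users_mirror: dict = {}
+
+
+def touch_user(source_app, external_id, email=None, full_name=None, role=None, now=None):
+    """Create the mirror record on first sight; afterwards only bump last_seen_at."""
+    now = now or datetime.now(timezone.utc)
+    key = (source_app, external_id)
+    user = users_mirror.get(key)
+    if user is None:
+        user = MirrorUser(source_app, external_id, email, full_name, role, now, now)
+        users_mirror[key] = user
+    else:
+        # Profile fields are deliberately left untouched; only the timestamp moves.
+        user.last_seen_at = now
+    return user
+```
+
+Calling `touch_user` twice with a different `email` leaves the first email in place, which is why profile changes need the explicit upsert.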
+
+## Explicit sync for profile changes
+
+If a user's email, role, or name changes in the source project:
+
+```python
+await ct.upsert_user(
+    external_id=user.id,
+    email=user.email,
+    full_name=user.full_name,
+    role=user.role,
+    project_external_id=...,
+)
+```
+
+In video-accessibility this is called from:
+- `POST /admin/users` — on user creation (`routes_admin.py:101`)
+- Login flow — on every successful login (keeps mirror fresh)
+
+## Analytics implications
+
+- Users who have never triggered an AI event, and were never pushed via an explicit `upsert_user` call, do not appear in the mirror → analytics shows only users the cost-tracker has actually seen
+- Orphan users (no `project_external_id` provided) are mirrored but show `workspace_id=null` until assigned via bulk action in admin UI
--- a/wiki/concepts/litellm-pricing-source.md
+++ b/wiki/concepts/litellm-pricing-source.md
@ -0,0 +1,59 @@
+---
+title: LiteLLM as Pricing Source
+tags: [concept, ai, cost-tracking, pricing, llm]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# LiteLLM as Pricing Source
+
+The AI Cost Tracker uses the open-source **LiteLLM model prices JSON** as the primary source of truth for LLM pricing. This eliminates the need to scrape provider websites.
+
+## What is LiteLLM?
+
+[LiteLLM](https://github.com/BerriAI/litellm) is an open-source Python library (30k+ GitHub stars) for calling 100+ LLM providers with a unified interface. It maintains a community-curated `model_prices_and_context_window.json` covering Gemini, OpenAI, Anthropic, Cohere, Mistral, Together AI, and many others.
+
+## Why not scrape provider websites directly?
+
+| Problem | Impact |
+|---|---|
+| Pricing pages are React SPAs | Need headless browser; brittle |
+| Layout changes without notice | Breaks silently; wrong costs logged |
+| Different billing units per provider | Complex parsing; easy to get wrong |
+| Tier/volume discounts in HTML | Nearly impossible to parse reliably |
+| ToS may prohibit scraping | Legal risk |
+
+LiteLLM maintains all of this in a single structured JSON file, which LiteLLM itself relies on for cost calculation in production deployments.
+
+## The JSON structure
+
+```json
+{
+  "gemini/gemini-3-pro-preview": {
+    "input_cost_per_token": 0.00000125,
+    "output_cost_per_token": 0.000005,
+    "litellm_provider": "gemini",
+    "mode": "chat",
+    "max_tokens": 65536
+  }
+}
+```
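+
+As a worked example of how per-token prices turn into a cost, the sketch below multiplies token counts by the two price fields. The `entry` dict copies the illustrative values from the JSON above (not live prices), and `llm_cost_usd` is a hypothetical helper, not a function of LiteLLM or the cost-tracker.
+
+```python
+from decimal import Decimal
+
+# Illustrative entry in the same shape as the JSON above; values are examples only.
+entry = {"input_cost_per_token": 0.00000125, "output_cost_per_token": 0.000005}
+
+
+def llm_cost_usd(entry: dict, input_tokens: int, output_tokens: int) -> Decimal:
+    """Cost = input tokens * input price + output tokens * output price."""
+    # Going through str() avoids carrying binary float noise into the Decimal.
+    input_price = Decimal(str(entry["input_cost_per_token"]))
+    output_price = Decimal(str(entry["output_cost_per_token"]))
+    return input_price * input_tokens + output_price * output_tokens
+
+
+cost = llm_cost_usd(entry, input_tokens=987, output_tokens=180)
+print(cost)  # 0.00213375
+```
+
+`Decimal` is used here so that summing many small per-call costs does not accumulate float rounding error; whether the service does the same is not stated in this article.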
+
+## What LiteLLM does NOT cover
+
+- **ElevenLabs** — not an LLM; character-based billing
+- **Google Cloud TTS** — not an LLM; character-based billing
+- **Self-hosted models** — no external billing
+
+These are defined in `pricing/models.yaml` in the cost-tracker repo. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]].
+
+## Keeping prices accurate
+
+1. Daily Celery beat task (`tasks/pricing_sync.py`) fetches the latest JSON
+2. If a price changes → admin gets notified; new price record created with `effective_from=today`
+3. Old price records kept forever for historical reporting
+4. To freeze at a known-good version: set `LITELLM_COMMIT_HASH` env var
+
+## The alternative considered
+
+Direct website scraping was evaluated and rejected due to the problems listed above. LiteLLM's price file is a widely used, community-maintained alternative for this use case.
--- a/wiki/concepts/preflight-record-pattern.md
+++ b/wiki/concepts/preflight-record-pattern.md
@ -0,0 +1,59 @@
+---
+title: Preflight + Record Pattern
+tags: [concept, ai, cost-tracking, patterns]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# Preflight + Record Pattern
+
+The core usage-tracking pattern used by the AI Cost Tracker SDK. Every paid AI call follows the same three steps.
+
+## The pattern
+
+```
+preflight(estimated_units) → call AI → record(actual_units)
+```
+
+1. **Preflight** — before the AI call, ask the cost-tracker: "Is this workspace/project within budget?"
+   - Input: model name + estimated units (tokens, chars, etc.)
+   - Output: `allow=true/false`, estimated cost, `request_id`
+   - If `allow=false` → raise `BudgetExceeded` before calling the AI API
+
+2. **AI call** — the actual paid API call (unmodified)
+
+3. **Record** — after the call, report actual usage
+   - Input: `request_id` from preflight + actual units from response
+   - Output: `event_id`, `cost_usd`
+   - If cost-tracker is unavailable → SDK saves to SQLite outbox and retries in background (see [[wiki/concepts/sync-with-outbox|sync-with-outbox]])
+
+## Why two steps?
+
+- **Preflight** enables hard budget enforcement **before** money is spent
+- **Record** captures accurate actual usage (estimated ≠ actual for output tokens)
+- Decoupling protects the AI pipeline: if cost-tracker goes down after preflight, `record()` still succeeds via outbox
+
+## Estimation accuracy
+
+Preflight uses estimated units because output token count is unknown before the call:
+
+| Provider | What we estimate | Accuracy |
+|---|---|---|
+| Gemini text | input tokens (`len(text) / 4`), output tokens (caller hint) | ±30% |
+| Gemini video | input tokens (file-size table), output tokens (hint) | ±50% |
+| ElevenLabs | chars (exact — `len(text)`) | 100% |
+| Google TTS | chars (exact — `len(text)`) | 100% |
+
+Over-estimation is better than under-estimation for budget enforcement. If you consistently over-estimate by 50%, tune the default `estimated_output_tokens` hint downward.
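+
+A small sketch of the character-based heuristic and of measuring how far an estimate was off. The internals of the SDK's `estimate_tokens` are not shown in this article, so both functions below are assumptions written for illustration.
+
+```python
+import math
+
+
+def estimate_tokens(text: str) -> int:
+    """Rough heuristic: about four characters per token, rounded up."""
+    return math.ceil(len(text) / 4)
+
+
+def estimation_error(estimated: int, actual: int) -> float:
+    """Signed relative error; positive means the estimate was too high."""
+    return (estimated - actual) / actual
+
+
+# A 2048-token output hint against 1024 actual output tokens is a 100% over-estimate.
+error = estimation_error(2048, 1024)
+```
+
+Logging this error per call is one way to decide when the default output hint should be lowered.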
+
+## Hard limit mechanics
+
+- Preflight computes `current_month_spend + estimated_cost`
+- If this exceeds `budget.amount_usd` AND `budget.hard_limit=True` → `allow=false`
+- The budget check is **eventually consistent** (it reads from pre-aggregated rollups plus today's raw events) rather than transactional, so brief overage is possible under high concurrency
+- This is acceptable for AI cost tracking: exact-to-the-cent enforcement would require distributed locks and add unacceptable latency
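+
+The decision rule in the first two bullets can be written out as below. This is a sketch under the stated rule only: `Budget` and `preflight_allows` are hypothetical names, and the rollup query that produces `current_month_spend` is left out.
+
+```python
+from dataclasses import dataclass
+from decimal import Decimal
+
+
+@dataclass
+class Budget:
+    amount_usd: Decimal
+    hard_limit: bool
+
+
+def preflight_allows(budget: Budget, current_month_spend: Decimal, estimated_cost: Decimal) -> bool:
+    """Deny only when the projected spend exceeds a budget that has hard_limit set."""
+    projected = current_month_spend + estimated_cost
+    if projected > budget.amount_usd and budget.hard_limit:
+        return False
+    return True
+```
+
+A budget with `hard_limit=False` never denies a call; it only feeds the alerting side.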
+
+## Projects using this pattern
+
+- [[wiki/projects-overview/ai-cost-tracker|ai-cost-tracker]] (defines the pattern)
+- video-accessibility (first consumer, Phase 1)
--- a/wiki/concepts/sync-with-outbox.md
+++ b/wiki/concepts/sync-with-outbox.md
@ -0,0 +1,67 @@
+---
+title: Sync HTTP + SQLite Outbox Pattern
+tags: [concept, architecture, resilience, cost-tracking]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# Sync HTTP + SQLite Outbox Pattern
+
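+The cost-tracker SDK uses this pattern so that usage events are **not lost** while the cost-tracker service is temporarily unavailable. Events that still fail after the retry limit are marked `dead` for manual review rather than silently dropped.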
+
+## The problem
+
+The AI pipeline (Celery worker) calls `ct.record(...)` after each AI API call. If the cost-tracker service is down, a naive implementation would either:
+- Silently drop the event (cost data lost)
+- Raise an exception (AI pipeline fails — unacceptable)
+
+## The solution
+
+```
+record() → try POST /v1/usage/record
+              ├── success → done
+              └── failure (timeout / 5xx / network) → save to SQLite outbox
+                                                           ↓
+                                              background flusher (every 30s)
+                                              retries all pending events with
+                                              exponential backoff
+```
+
+## Implementation details
+
+**SQLite outbox** (one file per worker, default `/tmp/cost_outbox.sqlite`):
+- Schema: `(id, ts, payload_json, attempts, last_attempt_at, status)`
+- On failure, the row is written synchronously before `record()` returns
+- The only cost to the AI pipeline is one local SQLite write; `record()` never waits for the remote service to recover
+
+**Background flusher** (asyncio background task):
+- Starts when `CostTracker` is initialised
+- Every 30 seconds: reads all `status='pending'` rows, retries `POST /v1/usage/record`
+- On success: marks `status='sent'`
+- After 10 failed attempts: marks `status='dead'`, logs warning → human investigation needed
+
+**Graceful degradation:**
+- `record()` never raises `CostTrackerUnavailable` — it's fire-and-forget via outbox
+- `preflight()` returns `allow=true` on connectivity failure by default (`fail_open=True`). Configurable.
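+
+The outbox and one flusher pass can be sketched with the standard-library `sqlite3` module. This is an illustration under assumptions, not the SDK's code: the table name `outbox`, the helper names, and the `send` callback are hypothetical, and the exponential backoff between passes is omitted for brevity.
+
+```python
+import json
+import sqlite3
+import time
+
+SCHEMA = """
+CREATE TABLE IF NOT EXISTS outbox (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    ts REAL NOT NULL,
+    payload_json TEXT NOT NULL,
+    attempts INTEGER NOT NULL DEFAULT 0,
+    last_attempt_at REAL,
+    status TEXT NOT NULL DEFAULT 'pending'
+)
+"""
+
+
+def open_outbox(path: str) -> sqlite3.Connection:
+    conn = sqlite3.connect(path)
+    conn.execute(SCHEMA)
+    conn.commit()
+    return conn
+
+
+def enqueue(conn: sqlite3.Connection, payload: dict) -> None:
+    """Called when the direct POST failed: persist the event for a later retry."""
+    conn.execute(
+        "INSERT INTO outbox (ts, payload_json) VALUES (?, ?)",
+        (time.time(), json.dumps(payload)),
+    )
+    conn.commit()
+
+
+def flush_once(conn: sqlite3.Connection, send, max_attempts: int = 10) -> None:
+    """Retry every pending row once; send(payload) returns True on success."""
+    rows = conn.execute(
+        "SELECT id, payload_json, attempts FROM outbox WHERE status = 'pending'"
+    ).fetchall()
+    for row_id, payload_json, attempts in rows:
+        try:
+            ok = bool(send(json.loads(payload_json)))
+        except Exception:
+            ok = False  # network errors count as a failed attempt
+        attempts += 1
+        if ok:
+            status = "sent"
+        elif attempts >= max_attempts:
+            status = "dead"  # needs human investigation
+        else:
+            status = "pending"
+        conn.execute(
+            "UPDATE outbox SET attempts = ?, last_attempt_at = ?, status = ? WHERE id = ?",
+            (attempts, time.time(), status, row_id),
+        )
+    conn.commit()
+```
+
+In a real worker, `flush_once` would be driven by the periodic background task and `send` would wrap the `POST /v1/usage/record` call.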
+
+## Configuration
+
+```python
+ct = CostTracker(
+    ...
+    outbox_path="/tmp/cost_outbox.sqlite",
+    flush_interval_seconds=30,
+    max_retry_attempts=10,
+    fail_open=True,    # preflight returns allow=True when service unreachable
+)
+```
+
+## Monitoring
+
+- Outbox depth reported in SDK's `/metrics` endpoint (if enabled)
+- `dead` status rows require manual review — add to monitoring alert
+
+## Where this pattern applies
+
+- Any Oliver project using the `oliver-cost-tracker` SDK
+- Generally applicable to any fire-and-forget side-effect call where data loss is unacceptable but the consumer must not block the main flow
--- a/wiki/projects-overview/ai-cost-tracker.md
+++ b/wiki/projects-overview/ai-cost-tracker.md
@ -0,0 +1,50 @@
+---
+title: AI Cost Tracker — Project Card
+tags: [projects, oliver-platform, ai, cost-tracking]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# AI Cost Tracker
+
+| Field | Value |
+|---|---|
+| **Repo** | `git@bitbucket.org:zlalani/ai-cost-tracker.git` |
+| **Stack** | FastAPI + MongoDB Atlas + Redis + Celery + React 18 + Vite |
+| **Auth (admin)** | Microsoft AAD (MSAL) |
+| **Auth (API)** | API key per connected project |
+| **Status** | Phase 1 — building (April 2026) |
+| **First consumer** | video-accessibility |
+
+## What it does
+
+Centralised AI cost tracking for all Oliver projects. Every project sends preflight + record events; cost-tracker aggregates, stores, and presents analytics by workspace / team / project / user / model.
+
+See [[wiki/architecture/ai-cost-tracker|ai-cost-tracker architecture]] for full architecture details.
+
+## Key URLs
+
+- Admin UI: TBD (e.g. `https://cost.oliver.agency`)
+- API base: `https://cost.oliver.agency/v1`
+- Health: `https://cost.oliver.agency/v1/health`
+
+## Connected projects
+
+| Project | Source app name | Connected since |
+|---|---|---|
+| video-accessibility | `video-accessibility` | Phase 1 (April 2026) |
+
+## How to connect a new project
+
+See [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — full step-by-step guide.
+
+## Related articles
+
+- [[wiki/architecture/ai-cost-tracker|Architecture overview]]
+- [[wiki/tech-patterns/cost-tracker-integration|Integration guide (new projects)]]
+- [[wiki/tech-patterns/cost-tracker-pricing-sources|Pricing sources]]
+- [[wiki/tech-patterns/cost-tracker-providers|Provider billing units]]
+- [[wiki/concepts/preflight-record-pattern|Preflight + Record pattern]]
+- [[wiki/concepts/lazy-user-mirror|Lazy user mirror]]
+- [[wiki/concepts/sync-with-outbox|Sync HTTP + outbox pattern]]
+- [[wiki/concepts/litellm-pricing-source|LiteLLM as pricing source]]
--- a/wiki/tech-patterns/_index.md
+++ b/wiki/tech-patterns/_index.md
@ -24,6 +24,9 @@ Recurring technology stacks used across Oliver Agency projects. Each article cov
 | [[wiki/tech-patterns/one2edit-api\|one2edit-api]] | One2Edit translation platform API | 3M Portal, H&M O2E Tool |
 | [[wiki/tech-patterns/nodejs-vanilla-proxy\|nodejs-vanilla-proxy]] | Node.js + Vanilla JS lightweight proxy tools | 3M Portal, Ferrero, Homepage |
 | [[wiki/tech-patterns/kling-veo-video-api\|kling-veo-video-api]] | Kling AI + Google Veo 3.1 video generation — camera control, I2V, polling | Cinema Studio Pro |
+| [[wiki/tech-patterns/cost-tracker-integration\|cost-tracker-integration]] | Step-by-step guide: connect any Oliver project to ai-cost-tracker (API key, SDK install, wrap AI calls, budgets) | All Oliver projects |
+| [[wiki/tech-patterns/cost-tracker-pricing-sources\|cost-tracker-pricing-sources]] | Three-layer pricing pipeline: LiteLLM auto-sync, YAML (non-LLM), admin override (override has highest priority); historical effective_from/to | ai-cost-tracker |
+| [[wiki/tech-patterns/cost-tracker-providers\|cost-tracker-providers]] | Billing units per AI provider: Gemini tokens (usage_metadata), ElevenLabs chars, Google TTS chars | All AI projects |

 ## Quick Decision Guide

--- a/wiki/tech-patterns/cost-tracker-integration.md
+++ b/wiki/tech-patterns/cost-tracker-integration.md
@ -0,0 +1,232 @@
+---
+title: AI Cost Tracker — Integrating a New Project
+tags: [how-to, ai, cost-tracking, integration]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# Integrating a New Project with AI Cost Tracker
+
+Step-by-step guide for connecting any Oliver backend project to the shared cost-tracker.
+
+## Prerequisites
+
+- You have a Bitbucket account with access to `zlalani/ai-cost-tracker`
+- You can reach the cost-tracker admin UI (ask for the domain)
+- Your project is a FastAPI + Python backend (adaptable for other stacks)
+
+---
+
+## Step 1 — Get an API key
+
+1. Open the cost-tracker Admin UI → **API Keys** → **Create key**
+2. Name it after your project (e.g. `video-accessibility-prod`)
+3. Scope: `preflight`, `record`, `upsert`
+4. Copy the key — it is shown **only once**
+5. Store in your project's GCP Secret Manager (same pattern as `GEMINI_API_KEY`)
+
+---
+
+## Step 2 — Install the SDK
+
+```bash
+pip install oliver-cost-tracker
+# or, while the private package isn't published yet:
+# git submodule add git@bitbucket.org:zlalani/ai-cost-tracker.git vendor/cost-tracker
+# pip install -e vendor/cost-tracker/sdk/
+```
+
+---
+
+## Step 3 — Add environment variables
+
+```env
+COST_TRACKER_BASE_URL=https://cost.oliver.agency
+COST_TRACKER_API_KEY=ct_live_xxxxxxxxxxxxxxxxxxxx
+COST_TRACKER_SOURCE_APP=video-accessibility
+COST_TRACKER_OUTBOX_PATH=/tmp/cost_outbox.sqlite
+```
+
+---
+
+## Step 4 — Initialise the client
+
+In `core/dependencies.py` (FastAPI):
+
+```python
+from functools import lru_cache
+
+from oliver_cost_tracker import CostTracker
+
+# `settings` is your project's config object; import it from wherever it is defined.
+
+@lru_cache
+def get_cost_tracker() -> CostTracker:
+    return CostTracker(
+        base_url=settings.cost_tracker_base_url,
+        api_key=settings.cost_tracker_api_key,
+        source_app=settings.cost_tracker_source_app,
+        outbox_path=settings.cost_tracker_outbox_path,
+    )
+```
+
+In Celery workers, instantiate once at module level:
+
+```python
+cost_tracker = CostTracker(
+    base_url=settings.cost_tracker_base_url,
+    api_key=settings.cost_tracker_api_key,
+    source_app=settings.cost_tracker_source_app,
+    outbox_path=settings.cost_tracker_outbox_path,
+)
+```
+
+---
+
+## Step 5 — Wrap AI calls (preflight → call → record)
+
+This is the **core pattern**. Every paid AI call follows three steps.
+See [[wiki/concepts/preflight-record-pattern|preflight-record-pattern]] for the full explanation.
+
+```python
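+import time
+
+from google import genai  # assumes the google-genai SDK
+from google.genai.types import GenerateContentResponse
+from oliver_cost_tracker import CostTracker, BudgetExceeded
+
+client = genai.Client()  # picks up the Gemini API key from the environment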
+
+async def call_gemini_with_tracking(
+    ct: CostTracker,
+    prompt: str,
+    user_id: str,
+    job_id: str,
+    project_id: str | None = None,
+) -> GenerateContentResponse:
+
+    # 1. Preflight — checks budget, returns allow/deny
+    preflight = await ct.preflight(
+        user_external_id=user_id,
+        project_external_id=project_id,
+        job_external_id=job_id,
+        model="gemini-3-pro-preview",
+        estimated_input_tokens=ct.estimate_tokens(prompt),
+        estimated_output_tokens=2048,  # conservative overestimate
+    )
+    if not preflight.allow:
+        raise BudgetExceeded(preflight.deny_reason)
+
+    # 2. AI call
+    t0 = time.monotonic()
+    response = await client.aio.models.generate_content(  # async variant in the google-genai SDK
+        model="gemini-3-pro-preview",
+        contents=prompt,
+    )
+    elapsed_ms = int((time.monotonic() - t0) * 1000)
+
+    # 3. Record actual usage
+    await ct.record(
+        request_id=preflight.request_id,
+        user_external_id=user_id,
+        project_external_id=project_id,
+        job_external_id=job_id,
+        model="gemini-3-pro-preview",
+        input_tokens=response.usage_metadata.prompt_token_count,
+        output_tokens=response.usage_metadata.candidates_token_count,
+        latency_ms=elapsed_ms,
+        status="success",
+    )
+    return response
+```
+
+For providers without usage metadata (ElevenLabs, Google Cloud TTS), compute units from input:
+
+```python
+await ct.record(
+    ...
+    model="eleven_multilingual_v2",
+    chars=len(text),  # billing unit is characters
+    latency_ms=elapsed_ms,
+    status="success",
+)
+```
+
+See [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] for all provider details.
+
+---
+
+## Step 6 — Attribution in background tasks (Celery)
+
+Celery tasks typically receive only `job_id`. Fetch `user_id` and `project_id` from the job document:
+
+```python
+import asyncio
+
+@celery_app.task
+def my_ai_task(job_id: str):
+    # Celery does not await coroutines, so the sync task drives the async body.
+    asyncio.run(_run_ai_task(job_id))
+
+async def _run_ai_task(job_id: str):
+    job = await db.jobs.find_one({"_id": job_id}, {"client_id": 1, "project_id": 1})
+    user_id = job["client_id"]
+    project_id = job.get("project_id")   # may be None for old jobs
+
+    result = await call_gemini_with_tracking(
+        ct=cost_tracker,
+        prompt=build_prompt(job),
+        user_id=user_id,
+        job_id=job_id,
+        project_id=project_id,
+    )
+```
+
+For per-cue Celery tasks (like `tts_synthesis.py`), add `user_id` and `project_id` to the task kwargs to avoid an extra DB fetch per cue.
+
+---
+
+## Step 7 — Create Workspace / Team / Project in Admin UI
+
+Before going live, set up the org structure in the cost-tracker Admin UI:
+
+1. **Workspace** → Create (e.g. "Ford", "H&M", or "video-accessibility" for internal use)
+2. **Team** → Create under the workspace (e.g. "Video Production")
+3. **Project** → Create under the team, set `source_app` = your project name and `external_id` = the ID your project will send in `project_external_id`
+
+> Jobs sent before a project is created appear as **Unassigned** in the dashboard. You can reassign them in bulk later.
+
+---
+
+## Step 8 — Set budgets and alerts
+
+In Admin UI → **Budgets** → Create:
+
+- `scope_type`: workspace / team / project
+- `amount_usd`: monthly limit
+- `alert_thresholds`: [0.5, 0.8, 1.0] (email at 50%, 80%, 100%)
+- `hard_limit`: true = preflight returns `allow=false` when exceeded
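+
+To make the `alert_thresholds` semantics concrete, the sketch below works out which thresholds a new event crosses. How the service actually deduplicates alert emails is not described here, so `crossed_thresholds` is a hypothetical helper that assumes each threshold fires once, at the moment spend first reaches it.
+
+```python
+def crossed_thresholds(spend_before, spend_after, amount_usd, thresholds=(0.5, 0.8, 1.0)):
+    """Return the thresholds passed when spend moves from spend_before to spend_after."""
+    return [t for t in thresholds if spend_before < t * amount_usd <= spend_after]
+
+
+# With a 100 USD budget, going from 40 to 85 USD crosses the 50% and 80% marks.
+alerts = crossed_thresholds(40, 85, 100)
+```
+
+Comparing the spend before and after an event, rather than only the current total, keeps the same threshold from firing again on every later event.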
+
+---
+
+## Step 9 — Smoke test
+
+```bash
+# 1. Check service is up
+curl https://cost.oliver.agency/v1/health
+
+# 2. Preflight
+curl -X POST https://cost.oliver.agency/v1/preflight \
+  -H "X-API-Key: ct_live_xxx" \
+  -H "Content-Type: application/json" \
+  -d '{"user_external_id":"test-user","model":"gemini-3-pro-preview","estimated_units":{"input_tokens":1000,"output_tokens":200}}'
+# expect: {"allow":true,"estimated_cost_usd":...,"request_id":"..."}
+
+# 3. Record
+curl -X POST https://cost.oliver.agency/v1/usage/record \
+  -H "X-API-Key: ct_live_xxx" \
+  -H "Content-Type: application/json" \
+  -d '{"request_id":"<from above>","user_external_id":"test-user","model":"gemini-3-pro-preview","units":{"input_tokens":987,"output_tokens":180},"latency_ms":1200,"status":"success"}'
+# expect: {"event_id":"...","cost_usd":0.00213}   (987 * 0.00000125 + 180 * 0.000005, assuming the example prices)
+```
+
+Then open the Admin UI Dashboard — the test event should appear within seconds.
+
+---
+
+## Troubleshooting
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| `preflight` returns 401 | Wrong or missing API key | Check `COST_TRACKER_API_KEY` env var; verify key is active in Admin UI |
+| `preflight` returns `allow=false` | Budget exceeded | Admin UI → Budgets → raise limit or wait for next billing period |
+| Events not appearing in dashboard | Outbox accumulating (service down) | Check `/tmp/cost_outbox.sqlite`; the SDK's background flusher retries automatically once the service is back up |
+| `cost_usd=null` on events | Model not in pricing table | Admin UI → Pricing → add model, or check LiteLLM sync task ran |
+| Slow preflight (>500ms) | cost-tracker under load or network | SDK retries automatically; if persistent, check service metrics |
+| Token estimates wildly off | Char/4 heuristic for video prompts | Gemini video needs file_size-based lookup; see [[wiki/tech-patterns/cost-tracker-providers\|cost-tracker-providers]] |
--- a/wiki/tech-patterns/cost-tracker-pricing-sources.md
+++ b/wiki/tech-patterns/cost-tracker-pricing-sources.md
@ -0,0 +1,131 @@
+---
+title: AI Cost Tracker — Pricing Sources
+tags: [how-to, ai, cost-tracking, pricing]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# AI Cost Tracker — Pricing Sources
+
+The cost-tracker uses a **three-layer hybrid pricing pipeline**. Understanding the priority order is essential for accurate billing attribution.
+
+## Priority order
+
+```
+override  (highest — set manually by admin)
+  yaml    (fallback — versioned in repo for non-LLM providers)
+  litellm (lowest — auto-synced daily from open source)
+```
+
+`compute_cost(provider, model, units, ts)` returns the cost using the **highest-priority active price** for the given model at timestamp `ts`.
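+
+The lookup behind "highest-priority active price" could look roughly like the sketch below. It is an assumption-laden illustration: `Price`, `PRIORITY`, and `active_price` are hypothetical names, dates stand in for timestamps, and the effective window is treated as half-open (`effective_from` inclusive, `effective_to` exclusive).
+
+```python
+from dataclasses import dataclass
+from datetime import date
+from typing import Optional
+
+# Higher number wins when several sources have an active price.
+PRIORITY = {"override": 3, "yaml": 2, "litellm": 1}
+
+
+@dataclass
+class Price:
+    model: str
+    source: str
+    usd_per_unit: float
+    effective_from: date
+    effective_to: Optional[date] = None  # None means the record is still open
+
+
+def active_price(prices, model: str, on: date) -> Optional[Price]:
+    """Pick the highest-priority price whose effective window contains `on`."""
+    candidates = [
+        p for p in prices
+        if p.model == model
+        and p.effective_from <= on
+        and (p.effective_to is None or on < p.effective_to)
+    ]
+    if not candidates:
+        return None
+    # Ties within one source go to the most recent effective_from.
+    return max(candidates, key=lambda p: (PRIORITY[p.source], p.effective_from))
+```
+
+Returning `None` for an unknown model corresponds to the `cost_usd=null` case that appears when a model is missing from the pricing table.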
+
+---
+
+## Layer 1 — LiteLLM auto-sync (LLM providers)
+
+**Source:** `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`
+
+**Coverage:** Gemini, OpenAI, Anthropic, Cohere, Mistral, Together, and 100+ others.
+
+**Sync schedule:** Celery beat task `tasks/pricing_sync.py` runs **daily at 02:00 UTC**.
+
+**What happens on sync:**
+1. Fetches the JSON from `main`, or from the commit named in the `LITELLM_COMMIT_HASH` env var when one is set
+2. Maps `input_cost_per_token` / `output_cost_per_token` to our schema
+3. For each model:
+   - If no existing price → creates new `model_prices` record with `source="litellm"`
+   - If price unchanged → updates `litellm_commit_hash`, no other change
+   - If price **changed** → closes old record (`effective_to=today`), creates new record, sends **admin notification email**
+
+> **Note:** Auto-price changes never silently modify `source="override"` records. If you have an override active, the sync logs a divergence warning but leaves your override intact.
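+
+The fetch-and-compare steps above can be sketched as two small functions. This is a simplified illustration, not the contents of `tasks/pricing_sync.py`: `pricing_url` and `plan_sync` are hypothetical names, `existing` is assumed to hold the currently open `source="litellm"` prices, and the database writes and email are reduced to action labels.
+
+```python
+RAW_URL = "https://raw.githubusercontent.com/BerriAI/litellm/{ref}/model_prices_and_context_window.json"
+
+
+def pricing_url(commit_hash=None) -> str:
+    """Use the pinned commit when one is given, otherwise the main branch."""
+    return RAW_URL.format(ref=commit_hash or "main")
+
+
+def plan_sync(existing: dict, fetched: dict) -> dict:
+    """Map each priced model to 'create', 'unchanged', or 'replace_and_notify'.
+
+    existing: model -> (input_price, output_price) for open litellm records
+    fetched:  parsed LiteLLM JSON, model -> entry dict
+    """
+    actions = {}
+    for model, entry in fetched.items():
+        if "input_cost_per_token" not in entry:
+            continue  # not token-priced; outside this sketch
+        new = (entry["input_cost_per_token"], entry.get("output_cost_per_token", 0.0))
+        old = existing.get(model)
+        if old is None:
+            actions[model] = "create"
+        elif old == new:
+            actions[model] = "unchanged"
+        else:
+            actions[model] = "replace_and_notify"
+    return actions
+```
+
+Passing `os.environ.get("LITELLM_COMMIT_HASH")` to `pricing_url` gives the pinned behaviour; override records are never part of `existing`, so the plan cannot touch them.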
+
+**To pin a specific version** (for reproducibility):
+```env
+LITELLM_COMMIT_HASH=abc123def456   # pin to a known-good commit
+```
+
+See [[wiki/concepts/litellm-pricing-source|litellm-pricing-source]] for deeper explanation.
+
+---
+
+## Layer 2 — YAML (non-LLM providers)
+
+**File:** `backend/app/pricing/models.yaml` — versioned in the cost-tracker repo.
+
+Contains providers that LiteLLM does not cover:
+
+```yaml
+# ElevenLabs
+- provider: elevenlabs
+  model: eleven_multilingual_v2
+  billing_unit: char
+  price_per_1k_usd: 0.30
+  effective_from: "2025-01-01"
+
+- provider: elevenlabs
+  model: eleven_flash_v2_5
+  billing_unit: char
+  price_per_1k_usd: 0.11
+  effective_from: "2025-01-01"
+
+# Google Cloud TTS
+- provider: google_tts
+  model: standard
+  billing_unit: char
+  price_per_1m_usd: 4.00
+  effective_from: "2024-01-01"
+
+- provider: google_tts
+  model: wavenet
+  billing_unit: char
+  price_per_1m_usd: 16.00
+  effective_from: "2024-01-01"
+```
+
+**When to update YAML:**
+- ElevenLabs raises/lowers per-char pricing
+- Google Cloud TTS changes tier pricing
+- Adding a brand-new non-LLM provider
+
+**How to update:**
+1. Add a new entry with the new price and `effective_from: "YYYY-MM-DD"`
+2. Leave the old entry — it is used for historical cost attribution
+3. Deploy the new YAML → loader upserts on startup
+
+**Do NOT delete old entries** — they are needed for retroactive reports.
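+
+To show why old entries must stay, here is a rough sketch of date-aware lookup over parsed YAML entries. The helper names are hypothetical, the entries are written as plain dicts to avoid a YAML dependency, and the second ElevenLabs entry is a made-up price change used only for illustration.
+
+```python
+from datetime import date
+
+ENTRIES = [
+    {"provider": "elevenlabs", "model": "eleven_multilingual_v2",
+     "price_per_1k_usd": 0.30, "effective_from": "2025-01-01"},
+    # Hypothetical later price change, not a real ElevenLabs price.
+    {"provider": "elevenlabs", "model": "eleven_multilingual_v2",
+     "price_per_1k_usd": 0.24, "effective_from": "2026-01-01"},
+    {"provider": "google_tts", "model": "wavenet",
+     "price_per_1m_usd": 16.00, "effective_from": "2024-01-01"},
+]
+
+
+def price_per_char(entry: dict) -> float:
+    # The YAML mixes per-1k and per-1M prices, so normalise to a per-character rate.
+    if "price_per_1k_usd" in entry:
+        return entry["price_per_1k_usd"] / 1_000
+    return entry["price_per_1m_usd"] / 1_000_000
+
+
+def entry_for(provider: str, model: str, day: date):
+    candidates = [
+        e for e in ENTRIES
+        if e["provider"] == provider
+        and e["model"] == model
+        and date.fromisoformat(e["effective_from"]) <= day
+    ]
+    # ISO dates sort chronologically as strings; pick the latest one already in force.
+    return max(candidates, key=lambda e: e["effective_from"], default=None)
+
+
+def cost(provider: str, model: str, chars: int, day: date) -> float:
+    entry = entry_for(provider, model, day)
+    if entry is None:
+        raise LookupError(f"no price for {provider}/{model} on {day}")
+    return chars * price_per_char(entry)
+```
+
+A call dated mid-2025 still resolves to the 2025 entry even after the newer one is added, which is the behaviour retroactive reports depend on.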
+
+---
+
+## Layer 3 — Admin override (UI)
+
+Use when you have:
+- A negotiated enterprise contract price (different from public pricing)
+- A volume discount or committed-use agreement
+- A temporary promotional rate
+- A price correction before the next LiteLLM sync
+
+**How to create an override:**
+1. Admin UI → **Pricing** → find the model → **Override price**
+2. Set: `price_per_unit_usd`, `effective_from` (defaults to today), optional `override_reason`
+3. Save → the previously active price record is closed (its `effective_to` is set to the override's `effective_from`) and the override applies from that date
+
+Override records are never auto-modified by LiteLLM sync.
+
+---
+
+## Historical pricing and retroactive reports
+
+Every usage event is stored with `price_id` — a reference to the exact `model_prices` record active at the time of the call:
+
+- **Retroactive reports are always accurate** — changing a price today does not affect yesterday's costs
+- Old `model_prices` records with `effective_to` set are never deleted
+- Re-evaluating historical costs under new pricing is not a built-in feature; it requires a manual export and a spreadsheet
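+
+A small sketch of why storing `price_id` keeps history stable. `ModelPrice`, `UsageEvent`, `total_cost` and `what_if_total` are hypothetical names and the prices are invented; the what-if helper mirrors what the manual export would compute, it is not an existing feature.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class ModelPrice:
+    id: int
+    price_per_unit_usd: float
+
+
+@dataclass(frozen=True)
+class UsageEvent:
+    units: int
+    price_id: int  # the price record that was active when the call was made
+
+
+# Invented example prices: record 1 was later replaced by record 2.
+PRICES = {1: ModelPrice(1, 0.002), 2: ModelPrice(2, 0.003)}
+
+
+def total_cost(events: list[UsageEvent], prices: dict[int, ModelPrice]) -> float:
+    # Each event is costed with its own stored price, so later changes do not leak back.
+    return sum(e.units * prices[e.price_id].price_per_unit_usd for e in events)
+
+
+def what_if_total(events: list[UsageEvent], prices: dict[int, ModelPrice], price_id: int) -> float:
+    # Re-price every event with a single chosen record (the spreadsheet exercise).
+    rate = prices[price_id].price_per_unit_usd
+    return sum(e.units for e in events) * rate
+```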
+
+---
+
+## Monthly reconciliation
+
+Recommended monthly check:
+1. Download invoice from Google Cloud Console / ElevenLabs dashboard
+2. Compare with cost-tracker "Actual vs Billed" report (Admin UI → Analytics → Reconciliation)
+3. If >5% discrepancy: check for `pricing_missing=true` events and add missing model prices
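+
+The 5% check in step 3 could be computed as below. The function names are hypothetical, and measuring the discrepancy relative to the invoiced amount is an assumption about how the threshold is meant.
+
+```python
+def discrepancy_pct(tracked_usd: float, billed_usd: float) -> float:
+    """Absolute difference between tracked and invoiced cost, as a percentage of the invoice."""
+    if billed_usd == 0:
+        raise ValueError("billed amount must be non-zero")
+    return abs(tracked_usd - billed_usd) / billed_usd * 100
+
+
+def needs_investigation(tracked_usd: float, billed_usd: float, threshold_pct: float = 5.0) -> bool:
+    return discrepancy_pct(tracked_usd, billed_usd) > threshold_pct
+```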
--- a/wiki/tech-patterns/cost-tracker-providers.md
+++ b/wiki/tech-patterns/cost-tracker-providers.md
@ -0,0 +1,141 @@
+---
+title: AI Cost Tracker — Billing Units per Provider
+tags: [reference, ai, cost-tracking, providers]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# Billing Units per Provider
+
+Reference for how each AI provider bills and how to extract usage data from their API responses.
+
+## Gemini (Google AI / Vertex AI)
+
+**Billing unit:** tokens (input + output separately)
+
+**SDK:** `google-genai` Python SDK
+
+**How to get usage:**
+```python
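+# Async calls go through client.aio; client.models.generate_content is synchronous.
+response = await client.aio.models.generate_content(...)
+
+# Counts can be None on some responses, so default to 0 before recording.
+usage = response.usage_metadata
+input_tokens  = usage.prompt_token_count or 0
+output_tokens = usage.candidates_token_count or 0
+total_tokens  = usage.total_token_count or 0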
+```
+
+> ⚠️ `usage_metadata` is available on all `generate_content` responses including multimodal (video + text prompts). It was **not being read** in video-accessibility before the cost-tracker integration — added as part of Phase 1.
+
+**Token estimation before the call:**
+- Text: `len(text) / 4` (rough heuristic; actual tokenisation varies ±30%)
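+- Video file: Google's token documentation gives roughly 258 tokens per sampled frame (1 frame per second at default media resolution) plus about 32 tokens per second of audio, so a 1-minute clip lands in the region of 17,000 tokens
+  - Exact: call `client.models.count_tokens(...)` on the uploaded file before generating
+- Image: 258 tokens for images up to 384×384 px; larger images are split into 768×768 tiles at 258 tokens each (Gemini 2.0 and later)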
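+
+For preflight budgeting, the text heuristic above might be wrapped like this. The function names are hypothetical, and the ±30% band is simply the tolerance quoted above applied with integer arithmetic.
+
+```python
+import math
+
+
+def estimate_text_tokens(text: str) -> int:
+    # Rough heuristic from above: about 4 characters per token.
+    return math.ceil(len(text) / 4)
+
+
+def estimate_range(text: str, tolerance_pct: int = 30) -> tuple[int, int]:
+    """Return (low, high) token bounds around the heuristic estimate."""
+    est = estimate_text_tokens(text)
+    low = est * (100 - tolerance_pct) // 100            # rounded down
+    high = -(-est * (100 + tolerance_pct) // 100)       # rounded up
+    return low, high
+```
+
+Using the upper bound for the preflight check errs on the side of rejecting a call that might exceed budget; the actual count from `usage_metadata` should still be what gets recorded.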
+
+**Pricing:** auto-synced from LiteLLM. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]].
+
+---
+
+## Gemini TTS (audio generation via generate_content)
+
+**Billing unit:** tokens (output audio tokens, different rate from text)
+
+**SDK:** same `google-genai`, with `response_modalities=["AUDIO"]`
+
+**How to get usage:**
+```python
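+from google.genai.types import GenerateContentConfig
+
+# Async calls go through client.aio; client.models.generate_content is synchronous.
+response = await client.aio.models.generate_content(
+    model="gemini-2.5-flash-preview-tts",
+    contents=...,
+    config=GenerateContentConfig(response_modalities=["AUDIO"]),
+)
+output_tokens = response.usage_metadata.candidates_token_count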
+```
+
+Audio output token rate differs from text output rate — verify in LiteLLM for model `gemini-2.5-flash-preview-tts`.
+
+---
+
+## ElevenLabs TTS
+
+**Billing unit:** characters (input text length)
+
+**SDK:** custom HTTP (`aiohttp` POST to `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}`)
+
+**Response:** returns raw audio bytes. **No usage metadata in response.**
+
+**How to measure:** compute `len(text)` at the call site **before** making the request:
+
+```python
+char_count = len(text)
+# make the ElevenLabs call
+await ct.record(..., chars=char_count, model="eleven_multilingual_v2", ...)
+```
+
+**Subscription vs pay-as-you-go:** ElevenLabs bills against a monthly character quota. When quota is exceeded, pay-as-you-go rate applies. The cost-tracker assumes pay-as-you-go for all characters (conservative upper bound). Adjust via admin override if on a subscription plan.
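+
+A rough sketch of the gap between the tracker's upper bound and an overage-only view. The function names are hypothetical, the rate is the `eleven_multilingual_v2` price from `models.yaml`, and applying that same rate to overage characters is an assumption; a real plan may charge a different overage rate.
+
+```python
+PRICE_PER_1K_USD = 0.30  # eleven_multilingual_v2, as listed in models.yaml
+
+
+def upper_bound_cost(chars: int) -> float:
+    # What the cost-tracker records: every character billed at the pay-as-you-go rate.
+    return chars / 1_000 * PRICE_PER_1K_USD
+
+
+def overage_cost(chars_used_this_month: int, quota_chars: int) -> float:
+    # Only characters beyond the monthly quota incur extra cost.
+    over = max(0, chars_used_this_month - quota_chars)
+    return over / 1_000 * PRICE_PER_1K_USD
+```
+
+The difference between the two figures is roughly what an admin override would need to account for on a subscription plan.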
+
+---
+
+## Google Cloud TTS
+
+**Billing unit:** characters (input text length; for SSML input, tags are billed too, except `<mark>`)
+
+**SDK:** `google.cloud.texttospeech` Python SDK
+
+**Response:** `SynthesizeSpeechResponse` with `audio_content` (bytes). **No character count in response.**
+
+**How to measure:**
+```python
+# Plain-text input: bill on the text length.
+char_count = len(synthesis_input.text)
+# SSML input: use len(synthesis_input.ssml) instead, since Google counts SSML tags
+# (other than <mark>) toward billed characters.
+await ct.record(..., chars=char_count, model="standard", ...)
+```
+
+**Voice tiers and pricing:**
+
+| Voice type | Billing model name | Price per 1M chars |
+|---|---|---|
+| Standard | `google_tts/standard` | $4.00 |
+| WaveNet | `google_tts/wavenet` | $16.00 |
+| Neural2 | `google_tts/neural2` | $16.00 |
+| Studio | `google_tts/studio` | $160.00 |
+
+Defined in `backend/app/pricing/models.yaml` in the cost-tracker repo.
+
+---
+
+## OpenAI (future)
+
+**Billing unit:** tokens (input + output)
+
+```python
+response = client.chat.completions.create(...)
+input_tokens  = response.usage.prompt_tokens
+output_tokens = response.usage.completion_tokens
+```
+
+Auto-synced by LiteLLM.
+
+---
+
+## Anthropic Claude (future)
+
+**Billing unit:** tokens (input + output)
+
+```python
+response = client.messages.create(...)
+input_tokens  = response.usage.input_tokens
+output_tokens = response.usage.output_tokens
+```
+
+Auto-synced by LiteLLM.
+
+---
+
+## Whisper (self-hosted)
+
+**Not billed per token.** Runs on Cloud Run / GPU compute.
+
+Billing = infrastructure cost (compute time). Phase 1 does not track this.
+Future Phase 2: track `audio_duration_seconds` and approximate cost from Cloud Run billing data.